Open
Labels
🐞 bug: Something isn't working, pull request that fix bug.
Description
Self Checks
- I have searched for existing issues, including closed ones.
- I confirm that I am using English to submit this report.
Describe the bug
In deepdoc/parser/html_parser.py, the TITLE_TAGS mapping currently defines:
TITLE_TAGS = {"h1": "#", "h2": "##", "h3": "###", "h4": "#####", "h5": "#####", "h6": "######"}
Here, both h4 and h5 are mapped to #####.
This seems incorrect. Based on Markdown heading levels, h4 should likely map to ####, while h5 should remain #####.
Expected behavior
TITLE_TAGS = {"h1": "#", "h2": "##", "h3": "###", "h4": "####", "h5": "#####", "h6": "######"}
Why this matters
When HTML headings are converted into text chunks, h4 content is currently demoted to the same level as h5. This loses heading hierarchy information and may affect downstream readability and chunk structure.
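To make the collision concrete, here is a minimal sketch that renders an h4 and an h5 with both mappings. The two dictionaries are copied from this report; `render_heading` and `heading_depth` are hypothetical helpers written for illustration and are not the parser's actual code.

```python
# Mappings as quoted in this report.
BUGGY_TITLE_TAGS = {"h1": "#", "h2": "##", "h3": "###", "h4": "#####", "h5": "#####", "h6": "######"}
FIXED_TITLE_TAGS = {"h1": "#", "h2": "##", "h3": "###", "h4": "####", "h5": "#####", "h6": "######"}


def render_heading(tag, text, title_tags):
    """Hypothetical helper: prefix the heading text with its Markdown marker."""
    return f"{title_tags[tag]} {text}"


def heading_depth(line):
    """Count the leading '#' characters of a rendered heading line."""
    return len(line) - len(line.lstrip("#"))


buggy_h4 = render_heading("h4", "Details", BUGGY_TITLE_TAGS)
buggy_h5 = render_heading("h5", "Notes", BUGGY_TITLE_TAGS)
fixed_h4 = render_heading("h4", "Details", FIXED_TITLE_TAGS)
fixed_h5 = render_heading("h5", "Notes", FIXED_TITLE_TAGS)

# With the buggy mapping the two headings are indistinguishable by depth.
print(heading_depth(buggy_h4), heading_depth(buggy_h5))
# With the proposed mapping the h4 sits one level above the h5.
print(heading_depth(fixed_h4), heading_depth(fixed_h5))
```

With the current mapping, anything downstream that relies on marker depth cannot tell an h4 section from the h5 subsections nested under it.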
File
deepdoc/parser/html_parser.py
Suggested fix
Change:
"h4": "#####"
to:
"h4": "####"
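A small regression check could guard against this kind of typo in the future. The sketch below assumes the mapping keys always have the form `h<level>`; `find_mismatched_levels` is a hypothetical name, and where such a check would live in the test suite is left open.

```python
def find_mismatched_levels(title_tags):
    """Return entries whose marker length differs from the level in the tag name.

    Assumes every key looks like "h1" .. "h6" (an assumption of this sketch).
    """
    return {
        tag: marker
        for tag, marker in title_tags.items()
        if marker != "#" * int(tag[1:])
    }


# Mapping currently in the file, as quoted in this report.
current = {"h1": "#", "h2": "##", "h3": "###", "h4": "#####", "h5": "#####", "h6": "######"}
# Mapping after the suggested fix.
proposed = {"h1": "#", "h2": "##", "h3": "###", "h4": "####", "h5": "#####", "h6": "######"}

print(find_mismatched_levels(current))   # only the h4 entry is reported
print(find_mismatched_levels(proposed))  # empty dict: every level matches
```

An alternative is to build the mapping from the level number directly, which removes the hand-typed markers altogether, but that is a larger change than the one-character fix proposed here.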