Comments (5)
Since the parsing engines behind selectolax follow all the standards, they omit the doctype.
I'm not sure whether we can retrieve the original doctype; I need to check. In HTML5, the doctype is the same for all websites.
You can test it in Chrome/Firefox devtools: document.documentElement.outerHTML returns no doctype.
from selectolax.
@rushter Thanks for your response. What do you mean by "follow all the standards"? Unless there's a standard which states that the doctype is not part of the document (very unlikely), I think it should be included. This would also be in line with what other parsers do.
A doctype is not part of the documentElement; it's part of the document. You can verify this in devtools: document.doctype
The doctype is indeed the same for all HTML5 websites, but not for all websites. My use case requires me to make edits to a given HTML document, so I need to preserve the doctype.
Feel free to close this issue if you feel it's better to file this in the Modest parsing engine project.
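For anyone with the same use case, here is a minimal workaround sketch that does not depend on selectolax returning the doctype. It uses only Python's standard-library `html.parser` to capture the doctype from the original markup and put it back in front of the edited output. The names `DoctypeFinder`, `extract_doctype`, and `reattach_doctype` are hypothetical helpers made up for this illustration, not part of selectolax or Modest.

```python
from html.parser import HTMLParser


class DoctypeFinder(HTMLParser):
    """Record the first <!DOCTYPE ...> declaration seen in the input."""

    def __init__(self):
        super().__init__()
        self.doctype = None

    def handle_decl(self, decl):
        # `decl` is the text between "<!" and ">", e.g. "DOCTYPE html".
        if self.doctype is None and decl.lower().startswith("doctype"):
            self.doctype = f"<!{decl}>"


def extract_doctype(html):
    """Return the doctype declaration of `html`, or None if it has none."""
    finder = DoctypeFinder()
    finder.feed(html)
    finder.close()
    return finder.doctype


def reattach_doctype(original_html, edited_html):
    """Prepend the original doctype to `edited_html` if it was dropped."""
    doctype = extract_doctype(original_html)
    already_present = edited_html.lstrip().lower().startswith("<!doctype")
    if doctype is None or already_present:
        return edited_html
    return f"{doctype}\n{edited_html}"
```

Here `edited_html` would be whatever string your parser of choice serializes after the edits. The second check keeps the helper from adding a duplicate once the parser starts emitting the doctype itself.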
I will add doctype support, but most likely not until next week.
I made a simple fix, please test it.
I'll test it within 24 hours and report back. Thank you!