Comments (12)
Hello @Thewildweb !
Just curious. I tried to reproduce the issue you had by doing:
#!/usr/bin/python3
# -*- coding: utf-8 -*-
import requests
from selectolax.parser import HTMLParser
response = requests.get("https://edition.cnn.com/")
bs4 = HTMLParser(response.text)
test = bs4.css_first("span[itemprop='example")
print(test)
and the return was:
File "selectolax\parser.pyx", line 101, in selectolax.parser.HTMLParser.css_first
File "selectolax\node.pxi", line 458, in selectolax.parser.Node.css_first
File "selectolax\node.pxi", line 441, in selectolax.parser.Node.css
File "selectolax\selector.pxi", line 16, in selectolax.parser.Selector.__init__
File "selectolax\selector.pxi", line 57, in selectolax.parser.Selector._prepare_selector
ValueError: Bad CSS Selectors: span[itemprop='example
Also if you actually do write a correct example:
bs4.css_first("span[itemprop='example'")
None
I'm not quite sure how you were able to produce this bug, but I'd be very interested to know, as I might have had a similar issue. I wasn't sure if it's related to the same thing, however.
from selectolax.
@lexborisov Thanks!
from selectolax.
Hi, sorry for the late reply. Here is an example.
import requests
from selectolax.parser import HTMLParser
resp = requests.get("https://www.python.org/")
tree = HTMLParser(resp.text)
# bad CSS selector: 'href' is wrapped in quotes
a_hrefs = tree.css("a['href']")
# now it hangs indefinitely
from selectolax.
I see! That's probably due to the incorrect CSS selector. I agree that this is not good and that it should raise an exception. One way to avoid the hang is:
a_hrefs = tree.css('a[href^=""')
though I'm no expert with CSS selectors :D
from selectolax.
We need to fix this in Modest.
It hangs when parsing a CSS selector.
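Since the hang happens inside the native parser, a plain try/except around the call cannot interrupt it. One possible workaround until a fix lands is to probe a suspicious selector in a child process and give up after a timeout. This is only a minimal sketch: `terminates_in_time` and `PROBE` are hypothetical names made up for illustration, not part of selectolax or Modest, and the probe assumes selectolax is installed for the same interpreter.

```python
import subprocess
import sys

# Hypothetical probe script: it runs the selector (passed as sys.argv[1])
# against a tiny document using selectolax's HTMLParser.
PROBE = (
    "import sys\n"
    "from selectolax.parser import HTMLParser\n"
    "HTMLParser('<a href=\"/\">x</a>').css(sys.argv[1])\n"
)


def terminates_in_time(code: str, *args: str, timeout: float = 2.0) -> bool:
    """Run `code` in a fresh interpreter; return False if it exceeds `timeout`.

    A child that exits with an error (for example a ValueError for a bad
    selector) still counts as terminating.
    """
    try:
        subprocess.run(
            [sys.executable, "-c", code, *args],
            timeout=timeout,
            capture_output=True,
            check=False,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising TimeoutExpired.
        return False
    return True
```

Usage would look like `terminates_in_time(PROBE, "a['href']")`. Starting an interpreter per selector is slow, so this only makes sense for selectors that come from untrusted input.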
from selectolax.
https://github.com/lexbor/lexbor - I assume Modest won't be a priority anymore? It seems the dev is working on something new. Have you seen it?
from selectolax.
It's a faster engine, but it does not support everything that we need yet. For example, its CSS engine can only parse queries; it can't execute them yet.
As for whether it only happens when using a CSS selector: it hangs on some malformed examples, but I don't know why this happens. I'm not very familiar with the source code of Modest.
from selectolax.
Thank you for the information :) I see. Maybe it will support that soon. This is already pretty fast, though, so I'm excited to see whether it can get even faster.
from selectolax.
It is not a huge issue. I thought it would be nice to throw an exception if it was an easy job.
An even faster parser would be awesome. I'm coming from bs4, so selectolax feels instant...
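Until the parser itself rejects such selectors, a caller-side check could raise the exception early. Below is a rough sketch of that idea, not how selectolax or Modest validate selectors: `check_attribute_selectors` is a hypothetical helper that only looks at `[...]` attribute blocks. It is a heuristic, not a CSS parser, and it will wrongly reject values that contain brackets inside quotes.

```python
import re

# Any well-delimited [...] block with no nested brackets.
_ATTR_BLOCK = re.compile(r"\[([^\[\]]*)\]")

# Body of an attribute selector: a name, optionally followed by an
# operator and a quoted or bare value, e.g. href or itemprop='example'.
_ATTR_BODY = re.compile(
    r"""\s*[A-Za-z_][\w-]*\s*              # attribute name
        (?:[~|^$*]?=\s*                     # optional operator
           (?:"[^"]*"|'[^']*'|[\w-]+)\s*    # quoted or bare value
           (?:[iIsS]\s*)?                   # optional case flag
        )?$""",
    re.VERBOSE,
)


def check_attribute_selectors(selector: str) -> None:
    """Raise ValueError for obviously malformed attribute selectors."""

    def _validate(match: re.Match) -> str:
        # Reject blocks such as ['href'] whose body is not name[op value].
        if not _ATTR_BODY.match(match.group(1)):
            raise ValueError(f"Bad CSS selector: {selector!r}")
        return ""

    # Strip every valid block; any bracket left over is unbalanced.
    remainder = _ATTR_BLOCK.sub(_validate, selector)
    if "[" in remainder or "]" in remainder:
        raise ValueError(f"Bad CSS selector: {selector!r}")
```

Calling `check_attribute_selectors("a['href']")` before `tree.css(...)` raises ValueError instead of reaching the parser, while `a[href]` and `span[itemprop='example']` pass through unchanged.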
from selectolax.
Hi,
I can deal with this tomorrow: lexborisov/Modest#84.
from selectolax.
This seems to have been fixed in Modest.
from selectolax.
Hi man! I just got a comment about a new update!
Is that something you plan to add to selectolax? 😁
from selectolax.