@LemonsoftLtd You need to install the python-devel package. Something like yum install python-devel or yum install python3-devel should fix the problem.
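For context, the build fails because pip compiles selectolax's C extension from the source tarball, and the compiler cannot find Python.h, which these -devel packages provide. A small check along these lines (a sketch; the helper name python_headers_available is made up for illustration) can tell you beforehand whether the headers for the running interpreter are installed:

```python
import os
import sysconfig


def python_headers_available() -> bool:
    """Return True if Python.h exists in this interpreter's include directory.

    C extensions are compiled against this header whenever pip has to build
    from a source tarball instead of installing a prebuilt wheel.
    """
    include_dir = sysconfig.get_paths()["include"]
    return os.path.isfile(os.path.join(include_dir, "Python.h"))


if __name__ == "__main__":
    if python_headers_available():
        print("Python.h found; C extensions can be built.")
    else:
        print("Python.h missing; install the python-devel / python3-devel package.")
```

The exact package name depends on the distribution and Python build; on CentOS 7 with Python 3.6 it turned out to be python36-devel.x86_64.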
What version of selectolax and Python are you running on?
I tried your example and it works fine for me.
I changed the open command, since Python 3 is very strict about encodings.
open('selectolax_bug.log', encoding='utf-8')
Add entry
| Administration
Thursday, 16 August 2018 02:03 Welcome to our guestbook. Here you can leave a post.
Guestbook
606911-606920
606901-606910
606891-606900
606881-606890
606871-606880
606861-606870
606851-606860
606841-606850
606831-606840
....
....
....
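The encoding change matters because open() without an encoding argument falls back to the locale's preferred encoding, which is not necessarily UTF-8. A minimal sketch with a temporary file (sample.html is a made-up name, and the German string merely mimics the guestbook page above) shows an explicit UTF-8 round trip succeeding while an ASCII decode of the same bytes fails:

```python
import os
import tempfile

# German text with non-ASCII characters, similar to the guestbook page above.
SAMPLE = "<p>Willkommen in unserem Gästebuch. Eintrag hinzufügen</p>"
ascii_failed = False

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.html")  # made-up file name

    # Write and read back with an explicit encoding: independent of the locale.
    with open(path, "w", encoding="utf-8") as f:
        f.write(SAMPLE)
    with open(path, encoding="utf-8") as f:
        text = f.read()
    print(text == SAMPLE)  # True

    # Decoding the same UTF-8 bytes as ASCII raises instead of guessing.
    try:
        with open(path, encoding="ascii") as f:
            f.read()
    except UnicodeDecodeError as exc:
        ascii_failed = True
        print("ascii decode failed:", exc.reason)
```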
Thanks a lot for the quick reply.
That is very weird indeed. This is the setup on my system:
Linux, Ubuntu 18.04
python 3.6.5
selectolax 0.1.7
I just tried everything in a fresh virtualenv.
○ → pip list
Package Version
---------- -------
pip 18.0
selectolax 0.1.7
setuptools 40.4.3
wheel 0.31.1
Here are the exact steps that produce the error on my system in that virtualenv in the python console:
○ → python
Python 3.6.5 |Anaconda, Inc.| (default, Apr 29 2018, 16:14:56)
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from selectolax.parser import HTMLParser
>>> with open('selectolax_bug.log', encoding='utf-8') as f:
...     test = f.read()
...
>>> tree = HTMLParser(test)
>>> tree.body.text()
Segmentation fault (core dumped)
I'll check it this weekend.
Perfect, thanks in advance for your time and effort.
I've pushed a fix. It's not very clever and depends on the compiler, but it should work on most systems. I will rewrite my text-parsing algorithm in the future. Currently, it uses a recursive approach and fails on your example because of stack size limits.
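To illustrate the recursion limitation: a recursive tree walk needs one stack frame per nesting level, so a sufficiently deep document exhausts the stack. The usual remedy is an explicit stack. The sketch below is plain Python on a hypothetical Node class, not selectolax's actual C code, and only demonstrates the technique:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical minimal DOM node: optional text plus child nodes."""
    text: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


def collect_text(root: Node) -> str:
    """Concatenate all text in document (pre-order) order without recursion."""
    parts = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.text:
            parts.append(node.text)
        # Push children in reverse so the leftmost child is visited first.
        stack.extend(reversed(node.children))
    return "".join(parts)


def nested(depth: int) -> Node:
    """Build a chain of `depth` nested nodes, each contributing one 'x'."""
    root = node = Node("x")
    for _ in range(depth - 1):
        child = Node("x")
        node.children.append(child)
        node = child
    return root


if __name__ == "__main__":
    tree = Node(children=[Node("a"), Node(children=[Node("b"), Node("c")]), Node("d")])
    print(collect_text(tree))  # abcd
    # A recursive walk would hit Python's default recursion limit long before this.
    print(len(collect_text(nested(100_000))))  # 100000
```

Memory use now grows with the explicit list on the heap rather than with call frames, so depth is bounded by available memory instead of the stack size limit.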
You can fix the old version by simply increasing the stack size:
➜ ~ ulimit -s
8192
➜ ~ ulimit -s 16000
➜ ~ ulimit -s
16000
Please try the new version:
pip install --no-cache-dir selectolax==0.1.8
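Note that ulimit -s counts kilobytes and only applies to the current shell and the processes it starts afterwards. If the parsing runs in a child process launched from Python, a comparable effect can be had by raising RLIMIT_STACK (which is in bytes) in preexec_fn, before the child executes. A POSIX-only sketch; STACK_KB and run_with_bigger_stack are illustrative names:

```python
import resource
import subprocess
import sys

STACK_KB = 16000  # same value as `ulimit -s 16000`; ulimit -s counts kilobytes


def _raise_stack_limit() -> None:
    """Runs in the child between fork and exec: raise the soft stack limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
    target = STACK_KB * 1024  # RLIMIT_STACK is expressed in bytes
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)  # the soft limit may not exceed the hard limit
    if soft == resource.RLIM_INFINITY or soft >= target:
        return  # already large enough; never lower it
    resource.setrlimit(resource.RLIMIT_STACK, (target, hard))


def run_with_bigger_stack(args):
    """Run `python <args>` in a child process that starts with the larger stack."""
    # Caveat: preexec_fn is documented as unsafe when the parent runs threads.
    return subprocess.run(
        [sys.executable, *args],
        preexec_fn=_raise_stack_limit,
        capture_output=True,
        text=True,
        check=True,
    )


if __name__ == "__main__":
    probe = "import resource; print(resource.getrlimit(resource.RLIMIT_STACK)[0])"
    print(run_with_bigger_stack(["-c", probe]).stdout.strip())
```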
Hey, very nice.
The quick fix of setting ulimit -s works.
However, the new version 0.1.8 doesn't seem to fix the problem for me by default. I still have to set the ulimit for that one to work too.
pip list
Package Version
---------- -------
pip 18.0
selectolax 0.1.8
setuptools 40.4.3
wheel 0.31.1
○ → python
Python 3.6.5 |Anaconda, Inc.| (default, Apr 29 2018, 16:14:56)
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from selectolax.parser import HTMLParser
>>> with open('selectolax_bug.log', encoding='utf-8') as f:
...     test = f.read()
...
>>> tree = HTMLParser(test)
>>> tree.body.text()
Segmentation fault (core dumped)
But after setting
○ → ulimit -s 16000
it works:
○ → python
Python 3.6.5 |Anaconda, Inc.| (default, Apr 29 2018, 16:14:56)
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from selectolax.parser import HTMLParser
>>> with open('selectolax_bug.log', encoding='utf-8') as f:
...     test = f.read()
...
>>> tree = HTMLParser(test)
>>> tree.body.text()
>>>
Or do I need to recompile any of the C backend? I'm not very familiar with Cython or C extensions.
My fix relies on the compiler, which is free to ignore my instruction. I will fix this issue later.
For the time being, you could use something like this:
>>> import resource
>>> soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
>>> resource.setrlimit(resource.RLIMIT_STACK, (soft * 4, hard))
>>>
The code above quadruples the soft stack limit, e.g. from the default 8 MB (8192 KB, as reported by ulimit -s) to 32 MB.
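The snippet above multiplies the soft limit unconditionally. It can fail in two cases: if the soft limit is already RLIM_INFINITY, multiplying it is meaningless, and if four times the soft limit exceeds the hard limit, setrlimit rejects it with ValueError. A slightly more defensive sketch (POSIX only; raise_stack_limit is a hypothetical helper), meant to be called once before parsing:

```python
import resource


def raise_stack_limit(factor: int = 4) -> tuple:
    """Multiply the soft RLIMIT_STACK by `factor`, staying within the hard limit.

    Returns the (soft, hard) pair in effect afterwards. Values are in bytes.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
    if soft == resource.RLIM_INFINITY:
        return soft, hard  # already unlimited; nothing to raise
    new_soft = soft * factor
    if hard != resource.RLIM_INFINITY:
        new_soft = min(new_soft, hard)  # setrlimit rejects soft > hard with ValueError
    resource.setrlimit(resource.RLIMIT_STACK, (new_soft, hard))
    return resource.getrlimit(resource.RLIMIT_STACK)


if __name__ == "__main__":
    soft, _ = raise_stack_limit()
    # Print in the same unit that `ulimit -s` reports.
    print("unlimited" if soft == resource.RLIM_INFINITY else f"{soft // 1024} KiB")
```

As far as I know this only helps code running on the main thread, whose stack grows on demand up to the limit; it does not resize the stacks of threads that already exist.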
That works, perfect.
Again, thanks a lot for your time and effort.
@rushter Was there any work done on this since the ticket was closed?
@mindscratch Nope. Do you have the same problem?
Yes, using selectolax 0.1.10 with Python 3.6.6 (on CentOS 7, kernel 4.18) and HTML content of about 14 MB.
Using the rlimit hack in the Python code has worked so far.
@mindscratch I've fixed the problem. Can you please check?
pip install selectolax==0.1.12
Hello, I tried the steps on my CentOS 7 machine, but I couldn't install it:
[root@dhpc09 ehealth]# pip install selectolax
Collecting selectolax
Using cached https://files.pythonhosted.org/packages/42/7b/07342f02e9857a866dbd1d57ebc0de9c894d46fb4ee5283193b7496b59d0/selectolax-0.1.12.tar.gz
Installing collected packages: selectolax
Running setup.py install for selectolax ... error
Complete output from command /usr/bin/python3.6 -u -c "import setuptools, tokenize;__file__='/tmp/pip-install-93v7sd98/selectolax/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-record-u01qla9m/install-record.txt --single-version-externally-managed --compile:
running install
running build
running build_py
creating build
creating build/lib.linux-x86_64-3.6
creating build/lib.linux-x86_64-3.6/selectolax
copying selectolax/__init__.py -> build/lib.linux-x86_64-3.6/selectolax
running egg_info
writing selectolax.egg-info/PKG-INFO
writing dependency_links to selectolax.egg-info/dependency_links.txt
writing top-level names to selectolax.egg-info/top_level.txt
reading manifest file 'selectolax.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no files found matching 'CONTRIBUTING.rst'
warning: no files found matching 'HISTORY.rst'
warning: no previously-included files found matching 'selectolax/*.so'
warning: no files found matching 'modest/include/*'
warning: no files found matching 'modest/source/*'
warning: no previously-included files matching '__pycache__' found under directory '*'
warning: no previously-included files matching '*.py[co]' found under directory '*'
warning: no files found matching '*.jpg' under directory 'docs'
writing manifest file 'selectolax.egg-info/SOURCES.txt'
copying selectolax/node.pxi -> build/lib.linux-x86_64-3.6/selectolax
copying selectolax/parser.c -> build/lib.linux-x86_64-3.6/selectolax
copying selectolax/parser.pxd -> build/lib.linux-x86_64-3.6/selectolax
copying selectolax/parser.pyx -> build/lib.linux-x86_64-3.6/selectolax
copying selectolax/selector.pxi -> build/lib.linux-x86_64-3.6/selectolax
running build_ext
building 'selectolax.parser' extension
creating build/temp.linux-x86_64-3.6
creating build/temp.linux-x86_64-3.6/selectolax
creating build/temp.linux-x86_64-3.6/modest
creating build/temp.linux-x86_64-3.6/modest/source
creating build/temp.linux-x86_64-3.6/modest/source/modest
creating build/temp.linux-x86_64-3.6/modest/source/modest/finder
creating build/temp.linux-x86_64-3.6/modest/source/modest/layer
creating build/temp.linux-x86_64-3.6/modest/source/modest/node
creating build/temp.linux-x86_64-3.6/modest/source/modest/render
creating build/temp.linux-x86_64-3.6/modest/source/modest/style
creating build/temp.linux-x86_64-3.6/modest/source/mycore
creating build/temp.linux-x86_64-3.6/modest/source/mycore/utils
creating build/temp.linux-x86_64-3.6/modest/source/mycss
creating build/temp.linux-x86_64-3.6/modest/source/mycss/declaration
creating build/temp.linux-x86_64-3.6/modest/source/mycss/media
creating build/temp.linux-x86_64-3.6/modest/source/mycss/namespace
creating build/temp.linux-x86_64-3.6/modest/source/mycss/property
creating build/temp.linux-x86_64-3.6/modest/source/mycss/selectors
creating build/temp.linux-x86_64-3.6/modest/source/mycss/values
creating build/temp.linux-x86_64-3.6/modest/source/myencoding
creating build/temp.linux-x86_64-3.6/modest/source/myfont
creating build/temp.linux-x86_64-3.6/modest/source/myhtml
creating build/temp.linux-x86_64-3.6/modest/source/myport
creating build/temp.linux-x86_64-3.6/modest/source/myport/posix
creating build/temp.linux-x86_64-3.6/modest/source/myport/posix/mycore
creating build/temp.linux-x86_64-3.6/modest/source/myport/posix/mycore/utils
creating build/temp.linux-x86_64-3.6/modest/source/myunicode
creating build/temp.linux-x86_64-3.6/modest/source/myurl
gcc -pthread -Wno-unused-result -Wsign-compare -DDYNAMIC_ANNOTATIONS_ENABLED=1 -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -fPIC -Imodest/include/ -I/usr/include/python3.6m -c selectolax/parser.c -o build/temp.linux-x86_64-3.6/selectolax/parser.o -DMODEST_BUILD_OS=Linux -DMyCORE_OS_Linux -DMODEST_PORT_NAME=posix -DMyCORE_BUILD_WITHOUT_THREADS=YES -DMyCORE_BUILD_DEBUG=NO -O2 -pedantic -fPIC -Wno-unused-variable -Wno-unused-function -std=c99
selectolax/parser.c:177:20: fatal error: Python.h: No such file or directory
#include "Python.h"
^
compilation terminated.
error: command 'gcc' failed with exit status 1
----------------------------------------
Command "/usr/bin/python3.6 -u -c "import setuptools, tokenize;__file__='/tmp/pip-install-93v7sd98/selectolax/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-record-u01qla9m/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-install-93v7sd98/selectolax/
After yum install -y python36-devel.x86_64, it now installs fine. Thanks.