Comments (6)
Is the encoding of your terminal/text file set to UTF8?
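A quick way to answer that question from inside Python is to print the encodings the interpreter is actually using. This is only a sketch using the standard library; it assumes Python 3 and says nothing about how spaCy itself reads its input.

```python
import locale
import sys

# Encoding Python uses when writing to the terminal (may be None if stdout is redirected oddly).
stdout_encoding = getattr(sys.stdout, "encoding", None)

# Encoding used by default when opening text files without an explicit encoding argument.
preferred_encoding = locale.getpreferredencoding(False)

# Encoding used by str.encode() / bytes.decode() when none is given.
default_encoding = sys.getdefaultencoding()

print("stdout:", stdout_encoding)
print("preferred (files):", preferred_encoding)
print("default (str/bytes):", default_encoding)
```

If the first two values are not UTF-8, passing an explicit `encoding="utf-8"` when opening text files is usually the safer route.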
from spacy.
Fix and tests:
#33
I actually have this issue as well. You can test it with the word "fiancé":
s = "fiancé"
tok = nlp(s)
print(tok[0].lemma_)
File "spacy/tokens.pyx", line 439, in spacy.tokens.Token.lemma_.__get__ (spacy/tokens.cpp:8854)
File "spacy/strings.pyx", line 73, in spacy.strings.StringStore.__getitem__ (spacy/strings.cpp:1652)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc3 in position 5: unexpected end of data
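The error message itself can be reproduced without spaCy. The sketch below is one plausible reading of the traceback, not something the thread confirms: it assumes the UTF-8 bytes of the string were cut off at the character count (6) instead of the byte count (7), which leaves the first byte of "é" dangling at position 5. The variable names are made up for illustration.

```python
# "fiancé" written with an explicit escape so the example does not depend on file encoding.
word = "fianc\u00e9"
encoded = word.encode("utf-8")

# 6 characters but 7 bytes: "é" is the two-byte sequence 0xC3 0xA9 in UTF-8.
n_chars = len(word)
n_bytes = len(encoded)

# Hypothetical bug: truncating the byte buffer using the character count.
truncated = encoded[:n_chars]

try:
    truncated.decode("utf-8")
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

print(n_chars, n_bytes)
print(error_message)
```

Any word whose last character needs more than one byte in UTF-8 would trip the same kind of mismatch, which is why a plain ASCII test string would not show it.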
Sorry to leave this for so long. I'm working on securing a major contract that would ensure this project stays funded for a long time.
This is the first pull request to the code itself that I've wanted to merge, and I stalled on setting up the Contributors' License Agreement stuff.
I've adapted the Oracle Contributor's Agreement, and am using the signing process that Medium use, where you attach a file to the first pull request you make with a given GitHub username. This seems unambiguous enough.
I know that ignoring this for two weeks isn't the right way to make you feel like the project is worth bothering with, and I understand if you can't accept the CLA terms for whatever reason. But, if you still want to contribute this patch, please follow the steps here: https://github.com/honnibal/spaCy/blob/master/contributors/cla.md
It's me who should be apologetic, since I forgot about the dual-licensing. I should have just waited for you to come up with your own one or two line change, but the fix was so trivial I didn't have time to think twice.
I have read your CLA and you can consider it signed, and furthermore I don't want any kind of attribution. The downer is I'm not going to put my name in a pull request, because this account is a pseudonymous dumping ground for my sillier projects. We have already spent more effort talking about this than it takes to fix the bug, so my suggestion is that you just commit your own fix :D Sorry for being difficult, and thanks for the library.
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.