Comments (6)

syllog1sm commented on April 28, 2024

Is the encoding of your terminal/text file set to UTF8?
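One quick way to answer that question from inside Python is to print the encodings the interpreter has picked up. This is only a minimal sketch using the standard library; the helper name `report_encodings` is made up for illustration and is not part of spaCy.

```python
import locale
import sys


def report_encodings() -> dict:
    """Collect the encodings Python is currently using (hypothetical helper)."""
    return {
        # Encoding used when printing to the terminal; may be None if
        # stdout has been replaced by an object without an encoding.
        "stdout": getattr(sys.stdout, "encoding", None),
        # Encoding used for file names.
        "filesystem": sys.getfilesystemencoding(),
        # Locale-dependent default encoding for text files.
        "preferred": locale.getpreferredencoding(False),
    }


if __name__ == "__main__":
    for name, encoding in report_encodings().items():
        print(f"{name}: {encoding}")
```

If any of these report something other than UTF-8, non-ASCII input such as accented characters may reach the library in an unexpected encoding.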

from spacy.

rsomeon commented on April 28, 2024

Fix and tests:
#33

NSchrading commented on April 28, 2024

I actually have this issue as well. You can test it with the word "fiancé"

s = "fiancé"
tok = nlp(s)
print(tok[0].lemma_)
  File "spacy/tokens.pyx", line 439, in spacy.tokens.Token.lemma_.__get__ (spacy/tokens.cpp:8854)
  File "spacy/strings.pyx", line 73, in spacy.strings.StringStore.__getitem__ (spacy/strings.cpp:1652)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc3 in position 5: unexpected end of data
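The error message itself is informative: "fiancé" has 6 characters but 7 UTF-8 bytes, because "é" is encoded as the two bytes 0xc3 0xa9. The same message can be reproduced in plain Python by decoding only the first 6 bytes. The sketch below assumes the failure comes from a character count being used where a byte count is needed; the thread does not confirm that this is what happens inside spaCy, and `decode_prefix` is a hypothetical helper.

```python
# Assumption (not confirmed by the thread): a byte string is being cut
# at the character count instead of the byte count.
word = "fianc\u00e9"            # "fiancé", written with an escape for safety
encoded = word.encode("utf-8")  # 7 bytes: "é" takes two (0xc3 0xa9)


def decode_prefix(data: bytes, n_bytes: int) -> str:
    """Decode the first n_bytes of data as UTF-8 (hypothetical helper)."""
    return data[:n_bytes].decode("utf-8")


try:
    # 6 characters, but 7 bytes: the slice ends in the middle of "é".
    decode_prefix(encoded, len(word))
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

print(error_message)

# Using the byte length instead decodes the whole word cleanly.
roundtrip = decode_prefix(encoded, len(encoded))
```

Here the truncated slice ends on the lone lead byte 0xc3 at position 5, which matches the position reported in the traceback above.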

syllog1sm commented on April 28, 2024

Sorry to leave this for so long. I'm working on securing a major contract that would ensure this project stays funded for a long time.

This is the first pull request to the code itself that I've wanted to merge, and I stalled on setting up the Contributors' License Agreement stuff.

I've adapted the Oracle Contributor Agreement, and am using the signing process that Medium use, where you attach a file to the first pull request you make with a given GitHub username. This seems unambiguous enough.

I know that ignoring this for two weeks isn't the right way to make you feel like the project is worth bothering with, and I understand if you can't accept the CLA terms for whatever reason. But, if you still want to contribute this patch, please follow the steps here: https://github.com/honnibal/spaCy/blob/master/contributors/cla.md

rsomeon commented on April 28, 2024

It's me who should be apologetic, since I forgot about the dual-licensing. I should have just waited for you to come up with your own one or two line change, but the fix was so trivial I didn't have time to think twice.

I have read your CLA and you can consider it signed, and furthermore I don't want any kind of attribution. The downer is I'm not going to put my name in a pull request, because this account is a pseudonymous dumping ground for my sillier projects. We have already spent more effort talking about this than it takes to fix the bug, so my suggestion is that you just commit your own fix :D Sorry for being difficult, and thanks for the library.

lock commented on April 28, 2024

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
