Comments (5)

barrust avatar barrust commented on August 21, 2024

Sure, the steps to generate a new language are fairly straightforward:

  1. Download the set of words that should be added to the dictionary
  2. Load that file into a dictionary; this will depend on your source. The key is to turn this into a dictionary of the form key=word, val=frequency (as an int).

If your data is in dictionary form, you can load it like so:

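from spellchecker import SpellChecker
# language=None starts with an empty word list
spell = SpellChecker(language=None)
# file_to_dictionary: path to the word-frequency dictionary file
spell.word_frequency.load_dictionary(file_to_dictionary)
# write the word frequency list out for later reuse
spell.export(location_for_export)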
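
As a rough illustration of the key=word, val=frequency form described above, here is a minimal sketch that counts words in raw text and writes the result as JSON using only the standard library. The helper names `build_frequency_dict` and `save_frequency_dict`, the `\w+` tokenisation, and the plain (uncompressed) JSON output are assumptions made for this example, not part of pyspellchecker.

```python
import json
import re
from collections import Counter


def build_frequency_dict(text):
    """Count how often each lowercased word occurs in `text`."""
    # \w+ matches runs of Unicode letters, digits and underscores;
    # this tokenisation is an assumption and may not suit every language.
    words = re.findall(r"\w+", text.lower())
    return dict(Counter(words))


def save_frequency_dict(frequencies, path):
    """Write the word -> count mapping to `path` as UTF-8 JSON."""
    with open(path, "w", encoding="utf-8") as handle:
        # ensure_ascii=False keeps non-ASCII words readable in the file
        json.dump(frequencies, handle, ensure_ascii=False)


sample = "the cat sat on the mat the end"
frequencies = build_frequency_dict(sample)
```

A file written by `save_frequency_dict` should be the kind of input that `load_dictionary` expects, assuming it accepts plain JSON with integer counts.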

If you only have txt files of words, you can load those files directly and have spellchecker build the word frequency for you:

from spellchecker import SpellChecker
spell = SpellChecker(language=None)
spell.word_frequency.load_text_file(path_to_text_file)
spell.export(location_for_export)

Once you have exported the dictionary (really a word frequency list), you can then load that dictionary when you wish to use spellchecker:

from spellchecker import SpellChecker
spell = SpellChecker(language=None, local_dictionary=location_from_export)

from pyspellchecker.

MukhtarShaima avatar MukhtarShaima commented on August 21, 2024

Thanks for the clear instructions. I have successfully loaded my text file.
Now the problem is that it does not give me correct answers, e.g.:
for word in misspelled:
    # Get the one most likely answer
    print(spell.correction(word))
It should return the correct or most likely word, but sometimes it gives me a wrong word for the misspelled one, or it returns the whole misspelled string.
Thank you.

barrust avatar barrust commented on August 21, 2024

That is likely due to a few different possible issues.

  1. If you do not have frequency data, i.e., every word's count is set to 1 (or to the same value), the checker has no basis for ranking the candidates. Try something like the following to see all of them:
# return those that are within the specified distance
print(spell.candidates(word))
  2. If the edit distance between the misspelled word and the intended word is greater than 2, then it will not work and it will return the word as is.
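
To check whether the second case applies to a particular pair of words, the distance can be estimated by hand. The sketch below computes a plain Levenshtein distance; it is not the library's own candidate generation, which may count edits differently (for example, treating a swap of two adjacent letters as a single edit), so treat the result as an upper bound. The names `edit_distance` and `within_reach` are hypothetical helpers for this example.

```python
def edit_distance(source, target):
    """Levenshtein distance: fewest single-character inserts, deletes and substitutions."""
    # previous[j] holds the distance between the processed prefix of
    # `source` and the first j characters of `target`.
    previous = list(range(len(target) + 1))
    for i, source_char in enumerate(source, start=1):
        current = [i]
        for j, target_char in enumerate(target, start=1):
            cost = 0 if source_char == target_char else 1
            current.append(min(
                previous[j] + 1,         # delete source_char
                current[j - 1] + 1,      # insert target_char
                previous[j - 1] + cost,  # substitute, or keep a match
            ))
        previous = current
    return previous[-1]


def within_reach(misspelled, intended, max_distance=2):
    """True if `intended` is at most `max_distance` edits from `misspelled`."""
    return edit_distance(misspelled, intended) <= max_distance
```

If `within_reach` returns False for a misspelling and the word you expected, that word is probably too far away to be offered as a candidate.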

Honestly, I have never tried this with languages that use non-Latin characters, so I am unsure how it will perform.

barrust avatar barrust commented on August 21, 2024

@MukhtarShaima Let me know if you are still having issues; otherwise, I am going to close this one!

Thanks!

ryuzakinho avatar ryuzakinho commented on August 21, 2024

Hi,

From my understanding, we can load JSON formatted dictionaries or text documents that will be used for building the frequency list.

I would like to directly use the word frequency lists available here (Word Frequency): https://github.com/hermitdave/FrequencyWords/tree/master/content/2018/fi

These are txt files containing frequencies. Is there a way to directly load such files or do I need to convert them to JSON first?

Thanks for your help!
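
One possible route, sketched here under assumptions, is to convert such a file to JSON first. The code assumes each line holds a word and an integer count separated by whitespace (e.g. `on 123`); that layout should be checked against the actual files. `parse_frequency_lines` and `convert_frequency_file` are hypothetical helper names, and only the standard library is used.

```python
import json


def parse_frequency_lines(lines):
    """Turn 'word count' lines into a {word: count} dictionary."""
    frequencies = {}
    for line in lines:
        parts = line.split()
        # Skip blank or malformed lines rather than guessing at their meaning
        if len(parts) != 2 or not parts[1].isdecimal():
            continue
        word, count = parts
        # Sum the counts in case a word appears on more than one line
        frequencies[word] = frequencies.get(word, 0) + int(count)
    return frequencies


def convert_frequency_file(txt_path, json_path):
    """Read a 'word count' text file and write it out as UTF-8 JSON."""
    with open(txt_path, encoding="utf-8") as source:
        frequencies = parse_frequency_lines(source)
    with open(json_path, "w", encoding="utf-8") as target:
        json.dump(frequencies, target, ensure_ascii=False)
    return frequencies
```

The resulting JSON file could then be passed to `spell.word_frequency.load_dictionary(...)` as in the first comment of this thread, assuming plain JSON is accepted there.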
