Simple Python Spell-Checker

Quickstart

git clone https://github.com/pirate/spellchecker
cd spellchecker/
python spellchecker.py

# type interactively to get suggestions
Total Word Set: 285750
Model Precision: 1.62249168854
>manster
[('monster', 2), ('minster', 2)]

# or try some preset misspelled words
python misspeller.py | python spellchecker.py 

You can edit spellchecker.py and add more files to the training list to increase the word-frequency model precision.
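
As a rough illustration of what a word-frequency model is, the sketch below counts how often each word appears across a list of training texts. It is not the code in spellchecker.py: the names `words` and `train` and the sample sentences are assumptions made for this example, and it does not compute the "Model Precision" figure shown above. In practice each text would be the contents of one training file.

```python
import re
from collections import Counter


def words(text):
    """Split text into lowercase alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def train(texts):
    """Build a word -> occurrence-count model from an iterable of strings."""
    model = Counter()
    for text in texts:
        model.update(words(text))
    return model


# Hypothetical training data; each string stands in for one training file.
model = train(["The monster ate the minster.", "A monster!"])
print(model["monster"])  # 2
print(len(model))        # 5 distinct words
```

Adding more training text grows the word set and makes the counts a better estimate of how common each word really is.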

Background

Peter Norvig wrote an amazing article titled "How to Write a Spelling Corrector", detailing a basic approach to this deceptively simple problem. I had to write a spellchecker as an interview question for Disqus, and this repo details my efforts.

The core code, which I borrowed from Darius Bacon and Norvig, is this beautiful block:

alphabet = 'abcdefghijklmnopqrstuvwxyz'  # assumed: lowercase ASCII letters

def variants(word):
    """get all possible variants for a word"""
    splits     = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes    = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b)>1]
    replaces   = [a + c + b[1:] for a, b in splits for c in alphabet if b]
    inserts    = [a + c + b for a, b in splits for c in alphabet]
    return set(deletes + transposes + replaces + inserts)
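
To see how this block is typically used, here is a minimal sketch that generates every single-edit variant of a typo and keeps only those found in a known vocabulary. The `known_variants` helper, the tiny `vocabulary` set, and the lowercase ASCII `alphabet` are assumptions for illustration only; the real checker draws its word set from the training files.

```python
import string

alphabet = string.ascii_lowercase  # assumption: lowercase ASCII letters only


def variants(word):
    """get all possible variants for a word"""
    splits     = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes    = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces   = [a + c + b[1:] for a, b in splits for c in alphabet if b]
    inserts    = [a + c + b for a, b in splits for c in alphabet]
    return set(deletes + transposes + replaces + inserts)


def known_variants(word, vocabulary):
    """Keep only the single-edit variants that are real words (hypothetical helper)."""
    return variants(word) & vocabulary


# Hypothetical vocabulary standing in for the trained word set.
vocabulary = {"monster", "minster", "master", "mansion"}
print(sorted(known_variants("manster", vocabulary)))
# ['master', 'minster', 'monster']
```

Here "monster" and "minster" come from replacing the "a", "master" comes from deleting the "n", and "mansion" is rejected because it is more than one edit away.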

Of course, that isn't all of the code; I added a lot more on top of Norvig's implementation.

My additions are:

  • short-circuiting options for faster checking
  • a chooser for suggestions based on Hamming distance and the word-frequency model
  • double word variants for catching more complex multi-typos
  • vowel-swapping detection
  • a reductions function to efficiently store word variants like monster: ['m',['o', 'a'], 'n', 's', 't', 'e', 'r']
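
A few of these ideas can be sketched in a handful of lines. The code below is not taken from spellchecker.py: the function names, the tie-breaking order in `choose`, and the way length differences are counted in `hamming_distance` are all assumptions made for this illustration.

```python
from itertools import product

VOWELS = "aeiou"


def hamming_distance(a, b):
    """Count differing positions; each extra trailing character counts as one (assumed)."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def choose(word, candidates, frequencies, limit=3):
    """Rank candidates by distance to word, then by higher frequency, then alphabetically."""
    ranked = sorted(
        candidates,
        key=lambda c: (hamming_distance(word, c), -frequencies.get(c, 0), c),
    )
    return ranked[:limit]


def vowel_swaps(word):
    """Return every spelling obtained by replacing each vowel with any vowel."""
    options = [VOWELS if ch in VOWELS else ch for ch in word]
    return {"".join(combo) for combo in product(*options)}


def expand(reduction):
    """Expand a reduction such as ['m', ['o', 'a'], 'n'] into the words it encodes."""
    options = [slot if isinstance(slot, list) else [slot] for slot in reduction]
    return {"".join(combo) for combo in product(*options)}


# Hypothetical frequencies; a real model would come from the training files.
frequencies = {"monster": 10, "minster": 2}
print(choose("manster", ["minster", "monster"], frequencies))
# ['monster', 'minster']
print(sorted(expand(["m", ["o", "a"], "n", "s", "t", "e", "r"])))
# ['manster', 'monster']
```

Both candidates are one substitution away from "manster", so the more frequent word wins the tie. The reduction stores the shared letters once and lists alternatives only where the spellings differ, which is what makes it compact.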
