
dictionary's Introduction

English Language Dictionary

This repository houses the contents of Webster's Unabridged English Dictionary.

The dictionary can be found in plain text form here

You'll also find some Julia files that were used to parse the text and organize it into the nice JSON you see here.

Contents

  • dictionary.json: This is the raw data scraped from the dictionary. Unsurprisingly, it's in the format of a dictionary, i.e. { "Word": "Definition" }
  • graph.json: This is a graph representation of the dictionary. Each word is paired with a list of the words that define it
  • dictionary.txt: This is the plain text file (I converted it from ISO-8859-1 to UTF-8)
  • main.jl: The Julia script that parses the data
  • _.jl: My in-progress implementation of Underscore in Julia
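
As a rough illustration of the two JSON layouts listed above, the following Python sketch parses small hand-written samples shaped like dictionary.json and graph.json. The sample entries and the helper name `defining_words` are made up for illustration, and the upper-case keys are an assumption based on excerpts quoted in the issues; real data would be read from the files with `json.load`.

```python
import json

# Tiny hand-written samples in the shapes described above (not real file contents).
dictionary_sample = '{"ABATOR": "One who abates a nuisance."}'
graph_sample = '{"ABATOR": ["ONE", "WHO", "ABATES", "A", "NUISANCE"]}'

dictionary = json.loads(dictionary_sample)  # word -> definition string
graph = json.loads(graph_sample)            # word -> list of defining words


def defining_words(word, graph):
    """Return the words used to define `word`, or an empty list if absent."""
    # Assumption: keys are stored in upper case.
    return graph.get(word.upper(), [])


print(dictionary["ABATOR"])
print(defining_words("abator", graph))
```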

How to run

If you want to run the code yourself, I would recommend downloading Julia Studio, which is an IDE for Julia. It comes with the binaries pre-installed. You can also get the binaries or build from source from julialang.org.

License

The works in this repository are licensed under the MIT License, with the exception of the contents of dictionary.txt, which are licensed under the terms of the Project Gutenberg License:

From Project Gutenberg:

This eBook is for the use of anyone anywhere at no cost and with almost no restrictions whatsoever. You may copy it, give it away or re-use it under the terms of the Project Gutenberg License included with this eBook or online at www.gutenberg.net

dictionary's People

Contributors

adambom, matthewpalmer, plr108

dictionary's Issues

Line breaks not parsed correctly

The parser does not deal with newlines in the dictionary file correctly. If a line ends in a word, the white space seems to have been trimmed and words will be merged. See below.

ABATOR
A*ba"tor, n. (Law)

Defn: (a) One who abates a nuisance. (b) A person who, without right,
enters into a freehold on the death of the last possessor, before the
heir or devisee. Blackstone.

becomes

...the death of the last possessor, before theheir or devisee...
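
One possible fix, sketched in Python rather than the repository's Julia, is to join the wrapped lines of an entry with a single space instead of concatenating them directly. The function name `join_wrapped_lines` is hypothetical, and the sketch does not handle words hyphenated across a line break.

```python
def join_wrapped_lines(lines):
    """Join the hard-wrapped lines of one definition with single spaces."""
    # Strip each line, drop empty ones, then join with a space so that
    # "before the" + "heir or devisee" does not become "theheir".
    return " ".join(part.strip() for part in lines if part.strip())


wrapped = [
    "Defn: (a) One who abates a nuisance. (b) A person who, without right,",
    "enters into a freehold on the death of the last possessor, before the",
    "heir or devisee. Blackstone.",
]
joined = join_wrapped_lines(wrapped)
print(joined)
```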

Dictionary.json is cut off

The dictionary file gets cut off mid-entry, which obviously isn't valid JSON. It ends at: ...,"COVARIANT":"A function involving the coefficients and the variables of aquantic, and such that when the quantic is lineally transformed thes

Licensing

This project's license is CC-BY-NC, while the original Webster1913 project is in the public domain / under the Project Gutenberg license. Is the intent here to license the JSON data derived from Webster1913, or to license the tools associated?

And if it is the former, who should this work be attributed to? You, the contributors to the Webster1913 project on Project Gutenberg, the original authors, all of the above?

Some spaces are missing

Try searching the json, txt or graph file for anythinginjurious as an example. Many spaces are missing like that.

wordnet is important and missing

Your JSON should provide the definitions and what type of word it is: adverb, adjective, noun, pronoun, verb, etc.

Maybe JSON should be something like this?

{
  "word": {
    "wordnet": "noun",
    "definitions": ["defo 1", "defo 2"]
  }
}
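
A minimal Python sketch of how an entry in that proposed shape might be built from a headword line such as `A*ba"tor, n. (Law)`. The abbreviation table `POS_LABELS` and the helpers `part_of_speech` and `make_entry` are assumptions for illustration; the real text likely uses more labels than the few listed here, and the `"wordnet"` key simply follows the proposal above.

```python
import json

# Assumed abbreviation table; the real text may use more labels than these.
POS_LABELS = {
    "n.": "noun",
    "a.": "adjective",
    "adv.": "adverb",
    "v. t.": "verb",
    "v. i.": "verb",
    "pron.": "pronoun",
}


def part_of_speech(headword_line):
    """Guess the part of speech from a line such as 'A*ba"tor, n. (Law)'."""
    # Look at the text after the first comma for a known label at its start.
    _, _, rest = headword_line.partition(",")
    rest = rest.strip()
    for label, name in POS_LABELS.items():
        if rest.startswith(label):
            return name
    return None  # no recognised label


def make_entry(headword_line, definitions):
    """Build one entry in the shape proposed above."""
    return {
        "wordnet": part_of_speech(headword_line),
        "definitions": list(definitions),
    }


entry = {"ABATOR": make_entry('A*ba"tor, n. (Law)', ["One who abates a nuisance."])}
print(json.dumps(entry, indent=2))
```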

Weird characters and extra quotes in graph.json

Several entries in graph.json contain odd entries at the end like "[OBS" or "[R"

For example, near the beginning of the file:

"DEFIGURE":[
"TO",
"DELINEATE",
"[OBS",
"]THESE",
"TWO",
"STONES",
"AS",
"THEY",
"ARE",
"HERE",
"DEFIGURED",
"WEEVER"
]

Only "To delineate" makes up the definition of "Defigure", "[OBS" and "]THESE" contain weird characters, and "These two stones as they are here defigured --Weever" is a quotation, not a definition.
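
One way this could be handled, sketched in Python under stated assumptions: treat the definition as ending at the first bracketed usage label (such as `[Obs.]` or `[R.]`) or at the first opening double quote, and keep only alphabetic runs as tokens. The function name `definition_words` is hypothetical, and the `raw` string is a guess at what the source entry looks like, not a verified excerpt.

```python
import re


def definition_words(text):
    """Return upper-cased words from the definition part of `text` only."""
    # Assumption: the definition proper ends at the first "[" (usage label)
    # or at the first double quote (start of a quotation).
    definition = re.split(r'\[|"', text, maxsplit=1)[0]
    # Keep alphabetic runs only, so brackets and periods never reach a token.
    return [word.upper() for word in re.findall(r"[A-Za-z]+", definition)]


# Assumed shape of the source entry, reconstructed from the tokens above.
raw = 'To delineate. [Obs.] "These two stones as they are here defigured." Weever.'
print(definition_words(raw))
```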

Getting ValueError while json.loads

This is the error I get.

ValueError: Invalid \uXXXX escape: line 1 column 11520 (char 11520)

After removing those, I get errors of this type:

Expecting , delimiter: line 1 column 3998 (char 3998)
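
To narrow down errors like these, a small Python 3 helper can report the failing position together with the surrounding text. This is only a diagnostic sketch: the name `describe_json_error` and the `context` parameter are made up here, and the truncated sample is a shortened stand-in for the real file.

```python
import json


def describe_json_error(text, context=20):
    """Return None if `text` parses, else a short description of the failure."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # err.pos is the character offset where parsing failed.
        start = max(err.pos - context, 0)
        snippet = text[start:err.pos + context]
        return f"{err.msg} at char {err.pos}: {snippet!r}"
    return None


truncated = '{"COVARIANT":"A function involving the coefficients'
print(describe_json_error(truncated))
print(describe_json_error('{"A": "ok"}'))
```

Printing the snippet around the offset usually shows whether the problem is a bad escape, an unescaped quote, or a file that was cut off.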
