wordset-dictionary's Introduction

Wordset Dictionary

From 2015 to 2017, Wordset.org provided a public interface for anyone to collaborate on the world's first open source, collaborative dictionary.

However, due to... well, cough lack of interest... we decided to shut down the project.

Here we are providing the final, most up-to-date release of the Wordset Dictionary. I'll answer some questions inline about the data and what you can do with it.

How can I read it?

The data folder contains a file for every letter, and one for 'misc' (that includes emoji!).

Each file has a big JSON object keyed by the words themselves. A sample entry might look like this:

{
    "largely": {
        "word": "largely",
        "wordset_id": "54bd55df7265742391cf0000",
        "meanings": [{
            "id": "54bd55df7265742391d10000",
            "def": "in large part",
            "speech_part": "adverb",
            "synonyms": ["mostly", "for the most part"]
        }, {
            "id": "54bd55df7265742391d20000",
            "def": "on a large scale",
            "example": "the sketch was so largely drawn that you could see it from the back row",
            "speech_part": "adverb"
        }]
    }
}

Hopefully, most of it is pretty self-explanatory.
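If you want to read an entry programmatically, here is a minimal Python sketch. It assumes the files are named `a.json` through `z.json` plus `misc.json` inside `data/` (the exact filenames are an assumption, as are the helper names `file_for` and `lookup`), and that anything not starting with a-z lives in the misc file.

```python
import json
from pathlib import Path


def file_for(word: str) -> str:
    """Pick the data file for a word: one file per letter, otherwise misc."""
    first = word[:1].lower()
    # Assumption: files are named a.json ... z.json and misc.json.
    return f"{first}.json" if "a" <= first <= "z" else "misc.json"


def lookup(word: str, data_dir: str = "data"):
    """Return the entry for `word`, or None if it is not in that file."""
    path = Path(data_dir) / file_for(word)
    with path.open(encoding="utf-8") as fh:
        entries = json.load(fh)  # one big JSON object keyed by the words
    return entries.get(word)
```

Against the sample above, `lookup("largely")` would return the inner object, so `lookup("largely")["meanings"][0]["def"]` would be the first definition. Each call re-reads the whole letter file, which is fine for a quick script but worth caching for anything heavier.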

Here's another entry that includes some labels on the meanings.

{
    "lindy": {
        "word": "lindy",
        "wordset_id": "54bd567d72657423915c0d02",
        "labels": [{
            "name": "American",
            "is_dialect": true
        }],
        "meanings": [{
            "id": "54bd567d72657423915e0d02",
            "def": "an energetic American dance that was popular in the 1930s, probably named for the aviator Charles Lindbergh",
            "example": "Can you lindy?",
            "speech_part": "noun",
            "synonyms": ["lindy hop"]
        }],
        "editors": ["lefurjah"],
        "contributors": ["odd_bloke", "zellerpress", "malrase", "hcatlin"]
    }
}
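Comparing the two samples, several fields are optional: `labels`, `editors`, and `contributors` appear only on some entries, and `example` and `synonyms` only on some meanings. A rough sketch of rendering an entry while tolerating those gaps might look like this. The function name `describe` is made up for illustration, and the sample shows `labels` at the entry level only, so that is all this handles.

```python
def describe(entry: dict) -> str:
    """Render one entry as plain text, tolerating missing optional fields."""
    lines = [entry["word"]]
    # "labels" appears only on some entries, so default to an empty list.
    label_names = [label["name"] for label in entry.get("labels", [])]
    if label_names:
        lines.append("labels: " + ", ".join(label_names))
    for number, meaning in enumerate(entry.get("meanings", []), start=1):
        lines.append(f"{number}. ({meaning['speech_part']}) {meaning['def']}")
        # "example" and "synonyms" are present on some meanings only.
        if "example" in meaning:
            lines.append(f'   e.g. "{meaning["example"]}"')
        if meaning.get("synonyms"):
            lines.append("   synonyms: " + ", ".join(meaning["synonyms"]))
    return "\n".join(lines)
```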

What can I use it for?

Anything! We're giving this to the world. If you need a basic English dictionary, there's no sense in paying fees; this is definitely good enough for almost all uses.

We'd LOVE to see adapters for different languages using this. Or, if you have an idea of something you want to do, hit us up! We're still around on the internet and happy to help!

Is it Complete?

Alas, nothing is ever complete. We started the project from the Princeton WordNet project and then made many thousands of modifications to it. With 177k meanings and 63,936 manual edits by volunteers, it's as complete as it will be.

What about WordNet?

Most open source projects use WordNet for simple dictionary usages, but we have vastly improved and modified that original source to be something much more human friendly.

Also, WordNet takes a long time to grok; this is some pretty human-readable stuff.

Did I mention that this is all in JSON?

Is It Racist/Sexist/Whatever?

We've had several projects on the site to try and mitigate some of the more, uh, problematic entries. However, we didn't get to everything. WordNet, the original source of the material, sourced their data from many different sources. As their goal was to power machine learning and word-maps, the definitions were often not handled with much care.

So, you can definitely find some stuff in here that we should update. In fact, it's not too late... feel free to put in a PR if you want to edit something. Can't guarantee we'll have a lot of time to check it out, but at least it will be something!

Gender Neutral

After several months of working on the dictionary, we realized how many stupidly and uselessly gendered example sentences there were. We have several write-ups about the project, but we decided to go through every single sentence that included a gendered pronoun and re-write it.

We found this drastically improved the quality of the content and in almost every case was a huge step up. "They went to the store" instead of "She went to the store".

Editorial Guidelines

I've included the original editorial guidelines in Guidelines.md, but note that, since we didn't edit every word in the dictionary, not every entry strictly follows them.

Contributors

Wordset was founded by Hampton Lintorn Catlin and Michael Lintorn Catlin, and we were joined by Justin Lefurjah in making this project a reality.

Justin, in fact, was by far our largest contributor. He was personally responsible for 97 new words and 7,460 edits!

Also, we had a TON of volunteers, who worked for hours and hours improving this dictionary.

Special thanks to msingle, sabreuse, bryanedu, zellerpress, luxfactaest, lauradhahn, odd_bloke, musicchild, jessecurry, joshuabriggs, brilliantskip, and luciankahn for all their hard work!

wordset-dictionary's People

Contributors

gardners, grumdrig, hamptonmakes

wordset-dictionary's Issues

Developed English Dictionary Webapp using this wordset - Thank You

Hello Guys,

I am thankful to all contributors [@HamptonMakes @malrase @grumdrig @gardners] for contributing this awesome wordset-dictionary.

As a hobby project, I have developed an English dictionary app using this dictionary set.

I mentioned your repo link in my web app.

Demo: https://webapps-b7f67.web.app/endict?word=flower

Demo Video: https://www.youtube.com/watch?v=nLDcgignnFA

I shared a post on LinkedIn about this app:

https://www.linkedin.com/posts/samir38_reactjs-react-webappdevelopment-activity-6999375808641331200-GrOq

Hope you like it.

Thanks a lot once again.
SamSol

Considering using as a base for Trilingual English Student dictionaries globally

Hello all, just letting you know that I am considering importing the data from this dictionary into a program called FLEx, to use as a basis for creating bilingual and trilingual dictionaries for students of English. I work with an organization that does a lot of minority-language dictionaries. The idea would be to have the base dictionary in English, with every word tagged for its potential use in a dictionary (e.g. full dictionary, student basic dictionary, chemistry dictionary). We would then pull a certain dictionary and add definitions and translations in a national-level language (e.g. Arabic in Sudan, Spanish in Mexico), and use that bilingual dictionary as a basis for trilingual definitions in a minority language (e.g. Fur in Sudan, Nahuatl in Mexico).

Wondering if anyone knows a really simple way to get the whole database exported to a single XML file with each element tagged with a backslash code (this would be a file importable into FLEx).

Thanks everyone.
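One possible starting point for the export asked about above, sketched in Python. This emits plain-text backslash-coded lines rather than XML, and it assumes MDF-style markers (`\lx` for the headword, `\sn` sense number, `\ps` part of speech, `\de` definition, `\xv` example, `\sy` synonym). The marker choice, their ordering, and the function name `to_sfm` are all assumptions for illustration, so the mapping would need checking against whatever the FLEx import expects.

```python
def to_sfm(entry: dict) -> str:
    """Render one entry as backslash-coded lines (MDF-style markers, assumed)."""
    lines = [f"\\lx {entry['word']}"]
    for number, meaning in enumerate(entry.get("meanings", []), start=1):
        lines.append(f"\\sn {number}")
        lines.append(f"\\ps {meaning['speech_part']}")
        lines.append(f"\\de {meaning['def']}")
        # Examples and synonyms are optional in the Wordset data.
        if "example" in meaning:
            lines.append(f"\\xv {meaning['example']}")
        for synonym in meaning.get("synonyms", []):
            lines.append(f"\\sy {synonym}")
    return "\n".join(lines)
```

Joining `to_sfm(entry)` for every entry in every data file, separated by blank lines, would give a single file for the whole dictionary.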

No meanings for 5 words

alkaline earth
acetoacetic acid
calcium hydride
succinic acid
transurethral resection of the prostate
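A quick way to reproduce this list, as a Python sketch. It assumes every `*.json` file in the `data` folder is a dictionary file in the format described in the README, and it treats a missing `meanings` key the same as an empty list. The function name is illustrative only.

```python
import json
from pathlib import Path


def words_without_meanings(data_dir: str = "data") -> list:
    """Return every word whose entry has no meanings at all."""
    missing = []
    # Assumption: every *.json file in data_dir is a dictionary file.
    for path in sorted(Path(data_dir).glob("*.json")):
        with path.open(encoding="utf-8") as fh:
            entries = json.load(fh)
        for word, entry in entries.items():
            # Covers both a missing "meanings" key and an empty list.
            if not entry.get("meanings"):
                missing.append(word)
    return missing
```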

Non ASCII characters

In the j.json and q.json files, in the words "Jekyll and Hyde" and "qabbālāh":

"Jekyll and Hyde": {
    "word": "Jekyll and Hyde",
    "wordset_id": "a4a55a6b0d",
    "meanings": [
        {
            "id": "75b8309869",
            "def": "a person who unpredictably displays two distinct and morally opposed personality traits",
            "example": "The way they scream at me one minute and apologize the next�—it's like they're Jekyll and Hyde!",
            "speech_part": "noun"
        }
    ],

and

"qabbālāh": {
    "word": "qabbālāh",
    "wordset_id": "2805a6b224",
    "meanings": [
        {
            "id": "efb0ebd478",
            "def": "an esoteric or occult matter resembling the mystical Jewish teachings, based on esoteric writings, that is traditionally secret",
            "example": "Our human resources manual is a bit of a kabala.",
            "speech_part": "noun"
        }
    ],
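To find other cases like these, a small Python sketch could scan one loaded file for non-ASCII text. It takes the already-parsed JSON object and checks only the word key and the `def` and `example` fields; the function name and that choice of fields are assumptions for illustration.

```python
def non_ascii_strings(entries: dict):
    """Yield (word, field, text) for each checked string with non-ASCII characters."""
    for word, entry in entries.items():
        if not word.isascii():
            yield word, "word", word
        for meaning in entry.get("meanings", []):
            for field in ("def", "example"):
                text = meaning.get(field, "")
                if not text.isascii():
                    yield word, field, text
```

Legitimate characters such as the macrons in "qabbālāh" and mis-encoded ones are both reported, so the results still need a human look. Checking for `"\ufffd"` in the text is one way to single out the replacement character seen in the "Jekyll and Hyde" example.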

Total number of words is 108k

I appreciate the share and just want to note that the total number of words is 108140. Either you calculate differently (count the meanings?), or maybe the full set has not been shared?
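Since the README's 177k figure refers to meanings rather than words, counting both separately is one way to check. A minimal Python sketch, assuming every `*.json` file in the `data` folder is a dictionary file keyed by word (the function name is illustrative):

```python
import json
from pathlib import Path


def count_words_and_meanings(data_dir: str = "data") -> tuple:
    """Return (number of words, number of meanings) across all data files."""
    words = 0
    meanings = 0
    # Assumption: every *.json file in data_dir is a dictionary file.
    for path in Path(data_dir).glob("*.json"):
        with path.open(encoding="utf-8") as fh:
            entries = json.load(fh)
        words += len(entries)  # one key per word
        meanings += sum(len(e.get("meanings", [])) for e in entries.values())
    return words, meanings
```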
