
turtlesoupy / this-word-does-not-exist


This Word Does Not Exist

Home Page: https://www.thisworddoesnotexist.com

License: MIT License

Python 64.72% Jupyter Notebook 25.59% Shell 1.85% CSS 1.58% JavaScript 3.43% Jinja 2.83%
machine-learning gpt-2 transformers natural-language-processing natural-language-understanding natural-language-generation

this-word-does-not-exist's Introduction


This Word Does Not Exist

This project allows people to train a variant of GPT-2 that makes up words, definitions, and examples from scratch.

For example:

incromulentness (noun)

lack of sincerity or candor

"incromulentness in the manner of speech"

Check out https://www.thisworddoesnotexist.com for a demo.

Check out https://twitter.com/robo_define for a Twitter bot demo.

Generating Words / Running Inference

Python dependencies are listed in https://github.com/turtlesoupy/this-word-does-not-exist/blob/master/cpu_deploy_environment.yml

Pre-trained model files:

To use them:

from title_maker_pro.word_generator import WordGenerator
word_generator = WordGenerator(
  device="cpu",
  forward_model_path="<somepath1>",
  inverse_model_path="<somepath2>",
  blacklist_path="<blacklist>",
  quantize=False,
)

# a word from scratch:
print(word_generator.generate_word())

# definition for a word you make up
print(word_generator.generate_definition("glooberyblipboop")) 

# new word made up from a definition
print(word_generator.generate_word_from_definition("a word that does not exist")) 

Training a model

For raw thoughts, take a look at some of the notebooks in https://github.com/turtlesoupy/this-word-does-not-exist/tree/master/notebooks

To train, you'll need to find a dictionary -- there is code to extract from

After extracting a dictionary, you can use the master training script: https://github.com/turtlesoupy/this-word-does-not-exist/blob/master/title_maker_pro/train.py. A recent sample run is https://github.com/turtlesoupy/this-word-does-not-exist/blob/master/scripts/sample_run_parsed_dictionary.sh

Website Development Instructions

cd ./website
pip install -r requirements.txt
pip install aiohttp-devtools 
adev runserver

this-word-does-not-exist's People

Contributors

eliseumds, mayakacz, turtlesoupy


this-word-does-not-exist's Issues

How would I generate new fake words?

It seems the only variables are the dictionaries you input.

Dictionaries don't change much over time.

Is there any way to generate more fake words than what you have on your website?

I am seeing that most of the generated words are already registered as .com domains.

Can't use this module on my new system because it has outdated requirements

This requires at least two functions in transformers that don't seem to exist anymore. I see that at least one of them existed in Transformers 3.0.2. I tried installing both 3.0.2 and 3.5.1 (using pip) to see if I could get a compatible version. Both of them failed to install because of missing source files (Windows 11, VS 2022 Community).

Now that I've switched my whole OS to Windows 11, I can't run my favorite plugin for the bot I made using this-word-does-not-exist. Very disappointed. Can you please update the code to work with the latest versions of its requirements? Thank you.

Error when generating new word

from title_maker_pro.word_generator import WordGenerator
word_generator = WordGenerator(
  device="cpu",
  forward_model_path="forward-dictionary-model-v1",
  inverse_model_path="inverse-dictionary-model-v1",
  blacklist_path="blacklist.pickle",
  quantize=False,
)

# a word from scratch:
print(word_generator.generate_word())

I tried the code above and got the following error:

m = start_end_re.match(word.misc)
TypeError: expected string or bytes-like object

I printed word.misc and it was None. Any help with a fix is appreciated. Thanks in advance!
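A minimal sketch of a defensive fix, assuming the offsets are meant to come from a `start_char=N|end_char=M` string in `word.misc` (an assumption based on the name `start_end_re`; the real pattern is not shown in this issue) and that newer parser versions may leave `misc` as `None`. The helper `char_offsets` and its fallback arguments are hypothetical, not the project's code; pinning the parser version the project was developed against may be the simpler fix.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern for a misc string such as "start_char=4|end_char=9".
START_END_RE = re.compile(r"start_char=(\d+)\|end_char=(\d+)")


def char_offsets(
    misc: Optional[str],
    start_char: Optional[int] = None,
    end_char: Optional[int] = None,
) -> Optional[Tuple[int, int]]:
    """Return (start, end) character offsets, or None if none are available.

    Prefers offsets encoded in `misc`; otherwise falls back to explicit
    offsets (e.g. attributes a newer parser might expose instead).
    """
    # Only call re.match on real strings: passing None raises
    # "TypeError: expected string or bytes-like object".
    if isinstance(misc, str):
        m = START_END_RE.match(misc)
        if m:
            return int(m.group(1)), int(m.group(2))
    if start_char is not None and end_char is not None:
        return start_char, end_char
    # No offsets anywhere: let the caller skip this word instead of crashing.
    return None


print(char_offsets("start_char=4|end_char=9"))  # (4, 9)
print(char_offsets(None, 4, 9))                 # (4, 9)
print(char_offsets(None))                       # None
```

When `char_offsets` returns `None`, the caller can skip part-of-speech matching for that example rather than raising `TypeError`.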

Type Error

Hey there, first of all: great work. I am getting the following error when running the sample code. Could someone give me a hint about what I am doing wrong?

Best

0%| | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
  File ".../main.py", line 12, in <module>
    print(word_generator.generate_word())
  File "...\title_maker_pro\word_generator.py", line 72, in generate_word
    use_custom_generate=True,
  File "...\title_maker_pro\datasets.py", line 638, in generate_words
    example_match_pos_pipeline, l_example, start_title_idx, len(t_rstrip)
  File "...\title_maker_pro\datasets.py", line 346, in approx_pos
    m = start_end_re.match(word.misc)
TypeError: expected string or bytes-like object

Public api

I was wondering whether thisworddoesnotexist.com could offer an API that other developers can use, so they don't have to set this project up themselves.

Cross-site scripting vulnerability

Example URL: https://www.thisworddoesnotexist.com/w/jesdabest/eyJ3IjogIjxTQ1JJUFQ-YWxlcnQoJ2plc2RhYmVzdCcpPC9TQ1JJUFQ-IiwgImQiOiAiYW4gYWxlcnQgcGxhY2VkIGR1cmluZyBhIHZpZGVvIHJlY29yZGluZy4iLCAicCI6ICJub3VuIiwgImUiOiAiPFNDUklQVD5hbGVydCgnamVzZGFiZXN0Jyk8L1NDUklQVD4iLCAicyI6IFsiPFNDUklQVD5hbGVydCgiLCAiJyIsICJqZXMiLCAiZCIsICJhYmVzdCIsICInIiwgIik8L1NDUklQVD4iXX0=.53iUkLV3kFioMfmibqWu6403RBPA3xxYXfEiRheMoc8=

To avoid this, HTML special characters (namely "<" and ">") should be escaped (to &lt; and &gt;) before being written to HTML.

You only have 40 characters to play with, but that would in principle be enough to load malicious code of arbitrary length from another URL. Of course, cross-site scripting on thisworddoesnotexist.com is of limited impact anyway!
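The escaping fix described in this issue can be sketched with Python's standard `html.escape`, which replaces `&`, `<` and `>` and, with its default `quote=True`, also `"` and `'`. The `render_word_html` function and its markup are hypothetical; the site's real templates are not shown here. If the pages are rendered through Jinja2 templates, enabling Jinja2's autoescaping is the template-level equivalent.

```python
import html


def render_word_html(word: str, definition: str) -> str:
    """Hypothetical renderer: escape untrusted fields before inserting them into HTML."""
    # html.escape turns < and > into &lt; and &gt;, so an injected <SCRIPT>
    # tag is displayed as text instead of being executed by the browser.
    return (
        f'<h1 class="word">{html.escape(word)}</h1>\n'
        f'<p class="definition">{html.escape(definition)}</p>'
    )


print(render_word_html("<SCRIPT>alert('x')</SCRIPT>", "an alert"))
# <h1 class="word">&lt;SCRIPT&gt;alert(&#x27;x&#x27;)&lt;/SCRIPT&gt;</h1>
# <p class="definition">an alert</p>
```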

Fake words are often real words

Possible solution:

  • Get WikiText-103
  • Use stanza to parse out words
  • Add anything occurring more than a threshold number of times to the dictionary
  • Union in all Wikipedia article titles (date stripped)
  • Union in words from the source dictionary
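A minimal sketch of the filtering idea above, under stated assumptions: a simple regex tokenizer stands in for stanza, the inputs are plain in-memory lists rather than WikiText or a Wikipedia title dump, and `threshold` is an arbitrary default. `build_known_words` and `is_novel` are hypothetical names.

```python
import re
from collections import Counter
from typing import Iterable, Set

# Simple lowercase-word tokenizer standing in for stanza (an assumption, for brevity).
TOKEN_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def build_known_words(
    corpus_lines: Iterable[str],
    article_titles: Iterable[str],
    dictionary_words: Iterable[str],
    threshold: int = 5,
) -> Set[str]:
    """Union of frequent corpus words, article titles and source-dictionary words."""
    counts = Counter()
    for line in corpus_lines:
        counts.update(TOKEN_RE.findall(line.lower()))
    # Keep corpus words seen more than `threshold` times.
    known = {w for w, n in counts.items() if n > threshold}
    known |= {t.strip().lower() for t in article_titles}
    known |= {w.strip().lower() for w in dictionary_words}
    return known


def is_novel(word: str, known: Set[str]) -> bool:
    """True if a generated word is not already a known word."""
    return word.lower() not in known


known = build_known_words(
    ["the cat sat", "the cat ran", "the dog"],
    ["Covfefe"],
    ["incromulent"],
    threshold=1,
)
print(sorted(known))                         # ['cat', 'covfefe', 'incromulent', 'the']
print(is_novel("incromulentness", known))    # True
print(is_novel("Cat", known))                # False
```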

License?

IS: could not find the license
SHOULD: would like to know the license, would love if it is MIT licensed

Disable progress bar

When running word_generator.generate_word_from_definition we get a default progress bar.
Is there a way to disable this?

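Assuming the bar is drawn with tqdm, which writes to `sys.stderr` by default, one workaround that needs no library changes is to redirect stderr around the call. The sketch uses a hypothetical stand-in, `noisy_generate`, instead of the real `WordGenerator`; in practice you would wrap `word_generator.generate_word_from_definition(...)`. If you can edit the code that creates the bar, passing `disable=True` to `tqdm(...)` is the cleaner option.

```python
import contextlib
import io
import sys


def noisy_generate() -> str:
    """Hypothetical stand-in for a call that draws a progress bar on stderr."""
    sys.stderr.write("100%|##########| 1/1 [00:00<00:00]\n")
    return "glooberyblipboop"


def quietly(fn, *args, **kwargs):
    """Call fn with sys.stderr temporarily redirected into a throwaway buffer."""
    with contextlib.redirect_stderr(io.StringIO()):
        return fn(*args, **kwargs)


print(quietly(noisy_generate))  # glooberyblipboop
```

Note that this also hides anything else written through `sys.stderr` during the call, possibly including warnings you would want to see.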

Feature request: Specify/define definition or word

There are times I'd like to literally make a joke entry with my own word and specify the pronunciation, word type, and definition. Any way you could add the ability to put those in when doing "Write your own"?

Add button to refresh page / word generation

A simple ajax refresh to reload the page would help users play with the website, generating words to their heart's content. One way to circumvent quota limits would be to generate a static list of words (say, 10000 every other day) and use this list on the front-end instead of making direct calls.
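The static-list idea can be sketched on the server side: a batch job generates words offline, and each page load or refresh just picks one from the stored batch instead of running the model. `WordPool`, its JSON format, and the entry fields are hypothetical; the batch size and schedule would be set by the offline job, not by this class.

```python
import json
import random
from typing import Dict, List, Optional


class WordPool:
    """Serve words from a pre-generated batch instead of running the model per request."""

    def __init__(self, entries: List[Dict[str, str]], seed: Optional[int] = None) -> None:
        if not entries:
            raise ValueError("word pool must not be empty")
        self._entries = list(entries)
        self._rng = random.Random(seed)

    @classmethod
    def from_json(cls, text: str, seed: Optional[int] = None) -> "WordPool":
        # Hypothetical format: a JSON list of {"word": ..., "pos": ..., "definition": ...}.
        return cls(json.loads(text), seed)

    def random_entry(self) -> Dict[str, str]:
        # Constant-time pick: no model inference happens on the request path.
        return self._rng.choice(self._entries)


pool = WordPool.from_json(
    '[{"word": "incromulentness", "pos": "noun", '
    '"definition": "lack of sincerity or candor"}]'
)
print(pool.random_entry()["word"])  # incromulentness
```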

Blacklist Pickle Error (Invalid load key)

Hi friends,

When executing it in my Anaconda prompt I got this error:

I tried:

  1. Redownloading the blacklist.pickle.gz
  2. Trying with just "blacklist.pickle"
  3. Uncompressing it
  4. Changing the path of the file
  5. Diving into Stack Overflow for three days for a solution
  6. Using double slashes in the file path
  7. Double slashes + a different filename

Finally, I guess '\x1f' refers to an error reading '\b' in the blacklist_path, but I don't really know how to solve it.

Any help please? 😕
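A likely explanation, offered as a hint rather than a confirmed diagnosis: `'\x1f'` is not related to backslashes in the path. Every gzip file starts with the two bytes `0x1f 0x8b`, so `invalid load key, '\x1f'` means `pickle.load` was given data that is still gzip-compressed (for example `blacklist.pickle.gz` itself, or a file that was renamed but never actually decompressed). The sketch checks for those magic bytes and opens the file with `gzip.open` when they are present. `load_pickle_maybe_gzipped` is a hypothetical helper, not part of `title_maker_pro`.

```python
import gzip
import os
import pickle
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # every gzip stream starts with these two bytes


def load_pickle_maybe_gzipped(path):
    """Unpickle `path`, decompressing it first if it is gzip-compressed.

    Only load pickles from a source you trust: unpickling can run arbitrary code.
    """
    with open(path, "rb") as f:
        head = f.read(2)
    opener = gzip.open if head == GZIP_MAGIC else open
    with opener(path, "rb") as f:
        return pickle.load(f)


# Demo: a gzipped pickle, which a plain pickle.load would reject
# with "invalid load key, '\x1f'".
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "blacklist.pickle.gz")
    with gzip.open(path, "wb") as f:
        pickle.dump({"badword"}, f)
    print(load_pickle_maybe_gzipped(path))  # {'badword'}
```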

Character length, prefix, suffix support

I like this project a lot.

Interested in:

  1. Set a character length for the generated words
  2. Ignore or generate specific prefix/suffix words

Can you point me to the code where I could build this out?
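Until generation itself supports such constraints, one simple approach is rejection filtering: generate candidates as usual and keep only those that meet the length and affix rules. `filter_words` and its parameters are hypothetical and operate on plain strings, not on the project's word objects; strict constraints may require generating many candidates per accepted word.

```python
from typing import Iterable, List, Optional, Sequence


def filter_words(
    words: Iterable[str],
    min_len: Optional[int] = None,
    max_len: Optional[int] = None,
    require_prefix: Optional[str] = None,
    exclude_suffixes: Sequence[str] = (),
) -> List[str]:
    """Keep generated words that satisfy length and affix constraints (hypothetical helper)."""
    kept = []
    for w in words:
        if min_len is not None and len(w) < min_len:
            continue
        if max_len is not None and len(w) > max_len:
            continue
        if require_prefix and not w.startswith(require_prefix):
            continue
        # str.endswith accepts a tuple, so several suffixes are checked at once.
        if exclude_suffixes and w.endswith(tuple(exclude_suffixes)):
            continue
        kept.append(w)
    return kept


candidates = ["incromulentness", "glooberyblipboop", "unflarp", "unzibbling"]
print(filter_words(candidates, max_len=12, require_prefix="un", exclude_suffixes=("ing",)))
# ['unflarp']
```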

Nested replies to bot get messed up

The bot probably shouldn't try to reply in a thread that already contains a definition. We should also make sure people can still use it within a reply thread (e.g. "define covfefe").

Is this language specific?

Is there anything special about English, or can this be trained and used for languages like Japanese, Arabic, Russian, or Hebrew?
