Comments (19)

kranzky commented on July 1, 2024

OK, let's call this good enough. Here's what I've generated so far. Next steps will be to rewrite this to look good, adding punctuation and fixing word case, etc.

from 2018.

hugovk commented on July 1, 2024

OK, label removed! Ping me when you're ready :)

kranzky commented on July 1, 2024

@hugovk please mark this one as completed now :)

The generated novel, along with an explanation of how it all works, is here: https://github.com/kranzky/insoluble/blob/master/README.md

kranzky commented on July 1, 2024

https://github.com/kranzky/insoluble

kranzky commented on July 1, 2024

OK, first decision: I'll use Ruby and C. Not Rust. Why burden myself?

kranzky commented on July 1, 2024

Baby steps. Here's kinda-sorta what I plan to do:

  • Build a model that associates words in three consecutive sentences, and use this to generate a list of keywords that should be in the middle sentence, given two adjacent sentences. Train that on the Gutenberg Corpus.
  • Build a model that can generate a sentence given a bunch of keywords that should appear in that sentence. Train that on the same corpus.
  • Use my novel from NaNoWriMo 2015 to generate a new novel: take sentence pairs from my novel and give them to the first model above to yield a list of keywords, then generate hundreds of candidate sentences from those keywords using the second model above, compare the generated sentences with the actual sentence from my novel, and use some heuristic to select the "best" one.
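
In outline, that pipeline might look something like this. A minimal sketch only: `keyword_model`, `sentence_model`, `score`, and `regenerate` are hypothetical stand-ins for the two trained models and the selection heuristic, not the actual code.

```ruby
# Sketch of the three-step pipeline. The two models and the scoring
# heuristic are plain lambdas here; the real versions would be trained
# on the Gutenberg Corpus.

# Hypothetical model 1: given the two neighbouring sentences, propose
# keywords that should appear in the middle sentence.
keyword_model = ->(prev_sentence, next_sentence) do
  (prev_sentence.split + next_sentence.split).uniq.first(3)
end

# Hypothetical model 2: given keywords, generate a candidate sentence.
sentence_model = ->(keywords) { keywords.join(" ") }

# Hypothetical heuristic: prefer candidates that share words with the
# original sentence from the source novel.
score = ->(candidate, original) { (candidate.split & original.split).size }

# Regenerate the middle sentence of a three-sentence window by producing
# many candidates and keeping the highest-scoring one.
def regenerate(prev_s, original, next_s, keyword_model, sentence_model, score, n = 100)
  keywords = keyword_model.call(prev_s, next_s)
  candidates = Array.new(n) { sentence_model.call(keywords.shuffle) }
  candidates.max_by { |c| score.call(c, original) }
end
```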

I'll do all this in Ruby, using Sooth. Later, if time permits, I would like to write a new crazy grammatical inference engine that I've been thinking about. We'll see...

kranzky commented on July 1, 2024

Wrote some code to process the books in the Gutenberg Corpus. It finds chapters, then breaks the paragraphs in each chapter down into individual sentences, then tags those as dialogue or exposition. After running for about 30 minutes it yields 1,947,424 paragraphs from 33,010 chapters taken from 1,653 books. This will be my training data.

I also processed Insoluble to yield a template, which I will use to constrain generation. Here's a snippet:

CHAPTER
exposition:I awoke.
exposition:It was far too dark.
exposition:It occurred to me that I'd forgotten to set the alarm, and Julie had neglected to wake me from oversleeping.
exposition:The television was blaring.
exposition:One of the shopping channels judging by the cadence of the presenter's voice.
exposition:I wandered from the bedroom to the kitchen.
exposition:A mess of pots and pans, a half-empty glass of wine, the smell of cooking.
PARAGRAPH
dialogue:"Julie?"
exposition:I called in a hoarse whisper.
dialogue:"Are you home?"
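
The tagging step can be sketched in Ruby. This assumes a sentence opening with a double quote counts as dialogue and everything else as exposition; the real tagger presumably handles edge cases better.

```ruby
# Tag each sentence of a paragraph as dialogue or exposition.
# Assumption: dialogue is anything that opens with a double quote.
def tag_sentences(paragraph)
  # Naive sentence split: terminal punctuation (or a closing quote)
  # followed by whitespace.
  sentences = paragraph.split(/(?<=[.!?"])\s+/)
  sentences.map do |s|
    kind = s.start_with?('"') ? "dialogue" : "exposition"
    "#{kind}:#{s}"
  end
end
```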

The next step will be to process the template, using a model trained on the Gutenberg corpus, to replace each sentence with a list of keywords (which will then be used to constrain generation of new sentences).

kranzky commented on July 1, 2024

After much wailing and gnashing of teeth, I managed to use 4,139,939 sentences from the Gutenberg Corpus to generate keywords from the Insoluble template. The whole process took 2h. Here are the results, for the snippet of the template shown above:

CHAPTER
exposition;2:SLEPT REFRESHED SLUMBER
exposition;5:CONCERNED AWOKE REFRESHED
exposition;17:ASLEEP JULIE'S IT'D
exposition;4:JULIE WAKE FLAILING
exposition;10:HOARSE GRUFF HIGH-PITCHED
exposition;7:CHANNELS AIMLESSLY DISHES
exposition;12:WINE GLASS PANS
PARAGRAPH
dialogue;1:JULIE'S DELIVERS SAUCER
exposition;6:WHISPER HOARSE VOICE
dialogue;3:TREETOPS ZIG-ZAGGING FATTY

You can compare the full thing with the template it was generated from.

What I'm doing here is this:

  • Construct a dictionary from everything in template.txt
  • Parse the corpus using this dictionary
  • Learn associations between words in consecutive sentences, and between words within a sentence (so we can estimate the probability of a word being present given another word)
  • Iterate through template.txt, using the learned associations to calculate the correlation between each word in the dictionary and words from sentence pairs
  • Take the three words with the highest positive correlation
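
A toy version of those steps might look like the following. The `KeywordModel` class, plain co-occurrence counts, and the PMI-style score are my assumptions; the actual association measure may well differ.

```ruby
# Count how often each dictionary word co-occurs with each context word
# across three-sentence windows, then score candidate keywords for a new
# context with a PMI-like association measure.
class KeywordModel
  def initialize
    @pair = Hash.new(0)   # [context_word, candidate_word] => count
    @word = Hash.new(0)   # word => count
    @total = 0
  end

  # Train on a window: words in the middle sentence are candidates,
  # words in the neighbouring sentences are context.
  def observe(prev_words, mid_words, next_words)
    context = prev_words + next_words
    context.each do |c|
      mid_words.each { |m| @pair[[c, m]] += 1 }
    end
    (context + mid_words).each { |w| @word[w] += 1 }
    @total += context.size + mid_words.size
  end

  # Return the n dictionary words most positively associated with the
  # given context words.
  def keywords(context, n = 3)
    scores = Hash.new(0.0)
    @word.each_key do |cand|
      context.each do |c|
        joint = @pair[[c, cand]]
        next if joint.zero?
        scores[cand] += Math.log((joint.to_f * @total) / (@word[c] * @word[cand]))
      end
    end
    scores.sort_by { |_, s| -s }.first(n).map(&:first)
  end
end
```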

The next step will be to build language models from the Gutenberg Corpus to allow this template to be fleshed out; essentially I will need to be able to generate a sentence that is guaranteed to contain at least one, but hopefully all, of the keywords.

kranzky commented on July 1, 2024

The language models are going to work as follows:

  • Parse each sentence of the corpus.
  • Skip sentences that do not contain any keywords.
  • Infer a model that uses as context two consecutive symbols and a third long-distance symbol in the future, and learns which symbols immediately precede that long-distance symbol.
  • To generate a sentence, try all combinations of keywords. Use the model to fill in the gaps between keywords. Score the results based on the precedence of the keywords that appear in the result and the length of the result.
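
My reading of that design, as a rough Ruby sketch: the context is the two preceding words plus a long-distance target word ahead, and generation walks from the sentence start toward the target. The `GapModel` name and the greedy walk are assumptions; the real model is built on Sooth and samples rather than taking the maximum.

```ruby
# Gap-filling model: context is the two preceding words plus a
# long-distance target word; the model learns which word comes next.
class GapModel
  START = "<s>"

  def initialize
    @counts = Hash.new { |h, k| h[k] = Hash.new(0) }
  end

  # Observe every position up to (and including) the target word.
  def train(sentence_words, target)
    words = [START, START] + sentence_words
    j = words.index(target)
    return unless j
    (2..j).each do |i|
      context = [words[i - 2], words[i - 1], target]
      @counts[context][words[i]] += 1
    end
  end

  # Walk from the sentence start toward the target, greedily taking the
  # most frequent continuation, until the target word is produced.
  def bridge(target, max_len = 20)
    out = [START, START]
    max_len.times do
      dist = @counts[[out[-2], out[-1], target]]
      break if dist.empty?
      nxt = dist.max_by { |_, c| c }.first
      out << nxt
      break if nxt == target
    end
    out.drop(2)
  end
end
```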

kranzky commented on July 1, 2024

I built the language model for generating sentences that are constrained by a list of keywords. I call it a fractal language model. As a quick test, I trained it using just these four keywords: TELEPHONE, BENT, SCREAMED, LIPS. The training data was 938 sentences from the Gutenberg Corpus (I selected 100 random books, and those sentences contained at least one of the four keywords). I then asked it to generate sentences containing 1, 2 or 3 keywords, with these hand-picked results:

  • THERE SCREAMED AND DO OURSELVES PERCHANCE YOU ARE DISTORTED BY TELEPHONE FROM SCOTLAND YARD MY BUSINESS IS TO SPEAK SLOWLY IF HE EXPLAINED TO HIS LIPS
  • BY TELEPHONE CORCORAN SCREAMED AND CLUNG TO CHRISTINE WHO WERE HOLDING HER
  • MR TUPMAN'S LIPS AS IF SHE BENT AND HIS OWN SAKE AND CUTTING AWAY THE ARM CARESSINGLY
  • THE LIPS WERE DARKENING FROM LEYDEN WITH AMUSEMENT AT LAST THE GIRLS SCREAMED TO THE BRINK
  • BEFORE THE MOUTH WITH ME SCREAMED WITH HORROR WHILE EVERY TUG OF THEIR RAVENOUS HATE AND LOOKING ROUND A WOLF UPON A TURN OF AN INSTANT LATER AND RAGE THEY LEAPT UP THE SHOUTS OF THE TELEPHONE

This is good enough for now.

kranzky commented on July 1, 2024

Here's an example of using the fractal language model to flesh out the keywords shown above, trained on a mere 15,051 sentences from the Gutenberg Corpus:

CHAPTER
exposition:SCARCE HAD SLEPT IN DEEP SLUMBER
exposition:IMMEDIATELY CONCERNED BUT HE AWOKE THE SLEEPING MAN
exposition:OUTSIDE THROUGH A DOZEN COOKIES CRUMBLED AWAY FROM HER WITH THAT FOLLOWED THEODORA FELL ASLEEP
exposition:PHYLLIS WAKE OF A LITTLE AHEAD
exposition:THE HOARSE A GRUFF WITH THEM YESTERDAY'S WAR OF WORDS
exposition:IN HER AMPLE TIME FLUENT FRENCH DISHES
exposition:AND TANGLED THE DIRECTORS OF THE WINE WITHOUT RESIN AND THE GLASS AND AGAIN AND PANS OF THE FORCES
PARAGRAPH
dialogue:TARZAN
exposition:AGAIN EXULTING WHISPER A HOARSE VOICE ROSE
dialogue:REMARKABLY SENSITIVE CHARACTERISTICS

The next step is to train on all of the corpus. I might also build a blacklist of proper names that should never appear in a generated sentence.
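
The blacklist check itself could be as simple as the following (a guess at the mechanics: reject any generated sentence containing a blacklisted word):

```ruby
# Reject any generated sentence that contains a blacklisted word.
# Tokenization assumption: words are runs of letters, apostrophes, and
# hyphens, matching the all-caps generated output.
def passes_blacklist?(sentence, blacklist)
  words = sentence.upcase.split(/[^A-Z'-]+/)
  (words & blacklist).empty?
end
```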

kranzky commented on July 1, 2024

Still working on this! Not much more to report; just fine-tuning the generated text to hit the 50K-word minimum and implementing a minimal blacklist (I tested by blacklisting the word "THE" and the results were quite interesting, so I might do several generations with different common words omitted).

Here's a quick example of what I've got so far. The original text, from my 2016 NaNoWriMo effort:

The room beyond is dark. He peers inside, his eyes slowly adjusting to the gloom. He fancies that he sees a shadowy figure towering before him. The shape twists slightly as a breeze blows in from outside, the beams of the ceiling creaking in protest. A body is suspended in mid-air, arms spread apart, palms facing outward. Beckoning him to enter.

"Jesus!"

Mike waves his hand urgently to activate the artificial lighting. The room quickly brightens in response.

"Christopher!"

Mike rushes into the room, his colleague hanging above him, long dead. Julie falls to her knees in the vestibule, retching and whimpering, eyes tightly shut. Arthur explodes into life from his hiding hole and attempts to enter the room.

"Stop him," cries Mike to Julie. "He'll call the police!"

Julie, now curled up into a softly sobbing ball, doesn't respond. Mike scans the room quickly, looking for something, anything to use to dissuade Arthur from his instinctive response to a dead human body. He rushes into the messy kitchenette, almost overcome by its stink, and continues into the bedroom beyond. A cricket bat leans against the far wall. Consumed with murderous intent, Mike vaults over the bed in one smooth motion to seize it.

And text generated by models that never saw the original:

AND CONFINES THE HEADLIGHTS IN THE DARK. WHEN THEY WERE BLOODSHOT HE SWAYED TO SPARKLE OF THE HEAVY LIDS. HE SHOOK HIS GREAT MOUNTAIN TALL AND PEERS AND GRACEFUL. IT WAS BLOWING VERY HARD AT THAT THE WALLS OF RED CLOTH AND SOUNDED BUT THE FLOOR. HE SAID SIMPLY FOLDED HER FACE SOME INWARD LAUGHTER AND FOLDING WINDOWS SHE EXPLAINED. WHEN SUSPENDED TO FOLLOW HIS EXAMPLE.

GOD.

THE SMOULDERING PIPE WITH FOAM SEEMED ALMOST PRESSED LEVEL BY THE CIGAR. MIKE RELIED UPON A BLACK WAVES.

WREN.

THE SPIKES TOWERED BEHIND THE TOWERING THEY WERE THICKLY STREWN WITH RUSHES. HER ELBOWS MIKE THIS IS THE ROOM ASKED IN A LITTLE BLOODSHOT. THE UNIVERSITY AND FALLS JULIE IT WAS NEEDFUL THAT HOUSE DUG.

CLEAR SHRILL CRY I GO HOME AGAIN SHRIEKS. HE'S GONE SAID MIKE

HIS LIP HAD A MAN LAUGHED AGAIN IN HIS HAIR. HE SHOOK WITH CURIOUS BEINGS WHO CAME FROM THE CIRCLE OF HIS BALL JULIE IS MUCH LESS DISPLEASING SEX. SHE ASKED ARTHUR MADE MIKE COULD NOT IN THE BEST IF ONLY REMEMBER ARTHUR'S FACE. I COULD SEE THAT CONCERNED ABOUT HIS FRECKLED. THAT RAISED HER THE WALL IS VOUCHSAFED FEW MINUTES AT CRICKET IN THE INNER SURFACE OF THE TIME.

Seems there's a book on Christopher Wren in the corpus ;)

kranzky commented on July 1, 2024

I'm happy with my blacklist, but not so happy with the keywords, so I'll spend a few days going around in circles: generating new keywords, using them to generate the novel, and updating the blacklist based on the results; rinse and repeat.

kranzky commented on July 1, 2024

Not so fast @hugovk... although the generation is done I still have some work to do to tidy up the results. I'm planning on inferring two models to do this: the first will fix punctuation by predicting the separator that should come between two words, while the second will fix word case by predicting the correct case for a word, given the preceding word and punctuation. The end result should look a bit more like a novel instead of LINES OF SHOUTING.
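
Those two repair models could be sketched like this. The `Repairer` class, the count-based learning, and the fallbacks are all hypothetical; the real models are presumably inferred from the corpus with Sooth rather than hand-coded.

```ruby
# Two tiny repair models learned from well-formed text: the most common
# casing of each word, and the most common separator between each pair
# of adjacent words.
class Repairer
  def initialize
    @casing = Hash.new { |h, k| h[k] = Hash.new(0) }  # "AWOKE" => {"awoke" => n}
    @sep    = Hash.new { |h, k| h[k] = Hash.new(0) }  # ["HE", "AWOKE"] => {" " => n}
  end

  # Learn casing and separators from a well-formed sentence.
  def train(sentence)
    sentence.scan(/[A-Za-z'-]+/).each { |w| @casing[w.upcase][w] += 1 }
    # Capture each word, the separator after it, and (via lookahead)
    # the word that follows.
    sentence.scan(/([A-Za-z'-]+)([^A-Za-z'-]+)(?=([A-Za-z'-]+))/).each do |a, sep, b|
      @sep[[a.upcase, b.upcase]][sep] += 1
    end
  end

  # Rewrite an all-caps word list with learned casing and separators,
  # falling back to lowercase and a single space when a word or pair
  # was never seen in training.
  def repair(words)
    pieces = []
    words.each_with_index do |w, i|
      pieces << best(@casing[w.upcase], w.downcase)
      pieces << best(@sep[[w.upcase, words[i + 1].upcase]], " ") if i < words.size - 1
    end
    pieces.join
  end

  private

  def best(dist, fallback)
    dist.empty? ? fallback : dist.max_by { |_, n| n }.first
  end
end
```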

kranzky commented on July 1, 2024

Had an idea for improving keyword generation, so will do that and see if the results are better.

kranzky commented on July 1, 2024

Finished that work; to recap, here's the first few lines from the start of the second chapter in the source template:

CHAPTER
exposition:I awoke.
exposition:It was far too dark.
exposition:It occurred to me that I'd forgotten to set the alarm, and Julie had neglected to wake me from oversleeping.
exposition:The television was blaring.
exposition:One of the shopping channels judging by the cadence of the presenter's voice.
exposition:I wandered from the bedroom to the kitchen.
exposition:A mess of pots and pans, a half-empty glass of wine, the smell of cooking.
PARAGRAPH
dialogue:"Julie?"
exposition:I called in a hoarse whisper.
dialogue:"Are you home?"

Here are the keywords we now extract:

CHAPTER
exposition;2:AWOKE GROANS RUINED VISION SLEPT
exposition;5:INACCESSIBLE FROWNING FLICKERING DARK AUSTRIAN
exposition;17:JULIE NEGLECTED COMMON PERHAPS OCCURRED
exposition;4:BLARING HOUSED VOID WARMTH BAG
exposition;10:CADENCE JUDGING PUBLISHED FAILING MUSIC
exposition;7:BEDROOM PLUMAGE TOMORROW WANDERED REGULAR
exposition;12:PANS POTS HALF-EMPTY CAMPERS CANS
PARAGRAPH
dialogue;1:JULIE JULIE'S SALT SMELL IMMEDIATE
exposition;6:HOARSE GIVES CRIES VOICES THEY'RE
dialogue;3:ROARED HOME BACON CRAWL OFFERS

And here is a generation using those keywords:

CHAPTER
exposition;2,2:HE AWOKE THIS VISION
exposition;3,2:A DARK FROWNING FACE HE HITHERTO INACCESSIBLE
exposition;5,0:JULIE NO MORE COMMON COMPLAINT AMONG SAVAGES NEVER PAID EVEN OCCURRED SOME HORSES AND DUMB NEGLECTED PERHAPS
exposition;3,2:THE VOID HIS BAG BALMY WARMTH
exposition;5,2:FAILING JUDGING FROM THE SWEETEST CADENCE OF A MUSIC AND PUBLISHED IT
exposition;4,2:AS HE WANDERED ON ITS REGULAR WINTER PLUMAGE TOMORROW
exposition;4,0:THE HALF-EMPTY SHED AND THE RIVAL CAMPERS SEPARATED THE CANS IRON POTS
PARAGRAPH
dialogue;2,2:NO SALT JULIE
exposition;4,1:PENS OF VOICES HOARSE CRIES THEY'RE NOT
dialogue;2,1:WHEN HE ROARED BACON

I think the results are more interesting, yet still relevant.

Onward!

kranzky commented on July 1, 2024

I've finished the repair script, and am currently working on an improvement to the fractal language model, which is unfortunately taking days to run on an AWS high memory instance. So I may need to ditch that.

Here's the repaired version of the generation shown above:

He awoke this vision! A dark frowning face, he hitherto inaccessible. Julie more common complaint, among savages never paid, even occurred some horses and dumb, neglected, perhaps? The void his bag warmth. Failing from the sweetest cadence of a Music, and published it. As he wandered on its regular winter plumage! The half-empty shed and the rival campers separated the cans, iron pots.

"No salt--?" Pens of voices, hoarse cries? They're not. "When he roared Bacon--"

kranzky commented on July 1, 2024

Working on multithreading, as my language model is taking too long to train...

hugovk commented on July 1, 2024

Labelled, and congratulations!
