Hi @Denton-L! Thanks for raising this issue.
It looks like the error actually occurs when the model is created rather than at the combine step. (See the first three lines of the stack trace you've pasted; you can also confirm this by removing the `combined_model = ...` line from your script and re-running it.)
Here's what seems to be happening:

- When `new_model = markovify.NewlineText(broken)` is invoked in your test script, it tries to create a Markov model from just one line of text (e.g., `'this string is not broken'`). When using `NewlineText`, one line of text corresponds to a single sentence.
- When `markovify.Text` (of which `markovify.NewlineText` is a subclass) ingests a new corpus, it ignores sentences with certain characters/patterns. (See `reject_pat` here.)
- If a corpus is composed entirely of sentences containing those patterns, as the error-triggering examples above are, then it is effectively working with an empty corpus, which is what causes this error. You can test this by, for example, changing `"this string contains ("` to `"this string contains ( \n this is an example sentence"`.
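To illustrate the mechanism, here is a standalone sketch that needs no markovify install. It splits text one sentence per line (as `NewlineText` does) and drops lines matching a rejection pattern. The `REJECT_PAT` below is an assumption modeled loosely on markovify's default `reject_pat`; the real pattern lives in markovify's `text.py` and may differ by version, and this sketch also skips the Unicode normalization markovify applies before checking.

```python
import re

# Assumed rejection pattern, modeled loosely on markovify's default reject_pat.
# It rejects stray single quotes at word edges, double quotes, parentheses,
# and square brackets.
REJECT_PAT = re.compile(r"(^')|('$)|\s'|'\s|[\"(\(\)\[\])]")


def split_lines(text):
    """Split text into one 'sentence' per line, trimming whitespace around newlines."""
    return re.split(r"\s*\n\s*", text)


def usable_sentences(text, reject_pat=REJECT_PAT):
    """Return only the lines that survive the blank-line and pattern checks."""
    return [
        line for line in split_lines(text)
        if line.strip() and not reject_pat.search(line)
    ]


print(usable_sentences("this string contains ("))
# → []  (every line rejected, so the model would have an empty corpus)
print(usable_sentences("this string contains ( \n this is an example sentence"))
# → ['this is an example sentence']
```

The first call shows why a single-line corpus containing `(` leaves nothing to build a chain from; adding one clean line is enough to give the model a non-empty corpus.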
If this is causing problems with one of your projects, the easiest fix would be to override `markovify.Text.test_sentence_input(...)`.
In the slightly longer term, I think these fixes are in order:

- A more informative error message when all sentences of a corpus are rejected.
- An easier-to-override `reject_pat` in `markovify.Text.test_sentence_input(...)`.
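To make those two proposals concrete, here is a standalone sketch (not markovify's actual code). The names `SentenceFilter`, `EmptyCorpusError`, and `filter_corpus` are hypothetical, and the default pattern is a simplified stand-in. The rejection pattern is a class attribute that callers can replace through the constructor, and an empty result raises an error that names the cause instead of failing later inside the chain.

```python
import re


class EmptyCorpusError(ValueError):
    """Hypothetical error for a corpus whose sentences were all rejected."""


class SentenceFilter:
    # Class-level default: a subclass can replace it without overriding
    # any method, and callers can pass their own pattern instead.
    reject_pat = re.compile(r"[\"()\[\]]")

    def __init__(self, reject_pat=None):
        if reject_pat is not None:
            self.reject_pat = re.compile(reject_pat)

    def accepts(self, sentence):
        """Keep non-blank sentences that don't match reject_pat."""
        return bool(sentence.strip()) and not self.reject_pat.search(sentence)

    def filter_corpus(self, sentences):
        kept = [s for s in sentences if self.accepts(s)]
        if not kept:
            # Report the actual cause instead of failing later on an empty model.
            raise EmptyCorpusError(
                f"No usable sentences: all {len(sentences)} input sentence(s) "
                f"were rejected by reject_pat={self.reject_pat.pattern!r}"
            )
        return kept


try:
    SentenceFilter().filter_corpus(["this string contains ("])
except EmptyCorpusError as err:
    print(err)

# A looser, caller-supplied pattern that only rejects square brackets.
print(SentenceFilter(reject_pat=r"[\[\]]").filter_corpus(["this string contains ("]))
# → ['this string contains (']
```

Passing the pattern as a constructor argument keeps the common case (tweaking which characters are rejected) to one line, while subclassing remains available for more involved checks.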
Any other thoughts/suggestions?
> A more informative error message when all sentences of a corpus are rejected.

👍

It would be great if we could add other punctuation marks (e.g., Japanese: 。).