
Comments (2)

jsvine commented on June 19, 2024

Hi @Denton-L! Thanks for raising this issue.

It looks like the error actually occurs when the model is created rather than at the combine step. (See the first three lines of the stacktrace you've pasted; this can also be confirmed by removing the combined_model = ... line from your script and re-running it.)

Here's what seems to be happening:

  • When new_model = markovify.NewlineText(broken) is invoked in your test script, it is trying to create a Markov model with just one line of text (e.g., 'this string is not broken'). When using NewlineText, one line of text corresponds to a single sentence.

  • When markovify.Text — of which markovify.NewlineText is a subclass — ingests a new corpus, it ignores sentences with certain characters/patterns. (See reject_pat here.)

  • If a corpus is composed entirely of sentences containing those patterns — as the error-triggering examples above do — then it effectively is working with an empty corpus, which is causing this error. You can test this by, for example, changing "this string contains (" to "this string contains ( \n this is an example sentence".
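The filtering described above can be sketched without markovify at all. This is only an illustration: REJECT_PAT below is a simplified stand-in, not the library's actual reject_pat, and passing_sentences is a hypothetical helper name.

```python
import re

# Simplified stand-in for the library's rejection pattern (an assumption):
# reject any line containing brackets, parentheses, or double quotes.
REJECT_PAT = re.compile(r"[\[\]()\"]")


def passing_sentences(text):
    """Split on newlines and keep only lines the pattern does not reject."""
    lines = [line.strip() for line in text.split("\n")]
    return [line for line in lines if line and not REJECT_PAT.search(line)]


only_rejected = "this string contains ("
mixed = "this string contains ( \n this is an example sentence"

# Every line is rejected, so nothing is left to build a model from.
print(passing_sentences(only_rejected))

# One line survives, so a model can be built from it.
print(passing_sentences(mixed))
```

When the first list comes back empty, the model has no sentences to start from, which matches the failure seen at construction time.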

If this is causing problems with one of your projects, the easiest fix would be to override markovify.Text.test_sentence_input(...).

In the slightly longer term, I think these fixes are in order:

  • A more informative error message when all sentences of a corpus are rejected.

  • An easier-to-override reject_pat in markovify.Text.test_sentence_input(...)

Any other thoughts/suggestions?


shge commented on June 19, 2024

@jsvine

A more informative error message when all sentences of a corpus are rejected.

👍

It would be great if we can add other punctuation marks. (e.g. Japanese: )
