syvwlch / metroscope
Experiments with automatic scansion.
License: MIT License
Tag all words with their rhyming part, to make it easy to find terminal and internal rhymes.
After some refactoring in PR #68, the only thing I pull from the CMU dict is the phones.
Currently, when I do, I use the first item in the list and ignore any others, and in the custom dict I only store one.
So to support multiple pronunciations, I would need to:
Make the 'phones' key in custom dict point to a list containing the current string.
Add a new property which represents the index into the word's phones list, and which raises an IndexError when set out of range and is irrelevant when the list is None.
Update all the consumers of _phones to use the index, not zero.
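The three steps above could be sketched roughly as follows. This is only an illustration: the `Word` class, the `pronunciation_index` property name, and the sample phones strings are hypothetical stand-ins, not the project's actual code.

```python
class Word:
    """Hypothetical stand-in for the project's word class."""

    def __init__(self, phones_list=None):
        # A list of phones strings, or None when the word has no known phones.
        self._phones_list = phones_list
        self._index = 0

    @property
    def pronunciation_index(self):
        return self._index

    @pronunciation_index.setter
    def pronunciation_index(self, value):
        if self._phones_list is None:
            # No pronunciations at all: the index is irrelevant, so ignore it.
            return
        if not 0 <= value < len(self._phones_list):
            raise IndexError(f"pronunciation index {value} out of range")
        self._index = value

    @property
    def _phones(self):
        # Consumers read the selected pronunciation instead of item zero.
        if self._phones_list is None:
            return None
        return self._phones_list[self._index]
```

Usage, with illustrative ARPAbet-style strings:

```python
word = Word(["P R EH1 Z AH0 N T", "P R IY0 Z EH1 N T"])
word.pronunciation_index = 1
print(word._phones)
```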
Currently I’m just applying the meter directly to each word.
I should at least mark where the meter disagrees with the word’s pronunciation.
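One way to mark the disagreements, as a minimal sketch. It assumes the meter and the word's stresses are both strings of stress digits of the same kind the CMU dict uses, and it assumes secondary stress ("2") should be treated as compatible with either position. The function name `mark_disagreements` is hypothetical.

```python
def mark_disagreements(meter, stresses):
    """Return one flag per syllable: True where meter and word disagree."""
    flags = []
    for expected, actual in zip(meter, stresses):
        if actual == "2":
            # Assumption: secondary stress can fill either metrical position.
            flags.append(False)
        else:
            flags.append(expected != actual)
    return flags


# An iambic foot ("01") laid over a word stressed on its first syllable ("10").
print(mark_disagreements("01", "10"))
```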
Should prevent deploying code that just errors out on page load.
Store the entire phones, because that's the underlying data for both the stresses() method and the rhyming_part() method of the pronouncing package.
Which means that once I have the phones in custom_dict, I should stop using stresses_for_word() and instead call stresses() directly, and add a _phones property to WordBuilder.
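To illustrate why the phones are enough, here is a standard-library sketch that derives both the stresses and the rhyming part from a single phones string. It approximates what I understand the pronouncing package's `stresses()` and `rhyming_part()` to do and is not a copy of that package; the function names and the sample phones string are hypothetical.

```python
import re


def stresses_from_phones(phones):
    """Keep only the stress digits (0, 1, 2) of an ARPAbet phones string."""
    return re.sub(r"[^012]", "", phones)


def rhyming_part_from_phones(phones):
    """Return the phones from the last stressed vowel to the end."""
    phone_list = phones.split()
    for i in range(len(phone_list) - 1, -1, -1):
        if phone_list[i][-1] in "12":
            return " ".join(phone_list[i:])
    # No stressed vowel found: fall back to the whole string.
    return phones


phones = "S K AE1 N SH AH0 N"  # illustrative phones string
print(stresses_from_phones(phones))
print(rhyming_part_from_phones(phones))
```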
Add default values for when a particular key is missing from an entry in the dict, e.g. the entry has syllables but no stresses.
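A minimal sketch of filling in defaults at lookup time. The key names other than `phones`, the `None` defaults, and the `lookup_entry` helper are assumptions made for illustration.

```python
# Assumed keys; None stands for "not provided in the custom dict".
DEFAULT_ENTRY = {"syllables": None, "stresses": None, "phones": None}


def lookup_entry(custom_dict, word):
    """Return the entry for word, with defaults for any missing keys."""
    entry = custom_dict.get(word.lower(), {})
    # Later keys win, so values from the entry override the defaults.
    return {**DEFAULT_ENTRY, **entry}


custom_dict = {"heav'n": {"syllables": ["heav'n"]}}
print(lookup_entry(custom_dict, "Heav'n"))
```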
Instead of storing the poems in text files and running the analysis each time, load the entire data object into a database with the following tables:
Update the run script to:
Now that there’s a custom dictionary for words that don’t show in CMU, check it first.
Should be faster and would allow overriding CMU if needed?
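The lookup order could look something like the sketch below. Both dictionaries are plain dicts standing in for the real data sources, and `phones_for` is a hypothetical helper name.

```python
def phones_for(word, custom_dict, cmu_dict):
    """Check the custom dict first, then fall back to the CMU data."""
    key = word.lower()
    entry = custom_dict.get(key)
    if entry and entry.get("phones"):
        # A custom entry with phones overrides whatever CMU has.
        return entry["phones"]
    # cmu_dict is assumed to map a word to its list of phones strings.
    return cmu_dict.get(key)


custom_dict = {"wind": {"phones": ["W AY1 N D"]}}
cmu_dict = {"wind": ["W IH1 N D"], "rose": ["R OW1 Z"]}
print(phones_for("wind", custom_dict, cmu_dict))
print(phones_for("rose", custom_dict, cmu_dict))
```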
With pytest, should get all the introspection needed on plain assert.
So the site can keep track of the scansion proposed by various users.
Already know pytest doesn’t support subTest, so there’s gonna be some work involved.
Worked well for me last summer, time to do the same with this project.
Broke this off from #12 since the fix will be different.
Currently I can force an elision by replacing the offending vowel with an inverted comma in the original text, but that looks weird when the syllable is silent in normal speech.
Alternatively, I can add an entry to the custom dict, but that will only scale for the most common occurrences.
That way we scan the line only once, and the syllabification can help the stress marks land inside a different syllable.
install SQLAlchemy’s flask plugin
create the first table
add a shell context processor
add data via shell
I assumed that it was:
And I was wrong.
This is the next logical step to address #13 and other issues going forward with words that aren't in the CMU dictionary and/or don't break down to syllables properly with the current syllabifier.
First step towards TDD, and allows creation of poem-specific scripts.
Never have the same code written twice, right?
Looks like Markdown is the right package to install for this.
Cleaner than having it defined in the body of the function.
They only show on desktop and they're not styled by Bootstrap...
Once pytest runs our existing tests with no failures:
Since we already switched to NLTK for the syllabification, we could also use:
http://www.nltk.org/_modules/nltk/corpus/reader/cmudict.html
Would allow using same format for custom_dict, maybe?
Add the major poems from chapter one of the handbook of poetry.
Everyone starts with the default scansion of a poem, but when logged in is able to edit it.
Use Pyphen with a custom dictionary in front to fix those cases where hyphenation doesn’t break along all syllables, e.g. equi-va-lent versus e/qui/va/lent.
Should work now that we .extend phones list if a word gets syllables from custom dict but not phones.
With a default colored text output, perhaps?
Should allow easy imports anywhere in the project, including tests which could go back to a separate folder.
Use tool tips to show optional info about a word, such as # of syllables, stress pattern, etc...
Get rid of Hamlet while you’re there too.
SonoriPy has been incorporated into NLTK and will no longer be updated, as per recent update to its repository.
Need to plan a transition, probably once I'm really solid on the unit tests for the functions that use it.
Two ways to do this:
Current line level logic manipulates strings directly.
Refactor that logic into a class while separating the analysis logic from the display logic. An instance of the line-level class should contain a list of the WordBuilder class instances for all of the words in the line, and build any string representations from that list as needed.
For this refactoring exercise, there will be three string representations:
__str__
__repr__
refactored logic to generate the HTML currently shown on the site
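A rough sketch of what such a line-level class might look like. The tiny `Word` class is only a stand-in for WordBuilder, and the `Line` name, the `to_html` method, and the CSS class names are assumptions.

```python
from html import escape


class Word:
    """Hypothetical stand-in for WordBuilder: a spelling plus a stress flag."""

    def __init__(self, text, stressed=False):
        self.text = text
        self.stressed = stressed


class Line:
    """Holds the analysed words; builds string forms only on demand."""

    def __init__(self, words):
        self.words = list(words)

    def __str__(self):
        return " ".join(w.text for w in self.words)

    def __repr__(self):
        return f"Line({[w.text for w in self.words]!r})"

    def to_html(self):
        # One span per word, so the display layer can style it via CSS.
        spans = []
        for w in self.words:
            css = "stressed" if w.stressed else "unstressed"
            spans.append(f'<span class="{css}">{escape(w.text)}</span>')
        return " ".join(spans)


line = Line([Word("Shall"), Word("I", stressed=True)])
print(str(line))
print(repr(line))
print(line.to_html())
```

Keeping the words in a list means the analysis runs once, and each representation is just a different rendering of the same data.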
Give the highlights on what the code is doing.
In particular if the exercises are well suited to what I’m trying to do here.
In particular it would be handy to tag each word so that highlighting could be done via CSS?
I think the only clean way to do this is a robust method to break the word's spelling into syllables.
Once you do that, you could mark the first vowel in the syllable?
Broke off the related issue with elisions to its own issue: #13
Requires setting the EDITOR environment variable.
This seems to be best practice, to guarantee deterministic builds in prod.
You get an IndexError when you try to read an extra stress from stress_pattern.
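A guarded read is one possible way around this, sketched below. It assumes `stress_pattern` is an indexable sequence such as a string of stress digits; the `stress_at` helper and its `default` argument are hypothetical.

```python
def stress_at(stress_pattern, position, default=None):
    """Return the stress at position, or default if the pattern is too short."""
    if 0 <= position < len(stress_pattern):
        return stress_pattern[position]
    return default


stress_pattern = "0101"
# Position 4 would raise IndexError with plain indexing.
print(stress_at(stress_pattern, 4))
```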