Comments (8)
@kylepjohnson Sure, I'll sort that; I just need to get around to a little more testing on real text examples.
from cltk.
Hello @sjhuskey,
Thank you for the report. I'm not the one who developed the syllabifier for Ancient Greek, but I can look at how this can be fixed. The syllabification can either be fixed here in the prosody module for Ancient Greek or handled by a separate process.
I'll try solutions when I have time.
Hi @sjhuskey,
I've been having similar problems. Here are my non-expert thoughts from testing.
The syllabifier doesn't syllabify the way you might want, but tries to end a "syllable" with a vowel. Therefore the unusual splitting of syllables, while perhaps undesirable from an academic perspective, seems to be the behaviour desired by the author as described in the comments on the program.
When testing this sentence, the first α of ἀρίστων falls prey to the problem on line 288 (next_syll = sentence[sentence.index(syllable) + 1]), which looks the syllable up by value rather than by position. The lookup should give an index of 2, but it gives 0 instead, because index() returns the position of the first syllable 'α' in the sentence. As a result, _long_by_position returns True under case 1.
The πα of πάγχρυσον should return True under case 1 (since it is followed by two consonants and not a mute + liquid combination). The phrasing of lines 290–291, however, is:
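The lookup problem described here can be reproduced outside CLTK. The following is a minimal sketch, not the module's code: the syllable list is an illustrative stand-in for the syllabifier's output, and mismatched_positions is a hypothetical helper name.

```python
# Illustrative syllable list (an assumption, not real CLTK output):
# 'α' occurs at position 0 and again at position 2.
sentence = ["α", "νδρων", "α", "ρι", "στων"]


def mismatched_positions(syllables):
    """Return positions where list.index() finds an earlier duplicate."""
    return [i for i, s in enumerate(syllables) if syllables.index(s) != i]


# list.index() always returns the first match, so the second 'α'
# is looked up as if it were the first one.
wrong_next = sentence[sentence.index(sentence[2]) + 1]
right_next = sentence[2 + 1]

print(mismatched_positions(sentence))
print(wrong_next, right_next)
```

Any syllable that repeats within a sentence is affected in the same way, which would explain repeated-syllable errors in longer texts.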
if (next_syll[0] in self.sing_cons and next_syll[1] in self.sing_cons) and (
next_syll[0] not in self.stops and next_syll[1] not in self.liquids)
I think that:
if (next_syll[0] in self.sing_cons and next_syll[1] in self.sing_cons) and (
next_syll[0] not in self.stops or next_syll[1] not in self.liquids)
should be correct, since the absence of either condition (a stop in first position or a liquid in second position) ought to be enough to return True, and it works in this case. I can't think of any undesirable outcomes, but I will look more closely and consult the method in McCabe 1981.
As for the γχρυ of πάγχρυσον, I can't think of any way to recognize the length (which ought to be long) without implementing a dictionary-based macronizer similar to the one in the Latin scansion module. I don't know if a Morpheus solution for Greek would work, but something similar would be excellent. The scansion model as Kirby wrote it asks for fully macronized texts to begin with.
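To compare the two conditions side by side, here is a small standalone sketch. The consonant, stop, and liquid sets are simplified assumptions standing in for self.sing_cons, self.stops, and self.liquids, and the function names are hypothetical. By De Morgan's law, "not (stop followed by liquid)" is "not a stop OR not a liquid", which is what the proposed version tests.

```python
# Assumed, simplified stand-ins for the class attributes named in the thread.
SING_CONS = set("ςρτθπσδφγξκλχβνμ")
STOPS = set("πτκβδγ")
LIQUIDS = set("ρλ")


def original_check(next_syll):
    """Lines 290-291 as quoted: requires BOTH 'not a stop' AND 'not a liquid'."""
    return (next_syll[0] in SING_CONS and next_syll[1] in SING_CONS) and (
        next_syll[0] not in STOPS and next_syll[1] not in LIQUIDS
    )


def proposed_check(next_syll):
    """Proposed change: only a stop + liquid pair is excluded."""
    return (next_syll[0] in SING_CONS and next_syll[1] in SING_CONS) and (
        next_syll[0] not in STOPS or next_syll[1] not in LIQUIDS
    )


# 'γχρυ' starts with a stop (γ) followed by a non-liquid (χ);
# 'τρο' starts with a mute + liquid pair; 'στο' with neither.
for syll in ("γχρυ", "τρο", "στο"):
    print(syll, original_check(syll), proposed_check(syll))
```

The two versions only disagree when exactly one half of the mute + liquid test holds, as with γχ in πάγχρυσον.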
All best.
The solution I am currently trying out for my own research uses enumerate to prevent the index function from fetching the wrong syllable. It changes:
line 266 to: def _long_by_position(self, sentence_index, syllable: str, sentence: list[str]) -> bool:
line 288 to: next_syll = sentence[sentence_index + 1]
so that the variable sentence_index locates the syllable in question inside the function _long_by_position. The variable is set in the _scansion function, by changing:
line 324 to: for i, syllable in enumerate(sentence):
line 325 to: if self._long_by_position(i, syllable, sentence) or self._long_by_nature(
These fairly minimal changes (together with the change to the logic of lines 290–291) make the line ἀνδρῶν ἀρίστων, οἳ τὸ πάγχρυσον δέρας scan ['¯¯˘¯¯¯˘¯˘¯˘x'], and have removed repeated-syllable errors in other texts I have tried.
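The positional approach can be sketched on its own as follows. This is not the CLTK implementation: following_syllables is a hypothetical helper, the syllable list is illustrative, and the bounds guard for the final syllable is an assumption added so the sketch runs by itself.

```python
def following_syllables(sentence):
    """Pair each syllable with its successor, located by position, not value."""
    pairs = []
    for i, syllable in enumerate(sentence):
        # Guard for the last syllable (an assumption for this sketch).
        next_syll = sentence[i + 1] if i + 1 < len(sentence) else None
        pairs.append((syllable, next_syll))
    return pairs


# Illustrative syllable list with a repeated 'α' (not real CLTK output).
sentence = ["α", "νδρων", "α", "ρι", "στων"]
pairs = following_syllables(sentence)
print(pairs[0])
print(pairs[2])
```

Because the index comes from enumerate, each occurrence of a repeated syllable is paired with its own successor.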
Hi @SDCLA @sjhuskey I don't have the time to help on this, but would gladly accept a working pull request that incorporates your changes. I only ask that you be mindful not to make any unnecessary changes.
Hi @sjhuskey, my changes have been integrated. I would love to know what you think if you get a chance to test them! I am planning to look at the possibility of a dictionary-lookup-based macronizer for those edge cases that just won't work any other way.
Thanks for your work @SDCLA! I'll check it out sometime this week.
Fixed. Reopen this issue if @kylepjohnson's fix is not sufficient.