charlesloder / havarotjs

A TypeScript package for getting syllabic data about Hebrew text with niqqud.

Home Page: https://www.npmjs.com/package/havarotjs

License: MIT License

JavaScript 2.76% TypeScript 97.16% Shell 0.07%

havarotjs's Issues

Fix docs failing

The doc CI job always fails because typedoc was updated. Either downgrade typedoc or find a better pages plugin

Incorrect Holem waw

When a word contains both a "waw with a holem" and a "holem waw", the "waw with a holem" is incorrectly replaced with a "holem waw".

E.g. עֲוֹנוֹתֵינוּ

Sanitize Holem-Waw Orthography

There are two ways to write a holem-waw:

Pattern	Word
(1) consonant + holem + waw	שָׁלֹום
(2) consonant + waw + holem	שָׁלוֹם

Additionally, instead of a holem (U+05B9), a holem haser for vav (U+05BA) can also be used for typographic reasons, meaning there are four possible patterns for encoding a holem-vav.

Pattern (1) is preferred because:

it is semantically correct—the vowel belongs to the consonant, the waw is simply an orthographic marker
it reduces confusion for when a waw is being used as a consonant with a holem as its vowel (e.g. עָוֹן)

Because the holem haser for vav (U+05BA) is primarily used for typographic reasons, it will be best to convert all occurrences of U+05BA to U+05B9.

In order to semantically encode a holem-waw, all occurrences in each word of:

a waw preceding a holem, but no vowel preceding the waw will be swapped so that the holem precedes the waw.

Examples:

For שָׁלוֹם, since the waw precedes the holem, but no vowel precedes the waw, the holem and waw would be switched so that it becomes שָׁלֹום.
For עָוֹן, since a vowel precedes the waw, they would not be switched

Because taamei can occur before a waw but do not need to occur before a waw, the taamei will be removed, the characters swapped, and then the strings rebuilt like the qametsQatan sanitation.
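
A minimal sketch of the sanitation described above, assuming plain string input. The name sanitizeHolemWaw and both regexes are illustrative assumptions, not the code in the package. Rather than removing and rebuilding the taamim, this version carries any taamim, dagesh, meteg, rafe, or shin/sin dot that sits between the consonant and the waw through the swap.

```typescript
// Hypothetical helper; the name and the regexes are assumptions for illustration.
const HOLEM = "\u05B9";
const HOLEM_HASER = /\u05BA/g; // holem haser for vav

// A consonant, then any non-vowel marks (taamim, dagesh, meteg, rafe, shin/sin dot),
// then waw + holem. A vowel after the consonant prevents a match.
const WAW_HOLEM = /([\u05D0-\u05EA])([\u0591-\u05AF\u05BC\u05BD\u05BF\u05C1\u05C2]*)\u05D5\u05B9/g;

function sanitizeHolemWaw(word: string): string {
  return word
    .replace(HOLEM_HASER, HOLEM) // U+05BA becomes U+05B9
    .replace(WAW_HOLEM, "$1" + HOLEM + "$2\u05D5"); // holem now precedes the waw
}
```

Under these assumptions a waw + holem that follows a vowelless consonant is swapped, while a word such as עָוֹן is left unchanged because a qamets precedes the waw.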

Single shureq failing

A single shureq וּ fails with the error:

TypeError: Cannot read properties of undefined (reading 'hasTaamim')
    at /Users/charlesloder/Documents/code/personal/havarot/dist/utils/syllabifier.js:297:70
    at Array.filter (<anonymous>)
    at setIsAccented (/Users/charlesloder/Documents/code/personal/havarot/dist/utils/syllabifier.js:297:42)
    at /Users/charlesloder/Documents/code/personal/havarot/dist/utils/syllabifier.js:347:37
    at Array.forEach (<anonymous>)
    at syllabify (/Users/charlesloder/Documents/code/personal/havarot/dist/utils/syllabifier.js:347:15)
    at get syllables [as syllables] (/Users/charlesloder/Documents/code/personal/havarot/dist/word.js:67:44)
    at /Users/charlesloder/Documents/code/personal/havarot/dist/text.js:146:46
    at Array.map (<anonymous>)
    at get syllables [as syllables] (/Users/charlesloder/Documents/code/personal/havarot/dist/text.js:146:27)

The error is caused by

https://github.com/charlesLoder/havarot/blob/a824a06690b2b823f37c555aa734088ce27904e7/src/utils/syllabifier.ts#L39-L50

Need to check if arr[i] exists

Changelog guard

I keep forgetting to update the changelog. Create some guard to ensure that it's updated. Maybe even a simple Y/n prompt on the command line.

Holem waw with final aleph

The fix in #17 caused an error where a word with a final aleph - ס֣וֹא would lose the aleph. This is a non-standard Hebrew spelling

Improve handling Divine Name

When Latin characters are used next to the Divine Name (e.g. a comma), it creates issues.

See this issue for more context.

Will have to adjust this regex:

havarotjs/src/utils/divineName.ts

Line 1 in 7538f7d

const nonChars = /[\u{0591}-\u{05C7}]/gu;

To probably something like

const nonChars = /[^\u{05D0}-\u{05F4}]/gu;
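
To illustrate why the second regex may help, here is a small sketch comparing the two patterns on the Divine Name followed by a comma. The helper stripsToDivineName is hypothetical and is not the function in divineName.ts. The patterns are written with four-digit escapes and without the u flag, which is equivalent for these code points.

```typescript
// Hypothetical comparison helper; not the package's implementation.
const marksOnly = /[\u0591-\u05C7]/g;   // current pattern: removes taamim and niqqud only
const nonLetters = /[^\u05D0-\u05F4]/g; // proposed pattern: removes everything but Hebrew letters
const YHWH = "\u05D9\u05D4\u05D5\u05D4";

function stripsToDivineName(text: string, strip: RegExp): boolean {
  return text.replace(strip, "") === YHWH;
}

// The Divine Name with niqqud, followed by a Latin comma
const withComma = "\u05D9\u05B0\u05D4\u05D5\u05B8\u05D4,";
const currentResult = stripsToDivineName(withComma, marksOnly);   // false: the comma survives
const proposedResult = stripsToDivineName(withComma, nonLetters); // true: the comma is stripped
```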

Various spellings of Jerusalem

The various spellings of 'Jerusalem' do not sequence correctly.

Uncommon

The most uncommon spelling — יְרוּשָׁלַיִם like וִירוּשָׁלַ֨יִם֙ in Jer 26:18 — syllabifies fine ✅

Common

The common spelling of יְרוּשָׁלִַ֗ם like in Josh 10:1 does syllabify correctly, but switches the hiriq and the patach in the final syllable 👎

With a metheg/sof pasuq

See יְרוּשָׁלִָֽם in 2 Sam 14:23; the same issue as above 👎

The issue resides in how the Cluster sequences the Chars.

Support holam haser

Holam haser should be supported. It prints correctly

Thank you so much for your continued work on this fantastic library. I'm experiencing an issue with a Vav Holam shifting to the previous letter after initializing a Text object. Any suggestions on what may be going wrong? I'm using version 0.7.2 with Node for reference.

Word passed to Text() = א֑וֹר
Syllable returned = אֹ֑ור

Thank you!!

Add q.q. check

Add a check for q.q.

havarotjs/src/utils/qametsQatan.ts

Lines 98 to 104 in 6ca463a

 const qametsReg = /\u{05B8}/u;
 const hatefQamRef = /\u{05B3}/u;

 // if no qamets, return
 if (!qametsReg.test(word)) {
   return word;
 }

 const qametsReg = /\u{05B8}/u; 
 const qametsQatReg = /\u{05C7}/u; 
 const hatefQamRef = /\u{05B3}/u; 
  
 // if no qamets or has qamets qatan char, return 
 if (!qametsReg.test(word) || qametsQatReg.test(word)) { 
   return word; 
 }

Error thrown with Divine Name

The Divine Name יְהוָה causes the error A Syllable shouldn't precede a Cluster with a Mater.

This wasn't anticipated as the Divine Name does not follow typical rules.

The name can be written two ways:

as יְהֹוָה with a holem, which produces no error
as יְהוָה w/o a holem, which produces an error but is more typical.

Perhaps add a property Word.isDivineName?

Update npm homepage info

havarotjs/package.json

Lines 21 to 28 in 6ca463a

 "repository": {
   "type": "git",
   "url": "https://github.com/charlesLoder/havarot.git"
 },
 "bugs": {
   "url": "https://github.com/charlesLoder/havarot/issues"
 },
 "homepage": "https://github.com/charlesLoder/havarot",

Add more forms of כל

There are many variations on the form כל (e.g. וְכָל). Most issues are going to occur in non-Biblical texts, where the use of the maqqef is less common.

I will need to identify these — hopefully systematically.

Shureq on Alef and preceded by Syllable with Shewa throws error

This text:

וְאֵ֗לֶּה שְׁמוֹת֙ בְּנֵ֣י יִשְׂרָאֵ֔ל הַבָּאִ֖ים מִצְרָ֑יְמָה אֵ֣ת יַעֲקֹ֔ב אִ֥ישׁ וּבֵית֖וֹ בָּֽאוּ׃ ברְאוּבֵ֣ן שִׁמְע֔וֹן לֵוִ֖י וִיהוּדָֽה׃ גיִשָּׂשכָ֥ר זְבוּלֻ֖ן וּבִנְיָמִֽן׃ דדָּ֥ן וְנַפְתָּלִ֖י גָּ֥ד וְאָשֵֽׁר׃ הוַֽיְהִ֗י כׇּל־נֶ֛פֶשׁ יֹצְאֵ֥י יֶֽרֶךְ־יַעֲקֹ֖ב שִׁבְעִ֣ים נָ֑פֶשׁ וְיוֹסֵ֖ף הָיָ֥ה בְמִצְרָֽיִם׃ ווַיָּ֤מׇת יוֹסֵף֙ וְכׇל־אֶחָ֔יו וְכֹ֖ל הַדּ֥וֹר הַהֽוּא׃ זוּבְנֵ֣י יִשְׂרָאֵ֗ל פָּר֧וּ וַֽיִּשְׁרְצ֛וּ וַיִּרְבּ֥וּ וַיַּֽעַצְמ֖וּ בִּמְאֹ֣ד מְאֹ֑ד וַתִּמָּלֵ֥א הָאָ֖רֶץ אֹתָֽם׃ {פ} חוַיָּ֥קׇם מֶֽלֶךְ־חָדָ֖שׁ עַל־מִצְרָ֑יִם אֲשֶׁ֥ר לֹֽא־יָדַ֖ע אֶת־יוֹסֵֽף׃ טוַיֹּ֖אמֶר אֶל־עַמּ֑וֹ הִנֵּ֗ה עַ֚ם בְּנֵ֣י יִשְׂרָאֵ֔ל רַ֥ב וְעָצ֖וּם מִמֶּֽנּוּ׃ יהָ֥בָה נִֽתְחַכְּמָ֖ה ל֑וֹ פֶּן־יִרְבֶּ֗ה וְהָיָ֞ה כִּֽי־תִקְרֶ֤אנָה מִלְחָמָה֙ וְנוֹסַ֤ף גַּם־הוּא֙ עַל־שֹׂ֣נְאֵ֔ינוּ וְנִלְחַם־בָּ֖נוּ וְעָלָ֥ה מִן־הָאָֽרֶץ׃

throws the error:

Error: Syllable should not precede a Cluster with a Mater

Figure out why

Shewa and Shureq Not Syllabifying Correctly

Words in the form of CǝCûC are being syllabified as 1 syllable, when they should be two syllables.

Words in the form of CǝCû (w/o the final consonant) are correctly syllabified as 2 syllables, and so are words of the form CǝCVC.

There is likely a problem in the groupFinal logic.

Syllable should include more linguistic data

Though the Syllable has useful properties, it should have linguistic properties of syllables as well. Suggestions:

Syllable.onset: string | null
Syllable.nucleus: string
Syllable.coda: string | null

Syllable.onset

The overwhelming majority of Hebrew syllables have an onset. Though the aleph or ayin may not be considered an onset in Modern Hebrew, they were in Biblical, and orthographically function like an onset.

The only syllable that won't have an onset is a word-initial shureq (e.g. וּמֶלֶךְ [u. 'mε. lεk])

In Biblical Hebrew, there are no medial consonants in the onset; that is, there are no consonant clusters (i.e. CCV or CCVC types). The only exception is for the numeral שְׁתַּיִם and its various forms.

Syllable.nucleus

Every syllable must have a nucleus (i.e. vowel). A vocal shewa is a nucleus

Syllable.coda

A coda is optional. A final qamets-he or qamets-aleph would not count as a coda; these would be of the syllable type CV. A he with a mappiq, however, would be a coda, giving a syllable type of CVC.

Aleph-Shureq Failing

In the word יִירָא֥וּךָ the aleph was being parsed as a quiesced aleph (i.e. the syllable as רָא֥) instead of as a consonant (i.e. as א֥וּ)

Option for Modern Hebrew Syllabification

Currently, havarot syllabifies words according to Traditional (i.e. Sephardic) or Tiberian rules.
The ability to syllabify words according to general Modern Hebrew pronunciation would be beneficial, especially for augmenting with transliteration schemas that follow Modern Hebrew

Differences

Syllable Properties

Syllable.medial

In issue #2, it is proposed to introduce more linguistic properties to syllables.
Modern Hebrew differs in its syllable properties

A medial property would need to be included:

Syllable.medial: string | null

Modern Hebrew allows for syllable types of CCV and CCVC.

E.g. גְּדֹולִים is realized as [gdo. 'lim]

Syllable.onset

For syllables beginning with א, ע, or ה, the onset can be realized as null.
Though, orthographically, they do function like an onset.

Realization of Shewa

In Biblical Hebrew reading traditions, the shewa is often vocalic, but in Modern Hebrew it is often realized as a zero-vowel [Ø] (Coffin and Bolozky, A Reference Grammar of Modern Hebrew, 22), creating syllables of CCV or CCVC types (see above)

A word-initial (maybe syllable-initial) shewa is most commonly realized as vocalic when (1) its onset is a י, ל, מ, נ, or ר, or (2) the second letter is א, ה, or ע.

Example of (1):

גְּדֹולִים is [gdo. 'lim]
לְבָנִים is [lǝ. va. 'nim]

Example of (2):

תְּשׁוּקָה is [tʃu. ˈka], but
תְּאוּנָה is [tǝ. u. ˈna]

A shewa preceded by a shewa is typically vocal as well, just like Tiberian, but not necessarily so
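
The two word-initial conditions above could be sketched as a heuristic like the following. The function name isInitialShewaVocal and its null return for words without a word-initial shewa are assumptions made for illustration. Because the rule is described only as the most common case, this is a rough approximation, not a complete account of Modern Hebrew.

```typescript
// Hypothetical heuristic for word-initial shewa in Modern Hebrew.
// Captures: first consonant, its marks, and the second consonant (if any).
const LEADING = /^([\u05D0-\u05EA])([\u0591-\u05C7]*)([\u05D0-\u05EA])?/;

function isInitialShewaVocal(word: string): boolean | null {
  const m = LEADING.exec(word);
  const marks = m === null ? "" : m[2] || "";
  // null when the first consonant does not carry a shewa (U+05B0)
  if (m === null || marks.indexOf("\u05B0") === -1) return null;
  const onset = m[1] || "";
  const next = m[3] || "";
  // (1) onset is yod, lamed, mem, nun, or resh; (2) second letter is aleph, he, or ayin
  return /[\u05D9\u05DC\u05DE\u05E0\u05E8]/.test(onset) || /[\u05D0\u05D4\u05E2]/.test(next);
}
```

Applied to the four examples above, this sketch treats the shewa as vocal in לְבָנִים and תְּאוּנָה and as silent in גְּדֹולִים and תְּשׁוּקָה.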

Mater text is reversed

On v0.1.2, example:

const str = "מַשִׁיחַ";
const doc = new Text(str);
const res = doc.syllables.map((el) => el.text);
[
  "מַ", // \u{5DE}\u{5B7} (mem, patach) 
  "ישִׁ",// \u{5D9}\u{5E9}\u{5C1}\u{5B4} (yod, shin, shin-dot, hiriq)
  "חַ" // \u{5D7}\u{5B7} (chet, patach)
]

The yod should be after the shin cluster.

Something is wrong in the new syllabifier.ts logic. Need to add tests.

Improved Metheg/Siluq Distinction

The Cluster.hasMetheg property needs to better determine between the use of U+05BD as a metheg or as a siluq.

1. Check if the Cluster even has a metheg. If not, return false.
2. If yes, loop over the text of the following Clusters via this.next.
a. Check if a sof pasuq is present. If yes, then the metheg is really a siluq. Return false.
b. Check if another Cluster has a metheg. If yes, then the second metheg is the siluq, and the current one is a metheg. Return true.
c. If no sof pasuq or additional metheg is found, return true.

This logic will have to be tweaked a bit
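
A rough sketch of the steps above, assuming the clusters are available as an ordered array of strings instead of linked Cluster objects. The function below is a hypothetical stand-in for the Cluster.hasMetheg getter, and it checks the following clusters in order, so whichever of sof pasuq or a second U+05BD comes first decides the result.

```typescript
// Hypothetical stand-in for Cluster.hasMetheg, operating on plain strings.
const METEG = "\u05BD";
const SOF_PASUQ = "\u05C3";

function hasMetheg(clusters: string[], index: number): boolean {
  const current = clusters[index];
  // Step 1: no U+05BD on this cluster at all
  if (current === undefined || current.indexOf(METEG) === -1) return false;
  // Step 2: look at the clusters that follow
  for (let i = index + 1; i < clusters.length; i++) {
    const next = clusters[i] || "";
    if (next.indexOf(SOF_PASUQ) !== -1) return false; // (a) the mark is really a siluq
    if (next.indexOf(METEG) !== -1) return true;      // (b) the later mark is the siluq
  }
  return true; // (c) nothing further found
}
```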

Add `vowelName` property to `Cluster`

Similar to #74, a property called vowelName should exist that returns the Unicode character name.

new Text("בְּאֶ֣רֶץ").clusters.map(c => c.vowelName)
// ["SHEVA", "SEGOL", "SEGOL", null]

Things to consider:

is SHEVA a "vowel"? See especially hasShewa property
how should names be formatted? Leaning towards replacing spaces with underscores
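
One possible shape for this, sketched over plain strings instead of Cluster objects. The lookup table uses the Unicode character names with the "HEBREW POINT" prefix dropped and spaces replaced by underscores. The function vowelName here is a hypothetical free function, and including SHEVA in the table is an assumption that mirrors the example output above.

```typescript
// Hypothetical lookup; keys are the Hebrew vowel points, values their Unicode names.
const VOWEL_NAMES: { [char: string]: string } = {
  "\u05B0": "SHEVA",
  "\u05B1": "HATAF_SEGOL",
  "\u05B2": "HATAF_PATAH",
  "\u05B3": "HATAF_QAMATS",
  "\u05B4": "HIRIQ",
  "\u05B5": "TSERE",
  "\u05B6": "SEGOL",
  "\u05B7": "PATAH",
  "\u05B8": "QAMATS",
  "\u05B9": "HOLAM",
  "\u05BA": "HOLAM_HASER_FOR_VAV",
  "\u05BB": "QUBUTS",
  "\u05C7": "QAMATS_QATAN",
};

// Returns the name of the first vowel point in the cluster text, or null if there is none.
function vowelName(clusterText: string): string | null {
  for (let i = 0; i < clusterText.length; i++) {
    const name = VOWEL_NAMES[clusterText.charAt(i)];
    if (name !== undefined) return name;
  }
  return null;
}
```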

Maybe `isVocalSheva` property on `Cluster` only

On the Cluster object, add a property called isVocalSheva that returns a boolean indicating if the shewa is vocal or not.

new Text("בְּאֶ֣רֶץ").clusters.map(c => c.isVocalSheva)
// [true, false, false, false]

Things to consider:

Perhaps null if there is no shewa?
vocal shewa or shewa na?

Letters that Reject Dagesh Chazaq, but have Shewa Na'

Certain letters—שׁ, שׂ, ס, צ, נ, מ, ל, ו, י—when they have a shewa na' (i.e. vocal shewa) reject a dagesh chazaq (i.e. forte).

E.g. וַיְּהִי* becomes וַיְהִי

Should be syllabified as: ["וַ", "יְ", "הִי"], but instead get ["וַיְ", "הִי"].

Some may consider the first syllable (i.e. "וַ") as closed, but it will be considered open.

Medial maters incorrect when strict is false

See: אֱלֹהֶ֑יךָ or רָקִ֖יעַ

Error with Forms with Shureq and Shewa

A word like: וּלְזַמֵּ֖ר throws the Error "A Syllable shouldn't precede a Cluster"

Multiple Maqqefs Not Split

A text like עַל־כָּל־הַשָּׂרִים֙ is only split into 2 Words. Should be 3
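
A minimal sketch of splitting on every maqqef, assuming that each word keeps its trailing maqqef (U+05BE). The function splitOnMaqqef is a hypothetical illustration, not the package's splitting logic.

```typescript
// Hypothetical splitter: a run of non-space, non-maqqef characters plus an optional maqqef.
function splitOnMaqqef(text: string): string[] {
  const matches = text.match(/[^\s\u05BE]+\u05BE?/g);
  return matches === null ? [] : matches.slice();
}

// The example above, written with escapes
const alKolHasarim = "\u05E2\u05B7\u05DC\u05BE\u05DB\u05BC\u05B8\u05DC\u05BE\u05D4\u05B7\u05E9\u05BC\u05C2\u05B8\u05E8\u05B4\u05D9\u05DD\u05A9";
const maqqefWords = splitOnMaqqef(alKolHasarim); // three words
```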

Loss of Dagesh Chazaq after Article and Interrogative

According to GKC §20m, there are instances after the article and the interrogative מה where the dagesh chazaq (or forte) is omitted:

Very frequently in certain consonants with Šewâ mobile, since the absence of a strong vowel causes the strengthening to be less noticeable. This occurs principally in the case of ו and י (on יְ and יֵּ after the article, see § 35 b; on יְּ after מַה־, § 37 b); and in the sonants מ‍,[6] נ‍ and ל; also in the sibilants, especially when a guttural follows (but note Is 629, מְאַסְפָיו, as ed. Mant. and Ginsb. correctly read, while Baer has מְאָֽסְ׳ with compensatory lengthening, and others even מְאָסְ׳; מִשְׁמַנֵּי Gn 2728, 39; מִשְׁלשׁ 38:24 for מִשְּׁ׳, הַֽשְׁלַבִּים 1 K 728; אֶֽשְֽׁקָה־ 1 K 1920 from נָשַׁק, הַֽשְׁפַתַּ֫יִם Ez 4043 and לַֽשְׁפַנִּים ψ 10418; מִשְׁתֵּים Jon 411, הַֽצְפַרְדְּעִים Ex 81 &c.);—and finally in the emphatic ק.[7]

Of the Begadkephath letters, ב occurs without Dageš in מִבְצִיר Ju 82; ג in מִגְבֽוּרָתָם Ez 3230; ד in נִדְחֵי Is 1112 56:8, ψ 1472 (not in Jer 4936), supposing that it is the Participle Niphʿal of נָדַח; lastly, ת in תִּתְצוּ Is 2210. Examples, עִוְרִים, וַיְהִי (so always the preformative יְ in the imperf. of verbs), מִלְמַ֫עְלָה, לַֽמְנַצֵּחַ, הִנְנִי, הַֽלֲלוּ, מִלְאוּ, כִּסְאִי, יִשְׂאוּ, יִקְחוּ, מַקְלוֹת, מִקְצֵה, &c. In correct MSS. the omission of the Dageš is indicated by the Rāphè stroke (§ 14) over the consonant. However, in these cases, we must assume at least a virtual strengthening of the consonant (Dageš forte implicitum, see § 22 c, end).

The second paragraph is likely beyond the scope of this package.

The first paragraph has three categories for when a dagesh chazaq may be lost, but the shewa should still be counted as a shewa naʿ (or shewa mobile/vocal):

in the case of ו and י (on יְ and יֵּ after the article, see § 35 b; on יְּ after מַה־, § 37 b)
in the sonants מ‍, נ‍ and ל
in the sibilants, especially when a guttural follows

The Article

Waltke & O'Connor §13.3d give a simplified explanation:

According to this, the shewa is a shewa nach, not a shewa na', seemingly contra GKC.

The Interrogative

GKC's references are ambiguous

see charlesLoder/hebrew-transliteration#14

...wip

Goal

In the forms with a metheg there is nothing to check. For the others, something like:

if cluster.hasShewa and /י/.test(cluster.text) and /הַ/.test(cluster.prev.text)

should syllabify as:

["מִן־", "הַ", "יְ", "אֹ֗ר"]

That would limit it only to the article, but it would be a start.
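
The check above could be written out roughly as follows. ClusterLike is a hypothetical minimal interface with only the three members the condition needs, not the package's Cluster class, and the function name is likewise an assumption.

```typescript
// Hypothetical minimal shape; the real Cluster class has more members.
interface ClusterLike {
  text: string;
  hasShewa: boolean;
  prev: ClusterLike | null;
}

// True when a yod with a shewa directly follows a he with a patah (the article).
function isShewaNaAfterArticle(cluster: ClusterLike): boolean {
  return (
    cluster.hasShewa &&
    /\u05D9/.test(cluster.text) &&              // yod
    cluster.prev !== null &&
    /\u05D4\u05B7/.test(cluster.prev.text)      // he + patah
  );
}
```

A syllabifier could use such a check to close the syllable after the article and treat the shewa as vocal.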

Allow incorrect syllabification option

Maybe an option like strict that, when false, allows for incorrect text.

Basically,

havarotjs/src/utils/syllabifier.ts

Lines 214 to 218 in 460869f

 
 if (nxt instanceof Syllable) {
   throw new Error("Syllable should not precede a Cluster with a Mater");
 }

and

havarotjs/src/utils/syllabifier.ts

Lines 265 to 269 in 460869f

 
 if (nxt instanceof Syllable) {
   throw new Error("Syllable should not precede a Cluster with a Mater");
 }

would need to be bypassed, along with errors of the form Cannot read properties of undefined (reading 'has<something>')

Goliath

See original issue here

See twitter thread

Paseq Should Not Be Included

The paseq is included when words are split (e.g. "עֵֽדֹתֶ֨יךָ ׀ "). Because the paseq is a word divider, it should not be included. Additionally, it interferes with adding accents to syllables, causing the final kaf in the example above to be marked as accented.

The paseq should be counted as its own word with a single syllable similar to how non-Hebrew words are handled.
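
A small sketch of a tokenizer that treats the paseq (U+05C0) as its own word. The function splitWithPaseq is a hypothetical illustration and ignores maqqef handling.

```typescript
// Hypothetical tokenizer: a paseq is always a token of its own;
// every other token is a run of non-space, non-paseq characters.
function splitWithPaseq(text: string): string[] {
  const matches = text.match(/\u05C0|[^\s\u05C0]+/g);
  return matches === null ? [] : matches.slice();
}

// The example above: a word, a space, a paseq, and a trailing space
const paseqTokens = splitWithPaseq("\u05E2\u05B5\u05BD\u05D3\u05B9\u05EA\u05B6\u05A8\u05D9\u05DA\u05B8 \u05C0 ");
```

With the paseq separated this way, accent assignment for the preceding word no longer sees the paseq.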

Delete Class Interfaces

Need to delete the interfaces for the classes as they are not being used correctly

See:

Improve Qamets Qatan for 'kol'

Currently, qamets qatan for the word כל is only recognized when a maqqef is present

https://github.com/charlesLoder/havarot/blob/b299902f4980c838b98fd15d21988f1eda93bb31/src/utils/qametsQatan.ts#L65-L66

This could be potentially fixed with adding

"^כָּל$",
"^כָל$",

For clarity that should be ^kaf+(dagesh)+qamets+lamed$
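
The two patterns can also be folded into one regex, sketched below. The function kolWithQametsQatan is hypothetical and does not reflect how qametsQatan.ts applies its patterns. The regex accepts the dagesh on either side of the qamets, since Unicode normalization places the qamets before the dagesh.

```typescript
const QAMETS = "\u05B8";
const QAMETS_QATAN = "\u05C7";

// ^kaf + (dagesh) + qamets + lamed$, with the dagesh allowed before or after the qamets
const KOL = /^\u05DB(?:\u05BC?\u05B8|\u05B8\u05BC)\u05DC$/;

// Hypothetical helper: swaps the qamets for a qamets qatan only in a bare "kol".
function kolWithQametsQatan(word: string): string {
  return KOL.test(word) ? word.replace(QAMETS, QAMETS_QATAN) : word;
}
```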

Single syllable holem waw lose final letter

Single syllable words spelled with a holem waw lose the final letter.

Examples are : י֔וֹם and ע֜וֹד

The issue lies in holemWaw.ts

Allow for Hebrew without niqqud

Currently, only text with niqqud is allowed

https://github.com/charlesLoder/havarot/blob/edded3c45e46d6c524826945f31faa7f15adb89d/src/text.ts#L135-L141

It may be good to have an option that makes it explicit that the user would like to syllabify words w/o niqqud — I'm not totally sure what that'll look like though

Shureqs not working when `strict` false

Hebrew רוּחַ

Overall, when strict is true everything should still be correct

Standardize Hebrew character names

The names of Hebrew characters should be standardized (e.g. shewa becomes sheva).

Also update the documentation

Find more words

Like מָחֳרָת and יָרָבְעָם

Divine name dropping latin char after

See here

The Latin character after the Divine Name is dropped

Hebrew

כִּי אִם בְּתוֹרַת יְהוָה, חֶפְצוֹ; וּבְתוֹרָתוֹ יֶהְגֶּה, יוֹמָם וָלָיְלָה

Transliteration

kî ʾim bǝtôrat yhwh ḥepṣô; ûbǝtôrātô yehgê, yômām wālāyǝlâ

Add `hasVowel` property to `Cluster`

In the same vein as #74 and #75, create a property (maybe method is the better term here) called hasVowel that takes a vowel name and returns a boolean

new Text("בְּאֶ֣רֶץ").clusters.map(c => c.hasVowel("SEGOL"));
// [false, true, true, false]

Add premade syllabification schemas

The way it checks for a schema and then sets options according to that isn't intuitive.

Instead, create premade syllabification schemas that can just be imported

Error Handling for Texts w/o Niqqud

Should throw an error if text without niqqud is passed in.
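
A minimal sketch of such a guard. The function name, the error message, and the choice of which code points count as niqqud (U+05B0 to U+05BB plus U+05C7) are all assumptions for illustration.

```typescript
// Vowel points only; dagesh, meteg, and taamim are deliberately not counted here.
const NIQQUD = /[\u05B0-\u05BB\u05C7]/;

// Hypothetical guard: throws when the text has no vowel points at all.
function assertHasNiqqud(text: string): void {
  if (!NIQQUD.test(text)) {
    throw new Error("Text must contain niqqud");
  }
}
```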

Add `vowel` property to `Cluster`

On the Cluster object, add a property that returns the Unicode character of the vowel.

Something like:

new Text("בְּאֶ֣רֶץ").clusters.map(c => c.vowel)

The first three should return the vowel characters of SHEVA, SEGOL, and SEGOL, and the final should return null.

Update repo name and branding to havarotjs

I went simply w/ havarot at first, then found out that it was taken on npm. I should change everything for consistency

Treat hyphen or double hyphen like maqqef

Some texts use a hyphen or double hyphen like a maqqef
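
One way to handle this would be to normalize such hyphens to a maqqef before the text is split into words. The sketch below is a hypothetical preprocessing step that only rewrites a hyphen or double hyphen sitting between Hebrew characters.

```typescript
// Hypothetical normalizer: "-" or "--" between Hebrew characters becomes a maqqef (U+05BE).
// The preceding character may be a letter or a mark; the following one must be a letter.
function hyphensToMaqqef(text: string): string {
  return text.replace(/([\u0591-\u05EA])-{1,2}(?=[\u05D0-\u05EA])/g, "$1\u05BE");
}
```

Hyphens in non-Hebrew text are left alone, so mixed-language input is not altered.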

Pipe character causing errors

The pipe character (e.g. אֲשֶׁר | אָֽנֹכִי) causes the error Cannot read properties of undefined (reading 'hasVowel').

Some texts use a pipe character instead of a paseq.

The pipe characters are separated into their own words, and when they are syllabified, all the Latin chars are removed and an empty array is used when trying to group clusters

https://github.com/charlesLoder/havarot/blob/1dd198029947386b03dd1433d8fadeece0bfd57b/src/utils/syllabifier.ts#L379

In order to fix this, add a check to see if the Word is Hebrew or not. If not, just make a single syllable, as is done with the Divine Name
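
A rough sketch of that check. Both function names are hypothetical, and the syllabifier is passed in as a parameter so the sketch does not depend on the package's internals.

```typescript
const HEBREW_LETTER = /[\u05D0-\u05EA]/;

// Hypothetical check: a word counts as Hebrew if it contains at least one Hebrew letter.
function isHebrewWord(word: string): boolean {
  return HEBREW_LETTER.test(word);
}

// Non-Hebrew words (e.g. a pipe) become a single syllable and skip the syllabifier.
function syllableTexts(word: string, syllabify: (w: string) => string[]): string[] {
  return isHebrewWord(word) ? syllabify(word) : [word];
}
```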

Lines 86 to 92 in 6ca463a

 const sequenceSnippets = (arr: string[]) => {
   return arr.map((snippet) => {
     const text = snippet.normalize("NFKD");
     const sequencedChar = sequence(text).flat();
     return sequencedChar.reduce((a, c) => a + c.text, "");
   });
 };

To something like

const sequenceSnippets = (arr: string[]) => {
  return arr.map((snippet) => sequence(snippet.normalize("NFKD")).flat().join(""));
};
