freebiblesindia / punjabi_bible

Punjabi Bible (ਪੰਜਾਬੀ ਬਾਈਬਲ). This work is made available under a Creative Commons Attribution-ShareAlike 4.0 International License.

Home Page: http://www.freebiblesindia.in/bible/pan/

License: Other

punjabi_bible's Issues

Canonical Psalm titles should use \d rather than \s ; acrostic headings should use \qa

Currently in the Psalms there are

175 instances of \s - being used as non-canonical section headings
137 instances of \s1 - being used improperly

The proper USFM tag for the 116 canonical Psalm titles is \d.
The 22 acrostic stanza headings in Psalm 119 should use the tag \qa.

btw. There must be one canonical title missing, as 116 + 22 = 138 but only 137 instances of \s1 were found.
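The marker counts above can be reproduced with a short script. This is only a sketch in Python, assuming the Psalms USFM file has been read into a string; the function name `count_markers` and the sample text are hypothetical and not part of the repo.

```python
import re
from collections import Counter

# Matches a USFM marker at the start of a line, e.g. \s, \s1, \d, \qa, \c.
MARKER_RE = re.compile(r"^\\([a-z]+[0-9]*)\b", re.MULTILINE)

def count_markers(usfm_text):
    """Count how often each line-initial USFM marker occurs."""
    return Counter(MARKER_RE.findall(usfm_text))

# Hypothetical miniature sample, not taken from the actual Psalms file.
sample = "\\c 3\n\\s1 A Psalm of David\n\\q1 \\v 1 ...\n\\s A section heading\n"
counts = count_markers(sample)
print(counts["s"], counts["s1"], counts["d"], counts["qa"])
```

After the markers have been corrected, one would expect the counts for \d and \qa to add up to 116 + 22 = 138 and the count for \s1 to drop to zero.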

The chapter label tag \cl occurs only once in Psalm 1 after the chapter marker.

\c 1
\ms Book One
\mr Psalms 1-41
\cl Psalm 1

If you wish each Psalm to be properly labeled with the word Psalm rather than Chapter,
then you should use the chapter label tag before the marker for chapter 1, thus:

\cl Psalm
\c 1
\ms Book One
\mr Psalms 1-41

Refer to the USFM User Reference for details.
NB. Some Bible software apps may not support this feature.

Misplaced question mark in 2 Kings 1:6

2 Kings 1:6 reads:
\v 6 ਉਨ੍ਹਾਂ ਨੇ ਉਸ ਨੂੰ ਉੱਤਰ ਦਿੱਤਾ, “ਇੱਕ ਮਨੁੱਖ ਸਾਨੂੰ ਮਿਲਣ ਲਈ ਆਇਆ ਅਤੇ ਸਾਨੂੰ ਆਖਿਆ ਕਿ ਜਿਸ ਰਾਜੇ ਨੇ ਤੁਹਾਨੂੰ ਭੇਜਿਆ ਉਸ ਦੇ ਕੋਲ ਮੁੜ ਜਾਓ ਅਤੇ ਉਸ ਨੂੰ ਆਖੋ ਕਿ ਯਹੋਵਾਹ ਇਸ ਤਰ੍ਹਾਂ ਕਹਿੰਦਾ ਹੈ, ‘ਕੀ ਇਸਰਾਏਲ ਵਿੱਚ ਕੋਈ ਪਰਮੇਸ਼ੁਰ ਨਹੀਂ ਹੈ ਜੋ ਤੂੰ ਅਕਰੋਨ ਦੇ ਦੇਵਤੇ ਬਆਲ-ਜਬੂਬ ਕੋਲ ਪੁੱਛਣ ਲਈ ਭੇਜਦਾ ਹੈਂ’? ਇਸ ਲਈ ਜਿਸ ਬਿਸਤਰ ਉੱਤੇ ਤੂੰ ਪਿਆ ਹੈਂ, ਉਸ ਤੋਂ ਤੂੰ ਨਹੀਂ ਉੱਠੇਂਗਾ ਸਗੋਂ ਤੂੰ ਜ਼ਰੂਰ ਮਰੇਂਗਾ ।”

Shouldn't the question mark actually come before the closing quotation mark? i.e.
\v 6 ਉਨ੍ਹਾਂ ਨੇ ਉਸ ਨੂੰ ਉੱਤਰ ਦਿੱਤਾ, “ਇੱਕ ਮਨੁੱਖ ਸਾਨੂੰ ਮਿਲਣ ਲਈ ਆਇਆ ਅਤੇ ਸਾਨੂੰ ਆਖਿਆ ਕਿ ਜਿਸ ਰਾਜੇ ਨੇ ਤੁਹਾਨੂੰ ਭੇਜਿਆ ਉਸ ਦੇ ਕੋਲ ਮੁੜ ਜਾਓ ਅਤੇ ਉਸ ਨੂੰ ਆਖੋ ਕਿ ਯਹੋਵਾਹ ਇਸ ਤਰ੍ਹਾਂ ਕਹਿੰਦਾ ਹੈ, ‘ਕੀ ਇਸਰਾਏਲ ਵਿੱਚ ਕੋਈ ਪਰਮੇਸ਼ੁਰ ਨਹੀਂ ਹੈ ਜੋ ਤੂੰ ਅਕਰੋਨ ਦੇ ਦੇਵਤੇ ਬਆਲ-ਜਬੂਬ ਕੋਲ ਪੁੱਛਣ ਲਈ ਭੇਜਦਾ ਹੈਂ ?’ ਇਸ ਲਈ ਜਿਸ ਬਿਸਤਰ ਉੱਤੇ ਤੂੰ ਪਿਆ ਹੈਂ, ਉਸ ਤੋਂ ਤੂੰ ਨਹੀਂ ਉੱਠੇਂਗਾ ਸਗੋਂ ਤੂੰ ਜ਼ਰੂਰ ਮਰੇਂਗਾ ।”

After all, in English, the quotation is a question:
'Is there no God in Israel that you send to inquire of Baal Zebub, the god of Ekron?'
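Other verses may have the same problem. As a rough sketch (Python, with a hypothetical function name), the following lists lines where a question mark directly follows a closing quotation mark. Whether each hit is really misplaced still has to be judged in context.

```python
import re

# A closing single or double quotation mark, optional blanks, then a question mark.
QMARK_OUTSIDE_RE = re.compile(r"[\u2019\u201D][ \t]*\?")

def question_marks_outside_quotes(usfm_text):
    """Return (line number, line) for each line with a '?' just after a closing quote."""
    return [
        (number, line)
        for number, line in enumerate(usfm_text.splitlines(), 1)
        if QMARK_OUTSIDE_RE.search(line)
    ]
```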

Should there be a space before the Devanagari Danda?

In the Punjabi Bible
A search for the regexp \S\x{0964} gave 1547 hits.
A search for the regexp \s\x{0964} gave 21326 hits.
Here, those without a space are in a minority, being less than 6.8% of the total.

This prompts the question:

Should there be a space before the Devanagari Danda?

cf. In the Assamese Bible, the results are quite the opposite!
Those with a space are in a minority, being less than 6.3% of the total.

A search for the regexp \S\x{0964} gave 27756 hits.
A search for the regexp \s\x{0964} gave 1855 hits.

NB. These results relate to my fork of the repo after my commits to the master branch.
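For anyone wishing to repeat the comparison on another repo, here is a minimal Python sketch of the two searches and the percentage calculation. The function names are my own, and the figures fed in at the end are simply the counts quoted above.

```python
import re

DANDA = "\u0964"  # DEVANAGARI DANDA, also used in Gurmukhi and Assamese text

def danda_spacing(text):
    """Return (dandas without preceding whitespace, dandas with preceding whitespace)."""
    no_space = len(re.findall(r"\S" + DANDA, text))
    with_space = len(re.findall(r"\s" + DANDA, text))
    return no_space, with_space

def minority_share(a, b):
    """Percentage of the smaller count relative to the total of both counts."""
    return 100.0 * min(a, b) / (a + b)

print(round(minority_share(1547, 21326), 2))  # Punjabi counts quoted above
print(round(minority_share(27756, 1855), 2))  # Assamese counts quoted above
```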

What is the typographical standard in this matter for the various languages that use an Indic script?

NB. If some sort of space is required before the Danda, it's conceivable that it should be U+2008 PUNCTUATION SPACE rather than an ordinary space.

Two verses that end with an English letter

Deuteronomy 1:8 ends with the letter i
\v 8 ਵੇਖੋ, ਮੈਂ ਇਸ ਦੇਸ਼ ਨੂੰ ਤੁਹਾਡੇ ਸਾਹਮਣੇ ਰੱਖ ਦਿੱਤਾ ਹੈ, ਜਿਸ ਦੇਸ਼ ਦੀ ਯਹੋਵਾਹ ਨੇ ਤੁਹਾਡੇ ਪਿਉ-ਦਾਦਿਆਂ ਨਾਲ ਅਰਥਾਤ ਅਬਰਾਹਾਮ, ਇਸਹਾਕ ਅਤੇ ਯਾਕੂਬ ਨਾਲ ਸਹੁੰ ਖਾਧੀ ਸੀ ਕਿ ਮੈਂ ਇਸ ਦੇਸ਼ ਨੂੰ ਤੁਹਾਨੂੰ ਅਤੇ ਤੁਹਾਡੇ ਬਾਅਦ ਤੁਹਾਡੇ ਵੰਸ਼ ਨੂੰ ਦਿਆਂਗਾ, ਇਸ ਲਈ ਜਾਓ ਅਤੇ ਇਸ ਦੇਸ਼ ਨੂੰ ਆਪਣੇ ਅਧੀਨ ਕਰ ਲਓ । " i

Jeremiah 31:40 ends with the letter s
\v 40 ਤਾਂ ਲੋਥਾਂ ਅਤੇ ਸੁਆਹ ਦੀ ਸਾਰੀ ਵਾਦੀ ਅਤੇ ਸਾਰੇ ਖੇਤ ਕਿਦਰੋਨ ਦੇ ਨਾਲੇ ਤੱਕ ਅਤੇ ਘੇੜੇ ਫਾਟਕ ਦੀ ਨੁੱਕਰ ਤੱਕ ਚੜ੍ਹਦੇ ਪਾਸੇ ਵੱਲ ਯਹੋਵਾਹ ਲਈ ਪਵਿੱਤਰ ਹੋਣਗੇ ਅਤੇ ਉਹ ਫਿਰ ਸਦਾ ਤੱਕ ਨਾ ਕਦੀ ਪੁੱਟਿਆ ਜਾਵੇਗਾ ਨਾ ਡੇਗਿਆ ਜਾਵੇਗਾ । s

These letters are superfluous.
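A search along the following lines should find such verses. It is a sketch only, in Python, and assumes each verse sits on a single line starting with \v; the name `verses_ending_in_latin` is hypothetical.

```python
import re

# A verse line whose last non-blank character is an ASCII letter.
TRAILING_LATIN_RE = re.compile(r"^\\v \d+ .*[A-Za-z][ \t]*$", re.MULTILINE)

def verses_ending_in_latin(usfm_text):
    """Return every verse line that ends with an ASCII letter."""
    return [match.group(0) for match in TRAILING_LATIN_RE.finditer(usfm_text)]
```

Verses that end with a closing USFM marker such as \f* are not reported, because their last character is an asterisk.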

Lookalike characters keyed instead of the Devanagari Danda

I have discovered that the USFM files have numerous instances where a lookalike character has been keyed instead of the Devanagari Danda.

In the Punjabi Bible, a search for the regexp [\x{0A00}-\x{0AFF}]\s*\x6C gave 5172 hits.
These are where the lowercase letter l has been keyed just after a Gurmukhi character.
There were even 2 exclamation marks with no space just before a Gurmukhi letter.
A search for the whole word l gave 5240 hits, 68 more being found.
This merely indicates that some instances are after a punctuation mark instead.
An improved search for l found a total of 5247 hits.
All of these can be safely replaced by a Danda.

It's also possible that many of the existing exclamation marks in the text may be miskeyed lookalikes.
Without understanding each context, this is not something that can be determined merely by counting.

There are 3022 hits to the regexp [\x{0A00}-\x{0AFF}]\s*\x21
There are 3045 exclamation marks in total.

The 16 hits for !! strongly suggest that these were miskeyed for U+0965 Devanagari Double Danda.
There are 18 hits for the regexp \x{0964}\s*\x{0964} that should also be replaced by the Double Danda ॥.

There were also 62 hits for the pattern ! l and 103 hits for the regexp !\s*\x{0964} that are further candidates for Double Danda. These will be taken care of by doing the replacements in the right order.
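One possible ordering of the replacements is sketched below in Python. It assumes that every candidate pair really is a Double Danda, which still needs checking in context, and it leaves single exclamation marks alone. The function name `fix_dandas` is hypothetical. Note that the Gurmukhi block proper is U+0A00 to U+0A7F; the range U+0A00 to U+0AFF used in the searches above also takes in the Gujarati block.

```python
import re

DANDA = "\u0964"         # DEVANAGARI DANDA
DOUBLE_DANDA = "\u0965"  # DEVANAGARI DOUBLE DANDA

# A lowercase l standing alone: not part of a Latin word, nor of a USFM
# marker such as \cl or \li.
LONE_L_RE = re.compile(r"(?<![A-Za-z\\])l(?![A-Za-z0-9])")

def fix_dandas(text):
    # Step 1: miskeyed single dandas first, so that "! l" becomes "! ।".
    text = LONE_L_RE.sub(DANDA, text)
    # Step 2: pairs that are assumed to stand for a double danda.
    # Only blanks are allowed between the two marks, never a line break.
    text = re.sub(r"!!", DOUBLE_DANDA, text)
    text = re.sub(r"![ \t]*" + DANDA, DOUBLE_DANDA, text)
    text = re.sub(DANDA + r"[ \t]*" + DANDA, DOUBLE_DANDA, text)
    return text
```

Doing step 1 first means the 62 instances of ! l are picked up by the same rule as the 103 instances of ! followed by a Danda.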

Quotation marks

The attached text file provides a character frequency analysis of the 66 USFM files.

merged.usfm.character.frequency.txt

For this issue the following lines are of particular interest.

U+0022	"	1,462	QUOTATION MARK
U+0027	'	171	APOSTROPHE
U+2018	‘	15	LEFT SINGLE QUOTATION MARK
U+2019	’	15	RIGHT SINGLE QUOTATION MARK
U+201C	“	640	LEFT DOUBLE QUOTATION MARK
U+201D	”	638	RIGHT DOUBLE QUOTATION MARK

It's evident that this translation makes no use of continuation quotation marks.

It's apparent that not all the quotations make proper use of left and right quotation marks.
To achieve consistent punctuation of quotations in the translation:

Most of the U+0022 pairs will need to be replaced by U+201C and U+201D.
Most of the U+0027 pairs will need to be replaced by U+2018 and U+2019.
NB. As 171 is not an even number, there must be at least one instance of U+0027 that has no corresponding mark at the other end of the quotation.

Notice also the difference of 2 between the counts of U+201C and U+201D.
With some ingenuity in method, I have traced this problem to John 16 where the marking of quotations does not fully reflect the text.
The translation team needs to revisit this chapter and make suitable corrections.
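For reference, here is one way such a mismatch could be narrowed down. This is a Python sketch with hypothetical function names, not necessarily the method I used. It counts the quotation characters, then compares the opening and closing double quotation marks chapter by chapter. A quotation can legitimately run across a chapter boundary, so a flagged chapter is only a lead to follow up.

```python
import re
import unicodedata
from collections import Counter

QUOTE_CHARS = "\"'\u2018\u2019\u201C\u201D"

def quote_frequencies(text):
    """Return (code point, Unicode name, count) for each quotation character."""
    counts = Counter(ch for ch in text if ch in QUOTE_CHARS)
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch), counts[ch])
        for ch in QUOTE_CHARS
    ]

def unbalanced_chapters(usfm_text):
    """Return (chapter, opening, closing) where U+201C and U+201D counts differ."""
    # With a capture group, re.split returns [preamble, num, body, num, body, ...].
    parts = re.split(r"^\\c (\d+)[ \t]*$", usfm_text, flags=re.MULTILINE)
    result = []
    for number, body in zip(parts[1::2], parts[2::2]):
        opening, closing = body.count("\u201C"), body.count("\u201D")
        if opening != closing:
            result.append((int(number), opening, closing))
    return result
```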

A counted words list to help with proof reading

The attached tab delimited text file is a counted words list derived from the Punjabi Bible USFM files.

merged.usfm.words.count.txt

This may be of considerable help with proofreading.
The counts are in the first field, the words in the second field.
The output is sorted on the words field, so those with similar spellings will be near to each other.
Browsing through the list may therefore bring to light any words with anomalous spelling.

The file can be dropped into Microsoft Excel™ for further analysis by re-sorting, filtering, etc.

Notes:

The file was output from a bespoke TextPipe filter using Count duplicate lines.
Hyphenated words were preserved.
All the Gurmukhi text was included, not just the verse text.
The sort order is simply TextPipe's own collation.
If you sort with Excel™, the words will not come out in the same order.

Some observations:

There are 21818 different words in total
The most common word is ਦੇ which is found 34846 times
The longest 2 words have 16 characters
There are 8695 hapax legomena (words with count=00001)
There are 712 hyphenated words of which 7 have more than 1 hyphen, namely:

00001	ਬਏਰ-ਲਹਈ-ਰੋਈ
00002	ਬਏਰ-ਲਹੀ-ਰੋਈ
00002	ਮਹੇਰ-ਸ਼ਲਾਲ-ਹਾਸ਼-ਬਜ਼
00001	ਅਟਰੋਥ-ਬੈਤ-ਯੋਆਬ
00001	ਆਬੇਲ-ਬੈਤ-ਮਆਕਾਹ
00001	ਏਲੋਨ-ਬੈਤ-ਹਨਾਨ
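The list itself came from TextPipe, but a comparable list can be built with a few lines of Python. The sketch below is an approximation under my own assumptions: a word is taken to be a run of characters from the Gurmukhi block (U+0A00 to U+0A7F), optionally joined by hyphens, and the sort is by code point. The totals it gives may therefore differ slightly from the figures above. All function names are hypothetical.

```python
import re
from collections import Counter

# A word: runs of Gurmukhi characters, optionally joined by single hyphens.
WORD_RE = re.compile(r"[\u0A00-\u0A7F]+(?:-[\u0A00-\u0A7F]+)*")

def counted_words(text):
    """Count every Gurmukhi word, keeping hyphenated words whole."""
    return Counter(WORD_RE.findall(text))

def as_tab_delimited(counts):
    """One 'count<TAB>word' line per word, sorted by the word's code points."""
    return "\n".join(f"{n:05d}\t{word}" for word, n in sorted(counts.items()))

def hapax_legomena(counts):
    """Words that occur exactly once."""
    return sorted(word for word, n in counts.items() if n == 1)

def multi_hyphen_words(counts):
    """Words containing more than one hyphen."""
    return sorted(word for word in counts if word.count("-") > 1)
```

The Danda (U+0964) lies outside the Gurmukhi block, so it is never counted as part of a word.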

Should there be a space before a comma?

A search for the regexp [\x{0A00}-\x{0AFF}] , gave 249 hits.
A search for the regexp [\x{0A00}-\x{0AFF}], gave 37335 hits.

Those without a space before the comma are the majority.
Those with a space are only 0.66% of the total.

Which is preferred? @joshykurian

Widening the searches somewhat:

Regexp \s, gave 254 hits.
Regexp \S, gave 37855 hits.

These counts include cases where the preceding character is not in the Gurmukhi block.

The 5 extra instances of a space before a comma are all , , typos, which have a comma both before and after a space:

\v 3 ਤਦ ਯਸਾਯਾਹ ਨਬੀ ਹਿਜ਼ਕੀਯਾਹ ਰਾਜਾ ਦੇ ਕੋਲ ਆਇਆ ਅਤੇ ਉਸ ਨੂੰ ਪੁੱਛਿਆ, ਇਨ੍ਹਾਂ ਮਨੁੱਖਾਂ ਨੇ ਕੀ ਆਖਿਆ ਅਤੇ ਉਹ ਕਿੱਥੋਂ ਤੇਰੇ ਕੋਲ ਆਏ ਹਨ ? ਹਿਜ਼ਕੀਯਾਹ ਨੇ ਅੱਗੋਂ ਉੱਤਰ ਦਿੱਤਾ, , ਉਹ ਇੱਕ ਦੂਰ ਦੇ ਦੇਸ ਤੋਂ ਮੇਰੇ ਕੋਲ ਆਏ, ਅਰਥਾਤ ਬਾਬਲ ਤੋਂ ।
\v 3 ਹੇ ਯਾਕੂਬ ਦੇ ਘਰਾਣੇ, ਮੇਰੀ ਸੁਣੋ, ਨਾਲੇ ਇਸਰਾਏਲ ਦੇ ਘਰਾਣੇ ਦੇ ਸਾਰੇ ਬਚੇ ਹੋਇਓ, ਤੁਸੀਂ ਜਿਨ੍ਹਾਂ ਨੂੰ ਮੈਂ ਜਨਮ ਤੋਂ ਸੰਭਾਲਿਆ ਅਤੇ ਕੁੱਖੋਂ ਹੀ ਚੁੱਕੀ ਫਿਰਦਾ ਰਿਹਾ, ,
\v 12 ਤੂੰ ਆਪਣੀਆਂ ਝਾੜਾ-ਫੂਕੀਆਂ ਵਿੱਚ, ਅਤੇ ਆਪਣੀਆਂ ਜਾਦੂਗਰੀਆਂ ਦੇ ਵਾਧੇ ਵਿੱਚ ਕਾਇਮ ਰਹਿ, , ਜਿਨ੍ਹਾਂ ਵਿੱਚ ਤੂੰ ਆਪਣੀ ਜੁਆਨੀ ਤੋਂ ਮਿਹਨਤ ਕੀਤੀ, ਸ਼ਾਇਦ ਤੈਨੂੰ ਲਾਭ ਹੋ ਸੱਕੇ, ਸ਼ਾਇਦ ਤੂੰ ਉਹਨਾਂ ਨੂੰ ਡਰਾ ਸਕੇਂ !
\v 6 ਆਪਣੀਆਂ ਅੱਖਾਂ ਅਕਾਸ਼ ਵੱਲ ਚੁੱਕੋ, ਅਤੇ ਹੇਠਾਂ ਧਰਤੀ ਉੱਤੇ ਨਿਗਾਹ ਮਾਰੋ, , ਅਕਾਸ਼ ਤਾਂ ਧੂੰਏਂ ਵਾਂਗੂੰ ਅਲੋਪ ਹੋ ਜਾਵੇਗਾ, ਅਤੇ ਧਰਤੀ ਕੱਪੜੇ ਵਾਂਗੂੰ ਪੁਰਾਣੀ ਪੈ ਜਾਵੇਗੀ, ਉਹ ਦੇ ਵਾਸੀ ਮੱਖੀਆਂ ਵਾਂਗੂੰ ਮਰ ਜਾਣਗੇ, ਪਰ ਮੇਰੀ ਮੁਕਤੀ ਸਦੀਪਕ ਹੋਵੇਗੀ, ਅਤੇ ਮੇਰਾ ਧਰਮ ਅਨੰਤ ਹੋਵੇਗਾ ।
\v 17 ਵੇਖੋ, ,ਮੈਂ ਉਹਨਾਂ ਲਈ ਤਲਵਾਰ, ਕਾਲ ਅਤੇ ਬਵਾ ਨੂੰ ਘੱਲਾਂਗਾ, ਸੈਨਾਂ ਦਾ ਯਹੋਵਾਹ ਇਸ ਤਰ੍ਹਾਂ ਆਖਦਾ ਹੈ, ਮੈਂ ਉਹਨਾਂ ਨੂੰ ਸੜੀਆਂ ਹੋਈਆਂ ਹਜੀਰਾਂ ਵਾਂਗੂੰ ਬਣਾਵਾਂਗਾ ਜਿਹੜੀਆਂ ਖਰਾਬ ਹੋਣ ਦੇ ਕਾਰਨ ਖਾਧੀਆਂ ਨਹੀਂ ਜਾਂਦੀਆਂ

These typos must be corrected as well.
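To locate the doubled commas in each file, something like the following Python sketch would do (the function name is hypothetical). It only reports the lines. The fix is best made by hand, since in some places the second comma may stand for other punctuation.

```python
import re

# A comma, one or more blanks, then another comma.
DOUBLED_COMMA_RE = re.compile(r",[ \t]+,")

def doubled_comma_lines(usfm_text):
    """Return (line number, line) for every line containing a ', ,' typo."""
    return [
        (number, line)
        for number, line in enumerate(usfm_text.splitlines(), 1)
        if DOUBLED_COMMA_RE.search(line)
    ]
```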

Preliminary tidy up

As a preliminary tidy up, I'd like to run the following 3 filters on the 66 USFM files:

Remove blanks from End of Line
Remove any blank lines
Remove multiple whitespace

Before I do, @joshykurian please advise whether the last one is acceptable.
cf. There are 2926 instances of a double space (2 spaces), of which:

1264 are after a Devanagari Danda
1430 are after a question mark
230 are after a Gurmukhi character
1 is after a paragraph tag \p
1 is after a right single quotation mark

btw. There are 7050 instances of lines ending with a space.
Removing these at this stage will help me to deal with lines that visibly end with various punctuation marks.
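For anyone without TextPipe, the three filters can be approximated in Python as below. This is a sketch under my own reading of the filter names: in particular, "Remove multiple whitespace" is taken to mean collapsing each run of two or more spaces into one. The function names are hypothetical.

```python
import re

def strip_trailing_blanks(text):
    """Remove spaces and tabs at the end of every line."""
    return re.sub(r"[ \t]+$", "", text, flags=re.MULTILINE)

def remove_blank_lines(text):
    """Drop lines that are empty or hold only whitespace."""
    lines = [line for line in text.splitlines() if line.strip()]
    return "\n".join(lines) + ("\n" if lines else "")

def collapse_spaces(text):
    """Replace each run of two or more spaces by a single space."""
    return re.sub(r" {2,}", " ", text)

def tidy(text):
    """Apply the three filters in the order listed above."""
    return collapse_spaces(remove_blank_lines(strip_trailing_blanks(text)))
```

Note that `splitlines()` also normalises Windows line endings to plain line feeds, which may or may not be wanted.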

Space missing in book name found in one parallel passage heading

The parallel passage heading before 1 Kings 22:29 reads:
\r (2ਇਤਿਹਾਸ 18:28-34)
The space is missing after the 2 in the book name. It should read:
\r (2 ਇਤਿਹਾਸ 18:28-34)
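The same slip could be checked for across all the files. Here is a small Python sketch, with hypothetical function names, that flags any \r line where a Western digit is immediately followed by a Gurmukhi character, and shows the corresponding fix.

```python
import re

# A Western digit immediately followed by a Gurmukhi character (U+0A00 to U+0A7F).
FUSED_RE = re.compile(r"(\d)([\u0A00-\u0A7F])")

def fused_book_names(usfm_text):
    """Return the \\r lines in which a digit runs straight into Gurmukhi text."""
    return [
        line
        for line in usfm_text.splitlines()
        if line.startswith("\\r ") and FUSED_RE.search(line)
    ]

def insert_space(line):
    """Put a space between the digit and the Gurmukhi character that follows it."""
    return FUSED_RE.sub(r"\1 \2", line)
```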

The non-use of Gurmukhi digits?

A search for the regexp [\x{0A66}-\x{0A6F}] gave no hits, showing that no use is made of Gurmukhi digits.

i.e. All scripture references in parallel passage markers and book introductions use ordinary Western digits for chapter and verse numbers and the numerical part of some book names.

Even so, this prompts the question:

Why not use the Gurmukhi digits in the Punjabi Bible?
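If the team did decide to use them, the conversion itself is mechanical. The Python sketch below (function names hypothetical) shows the check for existing Gurmukhi digits and a digit-for-digit translation. It should be applied only to reference text, such as the content of \r lines, and never to whole files, as that would corrupt markers like \s1 and \q2 and the numbers after \c and \v.

```python
import re

GURMUKHI_DIGIT_RE = re.compile(r"[\u0A66-\u0A6F]")

# Maps the Western digits 0-9 to GURMUKHI DIGIT ZERO..NINE (U+0A66..U+0A6F).
TO_GURMUKHI = str.maketrans(
    "0123456789", "".join(chr(0x0A66 + i) for i in range(10))
)

def has_gurmukhi_digits(text):
    """True if the text contains at least one Gurmukhi digit."""
    return GURMUKHI_DIGIT_RE.search(text) is not None

def to_gurmukhi_digits(text):
    """Replace every Western digit in the text by its Gurmukhi counterpart."""
    return text.translate(TO_GURMUKHI)
```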

Marking proper names in Indic scripts?

Unlike Latin scripts and several other writing systems, Indic scripts do not have any equivalent to capitalising the first letter of a word to show that it's a proper name.

For new readers of the Bible in Indic languages, this may be somewhat daunting if they have no immediate means to recognise that a particular word is a proper name.

USFM 2.4 already provides the marker pair \pn_...\pn* for marking proper names.
USFM 3.0 documents a further pair \png_...\png* to mark geographic names.

Might it be worthwhile to make use of these markers in all the Bibles maintained by FreeBiblesIndia?

This would give app developers the chance to add features to toggle the display of all proper names in [say] a different text colour (as a user option).

Unless someone makes a start in such a direction, there will be no motivation for app developers to even consider such a notion.
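To give an idea of how little an app would need in order to pick up the marked names, here is a Python sketch that extracts the text inside \pn ...\pn* and \png ...\png* pairs. The function name is hypothetical, and no file in the repo uses these markers yet.

```python
import re

# \pn name\pn* or \png name\png*; the closing marker must match the opening one.
NAME_RE = re.compile(r"\\(pn|png) (.+?)\\\1\*")

def marked_names(usfm_text):
    """Return (marker, name) for every proper-name span in the text."""
    return [(m.group(1), m.group(2)) for m in NAME_RE.finditer(usfm_text)]
```

An app could then render each span in a different colour, or strip the markers and show plain text when the option is switched off.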

Multiple hyphens

A search for the regexp --+ gave 39 hits, of which 17 are triple hyphens, the rest are doubles.

Suggestion:

Replace triple hyphens by U+2015 HORIZONTAL BAR
Replace double hyphens by U+2014 EM DASH

Any objections? @joshykurian
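If the suggestion is accepted, the replacement could be done as in this Python sketch (function name hypothetical). Triple hyphens are handled before doubles, and the lookarounds leave any run of four or more hyphens untouched for manual review.

```python
import re

HORIZONTAL_BAR = "\u2015"
EM_DASH = "\u2014"

def replace_hyphen_runs(text):
    """Replace exact triple hyphens by U+2015 and exact double hyphens by U+2014."""
    text = re.sub(r"(?<!-)---(?!-)", HORIZONTAL_BAR, text)
    text = re.sub(r"(?<!-)--(?!-)", EM_DASH, text)
    return text
```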
