
jochre's People

Contributors

aclooney, benk-cogapp, dependabot[bot], sreyfe, urieli


jochre's Issues

books excluded from jochre?

I just tried to find something within nybc212365, but Jochre wouldn't show the work. Is it excluded?
It has been in the library since 2012.

Mistakes in Book Center's titles need to be corrected / correctable

As we all know, the titles and names of the scanned books of the Yiddish Book Center contain endless numbers of mistakes. Now, with Jochre, they become even more exposed and should be corrected or correctable. Could you try convincing them to do something about them?
screenshot_20181104-2

2 uncorrectable misreadings Birnbaum and Sotek

Two words I stumbled across today I wanted to correct but couldn't. The Sotek-one looks a bit like the #36 doubling problem, but is not the same. The Birnbaum I don't really understand.
1 picture search query
2 pictures Birnbaum
2 pictures Sotek

search for birnbu and sotek

birnbu-1

birnbu-2

sotek-1

sotek-2

book invisible when looking for it via author

commonly misread words

אינדן - in nybc202767 this is אונז but this is not a general rule for all books. But I guess that it always is a misinterpretation.

תּג"ך should always be corrected into תּנ"ך

Remove cross-origin requests from frontend

When making a correction on the frontend, I get this error:

Cross-Origin Request Blocked: The Same Origin Policy disallows reading the remote resource at https://ybc-backend.jochre.org/jochreSearch/search?command=s…7%90%D6%B8%D7%A1&suggestion2=&fontCode=serif&languageCode=yi. (Reason: CORS header ‘Access-Control-Allow-Origin’ missing).

Need to remove direct interactions between javascript and backend.
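
One possible way to do this, sketched here under assumptions, is to let the frontend's own server relay requests to the backend, so the browser only ever talks to its own origin and no CORS header is needed. The backend path is taken from the error message above; `make_proxy_app`, `fetch_from_backend` and the injectable `fetch` parameter are hypothetical names, and the sketch only covers GET requests.

```python
from urllib.request import urlopen

# Backend endpoint taken from the error message; everything else is illustrative.
BACKEND_URL = "https://ybc-backend.jochre.org/jochreSearch/search"


def fetch_from_backend(url):
    """Fetch a URL server-side and return (content_type, body_bytes)."""
    with urlopen(url) as response:
        content_type = response.headers.get("Content-Type", "application/json")
        return content_type, response.read()


def make_proxy_app(fetch=fetch_from_backend, backend_url=BACKEND_URL):
    """Build a WSGI app that relays the browser's query string to the backend."""

    def app(environ, start_response):
        query = environ.get("QUERY_STRING", "")
        url = backend_url + ("?" + query if query else "")
        content_type, body = fetch(url)
        start_response("200 OK", [("Content-Type", content_type),
                                  ("Content-Length", str(len(body)))])
        return [body]

    return app
```

The JavaScript would then call a path on the frontend's own host, which is mounted on this app, instead of calling `ybc-backend.jochre.org` directly.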

give option to bring up full book(s) without search term

This morning I wanted to check how the books by Liptsin, Sem appear in Jochre. Since the search field is not allowed to start with * or ?, I had to think of a placeholder word. That is quite annoying. I came up with "אינהאלט", but that wasn't a good idea: it retrieved far too few books, 15 out of 28.
I then tried "פאר" which already found 21 out of 28. Still not all but far better. It would be nice if I could just see all books by Sem Liptsin next to each other, without having to invent search terms.

Liptsin, Sem in NYBC-collection in archive.org: https://archive.org/search.php?query=Liptsin&and[]=languageSorter%3A%22Yiddish%22&and[]=collection%3A%22nationalyiddishbookcenter%22

https://ocr.yiddishbookcenter.org/?page=1&query=%D7%90%D7%99%D7%A0%D7%94%D7%90%D7%9C%D7%98&author=&author=%7CLiptsin%2C+Sem%7C%D7%9C%D7%99%D7%A4%D7%A6%D7%99%D7%95%2C+%D7%A1%D7%A2%D7%9D%7C%D7%9C%D7%99%D7%A4%D7%A6%D7%99%D7%9F%2C+%D7%A1%D7%A2%D7%9D%7C%D7%9C%D7%99%D7%A4%D7%A6%D7%99%D7%9F%2C%D7%A1%D7%A2%D7%9D&authorInclude=true&title=&fromYear=&toYear=&sortBy=score

https://ocr.yiddishbookcenter.org/?page=2&query=%D7%A4%D7%90%D7%A8&author=&author=%7CLiptsin%2C+Sem%7C%D7%9C%D7%99%D7%A4%D7%A6%D7%99%D7%95%2C+%D7%A1%D7%A2%D7%9D%7C%D7%9C%D7%99%D7%A4%D7%A6%D7%99%D7%9F%2C+%D7%A1%D7%A2%D7%9D%7C%D7%9C%D7%99%D7%A4%D7%A6%D7%99%D7%9F%2C%D7%A1%D7%A2%D7%9D&authorInclude=true&title=&fromYear=&toYear=&sortBy=score

Enhancement request: Export function

In addition to #15 :
I would like to export the results in some way.
Like, I want to know which books contain a certain word. Because I want to know about the region it was used in, or which authors, or which time, or so.
I'd like to export the result list.
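
A minimal sketch of what such an export could look like, assuming the result list is available as one dictionary per book. The field names (`doc_id`, `title`, `author`, `year`, `hits`) are hypothetical and not taken from Jochre.

```python
import csv
import io


def export_results_csv(results, fields=("doc_id", "title", "author", "year", "hits")):
    """Serialize a list of result dicts to CSV text, one row per book."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(fields))
    writer.writeheader()
    for row in results:
        # Missing fields become empty cells; unknown fields are dropped.
        writer.writerow({name: row.get(name, "") for name in fields})
    return buffer.getvalue()
```

A CSV like this opens in any spreadsheet program, where the list can then be sorted by author, year or place.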

Find ways to handle bi-alphabetical Yiddish books

https://archive.org/details/nybc208282 is published in Yiddish in Hebrew letters >and< in parallel in Latin letters.
The search results do not acknowledge that. Not only is it not possible to search for text in Latin characters, also the OCR-text-view does not display the pages in Romanized Yiddish.
(I was curious because Yiddish in Latin letters is my research field.)
It would be nice if these texts would be acknowledged and be searchable. Displaying them in the OCR-flat-text would be nice too, since one can learn things from such a transcription, getting hints of pronunciation.
(https://ocr.yiddishbookcenter.org/contents?doc=nybc208282#page4)

special character V in YIVO-publication

Probably for version 3.0: nybc210852 has some difficult characters. Waw-Yud isn't well recognised (screenshot), but worse: this YIVO publication has the special character for double-waw (screenshot). The OCR program will need special training.
oy-fun-yivo
v-fun-yivo

"Fix a word"-screen not clear

The fix-a-word screen is not fully clear.
First, the lower part goes missing. But it is vital, since very often the correction is not recorded (probably due to a time-out). The status needs to be checked and should therefore be visible.
Second, the meaning of "serif" is not clear. Is it the type used in the book, or the type the correcting user uses?
Third, it would be helpful to have a short sample list, for copy-pasting, of the special characters you want users to keep in their corrections. Very often it is quite difficult to generate a beys with a dot, etc. I, for example, need to go via a Word document or via my list in sticky notes. It would help if I could just copy them, as Refoyl has done, for example, in his Sholem-Aleikhem project (http://www.cs.uky.edu/~raphael/yiddish/searchSholem.cgi). A shorter list would do. Also, it is not clear whether you want double-waw as one special unit or as two wawn.

Fix-a-word-screen

Fix-a-word-screen comments

Missing words - how to hand in corrections?

How do you want us to hand in corrections for missing words? Or badly segmented - the text states that we should not correct badly segmented text.
(By the way, there is a typo "If the word as badly segmented")
below an example for a word missing altogether
word missing - how to correct

doubling of words

Sometimes words get doubled / repeated even if they are only once in the text. I have seen that several times already. I think I remember that it happens at ends of lines, but that needs to be verified.
latest example:
https://tinyurl.com/zitsutsu
איז נײטיק זיצו
צו פאַרבּינדן
should read
איז נײטיק זי
צו פאַרבּינדן

screenshot:
afbeelding

Add volume information when available

When a work consists of several volumes it is difficult to understand in Jochre from which vol we are getting the information.
The metadata does give more information. Next to (probably not easy to show) information on the title of this part, it does show the number of the volume, like Volume 4 or Volume 5.
Could at least that be added to the title information in Jochre?
Thanks, Mirjam

https://archive.org/details/nybc200075
https://fcaw.library.umass.edu/F/?func=direct&doc_number=014935715&doc_library=FCL01
https://www.yiddishbookcenter.org/collections/yiddish-books/spb-nybc200075
several vols 1
several vols 1a

several vols 2

several vols 3
several vols 3a
https://fcaw.library.umass.edu/F/?func=direct&doc_number=014941647&doc_library=FCL01
https://www.yiddishbookcenter.org/collections/yiddish-books/spb-nybc201197

Proposal from Yitskhok Niborski

Yitskhok Niborski asked me to send in the following proposal:

For various kinds of research (for example, stylistic or grammatical), it is important to be able to limit the number of authors in whose works one searches. Often it is not worthwhile to look through hundreds or thousands of occurrences of a word, an expression, or a certain word order among countless second-rate writers. It is better to study it in a group of 25 or 30 representative authors. Typing in such a selected group can take a quarter of an hour or more (on top of which, for a single writer one often has to select several variants of his name). Research on the works of these authors can go on for months. If, between two work sessions, one needs to use Jochre for a different search (in the whole corpus or among other authors), one ends up spending a long time re-entering the group of authors every time. The request is: provide a way to save the list of representative authors, so that one can put them back in at any time and in an instant.
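
A minimal sketch of how named author groups could be saved and restored, assuming they are kept in a small JSON file. The function names and the file layout are hypothetical; a real implementation would more likely store the groups per logged-in user on the server.

```python
import json
from pathlib import Path


def save_author_group(path, name, authors):
    """Store a named list of author strings in a JSON file, keeping other groups."""
    path = Path(path)
    groups = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    groups[name] = list(authors)
    path.write_text(json.dumps(groups, ensure_ascii=False, indent=2), encoding="utf-8")


def load_author_group(path, name):
    """Return the saved author list for `name`, or an empty list if unknown."""
    path = Path(path)
    if not path.exists():
        return []
    return json.loads(path.read_text(encoding="utf-8")).get(name, [])
```

Each entry would hold the author string exactly as the catalogue spells it, including all name variants, so that the whole group can be re-entered in one step.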

Smartphone: too much whitespace to the right

The amount of whitespace to the right of the text column causes a horizontal scrollbar to appear, and the text moves left and right under the fingers.

Screenshot_20190424-220042_Samsung Internet
Screenshot_20190424-220057_Samsung Internet

Used Samsung A6, Samsung Internet browser, probably version 9.2.00.70 (or one earlier)

search by place of publication / publishing house?

Is it possible to add a search by place of publication or by publishing house?
The information should be available in MARC field 260 of the original books.

Just as I might want to get a picture of the literature written in the years 1908 to 1910, I might want to know which kind of literature was published by Kletskin, or in Vilne.
Or a combination: Kletskin in the year 1923.
Or the books Bashevis published with Kletskin.
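
As an illustration of where this data lives: in MARC 21, field 260 carries the place of publication in subfield `$a`, the publisher name in `$b` and the date in `$c`. The sketch below assumes the field is available as a flat string with `$` subfield markers; that input format, the function name and the sample record are assumptions for illustration, not taken from the actual catalogue.

```python
import re

# MARC 21 field 260 subfield codes: $a place, $b publisher name, $c date.
SUBFIELD_NAMES = {"a": "place", "b": "publisher", "c": "date"}


def parse_marc_260(field_text):
    """Split a flat 260 string such as '$a Vilne : $b Kletskin, $c 1923.' into parts."""
    parsed = {}
    for code, value in re.findall(r"\$([a-z])([^$]*)", field_text):
        name = SUBFIELD_NAMES.get(code)
        if name and name not in parsed:  # keep the first occurrence only
            # Drop the ISBD punctuation that separates subfields.
            parsed[name] = value.strip(" :,;.")
    return parsed
```

Stripping trailing punctuation this way is crude (it would also trim the final dot of an abbreviated place name), so real data would need more careful handling before being indexed as search facets.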

strange signs in OCR (cifers)

I have been searching for
אַלף-בּית
in nybc202767, not strong
result p. 217 has strange
דאָס איז אײנס: אןז8—8ןןןעה, ען)בּ—פּ0וןגאן0, פּןזגא0—מ —80ןגנןסאַ1ג.)
װען ס'זאָלן מער קײן טעמיס נישט זײַן װי דער פּראַקטישער טעס, איו
when trying to check what this is with the word 8ןןןעה, I got an even stranger, namely correct looking result:
שיקט
see screenshot
same holds for the other numbered words
afbeelding

bookmarking / my lists - add option for keeping searches/results

Today I made a search which returned results I would like to look at more in depth in future.
Could you add an option to keep this?
It could be a simple permalink or, more enhanced / complicated, a "my search" menu.
And the crème de la crème would be if I could preselect and sort the results I really like. But that's probably Jochre 4.0...
Best, Mirjam
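
For the simple permalink variant, a sketch of how a search could be turned into a shareable URL. The parameter names are the ones visible in the search URLs quoted in other issues on this page; the function itself is hypothetical, and whether these URLs stay stable over time is an assumption.

```python
from urllib.parse import urlencode

SEARCH_BASE = "https://ocr.yiddishbookcenter.org/"


def build_permalink(query, author="", title="", from_year="", to_year="",
                    sort_by="score", page=1):
    """Encode the search form's fields into a URL that can be bookmarked."""
    params = [
        ("page", page),
        ("query", query),
        ("author", author),
        ("authorInclude", "true"),
        ("title", title),
        ("fromYear", from_year),
        ("toYear", to_year),
        ("sortBy", sort_by),
    ]
    # urlencode percent-encodes Yiddish text as UTF-8.
    return SEARCH_BASE + "?" + urlencode(params)
```

A "my searches" menu could then be little more than a stored list of such URLs with a label for each.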

Enhancement request: "Jump to" - list of titles

Can we have a "Jump to" button?
I made a search which gives loads of results.
I'd like to see the titles of the books in which the results are found.
So I would like to scroll through all titles. Or jump from title to title. Or so.
Or the results could be folded in in the cases when I am more interested in the titles than in the words in the sentences around my search.
Best,
Mirjam

move login-info aside

login-centered - 1
login-centered - 2

Please move info who logged in + option to log out aside. In case I forget who I am I can go to the mirror...

add search by nybc-number

Catalogues like world-cat have the nybc-number of an item. Consider making it a search option for those that want to search in a specific book.
Motivation: I often use google-books as an index for books I actually own. This helps me to find a certain passage I would like to read or quote. Having the option for searching easily within a specific book of the Yiddish Book Center would enable this for Yiddish books too.
Example: we are reading Kreitman's Brilyantn at the moment. She uses certain words which seem to be specific for her several times. This way one can count the times she uses them and find the context within which she uses them easily.

Asterisk (*) in multi-word phrases

It appears that the asterisk (wildcard) does not work when searching for a whole phrase.
For example, if I search for
"בלייבן שטײ*"
with an asterisk at the end of the word „שטײ“, I get no results, even though there are many instances of the phrase „בלײַבן שטײן“.
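
To make the expected behaviour concrete, here is a small sketch of phrase matching in which any term of the phrase may carry a wildcard. It works on a plain list of tokens and is not how Jochre's search is implemented; `find_phrase` is a hypothetical name, and `fnmatchcase` is used only because it already understands `*` and `?` (it also treats `[` specially).

```python
from fnmatch import fnmatchcase


def find_phrase(tokens, phrase):
    """Return the start indexes where all phrase terms match consecutive tokens.

    Terms may contain the wildcards * (any run of characters) and ? (one character).
    """
    terms = phrase.split()
    if not terms:
        return []
    hits = []
    for start in range(len(tokens) - len(terms) + 1):
        window = tokens[start:start + len(terms)]
        if all(fnmatchcase(token, term) for token, term in zip(window, terms)):
            hits.append(start)
    return hits
```

With this behaviour, a phrase whose last term ends in `*` would match every place where the first word is directly followed by a word beginning with that prefix.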

translations appearing first

Hi, and thanks for your work. I've noticed that when I search on Jochre, translations of works into Yiddish appear first (seemingly regardless of what I'm searching for). I've also noticed that there is one book (I believe it's Der Shpanish-Ameriḳanisher ḳrieg a hisṭorisher roman) that appears as the first result very often.

Missing letter: ךְ

It looks like the program doesn't know what to do with this letter.

It appears, for example, in this work.

In the image below, note that the final letters of ענדליךְ and דאָךְ are missing.

image

wrong (adjacent) snippet showing up

didn't manage to get the right snippet, it always shows the one to the right. The next word works fine, though. See screenshot. It is about nybc210852
wrong snippet

List of works not OCRed?

Can the testers have a list of the works not OCRed?
I was looking for the book fragnfunyidisher00farl and do not manage to find it in Jochre. I assume it is one of the works that couldn't be OCRed, but I am not certain.
Best,
Mirjam

rendering of "yidish" in OCRed text

In the OCRed text the punctuation is meticulously kept as in the original.
However, the word for the language Yiddish, "yidish", is always spelled with a dot under the second yud, even if this is not the case in the original.
Is there a convincing explanation for this?

לרט - loyt

I think, all
לרט
could be exchanged for
לויט

by the way, some of the results are rather badly rendered.

why does an author need a first name (which needs to be chosen from drop down menu)?

why does an author need a first name (which needs to be chosen from drop down menu)?
Why do just last names or wild cards not work?

Likewise, if I copy the Latin-charactered name into the fields, it has to be exactly as in the catalogue. Including lower- and upper case letters. Eg it needs to be Gilbert, F. S (Gilbert, F. S. will not work either - there is one dot too many)


text-correction on touch screen

Didn't try it yet on tablet, but probably the same inherent problem as on smartphone:
text cannot be double clicked for correction since that enlarges the text or zooms out

search results rather very fuzzy

when looking for גלייכט בעסער ~5
results are, amongst others,
גוט
קעסערן
געגער
רעסע
יעוער
and more - see screenshot
fuzzy results

Maqaf not recognized

When I search for a word with a maqaf (־), I get no results. For example:
https://ocr.yiddishbookcenter.org/?page=1&query=%D7%99%D7%95%D7%9C%D7%99%D6%BE%D7%98%D7%90%D6%B8%D7%92&author=&author=%7C&authorInclude=true&title=&fromYear=&toYear=&sortBy=score&reference=

But when I search with a "goyish" (ordinary) hyphen, it works:
https://ocr.yiddishbookcenter.org/?page=1&query=%D7%99%D7%95%D7%9C%D7%99-%D7%98%D7%90%D6%B8%D7%92&author=&author=%7C&authorInclude=true&title=&fromYear=&toYear=&sortBy=score&reference=

It would probably be worthwhile to also accept the real Yiddish maqaf.
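
One simple approach, sketched here as an assumption rather than as Jochre's actual behaviour, is to normalize the maqaf (U+05BE, the `%D6%BE` in the first URL) to the ASCII hyphen-minus before the query is parsed. The function name is hypothetical.

```python
MAQAF = "\u05be"      # HEBREW PUNCTUATION MAQAF
HYPHEN_MINUS = "-"    # the ASCII hyphen that already works in searches


def normalize_hyphens(text):
    """Replace every Hebrew maqaf with an ASCII hyphen-minus."""
    return text.replace(MAQAF, HYPHEN_MINUS)
```

For consistent results, the same normalization would have to be applied to the indexed text, or the index would have to treat both characters as equivalent.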

problems with columns

The OCR didn't manage to recognise the line breaks in a text with two columns.
see screenshot.
problem with text in columns

"Correction mode" for longer parts of texts?

Couldn't there be a correction mode for longer parts of texts (perhaps for accredited users)? Perhaps a bit like Finkel's All works of Sholem-Aleichem?
I am looking at a piece of text in jochre and I decided that I want to quote the full page. I see that there are quite some misreadings in the OCRed text. Since I need to correct it for myself I would like to correct it inside Jochre, so that the corrections become available to more users. At the moment I would continuously have to search for the following passage, which is quite tedious.
The most ideal case would be that I could see the corrected text already, so that I can copy and paste it into my document on my computer, even before it is uploaded and available for the community a day later.

Page breaks at tweaked search

https://ocr.yiddishbookcenter.org/?page=1&query=%2B%D7%90%D6%B7%D7%A8%D7%95%D7%99%D7%A1%D7%93%D7%A8%D7%B2%D7%A2%D7%9F+%2B%D7%A4%D6%BF%D7%9C%D7%90%D6%B7%D7%9D+%7E10&author=&author=%7C&authorInclude=true&title=&fromYear=&toYear=&sortBy=score

This was the search I tried:
+אַרויסדרײען +פֿלאַם ~10

(In transcription this would be:) +aroysdreyen +flam ~10
it is connected to the ~10

I am not at all sure about the right solution for this problem. It shows that it is not intuitive when to use + and when to use "".
When the definitive version is online, an error page could link to the user-guide and say something like: 'probably you made an illegal search, check again the legal options.'
