sefaria / sefaria-export
Structured Jewish texts and metadata exported from Sefaria's database.
License: Other
It seems this hasn't been updated for over a year now.
I was looking for this specific translation, which I know exists on Sefaria's website, but couldn't find it in this repo (see here for an example of this translation being used).
Thanks for all your hard work! Our program, Tanach Study has benefited tremendously from your texts.
Thanks. Please regenerate export to include Chullin and Bechoros.
On the sefaria website I am able to access vocalised text for the Pesachim tractate of the Talmud Bavli but the download doesn't seem to include it. On closer inspection I noticed that this is the case for a number of tractates. Are there any plans to update and include the vocalised versions?
According to my counts:
total: 305172 (not including items in [] in the raw .txt files)
א: 27069
ב: 16357
ג: 2111
ד: 7039
ה: 28085
ו: 30596
ז: 2199
ח: 7194
ט: 1806
י: 31607
ך: 3358
כ: 8614
ל: 21583
ם: 10630
מ: 14474
ן: 4260
נ: 9889
ס: 1836
ע: 11270
ף: 830
פ: 3976
ץ: 1035
צ: 2937
ק: 4700
ר: 18147
ש: 15605
ת: 17965
https://www.torahmusings.com/2012/10/letters-of-the-torah/
https://nifla-ot.co.il/2016/01/27/%D7%90%D7%95%D7%AA%D7%99%D7%95%D7%AA-%D7%91%D7%AA%D7%A0%D7%9A/
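A tally like the one above could be reproduced along these lines. This is a minimal sketch, not the method used for the counts in this issue: it assumes that "items in []" means any span enclosed in square brackets, and it counts only the 27 letter forms listed above (vowel points and cantillation marks are ignored). The names `count_hebrew_letters` and `BRACKETED` are made up for illustration.

```python
import re
from collections import Counter

# The 27 letter forms tallied above: 22 letters plus the 5 final forms.
HEBREW_LETTERS = frozenset("אבגדהוזחטיךכלםמןנסעףפץצקרשת")

# Assumption: "items in []" means any span enclosed in square brackets.
BRACKETED = re.compile(r"\[[^\]]*\]")


def count_hebrew_letters(text: str) -> Counter:
    """Count Hebrew letters in text, skipping bracketed spans."""
    unbracketed = BRACKETED.sub("", text)
    return Counter(ch for ch in unbracketed if ch in HEBREW_LETTERS)


# Toy input; a real run would read the raw .txt files instead.
sample = "בראשית [ברא] אב"
counts = count_hebrew_letters(sample)
print(sum(counts.values()), dict(counts))
```

Summing the per-letter values gives the total, so the two figures can be checked against each other.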
How were these files created?
Description:
When attempting to clone the repository on my system, I encountered an error due to an invalid file path. The specific error message received was:
error: invalid path 'cltk-flat/Jewish Thought/Modern/Eliezer Berkovits/Conversion "According to Halakhah"; What Is It/English/Conversion According to Halakhah - What Is It Judaism 23 Fall 1974 467-78.json'
The issue appears to be caused by the quotation marks and semicolons in the file path, as these characters are not allowed in file names on certain systems.
Steps to Reproduce:
Attempt to clone the repository using the command: git clone https://github.com/Sefaria/Sefaria-Export.git
Expected Behavior:
The repository should be cloned without any errors, regardless of the system used.
Actual Behavior:
The cloning process fails due to an invalid file path error.
Additional Information:
Operating System: Win10
Possible Solution:
Consider renaming files or folders in the repository to avoid using special characters such as quotation marks or semicolons, which are not compatible with certain file systems.
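To find every affected path before renaming, a small check along these lines may help. It is a sketch under stated assumptions: the reserved set below is the usual Windows list (`< > : " | ? *` plus control characters, and names ending in a space or dot), and the function name `windows_invalid_components` is hypothetical. Note that the semicolon is not in that set, so under this assumption the quotation marks alone would trigger the error.

```python
# Assumption: the standard set of characters Windows rejects in file names.
WINDOWS_RESERVED = set('<>:"|?*')


def windows_invalid_components(path: str) -> list[str]:
    """Return the components of a '/'-separated path that Windows would reject."""
    bad = []
    for part in path.split("/"):
        has_reserved = any(ch in WINDOWS_RESERVED or ord(ch) < 32 for ch in part)
        # Names ending in a space or a dot are also problematic on Windows.
        if has_reserved or part.endswith((" ", ".")):
            bad.append(part)
    return bad


path = ('cltk-flat/Jewish Thought/Modern/Eliezer Berkovits/'
        'Conversion "According to Halakhah"; What Is It/English/'
        'Conversion According to Halakhah - What Is It Judaism 23 Fall 1974 467-78.json')
print(windows_invalid_components(path))
```

Feeding it the output of a file listing (one path per line) would give a full list of names to fix.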
Is there any file that maps the titles in the books?
[I did not see any markings in the body of the text]
I am doing some research on Talmud text, using the excellent edition of William Davidson. I have several questions:
Thanks!!!
The correct syntax is:
mongorestore --drop -d <database-name> <directory-of-dumped-backup>
Example: $ mongorestore --drop -d sepharia sepharia
Note on the directory: this assumes you are inside the dump directory. If not, pass dump/sepharia as the directory instead.
The link files contain citations in the following format:
"A New Israeli Commentary on Pirkei Avot 1:10:13" -> "Sanhedrin 4a"
The table also contains book names and these are easy to identify and find in the repo.
However, it is not clear how to easily find the fragments referred to in the citation columns without writing custom parsers.
For example, Sanhedrin 4a - there isn't anything (like an index) in the json (the same applies to the other formats) structure of the Sanhedrin file to find the text extract itself.
Moreover, even if I were to write custom parsers for these references, they only point to the beginning of the extract and not the end.
On the other hand, the Sefaria application successfully maps all references (and the websites too obviously). What am I missing?
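For what it is worth, the start of a citation can be split without much custom parsing. The sketch below is hedged: `split_ref` and `amud_index` are hypothetical helpers, and the index formula ASSUMES that a Talmud text array holds one entry per amud with placeholder entries for the nonexistent daf 1, so that 2a lands at 0-based index 2. That layout should be verified against a known passage before relying on it. It does not address where an extract ends.

```python
def split_ref(ref: str) -> tuple[str, list[str]]:
    """Split a citation into (book title, address parts).

    The address is taken to be the last space-separated token,
    with its parts separated by ':'.
    """
    title, _, address = ref.rpartition(" ")
    return title, address.split(":")


def amud_index(daf: str) -> int:
    """Map a daf such as '4a' to a 0-based array index.

    ASSUMPTION: one array entry per amud, counting from a placeholder '1a'
    at index 0. Check this against the exported file before using it.
    """
    number, side = int(daf[:-1]), daf[-1]
    if side not in ("a", "b"):
        raise ValueError(f"not a daf address: {daf!r}")
    return 2 * (number - 1) + (0 if side == "a" else 1)


print(split_ref("A New Israeli Commentary on Pirkei Avot 1:10:13"))
title, address = split_ref("Sanhedrin 4a")
print(title, amud_index(address[0]))
```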
This is very cool content! Your README states:
Structured Jewish texts and metadata with free public licenses, exported from Sefaria's database.
Can you clarify what those licenses are? I imagine that different licenses affect how people can use this information.
Thanks!
Do you have a French translation of Tanakh or at least Houmach?
Basically what it says on the tin. The JSON file just has two sets of "" where the translation should be.
Hi,
We are developing TorahBibleCodes Equidistant Letter Sequences (ELS) Search Software based upon the Tanach texts that you have provided:
https://github.com/TorahBibleCodes/TorahBibleCodes
We have forked your texts, and have found an error because of unnecessary HTML tags included in the JSON Hebrew Text of Malachi on last line here:
Here is copy of the line:
"והשיב לב־אבות על־בנים ולב בנים על־אבותם פן־אבוא והכיתי את־הארץ חרם<br><small>[הנה אנכי שלח לכם את אליה הנביא לפני בוא יום יהוה הגדול והנורא]</small>"
We will remove these on our local copy of your texts that are together with our program, but are concerned that those who choose to fork and/or clone your texts directly will encounter this error/bug when running our program using your texts until you are able to correct these errors.
Thank you.
TorahBibleCodes.com
https://github.com/TorahBibleCodes
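Until the source is corrected, a local cleanup step along these lines might serve as a workaround. This is only a sketch: `strip_markup` is a hypothetical helper, and it assumes that `<small>...</small>` blocks hold supplementary text (such as the repeated verse quoted above) that should be dropped along with the tags.

```python
import re

# Assumption: <small> blocks hold supplementary text that is not part of the verse.
SMALL_BLOCK = re.compile(r"<small>.*?</small>", re.DOTALL)
ANY_TAG = re.compile(r"<[^>]+>")


def strip_markup(verse: str, drop_small: bool = True) -> str:
    """Remove HTML tags from a verse, optionally with the content of <small> blocks."""
    if drop_small:
        verse = SMALL_BLOCK.sub("", verse)
    return ANY_TAG.sub("", verse).strip()


raw = "verse text<br><small>[repeated verse]</small>"
print(strip_markup(raw))
print(strip_markup(raw, drop_small=False))
```

Passing `drop_small=False` keeps the bracketed text and removes only the tags themselves.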
On running the MongoDB restore from the dump (smaller version), I get:
Failed: sefaria.webpages: error creating indexes for sefaria.webpages: createIndex error: WiredTigerIndex::insert: key too large to index, failing 1149 { : "https://yutorah.org/daf.cfm/6040/taanit/2/a/static/rand=0.9185715476199539&iit=1635769073858&tmr=load=1635769073708&core=1635769073742&..." }
Is this caused by trying to index a string that is too long?
When looking at the JSON file representation of the 5 books, I'm unable to decipher how these are structured. It doesn't appear that these arrays are broken down by what the standard weekly parsha reading would be.
How are they broken down?
Is there any way to generate array items that would group the text by lines in the parsha?
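If the arrays turn out to be nested as chapters containing verses (an assumption about the JSON layout, not something confirmed in this thread), a parsha can be cut out by supplying its start and end references. The sketch below uses a toy nested list, and `verses_in_range` is a hypothetical helper. The parsha boundaries themselves would have to come from elsewhere, for example Bereshit as Genesis 1:1 to 6:8.

```python
def verses_in_range(chapters, start, end):
    """Return (chapter, verse, text) triples from start to end, inclusive.

    chapters: list of chapters, each a list of verse strings (assumed layout).
    start, end: 1-based (chapter, verse) pairs.
    """
    (start_ch, start_v), (end_ch, end_v) = start, end
    selected = []
    for ch in range(start_ch, end_ch + 1):
        verses = chapters[ch - 1]
        first = start_v if ch == start_ch else 1
        last = end_v if ch == end_ch else len(verses)
        for v in range(first, last + 1):
            selected.append((ch, v, verses[v - 1]))
    return selected


# Toy data standing in for a book's chapter/verse arrays.
toy_book = [["a", "b", "c"], ["d", "e"], ["f", "g", "h"]]
for ch, v, text in verses_in_range(toy_book, (1, 2), (3, 1)):
    print(f"{ch}:{v} {text}")
```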
I noticed you guys have what seems to be the most complete rendering of the BDB on the internet (in many respects, better than the copyrighted one Biblesoft has licensed to many popular websites). After necromancing around the depths of the interwebs, I found a thread where your CPO (@EliezerIsrael) pointed to bdb_parse being the fruit of a team at UTexas digitizing the public domain BDB printing.
Their project failed to get a grant renewal before the rough edges were polished. Specifically, many foreign languages (Aramaic, Arabic, Ethiopian, etc.) were left in an encoded state. The Hebrew itself seemed to have an issue with ordering and encoding het and tsade.
After just glancing through your final results, I can see you seem to have fixed all of the above issues while seeming to have removed details encoded into the original (i.e., the original language of a given word.) I can tell a lot of work occurred between the final results of your public repository Sefaria-BDB and the content displayed on the site.
I also noticed that these final results were excluded from the large dump at Sefaria-Export. Given how much effort went into its production, I am unsure if this is intentional. That being said, I figured I'd give it a shot and politely request that it be included in the export, if possible.
Thanks a ton,
Julian Wagle
I asked this on the Judaism StackExchange: https://judaism.stackexchange.com/questions/134282/what-is-the-best-free-version-of-the-book-of-genesis-in-english-which-is-comple
To summarize, I am just looking for Genesis in English, to begin my search for quality open source English translations of the Torah (and potentially Tanakh or larger books in the future, but Torah is pretty universal). However, it appears that not even the Genesis Sefaria JSON is complete or accurate. Can you briefly outline the state of affairs when it comes to English translations in JSON (or another format), particularly with Genesis?
For Genesis in particular, what is the best English translation you have? Are you planning on adding more? If you don't have a quality one, do you know of a quality open source (free) one online somewhere by chance?
Thank you for all you've done, what an invaluable resource Sefaria is so far!
I've looked in various places for a sense of how large each of the Sefaria texts is, e.g. as listed on the main page, with no luck. Is there a page that summarizes that, or a tool to process the export data to calculate sizes in terms of word counts, pages, or whatever is common in this realm?
I guess this might end up on the Metrics page: https://www.sefaria.org/metrics
Thanks for this great project!
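In the meantime, rough sizes can be computed from a local copy of the export. This is a minimal sketch with made-up names (`word_count`, `word_counts`): it assumes the plain-text files live under some root directory with a .txt extension, and it counts whitespace-separated tokens, which is only an approximation for texts with markup or punctuation glued to words.

```python
from pathlib import Path


def word_count(text: str) -> int:
    """Count whitespace-separated tokens."""
    return len(text.split())


def word_counts(root: str, pattern: str = "*.txt") -> dict[str, int]:
    """Map each matching file under root (as a relative path) to its word count."""
    counts = {}
    for path in sorted(Path(root).rglob(pattern)):
        relative = path.relative_to(root).as_posix()
        counts[relative] = word_count(path.read_text(encoding="utf-8"))
    return counts


print(word_count("In the beginning God created"))
```

Calling `word_counts` on the directory holding the .txt files returns a dictionary that can be sorted or summed per category.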
Hi there, I just wanted to request the Jastrow dictionary as txt or json on this repo.
Todah Rabbah.
I have tried multiple times to download the data without text history and I cannot extract the tarball: unexpected EOF. Please verify that the file is healthy.
cltk-flat and cltk-full seem to duplicate a lot of the content from the json directory. Each one of these directories is 4.1GB, meaning that a git clone operation is extremely slow and requires a lot of disk. (Sparse clone is theoretically possible but very fiddly to set up and very slow to execute, and it has problems with the number of files in the schema directory.)
Would it be possible to do one of the following?
cltk* material in a separate git repository
cltk* based on the information in the json directory if needed
cltk* from an FTP site if needed
cltk* file trees use symlinks to, rather than duplicating the files from, json
Is the Jastrow dictionary available among your structured data? I am not able to find it... Thank you very much in advance.
I am looking at https://www.sefaria.org/Zohar%2C_Introduction.3.8?lang=bi&with=Translations&lang2=en and trying to find the Hebrew text in the JSON, but I don't think it is there. Which version of the Hebrew Zohar is on Sefaria.org, and is it to be found in the export somewhere?
For example, this is all it found:
From this on the site:
Are the .txt files different from the JSON?
Can you export the XML data?
Thanks!
Hi! We are curious whether we can use your siddur commercially. What is the license?
Also, Dropbox might not be the greatest option. It used to be that I had to add the file to my Dropbox account before I could download it, and it was too big for the free account. I think a Google Drive or S3 link might be a better option.
Hi
Has anyone created software that can convert one of Sefaria's formats (say, json) to an e-book format like epub, mobi, etc.?
If not, is there any documentation of Sefaria's formats, so I can create one myself?
May you be blessed from Heaven.
There are a handful of JSON files that have two versions: one with an uppercased letter and one without. Is this intentional?
The macOS APFS filesystem, while capable of case sensitivity, is almost always configured to be case-insensitive. sefer is a popular collision.
warning: the following paths have collided (e.g. case-sensitive paths
on a case-insensitive filesystem) and only one from the same
colliding group is in the working tree:
'cltk-flat/Halakhah/Sefer HaChinukh/Hebrew/*Sefer* HaChinukh -- Torat Emet.json'
'cltk-flat/Halakhah/Sefer HaChinukh/Hebrew/*sefer* HaChinukh -- Torat Emet.json'
'cltk-flat/Talmud/Bavli/Commentary/Chidushei Halachot/Seder Moed/Chidushei Halachot on Taanit/Hebrew/Vilna Edition.json'
'cltk-flat/Talmud/Bavli/Commentary/Chidushei Halachot/Seder Moed/Chidushei Halachot on Taanit/Hebrew/Vilna edition.json'
'cltk-flat/Tanakh/Commentary/Malbim/Prophets/Malbim on Jonah/Hebrew/Malbim On Jonah--Wikisource.json'
'cltk-flat/Tanakh/Commentary/Malbim/Prophets/Malbim on Jonah/Hebrew/Malbim on Jonah--Wikisource.json'
'cltk-full/Halakhah/Sefer HaChinukh/Hebrew/Sefer HaChinukh -- Torat Emet.json'
'cltk-full/Halakhah/Sefer HaChinukh/Hebrew/sefer HaChinukh -- Torat Emet.json'
'cltk-full/Talmud/Bavli/Commentary/Chidushei Halachot/Seder Moed/Chidushei Halachot on Taanit/Hebrew/Vilna Edition.json'
'cltk-full/Talmud/Bavli/Commentary/Chidushei Halachot/Seder Moed/Chidushei Halachot on Taanit/Hebrew/Vilna edition.json'
'cltk-full/Tanakh/Commentary/Malbim/Prophets/Malbim on Jonah/Hebrew/Malbim On Jonah--Wikisource.json'
'cltk-full/Tanakh/Commentary/Malbim/Prophets/Malbim on Jonah/Hebrew/Malbim on Jonah--Wikisource.json'
'json/Halakhah/Sefer HaChinukh/Hebrew/Sefer HaChinukh -- Torat Emet.json'
'json/Halakhah/Sefer HaChinukh/Hebrew/sefer HaChinukh -- Torat Emet.json'
'json/Talmud/Bavli/Commentary/Chidushei Halachot/Seder Moed/Chidushei Halachot on Taanit/Hebrew/Vilna Edition.json'
'json/Talmud/Bavli/Commentary/Chidushei Halachot/Seder Moed/Chidushei Halachot on Taanit/Hebrew/Vilna edition.json'
'json/Tanakh/Commentary/Malbim/Prophets/Malbim on Jonah/Hebrew/Malbim On Jonah--Wikisource.json'
'json/Tanakh/Commentary/Malbim/Prophets/Malbim on Jonah/Hebrew/Malbim on Jonah--Wikisource.json'
'txt/Halakhah/Sefer HaChinukh/Hebrew/Sefer HaChinukh -- Torat Emet.txt'
'txt/Halakhah/Sefer HaChinukh/Hebrew/sefer HaChinukh -- Torat Emet.txt'
'txt/Talmud/Bavli/Commentary/Chidushei Halachot/Seder Moed/Chidushei Halachot on Taanit/Hebrew/Vilna Edition.txt'
'txt/Talmud/Bavli/Commentary/Chidushei Halachot/Seder Moed/Chidushei Halachot on Taanit/Hebrew/Vilna edition.txt'
'txt/Tanakh/Commentary/Malbim/Prophets/Malbim on Jonah/Hebrew/Malbim On Jonah--Wikisource.txt'
'txt/Tanakh/Commentary/Malbim/Prophets/Malbim on Jonah/Hebrew/Malbim on Jonah--Wikisource.txt'
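Collisions like these can be detected before checkout by grouping paths on a case-folded key. The following is a small sketch, with `case_collisions` as a hypothetical helper name. It takes any iterable of path strings, so the input could be a file listing taken from a case-sensitive system.

```python
from collections import defaultdict


def case_collisions(paths):
    """Group paths that differ only by letter case.

    Returns a list of sorted groups, one per set of colliding paths.
    """
    groups = defaultdict(list)
    for path in paths:
        groups[path.casefold()].append(path)
    return [sorted(group) for group in groups.values() if len(group) > 1]


paths = [
    "json/Halakhah/Sefer HaChinukh/Hebrew/Sefer HaChinukh -- Torat Emet.json",
    "json/Halakhah/Sefer HaChinukh/Hebrew/sefer HaChinukh -- Torat Emet.json",
    "json/Tanakh/Commentary/Malbim/Prophets/Malbim on Jonah/Hebrew/Malbim on Jonah--Wikisource.json",
]
for group in case_collisions(paths):
    print(group)
```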
This analysis: https://lizshayne.wordpress.com/2014/06/17/sefaria_in_gephi/ interprets links as having a meaningful directionality. However, I could not find documentation for this, and some examples suggest that the order of citation 1/citation 2 may be arbitrary, so that the data actually describes an undirected graph. Can you confirm?
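Pending an answer, one conservative option is to treat each link as an undirected edge, so that the order of the two citations cannot affect the analysis. This is a sketch of that design choice only, not a statement about how Sefaria stores links, and `undirected_edges` is a hypothetical helper.

```python
def undirected_edges(links):
    """Collapse (citation 1, citation 2) pairs into order-independent edges."""
    return {frozenset(pair) for pair in links if pair[0] != pair[1]}


links = [
    ("Sanhedrin 4a", "Genesis 1:1"),
    ("Genesis 1:1", "Sanhedrin 4a"),  # same link with the citations swapped
    ("Genesis 1:2", "Sanhedrin 4a"),
]
print(len(undirected_edges(links)))
```

If the number of undirected edges is much smaller than the number of rows, swapped duplicates are present in the input.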
Hi,
Not sure if this error will happen on other OS, however on Windows, a checkout fails, since one of the folder names contains invalid characters for a Windows file/folder:
fatal: cannot create directory at 'cltk-flat/Modern Works/Works of Eliezer Berkovits/Conversion "According to Halakhah"; What Is It': Invalid argument
warning: Clone succeeded, but checkout failed.
Windows does not allow quotes in a file name
In the json directory, each line of Mishnah (and of several other sources) ends with a \n escape character. However, in some places (e.g. Mishnah Tahorot/Hebrew/merged.json) some chapters are missing the escape character.
I am cross-posting a question I posted on the Literature StackExchange: What do these formatting styles in Hebrew texts from Sefaria mean?. Any help would be appreciated. Thank you.
In Bereishit Rabbah/Hebrew there's a file named " Midrash Rabbah - Bereshit Lviv, 1874.json" with a space at the beginning of the file name, and a smaller file with the same name but without a space at the beginning. The larger one contains chapter 67 (Vayetze), and the smaller one contains chapter 23 (Toledot)...
Hello,
I use the Sefaria app and I noticed that there are differences in the text there and this one.
For example, in the app, Genesis 2:10 it says:
"A river issues from Eden to water the garden, and it then divides and becomes four branches."
But from merged.json [and merged.txt], it says:
"And a river went out of Eden to water the garden; and from thence it was parted, and became four heads."
Can you kindly explain why this is so? What can be done to normalize this?
From e.g. #9 and #32, it's apparently not obvious from the README that this repository is manually and intermittently updated.
It would be nice if the README made that clearer, along with a "last update" date on the README itself and not just in the commit history.
Maybe this can be done on the next update - it looks like it's about time 😉
Some Sefaria texts use similar-shaped glyphs where more semantically appropriate characters exist in the Unicode standard:
e.g.
U+0022 QUOTATION MARK in place of U+05F4 HEBREW PUNCTUATION GERSHAYIM
U+0027 APOSTROPHE in place of U+05F3 HEBREW PUNCTUATION GERESH
U+003A COLON in place of U+05C3 HEBREW PUNCTUATION SOF PASUQ
see as an example https://raw.githubusercontent.com/Sefaria/Sefaria-Export/master/cltk-flat/Midrash/Aggadic%20Midrash/Midrash%20Tehillim/Hebrew/merged.json
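A blanket replacement would corrupt ordinary quotes and colons, so any automated fix needs context. The sketch below is a heuristic under stated assumptions, not a complete solution: it only rewrites a mark that directly follows a Hebrew letter (and, for gershayim, also precedes one). It will miss marks that follow vowel points and may misfire on a closing single quote or an ordinary colon after a Hebrew word. The name `normalize_hebrew_punctuation` is made up for illustration.

```python
import re

HEB = "\u05D0-\u05EA"  # Hebrew letters alef through tav, including final forms

# Heuristic rules (assumptions, see the caveats above):
RULES = [
    # " between two Hebrew letters -> U+05F4 GERSHAYIM
    (re.compile(rf'(?<=[{HEB}])"(?=[{HEB}])'), "\u05F4"),
    # ' right after a Hebrew letter -> U+05F3 GERESH
    (re.compile(rf"(?<=[{HEB}])'"), "\u05F3"),
    # : right after a Hebrew letter -> U+05C3 SOF PASUQ
    (re.compile(rf"(?<=[{HEB}]):"), "\u05C3"),
]


def normalize_hebrew_punctuation(text: str) -> str:
    """Apply the heuristic replacements in order."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text


acronym = "\u05E8\u05E9\"\u05D9"  # resh, shin, ASCII quotation mark, yod
print(normalize_hebrew_punctuation(acronym) == "\u05E8\u05E9\u05F4\u05D9")
```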
You don't have the CSV files available. Could you put those in?
Also, this particular .csv can't be downloaded from sefaria:
https://www.sefaria.org/download/version/Rashi%20on%20Chullin%20-%20he%20-%20merged.csv
That particular link is from the Download Text section at the bottom of this page:
https://www.sefaria.org/Rashi_on_Chullin?lang=bi
I'm not sure if this was the place to put this...