Hi @mwarqee,
Thanks for the interest in this tool and reporting this issue.
I really don't know what you expect from the tool in this case. arXiv is a preprint server, so how could the journal be set to anything? By definition, those are preprints and not peer-reviewed. Furthermore, arXiv does not assign DOIs, so how could there be any?
from paperscraper.
Apologies for not providing more detail; perhaps I misunderstood the purpose of the module.
I am referring to the section on downloading PDFs based on the JSONL dump:
from paperscraper.pdf import save_pdf_from_dump
# Save PDFs in current folder and name the files by their DOI
save_pdf_from_dump('medrxiv_covid_ai_imaging.jsonl', pdf_path='.', key_to_save='doi')
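Because `key_to_save='doi'` names each PDF after the record's DOI, it can help to check how many records in a dump actually carry one before downloading. A minimal sketch, assuming each line of the dump is one JSON object with the DOI under a `"doi"` key (as `key_to_save='doi'` suggests); `count_records_with_field` is a hypothetical helper, not part of paperscraper:

```python
import json
from typing import Tuple


def count_records_with_field(jsonl_path: str, field: str = "doi") -> Tuple[int, int]:
    """Return (records with a non-empty `field`, total records) for a JSONL dump.

    Hypothetical helper for illustration; not a paperscraper function.
    """
    with_field = 0
    total = 0
    with open(jsonl_path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines between records
            record = json.loads(line)
            total += 1
            # A missing key, None and "" all count as "no value".
            if record.get(field):
                with_field += 1
    return with_field, total
```

For example, `count_records_with_field('medrxiv_covid_ai_imaging.jsonl')` returning `(0, n)` would explain why no PDFs get saved.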
Since the dump does not contain any DOIs, no files will be downloaded. Furthermore, if I check the arXiv site for one paper, there is an arXiv DOI,
https://doi.org/10.48550/arXiv.2302.11382, and the corresponding link to the paper, https://arxiv.org/pdf/2302.11382.
So I am a bit puzzled, or maybe I am misunderstanding how the paper download is supposed to work for arXiv?
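The two links above follow a simple pattern: the DOI is `10.48550/arXiv.` plus the identifier, and the PDF lives at `https://arxiv.org/pdf/` plus the identifier. A minimal sketch that builds both from a new-style identifier (`YYMM.NNNN` or `YYMM.NNNNN`, optional `vN` suffix), assuming that pattern generalizes beyond this one paper and that the DOI omits the version suffix. `arxiv_links` is a hypothetical helper, and building the string does not guarantee a DOI was registered for that paper. Old-style identifiers such as `hep-th/9901001` are deliberately rejected.

```python
import re
from typing import Dict

# New-style arXiv identifiers, e.g. 2302.11382 or 2302.11382v2.
_NEW_STYLE_ID = re.compile(r"^(\d{4}\.\d{4,5})(v\d+)?$")


def arxiv_links(arxiv_id: str) -> Dict[str, str]:
    """Build DOI and PDF links for a new-style arXiv identifier (hypothetical helper)."""
    cleaned = arxiv_id.strip()
    match = _NEW_STYLE_ID.match(cleaned)
    if match is None:
        raise ValueError(f"not a new-style arXiv identifier: {arxiv_id!r}")
    base_id = match.group(1)  # identifier without the version suffix
    return {
        "doi": f"10.48550/arXiv.{base_id}",
        "doi_url": f"https://doi.org/10.48550/arXiv.{base_id}",
        # Keep the version (if any) so a specific revision can be fetched.
        "pdf_url": f"https://arxiv.org/pdf/{cleaned}",
    }
```

For `arxiv_links("2302.11382")` this reproduces exactly the DOI and PDF URLs quoted above.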
Fair point, but this is a completely different thing from what you described in your first post. arXiv introduced DOIs in 2022 (see https://blog.arxiv.org/2022/02/17/new-arxiv-articles-are-now-automatically-assigned-dois/), so while the vast majority of the arXiv database will still not have DOIs, a small fraction of recent papers will. For those papers, it would be great if the DOI were added to the JSONL dump. On this I completely agree; thanks for pointing it out.
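Adding DOIs to an existing dump could look roughly like the post-processing pass below. This is a minimal sketch, not the change made in the PR: it assumes each record stores its arXiv identifier under a hypothetical `"arxiv_id"` key (pass the real key via `id_key`), fills `"doi"` only when it is empty, and reuses the `10.48550/arXiv.<id>` pattern from the thread. The optional `min_yymm` filter (e.g. `"2202"` for February 2022 onward) is an assumption meant to mirror the point that only recent papers get DOIs; new-style identifiers start with the submission year and month.

```python
import json
import re
from typing import Optional

# New-style arXiv identifiers, e.g. 2302.11382 or 2302.11382v2.
_NEW_STYLE_ID = re.compile(r"^(\d{4}\.\d{4,5})(v\d+)?$")


def add_arxiv_dois(
    in_path: str,
    out_path: str,
    id_key: str = "arxiv_id",  # hypothetical field name; adjust to the dump's schema
    min_yymm: Optional[str] = None,
) -> int:
    """Copy a JSONL dump, filling empty "doi" fields from arXiv IDs.

    Returns the number of records that received a DOI.
    """
    filled = 0
    with open(in_path, encoding="utf-8") as src, open(out_path, "w", encoding="utf-8") as dst:
        for line in src:
            if not line.strip():
                continue  # drop blank lines
            record = json.loads(line)
            raw_id = str(record.get(id_key) or "").strip()
            match = _NEW_STYLE_ID.match(raw_id)
            if match and not record.get("doi"):
                base_id = match.group(1)
                # The first four digits are YYMM, so string comparison orders them by date.
                if min_yymm is None or base_id[:4] >= min_yymm:
                    record["doi"] = f"10.48550/arXiv.{base_id}"
                    filled += 1
            dst.write(json.dumps(record) + "\n")
    return filled
```

Records that already have a DOI, or whose identifier does not match the new-style pattern, are written back unchanged.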
Hi @mwarqee, please check the PR I opened (#27).