No DOI given in saved dumps of recent arxiv papers (paperscraper, closed)

mwarqee commented on June 20, 2024

No DOI given in saved dumps of recent arxiv papers

Comments (4)

jannisborn commented on June 20, 2024

Hi @mwarqee,

Thanks for your interest in this tool and for reporting this issue.
I really don't know what you expect from the tool in this case. arXiv is a preprint server, so how could the journal be set to anything? By definition, those are preprints and not peer-reviewed. Furthermore, arXiv does not assign DOIs, so how could there be any?

mwarqee commented on June 20, 2024

Apologies for not providing more detail; perhaps I misunderstood the purpose of the module.
This refers to the option of downloading PDFs based on the JSONL dump:

from paperscraper.pdf import save_pdf_from_dump

# Save PDFs in current folder and name the files by their DOI
save_pdf_from_dump('medrxiv_covid_ai_imaging.jsonl', pdf_path='.', key_to_save='doi')
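Before calling `save_pdf_from_dump` with `key_to_save='doi'`, it can help to check whether the dump carries any DOIs at all. A minimal sketch using only the standard library, assuming the dump holds one JSON object per line and that the DOI sits under a `doi` key; `count_entries_with_doi` is a hypothetical helper, not part of paperscraper:

```python
import json
from pathlib import Path


def count_entries_with_doi(dump_path: str, key: str = "doi") -> tuple[int, int]:
    """Return (entries with a non-empty `key`, total entries) for a JSONL dump.

    Hypothetical helper: assumes one JSON object per line and that the DOI
    is stored under `key` ("doi" by default).
    """
    with_key = 0
    total = 0
    with Path(dump_path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            total += 1
            if record.get(key):  # a missing key, None and "" all count as "no DOI"
                with_key += 1
    return with_key, total
```

For example, `count_entries_with_doi('medrxiv_covid_ai_imaging.jsonl')` returning `(0, n)` would mean there is nothing for a DOI-keyed download to name or fetch.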

Since the dump does not contain any DOIs, no files are downloaded. However, if I check the arXiv site for one of the papers, there is an arXiv DOI,
https://doi.org/10.48550/arXiv.2302.11382, and a corresponding link to the paper, https://arxiv.org/pdf/2302.11382
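For new-style identifiers, both links in this example follow from the arXiv ID alone. A sketch assuming the versionless `10.48550/arXiv.<id>` DOI pattern and the `https://arxiv.org/pdf/<id>` PDF link seen in the two URLs above; `arxiv_links` is a hypothetical helper, and old-style IDs such as `hep-th/9901001` are deliberately rejected rather than guessed:

```python
import re

# New-style arXiv identifiers: YYMM.NNNN or YYMM.NNNNN, optionally followed by a version (v1, v2, ...).
_ARXIV_ID = re.compile(r"^(\d{4}\.\d{4,5})(v\d+)?$")


def arxiv_links(arxiv_id: str) -> dict[str, str]:
    """Build DOI and PDF links for a new-style arXiv identifier.

    Assumption: the DOI is '10.48550/arXiv.<id>' without the version suffix,
    matching https://doi.org/10.48550/arXiv.2302.11382.
    """
    cleaned = arxiv_id.strip()
    match = _ARXIV_ID.match(cleaned)
    if match is None:
        raise ValueError(f"not a new-style arXiv identifier: {arxiv_id!r}")
    base_id = match.group(1)  # drop any version suffix for the DOI
    doi = f"10.48550/arXiv.{base_id}"
    return {
        "doi": doi,
        "doi_url": f"https://doi.org/{doi}",
        "pdf_url": f"https://arxiv.org/pdf/{cleaned}",  # keeps the version if one was given
    }
```

With `arxiv_links("2302.11382")`, the `doi` and `pdf_url` entries reproduce exactly the two links quoted in the comment. The sketch does not check that the DOI is actually registered.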

So I am a bit puzzled. Or maybe I am misunderstanding how the paper download should work for arXiv?

jannisborn commented on June 20, 2024

Fair point, but this is a completely different thing from what you described in your first post. arXiv introduced DOIs in 2022 (see https://blog.arxiv.org/2022/02/17/new-arxiv-articles-are-now-automatically-assigned-dois/), so while the vast majority of the arXiv DB will still not have DOIs, a small fraction of recent papers will. For those papers, it would be great if the DOI were added to the JSONL dump; on this I completely agree. Thanks for pointing this out!
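One way to add such DOIs to an existing dump is to fill empty `doi` fields from the arXiv identifier. This is a hypothetical sketch, not the change made in the PR: it assumes each record exposes its arXiv ID under an `arxiv_id` key (the real dump may store it differently), and it only builds the `10.48550/arXiv.<id>` string without checking that the DOI is registered:

```python
import json
import re

# New-style arXiv identifiers with an optional version suffix.
_ARXIV_ID = re.compile(r"^(\d{4}\.\d{4,5})(v\d+)?$")


def add_missing_arxiv_dois(lines: list[str], id_key: str = "arxiv_id") -> list[str]:
    """Return JSONL lines in which empty 'doi' fields are filled from the arXiv ID.

    Hypothetical: `id_key` is an assumed field name, not a known paperscraper field.
    Records that already have a DOI, or lack a recognisable ID, are left unchanged.
    """
    updated = []
    for line in lines:
        if not line.strip():
            continue  # drop blank lines instead of failing on json.loads
        record = json.loads(line)
        match = _ARXIV_ID.match(str(record.get(id_key, "")))
        if not record.get("doi") and match:
            # Versionless DOI, e.g. 10.48550/arXiv.2302.11382
            record["doi"] = f"10.48550/arXiv.{match.group(1)}"
        updated.append(json.dumps(record))
    return updated
```

Applied to a dump read with `Path(...).read_text().splitlines()`, the result can be written back one line per record, after which DOI-keyed downloads would at least have a name to work with.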

jannisborn commented on June 20, 2024

Hi @mwarqee, please check the PR I opened (#27).
