
unpywall - Interfacing the Unpaywall API with Python


Introduction

unpywall is a Python client that uses the Unpaywall REST API for scholarly analysis with pandas. The package is influenced by roadoi, an R client for the Unpaywall API.

You can find more about the Unpaywall service here: https://unpaywall.org/.

The documentation about the Unpaywall REST API is located here: https://unpaywall.org/products/api.

Install

Install from PyPI using pip:

pip install unpywall

Use

Authentication

Authentication is required to use the Unpaywall service. unpywall offers two ways to authorize the client: import UnpywallCredentials, which sets an environment variable for you, or set the environment variable yourself. Both methods require an email address.

from unpywall.utils import UnpywallCredentials

UnpywallCredentials('your.email@example.com')

Note that the environment variable used for authentication must be named UNPAYWALL_EMAIL.
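The second option, setting the variable yourself, can be sketched with the standard library alone. The address below is a placeholder, not a real account; the variable has to be set before the first request is made.

```python
import os

# Placeholder address; replace it with your own email.
os.environ['UNPAYWALL_EMAIL'] = 'your.email@example.com'

# Confirm the variable is visible to the current process.
print(os.environ['UNPAYWALL_EMAIL'])
```

Setting the variable in your shell profile instead has the same effect and keeps the address out of your scripts.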

Query Unpaywall by DOI

To look up articles by DOI, use the doi method. The result is a pandas DataFrame.

from unpywall import Unpywall

Unpywall.doi(dois=['10.1038/nature12373', '10.1093/nar/gkr1047'])

#   data_standard  ... best_oa_location.version
#0              2  ...         publishedVersion
#1              2  ...         publishedVersion

#[2 rows x 32 columns]

You can track the progress of your API call by setting the parameter progress to True. This is especially useful for estimating the time required.

Unpywall.doi(dois=['10.1038/nature12373', '10.1093/nar/gkr1047'],
             progress=True)

#|=========================                        | 50%

This method also offers two options for handling errors, raise and ignore:

Unpywall.doi(dois=['10.1038/nature12373', '10.1093/nar/gkr1047'],
             errors='ignore')

Query Unpaywall by text search

To search articles by a given term, use the query method. The result is a pandas DataFrame.

Unpywall.query(query='sea lion',
               is_oa=True)
#   data_standard  ... first_oa_location.version
#0              2  ...          publishedVersion
#1              2  ...          publishedVersion
#2              2  ...          publishedVersion

Conveniently obtain full text

If you are using Unpaywall to obtain full-text copies of papers for literature mining, you may benefit from the following functions:

You can use the download_pdf_handle method to return a PDF handle for the given DOI.

Unpywall.download_pdf_handle(doi='10.1038/nature12373')

#<http.client.HTTPResponse object at 0x7fd08ef677c0>

To return a URL to a PDF for the given DOI, use get_pdf_link.

Unpywall.get_pdf_link(doi='10.1038/nature12373')

#'https://dash.harvard.edu/bitstream/1/12285462/1/Nanometer-Scale%20Thermometry.pdf'

To return a URL to the best available OA copy, regardless of the format, use get_doc_link.

Unpywall.get_doc_link(doi='10.1016/j.envint.2020.105730')

#'https://doi.org/10.1016/j.envint.2020.105730'

To return a list of all URLs to OA copies, use get_all_links.

Unpywall.get_all_links(doi='10.1038/nature12373')

#['https://dash.harvard.edu/bitstream/1/12285462/1/Nanometer-Scale%20Thermometry.pdf']

You can also directly access all data provided by Unpaywall in JSON format using get_json.

Unpywall.get_json(doi='10.1038/nature12373')

#{'best_oa_location': {'endpoint_id': '8c9d8ba370a84253deb', 'evidence': 'oa repository (via OAI-PMH doi match)', 'host_type': ...
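Because the result is a nested dictionary, reading a deep field is safer with a small helper that tolerates missing keys. The sketch below is an illustration only: dig is a hypothetical helper, not part of unpywall, and record is a truncated stand-in that reuses just the fields visible in the output above.

```python
def dig(record, *keys, default=None):
    """Follow a chain of keys through nested dicts; return default on any miss."""
    current = record
    for key in keys:
        # Stop as soon as a level is not a dict or lacks the key.
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Truncated stand-in for a get_json result (fields copied from the output above).
record = {
    'best_oa_location': {
        'endpoint_id': '8c9d8ba370a84253deb',
        'evidence': 'oa repository (via OAI-PMH doi match)',
    }
}

evidence = dig(record, 'best_oa_location', 'evidence')
missing = dig(record, 'best_oa_location', 'not_a_field', default='n/a')
print(evidence)
print(missing)
```

The helper also returns the default when an intermediate value is None, which is worth guarding against in case a record has no OA location.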

Command-Line-Interface

unpywall comes with a command-line interface that can be used to quickly look up a PDF or to download free full-text articles to your device.

Obtain a PDF URL

Retrieve the URL of a PDF for a given DOI with the following command.

unpywall link 10.1038/nature12373

View a PDF

If you want to view a PDF in your browser or on your system, use view.

unpywall view 10.1038/nature12373 -m browser

PDF Download

Use download if you want to store a PDF on your machine.

unpywall download 10.1038/nature12373 -f article.pdf -p ./documents

Help

You can always use help to open a description for the provided functions.

unpywall -h

Documentation

Full documentation is available at https://unpywall.readthedocs.io/.

Develop

To install unpywall, along with dev tools, run:

pip install -e '.[dev]'

unpywall's People

Contributors

bganglia, frankier, naustica, simon-20

unpywall's Issues

Local copy of Unpaywall database?

This was brought up in issue #18

If possible, it would be nice to offer the following features:

  • download updates
  • return results in JSON format, just like using _fetch

As @naustica mentioned, it might be a good idea to use a SQL database using sqlite.
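One possible shape for such a store, using only Python's built-in sqlite3 and json modules, is sketched below. The table layout and the function names (open_store, upsert, lookup) are assumptions made for illustration; nothing here exists in unpywall.

```python
import json
import sqlite3


def open_store(path=':memory:'):
    """Open (or create) a store with one row per DOI."""
    conn = sqlite3.connect(path)
    conn.execute(
        'CREATE TABLE IF NOT EXISTS records ('
        'doi TEXT PRIMARY KEY, body TEXT NOT NULL)'
    )
    return conn


def upsert(conn, doi, record):
    """Insert a record, or overwrite it when an update arrives."""
    # DOIs are case-insensitive, so normalise the key to lower case.
    conn.execute(
        'INSERT OR REPLACE INTO records (doi, body) VALUES (?, ?)',
        (doi.lower(), json.dumps(record)),
    )
    conn.commit()


def lookup(conn, doi):
    """Return the stored record as a dict, or None if the DOI is unknown."""
    row = conn.execute(
        'SELECT body FROM records WHERE doi = ?', (doi.lower(),)
    ).fetchone()
    return json.loads(row[0]) if row else None


conn = open_store()
upsert(conn, '10.1038/nature12373', {'data_standard': 2})
print(lookup(conn, '10.1038/NATURE12373'))
```

Storing the raw JSON text keeps the result format identical to what an API call returns, and INSERT OR REPLACE lets an update overwrite the older row.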

Request forbidden by administrative rules.

Hello everyone!

I am trying to download a paper with Python and this command:

!unpywall download 10.4049/jimmunol.1701153 -f article.pdf -p '/path/folder/'

Unfortunately, it does not download a pdf, but a file with this inside:

403 Forbidden

Request forbidden by administrative rules.

I have done the authentication to Unpaywall too.
Moreover, if you try to search for this paper in Unpaywall, it is actually present and you can download the pdf!

Can you help me please?
Thank you so much in advance

Matteo

Support link to HTML to PDF

Links like https://doi.org/10.1016/j.jns.2020.116832 do not lead directly to a PDF - they go through HTML first. This leads to failures when loading PDF handles / saving PDF files.

This was separated from #27

In the example above the key element is this:

<input type="hidden" name="redirectURL" value="http%3A%2F%2Fjns-journal.com%2Fretrieve%2Fpii%2FS0022510X20301684" id="redirectURL"/>
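A minimal sketch of how that element could be read with the standard library: parse the HTML, find the input named redirectURL, and percent-decode its value. RedirectInputParser and find_redirect_url are hypothetical names; following the decoded URL and handling other publishers' page layouts are out of scope here.

```python
from html.parser import HTMLParser
from urllib.parse import unquote


class RedirectInputParser(HTMLParser):
    """Collect the value of the first <input name="redirectURL"> element."""

    def __init__(self):
        super().__init__()
        self.redirect_url = None

    def handle_starttag(self, tag, attrs):
        if tag != 'input' or self.redirect_url is not None:
            return
        attributes = dict(attrs)
        if attributes.get('name') == 'redirectURL':
            value = attributes.get('value')
            if value:
                # The value is percent-encoded, so decode it.
                self.redirect_url = unquote(value)


def find_redirect_url(html):
    """Return the decoded redirect URL, or None if the page has none."""
    parser = RedirectInputParser()
    parser.feed(html)
    return parser.redirect_url


snippet = (
    '<input type="hidden" name="redirectURL" '
    'value="http%3A%2F%2Fjns-journal.com%2Fretrieve%2Fpii%2FS0022510X20301684" '
    'id="redirectURL"/>'
)
print(find_redirect_url(snippet))
# prints http://jns-journal.com/retrieve/pii/S0022510X20301684
```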

File was successfully downloaded, but it downloads a strange txt file

Hello everyone,

I am trying to download a paper with this command (after logging in to Unpaywall):

from unpywall import Unpywall
!unpywall download 10.1002/ags3.12621 -f article.pdf -p "path/to/dir"

But instead of downloading a .pdf file, it gives back this strange txt file:

<title>Just a moment...</title><style>[inline CSS omitted]</style>
Enable JavaScript and cookies to continue
<script>(function(){window._cf_chl_opt={cvId: '3',cZone: "onlinelibrary.wiley.com",cType: 'managed', [challenge tokens omitted] };var cpo = document.createElement('script');cpo.src = '/cdn-cgi/challenge-platform/h/g/orchestrate/chl_page/v1?ray=86a832456de77bcf'; [remainder omitted]</script>

I can easily download the PDF manually, but not with the code :(
Is it possible to solve this issue?

Thank you in advance
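In both this report and the 403 report above, the server returned an HTML page and it was saved under a .pdf name. One way to catch that early is to check the first bytes of the response for the PDF signature before writing the file. This is a minimal sketch, not part of unpywall; looks_like_pdf is a hypothetical helper, and it only detects the problem rather than getting past the publisher's bot protection.

```python
def looks_like_pdf(head: bytes) -> bool:
    """Return True if the leading bytes contain the PDF signature."""
    # PDF files begin with "%PDF-"; a small window tolerates leading junk bytes.
    return b'%PDF-' in head[:1024]


print(looks_like_pdf(b'%PDF-1.7\n%\xe2\xe3\xcf\xd3\n'))
print(looks_like_pdf(b'<title>Just a moment...</title>'))
```

A caller could read the first kilobyte of the response, run this check, and report the DOI as failed instead of saving the HTML.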

Handle connection errors in cache?

Failed requests should not be saved.

A report should be generated with the DOIs that have led to failed requests.

This was raised as part of the discussion in issue #12
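The two requirements above can be sketched together: store a response only after the fetch succeeds, and collect the DOIs that failed for a report. Everything here is hypothetical; fetch_with_cache, the fetch callable and the plain dict cache stand in for whatever the real cache and request code look like.

```python
def fetch_with_cache(dois, fetch, cache):
    """Fetch each uncached DOI; cache successes only and return the failures."""
    failed = {}
    for doi in dois:
        if doi in cache:
            continue
        try:
            # The assignment only happens if fetch() returns normally.
            cache[doi] = fetch(doi)
        except OSError as exc:
            # ConnectionError and similar network errors derive from OSError.
            failed[doi] = repr(exc)
    return failed


def fake_fetch(doi):
    """Stand-in for a network call; fails for one DOI to show the report."""
    if doi == '10.0000/broken':
        raise ConnectionError('connection reset')
    return {'doi': doi}


cache = {}
report = fetch_with_cache(['10.1038/nature12373', '10.0000/broken'], fake_fetch, cache)
print(sorted(cache))
print(report)
```

Because a failed DOI never enters the cache, a later run retries it automatically.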

Bring test coverage up to 80%

@naustica's suggestion from #12

Here is a list of functions we may want to write tests for. Perhaps it does not make sense to have tests for all of them, so this list can be revised.

  • UnpywallCache.reset_cache
  • UnpywallCache.delete
  • UnpywallCache.timed_out - True
  • UnpywallCache.timed_out - False
  • UnpywallCache.get - Present
  • UnpywallCache.get - Absent
  • UnpywallCache.save
  • UnpywallCache.save
  • UnpywallCache.load - File Present
  • UnpywallCache.load - File Absent
  • UnpywallCache.download
  • Unpywall._progress (does it make sense to have a test for this?)
  • Unpywall.get_pdf_link
  • Unpywall.get_doc_link
  • Unpywall.get_all_links
  • Unpywall.download_pdf_handle
  • UnpywallURL.url

Already added as of last release:

  • UnpywallCredentials.validate_email
  • Unpywall.get_df
  • Unpywall.get_json
  • Unpywall._validate_dois

ZeroDivisionError: division by zero when downloading with a progress bar

Hi,

Sometimes when attempting to download a PDF using a progress bar, a ZeroDivisionError is thrown from __init__.py line 514.

This is caused when servers do not return a Content-Length header; line 496 uses 0 as a default, which the progress bar code then uses in a division, causing the error.

Simon
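A guard of the following kind would avoid the division when the total size is unknown. This is an illustrative sketch only: progress_line is a hypothetical function, not the code in __init__.py, and it falls back to a byte count when the total is 0.

```python
def progress_line(done: int, total: int, width: int = 50) -> str:
    """Render a progress bar, or a byte counter when the total is unknown."""
    if total <= 0:
        # No Content-Length header: a percentage cannot be computed.
        return f'{done} bytes (total size unknown)'
    fraction = min(done / total, 1.0)
    filled = int(width * fraction)
    return '|' + '=' * filled + ' ' * (width - filled) + f'| {fraction:.0%}'


print(progress_line(50, 100, width=10))
print(progress_line(512, 0))
```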

Improve method names

E.g. unpaywall.unpaywall_json is redundant; something like unpaywall.oa_json would be better.

Update example in README.md

The Unpaywall python wrapper must be configured with an email, as in: unpaywall.email = "<insert-email-here>"

Add docstrings

This would make it easier to understand how the code works.
