fabiobatalha / crossrefapi
A Python library that implements the Crossref API.
License: BSD 2-Clause "Simplified" License
Hi @fabiobatalha,
I am using this library and I got a "too many requests" error because I made multiple requests at a time.
How do I handle this error?
Please advise.
Description:
The code currently lacks proper documentation, making it difficult for users to understand the classes and methods and their intended usage. In order to improve the code's usability and maintainability, we should add comprehensive documentation.
Documentation Status:
Action Required:
Expected Documentation Style:
We can use PEP 257 style docstrings for documenting classes, methods, and functions. Refer to the PEP 257 documentation for guidelines.
Specific Examples:
The Endpoint and Works classes have no documentation. do_http_request in the HTTPRequest class has no documentation.
Apologies if this isn't the right channel to ask.
I'm trying to match titles to their DOIs with a simple loop:
for article in articles:
    work = works.query(bibliographic=article.title)
    for w in work:
        # w is a dict, so membership is checked with 'in', not hasattr()
        if hasattr(article, 'title') and 'title' in w and w['title'][0] == article.title:
            article.doi = w['DOI']
            print(article.doi)
            article.save()
            break
    else:
        print('not found', article.title)
But since work contains over 80k results, the method is too slow to be valuable. I have also tried .sample(20), hoping it would narrow the search, but it didn't match any titles. Is that because the sample is random?
Is there any way I can just fetch the first items from the work result? They always seem to contain the match I need.
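Since iterating a query result streams every match, one client-side workaround is to stop after the first few items. A minimal sketch using itertools.islice; the crossrefapi usage in the comments is hypothetical:

```python
from itertools import islice

def first_n(results, n):
    """Take at most n items from any iterable of query results."""
    return list(islice(results, n))

# Hypothetical usage against a crossrefapi query result:
# from crossref.restful import Works
# works = Works()
# top = first_n(works.query(bibliographic='some title'), 5)
```

This stops consuming the HTTP-backed iterator after n items instead of walking all 80k results.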
I was trying to get my own works using the from_accepted_date filter and it returned this error.
from crossref.restful import Works

works = Works()
pub_date = '2001'
author = 'aguiam'
pub = works.query(author=author).filter(from_accepted_date=pub_date).sort('published')
UrlSyntaxError: Filter from-accepted-date specified but there is no such filter for this route. Valid filters for this route are: from-event-start-date, has-update, has-abstract, article_number, until-update-date, from-posted-date, license.delay, has-update-policy, prefix, has-content-domain, has-authenticated-orcid, type, relation.type, from-event-end-date, has-orcid, archive, full-text.version, until-event-end-date, from-pub-date, until-index-date, has-full-text, has-assertion, until-posted-date, until-print-pub-date, has-affiliation, funder-doi-asserted-by, license.version, assertion, has-funder, member, from-created-date, has-domain-restriction, from-index-date, full-text.application, has-event, until-pub-date, until-event-start-date, from-deposit-date, relation.object-type, has-award, clinical-trial-number, assertion-group, until-deposit-date, award.funder, until-accepted-date, from-online-pub-date, until-online-pub-date, has-archive, license.url, orcid, type-name, isbn, full-text.type, has-relation, from-print-pub-date, until-created-date, from-update-date, has-clinical-trial-number, has-references, content-domain, doi, award.number, until-issued-date, has-license, issn, alternative_id, group-title, relation.object, is-update, container-title, directory, category-name, funder, from-accepted_date, has-funder-doi, update-type, updates, from-issued-date
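Note that the valid-filter list in the error message contains `from-accepted_date` with an underscore, which looks like a typo in the library's filter table rather than a missing API feature. As a workaround, one can call the REST API directly; the filter name `from-accepted-date` here is an assumption based on the public Crossref API documentation:

```python
from urllib.parse import urlencode

# Build the /works request by hand, bypassing the library's filter validation.
params = {
    "query.author": "aguiam",
    "filter": "from-accepted-date:2001",  # dash-separated REST API filter name
    "sort": "published",
}
url = "https://api.crossref.org/works?" + urlencode(params)
# import requests
# response = requests.get(url).json()
print(url)
```

This sketch only builds the URL; the commented requests call shows how it would be fetched.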
When trying to retrieve information via simple queries, I consistently got outputs that I did not expect. Specifically, the publications which are referred to by the keywords are not returned in the result of the query. I do however get a return with the right publication data via a manual HTTP GET request.
Example code:
from crossref.restful import Works

keyword = 'Albert Einstein Elektrodynamik bewegter Körper'
works = Works()
result = works.query(keyword)
for entry in result:
    print(entry)
    break
>> {'indexed': {'date-parts': [[2019, 11, 19]], 'date-time': '2019-11-19T19:11:52Z', 'timestamp': 1574190712445}, 'reference-count': 0, 'publisher': 'Maney Publishing', 'issue': '1', 'content-domain': {'domain': [], 'crossmark-restriction': False}, 'short-container-title': ['Journal of the American Institute for Conservation'], 'published-print': {'date-parts': [[1980]]}, 'DOI': '10.2307/3179679', 'type': 'journal-article', 'created': {'date-parts': [[2006, 4, 18]], 'date-time': '2006-04-18T05:15:34Z', 'timestamp': 1145337334000}, 'page': '21', 'source': 'Crossref', 'is-referenced-by-count': 0, 'title': ['A Semi-Rigid Transparent Support for Paintings Which Have Both Inscriptions on Their Fabric Reverse and Acute Planar Distortions'], 'prefix': '10.1179', 'volume': '20', 'author': [{'given': 'Albert', 'family': 'Albano', 'sequence': 'first', 'affiliation': []}], 'member': '138', 'container-title': ['Journal of the American Institute for Conservation'], 'deposited': {'date-parts': [[2015, 6, 26]], 'date-time': '2015-06-26T01:05:23Z', 'timestamp': 1435280723000}, 'score': 4.5581737, 'issued': {'date-parts': [[1980]]}, 'references-count': 0, 'journal-issue': {'published-print': {'date-parts': [[1980]]}, 'issue': '1'}, 'URL': 'http://dx.doi.org/10.2307/3179679', 'ISSN': ['0197-1360'], 'issn-type': [{'value': '0197-1360', 'type': 'print'}]}
I get this kind of output, which has nothing to do with my input keyword, with other keywords too. I have tried modifying the order of the result (result.order('desc')), but that does not seem to change anything.
When I then do the same request via HTTP GET and the normal API URL, I get the expected output as the first result:
import requests

keyword = 'Albert Einstein Elektrodynamik bewegter Körper'
keyword = '+'.join(keyword.split())
url = 'https://api.crossref.org/works?query=' + keyword
result = requests.get(url=url)
# Take the first result
result = result.json()['message']['items'][0]
print(result)
>> {'indexed': {'date-parts': [[2020, 5, 25]], 'date-time': '2020-05-25T14:23:45Z', 'timestamp': 1590416625775}, 'publisher-location': 'Wiesbaden', 'reference-count': 0, 'publisher': 'Vieweg+Teubner Verlag', 'isbn-type': [{'value': '9783663193722', 'type': 'print'}, {'value': '9783663195108', 'type': 'electronic'}], 'content-domain': {'domain': [], 'crossmark-restriction': False}, 'published-print': {'date-parts': [[1923]]}, 'DOI': '10.1007/978-3-663-19510-8_3', 'type': 'book-chapter', 'created': {'date-parts': [[2013, 12, 6]], 'date-time': '2013-12-06T02:08:43Z', 'timestamp': 1386295723000}, 'page': '26-50', 'source': 'Crossref', 'is-referenced-by-count': 5, 'title': ['Zur Elektrodynamik bewegter Körper'], 'prefix': '10.1007', 'author': [{'given': 'A.', 'family': 'Einstein', 'sequence': 'first', 'affiliation': []}], 'member': '297', 'container-title': ['Das Relativitätsprinzip'], 'link': [{'URL': 'http://link.springer.com/content/pdf/10.1007/978-3-663-19510-8_3', 'content-type': 'unspecified', 'content-version': 'vor', 'intended-application': 'similarity-checking'}], 'deposited': {'date-parts': [[2013, 12, 6]], 'date-time': '2013-12-06T02:08:45Z', 'timestamp': 1386295725000}, 'score': 53.638336, 'issued': {'date-parts': [[1923]]}, 'ISBN': ['9783663193722', '9783663195108'], 'references-count': 0, 'URL': 'http://dx.doi.org/10.1007/978-3-663-19510-8_3'}
The output that I retrieved with the tool in this repository has nothing to do with my query keyword. Do you have an idea of how I can fix this? I would be very grateful for any help.
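One observation from the two JSON dumps above: the library's first item has score 4.5581737 while the manual GET's first item has score 53.638336, so the library result does not appear to be ordered by relevance. A client-side workaround (a sketch, not a library feature) is to re-rank fetched items by their score field:

```python
def rank_by_score(items):
    """Sort Crossref work records by their relevance 'score', highest first."""
    return sorted(items, key=lambda item: item.get("score", 0.0), reverse=True)

# Scores taken from the two responses shown above.
items = [
    {"DOI": "10.2307/3179679", "score": 4.5581737},
    {"DOI": "10.1007/978-3-663-19510-8_3", "score": 53.638336},
]
best = rank_by_score(items)[0]
print(best["DOI"])  # 10.1007/978-3-663-19510-8_3, the Einstein chapter
```

In practice one would fetch a bounded number of items first and then re-rank them locally.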
I'm interested in using this project to do deposits into crossref. Could you add an example of how to use the Depositor class?
Also, does the Depositor support resource-only deposits?
Thanks!
The members route now supports a few filters, so this should work:
Members().filter(has_public_references="True")
Filter before sample does not seem to work, but it does not complain either:
>>> works.filter(type='book').sample(10).url
https://api.crossref.org/works?sample=10
Sample before filter works as expected:
>>> works.sample(10).filter(type='book').url
https://api.crossref.org/works?sample=10&filter=type%3Abook
We have added some new header and parameter support for providing contact information. This is designed to help us troubleshoot problems with the API. See the section on etiquette
at api.crossref.org. Would love to see support for this.
Great package!
It would be nice to add support for proxies.
The requests.get method accepts proxies (in the form of a dictionary):
dict_proxies = {
    'https': 'https://username:password@HOST:PORT',
    'http': 'http://username:password@HOST:PORT',
}
requests.get(url, proxies=dict_proxies)
I'm using the filter (for i in works.filter(...)) and selecting a journal (using ISSN) and a one-month date window (using from-pub-date and until-pub-date) to gather articles from the same issue/volume of the journal. This seems to work, but when there are more than 1000 records I receive the error "Expecting value: line 1 column 1 (char 0)" when using json.dumps(i), and it kills my code.
I can't figure out why this is happening. Any ideas?
Cool library. However, I find it a pity that only sample(n) is supported for limiting results.
It would be nice if you could also support the rows query parameter to control the number of results returned by a query. This would also help limit the load for common use cases like "find the best (or best n) match based on title and author".
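Until the library grows a rows option, the underlying REST API accepts rows directly (parameter name from the public Crossref API docs). A sketch that builds such a request; the commented fetch shows hypothetical usage:

```python
from urllib.parse import urlencode

def works_url(query, rows):
    """Build a Crossref /works URL limited to `rows` results."""
    return "https://api.crossref.org/works?" + urlencode({"query": query, "rows": rows})

url = works_url("zika", 5)
print(url)  # https://api.crossref.org/works?query=zika&rows=5
# import requests
# items = requests.get(url).json()["message"]["items"]  # at most 5 items
```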
Could one use crossrefapi to construct an author search with both first and last name?
I have tried that a bit, but the results are very imprecise.
The funders route also now supports a location
filter. Would be great to see support for:
Funders().filter(location="Japan")
It would be useful if you could set the API URL for Crossref API requests.
For example, as we are testing, it would be good to make requests to test.crossref.org instead of api.crossref.org so that we are not testing against the production site.
Thanks!
If I run the code below, it gives me 0 results, which is expected:
journals.works('1946-3944').filter(type='journal-article').filter(from_created_date='2021-11-05').filter(until_created_date='2021-11-05').count()
But when I run the same code with all(), it gives me some unexpected results:
journals.works('1946-3944').filter(type='journal-article').filter(from_created_date='2021-11-05').filter(until_created_date='2021-11-05').all()
If I iterate over the results, I get some seemingly random results, even though this code should return an empty array.
I am a new user. When I use the syntax
w1 = works.query(title='zika')
it returns
TypeError: list indices must be integers or slices, not str
But it is OK when I query by author and other fields. Any ideas?
I have got 10,000+ Wiley DOIs via the Crossref API (works.query() without has_abstract=true, because it returns 0 results with has_abstract=true) and tried two different ways to fetch abstracts, but they don't work.
If I try to fetch abstracts via the Wiley API, it downloads the full texts (PDF), which wastes a lot of time in parsing.
So how can I get abstracts via Crossref, or does Crossref not have abstracts for these DOIs?
Thanks for your help!
These are my approaches to fetch abstracts:
1. The from crossref.restful import Works API:
2. The "requests" tool against the Crossref URL:
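When Crossref does hold an abstract for a DOI, it appears under the "abstract" key of the work record (typically as JATS XML); whether that key exists depends on what the publisher deposited, and many Wiley records simply lack it. A sketch, with the DOI and crossrefapi usage in the comments being hypothetical:

```python
def get_abstract(record):
    """Return the deposited abstract from a Crossref work record, or None."""
    return record.get("abstract")

# Hypothetical usage:
# from crossref.restful import Works
# record = Works().doi("10.1000/example")
# print(get_abstract(record))

sample = {"DOI": "10.1000/example", "abstract": "<jats:p>Some abstract.</jats:p>"}
print(get_abstract(sample))
```

If get_abstract returns None, the abstract was never deposited with Crossref and no query will surface it.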
I installed crossrefapi using pip, and it seemed to install fine. When I attempted to use it, I got the following error: ModuleNotFoundError: No module named 'crossref.restful'; 'crossref' is not a package
Here was my code:
from crossref.restful import Journals
journals = Journals()
print(journals.journal('1759-3441'))
Hi @fabiobatalha,
It seems the Crossref API has made some modifications to its header format.
{'date': 'Mon, 26 Jul 2021 11:41:59 GMT', 'content-type': 'application/json', 'transfer-encoding': 'chunked', 'access-control-allow-origin': '*', 'access-control-allow-headers': 'X-Requested-With', 'vary': 'Accept-Encoding', 'content-encoding': 'gzip', 'server': 'Jetty(9.4.40.v20210413)', 'x-ratelimit-limit': '50', 'x-ratelimit-interval': '1s', 'x-rate-limit-limit': '50, 50', 'x-rate-limit-interval': '1s, 1s', 'permissions-policy': 'interest-cohort=()', 'connection': 'close'}
'x-rate-limit-limit': '50, 50', 'x-rate-limit-interval': '1s, 1s',
Running the code below
from crossref.restful import Works

works = Works()
w1 = works.query('zika').sample(20)
for item in w1:
    print(item["title"])
is giving the following error:
Traceback (most recent call last):
File "/home/ankush/.config/JetBrains/PyCharm2021.1/scratches/crossref_scratch.py", line 6, in <module>
for item in w1:
File "/media/ankush/ContinentalGroun/workplace/open_source/crossrefapi/crossref/restful.py", line 264, in __iter__
result = self.do_http_request(
File "/media/ankush/ContinentalGroun/workplace/open_source/crossrefapi/crossref/restful.py", line 80, in do_http_request
self._update_rate_limits(result.headers)
File "/media/ankush/ContinentalGroun/workplace/open_source/crossrefapi/crossref/restful.py", line 43, in _update_rate_limits
self.rate_limits['X-Rate-Limit-Limit'] = int(headers.get('X-Rate-Limit-Limit', 50))
ValueError: invalid literal for int() with base 10: '50, 50'
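The crash comes from calling int() on the now comma-separated header value. A defensive parse that takes the first comma-separated token would tolerate both the old and the new format; this is a sketch of a possible fix, not the library's actual code:

```python
def parse_rate_limit(headers, name="X-Rate-Limit-Limit", default=50):
    """Parse a rate-limit header that may contain duplicated values like '50, 50'."""
    raw = str(headers.get(name, default))
    return int(raw.split(",")[0].strip())

print(parse_rate_limit({"X-Rate-Limit-Limit": "50, 50"}))  # 50
```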
We have added a select parameter that allows finer control of response sizes. The following, for example, will only return the DOI and title for each matching record.
http://api.crossref.org/works?sample=10&select=DOI,title
I sometimes get timeout errors when searching for DOIs:
Traceback (most recent call last):
File "C:\Users\delap\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\requests\models.py", line 910, in json
return complexjson.loads(self.text, **kwargs)
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.3568.0_x64__qbz5n2kfra8p0\lib\json\__init__.py", line 346, in loads
return _default_decoder.decode(s)
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.3568.0_x64__qbz5n2kfra8p0\lib\json\decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.3568.0_x64__qbz5n2kfra8p0\lib\json\decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "D:\Projets\h-transport-materials-dashboard\test.py", line 6, in <module>
works.doi("10.1103/PhysRevB.4.330")
File "C:\Users\delap\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\crossref\restful.py", line 957, in doi
result = result.json()
File "C:\Users\delap\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\requests\models.py", line 917, in json
raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)
requests.exceptions.JSONDecodeError: [Errno Expecting value] <html>
<head><title>504 Gateway Time-out</title></head>
<body>
<center><h1>504 Gateway Time-out</h1></center>
</body>
</html>
: 0
This is rather new; I haven't experienced it before.
Here's the code to reproduce:
from crossref.restful import Works
works = Works()
works.doi("10.1103/PhysRevB.4.330")
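Transient 504s like the one above can be worked around with a small retry-with-backoff wrapper around the call. A sketch with hypothetical parameters; the crossrefapi usage in the comments is an assumption:

```python
import time

def with_retries(fn, attempts=3, delay=1.0, backoff=2.0, exceptions=(Exception,)):
    """Call fn(), retrying on the given exceptions with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except exceptions:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            time.sleep(delay)
            delay *= backoff

# Hypothetical usage:
# import requests
# from crossref.restful import Works
# works = Works()
# record = with_retries(lambda: works.doi("10.1103/PhysRevB.4.330"),
#                       exceptions=(requests.exceptions.JSONDecodeError,))
```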
I realise this is more of a general question, but I hope I can still get some help.
I would like to get the DOIs of a list of unstructured citations (somehow, similar to this issue).
However, if I run:
from crossref.restful import Works

works = Works()
unstructured_citation = "Jan Hansen, Jochen Hung, Jaroslav Ira, Judit " \
                        "Klement, Sylvain Lesage, Juan Luis Simal and " \
                        "Andrew Tompkins (eds), The European Experience: " \
                        "A Multi-Perspective History of Modern Europe. " \
                        "Cambridge, UK: Open Book Publishers, 2023."
# Pass the variable itself, not the literal string "unstructured_citation"
work = works.query(bibliographic=unstructured_citation).sort("relevance")
I get a huge number of results in the variable work (some of which are not even related).
What am I missing? Is the bibliographic argument meant to be used for work titles only? Should I try to extract the work titles from the raw citations and then use them as part of the query?
Thank you!
crossrefapi/crossref/restful.py
Line 29 in 2660c25
This value doesn't appear to be used anywhere.
Also, did you mean "tuning"? ("tunning" is not a recognised English word.)
I know the exact title of a paper; how can I print its metadata, like the author names?
(My plan is to automatically get information for more than 80 papers for which I only know the titles.)
Does anyone have some ideas?
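One approach (a sketch, not a library feature) is to query with the known title and accept the first record whose title matches after normalisation, then read its metadata fields:

```python
def titles_match(a, b):
    """Compare two titles ignoring case and surrounding whitespace."""
    return a.strip().lower() == b.strip().lower()

def best_match(items, title):
    """Return the first Crossref record whose first title equals `title`."""
    for item in items:
        candidates = item.get("title") or []
        if candidates and titles_match(candidates[0], title):
            return item
    return None

# Hypothetical usage:
# from crossref.restful import Works
# items = Works().query(bibliographic=known_title)
# record = best_match(items, known_title)
# if record:
#     print(record["author"])  # list of {'given': ..., 'family': ...} dicts
```

Combined with a bound on how many items are consumed, this scales to a list of 80 titles.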
Great library. But I just discovered that sample doesn't work when combined with a filter.
w = Works().filter(type='journal-article').sample(5).url
w
returns
'https://api.crossref.org/works?sample=5'
Would expect something like this:
http://api.crossref.org/works?filter=type:journal-article&sample=5
Hi,
I checked the package from different computers and consistently get:
ReadTimeout: HTTPSConnectionPool(host='api.crossref.org', port=443): Read timed out. (read timeout=10).
I was wondering if the library allows fetching only works that have an abstract, and if there is a way to fetch the abstract.
I want to read 200 results in 50-result chunks. The reading is not done concurrently, but sometimes with other requests in between. How do I tell the API to give me the next 50 results (51 to 100)?
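The REST API itself supports offset-based paging via the rows and offset parameters (per the public Crossref docs; for very deep paging Crossref recommends cursors instead), so results 51-100 are rows=50&offset=50. A sketch:

```python
def page_params(page, per_page=50):
    """Query parameters for the given zero-based page of results."""
    return {"rows": per_page, "offset": page * per_page}

print(page_params(1))  # {'rows': 50, 'offset': 50} -> results 51 to 100
# import requests
# resp = requests.get("https://api.crossref.org/works",
#                     params={"query": "zika", **page_params(1)})
```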
Are results ranked in a fixed order? If I search the same keywords with a different number in sample(), how can I get different results the second time?
For example, works.query(bibliographic=key_words).sample(10) gets 10 results.
Then with works.query(bibliographic=key_words).sample(20), how do I get 20 new results instead of the 10 old ones plus 10 new ones?
Thanks!
Hi,
Thank you for maintaining one of the documented libraries for using the Crossref REST API.
We’ve been working on a new version of the REST API, replacing the Solr backend with Elasticsearch and moving from our own hardware in a datacenter to a cloud platform.
We plan to cut over to the new version shortly (expect an official announcement on our blog in the next few days with more details), and wanted to invite you to test it out before the official cutover.
Please check it out at https://api.production.crossref.org/
During the cutover phase (expected to last a few weeks), traffic will be redirected to the above domain on a pool by pool basis. Once all traffic is using the new service, we will continue to use the api.crossref.org
domain, so please do not update anything to use the temporary domain.
Let me know if you have any questions. Issues can be filed into our GitLab issue repository, or I’ll keep an eye on this thread.
Thanks again,
Patrick
When I search for a DOI, the affiliations for the authors are missing. Example:
from crossref.restful import Works
import json
doi = "10.1016/j.jbusvent.2019.105970"
works = Works()
res = works.doi(doi)
with open("doi.json","w", encoding="utf8") as fileh:
json.dump(res, fileh, ensure_ascii=False, indent=4, sort_keys=True)
The relevant rows in the output file doi.json are:
"author": [
{
"affiliation": [],
"family": "Douglas",
"given": "Evan J.",
"sequence": "first"
},
{
"affiliation": [],
"family": "Shepherd",
"given": "Dean A.",
"sequence": "additional"
},
{
"affiliation": [],
"family": "Prentice",
"given": "Catherine",
"sequence": "additional"
}
],
with the affiliations of the authors missing.
When I import crossref.restful, I get an error:
from crossref.restful import Works
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "crossref.py", line 1, in <module>
    from crossref.restful import Works
ImportError: No module named restful
Does this API support passing metadata plus an API token?
How do I query with multiple words? I used "+", but the results I get from the Crossref website are different from those from crossrefapi.
Hi,
https://github.com/CrossRef/rest-api-doc#etiquette says that in order to get into the "polite pool" I have to use HTTPS and include a mailto parameter in the query or in the user agent.
How do I do this with this lib?
Thank you
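Per the etiquette doc linked above, the polite pool only needs HTTPS plus a mailto, either as a query parameter or embedded in the User-Agent. Recent versions of this library expose an Etiquette helper for this (worth checking the README); with plain requests the same effect can be sketched like so, where the app name, version, URL, and email are placeholder values:

```python
def polite_headers(app_name, version, url, email):
    """Build a User-Agent that identifies the caller, per Crossref etiquette."""
    return {"User-Agent": f"{app_name}/{version} ({url}; mailto:{email})"}

headers = polite_headers("MyApp", "0.1", "https://example.org", "me@example.org")
print(headers["User-Agent"])
# import requests
# resp = requests.get("https://api.crossref.org/works",
#                     params={"query": "zika", "mailto": "me@example.org"},
#                     headers=headers)
```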
Does this library automatically apply throttling to comply with Crossref rate limits? I.e., as an extreme example, if a non-Plus caller invokes doi() 100 times a second, would crossrefapi throttle the outgoing requests and make the caller wait so as not to exceed Crossref API limits?
If not, is there a way for the caller to programmatically find out the rate limit currently in effect and throttle its doi() invocations accordingly?
See also: https://api.crossref.org/swagger-ui/index.html. It seems that Crossref signals current rate limits using HTTP headers.
Presumably, complying with rate limits is preferable and guarantees not running into any further limiting.
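Whether or not the library throttles internally, a caller can enforce a client-side ceiling derived from the x-rate-limit headers mentioned above. A minimal sketch with an injected clock so the pacing logic stays testable; the usage comments and the 50 req/s figure are assumptions taken from the headers quoted earlier:

```python
class Throttle:
    """Paces calls so at most max_requests happen per interval seconds."""

    def __init__(self, max_requests, interval=1.0):
        self.min_gap = interval / max_requests
        self.next_allowed = 0.0

    def delay(self, now):
        """Seconds the caller should sleep before issuing the next request."""
        wait = max(0.0, self.next_allowed - now)
        self.next_allowed = max(now, self.next_allowed) + self.min_gap
        return wait

# Hypothetical usage around works.doi():
# import time
# throttle = Throttle(max_requests=50, interval=1.0)  # from x-rate-limit headers
# for doi in dois:
#     time.sleep(throttle.delay(time.monotonic()))
#     record = works.doi(doi)
```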
Hi there, great work!!!
I am a first-time user, trying to wrap my head around downloading a PDF based on its DOI, something like
works.doi.download('10.1590/0102-311x00133115', '~/Downloads/')
that would result in the PDF landing in my Downloads folder.
I guess this is something very simple; however, I could not find any example. Would you please provide one, perhaps even as a Wiki entry?
Many thanks & Merry Christmas,
Stav
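There is no built-in download helper, but Crossref metadata sometimes carries publisher-deposited full-text URLs under the 'link' field (visible in the JSON response quoted earlier in this thread). One can extract those URLs and fetch them with requests, subject to publisher access rules. A sketch; the sample record reuses the Springer link from that earlier response:

```python
def full_text_links(record, content_type=None):
    """Extract full-text URLs from a Crossref work record's 'link' field."""
    links = record.get("link") or []
    return [entry["URL"] for entry in links
            if content_type is None or entry.get("content-type") == content_type]

record = {"link": [{"URL": "http://link.springer.com/content/pdf/10.1007/978-3-663-19510-8_3",
                    "content-type": "unspecified"}]}
print(full_text_links(record))
# import requests
# data = requests.get(full_text_links(record)[0]).content  # access permitting
# with open("paper.pdf", "wb") as fh:
#     fh.write(data)
```

Note that many records have no 'link' entries at all, in which case the list is empty and the PDF must come from elsewhere.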