
crossrefapi's People

Contributors

1kastner, air-kyi, ankush-chander, benselme, bluetyson, daguiam, danidelvalle, fabiobatalha, geritwagner, markpbaggett, richardscottoz


crossrefapi's Issues

Too Many Requests Error

Hi @fabiobatalha
I am using this library and I am getting a Too Many Requests error, because I issued multiple requests at the same time.
How should I handle this error? Please advise.
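A common way to handle a Too Many Requests response is to retry with exponential backoff. Below is a minimal sketch, not part of crossrefapi; the `fetch` callable and the `RuntimeError` it raises are stand-ins for whatever call fails with HTTP 429 in your code:

```python
import time

def fetch_with_backoff(fetch, max_retries=5, base_delay=1.0):
    """Call fetch(), retrying with exponential backoff on a 429-style error."""
    for attempt in range(max_retries):
        try:
            return fetch()
        except RuntimeError:  # stand-in for a "too many requests" exception
            time.sleep(base_delay * (2 ** attempt))
    raise RuntimeError("still rate-limited after %d retries" % max_retries)

# Demo with a fake fetcher that fails twice, then succeeds.
calls = {"n": 0}
def fake_fetch():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("429 Too Many Requests")
    return {"status": "ok"}

result = fetch_with_backoff(fake_fetch, base_delay=0.001)
```

Spacing requests out like this usually avoids the error entirely for sequential workloads.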

Add Missing Documentation for Classes and Methods

Description:

The code currently lacks proper documentation, making it difficult for users to understand the classes and methods and their intended usage. In order to improve the code's usability and maintainability, we should add comprehensive documentation.

Documentation Status:

  • Classes have no documentation.
  • Some methods have documentation while others don't.
  • Docstrings are missing throughout the code.

Action Required:

  • Add docstrings to classes, methods, and functions where missing.
  • Improve or complete existing docstrings.
  • Ensure consistent style and formatting of the documentation.

Expected Documentation Style:
We can use PEP 257 style docstrings for documenting classes, methods, and functions. Refer to the PEP 257 documentation for guidelines.

Specific Examples:

  • The Endpoint and Works classes have no documentation.
  • The do_http_request method in the HTTPRequest class has no documentation.
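For reference, a PEP 257-style docstring for a method like do_http_request could look like the following. This is a generic sketch, not code taken from the library; the signature shown is an assumption for illustration:

```python
def do_http_request(method, endpoint, data=None):
    """Perform an HTTP request against a Crossref endpoint.

    Arguments:
        method: HTTP verb, e.g. 'get' or 'post'.
        endpoint: full URL of the API route.
        data: optional payload or query parameters.

    Returns:
        The response object from the underlying HTTP library.
    """
    raise NotImplementedError  # illustration only; the real method does the request
```

The one-line summary, blank line, and indented detail sections follow the PEP 257 conventions referenced above.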

Question: matching titles to doi

Apologies if this isn't the right channel to ask.

I'm trying to match titles to their DOI with a simple loop

for article in articles:
    work = works.query(bibliographic=article.title)
    for w in work:
        # w is a dict, so membership ('title' in w) is the right check, not hasattr
        if hasattr(article, 'title') and 'title' in w and w['title'][0] == article.title:
            article.doi = w['DOI']
            print(article.doi)
            article.save()
        else:
            print('not found', article.title)

But since work contains over 80k results, the method is too slow to be useful. I have also tried .sample(20), hoping it would narrow the search, but it didn't match any titles. Is that because the sample is random?

Is there any way I can just fetch the first items from the work class? It seems they always contain the match I need.
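If the iterator yields results in relevance order, one option is to stop after the first few items with itertools.islice instead of exhausting all 80k results. A sketch using a plain generator in place of the works iterator (no network involved; the fake records are invented):

```python
from itertools import islice

def first_n(iterable, n):
    """Take at most n items from a (possibly huge) lazy iterator."""
    return list(islice(iterable, n))

# Stand-in for works.query(bibliographic=...): a lazy stream of records.
fake_results = ({"DOI": "10.0000/%d" % i} for i in range(80_000))
top = first_n(fake_results, 5)
```

Because islice stops pulling from the iterator after n items, only the first few HTTP pages would ever be requested from a real result set.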

from-accepted-date filter returning UrlSyntaxError

I was trying to get my own works using the from_accepted_date filter and it returned this error.

from crossref.restful import Works
works = Works()

pub_date = '2001'
author='aguiam'
pub = works.query(author=author).filter(from_accepted_date=pub_date).sort('published')

UrlSyntaxError: Filter from-accepted-date specified but there is no such filter for this route. Valid filters for this route are: from-event-start-date, has-update, has-abstract, article_number, until-update-date, from-posted-date, license.delay, has-update-policy, prefix, has-content-domain, has-authenticated-orcid, type, relation.type, from-event-end-date, has-orcid, archive, full-text.version, until-event-end-date, from-pub-date, until-index-date, has-full-text, has-assertion, until-posted-date, until-print-pub-date, has-affiliation, funder-doi-asserted-by, license.version, assertion, has-funder, member, from-created-date, has-domain-restriction, from-index-date, full-text.application, has-event, until-pub-date, until-event-start-date, from-deposit-date, relation.object-type, has-award, clinical-trial-number, assertion-group, until-deposit-date, award.funder, until-accepted-date, from-online-pub-date, until-online-pub-date, has-archive, license.url, orcid, type-name, isbn, full-text.type, has-relation, from-print-pub-date, until-created-date, from-update-date, has-clinical-trial-number, has-references, content-domain, doi, award.number, until-issued-date, has-license, issn, alternative_id, group-title, relation.object, is-update, container-title, directory, category-name, funder, from-accepted_date, has-funder-doi, update-type, updates, from-issued-date

Unexpected query output

When trying to retrieve information via simple queries, I consistently got outputs that I did not expect. Specifically, the publications which are referred to by the keywords are not returned in the result of the query. I do however get a return with the right publication data via a manual HTTP GET request.

Example code:

from crossref.restful import Works 

keyword = 'Albert Einstein Elektrodynamik bewegter Körper'

works = Works()
result = works.query(keyword)
for entry in result:
    print(entry)
    break
>> {'indexed': {'date-parts': [[2019, 11, 19]], 'date-time': '2019-11-19T19:11:52Z', 'timestamp': 1574190712445}, 'reference-count': 0, 'publisher': 'Maney Publishing', 'issue': '1', 'content-domain': {'domain': [], 'crossmark-restriction': False}, 'short-container-title': ['Journal of the American Institute for Conservation'], 'published-print': {'date-parts': [[1980]]}, 'DOI': '10.2307/3179679', 'type': 'journal-article', 'created': {'date-parts': [[2006, 4, 18]], 'date-time': '2006-04-18T05:15:34Z', 'timestamp': 1145337334000}, 'page': '21', 'source': 'Crossref', 'is-referenced-by-count': 0, 'title': ['A Semi-Rigid Transparent Support for Paintings Which Have Both Inscriptions on Their Fabric Reverse and Acute Planar Distortions'], 'prefix': '10.1179', 'volume': '20', 'author': [{'given': 'Albert', 'family': 'Albano', 'sequence': 'first', 'affiliation': []}], 'member': '138', 'container-title': ['Journal of the American Institute for Conservation'], 'deposited': {'date-parts': [[2015, 6, 26]], 'date-time': '2015-06-26T01:05:23Z', 'timestamp': 1435280723000}, 'score': 4.5581737, 'issued': {'date-parts': [[1980]]}, 'references-count': 0, 'journal-issue': {'published-print': {'date-parts': [[1980]]}, 'issue': '1'}, 'URL': 'http://dx.doi.org/10.2307/3179679', 'ISSN': ['0197-1360'], 'issn-type': [{'value': '0197-1360', 'type': 'print'}]}

I get this kind of output, which has nothing to do with my input keyword, with other keywords too. I have also tried modifying the order of the results (result.order('desc')), but that does not seem to change anything.

When I then do the same request via HTTP GET and the normal API URL, I get the expected output as the first result:

import requests

keyword = 'Albert Einstein Elektrodynamik bewegter Körper'

keyword = '+'.join(keyword.split())
url = 'https://api.crossref.org/works?query=' + keyword
result = requests.get(url = url)
# Take first result
result = result.json()['message']['items'][0]
print(result)

>> {'indexed': {'date-parts': [[2020, 5, 25]], 'date-time': '2020-05-25T14:23:45Z', 'timestamp': 1590416625775}, 'publisher-location': 'Wiesbaden', 'reference-count': 0, 'publisher': 'Vieweg+Teubner Verlag', 'isbn-type': [{'value': '9783663193722', 'type': 'print'}, {'value': '9783663195108', 'type': 'electronic'}], 'content-domain': {'domain': [], 'crossmark-restriction': False}, 'published-print': {'date-parts': [[1923]]}, 'DOI': '10.1007/978-3-663-19510-8_3', 'type': 'book-chapter', 'created': {'date-parts': [[2013, 12, 6]], 'date-time': '2013-12-06T02:08:43Z', 'timestamp': 1386295723000}, 'page': '26-50', 'source': 'Crossref', 'is-referenced-by-count': 5, 'title': ['Zur Elektrodynamik bewegter Körper'], 'prefix': '10.1007', 'author': [{'given': 'A.', 'family': 'Einstein', 'sequence': 'first', 'affiliation': []}], 'member': '297', 'container-title': ['Das Relativitätsprinzip'], 'link': [{'URL': 'http://link.springer.com/content/pdf/10.1007/978-3-663-19510-8_3', 'content-type': 'unspecified', 'content-version': 'vor', 'intended-application': 'similarity-checking'}], 'deposited': {'date-parts': [[2013, 12, 6]], 'date-time': '2013-12-06T02:08:45Z', 'timestamp': 1386295725000}, 'score': 53.638336, 'issued': {'date-parts': [[1923]]}, 'ISBN': ['9783663193722', '9783663195108'], 'references-count': 0, 'URL': 'http://dx.doi.org/10.1007/978-3-663-19510-8_3'}

The output that I retrieved with the tool in this repository has nothing to do with my query keyword. Do you have an idea how I can fix this? I would be very grateful for any help.

Please add Depositor example to Readme.md

I'm interested in using this project to do deposits into crossref. Could you add an example of how to use the Depositor class?

Also, does the Depositor support resource-only deposits?

Thanks!

order of sample and filter produces different API calls

Filter before sample does not seem to be honored, and no error is raised:

>>> works.filter(type='book').sample(10).url
https://api.crossref.org/works?sample=10

Sample before filter works as expected:

>>> works.sample(10).filter(type='book').url
https://api.crossref.org/works?sample=10&filter=type%3Abook

support for "polite" usage of the API

We have added some new header and parameter support for providing contact information. This is designed to help us troubleshoot problems with the API. See the section on etiquette at api.crossref.org. We would love to see support for this.
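Until the library grows native support, Crossref's documented etiquette can be followed with plain requests by adding a mailto query parameter and a descriptive User-Agent header. A sketch (the project name and email address are placeholders):

```python
from urllib.parse import urlencode

def polite_request_parts(query, mailto):
    """Build the query string and headers suggested by Crossref's etiquette docs."""
    params = urlencode({"query": query, "mailto": mailto})
    headers = {"User-Agent": "my-project/1.0 (mailto:%s)" % mailto}
    return "https://api.crossref.org/works?" + params, headers

url, headers = polite_request_parts("zika", "me@example.org")
# pass `headers=headers` to requests.get(url, ...) when actually calling the API
```

Requests carrying contact information are routed to Crossref's "polite" pool, which tends to be more reliable.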

supporting proxies

Great package! It would be nice to add support for proxies.

The requests.get method accepts proxies (as a dictionary):

dict_proxies = {
    'https': 'https://username:password@HOST:PORT',
    'http': 'http://username:password@HOST:PORT',
}
requests.get(url, proxies=dict_proxies)

works.filter() and more than 1000 results

I'm using the filter (for i in works.filter(...)) to select a journal (by ISSN) and a one-month date period (from-pub-date and until-pub-date) to gather articles from the same issue/volume of a journal. This seems to work, but when there are more than 1000 records I receive the error "Expecting value: line 1 column 1 (char 0)" when calling json.dumps(i), and it kills my code.

I can't figure out why this is happening. Any ideas?
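Independently of that error, the REST API's documented route for walking large result sets is cursor-based deep paging: start with cursor=* and pass each response's next-cursor value back in. A sketch of the URL handling only (no network; the filter string is an example):

```python
from urllib.parse import urlencode

def cursor_url(filters, cursor="*", rows=1000):
    """Build a deep-paging URL; pass the previous response's next-cursor back in."""
    params = urlencode({"filter": filters, "rows": rows, "cursor": cursor})
    return "https://api.crossref.org/works?" + params

first = cursor_url("issn:1946-3944,from-pub-date:2020-01-01,until-pub-date:2020-01-31")
```

Each real response's message would contain a next-cursor field to feed into the next cursor_url call until no items are returned.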

Support for rows missing

Cool library. I find it a pity, however, that sample(n) is the only way to limit results.
It would be great if you could also support the rows query parameter to control the number of results returned by a query. This would also help limit the load for common use cases like "find the best (or best n) match based on title and author".

Configuration for different API endpoint (test crossref site)

It would be useful if you could set the API url for crossref API requests.

For example, as we are testing, it would be good to make requests to test.crossref.org instead of api.crossref.org so that we are not testing using the production site.

Thanks!

Result inconsistency

If I run the code below it gives me 0 results, which is expected:

journals.works('1946-3944').filter(type='journal-article').filter(from_created_date='2021-11-05').filter(until_created_date='2021-11-05').count()

But when I run the same query with all() and iterate over the results, it gives me some seemingly random records, even though it should return an empty result set:

journals.works('1946-3944').filter(type='journal-article').filter(from_created_date='2021-11-05').filter(until_created_date='2021-11-05').all()

query error

I am a new user. When applying the syntax

w1 = works.query(title='zika')

it returns

TypeError: list indices must be integers or slices, not str

But it is OK when querying author and other fields. Any ideas?
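For cross-checking, the underlying REST API expresses field queries as query.<field> parameters (query.bibliographic is the documented one for title-like searches), and the equivalent raw URL can be built by hand. A sketch, independent of the library; whether 'title' is an accepted field name in a given crossrefapi version is worth verifying separately:

```python
from urllib.parse import urlencode

def field_query_url(**fields):
    """Build a Crossref /works URL using query.<field>=... parameters."""
    params = {"query.%s" % name: value for name, value in fields.items()}
    return "https://api.crossref.org/works?" + urlencode(params)

url = field_query_url(bibliographic="zika")
```

Fetching that URL directly with requests is a quick way to see whether the error comes from the API or from the client library.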

It seems that many works don't have abstracts in Crossref

I have got 10000+ Wiley DOIs via the Crossref API (works.query() without has_abstract=true, because it returns 0 results with has_abstract=true) and tried two different ways to fetch abstracts, but neither works.
If I fetch abstracts via the Wiley API, it downloads the full texts (PDF), which wastes a lot of time on parsing.
So how can I get abstracts through Crossref, or does Crossref not have abstracts for these DOIs?
Thanks for your help!
Here are my approaches to fetching abstracts (screenshots omitted):
1. the crossref.restful Works API
2. plain requests against the Crossref API URL

'crossref' is not a package error

I installed crossrefapi using pip and it seemed to install fine. When I attempted to use it, I got the following error: ModuleNotFoundError: No module named 'crossref.restful'; 'crossref' is not a package

Here was my code

from crossref.restful import Journals

journals = Journals()

print(journals.journal('1759-3441'))

ValueError due to header format change

Hi @fabiobatalha,

It seems the Crossref API has made some modifications to the header format.

{'date': 'Mon, 26 Jul 2021 11:41:59 GMT', 'content-type': 'application/json', 'transfer-encoding': 'chunked', 'access-control-allow-origin': '*', 'access-control-allow-headers': 'X-Requested-With', 'vary': 'Accept-Encoding', 'content-encoding': 'gzip', 'server': 'Jetty(9.4.40.v20210413)', 'x-ratelimit-limit': '50', 'x-ratelimit-interval': '1s', 'x-rate-limit-limit': '50, 50', 'x-rate-limit-interval': '1s, 1s', 'permissions-policy': 'interest-cohort=()', 'connection': 'close'}

'x-rate-limit-limit': '50, 50', 'x-rate-limit-interval': '1s, 1s',

Running below code

from crossref.restful import Works
works = Works()
w1 = works.query('zika').sample(20)
for item in w1:
    print(item["title"])

is giving following error:

Traceback (most recent call last):
  File "/home/ankush/.config/JetBrains/PyCharm2021.1/scratches/crossref_scratch.py", line 6, in <module>
    for item in w1:
  File "/media/ankush/ContinentalGroun/workplace/open_source/crossrefapi/crossref/restful.py", line 264, in __iter__
    result = self.do_http_request(
  File "/media/ankush/ContinentalGroun/workplace/open_source/crossrefapi/crossref/restful.py", line 80, in do_http_request
    self._update_rate_limits(result.headers)
  File "/media/ankush/ContinentalGroun/workplace/open_source/crossrefapi/crossref/restful.py", line 43, in _update_rate_limits
    self.rate_limits['X-Rate-Limit-Limit'] = int(headers.get('X-Rate-Limit-Limit', 50))
ValueError: invalid literal for int() with base 10: '50, 50'
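A tolerant way to deal with the duplicated header value is to split on the comma and take the first entry. A sketch of the fix (a hypothetical helper, not the library's actual code):

```python
def parse_rate_limit(value, default=50):
    """Parse 'X-Rate-Limit-Limit' header values like '50' or '50, 50'."""
    try:
        return int(str(value).split(",")[0].strip())
    except ValueError:
        return default

limit = parse_rate_limit("50, 50")
```

Falling back to a default also guards against any future header shape the server might send.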

support for select parameter

We have added a select parameter that allows finer control of response sizes. The following, for example, will return only the DOI and title of each matching record.

http://api.crossref.org/works?sample=10&select=DOI,title
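The parameter composes with others in the query string; built by hand (independently of the library), the example above looks like this, with standard URL encoding applied:

```python
from urllib.parse import urlencode

# select takes a comma-separated list of fields to return per record
params = urlencode({"sample": 10, "select": "DOI,title"})
url = "https://api.crossref.org/works?" + params
```

Restricting responses to the needed fields can shrink payloads considerably for large result sets.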

BUG: timeout when looking for works

I sometimes get timeout errors when searching for DOIs:

Traceback (most recent call last):
  File "C:\Users\delap\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\requests\models.py", line 910, in json
    return complexjson.loads(self.text, **kwargs)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.3568.0_x64__qbz5n2kfra8p0\lib\json\__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.3568.0_x64__qbz5n2kfra8p0\lib\json\decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.3568.0_x64__qbz5n2kfra8p0\lib\json\decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\Projets\h-transport-materials-dashboard\test.py", line 6, in <module>
    works.doi("10.1103/PhysRevB.4.330")
  File "C:\Users\delap\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\crossref\restful.py", line 957, in doi
    result = result.json()
  File "C:\Users\delap\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\requests\models.py", line 917, in json
    raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)
requests.exceptions.JSONDecodeError: [Errno Expecting value] <html>
<head><title>504 Gateway Time-out</title></head>
<body>
<center><h1>504 Gateway Time-out</h1></center>
</body>
</html>
: 0

This is rather new; I hadn't experienced it before.

Here's the code to reproduce:

from crossref.restful import Works

works = Works()
works.doi("10.1103/PhysRevB.4.330")
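Until the library handles gateway errors itself, a caller can wrap the lookup and retry when the gateway returns an HTML error page instead of JSON. A sketch with a fake lookup standing in for works.doi(); the assumption that recent requests versions derive their JSONDecodeError from json.JSONDecodeError is what makes the except clause catch it:

```python
import json
import time

def doi_with_retry(lookup, doi, attempts=3, delay=0.0):
    """Retry lookup(doi) when the response body is not valid JSON (e.g. a 504 page)."""
    for i in range(attempts):
        try:
            return lookup(doi)
        except json.JSONDecodeError:
            if i == attempts - 1:
                raise
            time.sleep(delay)

state = {"n": 0}
def flaky_lookup(doi):
    state["n"] += 1
    if state["n"] == 1:
        # Simulate the gateway returning an HTML error page instead of JSON.
        raise json.JSONDecodeError("Expecting value", "<html>504</html>", 0)
    return {"DOI": doi}

record = doi_with_retry(flaky_lookup, "10.1103/PhysRevB.4.330")
```

In real use, pass works.doi as the lookup and a non-zero delay to be gentle on the API.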

Question: Matching unstructured citations with DOIs

I realise this is more of a general question, but I hope I can still get some help.

I would like to get the DOIs of a list of unstructured citations (somehow, similar to this issue).

However, if I run:

unstructured_citation = "Jan Hansen, Jochen Hung, Jaroslav Ira, Judit " \
                        "Klement, Sylvain Lesage, Juan Luis Simal and " \
                        "Andrew Tompkins (eds), The European Experience: " \
                        "A Multi-Perspective History of Modern Europe. " \
                        "Cambridge, UK: Open Book Publishers, 2023."
works = Works()
work = works.query(bibliographic=unstructured_citation).sort("relevance")

I get a huge number of results in the variable work (some of which are not even related).

What am I missing? Is the bibliographic argument meant to be used for work titles only? Should I try to extract the work titles from the raw citations and then use them as part of the query?
Thank you!

get metadata of a certain paper

I know the exact title of a paper; how can I print its metadata, like author names?
(My plan is to automatically get the information for more than 80 papers for which I only know the titles.)
Does anyone have some ideas?
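One workable pattern is to query with the title and keep only the candidate whose title matches exactly, then read the metadata off that record. A sketch over plain dicts shaped like Crossref work items (no network; the sample records are invented):

```python
def find_by_title(candidates, title):
    """Return the first record whose first title matches `title` exactly."""
    for item in candidates:
        titles = item.get("title") or []
        if titles and titles[0] == title:
            return item
    return None

# Stand-in for iterating works.query(bibliographic=title):
candidates = [
    {"title": ["Some Other Paper"], "DOI": "10.0/other"},
    {"title": ["Zur Elektrodynamik bewegter Körper"], "DOI": "10.0/match",
     "author": [{"given": "A.", "family": "Einstein"}]},
]
hit = find_by_title(candidates, "Zur Elektrodynamik bewegter Körper")
authors = ["%s %s" % (a["given"], a["family"]) for a in hit["author"]]
```

Normalizing case and whitespace before comparing titles makes the matching more forgiving in practice.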

sample doesn't work with filters

Great library! But I just discovered that sample doesn't work when combined with a filter.

w = Works().filter(type='journal-article').sample(5).url
w

returns

'https://api.crossref.org/works?sample=5'

Would expect something like this:

http://api.crossref.org/works?filter=type:journal-article&sample=5

Question regarding paging

I want to read 200 results in 50-result chunks. The reading is not done concurrently, and other requests sometimes happen in between. How do I tell the API to give me the next 50 results (51 to 100)?
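The raw REST API exposes rows and offset parameters for shallow paging of exactly this kind (cursor-based paging is the documented alternative for deep result sets). Built by hand, successive 50-row pages look like this; the sketch is independent of the library:

```python
from urllib.parse import urlencode

def page_url(query, page, rows=50):
    """URL for page `page` (0-based) of `rows`-sized result chunks."""
    params = urlencode({"query": query, "rows": rows, "offset": page * rows})
    return "https://api.crossref.org/works?" + params

urls = [page_url("zika", p) for p in range(4)]  # results 1-200 in 50-row chunks
```

Because each page is addressed by an absolute offset, other requests happening in between do not disturb the sequence.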

result rank

Are the results consistently ranked? If I search the same keywords with a different number in sample(), how can I get different results the second time?

For example, works.query(bibliographic=key_words).sample(10) gets 10 results.
Then with works.query(bibliographic=key_words).sample(20), how do I get 20 new results instead of the 10 old ones plus 10 new ones?

Thanks!

New Crossref REST API coming soon

Hi,

Thank you for maintaining one of the documented libraries for using the Crossref REST API.

We’ve been working on a new version of the REST API, replacing the Solr backend with Elasticsearch and moving from our own hardware in a datacenter to a cloud platform.

We plan to cutover to the new version shortly (expect an official announcement on our blog in the next few days with more details), and wanted to invite you to test it out before the official cutover.

Please check it out at https://api.production.crossref.org/

During the cutover phase (expected to last a few weeks), traffic will be redirected to the above domain on a pool by pool basis. Once all traffic is using the new service, we will continue to use the api.crossref.org domain, so please do not update anything to use the temporary domain.

Let me know if you have any questions. Issues can be filed into our GitLab issue repository, or I’ll keep an eye on this thread.

Thanks again,
Patrick

Affiliation missing

When I search for a DOI, the affiliations of the authors are missing. Example:

from crossref.restful import Works
import json

doi = "10.1016/j.jbusvent.2019.105970"
works = Works()
res = works.doi(doi)

with open("doi.json","w", encoding="utf8") as fileh:
    json.dump(res, fileh, ensure_ascii=False, indent=4, sort_keys=True)

The relevant rows in the output file doi.json are:

"author": [
        {
            "affiliation": [],
            "family": "Douglas",
            "given": "Evan J.",
            "sequence": "first"
        },
        {
            "affiliation": [],
            "family": "Shepherd",
            "given": "Dean A.",
            "sequence": "additional"
        },
        {
            "affiliation": [],
            "family": "Prentice",
            "given": "Catherine",
            "sequence": "additional"
        }
    ],

with the affiliations of the authors missing.

ImportError: No module named restful

When I import crossref.restful, I get an error:

from crossref.restful import Works
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "crossref.py", line 1, in <module>
    from crossref.restful import Works
ImportError: No module named restful

Multiple words in query

How do I query with multiple words? I used "+", but the results I get from the Crossref website are different from those returned by crossrefapi.
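When building the request by hand, the whole phrase should be URL-encoded rather than joined with literal "+" characters; standard encoding turns spaces into "+" (or "%20") for you. A quick sketch:

```python
from urllib.parse import urlencode

# Let the encoder handle spaces instead of inserting '+' manually.
params = urlencode({"query": "machine learning models"})
url = "https://api.crossref.org/works?" + params
```

Passing the plain multi-word string straight to the client (and letting it encode) avoids double-encoding mismatches with the website's results.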

Support for rate limiting

  1. Does this library automatically apply throttling to comply with Crossref rate limits? As an extreme example, if a non-Plus caller invokes doi() 100 times a second, would crossrefapi throttle the outgoing requests and make the caller wait so as not to exceed the Crossref API limits?

  2. If not, is there a way for the caller to programmatically find out the rate limit currently in effect and throttle its doi() invocations accordingly?

See also: https://api.crossref.org/swagger-ui/index.html. It seems that Crossref signals current rate limits using HTTP headers.

Presumably, complying with rate limits is preferable and guarantees not running into any further limiting.
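The limit headers can be turned into a minimum delay between calls: a limit of 50 per 1-second interval implies at least 0.02 s between requests. A sketch (header names as signalled by the API; the assumption that the interval is expressed in whole seconds like '1s' is noted in the docstring):

```python
def min_delay(headers):
    """Derive seconds-per-request from Crossref rate-limit headers.

    Assumes the interval is expressed in whole seconds, e.g. '1s', and
    tolerates duplicated values like '50, 50'.
    """
    limit = int(str(headers.get("x-rate-limit-limit", "50")).split(",")[0].strip())
    interval = str(headers.get("x-rate-limit-interval", "1s")).split(",")[0].strip()
    seconds = int(interval.rstrip("s"))
    return seconds / limit

delay = min_delay({"x-rate-limit-limit": "50", "x-rate-limit-interval": "1s"})
```

Sleeping for that delay between doi() calls keeps a sequential client under the advertised limit.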

Wiki example: download PDF from DOI

Hi there, great work!!!

I am a first-time user, trying to wrap my head around downloading a PDF based on its DOI, something like
works.doi.download('10.1590/0102-311x00133115', '~/Downloads/')
that would result in the PDF landing in my Downloads folder.

I guess this is something very simple; however, I could not find any example. Would you please provide one, perhaps even as a Wiki entry?

Many thanks & Merry Christmas,
Stav
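There is no built-in download helper that I know of, but the metadata returned for a work often carries a link list with full-text URLs that can be fetched separately. A sketch of extracting those URLs from an already-retrieved record; the sample dict is invented, and many publishers do not expose an openly downloadable PDF link at all:

```python
def fulltext_links(record, content_type=None):
    """Collect candidate full-text URLs from a Crossref work record."""
    links = record.get("link") or []
    return [entry["URL"] for entry in links
            if content_type is None or entry.get("content-type") == content_type]

# Stand-in for the dict returned by works.doi("10.1590/..."):
record = {"link": [{"URL": "http://example.org/file.pdf",
                    "content-type": "application/pdf"}]}
urls = fulltext_links(record, "application/pdf")
```

The returned URLs would then be downloaded with an ordinary HTTP client, subject to the publisher's access rules.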
