Comments (12)

xlcnd commented on May 28, 2024

Start with this TEMPLATE and take a look at some built-in providers (openl, goob) and at the plugin isbnlib-porbase.

The pattern to follow is very easy (see goob):

  1. Find the appropriate url for your web service.
  2. Create a function named query.
  3. Make a call to the web service using isbnlib.dev.webquery. (Most services respond with JSON data.)
  4. Select the relevant data.
  5. Parse that data in order to get a canonical set of fields (ISBN-13, Title, Authors, Publisher, Year and Language).
  6. Use isbnlib.dev.stdmeta to clean and validate the data.

And that's all! All the rest is error handling...
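
Steps 4 and 5 above can be sketched as follows. This is only a rough illustration: the sample reply, its keys (records, title, creators, issued, lang) and the helper names map_record and parse_response are all made up here, so a real service will look different. In a real plugin, the raw data would come from isbnlib.dev.webquery (step 3) and the result would go through isbnlib.dev.stdmeta (step 6); both are left out so the sketch runs offline.

```python
import json

# Hypothetical reply from an imaginary service; real services use other keys.
SAMPLE_RESPONSE = """
{"records": [{"title": " An Example Title ",
              "creators": ["Ana Autor", "Rui Escritor"],
              "publisher": "Editora Exemplo",
              "issued": "2015-03-01",
              "lang": "pt"}]}
"""


def map_record(isbn, record):
    """Step 5: map one service record to the canonical set of fields."""
    return {
        "ISBN-13": isbn,
        "Title": record.get("title", "").strip(),
        "Authors": [name.strip() for name in record.get("creators", [])],
        "Publisher": record.get("publisher", "").strip(),
        "Year": record.get("issued", "")[:4],  # keep only the year part
        "Language": record.get("lang", ""),
    }


def parse_response(isbn, raw):
    """Step 4: select the relevant data (here, the first record, if any)."""
    records = json.loads(raw).get("records", [])
    if not records:
        return {}
    return map_record(isbn, records[0])
```

Calling parse_response("9789720000000", SAMPLE_RESPONSE) returns a dict with the six canonical keys, ready to be cleaned and validated in step 6.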

If you need help, please write a comment here.

Take a look at https://github.com/xlcnd/isbnlib/projects/2.

from isbnlib.

xlcnd commented on May 28, 2024

There is a new plugin for Portuguese books called isbnlib-porbase!

More quality data is needed for books in French, German, Spanish, Italian, Chinese, Russian, ... that can only be provided by local sources.

So please consider contributing metadata plugins for books in your language.


xlcnd commented on May 28, 2024

There is a new plugin for French books called isbnlib-bnf.

We need German, Spanish, Italian, Chinese, Russian, ... local sources too.


xlcnd commented on May 28, 2024

You can now have the Library of Congress (US) as a metadata provider; just install isbnlib-loc.

NOTE: Many countries have national libraries with web services that provide metadata (mainly for local books). Most of these services use the SRU protocol developed by LoC. It is very easy to take, for instance, isbnlib-bnf and adapt it into a new plugin for books in your language. A list of libraries that use this protocol can be found here.
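
As a rough idea of what an SRU request looks like, here is a small sketch that builds a searchRetrieve URL for an ISBN. The parameter names (version, operation, query, maximumRecords) are the standard SRU ones, but the base URL is a placeholder and the CQL index name bath.isbn is an assumption: each library documents its own endpoint and index names, so check those before adapting this.

```python
from urllib.parse import urlencode


def sru_isbn_url(base_url, isbn, index="bath.isbn", max_records=1):
    """Build an SRU searchRetrieve URL that looks up a single ISBN.

    `index` is an assumption: the CQL index used for ISBNs differs
    between libraries, so check the documentation of the target service.
    """
    params = {
        "version": "1.2",
        "operation": "searchRetrieve",
        "query": "{}={}".format(index, isbn),  # a minimal CQL query
        "maximumRecords": str(max_records),
    }
    return base_url + "?" + urlencode(params)


# Placeholder endpoint, not a real service.
url = sru_isbn_url("https://sru.example.org/catalog", "9782070360024")
```

The reply is XML, so the plugin still has to pick the relevant fields out of the returned records.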


arangb commented on May 28, 2024

Hello,
Three years ago (unaware of the existence of isbnlib) I developed a Python program to parse the web page of the Ministerio de Cultura (MCU) in Spain, which has a huge ISBN database of all books printed in Spain. The main problem is that it is not an API (like BNF or porbase) that returns the results of your query in JSON or XML. It's just a web page with a submit form:
http://www.mcu.es/webISBN/tituloSimpleFilter.do?cache=init&prev_layout=busquedaisbn&layout=busquedaisbn&language=es

But I made it work using mechanize and used it successfully to scan more than 1500 books.
I learned about isbnlib recently and decided to contribute: I downloaded the template and edited the files, mostly following the examples from porbase.
Since mechanize doesn't work with Python 3, I am now using mechanicalsoup (sudo pip install mechanicalsoup). That's the main difference I had to introduce: we need to manipulate the MCU web page to fill in the form with our ISBN, submit the query, and get back a new web page that we can then parse very crudely by matching text strings. I wish we had an API for Spanish books, but I don't know of any!

In any case, if you think this could be useful, here is my code:
https://github.com/arangb/isbnlib-mcues

A few comments:

  1. Obviously, we now need to import mechanicalsoup, so that's a new dependency. Your webquery and webservice assume the URL contains the query, but I cannot use that with the MCU service. I could not figure out how to handle forms using just urllib and urllib2. With mechanicalsoup it is just a few lines.
  2. I have not added a timeout/throttle to my query, like you have in webquery. Perhaps I should.
    Without one, someone could send very rapid queries and overload the MCU service, which we should try to avoid!
  3. The MCU ISBN database website has been very stable for the last five years, so the code works for retrieving the info. Of course, they could change the URL (with a new Ministry name, which happens often in Spain) or change the way they display the results, and then we would need to adapt... So I am not sure how long-lived this will be! That's why an XML API service would be so much nicer! But alas!
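
For point 2, a throttle can be as small as the sketch below. This is not how isbnlib's webquery does it; the Throttle class and its parameters are made up for illustration. It simply makes sure that at least min_interval seconds pass between two consecutive calls.

```python
import time


class Throttle:
    """Enforce a minimum delay between consecutive calls to wait()."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock  # injectable, so the class can be tested without waiting
        self._sleep = sleep
        self._last = None    # time of the previous call, None before the first one

    def wait(self):
        """Block until min_interval seconds have passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

A plugin would create one module-level instance, for example throttle = Throttle(1.0), and call throttle.wait() right before each request to the service.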

Take a look at the code and let me know what you think. The isbn_mcues/test/_test_metadata.py has a few examples.
I hope this is useful!


xlcnd commented on May 28, 2024

Thanks for your interest in isbnlib!

Metadata of Spanish books would be an important contribution to the project, since they are underrepresented in the default providers (Google Books and Open Library).

In relation to your comments:

  1. In principle, you can get rid of mechanicalsoup, since handling forms with urllib and urllib2 is very easy. Read Stack Overflow and the relevant documentation on python.org. You just have to be careful with hidden fields, request headers and session cookies (you can use Chrome Dev Tools to see the full request exactly).

  2. You should add at least a throttle mechanism so that people don't abuse the site!

  3. It is a shame that public services don't implement an API. However, if you parse these pages using carefully chosen patterns, they make a very stable API. For this, it is better to use regex than general HTML parsers like BeautifulSoup or mechanicalsoup.
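
To make points 1 and 3 concrete, here is a rough sketch using only the standard library (in Python 3, urllib and urllib2 became urllib.request and urllib.parse). The URL, the form field names and the HTML snippet are all invented for the example; the real ones have to be read from the actual form, including any hidden fields.

```python
import re
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder endpoint and field names; the real form will differ.
FORM_URL = "https://example.org/isbn/search"


def build_form_request(isbn):
    """Point 1: prepare a POST request that submits the search form."""
    fields = {"isbn": isbn, "lang": "es"}  # add hidden fields here if the form has any
    body = urlencode(fields).encode("ascii")
    headers = {
        "User-Agent": "isbnlib-plugin-sketch",
        "Content-Type": "application/x-www-form-urlencoded",
    }
    # Passing data makes urllib send the request as a POST.
    return Request(FORM_URL, data=body, headers=headers)


# Invented fragment of a results page, only to exercise the patterns.
SAMPLE_HTML = """
<tr><th>Título:</th><td> Ejemplo de título </td></tr>
<tr><th>Editorial:</th><td>Editorial Ejemplo</td></tr>
"""

TITLE_RE = re.compile(r"<th>Título:</th>\s*<td>(.*?)</td>", re.S)
PUBLISHER_RE = re.compile(r"<th>Editorial:</th>\s*<td>(.*?)</td>", re.S)


def extract(pattern, html):
    """Point 3: return the first captured group, or '' if there is no match."""
    match = pattern.search(html)
    return match.group(1).strip() if match else ""


req = build_form_request("9788400000000")
title = extract(TITLE_RE, SAMPLE_HTML)
```

The request would then be sent with urllib.request.urlopen(req, timeout=...), and the decoded reply passed to extract in place of SAMPLE_HTML. Session cookies, if the site needs them, can be handled with http.cookiejar.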

Let me know before you submit your project to PyPI, because there are errors in your setup.py.

I am looking forward to your contribution.

P.S. the content of __init__.py should be:

from ._mcues import query


xlcnd commented on May 28, 2024

There is a new plugin for Spanish books called isbnlib-mcues!

We need German, Italian, Chinese, Russian, ... local sources too.


arangb commented on May 28, 2024

Hello, after doing the Spanish plugin it was easy to modify it a little and get it to work for German books, using the service from the Deutsche Nationalbibliothek.
There is an SRU service that requires a token/API key and returns a MARCXML result, but I've decided to parse the web page from the general search instead: that does not require user registration for a key, so it is available to everybody.

You can find the code here:
https://github.com/arangb/isbnlib-dnb

The isbnlib_dnb/test/test_metadata.py still has the Spanish ISBNs; you'll have to modify it to test with German ISBNs.
Let me know if there is anything else I need to change, and I'll release it to PyPI.


xlcnd commented on May 28, 2024

"Super"!

I will take a look and write some tests...


xlcnd commented on May 28, 2024

There is a new plugin for German books called isbnlib-dnb!

We need Italian, Chinese, Russian, ... local sources too.


xlcnd commented on May 28, 2024

There is a new plugin that uses OCLC.ORG called isbnlib-oclc!


xlcnd commented on May 28, 2024

There is a new plugin for Italian books called sbn.

We need Chinese, Russian, ... local sources too.

