Comments (12)
Start with this TEMPLATE and take a look at some of the built-in providers (openl, goob) and at the plugin isbnlib-porbase.
The pattern to follow is very easy (see goob):
- Find the appropriate URL for your web service.
- Create a function named query.
- Make a call to the web service using isbnlib.dev.webquery. (Most services respond with data in JSON.)
- Select the relevant data.
- Parse that data in order to get a canonical set of fields (ISBN-13, Title, Authors, Publisher, Year and Language).
- Use isbnlib.dev.stdmeta to clean and validate the data.
And that's all! All the rest is error handling...
If you need help, please write a comment here.
Take a look at https://github.com/xlcnd/isbnlib/projects/2.
from isbnlib.
There is a new plugin for Portuguese books called isbnlib-porbase!
More quality data is needed for books in French, German, Spanish, Italian, Chinese, Russian, ... that can only be provided by local sources.
So please, I ask for your contribution to add metadata plugins for books in your language.
There is a new plugin for French books called isbnlib-bnf.
We need German, Spanish, Italian, Chinese, Russian, ... local sources too.
You can now have the Library of Congress (US) as a metadata provider, just install isbnlib-loc.
NOTE: Many countries have national libraries with web services that provide metadata (mainly for local books). Most of these services use the SRU protocol developed by the LoC. It is very easy to take, for instance, isbnlib-bnf and make a new plugin for books in your language. A list of libraries that use this protocol can be found here.
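As a rough illustration of what an SRU request looks like, here is a minimal sketch that builds a searchRetrieve URL. The parameter names (version, operation, query, maximumRecords) come from the SRU specification; the base URL is hypothetical, and the CQL index name (bath.isbn here) is an assumption that differs from library to library, so check the documentation of your national library. This is not taken from isbnlib-bnf.

```python
from urllib.parse import urlencode


def sru_url(base_url, isbn, index="bath.isbn", max_records=1):
    """Build an SRU searchRetrieve URL that looks up one ISBN."""
    params = {
        "version": "1.2",
        "operation": "searchRetrieve",
        # CQL query; the index name is an assumption and varies per server.
        "query": '{}="{}"'.format(index, isbn),
        "maximumRecords": str(max_records),
    }
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint, for illustration only.
print(sru_url("https://example.org/sru", "9780000000002"))
```

The XML that comes back then has to be mapped to the canonical fields, as in any other plugin.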
Hello,
Three years ago (unaware of the existence of isbnlib) I developed a Python program to parse the webpage of the Ministerio de Cultura (MCU) in Spain, which has a huge ISBN database of all books printed in Spain. The main problem is that it is not an API (like BNF or porbase) that returns the results of your query in JSON or XML. It's just a webpage with a submit form:
http://www.mcu.es/webISBN/tituloSimpleFilter.do?cache=init&prev_layout=busquedaisbn&layout=busquedaisbn&language=es
But I made it work using mechanize and used it successfully to scan more than 1500 books.
I learned about isbnlib recently and decided to contribute: I downloaded the template and edited the files, mostly following the examples from porbase.
Since mechanize doesn't work with Python3, I am now using mechanicalsoup (sudo pip install mechanicalsoup). So that's the main difference that I had to introduce: we need to manipulate the MCU webpage to fill in the form with our ISBN, submit the query, and get back a new webpage that we can then parse very crudely by matching text strings. I wish we had an API for Spanish books, but I don't know of any!
In any case, if you think this could be useful here is my code:
https://github.com/arangb/isbnlib-mcues
A few comments:
- Obviously, we now need to import mechanicalsoup, so that's a new dependency. Your webquery and webservice assume the URL contains the query, but I cannot use that with the MCU service. I could not figure out how to handle forms using just urllib and urllib2. With mechanicalsoup it is just a few lines.
- I have not added a timeout/throttle to my query, like you have in webquery. Perhaps I should. This means that someone could try to send very rapid queries and bring down the MCU service, which we should try to avoid!
- The MCU ISBN database website has been very stable for the last five years, so the code works for retrieving the info. Of course, they could change the URL (with a new Ministry name, which happens often in Spain) or change the way they display the results, and then we would need to adapt... So I am not sure how long-lived this will be! That's why an XML API service would be so much nicer! But alas!
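For reference, a throttle of the kind discussed above can be quite small. The following is a generic sketch, not the mechanism used inside isbnlib's webquery; the Throttle class and the one-second default are made up for illustration.

```python
import time


class Throttle:
    """Enforce a minimum interval between consecutive requests."""

    def __init__(self, min_interval=1.0):
        self.min_interval = min_interval  # seconds between two calls
        self._last = None                 # time of the previous call, if any

    def wait(self):
        """Sleep just long enough to respect the minimum interval."""
        if self._last is not None:
            remaining = self.min_interval - (time.monotonic() - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()


throttle = Throttle(min_interval=1.0)


def polite_query(isbn):
    throttle.wait()  # call this before every request to the site
    # ... submit the form and parse the result here ...
    return isbn
```

Calling throttle.wait() before each request means that rapid consecutive queries are spaced at least one second apart.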
Take a look at the code and let me know what you think. The isbn_mcues/test/_test_metadata.py has a few examples.
I hope this is useful!
Thanks for your interest in isbnlib!
Metadata of Spanish books would be an important contribution to the project, since they are underrepresented in the default providers (Google Books and Open Library).
In relation to your comments:
- In principle, you can get rid of mechanicalsoup, since handling forms with urllib and urllib2 is very easy. Read Stack Overflow and the relevant documentation on python.org. You just have to be careful with hidden fields, request headers and session cookies (you can use Chrome DevTools to see the full request exactly).
- You should add at least a throttle mechanism so that people don't abuse the site!
- It is a shame that public services don't implement an API; however, if you parse these pages using carefully chosen patterns, they make a very stable API. For this, it is better to use regex than general HTML parsers like beautifulsoup or mechanicalsoup.
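To make the first and last points concrete, here is a minimal sketch for Python 3, where urllib2 has been merged into urllib.request. Everything specific in it is hypothetical: the form URL, the field names and the HTML pattern are made up and are not those of the MCU site, so copy the real ones from the request shown in the browser's developer tools.

```python
import re
from urllib.parse import urlencode
from urllib.request import Request

FORM_URL = "https://example.org/search.do"  # hypothetical form action


def build_request(isbn):
    """Build the POST request that a browser would send for the form."""
    # Hypothetical fields; remember to include the form's hidden fields too.
    fields = {"isbn": isbn, "layout": "busquedaisbn"}
    body = urlencode(fields).encode("utf-8")
    headers = {
        "User-Agent": "isbnlib (gzip)",
        "Content-Type": "application/x-www-form-urlencoded",
    }
    # Passing data makes urllib send a POST instead of a GET.
    return Request(FORM_URL, data=body, headers=headers)


# Hypothetical pattern for a result row like <th>Title:</th><td>...</td>.
RE_TITLE = re.compile(
    r"<th[^>]*>\s*Title:\s*</th>\s*<td[^>]*>(.*?)</td>", re.I | re.S
)


def parse_title(html):
    """Return the title found in the result page, or None."""
    match = RE_TITLE.search(html)
    return match.group(1).strip() if match else None
```

The request would then be sent with urllib.request.urlopen(build_request(isbn)); if the site needs session cookies, an opener built with http.cookiejar can be used instead.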
Let me know before you submit your project to PyPI, because there are errors in your setup.py.
I am looking forward to your contribution.
P.S. The content of __init__.py should be:
from ._mcues import query
There is a new plugin for Spanish books called isbnlib-mcues!
We need German, Italian, Chinese, Russian, ... local sources too.
Hello, after doing the Spanish plugin it was easy to modify it a little and get it to work for German books, using the service of the Deutsche Nationalbibliothek.
There is an SRU service that requires a token/API key and returns a MARCXML result, but I've decided to parse the webpage of the general search instead: it does not require user registration for a key, so it is available to everybody.
You can find the code here:
https://github.com/arangb/isbnlib-dnb
The isbnlib_dnb/test/test_metadata.py still has the Spanish ISBNs; you'll have to modify it to test it with German ISBNs.
Let me know if there is anything else I need to change, and I'll release it to PyPI.
"Super"!
I will take a look and write some tests...
There is a new plugin for German books called isbnlib-dnb!
We need Italian, Chinese, Russian, ... local sources too.
There is a new plugin that uses OCLC.ORG called isbnlib-oclc!
There is a new plugin for Italian books called sbn.
We need Chinese, Russian, ... local sources too.