

Support hyphenation of ISBNs when converting to str() (pyisbn issue, closed)

JNRowe commented on August 26, 2024

Support hyphenation of ISBNs when converting to str()


Comments (6)

jayvdb commented on August 26, 2024

Many more years have passed, and still no hyphenation.

If I understand the licensing issue (if it is still valid in 2016), it is that while the International ISBN Agency provides a machine-readable version of their data, they do not provide a license to redistribute the information.

For example, the Intellectual Property section of https://www.isbn-international.org/content/terms-and-conditions is currently quite restrictive, beginning with:

"We are the owner or the licensee of all intellectual property rights in our site, and in the material published on it. Those works are protected by copyright laws and treaties around the world. All such rights are reserved."

And if taken literally, "Our status (and that of any identified contributors) as the authors of material on our site must always be acknowledged." would mean that any library reusing their data must emit a notice telling the user that the Agency is the author of the data used to determine correct hyphenation, perhaps only on a help screen somewhere.

In your option 3 you note "In some regions the data is freely available". I suspect that refers to the copyright laws of some countries preventing a data file of this kind from being copyrightable, for various reasons. Or did you have some other reason why the data might be freely available in some countries? Are national agencies offering the same data under different licenses?

There are lots of packages that do option 1, and it results in stale data as you have noted.

I think it would be valuable to attempt a different approach.

I am not sure precisely what the difference is between options 2 and 3, but roughly what they describe sounds much better.

If the data is downloaded by the user, somehow, then at least this library doesn't need to be overly concerned with the licensing problems. As the International ISBN Agency doesn't appear to be going after the existing online datasets of the range data, it is fairly safe to assume users are not going to be overly concerned about the risk this introduces.

Ideally this library would perform the fetching, and perform caching to ensure the data is not stale. If you want to be very paranoid, the library could have a hook that by default emits some user message the first time the data is used in a session, but the caller can replace the hook with their own implementation, so that the caller can control the message according to their legal requirements, etc.
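A minimal sketch of the caching-plus-hook idea, assuming the actual download/parse step is a caller-supplied `loader` callable and that "not stale" is approximated by a simple max-age timer. All names here (`RangeDataCache`, `default_notice`, `max_age`) are hypothetical and not part of pyisbn or any other package:

```python
import sys
import time
from typing import Any, Callable, Optional

# Hypothetical sketch: none of these names exist in pyisbn.
Loader = Callable[[], Any]       # fetches/parses range data however the caller likes
NoticeHook = Callable[[], None]  # called once per session, before data is first used


def default_notice() -> None:
    """Default hook: tell the user where the hyphenation data came from."""
    print("Hyphenation uses range data from the International ISBN Agency.",
          file=sys.stderr)


class RangeDataCache:
    """Load range data lazily, reload it after max_age seconds, notify once."""

    def __init__(self, loader: Loader, max_age: float = 86400.0,
                 notice: NoticeHook = default_notice) -> None:
        self._loader = loader
        self._max_age = max_age
        self._notice = notice
        self._data: Any = None
        self._loaded_at: Optional[float] = None
        self._notified = False

    def get(self) -> Any:
        now = time.monotonic()
        # (Re)load when nothing is cached yet or the cached copy is too old.
        if self._loaded_at is None or now - self._loaded_at > self._max_age:
            self._data = self._loader()
            self._loaded_at = now
        # Emit the notice only on the first use in this session.
        if not self._notified:
            self._notice()
            self._notified = True
        return self._data
```

A caller that handles attribution elsewhere could pass `notice=lambda: None`, or its own function, to control the message according to its legal requirements.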


jayvdb commented on August 26, 2024

I did a little analysis at https://phabricator.wikimedia.org/T132919 and couldn't find any library that deals with the staleness problem. They all appear to fetch the data at install time, and don't offer a way to update the package data except through periodic (and unsynchronised) package releases.

Here are the commit histories of the raw data for three of them:

https://pypi.python.org/pypi/isbn_hyphenate - https://github.com/TorKlingberg/isbn_hyphenate/commits/master/isbn_hyphenate/isbn_lengthmaps.py
https://pypi.python.org/pypi/isbnid - https://github.com/nekobcn/isbnid/commits/master/data
https://pypi.python.org/pypi/isbnlib - https://github.com/xlcnd/isbnlib/commits/master/isbnlib/_data/data4mask.py


JNRowe commented on August 26, 2024

tl;dr: While I appreciate the effort that went into your comment, I'm still quite unmoved on the topic.

> If I understand the licensing issue (if it is still valid in 2016), it is that while the International ISBN Agency provides a machine-readable version of their data, they do not provide a license to redistribute the information.

Yep, that is the crux. I tried contacting them about this at one point, but it was quite fruitless.

> In your option 3 you note "In some regions the data is freely available". I suspect that refers to the copyright laws of some countries preventing a data file of this kind from being copyrightable, for various reasons.

Exactly that, and I should've been clearer in the first place ;)

[ways forward]

Frankly, I'm just not seeing the benefit at my end.

At this point users who feel they need hyphenation have likely switched to another package.

> Ideally this library would perform the fetching, and perform caching to ensure the data is not stale. If you want to be very paranoid, the library could have a hook that by default emits some user message the first time the data is used in a session, but the caller can replace the hook with their own implementation, so that the caller can control the message according to their legal requirements, etc.

That is the main difference between options 2 and 3. Implementing your suggestion means either adding a chain of dependencies to handle fetching and cache validation, or writing a big chunk of code. Either way, it is a lot of code that would only very rarely be used.

If I were to implement this PR, I'd go with option 2 and have users shovel the data in themselves.
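One way to picture the option 2 contract, as a minimal sketch: the library bundles and fetches nothing, and str() only hyphenates once the user has registered a function built from data they obtained themselves. `register_hyphenator` and `HyphenatableIsbn` are hypothetical names for illustration, not pyisbn's API:

```python
from typing import Callable, Optional

# Hypothetical "option 2" sketch; none of these names exist in pyisbn.
Hyphenator = Callable[[str], str]
_hyphenator: Optional[Hyphenator] = None


def register_hyphenator(func: Optional[Hyphenator]) -> None:
    """Plug in (or remove, with None) user-supplied hyphenation."""
    global _hyphenator
    _hyphenator = func


class HyphenatableIsbn:
    """Toy ISBN wrapper whose str() hyphenates only when the user supplied data."""

    def __init__(self, digits: str) -> None:
        self.digits = digits

    def __str__(self) -> str:
        # No bundled data: without a registered hyphenator, keep the bare digits.
        if _hyphenator is None:
            return self.digits
        return _hyphenator(self.digits)
```

The registered callable could be built from whatever range data the user downloads, so the licensing question stays with the user rather than the library.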

I'll also note, based on a quick look at the site and my notes, that "option 3"-style solutions would already have needed patching since I first looked into this, to keep the fetching mechanism working.


jayvdb commented on August 26, 2024

If I understand correctly, you are more open to option 2, in which case... let's do it?

We need to decide on a data format that your library will accept, and then someone (me...) creates an external package that provides the data in the agreed-upon format, so it is easy for someone to use the two packages together.

If you're willing, so am I. Then we can work with the other isbn libraries mentioned above to see if they are interested in using a centralised data provider package.

Of the existing libraries mentioned above, https://github.com/TorKlingberg/isbn_hyphenate/blob/master/isbn_hyphenate/isbn_lengthmaps.py seems to be the best basis for a data structure; however, we probably want groups_length and publisher_length to be members of a single object.
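To make the data-format question concrete, here is a minimal sketch of a single object holding both groups_length and publisher_length, plus a routine that walks it to hyphenate an ISBN-13. The layout (lists of (start, end, length) tuples over 7-digit values, keyed by "978" and "978-0") is an assumption loosely modelled on how the Agency's range data is usually described; it is not the actual format of isbn_hyphenate or any other package, and `RangeData`, `hyphenate` and the sample ranges are hypothetical. The sample ranges are made up and are not the Agency's data:

```python
from dataclasses import dataclass, field

Ranges = list[tuple[int, int, int]]  # (start, end, length) over 7-digit values


@dataclass
class RangeData:
    """Hypothetical single object for both lookups (not an existing API)."""
    groups_length: dict[str, Ranges] = field(default_factory=dict)     # "978" -> ranges
    publisher_length: dict[str, Ranges] = field(default_factory=dict)  # "978-0" -> ranges


def _element_length(ranges: Ranges, digits: str) -> int:
    # Compare the next 7 digits as a number, right-padding short tails with zeros.
    value = int(digits[:7].ljust(7, "0"))
    for start, end, length in ranges:
        if start <= value <= end and length > 0:
            return length
    raise ValueError(f"no range covers {digits[:7]!r}")


def hyphenate(isbn13: str, data: RangeData) -> str:
    """Split a 13-digit ISBN into prefix-group-publisher-article-check."""
    if len(isbn13) != 13 or not isbn13.isdigit():
        raise ValueError("expected 13 digits")
    prefix, body = isbn13[:3], isbn13[3:]
    # Check digit excluded from both range lookups.
    g_len = _element_length(data.groups_length[prefix], body[:-1])
    group, rest = body[:g_len], body[g_len:]
    p_len = _element_length(data.publisher_length[f"{prefix}-{group}"], rest[:-1])
    publisher, article, check = rest[:p_len], rest[p_len:-1], rest[-1]
    return "-".join([prefix, group, publisher, article, check])


# Made-up ranges for illustration only; NOT the International ISBN Agency's data.
SAMPLE = RangeData(
    groups_length={"978": [(0, 5999999, 1), (6000000, 9999999, 3)]},
    publisher_length={"978-0": [(0, 1999999, 2), (2000000, 6999999, 3),
                                (7000000, 9999999, 4)]},
)

print(hyphenate("9780306406157", SAMPLE))  # 978-0-306-40615-7
```

Keeping both tables in one object means a data-only package could hand a single value to any consuming library, in the spirit of how certifi ships a single CA bundle.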

If you're not keen, I'll focus my efforts elsewhere.


JNRowe commented on August 26, 2024

I'd suggest approaching some of the other projects first to see if there is any interest in a data-only package. It is definitely an interesting solution if it gained traction, and there is some precedent with things like certifi.

[I'm not trying to pile on the stop motion here, I'm just trying to be honest about the situation as I see it.]

Another possibility to add to your list is arthurdejong/python-stdnum.


JNRowe commented on August 26, 2024

I've tried to move this forward again, but it doesn't look like it will be going anywhere.

So that this issue doesn't trip up others who think it is simply a SMOP (simple matter of programming), I'm closing this {CANT,WONT}FIX.

Times have changed... people who desire hyphenation can be served with one of the alternatives, and those of us who don't can continue without touching the licensing problem.
