
Comments (6)

jayvdb commented on August 26, 2024

Many more years have passed, and still no hyphenation.

If I understand the licensing issue correctly, and if it is still valid in 2016, the problem is that while the International ISBN Agency provides a machine-readable version of their data, they do not provide a license to redistribute it.

E.g. the Intellectual Property section of https://www.isbn-international.org/content/terms-and-conditions is currently quite restrictive, beginning with:

"We are the owner or the licensee of all intellectual property rights in our site, and in the material published on it. Those works are protected by copyright laws and treaties around the world. All such rights are reserved."

And if taken literally, "Our status (and that of any identified contributors) as the authors of material on our site must always be acknowledged." would mean that a library re-using their data has to emit a notice telling the user that the agency is the author of the data used to determine correct hyphenation, perhaps only on a help screen somewhere.

In your option 3 you note "In some regions the data is freely available". I suspect that refers to the copyright laws of some countries preventing a data file of this kind from being copyrightable, for various reasons. Or did you have some other reason why the data might be freely available in some countries? Are national agencies offering the same data under different licenses?

There are lots of packages that do option 1, and it results in stale data as you have noted.

I think it would be valuable to attempt a different approach.

I am not sure precisely what is the difference between option 2 and 3, but roughly what they describe sounds much better.

If the data is downloaded by the user, somehow, then at least this library doesn't need to be overly concerned with the licensing problems. As the International ISBN Agency doesn't appear to be going after the existing online datasets of the range data, it is fairly safe to assume users are not going to be overly concerned about the risk this introduces.

Ideally this library would perform the fetching, and perform caching to ensure the data is not stale. If you want to be very paranoid, the library could have a hook that by default emits some user message the first time the data is used in a session, but the caller can replace the hook with their own implementation, so that the caller can control the message according to their legal requirements, etc.
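A minimal sketch of that fetch, cache and notify idea, under stated assumptions: `CachedRangeData`, `default_notice`, and every parameter name here are hypothetical and not part of pyisbn. The `fetch` callable is supplied by the caller, so no download URL or file format is assumed, and the cache lives in memory only.

```python
import time
from typing import Callable, Optional


def default_notice(source: str) -> None:
    """Default hook: print a one-line attribution (wording is illustrative)."""
    print(f"ISBN range data provided by {source}.")


class CachedRangeData:
    """Fetch range data lazily, refresh it when stale, notify once per session."""

    def __init__(
        self,
        fetch: Callable[[], str],
        source: str,
        max_age: float = 7 * 24 * 3600,  # assumed refresh interval, in seconds
        notice: Callable[[str], None] = default_notice,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._fetch = fetch
        self._source = source
        self._max_age = max_age
        self._notice = notice
        self._clock = clock
        self._data: Optional[str] = None
        self._fetched_at = 0.0
        self._notified = False

    def get(self) -> str:
        now = self._clock()
        # Fetch on first use, and again once the cached copy is too old.
        if self._data is None or now - self._fetched_at > self._max_age:
            self._data = self._fetch()
            self._fetched_at = now
        # Call the hook only the first time the data is used in this session.
        if not self._notified:
            self._notice(self._source)
            self._notified = True
        return self._data
```

A caller with specific legal requirements would pass their own `notice` callable, for example one that writes to a log or an about screen, instead of relying on the default `print`.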

from pyisbn.

jayvdb commented on August 26, 2024

I did a little analysis at https://phabricator.wikimedia.org/T132919 , and I couldn't find any library that deals with the staleness problem. They all appear to fetch the data at install time, and don't offer a way to update the package data, except with periodic (and not synchronised) package releases.

Here are the various raw data updates for three of them

https://pypi.python.org/pypi/isbn_hyphenate - https://github.com/TorKlingberg/isbn_hyphenate/commits/master/isbn_hyphenate/isbn_lengthmaps.py
https://pypi.python.org/pypi/isbnid - https://github.com/nekobcn/isbnid/commits/master/data
https://pypi.python.org/pypi/isbnlib - https://github.com/xlcnd/isbnlib/commits/master/isbnlib/_data/data4mask.py


JNRowe commented on August 26, 2024

tl;dr: While I appreciate the effort that went into your comment, I'm still quite unmoved on the topic.

> If I understand the licensing issue, if it is still valid in 2016, it is that while the International ISBN Agency provides a machine-readable version of their data, they do not provide a license to redistribute the information.

Yep, that is the crux. I tried contacting them about this at one point, but it was quite fruitless.

> In your option 3 you note "In some regions the data is freely available". I suspect that refers to the copyright laws of some countries preventing a data file of this kind from being copyrightable, for various reasons.

Exactly that, and I should've been clearer in the first place ;)

[ways forward]

Frankly, I'm just not seeing the benefit at my end.

At this point users who feel they need hyphenation have likely switched to another package.

> Ideally this library would perform the fetching, and perform caching to ensure the data is not stale. If you want to be very paranoid, the library could have a hook that by default emits some user message the first time the data is used in a session, but the caller can replace the hook with their own implementation, so that the caller can control the message according to their legal requirements, etc.

That is the main difference between options 2 and 3. Implementing your suggestion means either adding a chain of dependencies to deal with fetching and cache validation, or writing a big chunk of code. In both cases it is a bunch of code which would only very rarely be used.

If I were to implement this, I'd go with option 2, and have users shovel the data in themselves.

I'll also note that, based on a quick look at the site and my notes, "option 3"-style solutions would already have needed patching to update the fetching mechanism since I first looked into this.


jayvdb commented on August 26, 2024

If I understand correctly, you are more open to option 2, in which case... let's do it?

We need to decide on a data format that your library will accept, and then someone (me..) creates an external package that provides the data in the agreed-upon format, so it is easy for someone to use the two packages together.

If you're willing, so am I. Then we can work with the other isbn libraries mentioned above to see if they are interested in using a centralised data provider package.

Of the existing libraries mentioned above, https://github.com/TorKlingberg/isbn_hyphenate/blob/master/isbn_hyphenate/isbn_lengthmaps.py seems to be the best basis for a data structure; however, we probably want groups_length and publisher_length to be members of a single object.
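A rough sketch of what such a single object could look like, so the consuming library only has to accept one value from the data package. Only the names `groups_length` and `publisher_length` come from the discussion above; the `RangeData` class, the `(low, high, length)` rule shape, and the values in `TOY_DATA` are assumptions made up for illustration, not the layout used by isbn_hyphenate and not real agency data.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical rule shape: (low, high, length). `low` and `high` are digit
# strings of equal length, compared against the leading digits that follow.
Rule = Tuple[str, str, int]


@dataclass(frozen=True)
class RangeData:
    groups_length: Dict[str, List[Rule]]     # keyed by prefix, e.g. "978"
    publisher_length: Dict[str, List[Rule]]  # keyed by "prefix-group"

    @staticmethod
    def _length(digits: str, rules: List[Rule]) -> int:
        # Return the segment length of the first rule whose range matches.
        for low, high, length in rules:
            if low <= digits[: len(low)] <= high:
                return length
        raise ValueError(f"no range rule matches {digits!r}")

    def hyphenate(self, isbn13: str) -> str:
        if len(isbn13) != 13 or not isbn13.isdigit():
            raise ValueError("expected 13 digits without separators")
        prefix, rest = isbn13[:3], isbn13[3:]
        group_len = self._length(rest, self.groups_length.get(prefix, []))
        group, rest = rest[:group_len], rest[group_len:]
        key = f"{prefix}-{group}"
        pub_len = self._length(rest, self.publisher_length.get(key, []))
        publisher, rest = rest[:pub_len], rest[pub_len:]
        # What remains is the title segment followed by one check digit.
        return "-".join([prefix, group, publisher, rest[:-1], rest[-1]])


# Toy rules for illustration only; a real data package would supply these.
TOY_DATA = RangeData(
    groups_length={"978": [("0", "5", 1), ("60", "79", 2)]},
    publisher_length={"978-0": [("00", "19", 2), ("200", "699", 3)]},
)
```

With these toy rules, `TOY_DATA.hyphenate("9780306406157")` returns `"978-0-306-40615-7"`, and an ISBN whose prefix or group has no rules raises `ValueError` rather than guessing.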

If you're not keen, I'll focus my efforts elsewhere.


JNRowe commented on August 26, 2024

I'd suggest approaching some of the other projects first to see if there is any interest in a data-only package. It is definitely an interesting solution if it gained traction, and there is some precedent with things like certifi.

[I'm not trying to pile on the stop motion here, I'm just trying to be honest about the situation as I see it.]

Another possibility to add to your list is arthurdejong/python-stdnum.


JNRowe commented on August 26, 2024

I've tried to move this forward again, but it doesn't look like it will be going anywhere.

So that this issue doesn't trip up others who think it is simply a SMOP, I'm closing this {CANT,WONT}FIX.

Times have changed... people who desire hyphenation can be served with one of the alternatives, and those of us who don't can continue without touching the licensing problem.

