milianw / springer_download

a python script for downloading ebooks from springerlink.com

Home Page: http://milianw.de/code-snippets/take-2-download-script-for-springerlinkcom-ebooks

Python 100.00%

springer_download's Issues

Chapters not in the correct order

After downloading all individual pdfs, chapters are not put in the right order. For example when running:

./springer_download.py -c 978-3-642-18101-6

The front page is followed directly by the fifth chapter.

title contains < (tag open) leads to bad link

http://springerlink.com/content/978-3-540-17069-3/contents/

    <h1 lang="en" class="title">
        A Course in H<sub>8</sub> Control Theory
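
One way to make the title lookup tolerant of inline markup such as `<sub>` is to capture everything up to the closing heading tag and strip the tags afterwards. This is a minimal sketch, not the script's actual code: `extract_title` is a hypothetical helper, and the closing `</h1>` in the sample is an assumption, since the snippet above is cut off before it.

```python
import re

def extract_title(page):
    """Return the book title from the contents page, or None if not found."""
    # Capture everything between the title heading's opening and closing tag.
    match = re.search(r'<h1[^>]*class="title"[^>]*>(.*?)</h1>', page, re.DOTALL)
    if match is None:
        return None
    # Drop inline tags such as <sub>...</sub>, then collapse whitespace.
    text = re.sub(r'<[^>]+>', '', match.group(1))
    return ' '.join(text.split())

# Sample markup modelled on the page above; the closing </h1> is assumed.
sample = '''<h1 lang="en" class="title">
    A Course in H<sub>8</sub> Control Theory
</h1>'''

print(extract_title(sample))  # prints: A Course in H8 Control Theory
```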

Sometimes the chapters of a book are sorted alphabetically on the springerlink contents page. Because the script relies on that page alone for its list order, the chapters end up mixed up in the merged file.
Maybe the chapters could be sorted by their page numbers instead. I think it should be possible, but I'm not very good at regex, so I can't present a solution myself.
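
The page-number idea could look roughly like this. It is only a sketch under assumptions: it supposes the script can collect a page-range label (for example `45-80`) next to each chapter link, which the thread does not confirm, and `first_page` and `sort_chapters` are hypothetical names.

```python
import re

def first_page(page_range):
    """Return the first arabic page number in a label like '45-80', or 0 if none."""
    match = re.search(r'\d+', page_range)
    # Roman-numbered front matter ('i-xii') has no digits and sorts first.
    return int(match.group()) if match else 0

def sort_chapters(chapters):
    """Sort (link, page_range) pairs by first page; sorted() is stable, so ties keep their order."""
    return sorted(chapters, key=lambda chapter: first_page(chapter[1]))

# Hypothetical, alphabetically ordered input as it might come from the contents page.
chapters = [
    ("b.pdf", "45-80"),
    ("front-matter.pdf", "i-xii"),
    ("a.pdf", "1-44"),
    ("c.pdf", "81-120"),
]

for link, pages in sort_chapters(chapters):
    print(link, pages)
```

Back matter normally carries the highest page numbers, so it would land at the end without special handling, though that depends on how the site labels it.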

cannot download isbn

 ./springer_download.py -c 978-3-658-01152-9
fetching book information...
    http://www.springerlink.com/content/978-3-658-01152-9/contents/

ERROR: Could not evaluate book title - bad link http://www.springerlink.com/content/978-3-658-01152-9/contents/

don't put back-matter in pdf twice

Some books I've downloaded contained the back matter after the front matter and also after the other chapters.
This patch removes the first appearance after the front matter.

--- springer_download.py.ori    2010-09-29 00:59:40.000000000 +0100
+++ springer_download.py    2010-10-15 13:30:46.000000000 +0100
@@ -142,6 +155,9 @@
                     front_matter = True
             if re.search(r'back-matter.pdf', chapterLink) and re.search(r'<a href="([^"]+)">Next</a>', page):
                 continue
+            #skip backmatter if it is in list as second chapter - will be there at the end of the book also
+            if re.search(r'back-matter.pdf', chapterLink):
+                if len(chapters)<2:
+                    continue
 
             chapters.append(chapterLink)

include helptext for http-proxy

It's helpful to include a how-to for setting the http_proxy environment variable. Example (this commit depends on SOCKS-related changes and is therefore not compatible with milianw/springer_download@master): yvesf/springer_download@75e4a086a0a16aa072b7

Python's behavior regarding http_proxy is described in http://docs.python.org/library/urllib.html#urllib.urlopen
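
To illustrate what the help text would describe, here is a small sketch of how the variable is picked up. It uses Python 3's `urllib.request.getproxies()`, whereas the script itself targets Python 2's `urllib`; the proxy address is a made-up placeholder.

```python
import os
import urllib.request

# Hypothetical proxy address, set here only for the demonstration.
os.environ["http_proxy"] = "http://proxy.example.org:3128"

# getproxies() reads the *_proxy environment variables and returns a
# mapping from URL scheme to proxy URL.
proxies = urllib.request.getproxies()
print(proxies.get("http"))
```

Exporting `http_proxy` in the shell before starting the script should have the same effect, since the variable is read from the process environment.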

[mac] Error after downloading

Traceback (most recent call last):
  File "/path/to/springer", line 279, in <module>
    main(sys.argv[1:])
  File "/path/to/springer", line 195, in main
    pdfcat(fileList, bookTitlePath)
  File "/path/to/springer", line 28, in pdfcat
    subprocess.Popen(command, shell=False).wait()
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/subprocess.py", line 595, in __init__
    errread, errwrite)
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/subprocess.py", line 1106, in _execute_child
    raise child_exception
OSError: [Errno 13] Permission denied
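
The traceback does not say which program `Popen` tried to start, so the cause is a guess. `[Errno 13]` from `subprocess.Popen` usually means the first entry of `command` exists but is not executable. A sketch of a check that could run before merging; `find_merge_tool` is a hypothetical helper and the candidate names are placeholders, not the tools the script actually calls:

```python
import shutil

def find_merge_tool(candidates):
    """Return the path of the first candidate that is executable, else None."""
    for name in candidates:
        # shutil.which() only returns entries that exist and carry the execute bit.
        path = shutil.which(name)
        if path:
            return path
    return None

# Placeholder names; substitute whatever the script puts in command[0].
tool = find_merge_tool(["some-pdf-merge-tool", "another-pdf-merge-tool"])
if tool is None:
    print("no executable PDF merge tool found on PATH")
```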

Only the first page of PDFs is downloaded...

... because the link now also has a title attribute, so the old regex no longer matches. Here is the new regex, which worked for me:

<a href="([^"#]+)"[^>]*>Next</a>

(replaced the old one on lines 148 and 158)
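
For anyone who wants to see the difference, here is a short comparison of the two patterns. The old pattern is the one visible in the back-matter patch earlier on this page; the sample link is made up, since the exact markup of the site is not quoted here.

```python
import re

OLD_NEXT = r'<a href="([^"]+)">Next</a>'
NEW_NEXT = r'<a href="([^"#]+)"[^>]*>Next</a>'

# Hypothetical markup: the href and title values are invented for the demo.
link = '<a href="/content/978-3-642-18101-6/contents/?p=2" title="Next page">Next</a>'

# The old pattern requires '>' right after the href value, so the extra
# title attribute makes it fail.
print(re.search(OLD_NEXT, link))           # prints: None

# The new pattern skips any further attributes before the closing '>'.
print(re.search(NEW_NEXT, link).group(1))  # prints: /content/978-3-642-18101-6/contents/?p=2
```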

Thanks for a great script!

How to login for download

My university requires a login to download the PDFs:

click on login
"Log In as an Institution (via Shibboleth or Athens)"
Select Country and University
Login

Is there a way to pass the login credentials to the script?

python2 check fails.

If I run the script as python2 springer_download.py, it works just fine.
Running ./springer_download.py instead (with /usr/bin/env python defaulting to python3) gives me:

https://github.com/milianw/springer_download

File "./springer_download.py", line 92
print "fetching book information...\n\t%s" % link
^
SyntaxError: invalid syntax
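
The error appears before any version check can run, because Python 3 rejects the `print` statement while parsing the file. A guard therefore only helps if the whole file also parses under Python 3. A sketch of such a guard, with `require_python2` as a hypothetical name:

```python
import sys

def require_python2(version_info=sys.version_info):
    """Return an error message if the interpreter is not Python 2, else None."""
    if version_info[0] != 2:
        return "this script needs Python 2, found %d.%d" % (
            version_info[0], version_info[1])
    return None

message = require_python2()
if message is not None:
    # Write with sys.stderr.write so the line is valid syntax on 2 and 3 alike.
    sys.stderr.write(message + "\n")
```

Changing the first line of the script to `#!/usr/bin/env python2` is the simpler workaround on systems where `python` points to Python 3, provided a `python2` executable is installed.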

add ISBN to created merged pdf file

Adding the Springer ISBN to the filename would help to identify and rename books with external scripts.

A suffix of the form (ISBN XXXXXXXXXX) at the end of the PDF filename would be optimal.
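
The requested naming could be sketched as follows. `merged_filename` is a hypothetical helper, not part of the script, and replacing slashes is an extra assumption to keep the name usable as a single path component.

```python
def merged_filename(title, isbn):
    """Build 'Title (ISBN XXXXXXXXXX).pdf' from a book title and its ISBN."""
    # Slashes would be read as directory separators, so replace them.
    safe_title = title.replace("/", "-").strip()
    return "%s (ISBN %s).pdf" % (safe_title, isbn)

print(merged_filename("Some Book Title", "978-3-642-18101-6"))
# prints: Some Book Title (ISBN 978-3-642-18101-6).pdf
```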

Add support for the new Springer website & fix chapter ordering

Seems that Springer has updated their website structure and URL space, for instance a book previously located at http://www.springerlink.com/content/978-94-007-4655-8 is now being redirected to http://link.springer.com/book/10.1007/978-94-007-4655-8/page/1, causing the script not to work.

I guess Springer had already changed the old site before this, since the chapter ordering in the merged PDF was incorrect even then (see comments in http://milianw.de/code-snippets/take-2-download-script-for-springerlinkcom-ebooks)

Would be great if these issues were fixed. This is a great tool!

invalid mime type text/html

The script always stops at chapter 3:

ERROR: downloaded chapter http://springerlink.com/content/m65623732836hr43/fulltext.pdf has invalid mime type text/html - are you allowed to download XXX

The problem is that 1) I am allowed to download it and 2) it works in the browser.

How to fix this?

Add bookmarks to merged Pdf

Nice to have:
It would be helpful to have bookmarks in the merged file.

Unfortunately I'm not good at regexp but I think it could work like described here: http://blog.tremily.us/posts/PDF_bookmarks_with_Ghostscript/

[request] Give a switch to just download

…but not merge.
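
A sketch of what such a switch could look like with `argparse`. Only `-c` appears in the commands quoted on this page; `--no-merge` and `parse_args` are hypothetical, and the script's real option parsing may be built differently.

```python
import argparse

def parse_args(argv):
    """Parse the command line for a download-only sketch."""
    parser = argparse.ArgumentParser(description="download-only switch (sketch)")
    # -c is the option used in the examples above.
    parser.add_argument("-c", dest="content", required=True,
                        help="ISBN of the book to download")
    # Hypothetical flag: keep the chapter PDFs and skip the merge step.
    parser.add_argument("--no-merge", action="store_true",
                        help="download the chapters but do not merge them")
    return parser.parse_args(argv)

args = parse_args(["-c", "978-3-642-18101-6", "--no-merge"])
if args.no_merge:
    print("chapters downloaded, merge skipped")
```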
