milianw / springer_download
a python script for downloading ebooks from springerlink.com
Home Page: http://milianw.de/code-snippets/take-2-download-script-for-springerlinkcom-ebooks
After all individual PDFs have been downloaded, the chapters are not merged in the right order. For example, when running:
./springer_download.py -c 978-3-642-18101-6
the front page is followed by the 5th chapter.
http://springerlink.com/content/978-3-540-17069-3/contents/
<h1 lang="en" class="title">
A Course in H<sub>8</sub> Control Theory
Sometimes the chapters of a book are sorted alphabetically on the SpringerLink contents page. Because the script relies only on that page for its list order, the chapters end up mixed up in the merged file.
Maybe the chapters could be sorted by their page numbers instead. I think it should be possible, but I'm not very good at regex, so I can't offer a solution myself.
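The suggestion above could be sketched roughly as follows. This is only an illustration, not code from the script: the `(link, page_range)` pairs, the helper names and the sample data are hypothetical, and it assumes the page range of each chapter has already been scraped as a string. Entries without an Arabic page number (for example front matter numbered with Roman numerals) are assumed to belong at the start.

```python
import re

def start_page(page_range):
    """Return the first Arabic number in a string such as '45-80'.

    Strings without any digits (e.g. 'i-xii' for front matter) yield 0,
    so those entries sort to the front. That is an assumption.
    """
    match = re.search(r'\d+', page_range)
    return int(match.group()) if match else 0

def sort_chapters(chapters):
    """Sort hypothetical (link, page_range) pairs by their starting page."""
    return sorted(chapters, key=lambda chapter: start_page(chapter[1]))

# Hypothetical data, listed alphabetically as on the contents page.
chapters = [
    ("chapter-b.pdf", "45-80"),
    ("chapter-a.pdf", "1-44"),
    ("front-matter.pdf", "i-xii"),
    ("chapter-c.pdf", "81-120"),
]
ordered = sort_chapters(chapters)
```

Because `sorted()` is stable, entries with the same starting page keep their original relative order.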
./springer_download.py -c 978-3-658-01152-9
fetching book information...
http://www.springerlink.com/content/978-3-658-01152-9/contents/
ERROR: Could not evaluate book title - bad link http://www.springerlink.com/content/978-3-658-01152-9/contents/
Some books I've downloaded contained the back matter twice: once directly after the front matter and again after the other chapters.
This patch removes the first occurrence, the one after the front matter.
--- springer_download.py.ori 2010-09-29 00:59:40.000000000 +0100
+++ springer_download.py 2010-10-15 13:30:46.000000000 +0100
@@ -142,6 +155,9 @@
         front_matter = True
     if re.search(r'back-matter.pdf', chapterLink) and re.search(r'<a href="([^"]+)">Next</a>', page):
         continue
+    #skip backmatter if it is in list as second chapter - will be there at the end of the book also
+    if re.search(r'back-matter.pdf', chapterLink):
+        if len(chapters)<2:
+            continue
     chapters.append(chapterLink)
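The same idea can be tried outside the script. The sketch below is a variation, not the patch itself: instead of checking `len(chapters)` while the list is being built, it filters a finished list and keeps a back-matter link only when it is the last entry. The function name and the sample links are hypothetical.

```python
import re

def drop_early_back_matter(chapter_links):
    """Drop every back-matter link except one in the final position."""
    last = len(chapter_links) - 1
    return [
        link for index, link in enumerate(chapter_links)
        if index == last or not re.search(r'back-matter\.pdf', link)
    ]

# Hypothetical list with the back matter listed twice.
links = [
    "front-matter.pdf",
    "back-matter.pdf",
    "chapter-1.pdf",
    "chapter-2.pdf",
    "back-matter.pdf",
]
cleaned = drop_early_back_matter(links)
```

Note that this variant removes the back matter entirely if it never appears last, which may or may not be what is wanted.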
It's helpful to include a howto for setting the http_proxy environment variable. Example (this commit depends on SOCKS-related changes and is therefore not compatible with milianw/springer_download@master): yvesf/springer_download@75e4a086a0a16aa072b7
The Python behavior regarding http_proxy is described at http://docs.python.org/library/urllib.html#urllib.urlopen
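A minimal sketch of what such a howto might show. It uses Python 3's `urllib.request` rather than the Python 2 `urllib` module linked above, and the proxy address is a placeholder, not a real server. In a shell the equivalent would be `export http_proxy=http://proxy.example.com:3128` before starting the script.

```python
import os
import urllib.request

# Placeholder proxy address (an assumption); replace it with the real one.
os.environ["http_proxy"] = "http://proxy.example.com:3128"

# getproxies() reports the proxy settings urllib picks up, including
# the <scheme>_proxy environment variables.
proxies = urllib.request.getproxies()
print(proxies.get("http"))
```

Setting the variable in the environment before launching the script has the same effect and needs no code change.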
Traceback (most recent call last):
File "/path/to/springer", line 279, in <module>
main(sys.argv[1:])
File "/path/to/springer", line 195, in main
pdfcat(fileList, bookTitlePath)
File "/path/to/springer", line 28, in pdfcat
subprocess.Popen(command, shell=False).wait()
File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/subprocess.py", line 595, in __init__
errread, errwrite)
File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/subprocess.py", line 1106, in _execute_child
raise child_exception
OSError: [Errno 13] Permission denied
... because the link now also has a title attribute, so the old regex no longer matches. Here is the new regex, which worked for me:
<a href="([^"#]+)"[^>]*>Next</a>
(replaced the old one on lines 148 and 158)
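A quick way to compare the two patterns. The markup in `page` is a made-up example of a "Next" link with an extra title attribute, not copied from the real site; the old pattern is the one visible in the back-matter patch.

```python
import re

OLD_NEXT_LINK = re.compile(r'<a href="([^"]+)">Next</a>')
NEW_NEXT_LINK = re.compile(r'<a href="([^"#]+)"[^>]*>Next</a>')

# Hypothetical markup with an additional attribute after href.
page = '<a href="/content/978-3-642-18101-6/?p=2" title="Next page">Next</a>'

old_match = OLD_NEXT_LINK.search(page)
new_match = NEW_NEXT_LINK.search(page)
next_url = new_match.group(1) if new_match else None
```

The `[^>]*` part is what lets the new pattern skip over any attributes between the `href` value and the closing `>`.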
Thanks for a great script!
My university requires a login to download the PDFs.
Click on login
"Log In as an Institution (via Shibboleth or Athens)"
Select Country and University
Login
Is there a way to pass the login credentials?
If I run the script as python2 springer-download.py, it works just fine.
./springer-download.py gives me the following (with /usr/bin/env python defaulting to python3):
https://github.com/milianw/springer_download
File "./springer_download.py", line 92
print "fetching book information...\n\t%s" % link
^
SyntaxError: invalid syntax
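The error comes from the Python 2 `print` statement, which Python 3 no longer accepts. One workaround that leaves the script untouched is to run it with `python2` explicitly, as described above. As a sketch of a source-level change (not something the script currently does), a `print` call with a single parenthesised argument is valid under both interpreters:

```python
# The URL is taken from an earlier report in this list, for illustration only.
link = "http://www.springerlink.com/content/978-3-658-01152-9/contents/"

# Build the message first, then print it with one parenthesised argument.
# Python 2 reads this as a print statement with a parenthesised expression,
# Python 3 as a call to the print() function.
message = "fetching book information...\n\t%s" % link
print(message)
```

The rest of the script would need similar attention before it runs under Python 3; this only covers the line shown in the traceback.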
Adding the ISBN from Springer to the filename would help to identify and rename books with external scripts.
A form such as (ISBN XXXXXXXXXX) at the end of the PDF filename would be optimal.
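A rough sketch of the requested naming scheme. The function name is hypothetical, and stripping the hyphens from the ISBN is an assumption about the desired format.

```python
def filename_with_isbn(title, isbn):
    """Build '<title> (ISBN <digits>).pdf' from a book title and an ISBN."""
    digits = isbn.replace("-", "")  # assumption: hyphens are not wanted
    return "%s (ISBN %s).pdf" % (title, digits)

# Hypothetical title; the ISBN is one used elsewhere in these reports.
name = filename_with_isbn("Example Book", "978-3-642-18101-6")
```

An external script could then recover the ISBN with a pattern such as `\(ISBN (\d+)\)`.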
It seems that Springer has updated their website structure and URL space. For instance, a book previously located at http://www.springerlink.com/content/978-94-007-4655-8 is now redirected to http://link.springer.com/book/10.1007/978-94-007-4655-8/page/1, which breaks the script.
Before this change, I guess Springer also made some changes to the old site, since the chapter ordering in the bound PDF is incorrect (see the comments at http://milianw.de/code-snippets/take-2-download-script-for-springerlinkcom-ebooks).
It would be great if these issues were fixed. This is a great tool!
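For the redirect described in this report, the mapping between old and new book URLs could be sketched as below. It is derived from the single example given here, so the `10.1007` prefix and the `/page/1` suffix are assumptions that may not hold for every book; the function name is hypothetical.

```python
import re

def new_book_url(old_url):
    """Map an old springerlink.com book URL to the new link.springer.com form.

    Returns None when the URL does not contain an ISBN-style identifier.
    """
    match = re.search(r'springerlink\.com/content/([0-9-]+)', old_url)
    if match is None:
        return None
    return "http://link.springer.com/book/10.1007/%s/page/1" % match.group(1)

converted = new_book_url("http://www.springerlink.com/content/978-94-007-4655-8")
```

This only rewrites the address; the page parsing in the script would still have to be adapted to the new site.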
The script always stops at chapter 3:
ERROR: downloaded chapter http://springerlink.com/content/m65623732836hr43/fulltext.pdf has invalid mime type text/html - are you allowed to download XXX
The problem is that 1) I am allowed to download it, and 2) it works in the browser.
How can I fix this?
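The error means the server answered with an HTML page instead of a PDF. A minimal sketch of such a check, useful for debugging what actually came back; the function name is hypothetical and this is not the script's own implementation:

```python
def looks_like_pdf(content_type, first_bytes):
    """Check the Content-Type header and the leading bytes of a download."""
    # Drop parameters such as '; charset=utf-8' before comparing.
    mime = content_type.split(";", 1)[0].strip().lower()
    return mime == "application/pdf" and first_bytes.startswith(b"%PDF-")

ok = looks_like_pdf("application/pdf", b"%PDF-1.4\n")
html = looks_like_pdf("text/html; charset=utf-8", b"<html>")
```

Saving the HTML response to a file and opening it usually shows whether it is a login page, an error page or something else.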
Nice to have:
It would be helpful to have bookmarks in the merged file.
Unfortunately I'm not good at regexp, but I think it could work as described here: http://blog.tremily.us/posts/PDF_bookmarks_with_Ghostscript/
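The approach in that post relies on pdfmark entries that Ghostscript reads alongside the input PDFs. A sketch of generating such entries, assuming the chapter titles and the page on which each chapter starts in the merged file are already known; the function name and the sample data are hypothetical.

```python
def pdfmark_bookmarks(chapters):
    """Return pdfmark outline entries for (title, start_page) pairs."""
    lines = []
    for title, page in chapters:
        # Backslashes and parentheses must be escaped in PostScript strings.
        escaped = (title.replace("\\", "\\\\")
                        .replace("(", "\\(")
                        .replace(")", "\\)"))
        lines.append("[/Title (%s) /Page %d /OUT pdfmark" % (escaped, page))
    return "\n".join(lines) + "\n"

# Hypothetical chapters with their first page in the merged PDF.
marks = pdfmark_bookmarks([("Front Matter", 1), ("Intro (part 1)", 15)])
```

The resulting text would be written to a file and passed to Ghostscript as an extra input after the chapter PDFs when merging.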
…but not merge.