Comments (13)
I replaced the pdftk calls with the pure-Python pyPdf library.
Please try this branch. You need to run git submodule init
to clone pyPdf under lib/
http://github.com/yvesf/springer_download/
(this branch also adds SOCKS support, plus some ugliness from modifying sys.path)
from springer_download.
hey yvesf,
Nice job! It works like a charm. I'm especially excited about the proxy support, as I prefer to access SpringerLink this way. Do you have instructions? May I help you test?
Thank you.
from springer_download.
Thanks for the response. I've made some minor changes, now applied in my master branch. It would be nice if you could do some testing.
You can use SOCKS like this:
ssh #you@your ssh login server# -D 1234
./springer_download.py --socksaddr=localhost --socksport=1234 -l http://Spring-LINK
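For readers wondering how options like these might be wired up, here is a minimal sketch. It assumes the option names shown in the command above; the default port of 1080, the function names, and the use of argparse are illustrative assumptions, not the script's actual code. The SocksiPy calls (setdefaultproxy, socksocket) are imported lazily, so the parsing part runs without that library installed.

```python
import argparse
import socket


def parse_socks_options(argv):
    """Parse the link and SOCKS options (names taken from the thread)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("-l", "--link")
    parser.add_argument("--socksaddr", default=None)
    # 1080 is only the conventional SOCKS port, assumed here as a default.
    parser.add_argument("--socksport", type=int, default=1080)
    return parser.parse_args(argv)


def enable_socks(addr, port):
    """Route all new sockets through a SOCKS5 proxy (hypothetical helper)."""
    import socks  # SocksiPy, third-party; assumed to be importable

    socks.setdefaultproxy(socks.PROXY_TYPE_SOCKS5, addr, port)
    socket.socket = socks.socksocket


if __name__ == "__main__":
    opts = parse_socks_options(
        ["--socksaddr=localhost", "--socksport=1234", "-l", "http://example.invalid/"]
    )
    if opts.socksaddr:
        print("would proxy via %s:%d" % (opts.socksaddr, opts.socksport))
```

Replacing socket.socket globally is invasive, but it lets urllib-based code use the proxy without further changes.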
Please note: I've changed the sanitizeFilename routine (I hope it works):
def sanitizeFilename(filename):
-    p1 = subprocess.Popen(["echo", filename], stdout=subprocess.PIPE)
-    p2 = subprocess.Popen(["iconv", "-f", "UTF-8", "-t" ,"ASCII//TRANSLIT"], stdin=p1.stdout, stdout=subprocess.PIPE)
-    return re.sub("\s+", "_", p2.communicate()[0].strip().replace("/", "-"))
+    return re.sub("\s+", "_", unicode(filename).encode("ascii", "replace").replace("/","-"))
from springer_download.
There seems to be some kind of timeout. I'm trying this over a flaky cellular network right now. I will do some serious testing tomorrow morning.
fetching book information...
http://springerlink.com/content/978-3-531-15883-9/contents/
^CTraceback (most recent call last):
File "/path/to/springer_download.py", line 302, in <module>
main(sys.argv[1:])
File "/path/to/springer_download.py", line 86, in main
page = loader.open(link).read()
File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib.py", line 203, in open
return getattr(self, name)(url)
File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib.py", line 338, in open_http
h.endheaders()
File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/httplib.py", line 868, in endheaders
self._send_output()
File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/httplib.py", line 740, in _send_output
self.send(msg)
File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/httplib.py", line 699, in send
self.connect()
File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/httplib.py", line 683, in connect
self.timeout)
File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/socket.py", line 505, in create_connection
sock.connect(sa)
File "/path/to/lib/socksipy/socks.py", line 369, in connect
self.__negotiatesocks5(destpair[0],destpair[1])
File "/path/to/lib/socksipy/socks.py", line 228, in __negotiatesocks5
resp = self.__recvall(4)
File "/path/to/lib/socksipy/socks.py", line 141, in __recvall
data = data + self.recv(bytes-len(data))
KeyboardInterrupt
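The traceback above shows the script blocked in a recv call until it was interrupted by hand. One common way to avoid an indefinite hang is a default socket timeout. The sketch below is only an illustration: the 30-second value and the helper name are assumptions, and whether the SOCKS socket in this script honors the default timeout has not been verified here.

```python
import socket

DEFAULT_TIMEOUT_SECONDS = 30  # assumed value, tune for slow links


def fetch_with_timeout(fetch, url, timeout=DEFAULT_TIMEOUT_SECONDS):
    """Call fetch(url) under a global socket timeout; return None on timeout."""
    previous = socket.getdefaulttimeout()
    socket.setdefaulttimeout(timeout)
    try:
        return fetch(url)
    except socket.timeout:
        return None
    finally:
        # Restore whatever timeout was configured before the call.
        socket.setdefaulttimeout(previous)
```

Here fetch stands for any callable that opens the URL, for example a wrapper around loader.open(link).read().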
from springer_download.
Weird. Have you tried using your SOCKS connection with a web browser?
Additionally, you could test your connection without SOCKS; I think at least front-matter.pdf and back-matter.pdf should load.
from springer_download.
SOCKS support works as expected. No problems at all over a stable connection. Great. But I had some problems yesterday with merging the downloaded PDFs. I'm sorry, but I can't provide error messages. I think it was due to some special characters. I will investigate further. Merge upstream!
from springer_download.
I have observed some strange behavior: I'm on a Mac, and the script fails or times out if I have an (unrelated, ad-blocking) web proxy activated. If I change the network settings and deactivate the HTTP proxy, everything works fine.
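A possible explanation, not confirmed anywhere in this thread, is that Python's urllib picks up the system HTTP proxy settings by default (on a Mac it reads them from the system configuration). The sketch below, written for Python 3's urllib.request rather than the Python 2.6 urllib seen in the tracebacks, shows how to inspect the detected proxies and how to build an opener that ignores them. The function names are made up for illustration.

```python
import urllib.request


def detected_proxies():
    """Return the proxy mapping urllib would use by default."""
    return urllib.request.getproxies()


def make_direct_opener():
    """Build an opener that bypasses any system or environment proxy."""
    # An empty mapping tells ProxyHandler to use no proxies at all.
    no_proxy = urllib.request.ProxyHandler({})
    return urllib.request.build_opener(no_proxy)


if __name__ == "__main__":
    print("proxies seen by urllib:", detected_proxies())
    opener = make_direct_opener()
    # opener.open(url) would now connect directly, without the HTTP proxy.
```

If the detected mapping lists the ad-blocking proxy, that would at least be consistent with the behavior described above.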
from springer_download.
There is also an issue with the sanitizeFilename function:
fetching book information...
http://springerlink.com/content/978-3-531-13634-9/contents/
Traceback (most recent call last):
File "/path/to/springer_download.py", line 302, in <module>
main(sys.argv[1:])
File "/path/to/springer_download.py", line 123, in main
bookTitlePath = curDir + "/%s.pdf" % sanitizeFilename(bookTitle)
File "/path/to/springer_download.py", line 279, in sanitizeFilename
return re.sub("\s+", "_", unicode(filename).encode("ascii", "replace").replace("/","-"))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 11: ordinal not in range(128)
from springer_download.
I made an error in sanitizeFilename. It should work with the latest commit: yvesf/springer_download@ad1659a60b01e6b3ed54
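For context on the UnicodeDecodeError: in Python 2, calling unicode() on a byte string decodes it as ASCII, which fails on the first UTF-8 byte such as 0xc3. The referenced commit is not reproduced here. The following is a separate Python 3 sketch of one way to make such a routine robust, assuming titles arrive either as text or as UTF-8 bytes. The name sanitize_filename and the NFKD-based transliteration are my own choices and only approximate what iconv's //TRANSLIT did.

```python
import re
import unicodedata


def sanitize_filename(filename):
    """Return an ASCII-only, filesystem-friendly version of a title."""
    # Accept raw UTF-8 bytes as well as text (assumption about the input).
    if isinstance(filename, bytes):
        filename = filename.decode("utf-8", "replace")
    # Split accented letters into base letter plus combining mark, then
    # drop everything that is not ASCII (the marks, and unmappable chars).
    decomposed = unicodedata.normalize("NFKD", filename)
    ascii_text = decomposed.encode("ascii", "ignore").decode("ascii")
    # Slashes would create directories; runs of whitespace become "_".
    return re.sub(r"\s+", "_", ascii_text.strip().replace("/", "-"))


if __name__ == "__main__":
    print(sanitize_filename("Einführung in die Soziologie"))
```

Characters without an ASCII decomposition are dropped silently in this sketch, which may or may not be acceptable for book titles.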
from springer_download.
yay, fantastic. thank you yvesf!
from springer_download.
anything I should merge into the original branch?
from springer_download.
@milianw yvesf's proxy support is really nice. Please merge.
from springer_download.
I like the additional extraction of metadata (but not the idea of storing it in so-called NFO files).
The coding style is not inefficient, but it is dirty. Invasive changes like that are hard to merge back into main.
Although Windows support isn't my focus, I don't think that including various binaries (Windows PE, .NET, DLLs) is the way to go, not to mention possible licensing issues.
Finally, you should create a new bug-tracker entry for this topic; this one is about Mac support.
from springer_download.