lorenzodifuccia / safaribooks

Download and generate EPUB of your favorite books from O'Reilly Learning (aka Safari Books Online) library.

License: Do What The F*ck You Want To Public License

Python 100.00%

python epub calibre safaribooksonline safaribooks safari safari-books-online oreilly

safaribooks's Introduction

SafariBooks

Download and generate EPUB of your favorite books from Safari Books Online library.
I am not responsible for how this program is used; it is intended for personal and educational purposes only.
Before using it, please read O'Reilly's Terms of Service.

⚠ Attention needed ⚠

If you are a developer and want to help this project, please take a look at the current Milestone.
Also check out the new APIv2 branch: apiv2
The Community thanks 🙏🏻

✨ ADV ✨

Take a look at my other GitHub projects: https://github.com/lorenzodifuccia 👀 ❤️

Overview:

Requirements & Setup
Usage
Single Sign-On (SSO), Company, University Login
Calibre EPUB conversion
Example: Download Test-Driven Development with Python, 2nd Edition
Example: Use or not the --kindle option

Requirements & Setup:

First of all, it requires python3 and pip3 or pipenv to be installed.

$ git clone https://github.com/lorenzodifuccia/safaribooks.git
Cloning into 'safaribooks'...

$ cd safaribooks/
$ pip3 install -r requirements.txt

OR

$ pipenv install && pipenv shell

The program depends on only two Python 3 modules:

lxml>=4.1.1
requests>=2.20.0

Usage:

It's really simple to use: choose a book from the library and, in the following command, replace:

the X-es with its ID,
email:password with your own credentials.

$ python3 safaribooks.py --cred "[email protected]:password01" XXXXXXXXXXXXX

The ID is the string of digits found in the URL of the book description page:
https://www.safaribooksonline.com/library/view/book-name/XXXXXXXXXXXXX/
Like: https://www.safaribooksonline.com/library/view/test-driven-development-with/9781491958698/
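If you prefer to paste the whole URL rather than pick out the digits by hand, the ID can be extracted with a small helper. This is only a sketch: `extract_book_id` is a hypothetical function, not part of safaribooks.py, and it assumes the ID is the all-digit path segment that follows the book name in a `/library/view/` URL.

```python
import re

# Assumption: the ID is the all-digit segment after /library/view/<book-name>/.
_BOOK_ID_RE = re.compile(r"/library/view/[^/]+/(\d+)/?")


def extract_book_id(url):
    """Return the digit ID from a book URL, or None if the URL does not match."""
    match = _BOOK_ID_RE.search(url)
    return match.group(1) if match else None


print(extract_book_id(
    "https://www.safaribooksonline.com/library/view/"
    "test-driven-development-with/9781491958698/"
))
```

The returned string can then be passed to safaribooks.py as the `<BOOK ID>` argument.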

Program options:

$ python3 safaribooks.py --help
usage: safaribooks.py [--cred <EMAIL:PASS> | --login] [--no-cookies]
                      [--kindle] [--preserve-log] [--help]
                      <BOOK ID>

Download and generate an EPUB of your favorite books from Safari Books Online.

positional arguments:
  <BOOK ID>            Book digits ID that you want to download. You can find
                       it in the URL (X-es):
                       `https://learning.oreilly.com/library/view/book-
                       name/XXXXXXXXXXXXX/`

optional arguments:
  --cred <EMAIL:PASS>  Credentials used to perform the auth login on Safari
                       Books Online. Es. ` --cred
                       "[email protected]:password01" `.
  --login              Prompt for credentials used to perform the auth login
                       on Safari Books Online.
  --no-cookies         Prevent your session data to be saved into
                       `cookies.json` file.
  --kindle             Add some CSS rules that block overflow on `table` and
                       `pre` elements. Use this option if you're going to
                       export the EPUB to E-Readers like Amazon Kindle.
  --preserve-log       Leave the `info_XXXXXXXXXXXXX.log` file even if there
                       isn't any error.
  --help               Show this help message.

The first time you use the program, you'll have to specify your Safari Books Online account credentials (look here for special characters).
On later downloads, as long as the session has not expired, you can omit the credentials, because the program saves your session cookies in a file called cookies.json.
For SSO, please use the sso_cookies.py program to create the cookies.json file from the SSO cookies retrieved from your browser session (please follow these steps).

Be careful if you use a shared PC, because anyone who has access to your files can steal your session. If you don't want to cache the cookies, use the --no-cookies option and provide your credentials every time through the --cred option or the safer --login option, which prompts you for the credentials during script execution.
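To make the shared-PC concern concrete, here is a rough sketch of how a session cache like this could be written so that other users on the machine cannot read it. It is not the script's actual code: `save_cookies` and `load_cookies` are hypothetical helpers, and the flat name-to-value layout of the file is an assumption, since the real cookies.json format is not described here.

```python
import json
import os


def save_cookies(cookies, path="cookies.json"):
    """Write a name -> value cookie dict (assumed layout) to `path`."""
    # Mode 0o600 limits a newly created file to the current user on POSIX;
    # Windows largely ignores these permission bits.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(cookies, handle)


def load_cookies(path="cookies.json"):
    """Return the cached cookies, or an empty dict when no cache exists."""
    if not os.path.isfile(path):
        return {}
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

Deleting the file (or running with --no-cookies) remains the simplest way to make sure no session is left behind.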

You can configure a proxy by setting the HTTPS_PROXY environment variable on your system or by using the USE_PROXY directive in the script.

Calibre EPUB conversion

Important: since the script only downloads HTML pages and creates a raw EPUB, many of the CSS and XML/HTML directives are wrong for an E-Reader. To ensure the best output quality, I suggest always converting the EPUB produced by the script to a standard EPUB with Calibre. You can also use the command-line version of Calibre, ebook-convert, e.g.:

$ ebook-convert "XXXX/safaribooks/Books/Test-Driven Development with Python 2nd Edition (9781491958698)/9781491958698.epub" "XXXX/safaribooks/Books/Test-Driven Development with Python 2nd Edition (9781491958698)/9781491958698_CLEAR.epub"

After the conversion, you can read 9781491958698_CLEAR.epub in any E-Reader and delete all the other files.
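If you convert many books, the same ebook-convert call can be driven from Python. The sketch below assumes Calibre's `ebook-convert` is on your PATH; `build_convert_command` and `convert` are hypothetical helper names, and the `_CLEAR` suffix simply mirrors the example above.

```python
import os
import subprocess


def build_convert_command(epub_path, suffix="_CLEAR"):
    """Return the argument list for `<name>.epub` -> `<name>_CLEAR.epub`."""
    root, ext = os.path.splitext(epub_path)
    return ["ebook-convert", epub_path, root + suffix + ext]


def convert(epub_path):
    """Run the conversion; raises CalledProcessError if ebook-convert fails."""
    subprocess.run(build_convert_command(epub_path), check=True)
```

Passing the arguments as a list avoids shell quoting problems with the spaces and parentheses in the book directory names.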

The program also offers an option, --kindle, for better compatibility when exporting the EPUB to E-Readers like the Amazon Kindle: it blocks overflow on table and pre elements (see example).
In this case, I suggest converting the EPUB to AZW3 or MOBI with Calibre; remember to select Ignore margins in the conversion options.

Examples:

Download Test-Driven Development with Python, 2nd Edition:

$ python3 safaribooks.py --cred "[email protected]:MyPassword1!" 9781491958698

       ____     ___         _ 
      / __/__ _/ _/__ _____(_)
     _\ \/ _ `/ _/ _ `/ __/ / 
    /___/\_,_/_/ \_,_/_/ /_/  
      / _ )___  ___  / /__ ___
     / _  / _ \/ _ \/  '_/(_-<
    /____/\___/\___/_/\_\/___/

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[-] Logging into Safari Books Online...
[*] Retrieving book info... 
[-] Title: Test-Driven Development with Python, 2nd Edition                     
[-] Authors: Harry J.W. Percival                                                
[-] Identifier: 9781491958698                                                   
[-] ISBN: 9781491958704                                                         
[-] Publishers: O'Reilly Media, Inc.                                            
[-] Rights: Copyright © O'Reilly Media, Inc.                                    
[-] Description: By taking you through the development of a real web application 
from beginning to end, the second edition of this hands-on guide demonstrates the 
practical advantages of test-driven development (TDD) with Python. You’ll learn 
how to write and run tests before building each part of your app, and then develop
the minimum amount of code required to pass those tests. The result? Clean code
that works.In the process, you’ll learn the basics of Django, Selenium, Git, 
jQuery, and Mock, along with curre...
[-] Release Date: 2017-08-18
[-] URL: https://learning.oreilly.com/library/view/test-driven-development-with/9781491958698/
[*] Retrieving book chapters...                                                 
[*] Output directory:                                                           
    /XXXX/safaribooks/Books/Test-Driven Development with Python 2nd Edition (9781491958698)
[-] Downloading book contents... (53 chapters)                                  
    [#####################################################################] 100%
[-] Downloading book CSSs... (2 files)                                          
    [#####################################################################] 100%
[-] Downloading book images... (142 files)                                      
    [#####################################################################] 100%
[-] Creating EPUB file...                                                       
[*] Done: /XXXX/safaribooks/Books/Test-Driven Development with Python 2nd Edition 
(9781491958698)/9781491958698.epub

    If you like it, please * this project on GitHub to make it known:
        https://github.com/lorenzodifuccia/safaribooks
    e don't forget to renew your Safari Books Online subscription:
        https://learning.oreilly.com

[!] Bye!!

The result will be (opening the EPUB file with Calibre):

Use or not the --kindle option:
```
$ python3 safaribooks.py --kindle 9781491958698
```
On the right, the book created with --kindle option, on the left without (default):

Thanks!!

For any kind of problem, please don't hesitate to open an issue here on GitHub.

Lorenzo Di Fuccia

safaribooks's Issues

JSONDecodeError

Why does this happen?

[23/Aug/2018 22:00:33] ** Welcome to SafariBooks! **
[23/Aug/2018 22:00:33] Logging into Safari Books Online...
[23/Aug/2018 22:00:35] Retrieving book info...
[23/Aug/2018 22:00:37] Title: C++ Reactive Programming
[23/Aug/2018 22:00:37] Authors: Peter Abraham, Praseed Pai
[23/Aug/2018 22:00:37] Identifier: 9781788629775
[23/Aug/2018 22:00:37] ISBN: 9781788629775
[23/Aug/2018 22:00:37] Publishers: Packt Publishing
[23/Aug/2018 22:00:37] Rights: Copyright © 2018 Packt Publishing
[23/Aug/2018 22:00:37] Description: Learn how to implement the reactive programming paradigm with C++ and build asynchronous and concurrent applicationsAbout This BookEfficiently exploit concurrency and parallelism in your programsUse the Functional Reactive programming model to structure programsUnderstand reactive GUI programming to make your own applications using QtWho This Book Is ForIf you're a C++ developer interested in using reactive programming to build asynchronous and concurrent applications, you'll find this book extr...
[23/Aug/2018 22:00:37] Release Date: 2018-06-29
[23/Aug/2018 22:00:37] URL: https://www.safaribooksonline.com/library/view/c-reactive-programming/9781788629775/
[23/Aug/2018 22:00:37] Retrieving book chapters...
[23/Aug/2018 22:00:54] File "safaribooks.py", line 1023, in <module>
    SafariBooks(args_parsed)
  File "safaribooks.py", line 295, in __init__
    self.book_chapters = self.get_book_chapters()
  File "safaribooks.py", line 509, in get_book_chapters
    return result + (self.get_book_chapters(page + 1) if response["next"] else [])
  File "safaribooks.py", line 509, in get_book_chapters
    return result + (self.get_book_chapters(page + 1) if response["next"] else [])
  File "safaribooks.py", line 509, in get_book_chapters
    return result + (self.get_book_chapters(page + 1) if response["next"] else [])
  [Previous line repeated 7 more times]
  File "safaribooks.py", line 492, in get_book_chapters
    response = response.json()
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python37-32\lib\site-packages\requests\models.py", line 896, in json
    return complexjson.loads(self.text, **kwargs)
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python37-32\lib\json\__init__.py", line 348, in loads
    return _default_decoder.decode(s)
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python37-32\lib\json\decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python37-32\lib\json\decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None

[23/Aug/2018 22:00:54] Unhandled Exception: Expecting value: line 2 column 1 (char 1) (type: JSONDecodeError)
[23/Aug/2018 22:00:54] Last request done:
URL: https://www.safaribooksonline.com/api/v1/book/9781788629775/chapter/?page=12

Only the book cover image is downloaded. The other book images are not downloaded and are not available in the EPUB file.
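The traceback in this report shows get_book_chapters calling itself once per page and failing when page 12 returns a body that is not JSON. As a sketch of a more defensive approach (not the project's actual fix), the pagination can be done in a loop that reports which page failed. `fetch_page` is a hypothetical callable returning the raw response body, and the `"results"` key is an assumption; only `"next"` appears in the traceback.

```python
import json


def collect_chapters(fetch_page):
    """Gather chapters page by page.

    `fetch_page(page)` is a hypothetical callable that returns the raw
    response body (a string) for the given page number.
    """
    chapters, page = [], 1
    while True:
        body = fetch_page(page)
        try:
            data = json.loads(body)
        except json.JSONDecodeError:
            # The server answered with something that is not JSON
            # (for example an HTML error page): say where it happened.
            raise RuntimeError("page %d did not return JSON" % page) from None
        chapters.extend(data.get("results", []))  # "results" key is assumed
        if not data.get("next"):
            return chapters
        page += 1
```

A loop also avoids building a deep call stack for books with many pages of chapters, and the error message points at the failing page so the request can be retried.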

<module> Error

I ran your sample and got this error:

Traceback (most recent call last):
  File "safaribooks.py", line 12, in <module>
    from html import escape
ImportError: No module named html

Regards
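This error is what Python 2 reports, because the standard library `html` module only exists in Python 3, and the README asks for python3. A minimal sketch of a guard that fails with a clearer message is shown below; `is_python3` is a hypothetical helper, not something safaribooks.py is known to contain.

```python
import sys


def is_python3(version_info=sys.version_info):
    """True when the interpreter is Python 3 or newer."""
    return version_info[0] >= 3


if not is_python3():
    # Stop early with a readable hint instead of an ImportError.
    sys.exit("This script needs Python 3; try running it with `python3`.")
```

Running the script as `python3 safaribooks.py ...` instead of `python safaribooks.py ...` is usually enough.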

resume download

There are books which are very large, especially mathematics books which contain lots of pictures, e.g. Advanced Engineering Mathematics 6th Edition (9781284105971). When downloading this book, the download hangs and never finishes. I restarted the download many, many times after deleting the book. I think it would be useful to be able to resume downloading from where it left off.

allow_abbrev error

Hello

Whenever I try to run it, it shows the following error:

Traceback (most recent call last):
  File "safaribooks.py", line 983, in <module>
    allow_abbrev=False)
TypeError: __init__() got an unexpected keyword argument 'allow_abbrev'

My python version: Python 3.4.3
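The `allow_abbrev` argument of `argparse.ArgumentParser` was added in Python 3.5, which explains the TypeError on 3.4.3. One possible workaround, sketched here with a reduced, hypothetical set of options rather than the script's real parser, is to pass the argument only when the interpreter supports it.

```python
import argparse
import sys


def make_parser():
    """Build a parser that disables abbreviations where the option exists."""
    kwargs = {}
    if sys.version_info >= (3, 5):
        # allow_abbrev is only accepted from Python 3.5 onwards.
        kwargs["allow_abbrev"] = False
    parser = argparse.ArgumentParser(prog="safaribooks.py", **kwargs)
    parser.add_argument("--kindle", action="store_true")
    parser.add_argument("bookid")
    return parser
```

On older interpreters the parser then falls back to the default behaviour, where unambiguous option prefixes are accepted. Upgrading to Python 3.5 or later avoids the problem entirely.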

Not working anymore

Tried to download and here's the error:

[-] Logging into Safari Books Online...
[*] Retrieving book info...
[#] API: book's not present in Safari Books Online.
The book identifier is the digits that you can find in the URL:
https://www.safaribooksonline.com/library/view/book-name/XXXXXXXXXXXXX/
[+] Please delete all the <BOOK NAME>/OEBPS/*.xhtml files and restart the program.
[!] Aborting...

Error when convert Epub to Kindle by Calibre

Hi, thanks for this library.
Every book I download fails with this error when I convert it with Calibre.
Does anyone have any idea?

Conversion options changed from defaults:
  output_profile: 'kindle_voyage'
  cover: u'/var/folders/h0/ytf50xh121v90wvl_sz8lx_80000gn/C/calibre_3.35.0_tmp_WI1XDR/8zm4ie.jpeg'
  read_metadata_from_opf: u'/var/folders/h0/ytf50xh121v90wvl_sz8lx_80000gn/C/calibre_3.35.0_tmp_WI1XDR/N4AnUa.opf'
  verbose: 2
  mobi_toc_at_start: True
Resolved conversion options
calibre version: 3.35.0
{'asciiize': False,
 'author_sort': None,
 'authors': None,
 'base_font_size': 0.0,
 'book_producer': None,
 'change_justification': u'original',
 'chapter': u"//*[((name()='h1' or name()='h2') and re:test(., '\\s*((chapter|book|section|part)\\s+)|((prolog|prologue|epilogue)(\\s+|$))', 'i')) or @class = 'chapter']",
 'chapter_mark': u'pagebreak',
 'comments': None,
 'cover': u'/var/folders/h0/ytf50xh121v90wvl_sz8lx_80000gn/C/calibre_3.35.0_tmp_WI1XDR/8zm4ie.jpeg',
 'debug_pipeline': None,
 'dehyphenate': True,
 'delete_blank_paragraphs': True,
 'disable_font_rescaling': False,
 'dont_compress': False,
 'duplicate_links_in_toc': False,
 'embed_all_fonts': False,
 'embed_font_family': None,
 'enable_heuristics': False,
 'expand_css': False,
 'extra_css': None,
 'extract_to': None,
 'filter_css': u'',
 'fix_indents': True,
 'font_size_mapping': None,
 'format_scene_breaks': True,
 'html_unwrap_factor': 0.4,
 'input_encoding': None,
 'input_profile': <calibre.customize.profiles.InputProfile object at 0x10c99a790>,
 'insert_blank_line': False,
 'insert_blank_line_size': 0.5,
 'insert_metadata': False,
 'isbn': None,
 'italicize_common_cases': True,
 'keep_ligatures': False,
 'language': None,
 'level1_toc': None,
 'level2_toc': None,
 'level3_toc': None,
 'line_height': 0.0,
 'linearize_tables': False,
 'margin_bottom': 5.0,
 'margin_left': 5.0,
 'margin_right': 5.0,
 'margin_top': 5.0,
 'markup_chapter_headings': True,
 'max_toc_links': 50,
 'minimum_line_height': 120.0,
 'mobi_file_type': u'old',
 'mobi_ignore_margins': False,
 'mobi_keep_original_images': False,
 'mobi_toc_at_start': True,
 'no_chapters_in_toc': False,
 'no_inline_navbars': True,
 'no_inline_toc': False,
 'output_profile': <calibre.customize.profiles.KindleVoyageOutput object at 0x10c99afd0>,
 'page_breaks_before': u'/',
 'personal_doc': u'[PDOC]',
 'prefer_author_sort': False,
 'prefer_metadata_cover': False,
 'pretty_print': False,
 'pubdate': None,
 'publisher': None,
 'rating': None,
 'read_metadata_from_opf': u'/var/folders/h0/ytf50xh121v90wvl_sz8lx_80000gn/C/calibre_3.35.0_tmp_WI1XDR/N4AnUa.opf',
 'remove_fake_margins': True,
 'remove_first_image': False,
 'remove_paragraph_spacing': False,
 'remove_paragraph_spacing_indent_size': 1.5,
 'renumber_headings': True,
 'replace_scene_breaks': u'',
 'search_replace': '[]',
 'series': None,
 'series_index': None,
 'share_not_sync': False,
 'smarten_punctuation': False,
 'sr1_replace': None,
 'sr1_search': None,
 'sr2_replace': None,
 'sr2_search': None,
 'sr3_replace': None,
 'sr3_search': None,
 'start_reading_at': None,
 'subset_embedded_fonts': False,
 'tags': None,
 'timestamp': None,
 'title': None,
 'title_sort': None,
 'toc_filter': None,
 'toc_threshold': 6,
 'toc_title': None,
 'transform_css_rules': '[]',
 'unsmarten_punctuation': False,
 'unwrap_lines': True,
 'use_auto_toc': False,
 'verbose': 2}
Python function terminated unexpectedly: Invalid namespace u'http://idpf.org/2007/opf' for OPF document
InputFormatPlugin: EPUB Input running
on /var/folders/h0/ytf50xh121v90wvl_sz8lx_80000gn/C/calibre_3.35.0_tmp_WI1XDR/yhnFRD.epub
Found HTML cover OEBPS/cover.xhtml
Parsing all content...
Traceback (most recent call last):
  File "/Applications/calibre.app/Contents/Resources/Python/lib/python2.7/site.py", line 154, in main
    return run_entry_point()
  File "/Applications/calibre.app/Contents/Resources/Python/lib/python2.7/site.py", line 114, in run_entry_point
    return getattr(pmod, func)()
  File "site-packages/calibre/utils/ipc/worker.py", line 199, in main
  File "site-packages/calibre/gui2/convert/gui_conversion.py", line 42, in gui_convert_override
  File "site-packages/calibre/gui2/convert/gui_conversion.py", line 27, in gui_convert
  File "site-packages/calibre/ebooks/conversion/plumber.py", line 1117, in run
  File "site-packages/calibre/ebooks/conversion/plumber.py", line 1311, in create_oebbook
  File "site-packages/calibre/ebooks/oeb/reader.py", line 72, in __call__
  File "site-packages/calibre/ebooks/oeb/reader.py", line 133, in _read_opf
calibre.ebooks.oeb.base.OEBError: Invalid namespace u'http://idpf.org/2007/opf' for OPF document
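The namespace Calibre rejects, `http://idpf.org/2007/opf`, differs from the one in the OPF specification, `http://www.idpf.org/2007/opf`, only by the missing `www.`. To check whether a given package file declares the rejected namespace, a quick test along these lines may help. It is a sketch: `opf_namespace` is a hypothetical helper, and which file inside the EPUB carries the declaration is left for you to locate.

```python
import xml.etree.ElementTree as ET

# Namespace URI defined by the OPF specification.
OPF_NS = "http://www.idpf.org/2007/opf"


def opf_namespace(opf_text):
    """Return the namespace URI of the root element of an OPF document."""
    root = ET.fromstring(opf_text)
    # ElementTree encodes namespaced tags as "{uri}localname".
    if root.tag.startswith("{"):
        return root.tag[1:].split("}", 1)[0]
    return ""


bad = '<package xmlns="http://idpf.org/2007/opf" version="2.0"/>'
print(opf_namespace(bad) == OPF_NS)
```

If the check fails for the package file of a downloaded book, correcting the `xmlns` value there and re-zipping the EPUB is one thing to try before converting again.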

Contrasting log messages when Out of Session while EPUB creation in process

I got an Out of Session error during EPUB creation, after downloading all book contents, CSS and images. Re-running without deleting worked well for me, however I got contrasting log messages. See below.

The log says:

[-] Logging into Safari Books Online...                                                                                                        
[*] Retrieving book info...                                                                                                                    
[-] ...<all book info>        
[*] Retrieving book chapters...                                                                                                                
[*] Output directory:                                                                                                                          
    /Users/...
[-] Downloading book contents...                                                                                                
    [##########################################################################################################] 100%
[-] Downloading book CSSs...                                                                                                          
    [##########################################################################################################] 100%
[-] Downloading book images...                                                                                                     
    [##########################################################################################################] 100%
[-] Creating EPUB file...                                                                                                                      
[#] API: Out-of-Session (Authentication credentials were not provided.).                                                                       
Don't delete any files, just run again this program in order to complete the `.epub` creation!
[+] Please delete all the `<BOOK NAME>/OEBPS/*.xhtml` files and restart the program.                                                           
[!] Aborting...

Note the last lines of the log:

[#] API: Out-of-Session (Authentication credentials were not provided.).
Don't delete any files, just run again this program in order to complete the .epub creation!
[+] Please delete all the <BOOK NAME>/OEBPS/*.xhtml files and restart the program.

Re-running without deleting the files worked well for me.

Not working on some books

Python 3.7.0 on macOS
The error always occurs on ch02.html

Books URL:
https://www.safaribooksonline.com/library/view/learning-swift-3rd/9781491987568/
https://www.safaribooksonline.com/library/view/you-dont-know/9781491905241/

error info:
[-] URL: https://www.safaribooksonline.com/library/view/learning-swift-3rd/9781491987568/
[*] Retrieving book chapters...
[*] Output directory:
    /safaribooks/Books/Learning Swift 3rd Edition (9781491987568)
[-] Downloading book contents... (32 chapters)
[#] Parser: book content's corrupted or not present: ch02.html (2. The Swift Programming Language)
[+] Please delete all the /OEBPS/*.xhtml files and restart the program.
[!] Aborting...

[-] URL: https://www.safaribooksonline.com/library/view/you-dont-know/9781491905241/
[*] Retrieving book chapters...
[*] Output directory:
    /safaribooks/Books/You Don_t Know JS (9781491905241)
[-] Downloading book contents... (17 chapters)
[#] Parser: book content's corrupted or not present: ch02.html (2. Syntax)
[+] Please delete all the /OEBPS/*.xhtml files and restart the program.
[!] Aborting...

Argparse allow_abbrev problem

I probably have an old argparse on my Ubuntu 14.04. Apart from this, it works.

Traceback (most recent call last):
  File "safaribooks.py", line 965, in <module>
    allow_abbrev=False)
TypeError: __init__() got an unexpected keyword argument 'allow_abbrev'

cannot import name escape

I get this error when running python2.7 safaribooks.py

Traceback (most recent call last):
  File "safaribooks.py", line 12, in <module>
    from html import escape
ImportError: cannot import name escape

pip list | grep html
html 1.16
requests-html 0.2.2

What package am I missing?

Regards,
Kevin

conversion to kindle using Kindle Previewer fails

Warning(prcfile):W14028: Following file does not exist : ../fonts/fontawesome-webfont.eot?#iefix&v=4.4.0

Error(core):E1003: Unknown error in class String.

Error(parsing):E3013: More number of characters are hidden using display:none than allowed limit. Limit: 10000 in file: C:\Users\Codrut\AppData\Local\Temp\mbp_7E2_A_19_F_22_36_2BF_30E0_400C_1\OEBPS\toc01.xhtml line: 13

any idea how to resolve this? I don't want to install Calibre.

Thanks

Cannot download a book series because of using the same folder

When I try to download a book series such as "You Don't Know JS", it always uses the same folder ./Books/You Don_t Know JS (KyleSimpson)/ for all books.
Then I receive these messages:

[-] Downloading book images... (9 files)
[*] Some of the book contents were already downloaded.
    If you want to be sure that all the images will be downloaded,
    please delete the `<BOOK NAME>/OEBPS/*.xhtml` files and restart the program.
[*] File `default_cover.jpeg` already exists.
    If you want to download again all the images,
    please delete the `<BOOK NAME>/OEBPS/*.xhtml` and `<BOOK NAME>/OEBPS/Images/*` files and restart the program.
    [###########################################################################################################################################] 100%

Could you append the book id to the folder name to avoid it?

no csrf token found, can't login

It says:
No CSRF token found on the page, Please try login again
I have tried many times, with many different pages, and always get the same response.

[21/Jul/2018 22:10:29] ** Welcome to SafariBooks! **
[21/Jul/2018 22:10:29] Logging into Safari Books Online...
[21/Jul/2018 22:11:01] Login: no CSRF Token found in the page. Unable to continue the login. Try again...
info_9780134278308.log

SSO support

Hi, is it somehow possible to use SSO instead of username & password? Thanks

No images downloaded

Thanks for the good work! The code works, but all the images are missing in the book. Please advise how to address the problem. Thanks!

Unhandled Exception: 'gbk' codec can't encode character '\xa9' in position 2797

[-] Creating EPUB file...

[#] Unhandled Exception: 'gbk' codec can't encode character '\xa9' in position 2797: illegal multibyte sequence (type: UnicodeEncodeError)

[+] Please delete all the <BOOK NAME>/OEBPS/*.xhtml files and restart the program.

[!] Aborting...
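`'\xa9'` is the copyright sign. When a file is opened without an explicit encoding, Python uses the platform default (GBK in this report), and that codec cannot represent the character. A common remedy is to pass `encoding="utf-8"` whenever text is written. The snippet below is only an illustration of that idea; `write_text` and the demo file name are hypothetical, not taken from safaribooks.py.

```python
import os
import tempfile


def write_text(path, text):
    """Write text as UTF-8 regardless of the platform's default encoding."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)


# Round-trip a string containing the character from the error message.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.xhtml")
write_text(demo_path, "Copyright \xa9 2018")
with open(demo_path, encoding="utf-8") as handle:
    roundtrip = handle.read()
```

The same applies to reading: files written as UTF-8 should be opened with `encoding="utf-8"` as well.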

Path too long in windows

[27/Mar/2018 16:08:38] Unhandled Exception: [WinError 3] The system can't find the directory properly: 'D:\books\safaribooks\Books\9781260116601.zip' -> 'D:\books\safaribooks\Books\CompTIA Cloud_ Certification Study Guide, Second Edition (Exam CV0-002), 2nd Edition (EricA.Vanderburg,ScottWilson)\CompTIA Cloud_ Certification Study Guide, Second Edition (Exam CV0-002), 2nd Edition (EricA.Vanderburg,ScottWilson).epub' (type: FileNotFoundError)
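Here the long title and author list appear twice in the path (once as the directory, once as the .epub name), which pushes it towards the traditional 260-character Windows path limit. A possible mitigation, sketched below, is to cap the directory name and keep the book ID as the unique part. `safe_dir_name` and the 100-character cap are assumptions made for illustration, not the script's behaviour.

```python
import re

# Characters that Windows rejects in file and directory names.
_INVALID = re.compile(r'[<>:"/\\|?*]')


def safe_dir_name(title, book_id, max_len=100):
    """Return '<title> (<book_id>)' with invalid characters replaced
    and the total length capped at `max_len`."""
    suffix = " ({})".format(book_id)
    clean = _INVALID.sub("_", title).strip(" .")
    # Trim the title, never the ID, so the name stays unique per book.
    return clean[: max_len - len(suffix)].rstrip(" .") + suffix
```

Naming the output file after the ID alone, as in the README example (9781491958698.epub), shortens the full path further.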

Infinite "Creating EPUB file..." + fill filesystem

Hello (me again)
once again, thank you for this script, it's really useful !
I'm using it to prepare some certifications at work and have a valid Safari account.

A) For some books it works perfectly (9781449342562 or the Python example you give)
B) For others, it seems to loop somewhere in the zipping process, creating an X GB file until it crashes for lack of disk space,
for example 9780134466330 or 9780134030999

C) I tried debugging and issue seems to be here :
shutil.make_archive(zip_file, 'zip', self.BOOK_PATH)

D) I guess it's because the titles of these books contain a lot of special characters

E) I tried to work around it by zipping the "Ebook Path" folder and renaming it to .epub, and it works :)
I imagine a solution or workaround could be to generate the EPUB filename from the book ID instead of book_path

F) Also when opening an epub generated by this script, it contains :
folders :
META-INF
OEBPS

Files :
mimetype
.zip
which also contains empty folders META-INF and OEBPS + mimetype

So I'd guess not needed

Have a great day !
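Observation F (a .zip sitting inside the generated EPUB) suggests, though this is only a guess, that the archive ends up inside the directory being archived and so gets added to itself. A sketch of building the EPUB directly with `zipfile`, skipping the output file and storing `mimetype` first and uncompressed as the EPUB container format expects, could look like this. `build_epub` is a hypothetical helper, not the project's code.

```python
import os
import zipfile


def build_epub(book_dir, epub_path):
    """Zip `book_dir` into `epub_path`, never adding the archive to itself."""
    target = os.path.abspath(epub_path)
    with zipfile.ZipFile(epub_path, "w") as archive:
        # The EPUB container expects `mimetype` first and uncompressed.
        archive.write(os.path.join(book_dir, "mimetype"), "mimetype",
                      compress_type=zipfile.ZIP_STORED)
        for folder, _dirs, files in os.walk(book_dir):
            for name in files:
                full = os.path.join(folder, name)
                rel = os.path.relpath(full, book_dir)
                if rel == "mimetype" or os.path.abspath(full) == target:
                    continue  # already written, or the archive itself
                archive.write(full, rel, compress_type=zipfile.ZIP_DEFLATED)
```

Because only files are written, empty directories are not added, and using the book ID for `epub_path` sidesteps any trouble with special characters in titles.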

[+] Please delete all the `<BOOK NAME>/OEBPS/*.xhtml` files and restart the program.

[#] API: Out-of-Session (Authentication credentials were not provided.)
[+] Please delete all the <BOOK NAME>/OEBPS/*.xhtml files and restart the program.
[!] Aborting...

I can't download the books

API: Out-of-Session (Authentication credentials were not provided.)

I am getting this error after the "Retrieving book info" message. How can I fix this?

error trying to retrieve this page / Exception: Expecting value

info_9781617292774.log
error trying to retrieve this page: kindle_split_016.html (Chapter 7. Shaping the relevance function)\n
info_9781782161363.log
error trying to retrieve this page: ch02s06.html (Summary)

info_9781783984923.log
Unhandled Exception: Expecting value: line 2 column 1 (char 1) (type: JSONDecodeError)'

info_9781783987023.log
error trying to retrieve this page: ch04s03.html (Built-in analyzers)
info_9781784391010.log
error trying to retrieve this page: ch03s08.html (Sorting your data)\n
info_9781784399641.log
error trying to retrieve this page: toc.html (Table of Contents)
info_9781786460011.log
b'Unhandled Exception: Expecting value: line 2 column 1 (char 1) (type: JSONDecodeError)'
info_9781786460189.log
b'Unhandled Exception: Expecting value: line 2 column 1 (char 1) (type: JSONDecodeError)'
info_9781787128453.log
error trying to retrieve this page: 051e3348-bdf0-47dc-b558-0f2be7fbb674.xhtml (Boolean)\n
info_9781787281868.log
b'Unhandled Exception: Expecting value: line 2 column 1 (char 1) (type: JSONDecodeError)'
info_9781787286849.log
error trying to retrieve this page: 75d192b4-d51a-48e9-b3f8-798eb58c3e35.xhtml (There's more...)\n
info_9781787288546.log
error trying to retrieve this page: ch13.html (3. Not Only Full Text Search)\n

info_9781788837385.log
error trying to retrieve this page: cdd9d5a0-5524-4b2a-b760-c8270f85cb94.xhtml (Production Solr setup)\n

Please let me know if you need log files. I may send them in a secure channel because they contain sensitive data.

work with video

Hi Lorenzo,
I'm not sure how hard it would be to implement,
but what about adding support for dumping videos too? ;)
Label: feature request

Feature request for proper figure formatting during conversion with Calibre

Most of the books specify the size of the figures on the XHTML files. For instance:
<img src="Images/Ch4Fig15.png" alt="This is the caption" width="2356" height="578"/>
This creates issues when converting the .epub file to .azw3 or .pdf, where the images get distorted. The problem can be easily solved by removing width/height from the HTML tag:
<img src="Images/Ch4Fig15.png" alt="This is the caption"/>
It would be nice to have an option to strip those from the XHTML files automatically.
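Until such an option exists, the attributes can be stripped from the downloaded XHTML files with a small post-processing step. The sketch below uses regular expressions for brevity; `strip_img_sizes` is a hypothetical helper, and it assumes quoted attribute values with no `>` inside them (a real parser such as lxml would be more robust).

```python
import re

# Matches a whole <img ...> tag (assumes no ">" inside attribute values).
_IMG_TAG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)
# Matches a quoted width or height attribute with its leading whitespace.
_SIZE_ATTR = re.compile(
    r"""\s+(?:width|height)\s*=\s*(?:"[^"]*"|'[^']*')""", re.IGNORECASE
)


def strip_img_sizes(xhtml):
    """Remove width/height attributes from every <img> tag in the markup."""
    return _IMG_TAG.sub(lambda m: _SIZE_ATTR.sub("", m.group(0)), xhtml)
```

Applied to each file under OEBPS before creating or converting the EPUB, this leaves the image sizing to the reader or to Calibre.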

problem install safaribooks

Hello,

When I try to install safaribooks with Python 3:

C:\safaribooks>pip3 install -r requirements.txt
Requirement already satisfied: lxml>=4.1.1 in c:\python\lib\site-packages (from -r requirements.txt (line 1)) (4.2.5)
Requirement already satisfied: requests>=2.20.0 in c:\python\lib\site-packages (from -r requirements.txt (line 2)) (2.20.1)
Requirement already satisfied: certifi>=2017.4.17 in c:\python\lib\site-packages (from requests>=2.20.0->-r requirements.txt (line 2)) (2018.11.29)
Requirement already satisfied: urllib3<1.25,>=1.21.1 in c:\python\lib\site-packages (from requests>=2.20.0->-r requirements.txt (line 2)) (1.24.1)
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in c:\python\lib\site-packages (from requests>=2.20.0->-r requirements.txt (line 2)) (3.0.4)
Requirement already satisfied: idna<2.8,>=2.5 in c:\python\lib\site-packages (from requests>=2.20.0->-r requirements.txt (line 2)) (2.7)

C:\safaribooks>pip list
Package Version

certifi 2018.11.29
chardet 3.0.4
idna 2.7
lxml 4.2.5
pip 18.1
requests 2.20.1
setuptools 40.6.2
urllib3 1.24.1
youtube-dl 2018.11.23

How can I solve it?

This is not legal

What you have developed is not legal

All double quotation marks are replaced by single quotation marks

On safaribooksonline:

package main

import "fmt"

func main(){
    fmt.Println(a:"Hello World")
}

Downloaded epub:

package main

import 'fmt'

func main(){
    fmt.Println(a:'Hello World')
}

UnicodeEncodeError

UnicodeEncodeError: 'ascii' codec can't encode characters in position 6580-6581: ordinal not in range(128).

Character: \xa9

Running Python 3.6.5 :: Anaconda, Inc.

Cookies.json format

Hello, my safaribooksonline account is SSO handled by my company, thus I have NO password.
Could you post an example of the cookies.json format ? (with dummy values of course)

downloaded content is not properly ordered?

Example book : sudo python3.6 safaribooks.py 9780134030999

It looks like it downloaded fine, but I see many of the image files placed at the end of the book, when they should be inside a chapter, between the text that explains them.

If the downloads are not in order, I am not sure how to put the images and content back in order, since there are a lot of images and many chapters.

Unhandled Exception: [WinError 267] Directory name invalid

[01/Mar/2018 17:38:49] ** Welcome to SafariBooks! **
[01/Mar/2018 17:38:49] Logging into Safari Books Online...
[01/Mar/2018 17:38:53] Retrieving book info...
[01/Mar/2018 17:38:55] Title: Revive: How to Transform Traditional Businesses into Digital Leaders
[01/Mar/2018 17:38:55] Authors: Brian Manning, Jason Albanese
[01/Mar/2018 17:38:55] Identifier: 9780134307626
[01/Mar/2018 17:38:55] ISBN: 9780134307626
[01/Mar/2018 17:38:55] Publishers: PH Professional Business
[01/Mar/2018 17:38:55] Description: GAME-CHANGING DIGITAL TRANSFORMATION:USE DIGITAL STRATEGIES, CHANNELS, AND PLATFORMS TO TRANSFORM ENTERPRISES TO COMPETE IN THE DIGITAL AGE Move from “reactive digital” to “transformative digital” Use digital capabilities to fundamentally change the way you lead, direct, and structure organizations and teams Stay focused on the “moving target” of digital best practices, and accelerate your progress towards digital maturity REVIVE will help you build a core business model for creating your own ...
[01/Mar/2018 17:38:55] Release Date: 2015-11-17
[01/Mar/2018 17:38:55] URL: https://www.safaribooksonline.com/library/view/revive-how-to/9780134307626/
[01/Mar/2018 17:38:55] Retrieving book chapters...
[01/Mar/2018 17:39:01]   File "safaribooks.py", line 992, in <module>
    SafariBooks(args_parsed)
  File "safaribooks.py", line 305, in __init__
    self.create_dirs()
  File "safaribooks.py", line 671, in create_dirs
    os.makedirs(self.BOOK_PATH)
  File "C:\apps\Python\Python36\lib\os.py", line 220, in makedirs
    mkdir(name, mode)

[01/Mar/2018 17:39:01] Unhandled Exception: [WinError 267] Directory name invalid : 'D:\books\safaribooks\Books\Revive: How to Transform Traditional Businesses into Digital Leaders' (type: NotADirectoryError)
[01/Mar/2018 17:39:01] Last request done:
URL: https://www.safaribooksonline.com/api/v1/book/9780134307626/chapter/?page=3
DATA: None
OTHERS: {}

200
Server: nginx/1.10.3 (Ubuntu)
Content-Type: application/json
Content-Language: en-US
X-Frame-Options: SAMEORIGIN
x-content-type-options: nosniff
Allow: GET, HEAD, OPTIONS
strict-transport-security: max-age=3600; includeSubDomains, max-age=31536000; includeSubdomains
x-xss-protection: 1; mode=block
Content-Encoding: gzip
Cache-Control: s-maxage=1800
Content-Length: 770
Accept-Ranges: bytes
Date: Thu, 01 Mar 2018 09:39:01 GMT
Via: 1.1 varnish
Connection: keep-alive
X-Client-IP: 111.204.128.114
X-Served-By: cache-hnd18738-HND
X-Cache: MISS
X-Cache-Hits: 0
X-Timer: S1519897141.881305,VS0,VE569
Vary: Accept, Accept-Language, Authorization, Cookie
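
The directory name in the exception contains a colon (from the book title), which Windows does not allow in file or directory names. A minimal sketch of sanitizing a title before it is used as a directory name; `safe_dirname` is a hypothetical helper, not the function used by safaribooks.py, and replacing with `_` is only one possible choice.

```python
import re

# Characters that Windows rejects in file and directory names.
_WINDOWS_FORBIDDEN = re.compile(r'[<>:"/\\|?*]')


def safe_dirname(title):
    """Replace characters that are invalid on Windows with underscores."""
    cleaned = _WINDOWS_FORBIDDEN.sub("_", title)
    # Windows also rejects names that end with a space or a dot.
    return cleaned.rstrip(" .")


title = "Revive: How to Transform Traditional Businesses into Digital Leaders"
directory = safe_dirname(title)
print(directory)
# Revive_ How to Transform Traditional Businesses into Digital Leaders
```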

error on login

"Login: no CSRF Token found in the page. Unable to continue the login. Try again..."

Command:

python3 safaribooks.py --cred "[email protected]:xxx" 9781786468949

Output:


       ____     ___         _
      / __/__ _/ _/__ _____(_)
     _\ \/ _ `/ _/ _ `/ __/ /
    /___/\_,_/_/ \_,_/_/ /_/
      / _ )___  ___  / /__ ___
     / _  / _ \/ _ \/  '_/(_-<
    /____/\___/\___/_/\_\/___/

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[-] Logging into Safari Books Online...
[#] Login: no CSRF Token found in the page. Unable to continue the login. Try again...
[+] Please delete all the `<BOOK NAME>/OEBPS/*.xhtml` files and restart the program.
[!] Aborting...

info_9781786468949.log

Download Not Successful for a special book

[-] Release Date: 2018-06-27
[-] URL: https://www.safaribooksonline.com/library/view/mastering-machine-learning/9781788997409/
[*] Retrieving book chapters...
[#] HTTPSConnectionPool(host='www.safaribooksonline.com', port=443): Max retries exceeded with url: /api/v1/book/9781788997409/chapter/?page=19 (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0xb5d689ac>: Failed to establish a new connection: [Errno -2] Name or service not known',))
[#] API: unable to retrieve book chapters.
[+] Please delete all the <BOOK NAME>/OEBPS/*.xhtml files and restart the program.
[!] Aborting...

Book ID: 9781788997409
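
The failure above is a name-resolution error ("Name or service not known") on page 19 of the chapter list, which suggests a transient network problem rather than something specific to the book. A minimal sketch of retrying a request a few times before giving up, assuming the network call raises an `OSError` subclass on connection failure; `with_retries` and `flaky_fetch` are hypothetical and only illustrate the idea.

```python
import time


def with_retries(action, attempts=3, delay=1.0):
    """Call `action()` until it succeeds or `attempts` calls failed with OSError."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except OSError:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the error
            time.sleep(delay * attempt)  # wait a little longer each time


# Stand-in for a network call that fails twice and then succeeds.
calls = []


def flaky_fetch():
    calls.append(1)
    if len(calls) < 3:
        raise ConnectionError("Name or service not known")
    return "chapter list page 19"


result = with_retries(flaky_fetch, attempts=3, delay=0.01)
```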

Out of session error

Hi,

The script does not seem to work anymore, probably some API things have changed?

The error I get from the script:

[-] Logging into Safari Books Online...
[*] Retrieving book info...
[#] API: Out-of-Session (Authentication credentials were not provided.).

Last log entry (some info made anonymous):

'
[02/Aug/2018 14:15:08] Last request done:
URL: https://www.safaribooksonline.com/api/v1/book/9780071744324/
DATA: None
OTHERS: {}

    401
    Server: nginx/1.10.3 (Ubuntu)
    Content-Type: application/json
    x-xss-protection: 1; mode=block
    Content-Language: en-US
    strict-transport-security: max-age=3600; includeSubDomains
    Allow: GET, HEAD, OPTIONS
    WWW-Authenticate: Bearer realm="api"
    ETag: "d5972c9c85159c7a7494e2554c26def1"
    x-content-type-options: nosniff
    X-Frame-Options: SAMEORIGIN
    Set-Cookie: api_key=; expires=Thu, 01-Jan-1970 00:00:00 GMT; Max-Age=0; Path=/, logged_in=; expires=Thu, 01-Jan-1970 00:00:00 GMT; Max-Age=0; Path=/, sessionid=xxxx Domain=.www.safaribooksonline.com; expires=Thu, 16-Aug-2018 12:15:10 GMT; Max-Age=1209600; Path=/; secure
    Accept-Ranges: bytes, bytes
    Content-Length: 58
    Date: Thu, 02 Aug 2018 12:15:08 GMT
    Via: 1.1 varnish
    Connection: keep-alive
    X-Client-IP: XXXX
    X-Served-By: cache-ams4138-AMS
    X-Cache: MISS
    X-Cache-Hits: 0
    X-Timer: S1533212108.238304,VS0,VE169
    Vary: Accept, Accept-Language, Authorization, Cookie

{"detail":"Authentication credentials were not provided."}
'

Started the script with --cred "credentials" (so credentials were provided)

can't give the password

Hi, I have a Safari Books Online account that was provided by my employer, and logging in with my company credentials does not work. How can I use my office credentials to download books?

Encoding Error

Unhandled Exception: 'charmap' codec can't encode character '\u03c0'
Unhandled Exception: 'charmap' codec can't encode character '\u2192'

Book ID : 0596101538

Generate to pdf

Hello Lorenzo,
First of all thank you for this great app!

Is it possible to add a feature for downloading books directly in PDF format?
It would be very helpful.

UnicodeEncodeError: 'ascii' codec can't encode character '\xa9' in position 50: ordinal not in range(128)

I am getting an error while downloading a book.
Python 3.5.3

[-] Logging into Safari Books Online...
[*] Retrieving book info...
[-] Title: Prometheus: Up & Running
[-] Authors: Brian Brazil
[-] Identifier: 9781492034131
[-] ISBN: 9781492034148
[-] Publishers: O'Reilly Media, Inc.
--- Logging error ---
Traceback (most recent call last):
  File "/usr/lib/python3.5/logging/__init__.py", line 983, in emit
    stream.write(msg)
UnicodeEncodeError: 'ascii' codec can't encode character '\xa9' in position 50: ordinal not in range(128)
Call stack:
  File "safaribooks.py", line 1024, in <module>
    SafariBooks(args_parsed)
  File "safaribooks.py", line 292, in __init__
    self.display.book_info(self.book_info)
  File "safaribooks.py", line 135, in book_info
    self.info("{0}{1}{2}: {3}".format(self.SH_YELLOW, t[0], self.SH_DEFAULT, t[1]), True)
  File "safaribooks.py", line 67, in info
    self.log(message)
  File "safaribooks.py", line 61, in log
    self.logger.info(str(message))
Message: "\x1b[33mRights\x1b[0m: Copyright \xa9 O'Reilly Media, Inc."
Arguments: ()
[#] Unhandled Exception: 'ascii' codec can't encode character '\xa9' in position 239: ordinal not in range(128) (type: UnicodeEncodeError)
[+] Please delete all the `<BOOK NAME>/OEBPS/*.xhtml` files and restart the program.
[!] Aborting...
--- Logging error ---
Traceback (most recent call last):
  File "/usr/lib/python3.5/logging/__init__.py", line 983, in emit
    stream.write(msg)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 6722-6723: ordinal not in range(128)
Call stack:
  File "safaribooks.py", line 95, in unhandled_exception
    self.exit("Unhandled Exception: %s (type: %s)" % (o, o.__class__.__name__))
  File "safaribooks.py", line 90, in exit
    self.save_last_request()
  File "safaribooks.py", line 100, in save_last_request
    .format(*self.last_request))
  File "safaribooks.py", line 61, in log
    self.logger.info(str(message))
Message: 'Last request done:\n\tURL: https://www.safaribooksonline.com/api/v1/book/9781492034131/\n\tDATA: None\n\tOTHERS: {}\n\n\t200\n\tServer: nginx/1.10.3 (Ubuntu)\n\tContent-Type: application/json\n\tAllow: GET, HEAD, OPTIONS\n\tx-xss-protection: 1; mode=block\n\tContent-Language: en-US\n\tx-content-type-options: nosniff\n\tX-Frame-Options: SAMEORIGIN\n\tstrict-transport-security: max-age=3600; includeSubDomains, max-age=31536000; includeSubdomains\n\tContent-Encoding: gzip\n\tCache-Control: s-maxage=31536000\n\tContent-Length: 1711\n\tAccept-Ranges: bytes\n\tDate: Thu, 24 May 2018 21:25:45 GMT\n\tVia: 1.1 varnish\n\tConnection: keep-alive\n\tX-Client-IP: 62.245.87.212\n\tX-Served-By: cache-hhn1541-HHN\n\tX-Cache: MISS\n\tX-Cache-Hits: 0\n\tX-Timer: S1527197145.328775,VS0,VE551\n\tVary: Accept-Encoding\n\n{"url":"https://www.safaribooksonline.com/api/v1/book/9781492034131/","natural_key":["9781492034131"],"authors":[{"name":"Brian Brazil"}],"subjects":[],"topics":[{"score":-0.692030336285675,"name":"Information Technology / Operations","slug":"information-technology-operations","uuid":"c363ab45-daa6-442e-b572-1a4467aceeb0","epub_identifier":"9781492034131"}],"publishers":[{"name":"O\'Reilly Media, 
Inc.","id":1,"slug":"oreilly-media-inc"}],"chapters":["https://www.safaribooksonline.com/api/v1/book/9781492034131/chapter/cover.html","https://www.safaribooksonline.com/api/v1/book/9781492034131/chapter/toc01.html","https://www.safaribooksonline.com/api/v1/book/9781492034131/chapter/titlepage01.html","https://www.safaribooksonline.com/api/v1/book/9781492034131/chapter/copyright-page01.html","https://www.safaribooksonline.com/api/v1/book/9781492034131/chapter/preface01.html","https://www.safaribooksonline.com/api/v1/book/9781492034131/chapter/part01.html","https://www.safaribooksonline.com/api/v1/book/9781492034131/chapter/ch01.html","https://www.safaribooksonline.com/api/v1/book/9781492034131/chapter/ch02.html","https://www.safaribooksonline.com/api/v1/book/9781492034131/chapter/part02.html","https://www.safaribooksonline.com/api/v1/book/9781492034131/chapter/ch03.html","https://www.safaribooksonline.com/api/v1/book/9781492034131/chapter/ch04.html","https://www.safaribooksonline.com/api/v1/book/9781492034131/chapter/ch05.html","https://www.safaribooksonline.com/api/v1/book/9781492034131/chapter/ch06.html","https://www.safaribooksonline.com/api/v1/book/9781492034131/chapter/part03.html","https://www.safaribooksonline.com/api/v1/book/9781492034131/chapter/ch07.html","https://www.safaribooksonline.com/api/v1/book/9781492034131/chapter/ch08.html","https://www.safaribooksonline.com/api/v1/book/9781492034131/chapter/ch09.html","https://www.safaribooksonline.com/api/v1/book/9781492034131/chapter/ch10.html","https://www.safaribooksonline.com/api/v1/book/9781492034131/chapter/ch11.html","https://www.safaribooksonline.com/api/v1/book/9781492034131/chapter/ch12.html","https://www.safaribooksonline.com/api/v1/book/9781492034131/chapter/part04.html","https://www.safaribooksonline.com/api/v1/book/9781492034131/chapter/ch13.html","https://www.safaribooksonline.com/api/v1/book/9781492034131/chapter/ch14.html","https://www.safaribooksonline.com/api/v1/book/9781492034131/chapter/ch1
5.html","https://www.safaribooksonline.com/api/v1/book/9781492034131/chapter/ch16.html","https://www.safaribooksonline.com/api/v1/book/9781492034131/chapter/ch17.html","https://www.safaribooksonline.com/api/v1/book/9781492034131/chapter/part05.html","https://www.safaribooksonline.com/api/v1/book/9781492034131/chapter/ch18.html","https://www.safaribooksonline.com/api/v1/book/9781492034131/chapter/ch19.html","https://www.safaribooksonline.com/api/v1/book/9781492034131/chapter/part06.html","https://www.safaribooksonline.com/api/v1/book/9781492034131/chapter/ch20.html","https://www.safaribooksonline.com/api/v1/book/9781492034131/chapter/ix01.html","https://www.safaribooksonline.com/api/v1/book/9781492034131/chapter/colophon01.html","https://www.safaribooksonline.com/api/v1/book/9781492034131/chapter/colophon02.html"],"cover":"https://www.safaribooksonline.com/library/cover/9781492034131/","chapter_list":"https://www.safaribooksonline.com/api/v1/book/9781492034131/chapter/","toc":"https://www.safaribooksonline.com/api/v1/book/9781492034131/toc/","flat_toc":"https://www.safaribooksonline.com/api/v1/book/9781492034131/flat-toc/","web_url":"https://www.safaribooksonline.com/library/view/prometheus-up/9781492034131/","last_chapter_read":{"url":"https://www.safaribooksonline.com/api/v1/book/9781492034131/chapter/ch09.html","title":"9. 
Containers and Kubernetes","web_url":"https://www.safaribooksonline.com/library/view/prometheus-up/9781492034131/ch09.html"},"academic_excluded":false,"opf_unique_identifier_type":"pub-identifier","has_mathml":false,"created_time":"2018-05-01T16:47:13.772384Z","last_modified_time":"2018-05-01T16:53:31.924780Z","identifier":"9781492034131","name":"book.epub","title":"Prometheus: Up & Running","format":"book","content_format":"book","source":"application/epub+zip","orderable_title":"Prometheus: Up & Running","has_stylesheets":true,"description":"<span><div><p>Get up to speed with Prometheus, the metrics-based monitoring system used by thousands of organizations in production. This practical guide provides application developers, sysadmins, and DevOps practitioners with a hands-on introduction to the important aspects of Prometheus, including infrastructure and application monitoring, dashboarding and alerting, direct code instrumentation, and metric collection from third-party systems with exporters.</p><p>This open source system has gained popularity over the past few years for good reason. With its simple yet powerful data model and query language, Prometheus does one thing and it does it well. 
Author and Prometheus core developer Brian Brazil guides you through Prometheus setup, the Node Exporter, and the Alertmanager, then guides you through its use in application and infrastructure monitoring.</p><ul><li>Know where and how much to apply instrumentation to your application code</li><li>Expose metrics through client libraries to make them available to Prometheus</li><li>Identify metrics with labels: unique key-value pairs associated with a time series</li><li>Get an introduction to Grafana, a popular tool for building dashboards</li><li>Learn how to use the node exporter to monitor your infrastructure</li><li>Use service discovery to provide different views of your machines and services</li><li>Use Prometheus with Kubernetes, and examine exporters you can use with containers</li><li>Convert data from other monitoring systems into the Prometheus format</li></ul></div></span>","isbn":"9781492034148","issued":"2018-07-15","language":"en","rights":"Copyright \xc2\xa9 O\'Reilly Media, Inc.","updated":"2018-05-01T16:40:44.367564Z","orderable_author":"Brazil, Brian","purchase_link":null,"publisher_resource_links":{"Errata Page":"http://oreilly.com/catalog/0636920147343/errata"},"is_free":false,"is_system_book":true,"is_active":true,"is_hidden":false,"virtual_pages":458,"duration_seconds":null,"pagecount":384}\n'
Arguments: ()

info_9781492033905.log  info_9781492034131.log
lukas@PRGN00008351A:~/git/safaribooks$ cat info_9781492034131.log
[24/May/2018 23:22:14] ** Welcome to SafariBooks! **
[24/May/2018 23:22:14] Logging into Safari Books Online...
[24/May/2018 23:22:20] Retrieving book info...
[24/May/2018 23:22:21] Title: Prometheus: Up & Running
[24/May/2018 23:22:21] Authors: Brian Brazil
[24/May/2018 23:22:21] Identifier: 9781492034131
[24/May/2018 23:22:21] ISBN: 9781492034148
[24/May/2018 23:22:21] Publishers: O'Reilly Media, Inc.
[24/May/2018 23:22:21]   File "safaribooks.py", line 1024, in <module>
    SafariBooks(args_parsed)
  File "safaribooks.py", line 292, in __init__
    self.display.book_info(self.book_info)
  File "safaribooks.py", line 135, in book_info
    self.info("{0}{1}{2}: {3}".format(self.SH_YELLOW, t[0], self.SH_DEFAULT, t[1]), True)
  File "safaribooks.py", line 70, in info
    self.out(output)
  File "safaribooks.py", line 64, in out
    sys.stdout.write("\r" + " " * self.columns + "\r" + put + "\n")

[24/May/2018 23:22:21] Unhandled Exception: 'ascii' codec can't encode character '\xa9' in position 122: ordinal not in range(128) (type: UnicodeEncodeError)
[24/May/2018 23:25:43] ** Welcome to SafariBooks! **
[24/May/2018 23:25:43] Logging into Safari Books Online...
[24/May/2018 23:25:45] Retrieving book info...
[24/May/2018 23:25:46] Title: Prometheus: Up & Running
[24/May/2018 23:25:46] Authors: Brian Brazil
[24/May/2018 23:25:46] Identifier: 9781492034131
[24/May/2018 23:25:46] ISBN: 9781492034148
[24/May/2018 23:25:46] Publishers: O'Reilly Media, Inc.
[24/May/2018 23:25:46]   File "safaribooks.py", line 1024, in <module>
    SafariBooks(args_parsed)
  File "safaribooks.py", line 292, in __init__
    self.display.book_info(self.book_info)
  File "safaribooks.py", line 135, in book_info
    self.info("{0}{1}{2}: {3}".format(self.SH_YELLOW, t[0], self.SH_DEFAULT, t[1]), True)
  File "safaribooks.py", line 70, in info
    self.out(output)
  File "safaribooks.py", line 64, in out
    sys.stdout.write("\r" + " " * self.columns + "\r" + put + "\n")

[24/May/2018 23:25:46] Unhandled Exception: 'ascii' codec can't encode character '\xa9' in position 239: ordinal not in range(128) (type: UnicodeEncodeError)

not working

hi lorenzo
does it work?

[hdu@bd sb]$ cat info_%1.log
[28/Apr/2018 11:34:45] ** Welcome to SafariBooks! **
[28/Apr/2018 11:34:45] Logging into Safari Books Online...
[28/Apr/2018 11:34:48] Retrieving book info...
[28/Apr/2018 11:34:48]   File "safaribooks.py", line 1005, in <module>
    SafariBooks(args_parsed)
  File "safaribooks.py", line 291, in __init__
    self.book_info = self.get_book_info()
  File "safaribooks.py", line 464, in get_book_info
    response = response.json()
  File "/usr/local/lib/python3.6/site-packages/requests/models.py", line 892, in json
    return complexjson.loads(self.text, **kwargs)
  File "/usr/local/lib/python3.6/json/__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "/usr/local/lib/python3.6/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/local/lib/python3.6/json/decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None

[28/Apr/2018 11:34:48] Unhandled Exception: Expecting value: line 1 column 1 (char 0) (type: JSONDecodeError)
[28/Apr/2018 11:34:48] Last request done:
        URL: https://www.safaribooksonline.com/api/v1/book/%1/
        DATA: None
        OTHERS: {}

        400
        Server: nginx/1.10.3 (Ubuntu)
        Content-Type: text/html
        Content-Length: 182
        Accept-Ranges: bytes
        Date: Sat, 28 Apr 2018 15:34:48 GMT
        Via: 1.1 varnish
        Connection: keep-alive
        X-Client-IP: 78.9.182.146
        X-Served-By: cache-ams4137-AMS
        X-Cache: MISS
        X-Cache-Hits: 0
        X-Timer: S1524929688.191481,VS0,VE163

<html>
<head><title>400 Bad Request</title></head>
<body bgcolor="white">
<center><h1>400 Bad Request</h1></center>
<hr><center>nginx/1.10.3 (Ubuntu)</center>
</body>
</html>
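
In the log above, the request for book ID `%1` returned a `400 Bad Request` HTML page, and the JSON decoder then failed on that page. A minimal sketch of checking the response before decoding it, assuming only the status code, the `Content-Type` header and the body text are available; `decode_api_response` is hypothetical and not part of safaribooks.py.

```python
import json


def decode_api_response(status, content_type, body):
    """Decode a JSON API reply, raising a readable error for non-JSON replies."""
    if status != 200 or "application/json" not in content_type:
        raise ValueError(
            "API returned HTTP %d (%s) instead of JSON; check the book ID"
            % (status, content_type)
        )
    return json.loads(body)


# A well-formed reply decodes normally.
book = decode_api_response(200, "application/json", '{"identifier": "9781492034131"}')

# The reply from the log above produces a clear message instead of JSONDecodeError.
try:
    decode_api_response(400, "text/html", "<html><head><title>400 Bad Request</title></head></html>")
except ValueError as error:
    message = str(error)
```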

rendering is mess

Titles are printed over some of the text and code snippets are not rendered; the result is a complete mess.

9781491941294

book not formatted correctly

9781457166730
It's true, it contains comics. But it would be nice if it could be downloaded

Fail to login when password has special chars

First, thank you for this, it's going to be extremely valuable to me (reading on an e-reader is so much more comfortable than on a tablet or a phone)

Now, my problem is my passwords are generated, and, as such, contain special chars.

Right now, two of them are giving me problems: : and $

As a workaround, I escaped $ with \, but I had to modify credentials parsing to change the configured separator into a character which does not appear in my password.

May I suggest adding a --sep option, defaulting to :, so that the user can choose whatever character suits best?

As for $, I think it's more of a shell-related problem, and for now I can't see a solution (other than escaping it). Do you have any idea how we could overcome this?

I am not fluent in python, but I can probably submit a PR making this separator option change, what do you think ?
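
One alternative to a custom separator, sketched here under the assumption that the email part never contains `:`, is to split the `--cred` value at the first colon only, so that any further colons stay in the password. `parse_cred` is a hypothetical helper and does not describe how safaribooks.py currently parses credentials.

```python
def parse_cred(cred):
    """Split 'EMAIL:PASSWORD' at the first colon; later colons belong to the password."""
    email, separator, password = cred.partition(":")
    if not separator or not email or not password:
        raise ValueError("expected credentials in the form EMAIL:PASSWORD")
    return email, password


email, password = parse_cred("user@example.com:pa:ss$w0rd")
```

For `$`, wrapping the argument in single quotes instead of double quotes stops POSIX shells from expanding it, so no escaping is needed.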

Problem with CSS file

One of the CSS files (Style00.css) does not get downloaded correctly and is saved as a binary file (probably just corrupted). All the other ones are downloaded correctly. This creates a problem when converting later with Calibre. Here is an excerpt of my log.

Style00.css corresponds to the font file downloaded from the Google API (the first one). If you type the URL in a browser, the file gets downloaded correctly (in text format).

...
[10/Jul/2018 12:22:52] Crawler: found a new CSS at https://fonts.googleapis.com/css?family=Source+Sans+Pro:200,300,400,600,700,900,200italic,300italic,400italic,600italic,700italic,900italic
[10/Jul/2018 12:22:52] Crawler: found a new CSS at https://www.safaribooksonline.com/static/CACHE/css/da86975166c9.css
[10/Jul/2018 12:22:52] Crawler: found a new CSS at https://www.safaribooksonline.com/static/css/annotator.ef38b0457d7b.css
[10/Jul/2018 12:22:52] Crawler: found a new CSS at https://maxcdn.bootstrapcdn.com/font-awesome/4.4.0/css/font-awesome.min.css
...

Could you create Mobipocket eBook (.mobi) file?

Thank you for creating the EPUB file!
But it would be wonderful if you could create a .mobi file, too, so I can read it on my Kindle.
Thanks so much!

UnicodeEncodeError on logging events

It always throws an error when downloading a new book:

--- Logging error ---
Traceback (most recent call last):
  File "/usr/lib/python3.5/logging/__init__.py", line 983, in emit
    stream.write(msg)
UnicodeEncodeError: 'ascii' codec can't encode character '\xa9' in position 50: ordinal not in range(128)
Call stack:
  File "safaribooks.py", line 1023, in <module>
    SafariBooks(args_parsed)
  File "safaribooks.py", line 292, in __init__
    self.display.book_info(self.book_info)
  File "safaribooks.py", line 135, in book_info
    self.info("{0}{1}{2}: {3}".format(self.SH_YELLOW, t[0], self.SH_DEFAULT, t[1]), True)
  File "safaribooks.py", line 67, in info
    self.log(message)
  File "safaribooks.py", line 61, in log
    self.logger.info(str(message))
Message: '\x1b[33mRights\x1b[0m: Copyright \xa9 2018 Packt Publishing'
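
The traceback shows the failure inside the logging handler's `stream.write(msg)`, which suggests the log stream was opened with an ASCII default encoding. A minimal sketch of creating the file handler with an explicit UTF-8 encoding; the logger name, the format string and `make_logger` are hypothetical, and this is not necessarily the fix applied in safaribooks.py.

```python
import logging
import os
import tempfile


def make_logger(log_path):
    """Create a logger whose file handler always writes UTF-8."""
    logger = logging.getLogger("safaribooks_sketch")
    logger.setLevel(logging.INFO)
    # An explicit encoding avoids falling back to the locale's default codec.
    handler = logging.FileHandler(filename=log_path, encoding="utf-8")
    handler.setFormatter(logging.Formatter("[%(asctime)s] %(message)s"))
    logger.addHandler(handler)
    return logger, handler


log_path = os.path.join(tempfile.mkdtemp(), "info_example.log")
logger, handler = make_logger(log_path)
logger.info("Rights: Copyright \xa9 2018 Packt Publishing")

# Close the handler so the file is flushed before reading it back.
handler.close()
logger.removeHandler(handler)

with open(log_path, encoding="utf-8") as fh:
    logged = fh.read()
```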

Script is always downloading images

I don't know why but for all downloads, it's downloading images.

      / __/__ _/ _/__ _____(_)
     _\ \/ _ `/ _/ _ `/ __/ /
    /___/\_,_/_/ \_,_/_/ /_/
      / _ )___  ___  / /__ ___
     / _  / _ \/ _ \/  '_/(_-<
    /____/\___/\___/_/\_\/___/

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[-] Logging into Safari Books Online...
[*] Retrieving book info...
[-] Title: Computational Thinking - A beginner's guide to problem-solving and programming
[-] Authors: Karl Beecher
[-] Identifier: 9781780173641
[-] ISBN: 9781780173641
[-] Publishers: BCS Learning & Development Limited
[-] Rights: Copyright ©BCS Learning & Development Limited
[-] Description: Computational thinking (CT) is a timeless, transferable skill that enables you to think more clearly and logically, as well as a way to solve specific problems. With this book you'll learn to apply computational thinking in the context of software development to give you a head start on the road to becoming an experienced and effective programmer.  Beginning with the core ideas of computational thinking, with this book you'll build up an understanding of the practical problem-solving approach an...
[-] Release Date: 2017-08-31
[-] URL: https://www.safaribooksonline.com/library/view/computational-thinking-/9781780173641/
[*] Retrieving book chapters...
[*] Output directory:
    /Users/rigo.sarmiento/safaribooks/Books/Computational Thinking - A beginner_s guide to problem-solving and programming (9781780173641)
[-] Downloading book contents... (34 chapters)
    [###############################################################################################################################################################################################################] 100%
[-] Downloading book CSSs... (4 files)
    [###############################################################################################################################################################################################################] 100%
[-] Downloading book images... (343 files)
    [######################################################################-----------------------------------------------------------------------------------------------------------------------------------------]  34%

Here's my command:

python3 safaribooks.py --no-kindle --cred "account:password" <book>

unable to download

Traceback (most recent call last):
  File "safaribooks.py", line 1023, in <module>
    SafariBooks(args_parsed)
  File "safaribooks.py", line 269, in __init__
    self.display = Display("info_%s.log" % escape(args.bookid))
  File "safaribooks.py", line 38, in __init__
    logs_handler = logging.FileHandler(filename=self.log_file)
  File "C:\Users\Arshad\AppData\Local\Programs\Python\Python37-32\lib\logging\__init__.py", line 1092, in __init__
    StreamHandler.__init__(self, self._open())
  File "C:\Users\Arshad\AppData\Local\Programs\Python\Python37-32\lib\logging\__init__.py", line 1121, in _open
    return open(self.baseFilename, self.mode, encoding=self.encoding)
OSError: [Errno 22] Invalid argument: 'C:\Users\Arshad\Downloads\Compressed\safaribooks\info_https:\www.safaribooksonline.com\library\view\gamification-with-moodle\9781782173076.log'
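
The log file name in the error contains a full book URL, which suggests the URL was passed where the book ID is expected. A minimal sketch of accepting either form by extracting the trailing run of digits; `extract_book_id` is a hypothetical helper, and it assumes that IDs consist of digits only.

```python
import re

# The ID is either the whole value or the last path segment of a URL.
_BOOK_ID = re.compile(r"(?:^|/)(\d+)/?$")


def extract_book_id(value):
    """Return the numeric book ID from a plain ID or from a book URL."""
    match = _BOOK_ID.search(value.strip())
    if match is None:
        raise ValueError("no book ID found in %r" % value)
    return match.group(1)


from_url = extract_book_id(
    "https://www.safaribooksonline.com/library/view/gamification-with-moodle/9781782173076"
)
from_id = extract_book_id("9781491958698")
```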

Missing image at title page

I find that an image, titlepage_footer_ebook.png, is missing from the title page. It seems that the original URL is https://www.safaribooksonline.com/library/view/XXXXXXXXXX/YYYYYYYYYY/css_assets/titlepage_footer_ebook.png.

XXXXXXXXXX is the book title and YYYYYYYYYY is the ID.

Since I am a beginner in Python and web technology, I have no idea how to fix it.

The following image is missing from the title page.

API: Out-of-Session (Authentication credentials were not provided.).

[13/Dec/2018 21:32:13] ** Welcome to SafariBooks! **
[13/Dec/2018 21:32:13] b'Logging into Safari Books Online...'
[13/Dec/2018 21:32:16] b'Retrieving book info...'
[13/Dec/2018 21:32:17] b'API: Out-of-Session test (Authentication credentials were not provided.).\n'
[13/Dec/2018 21:32:17] b'Last request done:\n\tURL: https://www.safaribooksonline.com/api/v1/book/9781260132359/\n\tDATA: None\n\tOTHERS: {}\n\n\t401\n\tServer: nginx/1.10.3 (Ubuntu)\n\tContent-Type: application/json\n\tx-xss-protection: 1; mode=block\n\tAllow: GET, HEAD, OPTIONS\n\tETag: "d5972c9c85159c7a7494e2554c26def1"\n\tContent-Language: en-US\n\tX-Frame-Options: SAMEORIGIN\n\tstrict-transport-security: max-age=3600; includeSubDomains\n\tWWW-Authenticate: Bearer realm="api"\n\tx-content-type-options: nosniff\n\tSet-Cookie: logged_in=; expires=Thu, 01-Jan-1970 00:00:00 GMT; Max-Age=0; Path=/, sessionid=81rh7rek0s66jzqdoh2de0zmloafslo0; expires=Thu, 27-Dec-2018 13:32:16 GMT; Max-Age=1209600; Path=/; secure, api_key=; expires=Thu, 01-Jan-1970 00:00:00 GMT; Max-Age=0; Path=/\n\tAccept-Ranges: bytes, bytes\n\tContent-Length: 58\n\tDate: Thu, 13 Dec 2018 13:32:16 GMT\n\tVia: 1.1 varnish\n\tConnection: keep-alive\n\tX-Client-IP: 123.112.251.2\n\tX-Served-By: cache-tyo19925-TYO\n\tX-Cache: MISS\n\tX-Cache-Hits: 0\n\tX-Timer: S1544707936.393940,VS0,VE482\n\tVary: Accept, Accept-Language, Authorization, Cookie, Origin\n\n{"detail":"Authentication credentials were not provided."}\n'
