
documentdownloader's Introduction

Document Downloader


Downloads PDF documents from book118.

Approach

  1. Crawl the document page to collect the image links
  2. Download the images
  3. Merge the images into a PDF file
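
Steps 2 and 3 above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names, the cache file naming scheme, and the injected `fetch` callable are all hypothetical, and step 1 (collecting the image links) is left out because it depends on the site's page structure. The project lists reportlab among its dependencies for PDF generation; this sketch swaps in Pillow's own multi-page PDF writer to stay short.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def download_images(urls: List[str], cache_dir: str,
                    fetch: Callable[[str], bytes],
                    threads: int = 4, force: bool = False) -> List[str]:
    """Save every image to cache_dir and return the file paths in page order.

    `fetch` is a hypothetical callable that returns the bytes behind a URL,
    for example: lambda url: requests.get(url).content
    """
    os.makedirs(cache_dir, exist_ok=True)

    def save(item):
        index, url = item
        # Hypothetical naming scheme: zero-padded page index.
        path = os.path.join(cache_dir, f"{index:05d}.img")
        # Reuse the cached file unless a fresh download is forced.
        if force or not os.path.exists(path):
            with open(path, "wb") as fh:
                fh.write(fetch(url))
        return path

    with ThreadPoolExecutor(max_workers=threads) as pool:
        # map() yields results in input order, so pages stay in sequence.
        return list(pool.map(save, enumerate(urls)))


def images_to_pdf(paths: List[str], output: str) -> None:
    """Merge the images into one PDF, one page per image (needs Pillow)."""
    from PIL import Image  # imported lazily so the downloader works without it

    if not paths:
        raise ValueError("no images to merge")
    pages = [Image.open(p).convert("RGB") for p in paths]
    # Each page keeps its own pixel size.
    pages[0].save(output, save_all=True, append_images=pages[1:])
```

Passing the fetch function in as an argument keeps the threading and caching logic independent of the HTTP client, so it can be exercised without network access.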

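Parameters

Option          Description
-h, --help      Show help
-i, --id        ID (or page URL) of the document to download
-o, --output    Output file name; defaults to book118.pdf
-p, --proxy     Proxy address to use (defaults to the values of the HTTP_PROXY and HTTPS_PROXY environment variables); pass -p '' to force a direct connection
-f, --force     Force a fresh download, ignoring the cache
-t, --thread    Number of threads to use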
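
For illustration, the options above could be declared with `argparse` along these lines. This is a sketch under assumptions, not the project's actual parser: `build_parser` and `resolve_proxies` are hypothetical names, the default thread count is made up, and `-i` is left optional here because this page does not state which options are required.

```python
import argparse
from typing import Dict, Mapping, Optional


def build_parser() -> argparse.ArgumentParser:
    # argparse adds -h/--help automatically.
    parser = argparse.ArgumentParser(prog="documentDownloader")
    parser.add_argument("-i", "--id",
                        help="ID (or page URL) of the document to download")
    parser.add_argument("-o", "--output", default="book118.pdf",
                        help="output file name")
    # None means "not given", which is distinct from the empty string.
    parser.add_argument("-p", "--proxy", default=None,
                        help="proxy address; '' forces a direct connection")
    parser.add_argument("-f", "--force", action="store_true",
                        help="re-download even if cached files exist")
    # The default of 10 is an assumption, not taken from the project.
    parser.add_argument("-t", "--thread", type=int, default=10,
                        help="number of download threads")
    return parser


def resolve_proxies(proxy: Optional[str],
                    environ: Mapping[str, str]) -> Dict[str, str]:
    """Return a scheme -> proxy URL mapping following the -p rules above."""
    if proxy is None:
        # Option omitted: fall back to the environment variables.
        found = {"http": environ.get("HTTP_PROXY"),
                 "https": environ.get("HTTPS_PROXY")}
        return {scheme: url for scheme, url in found.items() if url}
    if proxy == "":
        # -p '' was given: no proxy at all.
        return {}
    return {"http": proxy, "https": proxy}
```

In real use, `os.environ` would be passed as `environ`. Keeping `None` and `''` distinct is what lets `-p ''` override a proxy configured in the environment; the HTTP client must then also be told not to read those variables itself.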

Usage

Using the package published on PyPI

python3 -m pip install documentDownloader

Once installed, the documentDownloader command is available directly.

For example: documentDownloader -i https://max.book118.com/html/2020/0109/5301014320002213.shtm -o '单身人群专题研究报告-2019.pdf' -p http://127.0.0.1:1080 -f -t 20

Running main.py directly from source

Clone this project, or download a version from the releases page.

  1. Install Python 3
  2. Install the dependencies (Pillow, reportlab, requests): python -m pip install -r requirements.txt
  3. Run it with python3 main.py

For example: python main.py -i https://max.book118.com/html/2020/0109/5301014320002213.shtm -o '单身人群专题研究报告-2019.pdf' -p http://127.0.0.1:1080 -f -t 20

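For learning about web scraping and related topics only. Please support legitimately published books.
That said, many of the PDFs on book118 are probably pirated as well.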

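Contributions

Changelog

  • 2019-01-29: Updated the affected code after the Book118 website changed. @JodeZer
  • 2020-01-09: Refactored the code; added multithreaded downloading, proxy support, building the PDF directly from an existing cache, and automatic detection of image size when generating the PDF. @OhYee
  • 2020-05-25: Published to PyPI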
