layout__part3's Introduction

Parse info and download books from tululu.org

The script lets you parse book information and download books and their covers from tululu.org.

Setup

  1. Create venv
python3 -m venv venv
  2. Activate venv
source venv/bin/activate
  3. Install dependencies
pip install -r requirements.txt
  4. Run the script
python3 main.py

Parameters

The script accepts parameters that set a range of IDs for the script to work through. Running it this way:

python3 main.py

will cause it to run with the default parameters, namely 1 and 10. This means that the books with IDs 1 through 10 will be checked and downloaded.
Similarly, if it is run like this:

python3 main.py --start_id 10 --end_id 20

the script will iterate through the range of IDs from 10 to 20, inclusive.
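The ID-range behaviour described above can be sketched with argparse. This is a minimal illustration, not the code from main.py: the function name parse_id_range is hypothetical, and only the flag names, the defaults (1 and 10) and the inclusive upper bound are taken from the text.

```python
import argparse


def parse_id_range(argv=None):
    """Return the inclusive range of book IDs selected on the command line.

    Hypothetical helper: only the flag names and defaults come from the README.
    """
    parser = argparse.ArgumentParser(description="Download books by ID range")
    parser.add_argument("--start_id", type=int, default=1, help="first book ID")
    parser.add_argument("--end_id", type=int, default=10, help="last book ID (inclusive)")
    args = parser.parse_args(argv)
    # range() excludes its stop value, so add 1 to make end_id inclusive.
    return range(args.start_id, args.end_id + 1)
```

With this sketch, parse_id_range(["--start_id", "10", "--end_id", "20"]) covers IDs 10 through 20, and parse_id_range([]) falls back to IDs 1 through 10.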

Parsing the sci-fi category

The repository has another script which allows you to download sci-fi books and their covers. It also has a set of acceptable parameters. The script's main task is to iterate through pages of the category, parsing each book on every page. It retrieves the link to download the corresponding text file and the associated image for each book.

python3 parse_tululu_category.py

Please consider using parameters to customize the script's behavior.

  • --start_page - choose a value from 1 to 701. If no value is provided, the default starting page for iteration will be 1.
  • --end_page - choose a value from 1 to 701. If no value is provided, the default end page for iteration will be 701.
  • --dest_folder /Users/username - allows you to specify the directory where you want your results to be saved. By default, the script will use the current directory for saving the results.
  • --skip_imgs - if specified, the script will skip the image downloading process.
  • --skip_txt - if specified, the script will skip downloading the txt files.
  • --json_path /Users/username/dev - allows you to independently set the file path for the JSON description output.
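The flags listed above could be declared roughly as follows. This is a sketch under assumptions rather than the actual contents of parse_tululu_category.py: the function name build_category_parser is hypothetical, and the None default for --json_path is an assumption, since the text does not state what happens when that flag is omitted.

```python
import argparse
from pathlib import Path


def build_category_parser():
    """Build a parser mirroring the flags described in the README (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Download sci-fi books by category page")
    parser.add_argument("--start_page", type=int, default=1, help="first page (1 to 701)")
    parser.add_argument("--end_page", type=int, default=701, help="last page (1 to 701)")
    parser.add_argument("--dest_folder", type=Path, default=Path("."),
                        help="directory for results (default: current directory)")
    parser.add_argument("--skip_imgs", action="store_true", help="do not download covers")
    parser.add_argument("--skip_txt", action="store_true", help="do not download txt files")
    # Assumption: no default is documented for --json_path, so None is used here.
    parser.add_argument("--json_path", type=Path, default=None,
                        help="where to write the JSON description")
    return parser
```

For example, build_category_parser().parse_args(["--start_page", "5", "--skip_imgs"]) yields start_page 5, end_page 701 and skip_imgs set, with everything else left at its default.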

Rendering a website

There is a script named render_website.py in the repository. It builds a website from the sources downloaded in the previous step.
Here are the steps:

  1. Run the script. It creates pages at /pages/ and then keeps serving them until it is stopped, so they are available at http://127.0.0.1:5500. To change the description file the pages are built from, add the --description_path flag and specify the file's location; if the flag is omitted, books_description.json is used.
  2. Go to http://127.0.0.1:5500/pages/index1.html

A deployed example of the finished site is available at https://frqhero.github.io/layout__part3/pages/index1.html
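The page names index1.html, index2.html and so on suggest that the book descriptions are split into numbered pages. A minimal sketch of that splitting is shown here. It is not the code from render_website.py: the function names, the page size of 10 and the assumption that the description file holds a JSON list are all illustrative.

```python
import json
from pathlib import Path


def load_books(description_path="books_description.json"):
    """Read the description file; assumes it contains a JSON list of books."""
    return json.loads(Path(description_path).read_text(encoding="utf-8"))


def paginate(books, books_per_page=10):
    """Map page file names (index1.html, index2.html, ...) to slices of books.

    books_per_page=10 is an assumed value, not one taken from the project.
    """
    pages = {}
    for start in range(0, len(books), books_per_page):
        page_number = start // books_per_page + 1  # pages are numbered from 1
        pages[f"index{page_number}.html"] = books[start:start + books_per_page]
    return pages
```

Each entry of the returned dictionary could then be passed to whatever templating step writes the file into the pages directory.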

Offline usage

It is worth noting that the project can be used offline in two ways. After the pages are created, you can either open the files directly from the 'pages' directory or, if the script is running and serving, access them at http://127.0.0.1:5500/pages/index1.html.
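If the rendering script is not running, the generated files can also be served locally with Python's standard library. The sketch below uses http.server as a stand-in. It is not necessarily the server render_website.py itself uses, and the function name make_server is hypothetical. The host and port match the address mentioned above.

```python
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def make_server(directory=".", host="127.0.0.1", port=5500):
    """Create (but do not start) a static file server rooted at `directory`."""
    # The directory argument tells the handler which folder to serve files from.
    handler = partial(SimpleHTTPRequestHandler, directory=directory)
    return ThreadingHTTPServer((host, port), handler)


# To serve the project root until interrupted with Ctrl+C:
# make_server().serve_forever()
```

Serving the project root rather than the pages directory keeps the /pages/index1.html path the same as in the address above.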
