PH_miner

Installation

Execute the following commands:

$ git clone https://github.com/collab-uniba/PH_miner.git
$ cd PH_miner
$ git submodule init
$ git submodule update

Setup

Register two apps, PH_miner and PH_updater, using the Product Hunt API dashboard.
For the first app, create the file credentials_miner.yml in the root folder of the repository with the following structure:

api:
  key: CLIENT_KEY
  secret: CLIENT_SECRET
  redirect_uri: APP_REDIRECT_URI
  dev_token: DEVELOPER_TOKEN

For the second app, follow the same steps as above to create the file credentials_updater.yml.
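Before the first run, it can help to check that both credential files contain every entry listed above. The following is a minimal sketch, assuming the file has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`); the function name `missing_credentials` is hypothetical and is not part of PH_miner.

```python
from typing import Any, Dict, List

# Entries expected under the top-level "api" section of credentials_miner.yml
# and credentials_updater.yml, as listed in this README.
REQUIRED_API_KEYS = ("key", "secret", "redirect_uri", "dev_token")


def missing_credentials(cfg: Dict[str, Any]) -> List[str]:
    """Return the dotted names of required entries that are absent or empty."""
    api = cfg.get("api")
    if not isinstance(api, dict):
        # No usable "api" section at all: every entry is missing.
        return [f"api.{k}" for k in REQUIRED_API_KEYS]
    return [f"api.{k}" for k in REQUIRED_API_KEYS if not api.get(k)]


# With PyYAML installed, cfg would come from:
#     with open("credentials_miner.yml") as f:
#         cfg = yaml.safe_load(f)
example = {"api": {"key": "abc", "secret": "def", "redirect_uri": "", "dev_token": "xyz"}}
print(missing_credentials(example))  # ['api.redirect_uri']
```

Running the same check against credentials_updater.yml catches a half-filled second file before cron starts invoking the updater.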
Create the folder db/cfg/, then create in it the file dbsetup.yml to set up the connection to the MySQL database:

mysql:
    host: 127.0.0.1
    user: root
    passwd: DB_PASSWORD
    db: producthunt
    recycle: 3600
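The `recycle` entry matches the `pool_recycle` parameter of SQLAlchemy's `create_engine`, which suggests the settings end up in a SQLAlchemy engine. The sketch below shows how such a section could map onto a connection URL and engine arguments; that mapping, the `pymysql` driver name, and the helper `engine_args` are all assumptions for illustration, not PH_miner's actual code.

```python
from typing import Any, Dict, Tuple
from urllib.parse import quote_plus


def engine_args(mysql: Dict[str, Any], driver: str = "pymysql") -> Tuple[str, Dict[str, Any]]:
    """Turn the `mysql` section of dbsetup.yml into a SQLAlchemy URL and engine kwargs.

    The driver name is an assumption; use whichever MySQL DBAPI package is installed.
    """
    url = "mysql+{driver}://{user}:{pw}@{host}/{db}".format(
        driver=driver,
        # Percent-encode credentials so characters like '@' or ':' do not break the URL.
        user=quote_plus(str(mysql["user"])),
        pw=quote_plus(str(mysql["passwd"])),
        host=mysql["host"],
        db=mysql["db"],
    )
    # "recycle" is the number of seconds after which pooled connections are replaced.
    return url, {"pool_recycle": int(mysql["recycle"])}


url, kwargs = engine_args({"host": "127.0.0.1", "user": "root", "passwd": "p@ss",
                           "db": "producthunt", "recycle": 3600})
print(url)     # mysql+pymysql://root:p%40ss@127.0.0.1/producthunt
print(kwargs)  # {'pool_recycle': 3600}
# With SQLAlchemy installed: engine = sqlalchemy.create_engine(url, **kwargs)
```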

NOTE: With MySQL, the default recycle value (passed as pool_recycle, the interval in seconds after which database connections are reset) is fine, because the server's wait_timeout defaults to 28800 seconds. With MariaDB, however, wait_timeout is set by default to 600 seconds: edit the my.cnf file and set wait_timeout to a value larger than the one chosen for pool_recycle.
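The rule in the note boils down to a single comparison: pooled connections must be recycled before the server closes them as idle. A trivial sketch of that check follows; the function name is hypothetical, and on a live server the actual value can be read with the SQL statement `SHOW VARIABLES LIKE 'wait_timeout';`.

```python
def recycle_is_safe(pool_recycle: int, wait_timeout: int) -> bool:
    """True if pooled connections are recycled before the server drops them as idle."""
    return pool_recycle < wait_timeout


# recycle: 3600 from dbsetup.yml against the two wait_timeout values in the note.
print(recycle_is_safe(3600, 28800))  # True  (MySQL default)
print(recycle_is_safe(3600, 600))    # False (MariaDB value quoted in the note)
```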

Install packages via pip:

$ pip install -r requirements.txt

Enable execution via crontab:

$ crontab -e

Add the following lines, replacing /path/.../to/ with the actual path of your PH_miner clone.

SHELL=/bin/bash
# New products are published at 12:01 a.m. PST (just past midnight, i.e., 9:01 a.m. CET):
# minute hour day-of-month month day-of-week command
    35     8       *          *       *       /path/.../to/PH_miner/cronjob.sh >> /var/log/ph_miner.log 2>&1
    05    20       *          *       *       /path/.../to/PH_miner/cronjob.sh --update -c credentials_updater.yml >> /var/log/ph_miner_updates.log 2>&1
    */30   *       *          *       *       /path/.../to/PH_miner/cronjob.sh --newest -c credentials_updater.yml >> /var/log/ph_miner.log 2>&1
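The schedule above depends on converting the Product Hunt release time into the timezone of the machine running cron. As a sanity check, this sketch reproduces the 12:01 a.m. PST to 9:01 a.m. CET conversion with fixed UTC offsets; it deliberately ignores daylight saving time (whose start and end dates differ between the US and Europe), and `release_time_in` is a hypothetical helper, not part of PH_miner.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets; these ignore daylight saving time (PDT is UTC-7, CEST is UTC+2).
PST = timezone(timedelta(hours=-8), "PST")
CET = timezone(timedelta(hours=1), "CET")


def release_time_in(tz: timezone, day: datetime) -> datetime:
    """Return the 12:01 a.m. PST release time of the given date, converted to tz."""
    release = datetime(day.year, day.month, day.day, 0, 1, tzinfo=PST)
    return release.astimezone(tz)


local = release_time_in(CET, datetime(2024, 1, 15))
print(local.strftime("%Y-%m-%d %H:%M %Z"))  # 2024-01-15 09:01 CET
```

Passing `timezone.utc` instead of `CET` gives 08:01, which is the hour to schedule around on a server whose clock runs in UTC.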

Enable the rotation of the log files:

$ sudo ln -s /fullpath/to/../ph_miner.logrotate /etc/logrotate.d/ph_miner

Install the Chromium browser and ChromeDriver

This step depends on the OS. On Ubuntu boxes, run:

$ sudo apt-get install chromium-browser chromium-chromedriver
$ sudo ln -s /usr/lib/chromium-browser/chromedriver /usr/bin/chromedriver
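To confirm that the symlink created above resolves before Selenium tries to use it, a minimal stdlib-only check could look like this. It relies only on `chromedriver --version`; the function name `chromedriver_version` is hypothetical and not part of PH_miner.

```python
import shutil
import subprocess
from typing import Optional


def chromedriver_version(binary: str = "chromedriver") -> Optional[str]:
    """Return the output of `<binary> --version`, or None if the binary is not on PATH."""
    path = shutil.which(binary)
    if path is None:
        return None
    result = subprocess.run([path, "--version"], capture_output=True, text=True, timeout=10)
    # An empty stdout (e.g., a broken install) is treated the same as "not found".
    return result.stdout.strip() or None


if __name__ == "__main__":
    print(chromedriver_version() or "chromedriver not found on PATH")
```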

Resources & Libraries

Product Hunt API
ph_py - ProductHunt.com API wrapper in Python
Scrapy - A scraping and web-crawling framework
Selenium - A suite of tools for automating web browsers
ChromeDriver - WebDriver implementation that lets Selenium control the Chromium web browser
Beautiful Soup 4 - HTML parser

License

The project is licensed under the MIT license.
