aristotle's Introduction

Aristotle is a highly customizable tool that collects links from sites.

Using the properties in the config files, it scans all the defined sites and saves each link's metadata (title, description, imageLink, publishDate) to the database.

Usage

config/properties.yaml

The main settings are:

  1. database: Any database with a SQLAlchemy dialect listed at https://docs.sqlalchemy.org/en/13/dialects/ is supported. Enter the connection settings of the database where the links will be stored. For the name property, create a database on the server first and enter its name here.
  2. locale: Enter the locale that matches the language of the sites to be fetched. For example, for English, enter en_EN.
  3. request: General settings for requests (timeout and user agent).
  4. parser: During parsing, title and description strings can optionally be truncated to the character limits given here.

database:
  dialect: mysql+pymysql
  url: localhost
  port: 3306
  name: aristotle
  userName: root
  password: root

locale: en_EN

request:
  timeout: 3
  userAgent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML) Chrome/23.0.1271.97 Safari/537.11

parser:
  titleCharLimit: 100
  descriptionCharLimit: 300
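
As a rough illustration of how these settings could be consumed, the sketch below builds a SQLAlchemy-style connection URL from the database section and applies the parser limits. The source does not show how Aristotle does this internally, so the helper names build_database_url and trim are hypothetical, and the properties dict simply stands in for what yaml.safe_load() would return for the sample file.

```python
from urllib.parse import quote_plus


def build_database_url(db):
    """Assemble a URL of the form dialect://user:password@host:port/name.

    Hypothetical helper: the keys match the sample properties.yaml,
    but this is not taken from Aristotle's own code.
    """
    # Escape credentials so characters like '@' or ':' cannot break the URL.
    user = quote_plus(str(db["userName"]))
    password = quote_plus(str(db["password"]))
    return (
        f"{db['dialect']}://{user}:{password}"
        f"@{db['url']}:{db['port']}/{db['name']}"
    )


def trim(text, char_limit):
    """Cut text to at most char_limit characters (hypothetical helper)."""
    return text[:char_limit]


# Same values as the sample file, written as the dict that
# yaml.safe_load() would produce for it.
properties = {
    "database": {
        "dialect": "mysql+pymysql",
        "url": "localhost",
        "port": 3306,
        "name": "aristotle",
        "userName": "root",
        "password": "root",
    },
    "parser": {"titleCharLimit": 100, "descriptionCharLimit": 300},
}

database_url = build_database_url(properties["database"])
print(database_url)  # mysql+pymysql://root:root@localhost:3306/aristotle

long_title = "A" * 150
short_title = trim(long_title, properties["parser"]["titleCharLimit"])
print(len(short_title))  # 100
```

The resulting string is in the format that sqlalchemy.create_engine() accepts, so it could be passed to it directly once the matching driver is installed.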

config/sources-{locale}.yaml

article:
  - domain: cnn.com
    active: true
    link: https://edition.cnn.com/
    filterForLink:
      mandatoryWords: ["/politics/"]
      permissibleWords: []
      impermissibleWords: []
    tagForMetadata:
      title:
      description:
      image:
      publishDate:
      publishDateFormat: "%Y-%m-%d"

technology:
  - domain: mashable.com
    active: true
    link: https://mashable.com
    filterForLink:
      mandatoryWords: ["-"]
      permissibleWords: ['/article/']
      impermissibleWords: []
    tagForMetadata:
      title:
      description:
      image:
      publishDate: datetime
      publishDateFormat: "%d.%m.%Y"
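
The README does not spell out how the filterForLink lists are evaluated, so the sketch below is only one plausible reading, stated as an assumption: every word in mandatoryWords must appear in the link, at least one word in permissibleWords must appear when that list is non-empty, and no word in impermissibleWords may appear. The function names and the example URLs are hypothetical. The date helper shows how a publishDateFormat value maps onto Python's datetime.strptime.

```python
from datetime import datetime


def link_passes_filter(link, link_filter):
    """Apply the ASSUMED semantics of filterForLink to one URL."""
    mandatory = link_filter.get("mandatoryWords") or []
    permissible = link_filter.get("permissibleWords") or []
    impermissible = link_filter.get("impermissibleWords") or []

    # Assumption: all mandatory words must be present.
    if not all(word in link for word in mandatory):
        return False
    # Assumption: if any permissible words are listed, one must be present.
    if permissible and not any(word in link for word in permissible):
        return False
    # Assumption: a single impermissible word rejects the link.
    return not any(word in link for word in impermissible)


def parse_publish_date(raw, date_format):
    """Parse a date string using the source's publishDateFormat."""
    return datetime.strptime(raw, date_format)


# Filters copied from the two sample sources above.
cnn_filter = {
    "mandatoryWords": ["/politics/"],
    "permissibleWords": [],
    "impermissibleWords": [],
}
mashable_filter = {
    "mandatoryWords": ["-"],
    "permissibleWords": ["/article/"],
    "impermissibleWords": [],
}

# Hypothetical URLs, used only to exercise the filters.
print(link_passes_filter("https://edition.cnn.com/2020/07/01/politics/example-story/index.html", cnn_filter))
print(link_passes_filter("https://edition.cnn.com/2020/07/01/sport/example-story/index.html", cnn_filter))
print(link_passes_filter("https://mashable.com/article/example-story", mashable_filter))
print(link_passes_filter("https://mashable.com/video/example-story", mashable_filter))

print(parse_publish_date("2020-07-01", "%Y-%m-%d").date())
print(parse_publish_date("01.07.2020", "%d.%m.%Y").date())
```

Under these assumptions the first and third links are accepted and the second and fourth are rejected, and both date strings parse to the same day.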

Development

If you'd like to contribute to the project, feel free to clone a development version of this repository locally:

git clone https://github.com/egcodes/aristotle.git

Once you have a copy of the source, you can embed it in your Python package, or install it into your site-packages easily:

$ pip3 install -r requirements.txt
$ python3 setup.py install

Requirements

  • Python 3.x
    • beautifulsoup4>=4.9.1
    • requests>=2.24.0
    • PyYAML>=5.3.1
    • SQLAlchemy>=1.3.18

You must also install the driver package for the database dialect you use. For example, if you are using MySQL with the mysql+pymysql dialect, the PyMySQL package must be installed.
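
A quick way to check this before running a scan might look like the sketch below. It is not part of Aristotle; the function names are hypothetical, and it assumes that the driver part of the dialect string (the text after the +) is also the name of the importable module, which holds for pymysql but not for every driver.

```python
import importlib.util


def driver_module_name(dialect):
    """Return the driver part of 'dialect+driver', or None if absent."""
    _, _, driver = dialect.partition("+")
    return driver or None


def missing_driver(dialect):
    """Return the driver module name if it cannot be imported, else None.

    Assumption: the driver name in the dialect string equals the
    importable module name (true for 'pymysql').
    """
    module_name = driver_module_name(dialect)
    if module_name is None:
        return None  # no explicit driver named, nothing to check here
    if importlib.util.find_spec(module_name) is None:
        return module_name
    return None


missing = missing_driver("mysql+pymysql")
if missing:
    print(f"Driver module '{missing}' is not installed")
else:
    print("Driver module found")
```

Which message is printed depends on whether PyMySQL is installed in the current environment.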

aristotle's People

Contributors

egcodes

aristotle's Issues

https://www.haberbus.com/

This site can't be reached. www.haberbus.com refused to connect.
Try the following:

Checking your connection
Checking the proxy and the firewall
ERR_CONNECTION_REFUSED

Is the system up to date?

Hello, first of all I would like to say that this is a successful project. As for my question, I would like to know whether the system is up to date.

When I try to run a scan, I get a similar error for every site.

Error output:

teknoloji webrazzi.com

  • Starting [2016-06-23 15:20:35]
    # Source: (1 / 50) http://www.webrazzi.com
    [2016-06-23 15:20:36]
    Main.run: http://www.webrazzi.com:ServerDatabaseHandler instance has no attribute 'cursor'
    Traceback (most recent call last):
    File "mainParseSources.py", line 1164, in run
    newsLinkDict.update(self.getNewsLinkFromSource(present, category, source, newsSourceLink, newsSourceDiffWords, newsSourceBlackWords))
    File "mainParseSources.py", line 365, in getNewsLinkFromSource
    ) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=1 ;""" % (self.yearMonth))
    File "C:\Users\whoami\Desktop\haber\ServerDatabaseHandler.py", line 25, in executeQuery
    self.cursor.execute(query)
    AttributeError: ServerDatabaseHandler instance has no attribute 'cursor'

    # Time: 0:00:01
    # Insert To Database
    # ---------------------
    
  • 2016-06-23 15:20:36.941000: Finished [0:00:01.086000]


Time: 0:00:00

    # Source: (50 / 50) http://amkspor.sozcu.com.tr

[2016-06-23 15:16:29]
Main.run: http://amkspor.sozcu.com.tr:ServerDatabaseHandler instance has no attribute 'cursor'
Traceback (most recent call last):
File "mainParseSources.py", line 1164, in run
newsLinkDict.update(self.getNewsLinkFromSource(present, category, source, newsSourceLink, newsSourceDiffWords, newsSourceBlackWords))
File "mainParseSources.py", line 365, in getNewsLinkFromSource
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=1 ;""" % (self.yearMonth))
File "C:\Users\whoami\Desktop\haber\ServerDatabaseHandler.py", line 25, in executeQuery
self.cursor.execute(query)
AttributeError: ServerDatabaseHandler instance has no attribute 'cursor'


Time: 0:00:00

    # Source: (51 / 50) http://www.sporx.com/?giris=ok

[2016-06-23 15:16:30]
Main.run: http://www.sporx.com/?giris=ok:ServerDatabaseHandler instance has no attribute 'cursor'
Traceback (most recent call last):
File "mainParseSources.py", line 1164, in run
newsLinkDict.update(self.getNewsLinkFromSource(present, category, source, newsSourceLink, newsSourceDiffWords, newsSourceBlackWords))
File "mainParseSources.py", line 365, in getNewsLinkFromSource
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=1 ;""" % (self.yearMonth))
File "C:\Users\whoami\Desktop\haber\ServerDatabaseHandler.py", line 25, in executeQuery
self.cursor.execute(query)
AttributeError: ServerDatabaseHandler instance has no attribute 'cursor'

Error: Missing parentheses in call to 'print'

Hello, first of all congratulations on this successful project. I am still a bit of a beginner in Python, so maybe that is the reason, or maybe it is something else, but even though I have installed all the required libraries and made the necessary database settings, I get the following error when I run the example:

File "mainParseSources.py", line 211
    print url
            ^
SyntaxError: Missing parentheses in call to 'print'

I would be very grateful if you could help.
