Hacker News Front Page Trends

A Ruby on Rails app that stores Hacker News items that have appeared on the front page, and exposes a few JSON API endpoints that let users search for terms, domains, and users to see how popular they have been on the HN front page over time.

Click here for a live dashboard that uses this API

Caveat

HN only provides the exact list of front page items for dates since 11/11/2014, so anything before then is an estimate. For earlier dates, I used a heuristic: sort items by score and take the top 115 on weekdays or the top 80 on weekends, subject to a minimum of 3 points. This heuristic is imperfect in several ways:

  • it excludes job posts before 11/11/2014 since they always have 1 point
  • items with high scores don’t always get to the front page
  • it’s possible that HN has changed its algorithm over time to promote faster or slower front page turnover

Still, it should be a decent approximation, and the code could be modified to use other heuristics. Fetching all job posts from before 11/11/2014 via the HN API would probably also be an improvement.
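
As a rough illustration of the heuristic described above, here is a minimal sketch in plain Ruby. It is not the code from this repo: the `FrontPageItem` struct, the constant names, and the `estimated_front_page` method are hypothetical, and it assumes the items passed in are the candidates for a single date.

```ruby
require "date"

# Hypothetical stand-in for a stored HN item; only the score matters here.
FrontPageItem = Struct.new(:id, :score)

WEEKDAY_LIMIT = 115 # top N items kept on weekdays
WEEKEND_LIMIT = 80  # top N items kept on weekends
MIN_SCORE = 3       # items below this score are never kept

# Estimate the front page for one date: drop low-scoring items,
# sort by score descending, and keep the top N for that day type.
def estimated_front_page(items, date)
  limit = (date.saturday? || date.sunday?) ? WEEKEND_LIMIT : WEEKDAY_LIMIT

  items
    .select { |item| item.score >= MIN_SCORE }
    .sort_by { |item| -item.score }
    .first(limit)
end
```

Because job posts always have 1 point, the `MIN_SCORE` filter in this sketch drops them, which matches the first limitation listed above.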

Structure

There are 3 files of interest:

  1. app/lib/hn_client.rb - code to collect front page data via the HN website and API
  2. app/models/hn_item.rb - code that uses the HnClient to store the appropriate records in a PostgreSQL database
  3. app/lib/hn_trends_calculator.rb - code to calculate trends over time and top items for given search terms. The trends endpoint returns 4 metrics for each term/date:
    1. Fraction of all front page items that match the search term
    2. Number of front page items that match the search term
    3. Fraction of total front page score, i.e. the total score of items matching the search term divided by the total score of all front page items
    4. Total front page score of items that match the search term
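
To make the four metrics concrete, here is a small sketch that computes them for one term and one date. It is an illustration under assumptions, not the calculator's actual code: the `trend_metrics` method, the hash keys, and the use of plain hashes with a `:score` key are all hypothetical, and it assumes metrics 2 and 4 count only the items matching the search term.

```ruby
# all_items:      every front page item for the date
# matching_items: the subset of all_items that matches the search term
def trend_metrics(all_items, matching_items)
  total_count = all_items.size
  total_score = all_items.sum { |item| item[:score] }
  match_count = matching_items.size
  match_score = matching_items.sum { |item| item[:score] }

  {
    # 1. share of front page items that match the term
    fraction_of_items: total_count.zero? ? 0.0 : match_count.fdiv(total_count),
    # 2. number of matching front page items
    number_of_items: match_count,
    # 3. matching score divided by the total score of all front page items
    fraction_of_score: total_score.zero? ? 0.0 : match_score.fdiv(total_score),
    # 4. total score of the matching items
    score: match_score
  }
end
```

For example, if 2 of 4 front page items match and they account for 40 of 200 total points, the sketch reports 0.5 for the item fraction and 0.2 for the score fraction.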

The trends calculator supports searching titles, domains (with or without subdomains), and usernames. When searching by title, there are 3 different search styles:

  1. Web search uses PostgreSQL full text search, in particular the websearch_to_tsquery() function and GIN indexes. By default the tsv column uses the simple text search configuration
  2. Case-insensitive exact title match uses the ~* PostgreSQL regular expression operator, combined with a trigram index
  3. Case-sensitive exact title match is the same as #2, but uses the ~ regex operator instead of ~*
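
The three title search styles could map to SQL conditions roughly as follows. This is a hedged sketch rather than the repo's implementation: the `title` column name, the `TITLE_SEARCH_SQL` constant, the style keys, and the `title_search_condition` method are assumptions, while the `tsv` column, the `simple` configuration, and the operators come from the descriptions above.

```ruby
# SQL fragments for each search style, with ? as a bind placeholder.
TITLE_SEARCH_SQL = {
  # PostgreSQL full text search against the tsv column
  websearch: "tsv @@ websearch_to_tsquery('simple', ?)",
  # case-insensitive regular expression match
  case_insensitive: "title ~* ?",
  # case-sensitive regular expression match
  case_sensitive: "title ~ ?"
}.freeze

# Returns [sql_fragment, bind_value] for the given style.
# The term is passed through unchanged; escaping regex metacharacters
# for the two regex styles is deliberately left out of this sketch.
def title_search_condition(style, term)
  sql = TITLE_SEARCH_SQL.fetch(style) do
    raise ArgumentError, "unknown search style: #{style}"
  end
  [sql, term]
end
```

In a Rails model, the returned pair could be splatted into a query, for example `HnItem.where(*title_search_condition(:websearch, "rust"))`, assuming `HnItem` is an ActiveRecord model with those columns.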

Requirements

Requires PostgreSQL 11 or later, since websearch_to_tsquery() was added in version 11.
