

Reddit-Bot

This project currently looks like a building site; I'm still heavily experimenting with different ideas. Essentially, part 1 is topic clustering/classification, and part 2 is detecting an in-joke or a reference to some piece of Reddit folklore.

===

Finds references made about popular threads

People make references in comments to posts that were either popular recently or are famous/infamous in Reddit history. The first part of this problem is finding the simplest references, i.e. those that can be linked using the title text. After that it will get weird, with references to images and references made using images.

To start off, I am going to cheat by using references people have already pointed out, either in text or by posting that Captain America "I get that reference" gif.

The criterion for success of this project is the ability to identify a thread or comment as a reference to a previous thread, at the very least within a given timeframe such as a week.

The optimistic aim is to classify new posts as either completely original or a reference/repost before they reach the hot/front page or a human user points it out.
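The "within a week" criterion means a new post only needs to be compared against threads created shortly before it. Below is a minimal sketch of that candidate filter. It assumes each stored item is a dict carrying reddit's `created_utc` field, a Unix timestamp in seconds. The one-week default and the function name are illustrative choices, not part of the project.

```python
from datetime import timedelta

# Hypothetical default matching the "such as a week" timeframe above.
WINDOW = timedelta(weeks=1)


def candidate_referees(reference_utc, posts, window=WINDOW):
    """Return posts created within `window` before `reference_utc`.

    `posts` is assumed to be an iterable of dicts with a `created_utc`
    Unix timestamp (seconds). Posts created after the reference are
    excluded, because a reference cannot point forward in time.
    """
    earliest = reference_utc - window.total_seconds()
    return [p for p in posts if earliest <= p["created_utc"] < reference_utc]
```

Narrowing the candidates this way keeps the comparison step cheap. The window can be widened later for "famous/infamous" posts from further back.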

01/10/13:

I'm storing a load of comments in the hope of finding out more about the references people make, i.e. which subreddits these witty people comment in, which sorts of posts references are made on, which sorts of posts references are made about, and what kinds of responses these references get in terms of up/downvotes and replies.

After systematically mining a load of comments, I'm going to get these audit scripts to feed me comments that triggered replies in which people remarked that a reference had been made.
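One way such an audit script could find those comments is to scan replies for text that flags a reference, then report the parent comment. The sketch below makes several assumptions:

- The phrase list is hypothetical.
- Each comment is a plain dict with `id`, `body`, and `parent_id`.
- The parent ids are already bare ids. reddit's own `parent_id` is a fullname prefixed with `t1_` (comment) or `t3_` (submission), so the prefix would need stripping first.

A reply that only posts the gif would be missed unless its link is added as a pattern.

```python
import re

# Hypothetical phrases that signal "a reference was just made".
FLAG_PATTERNS = [
    re.compile(r"\bi (get|got|understood) that reference\b", re.IGNORECASE),
]


def flagged_parents(comments):
    """Yield each parent id whose replies remark on a reference, once.

    Each comment is assumed to be a dict with `id`, `body` and
    `parent_id` (the bare id of the comment replied to, or None).
    """
    seen = set()
    for c in comments:
        parent = c.get("parent_id")
        if not parent or parent in seen:
            continue
        if any(p.search(c["body"]) for p in FLAG_PATTERNS):
            seen.add(parent)
            yield parent
```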

I never knew there were so many ways of storing trees: http://docs.mongodb.org/manual/tutorial/model-tree-structures/

I have no idea which is best at the moment, but the array-of-ancestors model would be useful for reading conversations. Most of the time sibling comments don't interact with each other... that might not matter, though.
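A minimal sketch of the array-of-ancestors model applied to comment threads. It uses plain dicts instead of a live MongoDB collection, and the field names `parent` and `ancestors` follow the MongoDB tutorial's pattern. Each comment stores the ids of every comment above it. Reading a conversation is then a single lookup rather than a walk up the tree.

In MongoDB itself, querying `{"ancestors": some_id}` would match every document whose array contains that id, i.e. the whole subtree below a comment.

```python
def with_ancestors(comment_id, parent_id, by_id):
    """Create and store a comment document in array-of-ancestors form.

    `by_id` maps already-stored comment ids to their documents.
    A top-level comment has `parent_id` None and no ancestors.
    """
    if parent_id is None:
        ancestors = []
    else:
        parent = by_id[parent_id]
        ancestors = parent["ancestors"] + [parent_id]
    doc = {"_id": comment_id, "parent": parent_id, "ancestors": ancestors}
    by_id[comment_id] = doc
    return doc


def conversation(comment_id, by_id):
    """Ids from the top-level comment down to `comment_id`, in order."""
    return by_id[comment_id]["ancestors"] + [comment_id]
```

Siblings never appear in each other's ancestor arrays. That suits the observation that sibling comments rarely interact.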

From comments I am taking:

  • body
  • created_utc
  • ups
  • downs
  • id
  • -- permalink
  • -- score
  • --- children
  • --- submission
  • --- subreddit

From submissions I am taking:

  • author
  • created_utc
  • subreddit.title (the subreddit name)
  • ups
  • downs
  • permalink
  • title
  • selftext
  • url
  • -- direct comments
  • -- score
  • --- all comments

-- I am not sure whether I want to store less data and synthesise things at runtime, such as permalinks to comments (submission permalink + comment id) and overall scores (ups - downs).

--- Things marked next to "---" are things which I am hoping the structure of my objects/db will tell me.
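The two runtime derivations mentioned above are cheap to compute on read. This sketch assumes stored items are dicts with the field names listed above. The trailing-slash handling is a defensive assumption: the submission links in this README end in "/", but a stored permalink might not.

```python
def score(item):
    """Overall score derived at read time instead of being stored."""
    return item["ups"] - item["downs"]


def comment_permalink(submission, comment):
    """Comment permalink = submission permalink + comment id."""
    base = submission["permalink"]
    if not base.endswith("/"):
        base += "/"
    return base + comment["id"]
```

Usage, with the AskReddit comment linked further down in this README:

```python
sub = {"permalink": "http://www.reddit.com/r/AskReddit/comments/1nzfg3/what_is_the_weirdest_thing_money_can_legally_buy/"}
comment_permalink(sub, {"id": "ccnjg1i"})
```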

05/10/13

Thought on classification: the title of a post will be the most important way of identifying whether a reference has been made to it, but the comments could help build up a profile of what the post is really about.
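One way to combine these two sources is a weighted bag of words for each thread. Title words count several times more than comment words, and a candidate reference is scored by cosine similarity against that profile. This is a sketch only: the weight of 3, the tokeniser, and cosine similarity are illustrative assumptions, not choices the project has made.

```python
import math
import re
from collections import Counter

TITLE_WEIGHT = 3  # hypothetical: one title word counts as much as three comment words


def tokens(text):
    """Lower-cased word tokens (letters, digits, apostrophes)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def profile(title, comment_bodies, title_weight=TITLE_WEIGHT):
    """Weighted word counts for a thread: title dominates, comments fill in."""
    prof = Counter()
    for w in tokens(title):
        prof[w] += title_weight
    for body in comment_bodies:
        prof.update(tokens(body))
    return prof


def cosine(a, b):
    """Cosine similarity of two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0
```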

12/10/13

For the audit files, it would be helpful to have a comment id or some way of actually finding the comment in the thread on the site.

Memory is the main bottleneck at the moment.

Some of these subreddits are not helping:

/r/adviceanimals /r/AskReddit /r/aww /r/bestof /r/books /r/earthporn /r/explainlikeimfive /r/funny /r/gaming /r/gifs /r/IAmA /r/movies /r/music /r/news /r/pics /r/science /r/technology /r/television /r/todayilearned /r/videos /r/worldnews /r/wtf

I'm going to narrow things down to: /r/adviceanimals /r/AskReddit /r/funny /r/gifs /r/IAmA /r/pics /r/todayilearned /r/videos /r/wtf
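Applied during mining, the narrowed list could be a simple membership check. Subreddit names are case-insensitive on reddit (/r/wtf and /r/WTF are the same place), so the sketch compares lower-cased names. It assumes each stored item has a `subreddit` string field holding the bare name without "/r/".

```python
# The narrowed list above, lower-cased for case-insensitive matching.
SUBREDDITS = {
    "adviceanimals", "askreddit", "funny", "gifs", "iama",
    "pics", "todayilearned", "videos", "wtf",
}


def keep(item):
    """True if the item's subreddit is on the narrowed list."""
    return item["subreddit"].lower() in SUBREDDITS
```

Dropping unwanted items at collection time, rather than after loading, is what would actually relieve the memory pressure.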

It seems CPU isn't an issue, so I'm going to calculate as much as I can at runtime, e.g. score = ups - downs.

09/11/13

There are two types of reference: a submission reference, which is composed of a combination of title + content + selftext, and a comment reference, which is likely to be just text and therefore probably easier to analyse.

Examples:

submission reference: http://www.reddit.com/r/pics/comments/1q4i4e/i_miss_my_phone/

referring to http://www.reddit.com/r/pics/comments/1q3tfu/i_miss_my_phone/

comment reference: http://www.reddit.com/r/IAmA/comments/1o5ndh/iama_guy_who_went_from_430_pounds_to_170_pounds/ccp40hs

referring to http://www.reddit.com/r/AskReddit/comments/1nzfg3/what_is_the_weirdest_thing_money_can_legally_buy/ccnjg1i

As with references, there are submission and comment referees. It seems comment referees are likely to be crazy stories in a sub like AskReddit, usually top-level comments.
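Each reference/referee pair can be told apart as submission or comment from the link alone. Links shaped like the examples above have one extra path segment, the comment id, after the title slug when they point at a comment. A minimal parsing sketch, assuming only that URL shape; other link forms (e.g. shortened redd.it links) return None.

```python
import re
from typing import NamedTuple, Optional

# Links shaped like the examples above:
#   http://www.reddit.com/r/<subreddit>/comments/<submission id>/<slug>/[<comment id>]
LINK_RE = re.compile(
    r"https?://(?:www\.)?reddit\.com/r/(?P<sub>\w+)/comments/"
    r"(?P<sid>\w+)/[^/]*(?:/(?P<cid>\w+))?/?$"
)


class RedditLink(NamedTuple):
    subreddit: str
    submission_id: str
    comment_id: Optional[str]

    @property
    def kind(self) -> str:
        """'comment' if the link points at a comment, else 'submission'."""
        return "comment" if self.comment_id else "submission"


def parse_link(url: str) -> Optional[RedditLink]:
    """Split a reddit thread/comment URL into its parts, or None if unrecognised."""
    m = LINK_RE.match(url)
    if m is None:
        return None
    return RedditLink(m["sub"], m["sid"], m["cid"])
```

For example, `parse_link("http://www.reddit.com/r/IAmA/comments/1o5ndh/iama_guy_who_went_from_430_pounds_to_170_pounds/ccp40hs").kind` gives `"comment"`.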

23/01/14

Starting with a supervised classifier for a single example problem: is this thread a reference to the "banana for scale" joke or not? A frequency distribution of the words in the thread will be the feature vector.
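The project does not say which classifier will consume the word-frequency vector. Multinomial naive Bayes is a common fit for word counts, so it is swapped in here for illustration, implemented in plain Python with add-one (Laplace) smoothing. The training strings in the usage note are made up; real training data would be threads labelled by hand or by the "I get that reference" replies.

```python
import math
import re
from collections import Counter


def features(text):
    """Word frequency distribution for a thread (lower-cased word counts)."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


class NaiveBayes:
    """Multinomial naive Bayes over word counts with add-one smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(features(doc))
        self.vocab = set().union(*self.word_counts.values())
        self.totals = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, doc):
        feats = features(doc)
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)

        def log_score(c):
            # log prior + sum of smoothed log likelihoods of the observed words
            s = math.log(self.doc_counts[c] / n_docs)
            for w, n in feats.items():
                if w in self.vocab:  # words never seen in training are ignored
                    s += n * math.log((self.word_counts[c][w] + 1) / (self.totals[c] + v))
            return s

        return max(self.classes, key=log_score)
```

Usage with toy data (True = "banana for scale" reference):

```python
docs = ["banana for scale", "needs a banana for scale", "where is the banana",
        "my cat is cute", "look at this sunset", "cute dog photo"]
clf = NaiveBayes().fit(docs, [True, True, True, False, False, False])
clf.predict("no banana for scale?")
```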


Contributors

alexr1993
