
cs224w_dev's Introduction

Create networks of subreddits.

Before using:

Data:

Outputs:

  • python 00_parse_subreddits.py -> graph with approx. 1M nodes representing subreddits. No edges. Node attributes:
    • created_utc (int): UTC timestamp of creation date
    • subscribers (int): # of subscribers to subreddit
    • description (str): Plaintext of subreddit description
    • id (str): Subreddit ID. Prefixing it with t5_ gives the value of the name field.
    • lang (str): Subreddit language. Most subreddits are in English (en).
    • name (str): Subreddit's identifier in the PushShift dataset.
    • public_description (str): Plaintext of brief subreddit description.
    • submit_text (str): Text shown to users about to make a submission.
    • title (str): HTML title tag.
    • url (str): Subreddit name, e.g. /r/politics
    • desc_subreddits (str): Space-separated list of subreddits mentioned in this subreddit's description field.
  • 01_Submissions and Comments to Tables.ipynb uses the output of 00_parse_subreddits.py and the PushShift comment/submission data to write all relevant submission and comment data into tab-separated .txt files. Submission and comment files contain: author; subreddit, post, and comment IDs; upvotes, downvotes, scores, and gold; creation timestamps; and associated text for NLP.
  • 02_Tables to Post-Comment TNEANets.ipynb will use the .txt files to create one TNEANet per subreddit containing nodes for all of the comments and posts captured from that subreddit. Node attributes:
    • score (int): Comment or post score
    • gilded (int): Number of times post or comment received Reddit Gold
    • created_utc (int): Post or comment creation timestamp; seconds after Jan 01 1970 00:00 UTC
    • author (str): Reddit username of post or comment author, in lowercase
    • text (str): Title of post (plaintext) or text of comment (markdown)
    • id (str): Reddit ID of the post (starts with t3_) or comment (starts with t1_)
  • 03_Tables to User-User Graphs.ipynb uses the .txt files to create two TNEANets and two TNGraphs representing comments between users on Reddit. The TNEANets contain one directed edge from comment author to comment recipient (parent commenter or parent poster) for each comment captured in the .txt table. The TNGraphs disallow multi-edges, so if user A has made multiple comments in response to user B, there is only one A->B edge. The _nodelete TNEANet does not create a node for the [deleted] placeholder user, and it omits edges for any comment whose author or recipient is [deleted] or whose text is [deleted] or [removed]. The _nodelete TNGraph also does not create a node for the [deleted] placeholder user, but it does add edges for comments whose text is [deleted] or [removed].
    • TNEANet node attributes:
      • username (str): The user the node represents
    • TNEANet edge attributes:
      • score (int)
      • gilded (int)
      • created_utc (int)
      • comment_id (str)
      • subreddit (str)
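
The difference between the four user-user graphs can be sketched in plain Python, without Snap.py. This is only an illustration of the edge rules described above, not the notebook's code: the Comment row shape, the function names, and the sample data are hypothetical, and the sketch assumes that edges touching the [deleted] user are skipped in both _nodelete variants because that user has no node.

```python
from typing import NamedTuple

DELETED_USER = "[deleted]"
REMOVED_TEXT = {"[deleted]", "[removed]"}


class Comment(NamedTuple):
    """Hypothetical row shape; the real .txt tables have more columns."""
    author: str     # comment author
    recipient: str  # parent commenter or parent poster
    text: str       # comment body


def multi_edges(comments, nodelete=False):
    """TNEANet-like view: one (author, recipient) edge per comment."""
    edges = []
    for c in comments:
        # _nodelete: skip deleted users and deleted/removed comment text.
        if nodelete and (
            DELETED_USER in (c.author, c.recipient) or c.text in REMOVED_TEXT
        ):
            continue
        edges.append((c.author, c.recipient))
    return edges


def simple_edges(comments, nodelete=False):
    """TNGraph-like view: at most one edge per ordered (author, recipient) pair."""
    edges = set()
    for c in comments:
        # _nodelete: skip deleted users only; removed text still counts.
        if nodelete and DELETED_USER in (c.author, c.recipient):
            continue
        edges.add((c.author, c.recipient))
    return edges


comments = [
    Comment("alice", "bob", "first reply"),
    Comment("alice", "bob", "second reply"),
    Comment("bob", "alice", "[removed]"),
    Comment("[deleted]", "alice", "orphaned comment"),
]

print(len(multi_edges(comments)))                 # 4
print(len(simple_edges(comments)))                # 3
print(len(multi_edges(comments, nodelete=True)))  # 2
print(sorted(simple_edges(comments, nodelete=True)))
```

With this sample, the two alice->bob comments collapse to one edge in the simple view, and the bob->alice edge survives in the _nodelete simple view even though the comment text was removed.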
