brainfeed

A tool for domain experts to find recent and relevant public discourse on topics they are familiar with

Project overview

Motivation

The increasing accessibility of scientific articles and surrounding public discourse is generally beneficial to society. One cost of this increased public consumption of knowledge (in formats traditionally meant for domain experts) is the rise of misinformation. Articles in scientific journals often describe specific facts and precise outcomes under specific conditions, and their validity and generalizability are usually understood by only a few experts. Social media such as Reddit and Twitter, on the other hand, allow anyone (often anonymously) to post articles and comment on their contents, and it is in these forums that misunderstandings and wrong information are conveyed and spread. The wide audience of these forums, coupled with increased public interest in scientific topics (for example, in relation to the Covid-19 pandemic), has made it imperative that experts be able to find and engage with such posts.

This project, pitched for Brainhack Toronto 2021, seeks to create a live feed of active and relevant public discussions on widely used social media forums. While the initial focus of this project is to detect discussions that revolve around brain imaging, the tools to be developed here should in principle be useful for other scientific fields.

Implementation

A tentative implementation of the project is as follows:

Project communications

For Brainhack Toronto 2021, we'll communicate through a Brainhack Toronto Discord channel.

Project goals

For Brainhack 2021:

  • Post and comment detection on Reddit via Reddit API
  • Abstract and keyword detection via Crossref API
  • Classification of post relevancy based on abstract keywords
  • Post data to central repository (Firebase)
  • Web application to view recent posts
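The relevancy step in the goals above could be approximated with a simple keyword-overlap score. The following is only a minimal sketch under stated assumptions, not the project's implementation: the function names `tokenize`, `relevancy_score`, and `is_relevant`, the tokenization rule, and the 0.5 threshold are all hypothetical.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def relevancy_score(post_text, keywords):
    """Return the fraction of keyword tokens that also occur in the post text."""
    post_tokens = set(tokenize(post_text))
    # Multi-word keywords are broken into individual tokens (an assumption).
    keyword_tokens = {tok for kw in keywords for tok in tokenize(kw)}
    if not keyword_tokens:
        return 0.0
    return len(keyword_tokens & post_tokens) / len(keyword_tokens)


def is_relevant(post_text, keywords, threshold=0.5):
    """Flag a post as relevant when enough keyword tokens match (hypothetical threshold)."""
    return relevancy_score(post_text, keywords) >= threshold
```

In this sketch, the keywords would come from article abstracts (for example, fetched through Crossref) and the post text from Reddit. A real classifier would likely need stemming, stop-word handling, and a tuned threshold.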

Future ideas:

  • Extend to other forums (Twitter?)
  • Sentiment detection of discussions
  • Mobile applications to display discussion feed
  • Analysis of scientific information spread across social networks

Current progress

  • Web Application (not deployed yet)

    This web application allows users to search Reddit posts in all subreddits or in a specific subreddit, search the most recent posts or posts in a given time window, or search posts with specific keywords.

    python run_app_reddit_search.py
    
  • Reddit Posts Search & Store

    Store a user's search results into a PostgreSQL database.

    For demonstration, run the following command:

    python demo_reddit_search.py
  • Reddit Post Recommender

    For a given Reddit post, this recommender returns the top 5 most similar posts based on the content of the post title.

    See the Jupyter Notebook reddit_recommender.ipynb

  • Reddit Post Topic (Flair Tag) Classification

    This classification model predicts the topic (flair tag) of Reddit posts based on the content of the post title.

    For simplicity and demonstration, the present model performs a binary classification on posts with Biology and Environment flair tags.

    See the Jupyter Notebook reddit_classification.ipynb
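To illustrate the title-based recommender described in the list above, here is a minimal sketch that ranks titles by TF-IDF cosine similarity using only the standard library. It is an assumption about one reasonable approach, not a copy of what reddit_recommender.ipynb does, and the function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(title):
    """Lowercase a title and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", title.lower())


def tfidf_vectors(titles):
    """Build one sparse TF-IDF vector (dict of term -> weight) per title."""
    docs = [Counter(tokenize(t)) for t in titles]
    n_docs = len(docs)
    # Iterating a Counter yields each distinct term once, giving document frequency.
    doc_freq = Counter(term for doc in docs for term in doc)
    # Smoothed inverse document frequency, always greater than zero.
    idf = {term: math.log((1 + n_docs) / (1 + df)) + 1 for term, df in doc_freq.items()}
    return [{term: count * idf[term] for term, count in doc.items()} for doc in docs]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend(titles, query_index, top_n=5):
    """Return up to top_n titles most similar to titles[query_index]."""
    vectors = tfidf_vectors(titles)
    query = vectors[query_index]
    scored = [(cosine(query, vec), i) for i, vec in enumerate(vectors) if i != query_index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [titles[i] for _, i in scored[:top_n]]
```

Calling `recommend(titles, 0)` on a list of post titles would return the titles closest to the first one. For larger collections, a library implementation of TF-IDF would be a more practical choice than this hand-rolled version.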

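The binary flair classification (Biology versus Environment) described above could be sketched with a small multinomial naive Bayes model over title words. This is an illustrative assumption rather than the model used in reddit_classification.ipynb. The class name `NaiveBayesTitleClassifier` is hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(title):
    """Lowercase a title and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", title.lower())


class NaiveBayesTitleClassifier:
    """Multinomial naive Bayes with add-one smoothing (hypothetical helper)."""

    def fit(self, titles, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for title, label in zip(titles, labels):
            self.word_counts[label].update(tokenize(title))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, title):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_label in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus the sum of smoothed log likelihoods of each word.
            score = math.log(n_label / total)
            for word in tokenize(title):
                score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

The model would be trained with `fit(titles, labels)`, where each label is a flair tag, and then queried with `predict(title)`. Any realistic use would need a held-out test set to measure accuracy.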
Contributing

Contributors of all backgrounds and experiences are welcome.

Requirements

  • Python

    For simplicity and consistency, you can create a conda environment using the following command:

    conda create \
    --name brainfeed \
    python=3.7

    After creating the environment, you can activate it by running conda activate brainfeed.

  • Python packages

    You will need the following packages:

    • habanero (for Crossref)
    • PRAW (for Reddit)
    • firebase-admin (for Firebase / Firestore)
    • spyder (optional, a Python IDE)

    You can install the required packages with this command, after activating the conda environment:

    pip install habanero==1.0.0 praw==7.5.0 firebase-admin==5.1.0
    conda install spyder=5.1.5
  • A Reddit account

    Set up a Reddit account, and create a script app by clicking the "Create app" button here. More details on this can be found at: https://github.com/reddit-archive/reddit/wiki/OAuth2

  • A Firebase project

    Create here: https://console.firebase.google.com/

Setup

  1. Clone this repository

    git clone git@github.com:yohanyee/brainfeed.git

  2. Activate the conda environment

    conda activate brainfeed

  3. Copy the praw.ini_TEMPLATE_DO_NOT_ENTER_INFO_HERE file to your config directory and rename it to praw.ini (see https://praw.readthedocs.io/en/stable/getting_started/configuration/prawini.html). Then, fill in your Reddit authentication information, following Reddit guidelines for the user_agent field. Make sure this file is not publicly visible.

  4. Initialize the Firebase SDK (create a service account and download the private key)

    See https://firebase.google.com/docs/admin/setup/#initialize-sdk

  5. Add an environment variable called GOOGLE_APPLICATION_CREDENTIALS pointing to the location of this private key (which should not be publicly visible)

    export GOOGLE_APPLICATION_CREDENTIALS="/home/user/.config/service-account-file.json"
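Before running the tools, it can help to verify steps 3 and 5 above. The following is a hedged sketch of such a pre-flight check using only the standard library. The helper names and the praw.ini section name `brainfeed` are hypothetical placeholders, so substitute the section name used in your own praw.ini.

```python
import configparser
import os

# Fields PRAW needs for read-only access with a script app.
REQUIRED_PRAW_FIELDS = ("client_id", "client_secret", "user_agent")


def missing_praw_fields(ini_text, site="brainfeed"):
    """Return required fields that are absent or empty in one praw.ini section.

    The default section name "brainfeed" is a hypothetical placeholder.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    if not parser.has_section(site):
        return list(REQUIRED_PRAW_FIELDS)
    section = parser[site]
    return [name for name in REQUIRED_PRAW_FIELDS
            if not section.get(name, fallback="").strip()]


def firebase_key_path(environ=None):
    """Return the service-account key path from the environment, or raise."""
    environ = os.environ if environ is None else environ
    path = environ.get("GOOGLE_APPLICATION_CREDENTIALS", "")
    if not path:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    if not os.path.isfile(path):
        raise FileNotFoundError(f"No key file found at {path}")
    return path
```

To use it, read the contents of your praw.ini into a string and pass that to `missing_praw_fields`. An empty list means the three fields are present. The check only confirms that values exist, not that Reddit or Firebase will accept them.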

Helpful links

Contributors

ericchchang, yohanyee