brainfeed

A tool for domain experts to find recent and relevant public discourse on topics they are familiar with

Project overview

Motivation

The increasing accessibility of scientific articles and surrounding public discourse is generally beneficial to society. A tradeoff to this increased public consumption of knowledge (in formats traditionally meant for domain experts) is the rise of misinformation. Articles in scientific journals often describe specific facts and precise outcomes under specific conditions, and their validity and generalizability are usually understood only by a few experts. On the other hand, social media such as Reddit and Twitter allow anyone (often anonymously) to post articles and comment on their contents, and it is in these forums that misunderstandings and wrong information are conveyed and spread. The wide audience of these forums, coupled with increased public interest in scientific topics (for example, in relation to the Covid-19 pandemic), has made it imperative that experts be able to find and engage with such posts.

This project, pitched for Brainhack Toronto 2021, seeks to create a live feed of active and relevant public discussions on widely used social media forums. While the initial focus of this project is to detect discussions that revolve around brain imaging, the tools to be developed here should in principle be useful for other scientific fields.

Implementation

A tentative implementation of the project is as follows:

Project communications

For Brainhack Toronto 2021, we'll communicate through a Brainhack Toronto Discord channel.

Project goals

For Brainhack 2021:

Post and comment detection on Reddit via Reddit API
Abstract and keyword detection via the Crossref API
Classification of post relevancy based on abstract keywords
Post data to central repository (Firebase)
Web application to view recent posts
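As a rough illustration of the "classification of post relevancy based on abstract keywords" goal, the sketch below extracts the most frequent terms from an abstract and scores a post title by how many of its words are among them. This is only one possible approach, not the project's implementation: the function names, the stopword list, the 0.5 threshold, and the example abstract are all made up for illustration. The tag stripping assumes that abstracts returned by Crossref may contain JATS markup.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption; a real list would be longer).
STOPWORDS = {"a", "an", "and", "are", "as", "at", "by", "for", "from", "in",
             "is", "of", "on", "or", "that", "the", "to", "we", "with"}


def tokenize(text):
    """Lowercase the text, drop markup tags, and return non-stopword tokens."""
    text = re.sub(r"<[^>]+>", " ", text)  # remove JATS/XML tags if present
    return [t for t in re.findall(r"[a-z][a-z0-9-]+", text.lower())
            if t not in STOPWORDS]


def abstract_keywords(abstract, top_n=10):
    """Return the top_n most frequent tokens of an abstract as a set."""
    return {word for word, _ in Counter(tokenize(abstract)).most_common(top_n)}


def relevance(post_title, keywords):
    """Fraction of distinct title tokens that are abstract keywords (0.0 to 1.0)."""
    title_tokens = set(tokenize(post_title))
    if not title_tokens:
        return 0.0
    return len(title_tokens & keywords) / len(title_tokens)


# Made-up abstract, for illustration only.
abstract = ("<jats:p>We used functional MRI to measure cortical activity "
            "during sleep. Functional MRI signals in cortical regions "
            "tracked sleep depth.</jats:p>")
keywords = abstract_keywords(abstract)

# Hypothetical threshold: treat a post as relevant if at least half of its
# title words are abstract keywords.
is_relevant = relevance("New functional MRI study of sleep", keywords) >= 0.5
```

A score based on raw word overlap is easy to inspect, but it ignores synonyms and word order, so it is best read as a baseline to compare better models against.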

Future ideas:

Extend to other forums (Twitter?)
Sentiment detection of discussions
Mobile applications to display discussion feed
Analysis of scientific information spread across social networks

Current progress

Web Application (not deployed yet)

This web application lets users search Reddit posts in all subreddits or in a specific subreddit, restrict the search to the most recent posts or to posts in a given time window, and filter posts by specific keywords.
```
python run_app_reddit_search.py
```
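The three search options described above (subreddit, time window, keywords) can be sketched as a filter over posts that have already been fetched. This is a minimal illustration and not the code in run_app_reddit_search.py: the `filter_posts` and `utc` helpers, the record layout, and the sample posts are assumptions. In practice the records could be built from PRAW submissions.

```python
from datetime import datetime, timezone


def filter_posts(posts, subreddit=None, after=None, before=None, keywords=()):
    """Return the posts that match every filter that was supplied.

    posts: iterable of dicts with "subreddit", "title" and "created_utc"
           (Unix seconds). Filters left at their defaults are not applied.
    """
    wanted = [k.lower() for k in keywords]
    matches = []
    for post in posts:
        created = datetime.fromtimestamp(post["created_utc"], tz=timezone.utc)
        if subreddit and post["subreddit"].lower() != subreddit.lower():
            continue
        if after and created < after:
            continue
        if before and created > before:
            continue
        title = post["title"].lower()
        if wanted and not any(k in title for k in wanted):
            continue
        matches.append(post)
    # Most recent first, as in a feed.
    return sorted(matches, key=lambda p: p["created_utc"], reverse=True)


def utc(year, month, day):
    """Timezone-aware UTC midnight for the given date."""
    return datetime(year, month, day, tzinfo=timezone.utc)


# Made-up posts, for illustration only.
posts = [
    {"subreddit": "neuro", "title": "fMRI and sleep",
     "created_utc": utc(2021, 11, 1).timestamp()},
    {"subreddit": "science", "title": "Brain imaging of memory",
     "created_utc": utc(2021, 11, 5).timestamp()},
    {"subreddit": "science", "title": "Coral reef recovery",
     "created_utc": utc(2021, 11, 6).timestamp()},
]
recent_brain = filter_posts(posts, after=utc(2021, 11, 2),
                            keywords=["brain", "fmri"])
```

Here `recent_brain` holds only the "Brain imaging of memory" post: the fMRI post falls before the time window and the coral reef post matches no keyword.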
Reddit Posts Search & Store

Store a user's search results into a PostgreSQL database.

For a demonstration, run the following command:
```
python demo_reddit_search.py
```
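To give a feel for what storing search results involves, here is a small sketch that inserts posts into a table and skips posts that were already stored. It uses the standard-library sqlite3 module as a stand-in so that it runs without a database server. The table name, columns, and `store_posts` helper are hypothetical and are not taken from demo_reddit_search.py.

```python
import sqlite3

# Hypothetical schema; the real project may use different columns.
SCHEMA = """
CREATE TABLE IF NOT EXISTS reddit_posts (
    id          TEXT PRIMARY KEY,   -- Reddit submission id
    subreddit   TEXT NOT NULL,
    title       TEXT NOT NULL,
    created_utc REAL NOT NULL,      -- Unix timestamp
    query       TEXT                -- the search that returned this post
)
"""


def store_posts(conn, posts, query):
    """Insert posts, skipping ids already stored; return the table row count."""
    rows = [(p["id"], p["subreddit"], p["title"], p["created_utc"], query)
            for p in posts]
    with conn:  # commits on success, rolls back on error
        conn.execute(SCHEMA)
        # sqlite3 uses "?" placeholders. With psycopg2 and PostgreSQL the
        # placeholders would be "%s" and the duplicate handling would be
        # written as "ON CONFLICT (id) DO NOTHING".
        conn.executemany(
            "INSERT OR IGNORE INTO reddit_posts VALUES (?, ?, ?, ?, ?)", rows)
    return conn.execute("SELECT COUNT(*) FROM reddit_posts").fetchone()[0]


conn = sqlite3.connect(":memory:")
# Made-up posts, for illustration only.
posts = [
    {"id": "abc123", "subreddit": "science",
     "title": "Brain imaging of memory", "created_utc": 1636070400.0},
    {"id": "def456", "subreddit": "science",
     "title": "Coral reef recovery", "created_utc": 1636156800.0},
]
first = store_posts(conn, posts, query="brain")
second = store_posts(conn, posts, query="brain")  # duplicates are ignored
```

Using the submission id as the primary key means that repeating a search does not create duplicate rows.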
Reddit Post Recommender

For a given Reddit post, this recommender returns the 5 most similar posts, based on the content of the post title.

See the Jupyter Notebook reddit_recommender.ipynb
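One common way to rank posts by title similarity is TF-IDF weighting with cosine similarity. The sketch below implements that idea with the standard library only. It is an assumption about a reasonable approach, not a description of what reddit_recommender.ipynb does, and the function names and example titles are invented.

```python
import math
import re
from collections import Counter


def tokenize(title):
    return re.findall(r"[a-z0-9]+", title.lower())


def tfidf_vectors(titles):
    """Return one {term: weight} dict per title using smoothed TF-IDF."""
    docs = [Counter(tokenize(t)) for t in titles]
    n = len(docs)
    # Number of titles in which each term appears at least once.
    doc_freq = Counter(term for doc in docs for term in doc)
    idf = {term: math.log((1 + n) / (1 + df)) + 1
           for term, df in doc_freq.items()}
    return [{term: count * idf[term] for term, count in doc.items()}
            for doc in docs]


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm = (math.sqrt(sum(w * w for w in a.values()))
            * math.sqrt(sum(w * w for w in b.values())))
    return dot / norm if norm else 0.0


def recommend(titles, index, top_k=5):
    """Return (title, similarity) for the top_k titles most similar to titles[index]."""
    vectors = tfidf_vectors(titles)
    scored = [(titles[i], cosine(vectors[index], vectors[i]))
              for i in range(len(titles)) if i != index]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


# Made-up titles, for illustration only.
titles = [
    "fMRI study links sleep loss to memory decline",
    "Sleep loss and memory: what an fMRI study shows",
    "Coral reefs recover after marine heatwave",
    "New telescope images of distant galaxies",
    "Memory decline in aging mice",
    "How marine heatwaves affect fish",
    "Galaxies collide in new simulation",
]
recommendations = recommend(titles, 0)
```

For the first title, the second title ranks highest because it shares five terms with it, followed by "Memory decline in aging mice", which shares two. The remaining titles share no terms and score zero.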
Reddit Post Topic (Flair Tag) Classification

This classification model predicts the topic (flair tag) of Reddit posts based on the content of the post title.

For simplicity and demonstration, the present model performs a binary classification on posts with Biology and Environment flair tags.

See the Jupyter Notebook reddit_classification.ipynb
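To make the binary flair classification concrete, the following sketch trains a small multinomial naive Bayes classifier on post titles. The choice of naive Bayes is an assumption made for illustration; the model in reddit_classification.ipynb may be different. The `NaiveBayes` class and the six training titles are invented, and a real model would need far more data.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(title):
    return re.findall(r"[a-z]+", title.lower())


class NaiveBayes:
    """Multinomial naive Bayes over title tokens with add-one smoothing."""

    def fit(self, titles, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for title, label in zip(titles, labels):
            self.word_counts[label].update(tokenize(title))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, title):
        total = sum(self.label_counts.values())
        scores = {}
        for label, n_docs in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total)  # log prior
            for word in tokenize(title):
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


# Made-up training titles, for illustration only.
train_titles = [
    "Gene editing restores vision in mice",
    "New bacteria species found in human gut",
    "Protein folding mechanism revealed in cells",
    "Arctic sea ice reaches record low",
    "Plastic pollution found in deep ocean",
    "Wildfire smoke worsens air pollution",
]
train_labels = ["Biology"] * 3 + ["Environment"] * 3

model = NaiveBayes().fit(train_titles, train_labels)
biology_guess = model.predict("Gut bacteria alter gene activity in mice")
environment_guess = model.predict("Ocean plastic pollution reaches record levels")
```

With this toy data, the first title is assigned to Biology and the second to Environment, because their words appear mostly in the training titles of the respective class.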

Contributing

Contributors of all backgrounds and experiences are welcome.

Requirements

Python

For simplicity and consistency, you can create a conda environment using the following command:
```
conda create \
--name brainfeed \
python=3.7
```
After creating the environment, you can activate it by running conda activate brainfeed.
Python packages

You will need the following packages:
- habanero (for Crossref)
- PRAW (for Reddit)
- firebase-admin (for Firebase / Firestore)
- spyder (optional, a Python IDE)
You can install the required packages with this command, after activating the conda environment:
```
pip install habanero==1.0.0 praw==7.5.0 firebase-admin==5.1.0
conda install spyder=5.1.5
```
A Reddit account

Set up a Reddit account, and create a script app by clicking the "Create app" button here. More details on this can be found at: https://github.com/reddit-archive/reddit/wiki/OAuth2
A Firebase project

Create here: https://console.firebase.google.com/

Setup

Clone this repository

git clone git@github.com:yohanyee/brainfeed.git
Activate the conda environment

conda activate brainfeed
Copy the praw.ini_TEMPLATE_DO_NOT_ENTER_INFO_HERE file to your config directory and rename it to praw.ini (see https://praw.readthedocs.io/en/stable/getting_started/configuration/prawini.html). Then, fill in your Reddit authentication information, following Reddit guidelines for the user_agent field. Make sure this file is never publicly visible (for example, do not commit it to the repository).
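Before running anything against Reddit, it can help to check that the praw.ini file is completely filled in. The sketch below parses praw.ini-style text and reports which fields are still blank. The site name `brainfeed`, the placeholder values, and the `missing_fields` helper are assumptions for illustration; the template shipped with this repository may use a different section name or set of fields.

```python
import configparser

# Fields a Reddit script app typically needs in praw.ini.
REQUIRED = ("client_id", "client_secret", "username", "password", "user_agent")

# Hypothetical file contents with placeholder values (never commit real ones).
EXAMPLE_INI = """\
[brainfeed]
client_id = YOUR_CLIENT_ID
client_secret = YOUR_CLIENT_SECRET
username = YOUR_REDDIT_USERNAME
password =
user_agent = python:brainfeed:v0.1 (by /u/YOUR_REDDIT_USERNAME)
"""


def missing_fields(ini_text, site="DEFAULT"):
    """Return the required praw.ini fields that are absent or blank in `site`."""
    # interpolation=None so that "%" characters in passwords are read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    if site != "DEFAULT" and not parser.has_section(site):
        return list(REQUIRED)
    section = parser[site]
    return [key for key in REQUIRED if not section.get(key, "").strip()]


still_missing = missing_fields(EXAMPLE_INI, site="brainfeed")
```

In this example `still_missing` contains only `password`. With a named section like this one, PRAW can load the settings by site name, for example with `praw.Reddit("brainfeed")`.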
Initialize the Firebase SDK (create a service account and download the private key)

See https://firebase.google.com/docs/admin/setup/#initialize-sdk
Add an environment variable called GOOGLE_APPLICATION_CREDENTIALS pointing to the location of this private key (which should not be publicly visible)

export GOOGLE_APPLICATION_CREDENTIALS="/home/user/.config/service-account-file.json"
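A quick way to confirm that the environment variable is set correctly is to check it from Python before initializing Firebase. The `check_credentials` helper below is a hypothetical sketch: it only verifies that the variable points to a readable service account key file and does not contact Firebase.

```python
import json
import os


def check_credentials(environ=os.environ):
    """Return the key file path, or raise with a message saying what is wrong."""
    path = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    if not os.path.isfile(path):
        raise RuntimeError(f"No key file found at {path}")
    with open(path) as handle:
        key = json.load(handle)
    # Service account key files carry this marker in their "type" field.
    if key.get("type") != "service_account":
        raise RuntimeError(f"{path} is not a service account key")
    return path


# Usage (left as comments because it needs your real key file):
#
#   check_credentials()
#   import firebase_admin
#   from firebase_admin import firestore
#   firebase_admin.initialize_app()  # picks up GOOGLE_APPLICATION_CREDENTIALS
#   db = firestore.client()
```

Failing early with a clear message is usually easier to debug than an authentication error raised later inside the SDK.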

Helpful links

Reddit API: https://www.reddit.com/dev/api
- Python API for Reddit (PRAW): https://praw.readthedocs.io/en/stable/index.html
Twitter API: https://developer.twitter.com/en/docs/twitter-api
Crossref API: https://www.crossref.org/documentation/retrieve-metadata/
- Python API for Crossref (Habanero): https://github.com/sckott/habanero
Altmetric API: https://www.altmetric.com/products/altmetric-api/
Firebase documentation: https://firebase.google.com/docs
- Firebase Admin SDK reference: https://firebase.google.com/docs/reference/admin
- Firebase Admin Python SDK: https://github.com/firebase/firebase-admin-python
- Firestore quickstart guide: https://firebase.google.com/docs/firestore/quickstart#python
