uncg-cse / bioinformatics_trends Goto Github PK

Jupyter Notebook 100.00%

bioinformatics_trends's People

Contributors

Watchers

Forkers

lukeusername mounakalidindi

bioinformatics_trends's Issues

ML Implement (Classification)

Implement Random Forest and XGB classification on the dataset, and try different ways of dividing the classes for each.

Document topic probabilities from LDA models

Publication venues and their citations

Data analysis and graphical representation of Authors as they relate to Citations

Explore data on authors and how they might relate to citations. Produce at least 1 graph to illustrate your findings.

Research Machine Learning Algorithms

Deciding which ML algorithms are best suited for our project

Generate Visual representations for work done on Author Keywords

Generate Probability Distribution Functions and Box plots to better understand the field of Author Keywords

Generate a graphical design for data

This issue was started and completed.

Create TFIDF Model and obtain topics for each paper
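As a rough illustration of what this issue involves, the sketch below computes TF-IDF weights over tokenized papers and picks the highest-weighted terms for each one. It is a minimal stdlib-only sketch, not the project's implementation: the function names `tfidf` and `top_terms`, the plain `tf * log(N / df)` weighting, and the idea of treating top-weighted terms as a paper's "topics" are all assumptions made for the example.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed weighting: (term count / document length) * log(N / document frequency).
    """
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n / df[term])
            for term, count in tf.items()
        })
    return weights

def top_terms(weight, k=3):
    """Return the k highest-weighted terms, breaking ties alphabetically."""
    ranked = sorted(weight.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]

# Hypothetical toy corpus: each paper is already tokenized.
papers = [
    ["gene", "expression", "gene"],
    ["protein", "structure"],
    ["gene", "protein"],
]
weights = tfidf(papers)
topics = [top_terms(w, k=1) for w in weights]
```

A term that appears in every paper gets weight 0 under this formula, which is why rare terms such as "expression" outrank frequent ones such as "gene".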

Research NLP

Research NLP and how it relates to our project and goal. Produce specifics for the next team meeting concerning how or why we should use NLP in the context of our project.

Machine learning (Grouping based on quartile range)

Stemming

Apply statistical analysis to graphs, and then use that analysis to direct how we achieve our goal

Create relevant graphs for # of authors and # of affiliations

Find techniques to be used for dividing classes in classification problem

Find the best approach to divide the rows into different classes for classification type ML
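One candidate approach, in line with the quartile-range issue above, is to label each row by the quartile its citation count falls into. The following is a minimal sketch under that assumption; the function name `quartile_labels`, the use of citation counts as the target, and the choice of the "inclusive" quantile method are illustrative, not taken from the project.

```python
from statistics import quantiles

def quartile_labels(citations):
    """Assign each citation count a class 0-3 according to its quartile."""
    # quantiles(..., n=4) returns the three cut points Q1, Q2, Q3.
    q1, q2, q3 = quantiles(citations, n=4, method="inclusive")
    labels = []
    for count in citations:
        if count <= q1:
            labels.append(0)
        elif count <= q2:
            labels.append(1)
        elif count <= q3:
            labels.append(2)
        else:
            labels.append(3)
    return labels

# Hypothetical citation counts for eight papers.
counts = [0, 1, 2, 3, 5, 8, 13, 40]
labels = quartile_labels(counts)
```

Quartile cut points give roughly balanced classes, which tends to matter for classifiers such as Random Forest when citation counts are heavily skewed. With many tied counts (for example, many papers with zero citations) the classes can still end up uneven.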

Merge all members' files across all years

Merge Mouna's, Logan's, and Darpan's files together, and merge Luke's and Steve's files separately

Explore random forest algorithm

This was research for whether this particular algorithm is useful or not to our project

NLP processing using NLTK

Completed research to better understand how to use NLTK to prepare our non-numerical data for machine-learning processing.

Extract relevant author data from .json files for future merge into final .csv file for analysis

authors
author id
sequence number
affiliation id
indexed-name (first name, middle initial, last initial)
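The extraction step could look roughly like the sketch below, which flattens one record into one CSV row per author. The JSON layout, the key names (`author_id`, `seq`, `affiliation_id`, `indexed_name`), and the helper names are hypothetical; the real .json files may name or nest these fields differently.

```python
import csv
import io
import json

# Hypothetical record layout; the real .json files may nest these fields differently.
SAMPLE = """
{"authors": [
  {"author_id": "101", "seq": "1", "affiliation_id": "9001", "indexed_name": "Doe J.A."},
  {"author_id": "102", "seq": "2", "affiliation_id": "9002", "indexed_name": "Roe R."}
]}
"""

FIELDS = ["author_id", "seq", "affiliation_id", "indexed_name"]

def author_rows(record):
    """Flatten one publication record into one row per author."""
    return [
        {field: author.get(field, "") for field in FIELDS}
        for author in record.get("authors", [])
    ]

def to_csv(rows):
    """Serialize rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

rows = author_rows(json.loads(SAMPLE))
csv_text = to_csv(rows)
```

Missing fields are written as empty strings rather than raising an error, so incomplete author entries still produce a row that can be merged later.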

Analysis on year and citations proportions

NLP: calculate each paragraph's number of topics, top topic, and top-topic probability, then add them to master_file

Presentation

Script preparation for the presentation

Regression

Research for GINSEM part II Create the stoplist

Create graphical analysis representing subject's relation to number of citations

Generate two bar graphs relating the average number of citations per publication to publication subject.
Generate box plots for the same graphical analysis.
Compare the probability distributions of different subjects' average citations per publication.
Compare two groups:

subjects with the most publications overall
subjects with the most citations overall

https://github.com/UNCG-CSE/Bioinformatics_Trends/blob/master/Data%20Exploration/Logan/Logan_p3.ipynb
https://github.com/UNCG-CSE/Bioinformatics_Trends/blob/master/Data%20Exploration/Logan/Logan_p3_boxplot.ipynb
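The aggregation behind these graphs can be sketched as follows: group citation counts by subject, compute the average per publication, and pick the two comparison groups. This is a stdlib-only sketch with made-up data; the function names and the `(subject, citations)` row shape are assumptions, and the linked notebooks may do this differently.

```python
from collections import defaultdict
from statistics import mean

def citations_by_subject(rows):
    """Group citation counts by subject from (subject, citations) pairs."""
    groups = defaultdict(list)
    for subject, citations in rows:
        groups[subject].append(citations)
    return groups

def summarize(groups):
    """Compute publication count, total citations, and average per subject."""
    return {
        subject: {
            "publications": len(counts),
            "total_citations": sum(counts),
            "avg_citations": mean(counts),
        }
        for subject, counts in groups.items()
    }

def top_by(summary, key, k=2):
    """Return the k subjects with the largest value for key (ties alphabetical)."""
    return sorted(summary, key=lambda s: (-summary[s][key], s))[:k]

# Hypothetical rows: one (subject, citation count) pair per publication.
rows = [
    ("Genetics", 10), ("Genetics", 0), ("Genetics", 2),
    ("Oncology", 30),
    ("Ecology", 1), ("Ecology", 3),
]
summary = summarize(citations_by_subject(rows))
most_published = top_by(summary, "publications")
most_cited = top_by(summary, "total_citations")
```

The per-subject lists returned by `citations_by_subject` are also the natural input for box plots, since each list is one subject's citation distribution.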
