bioinformatics_trends's Issues
ML Implementation (Classification)
Implement Random Forest and XGBoost (XGB) classification on the dataset, and try different ways of dividing the classes for each.
Document topic probabilities from LDA models
Publications venues and their citations
Data analysis and graphical representation of Authors as they relate to Citations
Explore data on authors and how they might relate to citations. Produce at least 1 graph to illustrate your findings.
Research Machine Learning Algorithms
Deciding which ML algorithms are best suited for our project
Generate Visual representations for work done on Author Keywords
Generate Probability Distribution Functions and Box plots to better understand the field of Author Keywords
Generate a graphical design for data
This issue was started and completed.
Create TFIDF Model and obtain topics for each paper
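As a rough illustration of the TF-IDF step, here is a minimal plain-Python sketch. It is not the project's code: the sample papers are made up, the whitespace tokenization is a simplification, and treating the highest-weighted terms as a paper's "topics" is an assumption made only for this example.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            # Term frequency times inverse document frequency.
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

def top_terms(weight_map, k=2):
    """Return the k highest-weighted terms (ties broken alphabetically)."""
    ranked = sorted(weight_map.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]

# Made-up example papers, tokenized by whitespace.
papers = [
    "gene expression analysis".split(),
    "protein structure prediction".split(),
    "gene sequence analysis".split(),
]
weights = tfidf(papers)
print(top_terms(weights[0], 1))
```

Terms that appear in many papers (here "gene" and "analysis") get lower weights than terms unique to one paper.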
Research NLP
Research NLP and how it relates to our project and goal. Produce specifics for the next team meeting concerning how or why we should use NLP in the context of our project.
Machine learning (Grouping based on quartile range)
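One possible way to group rows by quartile range is sketched below using only the standard library. The citation counts are made up, and the four class labels (0 to 3) and the choice of the "inclusive" quantile method are assumptions for illustration, not the team's actual scheme.

```python
import statistics

def quartile_classes(citations):
    """Assign each citation count a class 0-3 according to its quartile."""
    # With n=4, statistics.quantiles returns the three cut points Q1, Q2, Q3.
    q1, q2, q3 = statistics.quantiles(citations, n=4, method="inclusive")
    labels = []
    for count in citations:
        if count <= q1:
            labels.append(0)
        elif count <= q2:
            labels.append(1)
        elif count <= q3:
            labels.append(2)
        else:
            labels.append(3)
    return labels

# Made-up citation counts for eight papers.
citation_counts = [0, 1, 2, 3, 5, 8, 13, 21]
labels = quartile_classes(citation_counts)
print(labels)
```

The resulting labels could then serve as the target column for a classifier.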
Stemming
Apply statistical analysis to graphs, and then use that analysis to direct how we achieve our goal
Create relevant graphs for # of authors and # of affiliations
Find techniques to be used for dividing classes in classification problem
Find the best approach to divide the rows into different classes for classification type ML
Merge all members' files across all years
Merge Mouna, Logan and Darpan's files together and Merge Luke and Steve's files separately
Explore random forest algorithm
Research into whether this algorithm is useful for our project.
NLP processing using NLTK
Completed research to better understand how to use NLTK to prepare our non-numerical data for machine-learning processing.
Extract relevant author data from .json files for future merge into final .csv file for analysis
authors
author id
sequence number
affiliation id
indexed-name (first name, middle initial, last initial)
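The extraction of the fields listed above could look roughly like the sketch below. The JSON layout is hypothetical: the key names ("authors", "author-id", "seq", "affiliation-id", "indexed-name") and the sample values are assumptions for illustration, since the structure of the real .json files is not shown here.

```python
import json

# Hypothetical sample record; the real files may use different keys.
SAMPLE = """
{"authors": [
  {"author-id": "111", "seq": "1", "affiliation-id": "9001", "indexed-name": "Example A."},
  {"author-id": "222", "seq": "2", "indexed-name": "Sample B."}
]}
"""

# Fields to carry over into the final .csv file (assumed key names).
FIELDS = ("author-id", "seq", "affiliation-id", "indexed-name")

def extract_authors(raw_json):
    """Return one flat dict per author, with "" for missing fields."""
    record = json.loads(raw_json)
    rows = []
    for author in record.get("authors", []):
        rows.append({field: author.get(field, "") for field in FIELDS})
    return rows

rows = extract_authors(SAMPLE)
for row in rows:
    print(row)
```

Each row is a flat dict, so the list can be passed to csv.DictWriter when building the merged file.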
Analysis on year and citations proportions
NLP: calculate paragraph # of topics, top topic, top topic prob. & then add to master_file
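A minimal sketch of the three values named in this issue, assuming the topic model yields a list of (topic id, probability) pairs for each paragraph. That input shape, the function name, and the column names are assumptions for illustration.

```python
def summarize_topics(topic_probs):
    """Summarize a list of (topic_id, probability) pairs for one text."""
    if not topic_probs:
        return {"n_topics": 0, "top_topic": None, "top_topic_prob": 0.0}
    # The top topic is the pair with the highest probability.
    top_topic, top_prob = max(topic_probs, key=lambda pair: pair[1])
    return {
        "n_topics": len(topic_probs),
        "top_topic": top_topic,
        "top_topic_prob": top_prob,
    }

# Made-up topic distribution for one paragraph.
summary = summarize_topics([(0, 0.15), (3, 0.70), (7, 0.15)])
print(summary)
```

The three values in the returned dict correspond to the columns to be added to master_file.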
Presentation
Script preparation for the presentation
Regression
Research for GINSEM part II Create the stoplist
Create graphical analysis representing subject's relation to number of citations
- Generate two bar graphs relating the average number of citations per publication to publication subject.
- Generate box plots for the same graphical analysis.
- Compare probability distributions of different subjects' average citations per publication.
- Compare two groups:
  - subjects with the most publications overall
  - subjects with the most citations overall
https://github.com/UNCG-CSE/Bioinformatics_Trends/blob/master/Data%20Exploration/Logan/Logan_p3.ipynb
https://github.com/UNCG-CSE/Bioinformatics_Trends/blob/master/Data%20Exploration/Logan/Logan_p3_boxplot.ipynb
NLP: calculate for 'Author Keywords': # of topics, top topic, top topic prob. & then add to master_file
Author Keyword analysis
Work on author keyword analysis and how the author keywords correspond to the number of citations
Probability Distribution for Average Number Citations
I created a probability distribution for the average number of citations per paper (grouped by subject area)
Correlation Graph
Create a correlation graph between length of title of a paper and number of citations
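Before plotting, the strength of the relationship can be checked with a Pearson correlation coefficient. The sketch below uses made-up titles and citation counts, and measures title length in characters, which is an assumption (word count would work the same way).

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up (title, citation count) pairs.
papers = [
    ("Gene networks", 12),
    ("A survey of protein structure prediction methods", 30),
    ("Sequence alignment at scale", 7),
    ("Deep learning for variant calling in clinical genomics", 18),
]
title_lengths = [len(title) for title, _ in papers]
citations = [count for _, count in papers]
r = pearson(title_lengths, citations)
print(round(r, 3))
```

A value near 0 would suggest little linear relationship between title length and citations; values near 1 or -1 would suggest a strong one.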
Extract publication details from JSON files
Avg Citations Per Paper (grouped by subject)
I summed the total citations for each subject area, divided each total by the number of papers in that subject area to get the average citations per paper, and then created a bar plot comparing the subject areas.
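The calculation described above can be sketched as follows. The subject names and citation counts are made up, and the function name is hypothetical.

```python
from collections import defaultdict

def average_citations_by_subject(papers):
    """Map each subject area to its average citations per paper."""
    totals = defaultdict(int)   # total citations per subject
    counts = defaultdict(int)   # number of papers per subject
    for subject, citations in papers:
        totals[subject] += citations
        counts[subject] += 1
    return {subject: totals[subject] / counts[subject] for subject in totals}

# Made-up (subject area, citation count) pairs, one per paper.
papers = [("Genetics", 10), ("Genetics", 20), ("Biochemistry", 6)]
averages = average_citations_by_subject(papers)
print(averages)
```

The resulting dict maps directly onto the bars of a bar plot, with subjects on one axis and averages on the other.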
Research ML Algorithms relevant to our Project
Find out which ML Algorithms would work best for the available dataset.
Correlation (Authors and Citations)
Created Graph/Map
The map shows the number of publications per country. In this case, I used the top ten countries.
GROUP - Create/Format Final Presentation
Represent Citation Data
Produce at least one graphical representation of the citation data in whatever forms you judge to be useful.
Features from JSON Extraction
This is a generic issue created to show an example.
LDA - code to create keywords classifying as such.
LDA - TFIDF
NLP: calculate for 'Title': # of topics, top topic, top topic prob. & then add to master_file