grant_database's Introduction

Hi there 👋

  • 😎 My name is Titipat Achakulvisut
  • 🔬 Lecturer (tenure track) at Mahidol University, Thailand. Runs Biomedical and Data lab
  • 🔭 Formerly a PhD student at Konrad Kording lab at University of Pennsylvania
  • 🌓 Research interests: Applied Natural Language Processing, Applied Machine Learning, Metascience
  • 💬 Open source contributor, blogger at tupleblog.github.io, Stack Overflow contributor
  • 🇹🇭 Bangkok / 🌦 Previous cities: Philadelphia, Seattle, Chicago

grant_database's People

Contributors

bluenex, okjed, titipata

grant_database's Issues

Create bash script to download and process all NSF grants

I want to create a bash script that downloads all grant zip files, unzips them, and parses them (I will add the parsing script later). Right now, what I have is as follows (see this post on Stack Overflow):

python download_award_links.py # load to nsf
# read the link list line by line; for now each link is only printed
while IFS= read -r p; do
  echo "$p"
done < nsf_awards.txt

# unzip all files and parse later

It would be great to have a full workflow for this.
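
The download and unzip steps could be sketched in Python roughly as follows. This is only a sketch under assumptions: it assumes every line of nsf_awards.txt is a direct URL to a zip file whose last path component is a usable file name, and the function names and the `zips` and `extracted` folders are made up for illustration. Parsing is still left out.

```python
import shutil
import urllib.request
import zipfile
from pathlib import Path


def read_links(list_path):
    """Return the non-empty lines of the link list, stripped."""
    with open(list_path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def download(url, dest_dir):
    """Download url into dest_dir and return the local path.

    Assumption: the last path component of the URL is a usable file name.
    """
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / url.rstrip("/").split("/")[-1]
    with urllib.request.urlopen(url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target


def unzip(zip_path, out_dir):
    """Extract one archive into out_dir/<archive name without .zip>."""
    target = Path(out_dir) / Path(zip_path).stem
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    return target


def run(list_path="nsf_awards.txt", zip_dir="zips", out_dir="extracted"):
    """Download and unzip every archive in the list (hypothetical folder names)."""
    for url in read_links(list_path):
        unzip(download(url, zip_dir), out_dir)
```

Calling `run()` after `download_award_links.py` has produced the link list would leave one folder per archive under `extracted/`, ready for a parsing step to be added later.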

Link unique NIH PIs with MEDLINE data

We will use pubmed_parser to grab all PI names from MEDLINE. They will be in the format KP Kording. Right now, the NIH data has first_name and last_name columns. We will transform the NIH author string into the same format as in MEDLINE. Plain string matching would be a good first attempt. We will check affiliations later to make sure they are the same person.

The MEDLINE file is located in S3 and can be downloaded using:

aws s3 sync s3://science-of-science-bucket/medline/pmid_author_affil.csv/ pmid_author_affil/
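
The name transformation and the first-attempt string matching described above might look like this. It is a sketch, not the project's code: `to_medline_name` and `match_pis` are hypothetical helpers, and it assumes the MEDLINE format is the initials of every given name followed by the last name, as in KP Kording.

```python
import re


def to_medline_name(first_name, last_name):
    """Turn NIH first_name/last_name into a MEDLINE-style 'KP Kording' string.

    Assumption: initials come from every token of first_name, split on
    whitespace, periods and hyphens.
    """
    parts = re.split(r"[\s.\-]+", first_name.strip())
    initials = "".join(part[0].upper() for part in parts if part)
    return f"{initials} {last_name.strip()}".strip()


def match_pis(nih_pis, medline_authors):
    """Map each name found in both sources to its NIH (first, last) pairs."""
    medline = set(medline_authors)
    matches = {}
    for first, last in nih_pis:
        key = to_medline_name(first, last)
        if key in medline:
            matches.setdefault(key, []).append((first, last))
    return matches


nih = [("Konrad P", "Kording"), ("Jane", "Doe")]  # made-up input rows
print(match_pis(nih, ["KP Kording", "A Smith"]))
```

Several NIH PIs can collapse to the same initials, which is why each match keeps a list of candidates; the later affiliation check would be needed to tell them apart.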

Fetch new href links

We will modify the bash or Python script later so that, if the data already exists, it finds only the new links, then copies and parses just those new zip files.
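
One possible way to find only the new links is to keep a record of links that were already handled and diff against it. This is a sketch under assumptions: the `processed.txt` record file and both function names are hypothetical and not part of the repository.

```python
from pathlib import Path


def find_new_links(current_links, seen_path):
    """Return links not yet recorded in seen_path, in order, without duplicates."""
    seen_file = Path(seen_path)
    seen = set()
    if seen_file.exists():
        text = seen_file.read_text(encoding="utf-8")
        seen = {line.strip() for line in text.splitlines() if line.strip()}
    # dict.fromkeys drops duplicate links while keeping their original order
    return [link for link in dict.fromkeys(current_links) if link not in seen]


def mark_processed(links, seen_path):
    """Append links to the record so the next run skips them."""
    with open(seen_path, "a", encoding="utf-8") as handle:
        for link in links:
            handle.write(link + "\n")
```

A run would call `find_new_links(links, "processed.txt")`, download and parse only the returned links, and then call `mark_processed` on the ones that succeeded.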

Scraping data from NIH page doesn't work

I use the script below to scrape this page, but it does not work right now.

from lxml import html
import requests

# NIH ExPORTER catalog page
url = 'http://exporter.nih.gov/ExPORTER_Catalog.aspx?sid=0&index=1'
page = requests.get(url)
tree = html.fromstring(page.content)
# collect the href of every link inside table rows with class "row_bg"
awards_links = tree.xpath('//tbody//tr[@class="row_bg"]//td//div//a/@href')
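
One thing worth checking, though it is not verified against the live page: browsers add a tbody element when they display a table, even if the raw HTML served to requests has none, so an XPath that starts with //tbody can come back empty. With lxml the simplest test would be dropping //tbody from the expression. The sketch below shows the same row-and-link extraction without depending on tbody at all. It swaps lxml for the standard library html.parser so it runs without extra packages, and the HTML sample is made up.

```python
from html.parser import HTMLParser


class RowLinkParser(HTMLParser):
    """Collect href values of <a> tags inside <tr class="row_bg"> rows."""

    def __init__(self):
        super().__init__()
        self.in_row = False
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "tr":
            # a row counts only if "row_bg" is one of its classes
            self.in_row = "row_bg" in (attrs.get("class") or "").split()
        elif tag == "a" and self.in_row and attrs.get("href"):
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "tr":
            self.in_row = False


def extract_row_links(html_text):
    """Return the hrefs found inside row_bg rows, in document order."""
    parser = RowLinkParser()
    parser.feed(html_text)
    return parser.links


# Made-up sample without <tbody>, mimicking raw HTML as served to a script
SAMPLE_HTML = """
<table>
  <tr class="row_bg"><td><div><a href="a.zip">A</a></div></td></tr>
  <tr class="other"><td><a href="skip.zip">skip</a></td></tr>
  <tr class="row_bg"><td><div><a href="b.zip">B</a></div></td></tr>
</table>
"""

print(extract_row_links(SAMPLE_HTML))
```

If the real page still yields nothing, the rows may use a different class name or may be filled in by JavaScript, in which case no static parser will see them.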

Dedupe affiliation script doesn't work for me

@daniel-acuna, can you check dedupe_affiliation.py in the deduplication part? I got the following error when running the script:

Traceback (most recent call last):
  File "dedupe_affiliation.py", line 94, in <module>
    deduper.train(ppc=None, recall=0.95)
  File "/Users/titipat/anaconda3/lib/python3.5/site-packages/dedupe/api.py", line 659, in train
    index_predicates)
  File "/Users/titipat/anaconda3/lib/python3.5/site-packages/dedupe/api.py", line 679, in _trainBlocker
    recall)
  File "/Users/titipat/anaconda3/lib/python3.5/site-packages/dedupe/training.py", line 102, in learn
    raise ValueError(NO_PREDICATES_ERROR)
ValueError: No predicate found! We could not learn a single good predicate. Maybe give Dedupe more training data or increasing the `max_comparisons` argument to the train method
