recesearch's Introduction

rECEsearch

A simple Python project to grab Google Scholar data for research at UBC.

made-with-python MIT license

Requirements

  • Python
  • scholarly (in lieu of a Google Scholar API)
  • Data, in the form of a CSV file

Example CSV data:

Lab,                      ID,           URL
Biomedical Technologies,  ZImFmCUAAAAJ, http://ece.sites.olt.ubc.ca/research/biomedical-technologies/
Communication Systems,    PhdzKFcAAAAJ, http://ece.sites.olt.ubc.ca/research/communication-systems/

(Do not include the spaces if you choose to use this data.)
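If you would rather keep the aligned columns in your own copy of the data, the padding can be stripped when the file is read. This is a minimal sketch using the standard library `csv` module; the function name `read_labs` is hypothetical and is not taken from research.py, which may parse its input differently.

```python
import csv
import io

# Sample data copied from the example above, padding included.
SAMPLE = """Lab,                      ID,           URL
Biomedical Technologies,  ZImFmCUAAAAJ, http://ece.sites.olt.ubc.ca/research/biomedical-technologies/
Communication Systems,    PhdzKFcAAAAJ, http://ece.sites.olt.ubc.ca/research/communication-systems/
"""


def read_labs(handle):
    """Return a list of (lab, scholar_id, url) tuples, skipping the header row.

    Hypothetical helper: strips surrounding whitespace from every field.
    """
    reader = csv.reader(handle)
    next(reader, None)  # skip the header row
    rows = []
    for row in reader:
        if len(row) < 3:
            continue  # ignore blank or incomplete lines
        lab, scholar_id, url = (field.strip() for field in row[:3])
        rows.append((lab, scholar_id, url))
    return rows


labs = read_labs(io.StringIO(SAMPLE))
for lab, scholar_id, url in labs:
    print(lab, scholar_id, url)
```

For a real file, pass `open(path, newline="")` instead of the `io.StringIO` wrapper.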

Virtual environment quick start (for Windows):

pip install virtualenv
virtualenv env
source ./env/Scripts/activate
pip install -r requirements.txt
pip freeze > requirements.txt

N.B.: As of July 10th, 2020, you should manually tweak the scholarly package to get the desired output from research.py.

There's an open issue for this, but for now go to env/Lib/scholarly/author.py and change line 10 to read:

_CITATIONAUTH = '/citations?hl=en&user={0}&sortby=pubdate'

The "sortby=pubdate" is what we're after here.

And, as of August 8th, 2020, you should manually tweak one more thing.

In _scholarly.py change line 85 to contain:

patents: bool = False

Patents should be skipped for this use case.

Usage

Run python research.py -i <input file> -o <output file>, where 'input file' is the name of a CSV file containing professor names. See research.py for more information on the anticipated structure of the CSV data. In general, your input file should have three columns: lab, lab ID, and a URL (in that order).

  • 'Lab' should be the name of the lab at UBC
  • 'Lab ID' should be the Google Scholar ID. For example, if you navigate to this link you want the user=... part of the link, so in this case the ID is EmD_lTEAAAAJ.
  • 'URL' should be the page on the UBC website where this content is displayed. As of right now, this field is not utilized, so don't worry about it too much.
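Pulling the `user=...` value out of a profile link can also be done programmatically. The sketch below uses only `urllib.parse`; the helper name `scholar_id_from_url` is hypothetical, and the example URL is an assumed profile link built around the ID mentioned above.

```python
from urllib.parse import parse_qs, urlparse


def scholar_id_from_url(url):
    """Return the value of the 'user' query parameter, or None if it is absent."""
    query = parse_qs(urlparse(url).query)
    values = query.get("user")
    return values[0] if values else None


# Assumed example link; only the ID itself comes from this README.
example = "https://scholar.google.com/citations?hl=en&user=EmD_lTEAAAAJ"
print(scholar_id_from_url(example))  # prints EmD_lTEAAAAJ
```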

After executing the command, an output CSV file is produced.

  • 'Lab' and 'Lab ID' are the same as above
  • 'Publications' is a kind-of placeholder for an arbitrary number of rows (like a file tree); the publication information is printed in the next N rows with the following (rather self-explanatory) headers:
    • 'Title'
    • 'Author'
    • 'Year'
    • 'Cited by' (the number of other publications that have cited the given publication)
    • 'Publisher'
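To make the "file tree" layout concrete, here is a minimal sketch of how such rows could be written with the standard library `csv` module. It is an illustration of the layout described above, not the code in research.py; the function name `write_output` and the dictionary keys are hypothetical, and the "..." placeholder in the Publications column is an assumption.

```python
import csv
import io

HEADER = ["Lab", "Lab ID", "Publications", "Title", "Author", "Year", "Cited by", "Publisher"]


def write_output(handle, labs):
    """Write one row per lab, followed by one indented row per publication.

    Hypothetical helper: `labs` is a list of dicts with the keys
    'lab', 'lab_id' and 'publications' (a list of dicts).
    """
    writer = csv.writer(handle)
    writer.writerow(HEADER)
    for lab in labs:
        # Lab row: the publication columns stay empty.
        writer.writerow([lab["lab"], lab["lab_id"], "..."] + [""] * 5)
        for pub in lab["publications"]:
            # Publication row: the three lab columns stay empty.
            writer.writerow(["", "", "", pub["title"], pub["author"],
                             pub["year"], pub["cited_by"], pub["publisher"]])


sample = [{
    "lab": "Biomedical Technologies",
    "lab_id": "ZImFmCUAAAAJ",
    "publications": [{
        "title": "On robust Capon beamforming and diagonal loading",
        "author": "Jian Li and Petre St",
        "year": 2003,
        "cited_by": 1431,
        "publisher": "IEEE",
    }],
}]

buffer = io.StringIO()
write_output(buffer, sample)
print(buffer.getvalue())
```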

Example

Here's an example console call:

Example console output

And then here would be the generated csv (converted to a Markdown table):

| Lab | Lab ID | Publications | Title | Author | Year | Cited By | Publisher |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Biomedical Technologies | ZImFmCUAAAAJ | ... | | | | | |
| | | | Guidelines for the use and interpretation of assays for monitoring autophagy | Daniel J Klionsky an | 2016 | 8739 | Taylor & Francis |
| | | | On robust Capon beamforming and diagonal loading | Jian Li and Petre St | 2003 | 1431 | IEEE |

Or, in Excel:

Example program output

Future

Collect research from more sources, export to RSS feed.

recesearch's People

Contributors

dependabot[bot], michaelfromyeg

recesearch's Issues

Add method of caching data

Right now the script makes a lot of rather, err, excessive calls to scholarly, which can lead to timeouts. Every time it runs a fill, the data should be cached, and before data is ever queried, the cache should be checked.

Though I've barely ever used it before, I think pickle is suitable for this, so I'll leave the reference here.
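One way this could look is sketched below: a dictionary keyed by Google Scholar ID, persisted with `pickle`, and consulted before any fetch. All names here (`load_cache`, `save_cache`, `cached_fill`, `fake_fetch`) are hypothetical, and `fake_fetch` is only a stand-in for whatever scholarly call the script actually makes.

```python
import os
import pickle
import tempfile


def load_cache(path):
    """Return the cached dict, or an empty dict if no cache file exists yet."""
    if not os.path.exists(path):
        return {}
    with open(path, "rb") as handle:
        return pickle.load(handle)


def save_cache(path, cache):
    """Persist the cache dict to disk."""
    with open(path, "wb") as handle:
        pickle.dump(cache, handle)


def cached_fill(scholar_id, fetch, cache):
    """Check the cache first; call `fetch` only on a miss and store the result."""
    if scholar_id not in cache:
        cache[scholar_id] = fetch(scholar_id)
    return cache[scholar_id]


calls = []


def fake_fetch(scholar_id):
    # Stand-in for the real (slow) scholarly query; records each call.
    calls.append(scholar_id)
    return {"id": scholar_id, "publications": []}


cache_path = os.path.join(tempfile.mkdtemp(), "scholar_cache.pickle")
cache = load_cache(cache_path)
cached_fill("ZImFmCUAAAAJ", fake_fetch, cache)
cached_fill("ZImFmCUAAAAJ", fake_fetch, cache)  # served from the cache
save_cache(cache_path, cache)
reloaded = load_cache(cache_path)
print(len(calls))  # prints 1
```

Since unpickling can execute arbitrary code, the cache file should only ever be one the script wrote itself.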

Group publications by research groups

Though the authors used currently are technically research groups, the results aren't very accurate. Provide an option such that:

  • for each author given, fetch the publications
  • grab data that is in the form research_group: [faculty_member, ..., faculty_member]
  • re-organize publications by those groups
  • if a publication doesn't have an author that belongs to a group, discard it
  • if a publication has authors belonging to multiple groups, majority rules
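The rules above could be sketched roughly as follows. This is only one possible reading of the issue: the function name `group_publications` and the data shapes are hypothetical, authors are matched by exact name (an assumption), and ties between groups go to whichever group was counted first (also an assumption, since the issue does not say how ties are resolved).

```python
from collections import Counter


def group_publications(publications, groups):
    """Assign each publication to the group with the most matching authors.

    publications: list of dicts with 'title' and 'authors' (list of names).
    groups: dict mapping research_group -> list of faculty member names.
    """
    member_to_groups = {}
    for group, members in groups.items():
        for member in members:
            member_to_groups.setdefault(member, []).append(group)

    grouped = {group: [] for group in groups}
    for pub in publications:
        votes = Counter()
        for author in pub["authors"]:
            for group in member_to_groups.get(author, []):
                votes[group] += 1
        if not votes:
            continue  # no author belongs to any group: discard
        winner = votes.most_common(1)[0][0]  # majority rules
        grouped[winner].append(pub)
    return grouped


# Made-up names purely for illustration.
groups = {
    "Biomedical Technologies": ["Alice", "Bob"],
    "Communication Systems": ["Carol"],
}
publications = [
    {"title": "P1", "authors": ["Alice", "Bob", "Carol"]},
    {"title": "P2", "authors": ["Carol", "Dave"]},
    {"title": "P3", "authors": ["Eve"]},
]
result = group_publications(publications, groups)
for group, pubs in result.items():
    print(group, [pub["title"] for pub in pubs])
```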
