OAG in Elasticsearch

Want to explore the Microsoft Academic Graph (MAG) and AMiner in a search engine? Use these scripts to load the MAG and AMiner datasets into Elasticsearch.

Requirements

pip install elasticsearch tqdm 

Datasets

OAG can be downloaded from here. Open Academic Graph (OAG) unifies two billion-scale academic graphs: Microsoft Academic Graph (MAG) and AMiner.

MAG V1

The MAG V1 dataset consists of 167 files named mag_papers_[0-166].txt. Run the script below to upload the dataset to Elasticsearch. The index name is set with the --index option and defaults to mag_v1. The script was tested on Elasticsearch >= 7.4 using only the English-language publications in MAG V1.

python index_mag_v1.py --inputs [path/mag_papers*.txt]
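
For readers curious how a script like this could work, here is a minimal sketch. It is not the repository's index_mag_v1.py. It assumes each line of a mag_papers_*.txt file is one JSON object, that the language code is stored in a "lang" field, and that the paper identifier is stored in an "id" field. The function names (parse_args, iter_actions, upload) and the host URL are hypothetical.

```python
import argparse
import json


def parse_args(argv=None):
    """Parse the two options the README mentions: --inputs and --index."""
    parser = argparse.ArgumentParser(description="Index MAG V1 papers (sketch).")
    parser.add_argument("--inputs", nargs="+", required=True,
                        help="paths to mag_papers_*.txt files")
    parser.add_argument("--index", default="mag_v1",
                        help="target Elasticsearch index name")
    return parser.parse_args(argv)


def iter_actions(paths, index):
    """Yield one bulk action per English paper, reading files line by line."""
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if not line:
                    continue
                paper = json.loads(line)
                # Assumption: the language code lives in a "lang" field.
                if paper.get("lang") != "en":
                    continue
                action = {"_index": index, "_source": paper}
                # Assumption: the paper identifier lives in an "id" field.
                if "id" in paper:
                    action["_id"] = paper["id"]
                yield action


def upload(paths, index, host="http://localhost:9200"):
    """Stream the actions to a cluster; the host URL is a placeholder."""
    # Imported here so the helpers above work without the client installed.
    from elasticsearch import Elasticsearch, helpers

    client = Elasticsearch(host)
    return helpers.bulk(client, iter_actions(paths, index))
```

Because iter_actions is a generator, the files are streamed rather than loaded into memory. Calling upload(args.inputs, args.index) after parse_args() would send the documents, provided a cluster is reachable at the assumed host.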

AMiner V1

The AMiner V1 dataset consists of 155 files named aminer_papers_[0-154].txt. Run the script below to upload the dataset to Elasticsearch. The index name is set with the --index option and defaults to aminer_v1. The script was tested on Elasticsearch >= 7.4 using only publications that have both a title and an abstract.

python index_aminer_v1.py --inputs [path/aminer_papers*.txt]
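
The "both title and abstract" condition can be pictured as a simple record filter. The sketch below is an illustration only, not the repository's code. It assumes one JSON object per line with "title" and "abstract" fields. The helper names has_title_and_abstract and iter_indexable are hypothetical.

```python
import json


def has_title_and_abstract(paper):
    """Return True when both fields are present and are non-blank strings."""
    for field in ("title", "abstract"):
        value = paper.get(field)
        if not isinstance(value, str) or not value.strip():
            return False
    return True


def iter_indexable(lines):
    """Parse JSON lines and keep only records that pass the check."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        paper = json.loads(line)
        if has_title_and_abstract(paper):
            yield paper
```

Passing an open file handle for one of the aminer_papers_*.txt files as lines would yield only the records eligible for indexing under this assumption.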
