
icml-nips-iclr-dataset's Introduction

Paper dataset from ICML, NeurIPS and ICLR

The dataset contains the titles, authors and author affiliations of all papers from the following years:

  • ICML: 2017-2023
  • NeurIPS: 2006-2022
  • ICLR: 2018-2023 (except 2020)

For each conference, the earliest year is the year in which it introduced the web interface that the scraping script is compatible with.

Conference,Year,Title,Author,Affiliation
NeurIPS,2006,Attentional Processing on a Spike-Based VLSI Neural Network,Yingxue Wang,"Swiss Federal Institute of Technology, Zurich"
NeurIPS,2006,Attentional Processing on a Spike-Based VLSI Neural Network,Rodney J Douglas,Institute of Neuroinformatics
NeurIPS,2006,Attentional Processing on a Spike-Based VLSI Neural Network,Shih-Chii Liu,"Institute for Neuroinformatics, University of Zurich and ETH Zurich"
NeurIPS,2006,Multi-Task Feature Learning,Andreas Argyriou,Ecole Centrale de Paris
# ...
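Each row describes one author of one paper, so a paper with three authors spans three rows. The following sketch, which uses only Python's standard `csv` module, regroups the rows into papers. It assumes that conference, year and title together identify a paper (two distinct papers with identical titles at the same conference in the same year would be merged), assumes UTF-8 encoding for the real file, and embeds the sample rows above so it runs on its own; `group_papers` is a hypothetical helper name, not part of the repository.

```python
import csv
import io
from collections import defaultdict

# The sample rows from above, embedded so the sketch runs without papers.csv.
SAMPLE = """Conference,Year,Title,Author,Affiliation
NeurIPS,2006,Attentional Processing on a Spike-Based VLSI Neural Network,Yingxue Wang,"Swiss Federal Institute of Technology, Zurich"
NeurIPS,2006,Attentional Processing on a Spike-Based VLSI Neural Network,Rodney J Douglas,Institute of Neuroinformatics
NeurIPS,2006,Attentional Processing on a Spike-Based VLSI Neural Network,Shih-Chii Liu,"Institute for Neuroinformatics, University of Zurich and ETH Zurich"
NeurIPS,2006,Multi-Task Feature Learning,Andreas Argyriou,Ecole Centrale de Paris
"""


def group_papers(rows):
    """Collect the (author, affiliation) pairs of each paper, keyed by (conference, year, title)."""
    papers = defaultdict(list)
    for row in rows:
        key = (row["Conference"], int(row["Year"]), row["Title"])
        papers[key].append((row["Author"], row["Affiliation"]))
    return dict(papers)


# For the real file, pass csv.DictReader(open("papers.csv", newline="", encoding="utf-8")).
papers = group_papers(csv.DictReader(io.StringIO(SAMPLE)))
for (conf, year, title), authors in papers.items():
    print(conf, year, title, len(authors))
```

The csv module handles the quoted affiliations that contain commas, which a naive `line.split(",")` would break apart.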

In 2020 the coronavirus pandemic forced the conferences to adopt a new virtual format. For that year only, ICLR announced its conference schedule on a separate page; from 2021 on, its schedule again lists poster sessions for all papers as usual. The scraping script does not cover this alternate 2020 schedule page, which is why the dataset contains no papers from ICLR 2020.

Update the Data

The simplest option, and the one that is guaranteed to be correct, is to re-scrape the whole dataset, as in the following example:

python scrape.py 2006-2021

A faster alternative is to scrape only the new data and append it to the CSV file:

python scrape.py --output update.csv 2019-2021
tail -n +2 update.csv >> papers.csv  # skip the header row of update.csv

The file is sorted by year, so appending at the end keeps the order intact. However, you need to take care not to end up with duplicate entries. Suppose the current file contains all papers up to 2019, but when it was created, only ICLR 2019 had taken place. If you later scrape 2019 again to add the other conferences as above, you will get the ICLR 2019 papers twice.
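One way to avoid such duplicates is to merge the new rows in Python instead of appending on the shell. The sketch below assumes both files share the header `Conference,Year,Title,Author,Affiliation` and are UTF-8 encoded. It treats a row as a duplicate only if all five columns match exactly, so a paper whose affiliation text changed between scrapes would still appear twice. `merge_rows` and `merge_files` are hypothetical helpers, not part of the repository.

```python
import csv

FIELDS = ("Conference", "Year", "Title", "Author", "Affiliation")


def merge_rows(existing, update):
    """Return `existing` plus every row of `update` not already present.

    Rows count as duplicates only if all five columns match exactly. The result
    is re-sorted by year; Python's sort is stable, so rows keep their relative
    order within each year.
    """
    seen = {tuple(row[f] for f in FIELDS) for row in existing}
    merged = list(existing)
    for row in update:
        key = tuple(row[f] for f in FIELDS)
        if key not in seen:
            seen.add(key)
            merged.append(row)
    merged.sort(key=lambda row: int(row["Year"]))
    return merged


def merge_files(papers_path, update_path):
    """Merge the CSV at `update_path` into the CSV at `papers_path` in place."""
    with open(papers_path, newline="", encoding="utf-8") as f:
        existing = list(csv.DictReader(f))
    with open(update_path, newline="", encoding="utf-8") as f:
        update = list(csv.DictReader(f))
    merged = merge_rows(existing, update)
    with open(papers_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(merged)


# Usage, with the file names from the shell example above:
# merge_files("papers.csv", "update.csv")
```

Re-sorting by year also keeps the file ordered when the update overlaps years that are already present, which plain appending does not guarantee.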

icml-nips-iclr-dataset's People

Contributors

martenlienen

icml-nips-iclr-dataset's Issues

Is there any way to merge the same institutions with different names?

Hey, thanks for your data, it is awesome!
I am trying to build a network of institutions in which edges represent collaborations. However, the data contains the same institution under different names, e.g., UCB = UC Berkeley = University of California, Berkeley. Do you know of a quick fix for this, or of a similar dataset with cleaner data?
Thanks!
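As the issue notes, affiliation strings are stored exactly as scraped, so merging name variants needs a mapping step. A minimal sketch, assuming a hand-curated alias table (the `ALIASES` entries below are illustrative and cover only the Berkeley example from this issue), canonicalizes affiliations and counts co-authorship edges between institutions. It expects papers as a mapping from a paper key to a list of `(author, affiliation)` pairs; `canonical` and `collaboration_edges` are hypothetical names, and the example papers are made up. Affiliation strings that name several institutions at once (such as "Institute for Neuroinformatics, University of Zurich and ETH Zurich" in the sample data) would need extra splitting that this sketch does not attempt.

```python
from collections import Counter
from itertools import combinations

# Hypothetical alias table mapping lower-cased name variants to one canonical
# name. It is not part of the dataset and would have to be curated by hand.
ALIASES = {
    "ucb": "University of California, Berkeley",
    "uc berkeley": "University of California, Berkeley",
    "university of california, berkeley": "University of California, Berkeley",
}


def canonical(affiliation):
    """Return the canonical name for an affiliation, or the cleaned input if unknown."""
    name = " ".join(affiliation.split())  # trim and collapse whitespace
    return ALIASES.get(name.lower(), name)


def collaboration_edges(papers):
    """Count how many papers each unordered pair of distinct institutions shares.

    `papers` maps a paper key to a list of (author, affiliation) pairs; empty
    affiliations are ignored.
    """
    edges = Counter()
    for authors in papers.values():
        institutions = sorted({canonical(aff) for _, aff in authors if aff.strip()})
        for pair in combinations(institutions, 2):
            edges[pair] += 1
    return edges


# Made-up example papers.
papers = {
    "paper-1": [("Author A", "UCB"), ("Author B", "Stanford University")],
    "paper-2": [("Author C", "UC Berkeley"), ("Author D", "Stanford University")],
}
edges = collaboration_edges(papers)
print(edges)
# Counter({('Stanford University', 'University of California, Berkeley'): 2})
```

Sorting the institutions before pairing makes each edge an ordered tuple, so the same collaboration is always counted under one key.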

Where is dataclasses?

As the title says: when I follow the instructions in "README.md", Python cannot find the dataclasses module.
Here is the error:

from dataclasses import dataclass
ModuleNotFoundError: No module named 'dataclasses'

I hope to hear from you soon.
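The `dataclasses` module entered the standard library in Python 3.7 (PEP 557), so this error usually means the script is being run with an older interpreter. A minimal check, assuming that is the cause; `has_dataclasses` is a hypothetical helper, not part of the repository:

```python
import sys


def has_dataclasses(version_info=sys.version_info):
    """Return True if this Python version ships `dataclasses` (added in 3.7, PEP 557)."""
    return tuple(version_info[:2]) >= (3, 7)


if not has_dataclasses():
    print(f"Python {sys.version.split()[0]} is too old; run the script with Python 3.7 or newer.")
```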
