aashitak / nyt-comments

A Python package that retrieves comments on New York Times articles, with retrieval customizable by timeline and search queries, and returns the comment and article data as pandas dataframes, with an option to store them as csv files.

Home Page: https://pypi.org/project/nytcomments/

License: MIT License

Python 31.52% Jupyter Notebook 68.48%

nyt-comments

The package includes three main functions that perform three distinct tasks in retrieving comments and articles from the New York Times as ready-to-use datasets for data science and machine learning projects:

  1. The main function get_dataset returns two dataframes, one for the articles and one for the comments on them. The retrieval can be customized through a number of optional parameters: a specific timeline for the articles; search keywords; filter queries on fields such as the day of the week, the word count of the articles, or the source; a maximum limit on the number of comments, articles, or both; sorting the articles by newest or oldest first; suppressing or activating the output log for the process; and saving the data as two csv files. The function returns only the articles that were open to comments, along with the comments on them.
  2. The function get_articles can be used as a wrapper for the NYT article search API. It returns cleaned-up, preprocessed article data as a ready-to-use pandas dataframe (with an option to store it in csv files). The retrieval can be customized with the same options as above. Unlike get_dataset, it returns all the articles that satisfy the search criteria.
  3. The function get_comments retrieves the comments on NYT article(s) given their urls. It can be used as a substitute for the comments-by-url option in the NYT Community API, which is now deprecated and, on account of an unresolved issue, only returns comments that were picked as editor's selections. Unlike the two functions above, it does not use the NYT API for the retrieval.
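To make the customization options above more concrete, here is a minimal sketch of how paged requests to the NYT article search API could be assembled. It is not the package's implementation: build_search_urls is a hypothetical helper, and treating page_upper as exclusive is an assumption. The endpoint and the query parameters (q, fq, begin_date, end_date, sort, page, api-key) are those of the public Article Search API. The sketch only builds URLs and does not send any request.

```python
from urllib.parse import urlencode

# Public endpoint of the NYT Article Search API (v2).
BASE_URL = "https://api.nytimes.com/svc/search/v2/articlesearch.json"


def build_search_urls(api_key, page_lower=0, page_upper=2, query=None,
                      filter_query=None, begin_date=None, end_date=None,
                      sort="newest"):
    """Return one request URL per result page in [page_lower, page_upper).

    Hypothetical helper for illustration; it is not part of nytcomments.
    Dates use the API's YYYYMMDD format.
    """
    optional = {
        "q": query,
        "fq": filter_query,
        "begin_date": begin_date,
        "end_date": end_date,
        "sort": sort,
    }
    # Drop unset options so they do not appear in the query string.
    params = {name: value for name, value in optional.items()
              if value is not None}
    params["api-key"] = api_key

    urls = []
    for page in range(page_lower, page_upper):
        # Each page of results needs its own request.
        page_params = dict(params, page=page)
        urls.append(BASE_URL + "?" + urlencode(page_params))
    return urls


urls = build_search_urls("YOUR_API_KEY", page_lower=0, page_upper=2,
                         query="climate", begin_date="20180101")
for url in urls:
    print(url)
```

A wrapper such as get_articles would additionally have to send these requests, respect the API's rate limits, and flatten the JSON responses into a dataframe.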

Dependencies

  • Python 3.4+
  • pandas
  • requests

Usage

from nytcomments.nytcomments import get_dataset

# ARTICLE_API_KEY is your NYT API key, passed as a string
articles_df, comments_df = get_dataset(ARTICLE_API_KEY, page_lower=0, page_upper=2)

Please refer to the tutorial for illustrations of the three functions get_dataset, get_comments and get_articles, as well as detailed information about the function arguments. The functions get_dataset and get_articles require an NYT API key, which can be obtained by registering at the NYT developers' site, whereas get_comments can be used without an API key. You must agree to the Terms of Use for the NYT article search API to use the key.
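Since get_dataset and get_articles need an API key, one common way to keep the key out of source code is to read it from an environment variable. The sketch below is only a suggestion: load_api_key is a hypothetical helper and the variable name NYT_API_KEY is an assumption, neither of which comes from the package.

```python
import os


def load_api_key(env=None, var_name="NYT_API_KEY"):
    """Return the API key stored under var_name.

    The variable name NYT_API_KEY is an assumption made for this sketch.
    """
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            "Set {} to your NYT API key first.".format(var_name))
    return key


# An explicit mapping is passed here so the snippet runs without a real key.
ARTICLE_API_KEY = load_api_key({"NYT_API_KEY": "demo-key"})
print(ARTICLE_API_KEY)
```

Calling load_api_key() with no arguments reads from the real environment, and the resulting string can then be passed as the first argument of get_dataset.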

Note: The dataset of comments posted on NYT articles in the periods Jan - May 2017 and Jan - April 2018 is available on Kaggle and was at the top of the 20 featured datasets as of April 28, 2018.

Acknowledgement

  • The url used to retrieve comments from a given article in the function get_comments is taken from the blog by Neal Caren.
  • The NYT article search API is used for the article search.
