Giter VIP home page Giter VIP logo

news-sharing's Introduction

README

Overview

This replication package constructs the results for the paper titled News-sharing as a measure of media alignment by Gabor Simonovits and Adam Vig, published in the Journal of Quantitative Description: Digital Media. A Makefile runs all of the code to clean and organize the data and generates the tables and figures in the paper. The replicator should expect the code to run for about 2-3 minutes.

Data Availability and Provenance Statements

Statement about Rights

  • I certify that the author(s) of the manuscript have legitimate access to and permission to use the data used in this manuscript.

License for Data

The code is licensed under a Creative Commons/CC-BY-NC/CC0 license. See LICENSE for details.

Summary of Availability

  • All data are publicly available.
  • Some data cannot be made publicly available.
  • No data can be made publicly available.

Details on each Data Source

The data were collected by the authors, and are available under a Creative Commons Non-commercial license.

  • Database of citations on 24.hu and origo.hu data/raw/newspaper_citations_data_raw.csv were collected using scraping code writen in Python by the authors. Scraper code is available at the code/scraper folder.

  • Validation data on the slant of Hungarian news portals data/validation/hungarian_media_slant.csv is from Bátorfy (2020).

  • Readership data for Hungarian domains There is a .csv or .xlsx file in the data/raw/gemius/ folder named as gemius_YEAR_MONTH for each consecutive month which contains the number of real user for Hungarian domains estimated by Gemius (2022).

Computational requirements

Software Requirements

The following versions of Python, R and GNU Make were used by the authors:

  • Python 3.10.0
  • R 4.1.2
  • GNU Make 3.81

Package requirements are stored in the Pipfile.lock and renv.lock files. pipenv is used for Python, renv for R dependency management. CNU Make for execution.

Note: requirements for scraping the citations database is listed in requirements.txt in the code/scraper folder.

Memory and Runtime Requirements

Summary

A few minutes needed to reproduce the analyses on a standard 2023 desktop machine. The authors would not recommend running the scraper code on a destop machine.

Details

The analysis code were last run on a 8-core M1 Macbook Pro with 16 GB RAM and MacOS version 13.2.

The scraper code were last run on a 8-core AWS EC2 server with 32 GB of RAM.

Instructions to Replicators

  • Scraper

There is no master script for the scraper code. The scripts scrape-24hu-sitemaps.py and scrape-origo-archive.py collect the urls for every published article at 24.hu or origo.hu respectively. 24hu_parser.py and origo_parser.py contain parser functions for the sites. The functions parse the html script of one article published on 24.hu or origo.hu respectively. The scraper codes were last executed on 2022.08.02.

Note: The parser functions might need modification to work properly, as the sites might change their HTML content since our last execution.

  • Analysis
  1. run make install in terminal to set up the environment and required folders for the analysis.
  2. run make in terminal to clean and organize data and replicate the 7 figures and 3 tables used in the paper.

Description of code

  • Scripts in code/scraper can be used to replicate the raw data used by the analysis.
  • code/activate_renv.R is run by make install and activates the virtual environment for R.
  • Scripts in code/data-processing/
    • clean and reformat the scraped citation database data/raw/newspaper_citations_data_raw.csv.
    • clean and concatenate the montly datafiles of real usership in data/raw/gemius/.
  • Scripts in code/figures generate all figures in the main body and the appendix of the article.
  • Scripts in code/tables generate all tables in the appendix of the article.

List of tables and figures with scripts

The provided code reproduces:

  • All numbers provided in text in the paper
  • All tables and figures in the paper
Exhibit Script
Figure 1 alignment_score_by_ncitations.py
Figure 2 xsec.R
Figure 3 dynamic.R
Figure A1 n_citations_cdf.py
Figure A2/a readership_ts.R
Figure A2/b readership_xsec.R
Figure A3/a xsec_hardnews.R
Figure A3/b xsec_softnews.R
Figure A4/a dynamic_hardnews.R
Figure A4/b dynamic_softnews.R
Table A1 number_of_articles.R
Table A2 citations_by_year_page.R
Table A3 hun_citations_sample_by_year_page.R

The output files are named after their corresponding script and are in results/figures or results/tables.

References


news-sharing's People

Contributors

adamvig96 avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.