
covid-19's Introduction

This repository contains raw and processed data related to COVID-19 in the Netherlands. We collect case numbers, hospitalizations, deaths, and a range of auxiliary data. The repository also contains the output of multiple statistical models and a daily report on the epidemiological situation in the Netherlands.

DOI for citation: https://doi.org/10.5281/zenodo.5163263

Dataset: COVID-19 case counts in The Netherlands

We collect COVID-19 case counts for The Netherlands. The numbers are collected daily from various sources, such as RIVM (National Institute for Public Health and the Environment), NICE (Nationale Intensive Care Evaluatie), and the National Corona Dashboard. The data in this repository are mainly used to inform the general public with daily updates on COVID-19 case counts. This project standardizes and publishes the data and makes it Findable, Accessible, Interoperable, and Reusable (FAIR).

Translated from the Dutch version:

We collect COVID-19 disease figures for the Netherlands. The figures are collected daily from RIVM (National Institute for Public Health and the Environment), NICE (Nationale Intensive Care Evaluatie), LCPS (Landelijk Coördinatiecentrum Patiënten Spreiding), and the National Corona Dashboard. The data in this repository are mainly used to inform the general public with daily updates on COVID-19 disease figures. This project standardizes and publishes the data and makes it findable, accessible, interoperable, and reusable (FAIR).

License

The graphs and data are licensed CC0. The original data is licensed under the 'Public Domain Mark' by the RIVM.

Datasets

The datasets available in this repository are updated on a daily basis. Availability depends on publication by the respective sources (N.B. since July 1st, the epidemiological reports published by RIVM are released weekly instead of daily). The project divides the datasets into four main categories:

NICE data

This folder contains various raw datasets from the NICE website, as well as compilations of those raw data. NICE is the national organization for IC data, but it has collected COVID-19 data from clinical departments as well. Data is collected from NICE every day at 14:00.

covid-19's People

Contributors

davylandman, edwinveldhuizen, hjwesteneng, jofam, mzelst, nienkeipenburg, yorickbleijenberg


covid-19's Issues

Columns are different

A while ago you had these two columns:

  • positivetests
  • values.tested_total

These no longer exist. Have they been replaced? And if so, by which ones?

all_data.csv has a different first line

In the file:
data/all_data.csv

the first line is no longer the row with column headers; instead it reads:
<<<<<<< HEAD

Is this a mistake, or will it stay this way from now on?
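A line reading `<<<<<<< HEAD` is a Git merge conflict marker that was committed along with the file. As a minimal sketch, a check like the following could flag such lines before a CSV is published. The function name and the sample lines are hypothetical and are not taken from the repository's scripts.

```python
# Prefixes Git writes around conflicting hunks. A line starting with
# "=======" is treated as a marker too, which is an assumption that
# holds for typical CSV content.
CONFLICT_PREFIXES = ("<<<<<<< ", "=======", ">>>>>>> ")


def find_conflict_markers(lines):
    """Return the 1-based numbers of lines that look like conflict markers."""
    return [
        number
        for number, line in enumerate(lines, start=1)
        if line.startswith(CONFLICT_PREFIXES)
    ]


# Made-up file content for illustration only.
sample_lines = [
    "<<<<<<< HEAD",
    "date,cases",
    "2020-07-01,1",
    "=======",
    "date,cases,extra",
    ">>>>>>> other-branch",
]

marker_lines = find_conflict_markers(sample_lines)
print(marker_lines)
```

An empty result means no markers were found; a non-empty one lists the lines to inspect before the file is used downstream.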

Calculate daily positive test rate from values.tested_total and values.infected for higher precision

First of all, thanks for the amazing work done daily!

Currently values.infected_percentage in data-dashboards/percentage-positive-daily-national.csv is rounded to one decimal. With the current high infection rates this is not a big problem, but once we (hopefully) go back to a rate below 2%, the loss of precision becomes quite significant.

For the plots it would be better to calculate the rate from the source data: values.infected / values.tested_total (multiplied by 100 for a percentage).
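The calculation proposed here can be sketched as follows. The column names values.infected and values.tested_total come from the issue; the `date` column, the sample values, and the function name are assumptions for illustration, and the real file layout may differ.

```python
import csv
import io

# Made-up sample rows; not real data from the repository.
SAMPLE = """date,values.infected,values.tested_total
2021-01-01,8000,60000
2021-01-02,150,9000
"""


def positive_rate(infected: int, tested_total: int) -> float:
    """Return the positive test rate in percent, without rounding."""
    if tested_total <= 0:
        raise ValueError("tested_total must be positive")
    return infected / tested_total * 100


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
rates = {
    row["date"]: positive_rate(
        int(row["values.infected"]), int(row["values.tested_total"])
    )
    for row in rows
}

for date, rate in rates.items():
    # Compare the unrounded rate with the one-decimal version.
    print(f"{date}: {rate:.3f}% (one decimal: {round(rate, 1)}%)")
```

At low rates the one-decimal value hides a relatively larger share of the signal, which is the precision loss the issue describes.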

Eemsdelta en Haaren

In the province of Groningen three municipalities merged: Appingedam, Delfzijl and Loppersum. Since the first of January 2021 the new municipality is called Eemsdelta.

In reports/daily_report.pdf, the figures reports/daily_report_files/figure-latex/Gemeentes - Sinds vorige week-1. and reports/daily_report_files/figure-latex/Gemeentes - sinds gisteren-1. still show Appingedam, Delfzijl and Loppersum as separate municipalities.
For reports/daily_report_files/figure-latex/Gemeentes - Sinds vorige week-1. one can argue whether it should show Eemsdelta or Appingedam, Delfzijl and Loppersum, but because 2021 is already more than 3.5 days old, the average should give Eemsdelta :-p

I have no experience with R, so I can't work out how to merge the data of the three municipalities into a new municipality.

According to Wikipedia there is a similar issue in Noord-Brabant with Haaren: Gemeentelijke herindelingen
I think Haaren is still visible in the images?
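The merge of the three municipalities into Eemsdelta can be sketched as a name mapping followed by a sum. This is a Python illustration, not the repository's R code; the function name and the counts are made up. A plain name mapping like this only works when whole municipalities merge into one, so Haaren may need different handling.

```python
from collections import defaultdict

# Municipalities that merged into Eemsdelta on 1 January 2021,
# as described in the issue text.
MERGED_INTO = {
    "Appingedam": "Eemsdelta",
    "Delfzijl": "Eemsdelta",
    "Loppersum": "Eemsdelta",
}


def merge_municipalities(counts):
    """Sum counts of merged municipalities under their new name."""
    merged = defaultdict(int)
    for name, count in counts.items():
        # Names without a mapping are kept unchanged.
        merged[MERGED_INTO.get(name, name)] += count
    return dict(merged)


# Made-up counts for illustration only.
example_counts = {"Appingedam": 3, "Delfzijl": 7, "Loppersum": 2, "Groningen": 40}
merged_counts = merge_municipalities(example_counts)
print(merged_counts)
```

Per-capita figures would also need the populations summed the same way before dividing.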

Compress original datasets to reduce repository size

Currently the repository grows quite a bit per day; a fresh clone takes 600MB. Luckily git already de-duplicates chunks of the data files internally, otherwise it would have been a bit more than 2GB.

Most of the data usage goes into data-rivm. Compression is an easy trick to reduce the file size of commits and the git repo.

R (and other programs) have automatic support for reading compressed CSV files. If you, for example, compress the files with gzip (the stream compressor: gzip -9 -k COVID-19_casus_landelijk_2020-10-20.csv, not the zip application), you reduce the casus file size from 26MB to 884KB. (xz (aka lzma2/7zip) would take it down to 761KB, at the cost of being slower):

$ ll -h  COVID-19_casus_landelijk_2020-10-20.csv*
-rw-r--r-- 1 Davy   26M Oct 21 14:08 COVID-19_casus_landelijk_2020-10-20.csv
-rw-r--r-- 1 Davy  884K Oct 21 14:08 COVID-19_casus_landelijk_2020-10-20.csv.gz

In R, if you open a csv.gz file, it will automatically decompress the file in memory before parsing the csv.
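For readers working outside R, the same round trip can be sketched in Python with the standard library `gzip` and `csv` modules. The file name and rows below are made up; unlike R, Python needs the explicit `gzip.open` call to decompress.

```python
import csv
import gzip
import os
import tempfile

# Made-up rows for illustration; not real data from the repository.
rows = [["date", "province", "cases"], ["2020-10-20", "Groningen", "12"]]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.csv.gz")

    # "wt" opens the gzip stream in text mode so csv can write to it.
    with gzip.open(path, "wt", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerows(rows)

    # "rt" decompresses on the fly while the csv reader parses the text.
    with gzip.open(path, "rt", newline="", encoding="utf-8") as handle:
        read_back = list(csv.reader(handle))

print(read_back == rows)
```

The data is decompressed as a stream while reading, so no uncompressed copy has to be written to disk.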

B.1.1.7 (UK) and B.1.351 (South Africa) variants

With the B.1.351 and especially the B.1.1.7 variant spreading quickly, I think it would be important to start publishing data to raise awareness. I have been searching for good data on the B.1.1.7 and B.1.351 lineages recently, but unfortunately the RIVM and the GGDs don't publish anything yet.

cov-lineages.org, however, is tracking all sequences shared with GISAID and shared these two reports:

They also provide a GitHub repo with most of their data and scripts.

I'm not sure yet what the best way to report this data daily would be, but maybe a global case count, an EU case count, or the case counts of neighboring countries could be reported. The last update from the RIVM about B.1.1.7 indicated 50 cases on January 6th, while the GISAID data from January 7th reported only 13 cases from The Netherlands, so it's lagging a bit behind.

I also opened an issue on the Corona Dashboard repo, but unfortunately haven't had any response: minvws/nl-covid19-data-dashboard#1310

all_data.csv number of columns changed with today's commit

Hi Marino,

First of all, thanks for all your work on this repository.
We are happily using your dataset from all_data.csv for a dashboard.
I have noticed that the structure of this file changes from time to time, which means our dashboard can no longer process the data correctly.
I understand that you prepare the data primarily for your own reports, but personally I would really appreciate it if this structure did not change too often.

As of today, for example, the column X.1 is suddenly no longer available.
Is there a reason this column is gone? And should we expect this kind of change more often?

Regards,
Maarten

Age groups of 5 instead of 10 years

Would it be possible to split out the age groups in sections of 5 years instead of 10 years? So 0-4 and 5-9 instead of 0-9, etc.?

Especially for 10-14 and 15-19, and 20-24 and 25-29 there are big differences.

data/all_data.csv has wrong date for entry 2020-07-01

Hi all,

I spotted this last night but only figured out the source now. In data/all_data.csv there are two entries for 2020-07-02. The first of those entries is actually for 2020-07-01. Since all_data.csv is generated from the daily RIVM files, the source of this can be seen in the individual case files per day:

"2020-07-02",50335,11878,6115

While the file name is correct, the date for this entry is listed as 2020-07-02.
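A duplicate date like this can be caught with a small check. The sketch below is a Python illustration with made-up rows; the `date` column name and the function name are assumptions and do not come from the repository.

```python
import csv
import io
from collections import Counter

# Made-up rows that mimic the situation: 2020-07-02 appears twice
# and 2020-07-01 is missing.
SAMPLE = """date,value
2020-06-30,1
2020-07-02,2
2020-07-02,3
"""


def duplicated_dates(lines, date_column="date"):
    """Return the sorted dates that occur more than once."""
    counts = Counter(row[date_column] for row in csv.DictReader(lines))
    return sorted(date for date, n in counts.items() if n > 1)


duplicates = duplicated_dates(io.StringIO(SAMPLE))
print(duplicates)
```

Running such a check after the daily files are combined would point to the rows to compare against the source file names.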

Kind regards,

Fryslân wrong as FryslÃ¢n in PDF and TEX: double-encoded UTF-8

(replying in Dutch is fine too)

Fryslân is wrong in the PDF and TEX, probably because it is encoded as UTF-8 twice


Cause: https://github.com/mzelst/covid-19/blob/master/reports/daily_report.tex#L225

So that says "FryslÃ¢n". Inspection shows that the "â" takes up 4 bytes in total:

/git/covid-19/reports$ cat daily_report.tex | grep Frysl | head -1  | hd
00000000  46 72 79 73 6c c3 83 c2  a2 6e 20 26 20 31 37 30  |Frysl....n & 170|
00000010  32 20 26 20 32 36 31 2e  31 20 26 20 31 20 26 20  |2 & 261.1 & 1 & |
00000020  30 2e 32 20 26 20 31 20  26 20 30 2e 32 5c 5c 0a  |0.2 & 1 & 0.2\\.|
00000030

So: c3 83 c2 a2 ... that's a bit too much.

In other locations, Fryslân is spelled correctly:

~/git/covid-19$ cat data-rivm/disabled-people-per-day/rivm_daily_2020-12-23.csv | grep Frysl | head -1
"2020-12-23 10:00:00","2020-07-01","VR02","Fryslân",0,0,0,0

... and encoded correctly: just one 2-byte UTF-8 sequence, c3 a2

~/git/covid-19$ cat data-rivm/disabled-people-per-day/rivm_daily_2020-12-23.csv | grep Frysl | head -1  | hd
00000000  22 32 30 32 30 2d 31 32  2d 32 33 20 31 30 3a 30  |"2020-12-23 10:0|
00000010  30 3a 30 30 22 2c 22 32  30 32 30 2d 30 37 2d 30  |0:00","2020-07-0|
00000020  31 22 2c 22 56 52 30 32  22 2c 22 46 72 79 73 6c  |1","VR02","Frysl|
00000030  c3 a2 6e 22 2c 30 2c 30  2c 30 2c 30 0a           |..n",0,0,0,0.|
0000003d
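The byte sequence c3 83 c2 a2 is what results when correct UTF-8 bytes are mis-read as Latin-1 and then encoded as UTF-8 again. The sketch below reproduces that and reverses it. That the report pipeline made exactly this mistake is an assumption; only the resulting bytes are confirmed by the hex dumps above.

```python
# "\u00e2" is the letter a with circumflex, written as an escape to keep
# this snippet independent of the source file encoding.
correct = "Frysl\u00e2n"

good_bytes = correct.encode("utf-8")
print(good_bytes.hex(" "))  # the "a with circumflex" is c3 a2

# Assumed failure mode: the UTF-8 bytes are decoded as Latin-1 and the
# resulting text is encoded as UTF-8 a second time.
double_encoded = good_bytes.decode("latin-1").encode("utf-8")
print(double_encoded.hex(" "))  # now contains c3 83 c2 a2

# Reversing the two steps restores the original text.
repaired = double_encoded.decode("utf-8").encode("latin-1").decode("utf-8")
print(repaired == correct)
```

The lasting fix would be to read the input with an explicit UTF-8 encoding wherever the report is generated, so that no repair step is needed.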
