
covid-19-open-data's Introduction

Official Site

Please refer to the official site for this repository for visualizations and other relevant information: https://health.google.com/covid-19/open-data/

Repository No Longer Updated

As of September 15, 2022, we will be turning off real-time updates in this repository, and converting the repository to a retrospective one. The data will continue to be available without interruption for the foreseeable future at the existing location, but it will not be updated further. Users who wish to continue to receive updates are encouraged to inspect our data sources, or clone the code and run the data pipelines locally.

COVID-19 Open-Data

This repository attempts to assemble the largest COVID-19 epidemiological database, in addition to a powerful set of expansive covariates. It includes open, publicly sourced, licensed data relating to demographics, economy, epidemiology, geography, health, hospitalizations, mobility, government response, weather, and more. Moreover, the data merges daily time series from more than 20,000 global sources at a fine spatial resolution, using a consistent set of region keys. All regions are assigned a unique location key, which resolves discrepancies between ISO / NUTS / FIPS codes, etc. The different aggregation levels are:

  • 0: Country
  • 1: Province, state, or local equivalent
  • 2: Municipality, county, or local equivalent
  • 3: Locality which may not follow strict hierarchical order, such as "city" or "nursing homes in X location"

There are multiple types of data:

  • Outcome data Y(i,t), such as cases, tests, hospitalizations, deaths and recoveries, for region i and time t
  • Static covariate data X(i), such as population size, health statistics, economic indicators, geographic boundaries
  • Dynamic covariate data X(i,t), such as mobility, search trends, weather, and government interventions

The data is drawn from multiple sources, as listed below, and stored in separate tables as CSV files grouped by context. Because all tables use consistent geographic (and temporal) keys, they can be easily merged, as is done for the aggregated table.

Table | Keys¹ | Content | URL | Source²
Aggregated | [key][date] | Flat, compressed table with records from (almost) all other tables joined by date and/or key; see below for more details | aggregated.csv | All tables below
Index | [key] | Various names and codes, useful for joining with other datasets | index.csv, index.json | Wikidata, DataCommons, Eurostat
Demographics | [key] | Various (current³) population statistics | demographics.csv, demographics.json | Wikidata, DataCommons, WorldBank, WorldPop, Eurostat
Economy | [key] | Various (current³) economic indicators | economy.csv, economy.json | Wikidata, DataCommons, Eurostat
Epidemiology | [key][date] | COVID-19 cases, deaths, recoveries and tests | epidemiology.csv, epidemiology.json | Various²
Emergency Declarations | [key][date] | Government emergency declarations and mitigation policies | lawatlas-emergency-declarations.csv | LawAtlas Project
Geography | [key] | Geographical information about the region | geography.csv, geography.json | Wikidata
Health | [key] | Health indicators for the region | health.csv, health.json | Wikidata, WorldBank, Eurostat
Hospitalizations | [key][date] | Information related to patients of COVID-19 and hospitals | hospitalizations.csv, hospitalizations.json | Various²
Mobility | [key][date] | Various metrics related to the movement of people (to download or use the data, you must agree to the Google Terms of Service) | mobility.csv, mobility.json | Google
Search Trends | [key][date] | Trends in symptom search volumes due to COVID-19 (to download or use the data, you must agree to the Google Terms of Service) | google-search-trends.csv | Google
Vaccination Access | [place_id] | Metrics quantifying access to COVID-19 vaccination sites (to download or use the data, you must agree to the Google Terms of Service) | facility-boundary-us-all.csv | Google
Vaccination Search | [key][date] | Trends in Google searches for COVID-19 vaccination information (to download or use the data, you must agree to the Google Terms of Service) | Global-vaccination-search-insights.csv | Google
Vaccinations | [key][date] | Trends in persons vaccinated and population vaccination rate regarding various COVID-19 vaccines | vaccinations.csv | Google
Government Response | [key][date] | Government interventions and their relative stringency | oxford-government-response.csv, oxford-government-response.json | University of Oxford
Weather | [key][date] | Dated meteorological information for each region | weather.csv | NOAA
WorldBank | [key] | Latest record for each indicator from WorldBank for all reporting countries | worldbank.csv, worldbank.json | WorldBank
By Age | [key][date] | Epidemiology and hospitalizations data stratified by age | by-age.csv, by-age.json | Various²
By Sex | [key][date] | Epidemiology and hospitalizations data stratified by sex | by-sex.csv, by-sex.json | Various²

¹ key is a unique string for the specific geographical region built from a combination of codes such as ISO 3166, NUTS, FIPS and other local equivalents.
² Refer to the data sources for specifics about each data source and the associated terms of use.
³ Datasets without a date column contain the most recently reported information for each datapoint to date.

For more information about how to use these files see the section about using the data, and for more details about each dataset see the section about understanding the data.

Why another dataset?

There are many other public COVID-19 datasets. However, we believe this dataset is unique in that it merges multiple global sources at a fine spatial resolution, using a consistent set of region keys, in a way that we hope makes it easy to use. Most importantly, we are committed to transparency regarding open, public, and licensed data sources. Lastly, the code for ingesting and merging the data is easy to understand and modify.

Explore the data

A simple visualization tool, the Open COVID-19 Explorer, was built to explore the Open COVID-19 datasets. A variety of other community-contributed visualization tools are listed below:

  • The COVID19 Data Block made by the Looker team
  • Interactive charts with a unique UX, built by @Mahks using the Open COVID-19 dataset
  • A MapBox-powered interactive map site, the great work of @quixote79
  • Clean, clear graphs with smooth animations, thanks to the work of @jmullo
  • The COVID-19 timeline simulation tool built by @LeviticusMB, for armchair epidemiologists
  • A COVID-19 Daily Tracking site by @saadmas, whether you want an interactive map, to compare stats, or to look at charts
  • Per-million data comparisons at Omnimodel, thanks to @OmarJay1
  • Responsive, comprehensive charts, thanks to the work of @davidjohnstone
  • Reproduction Live, which lets you track COVID-19 outbreaks in your region and visualise the spread of the virus over time

Use the data

The data is available as CSV and JSON files, which are published in Google Cloud Storage so they can be served directly to JavaScript applications without the need for a proxy to set the correct headers for CORS and content type.

To make the data as easy to use as possible, there is an aggregated table which contains the columns of all other tables joined by key and date. However, performance-wise, it may be better to download the data separately and join the tables locally, as sketched below.
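
For instance, here is a minimal pandas sketch of that approach; the demographics URL is an assumption that follows the same v3 path pattern as the epidemiology file used later in this README:

import pandas as pd

# Download two tables separately and join them locally on the shared key.
epi = pd.read_csv("https://storage.googleapis.com/covid19-open-data/v3/epidemiology.csv")
dem = pd.read_csv("https://storage.googleapis.com/covid19-open-data/v3/demographics.csv")

# Epidemiology is keyed by [key][date] while demographics is keyed by [key]
# only, so the demographic columns repeat across each region's dates.
merged = epi.merge(dem, on="key")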

Each region has its own version of the aggregated table, so you can pull all the data for a specific region using a single endpoint. The URL for each region is:

  • Data for key in CSV format: https://storage.googleapis.com/covid19-open-data/v3/location/${key}.csv
  • Data for key in JSON format: https://storage.googleapis.com/covid19-open-data/v3/location/${key}.json
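
For example, a minimal pandas sketch of pulling a single region's data from this endpoint; US_CA (California) is used here as an illustrative key, and new_confirmed is one of the epidemiology columns:

import pandas as pd

# Interpolate the location key into the per-region endpoint and load it.
key = "US_CA"
url = f"https://storage.googleapis.com/covid19-open-data/v3/location/{key}.csv"
data = pd.read_csv(url)

# The per-region file contains the aggregated columns for that key only.
print(data[["date", "new_confirmed"]].tail())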

Each table has a full version as well as a subset with only the last day of data. The full version is accessible at the URL described in the table above. The subsets can be found by inserting latest into the path. For example, the epidemiology table is available at the following locations:

  • Full table: https://storage.googleapis.com/covid19-open-data/v3/epidemiology.csv
  • Latest subset: https://storage.googleapis.com/covid19-open-data/v3/latest/epidemiology.csv

Please note that the aggregated table is not compressed for the latest subset, so the URL is https://storage.googleapis.com/covid19-open-data/v3/latest/aggregated.csv.

Note that the latest version contains the last non-null record for each key. All of the above listed tables have a corresponding JSON version; simply replace csv with json in the link.

If you are trying to use this data alongside your own datasets, then you can use the Index table to get access to the ISO 3166 / NUTS / FIPS codes, although administrative subdivisions are not consistent among all reporting regions. For example, for intra-country reporting, some EU countries use NUTS2, others NUTS3, and many use ISO 3166-2 codes.
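
As a sketch of that workflow, the snippet below attaches location keys to a hypothetical dataset keyed by ISO 3166-1 alpha-3 codes; the iso_3166_1_alpha_3 and aggregation_level column names are assumptions based on the v3 index schema, so verify them against the schema documentation:

import pandas as pd

# Load the index table, which maps location keys to ISO / NUTS / FIPS codes,
# and keep country-level entries only so the join does not fan out.
index = pd.read_csv("https://storage.googleapis.com/covid19-open-data/v3/index.csv")
countries = index[index["aggregation_level"] == 0]

# A hypothetical user dataset keyed by ISO 3166-1 alpha-3 country codes.
my_data = pd.DataFrame({"iso_3166_1_alpha_3": ["USA", "FRA"], "my_metric": [1.0, 2.0]})

# Attach the location key; afterwards any table here can be joined on "key".
merged = my_data.merge(countries[["key", "iso_3166_1_alpha_3"]], on="iso_3166_1_alpha_3")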

You can find several examples in the examples subfolder, with code showcasing how to load and analyze the data in several programming environments. If you want the short version, here are a few snippets to get started.

BigQuery

This dataset is part of the BigQuery Public Datasets Program, so you may use BigQuery to run SQL queries directly from the online query editor free of charge.
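
If you prefer to query from Python instead of the online editor, here is a sketch using the official client; the bigquery-public-data.covid19_open_data.covid19_open_data table name and the location_key column are assumptions based on the public dataset listing, so adjust them if the listing differs:

from google.cloud import bigquery  # pip install google-cloud-bigquery

# Requires GCP credentials and a project to bill the query against.
client = bigquery.Client()
query = """
    SELECT date, cumulative_confirmed, cumulative_deceased
    FROM `bigquery-public-data.covid19_open_data.covid19_open_data`
    WHERE location_key = 'AU'
    ORDER BY date DESC
    LIMIT 7
"""
for row in client.query(query).result():
    print(row.date, row.cumulative_confirmed, row.cumulative_deceased)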

Google Colab

You can use Google Colab if you want to run your analysis without having to install anything on your computer; simply go to this URL: https://colab.research.google.com/github/GoogleCloudPlatform/covid-19-open-data.

Google Sheets

You can import the data directly into Google Sheets, as long as you stay within the size limits. For instance, the following formula loads the latest epidemiology data into the current sheet:

=IMPORTDATA("https://storage.googleapis.com/covid19-open-data/v3/latest/epidemiology.csv")

Note that Google Sheets has a size limitation, so only data from the latest subfolder can be imported automatically. To work around that, simply download the file and import it via the File menu.

R

If you prefer R, then this is all you need to do to load the epidemiology data:

data <- read.csv("https://storage.googleapis.com/covid19-open-data/v3/epidemiology.csv")

Python

In Python, you need to have the package pandas installed to get started:

import pandas
data = pandas.read_csv("https://storage.googleapis.com/covid19-open-data/v3/epidemiology.csv")
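
Building on that snippet, here is a small sketch (assuming only the key and new_confirmed columns shown elsewhere in this README) that narrows the table to one country and smooths the daily counts:

import pandas

data = pandas.read_csv("https://storage.googleapis.com/covid19-open-data/v3/epidemiology.csv")

# Keep a single country (Australia) and compute a 7-day rolling average.
au = data[data.key == "AU"].sort_values("date").copy()
au["new_confirmed_7d"] = au["new_confirmed"].rolling(7).mean()
print(au[["date", "new_confirmed", "new_confirmed_7d"]].tail())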

jQuery

The JSON files can be loaded using jQuery directly from the output folder. This code snippet loads the epidemiology table into the data variable:

$.getJSON("https://storage.googleapis.com/covid19-open-data/v3/epidemiology.json", data => { ... });

PowerShell

You can also use PowerShell to get the latest data for a country directly from the command line. For example, to query the latest epidemiology data for Australia:

Invoke-WebRequest 'https://storage.googleapis.com/covid19-open-data/v3/latest/epidemiology.csv' | ConvertFrom-Csv | `
    where key -eq 'AU' | select date,cumulative_confirmed,cumulative_deceased,cumulative_recovered

Understand the data

Make sure that you are using the URL linked in the table above and not the raw GitHub file: the latter is subject to change at any moment in non-compatible ways, and due to the configuration of GitHub's raw file server you may run into caching issues.

Missing values will be represented as nulls, whereas zeroes are used when a true value of zero is reported.

For information about each table, see the corresponding documentation linked above.

Aggregated table

Flat table with records from all other tables joined by key and date. See above for links to the documentation for each individual table. Due to technical limitations, not all tables can be included as part of this aggregated table.

Notes about the data

For countries where both country-level and subregion-level data are available, the entry which has a null value for the subregion-level columns in the index table indicates upper-level aggregation. For example, if a data point has values {country_code: US, subregion1_code: CA, subregion2_code: null, ...}, then that record contains data aggregated at the subregion1 (i.e. state/province) level. If subregion1_code were null, then it would be data aggregated at the country level.

Another way to tell the level of aggregation is the aggregation_level column of the index table; see the schema documentation for more details about how to interpret it.
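
As a minimal sketch of using that column (only the key and aggregation_level columns are assumed), this filters a table down to country-level records:

import pandas as pd

# Collect the keys whose aggregation_level is 0, i.e. country-level entries.
index = pd.read_csv("https://storage.googleapis.com/covid19-open-data/v3/index.csv")
country_keys = set(index.loc[index["aggregation_level"] == 0, "key"])

# Restrict any table, e.g. epidemiology, to country-level records only.
epi = pd.read_csv("https://storage.googleapis.com/covid19-open-data/v3/epidemiology.csv")
epi_countries = epi[epi["key"].isin(country_keys)]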

Please note that, sometimes, the country-level data and the region-level data come from different sources, so adding up all region-level values may not equal the reported country-level value exactly. See the data loading tutorial for more information.

Data updates

The data for each table is updated at least daily. Individual tables, for example Epidemiology, have fresher data than the aggregated table and are updated multiple times a day. Each individual data source has its own update schedule and some are not updated at regular intervals; the data tables hosted here only reflect the latest data published by the sources.

Contribute

Technical contributions to the data extraction pipeline are welcome; take a look at the source directory for more information.

If you spot an error in the data, feel free to open an issue on this repository and we will review it.

If you do something with this data, for example a research paper or work related to visualization or analysis, please let us know!

For Data Owners

We have carefully checked the license and attribution information on each data source included in this repository, and in many cases have contacted the data owners directly to ask how they would like to be attributed.

If you are the owner of a data source included here and would like us to remove data, add or alter an attribution, or add or alter license information, please open an issue on this repository and we will happily consider your request.

Licensing

The output data files are published under the CC BY license. All data is subject to the terms of agreement individual to each data source, refer to the sources of data table for more details. All other code and assets are published under the Apache License 2.0.

Sources of data

All data in this repository is retrieved automatically. When possible, data is retrieved directly from the relevant authorities, like a country's ministry of health. For a list of individual data sources, please see the documentation for the individual tables linked at the top of this page.

Running the data extraction pipeline

See the source documentation for more technical details.

Acknowledgments and collaborations

This project was done in collaboration with FinMango, which provided great insights about the impact of the pandemic on local economies and also helped with research and manual curation of data sources for many regions, including South Africa and US states.

Stratified mortality data for US states is provided by Imperial College London. Please refer to this list of maintainers and contributors for the individual acknowledgements.

The following persons have made significant contributions to this project:

  • Oscar Wahltinez
  • Kevin Murphy
  • Michael Brenner
  • Matt Lee
  • Anthony Erlinger
  • Mayank Daswani
  • Pranali Yawalkar
  • Zack Ontiveros
  • Ruth Alcantara
  • Donny Cheung
  • Aurora Cheung
  • Chandan Nath
  • Paula Le
  • Ofir Picazo Navarro

Recommended citation

Please use the following when citing this project as a source of data:

@article{Wahltinez2020,
  author = "O. Wahltinez and others",
  year = 2020,
  title = "COVID-19 Open-Data: curating a fine-grained, global-scale data repository for SARS-CoV-2",
  note = "Work in progress",
  url = {https://goo.gle/covid-19-open-data},
}

covid-19-open-data's People

Contributors

a27cheung, alvarosg, charlottestanton, dependabot[bot], dhan16, donnyc, dsmurrell, geening, geraschenko, jonathanamar-v, l2edzl3oy, mansi-kansal, murphyk, owahltinez, pbattaglia, pranalipy, pranaliyawalkar, shekelto, themonk911, tildechris


covid-19-open-data's Issues

Automated error report

Traceback (most recent call last):
  File "/env/lib/python3.7/site-packages/urllib3/connection.py", line 160, in _new_conn
    (self._dns_host, self.port), self.timeout, **extra_kw
  File "/env/lib/python3.7/site-packages/urllib3/util/connection.py", line 61, in create_connection
    for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
  File "/opt/python3.7/lib/python3.7/socket.py", line 748, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known

ZA fetch fails to find sheet "Eastern Cape"

Traceback (most recent call last):
  File "/home/vmagent/app/lib/pipeline.py", line 164, in _run_wrapper
    return data_source.run(output_folder, cache, aux)
  File "/home/vmagent/app/lib/data_source.py", line 226, in run
    data = self.parse(data, {name: df.copy() for name, df in aux.items()}, **parse_opts)
  File "/home/vmagent/app/lib/data_source.py", line 97, in parse
    return self.parse_dataframes(self._read(sources, **read_opts), aux, **parse_opts)
  File "/home/vmagent/app/lib/data_source.py", line 78, in _read
    return {name: read_file(file_path, **read_opts) for name, file_path in file_paths.items()}
  File "/home/vmagent/app/lib/data_source.py", line 78, in <dictcomp>
    return {name: read_file(file_path, **read_opts) for name, file_path in file_paths.items()}
  File "/home/vmagent/app/lib/io.py", line 102, in read_file
    return pandas.read_excel(path, **{**default_read_opts, **read_opts})
  File "/env/lib/python3.7/site-packages/pandas/io/excel/_base.py", line 334, in read_excel
    **kwds,
  File "/env/lib/python3.7/site-packages/pandas/io/excel/_base.py", line 888, in parse
    **kwds,
  File "/env/lib/python3.7/site-packages/pandas/io/excel/_base.py", line 439, in parse
    sheet = self.get_sheet_by_name(asheetname)
  File "/env/lib/python3.7/site-packages/pandas/io/excel/_xlrd.py", line 43, in get_sheet_by_name
    return self.book.sheet_by_name(name)
  File "/env/lib/python3.7/site-packages/xlrd/book.py", line 476, in sheet_by_name
    raise XLRDError('No sheet named <%r>' % sheet_name)
xlrd.biffh.XLRDError: No sheet named <'Eastern Cape'>

us_ak_authority.py has missing date column

Traceback (most recent call last):
  File "/home/vmagent/app/lib/pipeline.py", line 164, in _run_wrapper
    return data_source.run(output_folder, cache, aux)
  File "/home/vmagent/app/lib/data_source.py", line 226, in run
    data = self.parse(data, {name: df.copy() for name, df in aux.items()}, **parse_opts)
  File "/home/vmagent/app/pipelines/epidemiology/us_ak_authority.py", line 55, in parse
    data.date = data.date.astype(str).apply(lambda x: x[:10])
  File "/env/lib/python3.7/site-packages/pandas/core/generic.py", line 5274, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'date'

US_TX data source fails parsing

Traceback (most recent call last):
  File "/home/vmagent/app/pipelines/epidemiology/us_tx_authority.py", line 44, in _parse_trends
    return _rename_columns(data, column_adapter)
  File "/home/vmagent/app/pipelines/epidemiology/us_tx_authority.py", line 29, in _rename_columns
    return data[column_adapter.values()]
  File "/env/lib/python3.7/site-packages/pandas/core/frame.py", line 2806, in __getitem__
    indexer = self.loc._get_listlike_indexer(key, axis=1, raise_missing=True)[1]
  File "/env/lib/python3.7/site-packages/pandas/core/indexing.py", line 1553, in _get_listlike_indexer
    keyarr, indexer, o._get_axis_number(axis), raise_missing=raise_missing
  File "/env/lib/python3.7/site-packages/pandas/core/indexing.py", line 1640, in _validate_read_indexer
    raise KeyError(f"None of [{key}] are in the [{axis_name}]")
KeyError: "None of [Index(['date', 'total_confirmed', 'total_deceased', 'new_confirmed',\n 'new_deceased'],\n dtype='object')] are in the [columns]"

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/vmagent/app/lib/pipeline.py", line 169, in _run_wrapper
    return data_source.run(output_folder, cache, aux)
  File "/home/vmagent/app/lib/data_source.py", line 205, in run
    data = self.parse(data, {name: df.copy() for name, df in aux.items()}, **parse_opts)
  File "/home/vmagent/app/pipelines/epidemiology/us_tx_authority.py", line 73, in parse
    df = sheet_processor(read_file(sources[0], sheet_name=sheet_name))
  File "/home/vmagent/app/pipelines/epidemiology/us_tx_authority.py", line 46, in _parse_trends
    return _rename_columns(data.iloc[1:], column_adapter)
  File "/home/vmagent/app/pipelines/epidemiology/us_tx_authority.py", line 29, in _rename_columns
    return data[column_adapter.values()]
  File "/env/lib/python3.7/site-packages/pandas/core/frame.py", line 2806, in __getitem__
    indexer = self.loc._get_listlike_indexer(key, axis=1, raise_missing=True)[1]
  File "/env/lib/python3.7/site-packages/pandas/core/indexing.py", line 1553, in _get_listlike_indexer
    keyarr, indexer, o._get_axis_number(axis), raise_missing=raise_missing
  File "/env/lib/python3.7/site-packages/pandas/core/indexing.py", line 1646, in _validate_read_indexer
    raise KeyError(f"{not_found} not in index")
KeyError: "['new_deceased'] not in index"
/home/vmagent/app/lib/error_logger.py:30: UserWarning: [2020-07-30T13:12:33.1050][DataPipeline] No output for TexasDataSource with config {'name': 'pipelines.epidemiology.us_tx_authority.TexasDataSource', 'fetch': [{'url': 'https://dshs.texas.gov/coronavirus/TexasCOVID19CaseCountData.xlsx'}], 'test': {'metadata_query': "key == 'US_TX'", 'skip': True}, 'automation': {'job_group': '74'}}
warnings.warn(f"{''.join((f'[{tag}]' for tag in tags))} {msg}")

Error downloading Google sheet

Traceback (most recent call last):
  File "/home/vmagent/app/appengine.py", line 169, in _pull_source
    download(url, buffer)
  File "/home/vmagent/app/lib/net.py", line 80, in download
    req.raise_for_status()
  File "/env/lib/python3.7/site-packages/requests/models.py", line 941, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: https://accounts.google.com/ServiceLogin?service=wise&passive=1209600&continue=https://doc-00-78-sheets.googleusercontent.com/pub/l5l039s6ni5uumqbsj9o11lmdc/7dcte5qc1e7d2nuuf42gnqinl4/1597148435000/114544363600438953716/*/e@2PACX-1vQKB9-H7tQmfs_MwoITnT8Z0j6qxPXkEQAL4DpgZ2gLRjGzR_pr4Oew-9XjYOClHbqqWT69Rrg_Tp4h?output%3Dcsv&followup=https://doc-00-78-sheets.googleusercontent.com/pub/l5l039s6ni5uumqbsj9o11lmdc/7dcte5qc1e7d2nuuf42gnqinl4/1597148435000/114544363600438953716/*/e@2PACX-1vQKB9-H7tQmfs_MwoITnT8Z0j6qxPXkEQAL4DpgZ2gLRjGzR_pr4Oew-9XjYOClHbqqWT69Rrg_Tp4h?output%3Dcsv&ltmpl=sheets

Multiple API requests being made to Wikidata with "NaN" in the request.

Request: https://www.wikidata.org/w/api.php?action=wbgetclaims&format=json&entity=nan
Response: {'error': {'code': 'param-invalid', 'info': 'The serialization "nan" is not recognized by the configured id builders', 'messages': [{'name': 'wikibase-api-param-invalid', 'parameters': ['The serialization "nan" is not recognized by the configured id builders'], 'html': {'*': 'The serialization "nan" is not recognized by the configured id builders'}}], '*': 'See https://www.wikidata.org/w/api.php for API usage. Subscribe to the mediawiki-api-announce mailing list at &lt;https://lists.wikimedia.org/mailman/listinfo/mediawiki-api-announce&gt; for notice of API deprecations and breaking changes.'}, 'servedby': 'mw1390'}
Request: https://www.wikidata.org/w/api.php?action=wbgetclaims&format=json&entity=nan
Response: {'error': {'code': 'param-invalid', 'info': 'The serialization "nan" is not recognized by the configured id builders', 'messages': [{'name': 'wikibase-api-param-invalid', 'parameters': ['The serialization "nan" is not recognized by the configured id builders'], 'html': {'*': 'The serialization "nan" is not recognized by the configured id builders'}}], '*': 'See https://www.wikidata.org/w/api.php for API usage. Subscribe to the mediawiki-api-announce mailing list at &lt;https://lists.wikimedia.org/mailman/listinfo/mediawiki-api-announce&gt; for notice of API deprecations and breaking changes.'}, 'servedby': 'mw1346'}
Request: https://www.wikidata.org/w/api.php?action=wbgetclaims&format=json&entity=nan
Response: {'error': {'code': 'param-invalid', 'info': 'The serialization "nan" is not recognized by the configured id builders', 'messages': [{'name': 'wikibase-api-param-invalid', 'parameters': ['The serialization "nan" is not recognized by the configured id builders'], 'html': {'*': 'The serialization "nan" is not recognized by the configured id builders'}}], '*': 'See https://www.wikidata.org/w/api.php for API usage. Subscribe to the mediawiki-api-announce mailing list at &lt;https://lists.wikimedia.org/mailman/listinfo/mediawiki-api-announce&gt; for notice of API deprecations and breaking changes.'}, 'servedby': 'mw1357'}
Request: https://www.wikidata.org/w/api.php?action=wbgetclaims&format=json&entity=nan
Response: {'error': {'code': 'param-invalid', 'info': 'The serialization "nan" is not recognized by the configured id builders', 'messages': [{'name': 'wikibase-api-param-invalid', 'parameters': ['The serialization "nan" is not recognized by the configured id builders'], 'html': {'*': 'The serialization "nan" is not recognized by the configured id builders'}}], '*': 'See https://www.wikidata.org/w/api.php for API usage. Subscribe to the mediawiki-api-announce mailing list at &lt;https://lists.wikimedia.org/mailman/listinfo/mediawiki-api-announce&gt; for notice of API deprecations and breaking changes.'}, 'servedby': 'mw1400'}
Request: https://www.wikidata.org/w/api.php?action=wbgetclaims&format=json&entity=nan
Response: {'error': {'code': 'param-invalid', 'info': 'The serialization "nan" is not recognized by the configured id builders', 'messages': [{'name': 'wikibase-api-param-invalid', 'parameters': ['The serialization "nan" is not recognized by the configured id builders'], 'html': {'*': 'The serialization "nan" is not recognized by the configured id builders'}}], '*': 'See https://www.wikidata.org/w/api.php for API usage. Subscribe to the mediawiki-api-announce mailing list at &lt;https://lists.wikimedia.org/mailman/listinfo/mediawiki-api-announce&gt; for notice of API deprecations and breaking changes.'}, 'servedby': 'mw1341'}
Request: https://www.wikidata.org/w/api.php?action=wbgetclaims&format=json&entity=nan
Response: {'error': {'code': 'param-invalid', 'info': 'The serialization "nan" is not recognized by the configured id builders', 'messages': [{'name': 'wikibase-api-param-invalid', 'parameters': ['The serialization "nan" is not recognized by the configured id builders'], 'html': {'*': 'The serialization "nan" is not recognized by the configured id builders'}}], '*': 'See https://www.wikidata.org/w/api.php for API usage. Subscribe to the mediawiki-api-announce mailing list at &lt;https://lists.wikimedia.org/mailman/listinfo/mediawiki-api-announce&gt; for notice of API deprecations and breaking changes.'}, 'servedby': 'mw1286'}
Request: https://www.wikidata.org/w/api.php?action=wbgetclaims&format=json&entity=nan
Response: {'error': {'code': 'param-invalid', 'info': 'The serialization "nan" is not recognized by the configured id builders', 'messages': [{'name': 'wikibase-api-param-invalid', 'parameters': ['The serialization "nan" is not recognized by the con

Argentina has dates in 2018 + Oct 2019

First few dates in the epidemiology.csv are

2018-11-21,AR,0,0,,0,0,0,,0
2018-11-21,AR_B,0,0,,0,0,0,,0
2018-11-21,AR_B_490,0,0,,0,0,0,,0
2019-10-25,AR,0,0,,0,0,0,,0
2019-10-25,AR_B,0,0,,0,0,0,,0
2019-10-25,AR_B_412,0,0,,0,0,0,,0
2019-12-19,AR,0,0,,0,0,0,,0
2019-12-19,AR_C,0,0,,0,0,0,,0

Multiple match failures in Libya datasource parsing

/Users/mdaswani/Documents/open-covid-19/data/src/lib/error_logger.py:30: UserWarning: [2020-07-30T15:33:09.1220][LibyaHumdataDataSource] No key match found for:
match_string          البريقة
Test Samples               13
total_confirmed             1
total_recovered             0
total_deceased              0
date               07/30/2020
country_code               LY
_vec                  البريقة
Name: 2, dtype: object
  warnings.warn(f"{''.join((f'[{tag}]' for tag in tags))} {msg}")
/Users/mdaswani/Documents/open-covid-19/data/src/lib/error_logger.py:30: UserWarning: [2020-07-30T15:33:09.1803][LibyaHumdataDataSource] No key match found for:
match_string           الجميل
Test Samples               15
total_confirmed             1
total_recovered             0
total_deceased              0
date               07/30/2020
country_code               LY
_vec                   الجميل
Name: 6, dtype: object
  warnings.warn(f"{''.join((f'[{tag}]' for tag in tags))} {msg}")
/Users/mdaswani/Documents/open-covid-19/data/src/lib/error_logger.py:30: UserWarning: [2020-07-30T15:33:09.2096][LibyaHumdataDataSource] No key match found for:
match_string         الرحيبات
Test Samples               28
total_confirmed             1
total_recovered             0
total_deceased              0
date               07/30/2020
country_code               LY
_vec                 الرحيبات
Name: 8, dtype: object
  warnings.warn(f"{''.join((f'[{tag}]' for tag in tags))} {msg}")
/Users/mdaswani/Documents/open-covid-19/data/src/lib/error_logger.py:30: UserWarning: [2020-07-30T15:33:09.2241][LibyaHumdataDataSource] No key match found for:
match_string         الرياينة
Test Samples               40
total_confirmed             7
total_recovered             0
total_deceased              0
date               07/30/2020
country_code               LY
_vec                 الرياينة
Name: 9, dtype: object
  warnings.warn(f"{''.join((f'[{tag}]' for tag in tags))} {msg}")
/Users/mdaswani/Documents/open-covid-19/data/src/lib/error_logger.py:30: UserWarning: [2020-07-30T15:33:09.2719][LibyaHumdataDataSource] No key match found for:
match_string          الشويرف
Test Samples               17
total_confirmed             3
total_recovered             0
total_deceased              0
date               07/23/2020
country_code               LY
_vec                  الشويرف
Name: 12, dtype: object
  warnings.warn(f"{''.join((f'[{tag}]' for tag in tags))} {msg}")
/Users/mdaswani/Documents/open-covid-19/data/src/lib/error_logger.py:30: UserWarning: [2020-07-30T15:33:09.2887][LibyaHumdataDataSource] No key match found for:
match_string         العجيلات
Test Samples               38
total_confirmed             4
total_recovered             0
total_deceased              0
date               07/30/2020
country_code               LY
_vec                 العجيلات
Name: 13, dtype: object
  warnings.warn(f"{''.join((f'[{tag}]' for tag in tags))} {msg}")
/Users/mdaswani/Documents/open-covid-19/data/src/lib/error_logger.py:30: UserWarning: [2020-07-30T15:33:09.3057][LibyaHumdataDataSource] No key match found for:
match_string           القلعة
Test Samples               37
total_confirmed            11
total_recovered             0
total_deceased              0
date               07/30/2020
country_code               LY
_vec                   القلعة
Name: 14, dtype: object
  warnings.warn(f"{''.join((f'[{tag}]' for tag in tags))} {msg}")
/Users/mdaswani/Documents/open-covid-19/data/src/lib/error_logger.py:30: UserWarning: [2020-07-30T15:33:09.3413][LibyaHumdataDataSource] No key match found for:
match_string         المحروقة
Test Samples               35
total_confirmed             3
total_recovered             0
total_deceased              0
date               07/30/2020
country_code               LY
_vec                 المحروقة
Name: 16, dtype: object
  warnings.warn(f"{''.join((f'[{tag}]' for tag in tags))} {msg}")
/Users/mdaswani/Documents/open-covid-19/data/src/lib/error_logger.py:30: UserWarning: [2020-07-30T15:33:09.3966][LibyaHumdataDataSource] No key match found for:
match_string           ترهونة
Test Samples               48
total_confirmed            26
total_recovered             0
total_deceased              0
date               07/23/2020
country_code               LY
_vec                   ترهونة
Name: 20, dtype: object
  warnings.warn(f"{''.join((f'[{tag}]' for tag in tags))} {msg}")
/Users/mdaswani/Documents/open-covid-19/data/src/lib/error_logger.py:30: UserWarning: [2020-07-30T15:33:09.4135][LibyaHumdataDataSource] No key match found for:
match_string          تندميرة
Test Samples               10
total_confirmed             3
total_recovered             0
total_deceased              0
date               07/30/2020
country_code               LY
_vec                  تندميرة
Name: 21, dtype: object
  warnings.warn(f"{''.join((f'[{tag}]' for tag in tags))} {msg}")
/Users/mdaswani/Documents/open-covid-19/data/src/lib/error_logger.py:30: UserWarning: [2020-07-30T15:33:09.4393][LibyaHumdataDataSource] No key match found for:
match_string             درنة
Test Samples                2
total_confirmed             5
total_recovered             0
total_deceased              0
date               07/30/2020
country_code               LY
_vec                     درنة
Name: 23, dtype: object
  warnings.warn(f"{''.join((f'[{tag}]' for tag in tags))} {msg}")
/Users/mdaswani/Documents/open-covid-19/data/src/lib/error_logger.py:30: UserWarning: [2020-07-30T15:33:09.4552][LibyaHumdataDataSource] No key match found for:
match_string          رقدالين
Test Samples               21
total_confirmed             3
total_recovered             0
total_deceased              0
date               07/30/2020
country_code               LY
_vec                  رقدالين
Name: 24, dtype: object
  warnings.warn(f"{''.join((f'[{tag}]' for tag in tags))} {msg}")
/Users/mdaswani/Documents/open-covid-19/data/src/lib/error_logger.py:30: UserWarning: [2020-07-30T15:33:09.5140][LibyaHumdataDataSource] No key match found for:
match_string              سرت
Test Samples                7
total_confirmed             1
total_recovered             0
total_deceased              0
date               07/30/2020
country_code               LY
_vec                      سرت
Name: 28, dtype: object
  warnings.warn(f"{''.join((f'[{tag}]' for tag in tags))} {msg}")
/Users/mdaswani/Documents/open-covid-19/data/src/lib/error_logger.py:30: UserWarning: [2020-07-30T15:33:09.5307][LibyaHumdataDataSource] No key match found for:
match_string           صبراتة
Test Samples              240
total_confirmed            28
total_recovered             0
total_deceased              0
date               07/30/2020
country_code               LY
_vec                   صبراتة
Name: 29, dtype: object
  warnings.warn(f"{''.join((f'[{tag}]' for tag in tags))} {msg}")
/Users/mdaswani/Documents/open-covid-19/data/src/lib/error_logger.py:30: UserWarning: [2020-07-30T15:33:09.5454][LibyaHumdataDataSource] No key match found for:
match_string            صرمان
Test Samples              127
total_confirmed            21
total_recovered            11
total_deceased              0
date               07/30/2020
country_code               LY
_vec                    صرمان
Name: 30, dtype: object
  warnings.warn(f"{''.join((f'[{tag}]' for tag in tags))} {msg}")
/Users/mdaswani/Documents/open-covid-19/data/src/lib/error_logger.py:30: UserWarning: [2020-07-30T15:33:09.5623][LibyaHumdataDataSource] No key match found for:
match_string             طبرق
Test Samples                8
total_confirmed             8
total_recovered             0
total_deceased              0
date               07/30/2020
country_code               LY
_vec                     طبرق
Name: 31, dtype: object
  warnings.warn(f"{''.join((f'[{tag}]' for tag in tags))} {msg}")
/Users/mdaswani/Documents/open-covid-19/data/src/lib/error_logger.py:30: UserWarning: [2020-07-30T15:33:09.5956][LibyaHumdataDataSource] No key match found for:
match_string            غدامس
Test Samples               22
total_confirmed             3
total_recovered             2
total_deceased              0
date               07/22/2020
country_code               LY
_vec                    غدامس
Name: 33, dtype: object
  warnings.warn(f"{''.join((f'[{tag}]' for tag in tags))} {msg}")
/Users/mdaswani/Documents/open-covid-19/data/src/lib/error_logger.py:30: UserWarning: [2020-07-30T15:33:09.6099][LibyaHumdataDataSource] No key match found for:
match_string            غريان
Test Samples               81
total_confirmed            17
total_recovered             0
total_deceased              0
date               07/30/2020
country_code               LY
_vec                    غريان
Name: 34, dtype: object
  warnings.warn(f"{''.join((f'[{tag}]' for tag in tags))} {msg}")
/Users/mdaswani/Documents/open-covid-19/data/src/lib/error_logger.py:30: UserWarning: [2020-07-30T15:33:09.6297][LibyaHumdataDataSource] No key match found for:
match_string       قصر الاخيار
Test Samples                12
total_confirmed              5
total_recovered              0
total_deceased               0
date                07/30/2020
country_code                LY
_vec               قصر الاخيار
Name: 35, dtype: object
  warnings.warn(f"{''.join((f'[{tag}]' for tag in tags))} {msg}")
/Users/mdaswani/Documents/open-covid-19/data/src/lib/error_logger.py:30: UserWarning: [2020-07-30T15:33:09.7320][LibyaHumdataDataSource] No key match found for:
match_string             مزدة
Test Samples                7
total_confirmed             1
total_recovered             0
total_deceased              0
date               07/30/2020
country_code               LY
_vec                     مزدة
Name: 39, dtype: object
  warnings.warn(f"{''.join((f'[{tag}]' for tag in tags))} {msg}")
/Users/mdaswani/Documents/open-covid-19/data/src/lib/error_logger.py:30: UserWarning: [2020-07-30T15:33:09.7794][LibyaHumdataDataSource] No key match found for:
match_string            هراوة
Test Samples               15
total_confirmed             4
total_recovered             0
total_deceased              0
date               07/30/2020
country_code               LY
_vec                    هراوة
Name: 42, dtype: object
  warnings.warn(f"{''.join((f'[{tag}]' for tag in tags))} {msg}")
/Users/mdaswani/Documents/open-covid-19/data/src/lib/error_logger.py:30: UserWarning: [2020-07-30T15:33:09.8135][LibyaHumdataDataSource] No key match found for:
match_string             يفرن
Test Samples               32
total_confirmed             9
total_recovered             0
total_deceased              0
date               07/30/2020
country_code               LY
_vec                     يفرن
Name: 44, dtype: object
  warnings.warn(f"{''.join((f'[{tag}]' for tag in tags))} {msg}")

Netherlands data source has multiple match failures

Snippet below (more failures recorded)

/home/vmagent/app/lib/error_logger.py:30: UserWarning: [2020-07-30T13:00:46.6486][NetherlandsDataSource] Key provided but not found in metadata:
key PE_AM
date 2020-03-13
total_confirmed 54
total_deceased 0
total_hospitalized 6
_vec PE_AM
Name: 368, dtype: object
warnings.warn(f"{''.join((f'[{tag}]' for tag in tags))} {msg}")
/home/vmagent/app/lib/error_logger.py:30: UserWarning: [2020-07-30T13:00:46.6625][NetherlandsDataSource] Key provided but not found in metadata:
key PE_AN
date 2020-03-13
total_confirmed 39
total_deceased 0
total_hospitalized 5
_vec PE_AN
Name: 396, dtype: object
warnings.warn(f"{''.join((f'[{tag}]' for tag in tags))} {msg}")
/home/vmagent/app/lib/error_logger.py:30: UserWarning: [2020-07-30T13:00:46.6778][NetherlandsDataSource] Key provided but not found in metadata:
key PE_AP
date                  2020-03-13
total_confirmed 39
total_deceased 0
total_hospitalized 4
_vec PE_AP
Name: 427, dtype: object
warnings.warn(f"{''.join((f'[{tag}]' for tag in tags))} {msg}")
/home/vmagent/app/lib/error_logger.py:30: UserWarning: [2020-07-30T13:00:46.6909][NetherlandsDataSource] Key provided but not found in metadata:
key PE_AR
date 2020-03-13
total_confirmed 57
total_deceased 0
total_hospitalized 7
_vec PE_AR
Name: 454, dtype: object
warnings.warn(f"{''.join((f'[{tag}]' for tag in tags))} {msg}")
/home/vmagent/app/lib/error_logger.py:30: UserWarning: [2020-07-30T13:00:46.7055][NetherlandsDataSource] Key provided but not found in metadata:
key PE_AY
date 2020-03-13
total_confirmed 34
total_deceased 0
total_hospitalized 5

Mexico datasource fails to fetch

Traceback (most recent call last):
  File "/home/vmagent/app/lib/pipeline.py", line 164, in _run_wrapper
    return data_source.run(output_folder, cache, aux)
  File "/home/vmagent/app/lib/data_source.py", line 222, in run
    data = self.fetch(output_folder, cache, fetch_opts)
  File "/home/vmagent/app/lib/data_source.py", line 73, in fetch
    for idx, source_config in enumerate(fetch_opts)
  File "/home/vmagent/app/lib/data_source.py", line 73, in <dictcomp>
    for idx, source_config in enumerate(fetch_opts)
  File "/home/vmagent/app/lib/net.py", line 58, in download_snapshot
    download(url, file_handle, **download_opts)
  File "/home/vmagent/app/lib/net.py", line 80, in download
    req.raise_for_status()
  File "/env/lib/python3.7/site-packages/requests/models.py", line 941, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://epidemiologia.salud.gob.mx/gobmx/salud/datos_abiertos/datos_abiertos_covid19.zip

Errors parsing PT data sources

/open-covid-19/data/src/lib/error_logger.py:30: UserWarning: [2020-07-30T15:33:15.7513][PtCovid19L1DataSource] Error running data source PtCovid19L1DataSource with config {'name': 'pipelines.epidemiology.pt_covid19.PtCovid19L1DataSource', 'fetch': [{'url': 'https://raw.githubusercontent.com/carlospramalheira/covid19/master/datasets/PT_COVID_TimeSeries.csv'}], 'test': {'metadata_query': "key.str.match('PT')"}, 'automation': {'job_group': '56'}}
  warnings.warn(f"{''.join((f'[{tag}]' for tag in tags))} {msg}")
Traceback (most recent call last):
  File "Documents/open-covid-19/data/src/lib/pipeline.py", line 169, in _run_wrapper
    return data_source.run(output_folder, cache, aux)
  File "Documents/open-covid-19/data/src/lib/data_source.py", line 205, in run
    data = self.parse(data, {name: df.copy() for name, df in aux.items()}, **parse_opts)
  File "Documents/open-covid-19/data/src/lib/data_source.py", line 86, in parse
    return self.parse_dataframes(self._read(sources, **read_opts), aux, **parse_opts)
  File "Documents/open-covid-19/data/src/lib/data_source.py", line 76, in _read
    return {name: read_file(file_path, **read_opts) for name, file_path in file_paths.items()}
  File "Documents/open-covid-19/data/src/lib/data_source.py", line 76, in <dictcomp>
    return {name: read_file(file_path, **read_opts) for name, file_path in file_paths.items()}
  File "Documents/open-covid-19/data/src/lib/io.py", line 93, in read_file
    path, **{**{"keep_default_na": False, "na_values": ["", "N/A"]}, **read_opts}
  File "Documents/open-covid-19/open-covid-env/lib/python3.7/site-packages/pandas/io/parsers.py", line 676, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "Documents/open-covid-19/open-covid-env/lib/python3.7/site-packages/pandas/io/parsers.py", line 448, in _read
    parser = TextFileReader(fp_or_buf, **kwds)
  File "Documents/open-covid-19/open-covid-env/lib/python3.7/site-packages/pandas/io/parsers.py", line 880, in __init__
    self._make_engine(self.engine)
  File "Documents/open-covid-19/open-covid-env/lib/python3.7/site-packages/pandas/io/parsers.py", line 1114, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "Documents/open-covid-19/open-covid-env/lib/python3.7/site-packages/pandas/io/parsers.py", line 1891, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 532, in pandas._libs.parsers.TextReader.__cinit__
pandas.errors.EmptyDataError: No columns to parse from file

Automated error report

Traceback (most recent call last):
  File "/home/vmagent/app/lib/pipeline.py", line 164, in _run_wrapper
    return data_source.run(output_folder, cache, aux)
  File "/home/vmagent/app/lib/data_source.py", line 222, in run
    data = self.fetch(output_folder, cache, fetch_opts)
  File "/home/vmagent/app/lib/data_source.py", line 73, in fetch
    for idx, source_config in enumerate(fetch_opts)
  File "/home/vmagent/app/lib/data_source.py", line 73, in <dictcomp>
    for idx, source_config in enumerate(fetch_opts)
  File "/home/vmagent/app/lib/net.py", line 58, in download_snapshot
    download(url, file_handle, **download_opts)
  File "/home/vmagent/app/lib/net.py", line 79, in download
    req = requests.get(url, headers=headers)
  File "/env/lib/python3.7/site-packages/requests/api.py", line 76, in get
    return request('get', url, params=params, **kwargs)
  File "/env/lib/python3.7/site-packages/requests/api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "/env/lib/python3.7/site-packages/requests/sessions.py", line 530, in request
    resp = self.send(prep, **send_kwargs)
  File "/env/lib/python3.7/site-packages/requests/sessions.py", line 665, in send
    history = [resp for resp in gen] if allow_redirects else []
  File "/env/lib/python3.7/site-packages/requests/sessions.py", line 665, in <listcomp>
    history = [resp for resp in gen] if allow_redirects else []
  File "/env/lib/python3.7/site-packages/requests/sessions.py", line 245, in resolve_redirects
    **adapter_kwargs
  File "/env/lib/python3.7/site-packages/requests/sessions.py", line 643, in send
    r = adapter.send(request, **kwargs)
  File "/env/lib/python3.7/site-packages/requests/adapters.py", line 498, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

Mexico data source fails to fetch

/home/vmagent/app/lib/error_logger.py:30: UserWarning: [2020-07-30T13:16:24.1400][MexicoDataSource] Error running data source MexicoDataSource with config {'name': 'pipelines.epidemiology.mx_authority.MexicoDataSource', 'fetch': [{'url': 'http://187.191.75.115/gobmx/salud/datos_abiertos/datos_abiertos_covid19.zip'}], 'parse': {'encoding': 'ISO-8859-1'}, 'test': {'skip': True, 'metadata_query': "key.str.match('MX_.+')"}, 'automation': {'job_group': '29'}}
warnings.warn(f"{''.join((f'[{tag}]' for tag in tags))} {msg}")
Traceback (most recent call last):
  File "/env/lib/python3.7/site-packages/urllib3/connection.py", line 160, in _new_conn
    (self._dns_host, self.port), self.timeout, **extra_kw
  File "/env/lib/python3.7/site-packages/urllib3/util/connection.py", line 84, in create_connection
    raise err
  File "/env/lib/python3.7/site-packages/urllib3/util/connection.py", line 74, in create_connection
    sock.connect(sa)
TimeoutError: [Errno 110] Connection timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/env/lib/python3.7/site-packages/urllib3/connectionpool.py", line 677, in urlopen
    chunked=chunked,
  File "/env/lib/python3.7/site-packages/urllib3/connectionpool.py", line 392, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/opt/python3.7/lib/python3.7/http/client.py", line 1229, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/opt/python3.7/lib/python3.7/http/client.py", line 1275, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/opt/python3.7/lib/python3.7/http/client.py", line 1224, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/opt/python3.7/lib/python3.7/http/client.py", line 1016, in _send_output
    self.send(msg)
  File "/opt/python3.7/lib/python3.7/http/client.py", line 956, in send
    self.connect()
  File "/env/lib/python3.7/site-packages/urllib3/connection.py", line 187, in connect
    conn = self._new_conn()
  File "/env/lib/python3.7/site-packages/urllib3/connection.py", line 172, in _new_conn
    self, "Failed to establish a new connection: %s" % e
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7f75bebc5a58>: Failed to establish a new connection: [Errno 110] Connection timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/env/lib/python3.7/site-packages/requests/adapters.py", line 449, in send
    timeout=timeout
  File "/env/lib/python3.7/site-packages/urllib3/connectionpool.py", line 727, in urlopen
    method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
  File "/env/lib/python3.7/site-packages/urllib3/util/retry.py", line 439, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='187.191.75.115', port=80): Max retries exceeded with url: /gobmx/salud/datos_abiertos/datos_abiertos_covid19.zip (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f75bebc5a58>: Failed to establish a new connection: [Errno 110] Connection timed out'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/vmagent/app/lib/pipeline.py", line 169, in _run_wrapper
    return data_source.run(output_folder, cache, aux)
  File "/home/vmagent/app/lib/data_source.py", line 201, in run
    data = self.fetch(output_folder, cache, fetch_opts)
  File "/home/vmagent/app/lib/data_source.py", line 71, in fetch
    for idx, source_config in enumerate(fetch_opts)
  File "/home/vmagent/app/lib/data_source.py", line 71, in <dictcomp>
    for idx, source_config in enumerate(fetch_opts)
  File "/home/vmagent/app/lib/net.py", line 58, in download_snapshot
    download(url, file_handle, **download_opts)
  File "/home/vmagent/app/lib/net.py", line 79, in download
    req = requests.get(url, headers=headers)
  File "/env/lib/python3.7/site-packages/requests/api.py", line 76, in get
    return request('get', url, params=params, **kwargs)
  File "/env/lib/python3.7/site-packages/requests/api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "/env/lib/python3.7/site-packages/requests/sessions.py", line 530, in request
    resp = self.send(prep, **send_kwargs)
  File "/env/lib/python3.7/site-packages/requests/sessions.py", line 643, in send
    r = adapter.send(request, **kwargs)
  File "/env/lib/python3.7/site-packages/requests/adapters.py", line 516, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='187.191.75.115', port=80): Max retries exceeded with url: /gobmx/salud/datos_abiertos/datos_abiertos_covid19.zip (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f75bebc5a58>: Failed to establish a new connection: [Errno 110] Connection timed out'))
/home/vmagent/app/lib/error_logger.py:30: UserWarning: [2020-07-30T13:16:24.1410][DataPipeline] No output for MexicoDataSource with config {'name': 'pipelines.epidemiology.mx_authority.MexicoDataSource', 'fetch': [{'url': 'http://187.191.75.115/gobmx/salud/datos_abiertos/datos_abiertos_covid19.zip'}], 'parse': {'encoding': 'ISO-8859-1'}, 'test': {'skip': True, 'metadata_query': "key.str.match('MX_.+')"}, 'automation': {'job_group': '29'}}
warnings.warn(f"{''.join((f'[{tag}]' for tag in tags))} {msg}")

Worldbank datasource fails to fetch

Traceback (most recent call last):
  File "/home/vmagent/app/lib/pipeline.py", line 164, in _run_wrapper
    return data_source.run(output_folder, cache, aux)
  File "/home/vmagent/app/lib/data_source.py", line 226, in run
    data = self.parse(data, {name: df.copy() for name, df in aux.items()}, **parse_opts)
  File "/home/vmagent/app/pipelines/worldbank/worldbank.py", line 62, in parse
    download(sources[0], buffer, progress=True)
  File "/home/vmagent/app/lib/net.py", line 85, in download
    req.raise_for_status()
  File "/env/lib/python3.7/site-packages/requests/models.py", line 941, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 503 Server Error: Service Unavailable for url: https://databank.worldbank.org/data/download/WDI_csv.zip

Brazil epi total_recovered contains wrong data

For BR, on 20/04/2020 and later dates, total_recovered is significantly greater than total_cases. It looks as though total_recovered is the running sum of new_recovered, so both contain wrong data.

Perhaps: what is currently in BR new_recovered should go in BR total_recovered?

Brazil data source fails to fetch

Traceback (most recent call last):
  File "/home/vmagent/app/lib/pipeline.py", line 164, in _run_wrapper
    return data_source.run(output_folder, cache, aux)
  File "/home/vmagent/app/lib/data_source.py", line 222, in run
    data = self.fetch(output_folder, cache, fetch_opts)
  File "/home/vmagent/app/pipelines/epidemiology/br_authority.py", line 158, in fetch
    return super().fetch(output_folder, cache, fetch_opts)
  File "/home/vmagent/app/lib/data_source.py", line 73, in fetch
    for idx, source_config in enumerate(fetch_opts)
  File "/home/vmagent/app/lib/data_source.py", line 73, in <dictcomp>
    for idx, source_config in enumerate(fetch_opts)
  File "/home/vmagent/app/lib/net.py", line 58, in download_snapshot
    download(url, file_handle, **download_opts)
  File "/home/vmagent/app/lib/net.py", line 80, in download
    req.raise_for_status()
  File "/env/lib/python3.7/site-packages/requests/models.py", line 941, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://s3-sa-east-1.amazonaws.com/ckan.saude.gov.br/dados-sp.csv

Wikipedia data source has multiple match failures

/home/vmagent/app/lib/error_logger.py:30: UserWarning: [2020-07-30T13:08:51.3769][WikipediaDataSource] No key match found for:
match_string New cases
date 2020-01-30
confirmed 1
deceased NaN
new_confirmed 1
total_confirmed 1
new_deceased NaN
total_deceased NaN
country_code IN
subregion2_code NaN
_match_string new cases
_vec New cases
Name: 24, dtype: object
warnings.warn(f"{''.join((f'[{tag}]' for tag in tags))} {msg}")
/home/vmagent/app/lib/error_logger.py:30: UserWarning: [2020-07-30T13:08:51.3873][WikipediaDataSource] No key match found for:
match_string New deaths
date 2020-03-12
confirmed 1
deceased NaN
new_confirmed 1
total_confirmed 1
new_deceased NaN
total_deceased NaN
country_code IN
subregion2_code NaN
_match_string new deaths
_vec New deaths
Name: 25, dtype: object
warnings.warn(f"{''.join((f'[{tag}]' for tag in tags))} {msg}")
/home/vmagent/app/lib/error_logger.py:30: UserWarning: [2020-07-30T13:08:51.4634][WikipediaDataSource] No key match found for:
match_string Unassigned cases
date 2020-04-25
confirmed 49
deceased NaN
new_confirmed 49
total_confirmed 49
new_deceased NaN
total_deceased NaN
country_code IN
subregion2_code NaN
_match_string unassigned cases
_vec Unassigned cases
Name: 34, dtype: object
warnings.warn(f"{''.join((f'[{tag}]' for tag in tags))} {msg}")
/home/vmagent/app/lib/error_logger.py:30: UserWarning: [2020-07-30T13:08:59.9664][WikipediaDataSource] No key match found for:
match_string KPK
date 2020-05-01
confirmed 2799
deceased NaN
new_confirmed NaN
total_confirmed 2799
new_deceased NaN
total_deceased NaN
country_code PK
subregion2_code NaN
_match_string kpk
_vec KPK
Name: 4, dtype: object
warnings.warn(f"{''.join((f'[{tag}]' for tag in tags))} {msg}")
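
Strings like "New cases" and "Unassigned cases" are table artifacts rather than region names, and "KPK" is a common abbreviation for Khyber Pakhtunkhwa. One possible mitigation, sketched with assumed names rather than the pipeline's actual matching API:

    DROP_ROWS = {"new cases", "new deaths", "unassigned cases"}
    ALIASES = {"kpk": "Khyber Pakhtunkhwa"}

    def clean_wikipedia_rows(data):
        # Remove rows that can never match a region key
        data = data[~data["_match_string"].isin(DROP_ROWS)].copy()
        # Map known abbreviations onto the names the metadata uses
        aliased = data["_match_string"].map(ALIASES)
        data["match_string"] = aliased.fillna(data["match_string"])
        return data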

Automated error report

Traceback (most recent call last):
  File "/env/lib/python3.7/site-packages/requests/adapters.py", line 449, in send
    timeout=timeout
  File "/env/lib/python3.7/site-packages/urllib3/connectionpool.py", line 727, in urlopen
    method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
  File "/env/lib/python3.7/site-packages/urllib3/util/retry.py", line 403, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/env/lib/python3.7/site-packages/urllib3/packages/six.py", line 734, in reraise
    raise value.with_traceback(tb)
  File "/env/lib/python3.7/site-packages/urllib3/connectionpool.py", line 677, in urlopen
    chunked=chunked,
  File "/env/lib/python3.7/site-packages/urllib3/connectionpool.py", line 426, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/env/lib/python3.7/site-packages/urllib3/connectionpool.py", line 421, in _make_request

Automated error report

Traceback (most recent call last):
  File "/home/vmagent/app/lib/pipeline.py", line 164, in _run_wrapper
    return data_source.run(output_folder, cache, aux)
  File "/home/vmagent/app/lib/data_source.py", line 226, in run
    data = self.parse(data, {name: df.copy() for name, df in aux.items()}, **parse_opts)
  File "/home/vmagent/app/lib/data_source.py", line 97, in parse
    return self.parse_dataframes(self._read(sources, **read_opts), aux, **parse_opts)
  File "/home/vmagent/app/lib/data_source.py", line 78, in _read
    return {name: read_file(file_path, **read_opts) for name, file_path in file_paths.items()}
  File "/home/vmagent/app/lib/data_source.py", line 78, in <dictcomp>
    return {name: read_file(file_path, **read_opts) for name, file_path in file_paths.items()}
  File "/home/vmagent/app/lib/io.py", line 95, in read_file
    return pandas.read_csv(path, **{**default_read_opts, **read_opts})
  File "/env/lib/python3.7/site-packages/pandas/io/parsers.py", line 676, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/env/lib/python3.7/site-packages/pandas/io/parsers.py", line 454, in _read
    data = parser.read(nrows)
  File "/env/lib/python3.7/site-packages/pandas/io/parsers.py", line 1133, in read
    ret = self._engine.read(nrows)
  File "/env/lib/python3.7/site-packages/pandas/io/parsers.py", line 2037, in read
    data = self._reader.read(nrows)
  File "pandas/_libs/parsers.pyx", line 860, in pandas._libs.parsers.TextReader.read
  File "pandas/_libs/parsers.pyx", line 875, in pandas._libs.parsers.TextReader._read_low_memory
  File "pandas/_libs/parsers.pyx", line 929, in pandas._libs.parsers.TextReader._read_rows
  File "pandas/_libs/parsers.pyx", line 916, in pandas._libs.parsers.TextReader._tokenize_rows
  File "pandas/_libs/parsers.pyx", line 2071, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 1 fields in line 8, saw 2

Automated error report

Traceback (most recent call last):
  File "/env/lib/python3.7/site-packages/urllib3/connectionpool.py", line 677, in urlopen
    chunked=chunked,
  File "/env/lib/python3.7/site-packages/urllib3/connectionpool.py", line 426, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/env/lib/python3.7/site-packages/urllib3/connectionpool.py", line 421, in _make_request

ZA fetch fails to find sheet "Eastern Cape"

Traceback (most recent call last):
  File "/home/vmagent/app/lib/pipeline.py", line 164, in _run_wrapper
    return data_source.run(output_folder, cache, aux)
  File "/home/vmagent/app/lib/data_source.py", line 226, in run
    data = self.parse(data, {name: df.copy() for name, df in aux.items()}, **parse_opts)
  File "/home/vmagent/app/lib/data_source.py", line 97, in parse
    return self.parse_dataframes(self._read(sources, **read_opts), aux, **parse_opts)
  File "/home/vmagent/app/lib/data_source.py", line 78, in _read
    return {name: read_file(file_path, **read_opts) for name, file_path in file_paths.items()}
  File "/home/vmagent/app/lib/data_source.py", line 78, in <dictcomp>
    return {name: read_file(file_path, **read_opts) for name, file_path in file_paths.items()}
  File "/home/vmagent/app/lib/io.py", line 102, in read_file
    return pandas.read_excel(path, **{**default_read_opts, **read_opts})
  File "/env/lib/python3.7/site-packages/pandas/io/excel/_base.py", line 334, in read_excel
    **kwds,
  File "/env/lib/python3.7/site-packages/pandas/io/excel/_base.py", line 888, in parse
    **kwds,
  File "/env/lib/python3.7/site-packages/pandas/io/excel/_base.py", line 439, in parse
    sheet = self.get_sheet_by_name(asheetname)
  File "/env/lib/python3.7/site-packages/pandas/io/excel/_xlrd.py", line 43, in get_sheet_by_name
    return self.book.sheet_by_name(name)
  File "/env/lib/python3.7/site-packages/xlrd/book.py", line 476, in sheet_by_name
    raise XLRDError('No sheet named <%r>' % sheet_name)
xlrd.biffh.XLRDError: No sheet named <'Eastern Cape'>
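
When a source renames its sheets, listing the workbook's actual sheet names usually identifies the replacement quickly. A diagnostic sketch, with an assumed local file name:

    import difflib

    import pandas

    book = pandas.ExcelFile("za_provincial_data.xlsx")  # assumed file name
    print(book.sheet_names)
    print(difflib.get_close_matches("Eastern Cape", book.sheet_names, n=1))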

CZ hospitalization source fails with parsing error

Looks like a change in the page's panel naming:

Traceback (most recent call last):
  File "/home/vmagent/app/lib/pipeline.py", line 164, in _run_wrapper
    return data_source.run(output_folder, cache, aux)
  File "/home/vmagent/app/lib/data_source.py", line 226, in run
    data = self.parse(data, {name: df.copy() for name, df in aux.items()}, **parse_opts)
  File "/home/vmagent/app/pipelines/hospitalizations/cz_authority.py", line 30, in parse
    data = page.select("#panel3-hospitalization")[0]
IndexError: list index out of range
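
The failing line indexes the selector result without checking it, so a renamed panel id surfaces as a bare IndexError. A defensive sketch of that line, reusing the page object already built in cz_authority.py's parse method:

    panels = page.select("#panel3-hospitalization")
    if not panels:
        available = [tag.get("id") for tag in page.select("[id]")]
        raise RuntimeError(f"Hospitalization panel not found; ids on page: {available}")
    data = panels[0]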

Error when processing intermediate results

"Traceback (most recent call last):
  File "/env/lib/python3.7/site-packages/flask/app.py", line 2447, in wsgi_app
    response = self.full_dispatch_request()
  File "/env/lib/python3.7/site-packages/flask/app.py", line 1952, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/env/lib/python3.7/site-packages/flask/app.py", line 1821, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/env/lib/python3.7/site-packages/flask/_compat.py", line 39, in reraise
    raise value
  File "/env/lib/python3.7/site-packages/flask/app.py", line 1950, in full_dispatch_request
    rv = self.dispatch_request()
  File "/env/lib/python3.7/site-packages/flask/app.py", line 1936, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/home/vmagent/app/appengine.py", line 221, in update_table
    output_folder / "intermediate", intermediate_results
  File "/home/vmagent/app/lib/pipeline.py", line 299, in _save_intermediate_results
    export_csv(result, intermediate_folder / file_name, schema=self.schema)
  File "/home/vmagent/app/lib/io.py", line 260, in export_csv
    data_fmt[col] = data[col].apply(lambda val: "" if pandas.isna(val) else fmt(val))
  File "/env/lib/python3.7/site-packages/pandas/core/series.py", line 3848, in apply
    mapped = lib.map_infer(values, f, convert=convert_dtype)
  File "pandas/_libs/lib.pyx", line 2329, in pandas._libs.lib.map_infer
  File "/home/vmagent/app/lib/io.py", line 260, in <lambda>

Google Mobility Data has multiple match failures

Most failures appear to be for Sweden; there are too many to paste in full, so here is a snippet:

/home/vmagent/app/lib/error_logger.py:30: UserWarning: [2020-07-30T13:11:24.4279][GoogleMobilityDataSource] No key match found for:
country_code SE
subregion1_name
subregion2_name
key NaN
match_string Ludvika Municipality
subregion2_code
country_region Sweden
metro_area NaN
iso_3166_2_code NaN
census_fips_code NaN
date 2020-02-15
mobility_retail_and_recreation NaN
mobility_grocery_and_pharmacy 3
mobility_parks NaN
mobility_transit_stations NaN
mobility_workplaces NaN
mobility_residential NaN
_key NaN
_vec SE|||nan|Ludvika Municipality|
Name: 4459, dtype: object
warnings.warn(f"{''.join((f'[{tag}]' for tag in tags))} {msg}")
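
The metadata presumably stores these Swedish subregions without the "Municipality" suffix, so the match strings could be normalized before matching. A sketch under that assumption, not the pipeline's actual matching logic:

    import re

    def normalize_match_string(name: str) -> str:
        # "Ludvika Municipality" -> "Ludvika"
        return re.sub(r"\s+(Municipality|County)$", "", name.strip(), flags=re.IGNORECASE)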

Cache pull request fails for source

Cache pull failed for https://docs.google.com/spreadsheets/d/e/2PACX-1vQKB9-H7tQmfs_MwoITnT8Z0j6qxPXkEQAL4DpgZ2gLRjGzR_pr4Oew-9XjYOClHbqqWT69Rrg_Tp4h/pub?output=csv

UK authority data source changed API/death processing

RuntimeError: Request failed: {"response":"Invalid parameter 'newDeathsByPublishDate' in the requested JSON structure. Did you mean 'newTestsByPublishDate'?","status_code":404,"status":"Not Found"}
at _get (/env/lib/python3.7/site-packages/uk_covid19/api_interface.py:226)
at get_json (/env/lib/python3.7/site-packages/uk_covid19/api_interface.py:278)
at parse (/home/vmagent/app/pipelines/epidemiology/gb_authority.py:102)
at parse (/home/vmagent/app/pipelines/epidemiology/gb_authority.py:131)
at run (/home/vmagent/app/lib/data_source.py:226)
at _run_wrapper (/home/vmagent/app/lib/pipeline.py:164)

The UK govt is changing the way they report deaths. From the official document (https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/908781/Technical_Summary_PHE_Data_Series_COVID-19_Deaths_20200812.pdf)

There are 2 definitions of a death in a person with COVID-19 in England, one
broader measure and one measure reflecting current trends:

  1. A death in a person with a laboratory-confirmed positive COVID-19 test who
     either died within 60 days of the first specimen date, or died more than 60
     days after the first specimen date only if COVID-19 is mentioned on the
     death certificate.
  2. A death in a person with a laboratory-confirmed positive COVID-19 test who
     died within (equal to or less than) 28 days of the first positive specimen
     date. This measure of acute deaths, which can be used to understand current
     trends, is further defined as "a death in any person with a
     laboratory-confirmed positive COVID-19 test AND within (equal to or less
     than) 28 days of the first positive specimen date."

Unfortunately they already changed the API.
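
Assuming the old newDeathsByPublishDate field was superseded by the 28-day measure, the query through the uk_covid19 client would change roughly as follows; the replacement metric name is an assumption based on the definitions above, not confirmed from the pipeline code:

    from uk_covid19 import Cov19API

    api = Cov19API(
        filters=["areaType=nation", "areaName=England"],
        # newDeaths28DaysByPublishDate is the assumed replacement metric
        structure={"date": "date", "new_deceased": "newDeaths28DaysByPublishDate"},
    )
    data = api.get_json()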

publish.py shows memory issues when trying to convert main.csv to json

Traceback (most recent call last):
  File "/srv/publish.py", line 237, in try_json_covert
    convert_csv_to_json_records(schema, csv_file, json_output)
  File "/srv/lib/memory_efficient.py", line 215, in convert_csv_to_json_records
    raise ValueError(f"Size of {csv_file} too large for conversion: {file_size // 1E6} MB")
ValueError: Size of /tmp/tmp6beg7d3c/public/main.csv too large for conversion: 403.0 MB
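
The guard suggests the converter holds the whole table in memory; a memory-bounded alternative is to stream the CSV in chunks and emit newline-delimited JSON records. A sketch, not the repository's memory_efficient API:

    import pandas

    def convert_csv_to_ndjson(csv_file, json_output, chunksize=50_000):
        # Each chunk is bounded in size; each output line is one JSON record
        with open(json_output, "w") as output:
            for chunk in pandas.read_csv(csv_file, chunksize=chunksize):
                chunk.to_json(output, orient="records", lines=True)
                output.write("\n")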

Match failures in Covid19IndiaOrg source

/home/vmagent/app/lib/error_logger.py:30: UserWarning: [2020-07-30T13:18:55.2348][Covid19IndiaOrgDataSource] No key match found for:
Status
subregion1_code TT
date 2020-04-01
new_confirmed 424
Deceased 9
new_recovered 16
country_code IN
_vec TT
Name: 34, dtype: object
warnings.warn(f"{''.join((f'[{tag}]' for tag in tags))} {msg}")
/home/vmagent/app/lib/error_logger.py:30: UserWarning: [2020-07-30T13:18:55.2434][Covid19IndiaOrgDataSource] No key match found for:
Status
subregion1_code UN
date 2020-04-01
new_confirmed 0
Deceased 0
new_recovered 0
country_code IN
_vec UN
Name: 35, dtype: object
warnings.warn(f"{''.join((f'[{tag}]' for tag in tags))} {msg}")
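
"TT" appears to be covid19india.org's all-India total and "UN" its unassigned bucket (an assumption based on that project's conventions); neither maps to a subregion key, so they could be dropped before matching:

    AGGREGATE_CODES = {"TT", "UN"}

    def drop_aggregate_rows(data):
        # Remove aggregate rows that can never match a subregion key
        return data[~data["subregion1_code"].isin(AGGREGATE_CODES)]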

Spain level 1 data

Hi,

I see that the data for Spain at the subregion1_name level only goes up to 8/2 in the main table and 8/3 in the epidemiology table. Could you double-check the source, please?

Thank you

Thank you and request

Thank you for maintaining this repo! Is it possible to include a continent column corresponding to each location?

Slovenia data source fails to run due to key error on "date" field

/home/vmagent/app/lib/error_logger.py:30: UserWarning: [2020-07-30T13:02:38.6778][SloveniaDataSource] Error running data source SloveniaDataSource with config {'name': 'pipelines.epidemiology.si_authority.SloveniaDataSource', 'fetch': [{'url': 'https://www.gov.si/assets/vlada/Koronavirus-podatki/en/EN_Covid-19-all-data.xlsx'}], 'test': {'metadata_query': "key == 'SI'"}, 'automation': {'job_group': '24'}}

Traceback (most recent call last):
  File "/env/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2646, in get_loc
    return self._engine.get_loc(key)
  File "pandas/_libs/index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1619, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1627, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'date'
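
A bare KeyError on 'date' usually means the upstream workbook renamed its columns. Validating the header right after reading makes the failure self-explanatory; a sketch using the URL from the config above:

    import pandas

    url = "https://www.gov.si/assets/vlada/Koronavirus-podatki/en/EN_Covid-19-all-data.xlsx"
    data = pandas.read_excel(url)
    missing = {"date"} - set(data.columns)
    if missing:
        raise ValueError(f"Upstream schema changed; missing {missing}, got {list(data.columns)}")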

Automated error report

Traceback (most recent call last):
  File "/env/lib/python3.7/site-packages/flask/app.py", line 2447, in wsgi_app
    response = self.full_dispatch_request()
  File "/env/lib/python3.7/site-packages/flask/app.py", line 1952, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/env/lib/python3.7/site-packages/flask/app.py", line 1821, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/env/lib/python3.7/site-packages/flask/_compat.py", line 39, in reraise
    raise value
  File "/env/lib/python3.7/site-packages/flask/app.py", line 1950, in full_dispatch_request
    rv = self.dispatch_request()
  File "/env/lib/python3.7/site-packages/flask/app.py", line 1936, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/home/vmagent/app/appengine.py", line 350, in report_errors_to_github
    return register_new_errors(os.getenv(ENV_PROJECT))
  File "/home/vmagent/app/scripts/cloud_error_processing.py", line 54, in register_new_errors
    .list(projectName="projects/{}".format(gcs_project_name), timeRange_period="PERIOD_1_DAY")
  File "/env/lib/python3.7/site-packages/googleapiclient/_helpers.py", line 134, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "/env/lib/python3.7/site-packages/googleapiclient/http.py", line 907, in execute
    raise HttpError(resp, content, uri=self.uri)
googleapiclient.errors.HttpError: <HttpError 403 when requesting https://clouderrorreporting.googleapis.com/v1beta1/projects/github-open-covid-19/groupStats?timeRange.period=PERIOD_1_DAY&alt=json returned "Error Reporting API has not been used in project 776840531071 before or it is disabled. Enable it by visiting https://console.developers.google.com/apis/api/clouderrorreporting.googleapis.com/overview?project=776840531071 then retry. If you enabled this API recently, wait a few minutes for the action to propagate to our systems and retry.". Details: "[{'@type': 'type.googleapis.com/google.rpc.Help', 'links': [{'description': 'Google developers console API activation', 'url': 'https://console.developers.google.com/apis/api/clouderrorreporting.googleapis.com/overview?project=776840531071'}]}]">

BR Authority data source error (Automated error report)

Traceback (most recent call last):
  File "/home/vmagent/app/lib/pipeline.py", line 164, in _run_wrapper
    return data_source.run(output_folder, cache, aux)
  File "/home/vmagent/app/lib/data_source.py", line 222, in run
    data = self.parse(data, {name: df.copy() for name, df in aux.items()}, **parse_opts)
  File "/home/vmagent/app/pipelines/epidemiology/br_authority.py", line 163, in parse
    return super().parse(sources, aux, **parse_opts)
  File "/home/vmagent/app/lib/data_source.py", line 97, in parse
    return self.parse_dataframes(self._read(sources, **read_opts), aux, **parse_opts)
  File "/home/vmagent/app/lib/data_source.py", line 78, in _read
    return {name: read_file(file_path, **read_opts) for name, file_path in file_paths.items()}
  File "/home/vmagent/app/lib/data_source.py", line 78, in <dictcomp>
    return {name: read_file(file_path, **read_opts) for name, file_path in file_paths.items()}
  File "/home/vmagent/app/lib/io.py", line 95, in read_file
    return pandas.read_csv(path, **{**default_read_opts, **read_opts})
  File "/env/lib/python3.7/site-packages/pandas/io/parsers.py", line 676, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/env/lib/python3.7/site-packages/pandas/io/parsers.py", line 448, in _read
    parser = TextFileReader(fp_or_buf, **kwds)
  File "/env/lib/python3.7/site-packages/pandas/io/parsers.py", line 880, in __init__
    self._make_engine(self.engine)
  File "/env/lib/python3.7/site-packages/pandas/io/parsers.py", line 1114, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/env/lib/python3.7/site-packages/pandas/io/parsers.py", line 1937, in __init__
    _validate_usecols_names(usecols, self.orig_names)
  File "/env/lib/python3.7/site-packages/pandas/io/parsers.py", line 1233, in _validate_usecols_names
    "Usecols do not match columns, "
ValueError: Usecols do not match columns, columns expected but not found: ['sexo', 'dataEncerramento', 'classificacaoFinal', 'evolucaoCaso', 'estadoIBGE', 'dataInicioSintomas', 'idade', 'resultadoTeste', 'dataTeste', 'municipioIBGE']
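
Passing a fixed usecols makes any upstream rename fatal. One option is to read the header first and intersect it with the desired columns so the read degrades gracefully; a sketch, with the file name and column names taken from the error above:

    import pandas

    wanted = ["sexo", "idade", "dataTeste", "resultadoTeste", "municipioIBGE", "estadoIBGE"]
    header = pandas.read_csv("dados-sp.csv", nrows=0).columns
    usecols = [col for col in wanted if col in header]
    data = pandas.read_csv("dados-sp.csv", usecols=usecols)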

Match failures in NYT L3 data source

Documents/open-covid-19/data/src/lib/error_logger.py:30: UserWarning: [2020-07-30T15:33:45.2053][NytCovidL3DataSource] Key provided but not found in metadata:
subregion1_name                         Northern Mariana Islands
subregion2_code                                            69110
subregion1_code                                               MP
key                                                  US_MP_69110
date                                                  2020-07-14
total_confirmed                                               35
total_deceased                                                 2
_vec               Northern Mariana Islands|69110|MP|US_MP_69110
Name: 2015, dtype: object
  warnings.warn(f"{''.join((f'[{tag}]' for tag in tags))} {msg}")
Documents/open-covid-19/data/src/lib/error_logger.py:30: UserWarning: [2020-07-30T15:33:45.2072][NytCovidL3DataSource] Key provided but not found in metadata:
subregion1_name                         Northern Mariana Islands
subregion2_code                                            69120
subregion1_code                                               MP
key                                                  US_MP_69120
date                                                  2020-07-14
total_confirmed                                                1
total_deceased                                                 0
_vec               Northern Mariana Islands|69120|MP|US_MP_69120
Name: 2016, dtype: object
  warnings.warn(f"{''.join((f'[{tag}]' for tag in tags))} {msg}")
Documents/open-covid-19/data/src/lib/error_logger.py:30: UserWarning: [2020-07-30T15:33:45.4589][NytCovidL3DataSource] Key provided but not found in metadata:
subregion1_name                         Virgin Islands
subregion2_code                                  78010
subregion1_code                                     VI
key                                        US_VI_78010
date                                        2020-04-06
total_confirmed                                     13
total_deceased                                       0
_vec               Virgin Islands|78010|VI|US_VI_78010
Name: 2865, dtype: object
  warnings.warn(f"{''.join((f'[{tag}]' for tag in tags))} {msg}")
Documents/open-covid-19/data/src/lib/error_logger.py:30: UserWarning: [2020-07-30T15:33:45.4602][NytCovidL3DataSource] Key provided but not found in metadata:
subregion1_name                         Virgin Islands
subregion2_code                                  78020
subregion1_code                                     VI
key                                        US_VI_78020
date                                        2020-04-06
total_confirmed                                      2
total_deceased                                       0
_vec               Virgin Islands|78020|VI|US_VI_78020
Name: 2866, dtype: object
  warnings.warn(f"{''.join((f'[{tag}]' for tag in tags))} {msg}")
Documents/open-covid-19/data/src/lib/error_logger.py:30: UserWarning: [2020-07-30T15:33:45.4613][NytCovidL3DataSource] Key provided but not found in metadata:
subregion1_name                         Virgin Islands
subregion2_code                                  78030
subregion1_code                                     VI
key                                        US_VI_78030
date                                        2020-04-06
total_confirmed                                     29
total_deceased                                       0
_vec               Virgin Islands|78030|VI|US_VI_78030
Name: 2867, dtype: object

Date formats incorrect for LY and CZ

The output of epidemiology.csv has invalid date formats at the start of the file (500+ lines).

e.g.

01.04.2020,CZ_10,,3,,,,3,,

07/21/2020,LY_BA,,,,,46,3,37,
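
Assuming the CZ rows use dd.mm.yyyy and the LY rows mm/dd/yyyy, the offending values can be normalized to the ISO 8601 format used by the rest of the file:

    from datetime import datetime

    def to_iso(date: str) -> str:
        # Formats seen in the bad rows: CZ dd.mm.yyyy, LY mm/dd/yyyy
        for fmt in ("%Y-%m-%d", "%d.%m.%Y", "%m/%d/%Y"):
            try:
                return datetime.strptime(date, fmt).strftime("%Y-%m-%d")
            except ValueError:
                continue
        raise ValueError(f"Unrecognized date format: {date}")

    assert to_iso("01.04.2020") == "2020-04-01"
    assert to_iso("07/21/2020") == "2020-07-21"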

Peru province data

Hi,

I noticed some discrepancies in the subregion1-level data for Peru. For example, in Madre de Dios and Arequipa the total_confirmed (and by extension the new_confirmed) cases are much lower than what is currently reported: both are in the 100s in this dataset, but according to online reporting the total cases in both regions are in the 1000s. Could you double-check the source, please?

Thank you

Automated error report

Traceback (most recent call last):
  File "/env/lib/python3.7/site-packages/google/api_core/grpc_helpers.py", line 57, in error_remapped_callable
    return callable_(*args, **kwargs)
  File "/env/lib/python3.7/site-packages/grpc/_channel.py", line 826, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/env/lib/python3.7/site-packages/grpc/_channel.py", line 729, in _end_unary_response_blocking
    raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:

Finland data source times out

requests.exceptions.ConnectionError: HTTPSConnectionPool(host='sampo.thl.fi', port=443): Max retries exceeded with url: /pivot/prod/fi/epirapo/covid19case/fact_epirapo_covid19case.csv?column=hcdmunicipality2020-445268L (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f8966b67f28>: Failed to establish a new connection: [Errno 110] Connection timed out'))
at send (/env/lib/python3.7/site-packages/requests/adapters.py:516)
at send (/env/lib/python3.7/site-packages/requests/sessions.py:643)
at request (/env/lib/python3.7/site-packages/requests/sessions.py:530)
at request (/env/lib/python3.7/site-packages/requests/api.py:61)
at get (/env/lib/python3.7/site-packages/requests/api.py:76)
at download (/home/vmagent/app/lib/net.py:79)
at _pull_source (/home/vmagent/app/appengine.py:165)

Automated error report

Traceback (most recent call last):
  File "/env/lib/python3.7/site-packages/requests/adapters.py", line 449, in send
    timeout=timeout
  File "/env/lib/python3.7/site-packages/urllib3/connectionpool.py", line 727, in urlopen
    method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
  File "/env/lib/python3.7/site-packages/urllib3/util/retry.py", line 403, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/env/lib/python3.7/site-packages/urllib3/packages/six.py", line 734, in reraise
    raise value.with_traceback(tb)
  File "/env/lib/python3.7/site-packages/urllib3/connectionpool.py", line 677, in urlopen
    chunked=chunked,
  File "/env/lib/python3.7/site-packages/urllib3/connectionpool.py", line 426, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/env/lib/python3.7/site-packages/urllib3/connectionpool.py", line 421, in _make_request

Automated error report

Traceback (most recent call last):
  File "/env/lib/python3.7/site-packages/flask/app.py", line 2447, in wsgi_app
    response = self.full_dispatch_request()
  File "/env/lib/python3.7/site-packages/flask/app.py", line 1952, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/env/lib/python3.7/site-packages/flask/app.py", line 1821, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/env/lib/python3.7/site-packages/flask/_compat.py", line 39, in reraise
    raise value
  File "/env/lib/python3.7/site-packages/flask/app.py", line 1950, in full_dispatch_request
    rv = self.dispatch_request()
  File "/env/lib/python3.7/site-packages/flask/app.py", line 1936, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/home/vmagent/app/appengine.py", line 350, in report_errors_to_github
    return register_new_errors(os.getenv(ENV_PROJECT))
  File "/home/vmagent/app/scripts/cloud_error_processing.py", line 57, in register_new_errors
    gh_issue_handler = GithubIssueHandler(gcs_project_name)
  File "/home/vmagent/app/scripts/cloud_error_processing.py", line 21, in __init__
    self._password = self._get_github_token()
  File "/home/vmagent/app/scripts/cloud_error_processing.py", line 31, in _get_github_token
    response = client.access_secret_version(name)
  File "/env/lib/python3.7/site-packages/google/cloud/secretmanager_v1/gapic/secret_manager_service_client.py", line 968, in access_secret_version
    request, retry=retry, timeout=timeout, metadata=metadata
  File "/env/lib/python3.7/site-packages/google/api_core/gapic_v1/method.py", line 145, in __call__
    return wrapped_func(*args, **kwargs)
  File "/env/lib/python3.7/site-packages/google/api_core/retry.py", line 286, in retry_wrapped_func
    on_error=on_error,
  File "/env/lib/python3.7/site-packages/google/api_core/retry.py", line 184, in retry_target
    return target()
  File "/env/lib/python3.7/site-packages/google/api_core/timeout.py", line 214, in func_with_timeout
    return func(*args, **kwargs)
  File "/env/lib/python3.7/site-packages/google/api_core/grpc_helpers.py", line 59, in error_remapped_callable
    six.raise_from(exceptions.from_grpc_error(exc), exc)
  File "<string>", line 3, in raise_from
google.api_core.exceptions.PermissionDenied: 403 Permission 'secretmanager.versions.access' denied for resource 'projects/github-open-covid-19/secrets/github-token/versions/latest' (or it may not exist).

Scotland hospitalizations data source fails to fetch


/home/vmagent/app/lib/error_logger.py:30: UserWarning: [2020-07-30T13:12:17.4273][ScotlandDataSource] Error running data source ScotlandDataSource with config {'name': 'pipelines.hospitalizations.gb_authority.ScotlandDataSource', 'fetch': [{'url': 'https://www.gov.scot/binaries/content/documents/govscot/publications/statistics/2020/04/coronavirus-covid-19-trends-in-daily-data/documents/covid-19-data-by-nhs-board/covid-19-data-by-nhs-board/govscot%3Adocument/COVID-19%2Bdata%2Bby%2BNHS%2BBoard.xlsx'}], 'test': {'metadata_query': "key.str.match('GB_SCT.*')", 'skip': True}, 'automation': {'job_group': '9'}}
warnings.warn(f"{''.join((f'[{tag}]' for tag in tags))} {msg}")
Traceback (most recent call last):
  File "/env/lib/python3.7/site-packages/xlrd/book.py", line 474, in sheet_by_name
    sheetx = self._sheet_names.index(sheet_name)
ValueError: 'Table 3a - Hospital Confirmed' is not in list

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/vmagent/app/lib/pipeline.py", line 169, in _run_wrapper
    return data_source.run(output_folder, cache, aux)
  File "/home/vmagent/app/lib/data_source.py", line 205, in run
    data = self.parse(data, {name: df.copy() for name, df in aux.items()}, **parse_opts)
  File "/home/vmagent/app/pipelines/hospitalizations/gb_authority.py", line 48, in parse
    sources[0], sheet_name="Table 3a - Hospital Confirmed", value_name="new_hospitalized"
  File "/home/vmagent/app/pipelines/hospitalizations/gb_authority.py", line 26, in _parse
    data = read_file(file_path, sheet_name=sheet_name)
  File "/home/vmagent/app/lib/io.py", line 102, in read_file
    path, **{**{"keep_na": False, "na_values": ["", "N/A"]}, **read_opts}
  File "/env/lib/python3.7/site-packages/pandas/io/excel/_base.py", line 334, in read_excel
    **kwds,
  File "/env/lib/python3.7/site-packages/pandas/io/excel/_base.py", line 888, in parse
    **kwds,
  File "/env/lib/python3.7/site-packages/pandas/io/excel/_base.py", line 439, in parse
    sheet = self.get_sheet_by_name(asheetname)
  File "/env/lib/python3.7/site-packages/pandas/io/excel/_xlrd.py", line 43, in get_sheet_by_name
    return self.book.sheet_by_name(name)
  File "/env/lib/python3.7/site-packages/xlrd/book.py", line 476, in sheet_by_name
    raise XLRDError('No sheet named <%r>' % sheet_name)
xlrd.biffh.XLRDError: No sheet named <'Table 3a - Hospital Confirmed'>
/home/vmagent/app/lib/error_logger.py:30: UserWarning: [2020-07-30T13:12:17.4302][DataPipeline] No output for ScotlandDataSource with config {'name': 'pipelines.hospitalizations.gb_authority.ScotlandDataSource', 'fetch': [{'url': 'https://www.gov.scot/binaries/content/documents/govscot/publications/statistics/2020/04/coronavirus-covid-19-trends-in-daily-data/documents/covid-19-data-by-nhs-board/covid-19-data-by-nhs-board/govscot%3Adocument/COVID-19%2Bdata%2Bby%2BNHS%2BBoard.xlsx'}], 'test': {'metadata_query': "key.str.match('GB_SCT.*')", 'skip': True}, 'automation': {'job_group': '9'}}
warnings.warn(f"{''.join((f'[{tag}]' for tag in tags))} {msg}")

Philippines data source shows multiple failed matches

/home/vmagent/app/lib/error_logger.py:30: UserWarning: [2020-07-30T13:00:34.5223][PhilippinesDataSource] No key match found for:
match_string
subregion2_code NaN
date 2020-03-07
match_string_province
age
sex female
new_confirmed 1
new_deceased 0
new_recovered 0
new_hospitalized 0
match_string_region NaN
country_code PH
_vec |nan
Name: 0, dtype: object
warnings.warn(f"{''.join((f'[{tag}]' for tag in tags))} {msg}")
/home/vmagent/app/lib/error_logger.py:30: UserWarning: [2020-07-30T13:00:34.5330][PhilippinesDataSource] No key match found for:
match_string
subregion2_code
date 2020-03-06
match_string_province NaN
age
sex female
new_confirmed 1
new_deceased 0
new_recovered 0
new_hospitalized 0
match_string_region NCR
country_code PH
_vec |
Name: 1, dtype: object
warnings.warn(f"{''.join((f'[{tag}]' for tag in tags))} {msg}")
/home/vmagent/app/lib/error_logger.py:30: UserWarning: [2020-07-30T13:00:34.7802][PhilippinesDataSource] No key match found for:
match_string CITY OF ISABELA (NOT A PROVINCE)
subregion2_code
date 2020-06-25
match_string_province NaN
age
sex female
new_confirmed 4
new_deceased 0
new_recovered 0
new_hospitalized 0
match_string_region Region IX: Zamboanga Peninsula
country_code PH
_vec CITY OF ISABELA (NOT A PROVINCE)|
Name: 31, dtype: object
warnings.warn(f"{''.join((f'[{tag}]' for tag in tags))} {msg}")
/home/vmagent/app/lib/error_logger.py:30: UserWarning: [2020-07-30T13:00:34.7914][PhilippinesDataSource] No key match found for:
match_string COTABATO (NORTH COTABATO)
subregion2_code
date 2020-03-20
match_string_province NaN
age
sex female
new_confirmed 1
new_deceased 0
new_recovered 0
new_hospitalized 0
match_string_region NaN
country_code PH
_vec REPATRIATE|nan
Name: 80, dtype: object
warnings.warn(f"{''.join((f'[{tag}]' for tag in tags))} {msg}")
/home/vmagent/app/lib/error_logger.py:30: UserWarning: [2020-07-30T13:00:35.2155][PhilippinesDataSource] No key match found for:
match_string SAMAR (WESTERN SAMAR)
subregion2_code
date 2020-03-25
match_string_province NaN
age
sex male
new_confirmed 1
new_deceased 0
new_recovered 0
new_hospitalized 1
match_string_region Region VIII: Eastern Visayas
country_code PH
_vec SAMAR (WESTERN SAMAR)|
Name: 83, dtype: object
warnings.warn(f"{''.join((f'[{tag}]' for tag in tags))} {msg}")
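
Two patterns stand out in these failures: empty match strings and parenthetical qualifiers such as "CITY OF ISABELA (NOT A PROVINCE)". An illustrative cleanup sketch (names are assumptions, not the pipeline's API):

    import re

    def clean_match_string(name: str) -> str:
        # "COTABATO (NORTH COTABATO)" -> "COTABATO"
        return re.sub(r"\s*\(.*?\)\s*$", "", name).strip()

    def drop_unmatchable(data):
        # Rows with an empty match string can never match a key
        mask = data["match_string"].fillna("").str.strip() == ""
        return data[~mask]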
