
covid's People

Contributors

jhellewell14, joehickson, kathsherratt, sbfnk, seabbs

covid's Issues

Date of report

Put the date on which the report was generated at the top of each page, along with "using data up to 2020-xx-xx". At present, the Rt estimates labelled "as of..." can give the impression that the report is outdated.

rate of growth, R^2

Ref: https://epiforecasts.io/covid/posts/national/united-states/ Figure 3

First, the rate of growth should be properly defined. If the serial interval is assumed to be constant, the rate of growth and the effective reproduction number are equivalent (each can be derived from the other). The rate of growth therefore needs an explicit time frame; IIRC it is the daily growth rate (at the beginning we had ~25% more infections each day).
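A minimal sketch of that equivalence, assuming a fixed generation interval (every infector/infectee pair separated by exactly T days, with the serial interval used as a proxy). Under that assumption R = exp(rT), where r is the continuous daily growth rate; the 5-day interval below is a hypothetical value, not one taken from the site.

```python
import math


def growth_rate_from_daily_factor(factor: float) -> float:
    """Continuous daily growth rate r from a day-on-day multiplier (1.25 = +25%/day)."""
    return math.log(factor)


def r_to_R(r: float, generation_time: float) -> float:
    """Reproduction number implied by growth rate r when every generation
    interval is exactly `generation_time` days (fixed, not a distribution)."""
    return math.exp(r * generation_time)


def R_to_r(R: float, generation_time: float) -> float:
    """Inverse of r_to_R under the same fixed-interval assumption."""
    return math.log(R) / generation_time


def doubling_time(r: float) -> float:
    """Days for incidence to double at growth rate r > 0."""
    return math.log(2) / r


r = growth_rate_from_daily_factor(1.25)          # ~25% more infections per day
print(round(r, 3))                               # 0.223
print(round(doubling_time(r), 2))                # 3.11
print(round(r_to_R(r, generation_time=5.0), 2))  # 3.05 for a hypothetical 5-day interval
```

With a generation-interval *distribution* rather than a fixed value the mapping between r and R changes, so the time frame and the interval assumption both need stating alongside any growth rate.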

I have not fully understood how the coefficient of determination values in Fig. 3C were calculated. Are they an evaluation of panel A or panel B? The caption says "with values closer to 1 indicating a better fit", so I would doubt the whole calculation if R squared goes negative.
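For reference, a negative R² is not necessarily a bug. If it is computed as 1 - SS_res/SS_tot on predictions that were not produced by a least-squares fit to the same data (for example, out-of-sample forecasts), it drops below 0 whenever the predictions do worse than simply predicting the mean. This is a sketch with made-up numbers; how Fig. 3C was actually computed is not stated here.

```python
def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot.

    SS_tot is measured around the mean of the observations, so the result is
    negative whenever the predictions fit worse than that constant mean."""
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


obs = [10, 12, 15, 20]                   # made-up observations
print(r_squared(obs, [10, 12, 15, 20]))  # 1.0 (perfect predictions)
print(r_squared(obs, [20, 15, 12, 10]))  # about -2.84: worse than predicting the mean
```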

trend categories: NA vs unsure vs ...

For the color/labelling scheme:

  • Is there a likely decreasing/decreasing category? If so, it should be shown even if there are currently no examples.
  • There is currently no apparent gradient from increasing -> likely increasing -> unsure, although there is one conceptually; is it worth having a gradient in the color scheme?
  • The NAs need to be more distinct from the other colors.
  • The NAs could use a more useful label. There seem to be two flavors: no reported cases, and not yet at the 100-case threshold?

Short term changes

  • Clarify what "confidence" means in figure captions (perhaps we can drop it entirely, given the added lag?). In the figures showing Rt over time, the caption "Confidence in the estimated values is indicated by shading with reduced shading" might be clearer if it talked about transparency, as "shading" may lead people to think of the confidence interval.

Website feedback

Overall I think this is great. Comments below are meant as helpful suggestions, not eviscerating the work that's been done.

  • Agree with @kathsherratt in #7 on colour. Greys are too similar, and I'm not sure that blue adequately represents that there are increasing cases. I don't think red-yellow-green is the right palette to use but colorbrewer's 3-class RdYlBu scheme is broadly interpretable as "bad", "not so bad", "good". The five-class scheme can be used for increasing, likely increasing, unsure, likely decreasing, decreasing, with grey reserved for "No data".

  • The equirectangular map projection is better than Mercator, but you can probably ditch Antarctica and consider a separate plot for each of the World Bank's continent definitions, optionally splitting the Americas into South and "North and Central". There is too much variation in country area (both true and distorted) to show all nations on one map.

  • Figures 2 and 3 are missing, or the relabelling has broken.

  • I assume that "likely increasing" means the median R0 is > 1 and that 1 lies outside the 50% interval but inside the 90% interval. This isn't stated on the page, though, which makes figures 1 and 4 a little difficult to interpret.

  • Using a different colour for the R0 estimates and the number of cases by date of infection would help make it clear to the reader that figures 5 (and 7) and 6 (and 8) show different things. The consistency of style is great, but the colouring can help identify that they're different. Please don't reuse the colours from the likely increasing/decreasing when doing this, though.

  • The caption for Table 2 should go above the table. It might also be worth adding a note that a doubling time estimate of "cases decreasing" means cases are no longer doubling, so the doubling time is effectively infinite. One way we got around this in the outbreak delay paper was to say "at least 4.5 days" rather than "4.5 - outbreak delayed".
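The classification rule guessed in the comments above can be written down explicitly, which would also make it easy to state on the page. This is a sketch of that *assumed* rule only (whole 90% interval on one side of 1 gives increasing/decreasing; only the 50% interval on one side gives "likely"). It is not confirmed to be the rule the site actually uses, and the function name is hypothetical.

```python
def classify_trend(lower_90, lower_50, upper_50, upper_90, threshold=1.0):
    """Hypothetical trend label for an R estimate from its 50% and 90% intervals.

    Encodes the rule assumed in the feedback, not a confirmed site definition."""
    if lower_90 > threshold:   # entire 90% interval above 1
        return "increasing"
    if upper_90 < threshold:   # entire 90% interval below 1
        return "decreasing"
    if lower_50 > threshold:   # 50% interval above 1, 90% interval contains 1
        return "likely increasing"
    if upper_50 < threshold:   # 50% interval below 1, 90% interval contains 1
        return "likely decreasing"
    return "unsure"            # 50% interval contains 1


print(classify_trend(0.9, 1.1, 1.3, 1.5))  # likely increasing
print(classify_trend(0.8, 0.9, 1.1, 1.2))  # unsure
```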

Really great work.

y on summary plot axis

plot_summary from EpiNow is showing a small y on the axis when it should not. Remove for next update.

Brazilian data adm level 1

Hi,

I have been working on a Brazilian task force for COVID-19.

Would you guys be interested in adding Brazilian data at the subnational level (adm 1)? I could help point out where the data are and help with translation if needed.

Cheers,
Leo

Seb comments

  • Highlight the impact of changes in testing, testing saturation and general step changes in testing on all pages.

  • “New infections” is possibly a misleading label. It is the number of new infections that ultimately get confirmed (which means something different in every country). Perhaps call it “New cases by infection date” and, in figure labels such as Fig. 6 on the “Global” page, slightly rephrase to “Cases by date of report and their estimated date of infection”.

  • Big reproduction number plot (global): would this look bad if all the panels were on the same time scale (probably, because of China)?

  • Latest estimates table: I’d remove “new infections” for the reasons given above, unless we can come up with a better term.

Speed up using sequential regions

Some regions/countries take much longer to simulate than others. It might make sense to run all regions sequentially, with parallelisation within each region. For example, in the USA regional breakdown one state takes three times longer to run than any other, and during that time all cores except one are idle.
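A back-of-the-envelope comparison of the two strategies, using hypothetical run times. It assumes the current setup hands each region to one core (approximated here by longest-job-first scheduling) and, optimistically, that a single region's work splits perfectly across all cores when run on its own. In practice within-region parallelism is capped (for example by the number of MCMC chains), so the sequential figure is a lower bound.

```python
import heapq


def makespan_parallel_regions(durations, cores):
    """Wall-clock time when each region runs on one core and regions are
    spread across cores (longest job first, each to the least-loaded core)."""
    loads = [0.0] * cores
    heapq.heapify(loads)
    for d in sorted(durations, reverse=True):
        least = heapq.heappop(loads)
        heapq.heappush(loads, least + d)
    return max(loads)


def makespan_sequential_regions(durations, cores):
    """Wall-clock time when regions run one after another, each using all cores.
    Optimistic: assumes a region's work splits perfectly across cores."""
    return sum(durations) / cores


# Hypothetical single-core run times: one slow state, seven typical ones.
durations = [3, 1, 1, 1, 1, 1, 1, 1]
print(makespan_parallel_regions(durations, cores=4))    # 3.0, bounded by the slow state
print(makespan_sequential_regions(durations, cores=4))  # 2.5
```

The gap grows as the slowest region dominates: with per-region parallelism, total time can never drop below that region's single-core run time.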

Nick comments

This looks great.

  • Comments: The ribbons showing cases by date of infection are beautiful but to me are a little tricky to interpret. I hate to say this, but would this maybe be clearer as a geom_pointrange (with a thicker line for 50% CrI?)

  • The title "Summary of latest reproduction number and case count estimates by date of infection" to me was confusing as the plot is not by date.

  • Can there be more detail on the difference between wide CrI and the measure of uncertainty that gets reflected as translucency?

  • Can you show distributions that go into estimating e.g. time between infection and reporting?

  • Can I ask about the doubling times which often have an upper bound of infinity, and for which the point estimate sometimes is not within the uncertainty range (e.g. -100 (14 – Inf) for Italy)?

Add citation info

The website now has a DOI; this needs to be added to the website along with citation info.

Mismatch in USA data sources

It looks like the ECDC and Johns Hopkins data report different case counts for the USA, so the national and state-level estimates do not match.
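One quick way to quantify the mismatch is to compare the national daily series with the sum of the state series date by date. This is a sketch using plain dictionaries and made-up counts; loading the actual ECDC and Johns Hopkins files is left out, and the function name is hypothetical.

```python
def find_mismatches(national, states, tolerance=0):
    """Compare a national daily case series with the sum of its state series.

    national: {date: cases}; states: {state: {date: cases}}.
    Returns {date: (national, sum_of_states)} where they differ by more than
    `tolerance`. A date missing from a state series counts as 0 for that state."""
    mismatches = {}
    for day in sorted(national):
        state_total = sum(series.get(day, 0) for series in states.values())
        if abs(national[day] - state_total) > tolerance:
            mismatches[day] = (national[day], state_total)
    return mismatches


# Made-up daily counts for two states.
national = {"2020-04-01": 100, "2020-04-02": 120}
states = {
    "State A": {"2020-04-01": 60, "2020-04-02": 70},
    "State B": {"2020-04-01": 40, "2020-04-02": 45},
}
print(find_mismatches(national, states))  # {'2020-04-02': (120, 115)}
```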

Data for interactive maps and data visualisations

I have been working on interactive maps and data visualisations in consultation with @seabbs. The visualisations are interactive SVGs written with d3.js and packaged as HTML widgets for inclusion in the .Rmd document.

There is a sample vis here.

Can I request two files to make this more straightforward and reliable?

rt.csv is working well to generate the r0 plot for each country. Could we have a similar file for the nowcasts and a summary file for country classifications?

For nowcasts:

Proposed file columns:

  • country: country name, using the same country names as rt.csv
  • date: date
  • median: nowcast median
  • lower_90: nowcast lower 90% CI (currently named 'bottom')
  • upper_90: nowcast upper 90% CI (currently named 'top')
  • lower_50: nowcast lower 50% CI (currently named 'lower')
  • upper_50: nowcast upper 50% CI (currently named 'upper')
  • cases: number of cases on each date, with all NA values set to 0

(assuming that the original columns are 50% and 90% CIs)

This would make the format of the new nowcast csv file the same as rt.csv.

For individual country visualisations, these datasets could just be subset for each country and we could output static svgs with the same styles.
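A minimal sketch of the proposed reshaping, using only the standard library. The renames come from the proposal above; the rest of the input layout (a `median` column, a `country` column, `cases` written as `NA` when missing) is an assumption about the current file, and the single input row is made up.

```python
import csv
import io

# Renames proposed above; the remaining input columns are assumed.
RENAME = {"bottom": "lower_90", "top": "upper_90",
          "lower": "lower_50", "upper": "upper_50"}
OUTPUT_COLUMNS = ["country", "date", "median", "lower_90", "upper_90",
                  "lower_50", "upper_50", "cases"]


def reshape_nowcast(rows):
    """Rename the interval columns and replace missing case counts with 0."""
    reshaped = []
    for row in rows:
        renamed = {RENAME.get(key, key): value for key, value in row.items()}
        if renamed.get("cases") in (None, "", "NA"):
            renamed["cases"] = "0"
        # Keep only the proposed columns, in the proposed order.
        reshaped.append({col: renamed.get(col, "") for col in OUTPUT_COLUMNS})
    return reshaped


def write_nowcast_csv(rows, handle):
    writer = csv.DictWriter(handle, fieldnames=OUTPUT_COLUMNS)
    writer.writeheader()
    writer.writerows(rows)


# Usage with one made-up row (placeholder numbers, not real estimates).
raw = io.StringIO("country,date,median,bottom,top,lower,upper,cases\n"
                  "Country A,2020-04-01,120,80,170,100,140,NA\n")
rows = reshape_nowcast(csv.DictReader(raw))
out = io.StringIO()
write_nowcast_csv(rows, out)
print(out.getvalue())
```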

For summary map:

Could we output a summary CSV file with the classification of each country?

Proposed file columns:

  • country: country name, using the same country names as rt.csv and the file above
  • trajectory: one of 6 coded values: decreasing, likely_decreasing, unsure, increasing, likely_increasing, no_data

This would allow a quick join of the map data to each day's classification, followed by styling.
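A sketch of writing the proposed summary file. It assumes the classifications are available as display labels such as "Likely increasing" (a hypothetical input form) and maps a missing label to `no_data`; the country names and labels in the usage are placeholders.

```python
import csv
import io

# The six trajectory codes proposed above.
TRAJECTORY_CODES = {"decreasing", "likely_decreasing", "unsure",
                    "increasing", "likely_increasing", "no_data"}


def to_code(label):
    """Map a display label such as 'Likely increasing' to its code; None -> 'no_data'."""
    if label is None:
        return "no_data"
    code = label.strip().lower().replace(" ", "_")
    if code not in TRAJECTORY_CODES:
        raise ValueError(f"unknown trajectory label: {label!r}")
    return code


def write_summary(classifications, handle):
    """Write one row per country; `classifications` maps country -> label or None."""
    writer = csv.writer(handle)
    writer.writerow(["country", "trajectory"])
    for country in sorted(classifications):
        writer.writerow([country, to_code(classifications[country])])


# Usage with placeholder countries and labels.
out = io.StringIO()
write_summary({"Country A": "Likely decreasing", "Country B": None}, out)
print(out.getvalue())
```

Failing loudly on an unknown label keeps the map join from silently producing an unstyled country.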

Thanks, let me know what you think.

Medium term changes

  • Add a second global map that is split by continent (@samclifford).
  • Make NAs informative (can be short term if someone has a speedy fix @jhellewell14 @pearsonca). Flagged this in the caption, limitations, and methods for now (3f63772).
  • Figure interactivity
  • Consider fixed x-axis in comparative plots
  • Add two dimensions to plots with hatching and colours for certainty and scale
  • Split nowcast results from the website to make it more modular. Keep out-of-date nowcast results off GitHub in deep storage so the git repo stays relatively lightweight. Do this by moving nowcasts into a new repo, covid-nowcasts, and including it as a submodule. Regularly purge the git history and archive historic results to Dropbox.
  • Show report delay as a plot in the methods

Colour Palette

Multiple reviewers have flagged colour palette as an issue for the map and summary plot. In this meta-thread please battle out your colour palette choices. (I will then choose the survivor)

  • Choose palette
  • Implement palette

Selected comments

From @samclifford: Agree with @kathsherratt in #7 on colour. Greys are too similar, and I'm not sure that blue adequately represents that there are increasing cases. I don't think red-yellow-green is the right palette to use but colorbrewer's 3-class RdYlBu scheme is broadly interpretable as "bad", "not so bad", "good". The five-class scheme can be used for increasing, likely increasing, unsure, likely decreasing, decreasing, with grey reserved for "No data".

From @kathsherratt: Figure 1: “Likely Increasing” colour grey > looks a bit too close to NA to me - could change to e.g. light blue - or to a sequential scale so that all three values are in colour order (Increasing > Likely increasing > Unsure)

From @jhellewell14 : Pick better colours to correspond to Increasing, Likely increasing etc. and sync them with all other plots (begun this in a branch)

From @pearsonca: there's no apparent gradient currently from increasing -> likely increasing -> unsure, but there is conceptually; worth having a gradient in the color scheme?
the NAs need to be more distinct from other colors
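As one concrete candidate for this thread, here is @samclifford's suggestion written out: the ColorBrewer 5-class RdYlBu hex values mapped from increasing (red) to decreasing (blue), with grey reserved for no data. The keys reuse the trajectory codes proposed in the data-file issue, and the exact grey shade is an arbitrary placeholder rather than a decided value.

```python
# ColorBrewer 5-class RdYlBu, ordered from "bad" (red) to "good" (blue).
RDYLBU_5 = ["#d7191c", "#fdae61", "#ffffbf", "#abd9e9", "#2c7bb6"]

TRAJECTORY_ORDER = ["increasing", "likely_increasing", "unsure",
                    "likely_decreasing", "decreasing"]

# Grey reserved for "No data"; this particular shade is a placeholder choice.
NO_DATA_GREY = "#999999"

PALETTE = dict(zip(TRAJECTORY_ORDER, RDYLBU_5))
PALETTE["no_data"] = NO_DATA_GREY

for category, colour in PALETTE.items():
    print(f"{category:18} {colour}")
```

Because the five classes follow one diverging scale, "likely increasing" sits visibly between "increasing" and "unsure", and the only grey on the map means missing data.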

A Reminder of the stakes


Venezuela R=

Do you have information about the COVID-19 R0 for Venezuela?

Minor comments

Some minor and non-critical comments:

Global summary page:

  • General: use of word “regions” > personal preference is the word “countries” instead as regions sounds more like a group of countries (but only a preference, either works)
  • Figure 1: “Likely Increasing” colour grey > looks a bit too close to NA to me - could change to e.g. light blue - or to a sequential scale so that all three values are in colour order (Increasing > Likely increasing > Unsure)
  • Figure 4: it might be easier to read and compare if ordered by geographic region, but not a problem as is
  • Figure 8: to me the title reads a bit awkwardly and doesn’t intuitively match the caption (which explains really well). Going by the caption, I’d change from “Cases with date of onset on the day of report generation in all regions” to “Cases by date of report, and estimated cases by date of infection”

Methods page:

  • Typo in header caption: “publically” > “publicly” available data

German region map

It looks like German regions are now not mapping correctly. This may be a change in region name processing or maybe a problem higher up the tool chain (NCoVUtils).

-> Release

To do:

  • Make it possible to fit the reporting delay across a single region and then apply it in sub-regions (EpiNow, in progress)
  • Add fitting of the report delay to the regional pipeline (EpiNow, in progress)
  • Reorganise global/nowcast to be based on regional_rt_pipeline rather than the current custom set-up
  • Write a function to generate a page for each country rather than manual creation
  • Make sure the global map still works
  • Check each regional analysis still works
  • Move from using the templates here to using the templates built into EpiNow (broadly the same)
  • Change title from Epiforecasts -> Covid-19
  • Think about URL structure vs organisation in the repo. It probably makes sense to move all pages into the _posts folder at build time but otherwise keep separate.
  • Sort the organisation of pages in the site yml or find another way to organise
  • Update contributors page without splits
  • Update methods, summary, and limitations with new methods
  • Review and bug check
  • Run all analysis
  • Check
  • Internal peer review
  • Changes from internal peer review
  • Release

Data for Germany

Great work, thank you for providing this!

Could you please clarify which exact data source you are using for Germany? The linked source https://github.com/jgehrcke/covid-19-germany-gae provides official data from the RKI as well as data "curated" by two large German newspapers. The newspaper source is always more recent, but the quality of its curation is debatable. The data plotted in your tool today look unreliable (there seems to be a gap over the last few days, and they look generally inconsistent with the official data). Maybe it would be worth switching to the official RKI data? It might even be an option to use the RKI data, augmented with another source for the last three days where the RKI lags behind. Please see the screenshots attached.

Thank you

"Your Data" (gap, weird jumps):
[Screenshot 2020-05-04 at 19 37 02]

"Official Data" (dense, looking very consistent in general except for the weekend effect):
[Screenshot 2020-05-04 at 19 36 37]

Restrict scale on state comparison

It would be nice if the comparative plot were limited to a maximum y-axis value of 3 (similar to the other R plots). Guam has been throwing things off for the last several updates.

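If the axis is capped at 3, values above the cap are better clipped and flagged than dropped; otherwise a ribbon such as Guam's would simply vanish. A small sketch of that choice, with a hypothetical helper and made-up bounds. (In ggplot2 the analogous distinction is that `coord_cartesian(ylim = c(0, 3))` zooms without discarding data, whereas setting limits on the y scale removes out-of-range values.)

```python
def cap_for_plot(values, ymax=3.0):
    """Clip values at the plotting ceiling `ymax` and return the indices that
    were clipped, so truncated intervals can be marked rather than hidden."""
    capped = [min(v, ymax) for v in values]
    clipped = [i for i, v in enumerate(values) if v > ymax]
    return capped, clipped


upper_90 = [1.4, 2.1, 7.8]      # made-up upper bounds; the last would stretch the axis
print(cap_for_plot(upper_90))   # ([1.4, 2.1, 3.0], [2])
```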

website update interval

The website was updated every ~2 days for a while, but there have been no updates since about a week ago. I suggest adding a clear indication of the update frequency/schedule on the website.

Reported doubling times in table have strange intervals

At the time of writing, Table 1 on https://epiforecasts.io/covid/posts/global/ has some bizarre entries, such as Australia's doubling time of -71 (7 – -6), or Austria's of 440 (15 – -16). Belgium has -14 (-22 – -10), which indicates the issue might be the way doubling/halving times are reported back to the table when the interval contains 0.

Cameroon is 110 (15 – -21) and the estimate is not contained within the interval at all. Same with Cote d'Ivoire, 51 (9.7 – -16). Croatia is -10 (15 – -3.8). Bahrain is 200 (17 – -20). These are reflected in the national summaries, e.g. https://epiforecasts.io/covid/posts/national/cameroon/ and it's clear that there's something wrong when converting from growth rates to doubling/halving times.
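The pattern is consistent with inverting the growth-rate interval endpoints directly. The doubling time ln(2)/r is monotone on each side of r = 0 but passes through infinity at 0, so the endpoints can only be inverted when the growth-rate interval excludes 0. A sketch of one way to report the three cases, assuming a daily growth rate r with a median and interval (the function name and output wording are illustrative, not the site's):

```python
import math

LN2 = math.log(2)


def format_doubling(r_median, r_lower, r_upper):
    """Format a doubling/halving time from a daily growth rate and its interval.

    ln(2)/|r| is monotone on each side of r = 0 but jumps through infinity at 0,
    so endpoints can be inverted directly only when the interval excludes 0."""
    if r_lower > 0:  # growing across the whole interval
        return (f"doubling {LN2 / r_median:.1f} "
                f"({LN2 / r_upper:.1f} to {LN2 / r_lower:.1f}) days")
    if r_upper < 0:  # shrinking across the whole interval: report a halving time
        return (f"halving {LN2 / -r_median:.1f} "
                f"({LN2 / -r_lower:.1f} to {LN2 / -r_upper:.1f}) days")
    # The interval contains 0: only lower bounds on the times are finite.
    parts = []
    if r_upper > 0:
        parts.append(f"doubling time at least {LN2 / r_upper:.1f} days")
    if r_lower < 0:
        parts.append(f"halving time at least {LN2 / -r_lower:.1f} days")
    if not parts:
        return "no detectable growth or decline"
    return "uncertain: " + " or ".join(parts)


print(format_doubling(0.1, 0.05, 0.2))     # doubling 6.9 (3.5 to 13.9) days
print(format_doubling(-0.1, -0.2, -0.05))  # halving 6.9 (3.5 to 13.9) days
print(format_doubling(0.01, -0.1, 0.05))
```

Reporting "at least X days" in the straddling case matches the approach suggested for Table 2 in the earlier website feedback.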

UK references

References on the UK sub-national page need updating to reflect new data sources, as below:

‘Coronavirus (COVID-19) UK Historical Data’. White, T., 2020.
‘Coronavirus (COVID-19) Cases in the UK’. Public Health England, 2020.

Highlight testing limitation

Make it very clear on all pages that estimates are impacted by changes in testing and reporting. Running out of tests is a particular issue until a new reporting equilibrium is arrived at.

IT region map

Two of Italy's regions are missing both in the map and the table:

  • Basilicata (Provinces Potenza and Matera)
  • Molise (Provinces Campobasso and Isernia)
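Missing regions like these (and the German mapping problem above) can be caught automatically by comparing the region names in the map or reference list with those in the estimates. A sketch with a hypothetical normalisation step (lower-casing, trimming and stripping accents); whether name mismatches are the actual cause here is not established, and spellings such as "ue" for "ü" would still need an explicit lookup table.

```python
import unicodedata


def normalise(name):
    """Lower-case, trim and strip accents, so 'Baden-Württemberg' and
    'baden-wurttemberg ' compare equal. This normalisation is an assumption."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return no_accents.strip().lower()


def missing_regions(expected, estimated):
    """Regions in the reference list that have no matching estimate."""
    found = {normalise(n) for n in estimated}
    return sorted(n for n in expected if normalise(n) not in found)


expected = ["Basilicata", "Molise", "Lombardia", "Friuli Venezia Giulia"]
estimated = ["lombardia", "Friuli Venezia Giulia"]  # made-up estimate output
print(missing_regions(expected, estimated))         # ['Basilicata', 'Molise']
```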

Website feedback

Methods:

  • spacing issues after citations (4f6be5e)
  • need to space out the equations more (4f6be5e)

Contributors

  • Decide upon final organisation of contributors

Italy

  • Fix NAs on Italy map. Partially fixed by changing region_codes.rds but other regions don't seem to return any result, need to check why. (2af1778)

Maps

  • Pick better colours to correspond to Increasing, Likely increasing etc. and sync them with all other plots (begun this in a branch)
  • Better explanation of why certain locations are NA, e.g. state that there are not enough data/cases

Layout

  • Put Figure 1 above the summary table on each region page

Projections for New Zealand seem to be erroneous

I note that for a day or two the projections for New Zealand have appeared to be very high. The predicted range for the reproduction number extends up to 75, and the recorded cases are dwarfed by the predicted range.

I've attached an image to illustrate.

(Niger and Palestine may be similar.)

The doubling time plot also looks odd.

I have not personally checked the number of data points and how that might impact the computations.
[Image: COVID-19_CMMID_NewZealand_TrendsWithDetail_20200430]

Remove file LICENSE as there is also LICENSE.md

I was slightly confused when trying to find the licence of this repo: the top-right link "View License" on https://github.com/epiforecasts/covid points to the file LICENSE, which appears to say that it is proprietary software:

YEAR: 2020
COPYRIGHT HOLDER: Epiforecasts

However, I then found LICENSE.md, which states that it is open source software under the MIT license. Which is great 🙂 The redundant and incomplete/confusing file LICENSE can probably simply be deleted; GitHub will then automatically pick up LICENSE.md for the licensing information.

Data issues

You have mixed up total cases with daily case load. The R0 plots for some states are clearly wrong: Hawaii, Montana and Alaska.
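If a state's input series is a running total, it needs differencing into daily counts before estimation. A minimal sketch, assuming the series starts at the beginning of reporting (so the first value is itself a daily count). Setting negative differences from downward revisions to 0 is one modelling choice among several (dropping or redistributing them are others). The numbers are made up.

```python
def cumulative_to_daily(cumulative):
    """Convert a cumulative case series into daily new cases.

    Assumes the series starts at the beginning of reporting. Negative
    differences (downward revisions of the total) are set to 0 here."""
    daily = []
    previous = 0
    for total in cumulative:
        daily.append(max(total - previous, 0))
        previous = total
    return daily


print(cumulative_to_daily([0, 2, 5, 5, 4, 10]))  # [0, 2, 3, 0, 0, 6]
```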
