
task-ts's People

Contributors

adzuci, antonpolishko, efawe, isaacmg, kritim13, maggiewyzw, mgavish, pranjalya, wwymak

task-ts's Issues

Add BEAT-19 Data

We would like to incorporate the BEAT-19 data into our models.

Acceptance Criteria

  • Code to join BEAT-19 data incorporated into the data crawler.
  • Appropriate unit test coverage.
  • Code committed on GitHub with passing unit tests.

Experiments with weather data added

From an epidemiological perspective, it would be useful to see whether weather data helps or hinders model performance. It would also be useful to see which features the model attends to.

  • Run on target large US counties w/o transfer
  • Run on target large US counties w transfer (flow)
  • Run on target large US counties w transfer (county+flow)
  • Run on target large US counties w transfer (county)
  • Run on target large Italy counties w/o transfer
  • Run on target large Italy counties w transfer (flow)
  • Run on target large Italy counties w transfer (county+flow)
  • Run on target large Italy counties w transfer (county)

Poster preparation

We will need to prepare a one page PDF poster for our workshop on Global Health at ICML.

Acceptance Criteria

  • 1 Page PDF
  • PDF verified by team members

Demographic and Disease Prevalence data

We want to gather static demographic data on the factors that may affect the number of cases/deaths in each county:

  • Population age groups
  • Disease prevalence
  • Income level
  • Population density
  • Mean distance to hospital
  • Primary industries
  • Race
  • etc.

Acceptance criteria:

  • Data should be collected for every geographic region listed in our data-frame.
  • Data should be saved to GCS/Dataverse for future analysis.
  • Data should include a column specifying the sub_region in the df.
  • Code should be committed to our task-ts repo

Update data crawler to include sub_region in mobility data

The current version of the mobility data in master is at the region level.
task-geo has been updated with the latest mobility data at the sub-region level.
Update the data crawler to reflect this change.

This issue only occurs when installing task-geo from source, which gives the latest rather than the stable version.

Benchmark models and differences versus California County Models

Based on the results of #53, we want to compare how accurately our best models forecast relative to the California models. We also want to highlight our methodological differences and how our approach can improve forecasts relative to the California models.

Acceptance criteria

  • Use the California models to back-test over May 30 to June 14.
  • Compare the MSE of our best models versus the California models.
  • Write a Wandb report showing the differences between the two sets of models.
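A minimal sketch of the MSE comparison, assuming the observed values and both models' daily forecasts are available as (date, value) pairs covering the same dates in the same order. The year 2020 for the back-test window is an assumption, and the function names are hypothetical, not part of task-ts or flow-forecast.

```python
from datetime import date
from typing import Dict, List, Sequence, Tuple

Series = Sequence[Tuple[date, float]]

# Back-test window from the issue; the year 2020 is an assumption.
START, END = date(2020, 5, 30), date(2020, 6, 14)


def mse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean squared error between two equal-length, non-empty sequences."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("sequences must be non-empty and the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def in_window(series: Series, start: date, end: date) -> List[float]:
    """Values whose date falls in [start, end], inclusive, in input order."""
    return [value for day, value in series if start <= day <= end]


def compare_models(actual: Series, ours: Series, california: Series,
                   start: date = START, end: date = END) -> Dict[str, float]:
    """MSE of our model and the California model over the same back-test window."""
    y = in_window(actual, start, end)
    return {
        "ours": mse(y, in_window(ours, start, end)),
        "california": mse(y, in_window(california, start, end)),
    }
```

Filtering every series with the same window keeps the comparison apples-to-apples; if the two models cover different dates, the series would need to be joined on date first.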

Large scale pre-trained weights on flow for transfer

In order to successfully examine pre-training we need to pre-train a large number of weights on river flow data.

Acceptance Criteria

  • Pre-trained weights for 3 and 4 encoders
  • Pre-trained weights for flow data stashed to GCS trained on at least 50 rivers.
  • Documentation page detailing which rivers the model was trained on and their evaluation metrics.

Deploy county cases and hospitalizations forecasting model with features

A central goal of our COVID-19 forecasting efforts is to deploy a model that provides tangible value to public policy officials. To do this, we need the following steps completed:

  • Add models that generalize effectively, including to coming out of lockdown, resuming lockdown, giant pool parties, bars opening, etc. Possible approaches include neural ODEs, probabilistic models, autoencoders (i.e., the Uber method), and other models.
  • Create an automatic module to compare model performance to the California county baseline models.
  • Have an epidemiologist evaluate the model's learned features and give feedback.
  • Create a continuous evaluation mode in the flow-forecast repository.
  • Create an inference mode in the flow-forecast repository.
  • Create Docker containers for flow-forecast deployment.
  • Create an Airflow DAG to run the deployed Docker container and persist predictions.
  • Create a web-based app for epidemiologists to view predictions, relevant features, and different scenarios. (This will likely be several separate issues when we get there.)
  • Create descriptions to analyze past model performance.

Rolling 7 day average

Based on @wwymak's comments, we would like to try forecasting cases as a rolling weekly (7-day) average. While this removes a degree of granularity, it may smooth over day-to-day reporting irregularities.

Acceptance criteria:

  • Experiments logged on primary counties list.
  • Pre-processing code merged into the task-ts repo.
  • Report detailing the findings of using the seven-day average.
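A rough sketch of the pre-processing step: a trailing 7-day mean over a county's daily new cases. The helper name is hypothetical; it leaves the first six days empty rather than averaging a partial week.

```python
from typing import List, Optional, Sequence


def rolling_mean(values: Sequence[float], window: int = 7) -> List[Optional[float]]:
    """Trailing rolling mean; the first window-1 entries are None (no full window yet)."""
    if window < 1:
        raise ValueError("window must be >= 1")
    out: List[Optional[float]] = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            # Drop the value that just slid out of the window.
            running -= values[i - window]
        out.append(running / window if i >= window - 1 else None)
    return out
```

In pandas the equivalent is `series.rolling(7).mean()`, which likewise yields NaN until a full 7-day window is available.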

Run experiments with enhanced evaluation metrics

  • Run on primary US counties w/o transfer
  • Run on primary US counties w transfer (flow)
  • Run on primary US counties w transfer (county)
  • Run on primary US counties w transfer (flow + county)
  • Run on Italy counties w/o transfer
  • Run on Italy counties w transfer (flow)
  • Run on Italy counties w transfer (county)
  • Run on Italy counties w transfer (flow + county)
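The eight runs above form a grid of region by transfer source. Here is a sketch of how they could be enumerated. The config keys (`region`, `transfer`, `pretrained_on`, `run_name`) are hypothetical and would need to be mapped onto the actual flow-forecast config format.

```python
from itertools import product
from typing import Dict, Iterator, Optional

REGIONS = ["US", "Italy"]
# Transfer settings listed in the issue; None means training without transfer.
TRANSFER_SOURCES = [None, "flow", "county", "flow+county"]


def experiment_grid() -> Iterator[Dict[str, object]]:
    """Yield one hypothetical run config per (region, transfer source) pair."""
    for region, source in product(REGIONS, TRANSFER_SOURCES):
        pretrained_on: Optional[str] = source
        yield {
            "region": region,
            "transfer": pretrained_on is not None,
            "pretrained_on": pretrained_on,
            "run_name": f"{region.lower()}_{pretrained_on or 'no-transfer'}",
        }
```

Generating the grid rather than listing runs by hand makes it harder to miss a combination, as happened with the "Run primary US counties" item.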

Multiple counties with same name

The data frame should not be masked on county name alone. Many counties, particularly in the US, share exactly the same name, so rows from different states end up in the same geographic forecasting segment. This is causing very strange negative case numbers that the LSTM cannot make sense of. Instead, create a new column that concatenates county and state/province so that no two counties wind up in the same forecasting segment.
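A minimal sketch of the fix, assuming rows carry hypothetical `county`, `state`, and `new_cases` fields. In a pandas data frame the same key can be built with `df["county"] + ", " + df["state"]`.

```python
from typing import Dict, Iterable, List, Mapping


def geo_key(county: str, state: str) -> str:
    """Unique location key: county name plus state/province."""
    return f"{county.strip()}, {state.strip()}"


def group_by_location(rows: Iterable[Mapping[str, object]]) -> Dict[str, List[object]]:
    """Group new-case values by county+state so same-named counties stay separate."""
    groups: Dict[str, List[object]] = {}
    for row in rows:
        key = geo_key(str(row["county"]), str(row["state"]))
        groups.setdefault(key, []).append(row["new_cases"])
    return groups
```

For example, Washington County, Oregon and Washington County, Utah end up under different keys instead of being merged into one series.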

Will it work for multivariate time series?

Great code, thanks. Could you clarify whether it will work for multivariate time series:
1. where all values are continuous, or
2. where values are a mixture of continuous and categorical, for example 3 dimensions with continuous values (weight, height, age) and 2 with categorical values (color, gender), as below?

    color   weight  gender  height  age
1   black   56      m       160     34
2   white   77      f       170     54
3   yellow  87      m       167     43
4   white   55      m       198     72
5   white   88      f       176     32
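One common way to handle mixed inputs, not necessarily something this repo supports out of the box, is to one-hot encode the categorical columns and pass the continuous columns through as numbers. A sketch using the table above (the helper name is hypothetical):

```python
from typing import Dict, List, Sequence


def one_hot_encode(rows: Sequence[Dict[str, object]], categorical: Sequence[str]) -> List[List[float]]:
    """Convert mixed continuous/categorical rows into numeric feature vectors.

    Continuous columns pass through as floats; each categorical column is
    expanded into one 0/1 indicator per category seen in the data (sorted).
    """
    columns = list(rows[0].keys())
    categories = {c: sorted({str(r[c]) for r in rows}) for c in categorical}
    vectors: List[List[float]] = []
    for r in rows:
        vec: List[float] = []
        for c in columns:
            if c in categories:
                vec.extend(1.0 if str(r[c]) == cat else 0.0 for cat in categories[c])
            else:
                vec.append(float(r[c]))
        vectors.append(vec)
    return vectors


# The five rows from the table above.
TABLE = [
    {"color": "black", "weight": 56, "gender": "m", "height": 160, "age": 34},
    {"color": "white", "weight": 77, "gender": "f", "height": 170, "age": 54},
    {"color": "yellow", "weight": 87, "gender": "m", "height": 167, "age": 43},
    {"color": "white", "weight": 55, "gender": "m", "height": 198, "age": 72},
    {"color": "white", "weight": 88, "gender": "f", "height": 176, "age": 32},
]

if __name__ == "__main__":
    for vec in one_hot_encode(TABLE, ["color", "gender"]):
        print(vec)
```

Each row becomes an 8-dimensional numeric vector (3 color indicators, weight, 2 gender indicators, height, age). In practice the continuous columns would usually also be scaled before training.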

Deploy county new-cases forecasting model to production

As an organization we want to inform public policy makers and residents if their county could be at increased risk for an outbreak. In order to do this we need to have daily updated results on new data and a simple dashboard to display predicted cases along with the CI for a specific county. Targeted completion

  • #43 Run experiments with enhanced evaluation metrics
  • #52 Determine whether to use transfer and best hyper-parameters
  • #53 Select best models
  • Have epidemiologist verify candidate models
  • Test models on new data (July 10th+)
  • Add inference mode to flow-forecast
  • #61 Create cloud function to partition new cases by county
  • Create Docker container for serving flow-forecast models and deploy.
  • Create Airflow DAG that runs model(s) daily and persists results.

Perform exploratory analysis of BEAT-19 Data

Before integrating the BEAT-19 data in #45, we need to explore the file structure.

Acceptance Criteria:

  • Do survey participants have multiple time steps? If so, what is the average number per participant?
  • What percentage of columns have null values, and how often are they null?
  • Average number of entries per day per county (use the county-to-zip mapping to determine which ZIP codes map to which counties)
  • Have Serge look over the findings
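A sketch of the summary statistics above, assuming the survey is loaded as a list of dicts with hypothetical `participant_id`, `date`, and `zip` fields, and a ZIP-to-county dict built from the county-to-zip mapping. None of these names come from the BEAT-19 files themselves.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Mapping

Record = Dict[str, object]


def null_fraction_by_column(records: List[Record]) -> Dict[str, float]:
    """Fraction of records in which each column is missing or None."""
    columns = sorted({k for r in records for k in r})
    n = len(records)
    return {c: sum(1 for r in records if r.get(c) is None) / n for c in columns}


def mean_entries_per_participant(records: List[Record], id_field: str = "participant_id") -> float:
    """Average number of survey entries (time steps) per participant."""
    counts = Counter(r[id_field] for r in records)
    return sum(counts.values()) / len(counts)


def mean_daily_entries_by_county(
    records: List[Record],
    zip_to_county: Mapping[object, str],
    zip_field: str = "zip",
    date_field: str = "date",
) -> Dict[str, float]:
    """Average entries per observed day for each county.

    Records whose ZIP has no county mapping are skipped, and only days with
    at least one entry count toward the average.
    """
    per_day: Counter = Counter()
    for r in records:
        county = zip_to_county.get(r.get(zip_field))
        if county is not None:
            per_day[(county, r[date_field])] += 1
    totals: Dict[str, int] = defaultdict(int)
    days: Dict[str, int] = defaultdict(int)
    for (county, _day), n in per_day.items():
        totals[county] += n
        days[county] += 1
    return {c: totals[c] / days[c] for c in totals}
```

Counting only days with entries overstates activity for sparse counties; dividing by the full calendar span instead would be a one-line change once the study period is known.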

Insight into transmission dynamics and public policy interventions

Public policy officials and epidemiologists could use information on how specific policies affect new cases. Specifically, we would like to be able to determine the causal impacts of masking and social distancing.

  • Incorporate interpretability features into flow
  • Evaluate models with interpretability features #55
  • Review relevant epidemiology studies based on previous steps

Report to select the "best models"

Based on the results of #43 I would like to write a formal report to examine the best models both in terms of test_loss and test_loss on the final week.

Acceptance criteria

  • Report on Wandb detailing tradeoffs between overall test_loss and test_loss on the final week.
  • Investigation of which models have the best Sharpe values
  • Evaluation of our model versus IHME and other CDC models

Bug with loop_through_locations function

/usr/local/lib/python3.6/dist-packages/pandas/core/strings.py in _validate(data)
   2096 
   2097         if inferred_dtype not in allowed_types:
-> 2098             raise AttributeError("Can only use .str accessor with string values!")
   2099         return inferred_dtype
   2100 

AttributeError: Can only use .str accessor with string values!

CDC Comparing models

Background: We need an objective way to compare our models to standard models such as IHME, YYG (Youyang Gu), etc. This is difficult because our current models forecast daily new cases at the county level, whereas many of these other models operate at the state level.

  • Research current models and determine whether each one forecasts daily at the county level.
  • For models that do not, research further to see whether their code is publicly available.

Acceptance criteria:

  • Document detailing models and metrics
  • Links to code repos in documentation
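Since many comparison models forecast at the state level, one way to line our output up with them is to sum daily county forecasts into weekly state totals. This is a sketch that assumes forecasts are keyed by hypothetical (state, county, date) tuples and uses ISO weeks (Monday start) for simplicity. If the comparison targets use a different week definition, the grouping key would need to change.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Tuple


def county_to_state_weekly(
    forecasts: Dict[Tuple[str, str, date], float],
) -> Dict[Tuple[str, int, int], float]:
    """Sum daily county point forecasts into (state, ISO year, ISO week) totals."""
    totals: Dict[Tuple[str, int, int], float] = defaultdict(float)
    for (state, _county, day), cases in forecasts.items():
        iso_year, iso_week, _weekday = day.isocalendar()
        totals[(state, iso_year, iso_week)] += cases
    return dict(totals)
```

Summing works for point forecasts, but prediction intervals do not add this way, so comparing CIs at the state level would need a separate treatment.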

Better error handling in code loops

When looping through Wandb sweeps, there appears to be an error related to the validation loader:

Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.6/dist-packages/wandb/wandb_agent.py", line 64, in _start
    function()
  File "<ipython-input-14-43e09d654e98>", line 7, in <lambda>
    wandb.agent(sweep_id, lambda: train_function("PyTorch", make_config_file(file_path2, len(region), weight_path=None)))
  File "/content/github_aistream-peelout_flow-forecast/flood_forecast/trainer.py", line 33, in train_function
    train_transformer_style(trained_model, params["training_params"], params["forward_params"])
  File "/content/github_aistream-peelout_flow-forecast/flood_forecast/pytorch_training.py", line 65, in train_transformer_style
    test = compute_validation(test_data_loader, model.model, epoch, model.params["dataset_params"]["forecast_length"], criterion, model.device, decoder_structure=True, use_wandb=use_wandb, val_or_test="test_loss")
  File "/content/github_aistream-peelout_flow-forecast/flood_forecast/pytorch_training.py", line 108, in compute_validation
    for src, targ in validation_loader:
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 345, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 384, in _next_data
    index = self._next_index()  # may raise StopIteration
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 339, in _next_index
    return next(self._sampler_iter)  # may raise StopIteration
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/sampler.py", line 200, in __iter__
    for idx in self.sampler:
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/sampler.py", line 62, in __iter__
    return iter(range(len(self.data_source)))
ValueError: __len__() should return >= 0
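The final `ValueError` comes from Python's `len()`, which rejects a negative `__len__` result. A likely cause is a test split with fewer rows than the history plus forecast length. Here is a sketch of failing early with a readable message and letting the loop continue. It assumes, without confirmation from flow-forecast's source, that the window count is rows minus history minus forecast length, and `train_one` is a hypothetical per-region training callable.

```python
import logging
from typing import Callable, Iterable, List, Tuple

logger = logging.getLogger(__name__)


def num_windows(n_rows: int, history: int, forecast_length: int) -> int:
    """Assumed window count for a split; negative when the split is too short."""
    return n_rows - history - forecast_length


def check_split(name: str, n_rows: int, history: int, forecast_length: int) -> None:
    """Raise a readable error instead of letting len() fail deep inside the DataLoader."""
    if num_windows(n_rows, history, forecast_length) <= 0:
        raise ValueError(
            f"{name} split has {n_rows} rows but needs more than "
            f"{history + forecast_length} (history + forecast_length)"
        )


def run_all(regions: Iterable[str], train_one: Callable[[str], None]) -> List[Tuple[str, str]]:
    """Train each region, logging and collecting ValueErrors instead of aborting the loop."""
    failures: List[Tuple[str, str]] = []
    for region in regions:
        try:
            train_one(region)
        except ValueError as err:
            logger.warning("Skipping %s: %s", region, err)
            failures.append((region, str(err)))
    return failures
```

Returning the failures lets the sweep finish for the remaining regions while still recording which ones were skipped and why.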
