Comments (10)
I guess this should be implemented by downloading models, datasets, etc. to a cache or data directory. Probably only the models the user actually uses should be downloaded.
I found a Python package, appdirs, for accessing platform-specific data/cache/config directories: https://github.com/ActiveState/appdirs
For example, for the user trentm the cache dir would be:
- macOS: /Users/trentm/Library/Caches/posteriordb
- Windows: C:\Users\trentm\AppData\Local\author_name\posteriordb\Cache (author_name could also be posteriordb)
- Linux: /home/trentm/.cache/posteriordb
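To make the platform behavior concrete, here is a minimal stdlib sketch of the kind of path appdirs' user_cache_dir computes for posteriordb (the real appdirs package handles more corner cases, e.g. roaming Windows profiles, so treat this as illustrative only):

```python
import os
import sys
from pathlib import Path

def posteriordb_cache_dir(appname: str = "posteriordb") -> Path:
    """Return a platform-specific user cache directory for posteriordb.

    A stdlib approximation of appdirs.user_cache_dir(appname); the real
    package should be preferred in the actual implementation.
    """
    home = Path.home()
    if sys.platform == "darwin":
        # e.g. /Users/trentm/Library/Caches/posteriordb
        return home / "Library" / "Caches" / appname
    if sys.platform.startswith("win"):
        # e.g. C:\Users\trentm\AppData\Local\posteriordb\Cache
        local = os.environ.get("LOCALAPPDATA", str(home / "AppData" / "Local"))
        return Path(local) / appname / "Cache"
    # Linux and others: respect XDG_CACHE_HOME if set
    xdg = os.environ.get("XDG_CACHE_HOME", str(home / ".cache"))
    return Path(xdg) / appname

print(posteriordb_cache_dir())
```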
I'm not sure if it would be better to use the cache dir or the data dir.
The main technical challenge is probably determining when a downloaded file is stale and needs to be downloaded again. One approach would be having a file in this repo that contains hashes of each file in the posterior database; with the hash we could determine whether a file needs to be re-downloaded. There are other approaches too, such as just re-downloading everything once a new version of posteriordb is released.
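The hash-manifest approach could look something like this sketch (the manifest format and function names are assumptions, not anything posteriordb ships today):

```python
import hashlib
from pathlib import Path

def file_sha256(path: Path) -> str:
    """Hash a file in chunks so large datasets need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def is_stale(local_path: Path, expected_hash: str) -> bool:
    """A cached file is stale if it is missing or its hash no longer
    matches the hash published in the repo's (hypothetical) manifest."""
    if not local_path.exists():
        return True
    return file_sha256(local_path) != expected_hash
```

Only files whose hash has changed would then be re-downloaded, which keeps cache refreshes cheap.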
Another possibility would be to include the whole PDB in the R and Python packages (or as a separate posteriordbdata package). With this approach we wouldn't need to worry about platform-specific directories, but it would mean that the package needs to be updated to access new models etc., and also that the whole database is always downloaded even if the user wants to use only a few posteriors.
I feel a persistent cache is the way to go; having to re-download the models and datasets just because the R session was closed doesn't feel right to me. The implementation effort of a persistent cache also doesn't seem large.
So with the git sha approach we re-download everything when the PDB gets new commits? I agree that for the first version this is good enough; it can be improved later.
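A sketch of the git-sha invalidation just described: record the commit sha alongside the cache and wipe the cache when it changes. (How remote_sha is obtained, e.g. from the GitHub API, is left out to keep the sketch self-contained; all names here are hypothetical.)

```python
import shutil
from pathlib import Path

def refresh_cache_if_new_commit(cache_dir: Path, remote_sha: str) -> bool:
    """Wipe the cache when the recorded commit sha differs from remote_sha.

    Returns True if the cache was (re)initialized, meaning files will be
    re-downloaded on demand; False if the cache is already up to date.
    """
    sha_file = cache_dir / "COMMIT_SHA"
    cached_sha = sha_file.read_text().strip() if sha_file.exists() else None
    if cached_sha == remote_sha:
        return False  # cache matches the remote commit
    if cache_dir.exists():
        shutil.rmtree(cache_dir)  # coarse invalidation: drop everything
    cache_dir.mkdir(parents=True)
    sha_file.write_text(remote_sha)
    return True
```

This is deliberately coarse; the per-file hash manifest discussed above would avoid re-downloading unchanged files.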
A few questions:
- If the user calls pos <- posterior_names(my_pdb), is it OK that all the posterior info files in posteriors/ get downloaded? And the same with model_names and dataset_names.
- Is it best to not yet download the dataset file and model code when the user calls po <- posterior("8_schools-8_schools_centered", my_pdb), and instead download them when the user calls dataset(po) or stan_code(po)? The same applies to downloading the model info and dataset info files. The posterior info, however, would be downloaded at this point.
A downside to this approach is that it can be hard to validate that the posterior has valid dataset and model code.
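The lazy-download behavior in the second question can be sketched like this (a toy illustration; the class and the fetch callable are hypothetical, not the actual posteriordb API):

```python
class Posterior:
    """Posterior info is fetched eagerly; the dataset only on first access."""

    def __init__(self, name, fetch):
        self.name = name
        self._fetch = fetch  # callable(repo_path) -> bytes, e.g. an HTTP GET
        # downloaded up front, so basic validation can happen here
        self.info = fetch(f"posteriors/{name}.json")
        self._data = None

    def dataset(self):
        if self._data is None:  # first call triggers the download
            self._data = self._fetch(f"data/{self.name}.json")
        return self._data       # later calls hit the in-memory copy
```

With this layout, validating the dataset and model code at construction time would defeat the laziness, which is exactly the downside noted above.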
This has now been solved. It just needs to be updated in the vignette and documentation.
Good job!
To eeros' comment above: I have now added a cache_path to the pdb object, so it can be stored wherever. If we want, we can point the local db path to be the cache; then nothing is cached.
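In other words, something along these lines (a sketch with hypothetical names, only the cache_path idea comes from the comment above):

```python
from pathlib import Path
from typing import Optional

class PdbLocal:
    """A pdb handle with a configurable cache_path.

    When cache_path points at the local database directory itself,
    files are read in place and nothing is ever copied into a cache.
    """

    def __init__(self, path: Path, cache_path: Optional[Path] = None):
        self.path = Path(path)
        self.cache_path = Path(cache_path) if cache_path else self.path

    def caches_files(self) -> bool:
        return self.cache_path != self.path
```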