
reproducible-research's Introduction

reproducible-research's People

Contributors

bast, blankdots, eglerean, gregordecristoforo, johanhellsvik, korbinib, matiasjj, matuskalas, patricholmvall, rantahar, rkdarst, samumantha, sparrow0hawk, thomasa, vathasav, wikfeldt


reproducible-research's Issues

include material on data management

We need at least a 0.5 h module on data management in a 3-day workshop, and it might fit into this lesson. It could discuss, for example, file formats, databases, and flat files.

make/snakemake: include a two-target makefile example

I suggest adding a two-rule Makefile at the very top of the file, with the output of one rule serving as the input of the other. I had to explain the concept of dependencies at that point, and a real example to point at would have helped (in place of, or below, the generic one showing target/dependencies/command).
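The dependency idea behind such a two-rule example can be sketched in Python. This is not a Makefile and not the lesson's actual pipeline; it is a toy model of the rebuild logic, and every file name and function in it (book.txt, counts.dat, summary.txt, count_words, summarize, build) is hypothetical. The comment at the top shows roughly what the matching Makefile could look like.

```python
import tempfile
from pathlib import Path
from typing import Callable, Dict, List, Tuple

# Roughly equivalent two-rule Makefile (hypothetical file names;
# in a real Makefile the recipe lines must start with a tab):
#
#   summary.txt: counts.dat
#           python summarize.py counts.dat > summary.txt
#
#   counts.dat: book.txt
#           python count.py book.txt > counts.dat

Rule = Tuple[List[str], Callable[[Path], None]]


def count_words(workdir: Path) -> None:
    """Recipe for counts.dat: count the words in book.txt."""
    words = (workdir / "book.txt").read_text().split()
    (workdir / "counts.dat").write_text(f"{len(words)}\n")


def summarize(workdir: Path) -> None:
    """Recipe for summary.txt: consumes the output of the previous rule."""
    count = (workdir / "counts.dat").read_text().strip()
    (workdir / "summary.txt").write_text(f"word count: {count}\n")


# target -> (dependencies, recipe); the output of one rule is the input of the other.
RULES: Dict[str, Rule] = {
    "summary.txt": (["counts.dat"], summarize),
    "counts.dat": (["book.txt"], count_words),
}


def build(target: str, workdir: Path, rebuilt: List[str]) -> None:
    """Bring `target` up to date, building its dependencies first."""
    path = workdir / target
    if target not in RULES:
        # A plain source file: it must already exist.
        if not path.exists():
            raise FileNotFoundError(f"no rule to make {target}")
        return
    deps, recipe = RULES[target]
    for dep in deps:
        build(dep, workdir, rebuilt)
    # Rebuild when the target is missing or older than any dependency.
    stale = not path.exists() or any(
        (workdir / dep).stat().st_mtime_ns > path.stat().st_mtime_ns for dep in deps
    )
    if stale:
        recipe(workdir)
        rebuilt.append(target)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp)
        (workdir / "book.txt").write_text("a short example text")
        rebuilt: List[str] = []
        build("summary.txt", workdir, rebuilt)
        print(rebuilt)
```

Asking for summary.txt first triggers the counts.dat rule and then the summary.txt rule; asking again straight away runs nothing, because both targets are already newer than their inputs.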

Move conda to before the snakemake section

With the new schedule, this is the first lesson where we need the conda environment (for snakemake). I therefore had to briefly explain the concept of conda environments, just enough for "and now I activate my conda environment" to make sense. But conda is only introduced in the last section, after snakemake.

Grouping conda with the other virtual environment tools does make some sense, but the flow could perhaps be improved by splitting conda and docker (the section gets quite long) and putting conda earlier.

use tracked data and source code under project directory in episode 03?

In episode 3, the learner is instructed to clone the bast/make-pipeline repo somewhere, and then later to copy the data files and source files to a project directory. Would it be better (more realistic, and better practice) to create a separate repo which already has the correct directory structure, so that the example project directory is already a git repo?

Add notes for conda in HPC environment

Points raised at the Gothenburg workshop regarding conda performance and filesystem burden in an HPC environment:

  • Option to specify build (e.g., MKL vs openblas)
  • Environment location: conda create -n NAME vs conda create -p PATH (a non-backed-up filesystem can be preferred)
  • conda clean to clean cached files

upload images and fix typo

I see that images are missing in the Snakemake episode. There is also a typo in episode 2: "wordcount.py, finds the frequency wdistribution of ords used in a text".

Short lesson with repo2docker only for CodeRefinery workshop

It is a nice lesson but it cannot be taught in 1 or 2 hours.
This is the second time I have seen this lesson (the last time was in Goteborg) and I have the same feeling. We have it at the workshop (everybody is tired and we have no more than 1 hour for this lesson, including exercises) but participants do not get a clear idea of how this tool fits into the software development framework.

I suggest we make a 1 to 1.5 hour lesson:

  • Teach this lesson after Reproducible Research or Documentation (on Wednesday morning) and show how to use Jupyter notebooks to document/publish workflows/examples using repo2docker (Binder)
  • Have a single exercise where using Binder is the main objective.

We could also add links to examples such as (these are from Geosciences but you probably have good examples too!):

Keep a separate half-day lesson on the Jupyter ecosystem that we can use as a standalone lesson (we would use it in Oslo for a 1-day workshop).

What do you think?

Latest version used snakemake, no install instructions

This person was using Windows. The conda install instructions on the snakemake site didn't work, and neither did the Windows installation instructions (both reported that the package was not found in the requested channels).

This person's computer had plenty of problems and I'm not sure the environment was set up correctly. The problems mostly seem related to confusion about the Git Bash shell vs the Anaconda shell (Windows shell), conda environments, and environment variables.

Anyway, if the installation instructions work on another Windows computer (Windows 10), then you can probably just close this issue.

Docker analogies

When reviewing the lesson some time ago, I remember the docker analogy for images/containers not being very good, so I have been trying to come up with a better one. Here's the best I have so far (does anyone have something better?):

A docker image is like a piece of paper with the whole operating system written on it. When you run it, a transparent sheet is placed on top to form a container. The container runs and writes only on that transparent sheet (and on whatever other mounts have been layered on top). When you are done, the transparency is thrown away. This can be repeated as often as you want, and the base is always the same.

The analogy breaks down when you use the same base to run multiple containers without copying the base. But it works when you discuss how images are composed (multiple transparent sheets stacked on top of each other, not that we go into it).
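The paper-and-transparency picture can also be played with in a few lines of Python. The sketch below is only a toy model of the analogy, not a description of how Docker's storage is implemented: the read-only dictionary stands in for the base sheet, and collections.ChainMap supplies the writable sheet on top. The names IMAGE and run_container and the file paths are made up for illustration.

```python
from collections import ChainMap
from types import MappingProxyType

# The "piece of paper": a read-only base image (hypothetical contents).
IMAGE = MappingProxyType({
    "/etc/os-release": "base OS",
    "/usr/bin/python3": "python binary",
})


def run_container(image):
    """Place a fresh, empty, writable 'transparent sheet' on top of the image."""
    # ChainMap reads through all layers but writes only to the first one.
    return ChainMap({}, image)


if __name__ == "__main__":
    first = run_container(IMAGE)
    first["/tmp/result.txt"] = "output"      # written on the transparent sheet
    first["/etc/os-release"] = "modified"    # shadows the base, does not change it
    print(first["/etc/os-release"], "|", IMAGE["/etc/os-release"])

    # Throw the sheet away and start again: the base is unchanged.
    second = run_container(IMAGE)
    print("/tmp/result.txt" in second)
```

In this model several containers can share one base at the same time without copying it, since each one only carries its own top sheet.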

pip installation of snakemake fails

$ pip install snakemake

...
Failed building wheel for datrie
...
src/datrie.c:24942:13: error: ‘PyThreadState’ {aka ‘struct _ts’} has no member named ‘exc_traceback’; did you mean ‘curexc_traceback’?
       tstate->exc_traceback = *tb;
               ^~~~~~~~~~~~~
               curexc_traceback
  error: command 'gcc' failed with exit status 1

clean up the Dockerfile

There should be a simpler way to install python3 and snakemake, and the existing Dockerfile might not follow good Docker practices.
