
courses: Introduction

Python Courses for the Scientific Researcher

This repository contains courses on several Python software packages for the benefit of scientific researchers, particularly in the fields of oceanography and meteorology. All the content is fully open-source and freely downloadable.

The courses are presented as interactive self-learning tutorials, contained in "Jupyter notebooks". The user will run the notebook interactively, and will also edit it and run their own code, particularly to complete the set learning exercises.

These courses assume some prior knowledge of the numpy and matplotlib packages. "Additional Learning Resources", below, suggests some suitable learning resources for those.

Download and Run Courses

The most flexible learning method is to download this repository and run it in a Python environment with the correct dependencies. A suitable environment can be produced with conda (only 'miniconda' is needed) :

  • To create the environment, use the command :
    $ conda create -n testenv iris iris-sample-data jupyter nc-time-axis
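
  • To work inside it, first activate the environment (this assumes the environment name 'testenv' from the command above) :
    $ conda activate testenv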

  • To run the courses, navigate to the repository 'course_content' folder and run :
    $ jupyter notebook

Run Courses in Binder

It is also possible to run any of these courses in a cloud instance with Binder, simply by following the links provided below. This is quick to try, and highly convenient. However, be aware that this runs in a transient session, so all your edits will be lost when it expires (usually some tens of minutes after the last interaction). It is possible to download your modified notebook before the session times out, but not (easily) to re-upload it and run it again in Binder.

Principal Courses

There are two main courses provided :

A First Look at Cartopy (for Iris)

0.5 hours — depends on basic knowledge of matplotlib (see "Additional Learning Resources" below):

  • in course_content/cartopy_course/cartopy_intro.ipynb
  • launch in binder here

An Introduction to Iris

6 hours — depends on the "First Look at Cartopy" course:

  • start from course_content/iris_course/0.Iris_Course_Intro.ipynb
  • launch in binder here

Note: to run this course fully, you need to have installed not only iris and its dependencies, but also iris-sample-data (which must be importable as a package).


Additional Learning Resources

There are also two older courses included here, covering numpy and matplotlib at an entry level. However, these courses are no longer actively maintained. For a similarly fast introduction to numpy and matplotlib, we can recommend the sections on those topics in the Scipy lectures. For more detail, you should in time also refer to the standard tutorials for those projects.

The older Scitools course contents are these :

An Introduction to Numpy

3.5 hours — depends on a basic Python background

  • in course_content/extra_courses/numpy_intro.ipynb
  • launch in Binder here

An Introduction to Matplotlib

3 hours — depends on a knowledge of numpy

  • in course_content/extra_courses/matplotlib_intro.ipynb
  • launch in binder here

courses: People

Contributors

arjclark, bjlittle, corinnebosley, dpeterk, esadek-mo, hgwright, kaedonkers, lbdreyer, marqh, pelson, pp-mo, rcomer, tkknight, trexfeathers


courses: Issues

[Dask] Explore re-ordering §load content

As noted in #107 (review), the content in the loading section may not be as clear as it could be. We should explore re-ordering the content in this section to bring all the pure Iris code together into a single subsection as the demonstration, and then provide the dask + Iris code in a following single subsection as a comparison.

Note: this may not make the content any clearer. If it doesn't, we can make a note here of the fact we explored the option and then leave it as it is.

Introduction to python course

There is a set of things that we teach in every course that we run. These things don't directly relate to any of the specific SciTools courses; they are general Python things that are nevertheless useful to know.

They are:

  • *args and **kwargs,
  • list comprehensions,
  • variable unpacking,
  • __str__ and __repr__,
  • naming conventions for classes, functions and methods, and
  • probably a few other bits as well.

These could be added to a mini-course to be taught alongside the other existing courses.
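
A minimal sketch (not from the repository) of how a few of the topics listed above might appear in such a mini-course; the names used here are purely illustrative.

def describe(*args, **kwargs):
    # *args collects positional arguments into a tuple,
    # **kwargs collects keyword arguments into a dict.
    return f"{len(args)} positional, {sorted(kwargs)} keyword"

class Point:
    # Naming conventions: classes use CapWords, while functions and
    # methods use lowercase_with_underscores.
    def __init__(self, x, y):
        self.x, self.y = x, y        # variable unpacking
    def __repr__(self):
        # __repr__ aims at an unambiguous, developer-facing string.
        return f"Point(x={self.x}, y={self.y})"
    def __str__(self):
        # __str__ aims at a readable, user-facing string.
        return f"({self.x}, {self.y})"

squares = [n ** 2 for n in range(5)]   # list comprehension
print(describe(1, 2, colour="red"), Point(3, 4), squares)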

numpy course feedback May17

indexing exercise

remove commas from the expected print output; numpy prints an array as

[1 4 5]

not

[1, 4, 5]

alter example to return

[[4]]

not

[[1 4]]

Variables are referred to with Exercises in between

For example:
In the Iris course 3 (Subcube_Extraction), in section 3.2, the cubes variable is defined at the top of the section and is referred to at the bottom. In between, there are two exercises where the user may redefine or change the cubes variable, the second of which involves loading from another file. When loading this other file, I used the same name cubes, which meant that, later on, the notebook was referring to a different CubeList than it was expecting.
It may be wise to redefine such variables when there is an exercise in between, for better consistency.

Reintroduce Travis notebook tests

Since #150 we have had no proper Travis testing, because that change broke it.

Travis used to build the notebooks into a non-interactive document, but we have stopped doing that.
We could re-introduce this.

During the 'feature_self_learn' development we mostly tried to deliver notebooks that would all run through without errors.
The many new "sample solution" code examples are also designed to run: a test should first seek out and un-comment the "# %load" lines.
Ideally, everything will then run through, and that can serve as our "notebook tests"!

(NOTE: at the moment, though, we are getting stuck in a "run all cells" operation. This could be a Jupyter problem, or local only?)

There may be one or two remaining purposely-failing examples for "what went wrong there?" demonstrations. We would need to fix those, perhaps with try/except (as in several of the code solution examples); see the sketch below.
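
A minimal sketch (an assumption, not code from the repository) of how a purposely-failing demonstration could be kept runnable for an automated "run all cells" test: let it fail, but catch and report the expected error so the cell still completes.

# Sketch only: the failing expression is a stand-in for the course's example.
try:
    result = 1 / 0
except ZeroDivisionError as err:
    print("This is the expected failure:", err)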

Merge resource behaviour in Python3 / Iris v2

When using Python 3 & Iris v2, the resources for the merge exercise load differently from how they do with Python 2 and Iris v1. Specifically, it appears that each dataset is loaded twice. This needs to be fixed, preferably with consistent behaviour between the two Python/Iris combinations.

Investigate:

  • is this being caused by Python 3 (i.e. has wildcard filename match behaviour changed?)
  • is this being caused by Iris v2 (i.e. has dataset load behaviour changed in a manner that we have not allowed for?)
  • is this being caused by a combination of the above?
  • is this being caused by something else?
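
A small diagnostic sketch (an assumption, not code from the issue) that could help distinguish these causes: check whether the wildcard expands to duplicated paths, which would explain each dataset appearing to load twice.

# Sketch only: compare the number of matched paths with the number of unique paths.
import glob
import iris

paths = sorted(glob.glob(iris.sample_data_path('GloSea4', '*.pp')))
print(len(paths), len(set(paths)))   # differing counts would indicate duplicates
for path in paths:
    print(path)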

[dask] task graphs one to many

Dask task graphs seem to be optimised for many-to-one style processing: you load multiple files and perform some sort of aggregation / reduction to arrive at a single result. This paradigm does not arise nearly as often in Iris, where you might instead need to...

  • take multiple statistics of a single cube and return all of them (e.g. retrieve mean and standard deviation concurrently)
  • extract multiple sub-cubes out of one or more input cubes
  • and so on.

Put together an example of making a graph that looks like this and then computing the graph to easily retrieve the requested data.
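
A minimal sketch (an assumption, not the requested example itself) of a one-to-many graph: several statistics of a single array are requested together, and dask.compute shares the common parts of the task graph so both results come from one traversal of the data.

# Sketch only: the random dask array stands in for a cube's lazy data.
import dask
import dask.array as da

data = da.random.random((1000, 1000), chunks=(250, 250))
mean = data.mean()
std = data.std()

# Computing both at once shares the underlying task graph.
mean_value, std_value = dask.compute(mean, std)
print(mean_value, std_value)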

Iris course - conda environment

In Section 6 of the Iris course (https://github.com/scitools-classroom/courses/blob/master/course_content/iris_course/6.Data_Processing.ipynb), the solution # %load solutions/iris_exercise_6.3e raises an error that nc-time-axis is not found when running in a suitable conda environment as documented in the README ($ conda create -n testenv iris iris-sample-data jupyter).

At the AVD Surgery, I was advised to set up a conda environment with nc-time-axis specified ($ conda create -n testenv iris iris-sample-data jupyter nc-time-axis). This successfully avoids the error. Perhaps the README could be updated?

Thanks

Out of date coordinate creation loop in: Iris Tutorial, Chapter 4

In code block 19, new coordinates are being added to a cube to allow for a merge. However, the units are not hard-coded, which causes them to load as unknown, whereas the existing coordinates have units of 1. A potential solution is shown below, although a suitable explanation will also need to be added.

[image: screenshot of the proposed coordinate-creation code]
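
Since the screenshot is not reproduced here, the following is only a rough sketch of the kind of fix described, assuming the point is to create the new coordinate with explicit units of '1' (the cube and coordinate names are stand-ins, not the tutorial's).

# Sketch only: give the added coordinate explicit units so it matches the
# existing coordinates rather than loading with 'unknown' units.
import numpy as np
import iris.coords
import iris.cube

cube = iris.cube.Cube(np.zeros((3, 4)), long_name='example_data')
member = iris.coords.AuxCoord(1, long_name='member_number', units='1')
cube.add_aux_coord(member)
print(cube.coord('member_number').units)    # 1, not 'unknown'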

Sections to possibly remove from Iris course (04/16)

Sections of the Iris course that may be excess to requirements and could be removed without leaving a big unfilled gap in what the Iris course teaches:

  • cell - cell comparisons: the section on cell - point comparisons shows that cell comparisons are possible. The section on cell - cell comparisons just goes back over this, but makes the situation far more complex, as it just isn't clear why cell - cell comparisons behave as they do.
  • Partial datetimes: in many ways they don't add anything that you can't do with a cell datetime object.

Adapting course material - citation/attribution

Hi there,

I just wanted to check if it is OK to use these course materials as the basis for an Iris tutorial I am writing for https://github.com/ourcodingclub/ourcodingclub.github.io (I have to develop the materials in a set layout/format, otherwise I would just deliver the course 'as is' from the notebook.)

I see it's GPL licensed but just wanted to check if you have any specific citation/attribution, or copyright messages you would like included. (Other than the GPL)

Best wishes,
Declan

University of Edinburgh GeoSciences

Feedback from Iris course (02/16)

A few niggles and oversights within the Iris course that are hanging over from the recent updates to the course:

  • In §Constraints, in the part on constraining on time, under the code cell iris.FUTURE.cell_datetime_objects = True, the markdown cell reads "it is now possible to do the same constraint" when the "same constraint" being referenced has been removed.
  • Still in §Constraints, perhaps we should add a subsection 'Time Constraints'.
  • In §Plotting and the comparison of iplt with qplt, we need to adjust wspace not hspace.
  • In §Cube maths, when the scenario difference is calculated the markdown cell below incorrectly states that "the coordinates “time” and “forecast_period” have been removed".

Produce physical resources

One way to boost learning is to improve the physical environment within which these courses are taught. We could do this by producing posters that relate to the teaching material and adding them to the learning environment.

The resources should be primarily graphical and ideally colourful too. Some possible examples:

  • The NumPy broadcasting example image
  • The different types of matplotlib plots shown in the course
  • The pictographic representation of a cube
  • Cube data plotted on a cartopy map

course feedback 1mar

https://github.com/SciTools/courses/blob/a5e4a5f88d4f8c8590dc3dd6fc2ca40199031270/course_content/notebooks/numpy_intro.ipynb

the result of arr_2d[0, ::2] is given as [[1, 3], [4, 6]]

wrong! that is the result of arr_2d[0:, ::2]

the first column, retaining the outside dimension: resulting in [[1, 4]]

tricksy, intentional??

conditional indexing is useful and interesting, include?

print(np.where(arr_2d == 4))
print(arr_2d[arr_2d % 2 == 0])
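
A self-contained version of the suggested snippet (the value of arr_2d is an assumption, inferred from the indexing results quoted above):

import numpy as np

arr_2d = np.array([[1, 2, 3],
                   [4, 5, 6]])
print(np.where(arr_2d == 4))       # indices where the condition holds
print(arr_2d[arr_2d % 2 == 0])     # conditional (boolean) indexing: [2 4 6]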

The Array Object: Summary of key points

  • properties : shape, dtype
  • arrays are homogeneous, all elements have the same type: dtype
  • creation : array([list]), ones, zeros, arange, linspace
    • indexing arrays to produce further arrays, subsets of the original
    • multi-dimensional indexing and conditional indexing
  • indexing like Python objects : integers and slices
  • indexing produces further array objects
  • multi-dimensional indexing with multiple indices
  • indexing differences from list-of-lists

desired_result = np.array([[  1,   2,   3],
                           [104, 105, 406],
                           [407, 408, 409]])

You can assume that the ordering of the values is the same as in the earlier example. That is, the order is [day1-station1, day1-station2, day1-station3, day2-station1, ...] and so on.

this is not clear enough, needs another look

lessons for calcs:

  • defensive programming
  • clear syntax

masked array:

  • how to unmask a masked value
  • space for the exercise (new cell)
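
A minimal sketch (assumed content, not taken from the course) of the "how to unmask a masked value" point:

import numpy.ma as ma

data = ma.masked_array([1.0, 2.0, 3.0], mask=[False, True, False])
print(data)              # [1.0 -- 3.0]
data.mask[1] = False     # unmask a single element
print(data)              # [1.0 2.0 3.0]
data.mask = ma.nomask    # or remove the mask entirely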

efficiency

  • when to optimise, as well as how

Use Binder Jupyterlab

I like the revamp of the courses a lot, great work!

Good to see Binder being used for the notebooks. I have a couple of suggestions of where you could go next for a better user interaction:

  1. Use Jupyter Lab instead of Jupyter Notebooks on Binder
    It is easy to implement by changing the end of the Binder url from
    ...?filepath=path/to/notebook.ipynb to ...?urlpath=lab/tree/path/to/notebook.ipynb
    See this example from the Informatics Lab
    https://binder.pangeo.io/v2/gh/informatics-lab/itk-3dvis/master?urlpath=lab/tree/itk-3dvis.ipynb
    and guidance on some of the nuances in this repo
    https://github.com/binder-examples/jupyterlab

  2. Use Pangeo Binder
    https://binder.pangeo.io is a Binder service from the Pangeo organisation on Google Cloud Services. It has more resources than the regular Binder deployment so might have a bit more grunt for Iris's data crunching requirements.

numpy broadcasting rules could do with simplification

The numpy course Broadcasting section can be difficult to understand due to confusing technical language. The concept itself is quite straightforward, but there are several layers of complexity in the sentences which describe the rules, and they do not correlate to the graphics although they seem like they should, so the images actually just become confusing.

I think that this section would be easier to understand if the images matched the rules and allowed the user to understand all the loaded phrases (like 'the shape of the array with fewer dimensions ... padded ... leading (left) side'; this would be easier to unpack if you maybe showed the shape of the dimensions in a code cell, and then what it means to 'pad' it to match the shape of the other array).
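
A minimal sketch (an assumption, not the course's code) of the quoted rule, showing the shapes in a code cell as suggested:

import numpy as np

a = np.ones((3, 4))      # shape (3, 4)
b = np.arange(4)         # shape (4,) -- conceptually padded on the left to (1, 4)
result = a + b           # (3, 4) and (1, 4) broadcast together to (3, 4)
print(result.shape)      # (3, 4)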

[Dask] merge conflicts

Add a subsection detailing what to do if you use dask + Iris for parallel file load and encounter merge conflicts when you try and merge the cubes.

For example:

import glob
import dask.bag as db
from dask import delayed
import iris
files = glob.glob(iris.sample_data_path('GloSea4', '*.pp'))
cb = db.from_sequence(files).map(iris.load_cube)
dmc = delayed(lambda cubes: iris.cube.CubeList(cubes).merge_cube())(cb)
dmc.compute()

MergeError                                Traceback (most recent call last)
...
MergeError: failed to merge into a single cube.
  Coordinates in cube.aux_coords (scalar) differ: realization.
  Coordinates in cube.aux_coords (non-scalar) differ: forecast_period.

Iris: area-averaging not explained

There is an exercise asking for area-averaging, but we don't explain it anywhere. Bit too far of a leap to skip that, so needs adding to the course.
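
A hedged sketch of what such an explanation might cover, using area weights with a collapsed mean ('air_temp.pp' here is just a stand-in sample file, not necessarily the one used in the exercise):

import iris
import iris.analysis
from iris.analysis.cartography import area_weights

cube = iris.load_cube(iris.sample_data_path('air_temp.pp'))
# Area weights need bounds on the horizontal coordinates.
cube.coord('latitude').guess_bounds()
cube.coord('longitude').guess_bounds()
weights = area_weights(cube)
area_mean = cube.collapsed(['latitude', 'longitude'], iris.analysis.MEAN,
                           weights=weights)
print(area_mean.data)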

Feedback from NumPy course (02/16)

The course needs more exercises - there are long sections of teaching with no breaks / changes of style.

  • For example, a short exercise (on creating/indexing arrays?) before §Multidimensional Array Creation.
  • Change the values / number of points in the x and y arrays in the final exercise.
