rse-course's Introduction

Project Status: Active – The project has reached a stable, usable state and is being actively developed. Build documentation Code style Licence: MIT Python 3.8 | 3.9 | 3.10

Research Software Engineering Course

Course materials for Turing's Research Software Engineering course.

Documentation

Documentation for the course is hosted at https://alan-turing-institute.github.io/rse-course. You can build the documentation from source by running:

pip install -U jupyter-book
./build_docs.sh

Contributing

Contributions are always welcome! Please do the following:

  • Add an issue to the course repo, explaining the problem and, potentially, its solution.
  • If you know how to fix it, please also open a pull request that contains the fix. By doing this, you will improve the instructions for future users. 🎉
  • If you need to add a dependency, please edit pyproject.toml which is used to automatically generate requirements.txt

The full list of contributors can be seen here.

Acknowledgements

This course began as a fork of the UCL RSD course.

rse-course's People

Contributors

andrewphilipsmith, callummole, dependabot[bot], developeratexample, drjonnyt, edwardchalstrey1, ezherman, flowirtz, giovanni1085, goodship1, helendduncan, iain-s, jack89roberts, jamespjh, jemrobinson, jimmadge, joeblacksci, kasra-hosseini, mark-hobbs, mhauru, nbarlowati, nickynicolson, oc-n, otnemrasordep, pafoster, pwochner, radka-j, rpirie96, triangle-man, yournamewoshi

rse-course's Issues

Add module 0: Setup

These changes will be coordinated by @edwardchalstrey1

General (applicable to every module)

  • Make each module independent of earlier ones (i.e. do not rely on examples/specific code from earlier modules)
  • Ensure that each module has some small exercises (say 1x 5 min exercise per 45 mins of teaching); add a larger (optional) exercise per course for attendees to complete in their own time
  • Update to use more modern libraries and tools (e.g. pandas/scipy/numba/poetry)
  • Add instructions to the notebooks that you should "Take 5 minutes to do this exercise"
  • Add estimated timing information at the top of each notebook
  • Switch to a colourblind-friendly colour scheme throughout
  • What do you wish people had told you about Python/coding in general?

Module 0: Setup

  • General setup instructions
  • Better instructions for Windows users (see these instructions)
    • should we recommend VSCode as a cross-platform solution? Or Spyder (since this comes with Conda)?
  • How to git clone and how to start jupyter notebook
  • Some checks to run showing things work
  • Should we explicitly list something like the Software Carpentry course as a pre-requisite?
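
One possible shape for the "checks to run showing things work" is a short script that attendees run after setup. This is only a sketch: the minimum version (3, 8) follows the versions listed in the README badges, while the package names in `COURSE_PACKAGES` are hypothetical placeholders and should be replaced with whatever pyproject.toml actually specifies.

```python
import importlib.util
import sys

# Hypothetical package list for illustration; take the real one from pyproject.toml.
COURSE_PACKAGES = ["numpy", "matplotlib", "notebook"]


def python_is_supported(minimum=(3, 8)):
    """Return True if the running interpreter is at least `minimum`."""
    return sys.version_info >= minimum


def missing_packages(names):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python version OK:", python_is_supported())
    missing = missing_packages(COURSE_PACKAGES)
    print("Missing packages:", ", ".join(missing) if missing else "none")
```

Reporting the missing packages by name, rather than failing on the first import error, lets attendees fix everything in one go before the first session.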

Update module 9: Programming for speed

These changes will be coordinated by @jemrobinson

General (applicable to every module)

  • Make each module independent of earlier ones (i.e. do not rely on examples/specific code from earlier modules)
  • Ensure that each module has some small exercises (say 1x 5 min exercise per 45 mins of teaching); add a larger (optional) exercise per course for attendees to complete in their own time
  • Update to use more modern libraries and tools (e.g. pandas/scipy/numba/poetry)
  • Add instructions to the notebooks that you should "Take 5 minutes to do this exercise"
  • Add estimated timing information at the top of each notebook
  • Switch to a colourblind-friendly colour scheme throughout
  • What do you wish people had told you about Python/coding in general?

Module 9: Programming for speed

Update module 5: Testing your Code

These changes will be coordinated by YOUR NAME HERE

General (applicable to every module)

  • Make each module independent of earlier ones (i.e. do not rely on examples/specific code from earlier modules)
  • Ensure that each module has some small exercises (say 1x 5 min exercise per 45 mins of teaching); add a larger (optional) exercise per course for attendees to complete in their own time
  • Update to use more modern libraries and tools (e.g. pandas/scipy/numba/poetry)
  • Add instructions to the notebooks that you should "Take 5 minutes to do this exercise"
  • Add estimated timing information at the top of each notebook
  • Switch to a colourblind-friendly colour scheme throughout
  • What do you wish people had told you about Python/coding in general?

Module 5: Testing your Code

  • The last exercise (DiffusionExample) is too hard and too long, and not particularly related to testing.
  • For the debugging section, there is not much in the notebook but we did a demo of the debugging capabilities in Jupyter lab.
  • Similarly for the Continuous Integration, there isn't much in the notebook, but we showed an example of a .travis.yml file and results page on travis-ci.com, and also did a live demo of a Github Action (checking for a README file) on a fork of the github-example repo that was created in the previous day's course.
  • ๐Ÿ‘ for our inclusion of the GH Action example
  • No-one uses pdb on the commandline. We demo'd "here's how it could be done in pdb and here's how you probably would do it in an IDE", which I think was the right strategy.
    • definitely no-one runs pdb from inside a Jupyter notebook!
  • There is almost no real content in the CI notebook. The Memory and Profiling section doesn't really belong there and should be moved or removed.
    • Jack proposes moving this to programming for speed
  • The final notebook uses simulated annealing but doesn't say so. It is quite long so there is a reasonable probability that it will be skipped on the day or need to be covered very quickly.
  • Replace all old py.test names with pytest
  • In the list of common testing frameworks: Make pytest the top of the list for Python, remove nose or replace it with nose2
  • check the frameworks and links for other languages are up to date
  • Remove all uses of nose in favour of pytest equivalents
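
As a rough illustration of the pytest style these items point towards: pytest collects functions whose names start with `test_` and relies on plain `assert` statements, so nose helpers such as `assert_equal` are not needed. The `mean` function and the file name `test_mean.py` below are hypothetical and not taken from the course notebooks.

```python
# Contents of a hypothetical test_mean.py; run with `pytest test_mean.py`
# (the older `py.test` spelling of the command is the one being retired).


def mean(values):
    """Hypothetical function under test, for illustration only."""
    return sum(values) / len(values)


def test_mean_of_integers():
    # With nose this might have been: assert_equal(mean([1, 2, 3]), 2)
    assert mean([1, 2, 3]) == 2


def test_mean_of_single_value():
    assert mean([5.0]) == 5.0
```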

Potential changes to RSE course structure

Potential changes to the RSE course structure discussed during planning meeting on 19/01:

  • Move Greengraph example in module 1 to end of module (rather than beginning)
  • Swap order of Greengraph & Boids example in module 3
  • Possibly replace Boids example in module 3 with something more useful, e.g. intro to pandas/scipy

Update module 6: Software projects

These changes will be coordinated by YOUR NAME HERE

General (applicable to every module)

  • Make each module independent of earlier ones (i.e. do not rely on examples/specific code from earlier modules)
  • Ensure that each module has some small exercises (say 1x 5 min exercise per 45 mins of teaching); add a larger (optional) exercise per course for attendees to complete in their own time
    • E.g. There are two packaging related exercises in the assessments - 1, 2
  • Update to use more modern libraries and tools (e.g. pandas/scipy/numba/poetry)
  • Add instructions to the notebooks that you should "Take 5 minutes to do this exercise"
  • Add estimated timing information at the top of each notebook
  • Switch to a colourblind-friendly colour scheme throughout
  • What do you wish people had told you about Python/coding in general?

Module 6: Software projects

  • Add something on virtual environments
  • Alternatives to setup.py (e.g. for specifying dependencies, but also Python is moving towards pyproject.toml / setup.cfg)
  • not sure it’s really helpful to have all the stuff about the python path - I think it’s easier to just package stuff up and install it, and if we’re encouraging people to have different environments for different projects, I don’t really see a downside to that…
  • Mention JOSS/similar (in the context of licensing)
  • Benefits of pip vs conda vs Poetry etc.
  • How to define and maintain dependencies, check for security issues/bloat
  • Need/convention for if __name__ == "__main__"
  • Where to host docs?
  • 1st and 2nd notebooks should be swapped
  • Title of (current) 1st notebook should be Installing Libraries, not Packaging your Code
  • Avoid use of Iris dataset (PR #79)
  • Libraries not in PyPI section should mention installing directly from git with pip (in general not sure we should be talking about downloading zips of code)
  • Link to docs for publishing a package on PyPI
  • Should docker be mentioned in here somewhere?
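
For the `if __name__ == "__main__"` item above, a minimal sketch of the convention might look like the following. The module name `greet.py` and its functions are hypothetical, chosen only to show the pattern.

```python
"""greet.py: a hypothetical module that can be imported or run as a script."""
import sys


def greeting(name):
    """Return a greeting; safe to import because it has no side effects."""
    return f"Hello, {name}!"


def main(argv=None):
    """Entry point, intended to run only when the file is executed directly."""
    args = sys.argv[1:] if argv is None else argv
    name = args[0] if args else "world"
    print(greeting(name))
    return 0


# __name__ is "__main__" when run as `python greet.py`, but "greet" when
# another module does `import greet`, so importing does not trigger main().
if __name__ == "__main__":
    main()
```

Keeping the script behaviour inside `main()` means the same file can serve both as a library (`from greet import greeting`) and as a command-line tool.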

Notes from January 2021 RSE Course

How should we field questions?

  • Having a dedicated contact person to send chat messages to and/or send to the whole group

What should instructors/helpers be doing in breakout rooms?

  • giving people answers or observing? Clarify this before going into the breakout room.

How should breakout rooms be structured?

  • groups that were pair-programming were more engaged
  • Breakout rooms weren't totally effective. Ways to improve could be:
    • Ice-breakers to get students in breakout rooms comfortable talking to one another

How can we confirm that students have understood the material?

  • Equivalent of green/red notes?
  • Summarise "here's what you need to know even if you've not been paying attention" at breakpoints
  • Anonymous polls: built in to Zoom, or e.g. Slido
    • Emphasise that these are anonymous to students

What do the students think about the overall structure of the course?

  • Some modules are very didactic with not much for students to do
  • Some modules have several consecutive exercises with not much teaching in between
  • Can we structure the course to have more targeted questions asked by the non-presenting instructor? Often students won't have a specific question but will be confused, and having an instructor ask "dumb" questions could make them feel comfortable asking really simple things if they don't totally understand.
  • Partially completed exercises with some comments that need to be converted into code?

What do we think about the structure of the course?

  • Schedule of 4hrs per day over 2 weeks works better than something more intensive

Suggestions for material to add

  • pandas
  • better explanation of decorators
  • type hinting
  • visualizations
  • Does scipy get covered?
  • Would there be enough interest and appropriate level for fundamentals of machine learning? Could use end-user friendly packages like scikit-learn.
  • IDE? pick any, lots of common features and we're already opinionated on things like github. Would make things like debugging more usable

Suggestions for material to remove

  • Update/remove references to outdated topics/syntax (e.g. python 2)
  • meta programming (most of)
  • operators (less depth? more general on magic methods?)
  • Remove/reduce the RDF/SPARQL ontologies section [📝 JH note - please don't take away my ontologies!!!]

Misc.

  • Can we use our connections with The Carpentries to get an external, education-oriented review of the course?
  • We sometimes send out a pre-survey: "How would you rate your skills in coding, data science, whatever? What do you expect to learn from this course?" Then a post-survey: "Did you learn what you were expecting to?"
  • How to merge back with the UCL fork of the course? James R has been in touch with David Perez-Suarez at UCL about this
  • What should the course be? What’s in/out of scope of the course? What are our processes for reviewing PRs and deciding whether they are appropriate?

Actions

  • Have a meeting to coordinate updates to lessons
  • Get in touch with David Perez-Suarez about merging with the UCL fork of the course
  • Get in touch with Malvika/Toby about moving under the Carpentries umbrella

Update module 10: Scientific file formats

These changes will be coordinated by @nbarlowATI

General (applicable to every module)

  • Make each module independent of earlier ones (i.e. do not rely on examples/specific code from earlier modules)
  • Ensure that each module has some small exercises (say 1x 5 min exercise per 45 mins of teaching); add a larger (optional) exercise per course for attendees to complete in their own time
  • Update to use more modern libraries and tools (e.g. pandas/scipy/numba/poetry)
  • Add instructions to the notebooks that you should "Take 5 minutes to do this exercise"
  • Add estimated timing information at the top of each notebook
  • Switch to a colourblind-friendly colour scheme throughout
  • What do you wish people had told you about Python/coding in general?

Module 10: Scientific file formats

  • There were some good questions and good discussion about other binary file formats (parquet, arrow, feather) - this could be a good thing for us to cover.
  • The last two notebooks on Ontologies, and Semantic Web, are very difficult for a non-expert to teach - the Turtle syntax is very hard to read, and there isn't a huge amount of explanatory text. I would be in favour of downsizing these sections considerably, and devoting the time to something else - possibly other file formats as mentioned above, or perhaps something about streaming or distributed data.
    • We also found these difficult to teach last year! James H (who wrote the course) argued strongly for leaving them in, but maybe we can add additional material as noted above and make these optional reading?
  • We also had a question about Docker - this could be a good thing to mention at some point in the course (not necessarily this module).

Add git clone and how to start jupyter notebook to prerequisites

During the first session some setup instructions were missing that seemed to confuse some people. Since the repo is quite big, it would be useful to clone the repo before the course starts. It would probably be sufficient to add:

Clone the git repository by typing into your terminal:

git clone https://github.com/alan-turing-institute/rse-course.git 

On your terminal with an activated environment, type:

jupyter notebook

Current section on commenting perpetuates a damaging narrative

I'm so surprised that this course is teaching data science learners not to comment their code.

https://alan-turing-institute.github.io/rsd-engineeringcourse/ch05construction/03comments.html

I think this quote in particular is appalling.

The proper use of comments is to compensate for our failure to express yourself in code. Note that I used the word failure. I meant it. Comments are always failures.

This narrative around comments being failures, that only people who "aren't real coders" need them, is something I've been giving talks against for more than 7 years. This chapter alone is enough to make someone stop trying to learn this new skill, and to feel undervalued and like they don't fit into the clique. Never mind whether they will feel confident sharing their code openly or contributing to an open source project. It is one of the ultimate bro behaviours in data science: https://thepsychologist.bps.org.uk/volume-33/november-2020/bropenscience-broken-science

Who is the audience for this course? Who benefits from being told that the comments that make their code easier for themselves and others to read make them a "failure at coding"?

Is this section destined to stay in this course? Can it be updated? Who is reviewing this course material from a perspective of equity and inclusion?

Research data in python thoughts

A few thoughts I had after teaching part of research data in python:

  • Consider replacing/modifying the Earthquakes example (and a lot of the first half) to be an analysis in pandas (#12), maybe similar in style to some of the software carpentries lessons.
  • Add visualisation exercises.
  • Add an exercise using NumPy broadcasting. Probably remove the mind-bending example of recreating matrix multiplication with broadcasting.
  • Boids example is very hard to explain the various components of, although it's a really cool example. Could it be modified to be simpler (maybe including exercises so students build it up themselves)?
  • To accommodate all the above, maybe reduce the amount of time spent on directly interacting with/parsing different file formats. E.g. Maybe just one/two notebooks on reading text files, JSON and YAML (CSVs dealt with in pandas). But keep something about interacting with the internet.
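
A NumPy broadcasting exercise of the simpler kind suggested above could be built around centring and normalising a small array. The sketch below uses made-up data and is only one possible starting point, not material from the existing notebooks.

```python
import numpy as np

# Hypothetical data: 3 samples (rows) of 4 measurements (columns).
data = np.array([[1.0, 2.0, 3.0, 4.0],
                 [2.0, 4.0, 6.0, 8.0],
                 [3.0, 6.0, 9.0, 12.0]])

# Column means have shape (4,); broadcasting stretches them across all 3 rows.
column_means = data.mean(axis=0)
centred = data - column_means

# Row means must have shape (3, 1) so that they align with rows, not columns.
row_means = data.mean(axis=1, keepdims=True)
normalised = data / row_means
```

Asking students to predict the shapes of `column_means` and `row_means` before running the cell, and to explain why `keepdims=True` is needed in the second case, covers the core broadcasting rule without the matrix multiplication example.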

Link directly to issues?

In the introductory pages there is a comment about opening issues for any bugs found. It links straight to the repo rather than to the issues page. Do we want to update this?

Update module 1: Introduction to Python

These changes will be coordinated by @edwardchalstrey1

General (applicable to every module)

  • Make each module independent of earlier ones (i.e. do not rely on examples/specific code from earlier modules)
  • Ensure that each module has some small exercises (say 1x 5 min exercise per 45 mins of teaching); add a larger (optional) exercise per course for attendees to complete in their own time
  • Update to use more modern libraries and tools (e.g. pandas/scipy/numba/poetry)
  • Add instructions to the notebooks that you should "Take 5 minutes to do this exercise"
  • Add estimated timing information at the top of each notebook
  • Switch to a colourblind-friendly colour scheme throughout
  • What do you wish people had told you about Python/coding in general?

Module 1: Introduction to Python

  • Set up a virtual environment for the whole course and teach what this is
    • Idea: Make this part of Module 0 (#83)?
  • Give a brief introduction about what research software engineering means and why it's important
  • Skip the first few notebooks with the complicated long example
    • Idea: Maybe making 01_01_data_analysis_example an exercise for Module 2 could be an option? (#85). But there are a few exercises in that module already and the original idea of 01_01 was as a "this is the kind of thing you can do" demo before teaching.
  • Explain difference between a Python file; Jupyter notebook; Python console; what is a terminal?
    • Idea: Make this part of Module 0 (#83)?

Update module 4: Version control with Git

These changes will be coordinated by @Iain-S

General (applicable to every module)

  • Make each module independent of earlier ones (i.e. do not rely on examples/specific code from earlier modules)
  • Ensure that each module has some small exercises (say 1x 5 min exercise per 45 mins of teaching); add a larger (optional) exercise per course for attendees to complete in their own time
  • Update to use more modern libraries and tools (e.g. pandas/scipy/numba/poetry)
  • Add instructions to the notebooks that you should "Take 5 minutes to do this exercise"
  • Add estimated timing information at the top of each notebook
  • Switch to a colourblind-friendly colour scheme throughout
  • What do you wish people had told you about Python/coding in general?

Module 4: Version control with Git

  • Check in advance that people have GitHub accounts with SSH keys set up (see also #83)
  • Note that some command line knowledge is a prerequisite
  • Use a vanilla terminal like what the participants will use
  • If switching between text editor/IDE and terminal, use a light theme for one and a dark theme for the other to make it explicit which is which
    • if possible, split screen
  • Move theory to beginning
  • Decide which commands are most useful (eg. rebase over reset)
  • General: I had fun getting 60 PRs into the ATI demo repo.
  • General: I think that this may be too big a module for the time allowed. Working with multiple remotes is quite advanced so maybe the examples/exercises could all be done with a single remote?

Update module 2: Intermediate Python

These changes will be coordinated by YOUR NAME HERE

General (applicable to every module)

  • Make each module independent of earlier ones (i.e. do not rely on examples/specific code from earlier modules)
  • Ensure that each module has some small exercises (say 1x 5 min exercise per 45 mins of teaching); add a larger (optional) exercise per course for attendees to complete in their own time
  • Update to use more modern libraries and tools (e.g. pandas/scipy/numba/poetry)
  • Add instructions to the notebooks that you should "Take 5 minutes to do this exercise"
  • Add estimated timing information at the top of each notebook
  • Switch to a colourblind-friendly colour scheme throughout
  • What do you wish people had told you about Python/coding in general?

Module 2: Intermediate Python

  • In 02_01_functions, several people in the audience asked questions related to the 'Side effects' section. Because the reasons behind why we can have functions with side-effects are somewhat counter-intuitive, it might be useful to add a preceding section on scoping and to contrast the differences in behaviour using more examples (in addition to the example which involves [:], there is the possibility of demonstrating side effects by appending to a list.)
  • It might also be useful to link back to the Memory and Containers section in 01_05_containers here. We might consider the following order of presentation:
    • Scoping (examples involving immutables)
    • Side effects (appending to a list)
    • Side effects (example involving [:])
  • Layout: In this module, we have the first exercise in the normal lesson notebook but the solution in its own notebook. Do we want to be more consistent, since we don't always do that?
  • Exercise: The "Occupancy Dictionary" exercise uses a data structure from the day before, which confused me as a helper and does not allow for module independence.
  • Exercise: No exercise for functions notebook!!!
  • Exercise: Maze class example could be smaller / split in two perhaps.
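
The proposed order of presentation for 02_01_functions (scoping first, then side effects via `append`, then side effects via `[:]`) could be sketched as below. All function and variable names here are hypothetical and are not the ones used in the notebook.

```python
def rebind_number(x):
    # Scoping with an immutable: rebinding the local name leaves the caller's value alone.
    x = x + 1
    return x


def append_item(items):
    # Side effect: this mutates the very list object the caller passed in.
    items.append("new")


def replace_contents(items):
    # Side effect: slice assignment changes the existing list in place.
    items[:] = ["replaced"]


def rebind_list(items):
    # No side effect: plain assignment only rebinds the local name.
    items = ["replaced"]


count = 1
rebind_number(count)        # count is still 1

shopping = ["eggs"]
append_item(shopping)       # shopping is now ["eggs", "new"]

in_place = ["old"]
replace_contents(in_place)  # in_place is now ["replaced"]

untouched = ["old"]
rebind_list(untouched)      # untouched is still ["old"]
```

Putting `replace_contents` next to `rebind_list` makes the contrast explicit: the two bodies differ only by `[:]`, yet only one of them is visible to the caller.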

Participant requests

  • Can we include an example of type hinting here?
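
A type hinting example for this module might be as small as the sketch below. The functions and the `house` data are hypothetical, and the `typing` imports are used so that the code also runs on Python 3.8, the oldest version listed in the README.

```python
from typing import Dict, List, Optional


def mean_occupancy(rooms: Dict[str, List[str]]) -> float:
    """Return the average number of occupants per room."""
    if not rooms:
        return 0.0
    return sum(len(people) for people in rooms.values()) / len(rooms)


def find_room(rooms: Dict[str, List[str]], person: str) -> Optional[str]:
    """Return the name of the room containing `person`, or None if absent."""
    for room, people in rooms.items():
        if person in people:
            return room
    return None


# Hypothetical example data.
house = {"kitchen": ["Graham"], "bedroom": ["Eric", "Terry"]}
```

It is worth pointing out to participants that Python does not enforce these hints at runtime; they are documentation that external tools such as mypy or an IDE can check.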

Corrections in Chapter 9

  • In notebook 11, when I follow the link to dbpedia.org at the end, I get a "high risk website" warning on Firefox as it doesn't by default switch to https.
  • In notebook 12, the URN www.turing.ac.uk/rsd-engineering/ontologies/reactions/ gets rendered as a hyperlink that goes nowhere (I think it should be escaped like above)

Synchronising with the UCL version of the course

Hi @iamleeg. When you were at the Turing last week, you mentioned that you'd been involved in updating the UCL version of this course. Do you think there's anything there that's worth combining into this course? It would be nice if we could get to a point where people simply forked one repo for their own purposes, rather than having several versions floating around.

Reduce use of notebook magic?

Although it's nice to have everything in notebooks (both for teaching and the web-site), I wonder whether it would be better for us to avoid using notebook magic to do things that probably wouldn't be done in a notebook in a normal workflow, e.g. instead of %%writefile create/edit files in a separate editor, and use a terminal instead of %%bash.

GitHub authentication for Chapter 2

Builds for chapter 2 require pushing commits to an example repo, which means the builds need a token/account details for a GitHub user. Previously this has been one of our accounts, but I remember there being a problem where, if a build or one of the pushes in Chapter 2 failed, the token could be exposed. GitHub was smart enough to detect and deactivate the token in these cases, but we should avoid that (especially with one of our accounts that has access to a lot in the Turing org).

  • Look into alternatives/more secure ways of dealing with authentication for chapter 2 and/or
  • Make a dummy GitHub user with minimal access to use for chapter 2 builds.

Download notes as PDF - link broken

On the GitHub pages (far left at bottom) is an option to download the notes as PDF.
This is referenced in the introduction text.
As of 11/07/2022 this link does not work

commit individual files from staging area

Just an addition to how to commit individual files:

There are two ways:

  1. You can stage an individual file and then commit all files in the staging area (i.e. that file), as explained in the 04Publishing notebook
  2. You can add all files to the staging area and then commit individual files

How to teach this course

Add instructions on how to teach the course:

  • Some kind of instructions slide for breaks/exercise
  • Be more consistent about when people should do the exercises (in breaks or not)
  • Can we turn off Zoom chat and enforce usage of Slack?
  • Add a drop-in session before the first module
  • Teachers to have no windows other than the ones being used open & make use of split screen wherever possible to avoid lots of switching
  • How to deal with questions

Intermediate Python Classes

In the notebook 02_03_defining_classes.ipynb the room kwarg in the Person constructor is not used in the first implementation -- perhaps a note clarifying that we'll use it later but we want to keep the interface the same would be a good idea?

Metropolis Monte-Carlo explanation

It might be worth adding some short notes explaining

  1. The relationship between β and T
  2. Periodic boundary conditions

as these are necessary to reach the solution and we can't expect everyone to have come across them before.
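
The short notes could be backed by a few lines of code such as the sketch below. It assumes the standard definitions: β = 1 / (k_B T), with k_B set to 1 (reduced units) by default, the usual Metropolis acceptance rule, and a 1D lattice for the periodic boundary example. The function names are hypothetical, and the exercise itself may use a different geometry or unit convention.

```python
import math


def beta_from_temperature(temperature, k_boltzmann=1.0):
    """Inverse temperature beta = 1 / (k_B * T); k_B = 1 assumes reduced units."""
    return 1.0 / (k_boltzmann * temperature)


def acceptance_probability(delta_energy, beta):
    """Metropolis rule: always accept downhill moves, else accept with exp(-beta * dE)."""
    if delta_energy <= 0:
        return 1.0
    return math.exp(-beta * delta_energy)


def neighbours(index, size):
    """Left and right neighbours on a 1D lattice with periodic boundaries.

    The modulo wraps the ends around, so site 0 and site size - 1 are adjacent.
    """
    return (index - 1) % size, (index + 1) % size
```

With these definitions a higher temperature means a smaller β and therefore a higher chance of accepting an uphill move, which is the intuition the notes need to convey.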

Add timing estimates

If we collect data about approximate timings for each section we can add this to the docs/directly into the notebooks.

Module 1: Introduction to Python

  • Session 1 [45 mins]: Intro + Notebook 2 (started from "Variables")
  • Session 2 [45 mins]: Notebooks 3 + 4
  • Session 3 [45 mins]: Notebooks 5-7
  • Session 4 [45 mins]: Notebooks 8 + 9

Module 2: Intermediate Python

  • Session 1 [45 mins]: Notebook 00 and collaborative exercise
  • Session 2 [45 mins]: Notebook 01
  • Session 3 [45 mins]: Notebook 02, started notebook 03
  • Session 4 [45 mins]: Finish notebook 03, Notebook 04
  • Extra time [15 mins]: Notebook 05

I (Eric) was expecting Notebook 01 to take less time than it did. There were a large number of questions about scope (which is discussed more in the advanced programming notebooks), side effects and early return (or lack of a return statement), and args/kwargs. I have not taught this section before so cannot assess if I made a bad estimate or we were unlucky with a lot of questions that happened to fall in this one (which takes me on a tangent: I wonder if questions should be expected to follow a Poisson process?).

Module 3: Research Data in Python

  • Session 1 [45 mins]: Notebooks 0 & 1 (fields & records; structured data)
  • Session 2 [45 mins]: Notebook 2 (maze exercise, earthquakes exercise)
  • Session 3 [45 mins]: Notebooks 3 & 4 (matplotlib, numpy)
  • Session 4 [45 mins]: Notebook 01_01_data_analysis_example, 6, 5 (greengraph, greengraph + classes, boids)

Module 4: Version Control

  • Session 1 [45 mins]: Notebook 0, 6 (introduction, git theory)
  • Session 2 [60 mins]: Notebooks 1, 2, 3 (solo work, fixing mistakes, publishing)
  • Session 3 [30 mins]: Notebooks 4, 5 (collaboration, fork & pull)
  • Session 4 [45 mins]: Notebooks 7, 8, 10 (branches, stash, rebasing)
  • SKIPPED: notebook 9 & 11

If rewriting, consider taking inspiration from here https://www.atlassian.com/git/tutorials or here https://product.hubspot.com/blog/git-and-github-tutorial-for-beginners

Module 5: Testing your code

  • Session 1 [45 mins]: Notebooks 0 and 1 (introduction, how to test)
  • Session 2 [45 mins]: Notebooks 2 and some of 3 (testing frameworks, energy example - we had 15 mins of group work on the exercise at the end of the session)
  • Session 3 [45 mins]: More on notebook 3 (did a poll at the start of the session to see who needed more time for the exercise - roughly 50% wanted another 5 mins), notebooks 4 and 5 (mocking, using debugger).
  • Session 4 [45 mins]: Notebook 6 (continuous integration). Not enough time to do notebook 7 (diffusion example) as a user exercise - went through the problem and solution directly.

For the debugging section, there is not much in the notebook but we did a demo of the debugging capabilities in Jupyter lab. Similarly for the Continuous Integration, there isn't much in the notebook, but we showed an example of a .travis.yml file and results page on travis-ci.com, and also did a live demo of a Github Action (checking for a README file) on a fork of the github-example repo that was created in the previous day's course.

Module 6: Software Projects

  • Session 1 [45 mins]: Notebooks 0, 1 (PyPI and Libraries)
  • Session 2 [45 mins]: Notebooks 2, 3 (Argparse and Non-notebook Python)
  • Session 3 [45 mins]: Notebooks 4, 5 (Packaging and Documentation)
  • Session 4 [45 mins]: Notebooks 6, 7, 8 (Project management, Licensing, Software issues)

There were several questions about Virtual/Conda environments and using Poetry to manage dependencies instead of pip. Neither of the instructors had used Poetry enough to give an informed opinion (though we have heard positive things). Also some questions on how to ensure a minimal list of dependencies is specified when packaging, and differences between Python Eggs and Wheels (which we didn't know the answer to). They seemed to be evenly spread throughout the notebooks so nothing took more or less time than I (Eric) might have expected.

Module 7: Construction & Design

  • Session 1 [60 mins]: Notebooks 0,1,2 (intro, conventions, comments)
  • Session 2 [35 mins]: Notebook 3 (refactoring)
  • Session 3 [75 mins]: Notebooks 4,5,6 (object-oriented design, classes, design patterns) - skipped over some aspects, particularly in design patterns
  • Session 4 [5 mins before and after a break]: Notebook 7 (bad boids exercise) - some time for participants to start looking at it during a break if they wanted, but largely skipped. Stayed behind a bit longer to answer a few questions and linked to the better implementation available in the repo.

Most of the notebooks took us longer than expected; we had hoped to leave most of the last session for the refactoring exercise. In the end we ran out of time for participants to meaningfully do anything on the exercise (even after trying to get through the second half of the material a bit faster, skipping some parts). This also meant we ended up with a very long session 3 and essentially no session 4.

Module 8: Advanced Programming Techniques

  • Session 1 [45 mins]: Notebooks 00, 01 (skipped last part)
  • Session 2 [45 mins]: Notebook 02 (through context managers)
  • Session 3 [45 mins]: Notebook 02 (decorators), Notebook 03 (through try/catch series of functions)
  • Session 4 [45 mins]: Finish Notebook 03, Notebook 04

We thought that decorators would be clearer if the repeater function itself was a decorator, rather than the return value of a function. We skipped the final two advanced notebooks due to lack of time.
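
To make the distinction concrete, here is a sketch of the two forms side by side. The names and bodies are hypothetical and do not reproduce the notebook's repeater example. `repeater` is itself a decorator, while `repeater_factory` is a function whose return value is the decorator.

```python
import functools


def repeater(func):
    """A plain decorator: takes a function and returns a wrapper that calls it twice."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return [func(*args, **kwargs) for _ in range(2)]
    return wrapper


def repeater_factory(count):
    """A decorator factory: returns a decorator configured with `count`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            return [func(*args, **kwargs) for _ in range(count)]
        return wrapper
    return decorator


@repeater
def greet(name):
    return f"Hello, {name}"


@repeater_factory(3)
def shout(name):
    return name.upper()
```

Teaching the plain form first means students only have to follow two levels of nesting before they meet the three-level factory form.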

Module 9: Programming for speed

  • Session 1 [45 mins]: Notebooks 00, 01 and part of 02
  • Session 2 [45 mins]: Remainder of Notebook 02
  • Session 3 [45 mins]: Notebook 03
  • Session 4 [45 mins]: Notebook 04

Module 10: Scientific file formats

  • Session 1 [45 mins] Notebook 00
  • Session 2 [45 mins] Notebook 01
  • Session 3 [40 mins] Notebook 02
  • Session 4 [50 mins] Notebooks 03, 04, 05

Went quickly through the last two notebooks, particularly 05, partly through lack of time, and partly through lack of understanding of the material by the instructor (me) :-/

Testing module comments

A few comments on the Testing material after today's session:

  • Mocking is a complex topic, and a smaller exercise after the taught notebook might help. Also, some more intro text should be provided at the beginning of the notebook - maybe move the text that is now before the patching example.
  • Some of the advice on boundary cases in The Fields of Saskatchewan is not entirely clear (e.g. what is meant by testing 0, N when indices appear? What is meant by a matrix reaching one row?)
  • The energy function template in the energy example has a parameter coeff=1.0 which is redundant at that stage and is not explained.
  • The diffusion example is too complex and students spend their time developing the implementation rather than the tests. We could replace it with something simpler and focus on test driven development?
  • More comments could be added in various parts of the notebooks to explain what is happening in more detail.

Corrections to Chapter 7 (Construction and Design)

In 07_06_design_patterns the constructor of the Controller class (near the end) is defined like this:

class Controller:
    def __init__(self):
        self.model = Model()  # Or use Builder
        self.view = View(self.model)

        def animate(frame_number):
            self.model.simulation_step()
            self.view.update()

        self.animator = animate

I think it's confusing to have the animate function defined inside __init__ and then assign it to self.animator (also raised questions about why animate doesn't need a self argument). Unless there's a nuance I'm missing, this would work and is more conventional:

class Controller:
    def __init__(self):
        self.model = Model()  # Or use Builder
        self.view = View(self.model)

    def animator(self, frame_number):
        self.model.simulation_step()
        self.view.update()

There may be something about how it's passed to animation.FuncAnimation that means this doesn't work (and explains why the frame_number argument is needed).
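As far as I know, matplotlib's FuncAnimation calls its callback with the current frame as the first positional argument, which would explain the frame_number parameter. A bound method such as controller.animator already carries self, so it should be usable as that callback. The sketch below checks the calling convention without matplotlib: Model, View and run_frames are hypothetical stand-ins, not the notebook's classes.

```python
class Model:
    """Hypothetical stand-in for the notebook's Model."""

    def __init__(self):
        self.steps = 0

    def simulation_step(self):
        self.steps += 1


class View:
    """Hypothetical stand-in for the notebook's View."""

    def __init__(self, model):
        self.model = model
        self.updates = 0

    def update(self):
        self.updates += 1


class Controller:
    def __init__(self):
        self.model = Model()
        self.view = View(self.model)

    def animator(self, frame_number):
        # frame_number is supplied by the caller; it is unused here.
        self.model.simulation_step()
        self.view.update()


def run_frames(func, frames):
    """Stand-in for an animation loop: calls func(frame) once per frame."""
    for frame in range(frames):
        func(frame)


controller = Controller()
# The bound method is passed exactly like a plain function would be.
run_frames(controller.animator, frames=5)
print(controller.model.steps, controller.view.updates)
```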

Fix banners

  • Mobile
  • Resizing the window to make it small
  • Favicon

All bring back the UCL banner: add a small Turing one.

Update module 8: Advanced programming techniques

These changes will be coordinated by @jack89roberts

General (applicable to every module)

  • Make each module independent of earlier ones (ie. do not rely on examples/specific code from earlier modules)
  • Ensure that each module has some small exercises (say 1x 5 min exercise per 45 mins of teaching); add a larger (optional) exercise per course for attendees to complete in their own time
  • Update to use more modern libraries and tools (eg. pandas/scipy/numba/poetry)
  • Add instructions to the notebooks that you should "Take 5 minutes to do this exercise"
  • Add estimated timing information at the top of each notebook
  • Switch to a colourblind-friendly colour scheme throughout
  • What do you wish people had told you about Python/coding in general?

Module 8: Advanced programming techniques

  • Decorators would be clearer if the repeater function itself was a decorator, rather than the return value of a function.
  • Clearly distinguish between an example decorator that takes and does not take an argument
  • We skipped the final two advanced advanced notebooks due to lack of time. Decision: Bear in mind and see how things go in the flipped classroom model. Maybe highlight last two "advanced topic" notebooks as optional more strongly.
  • PR #78
  • Is this advanced programming or specifically advanced Python? ie. is it demonstrating general principles or Python idioms? Decision: I think the name is ok as is, especially under the context of a RSE with Python course. There are concepts like functional programming, exceptions etc. that appear that are not Python specific.

Review introduction and prerequisites

General review, check the following:

  • Remove links to UCL-specific resources
  • Ensure that all instructions/Python commands are up to date (we assume everyone is using Python 3.7 or later)
  • Does the notebook run on Binder?
  • Is the output rendered correctly by JupyterBook?
  • Fix typos
  • Edit for clarity, or mention points below that require further discussion

Update module 3: Research Data in Python

These changes will be coordinated by @jemrobinson

General (applicable to every module)

  • Make each module independent of earlier ones (ie. do not rely on examples/specific code from earlier modules)
  • Ensure that each module has some small exercises (say 1x 5 min exercise per 45 mins of teaching); add a larger (optional) exercise per course for attendees to complete in their own time
  • Update to use more modern libraries and tools (eg. pandas/scipy/numba/poetry)
  • Add instructions to the notebooks that you should "Take 5 minutes to do this exercise"
  • Add estimated timing information at the top of each notebook
  • Switch to a colourblind-friendly colour scheme throughout
  • What do you wish people had told you about Python/coding in general?

Module 3: Research Data in Python

03_00_fields_and_records

03_01_structured_data

  • Show how to read JSON and write to YAML (or vice versa). Add an XML example.
  • Mention HDF5 and NetCDF.
  • Should we move Greengraph out of module 1? If so, put it in module 3 between sunspots and earthquakes
  • Consider moving 03_03_plotting_with_matplotlib earlier to break-up the two exercises
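As a starting point for the XML example suggested above, here is a minimal sketch that reads a JSON record and writes it out as XML using only the standard library. The record contents and element names are made up for illustration. A YAML version would need a third-party package such as PyYAML, so it is not shown here.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record, invented for this example.
json_text = '{"station": "Example", "readings": [1.5, 2.0, 3.25]}'
record = json.loads(json_text)

# Build an XML tree mirroring the JSON structure.
root = ET.Element("record")
ET.SubElement(root, "station").text = record["station"]
readings = ET.SubElement(root, "readings")
for value in record["readings"]:
    ET.SubElement(readings, "reading").text = str(value)

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)

# Read the XML back in to confirm nothing was lost.
parsed = ET.fromstring(xml_text)
values = [float(node.text) for node in parsed.iter("reading")]
print(values)
```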

03_03_plotting_with_matplotlib

  • Update the link at the end of the notebook to point to examples from up-to-date matplotlib (here); maybe cherry-pick some specific examples

03_04_NumPy

  • Is matplotlib + NumPy + IPython the key feature of Python these days? I think scipy; pandas; ML frameworks are probably more important

03_05_boids

  • What to do about Boids?
    • Difficult to teach, but some students praised it as a cool example.
    • Can we refactor?
    • Show final result and leave it as exercise for users to do in their own time?

Programming for Speed Corrections

  • When defining the x and y values to test in the mandelbrot set, the code to compute the x/y values is repeated -- it should be a function! (They are later computed using a numpy function, but for the sake of demonstration I think writing a function is cleaner).
  • Why do we need to specify the numpy array type as long in the Cython notebook?
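A minimal sketch of the kind of helper meant in the first point. The name grid_values, the ranges and the choice to exclude the upper end point are all assumptions for illustration, not the notebook's code.

```python
def grid_values(minimum, maximum, resolution):
    """Return `resolution` evenly spaced values starting at `minimum`.

    The upper bound `maximum` itself is not included (an assumption).
    """
    step = (maximum - minimum) / resolution
    return [minimum + i * step for i in range(resolution)]


# One helper serves both axes instead of repeating the expression.
xs = grid_values(-1.5, 0.5, 4)
ys = grid_values(-1.0, 1.0, 4)
print(xs)
print(ys)
```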

Add section on type hinting

Type hinting was introduced in Python 3.5 (PEP 484) and has only become more popular since.
Type hints improve readability and help spot type-related bugs when used in an IDE like PyCharm, and they are now supported by type-checking tools like Pyre.
Type hinting is spreading to many popular open-source repositories, and some even mandate types, like BoTorch (Bayesian Optimization in PyTorch, by Facebook) and PyTorch itself.

Python 3.9 shows that support for types is only continuing to grow: PEP 585 provides support for generics in standard collections without having to import from the typing module, which suggests we are likely moving toward a more type-safe(r) Python.
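For example, with PEP 585 (Python 3.9 or later) the built-in dict and list can be subscripted directly in annotations. The function below is a made-up illustration, not course material.

```python
# Before Python 3.9 this would need: from typing import Dict, List
def mean_by_label(samples: dict[str, list[float]]) -> dict[str, float]:
    """Average the values stored under each label."""
    return {label: sum(values) / len(values) for label, values in samples.items()}


result = mean_by_label({"a": [1.0, 2.0, 3.0], "b": [4.0]})
print(result)
```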

Suggested material to be revised:

  • In ch05 construction, Hungarian Notation is mentioned in naming convention, this notation is cumbersome and can instead be replaced with a simple name + type hint.
  • Placing a section in ch05 talking about types for the standard library, and then encouraging students as part of their refactoring of the Boid class to also type-hint it as a Boid to see the usefulness of it in practice.
  • Include an optional material in ch05 (or ch03 tests?) to run pyre or other type-checking tool and fix any bugs it reported, this might be a bit complicated as it requires setting up a 3rd party tool, we can potentially include script inside the python notebook for that?
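A small sketch of the first two suggestions, contrasting a Hungarian-style signature with a plain name plus type hint. The Boid class here is a hypothetical two-field stand-in, not the course's Boid.

```python
from dataclasses import dataclass


@dataclass
class Boid:
    """Hypothetical minimal Boid with just a position."""

    x: float
    y: float


# Hungarian-style: the type is encoded in each name.
def move_hungarian(boidBoid, fDx, fDy):
    return Boid(boidBoid.x + fDx, boidBoid.y + fDy)


# Plain names plus type hints carry the same information,
# and a type checker or IDE can verify them.
def move(boid: Boid, dx: float, dy: float) -> Boid:
    return Boid(boid.x + dx, boid.y + dy)


moved = move(Boid(1.0, 2.0), dx=0.5, dy=-1.0)
print(moved)
```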

Update module 7: Construction and Design

These changes will be coordinated by @jack89roberts

General (applicable to every module)

  • Make each module independent of earlier ones (ie. do not rely on examples/specific code from earlier modules)
  • Ensure that each module has some small exercises (say 1x 5 min exercise per 45 mins of teaching)
  • add a larger (optional) exercise per course for attendees to complete in their own time
  • Update to use more modern libraries and tools (eg. pandas/scipy/numba/poetry)
  • Add instructions to the notebooks that you should "Take 5 minutes to do this exercise"
  • Add estimated timing information at the top of each notebook
  • Switch to a colourblind-friendly colour scheme throughout
  • What do you wish people had told you about Python/coding in general?

Module 7: Construction and Design

  • Ran out of time for participants to meaningfully do anything on the refactoring exercise. Decision: Made patterns notebook optional/advanced to indicate it may not be covered, but have added more on linters so may need to cut further material in future. Flipped classroom model should ensure time for exercise.
  • Update __init__ definition (see #72)
  • Remove some asserts (see #72)
  • Boids refactoring exercise had changes that broke tests. New version in this repo (and PR #80)
  • A lot of good reference material in the module (e.g. ideas for refactorings, different OOP concepts) but not sure how well it translates into teaching. Felt a bit like reading a long list. Decision: I'm not sure how to get around this, maybe something that will also work better in flipped classroom model. Perhaps to consider again after next delivery.
  • Add discussion on type-hinting (see #37)
  • Consider replacing Hungarian Notation with a simple name + type hint.
  • Discuss standard library types and encourage students to also type-hint Boid as part of their refactoring of the Boid class, to see the usefulness of it in practice.

Questions from participants included:

  • Translating research/user requirements into software - could be a topic to add somewhere in the course. I ended up linking to Module 1 of the RDS course which has some discussion on scoping etc. for data science projects. Decision: Nothing added.
  • How to verify refactoring doesn't change results (testing)? Decision: More extensive linters section and bad boids exercise.
  • Recommendations for/possible caveats with using linters (Proposal: Split linters into new notebook, have a more opinionated recommendation). Decision: More extensive linters section.
  • Inheriting from multiple classes. Decision: Beyond scope but can point to multiple inheritance if asked.
  • General questions about understanding what functions did, understanding scope etc. Decision: Keep in mind, may be something to clarify in other modules.
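On the question of verifying that a refactoring doesn't change results, one possible approach is a regression check: keep the original implementation around, run both versions on the same inputs and assert that they agree. The sketch below uses made-up functions and inputs purely for illustration.

```python
def total_original(counts):
    """Original, loop-and-index version (hypothetical example code)."""
    total = 0
    for i in range(len(counts)):
        total = total + counts[i] * (counts[i] - 1)
    return total


def total_refactored(counts):
    """Refactored version that should behave identically."""
    return sum(n * (n - 1) for n in counts)


def check_refactoring(cases):
    """Assert both versions agree on every case; return how many were checked."""
    for case in cases:
        assert total_refactored(case) == total_original(case), case
    return len(cases)


# Include boundary cases such as the empty list and zeros.
checked = check_refactoring([[], [0], [1, 2, 3], [5, 0, 5]])
print(checked)
```

Integer inputs are used here so that exact equality is safe; with floating-point results an approximate comparison would probably be needed.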

Ensure UCL course is acknowledged

As this repository is not a fork of the original UCL repository, we should ensure that we correctly acknowledge it.

  • we have a link in repo README.
  • we should add a link to the jupyter book notes (eg. the copyright notice)
  • we should add a section to the jupyter book linking directly to the UCL course

Corrections to Chapter 8 (Advanced Programming)

  • 08_01_functional_programming: Function defined under this statement computes a sum not a mean:

    • We very often want to loop with some kind of accumulator (an intermediate result that we update), such as when finding a mean [should read: sum]

  • 08_01_functional_programming: Replace all instances of sys.float_info.min (smallest positive value) with -sys.float_info.max (most negative value).

  • 08_01_functional_programming: The defined accumulate function has arguments in a different order than Python's reduce, which is introduced later. Would be clearer to use the same order as the built-in.
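A short sketch of why the second and third points matter. The running_max helper is hypothetical, not the notebook's code. It uses functools.reduce, whose argument order is (function, iterable, initial value), and shows that sys.float_info.min is a tiny positive number and so is the wrong starting value for a maximum over negative data.

```python
import sys
from functools import reduce

values = [-3.0, -1.5, -2.0]


def running_max(data, initial):
    """Maximum via reduce; argument order is function, iterable, initial."""
    return reduce(lambda best, item: item if item > best else best, data, initial)


# sys.float_info.min is the smallest positive normalised float,
# so it beats every negative value and is returned unchanged.
wrong = running_max(values, sys.float_info.min)

# -sys.float_info.max is the most negative finite float.
right = running_max(values, -sys.float_info.max)

print(wrong > 0)
print(right)
```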

Reorganise chapters

The ongoing plan is to teach this as 10 modules - currently we split Scientific File Formats into two, but when we ran the course in January 2021, we found that modules 1 and 2 were rushed while modules 9 and 10 had excess time.

I propose to extend modules 1 & 2 to cover three modules and to compress modules 9 & 10 into one.

Inclusive naming

We should consider changing the following:

  • name of the default branch to main

Are there any other changes needed?
