A course for intermediate-level best practice and software development skills for working as part of a team in a research environment developed for AstraZeneca (using Python).

Home Page: https://softwaresaved.github.io/az-intermediate-software-skills-course/

License: Other

Ruby 0.72% Makefile 5.67% HTML 17.82% SCSS 10.74% CSS 4.67% JavaScript 1.83% R 7.43% Shell 0.53% Python 50.61%

az-intermediate-software-skills-course's People

Contributors

Watchers

Forkers

samuelhlewis

az-intermediate-software-skills-course's Issues

Code testing: more on decorators

What is a decorator? It’s a function that takes another function as an argument and returns a new function that wraps (and usually calls) the original.
Why use it?

Abstraction, ‘declarative’ format
May harm readability? Adds complexity for people who aren’t familiar with them
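
To make the "function that wraps a function" idea concrete, here is a minimal sketch of a decorator. The names `log_calls` and `add` are hypothetical and are not taken from the course material.

```python
import functools


def log_calls(func):
    """Decorator: wrap func so each call is recorded before func runs."""
    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        wrapper.calls.append((args, kwargs))
        return func(*args, **kwargs)

    wrapper.calls = []
    return wrapper


@log_calls  # declarative form, equivalent to: add = log_calls(add)
def add(a, b):
    return a + b


result = add(2, 3)
print(result, add.calls)
```

The `@log_calls` line is the 'declarative' part: the behaviour is attached where the function is defined rather than at every call site, which is also why it can be harder to follow for readers who have not met decorators before.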

Code testing: better explanation for `setup.py`

Question about setup.py - what is the exact purpose of each line of code, and does Pytest use it directly (or is it that Pip uses it, and Pytest uses the Pip information)?

It's the latter - the first line imports the functions we'll use, and the second is describing a new package which contains our code called 'inflammation-analysis', at version 1.0, which contains packages that are located using the find_packages() function (i.e. the code held in 'inflammation'). When pip encounters this file (i.e. when doing pip3 install -e .) it runs it to obtain these details so it can know how to 'install' it within the current environment. Without this file, pytest (when run on the command line) is unable to find the inflammation source code when it's imported by the tests.
Interestingly, if you invoke pytest directly from within python (and not as a command) by doing 'python3 -m pytest' for example in the repo directory, this setup.py isn't needed (since python is able to find the inflammation source files from its current location).
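
The import behaviour described above can be sketched with the standard library alone. This is an illustration of the underlying mechanism, not of what pip or pytest do internally: the package name `demo_inflammation` and the temporary directory are hypothetical stand-ins for the real repository.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def can_import(module_name):
    """Return True if module_name is importable with the current sys.path."""
    importlib.invalidate_caches()
    sys.modules.pop(module_name, None)
    try:
        importlib.import_module(module_name)
        return True
    except ModuleNotFoundError:
        return False


with tempfile.TemporaryDirectory() as repo_dir:
    # Hypothetical stand-in for the repository: one package with an empty __init__.py
    package_dir = Path(repo_dir) / "demo_inflammation"
    package_dir.mkdir()
    (package_dir / "__init__.py").write_text("")

    before = can_import("demo_inflammation")  # directory is not on sys.path yet
    sys.path.insert(0, repo_dir)              # roughly what 'python3 -m pytest' does for the current directory
    after = can_import("demo_inflammation")

    # Tidy up so the demonstration leaves no trace
    sys.path.remove(repo_dir)
    sys.modules.pop("demo_inflammation", None)

print(before, after)
```

Installing the package with pip (for example in editable mode) achieves the same end by a different route: the package becomes importable in that environment regardless of the directory the command is run from.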

Code Review: Rephrase or leave out reference to 'main' in 'Code Reviews via GitHub Pull Requests'

In the 1st paragraph of “Code Reviews via Github’s Pull Requests”, we refer to “the main develop branch”. It’s been suggested we might want to leave out ‘main’ here, or rephrase this to ‘principal’ (or similar), to avoid any possible confusion with the main branch itself.

New MVC architecture revisited episode - feels like there should be an exercise in here somewhere, to wrap it up

Setup: Python issues

After getting the user to enter the Python console, setup instructs the user

Press CONTROL-D to exit the Python console.

This doesn't work on Ubuntu Python 3.8.10 in Bash; CONTROL-C does, though. exit() will work on all platforms.

The page recommends Python 3.3+, and says "Things won't work well if you use Python 2". Is it worth firming this up? Strictly, I can't see anything that wouldn't be doable in Python 2 as long as the users switch out pip3 etc., but it's likely to complicate helping out and require helpers to install Python 2 for testing. Unless we know there are users who are still on Python 2 and can't/won't upgrade, it seems like this should be a stronger discouragement.

1.4: Personal Access Tokens

We recommend the users add a personal access token, and link to some tutorials to configure one and cache it. However, it only caches for as long as Git normally caches a password, 15 minutes. This is going to require the users to keep copy-pasting their PAT, and basically just keeping it in a text file on their desktop.

We can suggest they use store instead of cache for the credential manager, which will keep the password indefinitely - but this could cause possible security issues? The PAT timeouts should address that in theory.

Installation issues/technical problems from pilot #1

Reported via https://docs.google.com/document/d/1unvi31D273yttPCHf50MCgZIzP9Lc17QUlCAQRtZlaY/edit#

Code testing: GH actions documentation link

Question regarding github actions, and when these trigger - we could perhaps link out to the github action documentation, to show that more options are available than simply ‘push’ (and/or add an example where they set the trigger to only be pushing to the main branch, to show how to modify the triggers)

Switch to using key pairs and SSH for authentication in GitHub

Code Review: broken link in exercise

Link back to solution requirements in “Implement Tests for New Feature” exercise (lesson 4.1, Code Review) is broken.
Currently it is: https://softwaresaved.github.io/31-software-requirements/index.html#solution-requirements
But instead should be: https://softwaresaved.github.io/az-intermediate-software-skills-course/31-software-requirements/index.html#solution-requirements

Update max references in screenshots and text in diagnosing issues episode

The episode text and the pytest-pycharm-debug.png image need to be updated, since the variable shown in the debugger console screenshot has been renamed from 'max' to 'max_data' (to avoid shadowing the built-in Python 'max' function).

The broadcast images referencing 'max' also need to be updated for the same reason.
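
As a short illustration of why the rename matters, the sketch below shows what happens when a variable named `max` hides the built-in function. The list `data` is a made-up example and is not the course data.

```python
import builtins

data = [3, 1, 4]

max = 10  # shadows the built-in max() for the rest of this scope
try:
    max(data)
    shadowed_call_failed = False
except TypeError:
    # an int is not callable, so the built-in is no longer reachable by this name
    shadowed_call_failed = True

del max  # remove the shadowing name; the built-in becomes visible again

max_data = builtins.max(data)  # a distinct name avoids the clash entirely
print(shadowed_call_failed, max_data)
```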

Index: Learning objectives link does not work

On the index, the target audience section is intended to link to the learning objectives, but instead links to https://softwaresaved.github.io/index.html#learning-objectives (should be https://softwaresaved.github.io/az-intermediate-software-skills-course/#learning-objectives-for-the-workshop).

This course is not for you if:

You are well familiar with the learning objectives of the course and those of individual episodes

2.4: Undocumented environment variable

The material shows the interpreter set up with PYTHONUNBUFFERED=1, but a search shows that's not mentioned in the materials. It's probably worth us adding a small callout, especially as it can be one of those little quirks that it's useful to make people aware of in case they need it later.
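
A callout could include something like the following sketch, which starts a child interpreter with and without the variable and reports whether standard output writes straight through. It assumes CPython 3.7 or later, where the text stream exposes a `write_through` attribute; the helper name `stdout_write_through` is hypothetical.

```python
import os
import subprocess
import sys


def stdout_write_through(unbuffered):
    """Start a child interpreter and report sys.stdout.write_through as text."""
    env = dict(os.environ)
    env.pop("PYTHONUNBUFFERED", None)  # start from a known state
    if unbuffered:
        env["PYTHONUNBUFFERED"] = "1"
    completed = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.write_through)"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout.strip()


print("default:", stdout_write_through(False))
print("PYTHONUNBUFFERED=1:", stdout_write_through(True))
```

In practice the setting mostly matters when output is piped or captured (as an IDE run window does), where buffered output can otherwise appear late or out of order relative to error messages.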

Architecture revisited: add an explanatory text about argument passing

Architecture revisited episode - a big jump forward; adding a new view and command line parameters comes out of the blue, and the episode is not as easy to absorb as the others
A brief discussion on argument passing and the context in which these things arise is needed - at the moment it comes a bit cold if people are not familiar with argument passing - add 2 paragraphs above to explain argument passing
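
An introductory passage on argument passing might be supported by a small example along these lines. It is a sketch using the standard `argparse` module; the argument names `infiles` and `--view` and the view choices are assumptions for illustration, not necessarily those used in the episode.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Hypothetical analysis script")
    # positional argument: one or more input files
    parser.add_argument("infiles", nargs="+", help="input data files")
    # optional argument with a restricted set of values and a default
    parser.add_argument("--view", default="visualize",
                        choices=["visualize", "record"],
                        help="which view to use")
    return parser


# Passing a list stands in for what the shell would supply via sys.argv[1:]
args = build_parser().parse_args(["data/a.csv", "data/b.csv", "--view", "record"])
print(args.infiles, args.view)
```

Calling `parse_args()` with no list reads the real command line instead, so the same parser works both in tests and when the script is run from a terminal.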

Functional programming link broken

The functional programming link here: https://softwaresaved.github.io/az-intermediate-software-skills-course/31-software-paradigms/index.html#procedural-programming
Is broken.

Architecture revisited needs an intro on argument passing

Comments from learners:

Last episode in this section - architecture revisited - a big jump forward; adding a new view and command line parameters comes out of the blue, and the episode is not as easy to absorb as the others
A brief discussion on argument passing and the context in which these things arise is needed - at the moment it comes a bit cold if people are not familiar with argument passing - add 2 paragraphs above to explain argument passing

Familiarity with the OO paradigm as an optional prerequisite

Should we add to prerequisites that people should be familiar with the basics of object oriented programming?

Out of place material in “Functional Programming”

Out of place material? In “Functional Programming” the “Testing Impure Functions” solution (https://softwaresaved.github.io/az-intermediate-software-skills-course/34-functional-programming/index.html#solution-1) is given using a class built on “unittest.TestCase” - but classes have not yet been introduced, so it could be confusing for learners. Should this solution be changed to something more appropriate?

Pull before push callout

We should make an explicit note that it is considered best practice to do a 'git pull' before doing a git push. Either early on in the material or as they start collaborating.

"Improve this page" links do not work

I think this is due to the current branch variable being blank - this comes from the Carpentries remote theme's code but is set to blank possibly because this is a private repo.

GitHub projects - explain better that they are now owned by accounts and repos

Mention other programming paradigms and expand callout "So which one is Python?"

In episode 33-programming-paradigms, mention other programming paradigms (e.g. aspect-oriented programming paradigm) and link to further reading and expand the callout "So which one is Python?" to mention the two big Python libraries (NumPy and pandas) and where they fit in terms of programming paradigms. Or even better add a new callout for Pandas and NumPy if the existing callout becomes too big.

Update section 3 and 4 diagrams to reflect the updated wording

Check the code in all 'Extras' episodes

Another attempt at restructuring of Section 3

After the most recent restructuring of Section 3, some small issues still remain:

section "Addressing New Requirements" in episode 32-software-design hangs a bit - we are not saying anything apart from repeating the paragraph on solution requirements verbatim. This is then followed by a section "How should I test this?" which should perhaps be a sub-section of "Addressing New Requirements"
the section finishes abruptly with functional programming episode, with no connection to Section 4
code review (with which Section 4 starts) is mentioned in the middle of Section 3 in episode 32-software-design at the end of the section "Best Practices for ‘Good’ Software Design" (which used to be the last episode in Section 3 so made sense)

A proposed solution would be to:

swap episodes on functional and OO programming paradigms
move the episode "OO design patterns" (AKA architecture revisited) to be the last episode and add the bit where we mention code review at the bottom to connect to Section 4
as an extra, in callout "So which one is Python?" in episode 33-programming-paradigms, we can address which Python libraries are more suited to which programming paradigm (e.g. Pandas as an example of functional programming and NumPy as an example of procedural)

Add code solution for the exercise "A model patient" in episode 31-software-paradigms

Would be nice to have a suggested solution for the exercise "A model patient" with some tests so learners can compare their solutions to (as commented by the participants of the second external pilot).

It would be nice to use the Test Driven Development - as introduced earlier in this episode - so learners get to see it in action too.

Also copy over the example code to the course version in the Incubator.

1.3: Checking virtualenv packages

In 1.2, we reference the site-packages directory as an aside, and use pip list and pip freeze to show our installed packages. But then, in the "Compare external libraries" section the two locations where dependency information are found are pip freeze and site-packages. Then, in the "Update requirements" task we use site-packages as our first port of call to see if a package is installed.

This seems a little odd - I don't think I've ever had to play around with site-packages. I'd advocate switching the "Update requirements" task to use pip list to show the installed packages, and mentioning pip list alongside pip freeze earlier.
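
For completeness, the same check can also be made from within Python without looking inside site-packages. This is a sketch using the standard `importlib.metadata` module (Python 3.8+); the helper name `installed_version` and the package names queried are illustrative only, and this is not a replacement for the `pip list` suggestion above.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


print(installed_version("numpy"))
print(installed_version("surely-not-installed-pkg-xyz"))
```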

Add numba and precompiling as an optional functional programming exercise

As an add-on to the decorator discussion, we could include the jit decorator from the numba library. This would complement the multiprocessing optional exercise (showing 2 different methods for speeding up python code).

Example code could be a simple add-on to the previous example:

import time
from numba import jit

def profile(func):
    def inner(*args, **kwargs):
        start = time.process_time_ns()
        result = func(*args, **kwargs)
        stop = time.process_time_ns()

        print("Took {0} seconds".format((stop - start) / 1e9))
        return result
    
    return inner
        

@profile
@jit
def measure_me(n):
    total = 0

    for i in range(n):
        total += i * i

    return total

print(measure_me(1000000))
print(measure_me(1000000))

Example output:

Took 0.119796 seconds
333332833333500000
Took 4e-06 seconds
333332833333500000

parent-child relationship between branches explanation

The parent-child relationship between branches that get merged in GitHub should be explained a bit better: which branch gets merged onto which, and how there is no fixed parent or child, as you can try to merge any branch on top of any other. Ask Steve (this came up in the Blue breakout group).

Update the course to use GitHub Enterprise instead of GitHub

Update the course to use GitHub Enterprise instead of GitHub (including updating the screenshots).

Section 5: Explain a bit better MoSCoW prioritisation

MoSCoW section seems to talk about prioritisation within a timebox, rather than across a project, but this is a bit iffy as it suggests ‘Should Haves’ could be ‘Must haves’ for a later timebox, but the point of Agile is everything other than ‘Must Haves’ can be dropped. If 90% of the project is actually Must Haves then there’s no flexibility.

Object oriented: precondition reference

Consider adding something around using preconditions in code in the test material. A technique used in writing functions that is particularly useful to ensure data is (by some measures) correct and as expected before you actually do something with it. The 'Why Should We test Invalid Input Data' callout could be modified to include it.

Continuous Integration: rename GA workflow

For week 2 material: rename workflow name 'CI' as something more meaningful - when seeing it in the GitHub GA interface it's quite ambiguous.

Code Review: include code for creating a local branch from remote branch

In ‘Step 2: Preparing Your Local Environment for a Pull Request’, in step 3, we ask learners to create a local branch from a remote branch. This is the reverse of what they have done before, so would it be sensible to include here the code to do this? For reference we did this:

git checkout --track origin/<remote branch>
git branch -

Add a note on deleting old merged branches

Question on cleaning up old git branches - is it safe to do so?
A: yes, if the features are all merged - mention the cleaning phase (deleting old branches once they have been merged) in the text?

Q: git/GitHub housekeeping: should you delete old branches once they have been merged in and are no longer being used?
A: Yes - it is a good idea to keep on top of branch house-keeping both locally and in your remote repository! You may want to remove branches once they have been merged in and are no-longer being used or branches containing content you wish to abandon. A good starter discussion can be found here: https://railsware.com/blog/git-housekeeping-tutorial-clean-up-outdated-branches-in-local-and-remote-repositories/

1.2: Virtualenv python versions

We introduce virtual environments, but keep doing everything explicitly using python3/pip3 - which kind of implies that, if you have Python 2 as python, it might still be accessible. I think it's worth adding a callout that says within a virtual environment, python and pip are the versions you created the venv with. Not least because pip is much easier to type than pip3!
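
Such a callout could be backed by a quick check that learners run with plain `python` once the environment is activated. The following is a minimal sketch; the function name `in_virtual_environment` is hypothetical.

```python
import sys


def in_virtual_environment():
    """True when the interpreter runs inside a venv-style virtual environment."""
    # In a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base installation.
    return sys.prefix != sys.base_prefix


print("interpreter:", sys.executable)
print("version:", ".".join(map(str, sys.version_info[:3])))
print("in venv:", in_virtual_environment())
```

If the reported interpreter path sits inside the venv directory and the version is the one the venv was created with, `python` and `pip` can be used in place of `python3` and `pip3` for the rest of the session.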

Section 4 review - code review comments

Some comments on the code review episode from the section 4 review:

Advantages of code review exercise: perhaps rephrase to “Discuss as a group: what do you think are the reasons behind, and advantages of, code review?” (since they won’t know the actual reasons yet)
The repository ‘Settings’ menu looks quite different, and uses different option names in some cases (notably “manage access” becomes “collaborators”), so probably best to regenerate the screenshot
In Step 2: preparing your local environment for a pull request, it mentions:
- “Create the appropriate local branch add-std-dev or add-view (based on the feature you are working on) off the remote feature branch to contain your new code.”
- Perhaps needs some clarification. Wouldn’t the ‘add-std-dev’ or ‘add-view’ branches already exist since the collaborator would have already created one of them (and has been asked not to merge it)? I suspect the new branch should be either add-std-dev-tests or add-view-tests?
In Step 4: submitting a pull request, it would be more complete to add a couple of steps for leaving a comment and creating the pull request (although could add this as a single step)

Code Review: PR group work synchronisation

In one room, not enough people had full versions of the repo; those who did ended up paired with people who didn’t, who then had nothing to do.

It would work better if it was rejigged to be based on a single repo in the group
OR: Provide a working version of the repo up to this point for people to template off
People without working repos are furthest behind, so slowest to implement changes

Include another concept map for the wrap-up

Include another concept map for the wrap-up, e.g.: https://docs.google.com/drawings/d/1wy14KVYMhgwR3x0yQx6b-oza7OV3qyZ8X7P6t0De2Is/edit

Notes for external pilots

people still coming with not enough pre-requisite knowledge (ideally people who have been coding for at least 6-12 months, some familiarity with OO paradigm would be good)
there are loads of concepts mentioned and linked to for further reading - JSON, YAML, OO programming, etc. - a bit overbearing
common issues page - make sure everyone is aware of it
best to switch to ssh + key pairs for authN with GitHub
synchronising for team exercises from section 3 - harder to do if workshop runs over several weeks as people drop off and it is hard to organise people in the same teams
for time-saving purposes instructor-led mode of delivery is probably preferred
to save time and deliver the course in 4 half-days:
- skip linting in section 1
- bundle section 4 and 5 together and skip packaging code
share introductory slides

Section 5: instruct people to put the consensus times in the issue comments

From the AZ pilot 2: We should instruct people to put the consensus times in the issue comments.

I assume this is about effort estimation.

Add benefits of using preconditions in code

Consider adding something brief around using preconditions in code in the test material. Preconditions are a technique used when writing functions that is particularly useful to ensure data is (by some measures) correct and as expected before you actually do something with it.

The 'Why Should We test Invalid Input Data' callout could be modified to include it since it's directly related, or expanded into a separate subsection to include both aspects.
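
As an indication of what such an addition might show, here is a minimal sketch of preconditions at the top of a function. The function `mean_reading` and its rules (non-empty, no negative values) are hypothetical and are not taken from the course code.

```python
def mean_reading(readings):
    """Return the mean of a sequence of non-negative readings."""
    # Preconditions: fail early, with a clear message, before computing anything
    if len(readings) == 0:
        raise ValueError("readings must not be empty")
    if any(value < 0 for value in readings):
        raise ValueError("readings must not contain negative values")
    return sum(readings) / len(readings)


print(mean_reading([1, 2, 3]))

try:
    mean_reading([1, -2, 3])
except ValueError as error:
    print("rejected:", error)
```

Tests for invalid input then have something definite to check: that the function raises the documented error rather than returning a misleading result.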

Section 5: extra task on contributing.md

From Kamilla: I've added an extra task based on how things went for my group in the Summer pilot; have asked team to write a contributing.md file to explain how contributions will be handled (so we have a decision on branching strategy and PRs in advance of the sprint with instructions for the team to refer to).

Code testing: better formatting of tuples in the parameterisation section

At first glance, the array of tuples in the parameterisation section is difficult to parse as two arrays in each test case (it's easy to mistake them for four separate arrays). They should be rewritten with additional spacing for clarification - same for the solution to "Exercise: Write Parameterised Unit Tests"

@pytest.mark.parametrize(
    "test, expected",
    [
        ([ [0, 0], [0, 0], [0, 0] ], [0, 0]),
        ([ [1, 2], [3, 4], [5, 6] ], [3, 4]),
    ])
