softwaresaved / az-intermediate-software-skills-course

A course for intermediate-level best practice and software development skills for working as part of a team in a research environment developed for AstraZeneca (using Python).

Home Page: https://softwaresaved.github.io/az-intermediate-software-skills-course/

License: Other

Ruby 0.72% Makefile 5.67% HTML 17.82% SCSS 10.74% CSS 4.67% JavaScript 1.83% R 7.43% Shell 0.53% Python 50.61%

az-intermediate-software-skills-course's People

Contributors

abbycabs, alanocallaghan, anenadic, erinbecker, evanwill, fmichonneau, gvwilson, jacalynlaird, jag1g13, joaorodrigues, jsta, katrinleinweber, mawds, maxim-belkin, mr-c, neon-ninja, orchid00, pbanaszkiewicz, raynamharris, rgaiacs, smangham, sstevens2, steve-crouch, tkphd, tobyhodges, tracykteal, twitwi, unode, wclose, zkamvar

Forkers

samuelhlewis

az-intermediate-software-skills-course's Issues

Code testing: more on decorators

What is a decorator? It’s sort of a function that calls a function: more precisely, it takes a function and returns a new function that wraps the original.
Why use it?

  • Abstraction, ‘declarative’ format
  • May harm readability? Adds complexity to people who aren’t familiar with them
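A minimal sketch of the idea, which might suit the episode: the decorator below records every call made to the function it wraps. The names log_calls and add are made up for illustration and do not come from the course material.

```python
import functools


def log_calls(func):
    """Decorator: return a wrapper that records each call before running func."""
    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        wrapper.calls.append((args, kwargs))
        return func(*args, **kwargs)

    wrapper.calls = []
    return wrapper


# The @ line is shorthand for: add = log_calls(add)
@log_calls
def add(a, b):
    return a + b


print(add(1, 2))
print(add.calls)
```

The 'declarative' benefit is that the behaviour is attached with one line above the function. The readability cost is that add is now really wrapper, which is not obvious to someone who has not met decorators before.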

Code testing: better explanation for `setup.py`

Question about setup.py - what is the exact purpose of each line of code, and does Pytest use it directly (or is it that Pip uses it, and Pytest uses the Pip information)?

  • It's the latter - the first line imports the functions we'll use, and the second is describing a new package which contains our code called 'inflammation-analysis', at version 1.0, which contains packages that are located using the find_packages() function (i.e. the code held in 'inflammation'). When pip encounters this file (i.e. when doing pip3 install -e .) it runs it to obtain these details so it can know how to 'install' it within the current environment. Without this file, pytest (when run on the command line) is unable to find the inflammation source code when it's imported by the tests.
  • Interestingly, if you invoke pytest directly from within python (and not as a command) by doing 'python3 -m pytest' for example in the repo directory, this setup.py isn't needed (since python is able to find the inflammation source files from its current location).
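The import-path behaviour described above can be sketched with the standard library alone. This is an illustration under assumptions, not what pytest does internally: a throwaway directory stands in for the repo, and probe.py is a hypothetical file that simply tries to import inflammation. Running it with python -m from the repo directory puts that directory on sys.path, whereas running a script that lives elsewhere (loosely analogous to the installed pytest command) does not.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

PROBE = "import inflammation\n"


def run_probe(args, cwd):
    """Run a Python subprocess in cwd and return True if it exits cleanly."""
    # Drop variables that would change how sys.path is built.
    env = {k: v for k, v in os.environ.items()
           if k not in ("PYTHONPATH", "PYTHONSAFEPATH")}
    proc = subprocess.run([sys.executable, *args], cwd=cwd, env=env,
                          capture_output=True, text=True)
    return proc.returncode == 0


def demo():
    with tempfile.TemporaryDirectory() as repo, \
            tempfile.TemporaryDirectory() as elsewhere:
        # Fake repo containing an (uninstalled) 'inflammation' package.
        package = Path(repo) / "inflammation"
        package.mkdir()
        (package / "__init__.py").write_text("")
        (Path(repo) / "probe.py").write_text(PROBE)
        (Path(elsewhere) / "probe.py").write_text(PROBE)

        # 'python -m probe' adds the current directory to sys.path.
        as_module = run_probe(["-m", "probe"], cwd=repo)
        # Running a script adds the script's own directory instead.
        as_script = run_probe([str(Path(elsewhere) / "probe.py")], cwd=repo)
        return as_module, as_script


print(demo())
```

Installing the package with pip3 install -e . removes the dependence on the current directory, which is why setup.py is needed for the plain pytest command.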

Setup: Python issues

After getting the user to enter the Python console, the setup page instructs the user:

Press CONTROL-D to exit the Python console.

This doesn't work on Ubuntu Python 3.8.10 in Bash; CONTROL-C does, though. exit() will work on all platforms.

The page recommends Python 3.3+, and says "Things won't work well if you use Python 2". Is it worth firming this up? Strictly, I can't see anything that wouldn't be doable in Python 2 as long as the users switch out pip3 etc., but it's likely to complicate helping out and require helpers to install Python 2 for testing. Unless we know there are users who are still on Python 2 and can't/won't use Python 3, it seems like this should be a stronger discouragement.

1.4: Personal Access Tokens

We recommend the users add a personal access token, and link to some tutorials to configure one and cache it. However, it only caches for as long as Git normally caches a password, 15 minutes. This is going to require the users to keep copy-pasting their PAT, and basically just keeping it in a text file on their desktop.

We can suggest they use store instead of cache for the credential manager, which will keep the password indefinitely - but this could cause possible security issues? The PAT timeouts should address that in theory.

Code testing: GH actions documentation link

Question regarding github actions, and when these trigger - we could perhaps link out to the github action documentation, to show that more options are available than simply ‘push’ (and/or add an example where they set the trigger to only be pushing to the main branch, to show how to modify the triggers)

Update max references in screenshots and text in diagnosing issues episode

The episode text and the pytest-pycharm-debug.png image need to be updated, since the variable referenced in the debugger console screenshot has been renamed from 'max' to 'max_data' (to avoid shadowing the built-in Python 'max' function).

The broadcast images referencing 'max' also need to be updated for the same reason.

2.4: Undocumented environment variable

The material shows the interpreter set up with PYTHONUNBUFFERED=1, but a search shows that's not mentioned in the materials. It's probably worth us adding a small callout, especially as it can be one of those little quirks that it's useful to make people aware of in case they need it later.
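As a possible basis for such a callout: setting PYTHONUNBUFFERED to a non-empty value has the same effect as running python with -u, so output reaches the console immediately rather than waiting in a buffer. The sketch below shows the difference by asking a child interpreter whether its stdout is in write-through mode. The helper name stdout_write_through is made up for illustration.

```python
import os
import subprocess
import sys

# The child reports whether its stdout passes writes straight through.
CHILD = "import sys; print(sys.stdout.write_through)"


def stdout_write_through(unbuffered):
    """Start a child interpreter with or without PYTHONUNBUFFERED=1."""
    env = {k: v for k, v in os.environ.items() if k != "PYTHONUNBUFFERED"}
    if unbuffered:
        env["PYTHONUNBUFFERED"] = "1"
    result = subprocess.run([sys.executable, "-c", CHILD], env=env,
                            capture_output=True, text=True, check=True)
    return result.stdout.strip() == "True"


print("with PYTHONUNBUFFERED=1:", stdout_write_through(True))
print("without it:", stdout_write_through(False))
```

This matters most when output is piped or captured by an IDE, where buffered prints can appear late or out of order relative to error messages.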

Architecture revisited: add an explanatory text about argument passing

The Architecture revisited episode is a big jump forward: adding a new view and command line parameters comes out of the blue, and the episode is not as easy to absorb as the others.
A brief discussion of argument passing, and the context in which these things arise, would help. At the moment it comes a bit cold if people are not familiar with argument passing; two paragraphs above it to explain argument passing would do.
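A short example of command line argument passing could accompany that introduction. The sketch below uses Python's argparse module; the argument names infiles and --view, and the choices offered, are assumptions made for illustration rather than a copy of the course code.

```python
import argparse


def build_parser():
    """Describe the arguments the (hypothetical) script accepts."""
    parser = argparse.ArgumentParser(
        description="Hypothetical inflammation data tool")
    # Positional argument: one or more input files.
    parser.add_argument("infiles", nargs="+",
                        help="input CSV file(s) to process")
    # Optional argument: selects which view to show, with a default.
    parser.add_argument("--view", default="visualize",
                        choices=["visualize", "record"],
                        help="which view to use for the output")
    return parser


if __name__ == "__main__":
    # With no list given, parse_args() reads the arguments from sys.argv.
    args = build_parser().parse_args()
    print(args.infiles, args.view)
```

Saved as, say, tool.py, this would be run as python3 tool.py data.csv --view record. Passing an explicit list to parse_args(), as in parse_args(["data.csv"]), is handy for trying the parser out without a terminal.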

Architecture revisited needs an intro on argument passing

Comments from learners:

  • The last episode in this section, Architecture revisited, is a big jump forward: adding a new view and command line parameters comes out of the blue, and the episode is not as easy to absorb as the others
  • A brief discussion of argument passing, and the context in which these things arise, would help - at the moment it comes a bit cold if people are not familiar with argument passing - two paragraphs above it to explain argument passing would do

Pull before push callout

We should make an explicit note that it is considered best practice to do a 'git pull' before doing a git push. Either early on in the material or as they start collaborating.

"Improve this page" links do not work

I think this is due to the current branch variable being blank - this comes from the Carpentries remote theme's code but is set to blank possibly because this is a private repo.

Mention other programming paradigms and expand callout "So which one is Python?"

In episode 33-programming-paradigms, mention other programming paradigms (e.g. aspect-oriented programming paradigm) and link to further reading and expand the callout "So which one is Python?" to mention the two big Python libraries (NumPy and pandas) and where they fit in terms of programming paradigms. Or even better add a new callout for Pandas and NumPy if the existing callout becomes too big.
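If the expanded callout needs a concrete contrast, a tiny example of the same calculation in two styles may help. This is only a sketch using the standard library, and the function names are made up. It does not attempt to show NumPy or pandas themselves.

```python
from functools import reduce


def sum_of_squares_procedural(values):
    """Procedural style: step-by-step statements that update state."""
    total = 0
    for value in values:
        total += value * value
    return total


def sum_of_squares_functional(values):
    """Functional style: compose functions, with no variable being mutated."""
    squares = map(lambda value: value * value, values)
    return reduce(lambda acc, square: acc + square, squares, 0)


print(sum_of_squares_procedural([1, 2, 3]))
print(sum_of_squares_functional([1, 2, 3]))
```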

Another attempt at restructuring of Section 3

After the most recent restructuring of Section 3, some small issues still remain:

  • section "Addressing New Requirements" in episode 32-software-design hangs a bit - we are not saying anything apart from repeating the paragraph on solution requirements verbatim. This is then followed by a section "How should I test this?" which should perhaps be a sub-section of "Addressing New Requirements"
  • the section finishes abruptly with functional programming episode, with no connection to Section 4
  • code review (with which Section 4 starts) is mentioned in the middle of Section 3 in episode 32-software-design at the end of the section "Best Practices for ‘Good’ Software Design" (which used to be the last episode in Section 3 so made sense)

A proposed solution would be to:

  • swap episodes on functional and OO programming paradigms
  • move the episode "OO design patterns" (AKA architecture revisited) to be the last episode and add the bit where we mention code review at the bottom to connect to episode 4
  • as an extra, in callout "So which one is Python?" in episode 33-programming-paradigms, we can address which Python libraries are more suited to which programming paradigm (e.g. Pandas as an example of functional programming and NumPy as an example of procedural)

1.3: Checking virtualenv packages

In 1.2, we reference the site-packages directory as an aside, and use pip list and pip freeze to show our installed packages. But then, in the "Compare external libraries" section, the two places where dependency information is found are pip freeze and site-packages. Then, in the "Update requirements" task, we use site-packages as our first port of call to see if a package is installed.

This seems a little odd - I don't think I've ever had to play around with site-packages. I'd advocate switching the "Update requirements" task to use pip list to show the installed packages, and mentioning pip list alongside pip freeze earlier.
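For comparison, the installed packages can also be queried from Python without looking inside site-packages by hand. The sketch below uses importlib.metadata (Python 3.8+) and gives roughly the information that pip list prints. The function name installed_packages is an illustrative choice, and this is not how pip itself is implemented.

```python
from importlib import metadata


def installed_packages():
    """Return sorted (name, version) pairs visible to this interpreter."""
    found = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip entries with broken or missing metadata
            found.add((name, dist.metadata.get("Version")))
    return sorted(found, key=lambda item: item[0].lower())


for name, version in installed_packages():
    print(name, version)
```

Run inside an activated virtual environment, this lists only what that environment can see, which is the point the "Update requirements" task is trying to make.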

Add numba and precompiling as an optional functional programming exercise

As an add-on to the decorator discussion, we could include the jit decorator from the numba library. This would complement the multiprocessing optional exercise (showing 2 different methods for speeding up python code).

Example code could be a simple add-on to the previous example:

import time
from numba import jit


def profile(func):
    # Decorator that prints the CPU time used by each call to func
    def inner(*args, **kwargs):
        start = time.process_time_ns()
        result = func(*args, **kwargs)
        stop = time.process_time_ns()

        print("Took {0} seconds".format((stop - start) / 1e9))
        return result

    return inner


# jit compiles measure_me on its first call, so that call includes the
# compilation time and later calls run the compiled version
@profile
@jit
def measure_me(n):
    total = 0

    for i in range(n):
        total += i * i

    return total

print(measure_me(1000000))
print(measure_me(1000000))

Example output:

Took 0.119796 seconds
333332833333500000
Took 4e-06 seconds
333332833333500000

parent-child relationship between branches explanation

The parent-child relationship between branches that get merged in GitHub should be explained a bit better: which branch gets merged onto which, and how there is no real parent or child, since you can try to merge any branch on top of any other. Ask Steve (this popped up in the Blue breakout group).

Section 5: Explain a bit better MoSCoW prioritisation

MoSCoW section seems to talk about prioritisation within a timebox, rather than across a project, but this is a bit iffy as it suggests ‘Should Haves’ could be ‘Must haves’ for a later timebox, but the point of Agile is everything other than ‘Must Haves’ can be dropped. If 90% of the project is actually Must Haves then there’s no flexibility.

Object oriented: precondition reference

Consider adding something around using preconditions in code in the test material. Preconditions are a technique used in writing functions that is particularly useful to ensure data is (by some measures) correct and as expected before you actually do something with it. The 'Why Should We test Invalid Input Data' callout could be modified to include it.

Code Review: include code for creating a local branch from remote branch

In ‘Step 2: Preparing Your Local Environment for a Pull Request’, in step 3, we ask learners to create a local branch from a remote branch. This is the reverse of what they have done before, so would it be sensible to include here the code to do this? For reference we did this:

git checkout --track origin/<remote branch>
git branch -

Add a note on deleting old merged branches

Question on cleaning up old git branches - is it safe to do so?
A: yes, if the features are all merged - mention the cleaning phase (deleting old branches once they have been merged) in the text?

Q: git/GitHub housekeeping: should you delete old branches once they have been merged in and are no longer being used?
A: Yes - it is a good idea to keep on top of branch housekeeping both locally and in your remote repository! You may want to remove branches once they have been merged in and are no longer being used, or branches containing content you wish to abandon. A good starter discussion can be found here: https://railsware.com/blog/git-housekeeping-tutorial-clean-up-outdated-branches-in-local-and-remote-repositories/

1.2: Virtualenv python versions

We introduce virtual environments, but keep doing everything explicitly using python3/pip3 - which kind of implies that, if you have Python 2 as python, it might still be accessible. I think it's worth adding a callout that says within a virtual environment, python and pip are the versions you created the venv with. Not least because pip is much easier to type than pip3!
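A callout along these lines could include a quick way to check which interpreter is actually running. The sketch below assumes an environment created with the standard venv module, where sys.prefix differs from sys.base_prefix. The function names are made up for illustration.

```python
import sys


def in_virtualenv():
    """Return True when this interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix points at the Python it was created from.
    return sys.prefix != sys.base_prefix


def describe_interpreter():
    """Summarise which Python is running and where it lives."""
    return {
        "executable": sys.executable,
        "version": "{}.{}".format(*sys.version_info[:2]),
        "in_virtualenv": in_virtualenv(),
    }


print(describe_interpreter())
```

Running this with python and then with python3 inside an activated environment should report the same executable, which demonstrates that the two names are interchangeable there.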

Suggested improvements post pilot #1

Main ideas:

  • add new content but in a way that is not too much reading and is more working/exercises
  • Expand software project to be more advanced
  • More exercise and more variety, especially in Section 3 and onwards

Content reorganisation suggestions:

  • Section 1:
    • expand Section 1 with the extra episode on linting (and an exercise to improve learners' existing projects).
  • Section 3:
    • Split Section 3 in two (and start the new Section 4 with a team exercise)
    • Bring in functional programming back to Section 3 potentially with some AZ examples, and more decorators
      • e.g. use a data analysis example in more functional style (using patients data) and use Pandas
    • Make episode on object-oriented programming smaller or link to it as further reading
    • Consider bringing in episode on persistence to Section 3
    • Put the data analysis example to break up loads of reading
      • see comment about exercises using Pandas
  • Section 4:
    • expand new Section 4 a bit with more project management and other collaboration tools in GitHub (such as mentions/notification system, labelling issues, project boards and cards, etc.) and additional team work exercises (more coding in a team against a set of requirements)
  • Section 5:
    • add more collaborative work examples/exercises to new Section 5
    • clarify different packaging with Poetry vs. other methods (stick to one?)
  • Explicit points to further reading and branching off
    • persistence, databases episodes go nicely after OOP
    • have a mini wrap-up and pointers for further reading after each episode
  • Add 'Common issues' page under Extras
    • add all identified issues to help instructors and learners in the future and pre-empt installation problems at the workshop
  • Instructor notes:
    • add a link to 'Common issues' page to help instructors by letting them know what might go wrong and how to troubleshoot
    • improve instructor notes and give advice of how to handle certain exercises/situations/splitting into groups
  • Installation instructions:
    • add a link to 'Common issues' page and see if installation instructions can be enhanced with further tests to detect problems (that are not installations issues per se but come to light when using tools) early on
    • add instructions/tests for some additional tasks (e.g. setting ssh with Git and tokens to work with GitHub actions ahead of the workshop)

Question for @jag1g13 and @steve-crouch (for discussion): should we move software paradigms/architecture section before verifying software for correctness section or is this too much and would break too many things at this point?

Section 4 review - code review comments

Some comments on the code review episode from the section 4 review:

  • Advantages of code review exercise: perhaps rephrase to “Discuss as a group: what do you think are the reasons behind, and advantages of, code review?” (since they won’t know the actual reasons yet)
  • The repository ‘Settings’ menu looks quite different, and uses different option names in some cases (notably “manage access” becomes “collaborators”), so probably best to regenerate the screenshot
  • In Step 2: preparing your local environment for a pull request, it mentions:
    • “Create the appropriate local branch add-std-dev or add-view (based on the feature you are working on) off the remote feature branch to contain your new code.”
    • Perhaps needs some clarification. Wouldn’t the ‘add-std-dev’ or ‘add-view’ branches already exist since the collaborator would have already created one of them (and has been asked not to merge it)? I suspect the new branch should be either add-std-dev-tests or add-view-tests?
  • In Step 4: submitting a pull request, it would be more complete to add a couple of steps for leaving a comment and creating the pull request (although could add this as a single step)

Code Review: PR group work synchronisation

In one room, there were not enough people who had full versions of the repo; the people who did ended up paired with people who didn’t and had nothing to do.

It would work better if it was rejigged to be based on a single repo in the group
OR: Provide a working version of the repo up to this point for people to template off
People without working repos are furthest behind, so slowest to implement changes

Notes for external pilots

  • people still coming with not enough pre-requisite knowledge (ideally people who have been coding for at least 6-12 months, some familiarity with OO paradigm would be good)
  • there are loads of concepts mentioned and linked to for further reading - JSON, YAML, OO programming, etc. - a bit overbearing
  • common issues page - make sure everyone is aware of it
  • best to switch to ssh + key pairs for authN with GitHub
  • synchronising for team exercises from section 3 - harder to do if workshop runs over several weeks as people drop off and it is hard to organise people in the same teams
  • for time-saving purposes instructor-led mode of delivery is probably preferred
  • to save time and deliver the course in 4 half-days:
    • skip linting in section 1
    • bundle section 4 and 5 together and skip packaging code
  • share introductory slides

Add benefits of using preconditions in code

Consider adding something brief around using preconditions in code in the test material. Preconditions are a technique used when writing functions that is particularly useful to ensure data is (by some measures) correct and as expected before you actually do something with it.

The 'Why Should We test Invalid Input Data' callout could be modified to include it since it's directly related, or expanded into a separate subsection to include both aspects.
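A minimal sketch of what such an example might look like follows. The function daily_mean and its specific checks are hypothetical and chosen only to illustrate the technique; the course's own functions and error messages may differ.

```python
def daily_mean(readings):
    """Return the mean of one day's inflammation readings."""
    # Preconditions: fail early, with a clear message, before doing any work.
    if len(readings) == 0:
        raise ValueError("readings must not be empty")
    if any(value < 0 for value in readings):
        raise ValueError("inflammation values should not be negative")

    return sum(readings) / len(readings)


print(daily_mean([1, 2, 3]))

try:
    daily_mean([1, -2, 3])
except ValueError as error:
    print("Rejected:", error)
```

This ties in with testing invalid input: each precondition gives the tests a specific, documented failure to check for, rather than a confusing error from deeper in the calculation.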

Section 5: extra task on contributing.md

From Kamilla: I've added an extra task based on how things went for my group in the Summer pilot; have asked team to write a contributing.md file to explain how contributions will be handled (so we have a decision on branching strategy and PRs in advance of the sprint with instructions for the team to refer to).

Code testing: better formatting of tuples in the parameterisation section

At first glance, the array of tuples in the parameterisation section is difficult to parse as two arrays in each test case (it's easy to mistake them for four separate arrays). They should be rewritten with additional spacing for clarification - same for the solution to "Exercise: Write Parameterised Unit Tests"

@pytest.mark.parametrize(
    "test, expected",
    [
        ([ [0, 0], [0, 0], [0, 0] ], [0, 0]),
        ([ [1, 2], [3, 4], [5, 6] ], [3, 4]),
    ])

General: Repeated mentions of academia

The course mentions academia quite a lot - one of the attendees mentioned we can generalise it a bit more by removing or toning down those points. Plenty of people in even non-research roles will benefit from (and will be taking?) this course.
