ese's People

Contributors

anjsimmo, mahdibabaei

ese's Issues

Study the patterns of retraining the pipeline by looking at experiment logs

Would computational costs be reduced by using static analysis tools to detect particular frequent defects/issues?

Motivational example:

  • When executing the pipeline, there is often a mismatch of data shape. This is only detected at run time, often after considerable execution time (see the sketch after this list).
    • Do the computational costs saved justify the time costs/overheads involved in setting up the static analysis tools?
  • Can we map experiment logs to particular defects in the project?
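
A minimal sketch of the motivating defect, assuming a NumPy-based pipeline (function names and sizes are illustrative): the shape mismatch would normally surface only after the expensive step has run, while a cheap pre-flight check (the kind of invariant a static analysis tool could also verify from shape annotations) fails immediately.

```python
# Hypothetical pipeline: a shape mismatch that would normally surface only
# after an expensive preprocessing step, caught by a cheap up-front check.
import numpy as np

def expensive_preprocessing(X: np.ndarray) -> np.ndarray:
    # Stand-in for a long-running step (feature extraction, joins, ...).
    return X / np.linalg.norm(X, axis=1, keepdims=True)

def train(X: np.ndarray, w: np.ndarray) -> np.ndarray:
    # Fails at run time with a shape error if X.shape[1] != w.shape[0].
    return X @ w

def check_shapes(X: np.ndarray, w: np.ndarray) -> None:
    # The kind of invariant a static analyser (or pre-flight check) could
    # verify before any compute is spent.
    if X.shape[1] != w.shape[0]:
        raise ValueError(
            f"Shape mismatch: X has {X.shape[1]} features but w expects {w.shape[0]}"
        )

X = np.random.rand(10_000, 32)
w = np.random.rand(64)          # wrong size: should be 32

check_shapes(X, w)              # fails immediately, before expensive_preprocessing
train(expensive_preprocessing(X), w)
```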

Background / State of the Art:

  • Some environments (e.g. AWS) already have similar code-analysis infrastructure set up to detect potential issues

Use of experiment logs to measure reproducibility of data science experiments. (If multiple runs of an experiment are detected in the log, did they all result in the same outcome? See the sketch below.)
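
A minimal sketch of such a reproducibility check, assuming a hypothetical line-based log format of `run_id=<id> config=<hash> metric=<value>`: configurations with repeated runs are flagged when those runs disagree.

```python
# Group runs by configuration and flag configurations whose repeated runs
# did not reproduce the same outcome. The log format is illustrative.
from collections import defaultdict

def reproducibility_report(log_lines, tol=1e-9):
    runs = defaultdict(list)
    for line in log_lines:
        fields = dict(kv.split("=") for kv in line.split())
        runs[fields["config"]].append(float(fields["metric"]))
    return {
        config: (max(metrics) - min(metrics) <= tol)
        for config, metrics in runs.items()
        if len(metrics) > 1          # only configs with repeated runs
    }

log = [
    "run_id=1 config=a1b2 metric=0.91",
    "run_id=2 config=a1b2 metric=0.89",   # same config, different outcome
    "run_id=3 config=c3d4 metric=0.75",
    "run_id=4 config=c3d4 metric=0.75",
]
print(reproducibility_report(log))  # {'a1b2': False, 'c3d4': True}
```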

Group 1

Smells are an indication, not concrete; they are subjective and "romantic". Tie smells to productivity.

Do code smells impact review time?
Definition of code smells, and a catalogue.
Assumptions:
  • Developers know what code smells are
  • Developers care about them at review time
  • Developers disagree on their severity/importance

Evolution of code smells: which ones can be ignored?
Identify code smells first and then evaluate them.
Tie code smells to:
  • Reproducibility
  • Performance
  • *ilities - depend on stakeholders' definitions, hard to make concrete

Commented-out code is indicative of versioning in addition to version control (not just DS). Solved through education; tied to people's workstyle -> experimentation.
Reading code that contains commented-out code.
Does commented-out code impact the readability of the source code?

Code smells reflect the personal preferences of the people who coined these phrases; they are mere guidelines.

Small companies don’t care about code smells (experiential)

Code smells should be avoided (low impact and unreliable)

Data versioning tool -> large datasets, experiments, storage perspective.

Motivation: data storage is a problem for cloud service providers. There is redundancy between versions of the data, and you shouldn't be storing all the features.
Store code transformations rather than datasets
Efficient caching

Think about the scale of data storage.
Is the transformed data the main challenge with storing versions of the data? (See the sketch below.)
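
A minimal sketch of the "store code transformations rather than datasets" idea, with illustrative names: a dataset version is a base dataset plus an ordered chain of transformations, and materialisation is cached per chain so near-identical versions add no storage.

```python
# A dataset version = base data + ordered, named transformations.
# Versions share the base; materialisation is cached per transformation chain.
import hashlib

class DataVersion:
    _cache = {}  # chain key -> materialised data

    def __init__(self, base, transforms=()):
        self.base = base
        self.transforms = tuple(transforms)

    def then(self, name, fn):
        # New version = old version + one more transformation (no data copied).
        return DataVersion(self.base, self.transforms + ((name, fn),))

    def key(self):
        # Key on the transformation chain only (a real tool would also hash the base data).
        names = "|".join(name for name, _ in self.transforms)
        return hashlib.sha256(names.encode()).hexdigest()

    def materialise(self):
        if self.key() not in self._cache:
            data = self.base
            for _, fn in self.transforms:
                data = fn(data)
            self._cache[self.key()] = data
        return self._cache[self.key()]

raw = list(range(10))
v1 = DataVersion(raw).then("drop_odd", lambda d: [x for x in d if x % 2 == 0])
v2 = v1.then("scale", lambda d: [x * 100 for x in d])
print(v2.materialise())  # [0, 200, 400, 600, 800]; v1 stays cheap to rebuild
```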

Linting data, with diffs applied at the smell level -> in the context of ML, data smells and code smells are both impactful.

Identify smells that change the behaviour of ML. There is a need to define what an ML smell is -> this is different from code smells. Thus, ML smells are a) technical debt, b) actual defects, and c) concrete.

Guidelines for dealing with technical debt ignore commercial reality and focus on the ideal. This ties in with the context idea.

-> Case study of ML in small clients/companies (does related work exist for this?). Efficiency is vital. Tool support is key to realising the solutions in organisations.

Is static analysis sufficient for ML smells? Interactive environment: development phase, deployment phase.

CI/CD for ML (Thoughtworks)

Start at the upstream process at the data level rather than modelling.

Conclusions:
Code smells should be avoided (low impact and unreliable)
Focus on the messy upstream process at data collection rather than modelling
There is a need to define what an ML smell is
Does commented out code impact readability of the source code?
Can diff algorithms be used for data versioning?
Data quality issues should be easy to validate/verify -> checks should be as portable as the data itself
The barrier between data and code is still valid

Dependencies of ML projects

We could try to identify how dependencies affect the maintainability of an ML project.

Possible approaches:

  • Looking at one project - look at the history of the dependencies and relate it to the maintainability of the project.
  • Comparing multiple projects - see when developers are adding/removing dependencies.
  • Try to find a correlation between certain dependencies and maintainability.
  • Look at the history of the dependency requirements (e.g. changes to requirements.txt over time; see the sketch below).
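
A minimal mining sketch for the last approach, assuming a local git checkout and a pip-style requirements.txt (both placeholders): it walks the file's history and reports how many dependencies each revision declares.

```python
# Walk the git history of a project's requirements.txt and count the
# dependencies declared at each revision. Assumes git is installed locally.
import subprocess

def requirements_history(repo_path, req_file="requirements.txt"):
    log = subprocess.run(
        ["git", "-C", repo_path, "log", "--format=%H %ad", "--date=short", "--", req_file],
        capture_output=True, text=True, check=True,
    ).stdout.splitlines()
    history = []
    for line in log:
        commit, date = line.split(maxsplit=1)
        content = subprocess.run(
            ["git", "-C", repo_path, "show", f"{commit}:{req_file}"],
            capture_output=True, text=True,
        ).stdout
        deps = [l for l in content.splitlines() if l.strip() and not l.startswith("#")]
        history.append((date, commit[:8], len(deps)))
    return history

# for date, commit, n_deps in requirements_history("/path/to/ml-project"):
#     print(date, commit, n_deps)
```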

Group 2

Blue sky:

AI system (ML?) -> there's no data -> expert system (bunch of rules) -> expert (non-SE/DS) -> Excel

Excel:
  • Good for SMEs
  • Doesn't scale
  • ML is limited
  • Limited data types / integrations
  • Limited data validation

Problem: SME and SE have poor tooling to collaborate when building AI systems.

Extension: searching for information that affects the rules being implemented.

Use this for data collection at the start, with invariants for the data?

Have a means to validate data to identify correlations / mistakes in the data.

Consistency: run a data linter that checks for smells and highlight the smells to the SME (see the sketch below).
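
A minimal sketch of such a data linter, with an illustrative (not proposed) smell list: it flags constant columns, heavily missing columns, and duplicated rows in language an SME can act on.

```python
# Flag a few common data smells and report them in SME-friendly language.
# The smell list and thresholds are illustrative only.
import pandas as pd

def lint_data(df):
    findings = []
    for col in df.columns:
        if df[col].nunique(dropna=True) <= 1:
            findings.append(f"Column '{col}' has a single value; it carries no signal.")
        missing = df[col].isna().mean()
        if missing > 0.5:
            findings.append(f"Column '{col}' is {missing:.0%} missing.")
    n_dupes = int(df.duplicated().sum())
    if n_dupes:
        findings.append(f"{n_dupes} duplicated row(s); possible double data entry.")
    return findings

df = pd.DataFrame({
    "region": ["AU", "AU", "AU"],
    "cases": [10, None, None],
})
for finding in lint_data(df):
    print("-", finding)
```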

Context as the relationship between entities - scope context

Problem
Claim: multidisciplinary teams write code that is hard for the other discipline to understand
DSs come from different backgrounds
SEs have personal preferences for coding styles and development approaches. Where do you draw the line with quality?
Subjective research means it's hard to lock it down to a specific idea. It depends on who, what, and who it is for. Make it concrete.

Assumption: compute is done on a single resource (laptop/compute node) and the data is moved to the model. Federated learning inverts this: you move the model to the data.
How does DevOps change?
What are the things in SE that we can bring to ML?
What standards of SE change for ML? Outcomes change.

ML needs more defensive programming. Data is not in a single location. Rework the engineering pipeline.
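
A minimal sketch of what "more defensive programming" could look like at the model boundary, assuming a scikit-learn-style `model.predict` and illustrative feature ranges: inputs from remote or federated sources are validated before inference rather than trusted.

```python
# Defensive checks at the model boundary: validate schema and ranges of an
# incoming record before running inference. Feature names/ranges are illustrative.
EXPECTED_FEATURES = {"age": (0, 120), "income": (0, 1e7)}

def predict_defensively(model, record: dict) -> float:
    missing = EXPECTED_FEATURES.keys() - record.keys()
    if missing:
        raise ValueError(f"Missing features: {sorted(missing)}")
    for name, (lo, hi) in EXPECTED_FEATURES.items():
        value = record[name]
        if not isinstance(value, (int, float)) or not lo <= value <= hi:
            raise ValueError(f"Feature '{name}'={value!r} outside expected range [{lo}, {hi}]")
    # Assumes a scikit-learn-style estimator with a predict() method.
    return model.predict([[record[name] for name in EXPECTED_FEATURES]])[0]
```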

Conclusion:
Look at different domains, make a prediction, and then extract information from web sources as to why the crash occurred.

Portable data integrity/invariant checks

This idea was brought up in discussions #4 "Data quality issues should be easy to validate/verify" and #7 "Have a means to validate data to identify correlations / mistakes in the data"

Motivation:
Data issues often occur that in principle should be easy to detect. E.g., Google's data panel for COVID-19 deaths (which in turn was sourced from Wikipedia) was off by a factor of 10 for Australia, and the incorrect figure even found its way into some news articles. It should have been obvious that something was wrong from the sudden jump and the fact that the number of deaths at country level did not add up to the sum of deaths in the states and territories.

We think the reason these issues are common is that every company that uses data would need to reimplement such checks, which they don't have time for. What we need is a portable format for data integrity/invariant checks, so that sharing data validation checks is as easy as sharing the data itself. E.g. if one system implements an integrity check that the number of cases in a country should equal the sum of the number of cases in the states/territories of that country, there needs to be a portable way to share this check with other systems (see the sketch below).
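
A minimal sketch of what a portable check might look like, assuming a hypothetical declarative format (not an existing standard): the invariant is plain, serialisable data that can ship alongside the dataset, and a small interpreter applies it with pandas.

```python
# The invariant is declared as plain, serialisable data so it can be shared
# with the dataset; a small interpreter applies it. Format is hypothetical.
import pandas as pd

check = {
    "name": "country_total_equals_sum_of_states",
    "group_by": "country",
    "part_column": "state_deaths",
    "total_column": "country_deaths",
}

def run_check(df, check):
    grouped = df.groupby(check["group_by"])
    # The declared country-level total must equal the sum of its parts.
    return grouped[check["part_column"]].sum() == grouped[check["total_column"]].first()

df = pd.DataFrame({
    "country": ["AU", "AU", "AU"],
    "state": ["NSW", "VIC", "QLD"],
    "state_deaths": [50, 30, 10],
    "country_deaths": [900, 900, 900],   # off by a factor of 10
})
print(run_check(df, check))   # AU -> False: the invariant catches the error
```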

Specific problem:
While standards for representing data integrity checks already exist (e.g. SQL CHECK constraint), we need to better understand the practical barriers to reuse of data integrity checks and propose solutions. If successful, this research should have widespread practical impact in improving data quality and preventing misinformation.

Automating the annotation of data science techniques in machine learning source files and artifacts

Questions:

  • Can deep learning models assist in automatically annotating artifacts (e.g. using active learning)?
  • What human-machine interactions/interfaces are needed to optimally annotate artifacts?

Types of annotations:

  • Annotations to save time performing compliance checks (AI ethics, etc.)

Use cases:

  • Regulatory compliance (need to be able to explain the project to project managers / compliance / risk staff, who may not necessarily be trained in software engineering, but need assurance that the code follows AI ethics principles, e.g. fairness). Could annotations of techniques be used to automatically generate documentation/explanations that do not require reading the code? (See the sketch below.)
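
A minimal sketch of the annotation idea, with a hypothetical decorator and registry: techniques are declared where they are used and harvested into a plain-language report for compliance reviewers who will not read the code.

```python
# Hypothetical technique annotations harvested into a compliance report.
TECHNIQUE_REGISTRY = []

def technique(name, ethics_concerns=()):
    def decorator(fn):
        TECHNIQUE_REGISTRY.append(
            {"function": fn.__qualname__, "technique": name, "ethics": list(ethics_concerns)}
        )
        return fn
    return decorator

@technique("random oversampling", ethics_concerns=["fairness: may amplify labelling bias"])
def balance_classes(rows, labels):
    ...

@technique("mean imputation")
def impute_missing(rows):
    ...

def compliance_report():
    # Plain-language summary for reviewers who will not read the source code.
    for entry in TECHNIQUE_REGISTRY:
        concerns = "; ".join(entry["ethics"]) or "none recorded"
        print(f"{entry['function']}: uses {entry['technique']} (ethics notes: {concerns})")

compliance_report()
```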

How smells affect the system

Smells do affect systems (cite)

Identify new code smells that could affect MLOps

Approach:

  1. Analyse the code change history (see the mining sketch at the end of this issue)
  2. Build a taxonomy of important code smells based on the history. Rank it and use interviews to evaluate it

Objective:

  1. Saving cost
  2. Saving time to deploy
  3. Avoid faults in production
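
A minimal mining sketch for step 1 of the approach, assuming a local git checkout (the path is a placeholder): it counts how often each file changes, a cheap proxy for locating change-prone code from which to seed the smell taxonomy.

```python
# Count per-file change frequency from the git history as a starting point
# for identifying change-prone (potentially smelly) MLOps code.
import subprocess
from collections import Counter

def change_frequency(repo_path):
    log = subprocess.run(
        ["git", "-C", repo_path, "log", "--name-only", "--format="],
        capture_output=True, text=True, check=True,
    ).stdout
    return Counter(line for line in log.splitlines() if line.strip())

# files = change_frequency("/path/to/ml-project")
# for path, n_changes in files.most_common(20):
#     print(n_changes, path)
```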
