Giter VIP home page Giter VIP logo

aind-data-transformation's Introduction

aind-data-transformation

License Code Style semantic-release: angular Interrogate Coverage Python

Usage

Please import this package and extend the abstract base class to define a new transformation job

from aind_data_transformation.core import (
    BasicJobSettings,
    GenericEtl,
    JobResponse,
)

# An example JobSettings
class NewTransformJobSettings(BasicJobSettings):
  # Add extra fields needed, for example, a random seed
  random_seed: Optional[int] = 0

# An example EtlJob
class NewTransformJob(GenericEtl[NewTransformJobSettings]):

    # This method needs to be defined
    def run_job(self) -> JobResponse:
        """
        Main public method to run the transformation job
        Returns
        -------
        JobResponse
          Information about the job that can be used for metadata downstream.

        """
        job_start_time = datetime.now()
        # Do something here
        job_end_time = datetime.now()
        return JobResponse(
            status_code=200,
            message=f"Job finished in: {job_end_time-job_start_time}",
            data=None,
        )

Contributing

The development dependencies can be installed with

pip install -e .[dev]

Adding a new transformation job

Any new job needs a settings class that inherits the BasicJobSettings class. This requires the fields input_source and output_directory and makes it so that the env vars have the TRANSFORMATION_JOB prefix.

Any new job needs to inherit the GenericEtl class. This requires that the main public method to execute is called run_job and returns a JobResponse.

Linters and testing

There are several libraries used to run linters, check documentation, and run tests.

  • Please test your changes using the coverage library, which will run the tests and log a coverage report:
coverage run -m unittest discover && coverage report
  • Use interrogate to check that modules, methods, etc. have been documented thoroughly:
interrogate .
  • Use flake8 to check that code is up to standards (no unused imports, etc.):
flake8 .
  • Use black to automatically format the code into PEP standards:
black .
  • Use isort to automatically sort import statements:
isort .

Pull requests

For internal members, please create a branch. For external members, please fork the repository and open a pull request from the fork. We'll primarily use Angular style for commit messages. Roughly, they should follow the pattern:

<type>(<scope>): <short summary>

where scope (optional) describes the packages affected by the code changes and type (mandatory) is one of:

  • build: Changes that affect build tools or external dependencies (example scopes: pyproject.toml, setup.py)
  • ci: Changes to our CI configuration files and scripts (examples: .github/workflows/ci.yml)
  • docs: Documentation only changes
  • feat: A new feature
  • fix: A bugfix
  • perf: A code change that improves performance
  • refactor: A code change that neither fixes a bug nor adds a feature
  • test: Adding missing tests or correcting existing tests

Semantic Release

The table below, from semantic release, shows which commit message gets you which release type when semantic-release runs (using the default configuration):

Commit message Release type
fix(pencil): stop graphite breaking when too much pressure applied Patch Fix Release, Default release
feat(pencil): add 'graphiteWidth' option Minor Feature Release
perf(pencil): remove graphiteWidth option

BREAKING CHANGE: The graphiteWidth option has been removed.
The default graphite width of 10mm is always used for performance reasons.
Major Breaking Release
(Note that the BREAKING CHANGE: token must be in the footer of the commit)

aind-data-transformation's People

Contributors

github-actions[bot] avatar jtyoung84 avatar

Watchers

 avatar  avatar  avatar  avatar  avatar

aind-data-transformation's Issues

Publish singularity container to github registry

User story

As a user, I want to pull the singularity image from a remote registry, so I can use it where I need to.

Acceptance criteria

  • When a PR is merged into main, then an updated singularity image will published as a github package

Sprint Ready Checklist

  • 1. Acceptance criteria defined
  • 2. Team understands acceptance criteria
  • 3. Team has defined solution / steps to satisfy acceptance criteria
  • 4. Acceptance criteria is verifiable / testable
  • 5. External / 3rd Party dependencies identified
  • 6. Ticket is prioritized and sized

Notes

Add any helpful notes here.

Pull ephys package out and publish to pypi

Is your feature request related to a problem? Please describe.
We'd like to use this as the abstract class that users will import into their own repos

Describe the solution you'd like
A clear and concise description of what you want to happen.

Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.

Additional context
Add any other context or screenshots about the feature request here.

Port ephys and basic scripts from aind-data-transfer

User story

As a user, I want to run data transformations independently, so I can more easily manage this step. For example, using the necessary hpc resources for this step. It will also make it easier to test and deploy if this is isolated from the other parts of the data workflow.

Acceptance criteria

  • When a user builds this project, a .sif container will be copied over to a folder on the VAST drive
  • A user can log into the hpc login node and run the compress data job on a directory (run compression and basic data transformations) that will save the data to a folder in the staging directory.
  • The model on how to define a job should be consistent and able to be imported in other repos.

Sprint Ready Checklist

  • 1. Acceptance criteria defined
  • 2. Team understands acceptance criteria
  • 3. Team has defined solution / steps to satisfy acceptance criteria
  • 4. Acceptance criteria is verifiable / testable
  • 5. External / 3rd Party dependencies identified
  • 6. Ticket is prioritized and sized

Notes

Add any helpful notes here.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.