g-node / gin-proc
License: BSD 3-Clause "New" or "Revised" License
As of now, the micro-service runs on my host machine, so access to SSH keys isn't an issue. Turning the service into a Docker image, however, would require us to mount external volumes to access the user's SSH keys, or to use something like Docker secrets for the same purpose.
We can design a debug mode for the Flask server and enable it with an environment variable, something like DEBUG=TRUE, set by the user when launching the gin-proc container.
Or we could:
This may not be a priority right now, but I think it's a good professional principle to follow. It should make it easier for users to debug problems with their server later.
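A minimal sketch of how the debug switch could be read, assuming the variable is called DEBUG as suggested above. The helper name `debug_enabled` and the set of accepted truthy spellings are illustrative assumptions, not part of the existing service.

```python
import os

# Spellings treated as "on"; this list is an assumption for illustration.
TRUTHY = {"1", "true", "yes", "on"}


def debug_enabled(environ=os.environ):
    """Return True when the DEBUG variable holds a truthy value."""
    return environ.get("DEBUG", "").strip().lower() in TRUTHY


# With Flask, the result would typically be passed on as app.run(debug=debug_enabled()).
print("debug mode:", debug_enabled())
```

Accepting a mapping as an argument keeps the helper testable without touching the real process environment.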
Currently I'm using a workaround: reading the default saved template files of CI configurations and adding the required lines after detecting hash break-points, e.g. #add-annex-files.
It does the job for now, but it would be cleaner and more hackable to do this with something like PyYAML.
This would also allow us to read the YAML configs at a later stage and replace only the values that we want to, instead of writing and replacing the entire config as we do currently.
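The PyYAML approach could look roughly like the sketch below. It assumes PyYAML 5.1 or later is installed (for the `sort_keys` argument); the template contents, the step name `restore`, the image name, and the function name `add_annex_files` are all hypothetical and only stand in for the real saved templates.

```python
import shlex

import yaml  # PyYAML, a third-party package

# Hypothetical template; the real saved templates may look different.
TEMPLATE = """\
kind: pipeline
name: gin-proc
steps:
  - name: restore
    image: example/annex
    commands:
      - git annex init
"""


def add_annex_files(config_text, files):
    """Parse the config, append a `git annex get` command to one step, dump it again."""
    config = yaml.safe_load(config_text)
    for step in config["steps"]:
        if step["name"] == "restore":
            quoted = " ".join(shlex.quote(name) for name in files)
            step["commands"].append("git annex get " + quoted)
    # sort_keys=False keeps the key order of the original file.
    return yaml.safe_dump(config, sort_keys=False)


print(add_annex_files(TEMPLATE, ["data/a.dat"]))
```

Because the configuration is handled as a parsed structure, a single value can be changed without hash markers in the template.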
Since we need to override the default clone step to clone via SSH and download annexed data, the clone step should be handled by a container that performs all the necessary steps. We could host this on dockerhub and make it GIN specific. Ideally, the clone step in the drone.yml configuration for all repositories should be as follows:
  - name: clone
    image: docker:gnode/gin-proc-clone
    environment:
      SSH_KEY:
        from_secret: DRONE_PRIVATE_SSH_KEY
The container will be built with git and git-annex, and the entrypoint should be a script that uses the default drone environment to clone the repository (e.g., git clone $DRONE_REMOTE_URL for the initial clone step).
The gin-proc web service could also add extra fields for specifying which annexed file content to download (if not everything). The clone plugin would then use a predefined env variable (that the gin-proc web service always sets) to determine which annexed files to download.
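As a rough illustration of the entrypoint logic described above, the sketch below builds the commands such a clone container might run. DRONE_REMOTE_URL comes from the text; the variable name `GIN_ANNEX_FILES` and the function name `clone_commands` are hypothetical, and SSH key setup is deliberately left out.

```python
import shlex


def clone_commands(environ):
    """Build the git and git-annex invocations for the clone step.

    GIN_ANNEX_FILES is an assumed variable name; an empty or missing value
    means "download all annexed content".
    """
    remote = environ["DRONE_REMOTE_URL"]
    commands = [["git", "clone", remote, "."]]
    # shlex.split honours quoting, so names with spaces survive intact.
    wanted = shlex.split(environ.get("GIN_ANNEX_FILES", ""))
    commands.append(["git", "annex", "get"] + (wanted or ["."]))
    return commands


example = {
    "DRONE_REMOTE_URL": "git@gin.example.org:/user/repo.git",
    "GIN_ANNEX_FILES": "'raw data/a.dat' b.dat",
}
for command in clone_commands(example):
    print(command)
```

Returning argument lists rather than shell strings means each command can be handed to subprocess without a shell.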
The SnakeFile Path (location of the snakemake file) should be an optional input.
When the user specifies a directory, the drone.yml should include a line for switching to the snakemake directory before running the build.
If no path is specified, the root of the repo should be assumed, so no directory change should be performed.
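The optional-path rule could be sketched as follows. The function name `build_commands` and the bare `snakemake` invocation are assumptions for illustration; the real build step may pass further arguments.

```python
import posixpath
import shlex


def build_commands(snakefile_dir=None):
    """Return the build commands, adding a `cd` only when a directory is given."""
    commands = []
    if snakefile_dir:
        normalized = posixpath.normpath(snakefile_dir)
        # "." is the repository root, so no directory change is needed.
        if normalized != ".":
            commands.append("cd " + shlex.quote(normalized))
    commands.append("snakemake")
    return commands


print(build_commands())                      # no path: run from the repository root
print(build_commands("analysis/pipeline/"))  # path given: change directory first
```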
A lot of functions are wrapped in try .. except blocks with a catch-all Exception handler. In some cases, the block is doing things that aren't likely to cause exceptions (like appending items to a list). We should have more specific exception handlers and only have them where necessary.
When trying to update an existing drone.yml file, if it can't be read for some reason, just overwrite it with the new data.
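One way to combine both points, narrow exception handlers and overwriting an unreadable file, is sketched below. The function names and the `merge` callback are hypothetical; the sketch only catches the errors that opening and decoding a file can realistically raise.

```python
def read_existing_config(path):
    """Return the file's text, or None if it is missing or unreadable."""
    try:
        with open(path, encoding="utf-8") as handle:
            return handle.read()
    except (OSError, UnicodeDecodeError):
        # Missing file, permission problem, or undecodable bytes: treat as absent.
        return None


def write_config(path, new_text, merge):
    """Merge with the existing config when it can be read, else overwrite it."""
    existing = read_existing_config(path)
    text = new_text if existing is None else merge(existing, new_text)
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)
    return text
```

Errors from the final write are intentionally not caught here, so a failure to save the configuration stays visible to the caller.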
I intend to add automatic checks on pull requests for these, but for now, here's a list of some of the code style issues that need to be addressed:
os.system() should never be used. All instances should be replaced by subprocess.call() or check_output().
os.path.join().
"""string""" but a # comment
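A small sketch of the first two style points, using only the standard library. The helper names `run` and `config_path` are illustrative, and the Python interpreter stands in for git so that the example runs on any machine.

```python
import os
import subprocess
import sys


def run(args):
    """Run a command without a shell; raises CalledProcessError on a non-zero exit."""
    return subprocess.check_output(args, text=True)


def config_path(repo_dir):
    # Build the path portably instead of concatenating strings.
    return os.path.join(repo_dir, "drone.yml")


# Before: os.system("git -C " + repo_dir + " status")
# After:  run(["git", "-C", repo_dir, "status"])
print(run([sys.executable, "-c", "print('ok')"]).strip())
print(config_path("repo"))
```

Passing the arguments as a list avoids shell quoting problems, and the raised exception makes failures explicit instead of hiding them in an ignored return code.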
In some cases the service assumes that the logged-in user is also the owner of the repository and uses the username of the logged-in user to construct the repository path for API calls. This isn't necessarily true. Users can enable builds and write configurations for collaborative repositories (either through sharing or as part of an organisation).
We should review all cases where the current user's username is used to infer the repository full name.
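A sketch of deriving the full name from the repository itself rather than from the session user. It assumes the repository object returned by the GIN/GOGS API carries `full_name`, `name`, and `owner.username` fields; those field names should be checked against the actual API responses.

```python
def repository_full_name(repo):
    """Return "<owner>/<name>" from repository metadata, never from the session user."""
    if repo.get("full_name"):
        return repo["full_name"]
    return "{}/{}".format(repo["owner"]["username"], repo["name"])


# A repository owned by an organisation, viewed by a collaborator.
shared = {"name": "recordings", "owner": {"username": "neurolab"}}
print(repository_full_name(shared))
```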
The name/description of the key that the gin-proc service installs for the user should be gin-proc, or something similar, to make its purpose clear to the user.
Line 32 in 9690416
This is an alternative idea to the current workflow of pushing output to a gin-proc branch of the original repository.
One of the original ideas we had for serving build output to the user was having a data store that would serve archives. The output would be privately accessible, either using credentials or by secret URLs available only to the user.
This led me to the idea that we could use GIN repositories as data stores. The workflow would be:
gin-proc/<user>-<repository> (the data repository). The repository name is guaranteed to be unique, since the repositories' unique names are <user>/<repository>.
For now, we should move forward with the branch-based method, since it's more straightforward. At first I thought this idea would also require GOGS changes, since there was no API call to add collaborators, but that's available now.
Feel free to use this issue for discussions on this idea and any alternatives.
Files listed for git annex get should be individually quoted (like they are for commit & push) and there should be no trailing space.
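A minimal sketch of building the command with individually quoted names and no trailing space. The function name `annex_get_command` is hypothetical; `shlex.quote` is from the standard library.

```python
import shlex


def annex_get_command(files):
    """Join the command and the quoted file names with single spaces."""
    parts = ["git", "annex", "get"] + [shlex.quote(name) for name in files]
    # str.join never leaves a trailing separator, even for an empty file list.
    return " ".join(parts)


print(annex_get_command(["raw data/a.dat", "b.dat"]))
```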
Lines 70 to 75 in 606541f
A nice feature would be to support different types of pipelines that make it easier for users to set up common kinds of pipelines. For instance, we could start by offering two types of builds:
snakefile is located (defaults to root of the repo) and automatically:
cd-ing to the specified directory and runs the snakemake pipeline.
More advanced features can then be added to the second option (snakemake) for caching intermediate steps/files, figuring out dependencies, etc.
In the first option, all the "smart" features are disabled so the user can just run any script they want without caching or dependency management. It would all be up to the user to figure out.
As it stands the project consists of three services:
We should have a docker-compose.yml file that sets up two containers to work together, one for the frontend and backend web services and another for the Drone container.
The backend address (for auth and api routes) in the frontend (see lines below) should be configurable. These should be set to an externally accessible address that the user's browser, running the frontend, can access in order to log in.
gin-proc/front-end/nuxt.config.js
Line 49 in 9690416
gin-proc/front-end/pages/index.vue
Line 163 in 9690416
Use drone.yml to define which files from the CI build output should be pushed after a build is done.
git annex init)
Currently, if the service finds a key with the same name as the one it uses in the user's GIN configuration, it asks the user to delete it. If we use a key name specific to the service (gin-proc), the service can detect it and overwrite it instead of asking the user to do it.
When a user sets up a gin-proc build configuration via the web frontend, the web backend sets up a key pair for itself to push the drone.yml. It should also set up the Drone service for the user, by adding the private key into the secrets for the build.
When setting up the configuration for the user, the SSH key pair should be generated and set up automatically. The public part should be added to the user's profile (via the GIN API /api/v1/user/keys) and the private part should be stored on the GIN Proc server, accessible by the service. All clone steps can then reference the private key via a plugin for external secrets.
The details of the secrets plugin need to be discussed further.
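The key setup could be sketched as below. The `ssh-keygen` flags are standard OpenSSH options; the payload fields `title` and `key` are an assumption about what /api/v1/user/keys expects and should be verified against the GIN API. The function names are hypothetical, and nothing here covers the secrets plugin, which is still open.

```python
def keygen_command(key_path, comment="gin-proc"):
    """Arguments for ssh-keygen: 4096-bit RSA key, empty passphrase, written to key_path."""
    return ["ssh-keygen", "-t", "rsa", "-b", "4096",
            "-N", "", "-C", comment, "-f", key_path]


def public_key_payload(public_key_text, title="gin-proc"):
    """Body for the request that registers the public key (field names assumed)."""
    return {"title": title, "key": public_key_text.strip()}


print(keygen_command("/tmp/gin-proc-key"))
print(public_key_payload("ssh-rsa AAAA... gin-proc\n"))
```

The command list can be passed to subprocess.check_call; ssh-keygen then writes the private key to key_path and the public key to key_path with a .pub suffix.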
A web frontend that guides users to set up their builds. It should give users the ability to configure the most important options (input files, scripts/pipelines to run, output files to be pushed back) and generate a drone.yml (or update an existing one) for the builds.
Line 63 in 606541f
This will also download old build files in the working directory, which we don't want.
Use git annex copy --to=origin <filenames> instead.
Currently, the entire pipeline runs in a single step. We need to allow users to add their own intermediate pipeline steps, and for that all steps need access to a shared volume.
Line 86 in 606541f
This is unnecessary since we're committing specific files by name for the push after the workflow is complete.
Line 49 in 606541f
This will fail the second time when the branch already exists.
What happens with the build if a step in the drone.yml produces an error?
It should probably handle errors differently depending on the build step: setup, processing, output handling, etc. For example, if a user specifies some files to be pushed back on success but some of the files don't exist, it should probably warn that some of the specified files don't exist but it shouldn't fail completely (or at least, the files that do exist should be pushed).
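For the output-handling case, a possible sketch: split the requested files into those that exist and those that don't, warn about the latter, and push only the former. The function name `existing_outputs` and the logger name are hypothetical.

```python
import logging
import os

log = logging.getLogger("gin-proc")


def existing_outputs(requested, workdir="."):
    """Split requested output files into (present, missing) and warn about the latter."""
    present, missing = [], []
    for name in requested:
        found = os.path.exists(os.path.join(workdir, name))
        (present if found else missing).append(name)
    if missing:
        # A warning rather than an error: the files that do exist are still pushed.
        log.warning("skipping missing output files: %s", ", ".join(missing))
    return present, missing
```

The caller can then decide per build step whether an empty `present` list should fail the build.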
Full web service should meet the requirements specified in Issue #8. This issue describes what we would need from a prototype of the service:
It should ask for user input and generate a valid drone.yml file with the following information:
git annex get ... after git clone is complete. Empty value should imply all files.
git add ... (annex filtering could be done using a default config) followed by git push.
The generated drone.yml doesn't include the user specified files, but instead just the numbers 1, 2, etc.
As of now, just git push is allowed for storing CI outputs.
Line 77 in 606541f
This line should be removed.
It would be nice if the guide could read any existing drone.yml and populate the fields with existing values for the user to edit.
As of now, every pipeline runs from scratch in a separate container. This can be avoided by caching the intermediate temporary files created during a build, which should eventually speed up build jobs.
The current webhook has expired, and a fresh webhook needs to be generated before pushing the service to production.
We need user documentation that explains how users should interact with the gin-proc interface, what it does, and what they can expect.
Developer/maintainer documentation that explains the program flow, API endpoints for gin-proc backend and how front-end, back-end, and GIN/GOGS interact at each step.
The web frontend should only create (or edit) the drone.yml for the user and nothing else. If the user needs to run a snakemake pipeline they will have their own snakemake file and it's up to them to define the processing steps required to run it. The gin-proc service should not create or edit a user's snakemake configuration.
In the testing environment, I'm authenticating the dev user with a personal access token created manually. Eventually, we'll have to authenticate the user using their GIN username and password, just the way Drone does it.
When the backend service clones for the first time, it will be prompted to accept the SSH host key. Fetching the key should be part of the setup process.
See ssh-keyscan.
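A hedged sketch of fetching the host key during setup. The function names and the injectable `run` parameter are assumptions made for testability; `ssh-keyscan -p` is a standard OpenSSH option. Keys fetched this way are not verified, so the fingerprint should be compared against a trusted source before relying on it.

```python
import subprocess


def keyscan_command(host, port=22):
    """Arguments for ssh-keyscan against the given host and port."""
    return ["ssh-keyscan", "-p", str(port), host]


def trust_host(host, known_hosts_path, port=22, run=subprocess.check_output):
    """Fetch the host's public keys and append them to a known_hosts file."""
    keys = run(keyscan_command(host, port), text=True)
    with open(known_hosts_path, "a", encoding="utf-8") as handle:
        handle.write(keys)
    return keys
```

With the key stored in known_hosts before the first clone, the backend is never left waiting on an interactive prompt.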