clinical-genomics / chanjo2
Persistent coverage analysis tool using the d4 format
Home Page: https://clinical-genomics.github.io/chanjo2/
I have the feeling that the metadata object is called/imported from the wrong place. The bug is present in current main.
Especially when it comes to evaluating the coverage
The current app is using lib version 1.4.31, but 2.0 has great new features that will make the code more readable and operations faster:
https://docs.sqlalchemy.org/en/20/changelog/migration_20.html
When calculating coverage completeness, the responses get slow.
Startup with an env file currently works only if the env file is used in Docker to mock env vars. Modify the main file so that it also accepts and uses an env file when the app is started from a conda env.
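A minimal sketch of how an env file could be loaded at startup outside Docker, using only the standard library. The helper name `load_env_file` and the simple `KEY=VALUE` format are assumptions made for illustration, not existing chanjo2 code.

```python
import os
from pathlib import Path
from typing import Dict


def load_env_file(path: str, override: bool = False) -> Dict[str, str]:
    """Export KEY=VALUE pairs from a file to os.environ.

    Hypothetical helper, not part of chanjo2. Blank lines, comment lines
    starting with '#', and lines without '=' are skipped. Returns the
    effective value of every key found in the file.
    """
    effective: Dict[str, str] = {}
    env_path = Path(path)
    if not env_path.is_file():
        return effective  # no file: rely on the existing environment
    for raw_line in env_path.read_text().splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip("'\"")  # drop optional surrounding quotes
        # Variables already set in the environment win unless override=True
        if override or key not in os.environ:
            os.environ[key] = value
        effective[key] = os.environ[key]
    return effective
```

Calling something like `load_env_file(".env")` before the database URL is built would let the same file work for Docker and for a conda env, while variables already exported in the shell keep precedence.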
There is no way to test the endpoints unless we mock an environment that has the availability of a database. This is because the database connection and tables analysis is a prerequisite for the app to start
When testing the app with a real MySQL database, I get the following error:
sqlalchemy.exc.CompileError: (in table 'samples', column 'coverage_file_path'): VARCHAR requires a length on dialect mysql
The demo is working because it's using another SQL dialect (sqlite)
(I don't know if this issue should be reported in the chanjo repository instead, feel free to redirect)
In the Chanjo Report in Scout, it would be nice to have an option (perhaps a tick box) to narrow down the information about fully/incompletely covered transcripts so only information on MANE Select (and perhaps also Plus Clinical) transcripts is shown when requested.
For instance having a d4 file and a bed file with the intervals.
Chanjo2 can also be used in projects not related to Scout at all, for instance as a container that can be run to gather quick coverage data when running a pipeline.
Another useful application for non-scout users: run it as a container and provide a d4 file and a list of genomic coordinates (doesn't require a populated database at all).
After introducing the logs, I've realised that the demo app is launched twice:
Basically it's creating the tables twice and printing "Running a demo instance of Chanjo2" twice. This is not unexpected, but annoying. It's caused by gunicorn running with 2 workers.
We could solve this in 2 ways:
uvicorn src.chanjo2.main:app, which is also easier. I would opt for the second option. We just need to change a line in the README file. I'll fix!
Or something like this, because the db file now is persistent, and the file stays there when you stop the app
Perhaps we could wait until it's functional before doing so, but since it will be used by other centers, I think the code of this repo should be public.
Create a skeleton of the app that describes how the structure should be.
It will raise an error when switching to SQLAlchemy 2
/Users/chiararasi/Documents/work/GITs/chanjo2/src/chanjo2/dbutil.py:32: MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to "sqlalchemy<2.0". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. (Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)
Base = declarative_base()
We could provide an example with real data by following this tutorial. There are also real files that can be used!
It should check that the provided URL points to the resource.
Since the analysis is slower, the resource should have an index (check that it exists as well)
The Docker image of the app is currently 1.31 GB in size, and it'd be nice to trim it down a bit.
Will be useful for debugging and in development!
Just like in chanjo
Separate them from the intervals code
Discussion might be needed with Scout team and potentially some Scout customers.
Are all important metrics present in current report?
Do we need an additional microservice for chanjo-report or should chanjo2 have a report functionality?
Something simple to test the python d4 libs with. Can be also used on the fly if it's quick enough to return a response with a whole genome d4.
Warning: The save-state command is deprecated and will be disabled soon. Please upgrade to using Environment Files. For more information see: https://github.blog/changelog/2022-10-11-github-actions-deprecating-save-state-and-set-output-commands/
Docker info
Buildx version
Warning: The set-output command is deprecated and will be disabled soon. Please upgrade to using Environment Files. For more information see: https://github.blog/changelog/2022-10-11-github-actions-deprecating-save-state-and-set-output-commands/
This happens in a database pre-populated with data. If the database already contains genes AND transcripts, you get an error: the transcripts table contains a reference to the genes table, so the genes can't be removed.
What to do:
When the command to update genes is launched, any old transcripts and exons data should be dropped as well, which makes sense!
When region definition files get added to the database, they should be validated.
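The ordering problem can be reproduced with the standard library `sqlite3` module. This is only a sketch: the three-table schema and the `replace_genes` helper are simplified assumptions for illustration, not chanjo2's actual models or commands.

```python
import sqlite3


def replace_genes(conn: sqlite3.Connection, new_genes: list) -> None:
    """Hypothetical helper: drop dependent rows first, then replace genes."""
    with conn:  # single transaction: commit on success, roll back on error
        conn.execute("DELETE FROM exons")        # exons reference transcripts
        conn.execute("DELETE FROM transcripts")  # transcripts reference genes
        conn.execute("DELETE FROM genes")
        conn.executemany("INSERT INTO genes (id, symbol) VALUES (?, ?)", new_genes)


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript(
    """
    CREATE TABLE genes (id INTEGER PRIMARY KEY, symbol TEXT);
    CREATE TABLE transcripts (
        id INTEGER PRIMARY KEY,
        gene_id INTEGER NOT NULL REFERENCES genes(id)
    );
    CREATE TABLE exons (
        id INTEGER PRIMARY KEY,
        transcript_id INTEGER NOT NULL REFERENCES transcripts(id)
    );
    INSERT INTO genes VALUES (1, 'OLD1');
    INSERT INTO transcripts VALUES (10, 1);
    INSERT INTO exons VALUES (100, 10);
    """
)

try:
    # Fails because a transcript still references gene 1
    conn.execute("DELETE FROM genes")
except sqlite3.IntegrityError as error:
    conn.rollback()
    print(f"Cannot delete genes alone: {error}")

# Deleting exons and transcripts first makes the gene update succeed
replace_genes(conn, [(2, "NEW1")])
```

Running the three deletes in one transaction means a failed update leaves the old genes, transcripts and exons in place instead of a half-emptied database.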
It is not yet decided what format these files should be in. bed is probably the way to go; csv could be an alternative.
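If bed ends up being the chosen format, validation could look roughly like the sketch below. The function names are hypothetical, and requiring end > start is an assumption made here for coverage intervals (the BED format itself also permits zero-length features).

```python
from typing import Iterable, List, Tuple

Interval = Tuple[str, int, int]


def validate_bed_line(line: str, line_number: int) -> Interval:
    """Return (chromosome, start, end) or raise ValueError naming the line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        raise ValueError(f"line {line_number}: expected at least 3 tab-separated fields")
    chromosome, start_text, end_text = fields[0], fields[1], fields[2]
    if not chromosome.strip():
        raise ValueError(f"line {line_number}: empty chromosome name")
    try:
        start, end = int(start_text), int(end_text)
    except ValueError:
        raise ValueError(f"line {line_number}: start and end must be integers") from None
    # BED coordinates are 0-based with an exclusive end
    if start < 0 or end <= start:
        raise ValueError(f"line {line_number}: invalid interval {start}-{end}")
    return chromosome, start, end


def validate_bed(lines: Iterable[str]) -> List[Interval]:
    """Validate every data line, skipping blank, comment and header lines."""
    intervals: List[Interval] = []
    for number, line in enumerate(lines, start=1):
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        intervals.append(validate_bed_line(line, number))
    return intervals
```

Because `validate_bed` accepts any iterable of lines, it can be fed an open file handle directly, and the first malformed line is reported with its line number.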
So far we support D4s with only one track
So far one can query only one interval or a chromosome
This repo should have a coverage > 95% now but it is still showing 64% like some weeks ago
This code will also be used when extracting the intervals to calculate the coverage on the d4 files
It shouldn't be inside the endpoints methods
Display the error instead
Would be nice to have the basics on README file and a more detailed documentation somewhere else
Test files in *.d4 format generated
To do:
Create dockerfile for local development and staging
Add one more parameter to the query so that user can specify a list of samples
Create an endpoint that loads the intervals for each single gene (the entire gene). There will be a similar thing for transcripts and exons as well.
It should accept a genome build.
Should download and parse the genes file and create one entry for it in the intervals table with the following cols:
Additionally it should create and link tags for the gene above so it's searchable using the following parameters:
It's required by the pyd4 package that otherwise doesn't install
Hi @Vince-janv and @ramprasadn. I'd like to revive this project and I'm trying to launch the application.
Using docker-compose (docker-compose up) doesn't work. I get the following error:
(chanjo2) chiararasi@ChiaraRMBP:~/Documents/work/GITs/chanjo2$ docker-compose up
WARNING: The D4DB_NAME variable is not set. Defaulting to a blank string.
WARNING: The D4DB_USER_PASSWORD variable is not set. Defaulting to a blank string.
WARNING: The MYSQL_HOST_PORT variable is not set. Defaulting to a blank string.
WARNING: The MYSQL_CONTAINER_PORT variable is not set. Defaulting to a blank string.
WARNING: The DATA_VOLUME variable is not set. Defaulting to a blank string.
WARNING: The D4DB_USER_NAME variable is not set. Defaulting to a blank string.
WARNING: The CHANJO_HOST_PORT variable is not set. Defaulting to a blank string.
WARNING: The CHANJO_CONTAINER_PORT variable is not set. Defaulting to a blank string.
ERROR: The Compose file './docker-compose.yml' is invalid because:
services.chanjo2-demo.ports contains an invalid type, it should be a number, or an object
services.d4database.ports contains an invalid type, it should be a number, or an object
(chanjo2) chiararasi@ChiaraRMBP:~/Documents/work/GITs/chanjo2$
Same when I invoke the gunicorn command (I guess?): gunicorn --config gunicorn.conf.py src.chanjo2.main:app
File "/Users/chiararasi/Documents/work/GITs/chanjo2/src/chanjo2/main.py", line 9, in <module>
from chanjo2.dependencies import engine, get_session
File "/Users/chiararasi/Documents/work/GITs/chanjo2/src/chanjo2/dependencies.py", line 11, in <module>
engine = create_engine(mysql_url, echo=True)
File "/Users/chiararasi/miniconda3/envs/chanjo2/lib/python3.8/site-packages/sqlmodel/engine/create.py", line 139, in create_engine
return _create_engine(url, **current_kwargs)
File "<string>", line 2, in create_engine
File "/Users/chiararasi/miniconda3/envs/chanjo2/lib/python3.8/site-packages/sqlalchemy/util/deprecations.py", line 309, in warned
return fn(*args, **kwargs)
File "/Users/chiararasi/miniconda3/envs/chanjo2/lib/python3.8/site-packages/sqlalchemy/engine/create.py", line 530, in create_engine
u = _url.make_url(url)
File "/Users/chiararasi/miniconda3/envs/chanjo2/lib/python3.8/site-packages/sqlalchemy/engine/url.py", line 731, in make_url
return _parse_rfc1738_args(name_or_url)
File "/Users/chiararasi/miniconda3/envs/chanjo2/lib/python3.8/site-packages/sqlalchemy/engine/url.py", line 787, in _parse_rfc1738_args
components["port"] = int(components["port"])
ValueError: invalid literal for int() with base 10: 'None'
[2023-01-11 10:34:06 +0100] [49258] [INFO] Worker exiting (pid: 49258)
[2023-01-11 10:34:06 +0100] [49255] [WARNING] Worker with pid 49258 was terminated due to signal 15
[2023-01-11 10:34:06 +0100] [49255] [INFO] Shutting down: Master
[2023-01-11 10:34:06 +0100] [49255] [INFO] Reason: Worker failed to boot.
I see where the error is in both cases. If you don't mind I'd like to write a basic fix and some instructions to run the app in README?
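For the gunicorn case, the traceback suggests the connection URL was built while the port variable was unset, so the literal string 'None' reached the URL parser. A basic fix could validate the settings first, as in the sketch below. The variable-to-URL mapping, the `MYSQL_HOST` name, the `mysql+pymysql` driver prefix and the `build_mysql_url` helper are all assumptions for illustration; only the other variable names come from the docker-compose warnings above.

```python
import os
from typing import Mapping
from urllib.parse import quote_plus

# MYSQL_HOST is a hypothetical name; the others appear in the compose warnings
REQUIRED_VARS = (
    "D4DB_USER_NAME",
    "D4DB_USER_PASSWORD",
    "MYSQL_HOST",
    "MYSQL_HOST_PORT",
    "D4DB_NAME",
)


def build_mysql_url(env: Mapping[str, str] = os.environ) -> str:
    """Build a database URL, failing early with a readable message."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    port_text = env["MYSQL_HOST_PORT"]
    if not port_text.isdigit():
        raise RuntimeError(f"MYSQL_HOST_PORT must be an integer, got {port_text!r}")
    # Quote credentials so characters such as '@' cannot break the URL
    user = quote_plus(env["D4DB_USER_NAME"])
    password = quote_plus(env["D4DB_USER_PASSWORD"])
    host = env["MYSQL_HOST"]
    return f"mysql+pymysql://{user}:{password}@{host}:{int(port_text)}/{env['D4DB_NAME']}"
```

With a check like this, a missing variable is reported by name at startup instead of surfacing as `ValueError: invalid literal for int()` deep inside the URL parser.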