
guidemaker's People

Contributors

arivers, dependabot[bot], ravinpoudel

guidemaker's Issues

NumExpr defaulting back to 8 threads?

I am running guidemaker in a Linux HPC environment with an allocation of 48 cores. It appears that numexpr.utils detects all cores on the node (not just those allocated) and defaults to 8 threads, rather than using all of the allocated cores or the value supplied by the --threads option in the guidemaker call. Here is the relevant stdout, which reliably appears after the indexing step:

2022-10-13 11:40:38,001 root         INFO     Identifying guides that have a hamming distance <= 2 to all other potential guides
2022-10-13 11:40:38,248 numexpr.utils INFO     Note: detected 78 virtual cores but NumExpr set to maximum of 64, check "NUMEXPR_MAX_THREADS" environment variable.
2022-10-13 11:40:38,248 numexpr.utils INFO     Note: NumExpr detected 78 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
2022-10-13 11:40:38,248 numexpr.utils INFO     NumExpr defaulting to 8 threads.

The batch job submission allocates 48 cores and 500 GB of memory.

Any idea how to fix?
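
The log above names the "NUMEXPR_MAX_THREADS" environment variable, so one possible approach is to set it to the allocated core count before numexpr is imported. For a command-line run the simplest form is probably an export line in the batch script; the Python below is only a minimal sketch of the same idea. The helper names are hypothetical, and it assumes the scheduler exposes SLURM_CPUS_PER_TASK (other schedulers use different variables).

```python
import os


def allocated_threads(env=os.environ):
    """Best-effort count of the CPUs this job is actually allowed to use."""
    # Assumption: SLURM sets SLURM_CPUS_PER_TASK when --cpus-per-task is given.
    value = env.get("SLURM_CPUS_PER_TASK", "")
    if value.isdigit() and int(value) > 0:
        return int(value)
    # On Linux the affinity mask usually reflects the allocation, not the node.
    if hasattr(os, "sched_getaffinity"):
        return max(1, len(os.sched_getaffinity(0)))
    return os.cpu_count() or 1


def configure_numexpr(env=os.environ):
    """Set the NumExpr thread variables; call this before importing numexpr."""
    n = allocated_threads(env)
    env["NUMEXPR_MAX_THREADS"] = str(n)
    env["NUMEXPR_NUM_THREADS"] = str(n)
    return n
```

NumExpr reads these variables when it is first imported, so setting them afterwards has no effect on the thread pool size.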

explain how to increase server size

A user could not upload larger genomes when running the web app locally.

This can be done by passing the extra --server.maxUploadSize flag to streamlit. For example, to raise the limit to 5000 MB, run:

streamlit run /GuideMaker/guidemaker/data/app.py --server.maxUploadSize 5000

I will add instructions to the documentation.

Hi, I cannot run GuideMaker using a genome FASTA + GFF file. I would appreciate any help.

2022-11-10 15:34:20,592 root ERROR GuideMaker terminated with errors. See the log file for details.
Traceback (most recent call last):
File "/home/shuang/miniconda3/envs/gmenv/lib/python3.9/site-packages/guidemaker/cli.py", line 193, in main
anno.get_annotation_features()
File "/home/shuang/miniconda3/envs/gmenv/lib/python3.9/site-packages/guidemaker/core.py", line 761, in get_annotation_features
if not feat_key in feature_dict:
UnboundLocalError: local variable 'feat_key' referenced before assignment

gff example:
JAIZPG010000102.1 AUGUSTUS gene 60 1208 0.39 - . ID=gene-alt2500001;Name=alt2500001;gbkey=Gene;gene_biotype=protein_coding;locus_tag=alt2500001
JAIZPG010000102.1 AUGUSTUS mRNA 60 1208 0.39 - . ID=rna-gnl|WGS:LPVP|mrna.alt2500001;Parent=gene-alt2500001;gbkey=mRNA;locus_tag=alt2500001;orig_protein_id=gnl|WGS:LPVP|alt2500001;orig_transcript_id=gnl|WGS:LPVP|mrna.alt2500001;product=alt2500001
JAIZPG010000102.1 AUGUSTUS CDS 60 234 0.96 - 1 ID=cds-alt2500001;Parent=rna-gnl|WGS:LPVP|mrna.alt2500001;Dbxref=NCBI_GP:alt2500001;Name=alt2500001;gbkey=CDS;locus_tag=alt2500001;orig_transcript_id=gnl|WGS:LPVP|mrna.alt2500001;product=alt2500001;protein_id=alt2500001
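
The UnboundLocalError in the traceback means that feat_key was never assigned before it was tested, which in Python happens when no branch that sets the variable ran for that record. The sketch below is not GuideMaker's actual parser. It is a hypothetical illustration of picking a feature key from the ninth GFF column with explicit fallbacks, so that the key is always defined (or explicitly None). The function names and the order of preferred attributes are assumptions.

```python
def parse_gff_attributes(column9):
    """Split a GFF3 attribute column ('key=value;key=value') into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def feature_key(gff_line, preferred=("locus_tag", "ID", "Name")):
    """Return an identifier for one GFF record, or None if none is usable."""
    cols = gff_line.rstrip("\n").split("\t")
    if len(cols) != 9:
        # GFF3 records are tab-delimited with nine columns; anything else is
        # reported as None rather than leaving the key unassigned.
        return None
    attrs = parse_gff_attributes(cols[8])
    for name in preferred:
        if name in attrs:
            return attrs[name]
    return None
```

A record that is space-delimited rather than tab-delimited returns None here, which makes that kind of formatting problem easy to spot before the annotation step.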

onnxruntime error in HPC environment

Hi there,

I am getting an error during Doench featurization, which I've pasted below. The traceback points to an onnxruntime line that is noted on the developer's TODO list and apparently holds memory inefficiently. The error occurs even with a large (1 TB) memory allocation for a medium-sized 1 GB genome, and even after subsetting that genome to further reduce memory demand.

The error message and traceback:

2022-11-20 00:17:31,672 root         INFO     Creating Efficiency Score based on Doench et al. 2016 - only for NGG PAM...
2022-11-20 00:17:31,685 root         ERROR    GuideMaker terminated with errors. See the log file for details.
Traceback (most recent call last):
  File "/hpc/home/cjm124/.local/lib/python3.8/site-packages/guidemaker/cli.py", line 208, in main
    prettydf = guidemaker.core.get_doench_efficiency_score(df=prettydf, pam_orientation=args.pam_orientation, num_threads=args.threads)
  File "/hpc/home/cjm124/.local/lib/python3.8/site-packages/guidemaker/core.py", line 1155, in get_doench_efficiency_score
    doenchscore = doench_predict.predict(np.array(df.target_seq30), num_threads=num_threads)
  File "/hpc/home/cjm124/.local/lib/python3.8/site-packages/guidemaker/doench_predict.py", line 114, in predict
    sess = rt.InferenceSession(model_file)
  File "/hpc/home/cjm124/.local/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 347, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/hpc/home/cjm124/.local/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 384, in _create_inference_session
    sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model)
RuntimeError: /onnxruntime_src/onnxruntime/core/platform/posix/env.cc:183 onnxruntime::{anonymous}::PosixThread::PosixThread(const char*, int, unsigned int (*)(int, Eigen::ThreadPoolInterface*), Eigen::ThreadPoolInterface*, const onnxruntime::ThreadOptions&) pthread_setaffinity_np failed, error code: 0 error msg: 

and the call to guidemaker:

guidemaker \
		--fasta chr1_Hery.fasta \
		--gff chr1anno.gff \
		-p NGG \
		-o outputFiles/run3/ \
		--doench_efficiency_score \
		--cfd_score \
		--pam_orientation 3prime \
		--guidelength 20 \
		--before 100 \
		--into 500 \
		--knum 5 \
		--controls 1000 \
		--threads 96

echo "all done"

exit

The allocated resources and scheduler call:

sbatch -p common \
	-n 96 \
	--mem=1000G \
	--mail-type=END \
	--time=20-12:00:00 \
	--wrap="bash code/make_hery_guides$runNumber.sh"

Because the traceback shows python3.8 paths, I tried this both in the suggested conda environment (which uses Python 3.7) and in one running Python 3.8.

The issue persists, and I am confused. Any ideas? Is this something you've seen before?
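
The traceback ends in pthread_setaffinity_np, which suggests the session is trying to pin threads to cores the job may not be allowed to use. A workaround often reported for this situation is to give onnxruntime explicit thread counts through SessionOptions. The sketch below assumes that applies here. The helper names are hypothetical, and this is not how guidemaker's doench_predict.py is currently written.

```python
import os


def usable_cpu_count():
    """Number of CPUs this process may run on (affinity mask on Linux)."""
    if hasattr(os, "sched_getaffinity"):
        return max(1, len(os.sched_getaffinity(0)))
    return os.cpu_count() or 1


def make_session(model_file, num_threads=None):
    """Create an InferenceSession with explicit thread counts."""
    import onnxruntime as rt  # imported lazily so the helper above has no dependency

    limit = usable_cpu_count()
    n = min(num_threads or limit, limit)  # never ask for more than is usable
    opts = rt.SessionOptions()
    opts.intra_op_num_threads = n
    opts.inter_op_num_threads = 1
    return rt.InferenceSession(model_file, sess_options=opts)
```

If the reported CPU count is much smaller than the 96 requested, the scheduler allocation itself may be worth checking as well.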

bioconda

User reported the bioconda recipe would not build

Crashing on using Human Genome

Hi!

I've been trying to use GuideMaker on the human genome but I've been encountering crashes. Hope I didn't miss anything, but here are some details:

guidemaker   \
   --fasta /genome/hg38.fa       \
   --gff /genome/Homo_sapiens.GRCh38.77.gtf   \
   -o /home/out       \
   -p NGG       \
   --pam_orientation 3prime    \
   --guidelength 20    \
   --lsr 11   \
   --threads 5

Upon running the command, it quickly reaches "Identifying PAM sites in the genome" and stays there for 1-2 hours. Then the program quits without warning (no error messages). The output directory is also empty.

Would anyone have any ideas on what could have happened? I didn't see any verbose/debug options so I'm not sure how to look for the culprit.

Thanks in advance!!

Best,
Ivan

Problem after running command line

Dear all,
Can you help me run guidemaker?
When I run the command guidemaker -f input.fasta -g input.gtf -p NGG --pam_orientation 3prime --guidelength 20 --lsr 11 -o test_gene --doench_efficiency_score --threads 4
I get the following output and error:

Your CPU supports instructions that this binary was not compiled to use: SSE3 SSE4.1 SSE4.2 AVX
For maximum performance, you can install NMSLIB from sources 
pip install --no-binary :all: nmslib
2022-05-13 20:28:52,915 root         INFO     Configuration data loaded from /home/tsang/miniconda3/envs/guidemaker/lib/python3.7/site-packages/guidemaker/data/config_default.yaml:
2022-05-13 20:28:52,916 root         INFO     {'NMSLIB': {'M': 16, 'efc': 10, 'post': 1, 'ef': 9}, 'CONTROL': {'MINIMUM_HMDIST': 7, 'CONTROL_SEARCH_MULTIPLE': [10, 100, 1000, 10000]}, 'MINIMUM_PROPORTION': 0.5}
2022-05-13 20:28:52,918 root         INFO     Temp directory is: /tmp/guidemaker_8sm5drw6
2022-05-13 20:28:52,919 root         INFO     Writing fasta file from genbank file(s)
2022-05-13 20:28:52,920 guidemaker.core INFO     check if input.fasta is gzipped
2022-05-13 20:28:52,991 root         INFO     Identifying PAM sites in the genome
2022-05-13 20:28:55,094 root         INFO     Checking guides for restriction enzymes
2022-05-13 20:28:55,096 root         INFO     Number of guides removed after checking for restriction enzymes: 0
2022-05-13 20:28:55,096 root         INFO     Identifing guides that are unique near the PAM site
2022-05-13 20:28:55,491 root         INFO     Number of guides with non unique seed sequence: 2819
2022-05-13 20:28:55,508 root         INFO     Indexing all potential guide sites: 59933. This is the longest step.
2022-05-13 20:28:56,089 nmslib       INFO     M                   = 16
2022-05-13 20:28:56,090 nmslib       INFO     indexThreadQty      = 4
2022-05-13 20:28:56,090 nmslib       INFO     efConstruction      = 10
2022-05-13 20:28:56,091 nmslib       INFO     maxM			          = 16
2022-05-13 20:28:56,092 nmslib       INFO     maxM0			          = 32
2022-05-13 20:28:56,092 nmslib       INFO     mult                = 0.360674
2022-05-13 20:28:56,093 nmslib       INFO     skip_optimized_index= 0
2022-05-13 20:28:56,093 nmslib       INFO     delaunay_type       = 2
2022-05-13 20:28:56,094 nmslib       INFO     Set HNSW query-time parameters:
2022-05-13 20:28:56,094 nmslib       INFO     ef(Search)         =20
2022-05-13 20:28:56,094 nmslib       INFO     algoType           =2

0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************

0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
****************************************************
2022-05-13 20:28:58,958 nmslib       INFO     No appropriate custom distance function for Hamming (bit-storage) space
2022-05-13 20:28:58,958 nmslib       INFO     searchMethod			  = 0
2022-05-13 20:28:58,963 root         INFO     Identifying guides that have a hamming distance <= 2 to all other potential guides
2022-05-13 20:28:59,154 nmslib       INFO     Set HNSW query-time parameters:
2022-05-13 20:28:59,155 nmslib       INFO     ef(Search)         =9
2022-05-13 20:28:59,155 nmslib       INFO     algoType           =2
2022-05-13 20:29:05,492 root         INFO     Formatting data for BedTools
2022-05-13 20:29:05,554 root         INFO     Create GuideMaker Annotation object
2022-05-13 20:29:05,555 root         INFO     Identify genomic features
2022-05-13 20:29:05,556 guidemaker.core INFO     check if input.gtf is gzipped
2022-05-13 20:29:05,558 root         ERROR    GuideMaker terminated with errors. See the log file for details.
Traceback (most recent call last):
  File "/home/tsang/miniconda3/envs/guidemaker/lib/python3.7/site-packages/guidemaker/cli.py", line 193, in main
    anno.get_annotation_features()
  File "/home/tsang/miniconda3/envs/guidemaker/lib/python3.7/site-packages/guidemaker/core.py", line 756, in get_annotation_features
    if not feat_key in feature_dict:
UnboundLocalError: local variable 'feat_key' referenced before assignment

I tried installing nmslib with pip install, and that succeeded, but the problem still occurs when I rerun the command.
It would be great if you could help me solve this problem.
Thank you so much,
Tsang

slow feature finding on animal genomes

I'm doing a standard Cas9 search on a 1-gigabase chicken genome, and the step "Find genomic features closest to the guide" is taking longer than the guide-finding step. I ran this on a 384 GB, 48-core node; SLURM sacct reports a Max RSS of 199 GB. I will look into ways to process this that improve speed and memory use. Many of GuideMaker's optimizations were made for microbial genomes, but most users are running it on eukaryotes. Long term, the plan is to improve the experience for eukaryotic users.

Doench scoring error when N's exist just beyond the PAM

Running Doench scoring on the chicken genome, I noticed that scoring will sometimes crash when an "N" is present in the three nucleotides just after the target and PAM sequence. This happens only 7 times in the chicken genome (about 1 in 750,000 guides). Omitting scoring in these rare cases is likely the best approach to solving this issue.

The offending 30nt sequences are:

GTTGTGTCTGCTGTGAGTGCGGGGGGGCGN
AGATGGGGTCGGGAGGGAAAGTGGGGGGGN
GATGGGGTCGGGAGGGAAAGTGGGGGGGNN
ATGGGGTCGGGAGGGAAAGTGGGGGGGNNN
TGCGCTGAGCCGGGTGTCAGCCTGAGGCNN
CCACCCTCCCCACCCCACCCCCCCAGGANN
TGGAGGACCTAGTTGTGGGTCAGGAGGACN

The error was:

2023-10-11 17:25:01,529 root         ERROR    GuideMaker terminated with errors. See the log file for details.
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/project/90daydata/gbru_fy22_amr_cooccurence/leghornenv/lib/python3.9/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/project/90daydata/gbru_fy22_amr_cooccurence/leghornenv/lib/python3.9/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
  File "/project/90daydata/gbru_fy22_amr_cooccurence/GuideMaker/guidemaker/doench_featurization.py", line 58, in featurize_data
    feature_sets["_nuc_pd_Order1"], feature_sets["_nuc_pi_Order1"], feature_sets["_nuc_pd_Order2"], feature_sets["_nuc_pi_Order2"] = get_nuc_features(data)
  File "/project/90daydata/gbru_fy22_amr_cooccurence/GuideMaker/guidemaker/doench_featurization.py", line 196, in get_nuc_features
    pi1dict[let] += 1
KeyError: 'N'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/project/90daydata/gbru_fy22_amr_cooccurence/GuideMaker/guidemaker/cli.py", line 218, in main
    prettydf = guidemaker.core.get_doench_efficiency_score(df=prettydf, pam_orientation=args.pam_orientation, num_threads=args.threads)
  File "/project/90daydata/gbru_fy22_amr_cooccurence/GuideMaker/guidemaker/core.py", line 1156, in get_doench_efficiency_score
    doenchscore = doench_predict.predict(np.array([x.upper() for x in df.target_seq30]), num_threads=num_threads)
  File "/project/90daydata/gbru_fy22_amr_cooccurence/GuideMaker/guidemaker/doench_predict.py", line 122, in predict
    feature_sets = parallel_featurize_data(
  File "/project/90daydata/gbru_fy22_amr_cooccurence/GuideMaker/guidemaker/doench_featurization.py", line 103, in parallel_featurize_data
    result = pool.map(partial_fd, dflist)
  File "/project/90daydata/gbru_fy22_amr_cooccurence/leghornenv/lib/python3.9/multiprocessing/pool.py", line 364, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/project/90daydata/gbru_fy22_amr_cooccurence/leghornenv/lib/python3.9/multiprocessing/pool.py", line 771, in get
    raise self._value
KeyError: 'N'
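
The suggestion above, omitting scoring for 30-mers that contain an "N", could be sketched as a thin wrapper that only passes fully specified sequences to the scorer and records NaN for the rest. This is a minimal illustration and not the fix that was applied in GuideMaker. The wrapper name and the scorer argument (any callable that takes a list of 30-mers and returns one score per sequence) are assumptions.

```python
import math
import re

# A 30-mer is scorable only if every position is an unambiguous base.
VALID_30MER = re.compile(r"[ACGT]{30}")


def score_valid_only(seqs, scorer):
    """Score scorable 30-mers; leave NaN where a sequence has N or other symbols."""
    seqs = [s.upper() for s in seqs]
    valid_idx = [i for i, s in enumerate(seqs) if VALID_30MER.fullmatch(s)]
    scores = [math.nan] * len(seqs)
    if valid_idx:
        valid_scores = scorer([seqs[i] for i in valid_idx])
        for i, score in zip(valid_idx, valid_scores):
            scores[i] = float(score)
    return scores
```

Keeping the output the same length as the input means the scores can still be assigned back to the guide table row by row, with the rare unscored guides showing up as missing values.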
