cmu-safari / airlift

AirLift is a tool that updates mapped reads from one reference genome to another. Unlike existing tools, it accounts for regions not shared between the two reference genomes and enables remapping across all parts of the references. Described by Kim et al. (preliminary version at http://arxiv.org/abs/1912.08735)

Shell 2.23% Python 8.18% Makefile 0.51% C 66.59% JavaScript 8.94% TeX 11.96% Perl 0.06% Gnuplot 0.23% Cython 1.30%


airlift's Issues

Issue about slurm

First of all, thanks for making this tool.

Many people will want to run this tool on an HPC server. However, the requirement to have Slurm installed is hard to meet.

An HPC server is normally maintained by the IT support staff of a research facility, and each facility typically uses a single job scheduler for its CPU cluster. We use PBS Pro, for example, and all of our pipelines and workflows run as PBS jobs.

As a normal user without root privileges or a strong IT background, it is nearly impossible for me to install Slurm on our server and set up AirLift for daily use.

Would you please consider providing an alternative way to run AirLift without Slurm? I don't need whole-genome liftover of BAM files. I just need to lift over between a reference genome and a customized assembly (which should cover only a very small part of the genome).

Please consider this option, and let me know if it is already available. Thanks!

Gaps_to_fasta.py issue

From my understanding, this Python script parses the gaps.fa file into many contigs, with skip_length=10 (the default) between each contig, and writes them to a new FASTA file.

In this for loop, I noticed that it only writes a sequence to the new file when it encounters the next header line in the gaps.fa file. In my case I have only one contig, which leads to an empty output file. It also seems that, this way, the last contig in the gaps.fa file will always be skipped.
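To illustrate the behaviour I would expect, here is a minimal sketch of a FASTA reader that flushes the final record after the loop ends. It is not the code from Gaps_to_fasta.py; the function name `iter_fasta_records` and the sample input are made up for illustration, and the skip_length handling is left out.

```python
def iter_fasta_records(lines):
    """Yield (header, sequence) pairs from FASTA-formatted lines.

    Hypothetical helper for illustration only, not AirLift's own code.
    """
    header = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there is one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        else:
            chunks.append(line)
    # Flush the last record: no later header line will ever trigger it.
    if header is not None:
        yield header, "".join(chunks)


# A gaps file with a single contig still produces one record.
single_contig = [">contig1", "ACGT", "TTGA"]
for name, seq in iter_fasta_records(single_contig):
    print(name, seq)
```

Without the flush after the loop, the single-contig input above would yield nothing, which matches the empty output file I am seeing.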

Please let me know your thoughts. Thanks!

LiftBedToSam requires some package support

I ran into this issue when using LiftBedToSam:

/paedyl01/disk1/yangyxt/ngs_scripts/AirLift_old/5-merge/liftBedToSam: /software/gcc/4.9.1/lib64/libstdc++.so.6: version `GLIBCXX_3.4.26' not found (required by /paedyl01/disk1/yangyxt/ngs_scripts/AirLift_old/5-merge/liftBedToSam)

Could you please indicate which package I should install to avoid this issue?
Thanks!
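In case it helps with diagnosis, the message says that the libstdc++.so.6 found at run time does not export the `GLIBCXX_3.4.26` symbol version that the binary was linked against. Below is a rough sketch for listing the GLIBCXX versions embedded in a library file. The function names are my own, and the byte string in the demo is made up rather than taken from a real library.

```python
import re


def glibcxx_versions_in(data):
    """Return the GLIBCXX version strings found in raw bytes, sorted numerically."""
    found = set(re.findall(rb"GLIBCXX_(\d+(?:\.\d+)+)", data))
    versions = [v.decode("ascii") for v in found]
    return sorted(versions, key=lambda v: tuple(int(p) for p in v.split(".")))


def glibcxx_versions_in_file(lib_path):
    """Read a shared library from disk and list its GLIBCXX versions."""
    with open(lib_path, "rb") as handle:
        return glibcxx_versions_in(handle.read())


def provides_version(data, wanted):
    """Check whether a specific version such as '3.4.26' is present."""
    return wanted in glibcxx_versions_in(data)


# Made-up bytes standing in for the contents of a libstdc++.so.6 file.
fake_library = b"\x00GLIBCXX_3.4\x00GLIBCXX_3.4.20\x00GLIBCXX_3.4.9\x00"
print(glibcxx_versions_in(fake_library))
print(provides_version(fake_library, "3.4.26"))
```

On the server I would call `glibcxx_versions_in_file` with the library path from the error message, /software/gcc/4.9.1/lib64/libstdc++.so.6, to see whether 3.4.26 is listed.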

Why take the folder storing the assemblies as input instead of the path to the assembly files?

Dear Firtina,

Thanks for making this tool. I am confused about a few of the input arguments of run_pipeline.sh.

First of all, why are OLDREF and NEWREF given as the folders storing the assemblies instead of the paths to the assembly files?

I tend to generate my own chain file and use it to convert alignments against a custom assembly into alignments against hg19. According to your code in run_pipeline.sh, the new assembly is ${NEWREF}/${chr}.${SEQ_FILE_EXT}, where ${chr} is the basename of my chain file.

I don't know why the new assembly's file name must match my chain file's basename. Am I missing something here?
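To make sure I am reading the script correctly, here is how I understand the path to be built, written as a small sketch. This is my interpretation, not the actual code in run_pipeline.sh; the function name, the example file names, and the assumption that the chain file's extension is stripped to obtain `${chr}` are all mine.

```python
from pathlib import Path


def expected_new_assembly(newref_dir, chain_file, seq_file_ext):
    """Mimic ${NEWREF}/${chr}.${SEQ_FILE_EXT} as I understand it.

    Assumption: ${chr} is the chain file name without its extension.
    """
    chr_name = Path(chain_file).stem
    return Path(newref_dir) / f"{chr_name}.{seq_file_ext}"


# Hypothetical inputs: a chain file I generated and a folder of assemblies.
path = expected_new_assembly("new_ref", "chains/custom_contig.chain", "fa")
print(path.as_posix())
```

If this reading is right, a chain file named custom_contig.chain forces the new assembly to be named custom_contig.fa inside the NEWREF folder, which is the coupling I am asking about.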

Issue about TAG value Liftover in BAM file.

First of all, thanks for making this tool, which opens up a new possibility for lifting over BAM files.

I tried to use CrossMap to lift over BAM files, but it does not seem able to update the values of some important tags such as MD, NM, MC, XA, and SA.

Can AirLift also lift over these tag values in BAM files?

What is readsize?

Dear Firtina,

Thanks for the update. I am still confused about the definition of the input argument READSIZE.

Does this value reflect the number of read pairs in my prepared FASTQ file, the total number of reads (i.e. twice the number of pairs; in my case there are no singleton reads, every read is paired), or the read length?

In extract reads.sh, you typed:

May I ask why PNEXT in the SAM record is compared with readsize? array[8] should be the 8th tab-delimited field of a SAM line, which is PNEXT rather than the CIGAR string.

From this command, I suppose readsize is the read length, but not every read has exactly the same length. Could you please explain how this argument is used? Much appreciated.
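To make the field numbering concrete, here is a small sketch that names the 11 mandatory SAM columns and shows which one sits at position 8 when counting from 1. The SAM record is made up for illustration, and `parse_sam_line` is a hypothetical helper, not part of AirLift.

```python
# The 11 mandatory SAM columns, in order (1-based positions 1..11).
SAM_FIELDS = [
    "QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
    "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL",
]


def parse_sam_line(line):
    """Map the mandatory columns of one SAM alignment line to their names."""
    columns = line.rstrip("\n").split("\t")
    return dict(zip(SAM_FIELDS, columns[:11]))


# Made-up paired-end record: mate at position 300, read of length 5.
example = "read1\t99\tchr1\t100\t60\t5M\t=\t300\t205\tACGTA\tIIIII"
record = parse_sam_line(example)

# Counting from 1, the 8th field is PNEXT (the mate's position).
eighth_field = example.split("\t")[8 - 1]
print(eighth_field == record["PNEXT"])

# The length of this particular read comes from SEQ, not from PNEXT.
read_length = len(record["SEQ"])
print(record["CIGAR"], read_length)
```

With 1-based indexing, position 8 is PNEXT and CIGAR is position 6. If the array were 0-based instead, index 8 would be TLEN, which is still not the CIGAR string.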

Please help to trace error in step-by-step guide to replicate the results

I almost got it working, but I cannot track down a few remaining errors because I cannot find what is generating them. Please help.
The combined STDOUT and STDERR output is attached.

logfile.txt
