
cmu-safari / airlift


AirLift is a tool that updates mapped reads from one reference genome to another. Unlike existing tools, it accounts for regions not shared between the two reference genomes and enables remapping across all parts of the references. Described by Kim et al. (preliminary version at http://arxiv.org/abs/1912.08735).


Contributors: canfirtina, jeremiek, tracyewen


airlift's Issues

Issue about slurm

First of all, thanks for making this tool.

Many people would run this tool on an HPC server. However, the prerequisite of having Slurm installed is hard to meet.

Normally, an HPC server is maintained by the IT support staff of a research facility, and each facility typically uses only one job scheduler for its CPU cluster. We use PBS Pro, for example, so all our pipelines and workflows run as PBS jobs.

As a normal user without root privileges or a strong IT background, it's nearly impossible for me to install Slurm on our server and get AirLift ready for daily use.

Would you please consider providing an alternative way to run AirLift without Slurm? I don't need a whole-genome liftover of BAM files. I just need to lift over between a reference genome and a customized assembly (which covers only a very small part of the genome).

Please consider this option, and let me know if it is already available. Thanks!

Gaps_to_fasta.py issue

From my understanding, this Python script parses the gaps.fa file into many contigs, with skip_length=10 (the default) between each contig, and writes them to a new FASTA file.

[Screenshot of the for loop in the script; image not available]

In this for loop, I noticed that the sequence is only written to the new file when the next header line is encountered in the gaps.fa file. In my case there is only one contig, which leads to an empty output file. It also seems that, written this way, the loop will always skip the last contig in the gaps.fa file.

Pls let me know your thoughts. Thanks!
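The behaviour described in this issue is a common pattern in hand-written FASTA parsers: a record is emitted only when the next header arrives, so the final record is never flushed. The sketch below is not the code from gaps_to_fasta.py; `read_fasta_records` is a hypothetical helper, and it leaves out the skip_length handling. It only illustrates the extra flush after the loop that keeps the last (or only) contig.

```python
def read_fasta_records(lines):
    """Yield (header, sequence) pairs, including the final record."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there is one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    # Flush the last record: no further header arrives to trigger the write.
    if header is not None:
        yield header, "".join(chunks)


# A file with a single contig still produces one record.
single_contig = [">gap1", "ACGT", "TTGA"]
print(list(read_fasta_records(single_contig)))
```

Without the final `if header is not None` block, the single-contig input above would yield nothing, which matches the empty output file reported here.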

liftBedToSam requires some package support

I ran into this issue when using liftBedToSam:

/paedyl01/disk1/yangyxt/ngs_scripts/AirLift_old/5-merge/liftBedToSam: /software/gcc/4.9.1/lib64/libstdc++.so.6: version `GLIBCXX_3.4.26' not found (required by /paedyl01/disk1/yangyxt/ngs_scripts/AirLift_old/5-merge/liftBedToSam)

Could you please indicate which package I should install to avoid this issue?
Thanks!
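The error message says that the libstdc++.so.6 picked up at run time (here from a GCC 4.9.1 installation) does not provide the symbol version GLIBCXX_3.4.26 that the binary was linked against. To my knowledge that version first ships with the libstdc++ of GCC 9, so the binary needs either a newer libstdc++ on the library path or a rebuild with the local compiler. As a rough way to see which versions a given library file offers, the sketch below scans the file for embedded version strings; `glibcxx_versions` is a hypothetical helper, not part of AirLift, and it is only a crude stand-in for inspecting the library with proper ELF tools.

```python
import re


def glibcxx_versions(path):
    """Return the GLIBCXX_x.y.z version numbers found in a library file, sorted numerically."""
    with open(path, "rb") as handle:
        data = handle.read()
    # Version tags are stored as plain ASCII strings inside the shared object.
    found = set(re.findall(rb"GLIBCXX_(\d+(?:\.\d+)+)", data))
    return sorted(
        (version.decode() for version in found),
        key=lambda version: tuple(int(part) for part in version.split(".")),
    )


# Usage (path taken from the error message above; adjust to your system):
# print(glibcxx_versions("/software/gcc/4.9.1/lib64/libstdc++.so.6"))
```

If 3.4.26 is missing from the returned list, that library cannot satisfy the binary, whichever packages are installed alongside it.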

Why is the input a folder storing assemblies instead of the path to the assembly?

Dear Firtina,

Thanks for making this tool. I have a few points of confusion about the input arguments of run_pipeline.sh.

First of all, why are OLDREF and NEWREF given as folders storing assemblies instead of paths to the assembly files?

I plan to generate my own chain file and use it to convert alignments against a custom assembly into alignments against hg19. According to your code in run_pipeline.sh, the new assembly is ${NEWREF}/${chr}.${SEQ_FILE_EXT}, where ${chr} is the base name of my chain file.

I don't know why the new assembly's file name must be the same as my chain file's base name. Am I missing something here?
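The naming convention described in this issue can be written out as a small sketch. It mirrors only the `${NEWREF}/${chr}.${SEQ_FILE_EXT}` pattern quoted above; `expected_assembly_path` is a hypothetical helper, and the assumption that `${chr}` is the chain file name with its extension removed is mine, not confirmed by run_pipeline.sh.

```python
import os


def expected_assembly_path(chain_file, newref_dir, seq_file_ext):
    """Build the assembly path implied by the chain file's base name."""
    # Assumption: ${chr} is the chain file name without its directory or extension.
    chrom = os.path.splitext(os.path.basename(chain_file))[0]
    return os.path.join(newref_dir, f"{chrom}.{seq_file_ext}")


# All paths below are made-up examples.
print(expected_assembly_path("/data/chains/custom.chain", "/data/newref", "fa"))
```

Under that assumption, a chain file named custom.chain would require the new assembly to be stored as custom.fa inside the NEWREF folder.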

Issue about TAG value Liftover in BAM file.

First of all, thanks for making this tool, which opens up a new possibility for lifting over BAM files.

I tried to use CrossMap to lift over BAMs, but it does not seem able to update the values of some important tags such as MD, NM, MC, XA, and SA.

I wonder whether AirLift can also lift over these tag values in BAM files?

What is readsize?

Dear Firtina,

Thanks for the update. I still have some confusion about the definition of the input argument READSIZE.

Does this value reflect the number of read pairs in my prepared FASTQ file, the total number of reads (i.e. twice the number of pairs; in my case there are no singleton reads, every read is paired), or the read length?

In extract reads.sh, you wrote:
[Screenshot not available: the image upload did not complete]

May I ask why PNEXT in the SAM record is compared with readsize? array[8] should be the 8th tab-delimited field of a SAM line, which is PNEXT rather than the CIGAR string.

From this command, I suppose readsize is the read length. However, not every read has exactly the same length. Could you please explain how this argument is used? Much appreciated.
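For reference, the field positions discussed in this issue follow the column order of the SAM format: the 8th mandatory column is PNEXT and the 6th is CIGAR. The sketch below is not taken from the AirLift scripts; `parse_sam_line` and `read_length_from_cigar` are hypothetical helpers and the alignment line is made up. It shows the column order and one way a per-read length can be derived from the CIGAR string rather than assumed to be constant.

```python
import re

# The 11 mandatory SAM columns, in order (column 1 is QNAME).
SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]


def parse_sam_line(line):
    """Map the mandatory columns of one SAM alignment line to their names."""
    values = line.rstrip("\n").split("\t")
    return dict(zip(SAM_FIELDS, values[:11]))


def read_length_from_cigar(cigar):
    """Sum the lengths of the operations that consume query bases (M, I, S, =, X)."""
    operations = re.findall(r"(\d+)([MIDNSHP=X])", cigar)
    return sum(int(count) for count, op in operations if op in "MIS=X")


# Made-up alignment line for illustration.
example = "\t".join(["r1", "99", "chr1", "100", "60", "5S95M",
                     "=", "300", "350", "A" * 100, "I" * 100])
record = parse_sam_line(example)
print(record["PNEXT"], record["CIGAR"], read_length_from_cigar(record["CIGAR"]))
```

Note that with 1-based indexing the 8th column is PNEXT, whereas with 0-based indexing position 8 would be TLEN, so which value `array[8]` holds depends on how the script fills the array.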
