abridge's People

Contributors

nathanweeks, sagnikbanerjee15

abridge's Issues

To Do List

  • Write a Dockerfile that includes abridge, samtools, zpaq, and fclqc
  • Rewrite the abridge script to call the underlying software directly, with no need for Docker and/or Singularity
  • Replace all occurrences of "informative CIGAR" with "integrated CIGAR"
  • Add options to calculate the space saved from the program for each SAM field. Report this in the log file
  • Develop a modular approach to compression and decompression. This will be necessary for troubleshooting and also for incorporating enhancements in the future
  • #4
  • Create a single program to compress both single-end and paired-end data. Similarly, create one program to decompress both single-end and paired-end data
  • Store more information on the first line of the compressed file in addition to the flags, for example whether the data is single-end or paired-end
  • Add comments for each function
  • Add more functions and decide whether to make them inline
  • Write a function to convert numeric data to a string. Use type-casting when calling the function. Write separate functions for signed and unsigned numbers
  • Similarly, write functions for converting strings to numbers
  • Examine the code to read directly from BAM files
  • Optimize memory allocations
  • Read directly from a BAM file - https://www.biostars.org/p/44424/, https://stackoverflow.com/questions/52915853/how-to-build-a-simple-main-cpp-file-using-samtools-c-api, https://samtools.sourceforge.net/sam-exam.shtml
  • Incorporate Sambamba & BAM in the comparison. Also, compare across different ranges of compression levels
  • Perform tests with SAM/BAM files that contain CIGAR without mismatch indicators and also CIGAR with mismatch indicators
  • Compile the Rust code and check if it could be made faster with the C compiler
  • Consider removing the section where a multi-line FASTA file is generated. Instead, modify the code snippet to read from multi-line FASTA
  • Prepare the CWL workflow for carrying out all comparisons. Write a single workflow for both RNA-Seq and DNA-Seq reads
  • Write a launcher for processing all the samples
  • Write CWL scripts for the following software:
    • Deez
    • Samcomp
    • CSAM
    • Samtools (BAM & CRAM)
    • Genozip2
  • Remove the adjustment made to quality scores, since in this version they will never be stored with the iCIGAR
  • Adjust the MAPQ value. Store X in place of 255, but first check whether this achieves a substantial space reduction
  • When generating BAM and CRAM files for comparison, retain only the relevant tags instead of storing everything
  • Add Spring to the compressor list in place of zpaq
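
For the items on converting numeric data to strings and back, a minimal sketch in C might look like the following. The function names and return conventions here are hypothetical and are not taken from the abridge source; the sketch only illustrates keeping separate signed and unsigned variants and returning a value on every path.

```c
#include <ctype.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helpers, not part of the abridge code base. */

/* Write value as decimal text into buf. Returns the number of characters
 * written (excluding the terminating NUL), or -1 if buf is too small. */
int unsigned_to_string(unsigned long long value, char *buf, size_t buf_size)
{
    int n = snprintf(buf, buf_size, "%llu", value);
    return (n < 0 || (size_t)n >= buf_size) ? -1 : n;
}

/* Signed counterpart of unsigned_to_string. */
int signed_to_string(long long value, char *buf, size_t buf_size)
{
    int n = snprintf(buf, buf_size, "%lld", value);
    return (n < 0 || (size_t)n >= buf_size) ? -1 : n;
}

/* Parse decimal text into *out. Returns 0 on success and -1 on error.
 * The text must start with a digit, so signs and leading spaces are rejected. */
int string_to_unsigned(const char *text, unsigned long long *out)
{
    char *end;
    if (text == NULL || !isdigit((unsigned char)*text))
        return -1;
    errno = 0;
    unsigned long long v = strtoull(text, &end, 10);
    if (errno != 0 || *end != '\0')
        return -1;
    *out = v;
    return 0;
}

/* Parse decimal text, with an optional sign, into *out.
 * Returns 0 on success and -1 on error (empty input, trailing characters,
 * or a value out of range for long long). */
int string_to_signed(const char *text, long long *out)
{
    char *end;
    if (text == NULL || *text == '\0')
        return -1;
    errno = 0;
    long long v = strtoll(text, &end, 10);
    if (errno != 0 || end == text || *end != '\0')
        return -1;
    *out = v;
    return 0;
}
```

A caller holding a narrower type would cast at the call site, for example `unsigned_to_string((unsigned long long)position, buf, sizeof buf)`, which matches the type-casting mentioned in the list.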

Error during compilation

Hello Nathan,

I tried to implement the compilation strategy you laid out, but the modules do not seem to link properly. The Docker image for ABRIDGE is here: https://github.com/sagnikbanerjee15/dockerized_tools_and_pipelines/tree/main/abridge/1.1.0

Building and installing htslib works without any problems, but building abridge does not. I keep getting the following error:

gcc -Ofast -g -Isubmodules/htslib -fvisibility=hidden -fpic -c -Wall   -c -o src/compute_information_for_better_memory_management.o src/compute_information_for_better_memory_management.c
src/compute_information_for_better_memory_management.c: In function 'findMaximumNumberOfReadsMappedToOneNucleotide':
src/compute_information_for_better_memory_management.c:124:13: warning: unused variable 'aln' [-Wunused-variable]
  124 |     bam1_t *aln = bam_init1(); // initialize an alignment
      |             ^~~
src/compute_information_for_better_memory_management.c:123:16: warning: variable 'bamHdr' set but not used [-Wunused-but-set-variable]
  123 |     bam_hdr_t *bamHdr;         // read header
      |                ^~~~~~
src/compute_information_for_better_memory_management.c:113:43: warning: variable 'curr_value' set but not used [-Wunused-but-set-variable]
  113 |     unsigned long long int curr_position, curr_value;
      |                                           ^~~~~~~~~~
src/compute_information_for_better_memory_management.c:108:9: warning: variable 'number_of_fields' set but not used [-Wunused-but-set-variable]
  108 |     int number_of_fields;
      |         ^~~~~~~~~~~~~~~~
src/compute_information_for_better_memory_management.c:107:15: warning: unused variable 'k' [-Wunused-variable]
  107 |     int i, j, k;
      |               ^
src/compute_information_for_better_memory_management.c:107:12: warning: unused variable 'j' [-Wunused-variable]
  107 |     int i, j, k;
      |            ^
In file included from src/compute_information_for_better_memory_management.c:9:
src/function_definitions.h: In function 'convertUnsignedIntegerToString':
src/function_definitions.h:226:1: warning: control reaches end of non-void function [-Wreturn-type]
  226 | }
      | ^
src/compute_information_for_better_memory_management.c: In function 'findMaximumNumberOfReadsMappedToOneNucleotide':
src/compute_information_for_better_memory_management.c:193:24: warning: 'fhr' may be used uninitialized in this function [-Wmaybe-uninitialized]
  193 |     while ((line_len = getline(&line, &len, fhr)) != -1)
      |                        ^~~~~~~~~~~~~~~~~~~~~~~~~
gcc -Lsubmodules/htslib -Wl,-rpath=/software/abridge/Abridge/submodules/htslib -lhts  src/compute_information_for_better_memory_management.o   -o src/compute_information_for_better_memory_management
/usr/bin/ld: src/compute_information_for_better_memory_management.o: in function `findMaximumNumberOfReadsMappedToOneNucleotide':
/software/abridge/Abridge/src/compute_information_for_better_memory_management.c:124: undefined reference to `bam_init1'
/usr/bin/ld: /software/abridge/Abridge/src/compute_information_for_better_memory_management.c:150: undefined reference to `hts_open'
/usr/bin/ld: /software/abridge/Abridge/src/compute_information_for_better_memory_management.c:151: undefined reference to `sam_hdr_read'
collect2: error: ld returned 1 exit status
make: *** [<builtin>: src/compute_information_for_better_memory_management] Error 1
rm src/compute_information_for_better_memory_management.o

Could you please look into it?

Thank you.
