grafimo's Issues

missing reference kmers

@ManuelTgn: You mentioned not being able to find some reference kmers in the vg find output, but I'm not able to reproduce this.

Here's my test, based on the sequences you gave me: missing_sequences.txt

wget https://hgdownload.cse.ucsc.edu/goldenpath/hg38/chromosomes/chr22.fa.gz
gunzip chr22.fa.gz
tabix -h ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000_genomes_project/release/20190312_biallelic_SNV_and_INDEL/ALL.wgs.shapeit2_integrated_snvindels_v2a.GRCh38.27022019.sites.vcf.gz  22 >22.vcf
awk 'BEGIN {OFS="\t"} { if (/^#/) { print } else { $1="chr22"; print } }' 22.vcf | bgzip >chr22.vcf.gz
# vg construct expects a tabix index (.tbi) next to the bgzipped VCF
tabix -p vcf chr22.vcf.gz
vg construct -R chr22 -r chr22.fa -v chr22.vcf.gz -p >chr22.vg
vg index -x chr22.xg chr22.vg
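
The awk step above renames the chromosome column from 22 to chr22 so that the VCF matches the sequence name in chr22.fa. As a minimal sketch of what it does, here is the same awk program wrapped in a hypothetical function, rename_chrom, and run on a made-up two-record input (not data from the 1000 Genomes file):

```shell
# Rename the CHROM column of a VCF stream to chr22, leaving header lines
# (those starting with '#') untouched. Same awk program as in the commands
# above; the function name is only for illustration.
rename_chrom() {
    awk 'BEGIN {OFS="\t"} { if (/^#/) { print } else { $1="chr22"; print } }'
}

# Made-up input: one header line and two records on chromosome "22".
printf '#CHROM\tPOS\tID\tREF\tALT\n22\t100\t.\tA\tG\n22\t200\t.\tC\tT\n' | rename_chrom
```

Header lines pass through unchanged, and each record comes out tab-separated with chr22 in the first column.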

Then, for each sequence in the file, I check whether it appears among the k-mers that vg find reports for the corresponding region.

cat missing_sequences.txt \
    | grep '^sequence_name\|matched' \
    | paste - - \
    | awk '{ print $2, $4}' \
    | tr 'gtac' 'GTAC'  \
    | tr ' ' '\t' \
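    | while read -r x;
do
    # tr above also upper-cased the "c" in "chr22", so restore it
    r=$(echo "$x" | cut -f 1 | sed s/C/c/)
    s=$(echo "$x" | cut -f 2)
    # quote the variables so the tab in $x survives word splitting in bash
    echo "$r": $(vg find -x chr22.xg -p "$r" -K 19 -E | grep "$s")
done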

I get a result for each one, suggesting that they're not missing at all.

chr22:22464024-22464441: chr22:22464024-22464441 TTTCCACTAGGTGGCGCTG chr22:22464226+ chr22:22464245+ ref 1109741+,1109743+,1109744+,
chr22:20564175-20564879: chr22:20564175-20564879 GCTCCAGCAGGTGGCGCTG chr22:20564459+ chr22:20564478+ ref 937607+,937608+,
chr22:24938879-24939040: chr22:24938879-24939040 TTTCCAGCAGATGGCAGTA chr22:24938941+ chr22:24938960+ ref 1375453+,1375454+,
chr22:46758921-46759044: chr22:46758921-46759044 CCTCCAGAAGGGGGCAGCC chr22:46758977+ chr22:46758996+ ref 3671802+,3671804+,3671805+,
chr22:49979714-49979878: chr22:49979714-49979878 TGACCACTAGGTGGTGCAC chr22:49979775+ chr22:49979794+ ref 4079471+,4079473+,4079474+,
chr22:49767389-49767601: chr22:49767389-49767601 TCGGCACTAGAGGGCAACA chr22:49767473+ chr22:49767492+ ref 4054526+,4054527+,
chr22:39437133-39437557: chr22:39437133-39437557 GAACCGGTAGGGGGAGCTG chr22:39437362+ chr22:39437381+ ref 2883926+,2883928+,2883929+,
chr22:32206911-32207335: chr22:32206911-32207335 GTGCCAGGAGGAGGAGCCA chr22:32207166+ chr22:32207185+ ref 2110545+,2110547+,2110548+,
chr22:32776128-32776241: chr22:32776128-32776241 GAGCCTCTAGAGGGAGCAC chr22:32776187+ chr22:32776206+ ref 2172648+,2172650+,2172651+,2172653+,
chr22:20564175-20564879: chr22:20564175-20564879 GCAGCAGCAGGCGGCGCTA chr22:20564510+ chr22:20564529+ ref 937611+,937612+,
chr22:36977427-36977656: chr22:36977427-36977656 CTCCCTCACGGGGGCGCTG chr22:36977511+ chr22:36977530+ ref 2625316+,2625318+,2625319+,
chr22:35538794-35539218: chr22:35538794-35539218 TTGCCACCAGGTGACTTTG chr22:35538972+ chr22:35538991+ ref 2472472+,2472474+,2472475+,2472477+,2472478+,
chr22:31872398-31872557: chr22:31872398-31872557 TTTTCACATGGGGGCGCTG chr22:31872460+ chr22:31872479+ ref 2077748+,2077749+,2077751+,2077752+,
chr22:29772243-29772667: chr22:29772243-29772667 TGGCCACTGTGAGGCACTG chr22:29772445+ chr22:29772464+ ref 1873108+,
chr22:19632120-19632551: chr22:19632120-19632551 CCACCGCTAGAGGGGGCAT chr22:19632371+ chr22:19632390+ ref 839100+,839102+,839103+,
chr22:20772085-20772509: chr22:20772085-20772509 CATGCAGTAGGTGTAACTG chr22:20772149+ chr22:20772168+ ref 959891+,959892+,
chr22:46758921-46759044: chr22:46758921-46759044 GAACCTCCAGAAGGGGGCA chr22:46758974+ chr22:46758993+ ref 3671801+,3671802+,3671804+,3671805+,
chr22:41926125-41926236: chr22:41926125-41926236 CAGCAGCGCGGGGGCGCCA chr22:41926212+ chr22:41926231+ ref 3123060+,3123061+,
chr22:30255852-30256391: chr22:30255852-30256391 GAGCCTCTGGGAGGAGGAA chr22:30256283+ chr22:30256302+ ref 1919274+,1919276+,1919277+,

Note that I'm converting them to upper case, but they are in lower case or partly lower case in the file you gave me. Could that be the issue?
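
If case is the issue, a case-insensitive comparison would rule it out. The following is only a sketch: normalize_kmer and kmer_in_listing are hypothetical helper names, and the listing file stands in for saved vg find output, which is assumed to print k-mers in upper case as in the results above.

```shell
# Hypothetical helper: upper-case a k-mer so that soft-masked (lower-case)
# bases compare equal to upper-case sequence.
normalize_kmer() {
    printf '%s\n' "$1" | tr 'acgtn' 'ACGTN'
}

# Hypothetical helper: succeed if the k-mer ($1) occurs in the k-mer
# listing stored in the file $2. -F treats the k-mer as a fixed string;
# -i also covers a listing that is itself partly lower case.
kmer_in_listing() {
    grep -q -i -F -- "$(normalize_kmer "$1")" "$2"
}

# Example with a one-line listing copied from the results above.
listing=$(mktemp)
printf 'chr22:24938879-24939040 TTTCCAGCAGATGGCAGTA chr22:24938941+\n' > "$listing"
if kmer_in_listing tttccagcagatggcagta "$listing"; then
    echo "found"
else
    echo "missing"
fi
rm -f "$listing"
```

With this check, a k-mer written in lower case or mixed case in missing_sequences.txt is still matched against the upper-case listing.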
