
vcfdist's Introduction

vcfdist: benchmarking phased variant calls

Overview

Introduction

vcfdist is a distance-based germline variant calling evaluation tool that:

  • simultaneously evaluates SNPs, INDELs, and SVs up to 10Kb
  • requires local phasing information for truth and query variants
  • can standardize query and truth VCF variant representations
  • can evaluate flip and switch phasing errors

vcfdist provides more accurate SNP, INDEL, and SV precision-recall curves than previous work, particularly when complex variants are involved.

This project is currently under active development. We welcome the submission of any feedback, issues, or suggestions for improvement! Check out the wiki for more information.

Citation

Please cite the following works if you use vcfdist:

[Nature Comms] vcfdist: Accurately benchmarking phased small variant calls in human genomes
@article{dunn2023vcfdist,
  author={Dunn, Tim and Narayanasamy, Satish},
  title={vcfdist: Accurately benchmarking phased small variant calls in human genomes},
  journal={Nature Communications},
  year={2023},
  volume={14},
  number={1},
  pages={8149},
  issn={2041-1723},
  doi={10.1038/s41467-023-43876-x},
  URL={https://doi.org/10.1038/s41467-023-43876-x}
}
[bioRxiv] Jointly benchmarking small and structural variant calls with vcfdist
@article{dunn2024vcfdist,
  author={Dunn, Tim and Zook, Justin M and Holt, James M and Narayanasamy, Satish},
  title={Jointly benchmarking small and structural variant calls with vcfdist},
  journal={bioRxiv},
  year={2024},
  publisher={Cold Spring Harbor Laboratory},
  doi={10.1101/2024.01.23.575922},
  URL={https://doi.org/10.1101/2024.01.23.575922}
}

Installation

Option 1: bioconda

If you have conda installed with the bioconda channel (instructions here), run:

conda install vcfdist

Option 2: Docker image

A pre-built Docker Hub image can be downloaded from here using:

sudo docker pull timd1/vcfdist
sudo docker run -it timd1/vcfdist:latest vcfdist --help

Option 3: GitHub source

vcfdist is developed for Linux and its only dependencies are GCC v9.1+ and HTSlib v1.17. If you don't have HTSlib already, please set it up as follows:

> wget https://github.com/samtools/htslib/releases/download/1.17/htslib-1.17.tar.bz2
> tar -xvf htslib-1.17.tar.bz2
> cd htslib-1.17
> ./configure --prefix=/usr/local
> make
> sudo make install
> export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH

If you do already have HTSlib installed elsewhere, make sure you've added it to your LD_LIBRARY_PATH, and that the HTSlib headers are included during compilation. At this point, installation is as simple as cloning the repository and building the executable. It should compile in less than one minute.

> git clone https://github.com/timd1/vcfdist
> cd vcfdist/src
> make
> sudo make install
> vcfdist --version
vcfdist v2.5.3

Usage

The demo directory contains a demo script (shown below) and all required inputs. It operates on the first 5 million bases on chr1, and should run in about 3 seconds.

vcfdist \
    query.vcf \
    nist-v4.2.1_chr1_5Mb.vcf.gz \
    GRCh38_chr1_5Mb.fa \
    -b nist-v4.2.1_chr1_5Mb.bed \
    -p results/ \
    -v 0

You can expect to see the following output:

PRECISION-RECALL SUMMARY

TYPE   THRESHOLD     TRUTH_TP  QUERY_TP  TRUTH_FN  QUERY_FP  PREC     RECALL   F1_SCORE  F1_QSCORE
SNP    NONE Q >= 0   8222      8222      1         2         0.9997   0.9998   0.9998    37.3885
SNP    BEST Q >= 0   8222      8222      1         2         0.9997   0.9998   0.9998    37.3885

INDEL  NONE Q >= 0   876       876       51        12        0.9864   0.9449   0.9652    14.5953
INDEL  BEST Q >= 0   876       876       51        12        0.9864   0.9449   0.9652    14.5953

SV     NONE Q >= 0   0         0         0         0         1.0000   1.0000   1.0000    100.000
SV     BEST Q >= 0   0         0         0         0         1.0000   1.0000   1.0000    100.000

ALL    NONE Q >= 0   9098      9098      52        14        0.9984   0.9943   0.9963    24.4200
ALL    BEST Q >= 0   9098      9098      52        14        0.9984   0.9943   0.9963    24.4200

To include more details on intermediate results, run it again at higher verbosity by removing the -v 0 flag.
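The summary columns can be roughly reproduced from the four count columns. The sketch below is inferred from the demo table rather than taken from the vcfdist source: it assumes precision is QUERY_TP / (QUERY_TP + QUERY_FP), recall is TRUTH_TP / (TRUTH_TP + TRUTH_FN), and F1_QSCORE is a Phred-style score of the F1 error capped at 100. The function name `pr_metrics` and the `max_q` parameter are hypothetical.

```python
import math

def pr_metrics(truth_tp, query_tp, truth_fn, query_fp, max_q=100.0):
    """Return (precision, recall, f1, f1_qscore) from summary counts.

    Assumed formulas, inferred from the demo output (not from vcfdist source).
    """
    # Assumption: an empty denominator is reported as a perfect score,
    # which is what the all-zero SV rows in the demo output suggest.
    query_total = query_tp + query_fp
    truth_total = truth_tp + truth_fn
    precision = query_tp / query_total if query_total else 1.0
    recall = truth_tp / truth_total if truth_total else 1.0

    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0

    # Phred-style score: -10 * log10(1 - F1), capped at max_q.
    err = 1.0 - f1
    qscore = max_q if err <= 0 else min(max_q, -10.0 * math.log10(err))
    return precision, recall, f1, qscore

# Counts taken from the INDEL row of the demo output above.
p, r, f1, q = pr_metrics(truth_tp=876, query_tp=876, truth_fn=51, query_fp=12)
print(f"PREC={p:.4f} RECALL={r:.4f} F1={f1:.4f} F1_QSCORE={q:.4f}")
```

With the demo counts, the computed F1_QSCORE agrees with the table to about two decimal places; the last displayed digit of the other columns may differ because of how the tool rounds or stores the values.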

Wiki

The vcfdist wiki has helpful information on command-line parameters, output documentation, and implementation.

License

This project is covered under the GNU GPL v3 license.

vcfdist's People

Contributors

allyssonallan, lh3, timd1

vcfdist's Issues

What about hap.py xcmp?

In the preprint you seem to suggest that vcfdist is the first tool to allow partial credit, but xcmp has had that option for a long time. I suggest citing the hap.py github repo (https://github.com/Illumina/hap.py) as they did in the Krusche et al. Nat Biotech paper. I also suggest benchmarking against xcmp in addition to vcfeval.

Inconsistent classification of same variant

Me again 😬

I have a curious case where the same variants from two different variant callers give a TP for one and a FP for the other...

Here is the variant (with context) from Clair3 with the vcfdist assessments

chromosome      3462134 .       G       C       .       PASS    .       GT:BD:BC:RD:QD:BK:QQ:SC:SG:PS:PB:BS:FE  1:TP:1.000000:1:0:gm:0:10999:5:0:0:.:.  1:TP:1.000000:1:0:gm:24:10999:5:0:0:0:0
chromosome      3462135 .       C       T       .       PASS    .       GT:BD:BC:RD:QD:BK:QQ:SC:SG:PS:PB:BS:FE  1:TP:1.000000:1:0:gm:0:10999:4:0:0:.:.  1:TP:1.000000:1:0:gm:16:10999:4:0:0:0:0
chromosome      3462136 .       C       T       .       PASS    .       GT:BD:BC:RD:QD:BK:QQ:SC:SG:PS:PB:BS:FE  1:TP:1.000000:1:0:gm:0:10999:3:0:0:.:.  1:TP:1.000000:1:0:gm:23:10999:3:0:0:0:0
chromosome      3462139 .       G       GGT     .       PASS    .       GT:BD:BC:RD:QD:BK:QQ:SC:SG:PS:PB:BS:FE  .:.:.:.:.:.:.:10999:.:.:0:.:.   1:TP:1.000000:5:0:gm:17:10999:2:0:0:0:0
chromosome      3462140 .       T       G       .       PASS    .       GT:BD:BC:RD:QD:BK:QQ:SC:SG:PS:PB:BS:FE  1:TP:1.000000:5:0:gm:0:10999:2:0:0:.:.  .:.:.:.:.:.:.:10999:.:.:0:.:.
chromosome      3462141 .       A       T       .       PASS    .       GT:BD:BC:RD:QD:BK:QQ:SC:SG:PS:PB:BS:FE  1:TP:1.000000:5:0:gm:0:10999:2:0:0:.:.  .:.:.:.:.:.:.:10999:.:.:0:.:.
chromosome      3462142 .       T       A       .       PASS    .       GT:BD:BC:RD:QD:BK:QQ:SC:SG:PS:PB:BS:FE  .:.:.:.:.:.:.:10999:.:.:0:.:.   1:TP:1.000000:5:0:gm:16:10999:2:0:0:0:0
chromosome      3462143 .       T       A       .       PASS    .       GT:BD:BC:RD:QD:BK:QQ:SC:SG:PS:PB:BS:FE  1:TP:1.000000:5:0:gm:0:10999:2:0:0:.:.  .:.:.:.:.:.:.:10999:.:.:0:.:.
chromosome      3462144 .       ACC     A       .       PASS    .       GT:BD:BC:RD:QD:BK:QQ:SC:SG:PS:PB:BS:FE  .:.:.:.:.:.:.:10999:.:.:0:.:.   1:TP:1.000000:5:0:gm:25:10999:2:0:0:0:0
chromosome      3462145 .       C       T       .       PASS    .       GT:BD:BC:RD:QD:BK:QQ:SC:SG:PS:PB:BS:FE  1:TP:1.000000:5:0:gm:0:10999:2:0:0:.:.  .:.:.:.:.:.:.:10999:.:.:0:.:.
chromosome      3462146 .       C       A       .       PASS    .       GT:BD:BC:RD:QD:BK:QQ:SC:SG:PS:PB:BS:FE  1:TP:1.000000:5:0:gm:0:10999:2:0:0:.:.  .:.:.:.:.:.:.:10999:.:.:0:.:.
chromosome      3462151 .       A       G       .       PASS    .       GT:BD:BC:RD:QD:BK:QQ:SC:SG:PS:PB:BS:FE  1:TP:1.000000:1:0:gm:0:10999:1:0:0:.:.  1:TP:1.000000:1:0:gm:16:10999:1:0:0:0:0

and the same variants from DeepVariant

chromosome      3462134 .       G       C       .       PASS    .       GT:BD:BC:RD:QD:BK:QQ:SC:SG:PS:PB:BS:FE  1:TP:1.000000:1:0:gm:0:11000:5:0:0:.:.  1:TP:1.000000:1:0:gm:20:11000:5:0:0:0:0
chromosome      3462135 .       C       T       .       PASS    .       GT:BD:BC:RD:QD:BK:QQ:SC:SG:PS:PB:BS:FE  1:TP:1.000000:1:0:gm:0:11000:4:0:0:.:.  1:TP:1.000000:1:0:gm:12:11000:4:0:0:0:0
chromosome      3462136 .       C       T       .       PASS    .       GT:BD:BC:RD:QD:BK:QQ:SC:SG:PS:PB:BS:FE  1:TP:1.000000:1:0:gm:0:11000:3:0:0:.:.  1:TP:1.000000:1:0:gm:11:11000:3:0:0:0:0
chromosome      3462139 .       G       GGT     .       PASS    .       GT:BD:BC:RD:QD:BK:QQ:SC:SG:PS:PB:BS:FE  .:.:.:.:.:.:.:11000:.:.:0:.:.   1:FP:0.600000:5:2:lm:3:11000:2:0:0:0:0
chromosome      3462140 .       T       G       .       PASS    .       GT:BD:BC:RD:QD:BK:QQ:SC:SG:PS:PB:BS:FE  1:FN:0.600000:5:2:lm:0:11000:2:0:0:.:.  .:.:.:.:.:.:.:11000:.:.:0:.:.
chromosome      3462141 .       A       T       .       PASS    .       GT:BD:BC:RD:QD:BK:QQ:SC:SG:PS:PB:BS:FE  1:FN:0.600000:5:2:lm:0:11000:2:0:0:.:.  .:.:.:.:.:.:.:11000:.:.:0:.:.
chromosome      3462142 .       T       A       .       PASS    .       GT:BD:BC:RD:QD:BK:QQ:SC:SG:PS:PB:BS:FE  .:.:.:.:.:.:.:11000:.:.:0:.:.   1:FP:0.600000:5:2:lm:22:11000:2:0:0:0:0
chromosome      3462143 .       T       A       .       PASS    .       GT:BD:BC:RD:QD:BK:QQ:SC:SG:PS:PB:BS:FE  1:FN:0.600000:5:2:lm:0:11000:2:0:0:.:.  .:.:.:.:.:.:.:11000:.:.:0:.:.
chromosome      3462145 .       C       T       .       PASS    .       GT:BD:BC:RD:QD:BK:QQ:SC:SG:PS:PB:BS:FE  1:FN:0.600000:5:2:lm:0:11000:2:0:0:.:.  .:.:.:.:.:.:.:11000:.:.:0:.:.
chromosome      3462146 .       C       A       .       PASS    .       GT:BD:BC:RD:QD:BK:QQ:SC:SG:PS:PB:BS:FE  1:FN:0.600000:5:2:lm:0:11000:2:0:0:.:.  .:.:.:.:.:.:.:11000:.:.:0:.:.
chromosome      3462151 .       A       G       .       PASS    .       GT:BD:BC:RD:QD:BK:QQ:SC:SG:PS:PB:BS:FE  1:TP:1.000000:1:0:gm:0:11000:1:0:0:.:.  1:TP:1.000000:1:0:gm:14:11000:1:0:0:0:0

the truth variants for this region are

chromosome      3462134 2abfe98a        G       C       .       PASS    .       GT      1
chromosome      3462135 75aed307        C       T       .       PASS    .       GT      1
chromosome      3462136 9c7391b9        C       T       .       PASS    .       GT      1
chromosome      3462140 60d3b9d8        T       G       .       PASS    .       GT      1
chromosome      3462141 a08fd538        A       T       .       PASS    .       GT      1
chromosome      3462143 f3d80833        T       A       .       PASS    .       GT      1
chromosome      3462145 b8c36e5e        C       T       .       PASS    .       GT      1
chromosome      3462146 f6f2bac0        C       A       .       PASS    .       GT      1
chromosome      3462151 ecc19214        A       G       .       PASS    .       GT      1

You can see in the Clair3 VCF that there are positions marked as TP when there is no QUERY variant and vice versa. This doesn't seem to make a lot of sense? Then the same positions for DeepVariant are (correctly) marked as FPs and FNs.
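To make records like these easier to compare side by side, the per-sample fields can be unpacked by FORMAT key. This is only a sketch: it assumes the first sample column is TRUTH and the second is QUERY (which is how the records above appear to be laid out), and the helper names `parse_sample` and `parse_summary_line` are hypothetical.

```python
def parse_sample(fmt, sample):
    """Map FORMAT keys (e.g. GT, BD, BC) to the values of one sample column."""
    return dict(zip(fmt.split(":"), sample.split(":")))

def parse_summary_line(line):
    """Parse one record of the summary VCF into a small dict.

    Assumption: column 10 is the TRUTH sample and column 11 is the QUERY sample.
    """
    fields = line.split()
    chrom, pos, _id, ref, alt = fields[:5]
    fmt, truth, query = fields[8], fields[9], fields[10]
    return {
        "chrom": chrom,
        "pos": int(pos),
        "ref": ref,
        "alt": alt,
        "truth": parse_sample(fmt, truth),
        "query": parse_sample(fmt, query),
    }

# The DeepVariant record at position 3462139, copied from above.
line = ("chromosome 3462139 . G GGT . PASS . "
        "GT:BD:BC:RD:QD:BK:QQ:SC:SG:PS:PB:BS:FE "
        ".:.:.:.:.:.:.:11000:.:.:0:.:. "
        "1:FP:0.600000:5:2:lm:3:11000:2:0:0:0:0")
rec = parse_summary_line(line)
print(rec["pos"], rec["truth"]["BD"], rec["query"]["BD"], rec["query"]["BC"])
```

Printing BD and BC for both callers at each position makes it clear which records differ only in the decision and which also differ in the credit assigned.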

I have attached two tarballs with the necessary data to reproduce this with the following command

vcfdist ATCC_10708__202309.100x.clair3.filter.vcf.gz truth.vcf.gz mutreference.fna --largest-variant 50 --credit-threshold 1.0 -p c3_ATCC_10708. -b ATCC_10708__202309.bed -mx 99

clair3_bug_data.tar.gz
dv_bug_data.tar.gz

Contig not in truth

When running vcfdist, if a contig has a variant in the query VCF, but not in the truth VCF, I get this error

Contig 'plasmid_2' found in query VCF but not truth VCF. Please provide BED file.

It would be nice if vcfdist could seamlessly deal with this and make any variant on the contig a FP if no BED is provided. (This is what hap.py does)
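As a stopgap, the affected contigs can be listed before running the tool. The sketch below assumes uncompressed, tab-delimited VCF text passed in as lines; the function names are hypothetical and it does not change how vcfdist itself behaves.

```python
def contigs(vcf_lines):
    """Return the set of contig names used by the records in a VCF."""
    return {
        line.split("\t", 1)[0]
        for line in vcf_lines
        if line.strip() and not line.startswith("#")  # skip header and blank lines
    }

def query_only_contigs(query_lines, truth_lines):
    """Contigs that have query variants but no truth variants."""
    return sorted(contigs(query_lines) - contigs(truth_lines))

# Tiny illustrative inputs (made up for this example).
query = [
    "##fileformat=VCFv4.2\n",
    "chromosome\t10\t.\tA\tG\t.\tPASS\t.\tGT\t1\n",
    "plasmid_2\t5\t.\tC\tT\t.\tPASS\t.\tGT\t1\n",
]
truth = [
    "##fileformat=VCFv4.2\n",
    "chromosome\t10\t.\tA\tG\t.\tPASS\t.\tGT\t1\n",
]
print(query_only_contigs(query, truth))
```

For real files, the lines could come from `open(path)` for plain VCFs or `gzip.open(path, "rt")` for compressed ones.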

Infinite loop during backtracking in rare cases

Looks like I'm incorrectly handling an edge case for switching haplotypes with the new alignment algorithm during backtracking. I'll work on fixing this ASAP.

Release 1.0.2 should be stable.

precision-recall-summary.tsv formatting

The precision-recall-summary.tsv header has an extra \t between PREC and RECALL as well as RECALL and F1_SCORE. Not sure if this is intentional.

Code with extra tabs in header

fprintf(out_pr_summ, "VAR_TYPE\tMIN_QUAL\tTRUTH_TP\tQUERY_TP\tTRUTH_FN\tQUERY_FP\tPREC\t\tRECALL\t\tF1_SCORE\tF1_QSCORE\n");

INFO("%sTYPE\tMIN_QUAL\tTRUTH_TP\tQUERY_TP\tTRUTH_FN\tQUERY_FP\tPREC\t\tRECALL\t\tF1_SCORE\tF1_QSCORE%s",

Also, when the min and max QUAL are both 0, the tsv table has duplicate rows. It would be nice to either include a note about this in the documentation or modify the code to limit the number of rows output based on the QUAL values in the QUERY vcf. This is arguably an issue unique to my comparison: I'm using a vcf generated by dipcall as TRUTH and PAV as QUERY.
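Until the header is changed, a reader of the file can tolerate the doubled tabs by dropping empty column names. A minimal sketch, using the header string from the fprintf call above; the function name `clean_header` is hypothetical, and whether the data rows contain matching extra tabs is not something this example assumes either way.

```python
# Header exactly as written by the fprintf call quoted above.
HEADER = ("VAR_TYPE\tMIN_QUAL\tTRUTH_TP\tQUERY_TP\tTRUTH_FN\tQUERY_FP\t"
          "PREC\t\tRECALL\t\tF1_SCORE\tF1_QSCORE\n")

def clean_header(line):
    """Split a tab-delimited header and drop empty names caused by doubled tabs."""
    return [name for name in line.rstrip("\n").split("\t") if name]

columns = clean_header(HEADER)
print(len(columns), columns)
```

A naive split on tabs yields 12 fields for this header, two of them empty; after cleaning, the 10 named columns remain in their original order.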

analysis-v2 questions

I have a few starter questions about some scripts inside analysis-v2.

$timer -v truvari refine \
-f $data/refs/$ref_name \
-t 64 \
--regions ./truvari-${aln}/${query_names[i]}/candidate.refine.bed \
--use-original-vcfs \
--recount \
--align ${aln} \
./truvari-${aln}/${query_names[i]} \
2> ./truvari-${aln}/${query_names[i]}.log

This line does not match up with the documentation of how truvari refine runs whole genome analysis (details). Is there a reason for this?

-e $data/giab-tr-v4.20/GIABTR.HG002.benchmark.regions.bed \

This line appears to be using the very first prototype version of TR regions (v4.20). Would this command also work on the v1.0 release? Also, should there be any consideration for Tier1 vs Tier2 regions?

# source ~/software/Truvari-v4.0.0/venv3.10/bin/activate

Could the changes to phab since v4.0 affect results?

Variant is same as truth, but assigned truth FN and query FP

I have an unusual case where I have a variant in the vcfdist summary VCF which has a truth BD of FN and a query BD of FP.

Here is the vcfdist entry (with some context). The variant of interest is POS 1770515

chromosome_1    1770510 .       A       G       .       PASS    .       GT:BD:BC:RD:QD:BK:QQ:SC:SG:PS:PB:BS:FE  1:TP:1.000000:1:0:gm:0:13581:5:0:0:.:.  1:TP:1.000000:1:0:gm:4697:13581:5:0:0:0:0
chromosome_1    1770512 .       T       TC      .       PASS    .       GT:BD:BC:RD:QD:BK:QQ:SC:SG:PS:PB:BS:FE  1:FN:0.000000:1:1:.:0:13581:4:0:0:.:.   .:.:.:.:.:.:.:13581:.:.:0:.:.
chromosome_1    1770513 .       T       C       .       PASS    .       GT:BD:BC:RD:QD:BK:QQ:SC:SG:PS:PB:BS:FE  .:.:.:.:.:.:.:13581:.:.:0:.:.   1:FP:0.000000:.:.:.:4697:13581:2:0:0:0:0
chromosome_1    1770515 .       C       T       .       PASS    .       GT:BD:BC:RD:QD:BK:QQ:SC:SG:PS:PB:BS:FE  1:FN:0.500000:2:1:lm:0:13581:1:0:0:.:.  1:FP:0.500000:2:1:lm:4697:13581:1:0:0:0:0
chromosome_1    1770516 .       C       T       .       PASS    .       GT:BD:BC:RD:QD:BK:QQ:SC:SG:PS:PB:BS:FE  .:.:.:.:.:.:.:13581:.:.:0:.:.   1:FP:0.000000:.:.:.:4697:13581:0:0:0:0:0
chromosome_1    1770517 .       A       T       .       PASS    .       GT:BD:BC:RD:QD:BK:QQ:SC:SG:PS:PB:BS:FE  1:FN:0.500000:2:1:lm:0:13581:1:0:0:.:.  .:.:.:.:.:.:.:13581:.:.:0:.:.

Here is the context for this variant in the truth VCF

chromosome_1    1770510 d37c924f        A       G       .       PASS    .       GT      1
chromosome_1    1770512 e2a4cbea        T       TC      .       PASS    .       GT      1
chromosome_1    1770515 9a7ac0ad        C       T       .       PASS    .       GT      1
chromosome_1    1770517 3e922a93        A       T       .       PASS    .       GT      1

and the query VCF

chromosome_1    1770510 .       A       G       4697.14 PASS    AB=0;ABP=0;AC=2;AF=1;AN=2;AO=171;CIGAR=1X3M1X2M1I2M1X1M1X;DP=173;DPB=188.75;DPRA=0;EPP=15.2137;EPPR=0;GTI=0;LEN=13;MEANALT=2;MQM=59.4971;MQMR=0;NS=1;NUMALT=1;ODDS=240.275;PAIRED=0.976608;PAIREDR=0;PAO=8;PQA=270;PQR=0;PRO=0;QA=5569;QR=0;RO=0;RPL=151;RPP=220.932;RPPR=0;RPR=20;RUN=1;SAF=78;SAP=5.8675;SAR=93;SRF=0;SRP=0;SRR=0;TYPE=complex        GT:DP:AD:RO:QR:AO:QA:GL 1:173:0,171:0:0:171:5569:-499.353,-51.4761,0
chromosome_1    1770513 .       T       C       4697.14 PASS    AB=0;ABP=0;AC=2;AF=1;AN=2;AO=171;CIGAR=1X3M1X2M1I2M1X1M1X;DP=173;DPB=188.75;DPRA=0;EPP=15.2137;EPPR=0;GTI=0;LEN=13;MEANALT=2;MQM=59.4971;MQMR=0;NS=1;NUMALT=1;ODDS=240.275;PAIRED=0.976608;PAIREDR=0;PAO=8;PQA=270;PQR=0;PRO=0;QA=5569;QR=0;RO=0;RPL=151;RPP=220.932;RPPR=0;RPR=20;RUN=1;SAF=78;SAP=5.8675;SAR=93;SRF=0;SRP=0;SRR=0;TYPE=complex        GT:DP:AD:RO:QR:AO:QA:GL 1:173:0,171:0:0:171:5569:-499.353,-51.4761,0
chromosome_1    1770515 .       C       T       4697.14 PASS    AB=0;ABP=0;AC=2;AF=1;AN=2;AO=171;CIGAR=1X3M1X2M1I2M1X1M1X;DP=173;DPB=188.75;DPRA=0;EPP=15.2137;EPPR=0;GTI=0;LEN=13;MEANALT=2;MQM=59.4971;MQMR=0;NS=1;NUMALT=1;ODDS=240.275;PAIRED=0.976608;PAIREDR=0;PAO=8;PQA=270;PQR=0;PRO=0;QA=5569;QR=0;RO=0;RPL=151;RPP=220.932;RPPR=0;RPR=20;RUN=1;SAF=78;SAP=5.8675;SAR=93;SRF=0;SRP=0;SRR=0;TYPE=complex        GT:DP:AD:RO:QR:AO:QA:GL 1:173:0,171:0:0:171:5569:-499.353,-51.4761,0
chromosome_1    1770516 .       C       T       4697.14 PASS    AB=0;ABP=0;AC=2;AF=1;AN=2;AO=171;CIGAR=1X3M1X2M1I2M1X1M1X;DP=173;DPB=188.75;DPRA=0;EPP=15.2137;EPPR=0;GTI=0;LEN=13;MEANALT=2;MQM=59.4971;MQMR=0;NS=1;NUMALT=1;ODDS=240.275;PAIRED=0.976608;PAIREDR=0;PAO=8;PQA=270;PQR=0;PRO=0;QA=5569;QR=0;RO=0;RPL=151;RPP=220.932;RPPR=0;RPR=20;RUN=1;SAF=78;SAP=5.8675;SAR=93;SRF=0;SRP=0;SRR=0;TYPE=complex        GT:DP:AD:RO:QR:AO:QA:GL 1:173:0,171:0:0:171:5569:-499.353,-51.4761,0

The position 1770515 is exactly the same in the query as it is in the truth, so I am a little unsure why it has been classified in this way? I see it was given partial credit of 0.5 - shouldn't it be 1.0?

How does it deal with sex chromosomes?

The procedure for determining partial credit in hap.py's xcmp takes into account the sex of the query sample (and thus whether the X chromosome is haploid). I don't see a parameter for the sample sex in vcfdist, so how does it take this into account, or do you have another way of dealing with it?

Errors during `make`

Interesting tool! We usually use hap.py for evaluation of DeepVariant, but I am interested to try vcfdist.

I ran into an error immediately though:

# Checking I have htslib as README suggested:
echo $LD_LIBRARY_PATH
/usr/local/google/home/marianattestad/bin/htslib-1.9/

git clone https://github.com/timd1/vcfdist
cd vcfdist/src
make

During make, it gave this error:

g++ -c -Wall -std=c++17 -O2 print.cpp
g++ -c -Wall -std=c++17 -O2 variant.cpp
In file included from variant.h:12,
                 from variant.cpp:8:
variant.cpp: In constructor ‘variantData::variantData(std::string, std::shared_ptr<fastaData>, int)’:
variant.cpp:459:22: warning: format ‘%li’ expects argument of type ‘long int’, but argument 5 has type ‘int32_t’ {aka ‘int’} [-Wformat=]
  459 |                 WARN("No GQ tag in %s VCF at %s:%li",
  460 |                         callset_strs[callset].data(), seq.data(), rec->pos);
      |                                                                   ~~~~~~~~
      |                                                                        |
      |                                                                        int32_t {aka int}
defs.h:118:22: note: in definition of macro ‘WARN’
  118 |     fprintf(stderr, (f_), ##__VA_ARGS__);                       \
      |                      ^~
variant.cpp:467:19: warning: format ‘%li’ expects argument of type ‘long int’, but argument 5 has type ‘int32_t’ {aka ‘int’} [-Wformat=]
  467 |             ERROR("Failed to read %s GT at %s:%li\n",
  468 |                     callset_strs[callset].data(), seq.data(), rec->pos);
      |                                                               ~~~~~~~~
      |                                                                    |
      |                                                                    int32_t {aka int}
defs.h:144:22: note: in definition of macro ‘ERROR’
  144 |     fprintf(stderr, (f_), ##__VA_ARGS__);                        \
      |                      ^~
g++ -c -Wall -std=c++17 -O2 dist.cpp
g++ -c -Wall -std=c++17 -O2 bed.cpp
g++ -c -Wall -std=c++17 -O2 cluster.cpp
g++ -c -Wall -std=c++17 -O2 phase.cpp
g++ -c -Wall -std=c++17 -O2 edit.cpp
g++ -Wall -std=c++17 -O2 globals.o print.o variant.o dist.o bed.o cluster.o phase.o edit.o -o vcfdist main.cpp -lz -lhts -lstdc++fs

Is this a bug or am I missing something?

Thanks!

Add bcftools to docker image

Again, thank you for the awesome work on this method - I love it.

It would be quite handy to have bcftools available in the docker image. Would this be something you'd be open to adding? (Totally fine if not as it is a convenience not a necessity)

A few minor (but quality-of-life) improvements, some questions too

Hi,

I started to use vcfdist recently to evaluate some small variant calls from dual assemblies I have generated and it has been going well so far, thanks for the great work :) I have a few questions and also some minor suggestions if you're open to it.

Questions

  • It is unclear to me if the flip error rate and the switch error rate can be used to actually compare two query VCF files? My understanding is that those metrics are related to the super clusters which might or might not be different from one query VCF to another?
  • Could you point me to some documentation about the GA4GH VCF output for summary.vcf? I'm not sure how to interpret duplicated variants in this output (same positions and alleles but different genotypes?). It seems like homozygous genotypes are split into two heterozygous variants; is this to give partial credit?
  • I would like to stratify my "false" variants which got some credit by edit distance to baseline variants. Is this possible? Right now, seems that the best I could do is stratification by benchmark score (FORMAT/BC).

Minor improvements?

  • Would it be possible to output the Precision, Recall and F1 for combined SNPs and indels? Right now, I am doing it manually from the output but it is a little tedious.
  • In the summary.vcf file, the field FORMAT/BC is declared as type String in the VCF header, which I thought was a mistake at first until I saw the field could be ".". It seems the only reason BC would be equal to "." is when evaluating a query variant which is a FN. In such a case, reporting 0.00000 would be fine, I think? The point is, having FORMAT/BC as a string rather than a float prevents any kind of filtering based on this field with bcftools.
  • Very minor: in the header of summary.vcf, replace variant Quality with Variant Quality for the field GQ
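In the meantime, the string-typed BC field can be converted outside of bcftools. The sketch below treats "." as a configurable missing value (0.0 by default, matching the suggestion above); the function name `bc_as_float` and the `missing` parameter are hypothetical.

```python
def bc_as_float(fmt, sample, missing=0.0):
    """Return FORMAT/BC of one sample column as a float.

    A "." value is replaced by `missing` so the result can be compared numerically.
    """
    keys = fmt.split(":")
    values = sample.split(":")
    raw = values[keys.index("BC")]
    return missing if raw == "." else float(raw)

FMT = "GT:BD:BC:RD:QD:BK:QQ:SC:SG:PS:PB:BS:FE"
samples = [
    "1:TP:1.000000:1:0:gm:0:13581:5:0:0:.:.",
    "1:FN:0.500000:2:1:lm:0:13581:1:0:0:.:.",
    ".:.:.:.:.:.:.:13581:.:.:0:.:.",
]
# Keep only sample columns with at least half credit.
kept = [s for s in samples if bc_as_float(FMT, s) >= 0.5]
print(len(kept))
```

Passing `missing=None` instead would make it possible to tell absent values apart from a true zero, at the cost of handling `None` in any comparison.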

Thanks,
Guillaume

QUERY_TP and precision calculations

Hi there! We are currently using vcfdist for comparing variant callers. Our true set consists of 54 SNPs and 3 InDels (all of them homozygous). They seem to be picked up correctly as seen in the commands below

tail -n +2 sample.resultstruth.tsv | awk -F "\t" '{if ($3 > 0) print $0}' | wc -l # Awk used to only consider one haplotype
57

tail -n +2 sample.resultstruth.tsv | awk -F "\t" '{if ($3 > 0) print $0}' | cut -f 8 | sort | uniq -c
57 TP

This seems very positive indeed! Nonetheless, I am afraid I might not be interpreting the "resultsprecision-recall-summary.tsv" file correctly. Thus, I wanted to ask if you could help me clear up some points.

The "resultsprecision-recall-summary.tsv" shows the following output:

[image: screenshot of the precision-recall summary table]

I understand that homozygous alleles are split into 2 haplotypes, thus TRUTH_TP of 108 SNPs accounts for the 54 SNPs in the true set (54*2 =108). I am also aware that QUERY is used to calculate precision; however, for most variant callers that we have tested the number of variants found for TRUTH_TP and QUERY_TP are equal or very similar. Nonetheless, for the table above where FreeBayes variant caller was used, they seem to differ. Thus I would like to kindly ask:

(1) Is this expected behaviour? How should this be interpreted, and are there any implications for the precision/recall calculations?
(2) How should the QUERY_TP be interpreted? Could this be further elaborated?

Thank you very much for any reply and support and also for this very useful tool!!
Cheers,

Support stratified variant comparison

GIAB provides stratification bed files and hap.py gives the option to produce stratified results based on these files. It's very useful to be able to look at benchmarking performance by type of region. It would be great if vcfdist could support this option as well.

command terminated by signal 9

I use vcfdist to evaluate a phased assembly of HG002. The variants are called by dipcall. The command was terminated by signal 9. vcfdist was run on a server with 1.5 TB of RAM using the default command, where the bed file is generated by dipcall.

Handle all possible input alleles

At the moment, vcfdist assumes there are two alleles (sex chromosomes contain only 1) and that each allele is either ., 0, 1, or 2. Some VCFs contain 3, 4, and even beyond.

I should also add warnings if input variants are unphased, while I'm at it.
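For illustration, a general GT parser that accepts any allele index and reports phasing might look like the sketch below. It is not the vcfdist implementation (which is C++); the function name `parse_gt` is hypothetical, and treating a single-allele (haploid) genotype as phased is an assumption of this example.

```python
import re

def parse_gt(gt):
    """Parse a VCF GT string into (alleles, phased).

    alleles: list of ints, with None for a missing allele (".").
    phased:  False if any "/" separator is present, True otherwise
             (assumption: haploid calls such as "1" count as phased).
    """
    phased = "/" not in gt
    alleles = [None if a == "." else int(a) for a in re.split(r"[|/]", gt)]
    return alleles, phased

for gt in ("0|1", "1/3", "1", ".|4"):
    alleles, phased = parse_gt(gt)
    if not phased:
        print(f"warning: unphased genotype {gt}")
    print(gt, alleles, phased)
```

Because allele indices are parsed as integers rather than matched against a fixed set, values of 3, 4, and beyond need no special handling.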

Bioconda package

Hi, it would be great if there was a bioconda package for this tool.

Very high memory usage when realigning

Using v2.5.1, when I use the --realign-truth --realign-query options I am getting crazy high memory usage. This is for a bacterial sample, so the genome is ~4MB and the VCF is of negligible size. So far I have had all my jobs fail when requesting 64GB of memory on our cluster. This seems much too high, right?

Cannot start an already-running timer (writing)

I get an error "Cannot start an already-running timer (writing)." whenever I try to use the realign flags.
Using -rq leads to "Cannot start" error. Using -rt leads to "Cannot stop" error.
Version is v2.5.0

command:

vcfdist tmp1.vcf tmp2.vcf ../../h37rv_20231215.fa -p small_test/ -rq -v 2

output:

[INFO  vcfdist 10:50:17] Command: 'vcfdist tmp1.vcf tmp2.vcf ../../h37rv_20231215.fa -p small_test/ -rq -v 2'
[INFO  vcfdist 10:50:17]  
[INFO  vcfdist 10:50:17] [0/8] Loading reference FASTA '../../h37rv_20231215.fa'
[INFO  vcfdist 10:50:17]  
[INFO  vcfdist 10:50:17] [Q 0/8] Parsing QUERY VCF 'tmp1.vcf'
[WARN  vcfdist 10:50:17] 'PS' tag not defined in QUERY VCF header, assuming one phase set per contig
[INFO  vcfdist 10:50:17]   Genotypes:
[INFO  vcfdist 10:50:17]       1: 1
[INFO  vcfdist 10:50:17]  
[INFO  vcfdist 10:50:17]  
[INFO  vcfdist 10:50:17]   Variant types:
[INFO  vcfdist 10:50:17]     Haplotype 1
[INFO  vcfdist 10:50:17]       REF: 0
[INFO  vcfdist 10:50:17]       SNP: 1
[INFO  vcfdist 10:50:17]       INS: 0
[INFO  vcfdist 10:50:17]       DEL: 0
[INFO  vcfdist 10:50:17]       CPX: 0
[INFO  vcfdist 10:50:17]     Haplotype 2
[INFO  vcfdist 10:50:17]       REF: 0
[INFO  vcfdist 10:50:17]       SNP: 0
[INFO  vcfdist 10:50:17]       INS: 0
[INFO  vcfdist 10:50:17]       DEL: 0
[INFO  vcfdist 10:50:17]       CPX: 0
[INFO  vcfdist 10:50:17]  
[INFO  vcfdist 10:50:17]  
[INFO  vcfdist 10:50:17]   Contigs:
[INFO  vcfdist 10:50:17]     [ 0] NC_000962.3: 1 | 0 variants
[INFO  vcfdist 10:50:17]  
[INFO  vcfdist 10:50:17]   QUERY VCF overview:
[INFO  vcfdist 10:50:17]     TOTAL: 1
[INFO  vcfdist 10:50:17]     KEPT : 1
[INFO  vcfdist 10:50:17]  
[INFO  vcfdist 10:50:17] [T 0/8] Parsing TRUTH VCF 'tmp2.vcf'
[WARN  vcfdist 10:50:17] 'PS' tag not defined in TRUTH VCF header, assuming one phase set per contig
[INFO  vcfdist 10:50:17]   Genotypes:
[INFO  vcfdist 10:50:17]       1: 1
[INFO  vcfdist 10:50:17]  
[INFO  vcfdist 10:50:17]  
[INFO  vcfdist 10:50:17]   Variant types:
[INFO  vcfdist 10:50:17]     Haplotype 1
[INFO  vcfdist 10:50:17]       REF: 0
[INFO  vcfdist 10:50:17]       SNP: 1
[INFO  vcfdist 10:50:17]       INS: 0
[INFO  vcfdist 10:50:17]       DEL: 0
[INFO  vcfdist 10:50:17]       CPX: 0
[INFO  vcfdist 10:50:17]     Haplotype 2
[INFO  vcfdist 10:50:17]       REF: 0
[INFO  vcfdist 10:50:17]       SNP: 0
[INFO  vcfdist 10:50:17]       INS: 0
[INFO  vcfdist 10:50:17]       DEL: 0
[INFO  vcfdist 10:50:17]       CPX: 0
[INFO  vcfdist 10:50:17]  
[INFO  vcfdist 10:50:17]  
[INFO  vcfdist 10:50:17]   Contigs:
[INFO  vcfdist 10:50:17]     [ 0] NC_000962.3: 1 | 0 variants
[INFO  vcfdist 10:50:17]  
[INFO  vcfdist 10:50:17]   TRUTH VCF overview:
[INFO  vcfdist 10:50:17]     TOTAL: 1
[INFO  vcfdist 10:50:17]     KEPT : 1
[INFO  vcfdist 10:50:17]   Writing original query VCF to './small_test/orig-query.vcf'
[INFO  vcfdist 10:50:17]  
[INFO  vcfdist 10:50:17]   Checking contigs:
[INFO  vcfdist 10:50:17]     All contig checks passed!
[INFO  vcfdist 10:50:17]  
[INFO  vcfdist 10:50:17] [Q 1/8] Wavefront clustering QUERY VCF 'tmp1.vcf'
[INFO  vcfdist 10:50:17]  
[INFO  vcfdist 10:50:17] [Q 2/8] Realigning QUERY VCF 'tmp1.vcf'
[INFO  vcfdist 10:50:17]  
[INFO  vcfdist 10:50:17] [Q 3/8] Wavefront reclustering QUERY VCF 'tmp1.vcf'
[INFO  vcfdist 10:50:17]  
[INFO  vcfdist 10:50:17] [T 3/8] Wavefront clustering TRUTH VCF 'tmp2.vcf'
[INFO  vcfdist 10:50:17]  
[INFO  vcfdist 10:50:17] [4/8] Superclustering TRUTH and QUERY variants
[INFO  vcfdist 10:50:17]            Total superclusters: 1
[INFO  vcfdist 10:50:17]   Largest supercluster (bases): 3
[INFO  vcfdist 10:50:17]   Largest supercluster  (vars): 2
[INFO  vcfdist 10:50:17]   Average supercluster (bases): 3.000
[INFO  vcfdist 10:50:17]   Average supercluster  (vars): 2.000
[INFO  vcfdist 10:50:17]               QUERY phase sets: 1
[INFO  vcfdist 10:50:17]         QUERY phase block NG50: 4411532
[INFO  vcfdist 10:50:17]               TRUTH phase sets: 1
[INFO  vcfdist 10:50:17]         TRUTH phase block NG50: 4411532
[INFO  vcfdist 10:50:17]  
[INFO  vcfdist 10:50:17]   Sorting superclusters by size
[INFO  vcfdist 10:50:17]  
[INFO  vcfdist 10:50:17] [5/8] Calculating precision and recall
[INFO  vcfdist 10:50:17]   Superclusters using   0.000 to   1.000 GB RAM each ( 64 threads):        1
[INFO  vcfdist 10:50:17]   Superclusters using   1.000 to   2.000 GB RAM each ( 32 threads):        0
[INFO  vcfdist 10:50:17]   Superclusters using   2.000 to   4.000 GB RAM each ( 16 threads):        0
[INFO  vcfdist 10:50:17]   Superclusters using   4.000 to   8.000 GB RAM each (  8 threads):        0
[INFO  vcfdist 10:50:17]   Superclusters using   8.000 to  16.000 GB RAM each (  4 threads):        0
[INFO  vcfdist 10:50:17]   Superclusters using  16.000 to  32.000 GB RAM each (  2 threads):        0
[INFO  vcfdist 10:50:17]   Superclusters using  32.000 to  64.000 GB RAM each (  1 threads):        0
[INFO  vcfdist 10:50:17]     done with precision-recall
[INFO  vcfdist 10:50:17]  
[INFO  vcfdist 10:50:17] [6/8] Skipping distance metrics
[INFO  vcfdist 10:50:17]  
[INFO  vcfdist 10:50:17] [7/8] Phasing superclusters
[INFO  vcfdist 10:50:17]   Contigs:
[INFO  vcfdist 10:50:17]     [ 0] NC_000962.3: 0 switch errors, 0 flip errors, 1 phase blocks
[INFO  vcfdist 10:50:17]  
[INFO  vcfdist 10:50:17]              Total  phase blocks: 1
[INFO  vcfdist 10:50:17]              Total switch errors: 0
[INFO  vcfdist 10:50:17]              Total   flip errors: 0
[INFO  vcfdist 10:50:17]   Supercluster switch error rate: 0.000000%
[INFO  vcfdist 10:50:17]     Supercluster flip error rate: 0.000000%
[INFO  vcfdist 10:50:17]                Phase block  NG50: 0
[INFO  vcfdist 10:50:17]   (switch)     Phase block NGC50: 0
[INFO  vcfdist 10:50:17]   (switchflip) Phase block NGC50: 0
[INFO  vcfdist 10:50:17]  
[INFO  vcfdist 10:50:17]   Writing phasing summary to './small_test/phasing-summary.tsv'
[ERROR vcfdist 10:50:17] Cannot start an already-running timer (writing).

I get this even with a simple test vcf:

##fileformat=VCFv4.2
##source=Clair3
##clair3_version=1.0.4
##cmdline=/opt/bin/run_clair3.sh --bam_fn=aln.bam --ref_fn=h37rv_20231215.fa.gz --threads=4 --platform=ont --model_path=r941_prom_hac_g360_g422 --include_all_ctgs --no_phasing_for_fa --haploid_sensitive --enable_long_indel --sample_name=aln.bam --output=clair_out
##reference=/home/ubuntu/clair3_test/work/b3/70944cf0cf12a9e81fa8b66abba325/h37rv_20231215.fa.gz
##FILTER=<ID=PASS,Description="All filters passed">
##FILTER=<ID=LowQual,Description="Low quality variant">
##FILTER=<ID=RefCall,Description="Reference call">
##INFO=<ID=P,Number=0,Type=Flag,Description="Result from pileup calling">
##INFO=<ID=F,Number=0,Type=Flag,Description="Result from full-alignment calling">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ<20 or selected by 'samtools view -F 2316' are filtered)">
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">
##FORMAT=<ID=AF,Number=1,Type=Float,Description="Observed allele frequency in reads, for each ALT allele, in the same order as listed, or the REF allele for a RefCall">
##contig=<ID=NC_000962.3,length=4411532>
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	aln.bam
NC_000962.3	1977	.	A	G	25.50	PASS	P	GT:GQ:DP:AD:AF	1:25:78:0,75:0.9615

and

##fileformat=VCFv4.2
##source=Clair3
##clair3_version=1.0.4
##cmdline=/opt/bin/run_clair3.sh --bam_fn=aln.bam --ref_fn=h37rv_20231215.fa.gz --threads=4 --platform=ont --model_path=r941_prom_sup_g5014 --include_all_ctgs --no_phasing_for_fa --haploid_sensitive --enable_long_indel --sample_name=aln.bam --output=clair_out
##reference=/home/ubuntu/clair3_test/work/76/abd15184b468f56892711a46836a33/h37rv_20231215.fa.gz
##FILTER=<ID=PASS,Description="All filters passed">
##FILTER=<ID=LowQual,Description="Low quality variant">
##FILTER=<ID=RefCall,Description="Reference call">
##INFO=<ID=P,Number=0,Type=Flag,Description="Result from pileup calling">
##INFO=<ID=F,Number=0,Type=Flag,Description="Result from full-alignment calling">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ<20 or selected by 'samtools view -F 2316' are filtered)">
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">
##FORMAT=<ID=AF,Number=1,Type=Float,Description="Observed allele frequency in reads, for each ALT allele, in the same order as listed, or the REF allele for a RefCall">
##contig=<ID=NC_000962.3,length=4411532>
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	aln.bam
NC_000962.3	1977	.	A	G	36.28	PASS	F	GT:GQ:DP:AD:AF	1:36:78:0,75:0.9615

Fix incredibly rare edge case when assigning credit to variants at same position

When comparing PAV to T2T, there are 3 cases (out of 7 million variants) where adjacent INS+DEL variants cause issues with credit assignment. This happens at:

  • sc 124112, chr5:80,538,408
  • sc 77422, chr11:41,966,547
  • sc 1741, chr17:1,131,664

To view these errors, uncomment the "Non-zero edit distance..." warning on line 1235 of dist.cpp.

"terminate called after throwing an instance of 'std::out_of_range'"

Hi Tim

I got an error while running vcfdist on a DeepVariant output VCF.
Here is a reproducible example:

# Grab a chr20 subset for quick iteration:
bcftools view https://42basepairs.com/download/gs/deepvariant/case-study-outputs/1.5.0/PacBio/deepvariant.output.vcf.gz chr20 > deepvariant-small-sample.vcf

# Other inputs:
gs://deepvariant/case-study-testdata/GCA_000001405.15_GRCh38_no_alt_analysis_set.fa
gs://deepvariant/case-study-testdata/HG002_GRCh38_1_22_v4.2.1_benchmark_noinconsistent.bed
gs://deepvariant/case-study-testdata/HG002_GRCh38_1_22_v4.2.1_benchmark.vcf.gz


DV="deepvariant-small-sample.vcf"
TRUTH="HG002_GRCh38_1_22_v4.2.1_benchmark.vcf.gz"
TRUTH_REGIONS="HG002_GRCh38_1_22_v4.2.1_benchmark_noinconsistent.bed"
REF="GCA_000001405.15_GRCh38_no_alt_analysis_set.fa"
OUTPUT="vcfdist_on_deepvariant/"
mkdir -p "${OUTPUT}"

vcfdist "${DV}" "${TRUTH}" "${REF}" --bed "${TRUTH_REGIONS}" --prefix "${OUTPUT}" --verbosity 1  2>&1 | tee vcfdist_on_deepvariant.log

Output:

[INFO  vcfdist 13:40:31] Command: 'vcfdist deepvariant-small-sample.vcf /usr/local/google/home/marianattestad/commonly_used_deepvariant_files/HG002_GRCh38_1_22_v4.2.1_benchmark.vcf.gz /usr/local/google/home/marianattestad/commonly_used_deepvariant_files/GRCh38.no_alt_analysis_set.fa.gz --bed /usr/local/google/home/marianattestad/commonly_used_deepvariant_files/HG002_GRCh38_1_22_v4.2.1_benchmark_noinconsistent.bed --prefix vcfdist_on_deepvariant/ --verbosity 1'
[INFO  vcfdist 13:40:31]
[INFO  vcfdist 13:40:31] Parsing VCF 'deepvariant-small-sample.vcf'
[WARN  vcfdist 13:40:31] QUERY VCF variant overlap at chr20:1278060, skipping
[WARN  vcfdist 13:40:31] QUERY VCF variant overlap at chr20:1288626, skipping
[WARN  vcfdist 13:40:31] QUERY VCF variant overlap at chr20:1339612, skipping
[WARN  vcfdist 13:40:31] QUERY VCF variant overlap at chr20:2248990, skipping
[WARN  vcfdist 13:40:31] QUERY VCF variant overlap at chr20:3220062, skipping
...
[INFO  vcfdist 13:40:31]     Haplotype 2
[INFO  vcfdist 13:40:31]       REF  0
[INFO  vcfdist 13:40:31]       SNP  71148
[INFO  vcfdist 13:40:31]       INS  5920
[INFO  vcfdist 13:40:31]       DEL  5642
[INFO  vcfdist 13:40:31]       CPX  0
[INFO  vcfdist 13:40:31]
[INFO  vcfdist 13:40:31]   Contigs:
[INFO  vcfdist 13:40:31]       'chr20' hap 1: 29067 variants
[INFO  vcfdist 13:40:31]       'chr20' hap 2: 82710 variants
[INFO  vcfdist 13:40:31]
[INFO  vcfdist 13:40:31]   VCF Overview:
[INFO  vcfdist 13:40:31]     VARIANTS  311095
[INFO  vcfdist 13:40:31]     KEPT HAP1 29067
[INFO  vcfdist 13:40:31]     KEPT HAP2 82710
[INFO  vcfdist 13:40:31]
terminate called after throwing an instance of 'std::out_of_range'
  what():  unordered_map::at

I also got the same error on the full VCF, which was a .gz file with a .tbi index.

Unphased evaluation?

Hi, thank you for the lovely software and informative publication. Do you still plan to extend vcfdist to evaluate unphased VCFs, as you indicated in your paper? In my work, and I'm sure in that of many other groups, most reference-based SV calling is unphased and falls in tandem repeats, so it would be great to have an unphased evaluation method that handles the representation issues you mentioned.

Thanks

Compile error on Ubuntu

variant.cpp:384:13: error: 'isnan' was not declared in this scope; did you mean 'std::isnan'
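
The compiler's own hint points at the likely fix: call `std::isnan` and include `<cmath>`, since a global `isnan` is not guaranteed to be declared. The code at line 384 of variant.cpp is not shown here, so this is only a sketch of the pattern, with a hypothetical helper name (`is_missing_quality`):

```cpp
#include <cmath>  // declares std::isnan

// Hypothetical helper (not taken from variant.cpp): reports whether a
// parsed quality value is NaN.
bool is_missing_quality(float qual) {
    // Qualify the call with std:: instead of relying on a global isnan,
    // which some standard library headers do not provide.
    return std::isnan(qual);
}
```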

Contig not in reference FASTA or position out of range (generate_str)

I have hit this error

ERROR("Contig '%s' not in reference FASTA or position out of range (generate_str)", ctg.data());

Specifically, I get

[ERROR vcfdist 12:14:06] Contig 'plasmid_2' not in reference FASTA or position out of range (generate_str)

plasmid_2 is definitely in the reference FASTA, so my only guess is that there is some out-of-range or indexing problem.

Here is a tarball with the files used. They were run with v2.5.2 with the command

vcfdist BPH2947__202310.10x.bcftools.filter.vcf.gz truth.vcf.gz mutreference.fna --largest-variant 50 --credit-threshold 1.0 --realign-truth --realign-query -p BPH2947__202310.without_repetitive_regions. -b BPH2947__202310.unique_regions.bed -mx 234.985

test_data.tar.gz
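
The error message covers two different conditions (unknown contig, or position past the end of the contig), which makes reports like this one harder to diagnose. As a sketch only, and not vcfdist's actual `generate_str` logic, the two cases could be told apart as below. The names `RangeCheck` and `check_interval` are hypothetical, and the interval is assumed to be 0-based and half-open.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

enum class RangeCheck { Ok, UnknownContig, OutOfRange };

// Hypothetical check: `lengths` maps contig name -> sequence length, and
// [start, end) is a 0-based half-open interval on that contig.
RangeCheck check_interval(
        const std::unordered_map<std::string, std::size_t>& lengths,
        const std::string& ctg, std::size_t start, std::size_t end) {
    auto it = lengths.find(ctg);
    if (it == lengths.end()) return RangeCheck::UnknownContig;
    // The interval must be well-formed and must not run past the contig end.
    if (start > end || end > it->second) return RangeCheck::OutOfRange;
    return RangeCheck::Ok;
}
```

With a check like this, the log could state whether 'plasmid_2' was missing from the FASTA or whether a variant or BED coordinate exceeded its length.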

No logging colours if writing to file

If the log is being redirected to a file, it would be nice to disable colouring; otherwise the log file looks like this:

ESC[32m[INFO  vcfdist 09:35:06]ESC[0m ESC[35m[0/8] Loading reference FASTAESC[0m '/data/scratch/projects/punim2009/NanoVarBench/results/truth/ATCC_25922__202309/mutreference.fna'
ESC[32m[INFO  vcfdist 09:35:06]ESC[0m
ESC[32m[INFO  vcfdist 09:35:06]ESC[0m ESC[35m[Q 0/8] Parsing QUERY VCFESC[0m '/data/scratch/projects/punim2009/NanoVarBench/results/call/mutref/clair3/100x/simplex/v4.3.0/[email protected]/ATCC_25922__202309.100x.clair3.filter.vcf.gz'
ESC[33m[WARN  vcfdist 09:35:06]ESC[0m 'PS' tag not defined in QUERY VCF header, assuming one phase set per contig
ESC[32m[INFO  vcfdist 09:35:06]ESC[0m   Genotypes:
ESC[32m[INFO  vcfdist 09:35:06]ESC[0m       1: 8762
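
One common way to get this behaviour on POSIX systems is to emit ANSI colour codes only when the log stream is attached to a terminal. The sketch below assumes the log goes to stderr and uses hypothetical helper names (`stream_is_terminal`, `colorize`). It is not vcfdist's logging code.

```cpp
#include <cstdio>
#include <string>
#include <unistd.h>  // POSIX isatty

// True when the given stream is attached to a terminal (POSIX only).
// Returns false when output is redirected to a file or a pipe.
bool stream_is_terminal(std::FILE* stream) {
    return isatty(fileno(stream)) != 0;
}

// Hypothetical helper: wraps `text` in the given ANSI colour code
// (for example "32" for green) only when `use_color` is true.
std::string colorize(const std::string& text, const char* ansi_code,
                     bool use_color) {
    if (!use_color) return text;
    return std::string("\033[") + ansi_code + "m" + text + "\033[0m";
}
```

A logger could evaluate `stream_is_terminal(stderr)` once at start-up and pass the result to every `colorize` call, so redirected logs contain plain text only.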
