
chm13's Introduction

Telomere-to-telomere consortium CHM13 project

We have sequenced the CHM13hTERT human cell line with a number of technologies. Human genomic DNA was extracted from the cultured cell line. As the DNA is native, modified bases will be preserved. The data includes 30x PacBio HiFi, 120x coverage of Oxford Nanopore, 70x PacBio CLR, 50x 10X Genomics, as well as BioNano DLS and Arima Genomics HiC. Most raw data is available from this site, with the exception of the PacBio data which was generated by the University of Washington/PacBio and is available from NCBI SRA.

A UCSC browser hub is available for CHM13 and T2T-Primates. Track updates will be made to this hub until integrated into the UCSC Genome Browser for hs1. Legacy UCSC browsers are available for v2.0, v1.0 and v1.1 versions.

An interactive dotplot visualization of all genomic repeats is also available from resgen.io. Known issues identified in the assembly are tracked at CHM13 issues.

Latest assembly release

T2T-CHM13v2.0 (T2T-CHM13+Y)

Complete T2T reconstruction of a human genome with Y. The change from v1.1 is the addition of a finished chromosome Y from the GIAB HG002/NA24385 sample, sequenced by both GIAB and HPRC. This genome is also available at NCBI (GCA_009914755.4) and at UCSC. Note that even though the UCSC browser shows the GenBank accessions as sequence names on the browser itself, it can load annotations in BED/bigBed/BAM/CRAM/bigWig and other formats or search using the "chr1/2/etc" names.

Previous assembly releases are available below:

Downloads

Sequencing data

The sequencing dataset generated for CHM13 is available on this page.

Analysis set

The analysis set for using T2T-CHM13v2.0 (T2T-CHM13+Y) as a reference for mapping-based research is available at AWS with a README.

  • chm13v2.0.fa.gz: T2T-CHM13v2.0 assembly with sequences soft-masked using the repeat models discovered by the T2T team. The original sequence accession numbers are shown in the FASTA header.
  • chm13v2.0_noY.fa.gz: excluding the Y chromosome. This file only contains sequences derived from the CHM13 cell line and is identical to T2T-CHM13v1.1. Use this file for benchmarking assemblies of CHM13.
  • chm13v2.0_PAR.bed: pseudoautosomal regions (PARs)
  • chm13v2.0_maskedY.fa.gz: PARs on chrY hard masked to "N"
  • chm13v2.0_maskedY.rCRS.fa.gz: PARs on chrY hard masked to "N" and mitochondrion replaced with rCRS (AC:NC_012920.1)

Sep. 28 2022 update: all analysis-set fa.gz files have been re-compressed with bgzip. Index files are available at aws with updated md5s in the README.
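Because the md5s in the README changed with the re-compression, it can be worth checking a download against the published value. The following is a minimal sketch, assuming GNU `md5sum` is installed (on macOS, `md5 -q` would be the rough equivalent); `check_md5` is a hypothetical helper name, and the expected digest must be copied from the README.

```shell
# Hypothetical helper: compare a file's MD5 digest with an expected value.
# Assumes GNU coreutils md5sum is on the PATH.
check_md5() {
  file=$1
  expected=$2
  # md5sum prints "<digest>  <filename>"; keep only the digest.
  actual=$(md5sum "$file" | awk '{ print $1 }') || return 1
  if [ "$actual" = "$expected" ]; then
    echo "OK: $file"
  else
    echo "MISMATCH: $file (got $actual)" >&2
    return 1
  fi
}

# Example usage (digest placeholder must be replaced with the README value):
# check_md5 chm13v2.0.fa.gz "<md5 from the README>"
```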

Gene annotation

Repeat annotation

Epigenetic profile

Variant calls

Liftover resources

Non-syntenic region

Notes on downloading files

Files are generously hosted by Amazon Web Services under s3://human-pangenomics/T2T/CHM13 and through this web interface.

Although the files are available as straightforward HTTP links, download performance is improved by using the Amazon Web Services command-line interface. References should be amended to use the s3:// addressing scheme, i.e. replace https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/ with s3://human-pangenomics/T2T/ to download. For example, to download CHM13_prep5_S13_L002_I1_001.fastq.gz to the current working directory use the following command.

aws s3 --no-sign-request cp s3://human-pangenomics/T2T/CHM13/10x/CHM13_prep5_S13_L002_I1_001.fastq.gz .

or to download the full dataset use the following command.

aws s3 --no-sign-request sync s3://human-pangenomics/T2T/CHM13/ .
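The prefix replacement described above can be sketched as a small shell helper. This is only an illustration: `to_s3_uri` is a hypothetical function name, not part of the AWS CLI, and it handles only the HTTPS prefix quoted in this README.

```shell
# Hypothetical helper: rewrite an HTTPS link into the matching s3:// URI
# by swapping the prefix documented in this README.
to_s3_uri() {
  https_prefix="https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T"
  s3_prefix="s3://human-pangenomics/T2T"
  case "$1" in
    "$https_prefix"*)
      rest=${1#"$https_prefix"}        # strip the HTTPS prefix
      printf '%s\n' "${s3_prefix}${rest}"
      ;;
    *)
      printf '%s\n' "$1"               # leave any other input untouched
      ;;
  esac
}

# Example usage (requires the AWS CLI and network access):
# aws s3 --no-sign-request cp "$(to_s3_uri "$url")" .
```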

The s3 command can also be used to get information on the dataset, for example reporting the size of every file in human-readable format.

aws s3 --no-sign-request ls --recursive --human-readable --summarize s3://human-pangenomics/T2T/CHM13/ 

or to obtain technology-specific sizes.

aws s3 --no-sign-request ls --recursive --human-readable --summarize s3://human-pangenomics/T2T/CHM13/nanopore/fast5
aws s3 --no-sign-request ls --recursive --human-readable --summarize s3://human-pangenomics/T2T/CHM13/nanopore/rel2
aws s3 --no-sign-request ls --recursive --human-readable --summarize s3://human-pangenomics/T2T/CHM13/assemblies

Amending the max_concurrent_requests etc. settings as per this guide will improve download performance further.
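As a rough sketch of what such tuning might look like, the AWS CLI stores S3 transfer settings in its configuration, and `aws configure set` can change them for the default profile. The wrapper name `tune_s3_transfers` and the value 20 are assumptions for illustration only; the CLI default for `max_concurrent_requests` is 10, and the best value depends on your network and machine.

```shell
# Hypothetical wrapper: raise the number of parallel S3 requests used by
# "aws s3 cp" and "aws s3 sync" for the default profile.
tune_s3_transfers() {
  requests=${1:-20}   # illustrative value, not a recommendation from this README
  aws configure set default.s3.max_concurrent_requests "$requests"
}

# Example usage:
# tune_s3_transfers 32
```

The function only writes a configuration value; nothing is downloaded until one of the `aws s3` commands above is run.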

Contact

Please raise issues concerning this dataset on this GitHub repository.

Data reuse and license

All data is released to the public domain (CC0) and we encourage its reuse. We would appreciate it if you would acknowledge and cite the "Telomere-to-Telomere" (T2T) Consortium for the creation of this data. More information about our consortium can be found on the T2T homepage and a list of related citations is available below:

T2T-CHM13v2.0, datasets released along the v2.0 and the T2T-Y chromosome

The complete sequence of a human genome and companion papers (T2T-CHM13v0.9-v1.1):

  1. Nurk S, Koren S, Rhie A, Rautiainen M, et al. The complete sequence of a human genome. Science, 2022.
  2. Vollger MR, et al. Segmental duplications and their variation in a complete human genome. Science, 2022.
  3. Gershman A, et al. Epigenetic Patterns in a Complete Human Genome. Science, 2022.
  4. Aganezov S, Yan SM, Soto DC, Kirsche M, Zarate S, et al. A complete reference genome improves analysis of human genetic variation. Science, 2022.
  5. Hoyt SJ, et al. From telomere to telomere: the transcriptional and epigenetic state of human repeat elements. Science, 2022.
  6. Altemose N, et al. Complete genomic and epigenetic maps of human centromeres. Science, 2022.
  7. Wagner J, et al. Curated variation benchmarks for challenging medically relevant autosomal genes. Nat Biotechnol, 2022.
  8. McCartney AM, Shafin K, Alonge M, et al. Chasing perfection: validation and polishing strategies for telomere-to-telomere genome assemblies. Nat Methods, 2022.
  9. Formenti G, Rhie A, et al. Merfin: improved variant filtering, assembly evaluation and polishing via k-mer validation. Nat Methods, 2022.
  10. Jain C, et al. Long-read mapping to repetitive reference sequences using Winnowmap2. Nat Methods, 2022.
  11. Altemose N, Maslan A, Smith OK, et al. DiMeLo-seq: a long-read, single-molecule method for mapping protein–DNA interactions genome wide. Nat Methods, 2022.

Earlier citations:

  1. Vollger MR, et al. Improved assembly and variant detection of a haploid human genome using single-molecule, high-fidelity long reads. Annals of Human Genetics, 2019.
  2. Miga KH, Koren S, et al. Telomere-to-telomere assembly of a complete human X chromosome. Nature, 2020.
  3. Nurk S, Walenz BP, et al. HiCanu: accurate assembly of segmental duplications, satellites, and allelic variants from high-fidelity long reads. Genome Research, 2020.
  4. Logsdon GA, et al. The structure, function, and evolution of a complete human chromosome 8. Nature, 2021.

History

* rel1 and 2: 2nd March 2019. Initial release.
* asm v0.6 and canu rel2 assembly: 28th May 2019. Assembly update.
* Hi-C data added: 25th July 2019. Data update.
* asm v0.6 alignments of rel2 added: 30th Aug 2019. Data Update
* rel3: 16th Sept 2019. Data update.
* chrX v0.7, canu 1.9 and flye 2.5 rel3 assembly: 24th Oct 2019. Assembly update.
* shasta rel3 assembly: 20th Dec 2019. Assembly update.
* chr8 v3, rel4 data: 21 Feb 2020. Data and assembly update.
* update rel3 partition names since some tars included more than a single partition. 16 Apr 2020.
* add CLR/HiFi mappings to chrX v0.7. 8 May 2020.
* update partitions 23,28,30,53,55 and add 227-231 (data was missing from upload). 13 May 2020. Data update.
* add rel5 guppy 3.6.0 data: 4 Jun 2020. Data update.
* add chr8 v9. Aug 26 2020. Assembly update.
* add v0.9/v1.0 genome releases. Sept 22 2020. Assembly update.
* add v0.9/v1.0 alignment files. Sept 29 2020. Assembly update.
* add new UW data. Oct 6 2020. Data update.
* add rna-seq data. Dec 4 2020. Data update.
* add repeat and telomere annotations for v1.0. Dec 17 2020. Assembly annotation update.
* v1.1 assembly and related files. May 7 2021. Assembly update.
* v2.0 assembly and related files. Dec 2 2022. Assembly and annotation update.
* 1KGP variant calls for all chromosomes. Jan. 3 2023. Annotation update.
* 1KGP and SGDP bam / vcf released publicly on [AnVIL_T2T_CHRY](https://anvil.terra.bio/#workspaces/anvil-datastorage/AnVIL_T2T_CHRY). May 23, 2023. Data Update.
* 1KGP AF release. Jul 6 2023. Annotation update.
* Curated RefSeq/Liftoff v5.1 release. Jul 6 2023. Annotation update.

chm13's People

Contributors

aphillippy, arangrhie, erikhuck, matthewramsey, maximilianh, mschatz, skoren, snurk


chm13's Issues

Duplicate fast5 downloads

Hello and many thanks for sharing your data.

I'm currently rebasecalling the data using the latest methods and noticed that many of the fast5 downloads are duplicates of other partitions. Are there reads missing and, if so, is it possible to obtain them please?

I've confirmed the duplication by unpacking the files and comparing the reads. It's curious that the duplicate files have a different md5sum from the original; presumably the order in which the reads are packed in the file was not deterministic.

In all, I think there are the following equivalent partitions:

  • 97, 98
  • 145, 146, 147, 148
  • 149, 150
  • 151, 152
  • 153, 154
  • 155, 156, 157
  • 164, 165, 166
  • 167, 168
  • 169, 170, 171
  • 198, 199
  • 200, 201
  • 205, 206
  • 209, 210
  • 211, 212
  • 219, 220

You can approximately confirm the duplication by looking at the file sizes reported by S3:

aws s3 --no-sign-request ls --recursive  --summarize s3://nanopore-human-wgs/chm13/nanopore/fast5 | sort -gk 3 | cut -d ' ' -f 3,4 | rev |  uniq -c -f 1 | sort | rev
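A more readable variant of the same size-based check can be sketched with awk. This is an assumption-laden illustration rather than the command used in the issue: `group_by_size` is a hypothetical helper, it expects `aws s3 ls --recursive` output without `--human-readable` (date, time, size in bytes, key), and it assumes keys contain no spaces. Equal sizes only suggest duplication; the reads still have to be compared to confirm it.

```shell
# Hypothetical helper: read "aws s3 ls --recursive" lines on stdin and print
# every byte size shared by more than one key, followed by those keys.
group_by_size() {
  awk 'NF >= 4 && $3 ~ /^[0-9]+$/ {      # skip summary lines and blanks
         count[$3]++
         names[$3] = names[$3] " " $4
       }
       END {
         for (s in count)
           if (count[s] > 1) print s ":" names[s]
       }' | sort
}

# Example usage (requires the AWS CLI and network access):
# aws s3 --no-sign-request ls --recursive s3://nanopore-human-wgs/chm13/nanopore/fast5 | group_by_size
```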

Centromere PFGE Southern blot validation

In your article, the centromere was validated by PFGE Southern blotting. Which company performed this experiment? We would like to run the same experiment for our project in the future. Thanks!

BAM files

Hey, it's really amazing that you made all this data available! I was just curious if you happened to have alignments (BAM or CRAM) to the latest v1.0 assembly?

CHM13

Dear author,
I'd like to ask: when you assembled the centromere, which data pool was used? I want to reproduce this article.

BSGenome package

Hi,
Is it possible to create a BSgenome package for CHM13 v1.0 and v1.1, which will make CHM13 accessible to many R packages?
Thanks.

T2T v1.0 bed file

Hello,

Do you have a bed file or coordinates for v1.0 assembly? I am filtering telomere enriched LR sequences mapped to v1.0. Thank you.

-Todd

hg38 and chm13 1.1 liftovers?

Are there chain files somewhere to go from:
hg38 > chm13 1.1
chm13 1.1 > hg38
?

The only way I saw to achieve this was to go from hg38 > chm13 1.0 > chm13 1.1. It'd be preferred to just go directly to 1.1

Thanks!

Homopolymer compression?

Hi, Has the CHM13 v1 reference undergone homopolymer compression? What is the recommended way to align PacBio Hifi reads to this reference? Are standard pbmm2 settings ok?

Less data in rel7 than rel6?

I've downloaded both the rel6 and rel7 fastq files. The rel6 fastq is 352GB in size while the rel7 fastq is only 100GB in size. Can I check if data is missing in the rel7 fastq files?

Bionano Saphyr BNX and CMAP

I have a few questions about these files

  • Does the BNX file contain all the data necessary to create a truly complete Bionano optical genome mapping version of the CHM13 reference genome? Meaning will it contain telomere to telomere structural variant (SV) information (all the SVs in the genome from all the chromosomes)?
  • Does the BNX file contain data from the Y chromosome as well?
  • How was the cmap created? Was it created from the BNX file using bionano's software (bionano-solve or bionano access)? Can it be used as a reference for aligning other cmap files from samples processed using bionano's saphyr instrument?

CCS reads using CHM13 assembly

To whom it may concern,

I am hoping to use the CHM13 draft genome and the CCS reads to do some benchmarking of germline mutation calls. I was wondering which CCS reads were used for the draft v1.0 genome described in the post.

I found the following CCS reads in the SRA:

CHM13-CCS-20kb-m64062_190806_063919
CHM13-CCS-20kb-m64062_190803_042216
CHM13-CCS-15kb-m64062_190807_194840
CHM13-CCS-15kb-m64062_190804_172951
CHM13-CCS-11kb-m64015_190225_155953
CHM13-CCS-11kb-m64011_190228_190319
CHM13-CCS-11kb-m64015_190221_025712
CHM13-CCS-11kb-m64015_190224_013150

  1. The X chromosome described in the paper Telomere-to-telomere assembly of a complete human X chromosome seems to be assembled from ONT ultra-long reads and 70X CLR reads
  2. The draft genome described in the paper Improved assembly and variant detection of a haploid human genome using single‐molecule, high‐fidelity long reads is assembled from 11kb CCS reads at 24X sequence coverage

I was not sure which HiFi reads were used for the draft v1.0 genome construction, and I was wondering if someone could point out which HiFi reads were not used in the assembly and polishing process.

Best,
Sangjin

Rel3 Ultra-long vs regular ONT

Hi Sergey,

Do you have fastqs for the ultra long data and the regular ont data as two separate datasets, or a way to figure that out from the rel3 fastq.gz?

I am trying to build length distributions for the different datasets.

Thanks!
Mitchell

hg38 to CHM13 LiftOver and chr15 seg dup

Hoping someone might be able to help in interpretation of a hg38 seg dup and how this sequence is represented in CHM13. We have two variants of interest at chr15:82229032 and chr15:82241415 (GRCh38) - both of which are within a known segmental duplication on chr15 (in GRCh38) https://genome.ucsc.edu/cgi-bin/hgc?hgsid=1302038497_dN3q1vp2PMvkQkstUHwVIwAzCuA3&db=hg38&c=chr15&l=82130232&r=82262734&o=82190662&t=82270742&g=genomicSuperDups&i=chr15%3A84066812

We lifted these variants over using the hg38.t2t-chm13-v1.0.over.chain.gz chain to lift hg38 to CHM13 v1 and then the v1.0_to_v1.1.chain chain to lift CHM13 v1 to v1.1. Interestingly, with this approach, both coordinates lift over to the same position in CHM13v1.1 at chr15:80105532. One interpretation we considered was that the two seg dups in GRCh38 have been consolidated/collapsed into one in CHM13v1.1. Is that the correct interpretation? Could you confirm that chr15:80105532 (CHM13v1.1) is the correct liftover coordinate for both chr15:82229032 and chr15:82241415 (GRCh38) positions?

Additionally, when aligning our reads to CHM13v1.1 we don't see evidence for the chr15:82241415 (GRCh38) variant (at the same position, according to liftover). We interpreted this as the variant-supporting reads being (properly?) aligned to a different region of the CHM13v1.1 reference, instead of being (improperly?) aligned to the chr15:82241415 region where they supported a variant call. Would you say that's the correct interpretation?

Any insights here would be greatly appreciated! Thanks so much!
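The two-step liftover described in this issue could be scripted roughly as follows. This is a sketch under stated assumptions: the UCSC `liftOver` tool is installed, both chain files named in the issue are in the current directory, and `lift_hg38_to_v1_1` plus the `.unmapped_step1`/`.unmapped_step2` file suffixes are hypothetical names chosen for illustration.

```shell
# Hypothetical wrapper: lift a BED file from hg38 to CHM13 v1.0, then from
# v1.0 to v1.1, keeping the unmapped records from each step for inspection.
lift_hg38_to_v1_1() {
  in_bed=$1
  out_bed=$2
  tmp_bed=$(mktemp) || return 1
  # liftOver usage: liftOver oldFile map.chain newFile unMapped
  liftOver "$in_bed" hg38.t2t-chm13-v1.0.over.chain.gz "$tmp_bed" "$out_bed.unmapped_step1" &&
    liftOver "$tmp_bed" v1.0_to_v1.1.chain "$out_bed" "$out_bed.unmapped_step2"
  status=$?
  rm -f "$tmp_bed"
  return $status
}

# Example usage:
# lift_hg38_to_v1_1 variants.hg38.bed variants.chm13v1.1.bed
```

Checking the two unmapped files is one way to see whether a position was dropped at a step rather than merged onto another coordinate.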

Access to ONT fasta format reads?

Hello,
The fast5 files for the ONT data are very large.
Is there any access to all the ONT reads, or to the long ONT reads (99 Gbp of data in reads >50 kbp, 32x), in "fasta.gz" format?
That would make the data much more convenient to download, and assembly or SV detection usually does not need the quality information.

Thanks!

rel2.fastq.gz md5

Hi @skoren

Looks like the md5 hash for rel2.fastq.gz is written incorrectly. It is 35 characters in the README file. The first 3 characters seem to be a mistake (26s), the remainder is what I get when I calculate the hash (7e3f4ded02d500a3db0c76c84cdc42b9)
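A quick format check catches this kind of slip, since an MD5 digest is always 32 hexadecimal characters. The helper below is a minimal sketch; `is_md5_hex` is a hypothetical name, and it only validates the shape of the string, not whether it matches any file.

```shell
# Hypothetical helper: succeed only if the argument looks like an MD5 digest
# (exactly 32 hexadecimal characters).
is_md5_hex() {
  printf '%s\n' "$1" | grep -Eq '^[0-9a-fA-F]{32}$'
}

# Example usage:
# is_md5_hex 7e3f4ded02d500a3db0c76c84cdc42b9 && echo "looks like an MD5 digest"
```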

Y chromosome

Hi,

are you aiming to integrate an existing (or newly assembled) Y chromosome at some point to make this a general purpose genome? Or is this not planned at present ?

I almost used this in a metagenomics reference sequence collection (i.e. for excluding non-microbial reads from the patient) but only just noticed that the Y chromosome was missing.

Thanks for your efforts.
Colin

cannot get accurate protein sequences from the gff file

I tried to extract the CDS sequences from the GFF file.

gffread -g chm13.draft_v1.1.fasta -x cds.fa chm13.draft_v1.1.gene_annotation.v4.gff3

However, when I try to translate the CDS to proteins, the open reading frame is incorrect for many sequences. Is there a way to download the predicted protein sequences?

UCSC genome browser hub

Is there a UCSC genome browser hub for CHM13-v1? That would make it easy to start doing easy liftovers, adding tracks, etc. Thanks.

PacBio HIFI raw data

Hello, please let me thank you first for this great effort and releasing publicly all data !
I was curious though concerning the PacBio HIFI raw data as I could not find the publishing directory of these.
Are only the CCS reads available or is there a possibility to access as well the raw sub-reads instead ?
Cheers

Visualize annotations

Good morning,
I would like to know how I can display the latest assembly, including the gene annotations, on my laptop. Which program do you use?
Thanks a lot.

Coordinate conversion in UCSC browser

Hi,
It would be super useful to activate both the View->In other genome and the Liftover tools in the v1.0 and v1.1 UCSC browsers, both between them and to hg19/hg38.

Many users will need very quick conversions of coordinates, without having to run the liftover command line tools with chain files.

Thanks.

canu multi platform assembly parameters

Hi,

The bioRxiv paper says Canu was used for both data types (Nanopore and PacBio).

It seems that the authors ran Canu on a combination of PB and ONT data with the parameters (genomeSize=3.1g corMhapSensitivity=normal ovlMerThreshold=500 correctedErrorRate=0.085 trimReadsCoverage=2 trimReadsOverlap=500 -pacbio-raw)

Why didn't they run Canu with the multi-platform parameters (-pacbio-raw -ont-raw)?

If I have a misunderstanding, please let me know.

Thanks a lot,
Changhan

bgzipped genome fasta

Just a suggestion, feel free to ignore and close this issue but it might be helpful to provide the gzipped fasta files (e.g. chm13.draft_v1.0.fasta.gz) as bgzipped files. This will make it possible to use with bioinformatics tools requiring random access to the files.

GFF version 2.0 lacking mitochondrial encoded genes

Hello,

The GFF version 2.0 seems to be lacking mitochondrial encoded genes. Will you be releasing a new annotation including those?

Alternatively, would it be safe to assume that the coordinates for MT genes have not changed? If so I could copy them from the GRCh38 reference annotation in the meantime.

Thanks,
Bernardo

Problem using the provided GFF3 annotations

I'm having trouble using the provided GFF3 annotations from https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/assemblies/annotation/chm13.draft_v1.1.gene_annotation.v4.gff3.gz

Here's what I'm doing:
bcftools csq $VCF -f chm13.draft_v1.1.fasta -g chm13.draft_v1.1.gene_annotation.v4.gff3.gz -O z -o $VCF.csq.vcf.gz

And here's my output:

Parsing chm13.draft_v1.1.gene_annotation.v4.gff3.gz ...
ignored: chr1	CAT	gene	14253	21099	.	+	.	source_gene_common_name=AC114498.1;source_gene=ENSG00000235146.2;gene_biotype=lncRNA;gene_alternate_contigs=chr6:172104635-172111468;gene_id=CHM13_G0000001;gene_name=AC114498.1;transcript_modes=transMap;ID=CHM13_G0000001;Name=AC114498.1;source_transcript=N/A;alternative_source_transcripts=N/A;collapsed_gene_ids=N/A;collapsed_gene_names=N/A;paralogy=N/A;unfiltered_paralogy=N/A;alignment_id=N/A;frameshift=N/A;exon_anotation_support=N/A;intron_annotation_support=N/A;transcript_class=N/A;valid_start=N/A;valid_stop=N/A;proper_orf=N/A;extra_paralog=False
ignored: chr1	CAT	transcript	14253	21099	8940	+	.	source_transcript=ENST00000423796.1;source_transcript_name=AC114498.1-201;source_gene=ENSG00000235146.2;transcript_modes=transMap;gene_biotype=lncRNA;transcript_biotype=lncRNA;alignment_id=ENST00000423796.1-1;frameshift=nan;exon_annotation_support=1,1;intron_annotation_support=1;transcript_class=ortholog;valid_start=True;valid_stop=True;adj_start=nan;adj_stop=nan;proper_orf=True;level=2;transcript_support_level=5;tag=not_best_in_genome_evidence,basic;havana_gene=OTTHUMG00000002329.1;havana_transcript=OTTHUMT00000006707.1;paralogy=nan;unfiltered_paralogy=ENST00000423796.1-2;gene_alternate_contigs=chr6:172104635-172111468;source_gene_common_name=AC114498.1;transcript_id=CHM13_T0000001;gene_id=CHM13_G0000001;Parent=CHM13_G0000001;transcript_name=AC114498.1-201;ID=CHM13_T0000001;Name=AC114498.1;gene_name=CHM13_G0000001;alternative_source_transcripts=N/A;collapsed_gene_ids=N/A;collapsed_gene_names=N/A;extra_paralog=False
[csq.c:715 gff_id_parse] Could not parse the line, "Parent=transcript:" not present: chr1	CAT	exon	14253	14325	.	+	.	source_transcript=ENST00000423796.1;source_transcript_name=AC114498.1-201;source_gene=ENSG00000235146.2;transcript_modes=transMap;gene_biotype=lncRNA;transcript_biotype=lncRNA;alignment_id=ENST00000423796.1-1;exon_annotation_support=1,1;intron_annotation_support=1;transcript_class=ortholog;valid_start=True;valid_stop=True;adj_start=nan;adj_stop=nan;proper_orf=True;level=2;transcript_support_level=5;tag=not_best_in_genome_evidence,basic;havana_gene=OTTHUMG00000002329.1;havana_transcript=OTTHUMT00000006707.1;paralogy=nan;unfiltered_paralogy=ENST00000423796.1-2;gene_alternate_contigs=chr6:172104635-172111468;source_gene_common_name=AC114498.1;transcript_id=CHM13_T0000001;gene_id=CHM13_G0000001;Parent=CHM13_T0000001;transcript_name=AC114498.1-201;ID=exon:CHM13_T0000001:0;Name=AC114498.1;rna_support=N/A;reference_support=True;gene_name=CHM13_G0000001;alternative_source_transcripts=N/A;collapsed_gene_ids=N/A;collapsed_gene_names=N/A;frameshift=N/A;extra_paralog=False

Any guidance would be much appreciated! Thanks!

hg38 liftover to v1.1

Hello,

What is the recommended method to lift over an hg38 panel targets BED file to v1.1? I see the chain file for v1.0 to v1.1 but nothing for hg38. Look forward to trying!

Thanks

Is there a downloadable GTF/GFF file?

Hello T2T team,

Congratulations on the finished genome - this is an absolutely staggering achievement!

Is there a link to the annotation used in the paper in GTF or GFF format? I've found the UCSC browser page, but the gene tracks there are very numerous and it's not clear how to download them.

Thank you in advance!

About 5 rDNA arrays assembly

In your assembly summary, only the 5 rDNA arrays remain unfinished. How does your team plan to assemble those unfinished regions in the future?

CHM13 cytoband positions for building an ideogram

Hello,

Thanks for all your great work in building the CHM13 assembly! I would like to plot some of my results of v2.0 using karyoploteR, and I am able to build a custom ideogram, but I cannot find a file with the positions of the cytobands. I see the centromere and telomere files for v1.1 on the UCSC table browser, but I was wondering if there is a BED file with all of the cytoband positions?

Thanks,
Mike

CHM13 to grch37/grch38 liftOver

Dear author,

I was wondering whether the T2T consortium could provide an over.chain file for liftOver of coordinates from GRCh37/GRCh38 to the CHM13 v1 draft genome and vice versa. I could create the file based on lastz alignments, but I thought a publicly available one would be useful for others as well.

Regards,
Sangjin

File sizes on files?

Please put file sizes next to each file, or a summary per section. This helps with planning storage requirements before downloading.

Sequencing platform for CHM13 dataset

Hi,

I understand that the CHM13 nanopore datasets were generated at 4 different sites with PromethION sequencing done at UCD (runs 225 and 226), and MinION/GridION was presumably done at the other 3 sites (NHGRI, U of Nottingham, and UW).

From the T-to-T consortium paper (https://www.nature.com/articles/s41586-020-2547-7#Sec6), it was mentioned that "Most sequencing was performed on the Nanopore GridION with FLO-MIN106 or FLO-MIN106D R9 flow cells, with the exception of one Flongle flow cell used for testing."

Can I confirm if the GridION was used to generate the datasets from NHGRI, U of Nottingham, and UW? Alternatively, were some of the runs generated with a MinION? If so, is there a way for us to tell which runs these are?

Thanks for your help!

Centromere locations

Hi, Are you able to upload a BED file of centromere locations? You have one for telomeres, but centromeres would also be useful. Thanks.
