Comments (4)

martijnvermaat commented on July 28, 2024

https://github.com/alexpenson/scripts/blob/master/chr_ncbi_to_ucsc.sed

But there must be a better source (where we originally got them from?).

from mutalyzer2.

martijnvermaat commented on July 28, 2024

This is really a mess. It looks like we can more or less get what we want by first querying the UCSC Genome Browser database and selecting from ctgPos2 (GRC Map Contigs) every contig that starts at 0 (i.e., is the same as the complete chromosome it is located on):

$ mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg19
mysql> select * from ctgPos2 where chromStart = 0;
+----------------------+---------+-----------------------+------------+----------+------+
| contig               | size    | chrom                 | chromStart | chromEnd | type |
+----------------------+---------+-----------------------+------------+----------+------+
| HSCHR17_CTG1         |  296626 | chr17                 |          0 |   296626 | F    |
| HSCHR1_RANDOM_CTG5   |  106433 | chr1_gl000191_random  |          0 |   106433 | F    |
| HSCHR1_RANDOM_CTG12  |  547496 | chr1_gl000192_random  |          0 |   547496 | F    |
...

Now go to the GRC website, click on the assembly accession you want, and click "Download the full sequence report".

In that file, look up the contig we got from the UCSC database (e.g., HSCHR1_RANDOM_CTG5) and note the corresponding RefSeq accession number (e.g., NT_113878.1).
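The lookup described above could be scripted roughly as follows. This is only a sketch under assumptions: it assumes the downloaded sequence report is tab-separated with a commented header line containing columns named `Sequence-Name` and `RefSeq-Accn`, and that the (contig, chrom) pairs have already been exported from ctgPos2. The function names and the inline sample data are hypothetical; only the HSCHR1_RANDOM_CTG5 / NT_113878.1 pair comes from this thread.

```python
def parse_sequence_report(text):
    """Return {contig name: RefSeq accession} from a tab-separated report.

    Assumption: the column header is a comment line (starting with '#')
    that contains 'Sequence-Name' and 'RefSeq-Accn'.
    """
    header = None
    mapping = {}
    for line in text.splitlines():
        if line.startswith("#"):
            candidate = line.lstrip("# ").split("\t")
            if "Sequence-Name" in candidate:
                header = candidate
            continue
        if not line.strip() or header is None:
            continue
        row = dict(zip(header, line.split("\t")))
        mapping[row["Sequence-Name"]] = row["RefSeq-Accn"]
    return mapping


def couple(ucsc_rows, report):
    """Return {UCSC chrom name: RefSeq accession}.

    ucsc_rows is an iterable of (contig, chrom) pairs from ctgPos2.
    Contigs that are missing from the report are skipped.
    """
    return {chrom: report[contig]
            for contig, chrom in ucsc_rows
            if contig in report}


# Hypothetical, trimmed-down report containing only the two columns we need.
SAMPLE_REPORT = (
    "# Sequence-Name\tRefSeq-Accn\n"
    "HSCHR1_RANDOM_CTG5\tNT_113878.1\n"
)

# (contig, chrom) pairs taken from the ctgPos2 output above.
UCSC_ROWS = [
    ("HSCHR17_CTG1", "chr17"),
    ("HSCHR1_RANDOM_CTG5", "chr1_gl000191_random"),
]

couplings = couple(UCSC_ROWS, parse_sequence_report(SAMPLE_REPORT))
print(couplings)
```

Contigs without an entry in the report are dropped silently here; a real import would probably want to log them instead.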

martijnvermaat commented on July 28, 2024

@p.e.m.taschner What do you think? Is there an easier way to get the couplings between RefSeq accession numbers and UCSC chromosome names?

Here's an example of what we already have.

martijnvermaat commented on July 28, 2024

After discussing this:

The mappings on contigs that have no chromosome definition are bogus anyway, since there is no way to use them in the position converter (Mutalyzer will always first try to get the corresponding chromosome definition). So we decided to discard them for now.

At some point we should decide what to do with all contigs that are not part of the primary assembly. But currently, our user interface is not really suited for using them, so that needs some more thinking. It may really confuse users if we start adding more mappings on these non-primary assembly contigs.

By the way, the only transcript mappings we have on the non-primary assembly contigs come from our old UCSC imports. The current NCBI mapview import discards them completely. We won't fix this until we have decided how to handle these contigs and present them to the user.
