Comments (9)

martijnvermaat commented on July 28, 2024

Hi @jemieeffendy,

I assume you mean the transcript mapping database (i.e., what's used for the Position Converter) and you know about the documentation: https://mutalyzer.readthedocs.org/en/latest/admin.html#managing-genome-assemblies

The database on mutalyzer.nl has been building up over many years, originally being synced from the UCSC Genome Browser database for hg18 and hg19, and more recently imported from NCBI mapview files.

Just importing the NCBI mapview files as described in the documentation will unfortunately not get you the exact same contents, since older transcripts (and transcript versions) are not in the most recent mapview files.

You can still import on a per-transcript basis from the UCSC database (which often contains transcripts that are not (yet) in the mapview data).

If these are not enough for you, we could make a dump of the mutalyzer.nl database. Please let me know.

from mutalyzer2.

jemieeffendy commented on July 28, 2024

Hi,
Yes, that is exactly right. I tried downloading the sources from NCBI, but some of my integration tests failed. Would you mind providing the same database dump that you use?
As for syncing the cache: I have read the docs carefully, but they do not explain what I have to do after downloading the cache file. Do you think syncing the cache will help me sync the database? Thanks for your response, I really appreciate it.
Regards,
Jemie Effendy

martijnvermaat commented on July 28, 2024

Do you mean syncing the cache as described in this section?

https://mutalyzer.readthedocs.org/en/latest/admin.html#synchronizing-the-cache-with-other-installations

That refers to a real cache of reference sequences and is not related to the Position Converter. It's probably not relevant for you.

I can provide you with a dump of our database; we just have to agree on the format. Do you use PostgreSQL in your installation? If so, a PostgreSQL-level dump would probably be easiest.

jemieeffendy commented on July 28, 2024

The API calls I use in my integration tests are numberconverter and mapping info, which may be related to the Position Converter. What is the actual difference between the reference sequences and the Position Converter?
Yes, I am using PostgreSQL as recommended. Would it be a problem that I have not installed Mutalyzer in a virtual environment as suggested?
Regards,
Jemie Effendy

martijnvermaat commented on July 28, 2024

I have put up a dump of our mapping database here: https://mutalyzer.nl/static/mappings.2015-12-16.sql.gz

It contains the assemblies, chromosomes, and transcript_mappings tables. You can use it as follows:

wget https://mutalyzer.nl/static/mappings.2015-12-16.sql.gz
gunzip -c mappings.2015-12-16.sql.gz | psql mutalyzer

This will drop these three tables, recreate them, and import the dumped data. So to be clear, this will overwrite any data you might already have in there!

Please download the file soon-ishly, since it will be removed on our next Mutalyzer deployment.
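Since the import overwrites existing tables, it may be worth checking the downloaded file before piping it into psql. The following is only a minimal sketch and not part of Mutalyzer: `check_dump` is a hypothetical helper that verifies the gzip archive is intact and lists the `DROP TABLE` statements it contains, assuming the dump is plain-text SQL as the psql command above implies.

```shell
#!/bin/sh
# check_dump is a hypothetical helper, not a Mutalyzer command.
# It fails if the file is missing, empty, or not an intact gzip archive,
# and otherwise prints every DROP TABLE statement found in the dump.
check_dump() {
    dump="$1"
    if [ ! -s "$dump" ]; then
        echo "missing or empty file: $dump" >&2
        return 1
    fi
    if ! gzip -t "$dump" 2>/dev/null; then
        echo "not an intact gzip archive: $dump" >&2
        return 1
    fi
    # List the tables the import would drop. grep exits non-zero when it
    # finds nothing, which is not treated as an error here.
    gunzip -c "$dump" | grep -i '^DROP TABLE' || true
}

# Possible usage (database name "mutalyzer" as in the command above):
#   check_dump mappings.2015-12-16.sql.gz
#   pg_dump -t assemblies -t chromosomes -t transcript_mappings mutalyzer | gzip > backup.sql.gz
#   gunzip -c mappings.2015-12-16.sql.gz | psql mutalyzer
```

The commented pg_dump line keeps a copy of the three tables named above, in case the previous contents need to be restored later.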

As for your other question: when the Mutalyzer Name Checker runs, it downloads a copy of the reference sequence (e.g., NM_001234.5) from the NCBI. This file is cached for faster processing next time. That cache is totally redundant from a correctness point of view, so you don't need to sync your server with it. If something is not in this cache, it will be added transparently and automatically.
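To illustrate why such a cache never needs syncing, here is a rough sketch of a read-through file cache. It is not Mutalyzer's actual code: `cached_reference` and `fetch_reference` are hypothetical names, and `fetch_reference` stands in for whatever performs the real download and must be supplied by the caller.

```shell
#!/bin/sh
# Illustration only: a read-through file cache, not Mutalyzer's own code.
# fetch_reference is a hypothetical stand-in for the download from the NCBI;
# it must write the reference sequence for the accession in $1 to stdout.
cached_reference() {
    accession="$1"
    cache_dir="$2"
    path="$cache_dir/$accession"
    if [ ! -s "$path" ]; then
        # Cache miss: download to a temporary file first so that a failed
        # download never leaves a partial entry in the cache.
        mkdir -p "$cache_dir" || return 1
        if fetch_reference "$accession" > "$path.tmp"; then
            mv "$path.tmp" "$path"
        else
            rm -f "$path.tmp"
            return 1
        fi
    fi
    # Hit or freshly filled: either way the caller gets the cached path.
    printf '%s\n' "$path"
}
```

Deleting an entry from the cache directory only costs one extra download on the next request, which is the sense in which the cache is redundant for correctness.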

The data I dumped for you just now contains mappings of transcripts to chromosomes, so basically a big set of coordinates.

jemieeffendy commented on July 28, 2024

Hi Martijn,
Thanks a lot for your help, I really appreciate it. I will do it tomorrow, as it is early in the morning here. Is there any way for me to get your data again after you have updated your remote database?
Regards,
Jemie

martijnvermaat commented on July 28, 2024

> I will do it tomorrow as it is early in the morning this time.

The file will be there for a while, just not months.

> Is there any way for me to get your data if you have updated your remote one?

Not at the moment; I had to do this manually. But really the only thing we do to update our database from now on is import the mapview file when I see a new one is available. Then I'll update this gist, and you can repeat the command to get the exact same update.

jemieeffendy commented on July 28, 2024

I have downloaded the dump and loaded it into my database. It works really well with my integration tests. Heaps of thanks. You really have made my day! :)
Previously I followed the instructions in your gist, but it turned out that I still had to download another map from NCBI. It would be great if you could provide a better way to sync with your remote database, so that people like me can use a local Mutalyzer properly. Still, I must admit you have done an amazing job.
Thanks so much for your help.
Regards,
Jemie

martijnvermaat commented on July 28, 2024

Ideally it would be possible to easily get all the data you need from the original sources; we don't really want to become a data broker. I realise, however, that in practice it could be more convenient for others if we just provided this dump in a more automated manner.

One of the problems is that some transcripts (or transcript versions) are not in any of the NCBI mapview files (very old or very new transcripts), and some never will be (very old transcripts). In that sense, the contents of our own database, accumulated over the years from different sources, are valuable. This is definitely on our radar, thanks for letting us know your opinion.

I'm closing this ticket, feel free to let us know if you have further questions.
