Comments (9)
Hi @jemieeffendy ,
I assume you mean the transcript mapping database (i.e., what's used for the Position Converter) and that you know about the documentation: https://mutalyzer.readthedocs.org/en/latest/admin.html#managing-genome-assemblies
The database on mutalyzer.nl has been building up over many years, originally being synced from the UCSC Genome Browser database for hg18 and hg19, and more recently imported from NCBI mapview files.
Just importing the NCBI mapview files as described in the documentation will unfortunately not get you the exact same contents, since older transcripts (and transcript versions) are not in the most recent mapview files.
You can still import on a per-transcript basis from the UCSC database (which often contains transcripts that are not (yet) in the mapview data).
If these are not enough for you we could make a dump of the mutalyzer.nl database. Please let me know.
from mutalyzer2.
Hi,
Yep, that is exactly right. I tried downloading the sources from NCBI, but when I ran my integration tests, some of them failed. Would you mind providing me with a dump of the database that you use?
As for syncing the cache: I have read the docs, but they do not explain what I have to do after downloading the cache file. Do you think syncing the cache will help me sync the database? Thanks for your response, I really appreciate it.
Regards,
Jemie Effendy
from mutalyzer2.
Do you mean syncing the cache as described in this section?
That refers to a real cache of reference sequences and is not related to the Position Converter. It's probably not relevant for you.
I can provide you with a dump of our database; we just have to agree on the format. Do you use PostgreSQL in your installation? Then a PostgreSQL-level dump would be easiest, I guess.
from mutalyzer2.
The API calls that I use in my integration tests are number converter and mapping info, which may be related to the Position Converter. What is actually the difference between the reference sequence cache and the Position Converter?
Yep, I am using PostgreSQL as recommended. Would it be a problem if I do not install Mutalyzer in a virtual env as suggested?
Regards,
Jemie Effendy
from mutalyzer2.
I have put up a dump of our mapping database here: https://mutalyzer.nl/static/mappings.2015-12-16.sql.gz
It contains the assemblies, chromosomes, and transcript_mappings tables. You can use it as follows:
wget https://mutalyzer.nl/static/mappings.2015-12-16.sql.gz
gunzip -c mappings.2015-12-16.sql.gz | psql mutalyzer
This will drop these three tables, recreate them, and import the dumped data. So to be clear, this will overwrite any data you might already have in there!
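Because the import overwrites existing data, a more cautious variant of the two commands above could look like the sketch below. It checks the archive, backs up the three tables, and then imports inside a single transaction. The function names, the backup file name, and the `DRY_RUN` switch are hypothetical; the database name and table names are the ones mentioned in this thread, and the sketch assumes the dump is a plain SQL script without its own transaction control.

```shell
#!/bin/sh
# Hypothetical helper, not part of Mutalyzer.
DUMP="${DUMP:-mappings.2015-12-16.sql.gz}"
DB="${DB:-mutalyzer}"
BACKUP="${BACKUP:-mappings.backup.sql}"

# Execute a command line, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$1"
    else
        sh -c "$1"
    fi
}

safe_import() {
    # 1. Verify that the download is a complete gzip archive.
    run "gzip -t '$DUMP'" &&
    # 2. Back up the three tables that the import will drop.
    run "pg_dump -t assemblies -t chromosomes -t transcript_mappings -f '$BACKUP' '$DB'" &&
    # 3. Import in one transaction and stop at the first error,
    #    so a failed import leaves the existing tables untouched.
    run "gunzip -c '$DUMP' | psql -v ON_ERROR_STOP=1 --single-transaction -f - '$DB'"
}
```

Running `DRY_RUN=1 safe_import` prints the three commands without touching the database; calling `safe_import` on its own executes them.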
Please download the file soon-ishly, since it will be removed on our next Mutalyzer deployment.
As for your other question: when the Mutalyzer Name Checker runs, it downloads a copy of the reference sequence (e.g., NM_001234.5) from the NCBI. This file is cached for faster processing next time. That cache is totally redundant from a correctness point of view, so you don't need to sync your server with it. If something is not in this cache, it will be added transparently and automatically.
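The caching behaviour described here is a plain look-aside cache. The sketch below only illustrates that idea and is not Mutalyzer's actual implementation: the cache directory, the `.gb` file suffix, and both function names are assumptions, and `fetch_reference` is a stand-in for the real download from the NCBI.

```shell
#!/bin/sh
# Illustrative look-aside cache; all names here are hypothetical.
CACHE_DIR="${CACHE_DIR:-./cache}"

# Stand-in for the real download of a reference sequence.
fetch_reference() {
    printf 'record for %s\n' "$1"
}

# Print the record for an accession, fetching it only on a cache miss.
get_reference() {
    accession="$1"
    path="$CACHE_DIR/$accession.gb"
    if [ ! -f "$path" ]; then
        mkdir -p "$CACHE_DIR"
        # Write to a temporary file first so that a failed fetch
        # never leaves a partial record in the cache.
        fetch_reference "$accession" > "$path.tmp" && mv "$path.tmp" "$path" || return 1
    fi
    cat "$path"
}
```

Deleting the cache directory changes nothing about the results; the next call simply fetches the record again, which is why the cache does not need to be synced between servers.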
The data I dumped for you just now contains mappings of transcripts to chromosomes, so basically a big set of coordinates.
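To get a feel for the size of that set after importing, one could count the rows in the three tables. This is only a sketch: the table names come from the dump description in this thread, while the `count_sql` helper and the `DB` variable are hypothetical.

```shell
#!/bin/sh
DB="${DB:-mutalyzer}"

# Print one row-count query per table contained in the dump.
count_sql() {
    for table in assemblies chromosomes transcript_mappings; do
        printf "SELECT '%s' AS table_name, count(*) FROM %s;\n" "$table" "$table"
    done
}

# Usage (needs a running PostgreSQL with the dump imported):
#   count_sql | psql "$DB"
```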
from mutalyzer2.
Hi Martijn,
Thanks a lot for your help, I really appreciate it. I will do it tomorrow, as it is early in the morning here. Is there any way for me to get your data again once you have updated your remote database?
Regards,
Jemie
from mutalyzer2.
> I will do it tomorrow as it is early in the morning this time.
The file will be there for a while, just not months.
> Is there any way for me to get your data if you have updated your remote one?
Not at the moment; I had to do this manually. But really the only thing we do to update our database from now on is import the mapview file when I see a new one is available. Then I'll update this gist, you can repeat the command, and you'll have the exact same update.
from mutalyzer2.
I have downloaded the dump and imported it into my database. It works really well with my integration tests. Heaps of thanks. You really have made my day! :)
Last time I followed the instructions in your gist, but it turned out that I still had to download another map from NCBI. It would be very good if you could provide a better way to sync with your remote database, so that people like me can use a local Mutalyzer properly. But I have to admit you have done an amazing job.
Thanks so much for your help
Regards,
Jemie
from mutalyzer2.
Ideally it would be possible to easily get all the data you need from the original sources; we don't really want to become a data broker. I realise, however, that in practice it could be more convenient for others if we just provide this dump in a more automated manner.
One of the problems is that some transcripts (or transcript versions) are not in any of the NCBI mapview files (very old or very new transcripts), and some never will be (very old transcripts). In that sense, the contents of our own database, accumulated over the years from different sources, are valuable. This is definitely on our radar; thanks for letting us know your opinion.
I'm closing this ticket, feel free to let us know if you have further questions.
from mutalyzer2.