Comments (15)

arrogantrobot commented on September 3, 2024

I just gave this a go on my own data, downloaded today. I didn't see any of the uninitialized values. I'd really like to help you get this run successfully, though. Which OS are you running this on? I'd also give a cursory glance at your data download. Mine is 8202365B, zipped. Yours should be an equal or greater size. Did you unzip the data before running? You aren't required to do so, but it could be an important piece of this puzzle. Thanks for reporting in!

from 23andme2vcf.

lynxoid commented on September 3, 2024

My genome zipped: 8218000B, 24781779B unzipped (I simply used ls -l for this). I'm running under Mavericks, Perl 5.12.4. I ran the script on the unzipped data. I haven't used Perl in a while -- are there some settings to turn off the warnings? Thanks for looking into this.
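
On the question of silencing the messages: Perl's `no warnings 'uninitialized'` pragma turns off this warning category within the enclosing block. A minimal sketch, using a made-up undefined variable (`$missing`) rather than anything from 23andme2vcf.pl. Note that it only hides the symptom, since the undefined values are still there:

```perl
use strict;
use warnings;

# Collect warnings instead of printing them, so the effect is visible.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

my $missing;    # hypothetical undefined value, for illustration only

my $loud = "allele: " . $missing;    # triggers an "uninitialized" warning

{
    # Lexically scoped: only this block is silenced.
    no warnings 'uninitialized';
    my $quiet = "allele: " . $missing;    # no warning here
}

print "warnings seen: ", scalar(@warnings), "\n";
```

Because the pragma is lexically scoped, it can be limited to the few lines that are known to be noisy instead of the whole script.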

arrogantrobot commented on September 3, 2024

I am at a loss. I've run this on my own data on OS X 10.8.5 and am seeing it complete without incident. Do you see any output at all?

arrogantrobot commented on September 3, 2024

Would you mind redirecting stderr to a file and, size permitting, pasting it here?

MattBrauer commented on September 3, 2024

Same issue. Output is as follows:

Use of uninitialized value $data_line in scalar chomp at 23andme2vcf.pl line 164, line 960613.
Use of uninitialized value $data_line in split at 23andme2vcf.pl line 165, line 960613.
Use of uninitialized value $chr in string eq at 23andme2vcf.pl line 166, line 960613.
Use of uninitialized value $pos in numeric gt (>) at 23andme2vcf.pl line 170, line 960613.

over and over for what looks like every line in the 23andme SNP file. Looks like it's making the VCF though.

I'm using OS X 10.8.5, Perl v.5.12.4.

MattBrauer commented on September 3, 2024

It looks like the problem might be coming from the fact that each reference allele is pulled from disk by searching through the reference file. If a snp is not present in the reference file, the entire file is parsed to the end.

You can load the whole reference file as a hash:

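use IO::File;

# Load the whole reference into memory, keyed by chromosome, then position.
my %ref = ();
my $ref_fh = IO::File->new("gunzip -c $ref_path|")
    or die "Could not read reference $ref_path: $!";
while (my $line = <$ref_fh>) {
    chomp $line;
    my ($chr, $pos, $rsid, $ref) = split /\t/, $line;
    $ref{$chr}{$pos} = { rsid => $rsid,
                         ref  => $ref,
                       };
}
close $ref_fh;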

and avoid that problem altogether (though you still need to note when a snp is not found in the reference).
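
A minimal sketch of how the missing-site bookkeeping might look once the reference is in a hash. The helper names `load_reference` and `lookup_ref`, the in-memory sample rows, and the `@missing` list are all illustrative assumptions, not code from 23andme2vcf.pl:

```perl
use strict;
use warnings;

# Build a two-level lookup: chromosome -> position -> { rsid, ref }.
# Input lines are assumed to be tab-separated: chr, pos, rsid, ref allele.
sub load_reference {
    my ($fh) = @_;
    my %ref;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line eq '' || $line =~ /^#/;    # skip blanks and comments
        my ($chr, $pos, $rsid, $allele) = split /\t/, $line;
        $ref{$chr}{$pos} = { rsid => $rsid, ref => $allele };
    }
    return \%ref;
}

# Return the reference allele, or undef when the site is absent.
sub lookup_ref {
    my ($ref, $chr, $pos) = @_;
    return undef unless exists $ref->{$chr} && exists $ref->{$chr}{$pos};
    return $ref->{$chr}{$pos}{ref};
}

# Tiny in-memory stand-in for the reference file (made-up rows).
my $ref_text = join '', map { "$_\n" }
    ("1\t100\trs1\tA", "1\t200\trs2\tG", "2\t50\trs3\tT");
open my $ref_fh, '<', \$ref_text
    or die "cannot open in-memory reference: $!";
my $ref = load_reference($ref_fh);
close $ref_fh;

# Look up a few made-up sites and record the ones with no reference entry.
my @missing;
for my $site (['1', 100], ['1', 150], ['3', 10]) {
    my ($chr, $pos) = @$site;
    my $allele = lookup_ref($ref, $chr, $pos);
    if (defined $allele) {
        print "$chr\t$pos\tREF=$allele\n";
    } else {
        push @missing, "$chr\t$pos";    # chromosome and position only
    }
}
print STDERR scalar(@missing), " site(s) not in reference\n";
```

Checking `exists` on the chromosome key first keeps the lookup from autovivifying an empty entry for a chromosome that is absent from the reference.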

vasilyev-mit commented on September 3, 2024

I have the same problem. It writes endless "use of uninitialized value" and progresses slooooooowly. After 1 week of processing, I killed the script.

arrogantrobot commented on September 3, 2024

I have integrated Matt Brauer's suggested method of reading the reference into memory. If any of you still have problems, please report back and let me know. Also, if anyone who has a significant number of sites in their data that are not in the reference would be willing to send me their version of sites_not_in_reference.txt, I can add the reference data for those sites to the reference included in this tool. No personally identifying information is included in that list, just chromosome and position.

vasilyev-mit commented on September 3, 2024

Thank you very much, Rob!
This time I saw no warnings, and it seemed OK - except it said that "There were 61078 records skipped because the reference is out of date". Is this a big number? The file sites_not_in_reference.txt is empty. 

  • Dmitry.

arrogantrobot commented on September 3, 2024

@dimacq I would expect the number of lines in sites_not_in_reference.txt to equal the number of records skipped. I'll check into that tonight. I hope we can get your list of sites so I can add them to the reference.

I only have the list of sites that were used for the version of the microarray chip that was used on my data in summer of 2012, so it's believable to me that they have added that many new sites in two years!

arrogantrobot commented on September 3, 2024

@dimacq I just pushed a fix. If you pull the latest revision and re-run your data, you should see a sites_not_in_reference.txt file with 61078 lines. If you could submit a pull request with that file, or email it to me at [email protected], I'll update the reference to include data for those sites. Then if you update and run once more, I can get those additional sites in your VCF. Thanks for your help!

vasilyev-mit commented on September 3, 2024

Hi Rob,

I still have warnings "Use of uninitialized value in hash element at 23andme2vcf.pl line 46, line xxx." and lots of missed sites. I tried to send you the "sites_not_in_reference.txt" file, but your mail server rejected it saying it's too big (5MB). 

  • Dmitry.
    On Wednesday, April 23, 2014 10:15 AM, Dmitry Vasilyev [email protected] wrote:

Hi Rob,

I still have warnings "Use of uninitialized value in hash element at 23andme2vcf.pl line 46, line xxx." and lots of missed sites. I am attaching my missed sites file. 

  • Dmitry.

vasilyev-mit commented on September 3, 2024

Hi Rob,
This is the file sites_not_in_reference:
https://drive.google.com/file/d/0B_3J_3OlxNE7ZFAydXF3ZEhLQ0E/edit?usp=sharing

(you need to use "file"->"download", b/c preview of large txt files is not available on Google drive).
On Wednesday, April 23, 2014 10:19 AM, Dmitry Vasilyev [email protected] wrote:

Sorry Rob - it turned out to be 14MB. This is a zipped file. 

arrogantrobot commented on September 3, 2024

@dimacq Thank you so much for getting me the output. It has been very helpful.

I believe I have sorted this out, so to speak. I have added another reference, the v4 chip, for people who got their sample processed after Nov. of 2013. If you run 'git pull origin master', or download the whole thing fresh, that should update the script to its latest version and download the new reference. Then, assuming you are on OS X or Linux, if you run this command, substituting in your own personal file, it should work:

./23andme2vcf.pl /path/to/your/personal/data.zip output.vcf 4

The "4" at the end tells it to use the v4 reference.

Again, thank you for your feedback. Please let me know if this works, or if there's anything else I can do to help.

vasilyev-mit commented on September 3, 2024

Rob, thank you very very much!!! It worked flawlessly this time! 
You probably should mention the "4" in "usage" section of the readme?

Again, thank you!

  • Dmitry.
