
btmartin721 / clinehelpr


Detects Outliers and plots genomic clines from BGC output, and extends the plotting functionality of INTROGRESS to Correlate genomic clines and hybrid indices with Environmental Variables

License: GNU General Public License v3.0

R 82.50% Python 9.13% Shell 5.92% Dockerfile 2.45%
genomic-cline bgc introgression introgress r r-package chromosome ideogram enmeval maxent

clinehelpr's Introduction


Hi there!
I'm Bradley T. Martin, PhD
Bioinformatician, Data Scientist,
Evolutionary Biologist, and Population Geneticist



I earned my Ph.D. in Biological Sciences, focusing on evolutionary biology and population genomics. While pursuing my doctorate, I developed a love for programming. Currently, I work as a bioinformatician and data scientist, specializing in developing machine learning software to process and analyze short-read and long-read genomic data.

I enjoy learning and taking on new challenges, and I greatly enjoy my work. For personal projects, I am currently developing several new software tools that bring machine learning into population genomic analyses, an approach that shows promise for resolving long-standing issues involving introgression, species delimitation, and phylogenomics.


Toolbox

Python, R, Java, Bash, C++, Perl





clinehelpr's People

Contributors

btmartin721, tkchafin


clinehelpr's Issues

phiPlot visualization issue

Hello.
Thanks for this great package to visualize and explore bgc output.
I'm having a visualization issue with the phi plot that I haven't been able to figure out.
The image seems to be missing the portions of the curves extending from hybrid index = 0.95.
Can you help?
Regards
[Screenshot attached]

get_bgc_outliers error

Hi there! Hope you are doing well. I'm fairly new to this and am trying to use the ClineHelpR functions on my bgc outputs, but I keep running into the same error when I get to the get_bgc_outliers step (see output below).

[image attached]

My run fails at this step each time and never creates the gene.outliers object. I made sure to include "loci.file=NULL" in my code, and I'm only really interested in outputting the Phi plots, so I'm not sure how to fix this issue. Any guidance would be fantastic. Thank you!

[image attached]

GSL error.

Hi Bradley,

Thank you for your fantastic tool.

I am getting errors when running run_bgc.sh, and I have no idea where to look for the problem:

After:
Initializing MCMC chain
gsl: ../gsl/gsl_vector_int.h:182: ERROR: index out of range

I attached my input files and I'd be most grateful for any pointers.

Kind regards

Ludo

bgc_p1in.txt
bgc_p0in.txt
bgc_loci.txt
bgc_admixedin.txt
bgc_settings.txt

vcf2bgc.py file

Hello,

I'm trying to convert my VCF file (from ipyrad) to bgc format using the vcf2bgc.py script. To test it, I'm using your example data files downloaded from DRYAD, but I get an error:

./vcf2bgc.py -v eatt.trans.finalfilt.recode.vcf -m eatt.bgc.popmap_final.txt --p1 PureEA --p2 PureTT --admixed EATT -o example_tutorial

P1 population has 8 individuals...

P2 population has 8 individuals...

Admixed populalation has 85 individuals...

Processing 233 records in VCF file...

Traceback (most recent call last):
  File "./vcf2bgc.py", line 378, in <module>
    main()
  File "./vcf2bgc.py", line 140, in main
    normalize_linkagemap(pos_list, pos_min, pos_max, chrom_number, linkage_fh, locus_list)
  File "./vcf2bgc.py", line 192, in normalize_linkagemap
    mylist[i] = (val - nmin) / (nmax - nmin)
ZeroDivisionError: integer division or modulo by zero

Does anybody have an idea what is wrong? Any help would be appreciated.
The same error occurs with my own data.

The package is really helpful, congrats!
Thank you!
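
The traceback above ends in the expression `(val - nmin) / (nmax - nmin)`, which can only divide by zero when `nmax` equals `nmin`, for example when a chromosome or scaffold contributes a single position. The following is a minimal sketch of a min-max normalization that guards against that case. It is not the code from vcf2bgc.py: the function name `normalize_positions` and the choice to return 0.0 for a zero-width range are assumptions made for illustration. The wording "integer division or modulo by zero" is also what Python 2 prints for `/` on two integers, so the interpreter version may be worth checking as well.

```python
def normalize_positions(values):
    """Min-max scale values into [0, 1].

    Hypothetical helper, not part of vcf2bgc.py. When every value is
    identical (nmax == nmin) the range is zero, so each value is mapped
    to 0.0 instead of dividing by zero.
    """
    if not values:
        return []
    nmin, nmax = min(values), max(values)
    span = nmax - nmin
    if span == 0:
        # Assumption: a single-position group is placed at 0.0.
        return [0.0 for _ in values]
    # float() keeps true division even under Python 2.
    return [(v - nmin) / float(span) for v in values]
```

Calling `normalize_positions([5, 5, 5])` returns a list of zeros rather than raising, while a list with distinct values is scaled as usual.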

jinja2

Not sure if it is only me, but I had to change the jinja2 version in environment.yml to 2.11.3; otherwise conda could not resolve the environment.

Hybrid Index

Hi! I would like to know whether there is a way to provide ClineHelpR with a hybrid index (admixture coefficient) calculated in a previous analysis, as in HIest: https://cran.r-project.org/web/packages/HIest/HIest.pdf

The idea is to avoid classifying individuals in discrete populations, but rather use a continuous measure of admixture coefficient.

Thank you so much!

ClineHelpR R package conda install?

Hi!

I'm in the process of creating my bgc input files. The genind2bgc function is taking a long time to run in RStudio, so I thought I would run the R script on our remote servers to free up my laptop.

I'm running the Rscript through the ClineHelpR conda environment, and I'm getting an error that the "genind2bgc" command is not recognized: Error in genind2bgc(gen = hybrid_0.8gmiss_0.3imiss_minDP5_srich_sp_remove, : could not find function "genind2bgc"

It looks like genind2bgc is not included in any of the dependencies required for the ClineHelpR conda environment. Is there a conda package for the ClineHelpR R package that I can install within the ClineHelpR environment?

Thanks!

run bgc error in docker

Dear Martin:
I used the Docker image to run bgc:
./bgc -a ~/test_file/YJS_newhybrids_chr10_filter_p0in.txt -b ~/test_file/YJS_newhybrids_chr10_filter_p1in.txt -h ~/test_file/YJS_newhybrids_chr10_filter_admixedin.txt -M ~/test_file/YJS_newhybrids_chr10_filter_map.txt -O 0 -x 50000 -n 25000 -p 1 -q 1 -N 1 -m 1 -D 0.5 -t 5 -E 0.0001 -d 1 -s 1 -I 0 -u 0.04

but it fails with the following output:

Reading input files
Number of loci: 1090786
Number of admixed populations: 1
Number of individuals: 6
Using the linkage model for locus effects
Allowing for uncertainty in allele counts
Allocating memory
gsl: init_source.c:40: ERROR: failed to allocate space for block data
Default GSL error handler invoked.
Aborted (core dumped)

I don't know how to deal with it. Any guidance would be fantastic. Thank you!

Error with parsing alleles depth

Hi! I am trying to run vcf2bgc and I got this error:
$ vcf2bgc.py -v chr22_ldna.recode.vcf -m population_map.txt --p1 P1 --p2 P2 --admixed ADMIXED --outprefix clines_chr22

P1 population has 6 individuals...

P2 population has 8 individuals...

Admixed populalation has 67 individuals...

Processing 1563 records in VCF file...

Traceback (most recent call last):
  File "/home/user/app/src/scripts/vcf2bgc.py", line 240, in get_allele_depth
    alleles = call.data[2].split(",")
AttributeError: 'int' object has no attribute 'split'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/app/src/scripts/vcf2bgc.py", line 422, in <module>
    main()
  File "/home/user/app/src/scripts/vcf2bgc.py", line 172, in main
    write_output(record, popsamples, ref, alt, locus, args.outprefix, admix, p1, p2)
  File "/home/user/app/src/scripts/vcf2bgc.py", line 285, in write_output
    admix_output = get_allele_depth(record, "Admixed", ref, alt, sampledict)
  File "/home/user/app/src/scripts/vcf2bgc.py", line 254, in get_allele_depth
    raise AttributeError("Error with parsing allele depths!")
AttributeError: Error with parsing allele depths!

My vcf was generated with GATK 4. Any idea on what is going on?
Thank you so much!
Best wishes
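
The first traceback shows the script reading the third per-sample value (`call.data[2]`) and splitting it on commas. One plausible reading, offered as an assumption rather than a confirmed diagnosis, is that the position of the depth field differs between VCF producers: in a FORMAT layout such as `GT:AD:DP:GQ:PL`, which GATK typically writes, the third field is `DP`, a single integer. The sketch below shows looking a field up by its FORMAT key instead of by position. The function `allele_depths` is hypothetical and works on raw VCF column strings; it does not reproduce what vcf2bgc.py does with the depths afterwards.

```python
def allele_depths(format_field, sample_field, key="AD"):
    """Return per-allele depths for one sample as a list of ints.

    Hypothetical helper, not part of vcf2bgc.py. The field is located
    by its FORMAT key rather than by a fixed index, so the result does
    not depend on the order in which a caller wrote the fields.
    """
    keys = format_field.split(":")
    values = sample_field.split(":")
    if key not in keys:
        raise KeyError("FORMAT has no %r field: %s" % (key, format_field))
    idx = keys.index(key)
    if idx >= len(values):
        # The VCF specification allows trailing fields to be dropped.
        return []
    # A "." entry means the value is missing; count it as zero reads.
    return [0 if item == "." else int(item) for item in values[idx].split(",")]
```

For a FORMAT of `GT:AD:DP:GQ:PL` and a sample column of `0/1:12,7:19:99:200,0,300`, the lookup picks out the `12,7` entry regardless of where `AD` sits in the FORMAT string.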

R function: combine_bgc_output

This function assumes that the log-likelihood output file names contain "LnL", but the bgc output produced by the Docker image uses "lnl" (lowercase) instead.
Workaround: rename the output files, changing "lnl" to "LnL".

example:
bgc produces: test_stat_lnl_1
change it to: test_stat_LnL_1
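
The renaming workaround above can be scripted when there are many output files. This is a minimal sketch, assuming the files follow the `<prefix>_lnl_<run>` pattern shown in the example; the function name `rename_lnl_outputs` is hypothetical, and the script only touches file names that contain `_lnl_`.

```python
from pathlib import Path


def rename_lnl_outputs(directory):
    """Rename files containing '_lnl_' so that they contain '_LnL_'.

    Hypothetical helper for the workaround described above. Returns the
    new file names in sorted order.
    """
    renamed = []
    # sorted() builds the full listing first, so renaming files inside
    # the loop does not disturb the iteration.
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file() or "_lnl_" not in path.name:
            continue
        target = path.with_name(path.name.replace("_lnl_", "_LnL_"))
        # Refuse to overwrite a different existing file. On
        # case-insensitive filesystems the target is the same file.
        if target.exists() and not target.samefile(path):
            raise FileExistsError("refusing to overwrite %s" % target)
        path.rename(target)
        renamed.append(target.name)
    return renamed
```

Running `rename_lnl_outputs("path/to/bgc_output")` on a directory holding `test_stat_lnl_1` leaves a file named `test_stat_LnL_1` in its place; the directory path here is a placeholder.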

pafscaff

Hello,

This R package helps me a lot!
I am trying to plot an ideogram with the output of bgc. The species I am working on has a draft genome, and I aligned it to the chromosome level before running bgc. Is it possible to plot the output using the FASTA data to which we mapped the reads, instead of the pafscaff output file? Alternatively, perhaps I can create a dummy .scaffold.tdt file like the one pafscaff produces. Can you give me details of the contents of the .scaffold.tdt file?

Thank you very much in advance!

Assessing the convergence of runs

Hello,

First of all, thank you for giving us the opportunity to run BGC so easily.

I am having some trouble understanding my results. I attached the log likelihood and hybrid index graphs that I obtained from two runs (100000 burn-in / 200000 MCMC iterations). I did it only as a test and want to add more runs and probably also more MCMC iterations.

Could you help me understand why my two runs give quite different log likelihoods and hybrid indices? It looks to me as if the model converged within each run, but the runs are not consistent with each other.

All the best,

Loïc
ABH_hi_convergence.pdf
ABH_LnL_convergence.pdf
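
The situation described above, where each run looks stable on its own but the runs disagree, is what a between-chain diagnostic is designed to quantify. The following is a minimal sketch of the Gelman-Rubin potential scale reduction factor, which compares between-run and within-run variance. It is a general MCMC diagnostic and not a ClineHelpR or bgc function; the name `gelman_rubin` and the input layout (one list of post-burn-in log-likelihood samples per run, all of equal length) are assumptions.

```python
from math import sqrt
from statistics import mean, variance


def gelman_rubin(chains):
    """Potential scale reduction factor for equal-length chains.

    Hypothetical helper. Values close to 1 indicate that the runs
    agree; values well above 1 indicate that the runs sample different
    regions even if each one looks stable by itself.
    """
    m = len(chains)
    n = len(chains[0]) if m else 0
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least two chains of equal length >= 2")
    # W: mean of the within-chain sample variances.
    within = mean(variance(c) for c in chains)
    if within == 0:
        raise ValueError("within-chain variance is zero")
    # B: n times the sample variance of the chain means.
    between = n * variance([mean(c) for c in chains])
    pooled = (n - 1) / n * within + between / n
    return sqrt(pooled / within)
```

Two runs that each fluctuate narrowly around different log-likelihood values produce a large factor, because the between-run term dominates the within-run term.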

How to get the outliers position

Hi btmartin,
I have already run BGC and plotted the results (see below). How can I get the positions of the outlier SNPs on each chromosome?
image

All the best,
Danqi Li

Error

I got genind2bgc running, but when I try to run:

bcg.genes

I get an error, although the files are in the indicated directory.

[Screenshot attached]

Any ideas on what is wrong?
Thank you very much!
