frisk's People

Contributors

adamtaranto, gitter-badger, kdm9

frisk's Issues

IVOM/KLI debugging

I'll attach an example FASTA file here. Perhaps use 1 <= k <= 4, as that will fit nicely on a screen.

K

Licensing: GPLv3?

We currently use GPLv2, which is an old version of the standard strong copyleft license GPL. We should probably use GPLv3+. I can update it if you want.

pip installed frisk fails to run on mac

Tested pip install on Eli's mac. Not sure if this issue stems from the blank frisk version.

$ pip install cython numpy scipy
$ pip install frisk
$ frisk -h
Traceback (most recent call last):
  File "/usr/local/bin/frisk", line 9, in <module>
    load_entry_point('frisk==0-unknown', 'console_scripts', 'frisk')()
  File "/usr/local/lib/python2.7/site-packages/pkg_resources.py", line 339, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/usr/local/lib/python2.7/site-packages/pkg_resources.py", line 2470, in load_entry_point
    return ep.load()
  File "/usr/local/lib/python2.7/site-packages/pkg_resources.py", line 2184, in load
    ['__name__'])
  File "/usr/local/lib/python2.7/site-packages/frisk/__init__.py", line 19, in <module>
    from hmmlearn import hmm
  File "/usr/local/lib/python2.7/site-packages/hmmlearn/hmm.py", line 15, in <module>
    from sklearn.utils import check_random_state
  File "/usr/local/lib/python2.7/site-packages/sklearn/utils/__init__.py", line 16, in <module>
    from .class_weight import compute_class_weight, compute_sample_weight
  File "/usr/local/lib/python2.7/site-packages/sklearn/utils/class_weight.py", line 7, in <module>
    from ..utils.fixes import in1d
  File "/usr/local/lib/python2.7/site-packages/sklearn/utils/fixes.py", line 316, in <module>
    from ._scipy_sparse_lsqr_backport import lsqr as sparse_lsqr
  File "/usr/local/lib/python2.7/site-packages/sklearn/utils/_scipy_sparse_lsqr_backport.py", line 58, in <module>
    from scipy.sparse.linalg.interface import aslinearoperator
  File "/Library/Python/2.7/site-packages/scipy/sparse/linalg/__init__.py", line 109, in <module>
    from .isolve import *
  File "/Library/Python/2.7/site-packages/scipy/sparse/linalg/isolve/__init__.py", line 6, in <module>
    from .iterative import *
  File "/Library/Python/2.7/site-packages/scipy/sparse/linalg/isolve/iterative.py", line 7, in <module>
    from . import _iterative
ImportError: dlopen(/Library/Python/2.7/site-packages/scipy/sparse/linalg/isolve/_iterative.so, 2): Library not loaded: /usr/local/lib/gcc/x86_64-apple-darwin12.5.0/4.9.1/libgfortran.3.dylib
  Referenced from: /Library/Python/2.7/site-packages/scipy/sparse/linalg/isolve/_iterative.so
  Reason: image not found

Version tag unknown

Frisk version is not rendering correctly.

Example:
pip install --upgrade frisk
frisk --version
"frisk --0+unknown"

Same if running local version with python -m frisk

Versioneer fail

I have deleted the files relating to Versioneer and commented out the relevant lines in frisk.py.
Versioneer still needs to be set up properly.

Finding optimal K

Adapt the class for computing the f(K) statistic from datasciencelab.

Need to enable input of the actual projection coordinates (X), which are currently randomly generated.

Also add support for projection coordinates with up to three dimensions; currently only two are supported.
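
For reference, a minimal sketch of the f(K) calculation itself, assuming the statistic is the one from Pham, Dimov and Nguyen (2005) that the datasciencelab post implements. The formula below is written from memory and should be checked against that source. The names `alpha` and `f_k` are hypothetical, and the distortions S_K (sum of squared distances to cluster centres for each K) are assumed to have been computed elsewhere from the real coordinates X. The number of dimensions is a parameter, so 2D and 3D projections use the same code.

```python
def alpha(k, n_dims):
    """Weight factor a_K (assumed form, after Pham et al. 2005); needs k >= 2."""
    if k == 2:
        return 1.0 - 3.0 / (4.0 * n_dims)
    prev = alpha(k - 1, n_dims)
    return prev + (1.0 - prev) / 6.0


def f_k(distortions, n_dims):
    """Return [f(1), f(2), ...] given the distortions [S_1, S_2, ...]."""
    scores = []
    for i, s_k in enumerate(distortions):
        k = i + 1
        # f(K) is defined as 1 for K == 1 or when S_(K-1) is zero.
        if k == 1 or distortions[i - 1] == 0:
            scores.append(1.0)
        else:
            scores.append(s_k / (alpha(k, n_dims) * distortions[i - 1]))
    return scores


# Hypothetical distortions for K = 1..3 on a 2D projection.
print(f_k([100.0, 50.0, 45.0], n_dims=2))
```

Low values of f(K) would then point at candidate values of K.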

Documentation

Needs documentation. Should include:
What is Frisk?
How to cite
How it works?
Installing dependencies
Examples / Use cases
Interpreting output

PEP440 pypi error

python setup.py sdist upload -r pypi
Upload failed (400): Invalid version, cannot use PEP 440 local versions on PyPI.
error: Upload failed (400): Invalid version, cannot use PEP 440 local versions on PyPI.
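
For context: PEP 440 writes a local version as `<public version>+<local label>`, and PyPI rejects anything with the `+<local label>` part. A version string such as `0+unknown` falls into that category. The sketch below only illustrates where the split is; `public_version` is a hypothetical helper, not part of frisk or setuptools. The real fix is presumably to get the version machinery to emit a proper tagged release version.

```python
def public_version(version):
    """Drop the PEP 440 local version label (everything from '+' onwards)."""
    return version.split("+", 1)[0]


for v in ("0+unknown", "0.1.0", "0.2.0+3.gabc123"):
    print(v, "->", public_version(v))
```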

Add Self-genome Masking

Given a repeat-masked genome, learn k-mer abundance only from unmasked regions.

For use when training 'self' on genomes rich in non-self sequence, i.e. fungal genomes with high RIP/TE abundance.
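
A rough sketch of how the counting could work, assuming masked bases appear as lowercase letters (soft-masking) or as N (hard-masking). `count_unmasked_kmers` is a hypothetical name, not an existing frisk function. A k-mer is counted only if all k of its bases are unmasked, so windows that straddle a masked region are skipped.

```python
from collections import Counter


def count_unmasked_kmers(seq, k):
    """Count k-mers made up entirely of uppercase A/C/G/T bases."""
    counts = Counter()
    unmasked = set("ACGT")
    run = 0  # length of the current run of unmasked bases
    for i, base in enumerate(seq):
        run = run + 1 if base in unmasked else 0
        if run >= k:
            counts[seq[i - k + 1:i + 1]] += 1
    return counts


# Lowercase and N positions are treated as masked.
print(count_unmasked_kmers("ACGTacgtNNACG", 2))
```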

Driving Kmer reports

Report which k-mers drive divergence from the reference population in windows/features with extreme KLI scores.

Helper script: given a sequence (string or FASTA) and a self-kmer pickle file, report the observed k-mers ranked by KLI. If given a multi-FASTA of anomalous sequences, return the mean and variance of KLI for each observed k-mer.
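
A minimal sketch of the ranking step, assuming each k-mer's contribution is its term in a plain Kullback-Leibler sum, p * log2(p / q), for a single fixed k. The KLI that frisk actually computes may be weighted differently (e.g. via IVOM), so treat this as an illustration only. `rank_kmers_by_kl` is a hypothetical name, and the self frequencies are passed as a plain dict rather than loaded from the pickle file.

```python
import math
from collections import Counter


def rank_kmers_by_kl(seq, self_freqs, k, pseudo=1e-6):
    """Rank observed k-mers by their term in the KL sum against self_freqs."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    scored = []
    for kmer, n in counts.items():
        p = n / total                     # observed frequency in the query
        q = self_freqs.get(kmer, pseudo)  # pseudo-frequency for unseen k-mers
        scored.append((kmer, p * math.log2(p / q)))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical uniform 'self' model for k = 1.
self_model = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
for kmer, score in rank_kmers_by_kl("AAAC", self_model, k=1):
    print(kmer, round(score, 3))
```

For a multi-FASTA, the same function could be run per sequence and the per-kmer scores collected before taking the mean and variance.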

Debian package

I can try to package frisk for Debian (which will eventually end up in Ubuntu) once it is stable. This is to remind me to do so.

Xantho test result

frisk.pdf

 $ time python tst.py Xanthomonas_oryzae_pv_oryzae_pxo99a.GCA_000019585.2.29.dna.genome.fa
calculated genome IVOM
done 100 windows
done 200 windows
done 300 windows
done 400 windows
done 500 windows
done 600 windows
done 700 windows
done 800 windows
done 900 windows
done 1000 windows
done 1100 windows
done 1200 windows
done 1300 windows
done 1400 windows
done 1500 windows
done 1600 windows
done 1700 windows
done 1800 windows
done 1900 windows
done 2000 windows

%TIME: [usr:1021.76 sys:21.43 301% wall:5:46.51 rss:313340]

Pybedtools update fucks float to BED conversion.

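from operator import itemgetter

import numpy as np
import pybedtools


def thresholdList(intervalList, threshold, args, threshCol=3, merge=True):
    # Keep windows whose log10(KLI) is at/below (findSelf) or at/above the threshold.
    if args.findSelf:
        tItems = [t for t in intervalList if np.log10(t[threshCol]) <= threshold]
    else:
        tItems = [t for t in intervalList if np.log10(t[threshCol]) >= threshold]
    # Sort by chromosome, start, end.
    sItems = sorted(tItems, key=itemgetter(0, 1, 2))
    anomaliesBED = pybedtools.BedTool(sItems)  # Fails here! Cannot deal with float (KLI) at index 3.
    if merge:
        # Column 4 (1-based) is the KLI at index 3, i.e. this assumes threshCol=3.
        anomalies = anomaliesBED.merge(d=args.mergeDist, c='4,4,4', o='max,min,mean')
    else:
        anomalies = anomaliesBED
    return anomalies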

Possible solution: Change lines 1158 and 1161 (window data) to store windowKLI, PI, SI, and CRI as strings. Have thresholdList() convert back to float for thresholding window records.

Although, this will probably still leave line 549 pretty fucked when pybedtools tries to do merge stats on strings:

anomalies = anomaliesBED.merge(d=args.mergeDist, c='4,4,4', o='max,min,mean')

Have raised Pybedtools issue, with any luck they will just fix it: daler/pybedtools#150
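
In the meantime, a sketch of the "store as strings, convert back to float for thresholding" idea in plain Python, without touching pybedtools. `to_bed_fields` and `threshold_records` are hypothetical helpers, not existing frisk functions. Whether BedTool.merge then copes with the string columns is untested here.

```python
import math


def to_bed_fields(record):
    """Render every field of a window record as a string."""
    return tuple(str(field) for field in record)


def threshold_records(records, threshold, find_self=False, thresh_col=3):
    """Filter string records on log10 of the score column, then sort them."""
    kept = []
    for rec in records:
        score = math.log10(float(rec[thresh_col]))  # back to float for the test
        passes = score <= threshold if find_self else score >= threshold
        if passes:
            kept.append(rec)
    # Sort by chromosome name, then numerically by start and end.
    return sorted(kept, key=lambda r: (r[0], int(r[1]), int(r[2])))


windows = [
    to_bed_fields(("chr1", 100, 200, 0.5)),
    to_bed_fields(("chr1", 0, 100, 100.0)),
]
print(threshold_records(windows, threshold=1.0))
```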

Versioneer?

Are we going to bother with Versioneer? We also need to add a setup.py so that frisk is easily pip-installable, and to tag releases in git (probably git tag -s 0.1.0 now, then 0.2.0 once everything is fixed; see semver.org).

Remove redundant columns from k-mer sym-prop matrix

Currently, anomalous features have their k-mer counts made symmetrical (i.e. GA = TC ) and proportional within orders of k (i.e. A=25/100, T=25/100, G=25/100, C=25/100). This means 50% of columns are redundant data, providing no novel signal.

Need to find a way of scrubbing out redundant keys from symmetrical k-mer dictionaries.
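
One possible approach, sketched below: since each k-mer and its reverse complement carry the same value after symmetrising, keep only the lexicographically smaller key of each pair. `revcomp` and `drop_redundant_kmers` are hypothetical names. Note that k-mers which are their own reverse complement (e.g. AT) have no partner to drop, so for even k slightly fewer than half the columns go away.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(kmer):
    """Reverse complement of an uppercase A/C/G/T string."""
    return kmer.translate(_COMPLEMENT)[::-1]


def drop_redundant_kmers(sym_counts):
    """Keep one key per reverse-complement pair (the smaller of the two)."""
    return {k: v for k, v in sym_counts.items() if k <= revcomp(k)}


# Hypothetical symmetrical, proportional values.
sym = {"A": 0.25, "T": 0.25, "GA": 0.1, "TC": 0.1, "AT": 0.2}
print(drop_redundant_kmers(sym))
```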
