latools's People

Contributors

oscarbranson, rorytrent

latools's Issues

Calculate detection limits

Something like this:

# get analytes that aren't internal standard
non_ref_analytes = [a for a in dat.analytes if a != dat.internal_standard]

# calculate the X/internal_standard ratio in the background data
bkgratio = dat.bkg.raw.divide(dat.bkg.raw.loc[:, dat.internal_standard], axis='rows').loc[:, non_ref_analytes]
# calculate the X/internal_standard standard deviation in the background data.
bkgratio_std = bkgratio.std()

# get the calibration parameters for all the analytes
calib_ms = la.nominal_values(dat.calib_params).mean(0)

# calculate the X/internal_standard detection limits.
detlim = 3 * bkgratio_std * calib_ms  # in mol / mol  internal_standard (in this case, Si29)

Git install fails for some users

Cloning https://github.com/oscarbranson/latools.git (to v0.2.2-alpha) to /private/var/folders/f0/sk8ywbjs55l55w_whdnq4mqm0000gn/T/pip-xGeLhx-build

...[omitted]...

running install

running build

running build_py

copying latools/latools_graveyard.py -> build/lib/latools

copying latools/latools.cfg -> build/lib/latools

creating build/lib/latools/resources

copying latools/resources/repro_dataformat.dict -> build/lib/latools/resources

copying latools/resources/SRM_ratios_160713.csv -> build/lib/latools/resources

error: can't copy 'latools/resources/test_data': doesn't exist or not a regular file
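
The error above suggests that setup.py is pointing package_data (or data_files) at the test_data directory itself, which setuptools cannot copy. A minimal sketch of a possible fix, assuming a standard setuptools layout (paths illustrative, not the actual setup.py):

from setuptools import setup, find_packages

setup(
    name='latools',
    packages=find_packages(),
    package_data={
        'latools': [
            'latools.cfg',
            'resources/*.csv',
            'resources/*.dict',
            'resources/test_data/*.csv',  # glob the files inside the folder, not the folder itself
        ],
    },
)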

Environment:

git version 1.9.3 (Apple Git-50)

Python 2.7

appnope==0.1.0
apptools==4.3.0
backports-abc==0.4
backports.shutil-get-terminal-size==1.0.0
backports.ssl-match-hostname==3.5.0.1
brewer2mpl==1.4.1
certifi==2016.2.28
-e git+https://github.com/enthought/chaco.git@f05f64eb07c2a10935178060086d37420650d873#egg=chaco
-e git+https://github.com/enthought/codetools.git@5b5ebd0d417270560ef892696b6aa23fb5296b7e#egg=codetools
configobj==5.0.6
configparser==3.5.0
decorator==4.0.7
-e git+https://github.com/enthought/encore.git@b008fd33e8197889702f965ec06c39fc44787e1b#egg=encore
entrypoints==0.2.2
-e git+https://github.com/enthought/envisage.git@3e39c6a116f4da44546dcb5d5b52c24507f4b47e#egg=envisage
ffnet==0.8.0
functools32==3.2.3.post2
geojson==1.3.2
gnureadline==6.3.3
-e git+https://github.com/enthought/graphcanvas.git@a1a107108c68cec5efacaaf7c6a7f8199e1abc8c#egg=graphcanvas
h5py==2.5.0
ipykernel==4.3.1
ipython==4.2.0
ipython-genutils==0.1.0
ipywidgets==5.1.5
Jinja2==2.8
jsonschema==2.5.1
jupyter==1.0.0
jupyter-client==4.2.2
jupyter-console==4.1.1
jupyter-core==4.1.0
-e git+https://github.com/enthought/enable-mapping.git@aeae660d9935ba6d2ea3fd9974f8ccd223d4850a#egg=mapping
MarkupSafe==0.23
matplotlib==1.4.2
-e git+https://github.com/Homebrew/homebrew@6322c4787c96943867e0b810ec261c28e5a9c1d6#egg=mayavi
mistune==0.7.2
mlab==1.1.4
mock==1.0.1
mpld3==0.2
nbconvert==4.2.0
nbformat==4.0.1
networkx==1.11
neuronvisio==0.9.1
nose==1.3.4
notebook==4.2.1
numexpr==2.4.3
numpy==1.10.4
pandas==0.17.1
pathlib2==2.1.0
pexpect==4.1.0
pickleshare==0.7.2
plotly==1.8.0
ptyprocess==0.5.1
pyface==5.0.0
Pygments==2.1
pyparsing==2.0.3
python-dateutil==2.4.2
pytz==2015.7
pyzmq==15.2.0
-e git+https://github.com/enthought/qt_binder.git@60059bc5b3732b92df77fe10b3a9e9d259b30683#egg=qt_binder
qtconsole==4.2.1
requests==2.7.0
scikit-learn==0.17.1
-e git+https://github.com/enthought/scimath.git@e2fddcf5c13ffb1c2bd0db830f89bde9769ea747#egg=scimath
scipy==0.17.0
seawater==3.3.4
simplegeneric==0.8.1
singledispatch==3.4.0.3
six==1.10.0
sklearn==0.0
spacepy==0.1.5
terminado==0.6
tornado==4.3
tqdm==4.11.2
traitlets==4.2.1
traits==4.5.0
traitsui==5.0.0
uncertainties==2.4.8.1
widgetsnbextension==1.2.3
xlrd==0.9.3

Identify SRMs using K-means

More elegant & reliable than the current algorithm?

Roughly:

  1. Use K-means to identify groups of SRM compositions, using len(srms_used) as number of groups.
  2. Identify elements with largest change in measured SRMs.
  3. Apply K-means to SRM table using only those elements to match up measured to reference table.
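
A minimal sketch of this idea, assuming the measured and reference SRM compositions are available as pandas DataFrames (rows = SRMs, columns = elements; all names below are illustrative):

import numpy as np
from sklearn.cluster import KMeans

def match_srms(measured, reference, srms_used, n_elements=3):
    # 1. cluster the measured SRM compositions into len(srms_used) groups
    km = KMeans(n_clusters=len(srms_used)).fit(np.log10(measured))
    # 2. find the elements that differ most between the reference SRMs
    key_elements = np.log10(reference.loc[srms_used]).std().nlargest(n_elements).index
    # 3. re-cluster on those elements only, then assign each reference SRM to a
    #    measured cluster, giving a cluster-label -> SRM-name mapping
    km2 = KMeans(n_clusters=len(srms_used)).fit(np.log10(measured[key_elements]))
    ref_classes = km2.predict(np.log10(reference.loc[srms_used, key_elements]))
    return dict(zip(ref_classes, srms_used)), km2.labels_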

`helpers.unitpicker` runs forever when given zeros

Hello.

Sometimes your data arrays contain only zeros, for various reasons.
For example, in my case, there was one part that looked like this:

'Te125': array([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
       nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
       nan, nan, nan,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0., nan, nan, nan, nan, nan, nan, nan, nan, nan,
       nan, nan])

This causes trouble for latools.helpers.helpers.unitpicker(), because its code says

a = abs(a)
n = 0
if a < llim:
    while a < llim:
        a *= 1000
        n += 1

If a is zero, a * 1000 stays zero and the loop never terminates.
To avoid this, you could change this to

a = abs(a)
n = 0
if a == 0:
    raise ValueError("The value is zero.")
elif a < llim:
    while a < llim:
        a *= 1000
        n += 1

I hope it helps.

Splitting a long file using the signal of a different element, instead of 'total_counts'

Hello. Thank you for implementing #16 !

I used latools.preprocessing.split.long_file() to split my data.
In my case, using total_counts didn't work well, so I changed the code a bit to use the signal from a different element.
Originally, it is

# autorange
bkg, sig, trn, _ = autorange(dat['Time'], dat['total_counts'], **autorange_args)

but I wanted to use the Fe signal, so I did

# autorange
bkg, sig, trn, _ = autorange(dat['Time'], dat['rawdata']['57Fe'], **autorange_args)
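
A hedged generalisation of the same edit (signal_analyte is a hypothetical variable, not an existing long_file() argument), falling back to total_counts when no analyte is given:

signal_analyte = '57Fe'   # or None to keep the default summed signal
if signal_analyte is None:
    trace = dat['total_counts']
else:
    trace = dat['rawdata'][signal_analyte]
bkg, sig, trn, _ = autorange(dat['Time'], trace, **autorange_args)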

Hope this helps someone!

Clustering at subset-level

Clustering algorithms should be 'trained' on aggregated data from many samples, and then applied to the individual samples to identify regions. This is more robust than clustering each sample individually, as groups will have similar compositions.

  • analyse.get_focus(subset) needs to accept subset arg.
  • analyse.train_cluster_filter(filter_name, method, **kwargs) function to work with aggregate data from get_focus. Save resulting filter to analyse level dict of filters (explicitly tied to subset?).
  • analyse.apply_cluster_filter(filter_name, subset) to apply the trained classifier to each individual sample. Warn if subset is not the same as the one used for training.
  • analyse.crossplot(..., cluster_filter=None) should accept cluster filter as arg, and plot the results of the classifier in the crossplot.
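
A usage sketch of the proposed interface (none of these methods exist yet; names and signatures follow the bullets above, and 'forams' / 'foram_groups' are hypothetical):

eg.get_focus(subset='forams')                    # aggregate data for the subset
eg.train_cluster_filter('foram_groups',          # proposed: fit a classifier on the aggregate
                        method='kmeans', n_clusters=3)
eg.apply_cluster_filter('foram_groups',          # proposed: apply the trained classifier
                        subset='forams')         # to each sample in the subset
eg.crossplot(cluster_filter='foram_groups')      # proposed: colour the crossplot by cluster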

module 'latools' has no attribute 'preprocessing'

Hi.
If you use
import latools as la
at the beginning, you cannot use
la.preprocessing.split.long_file().

You have to do
from latools import preprocessing as pre
at the beginning and call
pre.split.long_file() if you want to use the function.

Hope this helps.
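
For what it's worth, the likely reason is that the latools package __init__ does not import the preprocessing submodule, so it only becomes available once it is imported explicitly. Both of these work (arguments illustrative):

# Works: import the submodule under an alias.
from latools import preprocessing as pre
# pre.split.long_file('long_data.csv', ...)

# Also works: explicitly importing the submodule binds it to the package,
# so latools.preprocessing is accessible afterwards.
import latools.preprocessing
import latools as la
# la.preprocessing.split.long_file('long_data.csv', ...)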

Refactor

Re-think structure - put a lot of the work functions in a submodule.

"trace_plot()" broken (fixed)

Hello.
In my environment, trace_plot() didn't plot anything in my Jupyter notebook, even with %matplotlib inline.

I solved this by commenting out plt.close(f) at line 3365 of the latools.py file.

Hope this helps.

figsize option in `plot.crossplot()` broken

Hello.
Even if you pass figsize to crossplot(), e.g. eg.crossplot(figsize=(24, 24)), the figure size doesn't change.

This is because in latools.helpers.plot.crossplot(), the code says

    fig, axes = plt.subplots(nrows=numvar, ncols=numvar,
                             figsize=(12, 12))

If you change this to

    fig, axes = plt.subplots(nrows=numvar, ncols=numvar,
                             figsize=figsize)

then the problem will be fixed.

Hope this helps.

Isotope Enabled

Calibration currently does everything by element name - mass information is stripped out when matching analyte names to SRM rows. This makes it impossible to calibrate individual isotopes. It would be nice to be able to work with individual isotopes, if values are present in the SRM database.

This should be relatively easy to implement. When matching analytes to SRM compositions, it should perform a 'first pass' match, attempting to match all analytes against row names in the SRM table, with isotopes identified as [mass][name] (e.g. 23Na).
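
A minimal sketch of the '[mass][name]' first-pass parsing, with an illustrative regex (not the actual latools implementation):

import re

ISOTOPE_RE = re.compile(r'^(?P<mass>\d+)(?P<element>[A-Z][a-z]?)$')

def parse_srm_row(name):
    """Split a row label like '23Na' into (mass, element); element-only labels pass through."""
    m = ISOTOPE_RE.match(name)
    if m is None:
        return None, name   # e.g. 'Na' -> no mass information
    return int(m.group('mass')), m.group('element')

parse_srm_row('23Na')   # (23, 'Na')
parse_srm_row('Na')     # (None, 'Na')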

To Do

  • Update srm_load_database to look for [Mass][Analyte] name matches and return srmdat and dict of analyte-item links.
  • Update srm_id_auto to work with new srmdat format.
  • Update calibrate to use analyte names instead of element names.

Excluding dodgy ablations during file splitting

Need a way to exclude dodgy ablations during file splitting (i.e. the targeted region exploded/broke off shortly after ablation started).
Could perhaps be done by excluding 'fragments'/ablations smaller than N data points.
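
A minimal sketch of that idea, assuming a boolean 'signal on' mask like the one returned by autorange (min_points is an illustrative threshold):

import numpy as np

def drop_short_ablations(sig, min_points=50):
    """Return a copy of the signal mask with ablations shorter than min_points removed."""
    sig = np.asarray(sig, dtype=bool).copy()
    edges = np.diff(sig.astype(int))
    starts = np.where(edges == 1)[0] + 1   # first index of each ablation
    ends = np.where(edges == -1)[0] + 1    # one past the last index
    if sig[0]:
        starts = np.insert(starts, 0, 0)
    if sig[-1]:
        ends = np.append(ends, sig.size)
    for s, e in zip(starts, ends):
        if e - s < min_points:
            sig[s:e] = False               # exclude the fragment
    return sig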

The code of latools.helpers.helpers.get_date() is wrong

Hello.

I think I found a mistake in latools.helpers.helpers.get_date().

In the source,
t = dt.datetime.strftime(datetime, time_format)
should be
t = dt.datetime.strptime(datetime, time_format).
(strftime formats a datetime object as a string, whereas strptime parses a string into a datetime, which is what get_date() needs here.)

Fixing this worked in my environment.
Hope this helps.

LA-ICP-MS Imaging

Add support for line-scan, raster and spot imaging.

  • Investigate file formats
  • How to handle spot overlaps? Deconvolution methods?

Functions for:

  • Gridding multiple spots post-reduction
  • Combining line scans onto a grid ('laser on' and 'laser off' time points?)
  • Combining complex raster data onto a grid.
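
A minimal sketch of the first item, gridding irregular spot data with scipy.interpolate.griddata (the x/y/value inputs are illustrative, not an existing latools structure):

import numpy as np
from scipy.interpolate import griddata

def grid_spots(x, y, values, resolution=100):
    """Interpolate irregular spot analyses (x, y, value) onto a regular grid."""
    xi = np.linspace(x.min(), x.max(), resolution)
    yi = np.linspace(y.min(), y.max(), resolution)
    XI, YI = np.meshgrid(xi, yi)
    ZI = griddata((x, y), values, (XI, YI), method='linear')
    return XI, YI, ZI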

Background Calculation on a Long Data File

Dear LaTools Coders,

Thank you for making a useful package that helps reduce LA-ICP-MS data.

I am currently trying to reduce a relatively large run (just over an hour, with standards and unknowns). I have been able to use the long_data file example to split the data and read it in with the REPRODUCE configuration. See the first image.

I am now trying to calculate the background, and am deciding between the interpolation and the Gaussian moving average methods. See the second image for the interpolation-based background calculation. The result looks strange and I am not sure what is happening. I am able to subtract the background and get ratios and calibration plots with NIST612 as the standard.

However, I am not sure if the background function is working correctly for such a long dataset. I was wondering if you would have any insight or thoughts on how to proceed?

Best,
Natasha

[Attached images: "Screen Shot 2021-11-12 at 1 30 30 PM" (split/imported data); "background(1)" (interpolation-based background calculation)]
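
For reference, a hedged sketch of the two background approaches being compared (parameter values are illustrative; a wider weight_fwhm may be needed to smooth out structure over a long run):

eg.bkg_calc_weightedmean(weight_fwhm=600)   # Gaussian-weighted moving average background
eg.bkg_plot()

eg.bkg_calc_interp1d(kind=1)                # piecewise-linear interpolated background
eg.bkg_plot()

eg.bkg_subtract()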

Check filter_gradient_threshold_percentile()

Currently in the GUI, creating the filter_gradient_threshold_percentile filter doesn't cause a new filter name (e.g. '0_Mg24_thresh_below') to be added to the sample's filter list.

If LAtools is working fine for this filter then it may be an issue on the GUI side.

Improve handling of long files.

Shouldn't need to apply pre-processing to long files. Should be able to import them directly along with sample list.

Framework should already be in place with 'split long file'. Could pass output directly to D_obj, instead of saving and re-importing them?

Config & dataformat cleanup

Need functions to add & edit configs. Ideally:

  • latools.add_config(config_name, dataformat, SRMfile, save), takes one/both of: dataformat (option to copy & save file internally), SRMtable. Defaults used if not given?
  • latools.edit_config(): modify particular config. alias of add_config? Could work in same way.
  • latools.get_config_files(): print locations of dataformat & SRMtable files
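
A usage sketch of the proposed interface (these functions are the proposals above, not existing API; file names are illustrative):

import latools as la

la.add_config('my_lab',
              dataformat='my_dataformat.json',   # copied & stored internally when save=True
              SRMfile='my_SRM_table.csv',
              save=True)

la.edit_config('my_lab', SRMfile='updated_SRM_table.csv')

la.get_config_files()   # print locations of the dataformat & SRM table files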

Tools for creating dataformat file:

  • latools.make_dataformat() function to walk user through several prompts & creates dataformat dict for export?

error with calibrate

Error when calling eg.calibrate(srms_used=['NIST612', 'NIST610']).
It works for 2 of the runs, but not the other 2. Files have been split by the long data script and then read in with the REPRODUCE config.

ValueError Traceback (most recent call last)
in
21 eg.ratio()
22 eg.trace_plots()
---> 23 eg.calibrate(srms_used=[ 'NIST612', 'NIST610'])
24 eg.calibration_plot()
25 eg.trace_plots()

/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/latools/helpers/logging.py in wrapper(self, *args, **kwargs)
16 @wraps(func)
17 def wrapper(self, *args, **kwargs):
---> 18 a = func(self, *args, **kwargs)
19 self.log.append(func.__name__ + ' :: args={} kwargs={}'.format(args, kwargs))
20 return a

/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/latools/latools.py in calibrate(self, analytes, drift_correct, srms_used, zero_intercept, n_min, reload_srm_database)
1661
1662 if not hasattr(self, 'srmtabs'):
-> 1663 self.srm_id_auto(srms_used=srms_used, n_min=n_min, reload_srm_database=reload_srm_database)
1664
1665 # make container for calibration params

/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/latools/latools.py in srm_id_auto(self, srms_used, analytes, n_min, reload_srm_database)
1579 classifier = KMeans(len(srms_used)).fit(_srmid)
1580 # apply classifier to measured data
-> 1581 std_classes = classifier.predict(_stdid)
1582
1583 # get srm names from classes

/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/sklearn/cluster/_kmeans.py in predict(self, X, sample_weight)
1154 check_is_fitted(self)
1155
-> 1156 X = self._check_test_data(X)
1157 x_squared_norms = row_norms(X, squared=True)
1158 sample_weight = _check_sample_weight(sample_weight, X, dtype=X.dtype)

/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/sklearn/cluster/_kmeans.py in _check_test_data(self, X)
856
857 def _check_test_data(self, X):
--> 858 X = self._validate_data(X, accept_sparse='csr', reset=False,
859 dtype=[np.float64, np.float32],
860 order='C', accept_large_sparse=False)

/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/sklearn/base.py in _validate_data(self, X, y, reset, validate_separately, **check_params)
419 out = X
420 elif isinstance(y, str) and y == 'no_validation':
--> 421 X = check_array(X, **check_params)
422 out = X
423 else:

/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/sklearn/utils/validation.py in inner_f(*args, **kwargs)
61 extra_args = len(args) - len(all_args)
62 if extra_args <= 0:
---> 63 return f(*args, **kwargs)
64
65 # extra_args > 0

/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator)
718
719 if force_all_finite:
--> 720 _assert_all_finite(array,
721 allow_nan=force_all_finite == 'allow-nan')
722

/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/sklearn/utils/validation.py in _assert_all_finite(X, allow_nan, msg_dtype)
101 not allow_nan and not np.isfinite(X).all()):
102 type_err = 'infinity' if allow_nan else 'NaN, infinity'
--> 103 raise ValueError(
104 msg_err.format
105 (type_err,

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

Exponential Decay Despiker is broken

Needs fixing. Not urgent.

In practice it doesn't make any difference, as any spikes that would have been removed by the expdecay_despiker are caught by the noise_despiker.

error with calibrate

I encountered an error when calling eg.calibrate(drift_correct=False,
srms_used=['NIST610', 'NIST612','JCp-1','JCt-1']).

The elemental compositions for JCp-1 and JCt-1 were added to my SRM.csv. Below is the traceback.
I also tried using only 'NIST610' and 'NIST612', but it did not work. I wonder what the "KeyError: 'I'" means.


KeyError Traceback (most recent call last)
Input In [25], in
1 # calibration
----> 2 eg.calibrate(drift_correct=False,
3 srms_used=['NIST610', 'NIST612','JCp-1','JCt-1'])

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\latools\helpers\logging.py:18, in _log.<locals>.wrapper(self, *args, **kwargs)
16 @wraps(func)
17 def wrapper(self, *args, **kwargs):
---> 18 a = func(self, *args, **kwargs)
19 self.log.append(func.__name__ + ' :: args={} kwargs={}'.format(args, kwargs))
20 return a

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\latools\latools.py:1673, in analyse.calibrate(self, analytes, drift_correct, srms_used, zero_intercept, n_min, reload_srm_database)
1648 """
1649 Calibrates the data to measured SRM values.
1650
(...)
1670 None
1671 """
1672 # load SRM database
-> 1673 self.srm_load_database(srms_used, reload_srm_database)
1675 # compile measured SRM data
1676 self.srm_compile_measured(n_min)

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\latools\latools.py:1441, in analyse.srm_load_database(self, srms_used, reload)
1438 # calculate SRM polyatom multiplier (multiplier to account for stoichiometry,
1439 # e.g. if internal standard is Na, N will be 2 if measured in SRM as Na2O)
1440 N_denom = float(decompose_molecule(ad[a_denom])[get_analyte_name(a_denom)])
-> 1441 N_num = float(decompose_molecule(ad[a_num])[get_analyte_name(a_num)])
1443 # calculate molar ratio
1444 srmtab.loc[srm, (a, 'mean')] = ((srmdat.loc[(srm, ad[a_num]), 'mol/g'] * N_num) /
1445 (srmdat.loc[(srm, ad[a_denom]), 'mol/g'] * N_denom))

KeyError: 'I'

ValueError not raised when initializing analyse object

Hello.

When initializing an analyse object using latools.analyse(), a ValueError is not raised even when the internal_standard is not in self.analytes.

This is because the code looks like

if internal_standard in self.analytes:
    self.internal_standard = internal_standard
else:
    ValueError('The internal standard ({}) is not amongst the'.format(internal_standard) +
               'analytes in\nyour data files. Please make sure it is specified correctly.')

This should look like

if internal_standard in self.analytes:
    self.internal_standard = internal_standard
else:
    raise ValueError('The internal standard ({}) is not amongst the '.format(internal_standard) +
                     'analytes in\nyour data files. Please make sure it is specified correctly.')

I think I'll soon make a pull request!

samples option in trace_plots() is not working

Hello.

Since the default of the subset option in trace_plots() is 'All_Analyses',
eg.trace_plots(samples=['sample1', 'sample2']) does not work.
It plots all samples, instead of just samples 1 and 2.

In the source, there is

if subset is not None:
    samples = self._get_samples(subset)

If you change this to

if subset != 'All_Analyses':
    samples = self._get_samples(subset)

(using != rather than is not, since is should not be used to compare string values), this code will work.

Hope this helps.

Add `focus_stage` variable to early functions.

Several early-analysis-stage functions should accept a focus_stage argument, which determines which data they are applied to.

The default should be the stage of analysis that the function normally relies on.

Need to check/implement in:

  • despike
  • autorange
  • bkg_calc_interp1d
  • bkg_calc_weightedmean
  • bkg_subtract
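
A sketch of the proposed pattern (names and stage keys illustrative; it assumes each D object holds a dict of data stages): every function takes a focus_stage keyword that defaults to the stage it normally operates on, and reads its input from that stage rather than the current focus.

def despike(self, win=3, nlim=12., focus_stage='rawdata'):
    """Illustrative only: operate on the named data stage."""
    dat = self.data[focus_stage]   # assumed dict of stage-name -> analyte data
    # ... despiking applied to dat, results stored as the 'despiked' stage ...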
