oscarbranson / latools

Tools for the reproducible reduction of LA-ICPMS data.
Home Page: http://latools.readthedocs.io
License: MIT License
Something like this:
import latools as la

# `dat` is an existing la.analyse object, processed through calibration.
# get analytes that aren't the internal standard
non_ref_analytes = [a for a in dat.analytes if a != dat.internal_standard]
# calculate the X/internal_standard ratio in the background data
bkgratio = dat.bkg.raw.divide(dat.bkg.raw.loc[:, dat.internal_standard], axis='rows').loc[:, non_ref_analytes]
# calculate the X/internal_standard standard deviation in the background data
bkgratio_std = bkgratio.std()
# get the calibration parameters for all the analytes
calib_ms = la.nominal_values(dat.calib_params).mean(0)
# calculate the X/internal_standard detection limits (3 sigma)
detlim = 3 * bkgratio_std * calib_ms  # in mol / mol internal_standard (in this case, Si29)
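As a hypothetical follow-up (not latools API), the limits could be used to screen ratio data. Here `sample_ratios` is assumed to be a pandas DataFrame of X/internal_standard ratios with one column per analyte, matching the index of `detlim`:

# hypothetical usage: mask ratio values that fall below the detection limit
above = sample_ratios.gt(detlim, axis='columns')   # boolean: above detection limit
screened = sample_ratios.where(above)              # below-limit values become NaN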
missing srmdat
Cloning https://github.com/oscarbranson/latools.git (to v0.2.2-alpha) to /private/var/folders/f0/sk8ywbjs55l55w_whdnq4mqm0000gn/T/pip-xGeLhx-build
...[omitted]...
running install
running build
running build_py
copying latools/latools_graveyard.py -> build/lib/latools
copying latools/latools.cfg -> build/lib/latools
creating build/lib/latools/resources
copying latools/resources/repro_dataformat.dict -> build/lib/latools/resources
copying latools/resources/SRM_ratios_160713.csv -> build/lib/latools/resources
error: can't copy 'latools/resources/test_data': doesn't exist or not a regular file
Environment:
git version 1.9.3 (Apple Git-50)
Python 2.7
appnope==0.1.0
apptools==4.3.0
backports-abc==0.4
backports.shutil-get-terminal-size==1.0.0
backports.ssl-match-hostname==3.5.0.1
brewer2mpl==1.4.1
certifi==2016.2.28
-e git+https://github.com/enthought/chaco.git@f05f64eb07c2a10935178060086d37420650d873#egg=chaco
-e git+https://github.com/enthought/codetools.git@5b5ebd0d417270560ef892696b6aa23fb5296b7e#egg=codetools
configobj==5.0.6
configparser==3.5.0
decorator==4.0.7
-e git+https://github.com/enthought/encore.git@b008fd33e8197889702f965ec06c39fc44787e1b#egg=encore
entrypoints==0.2.2
-e git+https://github.com/enthought/envisage.git@3e39c6a116f4da44546dcb5d5b52c24507f4b47e#egg=envisage
ffnet==0.8.0
functools32==3.2.3.post2
geojson==1.3.2
gnureadline==6.3.3
-e git+https://github.com/enthought/graphcanvas.git@a1a107108c68cec5efacaaf7c6a7f8199e1abc8c#egg=graphcanvas
h5py==2.5.0
ipykernel==4.3.1
ipython==4.2.0
ipython-genutils==0.1.0
ipywidgets==5.1.5
Jinja2==2.8
jsonschema==2.5.1
jupyter==1.0.0
jupyter-client==4.2.2
jupyter-console==4.1.1
jupyter-core==4.1.0
-e git+https://github.com/enthought/enable-mapping.git@aeae660d9935ba6d2ea3fd9974f8ccd223d4850a#egg=mapping
MarkupSafe==0.23
matplotlib==1.4.2
-e git+https://github.com/Homebrew/homebrew@6322c4787c96943867e0b810ec261c28e5a9c1d6#egg=mayavi
mistune==0.7.2
mlab==1.1.4
mock==1.0.1
mpld3==0.2
nbconvert==4.2.0
nbformat==4.0.1
networkx==1.11
neuronvisio==0.9.1
nose==1.3.4
notebook==4.2.1
numexpr==2.4.3
numpy==1.10.4
pandas==0.17.1
pathlib2==2.1.0
pexpect==4.1.0
pickleshare==0.7.2
plotly==1.8.0
ptyprocess==0.5.1
pyface==5.0.0
Pygments==2.1
pyparsing==2.0.3
python-dateutil==2.4.2
pytz==2015.7
pyzmq==15.2.0
-e git+https://github.com/enthought/qt_binder.git@60059bc5b3732b92df77fe10b3a9e9d259b30683#egg=qt_binder
qtconsole==4.2.1
requests==2.7.0
scikit-learn==0.17.1
-e git+https://github.com/enthought/scimath.git@e2fddcf5c13ffb1c2bd0db830f89bde9769ea747#egg=scimath
scipy==0.17.0
seawater==3.3.4
simplegeneric==0.8.1
singledispatch==3.4.0.3
six==1.10.0
sklearn==0.0
spacepy==0.1.5
terminado==0.6
tornado==4.3
tqdm==4.11.2
traitlets==4.2.1
traits==4.5.0
traitsui==5.0.0
uncertainties==2.4.8.1
widgetsnbextension==1.2.3
xlrd==0.9.3
More elegant & reliable than current algorithm?
Roughly:
Steps to fix:
Needed for gui interaction.
It shouldn't be.
Hello.
Sometimes your data arrays contain only zeros, for various reasons.
For example, in my case, one part looked like this:
'Te125': array([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan])
This kind of thing causes trouble for latools.helpers.helpers.unitpicker(), because its code says

a = abs(a)
n = 0
if a < llim:
    while a < llim:
        a *= 1000
        n += 1

If a is zero, this loops forever.
To avoid this, you can change it to:

a = abs(a)
n = 0
if a == 0:
    raise ValueError("The value is zero.")
elif a < llim:
    while a < llim:
        a *= 1000
        n += 1
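For illustration, here is a self-contained version of the guarded loop (the function name is hypothetical, not part of latools):

def scale_to_unit(a, llim=0.1):
    """Return (scaled value, number of 1000x steps applied), guarding against zero."""
    a = abs(a)
    n = 0
    if a == 0:
        raise ValueError("Cannot pick a unit for a zero value.")
    while a < llim:
        a *= 1000
        n += 1
    return a, n

print(scale_to_unit(3.2e-7))  # -> (0.32, 2): two 1000x steps were needed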
I hope it helps.
Do people want this?
Hello. Thank you for implementing #16!
I used latools.preprocessing.split.long_file() to split my data.
In my case, using total_counts didn't work well, so I changed the code a bit to use the signal from a different element.
Originally, it is

# autorange
bkg, sig, trn, _ = autorange(dat['Time'], dat['total_counts'], **autorange_args)

but I wanted to use the Fe signal, so I did
# autorange
bkg, sig, trn, _ = autorange(dat['Time'], dat['rawdata']['57Fe'], **autorange_args)
Hope this helps someone!
Clustering algos should be 'trained' on aggregated data from many samples, and then applied to the individual samples to identify regions. More robust than individual clustering, as groups will have similar compositions. A rough sketch of the idea is below, followed by the proposed API.
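A minimal sketch of the train-once/apply-per-sample flow, using scikit-learn directly (names and data shapes are illustrative, not the proposed latools API):

import numpy as np
from sklearn.cluster import KMeans

def train_cluster_filter(pooled, n_clusters=2):
    # pooled: (n_points, n_analytes) array aggregated across many samples
    return KMeans(n_clusters=n_clusters, n_init=10).fit(pooled)

def apply_cluster_filter(classifier, sample):
    # label the points of a single sample with the pooled cluster identities
    return classifier.predict(sample)

samples = {'s1': np.random.rand(100, 3), 's2': np.random.rand(80, 3)}
clf = train_cluster_filter(np.vstack(list(samples.values())))
labels = {name: apply_cluster_filter(clf, d) for name, d in samples.items()}

Because every sample is labelled by the same trained classifier, cluster identities stay consistent across samples.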
- analyse.get_focus(subset) needs to accept a subset arg.
- analyse.train_cluster_filter(filter_name, method, **kwargs): function to work with aggregate data from get_focus. Save resulting filter to an analyse-level dict of filters (explicitly tied to subset?).
- analyse.apply_cluster_filter(filter_name, subset): apply the trained classifier to the individual samples. Warn if subset is not the same as the one used for training.
- analyse.crossplot(..., cluster_filter=None) should accept a cluster filter as an arg, and plot the results of the classifier in the crossplot.

Change sklearn -> scikit-learn in requirements and setup.py.
When the ablation profiles are not relatively uniform, autorange (and subsequently find_expcoef and despike) fails.
Hi.
If you use import latools as la at the beginning, you cannot use la.preprocessing.split.long_file().
You have to do from latools import preprocessing as pre at the beginning and then pre.split.long_file(), if you want to use the function.
Hope this helps.
ratio with a different reference analyte, then calibrate. Boom.
Integer line number entries in meta_regex section throw error because of new use of 'isdigit', which requires str input.
Make one.
Output ppm values would be nice.
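One hypothetical way to do the conversion, assuming the internal standard concentration in the sample is known (all names here are illustrative, not latools API):

def molratio_to_ppm(ratio_mol_mol, M_analyte, M_internal, internal_ppm):
    # ratio_mol_mol: mol X / mol internal standard
    # M_analyte, M_internal: molar masses in g/mol
    # internal_ppm: internal standard concentration in the sample (ug/g)
    return ratio_mol_mol * (M_analyte / M_internal) * internal_ppm

# e.g. Mg/Ca of 0.005 mol/mol in calcite (Ca ~ 400,000 ppm) -> ~1200 ppm Mg
mg_ppm = molratio_to_ppm(0.005, 24.305, 40.078, 4e5)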
Re-think structure - put a lot of the work functions in a submodule.
Hello.
In my environment, trace_plot() didn't plot anything in my Jupyter notebook, even with %matplotlib inline.
I solved this by commenting out plt.close(f) on line 3365 of the latools.py file.
Hope this helps.
minimal export data seems to be despiked - it shouldn't be.
Make sure data is copied before despiking.
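A possible guard, assuming the despiker otherwise modifies the raw data in place (the helper name is illustrative, not latools API):

import copy

def despike_safely(rawdata, despike_fn):
    # apply a despiking function to a deep copy, leaving rawdata untouched
    # so minimal export can still use the original values
    return despike_fn(copy.deepcopy(rawdata))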
Hello.
Even if you pass figsize to la.crossplot(), e.g. eg.crossplot(figsize=(24, 24)), the figsize doesn't change.
This is because in latools.helpers.plot.crossplot(), the code says

fig, axes = plt.subplots(nrows=numvar, ncols=numvar,
                         figsize=(12, 12))

If you fix this as

fig, axes = plt.subplots(nrows=numvar, ncols=numvar,
                         figsize=figsize)

then the problem will be fixed.
Hope this helps.
Some data formats can't be passed directly to numpy.genfromtxt.
Needs a pre-processing step in data import.
Need to warn user if n_min might be too high (i.e. a large amount of data is removed).
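A minimal sketch of such a warning (the function and threshold are illustrative assumptions):

import warnings

def check_n_min_removal(n_total, n_kept, threshold=0.5):
    # warn if the n_min cutoff discarded more than `threshold` of the data
    removed = 1 - n_kept / n_total
    if removed > threshold:
        warnings.warn("n_min removed {:.0%} of the data; it may be set too high.".format(removed))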
Calibration currently does everything by element name - mass information is stripped out when matching analyte names to SRM rows. This makes it impossible to calibrate individual isotopes. It would be nice to be able to work with individual isotopes, if values are present in the SRM database.
This should be relatively easy to implement. When matching analytes to SRM compositions, it should perform a 'first pass' match, where it attempts to match all analytes against row names in the SRM table, where isotopes are identified as [mass][name] (e.g. 23Na). A sketch of such a matcher is below, followed by the required changes.
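A sketch of the first-pass match (the regex and helper are illustrative assumptions, not latools code):

import re

ISOTOPE = re.compile(r'^(\d{1,3})([A-Z][a-z]?)$')

def match_srm_row(analyte, srm_rows):
    # first pass: exact isotope match, e.g. '23Na' as a row name
    if analyte in srm_rows:
        return analyte
    # second pass: fall back to the bare element, e.g. 'Na'
    m = ISOTOPE.match(analyte)
    if m and m.group(2) in srm_rows:
        return m.group(2)
    return None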
- srm_load_database to look for [Mass][Analyte] name matches and return srmdat and a dict of analyte-item links.
- srm_id_auto to work with the new srmdat format.
- calibrate to use analyte names instead of element names.

Need a way to exclude dodgy ablations during file splitting (i.e. the targeted region exploded/broke off shortly after ablation started).
Could perhaps be done by excluding 'fragments'/ablations smaller than N data points.
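A rough sketch of that exclusion, assuming a boolean signal mask like the one autorange returns (the function name is illustrative):

import numpy as np

def drop_short_ablations(sig, n_min=20):
    # remove contiguous signal regions shorter than n_min points
    sig = np.asarray(sig, dtype=bool).copy()
    padded = np.concatenate(([False], sig, [False])).astype(int)
    edges = np.flatnonzero(np.diff(padded))
    for s, e in zip(edges[::2], edges[1::2]):
        if e - s < n_min:
            sig[s:e] = False
    return sig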
Need tests to ensure reproducibility of analyses between versions.
Hello.
I think I found a mistake in latools.helpers.helpers.get_date().
In the source,

t = dt.datetime.strftime(datetime, time_format)

should be

t = dt.datetime.strptime(datetime, time_format)

Fixing this worked in my environment.
Hope this helps.
Add support for line-scan, raster and spot imaging.
Functions for:
Dear LaTools Coders,
Thank you for making a useful package that helps reduce LA-ICP-MS data.
I am currently trying to reduce a relatively large run (just over an hour, with standards and unknowns). I have been able to use the long_data file example to split the data and read it in with the REPRODUCE configuration. See first image.
I am now trying to calculate the background, deciding between the interpolation and the Gaussian moving average methods. See the second image for the interpolation-based background calculation. The image looks strange and I am not sure what is happening. I am able to subtract the background and get ratios and calibration plots with NIST612 as the standard.
However, I am not sure if the background function is working correctly for such a long dataset. I was wondering if you would have any insight or thoughts on how to proceed?
Best,
Natasha
e.g. remove data before x or after y time.
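For example, a hypothetical windowing helper (names are illustrative, not latools API):

import numpy as np

def time_window(t, data, t_min=None, t_max=None):
    # keep only points with t_min <= t <= t_max; data is {analyte: array}
    t = np.asarray(t)
    keep = np.ones(t.shape, dtype=bool)
    if t_min is not None:
        keep &= t >= t_min
    if t_max is not None:
        keep &= t <= t_max
    return t[keep], {k: np.asarray(v)[keep] for k, v in data.items()}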
Currently in the GUI, creating the filter_gradient_threshold_percentile filter doesn't cause a new filter name (e.g. '0_Mg24_thresh_below') to be added to the sample's filter list.
If LAtools is working fine for this filter, then it may be an issue on the GUI side.
Must still work with minimal export.
Unsure of cause. Need to look into it.
Shouldn't need to apply pre-processing to long files. Should be able to import them directly along with sample list.
Framework should already be in place with 'split long file'. Could pass output directly to D_obj, instead of saving and re-importing them?
Need functions to add & edit configs. Ideally:

- latools.add_config(config_name, dataformat, SRMfile, save): takes one/both of dataformat (option to copy & save the file internally) and SRMtable. Defaults used if not given?
- latools.edit_config(): modify a particular config. Alias of add_config? Could work in the same way.
- latools.get_config_files(): print locations of the dataformat & SRMtable files.

Tools for creating a dataformat file:

- latools.make_dataformat(): function to walk the user through several prompts & create a dataformat dict for export? (A rough sketch of add_config is below.)
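A rough sketch of what add_config could look like, assuming an ini-style config file with one section per configuration (this layout is an assumption, not the actual latools internals):

import configparser
from pathlib import Path

def add_config(config_name, dataformat=None, srmfile=None, cfg_path=Path('latools.cfg')):
    # create or update a named configuration section
    cfg = configparser.ConfigParser()
    cfg.read(cfg_path)
    if not cfg.has_section(config_name):
        cfg.add_section(config_name)
    if dataformat is not None:
        cfg.set(config_name, 'dataformat', str(dataformat))
    if srmfile is not None:
        cfg.set(config_name, 'srmfile', str(srmfile))
    with open(cfg_path, 'w') as f:
        cfg.write(f)

edit_config could then simply be an alias that requires the section to already exist.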
I get an error when calling eg.calibrate(srms_used=['NIST612', 'NIST610']).
It works for 2 of the runs, but not the other 2. Files have been split by the long data script and then read in with the REPRODUCE config.
ValueError Traceback (most recent call last)
in
21 eg.ratio()
22 eg.trace_plots()
---> 23 eg.calibrate(srms_used=[ 'NIST612', 'NIST610'])
24 eg.calibration_plot()
25 eg.trace_plots()
/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/latools/helpers/logging.py in wrapper(self, *args, **kwargs)
16 @wraps(func)
17 def wrapper(self, *args, **kwargs):
---> 18 a = func(self, *args, **kwargs)
19 self.log.append(func.__name__ + ' :: args={} kwargs={}'.format(args, kwargs))
20 return a
/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/latools/latools.py in calibrate(self, analytes, drift_correct, srms_used, zero_intercept, n_min, reload_srm_database)
1661
1662 if not hasattr(self, 'srmtabs'):
-> 1663 self.srm_id_auto(srms_used=srms_used, n_min=n_min, reload_srm_database=reload_srm_database)
1664
1665 # make container for calibration params
/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/latools/latools.py in srm_id_auto(self, srms_used, analytes, n_min, reload_srm_database)
1579 classifier = KMeans(len(srms_used)).fit(_srmid)
1580 # apply classifier to measured data
-> 1581 std_classes = classifier.predict(_stdid)
1582
1583 # get srm names from classes
/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/sklearn/cluster/_kmeans.py in predict(self, X, sample_weight)
1154 check_is_fitted(self)
1155
-> 1156 X = self._check_test_data(X)
1157 x_squared_norms = row_norms(X, squared=True)
1158 sample_weight = _check_sample_weight(sample_weight, X, dtype=X.dtype)
/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/sklearn/cluster/_kmeans.py in _check_test_data(self, X)
856
857 def _check_test_data(self, X):
--> 858 X = self._validate_data(X, accept_sparse='csr', reset=False,
859 dtype=[np.float64, np.float32],
860 order='C', accept_large_sparse=False)
/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/sklearn/base.py in _validate_data(self, X, y, reset, validate_separately, **check_params)
419 out = X
420 elif isinstance(y, str) and y == 'no_validation':
--> 421 X = check_array(X, **check_params)
422 out = X
423 else:
/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/sklearn/utils/validation.py in inner_f(*args, **kwargs)
61 extra_args = len(args) - len(all_args)
62 if extra_args <= 0:
---> 63 return f(*args, **kwargs)
64
65 # extra_args > 0
/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator)
718
719 if force_all_finite:
--> 720 _assert_all_finite(array,
721 allow_nan=force_all_finite == 'allow-nan')
722
/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/sklearn/utils/validation.py in _assert_all_finite(X, allow_nan, msg_dtype)
101 not allow_nan and not np.isfinite(X).all()):
102 type_err = 'infinity' if allow_nan else 'NaN, infinity'
--> 103 raise ValueError(
104 msg_err.format
105 (type_err,
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
Needs fixing. Not urgent.
In practice doesn't make any difference, as any spikes that would have been removed by the expdecay_despiker are caught by noise_despiker.
I encountered an error when calling eg.calibrate(drift_correct=False, srms_used=['NIST610', 'NIST612', 'JCp-1', 'JCt-1']).
The elemental compositions for JCp-1 and JCt-1 were added to my SRM.csv. Below is the traceback.
I also tried using only 'NIST610' and 'NIST612', but it did not work. I wonder what the "KeyError: 'I'" means. (From the traceback, it looks like decompose_molecule(ad[a_num]) returned a composition that doesn't contain the element symbol 'I' extracted by get_analyte_name(a_num), so the dictionary lookup failed - perhaps the SRM.csv entry for that analyte is missing or misnamed.)
KeyError Traceback (most recent call last)
Input In [25], in
1 # calibration
----> 2 eg.calibrate(drift_correct=False,
3 srms_used=['NIST610', 'NIST612','JCp-1','JCt-1'])
File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\latools\helpers\logging.py:18, in _log.<locals>.wrapper(self, *args, **kwargs)
16 @wraps(func)
17 def wrapper(self, *args, **kwargs):
---> 18 a = func(self, *args, **kwargs)
19 self.log.append(func.__name__ + ' :: args={} kwargs={}'.format(args, kwargs))
20 return a
File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\latools\latools.py:1673, in analyse.calibrate(self, analytes, drift_correct, srms_used, zero_intercept, n_min, reload_srm_database)
1648 """
1649 Calibrates the data to measured SRM values.
1650
(...)
1670 None
1671 """
1672 # load SRM database
-> 1673 self.srm_load_database(srms_used, reload_srm_database)
1675 # compile measured SRM data
1676 self.srm_compile_measured(n_min)
File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\latools\latools.py:1441, in analyse.srm_load_database(self, srms_used, reload)
1438 # calculate SRM polyatom multiplier (multiplier to account for stoichiometry,
1439 # e.g. if internal standard is Na, N will be 2 if measured in SRM as Na2O)
1440 N_denom = float(decompose_molecule(ad[a_denom])[get_analyte_name(a_denom)])
-> 1441 N_num = float(decompose_molecule(ad[a_num])[get_analyte_name(a_num)])
1443 # calculate molar ratio
1444 srmtab.loc[srm, (a, 'mean')] = ((srmdat.loc[(srm, ad[a_num]), 'mol/g'] * N_num) /
1445 (srmdat.loc[(srm, ad[a_denom]), 'mol/g'] * N_denom))
KeyError: 'I'
uncertainties on backgrounds should never be negative...
To allow for on-the-fly updates to SRM table.
Hello.
When initializing an analyse object using latools.analyse(), a ValueError is not raised even when the internal_standard is not in self.analytes.
This is because the code looks like

if internal_standard in self.analytes:
    self.internal_standard = internal_standard
else:
    ValueError('The internal standard ({}) is not amongst the'.format(internal_standard) +
               'analytes in\nyour data files. Please make sure it is specified correctly.')

This should look like

if internal_standard in self.analytes:
    self.internal_standard = internal_standard
else:
    raise ValueError('The internal standard ({}) is not amongst the '.format(internal_standard) +
                     'analytes in\nyour data files. Please make sure it is specified correctly.')

I think I'll soon make a pull request!
Uncertain cause...
Hello.
Since the default of the subset option in la.trace_plots() is 'All_Analyses',
eg.trace_plots(samples=['sample1', 'sample2'])
does not work. It plots all samples, instead of just samples 1 and 2.
In the source, there is

if subset is not None:
    samples = self._get_samples(subset)

If you change this to

if subset != 'All_Analyses':
    samples = self._get_samples(subset)

this code will work.
Hope this helps.
Several early-analysis-stage functions should accept a focus_stage argument, which determines which data they should be applied to.
The default selection should be the normal stage of analysis that the function relies on.
Need to check/implement in:
Need better way of referencing stat functions, especially if they're custom functions...
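One option is a simple registry, so custom functions can be referenced (and logged) by a stable name, just like built-ins; a sketch under that assumption:

import numpy as np

STAT_FNS = {'mean': np.nanmean, 'std': np.nanstd, 'median': np.nanmedian}

def register_stat(name, fn):
    # add a custom stat function under a stable, loggable name
    STAT_FNS[name] = fn

register_stat('mad', lambda x: np.nanmedian(np.abs(x - np.nanmedian(x))))
result = STAT_FNS['mad'](np.array([1.0, 2.0, 100.0]))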
Should return a dict of all elements.