oscarbranson / latools

Tools for the reproducible reduction of LA-ICPMS data.
Home Page: http://latools.readthedocs.io
License: MIT License
Something like this:
import latools as la

# `dat` is an existing la.analyse object, processed through calibration.
# get analytes that aren't the internal standard
non_ref_analytes = [a for a in dat.analytes if a != dat.internal_standard]
# calculate the X/internal_standard ratio in the background data
bkgratio = dat.bkg.raw.divide(dat.bkg.raw.loc[:, dat.internal_standard], axis='rows').loc[:, non_ref_analytes]
# calculate the X/internal_standard standard deviation in the background data
bkgratio_std = bkgratio.std()
# get the calibration parameters for all the analytes
calib_ms = la.nominal_values(dat.calib_params).mean(0)
# calculate the X/internal_standard detection limits (3 sigma)
detlim = 3 * bkgratio_std * calib_ms  # in mol / mol internal_standard (in this case, Si29)
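As a hypothetical follow-up (not latools API), the limits could be used to screen ratio data. Here `sample_ratios` is assumed to be a pandas DataFrame of X/internal_standard ratios with one column per analyte, matching the index of `detlim`:

# hypothetical usage: mask ratio values that fall below the detection limit
above = sample_ratios.gt(detlim, axis='columns')   # boolean: above detection limit
screened = sample_ratios.where(above)              # below-limit values become NaN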
missing srmdat
Cloning https://github.com/oscarbranson/latools.git (to v0.2.2-alpha) to /private/var/folders/f0/sk8ywbjs55l55w_whdnq4mqm0000gn/T/pip-xGeLhx-build
...[omitted]...
running install
running build
running build_py
copying latools/latools_graveyard.py -> build/lib/latools
copying latools/latools.cfg -> build/lib/latools
creating build/lib/latools/resources
copying latools/resources/repro_dataformat.dict -> build/lib/latools/resources
copying latools/resources/SRM_ratios_160713.csv -> build/lib/latools/resources
error: can't copy 'latools/resources/test_data': doesn't exist or not a regular file
Environment:
git version 1.9.3 (Apple Git-50)
Python 2.7
appnope==0.1.0
apptools==4.3.0
backports-abc==0.4
backports.shutil-get-terminal-size==1.0.0
backports.ssl-match-hostname==3.5.0.1
brewer2mpl==1.4.1
certifi==2016.2.28
-e git+https://github.com/enthought/chaco.git@f05f64eb07c2a10935178060086d37420650d873#egg=chaco
-e git+https://github.com/enthought/codetools.git@5b5ebd0d417270560ef892696b6aa23fb5296b7e#egg=codetools
configobj==5.0.6
configparser==3.5.0
decorator==4.0.7
-e git+https://github.com/enthought/encore.git@b008fd33e8197889702f965ec06c39fc44787e1b#egg=encore
entrypoints==0.2.2
-e git+https://github.com/enthought/envisage.git@3e39c6a116f4da44546dcb5d5b52c24507f4b47e#egg=envisage
ffnet==0.8.0
functools32==3.2.3.post2
geojson==1.3.2
gnureadline==6.3.3
-e git+https://github.com/enthought/graphcanvas.git@a1a107108c68cec5efacaaf7c6a7f8199e1abc8c#egg=graphcanvas
h5py==2.5.0
ipykernel==4.3.1
ipython==4.2.0
ipython-genutils==0.1.0
ipywidgets==5.1.5
Jinja2==2.8
jsonschema==2.5.1
jupyter==1.0.0
jupyter-client==4.2.2
jupyter-console==4.1.1
jupyter-core==4.1.0
-e git+https://github.com/enthought/enable-mapping.git@aeae660d9935ba6d2ea3fd9974f8ccd223d4850a#egg=mapping
MarkupSafe==0.23
matplotlib==1.4.2
-e git+https://github.com/Homebrew/homebrew@6322c4787c96943867e0b810ec261c28e5a9c1d6#egg=mayavi
mistune==0.7.2
mlab==1.1.4
mock==1.0.1
mpld3==0.2
nbconvert==4.2.0
nbformat==4.0.1
networkx==1.11
neuronvisio==0.9.1
nose==1.3.4
notebook==4.2.1
numexpr==2.4.3
numpy==1.10.4
pandas==0.17.1
pathlib2==2.1.0
pexpect==4.1.0
pickleshare==0.7.2
plotly==1.8.0
ptyprocess==0.5.1
pyface==5.0.0
Pygments==2.1
pyparsing==2.0.3
python-dateutil==2.4.2
pytz==2015.7
pyzmq==15.2.0
-e git+https://github.com/enthought/qt_binder.git@60059bc5b3732b92df77fe10b3a9e9d259b30683#egg=qt_binder
qtconsole==4.2.1
requests==2.7.0
scikit-learn==0.17.1
-e git+https://github.com/enthought/scimath.git@e2fddcf5c13ffb1c2bd0db830f89bde9769ea747#egg=scimath
scipy==0.17.0
seawater==3.3.4
simplegeneric==0.8.1
singledispatch==3.4.0.3
six==1.10.0
sklearn==0.0
spacepy==0.1.5
terminado==0.6
tornado==4.3
tqdm==4.11.2
traitlets==4.2.1
traits==4.5.0
traitsui==5.0.0
uncertainties==2.4.8.1
widgetsnbextension==1.2.3
xlrd==0.9.3
More elegant & reliable than current algorithm?
Roughly:
Steps to fix:
Needed for gui interaction.
It shouldn't be.
Hello.
Sometimes your data arrays contain only zeros, for various reasons.
For example, in my case, one part looked like this:
'Te125': array([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan])
This kind of thing causes trouble for latools.helpers.helpers.unitpicker(), because its code says

a = abs(a)
n = 0
if a < llim:
    while a < llim:
        a *= 1000
        n += 1

If a is zero, this loops forever.
To avoid this, you can change it to:

a = abs(a)
n = 0
if a == 0:
    raise ValueError("The value is zero.")
elif a < llim:
    while a < llim:
        a *= 1000
        n += 1
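For illustration, here is a self-contained version of the guarded loop (the function name is hypothetical, not part of latools):

def scale_to_unit(a, llim=0.1):
    """Return (scaled value, number of 1000x steps applied), guarding against zero."""
    a = abs(a)
    n = 0
    if a == 0:
        raise ValueError("Cannot pick a unit for a zero value.")
    while a < llim:
        a *= 1000
        n += 1
    return a, n

print(scale_to_unit(3.2e-7))  # -> (0.32, 2): two 1000x steps were needed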
I hope it helps.
Do people want this?
Hello. Thank you for implementing #16!
I used latools.preprocessing.split.long_file() to split my data.
In my case, using total_counts didn't work well, so I changed the code a bit to use the signal from a different element.
Originally, it is

# autorange
bkg, sig, trn, _ = autorange(dat['Time'], dat['total_counts'], **autorange_args)

but I wanted to use the Fe signal, so I did
# autorange
bkg, sig, trn, _ = autorange(dat['Time'], dat['rawdata']['57Fe'], **autorange_args)
Hope this helps someone!
Clustering algos should be 'trained' on aggregated data from many samples, and then applied to the individual samples to identify regions. More robust than individual clustering, as groups will have similar compositions. A rough sketch of the idea is below, followed by the proposed API.
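A minimal sketch of the train-once/apply-per-sample flow, using scikit-learn directly (names and data shapes are illustrative, not the proposed latools API):

import numpy as np
from sklearn.cluster import KMeans

def train_cluster_filter(pooled, n_clusters=2):
    # pooled: (n_points, n_analytes) array aggregated across many samples
    return KMeans(n_clusters=n_clusters, n_init=10).fit(pooled)

def apply_cluster_filter(classifier, sample):
    # label the points of a single sample with the pooled cluster identities
    return classifier.predict(sample)

samples = {'s1': np.random.rand(100, 3), 's2': np.random.rand(80, 3)}
clf = train_cluster_filter(np.vstack(list(samples.values())))
labels = {name: apply_cluster_filter(clf, d) for name, d in samples.items()}

Because every sample is labelled by the same trained classifier, cluster identities stay consistent across samples.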
- analyse.get_focus(subset) needs to accept a subset arg.
- analyse.train_cluster_filter(filter_name, method, **kwargs): function to work with aggregate data from get_focus. Save resulting filter to an analyse-level dict of filters (explicitly tied to subset?).
- analyse.apply_cluster_filter(filter_name, subset): apply the trained classifier to the individual samples. Warn if subset is not the same as the one used for training.
- analyse.crossplot(..., cluster_filter=None) should accept a cluster filter as an arg, and plot the results of the classifier in the crossplot.

Change sklearn -> scikit-learn in requirements and setup.py.
When the ablation profiles are not relatively uniform, autorange (and subsequently find_expcoef and despike) fails.
Hi.
If you use import latools as la at the beginning, you cannot use la.preprocessing.split.long_file().
You have to do from latools import preprocessing as pre at the beginning and then pre.split.long_file(), if you want to use the function.
Hope this helps.
ratio with a different reference analyte, then calibrate. Boom.
Integer line number entries in meta_regex section throw error because of new use of 'isdigit', which requires str input.
Make one.
Output ppm values would be nice.
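One hypothetical way to do the conversion, assuming the internal standard concentration in the sample is known (all names here are illustrative, not latools API):

def molratio_to_ppm(ratio_mol_mol, M_analyte, M_internal, internal_ppm):
    # ratio_mol_mol: mol X / mol internal standard
    # M_analyte, M_internal: molar masses in g/mol
    # internal_ppm: internal standard concentration in the sample (ug/g)
    return ratio_mol_mol * (M_analyte / M_internal) * internal_ppm

# e.g. Mg/Ca of 0.005 mol/mol in calcite (Ca ~ 400,000 ppm) -> ~1200 ppm Mg
mg_ppm = molratio_to_ppm(0.005, 24.305, 40.078, 4e5)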
Re-think structure - put a lot of the work functions in a submodule.
Hello.
In my environment, trace_plot() didn't plot anything in my Jupyter notebook, even with %matplotlib inline.
I solved this by commenting out plt.close(f) on line 3365 of the latools.py file.
Hope this helps.
minimal export data seems to be despiked - it shouldn't be.
Make sure data is copied before despiking.
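A possible guard, assuming the despiker otherwise modifies the raw data in place (the helper name is illustrative, not latools API):

import copy

def despike_safely(rawdata, despike_fn):
    # apply a despiking function to a deep copy, leaving rawdata untouched
    # so minimal export can still use the original values
    return despike_fn(copy.deepcopy(rawdata))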
Hello.
Even if you pass figsize to la.crossplot(), e.g. eg.crossplot(figsize=(24, 24)), the figsize doesn't change.
This is because in latools.helpers.plot.crossplot(), the code says

fig, axes = plt.subplots(nrows=numvar, ncols=numvar,
                         figsize=(12, 12))

If you fix this as

fig, axes = plt.subplots(nrows=numvar, ncols=numvar,
                         figsize=figsize)

then the problem will be fixed.
Hope this helps.
Some data formats can't be passed directly to numpy.genfromtxt.
Needs a pre-processing step in data import.
Need to warn user if n_min might be too high (i.e. a large amount of data is removed).
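A minimal sketch of such a warning (the function and threshold are illustrative assumptions):

import warnings

def check_n_min_removal(n_total, n_kept, threshold=0.5):
    # warn if the n_min cutoff discarded more than `threshold` of the data
    removed = 1 - n_kept / n_total
    if removed > threshold:
        warnings.warn("n_min removed {:.0%} of the data; it may be set too high.".format(removed))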
Calibration currently does everything by element name - mass information is stripped out when matching analyte names to SRM rows. This makes it impossible to calibrate individual isotopes. It would be nice to be able to work with individual isotopes, if values are present in the SRM database.
This should be relatively easy to implement. When matching analytes to SRM compositions, it should perform a 'first pass' match, where it attempts to match all analytes against row names in the SRM table, where isotopes are identified as [mass][name] (e.g. 23Na). A sketch of such a matcher is below, followed by the required changes.
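A sketch of the first-pass match (the regex and helper are illustrative assumptions, not latools code):

import re

ISOTOPE = re.compile(r'^(\d{1,3})([A-Z][a-z]?)$')

def match_srm_row(analyte, srm_rows):
    # first pass: exact isotope match, e.g. '23Na' as a row name
    if analyte in srm_rows:
        return analyte
    # second pass: fall back to the bare element, e.g. 'Na'
    m = ISOTOPE.match(analyte)
    if m and m.group(2) in srm_rows:
        return m.group(2)
    return None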
- srm_load_database to look for [Mass][Analyte] name matches and return srmdat and a dict of analyte-item links.
- srm_id_auto to work with the new srmdat format.
- calibrate to use analyte names instead of element names.

Need a way to exclude dodgy ablations during file splitting (i.e. the targeted region exploded/broke off shortly after ablation started).
Could perhaps be done by excluding 'fragments'/ablations smaller than N data points.
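A rough sketch of that exclusion, assuming a boolean signal mask like the one autorange returns (the function name is illustrative):

import numpy as np

def drop_short_ablations(sig, n_min=20):
    # remove contiguous signal regions shorter than n_min points
    sig = np.asarray(sig, dtype=bool).copy()
    padded = np.concatenate(([False], sig, [False])).astype(int)
    edges = np.flatnonzero(np.diff(padded))
    for s, e in zip(edges[::2], edges[1::2]):
        if e - s < n_min:
            sig[s:e] = False
    return sig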
Need tests to ensure reproducibility of analyses between versions.
Hello.
I think I found a mistake in latools.helpers.helpers.get_date().
In the source,

t = dt.datetime.strftime(datetime, time_format)

should be

t = dt.datetime.strptime(datetime, time_format)

Fixing this worked in my environment.
Hope this helps.
Add support for line-scan, raster and spot imaging.
Functions for:
Dear LaTools Coders,
Thank you for making a useful package that helps reduce LA-ICP-MS data.
I am currently trying to reduce a relatively large run (just over an hour, with standards and unknowns). I have been able to use the long_data file example to split the data and read it in with the REPRODUCE configuration. See first image.
I am now trying to calculate the background, deciding between the interpolation and the Gaussian moving average methods. See the second image for the interpolation-based background calculation. The image looks strange and I am not sure what is happening. I am able to subtract the background and get ratios and calibration plots with NIST612 as the standard.
However, I am not sure if the background function is working correctly for such a long dataset. I was wondering if you would have any insight or thoughts on how to proceed?
Best,
Natasha
e.g. remove data before x or after y time.
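For example, a hypothetical windowing helper (names are illustrative, not latools API):

import numpy as np

def time_window(t, data, t_min=None, t_max=None):
    # keep only points with t_min <= t <= t_max; data is {analyte: array}
    t = np.asarray(t)
    keep = np.ones(t.shape, dtype=bool)
    if t_min is not None:
        keep &= t >= t_min
    if t_max is not None:
        keep &= t <= t_max
    return t[keep], {k: np.asarray(v)[keep] for k, v in data.items()}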
Currently in the GUI, creating the filter_gradient_threshold_percentile filter doesn't cause a new filter name (e.g. '0_Mg24_thresh_below') to be added to the sample's filter list.
If LAtools is working fine for this filter, then it may be an issue on the GUI side.
Must still work with minimal export.
Unsure of cause. Need to look into it.
Shouldn't need to apply pre-processing to long files. Should be able to import them directly along with sample list.
Framework should already be in place with 'split long file'. Could pass output directly to D_obj, instead of saving and re-importing them?
Need functions to add & edit configs. Ideally:

- latools.add_config(config_name, dataformat, SRMfile, save): takes one/both of dataformat (option to copy & save the file internally) and SRMtable. Defaults used if not given?
- latools.edit_config(): modify a particular config. Alias of add_config? Could work in the same way.
- latools.get_config_files(): print locations of the dataformat & SRMtable files.

Tools for creating a dataformat file:

- latools.make_dataformat(): function to walk the user through several prompts & create a dataformat dict for export? (A rough sketch of add_config is below.)
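A rough sketch of what add_config could look like, assuming an ini-style config file with one section per configuration (this layout is an assumption, not the actual latools internals):

import configparser
from pathlib import Path

def add_config(config_name, dataformat=None, srmfile=None, cfg_path=Path('latools.cfg')):
    # create or update a named configuration section
    cfg = configparser.ConfigParser()
    cfg.read(cfg_path)
    if not cfg.has_section(config_name):
        cfg.add_section(config_name)
    if dataformat is not None:
        cfg.set(config_name, 'dataformat', str(dataformat))
    if srmfile is not None:
        cfg.set(config_name, 'srmfile', str(srmfile))
    with open(cfg_path, 'w') as f:
        cfg.write(f)

edit_config could then simply be an alias that requires the section to already exist.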
I get an error when calling eg.calibrate(srms_used=['NIST612', 'NIST610']).
It works for 2 of the runs, but not the other 2. Files have been split by the long data script and then read in with the REPRODUCE config.
ValueError Traceback (most recent call last)
in
21 eg.ratio()
22 eg.trace_plots()
---> 23 eg.calibrate(srms_used=[ 'NIST612', 'NIST610'])
24 eg.calibration_plot()
25 eg.trace_plots()
/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/latools/helpers/logging.py in wrapper(self, *args, **kwargs)
16 @wraps(func)
17 def wrapper(self, *args, **kwargs):
---> 18 a = func(self, *args, **kwargs)
19 self.log.append(func.__name__ + ' :: args={} kwargs={}'.format(args, kwargs))
20 return a
/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/latools/latools.py in calibrate(self, analytes, drift_correct, srms_used, zero_intercept, n_min, reload_srm_database)
1661
1662 if not hasattr(self, 'srmtabs'):
-> 1663 self.srm_id_auto(srms_used=srms_used, n_min=n_min, reload_srm_database=reload_srm_database)
1664
1665 # make container for calibration params
/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/latools/latools.py in srm_id_auto(self, srms_used, analytes, n_min, reload_srm_database)
1579 classifier = KMeans(len(srms_used)).fit(_srmid)
1580 # apply classifier to measured data
-> 1581 std_classes = classifier.predict(_stdid)
1582
1583 # get srm names from classes
/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/sklearn/cluster/_kmeans.py in predict(self, X, sample_weight)
1154 check_is_fitted(self)
1155
-> 1156 X = self._check_test_data(X)
1157 x_squared_norms = row_norms(X, squared=True)
1158 sample_weight = _check_sample_weight(sample_weight, X, dtype=X.dtype)
/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/sklearn/cluster/_kmeans.py in _check_test_data(self, X)
856
857 def _check_test_data(self, X):
--> 858 X = self._validate_data(X, accept_sparse='csr', reset=False,
859 dtype=[np.float64, np.float32],
860 order='C', accept_large_sparse=False)
/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/sklearn/base.py in _validate_data(self, X, y, reset, validate_separately, **check_params)
419 out = X
420 elif isinstance(y, str) and y == 'no_validation':
--> 421 X = check_array(X, **check_params)
422 out = X
423 else:
/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/sklearn/utils/validation.py in inner_f(*args, **kwargs)
61 extra_args = len(args) - len(all_args)
62 if extra_args <= 0:
---> 63 return f(*args, **kwargs)
64
65 # extra_args > 0
/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator)
718
719 if force_all_finite:
--> 720 _assert_all_finite(array,
721 allow_nan=force_all_finite == 'allow-nan')
722
/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/sklearn/utils/validation.py in _assert_all_finite(X, allow_nan, msg_dtype)
101 not allow_nan and not np.isfinite(X).all()):
102 type_err = 'infinity' if allow_nan else 'NaN, infinity'
--> 103 raise ValueError(
104 msg_err.format
105 (type_err,
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
Needs fixing. Not urgent.
In practice doesn't make any difference, as any spikes that would have been removed by the expdecay_despiker are caught by noise_despiker.
I encountered an error when calling eg.calibrate(drift_correct=False, srms_used=['NIST610', 'NIST612', 'JCp-1', 'JCt-1']).
The elemental compositions for JCp-1 and JCt-1 were added to my SRM.csv. Below is the traceback.
I also tried using only 'NIST610' and 'NIST612', but it did not work. I wonder what the "KeyError: 'I'" means. (From the traceback, it looks like decompose_molecule(ad[a_num]) returned a composition that doesn't contain the element symbol 'I' extracted by get_analyte_name(a_num), so the dictionary lookup failed - perhaps the SRM.csv entry for that analyte is missing or misnamed.)
KeyError Traceback (most recent call last)
Input In [25], in
1 # calibration
----> 2 eg.calibrate(drift_correct=False,
3 srms_used=['NIST610', 'NIST612','JCp-1','JCt-1'])
File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\latools\helpers\logging.py:18, in _log.<locals>.wrapper(self, *args, **kwargs)
16 @wraps(func)
17 def wrapper(self, *args, **kwargs):
---> 18 a = func(self, *args, **kwargs)
19 self.log.append(func.__name__ + ' :: args={} kwargs={}'.format(args, kwargs))
20 return a
File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\latools\latools.py:1673, in analyse.calibrate(self, analytes, drift_correct, srms_used, zero_intercept, n_min, reload_srm_database)
1648 """
1649 Calibrates the data to measured SRM values.
1650
(...)
1670 None
1671 """
1672 # load SRM database
-> 1673 self.srm_load_database(srms_used, reload_srm_database)
1675 # compile measured SRM data
1676 self.srm_compile_measured(n_min)
File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\latools\latools.py:1441, in analyse.srm_load_database(self, srms_used, reload)
1438 # calculate SRM polyatom multiplier (multiplier to account for stoichiometry,
1439 # e.g. if internal standard is Na, N will be 2 if measured in SRM as Na2O)
1440 N_denom = float(decompose_molecule(ad[a_denom])[get_analyte_name(a_denom)])
-> 1441 N_num = float(decompose_molecule(ad[a_num])[get_analyte_name(a_num)])
1443 # calculate molar ratio
1444 srmtab.loc[srm, (a, 'mean')] = ((srmdat.loc[(srm, ad[a_num]), 'mol/g'] * N_num) /
1445 (srmdat.loc[(srm, ad[a_denom]), 'mol/g'] * N_denom))
KeyError: 'I'
uncertainties on backgrounds should never be negative...
To allow for on-the-fly updates to SRM table.
Hello.
When initializing an analyse object using latools.analyse(), a ValueError is not raised even when the internal_standard is not in self.analytes.
This is because the code looks like

if internal_standard in self.analytes:
    self.internal_standard = internal_standard
else:
    ValueError('The internal standard ({}) is not amongst the'.format(internal_standard) +
               'analytes in\nyour data files. Please make sure it is specified correctly.')

This should look like

if internal_standard in self.analytes:
    self.internal_standard = internal_standard
else:
    raise ValueError('The internal standard ({}) is not amongst the '.format(internal_standard) +
                     'analytes in\nyour data files. Please make sure it is specified correctly.')

I think I'll soon make a pull request!
Uncertain cause...
Hello.
Since the default of the subset option in la.trace_plots() is 'All_Analyses',
eg.trace_plots(samples=['sample1', 'sample2'])
does not work. It plots all samples, instead of just samples 1 and 2.
In the source, there is

if subset is not None:
    samples = self._get_samples(subset)

If you change this to

if subset != 'All_Analyses':
    samples = self._get_samples(subset)

this code will work.
Hope this helps.
Several early-analysis-stage functions should accept a focus_stage argument, which determines which data they should be applied to.
The default selection should be the normal stage of analysis that the function relies on.
Need to check/implement in:
Need better way of referencing stat functions, especially if they're custom functions...
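One option is a simple registry, so custom functions can be referenced (and logged) by a stable name, just like built-ins; a sketch under that assumption:

import numpy as np

STAT_FNS = {'mean': np.nanmean, 'std': np.nanstd, 'median': np.nanmedian}

def register_stat(name, fn):
    # add a custom stat function under a stable, loggable name
    STAT_FNS[name] = fn

register_stat('mad', lambda x: np.nanmedian(np.abs(x - np.nanmedian(x))))
result = STAT_FNS['mad'](np.array([1.0, 2.0, 100.0]))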
Should return a dict of all elements.