lightr's Introduction

lightr: import spectral data in R

Lifecycle: stable CRAN version R build status Coverage status Reviewed by rOpenSci JOSS paper

There is no standard file format for spectrometry data, and different scientific instrumentation companies use wildly different formats to store spectral data. Vendor proprietary software sometimes has an option to convert those formats into human-readable files such as csv but, in the process, most metadata are lost. However, those metadata are critical to ensure reproducibility (White et al, 2015).

This package aims to offer a unified, user-friendly interface to read UV-VIS reflectance/transmittance/absorbance spectra files from various formats in a single line of code.

Additionally, it provides for the first time a fully free and open source solution to read proprietary spectra file formats on all systems.

🗟 Citation

To cite this package in publications, please use:

Gruson H., White T.E., Maia R., (2019). lightr: import spectral data and metadata in R. Journal of Open Source Software, 4(43), 1857, https://doi.org/10.21105/joss.01857

🔧 Installation

install.packages("lightr")

You can also install the development version from rOpenSci's CRAN-like repository:

install.packages("lightr", repos = "https://dev.ropensci.org")

💻 Usage

Thorough documentation is available with the package, using the usual R syntax ?function or help(function). However, users will probably mainly use two functions:

# Get a data.frame containing all useful metadata from spectra in a folder
lr_get_metadata(where = system.file("testdata/procspec_files", 
                                    package = "lightr"), 
                ext = "ProcSpec")

and

# Get a single dataframe where the first column contains the wavelengths and 
# each following column contains one spectrum (pavo's rspec class)
lr_get_spec(where = system.file("testdata/procspec_files", package = "lightr"),
            ext = "ProcSpec")

lr_get_spec() returns a dataframe that is compatible with pavo's custom S3 class (rspec) and can be used for further analyses using colour vision models.

All supported file formats can also be parsed using the lr_parse_$extension() function where $extension is the lowercase extension of your file. This family of functions returns a list where the first element is the data dataframe and the second element is a vector with relevant metadata.

The only exceptions are .txt and .Transmission files because those extensions are too generic. Users will need to figure out which parser is appropriate in this case. lr_get_metadata() and lr_get_spec() automatically try generic parsers for these files.

Alternatively, you may simply want to convert your spectra into a readable standard format and carry on with your analysis in other software.

In this case, you can run:

# Convert every single ProcSpec file to a csv file with the same name and 
# location
lr_convert_tocsv(where = system.file("testdata/procspec_files", 
                                      package = "lightr"),
                 ext = "ProcSpec")

✔ Supported file formats

This package is still under development but currently supports the following file formats:

OceanOptics

Extension      Parser
jdx            lr_parse_jdx()
ProcSpec       lr_parse_procspec()
spc            lr_parse_spc()
jaz            lr_parse_jaz()
JazIrrad       lr_parse_jazirrad()
Transmission   lr_parse_jaz()
txt            lr_parse_jaz()

Avantes

Extension      Parser
ABS            lr_parse_abs()
ROH            lr_parse_roh()
TRM            lr_parse_trm()
trt            lr_parse_trt()
ttt            lr_parse_ttt()
txt            lr_parse_generic()
DRK            lr_parse_trm()
REF            lr_parse_trm()
IRR8           lr_parse_irr8()
RFL8           lr_parse_rfl8()
Raw8           lr_parse_raw8()

CRAIC

Extension      Parser
txt            lr_parse_generic()
spc            lr_parse_spc()

Others

Extension Parser
csv lr_parse_generic(sep = ",")
dpt lr_parse_generic(sep = ",")

As a fallback, you should always try lr_parse_generic() which offers a flexible and general algorithm that manages to extract data from most files.

If you can't find the best parser for your specific file or if you believe you are using an unsupported format, please open an issue or send me an email.

🌐 Similar projects

  • lightr itself contains some code that was initially forked from pavo, namely the lr_get_spec() function. The code has since been refactored and optimised for speed. pavo differs from lightr in its focus and core functionalities. The main strength of pavo is its comprehensive and user-friendly set of functions to analyse spectral data using colour vision models, while lightr focuses on the data import step.
  • photobiologyInOut also provides functions to import spectral data. The goal of the author is to provide a complete pipeline of spectral data import and analysis using a set of tightly integrated R packages. This however makes it more difficult to use a different tool for a given step of the process. In contrast, lightr aims to be a lightweight package with limited dependencies that focuses on the data import step and lets the user pick their favourite tool for the analysis step (pavo, colourvision, Avicol, etc.).
  • spectrolab

To our knowledge, lightr is the only gratis tool to import some complex file formats such as Avantes (ABS, ROH, TRM, RFL8) or CRAIC (spc) binary files, or OceanOptics .ProcSpec. Because of its user-friendly high-level functions and low-dependency philosophy, lightr may also prove useful for people working with languages other than R.

Contributing

There are plenty of ways you can contribute to lightr. Please visit our contributing guide.

Please note that this package is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

lightr's People

Contributors

bisaloo, danielskatz, maelle

lightr's Issues

Consider moving to tinytest

This is especially interesting for this package:

  • it is good that users are able to run the tests on their own machine since lightr could be very configuration-dependent (hopefully not too much though but it's good to be able to check)
  • lightr sometimes has to change some settings (such as LC) so it's nice to be able to track side effects

Finally, the tests in this package are well organised and relatively simple so a migration might be easier than for other packages.

Allow passing a vector of paths in `where`

library(lightr)

spec <- lr_get_spec(
  c(
    system.file("testdata", package = "lightr"), 
    system.file("testdata", "non_english", package = "lightr")
  ),
  ext = c(
    "TRM", "ttt", "jdx", "jaz", "JazIrrad", "csv", "txt",
    "Transmission", "spc"
  ),
  sep = ","
)
#> 20 files found; importing spectra:
#> Warning in file(file, "r"): cannot open file
#> '/home/hugo/.local/share/R/x86_64-pc-linux-gnu-library/4.3/lightr/testdata/non_english/FMNH6834.00000001.Master.Transmission':
#> No such file or directory
#> Warning in value[[3L]](cond): cannot open the connection
#> Warning in file(file, "r"): cannot open file
#> '/home/hugo/.local/share/R/x86_64-pc-linux-gnu-library/4.3/lightr/testdata/OO_comma.txt':
#> No such file or directory
#> Warning in value[[3L]](cond): cannot open the connection
#> Warning in file(file, "r"): cannot open file
#> '/home/hugo/.local/share/R/x86_64-pc-linux-gnu-library/4.3/lightr/testdata/non_english/OOusb4000.txt':
#> No such file or directory
#> Warning in value[[3L]](cond): cannot open the connection
#> Warning in file(file, "r"): cannot open file
#> '/home/hugo/.local/share/R/x86_64-pc-linux-gnu-library/4.3/lightr/testdata/non_english/OceanView.txt':
#> No such file or directory
#> Warning in value[[3L]](cond): cannot open the connection
#> Warning in file(file, "r"): cannot open file
#> '/home/hugo/.local/share/R/x86_64-pc-linux-gnu-library/4.3/lightr/testdata/OceanView_nonEN.txt':
#> No such file or directory
#> Warning in value[[3L]](cond): cannot open the connection
#> Warning in file(file, "r"): cannot open file
#> '/home/hugo/.local/share/R/x86_64-pc-linux-gnu-library/4.3/lightr/testdata/non_english/UK5.txt':
#> No such file or directory
#> Warning in value[[3L]](cond): cannot open the connection
#> Warning in file(con, "r"): cannot open file
#> '/home/hugo/.local/share/R/x86_64-pc-linux-gnu-library/4.3/lightr/testdata/non_english/avantes_export.ttt':
#> No such file or directory
#> Warning in value[[3L]](cond): cannot open the connection
#> Warning in file(filename, "rb"): cannot open file
#> '/home/hugo/.local/share/R/x86_64-pc-linux-gnu-library/4.3/lightr/testdata/non_english/avantes_trans.TRM':
#> No such file or directory
#> Warning in value[[3L]](cond): cannot open the connection
#> Warning in file(con, "r"): cannot open file
#> '/home/hugo/.local/share/R/x86_64-pc-linux-gnu-library/4.3/lightr/testdata/non_english/irrad.JazIrrad':
#> No such file or directory
#> Warning in value[[3L]](cond): cannot open the connection
#> Warning in file(file, "r"): cannot open file
#> '/home/hugo/.local/share/R/x86_64-pc-linux-gnu-library/4.3/lightr/testdata/non_english/spec.csv':
#> No such file or directory
#> Warning in value[[3L]](cond): cannot open the connection
#> Warning: Could not import one or more files:
#> /home/hugo/.local/share/R/x86_64-pc-linux-gnu-library/4.3/lightr/testdata/non_english/FMNH6834.00000001.Master.Transmission
#> /home/hugo/.local/share/R/x86_64-pc-linux-gnu-library/4.3/lightr/testdata/OO_comma.txt
#> /home/hugo/.local/share/R/x86_64-pc-linux-gnu-library/4.3/lightr/testdata/non_english/OOusb4000.txt
#> /home/hugo/.local/share/R/x86_64-pc-linux-gnu-library/4.3/lightr/testdata/non_english/OceanView.txt
#> /home/hugo/.local/share/R/x86_64-pc-linux-gnu-library/4.3/lightr/testdata/OceanView_nonEN.txt
#> /home/hugo/.local/share/R/x86_64-pc-linux-gnu-library/4.3/lightr/testdata/non_english/UK5.txt
#> /home/hugo/.local/share/R/x86_64-pc-linux-gnu-library/4.3/lightr/testdata/non_english/avantes_export.ttt
#> /home/hugo/.local/share/R/x86_64-pc-linux-gnu-library/4.3/lightr/testdata/non_english/avantes_trans.TRM
#> /home/hugo/.local/share/R/x86_64-pc-linux-gnu-library/4.3/lightr/testdata/non_english/irrad.JazIrrad
#> /home/hugo/.local/share/R/x86_64-pc-linux-gnu-library/4.3/lightr/testdata/non_english/spec.csv

head(spec)
#>    wl CRAIC_export FMNH6834.00000002.Master J_MUR_MARS_17_0001 OceanOptics
#> 1 300     5.710900                 3.178905           2.328048    125.9485
#> 2 301     5.794053                 2.940429           2.238900    125.0620
#> 3 302     4.326612                 2.697714           2.313921    127.3013
#> 4 303     4.178850                 2.544286           2.565386    127.9036
#> 5 304     3.671765                 2.756619           2.720976    127.9396
#> 6 305     2.784688                 2.816048           2.583169    128.0920
#>   OceanOptics_comma OceanOptics_period  avantes2 avantes_export_long  avasoft8
#> 1          126.3405           154.3030 12.964709           13.624678 252.16336
#> 2          127.2560           129.2981  8.262921            5.276307   0.00000
#> 3          126.8862           115.9120 11.605302           11.023560  74.36793
#> 4          126.2224           122.5818 13.189732           10.307208  34.58589
#> 5          127.8413           125.9307 15.073677            8.980705 101.20355
#> 6          127.9555           152.3136 14.371996            8.278200   0.00000
#>    jazspec
#> 1 13.78756
#> 2 13.10873
#> 3 12.70862
#> 4 12.93509
#> 5 13.02466
#> 6 13.64287

warnings()

Created on 2023-06-09 with reprex v2.0.2
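
Until `where` accepts several paths, one possible workaround is to import each folder separately and join the results on the wavelength column. The sketch below uses small stand-in data frames instead of real lr_get_spec() output, and merge_specs() is a hypothetical helper, not a lightr function.

```r
# Stand-in data for what one lr_get_spec() call per folder might return.
# With real files, this could be something like:
#   specs <- lapply(paths, lr_get_spec, ext = "ttt")
spec_a <- data.frame(wl = 300:302, sample_a = c(5.7, 5.8, 4.3))
spec_b <- data.frame(wl = 300:302, sample_b = c(3.2, 2.9, 2.7))

# Hypothetical helper (not part of lightr): join all spectra on "wl",
# keeping wavelengths that appear in any of the inputs
merge_specs <- function(specs) {
  Reduce(function(x, y) merge(x, y, by = "wl", all = TRUE), specs)
}

combined <- merge_specs(list(spec_a, spec_b))
names(combined)
```

Note that merge() returns a plain data.frame, so any rspec class attribute would have to be restored afterwards.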

Unparseable TRM files

For some reason, some TRM files don't seem to work fine with the parser.

The processed column appears to contain the correct data but it is truncated (on both sides) and the wavelengths are wrong.

Example file is attached

raw_avantes <- parse_trm("B_PIR_REP_AV2013_0001.TRM")[[1]]
txt_avantes <- get_spec(".", ext = "ttt")

plot(txt_avantes)
plot(raw_avantes$processed)

bluetit.zip

Set "processed" to 0 when "dark" > "white"?

When "dark" > "white", AvaSoft sets the "processed" column to 0. Should we do the same?

It makes sense from a theoretical point of view (the background noise cannot be higher than the reference). But then why not set it to 0 when "dark" > "scope" as well? (AvaSoft doesn't do this)
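
To make the question concrete, here is a minimal sketch of what the rule could look like. It assumes the usual normalisation (scope - dark) / (white - dark) * 100 for the processed column, which may not match exactly what lightr or AvaSoft compute. compute_processed() is a hypothetical name.

```r
# Hypothetical helper (not part of lightr). Assumes "processed" is the
# sample signal normalised by the white and dark references, in percent.
compute_processed <- function(scope, dark, white) {
  processed <- (scope - dark) / (white - dark) * 100
  # Proposed rule: mimic AvaSoft and return 0 where dark exceeds white
  processed[dark > white] <- 0
  processed
}

# Second wavelength has dark > white, so it is forced to 0
compute_processed(scope = c(50, 30), dark = c(10, 40), white = c(90, 20))
```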

cc @thomased

Should lr_get_spec() drop columns outside of `lim` or simply return NA?

library(lightr)

lr_get_spec(system.file("testdata", package = "lightr"), ext = "ttt", lim = c(100, 200))
#> 2 files found; importing spectra:
#> Warning in min(bounds): no non-missing arguments to min; returning Inf
#> Warning in max(bounds): no non-missing arguments to max; returning -Inf
#> Warning in min(bounds): no non-missing arguments to min; returning Inf
#> Warning in max(bounds): no non-missing arguments to max; returning -Inf
#> Warning: File import failed.
#> Check input files and function arguments.
#> NULL

Created on 2020-02-16 by the reprex package (v0.3.0)

Alternative output would be a 101x3 rspec object with c("wl", filenames) as column names and the second and third column filled with NA.
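
A minimal sketch of that alternative output, in base R. empty_spec() is a hypothetical helper and the column names are only illustrative; the rspec class is not set here.

```r
# Hypothetical helper (not part of lightr): build a data frame with one
# row per integer wavelength in `lim` and one NA-filled column per file
empty_spec <- function(lim, filenames) {
  out <- data.frame(wl = seq(lim[1], lim[2]))
  for (f in filenames) {
    out[[f]] <- NA_real_
  }
  out
}

res <- empty_spec(c(100, 200), c("avantes_export", "avantes_export_long"))
dim(res)
```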

Release lightr 1.6.2

Prepare for release:

Submit to CRAN:

  • usethis::use_version('patch')
  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Accepted 🎉
  • usethis::use_github_release()
  • usethis::use_dev_version()

Review pkg verbosity

I had to silence quite a lot of warnings in the tests and I am wondering if we went overboard with the messages and warnings. It'd be good to review this at some point and either remove some of them or provide global options to set the verbosity level.

Release lightr 1.6.0

Prepare for release:

  • Check current CRAN check results
  • Polish NEWS
  • urlchecker::url_check()
  • devtools::check(remote = TRUE, manual = TRUE)
  • devtools::check_win_devel()
  • rhub::check_for_cran()
  • revdepcheck::revdep_check(num_workers = 4)
  • Update cran-comments.md
  • Review pkgdown reference index for, e.g., missing topics
  • Draft blog post

Submit to CRAN:

  • usethis::use_version('minor')
  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Accepted 🎉
  • usethis::use_github_release()
  • usethis::use_dev_version()
  • Finish blog post
  • Tweet
  • Add link to blog post in pkgdown news menu

Make parallelisation work on Windows

It's currently a WIP in the psock_clusters branch but I have namespace issues that make it fail.

Additionally, Windows Firewall shows a scary pop-up when PSOCK clusters are used, so it may not be worth it because users might not use it.

Release lightr 1.7.0

Prepare for release:

  • Check current CRAN check results
  • Polish NEWS
  • urlchecker::url_check()
  • devtools::check(remote = TRUE, manual = TRUE)
  • devtools::check_win_devel()
  • rhub::check_for_cran()
  • revdepcheck::revdep_check(num_workers = 4)
  • Update cran-comments.md
  • Review pkgdown reference index for, e.g., missing topics
  • Draft blog post
  • rhub::check(platform = "debian-gcc-devel-nold")

Submit to CRAN:

  • usethis::use_version('minor')
  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Accepted 🎉
  • usethis::use_github_release()
  • usethis::use_dev_version()
  • Finish blog post
  • Tweet
  • Add link to blog post in pkgdown news menu

Release lightr 1.4

Prepare for release:

  • Check current CRAN check results
  • Polish NEWS
  • urlchecker::url_check()
  • devtools::check(remote = TRUE, manual = TRUE)
  • devtools::check_win_devel()
  • rhub::check_for_cran()
  • revdepcheck::revdep_check(num_workers = 4)
  • Update cran-comments.md
  • Review pkgdown reference index for, e.g., missing topics
  • Draft blog post

Submit to CRAN:

  • usethis::use_version('minor')
  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Accepted 🎉
  • usethis::use_github_release()
  • usethis::use_dev_version()
  • Finish blog post
  • Tweet
  • Add link to blog post in pkgdown news menu

Dates are only parsed when `LC_TIME=en_US.UTF-8`

in parse_oceanoptics_converted.R.

Solutions:

  • use anytime.
    • pros: simpler interface, works with all locales
    • cons: additional BH dependency
  • use withr.
    • pros: no dependencies, can be used to specify other options locally
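
As an illustration of the second option, the same idea can be sketched in base R: switch LC_TIME temporarily while parsing and restore it on exit, which is roughly what withr::with_locale() does. The function name and the date format below are assumptions for the example, not the ones used in parse_oceanoptics_converted.R.

```r
# Hypothetical helper (not part of lightr): parse a date string with
# English day/month names, whatever the user's locale is
parse_date_c_locale <- function(x, format) {
  old <- Sys.getlocale("LC_TIME")
  # Restore the user's locale even if parsing fails
  on.exit(Sys.setlocale("LC_TIME", old), add = TRUE)
  Sys.setlocale("LC_TIME", "C")
  as.POSIXct(strptime(x, format = format, tz = "UTC"))
}

# Illustrative date string and format (assumed, not taken from a real file)
parsed <- parse_date_c_locale(
  "Tue Jun 09 10:15:00 2020",
  "%a %b %d %H:%M:%S %Y"
)
format(parsed, "%Y-%m-%d")
```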

Release lightr 1.5.0

Prepare for release:

  • Check current CRAN check results
  • Polish NEWS
  • urlchecker::url_check()
  • devtools::check(remote = TRUE, manual = TRUE)
  • devtools::check_win_devel()
  • rhub::check_for_cran()
  • revdepcheck::revdep_check(num_workers = 4)
  • Update cran-comments.md
  • Review pkgdown reference index for, e.g., missing topics
  • Draft blog post

Submit to CRAN:

  • usethis::use_version('minor')
  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Accepted 🎉
  • usethis::use_github_release()
  • usethis::use_dev_version()
  • Finish blog post
  • Tweet
  • Add link to blog post in pkgdown news menu

Re-enable hash tests on 32-bit machines

I disabled hash tests on 32-bit machines in a26fed7. For some reason, hash computation differs on i386 on winbuilder on rhub (rhub::check_on_windows(check_args = "--force-multiarch")):

set.seed(20200328)
expect_known_hash(
  rnorm(100),
  "c304db5d84"
)

On i386:

Value hashes to d0a5b3ad98, not c304db5d84

It's not inherent to all 32-bit systems though, since the above test works fine on a Raspberry Pi.

Release lightr 1.6.1

Prepare for release:

  • Check current CRAN check results
  • Polish NEWS
  • urlchecker::url_check()
  • devtools::check(remote = TRUE, manual = TRUE)
  • devtools::check_win_devel()
  • rhub::check_for_cran()
  • revdepcheck::revdep_check(num_workers = 4)
  • Update cran-comments.md
  • rhub::check(platform = "debian-gcc-devel-nold")
  • rhub::check_on_solaris()

Submit to CRAN:

  • usethis::use_version('patch')
  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Accepted 🎉
  • usethis::use_github_release()
  • usethis::use_dev_version()

Release lightr 1.4

Prepare for release:

  • Check current CRAN check results
  • Polish NEWS
  • urlchecker::url_check()
  • devtools::check(remote = TRUE, manual = TRUE)
  • devtools::check_win_devel()
  • rhub::check_for_cran()
  • revdepcheck::revdep_check(num_workers = 4)
  • Update cran-comments.md
  • Review pkgdown reference index for, e.g., missing topics
  • Draft blog post
  • rhub::check(platform = "debian-gcc-devel-nold")
  • rhub::check_on_solaris()

Submit to CRAN:

  • usethis::use_version('minor')
  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Accepted 🎉
  • usethis::use_github_release()
  • usethis::use_dev_version()
  • Finish blog post
  • Tweet
  • Add link to blog post in pkgdown news menu

More informative error when wl is out of range in `lr_get_spec()`

library(lightr)

lr_get_spec(system.file("testdata", package = "lightr"), ext = "ttt", lim = c(100, 299))
#> 2 files found; importing spectra:
#> Warning in min(bounds): no non-missing arguments to min; returning Inf
#> Warning in max(bounds): no non-missing arguments to max; returning -Inf
#> Warning: Could not import one or more files:
#> /home/hugo/R/x86_64-pc-linux-gnu-library/3.6/lightr/testdata/avantes_export.ttt
#>      wl avantes_export_long
#> 1   100                  NA
#> 2   101                  NA
#> 3   102                  NA
#> 4   103                  NA
#> 5   104                  NA
#> ...

Instead of the generic "Could not import" warning, the message should say "Wavelength range is outside of limits..."
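
A minimal sketch of such a check, assuming it runs on the wavelength vector of each file before interpolation. check_wl_range() is a hypothetical name, not an existing lightr function.

```r
# Hypothetical helper (not part of lightr): warn with a specific message
# when the file's wavelengths do not overlap the requested limits
check_wl_range <- function(wl, lim) {
  if (max(wl) < lim[1] || min(wl) > lim[2]) {
    warning("Wavelength range is outside of limits", call. = FALSE)
    return(FALSE)
  }
  TRUE
}

# No overlap between 300-700 nm data and the requested 100-299 nm window
tryCatch(
  check_wl_range(300:700, lim = c(100, 299)),
  warning = function(w) conditionMessage(w)
)
```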

Make convert_tocsv export metadata as well

At the moment, when you use convert_tocsv() on example.jdx, you get example.csv as an output.

It would probably be useful if convert_tocsv() returned both example.csv and example_metadata.csv instead.
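
One way this could look, sketched in base R. It assumes the parser output is a list whose first element is the data and whose second element is a metadata vector. export_with_metadata() and the stand-in values are hypothetical.

```r
# Hypothetical helper (not part of lightr): write the spectrum and its
# metadata to two csv files that share the same base name
export_with_metadata <- function(parsed, basename, dir = tempdir()) {
  data_file <- file.path(dir, paste0(basename, ".csv"))
  meta_file <- file.path(dir, paste0(basename, "_metadata.csv"))
  write.csv(parsed[[1]], data_file, row.names = FALSE)
  write.csv(data.frame(value = parsed[[2]]), meta_file, row.names = FALSE)
  c(data_file, meta_file)
}

# Stand-in for the output of a parser (values are made up)
parsed <- list(
  data.frame(wl = 300:302, processed = c(5.7, 5.8, 4.3)),
  c("user", "2020-01-01 10:00:00", "spec_id")
)
files <- export_with_metadata(parsed, "example")
basename(files)
```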

lr_get_spec values all NAs

I'm trying to convert multiple .csv files to a matrix for downstream analysis using lr_get_spec(); however, all values are returned as NA.

Script used
spec <- lr_get_spec( where = getwd(), ext = "csv", lim = c(450,4500), sep = ",", subdir = FALSE, subdir.names = FALSE, ignore.case = TRUE, interpolate = TRUE )

Attaching zip of .csv files.
a4_bulk(12).zip

Thanks!

lr_get_spec with csv mixed with others

lr_get_spec() has challenges when using multiple extensions.

When 'csv' files are included with others, e.g. 'Master.Transmission', we have a problem because 'csv' requires the sep = ',' argument.

For:

extension <- c('csv', 'Master.Transmission', 'ProcSpec') 

you would need:

lr_get_spec(ext = extension, sep = ',') # .Master.Transmission and ProcSpec not affected by sep argument

But this may affect how other file types are read.
