Comments (14)
We have already gone one step further (#44) and tried to go directly via libmono-2.so, similar to what Python projects such as alphapept do.
It is remarkable that, in my opinion, the file IO has much less impact than we expected. (I assume every computer uses SSD technology when dealing with super expensive Orbitrap data.) For small amounts of data, the impact of initializing the file handle seems much higher.
Concerning data.table::fread, my philosophy is to use base R as much as possible to make the package easy to install and maintain.
from rawrr.
Below is my experience. The RAW file is only 35 MB, and it takes almost 110 seconds.
I tried the EH4547 demo, but it failed with ASSEMBLY PROBLEM when running R$createObject().
My primary idea is to redirect the rawrr.exe output to random-access memory (RAM) and bypass the need to write and read a file. Writing to a file has much more impact on non-SSD disks.
from rawrr.
@asepsiswu: Please don't feel offended. @cpanse is afraid of package dependencies. 👻 I once had to remove code that was using a dplyr function and look for ways to do it in base R.
But why don't we check whether data.table is available on the system? If yes, use it; if not, use the base R read.table. At 100 Hz (Astral) we should already expect 360'000 spectra per hour of recording time. If fread is really faster, it might matter at some point.
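The optional-dependency idea could look roughly like the following. This is a minimal sketch, not rawrr code: `readTableFast` is a hypothetical helper name, and the semicolon-separated test file is made up. It uses `data.table::fread` only when the package is installed and otherwise falls back to base R `read.table`.

```r
# Hypothetical helper (not part of rawrr): prefer data.table::fread when the
# package is installed, otherwise fall back to base R read.table.
readTableFast <- function(file, sep = ";") {
  if (requireNamespace("data.table", quietly = TRUE)) {
    # data.table = FALSE makes fread return a plain data.frame
    data.table::fread(file = file, sep = sep, header = TRUE,
                      data.table = FALSE)
  } else {
    read.table(file, sep = sep, header = TRUE, stringsAsFactors = FALSE)
  }
}

# Usage with a small made-up semicolon-separated file
tf <- tempfile(fileext = ".txt")
writeLines(c("scan;rtinseconds", "1;0.5", "2;1.0"), tf)
DF <- readTableFast(tf)
```

Because the check happens at run time, data.table would only need to be a suggested package rather than a hard dependency.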
from rawrr.
data.table is an R package with no dependencies on other packages, and it performs really well. It is worth a try.
I agree with @cpanse that we should use base R as much as possible to make the package easy to install and maintain.
from rawrr.
That sounds very interesting! Having a text buffer that connects the C# with the R level would save a write and read operation. @cpanse I think you should really consider this solution!
from rawrr.
R> rbenchmark::benchmark({.readIndexI70(f) |> nrow() -> dump}, replications = 10)
                                   test replications elapsed relative user.self sys.self user.child sys.child
1 {\n dump <- nrow(.readIndexI70(f))\n}           10  34.621        1     0.769    0.516     35.272     3.772
R> rbenchmark::benchmark({rawrr::readIndex(f) |> nrow() -> dump}, replications = 10)
                                      test replications elapsed relative user.self sys.self user.child sys.child
1 {\n dump <- nrow(rawrr::readIndex(f))\n}           10  35.044        1     0.293    0.038     35.272     4.101
R> file.size(f) / 1024^3
[1] 2.030649
R> .readIndexI70
function(rawfile, tmpdir = tempdir()){
  # check that the rawrr assembly and the raw file are usable
  rawrr:::.isAssemblyWorking()
  rawrr:::.checkRawFile(rawfile)

  # on macOS and Linux the assembly is run through mono
  mono <- Sys.info()['sysname'] %in% c("Darwin", "Linux")
  exe <- rawrr:::.rawrrAssembly()
  # temporary file name, passed as the last argument to the executable
  tfstdout <- tempfile(tmpdir = tmpdir)

  # capture stdout of the child process and wrap it in a text connection
  if (mono){
    con <- system2(Sys.which("mono"),
                   args = c(shQuote(exe), shQuote(rawfile),
                            "index", shQuote(tfstdout)),
                   stdout = TRUE) |> textConnection()
  }else{
    con <- system2(exe,
                   args = c(shQuote(rawfile), "index", shQuote(tfstdout)),
                   stdout = TRUE) |> textConnection()
  }
  # read.table does not close a connection that is already open
  on.exit(close(con))

  DF <- read.table(con,
                   header = TRUE,
                   comment.char = "#",
                   sep = ';',
                   na.strings = "-1",
                   colClasses = c('integer', 'character', 'numeric', 'numeric', 'character',
                                  'integer', 'integer', 'integer', 'numeric'))
  DF$dependencyType <- as.logical(DF$dependencyType)
  DF
}
<bytecode: 0x122ae75b8>
Yes, it is more elegant, and we used pipe in rawDiag. So far, it does not seem to improve the read performance on my Apple M1: reading a 2 GB raw file 10 times saved less than 0.5 seconds in total.
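For comparison, the `pipe()` route can be sketched as follows. This is only an illustration under stated assumptions: the child process is an `Rscript` one-liner standing in for the rawrr executable, and the two-column table it prints is made up.

```r
# Stand-in for the rawrr executable (assumption): an Rscript one-liner that
# prints a small semicolon-separated table to stdout.
rscript <- file.path(R.home("bin"), "Rscript")
expr <- 'cat("scan;rt\\n1;0.5\\n2;1.0\\n")'
cmd <- paste(shQuote(rscript), "-e", shQuote(expr))

# pipe() returns an unopened connection; read.table opens it, reads the
# child's stdout directly, and closes it again. No intermediate file is used.
DF <- read.table(pipe(cmd),
                 header = TRUE,
                 sep = ";",
                 colClasses = c("integer", "numeric"))
```

Compared with `system2(..., stdout = TRUE) |> textConnection()`, this avoids holding the complete output as a character vector before parsing.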
from rawrr.
The readIndex function generates much smaller intermediate files compared to readSpectrum. The latter, readSpectrum, exhibits poorer performance, especially when used with non-SSD (Solid State Drive) disks.
Based on my experience, data.table::fread demonstrates significantly faster performance compared to read.table, especially when dealing with large files.
from rawrr.
@cpanse: I don't share your assumption. We should also think about the people who have fewer resources and still depend on conventional hard disks!
from rawrr.
@asepsiswu Have you considered the following?
- set tmpdir = "/dev/shm/"
- set mode = "barebone" when using readSpectrum
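A minimal sketch of the two suggestions above. `ramTmpdir` is a hypothetical helper, not part of rawrr, and the `readSpectrum` call is shown only as a comment because it needs a raw file and a working rawrr assembly; its `tmpdir` and `mode` arguments are assumed from this thread.

```r
# Hypothetical helper: use a RAM-backed directory when one is available and
# writable (for example /dev/shm on many Linux systems), otherwise tempdir().
ramTmpdir <- function(candidate = "/dev/shm") {
  if (dir.exists(candidate) && file.access(candidate, mode = 2) == 0) {
    candidate
  } else {
    tempdir()
  }
}

tmp <- ramTmpdir()

# Assumed call, not run here:
# S <- rawrr::readSpectrum(rawfile, scan = 1:10, tmpdir = tmp, mode = "barebone")
```

On systems without /dev/shm (macOS, Windows) the helper simply returns the regular session temporary directory, so the call stays portable.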
from rawrr.
But why would you want to read all Astral spectra? Solve #44, and that issue and many more are gone.
from rawrr.
It seems not good.
from rawrr.
@asepsiswu Of note, rawrr::readSpectrum does not use read.table as rawrr::readIndex does; it uses source to parse R code.
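The difference between the two parsing routes can be illustrated with a toy example. Both files below are made up for illustration; in particular, the generated R code does not reproduce the real rawrr output format.

```r
# Route 1: tabular text parsed with read.table
tab <- tempfile(fileext = ".txt")
writeLines(c("scan;rtinseconds", "1;0.5", "2;1.0"), tab)
idx <- read.table(tab, header = TRUE, sep = ";")

# Route 2: generated R code evaluated with source(); every line has to be
# parsed and evaluated by the R interpreter.
rcode <- tempfile(fileext = ".R")
writeLines(c(
  "e$Spectrum[[1]] <- list(scan = 1L, mZ = c(100.1, 200.2), intensity = c(10, 20))",
  "e$Spectrum[[2]] <- list(scan = 2L, mZ = c(150.5), intensity = c(5))"
), rcode)

e <- new.env()
e$Spectrum <- list()
# local = TRUE evaluates the file where `e` is defined
source(rcode, local = TRUE)
spectra <- e$Spectrum
```

The second route handles nested, variable-length records naturally, but switching to a faster table reader would not affect it.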
from rawrr.
@asepsiswu And what if you also set the parameter mode = "barebone"?
from rawrr.
I need the noise information. The IO seems to have less impact. If possible, reading the data directly from the C# interface into R would be the best solution.
from rawrr.