sl-solution / dlmreader.jl
High-performance delimited-file reader and writer for Julia
License: MIT License
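It seems that the "Read from buffer" information in the parsing warning ignores the informat: it shows the value after the informat has been applied, not the raw text from the file.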
julia> using DLMReader
julia> function INFMT!(x)
           r = findfirst(r"GMT+.*", x)
           remove!(x, r)
       end
INFMT! (generic function with 1 method)
julia> register_informat(INFMT!)
[ Info: Informat INFMT! has been registered
julia> filereader(IOBuffer("""date
12:20:39 GMT+000
"""), informat=Dict(1=>INFMT!), dtformat=Dict(1=>dateformat"H:M:S"))
┌ Warning: There are problems with parsing the input file at line 2 (observation 1) :
│ Column 1 : date::Time : Read from buffer ("12:20:39 ") ##################HERE#####################
│ the values are set as missing.
│ MORE DETAILS:
│ date::Time = missing
│ 12:20:39
└ @ DLMReader C:\Users\msol658\.julia\dev\DLMReader\src\util.jl:940
1×1 Dataset
 Row │ date
     │ identity
     │ Time?
─────┼──────────
   1 │ missing
Hi!
Currently in DLMReader/InMemoryDatasets, floats very close to zero (magnitude below 5e-324) are parsed as missing values. It seems to me that the preferred behaviour would be to parse them as 0.0 (as CSV/DataFrames does), so that they can be distinguished from "real" missing values. Keeping the warning (unlike CSV) is probably a good idea, though. What do you think?
Thanks a lot!
I cannot use informats to parse hex strings into an integer column, because the decimal representation is often longer than the original string.
I tried:
function hex2int!(str)
    val = parse(Int64, str, base=16)
    val_str = repr(val)
    setindex!(str, val_str)
    return str
end
I can register this function, but it gives wrong results. For example, the string "6c1" is converted to 172, when it should have been converted to 1729; the result appears to be cut off at the length of the original string.
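To make the length mismatch concrete, here is a minimal sketch in plain Base Julia. It does not use DLMReader's informat machinery at all, and `hex_to_int` is a hypothetical helper name chosen for illustration. It only shows that the decimal text of a hex value can need more characters than the hex text, which is why writing the result back into the same fixed-length buffer can lose digits.

```julia
# Hypothetical helper (not part of DLMReader): parse a hex string into Int64,
# returning `missing` when the text is not valid hexadecimal.
function hex_to_int(s::AbstractString)
    return something(tryparse(Int64, s, base = 16), missing)
end

hex = "6c1"
val = hex_to_int(hex)        # 6*256 + 12*16 + 1
dec = string(val)

println(hex, " -> ", dec)
println("hex length = ", length(hex), ", decimal length = ", length(dec))

# Invalid input yields `missing` instead of throwing.
println(hex_to_int("xyz"))
```

One possible workaround, assuming it fits the use case, is to read the column as strings and apply a conversion like this after the file has been loaded.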
@JuliaRegistrator register
As discussed here https://discourse.julialang.org/t/how-do-i-know-if-a-package-is-good/82133, it would be nice if this package's README linked to some alternatives: CSV.jl, DelimitedFiles.jl
I realise everyone likes to advertise what their package is good at, but sometimes users find the wrong one for their needs first. This package's name is close to Matlab's dlmread, so some people may find it who really just need the standard library's DelimitedFiles.jl.
Cc @juliohm from discourse thread.
Hi,
I am struggling to properly read a csv file using DLMReader. I get a lot of warnings.
For example, I am trying to read the dataset called "Air Quality" (City of New York, CSV) that can be found at:
https://catalog.data.gov/dataset
I am simply doing filereader("Air_Quality.csv").
Am I doing something wrong?
Thank you
Are comments in the CSV file supported?
Hi There,
The following example from the package documentation runs very slowly on Windows.
julia> using InMemoryDatasets
julia> ds = Dataset(rand([1.1,2.2,3.4], 100, 100000), :auto);
julia> filewriter("_tmp.csv", ds, buffsize = 2^25, lsize = 500000);
julia> @time ds = filereader("_tmp.csv", buffsize = 2^21, lsize = 2^20, types = fill(Float64, 10^5));
1.163346 seconds (900.02 k allocations: 180.966 MiB)
julia> @time ds = filereader("_tmp.csv", buffsize = 2^21, lsize = 2^20, guessingrows = 2);
1.803125 seconds (4.10 M allocations: 289.193 MiB, 2.86% gc time)
This is caused by float parsing. Parsing floats in Base seems to be very slow on Windows:
parse(Float64, "32423") is about 20 times slower than parse(Int, "32423") on Windows (on macOS they take almost the same time).
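A rough way to check this ratio on a given machine is sketched below. It uses only Base Julia; `time_parse` is a hypothetical helper written for this comparison, and the numbers it prints depend on the platform and Julia version, so it is a quick sanity check, not a careful benchmark.

```julia
# Hypothetical helper: time `n` calls of parse(T, s) and return the elapsed
# seconds together with an accumulated sum, so the loop is not optimized away.
function time_parse(::Type{T}, s::AbstractString, n::Integer) where {T}
    parse(T, s)                      # warm-up call, triggers compilation
    acc = zero(T)
    t0 = time_ns()
    for _ in 1:n
        acc += parse(T, s)
    end
    elapsed = (time_ns() - t0) / 1e9
    return elapsed, acc
end

n = 100_000
t_float, _ = time_parse(Float64, "32423", n)
t_int, _ = time_parse(Int, "32423", n)

println("Float64: ", t_float, " s")
println("Int:     ", t_int, " s")
println("ratio:   ", t_float / t_int)
```

For more reliable timings, a dedicated benchmarking package would be a better fit than a hand-written loop.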
Hi,
It is in the title.
I was wondering if it is possible to add support for compressed files such as .gz and .zip (like data.table or CSV.jl)?
Thank you