dlmreader.jl's People

Contributors

giantmoa, sl-solution

Forkers

giantmoa

dlmreader.jl's Issues

An issue with showing warnings

It seems that the "Read from buffer" information shown in the warning ignores the informat.

julia> using DLMReader

julia> function INFMT!(x)
           r=findfirst(r"GMT+.*", x)
           remove!(x, r)
       end
INFMT! (generic function with 1 method)

julia> register_informat(INFMT!)
[ Info: Informat INFMT! has been registered
julia> filereader(IOBuffer("""date
       12:20:39 GMT+000
       """), informat=Dict(1=>INFMT!), dtformat=Dict(1=>dateformat"H:M:S"))
┌ Warning: There are problems with parsing the input file at line 2 (observation 1) :
│ Column 1 : date::Time : Read from buffer ("12:20:39        ")  ##################HERE#####################
│  the values are set as missing.
│ MORE DETAILS:
│ date::Time = missing12:20:39
└ @ DLMReader C:\Users\msol658\.julia\dev\DLMReader\src\util.jl:940
1×1 Dataset
 Row │ date
     │ identity
     │ Time?
─────┼──────────
   1 │ missing
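As a side note on the informat above, here is a minimal sketch (Base Julia only, no DLMReader calls) of what the regex in INFMT! matches on the sample line. The variable names are illustrative. In r"GMT+.*" the `+` is a quantifier on `T` rather than a literal plus sign; on this input both spellings happen to select the same range, so this does not explain the warning itself.

```julia
# Sample value from the reproduction above.
line = "12:20:39 GMT+000"

# Pattern as written in the issue: `T+` means "one or more T".
r_issue = findfirst(r"GMT+.*", line)

# Variant with an escaped, literal plus sign.
r_literal = findfirst(r"GMT\+.*", line)

println("issue pattern:   ", r_issue)
println("literal pattern: ", r_literal)
println("matched text:    ", repr(line[r_issue]))
```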

small floats

Hi!
Currently in DLMReader/InMemoryDatasets, floats very close to zero (below 5e-324) are parsed as missing values. It seems to me that the preferred behaviour would be to parse them as 0.0 (as CSV/DataFrames does), so that they can be distinguished from "real" missing values. Keeping the warning (unlike CSV) is probably a good idea, though. What do you think?

Thanks a lot!

Informats cannot parse hex values

I cannot use informats to parse hex strings into an integer column, because the decimal representation is often longer than the original string.

I tried:

function hex2int!(str)
    val = parse(Int64, str, base=16)
    val_str = repr(val)
    setindex!(str, val_str)
    return str
end

which I can register, but it gives wrong results: e.g. the string "6c1" is converted to 172, but it should have been converted to 1729.
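To illustrate the length mismatch described above, here is a minimal sketch using only Base Julia (no DLMReader API; the variable names are illustrative). It shows that the decimal text for "6c1" needs one more character than the hex text, which would be consistent with the result 172 if the informat output is limited to the original field width. That last point is an assumption, not confirmed by the package.

```julia
# Hex field as it appears in the input.
hex = "6c1"

# Parse as base 16 and render the result in decimal.
val = parse(Int64, hex, base = 16)
dec = string(val)

println(hex, " -> ", dec)
println(length(hex), " characters in, ", length(dec), " characters out")

# What remains if only the original width is kept (assumed behaviour).
println("first ", length(hex), " characters: ", first(dec, length(hex)))
```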

Add links to readme?

As discussed here: https://discourse.julialang.org/t/how-do-i-know-if-a-package-is-good/82133, it would be nice if this package's README linked to some alternatives: CSV.jl, DelimitedFiles.jl

I realise everyone likes to advertise what their package is good at, but sometimes users find the wrong one for their needs first. This package's name is close to Matlab's dlmread, so some people may find it who really just need the standard library's DelimitedFiles.jl.

Cc @juliohm from discourse thread.

Question: How to properly read a csv file

Hi,
I am struggling to read a CSV file properly using DLMReader; I get a lot of warnings.
For example, I am trying to read the dataset called "Air Quality" (City of New York, CSV), which can be found at:
https://catalog.data.gov/dataset

I am simply doing filereader("Air_Quality.csv").
Am I doing something wrong?

Thank you

slow performance in MS Windows

Hi There,
The following example from the package documentation runs very slowly on Windows.

julia> using InMemoryDatasets

julia> ds = Dataset(rand([1.1,2.2,3.4], 100, 100000), :auto);

julia> filewriter("_tmp.csv", ds, buffsize = 2^25, lsize = 500000);

julia> @time ds = filereader("_tmp.csv", buffsize = 2^21, lsize = 2^20, types = fill(Float64, 10^5));
  1.163346 seconds (900.02 k allocations: 180.966 MiB)

julia> @time ds = filereader("_tmp.csv", buffsize = 2^21, lsize = 2^20, guessingrows = 2);
  1.803125 seconds (4.10 M allocations: 289.193 MiB, 2.86% gc time)

This is because of float parsing. It seems that parsing floats in Base is very slow on Windows.

parse(Float64, "32423") is about 20 times slower than parse(Int, "32423") on Windows (on OSX they are almost the same).
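One rough way to check this comparison on a given machine is sketched below, using only Base Julia. The helper name time_parse and the iteration count are illustrative choices, and the timings will vary by platform and Julia version, so no expected ratio is shown.

```julia
# Time n repeated parses of the same string as type T.
# Returns the elapsed seconds and the accumulated sum, so the work is not optimized away.
function time_parse(::Type{T}, s::AbstractString, n::Integer) where {T}
    parse(T, s)  # warm-up call so compilation is not included in the timing
    total = zero(T)
    t = @elapsed for _ in 1:n
        total += parse(T, s)
    end
    return t, total
end

n = 10^6
t_float, sum_float = time_parse(Float64, "32423", n)
t_int, sum_int = time_parse(Int, "32423", n)

println("Float64: ", t_float, " s")
println("Int:     ", t_int, " s")
println("ratio:   ", t_float / t_int)
```

A benchmarking package would give more stable numbers; this sketch only indicates the order of magnitude.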
