fastcsv's People

Contributors

joelverhagen, mgholam, vekstr

fastcsv's Issues

Parse from Stream or TextReader?

From what I can tell, you can only use fastCSV.ReadFile to load CSV data. In my case I am reading CSV data from a network stream (specifically from Azure Blob Storage). It would be nice to have an overload that accepts a StreamReader or a Stream.

I'm excited to test the performance of this library along with several others.

I may have time to do a pull request but I figured I'd put the request out there.
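A minimal sketch of what such an overload could look like, assuming unquoted, delimiter-separated fields. `StreamCsvSketch` and its `Read` overloads are hypothetical names, not part of fastCSV's API. The `Stream` overload only wraps the stream in a `StreamReader` and delegates to the `TextReader` overload, so a network stream (such as a blob download stream) and a file share the same parsing code, and rows are produced lazily instead of loading the whole input first.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class StreamCsvSketch
{
    // Hypothetical Stream overload: wraps the stream in a StreamReader and delegates.
    // leaveOpen: true means disposing the reader does not close the caller's stream.
    public static IEnumerable<string[]> Read(Stream stream, char delimiter = ',')
    {
        using (var reader = new StreamReader(stream, Encoding.UTF8, true, 4096, leaveOpen: true))
        {
            foreach (string[] row in Read(reader, delimiter))
            {
                yield return row;
            }
        }
    }

    // Core overload: works with any TextReader (StreamReader, StringReader, ...).
    // Rows are yielded one line at a time, so the whole input never has to be
    // in memory. Assumes no quoted fields, so a plain Split is enough.
    public static IEnumerable<string[]> Read(TextReader reader, char delimiter = ',')
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            yield return line.Split(delimiter);
        }
    }
}
```

Because `leaveOpen: true` is passed, whoever opened the stream (for example the blob client code) still owns it and disposes it.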

DataGridView WriteFile

A list read with fastCSV.ReadFile can be saved successfully with WriteFile, but the contents of a DataGridView or a DataTable cannot be saved. Am I using it incorrectly?
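If the goal is simply to save what a DataGridView shows, one option that does not depend on fastCSV is to write the underlying `DataTable` directly. The sketch below assumes the grid is bound straight to a `DataTable` (so `dataGridView.DataSource as DataTable` returns it) and uses RFC 4180-style quoting; `DataTableCsv.Write` is a hypothetical helper, not a fastCSV method.

```csharp
using System;
using System.Data;
using System.Globalization;
using System.IO;
using System.Linq;

public static class DataTableCsv
{
    // Writes the column names as a header row, then one line per DataRow.
    public static void Write(DataTable table, TextWriter writer, char delimiter = ',')
    {
        var header = table.Columns.Cast<DataColumn>().Select(c => Escape(c.ColumnName, delimiter));
        writer.WriteLine(string.Join(delimiter.ToString(), header));

        foreach (DataRow row in table.Rows)
        {
            // DBNull becomes an empty field; other values are formatted with the invariant culture.
            var fields = row.ItemArray.Select(v => Escape(
                v is DBNull ? "" : (Convert.ToString(v, CultureInfo.InvariantCulture) ?? ""),
                delimiter));
            writer.WriteLine(string.Join(delimiter.ToString(), fields));
        }
    }

    // Quotes a field only if it contains the delimiter, a quote, or a line break;
    // embedded quotes are doubled, as in RFC 4180.
    private static string Escape(string value, char delimiter)
    {
        if (value.IndexOfAny(new[] { delimiter, '"', '\r', '\n' }) < 0)
        {
            return value;
        }
        return "\"" + value.Replace("\"", "\"\"") + "\"";
    }
}
```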

How is fastCSV licensed?

I am interested in using fastCSV in an open source project due to its excellent performance, but I am unsure of the license terms.

From the project URL on the package page, I see that this GitHub repository may be licensed under Code Project Open License (CPOL), but I'm not sure since the CPOL information is not even on GitHub. Would it be possible for you to release it under a more common license like MIT or Apache-2.0?

This is all I know about CPOL:
https://en.wikipedia.org/wiki/Code_Project_Open_License#Status_as_an_open-source_license

Also, do you think you could include the license in the next package version so that it appears on nuget.org?
https://www.nuget.org/packages/mgholam.fastCSV/

There is a document on how to do that here:
https://docs.microsoft.com/en-us/nuget/reference/msbuild-targets#packing-a-license-expression-or-a-license-file

Last line includes CR LF (\r\n) if file ends in empty line

I am analyzing many CSV parsers in .NET and have noticed that fastCSV includes a \r\n sequence in the last column of the last row if the file ends with a line ending (i.e. the file ends in an empty line). Many tools write a trailing line ending (on both Windows and Linux), so this behavior may be unexpected.

For example,

a,b,c
d,e,f

The last field will contain "f\r\n" instead of "f".

None of the other CSV readers I compared include the \r\n in the last field.

Repro: CsvRepro.zip
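The behavior the other readers show can be sketched with a small quote-aware reader that treats `\r\n`, `\n` and a lone `\r` as record terminators and never copies them into a field. This is an illustration, not fastCSV's parser; `RecordReader.Parse` is a hypothetical name. A trailing line ending just closes the last record, and blank lines produce no record at all (a simplifying choice of this sketch).

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class RecordReader
{
    public static List<string[]> Parse(TextReader reader, char delimiter = ',')
    {
        var records = new List<string[]>();
        var fields = new List<string>();
        var field = new StringBuilder();
        bool inQuotes = false;
        bool recordStarted = false; // true once the current record has any content
        int c;

        while ((c = reader.Read()) != -1)
        {
            char ch = (char)c;
            if (inQuotes)
            {
                if (ch == '"')
                {
                    // "" inside quotes is an escaped quote; a single " closes the field.
                    if (reader.Peek() == '"') { reader.Read(); field.Append('"'); }
                    else inQuotes = false;
                }
                else field.Append(ch); // line breaks inside quotes are data
                continue;
            }

            if (ch == '"') { inQuotes = true; recordStarted = true; }
            else if (ch == delimiter) { fields.Add(field.ToString()); field.Clear(); recordStarted = true; }
            else if (ch == '\r' || ch == '\n')
            {
                // \r\n counts as one terminator; the terminator is never added to a field.
                if (ch == '\r' && reader.Peek() == '\n') reader.Read();
                if (recordStarted) EndRecord(records, fields, field); // blank lines are skipped
                recordStarted = false;
            }
            else { field.Append(ch); recordStarted = true; }
        }

        if (recordStarted) EndRecord(records, fields, field); // input without a trailing newline
        return records;
    }

    private static void EndRecord(List<string[]> records, List<string> fields, StringBuilder field)
    {
        fields.Add(field.ToString());
        records.Add(fields.ToArray());
        fields.Clear();
        field.Clear();
    }
}
```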

Unknown/variable column count

Hello Mehdi,

I'm trying to add your library to some .NET CSV benchmarks that I'm collecting, and while I've made it work, I wonder if there isn't a faster way to do what I'm trying to do.

The biggest issue I see is that I can't figure out how to handle a variable number of columns. My specific test dataset is the Johns Hopkins University COVID-19 data set, which is updated daily and adds a new column every day. A bit unusual, perhaps, but I can't figure out an elegant way to process this CSV with your library.

My benchmark project is CSVBenchmarks.

The benchmark for your library is here: mgholam.fastCSV

The comments in that benchmark code highlight a few of the issues. I'd happily accept a PR with a better solution if you can provide one.

Thanks.

By the way, I became aware of your project via @joelverhagen who has also been compiling some CSV benchmarks that you're probably aware of.
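Since the header row already lists every column, one way to handle a column count that grows daily is to take the count from the header instead of hard-coding it. This sketch assumes a fixed number of leading descriptive columns followed by one numeric column per day, keyed by its header text; `TimeSeriesSketch`, `TimeSeriesRow` and `fixedColumns` are hypothetical names, and quoted fields are not handled.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;

// Hypothetical record type: fixed descriptive columns plus one value per extra column.
public sealed class TimeSeriesRow
{
    public string[] Keys = Array.Empty<string>();
    public Dictionary<string, double> Values = new Dictionary<string, double>();
}

public static class TimeSeriesSketch
{
    public static List<TimeSeriesRow> Read(TextReader reader, int fixedColumns, char delimiter = ',')
    {
        // The header decides how many columns exist, so a new daily column needs no code change.
        string[] header = (reader.ReadLine() ?? "").Split(delimiter);
        var rows = new List<TimeSeriesRow>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            string[] cols = line.Split(delimiter);
            var row = new TimeSeriesRow { Keys = cols.Take(fixedColumns).ToArray() };
            // Every column after the fixed ones is stored under its header name (e.g. a date).
            for (int i = fixedColumns; i < header.Length && i < cols.Length; i++)
            {
                if (double.TryParse(cols[i], NumberStyles.Float, CultureInfo.InvariantCulture, out double v))
                {
                    row.Values[header[i]] = v;
                }
            }
            rows.Add(row);
        }
        return rows;
    }
}
```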

Close TextReader in method ReadData<T>()

The TextReader in method ReadData<T>() must be closed; otherwise the CSV file stays locked (e.g. it can't be renamed).

E.g.:

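private static List<T> ReadData<T>(TextReader tr, bool hasheader, int colcount, char delimiter, ToOBJ<T> mapper)
{
    try
    {
        // ... existing parsing logic, unchanged ...
    }
    finally
    {
        // Runs even if parsing throws, so the file handle is always released
        tr.Close();
    }
}

(Or use a using statement instead.)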
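A using-statement version of the same fix could look like the sketch below. `UsingSketch.ReadAllRows` is a hypothetical helper, not fastCSV code; the point is that `Dispose()` runs when the `using` block exits, whether normally or through an exception, so the file handle is released and the file can be renamed afterwards.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class UsingSketch
{
    // Hypothetical helper: reads every line of a CSV file into memory.
    public static List<string[]> ReadAllRows(string path, char delimiter = ',')
    {
        var rows = new List<string[]>();
        // The reader is disposed when this block exits, normally or via an exception,
        // which closes the underlying FileStream and releases the file.
        using (var reader = new StreamReader(path))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                rows.Add(line.Split(delimiter));
            }
        }
        return rows;
    }
}
```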

How to read all columns

How can I read a file whose column names are not known in advance?

Suggestion: add a function that produces a DataTable directly from the file.
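A rough sketch of the suggested function, assuming the first line is a header row and fields are unquoted: columns are created from the header, so neither their names nor their count need to be known in advance. `CsvToDataTable.Load` is a hypothetical name; every column is typed as `string`, and duplicate header names (which `DataColumnCollection.Add` rejects) are not handled.

```csharp
using System;
using System.Data;
using System.IO;

public static class CsvToDataTable
{
    // Hypothetical helper: builds a DataTable whose columns come from the header row.
    public static DataTable Load(TextReader reader, char delimiter = ',')
    {
        var table = new DataTable();
        string headerLine = reader.ReadLine();
        if (headerLine == null)
        {
            return table; // empty input: no columns, no rows
        }
        foreach (string name in headerLine.Split(delimiter))
        {
            table.Columns.Add(name, typeof(string));
        }

        string line;
        while ((line = reader.ReadLine()) != null)
        {
            string[] fields = line.Split(delimiter);
            DataRow row = table.NewRow();
            // Copy at most one field per column; fields missing from short rows stay DBNull.
            for (int i = 0; i < fields.Length && i < table.Columns.Count; i++)
            {
                row[i] = fields[i];
            }
            table.Rows.Add(row);
        }
        return table;
    }
}
```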

Missing data if hasHeader=false in method fastCSV.ReadFile<>

Hello,

It seems that fastCSV.ReadFile() skips the first data row of the CSV file even when hasHeader is set to false.
If the file contains, e.g., 5 data rows (and no header row!), you always get only the last 4 data objects, regardless of whether hasHeader is true or false.

Regards,
Guenter
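The expected behavior is easy to pin down: skip exactly one line when `hasHeader` is true and none when it is false, so 5 data rows without a header yield 5 records. A sketch with a hypothetical `HeaderSketch.Read` helper (not fastCSV's implementation), assuming unquoted fields:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class HeaderSketch
{
    // Returns the header (null when hasHeader is false) and every data row.
    // The only line ever skipped is the header, and only when hasHeader is true.
    public static (string[] Header, List<string[]> Rows) Read(TextReader reader, bool hasHeader, char delimiter = ',')
    {
        string[] header = null;
        if (hasHeader)
        {
            header = reader.ReadLine()?.Split(delimiter);
        }

        var rows = new List<string[]>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            rows.Add(line.Split(delimiter));
        }
        return (header, rows);
    }
}
```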

Extra data left at the end of some records

In my usage of your library, I have noticed that some deserialized records have their last field populated with data from the previous line. I compared the behavior of NReco.Csv with that of your library:

Row 2 is different!
NReco.CSV: {
  "ScanId": "7c1985ab-b557-4561-9e3e-7697f25d303a",
  "ScanTimestamp": "2020-11-28T01:50:47.6915182+00:00",
  "Id": "VL.TrackObjects",
  "Version": "0.0.2-alpha",
  "Created": "2020-11-27T22:56:33.19+00:00",
  "ResultType": "NoAssets",
  "PatternSet": "",
  "PropertyAnyValue": "",
  "PropertyCodeLanguage": "",
  "PropertyTargetFrameworkMoniker": "",
  "PropertyLocale": "",
  "PropertyManagedAssembly": "",
  "PropertyMSBuild": "",
  "PropertyRuntimeIdentifier": "",
  "PropertySatelliteAssembly": "",
  "Path": "",
  "FileName": "",
  "FileExtension": "",
  "TopLevelFolder": "",
  "RoundTripTargetFrameworkMoniker": "",
  "FrameworkName": "",
  "FrameworkVersion": "",
  "FrameworkProfile": "",
  "PlatformName": "",
  "PlatformVersion": ""
}
mgholam.fastCSV: {
  "ScanId": "7c1985ab-b557-4561-9e3e-7697f25d303a",
  "ScanTimestamp": "2020-11-28T01:50:47.6915182+00:00",
  "Id": "VL.TrackObjects",
  "Version": "0.0.2-alpha",
  "Created": "2020-11-27T22:56:33.19+00:00",
  "ResultType": "NoAssets",
  "PatternSet": "",
  "PropertyAnyValue": "",
  "PropertyCodeLanguage": "",
  "PropertyTargetFrameworkMoniker": "",
  "PropertyLocale": "",
  "PropertyManagedAssembly": "",
  "PropertyMSBuild": "",
  "PropertyRuntimeIdentifier": "",
  "PropertySatelliteAssembly": "",
  "Path": "",
  "FileName": "",
  "FileExtension": "",
  "TopLevelFolder": "",
  "RoundTripTargetFrameworkMoniker": "",
  "FrameworkName": "",
  "FrameworkVersion": "",
  "FrameworkProfile": "",
  "PlatformName": "",
  "PlatformVersion": "0.0.0.0"
}

(see the last property, PlatformVersion)

Here is the CSV:

c0db9120-80cc-4c0c-9aa7-ccc957348e4f,2020-11-28T01:50:40.5074056+00:00,VIT.COFIDE.GESTIONCUENTAS.Models,1.0.18,2020-11-27T22:42:01.8030000+00:00,AvailableAssets,CompileLibAssemblies,,,netcoreapp3.1,,,,,,lib/netcoreapp3.1/VIT.COFIDE.GESTIONCUENTAS.Models.dll,VIT.COFIDE.GESTIONCUENTAS.Models.dll,.dll,lib,netcoreapp3.1,.NETCoreApp,3.1.0.0,,,0.0.0.0
7c1985ab-b557-4561-9e3e-7697f25d303a,2020-11-28T01:50:47.6915182+00:00,VL.TrackObjects,0.0.2-alpha,2020-11-27T22:56:33.1900000+00:00,NoAssets,,,,,,,,,,,,,,,,,,,
3c219c16-8b8f-4e29-b16c-ed34c5442c73,2020-11-28T01:49:41.7240806+00:00,WOLF.Net,4.0.0-alpha1,2020-11-27T21:19:25.8870000+00:00,AvailableAssets,RuntimeAssemblies,,,netcoreapp3.1,,,,,,lib/netcoreapp3.1/WOLF.Net.dll,WOLF.Net.dll,.dll,lib,netcoreapp3.1,.NETCoreApp,3.1.0.0,,,0.0.0.0
3c219c16-8b8f-4e29-b16c-ed34c5442c73,2020-11-28T01:49:41.7240806+00:00,WOLF.Net,4.0.0-alpha1,2020-11-27T21:19:25.8870000+00:00,AvailableAssets,CompileLibAssemblies,,,netcoreapp3.1,,,,,,lib/netcoreapp3.1/WOLF.Net.dll,WOLF.Net.dll,.dll,lib,netcoreapp3.1,.NETCoreApp,3.1.0.0,,,0.0.0.0
498680e6-31a0-4c27-b79e-d607ef7c8393,2020-11-28T01:49:43.5970660+00:00,WOLF.Net.Redis,3.0.0,2020-11-27T21:20:56.7600000+00:00,AvailableAssets,RuntimeAssemblies,,,netcoreapp3.1,,,,,,lib/netcoreapp3.1/WOLF.Net.Redis.dll,WOLF.Net.Redis.dll,.dll,lib,netcoreapp3.1,.NETCoreApp,3.1.0.0,,,0.0.0.0
498680e6-31a0-4c27-b79e-d607ef7c8393,2020-11-28T01:49:43.5970660+00:00,WOLF.Net.Redis,3.0.0,2020-11-27T21:20:56.7600000+00:00,AvailableAssets,CompileLibAssemblies,,,netcoreapp3.1,,,,,,lib/netcoreapp3.1/WOLF.Net.Redis.dll,WOLF.Net.Redis.dll,.dll,lib,netcoreapp3.1,.NETCoreApp,3.1.0.0,,,0.0.0.0
bd1af124-5907-4227-bf5a-4d3ca5e8ff2e,2020-11-28T01:50:18.4736659+00:00,YPF.MSPromotions.DTO,1.0.4.24,2020-11-27T21:50:20.3230000+00:00,AvailableAssets,RuntimeAssemblies,,,netcoreapp2.2,,,,,,lib/netcoreapp2.2/YPF.MSPromotions.DTO.dll,YPF.MSPromotions.DTO.dll,.dll,lib,netcoreapp2.2,.NETCoreApp,2.2.0.0,,,0.0.0.0
bd1af124-5907-4227-bf5a-4d3ca5e8ff2e,2020-11-28T01:50:18.4736659+00:00,YPF.MSPromotions.DTO,1.0.4.24,2020-11-27T21:50:20.3230000+00:00,AvailableAssets,CompileLibAssemblies,,,netcoreapp2.2,,,,,,lib/netcoreapp2.2/YPF.MSPromotions.DTO.dll,YPF.MSPromotions.DTO.dll,.dll,lib,netcoreapp2.2,.NETCoreApp,2.2.0.0,,,0.0.0.0
02870803-36bd-4ae5-acd5-3b89e6bbdc70,2020-11-28T01:45:51.1358087+00:00,YPF.MSPromotions.DTO,1.0.4.24-beta,2020-11-27T19:55:12.1870000+00:00,AvailableAssets,RuntimeAssemblies,,,netcoreapp2.2,,,,,,lib/netcoreapp2.2/YPF.MSPromotions.DTO.dll,YPF.MSPromotions.DTO.dll,.dll,lib,netcoreapp2.2,.NETCoreApp,2.2.0.0,,,0.0.0.0
02870803-36bd-4ae5-acd5-3b89e6bbdc70,2020-11-28T01:45:51.1358087+00:00,YPF.MSPromotions.DTO,1.0.4.24-beta,2020-11-27T19:55:12.1870000+00:00,AvailableAssets,CompileLibAssemblies,,,netcoreapp2.2,,,,,,lib/netcoreapp2.2/YPF.MSPromotions.DTO.dll,YPF.MSPromotions.DTO.dll,.dll,lib,netcoreapp2.2,.NETCoreApp,2.2.0.0,,,0.0.0.0

Here is a repro: CsvRepro.zip
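One plausible cause of this symptom, offered as an assumption and not confirmed against fastCSV's source, is a column buffer that is reused between rows for speed but not reset, so a row whose last field is empty keeps the previous row's value in that slot. The sketch below (hypothetical `ReusableRowSplitter`, unquoted fields only) still reuses one buffer but clears every slot before each row and always emits the field after the final delimiter, even when it is empty.

```csharp
using System;

// Hypothetical reusable row buffer: one string[] shared across rows to avoid allocations.
public sealed class ReusableRowSplitter
{
    private readonly string[] _columns;

    public ReusableRowSplitter(int columnCount)
    {
        _columns = new string[columnCount];
    }

    // Splits one line into the shared buffer and returns it. The next call overwrites
    // the same array, so clone it ((string[])row.Clone()) if a row must be kept.
    public string[] Split(string line, char delimiter = ',')
    {
        // Reset every slot first so nothing from the previous row can survive.
        Array.Fill(_columns, string.Empty);

        int col = 0, start = 0;
        for (int i = 0; i <= line.Length && col < _columns.Length; i++)
        {
            // A delimiter or the end of the line closes a field, even an empty one.
            if (i == line.Length || line[i] == delimiter)
            {
                _columns[col++] = line.Substring(start, i - start);
                start = i + 1;
            }
        }
        return _columns;
    }
}
```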

Code for the benchmarks

Hi,

Would you please share the code you used to benchmark all the CSV parsers?

I did some research on them a few months ago, and NReco was a lot faster than the numbers in the README.md...
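The README's benchmark code is not reproduced here. As a rough starting point, a minimal `Stopwatch`-based harness for timing parsers on the same input could look like this; `BenchSketch.Measure` is hypothetical, and for published numbers a dedicated tool such as BenchmarkDotNet, which handles warm-up, iteration counts and statistics, is the more reliable choice.

```csharp
using System;
using System.Diagnostics;

public static class BenchSketch
{
    // Runs `parse` once untimed to warm up (JIT, file cache), then `iterations` timed runs,
    // and returns the best time in milliseconds. `parse` returns a row count so the
    // harness can check that every run did the same amount of work.
    public static (double BestMs, int Rows) Measure(Func<int> parse, int iterations = 5)
    {
        int rows = parse(); // warm-up run, not timed
        double best = double.MaxValue;
        for (int i = 0; i < iterations; i++)
        {
            var sw = Stopwatch.StartNew();
            int r = parse();
            sw.Stop();
            if (r != rows)
            {
                throw new InvalidOperationException("Parser returned a different row count between runs.");
            }
            best = Math.Min(best, sw.Elapsed.TotalMilliseconds);
        }
        return (best, rows);
    }
}
```

Each parser under test would be wrapped in a lambda that parses the same file and returns its row count, so the results are comparable.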

How to read by column name, not by index

At present, columns are read by index, so it is impossible to tell which column is being read. Is there any way to read a column by specifying its name?
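Until a by-name API exists, one workaround is to read the header row once and map each column name to its index, so a mapper can look fields up by name. A sketch under that assumption; `HeaderIndex` is a hypothetical type, not part of fastCSV, and it matches names case-insensitively.

```csharp
using System;
using System.Collections.Generic;

public sealed class HeaderIndex
{
    private readonly Dictionary<string, int> _indexByName;

    // Builds a name -> position map from the header row.
    // A duplicate column name makes Dictionary.Add throw ArgumentException.
    public HeaderIndex(string headerLine, char delimiter = ',')
    {
        _indexByName = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        string[] names = headerLine.Split(delimiter);
        for (int i = 0; i < names.Length; i++)
        {
            _indexByName.Add(names[i].Trim(), i);
        }
    }

    // Returns the field for `name`; throws if the header has no such column.
    public string Get(string[] fields, string name)
    {
        if (!_indexByName.TryGetValue(name, out int index))
        {
            throw new KeyNotFoundException($"No column named '{name}'.");
        }
        return index < fields.Length ? fields[index] : string.Empty; // short rows read as empty
    }
}
```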
