mgholam / fastcsv
Fast CSV reader writer in C#
Home Page: https://www.codeproject.com/Articles/5255318/fastCSV
License: MIT License
From what I can tell, you can only use fastCSV.ReadFile to load CSV data. In my case I am reading CSV data from a network stream (specifically from Azure Blob Storage). It would be nice to have an overload that accepts a StreamReader or a Stream.
I'm excited to test the performance of this library along with several others.
I may have time to do a pull request but I figured I'd put the request out there.
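Until such an overload exists, one possible workaround is to spool the stream to a temporary file and hand that path to the file-based reader. The sketch below is only an illustration: StreamSpooler and SaveToTempFile are hypothetical names, not part of fastCSV, and the extra disk write costs some of the performance the library is chosen for.

```csharp
using System.IO;

public static class StreamSpooler
{
    // Hypothetical helper (not part of fastCSV): copies a stream to a
    // temporary file and returns the file's path, so that a reader which
    // only accepts file names can open the data.
    public static string SaveToTempFile(Stream source)
    {
        string path = Path.GetTempFileName();
        using (FileStream target = File.Create(path))
        {
            source.CopyTo(target);
        }
        return path;
    }
}
```

The caller is responsible for deleting the temporary file once the data has been read.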
A list read with fastCSV.ReadFile can be saved successfully with WriteFile, but a DataGridView or DataTable cannot be saved. Am I using it incorrectly?
I am interested in using fastCSV in an open source project due to its excellent performance, but I am unsure of the license terms.
From the project URL on the package page, I see that this GitHub repository may be licensed under the Code Project Open License (CPOL), but I'm not sure, since the CPOL information is not on GitHub itself. Would it be possible for you to release it under a more common license like MIT or Apache-2.0?
This is all I know about CPOL:
https://en.wikipedia.org/wiki/Code_Project_Open_License#Status_as_an_open-source_license
Also, do you think you could include the license in the next package version so that it appears on nuget.org?
https://www.nuget.org/packages/mgholam.fastCSV/
There is a document on how to do that here:
https://docs.microsoft.com/en-us/nuget/reference/msbuild-targets#packing-a-license-expression-or-a-license-file
I am analyzing many CSV parsers in .NET and I have noticed that fastCSV includes a \r\n sequence in the last column of the last row if the file ends in an empty line. Many tools write a trailing line ending at the end of the file (on both Windows and Linux), so this behavior may be unexpected.
For example,
a,b,c
d,e,f
The f field will contain a \r\n after the f.
All of these other CSV readers do not include the \r\n in the last field.
Repro: CsvRepro.zip
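As a stopgap on the caller's side, the stray line ending can be stripped from the last field after parsing. This is a minimal sketch, not a fix inside fastCSV; FieldCleanup and TrimLineEnding are hypothetical names. Note that it would also strip a legitimate trailing line break from a quoted multi-line field.

```csharp
public static class FieldCleanup
{
    // Hypothetical helper: removes any trailing '\r' or '\n' characters
    // from a field value. Returns null unchanged.
    public static string TrimLineEnding(string field)
    {
        return field == null ? null : field.TrimEnd('\r', '\n');
    }
}
```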
Hello Mehdi,
I'm trying to add your library to some .NET CSV benchmarks that I'm collecting, and while I've made it work, I wonder if there isn't a faster way to do what I'm trying to do.
The biggest issue that I see is that I can't figure out how to handle a variable number of columns. My specific test dataset is the Johns Hopkins University COVID data set, which is updated daily and adds a new column every day. A bit unusual, perhaps, but I can't figure out an elegant way to process this CSV with your library.
My benchmark project is CSVBenchmarks.
The benchmark for your library is here: mgholam.fastCSV
The comments in the code segment above highlight a few of the issues. I'd happily accept a PR with a better solution if you can provide one.
Thanks.
By the way, I became aware of your project via @joelverhagen who has also been compiling some CSV benchmarks that you're probably aware of.
How do I use the ReadFile method for a single-column CSV file? I want to read the file directly into a list of strings, not into a custom class.
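For comparison, a single-column file can be read into a list of strings with plain .NET, without any CSV library. This sketch assumes one value per line with no quoted values that span lines; SingleColumnReader and ReadValues are hypothetical names and do not show how fastCSV itself would handle this.

```csharp
using System.Collections.Generic;
using System.IO;

public static class SingleColumnReader
{
    // Hypothetical helper: returns every line of the input as one value,
    // skipping the first line when hasHeader is true.
    public static List<string> ReadValues(TextReader reader, bool hasHeader)
    {
        var values = new List<string>();
        bool first = true;
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            bool skip = first && hasHeader;
            first = false;
            if (!skip)
            {
                values.Add(line);
            }
        }
        return values;
    }
}
```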
The TextReader in method ReadData() must be closed, otherwise the CSV file is blocked (e.g. can't be renamed)
E.g.:
private static List<T> ReadData<T>(TextReader tr, bool hasheader, int colcount, char delimiter, ToOBJ<T> mapper)
{
    try
    {
        .....
    }
    finally
    {
        tr.Close();
    }
}
(or wrap the reader in a using statement ...)
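To illustrate the using variant in isolation, here is a minimal sketch that is independent of fastCSV's internals; ReaderDisposal and ReadFirstLine are hypothetical names. The using block disposes the reader, and with it the file handle, even if reading throws, so the file can be renamed afterwards.

```csharp
using System.IO;

public static class ReaderDisposal
{
    // Hypothetical helper: reads the first line of a file. The using
    // block closes the underlying file handle when the method returns
    // or when an exception is thrown.
    public static string ReadFirstLine(string path)
    {
        using (var reader = new StreamReader(path))
        {
            return reader.ReadLine();
        }
    }
}
```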
How do I read a file with unknown column names?
Suggestion: add a function to generate datatable directly after reading.
Hello,
it seems that the method fastCSV.ReadFile() ignores the first data row in a CSV file even if hasHeader is set to false.
If the CSV file contains e.g. 5 data rows (without a header row!), you will always retrieve only the last 4 data objects, regardless of whether hasHeader is set to true or false.
Regards,
Guenter
In my usage of your library, I have noticed that some deserialized records sometimes have the last field populated with data from the previous line. I compared the behavior of NReco.Csv with your library:
Row 2 is different!
NReco.CSV: { "ScanId": "7c1985ab-b557-4561-9e3e-7697f25d303a", "ScanTimestamp": "2020-11-28T01:50:47.6915182+00:00", "Id": "VL.TrackObjects", "Version": "0.0.2-alpha", "Created": "2020-11-27T22:56:33.19+00:00", "ResultType": "NoAssets", "PatternSet": "", "PropertyAnyValue": "", "PropertyCodeLanguage": "", "PropertyTargetFrameworkMoniker": "", "PropertyLocale": "", "PropertyManagedAssembly": "", "PropertyMSBuild": "", "PropertyRuntimeIdentifier": "", "PropertySatelliteAssembly": "", "Path": "", "FileName": "", "FileExtension": "", "TopLevelFolder": "", "RoundTripTargetFrameworkMoniker": "", "FrameworkName": "", "FrameworkVersion": "", "FrameworkProfile": "", "PlatformName": "", "PlatformVersion": "" }
mgholam.fastCSV: { "ScanId": "7c1985ab-b557-4561-9e3e-7697f25d303a", "ScanTimestamp": "2020-11-28T01:50:47.6915182+00:00", "Id": "VL.TrackObjects", "Version": "0.0.2-alpha", "Created": "2020-11-27T22:56:33.19+00:00", "ResultType": "NoAssets", "PatternSet": "", "PropertyAnyValue": "", "PropertyCodeLanguage": "", "PropertyTargetFrameworkMoniker": "", "PropertyLocale": "", "PropertyManagedAssembly": "", "PropertyMSBuild": "", "PropertyRuntimeIdentifier": "", "PropertySatelliteAssembly": "", "Path": "", "FileName": "", "FileExtension": "", "TopLevelFolder": "", "RoundTripTargetFrameworkMoniker": "", "FrameworkName": "", "FrameworkVersion": "", "FrameworkProfile": "", "PlatformName": "", "PlatformVersion": "0.0.0.0" }
(see the last property, PlatformVersion)
Here is the CSV:
c0db9120-80cc-4c0c-9aa7-ccc957348e4f,2020-11-28T01:50:40.5074056+00:00,VIT.COFIDE.GESTIONCUENTAS.Models,1.0.18,2020-11-27T22:42:01.8030000+00:00,AvailableAssets,CompileLibAssemblies,,,netcoreapp3.1,,,,,,lib/netcoreapp3.1/VIT.COFIDE.GESTIONCUENTAS.Models.dll,VIT.COFIDE.GESTIONCUENTAS.Models.dll,.dll,lib,netcoreapp3.1,.NETCoreApp,3.1.0.0,,,0.0.0.0
7c1985ab-b557-4561-9e3e-7697f25d303a,2020-11-28T01:50:47.6915182+00:00,VL.TrackObjects,0.0.2-alpha,2020-11-27T22:56:33.1900000+00:00,NoAssets,,,,,,,,,,,,,,,,,,,
3c219c16-8b8f-4e29-b16c-ed34c5442c73,2020-11-28T01:49:41.7240806+00:00,WOLF.Net,4.0.0-alpha1,2020-11-27T21:19:25.8870000+00:00,AvailableAssets,RuntimeAssemblies,,,netcoreapp3.1,,,,,,lib/netcoreapp3.1/WOLF.Net.dll,WOLF.Net.dll,.dll,lib,netcoreapp3.1,.NETCoreApp,3.1.0.0,,,0.0.0.0
3c219c16-8b8f-4e29-b16c-ed34c5442c73,2020-11-28T01:49:41.7240806+00:00,WOLF.Net,4.0.0-alpha1,2020-11-27T21:19:25.8870000+00:00,AvailableAssets,CompileLibAssemblies,,,netcoreapp3.1,,,,,,lib/netcoreapp3.1/WOLF.Net.dll,WOLF.Net.dll,.dll,lib,netcoreapp3.1,.NETCoreApp,3.1.0.0,,,0.0.0.0
498680e6-31a0-4c27-b79e-d607ef7c8393,2020-11-28T01:49:43.5970660+00:00,WOLF.Net.Redis,3.0.0,2020-11-27T21:20:56.7600000+00:00,AvailableAssets,RuntimeAssemblies,,,netcoreapp3.1,,,,,,lib/netcoreapp3.1/WOLF.Net.Redis.dll,WOLF.Net.Redis.dll,.dll,lib,netcoreapp3.1,.NETCoreApp,3.1.0.0,,,0.0.0.0
498680e6-31a0-4c27-b79e-d607ef7c8393,2020-11-28T01:49:43.5970660+00:00,WOLF.Net.Redis,3.0.0,2020-11-27T21:20:56.7600000+00:00,AvailableAssets,CompileLibAssemblies,,,netcoreapp3.1,,,,,,lib/netcoreapp3.1/WOLF.Net.Redis.dll,WOLF.Net.Redis.dll,.dll,lib,netcoreapp3.1,.NETCoreApp,3.1.0.0,,,0.0.0.0
bd1af124-5907-4227-bf5a-4d3ca5e8ff2e,2020-11-28T01:50:18.4736659+00:00,YPF.MSPromotions.DTO,1.0.4.24,2020-11-27T21:50:20.3230000+00:00,AvailableAssets,RuntimeAssemblies,,,netcoreapp2.2,,,,,,lib/netcoreapp2.2/YPF.MSPromotions.DTO.dll,YPF.MSPromotions.DTO.dll,.dll,lib,netcoreapp2.2,.NETCoreApp,2.2.0.0,,,0.0.0.0
bd1af124-5907-4227-bf5a-4d3ca5e8ff2e,2020-11-28T01:50:18.4736659+00:00,YPF.MSPromotions.DTO,1.0.4.24,2020-11-27T21:50:20.3230000+00:00,AvailableAssets,CompileLibAssemblies,,,netcoreapp2.2,,,,,,lib/netcoreapp2.2/YPF.MSPromotions.DTO.dll,YPF.MSPromotions.DTO.dll,.dll,lib,netcoreapp2.2,.NETCoreApp,2.2.0.0,,,0.0.0.0
02870803-36bd-4ae5-acd5-3b89e6bbdc70,2020-11-28T01:45:51.1358087+00:00,YPF.MSPromotions.DTO,1.0.4.24-beta,2020-11-27T19:55:12.1870000+00:00,AvailableAssets,RuntimeAssemblies,,,netcoreapp2.2,,,,,,lib/netcoreapp2.2/YPF.MSPromotions.DTO.dll,YPF.MSPromotions.DTO.dll,.dll,lib,netcoreapp2.2,.NETCoreApp,2.2.0.0,,,0.0.0.0
02870803-36bd-4ae5-acd5-3b89e6bbdc70,2020-11-28T01:45:51.1358087+00:00,YPF.MSPromotions.DTO,1.0.4.24-beta,2020-11-27T19:55:12.1870000+00:00,AvailableAssets,CompileLibAssemblies,,,netcoreapp2.2,,,,,,lib/netcoreapp2.2/YPF.MSPromotions.DTO.dll,YPF.MSPromotions.DTO.dll,.dll,lib,netcoreapp2.2,.NETCoreApp,2.2.0.0,,,0.0.0.0
Here is a repro: CsvRepro.zip
Hi,
Would you please share the code you used to benchmark all the CSV parsers?
I did some research on them a few months ago, and NReco was a lot faster than the numbers in the README.md...
At present, the index is used when reading a column, so it is impossible to determine which column is being read. Is there any way to specify the column name to read?
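One way to get name-based access on top of index-based columns is to build a name-to-index map from the header line once and look indexes up by name afterwards. The following is a rough sketch under the assumption that header names contain no quoted delimiters; HeaderIndex and Build are hypothetical names, not part of fastCSV.

```csharp
using System;
using System.Collections.Generic;

public static class HeaderIndex
{
    // Hypothetical helper: maps each header name to its zero-based column
    // index. Lookups are case-insensitive; if a name appears more than
    // once, the last occurrence wins.
    public static Dictionary<string, int> Build(string headerLine, char delimiter)
    {
        var map = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        string[] names = headerLine.Split(delimiter);
        for (int i = 0; i < names.Length; i++)
        {
            map[names[i].Trim()] = i;
        }
        return map;
    }
}
```

With such a map, a mapper can resolve the index for a column name such as "Version" once, before the rows are read, and then keep using the plain index for each row.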