linqtocsv's Introduction

LINQtoCSV

This library makes it easy to use CSV files with LINQ queries. Its features include:

  • Follows the most common rules for CSV files. Correctly handles data fields that contain commas and line breaks.
  • Besides the comma, most delimiting characters can be used, including the tab for tab-delimited fields.
  • Can be used with an IEnumerable of an anonymous class - which is often returned by a LINQ query.
  • Supports deferred reading.
  • Supports processing files with international date and number formats.
  • Supports different character encodings if you need them.
  • Recognizes a wide variety of date and number formats when reading files.
  • Provides fine control of date and number formats when writing files.
  • Robust error handling, allowing you to quickly find and fix problems in large input files.

Full documentation is at http://www.codeproject.com/Articles/25133/LINQ-to-CSV-library
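
A minimal usage sketch (hedged: based on the API surface that appears in the issues below; the Product class and the file names are illustrative):

    using System.Collections.Generic;
    using LINQtoCSV;

    class Product
    {
        [CsvColumn(Name = "Name", FieldIndex = 1)]
        public string Name { get; set; }

        [CsvColumn(Name = "Price", FieldIndex = 2)]
        public decimal Price { get; set; }
    }

    class Program
    {
        static void Main()
        {
            var fileDescription = new CsvFileDescription
            {
                SeparatorChar = ',',
                FirstLineHasColumnNames = true
            };
            var cc = new CsvContext();

            // Reading is deferred: rows materialize as the enumerable is consumed.
            IEnumerable<Product> products = cc.Read<Product>("products.csv", fileDescription);

            // Writing is symmetrical.
            cc.Write(products, "products_out.csv", fileDescription);
        }
    }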

License

Apache License, Version 2.0

Contributors welcome

All contributions are welcome, whether those are new features or bug fixes.

Before you invest time in your feature or bug fix, please first raise the issue in the issues list to get feedback about your idea: https://github.com/mperdeck/LINQtoCSV/issues

For bugs, show how the bug can be reproduced. For features, show why it would be useful to the wider community.

Introducing a new feature involves more than simply coding the new feature. For every new feature, the following needs to be done:

  • Code the feature (obviously);
  • Update the documentation in the article.htm file, including the history section at the end;
  • Add unit tests to the LINQtoCSV project, to ensure future code changes don't break your feature.

linqtocsv's People

Contributors

jkallay1, lvaleriu, mperdeck, omederos


linqtocsv's Issues

Columns do not appear as they are declared

I am using LINQtoCSV to produce a number of reports in .NET 4.0 and noticed the columns do not appear in the same order as they are declared in the data type. However, after changing the target framework to .NET 4.5, the columns do appear in the same order. Since it is not practical to change the target framework of my project to .NET 4.5, the only way to ensure the columns are ordered correctly is to apply the CsvColumnAttribute to every class property, which I had hoped to avoid.

I can only assume the sorting algorithm used in .NET 4.0 is not stable, that is, it will not preserve the property collection order when CompareTo returns zero.

I would like to propose a change to the way the columns are sorted to ensure the behavior is consistent and the order is preserved. Instead of using
Array.Sort(m_IndexToInfo)
which seems unpredictable, use the stable LINQ sort
m_IndexToInfo = m_IndexToInfo.OrderBy(x => x.index).ToArray();

Below is the unit test used to verify my change (the original screenshot is not reproduced here).
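
A test along these lines (a reconstruction, MSTest assumed, not the reporter's original) exercises the same behavior:

    using System.IO;
    using System.Linq;
    using LINQtoCSV;
    using Microsoft.VisualStudio.TestTools.UnitTesting;

    [TestClass]
    public class ColumnOrderTests
    {
        // No CsvColumnAttribute anywhere, so columns should appear in declaration order.
        public class Widget
        {
            public string First { get; set; }
            public string Second { get; set; }
            public string Third { get; set; }
        }

        [TestMethod]
        public void ColumnsAppearInDeclarationOrder()
        {
            var cc = new CsvContext();
            var description = new CsvFileDescription { FirstLineHasColumnNames = true };
            cc.Write(new[] { new Widget { First = "a", Second = "b", Third = "c" } },
                     "widgets.csv", description);

            // The header line must list the properties in declaration order.
            Assert.AreEqual("First,Second,Third", File.ReadLines("widgets.csv").First());
        }
    }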

Tracking the progress of read/write

If there were a tracking mechanism that could tell us which row is currently being read or written, it would be helpful, especially when the file is very large.

Comment line delimiter

Is it possible to include a "comment character" in the file description to skip over lines in the file? (E.g. "#" is often used at the beginning of a line to indicate that it is a comment.)
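
Until something like that exists, one hedged workaround is to pre-filter the comment lines and hand LINQtoCSV an in-memory stream (MyRow, fileDescription and the file name are illustrative):

    using System;
    using System.IO;
    using System.Linq;
    using System.Text;
    using LINQtoCSV;

    // Drop lines whose first non-blank character is '#', then parse the rest.
    // Caveat: this naive filter would also drop a quoted multi-line field whose
    // continuation line happens to start with '#'.
    var filtered = string.Join(Environment.NewLine,
        File.ReadLines("input.csv").Where(l => !l.TrimStart().StartsWith("#")));

    using (var reader = new StreamReader(new MemoryStream(Encoding.UTF8.GetBytes(filtered))))
    {
        var rows = new CsvContext().Read<MyRow>(reader, fileDescription).ToList();
    }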

Adding support for SeparatorChar inside quoted fields (Reading)

I came across an issue while parsing a record like this one.
"RS9906762","RS","13-APR-2009","31-DEC-2014","CLASS "A" HEATING AND AIR CONDITIONING, INC."

I have my separator set to ',' and it breaks the code when it reaches this item.
"CLASS "A" HEATING AND AIR CONDITIONING, INC."
Right after "CONDITIONING" the parser thinks a new item starts, which leads these records to have more fields than others.
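
For reference, the input itself is non-standard here: RFC 4180 expects embedded quotes to be doubled, so a conforming version of that field would be:

    "CLASS ""A"" HEATING AND AIR CONDITIONING, INC."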

New issues

1. Reading CSV files without a separator char (add a CharLength property to the csv field attribute)
2. Using FieldIndex for reading data at the right index in the data row
3. Using OutputFormat (when needed) for parsing values when reading csv fields

These are features I've needed in several projects.

Duplicate csv column headers

In FieldMapper.AnalyzeType I changed the line
m_NameToInfo[tfi.Name] = tfi;
to
m_NameToInfo[tfi.memberInfo.Name] = tfi;
and I was able to write csv with non-unique column headers. (My customer wanted it, don't ask me why...)

.NET Core 2.0

Hi,

Are there any plans to update this library to Core 2.0?

Thanks

Parsing nullable decimal fails

Parsing a field declared as decimal? fails with a FormatException with the value "2,000,000".

This is caused because, when the TypeFieldInfo class (inside the FieldMapper class) sets up the converters, it uses the field's PropertyType or FieldType to determine whether that type has a Parse or ParseExact method (FieldMapper.cs lines 96-114). If it doesn't, it falls back to a type converter. Unfortunately the type converter hard-codes NumberStyles.Float (DecimalConverter.FromString(string, NumberFormatInfo)), which won't parse decimal strings with a comma as a thousands separator.

To fix this, the field type should be checked for nullability, and if it is nullable, the underlying type should be used to test for a Parse or ParseExact method.
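
A sketch of that check (the suggested shape of the fix, not the library's actual code):

    using System;
    using System.Globalization;
    using System.Reflection;

    static class NullableParseProbe
    {
        // Unwrap Nullable<T> before probing for a Parse method, so that
        // decimal? is treated the same as decimal.
        public static MethodInfo FindParse(Type fieldType)
        {
            Type effective = Nullable.GetUnderlyingType(fieldType) ?? fieldType;
            return effective.GetMethod("Parse",
                new[] { typeof(string), typeof(NumberStyles), typeof(IFormatProvider) });
        }
    }

FindParse(typeof(decimal?)) then finds decimal.Parse, which accepts NumberStyles.Number and therefore handles "2,000,000".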

The only workaround as it currently stands is to change the type of the field from decimal? to decimal and use the CanBeNull = true attribute.

Problem with .NET 4.6

I had to read data from a simple csv file, but the read method always threw a NullReferenceException. It was a .NET 4.6 project. After a couple of hours I changed the target framework to 3.5 and the same code worked flawlessly. The write operations seem to work fine with 4.6, however.

Error loading unicode CSV

Hello,

I get the following error when loading a unicode CSV:

"Unhandled Exception: LINQtoCSV.NameNotInTypeException: The input file has column name ""rijksregisternummer"" in the first record, but there is no field or property with that name in type "XXXXXX.CrmCustomizations.Workflow.Models.Contact".

I don't get the error when loading the same file in ANSI encoding.

The column is in the csv data class.

public class Contact
{
    [CsvColumn(Name = "rijksregisternummer", FieldIndex = 1)]
    public string Rijksregisternummer { get; set; }
    // ... remaining properties omitted
}

I already tried different CsvFileDescription properties (e.g. FirstLineHasColumnNames, EnforceCsvColumnAttribute, TextEncoding = Encoding.Unicode, ...) but nothing helps.

Is this a bug? is there a workaround?

Thanks,

Cypress

Error: the value cannot be null. Parameter name 'key'

Hi,

First of all, great component!

When I set the IgnoreUnknownColumns property to true, the Read method raises the error in the subject line.

I created a model that doesn't map every column, only the needed ones.
I used CsvColumnAttribute, specifying the Name and the FieldIndex.

Could the problem be the name of the column? Something with spaces, like "MY IMAGE PATH"?

Add support for asynchronous writes

I want to suggest adding support for asynchronous programming.

Write should get a WriteAsync overload returning Task, so one can await it and do other work in the meantime.
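
Until a true async implementation exists, a hedged stop-gap is an extension method that offloads the synchronous Write to the thread pool (note this does not make the underlying file I/O asynchronous):

    using System.Collections.Generic;
    using System.Threading.Tasks;
    using LINQtoCSV;

    public static class CsvContextAsyncExtensions
    {
        // Wraps the blocking Write call in Task.Run so callers can await it.
        public static Task WriteAsync<T>(this CsvContext cc, IEnumerable<T> values,
            string fileName, CsvFileDescription fileDescription) where T : class
        {
            return Task.Run(() => cc.Write(values, fileName, fileDescription));
        }
    }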

It is my understanding that this project may not be well maintained at the moment :-(

I have bug fixes for two bugs - Are pull requests still being considered?

This code throws exceptions when it should not:

public void Test() {
    var csvContext = new LINQtoCSV.CsvContext();
    string testInput =
        "NotNullColumn1,NotNullColumn2" + Environment.NewLine +
        "Line1NotNull1,Line1NotNull2"; // Note: All not-null columns have values
    csvContext.Read<Data>(this.StreamReaderFromString(testInput));
    testInput =
        "NotNullColumn" + Environment.NewLine +
        "Line1NotNull"; // Note: All not-null columns have values
    csvContext.Read<Data2>(this.StreamReaderFromString(testInput));
}
class Data {
    [LINQtoCSV.CsvColumn(CanBeNull = false)]
    public string NotNullColumn1 { get; set; }
    public string ExtraColumn { get; set; }
    [LINQtoCSV.CsvColumn(CanBeNull = false)]
    public string NotNullColumn2 { get; set; }
}
class Data2 {
    [LINQtoCSV.CsvColumn(FieldIndex = 1)] // Force Array.Sort() ordering to expose bug
    public string ExtraColumn { get; set; }
    [LINQtoCSV.CsvColumn(CanBeNull = false)]
    public string NotNullColumn { get; set; }
}

The first read (of class Data) throws: 'In line 2, no value provided for required field or property "Column1" in type "UserQuery+Data"'
(The exact column name it complains about depends on exactly how Array.Sort() sorts in line 224 of FieldMapper.cs, and for some orderings it may not even give an error.)

The second read (of class Data2) throws: 'In line 2, no value provided for required field or property "NotNullColumn" in type "UserQuery+Data2".'

The problem is in FieldMapper.ReadNames(), which destructively copies TypeFieldInfos into the m_IndexToInfo array without taking care to preserve the tail of that array.

Before ReadNames, m_IndexToInfo may look like (in the Data2 example):

ExtraColumn,
NotNullColumn

After ReadNames() m_IndexToInfo is:

NotNullColumn,
NotNullColumn

NotNullColumn is duplicated and ExtraColumn has been deleted.
The duplicate NotNullColumn triggers a MissingRequiredFieldException at the very end of FieldMapper.Reading().

Bug 2:

public void Test() {
    var csvContext = new LINQtoCSV.CsvContext();
    string testInput =
        "ExtraColumn,Column" + Environment.NewLine +
        "Extra,Value";
    var fileDescription = new LINQtoCSV.CsvFileDescription {
        EnforceCsvColumnAttribute = true,
        IgnoreUnknownColumns = true
    };
    csvContext.Read<Bug2>(this.StreamReaderFromString(testInput), fileDescription).Dump();
}
class Bug2 {
    [LINQtoCSV.CsvColumn()]
    public string Column { get; set; }
}

This throws: 'Index was outside the bounds of the array.' (and NOT wrapped in AggregatedException).

The problem is line 438 of FieldMapper.ReadNames() where the field being checked is using the wrong index (it should be m_IndexToInfo[_mappingIndexes[i]].hasColumnAttribute).

The fixed lines are mostly:

protected List<TypeFieldInfo> m_IndexToInfo = null;
...
m_IndexToInfo = new List<TypeFieldInfo>(nbrTypeFields);
...
m_IndexToInfo.Sort();
...
// Re-order m_IndexToInfo to match the field names
for (int i = 0; i < row.Count; i++) {
    if (!_mappingIndexes.ContainsKey(i)) {
        continue;
    }

    TypeFieldInfo tfi = m_NameToInfo[row[i].Value];
    m_IndexToInfo.Remove(tfi);
    m_IndexToInfo.Insert(_mappingIndexes[i], tfi);

    if (m_fileDescription.EnforceCsvColumnAttribute && !tfi.hasColumnAttribute) {
        // enforcing column attr, but this field/prop has no column attr.
        throw new MissingCsvColumnAttributeException(typeof (T).ToString(), row[i].Value, m_fileName);
    }
}

I can create unit tests for these and create a new pull request, are pull requests still being considered?

Please sign the assembly with strong name

Hi There,
We are not able to use the LinqToCSV dll in our project, as there is a build error due to referencing this assembly.

Since we have enabled code analysis with "Microsoft All Rules", we are not able to use this great assembly. It would be really great if the assembly could be signed with a strong name/key so that this error disappears.

Regards,

Rahman

Dynamic Columns

I am wondering if there is a way to have dynamic columns included in the defined data class. I have a case where I'd like to let the user upload a csv that can include some key-value pairs to store in a property bag. Essentially the column name would be the key, and the value in each record would be assigned accordingly. To allow some flexibility in the property bag, it would be nice to allow dynamic column names.

Problem with the Reading Raw Data Rows Feature

There is a problem when you try to read raw data rows using the form

public class MyDataRow : List<DataRowItem>, IDataRow
{
}

Then using the code:

IEnumerable<MyDataRow> products = cc.Read<MyDataRow>("products.csv", inputFileDescription);

The List gets populated with the corresponding rows from the txt file, but the columns (the DataRowItem objects) are empty.

I used the debugger to follow the execution into the ReadData class. All the values are retrieved as expected, and you can inspect them before they are yield returned. After this point they get lost.

Quote some fields not all data fields

There is a QuoteAllFields property; when true, Write surrounds all data fields with quotes.
But I want to surround only string fields with quotes, not all data fields.
e.g.:
Id,Value,Description
0, 102, "Description1"
1, 5, "Description2"

Is it possible?

Release notes/changelog

Where can I find release notes or changelog? I'm currently at 1.0 and would like to know the changes from that version to 1.5.

Unexpected column quote surrounding

I have a problem with a column containing a value like 12". When reading data from the csv everything is OK, but when I write the objects back to csv this value is saved as "12""". (For what it's worth, "12""" is the standard CSV encoding of 12": the field is quoted and the embedded quote is doubled.)

[CSVColumn(Name="")] not being respected

The column name is not being respected in the output file. What am I doing wrong?

Model:

public class ItemExportCSVModel
{
    [CsvColumn(Name="ID", FieldIndex=1)]
    public int ID { get; set; }
    [CsvColumn(Name = "GUID", FieldIndex = 2)]
    public string Guid { get; set; }
    [CsvColumn(Name = "FirstName", FieldIndex = 3)]
    public string FirstName { get; set; }
    [CsvColumn(Name = "LastName", FieldIndex = 4)]
    public string LastName { get; set; }
    [CsvColumn(Name = "employee_number", FieldIndex = 5)]
    public string EmployeeNumber { get; set; }
    [CsvColumn(Name = "email", FieldIndex = 6)]
    public string Email { get; set; }
    [CsvColumn(Name = "date_of_purchase", FieldIndex = 7)]
    public DateTime DateOfPurchase { get; set; }
    [CsvColumn(Name = "IMEI", FieldIndex = 8)]
    public string IMEI { get; set; }
    [CsvColumn(Name = "Product", FieldIndex = 9)]
    public string Product { get; set; }
    [CsvColumn(Name = "order_number", FieldIndex = 10)]
    public string OrderNumber { get; set; }
    [CsvColumn(Name = "Status", FieldIndex = 11)]
    public string Status { get; set; }
    [CsvColumn(Name = "Reason", FieldIndex = 12)]
    public string Reason { get; set; }
}

Controller:

private CsvActionResult<ItemExportCSVModel> CSVExporter_SaveCSV(CSVExporter Model)
    {
        List<Dictionary<string, object>> list = JsonConvert.DeserializeObject<List<Dictionary<string, object>>>(Model.Data);
        List<ItemExportCSVModel> toExport = new List<ItemExportCSVModel>();

        foreach (var dict in list)
        {
            toExport.Add(new ItemExportCSVModel() { 
                ID = dict.ContainsKey("StatusValue") ? int.Parse(dict["ItemId"] != null ? dict["ItemId"].ToString() : "-1") : -1,
                Guid = dict.ContainsKey("ItemGuid") ? dict["ItemGuid"].ToString() : null,
                FirstName = dict.ContainsKey("FirstName") ? dict["FirstName"].ToString() : null,
                LastName = dict.ContainsKey("LastName") ? dict["LastName"].ToString() : null,
                EmployeeNumber = dict.ContainsKey("employee_number") ? dict["employee_number"].ToString() : null,
                Email = dict.ContainsKey("UserEmail") ? dict["UserEmail"].ToString() : null,
                DateOfPurchase = dict.ContainsKey("date_of_purchase") ? DateTime.Parse(dict["date_of_purchase"] as string ?? DateTime.MaxValue.ToString()) : DateTime.MaxValue,
                IMEI = dict.ContainsKey("IMEI") ? dict["IMEI"].ToString() : null,
                Product = dict.ContainsKey("ProductName") ? dict["ProductName"].ToString() : null,
                OrderNumber = string.Empty,
                Status = dict.ContainsKey("StatusValue") ? dict["StatusValue"].ToString() : null,
                Reason = dict.ContainsKey("Reason") ? dict["Reason"].ToString() : null
            });
        }

        CsvFileDescription outputFileDescription = new CsvFileDescription
        {
            SeparatorChar = ',',
            FirstLineHasColumnNames = true,
            FileCultureName = "en-US"
        };

        CsvContext cc = new CsvContext();
        string fileName = string.Format("item-details-export_{0}{1}_{2}.csv", 
            toExport.Count(), 
            (Model.UpdateStatusTo > 0 ?  ("_" + Model.UpdateStatusTo.ToString()) : string.Empty), 
            DateTime.Now.ToString("yyyy.MM.dd.hh.mmssfff"));
        string finalPath = Server.MapPath(_CSVExportPath) + fileName;


        //Save the CSV for our records.
        cc.Write(toExport, finalPath, outputFileDescription);

        //Return the result of the CSV
        return new CsvActionResult<ItemExportCSVModel>(toExport, fileName, ',');
    }

Support for streams that cannot Seek (like GZipStream)

The use of Seek makes it difficult to use this library with some streams because they do not support seeking (like GZipStream).

Also it seems that if you try to enumerate the result a second time before finishing with the first then both will wind up using the same stream object. I can't see how that could work.

It would be nice if the functions took a sort of StreamReader factory instead of a StreamReader.

    public IEnumerable<T> Read<T>(Func<StreamReader> streamFactory)...
    public IEnumerable<T> Read<T>(Func<StreamReader> streamFactory, CsvFileDescription fileDescription)...

Then instead of calling Seek the library would just call the factory to get a new StreamReader. This would be much more in keeping with the way the library works when reading files.

For anyone who needs a work around, here is the one I came up with for now. It is kind of horrible and you have to be careful to only use the IEnumerable once.

    /// <summary>
    /// Workaround for the fact that LINQToCSV tries to rewind the zip stream.
    /// This stream has CanSeek == true, but in fact you can only call Seek
    /// to go to the beginning of the file, and only when you are already there
    /// (no data has been read yet).
    /// </summary>
    class LazyRewindGZipStream : GZipStream {
        // Keep track of whether we are at the start of the stream
        private bool _atStart = true;

        public LazyRewindGZipStream(Stream stream, CompressionMode mode)
            : base(stream, mode) {}

        public override int Read(byte[] array, int offset, int count) {
            _atStart = false; // We are not at the start of the stream any more
            return base.Read(array, offset, count);
        }
        public override IAsyncResult BeginRead(byte[] array, int offset, int count, AsyncCallback asyncCallback, object asyncState) {
            _atStart = false; // We are not at the start of the stream any more
            return base.BeginRead(array, offset, count, asyncCallback, asyncState);
        }
        public override bool CanSeek {
            get {
                return true; // Permit seek (even though most seeks will be unsupported)
            }
        }
        public override long Seek(long offset, SeekOrigin origin) {
            // Only case where we want to allow this
            if (offset == 0 && origin == SeekOrigin.Begin && _atStart)
                return 0;

            // Otherwise this is still not supported
            return base.Seek(offset, origin);
        }
    }
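
Usage of the workaround might look like this (MyRow, desc and the file name are illustrative; requires System.IO, System.IO.Compression, System.Linq and LINQtoCSV):

    using (var file = File.OpenRead("data.csv.gz"))
    using (var gzip = new LazyRewindGZipStream(file, CompressionMode.Decompress))
    using (var reader = new StreamReader(gzip))
    {
        // Enumerate once only; a second enumeration would hit the unsupported Seek.
        var rows = new CsvContext().Read<MyRow>(reader, desc).ToList();
    }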

Exporting CSV

Hi,

The below code works fine and exports my CSV results as expected. However, when I re-save the file after opening it in Excel, Excel defaults to "Unicode Text" with no filename instead of CSV (Comma delimited). Any ideas?

var outputFileDescription = new CsvFileDescription
{
    SeparatorChar = ',',
    FirstLineHasColumnNames = true,
    FileCultureName = "en-GB",
    QuoteAllFields = true
};

        var cc = new CsvContext();

        cc.Write(o, "File.csv", outputFileDescription);

CsvColumn FieldIndex does not respect position when parsing a file

I am attempting to process a file that contains five columns:

Column1,Column2,Column3,Column4,Column5

I only want to grab certain columns but LinqToCSV errors out when it attempts to parse the file because Column1 is a string and Column2 is an int. The file does not contain headers so I would expect I could just specify the FieldIndex and it would grab from the correct position. My model is like such:

public class Foo {
    [CsvColumn(FieldIndex = 2)]
    public int Id;
}

So I would expect Column2 data to appear in the Id property, but LinqToCSV tries to place Column1 data in Id. It looks like FieldIndex is purely used for ordering the columns, but all columns still need to be present. It would be nice if FieldIndex actually correlated to the position of the data we wish to store in our model.
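
Until then, a hedged workaround is the library's raw data rows feature: read each line as an IDataRow and project the wanted columns by position yourself (RawRow, the Foo class above, and the file name are illustrative):

    using System.Collections.Generic;
    using System.Linq;
    using LINQtoCSV;

    public class RawRow : List<DataRowItem>, IDataRow { }

    // ...

    var description = new CsvFileDescription
    {
        SeparatorChar = ',',
        FirstLineHasColumnNames = false
    };
    var cc = new CsvContext();

    // r[1] is the second column (Column2 above); parse it into Foo.Id ourselves.
    var foos = cc.Read<RawRow>("input.csv", description)
                 .Select(r => new Foo { Id = int.Parse(r[1].Value) })
                 .ToList();

(Note the separate report above that raw data rows can come back empty in some versions.)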

cc.Write<> should ignore extraneous data in class

Though it seems this project is under-maintained lately, I want to suggest an addition:
Either via a decorator (e.g. [CsvColumn(Name = "extra_data", FieldIndex = null)]) or, probably better, via the fileDescription (e.g. IgnoreExtraClassData = true), extra data in the class should not be written to the file by cc.Write<>.

I have a helper class that interprets a string from particular columns to implement a custom sort algorithm (for well position addresses in 96-well microtiter plates, FYI). So in addition to the column data (column header = DestWellId), I had an object called DestWell which properly interpreted the meaning of the string and implemented a comparer method for sorting. However, when I called cc.Write<>, it always wrote these DestWell objects as new columns (at the right side of the file), and it seems there was nothing I could do to avoid it.

Rewinding stream in CsvContext.ReadData has to take into account preamble (BOM)

Hi Matt, I think there might be a bug in how the stream is rewound in the ReadData method of CsvContext. The current code does not take into account the length of a possible BOM in a UTF-8 or Unicode encoded stream, so those 2, 3 or 4 bytes of the BOM get read as part of the first element.

If the stream has column names in the first line it would raise an exception because that first column name (with the BOM prepended) will not be found in T.

I think you could fix it by changing line 126 in CsvContext.cs:

stream.BaseStream.Seek(0, SeekOrigin.Begin);

With this code:

// Skip Unicode preamble/BOM (Byte Order Mark) if present.
stream.Peek();
var bytes = new byte[stream.CurrentEncoding.GetPreamble().Length];
stream.BaseStream.Seek(0, SeekOrigin.Begin);
stream.BaseStream.Read(bytes, 0, stream.CurrentEncoding.GetPreamble().Length);
if (!bytes.SequenceEqual(stream.CurrentEncoding.GetPreamble()))
{
    stream.BaseStream.Seek(0, SeekOrigin.Begin);
}
stream.DiscardBufferedData();

NOTES:

  1. stream.Peek() forces the inspection of the stream thus setting the correct stream.CurrentEncoding value if detectEncodingFromByteOrderMarks is set to true in the constructor of the stream.
  2. stream.DiscardBufferedData() resets the internal buffer of the stream so that it can be safely read again.

Best regards.

Read unknown column

I'd like to have the option IgnoreUnknownColumns set, but still load the values into a list of "unknown values"; each unknown value should have a name and a value (string).

Thanks for a good component.

Feature Request: Use 'OutputFormat' Column attribute for reading as well

When Importing a csv where dates were written as "d/M/yyyy" I am getting the exception:
LINQtoCSV.WrongDataFormatException: Value "31/12/2019" in line 2 has the wrong format...

It would be nice if the 'OutputFormat' data could also be used when importing date fields, not just when writing. Example:

   [CsvColumn(Name = "Date", FieldIndex = 13, OutputFormat = "d/M/yyyy")]
   public DateTime Date { get; set; }

StreamReader CsvColumn Bug

When using the csvContext.Read(stream, csvDescription) method, if the header is "index_id" we get " index_id" (which is wrong). FieldMapper.cs needs to use row[i].Value.Trim() instead of row[i].Value.

Extra column expected by the context on read

I am trying to read a file with this model:

public class ImportCandidateVM
{
    [CsvColumn(Name = "CV No.", FieldIndex = 1)]
    public string CVId { get; set; }

    [CsvColumn(Name = "Received on date", FieldIndex = 2, OutputFormat = "dd MMM HH:mm:ss")]
    public DateTime ReceivedOnDate { get; set; }

    [CsvColumn(Name = "Source received from", FieldIndex = 3)]
    public string Source { get; set; }

    [CsvColumn(Name = "Position applied for", FieldIndex = 4)]
    public string Position { get; set; }

    [CsvColumn(Name = "Department", FieldIndex = 5)]
    public string Department { get; set; }

    [CsvColumn(Name = "Name of Candidate", FieldIndex = 6)]
    public String CandidateName { get; set; }

    [CsvColumn(Name = "E-mail ID", FieldIndex = 7)]
    public String Email { get; set; }

}

The csv has some extra fields which I don't want (like mobile etc.), so I set EnforceCsvColumnAttribute = true, but I got this error:
The input file has column name "Mobile No." in the first record, but there is no field or property with that name in type "Jobsoid.Web.ViewModels.ImportCandidateVM".

EnforceCsvColumnAttribute = true doesn't work; I tried setting FirstLineHasColumnNames to both true and false.

Anyone else facing this issue?

Add configuration for line ending character(s)

I need to use UNIX-style line endings in my files. Especially as .NET now runs on macOS and Linux, it seems good to be able to configure this in general. I would like to add this to CsvFileDescription and then, in CsvStream, change line 71 from

        m_outStream.WriteLine("");

to

        m_outStream.WriteLine(m_endOfLineChar);

and pass this end-of-line char in from the file description.

Add InputFormat and InputCulture attributes to have more fine grained control of Reading

I am receiving some CSV files where dates come in the format yyyyMMddHHmmss and sometimes yyyyMMdd. I would like to be able to deserialize these into proper DateTime objects.

06|T|1|NHRPL|57999|Minor Theatre|M|12000|T|20080414105034|20080414125034||AC||0|1|0|0|0|0|P||

The .NET framework allows parsing these values as:

string value = "20080414105034";
DateTime.Parse(value); // FAILURE: throws FormatException
DateTime.ParseExact(value, "yyyyMMddHHmmss", CultureInfo.InvariantCulture); // SUCCESS

If we could optionally pass those in through settings, it would really help with these non-standard formats. Sometimes the data does not follow the standard formats of the FileCultureName culture. This would then allow this sort of configuration:

[CsvColumn(FieldIndex = 9, InputFormat="yyyyMMddHHmmss", InputCulture="", OutputFormat="yyyyMMddHHmmss")]
public DateTime TransactionDate { get; set; }

Locale and environment-related unit test failures, with suggested fixes

  1. GoodFileTabDelimitedNoNamesInFirstLineNLnl was failing for me because of a difference in newline characters between the expected and actual values, stemming from the fact that the newlines are embedded via verbatim string literals in the source file itself (line 83 and following). My copy of the source file encoded the newlines within the verbatim string literal as \n while the test string had \r\n. The point is that the encoding of newlines in the source file can vary, e.g. according to the user's formatting settings, so tests shouldn't depend on it. Fix: include the newlines explicitly as \r\n in the expected value strings.
  2. GoodFileCommaDelimitedNamesInFirstLineNLnl was failing for me because of the way the dates "1/2/2008" and "5/11/2009" are parsed on my system. Fix: either explicitly specify the culture in the Parse call, or construct the DateTime directly rather than parsing it.

Trailing comma in header can cause unknown ArgumentNullException

When creating test files I inadvertently left data on one of the data lines, which inserted an additional comma on the header line. When testing I received the unhelpful message "Value cannot be null.\r\nParameter name: key".


After digging around I found the culprit in FieldMapper.cs on line 413,

if (!m_NameToInfo.ContainsKey(row[i].Value))

Specifically, the extra comma in the header meant that row[i].Value evaluated as null.

While mine was certainly a fringe case, the issue is easily solved by adding an additional exception type and checking for null before evaluating the column header.

Would be happy to submit a pull request if you'd like.

Why "ref" keyword

There are a few methods where parameters have the "ref" keyword. My expectation is that each of these methods would have at least one codepath where the variable is being re-assigned, but this is not always the case. The places are:
CsvStream.ReadRow(ref IDataRow row)
FieldMapper.WriteNames(ref List<string> row)
FieldMapper.WriteObject(T obj, ref List<string> row)

Mixing FieldIndex and Name to read a dynamic column name

I'm trying to read into a class such as the following, where one or more of the property names are only known at runtime.

public class BreakdownCSV
{
      [CsvColumn(FieldIndex = 1)]
      public string BreakdownProperty { get; set; }
      [CsvColumn(Name = "Day of Week Name")]
      public DayOfWeek DayOfWeekName { get; set; }
      [CsvColumn(Name = "SessionName")]
      public string SessionName { get; set; }
}

but the BreakdownProperty value is coming up as null. As far as I understand, this is because the reader isn't able to read properties without matching the property's name or having the name specified in the attribute.

How would I go about doing this?

Ability to skip the first line or first X lines on read

I have a use case where I am getting report data from different customer systems. The data in the columns is identical but customers can change the "display" label for the value, thus affecting the column header name.

I would like the ability to add a bool property SkipFirstLine to the CsvFileDescription, or maybe a SkipXLines property that takes an int.

Then ReadData in CsvContext would honour this setting.

We would need an error trap so that you can't set SkipFirstLine if you also have FirstLineHasColumnNames set.

Feature Request: Add a "DefaultValue" decorator.

For example, when an expected integer value contains no value ("") a WrongDataFormatException is thrown.

Can a "DefaultValue" decorator be added, where, when a NoValue is encountered, instead of throwing the exception, a default value could be specified?

[CsvColumn(DefaultValue = -1)]
public int SomeProperty { get; set; }

do not seek beginning of stream

When a stream is passed, you must not seek to its beginning or change the position before working with it! I want to skip bogus data first, so you must read from the position the stream was set to.

Missing columns issue

First of all I would like to thank you for this library, I have used it in multiple projects.

I don't know if there is a workaround for this issue, but I haven't figured anything out so far. Let's assume we have a csv file with 2 columns, ID and Name. The model will be something like this:

    [CsvColumn(Name = "ID", FieldIndex = 1, CanBeNull = false)]
    public int UserID { get; set; }

    [CsvColumn(Name = "Name", FieldIndex = 2, CanBeNull = false)]
    public string UserName { get; set; }

If one of those values is left empty, the MissingRequiredFieldException is raised and the fieldName property is reported correctly. However, if I completely remove a column, the fieldName is not reported correctly. So if I remove the column ID, it will raise the MissingRequiredFieldException, but the missing fieldName will be reported as 'Name'. Any idea please? I'm currently checking out the code to see if I can figure this out myself, but in case someone has already found a workaround I would appreciate it if he/she can share it :)

Csv files where multiple columns share the same name

I have csv files where multiple columns share the same column names (different measurements on the same sample). I'd like to specify a field as a list, and have it populated using the values from the common columns.

Add a license

Hey there,

thanks for this project.

Would it be possible to add a license, such as the MIT license, for making this project usable in company solutions?

Kind Regards,
Florian

Reading raw data does not work for me.

... but with this patch to CsvContext.cs, it does:

Index: CsvContext.cs

--- CsvContext.cs (revision 2611)
+++ CsvContext.cs (revision 2612)
@@ -179,6 +179,7 @@
                         if (readingRawDataRows)
                         {
                             obj = row as T;
+                            row = new T() as IDataRow;
                         }
                         else
                         {
    
