
Comments (6)

jondegenhardt commented on May 18, 2024

Thanks for the report and for trying the tools.

The only tool explicitly supporting Windows line endings on Linux is csv2tsv. You are correct that tsv-append works fine, as do a couple of other tools, but that is more accidental than by design.

What's going on is that the tools use D standard library functions for reading lines, and these functions assume Unix line endings on Unix platforms. If the file has Windows line endings (\r\n pairs), each line is left with an extraneous \r character at the end of the last field. If this extraneous \r interferes with processing, the tool doesn't work.
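The same effect can be reproduced with standard shell tools, without tsv-utils installed. The following is a minimal bash sketch, not the D code the tools use: `last_field_length` is a hypothetical helper that reads one line the way a Unix line reader does (stripping only the `\n`) and reports the length of the last tab-separated field.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of tsv-utils): read one TSV line from stdin,
# stripping only the trailing "\n" as a Unix line reader would, then print
# the length of the last tab-separated field.
last_field_length() {
    local line last tab=$'\t'
    IFS= read -r line           # removes "\n" only; any "\r" stays in the line
    last=${line##*"$tab"}       # text after the final tab
    printf '%s\n' "${#last}"
}

printf 'AA\tXX\tYY\r\n' | last_field_length   # CRLF input: prints 3 ("YY" plus "\r")
printf 'AA\tXX\tYY\n'   | last_field_length   # LF input: prints 2
```

The extra character in the CRLF case is the `\r` that ends up attached to the last field.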

In the case of tsv-summarize and tsv-filter, if they try to interpret the last field as a numeric value, the conversion will fail. However, even if they don't perform a conversion, the tools are not necessarily working correctly. tsv-select, for example, isn't really preserving Windows line endings.

As an example:

$ # This outputs the Windows line endings
$ echo $'AA\tXX\tYY\r\nBB\tXX\tYY\r' | grep $'YY\r'
AA	XX	YY
BB	XX	YY
$ # Select the last field.
$ echo $'AA\tXX\tYY\r\nBB\tXX\tYY\r' | tsv-select -f 3
YY
YY
$ # Grep shows the Windows line ending
$ echo $'AA\tXX\tYY\r\nBB\tXX\tYY\r' | tsv-select -f 3 | grep $'YY\r'
YY
YY
$ # Select the second field
$ echo $'AA\tXX\tYY\r\nBB\tXX\tYY\r' | tsv-select -f 2
XX
XX
$ # Grep shows it's not a Windows line ending.
$ echo $'AA\tXX\tYY\r\nBB\tXX\tYY\r' | tsv-select -f 2 | grep $'XX\r'
$

I'm not inclined to add support for Windows line endings on Unix platforms. The dos2unix tool is a good tool for this and fits the pipeline approach being used by the tools.
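For readers without dos2unix at hand, the conversion step can be approximated in the pipeline itself. This is a rough sketch under stated assumptions: `strip_cr` is a hypothetical stand-in that removes a single trailing `\r` from each line, and `cut` is used in place of tsv-select so the example runs with standard tools only. It is not a full replacement for dos2unix.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for dos2unix: delete one trailing "\r" per line.
# The $'...' quoting passes a literal carriage return to sed.
strip_cr() {
    sed $'s/\r$//'
}

# Convert first, then select the last field; the output no longer carries "\r".
printf 'AA\tXX\tYY\r\nBB\tXX\tYY\r\n' | strip_cr | cut -f 3
```

Only a `\r` at the end of a line is removed; carriage returns elsewhere in a line are left untouched.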

However, something is going wrong with the error message formatting, and the error message should identify a Windows line ending as a likely problem. The documentation for the tools should also discuss line endings. I'll have to look into both of these.

Regarding csv2tsv - This tool explicitly supports Windows line-endings because they are commonly used in many programs that generate CSV files.

from tsv-utils.

jondegenhardt commented on May 18, 2024

The badly formatted error message is due to the \r character being included in the error message. I'll have to fix it.


Halmaethor commented on May 18, 2024

Thanks for your explanation and the great tools.

As you suggested, I used dos2unix to convert the line endings, so this issue wasn't a deterrent to my work. Updating the documentation and especially the error message is more than enough to solve this issue.


jondegenhardt commented on May 18, 2024

Current plan: On Unix builds, check for Windows/DOS line endings when processing the first line of a file. That should handle most cases prior to ever hitting the error message. Regarding the poor error message format: There's an open D bug for it: https://issues.dlang.org/show_bug.cgi?id=17708
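The idea of checking only the first line can be illustrated with a small bash sketch. This is an illustration of the approach, not the implementation that went into tsv-utils: `first_line_has_crlf` is a hypothetical function, and the warning text is made up for the example.

```shell
#!/usr/bin/env bash
# Hypothetical check (not the tsv-utils implementation): exit status 0 if the
# first line on stdin ends with "\r", non-zero otherwise.
first_line_has_crlf() {
    local line
    # read fails at EOF; still accept a final line that lacks a newline.
    IFS= read -r line || [ -n "$line" ] || return 1
    [[ $line == *$'\r' ]]
}

if printf 'AA\tXX\tYY\r\nBB\tXX\tYY\r\n' | first_line_has_crlf; then
    echo "Windows line endings detected; convert the file first (e.g. with dos2unix)." >&2
fi
```

Checking a single line keeps the cost negligible, at the price of missing files whose line endings are mixed.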


jondegenhardt commented on May 18, 2024

Addressed by PR #103, merged to master. Will be included in the next release.


jondegenhardt commented on May 18, 2024

Included in release v.1.1.16.
