Comments (6)

dbro commented on August 16, 2024

Hi Les-
Thanks for your report. If I understand correctly, this is exactly the use case for csvquote, to provide clean data for applications that are not aware of quoted fields. I don't think you need to use more than a single byte character to create a functioning pipeline.

It looks like the input data is a regular csv, with fields that contain special characters properly quoted. The output appears to be the issue, if I understand correctly. As you say, the application you want to use assumes that field separators are always field separators, and never data. This is true of many programs (awk, cut, sort, etc). When using csvquote, this kind of application takes its place in the pipeline before the final "csvquote -u" command, just like the cut command you have in your example. So instead of something like this:

$ echo 'a,b,c,"d,e,f",g,h,"i,j,k",l' | csvquote | cut -d, -f2,4,5 | csvquote -u -d# | my_application

do this:

$ echo 'a,b,c,"d,e,f",g,h,"i,j,k",l' | csvquote | cut -d, -f2,4,5 | my_application | csvquote -u -d,

What my_application gets as data input is a data set where commas are always field separators. In situations where the original data file had commas inside the quoted fields (as in fields 4 and 7 of the input above), each comma is replaced with a different character, 0x1F, which should allow my_application to treat it as data.
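
To make the substitution concrete, here is a minimal sketch that does not call csvquote at all. It simulates the sanitized stream by hand, assuming the placeholder byte is 0x1F (octal 037, the byte the `sed` expression elsewhere in this thread matches), and uses `tr` as a stand-in for the restore step. The variable names are illustrative only.

```shell
# Simulated sanitized data: every comma that was inside a quoted field
# has already been swapped for the placeholder byte 0x1F (octal 037).
sanitized=$(printf 'a,b,c,"d\037e\037f",g,h,"i\037j\037k",l')

# Every remaining comma is a real field separator, so cut can split safely.
selected=$(printf '%s\n' "$sanitized" | cut -d, -f2,4,5)

# Stand-in for the restore step: turn each placeholder back into a comma.
restored=$(printf '%s\n' "$selected" | tr '\037' ',')

printf '%s\n' "$restored"   # prints: b,"d,e,f",g
```

Any comma-unaware tool placed between the first and last step sees the quoted field as a single field.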

If this is not correct, please provide more information about the application in use.

Dan

from csvquote.

lesderid commented on August 16, 2024

@dbro The application that I'm feeding the final CSV into uses and displays the fields directly to the user as part of a graphical interface.

(The reason they 'got away' with not allowing the field separator as data is that the application is Japanese, which has its own comma different from the normal one. My input data, however, uses normal commas in data.)

dbro commented on August 16, 2024

Thanks Les. So are you wanting to translate regular commas (0x2C) that are inside quoted fields to be Japanese commas (0xE3 0x80 0x81)? Maybe sharing the full command pipeline would be helpful.
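
The byte values mentioned here can be checked with a small sketch like the following. `bytes_of` is a hypothetical helper, and the snippet assumes it is saved and run as UTF-8 text.

```shell
# Hypothetical helper: print the bytes of its argument as contiguous hex.
bytes_of() {
    printf '%s' "$1" | od -An -tx1 | tr -d ' \n'
}

ascii_comma=$(bytes_of ',')       # one byte
japanese_comma=$(bytes_of '、')    # three bytes in UTF-8

echo "$ascii_comma $japanese_comma"   # prints: 2c e38081
```

The regular comma is a single byte, while the Japanese comma takes three bytes in UTF-8, so it cannot be passed where a single-byte delimiter is expected.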

lesderid commented on August 16, 2024

That's right. This is what I'm using right now to extract two columns for use in the (closed source) GUI application:

$ csvquote input.csv | cut -d, -f $firstColumn,$secondColumn | sed "s/\x1F/、/g" > output.csv
$ ./myGUIApplication output.csv

It would obviously be better if I could just use csvquote -u -d、 (that's a Japanese comma) instead.
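
For anyone reusing this workaround: the `\x1F` escape in the `sed` expression is, as far as I know, a GNU sed extension. A more portable sketch generates the byte with `printf` instead. `restore_with` is a hypothetical helper name, not part of csvquote, and it assumes the replacement string contains no `/`, `&`, or backslash.

```shell
# Hypothetical helper: replace each 0x1F placeholder on stdin with the
# string given as $1, which may be multibyte.
# Assumes $1 contains no '/', '&' or backslash (they are special to sed).
restore_with() {
    us=$(printf '\037')      # the literal 0x1F byte
    sed "s/${us}/$1/g"
}

# Stand-in for the output of: csvquote input.csv | cut -d, -f ...
printf 'b,"d\037e\037f",g\n' | restore_with '、'   # prints: b,"d、e、f",g
```

In the real pipeline the helper would take the place of the `sed` command, with its output redirected to output.csv.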

dbro commented on August 16, 2024

Thanks, I think I understand better the desired input and output data. Just to make sure:

The input.csv file uses commas as delimiters, and has commas inside quoted fields.

The input for myGUIApplication needs to have Japanese commas instead of regular commas inside the quoted fields. And this input file should have commas as separators between fields.

One option would be to get the Japanese commas created by whatever application created input.csv. That would be the cleanest approach.

Another option would be to use tab-separated-value as the format instead of comma-separated-values. Is this possible in whatever application created input.csv? And is myGUIApplication able to accept TSV as input?

Another option is to continue with what you have above. That pipeline is simple and does exactly what you need.

Changing csvquote to add multibyte characters as delimiters would introduce a lot of complexity to the code. Unless it is a common need without an adequate workaround, I don't think it should be done. This is the first case I've heard of, and because a workaround exists I am not inclined to make the change.
Dan

lesderid commented on August 16, 2024

> The input.csv file uses commas as delimiters, and has commas inside quoted fields.
>
> The input for myGUIApplication needs to have Japanese commas instead of regular commas inside the quoted fields. And this input file should have commas as separators between fields.

That's correct.

> One option would be to get the Japanese commas created by whatever application created input.csv. That would be the cleanest approach.
>
> Another option would be to use tab-separated-value as the format instead of comma-separated-values. Is this possible in whatever application created input.csv? And is myGUIApplication able to accept TSV as input?

Both the input and the output format can't be changed, so neither would really work, I'm afraid.

> Another option is to continue with what you have above. That pipeline is simple and does exactly what you need.

That's what I'm probably going to do, because the current solution is working fine.

> Changing csvquote to add multibyte characters as delimiters would introduce a lot of complexity to the code. Unless it is a common need without an adequate workaround, I don't think it should be done. This is the first case I've heard of, and because a workaround exists I am not inclined to make the change.

Fair enough. Thanks for taking a look at it anyway!
