Comments (6)
Hi Les-
Thanks for your report. If I understand correctly, this is exactly the use case for csvquote, to provide clean data for applications that are not aware of quoted fields. I don't think you need more than a single-byte character to create a working pipeline.
It looks like the input data is a regular CSV file, with fields that contain special characters properly quoted. The output appears to be the issue, if I understand correctly. As you say, the application you want to use assumes that field separators are always field separators, and never data. This is true of many programs (awk, cut, sort, etc.). When using csvquote, this kind of application takes its place in the pipeline before the final "csvquote -u" command, just like the cut command in your example. So instead of something like this:
$ echo 'a,b,c,"d,e,f",g,h,"i,j,k",l' | csvquote | cut -d, -f2,4,5 | csvquote -u -d# | my_application
do this:
$ echo 'a,b,c,"d,e,f",g,h,"i,j,k",l' | csvquote | cut -d, -f2,4,5 | my_application | csvquote -u -d,
What my_application gets as input is a data set where commas are always field separators. Where the original data file had commas inside quoted fields (fields 4 and 7 of the example input above), each of those commas is replaced with a different character, 0x1F, which my_application should treat as ordinary data.
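The ordering described above can be tried without csvquote installed. The sketch below is a stand-in, not csvquote's own code: the hypothetical shell function sanitize uses awk to approximate csvquote's default mode (commas inside double-quoted fields become 0x1F, and newlines inside them become 0x1E, which is assumed here to match csvquote), and the hypothetical function restore approximates csvquote -u with tr. It ignores csvquote's options such as -d.

```shell
#!/bin/sh
# Rough stand-in for csvquote's default mode (NOT the real tool):
# commas inside double-quoted fields become 0x1F, newlines inside
# quoted fields become 0x1E, everything else passes through unchanged.
sanitize() {
  awk '
  BEGIN { inq = 0 }
  {
    out = ""
    for (i = 1; i <= length($0); i++) {
      c = substr($0, i, 1)
      if (c == "\"") {
        inq = !inq          # a doubled "" toggles twice, so it nets out
      } else if (c == "," && inq) {
        c = "\037"          # unit separator, 0x1F
      }
      out = out c
    }
    # If the record ended inside a quoted field, the newline was data.
    printf "%s%s", out, (inq ? "\036" : "\n")
  }'
}

# Rough stand-in for csvquote -u: put the original bytes back.
restore() {
  tr '\037\036' ',\n'
}

# The delimiter-naive step (cut here) runs between sanitize and restore,
# so every comma it sees is a real field separator.
echo 'a,b,c,"d,e,f",g,h,"i,j,k",l' | sanitize | cut -d, -f2,4,5 | restore
# prints b,"d,e,f",g
```

Any delimiter-naive program can take the place of cut, as long as it sits between the two stages. If restore ran before cut instead, cut would split "d,e,f" into three fields.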
If this is not correct, please provide more information about the application in use.
Dan
from csvquote.
@dbro The application that I'm feeding the final CSV into uses and displays the fields directly to the user as part of a graphical interface.
(The reason they 'got away' with never allowing the field separator inside data is that the application is Japanese, and Japanese text uses its own comma, distinct from the ASCII one. My input data, however, contains ordinary commas inside fields.)
Thanks Les. So do you want to translate regular commas (0x2C) that are inside quoted fields into Japanese commas (0xE3 0x80 0x81, the UTF-8 encoding of U+3001)? Maybe sharing the full command pipeline would help.
That's right. This is what I'm using right now to extract two columns for use in the (closed-source) GUI application:
$ csvquote input.csv | cut -d, -f $firstColumn,$secondColumn | sed "s/\x1F/、/g" > output.csv
$ ./myGUIApplication output.csv
It would obviously be better if I could just use csvquote -u -d、 (that's a Japanese comma) instead.
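Until csvquote accepts a multibyte delimiter, the sed stage above does the same job. A reusable form of it is sketched below as a hypothetical shell function; restore_embedded_commas is an invented name, not a csvquote option. It takes the replacement string as its argument and builds the 0x1F byte with printf instead of writing \x1F, since that escape is not guaranteed by POSIX sed.

```shell
#!/bin/sh
# Hypothetical helper (not a csvquote option): replace every 0x1F byte,
# which csvquote leaves where a comma sat inside a quoted field, with the
# string given as $1. The string may be multibyte, e.g. the Japanese comma.
# Caveat: $1 is spliced into a sed expression, so it must not contain
# "/", "&", a backslash or a newline.
restore_embedded_commas() {
  # printf emits the raw 0x1F byte, so sed needs no \x escape support.
  _us=$(printf '\037')
  sed "s/$_us/$1/g"
}

printf 'a,"b\037c"\n' | restore_embedded_commas '、'   # prints a,"b、c"
```

In the pipeline above it would replace the sed stage: csvquote input.csv | cut -d, -f $firstColumn,$secondColumn | restore_embedded_commas '、' > output.csv. Assuming csvquote also encodes newlines inside quoted fields as 0x1E, those bytes would still be present in output.csv; whether to restore them (for example with tr '\036' '\n') depends on what the GUI application accepts.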
Thanks, I think I now understand the desired input and output data better. Just to make sure:
The input.csv file uses commas as delimiters, and has commas inside quoted fields.
The input for myGUIApplication needs to have Japanese commas instead of regular commas inside the quoted fields. And this input file should have commas as separators between fields.
One option would be to get the Japanese commas created by whatever application created input.csv. That would be the cleanest approach.
Another option would be to use tab-separated values (TSV) as the format instead of comma-separated values. Is this possible in whatever application created input.csv? And is myGUIApplication able to accept TSV as input?
Another option is to continue with what you have above. That pipeline is simple and does exactly what you need.
Changing csvquote to support multibyte characters as delimiters would introduce a lot of complexity to the code. Unless it is a common need without an adequate workaround, I don't think it should be done. This is the first case I've heard of, and because a workaround exists, I am not inclined to make the change.
Dan
> The input.csv file uses commas as delimiters, and has commas inside quoted fields.
> The input for myGUIApplication needs to have Japanese commas instead of regular commas inside the quoted fields. And this input file should have commas as separators between fields.
That's correct.
> One option would be to get the Japanese commas created by whatever application created input.csv. That would be the cleanest approach.
> Another option would be to use tab-separated values (TSV) as the format instead of comma-separated values. Is this possible in whatever application created input.csv? And is myGUIApplication able to accept TSV as input?
Neither the input nor the output format can be changed, so I'm afraid neither option would work.
> Another option is to continue with what you have above. That pipeline is simple and does exactly what you need.
That's probably what I'll do, since the current solution works fine.
> Changing csvquote to support multibyte characters as delimiters would introduce a lot of complexity to the code. Unless it is a common need without an adequate workaround, I don't think it should be done. This is the first case I've heard of, and because a workaround exists, I am not inclined to make the change.
Fair enough. Thanks for taking a look at it anyway!
Related Issues
- Please choose a license for cvsquote
- working with body
- Make a homebrew install mechanism for this excellent thing :)
- Eating up comma (,) in output
- Changing the number of lines
- Little bug with Progress Export Files
- Escaped quotes quoted
- Add option to flush buffer
- Document what the quoting mechanism is
- Which awk did you test with for your other languages benchmark?
- Add error checks for both input and output
- Behavior is changed?