
Comments (46)

jdanbrown avatar jdanbrown commented on July 4, 2024 6

In the meantime, here's a simple tsv hack that takes advantage of string substitution (and will break if you have any tabs embedded in strings):

$ echo '{"one":1,"two":"x"}' | jq --raw-output '"\(.one)\t\(.two)"'
1       x


stedolan avatar stedolan commented on July 4, 2024 5

This is working but undocumented in master, with a slightly strange syntax that may change. @csv is a builtin function that reformats its argument (which must be an array) as a CSV string. So, something like:

./jq -r '@csv'

will output 1,2,"hello, world" when given [1,2,"hello, world"].
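
A quick way to try it end to end (assuming a build that has @csv):

echo '[1,2,"hello, world"]' | ./jq -r '@csv'
1,2,"hello, world"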


 avatar commented on July 4, 2024 3

One issue is that order is significant for csv, but not for JSON fields: we can't rely on the order of fields in JSON. This can be solved by specifying the mapping from JSON named fields to csv positional fields, by constructing an array of those fields, using [.date,.count,.title]:

input: { "date": "2011-01-12 13:14", "count": 17, "title":"He's dead, Jim!" }
jq -r '[.date,.count,.title] | @csv'
"2011-01-12 13:14",17,"He's dead, Jim!"

The csv output has slightly overzealous quoting (more than your target), but does no harm.

A second issue is you want to apply this to an array of objects, not just one. The .[] operator streams each item of an array in turn:

jq  -r '.[] | [.date, .count, .title] | @csv'
"2011-01-12 13:14",17,"He's dead, Jim!"
"2011-01-13 21:30",4711,"What do you mean, ""dead""?"
"2011-01-14 00:07",,"Dead!"

Finally, you also want a header, stating the csv field names at the top. The easiest way to do this is to write the header literally (we need parentheses because , binds tighter than |):

jq  -r '["date", "count", "title"], (.[] | [.date, .count, .title]) | @csv'
"date","count","title"
"2011-01-12 13:14",17,"He's dead, Jim!"
"2011-01-13 21:30",4711,"What do you mean, ""dead""?"
"2011-01-14 00:07",,"Dead!"

But duplication is bad. We can avoid repeating the same list of field names by reusing the header array to look up the fields in each object, though this looks confusing to me; I wonder if there's a clearer way:

jq  -r '["date", "count", "title"] as $fields| $fields, (.[] | [.[$fields[]]]) | @csv'
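
For the record, this should print the same four lines as above, since .[$fields[]] looks up .date, .count and .title in turn (the missing count again becomes an empty field):

"date","count","title"
"2011-01-12 13:14",17,"He's dead, Jim!"
"2011-01-13 21:30",4711,"What do you mean, ""dead""?"
"2011-01-14 00:07",,"Dead!"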

EDIT: here it is as a function, with a slightly nicer field syntax, using path() (but you still need to remember the -r switch; maybe a --csv flag does make sense):

def csv(fs): [path(null|fs)[]] as $fields| $fields, (.[] | [.[$fields[]]]) | @csv;
USAGE: csv(.date, .count, .title)
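
A full invocation sketch, pasting the def inline (input.json is just a placeholder filename for the array-of-objects input above):

jq -r 'def csv(fs): [path(null|fs)[]] as $fields| $fields, (.[] | [.[$fields[]]]) | @csv;
       csv(.date, .count, .title)' input.json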

PS: if the input is not an array of objects, but just a sequence of objects (your second, JSONish, input), then we could just omit the .[] - but then we can't get the header at the top. It's probably best to convert it to an array, using the --slurp/-s option (or put [] around it, if it's generated within jq).

PPS: You can also get an array of the fields of an object using keys, but it returns them in alphabetical ("unicode codepoint") order. Of course, one could create a tool that does retain the ordering of JSON fields - but it's not part of the JSON spec, and JSON tools and web APIs don't seem to do it. (Though I think it would be convenient in some ways, e.g. ordering the string/number fields first makes JSON much more human-readable when deeply nested.)


wtlangford avatar wtlangford commented on July 4, 2024 2

Looks like .rates | to_entries | .[] | [.key,.value] | @csv is what you want, then.
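
Run with -r against the rates example elsewhere in this thread, that gives raw CSV lines like (rates.json being a placeholder filename):

jq -r '.rates | to_entries | .[] | [.key,.value] | @csv' rates.json
"AED",3.672626
"AFN",48.3775
"ALL",110.223333
"AMD",409.604993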


stedolan avatar stedolan commented on July 4, 2024

Hrm. This would definitely be useful, but there are a lot of annoying special cases here. I'm not sure what should happen if the output's not a set of json objects, or if they have different fields. Also, CSV quoting rules are a nightmare, but I suppose I wouldn't have to parse the CSV afterwards...


alexchamberlain avatar alexchamberlain commented on July 4, 2024

Just make it Excel compliant...


jdanbrown avatar jdanbrown commented on July 4, 2024

+1

Just require the output to be arrays, and map those arrays directly to tsv/csv. Quoting shouldn't be too hard since you're generating it and not parsing it (as you point out).


ajmath avatar ajmath commented on July 4, 2024

+1


johan avatar johan commented on July 4, 2024

That sort of sounds like something else. If unclear, my wish is to get the csv output:

date,count,title
2011-01-12 13:14,17,"He's dead, Jim!"
2011-01-13 21:30,4711,"What do you mean, \"dead\"?"
2011-01-14 00:07,,Dead!

…apart from either of these two JSON/JSONish ones currently supported of the same data:

[ { "date": "2011-01-12 13:14", "count": 17, "title":"He's dead, Jim!" }
, { "date": "2011-01-13 21:30", "count": 4711, "title":"What do you mean, \"dead\"?" }
, { "date": "2011-01-14 00:07", "title":"Dead!" }
]

…or:

{ "date": "2011-01-12 13:14", "count": 17, "title":"He's dead, Jim!" }
{ "date": "2011-01-13 21:30", "count": 4711, "title":"What do you mean, \"dead\"?" }
{ "date": "2011-01-14 00:07", "title":"Dead!" }

Is that what you are shooting for, or something else that's useful for some other purpose?


johan avatar johan commented on July 4, 2024

While JSON attribute order can be arbitrary, I would be happy if (in situations where the jq command line does not stipulate an ordering) the csv column order is given by the order of attribute discovery from the input file, i.e. date, count and title in my example, as given by the input order of the first object (while ignoring the order of all subsequent input lines, since none of them introduce any additional attributes/columns).

For (typical?) cases where the jq command line itself names what fields the output should have, it would indeed be nice to figure out a DRY syntax that infers order by your already stated order on that command line, and either always print out the header names on the first line, or have a special csv-mode flag to specifically disable that output (make the rare case a flag).

My gut feeling is that asking for csv output is best done as a command-line mode flag rather than as a step in the pipeline, so that kind of housekeeping logic becomes the responsibility of jq rather than yourself, while also making it easier to switch output formats back to json again, should you want to. But maybe that is substantially harder to implement?


neq1337 avatar neq1337 commented on July 4, 2024

Sorry for digging up this topic, but does @csv work in the current version? I get a "compile error" when I try to use @csv, @html or the others. I am using the Windows jq version.


lluchs avatar lluchs commented on July 4, 2024

No, it's not included in the current version.


neq1337 avatar neq1337 commented on July 4, 2024

Is it possible to download the previous version that had this implemented from somewhere? I found that it is the only way to parse json...


nicowilliams avatar nicowilliams commented on July 4, 2024

IMO this could be left to a separate utility.


neq1337 avatar neq1337 commented on July 4, 2024

@svnpenn Thank you! CSV works; however, this build adds some extra characters to the parsed data. For example:

←[0m←[34;1m"id"←[0m←[37m: ←[0m←[0m518377←[0m←[37m,

The original data is id:518377.

Does anybody know how to resolve this issue?
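
Those ←[ sequences are ANSI color escape codes from jq's colored output; a sketch of avoiding them is to pass the monochrome flag (the filter and filename here are placeholders):

jq -M -r '.' data.json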


neq1337 avatar neq1337 commented on July 4, 2024

@svnpenn Thanks a lot! :)


factom avatar factom commented on July 4, 2024

👍 working well for me in master. love this feature.


jdanbrown avatar jdanbrown commented on July 4, 2024

@csv is great! Can we get a @tsv as well?


pablomendes avatar pablomendes commented on July 4, 2024

+1 for @TSV 👍

Update: if you arrived here trying to find a way to do tsv formatting, the command below should also work:
jq -r '"\(.date)\t\(.count)\t\(.title)"'
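
On jq 1.5 and later there is also a @tsv filter, which takes an array and additionally escapes any embedded tabs or newlines:

jq -r '[.date, .count, .title] | @tsv'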


stevezieglerva avatar stevezieglerva commented on July 4, 2024

Here's the syntax for tab-delimited output when using DOS (cmd.exe), where the inner quotes need backslash escapes:
jq -r ".aggregations[] | .buckets[] | \"\(.key)\t\(.doc_count)\""
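
Where @tsv is available, an equivalent sketch that sidesteps the interpolation quoting (same paths assumed):

jq -r ".aggregations[] | .buckets[] | [.key, .doc_count] | @tsv"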


drorata avatar drorata commented on July 4, 2024

I fail to use @csv and the suggestions of @stevezieglerva. Assume that I have:

[
  {
    "a": 1,
    "b": 2
  },
  {
    "a": 3,
    "b": 3
  },
  {
    "a": 2,
    "b": 1
  }
]

stored in foo.bar. How can I export it to CSV? I tried some combinations, but I didn't manage to make it work...


drorata avatar drorata commented on July 4, 2024

OK. Got it...

cat foo.bar | jq '.[] | [.a, .b] | @csv'
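
Without -r, each row comes out as a JSON string, quotes and all:

"1,2"
"3,3"
"2,1"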

Follow up: how can I get rid of the "'s and their escapes (in the case where the value is a string for instance)?


pkoppstein avatar pkoppstein commented on July 4, 2024

@drorata - You might find the -r command-line option useful:

$ jq -n '["a","b","a\"b"]|@csv'
"\"a\",\"b\",\"a\"\"b\""

$ jq -r -n '["a","b","a\"b"]|@csv'
"a","b","a""b"

Notice in particular the double-double-quotes in the last line.

However, using the "-r" option may also not be what you want, depending on which variant of CSV you expect. For example, the "-r" option will cause the newline in "a\nb" to become an actual newline.

@csv is useful but it was not designed to produce output that conforms strictly with RFC 4180 (which, for example, requires CR/LF). It looks as though the intent was to provide a simple "UTF-8-style CSV", so that if some specific style of CSV is required, an external tool (or tool chain) can be used.
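
A quick way to see the embedded-newline behaviour mentioned above:

$ jq -rn '["a\nb","c"] | @csv'
"a
b","c"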


 avatar commented on July 4, 2024

Perhaps @csv should output actual CSVs?


pkoppstein avatar pkoppstein commented on July 4, 2024

@slapresta asked:

Perhaps @csv should output actual CSVs?

@csv must produce valid JSON, since it's a jq filter, not a "post-filter". I believe that jq in combination with -r does in fact produce "valid CSV" (see below), it being understood that there are many variations in common use.

Regarding the particular point about embedded newlines, jq is in conformance with the "800 pound gorilla CSV" standard:

http://www.creativyst.com/Doc/Articles/CSV/CSV01.htm#EmbedBRs

In fact, this document also sanctions the use of EITHER LF or CRLF as a record separator:

Each record is one line   ...but
A record separator may consist of a line feed (ASCII/LF=0x0A), or a carriage return and line feed pair (ASCII/CRLF=0x0D 0x0A).

Perhaps what's needed is a reference to some "CSV standard" to which @csv conforms? Or @csv4180? In addition to @TSV, of course :-)


JohnParkerXNU avatar JohnParkerXNU commented on July 4, 2024

Instead of polluting jq, why not use an external tool to convert JSON to CSV (e.g. https://github.com/jehiah/json2csv) ?


 avatar commented on July 4, 2024

Agree with JohnParkerXNU


joelpurra avatar joelpurra commented on July 4, 2024

Update: see jq-hopkok's tabular folder.


For reference: RFC4180, Common Format and MIME Type for Comma-Separated Values (CSV) Files
http://www.ietf.org/rfc/rfc4180.txt

I use two scripts I pipe JSON into, and get CSV/TSV output. Object keys are used to generate a header.
https://github.com/joelpurra/jq-hopkok/blob/master/src/tabular/array-of-objects-to-csv.sh
https://github.com/joelpurra/jq-hopkok/blob/master/src/tabular/array-of-objects-to-tsv.sh

Because of implementation details in to_entries (see #561 to_entries always outputs keys sorted - can it be avoided?), I have to name keys so they're sorted in the output. I follow the format 01--My first column, 02--My second column, 03--A third column, and clean them with another script.
https://github.com/joelpurra/jq-hopkok/blob/master/src/tabular/clean-csv-sorted-header.sh
https://github.com/joelpurra/jq-hopkok/blob/master/src/tabular/clean-tsv-sorted-header.sh
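
A minimal sketch of the prefix-stripping idea (the linked clean-up scripts may do more than this): drop the NN-- prefixes from the header line only.

sed '1 s/[0-9][0-9]*--//g'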

I'll throw the scripts into the (future) pile of utilities that is jq-hopkok when I have time to clean up code.

[
    {
        "01--Employee name": "Homer Simpson",
        "02--Logins": 578
    },
    {
        "01--Employee name": "Carl Carlsson",
        "02--Logins": 75926
    }
]
<"input.json" array-of-objects-to-tsv.sh | clean-tsv-sorted-header.sh >"output.json"
Employee name   Logins
Homer Simpson   578
Carl Carlsson   75926

The same input piped through the CSV scripts produces this.

"Employee name","Logins"
"Homer Simpson",578
"Carl Carlsson",75926

I prefer TSV as it's much easier to split the raw data into columns (some tex/latex packages don't play nice with RFC4180 input). I've taken a shortcut in replacing \t in strings with \\t - in my own data I "assume" there are no tabs in the output.

Hope these scripts can help someone.



kmatt avatar kmatt commented on July 4, 2024

If I have a structure like this:

{
    "timestamp": 1319730758,
    "base": "USD",
    "rates": {
        "AED": 3.672626,
        "AFN": 48.3775,
        "ALL": 110.223333,
        "AMD": 409.604993
     } 
}

jq '.rates | @csv' responds with "jq: error: object cannot be csv-formatted, only array"

Is it necessary to "cast" the object to an array? If so, how?


wtlangford avatar wtlangford commented on July 4, 2024

There's no real casting here. All the data types are too discrete, overall, for casting (except via string interpolation, but then all you have is a string). The issue is that all @csv does is print an array out in csv form. How exactly did you want the output printed?


kmatt avatar kmatt commented on July 4, 2024

@wtlangford In this case, I am converting a JSON source to a flat delimited format for bulk loading into a relational database table (PostgreSQL). I can pipe the output of jq '.rates' into awk, filter for the pairs and replace delimiter characters, but keeping everything in jq would be cleaner.

Example:

jq '.rates' rates.json | awk -v OFS=',' '/:/ { gsub(/[\":,]/,"",$0); print "'\''" $1 "'\''" , $2 }'


kmatt avatar kmatt commented on July 4, 2024

@wtlangford Very close! Is there a way to omit the brackets, and print key, value on same line?

"AZN", 0.786006

instead of

[
  "AZN",
  0.786006
]


wtlangford avatar wtlangford commented on July 4, 2024

Don't forget the @csv on the end. You might also want to pass -r to jq when you invoke it to strip out the quoting escapes. Keep in mind that this may not quite be what you want, however. If any of your strings has a newline in it, that'll get printed out unescaped.


wtlangford avatar wtlangford commented on July 4, 2024

Just out of curiosity, why was @csv not sufficient at the end?

On Mon, Sep 22, 2014 at 6:09 PM, Matt Keranen [email protected]
wrote:

Thanks - this is close enough:

jq -c '.rates | to_entries | .[] | [.key,.value]' rates.json | tr -d "[]" | sed "s/\"/'/g"



kmatt avatar kmatt commented on July 4, 2024

It was; I omitted it in a cut-and-paste error:

jq -c -r '.rates | to_entries | .[] | [.key,.value] | @csv'


wtlangford avatar wtlangford commented on July 4, 2024

Excellent! Glad to have helped.


Fjelley85 avatar Fjelley85 commented on July 4, 2024

Hi, I also have a problem with "Cannot index array with string"

My code: curl 'https://api.github.com/repos/stedolan/jq/commits?per_page=5' | ./jq -r '.[] | .committer | {login: .login,id: .id} | @csv'

Help would be appreciated


pkoppstein avatar pkoppstein commented on July 4, 2024

The login and id values are associated with the "author" field:

jq -c -r '.[] | .author | {login,id} ' 

To output them using @csv:

jq -c -r '.[] | .author | [.login, .id] | @csv ' 


Fjelley85 avatar Fjelley85 commented on July 4, 2024

Thank you!


sankalp-khare avatar sankalp-khare commented on July 4, 2024

Thanks! the @csv option is very helpful 👍


joelpurra avatar joelpurra commented on July 4, 2024

I've finally moved my CSV/TSV utility scripts to jq-hopkok's tabular folder -- see also other helpful shell scripts for parallel processing etcetera.


AlJohri avatar AlJohri commented on July 4, 2024

It would be really great to have a way to turn lists of objects into csvs (with header) automatically.

Right now I'm resorting to json2csv:

curl 'http://.../nfl/participants/' | jq -c '.apiResults[].league.players[] | {teamId: .team.teamId, nickname: .team.nickname, playerId: .playerId, firstName: .firstName, lastName: .lastName}' | json2csv > players.csv

I would love to instead do | @csv directly to a list of objects.

curl 'http://.../nfl/participants/' | jq -r '.apiResults[].league.players[] | {teamId: .team.teamId, nickname: .team.nickname, playerId: .playerId, firstName: .firstName, lastName: .lastName} | @csv' > players.csv

Is there any chance something like this can be implemented?

EDIT: while https://github.com/joelpurra/jq-hopkok/blob/master/src/tabular/array-of-objects-to-csv.sh exists, I really need something that's either built into jq or at least as easily installable as npm install -g json2csv. Built into jq would be ideal here.

EDIT2: woah, did not see this before commenting: https://github.com/joelpurra/jqnpm
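
In the meantime, a workable sketch with current jq, using the header-plus-arrays pattern from earlier in the thread (column names copied from the filter above):

curl 'http://.../nfl/participants/' | jq -r '
  ["teamId","nickname","playerId","firstName","lastName"],
  (.apiResults[].league.players[] | [.team.teamId, .team.nickname, .playerId, .firstName, .lastName])
  | @csv' > players.csv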


pkoppstein avatar pkoppstein commented on July 4, 2024

Here's a def that takes a stream of JSON objects and produces output suitable for feeding directly to @csv or @tsv. It does NOT require the keys to be consistently ordered, and only pays attention to the keys in the first object. An illustrative example follows.

def object2array(stream):
  # Emit the first object's keys (in input order) as a header row,
  # then emit each object's values in that same key order.
  foreach stream as $x (null;
    if . == null then $x | [true, keys_unsorted] else .[0]=false end;   # state: [first-time?, keys]
    (if .[0] then .[1] else empty end),                                 # header row, first object only
    .[1] as $keys | $x | [getpath( $keys[] | [.]) ] );                  # value row for the current object

Example

def data: [{a:1,b:2}, {b:22,a:11,c:0}];

object2array(data[])

Output

["a","b"]
[1,2]
[11,22]
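
An end-to-end sketch, pasting the def inline and feeding its rows to @csv (input.json is a placeholder for a file holding an array of such objects):

jq -r 'def object2array(stream):
         foreach stream as $x (null;
           if . == null then $x | [true, keys_unsorted] else .[0]=false end;
           (if .[0] then .[1] else empty end),
           .[1] as $keys | $x | [getpath( $keys[] | [.]) ] );
       object2array(.[]) | @csv' input.json
"a","b"
1,2
11,22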

to_table

to_table (available here) takes a very different approach. It can handle JSON regardless of the depth of nesting, but is only useful if there is sufficient regularity, notably of the ordering of keys within corresponding objects.


AlJohri avatar AlJohri commented on July 4, 2024

How can I easily use object2array myself and on my team? Is there some way to define global functions and share them? I would like to just run | object2array and have it work on my laptop and my colleagues' laptops. What's the best distribution strategy?


pkoppstein avatar pkoppstein commented on July 4, 2024

Unfortunately, all the simple options that I know of have one drawback or another.

Probably the simplest approach would be for everyone to add the desired utility functions to a file called ~/.jq since, if it exists, it will automatically be read in on startup. The main drawback of this approach is that it prevents ~/.jq from being used as a directory, which jq's module system by default recognizes as the location of jq modules.
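
For example, with the object2array def above saved in a ~/.jq file, the call from the earlier comment reduces to (input.json again a placeholder):

jq -r 'object2array(.[]) | @csv' input.json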

Another option would be to take advantage of jq's module system. This is a tiny bit complicated, but is certainly worth considering. Here is one approach:

  1. Copy the desired def into ~/.jq/jq.jq
  2. Invoke jq as usual but preface the jq program with include "jq";
    e.g. jq 'include "jq"; ...' input.json

A third option which I've had some success with is to piggy-back off npm, but as best I can tell, it's not completely trivial.


richardjharris avatar richardjharris commented on July 4, 2024

@tsv seems to work for me on jq-1.6 (Manjaro Linux)
