bxparks / bigquery-schema-generator
Generates the BigQuery schema from newline-delimited JSON or CSV data records.
License: Apache License 2.0
To install from PyPI, we use the following pip3 command:
$ sudo -H pip3 install bigquery-schema-generator
On Ubuntu (verified on 17.10, 16.04), the 'generate-schema' shell script is installed at: /usr/local/bin/generate-schema
On MacOS (verified on 10.13.2, using Python 3.6.4), the 'generate-schema' script is installed at:
/Library/Frameworks/Python.framework/Versions/3.6/bin/generate-schema
This is not an obvious location for the user.
We need to create a symlink from /usr/local/bin/generate-schema -> (the above location) on MacOS.
Schema nested too deeply for field protoPayload.request.spec.validation.openAPIV3Schema.properties.spec.properties.match.properties.kinds.items.properties.apiGroups.items, maximum allowed depth is 15.
"logName": "projects/xxxxxxxxxxxxx/logs/cloudaudit.googleapis.com%2Fdata_access",
"type": "k8s_cluster"
We need a way to handle this.
Currently, the optional timezone indicator on a TIMESTAMP field is expected to contain a colon (:) character. For example:
2017-05-22 12:33:01-07:30
However, ISO8601 allows a timezone format without the colon character, like this:
2017-05-22 12:33:01-0730
I have not needed this feature yet, but this should be easy to add if someone needs it.
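If someone does add it, the matcher could simply make the colon optional. A minimal sketch (the regex and function names are illustrative, not the library's actual TIMESTAMP_MATCHER):

```python
import re

# Accept an ISO8601 timestamp whose offset may or may not contain a colon,
# e.g. both '-07:30' and '-0730' (illustrative pattern, not the library's).
TIMESTAMP_RE = re.compile(
    r'^\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}'
    r'(\.\d+)?'                  # optional fractional seconds
    r'(Z|[+-]\d{2}:?\d{2})?$'    # offset with or without the colon
)

def is_timestamp(value: str) -> bool:
    return TIMESTAMP_RE.match(value) is not None
```

The `:?` makes the colon inside the offset optional, so both forms from the example above match.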
This example JSON is valid but cannot be parsed into a schema:
from bigquery_schema_generator.generate_schema import SchemaGenerator
import json
test = {
    "a": "a",
    "b": "20220101",
    "c": "c",
    "values": [{
        "percentage": 3,
        "values": [{
            "a": "20220101",
            "b": 100,
        }]
    }]
}
generator = SchemaGenerator(input_format='json', infer_mode='NULLABLE')
schema_map, error_logs = generator.deduce_schema(input_data=json.dumps(test))
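The generator consumes newline-delimited JSON, so one likely fix is to serialize each record onto its own line rather than handing it a single `json.dumps()` string. A hedged sketch (`to_ndjson` is a hypothetical helper, not part of the library):

```python
import io
import json

def to_ndjson(records):
    """Serialize an iterable of dicts into a newline-delimited JSON stream
    suitable for feeding to a line-oriented reader."""
    return io.StringIO(''.join(json.dumps(r) + '\n' for r in records))

# The resulting file-like object can then be passed to deduce_schema
# instead of a raw JSON string (assumption: your version accepts a stream).
stream = to_ndjson([{"a": "a", "values": [{"percentage": 3}]}])
```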
I'm having an issue where my CSV column names are not valid BigQuery names. Renaming and handling this is outside my control, and the schema is updated frequently. I ran into this library, which makes things much easier, but noticed I had to process everything twice to clean up the invalid names.
BigQuery does an automatic substitution in accordance with what is described in issue #14.
So I added a pull request that allows one to run the tool with a sanitize-names mode if wanted.
This library saved me a bunch of time. Despite using BQ for a long time I hadn't heard of it until someone referred me on SO. Thanks a lot!
File "/lib/python3.11/site-packages/bigquery_schema_generator/generate_schema.py", line 190, in deduce_schema
    for json_object in reader:
File "/lib/python3.11/csv.py", line 111, in __next__
    row = next(self.reader)
          ^^^^^^^^^^^^^^^^^
_csv.Error: field larger than field limit (131072)
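A possible workaround on the caller's side is to raise the standard csv module's per-field limit before parsing. This uses `csv.field_size_limit()`, which is a real stdlib call, not a bigquery-schema-generator option:

```python
import csv
import sys

# The csv module caps a single field at 131072 characters by default.
# Raising the limit lets oversized fields through; the min() guards
# against platforms where sys.maxsize exceeds the C long range.
csv.field_size_limit(min(sys.maxsize, 2**31 - 1))
```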
version = '1.5.1'
Hey -- I love this tool. However, the documentation recommends using sudo to install pip packages, which is discouraged. See this stackoverflow thread: https://stackoverflow.com/questions/21055859/what-are-the-risks-of-running-sudo-pip
I would recommend altering the instructions to suggest pip3 install --user, which allows you to get the command-line script outside a virtual environment without requiring you to install as root.
Thanks!
Hello, I have a string column in BQ where I store timestamps.
Is there a way to prevent deduce_schema from converting my field containing string timestamps to TIMESTAMP when I have quoted_values_are_strings=True?
Or maybe another solution: if I'm passing in the original schema, add an option to prevent changing the types of the existing columns, e.g. a flag dont_modify_original_columns that, whenever it's true, leaves the columns of the existing schema untouched (only adding new ones).
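As a stopgap outside the library, one could post-process the deduced schema to pin the types of pre-existing columns. An illustrative sketch (`keep_original_types` is hypothetical, not a library API; it assumes the flattened list-of-dicts schema format):

```python
def keep_original_types(original, deduced):
    """Force fields that already exist in the original schema to keep
    their original types; genuinely new columns pass through unchanged."""
    by_name = {f['name']: f for f in original}
    return [
        dict(f, type=by_name[f['name']]['type']) if f['name'] in by_name else f
        for f in deduced
    ]
```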
The following call:
generator.deduce_schema([ {'1':None}, {'1':['a','b']}, {'1':None}, {'1':['c','d','e']} ])
Produces OrderedDict([('1', None)])
Other calls of a similar nature produce inconsistent results:
generator.deduce_schema([ {'1':None}, {'1':['a','b']}, {'1':['c','d','e']} ])
Produces OrderedDict([('1', OrderedDict([('status', 'hard'), ('filled', True), ('info', OrderedDict([('mode', 'REPEATED'), ('name', '1'), ('type', 'STRING')]))]))])
And
generator.deduce_schema([ {'1':None}, {'1':['a','b']}, {'1':None} ])
Produces OrderedDict([('1', OrderedDict([('status', 'soft'), ('filled', False), ('info', OrderedDict([('mode', 'NULLABLE'), ('name', '1'), ('type', 'STRING')]))]))])
The specific issue I have involves a column 90% composed of nulls and 10% string arrays. It results in the third of the cases above, when I'd have hoped it would result in something with a mode of 'REPEATED'.
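The merge one might hope for can be stated as a small rule: a null observation should never demote a field already seen as REPEATED. An illustrative sketch (`merge_modes` is hypothetical, not the library's actual algorithm):

```python
def merge_modes(observed_modes):
    """Collapse per-row mode observations into one field mode.
    Any REPEATED observation wins; nulls alone yield NULLABLE."""
    if 'REPEATED' in observed_modes:
        return 'REPEATED'
    return 'NULLABLE'
```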
I set infer_mode=False, and after scanning the json file, the resulting schema still had 'REQUIRED' fields instead of the default 'NULLABLE'.
Is that the expected behavior? Or am I misunderstanding what the argument does?
Hi,
I think it would be useful (at least for me :)) to add a function to the library that generates a schema from some data, something like
from bigquery_schema_generator import generate_schema
data = [{"first_column": 1, "second_column": "value"}, {"first_column": 2, "second_column": "another value"}]
schema = generate_schema(data) # returns a list
I came up with this function:
import json
from subprocess import check_output

def generate_schema(data):
    data_string = ""
    for d in data:
        if d:
            data_string = data_string + json.dumps(d) + '\n'
    data_bytes = data_string.encode('utf-8')
    s = check_output(['generate-schema'], input=data_bytes)
    schema = json.loads(s)
    return schema
but I'm sure there's a more efficient way.
In the readme I get a 404 for the python code examples link: https://github.com/bxparks/bigquery-schema-generator/blob/develop/examples/test_generate_schema.py
Thanks for creating, it looks great and likely to save many hours :)
Hi
I have been trying to export asset metadata to GCS. The idea is to export the asset metadata into BigQuery and then visualize it in Data Studio.
However, whenever I use the Cloud Asset API (either using curl or the 'gcloud asset export' command), the generated raw JSON data file contains two duplicate fields, 'IPProtocol' and 'ipProtocol'.
Due to this, when I try to load this data into BigQuery (by either the bq mk or bq load command), it gives me the following error.
$ bq mk inventory_dataset.2019_09_20_11_00_00 schema.json
BigQuery error in mk operation: Field resource.data.allowed.ipProtocol already exists in schema
Is this a bug, or am I doing something wrong?
I am using the bigquery-schema-generator tool for generating the schema (https://pypi.org/project/bigquery-schema-generator/).
Please help.
I have a newline-delimited JSON file with a few bad (i.e. undecodable) lines. Currently this results in a JSONDecodeError halting execution.
Given that BigQuery can cope with bad records (the --max_bad_records parameter) by skipping them, would it be useful to have a similar option in the schema generator? (This could be useful for e.g. CSV files with missing trailing columns as well.)
Concretely, the issue with my JSON file could be resolved by adding an (optional) try/except to
bigquery-schema-generator/bigquery_schema_generator/generate_schema.py
Lines 546 to 552 in a60c38a
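The proposed try/except could look roughly like this (`read_json_lines` and its `max_bad_records` parameter are hypothetical names sketching the idea, not the library's actual code):

```python
import json

def read_json_lines(lines, max_bad_records=0):
    """Yield decoded JSON objects, skipping up to max_bad_records
    undecodable lines before re-raising the decode error."""
    bad = 0
    for line in lines:
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            bad += 1
            if bad > max_bad_records:
                raise
```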
Using bq extract to export table data from BigQuery exports default UTC timestamps in the format "YYYY-MM-DD HH:MM:SS UTC". This is the same format as displayed in the BigQuery Web UI when previewing data.
When this data is passed through the schema generator, the regex on the TIMESTAMP_MATCHER fails and the data is interpreted as a STRING in the JSON schema.
Attempting to use bq update using the JSON schema on the same table the data was exported from then fails due to the change in data type from TIMESTAMP to STRING.
Should be quite simple to fix - need to add optional " UTC" check in regex as an alternative to "Z".
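A sketch of such a regex, with " UTC" as an alternative to "Z" (illustrative only, not the library's actual TIMESTAMP_MATCHER):

```python
import re

# Timestamp pattern accepting a trailing ' UTC' suffix, a 'Z', or a
# numeric offset (illustrative, simplified from the real matcher).
TIMESTAMP_RE = re.compile(
    r'^\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(\.\d+)?'
    r'(Z| UTC|[+-]\d{2}:\d{2})?$'
)
```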
Hello,
First of all, thank you very much for creating this. Looks like it's saved me heaps of time already.
I have, however, received an error, which seems to be related to the format of the json file. Specifically relating to the file having an array as one of the nested elements.
Example error message:
INFO:root:Problem on line 4: Unsupported array element type: __array__
This repeats for almost all rows of the file.
Row 4 of the file looks like this:
{"op":"mcm","clk":"1304450546","pt":1585613976590,"mc":[{"id":"1.170258437","rc":[{"batl":[[0,2.66,2.53],[1,1000,2.2]],"ltp":0.0,"tv":0.0,"id":110503}]}]}
Questions:
In a complex structure like the following:
{
"source_machine": {
"port": 80
},
"dest_machine": {
"port": "http-port"
}
}
If there was an error with another log where dest_machine.port was an integer, this would error and simply state something like:
Ignoring field with mismatched type: old=(hard,port,NULLABLE,STRING); new=(hard,port,NULLABLE,INTEGER)
At this point you are left to figure out which structure this port column actually exists in. This is a simple example, but as the schema grows and becomes more complex, the problem is harder to resolve manually.
Ideally, we can track the path to this using a JSON path or dpath expression, something like dest_machine.port. This will likely require adding an additional argument to the recursive function merge_schema_entry, something like base_path=None, and continually building up that base_path string in each recursive call so that it can be used in the errors like "{}.{}".format(base_path, new_name) and "{}.{}".format(base_path, old_name).
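The path-tracking idea can be illustrated on a simplified schema map (`find_mismatches` and its `{name: {'type': ..., 'fields': {...}}}` shape are hypothetical stand-ins, not merge_schema_entry itself):

```python
def find_mismatches(old, new, base_path=''):
    """Compare two simple schema maps and report dotted paths
    where field types differ, threading base_path through recursion."""
    errors = []
    for name, new_entry in new.items():
        path = f'{base_path}.{name}' if base_path else name
        old_entry = old.get(name)
        if old_entry is None:
            continue  # new field, nothing to compare against
        if old_entry['type'] != new_entry['type']:
            errors.append(
                f"mismatched type at {path}: "
                f"old={old_entry['type']}; new={new_entry['type']}"
            )
        elif old_entry['type'] == 'RECORD':
            errors.extend(find_mismatches(
                old_entry.get('fields', {}), new_entry.get('fields', {}), path))
    return errors
```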
Attempting to install from PyPI produces the following errors:
$ pip3 install bigquery-schema-generator
Collecting bigquery-schema-generator
Downloading bigquery-schema-generator-0.1.2.tar.gz
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/private/var/folders/_q/d57c1qhn5fb9ng6ycg2_3sxc0000gp/T/pip-build-i66hkje9/bigquery-schema-generator/setup.py", line 5, in <module>
import pypandoc
ModuleNotFoundError: No module named 'pypandoc'
----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /private/var/folders/_q/d57c1qhn5fb9ng6ycg2_3sxc0000gp/T/pip-build-i66hkje9/bigquery-schema-generator/
After installing pypandoc and trying again, I encountered this error:
pip3 install bigquery-schema-generator
The directory '/Users/call/Library/Caches/pip/http' or its parent directory is not owned by the current user and the cache has been disabled. Please check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
The directory '/Users/call/Library/Caches/pip' or its parent directory is not owned by the current user and caching wheels has been disabled. check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
Collecting bigquery-schema-generator
Downloading bigquery-schema-generator-0.1.2.tar.gz
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/private/tmp/pip-build-mr9vuupc/bigquery-schema-generator/setup.py", line 6, in <module>
long_description = pypandoc.convert('README.md', 'rst')
File "/usr/local/lib/python3.6/site-packages/pypandoc/__init__.py", line 66, in convert
raise RuntimeError("Format missing, but need one (identified source as text as no "
RuntimeError: Format missing, but need one (identified source as text as no file with that name was found).
----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /private/tmp/pip-build-mr9vuupc/bigquery-schema-generator/
I was able to work around this by cloning this repo, cd'ing into the local repo, and installing via pip3 install ., but it'd be great if this installed properly from PyPI.
Also, this is a much-needed utility. I'd previously been semi-solving this problem using wolverdude/genSON to infer a JSON schema, then converting that to BigQuery schema with some custom code, but this looks much more idiomatic. Looking forward to taking it for a spin. Thanks, and keep up the good work.
Hi folks.
First - your tool works great thanks for it.
Unfortunately, the data I work with is a mess. What I am fighting with now is this JSON:
{"objects":{"0":{"mime_type":"application/octet-stream","type":"artifact","hashes":{"MD5":"6..1","SHA-1":"4..0","SHA-256":"4..f"},"url":"https://URL/artifacts/4..f","x_cta_hash_identity":"6..a","x_cta_hash_context":"2..2","spec_version":"2.0"}}}
As you see, there is a map whose key is the number 0
... but BigQuery doesn't allow field names to start with a digit:
BigQuery error in load operation: Invalid field name "0". Fields must contain only letters, numbers, and underscores, start with a letter or underscore, and be at most 300 characters long.
So I propose changing "0" to "_0" in this case when --sanitize_names is applied.
Thanks
#15 introduced automatic conversion of quoted numeric values to INTEGER type.
For my use case I would not want that behaviour. Some identifiers use numbers but should be treated as strings, e.g. they may not consist solely of digits in other files. I am generating the JSON myself and would generate it with the corresponding type (i.e. if I wanted something represented as an INTEGER, then I wouldn't quote it).
To replicate:
test.json:
{"name": "111222333444555666777"}
{"name": "111222333444555666777"}
Expected:
% python3 -m bigquery_schema_generator.generate_schema --keep_nulls < ../data/test.json
INFO:root:Processed 2 lines
[
{
"mode": "NULLABLE",
"name": "name",
"type": "STRING"
}
]
Actual:
% python3 -m bigquery_schema_generator.generate_schema --keep_nulls < ../data/test.json
INFO:root:Processed 2 lines
[
{
"mode": "NULLABLE",
"name": "name",
"type": "INTEGER"
}
]
From a comment by @bxparks in #57 regarding the sections within the DataReader class.
Hmm, it's getting harder to keep track of which tags are allowed in which sections. Originally, the order of the tags were just: DATA, [ERRORS], SCHEMA, END. But now it's DATA, [EXISTING_SCHEMA], [ERRORS], SCHEMA, END. A better way would be to allow these sections to appears in any order. But that's a bit out of scope for this PR. If I get motivated, maybe I'll take a crack at it after merging in this PR... but realistically, it will probably not rise high enough on my priority list with so many other things going on. Too bad. At least this tidbit is recorded here.
Support sending an existing schema to deduce_schema so we can merge an existing BigQuery schema with new rows in a file.
Something like:
def deduce_schema(self, file, schema_map=None):
    if schema_map is None:
        schema_map = OrderedDict()
with data in a file:
{ "model": {"data": {"Inventory": {"Observations": [] }}}}
{ "model": {"data": {"Inventory": {"Observations": ["foo"] }}}}
If I manually upload the sample file to BigQuery, I get this schema:
{"name":"model","type":"RECORD","mode":"REPEATED","fields":[
  {"name":"data","type":"RECORD","mode":"REPEATED","fields":[
    {"name":"Inventory","type":"RECORD","mode":"REPEATED","fields":[
      {"name":"Observations","type":"STRING","mode":"NULLABLE"}
    ]}
  ]}
]}
from bigquery_schema_generator.generate_schema import SchemaGenerator

generator = SchemaGenerator(
    infer_mode=True,
    input_format="json",
    quoted_values_are_strings=True,
    preserve_input_sort_order=True,
    keep_nulls=True,
    debugging_map=True,
    sanitize_names=True,
)
with open(file) as f:
    schema_map, errors = generator.deduce_schema(f)
    if errors:
        for error in errors:
            print("Problem on line %s: %s" % (error['line_number'], error['msg']))
    specs = generator.flatten_schema(schema_map)
    return [
        bigquery.SchemaField(
            name=spec["name"], field_type=spec["type"], mode=spec["mode"]
        )
        for spec in specs
    ]
But I get this error from the library:
Ignoring non-RECORD field with mismatched mode:
old=(hard,model.data.Inventory.Observations,REPEATED,STRING);
new=(soft,model.data.Inventory.Observations,NULLABLE,STRING)
My questions:
The docs suggest using generator.flatten_schema(schema_map), but is there an alternative method to get a list of SchemaField objects in the original nested structure, i.e. SchemaFields without the flattening?
Are my batch sizes too big? What's the guidance?
I'm scanning 1000 records with 42 tags in the generated schema but that's after it eliminates all the nesting.
Even on 2MB files with ~280 records I get weird errors.
I get intermittent errors: Problem on line 278: Unsupported array element type: __null__
There are 14 nulls in line 278 and 13 in line 279, but none of them are in an array.
Hi, when trying to recreate the example (using Ubuntu and venv), I have the following problem:
user@DESKTOP:/mnt/c/X/venv_dir$ cat > file.data.json
{ "a": [1, 2] }
{ "i": 3 }
Ctrl-D
user@DESKTOP:/mnt/c/X/venv_dir$ generate-schema < file.data.json > file.schema.json
Traceback (most recent call last):
  File "/home/user/.local/bin/generate-schema", line 7, in <module>
    from bigquery_schema_generator.generate_schema import main
  File "/home/user/.local/lib/python3.5/site-packages/bigquery_schema_generator/generate_schema.py", line 303
    f'Ignoring non-RECORD field with mismatched mode: '
    ^
SyntaxError: invalid syntax
What might be wrong with this?
Hi, I would expect that module would pass the following test:
DATA
{ "r" : [{ "i": 4 },{ "i": "4px" }] }
SCHEMA
[
{
"fields": [
{
"mode": "NULLABLE",
"name": "i",
"type": "STRING"
}
],
"mode": "REPEATED",
"name": "r",
"type": "RECORD"
}
]
END
Unfortunately, the type of the "i" field returned is INTEGER. I have trouble deciding whether this is a bug: it seems technically doable and useful, but it also seems to be the case mentioned in the README ("but bq load does not support it, so we follow its behavior").
Is this a bug to be fixed or not?
When we remove nulls, we only remove inner nulls, and we miss the case where a record has all of its fields removed during this process. This produces the following error when attempting to load into BigQuery with the generated schema:
Field outer_nested_record is type RECORD but has no schema.
test_data.json
{"test": "thing", "empty_record": {}, "outer_nested_record": {"inner_empty_record": {}}}
generate-schema --input_format json --quoted_values_are_strings < test_data.json
[
{
"fields": [],
"mode": "NULLABLE",
"name": "outer_nested_record",
"type": "RECORD"
},
{
"mode": "NULLABLE",
"name": "test",
"type": "STRING"
}
]
[
{
"mode": "NULLABLE",
"name": "test",
"type": "STRING"
}
]
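A hypothetical cleanup pass over the generated schema list could drop such empty records recursively (`prune_empty_records` is not a library function, just a sketch of the proposed fix):

```python
def prune_empty_records(schema):
    """Recursively drop RECORD entries whose 'fields' list ends up empty,
    so BigQuery never sees 'type RECORD but has no schema'."""
    pruned = []
    for entry in schema:
        if entry.get('type') == 'RECORD':
            fields = prune_empty_records(entry.get('fields', []))
            if not fields:
                continue  # the whole record collapsed to nothing
            entry = dict(entry, fields=fields)
        pruned.append(entry)
    return pruned
```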
Names in RECORD fields are not sanitized. I reproduced the issue consistently by introducing the following test data:
# Sanitize the names to comply with BigQuery, recursively.
DATA sanitize_names
{ "r" : { "a-name": [1, 2] } }
SCHEMA
[
{
"fields": [
{
"mode": "REPEATED",
"name": "a_name",
"type": "INTEGER"
}
],
"mode": "NULLABLE",
"name": "r",
"type": "RECORD"
}
]
END
Which results in the following failure:
======================================================================
FAIL: test (__main__.TestFromDataFile)
----------------------------------------------------------------------
Traceback (most recent call last):
File "./tests/test_generate_schema.py", line 423, in test
self.verify_data_chunk(chunk_count, chunk)
File "./tests/test_generate_schema.py", line 450, in verify_data_chunk
self.assertEqual(expected, schema)
AssertionError: Lists differ: [Orde[62 chars]', 'a_name'), ('type', 'INTEGER')])]), ('mode'[46 chars]')])] != [Orde[62 chars]', 'a-name'), ('type', 'INTEGER')])]), ('mode'[46 chars]')])]
First differing element 0:
Order[61 chars]', 'a_name'), ('type', 'INTEGER')])]), ('mode'[45 chars]D')])
Order[61 chars]', 'a-name'), ('type', 'INTEGER')])]), ('mode'[45 chars]D')])
[OrderedDict([('fields',
[OrderedDict([('mode', 'REPEATED'),
- ('name', 'a_name'),
? ^
+ ('name', 'a-name'),
? ^
('type', 'INTEGER')])]),
('mode', 'NULLABLE'),
('name', 'r'),
('type', 'RECORD')])]
----------------------------------------------------------------------
Ran 14 tests in 0.006s
Whether starting with the existing schema or not, if the script encounters a change, it logs the changed line, giving errors like:
INFO:root:Problem on line 47730: Ignoring field with mismatched type: old=(hard,dimensionValue,REPEATED,RECORD); new=(hard,dimensionValue,REPEATED,STRING) INFO:root:Problem on line 47732: Ignoring field with mismatched type: old=(hard,dimensionValue,REPEATED,STRING); new=(hard,dimensionValue,REPEATED,RECORD)
For example, our file includes about 100,000 rows, but only 100 rows do not match the existing schema. However, if those non-matching lines come consecutively, the script detects the first one as problematic, and a matching line that comes after the run of consecutive non-matching lines is also marked as problematic, even though it actually matches the existing schema.
It would be useful to add a feature that checks files against a schema file, excludes rows that do not match the schema, and writes them to another JSON/CSV file.
Following the readme tutorial, I reproduced the steps:
schema_map = generator.deduce_schema(
input_data=table_data
)
schema = generator.flatten_schema(schema_map)
the input_data is a dictionary, which I specified in the generator configurations.
The exception raised is: Exception: Unexpected type '<class 'tuple'>' for schema_map
I haven't seen an option for it, and a schema is not generated for a pipe-delimited CSV. I was wondering if this is something that could be added.
I might be able to do the code edit myself and push it, but this would be the first project I would be contributing to, so I would want to take the time to look at all the code first.
I wrote a recursive function to walk a flattened schema_map and convert everything to bigquery.SchemaField. This could probably be done somewhere higher up instead of post-generation, and would be more performant. It works well for me and could be helpful for others.
def walk_schema(s):
    result = []
    for field in s:
        if field.get('fields', None):
            field['fields'] = walk_schema(field['fields'])
        if field.get('type', None):
            field['field_type'] = field.pop('type')
        field = bigquery.SchemaField(**field)
        result.append(field)
    return result
When starting with an existing bigquery schema that has a TIMESTAMP field in it we get an error when trying to load logs which contain an epoch time as this is detected as an INTEGER and we get the following error:
Error: [{'line_number': 1, 'msg': 'Ignoring field with mismatched type: old=(hard,event_time,NULLABLE,TIMESTAMP); new=(hard,event_time,NULLABLE,INTEGER)'}]
If this timestamp matched the correct number of digits for an epoch timestamp which is supported by bigquery we should be able to assume that this INTEGER is in fact a TIMESTAMP and allow it to be maintained as such.
Add a new if block to the convert_type function which will allow btype = TIMESTAMP and atype = INTEGER to return TIMESTAMP if the integer matches the correct number of digits for an epoch time.
There is added complexity because we do not pass the actual data for the record into this function. We may need to start doing this.
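The digit-count heuristic itself is simple to express (`looks_like_epoch` is a hypothetical helper, not part of the library's convert_type):

```python
def looks_like_epoch(value: int) -> bool:
    """Heuristic: 10 digits is plausible epoch seconds (post-2001),
    13 digits is plausible epoch milliseconds."""
    return len(str(abs(value))) in (10, 13)
```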
Right now if I wanted to generate the schema of a list of dictionaries I would need to first convert each dictionary into a JSON string just so that it could be loaded back into a dictionary and yielded in the json_reader. When using this as a library this would be a useful feature.
I am happy to create a PR for this if you would like but wanted to propose and make sure you are onboard with it before sending the PR @bxparks
Let me know if I should continue with a PR that includes some added tests for it.
When the schema is created, column names with spaces are written as they are.
Therefore, uploading to bq generates the following error:
<BigQuery error in load operation: Invalid field name "utm_medium-partners". Fields must contain only letters, numbers, and underscores, start with a letter or underscore, and be at most 128 characters long.>
Would it be possible to substitute blank spaces and other invalid characters with '_', as the '--autodetect' option does?
For example:
'Column.example 1' is written as 'Column_example_1'
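A sketch of such a sanitizer, mirroring the substitution described above (illustrative only; the library's --sanitize_names implementation may differ in details such as the length cap):

```python
import re

def sanitize(name: str) -> str:
    """Replace characters BigQuery rejects with '_', prefix a leading
    digit with '_', and truncate to a 300-character limit (assumed cap)."""
    name = re.sub(r'[^a-zA-Z0-9_]', '_', name)
    if name and name[0].isdigit():
        name = '_' + name
    return name[:300]
```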
I used bigquery-schema-generator on a relatively small CSV file (320 MB). After reading 30,000 lines, an AttributeError was thrown:
AttributeError: 'NoneType' object has no attribute 'lower'
INFO:root:Processing line 1000
INFO:root:Processing line 2000
...
INFO:root:Processing line 30000
INFO:root:Processed 30334 lines
Traceback (most recent call last):
  File "/usr/local/bin/generate-schema", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.7/dist-packages/bigquery_schema_generator/generate_schema.py", line 1074, in main
    generator.run(schema_map=existing_schema_map)
  File "/usr/local/lib/python3.7/dist-packages/bigquery_schema_generator/generate_schema.py", line 707, in run
    input_file, schema_map=schema_map
  File "/usr/local/lib/python3.7/dist-packages/bigquery_schema_generator/generate_schema.py", line 201, in deduce_schema
    schema_map=schema_map,
  File "/usr/local/lib/python3.7/dist-packages/bigquery_schema_generator/generate_schema.py", line 237, in deduce_schema_for_record
    canonical_key = self.sanitize_name(key).lower()
AttributeError: 'NoneType' object has no attribute 'lower'
The sanitize logic allows column names to start with a number.
generate-schema file2.data.json file.schema.json
usage: generate-schema [-h] [--input_format INPUT_FORMAT] [--keep_nulls]
[--quoted_values_are_strings] [--infer_mode]
[--debugging_interval DEBUGGING_INTERVAL]
[--debugging_map] [--sanitize_names]
generate-schema: error: unrecognized arguments: file2.data.json file.schema.json
I have tried both Linux and macOS and get the same error. file2.data.json has only one JSON object, but basically the command errors on the arguments.
Hi
Currently the code identifies a date/timestamp field only if it is in ISO format. Would it be possible to add a feature to identify dates/timestamps in other formats? I can work on this if required.