with open('yelp_dataset_challenge_academic_dataset',encoding='utf-8') as f: jsonda

Can not import the dataset into python about dataset-examples HOT 8 OPEN

yelp commented on June 9, 2024

Can not import the dataset into python

from dataset-examples.

Comments (8)

HongxuChenUQ commented on June 9, 2024 1

YES! SOLVED! Once you have decompressed the *.tar archive, decompress the generated file again; you will then see the separate JSON files.
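
A minimal sketch of that double extraction using Python's standard `tarfile` module. It assumes, as reported above, that the file produced by the first pass is itself a tar archive. The function name `extract_nested` and the paths in the usage comment are hypothetical, and the archive should only be extracted if it comes from a trusted source.

```python
import tarfile
from pathlib import Path


def extract_nested(archive_path, out_dir):
    """Extract archive_path into out_dir, then extract any tar archives found inside.

    Returns the sorted list of .json files found under out_dir.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # First pass: unpack the outer archive ('r:*' auto-detects the compression).
    with tarfile.open(str(archive_path), 'r:*') as outer:
        outer.extractall(str(out_dir))

    # Second pass: unpack anything produced by the first pass that is itself a tar.
    for path in list(out_dir.rglob('*')):
        if path.is_file() and tarfile.is_tarfile(str(path)):
            with tarfile.open(str(path), 'r:*') as inner:
                inner.extractall(str(out_dir))

    return sorted(out_dir.rglob('*.json'))


# Hypothetical usage (both paths are placeholders):
# for json_path in extract_nested('yelp_dataset.tar', 'yelp_extracted'):
#     print(json_path)
```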

bngksgl commented on June 9, 2024

@tiechengsu i am having the same problem, were you able to solve the issue?

tiechengsu commented on June 9, 2024

@bngksgl No, I used the previous dataset instead, which you can find here
https://app.dominodatalab.com/mtldata/yackathon/browse/yelp_dataset_challenge_academic_dataset
It's easier to import. The latest data combines several categories together, and I have no idea how to import it.

Hank-JSJ commented on June 9, 2024

It's a .tar file, just decompress it again

HongxuChenUQ commented on June 9, 2024

"The latest data combines several categories together, and I have no idea how to import it."

Does that mean reviews.json, business.json, etc. are stored mixed together in the file?
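
One way to check this directly is to count the distinct sets of top-level keys among the first records of a file. The sketch below assumes the file holds one JSON object per line (which is how the converter code in this thread reads it). The function name `count_record_shapes` and the path in the usage comment are hypothetical. If the file is still an unextracted archive, opening or parsing it will raise an error instead.

```python
import json
from collections import Counter


def count_record_shapes(path, limit=1000):
    """Count distinct sets of top-level keys among the first `limit` lines."""
    shapes = Counter()
    with open(path, encoding='utf-8') as f:
        for i, line in enumerate(f):
            if i >= limit:
                break
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            # A sorted tuple of key names identifies the "shape" of a record.
            shapes[tuple(sorted(record))] += 1
    return shapes


# Hypothetical usage (the path is a placeholder):
# for keys, count in count_record_shapes('some_dataset_file.json').items():
#     print(count, keys)
```

More than one shape in the result suggests that several record types share the file; a single shape suggests the file holds one category only.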

CAVIND46016 commented on June 9, 2024

Not really sure where you are all facing errors. I have edited the code to accept .json files explicitly and convert them to .csv. I have hard-coded the file paths in the main block instead of using argparse as in the original code. Let me know if this helps.

Reference:

https://github.com/Yelp/dataset-examples/blob/master/json_to_csv_converter.py

"""Convert the Yelp Dataset Challenge dataset from json format to csv."""
import collections.abc
import csv
import json


def read_and_write_file(json_file_path, csv_file_path, column_names):
    """Read in the json dataset file and write it out to a csv file, given the column names."""
    # newline='' stops the csv module from writing blank rows on Windows.
    with open(csv_file_path, 'w', encoding='utf8', newline='') as fout:
        csv_file = csv.writer(fout)
        csv_file.writerow(list(column_names))
        with open(json_file_path, encoding='utf8') as fin:
            for line in fin:
                line_contents = json.loads(line)
                csv_file.writerow(get_row(line_contents, column_names))


def get_superset_of_column_names_from_file(json_file_path):
    """Read in the json dataset file and return the superset of column names."""
    column_names = set()
    with open(json_file_path, encoding='utf8') as fin:
        for line in fin:
            line_contents = json.loads(line)
            column_names.update(
                set(get_column_names(line_contents).keys())
            )
    return column_names


def get_column_names(line_contents, parent_key=''):
    """Return a dict mapping flattened key names to values, given a dict.

    Example:
        line_contents = {
            'a': {
                'b': 2,
                'c': 3,
            },
        }
        will return: {'a.b': 2, 'a.c': 3}

    The keys will be the column names for the eventual csv file.
    """
    column_names = []
    for k, v in line_contents.items():
        column_name = "{0}.{1}".format(parent_key, k) if parent_key else k
        # collections.MutableMapping was removed in Python 3.10; use collections.abc.
        if isinstance(v, collections.abc.MutableMapping):
            column_names.extend(
                get_column_names(v, column_name).items()
            )
        else:
            column_names.append((column_name, v))
    return dict(column_names)


def get_nested_value(d, key):
    """Return a dictionary item given a dictionary d and a flattened key from get_column_names.

    Example:
        d = {
            'a': {
                'b': 2,
                'c': 3,
            },
        }
        key = 'a.b'
        will return: 2
    """
    # Some records may hold a non-dict value (e.g. null) where others nest a dict.
    if not isinstance(d, collections.abc.Mapping):
        return None
    if '.' not in key:
        if key not in d:
            return None
        return d[key]
    base_key, sub_key = key.split('.', 1)
    if base_key not in d:
        return None
    sub_dict = d[base_key]
    return get_nested_value(sub_dict, sub_key)


def get_row(line_contents, column_names):
    """Return a csv compatible row given column names and a dict."""
    row = []
    for column_name in column_names:
        line_value = get_nested_value(
            line_contents,
            column_name,
        )
        if isinstance(line_value, str):
            # In Python 3, str is already text; encoding it would write "b'...'" into the csv.
            row.append(line_value)
        elif line_value is not None:
            row.append('{0}'.format(line_value))
        else:
            row.append('')
    return row

if __name__ == '__main__':
    # Convert each yelp dataset file from json to csv.
    # Raw strings keep the Windows backslashes from being read as escape sequences.
    json_files = [
        r'D:\YELP Dataset\yelp_academic_dataset_business.json',  # args.json_file
        r'D:\YELP Dataset\yelp_academic_dataset_checkin.json',
        r'D:\YELP Dataset\yelp_academic_dataset_review.json',
        r'D:\YELP Dataset\yelp_academic_dataset_tip.json',
        r'D:\YELP Dataset\yelp_academic_dataset_user.json',
    ]
    for json_file in json_files:
        csv_file = '{}.csv'.format(json_file[:-5])  # drop the '.json' suffix
        column_names = get_superset_of_column_names_from_file(json_file)
        read_and_write_file(json_file, csv_file, column_names)
        print('{} converted to {} successfully.'.format(json_file, csv_file))
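
To make the flattening step concrete, here is a small self-contained sketch of the same idea: nested dicts are turned into dotted column names, which is how a record with nested attributes ends up as a single csv row. The helper name `flatten` and the sample record are made up for illustration and are not taken from the dataset.

```python
from collections.abc import Mapping


def flatten(record, parent_key=''):
    """Flatten nested mappings into a single dict with dotted keys."""
    flat = {}
    for key, value in record.items():
        name = '{0}.{1}'.format(parent_key, key) if parent_key else key
        if isinstance(value, Mapping):
            # Recurse into nested dicts, carrying the dotted prefix along.
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


# Made-up sample record, mirroring the docstring example in the converter.
record = {'name': 'Cafe', 'a': {'b': 2, 'c': 3}}
flat = flatten(record)
print(flat)  # {'name': 'Cafe', 'a.b': 2, 'a.c': 3}
```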

tootrackminded commented on June 9, 2024

@CAVIND46016 are you able to post your code in a formatted snippet? Using it in my compiler is producing indentation errors. Thank you!

CAVIND46016 commented on June 9, 2024

@dotdose : Have a look at the code here, this should work better.
https://github.com/CAVIND46016/Yelp-Reviews-Dataset-Analysis/blob/master/json_to_csv_converter.py
