
Comments (8)

HongxuChenUQ commented on June 9, 2024

YES! SOLVED! Once you have decompressed the *.tar file, decompress the generated file again, and you will see the separate JSON files.
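A minimal sketch of this two-step extraction with Python's standard `tarfile` module, assuming the download is a plain or gzip-compressed tar archive and that the file it produces is itself another tar archive. The function name `extract_nested_tar` and the paths are hypothetical. On Python versions that support extraction filters, the sketch passes `filter='data'`, which rejects unsafe member paths.

```python
import os
import tarfile


def extract_nested_tar(archive_path, dest_dir):
    """Extract archive_path into dest_dir, then keep extracting any member
    that is itself a tar archive. Returns the paths of the non-archive files.

    Assumption (hypothetical helper): the inputs are ordinary tar files,
    optionally compressed, and are archives you trust.
    """
    os.makedirs(dest_dir, exist_ok=True)
    pending = [archive_path]
    extracted = []
    while pending:
        path = pending.pop()
        with tarfile.open(path) as tar:
            names = [m.name for m in tar.getmembers() if m.isfile()]
            if hasattr(tarfile, "data_filter"):
                # Newer Pythons: refuse absolute paths and '..' escapes.
                tar.extractall(dest_dir, filter="data")
            else:
                tar.extractall(dest_dir)
        for name in names:
            member_path = os.path.join(dest_dir, name)
            if tarfile.is_tarfile(member_path):
                # The "generated file" is another archive: extract it too.
                pending.append(member_path)
            else:
                extracted.append(member_path)
    return extracted
```

For example, `extract_nested_tar('yelp_dataset.tar', 'yelp_json')` (both names are placeholders) returns the paths of the extracted JSON files.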

from dataset-examples.

bngksgl commented on June 9, 2024

@tiechengsu I am having the same problem; were you able to solve the issue?


tiechengsu commented on June 9, 2024

@bngksgl No, I used the previous dataset instead, which you can find here:
https://app.dominodatalab.com/mtldata/yackathon/browse/yelp_dataset_challenge_academic_dataset
It's easier to import. The latest data combines several categories together; no idea how to import it.


Hank-JSJ commented on June 9, 2024

It's a .tar file; just decompress it again.


HongxuChenUQ commented on June 9, 2024

> The latest data combines several categories together; no idea how to import it.

Does that mean reviews.json, business.json, etc. are stored mixed together in the same file?
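One way to check this for a given file is to tally the distinct sets of top-level keys, assuming the file holds one JSON object per line (the layout the converter script in this thread reads). The helper `count_record_shapes` is hypothetical. A single key set suggests one record type per file; several can mean mixed record types, though they may also just reflect optional fields. If reading the first line fails to decode or parse, the file is probably still a tar archive that needs extracting again.

```python
import json
from collections import Counter


def count_record_shapes(json_lines_path):
    """Count how many records share each (sorted) set of top-level keys.

    Assumes one JSON object per line; blank lines are skipped.
    """
    shapes = Counter()
    with open(json_lines_path, encoding="utf-8") as fin:
        for line in fin:
            if line.strip():
                record = json.loads(line)
                # Iterating a dict yields its keys; sort them for a stable key.
                shapes[tuple(sorted(record))] += 1
    return shapes
```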


CAVIND46016 commented on June 9, 2024

Not really sure where you people are facing errors. I have edited the code to accept the .json files explicitly and convert them to .csv. I hard-coded the file paths in the main block instead of using argparse as in the original code. Let me know if this helps.

Reference:

https://github.com/Yelp/dataset-examples/blob/master/json_to_csv_converter.py

"""Convert the Yelp Dataset Challenge dataset from json format to csv.
import argparse
import collections
import csv
import json
def read_and_write_file(json_file_path, csv_file_path, column_names):
"""Read in the json dataset file and write it out to a csv file, given the column names."""
with open(csv_file_path, 'w') as fout:
csv_file = csv.writer(fout)
csv_file.writerow(list(column_names))
with open(json_file_path, encoding = 'utf8') as fin:
for line in fin:
line_contents = json.loads(line)
csv_file.writerow(get_row(line_contents, column_names))
def get_superset_of_column_names_from_file(json_file_path):
"""Read in the json dataset file and return the superset of column names."""
column_names = set()
with open(json_file_path, encoding = 'utf8') as fin:
for line in fin:
line_contents = json.loads(line)
column_names.update(
set(get_column_names(line_contents).keys())
)
return column_names
def get_column_names(line_contents, parent_key=''):
"""Return a list of flattened key names given a dict.
Example:
line_contents = {
'a': {
'b': 2,
'c': 3,
},
}
will return: ['a.b', 'a.c']
These will be the column names for the eventual csv file.
"""
column_names = []
for k, v in line_contents.items():
column_name = "{0}.{1}".format(parent_key, k) if parent_key else k
if isinstance(v, collections.MutableMapping):
column_names.extend(
get_column_names(v, column_name).items()
)
else:
column_names.append((column_name, v))
return dict(column_names)
def get_nested_value(d, key):
"""Return a dictionary item given a dictionary d and a flattened key from get_column_names.

Example:
    d = {
        'a': {
            'b': 2,
            'c': 3,
            },
    }
    key = 'a.b'
    will return: 2

"""
if '.' not in key:
    if key not in d:
        return None
    return d[key]
base_key, sub_key = key.split('.', 1)
if base_key not in d:
    return None
sub_dict = d[base_key]
return get_nested_value(sub_dict, sub_key)

def get_row(line_contents, column_names):
"""Return a csv compatible row given column names and a dict."""
row = []
for column_name in column_names:
line_value = get_nested_value(
line_contents,
column_name,
)
if isinstance(line_value, str):
row.append('{0}'.format(line_value.encode('utf-8')))
elif line_value is not None:
row.append('{0}'.format(line_value))
else:
row.append('')
return row
if(name == 'main'):
"""Convert a yelp dataset file from json to csv."""
json_file = []
json_file.append('D:\YELP Dataset\yelp_academic_dataset_business.json'); #args.json_file
json_file.append('D:\YELP Dataset\yelp_academic_dataset_checkin.json');
json_file.append('D:\YELP Dataset\yelp_academic_dataset_review.json');
json_file.append('D:\YELP Dataset\yelp_academic_dataset_tip.json');
json_file.append('D:\YELP Dataset\yelp_academic_dataset_user.json');
csv_file = []
for i in range(5):
csv_file.append('{}.csv'.format((json_file[i])[0:len(json_file[i])-5]))
column_names = get_superset_of_column_names_from_file(json_file[i])
read_and_write_file(json_file[i], csv_file[i], column_names)
print('{} converted to {} successfully.'.format(json_file[i], csv_file[i]))

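For comparison, a more compact sketch of the same two-pass idea (collect the union of flattened column names, then write the rows) using the standard library's `csv.DictWriter`, which fills missing columns with `restval` and writes `None` as an empty cell. The names `flatten` and `json_lines_to_csv` are hypothetical, and the sketch assumes one JSON object per line, as the script above does.

```python
import csv
import json


def flatten(record, parent_key=""):
    """Flatten nested dicts into dotted keys, e.g. {'a': {'b': 2}} -> {'a.b': 2}."""
    flat = {}
    for key, value in record.items():
        name = f"{parent_key}.{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


def json_lines_to_csv(json_path, csv_path):
    """Convert a JSON-lines file to CSV in two passes; return the column names."""
    # Pass 1: union of flattened column names across all records.
    columns = set()
    with open(json_path, encoding="utf-8") as fin:
        for line in fin:
            if line.strip():
                columns.update(flatten(json.loads(line)))
    fieldnames = sorted(columns)
    # Pass 2: write one row per record; absent fields become empty cells.
    with open(json_path, encoding="utf-8") as fin, \
            open(csv_path, "w", newline="", encoding="utf-8") as fout:
        writer = csv.DictWriter(fout, fieldnames=fieldnames, restval="")
        writer.writeheader()
        for line in fin:
            if line.strip():
                writer.writerow(flatten(json.loads(line)))
    return fieldnames
```

Reading the input twice instead of holding every record in memory keeps memory use low for large files, at the cost of a second pass over the disk.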

tootrackminded commented on June 9, 2024

@CAVIND46016 are you able to post your code in a formatted snippet? Running it produces indentation errors. Thank you!


CAVIND46016 commented on June 9, 2024

@dotdose: Have a look at the code here; it should work better.
https://github.com/CAVIND46016/Yelp-Reviews-Dataset-Analysis/blob/master/json_to_csv_converter.py

