Comments (8)
YES! SOLVED! Once you have decompressed the *.tar, decompress the generated file again; then you will see the separate JSON files.
from dataset-examples.
@tiechengsu I am having the same problem. Were you able to solve the issue?
@bngksgl No, I used the previous dataset instead, which you can find here:
https://app.dominodatalab.com/mtldata/yackathon/browse/yelp_dataset_challenge_academic_dataset
It's easier to import. The latest data combines several categories together, and I have no idea how to import it.
It's a .tar file; just decompress it again.
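For anyone who would rather script the double extraction, here is a minimal sketch using Python's standard `tarfile` module. The function name `extract_nested_tar` and any paths you pass in are my own placeholders, not part of the Yelp tooling, and it assumes the archive comes from a trusted source.

```python
import tarfile
from pathlib import Path


def extract_nested_tar(archive_path, out_dir):
    """Extract a tar archive, then extract any tar archives found inside it.

    Returns the sorted paths of the extracted files that are not archives.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    pending = [Path(archive_path)]
    extracted = []
    while pending:
        current = pending.pop()
        # The default mode 'r' detects gzip/bz2/xz compression automatically.
        with tarfile.open(current) as tar:
            tar.extractall(out_dir)  # only do this with archives you trust
            names = tar.getnames()
        for name in names:
            member_path = out_dir / name
            if not member_path.is_file():
                continue
            if tarfile.is_tarfile(str(member_path)):
                # The extracted file is itself an archive: unpack it too.
                pending.append(member_path)
            else:
                extracted.append(member_path)
    return sorted(extracted)
```

Calling it on the downloaded archive with an output folder of your choice should leave the individual .json files in that folder, if the download is nested the way this thread describes.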
"The latest data combines several categories together, no idea how to import it."
Does that mean reviews.json, business.json, etc. are stored mixed together in the file?
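One way to check this yourself is to count which sets of top-level keys show up in the first records of a file. This is only a sketch: it assumes one JSON object per line, and `summarize_record_keys` is a made-up helper name, not something from the repository.

```python
import json
from collections import Counter
from itertools import islice


def summarize_record_keys(json_lines_path, max_records=1000):
    """Count the distinct sets of top-level keys among the first records."""
    shapes = Counter()
    with open(json_lines_path, encoding='utf8') as fin:
        for line in islice(fin, max_records):
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            # A sorted tuple of keys identifies the "shape" of the record.
            shapes[tuple(sorted(record))] += 1
    return shapes
```

If the result has a single entry, the sampled records all share one layout; several very different key sets would suggest that record types are mixed in the file.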
I'm not really sure where you are all running into errors. I have edited the code to accept .json files explicitly and convert them to .csv. I set the file paths explicitly in the main block instead of using argparse as in the original code. Let me know if this helps.
Reference:
https://github.com/Yelp/dataset-examples/blob/master/json_to_csv_converter.py
"""Convert the Yelp Dataset Challenge dataset from json format to csv."""
import argparse
import collections
import csv
import json

def read_and_write_file(json_file_path, csv_file_path, column_names):
    """Read in the json dataset file and write it out to a csv file, given the column names."""
    # newline='' stops the csv module from writing blank rows on Windows.
    with open(csv_file_path, 'w', newline='', encoding='utf8') as fout:
        csv_file = csv.writer(fout)
        csv_file.writerow(list(column_names))
        with open(json_file_path, encoding='utf8') as fin:
            for line in fin:
                line_contents = json.loads(line)
                csv_file.writerow(get_row(line_contents, column_names))

def get_superset_of_column_names_from_file(json_file_path):
    """Read in the json dataset file and return the superset of column names."""
    column_names = set()
    with open(json_file_path, encoding='utf8') as fin:
        for line in fin:
            line_contents = json.loads(line)
            column_names.update(
                set(get_column_names(line_contents).keys())
            )
    return column_names

def get_column_names(line_contents, parent_key=''):
    """Return a dict mapping flattened key names to values, given a dict.
    Example:
    line_contents = {
        'a': {
            'b': 2,
            'c': 3,
        },
    }
    will return: {'a.b': 2, 'a.c': 3}
    The keys will be the column names for the eventual csv file.
    """
    column_names = []
    for k, v in line_contents.items():
        column_name = "{0}.{1}".format(parent_key, k) if parent_key else k
        # json.loads only produces plain dicts for nested objects, so a dict
        # check is enough (collections.MutableMapping is gone in Python 3.10+).
        if isinstance(v, dict):
            column_names.extend(
                get_column_names(v, column_name).items()
            )
        else:
            column_names.append((column_name, v))
    return dict(column_names)

def get_nested_value(d, key):
    """Return a dictionary item given a dictionary d
    and a flattened key from get_column_names.
    Example:
    d = {
        'a': {
            'b': 2,
            'c': 3,
        },
    }
    key = 'a.b'
    will return: 2
    """
    # A parent key can hold null or a non-dict value in some records.
    if not isinstance(d, dict):
        return None
    if '.' not in key:
        if key not in d:
            return None
        return d[key]
    base_key, sub_key = key.split('.', 1)
    if base_key not in d:
        return None
    sub_dict = d[base_key]
    return get_nested_value(sub_dict, sub_key)

def get_row(line_contents, column_names):
    """Return a csv compatible row given column names and a dict."""
    row = []
    for column_name in column_names:
        line_value = get_nested_value(
            line_contents,
            column_name,
        )
        if isinstance(line_value, str):
            # Python 3 strings are already text; calling .encode() here
            # would write "b'...'" into the csv.
            row.append(line_value)
        elif line_value is not None:
            row.append('{0}'.format(line_value))
        else:
            row.append('')
    return row

if __name__ == '__main__':
    # Convert each yelp dataset file from json to csv.
    # Raw strings keep the Windows backslashes literal.
    json_file = []
    json_file.append(r'D:\YELP Dataset\yelp_academic_dataset_business.json')  # args.json_file
    json_file.append(r'D:\YELP Dataset\yelp_academic_dataset_checkin.json')
    json_file.append(r'D:\YELP Dataset\yelp_academic_dataset_review.json')
    json_file.append(r'D:\YELP Dataset\yelp_academic_dataset_tip.json')
    json_file.append(r'D:\YELP Dataset\yelp_academic_dataset_user.json')
    csv_file = []
    for i in range(len(json_file)):
        # Swap the trailing '.json' for '.csv'.
        csv_file.append('{}.csv'.format(json_file[i][:-5]))
        column_names = get_superset_of_column_names_from_file(json_file[i])
        read_and_write_file(json_file[i], csv_file[i], column_names)
        print('{} converted to {} successfully.'.format(json_file[i], csv_file[i]))
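
To see what the flattening does without downloading the full dataset, the same idea can be tried on a couple of in-memory records. This is a rough sketch, not the converter itself: the helper names `flatten` and `records_to_csv` and the sample records are invented for illustration, and it uses `csv.DictWriter` with sorted column names rather than the set-based approach above.

```python
import csv
import io


def flatten(record, parent_key=''):
    """Flatten nested dicts into a single dict with dotted keys."""
    flat = {}
    for key, value in record.items():
        name = '{0}.{1}'.format(parent_key, key) if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


def records_to_csv(records):
    """Return CSV text whose columns are the sorted superset of flattened keys."""
    rows = [flatten(record) for record in records]
    columns = sorted({name for row in rows for name in row})
    buffer = io.StringIO()
    # restval='' leaves a blank cell when a record lacks a column.
    writer = csv.DictWriter(buffer, fieldnames=columns, restval='',
                            lineterminator='\n')
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample records, loosely shaped like business entries.
sample = [
    {'business_id': 'b1', 'attributes': {'WiFi': 'free'}},
    {'business_id': 'b2', 'stars': 4.5},
]
print(records_to_csv(sample))
```

The nested `attributes` object becomes an `attributes.WiFi` column, and records that lack a column get an empty cell, which is the same behavior the converter aims for.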
@CAVIND46016 are you able to post your code in a formatted snippet? Using it in my compiler is producing indentation errors. Thank you!
@dotdose: Have a look at the code here; this should work better.
https://github.com/CAVIND46016/Yelp-Reviews-Dataset-Analysis/blob/master/json_to_csv_converter.py