This Python script is a better solution than my previous post. It was generated with a prompt to Microsoft’s version of ChatGPT, something like “Write a python program to translate a unicode JSON file of arbitrary size into comma-separated values.” I did a little manual fix-up on the generated script, and got:
#!/usr/bin/python3
import json
import csv

with open('answers.json') as f:
    data = json.load(f)

# newline='' keeps the csv module from inserting blank lines on Windows
csv_file = open('answers.csv', 'w', newline='')
csv_writer = csv.writer(csv_file)
count = 0
for item in data:
    if count == 0:
        header = item.keys()
        csv_writer.writerow(header)
    count += 1
    csv_writer.writerow(item.values())
csv_file.close()
The first line in the script is a “shebang” that just tells Linux to run it with python3. If you have Windows, it isn’t necessary.
My input file was answers.json, and output was answers.csv. You would change these to suit, or if you have a lot of files, maybe want to prompt for the names.
The program takes the column names for the CSV header from the keys of the first JSON record, and assumes every later record has the same keys in the same order.
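To see how the header comes from the first record’s keys, here is a tiny self-contained illustration with made-up data (not my Yahoo Answers file). It uses csv.DictWriter, which also tolerates records whose keys appear in a different order:

```python
import csv
import io
import json

# Made-up records; note the second one has its keys in a different order.
records = json.loads('[{"q": "hi", "a": "hello"}, {"a": "bye", "q": "later"}]')

out = io.StringIO()
# Column names come from the keys of the first record.
writer = csv.DictWriter(out, fieldnames=list(records[0].keys()))
writer.writeheader()
writer.writerows(records)        # DictWriter matches values to columns by key
print(out.getvalue())            # header row "q,a", then one row per record
```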
Unlike the previous conversion website that I used, multi-byte unicode characters like emojis were handled well, for instance:
Q: 😂„ are you a rePUPlican 🐶 or a demoCAT 🐱?
A: indePANDAnt 🐼
The output file is comma-separated values (CSV), which can easily be read by Excel or OpenOffice Calc.
One problem was that my Yahoo Answers database originally came from the web (html), so contained strings like
I can&#39;t believe it &lt;3
which should be
I can't believe it <3
As the ampersand-hash sequences were just ordinary characters in the JSON, they passed right through that way, and I had to do a global search and replace on all such things. I just did a search on & in all cells, then when I found a pattern like can&#39;t, did a global replace of &#39; with ' (apostrophe). Not all of the finds were of the form &# – there were a few like &quot; and &amp;.
There’s probably some Excel script that does the translation of such things, but it didn’t take long to do it manually.
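Outside of Excel, Python’s standard library can undo all of these HTML entities in one call, rather than replacing &#39;, &quot;, &amp; and friends one pattern at a time. A minimal sketch (the example string is from above):

```python
import html

# html.unescape converts HTML character references like &#39;, &lt;,
# &quot; and &amp; back into the characters they stand for.
s = "I can&#39;t believe it &lt;3"
print(html.unescape(s))   # I can't believe it <3
```

This could be applied to each value while writing the CSV, so the cleanup happens in the same pass as the conversion.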