reading wikiart captions from jsonl files #28

ajie6666 · 2024-09-06T03:23:05Z

Hello, I want to read the caption of the wikiart section in the latest JSONL file. I am using the following code, but I am unable to read it.
##################################################
import json
import jsonlines
import pprint

with open('./json_files/StyleGallery.jsonl') as file:
    for line in jsonlines.Reader(file):
        if "img_file" in line:
            pprint.pprint(line["img_file"])

Jeoyal · 2024-09-06T03:36:37Z

import json

with open("./StyleGallery.jsonl", 'r') as f:
    datas = f.readlines()
    for data in datas:
        data = json.loads(data)
        print(data['content_prompt'])

ajie6666 · 2024-09-06T05:20:03Z

Thanks for your reply. This outputs all the tags, so how can I tell if it belongs to the wikiart dataset? Because MultiGen-20M and JourneyDB also have "content_prompt".

Jeoyal · 2024-09-06T05:22:44Z

The word "wikiart" should appear in data["image_file"] for entries from the wikiart dataset.

ajie6666 · 2024-09-06T05:32:09Z

I see what you mean, I tried the following code:
##################################################
import json
import pprint

with open("./json_files/StyleGallery.jsonl", 'r') as f:
    datas = f.readlines()
    for data in datas:
        data = json.loads(data)
        if "wikiart" in data['image_file']:
            pprint.pprint(data['content_prompt'])
###############################################
But I'm getting an error: KeyError: 'image_file'
So I changed "image_file" to "img_file":
###############################################
import json
import pprint

with open("./json_files/StyleGallery.jsonl", 'r') as f:
    datas = f.readlines()
    for data in datas:
        data = json.loads(data)
        if "wikiart" in data['img_file']:
            pprint.pprint(data['content_prompt'])
#################################################
I also got an error:
Traceback (most recent call last):
  File "read_jsonfile.py", line 30, in
    data = json.loads(data)
  File "/opt/conda/envs/styleshot/lib/python3.8/json/__init__.py", line 357, in loads
    return _default_decoder.decode(s)
  File "/opt/conda/envs/styleshot/lib/python3.8/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/opt/conda/envs/styleshot/lib/python3.8/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Invalid control character at: line 1 column 806 (char 805)
#############################################################
T-T

Jeoyal · 2024-09-06T07:01:11Z

Hi, I found a problem in our StyleGallery.jsonl. I will upload a corrected version soon.

Jeoyal · 2024-09-06T07:04:51Z

It might take two hours.

Jeoyal · 2024-09-06T10:31:52Z

Hi, I have updated it here.
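
For anyone still reading an older copy of the file, the two errors reported in this thread (the `KeyError` on the path field and the `Invalid control character` decode error) can be worked around with a tolerant reader. The following is only a sketch under assumptions: the function name `iter_wikiart_captions` and its parameters are made up for illustration, and it assumes the image path is stored under either `image_file` or `img_file`, since both names appear above. `strict=False` is a standard `json.loads` option that accepts raw control characters inside strings; it does not repair other kinds of damage, so lines that still fail to parse are reported and skipped.

```python
import json


def iter_wikiart_captions(path,
                          path_keys=("image_file", "img_file"),
                          caption_key="content_prompt"):
    """Yield captions of records whose image path contains 'wikiart'.

    path_keys and caption_key are assumptions taken from the thread,
    not a confirmed schema of StyleGallery.jsonl.
    """
    with open(path, "r", encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            try:
                # strict=False allows raw control characters (e.g. tabs)
                # inside JSON strings instead of raising JSONDecodeError.
                record = json.loads(line, strict=False)
            except json.JSONDecodeError as err:
                print(f"skipping line {line_no}: {err}")
                continue
            if not isinstance(record, dict):
                continue  # only JSON objects carry the fields we need
            # Use whichever path key is present; fall back to an empty string.
            image_path = next((record[k] for k in path_keys if k in record), "")
            if "wikiart" in str(image_path) and caption_key in record:
                yield record[caption_key]
```

Calling `for caption in iter_wikiart_captions("./json_files/StyleGallery.jsonl"): print(caption)` would then print only the wikiart captions. With the corrected file the fallback logic should simply never trigger.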
