We introduce a Depicted image Quality Assessment method (DepictQA). DepictQA leverages Multi-modal Large Language Models (MLLMs), allowing for detailed, language-based, human-like evaluation of image quality. DepictQA interprets image content and distortions descriptively and comparatively, aligning closely with humans' reasoning process. To build the DepictQA model, we establish a hierarchical task paradigm and collect a multi-modal IQA training dataset. To address the challenges of limited training data and multi-image processing, we propose to use multi-source training data and specialized image tags. Our DepictQA outperforms score-based methods on the BAPPS benchmark. Moreover, compared with general MLLMs, our DepictQA generates more accurate reasoning descriptions.
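As a rough illustration of the image-tag idea, the sketch below shows one way to interleave textual tags and image placeholders when building a multi-image prompt, so the model can distinguish the reference from the two candidates. The tag names, placeholder token, and helper function are hypothetical, not the actual DepictQA implementation.

```python
# Hypothetical sketch: marking each input image with a textual tag so the
# MLLM can tell the reference and the two distorted candidates apart.
from typing import List, Tuple

IMAGE_TOKEN = "<image>"  # placeholder later replaced by visual embeddings


def build_comparison_prompt(question: str, images: List[Tuple[str, str]]) -> str:
    """images: list of (tag, image_path) pairs, e.g.
    [("Reference Image", "ref.png"), ("Image A", "a.png"), ("Image B", "b.png")].
    Returns a text prompt with one tagged placeholder per image."""
    tagged = "\n".join(f"{tag}: {IMAGE_TOKEN}" for tag, _ in images)
    return f"{tagged}\nQuestion: {question}\nAnswer:"


prompt = build_comparison_prompt(
    "Which image, A or B, preserves the reference content with higher quality?",
    [("Reference Image", "ref.png"), ("Image A", "img_a.png"), ("Image B", "img_b.png")],
)
print(prompt)
```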
DepictQA first identifies the distortions, then weighs the influence of each distortion on texture damage, and finally reaches a comparison conclusion that aligns better with human judgments than score-based IQA methods.
We build a task paradigm consisting of three tasks (hypothetical samples for each are sketched after this list):
(a) quality description,
(b) quality comparison,
and (c) comparison reasoning.
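The sketch below shows what a multi-modal IQA training sample might look like for each of the three tasks. The field names, file names, and response texts are illustrative assumptions, not samples from the released dataset.

```python
# Hypothetical examples of training samples for the three tasks in the paradigm;
# all field names and contents are illustrative only.
quality_description = {
    "task": "quality_description",
    "images": ["img_a.png"],
    "query": "Describe the quality of this image.",
    "response": "The image shows visible JPEG blocking and mild blur, "
                "which obscure fine textures in the background.",
}

quality_comparison = {
    "task": "quality_comparison",
    "images": ["ref.png", "img_a.png", "img_b.png"],
    "query": "Compared with the reference, which image, A or B, has better quality?",
    "response": "Image B.",
}

comparison_reasoning = {
    "task": "comparison_reasoning",
    "images": ["ref.png", "img_a.png", "img_b.png"],
    "query": "Which image is closer to the reference, and why?",
    "response": "Image A shows strong noise that destroys the grass texture, "
                "while Image B only has a slight color shift, so Image B is better.",
}
```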
Based on the task paradigm, we collect responses through
(1) questionnaire collection,
(2) GPT-4 generation,
and (3) annotator revision.
@article{depictqa_v2,
title={Descriptive Image Quality Assessment in the Wild},
author={You, Zhiyuan and Gu, Jinjin and Li, Zheyuan and Cai, Xin and Zhu, Kaiwen and Dong, Chao and Xue, Tianfan},
journal={arXiv preprint arXiv:2405.18842},
year={2024}
}
@article{depictqa_v1,
title={Depicting Beyond Scores: Advancing Image Quality Assessment through Multi-modal Language Models},
author={You, Zhiyuan and Li, Zheyuan and Gu, Jinjin and Yin, Zhenfei and Xue, Tianfan and Dong, Chao},
journal={arXiv preprint arXiv:2312.08962},
year={2023}
}