Depicting Beyond Scores:
Advancing Image Quality Assessment through Multi-modal Language Models

1The Chinese University of Hong Kong 2Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences
3University of Sydney 4Shanghai AI Laboratory
*Equal Contribution Corresponding Author

We introduce DepictQA, a Depicted image Quality Assessment method. DepictQA leverages Multi-modal Large Language Models (MLLMs) to enable detailed, language-based, human-like evaluation of image quality. DepictQA interprets image content and distortions descriptively and comparatively, aligning closely with humans' reasoning process. To build the DepictQA model, we establish a hierarchical task paradigm and collect a multi-modal IQA training dataset. To address the challenges of limited training data and multi-image processing, we propose to use multi-source training data and specialized image tags. DepictQA outperforms score-based methods on the BAPPS benchmark. Moreover, compared with general MLLMs, DepictQA generates more accurate reasoning-based descriptions.
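As a rough illustration of the specialized image tags mentioned above, the sketch below shows how per-image placeholder tags might be interleaved into a text prompt so a multi-image MLLM can tell the reference and the two distorted candidates apart. The tag names (`<img_ref>`, `<img_A>`, `<img_B>`) and the prompt layout are assumptions for illustration, not DepictQA's exact implementation.

```python
# Hedged sketch: marking multiple images with distinct tags in one prompt.
# Tag names and layout are illustrative assumptions, not DepictQA's actual format.

def build_comparison_prompt(question: str) -> str:
    """Interleave per-image tags so the model can distinguish the
    reference from the two distorted candidates."""
    return (
        "Reference: <img_ref>\n"
        "Image A: <img_A>\n"
        "Image B: <img_B>\n"
        f"Question: {question}"
    )

prompt = build_comparison_prompt(
    "Which image, A or B, better preserves the quality of the reference?"
)
print(prompt)
```

In a real pipeline, each tag would be replaced by (or aligned with) the visual tokens of the corresponding image before being fed to the language model.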

Illustration of DepictQA

DepictQA first identifies the distortions, then weighs the influence of each distortion on texture damage, and finally reaches comparison conclusions that align better with human judgments than those of score-based IQA methods.

Task Paradigm & Dataset Construction

We build a task paradigm consisting of three tasks: (a) quality description, (b) quality comparison, and (c) comparison reasoning.
Based on the task paradigm, we collect responses through (1) questionnaire collection, (2) GPT-4 generation, and (3) annotator revision.

Results of Quality Description
Results of Quality Comparison
Results of Comparison Reasoning

@article{depictqa_v2,
      title={Descriptive Image Quality Assessment in the Wild},
      author={You, Zhiyuan and Gu, Jinjin and Li, Zheyuan and Cai, Xin and Zhu, Kaiwen and Dong, Chao and Xue, Tianfan},
      journal={arXiv preprint arXiv:2405.18842},
      year={2024}
}

@article{depictqa_v1,
      title={Depicting Beyond Scores: Advancing Image Quality Assessment through Multi-modal Language Models},
      author={You, Zhiyuan and Li, Zheyuan and Gu, Jinjin and Yin, Zhenfei and Xue, Tianfan and Dong, Chao},
      journal={arXiv preprint arXiv:2312.08962},
      year={2023}
}