Depicting Beyond Scores:
Advancing Image Quality Assessment through Multi-modal Language Models

1The Chinese University of Hong Kong 2Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences
3University of Sydney 4Shanghai AI Laboratory
*Equal Contribution · Corresponding Author

The demo, code, and datasets will be released around late May or early June 2024.

Abstract

We introduce DepictQA, a Depicted image Quality Assessment method. DepictQA leverages Multi-modal Large Language Models (MLLMs) to provide detailed, language-based, human-like evaluation of image quality. It interprets image content and distortions descriptively and comparatively, aligning closely with humans' reasoning process. To build the DepictQA model, we establish a hierarchical task paradigm and collect a multi-modal IQA training dataset. To address the challenges of limited training data and multi-image processing, we propose to use multi-source training data and specialized image tags. DepictQA outperforms score-based methods on the BAPPS benchmark. Moreover, compared with general MLLMs, DepictQA generates more accurate reasoning descriptions.
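The image tags mentioned above give the language model an unambiguous way to refer to each input image. The following is a minimal sketch of that idea, assuming hypothetical tag names and a hypothetical prompt helper; it is not our released implementation.

        # Minimal sketch (not the released code): tag each image in the text
        # prompt so the language model can distinguish the reference from the
        # two distorted candidates. Tag names here are illustrative assumptions.
        from PIL import Image

        IMAGE_TAGS = ["<Image-Reference>", "<Image-A>", "<Image-B>"]

        def build_comparison_prompt(question: str) -> str:
            """Interleave the image tags with the user question."""
            tagged_images = " ".join(IMAGE_TAGS)
            return f"{tagged_images}\n{question}"

        # Hypothetical usage: the images are encoded separately by the vision
        # encoder; the tags mark where each image's features belong in the prompt.
        images = [Image.open(p) for p in ["ref.png", "dist_a.png", "dist_b.png"]]
        prompt = build_comparison_prompt(
            "Which of Image A and Image B preserves the reference quality better, and why?"
        )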

Illustration of DepictQA

DepictQA first identifies the distortions, then weighs the influence of each distortion on texture damage, and finally reaches a comparison conclusion that aligns better with human judgments than score-based IQA methods.

Task Paradigm & Dataset Construction

We build a task paradigm consisting of three tasks: (a) quality description, (b) quality comparison, and (c) comparison reasoning.
Based on this paradigm, we collect responses through (1) questionnaire collection, (2) GPT-4 generation, and (3) annotator revision; a possible sample format is sketched below.
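To make the three tasks concrete, one training sample per task could be stored in a simple structure like the sketch below. The field names and example responses are illustrative assumptions, not the released dataset schema.

        # Illustrative (assumed) schema, one sample per task; the actual
        # released dataset format may differ.
        quality_description = {
            "task": "quality_description",
            "images": ["dist_a.png"],
            "query": "Describe the quality of this image.",
            "response": "The image suffers from visible blur that softens edge details ...",
        }

        quality_comparison = {
            "task": "quality_comparison",
            "images": ["ref.png", "dist_a.png", "dist_b.png"],
            "query": "Which image is closer to the reference in quality?",
            "response": "Image A",
        }

        comparison_reasoning = {
            "task": "comparison_reasoning",
            "images": ["ref.png", "dist_a.png", "dist_b.png"],
            "query": "Compare Image A and Image B against the reference and explain.",
            "response": "Image A is better. Although it is slightly noisy, Image B's "
                        "heavy compression destroys the fine textures ...",
        }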

Results of Quality Description
Results of Quality Comparison
Results of Comparison Reasoning
BibTeX

        @article{depictqa,
          title={Depicting Beyond Scores: Advancing Image Quality Assessment through Multi-modal Language Models},
          author={You, Zhiyuan and Li, Zheyuan and Gu, Jinjin and Yin, Zhenfei and Xue, Tianfan and Dong, Chao},
          journal={arXiv preprint arXiv:2312.08962},
          year={2023}
        }