DepictQA: Depicted Image Quality Assessment with Vision Language Models

DepictQA (Depicted image Quality Assessment) is dedicated to developing multi-modal image quality assessment models that assess or compare image quality in natural language, aiming to align more closely with human expression.
News
[2025.02]    DeQA-Score was accepted to CVPR 2025.
[2025.01]    We released DeQA-Score, a distribution-based depicted image quality assessment model for score regression. Datasets, code, and model weights (full tuning / LoRA tuning) are available.

[2024.07]    Datasets (huggingface / modelscope) of DepictQA-v1 and DepictQA-Wild (DepictQA-v2) are available.
[2024.07]    DepictQA-v1 was accepted to ECCV 2024.
[2024.06]    Code for DepictQA-v1 and DepictQA-Wild (DepictQA-v2) is available.
[2024.05]    We released DepictQA-Wild (DepictQA-v2): a multi-functional in-the-wild descriptive image quality assessment model.
[2023.12]    We released DepictQA-v1, a multi-modal image quality assessment model based on vision language models.
Papers

*: Equal Contribution, †: Corresponding Author
Teaching Large Language Models to Regress Accurate Image Quality Scores using Score Distribution
Zhiyuan You, Xin Cai, Jinjin Gu, Tianfan Xue†, Chao Dong†
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2025
paper / project page / code / data

We introduce DeQA-Score, a distribution-based depicted image quality assessment model for score regression.

Descriptive Image Quality Assessment in the Wild
Zhiyuan You, Jinjin Gu, Zheyuan Li, Xin Cai, Kaiwen Zhu, Chao Dong†, Tianfan Xue†
arXiv, 2024
paper / project page / code / data

We introduce DepictQA-Wild, a multi-functional in-the-wild descriptive image quality assessment model.

Depicting Beyond Scores: Advancing Image Quality Assessment through Multi-modal Language Models
Zhiyuan You*, Zheyuan Li*, Jinjin Gu*, Zhenfei Yin, Tianfan Xue†, Chao Dong†
European Conference on Computer Vision (ECCV), 2024
paper / project page / code / data

We introduce DepictQA, which leverages Multi-modal Large Language Models to enable detailed, language-based, human-like evaluation of image quality.


Template from JonBarron