IndicTTI: Navigating Text-to-Image Generative Bias across Indic Languages

1 IIT Jodhpur, 2 Meta, 3 Weir P.B.C.
ECCV 2024

Abstract

This research explores bias in text-to-image (TTI) models for the Indic languages widely spoken throughout India. It examines the generative performance and cultural aspects of leading TTI models in these languages and contrasts them with the models' English-language capabilities. Using the proposed IndicTTI benchmark, it comprehensively evaluates performance on 30 Indic languages with two open-source diffusion models and two commercial generation APIs. The primary objective of the benchmark is to measure how well these models support Indic languages and to identify areas in need of improvement. Given the linguistic diversity of these 30 languages, spoken by over 1.4 billion people, the benchmark aims to provide a detailed and insightful analysis of TTI models' effectiveness across the Indic linguistic landscape.

Visual Abstract

(Top) Images generated by Midjourney for equivalent prompts in English and Hindi, highlighting the model's tendency to generate incorrect images. (Bottom) Images generated by DALL-E 3 for equivalent prompts in English and Hindi, highlighting strikingly different cultural representations.

BibTeX


        @article{mittal2024indicTTI,
          title={Navigating Text-to-Image Generative Bias across Indic Languages},
          author={Mittal, Surbhi and Sudan, Arnav and Vatsa, Mayank and Singh, Richa and Glaser, Tamar and Hassner, Tal},
          journal={ECCV},
          year={2024}
        }