AI’s Regimes of Representation: A Community-Centered Study of Text-to-Image Models in South Asia
This case study presents a community-centered evaluation of South Asian cultural representations in text-to-image (T2I) models and distills lessons for the responsible development of T2I models that allows for recognition of structural inequalities.
by Rida Qadri, Renee Shelby, Cynthia L. Bennett, and Remi Denton
Published on Feb 27, 2024
Abstract
This case study presents a community-centered evaluation of South Asian cultural representations in text-to-image (T2I) models. We identify several failure modes and locate them within participants’ reporting of their existing experiences of social marginalization. We thus show how generative artificial intelligence (AI) can reproduce an outsider’s gaze for viewing South Asian cultures, shaped by global and regional power inequities. By centering communities as experts and soliciting their perspectives on T2I limitations, our study adds nuance to existing evaluative frameworks and deepens our understanding of the culturally specific ways in which generative AI technologies can fail in non-Western and Global South settings. We distill lessons for responsible development of T2I models, recommending concrete pathways forward that can allow for recognition of structural inequalities.
Keywords: human-centered AI, AI harms, text-to-image models, generative AI, non-Western AI fairness, South Asia
Rida Qadri, Google Research, San Francisco, CA
Renee Shelby, Google Research, San Francisco, CA
Cynthia L. Bennett, Google Research, New York, NY
Remi Denton, Google Research, New York, NY
Learning Objectives
To introduce readers to failures of generative AI models in global settings and limitations of dominant Western approaches to AI model building.
To explore challenges in building more globally inclusive and responsive AI models.
To understand tensions within text-to-image AI’s representation of South Asian contexts, as evaluated by people who self-identify with these cultural contexts.
To showcase the value of qualitative and community-based evaluations of generative AI outputs.
Authors’ Note: This SERC case study is excerpted with permission from Rida Qadri, Renee Shelby, Cynthia L. Bennett, and Remi Denton, “AI’s Regimes of Representation: A Community-Centered Study of Text-to-Image Models in South Asia,” in 2023 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’23), June 12–15, 2023, Chicago, IL, USA (New York: ACM, 2023), 506–517, https://doi.org/10.1145/3593013.3594016.
Introduction
Recent scholarship on fairness, accountability, and transparency in computing systems points to the need for reevaluating the field’s dominant and Western-centered methods and frameworks for understanding and evaluating harms from artificial intelligence (AI). For instance, there are growing calls for more community-centered work1 and a reorientation toward non-Western frameworks of fairness.2 However, empirical studies collaboratively investigating AI harms with diverse, global communities are still rare, continuing the disconnect between dominant evaluation approaches and the lived experiences of impacted communities, including people residing in South Asian contexts.3
In response, we conducted a community-centered study of cultural limitations of a proliferating generative AI medium—text-to-image (T2I) models—in South Asian contexts, with thirty-six participants from Pakistan (n = 15), India (n = 13), and Bangladesh (n = 8). Through two-part focus groups, participants co-designed T2I prompts and collectively reflected on model outputs. This study design offered participants the opportunity to articulate their own understandings of model limitations, failures, and potential impacts, drawing from their local cultural knowledge and situated experiences with South Asian representations.
Participant conversation and reflections foregrounded three broad failure modes: (1) failing to generate cultural subjects, (2) amplifying dominant (yet inaccurate) cultural defaults, and (3) perpetuating cultural stereotypes. Our study indicates how cultural limitations of T2I models can magnify existing harmful media representations of various groups. Whereas the T2I failure modes foregrounded by participants are not necessarily unique to South Asia, they highlight some of the ways in which global and regional power dynamics creep into generative AI tools. Our study illustrates the importance of developing more qualitative ways to evaluate generative AI tools such as T2I models that to date have typically been evaluated using quantitative benchmarks.4 In this way, our study offers an example of how qualitative, community-centered research can strengthen responsible AI practices, by centering local knowledge and expertise.
Theoretical Background
Text-to-image (T2I) generative models, which typically rely on web-scale datasets, allow users to create photo-realistic images from free-form, open-ended text prompts. Such datasets have been shown to reflect social stereotypes, inequalities, and hierarchies,5 raising concerns about T2I models similarly fostering representational and cultural harms.6 However, unlike for other generative AI, such as language7 or image caption models,8 computing researchers have yet to articulate the broader landscape of potential harms for generative image models, particularly in a global setting. While empirical research on T2I models is still nascent, studies suggest they can reinforce social hierarchies and replicate dominant stereotypes along axes of gender, skin tone, and culture.9 Our work complements and extends this research by offering the first empirical study of T2I models that centers on and engages non-Western communities.
This extension is particularly important in response to a growing body of scholarship calling attention to the dominance of Western perspectives and experiences embedded within responsible AI frameworks,10 which are not transferable across cultural contexts.11 The nonportability of Western frameworks can lead to flawed data and model assumptions, evaluation methods that overlook culturally specific axes of discrimination, and cultural incongruencies.12
Evaluating representation is complex because academic scholarship tells us there are no “true” or “correct” representations; rather, representations are “historically determined [social] construction(s). . . mediated by social, ideological, and cultural processes.”13 The power to represent communities in ways that shape how they are perceived can be understood as a regime of representation: a dominant system of media discourse, symbols, and images that creates particular narratives about already marginalized groups.14
In the Asian context, a dominant regime of representation is Orientalism, which refers to a broader system of thought and a way of writing about and studying the “Orient,” or Eastern world, that emerged in the nineteenth century.15 As framed by Western geopolitical forces, the “Orient” became a singular stand-in for the numerous cultural and national boundaries of the eastern colonies.16 Through Orientalism’s outsider’s gaze, the West imposed demeaning cultural stereotypes onto Asia: backwards, silently different, passive, and sexualized. These became not only a way of thinking about South Asia but also a means to conceptualize Asia in a way that made it susceptible to certain kinds of control and geopolitical management. Orientalist tropes continue in contemporary media about South Asia, and we investigate how they endure in emerging technologies. These tropes include essentialist representations of the subcontinent as diseased and mentally ill, impoverished, and economically dysfunctional.17 Reductive depictions of Asian women as sexually available and exotic, or as lacking agency through Western understandings of veiling, are also common.18 While the range of Orientalist representations varies, they are united in distorting the meaning of a cultural practice or symbol through reduction and simplification.19
Existing studies on representation have shown how these media representations hold power to shape how a particular community is seen and understood and thus engaged with by others.20 Moreover, the reductive stereotypes, miscategorizations, and forms of erasure can also “block the capacity of marginalized groups . . . to imagine, describe, and invent [themselves] in ways that are liberatory.”21 To date, little is known about what forms of representation T2I models contain and perpetuate, particularly as defined by South Asian communities. Recognizing these regimes is necessary to disrupt their harmful impacts.
Methodology
We engaged participants from three South Asian countries (Bangladesh, India, and Pakistan) through focus groups and a survey to: (1) collaboratively develop culturally specific text prompts, (2) collectively reflect upon images generated by T2I models in response to culturally specific text prompts, and (3) understand participants’ experiences of the generated imagery. South Asia is a rich and complex region with many diverse cultures; we focused on participants from these specific countries, because the countries’ cultural histories overlap, which created opportunities to facilitate useful comparisons.
We recruited participants with cultural knowledge of any one of the three nation-states, including lived experience, professional affiliation, and/or academic study. We asked prospective participants to self-identify with one of the nation-states, and did not assign these identities ourselves. Other inclusion criteria required participants to have English-language proficiency and to be at least eighteen years old. We enrolled thirty-six participants, affiliated with the countries on which we focus: Pakistan (n = 15), India (n = 13), and Bangladesh (n = 8). Our sample covered ten linguistic groups and fourteen subnational regional groups within South Asia, and included seventeen academic researchers, nine “cultural workers” employed in such fields as museum curation and the arts, and ten participants with lived experience within the cultural contexts we were studying, but not necessarily professional experience in cultural industries.22
Each participant attended two ninety-minute focus groups composed of between seven and nine participants from the same country. The first focus group was structured around discussion questions and interactive activities to understand how participants defined ideas of “good” and “bad” cultural representations.
Following the first focus group, participants completed a survey submitting full-sentence text prompts and suggesting up to five examples of cultural events, landmarks, art styles, and/or artists, historic events, figures, and characters they felt would enable the assessment of T2I models. The research team synthesized participants’ prompt suggestions, tested various prompts, and generated images using four state-of-the-art T2I models.23 We constructed prompts based on participant suggestions with the aim of increasing quality and coverage of cultural references in the study. For example, a general model failure for the prompt, “A day in Lahore,” might result in images of daylight, rather than the city; however, rewording the prompt as “People spending their day in Lahore” led to images reflecting the model’s learned associations with daily life in Lahore.
During the second focus group, participants reviewed and evaluated generated images from four to five prompts that they had suggested. Following individual reflection, we facilitated discussion on the possibilities and risks of T2I models. We deliberately kept discussion questions open-ended to give participants the opportunity to focus on what they found most important.24
All focus groups were video-recorded, transcribed, and thematically analyzed.25 In our discussion below, we introduce participant quotes with alphanumeric identifiers, providing their country to better contextualize participants’ comments. Some quotations were provided anonymously during interactive exercises; these only have the identifier “A” after the country-group.
Findings
Broadly, participants were interested in both the accuracy of cultural subject-matter recognition and nuances of cultural representations in T2I generated imagery. As P35, from Pakistan, summarized, “[if I put in a particular figure, historical event, or allegory], does [the model] get what I’m trying to say, first of all? Is there a kind of understanding or legibility? But then within that, what kind of visual representation do you get? Do you get a kind of Orientalist, portraiture rendering? Do you get an image that looks closer to maybe South Asian renderings?”
Drawing on the participants’ input and discussions, we identify three failure modes that encapsulate participants’ concerns about model accuracy and representations: (1) failing to recognize cultural subjects: generated imagery fails to depict a culture’s subject matter; (2) amplifying cultural defaults: culture’s subject matter in generated images defaults to particular hegemonic or dominant cultures; and (3) perpetuating cultural tropes: generated images contain stereotypes and tropes associated with particular cultures. It is important to note that our work is not demonstrating a large-scale evaluation of what is and is not possible with T2I models, nor an investigation as to the reasons for model failures. Instead, we share the frictions and failures that can emerge as participants of non-Western backgrounds use these models to represent their own worlds, lives, and identities.
Failing to Recognize Cultural Subjects
Participants shared their desire to test T2I models’ ability to generate cultural artifacts, history, and practices from South Asian cultures. Importantly, participants emphasized that they were not looking for absolute accuracy in each image, and similarly acknowledged that it would be impossible to achieve complete accuracy for topics with multiple realities and possible renderings (e.g., a South Asian family). Rather, they adjudicated accuracy based on whether the cultural subject matter had a canonical rendering (e.g., historical figures like Indira Gandhi and architectural landmarks like Badshahi Mosque), or essential canonical elements (e.g., the correct sporting equipment for cricket scenes, the proper landscape for a region, or the art style of Sadequain). Through their reviews of generated imagery, participants identified different dimensions of “failure to recognize cultural subjects,” from total failure to partial legibility but lacking cultural specificity.
Across all countries, South Asian participants identified examples in which models completely failed to depict important cultural subject matter specified in text prompts. For instance, models totally failed to render the styles of famous artists from India (e.g., Tagore), Pakistan (e.g., Gulgee, Sadequain), and Bangladesh (e.g., Zainul Abedin). Participants described how such total failures were particularly frustrating, as generated images shown during the first focus group reflected the painting styles of Monet, Picasso, and Rembrandt in easily recognizable ways. As P18 from India reflected: “AI seems to be able to pick up and adapt [images] to the style of Monet [...] much better [than with] Indian artists or Indian folk art.”
Participants named a second way T2I models fail to recognize cultural subjects, in which models render vaguely “Eastern” visual associations in generated imagery. For example, a text prompt for the famous love story, “Heer Ranjha,” resulted in depictions that, according to P30, a Pakistani participant, “[do not] really have anything to do with Heer Ranjha.” (See Figure 1.) The famous folklore story is about two star-crossed lovers from rural Punjab akin to Romeo and Juliet; however, none of the generated images contained Heer, the woman, and included only a man wearing attire completely disconnected from the Punjab region or class that Heer Ranjha were from. Explaining further, P30 described the man as a “[stereotypical] monarch from Northern India,” even though Ranjha was a character from an agricultural family. Yet, participants noted images generated for Sherlock Holmes were very accurate.
Participants called out other failed images with vaguely “Eastern” aesthetics. For text prompts referencing cityscapes and buildings from the Mughal era (a South Asian empire with a distinctive style), models generated architecture that participants described as clearly Ottoman-looking, Gulf or Middle Eastern, or even East Asian–like, which indicated that the T2I tools were merging distinct cultures into one indistinguishable category.
Failures, though, were not equally distributed. Generated images for North Indian artifacts, such as culturally important buildings like the Qutub Minar and Red Fort, were identified as more accurate than their Pakistani and Bangladeshi counterparts, such as the Baitul Mukarram National Mosque in Bangladesh. However, even within India, participants emphasized that T2I models generated imagery more effectively for majority cultural artifacts than for regional South Asian celebrations, such as Rajwadi Holi, which did not render at all.
Amplifying Hegemonic Cultural Defaults
Cultural defaults refer to which cultural centers are naturalized as the dominant frame of reference. As a T2I failure mode, cultural defaults encapsulate participants’ concerns about which cultural lens dominates representations in generated imagery. Much scholarship has emphasized the overrepresentation of Western or white cultural subject matter in media and algorithmic technologies.26 Participants, too, mentioned white, Western defaults in media, and identified examples where T2I models appeared to default to Eurocentric cultural artifacts, even if no cultural context was specified in the prompt. For example, neutral prompts for “A photo of a house of worship” rendered Christian, American-looking churches (see Figure 2), while “Toddler in marketplace” resulted in multiple images of white-skinned toddlers in stereotypically Western grocery stores. More worryingly, this centering of white, Western bodies continued even when South Asian cultural contexts were specified in text prompts (e.g., “Children eating street food in Varanasi,” “People eating street food in Lahore,” and “People celebrating Holi”).
Beyond a focus on generated examples that featured depictions of white people in Western settings, participants identified a pattern of T2I imagery in which regional powers were more frequently represented. Participants from Bangladesh and Pakistan in particular emphasized the ways in which dominant media tends to center India as the South Asian cultural default. For the prompt “South Asian family,” Pakistani and Bangladeshi participants referred to generated attire as “Indian-looking”; similarly, for prompts referencing women in “saris,” Bangladeshi participants emphasized that the T2I models produced saris with patterns and styling that appeared Indian.
Participants also identified where T2I models generated Indian objects from prompts explicitly mentioning Bangladeshi and Pakistani cultural artifacts and subjects. For instance, prompts for Bangladeshi cultural topics rendered imagery containing Hindi Sanskrit instead of Bengali script. A prompt for “Bangladeshi Language Day” resulted in images with Hindi text; and images generated for the “Bangladesh Liberation War,” a seminal moment in Bangladesh’s history that formed the nation, depicted men wearing turbans, which P8 felt “represent[ed] more an Indian army man than an actual Bangladeshi army.”
One anonymous comment summarized the significance of regional cultural erasure: “India should not stand in for all of South Asia.” P30, from Pakistan, explained how wide-sweeping this cultural default is, as South Asia is “an area that is about as big or half as big as Western Europe. That is a very large area, and there are tons of cultures . . . generalized into Northern India.” Reflecting on Bangladeshi cultural erasures, P2 emphasized: “they didn’t really get the exact nuances of our region or our people,” and P3 commented, “AI still has a lot to learn about South Asia, apart from India.”
Beyond the “India as South Asia” cultural default, participants noted that T2I imagery tended to erase the “diversity of class, religious, gender, ethnic minority narratives” within their countries (anonymous, Pakistan). When prompts explicitly mentioned India (e.g., “Indian food” and “Indian women”), generated images defaulted to what participants identified as upper-caste, North Indian cultural subject matter. When discussing images generated in response to the prompt “Indian cultural dancers,” P15 identified the predominance of upper-caste dance forms, like Bharatnatyam, but not folk dancers of more marginalized castes, characterizing the representations as having a “very homogenized lens (upper caste),” which she argued could lead to further exclusions of non-upper-caste Indians from imaginations of “Indian culture,” while also cementing the distinction that marginalized caste practices are not in fact cultural. She also pointed out that while dance forms practiced by men (bhangra) were represented in the images, women’s dance traditions (e.g., giddha) were not, emphasizing that generated imagery perpetuated a “very male perspective through which we look into dance form.”
Religious diversity was also missing in most outputs for prompts referencing “Indian houses of worship.” P12, from India, characterized this as a “Hinduization of Indian religious iconography” in T2I imagery, which mapped onto a broader tendency to represent India as unequivocally “Hindu,” even though India has a significant minority of Muslims, Christians, and Buddhists.27
Perpetuating Cultural Tropes
Cultural tropes reflect the stereotypes associated with particular cultures. Whereas the prior failure modes reflect systematic absences and miscategorizations, cultural tropes concern the harmful, essentialist representations that appear when cultural subject matter is visualized. These representations are “caricatures of the world” (P33, Pakistan) that perpetuate “extremely narrow depictions of extremely diverse phenomena/lives that then come to stand for the whole” (anonymous, Pakistan). Here, we summarize four dominant cultural tropes identified by participants, emphasizing connections between T2I imagery and existing representations of South Asian cultures.
The first trope concerns South Asia as impoverished and underdeveloped. Participants across all three nation-states described how tropes of dusty cities and “everyone living in slums” (anonymous, India) pervade media portrayals of South Asia, reducing the region to “one economic strata” (anonymous, India). Whereas income inequality exists, as with all parts of the world, the rich diversity of South Asian life tends to be absent in media portrayals, which typically depict South Asia as unequivocally impoverished and economically dysfunctional.28 Participants identified how this trope appeared in images generated from prompts for daily life in South Asian locales, often depicting “shabby and old households” (P4, Bangladesh). P21, from India, described how images generated in response to the prompt “A photo of daily life in Mumbai” reduced the city to “congested spaces and poverty.” For the prompt, “People spending their day in Peshawar” (see Figure 3), P30 from Pakistan emphasized how inclusion of architectural details would have disrupted the trope of underdevelopment: “Peshawar has markets, [...] old frescos, [...] old buildings, the old woodwork from the pre-independence era. It has various cultural stalls. So. . . I would have wanted . . . something that . . . presents our culture [...] What I received was a dusty street with a few rickshaws.” P9, from India, commented how generated images framed indigenous South Asian tribes as dirty, “even though Adivasi villages and homes are really clean and beautiful, even if there is poverty.”
Participants also noted how T2I renderings represented South Asia as frozen in time, indicating it was less modern or advanced than Western settings. Reviewing prompts for scenes and marketplaces in various Pakistani cities, P22 noted that generated images erased “modern parts of urban life” by showing only “old school open markets,” rather than the contemporary “upscale marketplaces.” In sum, participants felt they were “seeing pictures [from] 50 years back” (P31, Pakistan).
A second trope concerns South Asia as exotic. Participants noted the harmful cultural trope of exoticization in media, which from a Western gaze, depicts South Asia as a strange and bizarre land.29 P20, from India, described how South Asia is imagined as having “chaotic traffic” and “the cows in the streets,” creating a representation of South Asia as disorderly and overpopulated. P12 mentioned the trope of India as a “land of snake charmers,” where South Asian men are depicted as excessively brown-skinned and women clothed in traditional attire. Others noted the association of South Asia with particular color palettes sets the region apart from the rest of the world—either sepia tones or over-the-top bright colors—constituting another form of exoticization in the media.30 P11, from India, reflected how exoticization was common on postcard images depicting tribal women wearing “extensive silver jewelry” and positioned South Asian women as “exotic and wondrous and magic” subjects.
Participants identified this theme of exoticization in T2I imagery. In response to the prompt, “Painting of Queer South Asia where the painting has symbols of South Asia and queer culture,” multiple participants noted generated images continued the trope of South Asia being reduced to a certain color palette. P36, from Pakistan, called these colors “gaudy,” and P20, from India, pointed out that for Western representations, the models had generated “a greater variety, a greater diversity of color palettes and styles.” P36, from Pakistan, specifically called out the similarity of T2I outputs to historic photography practices, particularly colonial imagery: “the way the darkness of these bodies is represented is uncannily like especially that the first hundred or so years in photography, when lighting and color and picture development processes were very unsurprisingly set towards representing white bodies more than dark bodies. So it just brings up that particular history in showing these ill-defined generic dark bodies, even if it’s a little bit I guess more artistic.” P11, from India, emphasized that the invocation of such tropes is merely a way to sell more media, a capitalist and colonial logic that can continue in T2I, noting: “And I think that should stop.”
A third trope concerned the notion that Dalit communities are disempowered. Dalits are a caste group in India, mapping onto the lowest rung of the caste hierarchy, who have experienced centuries of social and economic marginalization and exclusion.31 P15, a Dalit academic from India, discussed how Dalit communities are often presented in the media through both a classist and casteist lens, associated with “undesirable” occupations: “[near] a sewer or toilets... with dirt around [them].” Reviewing a prompt for “Daily life of a Dalit person,” she pointed out that T2I imagery similarly associated Dalit life with connotations of dirtiness, hardship, poverty, and a lack of artisanal and resistive culture. None of the generated images for “daily life” incorporated Dalit celebrations or cultural productions, such as Dalit dance forms. Even when models were prompted for “A Dalit family celebrating Diwali in their house,” P15 detailed how the model resorted to upper-caste Hindu celebrations of Diwali that did not show the specific characteristics of Dalit Diwali celebrations. She further emphasized that representations of Dalit daily life should “also [be] about their songs, about their cultures, about how they make a difference through their everyday acts.” Characterizing T2I imagery as “essentialist” and a “clichéd representation,” P15 specified that this regime of representation missed the “dynamic aspect of Dalit identity” that in reality disrupts the trope of “abject poverty. . . as the only marker of Dalit life.”
The fourth trope that participants highlighted is of Muslim lives as one-dimensional. Pakistani participants expressed frustration with Western media narratives that reduce Islam and Muslim cultures to religious iconography, especially those since September 11, 2001, which portray Pakistan as a “terrorist” nation and Islam as fundamentalist.32 On media narratives, P23 from Pakistan reflected: “When we talk about Muslim life it always goes with a mosque. [As if Muslims] only worship all day.” Similarly, P33 from Pakistan described a fixation on “the call to prayer at the beginning of all TV shows and movies.”
Participants discussed nuanced ways these tropes appeared in T2I imagery, for instance, through repeated depictions of people wearing traditionally religious attire in scenes of Pakistan. In response to the prompt “Political protest in Pakistan,” P26 noted: “All men are wearing shalwar kameez and most are wearing prayer caps. Literally no person is wearing Western attire, which is quite common for men in Pakistan.” P23 and P22 both talked about the constant presence of veils and headscarves in T2I depictions of Muslim women, which P26 noted mapped onto tropes of women as only “conservative,” a trope that suggests that Muslim women need rescuing and lack agency.33 Participants clarified that when prompts specify religious subject matter (e.g., Eid), veils and headscarves are not inherently problematic; but their presence in all generated images speaks to how Muslim lives are condensed to one-dimensional stories. T2I models risk reproducing reductive and one-dimensional representations for complex cultural subjects, such as “Islam,” further reducing diverse Islamic cultures to one form of religious practice. For example, participants described depictions of an “Islamic city” as being “very stereotypical” (P24, Pakistan), due to the images’ fixation on mosques. P23 noted the reduction of Muslim cities to mosques made it seem like people in these cities “don’t have a life” outside religion.
Discussion
History and Impact of Representational Failures
The T2I limitations that participants identified have a long history in media representations of South Asia but also draw from existing sociopolitical power imbalances. Participants noted how in T2I images, just as in existing media, “the touristic, Western gaze” is seemingly pandered to.34 Participants expressed concern that T2I models might be heavily biased toward outsider perspectives on their cultures. P9, from India, described how generated images felt like “tourist’s photos” reflecting “flatter versions of South Asia,” and amplified what P17, from India, called the “empirical abundance of certain kinds of images [about India]” that map onto global majoritarian views. P30 described feeling that T2I training data and the resulting imagery led to a “Western vision of the East.”
However, the representational failures were not just divided in a binary of East vs. West. T2I images were also reflecting internal South Asian power imbalances that produce internal representations of marginalized communities that are just as “othering” as those produced by observers from the West. As P13, from India, commented, “it’s not just South Asian culture here against ... Western culture... There’s so many layers here of hegemonic cultures within South Asia. One small layer of this [culture] gets to represent the entirety of South Asian culture.”
Participants were concerned that T2I models would amplify these power centers and existing media representations of their identities. They described how they work to correct reductive media stereotypes in their lives, and worried that T2I models would further “normalize” these stereotypes. P23 and P26, from Pakistan, explained how media depictions that reduce Muslim culture to religious rituals create tension and awkward social moments when they travel outside Pakistan. P12, from India, described how when she travels abroad, she is frequently asked if “India is full of slums like in [the film] Slumdog Millionaire.”
Participants also reflected on the frustration and grief related to identity loss when their cultures are conflated with others. P8 elaborated that, as a Bangladeshi, such points of cultural confusion are regular occurrences: “Growing up, I was always categorized as ... [Indian]. I’m like no, I’m not Indian... No, I’m Bangladeshi. . . we have our own foods, we have our own holidays, we have our own historical events.” Similarly, P4, from Bangladesh, voiced concern that people in the Bangladeshi diaspora growing up outside their country would lose their cultural identity because they are less attuned to these differences. P22, from Pakistan, described the distress of seeing outsider representations of their culture in T2I outputs: “AI represents the majoritarian view, and if you’re someone who doesn’t fit in with that, then it’s particularly disturbing [for you].” Generative AI tools that reinforce hierarchies of cultural power risk limiting how people understand their culture on their own terms. When algorithms reproduce and amplify an outsider’s narrative about a culture, they impact both people’s sense of identity and belonging, as well as how they are perceived by others.35
Tensions of Inclusive Generative AI
While participants agreed that it is important for T2I models to be inclusive of global cultures, multiple participants emphasized the challenges inherent in defining and operationalizing global inclusivity. As P14 from India explained, “there’s no singular identity,” but rather “multiple languages, multiple cultures [...] and the complexities that come with that.” Participants also commented on the subjectivity inherent in the interpretation of visual imagery, echoing AI scholarship that has written of the socially and culturally subjective nature of image-text relationships.36 As one participant put it, most text prompts will have “such a wide range of portrayals” that there will always be the question of “which lens are you using?” (P26, Pakistan).
While pointing out the limits of T2I models, participants shared nuanced perspectives on the potential they saw for generative AI to challenge outsider gazes and power inequities in existing media portrayals. They noted that new sources of media could bring out possibilities of multiplicity and diversity of representations, unconstrained by existing hierarchies of caste, class, or museum patronage. For example, they pointed to the diverse representations generated by South Asian communities on TikTok, commenting there was “already an overwhelming incredible diversity of visual vocabularies and modes [available online]” that people could learn from, instead of having our “representation strained by logics of power and capital” (P36, Pakistan). Participants discussed whether generative AI could grant people space to tell their own stories and represent themselves, seeing an opportunity for generative AI to “call attention to certain kinds of folk art practices, which otherwise nobody would have noticed” (P17, India).
On the other hand, participants questioned the bounds of what should be captured in T2I models, debating the values and risks of inclusion. They also raised concerns about artist attribution, commodification, and the consequences of separating certain art forms from their traditional roots. For example, when reflecting on the models’ failure to produce a kalamkari-style print, P14 from India argued that the easier it becomes to “find traditional artworks that [have been] mass produced somewhere [...] the more it [becomes a] mechanism to run roughshod over people’s practices that don’t already have a voice and just further silence them or push them into obscurity or further.”
Ultimately, participant aspirations for generative AI focused heavily on restoring agency and community control over the terms of representation, exemplified by one participant’s challenge: “why can’t we imagine a more authentic world that our communities can build ourselves?” (P32, Pakistan). P19 from India similarly argued for shifting the power imbalance by “including people in this process. . . [as] representation has to come from places, which do not or never have had the resources to tell the stories.” Other participants were less hopeful about the possibility of inclusion in AI, instead questioning whether “it’s better to just opt out” because “there’s no way for this to be equitable. The bulk of material, the weight of how much media has already been produced, the sheer volume of it is so huge that it’s never going to be representative” (P22, Pakistan).
Generative AI as a Media Technology
This study also makes the case that the endurance of existing media representations in new technologies is an important site of study for generative AI. Left unchecked, algorithmic systems can scale existing regimes of representation through patterns of over- and underrepresentation. Strengthening understanding of such experiences of cultural representations in conversation with impacted communities is required to identify precisely how generative models participate in disempowering logics of cultural representation. This research approach also decenters AI as the sole locus of power or harm, building off work by scholars who have studied algorithms as culturally produced objects and those who have theorized questions of media representation.37 Apart from empirical research, T2I models, and other related generative image technologies, must be historicized within scholarly analyses of other cultural technologies, such as how photography functioned as a technology of cultural memory, propagation, and inclusion and exclusion.38 Insights about the politics of cultural technologies, particularly how they functioned in historically exclusionary ways to certain communities, shed light on how generative image technologies may have complicated relationships with communities historically marginalized from canonical, majoritarian representations.39
Generative AI technologies, such as T2I, are increasingly producing cultural artifacts that can have widespread reach. Just as with other media, developers and users of generative AI tools will need to contend with whose cultural narratives get reproduced and whose cultural knowledge becomes oversimplified or even erased altogether in these systems. We also have to recognize that cultural limitations of generative AI are deeply entangled with structural and power inequalities, and we have to allow for that recognition within our AI development systems.
Discussion Questions
What does cultural representation mean to you, and how might this definition and your experience with past representation impact how you would evaluate representation of your identity in generative AI models? What aspects of your identity do you think you would center when evaluating representations in AI model output?
What do you think is the role of small-scale qualitative evaluations for building more ethical generative AI models? How do they compare to larger, quantitative, benchmark-style evaluations?
Participants in this study shared “aspirations” for generative AI to be more inclusive, but also noted the tensions around making it more inclusive. Do you think AI can be made more globally inclusive?
What mitigations do you think developers could explore to respond to some of the concerns raised by participants in this study?
As mentioned in this case study, representation is not static; it changes over time, is culturally situated, and varies greatly from person to person. How can we “encode” this contextual and changing nature of representation into models, datasets, and algorithms? Is the idea of encoding at odds with the dynamism and fluidity of representation?
How can we learn from the history of technology and media to build more responsible and representative AI models?
Acknowledgments
We thank Vinodkumar Prabhakaran, Michael Madaio, Gurleen Virk, Kathy Meier-Hellstern, Sarah Laszlo, and the anonymous reviewers for their valuable feedback on the paper. We also thank our study participants for sharing their time and expertise.
Bibliography
Ahmad, Fauzia. “Still ‘In Progress?’: Methodological Dilemmas, Tensions and Contradictions in Theorizing South Asian Muslim Women.” In South Asian Women in the Diaspora, edited by Nirmal Puwar and Parvati Raghuram, 43–66. London: Routledge, 2003.
Alkhatib, Ali. “To Live in Their Utopia: Why Algorithmic Systems Create Absurd Outcomes.” In CHI ’21: Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems, edited by Y. Kitamura et al., 1–9. New York: ACM, 2021. https://doi.org/10.1145/3411764.3445740.
Allen, James A. “The Color of Algorithms: An Analysis and Proposed Research Agenda for Deterring Algorithmic Redlining.” Fordham Urban Law Journal 46, no. 2 (2019): 219–70. https://ir.lawnet.fordham.edu/ulj/vol46/iss2/1.
Amrute, Sareeta, Ranjit Singh, and Rigoberto Lara Guzmán. “A Primer on AI in/from the Majority World: An Empirical Site and a Standpoint.” Data & Society, September 14, 2022. http://dx.doi.org/10.2139/ssrn.4199467.
Bansal, Hritik, Da Yin, Masoud Monajatipoor, and Kai-Wei Chang. “How Well Can Text-to-Image Generative Models Understand Ethical Natural Language Interventions?” Preprint submitted October 27, 2022. https://arxiv.org/abs/2210.15230.
Barabas, Chelsea, Colin Doyle, J. B. Rubinovitz, and Karthik Dinakar. “Studying Up: Reorienting the Study of Algorithmic Fairness around Issues of Power.” In FAccT ’20: Proceedings of the 2020 ACM Conference on Fairness, Accountability, and Transparency, edited by M. Hildebrandt et al., 167–76. New York: ACM, 2020. https://doi.org/10.1145/3351095.3372859.
Behdad, Ali, and Luke Gartlan, eds. Photography’s Orientalism: New Essays on Colonial Representation. Los Angeles: Getty Publications, 2013.
Bender, Emily M., Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell. “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?” In FAccT ’21: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, 610–23. New York: ACM, 2021. https://doi.org/10.1145/3442188.3445922.
Benjamin, Ruha. Race After Technology: Abolitionist Tools for the New Jim Code. Boston: Polity, 2019.
Bhagavan, Manu Belur, and Faisal Bari. “(Mis)Representing Economy: Western Media Production and the Impoverishment of South Asia.” Comparative Studies of South Asia, Africa and the Middle East 21, no. 1 (2001): 99–109. https://doi.org/10.1215/1089201X-21-1-2-99.
Bianchi, Federico, Pratyusha Kalluri, Esin Durmus, Faisal Ladhak, Myra Cheng, Debora Nozza, Tatsunori Hashimoto, Dan Jurafsky, James Zou, and Aylin Caliskan. “Easily Accessible Text-To-Image Generation Amplifies Demographic Stereotypes at Large Scale.” In FAccT ’23: Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency, 1493–504. New York: ACM, 2023. https://doi.org/10.1145/3593013.3594095.
Birhane, Abeba, and Vinay Uday Prabhu. “Large Image Datasets: A Pyrrhic Win for Computer Vision?” In 2021 IEEE Winter Conference on Applications of Computer Vision (WACV), 1536–46. New York: IEEE, 2021. https://doi.org/10.1109/WACV48630.2021.00158.
Birhane, Abeba, Vinay Uday Prabhu, and Emmanuel Kahembwe. “Multimodal Datasets: Misogyny, Pornography, and Malignant Stereotypes.” Preprint submitted October 5, 2021. https://arXiv.org/abs/2110.01963.
Birhane, Abeba, Elayne Ruane, Thomas Laurent, Matthew S. Brown, Johnathan Flowers, Anthony Ventresque, and Christopher L. Dancy. “The Forgotten Margins of AI Ethics.” In FAccT ’22: Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency, 948–58. New York: ACM, 2022. https://doi.org/10.1145/3531146.3533157.
Braun, Virginia, and Victoria Clarke. “One Size Fits All? What Counts as Quality Practice in (Reflexive) Thematic Analysis?” Qualitative Research in Psychology 18, no. 3 (2021): 328–52. https://doi.org/10.1080/14780887.2020.1769238.
Braun, Virginia, and Victoria Clarke. “Using Thematic Analysis in Psychology.” Qualitative Research in Psychology 3, no. 2 (2006): 77–101. https://doi.org/10.1191/1478088706qp063oa.
Breckenridge, Carol A., and Peter van der Veer. Orientalism and the Postcolonial Predicament: Perspectives on South Asia, 1–19. Philadelphia: University of Pennsylvania Press, 1993.
Brouilette, Sarah. “On the Entrepreneurial Ethos in Aravind Adiga’s The White Tiger.” In Re-Orientalism and South Asian Identity Politics: The Oriental Other Within, edited by Lisa Lau and Cristina Mendes, 40–55. London: Routledge, 2011.
Burr, Jennifer. “Cultural Stereotypes of Women from South Asian Communities: Mental Health Care Professionals’ Explanations for Patterns of Suicide and Depression.” Social Science & Medicine 55, no. 5 (2002): 835–45. https://doi.org/10.1016/S0277-9536(01)00220-9.
Chaudhuri, Shohini. “Snake Charmers and Child Brides: Deepa Mehta’s Water, ‘Exotic’ Representation, and the Cross-Cultural Spectatorship of South Asian Migrant Cinema.” South Asian Popular Culture 7, no. 1 (2009): 7–20. https://doi.org/10.1080/14746680802704956.
Desai, Dipti. “Imaging Difference: The Politics of Representation in Multicultural Art Education.” Studies in Art Education 41, no. 2 (2000): 114–29. https://doi.org/10.1080/00393541.2000.11651670.
Desai, Jigna. “Pulp Frictions.” In Re-Orientalism and South Asian Identity Politics: The Oriental Other Within, edited by Lisa Lau and Cristina Mendes, 72–88. London: Routledge, 2011.
Eriksson Krutrök, Moa, and Mathilda Åkerlund. “Through a White Lens: Black Victimhood, Visibility, and Whiteness in the Black Lives Matter Movement on TikTok.” Information, Communication & Society 26, no. 10 (2022): 1–19. https://doi.org/10.1080/1369118X.2022.2065211.
Ge, Songwei, and Devi Parikh. “Visual Conceptual Blending with Large-scale Language and Vision Models.” Preprint submitted June 27, 2021. https://arxiv.org/abs/2106.14127.
Hall, Stuart. Representation: Cultural Representations and Signifying Practices. London: Sage, 1997.
hooks, bell. Black Looks: Race and Representation. Boston: South End Press, 1992.
Hutchinson, Ben, Jason Baldridge, and Vinodkumar Prabhakaran. “Underspecification in Scene Description-to-Depiction Tasks.” In Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing, 1172–84. Kerrville, TX: Association for Computational Linguistics, 2022. https://aclanthology.org/2022.aacl-main.86.pdf.
Inden, Ronald. Imagining India. Bloomington: Indiana University Press, 2001.
Jiwani, Yasmin. “Helpless Maidens and Chivalrous Knights: Afghan Women in the Canadian Press.” University of Toronto Quarterly 78, no. 2 (2009): 728–44. https://doi.org/10.3138/utq.78.2.728.
Kak, Amba. “‘The Global South Is Everywhere, but Also Always Somewhere’: National Policy Narratives and AI Justice.” In AIES ’20: Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, edited by A. Markham et al., 307–12. New York: ACM, 2020. https://doi.org/10.1145/3375627.3375859.
Kapania, Shivani, Oliver Siy, Gabe Clapper, Azhagu Meena SP, and Nithya Sambasivan. “‘Because AI is 100% Right and Safe’: User Attitudes and Sources of AI Authority in India.” In CHI ’22: Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems, edited by S. Barbosa et al., 1–18. New York: ACM, 2022. https://doi.org/10.1145/3491102.3517533.
Karim, Karim H. “The Historical Resilience of Primary Stereotypes: Core Images of the Muslim Other.” In The Language and Politics of Exclusion: Others in Discourse, edited by Stephen H. Riggins, 153–82. Thousand Oaks, CA: Sage, 1997.
Karizat, Nadia, Dan Delmonaco, Motahhare Eslami, and Nazanin Andalibi. “Algorithmic Folk Theories and Identity: How TikTok Users Co-Produce Knowledge of Identity and Engage in Algorithmic Resistance.” Proceedings of the ACM on Human-Computer Interaction 5, no. CSCW2 (2021): 1–44. https://doi.org/10.1145/3476046.
Lau, Lisa. Re-Orientalism: The Perpetration and Development of Orientalism by Orientals. New York: Cambridge University Press, 2009.
Lau, Lisa, and Ana Cristina Mendes. “Introducing Re-Orientalism: A New Manifestation of Orientalism.” In Re-Orientalism and South Asian Identity Politics: The Oriental Other Within, 1–14. London: Routledge, 2011.
Maira, Sunaina. “Belly Dancing: Arab-Face, Orientalist Feminism, and US Empire.” American Quarterly 60, no. 2 (2008): 317–45. https://doi.org/10.1353/aq.0.0019.
Malreddy, Pavan Kumar. Orientalism, Terrorism, Indigenism: South Asian Readings in Postcolonialism. Thousand Oaks, CA: Sage, 2015.
Mohamed, Shakir, Marie-Therese Png, and William Isaac. “Decolonial AI: Decolonial Theory as Sociotechnical Foresight in Artificial Intelligence.” Philosophy & Technology 33 (2020): 659–84. https://doi.org/10.1007/s13347-020-00405-8.
Nacos, Brigitte L., and Oscar Torres-Reyna. “Framing Muslim-Americans before and after 9/11.” In Framing Terrorism: The News Media, The Government, and the Public, edited by Pippa Norris, Montague Kern, and Marion Just, 141–66. London: Routledge, 2004.
Oh, David. Whitewashing the Movies: Asian Erasure and White Subjectivity in US Film Culture. New Brunswick, NJ: Rutgers University Press, 2021.
Paullada, Amandalynne, Inioluwa Deborah Raji, Emily M. Bender, Remi Denton, and Alex Hanna. “Data and Its (Dis)contents: A Survey of Dataset Development and Use in Machine Learning Research.” Patterns 2, no. 11 (2021): 100336. https://doi.org/10.1016/j.patter.2021.100336.
Png, Marie-Therese. “At the Tensions of South and North: Critical Roles of Global South Stakeholders in AI Governance.” In FAccT ’22: Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency, 1434–45. New York: ACM, 2022. https://doi.org/10.1145/3531146.3533200.
Prabhakaran, Vinodkumar, Rida Qadri, and Ben Hutchinson. “Cultural Incongruencies in Artificial Intelligence.” Preprint submitted November 19, 2022. https://arXiv.org/abs/2211.13069.
Ramesh, Aditya, Prafulla Dhariwal, Alex Nichol, Casey Chu, and Mark Chen. “Hierarchical Text-Conditional Image Generation with CLIP Latents.” Preprint submitted April 13, 2022. https://arXiv.org/abs/2204.06125.
Rao, Anupama. The Caste Question: Dalits and the Politics of Modern India. Berkeley: University of California Press, 2009.
Rombach, Robin, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. “High-Resolution Image Synthesis with Latent Diffusion Models.” Preprint submitted December 20, 2021. https://arxiv.org/abs/2112.10752.
Rubinstein, Daniel, and Katrina Sluis. “The Digital Image in Photographic Culture: Algorithmic Photography and the Crisis of Representation.” In The Photographic Image in Digital Culture, edited by Martin Lister, 22–40. London: Routledge, 2013.
Saharia, Chitwan, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Remi Denton, Seyed Kamyar Seyed Ghasemipour et al. “Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding.” Preprint submitted May 23, 2022. https://arXiv.org/abs/2205.11487.
Sambasivan, Nithya. “All Equation, No Human: The Myopia of AI Models.” Interactions 29, no. 2 (2022): 78–80. https://doi.org/10.1145/3516515.
Sambasivan, Nithya, Erin Arnesen, Ben Hutchinson, and Vinodkumar Prabhakaran. “Non-portability of Algorithmic Fairness in India.” Preprint submitted December 3, 2020. https://arXiv.org/abs/2012.03659.
Sambasivan, Nithya, Erin Arnesen, Ben Hutchinson, Tulsee Doshi, and Vinodkumar Prabhakaran. “Re-Imagining Algorithmic Fairness in India and Beyond.” In FAccT ’21: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, 315–28. New York: ACM, 2021. https://doi.org/10.1145/3442188.3445896.
Shelby, Renee, Shalaleh Rismani, Kathryn Henne, AJung Moon, Negar Rostamzadeh, Paul Nicholas, N'Mah Yilla, Jess Gallegos et al. “Sociotechnical Harms: Scoping a Taxonomy for Harm Reduction.” Preprint submitted October 11, 2022. https://arXiv.org/abs/2210.05791.
Sonbol, Amira El Azhary, ed. Beyond the Exotic: Women’s Histories in Islamic Societies. Syracuse, NY: Syracuse University Press, 2005.
Srinivasan, Ramya, and Kanji Uchino. “Biases in Generative Art: A Causal Look from the Lens of Art History.” In FAccT ’21: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, 41–51. New York: ACM, 2021. https://doi.org/10.1145/3442188.3445869.
Suresh, Harini, Rajiv Movva, Amelia Lee Dogan, Rahul Bhargava, Isadora Cruxen, Angeles Martinez Cuba, Guilia Taurino, Wonyoung So, and Catherine D’Ignazio. “Towards Intersectional Feminist and Participatory ML: A Case Study in Supporting Feminicide Counterdata Collection.” In FAccT ’22: Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency, 667–78. New York: ACM, 2022. https://doi.org/10.1145/3531146.3533132.
Tomasev, Nenad, Jonathan Leader Maynard, and Iason Gabriel. “Manifestations of Xenophobia in AI Systems.” Preprint submitted December 15, 2022. https://arXiv.org/abs/2212.07877.
Wang, Angelina, Solon Barocas, Kristen Laird, and Hanna Wallach. “Measuring Representational Harms in Image Captioning.” In FAccT ’22: Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency, 324–35. New York: ACM, 2022. https://doi.org/10.1145/3531146.3533099.
Weidinger, Laura, Jonathan Uesato, Maribeth Rauh, Conor Griffin, Po-Sen Huang, John Mellor, Amelia Glaese et al. “Taxonomy of Risks Posed by Language Models.” In FAccT ’22: Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency, 214–29. New York: ACM, 2022. https://doi.org/10.1145/3531146.3533088.
Weinberg, Lindsay. “Rethinking Fairness: An Interdisciplinary Survey of Critiques of Hegemonic ML Fairness Approaches.” Journal of Artificial Intelligence Research 74 (2022): 75–109. https://doi.org/10.1613/jair.1.13196.
Wolfe, Robert, Yiwei Yang, Bill Howe, and Aylin Caliskan. “Contrastive Language-Vision AI Models Pretrained on Web-Scraped Multimodal Data Exhibit Sexual Objectification Bias.” In FAccT ’23: Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency, 1174–85. New York: ACM, 2023. https://doi.org/10.1145/3593013.3594072.
Yu, Jiahui, Yuanzhong Xu, Jing Yu Koh, Thang Luong, Gunjan Baid, Zirui Wang, Vijay Vasudevan et al. “Scaling Autoregressive Models for Content-Rich Text-to-Image Generation.” Preprint submitted June 22, 2022. https://arXiv.org/abs/2206.10789.