As generative AI becomes a default gateway to information, it risks deepening long-standing imbalances in what knowledge gets seen and validated. The essay argues that large language models, trained primarily on English-language, Western-centric data, underplay local languages and oral traditions, diminishing indigenous ecological know-how, traditional health practices, and region-specific craft. Evidence spans Common Crawl's English dominance (45%, against 0.2% for Hindi and 0.04% for Tamil) and model dynamics such as "mode amplification" and RLHF, both of which favor mainstream answers. The result, researchers warn, could be a "knowledge collapse": as AI-generated content increasingly feeds back into future training sets, the long tail of human experience is progressively crowded out. Case studies, from Bengaluru's sidelined lake-management traditions to the limits of AI tutors in capturing local context, illustrate how institutional incentives and liability concerns privilege research-backed advice over community practices. The piece contends that without new data strategies, validation methods, and participatory design, AI could entrench cultural hegemony and erase valuable, place-based wisdom at the very moment climate and development challenges most require it.
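To make the recursive feedback dynamic concrete, here is a minimal, hypothetical simulation of "knowledge collapse" in the spirit of the "Model Collapse" and "Curse of Recursion" papers listed below: a Zipf-weighted set of topics is repeatedly sampled into a finite synthetic corpus, and the distribution is re-estimated from that corpus each generation. The topic count, Zipf weighting, corpus size, and the `tail_mass` helper are illustrative assumptions, not figures or code from the essay.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical knowledge distribution: 1,000 topics with Zipf weights,
# so a handful of mainstream topics dominate a long tail of local knowledge.
# (Topic count, weighting, and corpus size are illustrative assumptions.)
n_topics = 1000
probs = 1.0 / np.arange(1, n_topics + 1)
probs /= probs.sum()

def tail_mass(p, k=10):
    # Probability mass held by everything outside the top-k topics.
    return 1.0 - np.sort(p)[::-1][:k].sum()

# One "generation": publish a finite synthetic corpus sampled from the
# current model, then refit the model to that corpus by simple
# maximum-likelihood re-estimation of topic frequencies.
corpus_size = 5_000
for gen in range(10):
    counts = rng.multinomial(corpus_size, probs)
    probs = counts / counts.sum()
    print(f"gen {gen}: mass outside top-10 topics = {tail_mass(probs):.3f}")
```

Run for a few generations, the tail mass shrinks steadily: any rare topic that draws zero samples can never reappear, the toy analogue of long-tail knowledge dropping out of future training sets while mainstream answers are amplified.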
Related articles:
— Model Collapse: When Foundation Models Are Trained on Their Own Generated Data
— The Curse of Recursion: Training on Generated Data Makes Models Forget
— Decolonial AI: Decolonial Theory as Sociotechnical Foresight in Artificial Intelligence
— Whose Opinions Do Language Models Reflect?
— Beyond English-Centric Multilingual Machine Translation