India Reviews Progress of Sovereign AI Initiative BharatGen as It Advances Multimodal Capabilities

09:24, 26 November

Edited by: Vera Mo

The status of BharatGen, India's first sovereign large language model (LLM) initiative, was formally reviewed on November 25, 2025, at the Indian Institute of Technology Bombay (IIT Bombay). Dr. Jitendra Singh, Union Minister of State (Independent Charge) for Science & Technology, chaired the review, at which Professor Ganesh Ramakrishnan, Professor-in-Charge of BharatGen, presented the project's progress and positioned it as a forthcoming national artificial intelligence asset. The presentation underscored the project's aim of indigenous self-reliance in generative AI, in line with the country's strategic digital objectives.

BharatGen is designed to reflect India's linguistic, cultural, and social diversity, with stated support for more than 22 Indian languages. The system combines three modalities: text processing, speech recognition and generation, and document vision. Together these allow it to interpret and generate information in the ways Indian citizens naturally communicate. The project supports the vision articulated by Prime Minister Narendra Modi of technology solutions rooted in India's national strengths and heritage.

The project is funded from two sources. The Department of Science and Technology (DST), through the National Mission on Interdisciplinary Cyber-Physical Systems (NM-ICPS), channeled an initial allocation of ₹235 crore through the Technology Innovation Hub at IIT Bombay. The Ministry of Electronics and Information Technology (MeitY) has added a further ₹1,058 crore under the India AI Mission, signaling high-level government commitment to the project's scale.

Several foundational models were presented during the review. Param-1, the primary text-based LLM, has 2.9 billion parameters and was trained on 7.5 trillion tokens, with over one-third of the data focused on Indian-centric content. It is complemented by Shrutam, an Automatic Speech Recognition (ASR) system with 30 million parameters, and Sooktam, a Text-to-Speech (TTS) model with 150 million parameters that operates in nine Indic languages. The initiative has also produced Patram, described as India's first document-vision model, which has 7 billion parameters and was trained on 2.5 billion tokens to interpret complex Indian document formats.
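For a sense of scale, the Param-1 figures reported above can be turned into two derived numbers: training tokens per parameter, and a lower bound on the volume of Indian-centric training data. The short Python sketch below is only back-of-the-envelope arithmetic on the article's figures. The variable and function names are illustrative, and "over one-third" is treated as exactly one-third, so the second result is a floor rather than a reported statistic.

```python
# Figures for Param-1 as reported in the article.
PARAM1_PARAMETERS = 2.9e9   # 2.9 billion parameters
PARAM1_TOKENS = 7.5e12      # 7.5 trillion training tokens

# Assumption: "over one-third" is treated as exactly 1/3,
# which makes the derived figure a lower bound.
INDIC_SHARE_LOWER_BOUND = 1 / 3


def tokens_per_parameter(tokens: float, parameters: float) -> float:
    """Return how many training tokens were seen per model parameter."""
    return tokens / parameters


def indic_tokens_lower_bound(tokens: float, share: float) -> float:
    """Return the minimum number of Indian-centric training tokens."""
    return tokens * share


ratio = tokens_per_parameter(PARAM1_TOKENS, PARAM1_PARAMETERS)
indic = indic_tokens_lower_bound(PARAM1_TOKENS, INDIC_SHARE_LOWER_BOUND)

print(f"Tokens per parameter: {ratio:,.0f}")
print(f"Indian-centric tokens: at least {indic / 1e12:.1f} trillion")
```

On these figures, Param-1 saw roughly 2,600 training tokens per parameter, and at least 2.5 trillion of its training tokens were Indian-centric.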

To demonstrate practical utility, the team showed proof-of-concept applications, including Krishi Sathi, a voice-enabled tool delivered via WhatsApp that provides information to farmers. Central to the project's long-term autonomy is Bharat Data Sagar, a repository established to keep India's digital knowledge resources under Indian governance. The effort is carried out by a consortium of premier technical institutions, including the Indian Institute of Technology Madras, the International Institute of Information Technology Hyderabad, and the Indian Institute of Technology Kanpur, which pool their expertise to build a robust and inclusive AI framework.