Why in the News?
India has made its first major push into foundational AI model training by releasing domestically developed 35B- and 105B-parameter LLMs using subsidised Graphics Processing Unit (GPU) infrastructure under the IndiaAI Mission. With over 36,000 GPUs commissioned and 4,096 allocated to select firms, the move marks a strategic shift from dependence on foreign frontier models to state-supported indigenous AI capability.
Why Is Training Large Language Models on Indian Soil Financially and Logistically Challenging?
- GPU Dependence: Model training and inference require high-end GPUs; combined hardware and electricity costs run into millions of dollars.
- Electricity Intensity: Compute-heavy training increases power consumption and operational expenses.
- Capital Requirements: Large upfront investment limits private-sector experimentation in foundational AI.
- Data Constraints: Internet training corpora disproportionately represent English and European languages.
- Token Inefficiency: English-centric tokenisers split Indian-language text into more tokens, and translation layers add further overhead, increasing inference cost.
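The token-cost point above can be illustrated with back-of-envelope arithmetic. The token counts and per-token price below are assumed purely for illustration, not measured figures for any real tokeniser or provider:

```python
# Illustrative only: token counts and the per-token rate are assumptions,
# not measurements. English-centric tokenisers often fragment Devanagari
# and other Indic scripts into more tokens than equivalent English text.
PRICE_PER_1K_TOKENS = 0.002   # hypothetical USD rate per 1,000 tokens

def inference_cost(tokens: int, rate: float = PRICE_PER_1K_TOKENS) -> float:
    """Cost of a single inference call billed per token."""
    return tokens / 1000 * rate

english_tokens = 100   # assumed token count for a sample English prompt
hindi_tokens = 250     # assumed count for the same prompt in Hindi

ratio = hindi_tokens / english_tokens
print(f"English prompt: ${inference_cost(english_tokens):.4f}")
print(f"Hindi prompt:   ${inference_cost(hindi_tokens):.4f}")
print(f"Same content costs {ratio:.1f}x in Hindi")  # 2.5x under these assumptions
```

Under these assumed numbers, serving the same content in Hindi costs 2.5 times as much per request, which is why native-tokeniser, native-language models can be structurally cheaper for vernacular workloads.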
How Has the IndiaAI Mission Lowered Entry Barriers for Domestic AI Firms?
- Public Compute Infrastructure: Commissioned 36,000+ GPUs in domestic data centres operated by firms such as Yotta.
- Cluster Allocation: Provided 4,096 GPUs through a shared government compute facility.
- Subsidised Access: Enabled startups and researchers to train and deploy models at nominal fees.
- Institutional Facilitation: Ministry of Electronics and Information Technology supports long-term indigenous AI capacity.
- Ecosystem Development: Encourages domestic research, experimentation, and AI entrepreneurship.
How Does the Mixture of Experts (MoE) Architecture Improve Cost Efficiency in Model Deployment?
- Selective Activation: Activates only a fraction of parameters during inference rather than the full network.
- Compute Reduction: Lowers electricity consumption compared to dense models.
- Inference Efficiency: Enables models as large as 105B parameters to run at lower operational cost.
- Scalable Design: Allows domestic firms to optimise performance without matching trillion-parameter scale.
- Cost Competitiveness: Enhances feasibility of AI deployment in education, healthcare, and governance contexts.
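The selective-activation idea described above can be sketched as a toy routing function. The expert count, layer sizes, and top-k value here are illustrative assumptions, not the architecture of any specific Indian model:

```python
import numpy as np

def moe_forward(x, experts, gate_w, top_k=2):
    """Toy Mixture-of-Experts inference step: a gating network scores all
    experts, but only the top_k highest-scoring experts actually run, so
    most parameters stay inactive for any given input."""
    logits = x @ gate_w                       # one score per expert
    active = np.argsort(logits)[-top_k:]      # indices of the chosen experts
    weights = np.exp(logits[active])
    weights /= weights.sum()                  # softmax over chosen experts only
    # Only the selected experts compute; the rest are skipped entirely.
    output = sum(w * experts[i](x) for i, w in zip(active, weights))
    return output, active

# Hypothetical setup: 8 small linear "experts" of dimension 4.
rng = np.random.default_rng(0)
d, num_experts = 4, 8
expert_mats = [rng.standard_normal((d, d)) for _ in range(num_experts)]
experts = [lambda v, M=M: v @ M for M in expert_mats]
gate_w = rng.standard_normal((d, num_experts))

x = rng.standard_normal(d)
y, active = moe_forward(x, experts, gate_w, top_k=2)
print(f"{len(active)} of {num_experts} experts activated")  # 2 of 8
```

With top_k=2 of 8 experts active, only a quarter of the expert parameters are touched per token, which is the mechanism behind the compute and electricity savings the bullets describe.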
Does Parameter Size Alone Determine Strategic AI Capability?
- Model Scale: Domestic models at 35B and 105B parameters remain smaller than global frontier systems.
- Contextual Alignment: Designed for Indian languages and domestic sectoral use.
- Sector-Specific Model: A 17B multilingual model developed for education and healthcare applications.
- Incremental Scaling Strategy: Prioritises contextual performance before expanding model size.
- Capability Gap: Comparative benchmarking with frontier systems remains limited.
How Does Linguistic Data Imbalance Affect Digital Inclusion?
- Language Dominance: English and European languages dominate global internet datasets.
- Indian Language Underrepresentation: Limits model accuracy in vernacular contexts.
- Translation Dependence: Machine translation remains inferior to native-language modelling.
- Governance Impact: Weak vernacular performance may affect citizen-facing digital services.
- Inclusion Objective: Indigenous LLMs aim to strengthen equitable AI access.
What Transparency and Accountability Concerns Arise from Publicly Funded AI Infrastructure?
- Open-Source Ambiguity: Models described as open but not fully accessible on major global platforms.
- Limited Independent Scrutiny: Restricted external evaluation affects benchmarking.
- Public Investment Oversight: Large-scale GPU subsidies require measurable performance assessment.
- Benchmark Transparency: Absence of publicly standardised comparison metrics.
- Energy Governance: Limited disclosure of sustainability audits for compute-intensive infrastructure.
Way Forward: Strengthening Indigenous AI Capacity
- Transparent Benchmarking: Establish clear performance metrics for publicly funded LLMs against global standards to ensure accountability.
- Green Compute Standards: Mandate energy-efficiency norms and renewable integration for GPU-intensive data centres.
- Vernacular Data Expansion: Build high-quality Indian language datasets through public–private collaboration.
- Outcome-Linked Subsidy: Link GPU allocation and funding to measurable innovation and adoption outcomes.
- Regulatory Framework: Define standards for data governance, algorithmic transparency, and institutional accountability.
Conclusion
India’s entry into foundational LLM training marks a shift from AI consumption to domestic capability creation. Public compute subsidies under the IndiaAI Mission reduce entry barriers but require transparent benchmarking, fiscal oversight, and sustainability safeguards. Long-term competitiveness will depend on strengthening vernacular data ecosystems, improving cost-efficient architectures, and institutionalising regulatory accountability.
PYQ Relevance
[UPSC 2023] Introduce the concept of Artificial Intelligence (AI). How does AI help clinical diagnosis? Do you perceive any threat to privacy of the individual in the use of AI in healthcare?
Linkage: Indigenous LLM development strengthens AI capability for governance and sectoral applications such as healthcare diagnostics. It simultaneously raises concerns of data protection, algorithmic transparency, and privacy, which are core issues highlighted in the 2023 AI question.