The Sarvam AI Voice API is probably the most interesting thing to come out of Indian AI this year, and I say that having watched this space closely for a while. If you've ever tried building a voice app in Hindi, or Tamil, or Bengali, or Marathi, you know the pain. English TTS sounds fine. Anything in a regional language sounds like a robot having a stroke. Sarvam is trying to fix that, and honestly, they're getting pretty close.
What is the Sarvam AI Voice API?
Sarvam AI is a Bengaluru-based startup building AI specifically for Indian languages from the ground up. Not fine-tuned English models. Not translated outputs. Models trained with Indian language data at the core.
Their voice stack has two main pieces: Bulbul (text-to-speech) and Saaras (speech-to-text). The TTS side just got a significant upgrade. Bulbul V3 launched in February 2026 as part of what the company called a 14-day product launch blitz ahead of the India AI Impact Summit. V3 is described as "natural, expressive, and production-ready," which matters because earlier versions were good but had a slightly uncanny quality at emotional peaks.
The API supports 10+ Indian languages including Hindi, Tamil, Telugu, Kannada, Malayalam, Bengali, Marathi, Gujarati, Punjabi, and Odia. You can generate speech, transcribe audio, and with their voice agent stack, build full conversational voice applications.
Google CEO Sundar Pichai has publicly mentioned being impressed with Sarvam's work. That's worth noting, but what actually matters for you is whether it works for your specific use case.
Bulbul V3: what changed in the latest TTS model
Bulbul V3 is the text-to-speech model you'd use to give your app a voice. The improvements in V3 are mostly around prosody: how the voice rises and falls, where it pauses, and how it handles punctuation. Earlier versions were intelligible but a bit flat. V3 sounds more like someone actually reading with intent.
A few things worth knowing:
- Multiple speaker personas available for Hindi and some other languages
- Supports pace and emphasis controls similar to SSML
- Low latency, suitable for real-time applications
- Handles code-switching reasonably well (Hindi sentences with embedded English terms, which is just how people actually talk)
I tried generating sample phrases in Hindi and Bengali. Hindi is genuinely impressive: the intonation feels natural, not performative. Bengali is decent but not quite at the same level yet. The gap between languages is real, and you should test your specific language before committing.
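One way to act on that advice is to render the same sentence in each language you care about and listen to the results side by side. The sketch below only builds the request bodies and sends nothing. The field names and the "meera" speaker are copied from this article's minimal Python call under "Getting started" and should be treated as assumptions about the current API; `build_test_payloads` is a hypothetical helper name, and the Bengali sentence is an illustrative equivalent of the Hindi one.

```python
# Sample sentences keyed by language code. The Hindi line is the one used
# in this article's API example; the Bengali line is an illustrative equivalent.
SAMPLES = {
    "hi-IN": "नमस्ते, आपका ऑर्डर तैयार है।",
    "bn-IN": "নমস্কার, আপনার অর্ডার প্রস্তুত।",
}


def build_test_payloads(samples, speaker="meera", pace=1.0):
    """Return one request body per language for side-by-side listening tests."""
    payloads = []
    for language_code, text in samples.items():
        payloads.append({
            "inputs": [text],
            "target_language_code": language_code,
            # Assumption: the same speaker is available for every language.
            "speaker": speaker,
            "pace": pace,
            "enable_preprocessing": True,
        })
    return payloads


payloads = build_test_payloads(SAMPLES)
for body in payloads:
    print(body["target_language_code"], "->", body["inputs"][0])
```

Keeping the text and settings identical across languages means any difference you hear comes from the voice model rather than from your inputs.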
Getting started: the API basics
The API is REST-based, which means you can call it from any language: Python, Node, PHP, whatever you're using. Here's the basic flow:
- Sign up at sarvam.ai and generate an API key from the developer dashboard
- For TTS, send a POST request to the /text-to-speech endpoint with your text, language code, and speaker preference
- Get back a base64-encoded audio file in WAV format
- Decode and play it in your app
A minimal Python call looks something like this:
```python
import requests

# Send one Hindi sentence to the TTS endpoint.
response = requests.post(
    "https://api.sarvam.ai/text-to-speech",
    headers={"API-Subscription-Key": "your_key_here"},
    json={
        "inputs": ["नमस्ते, आपका ऑर्डर तैयार है।"],
        "target_language_code": "hi-IN",
        "speaker": "meera",
        "pitch": 0,
        "pace": 1.0,
        "loudness": 1.5,
        "enable_preprocessing": True,
    },
)
response.raise_for_status()  # fail loudly on a bad key or an invalid payload
```
The language codes follow the familiar language-region pattern: hi-IN for Hindi, ta-IN for Tamil, bn-IN for Bengali. The enable_preprocessing flag is useful. It handles numbers, dates, and abbreviations more naturally, converting "500" to "paanch sau" rather than reading out individual digits. Turn it on by default.
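The last two steps of the flow, getting base64 audio back and decoding it, could look like the sketch below. It assumes the JSON response carries a list of base64-encoded WAV clips under an `audios` key, which is an assumption to verify against the current API reference. `save_first_audio` and `_fake_response` are hypothetical helpers, and the demo feeds in synthetic silence so it runs without an API key.

```python
import base64
import io
import os
import tempfile
import wave


def save_first_audio(response_json, out_path):
    """Decode the first base64 clip in a TTS response and write it to out_path.

    Assumes a body shaped like {"audios": ["<base64 WAV>", ...]}.
    Returns the clip duration in seconds.
    """
    audios = response_json.get("audios") or []
    if not audios:
        raise ValueError("response contained no audio clips")
    wav_bytes = base64.b64decode(audios[0])
    with open(out_path, "wb") as f:
        f.write(wav_bytes)
    # Parse the header to confirm the bytes really are a WAV file.
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def _fake_response(seconds=0.5, rate=8000):
    """Build a stand-in response holding silent 16-bit mono audio."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return {"audios": [base64.b64encode(buf.getvalue()).decode("ascii")]}


out_path = os.path.join(tempfile.gettempdir(), "order_ready.wav")
duration = save_first_audio(_fake_response(), out_path)
print(f"wrote {out_path} ({duration:.2f}s)")
```

With a real call you would pass `response.json()` in place of the stand-in.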
Pricing and the free tier in 2026
Sarvam offers a free tier that's genuinely usable for development and light testing. For production, pricing is based on character count for TTS and audio duration for STT. I'd check the official pricing page directly since they've been updating tiers during their launch period, but the rates sit well below what you'd pay for ElevenLabs for Indian language content.
For a small app sending around 10,000 short TTS requests a month (order confirmations, delivery updates, form readouts), you're likely looking at under ₹1,000 a month. That's workable even for a bootstrapped product.
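To sanity-check an estimate like this against whatever the pricing page currently says, the arithmetic can be sketched as below. No real Sarvam rate is used here: the per-10,000-character billing unit and the 100-character average message length are both assumptions for illustration, and the function names are hypothetical.

```python
def monthly_cost_inr(requests_per_month, avg_chars_per_request, rate_per_10k_chars):
    """Monthly TTS bill in INR for a given per-10,000-character rate."""
    total_chars = requests_per_month * avg_chars_per_request
    return total_chars / 10_000 * rate_per_10k_chars


def max_rate_per_10k_chars(monthly_budget_inr, requests_per_month, avg_chars_per_request):
    """Highest per-10,000-character rate (INR) that keeps the bill within budget."""
    total_chars = requests_per_month * avg_chars_per_request
    return monthly_budget_inr / (total_chars / 10_000)


# Assumption: a short notification averages about 100 characters,
# so 10,000 requests come to 1,000,000 characters a month.
break_even = max_rate_per_10k_chars(1_000, 10_000, 100)
print(break_even)  # 10.0
print(monthly_cost_inr(10_000, 100, break_even))  # 1000.0
```

Under those assumptions, the bill stays below ₹1,000 as long as the published rate is below ₹10 per 10,000 characters. Plug in your own message length and the current rate to redo the check.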
For startups, Sarvam runs an active startup program in 2026 with free API credits and technical support. Worth applying if you're building at any serious scale.
Real deployments already live in India
This isn't theoretical. Swiggy has partnered with Sarvam AI to deploy multilingual voice-based shopping across food delivery, Instamart, and Dineout. Users can now place orders by voice in their preferred language. That's a production deployment at real scale, and it tells you something about where enterprise confidence in this API currently stands.
The obvious use cases beyond food delivery:
- IVR systems: replace the robotic English voice in customer service calls with natural Hindi or regional audio
- EdTech apps: read lessons, questions, or feedback aloud in the student's mother tongue
- Accessibility tools: read forms, documents, or government portals aloud for users less comfortable with text
- E-commerce notifications: order confirmations and delivery updates in the customer's language
- Agriculture and rural apps: voice interfaces for users who aren't fluent in English or even standard Hindi
That last one is underappreciated. A lot of government-adjacent apps and agri-tech products struggle because they're built for English-comfortable users. Voice in the right language changes the entire experience, especially for Jan Dhan account holders, PMFBY beneficiaries, and people interacting with digital services for the first time.
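Use cases like reading lessons, forms, or documents aloud involve text much longer than a one-line notification, and TTS APIs generally cap how much text a single input can hold. A minimal sketch of splitting long text on sentence boundaries, including the Devanagari danda, is shown below. The 500-character default is a placeholder rather than a documented Sarvam limit, and `chunk_text` is a hypothetical helper.

```python
import re

# Split after Latin sentence punctuation or the Devanagari danda (। or ॥).
_SENTENCE_END = re.compile(r"(?<=[.!?।॥])\s+")


def chunk_text(text, max_chars=500):
    """Group whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept intact rather than cut
    mid-word, so callers should still check lengths if the limit is strict.
    """
    chunks, current = [], ""
    for sentence in _SENTENCE_END.split(text.strip()):
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow, so close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


lesson = "पाठ एक। यह पहला वाक्य है। यह दूसरा वाक्य है।"
for chunk in chunk_text(lesson, max_chars=30):
    print(len(chunk), chunk)
```

Each chunk can then be sent as its own request, or as separate entries in the request's text list if the API accepts several at once, and the resulting clips played in order.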
How Sarvam AI compares to other voice API options
| Feature | Sarvam AI | ElevenLabs | Google Cloud TTS | Azure Neural TTS |
|---|---|---|---|---|
| Hindi voice quality | Excellent | Decent | Good | Good |
| Regional Indian languages | 10+ natively | Limited | Most major ones | Most major ones |
| Code-switching (Hinglish) | Yes | Partial | Limited | Limited |
| India-friendly pricing (INR) | Yes | USD only | USD billing | USD billing |
| Free developer tier | Yes | Yes (limited) | Yes (limited) | Yes (limited) |
| Full voice agent stack | Yes, available | Yes | Requires assembly | Requires assembly |
For English, ElevenLabs still wins on expressiveness and voice variety. That's just the truth. But for Hindi and regional Indian languages, Sarvam is ahead, and it's not even close for languages like Odia or Marathi where Western providers have done minimal work.
Google Cloud TTS has been around longer and probably edges ahead on reliability at very high volume. But their Indian language voices still feel formal and stiff next to Bulbul V3.
The open-source models and what they mean for developers
Something that doesn't get enough attention: Sarvam has open-sourced their 30B and 105B language models. If you're a researcher, a university, or a startup that wants to self-host and fine-tune for a specific domain, this is a real advantage. Most voice AI companies treat their models as completely closed. Sarvam's approach means you're not permanently locked in, and you can adapt models to medical terminology, legal language, or specific regional dialects.
The open models won't replace the API for most production use cases. Hosting a 30B model isn't trivial in terms of GPU cost. But for experimentation and academic research, having the weights available matters. And for context, IndiaAI's compute platform offers subsidized GPU access for Indian researchers, which makes self-hosting more feasible than it sounds.
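A rough way to see why hosting is non-trivial is to estimate the memory the weights alone occupy: parameter count times bytes per parameter. The sketch below is back-of-the-envelope arithmetic, not a sizing guide. It ignores the KV cache, activations, and runtime overhead, so real serving needs noticeably more, and the precision options listed are generic choices rather than formats Sarvam is known to ship.

```python
# Bytes needed to store one parameter at common precisions.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(params_billion, precision="fp16"):
    """Memory for the model weights alone, in GB (10^9 bytes)."""
    # (params_billion * 1e9 params) * bytes_per_param / 1e9 bytes per GB
    return params_billion * BYTES_PER_PARAM[precision]


for size in (30, 105):
    for precision in ("fp16", "int4"):
        gb = weight_memory_gb(size, precision)
        print(f"{size}B @ {precision}: ~{gb:.1f} GB for weights")
```

At 16-bit precision a 30B model needs about 60 GB for weights alone, and a 105B model about 210 GB, which already exceeds the memory of most single GPUs. That is the gap subsidized compute or quantization has to close.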
Should you build on Sarvam AI in 2026?
Sarvam is reportedly in advanced talks to raise $300-350 million at a valuation of $1.5-1.55 billion. That signals enough runway to keep the platform developing. If you're building something serious on top of their API, platform longevity matters, and Sarvam looks stable compared to many smaller Indian AI voice startups with uncertain funding.
The limitations are real. Some smaller language variants still sound synthetic. The documentation has gaps you'll hit through trial and error rather than reading. Voice cloning isn't in the public API yet. And for very high-volume deployments, you'll want to run your own latency benchmarks rather than trusting marketing claims.
But for a developer in India who wants to add Hindi or regional language voice to an app without paying USD-denominated pricing to a foreign provider, the Sarvam AI Voice API is the most complete, production-ready option available right now. The Swiggy integration alone tells you it can handle real traffic at production scale.
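For the latency benchmarking recommended above, a small harness along these lines is usually enough to get started. It times any callable and reports median and tail latency. `benchmark` and `fake_tts_call` are hypothetical names, and the stand-in call simply sleeps so the example runs without an API key.

```python
import math
import statistics
import time


def benchmark(call, runs=20):
    """Time call() repeatedly and summarize latency in milliseconds."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        samples.append((time.perf_counter() - start) * 1000)
    samples.sort()
    # Nearest-rank p95: the smallest sample that 95% of runs fall at or below.
    p95_index = math.ceil(0.95 * len(samples)) - 1
    return {
        "runs": runs,
        "p50_ms": statistics.median(samples),
        "p95_ms": samples[p95_index],
        "max_ms": samples[-1],
    }


def fake_tts_call():
    """Stand-in for a real request; swap in your own requests.post(...) call."""
    time.sleep(0.01)


print(benchmark(fake_tts_call, runs=10))
```

Run it from the region where your servers live, with text lengths typical of your app, and look at p95 rather than the average, since tail latency is what users notice in a live voice conversation.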