Introduction
What if your website could literally talk to your users? Not through pre-recorded audio files or complicated audio libraries, but using the voices already built into their browser?
The Web Speech API makes this possible. With just a few lines of JavaScript, you can transform any text into spoken words, customize the voice, pitch, and speed, and even track which word is being spoken in real-time. Try it out:
Your browser doesn't support the Speech Synthesis API. Try using a modern version of Chrome, Safari, Firefox, or Edge.
Pretty cool, right? This technology opens up incredible possibilities: accessibility tools that read content aloud, language learning apps with pronunciation practice, interactive storytelling with character voices, and creative experiments you haven’t even imagined yet.
In this article, we’ll explore the Speech Synthesis API from the ground up. We’ll start with the basics, progressively build up to advanced patterns, and create plenty of interactive demos you can play with along the way. By the end, you’ll have all the tools you need to give your web apps a voice.
Browser Compatibility Note: The Speech Synthesis API is supported in modern browsers (Chrome, Safari, Edge, Firefox), but voice availability and behavior can vary significantly across platforms. iOS Safari, for example, has more limited voice options than desktop Chrome. We’ll explore these differences throughout this article.
Your First Words: The Basics
The Speech Synthesis API consists of two main pieces: the speechSynthesis object (the controller) and SpeechSynthesisUtterance (the thing being spoken). Think of it like a music player: speechSynthesis is the play/pause/stop controls, while SpeechSynthesisUtterance is the track you want to play.
Here’s the absolute simplest example:
// Your browser's built-in text-to-speech
const utterance = new SpeechSynthesisUtterance("Hello, world!");
speechSynthesis.speak(utterance);
That’s it! Just two lines of code. Let’s break down what’s happening:
- Create an utterance: `new SpeechSynthesisUtterance("Hello, world!")` creates a speech request containing the text you want spoken.
- Speak it: `speechSynthesis.speak(utterance)` adds your utterance to the speech queue and starts speaking it.
The speech synthesis system uses a queue, which means if you call speak() multiple times, each utterance will be spoken in order. Think of it like a playlist - one finishes, then the next begins.
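For instance, splitting a paragraph into sentences and calling speak() once per sentence queues them to play back-to-back. Here's a minimal sketch (`toSentences` and `speakInOrder` are hypothetical helpers, not part of the API):

```javascript
// Hypothetical helper: split text into sentences, keeping the punctuation.
function toSentences(text) {
  const matches = text.match(/[^.!?]+[.!?]+/g);
  return matches ? matches.map((s) => s.trim()) : [text];
}

// Queue one utterance per sentence; the browser plays them in order.
function speakInOrder(text) {
  const sentences = toSentences(text);
  // Guarded so this sketch is also safe outside the browser (e.g. SSR).
  if (typeof speechSynthesis !== 'undefined') {
    for (const sentence of sentences) {
      speechSynthesis.speak(new SpeechSynthesisUtterance(sentence));
    }
  }
  return sentences;
}
```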
Your browser doesn't support the Speech Synthesis API. Try using a modern version of Chrome, Safari, Firefox, or Edge.
Go ahead, modify the text in the playground above and hear how it sounds. The default voice varies by platform, but we’ll learn how to choose specific voices later.
Customizing the Voice: Parameters
The default voice is fine, but what if you want to make speech faster, slower, higher, or lower? The SpeechSynthesisUtterance object has three properties you can adjust to customize how the speech sounds.
Pitch
Controls how high or low the voice sounds. The value ranges from 0 to 2, with 1 being the default.
- `0.5` = Deep, low-pitched voice
- `1.0` = Normal pitch (default)
- `1.5` = Higher-pitched, squeakier voice
Rate
Controls the speed of speech. Values range from 0.1 to 10, though most useful values are between 0.5 and 2.
- `0.5` = Half speed (good for language learning)
- `1.0` = Normal speed (default)
- `1.5` = 1.5x speed (like a podcast on fast-forward)
Volume
Controls how loud the voice is. Values range from 0 (silent) to 1 (full volume).
- `0` = Silent
- `0.5` = Half volume
- `1.0` = Full volume (default)
Play with these parameters in the interactive demo below. Notice how different combinations can create wildly different effects:
Your browser doesn't support the Speech Synthesis API. Try using a modern version of Chrome, Safari, Firefox, or Edge.
Using Parameters in Code
Here’s how you set these parameters in vanilla JavaScript:
const utterance = new SpeechSynthesisUtterance("I sound different now!");
utterance.pitch = 1.5; // Higher pitched
utterance.rate = 0.8; // Slower
utterance.volume = 0.9; // Slightly quieter
speechSynthesis.speak(utterance);
And here’s a React component that lets users control these parameters:
import { useState } from 'react';
function SpeechDemo() {
const [pitch, setPitch] = useState(1);
const [rate, setRate] = useState(1);
const [volume, setVolume] = useState(1);
const speak = (text: string) => {
const utterance = new SpeechSynthesisUtterance(text);
utterance.pitch = pitch;
utterance.rate = rate;
utterance.volume = volume;
speechSynthesis.speak(utterance);
};
// UI controls here...
}
Browser Note: Most browsers support the full range of pitch and rate values, but some mobile browsers may clamp these values more aggressively. Safari on iOS, for example, limits how high or low the pitch can go compared to desktop Chrome.
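Because that clamping differs per platform, one defensive option is to clamp values to the spec's ranges yourself before assigning them, so out-of-range input degrades the same way everywhere. A sketch (`clampSpeechParams` is a hypothetical helper):

```javascript
// Hypothetical helper: clamp parameters to the spec's valid ranges so
// out-of-range values behave predictably on every platform.
function clampSpeechParams({ pitch = 1, rate = 1, volume = 1 } = {}) {
  const clamp = (value, min, max) => Math.min(max, Math.max(min, value));
  return {
    pitch: clamp(pitch, 0, 2),   // spec range: 0 to 2
    rate: clamp(rate, 0.1, 10),  // spec range: 0.1 to 10
    volume: clamp(volume, 0, 1), // spec range: 0 to 1
  };
}
```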
Choosing Voices: The Voice Gallery
So far we’ve been using the browser’s default voice. But most browsers actually provide multiple voices across different languages. Some sound robotic, some sound surprisingly natural, and the selection varies wildly depending on your operating system and browser.
You can get all available voices using speechSynthesis.getVoices():
// Get all available voices
const voices = speechSynthesis.getVoices();
voices.forEach(voice => {
console.log(voice.name, voice.lang);
});
Try browsing through all the voices available on your device. The number and quality will vary - I get 80+ voices on macOS Chrome, but only a handful on iOS Safari:
Your browser doesn't support the Speech Synthesis API. Try using a modern version of Chrome, Safari, Firefox, or Edge.
Each SpeechSynthesisVoice object has several properties:
- `name`: The voice’s name (e.g., “Alex”, “Samantha”, “Google US English”)
- `lang`: The language code (e.g., “en-US”, “es-ES”, “ja-JP”)
- `localService`: Boolean indicating whether the voice runs on-device (`true`) or requires a network connection (`false`)
- `default`: Boolean indicating whether this is the system’s default voice
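These properties make it easy to filter the list. For example, you might prefer an on-device voice for a given language and fall back to any match. A sketch (`pickVoice` is a hypothetical helper):

```javascript
// Hypothetical helper: pick a voice for a language, preferring on-device
// (localService) voices and falling back to any match, or null.
function pickVoice(voices, lang) {
  const matches = voices.filter((v) => v.lang === lang);
  return matches.find((v) => v.localService) || matches[0] || null;
}

// Usage sketch, guarded so it is also safe outside the browser:
if (typeof speechSynthesis !== 'undefined') {
  const voice = pickVoice(speechSynthesis.getVoices(), 'en-US');
  const utterance = new SpeechSynthesisUtterance('Hello!');
  if (voice) utterance.voice = voice;
  speechSynthesis.speak(utterance);
}
```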
The voiceschanged Gotcha
Here’s something that trips up a lot of developers: voices load asynchronously in most browsers. If you try to get voices immediately when your page loads, you’ll often get an empty array:
// Wrong: voices might not be loaded yet!
const voices = speechSynthesis.getVoices(); // Often returns []
console.log(voices.length); // 0 😢
The solution is to wait for the voiceschanged event:
// Right: wait for the event
speechSynthesis.addEventListener('voiceschanged', () => {
const voices = speechSynthesis.getVoices();
console.log('Voices loaded:', voices.length); // 80 🎉
});
In React, you’d typically handle this in a useEffect:
useEffect(() => {
const loadVoices = () => {
const availableVoices = speechSynthesis.getVoices();
setVoices(availableVoices);
};
// Load voices immediately (works in some browsers)
loadVoices();
// Also listen for the voiceschanged event (required in others)
speechSynthesis.addEventListener('voiceschanged', loadVoices);
return () => {
speechSynthesis.removeEventListener('voiceschanged', loadVoices);
};
}, []);
This approach covers both bases: it tries to load voices immediately (works in Firefox and Safari) and also listens for the event (required in Chrome and Edge).
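Outside React, the same dual-path logic can be wrapped in a promise. In this sketch the synthesis object is passed in as a parameter purely so the logic can be exercised with a stub; in the browser, you'd call `waitForVoices(window.speechSynthesis).then(voices => ...)`:

```javascript
// Sketch: promise-based voice loading for non-React code.
function waitForVoices(synth) {
  return new Promise((resolve) => {
    const voices = synth.getVoices();
    if (voices.length > 0) {
      resolve(voices); // already loaded (Firefox/Safari path)
      return;
    }
    // Otherwise wait for the voiceschanged event (Chrome/Edge path)
    synth.addEventListener('voiceschanged', function handler() {
      synth.removeEventListener('voiceschanged', handler);
      resolve(synth.getVoices());
    });
  });
}
```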
Lifecycle Events: Controlling Playback
Speech isn’t just fire-and-forget. The SpeechSynthesisUtterance object fires events throughout its lifecycle, allowing you to track when speech starts, ends, encounters errors, or even which word is currently being spoken.
The Event Lifecycle
Each utterance can emit several events:
- `onstart` - Fired when speech begins
- `onend` - Fired when speech completes
- `onerror` - Fired if something goes wrong
- `onpause` - Fired when speech is paused
- `onresume` - Fired when speech resumes after a pause
- `onboundary` - Fired at word/sentence boundaries (not supported in all browsers)
Here’s a basic example in vanilla JavaScript:
const utterance = new SpeechSynthesisUtterance("Track my lifecycle!");
utterance.onstart = () => console.log('Started speaking');
utterance.onend = () => console.log('Finished speaking');
utterance.onerror = (event) => console.error('Error:', event.error);
speechSynthesis.speak(utterance);
Interactive Demo
Try the interactive demo below. Click “Speak” and watch the event log populate in real-time. Try pausing and resuming to see how those events fire:
Your browser doesn't support the Speech Synthesis API. Try using a modern version of Chrome, Safari, Firefox, or Edge.
The onboundary event is particularly interesting - it fires at word boundaries, giving you the character index and length of each word. You can use this to highlight words as they’re spoken, create karaoke-style effects, or track reading progress. Unfortunately, not all browsers support it (Firefox and Safari notably don’t).
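A sketch of that idea: the event's charIndex tells you where the current word starts in the text, so a small helper (`wordAt`, hypothetical) can extract the word to highlight:

```javascript
// Hypothetical helper: extract the word starting at charIndex, the
// position reported by an onboundary event.
function wordAt(text, charIndex) {
  const match = text.slice(charIndex).match(/^\S+/);
  return match ? match[0] : '';
}

// Browser usage, guarded so the snippet is safe outside the browser:
if (typeof speechSynthesis !== 'undefined') {
  const text = 'Highlight each word as it is spoken';
  const utterance = new SpeechSynthesisUtterance(text);
  utterance.onboundary = (event) => {
    if (event.name === 'word') {
      console.log('Speaking:', wordAt(text, event.charIndex));
    }
  };
  speechSynthesis.speak(utterance);
}
```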
Building a Reusable Hook
Rather than wiring up all these events every time, let’s create a reusable React hook. This is exactly what all the interactive demos in this article use:
// useSpeechSynthesis.ts
import { useState, useEffect, useCallback } from 'react';

interface SpeakOptions {
  voice?: SpeechSynthesisVoice;
  pitch?: number;
  rate?: number;
  volume?: number;
  onStart?: () => void;
  onEnd?: () => void;
  onError?: (event: SpeechSynthesisErrorEvent) => void;
}

export function useSpeechSynthesis() {
  const [voices, setVoices] = useState<SpeechSynthesisVoice[]>([]);
  const [speaking, setSpeaking] = useState(false);
  const [paused, setPaused] = useState(false);

  // Load voices (handling the async gotcha)
  useEffect(() => {
    const loadVoices = () => {
      setVoices(speechSynthesis.getVoices());
    };
    loadVoices();
    speechSynthesis.addEventListener('voiceschanged', loadVoices);
    return () => {
      speechSynthesis.removeEventListener('voiceschanged', loadVoices);
    };
  }, []);

  const speak = useCallback((text: string, options: SpeakOptions = {}) => {
    speechSynthesis.cancel(); // Clear queue
    const utterance = new SpeechSynthesisUtterance(text);

    // Apply options (checking against null/undefined rather than
    // truthiness, so explicit values like volume: 0 still apply)
    if (options.voice) utterance.voice = options.voice;
    if (options.pitch != null) utterance.pitch = options.pitch;
    if (options.rate != null) utterance.rate = options.rate;
    if (options.volume != null) utterance.volume = options.volume;

    // Attach event handlers
    utterance.onstart = () => {
      setSpeaking(true);
      options.onStart?.();
    };
    utterance.onend = () => {
      setSpeaking(false);
      setPaused(false);
      options.onEnd?.();
    };
    utterance.onerror = (event) => {
      console.error('Speech error:', event);
      setSpeaking(false);
      options.onError?.(event);
    };

    speechSynthesis.speak(utterance);
  }, []);

  const pause = useCallback(() => {
    speechSynthesis.pause();
    setPaused(true);
  }, []);

  const resume = useCallback(() => {
    speechSynthesis.resume();
    setPaused(false);
  }, []);

  const cancel = useCallback(() => {
    speechSynthesis.cancel();
    setSpeaking(false);
    setPaused(false);
  }, []);

  return { speak, pause, resume, cancel, voices, speaking, paused };
}
Now using speech synthesis becomes much simpler:
// In your component
const { speak, voices, speaking, cancel } = useSpeechSynthesis();
// Speak with custom options
speak("Hello!", {
voice: voices[0],
pitch: 1.2,
rate: 1.0,
onEnd: () => console.log('Done!')
});
This hook handles voice loading, state management, and provides a clean API for all our needs. All the interactive demos in this article use this same hook - we’re not reinventing the wheel for each one!
Creative Use Cases
Now that you understand the fundamentals, let’s explore some creative applications. The Speech Synthesis API opens up possibilities that go far beyond simple text-to-speech.
Interactive Storytelling
Imagine a choose-your-own-adventure story where different characters have distinct voices. By switching between voices and adjusting parameters, you can create immersive, dynamic narratives:
// Character voice switching example
const narrator = voices.find(v => v.name.includes('Alex'));
const character = voices.find(v => v.name.includes('Samantha'));
function speakDialogue(text, isNarrator) {
const utterance = new SpeechSynthesisUtterance(text);
utterance.voice = isNarrator ? narrator : character;
utterance.pitch = isNarrator ? 1.0 : 1.3;
speechSynthesis.speak(utterance);
}
// Usage
speakDialogue("Once upon a time...", true);
speakDialogue("Help! A dragon!", false);
You could take this further by:
- Using the `onboundary` event to highlight text as it’s spoken
- Synchronizing animations with speech events
- Letting users skip ahead by canceling the current utterance
- Creating voice-activated choices using the Speech Recognition API
Language Learning
The Speech Synthesis API is perfect for language learning applications. By controlling the rate and selecting native voices for different languages, you can create pronunciation practice tools:
// Language learning helper
function pronunciationPractice(word, language = 'es-ES') {
  const voice = voices.find(v => v.lang === language);

  // Slow version for learning
  const slow = new SpeechSynthesisUtterance(word);
  slow.voice = voice;
  slow.rate = 0.6;

  // Normal speed version
  const normal = new SpeechSynthesisUtterance(word);
  normal.voice = voice;
  normal.rate = 1.0;

  // Chain them: speak the normal version once the slow one finishes
  slow.onend = () => speechSynthesis.speak(normal);
  speechSynthesis.speak(slow);
}
// Try it
pronunciationPractice("¡Hola! ¿Cómo estás?", "es-ES");
This pattern works great for:
- Vocabulary flashcards with audio
- Accent comparison (compare English voice saying Spanish words vs native Spanish voice)
- Pronunciation drills that repeat words at adjustable speeds
- Interactive lessons that respond to user progress
Creative & Experimental
Speech synthesis can be an artistic medium. By randomizing parameters and using the event system creatively, you can create generative audio art:
// Generative poetry reader with random parameters
function readPoetically(text) {
const lines = text.split('\n');
lines.forEach((line, i) => {
const utterance = new SpeechSynthesisUtterance(line);
// Random voice parameters for artistic effect
utterance.pitch = 0.8 + Math.random() * 0.8; // 0.8-1.6
utterance.rate = 0.7 + Math.random() * 0.6; // 0.7-1.3
// Add delay between lines
setTimeout(() => {
speechSynthesis.speak(utterance);
}, i * 2000);
});
}
// Read a poem with varying voice characteristics
const poem = `Roses are red
Violets are blue
This poem sounds weird
Because pitch is askew`;
readPoetically(poem);
Other creative ideas:
- Voice-based games: Speak clues in a mystery game, or have enemies taunt players
- Data sonification: “Speak” numbers from charts to make data more accessible
- Generative music: Use speech as a rhythmic or melodic element
- Interactive art: Create installations that respond to user input with synthesized speech
The key is experimentation. Try combining speech with other web APIs - like the Web Audio API for effects, Canvas for visualizations, or Gamepad API for voice-controlled games.
Browser Compatibility & Gotchas
The Speech Synthesis API is widely supported, but with significant differences in implementation quality and available features. Let’s dig into the details so you know what to expect.
Support Matrix
Here’s a breakdown of feature support across major browsers:
| Feature | Chrome | Firefox | Safari | Edge | iOS Safari | Android Chrome |
|---|---|---|---|---|---|---|
| Basic synthesis | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Voice selection | ✅ | ✅ | ⚠️ Limited | ✅ | ⚠️ Very Limited | ✅ |
| Full pitch/rate range | ✅ | ✅ | ⚠️ Clamped | ✅ | ⚠️ Clamped | ✅ |
| `onboundary` event | ✅ | ❌ | ❌ | ✅ | ❌ | ✅ |
| `pause()` / `resume()` | ✅ | ✅ | ✅ | ✅ | ⚠️ Buggy | ✅ |
✅ = Fully supported ⚠️ = Partially supported or has quirks ❌ = Not supported
Platform-Specific Quirks
iOS Safari
iOS Safari has the most limitations:
- Very few voices: Often just 2-3 voices available (compared to 80+ on desktop)
- Requires user interaction: Speech won’t work until the user has interacted with the page (click, tap, etc.)
- No background playback: Speech stops when the app is backgrounded
- Parameter clamping: Pitch and rate values are more restricted than on desktop
- Pause/resume issues: The pause() and resume() methods can be unreliable
// iOS-friendly pattern: trigger speech from a user event
button.addEventListener('click', () => {
const utterance = new SpeechSynthesisUtterance("Hello!");
speechSynthesis.speak(utterance);
});
Android
Android’s implementation varies based on the system TTS engine:
- System-dependent voices: Voice quality and selection depend on what TTS engines the user has installed
- Google voices are common: Most Android devices have Google TTS pre-installed
- Generally good support: Most features work as expected on modern Android versions
Desktop Browsers
Desktop browsers generally have the best support:
- Chrome/Edge: Excellent support, extensive voice libraries (Google voices + system voices)
- Firefox: Good support, but lacks the `onboundary` event
- Safari: Good support, but limited to system voices (high quality, but fewer options)
Common Gotchas
1. Voice Loading Timing
We covered this earlier, but it’s worth repeating: voices load asynchronously in most browsers. Always use the voiceschanged event:
speechSynthesis.addEventListener('voiceschanged', () => {
const voices = speechSynthesis.getVoices();
// Now you can use voices
});
2. Queue Behavior
The speech queue can sometimes get stuck, especially when rapidly calling speak() multiple times. Always cancel before speaking new text:
// Clear the queue if things get stuck
speechSynthesis.cancel();
// Then speak your new text
const utterance = new SpeechSynthesisUtterance("New text");
speechSynthesis.speak(utterance);
3. User Interaction Requirements
Many mobile browsers (especially iOS Safari) require user interaction before allowing speech synthesis. This is similar to autoplay restrictions for video and audio:
// This might not work on page load
speechSynthesis.speak(new SpeechSynthesisUtterance("Hello!"));
// This will work after a user click
button.addEventListener('click', () => {
speechSynthesis.speak(new SpeechSynthesisUtterance("Hello!"));
});
4. Long Text Truncation
Some browsers (notably Chrome on some platforms) may cut off text after ~200-300 characters. The workaround is to chunk your text:
// Workaround: chunk long text at sentence boundaries
function speakLongText(text) {
  // Split into sentences (keeping punctuation), then pack them
  // into chunks of roughly 200 characters
  const sentences = text.match(/[^.!?]+[.!?]*\s*/g) || [text];
  const chunks = [];
  let current = '';
  for (const sentence of sentences) {
    if (current && current.length + sentence.length > 200) {
      chunks.push(current);
      current = '';
    }
    current += sentence;
  }
  if (current) chunks.push(current);

  chunks.forEach((chunk, i) => {
    const utterance = new SpeechSynthesisUtterance(chunk);
    if (i === chunks.length - 1) {
      utterance.onend = () => console.log('Complete!');
    }
    speechSynthesis.speak(utterance);
  });
}
5. Rate Limits
Some browsers may rate-limit or restrict speech synthesis if called too frequently. Be mindful of how often you’re triggering speech, especially in response to user input.
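A simple mitigation is to debounce your speech triggers so a burst of input (e.g. typing) produces one utterance instead of many. A sketch, where the 300 ms delay is an arbitrary choice:

```javascript
// Sketch: debounce speech so rapid triggers produce a single utterance
// instead of flooding the synthesis queue.
function debounce(fn, delayMs) {
  let timer = null;
  return (...args) => {
    clearTimeout(timer);
    timer = setTimeout(() => fn(...args), delayMs);
  };
}

// Tune the delay to your input source; 300 ms is a reasonable default.
const speakDebounced = debounce((text) => {
  if (typeof speechSynthesis !== 'undefined') {
    speechSynthesis.cancel(); // replace any pending speech, don't stack it
    speechSynthesis.speak(new SpeechSynthesisUtterance(text));
  }
}, 300);
```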
Feature Detection & Fallbacks
Always check for browser support before using the API:
if ('speechSynthesis' in window) {
// Use speech synthesis
const utterance = new SpeechSynthesisUtterance(text);
speechSynthesis.speak(utterance);
} else {
// Fallback: show text in a modal, use audio files, etc.
showTextFallback(text);
}
You can also check for specific features:
// Check if onboundary is supported
const utterance = new SpeechSynthesisUtterance();
const hasBoundary = 'onboundary' in utterance;
if (hasBoundary) {
// Use word highlighting features
} else {
// Skip word-by-word tracking
}
The key takeaway: test your implementation across different platforms, especially if you’re targeting mobile users. What works perfectly on desktop Chrome might need adjustments for iOS Safari.
Wrapping Up
We’ve covered a lot of ground, from the basics of creating your first utterance to advanced patterns with event handling, voice selection, and creative applications. Here’s what we explored:
- The fundamentals: How `speechSynthesis` and `SpeechSynthesisUtterance` work together
- Customization: Adjusting pitch, rate, and volume to create different effects
- Voice selection: Browsing and choosing from available voices (and handling the async loading quirk)
- Events: Tracking the speech lifecycle with `onstart`, `onend`, `onboundary`, and other events
- Creative applications: Interactive storytelling, language learning, and experimental uses
- Browser quirks: Platform-specific limitations and workarounds
The Speech Synthesis API is just one half of the Web Speech API. The other half is the Speech Recognition API, which does the opposite - it listens to spoken words and converts them to text. Combine both, and you can create fully voice-interactive applications.
This technology is mature enough for production use, but remember to:
- Test across multiple browsers and devices
- Provide fallbacks for unsupported browsers
- Consider accessibility implications
- Respect user preferences (some users may find unexpected speech jarring)
Resources
Want to dive deeper? Here are some helpful resources:
If you liked this article and think others should read it, please share it on Twitter!