Voice Activity Detection
The RAVATAR platform supports an optional Voice Activity Detection (VAD) mode that provides a hands-free voice interaction experience. When enabled, the microphone automatically activates when the text input is empty, allowing users to speak naturally without holding down the mic button.
Configuration
VAD mode is controlled by the vadEnabled flag in the chat configuration API response:
interface ChatConfigResponse {
// ... other config fields
vadEnabled?: boolean; // Optional, defaults to false
}
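Because the field is optional, consuming code may want to normalize it once rather than checking for undefined in several places. The following is a minimal sketch of that idea; resolveVadEnabled is a hypothetical helper name, not part of the platform API, and the interface is repeated here only to keep the snippet self-contained.

```typescript
// Shape taken from the interface above; other config fields are omitted.
interface ChatConfigResponse {
  vadEnabled?: boolean;
}

// Hypothetical helper: treats a missing flag as false,
// matching the documented default.
function resolveVadEnabled(config: ChatConfigResponse): boolean {
  return config.vadEnabled ?? false;
}
```

With this in place, a response that omits the field behaves the same as one that sets vadEnabled: false.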
Behavior
When VAD mode is disabled (default)
- Traditional push-to-talk: Hold the microphone button to record
- Microphone is automatically enabled when text input is empty
- Release outside the input area to cancel recording
When VAD mode is enabled (vadEnabled: true)
- Microphone automatically activates when text input is empty
- Click the microphone button once to manually disable
- Click again to manually re-enable
- Microphone automatically disables when typing text
- Manual disable state is preserved after sending messages
- Visual indicator: Green microphone icon (#1ba74b) when active
- Green voice activity shadow for visual feedback
Visual Feedback
- Active VAD mode: Green microphone icon with green shadow
- Manually disabled: Red microphone icon with disabled state
- Recording (non-VAD): Red microphone icon with red shadow
- Tooltip guidance: Context-sensitive tooltips explain the current interaction mode
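One way to keep these visual states consistent is to derive them from a single function. The sketch below is illustrative only: the MicVisual labels, the isRecording field, and the "idle" fallback (for example, while the user is typing) are assumptions, while micEnabled and manuallyDisabled follow the state names used by MessageInput.

```typescript
// Labels are hypothetical; they mirror the states listed above.
type MicVisual = "vad-active" | "manually-disabled" | "recording" | "idle";

interface MicVisualInput {
  vadEnabled: boolean;
  micEnabled: boolean;
  manuallyDisabled: boolean;
  isRecording: boolean; // assumed name for "a push-to-talk recording is in progress"
}

function micVisual(s: MicVisualInput): MicVisual {
  if (s.vadEnabled) {
    // Manual disable takes priority: red icon, disabled state.
    if (s.manuallyDisabled) return "manually-disabled";
    // Green icon with green shadow while the mic is active.
    return s.micEnabled ? "vad-active" : "idle";
  }
  // Non-VAD mode: red icon with red shadow only while recording.
  return s.isRecording ? "recording" : "idle";
}
```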
Implementation Example
import MessageInput from './components/MessageInput';
// Non-VAD mode (traditional push-to-talk)
<MessageInput
vadEnabled={false} // or omit - defaults to false
onSend={handleSend}
onVoiceSend={handleVoiceSend}
/>
// VAD mode (auto-enable microphone)
<MessageInput
vadEnabled={true}
onSend={handleSend}
onVoiceSend={handleVoiceSend}
/>
API Configuration
The backend API should include the vadEnabled field in the chat configuration endpoint response:
{
"status": 200,
"avatars_info": [],
"languages": [],
"chat_types": [],
"vadEnabled": true
}
User Experience Flow
Non-VAD Mode Flow
- User starts with empty text input
- Microphone is enabled (icon visible)
- User holds microphone button to record
- User releases button to send or drags outside to cancel
- After sending, microphone re-enables
VAD Mode Flow
- User starts with empty text input
- Microphone is auto-enabled (green icon)
- User can click once to manually disable (red icon)
- User types text -- microphone auto-disables
- User clears text -- microphone auto-enables again
- After sending message, manual disable state persists if user had disabled it
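The VAD mode flow above can be modeled as a small reducer, which makes the persistence of the manual-disable state easy to check. This is a sketch under stated assumptions, not the actual MessageInput implementation: the action names are hypothetical, "empty" is taken to mean a zero-length string, and sending is assumed to clear the input.

```typescript
interface MicState {
  text: string;
  manuallyDisabled: boolean;
}

type MicAction =
  | { type: "setText"; text: string }
  | { type: "toggleMic" }
  | { type: "send" };

function micReducer(state: MicState, action: MicAction): MicState {
  switch (action.type) {
    case "setText":
      return { ...state, text: action.text };
    case "toggleMic":
      // One click disables, the next click re-enables.
      return { ...state, manuallyDisabled: !state.manuallyDisabled };
    case "send":
      // Assumption: sending clears the input; the manual choice is kept.
      return { ...state, text: "" };
  }
}

// Derived value: the mic is on only when the input is empty
// and the user has not manually disabled it.
function isMicEnabled(state: MicState): boolean {
  return state.text.length === 0 && !state.manuallyDisabled;
}
```

Deriving the enabled state from the text and the manual flag, instead of storing it separately, avoids the two getting out of sync after a send.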
Future Enhancements
The current implementation provides the UX foundation for VAD mode. Future enhancements will include:
- Browser-based voice activity detection using @ricky0123/vad-web with Silero model
- Intelligent audio transmission gating (40-60% bandwidth reduction)
- Noise suppression using native browser APIs and RNNoise
- Client-side speech detection to minimize audio data transmission
- Privacy controls with permission priming and clear user consent
- Mobile optimization handling iOS Safari permission timeouts and battery constraints
Technical Details
Component Architecture
App.tsx
+-- Widget.tsx
    +-- MessageInput.tsx
        |-- SendButton.tsx (handles click vs hold detection)
        +-- AudioRecorder.tsx (hidden in VAD mode for monitoring)
State Management
MessageInput manages:
- micEnabled: Whether microphone is currently enabled
- manuallyDisabled: Whether user manually disabled the mic (VAD mode only)
SendButton manages:
- Click vs hold detection with 200ms timer threshold
- Visual feedback based on VAD mode state
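The click vs hold distinction can be illustrated with a pure function over press and release timestamps. This is a simplification: the component is described as using a timer, whereas the sketch compares durations, and treating a press of exactly 200ms as a hold is an assumption. The names classifyPress and HOLD_THRESHOLD_MS are hypothetical.

```typescript
const HOLD_THRESHOLD_MS = 200;

type PressKind = "click" | "hold";

// Classifies a press by how long the button was held down.
function classifyPress(
  pressedAtMs: number,
  releasedAtMs: number,
  thresholdMs: number = HOLD_THRESHOLD_MS
): PressKind {
  return releasedAtMs - pressedAtMs >= thresholdMs ? "hold" : "click";
}
```

In VAD mode a "click" would toggle the microphone, while in non-VAD mode a "hold" corresponds to push-to-talk recording.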
CSS Variables
--mic-vad-color: #1ba74b; /* Green microphone icon in VAD mode */
--mic-vad-shadow-color: rgba(27, 167, 75, 0.2); /* Green shadow for voice activity */
--recording-color: var(--light-red); /* Red for recording/disabled state */
Testing
Comprehensive test coverage should include:
- MessageInput component tests covering VAD and non-VAD modes
- SendButton component tests for interaction behavior
- Backward compatibility tests ensuring existing behavior is unchanged
- Edge case handling and state management tests
Browser Compatibility
Current:
- All modern browsers (Chrome, Firefox, Safari, Edge)
- Mobile browsers (iOS Safari, Chrome Mobile)
- No additional dependencies required
Future (with AudioWorklet-based VAD):
- Chrome 66+ (AudioWorklet support)
- Firefox 76+ (AudioWorklet support)
- Safari 14.1+ (AudioWorklet support)
- Limited iOS Safari support (permission constraints)
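When AudioWorklet-based VAD is introduced, feature detection is generally more reliable than checking browser versions. The following is a minimal sketch, assuming the presence of the standard AudioWorkletNode constructor is a sufficient signal; supportsAudioWorklet is a hypothetical helper, and the scope parameter exists only so the check can be exercised outside a browser.

```typescript
// Returns true when the given global scope exposes the AudioWorkletNode constructor.
function supportsAudioWorklet(scope: object = globalThis): boolean {
  return typeof (scope as { AudioWorkletNode?: unknown }).AudioWorkletNode === "function";
}
```

A caller could fall back to the current push-to-talk behavior when this returns false.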
Troubleshooting
Microphone does not auto-enable in VAD mode
Possible causes:
- vadEnabled is not set to true in API configuration
- User manually disabled the microphone (check for red icon)
- Text input is not empty
Solution: Clear the text input, ensure the API returns vadEnabled: true, and if the icon is red, click the microphone button to re-enable it
Click not working to toggle microphone
Possible causes:
- Not in VAD mode (vadEnabled: false)
- In non-VAD mode where hold-to-record is the interaction pattern
Solution: Verify vadEnabled: true in configuration
Green color not appearing
Possible causes:
- CSS variables not loaded
- Theme override blocking VAD colors
Solution: Check that --mic-vad-color and --mic-vad-shadow-color are defined in your theme
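A quick way to confirm the variables are defined is to read them from the computed style. This sketch is an assumption-laden diagnostic, not part of the widget: missingVadVariables is a hypothetical helper, and it accepts any object with a getPropertyValue method so it can also be run against a stub.

```typescript
// Structural subset of CSSStyleDeclaration used by the check.
interface StyleReader {
  getPropertyValue(name: string): string;
}

const VAD_CSS_VARIABLES = ["--mic-vad-color", "--mic-vad-shadow-color"];

// Returns the names of VAD variables that resolve to an empty value.
function missingVadVariables(style: StyleReader): string[] {
  return VAD_CSS_VARIABLES.filter(
    (name) => style.getPropertyValue(name).trim() === ""
  );
}
```

In a browser console this could be called as missingVadVariables(getComputedStyle(document.documentElement)), assuming the theme defines the variables on the root element; an empty array means both are present.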
Migration Guide
From Non-VAD to VAD Mode
- Update backend API to include vadEnabled field in configuration response
- No frontend code changes required (backward compatible)
- Test user experience with vadEnabled: true
- Roll out gradually by enabling for specific users/projects
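The gradual rollout step could be implemented on the backend with a simple allow-list. This is only a sketch: the projectId parameter, the allow-list, and the vadEnabledFor helper are hypothetical and do not describe how the RAVATAR backend actually selects users or projects.

```typescript
// Hypothetical allow-list check used when building the configuration response.
function vadEnabledFor(projectId: string, allowList: readonly string[]): boolean {
  return allowList.some((id) => id === projectId);
}

// Example: only the listed (made-up) project receives vadEnabled: true.
const rolloutProjects = ["project-a"];
const configForA = { vadEnabled: vadEnabledFor("project-a", rolloutProjects) };
const configForB = { vadEnabled: vadEnabledFor("project-b", rolloutProjects) };
```

Projects outside the list receive vadEnabled: false and keep the push-to-talk behavior.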
Reverting to Non-VAD Mode
- Set vadEnabled: false or omit field in API configuration
- Widget automatically reverts to traditional push-to-talk behavior
- No frontend code changes required