Voice Activity Detection

The RAVATAR platform supports an optional Voice Activity Detection (VAD) mode that provides a hands-free voice interaction experience. When enabled, the microphone automatically activates when the text input is empty, allowing users to speak naturally without holding down the mic button.

Configuration

VAD mode is controlled by the vadEnabled flag in the chat configuration API response:

interface ChatConfigResponse {
  // ... other config fields
  vadEnabled?: boolean; // Optional, defaults to false
}
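Because vadEnabled is optional, client code has to treat a missing field as false. The following is a minimal sketch that assumes the config has already been parsed into a ChatConfigResponse. The helper name isVadEnabled is hypothetical.

```typescript
interface ChatConfigResponse {
  // ... other config fields omitted
  vadEnabled?: boolean; // Optional, defaults to false
}

// Hypothetical helper: a missing (undefined) flag falls back to false,
// which keeps the traditional push-to-talk behavior.
function isVadEnabled(config: ChatConfigResponse): boolean {
  return config.vadEnabled ?? false;
}

console.log(isVadEnabled({}));                   // false
console.log(isVadEnabled({ vadEnabled: true })); // true
```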

Behavior

When VAD mode is disabled (default)

  • Traditional push-to-talk: hold the microphone button to record
  • The microphone button is available whenever the text input is empty
  • Release the button outside the input area to cancel the recording

When VAD mode is enabled (vadEnabled: true)

  • Microphone automatically activates when text input is empty
  • Click the microphone button once to disable it manually
  • Click it again to re-enable it manually
  • Microphone automatically disables when typing text
  • Manual disable state is preserved after sending messages
  • Visual indicator: Green microphone icon (#1ba74b) when active
  • Green voice activity shadow for visual feedback
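Both lists above reduce to a small pure function of three inputs: the mode, the text input, and the manual-disable flag. The sketch below assumes the hypothetical state shape shown. Its names (MicInputs, deriveMicEnabled) only mirror the micEnabled and manuallyDisabled state described under Technical Details. The docs do not say whether whitespace-only text counts as empty, so this sketch assumes that only the empty string does.

```typescript
// Hypothetical input shape for deriving the microphone state.
interface MicInputs {
  vadEnabled: boolean;       // from the chat configuration (defaults to false)
  text: string;              // current contents of the text input
  manuallyDisabled: boolean; // user clicked the mic off (only meaningful in VAD mode)
}

// Returns whether the microphone should currently be enabled.
function deriveMicEnabled({ vadEnabled, text, manuallyDisabled }: MicInputs): boolean {
  const inputEmpty = text.length === 0; // assumption: only "" counts as empty
  if (!vadEnabled) {
    // Non-VAD: the mic button is available whenever the input is empty;
    // recording still requires holding the button.
    return inputEmpty;
  }
  // VAD: auto-enable on empty input unless the user switched the mic off.
  return inputEmpty && !manuallyDisabled;
}
```

In non-VAD mode the manual flag is ignored. This matches the docs, which describe manual disabling as a VAD-only state.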

Visual Feedback

  • Active VAD mode: Green microphone icon with green shadow
  • Manually disabled: Red microphone icon shown in a disabled state
  • Recording (non-VAD): Red microphone icon with red shadow
  • Tooltip guidance: Context-sensitive tooltips explain the current interaction mode
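One way to express these rules is a function that picks the icon color and shadow from the current state. The following is a sketch with several assumptions:

- The MicVisual and VisualInputs types are hypothetical.
- The colors reference the custom properties listed under CSS Variables.
- Any state the list above does not cover falls back to a default style. This covers a push-to-talk mic that is not recording, or a VAD mic turned off by typing.

```typescript
// Hypothetical description of how the mic button should look.
interface MicVisual {
  iconColor: string;                   // CSS color value for the icon
  shadow: "vad" | "recording" | "none";
  disabled: boolean;
}

interface VisualInputs {
  vadEnabled: boolean;
  micEnabled: boolean;
  manuallyDisabled: boolean;
  recording: boolean; // push-to-talk recording in progress
}

function micVisual(s: VisualInputs): MicVisual {
  if (s.vadEnabled && s.manuallyDisabled) {
    // Manually disabled: red icon, disabled state
    return { iconColor: "var(--recording-color)", shadow: "none", disabled: true };
  }
  if (s.vadEnabled && s.micEnabled) {
    // Active VAD mode: green icon with green shadow
    return { iconColor: "var(--mic-vad-color)", shadow: "vad", disabled: false };
  }
  if (!s.vadEnabled && s.recording) {
    // Recording in non-VAD mode: red icon with red shadow
    return { iconColor: "var(--recording-color)", shadow: "recording", disabled: false };
  }
  // Not covered by the documented rules (assumption): default styling.
  return { iconColor: "currentColor", shadow: "none", disabled: false };
}
```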

Implementation Example

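import MessageInput from './components/MessageInput';

// handleSend and handleVoiceSend are your application's handlers (defined elsewhere)

// Non-VAD mode (traditional push-to-talk)
const PushToTalkInput = () => (
  <MessageInput
    vadEnabled={false} // or omit; defaults to false
    onSend={handleSend}
    onVoiceSend={handleVoiceSend}
  />
);

// VAD mode (microphone auto-enables when the input is empty)
const VadInput = () => (
  <MessageInput
    vadEnabled={true}
    onSend={handleSend}
    onVoiceSend={handleVoiceSend}
  />
);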

API Configuration

The backend API should include the vadEnabled field in the chat configuration endpoint response:

{
  "status": 200,
  "avatars_info": [],
  "languages": [],
  "chat_types": [],
  "vadEnabled": true
}
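The flag arrives over the network, so a client may want to validate it rather than trust the payload's shape. The following is a minimal sketch that assumes the response body is available as a JSON string. The name parseVadFlag is hypothetical. Treating any non-boolean value as false is a design choice in this sketch, not documented behavior.

```typescript
// Hypothetical: extract vadEnabled from a raw chat configuration response body.
// Anything other than a literal boolean true (missing field, the string "true",
// 1, null) is treated as false, so malformed config falls back to push-to-talk.
function parseVadFlag(body: string): boolean {
  let data: unknown;
  try {
    data = JSON.parse(body);
  } catch {
    return false; // unparseable body: keep the default
  }
  if (typeof data !== "object" || data === null) return false;
  const flag = (data as Record<string, unknown>)["vadEnabled"];
  return flag === true;
}

const sample = `{
  "status": 200,
  "avatars_info": [],
  "languages": [],
  "chat_types": [],
  "vadEnabled": true
}`;
console.log(parseVadFlag(sample)); // true
```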

User Experience Flow

Non-VAD Mode Flow

  1. User starts with empty text input
  2. The microphone button is available (icon visible)
  3. User holds the microphone button to record
  4. User releases the button to send, or releases it outside the input area to cancel
  5. After sending, the microphone button is available again

VAD Mode Flow

  1. User starts with empty text input
  2. Microphone is auto-enabled (green icon)
  3. User can click once to disable the microphone manually (red icon)
  4. User types text; the microphone auto-disables
  5. User clears the text; the microphone auto-enables again (unless it was manually disabled)
  6. After a message is sent, the manual disable state persists if the user had disabled the microphone
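The VAD flow can be modeled as a reducer over user events. The event names (type, toggle, send) and the state shape in this sketch are hypothetical. The sketch assumes that typing never changes manuallyDisabled. This is an interpretation of steps 4 to 6, not a confirmed implementation detail.

```typescript
// Hypothetical VAD-mode input state (mirrors micEnabled / manuallyDisabled).
interface VadState {
  text: string;
  manuallyDisabled: boolean;
  micEnabled: boolean;
}

// Hypothetical user events.
type VadEvent =
  | { kind: "type"; text: string } // user edits or clears the input
  | { kind: "toggle" }             // user clicks the microphone button
  | { kind: "send" };              // message sent; input is cleared

// Builds a state whose micEnabled is derived from the text and the manual flag.
function withMic(text: string, manuallyDisabled: boolean): VadState {
  return { text, manuallyDisabled, micEnabled: text === "" && !manuallyDisabled };
}

function vadReducer(state: VadState, event: VadEvent): VadState {
  if (event.kind === "type") {
    // Typing disables the mic; clearing the text re-enables it.
    return withMic(event.text, state.manuallyDisabled);
  }
  if (event.kind === "toggle") {
    // A click flips the manual override.
    return withMic(state.text, !state.manuallyDisabled);
  }
  // "send": the input is cleared but the manual override is kept.
  return withMic("", state.manuallyDisabled);
}

let flow = withMic("", false);                         // empty input: mic on
flow = vadReducer(flow, { kind: "type", text: "hi" }); // typing: mic off
flow = vadReducer(flow, { kind: "type", text: "" });   // cleared: mic on again
flow = vadReducer(flow, { kind: "toggle" });           // clicked: manually off
flow = vadReducer(flow, { kind: "send" });             // sent: stays off
console.log(flow.micEnabled, flow.manuallyDisabled);   // false true
```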

Future Enhancements

The current implementation provides the UX foundation for VAD mode (automatic enabling, manual toggling, and visual feedback). Planned enhancements include:

  • Browser-based voice activity detection using @ricky0123/vad-web with Silero model
  • Audio transmission gating, so that audio is sent only while speech is detected (estimated 40-60% bandwidth reduction)
  • Noise suppression using native browser APIs and RNNoise
  • Client-side speech detection to minimize audio data transmission
  • Privacy controls with permission priming and clear user consent
  • Mobile optimization handling iOS Safari permission timeouts and battery constraints
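Transmission gating of this kind is commonly built on per-frame speech probabilities from a model such as Silero. A separate start and stop threshold (hysteresis) plus a short hangover keep brief pauses from cutting speech off. The sketch below is a generic illustration of that idea, not the @ricky0123/vad-web API. The option names, thresholds, and frame counts are placeholder assumptions.

```typescript
// Hypothetical gating parameters (placeholders, not tuned values).
interface GateOptions {
  startThreshold: number; // probability at or above which the gate opens
  stopThreshold: number;  // probability below which a frame counts as silence
  hangoverFrames: number; // silent frames still sent before the gate closes
}

// Given one speech probability per audio frame, decide which frames to transmit.
function gateFrames(probs: number[], opts: GateOptions): boolean[] {
  const send: boolean[] = [];
  let open = false;
  let silentRun = 0;
  for (const p of probs) {
    if (!open) {
      if (p >= opts.startThreshold) {
        open = true;
        silentRun = 0;
      }
    } else if (p < opts.stopThreshold) {
      silentRun++;
      if (silentRun > opts.hangoverFrames) open = false;
    } else {
      silentRun = 0; // speech continues; reset the hangover counter
    }
    send.push(open);
  }
  return send;
}

const opts = { startThreshold: 0.6, stopThreshold: 0.35, hangoverFrames: 2 };
console.log(gateFrames([0.1, 0.7, 0.5, 0.2, 0.1, 0.1, 0.9], opts).join(", "));
// false, true, true, true, true, false, true
```

Frames where the gate is closed are simply not sent, which is where any bandwidth saving would come from. The actual reduction depends on how much of a session is silence.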

Technical Details

Component Architecture

App.tsx
+-- Widget.tsx
    +-- MessageInput.tsx
        |-- SendButton.tsx (handles click vs. hold detection)
        +-- AudioRecorder.tsx (hidden in VAD mode for monitoring)

State Management

MessageInput manages:

  • micEnabled: Whether microphone is currently enabled
  • manuallyDisabled: Whether user manually disabled the mic (VAD mode only)

SendButton manages:

  • Click vs. hold detection using a 200 ms timer threshold
  • Visual feedback based on VAD mode state
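The click-versus-hold rule can be expressed as a duration check against the 200 ms threshold. This sketch has several assumptions:

- SendButton measures press duration from pointer down to pointer up.
- The action names are hypothetical.
- A press of exactly 200 ms counts as a hold.
- A hold in VAD mode and a short tap in non-VAD mode do nothing.

None of these cases is specified in the docs.

```typescript
const HOLD_THRESHOLD_MS = 200; // threshold documented for SendButton

type PressKind = "click" | "hold";
type MicAction = "toggle" | "send-recording" | "cancel-recording" | "none";

// Presses shorter than the threshold count as clicks (boundary is an assumption).
function classifyPress(durationMs: number): PressKind {
  return durationMs < HOLD_THRESHOLD_MS ? "click" : "hold";
}

// Hypothetical mapping from a completed press to an action.
function resolvePress(vadEnabled: boolean, durationMs: number, releasedInside: boolean): MicAction {
  const kind = classifyPress(durationMs);
  if (vadEnabled) {
    // VAD mode: a click toggles the mic; a hold is ignored (assumption).
    return kind === "click" ? "toggle" : "none";
  }
  if (kind === "hold") {
    // Push-to-talk: release inside sends, release outside the input area cancels.
    return releasedInside ? "send-recording" : "cancel-recording";
  }
  return "none"; // a quick tap in push-to-talk mode does not record (assumption)
}
```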

CSS Variables

:root {
  --mic-vad-color: #1ba74b;                       /* Green microphone icon in VAD mode */
  --mic-vad-shadow-color: rgba(27, 167, 75, 0.2); /* Green shadow for voice activity */
  --recording-color: var(--light-red);            /* Red for recording/disabled state */
}

Testing

Comprehensive test coverage should include:

  • MessageInput component tests covering VAD and non-VAD modes
  • SendButton component tests for interaction behavior
  • Backward compatibility tests ensuring existing push-to-talk behavior is unchanged
  • Edge case handling and state management tests

Browser Compatibility

Current:

  • All modern browsers (Chrome, Firefox, Safari, Edge)
  • Mobile browsers (iOS Safari, Chrome Mobile)
  • No additional dependencies required

Future (with AudioWorklet-based VAD):

  • Chrome 66+ (AudioWorklet support)
  • Firefox 76+ (AudioWorklet support)
  • Safari 14.1+ (AudioWorklet support)
  • Limited iOS Safari support (permission constraints)
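Before enabling an AudioWorklet-based VAD, a client could use feature detection instead of relying on browser version lists. AudioWorkletNode and navigator.mediaDevices.getUserMedia are standard Web APIs. The function name and the AudioGlobals shape are assumptions. The global object is passed in so the check can also run outside a browser.

```typescript
// Minimal shape of the globals inspected; in a browser, pass globalThis.
interface AudioGlobals {
  AudioWorkletNode?: unknown;
  navigator?: { mediaDevices?: { getUserMedia?: unknown } };
}

// True when both microphone capture and AudioWorklet processing are available.
function canUseWorkletVad(g: AudioGlobals): boolean {
  const hasWorklet = typeof g.AudioWorkletNode === "function";
  const hasMic = typeof g.navigator?.mediaDevices?.getUserMedia === "function";
  return hasWorklet && hasMic;
}

// In a browser:
// const useWorkletVad = canUseWorkletVad(globalThis as unknown as AudioGlobals);
```

navigator.mediaDevices is only exposed in secure contexts (HTTPS or localhost), so this check also fails on plain HTTP pages.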

Troubleshooting

Microphone does not auto-enable in VAD mode

Possible causes:

  • vadEnabled is not set to true in API configuration
  • User manually disabled the microphone (check for red icon)
  • Text input is not empty

Solution: Ensure the API returns vadEnabled: true, clear the text input, and if the icon is red, click the microphone button to re-enable it
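The checks above can be run in order against the component's state. This sketch assumes access to the same values MessageInput tracks. The helper name and message strings are hypothetical.

```typescript
interface MicDiagnosticsInput {
  vadEnabled: boolean | undefined; // value received from the API
  manuallyDisabled: boolean;
  text: string;
}

// Returns the likely reasons the mic is not auto-enabled; an empty array means none found.
function diagnoseMicAutoEnable(s: MicDiagnosticsInput): string[] {
  const causes: string[] = [];
  if (s.vadEnabled !== true) causes.push("vadEnabled is not true in the API configuration");
  if (s.manuallyDisabled) causes.push("microphone was manually disabled (red icon); click it to re-enable");
  if (s.text.length > 0) causes.push("text input is not empty");
  return causes;
}
```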

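Clicking does not toggle the microphone

Possible causes:

  • VAD mode is off (vadEnabled is false or omitted); in non-VAD mode the button uses hold-to-record, so clicks do not toggle the microphone
  • The press exceeds the 200 ms click threshold and is treated as a hold

Solution: Verify that the API configuration returns vadEnabled: true, and use a short click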

Green color does not appear

Possible causes:

  • CSS variables not loaded
  • Theme override blocking VAD colors

Solution: Check that --mic-vad-color and --mic-vad-shadow-color are defined in your theme
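To confirm that the variables are defined, read them back from the computed style. getComputedStyle and CSSStyleDeclaration.getPropertyValue are standard DOM APIs. The helper name is hypothetical. It takes a lookup function so it can also run outside a browser.

```typescript
const VAD_CSS_VARS = ["--mic-vad-color", "--mic-vad-shadow-color"];

// Returns the VAD custom properties that resolve to an empty value.
// Values are trimmed because custom property values may keep leading whitespace.
function missingVadCssVars(lookup: (name: string) => string): string[] {
  return VAD_CSS_VARS.filter((name) => lookup(name).trim() === "");
}

// In a browser (e.g. from the dev tools console):
// const styles = getComputedStyle(document.documentElement);
// console.log(missingVadCssVars((n) => styles.getPropertyValue(n)));
```

If your theme defines the variables on a widget container rather than the root element, query that element instead of document.documentElement.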

Migration Guide

From Non-VAD to VAD Mode

  1. Update backend API to include vadEnabled field in configuration response
  2. No frontend code changes required (backward compatible)
  3. Test user experience with vadEnabled: true
  4. Roll out gradually by enabling for specific users/projects
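On the backend, a gradual rollout can be as simple as computing vadEnabled per project from an allowlist. This sketch is written in TypeScript for consistency with the rest of this page. The backend's actual language, the allowlist, and the function name are assumptions. The response fields match the example under API Configuration.

```typescript
interface ChatConfigBody {
  status: number;
  avatars_info: unknown[];
  languages: unknown[];
  chat_types: unknown[];
  vadEnabled: boolean;
}

// Hypothetical: rollout is the set of project IDs opted into VAD mode.
function buildChatConfig(projectId: string, rollout: ReadonlySet<string>): ChatConfigBody {
  return {
    status: 200,
    avatars_info: [],
    languages: [],
    chat_types: [],
    // Projects not in the set keep push-to-talk (false is the documented default).
    vadEnabled: rollout.has(projectId),
  };
}
```

Under this scheme, reverting a project amounts to removing it from the set, which sends vadEnabled: false.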

Reverting to Non-VAD Mode

  1. Set vadEnabled: false, or omit the field, in the API configuration
  2. Widget automatically reverts to traditional push-to-talk behavior
  3. No frontend code changes required