Live Mode
Live Mode provides real-time interactive sessions with streaming avatar responses via PixelStreaming. This mode requires additional session management and supports continuous audio streaming using PCM format for Gemini API integration.
Overview
Key differences from Chat Mode:
- Requires explicit session start/end via REST endpoints
- Uses LicenseId for session tracking
- Supports real-time PCM audio streaming
- Avatar responses are streamed in real time via 3D rendering
- Voice Activity Detection (VAD) auto-stops recording when speech ends
Starting Live Session
Before using Live Mode, start a live session to obtain streaming credentials.
Endpoint:
POST /startLiveSession
Request Headers:
Content-Type: application/json
Authorization: Bearer {jwt_token}
Request Body:
{
  "avatar_id": "avatar-id-123",
  "width": "1280",
  "height": "720"
}
| Field | Type | Required | Description |
|---|---|---|---|
| avatar_id | string | Yes | Avatar identifier |
| width | string | No | Video resolution width in pixels (default: "1280") |
| height | string | No | Video resolution height in pixels (default: "720") |
Field Validation:
| Field | Constraints |
|---|---|
| avatar_id | Must be a valid avatar ID from the /connection response |
| width | String representation of the width in pixels. Must be an integer ≤ 1920 (e.g., "640", "1280", "1920") |
| height | String representation of the height in pixels. Must be an integer ≤ 1080 (e.g., "360", "720", "1080") |
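The constraints above can be checked on the client before the request is sent. The following is a minimal sketch: the function names are hypothetical, and the lower bound of 1 is an assumption, since the table only states upper bounds. It cannot confirm that avatar_id actually came from the /connection response.

```javascript
// Hypothetical client-side check for the /startLiveSession request body.
// Upper bounds come from the validation table; the lower bound of 1 is an assumption.
const MAX_WIDTH = 1920;
const MAX_HEIGHT = 1080;

function isIntegerString(value, max) {
  // Accept only plain decimal digit strings such as "1280".
  if (typeof value !== "string" || !/^\d+$/.test(value)) return false;
  const n = Number(value);
  return n >= 1 && n <= max;
}

function validateLiveSessionRequest(body) {
  const errors = [];
  if (typeof body.avatar_id !== "string" || body.avatar_id.length === 0) {
    errors.push("avatar_id is required");
  }
  // width and height are optional, so validate them only when present.
  if (body.width !== undefined && !isIntegerString(body.width, MAX_WIDTH)) {
    errors.push(`width must be an integer string <= ${MAX_WIDTH}`);
  }
  if (body.height !== undefined && !isIntegerString(body.height, MAX_HEIGHT)) {
    errors.push(`height must be an integer string <= ${MAX_HEIGHT}`);
  }
  return errors;
}
```

An empty array means the body passed these local checks; the server remains the authority on whether the request is accepted.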
cURL Example:
curl -X POST https://chat.rvtr.ai/startLiveSession \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer eyJhbGciOiJIUzI1NiIs..." \
  -d '{"avatar_id": "avatar-id-123", "width": "1280", "height": "720"}'
Response (Success):
{
  "status": 200,
  "LicenseId": "license-abc-123",
  "streamingUrl": "https://streaming.example.com/stream"
}
Response (Error):
{
  "status": 503,
  "error": "All AI assistants are busy at the moment"
}
Status Codes:
| Status Code | Name | Meaning |
|---|---|---|
| 200 | Success | Live session started successfully |
| 503 | Service Unavailable | All AI assistants are currently busy |
| 504 | Gateway Timeout | Stream connection timed out |
| -1 | Unknown Error | Generic error occurred |
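One way a client might branch on these status codes is sketched below. The helper name is hypothetical, and treating 503 and 504 as retryable is an assumption made for illustration, not documented behavior.

```javascript
// Hypothetical helper: turn a /startLiveSession response body into a result object.
// Treating 503 and 504 as retryable is an assumption, not documented behavior.
const RETRYABLE_STATUSES = new Set([503, 504]);

function interpretStartResponse(body) {
  if (body.status === 200) {
    return { ok: true, licenseId: body.LicenseId, streamingUrl: body.streamingUrl };
  }
  return {
    ok: false,
    status: body.status,
    retryable: RETRYABLE_STATUSES.has(body.status),
    // Fall back to a generic message when the error field is absent.
    message: body.error || "Unknown error",
  };
}
```

A caller could show `message` to the user and offer a retry only when `retryable` is true.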
Response Fields:
| Field | Type | Description |
|---|---|---|
| status | number | Status code (see table above) |
| LicenseId | string | License ID for session tracking (required for all Live Mode messages). Only present on success. |
| streamingUrl | string | HTTP/HTTPS URL for iframe-based WebRTC video streaming (PixelStreaming). Only present on success. |
| error | string | Error message describing the failure. Only present on error. |
Store the LicenseId -- it must be included in all Live Mode WebSocket messages and is required to end the session.
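The request and response described above could be wrapped as follows. This is a sketch under stated assumptions: the function name and the `baseUrl` parameter are illustrative, `fetchImpl` is an injectable stand-in for the global fetch, and the code reads the status from the JSON body as shown in the response examples.

```javascript
// Sketch of a /startLiveSession call. `fetchImpl` is injectable so the
// function can be exercised without a network; it defaults to the global fetch.
async function startLiveSession(
  { baseUrl, jwtToken, avatarId, width = "1280", height = "720" },
  fetchImpl = fetch
) {
  const response = await fetchImpl(`${baseUrl}/startLiveSession`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${jwtToken}`,
    },
    body: JSON.stringify({ avatar_id: avatarId, width, height }),
  });
  const data = await response.json();
  if (data.status !== 200) {
    throw new Error(data.error || `startLiveSession failed with status ${data.status}`);
  }
  // Keep both values: LicenseId for messages and /endLiveSession,
  // streamingUrl for the iframe.
  return { licenseId: data.LicenseId, streamingUrl: data.streamingUrl };
}
```

The returned `licenseId` should be kept for the lifetime of the session so it can be attached to WebSocket messages and passed to /endLiveSession.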
Iframe Integration
The streamingUrl returned by startLiveSession is rendered inside an iframe to display the 3D avatar. The iframe URL must include the JWT token for authentication:
const sep = streamingUrl.includes("?") ? "&" : "?";
const iframeUrl = `${streamingUrl}${sep}token=${encodeURIComponent(jwtToken)}`;
If your application is served over HTTPS, the streamingUrl must also use HTTPS. Mixed content (HTTPS page loading HTTP iframe) is blocked by modern browsers.
If the stream opens correctly in a new browser tab but shows a black screen inside the iframe, the PixelStreaming page is likely blocking embeds via X-Frame-Options or CSP frame-ancestors headers. This requires a server-side configuration change on the streaming server. Check your browser DevTools Console for related errors.
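The token and mixed-content rules above can be combined into one helper. This is a sketch with a hypothetical function name. It uses the standard URL API rather than string concatenation, so an existing token parameter is replaced instead of duplicated.

```javascript
// Builds the iframe URL and rejects combinations the browser would block.
// `pageProtocol` would normally be window.location.protocol.
function buildIframeUrl(streamingUrl, jwtToken, pageProtocol) {
  const url = new URL(streamingUrl);
  if (pageProtocol === "https:" && url.protocol !== "https:") {
    throw new Error("Mixed content: an HTTPS page cannot embed an HTTP stream");
  }
  // searchParams handles the "?" vs "&" separator and URL-encodes the token.
  url.searchParams.set("token", jwtToken);
  return url.toString();
}
```

In a browser, the result would typically be assigned to the `src` of an iframe element, for example `iframe.src = buildIframeUrl(streamingUrl, jwtToken, window.location.protocol)`.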
Ending Live Session
End the live session when the user exits or the app closes.
Endpoint:
POST /endLiveSession
Request Headers:
Content-Type: application/json
Authorization: Bearer {jwt_token}
Request Body:
{
  "LicenseId": "license-abc-123"
}
cURL Example:
curl -X POST https://chat.rvtr.ai/endLiveSession \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer eyJhbGciOiJIUzI1NiIs..." \
  -d '{"LicenseId": "license-abc-123"}'
Response:
{
  "status": "success"
}
App Lifecycle Handling:
For reliable session cleanup when the app is closed or goes to background, use a "fire and forget" approach:
ON app_closing OR app_going_to_background:
  POST /endLiveSession
    Headers:
      Content-Type: application/json
      Authorization: Bearer {jwtToken}
    Body:
      { "LicenseId": "{licenseId}" }
    Options:
      keepalive: true // Lets the request continue after the app or page closes
- iOS: Use beginBackgroundTask for background execution
- Android: Use WorkManager or allow the request to complete before onDestroy
- Web: Use fetch with the keepalive: true option
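For the web case, the fire-and-forget pattern might look like the sketch below. The function names, the `baseUrl` parameter, and the `getSession` callback are hypothetical. Using the `pagehide` event is one reasonable choice of trigger, not a requirement stated by this API.

```javascript
// Fire-and-forget session cleanup for web apps (sketch).
// `fetchImpl` defaults to the global fetch and is injectable for testing.
function endLiveSessionOnExit({ baseUrl, jwtToken, licenseId }, fetchImpl = fetch) {
  // keepalive lets the browser finish the request after the page unloads.
  // The promise is intentionally not awaited.
  fetchImpl(`${baseUrl}/endLiveSession`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${jwtToken}`,
    },
    body: JSON.stringify({ LicenseId: licenseId }),
    keepalive: true,
  }).catch(() => {
    // Nothing useful can be done with a failure while the page is closing.
  });
}

// Registers the cleanup on an event target such as `window`.
// `getSession` returns the current credentials, or null if no session is active.
function registerExitCleanup(target, getSession, fetchImpl) {
  target.addEventListener("pagehide", () => {
    const session = getSession();
    if (session) endLiveSessionOnExit(session, fetchImpl);
  });
}
```

A page would call `registerExitCleanup(window, () => currentSession)` once after startup, where `currentSession` is whatever object the application uses to hold the active LicenseId and token.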
Gemini Native Audio Format (PCM)
For Live Mode with Gemini API integration, use raw PCM audio format for optimal streaming performance.
| Property | Value |
|---|---|
| Format | Raw PCM (uncompressed) |
| Sample Rate | 16,000 Hz (16 kHz) |
| Bit Depth | 16-bit signed integer |
| Channels | Mono (1 channel) |
| Byte Order | Little-endian |
| Buffer Size | 4,096 samples per chunk |
| Encoding | Base64 |
| Transmission Field | audio_base64_pcm |
PCM Conversion (Float32 to Int16):
FOR EACH sample IN float32_audio_buffer:
  // Clamp value to valid range [-1.0, 1.0]
  clamped = MAX(-1.0, MIN(1.0, sample))
  // Convert to 16-bit signed integer [-32768, 32767]
  int16_value = ROUND(clamped * 32767)
  // Write as little-endian bytes
  WRITE int16_value TO output_buffer
END FOR
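The pseudocode above translates to JavaScript roughly as follows. This is a sketch with hypothetical function names. It assumes the input is a Float32Array such as the Web Audio API produces, and that a global `btoa` is available (browsers and recent Node.js versions).

```javascript
// Float32 samples in [-1, 1] -> 16-bit little-endian PCM bytes.
function floatTo16BitPCM(float32Samples) {
  const bytes = new Uint8Array(float32Samples.length * 2);
  const view = new DataView(bytes.buffer);
  for (let i = 0; i < float32Samples.length; i++) {
    const clamped = Math.max(-1, Math.min(1, float32Samples[i]));
    // The third argument `true` selects little-endian byte order.
    view.setInt16(i * 2, Math.round(clamped * 32767), true);
  }
  return bytes;
}

// PCM bytes -> Base64 string for the audio_base64_pcm field.
function bytesToBase64(bytes) {
  // Build the binary string in slices to stay under argument-count limits.
  let binary = "";
  const SLICE = 0x8000;
  for (let i = 0; i < bytes.length; i += SLICE) {
    binary += String.fromCharCode.apply(null, bytes.subarray(i, i + SLICE));
  }
  return btoa(binary);
}
```

A chunk would then be encoded with `bytesToBase64(floatTo16BitPCM(chunk))` before being placed in the message.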
Buffer Size Calculation:
- Samples per buffer: 4,096
- Bytes per buffer: 8,192 (4,096 samples x 2 bytes per sample)
- Duration per buffer: ~256ms at 16 kHz
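The arithmetic above, and the splitting of a longer recording into buffers of this size, can be sketched as follows. The names are hypothetical. Whether the server accepts a shorter final chunk is not stated here, so that behavior is an assumption.

```javascript
const SAMPLE_RATE_HZ = 16000;
const SAMPLES_PER_CHUNK = 4096;
const BYTES_PER_SAMPLE = 2; // 16-bit samples

// Size in bytes and duration in milliseconds for a chunk of `samples` samples.
function chunkStats(samples = SAMPLES_PER_CHUNK) {
  return {
    bytes: samples * BYTES_PER_SAMPLE,
    durationMs: (samples / SAMPLE_RATE_HZ) * 1000,
  };
}

// Split a longer recording into fixed-size chunks; the last one may be shorter.
function splitIntoChunks(float32Samples, size = SAMPLES_PER_CHUNK) {
  const chunks = [];
  for (let i = 0; i < float32Samples.length; i += size) {
    // subarray creates a view on the same memory, so no samples are copied.
    chunks.push(float32Samples.subarray(i, i + size));
  }
  return chunks;
}
```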
Live Mode Messages
Live Mode messages include additional fields for session tracking.
Text Message (Live Mode):
{
  "avatar_id": "avatar-id-123",
  "user_id": "user-123",
  "chat_type": "text",
  "language": "en",
  "request": "Tell me about the weather",
  "requestType": "text",
  "source": "my-source-app-name",
  "isLive": true,
  "LicenseId": "license-abc-123",
  "session": "session-identifier"
}
Audio Message with PCM (Live Mode):
{
  "avatar_id": "avatar-id-123",
  "user_id": "user-123",
  "chat_type": "voice",
  "language": "en",
  "requestType": "audio",
  "source": "my-source-app-name",
  "isLive": true,
  "LicenseId": "license-abc-123",
  "audio_base64_pcm": "//uQxAAAAAANIAAAAAExBTUUzLjEwMFVV...",
  "session": "session-identifier"
}
| Field | Type | Required | Description |
|---|---|---|---|
| isLive | boolean | Yes | true for Live Mode |
| LicenseId | string | Yes | License ID from startLiveSession |
| audio_base64_pcm | string | For audio | Base64-encoded PCM audio data |
Live Mode can also send standard blob audio (WAV/WebM). In that case, use the standard audio message format with file_base64_data, but set isLive: true and include LicenseId. See the WebSocket API for the full message structure.
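The two PCM-oriented payloads shown above could be assembled with small builder functions. This is a sketch: the function names and the shape of the `ctx` object are hypothetical, and only the fields shown in the examples above are included.

```javascript
// Fields shared by every Live Mode message in the examples above.
function liveBase({ avatarId, userId, language, source, licenseId, session }) {
  return {
    avatar_id: avatarId,
    user_id: userId,
    language,
    source,
    isLive: true,
    LicenseId: licenseId,
    session,
  };
}

// Text message: carries the user's text in `request`.
function buildLiveTextMessage(ctx, text) {
  return { ...liveBase(ctx), chat_type: "text", requestType: "text", request: text };
}

// Audio message: carries Base64-encoded PCM in `audio_base64_pcm`.
function buildLiveAudioMessage(ctx, audioBase64Pcm) {
  return {
    ...liveBase(ctx),
    chat_type: "voice",
    requestType: "audio",
    audio_base64_pcm: audioBase64Pcm,
  };
}
```

Assuming messages are sent as JSON text over the WebSocket connection, a call such as `socket.send(JSON.stringify(buildLiveTextMessage(ctx, "Tell me about the weather")))` would transmit the text example.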
Voice Interaction Modes
In Live Mode, users can interact with the avatar using voice through several approaches:
- Voice Activity Detection (VAD) -- Microphone automatically activates and detects speech. See Voice Activity Detection for details.
- Push-to-talk -- User holds a button to record audio
- Record and send -- User records a clip, then sends it