Live Mode

Live Mode provides real-time interactive sessions with streaming avatar responses via PixelStreaming. This mode requires additional session management and supports continuous audio streaming using PCM format for Gemini API integration.

Overview

Key differences from Chat Mode:

  • Requires explicit session start/end via REST endpoints
  • Uses LicenseId for session tracking
  • Supports real-time PCM audio streaming
  • Avatar responses are streamed in real-time via 3D rendering
  • Voice Activity Detection (VAD) stops recording automatically when speech ends

Starting Live Session

Before using Live Mode, start a live session to obtain streaming credentials.

Endpoint:

POST /startLiveSession

Request Headers:

Content-Type: application/json
Authorization: Bearer {jwt_token}

Request Body:

{
  "avatar_id": "avatar-id-123",
  "width": "1280",
  "height": "720"
}

| Field | Type | Required | Description |
|---|---|---|---|
| avatar_id | string | Yes | Avatar identifier |
| width | string | No | Video resolution width in pixels (default: "1280") |
| height | string | No | Video resolution height in pixels (default: "720") |

Field Validation:

| Field | Constraints |
|---|---|
| avatar_id | Must be a valid avatar ID from /connection response |
| width | String representation of width in pixels. Must be an integer ≤ 1920 (e.g., "640", "1280", "1920") |
| height | String representation of height in pixels. Must be an integer ≤ 1080 (e.g., "360", "720", "1080") |
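
The constraints above can be checked on the client before the request is sent. The following is a minimal sketch, not part of the API: the function names are hypothetical, and rejecting zero is an assumption, since the table only states the upper bounds. It cannot verify that avatar_id actually came from the /connection response.

```javascript
// Hypothetical client-side check of a /startLiveSession request body.
const MAX_WIDTH = 1920;
const MAX_HEIGHT = 1080;

function isPixelString(value, max) {
  // Must be a string of digits only, e.g. "1280" (not 1280, not "1280px").
  if (typeof value !== "string" || !/^\d+$/.test(value)) return false;
  const n = Number(value);
  // Assumption: zero is rejected; the docs only state the upper bound.
  return n > 0 && n <= max;
}

function validateStartLiveSessionBody(body) {
  const errors = [];
  if (typeof body.avatar_id !== "string" || body.avatar_id.length === 0) {
    errors.push("avatar_id is required and must be a non-empty string");
  }
  // width and height are optional, so they are validated only when present.
  if (body.width !== undefined && !isPixelString(body.width, MAX_WIDTH)) {
    errors.push(`width must be an integer string <= ${MAX_WIDTH}`);
  }
  if (body.height !== undefined && !isPixelString(body.height, MAX_HEIGHT)) {
    errors.push(`height must be an integer string <= ${MAX_HEIGHT}`);
  }
  return errors; // An empty array means the body passed every check.
}
```

An empty result means the body satisfies the documented constraints; the server remains the authority on whether the request is accepted.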

cURL Example:

curl -X POST https://chat.rvtr.ai/startLiveSession \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer eyJhbGciOiJIUzI1NiIs..." \
  -d '{"avatar_id": "avatar-id-123", "width": "1280", "height": "720"}'

Response (Success):

{
  "status": 200,
  "LicenseId": "license-abc-123",
  "streamingUrl": "https://streaming.example.com/stream"
}

Response (Error):

{
  "status": 503,
  "error": "All AI assistants are busy at the moment"
}

Status Codes:

| Status Code | Description | Meaning |
|---|---|---|
| 200 | Success | Live session started successfully |
| 503 | Service Unavailable | All AI assistants are currently busy |
| 504 | Gateway Timeout | Stream connection timed out |
| -1 | Unknown Error | Generic error occurred |

Response Fields:

| Field | Type | Description |
|---|---|---|
| status | number | Status code (see table above) |
| LicenseId | string | License ID for session tracking (required for all Live Mode messages). Only present on success. |
| streamingUrl | string | HTTP/HTTPS URL for iframe-based WebRTC video streaming (PixelStreaming). Only present on success. |
| error | string | Error message describing the failure. Only present on error. |

Warning:

Store the LicenseId -- it must be included in all Live Mode WebSocket messages and is required to end the session.
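
As a rough illustration of handling the response fields and status codes above, the sketch below separates the success case from the error case. The function names and the returned object shape are hypothetical. Two things are assumptions the documentation does not confirm: that 503 and 504 are worth retrying, and that error responses always carry a JSON body.

```javascript
// Hypothetical helper: turns a /startLiveSession JSON payload into a tagged result.
function parseStartLiveSessionResponse(payload) {
  if (payload.status === 200 && payload.LicenseId && payload.streamingUrl) {
    return {
      ok: true,
      licenseId: payload.LicenseId, // Keep this for every Live Mode message and for /endLiveSession.
      streamingUrl: payload.streamingUrl,
    };
  }
  // Assumption: "busy" (503) and "timed out" (504) are transient; the docs do not say so.
  const retryable = payload.status === 503 || payload.status === 504;
  return {
    ok: false,
    status: payload.status,
    message: payload.error || "Unknown error",
    retryable,
  };
}

// Sends the request and parses the result. Assumes the body is JSON on errors too.
async function startLiveSession(baseUrl, jwtToken, body) {
  const response = await fetch(`${baseUrl}/startLiveSession`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${jwtToken}`,
    },
    body: JSON.stringify(body),
  });
  return parseStartLiveSessionResponse(await response.json());
}
```

A caller would store result.licenseId as soon as result.ok is true, and could show result.message to the user otherwise.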

Iframe Integration

The streamingUrl returned by startLiveSession is rendered inside an iframe to display the 3D avatar. The iframe URL must include the JWT token for authentication:

const sep = streamingUrl.includes("?") ? "&" : "?";
const iframeUrl = `${streamingUrl}${sep}token=${encodeURIComponent(jwtToken)}`;

HTTPS Required:

If your application is served over HTTPS, the streamingUrl must also use HTTPS. Mixed content (HTTPS page loading HTTP iframe) is blocked by modern browsers.

Black Screen in Iframe:  

If the stream opens correctly in a new browser tab but shows a black screen inside the iframe, the PixelStreaming page is likely blocking embeds via X-Frame-Options or CSP frame-ancestors headers. This requires a server-side configuration change on the streaming server. Check your browser DevTools Console for related errors.
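
The token and HTTPS rules in this section can be combined into one helper. This is a sketch under assumptions: buildIframeUrl and its pageProtocol parameter are hypothetical names, and it uses the standard URL API instead of string concatenation. Unlike the snippet above, searchParams.set replaces any token parameter already present on the URL rather than appending a second one.

```javascript
// Hypothetical helper: builds the iframe URL and rejects mixed content up front.
function buildIframeUrl(streamingUrl, jwtToken, pageProtocol) {
  const url = new URL(streamingUrl); // Throws a TypeError if streamingUrl is not a valid absolute URL.

  // An HTTPS page cannot load an HTTP iframe; fail early with a readable error.
  if (pageProtocol === "https:" && url.protocol !== "https:") {
    throw new Error("Mixed content: an HTTPS page cannot embed an HTTP streamingUrl");
  }

  // searchParams handles the "?" vs "&" separator and percent-encoding.
  url.searchParams.set("token", jwtToken);
  return url.toString();
}
```

In a browser, the result could be assigned to an iframe with iframe.src = buildIframeUrl(streamingUrl, jwtToken, window.location.protocol). This check does not detect the X-Frame-Options or CSP case, which can only be fixed on the streaming server.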

Ending Live Session

End the live session when the user exits or the app closes.

Endpoint:

POST /endLiveSession

Request Headers:

Content-Type: application/json
Authorization: Bearer {jwt_token}

Request Body:

{
  "LicenseId": "license-abc-123"
}

cURL Example:

curl -X POST https://chat.rvtr.ai/endLiveSession \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer eyJhbGciOiJIUzI1NiIs..." \
  -d '{"LicenseId": "license-abc-123"}'

Response:

{
  "status": "success"
}

App Lifecycle Handling:

For reliable session cleanup when the app is closed or goes to background, use a "fire and forget" approach:

ON app_closing OR app_going_to_background:
    POST /endLiveSession
        Headers:
            Content-Type: application/json
            Authorization: Bearer {jwtToken}
        Body:
            { "LicenseId": "{licenseId}" }
        Options:
            keepalive: true  // Allows the request to complete even if the app closes

Platform-specific cleanup:
  • iOS: Use beginBackgroundTask for background execution
  • Android: Use WorkManager or allow request to complete before onDestroy
  • Web: Use fetch with keepalive: true option
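
For the Web case, the fire-and-forget call might look like the sketch below. The function names are hypothetical, and the choice of the pagehide event in the usage note is an assumption rather than something this API prescribes.

```javascript
// Hypothetical helper: builds the /endLiveSession request without sending it.
function buildEndLiveSessionRequest(baseUrl, jwtToken, licenseId) {
  return {
    url: `${baseUrl}/endLiveSession`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${jwtToken}`,
      },
      body: JSON.stringify({ LicenseId: licenseId }),
      keepalive: true, // Lets the browser finish the request after the page unloads.
    },
  };
}

// Fire and forget: the promise is not awaited, and failures are swallowed
// because nothing can react to them while the page is closing.
function endLiveSessionOnExit(baseUrl, jwtToken, licenseId) {
  const { url, init } = buildEndLiveSessionRequest(baseUrl, jwtToken, licenseId);
  fetch(url, init).catch(() => {});
}
```

In a browser this could be wired up with window.addEventListener("pagehide", () => endLiveSessionOnExit(baseUrl, jwtToken, licenseId)). Splitting the builder from the sender keeps the request shape easy to inspect in tests.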

Gemini Native Audio Format (PCM)

For Live Mode with Gemini API integration, use raw PCM audio format for optimal streaming performance.

| Property | Value |
|---|---|
| Format | Raw PCM (uncompressed) |
| Sample Rate | 16,000 Hz (16 kHz) |
| Bit Depth | 16-bit signed integer |
| Channels | Mono (1 channel) |
| Byte Order | Little-endian |
| Buffer Size | 4,096 samples per chunk |
| Encoding | Base64 |
| Transmission Field | audio_base64_pcm |

PCM Conversion (Float32 to Int16):

FOR EACH sample IN float32_audio_buffer:
    // Clamp value to valid range [-1.0, 1.0]
    clamped = MAX(-1.0, MIN(1.0, sample))

    // Scale to a 16-bit signed integer; multiplying by 32767 yields [-32767, 32767]
    int16_value = ROUND(clamped * 32767)

    // Write as 2 little-endian bytes
    WRITE int16_value TO output_buffer
END FOR
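
The pseudocode above could be written in JavaScript roughly as follows. This is a sketch, not the reference implementation: floatTo16BitPcm and bytesToBase64 are hypothetical names, and btoa is assumed to be available (it is in browsers and in recent Node.js versions).

```javascript
// Converts Float32 samples in [-1, 1] to 16-bit signed little-endian PCM bytes.
function floatTo16BitPcm(samples) {
  const bytes = new Uint8Array(samples.length * 2); // 2 bytes per sample
  const view = new DataView(bytes.buffer);
  for (let i = 0; i < samples.length; i++) {
    // Clamp first so out-of-range input cannot overflow the 16-bit range.
    const clamped = Math.max(-1, Math.min(1, samples[i]));
    // The third argument (true) selects little-endian byte order.
    view.setInt16(i * 2, Math.round(clamped * 32767), true);
  }
  return bytes;
}

// Base64-encodes raw bytes for the audio_base64_pcm field.
function bytesToBase64(bytes) {
  let binary = "";
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary);
}
```

A captured chunk would then be sent as bytesToBase64(floatTo16BitPcm(chunk)). Using DataView makes the byte order explicit instead of depending on the platform's native endianness, as an Int16Array would.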

Buffer Size Calculation:

  • Samples per buffer: 4,096
  • Bytes per buffer: 8,192 (4,096 samples x 2 bytes per sample)
  • Duration per buffer: ~256ms at 16 kHz
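
The figures above follow directly from the format table. The short sketch below reproduces the arithmetic; the constant and function names are hypothetical, and the Base64 length is an added derived value (4 output characters per 3 input bytes, rounded up), not a number stated by the API.

```javascript
const SAMPLE_RATE_HZ = 16000;
const SAMPLES_PER_CHUNK = 4096;
const BYTES_PER_SAMPLE = 2; // 16-bit mono

function chunkStats() {
  const bytesPerChunk = SAMPLES_PER_CHUNK * BYTES_PER_SAMPLE;
  const durationMs = (SAMPLES_PER_CHUNK / SAMPLE_RATE_HZ) * 1000;
  // Base64 emits 4 characters for every 3 bytes, padding the final group.
  const base64Length = 4 * Math.ceil(bytesPerChunk / 3);
  return { bytesPerChunk, durationMs, base64Length };
}
```

For the documented settings this gives 8,192 bytes, 256 ms, and 10,924 Base64 characters per chunk.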

Live Mode Messages

Live Mode messages include additional fields for session tracking.

Text Message (Live Mode):

{
  "avatar_id": "avatar-id-123",
  "user_id": "user-123",
  "chat_type": "text",
  "language": "en",
  "request": "Tell me about the weather",
  "requestType": "text",
  "source": "my-source-app-name",
  "isLive": true,
  "LicenseId": "license-abc-123",
  "session": "session-identifier"
}

Audio Message with PCM (Live Mode):

{
  "avatar_id": "avatar-id-123",
  "user_id": "user-123",
  "chat_type": "voice",
  "language": "en",
  "requestType": "audio",
  "source": "my-source-app-name",
  "isLive": true,
  "LicenseId": "license-abc-123",
  "audio_base64_pcm": "//uQxAAAAAANIAAAAAExBTUUzLjEwMFVV...",
  "session": "session-identifier"
}

| Field | Type | Required | Description |
|---|---|---|---|
| isLive | boolean | Yes | true for Live Mode |
| LicenseId | string | Yes | License ID from startLiveSession |
| audio_base64_pcm | string | For audio | Base64-encoded PCM audio data |

Tip:

Live Mode can also send standard blob audio (WAV/WebM). In that case, use the standard audio message format with file_base64_data, but set isLive: true and include LicenseId. See the WebSocket API for the full message structure.
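
To avoid repeating the shared fields, the two PCM-based message shapes shown above could be produced by small builder functions. This is a sketch: the function names and the ctx object are hypothetical, and only fields that appear in the examples above are included.

```javascript
// Fields shared by every Live Mode message. ctx is a hypothetical
// container for values the app already holds.
function liveBaseFields(ctx) {
  return {
    avatar_id: ctx.avatarId,
    user_id: ctx.userId,
    language: ctx.language,
    source: ctx.source,
    isLive: true, // Marks the message as Live Mode.
    LicenseId: ctx.licenseId, // From the startLiveSession response.
    session: ctx.session,
  };
}

function buildLiveTextMessage(ctx, text) {
  return {
    ...liveBaseFields(ctx),
    chat_type: "text",
    requestType: "text",
    request: text,
  };
}

function buildLiveAudioMessage(ctx, audioBase64Pcm) {
  return {
    ...liveBaseFields(ctx),
    chat_type: "voice",
    requestType: "audio",
    audio_base64_pcm: audioBase64Pcm, // Base64-encoded 16 kHz, 16-bit mono PCM.
  };
}
```

Each message would be serialized with JSON.stringify before being sent over the WebSocket connection described in the WebSocket API.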

Voice Interaction Modes

In Live Mode, users can interact with the avatar using voice through several approaches:

  • Voice Activity Detection (VAD) -- Microphone automatically activates and detects speech. See Voice Activity Detection for details.
  • Push-to-talk -- User holds a button to record audio
  • Record and send -- User records a clip, then sends it