Live Mode

Live Mode provides real-time interactive sessions in which avatar responses are streamed via PixelStreaming. It requires explicit session management and supports continuous audio streaming in raw PCM format for Gemini API integration.

Overview

Key differences from Chat Mode:

Requires explicit session start/end via REST endpoints
Uses LicenseId for session tracking
Supports real-time PCM audio streaming
Avatar responses are rendered in 3D and streamed in real time
Voice Activity Detection (VAD) stops recording automatically when speech ends

Starting Live Session

Before using Live Mode, start a live session to obtain streaming credentials.

Endpoint:

POST /startLiveSession

Request Headers:

Content-Type: application/json
Authorization: Bearer {jwt_token}

Request Body:

{
  "avatar_id": "avatar-id-123",
  "width": "1280",
  "height": "720"
}

Field	Type	Required	Description
`avatar_id`	string	Yes	Avatar identifier
`width`	string	No	Video resolution width in pixels (default: `"1280"`)
`height`	string	No	Video resolution height in pixels (default: `"720"`)

Field Validation:

Field	Constraints
`avatar_id`	Must be a valid avatar ID from `/connection` response
`width`	String representation of width in pixels. Must be an integer ≤ 1920 (e.g., `"640"`, `"1280"`, `"1920"`)
`height`	String representation of height in pixels. Must be an integer ≤ 1080 (e.g., `"360"`, `"720"`, `"1080"`)
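
A client can apply these constraints before sending the request. The following is a minimal sketch, not part of the API: the helper names are hypothetical, the defaults mirror the field table above, and the lower bound of 1 is an assumption, since the table only states the upper limits.

```javascript
// Upper limits taken from the Field Validation table.
const MAX_WIDTH = 1920;
const MAX_HEIGHT = 1080;

// True if `value` is a string of decimal digits whose integer value is
// between 1 (assumed lower bound) and `max` inclusive.
function isValidDimension(value, max) {
  if (typeof value !== "string" || !/^[0-9]+$/.test(value)) return false;
  const n = Number(value);
  return n >= 1 && n <= max;
}

// Hypothetical helper: returns a list of problems (empty when the body is valid)
// so the caller can report all of them at once.
function validateStartParams({ avatar_id, width = "1280", height = "720" }) {
  const errors = [];
  if (typeof avatar_id !== "string" || avatar_id.length === 0) {
    errors.push("avatar_id is required");
  }
  if (!isValidDimension(width, MAX_WIDTH)) {
    errors.push(`width must be an integer string <= ${MAX_WIDTH}`);
  }
  if (!isValidDimension(height, MAX_HEIGHT)) {
    errors.push(`height must be an integer string <= ${MAX_HEIGHT}`);
  }
  return errors;
}
```

This check cannot confirm that avatar_id actually appears in the /connection response; that still has to be verified against the list your application received.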

cURL Example:

curl -X POST https://chat.rvtr.ai/startLiveSession \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer eyJhbGciOiJIUzI1NiIs..." \
  -d '{"avatar_id": "avatar-id-123", "width": "1280", "height": "720"}'

Response (Success):

{
  "status": 200,
  "LicenseId": "license-abc-123",
  "streamingUrl": "https://streaming.example.com/stream"
}

Response (Error):

{
  "status": 503,
  "error": "All AI assistants are busy at the moment"
}

Status Codes:

Status Code	Name	Meaning
200	Success	Live session started successfully
503	Service Unavailable	All AI assistants are currently busy
504	Gateway Timeout	Stream connection timed out
-1	Unknown Error	Generic error occurred

Response Fields:

Field	Type	Description
`status`	number	Status code (see table above)
`LicenseId`	string	License ID for session tracking (required for all Live Mode messages). Only present on success.
`streamingUrl`	string	HTTP/HTTPS URL for iframe-based WebRTC video streaming (PixelStreaming). Only present on success.
`error`	string	Error message describing the failure. Only present on error.

Warning:

Store the LicenseId. It must be included in all Live Mode WebSocket messages and is required to end the session.
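
The request and response handling described above can be sketched in JavaScript as follows. The function names are hypothetical. The sketch reads the status field of the JSON body rather than the HTTP status line, and it treats 503 and 504 as retryable; both choices are assumptions, not documented behavior.

```javascript
// Builds the fetch arguments for POST /startLiveSession.
function buildStartRequest(baseUrl, jwtToken, params) {
  return {
    url: `${baseUrl}/startLiveSession`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${jwtToken}`,
      },
      body: JSON.stringify(params),
    },
  };
}

// Turns the JSON response body into session credentials, or throws an Error
// that carries the status code from the body.
function parseStartResponse(json) {
  if (json.status === 200 && json.LicenseId && json.streamingUrl) {
    return { licenseId: json.LicenseId, streamingUrl: json.streamingUrl };
  }
  const err = new Error(json.error || "Unknown error");
  err.status = json.status;
  // Assumption: busy (503) and timeout (504) are worth retrying later.
  err.retryable = json.status === 503 || json.status === 504;
  throw err;
}

// Sends the request. `fetchImpl` is injectable so the function can be tested
// without a network; it defaults to the global fetch.
async function startLiveSession(baseUrl, jwtToken, params, fetchImpl = globalThis.fetch) {
  const { url, init } = buildStartRequest(baseUrl, jwtToken, params);
  const response = await fetchImpl(url, init);
  return parseStartResponse(await response.json());
}
```

A caller would keep the returned licenseId for the lifetime of the session, for example: const { licenseId, streamingUrl } = await startLiveSession("https://chat.rvtr.ai", jwtToken, { avatar_id: "avatar-id-123" });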

Iframe Integration

The streamingUrl returned by startLiveSession is rendered inside an iframe to display the 3D avatar. The iframe URL must include the JWT token for authentication:

const sep = streamingUrl.includes("?") ? "&" : "?";
const iframeUrl = `${streamingUrl}${sep}token=${encodeURIComponent(jwtToken)}`;

HTTPS Required:

If your application is served over HTTPS, the streamingUrl must also use HTTPS. Mixed content (HTTPS page loading HTTP iframe) is blocked by modern browsers.
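
As an alternative to string concatenation, the token can be appended with the standard URL API, which also makes the HTTPS check easy to perform up front. This is a sketch under assumptions: buildIframeUrl is a hypothetical helper, and pageProtocol is expected to be the value of window.location.protocol in a browser.

```javascript
// Appends the JWT as a `token` query parameter and rejects the
// HTTPS-page / HTTP-stream combination that browsers block.
function buildIframeUrl(streamingUrl, jwtToken, pageProtocol) {
  const url = new URL(streamingUrl);
  if (pageProtocol === "https:" && url.protocol !== "https:") {
    throw new Error("Mixed content: an HTTPS page cannot embed a non-HTTPS stream");
  }
  // searchParams handles an existing query string and percent-encoding.
  url.searchParams.set("token", jwtToken);
  return url.toString();
}
```

In a page this might be used as iframe.src = buildIframeUrl(streamingUrl, jwtToken, window.location.protocol). Note that searchParams uses form encoding, so a space becomes "+" rather than "%20"; JWTs contain only URL-safe characters, so the result matches the concatenation approach for real tokens.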

Black Screen in Iframe:

If the stream opens correctly in a new browser tab but shows a black screen inside the iframe, the PixelStreaming page is likely blocking embeds via X-Frame-Options or CSP frame-ancestors headers. This requires a server-side configuration change on the streaming server. Check your browser DevTools Console for related errors.

Ending Live Session

End the live session when the user exits or the app closes.

Endpoint:

POST /endLiveSession

Request Headers:

Content-Type: application/json
Authorization: Bearer {jwt_token}

Request Body:

{
  "LicenseId": "license-abc-123"
}

cURL Example:

curl -X POST https://chat.rvtr.ai/endLiveSession \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer eyJhbGciOiJIUzI1NiIs..." \
  -d '{"LicenseId": "license-abc-123"}'

Response:

{
  "status": "success"
}

App Lifecycle Handling:

For reliable session cleanup when the app closes or moves to the background, use a "fire and forget" approach: send the request and do not wait for the response.

ON app_closing OR app_going_to_background:
    POST /endLiveSession
        Headers:
            Content-Type: application/json
            Authorization: Bearer {jwtToken}
        Body:
            { "LicenseId": "{licenseId}" }
        Options:
            keepalive: true  // Ensures request completes even if app closes

Platform-specific cleanup:

iOS: Use beginBackgroundTask for background execution
Android: Use WorkManager, or allow the request to complete before onDestroy
Web: Use fetch with keepalive: true option
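
For the Web case, the pseudocode above might translate to the sketch below. endLiveSessionOnExit is a hypothetical name, and wiring it to the pagehide event is an assumption about when your application considers the session over.

```javascript
// Sends POST /endLiveSession without awaiting the result.
// `fetchImpl` is injectable for testing; it defaults to the global fetch.
function endLiveSessionOnExit(baseUrl, jwtToken, licenseId, fetchImpl = globalThis.fetch) {
  fetchImpl(`${baseUrl}/endLiveSession`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${jwtToken}`,
    },
    body: JSON.stringify({ LicenseId: licenseId }),
    keepalive: true, // lets the browser finish the request after the page unloads
  }).catch(() => {
    // Fire and forget: nothing useful can be done with a failure at this point.
  });
}

// Browser wiring (assumes baseUrl, jwtToken and licenseId are in scope):
// window.addEventListener("pagehide", () =>
//   endLiveSessionOnExit(baseUrl, jwtToken, licenseId));
```

fetch with keepalive is used here rather than navigator.sendBeacon because sendBeacon cannot set the Authorization header this endpoint requires.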

Gemini Native Audio Format (PCM)

For Live Mode with Gemini API integration, use raw PCM audio format for optimal streaming performance.

Property	Value
Format	Raw PCM (uncompressed)
Sample Rate	16,000 Hz (16 kHz)
Bit Depth	16-bit signed integer
Channels	Mono (1 channel)
Byte Order	Little-endian
Buffer Size	4,096 samples per chunk
Encoding	Base64
Transmission Field	`audio_base64_pcm`

PCM Conversion (Float32 to Int16):

FOR EACH sample IN float32_audio_buffer:
    // Clamp value to valid range [-1.0, 1.0]
    clamped = MAX(-1.0, MIN(1.0, sample))

    // Scale to a 16-bit signed integer; symmetric scaling yields [-32767, 32767]
    int16_value = ROUND(clamped * 32767)

    // Write the value as 2 bytes, least significant byte first (little-endian)
    WRITE int16_value TO output_buffer
END FOR
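
In JavaScript, the conversion loop above could look like the following sketch. The function names are hypothetical. btoa is assumed to be available, which holds in browsers and in recent Node.js versions; on older runtimes a different Base64 encoder would be needed.

```javascript
// Converts Float32 samples to 16-bit signed little-endian PCM bytes.
function floatTo16BitPCM(float32Samples) {
  const bytes = new Uint8Array(float32Samples.length * 2);
  const view = new DataView(bytes.buffer);
  for (let i = 0; i < float32Samples.length; i++) {
    // Clamp to [-1.0, 1.0], then scale symmetrically by 32767.
    const clamped = Math.max(-1, Math.min(1, float32Samples[i]));
    // The third argument `true` selects little-endian byte order.
    view.setInt16(i * 2, Math.round(clamped * 32767), true);
  }
  return bytes;
}

// Encodes raw bytes as Base64 for the audio_base64_pcm field.
function bytesToBase64(bytes) {
  let binary = "";
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary);
}
```

Using DataView with an explicit endianness flag keeps the output little-endian regardless of the host platform, which a plain Int16Array does not guarantee.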

Buffer Size Calculation:

Samples per buffer: 4,096
Bytes per buffer: 8,192 (4,096 samples × 2 bytes per sample)
Duration per buffer: 256 ms at 16 kHz (4,096 samples ÷ 16,000 samples per second)
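
The same arithmetic, together with a helper that cuts a longer recording into buffer-sized pieces, is sketched below. The constant and function names are hypothetical, and the Base64 length is derived from the standard 4-characters-per-3-bytes encoding rule rather than from this API's documentation.

```javascript
const SAMPLE_RATE_HZ = 16000;
const SAMPLES_PER_CHUNK = 4096;
const BYTES_PER_SAMPLE = 2; // 16-bit mono

const bytesPerChunk = SAMPLES_PER_CHUNK * BYTES_PER_SAMPLE; // 8192
const chunkDurationMs = (SAMPLES_PER_CHUNK * 1000) / SAMPLE_RATE_HZ; // 256
// Base64 emits 4 characters for every 3 input bytes, rounded up.
const base64CharsPerChunk = 4 * Math.ceil(bytesPerChunk / 3); // 10924

// Splits PCM bytes into views of at most `bytesPerChunk` bytes each.
// The final chunk may be shorter than the others.
function splitIntoChunks(pcmBytes) {
  const chunks = [];
  for (let offset = 0; offset < pcmBytes.length; offset += bytesPerChunk) {
    chunks.push(pcmBytes.subarray(offset, offset + bytesPerChunk));
  }
  return chunks;
}
```

subarray returns views over the original buffer, so no audio data is copied while chunking.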

Live Mode Messages

Live Mode messages include additional fields for session tracking.

Text Message (Live Mode):

{
  "avatar_id": "avatar-id-123",
  "user_id": "user-123",
  "chat_type": "text",
  "language": "en",
  "request": "Tell me about the weather",
  "requestType": "text",
  "source": "my-source-app-name",
  "isLive": true,
  "LicenseId": "license-abc-123",
  "session": "session-identifier"
}

Audio Message with PCM (Live Mode):

{
  "avatar_id": "avatar-id-123",
  "user_id": "user-123",
  "chat_type": "voice",
  "language": "en",
  "requestType": "audio",
  "source": "my-source-app-name",
  "isLive": true,
  "LicenseId": "license-abc-123",
  "audio_base64_pcm": "//uQxAAAAAANIAAAAAExBTUUzLjEwMFVV...",
  "session": "session-identifier"
}

Field	Type	Required	Description
`isLive`	boolean	Yes	`true` for Live Mode
`LicenseId`	string	Yes	License ID from `startLiveSession`
`audio_base64_pcm`	string	For audio	Base64-encoded PCM audio data
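
One way to keep the Live Mode fields consistent across messages is to build them from a shared base object. The sketch below uses hypothetical helper names; the field names and values are the ones shown in the two examples above.

```javascript
// `base` holds the fields common to every message:
// { avatar_id, user_id, language, source, session }

function buildLiveTextMessage(base, licenseId, text) {
  return {
    ...base,
    chat_type: "text",
    requestType: "text",
    request: text,
    isLive: true,
    LicenseId: licenseId,
  };
}

function buildLiveAudioMessage(base, licenseId, audioBase64Pcm) {
  return {
    ...base,
    chat_type: "voice",
    requestType: "audio",
    isLive: true,
    LicenseId: licenseId,
    audio_base64_pcm: audioBase64Pcm,
  };
}
```

Assuming messages travel as JSON text over an open WebSocket, sending one would be socket.send(JSON.stringify(buildLiveTextMessage(base, licenseId, "Tell me about the weather"))).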

Tip:

Live Mode can also send standard blob audio (WAV/WebM). In that case, use the standard audio message format with file_base64_data instead of audio_base64_pcm, set isLive to true, and include LicenseId. See the WebSocket API for the full message structure.

Voice Interaction Modes

In Live Mode, users can interact with the avatar using voice through several approaches:

Voice Activity Detection (VAD): The microphone activates automatically and detects speech. See Voice Activity Detection for details.
Push-to-talk: The user holds a button to record audio.
Record and send: The user records a clip, then sends it.
