Persona

Persona Context Page - User Manual


Overview

The Persona page (located at Context → Persona) allows you to create and manage AI personality templates for your voice agents. Each persona defines comprehensive voice characteristics, behavioral parameters, and technical settings based on OpenAI's Realtime API. Personas are reusable templates that can be assigned to multiple agents, ensuring consistent personality and performance across your fleet.


Page Layout

Information Banner

At the top of the page, a dismissible blue information alert explains the purpose of persona templates and how they relate to OpenAI's Realtime API parameters.

Persona Library Table

The main content area displays all configured personas in a sortable table with the following columns:

  • Persona ID: Unique identifier (e.g., PER-001) in monospace blue text

  • Name: Persona name with description displayed below in smaller gray text

  • Voice & Speech: Shows selected voice character (e.g., "shimmer") and speech speed multiplier

  • Model Settings: Displays temperature value and latency hint setting

  • Used By: Number of agents currently using this persona (sortable)

  • Created: Date the persona was created

  • Actions: View, Edit, and Delete buttons for each persona

The table supports pagination (10 personas per page) and can be sorted by clicking column headers.


Creating a New Persona

Step 1: Open Creation Drawer

Click the "Create Persona" button (purple, located in the top-right corner of the Persona Library card) to open the persona configuration drawer on the right side of the screen.

Step 2: Basic Information

Persona Name (Required)

Enter a descriptive name that clearly identifies the persona's purpose and personality type.

  • Example: "Professional & Empathetic Support Agent"

  • Guidelines: Use clear, descriptive names that indicate tone and use case

Description (Required)

Provide a 2-3 sentence description of the personality, ideal use cases, and scenarios where this persona excels.

  • Example: "Warm, professional tone with high empathy. Ideal for customer support and service recovery."


Step 3: Voice & Speech Configuration

This section controls how your agent sounds and speaks. Click the section header to expand/collapse.

Voice Character (Required)

Select the AI voice that will be used for text-to-speech synthesis. Each voice has distinct characteristics:

  • Alloy - Neutral, balanced tone

  • Ash - Smooth, professional

  • Ballad - Warm, storytelling quality

  • Coral - Bright, energetic

  • Echo - Clear, authoritative

  • Sage - Wise, calm

  • Shimmer - Friendly, approachable (Default)

  • Verse - Expressive, dynamic

  • Marin - Natural, conversational

  • Cedar - Deep, reassuring (newest)

Best Practice: Test different voices in the Playground to find the best match for your brand and use case.

Speech Speed

Use the slider to control how fast the AI speaks:

  • Range: 0.25× (very slow) to 4.0× (very fast)

  • Default: 1.0× (normal speed)

  • Recommended: 0.9× - 1.15× for most use cases

  • Use Cases:

    • 0.8× - 0.9×: Technical support, complex instructions

    • 1.0×: Standard customer service

    • 1.1× - 1.2×: Sales, energetic personalities

System Instructions (Required)

This is the most important field: it defines your agent's personality, behavior, and communication style. Be specific and detailed.

Required Elements:

  • Role definition ("You are a...")

  • Tone and style guidance

  • Response length constraints

  • Language complexity level

  • Interaction approach

Example:

You are a helpful and professional customer service agent. Speak in a warm, friendly tone while being efficient. Show empathy for customer concerns and offer proactive solutions. Keep responses to 2-3 sentences per turn. Use clear, simple language and avoid technical jargon unless necessary. Always confirm understanding before proceeding with complex tasks.

Best Practices:

  • Be explicit about desired behavior

  • Include response length guidelines to prevent rambling

  • Specify when to escalate or transfer

  • Define brand voice characteristics
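
One way to keep the required elements from being forgotten is to assemble the instructions from named parts. The sketch below is illustrative only: `InstructionParts` and `build_instructions` are hypothetical names, not part of the product, and the Persona page itself accepts the instructions as free text.

```python
from dataclasses import asdict, dataclass


@dataclass
class InstructionParts:
    """Hypothetical container for the five required elements listed above."""
    role: str       # "You are a..."
    tone: str       # tone and style guidance
    length: str     # response length constraint
    language: str   # language complexity level
    approach: str   # interaction approach


def build_instructions(parts: InstructionParts) -> str:
    """Join the elements into one prompt, rejecting any element left empty."""
    fields = asdict(parts)
    missing = [name for name, text in fields.items() if not text.strip()]
    if missing:
        raise ValueError("missing required elements: " + ", ".join(missing))
    return " ".join(text.strip() for text in fields.values())


parts = InstructionParts(
    role="You are a helpful and professional customer service agent.",
    tone="Speak in a warm, friendly tone while being efficient.",
    length="Keep responses to 2-3 sentences per turn.",
    language="Use clear, simple language and avoid unnecessary jargon.",
    approach="Always confirm understanding before proceeding with complex tasks.",
)
print(build_instructions(parts))
```

The resulting string can be pasted into the System Instructions field and then refined by hand.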


Step 4: Model Behavior

This section controls the AI's response patterns and technical performance.

Temperature

Controls response randomness and creativity:

  • Range: 0.6 to 1.2

  • Default: 0.8 (optimal for voice)

  • Lower values (0.6-0.7): More focused, deterministic, consistent responses

  • Higher values (0.9-1.2): More creative, varied, conversational responses

Recommendation: Keep at 0.8 for most use cases. Adjust only if responses feel too rigid (increase) or too unpredictable (decrease).
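
The adjustment rule above can be sketched as a small helper that nudges the value one step at a time and never leaves the allowed range. The function name, the feedback labels, and the 0.05 step size are assumptions made for illustration; only the range and default come from this page.

```python
TEMP_MIN, TEMP_MAX, TEMP_DEFAULT = 0.6, 1.2, 0.8  # range and default from this page


def adjust_temperature(current: float, feedback: str, step: float = 0.05) -> float:
    """Nudge the temperature based on observed behavior, clamped to the range."""
    if feedback == "too_rigid":
        current += step          # responses feel robotic: increase
    elif feedback == "too_unpredictable":
        current -= step          # responses feel random: decrease
    elif feedback != "ok":
        raise ValueError(f"unknown feedback: {feedback!r}")
    return round(min(TEMP_MAX, max(TEMP_MIN, current)), 2)


print(adjust_temperature(TEMP_DEFAULT, "too_rigid"))   # 0.85
print(adjust_temperature(TEMP_MAX, "too_rigid"))       # 1.2 (already at the ceiling)
```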

Max Response Tokens

Limits the maximum length of agent responses:

  • Unlimited (Default): No token limit

  • 512 tokens: Very concise responses (1-2 sentences)

  • 1024 tokens: Standard responses (2-4 sentences)

  • 2048 tokens: Detailed responses (full paragraph)

  • 4096 tokens: Comprehensive responses (multiple paragraphs)

Best Practice: Set explicit limits to prevent verbose responses and control costs. Use 1024 for most customer service scenarios.

Latency Hints

Balances response quality versus speed:

  • Low - Fastest: Optimizes for speed, minimal latency (use for high-volume, simple interactions)

  • Balanced (Default): Equal priority to speed and quality

  • High - Most Coherent: Prioritizes response quality and thoughtfulness (use for complex problem-solving)

Output Modalities

Determines response format:

  • Text: Agent generates text-only responses

  • Audio: Agent generates voice-only responses

  • Text + Audio (Default): Agent generates both (recommended for full functionality)

Context Truncation

Manages conversation history when approaching token limits:

  • Auto-truncate (Default - ON): Automatically removes oldest messages when nearing limit

  • Error on full: Returns an error when the context is full (requires manual intervention)

Recommendation: Keep auto-truncate enabled in production to prevent conversation failures.
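
The difference between the two modes can be illustrated with a simplified model. This is a sketch, not the platform's implementation: `count_tokens` is a crude word-count stand-in for a real tokenizer, and `append_message` and `ContextFullError` are hypothetical names.

```python
class ContextFullError(RuntimeError):
    """Raised in 'error on full' mode when the history no longer fits."""


def count_tokens(message: str) -> int:
    """Crude stand-in for a tokenizer: one token per whitespace-separated word."""
    return len(message.split())


def append_message(history: list[str], message: str, limit: int,
                   auto_truncate: bool = True) -> list[str]:
    """Return a new history including `message` that respects `limit` tokens."""
    updated = history + [message]
    if sum(count_tokens(m) for m in updated) <= limit:
        return updated
    if not auto_truncate:
        raise ContextFullError("context is full; manual intervention required")
    # Auto-truncate: drop the oldest messages until the history fits again.
    while len(updated) > 1 and sum(count_tokens(m) for m in updated) > limit:
        updated.pop(0)
    return updated


history = ["hello there agent", "hi customer"]
print(append_message(history, "my order is late", limit=6))
# ['hi customer', 'my order is late']
```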


Step 5: Turn Detection (Voice Activity)

This section controls how the system detects when the user has finished speaking, which is critical for natural conversation flow.

Detection Type

Choose how the system identifies speech boundaries:

  • Server VAD - Automatic voice detection (Default): Server-side Voice Activity Detection, fastest and most reliable

  • Semantic VAD - Meaning-based: Uses AI to detect when a complete thought is expressed (best for noisy environments)

  • None - Manual/Push-to-talk: User must manually indicate when finished speaking

Recommendation: Use Server VAD for standard deployments. Use Semantic VAD if customers are in noisy environments (call centers, outdoor settings).

Detection Threshold

Controls sensitivity for detecting speech:

  • Range: 0 (most sensitive) to 1.0 (least sensitive)

  • Default: 0.5 (balanced)

  • Lower values (0.3-0.4): Picks up quiet speech, may trigger on background noise

  • Higher values (0.6-0.8): Ignores background noise better, may miss soft-spoken customers

Best Practice: Start at 0.5, adjust based on testing. Increase if false triggers occur, decrease if missing customer speech.

Prefix Padding

Amount of audio, in milliseconds, captured before the detected start of speech and included in the input:

  • Range: 0ms to 1000ms (1 second)

  • Default: 300ms

  • Purpose: Prevents cutting off the first few words when customer starts speaking

Recommendation: Keep at 300ms to ensure natural conversation starts.

Silence Duration

Milliseconds of silence before considering speech ended:

  • Range: 100ms (0.1 seconds) to 2000ms (2 seconds)

  • Default: 200ms

  • Shorter durations (100-200ms): Faster responses, agent may interrupt mid-thought

  • Longer durations (500-1000ms): More patient, fewer interruptions, slower response time

Best Practice: Use 200-300ms for most cases. Increase to 500ms+ if customers need time to think or pause frequently.
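
To see how the threshold, prefix padding, and silence duration interact, the toy simulation below runs a simple energy-based detector over a list of per-frame loudness levels. It is a deliberately simplified model, not how Server VAD is actually implemented; the function name, the 100ms frame size, and the sample levels are all assumptions.

```python
def detect_turns(levels, frame_ms=100, threshold=0.5,
                 prefix_padding_ms=300, silence_duration_ms=200):
    """Return (start_ms, end_ms) pairs; end_ms is when end-of-turn is declared."""
    turns = []
    start = None   # start of the current turn, including prefix padding
    silence = 0    # consecutive silence accumulated inside the current turn
    for i, level in enumerate(levels):
        t = i * frame_ms  # start time of this frame
        if level >= threshold:
            if start is None:
                start = max(0, t - prefix_padding_ms)
            silence = 0
        elif start is not None:
            silence += frame_ms
            if silence >= silence_duration_ms:
                turns.append((start, t + frame_ms))
                start, silence = None, 0
    if start is not None:  # audio ended while the user was still mid-turn
        turns.append((start, len(levels) * frame_ms))
    return turns


# Speech with one brief 100ms pause in the middle (the 0.2 frame).
levels = [0.1, 0.1, 0.1, 0.1, 0.9, 0.9, 0.2, 0.9, 0.1, 0.1, 0.1]
print(detect_turns(levels, silence_duration_ms=200))  # [(100, 1000)]
print(detect_turns(levels, silence_duration_ms=100))  # [(100, 700), (400, 900)]
```

With the shorter silence duration, the brief pause is treated as the end of a turn, which is the "agent may interrupt mid-thought" behavior described above.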

Auto-create Response

When enabled (default: ON), the system automatically generates an AI response once it detects that the user has finished speaking. Disable only for push-to-talk scenarios.

Allow Interruptions

When enabled (default: ON), allows users to interrupt the AI mid-response. Recommended for natural conversation flow.
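
For readers who also work with the Realtime API directly, the settings in this step roughly correspond to a `turn_detection` object in the session configuration. The sketch below is a hedged illustration: the field names follow OpenAI's published Realtime API documentation as best understood here and may differ between API versions, the helper `turn_detection_config` is hypothetical, and how this product maps its form fields (for example, Allow Interruptions to `interrupt_response`) is an assumption.

```python
import json

DETECTION_TYPES = {"server_vad", "semantic_vad", "none"}


def turn_detection_config(detection_type="server_vad", threshold=0.5,
                          prefix_padding_ms=300, silence_duration_ms=200,
                          create_response=True, interrupt_response=True):
    """Build a turn_detection dict from this page's defaults (None = push-to-talk)."""
    if detection_type not in DETECTION_TYPES:
        raise ValueError(f"unknown detection type: {detection_type!r}")
    if detection_type == "none":
        return None  # manual mode: the client signals when the user is done
    config = {
        "type": detection_type,
        "create_response": create_response,
        "interrupt_response": interrupt_response,
    }
    if detection_type == "server_vad":
        # Assumed to apply only to Server VAD; Semantic VAD is tuned differently.
        config.update(threshold=threshold,
                      prefix_padding_ms=prefix_padding_ms,
                      silence_duration_ms=silence_duration_ms)
    return config


print(json.dumps(turn_detection_config(), indent=2))
```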


Step 6: Audio Settings

This section configures audio input/output formats and transcription.

Input Audio Format

Format for audio received from the user:

  • PCM16 - 16-bit PCM at 24kHz (Default): Standard format, widely compatible

  • G.711 μ-law: Telephony standard (North America/Japan)

  • G.711 A-law: Telephony standard (Europe/rest of world)

  • Opus - Lowest latency: Compressed format with minimal delay (best for production)

Recommendation: Use PCM16 for testing, switch to Opus for production deployments.
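
The bandwidth gap between the uncompressed and telephony formats follows from simple arithmetic. The sketch below assumes mono audio and uses the standard G.711 parameters (8 kHz, 8 bits per sample); Opus is omitted because its bitrate is variable and depends on encoder settings.

```python
def pcm_bitrate_kbps(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> float:
    """Raw bitrate of an uncompressed or companded stream, in kilobits per second."""
    return sample_rate_hz * bits_per_sample * channels / 1000


pcm16 = pcm_bitrate_kbps(24_000, 16)  # PCM16 at 24kHz, as listed above
g711 = pcm_bitrate_kbps(8_000, 8)     # G.711 mu-law or A-law

print(f"PCM16: {pcm16:.0f} kbps")     # PCM16: 384 kbps
print(f"G.711: {g711:.0f} kbps")      # G.711: 64 kbps
print(f"ratio: {pcm16 / g711:.0f}x")  # ratio: 6x
```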

Output Audio Format

Format for audio sent to the user:

  • PCM16 - Standard: Uncompressed, high quality

  • G.711 μ-law / A-law: Telephony formats

  • Opus - Best for production (Recommended): Low latency, efficient bandwidth

  • MP3: Compressed, universal compatibility

  • WAV: Uncompressed, largest file size

Recommendation: Use Opus in production for optimal latency and quality.

Enable Transcription

When enabled, uses OpenAI Whisper to transcribe user audio to text:

  • Default: Disabled (OFF)

  • Use Cases: Better conversation logging, sentiment analysis, compliance recording

  • Note: Adds a small amount of latency and incurs additional API costs

Transcription Language (Optional)

Provide a language hint (ISO code) for improved transcription accuracy:

  • Auto-detect (no selection) works for most cases

  • Specify language for better accuracy with accents or multilingual customers

  • Available: English, Spanish, French, German, Italian, Portuguese, Chinese, Japanese, Korean
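
The defaults and ranges described in Steps 2 through 6 can be collected into a single structure, which is handy for reviewing a persona before entering it in the drawer. This is a sketch under stated assumptions: the `Persona` class, its field names, and the `validate` method are hypothetical and do not reflect the product's actual data model. Only settings with a default stated on this page are included.

```python
from dataclasses import dataclass
from typing import Optional

VOICES = {"alloy", "ash", "ballad", "coral", "echo",
          "sage", "shimmer", "verse", "marin", "cedar"}


@dataclass
class Persona:
    name: str                                   # required
    description: str                            # required
    instructions: str                           # required
    voice: str = "shimmer"
    speed: float = 1.0                          # 0.25 to 4.0
    temperature: float = 0.8                    # 0.6 to 1.2
    max_response_tokens: Optional[int] = None   # None means unlimited
    latency_hint: str = "balanced"
    auto_truncate: bool = True
    detection_type: str = "server_vad"
    threshold: float = 0.5                      # 0 to 1.0
    prefix_padding_ms: int = 300                # 0 to 1000
    silence_duration_ms: int = 200              # 100 to 2000
    input_audio_format: str = "pcm16"
    transcription: bool = False

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the persona looks valid."""
        errors = []
        for label in ("name", "description", "instructions"):
            if not getattr(self, label).strip():
                errors.append(f"{label} is required")
        if self.voice not in VOICES:
            errors.append(f"unknown voice: {self.voice}")
        ranges = {
            "speed": (0.25, 4.0),
            "temperature": (0.6, 1.2),
            "threshold": (0, 1.0),
            "prefix_padding_ms": (0, 1000),
            "silence_duration_ms": (100, 2000),
        }
        for label, (low, high) in ranges.items():
            if not low <= getattr(self, label) <= high:
                errors.append(f"{label} must be between {low} and {high}")
        return errors


draft = Persona("Support Agent", "Warm, professional tone.", "You are a support agent.",
                temperature=1.5)
print(draft.validate())  # ['temperature must be between 0.6 and 1.2']
```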


Editing Existing Personas

  1. Locate the persona in the Persona Library table

  2. Click the "Edit" button in the Actions column

  3. The configuration drawer opens with all current settings pre-populated

  4. Modify any fields as needed

  5. Click "Update Persona" to save changes

  6. Click "Cancel" to discard changes

Important: Changes to personas affect all agents currently using that persona. Test thoroughly in Mirror Mode before updating production personas.


Deleting Personas

  1. Locate the persona in the table

  2. Click the red "Delete" button in the Actions column

  3. If no agents are using the persona, it is deleted immediately (there is no confirmation dialog)

Warning: You cannot delete personas that are currently in use by active agents. The system will prevent deletion and display the number of agents using the persona.

Best Practice: Before deleting a persona, verify it's not assigned to any agents by checking the "Used By" column.
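
The deletion rule can be summarized in a few lines. This is an illustrative model only: the `library` dictionary (persona ID mapped to its "Used By" count), `delete_persona`, and `PersonaInUseError` are hypothetical names, not the product's API.

```python
class PersonaInUseError(Exception):
    """Raised when a persona is still assigned to at least one agent."""


def delete_persona(library: dict[str, int], persona_id: str) -> None:
    """Delete a persona unless agents still use it; `library` maps ID -> used-by count."""
    used_by = library[persona_id]
    if used_by > 0:
        raise PersonaInUseError(f"{persona_id} is used by {used_by} agent(s)")
    del library[persona_id]  # no confirmation step, mirroring the behavior above


library = {"PER-001": 3, "PER-002": 0}
delete_persona(library, "PER-002")
print(library)  # {'PER-001': 3}
```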


Viewing Persona Details

Click the "View" button to see a read-only summary of the persona's complete configuration without opening the edit drawer.


Best Practices

Persona Design

  1. Start with templates: Use the four default personas as starting points

  2. Test iteratively: Create persona → test in Playground → refine → test again

  3. Document use cases: Use clear descriptions to help team members select appropriate personas

  4. Maintain consistency: Create distinct personas for different use cases rather than trying to make one persona do everything

System Instructions

  1. Be specific: Vague instructions lead to unpredictable behavior

  2. Set boundaries: Explicitly state what the agent should and shouldn't do

  3. Control verbosity: Always include response length guidelines ("Keep responses to 2-3 sentences")

  4. Include escalation triggers: Define when to transfer to human agents

  5. Test edge cases: Verify behavior with difficult or unusual customer inputs

Technical Settings

  1. Don't over-tune: Start with defaults, adjust only based on observed behavior

  2. Test in production conditions: Silence thresholds perform differently in quiet offices vs. noisy call centers

  3. Monitor latency: Higher quality settings may introduce delay, so balance quality against responsiveness

  4. Use Opus in production: Always switch to Opus audio format before deploying to reduce latency

Maintenance

  1. Version control: When making significant changes, create a new persona instead of modifying existing ones

  2. A/B test: Use Mirror Mode to compare persona variations before rolling out changes

  3. Monitor performance: Track quality scores and customer satisfaction by persona to identify optimization opportunities

  4. Regular reviews: Review all personas quarterly to ensure they align with current brand voice and business needs


Common Issues and Solutions

| Issue | Likely Cause | Solution |
|---|---|---|
| Agent interrupts customers frequently | Silence duration too short | Increase silence duration to 500-700ms |
| Agent misses first word when customer starts speaking | Prefix padding too low | Increase prefix padding to 400-500ms |
| Responses are too robotic | Temperature too low | Increase temperature to 0.85-0.9 |
| Responses are inconsistent or random | Temperature too high | Decrease temperature to 0.7-0.75 |
| Agent talks too much | No token limit set | Set max response tokens to 1024 |
| Slow response time | Latency hints set to "High" | Change to "Balanced" or "Low" |
| Background noise triggers false detections | VAD threshold too low | Increase threshold to 0.6-0.7 |


Related Features

  • Agent Playground (Tools → Playground): Test personas in real-time conversations before deployment

  • Create Agent Wizard (Agent Management → Create Agents): Assign personas during agent creation

  • Mirror Mode (Agent Management → Deploy Agents): Compare persona performance with production agents


Next Steps: After creating your persona, navigate to Agent Management → Create Agents to build an agent using this personality template.
