Persona
Persona Context Page - User Manual
Overview
The Persona page (located at Context → Persona) allows you to create and manage AI personality templates for your voice agents. Each persona defines comprehensive voice characteristics, behavioral parameters, and technical settings based on OpenAI's Realtime API. Personas are reusable templates that can be assigned to multiple agents, ensuring consistent personality and performance across your fleet.
Page Layout
Information Banner
At the top of the page, you'll find a dismissible blue information alert that explains the purpose of Persona templates and how they relate to OpenAI's Realtime API parameters.
Persona Library Table
The main content area displays all configured personas in a sortable table with the following columns:
Persona ID: Unique identifier (e.g., PER-001) in monospace blue text
Name: Persona name with description displayed below in smaller gray text
Voice & Speech: Shows selected voice character (e.g., "shimmer") and speech speed multiplier
Model Settings: Displays temperature value and latency hint setting
Used By: Number of agents currently using this persona (sortable)
Created: Date the persona was created
Actions: View, Edit, and Delete buttons for each persona
The table supports pagination (10 personas per page) and can be sorted by clicking column headers.
Creating a New Persona
Step 1: Open Creation Drawer
Click the "Create Persona" button (purple, located in the top-right corner of the Persona Library card) to open the persona configuration drawer on the right side of the screen.
Step 2: Basic Information
Persona Name (Required)
Enter a descriptive name that clearly identifies the persona's purpose and personality type.
Example: "Professional & Empathetic Support Agent"
Guidelines: Use clear, descriptive names that indicate tone and use case
Description (Required)
Provide a 2-3 sentence description of the personality, ideal use cases, and scenarios where this persona excels.
Example: "Warm, professional tone with high empathy. Ideal for customer support and service recovery."
Step 3: Voice & Speech Configuration
This section controls how your agent sounds and speaks. Click the section header to expand/collapse.
Voice Character (Required)
Select the AI voice that will be used for text-to-speech synthesis. Each voice has distinct characteristics:
Alloy - Neutral, balanced tone
Ash - Smooth, professional
Ballad - Warm, storytelling quality
Coral - Bright, energetic
Echo - Clear, authoritative
Sage - Wise, calm
Shimmer - Friendly, approachable (Default)
Verse - Expressive, dynamic
Marin - Natural, conversational
Cedar - Deep, reassuring (newest)
Best Practice: Test different voices in the Playground to find the best match for your brand and use case.
Speech Speed
Use the slider to control how fast the AI speaks:
Range: 0.25× (very slow) to 4.0× (very fast)
Default: 1.0× (normal speed)
Recommended: 0.9× - 1.15× for most use cases
Use Cases:
0.8× - 0.9×: Technical support, complex instructions
1.0×: Standard customer service
1.1× - 1.2×: Sales, energetic personalities
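The slider limits above can be expressed as a small check. This is a minimal sketch for scripts that prepare persona settings outside the UI; the function name and the "recommended"/"allowed" labels are illustrative assumptions, not part of the product.

```python
# Limits taken from the Speech Speed description above.
SPEED_MIN, SPEED_MAX = 0.25, 4.0
RECOMMENDED_MIN, RECOMMENDED_MAX = 0.9, 1.15


def check_speech_speed(speed: float) -> str:
    """Classify a speed multiplier as 'recommended' or 'allowed'.

    Raises ValueError when the value falls outside the slider range.
    (Hypothetical helper; the Persona page performs its own validation.)
    """
    if not SPEED_MIN <= speed <= SPEED_MAX:
        raise ValueError(f"speed {speed} is outside {SPEED_MIN}-{SPEED_MAX}")
    if RECOMMENDED_MIN <= speed <= RECOMMENDED_MAX:
        return "recommended"
    return "allowed"
```

A value such as 0.8× for technical support is classified as allowed rather than recommended, which is a prompt to test it in the Playground before rollout.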
System Instructions (Required)
This is the most important field—it defines your agent's personality, behavior, and communication style. Be specific and detailed.
Required Elements:
Role definition ("You are a...")
Tone and style guidance
Response length constraints
Language complexity level
Interaction approach
Example:
You are a helpful and professional customer service agent. Speak in a warm, friendly tone while being efficient. Show empathy for customer concerns and offer proactive solutions. Keep responses to 2-3 sentences per turn. Use clear, simple language and avoid technical jargon unless necessary. Always confirm understanding before proceeding with complex tasks.
Best Practices:
Be explicit about desired behavior
Include response length guidelines to prevent rambling
Specify when to escalate or transfer
Define brand voice characteristics
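One way to make sure none of the five required elements is forgotten is to assemble the instructions from named parts. The sketch below is an assumption about how a team might draft instructions before pasting them into the System Instructions field; the class and function names are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass
class InstructionParts:
    """The five required elements listed above (hypothetical container)."""
    role: str             # "You are a ..."
    tone: str             # tone and style guidance
    response_length: str  # response length constraint
    language_level: str   # language complexity level
    interaction: str      # interaction approach


def build_instructions(parts: InstructionParts) -> str:
    """Join the parts into one instruction string, rejecting empty elements."""
    missing = [f.name for f in fields(parts) if not getattr(parts, f.name).strip()]
    if missing:
        raise ValueError("missing required elements: " + ", ".join(missing))
    return " ".join(getattr(parts, f.name).strip() for f in fields(parts))
```

For example, passing the role "You are a helpful and professional customer service agent." together with the four other elements yields a single paragraph in the same order as the checklist.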
Step 4: Model Behavior
This section controls the AI's response patterns and technical performance.
Temperature
Controls response randomness and creativity:
Range: 0.6 to 1.2
Default: 0.8 (optimal for voice)
Lower values (0.6-0.7): More focused, deterministic, consistent responses
Higher values (0.9-1.2): More creative, varied, conversational responses
Recommendation: Keep at 0.8 for most use cases. Adjust only if responses feel too rigid (increase) or too unpredictable (decrease).
Max Response Tokens
Limits the maximum length of agent responses:
Unlimited (Default): No token limit
512 tokens: Very concise responses (1-2 sentences)
1024 tokens: Standard responses (2-4 sentences)
2048 tokens: Detailed responses (full paragraph)
4096 tokens: Comprehensive responses (multiple paragraphs)
Best Practice: Set explicit limits to prevent verbose responses and control costs. Use 1024 for most customer service scenarios.
Latency Hints
Balances response quality versus speed:
Low - Fastest: Optimizes for speed, minimal latency (use for high-volume, simple interactions)
Balanced (Default): Equal priority to speed and quality
High - Most Coherent: Prioritizes response quality and thoughtfulness (use for complex problem-solving)
Output Modalities
Determines response format:
Text: Agent generates text-only responses
Audio: Agent generates voice-only responses
Text + Audio (Default): Agent generates both (recommended for full functionality)
Context Truncation
Manages conversation history when approaching token limits:
Auto-truncate (Default - ON): Automatically removes oldest messages when nearing limit
Error on full: Throws error when context is full (requires manual intervention)
Recommendation: Keep auto-truncate enabled in production to prevent conversations from failing when the context fills up.
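The Model Behavior settings and their documented defaults can be summarized as a single record. This is a sketch under stated assumptions: the field names and the lowercase option strings are illustrative and are not confirmed identifiers from the product or from OpenAI's API.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Allowed options as described in Step 4 (string spellings are assumptions).
ALLOWED_MAX_TOKENS = {None, 512, 1024, 2048, 4096}
ALLOWED_LATENCY = {"low", "balanced", "high"}
ALLOWED_MODALITIES = {("text",), ("audio",), ("text", "audio")}


@dataclass
class ModelBehavior:
    temperature: float = 0.8
    max_response_tokens: Optional[int] = None  # None means unlimited
    latency_hint: str = "balanced"
    modalities: Tuple[str, ...] = ("text", "audio")
    auto_truncate: bool = True

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the settings are valid."""
        problems = []
        if not 0.6 <= self.temperature <= 1.2:
            problems.append("temperature must be between 0.6 and 1.2")
        if self.max_response_tokens not in ALLOWED_MAX_TOKENS:
            problems.append("max_response_tokens must be unlimited, 512, 1024, 2048, or 4096")
        if self.latency_hint not in ALLOWED_LATENCY:
            problems.append("latency_hint must be low, balanced, or high")
        if self.modalities not in ALLOWED_MODALITIES:
            problems.append("modalities must be text, audio, or text + audio")
        return problems
```

A customer service persona following the recommendations above would be `ModelBehavior(max_response_tokens=1024)`, leaving every other field at its default.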
Step 5: Turn Detection (Voice Activity)
This section controls how the system detects when the user has finished speaking—critical for natural conversation flow.
Detection Type
Choose how the system identifies speech boundaries:
Server VAD - Automatic voice detection (Default): Server-side Voice Activity Detection, fastest and most reliable
Semantic VAD - Meaning-based: Uses AI to detect when a complete thought is expressed (best for noisy environments)
None - Manual/Push-to-talk: User must manually indicate when finished speaking
Recommendation: Use Server VAD for standard deployments. Use Semantic VAD if customers are in noisy environments (call centers, outdoor settings).
Detection Threshold
Controls sensitivity for detecting speech:
Range: 0 (most sensitive) to 1.0 (least sensitive)
Default: 0.5 (balanced)
Lower values (0.3-0.4): Picks up quiet speech, may trigger on background noise
Higher values (0.6-0.8): Ignores background noise better, may miss soft-spoken customers
Best Practice: Start at 0.5, adjust based on testing. Increase if false triggers occur, decrease if missing customer speech.
Prefix Padding
Milliseconds of audio to include from before the point where speech is detected:
Range: 0ms to 1000ms (1 second)
Default: 300ms
Purpose: Prevents cutting off the first few words when customer starts speaking
Recommendation: Keep at 300ms to ensure natural conversation starts.
Silence Duration
Milliseconds of silence required before speech is considered to have ended:
Range: 100ms (0.1 seconds) to 2000ms (2 seconds)
Default: 200ms
Shorter durations (100-200ms): Faster responses, agent may interrupt mid-thought
Longer durations (500-1000ms): More patient, fewer interruptions, slower response time
Best Practice: Use 200-300ms for most cases. Increase to 500ms+ if customers need time to think or pause frequently.
Auto-create Response
When enabled (default: ON), automatically generates an AI response once the system detects that the user has finished speaking. Disable only for push-to-talk scenarios.
Allow Interruptions
When enabled (default: ON), allows users to interrupt the AI mid-response. Recommended for natural conversation flow.
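The Step 5 settings can be collected and range-checked in one place, as in the sketch below. The key names are modeled on the `turn_detection` object in OpenAI's Realtime API session configuration; how the Persona page actually serializes these settings is not documented here, so treat the output shape as an assumption. The function name is hypothetical.

```python
from typing import Optional


def build_turn_detection(
    detection_type: str = "server_vad",
    threshold: float = 0.5,
    prefix_padding_ms: int = 300,
    silence_duration_ms: int = 200,
    create_response: bool = True,
    allow_interruptions: bool = True,
) -> Optional[dict]:
    """Validate the Step 5 settings and return them as a dict.

    Returns None for manual/push-to-talk, where no automatic detection runs.
    Defaults and ranges follow the values listed in this section.
    """
    if detection_type == "none":
        return None
    if detection_type not in ("server_vad", "semantic_vad"):
        raise ValueError(f"unknown detection type: {detection_type}")
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1.0")
    if not 0 <= prefix_padding_ms <= 1000:
        raise ValueError("prefix padding must be 0-1000 ms")
    if not 100 <= silence_duration_ms <= 2000:
        raise ValueError("silence duration must be 100-2000 ms")
    return {
        "type": detection_type,
        "threshold": threshold,
        "prefix_padding_ms": prefix_padding_ms,
        "silence_duration_ms": silence_duration_ms,
        "create_response": create_response,
        # Assumed mapping of the "Allow Interruptions" toggle.
        "interrupt_response": allow_interruptions,
    }
```

For customers who pause frequently, `build_turn_detection(silence_duration_ms=500)` reflects the longer silence duration suggested above while keeping the other defaults.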
Step 6: Audio Settings
This section configures audio input/output formats and transcription.
Input Audio Format
Format for audio received from the user:
PCM16 - 16-bit PCM at 24kHz (Default): Standard format, widely compatible
G.711 μ-law: Telephony standard (North America/Japan)
G.711 A-law: Telephony standard (Europe/rest of world)
Opus - Lowest latency: Compressed format with minimal delay (best for production)
Recommendation: Use PCM16 for testing, switch to Opus for production deployments.
Output Audio Format
Format for audio sent to the user:
PCM16 - Standard: Uncompressed, high quality
G.711 μ-law / A-law: Telephony formats
Opus - Best for production (Recommended): Low latency, efficient bandwidth
MP3: Compressed, universal compatibility
WAV: Uncompressed, largest file size
Recommendation: Use Opus in production for optimal latency and quality.
Enable Transcription
When enabled, uses OpenAI Whisper to transcribe user audio to text:
Default: Disabled (OFF)
Use Cases: Better conversation logging, sentiment analysis, compliance recording
Note: Adds a small amount of latency and incurs additional API costs
Transcription Language (Optional)
Provide a language hint (ISO code) for improved transcription accuracy:
Auto-detect (no selection) works for most cases
Specify language for better accuracy with accents or multilingual customers
Available: English, Spanish, French, German, Italian, Portuguese, Chinese, Japanese, Korean
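The language hint is an ISO code, so the nine available languages map to ISO 639-1 codes as sketched below. The function name and the returned dictionary shape are illustrative assumptions; only the language list and the default (transcription off, auto-detect when no language is selected) come from this section.

```python
from typing import Optional

# ISO 639-1 codes for the languages listed above.
LANGUAGE_CODES = {
    "English": "en",
    "Spanish": "es",
    "French": "fr",
    "German": "de",
    "Italian": "it",
    "Portuguese": "pt",
    "Chinese": "zh",
    "Japanese": "ja",
    "Korean": "ko",
}


def transcription_settings(enabled: bool = False, language: Optional[str] = None) -> dict:
    """Return transcription settings; language=None means auto-detect.

    Hypothetical helper: the dict keys are not a documented product format.
    """
    if not enabled:
        return {"enabled": False}
    if language is None:
        return {"enabled": True, "language": None}
    if language not in LANGUAGE_CODES:
        raise ValueError(f"unsupported transcription language: {language}")
    return {"enabled": True, "language": LANGUAGE_CODES[language]}
```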
Editing Existing Personas
Locate the persona in the Persona Library table
Click the "Edit" button in the Actions column
The configuration drawer opens with all current settings pre-populated
Modify any fields as needed
Click "Update Persona" to save changes
Click "Cancel" to discard changes
Important: Changes to personas affect all agents currently using that persona. Test thoroughly in Mirror Mode before updating production personas.
Deleting Personas
Locate the persona in the table
Click the red "Delete" button in the Actions column
The persona is immediately deleted (no confirmation dialog)
Warning: You cannot delete personas that are currently in use by active agents. The system will prevent deletion and display the number of agents using the persona.
Best Practice: Before deleting a persona, verify it's not assigned to any agents by checking the "Used By" column.
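Because deletion has no confirmation dialog, it can help to list in advance which personas are safe to remove. The sketch below assumes the Persona Library table has been copied into simple records; the `PersonaRow` type and its field names are hypothetical and mirror the table columns only loosely.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PersonaRow:
    persona_id: str  # e.g. "PER-001"
    name: str
    used_by: int     # value of the "Used By" column


def safe_to_delete(rows: List[PersonaRow]) -> List[str]:
    """Return the IDs of personas that no agent is currently using."""
    return [row.persona_id for row in rows if row.used_by == 0]
```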
Viewing Persona Details
Click the "View" button to see a read-only summary of the persona's complete configuration without opening the edit drawer.
Best Practices
Persona Design
Start with templates: Use the four default personas as starting points
Test iteratively: Create persona → test in Playground → refine → test again
Document use cases: Use clear descriptions to help team members select appropriate personas
Maintain consistency: Create distinct personas for different use cases rather than trying to make one persona do everything
System Instructions
Be specific: Vague instructions lead to unpredictable behavior
Set boundaries: Explicitly state what the agent should and shouldn't do
Control verbosity: Always include response length guidelines ("Keep responses to 2-3 sentences")
Include escalation triggers: Define when to transfer to human agents
Test edge cases: Verify behavior with difficult or unusual customer inputs
Technical Settings
Don't over-tune: Start with defaults, adjust only based on observed behavior
Test in production conditions: Silence thresholds perform differently in quiet offices vs. noisy call centers
Monitor latency: Higher quality settings may introduce delay—balance quality vs. responsiveness
Use Opus in production: Always switch to Opus audio format before deploying to reduce latency
Maintenance
Version control: When making significant changes, create a new persona instead of modifying existing ones
A/B test: Use Mirror Mode to compare persona variations before rolling out changes
Monitor performance: Track quality scores and customer satisfaction by persona to identify optimization opportunities
Regular reviews: Review all personas quarterly to ensure they align with current brand voice and business needs
Common Issues and Solutions
Issue | Likely Cause | Solution
--- | --- | ---
Agent interrupts customers frequently | Silence duration too short | Increase silence duration to 500-700ms
Agent misses first word when customer starts speaking | Prefix padding too low | Increase prefix padding to 400-500ms
Responses are too robotic | Temperature too low | Increase temperature to 0.85-0.9
Responses are inconsistent or random | Temperature too high | Decrease temperature to 0.7-0.75
Agent talks too much | No token limit set | Set max response tokens to 1024
Slow response time | Latency hints set to "High" | Change to "Balanced" or "Low"
Background noise triggers false detections | VAD threshold too low | Increase threshold to 0.6-0.7
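The numeric fixes listed above can be kept as a lookup so that a reported symptom is checked against a persona's current settings. This is a sketch only: the symptom keys, setting names, and function name are assumptions, and the latency hint row is omitted because its fix is a choice between options rather than a numeric range.

```python
from typing import Optional, Tuple

# symptom -> (setting, low, high), copied from the Common Issues list above.
FIXES = {
    "interrupts customers": ("silence_duration_ms", 500, 700),
    "misses first word": ("prefix_padding_ms", 400, 500),
    "too robotic": ("temperature", 0.85, 0.9),
    "inconsistent": ("temperature", 0.7, 0.75),
    "talks too much": ("max_response_tokens", 1024, 1024),
    "false detections": ("threshold", 0.6, 0.7),
}


def suggest(symptom: str, settings: dict) -> Optional[Tuple[str, float, float]]:
    """Return (setting, low, high) to try, or None if already in that range.

    None suggests the listed cause probably does not explain the symptom.
    """
    setting, low, high = FIXES[symptom]
    current = settings.get(setting)
    if isinstance(current, (int, float)) and low <= current <= high:
        return None
    return setting, low, high
```

For a persona still on the default temperature of 0.8, `suggest("too robotic", {"temperature": 0.8})` points at the 0.85-0.9 range; if the value is already inside the range, the function returns None and another cause is worth investigating.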
Related Features
Agent Playground (Tools → Playground): Test personas in real-time conversations before deployment
Create Agent Wizard (Agent Management → Create Agents): Assign personas during agent creation
Mirror Mode (Agent Management → Deploy Agents): Compare persona performance with production agents
Next Steps: After creating your persona, navigate to Agent Management → Create Agents to build an agent using this personality template.
