NavTalk Product Update: Comprehensive Upgrades Across Five Core Modules

Major Update: This release covers five functional modules (real-time communication, Avatar management, data reporting, API integration, and account security) and previews our next development plans. Notably, we have reduced digital human response latency to approximately 200ms, an industry-leading level that brings the experience close to the fluency of natural human conversation.

1. Module One: Real-Time Communication Feature Optimization

In this update, we have comprehensively optimized the real-time communication features, focusing on faster responses, a simpler integration process, and more stable connections.

1.1 Digital Human Response Speed Optimization

Through deep model optimization and full-link performance tuning, we have raised real-time digital human response speed to an industry-leading level, taking NavTalk's real-time interaction experience to new heights.

Response Latency Breakthrough

In real-time conversation scenarios, response latency is a key indicator of user experience. Through continuous technical optimization, we have brought end-to-end response latency down to approximately 200ms: the complete path from the moment a user finishes speaking to the moment they hear the AI digital human's reply now approaches the fluency of natural human conversation.

Traditional real-time conversation systems typically take 500ms to 1000ms or longer to respond; at roughly 200ms, NavTalk is among the leading real-time digital human systems and close to the rhythm of real human conversation. In practice, users barely perceive any delay, which makes conversations feel more natural and fluid.

Full-Link Technical Optimization

To achieve this breakthrough, we carried out deep optimization across multiple technical layers, delivering end-to-end, full-link performance gains:

  • Model Inference Optimization: We optimized the inference pipeline at multiple levels, refining the model architecture, eliminating unnecessary computational steps, and GPU-accelerating image processing. Together these changes significantly reduce inference latency while preserving response quality.

  • Network Transmission Optimization: Network transport is a critical link in any real-time conversation system. We streamlined the data transmission path so that data reaches the client quickly and reliably.

  • System Architecture Optimization: We also refined the overall system architecture, improving inter-service communication and resource scheduling strategies to lift system-wide performance.

Together, these optimizations let NavTalk achieve very low response latency without sacrificing conversation quality, giving users an interaction experience close to talking with a real person and laying the technical groundwork for more demanding real-time scenarios.

1.2 WebRTC Connection Consolidation

1.2.1 Previous Architecture Issues

Before this optimization, developers had to connect to two independent WebSocket services to implement real-time communication:

  • Real-Time Communication WebSocket: wss://transfer.navtalk.ai/api/realtime-api, used for processing real-time conversation messages and business logic
  • Video Stream Interface: wss://transfer.navtalk.ai/api/webrtc, used for establishing WebRTC connections to obtain video streams

Although this dual-connection architecture was functionally complete, it was inconvenient to work with. Developers had to track the state of both connections and duplicate the connection setup, reconnection, and error-handling logic for each one, which increased code complexity and maintenance cost. Keeping the two connections synchronized was another challenge and could easily lead to inconsistent connection state.

1.2.2 Unified Connection Architecture

We have now merged these two services into a single connection endpoint: wss://transfer.navtalk.ai/wss/v2/realtime-chat. Over this one connection, developers can perform all real-time communication operations, including message exchange and video stream acquisition.

1.2.3 Architecture Optimization Advantages

This architecture optimization brings significant advantages in multiple aspects:

Simplified Connection Management: Developers only need to maintain a single WebSocket connection, which greatly reduces connection-management complexity. There is no longer any state to synchronize between two connections, cutting both code volume and the risk of bugs. Connection establishment, reconnection, error handling, and related logic are all unified on one connection, making the code clearer and easier to maintain.

Improved Development Efficiency: Developers can obtain the sessionId directly from the unified connection and use it to establish the WebRTC connection for the video stream, with no extra requests or coordination logic. The whole flow is more intuitive and efficient, so integration work gets done faster.

Reduced Maintenance Costs: The simplified architecture lowers both development and ongoing maintenance costs. The code is more concise, issues are easier to troubleshoot, and upgrades and optimizations are easier to roll out, which matters greatly for long-term maintenance and iterative development.

This architecture optimization not only simplifies developers' work but also improves system stability and performance, laying a solid foundation for NavTalk's further development.
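
To make the flow concrete, here is a minimal sketch in browser JavaScript that connects once, waits for the sessionId, and negotiates the WebRTC video stream over the same socket. The message type names ('session.created', 'webrtc.offer', 'webrtc.answer') and signaling fields are hypothetical placeholders rather than the documented protocol; please refer to the API Documentation for the actual message schema.

// Minimal sketch of the unified connection flow (illustrative only).
// Message types 'session.created', 'webrtc.offer', and 'webrtc.answer' are hypothetical placeholders.
const ws = new WebSocket('wss://transfer.navtalk.ai/wss/v2/realtime-chat?license=YOUR_LICENSE&name=avatar_name');
const pc = new RTCPeerConnection();

// Ask to receive the Avatar's audio and video, and render them when tracks arrive.
pc.addTransceiver('video', { direction: 'recvonly' });
pc.addTransceiver('audio', { direction: 'recvonly' });
pc.ontrack = (event) => {
  document.querySelector('#avatar-video').srcObject = event.streams[0];
};

ws.onmessage = async (event) => {
  const msg = JSON.parse(event.data);

  if (msg.type === 'session.created') {
    // Use the sessionId delivered on the unified connection to start WebRTC signaling.
    const offer = await pc.createOffer();
    await pc.setLocalDescription(offer);
    ws.send(JSON.stringify({ type: 'webrtc.offer', sessionId: msg.sessionId, sdp: offer.sdp }));
  } else if (msg.type === 'webrtc.answer') {
    await pc.setRemoteDescription({ type: 'answer', sdp: msg.sdp });
  }
  // Conversation messages and other event types arrive on this same connection.
};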

1.3 Intelligent Parameter Configuration

To simplify the developer experience, we have made connection parameter handling more intelligent.

Required Parameters:

  • license: Authorization code, used for identity verification and authorization management.

  • name: Avatar name, specifying the digital human character to use. This is the core connection parameter: it determines which Avatar takes part in the conversation, and the system loads the corresponding configuration and resources based on it.

Optional Parameters:

  • model: Specifies the language model to use. This parameter is optional; if it is not provided, the system uses the default value gpt-realtime-mini. Developers can choose a model to match their needs, for example a more capable model where performance matters most, or a lightweight model where cost matters most.

Default Value Mechanism

We have introduced a default value mechanism to make the connection process more convenient and flexible. When you only specify the name parameter (Avatar name) without other optional parameters, the system will automatically use the default model and voice configured for that Avatar.

Usage Examples

The following code examples demonstrate two connection methods:

// Method 1: Full parameter connection
// Suitable for scenarios requiring explicit specification of all parameters, such as temporarily using different models
const ws = new WebSocket('wss://transfer.navtalk.ai/wss/v2/realtime-chat?license=YOUR_LICENSE&name=avatar_name&model=gpt-realtime-mini');

// Method 2: Only specify required parameters, use default configuration (Recommended)
// Suitable for most scenarios, the system will automatically use the Avatar's default configuration
const ws = new WebSocket('wss://transfer.navtalk.ai/wss/v2/realtime-chat?license=YOUR_LICENSE&name=avatar_name');

This intelligent parameter mechanism preserves full functionality and flexibility while greatly simplifying the developer experience, making NavTalk integration faster and more straightforward.

1.4 Message Format Optimization

To provide a clearer and more consistent messaging experience, we have unified the envelope used for all returned messages. This change integrates the OpenAI Realtime API today while preparing for the upcoming ElevenLabs integration.
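
As a rough illustration only, the sketch below shows how client code might dispatch on a unified message envelope. The field names ('type', 'payload') and the event names are hypothetical placeholders, not the actual NavTalk schema.

// Hypothetical sketch of dispatching on a unified message envelope.
// The 'type' and 'payload' fields and the event names are placeholders, not the documented schema.
ws.onmessage = (event) => {
  const msg = JSON.parse(event.data);
  switch (msg.type) {
    case 'response.text':   // assumed: a text reply from the digital human
      console.log('Avatar said:', msg.payload);
      break;
    case 'error':           // assumed: a unified error event
      console.error('NavTalk error:', msg.payload);
      break;
    default:
      console.debug('Unhandled message type:', msg.type);
  }
};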

For more detailed information, please refer to the API Documentation to learn about message format specifications and usage examples.

2. Module Two: Avatar Management Features

The introduction of Avatar sharing and import features makes collaboration between users more convenient. Now, you can easily share your carefully configured Avatar with others, or quickly import Avatars shared by others.

2.1 Sharing Feature

The sharing feature supports one-click generation of a sharing link or sharing code, so you can quickly share an Avatar you have configured. A shared Avatar carries its complete configuration (model, voice, appearance, and all other settings), ensuring recipients get an experience identical to the original Avatar.

2.2 Import Feature

The import feature lets you quickly import an Avatar shared by someone else via a sharing link or sharing code. Imported Avatars are ready to use without reconfiguration: the system automatically applies and synchronizes all configuration information so the imported Avatar stays consistent with the original.

These features encourage communication and collaboration between users and make Avatars easier to reuse and share.

3. Module Three: Data Reporting Features

To help users better manage and analyze business data, we have added report export features that make it easy to export business data for analysis across different scenarios.

3.1 Conversation Record Report

The conversation record report feature allows you to export the complete conversation history between users and Avatars, providing strong support for data analysis and business decision-making.

Features:

  • Export the complete conversation history between users and Avatars
  • Filter by time range to select exactly the period you want to export
  • Include conversation content, timestamps, and other fields so the exported data is complete

3.2 Recharge Record Report

The recharge record report focuses on exporting account recharge details, providing support for financial management and data analysis.

Features:

  • Export account recharge details, including recharge amount, time, and other key fields
  • Filter by user, time range, and other conditions to query exactly the data you need

4. Module Four: API Integration Features

To support enterprise applications and third-party system integration, we have added a conversation record query API and Webhook message notifications. The two features offer complementary ways to obtain data, one pull-based and one push-based, to suit different integration scenarios.

4.1 Conversation Record Query API

The conversation record query API lets you actively pull conversation records on demand, with flexible query conditions and data formats.

Usage: Send an HTTP request with your query parameters, and the system returns the conversation records that match.
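
As a rough sketch of this pull model: the endpoint path, query parameter names, and authorization header below are hypothetical placeholders rather than the documented API; consult the API Documentation for the real interface.

// Hypothetical sketch of pulling conversation records over HTTP.
// The host placeholder, endpoint path, query parameters, and auth header are illustrative only.
async function fetchConversationRecords() {
  const params = new URLSearchParams({
    startTime: '2025-01-01T00:00:00Z',   // assumed: filter by time range
    endTime: '2025-01-31T23:59:59Z',
    page: '1',
    pageSize: '50',
  });

  const response = await fetch(`https://YOUR_NAVTALK_API_HOST/api/conversation-records?${params}`, {
    headers: { Authorization: 'Bearer YOUR_LICENSE' },   // placeholder auth scheme
  });
  return response.json();
}

fetchConversationRecords().then((records) => console.log(records));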

4.2 Webhook Message Notification

The Webhook message notification feature automatically sends a callback containing the conversation record to your configured Webhook address after each call ends, so your system receives data passively instead of polling for it.

Usage: Once you configure a Webhook address and trigger conditions, the system automatically sends a callback request containing the complete conversation record to your server after each call ends.
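
Below is a minimal sketch of a server that could receive these callbacks, written with Express. The route path and payload handling are hypothetical placeholders; the actual callback schema is described in the API Documentation.

// Minimal Express receiver for NavTalk webhook callbacks (illustrative sketch).
// The route path and payload fields are placeholders, not the documented schema.
const express = require('express');
const app = express();

app.use(express.json());

app.post('/navtalk/webhook', (req, res) => {
  const record = req.body;   // assumed: the conversation record payload
  console.log('Received conversation record:', record);

  // Acknowledge quickly so the callback is not treated as failed,
  // then process or persist the record asynchronously.
  res.sendStatus(200);
});

app.listen(3000, () => console.log('Webhook receiver listening on port 3000'));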

5. Module Five: Account Security Features

Account security has always been our focus. In this update, we have optimized the login logic to improve account security and user experience.

We have optimized login-related security mechanisms, including:

  • Optimized Verification Code Mechanism: Improved the generation and verification process of verification codes to enhance security
  • Secure Email Verification: Receive verification codes through registered email to ensure account security

These optimizations further strengthen account security while preserving a good user experience. We remain committed to protecting your account, data, and privacy.

6. Next Development Plan

6.1 ElevenLabs Integration

We will integrate ElevenLabs to bring you more powerful voice and model capabilities.

Voice Support

  • Integrate ElevenLabs' rich voice library
  • Support uploading and training your own exclusive voices
  • Provide more flexible voice configuration and management features

Model Support

  • Support large language models from multiple providers, such as OpenAI
  • Support connecting to your own model services
  • Flexibly switch between different models to meet different scenario needs

For the detailed list of supported models, please refer to the ElevenLabs WebSocket Real-Time Conversation Demo.

Intelligent Knowledge Base Management

Implemented through RAG (Retrieval-Augmented Generation) technology:

  • Support retrieving your enterprise or personal knowledge base
  • Upload, manage, and update knowledge base content
  • Automatically retrieve relevant knowledge to improve answer accuracy
  • Provide personalized answers based on your knowledge base

Configuration and Pricing

  • More flexible and controllable model and voice combinations
  • Transparent pricing
  • Choose services on demand and select the optimal configuration for your usage scenario, optimizing cost

6.2 Multi-Avatar Generation Model Integration

We are researching the possibility of integrating multiple Avatar generation models to offer a richer range of digital human appearances and greater expressiveness.

Feature Planning:

  • Support integrating different digital human generation models
  • Support switching between different models
  • Optimize multi-model operation efficiency
  • Provide higher quality digital human generation effects

Expected Results:

  • Richer Avatar choices
  • Higher quality image generation
  • More flexible technical solutions
  • Meet different scenario needs

6.3 Localized Deployment Support

We are developing a localized deployment solution that allows you to run the entire NavTalk project on your own GPU server.

Core Features:

  • Complete deployment with fully localized data
  • Meet data security requirements
  • Support enterprise private deployment needs
  • Optimize based on your hardware configuration

Applicable Scenarios:

  • Enterprise private deployment
  • Scenarios with high data security requirements
  • Large-scale deployment cost optimization
  • Customization needs

Service Support:

  • Complete deployment documentation and tools
  • Automated deployment scripts
  • Technical support and services
  • Continuous updates and maintenance

7. Update Summary

This NavTalk product update is organized by functional module and covers five core areas: real-time communication, Avatar management, data reporting, API integration, and account security. The headline improvement is in real-time communication, where digital human response latency has been reduced to approximately 200ms, an industry-leading level. Together, these updates further improve NavTalk's user experience and functional completeness, giving individual users and enterprise customers a more powerful, easier-to-use AI virtual human interaction platform.

We are also actively advancing our development plans, including ElevenLabs integration, performance optimization, multi-model support, and localized deployment, to bring even more powerful capabilities to NavTalk. These plans will take NavTalk to new heights in voice selection, model support, knowledge base management, performance, and deployment flexibility.