
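Unlike previous architectures that required multiple stages such as ASR (speech-to-text), LLM (large language model), and TTS (text-to-speech), PersonaPlex-7B-v1 uses a single Transformer model that predicts text and speech tokens simultaneously.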
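To make the contrast concrete, here is a toy Python sketch. It is not NVIDIA's code: the stage functions (`asr`, `llm`, `tts`) and the per-frame `step` function are hypothetical stand-ins, and emitting one (text token, speech token) pair per incoming user frame is an assumption used only to illustrate "predicting text and speech tokens simultaneously." The cascaded path can only reply after the whole utterance has passed through all three stages. The single-model path produces output while the user's frames are still arriving.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical stage signatures, for illustration only.
Frames = List[str]

def cascaded_reply(user_frames: Frames,
                   asr: Callable[[Frames], str],
                   llm: Callable[[str], str],
                   tts: Callable[[str], Frames]) -> Frames:
    """Cascaded pipeline: each stage consumes the previous stage's complete output."""
    transcript = asr(user_frames)   # speech -> text (needs the whole utterance)
    reply_text = llm(transcript)    # text -> text
    return tts(reply_text)          # text -> speech

def single_model_stream(user_frames: Iterable[str],
                        step: Callable[[Frames, str], Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Single-model loop (assumed framing): one model step per incoming user frame
    returns a (text token, speech token) pair, so output starts while still listening."""
    history: Frames = []
    outputs: List[Tuple[str, str]] = []
    for frame in user_frames:
        history.append(frame)
        outputs.append(step(history, frame))
    return outputs

if __name__ == "__main__":
    # Toy stand-ins for the three cascaded stages.
    asr = lambda frames: " ".join(frames)
    llm = lambda text: text.upper()
    tts = lambda text: text.split()
    print(cascaded_reply(["hi", "there"], asr, llm, tts))  # ['HI', 'THERE']

    # Toy stand-in for one joint text+speech prediction step.
    step = lambda hist, frame: (f"t{len(hist)}", f"s{len(hist)}")
    print(single_model_stream(["a", "b", "c"], step))  # [('t1', 's1'), ('t2', 's2'), ('t3', 's3')]
```

The design point is latency: in the cascade, nothing is spoken until `asr`, `llm`, and `tts` have each finished, whereas the streaming loop has an output ready after the very first frame.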
Additionally, the model excels at personalization control. Through dual "speech + text" guidance, users can not only define the AI's role and background but also precisely control its tone and intonation. AIbase learned that NVIDIA trained the model on a combination of large volumes of real call data and synthetic scenarios, giving it natural conversational habits while keeping it strictly within industry-specific business rules. Current evaluation results show that
Research: https://research.nvidia.com/labs/adlr/personaplex/
Key Points:
🎙️ Full-duplex Interaction: Supports real-time speech stream processing, so users can interject or talk over the AI while it is speaking, and the model responds quickly.
🧠 Single Model Architecture: PersonaPlex-7B-v1 abandons the complicated cascaded pipeline and uses a single Transformer to predict text and speech tokens simultaneously, making dialogue more natural from the ground up.
🎭 Deep Personalization: Supports system prompts of up to 200 tokens plus specific speech embeddings, enabling flexible customization of the AI's personality, business knowledge, and emotional tone.
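The persona inputs in the last point can be pictured as a small configuration object. This sketch is hypothetical: `PersonaConfig`, its field names, and the whitespace-based token count are assumptions made for illustration, since the model's actual interface and tokenizer are not described here. Only the 200-token ceiling comes from the article.

```python
from typing import List, Optional

MAX_PROMPT_TOKENS = 200  # system-prompt limit reported for PersonaPlex-7B-v1

class PersonaConfig:
    """Hypothetical persona settings: a text role prompt plus an optional
    voice (speech) embedding. Names are illustrative, not the model's API."""

    def __init__(self, system_prompt: str, voice_embedding: Optional[List[float]] = None):
        self.system_prompt = system_prompt
        self.voice_embedding = voice_embedding

    def count_prompt_tokens(self) -> int:
        # Crude whitespace count; the model's real tokenizer will differ.
        return len(self.system_prompt.split())

    def validate(self) -> int:
        """Return the approximate token count, or raise if it exceeds the limit."""
        n = self.count_prompt_tokens()
        if n > MAX_PROMPT_TOKENS:
            raise ValueError(f"system prompt has ~{n} tokens; limit is {MAX_PROMPT_TOKENS}")
        return n

if __name__ == "__main__":
    cfg = PersonaConfig(
        system_prompt="You are a calm support agent for a bank. Never promise refunds.",
        voice_embedding=[0.12, -0.40, 0.33],  # placeholder vector, not a real embedding
    )
    print(cfg.validate())  # 12
```

In practice the check would need the model's own tokenizer, because subword tokenizers typically produce more tokens than a whitespace split.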
