Websocket not working as expected #640

sadimoodi · 2024-10-22T04:54:05Z

Hello,
I am using the code below to build a voice agent; most of it was gathered from different examples. I am facing the following problems:
1- Interruption handling is worse than with exactly the same code using Daily as the transport (same VADParams).
2- The voice quality is inferior to Daily, even though I am running everything locally.

The same code runs like a charm using Daily as the transport. What can I do to achieve the same results using WebSockets?

import aiohttp
import asyncio
import os
import sys

from loguru import logger
from pipecat.frames.frames import EndFrame, LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_response import (
    LLMAssistantResponseAggregator,
    LLMUserResponseAggregator,
)
from pipecat.services.openai import OpenAILLMService
from pipecat.services.whisper import WhisperSTTService
from pipecat.services.xtts import XTTSService
from pipecat.transports.network.websocket_server import (
    WebsocketServerParams,
    WebsocketServerTransport,
)
from pipecat.vad.silero import SileroVADAnalyzer, VADParams

from dotenv import load_dotenv
load_dotenv(override=True)

logger.remove(0)
logger.add(sys.stderr, level="DEBUG")


async def main():
    async with aiohttp.ClientSession() as session:
        transport = WebsocketServerTransport(
            params=WebsocketServerParams(
                audio_out_enabled=True,
                add_wav_header=True,
                vad_enabled=True,
                vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=float(os.getenv("VAD_STOP_SECS", "0.3")))),
                vad_audio_passthrough=True
            )
        )

        llm = OpenAILLMService(
            api_key=os.getenv("OPENAI_API_KEY"),
            model="gpt-4o-mini")

        stt = WhisperSTTService()

        tts = XTTSService(
            aiohttp_session=session,
            voice_id="Brenda Stern",  # alternative: "Claribel Dervla"
            language="en",
            base_url="http://localhost:8001",
        )

        messages = [
            {
                "role": "system",
                "content": "You are a helpful assistant, answer questions accurately.",
            },
        ]

        tma_in = LLMUserResponseAggregator(messages)
        tma_out = LLMAssistantResponseAggregator(messages)

        pipeline = Pipeline([
            transport.input(),   # Websocket input from client
            stt,                 # Speech-To-Text
            tma_in,              # User responses
            llm,                 # LLM
            tts,                 # Text-To-Speech
            transport.output(),  # Websocket output to client
            tma_out              # LLM responses
        ])

        task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True, enable_metrics=True))

        @transport.event_handler("on_client_connected")
        async def on_client_connected(transport, client):
            # Kick off the conversation.
            messages.append(
                {"role": "system", "content": "Please introduce yourself to the user."})
            await task.queue_frames([LLMMessagesFrame(messages)])
        
        # @transport.event_handler("on_client_disconnected")
        # async def on_client_disconnected(transport, client):
        #     # End the conversation.
        #     await task.queue_frame(EndFrame())
        #     logger.info("Participant left. Exiting.")

        runner = PipelineRunner()

        await runner.run(task)

if __name__ == "__main__":
    asyncio.run(main())
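
Since the report hinges on both transports using the same VADParams, here is a rough sketch of what a `stop_secs` setting does in general. It is a toy state machine, not Silero and not Pipecat's analyzer: `ToyVAD`, `vad_events`, the 20 ms frame size, and the 0.5 score threshold are all assumptions made for illustration. The idea is only that speech is treated as finished once silence has lasted `stop_secs`.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ToyVAD:
    """Toy voice-activity state machine (hypothetical, for illustration only)."""

    stop_secs: float = 0.3    # silence needed before speech counts as finished
    frame_secs: float = 0.02  # assumed frame duration (20 ms)
    threshold: float = 0.5    # assumed cutoff for "this frame contains speech"
    speaking: bool = field(default=False, init=False)
    silent_frames: int = field(default=0, init=False)

    def process(self, score: float) -> Optional[str]:
        """Consume one frame's speech score and return an event name, if any."""
        stop_frames = max(1, round(self.stop_secs / self.frame_secs))
        if score >= self.threshold:
            # Speech resets the silence counter and may start a new turn.
            self.silent_frames = 0
            if not self.speaking:
                self.speaking = True
                return "user_started_speaking"
            return None
        if self.speaking:
            # Silence only ends the turn once it has lasted long enough.
            self.silent_frames += 1
            if self.silent_frames >= stop_frames:
                self.speaking = False
                self.silent_frames = 0
                return "user_stopped_speaking"
        return None


def vad_events(scores, **params) -> List[Tuple[int, str]]:
    """Run the toy VAD over per-frame scores and collect (frame index, event)."""
    vad = ToyVAD(**params)
    events = []
    for index, score in enumerate(scores):
        event = vad.process(score)
        if event is not None:
            events.append((index, event))
    return events
```

Feeding it five speech frames followed by twenty silent ones with `stop_secs=0.3` ends the turn after fifteen silent frames, while `stop_secs=0.8` never ends it within that window. If audio frames arrive late or in bursts on one transport, the same parameters can produce differently timed events, which is one possible (unverified) contributor to the difference described above.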


markbackman · 2024-11-09T16:09:45Z

Daily's transport is built using WebRTC, which is very resilient to variable network conditions. Under ideal network conditions, Websockets can work on par with WebRTC. But, networks are not ideal very often. At least in terms of voice quality, I don't expect performance to be as good with Websockets.

For production apps running on real world networks, I would recommend using a WebRTC transport.
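
To make the resilience point concrete, here is a toy model of one delayed packet under the two delivery styles. WebSockets run over TCP, which hands data to the application strictly in order, so one late segment holds back everything behind it. WebRTC media typically travels over UDP with a jitter buffer, so a late packet can be skipped and concealed. The function names, the 20 ms frame size, the 60 ms buffer, and the arrival times are all made up for illustration; this is not a measurement of Daily or Pipecat.

```python
def ordered_delivery(arrivals_ms):
    """In-order reliable delivery (TCP-style): a packet is released only after
    every earlier packet has arrived, so one late packet stalls the rest."""
    released, latest = [], 0
    for arrival in arrivals_ms:
        latest = max(latest, arrival)
        released.append(latest)
    return released


def jitter_buffer_playout(arrivals_ms, frame_ms=20, buffer_ms=60):
    """Simplified jitter buffer: packet i is due at i * frame_ms + buffer_ms.
    Packets that miss their deadline are dropped instead of delaying playback."""
    played, dropped = [], []
    for index, arrival in enumerate(arrivals_ms):
        deadline = index * frame_ms + buffer_ms
        (played if arrival <= deadline else dropped).append(index)
    return played, dropped


# Hypothetical trace: packets sent every 20 ms with a 10 ms network delay,
# except packet 2, which shows up 200 ms late (for example after a retransmit).
ARRIVALS_MS = [10, 30, 250, 70, 90, 110]
```

In this trace, `ordered_delivery` releases packets 3 to 5 only at 250 ms even though they arrived much earlier, which is heard as a stall followed by a burst. `jitter_buffer_playout` loses packet 2 but keeps the rest on schedule.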

ttamoud · 2024-11-27T12:08:51Z

> Daily's transport is built using WebRTC, which is very resilient to variable network conditions. Under ideal network conditions, Websockets can work on par with WebRTC. But, networks are not ideal very often. At least in terms of voice quality, I don't expect performance to be as good with Websockets.
>
> For production apps running on real world networks, I would recommend using a WebRTC transport.

I'd first like to thank the team for the remarkable work done on Pipecat. However, I would like to share some observations regarding the transport layer.

Regarding current solutions:

Daily provides an excellent experience, but still has compatibility issues for some Windows users. The WSL alternative is not optimal.
LiveKit, while functional, still lacks several essential features compared to Daily.
WebSockets are not a viable alternative, as they use a different protocol and offer fewer capabilities than the other services mentioned.

Given that there are numerous open-source WebRTC projects available that are both free to use and privacy-friendly, are there any plans to integrate additional WebRTC transport layers beyond the currently available options?

Thank you

golbin · 2024-11-28T06:33:03Z

Hi @ttamoud,

May I ask for your opinion? Could you share your thoughts on the limitations of LiveKit’s features? I’m curious about how LiveKit compares to Daily in terms of functionality.

I’ve been thinking that the Daily library seems a bit harder to use and more prone to bugs compared to LiveKit. Because of this, I’ve been considering LiveKit.

Thank you!

ttamoud · 2024-11-28T17:22:43Z

While LiveKit offers a more straightforward implementation, it has several limitations compared to Daily:

Audio/Video Processing: Daily provides more sophisticated audio processing with built-in VAD support and better video quality control. LiveKit has more basic audio configuration options and a simpler video subscription model.

Advanced Features: Daily includes native transcription support, comprehensive dial-in/dial-out capabilities, and advanced recording functionality out of the box. These features are absent in LiveKit and would require additional components to implement.

Connection Management: Daily has more robust reconnection logic and better error handling, while LiveKit offers a more basic reconnection strategy and simpler error handling mechanisms.

Hope that helps!

golbin · 2024-11-28T17:45:56Z

Thank you for sharing your experience, @ttamoud! It’s much appreciated.

sadimoodi · 2024-11-28T18:26:19Z

As a side note: I am using Daily in audio-only mode and not experiencing any issues. It has been stable, transcription is pretty accurate, and VAD works perfectly well too.

ttamoud · 2024-11-29T17:05:27Z

What do you mean by audio only? Do you mean audio-only with Daily on Windows, maybe through WSL, or directly without WSL? If so, can you please share how you did it? I have multiple Pipecat workflows on standby just because of Windows compatibility.

sadimoodi · 2024-11-29T21:33:23Z

I contacted Daily to enable audio-only billing (much cheaper) for my account. I run on WSL inside Windows and also on my Red Hat Linux server.

aconchillo · 2024-12-04T23:21:02Z

@sadimoodi What version of Pipecat are you using? Version 0.0.48 fixed websocket interruptions and also increased the sample rate to 24000. However, I think the websocket client is set up for 16000, so resampling is probably happening.
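
One quick way to answer the version question is to read the installed package metadata. This sketch assumes the package was installed from PyPI under the name `pipecat-ai`; the helper names are made up, and the version comparison is deliberately naive (it only looks at the dotted numeric parts).

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "pipecat-ai") -> Optional[str]:
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def is_at_least(found: str, minimum: str = "0.0.48") -> bool:
    """Naive comparison of dotted numeric versions; non-numeric parts are ignored."""

    def parse(text: str):
        return tuple(int(part) for part in text.split(".") if part.isdigit())

    return parse(found) >= parse(minimum)
```

Calling `installed_version()` in the environment that runs the bot, then passing the result to `is_at_least`, shows whether the 0.0.48 changes are present.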

Would it be possible for you to try again?
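
For context on the resampling mentioned above, this is roughly what converting 24000 Hz audio to 16000 Hz involves. It is a minimal linear-interpolation sketch for mono 16-bit PCM using only the standard library; `resample_pcm16` is a hypothetical helper, not the resampler Pipecat or the websocket client uses. Real resamplers also low-pass filter before downsampling, and skipping that step (as this sketch does) can add audible aliasing.

```python
from array import array


def resample_pcm16(samples: array, src_rate: int, dst_rate: int) -> array:
    """Resample mono 16-bit PCM by linear interpolation (no anti-alias filter)."""
    if src_rate == dst_rate or len(samples) == 0:
        return array("h", samples)
    out_len = len(samples) * dst_rate // src_rate
    out = array("h")
    for i in range(out_len):
        # Position of output sample i measured in input samples.
        pos = i * src_rate / dst_rate
        left = int(pos)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        out.append(round(samples[left] * (1 - frac) + samples[right] * frac))
    return out
```

A 20 ms frame at 24000 Hz holds 480 samples and comes out as 320 samples at 16000 Hz. If one side of the connection assumes the wrong rate and skips this step, the audio plays at the wrong speed and pitch, so it is worth confirming that both ends agree on the rate.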

markbackman assigned aconchillo Dec 3, 2024
