Websocket not working as expected #640

Open
sadimoodi opened this issue Oct 22, 2024 · 9 comments

@sadimoodi

Hello,
I am using the code below to build a voice agent; most of it was gathered from different examples. I am facing the following problems:
1- Interruption handling is poor compared to exactly the same code using Daily as the transport (same VADParams).
2- The voice quality is inferior to what I get with Daily, even though I am running everything locally.

The same code runs like a charm using Daily as the transport. What can I do to achieve the same results using WebSockets?

import aiohttp
import asyncio
import os
import sys

from pipecat.frames.frames import LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask, PipelineParams
from pipecat.processors.aggregators.llm_response import (
    LLMAssistantResponseAggregator,
    LLMUserResponseAggregator
)
from pipecat.services.deepgram import DeepgramSTTService
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.network.websocket_server import WebsocketServerParams, WebsocketServerTransport
from pipecat.vad.silero import SileroVADAnalyzer, VADParams
from pipecat.services.whisper import WhisperSTTService
from pipecat.services.xtts import XTTSService
from pipecat.frames.frames import EndFrame
from loguru import logger

from dotenv import load_dotenv
load_dotenv(override=True)

logger.remove(0)
logger.add(sys.stderr, level="DEBUG")


async def main():
    async with aiohttp.ClientSession() as session:
        transport = WebsocketServerTransport(
            params=WebsocketServerParams(
                audio_out_enabled=True,
                add_wav_header=True,
                vad_enabled=True,
                vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=float(os.getenv("VAD_STOP_SECS", "0.3")))),
                vad_audio_passthrough=True
            )
        )

        llm = OpenAILLMService(
            api_key=os.getenv("OPENAI_API_KEY"),
            model="gpt-4o-mini")

        stt = WhisperSTTService()

        tts = XTTSService(
            aiohttp_session=session,
            voice_id="Brenda Stern",  # alternative: "Claribel Dervla"
            language="en",
            base_url="http://localhost:8001"
        )

        messages = [
            {
                "role": "system",
                "content": "You are a helpful assistant, answer questions accurately."
            },
        ]

        tma_in = LLMUserResponseAggregator(messages)
        tma_out = LLMAssistantResponseAggregator(messages)

        pipeline = Pipeline([
            transport.input(),   # Websocket input from client
            stt,                 # Speech-To-Text
            tma_in,              # User responses
            llm,                 # LLM
            tts,                 # Text-To-Speech
            transport.output(),  # Websocket output to client
            tma_out              # LLM responses
        ])

        task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True, enable_metrics=True))

        @transport.event_handler("on_client_connected")
        async def on_client_connected(transport, client):
            # Kick off the conversation.
            messages.append(
                {"role": "system", "content": "Please introduce yourself to the user."})
            await task.queue_frames([LLMMessagesFrame(messages)])
        
        # @transport.event_handler("on_client_disconnected")
        # async def on_client_disconnected(transport, client):
        #     # End the conversation.
        #     await task.queue_frame(EndFrame())
        #     logger.info("Participant left. Exiting.")

        runner = PipelineRunner()

        await runner.run(task)

if __name__ == "__main__":
    asyncio.run(main())
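
Since the transport above is configured with `add_wav_header=True`, one way to narrow down the voice-quality difference is to inspect the format of the audio the client actually receives. The sketch below is a standalone diagnostic that uses only the Python standard library. It assumes you have already extracted the raw audio bytes from a WebSocket message and that they form a complete WAV file; `wav_info` and `make_wav` are hypothetical helper names, not part of Pipecat.

```python
import io
import wave


def wav_info(payload: bytes) -> dict:
    """Read the header of a WAV payload and report its audio format."""
    with wave.open(io.BytesIO(payload), "rb") as wf:
        return {
            "sample_rate": wf.getframerate(),
            "channels": wf.getnchannels(),
            "sample_width": wf.getsampwidth(),  # bytes per sample
            "frames": wf.getnframes(),
        }


def make_wav(pcm: bytes, sample_rate: int, channels: int = 1) -> bytes:
    """Wrap raw 16-bit PCM in a WAV container (used here only to fake a payload)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(2)  # 16-bit samples
        wf.setframerate(sample_rate)
        wf.writeframes(pcm)
    return buffer.getvalue()


# Stand-in for bytes received from the server: 480 silent mono samples.
payload = make_wav(b"\x00\x00" * 480, sample_rate=24000)
info = wav_info(payload)
print(info)
```

If the reported sample rate differs from the rate the client's audio player is configured for, the audio is being resampled or played at the wrong speed somewhere along the way, which would be worth ruling out before blaming the transport.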

@markbackman
Contributor

Daily's transport is built using WebRTC, which is very resilient to variable network conditions. Under ideal network conditions, WebSockets can work on par with WebRTC, but networks are often not ideal. At least in terms of voice quality, I don't expect performance to be as good with WebSockets.

For production apps running on real-world networks, I would recommend using a WebRTC transport.

@ttamoud

ttamoud commented Nov 27, 2024

> Daily's transport is built using WebRTC, which is very resilient to variable network conditions. Under ideal network conditions, WebSockets can work on par with WebRTC, but networks are often not ideal. At least in terms of voice quality, I don't expect performance to be as good with WebSockets.
>
> For production apps running on real-world networks, I would recommend using a WebRTC transport.

I'd like to first thank the team for the remarkable work done on pipecat. However, I would like to share some observations regarding the transport layer:

Regarding current solutions:

  • Daily provides an excellent experience, but still has compatibility issues for some Windows users. The WSL alternative is not optimal.
  • LiveKit, while functional, still lacks several essential features compared to Daily.
  • WebSockets are not a viable alternative as they use a different protocol and offer fewer capabilities than the other mentioned services.

Given that there are numerous open-source WebRTC projects available that are both free to use and privacy-friendly, are there any plans to integrate additional WebRTC transport layers beyond the currently available options?

Thank you

@golbin
Contributor

golbin commented Nov 28, 2024

Hi @ttamoud,

May I ask for your opinion? Could you share your thoughts on the limitations of LiveKit’s features? I’m curious about how LiveKit compares to Daily in terms of functionality.

I’ve been thinking that the Daily library seems a bit harder to use and more prone to bugs compared to LiveKit. Because of this, I’ve been considering LiveKit.

Thank you!

@ttamoud

ttamoud commented Nov 28, 2024

While LiveKit offers a more straightforward implementation, it has several limitations compared to Daily:

Audio/Video Processing: Daily provides more sophisticated audio processing with built-in VAD support and better video quality control. LiveKit has more basic audio configuration options and a simpler video subscription model.

Advanced Features: Daily includes native transcription support, comprehensive dial-in/dial-out capabilities, and advanced recording functionality out of the box. These features are absent in LiveKit and would require additional components to implement.

Connection Management: Daily has more robust reconnection logic and better error handling, while LiveKit offers a more basic reconnection strategy and simpler error handling mechanisms.

Hope that helps!

@golbin
Contributor

golbin commented Nov 28, 2024

Thank you for sharing your experience, @ttamoud! It’s much appreciated.

@sadimoodi
Author

As a side note: I am using Daily in audio-only mode and am not experiencing any issues. It has been stable, transcription is pretty accurate, and VAD works perfectly well too.

@ttamoud

ttamoud commented Nov 29, 2024

What do you mean by "I am using audio only"? Do you mean audio only with Daily on Windows, maybe through WSL, or directly without WSL? If so, could you please share how you did it? I have multiple Pipecat workflows on standby just because of Windows compatibility.

@sadimoodi
Author

I contacted Daily to enable audio-only charging (much cheaper) for my account. I run on WSL inside Windows and also on my Red Hat Linux server.

@aconchillo
Contributor

@sadimoodi What version of Pipecat are you using? Version 0.0.48 fixed WebSocket interruptions and also increased the sample rate to 24000. However, I think the WebSocket client is set up for 16000, so I think resampling is happening.

Would it be possible for you to try again?
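
To illustrate what a 24000 to 16000 conversion involves, here is a minimal sketch of linear-interpolation resampling for mono 16-bit PCM, using only the Python standard library. It is an assumption-laden illustration, not the resampler Pipecat or the WebSocket client actually uses; `resample_pcm16` is a hypothetical name, and a naive method like this applies no low-pass filter, so it can add audible artifacts of its own.

```python
import array


def resample_pcm16(samples, src_rate: int, dst_rate: int) -> array.array:
    """Resample mono 16-bit PCM samples by linear interpolation."""
    if src_rate == dst_rate or len(samples) == 0:
        return array.array("h", samples)

    out_len = int(len(samples) * dst_rate / src_rate)
    out = array.array("h")
    for i in range(out_len):
        # Position of output sample i on the input timeline.
        pos = i * src_rate / dst_rate
        left = int(pos)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        # Blend the two neighbouring input samples.
        value = samples[left] * (1.0 - frac) + samples[right] * frac
        out.append(int(round(value)))
    return out


# A 20 ms chunk at 24000 Hz is 480 samples; at 16000 Hz it is 320.
chunk_24k = array.array("h", [1000] * 480)
chunk_16k = resample_pcm16(chunk_24k, 24000, 16000)
print(len(chunk_24k), len(chunk_16k))
```

Every third sample pair is merged into two output samples, so some high-frequency detail is lost. Matching the sample rate on both ends, where the configuration allows it, avoids this step entirely.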
