What We Learned Building a Self-Hosted Speech Translation Platform

dhs@lemmy.world · 6 days ago

What We Learned Building a Self-Hosted Speech Translation Platform

grapemix@lemmy.ml · 5 days ago

It’s nice to see you release this as an open-source, self-hosted project, and also nice to see a Python and FastAPI stack. There are already multiple open-source TTS/STT pipelines, so what are your project’s selling points? I couldn’t find them in your source. And what exactly is near real-time? Sub-200 ms?

dhs@lemmy.world · 20 hours ago

Thanks! You’re right that there are already excellent open-source STT, translation, and TTS projects. PolyTalk isn’t trying to replace them; we build on top of them. What we’re focused on is a complete, self-hosted real-time communication platform that ties those components together and handles different audio sources (microphones, meetings, browser tabs, system audio, etc.) through a single workflow.

Regarding latency, we’re not targeting sub-200 ms. In our testing, we’ve intentionally favored translation quality and conversational flow over minimizing latency at all costs. Depending on the setup, end-to-end latency is typically around 2 seconds. One thing we’ve already improved is processing complete sentences rather than translating word-by-word. That gives the translation more context and generally produces much more natural results. We’re also working on additional context-aware translation improvements, and tone adaptation is on our roadmap.
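
To illustrate the sentence-level idea, here is a minimal sketch of buffering partial transcript fragments until a full sentence is available. It is not taken from the PolyTalk source: the `SentenceBuffer` class and its punctuation-based boundary rule are assumptions made for illustration, and a real pipeline would more likely rely on the STT model's own segmentation or pause detection.

```python
import re

# Hypothetical boundary rule: sentence-ending punctuation followed by
# whitespace or the end of the buffered text. Naive on purpose; it will
# mis-split abbreviations such as "Dr." and similar cases.
_SENTENCE_END = re.compile(r"[.!?]+(?:\s+|$)")


class SentenceBuffer:
    """Collects streaming transcript fragments and releases whole sentences."""

    def __init__(self) -> None:
        self._pending = ""

    def feed(self, fragment: str) -> list[str]:
        """Append a fragment and return every sentence it completes."""
        self._pending += fragment
        sentences = []
        while True:
            match = _SENTENCE_END.search(self._pending)
            if match is None:
                break
            # Everything up to and including the boundary is one sentence.
            sentences.append(self._pending[:match.end()].strip())
            self._pending = self._pending[match.end():]
        return sentences

    def flush(self) -> str:
        """Return leftover text, e.g. when the speaker stops mid-sentence."""
        rest, self._pending = self._pending.strip(), ""
        return rest


buf = SentenceBuffer()
for fragment in ["Hello there.", " How are", " you? I"]:
    for sentence in buf.feed(fragment):
        print("translate:", sentence)
print("left over:", buf.flush())
```

Only complete sentences would be handed to the translation step, so the translator sees full context at the cost of waiting for the sentence to end, which is part of why latency lands in the seconds range rather than the sub-200 ms range.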