What We Learned Building a Self-Hosted Speech Translation Platform

dhs@lemmy.world · 6 days ago

dhs@lemmy.world · 21 hours ago

Thanks! You’re right that there are already excellent open-source STT, translation, and TTS projects. PolyTalk isn’t trying to replace them; it builds on top of them. Our focus is a complete, self-hosted, real-time communication platform that ties those components together and handles different audio sources (microphones, meetings, browser tabs, system audio, etc.) through a single workflow.
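
For anyone wondering what "one workflow over interchangeable components" might look like, here is a hypothetical sketch. It is not PolyTalk's code. It assumes every audio source (microphone, meeting, browser tab, system audio) can be reduced to the same stream of audio chunks, and that the STT, translation, and TTS engines sit behind small interfaces so existing open-source projects can be plugged in. All class and method names (`Pipeline`, `transcribe`, `translate`, `synthesize`) are invented for illustration, and the stub engines just pass text through.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Protocol


# Hypothetical interfaces that wrappers around real open-source engines would implement.
class SpeechToText(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class Translator(Protocol):
    def translate(self, text: str, source: str, target: str) -> str: ...


class TextToSpeech(Protocol):
    def synthesize(self, text: str, language: str) -> bytes: ...


@dataclass
class Pipeline:
    """Hypothetical glue layer: the same workflow regardless of the audio source."""
    stt: SpeechToText
    mt: Translator
    tts: TextToSpeech
    source_lang: str
    target_lang: str

    def run(self, chunks: Iterable[bytes]) -> Iterator[bytes]:
        # Any source only has to yield raw audio chunks; everything after that is shared.
        for chunk in chunks:
            text = self.stt.transcribe(chunk)
            if not text:
                continue  # skip silence / chunks with no recognized speech
            translated = self.mt.translate(text, self.source_lang, self.target_lang)
            yield self.tts.synthesize(translated, self.target_lang)


# Stub engines standing in for real STT / MT / TTS projects (assumption: text in, text out).
class FakeSTT:
    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")


class FakeMT:
    def translate(self, text: str, source: str, target: str) -> str:
        return f"[{source}->{target}] {text}"


class FakeTTS:
    def synthesize(self, text: str, language: str) -> bytes:
        return text.encode("utf-8")


if __name__ == "__main__":
    pipeline = Pipeline(FakeSTT(), FakeMT(), FakeTTS(), "en", "de")
    mic_chunks = [b"hello", b"", b"how are you"]
    for audio_out in pipeline.run(mic_chunks):
        print(audio_out.decode("utf-8"))
```

Swapping a microphone for a browser tab would then only mean passing a different chunk iterator to `run`; the engines and the rest of the flow stay the same.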

Regarding latency, we’re not targeting sub-200 ms. In our testing, we’ve intentionally favored translation quality and conversational flow over minimizing latency at all costs. Depending on the setup, end-to-end latency is typically around 2 seconds. One improvement we’ve already made is processing complete sentences rather than translating word by word. This gives the translation step more context and generally produces much more natural results. We’re also working on further context-aware translation improvements, and tone adaptation is on our roadmap.
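
The sentence-level approach can be pictured as a small buffer between STT and translation: transcript fragments accumulate until a sentence boundary appears, and only complete sentences are passed on. This is an illustrative sketch, not PolyTalk's implementation. It assumes a naive boundary rule (`.`, `!`, or `?` followed by whitespace or end of text, so abbreviations like "Dr." would be split incorrectly), and the names `SentenceBuffer`, `push`, `flush`, and `translate` are hypothetical.

```python
import re
from typing import List

# Assumption: a sentence ends at ., ! or ? followed by whitespace or end of text.
# Decimals such as "3.14" are not split because the "." is followed by a digit.
_BOUNDARY = re.compile(r"[.!?](?=\s|$)")


class SentenceBuffer:
    """Accumulates streaming STT fragments and releases only complete sentences."""

    def __init__(self) -> None:
        self._pending = ""

    def push(self, fragment: str) -> List[str]:
        """Add a transcript fragment; return any sentences it completed."""
        # Join fragments with a single space, then look for sentence boundaries.
        self._pending = f"{self._pending} {fragment}".strip()
        sentences: List[str] = []
        start = 0
        for match in _BOUNDARY.finditer(self._pending):
            sentence = self._pending[start:match.end()].strip()
            if sentence:
                sentences.append(sentence)
            start = match.end()
        # Keep the unfinished tail for the next call.
        self._pending = self._pending[start:].strip()
        return sentences

    def flush(self) -> List[str]:
        """Release whatever is left, e.g. when the speaker stops talking."""
        rest, self._pending = self._pending, ""
        return [rest] if rest else []


def translate(sentence: str) -> str:
    # Placeholder for a real machine-translation call (hypothetical).
    return f"<translated: {sentence}>"


if __name__ == "__main__":
    buf = SentenceBuffer()
    fragments = ["so today", "we will talk", "about latency.", "It matters! Right", "now we"]
    for fragment in fragments:
        for sentence in buf.push(fragment):
            print(translate(sentence))
    for sentence in buf.flush():
        print(translate(sentence))
```

In this sketch nothing reaches the translator until a boundary arrives, which illustrates why sentence-level processing trades some latency for context. A real system would probably also need to flush on silence or a timeout so that a speaker who trails off mid-sentence doesn't stall the output.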