Over the past few months, we’ve been working on a project called PolyTalk.
The original goal was pretty simple: make real-time multilingual communication possible without depending on external translation APIs or cloud-only services.
While testing existing solutions, we noticed that many of them required sending conversations through third-party infrastructure. That works for some use cases, but it wasn’t a great fit for organizations that care about privacy, deployment flexibility, or keeping communication workflows under their own control.
So we started building a self-hosted, open-source speech-to-speech translation platform instead.
A few things we’ve focused on:
- Real-time speech translation
- Self-hosted deployment
- Open-source core
- No external translation APIs
- Live audio translation
The project is still evolving, but it’s been interesting exploring the challenges of multilingual communication, local AI infrastructure, and real-time translation workflows.
I’d be curious to hear how others here approach translation.
Are you using cloud-based services, self-hosted tools, or something in between?
GitHub: https://github.com/PolyTalkIO/polytalk
Website: https://polytalk.io/


Thanks! You’re right that there are already excellent open-source STT, translation, and TTS projects. PolyTalk isn’t trying to replace them; we build on top of them. What we’re focused on is a complete, self-hosted real-time communication platform that ties those components together and handles different audio sources (microphones, meetings, browser tabs, system audio, etc.) through a single workflow.
Regarding latency, we’re not targeting sub-200 ms. In our testing, we’ve intentionally favored translation quality and conversational flow over minimizing latency at all costs. Depending on the setup, end-to-end latency is typically around 2 seconds. One thing we’ve already improved is processing complete sentences rather than translating word-by-word. That gives the translation more context and generally produces much more natural results. We’re also working on additional context-aware translation improvements, and tone adaptation is on our roadmap.
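
To make the sentence-level idea concrete, here is a minimal sketch of how streaming transcript chunks could be buffered into complete sentences before being handed to a translation step. This is not PolyTalk's actual code: the function name `sentences_from_chunks` and the punctuation-based boundary rule are assumptions for illustration only.

```python
import re
from typing import Iterable, Iterator

# Assumed boundary rule: a sentence ends at ., !, ? or … followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?…])\s+")


def sentences_from_chunks(chunks: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences from a stream of partial transcript chunks.

    Hypothetical helper: text is accumulated until a sentence boundary
    appears, so the translator sees whole sentences instead of single words.
    """
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        parts = _SENTENCE_END.split(buffer)
        # Every part except the last is followed by a boundary, so it is complete.
        for sentence in parts[:-1]:
            if sentence.strip():
                yield sentence.strip()
        # The last part may still be mid-sentence; keep it for the next chunk.
        buffer = parts[-1]
    # Flush whatever remains when the stream ends.
    if buffer.strip():
        yield buffer.strip()


if __name__ == "__main__":
    stream = ["Hello ", "there. How ", "are you", "? Fine"]
    for sentence in sentences_from_chunks(stream):
        print(sentence)
```

Waiting for a sentence boundary is the trade-off described above: it adds delay compared to word-by-word output, but gives the translation step more context. A real-time system would probably also need to flush the buffer on a pause or timeout so that a speaker who trails off doesn't stall the output; that part is left out of this sketch.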