Over the past few months, we’ve been working on a project called PolyTalk.

The original goal was pretty simple: make real-time multilingual communication possible without depending on external translation APIs or cloud-only services.

While testing existing solutions, we noticed that many of them required sending conversations through third-party infrastructure. That works for some use cases, but it wasn’t a great fit for organizations that care about privacy, deployment flexibility, or keeping communication workflows under their own control.

So we started building a self-hosted, open-source speech-to-speech translation platform instead.

A few things we’ve focused on:

- Real-time speech translation
- Self-hosted deployment
- Open-source core
- No external translation APIs
- Live audio translation

The project is still evolving, but it’s been interesting exploring the challenges of multilingual communication, local AI infrastructure, and real-time translation workflows.

I’d be curious to hear how others here approach translation.

Are you using cloud-based services, self-hosted tools, or something in between?

GitHub: https://github.com/PolyTalkIO/polytalk

Website: https://polytalk.io/

  • artifex@piefed.social
    5 days ago

    This is a pretty interesting project! Assuming one wanted to run everything locally, what’s the minimum viable hardware stack for near-realtime performance?

    • dhs@lemmy.world (OP)
      20 hours ago

      Thanks! We’re still benchmarking different setups, so I don’t want to give a misleading “minimum spec” number yet. In practice, the hardware requirements depend much more on the STT/translation/TTS models you choose than on PolyTalk itself. For a single-user setup, you don’t necessarily need expensive hardware. As you push for lower latency, larger models, or multiple simultaneous streams, the requirements increase pretty quickly. Proper hardware benchmarks are something we plan to publish once we’ve tested a wider range of configurations.
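For anyone who wants to get a feel for where the time goes on their own hardware, here is a rough sketch of timing each stage of an STT → translation → TTS chain. The stage functions (`fake_stt`, `fake_translate`, `fake_tts`) and `run_pipeline` are hypothetical placeholders for illustration, not PolyTalk's API; swap in whichever models you actually run.

```python
import time
from typing import Callable, Dict, List, Tuple

# Placeholder stages: stand-ins for whatever STT, translation, and TTS
# models you deploy. These are NOT PolyTalk functions.
def fake_stt(audio: bytes) -> str:
    return "hello world"

def fake_translate(text: str) -> str:
    return text.upper()

def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")

Stage = Tuple[str, Callable]

def run_pipeline(audio: bytes, stages: List[Stage]) -> Tuple[object, Dict[str, float]]:
    """Feed the input through each stage in order and record wall-clock time per stage."""
    timings: Dict[str, float] = {}
    data: object = audio
    for name, fn in stages:
        start = time.perf_counter()
        data = fn(data)
        timings[name] = time.perf_counter() - start
    # End-to-end latency of a sequential chain is the sum of its stages.
    timings["total"] = sum(timings.values())
    return data, timings

STAGES: List[Stage] = [
    ("stt", fake_stt),
    ("translate", fake_translate),
    ("tts", fake_tts),
]

if __name__ == "__main__":
    out, timings = run_pipeline(b"\x00" * 3200, STAGES)
    for name, seconds in timings.items():
        print(f"{name:>9}: {seconds * 1000:.3f} ms")
```

With real models plugged in, the per-stage numbers show which model dominates the latency, which is usually the first thing to look at before changing hardware.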