Translatica: A Survey and Implementation Study on Speech-to-Speech Translation and Voice Synthesis with Speaker Preservation

Jhaj, Baaz

Description

This thesis presents Translatica, a modular speech-to-speech translation (S2ST) system that preserves both linguistic meaning and the speaker’s vocal identity across languages. Alongside developing a working prototype, this work surveys the landscape of S2ST methods and motivates the choice of a modular architecture over direct approaches, emphasizing flexibility, interpretability, and voice fidelity. The system combines state-of-the-art tools in transcription, translation, and voice synthesis to enable expressive, speaker-preserving dubbing of prerecorded videos. Through implementation and evaluation, the thesis explores the trade-offs between accuracy, latency, and control, demonstrating how modular design enables customization for diverse use cases. Future work includes real-time translation, enhanced speaker tracking, and applications in education and live media.

Details

Contributors

Jhaj, Baaz (Author)
Ramani, Krishna (Co-author)
Hsu, Jeffrey (Co-author)
Osburn, Steven (Thesis director)
Zhu, Haolin (Committee member)
Barrett, The Honors College (Contributor)
Computer Science and Engineering Program (Contributor)

Date Created

2025-05

Additional Information

Language English

Series

Academic Year 2024-2025

Extent

29 pages

Open Access

Peer-reviewed
