Description
This thesis presents Translatica, a modular speech-to-speech translation (S2ST) system that preserves both linguistic meaning and the speaker’s vocal identity across languages. Alongside developing a working prototype, this work surveys the landscape of S2ST methods and motivates the choice of a modular architecture over direct approaches, emphasizing flexibility, interpretability, and voice fidelity. The system combines state-of-the-art tools in transcription, translation, and voice synthesis to enable expressive, speaker-preserving dubbing of prerecorded videos. Through implementation and evaluation, the thesis explores the trade-offs between accuracy, latency, and control, demonstrating how modular design enables customization for diverse use cases. Future work includes real-time translation, enhanced speaker tracking, and applications in education and live media.
Details
Contributors
- Jhaj, Baaz (Author)
- Ramani, Krishna (Co-author)
- Hsu, Jeffrey (Co-author)
- Osburn, Steven (Thesis director)
- Zhu, Haolin (Committee member)
- Barrett, The Honors College (Contributor)
- Computer Science and Engineering Program (Contributor)
Date Created
2025-05