Transcription Analysis

This page evaluates whether modern AI transcription tools can reliably convert spoken Burmese audio into English text for archival use. Two artifacts were tested: the Joy and Pain Interview, a field interview with three members of the Karenni resistance, and the protest song Kabar Ma Kyay Bu (We Will Never Forget). Both recordings contain Burmese speech and represent key oral testimony in the archive.

Several transcription systems were tested: OpenAI Whisper (large-v2 and large-v3), Gemini 3.5 Pro Preview, and ElevenLabs' browser-based transcription tool running the Scribe V2 model. In practice, nearly all of them failed on the Burmese audio. Neither Whisper nor Gemini produced a usable English transcript, even though their documentation claims support for Burmese speech recognition. Because no reliable automated output was available, it was not possible to make a meaningful comparison against a human-produced ground-truth transcript.
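Had any model produced usable output, one common way to score it against a human transcript is word error rate (WER). The sketch below is illustrative only: this evaluation does not state which metric it intended to use, the function name `word_error_rate` is hypothetical, and the sample strings are made up for demonstration rather than taken from the recordings.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: both transcripts are plain English text, compared
    case-insensitively after splitting on whitespace.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Illustrative strings only, not real model output.
    human = "we will never forget"
    machine = "we will forget"
    print(f"WER: {word_error_rate(human, machine):.2f}")
```

A WER of 0 means the automated transcript matches the human one word for word, while an empty or unrelated output scores 1 or higher. That is why a model that fails outright gives no meaningful basis for comparison.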

The only model that produced even partial results was ElevenLabs Scribe V2, and it did so only through manual use of the browser interface rather than the API; the output was still incomplete. These results expose a key limitation of current transcription models: claimed language support does not always translate into reliable performance on real-world archival material, particularly when recordings include accents, background noise, or humanitarian field conditions. For archival work, current tools remain unreliable for Burmese-language transcription without significant human intervention.
