
Persian–English mixed-language meeting transcription: the real challenges

July 1, 2026 · 3 min read · by AppNest
The short answer

Persian–English code-switching defeats most speech-to-text (STT) systems. Google Meet's live captions decode only the one caption language you select, so speech in the other language is dropped or transliterated. Many cloud STT engines fare no better: Deepgram, for example, excludes Persian from its multi-language mode. In our testing, Google's Gemini models were the option that handled it reliably, transcribing each part in the language actually spoken: Persian in Persian script, English in English. That is why MeetConnect uses Gemini for its mixed-language audio pass.

What code-switching is

In many bilingual teams, a single sentence mixes languages: the conversation runs in Persian (Farsi), but technical terms, product names, and code are read aloud in English. A developer might say "امروز باید این pull request رو merge کنیم" ("we need to merge this pull request today"). This is code-switching, and in many engineering and startup meetings it is the norm, not an edge case.

Why live captions fail on it

Google Meet's live captions are tied to a single selected caption language. Set it to Persian and the English stretches come out transliterated into Persian script or disappear; set it to English and the Persian is lost. Meet has no live "both languages" mode, so a genuinely mixed meeting cannot be captured cleanly by captions alone.

Note: MeetConnect tags each caption segment with a detected language so a switch is at least visible, but the underlying live caption text is still limited to Meet's one selected language. The fix is the AI pass over the recorded audio.
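To make per-segment language tags concrete, here is a minimal sketch that splits a caption line into Persian and English runs by Unicode script. It is an illustrative assumption, not MeetConnect's actual detector: the helpers `token_lang` and `tag_runs` are hypothetical, and the `fa`/`en`/`und` labels are our own. Note what it cannot do. It tags the script on the page, not the language that was spoken, so English that Meet has already transliterated into Persian script is still labeled `fa`. That limitation is exactly why the audio pass is needed.

```python
import re
from typing import List, Optional, Tuple

# Persian is written in Arabic script. These ranges cover the Arabic block
# (which includes the Persian letters پ چ ژ گ ی ک), Arabic Supplement, and
# the Arabic presentation forms.
ARABIC_SCRIPT = re.compile(r"[\u0600-\u06FF\u0750-\u077F\uFB50-\uFDFF\uFE70-\uFEFF]")
LATIN_LETTER = re.compile(r"[A-Za-z]")


def token_lang(token: str) -> Optional[str]:
    """Hypothetical helper: guess a token's language from its script alone."""
    if ARABIC_SCRIPT.search(token):
        return "fa"
    if LATIN_LETTER.search(token):
        return "en"
    return None  # digits or punctuation carry no script signal


def tag_runs(text: str) -> List[Tuple[str, str]]:
    """Group whitespace-separated tokens into (lang, text) runs of one script."""
    runs: List[Tuple[str, List[str]]] = []
    for token in text.split():
        # Script-neutral tokens join the previous run; "und" if nothing precedes them.
        lang = token_lang(token) or (runs[-1][0] if runs else "und")
        if runs and runs[-1][0] == lang:
            runs[-1][1].append(token)
        else:
            runs.append((lang, [token]))
    return [(lang, " ".join(tokens)) for lang, tokens in runs]


if __name__ == "__main__":
    for lang, span in tag_runs("امروز باید این pull request رو merge کنیم"):
        print(lang, span)
```

On the sentence "امروز باید این pull request رو merge کنیم" this yields five alternating runs (fa, en, fa, en, fa). Fed Meet's transliterated output instead, it would see only Persian script and report a single `fa` run.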

Why many STT engines also fail

You might expect a dedicated speech-to-text service to do better than Meet. Often it does not. In our own benchmarking on real Persian+English meeting audio:

Engine | Persian↔English code-switching
Deepgram (multi) | Excludes Persian from multi-language mode
Deepgram (fa) | Mangles the English portions
Whisper-class (varies) | Inconsistent on rapid switches
Gemini 2.5 / 3.x | Handles it natively: each part as spoken

The takeaway: 'multilingual' on a spec sheet rarely means 'handles Persian and English in the same sentence.' Code-switching is a specific, harder capability.

How Gemini handles it

Given the recorded audio, Gemini transcribes each span in the language actually spoken: Persian in Persian script, English in English. An "English for AI" mode can also mark the originally English speech for downstream agents. Because Gemini sees the whole clip, it also diarizes speakers more consistently than a single-language live caption stream.
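A minimal sketch of an audio pass like this, using Google's `google-genai` Python SDK, follows. The SDK calls (`genai.Client`, `client.files.upload`, `client.models.generate_content`) follow Google's documented audio-understanding path, but everything else is an assumption, not MeetConnect's implementation: the prompt wording, the `<en></en>` tags standing in for the "English for AI" markers, the `S1`/`S2` speaker labels, the helper names `build_prompt` and `transcribe`, and the example model name.

```python
from typing import List


def build_prompt(mark_english: bool = True) -> str:
    """Hypothetical prompt for a Persian-English code-switched meeting recording."""
    lines: List[str] = [
        "Transcribe this meeting recording verbatim.",
        "Speakers switch between Persian and English, often within one sentence.",
        "Write each span in the language actually spoken: Persian in Persian script,",
        "English in Latin script. Do not translate or transliterate either language.",
        "Label speakers S1, S2, ... consistently across the whole recording.",
    ]
    if mark_english:
        # Assumed stand-in for an "English for AI" mode: tag originally English speech.
        lines.append("Wrap every span that was spoken in English in <en></en> tags.")
    return "\n".join(lines)


def transcribe(path: str, model: str = "gemini-2.5-flash") -> str:
    """Upload a recorded meeting and return Gemini's mixed-language transcript."""
    # Imported here so build_prompt() works without the SDK installed
    # (pip install google-genai).
    from google import genai

    client = genai.Client()  # reads the API key (e.g. GEMINI_API_KEY) from the environment
    audio = client.files.upload(file=path)
    response = client.models.generate_content(
        model=model, contents=[build_prompt(), audio]
    )
    return response.text


if __name__ == "__main__":
    print(build_prompt())
```

Calling `transcribe("meeting.m4a")` (a placeholder path) returns the transcript text. Naming both scripts explicitly in the prompt is aimed at the transliteration failure that single-language captions show; it does not guarantee it.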

This is a post-meeting step, so the practical workflow is: capture live captions for free during the call, record the audio, and run a Gemini pass afterward for the accurate mixed-language transcript.

A practical workflow

  1. Capture the meeting with live captions for an instant record.
  2. Record the audio locally (see our article on recording Meet locally).
  3. Run the Gemini transcription pass for a faithful Persian–English transcript.
  4. Keep both the live caption .txt and the AI transcript, and compare them where accuracy matters.
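For the comparison step, a word-level diff is one simple way to find the places where the live captions and the AI transcript disagree. This minimal sketch uses Python's standard `difflib.SequenceMatcher`; the function name `word_diffs` is hypothetical, and whitespace tokenization is a simplifying assumption (punctuation stays attached to words, and timestamps are ignored).

```python
import difflib
from typing import List, Tuple


def word_diffs(caption_text: str, ai_text: str) -> List[Tuple[str, str, str]]:
    """Return (op, caption_words, ai_words) for every span where the transcripts differ."""
    a = caption_text.split()
    b = ai_text.split()
    # autojunk=False stops difflib from treating very frequent words as junk
    # once the second sequence reaches 200 items (i.e. in long transcripts).
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    diffs: List[Tuple[str, str, str]] = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":  # "replace", "delete", or "insert"
            diffs.append((op, " ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return diffs


if __name__ == "__main__":
    # Live captions in Persian mode transliterate the English terms.
    caption = "امروز باید این پول ریکوئست رو مرج کنیم"
    ai = "امروز باید این pull request رو merge کنیم"
    for op, before, after in word_diffs(caption, ai):
        print(f"{op}: {before!r} -> {after!r}")
```

On this sample pair it reports exactly two replacements, "پول ریکوئست" to "pull request" and "مرج" to "merge", which are the spans worth checking by ear.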

Frequently asked questions

Why does Google Meet mistranscribe English words in a Persian meeting?

Because Meet decodes captions in one selected language at a time. With Persian selected, English speech is forced through a Persian recognizer and comes out transliterated or dropped.

Can Deepgram transcribe mixed Persian and English?

Not well. In our testing, Deepgram's multi-language mode excludes Persian, and its Persian mode mangles the English, so it is not suitable for Persian↔English code-switching.

What actually handles Persian–English code-switching?

Google's Gemini models transcribe each part in the language actually spoken, keeping Persian in Persian script and English in English. MeetConnect uses Gemini for its mixed-language audio transcription.

Is the mixed-language transcript live or after the meeting?

After the meeting. The live captions remain single-language; the faithful mixed-language transcript comes from an AI pass over the recorded audio.

