What Is an SRT File? How to Open, Edit & Auto-Generate | Subanana

Last week a freelance video-editor friend WhatsApp'd me a file called interview_cut3.srt with the message: "Double-click doesn't work, Final Cut says timecode error, what now?" I opened it in VS Code. The file began with an invisible UTF-8 BOM (the three bytes EF BB BF), and two of the cues overlapped by about 0.2 seconds. Final Cut wasn't being dramatic; that file really was broken. Ten minutes of cleanup later it imported fine.

SRT files are genuinely simple. So simple that most people underestimate how easy they are to read, fix, and generate. This post walks through it in one minute: what an .srt file actually is, what you can open it with, how to edit the timecodes and text without breaking it, and — the thing most readers actually want — how to auto-generate one from a video or audio file in minutes, including Cantonese.

What is an .srt file?

SRT stands for SubRip Subtitle. The format came out of a late-1990s DVD-ripping tool called SubRip, and its design philosophy is one sentence: plain text that says "at this time, show this text."

Open any .srt in a text editor and you'll see something like this:

1
00:00:00,560 --> 00:00:05,400
第一個同學 出到去拎個波

2
00:00:06,430 --> 00:00:09,592
返嚟 好喇 然後

Each cue has four parts:

An index number — 1, 2, 3, and so on.
The timecode — start --> end, in HH:MM:SS,mmm format. Note the comma before the milliseconds. It has to be a comma, not a period. Some older VLC builds just refuse to parse a period.
The subtitle text — can span multiple lines.
A blank line to mark the end of the cue.

That's the whole spec. No fonts, no colors, no effects — those belong to ASS or VTT. SRT's entire value is that almost everything opens it: YouTube, Netflix, Vimeo, VLC, Premiere, Final Cut, DaVinci Resolve, Notepad — they all read it.
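To make the four-part structure concrete, here is a minimal parsing sketch in Python. The names (Cue, parse_srt, to_ms) are my own for illustration, not part of any library, and the parser is deliberately strict: it assumes well-formed cues separated by blank lines and raises on a timecode it can't read.

```python
import re
from dataclasses import dataclass

# HH:MM:SS,mmm --> HH:MM:SS,mmm (note the comma before the milliseconds)
TIMECODE = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


@dataclass
class Cue:
    index: int
    start_ms: int
    end_ms: int
    text: str


def to_ms(h: str, m: str, s: str, ms: str) -> int:
    """Convert the four timecode fields to milliseconds."""
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def parse_srt(content: str) -> list[Cue]:
    """Parse SRT text into cues. Assumes well-formed input."""
    # Drop a leading BOM if present and normalize Windows line endings
    content = content.lstrip("\ufeff").replace("\r\n", "\n").strip()
    cues = []
    # Cues are separated by one or more blank lines
    for block in re.split(r"\n\s*\n", content):
        lines = block.split("\n")
        if len(lines) < 3:
            continue  # not a complete cue (index, timecode, text)
        match = TIMECODE.fullmatch(lines[1].strip())
        if match is None:
            raise ValueError(f"bad timecode line: {lines[1]!r}")
        fields = match.groups()
        cues.append(
            Cue(
                index=int(lines[0].strip()),
                start_ms=to_ms(*fields[:4]),
                end_ms=to_ms(*fields[4:]),
                text="\n".join(lines[2:]),  # text may span several lines
            )
        )
    return cues
```

Feeding it the two-cue sample above gives two Cue objects, the first running from 560 ms to 5400 ms.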

How to open an .srt file

Three options, each good for a different job.

1. A text editor — fastest, no preview

Notepad on Windows, TextEdit on Mac, VS Code anywhere. Opens instantly, lets you fix typos or nudge timecodes by hand. You just can't see the subtitle on top of the video.

One gotcha specific to HK Windows users: if you save a Chinese-text SRT in Notepad, older Windows builds default to UTF-8 with BOM, an invisible byte-order mark (the bytes EF BB BF) at the very start of the file. Some older players (certain PotPlayer builds especially) silently swallow the first cue's index number because of it. Save as "UTF-8 without BOM" in VS Code or Notepad++ and the problem goes away.

2. A media player — just watch it

Drop the .srt and the video into the same folder with the same base filename (e.g. my_video.mp4 and my_video.srt), open the video in VLC, and the subtitles load automatically. This is the laziest path and it works. VLC is also one of the few players that handles mixed encodings (UTF-8, Big5, GB18030) without getting confused, which matters if you're bouncing files between mainland, TW, and HK collaborators.

Windows Media Player will also load sidecar SRTs, but its Chinese-encoding handling is noticeably worse than VLC's.
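If you have a folder full of clips, the same-base-filename rule is easy to check in bulk. A small sketch, assuming the sidecar simply swaps the video's extension for .srt; the function names and the extension list are mine, so adjust them to whatever you actually shoot in.

```python
from pathlib import Path


def expected_sidecar(video: Path) -> Path:
    """my_video.mp4 -> my_video.srt, in the same folder."""
    return video.with_suffix(".srt")


def videos_missing_subtitles(
    folder: Path, extensions: tuple[str, ...] = (".mp4", ".mov", ".mkv")
) -> list[Path]:
    """List videos in `folder` that have no matching .srt next to them."""
    return sorted(
        p
        for p in folder.iterdir()
        if p.suffix.lower() in extensions and not expected_sidecar(p).exists()
    )
```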

3. A subtitle editor — serious editing

Three worth knowing about:

Subtitle Edit (Windows, free). The most feature-complete option: waveform view, translation hooks, format conversion between SRT, ASS, and VTT. It's what pro fansubbers reach for. The UI looks dated, and Mac users have to run it through Mono, which is a pain.
Aegisub (cross-platform, free). Designed for .ass effects subtitles, but it opens SRT fine. Precise timeline-plus-waveform alignment. If all you need is clean SRT with no effects, Aegisub has more knobs than you'll use.
Subanana (browser-based). Nothing to install; it runs in the browser. Its focus is generating an SRT from scratch, not fine-editing one you already have. If you just want to fix three typos in an existing file, Subtitle Edit is faster. What Subanana is built for is the next section.

How to auto-generate an .srt file

This is what most readers actually want: not to fix a file, but to produce one from nothing — you have a video or audio clip and you want a properly timecoded subtitle file coming out the other side.

Typing it by hand and eyeballing the timecodes takes about 40 to 60 minutes per 10 minutes of video. With AI, that drops to 2 to 3 minutes, and the accuracy sits around 90 to 98% depending on the language and how clean the audio is.

Subanana's AI subtitle tool is built around that workflow. The HK bilingual angle — which is where I spend most of my time — has two pieces that matter:

Cantonese accuracy sits around 95% on clean audio. Mandarin is actually higher for us (more training data, less accent variance), but Cantonese is where most HK creators need help, and it's where most English-first tools struggle.

口語→書面語 (colloquial Cantonese to written Chinese) is a one-click toggle. This is Subanana's biggest single differentiator for HK bilingual creators. Cantonese speakers read Chinese that looks nothing like how they talk: the written form is closer to Mandarin grammar. So a creator filming in Cantonese has to choose: subtitle the Cantonese the way it's spoken (accessible to HK viewers, looks odd to everyone else), or translate it into written Chinese by hand (expensive). The toggle does the second one automatically. Concretely:

Spoken: 佢哋而家唔使咁做
Written: 他們現在不需要這樣做

Same meaning, different register. Most English-first STT tools can't do that rewrite because they don't model Cantonese-vs-written-Chinese as a register problem. If you're a HK YouTuber or course creator, this is the feature that justifies the subscription.

The end-to-end flow:

Upload a file or paste a social link. Supported formats are .mp4 / .mov / .webm / .ogg, up to 15 GB / 3 hours on paid plans. Or paste a public YouTube / Instagram / Facebook URL — Subanana fetches and transcribes straight from the link, no local download. That last part is the unusual bit; most competitors are file-upload only, and Taption does YouTube but not IG or FB.
Pick your source language. Cantonese (廣東話 / Hong Kong), Mandarin (華語), English, Japanese, Korean — 80+ total.
Hit generate. In the background we route the audio to whichever STT model currently benches best for your chosen language. We continuously evaluate models per language rather than locking to a single vendor, so when a different model starts winning on Cantonese or Japanese, traffic shifts automatically. You don't see any of that; you just get the output.
QA runs before you see the transcript. Three layers:
- Hallucination detection with automatic model substitution — if the first model's output shows signs of hallucinating (content that doesn't match the audio), the system reruns those segments on a different benched model. You see the cleaner result, not the retry.
- LLM-assisted proofreading on the text. The editor runs an LLM pass over the transcript and flags likely misheard words (wrong word picked) and homophone substitutions (right sound, wrong character). Each suggestion is propose-and-confirm — you approve or reject; nothing auto-applies. Scope: substitution errors only. It does not detect missing characters (漏字) and it never touches timecodes — that's STT's job, not the LLM's.
- CPS (characters-per-second) flags — the editor highlights cues where the text is too dense for a viewer to read comfortably, or where a cue hangs on screen too long with too little text. A simple readability check, so you know which lines to look at first.
For Cantonese, toggle 口語→書面語 if you want written-Chinese output instead of spoken-Cantonese transcription.
Edit in the browser — text and timecodes are both editable inline.
Export. Six standalone formats — SRT, VTT, TXT, DOCX, XLSX, MD — plus a ZIP bundle containing all six. Bilingual dual-language SRT (source + translation stacked per cue) is available in the same export step, and so is one-click burned-in video if you want a finished MP4 rather than a sidecar file.
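The CPS check in the QA step is simple enough to sketch yourself if you want to sanity-check an SRT from any source. This is not Subanana's implementation: the function names are mine, and the 9.0 and 1.0 thresholds are placeholder assumptions you should tune for your language and audience.

```python
def chars_per_second(text: str, start_ms: int, end_ms: int) -> float:
    """Visible characters divided by on-screen duration in seconds."""
    duration_s = (end_ms - start_ms) / 1000
    if duration_s <= 0:
        raise ValueError("cue must end after it starts")
    # Count visible characters only; spaces and line breaks are ignored
    visible = sum(1 for ch in text if not ch.isspace())
    return visible / duration_s


def cps_flag(
    text: str,
    start_ms: int,
    end_ms: int,
    max_cps: float = 9.0,  # assumed threshold, not a Subanana value
    min_cps: float = 1.0,  # assumed threshold, not a Subanana value
) -> str:
    """Classify a cue as 'too dense', 'lingers', or 'ok'."""
    cps = chars_per_second(text, start_ms, end_ms)
    if cps > max_cps:
        return "too dense"  # more text than a viewer can comfortably read
    if cps < min_cps:
        return "lingers"  # little text hanging on screen for a long time
    return "ok"
```

Counting characters rather than words is a deliberate choice here, since Chinese text has no spaces to split on.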

If the source is an interview, podcast, or two-person conversation, Subanana's transcript mode also does speaker diarization (it labels who's saying what), which saves a lot of time in the edit.

This walkthrough is also available in Traditional Chinese (繁中版).

The 3 most common SRT editing mistakes

Mistake 1: Overlapping timecodes

Cue 1 ends at 00:00:05,500 and cue 2 starts at 00:00:05,200. VLC plays it anyway. Premiere and Final Cut refuse to import it. Fix: make sure each cue's end time is earlier than the next cue's start time, with at least a 0.05-second gap between them.
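If the file has more than a handful of cues, fixing overlaps by hand gets tedious. A minimal sketch of the fix described above, assuming cues are already sorted by start time and represented as (start_ms, end_ms, text) tuples; that representation and the function name are my own choices for illustration.

```python
MIN_GAP_MS = 50  # the 0.05-second gap suggested above


def fix_overlaps(cues, min_gap_ms=MIN_GAP_MS):
    """Trim each cue's end so it finishes min_gap_ms before the next starts.

    `cues` is a list of (start_ms, end_ms, text) tuples sorted by start time.
    Returns a new list; the input is not modified.
    """
    fixed = []
    for i, (start, end, text) in enumerate(cues):
        if i + 1 < len(cues):
            next_start = cues[i + 1][0]
            latest_end = next_start - min_gap_ms
            if end > latest_end:
                # Pull the end back, but never before the cue's own start
                end = max(latest_end, start)
        fixed.append((start, end, text))
    return fixed
```

With the example above, a cue ending at 5500 ms followed by one starting at 5200 ms gets trimmed to end at 5150 ms. Only end times move, so the point where each subtitle appears stays in sync with the audio. If two cues start almost simultaneously the first one can shrink to zero length, which is worth reviewing by hand.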

Mistake 2: Period instead of comma for milliseconds

00:00:01.500 is wrong. The spec requires 00:00:01,500 — comma, not period. This one shows up constantly when someone hand-converts a VTT file (VTT uses periods) into SRT.
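A find-and-replace on every period would also mangle the subtitle text, so it's safer to touch only timecode lines. A rough sketch, assuming full HH:MM:SS.mmm timestamps on both sides of the arrow; it does not handle VTT's hour-less MM:SS.mmm form, cue settings, or the WEBVTT header, and the function name is mine.

```python
import re

# A line that starts with "HH:MM:SS.mmm --> HH:MM:SS.mmm" (periods, VTT style)
PERIOD_TIMECODE = re.compile(
    r"^(\d{2}:\d{2}:\d{2})\.(\d{3})([ \t]*-->[ \t]*)(\d{2}:\d{2}:\d{2})\.(\d{3})",
    re.MULTILINE,
)


def periods_to_commas(srt_text: str) -> str:
    """Swap the millisecond separator to a comma on timecode lines only."""
    return PERIOD_TIMECODE.sub(r"\1,\2\3\4,\5", srt_text)
```

Anchoring the pattern to the start of a line and requiring the arrow means a sentence like "Version 1.500 shipped" in the subtitle text is left alone.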

Mistake 3: UTF-8 BOM on Chinese SRTs

The thing that broke my friend's interview_cut3.srt. Open the file in VS Code, look at the bottom-right status bar for the encoding indicator. If it says "UTF-8 with BOM," click it, switch to "UTF-8," save. Problem gone. This is the single most common cause of "my SRT won't load" tickets in HK Windows environments, because older versions of Notepad write the BOM by default when saving as UTF-8.
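If you'd rather script it than click through an editor, the BOM is just three bytes at the start of the file. A minimal sketch using only the Python standard library; the function names are mine, and the file version overwrites in place, so keep a backup if the file matters.

```python
import codecs
from pathlib import Path


def strip_bom(data: bytes) -> bytes:
    """Return the bytes without a leading UTF-8 BOM (EF BB BF), if any."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def strip_bom_in_file(path: str) -> bool:
    """Remove the BOM from a file in place. Returns True if it changed."""
    p = Path(path)
    original = p.read_bytes()
    cleaned = strip_bom(original)
    if cleaned != original:
        p.write_bytes(cleaned)  # overwrites the original file
        return True
    return False
```

Working on raw bytes avoids re-encoding the Chinese text, so nothing else in the file changes.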

Wrapping up

The SRT format itself is trivial: index, timecode, text, blank line. Opening one is easy — text editor to eyeball it, VLC to watch it, Subtitle Edit or Aegisub to fine-tune it. The part that used to cost hours, and the part most readers are really here for, is generating the first SRT from a raw video or audio file — and that's the job AI is genuinely good at now.

If you're sitting on a video, a recording, or an interview you need subtitled, try Subanana. The free plan handles files up to 15 minutes, which is usually enough to test on your actual footage before committing. Paid plans start at US$9/mo (approx HK$68/mo, annual billing) — full pricing is on the pricing page.
