What is AI Transcription?

Rewind.ai transcription converts audio and video to text using Whisper, an open-source speech recognition model. Upload a file and receive a text transcript in seconds.

Is it free?

Yes, within a daily allowance. Transcription costs roughly 4 tokens per second of audio, so a 5-minute file costs about 1,200 tokens. You receive 2,500 free tokens per day.
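The arithmetic behind these figures can be sketched in a few lines. This is only an estimate built from the approximate rate and free allowance quoted in this FAQ; the function names are illustrative and not part of any Rewind.ai library.

```python
TOKENS_PER_SECOND = 4        # approximate rate quoted in this FAQ
FREE_TOKENS_PER_DAY = 2_500  # free daily allowance quoted in this FAQ


def estimate_tokens(duration_seconds: float) -> int:
    """Estimate the token cost of transcribing a clip of the given length."""
    return round(duration_seconds * TOKENS_PER_SECOND)


def free_seconds_per_day() -> float:
    """Seconds of audio the free daily allowance covers at the quoted rate."""
    return FREE_TOKENS_PER_DAY / TOKENS_PER_SECOND


print(estimate_tokens(5 * 60))   # 1200 tokens for a 5-minute file
print(free_seconds_per_day())    # 625.0 seconds, a little over 10 minutes
```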

What languages are supported?

Whisper handles 99+ languages and detects the language from the audio automatically. No configuration needed.

What file formats are supported?

MP3, WAV, M4A, FLAC, OGG, MP4, WEBM and most other common audio and video formats.

How accurate is the transcription?

Whisper large-v3 matches the accuracy of commercial speech-to-text services on clear audio. Quality depends on background noise, accents and audio bitrate.

Can I get timestamps?

Yes. You can export plain text, SRT subtitles with timestamps, WebVTT or JSON with word-level timing.
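For readers who want to see what timestamped SRT output looks like, here is a minimal sketch that turns timed segments into SRT text. The segment structure (dictionaries with start, end and text keys) is a hypothetical shape chosen for illustration; the actual JSON export may be laid out differently.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms_total = round(seconds * 1000)
    hours, rem = divmod(ms_total, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Render a list of {'start', 'end', 'text'} segments as SRT cues."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(blocks)


# Hypothetical segments, for illustration only.
segments = [
    {"start": 0.0, "end": 2.5, "text": "Hello and welcome."},
    {"start": 2.5, "end": 61.04, "text": "Let's get started."},
]
print(segments_to_srt(segments))
```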

Do I need to sign up?

No. You can start transcribing without an account. Signing up is free and increases your daily token allowance.

Can I transcribe video files?

Yes. Use /transcribe/video/, upload an MP4, WebM or MOV file, and the tool extracts the audio track and transcribes it.

How does it compare to Otter.ai or Rev?

Rewind.ai runs Whisper large-v3 and includes a free daily token allowance. Otter charges $8-24 per month; Rev charges per minute of audio.

Can I edit the transcription?

The output text is fully editable in the browser. Copy, adjust and download it in any supported format.

Is there an API?

Yes. Visit /api/ for API documentation covering batch transcription and other endpoints.

AI Transcription

Upload a file or record live to get a transcript with accurate timestamps. It supports speaker diarization, SRT/VTT subtitle export and 99 languages with automatic detection. Cost is proportional to clip length. Runs on Whisper large-v3 and Parakeet (self-hosted), with Wizper and ElevenLabs STT available on paid plans.

Live mode: record from your microphone. Your transcript appears after you click Stop.

Common transcription uses

Interviews + podcasts

Speaker diarization labels each participant separately. Export SRT for a video editor or plain text for a written article.

Auto captions + subtitles

Upload your video, choose SRT or WebVTT, then use /video/subtitle/ to apply the captions. Two steps from clip to captioned video.

Meeting notes

Upload a Zoom or Teams recording to get a full transcript with speaker labels. Run the result through /write/summarize/ for concise bullet-point minutes.

Lectures + lessons

Turn a 90-minute lecture into text, then use /study/flashcards/ or /write/summarize/ to build study materials from it.

Foreign-language audio

Whisper identifies 99 languages automatically. Transcribe in the source language, then pass the text to /translate/ to convert it.

Legal + medical

Word-level timestamps, speaker labels and JSON export give you the detail needed for court-reporter or clinical-note work.

Rewind.ai transcription vs the alternatives

What you get	Rewind.ai	Otter.ai	Descript	Rev.com
Free daily usage	5K+ tokens/day	300 minutes/month	1 hour/month	—
Engine	Whisper large-v3, Parakeet	Proprietary	Proprietary	Human + AI
Languages	99	English-focused	22	30+
Speaker diarization	Yes			
SRT / VTT export	Yes	Paid	Paid	
Public API	Yes	Limited	Limited	
Live streaming STT	Yes (free)	Paid	—	—
Sign-up required	No	Yes	Yes	Yes

Competitor figures are based on publicly listed free tiers as of 2026. Verify with each provider before relying on them.

Rewind.ai transcription uses Whisper large-v3 and handles audio files, video files and live microphone input. Speaker diarization, 99 languages and SRT/VTT/TXT export are included.

How to Use AI Transcription

Upload your file

Drop in an audio or video file. No account needed to start.

Transcribe

Whisper turns the speech into text in seconds, with timestamps if you want them.

Edit and export

Fix anything that needs it, then copy the text or download it as SRT.

Call the transcription API directly

The endpoint follows the OpenAI REST format and accepts a bearer token, so whatever HTTP client you already use will work without changes. Token usage is metered the same way as in the browser.

# Multipart form upload: -F attaches the file contents and sets the Content-Type header.
curl -X POST https://api.rewind.ai/v1/stt/ \
  -H "Authorization: Bearer sk-rewind-..." \
  -F "file=@audio.mp3" \
  -F "language=auto"
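The same call can be sketched in Python using only the standard library. This is an illustration under assumptions, not official client code: it assumes the endpoint accepts a multipart/form-data upload with file and language fields, as the OpenAI transcription API it is modeled on does, and build_stt_request is a hypothetical helper name. The function only builds the request; nothing is sent until you pass it to urllib.request.urlopen.

```python
import urllib.request
import uuid

API_URL = "https://api.rewind.ai/v1/stt/"  # endpoint from the curl example


def build_stt_request(audio_bytes: bytes, filename: str, api_key: str,
                      language: str = "auto") -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST for transcription.

    Assumption: the field names 'file' and 'language' match what the API
    expects; check the API documentation before relying on them.
    """
    boundary = uuid.uuid4().hex
    # Plain form field carrying the language setting.
    language_part = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="language"\r\n\r\n'
        f"{language}\r\n"
    ).encode()
    # File field: headers, then the raw audio bytes.
    file_part = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: application/octet-stream\r\n\r\n"
    ).encode() + audio_bytes + b"\r\n"
    closing = f"--{boundary}--\r\n".encode()

    return urllib.request.Request(
        API_URL,
        data=language_part + file_part + closing,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


# Build a request from placeholder bytes; a real call would read an audio file.
req = build_stt_request(b"placeholder-audio", "audio.mp3", "YOUR_API_KEY")
print(req.get_method(), req.full_url)
```

To send it, wrap the call as `with urllib.request.urlopen(req) as resp:` and read the response body; the response format is not described on this page, so inspect it before parsing.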

AI Transcription FAQ

What is the maximum file size?

Up to 25 MB for anonymous users and 100 MB for signed-in users. Split larger files before uploading.
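A quick way to see how many pieces a large file needs is to divide its size by the limit and round up. The sketch below does that; it assumes 1 MB means 2**20 bytes (the service may count 10**6), and it only gives the minimum number of pieces by size. Actual splitting should cut the audio at time boundaries with an audio tool rather than by raw bytes.

```python
MB = 1024 * 1024             # assumption: 1 MB = 2**20 bytes
ANONYMOUS_LIMIT = 25 * MB    # limit quoted for anonymous users
SIGNED_IN_LIMIT = 100 * MB   # limit quoted for signed-in users


def chunks_needed(size_bytes: int, signed_in: bool = False) -> int:
    """Minimum number of pieces a file must be split into to fit the limit."""
    limit = SIGNED_IN_LIMIT if signed_in else ANONYMOUS_LIMIT
    # Ceiling division; an empty or small file still counts as one upload.
    return max(1, -(-size_bytes // limit))


print(chunks_needed(60 * MB))                  # 3 pieces for an anonymous user
print(chunks_needed(60 * MB, signed_in=True))  # 1 piece for a signed-in user
```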
