Cliptude Docs

Uploading Your Own Voiceover

Already have a recorded narration? Skip the AI voice entirely and let Cliptude build the video around your own audio. The pipeline transcribes your recording, aligns scenes to it, and uses your voice as the final narration, so no text prompt or duration estimate is needed.

Where to Find the Upload Input

The voiceover upload field is on the Create Video page, in the Upload Your Own Voice section beneath the main prompt area.

  1. Go to Create Video from the main navigation.
  2. Scroll to the Upload Your Own Voice section and click the file selector.
  3. Select your audio file and submit the form. The prompt and duration fields become optional when a file is attached.
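The rule in step 3 can be expressed as a small validation check. This is a minimal sketch, not Cliptude's form code: the `CreateVideoForm` class and its field names are hypothetical, and it assumes both the prompt and the duration are required whenever no voiceover is attached.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CreateVideoForm:
    # Hypothetical field names; the real form may use different ones.
    prompt: Optional[str] = None
    duration_seconds: Optional[int] = None
    voiceover_file: Optional[str] = None  # filename of the attached audio, if any


def missing_fields(form: CreateVideoForm) -> list[str]:
    """Return the names of required fields that are still empty.

    Assumption: without a voiceover, prompt and duration are both required;
    with a voiceover attached, neither is.
    """
    if form.voiceover_file:
        return []
    missing = []
    if not form.prompt or not form.prompt.strip():
        missing.append("prompt")
    if form.duration_seconds is None:
        missing.append("duration")
    return missing


if __name__ == "__main__":
    print(missing_fields(CreateVideoForm()))  # ['prompt', 'duration']
    print(missing_fields(CreateVideoForm(voiceover_file="narration.mp3")))  # []
```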

Accepted File Formats

Cliptude accepts the following audio formats for voiceover upload:

MP3, M4A, WAV, WEBM, OGG, FLAC

These cover the most common recording and export formats. If your editor can export to one of them, you can upload the file directly without converting it first.
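If you batch-prepare recordings, a quick local pre-check can catch unsupported files before you open the upload form. This sketch matches on the file extension only, which is an assumption: Cliptude may also inspect the file contents, and `is_accepted_format` is a hypothetical helper, not part of any Cliptude tool.

```python
from pathlib import Path

# Formats listed on this page. Checking the extension is an assumption;
# the upload service may validate the actual audio data as well.
ACCEPTED_EXTENSIONS = {".mp3", ".m4a", ".wav", ".webm", ".ogg", ".flac"}


def is_accepted_format(filename: str) -> bool:
    """Return True if the file's extension is one of the accepted audio formats."""
    return Path(filename).suffix.lower() in ACCEPTED_EXTENSIONS


if __name__ == "__main__":
    for name in ["narration.MP3", "take2.flac", "voice.aac"]:
        # narration.MP3 True, take2.flac True, voice.aac False
        print(name, is_accepted_format(name))
```

The comparison is case-insensitive, so exports named `.WAV` or `.Mp3` pass the same check.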

File Size & Duration Limits

  • Max file size: 200 MB
  • Max audio duration: 30 minutes

Files that exceed either limit are rejected at upload time with an error message.
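The two limits can be checked locally with a few comparisons. This is a sketch under stated assumptions: "200 MB" is read as 200 MiB (200 × 1024 × 1024 bytes), a file exactly at a limit is treated as allowed, and `limit_errors` is a hypothetical helper whose messages are not Cliptude's actual error text. You supply the duration from your editor or another tool.

```python
MAX_FILE_SIZE_BYTES = 200 * 1024 * 1024  # assumption: "200 MB" means 200 MiB
MAX_DURATION_SECONDS = 30 * 60           # 30 minutes


def limit_errors(size_bytes: int, duration_seconds: float) -> list[str]:
    """Return reasons a file would exceed the upload limits (empty list if it fits)."""
    errors = []
    if size_bytes > MAX_FILE_SIZE_BYTES:
        errors.append(f"File is {size_bytes / (1024 * 1024):.1f} MB; the limit is 200 MB.")
    if duration_seconds > MAX_DURATION_SECONDS:
        errors.append(f"Audio is {duration_seconds / 60:.1f} minutes; the limit is 30 minutes.")
    return errors


if __name__ == "__main__":
    print(limit_errors(150 * 1024 * 1024, 25 * 60))  # []
    print(limit_errors(250 * 1024 * 1024, 31 * 60))  # two messages: size and duration
```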

What Happens After Upload

Uploading a voiceover changes the video generation pipeline in a few key ways:

  • Voice selection is skipped. You are taken directly to the background selection step since your audio is already provided.
  • Automatic transcription. Cliptude uses Whisper to transcribe your audio, and the resulting text is used to generate scenes and source visuals, so no manual script is needed.
  • Your audio is used as-is. The ElevenLabs TTS step is bypassed entirely, so no credits are charged for AI voice generation.
  • Duration is measured automatically. The length of your audio determines the final video duration, so you do not need to set a duration in the form.
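The differences listed above amount to a branch on whether a voiceover is attached. The sketch below models that branch as a set of flags. `PipelinePlan` and `plan_pipeline` are hypothetical names for illustration, not Cliptude's internal code, and the AI-voice branch assumes that path neither transcribes audio nor takes its length from anything but the form's duration field.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PipelinePlan:
    select_voice: bool    # show the AI voice selection step?
    run_tts: bool         # call ElevenLabs TTS (uses voice credits)?
    transcribe: bool      # run Whisper on uploaded audio?
    duration_source: str  # where the final video length comes from


def plan_pipeline(voiceover_path: Optional[str]) -> PipelinePlan:
    """Decide which stages run, following the behavior described on this page."""
    if voiceover_path:
        # Uploaded audio: skip voice selection and TTS, transcribe, measure length.
        return PipelinePlan(select_voice=False, run_tts=False,
                            transcribe=True, duration_source="audio length")
    # Assumption: the AI-voice path uses the duration entered in the form.
    return PipelinePlan(select_voice=True, run_tts=True,
                        transcribe=False, duration_source="form duration")


if __name__ == "__main__":
    print(plan_pipeline("narration.wav"))
    # PipelinePlan(select_voice=False, run_tts=False, transcribe=True, duration_source='audio length')
```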

Tips for Best Results

  • Record in a quiet environment with minimal background noise; cleaner audio produces more accurate transcriptions.
  • Prefer MP3 or WAV at 44.1 kHz or 48 kHz for optimal transcription quality.
  • Speak clearly and at a natural pace; rushed or heavily accented speech may reduce transcription accuracy.
  • Keep the recording focused on the topic; unrelated material at the start or end can throw off scene segmentation.
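For WAV exports, you can confirm the sample rate (and read the duration, which is useful against the 30-minute limit) with Python's standard `wave` module before uploading. This is a minimal sketch: `check_wav` is a hypothetical helper, and `wave` only reads uncompressed PCM WAV files, so MP3, M4A, and the other formats would need a separate tool such as ffprobe.

```python
import os
import tempfile
import wave

RECOMMENDED_RATES_HZ = {44_100, 48_000}


def check_wav(path: str) -> tuple[int, float, bool]:
    """Return (sample_rate_hz, duration_seconds, rate_is_recommended) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        duration = wav.getnframes() / rate  # frames divided by frames per second
    return rate, duration, rate in RECOMMENDED_RATES_HZ


if __name__ == "__main__":
    # Write a one-second silent 16 kHz mono file to try the check on.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.wav")
        with wave.open(path, "wb") as out:
            out.setnchannels(1)
            out.setsampwidth(2)  # 16-bit samples
            out.setframerate(16_000)
            out.writeframes(b"\x00\x00" * 16_000)
        print(check_wav(path))  # (16000, 1.0, False)
```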

Mutually exclusive with Talking Head upload

You can upload either a voiceover audio file or a talking head video, but not both in the same submission. If you want to use a talking head video with your own narration, see the Talking Head Upload page instead.
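The either/or rule can be expressed as a single check on the two upload fields. This sketch assumes each field holds a filename or nothing; `validate_uploads` is a hypothetical helper, and the error text is illustrative rather than the message Cliptude actually shows.

```python
from typing import Optional


def validate_uploads(voiceover: Optional[str], talking_head: Optional[str]) -> None:
    """Raise ValueError if both a voiceover and a talking head video are attached."""
    if voiceover and talking_head:
        raise ValueError(
            "Upload either a voiceover audio file or a talking head video, not both."
        )


if __name__ == "__main__":
    validate_uploads("narration.mp3", None)  # passes silently
    try:
        validate_uploads("narration.mp3", "presenter.mp4")
    except ValueError as err:
        print(err)
```

Submitting neither file also passes this check; whether the prompt is then required is a separate rule.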