Cliptude Docs

Uploading a Talking Head Video

The Talking Head format lets you upload a video of yourself speaking directly to camera. Cliptude transcribes your narration, extracts your clips, and overlays them on a generated background, giving your content a personal, on-camera feel without any manual editing.

What is the Talking Head Format?

Unlike other formats that generate a voiceover from a text prompt, Talking Head uses a video you record. The pipeline:

  1. Transcribes your speech with Whisper to extract the script.
  2. Uses face detection to locate and crop the speaking segments from your footage.
  3. Splits the video into scene clips aligned to the transcribed text.
  4. Composes the final video with your talking head clips placed on the chosen background, with optional overlays and captions.
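To make the splitting step more concrete, here is a minimal sketch of how timestamped transcript segments could be grouped into scenes. It is not Cliptude's actual implementation. It assumes segments shaped like the `start` / `end` / `text` records that Whisper-style transcribers typically return, and the `Scene` class, the `group_into_scenes` helper, and the 8-second cap are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Scene:
    start: float  # seconds from the beginning of the upload
    end: float
    text: str


def group_into_scenes(segments: list[dict], max_seconds: float = 8.0) -> list[Scene]:
    """Merge consecutive transcript segments into scenes of at most max_seconds.

    Hypothetical grouping rule: a segment joins the current scene if the scene
    would still fit within max_seconds; otherwise it starts a new scene.
    """
    scenes: list[Scene] = []
    for seg in segments:
        text = seg["text"].strip()
        last = scenes[-1] if scenes else None
        if last is not None and seg["end"] - last.start <= max_seconds:
            last.end = seg["end"]
            last.text = f"{last.text} {text}"
        else:
            scenes.append(Scene(seg["start"], seg["end"], text))
    return scenes
```

Each resulting scene carries a time range that could be used to cut the matching clip from the footage, plus the text that could be shown as its caption.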

Where to Find the Upload Input

The talking head upload field is on the Create Video page, in the Upload Talking Head Video section beneath the main prompt area.

  1. Go to Create Video from the main navigation.
  2. Scroll to the Upload Talking Head Video section and click the file selector.
  3. Select your video file. The format is set to Talking Head automatically, and the prompt and duration fields become optional.
  4. Submit the form. After upload you will be taken to the background selection step.

Accepted File Formats

Cliptude accepts the following video formats for talking head upload:

MP4, MOV, WEBM, AVI

MP4 (H.264) is recommended for the best compatibility and smallest file size. MOV files exported directly from iPhone or camera apps are also fully supported.

File Size Limit

Max file size: 500 MB

Files larger than 500 MB will be rejected at upload time. If your recording exceeds this limit, compress it using a tool like HandBrake (free) before uploading.
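If you prepare recordings in bulk, a quick local check can catch files that would be rejected before you start uploading. The sketch below is only an illustration of the two rules above (accepted extension, size limit). The `check_upload` helper is hypothetical, and it assumes that "500 MB" means 500 × 1024 × 1024 bytes, which these docs do not specify.

```python
from pathlib import Path

# Assumption: the limit is binary megabytes; the docs only say "500 MB".
MAX_BYTES = 500 * 1024 * 1024
ACCEPTED_EXTENSIONS = {".mp4", ".mov", ".webm", ".avi"}


def check_upload(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks uploadable."""
    problems = []
    ext = Path(filename).suffix.lower()
    if ext not in ACCEPTED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    if size_bytes > MAX_BYTES:
        size_mb = size_bytes / 1024 / 1024
        problems.append(f"file is {size_mb:.0f} MB, over the 500 MB limit")
    return problems
```

For a file on disk, call it as `check_upload(path.name, path.stat().st_size)` with `path` being a `pathlib.Path`. The check only looks at the file extension, so it cannot tell whether the container or codec inside is actually valid.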

What Happens After Upload

Auto Transcription

Whisper transcribes your audio. The transcript becomes the video script, so no manual script entry is required.

Face Detection & Clip Extraction

The pipeline detects your face, extracts the speaking segments, and splits the footage into per-scene clips.

Format Auto-Set

The video format is automatically set to Talking Head. You don't need to select it manually.

Background Selection

After upload you proceed directly to choosing a background. Your clips will be composited on top of the selected background video.

Tips for a Great Talking Head Video

  • Good lighting. Face the light source (a window or ring light) so your face is evenly lit. Avoid backlit setups where the background is brighter than your face.
  • Clear audio. Record in a quiet room. Use a lapel or USB microphone if possible; built-in laptop mics pick up noise that reduces transcription accuracy.
  • Keep your face visible. Look at the camera and avoid covering your face. The face detector needs a clear, forward-facing view throughout the video.
  • Solid or simple background. Recording in front of a plain wall or a simple background helps the face detector focus on you rather than the surroundings.
  • Compress before uploading. Export in MP4 at 1080p or 720p to keep file sizes manageable. Larger resolutions don't improve the final output quality.
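If you prefer a scriptable alternative to a GUI tool for the compression tip, the sketch below builds an ffmpeg command that re-encodes a recording to H.264 MP4 at 720p. It assumes ffmpeg is installed separately; the `build_compress_command` helper is hypothetical, and the CRF, preset, and audio bitrate values are illustrative starting points rather than Cliptude recommendations.

```python
def build_compress_command(src: str, dst: str, height: int = 720) -> list[str]:
    """Build an ffmpeg argument list that re-encodes src to an H.264 MP4."""
    return [
        "ffmpeg",
        "-i", src,
        # -2 lets ffmpeg pick a width that keeps the aspect ratio and is even,
        # which H.264 encoders generally require.
        "-vf", f"scale=-2:{height}",
        "-c:v", "libx264",
        "-crf", "23",          # illustrative quality setting; lower is larger
        "-preset", "medium",
        "-c:a", "aac",
        "-b:a", "128k",
        dst,
    ]
```

To run it, pass the list to `subprocess.run(build_compress_command("raw.mov", "talk.mp4"), check=True)`. Returning a list instead of a shell string avoids quoting problems with file names that contain spaces.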

Mutually exclusive with Voiceover upload. You can upload either a talking head video or an audio voiceover, not both in the same submission. If you only have a voice recording (no video), use the Voiceover Upload option instead.
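If you script your submissions, the either-or rule can be checked before anything is sent. This is a minimal sketch, not Cliptude's own validation: the `pick_upload_mode` helper and the mode names it returns are hypothetical.

```python
from typing import Optional


def pick_upload_mode(talking_head: Optional[str], voiceover: Optional[str]) -> str:
    """Decide which kind of submission a pair of optional file paths describes."""
    if talking_head and voiceover:
        raise ValueError("upload either a talking head video or a voiceover, not both")
    if talking_head:
        return "talking_head"
    if voiceover:
        return "voiceover"
    # Neither file given: assume a prompt-only submission.
    return "prompt"
```

For example, `pick_upload_mode("talk.mp4", None)` returns `"talking_head"`, while passing both paths raises `ValueError`.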