Avatar Onboarding Guide — SocialSEO (Client Upload Instructions)

Follow this end-to-end checklist to capture and upload every asset required for avatar training. The content below mirrors the downloadable README for consistency across vendors and Ops.

Last updated: 29 September 2025

What we need from you — short (copy/paste)

Record 4 separate continuous takes (one wardrobe/look per take). 3–5 minutes per take.
Provide 30–180 minutes of clean voice audio for Professional Voice Cloning (30 min minimum; 60+ recommended).
Upload everything in the exact folder structure and name files using the template. Do not edit or splice takes before upload.

Quick start checklist

Review wardrobe options and set up your camera, lighting, and quiet recording environment.
Record four 3–5 minute video takes (one continuous take per look) including consent and brand phrases.
Capture 30–180 minutes of voice audio in WAV or 192kbps+ MP3 plus a pronunciation list read-through.
Fill out the power words list and confirm pronunciation notes for brand terms and names.
Name files using the folder template and upload them into the correct directories.
Complete the QA checklist, confirm consent assets, and notify SocialSEO Ops via WhatsApp.

Video recording rules

Camera & framing

Use 4K or 1080p, 25/30 fps progressive. Mount the camera at eye level and keep the lens centred.
Frame head and shoulders with 3–4 fingers of headroom. Keep both ears visible and maintain eye contact with the lens.
Stabilise the camera on a tripod or solid surface. No handheld motion or auto-zoom.

Gestures & delivery

Keep gestures natural and below the chest line. Avoid touching your face or covering your mouth.
Speak clearly, smile occasionally, and include your full name and brand keywords in each take.
Hold a steady posture. Do not cut between takes; provide a single continuous file per look.

Lighting & wardrobe

Use soft, even, front-biased lighting. Eliminate strong shadows, flicker, and mixed colour temperatures.
Wear four different mid-tone outfits. Avoid colours that clash with the chroma key (no green if you film against a greenscreen), tight stripes, and reflective jewellery.
Use a clean, uncluttered background or a wrinkle-free greenscreen. Turn off ceiling fans and AC so the room is silent.

Single take rule: each wardrobe look must be one uninterrupted 3–5 minute file. Do not edit, splice, or apply filters before upload.
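Once you have read a take's metadata (for example with ffprobe), the camera and single-take rules above reduce to a few comparisons. This is a minimal, hypothetical Python sketch, not a SocialSEO tool: `VideoMeta` and `check_video` are illustrative names, it treats "4K" as 3840x2160 UHD, and it assumes 29.97 fps is acceptable as 30 fps. Confirm those assumptions with SocialSEO Ops.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VideoMeta:
    """Hypothetical container for metadata read from one take (e.g. via ffprobe)."""
    width: int
    height: int
    fps: float
    duration_s: float
    progressive: bool = True


ALLOWED_RESOLUTIONS = {(3840, 2160), (1920, 1080)}  # assumed: 4K UHD and 1080p
ALLOWED_FPS = (25.0, 30.0)
MIN_TAKE_S, MAX_TAKE_S = 3 * 60, 5 * 60  # 3–5 minute single take


def check_video(meta: VideoMeta) -> List[str]:
    """Return a list of problems; an empty list means the take passes."""
    problems = []
    if (meta.width, meta.height) not in ALLOWED_RESOLUTIONS:
        problems.append(f"resolution {meta.width}x{meta.height} is not 4K or 1080p")
    # Assumption: a 0.05 fps tolerance so 29.97 fps counts as 30 fps.
    if not any(abs(meta.fps - f) < 0.05 for f in ALLOWED_FPS):
        problems.append(f"frame rate {meta.fps} fps is not 25 or 30 fps")
    if not meta.progressive:
        problems.append("interlaced video: record progressive")
    if not MIN_TAKE_S <= meta.duration_s <= MAX_TAKE_S:
        problems.append(f"take length {meta.duration_s:.0f} s is outside 3–5 minutes")
    return problems
```

For example, a 1080p, 30 fps, 4-minute progressive take returns an empty list, while a 720p, 60 fps, interlaced take longer than 5 minutes returns four problems.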

Voice / PVC instructions

Deliver clean vocal recordings suitable for Professional Voice Cloning (PVC). Aim for 60+ minutes; 30 minutes is the minimum acceptable duration.

Total duration: 30–180 minutes of natural speech recorded across multiple clips (3–10 minutes each works best).
File formats: WAV (preferred) at 44.1 kHz or 48 kHz, 16-bit or 24-bit; alternatively MP3 at 192 kbps or higher.
Record in a quiet room with minimal echo (acoustically treated if possible) and HVAC off. Keep a consistent 15–20 cm mic distance and keep peaks below −6 dBFS to avoid clipping.
Include a pronunciation list read-through covering brand names, jargon, regional words, and power words.
Record 2–3 emotional variations (neutral, energetic, empathetic) plus a verification phrase: “I confirm SocialSEO may use this voice recording to create my AI voice clone.”
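To check WAV clips before uploading, the hypothetical sketch below uses only Python's standard `wave` module to read sample rate, bit depth, duration, and peak level, then compares them with the rules above. It assumes plain PCM WAV files (MP3s need a separate decoder, which is not shown), and Python versions before 3.12 may reject 24-bit files saved with a WAVE_FORMAT_EXTENSIBLE header. The function names are illustrative, not part of any SocialSEO tooling.

```python
import array
import math
import sys
import wave


def _peak_sample(raw: bytes, width: int) -> int:
    """Largest absolute sample value in little-endian PCM data."""
    if width == 2:  # 16-bit
        samples = array.array("h", raw)
        if sys.byteorder == "big":
            samples.byteswap()  # WAV sample data is little-endian
        return max((abs(s) for s in samples), default=0)
    if width == 3:  # 24-bit: no array typecode, so decode 3 bytes at a time
        return max((abs(int.from_bytes(raw[i:i + 3], "little", signed=True))
                    for i in range(0, len(raw) - 2, 3)), default=0)
    raise ValueError(f"unsupported sample width: {width} bytes")


def inspect_wav(path: str) -> dict:
    """Report sample rate, bit depth, duration and peak level of a PCM WAV."""
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        width = wf.getsampwidth()  # bytes per sample
        n_frames = wf.getnframes()
        raw = wf.readframes(n_frames)
    peak = _peak_sample(raw, width)
    full_scale = 2 ** (8 * width - 1)  # e.g. 32768 for 16-bit
    # Peak level in dBFS: 0 dBFS is full scale; digital silence is -inf.
    peak_dbfs = 20 * math.log10(peak / full_scale) if peak else float("-inf")
    return {"rate": rate, "bits": 8 * width,
            "duration_s": n_frames / rate, "peak_dbfs": peak_dbfs}


def check_pvc_clip(info: dict) -> list:
    """Compare one clip against the format and level rules above."""
    problems = []
    if info["rate"] not in (44100, 48000):
        problems.append(f"sample rate {info['rate']} Hz: use 44.1 or 48 kHz")
    if info["bits"] not in (16, 24):
        problems.append(f"bit depth {info['bits']}: use 16- or 24-bit")
    if info["peak_dbfs"] > -6.0:
        problems.append(f"peak {info['peak_dbfs']:.1f} dBFS: keep peaks below -6 dBFS")
    return problems
```

Usage: `check_pvc_clip(inspect_wav("pvc_part01.wav"))` returns an empty list when the clip meets the format and level rules.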

Pronunciation & power words

Before recording, write a personalised list of brand terms, names, and power words with pronunciation notes, and keep it handy while you record.

Folder & filename convention

Replicate this folder tree exactly when submitting assets. Do not rename or compress folders.

ClientName_AVATAR_ONBOARDING/
├─ 01_raw_video/
│   ├─ look01_2025-09-28_take01.mp4
│   ├─ look02_2025-09-28_take01.mp4
│   ├─ look03_2025-09-28_take01.mp4
│   ├─ look04_2025-09-28_take01.mp4
├─ 02_voice_clips_PVC/
│   ├─ pvc_part01_2025-09-28.mp3  (30–180 min total across all parts)
│   ├─ pronunciation_list_read.mp3
├─ 03_talking_heads_archive/
├─ 04_stills_headshots/
├─ 05_extras/
└─ consent/
    └─ consent_yourname_2025-09-29.mp3
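A local copy of the folder can be checked against the template before uploading. The Python sketch below is a hypothetical helper: the folder names come from the tree above, but the filename patterns are assumptions generalised from the example names (four looks, `YYYY-MM-DD` dates, two-digit take numbers, and a consent name without underscores).

```python
import re
from pathlib import Path
from typing import List

# Folder names copied from the template tree.
REQUIRED_DIRS = ["01_raw_video", "02_voice_clips_PVC", "03_talking_heads_archive",
                 "04_stills_headshots", "05_extras", "consent"]
# Hypothetical patterns generalised from the example filenames.
VIDEO_NAME = re.compile(r"look0[1-4]_\d{4}-\d{2}-\d{2}_take\d{2}\.mp4")
CONSENT_NAME = re.compile(r"consent_[A-Za-z0-9-]+_\d{4}-\d{2}-\d{2}\.mp3")


def check_upload_tree(root: Path) -> List[str]:
    """Return problems found in a local copy of the upload folder."""
    problems = []
    for name in REQUIRED_DIRS:
        if not (root / name).is_dir():
            problems.append(f"missing folder: {name}/")
    # Folders must not be compressed, so any .zip file is flagged.
    for archive in sorted(root.rglob("*.zip")):
        problems.append(f"compressed archive found: {archive.relative_to(root)}")
    video_dir = root / "01_raw_video"
    if video_dir.is_dir():
        looks = set()
        for f in sorted(video_dir.iterdir()):
            if VIDEO_NAME.fullmatch(f.name):
                looks.add(f.name[:6])  # "look01" .. "look04"
            else:
                problems.append(f"unexpected video filename: {f.name}")
        for n in range(1, 5):
            if f"look0{n}" not in looks:
                problems.append(f"no take found for look0{n}")
    consent_dir = root / "consent"
    if consent_dir.is_dir() and not any(
            CONSENT_NAME.fullmatch(f.name) for f in consent_dir.iterdir()):
        problems.append("no consent file named consent_<name>_YYYY-MM-DD.mp3")
    return problems
```

Run it as `check_upload_tree(Path("ClientName_AVATAR_ONBOARDING"))`; an empty list means nothing was flagged.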

QA checklist before upload

All four video looks exported as single continuous takes, eye-level framing, hands below chest.
Voice bundle totals at least 30 minutes across WAV/MP3 files at the required sample rate (WAV) or bitrate (MP3).
Consent audio present in consent/ and named consent_[NAME]_YYYY-MM-DD.mp3.
Pronunciation list (text + audio read) included in 02_voice_clips_PVC/.
Folder structure exactly matches template with no extra ZIP layers.
WhatsApp notification sent to SocialSEO Ops with upload confirmation and link.
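The voice-duration items in this checklist are simple arithmetic over clip lengths. This Python sketch is hypothetical: it assumes you already have each clip's duration in seconds (for example from your recording software), and the thresholds are the 30-minute minimum, 60-minute recommendation, 180-minute upper end, and 3–10 minute clip length stated in this guide.

```python
from typing import Dict

MIN_TOTAL_MIN = 30          # minimum acceptable total
RECOMMENDED_MIN = 60        # recommended total
MAX_TOTAL_MIN = 180         # upper end of the requested range
CLIP_MIN, CLIP_MAX = 3, 10  # per-clip length that works best, in minutes


def summarise_voice_bundle(clip_seconds: Dict[str, float]) -> dict:
    """Summarise clip durations (in seconds) against the PVC duration rules."""
    total_min = sum(clip_seconds.values()) / 60
    outside = sorted(name for name, s in clip_seconds.items()
                     if not CLIP_MIN <= s / 60 <= CLIP_MAX)
    return {
        "total_min": round(total_min, 1),
        "meets_minimum": total_min >= MIN_TOTAL_MIN,
        "meets_recommended": total_min >= RECOMMENDED_MIN,
        "over_maximum": total_min > MAX_TOTAL_MIN,
        "clips_outside_3_10_min": outside,
    }
```

For four clips of 10, 9, 10, and 2 minutes, `total_min` is 31.0, the 30-minute minimum is met but the 60-minute recommendation is not, and the 2-minute clip is listed as outside the 3–10 minute range.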

WhatsApp micro-fix templates

Copy and personalise when requesting quick fixes.

Hi [Name], thanks for uploading. Quick request: could you re-record Look02 with your hands below chest level and direct eye contact with the lens? Please record one continuous take and upload it to 01_raw_video/look02_YYYY-MM-DD_take01.mp4. Example: https://example.com/sample-good-take

Vendor-specific notes

HeyGen

Use 4K masters when possible. HeyGen performs best with evenly lit greenscreen plates and explicit pronunciation audio. Flag any shots with jewellery or hair covering ears.

Runway

Ensure the background is flat or keyed to a solid colour. Provide still headshots as 4K PNGs along with the talking-head masters. Runway QC requires a written summary of wardrobe colours.

ElevenLabs

Deliver 90+ minutes of raw voice where possible and include emotional read markers in filenames (e.g., pvc_part03_energy.wav). Upload the consent audio alongside pronunciation references.
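To confirm that emotional read markers made it into the filenames, the hypothetical Python sketch below parses PVC clip names. The pattern is an assumption generalised from the two examples in this guide (`pvc_part01_2025-09-28.mp3` and `pvc_part03_energy.wav`). Markers are treated as free-form lowercase words, so agree on the exact spellings (for example for the neutral, energetic, and empathetic reads) with Ops.

```python
import re
from typing import Iterable, Optional, Set

# Hypothetical pattern generalising two example names from this guide:
#   pvc_part01_2025-09-28.mp3   and   pvc_part03_energy.wav
PVC_NAME = re.compile(
    r"pvc_part(?P<part>\d{2})"
    r"(?:_(?P<marker>[a-z]+))?"           # optional emotional read marker
    r"(?:_(?P<date>\d{4}-\d{2}-\d{2}))?"  # optional recording date
    r"\.(?P<ext>wav|mp3)"
)


def parse_pvc_name(filename: str) -> Optional[dict]:
    """Split a PVC clip filename into its parts, or return None if it does not match."""
    m = PVC_NAME.fullmatch(filename)
    if m is None:
        return None
    return {"part": int(m["part"]), "marker": m["marker"],
            "date": m["date"], "ext": m["ext"]}


def markers_present(filenames: Iterable[str]) -> Set[str]:
    """Collect the emotional read markers used across a set of clip filenames."""
    found = set()
    for name in filenames:
        parsed = parse_pvc_name(name)
        if parsed and parsed["marker"]:
            found.add(parsed["marker"])
    return found
```

Files that do not follow the pattern, such as `pronunciation_list_read.mp3`, parse to `None` and are ignored by `markers_present`.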