Follow this end-to-end checklist to capture and upload every asset required for avatar training. The content below mirrors the downloadable README for consistency across vendors and Ops.
Last updated:
What we need from you — short (copy/paste)
- Record 4 separate continuous takes (one wardrobe/look per take). 3–5 minutes per take.
- Provide 30–180 minutes of clean voice audio for Professional Voice Cloning (30 min minimum; 60+ recommended).
- Upload everything in the exact folder structure and name files using the template. Do not edit or splice takes before upload.
Quick start checklist
- Review wardrobe options and set up your camera, lighting, and quiet recording environment.
- Record four 3–5 minute video takes (one continuous take per look) including consent and brand phrases.
- Capture 30–180 minutes of voice audio in WAV or 192 kbps+ MP3, plus a pronunciation list read-through.
- Fill out the power words list and confirm pronunciation notes for brand terms and names.
- Name files using the folder template and upload them into the correct directories.
- Complete the QA checklist, confirm consent assets, and notify SocialSEO Ops via WhatsApp.
Video recording rules
Camera & framing
- Use 4K or 1080p, 25/30 fps progressive. Mount the camera at eye level and keep the lens centred.
- Frame head and shoulders with 3–4 fingers of headroom. Keep both ears visible and maintain eye contact with the lens.
- Stabilise the camera on a tripod or solid surface. No handheld motion or auto-zoom.
Gestures & delivery
- Keep gestures natural and below the chest line. Avoid touching your face or covering your mouth.
- Speak clearly, smile occasionally, and include your full name and brand keywords in each take.
- Hold a steady posture. Do not stop or cut during a take; provide one continuous file per look.
Lighting & wardrobe
- Use soft, even, front-biased lighting. Eliminate strong shadows, flicker, and mixed colour temperatures.
- Wear four different mid-tone outfits. Avoid chroma clash (no green clothing if you use a greenscreen), tight stripes, and reflective jewellery.
- Use a clean background or a wrinkle-free greenscreen. Turn off ceiling fans and AC so the room is silent.
Single take rule: each wardrobe look must be one uninterrupted 3–5 minute file. Do not edit, splice, or apply filters before upload.
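If you want to sanity-check each take before upload, the sketch below encodes the camera and single-take rules as simple checks. It is a minimal illustration, not an official validator: the `VideoTake` record is hypothetical and is filled in by hand (or from whatever metadata tool you use), "4K" is assumed to mean 3840x2160 UHD in landscape orientation, and 29.97 fps is assumed to count as 30 fps.

```python
from dataclasses import dataclass


# Hypothetical record of a take's properties. Fill it from your camera or a
# metadata tool; this sketch does not open or decode video files itself.
@dataclass
class VideoTake:
    width: int
    height: int
    fps: float
    duration_s: float
    interlaced: bool = False


# Assumed accepted sizes: 1080p (1920x1080) and 4K UHD (3840x2160), landscape.
ALLOWED_SIZES = {(1920, 1080), (3840, 2160)}
ALLOWED_FPS = (25.0, 30.0)
MIN_DURATION_S = 3 * 60  # 3 minutes
MAX_DURATION_S = 5 * 60  # 5 minutes


def check_take(take: VideoTake) -> list[str]:
    """Return human-readable problems; an empty list means the take passes."""
    problems = []
    if (take.width, take.height) not in ALLOWED_SIZES:
        problems.append(f"resolution {take.width}x{take.height} is not 1080p or 4K UHD")
    # Small tolerance so 29.97 fps is treated as 30 fps (an assumption).
    if not any(abs(take.fps - f) < 0.1 for f in ALLOWED_FPS):
        problems.append(f"frame rate {take.fps} fps is not 25 or 30")
    if take.interlaced:
        problems.append("video is interlaced; record progressive")
    if not MIN_DURATION_S <= take.duration_s <= MAX_DURATION_S:
        problems.append(f"duration {take.duration_s:.0f} s is outside 3-5 minutes")
    return problems
```

For example, `check_take(VideoTake(1920, 1080, 29.97, 240))` returns an empty list, while a 720p, 60 fps, interlaced 100-second clip reports four problems.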
Voice / PVC instructions
Deliver clean vocal recordings suitable for Professional Voice Cloning (PVC). Aim for 60+ minutes; 30 minutes is the minimum acceptable duration.
- Total duration: 30–180 minutes of natural speech recorded across multiple clips (3–10 minutes each works best).
- File formats: WAV (preferred) at 44.1 kHz or 48 kHz, 16-bit or 24-bit; alternatively MP3 at 192 kbps or higher.
- Record in a quiet room with HVAC off (acoustic treatment helps but is not required). Keep a consistent 15–20 cm mic distance and keep peaks at or below −6 dBFS so the signal never clips.
- Include a pronunciation list read-through covering brand names, jargon, regional words, and power words.
- Record 2–3 emotional variations (neutral, energetic, empathetic) plus a verification phrase: “I confirm SocialSEO may use this voice recording to create my AI voice clone.”
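The WAV requirements above can be checked with Python's standard-library `wave` module. This is a minimal sketch under stated assumptions: it only inspects `*.wav` files (MP3 parts are skipped because the standard library cannot decode them), the peak check only handles 16-bit PCM, and `wave` reads plain PCM WAV files, so some recorder-specific WAV variants may be rejected depending on your Python version. The thresholds mirror the list above.

```python
import array
import math
import sys
import wave
from pathlib import Path

ALLOWED_RATES = {44100, 48000}  # Hz
ALLOWED_WIDTHS = {2, 3}         # bytes per sample: 16-bit or 24-bit
MIN_TOTAL_MIN, MAX_TOTAL_MIN = 30, 180
MAX_PEAK_DBFS = -6.0


def wav_info(path: Path) -> tuple[int, int, float]:
    """Return (sample rate, bytes per sample, duration in seconds)."""
    with wave.open(str(path), "rb") as wf:
        rate = wf.getframerate()
        return rate, wf.getsampwidth(), wf.getnframes() / rate


def peak_dbfs_16bit(path: Path) -> float:
    """Peak level of a 16-bit PCM WAV in dBFS (0 dBFS = digital full scale)."""
    with wave.open(str(path), "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this helper only handles 16-bit PCM")
        samples = array.array("h")
        samples.frombytes(wf.readframes(wf.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV PCM data is little-endian
    peak = max((abs(s) for s in samples), default=0)
    return -math.inf if peak == 0 else 20 * math.log10(peak / 32768)


def check_voice_folder(folder: Path) -> list[str]:
    """Check every *.wav in a folder; MP3 parts are not inspected by this sketch."""
    problems, total_s = [], 0.0
    for wav_path in sorted(folder.glob("*.wav")):
        rate, width, duration_s = wav_info(wav_path)
        total_s += duration_s
        if rate not in ALLOWED_RATES:
            problems.append(f"{wav_path.name}: {rate} Hz sample rate")
        if width not in ALLOWED_WIDTHS:
            problems.append(f"{wav_path.name}: {8 * width}-bit samples")
        if width == 2 and peak_dbfs_16bit(wav_path) > MAX_PEAK_DBFS:
            problems.append(f"{wav_path.name}: peaks above {MAX_PEAK_DBFS} dBFS")
    total_min = total_s / 60
    if not MIN_TOTAL_MIN <= total_min <= MAX_TOTAL_MIN:
        problems.append(f"total WAV audio is {total_min:.1f} min (need 30-180)")
    return problems
```

Run `check_voice_folder(Path("02_voice_clips_PVC"))` from the onboarding root; an empty list means the WAV parts meet these checks.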
Pronunciation & power words
Build a personalised list of pronunciation notes and power words so important terms are at hand while you record.
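One way to keep the text version of this list tidy is sketched below. The layout (`term: respelling`, power words first, then alphabetical) is an assumption for illustration, not a required format; the respellings are whatever phonetic hints you find natural to read.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    term: str           # brand name, jargon, regional word, or power word
    say_it_like: str    # your own plain-English respelling
    power_word: bool = False


def render_list(entries: list[Entry]) -> str:
    """Plain-text list: power words first, then A-Z; duplicate terms dropped."""
    seen: set[str] = set()
    lines = []
    for e in sorted(entries, key=lambda e: (not e.power_word, e.term.lower())):
        key = e.term.lower()
        if key in seen:  # case-insensitive de-duplication
            continue
        seen.add(key)
        suffix = " [power word]" if e.power_word else ""
        lines.append(f"{e.term}: {e.say_it_like}{suffix}")
    return "\n".join(lines)
```

Save the output as a text file next to the audio read-through so both versions of the list travel together.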
Consent & verification
Record the consent script below as an audio file, name it using the filename format that follows the script, and upload it to the consent/ folder.
“I, [Full name], confirm that this is my voice and I give permission to SocialSEO and its service providers to create and use an AI voice clone of my voice for content creation and brand media. I confirm I have the rights to provide these recordings and understand how the generated voices will be used.”
Filename format: consent_[NAME]_YYYYMMDD.mp3
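A small helper can build and check consent filenames so the date format never drifts. This is a sketch under an assumption: `[NAME]` is taken to be lowercase letters and digits only (no spaces or underscores), matching the `yourname` example in the folder template; accented letters are simply dropped.

```python
import re
from datetime import date, datetime

# Assumption: [NAME] is lowercase letters and digits only, like "yourname".
CONSENT_RE = re.compile(r"^consent_([a-z0-9]+)_([0-9]{8})\.mp3$")


def consent_filename(full_name: str, recorded: date) -> str:
    """Build consent_[NAME]_YYYYMMDD.mp3 from a full name and recording date."""
    slug = re.sub(r"[^a-z0-9]", "", full_name.lower())
    if not slug:
        raise ValueError("name must contain at least one letter or digit")
    return f"consent_{slug}_{recorded:%Y%m%d}.mp3"


def is_valid_consent_filename(filename: str) -> bool:
    """True if the name matches the pattern and the date is a real calendar date."""
    match = CONSENT_RE.match(filename)
    if match is None:
        return False
    try:
        datetime.strptime(match.group(2), "%Y%m%d")  # rejects e.g. 31 February
    except ValueError:
        return False
    return True
```

For example, `consent_filename("Jane Doe", date(2025, 9, 29))` gives `consent_janedoe_20250929.mp3`, while a hyphenated date such as `consent_janedoe_2025-09-29.mp3` fails the check.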
Folder & filename convention
Replicate this folder tree exactly when submitting assets. Do not rename or compress folders.
ClientName_AVATAR_ONBOARDING/
├─ 01_raw_video/
│ ├─ look01_2025-09-28_take01.mp4
│ ├─ look02_2025-09-28_take01.mp4
│ ├─ look03_2025-09-28_take01.mp4
│ └─ look04_2025-09-28_take01.mp4
├─ 02_voice_clips_PVC/
│ ├─ pvc_part01_2025-09-28.mp3 (30–180 mins total across all parts)
│ └─ pronunciation_list_read.mp3
├─ 03_talking_heads_archive/
├─ 04_stills_headshots/
├─ 05_extras/
└─ consent/
  └─ consent_yourname_20250929.mp3
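To avoid typos in folder names, you could create the empty skeleton with a short script like the one below and then drop files into it. This is a minimal sketch: `ClientName` is replaced with whatever client name you pass in (the `Acme` used in testing is hypothetical), and `video_filename` just formats the `lookNN_YYYY-MM-DD_takeNN.mp4` pattern shown in the tree.

```python
from datetime import date
from pathlib import Path

SUBFOLDERS = (
    "01_raw_video",
    "02_voice_clips_PVC",
    "03_talking_heads_archive",
    "04_stills_headshots",
    "05_extras",
    "consent",
)


def create_skeleton(base: Path, client_name: str) -> Path:
    """Create <client_name>_AVATAR_ONBOARDING/ and its subfolders; safe to re-run."""
    root = base / f"{client_name}_AVATAR_ONBOARDING"
    for name in SUBFOLDERS:
        (root / name).mkdir(parents=True, exist_ok=True)
    return root


def video_filename(look: int, recorded: date, take: int = 1) -> str:
    """Format a raw video name, e.g. look=2 -> look02_2025-09-28_take01.mp4."""
    return f"look{look:02d}_{recorded:%Y-%m-%d}_take{take:02d}.mp4"
```

Because `mkdir` is called with `exist_ok=True`, running the script again on an existing tree leaves uploaded files untouched.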
QA checklist before upload
- All four video looks exported as single continuous takes, eye-level framing, hands below chest.
- Voice bundle totals at least 30 minutes across WAV/MP3 files at the required sample rate and bitrate.
- Consent audio present and named consent_[NAME]_YYYYMMDD.mp3.
- Pronunciation list (text + audio read-through) included in 02_voice_clips_PVC/.
- Folder structure exactly matches the template, with no extra ZIP layers.
- WhatsApp notification sent to SocialSEO Ops with upload confirmation and link.
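Parts of this checklist can be automated. The sketch below runs a few structural checks on the onboarding root: required subfolders exist, looks 01 to 04 are present with correctly formatted names, a consent file matching `consent_[NAME]_YYYYMMDD.mp3` exists, and no `.zip` files sit anywhere in the tree. Interpreting "no extra ZIP layers" as "no .zip files inside the folder" is an assumption, and the consent pattern assumes a lowercase letters-and-digits name. It does not replace the manual checks on framing, gestures, or audio quality.

```python
import re
from pathlib import Path

REQUIRED = ("01_raw_video", "02_voice_clips_PVC", "03_talking_heads_archive",
            "04_stills_headshots", "05_extras", "consent")
VIDEO_RE = re.compile(r"^look(0[1-4])_[0-9]{4}-[0-9]{2}-[0-9]{2}_take[0-9]{2}\.mp4$")
CONSENT_RE = re.compile(r"^consent_[a-z0-9]+_[0-9]{8}\.mp3$")


def qa_report(root: Path) -> list[str]:
    """Return structural problems in an onboarding folder; empty means all checks pass."""
    problems = []
    for name in REQUIRED:
        if not (root / name).is_dir():
            problems.append(f"missing folder {name}/")
    # Assumed reading of "no extra ZIP layers": no archives anywhere in the tree.
    zips = [str(p.relative_to(root)) for p in root.rglob("*.zip")]
    if zips:
        problems.append("remove ZIP files: " + ", ".join(sorted(zips)))
    looks = set()
    video_dir = root / "01_raw_video"
    if video_dir.is_dir():
        for p in video_dir.iterdir():
            match = VIDEO_RE.match(p.name)
            if match:
                looks.add(match.group(1))
            else:
                problems.append(f"unexpected file in 01_raw_video/: {p.name}")
    missing = sorted({"01", "02", "03", "04"} - looks)
    if missing:
        problems.append("missing looks: " + ", ".join(missing))
    consent_dir = root / "consent"
    if not consent_dir.is_dir() or not any(CONSENT_RE.match(p.name) for p in consent_dir.iterdir()):
        problems.append("no consent_[NAME]_YYYYMMDD.mp3 in consent/")
    return problems
```

Hidden system files (for example `.DS_Store` on macOS) in `01_raw_video/` will show up as unexpected files, which is usually a useful reminder to delete them before upload.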
WhatsApp micro-fix templates
Copy and personalise when requesting quick fixes.
Hi [Name], thanks for uploading. Quick request: can you re-record Look02 keeping your hands below chest level and maintaining direct eye contact with the lens? Please record one continuous take and upload it as 01_raw_video/look02_YYYY-MM-DD_take01.mp4. Example: https://example.com/sample-good-take
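If Ops sends this message often, filling it from a function keeps the look number and the target path in agreement. This is a hypothetical helper, not part of any SocialSEO tooling; it assumes the date in the path is the re-record date, and you would append the example link yourself.

```python
from datetime import date


def micro_fix_message(name: str, look: int, recorded: date) -> str:
    """Fill the re-record template so the look number and target path always agree."""
    look_id = f"{look:02d}"
    path = f"01_raw_video/look{look_id}_{recorded:%Y-%m-%d}_take01.mp4"
    return (
        f"Hi {name}, thanks for uploading. Quick request: can you re-record Look{look_id} "
        "keeping your hands below chest level and maintaining direct eye contact with the lens? "
        f"Please record one continuous take and upload it as {path}."
    )
```

For example, `micro_fix_message("Jane", 2, date(2025, 9, 28))` targets `01_raw_video/look02_2025-09-28_take01.mp4`.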
Vendor-specific notes
HeyGen
Use 4K masters when possible. HeyGen performs best with evenly lit greenscreen plates and explicit pronunciation audio. Flag any shots with jewellery or hair covering ears.
Runway
Ensure the background is flat or keyed to a solid colour. Provide still headshots as 4K PNGs in 04_stills_headshots/ along with the talking-head masters. Runway QC requires a written summary of wardrobe colours.
ElevenLabs
Deliver 90+ minutes of raw voice where possible and include emotional read markers in filenames (e.g., pvc_part03_energy.wav). Upload the consent audio alongside pronunciation references.
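A quick way to confirm an ElevenLabs bundle covers every emotional read and reaches the 90-minute target is sketched below. Only the `energy` marker comes from the example filename above; `neutral` and `empathy` are hypothetical marker names chosen to mirror the neutral, energetic, and empathetic variations, so adjust `EXPECTED_MARKERS` to whatever markers you actually use. Clip durations are passed in as seconds rather than read from the files.

```python
import re

# "energy" is from the example filename; "neutral" and "empathy" are hypothetical.
EXPECTED_MARKERS = {"neutral", "energy", "empathy"}
PVC_RE = re.compile(r"^pvc_part([0-9]{2})_([a-z]+)\.wav$")
TARGET_MIN = 90


def missing_markers(filenames: list[str]) -> list[str]:
    """Return expected emotional markers that no pvc_partNN_<marker>.wav file uses."""
    found = set()
    for name in filenames:
        match = PVC_RE.match(name)
        if match:
            found.add(match.group(2))
    return sorted(EXPECTED_MARKERS - found)


def meets_target(durations_s: list[float], target_min: float = TARGET_MIN) -> bool:
    """True if the clip durations (in seconds) add up to at least target_min minutes."""
    return sum(durations_s) >= target_min * 60
```

For example, a folder containing only `pvc_part01_neutral.wav` and `pvc_part03_energy.wav` would report `["empathy"]` as missing.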