How to create a Street Food Spice Trial video
The prompt
Subject: Traveler gripping a dented tin plate heaped with a fiery local street snack; olive linen button-up with sleeves rolled, worn leather shoulder bag with a bamboo straw poking out.
Action: (0:00–0:07) Turns to lens with a wide grin and lifts the plate; dips camera/plate downward briefly to showcase the dish; leans back in with a low, conspiratorial smirk; delivers hook lines; holds a single silent beat; closes with "let's find out."
Scene: Late-night hawker alley in Chiang Mai; amber lanterns and strung Edison bulbs overhead; steam rising from clay pots and flat iron griddles; wooden push carts; low rattan stools; hand-chalked menu boards; motorbikes nudging through the crowd; customers clustered around stalls; damp cobblestones mirroring orange and pink light; chili paste glistening with dried chilies, toasted peanuts, lime zest, and torn holy basil; vendor spooning sauce in the background; foot traffic adds layered depth.
Style: Vertical 9:16, handheld selfie at arm's length, 24–26mm phone-wide; traveler framed in upper third with stall and plate anchoring lower frame; single continuous take with optional micro whip or tap-to-focus for B-roll inserts; warm practical lantern light as key, pink neon as rim; natural "phone-real" color grade, no heavy processing. Camera is the phone in the traveler's hand at chest-to-eye level.
Dialogue: Traveler says: "I'm in Chiang Mai, and locals swear this is the hottest thing on the street." Traveler says: "Look at that chili paste… it's practically glowing." (amused disbelief) Traveler says: "Apparently most visitors can't make it past the first bite." (conspiratorial lean) Traveler says: "Let's find out." (confident grin)
Voice-Over: None; all lines are direct on-camera delivery.
Sound Effects: Traveler's voice clean and upfront; ambient hawker alley bed of griddle hiss, iron spatula scrapes, vendor shouts, crowd murmur (mixed low-to-mid); no unwanted music (optional ultra-low lo-fi texture kept under −20 LUFS); short sizzle accent timed to the plate tilt-down.
NEGATIVE: subtitles, captions, watermarks, text overlays, logos, poor lighting, low resolution, compression artifacts, oversaturation, over-sharpening, inconsistent character appearance, cartoonish skin, distorted hands, audio sync issues, banding, jitter beyond natural handheld wobble, extreme motion blur, unwanted crowd screams.
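If you plan to make several variations, it can help to keep each labeled section of the prompt above as a separate field and join them in the same order. The sketch below is only an illustration: build_prompt and SECTION_ORDER are hypothetical names, not part of any generator's API, and the output is nothing more than the text you would paste into the prompt box.

```python
# Section labels in the order they appear in the prompt above.
SECTION_ORDER = [
    "Subject", "Action", "Scene", "Style",
    "Dialogue", "Voice-Over", "Sound Effects", "NEGATIVE",
]


def build_prompt(sections):
    """Join labeled sections into one prompt string (hypothetical helper).

    `sections` maps each label in SECTION_ORDER to its text.
    Raises ValueError if any label is missing, so a variation
    cannot silently lose its NEGATIVE list or dialogue.
    """
    missing = [name for name in SECTION_ORDER if name not in sections]
    if missing:
        raise ValueError("missing sections: " + ", ".join(missing))
    return " ".join(
        f"{name}: {sections[name].strip()}" for name in SECTION_ORDER
    )
```

To try a different city, change only the Scene and Dialogue fields and rebuild; the Style and NEGATIVE fields stay identical across variations.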
How it works
1. Tweak the prompt or pick a different model.
2. Hit Generate and your clip renders in seconds.
3. Open it in the editor to build a full video.