VIRAL ENGINE — The Full Guide

How to engineer a viral video using AI that simulates how your brain reacts to every second of it

What you're getting: a step-by-step setup for Meta's TRIBE v2 brain model, plus how to connect it to Claude to compare up to 10 video versions and find the winner. No coding experience required: every code cell you need is provided.

What is TRIBE v2?

Meta's AI research team (FAIR) just open-sourced a model called TRIBE v2.

It was trained on 1,115 hours of fMRI brain scans from 720 real volunteers — people who watched movies, listened to podcasts, and read text while their brain activity was recorded at high resolution inside an MRI machine.

TRIBE v2 learned those patterns. Now it can predict how a typical human brain would respond to a new video, without putting anyone in an MRI machine.

You give it a video. It predicts, second by second, which brain regions become more active and which go quiet. This guide reads those predictions as rough signals for:

When attention is spiking

When engagement flatlines

When a moment gets encoded into memory

When the viewer is about to scroll

Meta reports that TRIBE v2's predictions can match the typical brain response more closely than a single real brain scan does, because real scans are noisy (heartbeats, head movement, scanner artifacts) and the model's predictions don't carry that noise.

The code and model weights are open-sourced and free to use for non-commercial purposes (see the license below).

Paper: TRIBE: TRImodal Brain Encoder (arXiv 2507.22229)
Meta blog: Introducing TRIBE v2
License: CC BY-NC 4.0 (free for non-commercial use)

What You'll Need (Free Tiers Work)

Tool	What it's for	Cost
Google Colab	Runs the model on a cloud GPU, so no powerful computer is required	Free (Pro recommended)
Hugging Face account	Download the model weights	Free
Claude	Compare your video versions and tell you the winner	Free tier works
Your video file	What you're analyzing	—

Part 1: One-Time Setup (15–20 min)

If you're a visual learner, you can also follow this video walkthrough:
https://www.youtube.com/watch?v=VER-F4wdA9Q

Step 1 — Open the Official Colab Notebook

Click this link to open Meta's official demo notebook:

→ Open TRIBE v2 Colab Notebook

This is Meta's own notebook. You don't need to write code from scratch: almost every cell is already written, and the few you'll change or add are provided in this guide.

Step 2 — Switch to a GPU

The model needs a GPU to run. Here's how to get one:

In the top menu, click Runtime

Click Change runtime type

Under "Hardware accelerator", select T4 GPU

Click Save

Tip: If you see out-of-memory errors later, upgrade to Colab Pro (about $10/month) and select an A100 GPU instead. The full TRIBE v2 pipeline is large (≈30GB of weights). A T4 works for shorter videos; an A100 is more reliable.

Step 3 — Create a Hugging Face Account

Go to huggingface.co and sign up for a free account (or sign in if you already have one)

After signing in, click your profile picture → Settings → Access Tokens

Click New Token

Give it any name (e.g. tribe-token)

Set permission to Read

Click Create token and copy it — you'll need this in a minute

Step 4 — Request Llama Model Access

TRIBE v2 uses Meta's Llama 3.2 model to process the words in your video (such as its transcribed speech). Llama is gated: you need to request access first.

Go to the Llama 3.2 model page on Hugging Face (meta-llama/Llama-3.2-3B, also listed under Quick Reference Links)

You'll see a form asking you to agree to Meta's terms

Fill it out and submit

Wait for approval — usually takes 30 min to 2 hours

You'll get an email when you're approved

You only need to do this once. After approval, your token automatically has access.

Step 5 — Add Your Token to Colab

Back in your Colab notebook:

Look at the left sidebar and click the 🔑 key icon (Secrets)

Click Add new secret

In the "Name" field type: HF_TOKEN

In the "Value" field, paste the token you copied from Hugging Face

Toggle on Notebook access
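Optional check: the official notebook may read the token on its own, but if you want to confirm the secret is visible, here is a minimal sketch. It tries Colab's Secrets first (google.colab.userdata.get) and falls back to an HF_TOKEN environment variable when you're not in Colab. The helper name get_hf_token is hypothetical, not part of TRIBE v2.

```python
import os
from typing import Optional


def get_hf_token() -> Optional[str]:
    """Hypothetical helper: return the Hugging Face token, or None if it can't be found."""
    try:
        # Only importable inside Google Colab.
        from google.colab import userdata
    except ImportError:
        userdata = None

    if userdata is not None:
        try:
            return userdata.get("HF_TOKEN")
        except Exception:
            # Raised when the secret is missing or "Notebook access" is switched off.
            pass

    # Fallback for non-Colab environments: read the environment variable.
    return os.environ.get("HF_TOKEN")


token = get_hf_token()
print("HF_TOKEN found" if token else "HF_TOKEN not found: re-check the Secrets panel")
```

If the token is found, passing it to huggingface_hub's login(token=token) logs the session in, which should let the gated Llama download go through.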

Step 6 — Install the Model

In the notebook, find the first code cell that looks like this:

!uv pip install "tribev2[plotting] @ git+https://github.com/facebookresearch/tribev2.git"

Click the ▶ play button on that cell to run it. It will install all the required packages.

When it finishes, you'll see a message asking you to restart the runtime. Do that:

Click Runtime → Restart session

Do NOT run the install cell again after restarting

Part 2: Running Your First Video

Step 7 — Load the Model

Run the next cell in the notebook:

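from tribev2.demo_utils import TribeModel, download_file
from tribev2.plotting import PlotBrain
from pathlib import Path

# Folder where downloaded weights and intermediate files are stored
CACHE_FOLDER = Path("./cache")

# Download (first run only) and load the pretrained TRIBE v2 model
model = TribeModel.from_pretrained(
    "facebook/tribev2",
    cache_folder=CACHE_FOLDER,
)

# Brain plotter using the fsaverage5 cortical surface mesh
plotter = PlotBrain(mesh="fsaverage5")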

This downloads the model weights and gets everything ready. It will take 2–5 minutes the first time.

Step 8 — Upload Your Video

The notebook includes a sample video by default. To use your own video, find the cell that downloads the sample video and replace it with this:

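# video_path = CACHE_FOLDER / "sample_video.mp4"
# url = "https://download.blender.org/durian/trailer/sintel_trailer-480p.mp4"
# download_file(url, video_path)

from google.colab import files
import shutil

# Opens a file picker; the chosen file is saved to the current working directory
uploaded = files.upload()
filename = next(iter(uploaded))  # name of the first uploaded file

# Move the upload into the cache folder (created if it doesn't exist yet)
CACHE_FOLDER.mkdir(parents=True, exist_ok=True)
video_path = CACHE_FOLDER / filename
shutil.move(filename, video_path)

# Extract the timed events (video, audio, text) TRIBE v2 will use, and preview them
df = model.get_events_dataframe(video_path=video_path)
display(df.head(8)[["type", "start", "duration", "filepath", "text", "context"]])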

Run that cell — an Upload button will appear. Click it and select your video file from your computer.

Supported formats: MP4, MOV, AVI
Recommended: Keep videos under 3 minutes for faster processing on free Colab

Step 9 — Run the Brain Analysis

Run these cells in order:

# Step 1: Extract the multimodal events from your video
# (you can skip this line if df already holds your video's events from Step 8)
df = model.get_events_dataframe(video_path=video_path)

# Step 2: Run brain prediction
preds, segments = model.predict(events=df)

print(f"Predictions shape: {preds.shape}")
# Output: (n_timesteps, n_vertices)
# n_timesteps = number of seconds in your video
# n_vertices = 20,484 brain surface points (fsaverage5: 10,242 per hemisphere)

This is the core step. TRIBE v2 is now predicting how a brain responds to every second of your video.

Note: TRIBE v2 outputs predictions offset by 5 seconds — this accounts for the natural delay between a stimulus and the brain's blood-flow response (hemodynamic lag). So second 10 in the output = the brain's reaction to second 5 of your video.
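If your own run shows the offset described above (check the official notebook first, since its predictions may already be aligned), you can shift the rows so that row i lines up with second i of your video. A minimal sketch, assuming one prediction row per second and a fixed 5-second lag; align_to_video_time is a hypothetical helper, not a TRIBE v2 function.

```python
import numpy as np


def align_to_video_time(preds, lag_seconds: int = 5) -> np.ndarray:
    """Shift predictions so row i corresponds to second i of the video.

    Assumes one row per second and a fixed hemodynamic lag: output row
    (i + lag_seconds) is the response to video second i, so dropping the
    first lag_seconds rows re-aligns everything.
    """
    preds = np.asarray(preds)
    if lag_seconds < 0:
        raise ValueError("lag_seconds must be non-negative")
    return preds[lag_seconds:]


# Toy example: 10 "seconds" x 2 "vertices"
toy = np.arange(20).reshape(10, 2)
aligned = align_to_video_time(toy, lag_seconds=5)
print(aligned.shape)  # (5, 2)
print(aligned[0])     # [10 11], i.e. the response to video second 0
```

The trade-off: after shifting, the last few seconds of the video have no matching response rows, so a very short video loses a noticeable share of its data.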

Step 10 — Visualize the Results

# Show the first 15 seconds of brain activity
plotter.plot(preds[:15], segments[:15], video_path=video_path)

This generates an interactive 3D brain heatmap synced to your video. You'll see:

Red/warm colors = high activation (brain is engaged)

Blue/cool colors = low activation (brain is disengaged)

The map updates second by second as the video plays

Key brain regions to watch:

Region	Location	What it means for your video
Dorsal attention network	Top/parietal	Viewer is actively paying attention
Visual cortex	Back of brain	Processing what they're seeing
Auditory cortex	Sides/temporal	Responding to sound/music
Default mode network	Midline (medial frontal and medial parietal)	When this activates, the mind is wandering: you're losing them
Hippocampal area	Medial temporal lobe	Memory encoding: they're more likely to remember this moment

Step 11 — Get Second-by-Second Numbers

To get numerical data you can feed into Claude, run this:

import numpy as np

# Mean predicted activation across all brain surface points, for each second
activation_per_second = preds.mean(axis=1)

# Highest predicted activation at any single surface point, for each second
peak_per_second = preds.max(axis=1)

# Print a simple table (seconds numbered from 1)
print("Second | Mean Activation | Peak Activation")
print("-------|-----------------|----------------")
for i, (mean_val, peak_val) in enumerate(zip(activation_per_second, peak_per_second), start=1):
    print(f"  {i:3d}s | {mean_val:15.4f} | {peak_val:.4f}")

Copy this output — you'll paste it into Claude in Part 3.

Part 3: Connect to Claude — Find Your Best Version

This is where it gets powerful. You can run up to 10 versions of your video through TRIBE v2, collect the second-by-second activation data from each one, and then ask Claude which one wins and exactly why.

How to do it

For each video version, run Steps 8–11 and copy the output table. You'll end up with something like:

Video 1 (original cut):
Second | Mean Activation | Peak Activation
  1s   | 0.0234          | 0.1832
  2s   | 0.0198          | 0.1644
  3s   | 0.0156          | 0.1421
...

Video 2 (re-edited hook):
Second | Mean Activation | Peak Activation
  1s   | 0.0412          | 0.2981
  2s   | 0.0389          | 0.2754
...
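Copying up to ten tables by hand is error-prone. Here is a minimal sketch, assuming you keep each version's per-second mean and peak arrays from Step 11 in a dict, that prints every version in the same layout as the examples above. The names results, format_version_table and build_claude_payload are hypothetical.

```python
import numpy as np


def format_version_table(label, mean_per_second, peak_per_second) -> str:
    """Render one version in the 'Second | Mean Activation | Peak Activation' layout."""
    lines = [f"{label}:", "Second | Mean Activation | Peak Activation"]
    for i, (mean_val, peak_val) in enumerate(zip(mean_per_second, peak_per_second), start=1):
        lines.append(f"  {i:3d}s | {mean_val:15.4f} | {peak_val:.4f}")
    return "\n".join(lines)


def build_claude_payload(results: dict) -> str:
    """Join every stored version into one text block, separated by blank lines."""
    return "\n\n".join(
        format_version_table(label, np.asarray(means), np.asarray(peaks))
        for label, (means, peaks) in results.items()
    )


# Hypothetical store, filled once per version after Steps 8-11, e.g.:
# results["Video 1 (original cut)"] = (preds.mean(axis=1), preds.max(axis=1))
results = {}
```

Once results holds every version, print(build_claude_payload(results)) gives one block you can paste in place of [PASTE YOUR DATA HERE] in the prompt below.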

The Claude Prompt (copy-paste this)

Paste this into Claude (a regular chat at claude.ai works):

You are a neuromarketing analyst. I ran multiple versions of my video through Meta's TRIBE v2 brain simulation model. Below is the second-by-second brain activation data for each version. TRIBE v2 was trained on 1,115 hours of fMRI data from 720 real people — high activation = brain is engaged, low activation = brain is checking out.
Analyze each video version and tell me:
Which version has the strongest overall brain engagement score
Which version has the best opening 3 seconds (hook strength)
Which version has the fewest attention drop-off moments
The exact seconds where each video loses the viewer (activation drops below average)
Your final recommendation: which version to post, and why
Here is the data:
[PASTE YOUR DATA HERE]

Claude should come back with a ranked analysis, a recommended version, and the exact seconds where each version dips below average. Ask a follow-up if you also want specific re-edit suggestions.
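It can help to compute the headline numbers yourself and check them against Claude's ranking. This is a minimal sketch, and its metric definitions are assumptions that mirror the prompt. "Overall engagement" is the average of the per-second mean activation. The "hook" is the first three seconds. A "drop-off" is a second below that version's own average. rank_versions is a hypothetical helper.

```python
import numpy as np


def rank_versions(results: dict, hook_seconds: int = 3) -> list:
    """Rank versions by overall mean activation, breaking ties by hook strength.

    results maps a version label to its per-second mean activation
    (e.g. preds.mean(axis=1) from Step 11).
    """
    rows = []
    for label, means in results.items():
        means = np.asarray(means, dtype=float)
        overall = float(means.mean())
        hook = float(means[:hook_seconds].mean())
        drops = int((means < overall).sum())  # seconds below this version's own average
        rows.append({"version": label, "overall": overall, "hook": hook, "drops": drops})
    # Highest overall engagement first; hook strength breaks ties
    return sorted(rows, key=lambda r: (r["overall"], r["hook"]), reverse=True)


for row in rank_versions({"Video 1": [0.1, 0.1, 0.1, 0.5, 0.5], "Video 2": [0.4, 0.4, 0.4, 0.2, 0.2]}):
    print(row)
```

If the hook matters more to you than the overall average, swap the order in the sort key.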

Part 4: Re-Edit Based on What the Brain Says

Once Claude identifies your drop-off points, here's how to fix them:

Problem the brain data shows	What to do in your edit
Activation drops in first 3s	Re-cut your hook — lead with the most visually striking frame you have
Sustained low activation in middle	Add a pattern interrupt: cut to a new angle, add text overlay, use a sound effect
Activation never fully recovers after drop	Remove the dead section entirely — it's killing the rest
Activation high but drops at the end	Your CTA is weak — add urgency, movement, or a direct question
Default mode network activating	Your visuals are too static — add motion, zooms, or jump cuts
Audio regions flat	Your audio is too monotone — vary pace, add music, or cut silence

Run the re-edited version back through TRIBE v2. Compare the new activation data against the original. Repeat until the brain stays engaged.
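To make that comparison concrete, you can subtract the original's per-second mean activation from the re-edit's. This is only meaningful where both cuts share the same timeline (for example, a re-edited hook that keeps the same length). Later seconds of a cut that moved content around won't line up. A minimal sketch; compare_versions is a hypothetical helper.

```python
import numpy as np


def compare_versions(original, reedit) -> dict:
    """Second-by-second difference between a re-edit and the original (re-edit minus original)."""
    original = np.asarray(original, dtype=float)
    reedit = np.asarray(reedit, dtype=float)
    n = min(len(original), len(reedit))  # cuts may differ in length; compare the overlap only
    delta = reedit[:n] - original[:n]
    return {
        "delta": delta,
        # 1-based seconds, matching the Step 11 table
        "improved_seconds": (np.flatnonzero(delta > 0) + 1).tolist(),
        "worse_seconds": (np.flatnonzero(delta < 0) + 1).tolist(),
        "mean_change": float(delta.mean()) if n else 0.0,
    }


print(compare_versions([0.1, 0.2, 0.3], [0.2, 0.1, 0.3, 0.9]))
```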

Quick Reference Links

Resource	Link
Official Colab Notebook	Open in Colab
GitHub Repository	facebookresearch/tribev2
Model on Hugging Face	facebook/tribev2
Llama 3.2 Access Request	meta-llama/Llama-3.2-3B
Interactive Demo	aidemos.atmeta.com/tribev2
Meta Research Blog	ai.meta.com/blog/tribe-v2
Original Paper (arXiv)	arxiv.org/abs/2507.22229

Common Issues & Fixes

"CUDA out of memory" error
→ Your GPU ran out of memory (VRAM). Upgrade to Colab Pro and switch to an A100 GPU (the full model needs ~30GB), or try a shorter video.

"Access to model is restricted" on Hugging Face
→ You haven't been approved for Llama yet. Wait for the approval email (usually 30 min to 2 hours), and make sure the token in Colab comes from the same Hugging Face account you used to request access.

"HF_TOKEN not found" error
→ Go back to the 🔑 key icon in the Colab sidebar and make sure you added the secret with the exact name HF_TOKEN (all caps, underscore) and that Notebook access is toggled on.

Upload button doesn't appear
→ Make sure you replaced the sample-video cell with the upload code from Step 8 (with the sample download lines commented out) and that you ran that cell.

Predictions look the same for every second
→ Your video might be too short (under 10 seconds) or the model needs more frames. Try a video that's at least 30 seconds.

Runtime disconnects mid-run
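→ Colab's free tier disconnects after about 90 minutes of inactivity. Either upgrade to Pro or keep the tab active. The model weights are cached in ./cache, so if you reconnect to the same runtime you don't need to re-download them. If the runtime was reset, the cache is wiped and the weights download again.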
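Predictions are slower to regenerate than weights are to download, so it can be worth saving each version's preds array as soon as it's computed. A minimal sketch using NumPy's .npy format. The helper names are hypothetical. Pointing path at a mounted Google Drive folder (for example after drive.mount("/content/drive") from google.colab) is an assumption about your setup, not something the notebook does for you.

```python
from pathlib import Path

import numpy as np


def save_predictions(preds, path) -> Path:
    """Save a predictions array as a .npy file, creating parent folders if needed."""
    path = Path(path).with_suffix(".npy")
    path.parent.mkdir(parents=True, exist_ok=True)
    np.save(path, np.asarray(preds))
    return path


def load_predictions(path) -> np.ndarray:
    """Load an array previously written by save_predictions."""
    return np.load(Path(path).with_suffix(".npy"))
```

For example, save_predictions(preds, CACHE_FOLDER / "preds_version1") after Step 9, then load_predictions(...) after reconnecting instead of re-running model.predict.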

What the Numbers Actually Mean

TRIBE v2 outputs values on the cortical surface — each number represents predicted BOLD signal (Blood Oxygen Level Dependent), which is what an fMRI scanner measures.

Higher values = a stronger predicted blood-oxygen signal in that region = more neural activity, which this guide reads as more engagement

Lower values = less predicted activity, which this guide reads as the brain resting or wandering

You don't need to understand neuroscience to use this. The pattern is simple: watch for the dips. Every time the mean activation drops below your video's own average (its baseline), that's likely a second where your viewer's brain is checking out.

The goal is a graph that stays high, climbs at key moments, and never flatlines.