VIRAL ENGINE — The Full Guide

How to engineer a viral video using AI that simulates how your brain reacts to every second of it

What you're getting: A step-by-step setup for Meta's TRIBE v2 brain model + how to connect it to Claude to compare up to 10 video versions and find the winner. Zero coding required.

What is TRIBE v2?

Meta's AI research team (FAIR) just open-sourced a model called TRIBE v2.

It was trained on 1,115 hours of fMRI brain scans from 720 real volunteers — people who watched movies, listened to podcasts, and read text while their brain activity was recorded at high resolution inside an MRI machine.

TRIBE v2 learned the patterns. Now it can predict how a human brain responds to any video — without putting anyone in an MRI machine.

You give it a video. It tells you — second by second — which brain regions are lighting up and which are going dark.

  • When attention is spiking
  • When engagement flatlines
  • When a moment gets encoded into memory
  • When the viewer is about to scroll

Meta's own research shows TRIBE v2's predictions are more accurate than a single real brain scan — because real scans are noisy (heartbeats, movement, device artifacts). The model strips all that out.

They open-sourced everything. It's completely free.

Paper: TRIBE: TRImodal Brain Encoder (arXiv 2507.22229)
Meta blog: Introducing TRIBE v2
License: CC BY-NC 4.0 (free for non-commercial use)

What You'll Need (All Free)

Tool | What it's for | Cost
Google Colab | Runs the model in the cloud (no computer required) | Free (Pro recommended)
Hugging Face account | Download the model weights | Free
Claude | Compare your video versions and tell you the winner | Free tier works
Your video file | What you're analyzing | -

Part 1: One-Time Setup (15–20 min)

If you're a visual learner, you can also follow this video walkthrough:

https://www.youtube.com/watch?v=VER-F4wdA9Q

Step 1 — Open the Official Colab Notebook

Click this link to open Meta's official demo notebook:

Open TRIBE v2 Colab Notebook

This is Meta's own notebook. You don't need to write any code — every cell is already written for you.


Step 2 — Switch to a GPU

The model needs a GPU to run. Here's how to get one:

  1. In the top menu, click Runtime
  2. Click Change runtime type
  3. Under "Hardware accelerator", select T4 GPU
  4. Click Save

Tip: If you see errors about running out of memory later, upgrade to Colab Pro ($10/month) and select A100 GPU instead. The full TRIBE v2 model is large (≈30GB of weights loaded at once). T4 works for shorter videos; A100 is more reliable.
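
To confirm the GPU switch actually took effect before you install anything, you can run a quick check in a new cell. This is a minimal sketch, not part of Meta's notebook: it assumes the runtime exposes the standard nvidia-smi tool, and the helper name describe_gpu is made up for this guide.

```python
import shutil
import subprocess


def describe_gpu():
    """Return a one-line GPU description from nvidia-smi, or None if no GPU is visible."""
    if shutil.which("nvidia-smi") is None:
        return None  # no NVIDIA driver tools on this machine or runtime
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0 or not result.stdout.strip():
        return None
    # First line describes the first GPU: its name and total memory
    return result.stdout.strip().splitlines()[0]


gpu = describe_gpu()
print(gpu if gpu else "No GPU detected: check Runtime > Change runtime type")
```

If the printed line names a T4 or A100 along with its memory, the runtime is set up correctly.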

Step 3 — Create a Hugging Face Account

  1. Go to huggingface.co
  2. Sign up for a free account
  3. After signing in, click your profile picture → Settings → Access Tokens
  4. Click New Token
  5. Give it any name (e.g. tribe-token)
  6. Set permission to Read
  7. Click Create token and copy it; you'll need it in a minute

Step 4 — Request Llama Model Access

TRIBE v2 uses Meta's Llama 3.2 model to process any text in your video. Llama is gated — you need to request access first.

  1. Go to the Llama 3.2 model page on Hugging Face
  2. You'll see a form asking you to agree to Meta's terms
  3. Fill it out and submit
  4. Wait for approval (usually 30 minutes to 2 hours)
  5. You'll get an email when you're approved

You only need to do this once. After approval, your token automatically has access.

Step 5 — Add Your Token to Colab

Back in your Colab notebook:

  1. Look at the left sidebar and click the 🔑 key icon (Secrets)
  2. Click Add new secret
  3. In the "Name" field type: HF_TOKEN
  4. In the "Value" field, paste the token you copied from Hugging Face
  5. Toggle on Notebook access
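
If you want to double-check that the notebook can see the secret, here is a small sketch you can run in a new cell. It is not part of Meta's notebook. It uses Colab's userdata helper when available and falls back to an environment variable elsewhere; the function name get_hf_token is made up for this guide. It never prints the token itself.

```python
import os


def get_hf_token(name="HF_TOKEN"):
    """Read the token from Colab Secrets, falling back to an environment variable."""
    try:
        from google.colab import userdata  # only importable inside Colab
    except ImportError:
        userdata = None
    if userdata is not None:
        try:
            token = userdata.get(name)
            if token:
                return token
        except Exception:
            # Secret is missing, or "Notebook access" was not toggled on
            pass
    return os.environ.get(name)


token = get_hf_token()
print("Token found" if token else "HF_TOKEN not found: re-check the Secrets panel")
```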

Step 6 — Install the Model

In the notebook, find the first code cell that looks like this:

!uv pip install "tribev2[plotting] @ git+https://github.com/facebookresearch/tribev2.git"

Click the ▶ play button on that cell to run it. It will install all the required packages.

When it finishes, you'll see a message asking you to restart the runtime. Do that:

  • Click Runtime → Restart session
  • Do NOT run the install cell again after restarting

Part 2: Running Your First Video

Step 7 — Load the Model

Run the next cell in the notebook:

from tribev2.demo_utils import TribeModel, download_file
from tribev2.plotting import PlotBrain
from pathlib import Path

CACHE_FOLDER = Path("./cache")

model = TribeModel.from_pretrained(
    "facebook/tribev2",
    cache_folder=CACHE_FOLDER,
)
plotter = PlotBrain(mesh="fsaverage5")

This downloads the model weights and gets everything ready. It will take 2–5 minutes the first time.


Step 8 — Upload Your Video

The notebook includes a sample video by default. To use your own video, find the cell that downloads the sample video and replace its contents with this:

# video_path = CACHE_FOLDER / "sample_video.mp4"
# url = "https://download.blender.org/durian/trailer/sintel_trailer-480p.mp4"
# download_file(url, video_path)

from google.colab import files
from pathlib import Path
import shutil

uploaded = files.upload()
filename = next(iter(uploaded))

video_path = CACHE_FOLDER / filename
shutil.move(filename, video_path)

df = model.get_events_dataframe(video_path=video_path)
display(df.head(8)[["type", "start", "duration", "filepath", "text", "context"]])

Run that cell — an Upload button will appear. Click it and select your video file from your computer.

Supported formats: MP4, MOV, AVI

Recommended: Keep videos under 3 minutes for faster processing on free Colab
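
If you're batching several versions, a quick filename check can catch the wrong file before you wait on an upload. This sketch only looks at the file extension against the formats listed above; it does not inspect the video itself, and the helper name is made up for this guide.

```python
from pathlib import Path

SUPPORTED_EXTENSIONS = {".mp4", ".mov", ".avi"}  # formats listed in this guide


def is_supported_video(filename):
    """True if the file extension matches one of the formats listed above."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


for name in ["hook_v1.MP4", "hook_v2.mov", "notes.txt"]:
    print(name, "->", "ok" if is_supported_video(name) else "unsupported")
```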


Step 9 — Run the Brain Analysis

Run these cells in order:

# Step 1: Extract the multimodal events from your video
df = model.get_events_dataframe(video_path=video_path)

# Step 2: Run brain prediction
preds, segments = model.predict(events=df)

print(f"Predictions shape: {preds.shape}")
# Output: (n_timesteps, n_vertices)
# n_timesteps = number of seconds in your video
# n_vertices = about 20,000 brain surface points (fsaverage5 mesh)

This is the core step. TRIBE v2 is now predicting how a brain responds to every second of your video.

Note: TRIBE v2 outputs predictions offset by 5 seconds — this accounts for the natural delay between a stimulus and the brain's blood-flow response (hemodynamic lag). So second 10 in the output = the brain's reaction to second 5 of your video.
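
The offset arithmetic can be sketched as a tiny helper. This takes the 5-second description above at face value; the direction of the shift is an assumption worth confirming against the TRIBE v2 README before you rely on it. The function name and the constant are made up for this guide.

```python
HEMODYNAMIC_LAG_S = 5  # offset as described in this guide (assumption)


def video_second_for(output_second, lag=HEMODYNAMIC_LAG_S):
    """Map a second in the prediction output back to the video second that caused it.

    Returns None when the output second is earlier than the lag, i.e. before
    any video content could have produced a response.
    """
    source = output_second - lag
    return source if source >= 0 else None


print(video_second_for(10))  # second 10 of the output maps to second 5 of the video
print(video_second_for(3))   # None: earlier than the lag
```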

Step 10 — Visualize the Results

# Show the first 15 seconds of brain activity
plotter.plot(preds[:15], segments[:15], video_path=video_path)

This generates an interactive 3D brain heatmap synced to your video. You'll see:

  • Red/warm colors = high activation (brain is engaged)
  • Blue/cool colors = low activation (brain is disengaged)
  • The map updates second by second as the video plays

Key brain regions to watch:

Region | Location | What it means for your video
Dorsal attention network | Top/parietal | Viewer is actively paying attention
Visual cortex | Back of brain | Processing what they're seeing
Auditory cortex | Sides/temporal | Responding to sound/music
Default mode network | Spreads across midline | When this activates, the mind is wandering and you're losing them
Hippocampus area | Medial temporal | Memory encoding: they'll remember this moment

Step 11 — Get Second-by-Second Numbers

To get numerical data you can feed into Claude, run this:

import numpy as np

# Mean predicted activation across all cortical vertices for each second
activation_per_second = preds.mean(axis=1)

# Highest single-vertex activation for each second
top_region_per_second = preds.max(axis=1)

# Print a simple table
print("Second | Mean Activation | Peak Activation")
print("-------|-----------------|----------------")
for i, (mean_val, peak_val) in enumerate(zip(activation_per_second, top_region_per_second)):
    print(f"  {i+1:3d}s | {mean_val:15.4f} | {peak_val:.4f}")

Copy this output — you'll paste it into Claude in Part 3.


Part 3: Connect to Claude — Find Your Best Version

This is where it gets powerful. You can run up to 10 versions of your video through TRIBE v2, collect the second-by-second activation data from each one, and then ask Claude which one wins and exactly why.

How to do it

For each video version, run Steps 8–11 and copy the output table. You'll end up with something like:

Video 1 (original cut):
Second | Mean Activation | Peak Activation
  1s   | 0.0234          | 0.1832
  2s   | 0.0198          | 0.1644
  3s   | 0.0156          | 0.1421
...

Video 2 (re-edited hook):
Second | Mean Activation | Peak Activation
  1s   | 0.0412          | 0.2981
  2s   | 0.0389          | 0.2754
...
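
If you'd rather not copy tables by hand, the block above can be assembled in code. This is a minimal sketch using plain Python lists; the function names are made up for this guide, and the numbers are the placeholder values from the example above, not real results.

```python
def format_version(label, rows):
    """Format one video's (mean, peak) rows like the tables shown above."""
    lines = [f"{label}:", "Second | Mean Activation | Peak Activation"]
    for second, (mean_val, peak_val) in enumerate(rows, start=1):
        lines.append(f"{second:3d}s   | {mean_val:<15.4f} | {peak_val:.4f}")
    return "\n".join(lines)


def build_data_block(versions):
    """Join every version's table, separated by blank lines, in insertion order."""
    return "\n\n".join(format_version(label, rows) for label, rows in versions.items())


# Placeholder numbers copied from the example tables above, not real results.
versions = {
    "Video 1 (original cut)": [(0.0234, 0.1832), (0.0198, 0.1644), (0.0156, 0.1421)],
    "Video 2 (re-edited hook)": [(0.0412, 0.2981), (0.0389, 0.2754)],
}
print(build_data_block(versions))
```

In the notebook, one way to build the rows for a video from Step 11's arrays would be list(zip(activation_per_second.tolist(), top_region_per_second.tolist())).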

The Claude Prompt (copy-paste this)

In Claude Code paste this:


You are a neuromarketing analyst. I ran multiple versions of my video through Meta's TRIBE v2 brain simulation model. Below is the second-by-second brain activation data for each version. TRIBE v2 was trained on 1,115 hours of fMRI data from 720 real people — high activation = brain is engaged, low activation = brain is checking out.

Analyze each video version and tell me:

  1. Which version has the strongest overall brain engagement score
  2. Which version has the best opening 3 seconds (hook strength)
  3. Which version has the fewest attention drop-off moments
  4. The exact seconds where each video loses the viewer (activation drops below average)
  5. Your final recommendation: which version to post, and why

Here is the data:

[PASTE YOUR DATA HERE]


Claude will come back with a ranked analysis: the winner, the exact seconds where each version loses attention, and a recommendation on which version to post. If you also want concrete re-edit instructions, add that request to the prompt.
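
It can help to compute the headline numbers yourself so you can sanity-check the ranking you get back. The sketch below uses one simple interpretation of the prompt's questions (overall mean, mean of the first 3 seconds, count of seconds below the video's own average). The data is made up for illustration and the function name is hypothetical.

```python
from statistics import mean


def summarize(mean_activations, hook_seconds=3):
    """Return overall mean, hook mean, and how many seconds fall below the video's own average."""
    overall = mean(mean_activations)
    hook = mean(mean_activations[:hook_seconds])
    below = sum(1 for value in mean_activations if value < overall)
    return {"overall": overall, "hook": hook, "seconds_below_average": below}


# Made-up per-second mean activations, for illustration only.
versions = {
    "Video 1": [0.020, 0.030, 0.010, 0.040],
    "Video 2": [0.045, 0.050, 0.030, 0.045],
}
summaries = {label: summarize(values) for label, values in versions.items()}

best_overall = max(summaries, key=lambda label: summaries[label]["overall"])
best_hook = max(summaries, key=lambda label: summaries[label]["hook"])
fewest_drops = min(summaries, key=lambda label: summaries[label]["seconds_below_average"])

for label, s in summaries.items():
    print(f"{label}: overall={s['overall']:.4f} hook={s['hook']:.4f} "
          f"below-average seconds={s['seconds_below_average']}")
print("Strongest overall:", best_overall)
print("Best hook:", best_hook)
print("Fewest drop-offs:", fewest_drops)
```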


Part 4: Re-Edit Based on What the Brain Says

Once Claude identifies your drop-off points, here's how to fix them:

Problem the brain data shows | What to do in your edit
Activation drops in first 3s | Re-cut your hook: lead with the most visually striking frame you have
Sustained low activation in middle | Add a pattern interrupt: cut to a new angle, add text overlay, use a sound effect
Activation never fully recovers after drop | Remove the dead section entirely; it's killing the rest
Activation high but drops at the end | Your CTA is weak: add urgency, movement, or a direct question
Default mode network activating | Your visuals are too static: add motion, zooms, or jump cuts
Audio regions flat | Your audio is too monotone: vary pace, add music, or cut silence

Run the re-edited version back through TRIBE v2. Compare the new activation data against the original. Repeat until the brain stays engaged.
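
The comparison step can be sketched as a per-second difference between the two runs. This assumes both cuts keep the same timing up to the point you compare. If the re-edit removes or moves footage, the seconds no longer line up and you should compare sections instead. The numbers are illustrative only.

```python
def per_second_delta(original, re_edit):
    """Return (second, delta) pairs over the seconds both cuts share.

    A positive delta means the re-edit scored higher at that second.
    """
    return [
        (second, new - old)
        for second, (old, new) in enumerate(zip(original, re_edit), start=1)
    ]


# Illustrative per-second mean activations, not real results.
original = [0.020, 0.015, 0.010, 0.012]
re_edit = [0.030, 0.025, 0.009]

deltas = per_second_delta(original, re_edit)
improved = [second for second, delta in deltas if delta > 0]
worse = [second for second, delta in deltas if delta < 0]
print("Improved seconds:", improved)
print("Worse seconds:", worse)
```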


Quick Reference Links

Resource | Link
Official Colab Notebook | Open in Colab
GitHub Repository | facebookresearch/tribev2
Model on Hugging Face | facebook/tribev2
Llama 3.2 Access Request | meta-llama/Llama-3.2-3B
Interactive Demo | aidemos.atmeta.com/tribev2
Meta Research Blog | ai.meta.com/blog/tribe-v2
Original Paper (arXiv) | arxiv.org/abs/2507.22229

Common Issues & Fixes

"CUDA out of memory" error
→ Your GPU ran out of VRAM. Upgrade to Colab Pro and switch to A100 GPU. The full model needs ~30GB.

"Access to model is restricted" on Hugging Face
→ You haven't been approved for Llama yet. Wait for the approval email (up to 2 hours), and check the inbox of the address tied to your Hugging Face account for the confirmation.

"HF_TOKEN not found" error
→ Go back to the 🔑 key icon in the Colab sidebar and make sure you added the secret with the exact name HF_TOKEN (all caps, underscore).

Upload button doesn't appear
→ Make sure you commented out the sample video download lines before running the upload cell.

Predictions look the same for every second
→ Your video might be too short (under 10 seconds) or the model needs more frames. Try a video that's at least 30 seconds.
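
To tell whether the output really is flat rather than just looking flat in the table, you can check how much the per-second values vary. This is a rough sketch: the tolerance is an arbitrary threshold chosen for illustration, not a value from TRIBE v2.

```python
from statistics import pstdev


def looks_flat(mean_activations, tolerance=1e-6):
    """True if the per-second values barely vary; `tolerance` is an arbitrary threshold."""
    if len(mean_activations) < 2:
        return True  # a single value cannot show any variation
    return pstdev(mean_activations) < tolerance


print(looks_flat([0.02, 0.02, 0.02]))   # identical values
print(looks_flat([0.02, 0.035, 0.01]))  # clearly varying values
```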

Runtime disconnects mid-run
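→ Colab free tier disconnects after 90 minutes of inactivity. Either upgrade to Pro or keep the tab active. If you reconnect to the same session, the model weights are still cached in ./cache and don't need to be re-downloaded. A brand-new runtime starts with an empty disk, so the weights will download again.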


What the Numbers Actually Mean

TRIBE v2 outputs values on the cortical surface — each number represents predicted BOLD signal (Blood Oxygen Level Dependent), which is what an fMRI scanner measures.

  • Higher values = more oxygenated blood flowing to that brain region = more neural activity = more engagement
  • Lower values = less activity = the brain is in a resting or wandering state

You don't need to understand neuroscience to use this. The pattern is simple: watch for the dips. Every time the mean activation drops below your video's own average, that's a second your viewer's brain is checking out.

The goal is a graph that stays high, climbs at key moments, and never flatlines.
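
Watching for the dips can be sketched as finding the stretches of consecutive seconds that fall below a baseline. Here the baseline defaults to the video's own average, which is one reasonable reading of "baseline" and not something TRIBE v2 defines. The values are illustrative only.

```python
from statistics import mean


def dip_ranges(mean_activations, baseline=None):
    """Return (start, end) 1-based second ranges where activation is below the baseline."""
    if baseline is None:
        baseline = mean(mean_activations)  # default: the video's own average
    ranges, start = [], None
    for second, value in enumerate(mean_activations, start=1):
        if value < baseline and start is None:
            start = second  # a dip begins here
        elif value >= baseline and start is not None:
            ranges.append((start, second - 1))  # the dip ended on the previous second
            start = None
    if start is not None:
        ranges.append((start, len(mean_activations)))  # dip runs to the end of the video
    return ranges


values = [0.04, 0.01, 0.01, 0.05, 0.04, 0.01]
print(dip_ranges(values))
```

Each pair is a section to look at in your edit: a long range is a dead stretch, while a single-second range may just be noise.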