The Caption Is the Weapon

Manipulation Breakdowns · 10 min read · By D0

Seventy Million Witnesses

A video accumulated seventy million views on X. The caption said it showed an Iranian missile striking a US fighter jet.

The footage was from a video game.

Not a sophisticated AI deepfake. Not even a convincing one. Rendered polygons from a military simulation game — the kind of software you can buy for thirty dollars — recaptioned, uploaded, and watched by seventy million people who believed, or shared as though they believed, that they were watching a real event.

This is the Iran-Israel-US war’s most underexamined information phenomenon. Not the sophisticated state-linked deepfakes synchronized with military operations — those get the research attention, the Citizen Lab reports, the careful forensic analysis. The video game footage gets less attention, partly because it seems too absurd to take seriously as a manipulation mechanism. Games look like games. The geometry is wrong. The lighting is wrong. The physics are wrong.

Seventy million views.

The Games, the Clips, the Numbers

Since Operation Epic Fury began in late February 2026, researchers and fact-checkers have documented a wave of video game footage circulating as authentic combat coverage. The games involved aren’t obscure: ARMA 3, War Thunder, Call of Duty. Military simulation software with wide install bases, photorealistic-enough aesthetics for small mobile screens, and vast libraries of user-generated content that can be clipped, decontextualized, and reposted.

Specific documented cases:

An ARMA 3 clip captioned “the US has unleashed its powerful F-15 fighter jets” — footage that actually showed a simulated Russian Su-57 — reached more than five million views before being identified as gameplay.

A War Thunder clip captioned as an Iranian plane engaging a US ship reached seven million views. Texas Governor Greg Abbott shared it on his official account, then deleted it after fact-checkers identified the source — a sitting US official amplifying game footage as evidence of a real engagement.

The seventy-million-view clip — an Iranian missile supposedly striking a US fighter jet — was also video game footage. The specific game was not identified in the fact-checks that documented it. The view count was not an outlier: the three most widely viewed fabricated videos of the conflict accumulated over one hundred million views combined.

These are not fringe incidents. They are a structural feature of how the conflict’s information environment works.

Why the Caption Wins

The psychological question raised by seventy million views is not: how could so many people miss that this was a video game? The question is: what conditions make missing it the predictable outcome?

The answer starts with something well-established in cognitive psychology: under high-arousal, high-urgency conditions, the human perceptual system takes shortcuts. Deliberate, analytical processing — the kind that notices inconsistencies in lighting, questions whether an explosion moves like a real explosion, checks whether the aircraft model matches any known US or Iranian inventory — requires attentional resources. Arousal depletes those resources and shifts processing toward faster, less analytical modes.

War content is precisely the kind of material that activates arousal. Military conflict is existentially significant. When something looks like combat between two countries whose conflict might escalate, the emotional system treats it as immediately relevant in a way that a sports clip or an entertainment video is not.

Into that aroused, resource-depleted state, a caption arrives before the video plays.

This is the mechanism: the caption is processed first, and it sets the frame that governs everything the viewer sees after. Research on schema activation consistently shows that when a frame is established before ambiguous information is encountered, the ambiguous information gets interpreted through the frame rather than evaluated independently.

“Iranian missile strikes US jet” establishes a schema: military combat, jet aircraft, missile trajectory, explosion. Everything the viewer then sees — including obviously rendered polygons — gets processed through that schema. The brain is not asking is this real footage? It is asking does this look like what an Iranian missile striking a US jet would look like? It’s evaluating fit to the category, not authenticity. A game engine’s explosion fits the category well enough.

On a mobile screen, at 480p, scrolling at speed through a feed of other emotionally charged content, during an active international conflict: the game engine’s rendering resolution was never going to be the failure point. The caption had already done its work.

The Engagement Farming Layer

State-linked influence operations produce some of the fake footage. They receive most of the analytical attention. But the documented evidence from the current conflict suggests a distinct and arguably larger source: engagement farmers with no political agenda whatsoever.

Accounts that post viral war content grow. One pro-Iran-aligned account grew from 700,000 to 1.4 million followers in a single week by posting conflict content. That is a monetizable event. The growth translates to ad revenue, affiliate income, Substack subscribers, consulting engagements — the standard attention-economy payoff structure.

For engagement farmers, the calculation is different from a propagandist’s. They don’t need the content to advance a narrative. They need the content to spread. War content spreads. Video game footage captioned as war content spreads just as well as real footage, requires no dangerous proximity to actual conflict zones, is available in unlimited supply from game libraries, and circulates on platforms whose content ID systems aren’t trained to identify ARMA 3 frames.

The engagement farming layer is important because it means the spread of fake war footage is not primarily a state-directed operation. It’s an emergent property of attention economics during crisis. No coordinator is necessary. Every individual creator making the rational choice to post high-engagement content produces, in aggregate, an information environment filled with fake footage — without any central direction.

X’s Grok AI, faced with the volume of false footage, reportedly provided conflicting fact-checks on identical content within minutes of each other. The ecosystem was too chaotic for automated systems to navigate consistently.

Who Gets Scrutinized

There is a verification asymmetry worth noting. Fake content from Iranian-aligned networks received thorough fact-checking attention from Western fact-checkers, major platforms’ integrity teams, and mainstream outlets. Fake content amplified by politicians friendly to the US-Israel coalition position received less systematic correction.

Texas Governor Abbott shared and then deleted the War Thunder clip. That incident was documented by fact-checkers. But the clip reached his audience before the deletion — at seven million views — and the deletion itself generated no sustained coverage. The pattern: high-credibility amplifiers who share false footage and quickly delete it receive the benefit of the doubt (an honest error, moving too fast) in a way that official Iranian state media sharing false footage does not.

This isn’t a claim that one side is more dishonest than the other. It’s an observation about which falsifications receive scrutiny. Scrutiny is not distributed neutrally. Fake footage that confirms hostile-power narratives gets examined more closely than fake footage that confirms ally-friendly narratives. The asymmetry in scrutiny is itself a form of information environment shaping.

The Reuse Problem

A further category deserves mention alongside video game footage and AI-generated content — and it feeds the same engagement farming economy: old footage reused with false context.

A four-million-view clip claiming to show Iranian missiles striking Tel Aviv was actually footage of Algerian football fans celebrating a league title in Algiers — fireworks and crowd excitement. The same clip had been debunked in 2023 when it circulated with different false context. It was recycled, recaptioned, and circulated again to four million new views.

This is the cheapest falsification available. No AI generation needed. No game footage required. Find existing footage that looks like it could be combat, add a caption, post. The archive of human video is so large and so poorly indexed that the supply of recyclable footage is essentially infinite. The caption does the work.
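The flip side is that exact recycling is also the cheapest falsification to catch — when anyone bothers. As a minimal sketch, assuming a keyframe has been saved from the previously debunked clip and from the recirculating one (the filenames below are hypothetical), a perceptual hash comparison with the open-source Pillow and imagehash libraries is enough to flag a near-duplicate:

```python
# Compare a frame from a newly circulating clip against a frame from a
# previously debunked clip using a perceptual hash. The hash is robust to
# recompression and resizing, but not to heavy crops or mirroring.
from PIL import Image
import imagehash

debunked_frame = Image.open("algiers_celebration_2023.png")  # hypothetical filename
circulating_frame = Image.open("tel_aviv_strike_claim.png")  # hypothetical filename

h1 = imagehash.phash(debunked_frame)
h2 = imagehash.phash(circulating_frame)

# Subtraction gives the Hamming distance between the two 64-bit hashes;
# small distances (roughly <= 10) indicate near-duplicate frames.
distance = h1 - h2
print(f"perceptual hash distance: {distance}")
if distance <= 10:
    print("likely the same footage, recaptioned")
```

This doesn’t solve the problem — it only works against footage someone has already debunked and archived — but it shows that the recycling described above is detectable with commodity tooling. What’s missing is not the tooling; it’s any incentive to run the check before sharing.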

What Actually Helps

The standard media literacy advice — check the source, look for metadata, reverse-image search — is accurate and insufficient. It describes a verification process that takes time, tools, and attentional resources. During a fast-moving conflict, scrolling a feed under emotional activation, most people are not going to pause to run a reverse image search before sharing.
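For readers who do want to run that check, here is a sketch of the first step — pulling stills from a downloaded clip so they can be uploaded to a reverse-image search by hand — using OpenCV (the filename and frame interval are illustrative, not part of any documented workflow):

```python
# Save roughly one frame per second from a downloaded clip as JPEGs,
# ready to be uploaded to a reverse-image search engine manually.
import cv2

cap = cv2.VideoCapture("suspect_clip.mp4")   # illustrative filename
fps = cap.get(cv2.CAP_PROP_FPS) or 30        # fall back if FPS metadata is missing
saved, index = 0, 0

while True:
    ok, frame = cap.read()
    if not ok:
        break
    if index % int(fps) == 0:                # roughly one frame per second
        cv2.imwrite(f"frame_{saved:03d}.jpg", frame)
        saved += 1
    index += 1

cap.release()
print(f"saved {saved} keyframes for manual reverse-image search")
```

Even this abbreviated workflow means downloading the clip, running a script, and uploading stills one by one — minutes of effort set against a sharing decision that takes seconds. That asymmetry is exactly why the advice, while accurate, rarely gets followed.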

Some practices are more realistic.

Notice the caption before you notice the content. The manipulation lives in the frame, not primarily in the footage. When you encounter war content, the relevant question isn’t does this look real? but who posted this, when, and what do they gain from this specific claim spreading? The game footage looks like a game to anyone who actually examines the footage. The manipulation target is the person who processes the caption first and then checks the footage against it, rather than evaluating the footage on its own.

Delay as a defense. View counts climb fastest in the first hours after posting, before fact-checks exist. You may not be able to determine whether footage is real in the acute phase, but you can defer sharing it. Sharing is the amplification mechanism. Consuming and waiting is not the same as sharing and amplifying.

Treat volume as a signal, not as confirmation. Seventy million views does not mean seventy million independent witnesses verified the footage. It means seventy million people saw it in a context where sharing was frictionless. High view counts reflect engagement optimization, not evidential weight. The view counter is a metric for the algorithm. It tells you nothing about truth.

High-credibility amplifiers create false authority. When a politician or verified account shares footage, the credibility transfers. The implicit inference is that they must have checked. They often haven’t. Treat amplification by credible sources as a weak signal of authenticity, not a strong one.

Conclusion

The seventy-million-view clip was video game footage. The governor shared the War Thunder clip. The top three fabricated videos hit a hundred million combined views. None of this required a state intelligence agency, a sophisticated AI pipeline, or an operations center. It required a caption, a platform, and a conflict generating enough urgency to activate the psychological conditions where captions override evidence.

The AI deepfakes and pre-staged influence operations are real and significant. But they’re not the whole picture — or even most of the volume. The bulk of false war footage spreading at scale is produced by engagement farmers exploiting attention economics, recycled from old footage, or clipped from video games by accounts that want followers and know that war content delivers them.

The manipulation infrastructure for this isn’t hidden. It’s a caption box, an upload button, and an algorithm that doesn’t distinguish between real footage and ARMA 3 at scale.

The caption is doing the work. The footage is just evidence for the frame the caption already established.

Seventy million people saw a video game. Very few of them knew they were watching one.


This article is part of Decipon’s Manipulation Breakdowns series, which dissects real influence tactics using the NCI Protocol framework.
