When AI Decides What “Violence” Means… It Doesn’t Think Like You Do

Humans debate the meaning of violence while AI outputs a single confident label.

“Violence” feels like an obvious word—until someone asks you to define it. Is it only physical harm? Or does it include humiliation, exclusion, online harassment, and threats? A 2026 study by Stellato, Lancia, Galeazzi, and Curti digs into a surprisingly modern question: when people ask AI to judge morally messy situations, does the AI “see” violence the same way humans do?

The experiment: humans vs. multiple AI models

The research began as a playful Italian radio segment that turned into a mass social experiment. The show presented 22 deliberately provocative scenarios—from protesters attacking police to online insults, sexual harassment on public transit, coordinated harassment campaigns, and even cases of preventing harmful speech.

Over 3,000 listeners responded within 48 hours, classifying each scenario as:

  • Violence

  • Non-violence

  • Depend-on (context matters)

Then the researchers ran the same 22 scenarios through 18 different instruction-tuned LLMs (open-weight models available via Ollama). Models were forced into strict, no-explanations classification output. Two models failed consistently and were excluded from statistical comparisons.
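The forced-choice setup can be sketched as follows. The prompt wording, label spelling, and normalization here are illustrative assumptions, not the study's exact protocol; `call_model` stands in for whatever client (e.g., an Ollama wrapper) returns a model's raw text reply.

```python
# Sketch of a strict, no-explanations classification harness.
# Prompt wording and normalization are assumptions, not the paper's protocol.

LABELS = ("violence", "non-violence", "depend-on")

PROMPT = (
    "Classify the following scenario with exactly one label: "
    "violence, non-violence, or depend-on. "
    "Reply with the label only, no explanation.\n\nScenario: {scenario}"
)

def classify(call_model, scenario: str):
    """Ask a model for a one-word label; return None if it fails to comply."""
    reply = call_model(PROMPT.format(scenario=scenario))
    label = reply.strip().lower().rstrip(".")
    return label if label in LABELS else None
```

Returning `None` for non-compliant replies mirrors how models that consistently failed the strict format could be flagged and excluded.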

The headline result: AI compresses ambiguity

Humans and AI both labeled most scenarios as violence overall. But the key difference is how they handle uncertainty.

  • Humans used “depend-on” fairly often, signaling that context matters.

  • LLMs used “depend-on” much less and tended to “decide” more often—shifting those ambiguous cases into non-violence (and sometimes into violence).

In other words: humans leave room for context; AI often forces a verdict.

Where AI and humans clash the most: online abuse

The biggest disagreements weren’t about obvious physical harm. They were about digital and verbal aggression, especially online insults and coordinated harassment.

In scenarios like:

  • private insults in DMs,

  • public insults in comment sections,

  • organized groups piling on with insults,

humans overwhelmingly called these “violence,” while models were much more likely to label them non-violent.

This implies many LLMs may be operating with a narrower “prototype” of violence—closer to physical force or direct bodily harm—while humans increasingly treat severe online harassment as a form of real harm that belongs in the same moral bucket.

A weird reversal: “violent speech” that gets interrupted

One of the most striking flips involved a TV guest who is about to say a group should be physically eliminated—but the host interrupts and stops them.

Humans often judged the interruption as decisive (“harm prevented”), leaning away from labeling it violence.

Many models still labeled it violence, suggesting that LLMs may weigh intent and content more heavily than “what actually happened.”

So: humans integrate outcome and mitigation; AI often judges the semantic payload.

Models don’t even agree with each other (especially on the hardest cases)

Inter-model agreement across the 22 items was low overall, and the scenarios where humans and AI disagreed most were also the ones where models disagreed among themselves. That matters: even if someone treats “AI” as a single authority, the answer depends heavily on which model they ask.
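Agreement among multiple raters (here, models) over categorical labels is typically measured with a chance-corrected statistic such as Fleiss' kappa. A minimal sketch, with invented counts rather than the paper's data:

```python
# Minimal Fleiss' kappa: chance-corrected agreement among n raters
# labeling N items with categorical labels. Example counts are invented.

def fleiss_kappa(counts):
    """counts[i][j] = number of raters assigning category j to item i.
    Every row must sum to the same number of raters n."""
    n = sum(counts[0])   # raters per item
    N = len(counts)      # number of items
    # mean per-item observed agreement
    P_bar = sum(
        (sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts
    ) / N
    # chance agreement from overall category proportions
    p = [sum(row[j] for row in counts) / (N * n) for j in range(len(counts[0]))]
    P_e = sum(x * x for x in p)
    return (P_bar - P_e) / (1 - P_e)
```

Kappa is 1 for perfect agreement and near (or below) 0 when raters agree no more than chance would predict, which is why low values on the hardest scenarios are informative.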

Bigger models weren’t automatically “more human”

One might assume that scaling up the model makes it align better with human judgments. Not here.

Model accuracy against the human majority label ranged dramatically, and size didn’t reliably predict alignment. Some mid-sized models matched humans well; some larger ones didn’t; a small model performed extremely poorly. This points to the likely importance of fine-tuning choices, alignment strategy, and safety design—not just raw parameters.
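“Accuracy against the human majority label” can be computed as below; the vote lists and model outputs are made up for illustration, not taken from the study.

```python
# Score a model against the majority human label per scenario.
# All data here is illustrative, not the paper's.
from collections import Counter

def majority_label(votes):
    """Most common human label for one scenario."""
    return Counter(votes).most_common(1)[0][0]

def accuracy_vs_humans(model_labels, human_votes):
    """Fraction of scenarios where the model matches the human majority."""
    gold = [majority_label(v) for v in human_votes]
    hits = sum(m == g for m, g in zip(model_labels, gold))
    return hits / len(gold)
```

This is the sense in which alignment “ranged dramatically”: each model gets a single score against the same human-derived gold labels, and size alone does not predict that score.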

Why this matters beyond “violence”

The study uses “violence” as a proxy for a bigger phenomenon: how AI systems operationalize fuzzy moral concepts and turn messy human pluralism into a single clean label.

That’s the real risk: LLMs are increasingly treated like a “cognitive companion.” Their confident tone can make outputs feel like moral truth, even when:

  • the concept is contested,

  • context is missing,

  • the model family behaves differently,

  • ambiguity is being artificially “collapsed.”

The researchers argue we should treat these systems as probabilistic tools, not moral referees—and build user literacy around the fact that fluent answers can hide disagreement and uncertainty.

source: https://arxiv.org/pdf/2602.17256
