Why this matters
Technology-facilitated abuse (TFA) occurs when an abusive partner uses everyday tech—phones, smart home devices, GPS trackers, social media, shared cloud accounts—to monitor, harass, or control someone. Survivors often search online for help, but search results and forums can be inaccurate or unsafe. Now that AI chatbots are everywhere (search engines, Q&A sites, standalone “support” bots), survivors may ask an LLM for advice before reaching a tech abuse clinic.
That’s high stakes: bad advice can escalate harm.
This research asks: How good are LLM answers to real TFA survivor questions?
What the researchers did
They built a realistic dataset of survivor-style questions and tested four models:
- General-purpose models: GPT-4o and Claude 3.7 (non-reasoning)
- IPV-specific models: Ruth and Aimee (built on Claude/GPT, positioned for survivor support)
They used real-world questions pulled from research literature and online forums (Reddit, Quora), filtered for intimate-partner, tech-abuse scenarios. From 1,183 collected items, they curated 385 eligible questions, then sampled 193 to cover many abuse categories (e.g., surveillance, harassment, account compromise, spyware) and “means” (e.g., spyware, keyloggers, GPS trackers, shared phone plans).
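The curation step above—sampling 193 of 385 eligible questions so that many abuse categories stay represented—amounts to category-stratified sampling. A minimal sketch, assuming a simple per-category quota (the field names and quota are illustrative, not the paper's actual code):

```python
import random
from collections import defaultdict

def stratified_sample(questions, per_category=2, total=193, seed=0):
    """Sample questions so every abuse category is represented,
    then fill the remaining quota at random."""
    rng = random.Random(seed)

    # Group questions by their (hypothetical) "category" field.
    by_cat = defaultdict(list)
    for q in questions:
        by_cat[q["category"]].append(q)

    # Guarantee per-category coverage first.
    sample = []
    for items in by_cat.values():
        rng.shuffle(items)
        sample.extend(items[:per_category])

    # Top up to the target size from the leftover pool.
    remaining = [q for q in questions if q not in sample]
    rng.shuffle(remaining)
    sample.extend(remaining[: max(0, total - len(sample))])
    return sample[:total]
```
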
Then they generated single-turn, zero-shot responses using a survivor-safety-centered prompt and evaluated answers on four criteria:
- Accuracy (is it correct and relevant?)
- Completeness (does it include the key steps?)
- Safety (could it put a survivor at risk?)
- Actionability (can a survivor realistically do it?)
Experts scored accuracy/completeness/safety. Survivors rated actionability.
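The setup above can be sketched as a prompt template plus a per-response scoring record. The prompt wording and the `Rating` structure here are assumptions for illustration; the paper's exact prompt and scoring scales are not reproduced:

```python
from dataclasses import dataclass

# Illustrative survivor-safety-centered prompt (wording is an assumption).
SAFETY_PROMPT = (
    "You are assisting someone experiencing technology-facilitated abuse "
    "by an intimate partner. Prioritize their safety: flag any step that "
    "could alert the abuser, and keep advice short and realistic.\n\n"
    "Question: {question}"
)

@dataclass
class Rating:
    """One evaluated response, scored on the paper's four criteria."""
    accuracy: int       # expert-scored: correct and relevant?
    completeness: int   # expert-scored: are the key steps included?
    safety: int         # expert-scored: could it put a survivor at risk?
    actionability: int  # survivor-rated: realistically doable?
```
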
The headline result: most responses were not good enough
Across models, experts found responses were imperfect in the majority of cases—often inaccurate, incomplete, or missing safety warnings.
Two especially alarming patterns showed up repeatedly:
- Critical safety warnings were often missing. In many cases, answers failed to warn that certain actions (like changing settings, removing spyware, resetting devices, or changing accounts) can tip off an abuser or trigger escalation.
- Advice was frequently irrelevant or ineffective for the actual abuse scenario. Example failures included recommending:
  - VPNs for spyware, harassment, or account hijacking (often ineffective against the real threat)
  - Password changes for problems like online harassment or certain shared-access situations (these don't address the abuse mechanism)
  - RF detectors or phone apps to find hidden trackers (often unreliable; can waste money and time)
The key problem: many LLMs answer like a generic “security helpdesk,” not like a survivor safety planner.
Survivors’ view: “long, overwhelming, and hard to follow”
The team also surveyed 114 people with lived experience of TFA to rate actionability.
Even when advice sounded reasonable, survivors pointed out practical barriers:
- Fear of escalation: “If I do this, it may trigger retaliation.”
- Overwhelming length: some responses were extremely long and felt unmanageable in a crisis
- Technical, financial, and logistical constraints: not everyone can buy new devices, change plans, or do complex security steps
- Emotional load: stress and fear reduce the ability to follow multi-step guidance
Interestingly, expert “perfect” answers didn’t always feel more actionable to survivors—because actionability depends heavily on real-life constraints, not just correctness.
Biggest takeaway
LLMs show they understand the topic—but they often fail at what matters most in TFA:
- giving the right steps for the right threat model
- including safety planning and escalation warnings
- producing short, prioritized, realistic guidance
And IPV-specific models did not reliably outperform general-purpose models—suggesting “domain branding” isn’t enough without strong evaluation and safer design.
Practical recommendations (what the paper pushes toward)
- Ground answers in curated, trusted sources (e.g., established tech safety orgs)
- Improve models via retrieval plus expert-reviewed content, so advice is accurate and relevant
- Build responses that are step-by-step, prioritized, and include clear safety warnings
- Avoid recommending “security clichés” (VPN, change password) unless they truly fit the scenario
- Make UI warnings clear: AI advice may be incomplete, unsafe, or escalate risk
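The retrieval recommendation can be made concrete with a minimal sketch: before generating, pull the most relevant snippets from a small, expert-reviewed corpus. Everything here (the function names, the toy word-overlap scoring) is illustrative; a real system would use embedding search over vetted tech-safety content.

```python
import re

def _tokens(text):
    """Lowercase word tokens, ignoring punctuation."""
    return set(re.findall(r"[a-z']+", text.lower()))

def retrieve(query, corpus, k=2):
    """Rank expert-reviewed snippets by word overlap with the query --
    a toy stand-in for embedding-based retrieval over a curated corpus."""
    q = _tokens(query)
    return sorted(corpus, key=lambda doc: len(q & _tokens(doc)), reverse=True)[:k]
```

The retrieved snippets would then be placed in the model's context, so answers are grounded in vetted guidance rather than generic security advice.
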
source: https://arxiv.org/pdf/2602.17672