State of AI Trust 2025: What Went Wrong, What Works, and What's Coming Next

7 days ago
14 min read

Written by Wang Gan, AI Scientist

A plain-English year-end review for anyone building or using AI — by AIDX Tech

2025 was the year AI trust stopped being a "nice-to-have" and became a production requirement. Real incidents — data breaches, cyberattacks, hiring bias lawsuits, and even AI-powered ransomware — showed that when AI systems aren't properly tested and monitored, the consequences go far beyond a bad demo.

What's in this article?

The biggest AI safety & trust incidents of 2025 (and what they teach us)
How leading teams actually reduce AI risk — the "AI trust stack"
The best tools for AI safety, testing & monitoring in 2025
A side-by-side comparison table
Which tool is best for which use case
Trends to watch in 2026
Frequently asked questions

The Biggest AI Safety Incidents of 2025

These aren't hypothetical risks. Each incident below had real-world consequences — leaked data, legal scrutiny, operational disruption, or reputational damage.

🔓 McDonald's AI Hiring Chatbot Leaked Applicant Data

When: June–July 2025

McDonald's AI-powered hiring chatbot, "Olivia" (built by Paradox.ai), exposed a massive volume of job applicant chat records due to a default admin password ("123456") and a basic API vulnerability.

Why it matters: Even a "safe" AI model can cause serious harm if the surrounding system has weak security basics. AI apps often sit on top of large, sensitive datasets — making them high-value targets.

What to do: Test not just the AI model itself, but also the authentication, access controls, and data flows around it. Run abuse-case testing before launch.

🔗 https://www.wired.com/story/mcdonalds-ai-hiring-chat-bot-paradoxai/

🚫 DeepSeek Restricted Sign-Ups After Cyberattack

When: January 2025

DeepSeek, a popular AI service, temporarily blocked new user registrations after reporting "large-scale malicious attacks." The episode was a reminder that AI availability is part of AI trust — if your app goes down under attack, that's a trust failure too.

What to do: Test for traffic abuse and reliability. Build graceful fallbacks. Monitor for unusual traffic patterns.

🔗 https://www.theverge.com/2025/1/27/24353023/deepseek-ai-app-restricting-sign-ups-malicious-attacks

☠️ Fake DeepSeek Packages Stole Developer Credentials

When: February 2025

Hackers uploaded fake Python packages to PyPI that looked like legitimate DeepSeek developer tools. When installed, they stole environment variables — a common hiding spot for API keys and secrets.

Why it matters: Modern AI apps are assembled from many open-source components. Attackers know that compromising the AI toolchain is often easier than attacking the model itself.

What to do: Scan your AI project's dependencies for malicious packages. Rotate API keys regularly and use least-privilege access.

🔗 https://www.securitymagazine.com/articles/101366-deepseek-impersonating-malware-is-stealing-data-research-finds

🤖 "LameHug" Malware Used an AI to Generate Attack Scripts

When: July 2025

A malware family called "LameHug" was reported to use an open-source language model to generate malicious scripts dynamically during attacks — making it harder for traditional antivirus tools to detect.

Why it matters: AI is now a tool in attackers' kits, not just defenders'. If your AI system can be tricked into generating dangerous code or leaking secrets, you need adversarial testing.

What to do: Test your AI endpoints for prompt injection and jailbreak vulnerabilities. Monitor for suspicious output patterns at runtime.

🔗 https://hipaatimes.com/ai-powered-lamehug-malware-targets-windows-systems-with-real-time-commands#:~:text=The%20big%20picture,response%20using%20current%20security%20tools.

💣 "PromptLock": The First AI-Powered Ransomware

When: August 2025

ESET researchers disclosed "PromptLock" — described as the first known AI-powered ransomware. It used a locally running language model to generate malicious scripts in real time, making attacks more adaptive and harder to detect.

Why it matters: Even as a proof-of-concept, this signals that attackers will use AI to make their tools smarter. Defenders need to raise the bar.

What to do: Treat prompt injection as a real security threat. Test for data exfiltration and policy bypass. Log and monitor your AI layer.

🔗 https://www.eset.com/us/about/newsroom/research/eset-discovers-promptlock-the-first-ai-powered-ransomware/

⚖️ Workday AI Hiring Bias Case Expanded Legal Scrutiny

When: Ongoing in 2025

The Mobley v. Workday case continued — and in 2025, a judge ordered Workday to provide a list of customers who had enabled certain AI hiring features. This extended legal exposure from the employer to the AI technology provider.

Why it matters: If AI influences who gets hired or rejected, it needs to be tested for bias and be explainable. This is no longer optional in regulated industries.

What to do: Run bias and fairness evaluations. Maintain documentation you can defend. Monitor outcome distributions post-deployment.

🔗 https://www.hrdive.com/news/workday-must-supply-list-of-employers-who-enabled-hiredscore-ai/756506/

📣 "PRISONBREAK": AI-Enabled Influence Operation

When: 2025

The Citizen Lab documented an AI-enabled influence campaign ("PRISONBREAK") that used synthetic social media personas and AI-generated content to manipulate political opinion.

Why it matters: AI trust isn't only about whether a chatbot gives correct answers. It also includes whether your AI product could be weaponized for disinformation at scale.

What to do: If you build content generation features, test for misuse — impersonation, persuasion, and disinformation patterns. Consider provenance signals where relevant.

🔗 https://munkschool.utoronto.ca/research/we-say-you-want-revolution

🌮 Taco Bell Paused AI Drive-Thru After Trolling Chaos

When: August 2025

Taco Bell reportedly paused its AI drive-thru ordering pilot after users discovered they could place absurd orders — like 18,000 cups of water — and the system accepted them.

Why it matters: Real users will always find edge cases you didn't think of. Without guardrails and anomaly detection, small model failures become brand incidents.

What to do: Test for abuse cases before launch. Add business-rule guardrails (e.g., quantity limits). Monitor real conversations for drift and unexpected patterns.

🔗 https://www.bbc.com/news/articles/ckgyk2p55g8o

The Common Thread

Different industries, different technologies, same underlying failure mode: teams shipped AI systems without enough adversarial thinking, evidence-based evaluation, and continuous monitoring. The good news is that most of these risks are testable. The bad news is that many organizations still treat AI risk as a policy document—rather than an engineering workflow.

How Smart Teams Reduce AI Risk: The Trust Stack

If 2023–2024 was about “can we build it?”, 2025 was about “can we operate it safely?” The most mature teams treat AI trust as a stack—multiple layers of controls that work together. No single tool or policy catches everything.

Here's what each layer does:

Layer	What It Does
Pre-deployment evaluation	Test safety, accuracy, tone, and policy compliance before going live
Adversarial testing / red teaming	Actively try to break the system with jailbreaks, prompt injections, and attack patterns
Runtime guardrails	Real-time controls that scan inputs/outputs, detect PII, and enforce business rules
Monitoring & incident response	Watch production behavior continuously; alert on anomalies and policy violations
Governance & compliance	Documentation and workflows to prove responsible AI practice to stakeholders and regulators

No single tool covers everything. The rest of this article maps the tooling landscape so you can build the right stack for your needs.

Top AI safety, robustness testing & monitoring solutions

The AI trust tooling market in 2025 looks a lot like security and observability did a decade ago: a few end‑to‑end platforms, many strong point solutions, and plenty of open-source building blocks.To keep this practical, we focus on tools that help teams test for AI safety and robustness (pre‑deployment and adversarial testing), and/or monitor and enforce guardrails in production.

AIDX Tech (AIDX) — end-to-end AI trust testing + monitoring (SaaS & on‑prem)

Website: https://www.aidxtech.com/

AIDX combines a SaaS testing platform with professional services, so whether you need fast blackbox testing or a deep on-prem deployment, there's a path that fits your environment.

Two Ways to Work With AIDX

☁️ SaaS Testing Platform (Blackbox)

Connect your AI application endpoint and run structured test suites — no model access required. Fast to start, built for repeatable evaluation at scale.

🏢 Professional Services (On-Prem & Custom)

For complex or sensitive environments: on-prem deployment, customized test design, deeper evaluation, and monitoring rollout — with AIDX as your implementation partner.

What AIDX Covers

Capability	Details
🛡️ Model Safety Testing	Standardized benchmarks across a granular risk taxonomy — unsafe content, policy violations, and more
🔓 Jailbreak & Adversarial Testing	Multi-turn attack patterns and adversarial prompts to surface bypasses and leakage
🧠 Hallucination & Factuality Eval	Checks for incorrect or ungrounded outputs — especially critical for RAG systems
🎯 Alignment Evaluation	Assess whether a model stays aligned to its role, policy, and domain constraints under pressure (project-based)
🤖 Agent Behavior Monitoring	Visibility into tool-use, step-by-step actions, and policy adherence in agentic workflows (project-based)
🔒 Guardrail Deployment	Prompt/response scanning, injection detection, and enforceable safety rails in production (project-based)
🌐 Deployment Flexibility	SaaS for speed; on-prem/private cloud for sensitive or regulated environments

Platform Components

DX Suite — Testing

BenchDX — Safety benchmarking across a structured risk taxonomy
RobustDX — Red teaming and robustness evaluation
HalluDX — Hallucination and factuality testing
AlignDX — Alignment evaluation under real-world pressure

MX Suite — Monitoring

AgentMX — Step-level agent behavior monitoring and audit trails
ModelMX — Prompt/model monitoring, prompt injection detection, and runtime guardrails

Designed for Operational Realities

AIDX ships with pre-configured test suites mapped to common governance frameworks, automation for running tests at scale, and reporting that helps teams communicate risk and remediation clearly to stakeholders — not just engineers.

Trusted in Production

AIDX is deployed across Singapore's public sector and enterprise ecosystem, including:

Synapxe · HTX · National Healthcare Group · ST Engineering · TÜV SÜD · Ensign InfoSecurity · Fortitude.asia

AIDX is also engaged with Singapore's national AI governance ecosystem, including IMDA and the AI Verify Foundation.

Practical use cases include prompt injection detection, adversarial red teaming, and AI application security posture tracking.

Ready to test your AI application? Get started with AIDX →

Promptfoo — Developer-First Evals & Red Teaming

Type: Open Source | Website: promptfoo.dev

Promptfoo is a developer-oriented toolkit for testing prompts, LLM application behavior, and red teaming. It is commonly used in CI/CD workflows to run repeatable evaluations and catch regressions when prompts, models, or retrieval pipelines change.

Strengths: Fast to adopt for engineers: local-first and automation-friendly. Good fit for prompt/agent evaluation in development and CI. Strong open-source community and extensibility.

Limitations: Primarily a developer tool rather than an end-to-end enterprise assurance platform. Governance, audit reporting, and on-prem enterprise support may require additional work around the tool.

✅ Best for: Product and engineering teams who want fast, repeatable LLM evals and red teaming during development.

Microsoft PyRIT — Extensible Red-Teaming Framework

Type: Open Source | Project: azure.github.io/PyRIT | GitHub

PyRIT (Python Risk Identification Tool for generative AI) is an open-source framework designed to help security engineers proactively identify risks in generative AI systems. It supports building and executing attack strategies and evaluating responses.

Strengths: Flexible and extensible: suitable for custom attack research and security testing workflows. Designed with security testing in mind (risk identification).

Limitations: Framework-level: teams typically need engineering time to implement, operationalize, and report results. Not a turnkey governance or monitoring platform by itself.

✅ Best for: Security teams and researchers who want to build tailored red-teaming campaigns for GenAI systems.

NVIDIA garak — LLM Vulnerability Scanner

Type: Open Source | Project: github.com/NVIDIA/garak

garak is an open-source toolkit for probing and scanning LLMs for many classes of weaknesses, such as jailbreaks, prompt injection, toxicity generation, data leakage, hallucinations, and more. It can be a strong baseline scanner for teams building evaluation pipelines.

Strengths: Wide coverage of vulnerability categories and probes. Useful as a baseline scanner to catch obvious weaknesses early.

Limitations: Like many scanners, results still require interpretation and follow-up mitigation work. Operationalizing garak into enterprise reporting and continuous monitoring requires additional engineering.

✅ Best for: Teams who want a broad, open-source LLM scanning baseline to complement their own evaluation strategy.

Lakera Guard — Real-Time Prompt Injection & Data Leakage Defense

Type: Commercial | Website: lakera.ai/lakera-guard

Lakera Guard is positioned as an API-layer security control for GenAI applications. It focuses on runtime protection against prompt attacks and sensitive data leakage — useful when you need guardrails in production.

Strengths: Designed for low-latency runtime protection (a 'gate' in front of the model). Good fit for prompt injection and data leakage mitigation at inference time.

Limitations: Primarily a runtime defense layer; teams may still need dedicated evaluation/red teaming to measure risk before launch. Does not replace broader governance, reporting, and end-to-end testing coverage.

✅ Best for: Production teams who need a dedicated runtime 'AI firewall' for prompt injection and leakage risks.

HiddenLayer — AI Security Platform

Type: Commercial | Website: hiddenlayer.com/aisec-platform

HiddenLayer focuses on security for AI systems, including supply chain security, runtime defenses, and security posture management for AI assets. This type of platform is relevant for organizations treating AI as part of their security program (not just a product feature).

Strengths: Security-first framing: helps teams think in terms of threats, posture, and defense. Covers aspects like artifact/model scanning and runtime security controls.

Limitations: Security platforms may not focus on domain-specific quality metrics (e.g., business hallucination scoring) unless configured. May require integration work to align with application-level evaluation and governance reporting.

✅ Best for: Security-conscious enterprises that need AI systems integrated into their broader security posture and runtime defense strategy.

Mindgard — Automated AI Red Teaming & Security Testing

Type: Commercial | Website: mindgard.ai

Mindgard focuses on AI application security testing and automated red teaming, with a focus on uncovering AI-specific vulnerabilities that traditional AppSec tools don't cover well.

Strengths: Security-testing focus and strong positioning for AI-specific vulnerabilities. Designed to help teams discover and mitigate vulnerabilities before deployment.

Limitations: Like other point solutions, may need to be paired with broader evaluation, monitoring, and governance workflows depending on requirements.

✅ Best for: AppSec and security engineering teams that want AI-specific offensive testing coverage.

NVIDIA NeMo Guardrails — Programmable Guardrails Toolkit

Type: Open Source | Project: developer.nvidia.com/nemo-guardrails

NeMo Guardrails is an open-source toolkit for adding programmable guardrails to LLM-based conversational systems. It supports implementing rules and rails for topic control, safety, and other constraints, often via developer-defined configurations.

Strengths: Open source and flexible for teams who want full control of guardrail logic and deployment. Useful for building policy-enforcing conversation flows and safety rails.

Limitations: Primarily a guardrail layer; teams still need evaluation and monitoring to measure effectiveness over time.

✅ Best for: Engineering teams who want to implement customized, programmable guardrails around LLM applications.

Credo AI — AI Governance, Risk Management & Compliance

Type: Commercial | Website: credo.ai

Credo AI is positioned as an AI governance platform — helping organizations manage AI oversight, risk, and compliance across use cases, models, and vendors. Governance platforms are especially relevant where accountability and documentation are required.

Strengths: Strong fit for governance teams and regulated organizations needing structured oversight. Focus on risk management and compliance workflows across the AI lifecycle.

Limitations: Governance platforms typically need to be paired with technical testing and monitoring tools to generate the evidence that governance workflows rely on.

✅ Best for: Organizations building an AI governance program and needing a system of record for oversight and compliance.

Deepchecks — ML Testing & Monitoring

Type: Open Source + Commercial | Website: deepchecks.com

Deepchecks provides open-source testing for machine learning models and data, covering issues like data integrity, distribution shift, and performance checks. While it's not LLM-specific, it's a practical tool for traditional ML trust and robustness workflows.

Strengths: Strong for classic ML validation: data checks, distribution mismatch, model performance checks. Open-source with a clear practitioner focus.

Limitations: Less focused on GenAI-specific risks like jailbreaks and prompt injection unless extended.

✅ Best for: ML teams running predictive/CV models who need repeatable validation and monitoring checks.

Arthur AI — Production Monitoring + Evals + Guardrails

Type: Commercial | Website: arthur.ai/platform

Arthur AI positions itself as a platform to monitor, evaluate, and secure AI across the lifecycle, including support for both traditional ML and generative AI. Teams often use it to instrument production behavior and track quality and policy metrics continuously.

Strengths: Strong focus on monitoring and continuous evaluation across production. Covers both traditional ML and GenAI use cases.

Limitations: Monitoring platforms may still require a dedicated red-teaming layer for deep adversarial testing and jailbreak techniques.

✅ Best for: Teams that already have AI in production and need continuous monitoring and evaluation to maintain reliability.

Comparison Table: Where Each Solution Fits

Legend: ✅ Core capability | ◑ Partial / requires significant configuration | — Not the primary focus

Solution	Pre-deploy Testing	Adversarial Testing	Hallucination Eval	Alignment & Agent Eval	Runtime Guardrails	Monitoring	Compliance Reporting	Blackbox Endpoint	SaaS + On-prem	Services & Custom
AIDX	✅	✅	✅	✅	✅	✅	✅	✅	✅	✅
Promptfoo	◑	✅	◑	◑	—	—	—	✅	✅	—
Microsoft PyRIT	◑	✅	—	—	—	—	—	✅	✅	—
NVIDIA garak	◑	✅	◑	—	—	—	—	✅	✅	—
Lakera Guard	—	◑	—	—	✅	◑	—	✅	◑	◑
HiddenLayer	◑	◑	—	—	◑	✅	◑	◑	✅	◑
Mindgard	◑	✅	—	—	◑	◑	—	✅	◑	◑
NVIDIA NeMo Guardrails	—	—	—	◑	✅	◑	—	✅	✅	—
Credo AI	—	—	—	—	—	◑	✅	—	◑	✅
Deepchecks	✅	—	—	—	—	✅	—	—	✅	—
Arthur AI	✅	◑	✅	◑	✅	✅	◑	✅	✅	✅

A note on AIDX's full coverage: AIDX combines blackbox endpoint testing (SaaS) with project-based depth (alignment/agent evaluation, on‑prem deployment, and guardrail/monitoring rollouts). For teams that need one vendor to cover both assurance and operations, an end‑to‑end footprint reduces integration overhead and helps keep trust controls consistent.

Which Tool Is Best for Your Situation?

If you need…	Best fit
One platform covering testing + monitoring + on-prem	AIDX
Fast LLM evals and red teaming in CI/CD	Promptfoo
Custom GenAI red-teaming campaigns (security teams)	Microsoft PyRIT
Open-source LLM vulnerability scanning baseline	NVIDIA garak
A runtime "AI firewall" for prompt injection	Lakera Guard
AI integrated into your enterprise security posture	HiddenLayer
AI-specific offensive security testing	Mindgard
Programmable guardrails you fully control	NVIDIA NeMo Guardrails
AI governance and compliance workflows	Credo AI
Traditional ML validation and monitoring	Deepchecks
Always-on production monitoring for ML + GenAI	Arthur AI

Practical tip: Most mature organizations end up with a stack — an end-to-end platform like AIDX, complemented by specialized components (a runtime firewall, a governance system, an open-source scanner) depending on their risk profile.

What to Watch in 2026: 7 Emerging Trends

1) "EvalOps" Becomes Standard Practice

Teams are moving from one-off prompt checks to continuous evaluation pipelines — just like DevOps normalized automated testing in software. Expect evaluation to run automatically every time a model, prompt, or data source changes.

2) Agents Require New Safety Thinking

AI agents don't just answer questions — they call APIs, trigger transactions, and take multi-step actions. This dramatically expands the attack surface. Expect more focus on agent behavior monitoring, tool permissioning, and audit trails.

3) AI Security Merges with Application Security

Prompt injection and data exfiltration increasingly look like classic AppSec problems. Security teams will apply familiar frameworks: threat modeling, red teaming, logging, alerting, and incident response.

4) Compliance Means Evidence, Not Just Policy Docs

Regulators and risk teams want proof: test results, remediation logs, and governance workflows — not just a slide deck. Expect more demand for audit-ready reports mapped to frameworks like NIST AI RMF, ISO/IEC 42001, EU AI Act, and Singapore's AI governance initiatives.

5) Information Integrity Becomes a Product Risk

As more products generate content, teams must test for misuse — impersonation, persuasion campaigns, and synthetic disinformation. Provenance signals (watermarking, disclosure UX) will become more common.

6) Multilingual and Multimodal Safety Is Non-Negotiable

A safety setup that works for English text may fail for local languages, code-mixed inputs, or image-based prompts. Global AI apps need safety testing that matches how users actually interact.

7) Open-Weight Models Increase Operator Responsibility

Running local models gives teams more control and privacy — but it also means more responsibility: model sourcing, artifact scanning, patching, and monitoring all fall on the operator.

Conclusion

The biggest lesson from 2025 is simple: AI trust failures are rarely mysterious. They come from predictable gaps — no structured evaluation, insufficient adversarial testing, missing guardrails, and weak monitoring.

AI systems now touch hiring decisions, healthcare workflows, financial transactions, and critical infrastructure. Trust has to be engineered — measured, stress-tested, monitored, and documented.

A practical starting point:

Connect your AI endpoint to a structured testing suite
Run safety, jailbreak, and hallucination evaluations
Fix the findings, then monitor continuously
Produce audit-ready evidence your stakeholders can understand

AIDX is designed to make that workflow fast and repeatable — whether you want a SaaS platform for blackbox testing or a project-based engagement for on-prem deployment, alignment evaluation, agent monitoring, and guardrail rollout.

[→ Learn more about AIDX at aidxtech.com]

FAQ

What does "AI trust" actually mean?

It means the ability to reliably operate an AI system in the real world while managing safety, security, robustness, and accountability. It's about more than accuracy — it includes resistance to attacks, privacy protection, and governance.

Is AI trust the same as AI safety?

Safety is a major part of trust, but trust is broader. Safety focuses on harmful outputs and unintended behavior. Trust also includes security (attacks), robustness (behavior under stress), reliability (availability), and governance (documentation and oversight).

What is jailbreak testing?

It's adversarial testing where you try to bypass a system's intended constraints — getting an AI to reveal its system prompt, ignore safety rules, or produce restricted content. Modern jailbreaks often use multi-turn strategies, role-play, and indirect prompt injection.

What is hallucination testing?

It measures whether an AI outputs incorrect or ungrounded information — especially when users expect factual answers. For RAG systems, this includes checking citation accuracy and whether the model invents facts not found in the retrieved context.

Do I need access to model weights to run trust testing?

No. Most important risks can be tested in a blackbox way — just using inputs and outputs at the application endpoint. This is how most teams deploy models in practice, and it's often the fastest starting point.

How often should we test?

At minimum: before launch and after any meaningful change (model update, prompt edit, data change). For higher-risk apps, continuous testing and always-on monitoring are recommended.

Should we choose one platform or build a stack?

Most mature teams use a stack. An end-to-end platform (like AIDX) reduces integration overhead, while point solutions can fill specific gaps — a runtime firewall, a governance system, or an open-source scanner in CI.

What's the fastest way to get started with AIDX?

Start with a scoped blackbox assessment: connect your AI endpoint, run safety + jailbreak + hallucination tests, review the risk report, and prioritize fixes. From there, expand into continuous monitoring and deeper evaluations as needed.

Published by AIDX Tech — an AI testing and monitoring platform helping teams measure, improve, and prove the safety and robustness of AI applications in production. aidxtech.com