AI Agent Security: Prompt Injection, Data Leakage, and the OWASP LLM Top 10 for the Mittelstand

28 April 202638 min read

Co-founder at Superkind

AI agent security controls protecting enterprise systems

In June 2025 a security researcher at Aim Security sent a single innocuous email into a Microsoft 365 mailbox. The recipient never opened it. They never clicked anything. They never even saw it. Within minutes, Microsoft 365 Copilot had read the hidden instructions buried in that email, scanned the user’s SharePoint, OneDrive, and Teams, and quietly exfiltrated confidential data through an image URL the attacker controlled. The exploit was named EchoLeak (CVE-2025-32711) and earned a CVSS score of 9.3¹².

EchoLeak is not an outlier. In 2025 we also saw Cursor IDE compromised through a poisoned README¹⁶, GitHub Copilot tricked into enabling unattended command execution from a public repo¹⁶, Devin AI manipulated by a USD 500 research budget into installing command-and-control malware¹⁶, and Gemini Enterprise wiping victim memory through a Jira ticket - earning a USD 15,000 bug bounty¹⁶. These are production exploits in tools that German Mittelstand companies use today.

Meanwhile 87 percent of German companies report being victims of data theft, espionage, or sabotage in the past 12 months, with damages of EUR 289.2 billion²⁵. IBM’s 2025 Cost of a Data Breach report found that 97 percent of breached organisations with an AI security incident lacked proper AI access controls, and shadow AI alone added USD 670,000 to the average breach⁷. This guide is for the CISO, IT lead, Geschäftsführer, or Datenschutzbeauftragter at a German SME who needs to know what is actually broken in AI agent security, what the OWASP LLM Top 10 means in plain language, and how to harden a production agent in 90 days.

TL;DR

Prompt injection is OWASP’s number-one LLM risk for 2025 and cannot be fully prevented. The strategy is to limit blast radius, not block every attack.

The lethal trifecta - private data, untrusted content, and external communication in the same agent - is the pattern behind every major 2025 incident.

Defense in depth across seven layers (identity, input, capability, output, monitoring, human review, incident response) is the only architecture that holds up under real attacks.

EU AI Act Article 15 requires high-risk AI systems to be resilient against data poisoning, adversarial inputs, and confidentiality attacks. BSI guidance and NIST AI RMF point to the same controls.

90 days is enough to inventory, threat-model, harden, and red-team a production agent if you focus on one system at a time.

The Mittelstand Threat Surface

The German Mittelstand is now an attractive target. Around 36 percent of German companies use AI in some form²⁷, but most adopted faster than they secured. The result is an agent layer sitting on top of mission-critical systems with very little hardening underneath. The data on the threat is unambiguous.

Cyberattacks are the norm - 87 percent of German companies experienced data theft, espionage, or sabotage in the past 12 months, up from 81 percent the prior year. 59 percent of companies feel their existence is threatened by cyberattacks. Annual damage to the German economy: EUR 202.4 billion²⁵.
Russia and China are the primary attackers - 46 percent of affected companies report attacks from Russia, the same percentage from China²⁵. Mittelstand companies with sensitive IP are now treated as soft targets.
AI access controls are missing - 97 percent of organisations with an AI-related security incident lacked proper AI access controls⁷. 63 percent had no AI governance policies in place at all.
Shadow AI is silent and expensive - Shadow AI was a factor in 20 percent of breaches and added USD 670,000 to the average breach cost. Shadow AI breaches expose unusually high amounts of personally identifiable information⁷⁹.
The average breach is still expensive - USD 4.44 million globally in 2025, down 9 percent from the prior year because AI-enhanced detection cut response time, but unevenly distributed - ungoverned AI systems were both more likely to be breached and more costly when they were⁷.
Mittelstand prefers German AI - 88 percent of German companies consider their AI provider’s country of origin important; 93 percent of those would prefer a German-origin AI solution²⁷. Sovereignty concerns are now driving procurement, not just IT preferences.

Key Data Point

Gartner predicts that AI applications will drive 50 percent of all cybersecurity incident response efforts by 2028¹⁰. Translation: within two years, half of every CISO’s incident queue will trace back to an AI agent or AI-assisted attack. The teams that prepare for this now spend less time firefighting later.

The Mittelstand threat surface is wider than most boards realise because AI agents combine three things that used to be separate. They access the same data your ERP, CRM, and SharePoint hold. They process untrusted content from email, web pages, and customer documents. And they can act - sending messages, calling APIs, creating records. That combination is the security problem in a single sentence.

Threat Vector	Frequency in 2025	Average Cost Impact	Source
Cyberattack on German company	87% in past 12 months	EUR 289.2 billion total	Bitkom 2025²⁵
AI-related security incident	13% of organisations	USD 4.44M average breach	IBM 2025⁷
Shadow AI involvement	20% of breaches	+USD 670K extra cost	IBM 2025⁷⁹
Missing AI access controls	97% of breached AI orgs	Higher PII exposure	IBM 2025⁷
No AI governance policy	63% of organisations	Slower detection, higher cost	IBM/Ponemon 2025⁷
Prompt injection in cyber response	Projected 50% by 2028	Half of all IR effort	Gartner 2026¹⁰

The Lethal Trifecta - The Pattern Behind Every Major Incident

Security researcher Simon Willison coined the term prompt injection in 2022 and the framing that turned it operational - the lethal trifecta - in June 2025⁴. The trifecta is the cleanest mental model anyone has produced for AI agent risk. If you remember nothing else from this article, remember this.

The three properties

Access to private data - The agent can read information that should not be shared publicly: your inbox, calendar, customer database, source code, ERP, SharePoint, file system.
Exposure to untrusted content - The agent processes input from sources you do not control: incoming email, websites, PDFs, support tickets, calendar invites, vendor documents, public repositories.
Ability to communicate externally - The agent can send data out: outbound email, webhooks, HTTP requests, public file writes, image fetches, link generation.

When all three are present in the same agent context, an attacker who plants instructions in any untrusted source can use them to read private data and exfiltrate it. The agent has no built-in mechanism to refuse this - it cannot reliably tell the difference between “your boss told me to send this report” and “an attacker hid an instruction in a PDF telling me to send this report.” That is the entire vulnerability class.

“If your agent combines these three features, an attacker can easily trick it into accessing your private data and sending it to that attacker.”

- Simon Willison, security researcher and co-creator of Django⁴

How the trifecta maps to real 2025 incidents

EchoLeak (Microsoft 365 Copilot) - Private data: Outlook, OneDrive, SharePoint. Untrusted content: incoming email. External comms: image URL fetch. All three present. Attack succeeded¹².
GitHub Copilot RCE (CVE-2025-53773) - Private data: developer’s local environment. Untrusted content: public repo code comments. External comms: shell execution. All three present. Attack succeeded¹⁶.
Cursor IDE (CVE-2025-54135) - Private data: developer machine. Untrusted content: README files. External comms: writing arbitrary configuration that triggered MCP servers. All three present. Attack succeeded¹⁶.
Gemini Enterprise Jira - Private data: agent memory across sessions. Untrusted content: Jira ticket descriptions. External comms: agent-controlled actions. All three present. Attack succeeded¹⁶.
Devin AI - Private data: developer credentials and code. Untrusted content: project description. External comms: open ports, install C2 malware. All three present. Attack succeeded⁴.

The Mitigation

Break the trifecta. Remove any one property and exfiltration becomes much harder. The most practical removals are limiting external communication (no outbound HTTP from agent context) or sandboxing untrusted content (separate agent instance with no private data access). Most production agents can ship the same business value with one property removed.

High Risk (Full Trifecta)

✗ Email summariser with web search and outbound email
✗ Customer service agent with CRM access and outbound chat
✗ Coding assistant with repo access, web fetch, and shell
✗ Browser-use agent with file access and arbitrary navigation
✗ MCP-enabled assistant with broad tool access and untrusted MCP servers

Lower Risk (Trifecta Broken)

✓ Read-only RAG over internal docs, no external comms
✓ Document classifier with no internet and no write access
✓ Public web scraper with no private data in context
✓ Translation agent with no persistent state and no tools
✓ Approval-gated workflow where every external action needs human sign-off

The OWASP LLM Top 10 - What Each Risk Actually Means

OWASP publishes the de-facto industry list of the most critical LLM security risks. The 2025 edition reordered the risks based on real-world incidents in 2024-2025, with Sensitive Information Disclosure jumping from sixth to second place and Supply Chain rising to third¹. Here is each risk in plain language with what it looks like in a Mittelstand context.

LLM01:2025 Prompt Injection

An attacker manipulates the LLM’s behaviour by inserting instructions into the input. Direct prompt injection is when the user types the malicious instruction; indirect prompt injection is when the instruction comes through external content the LLM reads². Indirect is the dangerous one - the user is innocent, but the email, PDF, or webpage they ask the agent to summarise contains hidden commands.

LLM02:2025 Sensitive Information Disclosure

The model leaks private data through its outputs - PII, credentials, system prompts, intellectual property. Promoted from sixth to second in 2025 due to repeated real-world data leaks¹. In the Mittelstand this typically means an agent with access to HR data summarising a request and accidentally including someone else’s salary, or a code assistant pasting a customer’s API key into an unrelated answer.

LLM03:2025 Supply Chain

You are trusting models, fine-tuning datasets, plugins, and tools you did not build. Each is a potential injection point. A model downloaded from a public registry can be poisoned. A third-party MCP server can mask malicious tools. A fine-tune dataset can carry backdoor triggers. Mittelstand companies that wire together open-source pieces without provenance checks inherit every weakness in the chain¹.

LLM04:2025 Data and Model Poisoning

Attackers deliberately corrupt training data, fine-tuning data, or RAG knowledge bases. The model behaves normally on most inputs but produces attacker-controlled outputs on specific triggers. Particularly relevant for companies that ingest customer-submitted content (support tickets, product reviews, uploaded documents) into a knowledge base used by the agent¹.

LLM05:2025 Improper Output Handling

The application treats LLM output as if it were trusted code or trusted data. The agent generates a SQL query that gets executed without parameterisation. The agent emits JavaScript that gets rendered without escaping. The agent produces a shell command that runs without review. Classic injection bug, just with the LLM as the source¹.

LLM06:2025 Excessive Agency

The agent has more permissions, tools, or autonomy than the use case requires. An email summariser does not need send capability. A document analyser does not need shell access. The blast radius of a successful prompt injection scales directly with the agent’s permissions. Most Mittelstand agent deployments fail this check on day one¹.

LLM07:2025 System Prompt Leakage

The agent reveals its own system prompt - the configuration, persona, internal instructions, sometimes embedded credentials or tool definitions - through clever user prompts. New entry in 2025. System prompts often contain data sources, naming conventions, and security rules that help an attacker plan the next step¹.

LLM08:2025 Vector and Embedding Weaknesses

The vulnerability class for RAG architectures. Poisoned documents in the vector database, embedding inversion attacks that recover original content, retrieval that ignores access controls, and vector spaces that mix tenants. New entry in 2025 because RAG moved from prototype to production across the Mittelstand¹.

LLM09:2025 Misinformation

The model produces plausible-sounding but factually wrong outputs that humans treat as correct. In a security context, this is a misclassification of a malicious input as benign, a hallucinated control that does not exist, or a confidently wrong threat assessment. Misinformation overlaps with operational reliability but has direct security consequences when humans rely on agent output for decisions¹.

LLM10:2025 Unbounded Consumption

Resource exhaustion attacks - prompt patterns that force expensive computation, infinite loops in agent reasoning, denial-of-service through concurrent expensive requests. Cost-based attacks are now a real threat for any agent on a metered API. A sustained attack against an unbounded agent can rack up six-figure cloud bills in days¹.

OWASP Risk	2025 Rank Change	Most Exposed Mittelstand System	Primary Mitigation
Prompt Injection	#1 (unchanged)	Email + document agents	Input filtering, context isolation
Sensitive Info Disclosure	#6 to #2	HR + finance copilots	Output filtering, access scoping
Supply Chain	Up to #3	RAG + MCP ecosystems	Provenance, signing, allow-lists
Data and Model Poisoning	#4	Customer-content RAG	Source vetting, drift monitoring
Improper Output Handling	#5	Code + SQL agents	Treat output as untrusted, sandbox
Excessive Agency	#6	Browser-use + MCP agents	Least privilege, capability scoping
System Prompt Leakage	New	Customer-facing chatbots	No secrets in prompts, separation
Vector + Embedding Weaknesses	New	Production RAG systems	Tenant isolation, ACL-aware retrieval
Misinformation	#9	Decision-support agents	Grounding, citations, confidence scoring
Unbounded Consumption	#10	Public-facing agents on metered APIs	Quotas, budgets, rate limits

“Any AI working in an adversarial environment with untrusted training data or input is vulnerable to prompt injection. It’s an existential problem that, near as I can tell, most people developing these technologies are just pretending isn’t there.”

- Bruce Schneier, security technologist and Lecturer at Harvard Kennedy School⁵

Worried about your existing AI agent?

Book a 30-minute call. We will review your current architecture against the OWASP Top 10.

Book a Demo →

Layered defense-in-depth security architecture for AI agents

Real Incidents in 2025 - What Happened, What Failed, What to Learn

Theory is easy. The teaching examples are the public 2025 incidents because they show exactly which controls were missing. Five short case studies, then the pattern that connects them.

Case 1: EchoLeak (Microsoft 365 Copilot, June 2025)

What happened - Researchers at Aim Security sent a benign-looking email containing hidden prompt injection text. M365 Copilot indexed it. When the user later asked Copilot any question, the hidden instructions executed - reading inbox, OneDrive, SharePoint, and Teams content, then exfiltrating through a markdown image fetch to an attacker-controlled URL¹².
Why it worked - LLM Scope Violation: external untrusted content was processed in the same context as private data and could trigger external network calls.
What was missing - Context isolation between trusted instructions and untrusted email body. Output filtering on outbound URLs. Microsoft’s XPIA classifier was bypassed through reference-style markdown¹³.
Mitigation - Microsoft patched server-side without a client-side update. The fix limits Copilot’s ability to follow hidden adversarial prompts in files¹⁵.

Case 2: Cursor IDE Remote Code Execution (CVE-2025-54135)

What happened - An attacker hid prompt injection in a public repository’s README. When a developer opened the repo with Cursor active, the agent was instructed to write a malicious .cursor/mcp.json file. That configuration loaded a hostile MCP server with arbitrary command execution¹⁶.
Why it worked - The agent treated repository content as if it were user instructions. Configuration files were writable without user confirmation. MCP servers loaded automatically.
What was missing - Capability gating, write protection on configuration files, MCP server allow-list.

Case 3: GitHub Copilot Unattended Execution (CVE-2025-53773)

What happened - An attacker embedded prompt injection in code comments in a public repository. When a developer opened it with Copilot active, the injection modified IDE settings to enable “unattended command execution”. Subsequent commands ran without prompts¹⁶.
Why it worked - Copilot had write access to its own privilege configuration. The agent could escalate its own permissions through normal output.
What was missing - Privilege boundaries between agent context and agent configuration. The agent should not be able to elevate itself.

Case 4: Devin AI Defenseless (Research, 2025)

What happened - A security researcher spent USD 500 on Devin’s autonomous coding agent and found it completely defenseless. Through carefully crafted prompts, the agent exposed ports to the internet, leaked access tokens, and installed command-and-control malware⁴.
Why it worked - Maximum agency, no capability restrictions, no approval gates, no anomaly detection.
What was missing - Almost everything. The teaching value is showing what an unhardened agent looks like in production.

Case 5: Gemini Enterprise Jira Memory Wipe (USD 15,000 bounty)

What happened - A researcher submitted a Jira ticket whose description contained instructions targeting Gemini Enterprise. When the agent processed the ticket, it silently wiped the user’s persistent memory across sessions¹⁶.
Why it worked - Untrusted user content (ticket description) had the same trust level as the agent’s own state operations.
What was missing - Privilege separation between data input and agent state mutations.

The pattern that connects them

Untrusted content was treated as trusted - In every case, the agent had no architectural distinction between “the user’s request” and “text the agent happened to read.”
Capabilities were too broad - In every case, the agent had more tools, more permissions, or more network access than the use case required.
External communication was unrestricted - In every case, the agent could initiate outbound traffic that an attacker could control.
Detection lagged the exploit - In every case, the attack succeeded silently. Detection came from researchers, not from production monitoring.
The fix was architectural, not patch-based - Every vendor had to redesign the trust boundary, not just block a string pattern.

The Lesson for the Mittelstand

If a Microsoft, GitHub, Google, or Cognition-built agent can be compromised through these patterns, your in-house pilot will be too unless you design the trust boundaries deliberately. The good news: the controls are well-known and most can be implemented without changing your model or vendor.

The Defense-in-Depth Architecture - Seven Layers That Hold Up

No single control stops prompt injection. The BSI’s LLM evasion-attack guidance, NIST’s AI RMF GenAI profile, and OWASP’s mitigation guidance all converge on the same answer: layered defenses that limit what any successful injection can actually do¹⁹²³. Here is the seven-layer model that production teams actually ship.

Layer 1: Identity and Least-Privilege Access

Dedicated service accounts - Each agent gets its own identity, not shared with humans. Permissions are scoped to the minimum the use case requires.
Read-only by default - Write access is granted per use case after a security review, not by default.
Object-level scoping - The agent sees the records it needs and nothing else. A sales agent does not get HR data even if both live in the same database.
Short-lived credentials - Tokens rotate. Long-lived API keys are the single highest-leverage compromise target.
Audit on every call - Every action the agent takes lands in your SIEM with the user, the prompt, the tool, and the result.

Layer 2: Input Filtering and Context Isolation

Prompt-injection classifiers - Specialised models that score input for injection-like patterns. Not perfect, but a useful first filter.
Trust labels - User input gets one trust level. Retrieved documents get a lower one. External web content gets the lowest.
Quarantine for low-trust content - Suspicious inputs go through a stricter pipeline (smaller model, no tools, no private data) before re-entering the main flow.
Structural delimiters - Untrusted content is wrapped in unambiguous boundaries the model is trained to respect, even if not perfectly.
Length and content limits - Block obviously hostile inputs (very long instruction sequences, base64-encoded payloads, hidden Unicode).

Layer 3: Capability Restriction and Sandboxing

Tool allow-lists - The agent can only call pre-approved tools. New tools require security review.
No shell, no internet by default - Both are added per use case with an explicit approval, not by default.
Sandboxed execution - Code the agent generates runs in an isolated container with no production network access.
MCP server allow-lists - If you use MCP, only signed and approved servers connect. Allow-list, not deny-list.
Per-action quotas - The agent cannot send 10,000 emails or call an expensive API in a loop.

Layer 4: Output Filtering and DLP

Outbound URL filtering - Image fetches, link generation, and webhook calls are filtered against an allow-list. EchoLeak exfiltrated through an image URL - this control would have stopped it.
PII and secret scanning - Outputs are scanned for credentials, API keys, and PII patterns before they leave the agent boundary.
Hallucination grounding - Factual claims must cite a source the agent retrieved. Unsourced confident claims are flagged.
Schema validation - Tool inputs the agent generates are validated against strict schemas before execution.
Markdown rendering controls - In customer-facing outputs, block reference-style markdown that can hide payloads.

Layer 5: Monitoring, Alerting, and Anomaly Detection

Full transcript logging - Every prompt, every response, every tool call goes to a queryable store.
Behavioural baselines - Normal token usage, normal tool call patterns, normal output sizes. Deviations alert.
Cost anomaly detection - Unbounded consumption attacks show up as cost spikes. Alert at 2x baseline.
Outbound traffic monitoring - Agent network egress is on the same DLP and SIEM rails as the rest of your traffic.
Red-team replay - Known prompt-injection payloads run against production weekly. Failures trigger investigation.

Layer 6: Human-in-the-Loop and Approval Gates

Risk-tiered approvals - Read-only summarisation: no approval. Send email to customer: approval. Move money: two-person approval.
Reversibility check - Reversible actions can run autonomously; irreversible ones (deletes, payments, public posts) require human sign-off.
Confidence thresholds - The agent escalates to humans when its own confidence is below a defined level.
Sample audits - Even fully autonomous actions are sample-audited weekly.
Kill switch - One command pauses the agent globally. Tested quarterly.

Layer 7: Incident Response and Recovery

Runbook - A documented playbook for “agent compromised” - who decides, who shuts it down, who notifies, who investigates.
Forensics-ready logs - 30 to 90 days of transcripts and tool calls retained for incident investigation.
Tabletop exercises - Quarterly. The CISO walks through a real EchoLeak-style scenario with the AI lead and Datenschutzbeauftragter.
Vendor incident clauses - Your contracts require disclosure of relevant CVEs and patch timelines.
Recovery testing - You have rehearsed how to revoke agent credentials, rotate keys, and restore from clean state.

Layer	Primary Goal	Stops	Typical Tools
1. Identity	Limit reach	Excessive Agency, Sensitive Info Disclosure	IAM, service accounts, scoped tokens
2. Input filtering	Limit injection success	Prompt Injection (direct + indirect)	Classifiers, structural delimiters, trust labels
3. Capability restriction	Limit blast radius	Excessive Agency, Improper Output Handling	Tool allow-lists, sandboxes, MCP signing
4. Output filtering	Limit exfiltration	Sensitive Info Disclosure, exfiltration	DLP, URL allow-list, schema validation
5. Monitoring	Detect attacks	Unbounded Consumption, novel attacks	SIEM, behavioural analytics, cost alerts
6. Human review	Catch high-stakes errors	Misinformation, irreversible mistakes	Approval workflows, kill switch
7. Incident response	Contain and recover	All categories post-compromise	Runbooks, tabletop, forensics

“It is impossible to block prompt injection 100 percent of the time. We need to change our mindset.”

- Dennis Xu, Senior Director Analyst at Gartner¹¹

EU AI Act, BSI Guidance, and the NIST AI RMF

AI agent security is not just a good idea - it is increasingly a regulatory obligation. Three frameworks matter for the German Mittelstand: the EU AI Act, the BSI’s LLM guidance, and the NIST AI Risk Management Framework. They overlap more than they conflict.

EU AI Act Article 15 - Cybersecurity for high-risk systems

Article 15 explicitly requires high-risk AI systems to be resilient against attacks. The text names specific threats²⁰:

Data poisoning - Manipulation of training data to corrupt model behaviour.
Model poisoning - Tampering with pre-trained components.
Adversarial examples (model evasion) - Inputs crafted to make the model misbehave - prompt injection lives here.
Confidentiality attacks - Attempts to extract training data or system prompts.
Model flaws - Exploitable weaknesses in the model itself.

For high-risk systems, controls must be appropriate to the relevant risks and circumstances. Resilience can be achieved through technical redundancy, backups, and fail-safe plans²⁰. Most internal Mittelstand agents fall into limited or minimal risk - so Article 15 is often not directly binding - but the same principles flow through GDPR Article 32 (security of processing) and the NIS2 directive for critical infrastructure operators.

EU AI Act penalties

Prohibited AI violations - Up to EUR 35 million or 7 percent of global revenue²¹
High-risk non-compliance - Up to EUR 15 million or 3 percent of global revenue²¹
Misleading information - Up to EUR 7.5 million or 1 percent of global revenue²¹
SME provision - For SMEs the cap is whichever is lower (not higher), giving smaller companies proportionate exposure²¹
Full applicability - 2 August 2026²²

BSI guidance on evasion attacks

The Bundesamt für Sicherheit in der Informationstechnik (BSI) published “Evasion Attacks on LLMs - Countermeasures in Practice” targeting developers and IT managers in companies and public authorities using pre-trained models¹⁹. The core recommendations:

Layered defense - Combine technical controls (filters, sandboxing, RAG with trusted retrieval) and organisational practices (adversarial testing, governance, training).
Assume single-control failure - No single safeguard is sufficient. Multiple layers compensate when one is bypassed.
Continuous monitoring - Special risks of evasion attacks require active observation, not periodic audits.
Adversarial testing - Red-team your own systems before attackers do.
Defense in depth across the lifecycle - Security applies during development, deployment, and operation - not just at one phase.

NIST AI RMF Generative AI Profile (NIST-AI-600-1)

The NIST AI RMF GenAI Profile released in July 2024 catalogues over 400 mitigation actions across the AI lifecycle²³. It is voluntary in the EU but widely adopted by US-headquartered vendors and increasingly referenced in German procurement. The profile covers risks beyond OWASP - confabulation, harmful bias, environmental impact - while overlapping on the security categories.

Framework	Geography	Binding?	Focus for AI Agent Security
EU AI Act	EU	Yes (high-risk)	Article 15 cybersecurity, Article 4 literacy
BSI LLM Guidance	Germany	Recommendation	Evasion attack countermeasures
NIST AI RMF GenAI Profile	US	Voluntary	400+ mitigations across lifecycle
OWASP LLM Top 10	Global	Industry standard	Top 10 LLM application risks
ISO/IEC 42001	Global	Certifiable	AI management system standard
GDPR Article 32	EU	Yes	Security of processing personal data

AI Agent Compliance Checklist

Inventory of all AI agents and their access scopes documented
Each agent classified by EU AI Act risk category (most will be limited or minimal)
Article 4 AI literacy training delivered to all employees who interact with AI
GDPR Article 32 security controls mapped to each agent
Datenschutzbeauftragter has reviewed each agent’s data flows
Betriebsrat informed for any agent processing employee data
BSI evasion-attack countermeasures applied to high-risk agents
Vendor contracts include AI security clauses (CVE disclosure, patch SLA)
Audit log retention meets sector requirements (typically 90+ days)
Incident response runbook tested quarterly

The 90-Day Hardening Playbook

Most Mittelstand companies already have one or two AI agents in production - usually a Copilot deployment, a vendor-built customer service agent, or an internal pilot. A 90-day hardening engagement is the realistic path from “we have it but we are nervous” to “we know what could go wrong and we have controls in place.” Here is the week-by-week breakdown.

Phase 1: Inventory and Threat Model (Weeks 1-4)

Week 1: Agent inventory - List every AI agent, copilot, and AI-enabled tool in use - sanctioned and unsanctioned. Include browser extensions, IDE assistants, vendor-built features. The shadow AI footprint is usually 3-5x larger than IT believes.
Week 2: Data flow mapping - For each agent, document what data it reads, what tools it can call, and what external traffic it can generate. This is where you discover which agents have the lethal trifecta.
Week 3: Threat modelling - For each high-priority agent, walk through OWASP LLM Top 10. Score each risk as high, medium, or low for your context. Identify the top 5 gaps.
Week 4: Compliance gap analysis - Map each agent to EU AI Act risk category, GDPR obligations, sector-specific rules. Identify gaps in literacy training, documentation, and audit logging.

Phase 2: Implement Controls (Weeks 5-8)

Week 5: Identity and access - Move every agent to a dedicated service account. Apply least privilege. Rotate any long-lived credentials. Enable per-action audit logging to your SIEM.
Week 6: Input and output filtering - Deploy a prompt-injection classifier on input. Add DLP scanning on output. Implement URL allow-lists for outbound communication. Block reference-style markdown where applicable.
Week 7: Capability restriction - Remove tools the agent does not need. Sandbox code execution. Allow-list MCP servers. Implement per-action quotas. Add the kill switch.
Week 8: Approval gates - Tier actions by reversibility and risk. Wire human approval for irreversible or high-impact actions. Document the approval matrix so it is consistent across teams.

Phase 3: Red Team and Operationalise (Weeks 9-12)

Week 9: Red team exercise - Run known prompt-injection payloads against every agent. Try EchoLeak-style indirect injection. Try Excessive Agency exploitation. Try cost-based DoS. Document what works.
Week 10: Tabletop exercise - CISO, AI lead, Datenschutzbeauftragter, and Betriebsrat (if relevant) walk through an “agent compromised” scenario. Identify gaps in the runbook.
Week 11: Monitoring rollout - Deploy behavioural baselines and anomaly alerts. Wire cost monitoring. Add weekly red-team replay to CI. Test the kill switch in production conditions.
Week 12: Governance and review - Establish the monthly governance routine. Train the IT team on the runbook. Brief the board with measurable outcomes (gaps closed, controls operational, residual risks accepted).

90-Day Readiness Checklist

Every AI agent has a documented owner and risk classification
Every agent runs with a least-privilege service account
Input filtering and output DLP are deployed
Capability allow-lists are enforced (no shell or internet by default)
External URL allow-list blocks unexpected exfiltration paths
Audit logs flow to your SIEM with 90+ day retention
Approval gates are wired for irreversible actions
Kill switch tested at least once in production conditions
Weekly red-team replay runs in CI
Quarterly tabletop exercise scheduled

In-House

✓ Builds internal capability - your team learns the patterns
✓ Full context - your team knows your systems
✗ Talent gap - LLM security specialists are scarce in the Mittelstand
✗ Slower - 6-9 months is typical for first hardening cycle
✗ Blind spots - your team has not seen 100 production agents

External Partner

✓ Faster - 90 days is realistic for a focused engagement
✓ Pattern library - partner has seen what works and what fails
✓ Independent perspective - external red-team finds blind spots
✗ Knowledge transfer required - capability has to land internally before they leave
✗ Vendor selection matters - generalist consultants often miss the LLM-specific risks

12 Questions to Ask an AI Agent Vendor Before You Sign

Most procurement teams ask the wrong questions. They focus on features and pricing. The questions that actually surface risk are concrete and architectural. Use this list verbatim in your next vendor call.

Where does prompt injection sit on your roadmap? - The honest answer is “continuously and forever.” Vendors who say “we have solved it” are not credible.
Which OWASP LLM Top 10 risks have you tested against and how? - Look for specific test methodology, not a checkbox claim.
What controls do you have on outbound network traffic from the agent? - The EchoLeak class of attack lives here.
How do you isolate untrusted content from trusted instructions in the model context? - Specific architectural answer, not a marketing line.
What is your CVE disclosure SLA? - You should expect notification within days, not after a press release.
How do you handle MCP servers, plugins, or tools added by customers? - Allow-list, signing, sandboxing - get specifics.
Where is data processed and stored, and which sub-processors are involved? - GDPR Article 28 question. Get the full sub-processor list.
How long are agent transcripts retained, and who can access them? - Critical for both incident response and GDPR compliance.
Show me the audit log format - Real systems have rich, queryable logs. Toy systems do not.
What is your incident response history for AI-specific issues? - Anonymised post-mortems are a strong positive signal.
How do you handle EU AI Act conformity assessment if our use case becomes high-risk? - The vendor should be ready or actively preparing.
Can you provide references from a similarly sized German Mittelstand customer? - Two references, both reachable, both honest about what went wrong.

Red Flag Phrases

Walk away from vendors who say “our model is fully secure,” “prompt injection is solved,” “you do not need to worry about that,” or “our enterprise tier handles all of this.” Real security vendors talk about defense in depth, residual risk, and continuous improvement - not about silver bullets.

How Superkind Handles AI Agent Security

Superkind builds custom AI agents for SMEs and enterprises. Security is not a separate workstream we add at the end - it is the architecture from week one. Every agent we ship is designed against the lethal trifecta and the OWASP LLM Top 10 by default.

Trifecta-aware design - Every agent we build starts with a deliberate decision about which of the three properties (private data, untrusted content, external comms) it actually needs. We default to breaking the trifecta wherever business value allows.
Least-privilege by default - Each agent gets a dedicated service account scoped to the minimum data and tools the use case requires. Read-only is the starting point; write access is justified per use case.
Input and output filtering built-in - Every agent ships with prompt-injection classification on input and DLP scanning on output. Outbound URLs are allow-listed. Reference-style markdown is blocked in customer-facing channels.
Capability restriction enforced - No shell, no internet, no MCP server unless explicitly required. Every tool is allow-listed. Every action is logged. Quotas prevent unbounded consumption.
Approval gates wired in - Irreversible actions go through human review. Risk tiers and approval matrices are documented. Kill switches are tested before launch, not after.
Audit-ready from day one - Every prompt, response, and tool call lands in your SIEM. 90-day default retention. EU AI Act and GDPR audit fields populated automatically.
Sovereign data handling - Customer data stays within your infrastructure or EU-region cloud. No training on your data. No third-party sub-processors without explicit approval.
Continuous red-team - Known prompt-injection payloads run against your agents weekly. New CVEs trigger an automatic regression test. We disclose what we find within days.

Approach	Generic AI Vendor	Superkind
Default agent posture	Maximum capability for demos	Minimum capability, expand per use case
Trifecta handling	Often all three properties present	Trifecta broken by default where possible
Outbound traffic	Open by default	Allow-listed by default
Audit logging	Optional add-on	Built-in to your SIEM
Red-team cadence	Annual at best	Weekly replay + new CVE regression
EU AI Act readiness	Customer responsibility	Documentation prepared in delivery
Incident SLA	Standard support tier	CVE disclosure within 5 business days

Pros

✓ Security-first architecture - trifecta-aware from week one
✓ OWASP-aligned controls - every Top 10 risk has a documented mitigation
✓ EU AI Act and BSI ready - documentation produced as part of delivery
✓ Weekly red-team replay - regressions caught before customers see them
✓ SIEM-native logging - integrates with your existing security stack

Cons

✗ Slower initial demo - approval gates make sales demos feel less magical
✗ Fewer features at launch - capability restriction means your first agent does less than vendors who ship without it
✗ Engagement model - we need access to your real systems, not just documentation
✗ Not for unhardened pilots - if you want a 2-week proof of concept with no security, we are the wrong partner

Decision Framework: Are You Ready to Deploy or Should You Harden First?

Different starting points need different responses. Use this table to decide whether you ship now, ship with hardening, or pause and invest in foundations first.

Signal	What It Means	Action
You have one or more agents in production with broad permissions	High blast radius, lethal trifecta likely present	Run the 90-day hardening playbook now
Employees are using shadow AI with company data	20% of breaches involve shadow AI; +USD 670K cost	Offer a sanctioned alternative within 30 days
You are evaluating a new agent vendor	Procurement is your last cheap security gate	Ask the 12 questions before signing
You ran a pilot that worked but never went to production	Common pause point - usually security or governance	Start with the threat model, not another pilot
Your agent reads customer-submitted content	High prompt injection exposure	Prioritise input filtering and capability restriction
You handle regulated data (health, finance, public sector)	EU AI Act high-risk classification likely	Treat Article 15 controls as binding now, not in 2026
You have no audit logs from your agent	You cannot detect or investigate an incident	Add SIEM logging before any other change

Ship and Harden

✓ Maintains business momentum - the agent delivers value while you harden
✓ Real production data - threat model is grounded, not theoretical
✓ Team learns by doing - capability builds during hardening
✗ Residual risk exists - until controls are operational
✗ Requires honest scope - no broad capabilities until hardening lands

Delay and Build

✓ Lower initial risk - controls in place before users
✓ Cleaner audit trail - controls visible from day one
✗ Pilot purgatory risk - shipping is what makes projects real
✗ Slower learning - threat models built without real traffic miss things
✗ Competitive lag - others are shipping with imperfect security

Frequently Asked Questions

Prompt injection is when an attacker hides instructions inside something the AI agent reads - an email, a PDF, a calendar invite, a support ticket - and the agent treats those hidden instructions as if you had typed them yourself. The agent has no reliable way to tell the difference between your real instructions and the attacker's smuggled ones. OWASP ranks it as the number-one risk for LLM applications in 2025.

Real. In 2025, EchoLeak (CVE-2025-32711) exfiltrated data from Microsoft 365 Copilot through a single innocuous email. Cursor IDE got remote code execution from a poisoned README. GitHub Copilot was tricked into enabling unattended command execution from a public repo. Gemini Enterprise wiped victim memory through a Jira ticket and earned a USD 15,000 bug bounty. These are production exploits, not lab demos.

No. Gartner analyst Dennis Xu put it directly: "It is impossible to block prompt injection 100 percent of the time." The strongest defenses combine input filtering, output filtering, capability restriction, and human review. The mindset has to shift from "prevent every attack" to "limit the blast radius when an attack succeeds."

A term from security researcher Simon Willison. An AI agent has the lethal trifecta when it combines three properties at once: access to private data, exposure to untrusted content, and the ability to communicate externally. If any single property is missing, exfiltration becomes much harder. The fastest mitigation is breaking the trifecta - usually by removing external communication or sandboxing untrusted content.

Yes. Article 15 of the EU AI Act explicitly requires high-risk AI systems to be resilient against attacks including data poisoning, model poisoning, adversarial examples, and confidentiality breaches. Article 99 sets penalties up to EUR 15 million or 3 percent of global turnover for high-risk non-compliance. From August 2026, full applicability begins. Most internal Mittelstand agents fall into limited or minimal risk, but security obligations still apply through GDPR, the NIS2 directive, and sector regulations.

Shadow AI is when employees use AI tools outside IT approval - private ChatGPT accounts on company devices, free browser plugins, unsanctioned automation scripts. IBM's 2025 Cost of a Data Breach Report found shadow AI was a factor in 20 percent of breaches and added USD 670,000 to the average cost. The fix is not banning AI - it is offering a sanctioned alternative people actually want to use.

Through least-privilege access. The agent gets a dedicated service account with the minimum permissions it needs - read-only where possible, scoped to specific objects, audited continuously. Combine this with input and output filtering, retrieval grounding, capability restriction (no shell, no internet by default), and human approval for actions above a defined risk threshold. Audit logs go to your existing SIEM.

On-premise removes one threat - data leaving your network through a third-party API. It does not remove prompt injection, supply chain risk, model poisoning, excessive agency, or any other OWASP LLM risk. The hosting decision is real but smaller than people think. The architecture and controls around the model matter more than where the GPUs sit.

The German Federal Office for Information Security (BSI) published "Evasion Attacks on LLMs - Countermeasures in Practice" targeting developers and IT managers. It recommends layered defense - input filters, sandboxing, retrieval-augmented generation with trusted sources, adversarial testing, and continuous monitoring. The core message: no single control is sufficient, and even well-configured systems can be subverted without defense in depth.

A focused hardening engagement runs about 90 days. Weeks 1-4: inventory, threat model, gap assessment. Weeks 5-8: implement input/output filtering, capability restrictions, and identity controls. Weeks 9-12: red-team exercises, incident response runbook, and monitoring rollout. Most Mittelstand companies discover three to five high-severity gaps in the first phase.

CISO accountability with shared execution. The CISO owns policy, risk acceptance, and audit. The AI lead or vendor owns implementation. The Datenschutzbeauftragter signs off on data handling. The Betriebsrat is informed where agents touch employee data. A monthly governance routine - 30 minutes - keeps all four aligned without creating a committee.

It rarely looks like a movie hack. The pattern is quiet exfiltration over hours or days - the agent processes a malicious document, follows hidden instructions, and leaks data through a path that looks normal: an outbound email, a webhook, a shared link. Detection comes from anomaly monitoring on agent traffic, not from antivirus. This is why output logging and DLP on agent channels matter as much as input filtering.

Different threat profile, not strictly safer. Open weights remove third-party data exposure but add supply chain risk - models you download can have backdoors or be poisoned during training. Commercial APIs put a vendor between you and the data but give you better incident response and patching. The honest answer is: pick based on data sensitivity, your team's capacity to operate either, and the security maturity of your chosen vendor or model registry.

Sources

Henri Jung

Co-founder of Superkind, where he helps SMEs and enterprises deploy custom AI agents that actually fit how their teams work. Henri is passionate about closing the gap between what AI can do and the value it creates in real companies. He believes the Mittelstand has everything it needs to lead in AI - it just needs the right approach.

Ready to harden your AI agents?

Book a 30-minute call with Henri. We will review your current setup against the OWASP LLM Top 10 and outline a 90-day hardening plan - no commitment, no sales pitch.