
Agent-Native Documentation: Why Mittelstand Wikis, ERPs, and SOPs Need to Be Written for AI Agents

Henri Jung, Co-founder at Superkind


The first time a Mittelstand IT lead realises something has shifted is usually a Tuesday afternoon. The new AI agent has been live for two weeks. A finance clerk asks it “what is our policy for cancelling a customer credit note in DATEV when the original invoice was already exported?” The agent answers confidently. The answer is wrong. Forensics shows the agent never reached the relevant policy page - it sits behind three Confluence clicks, in a screenshot of a SharePoint diagram, in a German-only PDF that the agent could only see as a blank box.

The agent did not lie. It read what it could read - a sidebar, a navigation tree, a marketing call-to-action, three half-loaded JavaScript widgets - and inferred. The actual policy was invisible. The agent was working with a pile of pixels and chrome where a structured document should have been. This is the moment most Mittelstand teams discover their entire documentation estate, lovingly maintained for human readers since 2008, was never written for the new primary reader. As Cloudflare put it bluntly when they shipped Markdown for Agents in April 2026: “Feeding raw HTML to an AI is like paying by the word to read packaging instead of the letter inside.”[1]

This guide covers what agent-native documentation actually is, why the human-first wiki has a hidden tax that compounds with every agent you ship, the three surfaces that have settled into the 2026 stack (llms.txt, markdown endpoints, MCP servers), how to convert a 5,000-page Confluence to agent-ready markdown without hiring a second documentation team, the seven pitfalls that derail Mittelstand projects, and a 90-day roadmap. Written for IT leaders, knowledge managers, platform engineers, and the Geschäftsführer who has to decide whether documentation is a project or a product.

TL;DR

Documentation has a new primary reader - AI agents now consume more pages of internal docs per day than humans do, and they read in a way humans never did. Wikis written for clicks, screenshots, and navigation trees waste 60 to 80 percent of their tokens on HTML chrome and degrade retrieval quality.

Three surfaces have stabilised - llms.txt as a public index, markdown endpoints (Accept: text/markdown or /index.md) as the lingua franca, and MCP servers as the live, authenticated interface. Cloudflare, Anthropic, Stripe, Vercel, and Zapier all ship at least two of the three by April 2026.

The token economics are real - Cloudflare measured an 80 percent token reduction moving from HTML to markdown on a single blog post (16,180 tokens to 3,150)[1]. For a Mittelstand wiki that is the difference between an agent answering correctly and running out of context.

llms.txt is necessary but not sufficient - It is an index, not the content. Agents fetch llms-full.txt more than twice as often as the lighter llms.txt index[5]. Internal docs need an MCP server in front of Confluence, SAP, and DATEV - public crawling is not enough.

The Mittelstand-specific work is the cleanup - Stable anchors, page consolidation, screenshot replacement, German-only docs surfaced. The conversion of a 5,000-page Confluence is 6 to 10 engineering weeks, most of it automatable; the structural cleanup is what takes judgement.

The 12-month budget lands at 30,000 to 90,000 euros for the docs-for-agents layer (markdown export pipeline, MCP server for internal sources, governance). The payback is the first agent rollout that actually works because it can read what your team wrote.

Why Agent-Native Documentation Matters Now

Six concrete reasons documentation has gone from a quiet back-office project to a 2026 IT-strategy decision.

  • Agents read more docs than humans do - On documentation sites that have instrumented their traffic, agent and assistant requests now make up a majority of unique reads. The Cloudflare Markdown for Agents launch references this shift directly: agents are the new primary reader, and they read structurally, not visually[1].
  • Markdown is the agent lingua franca - Markdown has become the universal exchange format for agents and AI systems, with explicit structure that minimises token waste and maximises retrieval quality[2]. Every modern coding agent (Claude Code, Cursor, OpenCode) sends Accept headers requesting markdown directly.
  • The token tax compounds - A single H2 heading costs roughly three tokens in markdown versus twelve to fifteen in HTML (see the comparison after this list). Across a 50-page Confluence space the difference runs to six figures of tokens per agent run, and the Mittelstand team paying per-token sees it in the bill within weeks[1].
  • llms.txt has tipped from experimental to expected - By April 2026 Anthropic, Stripe, Zapier, Cloudflare, Vercel, Cursor, Postman, and most developer-tool documentation publish an llms.txt index. Profound data shows agents fetch llms-full.txt over twice as often as the lighter index[5].
  • MCP went from spec to substrate - The Model Context Protocol surpassed 97 million monthly SDK downloads by Q1 2026, with enterprise readiness as the top 2026 roadmap priority[12]. MCP servers are the live, authenticated way to expose internal Confluence, SharePoint, SAP, and DATEV docs to agents.
  • The Mittelstand documentation estate is uniquely vulnerable - German Mittelstand wikis tend to be old, deep, multilingual, screenshot-heavy, and maintained part-time by the people who also ship product. Of every documentation pattern, this is the one most expensive to keep human-only and most rewarding to make agent-native.
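To make the heading arithmetic concrete, here is the same H2 as an agent receives it in markdown versus in typical rendered wiki HTML. The wrapper markup and class names below are illustrative, not taken from any specific wiki, but the ratio is representative:

```
## Cancelling a credit note

<h2 id="cancelling-a-credit-note" class="wiki-heading wiki-heading--h2">
  <span class="heading-text">Cancelling a credit note</span>
  <a class="anchor-link" href="#cancelling-a-credit-note">#</a>
</h2>
```

The markdown line costs a handful of tokens; the HTML version costs several times that before the agent has read a single word of actual content.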

The defining 2026 statistic

Cloudflare measured an 80 percent token reduction moving the same blog post from HTML to markdown - 16,180 tokens down to 3,150[1]. For a Mittelstand wiki, that is the difference between an agent that answers correctly and one that runs out of context window mid-sentence. No other documentation optimisation lever in 2026 comes close to this multiple.

The Hidden Tax of Human-First Documentation

Human-first documentation was never wrong. It was right for the audience it served. The problem is that the audience now includes a second reader with completely different ergonomics, and the patterns that delight humans actively obstruct agents. Six concrete patterns that turn into hidden tax.

  1. HTML chrome drowns the signal - Navigation trees, sidebars, breadcrumbs, footer mega-menus, cookie banners, sharing widgets, related-content carousels. None of it is the document; all of it costs tokens. The Cloudflare measurement of 16,180 tokens of HTML for 3,150 tokens of actual content is a five-to-one ratio of chrome to substance[1].
  2. JavaScript-rendered content is invisible - Single-page-app docs that render in the browser are blank to most retrieval pipelines. The agent fetches the URL, gets a shell, and sees an empty page. By the time the JS finishes, the agent has already moved on.
  3. Screenshots and GIFs are opaque - A Visio process diagram embedded as a PNG is invisible to a text-only agent and roughly 1,500 to 4,000 tokens of nothing-useful to a vision model. The structured information the diagram conveys is absent unless someone wrote it next to the image.
  4. Click-here navigation breaks retrieval - “See the troubleshooting section” with a hyperlink reads as instructions to a human and as a dead end to an agent that fetched only this page. The agent does not know to follow the link, and would not have the auth cookie if it did.
  5. GUI walkthroughs depend on what the agent cannot see - “Click the gear icon, then choose Settings, then the second tab” is unactionable without the screenshots. An agent cannot click; it needs the underlying API call, the configuration key, the file path. Every GUI walkthrough is a missing API doc waiting to be written.
  6. PDFs are quietly catastrophic - A scanned policy PDF from 2014 is a wall of pixels with whatever OCR your retrieval pipeline runs. Tables become wordlists. Multi-column layouts interleave nonsense. The Mittelstand SOP estate is roughly 30 to 70 percent PDFs, and each one is a small disaster.

The defining test

Run curl against the most important page in your wiki with the header Accept: text/markdown. If the response is HTML, an error, or a marketing redirect, your documentation is not agent-native and your agent is guessing whenever it asks about that page.
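For teams that want to run the test across a list of pages rather than one-off in a terminal, a minimal sketch in TypeScript - the URLs are placeholders, and the token figure is a crude characters-per-token heuristic rather than a real tokeniser:

```ts
// Agent-readiness spot check: does the page honour Accept: text/markdown?
const pages = [
  "https://wiki.example.com/policies/credit-notes", // placeholder URLs
  "https://wiki.example.com/sop/datev-export",
];

for (const url of pages) {
  const res = await fetch(url, { headers: { Accept: "text/markdown" } });
  const type = res.headers.get("content-type") ?? "unknown";
  const body = await res.text();
  const looksLikeHtml = /<(!DOCTYPE|html|div|nav)\b/i.test(body);
  const roughTokens = Math.round(body.length / 4); // ~4 chars per token, rough

  console.log(url);
  console.log(`  content-type: ${type}, ~${roughTokens} tokens`);
  console.log(`  verdict: ${looksLikeHtml ? "NOT agent-native (HTML came back)" : "markdown served"}`);
}
```

If the verdict is HTML on your top pages, the agent is paying the five-to-one chrome tax described above on every retrieval.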

Why the tax is hard to see from the inside

  • Humans compensate silently - When a human cannot find something in the wiki, they ask a colleague, search again, or live with not knowing. The compensation is invisible. An agent does not compensate; it answers wrong with confidence.
  • The wiki tools optimise for humans - Confluence, SharePoint, Notion, MediaWiki - all of them spent the last decade optimising the visual reader experience. The export-to-markdown story is an afterthought, the API is rate-limited, and the page chrome is hard to strip cleanly.
  • The cost lives in the agent bill, not the wiki bill - Token waste shows up on the OpenAI or Anthropic invoice, not on the Confluence invoice. The team that pays the cost is not the team that owns the wiki, so the optimisation never gets prioritised.
  • Hallucinations look like correct answers - When the agent cannot find the actual policy, it produces something policy-shaped from training data. The Mittelstand IT team that does not have a way to test answers against ground truth never realises the agent is making things up.

What Agent-Native Documentation Actually Is

Strip away the vendor language and agent-native documentation has a simple definition. It is the same content your humans read, served through a structured surface an agent can ingest directly, with five mandatory attributes.

  • Markdown as the canonical source - Markdown is the format the document is written in or exports to losslessly. HTML, PDFs, and GUI exports are derived; markdown is the source of truth. Markdown has become the lingua franca for agents and AI systems[2].
  • Stable, semantic anchors - Every section has a stable URL fragment that does not change when the page is reorganised. An agent that retrieved /policies/credit-notes#cancellation last week can retrieve the same anchor this week. Anchor instability is the silent killer of retrieval quality.
  • Structured headings, tables, and lists - H2 and H3 carry meaning, not just visual weight. Tables are markdown tables, not images of tables. Lists are real bullet or numbered lists. The structure is what the agent uses to chunk the document into retrievable pieces.
  • Machine-addressable index - An llms.txt at the root, a sitemap.xml that includes the markdown URLs, or an MCP server with a list_documents tool. The agent must be able to discover what exists without crawling the whole human site.
  • Auth-aware delivery for internal content - Internal docs go through an MCP server that respects user identity and document-level permissions. The agent only ever sees the chunks the calling user is allowed to see.

| Attribute | Human-first wiki | Agent-native docs |
| --- | --- | --- |
| Source format | HTML rendered from CMS | Markdown, canonical and exportable |
| Discovery | Navigation tree and search box | llms.txt + sitemap + MCP list_documents |
| Anchors | Auto-generated, change with edits | Stable, slugified, hand-curated |
| Visuals | Screenshots, diagrams, GIFs | Markdown tables and described steps next to images |
| Delivery | HTML over HTTPS for browsers | Accept: text/markdown or /index.md, MCP server for internal |
| Auth | Session cookie | OAuth 2.1 with on-behalf-of for internal docs |
| Token cost | 5x to 10x markdown baseline | Markdown baseline, 80 percent leaner |

The Three Surfaces Every Agent-Native Stack Needs

The 2026 consensus, reflected in the Cloudflare, Anthropic, Vercel, Stripe, and Mintlify implementations, splits agent-native documentation into three surfaces. Each one solves a different retrieval problem and has a different governance model. Treating them as one or skipping any of them is the most common Mittelstand mistake.

Surface 1: llms.txt - the public index

  • What it is - A plain markdown file served at /llms.txt that lists the most important documentation pages, each as a markdown link with a one-line description[5]. Optionally paired with /llms-full.txt that inlines the full content for one-shot ingestion.
  • What it does - Tells an agent what exists without forcing it to crawl the whole site. Acts as a curated entry point, not a complete inventory.
  • Where it fits - Public documentation, marketing pages an agent should know about, partner-facing docs. Anywhere a human-readable index would also make sense.
  • What it does not do - It does not improve traditional SEO, and the major AI companies have not committed to reading it automatically[5]. It is best treated as an optimisation for agents that already know how to look for it.
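For orientation, a minimal llms.txt follows the original spec: an H1 with the site name, an optional blockquote summary, then H2 sections of markdown links with one-line descriptions. The URLs and descriptions here are invented for illustration:

```
# Example GmbH Documentation

> Product and policy documentation for Example GmbH. Internal docs are served via MCP and are not listed here.

## Policies

- [Credit note cancellation](https://docs.example.com/policies/credit-notes.md): When and how credit notes may be cancelled after DATEV export
- [Invoice approval](https://docs.example.com/policies/invoice-approval.md): Approval thresholds and the escalation path

## Guides

- [DATEV export](https://docs.example.com/guides/datev-export.md): The monthly export procedure, step by step
```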

Surface 2: Markdown endpoints - the lingua franca

  • What it is - Every documentation page available as markdown, either through a content-negotiation header (Accept: text/markdown) or through a URL convention (.md or /index.md suffix)[2].
  • What it does - Returns the actual content, stripped of HTML chrome, as markdown the agent can paste directly into context. Cloudflare measured 80 percent token reduction on a single blog post; the same applies to Mittelstand wiki pages.
  • Where it fits - All public documentation. The Mintlify, Docusaurus, and Hugo ecosystems make this near-trivial; for a custom-built docs site the work is one middleware that strips chrome and serves markdown.
  • What replaces what - HTML scraping pipelines, custom crawl-and-clean code, and the brittle workarounds Mittelstand teams currently maintain to feed wikis into RAG.

Surface 3: MCP server - the authenticated interface

  • What it is - A Model Context Protocol server that fronts your internal documentation (Confluence, SharePoint, SAP help, DATEV manuals) and exposes search, retrieval, and listing as MCP tools[11]. A minimal sketch of such a server follows the table below.
  • What it does - Returns chunked markdown for a query, respects user identity, applies document-level permissions, and produces an audit trail.
  • Where it fits - Internal docs, anything behind authentication, anything that needs query parameters or filters, and any content that changes faster than crawling can keep up.
  • Standards stack - MCP authorization based on OAuth 2.1 with PKCE and RFC 8707 Resource Indicators[16]. Identity propagation so the audit trail terminates at a human, not at the agent.

| Surface | Format | Auth | Best for |
| --- | --- | --- | --- |
| llms.txt | Static markdown index | Public | Discovery, public docs |
| Markdown endpoint | Markdown via header or .md URL | Public or token | Public content delivery |
| MCP server | Tool calls returning markdown | OAuth 2.1 + on-behalf-of | Internal, dynamic, authenticated |
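As promised above, a minimal sketch of an internal docs MCP server using the official TypeScript SDK. The search_docs tool, the searchIndex stub, and the permission model are placeholders for your real retrieval layer and ACLs; in production the user identity would come from the validated OAuth token on the transport, not from a tool argument - it is passed explicitly here only to keep the sketch self-contained:

```ts
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

// Placeholder for your real retrieval layer (Confluence index, vector store, ...)
async function searchIndex(
  query: string,
): Promise<{ path: string; markdown: string; allowedUsers: string[] }[]> {
  return []; // stub
}

const server = new McpServer({ name: "internal-docs", version: "0.1.0" });

server.tool(
  "search_docs",
  { query: z.string(), userId: z.string() },
  async ({ query, userId }) => {
    const hits = await searchIndex(query);
    // Document-level permission filter: the agent only sees what the user may see
    const allowed = hits.filter((h) => h.allowedUsers.includes(userId));
    return {
      content: allowed.map((h) => ({
        type: "text" as const,
        text: `<!-- source: ${h.path} -->\n${h.markdown}`,
      })),
    };
  },
);

await server.connect(new StdioServerTransport());
```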

The 2026 Stack: llms.txt, Markdown for Agents, MCP

The three standards have settled into specific layers that compose well. None are novel; the combination is what is new and what every Mittelstand IT team needs to know by name.

llms.txt and llms-full.txt

  • What it is - Jeremy Howard’s 2024 proposal: a /llms.txt file at the site root listing the top documentation pages as markdown links, optionally paired with /llms-full.txt that inlines the entire documentation set[7].
  • Why it matters for the Mittelstand - It is the cheapest agent-readiness move available. Most CMS and static site generators (Mintlify, Docusaurus, Astro) generate llms.txt from sitemap and frontmatter automatically. Adoption by Anthropic, Stripe, Vercel, Zapier, and Cloudflare makes it the de facto baseline for documentation in 2026[5].
  • The honest caveat - llms.txt does not improve search rankings, OpenAI and Perplexity do not commit to reading it, and 8 of 9 sites measured no traffic change after publishing one[5]. Publish it because agents that already retrieve it perform better; do not publish it expecting magic.
  • The Mittelstand pattern - Generate llms.txt automatically from your sitemap. Generate llms-full.txt for any docs space under 200,000 tokens of total content. Both are static files; updating them is a CI step.
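A sketch of that CI step - the sitemap URL is a placeholder, the one-line descriptions would come from your frontmatter in a real pipeline, and the explicit allowlist doubles as the guard against accidentally publishing internal pages to the open web:

```ts
import { writeFileSync } from "node:fs";

const SITEMAP_URL = "https://docs.example.com/sitemap.xml"; // placeholder
// Explicit allowlist: only these path prefixes may appear in the public index
const ALLOWED_PREFIXES = ["/policies/", "/guides/"];

const xml = await (await fetch(SITEMAP_URL)).text();
const urls = [...xml.matchAll(/<loc>(.*?)<\/loc>/g)].map((m) => m[1]);

const entries = urls
  .filter((u) => ALLOWED_PREFIXES.some((p) => new URL(u).pathname.startsWith(p)))
  .map((u) => `- [${new URL(u).pathname}](${u.replace(/\/$/, "")}.md)`);

writeFileSync(
  "public/llms.txt",
  `# Example GmbH Documentation\n\n## Pages\n\n${entries.join("\n")}\n`,
);
console.log(`llms.txt written with ${entries.length} entries`);
```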

Markdown for Agents

  • What it is - Cloudflare’s April 2026 launch that made every documentation page available as markdown through Accept: text/markdown headers and /index.md URL suffixes, with response headers signalling token count[1]. Free while in beta for Pro, Business, and Enterprise plans.
  • Why it matters for agents - Coding agents (Claude Code, Cursor, OpenCode) already send Accept headers requesting markdown. When the documentation responds with markdown, the agent gets clean, structured content; when it does not, the agent falls back to HTML and pays the token tax.
  • What the headers add - The x-markdown-tokens header in the response tells the agent how much context the document will consume before it requests it, enabling smart context budgeting[2]. Content-Signal headers (ai-train, search, ai-input) tell the agent what use is permitted.
  • The Mittelstand pattern - For sites already on Cloudflare, enable the feature. For self-hosted docs, add a content-negotiation middleware that returns markdown when the Accept header requests it. Either is one engineering day.
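For the self-hosted path, a Workers-style sketch of that middleware. The markdown lookup is a placeholder for wherever your canonical markdown lives, and the token figure is a rough characters-per-token estimate standing in for a real tokeniser; x-markdown-tokens is the header name from the Cloudflare reference:

```ts
// Content negotiation: serve markdown to agents, HTML to everyone else
export default {
  async fetch(request: Request): Promise<Response> {
    const accept = request.headers.get("accept") ?? "";
    const url = new URL(request.url);
    const wantsMarkdown =
      accept.includes("text/markdown") || url.pathname.endsWith(".md");

    if (!wantsMarkdown) {
      return fetch(request); // fall through to the normal HTML site
    }

    // Placeholder: fetch the canonical markdown for this path from your source
    const slug = url.pathname.replace(/\.md$/, "").replace(/\/$/, "") || "/index";
    const md = await fetch(`https://raw.docs.example.com${slug}.md`);
    const body = await md.text();

    return new Response(body, {
      headers: {
        "content-type": "text/markdown; charset=utf-8",
        // Rough estimate so agents can budget context before reading
        "x-markdown-tokens": String(Math.round(body.length / 4)),
      },
    });
  },
};
```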

MCP authorization for internal docs

  • What it is - The Model Context Protocol authorization specification, based on OAuth 2.1 with PKCE and RFC 8707 Resource Indicators[11]. MCP servers act as OAuth Resource Servers; agents present scoped, short-lived tokens to call them.
  • Why it matters for documentation - Internal docs need authentication. MCP gives you a single, standard, auditable way to expose Confluence, SharePoint, SAP help, and DATEV manuals to agents without inventing per-source auth schemes. As of March 2026 MCP surpassed 97 million monthly SDK downloads[12].
  • The 2026 enterprise priority - MCP’s 2026 roadmap names enterprise readiness as a top focus area, alongside transport evolution, agent communication, and governance maturation[13]. Audit trails, SSO-integrated auth, and gateway behaviour are the gaps being closed.
  • The Mittelstand pattern - One internal MCP server per major source (Confluence, SAP, DATEV). All behind the same OAuth 2.1 issuer. Identity propagation through the agent so the audit trail terminates at a human.

“Markdown has quickly become the lingua franca for agents and AI systems as a whole. AI agents could bypass the complexities of intent analysis and document conversion, and instead receive structured markdown directly.”

- Cloudflare engineering team announcing Markdown for Agents (April 2026)[1]

Want documentation your agents can actually read?

We help Mittelstand IT teams convert wikis, SAP manuals, and DATEV docs into agent-native markdown plus MCP servers - so the next agent rollout works the first time.

Book a Demo →

A Reference Architecture for Mittelstand Agent-Native Docs

The components that have settled into place for production-grade agent documentation in 2026. None are novel; the value is in choosing a sensible combination for a Mittelstand context and wiring them so they survive the next reorganisation of the wiki.

  1. Single canonical markdown source - Whether stored in Git, in a headless CMS that exports markdown (Sanity, Strapi), or in Notion with markdown export. The source of truth is markdown; everything else is rendered from it.
  2. Static site generator for humans - Mintlify, Docusaurus, Astro, or Hugo. Renders the markdown to HTML for human readers. Critically, the same generator emits llms.txt and the per-page markdown endpoints automatically.
  3. llms.txt and llms-full.txt at the root - Generated automatically from sitemap and frontmatter. Updated on every deploy. Indexed by the agent platforms that look for it.
  4. Content-negotiation middleware - Returns markdown when Accept: text/markdown is requested, HTML otherwise. Adds x-markdown-tokens response header for context budgeting. Five lines of code on Cloudflare Workers, Vercel Edge, or any reverse proxy.
  5. MCP server for internal docs - One MCP server per major internal source: Confluence MCP, SharePoint MCP, SAP help MCP, DATEV MCP. All behind a single OAuth 2.1 issuer (Entra ID, Keycloak, Auth0).
  6. Identity-propagating gateway - The agent calls the MCP server with on-behalf-of-user tokens. The MCP server filters results by document-level ACL. The audit log records the user, the agent, the document, the chunks returned.
  7. Markdown export pipeline - Nightly job that pulls from Confluence, SharePoint, and DATEV through their REST APIs, runs html-to-markdown plus structural cleanup, and refreshes the markdown source. Handles screenshots by extracting captions and OCR.
  8. Quality monitoring - 20 to 50 canonical user questions run nightly through the agent, scored against curated correct answers. Regressions flagged by Slack or email. The metric you watch is “answer accuracy”, not “documents indexed”; a sketch of this nightly run follows the table below.

| Component | Recommended default | Sovereign-EU option | When to upgrade |
| --- | --- | --- | --- |
| Markdown source | Git repo, branch-protected | Self-hosted Gitea or GitLab | Always; non-negotiable |
| Static site generator | Mintlify or Docusaurus | Astro or Hugo, self-built llms.txt | From day one |
| Markdown endpoint | Cloudflare Markdown for Agents | Self-hosted middleware | Before second agent ships |
| MCP server | Outline, Confluence, Notion MCP servers | Self-hosted MCP wrappers | For any internal docs |
| Auth | Existing Entra ID or Okta | Keycloak self-hosted | Always; covered by your IdP |
| Quality monitoring | Custom eval suite, 20-50 prompts | Same, EU-hosted | Before first production rollout |
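Component 8 is the one teams most often skip, so here is the promised sketch of the nightly run. askAgent is a placeholder for your deployed agent's API, the sample question and keywords are invented, and the keyword-based scoring is deliberately crude - a production suite would use curated fact checks or an LLM judge reviewed with your subject-matter experts:

```ts
type EvalCase = { question: string; mustContain: string[] };

// Curated by humans, version-controlled next to the docs
const cases: EvalCase[] = [
  {
    question: "How do we cancel a credit note after the DATEV export?",
    mustContain: ["storno", "original invoice"],
  },
  // ... 20 to 50 of these
];

// Placeholder for your deployed agent's API
async function askAgent(question: string): Promise<string> {
  const res = await fetch("https://agent.example.com/ask", {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ question }),
  });
  return (await res.json()).answer;
}

let passed = 0;
for (const c of cases) {
  const answer = (await askAgent(c.question)).toLowerCase();
  const ok = c.mustContain.every((k) => answer.includes(k.toLowerCase()));
  if (ok) passed++;
  else console.warn(`FAIL: ${c.question}`);
}
console.log(`answer accuracy: ${passed}/${cases.length}`);
// Wire the failure path to Slack or email in CI
```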

The Mittelstand Documentation Stack: Confluence, SharePoint, SAP, DATEV

No two Mittelstand documentation estates look the same, but the source systems do. Five recurring patterns and how they translate into agent-native delivery.

Confluence (the deep wiki)

  • The pain - Five to fifty thousand pages, ten years of accumulated structure, deeply nested spaces, screenshots from old UIs, half the pages outdated, no clear owner per space.
  • The agent-native pattern - Confluence MCP server (open-source options exist on mcp.directory[17]) plus a nightly markdown export to Git for the “canonical” subset. The MCP server respects Confluence space and page permissions; the markdown export covers the 200 to 500 pages the business actually relies on.
  • The cleanup that pays off - Page consolidation: merge duplicates, archive zombies, fix anchors. A 5,000-page Confluence often collapses to 1,500 pages of actual canonical content; the rest is history that should be archived, not deleted.

SharePoint (the document graveyard)

  • The pain - Document libraries with thousands of Word, Excel, and PDF files. Multiple versions of the same SOP. Folder structures that reflect the org chart of 2018, not the org chart of 2026.
  • The agent-native pattern - SharePoint MCP server plus a markdown extraction pipeline that converts each Office document to markdown (pandoc, docx-to-md, custom OCR for scanned PDFs). The MCP server respects SharePoint permissions through Microsoft Graph.
  • The cleanup that pays off - Per-document ownership and last-reviewed dates. The agent should treat anything older than two years and unowned as suspect, and tell the user when it does.

SAP help and DATEV manuals (the vendor docs)

  • The pain - You cannot edit them. They are huge. They are written for SAP basis administrators or DATEV consultants, not for the staff using the system. The information you need is buried in transaction-code-specific notes.
  • The agent-native pattern - MCP wrapper around the vendor documentation. SAP MCP servers exist (K2View, custom)[20]; DATEV equivalents are emerging. Google launched a Developer Knowledge MCP server in early 2026 for exactly this category of problem[14].
  • The cleanup that pays off - Curated overlay docs that capture “our way of doing X in our SAP” - the institutional knowledge that vendor docs do not contain. Stored in your own markdown source, served alongside the vendor docs, weighted higher in retrieval.

Notion and Coda (the new layer)

  • The pain - Rapid, easy to write, organic structure that becomes a maze within months. Heavy use of databases, embedded blocks, and inline mentions that do not export cleanly.
  • The agent-native pattern - Notion MCP server (Notion ships an official one) plus a markdown export pipeline for the canonical subset. The MCP server handles dynamic queries; the markdown export anchors the version-controlled source of truth.
  • The cleanup that pays off - Decide which Notion databases are reference material and which are working drafts. Only reference material goes through the agent; drafts stay human-only until promoted.

PDF SOPs and policy documents (the long tail)

  • The pain - Hundreds of PDFs, many scanned, many German-only, many with stamps and signatures. The richest source of policy truth and the worst format for agents.
  • The agent-native pattern - Batch OCR plus structured markdown extraction. Modern multimodal models (GPT-4o, Gemini 2.5, Claude 3.7) handle this well, but the output needs human review for the high-stakes policies. Store the resulting markdown alongside the original PDF; the PDF is the legal artefact, the markdown is what the agent reads.
  • The cleanup that pays off - Per-PDF ownership assignment. Anything legally binding gets a named owner who reviews the markdown extraction every six months. Anything historical can be left as-is and flagged read-only.

From PDF Manuals to Copy-Paste Specs: The Conversion Playbook

The conversion of human-first docs to agent-native docs is mostly tooling and a small amount of judgement. Six steps that take a Mittelstand wiki from blank to agent-ready in a single quarter.

  1. Pick the canonical 10 percent - Most wikis follow a power law: 10 percent of pages answer 80 percent of questions. Identify them through analytics, search logs, or by asking the support team. Convert these first; the long tail can wait.
  2. Export to markdown - Confluence and SharePoint have REST APIs that return XHTML; pandoc converts XHTML to markdown losslessly for 90 percent of cases. PDFs go through OCR plus markdown extraction. Notion ships a native markdown export. (A sketch of this step wired to the chrome strip follows this list.)
  3. Strip the chrome - Remove navigation breadcrumbs, related-content widgets, share buttons, footer links, edit metadata. Keep only the actual content. A regex pass handles most of it; the rest is template-specific.
  4. Replace screenshots with structured text - For each screenshot, write a markdown table or numbered steps that capture the same information. Keep the screenshot for humans; the agent reads the text. AI-assisted alt text plus human review handles this at scale.
  5. Add stable anchors - Every H2 and H3 gets a hand-curated anchor: not just “heading-1” but “cancellation”. The Mintlify and Docusaurus toolchains generate slugs automatically; for hand-edited docs, add a frontmatter anchor field.
  6. Generate llms.txt and the markdown endpoints - One CI step. The static site generator emits both. Verify with curl. Done.
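As referenced in step 2, a sketch of the export-and-strip pipeline for one Confluence page. The instance URL, page id, and credentials are placeholders, the REST path follows Confluence Cloud's v1 content API, and the chrome-strip patterns are examples to be tuned against your own templates:

```ts
import { execFileSync } from "node:child_process";
import { writeFileSync } from "node:fs";

const BASE = "https://example.atlassian.net/wiki"; // placeholder instance
const PAGE_ID = "123456"; // placeholder page
const AUTH = `Basic ${Buffer.from("user@example.com:API_TOKEN").toString("base64")}`;

// Step 2: export the page body as XHTML through the REST API
const res = await fetch(`${BASE}/rest/api/content/${PAGE_ID}?expand=body.storage`, {
  headers: { Authorization: AUTH },
});
const page = await res.json();
const xhtml: string = page.body.storage.value;

// Step 2 continued: convert XHTML to markdown with pandoc
const markdown = execFileSync("pandoc", ["-f", "html", "-t", "gfm", "--wrap=none"], {
  input: xhtml,
  encoding: "utf8",
});

// Step 3: strip template chrome - example patterns, not a universal list
const cleaned = markdown
  .replace(/^\s*\[Edit\]\(.*?\)\s*$/gm, "")
  .replace(/^\s*Created by .* on .*$/gm, "")
  .trim();

writeFileSync(
  `docs/${page.title.replace(/\s+/g, "-").toLowerCase()}.md`,
  cleaned + "\n",
);
```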

Conversion checklist for one wiki space

  • Top 50 pages identified through analytics or support tickets
  • REST API export to XHTML configured and scheduled
  • Pandoc or html-to-markdown pipeline producing clean markdown
  • Chrome strip regex applied and verified on five sample pages
  • Screenshot replacement decided per page (keep, replace, or both)
  • Hand-curated anchors added for top 50 pages
  • llms.txt and llms-full.txt generated and served
  • Markdown endpoint reachable via Accept header and /index.md
  • 20 canonical questions answered correctly by agent
  • Page-update-to-agent-refresh cycle measured (target under 1 hour)

What automation handles, what humans handle

  • Automation handles - HTML-to-markdown conversion, sitemap generation, llms.txt assembly, anchor slug generation, table extraction from well-structured HTML, alt text from images, OCR.
  • Humans handle - Page consolidation decisions, anchor renaming for stability, screenshot-to-table judgement calls, vendor-doc overlay writing, policy interpretation, Betriebsrat sign-off.
  • Both together - Quality evaluation: humans curate the 20-question test set, automation runs it nightly and flags regressions.

The 7 Documentation Pitfalls That Derail Mittelstand Projects

The pattern of failure is consistent enough to enumerate. Each is preventable; almost all happen in the first six months of agent rollout, usually because nobody owned the documentation question explicitly.

  1. Ship the agent before the docs are ready - The agent goes live on a Confluence that nobody converted. Users get wrong answers, trust collapses, the project is killed before the docs catch up. Fix: convert top 50 pages of relevant docs as a precondition to the agent going live, not as a follow-up.
  2. Maintain two sources of truth - The team writes in Confluence and separately maintains a markdown copy. They drift. The agent answers from yesterday’s markdown while the human reads today’s Confluence. Fix: one canonical source, two surfaces. Markdown source rendered to HTML for humans, served as markdown for agents.
  3. Public llms.txt with internal content in it - The team auto-generates llms.txt from the full sitemap and accidentally publishes internal docs to the open web. Fix: explicit allowlist for llms.txt; internal docs go through MCP, not the public index.
  4. Forget about screenshots and PDFs - The markdown looks great until the agent hits the policy document that exists only as a scanned PDF. Fix: PDF inventory in week one of the project; OCR plus structured markdown extraction in week two.
  5. Skip identity propagation in the MCP server - The MCP server returns documents without checking the calling user’s permissions. The agent answers questions a finance clerk should not be allowed to ask, with content from the management board space. Fix: OAuth 2.1 with on-behalf-of; the MCP server filters by the user’s ACL, not the agent’s.
  6. Treat the project as a one-off - The team ships the conversion, declares victory, and moves on. New pages get written in HTML-only. Within six months the agent is back to guessing. Fix: documentation guidelines updated, CI checks for markdown export, new-page checklist includes anchor and exclusion review.
  7. Measure indexed pages, not answer quality - The dashboard shows 5,000 pages indexed, the agent gives confident wrong answers. Fix: nightly evaluation suite with 20 to 50 canonical questions and curated correct answers. The number the dashboard shows is “answer accuracy”, not “documents ingested”.

Mittelstand-friendly documentation wins

  • Single markdown source rendered to two surfaces
  • llms.txt + markdown endpoint + MCP server in place from day one
  • Top 50 pages converted before the first agent ships
  • Identity propagation enforced in every MCP server
  • Nightly evaluation suite tracks answer accuracy, not page count

What to avoid even under deadline pressure

  • Shipping the agent on top of an unconverted wiki
  • Maintaining HTML and markdown copies in parallel
  • Auto-publishing llms.txt without an explicit allowlist
  • Letting screenshots and PDFs stay opaque to the agent
  • Treating documentation conversion as a one-time project

“The real turning point for AI in 2026 is not autonomy, but the maturity of infrastructure - where agentic runtimes, GPU efficiency, and organisational design will decide who wins.”

- McKinsey Technology, Reimagining Tech Infrastructure for Agentic AI[22]

GoBD, EU AI Act, GDPR, and the Betriebsrat

Agent-native documentation sits at the intersection of three compliance regimes - the EU AI Act, GDPR, and GoBD - plus works-council co-determination. None of them mention “agent-native docs” explicitly; all of them require it implicitly the moment an agent reads a regulated document.

EU AI Act

  • Article 4 (AI literacy) - Adequate AI literacy for everyone using or directing AI tools[25]. The team that writes documentation now writes for two audiences; the team that operates agents needs to understand what their docs surface contains. Literacy lands on documentation governance, not just prompt engineering.
  • Article 14 (human oversight) - Designed-in human oversight requires that humans can verify agent answers against source documents. Without agent-native docs, “trace the answer back to the policy” becomes impossible because the agent never read the policy directly.
  • Mittelstand action - One paragraph in your AI literacy training that explains the documentation surfaces (llms.txt, markdown endpoint, MCP server) and what each contains.

GDPR

  • Auftragsverarbeitung (Article 28) - Each external MCP server provider, each documentation hosting provider, each agent platform needs a signed DPA. Keep the list short; review annually.
  • Right of access and erasure (Articles 15 and 17) - When a user requests their data, you need to know which documents the agent retrieved on their behalf. Identity propagation through the MCP server is what makes this technically possible.
  • Mittelstand action - Identity propagation as a hard architectural requirement. The MCP audit log records the user, the document, the chunks. Erasure requests reach both the source system and the agent platform’s observability.
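What that audit log record needs to contain follows directly from Articles 15 and 17. A sketch of a minimal schema - the field names are chosen for illustration, not taken from any particular platform:

```ts
// One row per MCP retrieval, written by the identity-propagating gateway
interface DocRetrievalAuditEntry {
  timestamp: string;   // ISO 8601
  userId: string;      // the human the agent acted for (on-behalf-of)
  agentId: string;     // which agent made the call
  mcpServer: string;   // e.g. "confluence-mcp"
  documentId: string;  // source-system document identifier
  chunkIds: string[];  // which chunks were returned to the model
  query: string;       // the search the agent issued
}

// An access request (Article 15) becomes a query over userId; an erasure
// request (Article 17) adds deletion in the source system and in the agent
// platform's observability store.
```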

GoBD and tax-relevant documentation

  • The principle - Tax-relevant documents must remain unaltered, traceable, and reproducible. The July 2025 BMF letter clarified the rules for digitally maintained records[26]. Documents the agent reads to answer tax-relevant questions count as part of the documentation chain.
  • Mittelstand action - The original PDF or invoice is the GoBD-relevant artefact and must be archived in compliant storage. The markdown extraction the agent reads is a derivative; flag it as such, link back to the original, and keep both.
  • What auditors ask - Show me the source document the agent quoted, show me the trace that links the agent answer to that source, show me the change history of the markdown extraction. All three need to be on hand.

Betriebsrat

  • Why it matters - Internal documentation often includes works-council-relevant content (HR policies, performance criteria, time-recording procedures). Exposing those to an agent means employees may receive policy interpretations from a non-human source.
  • Mittelstand action - One-page agreement with the works council covering which documentation spaces are agent-readable, how policy interpretations get attributed, and the escalation path when an agent answer affects an employee right.

A 90-Day Implementation Roadmap

The work breaks into three 30-day sprints. By day 90 a Mittelstand team has the canonical 10 percent of its documentation agent-native, with llms.txt, markdown endpoints, and an MCP server in production for at least one internal source.

Days 0-30: Inventory, picks, and first markdown export

  • Inventory the documentation estate - Confluence spaces, SharePoint libraries, Notion databases, PDF SOPs, vendor docs (SAP help, DATEV manuals). One row per source, with size, owner, and last-reviewed date.
  • Pick the canonical 10 percent - The pages that answer 80 percent of questions. Identify through analytics, support tickets, or a 30-minute interview with the support lead.
  • Stand up the markdown source repo - Git repo, branch protection, two reviewers per merge. This is where the canonical markdown lives.
  • First export pipeline - One Confluence space, exported through REST API, converted with pandoc, committed to Git. Manual cleanup pass on the top 20 pages.
  • Set the policy - One-page document covering markdown-as-source-of-truth, who owns each space, what gets exported publicly versus internally, screenshot-to-table guidelines.

Days 31-60: Surfaces, MCP, and the first agent test

  • Static site generator and llms.txt - Pick Mintlify, Docusaurus, or Astro. Deploy to a preview environment. Verify llms.txt and llms-full.txt are generated and served correctly.
  • Markdown endpoint - Either Cloudflare Markdown for Agents or a custom middleware. Verify with curl that Accept: text/markdown returns clean content.
  • First MCP server - For one internal source (often Confluence or SharePoint). Behind the existing OAuth 2.1 issuer. Identity propagation tested end-to-end.
  • 20-question evaluation suite - 20 canonical user questions, curated correct answers, nightly run that scores the agent. The dashboard the team watches.
  • First agent end-to-end test - Pilot agent reads from the markdown endpoint plus the MCP server. Compare answer quality against the previous HTML-scraping baseline.

Days 61-90: Hardening, second wave, and the operating model

  • PDF and screenshot conversion - The long-tail content. OCR pipeline plus structured markdown extraction. Human review for the high-stakes policies.
  • Second MCP server - The next internal source. By now the platform pattern is repeatable; subsequent servers ship faster.
  • CI checks on new pages - Every new doc page goes through markdown export, anchor check, exclusion list check. The new-page workflow includes these gates.
  • Quality monitoring scaled - Evaluation suite expanded to 50 questions. Slack or email alerts on regressions. Quarterly review of correct-answer curation.
  • Document the operating model - One-page diagram covering source repo, generators, surfaces, MCP servers, identity propagation. The artefact the next team member starts with.

Day-90 minimum viable agent-native docs

  • Canonical 10 percent of docs in markdown source of truth
  • Static site generator producing both HTML and llms.txt
  • Markdown endpoint reachable via Accept header and /index.md
  • At least one MCP server fronting an internal source
  • OAuth 2.1 with on-behalf-of identity propagation
  • Per-document ACL respected by the MCP server
  • 20+ question evaluation suite running nightly
  • Page-update-to-agent-refresh cycle under 1 hour
  • Article 4 documentation-governance literacy delivered
  • Operating model diagram and one-page policy in place

How Superkind Fits Into Agent-Native Docs

Superkind builds custom AI agents for the Mittelstand with documentation treated as part of the architecture, not a separate workstream. We typically own the markdown conversion pipeline, the llms.txt and markdown endpoints, the MCP server design, and the identity-propagating gateway for the agents we ship.

  • Documentation conversion as part of the agent build - We do not ship an agent on top of unconverted docs. The canonical 10 percent is part of the same engagement, not a follow-up project.
  • Single markdown source, two surfaces - Source of truth in Git or your headless CMS, rendered to HTML for humans and to markdown plus llms.txt for agents. One source, no drift.
  • MCP servers for Confluence, SharePoint, SAP, DATEV - Off-the-shelf where one exists, custom where it does not. All behind your existing OAuth 2.1 issuer.
  • Identity propagation end-to-end - On-behalf-of-user tokens through every MCP server. The audit trail terminates at a human, not at the agent.
  • PDF and screenshot pipeline - OCR plus structured markdown extraction. Human review on the high-stakes policies. The original artefact preserved for GoBD.
  • Quality monitoring as a service - 20 to 50 canonical questions, nightly evaluation, quarterly curation review with your subject-matter experts.
  • Compliance baked in - EU AI Act Article 4 literacy materials covering documentation governance, GDPR Article 28 DPAs for any external MCP provider, GoBD source-document linking.
  • Operating model handover - We do not leave you with a black box. The CI checks, the eval suite, the new-page workflow, the diagram - all transferred to your team for the long-term run.

When Superkind is the right partner

  • You have a Confluence, SharePoint, or DATEV estate that agents cannot read
  • Your IT team is small and the markdown pipeline feels overwhelming
  • Your agents need to integrate with SAP, DATEV, or German-only PDFs
  • Compliance and Betriebsrat alignment matter from day one
  • You want one documentation pattern that scales across many agents

Where you might prefer a different option

  • You have a 10-engineer documentation team building this themselves
  • Your docs are already in markdown with stable anchors
  • Mintlify or Docusaurus already covers everything you need
  • You want a black-box SaaS with no integration into your systems

Decision Framework: Are Our Docs Actually Agent-Ready?

A six-dimension check that helps a Mittelstand IT lead and Geschäftsführer answer the readiness question in one steering session, with a curl command instead of a vendor presentation.

| Dimension | Not ready | Ready to scale | Audit-grade |
| --- | --- | --- | --- |
| Source format | HTML rendered from CMS | Markdown plus HTML | Markdown canonical, HTML derived |
| Discovery | Sitemap only | llms.txt published | llms.txt + llms-full.txt + MCP list |
| Markdown delivery | None | .md URL suffix | Accept header + .md + x-markdown-tokens |
| Internal docs | Behind login, no agent path | MCP server, basic auth | MCP + OAuth 2.1 + on-behalf-of |
| Quality monitoring | None | Manual spot-check | Nightly eval suite, 50 questions |
| Refresh cycle | Days | Hours | Under 60 minutes |

Most Mittelstand firms in 2026 land in the “not ready” column on four or more dimensions. The right answer is not to wait. The right answer is to fix the not-ready columns as the first 90 days of agent-native docs, in parallel with the first agent build - with the canonical 10 percent converted before that agent goes live, so neither project waits on the other.

Frequently Asked Questions

What is agent-native documentation, and how is it different from regular documentation?

Agent-native documentation is written and served so an AI agent can ingest it directly without parsing HTML, navigating menus, or interpreting screenshots. The same content can still be rendered for humans, but the canonical source of truth is structured markdown with stable anchors, semantic headings, and a machine-addressable index. Regular documentation assumes a human will click through a navigation tree, read context from sidebar widgets, and infer the right next step from layout cues - none of which an agent can reliably do.

Is llms.txt actually being adopted, or is it hype?

Both. By April 2026 Anthropic, Stripe, Zapier, Cloudflare, Vercel, and most developer-tool documentation sites publish an llms.txt index. According to Profound data, AI agents fetch llms-full.txt files more than twice as often as the lighter llms.txt index. The standard does not improve traditional SEO, and OpenAI, Anthropic, and Perplexity have not committed to reading it automatically - so it is best treated as an optimisation for agents that already know how to look for it, not a magic ranking signal.

Where does Cloudflare’s 80 percent token reduction figure come from?

Their measurement compared one of their own blog posts rendered as HTML against the same content as markdown: 16,180 tokens versus 3,150 tokens. The savings come from stripping div wrappers, navigation chrome, script tags, and styling boilerplate that have no semantic value for an agent. A single H2 heading costs roughly three tokens in markdown versus twelve to fifteen in HTML. For a Mittelstand wiki the practical effect is more relevant context fitting in the same context window, lower per-call cost, and faster retrieval.

What is the difference between llms.txt and an MCP server?

llms.txt is a static, public index file: an agent fetches it once and reads markdown URLs from it. An MCP server is a live, callable interface: the agent makes a request and the server returns content, often filtered to the user identity and the question. llms.txt is right for public docs an agent can crawl. MCP is right for internal docs, search, and any content that needs authorisation, freshness, or query parameters. Most production Mittelstand stacks end up using both.

Can the same documentation serve both human readers and agents?

Yes. The pattern that has stabilised is a single markdown source of truth that renders for humans through a static site (Mintlify, Docusaurus, Hugo, Astro) and is exposed for agents through llms.txt and an MCP server. You do not maintain two versions; you maintain one canonical source and serve it through two surfaces. The cheap way to know whether your docs are agent-native is the curl test: curl with Accept: text/markdown, see what comes back, decide if an agent could do its job with that.

How do we handle internal and German-only documentation?

Internal docs do not belong in a public llms.txt. The Mittelstand pattern is an internal MCP server that fronts your Confluence, SharePoint, or DATEV manuals, with OAuth 2.1 authorisation, on-behalf-of-user identity propagation, and per-document access controls. The agent only ever sees the chunks the calling user is allowed to see. German-only content is fine; modern LLMs handle German natively, and the structural improvements (headings, tables, anchors) work in any language.

How much work is it to convert a large Confluence wiki to markdown?

Less than most teams expect, because most of the work is automatable. Confluence exports through its REST API to a tree of XHTML, which converts to markdown with off-the-shelf tooling (pandoc, html-to-markdown). The non-automatable work is the structural cleanup: stable anchors, page consolidation, dead-link removal, screenshot replacement with described tables. A typical 5,000-page Mittelstand wiki lands at 6 to 10 engineering weeks for the first agent-ready cut, then a tooling pass to keep it that way as new pages get written.

What about screenshots and diagrams?

They are the single biggest blocker. An agent that cannot see the screenshot has to guess what the visual was conveying, and guesses are how production incidents start. The fix is dual-track: keep the screenshot for humans, add a structured markdown table or numbered steps next to it that capture the same information in text. As a forcing function, write the markdown first; if the screenshot still adds value after, keep it. Most do not.

What do we do with vendor docs like SAP help and DATEV manuals?

For vendor docs you cannot edit, the answer is an MCP server in front of them, not a rewrite. Google launched a Developer Knowledge MCP server in early 2026 for exactly this reason: a canonical machine-readable gateway to a vendor catalogue. The same pattern works for your SAP documentation (K2View, SAP MCP), DATEV manuals (custom MCP wrapper), and any third-party knowledge base. The agent calls the MCP server, the MCP server returns chunked markdown, the original vendor docs stay untouched.

What does the EU AI Act mean for our documentation?

Article 4 (AI literacy) requires adequate competence for everyone using or directing AI tools. Agent-native documentation is half the literacy story: the team that writes docs needs to understand they are now writing for two audiences, and the team that operates agents needs to understand which docs the agent has access to and which it does not. The literacy obligation lands cleanly on documentation governance, not just on prompt engineering.

What happens if we skip this and point the agent at our HTML wiki?

Three things compound. First, agents waste 60 to 80 percent of their context budget on HTML chrome, leaving less room for real content. Second, retrieval quality degrades because HTML structure confuses chunking and embedding. Third, the agent silently hallucinates the parts it could not find, and your team trusts the answer because the agent sounds confident. Vercel’s published data and LangChain benchmarks both showed structured markdown plus llms.txt significantly outperforming vector search and context-stuffing approaches.

How do we measure whether our docs are agent-ready?

Four practical metrics. First, the curl test: does Accept: text/markdown return clean structured content? Second, the token ratio: how many tokens does an HTML page cost versus its markdown equivalent? Third, the answer test: pick 20 real user questions, run them through your agent, score the answers against your own experts. Fourth, the freshness test: when a doc page is updated, how long until the agent reflects it? Anything over an hour is too slow for production.

Sources

  1. Cloudflare - Introducing Markdown for Agents (April 2026)
  2. Cloudflare - Markdown for Agents Documentation Reference
  3. Cloudflare - Docs for Agents
  4. ALM Corp - Cloudflare Markdown for Agents: Complete Technical Guide and 80% Token Reduction
  5. Mintlify - What is llms.txt? Breaking Down the Skepticism
  6. Mintlify - Real llms.txt Examples From Leading Tech Companies
  7. llms.txt Specification - Original Proposal by Jeremy Howard
  8. Anthropic - llms.txt Index for Documentation
  9. Vercel - llms.txt for Documentation
  10. Stripe - llms.txt Index
  11. Anthropic - Model Context Protocol Authorization Specification
  12. Model Context Protocol - 2026 MCP Roadmap
  13. WorkOS - MCP 2026 Roadmap Makes Enterprise Readiness a Top Priority
  14. Google Developers Blog - Introducing the Developer Knowledge API and MCP Server
  15. Microsoft Learn - MCP Server Overview
  16. Stack Overflow Blog - Authentication and Authorization in Model Context Protocol
  17. mcp.directory - Best Documentation MCP Servers: Outline, Confluence, Notion and More
  18. Docsie - MCP Docs Integration: What Actually Works in 2026
  19. Document360 - Cloudflare Markdown for AI Agents: Limits and Better Approach
  20. K2View - SAP MCP: Unlocking SAP Data Access for AI Agents
  21. Kai Waehner - Data Ownership in the Age of Agentic AI: SAP API Policy Forces a Reckoning
  22. McKinsey - Reimagining Tech Infrastructure for Agentic AI
  23. Karpathy - 2025 LLM Year in Review
  24. Karpathy - Sequoia Ascent 2026 Summary
  25. EU AI Act - Article 4: AI Literacy
  26. BMF - GoBD Letter (July 2025)
  27. Bitkom - KI im Mittelstand 2026 Studie
  28. Postman - The New Postman: AI-Native and Built for the Agentic Era
Henri Jung

Co-founder of Superkind, where he helps SMEs and enterprises deploy custom AI agents that actually fit how their teams work. Henri is passionate about closing the gap between what AI can do and the value it creates in real companies. He believes the Mittelstand has everything it needs to lead in AI - it just needs the right approach.

Ready to make your wiki something agents can actually read?

We help Mittelstand IT teams convert Confluence, SharePoint, and DATEV docs into agent-native markdown plus MCP servers - so the next agent rollout works because it can read what your team wrote. Talk to Henri about what your docs should look like.

Book a Demo →