
Software 3.0 in the Mittelstand: Why Programming Is Now Prompting - and What That Means for Your IT Strategy

Henri Jung, Co-founder at Superkind

[Image: Industrial dark matte typewriter with a fresh sheet of paper and an orange platen knob, representing English as the new programming language]

Sometime in late 2025, your most curious engineer stopped writing code. They still ship more than ever, but the file they keep open is not an IDE - it is a chat window with a frontier model and a long, careful prompt that doubles as the spec. The hottest new programming language in your company is suddenly English.

That is the headline of what Andrej Karpathy calls Software 3.0. At YC’s AI Startup School in June 2025 he laid it out plainly: large language models are a new kind of computer, you program them in English, and they deserve a major version upgrade in how we think about software [1]. Software 1.0 was the code humans wrote. Software 2.0 was the weights of neural networks trained on data. Software 3.0 is prompts in natural language directing an LLM [4]. All three layers now coexist inside the same products.

For the German Mittelstand the strategic question is no longer whether this matters. Bitkom reported in February 2026 that 41 percent of German companies actively use AI, up from 17 percent only two years earlier [9]. Gartner forecasts 40 percent of all enterprise apps will feature task-specific AI agents by the end of 2026, up from less than 5 percent at the start of 2025 [14]. The question is what an IT strategy looks like when the building blocks of software shift this fast - and what a Mittelstand-sized company should actually do on Monday morning.

TL;DR

Programming is becoming prompting - Karpathy’s Software 3.0 thesis is that LLMs are a new kind of computer programmed in English. Software 1.0 (code), 2.0 (weights), and 3.0 (prompts) now coexist in every modern app.

The context window is the new source code - the prompt plus retrieved data plus tools plus memory is the program. Gartner is already telling CIOs to lead the shift to context engineering as prompt engineering fades.

The Mittelstand has an asymmetric advantage - 41 percent of German firms now use AI (Bitkom 2026), but the IT staffing shortage is permanent and the working-age population shrinks by 3.9 million by 2030. Software 3.0 is the only lever that closes both gaps inside one planning cycle.

Some apps will be eaten by the model - thin-logic SaaS (basic OCR, simple form generators, single-purpose tools) is being replaced as a single chat session does the same job. Heavy-workflow SaaS is safe for years.

Jagged intelligence forces a domain-by-domain trust map - the same LLM refactors a 100,000-line codebase brilliantly and then makes a basic logic error. Treat the agent as a brilliant intern with perfect API recall and reliably odd blind spots.

The 12-month Mittelstand budget is 150,000 to 400,000 euros for an LLM gateway, observability, a 1-2 FTE platform team, governance, and the runway to convert the first 5 to 10 winning prototypes into production agents.

The Software 3.0 Shift Has Arrived in the Mittelstand

Most German IT leaders still treat generative AI as one item on a long roadmap. The data argues for a different framing: a generational shift in how software is built, already inside the company.

  • 41 percent of German companies actively use AI - Bitkom’s February 2026 study found 41 percent of German firms now use AI in production, up from 17 percent in 2024 and 9 percent in 2022 [9]. The two-year doubling is faster than cloud adoption was at the equivalent stage.
  • 48 percent more are planning - The same Bitkom study reports another 48 percent of companies are actively planning AI deployment, leaving only 11 percent who say they have no plans [9]. Inactivity is now the minority position.
  • SMEs are catching up but not caught up - Adoption climbs above 60 percent for German firms with 500-plus employees while remaining lower for the classic Mittelstand bracket [9]. The asymmetry is exactly the gap Software 3.0 can close.
  • 40 percent of enterprise apps will have task-specific agents - Gartner predicts 40 percent of enterprise applications will feature task-specific AI agents by year-end 2026, up from less than 5 percent at the start of 2025 [14]. The base of installed software is being rewritten under your feet.
  • 80 percent of engineers will need to upskill - Gartner expects 80 percent of the engineering workforce to upskill through 2027 to remain effective in an LLM-centric stack [15]. The half-life of yesterday’s engineering practice has fallen sharply.
  • 149,000 IT positions remain unfilled - Bitkom Akademie reports 149,000 unfilled IT roles in Germany, with developers and architects the most sought-after profiles [10]. The labour math forces Software 3.0 even on companies that would have preferred to wait.
  • The workforce shrinks by 3.9 million - The OECD projects a 3.9 million decline in the German working-age population by 2030 [23]. There is no version of the next decade where IT teams grow fast enough to keep up using Software 1.0 alone.

Key Data Point

The Bitkom study released in February 2026 shows German AI adoption doubling roughly every two years, reaching 41 percent of all firms with growing investment momentum [9]. The Mittelstand is no longer experimenting at the edge - it is operationalising in the middle.

The structural pressure is what makes the Mittelstand context distinct. A 200-person firm with two and a half people in IT cannot simply hire its way out of an exploding internal-software backlog. Software 3.0 collapses that backlog in a way no previous tooling shift did, because the unit of building is now a written description rather than a sprint.

| Indicator | 2026 reality | Source |
| --- | --- | --- |
| German firms actively using AI | 41% (up from 17% in 2024) | Bitkom 2026 [9] |
| German firms planning AI | 48% (additional) | Bitkom 2026 [9] |
| Adoption at 500-plus employee firms | Above 60% | Bitkom 2026 [9] |
| Enterprise apps with task-specific agents | 40% by year-end 2026 (from <5% in early 2025) | Gartner [14] |
| Engineers needing upskilling | 80% through 2027 | Gartner [15] |
| Unfilled IT roles in Germany | 149,000 | Bitkom Akademie [10] |
| Working-age population decline by 2030 | 3.9 million | OECD [23] |

“LLMs are a new kind of computer, and you program them in English. Hence I think they are well deserving of a major version upgrade in terms of software.”

- Andrej Karpathy, founding member of OpenAI and former Director of AI at Tesla, on X about his YC AI Startup School talk, June 2025 [2]

What Software 3.0 Actually Is (and What It Is Not)

The label is shorthand for a specific architectural worldview. Three layers now coexist inside almost every modern application, each with its own programming model.

The three layers, side by side

  • Software 1.0 - Explicit code written by humans in Python, Java, ABAP, C#. The runtime is the CPU and the operating system. The unit of work is the function. This is still the right layer for deterministic logic, calculations, integrations, and anything safety- or audit-critical.
  • Software 2.0 - Programs expressed as the weights of a neural network trained on data, originally framed by Karpathy in 2017 [3]. The runtime is a GPU executing matrix multiplications. The unit of work is the model. Vision systems, fraud scoring, recommendation engines, and predictive maintenance models live here.
  • Software 3.0 - Programs expressed as natural-language prompts plus context, executed by a frontier LLM. The runtime is the LLM, the context window is the working memory, and the “source code” is the prompt and its supporting data, tools, and examples [4]. This is where most of the value of the next IT cycle will be authored.
| Layer | Programming language | Runtime | Strength | Weakness |
| --- | --- | --- | --- | --- |
| Software 1.0 | Python, Java, ABAP, C# | CPU + OS | Deterministic, auditable | Slow to write, brittle to change |
| Software 2.0 | Training data + architecture | GPU | Pattern recognition at scale | Opaque, expensive to retrain |
| Software 3.0 | English (or German) + context | LLM as host process | Fast to author, broad coverage | Probabilistic, jagged at the edges |

What Software 3.0 looks like in practice

A concrete example. A Mittelstand machinery firm wants to handle inbound spare-parts emails: parse the customer’s message, look up the part by description against an SAP catalogue, generate a quote, and reply in fluent German. The Software 1.0 version is a Python service with a parser, an SAP connector, a quote generator, and an email integration - measured in months of engineering. The Software 3.0 version is a 600-word prompt plus an MCP server that exposes the SAP catalogue and the quoting API to a frontier model. The first working version ships in days. The remaining work is evaluation, guardrails, and the small Software 1.0 layer that actually sends the email and writes the audit log.
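A sketch of what the Software 3.0 version’s “source” might look like: a system prompt plus tool schemas handed to a frontier model. The tool names, schema shape, prompt wording, and SAP endpoint are illustrative assumptions, not a real integration.

```python
# Hypothetical sketch of the Software 3.0 spare-parts agent.
# Tool names and schemas are illustrative, not a real SAP API.

SYSTEM_PROMPT = """\
You are the spare-parts assistant for a machinery firm.
Given an inbound customer email, identify the requested part,
look it up with the lookup_part tool, then draft a German quote
using the create_quote tool. Never invent part numbers."""

# Tool schemas in the JSON-schema style most frontier-model APIs accept.
TOOLS = [
    {
        "name": "lookup_part",
        "description": "Search the SAP spare-parts catalogue by free-text description.",
        "parameters": {
            "type": "object",
            "properties": {"description": {"type": "string"}},
            "required": ["description"],
        },
    },
    {
        "name": "create_quote",
        "description": "Generate a quote for a part number and quantity.",
        "parameters": {
            "type": "object",
            "properties": {
                "part_number": {"type": "string"},
                "quantity": {"type": "integer", "minimum": 1},
            },
            "required": ["part_number", "quantity"],
        },
    },
]

def build_request(email_text: str) -> dict:
    """Assemble one run's context window: prompt + tools + user input."""
    return {
        "system": SYSTEM_PROMPT,
        "tools": TOOLS,
        "messages": [{"role": "user", "content": email_text}],
    }
```

The striking part is how little of this is code in the Software 1.0 sense: the behaviour lives in the prompt, and the schemas only declare what the deterministic layer can do.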

The defining shift

In Software 3.0 the Mittelstand IT department writes fewer functions and more specifications. Karpathy’s framing is that humans now design and the model fills in the implementation. The strategic implication is that the bottleneck for new internal software is shifting from engineering capacity to spec quality.

What Software 3.0 is not

  • Not the end of code - The 1.0 and 2.0 layers stay. They get smaller in volume but more important per line, because they are what the LLM calls when reliability matters.
  • Not vibe coding - Vibe coding is one consumer-grade expression of Software 3.0 for citizen developers [5]. Software 3.0 also includes agentic engineering for production systems, the LLM-as-OS pattern for new products, and the rewrite of internal tools that historically lived in Excel.
  • Not a chatbot strategy - The chat interface is a thin demo of what Software 3.0 can do. Most production value lives in agents that operate over data and APIs without anyone asking them anything.
  • Not a vendor decision - Choosing a model (GPT, Claude, Gemini, Mistral) is a tactical question. Choosing to organise your IT around English-as-engineering-interface is the strategic one. The model swaps every six months; the operating model does not.

Why Software 3.0 Hits the Mittelstand Differently

Large enterprises will adopt Software 3.0 cautiously because they have the IT capacity to ship long-tail tools the slow way and the reputational risk to move carefully. The Mittelstand has neither luxury. Three structural pressures make Software 3.0 more strategic in a 200-person firm than in a 20,000-person one.

  1. The IT staffing shortage is structural, not cyclical - Bitkom Akademie reports 149,000 unfilled IT positions in Germany, with developers, architects, and security specialists most in demand [10]. The DIHK separately reports Germany needs 300,000 skilled foreign workers per year just to maintain current staffing [22]. There is no plausible path where Mittelstand IT teams grow fast enough to absorb the workload using Software 1.0 alone.
  2. The process knowledge sits in the business - The deepest Mittelstand strength is operational expertise concentrated in domain experts who have spent decades learning the work. Software 3.0 lets that expertise become software directly, with the controller, the production planner, or the service dispatcher writing the spec rather than translating it through a ticket system.
  3. EU AI Act compliance is now table stakes - Article 4 of the EU AI Act obliges every German employer to ensure adequate AI literacy for everyone using AI tools [25]. The compliance work is identical for a firm that has 5 production agents and one that has 50, so the per-agent overhead drops sharply at higher volumes. Software 3.0 makes the volume strategy work.
  4. Existing investments become more valuable, not less - The 25-year SAP investment, the deep DATEV integration, the ten years of cleaned-up SharePoint, the institutional contracts knowledge - these become the substrate the agent layer feeds on. Bitkom’s 2026 study shows German firms with strong data foundations capture roughly twice the AI value of those without [9].
  5. The competitive gap compounds quickly - A Mittelstand firm shipping 2 internal tools a quarter against a peer shipping 20 closes its operational gap inside 18 months. McKinsey reports that high AI performers are nearly three times as likely to have fundamentally redesigned individual workflows [16]. The compounding is faster than any other digital transformation lever.

The Mittelstand asymmetry

The same Software 3.0 stack is more valuable to a 200-person Mittelstand firm than to a 20,000-person multinational, because the multinational already has the IT capacity to grind through the long tail and the Mittelstand does not. The cost of standing still is therefore higher in the smaller firm, not the larger one.

The Mittelstand-specific failure modes

The same conditions that make Software 3.0 powerful also create predictable Mittelstand traps. Three are worth naming up front.

  • Treating it as a tool purchase - Buying Copilot licences for everyone and calling it strategy. The licences sit unused at 64 percent of seats according to Microsoft’s own usage data, because the operating model never changed. Software 3.0 is an operating-model shift, not a SKU.
  • Banning it instead of governing it - The Geschäftsführer hears about a vibe-coded sales tool and orders a moratorium on AI tools across the company. The work moves to private accounts and shadow IT, the audit trail vanishes, and the Betriebsrat conversation gets harder. Governance survives; a ban does not.
  • Outsourcing the spec-writing - Hiring an agency to write the agent prompts. The agency leaves, the spec rots, the agent breaks the next time the SAP schema shifts, and the institutional knowledge that matters never gets internalised. Spec writing is a permanent skill, not a project deliverable.

The Context Window Is the New Source Code

The single sharpest practical implication of Software 3.0 is that the context window - what goes into the LLM at the moment of execution - is the program. Gartner explicitly told CIOs in early 2026 to “lead the shift to context engineering as prompt engineering fades” [13]. IBM frames context engineering as the discipline of structuring what information to include and how to format it so the LLM can use it correctly [19]. For a Mittelstand IT leader, this is the most important new skill to invest in.

What lives in a production context window

  • System prompt - The role definition, behavioural rules, brand voice, and refusal policy. This is the closest thing to traditional source code in Software 3.0 and should live in version control with code review.
  • Retrieved context - The relevant documents, ERP records, customer history, contracts, manuals, or knowledge-base entries pulled in for this specific task. Retrieval quality is now a first-order engineering concern.
  • Tools and APIs - The set of actions the model is allowed to take in this run, defined as schemas. This is where SAP, DATEV, Salesforce, ServiceNow, your custom APIs, and increasingly MCP servers plug in [20].
  • Examples - Few-shot examples of correct behaviour, especially for edge cases. In a Mittelstand context this often includes the “how we do it here” conventions that distinguish the firm from generic best practice.
  • Memory - Carryover from prior sessions, user preferences, and learned facts about the customer or process. Memory design is one of the most under-engineered parts of most Mittelstand agents.
  • The user query - The actual instruction or question for this run. Often the smallest part of the context window in production.
| Context layer | Software 1.0 analogue | Owner | Change frequency |
| --- | --- | --- | --- |
| System prompt | Source code | Platform team | Weeks-months |
| Retrieved context | Database query results | Data + retrieval team | Per request |
| Tools and APIs | Library imports | Integration team | Months |
| Examples | Unit tests as docs | Domain expert + platform | Months |
| Memory | Session storage | Platform team | Per session |
| User query | Function arguments | End user | Per request |
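Assembled at runtime, these layers become something close to string concatenation under a budget. A minimal sketch, using character counts as a crude proxy for tokens and assuming retrieved documents arrive ranked by relevance; the function and parameter names are illustrative:

```python
def assemble_context(system_prompt: str, retrieved_docs: list[str],
                     examples: list[str], memory: str, user_query: str,
                     budget_chars: int = 8000) -> str:
    """Naive context assembly: fixed layers first, then fill the remaining
    budget with retrieved documents in relevance order."""
    fixed = [system_prompt, *examples, memory, user_query]
    remaining = budget_chars - sum(len(part) for part in fixed)
    kept = []
    for doc in retrieved_docs:      # assumed pre-ranked by the retrieval layer
        if len(doc) > remaining:
            break                   # stop once the next document no longer fits
        kept.append(doc)
        remaining -= len(doc)
    return "\n\n".join([system_prompt, *kept, *examples, memory, user_query])
```

Production systems replace the character budget with a real tokenizer and add compression or summarisation of overflow, but the priority-ordered fill is the core idea.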

Why context engineering beats prompt engineering

Prompt engineering is what you do when you assume the prompt is the program. Context engineering is what you do when you accept that the model itself is fixed and the lever is what you put in front of it. Three reasons the Mittelstand should bias toward context engineering as the primary craft.

  • Context survives model swaps - The frontier model changes every six months. A well-engineered context (clean retrieval, well-named tools, clear examples) ports across providers. A clever prompt that exploited a specific model’s quirks does not.
  • Context is auditable - For EU AI Act and audit purposes, you want to be able to show what the model saw at the moment it made a decision. Retrieved context, tool definitions, and examples are auditable artefacts. Prompt-engineering tricks are not.
  • Context scales with your company - Better internal data, cleaner SharePoint, better-defined APIs all make every agent better at once. Prompt-engineering improvements do not compound the same way.

The context engineering rule

If you can move work from the prompt into the context, do it. If you can move work from the LLM into a tool call, do it. Prompts should describe intent and policy. Tools should do the deterministic work. Context should provide the truth.
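As a toy illustration of the rule, a VAT total is exactly the kind of work that belongs in a Software 1.0 tool rather than in the prompt. The function name and defaults below are hypothetical:

```python
# Deterministic work stays Software 1.0: a plain function the agent calls
# as a tool, instead of asking the model to do arithmetic in its head.
def quote_total(net_amount: float, vat_rate: float = 0.19) -> float:
    """Gross quote amount at the German standard VAT rate by default.
    Exact, auditable, and identical on every run - unlike an LLM guess."""
    return round(net_amount * (1 + vat_rate), 2)
```

The prompt then only needs to say "use the quote_total tool for all amounts", which is policy, not computation.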

Want a Software-3.0 operating model for your IT?

We help Mittelstand IT teams design the agent runtime, context engineering layer, and governance that turn a fast experiment into a reliable production agent.

Book a Demo →
[Image: Stacked dark matte industrial disks ascending in size with one orange ring, representing the layered context window of a production Software 3.0 agent]

When the Neural Net Eats Your App

The hardest part of Software 3.0 strategy for the Mittelstand is licensing. Some of the SaaS contracts you renewed last year are paying for capabilities the next frontier model will absorb in a single chat session. A clear-eyed view of which categories are at risk, which are safe, and which are still genuinely hard is now part of the IT-strategy conversation.

Categories at risk of being absorbed by the model

  • Basic OCR and document parsing - Single-purpose OCR tools that extract text from invoices, receipts, or forms are increasingly outperformed by frontier multimodal models in a single API call. The Mittelstand IDP category is in active disruption.
  • Generic image generation tools - Standalone “create an image from text” products lose to the same capability inside Gemini, ChatGPT, or Claude. The MenuGen pattern (a small app whose entire value can be replicated by a single multimodal prompt) generalises.
  • Single-purpose form generators - Tools whose entire value is “turn this description into a form” or “turn this spec into a one-page web app” compete with Lovable, v0, and the inline app generation now baked into Power Apps and Copilot Studio.
  • Lightweight transcription and summarisation - The category of buy-this-thing-that-summarises-meetings is collapsing into the meeting platform itself, and the Mittelstand is paying twice for capabilities that overlap.
  • Generic translation tools - High-quality translation is now a feature of every frontier LLM. Specialised translation SaaS still wins on terminology management and certified workflows but the unit cost of basic translation is approaching zero.

Categories that stay safe (for now)

  • Heavy-workflow ERP and accounting - SAP, DATEV, Lexware, Salesforce - the value is in the workflow, the data, the regulatory plumbing, and the integration network. The LLM will operate over them, not replace them.
  • Compliance-bound systems of record - HRIS, payroll, e-invoicing, GoBD-compliant archives. The audit guarantees are the product. The LLM is one more user of the system, not its successor.
  • Industry-specific verticals with deep domain integration - MES systems on the shop floor, plant maintenance platforms, fleet management for service organisations. The hardware integration is the moat, not the UI.
  • Established collaboration suites - Microsoft 365, Google Workspace, Atlassian. The LLM gets added, not subtracted. The interesting question is whether your firm uses the AI features that already came with the seat.

The Mittelstand build-vs-buy question, rewritten

The classical question was “buy where the vendor is good, build where you are different.” In a Software 3.0 world it splits into three questions.

  1. Is the vendor’s value mostly UI on top of an LLM? If yes, you can probably build the same thing in days against a frontier model and get a tool that fits your workflow exactly. The MenuGen pattern.
  2. Is the vendor’s value the workflow, data, or compliance network? If yes, keep buying. The agent layer goes on top.
  3. Is the vendor itself becoming agent-native? If they are exposing MCP servers, structured tools, or evaluation harnesses, they are in your future stack. If they are still selling a chat box bolted onto an old SaaS, they are not.
| Software category | Risk of being absorbed | Mittelstand action |
| --- | --- | --- |
| Basic OCR | High | Consolidate into agent stack |
| Single-purpose form/app generators | High | Use Lovable / v0 / Copilot inline |
| Generic translation | Medium | Keep for certified flows; drop generic seats |
| Meeting summarisation point tools | High | Use the seat you already have |
| SAP, DATEV, Lexware | Low | Wrap with agent layer |
| HRIS, payroll, e-invoicing | Low | Keep, expose via MCP/APIs |
| MES, plant maintenance, fleet | Low | Keep, integrate with agents |
| Microsoft 365 / Workspace | Low | Use the AI you already pay for |

Jagged Intelligence and the Trust Question

The hardest practical truth of Software 3.0 is that the same model can be brilliant and stupid in the same hour. Karpathy uses the term jagged intelligence to describe LLMs that “can both perform extremely impressive tasks while simultaneously struggle with some very dumb problems” [4]. Models trained with reinforcement learning on verifiable domains (code, mathematics, structured reasoning) spike in capability there and remain rough at the edges where no such verification signal exists.

What jagged intelligence looks like in a Mittelstand context

  • Brilliant at code-shaped tasks - Refactoring a 100,000-line internal codebase, generating SQL against a well-defined schema, converting between data formats, parsing unstructured PDFs into clean records.
  • Reliably good at writing-shaped tasks - Drafting customer emails, summarising meeting transcripts, translating, generating quote text from a structured input.
  • Mixed at decision-shaped tasks - Recommending which supplier to choose, prioritising a service queue, scoring leads. Capability depends heavily on context quality.
  • Erratic at common-sense edges - Confident wrong answers about physical-world questions, unit conversions, or things that depend on local context the model has not been told.
  • Outright bad at undefined work - Anything where the success criterion was not in the prompt. The model will optimise for whatever it can measure, often the wrong thing.

The trust map principle

There is no such thing as a single trust setting for an LLM. Trust is per domain, per task type, and per consequence of failure. The Software-3.0-mature Mittelstand IT team maintains a trust map that names which agents may take which actions autonomously, which need a human in the loop, and which are read-only by policy.

The Mittelstand trust map (a starting template)

| Task category | Trust level | Default mode | Human checkpoint |
| --- | --- | --- | --- |
| Internal drafting (emails, summaries) | High | Suggest | Sender approval |
| Code generation (internal tools) | High | Generate, run tests | Engineer review before prod |
| Data extraction (invoices, contracts) | Medium-high | Extract + confidence score | Human review when low confidence |
| Customer-facing reply (B2B) | Medium | Draft | Account manager approval |
| Booking, ordering, financial actions | Low without policy | Propose, do not execute | Named approver per amount |
| Hiring, credit, safety decisions | Never autonomous | Decision support only | Always human (EU AI Act) |
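A trust map only bites if the agent runtime enforces it on every tool call. A minimal sketch of that gate, with made-up action names; a real policy would also carry amount limits and named approvers:

```python
from enum import Enum

class Trust(Enum):
    AUTONOMOUS = "autonomous"        # agent may act without review
    HUMAN_IN_LOOP = "human_in_loop"  # agent proposes, a named human approves
    FORBIDDEN = "forbidden"          # agent may never execute this action

# Illustrative policy table; action names are hypothetical examples.
TRUST_MAP = {
    "draft_internal_email": Trust.AUTONOMOUS,
    "send_customer_reply": Trust.HUMAN_IN_LOOP,
    "place_order": Trust.HUMAN_IN_LOOP,
    "approve_credit": Trust.FORBIDDEN,
}

def may_execute(action: str, human_approved: bool = False) -> bool:
    """Gate every tool call through the trust map before the runtime executes it.
    Unknown actions default to forbidden, never to autonomous."""
    level = TRUST_MAP.get(action, Trust.FORBIDDEN)
    if level is Trust.AUTONOMOUS:
        return True
    if level is Trust.HUMAN_IN_LOOP:
        return human_approved
    return False
```

The default-to-forbidden choice is the important design decision: a new tool appearing in the stack should fail closed until someone deliberately places it on the map.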

“Demo is works.any(), product is works.all(). The gap between a demo and a product in the AI era is the difference between getting it right once and getting it right every time.”

- Andrej Karpathy, on shipping LLM-based products, summarised in Latent Space coverage of his Software 3.0 talk [4]

What this means for evaluation

The natural-language interface seduces teams into shipping agents that “feel right” in a demo. The 80-20 work that makes Software 3.0 actually production-grade is evaluation. Three minimum practices the Mittelstand should adopt for any production agent.

  • A frozen evaluation set - 30 to 100 representative inputs with known correct answers, run on every release. No agent goes to production without one.
  • An LLM-judge harness - A second model scoring the production agent’s outputs against rubric criteria. Not perfect, but consistent enough to catch regressions and cheaper than human review at volume.
  • Human-spot-check sampling - 1 to 5 percent of production runs reviewed by a domain expert weekly. The qualitative signal that the rubric misses lives here.
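The three practices can start smaller than they sound. A sketch of a frozen-set harness with a stubbed judge; the agent and judge here are placeholders for the production agent and an LLM-judge call, and the example cases are invented:

```python
# Two hypothetical frozen cases; a real set would hold 30-100 of these.
FROZEN_EVAL_SET = [
    {"input": "2x Dichtung X200", "must_contain": "X200"},
    {"input": "Angebot fuer Lager 6204", "must_contain": "6204"},
]

def substring_judge(output: str, case: dict) -> int:
    """Cheapest possible judge: a single exact criterion per case.
    Swap in an LLM judge scoring a rubric once the harness works."""
    return int(case["must_contain"] in output)

def evaluate(run_agent, eval_set, judge=substring_judge, threshold=0.9):
    """Run every frozen case, score each output, block the release
    if the pass rate falls below the threshold."""
    scores = [judge(run_agent(case["input"]), case) for case in eval_set]
    pass_rate = sum(scores) / len(scores)
    return {"pass_rate": pass_rate, "release_ok": pass_rate >= threshold}
```

The point of the frozen set is the freeze: the same inputs run on every release, so a score drop is a regression signal rather than noise.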

7 IT-Strategy Decisions That Change in a Software 3.0 World

Most Mittelstand IT strategies were written when Software 1.0 was the only paradigm and Software 2.0 was a research curiosity. Seven concrete decisions deserve a fresh look once Software 3.0 is on the table.

Decision 1: The build-buy-rent line moves

  • Old line - Buy where the vendor is good, build where you are different.
  • New line - Buy where the vendor owns the workflow and data, build the agent layer that operates over it, and avoid renting any pure-LLM-on-top SaaS unless the vendor is genuinely agent-native.
  • Mittelstand action - Audit current SaaS spend through the absorption-risk lens. Cut renewals that are paying for thin LLM wrappers your platform team can replicate in a sprint.

Decision 2: The platform team gets renamed and re-funded

  • Old shape - Infrastructure team running Kubernetes, network, identity, observability.
  • New shape - Same plus the LLM gateway, model catalogue, prompt and context registry, evaluation harness, MCP servers, and agent runtime. This is the platform that determines how fast every team can ship Software 3.0.
  • Mittelstand action - Add 1 to 2 FTEs to the platform team specifically for the agent stack. This is the single highest-leverage hire of 2026.

Decision 3: Spec design becomes a first-class skill

  • Old shape - Business analysts translate requirements into PRDs, engineers translate PRDs into code.
  • New shape - Senior people in every department write specs that are detailed enough to be agent-executable. The spec is the new unit of value.
  • Mittelstand action - Run a 2-day spec-writing workshop for the top 30 senior people in the firm. Pair the best spec writers with the platform team to land the first 5 production agents.

Decision 4: Hiring rebases on demonstrated agent fluency

  • Old signal - Whiteboard puzzles, algorithm questions, language trivia.
  • New signal - Ship a real working project under time pressure with full agent access; defend the design decisions in person.
  • Mittelstand action - Refactor the engineering interview within Q3 of 2026. Add a 90-minute agent-orchestration exercise. Drop the puzzles.

Decision 5: The architecture review board gets a trust-map mandate

  • Old mandate - Approve technology choices, integration patterns, security exceptions.
  • New mandate - All of the above plus maintain the trust map: which decisions agents may take autonomously, which need a human, which are forbidden. This is now Geschäftsführer-visible.
  • Mittelstand action - Add the trust map as a standing item in the IT steering committee. Review quarterly with Compliance and the Betriebsrat.

Decision 6: Data quality and metadata become a top-three IT investment

  • Old framing - Data quality is a BI problem.
  • New framing - Data quality is the substrate that every agent feeds on. Bad SAP master data, messy SharePoint, unlabelled documents - these directly cap what your agents can do.
  • Mittelstand action - Fund a 90-day data clean-up sprint per major source system in 2026. Pair it with MCP server publication so agents can consume the cleaned data.

Decision 7: Compliance moves left, not right

  • Old shape - Compliance reviews go-live decisions and audits annually.
  • New shape - Compliance is wired into the agent runtime. Every prompt, every tool call, every output is logged with the context the model saw. Article 4 literacy, Article 14 oversight, and audit trails are platform features, not paperwork.
  • Mittelstand action - Fund the observability layer in the platform team. Pre-commit to the audit story before shipping the first production agent.

The Software-3.0-Native Operating Model

The seven decisions above add up to a coherent operating model. Most Mittelstand IT teams already have most of the pieces; the work is recombining them around the LLM as host process and the prompt-plus-context as the new unit of authoring.

The five layers of a Software-3.0 stack

  1. Model layer - The frontier LLMs you use, accessed through a single internal gateway. Multi-provider by default (OpenAI, Anthropic, Google, Mistral, plus a sovereign EU option) with a per-task routing policy. Versioned and observable.
  2. Context layer - The retrieval, MCP servers, tool definitions, prompt registry, and memory store that supply the model with the right inputs at the right time. This is where most of the Mittelstand-specific value lives.
  3. Agent runtime - The orchestration layer that runs multi-step agents, handles retries, enforces guardrails, logs to the observability store, and integrates with human-in-the-loop checkpoints.
  4. Evaluation layer - Frozen eval sets, LLM-judge harnesses, sampling tools, drift detection, regression dashboards. The closest analogue to a CI/CD test suite for non-deterministic systems.
  5. Governance layer - Trust map, AI literacy training, audit logging, EU AI Act mapping, Betriebsrat alignment, BSI considerations. Not a separate function - a horizontal layer cut into all four above.
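The model layer’s routing policy can begin as a plain lookup table behind the internal gateway endpoint. Provider and model names below are placeholders, not recommendations:

```python
# Illustrative per-task routing policy for an internal LLM gateway:
# one endpoint, multiple providers, chosen by task class.
ROUTING_POLICY = {
    "code_generation": {"provider": "anthropic", "model": "claude-large"},
    "translation": {"provider": "openai", "model": "gpt-large"},
    "internal_search": {"provider": "mistral", "model": "mistral-eu"},  # sovereign EU option
}
DEFAULT_ROUTE = {"provider": "openai", "model": "gpt-large"}

def route(task_type: str) -> dict:
    """Resolve a task class to a provider/model pair.
    The gateway logs the choice for cost attribution and observability."""
    return ROUTING_POLICY.get(task_type, DEFAULT_ROUTE)
```

Because every caller hits the gateway rather than a provider SDK, swapping a model is a one-line policy change instead of a migration project - which is the lock-in protection the gateway pattern exists to buy.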

The team that runs it

Three new roles are enough to start, none of them senior to existing ones. A 200-person Mittelstand firm can begin with 1.5 to 2 FTE in total.

  • Platform engineer (agent stack) - Owns the LLM gateway, the agent runtime, the MCP servers, and the evaluation infrastructure. Senior engineer with strong product sense.
  • Spec lead - Senior person from product, operations, or strategy who works with domain experts to write the agent specs. Not necessarily an engineer; must be a structured writer.
  • Evaluator - Builds eval sets, owns the LLM-judge harness, samples production traffic, surfaces regressions. Often a QA engineer or analyst rotated into the role.

Strengths of the Software-3.0 operating model

  • 10x faster iteration on internal tools and agents
  • Domain experts can ship working software directly
  • Multi-provider model strategy survives price drops
  • Compliance is wired in at the platform layer
  • Compounding data quality investments pay across all agents

Where it is harder than it looks

  • Evaluation is the hidden majority of the work
  • Spec design is a permanent skill, not a project
  • Existing IT processes assume deterministic systems
  • Trust map needs continual recalibration as models improve
  • Vendor lock-in risk is real if you skip the gateway pattern

A 12-Month Roadmap for the Mittelstand

The work breaks naturally into four 90-day phases. The total investment for a 200-person firm typically lands at 150,000 to 400,000 euros across the year, with the first measurable production return between months 6 and 9.

Days 0-90: Platform foundations and one production agent

  • Stand up the LLM gateway - One internal endpoint in front of two or more model providers. Logging, rate limits, cost attribution. Two engineering weeks.
  • Publish the first MCP servers - SAP read-only, SharePoint read-only, customer master data read-only. Three engineering weeks.
  • Pick the first production agent - High volume, medium consequence, well-bounded. Spare-parts email triage, supplier-onboarding intake, internal IT helpdesk are common starting points.
  • Build the eval harness - 50 frozen examples, an LLM-judge rubric, a sampling pipeline. Two engineering weeks.
  • Run the first AI literacy training - Article 4 baseline for all staff, deeper module for citizen developers and platform team. Two days of consulting plus internal rollout.
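
The gateway step above can be sketched in a few lines. This is an illustrative routing layer only, not a specific product: the provider names, tasks, and per-call prices are assumptions, and a real gateway would sit in front of actual model APIs with authentication, rate limits, and persistent logging.

```python
from dataclasses import dataclass
import time

@dataclass
class GatewayCall:
    task: str
    provider: str
    cost_eur: float
    timestamp: float

class LLMGateway:
    # Route table: task type -> (provider, illustrative price per call in EUR).
    ROUTES = {
        "email_triage":    ("eu-hosted-model", 0.002),
        "contract_review": ("frontier-model", 0.015),
    }
    DEFAULT = ("frontier-model", 0.010)

    def __init__(self):
        self.log: list[GatewayCall] = []  # audit trail for every call

    def route(self, task: str) -> str:
        provider, cost = self.ROUTES.get(task, self.DEFAULT)
        self.log.append(GatewayCall(task, provider, cost, time.time()))
        return provider

    def cost_by_task(self) -> dict[str, float]:
        # Cost attribution per task, fed by the call log.
        totals: dict[str, float] = {}
        for call in self.log:
            totals[call.task] = totals.get(call.task, 0.0) + call.cost_eur
        return totals

gw = LLMGateway()
gw.route("email_triage")
gw.route("contract_review")
```

The point of the pattern is that routing, logging, and cost attribution live in one internal endpoint, so swapping or adding a provider later is a route-table change, not a rewrite.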

Days 91-180: Three more agents and the trust map

  • Ship three additional production agents - Pick from the 5 most common Mittelstand patterns: customer-service deflection, sales lead enrichment, internal knowledge search, contract review, document extraction.
  • Publish the trust map - First version reviewed with Geschäftsführer, Compliance, and Betriebsrat. Wire it into the agent runtime.
  • Set up the prompt-and-context registry - Version control for system prompts, tool definitions, and example sets. Code-review process for production changes.
  • Run the first spec-writing workshop - Top 30 senior people in the firm, 2 days, real working agents at the end.
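
The registry item above can be made concrete with a minimal sketch. The design is an assumption for illustration, not a specific tool: each production prompt is stored together with its tool definitions and pinned by a content hash, so any change produces a new version id that a code review can catch.

```python
import hashlib
import json

def register(registry: dict, name: str, system_prompt: str, tools: list[str]) -> str:
    """Store a prompt version and return its content-hash version id."""
    entry = {"system_prompt": system_prompt, "tools": sorted(tools)}
    digest = hashlib.sha256(
        json.dumps(entry, sort_keys=True).encode("utf-8")
    ).hexdigest()[:12]
    registry.setdefault(name, []).append({"version": digest, **entry})
    return digest

registry: dict = {}
v1 = register(registry, "spare-parts-triage",
              "Classify incoming spare-parts emails.", ["sap_read"])
v2 = register(registry, "spare-parts-triage",
              "Classify incoming spare-parts emails by urgency.", ["sap_read"])
```

Because the version id is derived from the content, two identical prompts always get the same id and any edit, however small, is visible in the diff.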

Days 181-270: Vibe coding lane and the SaaS audit

  • Stand up the citizen-development sandbox - Quality-gated lane for vibe-coded internal tools. Pair with the existing vibe-coding playbook from the Superkind blog.
  • Run the SaaS absorption audit - Map every SaaS contract against the absorption-risk table. Cancel or consolidate the thin-LLM-wrapper renewals.
  • Tighten evaluation - Expand eval sets, add regression dashboards, add drift detection on production traffic.
  • Refresh the hiring loop - Drop the puzzles, add the agent-orchestration exercise. Roll out to the next two open IT roles.
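
The regression check in the tightened evaluation step can be sketched as follows. This is a hedged illustration: the scores stand in for LLM-judge ratings on the frozen example set, and the tolerance value is an assumption to be tuned per agent.

```python
import statistics

def regressed(baseline: list[float], candidate: list[float],
              tolerance: float = 0.05) -> bool:
    """Flag a new prompt version whose mean judge score drops
    more than `tolerance` below the current baseline."""
    return statistics.mean(candidate) < statistics.mean(baseline) - tolerance

# Illustrative judge scores (a real set would hold 50+ frozen examples).
baseline_scores = [0.90, 0.80, 0.85, 0.95]
candidate_scores = [0.70, 0.75, 0.80, 0.72]
```

A candidate that regresses blocks the rollout; the same comparison run on sampled production traffic doubles as a simple drift detector.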

Days 271-365: Scale and institutionalise

  • Ship the next 5 to 10 production agents - Now mostly built by the spec leads in business units, with platform support.
  • Publish quarterly governance reports - To Geschäftsführer, Compliance, and Betriebsrat. Trust map updates, eval results, incident review.
  • Run the second wave of training - Deeper Article 4 modules, spec-design clinics, model-update briefings.
  • Plan the year-2 roadmap - The basic platform exists. Year 2 is about depth: domain-specific RL fine-tunes, multimodal use cases, sovereign model options, and the agent-native rebuild of the highest-volume internal tools.

12-month minimum viable Software-3.0 stack

  • LLM gateway with 2+ providers and per-task routing
  • 3+ MCP servers exposing core internal data read-only
  • Agent runtime with logging, retries, and HITL hooks
  • Frozen evaluation harness and LLM-judge rubric
  • Prompt-and-context registry under version control
  • Trust map reviewed quarterly
  • 5 to 10 production agents with documented ROI
  • Citizen-development sandbox with quality gate
  • EU AI Act Article 4 training rolled out to all staff
  • Quarterly governance report to Geschäftsführer + Betriebsrat
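
How the trust map gets "wired into" the runtime can be shown with a small sketch. The action names and tiers are illustrative assumptions; the mechanism is the point: every agent action carries a trust tier, and the runtime executes, queues for human approval, or refuses.

```python
from enum import Enum

class Tier(Enum):
    AUTONOMOUS = "autonomous"
    HITL = "human_in_the_loop"
    FORBIDDEN = "never_ai"

# Illustrative trust-map entries; the real map is reviewed quarterly.
TRUST_MAP = {
    "draft_customer_reply": Tier.AUTONOMOUS,
    "issue_credit_note":    Tier.HITL,
    "hr_performance_score": Tier.FORBIDDEN,
}

def dispatch(action: str) -> str:
    # Unknown actions default to human-in-the-loop, never to autonomy.
    tier = TRUST_MAP.get(action, Tier.HITL)
    if tier is Tier.FORBIDDEN:
        raise PermissionError(f"{action} must never be taken by an agent")
    return "execute" if tier is Tier.AUTONOMOUS else "queue_for_approval"
```

Defaulting unknown actions to HITL rather than autonomy is the conservative choice that keeps a new agent safe before its actions are explicitly mapped.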

EU AI Act, GDPR, and the Betriebsrat

Software 3.0 does not get a regulatory free pass. The good news is that the obligations are mostly the same ones every Mittelstand firm is already wrestling with for AI in general - they just need to be wired into the new operating model rather than bolted on later.

EU AI Act

  • Article 4 (AI literacy) - Every employer must ensure adequate AI literacy among everyone using or directing AI tools [25]. In a Software 3.0 world this includes citizen developers, spec leads, agent operators, and the Geschäftsführer. The literacy work is not optional and not delegable.
  • Risk classification - Most Software-3.0 internal tools are limited-risk or minimal-risk. The classification depends on what the tool does, not how it was built. A vibe-coded HR scoring tool is high-risk; a vibe-coded dashboard is not.
  • Article 14 (human oversight) - High-risk systems require designed-in human oversight. The trust map and the HITL hooks in the agent runtime are how this gets implemented in Software 3.0.
  • Implementation timeline - The bulk of high-risk obligations apply from August 2026, with general-purpose AI obligations already in force [26]. Plan agents that touch high-risk decisions accordingly.

GDPR

  • Lawful basis still applies per processing - The fact that an LLM is involved does not change the GDPR analysis of what data is processed, why, and on what legal basis.
  • Data residency matters for sovereign deployments - The German Mittelstand bias toward EU data residency is well-served by the multi-provider gateway pattern. Route sensitive workloads to EU-hosted models, keep US providers for non-sensitive tasks.
  • Logging is now richer - The richer audit trail of Software 3.0 (every prompt, context, and output) is a GDPR feature, not a bug. Structure logs so they support deletion, export, and access requests.
  • Auftragsverarbeitung (DPA) per provider - Each model provider you route through needs its own DPA. Keep the list short and reviewed.
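
The logging point above implies one concrete design decision: key every log record by data subject, so access, export, and erasure requests can be served mechanically. This is an assumed minimal schema for illustration, not a specific product.

```python
from collections import defaultdict

class AuditLog:
    """Agent audit log structured per data subject, so GDPR
    access/export (Art. 15/20) and erasure (Art. 17) requests
    can be answered without grepping free-form logs."""

    def __init__(self):
        self._by_subject: dict[str, list[dict]] = defaultdict(list)

    def record(self, subject_id: str, prompt: str, output: str) -> None:
        self._by_subject[subject_id].append({"prompt": prompt, "output": output})

    def export(self, subject_id: str) -> list[dict]:
        return list(self._by_subject.get(subject_id, []))

    def erase(self, subject_id: str) -> int:
        """Delete all records for a subject; returns how many were removed."""
        return len(self._by_subject.pop(subject_id, []))

log = AuditLog()
log.record("kunde-4711", "Summarise the complaint ...", "The customer reports ...")
```

In production the same index also supports retention policies, since whole-subject deletion is a single keyed operation rather than a scan.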

Betriebsrat

  • Frame Software 3.0 as a productivity programme, not a job cut - The honest case for the Mittelstand is that Software 3.0 closes a structural staffing gap, not that it replaces existing roles. Lead with that framing in the first conversation.
  • Bring the Betriebsrat into the trust map - The trust map is the single artefact that addresses most Betriebsrat concerns at once. Walk through it, refine it, sign it off as a standing document.
  • Carve out employee-data uses - Any agent that touches HR, performance, or attendance data needs explicit Betriebsrat involvement. Keep a separate, shorter approval path for these.
  • Publish the AI policy - One-page document covering allowed tools, forbidden uses, monitoring scope, and escalation. Renew annually.

How Superkind Fits Into the Software 3.0 Stack

Superkind builds custom AI agents for the Mittelstand and enterprise, with a process-first philosophy that fits how German operations teams actually work. In the Software 3.0 stack we typically own the agent runtime, the evaluation harness, the MCP integration into SAP, DATEV, and SharePoint, and the governance scaffolding around the agents we ship.

What we deliver in a Software 3.0 engagement

  • Agent runtime in your environment - Hosted in your tenancy, integrated with your identity provider, with the LLM gateway and observability hooks in place. No black-box SaaS.
  • MCP servers for SAP, DATEV, SharePoint, and your custom systems - Read-only by default, write-enabled per agent under explicit policy. The integration layer is where Mittelstand-specific value lives.
  • Frozen evaluation sets and LLM-judge harnesses - Built around the actual production traffic of the agent, not generic benchmarks. The audit story stands up to BNetzA scrutiny.
  • Trust-map design with your team - Workshop-driven, refined with Compliance and the Betriebsrat, wired into the agent runtime as enforced policy.
  • Spec-writing partnership - We pair with your senior domain experts on the first 5 production specs, then hand the practice over.
  • EU AI Act and GDPR alignment - Article 4 literacy module, Article 14 oversight design, Auftragsverarbeitung paperwork per model provider.
  • Multi-provider model strategy - GPT, Claude, Gemini, Mistral, plus a sovereign EU option. Routing is per task, not per company.
  • 90-day production milestone - First production agent live in 90 days, with documented ROI, trust map, and audit trail.

When Superkind is the right partner

  • You are a 50 to 5,000-person German Mittelstand firm
  • Your IT team is small and the backlog is structural
  • You need agent-native integration into SAP, DATEV, or legacy ERPs
  • Compliance and Betriebsrat alignment matter from day one
  • You want production agents in 90 days, not a 12-month consulting cycle

Where you might prefer a different option

  • You only need a Copilot rollout - the in-house Microsoft channel is fine
  • Your use case is one bounded SaaS feature, not an operating-model shift
  • You have a 50-engineer in-house AI team already - go direct
  • You want a black-box SaaS with no integration into your systems

Decision Framework: Are You Ready for Software 3.0?

A simple decision framework helps a Mittelstand IT leader and Geschäftsführer get to a yes-or-no answer on Software 3.0 within one steering-committee session. Six dimensions, three honest answers each.

Dimension | Not ready | Ready to start | Ready to scale
IT capacity vs backlog | No backlog at all | 2-quarter backlog | 1-year+ backlog
Internal data quality | SAP master data is chaos | Cleanable in 90 days | Already cleaned
Spec-writing capability | No senior writers | 3-5 strong writers | Spec design is institutional
Compliance readiness | No EU AI Act work yet | Article 4 literacy started | Audit trail and trust map exist
Geschäftsführer sponsorship | Sees AI as IT topic | Will sponsor a 12-month programme | Already counts agents in OKRs
Budget posture | No new budget | 150-400K euros in year 1 | 1%+ of revenue committed

Most Mittelstand firms land between “ready to start” and “ready to scale” on most dimensions and below the line on one or two. The right answer is almost never to wait. The right answer is to fix the laggard dimension as part of the first 90 days, not as a precondition.
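
The rule of thumb above can be expressed as a tiny scoring sketch. The dimension names, scores (0 = not ready, 1 = ready to start, 2 = ready to scale), and thresholds are illustrative assumptions that mirror the framework, not a validated model.

```python
def verdict(scores: dict[str, int]) -> str:
    """Map six readiness scores to the framework's yes-or-no answer."""
    laggards = [d for d, s in scores.items() if s == 0]
    if len(laggards) > 2:
        return "fix foundations first"
    if laggards:
        # One or two laggards: start anyway, fix them inside the first phase.
        return "start now, fix in first 90 days: " + ", ".join(laggards)
    return "ready to scale" if min(scores.values()) == 2 else "ready to start"

scores = {
    "it_capacity": 2, "data_quality": 1, "spec_writing": 1,
    "compliance": 0, "sponsorship": 2, "budget": 1,
}
print(verdict(scores))  # start now, fix in first 90 days: compliance
```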

Frequently Asked Questions

What is Software 3.0?

Software 3.0 is the worldview that large language models are a new kind of computer and that natural language is the way we program them. Andrej Karpathy laid it out at the YC AI Startup School in June 2025: Software 1.0 is code humans write, Software 2.0 is neural network weights trained from data, Software 3.0 is prompts in English directing an LLM. All three layers coexist inside modern apps. The strategic implication for the Mittelstand is that the bottleneck for new internal software is shifting from engineering capacity to spec quality.

Does Software 3.0 mean we no longer need software engineers?

No. It means engineers spend less time on boilerplate and more time on spec design, integration, evaluation, and operating the agent layer. Gartner predicts 80 percent of the engineering workforce will need to upskill through 2027. Mittelstand IT teams typically report 30 to 50 percent more capacity for strategic work after the shift, not headcount reductions. The role moves from author to director.

How is Software 3.0 different from vibe coding?

Vibe coding is one consumer-grade expression of Software 3.0 - someone describes an app and ships what the model produces, often without reading the code. Software 3.0 is the broader category. It includes vibe coding for citizen developers, agentic engineering for production systems, and the LLM-as-OS pattern where business logic lives in prompts and tools rather than in compiled code. Vibe coding raises the floor; Software 3.0 changes the building.

Why is the context window the new source code?

In Software 3.0 the LLM is the runtime and the prompt plus its supporting context (system prompt, retrieved data, tools, examples, memory) is the program. Context engineering, not just prompt engineering, is the new craft. Gartner explicitly told CIOs in 2026 to lead the shift to context engineering as prompt engineering fades. For the Mittelstand this means investing in clean internal data, MCP-style structured context delivery, and tool definitions - not just better prompt phrasings.
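
The "context as program" idea can be sketched as an assembly step. The section names and field layout are illustrative assumptions; the point is that the artifact handed to the model is built from several governed sources, not typed fresh each time.

```python
def build_context(system_prompt: str, retrieved: list[str],
                  tools: list[str], examples: list[str]) -> str:
    """Assemble the 'program' for an LLM call from its governed parts."""
    parts = [f"SYSTEM:\n{system_prompt}"]
    if tools:
        parts.append("TOOLS:\n" + "\n".join(f"- {t}" for t in tools))
    if retrieved:
        parts.append("CONTEXT:\n" + "\n".join(f"- {r}" for r in retrieved))
    if examples:
        parts.append("EXAMPLES:\n" + "\n".join(f"- {e}" for e in examples))
    return "\n\n".join(parts)

ctx = build_context(
    "You triage spare-parts emails for a machine builder.",
    ["Order 4711: pump head, delivery overdue"],          # retrieved data
    ["sap_read(order_id)"],                               # tool definition
    ["email -> {urgency: high, order: 4711}"],            # worked example
)
```

Each part comes from a different owner - the system prompt from the registry, the retrieved data from MCP servers, the examples from the eval set - which is exactly why context engineering is an organisational discipline, not a phrasing trick.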

Will Software 3.0 replace our SaaS tools?

Selectively, yes. Some categories with thin logic on top of a model (basic OCR, simple form generators, basic image editors) are already being replaced when a frontier model can do the job inside a single chat session. The pattern Karpathy points to is that lightweight apps lose to capable models. Heavy SaaS that owns workflows, data, and integrations (SAP, DATEV, Salesforce, ERP) is safe for years. The Mittelstand action is to scrutinise renewal of thin-logic SaaS more carefully than thick-workflow SaaS.

Does the EU AI Act apply to vibe-coded internal tools?

The EU AI Act applies to AI systems based on what they do, not how they are built. A Software-3.0 internal tool that screens job applicants is a high-risk AI system regardless of whether it was vibe-coded, agentically engineered, or compiled from C++. Article 4 obliges every German employer to ensure adequate AI literacy among everyone who uses or directs AI tools. The most efficient compliance path is to wire AI governance into the Software 3.0 operating model from day one, not as a retrofit.

Is the German Mittelstand behind on AI adoption?

It depends on the cut. Bitkom reported in February 2026 that 41 percent of German companies actively use AI, up from 17 percent two years earlier, but firms with 500-plus employees are over 60 percent. SMEs are catching up but not on parity. Combined with the structural IT staffing shortage (roughly 149,000 unfilled IT jobs reported by Bitkom Akademie), the gap is large enough that Software 3.0 is one of the few levers that can close it within a single planning cycle.

What does Karpathy mean by jagged intelligence?

Karpathy uses jagged intelligence to describe the fact that the same LLM can refactor a 100,000-line codebase brilliantly and then make a basic logic error a five-year-old would not. Performance spikes on verifiable domains where reinforcement learning has been applied (code, maths) and degrades at the edges. For the Mittelstand this means a domain-by-domain trust map, not a single trust setting. Treat the agent as a brilliant intern with perfect API recall and reliably odd blind spots.

What does the first year of Software 3.0 cost?

The first 12 months typically run 150,000 to 400,000 euros all-in. That covers the LLM gateway and observability, a small platform team (1 to 2 FTE), the agent runtime, governance and compliance work, training, and the budget to convert the first 5 to 10 winning prototypes into production tools. Tooling licences (Cursor, Claude Code, Copilot, model APIs) usually add 50 to 120 euros per active user per month. The payback typically comes from the third or fourth production agent.

Can Software 3.0 agents work with SAP, DATEV, and legacy systems?

Yes, and that is exactly the integration layer where the Mittelstand creates moats. The Software 3.0 pattern is to keep SAP, DATEV, S/4HANA, and the AS/400 as the systems of record, expose their data and actions through MCP servers or wrapped APIs, and let the LLM operate over them. Most production agents in the Mittelstand are 30 percent prompt design, 30 percent integration glue, and 40 percent governance and evaluation.

What does the Geschäftsführer personally need to decide?

Three things, none of which can be delegated. First, declare that English (or German) is now a first-class engineering interface and resource the platform team accordingly. Second, set the trust map for which decisions the agent layer can take autonomously, which need a human in the loop, and which never touch AI. Third, fund the spec-design upskilling for senior people in every department, because spec quality is the new bottleneck. Everything else is execution.

How should we change IT hiring for Software 3.0?

Lean toward generalists with strong taste, judgement, and writing. The hiring signal that holds up is asking candidates to ship a real working project under time pressure with full agent access, then having them defend the design decisions. Whiteboard puzzles are now a poor proxy. Most Mittelstand firms only need to refactor their hiring for two or three roles to start: a platform engineer who runs the agent runtime, a senior product engineer who can lead spec design, and an evaluator who builds the test environments.

Does Software 3.0 devalue our existing IT investments?

Mostly the opposite. The 25-year-old SAP investment, the deep DATEV integration, the cleaned-up SharePoint - these become the substrate the agent layer feeds on. Software 3.0 makes the value of clean, accessible internal data and well-defined APIs go up sharply, because they are now consumable not just by humans but by agents. The investments that lose value are bespoke CRUD apps that wrapped a database in a UI, because that pattern is now hours of vibe coding.

Sources

  1. Andrej Karpathy - Software Is Changing (Again), YC AI Startup School (June 2025)
  2. Andrej Karpathy on X - LLMs are a new kind of computer, and you program them in English (June 2025)
  3. Andrej Karpathy - Software 2.0 (original post)
  4. Latent Space - Andrej Karpathy on Software 3.0: Software in the Age of AI
  5. The New Stack - Vibe Coding Is Passé. Karpathy Has a New Name for the Future of Software
  6. Analytics Drift - Karpathy Declares Vibe Coding Obsolete, Introduces Agentic Engineering at Sequoia AI Ascent 2026
  7. Hugging Face Blog - What is Software 3.0? (Spoiler: You Are Already Using It)
  8. Bitkom - Künstliche Intelligenz in Deutschland Studienbericht 2026
  9. Bitkom Press - Bitkom KI-Studie 2026: 41 Prozent der Unternehmen nutzen KI aktiv
  10. Bitkom Akademie - Rekord-Fachkräftemangel: 149.000 IT-Jobs unbesetzt in Deutschland
  11. Gartner - Top Strategic Technology Trends for 2026
  12. Gartner - Top Predictions for IT Organizations and Users in 2026 and Beyond
  13. Gartner - Lead the Shift to Context Engineering as Prompt Engineering Fades
  14. Gartner - 40% of Enterprise Apps Will Feature Task-Specific AI Agents by 2026
  15. Gartner - Generative AI Will Require 80% of Engineering Workforce to Upskill Through 2027
  16. McKinsey - The State of AI 2025: How Organizations Are Rewiring to Capture Value
  17. McKinsey - The State of Organizations 2026
  18. McKinsey - AI Productivity Gains and the Performance Paradox
  19. IBM Think - What is Context Engineering
  20. Anthropic - Model Context Protocol (MCP) specification
  21. IW Köln - Stellenrückgang in IT-Berufen 2026
  22. DIHK - Skilled Labour Report 2025/2026
  23. OECD Economic Surveys: Germany 2025
  24. ifo Institute - Skilled Worker Shortage in Germany
  25. EU AI Act - Article 4: AI Literacy
  26. EU AI Act - Implementation Timeline
  27. IMD - 2026 AI Trends: What Leaders Need to Know to Stay Competitive
  28. Foundation Capital - Where AI Is Headed in 2026
  29. Optimum Partners - Engineering Management 2026: Structuring an AI-Native Team
Henri Jung

Co-founder of Superkind, where he helps SMEs and enterprises deploy custom AI agents that actually fit how their teams work. Henri is passionate about closing the gap between what AI can do and the value it creates in real companies. He believes the Mittelstand has everything it needs to lead in AI - it just needs the right approach.

Ready to make English a first-class engineering interface in your IT?

We help Mittelstand IT teams design the Software 3.0 operating model and ship the first production agents in 90 days. Talk to Henri about what your stack would look like.

Book a Demo →