Every Mittelstand company that runs Microsoft 365 is sitting on the same asset: a SharePoint environment that has quietly absorbed ten or fifteen years of contracts, product specifications, technical drawings, quality manuals, training materials, customer emails saved as PDFs, project wikis, and the odd vendor catalogue. Roughly 80 percent of Fortune 500 companies rely on SharePoint, and in Germany it is the default document layer in tens of thousands of mid-sized firms [17]. The content is there. Most employees simply cannot find it.
The idea that AI unlocks this is correct. The way most companies try to do it - turn on Copilot, hope for the best - is not. Microsoft itself is now openly warning customers about oversharing: Copilot does not judge whether access is appropriate; it simply reads what the user is already allowed to read [4]. A decade of convenient “shared with everyone except external users” decisions suddenly becomes a search engine for things nobody ever intended to surface.
This guide explains the workable path. You treat SharePoint as exactly what it is: a goldmine that needs a deliberate shaft, not a bulldozer. A custom AI agent, built on top of the content you already own, with grounded retrieval, permissions respected at query time, metadata enrichment, and a scope that can actually be governed. Ten years of documents become an institutional memory that answers questions instead of a drive letter that stores them.
TL;DR
Your SharePoint already holds the answers - product, quality, finance, legal, HR, and operational knowledge accumulated over a decade of Microsoft 365 use.
Copilot alone is not the same as an agent - it summarises what you show it; it does not reason across sources, respect scopes, or combine SharePoint with ERP or CRM.
Oversharing is the real blocker - a permissions cleanup with SharePoint Advanced Management and Purview is non-negotiable before any broad AI rollout [4,5].
Grounding, chunking, and metadata do the heavy lifting - hybrid retrieval with reranking, semantic chunking, and enriched metadata are what turn raw content into accurate answers [10,11].
A focused 60-day deployment works - one scope, one use case, one measurable KPI; expand only after the first agent proves value.
What’s Already in Your SharePoint
Before deciding what an agent can do with your SharePoint, you have to be honest about what is actually in it. A typical Mittelstand tenant with 300 to 1,500 employees has far more than anyone remembers - and less structure than anyone assumes.
- Contracts and agreements - customer terms, NDAs, supplier agreements, framework contracts, often with multiple versions across sites.
- Product and engineering content - datasheets, technical drawings (often PDFs of CAD exports), bill-of-material documents, change notices, compliance certifications.
- Quality and compliance documentation - ISO manuals, SOPs, work instructions, audit reports, DSGVO records, EU AI Act policies once the deadline bites.
- Finance and controlling - budget templates, forecast decks, board materials, investor updates, month-end close documentation.
- Sales and marketing - proposal templates, pricing sheets, RFP responses, customer case studies, partner enablement, event collateral.
- HR and training - role descriptions, onboarding materials, training modules, internal policies, evaluation templates, Betriebsrat notes.
- Project archives - entire Teams and project sites where the real history of how something got done actually lives.
- The long tail - random screenshots, scanned PDFs from personal printers, legacy Word files, duplicated email attachments, shadow-IT spreadsheets.
Inventory Rule of Thumb
When Mittelstand customers actually measure, the ratio is consistently similar: roughly 20 percent of the content is actively used, 60 percent is referenced occasionally, and 20 percent has not been touched in three years but cannot safely be deleted. An agent project is the best excuse the company has ever had to systematically look at the 80 percent that is not in active use.
| Category | Typical locations | Agent value |
|---|---|---|
| Contracts | Legal site, Sales archive, customer folders | High (clause lookup, obligation extraction) |
| Technical specs | Engineering site, product-line subsites | High (answering customer and support questions) |
| SOPs / QM | Quality site, plant-specific sites | High (compliance queries, shift handover) |
| HR policies | HR site, onboarding pages | Medium (general Q&A, scoped access) |
| Project history | Teams channels, project sites | Medium-High (lessons learned, reuse) |
| Finance templates | Controlling site | Medium (budget guidance, forecast support) |
Why Microsoft 365 Copilot Alone Is Not Enough
Copilot is a useful productivity assistant. It summarises a document you point it at, drafts an email based on a Teams thread, and saves time on routine writing. That is different from an AI agent that answers complex questions by reasoning across hundreds of documents with citations you can trust. Most Mittelstand companies that try to use Copilot as their knowledge base hit the same walls.
Where Copilot stops being enough
- Single-source reasoning - Copilot excels when you point it at a document; it struggles when the answer requires combining clauses from three contracts and a price list.
- No cross-system reach - questions that need SharePoint plus ERP, plus CRM, plus a vendor portal are outside what generic Copilot is designed for.
- Inherited permissions - Copilot respects user permissions, which sounds safe but exposes every legacy oversharing decision already baked into SharePoint [4].
- No scoped agents by default - Copilot does not know which ten documents are the authoritative source for a given question; it sees the whole graph.
- No workflow execution - Copilot answers; it rarely acts. An agent can draft, route, update a record, and close a task.
- Licensing math - at EUR 28-30 per user per month, Copilot becomes expensive well before it becomes widely adopted [8].
| Capability | Microsoft 365 Copilot | Custom Agent on SharePoint |
|---|---|---|
| Single-document summary | Strong | Strong |
| Multi-document reasoning with citations | Limited | Strong (with proper RAG setup) |
| Scoped knowledge base per use case | Basic (agents in Copilot Studio) | Full control |
| Integration with ERP, CRM, MES | Limited (connectors) | Full (any API) |
| Permissions trimming at retrieval | Inherited from SharePoint | Enforced per query, tighter by design |
| Ability to act | Growing but limited | Full (drafts, records, workflows) |
| Pricing | Per seat, per month | Per use case, tied to outcomes |
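The “permissions trimming at retrieval” row deserves a concrete illustration. The sketch below is a minimal, hypothetical version of security trimming: each indexed chunk carries an ACL snapshot, and results are filtered against the asking user's resolved group memberships at query time. All names and the data are illustrative; in production the principals would come from Microsoft Graph, not from hardcoded sets.

```python
from dataclasses import dataclass, field

@dataclass
class Chunk:
    doc_id: str
    text: str
    # ACL snapshot captured at indexing time: principals allowed to read the source
    allowed_principals: set = field(default_factory=set)

def trim_results(chunks, user_principals):
    """Drop every retrieved chunk the asking user cannot read.

    user_principals holds the user's id plus all transitive group memberships,
    resolved at query time (via Microsoft Graph in a real deployment).
    """
    return [c for c in chunks if c.allowed_principals & user_principals]

# Hypothetical index content: one general chunk, one restricted to the legal team
index = [
    Chunk("Engineering/datasheet-a.pdf", "Max torque: 45 Nm", {"all-employees"}),
    Chunk("Legal/nda-acme.pdf", "Termination notice: 90 days", {"legal-team"}),
]
visible = trim_results(index, {"user:anna", "all-employees"})
# Anna, who is not in legal-team, only sees the datasheet chunk
```

The design point: trimming happens on every query, after retrieval and before the model sees anything, so a permission change in SharePoint takes effect without re-indexing the content itself.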
The Practical Answer
Most Mittelstand customers end up running both. Copilot for general productivity where the per-seat licence is justified, custom agents for the two or three workflows where cross-source reasoning, scoped knowledge, and action execution deliver disproportionate value.
The Oversharing Trap (and How to Fix It First)
Oversharing is the single biggest reason SharePoint-based AI projects stall. Microsoft now describes it as one of the most pervasive governance issues in Microsoft 365, amplified the moment Copilot or a custom agent reads the tenant [4,5]. It is not a theoretical problem.
How the problem accumulates
- Default sharing set to everyone - the simplest permission at site creation often becomes the permanent one.
- “Everyone except external users” - the most dangerous group in Microsoft 365; it looks controlled but effectively means everybody internal.
- Broken permission inheritance - site-level permissions do not match folder-level or file-level permissions, so one tightening move does not fix the leaves.
- Legacy external guest access - partners and contractors from years ago still have active links.
- Shadow site creation - every new Team creates a SharePoint site; over years, permissions diverge in ways nobody tracks.
- Sensitivity labels unused - Microsoft Purview labels exist but were never rolled out; sensitive content is indistinguishable from general content.
The cleanup sequence
- Run a permission audit - SharePoint Advanced Management Permission State Reports show exactly which sites, folders, and documents are broadly accessible [4].
- Kill the “everyone except external” group where it does not belong - replace with explicit groups or Microsoft 365 groups tied to actual teams.
- Reinstate inheritance where it was broken unnecessarily - flat, inheritance-based permissions are easier to govern than fine-grained overrides.
- Apply sensitivity labels to clear categories (confidential, internal, public) - the agent can then respect them automatically.
- Use Restricted Content Discovery to block specific high-risk sites from agent access even if individual users still need them [4].
- Set up change monitoring - permission drift is continuous; treat governance as an ongoing process, not a one-off project.
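The audit step in the sequence above can be approximated in code. This is a hedged sketch, not a real audit tool: it assumes a simplified version of the JSON shape that Microsoft Graph returns for driveItem permissions (a full audit would page through `/drives/{id}/items/{id}/permissions` per item), and simply flags grants to tenant-wide groups.

```python
# Groups that effectively mean "everybody internal" - the oversharing red flags
BROAD_PRINCIPALS = {
    "Everyone except external users",
    "Everyone",
}

def flag_oversharing(items):
    """Return (item_path, principal) pairs where a tenant-wide group has access.

    `items` is a list of dicts shaped loosely like Microsoft Graph driveItem
    permission responses (simplified here for illustration).
    """
    findings = []
    for item in items:
        for perm in item.get("permissions", []):
            group = perm.get("grantedToV2", {}).get("group", {})
            if group.get("displayName", "") in BROAD_PRINCIPALS:
                findings.append((item["path"], group["displayName"]))
    return findings

# Illustrative sample data - one overshared HR file, one correctly scoped SOP
sample = [
    {"path": "/sites/HR/Salaries.xlsx",
     "permissions": [{"grantedToV2": {"group": {"displayName": "Everyone except external users"}}}]},
    {"path": "/sites/Quality/SOP-4B.docx",
     "permissions": [{"grantedToV2": {"group": {"displayName": "Quality Team"}}}]},
]
# flag_oversharing(sample) flags only the HR file
```

In practice the SharePoint Advanced Management reports do this at tenant scale; a script like this is mainly useful for spot-checking a single scope before indexing it.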
Critical Sequence
Oversharing cleanup happens before the agent goes live, not after. A custom agent can narrow its own scope, but the underlying SharePoint still needs to be defensible. Rolling an agent out into a messy permissions landscape makes problems that were previously hidden suddenly visible - and a senior executive running a casual query is not how you want to find out.
Oversharing: Act Now vs Delay
Cleanup first (recommended)
- ✓ Safe to expand scope later - add content without re-opening the permission question every time
- ✓ Agent quality improves - less noise in retrieval, better citations
- ✓ Audit-ready - DSGVO and sector compliance demand this regardless of AI
- ✓ Leadership confidence - board can approve a broad rollout without surprises
Skip or defer cleanup
- ✗ Silent leaks become loud - the first accidental HR answer to a general user ends the programme
- ✗ Agent quality suffers - retrieval picks up stale, duplicate, or contradictory content
- ✗ Stalled rollouts - Microsoft itself attributes most stalled Copilot deployments to governance gaps [8]
- ✗ Regulatory exposure - EU AI Act transparency and data-protection obligations compound
Grounding, Chunking, and Metadata: What Actually Makes Retrieval Work
The difference between an agent that gives useful answers and one that invents plausible nonsense is almost never the language model. It is the plumbing underneath: how content is broken up, enriched, retrieved, and grounded. Three choices matter more than the rest.
Chunking: how content is broken into retrievable pieces
- Naive fixed-size chunking - splits by character count; cheap and fast, but cuts sentences and ideas at arbitrary points. Works for trivial cases only.
- Recursive character-based chunking - splits at natural boundaries (paragraphs, sentences) with empirically tuned sizes; the current sweet spot for most Mittelstand content [11].
- Semantic chunking - uses embeddings to group semantically related passages; more expensive but higher precision for dense technical content.
- Heading-aware chunking - respects Word or HTML heading structure; ideal for SOPs, manuals, and documentation that is already well-structured.
- Table and figure handling - tables extracted and stored as JSON with row identifiers and column headers; figures sent through vision models for alt-text generation [11].
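To make the recursive strategy concrete, here is a minimal sketch of a recursive character-based chunker. It tries paragraph breaks first, then lines, sentences, and finally words, and only hard-splits when no boundary works. The separator list and the 800-character default are illustrative assumptions, not tuned values.

```python
def recursive_chunk(text, max_chars=800, separators=("\n\n", "\n", ". ", " ")):
    """Split text at the most natural boundary that keeps chunks under max_chars."""
    if len(text) <= max_chars:
        return [text] if text.strip() else []
    for sep in separators:
        parts = text.split(sep)
        if len(parts) > 1:
            # Greedily pack parts back together up to the size limit
            chunks, current = [], ""
            for part in parts:
                candidate = current + sep + part if current else part
                if len(candidate) <= max_chars:
                    current = candidate
                else:
                    if current:
                        chunks.append(current)
                    current = part
            if current:
                chunks.append(current)
            # Recurse into any piece that is still too large (tries finer separators)
            out = []
            for c in chunks:
                if len(c) > max_chars:
                    out.extend(recursive_chunk(c, max_chars, separators))
                else:
                    out.append(c)
            return out
    # No separator found at all: hard split as a last resort
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]
```

Libraries such as LangChain ship a production version of this idea; the point of the sketch is only that "recursive" means falling through a hierarchy of boundaries rather than cutting at a fixed offset.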
Metadata enrichment: what the agent knows about each chunk
- Source identifiers - site, library, folder, file name, version - non-negotiable for citations.
- Authoring metadata - author, last modified date, department, language.
- Sensitivity labels - inherited from Microsoft Purview; used to filter or warn at answer time.
- LLM-generated metadata - a small model writes a summary, key entities, document type, topic tags for each chunk, substantially improving retrieval precision [12].
- Effective date and expiry - contracts and policies are time-bound; the agent has to know what is current.
- Partition tags - allow queries to be scoped to a specific subset (“finance-reports”, “product-A”) without physical data movement [9].
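Pulled together, the fields above form a per-chunk metadata record. The sketch below is one plausible shape for such a record (field names are our assumptions, not a standard schema), including the effective-date check that keeps expired contracts out of answers.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class ChunkMetadata:
    # Source identifiers - non-negotiable for citations
    site: str
    library: str
    file_name: str
    version: str
    # Authoring metadata
    author: str
    last_modified: date
    # Governance
    sensitivity_label: str                  # inherited from Purview, e.g. "confidential"
    partition: str                          # scope tag, e.g. "finance-reports"
    # LLM-generated enrichment, filled by a small model at indexing time
    summary: str = ""
    document_type: str = ""
    topic_tags: tuple = ()
    effective_until: Optional[date] = None  # contracts and policies are time-bound

    def is_current(self, today: date) -> bool:
        return self.effective_until is None or today <= self.effective_until

meta = ChunkMetadata(
    site="Legal", library="Contracts", file_name="framework-acme-2023.pdf",
    version="3.0", author="j.schmidt", last_modified=date(2023, 6, 1),
    sensitivity_label="confidential", partition="contracts",
    effective_until=date(2026, 5, 31),
)
# After the expiry date, the retriever can down-rank or exclude the chunk
```

In a store like Azure AI Search these fields become filterable index attributes, which is what makes the metadata filtering described below possible.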
Retrieval: how the agent finds the right chunks
- Vector-only retrieval - semantic similarity, good for conceptual queries, weak for exact identifiers like contract or part numbers.
- Keyword-only retrieval (BM25) - strong for exact terms, blind to synonyms and paraphrases.
- Hybrid retrieval - both combined; now the baseline, not a luxury. Around 80 percent of enterprise RAG systems use this pattern [11].
- Reranking - a smaller model reorders top results by true relevance; essential once the corpus exceeds 1 million chunks [11].
- Metadata filtering - pre-filters by author, date, sensitivity, or partition before the semantic search runs.
- Multi-query expansion - the agent reformulates the user’s question several ways and unions the results for better recall.
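One common way to combine the keyword and vector result lists in hybrid retrieval is reciprocal rank fusion (RRF), the same model-free fusion Azure AI Search uses for its hybrid mode. The sketch below shows the core of it; the document ids are made up for illustration.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked result lists (e.g. BM25 and vector) into one.

    Each ranking is a list of document ids, best first. A document scores
    1/(k + rank) per list it appears in, so documents that both retrievers
    agree on rise to the top. k=60 is the commonly used default.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["contract-2024.pdf", "pricelist.xlsx", "contract-2019.pdf"]
vector_hits = ["pricelist.xlsx", "contract-2024.pdf", "sop-handover.docx"]
fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
# The two documents both retrievers found lead the fused list; a reranker
# would then reorder this shortlist by true relevance
```

RRF only fuses; the reranking step in the list above is a separate model pass over the fused shortlist.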
| Layer | Weak choice | Baseline choice | Production choice |
|---|---|---|---|
| Chunking | Fixed-size | Recursive character-based | Heading-aware + semantic |
| Metadata | Source only | Source + author + date | Source + author + date + sensitivity + LLM tags |
| Retrieval | Vector only | Hybrid (vector + BM25) | Hybrid + rerank + metadata filter |
| Grounding | Implicit | Cited once per answer | Inline citations + source previews |
| Freshness | Batch weekly | Batch daily | Event-driven incremental |
“SharePoint content powers the semantic foundation of Microsoft 365, enabling Copilot and agents to reason over contextualised documents and sites with industry-leading semantic index and RAG architecture.”
- Microsoft 365 Blog, SharePoint at 25 [1]
See an agent running on your SharePoint
Book a 30-minute call. We will sketch the scope, oversharing fixes, and first use case together.

Beyond “Chat With Your Documents”
“Chat with your documents” is the demo that sells the idea and underwhelms in reality. The useful agent pattern is different: an agent that knows which sources are authoritative for a question, plans multiple retrievals, combines SharePoint with other systems, and takes the next action once the answer is clear.
The four capabilities that separate agents from search
- Scope awareness - the agent knows that product warranty questions live in a specific site, HR policy in another, finance templates in a third. It does not dilute retrieval across the whole tenant.
- Multi-step planning - it decomposes a complex question, issues multiple retrieval calls, reflects on intermediate answers, and recovers when the first attempt returns weak results [10].
- Cross-system grounding - it combines SharePoint content with ERP records, CRM context, and external APIs where needed, citing each source separately.
- Action execution - once the answer is trustworthy, it drafts the email, updates the record, creates the task, or triggers the workflow, with human-in-the-loop checkpoints for anything high-impact.
A realistic flow
A regional sales manager asks: “What’s the most recent pricing we quoted to Kunde X for product line B, and is that still within our current discount policy?” A classic Copilot answer looks at whichever document is nearest. A proper agent:
- Retrieves the most recent quote PDF from the customer folder in SharePoint.
- Retrieves the current discount policy from the sales governance site.
- Looks up the customer in CRM to confirm status and classification.
- Reconciles quote, policy, and status - flags any policy deviation explicitly.
- Returns a cited summary; offers to draft a follow-up email that remains within policy.
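The five steps above can be sketched as an orchestration function. Everything here is hypothetical scaffolding - the connector names, the data, and the policy numbers are invented to show the shape of the flow, with stubs standing in for the Graph and CRM clients a real agent would call.

```python
from types import SimpleNamespace

def answer_pricing_question(customer, product_line, sharepoint, policy_site, crm):
    """Run the five-step flow above with pluggable connectors."""
    quote = sharepoint.latest_quote(customer, product_line)       # 1. newest quote PDF
    policy = policy_site.current_discount_policy(product_line)    # 2. governing policy
    account = crm.lookup(customer)                                # 3. CRM status check
    deviation = quote["discount"] > policy["max_discount"]        # 4. reconcile + flag
    return {                                                      # 5. cited summary
        "customer_status": account["status"],
        "policy_deviation": deviation,
        "citations": [quote["source"], policy["source"]],
    }

# Stub connectors with illustrative data (hypothetical paths and figures)
sharepoint = SimpleNamespace(latest_quote=lambda c, p: {
    "discount": 0.18, "source": "Sales/KundeX/quote-2025-11.pdf"})
policy_site = SimpleNamespace(current_discount_policy=lambda p: {
    "max_discount": 0.15, "source": "SalesGovernance/discount-policy-v7.docx"})
crm = SimpleNamespace(lookup=lambda c: {"status": "A-customer"})

result = answer_pricing_question("Kunde X", "B", sharepoint, policy_site, crm)
# policy_deviation is True here: 18% quoted against a 15% cap - exactly the
# deviation the flow is meant to surface, with both sources cited
```

The point of the structure: each step returns its own citation, so the final answer can show which document supports which part of the claim.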
What This Buys You
Saved time is only part of the return. The larger win is that institutional memory becomes queryable. A new sales manager answers a question as well as a 15-year veteran because the 15 years of context are actually reachable, not stuck in a hallway conversation.
5 Concrete Use Cases for the Mittelstand
These are the five use cases where a SharePoint-grounded agent reliably pays for itself inside a year. All of them assume the oversharing cleanup is done, the scope is defined, and the retrieval layer is production-grade.
1. Technical customer support with product knowledge
- Content sources - datasheets, manuals, FAQs, past support tickets, engineering change notices.
- What the agent does - answers a support engineer’s question with citations so they can respond to the customer in minutes instead of half a day.
- Typical ROI - 30-50 percent reduction in average handling time; measurable first-contact resolution improvement.
- Governance - moderate; content is non-confidential but product-specific accuracy matters.
2. Contract clause and obligation lookup
- Content sources - customer and supplier contract libraries, framework agreements, NDAs.
- What the agent does - answers “what are our termination rights in the Müller contract?” with the clause, the effective date, and a link to the signed PDF.
- Typical ROI - legal and sales save weeks of aggregate lookup time per quarter; risk of missed obligations drops.
- Governance - scoped access to legal-approved users; sensitivity labels strictly respected.
3. Quality and compliance answering
- Content sources - ISO manuals, SOPs, work instructions, audit logs, CAPA records.
- What the agent does - a machine operator asks “what is the current torque spec for part 4B?” and gets the exact line from the current SOP with the revision number.
- Typical ROI - fewer errors on the shop floor; shorter audit preparation; institutional knowledge preserved before retirements.
- Governance - high; regulated industries require traceable citations and revision control.
4. RFP and proposal assembly
- Content sources - past RFP responses, case studies, capability statements, pricing sheets.
- What the agent does - drafts a first-pass response to a new RFP question using the company’s own approved language, with citations back to the source documents.
- Typical ROI - 50-70 percent reduction in first-draft time; higher consistency across proposals.
- Governance - scoped to sales; human review remains mandatory.
5. Onboarding and internal knowledge queries
- Content sources - HR policies, onboarding wikis, product handbooks, org charts.
- What the agent does - answers “how do I book a training?” or “who owns the Salzgitter customer relationship?” without anyone having to find the right SharePoint page.
- Typical ROI - faster time-to-productivity for new hires; lower load on HR and admin teams.
- Governance - general access with sensitivity labels preventing leakage into confidential HR content.
| Use Case | Primary Benefit | Typical Payback | Governance Weight |
|---|---|---|---|
| Technical support | 30-50% faster handling | 3-6 months | Medium |
| Contract lookup | Risk + time savings | 3-9 months | High |
| Quality & compliance | Fewer errors, audit-ready | 6-12 months | Very High |
| RFP assembly | 50-70% first-draft time saved | 3-6 months | Medium |
| Internal Q&A | HR load down, faster onboarding | 6-9 months | Low-Medium |
The 60-Day Playbook
SharePoint agents can be deployed faster than projects that require new data infrastructure, because the content and the authentication layer already exist. Here is how a successful 60-day first deployment actually runs.
Phase 1: Scope and cleanup (Weeks 1-3)
- Week 1: Use case and scope - pick one use case (support, contracts, quality) and one content scope (specific sites and libraries). Resist the urge to index everything.
- Week 2: Oversharing audit - run SharePoint Advanced Management reports on the selected scope, fix permissions, apply labels, enable Restricted Content Discovery where needed [4].
- Week 3: Content curation - identify authoritative versus archival content, deprecate obvious duplicates, confirm sensitivity classifications.
Phase 2: Build and ground (Weeks 4-7)
- Week 4: Indexing pipeline - configure chunking, embeddings, metadata enrichment, partition tags. Decide on Azure AI Search or an equivalent store.
- Week 5: Agent logic - assemble planning, retrieval, reranking, and citation behaviour. Integrate any cross-system sources (ERP, CRM) that the use case needs.
- Week 6: Internal testing - run against a fixed set of real questions from the target team. Measure retrieval precision, answer quality, and citation accuracy.
- Week 7: Refinement - close the obvious gaps: missing metadata, wrong chunking for specific document types, policy edges that need explicit handling.
Phase 3: Rollout and measure (Weeks 8-9)
- Week 8: Pilot launch - go live with the target team. Add feedback mechanisms to every answer. Monitor usage and citation health daily.
- Week 9: Measurement - compare to the baseline set in Phase 1 (handling time, lookup time, error rate). Present to leadership. Scope the next use case or the next content expansion.
Go-Live Readiness Checklist
- Permission audit complete on the target scope, oversharing fixes applied
- Sensitivity labels mapped for all relevant content categories
- Restricted Content Discovery configured for high-risk sites
- Chunking strategy validated against sample documents from each document type
- Hybrid retrieval with reranking running and measured
- Inline citations back to the source document with revision visible in every answer
- Incremental re-indexing on SharePoint change events working end-to-end
- Baseline KPIs captured (handling time, lookup time, error rate, escalation rate)
- Feedback mechanism in the UI captures user rating on each answer
- Betriebsrat informed where the agent touches employee-related content
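The “incremental re-indexing on change events” item in the checklist can be sketched as a small planning function. The input mimics, in simplified form, the entries a Microsoft Graph delta query returns for a document library (deleted items carry a `deleted` facet; folders carry a `folder` facet); the real pipeline would then execute the resulting actions against the search index.

```python
def plan_index_actions(delta_items):
    """Translate drive change events into index operations.

    Returns ("upsert", id) for new or modified files and ("delete", id) for
    removed ones, skipping folders. Item dicts are a simplified stand-in for
    Graph delta responses.
    """
    actions = []
    for item in delta_items:
        if "deleted" in item:
            actions.append(("delete", item["id"]))
        elif item.get("file"):           # only files get chunked and indexed
            actions.append(("upsert", item["id"]))
    return actions

# Illustrative delta batch: one changed PDF, one deletion, one folder
delta = [
    {"id": "1", "file": {"mimeType": "application/pdf"}},
    {"id": "2", "deleted": {"state": "deleted"}},
    {"id": "3", "folder": {}},
]
# plan_index_actions(delta) yields an upsert for item 1 and a delete for item 2
```

Running this on every delta poll (or webhook notification) is what keeps the “new documents searchable within minutes” promise honest.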
DSGVO, EU AI Act, and Betriebsrat
A SharePoint agent inherits the compliance obligations of the content it reads. In a European Mittelstand context, three frameworks matter most. None of them are blockers, but all three need to be planned in from the start.
DSGVO
- Content stays in tenant - agents retrieve from your Microsoft 365 tenant; the index can live in your Azure subscription in a European region.
- LLM processing location - Azure OpenAI in EU regions or European model providers (Mistral, Aleph Alpha) keep inference inside EU jurisdiction.
- Legal basis - documented per use case; legitimate interest typically covers operational Q&A; employee-facing processes need a dedicated legal basis.
- Subprocessor review - any external LLM provider is listed in the Verarbeitungsverzeichnis; DPAs signed; data residency verified.
- Right to explanation - citations and retrievable audit logs satisfy the practical requirement; the agent can show exactly which documents produced which answer.
EU AI Act
- Risk classification - most SharePoint knowledge agents fall under minimal or limited risk; transparency (“this is an AI agent”) is the main obligation [18].
- Article 4 literacy - users and admins need documented AI literacy training (the Article 4 obligation has applied since February 2025); the agent project is a natural moment to roll this out.
- High-risk scenarios - HR-related agents (hiring, evaluation), safety-critical knowledge, or credit scoring content push the agent into high-risk territory; conformity assessment applies.
- Record keeping - audit logs and source citations map directly to the Act’s documentation expectations.
Betriebsrat
- Co-determination triggers - any agent that touches employee-facing processes (HR queries, evaluation, internal comms, monitoring) requires a Betriebsvereinbarung.
- Operational agents are lighter - a support-desk or contract-lookup agent usually needs information and consultation rather than formal agreement.
- Early engagement accelerates approval - scoped agents with clear audit trails, documented data sources, and defined human-in-the-loop rules are dramatically easier to approve than open-ended pilots.
- Template agreement - structure the Betriebsvereinbarung around scope, permitted actions, human review requirements, retention, and evaluation KPIs; it mirrors the technical design of the agent.
Unified Governance
The same audit log satisfies DSGVO right-to-explanation, EU AI Act record-keeping, and Betriebsrat oversight. Design the log once with all three audiences in mind and you avoid running parallel compliance programmes.
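One possible shape for such a three-audience log record is sketched below. The field names and values are our assumptions; the point is that a single append-only record can carry the DSGVO explanation (which sources produced the answer), the AI Act documentation (which model version ran), and the Betriebsrat evidence (whether a human reviewed the output).

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class AuditRecord:
    timestamp: str            # ISO 8601, e.g. "2026-01-15T09:30:00Z"
    user_id: str              # pseudonymised where the Betriebsvereinbarung requires it
    question_hash: str        # hash instead of raw text if queries may contain personal data
    retrieved_sources: list   # document paths + revisions -> DSGVO right to explanation
    model_version: str        # -> EU AI Act record keeping
    human_review: bool        # -> Betriebsrat human-in-the-loop evidence

record = AuditRecord(
    timestamp="2026-01-15T09:30:00Z",
    user_id="u-4711",
    question_hash="sha256:ab12...",
    retrieved_sources=["Quality/SOP-4B.docx@rev12"],
    model_version="gpt-4o-2024-11",
    human_review=False,
)
log_line = json.dumps(asdict(record))  # one JSON line per query, append-only
```

Whether to store the question text, a hash, or nothing is itself a Betriebsvereinbarung decision; the schema just needs a slot for whichever option is agreed.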
“A RAG system can only be as good as the data it queries - outdated, contradictory, or poorly structured documents will produce problematic answers.”
- Keerok, Enterprise RAG: Building an AI Knowledge Base in 2026 [10]
How Superkind Fits
Superkind builds custom AI agents that sit on top of your existing stack, and SharePoint is one of the most common content sources we integrate. The principle is the same as with ERP and CRM: we do not move your content, we do not replace your tools, we put an agent layer on top and make the content you already own actually usable.
- Scope before scale - we start with one use case and one content scope; the second always reuses the infrastructure of the first.
- Oversharing cleanup included - our deployments start with a permission audit of the target scope, using SharePoint Advanced Management where available.
- Grounded retrieval by design - hybrid retrieval, heading-aware and semantic chunking, LLM-generated metadata enrichment, reranking; no toy demos that fall over in production.
- Cross-system agents - SharePoint plus ERP plus CRM plus whatever else the use case actually requires. Not limited to Microsoft’s ecosystem.
- European-first deployment options - Azure OpenAI in EU regions, Mistral or Aleph Alpha for sovereign scenarios, Azure AI Search or equivalent stores inside your tenant.
- Outcome-based pricing - per use case, tied to measurable outcomes, not per-seat contracts that cost regardless of adoption.
- Compliance built in - DSGVO-aligned architecture, EU AI Act-ready documentation, Betriebsrat-friendly scope and audit trail templates.
- Continuous improvement - we do not hand over and disappear; the agent gets sharper with usage and your team shapes it through feedback.
| Approach | M365 Copilot Alone | Superkind Custom Agent |
|---|---|---|
| Scope control | Whole tenant, limited partitioning | Scoped per use case |
| Cross-system reach | Connectors with limits | Any API or system |
| Retrieval control | Semantic index managed by Microsoft | Full control: chunking, metadata, retrieval, rerank |
| Pricing | Per seat / month | Per use case, outcome-based |
| Governance | Inherits tenant permissions | Tenant + per-scope + per-agent controls |
| Ability to act | Limited outside Microsoft apps | Full action execution with human-in-the-loop |
Superkind
Pros
- ✓ Scoped agents - narrow, governable, measurable
- ✓ Cross-system capability - SharePoint plus anything else the workflow needs
- ✓ Production-grade retrieval - hybrid, reranked, metadata-rich
- ✓ Outcome-based pricing - per use case, not per seat
- ✓ DSGVO- and AI-Act-ready - European deployment options, documented audit trail
Cons
- ✗ Not self-serve - requires engagement with our team
- ✗ Not a replacement for Copilot - most customers run both
- ✗ Oversharing work required - we will not index a messy scope just to make a deadline
- ✗ Capacity-limited - we work with a focused number of clients at a time
Decision Framework
Not every company is ready for a SharePoint agent today. Here is how to tell.
| Signal | What it means | Action |
|---|---|---|
| 10+ years on SharePoint with active content creation | High-value knowledge base candidate | Start with a single scope; plan the oversharing cleanup in parallel |
| Employees rely on tribal knowledge to find documents | Classic retrieval gap that agents solve cleanly | Prioritise use cases where the same question is asked many times a week |
| You tried Copilot and adoption stalled | Usually oversharing or scope problems, not the model | Fix permissions, define scope, add an agent for the concrete workflow |
| Regulatory or audit pressure is rising | Citations and traceable retrieval become compliance tools | Start with quality or compliance content; audit log becomes dual-purpose |
| Skilled employees are retiring in the next two years | Institutional memory is about to walk out the door | Agent project with heavy veteran involvement to codify tacit knowledge |
| SharePoint is a mess with no governance in place | Agent project will hit oversharing problems fast | Start with cleanup as a dedicated phase, not parallel to the agent build |
Act Now vs Wait
Act now
- ✓ Content is already there - no migration, no platform change, fast time-to-value
- ✓ Oversharing cleanup is overdue anyway - the agent project finally funds it
- ✓ Early adopters compound - a second use case on the same foundation is cheap
- ✓ Institutional memory preserved - critical before retirements hit
Wait
- ✗ Content keeps growing - the cleanup gets harder, not easier
- ✗ Competitors pull ahead - same SharePoint, better usable knowledge
- ✗ Employees turn to ChatGPT with copy-paste - shadow AI risk rises
- ✗ Regulation compounds - AI Act deadlines arrive regardless
Frequently Asked Questions
Do we have to migrate our documents out of SharePoint?
No. A well-designed agent reads directly from your existing SharePoint and OneDrive through the Microsoft Graph API or Microsoft Search APIs. Documents stay where they are. The agent builds an index alongside SharePoint that is refreshed when documents change, but nothing gets moved or duplicated into a third-party store unless you choose that pattern.
Will an agent expose documents employees should not see?
Not if the agent is built to respect SharePoint permissions at retrieval time. Every request is executed in the context of the user asking, so results are trimmed to what that user already has access to. The risk is permissions being too wide in SharePoint itself. That is why oversharing cleanup is the first step, not the last.
Should we use Microsoft 365 Copilot or a custom agent?
Copilot works well for single-document summarisation and simple drafting. For use cases that require reasoning across many documents, combining SharePoint with external systems (ERP, CRM), or grounding with strict source citations, a custom agent usually delivers better results. Many Mittelstand companies end up running both, with Copilot for general productivity and custom agents for high-value workflows.
Do we need to clean up ten years of content before starting?
The agent works on what is there, but messy data produces messy answers. The cleanup does not have to be perfect or finished before the agent goes live. A narrow first use case surfaces exactly which subset of content matters, and the curation happens alongside deployment rather than as a multi-year prerequisite.
Can the agent handle scanned PDFs and technical drawings?
Modern agents combine OCR for scanned PDFs and vision models for images. Results vary by document quality. Typed PDFs and structured Word or Excel documents work extremely well. Low-resolution scans, handwriting, or pure technical drawings may need additional processing. The agent can still surface the document and let a human open the original when needed.
How quickly do new or changed documents show up in answers?
Incremental re-indexing runs on SharePoint change events, so new or modified documents become searchable within minutes. Old versions can be removed or retained depending on your policy. The agent always cites the source document and revision, so users can see whether they are looking at the latest version.
How do we keep sensitive content like HR files out of the agent?
Sensitive site collections should be excluded from the agent index entirely or scoped to a separate agent with its own access control. Most deployments start with general knowledge content and add sensitive scopes only when explicitly needed. Microsoft Purview sensitivity labels and Restricted Content Discovery can help enforce boundaries.
Can the agent combine SharePoint with our ERP and CRM?
Yes. This is the main reason to build a custom agent rather than rely on Copilot alone. The agent can retrieve a contract from SharePoint, check payment status in SAP, look up the customer in CRM, and return a combined answer with citations. Cross-system reasoning is where agents deliver disproportionate value.
What does a SharePoint agent project cost?
A focused deployment on a single content scope and a single use case typically runs EUR 40-120K in year one, including oversharing cleanup, indexing, agent logic, and team enablement. Costs scale with the breadth of content, the number of use cases, and any integrations with other systems. Running costs are dominated by Azure OpenAI or equivalent inference charges and grow with usage.
Is this DSGVO-compliant?
Yes, when designed correctly. Content stays inside your Microsoft 365 tenant, processing purposes are documented, access controls are inherited from SharePoint, and audit logs record every query. If the agent uses an external LLM, data residency and subprocessor agreements need to be reviewed. Many Mittelstand customers choose Azure OpenAI in a European region or a sovereign European model for this reason.
Do we need the Betriebsrat's approval?
If the agent touches employee-related processes (HR documents, evaluations, communication) then yes, co-determination applies and a Betriebsvereinbarung is required. For purely operational knowledge bases (product specs, quality documents, technical SOPs) the obligation is lighter but early engagement is still wise. A narrow scope and clear audit trail make Betriebsrat approval considerably faster.
What is the biggest mistake to avoid?
Trying to index everything at once. A sprawling initial scope hides quality problems, explodes the cost of oversharing cleanup, and dilutes the first-use-case ROI. The companies that succeed pick a single content scope, ship a single use case, demonstrate measurable value, and then expand. The second use case is always faster and cheaper than the first because the foundation already exists.
Further Reading
- AI Agents on Top of Legacy - the same layering approach for ERPs beyond SharePoint.
- AI Agents vs Microsoft Copilot - when a custom agent outperforms off-the-shelf Copilot.
- Shadow AI in the Mittelstand - why employees copy-paste to ChatGPT when internal tools fall short.
- Your AI Is Only as Good as Your Data - the data prerequisites that make or break retrieval.
- AI Agents for the Mittelstand - the foundational playbook for SME AI deployment.
- AI as a Compliance Assistant - related use case for audit and policy work.
- Sovereign AI for the Mittelstand - when European-only deployment becomes a requirement.
Sources
1. Microsoft 365 Blog - SharePoint at 25: Global Enterprise Knowledge in the AI Era
2. Microsoft Learn - Configure a Secure and Governed Foundation for Microsoft 365 Copilot
3. Microsoft Learn - Microsoft 365 Copilot Blueprint for Oversharing
4. Microsoft Tech Community - Mitigate Oversharing to Govern Microsoft 365 Copilot and Agents
5. Microsoft Tech Community - Oversharing Control at Enterprise Scale with Purview
6. Microsoft Learn - Microsoft 365 Copilot Data and Compliance Readiness
7. Computerworld - Microsoft Moves to Stop M365 Copilot From Oversharing Data
8. 2toLead - Microsoft 365 Copilot Governance in 2026: Why Deployments Stall Without It
9. Ragie - Indexing Enterprise Documents: Integrating SharePoint for RAG
10. Data Nucleus - RAG in 2025: The Enterprise Guide
11. arXiv - Advancing Retrieval-Augmented Generation for Structured Enterprise and Internal Data
12. arXiv - A Systematic Framework for Enterprise Knowledge Retrieval with LLM-Generated Metadata
13. Microsoft Learn - Retrieval Augmented Generation in Azure AI Search
14. Microsoft Learn - SharePoint Embedded Agent Advanced Topics
15. Microsoft Learn - Retrieval Augmented Generation in Microsoft Copilot Studio
16. Orchestry - 2025 SharePoint Document and File Management Guide for Admins
17. Jobera - SharePoint Statistics, Facts and Trends 2025
18. EU AI Act - Implementation Timeline
19. Bitkom - Durchbruch bei Künstlicher Intelligenz
20. Squirro - RAG in 2026: Bridging Knowledge and Generative AI
Ready to turn your SharePoint into real institutional intelligence?
Book a 30-minute call with Henri. We will sketch the scope, the oversharing fixes, and the first use case together - no sales pitch, no commitment.
Book a Demo →
