
TyrolAI Docs: Why I built my own RAG system for industry
Enterprise RAG sounds like a solved problem. In practice, most off-the-shelf solutions fall apart as soon as they meet GDPR, Active Directory, or the question of who is allowed to see which document. That is why I built TyrolAI Docs.
When people say "RAG system" today, most think of a handful of ChatGPT plugins or a few Python scripts with LangChain. That works fine in a demo. But what happens when an Austrian machine builder with 400 employees wants to make maintenance manuals, SOPs, and quality guidelines searchable via AI? Then the real problems begin.
Data must not leave the EU. Process engineering must not see what HR has. Employees should sign in with their Windows account, not yet another password. An audit log is required because internal audit demands it. And eventually someone asks for an air-gapped deployment because a plant in a remote valley has no internet connection.
At that point, the half-life of most demos drops to a few minutes. That is exactly why I built TyrolAI Docs.
What is TyrolAI Docs?
TyrolAI Docs is an enterprise RAG platform that makes company knowledge searchable via AI chat. Employees ask questions in natural language and get answers with source citations - directly from their own documents.
The target audience is industrial companies with 100 to 5,000 employees. In other words, the mid-sized manufacturers that form the backbone of the Austrian and German economy. Large enough to have a real knowledge problem. Too small to afford a dedicated data science team that builds LangChain pipelines from scratch.
The platform is based on OpenRAG (an open-source project from IBM under the Apache 2.0 license), which I extended with enterprise features that are actually needed in industry.
Why not just ChatGPT Enterprise?
I get this question often. The short answer: because ChatGPT Enterprise sends data to the US and because the access control is too coarse.
The long answer:
GDPR compliance. For many European industrial companies, storing technical documentation or process descriptions in the US is simply not an option. Not because it is illegal, but because internal compliance or the data protection officer will not approve it. TyrolAI Docs runs either in the EU (via Azure OpenAI in EU regions) or fully on-premise with local LLMs through Ollama. Some customers run it on their own server in the server room next to the production line.
Document access by department. In a manufacturing company, not everyone has access to everything. Quality assurance should not be able to search purchasing contracts. Managers should see HR data; machine operators should not. TyrolAI Docs implements this as Document-Level Security: on upload, you define which AD groups have access. The AI only answers questions based on documents the asking user is actually allowed to see.
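To illustrate the idea, here is a minimal sketch of a group-based filter that runs before any text reaches the LLM. It is not the TyrolAI Docs implementation: the `Chunk` class, the `allowed_groups` field, and `visible_chunks` are hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """One indexed text chunk plus the AD groups allowed to read it (hypothetical model)."""
    doc_id: str
    text: str
    allowed_groups: frozenset


def visible_chunks(chunks, user_groups):
    """Keep only chunks whose allowed groups overlap with the user's groups."""
    groups = set(user_groups)
    return [chunk for chunk in chunks if chunk.allowed_groups & groups]


if __name__ == "__main__":
    index = [
        Chunk("hr-001", "Salary bands 2024", frozenset({"HR", "Management"})),
        Chunk("mnt-042", "Bearing replacement procedure", frozenset({"Maintenance", "QA"})),
    ]
    for chunk in visible_chunks(index, ["Maintenance"]):
        print(chunk.doc_id)
```

In a real deployment the same condition would more likely be expressed as a filter inside the search query, so that forbidden chunks are never retrieved in the first place.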
Microsoft SSO with AD group sync. No employee in a 500-person company wants to remember another password. Authentication runs through Entra ID / Azure AD. Active Directory groups sync automatically. When someone changes department, access to documents updates with them.
Audit trail. Every action is logged and HMAC-signed, which makes the log tamper-evident: any later change to an entry can be detected. This sounds like overkill until internal audit asks for the first time which documents a specific employee accessed in the last three months.
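A minimal sketch of HMAC-signed log entries, using only the Python standard library. Two things here are assumptions and not taken from TyrolAI Docs: the entry layout, and the chaining of each signature to the previous one, which is added so that an edited, reordered, or removed-from-the-middle entry breaks verification.

```python
import hashlib
import hmac
import json


def sign_entry(key: bytes, entry: dict, prev_sig: str) -> str:
    """HMAC-SHA256 over the previous signature plus the canonical JSON of the entry."""
    payload = json.dumps(entry, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, prev_sig.encode("ascii") + payload, hashlib.sha256).hexdigest()


def append_entry(log: list, key: bytes, entry: dict) -> None:
    """Append an entry whose signature depends on the entry before it."""
    prev_sig = log[-1]["sig"] if log else ""
    log.append({"entry": entry, "sig": sign_entry(key, entry, prev_sig)})


def verify_log(log: list, key: bytes) -> bool:
    """Recompute every signature in order; any mismatch means the log was altered."""
    prev_sig = ""
    for record in log:
        expected = sign_entry(key, record["entry"], prev_sig)
        if not hmac.compare_digest(expected, record["sig"]):
            return False
        prev_sig = record["sig"]
    return True


if __name__ == "__main__":
    key = b"demo-key-only"  # hypothetical; a real key belongs in a secret store
    log = []
    append_entry(log, key, {"user": "a.huber", "action": "view", "doc": "mnt-042"})
    append_entry(log, key, {"user": "a.huber", "action": "export", "doc": "mnt-042"})
    print(verify_log(log, key))
```

Note that a signature only makes changes detectable. Whoever holds the key can still re-sign a forged log, so key storage matters as much as the signing itself.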
The architecture in three sentences
At its core, TyrolAI Docs consists of a FastAPI backend (Python 3.13), a Next.js frontend (React 19 / TypeScript), OpenSearch 3.2 as a hybrid search engine (full-text plus vector), Langflow for the RAG pipeline, and Docling for document processing with OCR. Nginx sits in front as a reverse proxy with TLS and a WAF; behind it, Celery and Redis handle asynchronous tasks such as indexing and synchronization. The whole system runs in Docker containers and can be deployed via Docker Compose, on AWS EKS or Azure AKS, or on Kubernetes via Helm.
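"Hybrid" here means that one request combines a keyword match with a nearest-neighbor vector search. As a rough sketch, this is how such a request body can look with OpenSearch's `hybrid` query type. The field names `text` and `embedding` are assumptions for this example, and the queries TyrolAI Docs actually sends may differ.

```python
def build_hybrid_query(question: str, question_vector: list, k: int = 5) -> dict:
    """Build a search body that combines a full-text match with a k-NN vector search."""
    return {
        "size": k,
        "query": {
            "hybrid": {
                "queries": [
                    # Lexical part: classic full-text match on the chunk text.
                    {"match": {"text": {"query": question}}},
                    # Semantic part: nearest neighbors of the question embedding.
                    {"knn": {"embedding": {"vector": question_vector, "k": k}}},
                ]
            }
        },
    }


if __name__ == "__main__":
    body = build_hybrid_query("How do I replace the spindle bearing?", [0.12, -0.03, 0.88], k=3)
    print(body["query"]["hybrid"]["queries"][0])
```

OpenSearch expects a search pipeline with a normalization processor to merge the two score lists into one ranking. That pipeline and the embedding step are not shown here.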
For customers who need absolute data sovereignty, there is an air-gap mode with Ollama and local models like Llama 3.1 or Mistral. No internet needed. No data leaves the plant.
Which LLM providers can I use?
The platform is deliberately not locked in to a single provider. Every company has different preferences or constraints. TyrolAI Docs supports:
- Azure OpenAI in EU regions - GPT-4o, GPT-4, GDPR-compliant
- OpenAI directly - if US data residency is acceptable
- Anthropic Claude - Sonnet, Opus, Haiku
- Ollama - local models on your own hardware, fully offline
- IBM WatsonX - for customers with existing IBM infrastructure
Switching between providers is a configuration change, not a rebuild project.
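What "a configuration change" can mean in practice is sketched below. The setting names (`LLM_PROVIDER`, `LLM_MODEL`, `LLM_BASE_URL`), the provider identifiers, and the air-gap check are all hypothetical. They illustrate the pattern and are not the actual TyrolAI Docs configuration schema.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical identifiers for the providers listed above.
SUPPORTED_PROVIDERS = {"azure_openai", "openai", "anthropic", "ollama", "watsonx"}
# Providers that run without an internet connection (assumption: only Ollama).
LOCAL_PROVIDERS = {"ollama"}


@dataclass(frozen=True)
class ProviderConfig:
    provider: str
    model: str
    base_url: Optional[str] = None


def load_provider(settings: dict, air_gapped: bool = False) -> ProviderConfig:
    """Validate the settings and return the provider configuration to use."""
    provider = settings.get("LLM_PROVIDER", "")
    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError(f"Unsupported LLM provider: {provider!r}")
    if air_gapped and provider not in LOCAL_PROVIDERS:
        raise ValueError(f"Provider {provider!r} needs network access; air-gap mode is on")
    return ProviderConfig(
        provider=provider,
        model=settings["LLM_MODEL"],
        base_url=settings.get("LLM_BASE_URL"),
    )


if __name__ == "__main__":
    settings = {
        "LLM_PROVIDER": "ollama",
        "LLM_MODEL": "llama3.1",
        "LLM_BASE_URL": "http://localhost:11434",
    }
    print(load_provider(settings, air_gapped=True))
```

The rest of the application only ever sees a `ProviderConfig`, so moving from a cloud provider to a local model means editing settings, not code.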
Documents and data sources
Knowledge in companies is rarely in one place. It is spread across SharePoint, OneDrive, network drives, SAP, internal wikis, and PDFs on some foreman's server. TyrolAI Docs can connect directly to these sources and sync them automatically:
- OneDrive, SharePoint, S3, Google Drive - with periodic sync
- SAP integration via OData with field mapping and change notifications
- Direct upload of PDFs, Word, Excel, PowerPoint - with OCR for scanned documents
For document processing I use Docling, which cleanly converts complex layouts, tables, and scanned PDFs into text. This is an underrated step: a RAG system is only as good as the quality of the chunks it indexes.
What users get day-to-day
On the frontend side, TyrolAI Docs looks to employees like a ChatGPT-style interface. That is intentional - nobody should need training to use it. Underneath sit a number of features that were repeatedly requested in practice:
- Chat export as PDF, Word, or Markdown for documentation
- Bookmarks for important conversations
- Prompt templates for recurring requests like "summarize this document"
- Document preview with the chunks used and their relevance scores - so users can trace where an answer came from
- Document versioning with diff view
- Thumbs-up / thumbs-down feedback on AI answers for quality improvement
What admins need
The real enterprise difference is on the admin side. Six roles from Viewer to Compliance Officer. Bulk user import via CSV. Token budgets per user, per group, or globally - so that LLM costs stay capped. SCIM 2.0 for automatic user provisioning via Entra ID or Okta. Audit log with HMAC signature. GDPR functions for data export (Art. 15) and deletion (Art. 17).
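The token budgets can be pictured as nested limits that all have to hold at once. Below is a minimal sketch under that assumption. The `TokenBudget` class and `try_spend` function are hypothetical names and not the platform's API.

```python
from dataclasses import dataclass


@dataclass
class TokenBudget:
    """A simple counter: how many tokens are allowed and how many are used."""
    limit: int
    used: int = 0

    def remaining(self) -> int:
        return max(self.limit - self.used, 0)


def try_spend(budgets: list, tokens: int) -> bool:
    """Charge the request to every budget, but only if all of them can cover it."""
    if any(budget.remaining() < tokens for budget in budgets):
        return False  # Nothing is charged when any level would be exceeded.
    for budget in budgets:
        budget.used += tokens
    return True


if __name__ == "__main__":
    user = TokenBudget(limit=1_000)
    group = TokenBudget(limit=1_500)
    company = TokenBudget(limit=10_000)
    print(try_spend([user, group, company], 800))
    print(try_spend([user, group, company], 800))
```

Checking all levels before charging any of them keeps the counters consistent: a request rejected at the group level does not silently consume the user's budget.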
This sounds like a lot, and it is a lot. But this is exactly the point: without these functions, a RAG system cannot be deployed in a regulated industrial environment.
Honest limitations
Because I write this in every blog post and this one should be no exception:
TyrolAI Docs does not replace a domain expert. The AI can search documents and generate answers. It does not replace the experienced maintenance technician who knows why this specific bearing always fails first. It also does not replace the compliance officer who decides which standard applies in which case.
The quality of answers depends on the quality of documents. If maintenance manuals contradict each other, if SOPs have not been updated in three years, if the same procedure exists in three different versions in the system - then even the best RAG system delivers inconsistent answers. A RAG project is often also a documentation cleanup project.
Rollout takes time. Not the technical installation - that is done in a few days. But integrating with existing data sources, defining access rights per department, the initial upload and tagging of relevant documents, training the power users - that takes weeks to months, depending on company size.
It is not the right solution for every use case. For structured data in databases, a classic SQL query or a BI tool is often better. For very specific domain questions, a fine-tuned chatbot might make more sense. RAG is strong when knowledge exists as text documents and employees want to search that knowledge conversationally.
Why an open-source foundation?
The platform is based on IBM's OpenRAG under the Apache 2.0 license. Two reasons for this:
First: do not reinvent every wheel. Smart people at IBM built a solid RAG framework. My value-add is at the points that actually matter in an Austrian industrial company - enterprise security, SAP integration, GDPR functions, document-level access control. Not in the fundamental retrieval mechanism.
Second: trust. A company deploying such a system internally must be able to audit the code. There is no black box on top of what I deliver. The customer can continue to run the system without me if needed.
How it works in practice
TyrolAI Docs is not a closed product that you buy as a license and deploy. It is a foundation on which I build the adaptations that matter for each customer. A typical engagement looks like this:
- A two-week pilot with a bounded use case - for example, all maintenance manuals for a specific machine line.
- Evaluation with real users from the plant.
- Rollout to more document types, departments, and integrations.
- Handover to internal IT for ongoing operation - or ongoing support via a maintenance contract.
If that sounds interesting for your company, tell me your use case in a few lines. I will get back with an honest assessment.