Designing an MCP Server for Unstructured Data


Introduction

Agents are most valuable when connected to your data. However, most data is unstructured: PDFs, Word documents, emails, etc. Unlike structured databases, unstructured documents cannot be queried directly. Agents often have to reread entire documents to recover context, wasting both time and tokens. And once your agent has read a document, that context is often local to your machine or even to that agent.

🧶 Ariadne processes unstructured documents into a searchable vector index and exposes them through a Model Context Protocol (MCP) server. This allows you to save tokens and share context between agents and teammates.

Let’s walk through the design of Ariadne.

Requirements

The most important functional requirements are:

Users can upload any number of documents and expect all of their context to be faithfully captured and durably stored.
Users can connect any MCP client to the MCP server to access their context.
The MCP server serves full documents retrieved by identifier, and chunks retrieved through semantic similarity search.

The non-functional requirements are:

The architecture should not preclude future horizontal scaling, although the current implementation targets a local single-machine deployment.
Document ingestion is expected to be asynchronous and may take several seconds. In contrast, the MCP server should return tool results quickly, ideally in under 500 ms, so that downstream agents using Ariadne remain responsive.
The target deployment is a single local machine that cannot host large LLMs.
Document collections are assumed to remain relatively small, which makes a local vector database practical while also improving semantic search quality by limiting the search space.

Data Model

There are only two pieces of information to retrieve: full documents and semantic chunks. These entities are modeled separately because they have different access patterns. Full documents are retrieved by identifier, while chunks are retrieved through semantic similarity search.

ChromaDB supports both access patterns and is a good fit for the initial implementation. It is lightweight, easy to deploy locally, persists to disk, and provides metadata filtering alongside vector similarity search. In ChromaDB, data is grouped into collections, and each record in a collection has an id, a document, an embedding, and optional user-defined metadata. Ariadne uses these primitives to model its two entities as collections:

Document represents the full text content of a document. Its only metadata is the document name, and it does not need an embedding because it is only retrieved by id or by name, never by similarity search.
Chunk represents a chunk of text content from a document. Its only metadata fields are document_id, which refers to its parent record in Document, and chunk_idx, its position among the parent document's chunks.
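To make the two record shapes concrete, here is a minimal Python sketch under stated assumptions. The dataclasses `DocumentRecord` and `ChunkRecord`, the helpers `make_records` and `reassemble`, and the `doc_id:idx` chunk-id scheme are hypothetical illustrations, not Ariadne's code. The sketch shows how chunk_idx lets a consumer restore a document's reading order from its chunks.

```python
from dataclasses import dataclass, field


# Hypothetical record types mirroring Ariadne's two collections. A ChromaDB
# record pairs an id with a document string, an embedding, and a flat metadata
# dict; these dataclasses keep the id, text, and metadata and omit the embedding.
@dataclass
class DocumentRecord:
    id: str
    text: str  # full serialized document
    metadata: dict = field(default_factory=dict)  # {"name": ...}


@dataclass
class ChunkRecord:
    id: str
    text: str  # one chunk of the parent document
    metadata: dict = field(default_factory=dict)  # {"document_id": ..., "chunk_idx": ...}


def make_records(doc_id: str, name: str, text: str, chunks: list) -> tuple:
    """Build the Document record and its Chunk records for one processed document."""
    document = DocumentRecord(id=doc_id, text=text, metadata={"name": name})
    chunk_records = [
        ChunkRecord(
            id=f"{doc_id}:{idx}",  # hypothetical id scheme: parent id + position
            text=chunk,
            metadata={"document_id": doc_id, "chunk_idx": idx},
        )
        for idx, chunk in enumerate(chunks)
    ]
    return document, chunk_records


def reassemble(chunk_records: list) -> list:
    """Restore the reading order of a document's chunks using chunk_idx."""
    return [c.text for c in sorted(chunk_records, key=lambda c: c.metadata["chunk_idx"])]


doc, chunks = make_records("doc-1", "report.pdf", "Intro. Results.", ["Intro.", "Results."])
print(doc.metadata)  # {'name': 'report.pdf'}
print(reassemble(list(reversed(chunks))))  # ['Intro.', 'Results.']
```

Deterministic chunk ids are one design choice worth considering. With them, re-indexing the same document overwrites its chunks instead of duplicating them.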

API

From the user’s perspective, interacting with Ariadne consists of three steps:

Create a pair of collections to hold complete documents and semantic chunks.
Upload documents for asynchronous processing.
Query the indexed data via the MCP server.

On the data ingestion side, there are two endpoints:

POST /create_collection
POST /process_document

create_collection creates the pair of collections for a given name, and process_document starts the processing pipeline for an uploaded document. The processing server extracts text, performs any configured LLM enrichments, chunks the document, and stores both the full document and its chunks in ChromaDB.
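The asynchronous contract of the two ingestion endpoints can be sketched in plain Python. This is a hypothetical stand-in, not Ariadne's server, and it makes several assumptions. The handlers take raw text instead of an uploaded file. A `ThreadPoolExecutor` stands in for whatever background mechanism the processing server uses. The `STORE`/`JOBS` dictionaries and the `{name}_documents` / `{name}_chunks` naming scheme are also illustrative. The key point is that process_document returns a document id immediately while the pipeline runs in the background.

```python
import uuid
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Dict

# Hypothetical in-memory stand-in for ChromaDB: collection name -> {record id: record}.
STORE: Dict[str, Dict[str, dict]] = {}
JOBS: Dict[str, Future] = {}  # document id -> background processing job
EXECUTOR = ThreadPoolExecutor(max_workers=2)


def create_collection(name: str) -> dict:
    """Handler for POST /create_collection: create the Document/Chunk pair."""
    STORE.setdefault(f"{name}_documents", {})
    STORE.setdefault(f"{name}_chunks", {})
    return {"collection": name}


def _run_pipeline(name: str, doc_id: str, filename: str, raw_text: str) -> None:
    # Placeholder stages: Ariadne would convert with Docling, apply any
    # configured enrichments, and use a hybrid chunker instead.
    text = raw_text.strip()
    chunks = [p.strip() for p in text.split("\n\n") if p.strip()]  # naive paragraph split
    STORE[f"{name}_documents"][doc_id] = {"text": text, "metadata": {"name": filename}}
    for idx, chunk in enumerate(chunks):
        STORE[f"{name}_chunks"][f"{doc_id}:{idx}"] = {
            "text": chunk,
            "metadata": {"document_id": doc_id, "chunk_idx": idx},
        }


def process_document(name: str, filename: str, raw_text: str) -> dict:
    """Handler for POST /process_document: schedule processing and return at once."""
    if f"{name}_documents" not in STORE:
        raise KeyError(f"unknown collection {name!r}")
    doc_id = str(uuid.uuid4())
    JOBS[doc_id] = EXECUTOR.submit(_run_pipeline, name, doc_id, filename, raw_text)
    return {"document_id": doc_id, "status": "processing"}


create_collection("team")
resp = process_document("team", "notes.txt", "First paragraph.\n\nSecond paragraph.")
JOBS[resp["document_id"]].result()  # demo only: wait for the background job
print(len(STORE["team_chunks"]))  # 2
```

Returning before processing finishes is what keeps uploads cheap even when conversion and enrichment take several seconds.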

Once documents have been indexed, they become available through two MCP tools:

get_full_document(name | id)
search(query, document_id?)

get_full_document retrieves the complete source document from the Document collection by id or name, while search performs semantic similarity search for query over the Chunk collection, optionally filtered by document_id. New documents can continue to be uploaded while the database is queried for previously indexed data.
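The semantics of the two tools can be illustrated with a self-contained Python sketch. It is not Ariadne's implementation. The bag-of-words `embed` function is a hypothetical stand-in for the Ollama embedding model, and the sample documents are invented. In ChromaDB itself, the same lookups would map onto `collection.get` (with `ids`, or a `where` filter on the name) and `collection.query` (with `query_texts`, `n_results`, and an optional `where` filter on `document_id`).

```python
import math
import re
from collections import Counter
from typing import List, Optional, Tuple


def embed(text: str) -> Counter:
    # Toy embedding (hypothetical): bag-of-words counts. Ariadne uses an Ollama
    # embedding model via ChromaDB; this stand-in only captures word overlap.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# In-memory stand-ins for the Document and Chunk collections (sample data).
DOCUMENTS = {  # id -> (name, full text)
    "d1": ("vacation-policy.pdf", "Employees get 25 vacation days. Unused days roll over."),
    "d2": ("expense-policy.pdf", "Submit expense reports within 30 days of purchase."),
}
CHUNKS = [  # (chunk text, metadata)
    ("Employees get 25 vacation days.", {"document_id": "d1", "chunk_idx": 0}),
    ("Unused days roll over.", {"document_id": "d1", "chunk_idx": 1}),
    ("Submit expense reports within 30 days of purchase.", {"document_id": "d2", "chunk_idx": 0}),
]


def get_full_document(name: Optional[str] = None, document_id: Optional[str] = None) -> str:
    """Return a full document by id, or by its name metadata."""
    if document_id is not None:
        return DOCUMENTS[document_id][1]
    for doc_name, text in DOCUMENTS.values():
        if doc_name == name:
            return text
    raise KeyError(f"no document named {name!r}")


def search(query: str, document_id: Optional[str] = None, k: int = 2) -> List[Tuple[str, dict]]:
    """Return the k chunks most similar to query, optionally limited to one document."""
    q = embed(query)
    candidates = [(text, meta) for text, meta in CHUNKS
                  if document_id is None or meta["document_id"] == document_id]
    candidates.sort(key=lambda c: cosine(q, embed(c[0])), reverse=True)
    return candidates[:k]


print(search("how many vacation days do employees get", k=1))
# [('Employees get 25 vacation days.', {'document_id': 'd1', 'chunk_idx': 0})]
```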

High-Level Design

At a high level, the system is composed of three key components, plus the embedding and enrichment models they call out to:

A processing server that asynchronously processes uploaded documents and inserts the documents and their chunks into the database. The processing pipeline can be configured to call an LLM to enrich documents by describing pictures, interpreting code and figures, and/or extracting entities.
The core vector database, which handles embeddings by calling out to an embedding model.
An MCP server that accepts requests from any MCP client, queries the vector database through its two tools, and returns the results to the client.

The following diagram provides an overview of the system:

In the processing server, documents are converted with preconfigured Docling pipelines. These pipelines are meant to support LLM enrichments that run either locally or via an API, although enrichment is not yet implemented. Each converted document is then serialized and added to the Document collection, and chunked and added to the Chunk collection. A hybrid chunking strategy splits documents based on both document hierarchy and token boundaries.
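The idea behind hybrid chunking can be sketched in a few lines of Python. This is a simplified, hypothetical illustration rather than Docling's hybrid chunker. It assumes Markdown-style `#` headings mark the document hierarchy. It splits on those headings first, then caps each section's body at `max_tokens` whitespace-separated words. Every chunk is prefixed with its heading so it keeps its context.

```python
import re
from typing import List, Tuple


def hybrid_chunk(markdown: str, max_tokens: int = 50) -> List[str]:
    """Split on headings first, then cap each chunk body at max_tokens words."""
    # Pass 1 (hierarchy): group non-empty lines under the most recent heading.
    sections: List[Tuple[str, List[str]]] = []
    for line in markdown.splitlines():
        if re.match(r"#{1,6}\s", line):
            sections.append((line.lstrip("#").strip(), []))
        elif line.strip():
            if not sections:
                sections.append(("", []))  # text before the first heading
            sections[-1][1].append(line.strip())
    # Pass 2 (token boundaries): window each section's words into max_tokens pieces.
    chunks = []
    for heading, lines in sections:
        words = " ".join(lines).split()  # whitespace words stand in for model tokens
        for start in range(0, len(words), max_tokens):
            body = " ".join(words[start:start + max_tokens])
            chunks.append(f"{heading}\n{body}" if heading else body)
    return chunks


doc = "# Intro\nAlpha beta gamma.\n\n# Methods\n" + " ".join(["w"] * 7)
print(hybrid_chunk(doc, max_tokens=3))
# ['Intro\nAlpha beta gamma.', 'Methods\nw w w', 'Methods\nw w w', 'Methods\nw']
```

A tokenizer-aware chunker would count the embedding model's tokens rather than words, and could also merge undersized neighboring sections.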

ChromaDB is configured to generate embeddings using Ollama, and the MCP server is implemented with FastMCP over a Streamable HTTP transport.

Scaling the System

Ariadne is currently designed to run on a single machine, which is sufficient for its intended use. Most users will index a relatively small corpus of documents, making a local deployment the best option. Nevertheless, the architecture leaves a clear path to horizontal scaling should future requirements demand it.

Here is what the scaled system could look like:

The first likely bottleneck is document ingestion. As the number of uploads increases, document processing can be scaled horizontally by running multiple processing workers behind an API gateway. Rather than handing each upload directly to a processing server, the gateway can place incoming documents on a message queue. This decouples uploads from processing, provides built-in retries, can guarantee at-least-once delivery, and lets workers consume documents at whatever rate the available compute permits. Because at-least-once delivery means a document may occasionally be processed twice, writes should be idempotent, for example by upserting records under deterministic ids.
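The consequence of at-least-once delivery for workers is shown in this minimal Python sketch. An in-process `queue.Queue` stands in for a real message broker. `run_workers`, `index_document`, and the two-chunks-per-document rule are hypothetical. Document `a` is delivered twice, and the worker handling `b` crashes once before acknowledging it. Each document still ends up with exactly two chunks because records are upserted under deterministic ids.

```python
import queue
import threading
from collections import Counter


def run_workers(messages, num_workers=3, fail_once=frozenset()):
    """Drain a queue of document ids with several workers and return (store, attempts)."""
    q: queue.Queue = queue.Queue()
    for doc_id in messages:
        q.put(doc_id)
    store = {}            # chunk id -> text; stand-in for the Chunk collection
    attempts = Counter()  # how many times each document was processed
    lock = threading.Lock()

    def index_document(doc_id):
        with lock:
            attempts[doc_id] += 1
            first_attempt = attempts[doc_id] == 1
            for idx in range(2):  # pretend every document yields two chunks
                # Upsert under a deterministic id: reprocessing overwrites, never duplicates.
                store[f"{doc_id}:{idx}"] = f"chunk {idx} of {doc_id}"
        if first_attempt and doc_id in fail_once:
            # Simulate a crash after writing but before acknowledging the message.
            raise RuntimeError(f"worker crashed on {doc_id}")

    def worker():
        while True:
            try:
                doc_id = q.get_nowait()
            except queue.Empty:
                return  # a long-running worker would block and wait instead
            try:
                index_document(doc_id)
            except RuntimeError:
                q.put(doc_id)  # not acknowledged, so the message is redelivered
            finally:
                q.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return store, attempts


store, attempts = run_workers(["a", "b", "a"], fail_once={"b"})
print(sorted(store))  # ['a:0', 'a:1', 'b:0', 'b:1']
```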

The MCP server is naturally stateless: all indexed context lives in the vector database, so it is straightforward to run multiple instances behind an MCP gateway.

Eventually, the vector database may become a limiting factor as collections or query volume grow. At that point, ChromaDB can be replaced with a distributed vector database or a sharded deployment. Because Ariadne organizes data into independent collections that are expected to stay small, sharding by collection is a natural strategy. Each collection fits entirely on one shard, so a query against it is routed to a single shard instead of fanning out across all of them.
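Routing by collection can be sketched in Python, assuming a fixed number of shards. `shard_for` is a hypothetical helper. It hashes the logical collection name so that every request for that collection, whether it targets the Document or the Chunk collection, reaches the same shard.

```python
import hashlib
from collections import Counter


def shard_for(collection: str, num_shards: int) -> int:
    """Map a logical collection name to a shard index with a stable hash."""
    # hashlib rather than hash(): Python's str hash is randomized per process,
    # so it would route the same collection differently on each server.
    digest = hashlib.sha256(collection.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Route on the logical name so a collection's Document and Chunk collections
# always land on the same shard; the counter shows how names spread across shards.
print(Counter(shard_for(f"team-{i}", 4) for i in range(1000)))
```

Plain modulo hashing moves most collections to different shards when the shard count changes. Consistent hashing or an explicit collection-to-shard lookup table would avoid that, at the cost of extra machinery.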

Finally, a semantic cache can be introduced in front of the vector database to avoid repeating similarity searches. Agent workflows often issue semantically similar queries, so serving them from cached results lowers database load and improves response latency. Since document ingestion is relatively infrequent, cache invalidation remains simple compared to systems with frequent writes: a collection's cached entries can be dropped whenever a new document is indexed into it.
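A semantic cache can be sketched as follows, under stated assumptions. `SemanticCache`, its similarity threshold, and the two-dimensional example vectors are hypothetical, and a production cache would use an approximate nearest-neighbor index rather than a linear scan. A lookup returns cached results when a previous query's embedding is within the cosine-similarity threshold. Entries are grouped per collection, so indexing a new document only invalidates that collection.

```python
import math
from typing import Dict, List, Optional, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    """Reuse search results for queries whose embeddings are close to a cached one."""

    def __init__(self, threshold: float = 0.95):
        self.threshold = threshold  # minimum cosine similarity counted as a hit
        # collection name -> list of (query embedding, cached search results)
        self.entries: Dict[str, List[Tuple[List[float], list]]] = {}

    def get(self, collection: str, query_vec: List[float]) -> Optional[list]:
        best, best_sim = None, self.threshold
        for vec, results in self.entries.get(collection, []):  # linear scan
            sim = cosine(query_vec, vec)
            if sim >= best_sim:
                best, best_sim = results, sim
        return best

    def put(self, collection: str, query_vec: List[float], results: list) -> None:
        self.entries.setdefault(collection, []).append((query_vec, results))

    def invalidate(self, collection: str) -> None:
        """Drop every cached entry for a collection, e.g. after a new document is indexed."""
        self.entries.pop(collection, None)


cache = SemanticCache()
cache.put("team", [1.0, 0.0], ["chunk-a"])
print(cache.get("team", [0.99, 0.05]))  # ['chunk-a']
```

The threshold is the main tuning knob. If it is set too low, the cache returns results for queries that only look related. If it is set too high, the cache rarely hits.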

This evolution allows each component to scale independently as usage grows.

2026-07-01 · 5 min read
