Designing an MCP Server for Unstructured Data

Introduction

Agents are most valuable when connected to your data. However, most data is unstructured: PDFs, Word documents, emails, etc. Unlike structured databases, unstructured documents cannot be queried directly. Agents often have to reread entire documents to recover context, wasting both time and tokens. And once your agent has read a document, that context is often local to your machine or even to that agent.

🧶 Ariadne processes unstructured documents into a searchable vector index and exposes them through a Model Context Protocol (MCP) server. This allows you to save tokens and share context between agents and teammates.

Let’s walk through the design of Ariadne.

Requirements

The most important functional requirements are:

The non-functional requirements are:

Data Model

There are only two pieces of information to retrieve: full documents and semantic chunks. These entities are modeled separately because they have different access patterns. Full documents are retrieved by identifier, while chunks are retrieved through semantic similarity search.
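To make the split concrete, here is a minimal sketch of the two entities as plain Python dataclasses. The field names (`document_id`, `position`) and the helper `make_chunks` are assumptions chosen for illustration, not Ariadne's actual schema. The only point is that a document is keyed by its identifier, while each chunk carries a back-reference to its parent so that search results can be traced to their source.

```python
from dataclasses import dataclass


@dataclass
class Document:
    """A full document, looked up by identifier (hypothetical fields)."""
    id: str
    name: str
    text: str


@dataclass
class Chunk:
    """A semantic chunk, found through similarity search (hypothetical fields)."""
    id: str
    document_id: str   # back-reference to the parent Document
    position: int      # order of the chunk within its document
    text: str


def make_chunks(doc, pieces):
    """Wrap already-split text pieces as Chunk records tied to their document."""
    return [
        Chunk(id=f"{doc.id}:{i}", document_id=doc.id, position=i, text=piece)
        for i, piece in enumerate(pieces)
    ]
```

Deriving chunk ids from the document id, as assumed here, keeps re-ingestion of the same document from creating duplicate chunk records.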

ChromaDB supports both access patterns and is a good fit for the initial implementation. It is lightweight, easy to deploy locally, persists to disk, and provides metadata filtering alongside vector similarity search. In ChromaDB, data is grouped into collections, and each item in ChromaDB has a document, embedding, and user-defined metadata. Ariadne uses those primitives to model its two entities as collections:

API

From the user’s perspective, interacting with Ariadne consists of three steps:

  1. Create a pair of collections to hold complete documents and semantic chunks.
  2. Upload documents for asynchronous processing.
  3. Query the indexed data via the MCP server.

On the data ingestion side, there are two endpoints:

create_collection configures a collection with a given name, and process_document begins the processing pipeline. The processing server extracts text, performs any configured LLM enrichments, chunks the document, and stores both the full document and its chunks in ChromaDB.

Once documents have been indexed, they become available through two MCP tools:

get_full_document retrieves the complete source document from the Document collection by id or name, while search performs semantic similarity search for query over the Chunk collection, optionally filtered by document_id. New documents can continue to be uploaded while the database is queried for previously indexed data.
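The behavior of the two tools can be sketched with in-memory stand-ins. Everything below is an assumption for illustration: the `DOCUMENTS` and `CHUNKS` structures replace the ChromaDB collections, `toy_embed` is a bag-of-words placeholder for a real embedding model, and the `n_results` parameter is hypothetical. Only the tool names and the `query`, `id`, `name`, and `document_id` parameters come from the description above.

```python
import math
from collections import Counter

# In-memory stand-ins for the Document and Chunk collections (assumption).
DOCUMENTS = {
    "doc-1": {"name": "handbook.pdf", "text": "Vacation policy. Expense policy."},
    "doc-2": {"name": "notes.docx", "text": "Meeting notes about the budget."},
}
CHUNKS = [
    {"id": "doc-1:0", "document_id": "doc-1", "text": "vacation policy and leave days"},
    {"id": "doc-1:1", "document_id": "doc-1", "text": "expense policy and travel budget"},
    {"id": "doc-2:0", "document_id": "doc-2", "text": "meeting notes about the budget"},
]


def toy_embed(text):
    """Bag-of-words counts; a placeholder for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def get_full_document(id=None, name=None):
    """Return the full document matching an id or a name, else None."""
    if id is not None:
        doc = DOCUMENTS.get(id)
        return {"id": id, **doc} if doc else None
    for doc_id, doc in DOCUMENTS.items():
        if doc["name"] == name:
            return {"id": doc_id, **doc}
    return None


def search(query, document_id=None, n_results=2):
    """Rank chunks by similarity to the query, optionally within one document."""
    candidates = [
        c for c in CHUNKS if document_id is None or c["document_id"] == document_id
    ]
    q = toy_embed(query)
    ranked = sorted(
        candidates, key=lambda c: cosine(q, toy_embed(c["text"])), reverse=True
    )
    return ranked[:n_results]
```

An agent would typically call `search` first and fall back to `get_full_document` only when a returned chunk needs more surrounding context, which is where the token savings come from.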

High-Level Design

At a high level, the system is composed of a few key components, plus the embedding and enrichment models they call out to:

The following diagram provides an overview of the system:

High-Level Diagram

In the processing server, documents are converted by preconfigured Docling pipelines. These pipelines can be configured with LLM enrichments that run locally or via an API, though enrichment is not yet implemented. The converted documents are then serialized and added to the Document collection, and chunked and added to the Chunk collection. A hybrid chunking strategy splits documents along both document hierarchy and token boundaries.
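The idea behind hybrid chunking can be illustrated with a small standalone sketch. This is not Docling's implementation: it assumes the serialized document is Markdown-like text where headings start with `#`, and it counts whitespace-separated words as a stand-in for real tokenizer tokens. The function name `hybrid_chunk` and the `max_tokens` parameter are hypothetical.

```python
def hybrid_chunk(markdown_text, max_tokens=8):
    """Split by heading first, then cap each section at max_tokens words."""
    sections = []                 # list of (heading, [words])
    heading, words = "", []
    for line in markdown_text.splitlines():
        if line.startswith("#"):
            # A new heading closes the current section (hierarchy boundary).
            if words:
                sections.append((heading, words))
            heading, words = line.lstrip("#").strip(), []
        else:
            words.extend(line.split())
    if words:
        sections.append((heading, words))

    chunks = []
    for heading, words in sections:
        # Within a section, cut at the token budget (token boundary).
        for start in range(0, len(words), max_tokens):
            piece = " ".join(words[start:start + max_tokens])
            chunks.append({"heading": heading, "text": piece})
    return chunks
```

Keeping the heading alongside each chunk means a chunk never spans two sections, and the heading can be stored as metadata to give search results some context.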

ChromaDB is configured to generate embeddings using Ollama, and the MCP server is implemented with FastMCP over a Streamable HTTP transport.

Scaling the System

Ariadne is currently designed to run on a single machine, which is sufficient for its intended use. Most users will index a relatively small corpus of documents, making a local deployment the best option. Nevertheless, the architecture leaves a clear path to horizontal scaling should future requirements demand it.

Here is what the scaled system could look like:

Scaled Diagram

The first bottleneck is document ingestion. As the number of uploads increases, document processing can be scaled horizontally by introducing multiple processing workers behind an API gateway. Rather than processing uploads synchronously, incoming documents can be placed onto a message queue. This decouples uploads from processing, provides built-in retry behavior, can guarantee at-least-once delivery, and allows workers to consume documents at whatever rate available compute permits.
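The queue-and-workers pattern can be sketched on a single machine with Python's standard library. This is a simplified illustration rather than a production design: `queue.Queue` stands in for a real message broker, threads stand in for separate worker processes, and `run_workers`, `MAX_ATTEMPTS`, and the `process` callback are hypothetical names. Failed jobs are put back on the queue, which mimics redelivery.

```python
import queue
import threading

MAX_ATTEMPTS = 3  # assumption: retry budget per document


def run_workers(doc_ids, process, num_workers=2):
    """Process every queued document, retrying failures up to MAX_ATTEMPTS."""
    jobs = queue.Queue()
    for doc_id in doc_ids:
        jobs.put((doc_id, 1))                 # (document id, attempt number)

    done, failed = set(), set()
    lock = threading.Lock()

    def worker():
        while True:
            item = jobs.get()
            if item is None:                  # sentinel: shut this worker down
                jobs.task_done()
                return
            doc_id, attempt = item
            try:
                process(doc_id)
                with lock:
                    done.add(doc_id)
            except Exception:
                if attempt < MAX_ATTEMPTS:
                    jobs.put((doc_id, attempt + 1))   # redeliver for a retry
                else:
                    with lock:
                        failed.add(doc_id)
            finally:
                jobs.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    jobs.join()                               # wait for all jobs, retries included
    for _ in threads:
        jobs.put(None)
    for t in threads:
        t.join()
    return done, failed
```

Because at-least-once delivery means a document may be processed more than once, the processing step should be idempotent, for example by writing records under ids derived from the document so that a repeat overwrites rather than duplicates.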

The MCP server is naturally stateless, making it straightforward to run multiple instances behind an MCP gateway.

Eventually, the vector database may become a limiting factor as collections or query volume grow. At that point, ChromaDB can be replaced with a distributed vector database or a sharded deployment. Because Ariadne organizes data into independent collections, each of which is expected to stay small, sharding by collection is a natural strategy.
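Routing by collection can be as simple as hashing the collection name. The sketch below is one possible approach, not a description of an existing Ariadne feature; the function name `shard_for` and the idea of a fixed `num_shards` are assumptions.

```python
import hashlib


def shard_for(collection_name, num_shards):
    """Map a collection name to a shard index with a stable hash."""
    digest = hashlib.sha256(collection_name.encode("utf-8")).digest()
    # Use the first 8 bytes as an integer; sha256 is stable across processes,
    # unlike Python's built-in hash() for strings.
    return int.from_bytes(digest[:8], "big") % num_shards
```

Since every query targets a single collection, each request touches exactly one shard. Plain modulo hashing does remap most collections when `num_shards` changes, so a deployment that expects to add shards often might prefer consistent hashing or an explicit lookup table.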

Finally, a semantic cache can be introduced in front of the vector database to reduce repeated similarity searches. Agent workflows often issue semantically similar queries, making semantic caching an effective way to lower database load and improve response latency. Since document ingestion is relatively infrequent, cache invalidation remains simple compared to systems with frequent writes.
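A semantic cache differs from an ordinary cache in that it matches on embedding similarity instead of exact keys. The following is a minimal sketch under stated assumptions: the class name `SemanticCache`, the linear scan over entries, and the `0.95` threshold are all illustrative choices, and query embeddings are assumed to be computed elsewhere.

```python
import math


def cosine(a, b):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    """Return cached results when a new query is close enough to an old one."""

    def __init__(self, threshold=0.95):
        self.threshold = threshold
        self.entries = []            # list of (query_embedding, results)

    def get(self, embedding):
        best = max(
            self.entries, key=lambda e: cosine(embedding, e[0]), default=None
        )
        if best is not None and cosine(embedding, best[0]) >= self.threshold:
            return best[1]
        return None                  # cache miss: caller queries the database

    def put(self, embedding, results):
        self.entries.append((embedding, results))

    def invalidate(self):
        """Drop everything, e.g. after new documents are ingested."""
        self.entries.clear()
```

Clearing the whole cache on each ingestion is the bluntest invalidation policy, and it is workable only because writes are infrequent. A fuller version would also key entries by collection and by any `document_id` filter, so that a filtered query never receives results cached for an unfiltered one.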

This evolution allows each component to scale independently as usage grows.
