Context MCP Server - Jeremy Dawes

Overview

A self-hosted documentation RAG system that provides intelligent search across curated technical documentation. Built to replace cluttered tools like Context7 with a focused, manageable set of docs you actually use.

18 MCP Tools

40+ Doc Sources

23K+ Indexed Chunks

3 Search Types

The Problem

Documentation search tools accumulate clutter:

Context7 has thousands of docs you never use
Finding relevant information takes too long
No control over what’s indexed
Stale documentation causes incorrect answers

AI assistants need access to current, curated documentation.

Solution

Complete control over your documentation index:

Category	Tools	Covers
Search	3	Hybrid, vector, keyword
Sources	5	CRUD + scraping
Content	4	Chunks, pages, context
Discovery	3	Related, suggest, compare
Stats	2	Freshness, metrics
Export	1	Markdown to R2

Key Capabilities

Hybrid Search

Three search modes for different needs:

Mode	Best For
Hybrid	General queries (vector + BM25 keyword)
Vector	Conceptual/semantic questions
Keyword	Exact function/API name lookups
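
The page does not say how the vector and BM25 result lists are merged in hybrid mode. As a rough illustration only, the sketch below uses reciprocal rank fusion (RRF), a common technique for combining two rankings. The function name `fuseRankings`, the constant `k = 60`, and the chunk IDs are assumptions for this example, not details of the actual server.

```typescript
// Reciprocal rank fusion (RRF): one common way to merge two ranked lists.
// Assumption: the real server may combine vector and BM25 scores differently.

type Ranked = string[]; // chunk IDs, best match first

function fuseRankings(vectorHits: Ranked, keywordHits: Ranked, k = 60): string[] {
  const scores = new Map<string, number>();
  for (const hits of [vectorHits, keywordHits]) {
    hits.forEach((id, index) => {
      // Ranks are 1-based; a chunk found by both lists accumulates two terms.
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first.
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

// Hypothetical chunk IDs: "a" and "b" appear in both lists, so they rise to the top.
const fused = fuseRankings(["a", "b", "c"], ["b", "d", "a"]);
console.log(fused);
```

RRF works on ranks rather than raw scores, which avoids having to normalise cosine similarity against BM25 scores before combining them.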

Intelligent Scraping Pipeline

Multi-stage content processing:

URL Discovery - Firecrawl /map or browser crawling
Content Extraction - Firecrawl (primary), Browser Rendering, or fetch
Regex Cleanup - Remove nav, footers, UI chrome
AI Cleanup - Workers AI (Qwen 32B) removes marketing fluff
Semantic Chunking - Split at H2/H3 boundaries
Embedding - BGE-base-en-v1.5 via Workers AI
Indexing - Store in Vectorize for similarity search
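
The semantic chunking stage can be pictured with a minimal sketch that splits cleaned markdown at H2/H3 headings. This is an assumption-laden illustration, not the project's code: the `Chunk` shape and the `chunkByHeadings` name are hypothetical, and the real pipeline may also enforce size limits or overlap.

```typescript
// Hypothetical chunk shape for this sketch.
interface Chunk {
  heading: string; // text of the H2/H3 that opens the chunk ("" for any preamble)
  level: number;   // 2 or 3; 0 for text before the first heading
  body: string;
}

const FENCE = "`".repeat(3); // code-fence marker, built this way to keep the sketch fence-safe

function chunkByHeadings(markdown: string): Chunk[] {
  const chunks: Chunk[] = [];
  let current: Chunk = { heading: "", level: 0, body: "" };
  let inFence = false;

  for (const line of markdown.split("\n")) {
    // Track fenced code so a "## comment" inside a code sample does not split a chunk.
    if (line.trim().indexOf(FENCE) === 0) inFence = !inFence;
    const match = inFence ? null : /^(#{2,3})\s+(.*)$/.exec(line);
    if (match) {
      // A new H2/H3 closes the current chunk (if it has any content) and opens the next.
      if (current.heading || current.body.trim()) chunks.push(current);
      const [, hashes = "", title = ""] = match;
      current = { heading: title.trim(), level: hashes.length, body: "" };
    } else {
      current.body += line + "\n";
    }
  }
  if (current.heading || current.body.trim()) chunks.push(current);
  return chunks.map((c) => ({ ...c, body: c.body.trim() }));
}

const sample = [
  "Intro text",
  "## Install",
  "npm install hono",
  "### Options",
  "Use --save",
  "## Usage",
  "Call app.get()",
].join("\n");

const chunks = chunkByHeadings(sample);
console.log(chunks.map((c) => c.level + ":" + c.heading));
```

Splitting at heading boundaries keeps each chunk on a single topic, which tends to give cleaner embeddings than fixed-size windows.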

Auto-Updates

Configurable refresh schedules:

Hourly - Fast-changing API docs
Daily - Most documentation
Weekly - Stable reference material

Change detection via ETag/Last-Modified headers and content hashing.
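
One plausible reading of this change detection is a two-step check: compare the HTTP validators first, and only download and hash the body when they are missing or differ. The sketch below follows that reading. The `PageSnapshot` and `Validators` types and the function names are hypothetical, and FNV-1a stands in for whatever hash the real system uses (a cryptographic hash such as SHA-256 would be the more typical choice).

```typescript
// Hypothetical record of what was stored at the last successful scrape.
interface PageSnapshot {
  etag?: string;
  lastModified?: string;
  contentHash: string;
}

// Validators returned by the remote server on the current check.
interface Validators {
  etag?: string;
  lastModified?: string;
}

// FNV-1a (32-bit): a small non-cryptographic hash, used only to keep this sketch dependency-free.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return ("0000000" + hash.toString(16)).slice(-8);
}

// Step 1: cheap header comparison, no body download needed.
function headersSayUnchanged(stored: PageSnapshot, remote: Validators): boolean {
  if (stored.etag && remote.etag) return stored.etag === remote.etag;
  if (stored.lastModified && remote.lastModified) {
    return stored.lastModified === remote.lastModified;
  }
  return false; // no usable validators: fall through to hashing
}

// Step 2: fetch and hash the body only when the headers are inconclusive or differ.
function needsReindex(stored: PageSnapshot, remote: Validators, fetchBody: () => string): boolean {
  if (headersSayUnchanged(stored, remote)) return false;
  return fnv1a(fetchBody()) !== stored.contentHash;
}

const stored: PageSnapshot = { etag: '"abc"', contentHash: fnv1a("hello") };
const sameEtag = needsReindex(stored, { etag: '"abc"' }, () => "never fetched");
const newEtagSameBody = needsReindex(stored, { etag: '"xyz"' }, () => "hello");
const newEtagNewBody = needsReindex(stored, { etag: '"xyz"' }, () => "hello world");
console.log(sameEtag, newEtagSameBody, newEtagNewBody);
```

The content hash catches the case where a server rotates its ETag without changing the page, so a header change alone does not force re-embedding.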

MCP Tools (18 total)

Search:

search_docs - Hybrid semantic + keyword search
lookup_api - Exact function/class name lookup
search_code - Find code examples

Source Management:

list_sources - View indexed documentation
add_source - Add new documentation URL
update_source - Modify source config
delete_source - Remove documentation
scrape_source - Trigger re-indexing

Content Retrieval:

get_chunk - Retrieve specific chunk by ID
get_full_page - Get complete page content
get_page_context - Get chunk with surrounding context

Discovery:

get_related_chunks - Find similar content
suggest_related - AI-powered suggestions
compare_chunks - Compare two chunks
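
For orientation, MCP clients invoke tools like these with a JSON-RPC 2.0 `tools/call` request that carries the tool name and an arguments object. The sketch below builds such requests. The argument names (`query`, `mode`, `name`) are assumptions for illustration, since the page does not list each tool's input schema.

```typescript
// Shape of an MCP tool invocation (JSON-RPC 2.0, method "tools/call").
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

let nextId = 1;

function buildToolCall(name: string, args: Record<string, unknown>): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id: nextId++, // each request gets a fresh ID so responses can be matched
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// Argument names below are hypothetical; check the server's tool schemas for the real ones.
const searchRequest = buildToolCall("search_docs", { query: "hono middleware", mode: "hybrid" });
const lookupRequest = buildToolCall("lookup_api", { name: "streamText" });

console.log(JSON.stringify(searchRequest, null, 2));
```

In practice an MCP client such as Claude Code builds these requests itself; the sketch only shows what travels over the wire.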

Architecture

Cloudflare Workers
├── Admin UI (React SPA)
├── REST API (Hono)
├── MCP Server (Durable Objects)
└── Queue Handler (async processing)

Storage:
├── D1 (metadata, sources, pages, chunks)
├── Vectorize (768-dim embeddings)
├── R2 (markdown exports)
└── KV (OAuth sessions)

Features

18 MCP Tools - Complete documentation access
Hybrid Search - Vector + keyword for accuracy
AI Cleanup - < 1% cruft rate in indexed content
Auto-Updates - Cron-triggered re-indexing
Admin UI - Visual source management
OAuth - Google authentication with allowlist
Markdown Export - Download docs as combined markdown

Quality Metrics

Production results from indexed sources:

Source	Chunks	Cruft Rate
Vercel AI SDK	1,199	0.25%
Hono	525	0%
shadcn/ui	490	0.6%

Two-stage cleanup (regex + AI) achieves < 1% cruft across all sources.

Use Cases

Claude Code Integration - Add @context MCP server for instant doc search

API Lookups - Find exact function signatures and parameters

Learning New Frameworks - Search concepts across multiple doc sources

Code Examples - Find implementation patterns and snippets

Version Tracking - Keep docs current with auto-updates