Context MCP Server
Self-hosted documentation RAG with 18 MCP tools. Semantic search across 40+ curated technical doc sources with hybrid vector + keyword search.
Overview
A self-hosted documentation RAG system that provides intelligent search across curated technical documentation. Built to replace cluttered tools like Context7 with a focused, manageable set of docs you actually use.
The Problem
Documentation search tools accumulate clutter:
- Context7 has thousands of docs you never use
- Finding relevant information takes too long
- No control over what’s indexed
- Stale documentation causes incorrect answers
AI assistants need access to current, curated documentation.
Solution
Complete control over your documentation index:
- Search - Hybrid, vector, keyword
- Source Management - CRUD + scraping
- Content Retrieval - Chunks, pages, context
- Discovery - Related, suggest, compare
- Freshness and quality metrics
- Markdown export to R2
Key Capabilities
Hybrid Search
Three search modes for different needs:
| Mode | Best For |
|---|---|
| Hybrid | General queries (vector + BM25 keyword) |
| Vector | Conceptual/semantic questions |
| Keyword | Exact function/API name lookups |
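The source does not say how the hybrid mode merges its vector and BM25 result lists. One common option is Reciprocal Rank Fusion (RRF), sketched below as an assumption rather than as the project's actual method. The `RankedHit` type, the `chunkId` field, and the function name are hypothetical.

```typescript
// Reciprocal Rank Fusion (RRF): a common way to merge two ranked lists.
// Assumption: the project's real fusion strategy is not documented here.

interface RankedHit {
  chunkId: string; // hypothetical identifier for an indexed chunk
}

function reciprocalRankFusion(
  vectorHits: RankedHit[],
  keywordHits: RankedHit[],
  k = 60, // smoothing constant; 60 is the value commonly used with RRF
): { chunkId: string; score: number }[] {
  const scores = new Map<string, number>();
  for (const hits of [vectorHits, keywordHits]) {
    hits.forEach((hit, index) => {
      const rank = index + 1; // ranks are 1-based
      const previous = scores.get(hit.chunkId) ?? 0;
      scores.set(hit.chunkId, previous + 1 / (k + rank));
    });
  }
  // Highest fused score first.
  return Array.from(scores.entries())
    .map(([chunkId, score]) => ({ chunkId, score }))
    .sort((a, b) => b.score - a.score);
}

// "b" appears in both lists, so it should outrank hits found by one mode only.
const fused = reciprocalRankFusion(
  [{ chunkId: "a" }, { chunkId: "b" }, { chunkId: "c" }],
  [{ chunkId: "b" }, { chunkId: "d" }],
);
```

RRF only needs ranks, not raw scores, which sidesteps the problem that cosine similarity and BM25 scores live on different scales.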
Intelligent Scraping Pipeline
Multi-stage content processing:
- URL Discovery - Firecrawl /map or browser crawling
- Content Extraction - Firecrawl (primary), Browser Rendering, or fetch
- Regex Cleanup - Remove nav, footers, UI chrome
- AI Cleanup - Workers AI (Qwen 32B) removes marketing fluff
- Semantic Chunking - Split at H2/H3 boundaries
- Embedding - BGE-base-en-v1.5 via Workers AI
- Indexing - Store in Vectorize for similarity search
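The semantic chunking step in the list above could look roughly like this. It is a minimal sketch that assumes the cleaned content is Markdown and that a chunk is everything between one H2/H3 heading and the next. The `Chunk` shape and the `chunkByHeadings` name are hypothetical, not taken from the project.

```typescript
interface Chunk {
  heading: string; // text of the H2/H3 that opens the chunk ("" for a preamble)
  level: number;   // 2 or 3; 0 for content before the first H2/H3
  content: string; // body text with the heading line removed
}

function chunkByHeadings(markdown: string): Chunk[] {
  const chunks: Chunk[] = [];
  let current: Chunk = { heading: "", level: 0, content: "" };
  let inFence = false;

  // Store the chunk being built, skipping a completely empty preamble.
  const flush = (): void => {
    current.content = current.content.trim();
    if (current.heading !== "" || current.content !== "") chunks.push(current);
  };

  for (const line of markdown.split("\n")) {
    // Track fenced code blocks so a "## comment" inside code is not a split point.
    if (/^\s*(`{3}|~{3})/.test(line)) inFence = !inFence;
    // Match exactly two or three leading "#" characters followed by whitespace.
    const match = inFence ? null : /^(#{2,3})\s+(.*)$/.exec(line);
    if (match) {
      flush();
      current = { heading: match[2].trim(), level: match[1].length, content: "" };
    } else {
      current.content += line + "\n";
    }
  }
  flush();
  return chunks;
}

const sample = [
  "# Title",
  "Intro text.",
  "## Install",
  "Run the installer.",
  "### Options",
  "Use --force.",
  "#### Detail",
  "Stays in Options.",
].join("\n");
const chunks = chunkByHeadings(sample);
```

H1 and H4+ headings stay inside the surrounding chunk, so each chunk keeps its finer-grained subsections together for embedding.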
Auto-Updates
Configurable refresh schedules:
- Hourly - Fast-changing API docs
- Daily - Most documentation
- Weekly - Stable reference material
Change detection via ETag/Last-Modified headers and content hashing.
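A minimal sketch of how those two signals might combine: send a conditional request using the stored validators, treat a 304 as unchanged, and otherwise fall back to comparing content hashes. The `PageState` and `FetchResult` shapes and the function names are hypothetical. The FNV-1a hash is a dependency-free stand-in for illustration only; a real deployment would more plausibly use SHA-256.

```typescript
interface PageState {
  etag?: string;
  lastModified?: string;
  contentHash?: string;
}

interface FetchResult {
  status: number;
  etag?: string;
  lastModified?: string;
  body: string;
}

// Stand-in hash: 32-bit FNV-1a over UTF-16 code units (assumption, not the
// project's hash). Swap in a cryptographic digest for production use.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return ("0000000" + (hash >>> 0).toString(16)).slice(-8);
}

// Headers for a conditional GET built from the previously stored validators.
function conditionalHeaders(prev: PageState): Record<string, string> {
  const headers: Record<string, string> = {};
  if (prev.etag) headers["If-None-Match"] = prev.etag;
  if (prev.lastModified) headers["If-Modified-Since"] = prev.lastModified;
  return headers;
}

// Decide whether a page needs re-indexing and return the state to store next.
function detectChange(
  prev: PageState,
  res: FetchResult,
): { changed: boolean; next: PageState } {
  // 304 Not Modified: the server confirmed the stored copy is current.
  if (res.status === 304) return { changed: false, next: prev };
  // Validators can change without the content changing, so compare hashes.
  const contentHash = fnv1a(res.body);
  const next: PageState = {
    etag: res.etag,
    lastModified: res.lastModified,
    contentHash,
  };
  return { changed: contentHash !== prev.contentHash, next };
}
```

Hashing after the header check means a server that rotates ETags on every deploy does not trigger a needless re-embed of identical content.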
MCP Tools (18 total)
Search:
- search_docs - Hybrid semantic + keyword search
- lookup_api - Exact function/class name lookup
- search_code - Find code examples
Source Management:
- list_sources - View indexed documentation
- add_source - Add new documentation URL
- update_source - Modify source config
- delete_source - Remove documentation
- scrape_source - Trigger re-indexing
Content Retrieval:
- get_chunk - Retrieve specific chunk by ID
- get_full_page - Get complete page content
- get_page_context - Get chunk with surrounding context
Discovery:
- get_related_chunks - Find similar content
- suggest_related - AI-powered suggestions
- compare_chunks - Compare two chunks
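For orientation, an MCP client invokes any of these tools with a JSON-RPC 2.0 `tools/call` request. The sketch below builds such a request for search_docs. The envelope follows the MCP specification, but the argument names `query` and `mode` are assumptions, since the source does not document the tool's input schema.

```typescript
// JSON-RPC 2.0 envelope for an MCP tool invocation.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>,
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// Hypothetical arguments: "query" and "mode" are illustrative names only.
const request = buildToolCall(1, "search_docs", {
  query: "hono middleware",
  mode: "hybrid",
});
const payload = JSON.stringify(request);
```

In practice an MCP client library builds this envelope for you; the sketch only shows what travels over the wire.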
Architecture
Cloudflare Workers
├── Admin UI (React SPA)
├── REST API (Hono)
├── MCP Server (Durable Objects)
└── Queue Handler (async processing)
Storage:
├── D1 (metadata, sources, pages, chunks)
├── Vectorize (768-dim embeddings)
├── R2 (markdown exports)
└── KV (OAuth sessions)
Features
- 18 MCP Tools - Complete documentation access
- Hybrid Search - Vector + keyword for accuracy
- AI Cleanup - < 1% cruft rate in indexed content
- Auto-Updates - Cron-triggered re-indexing
- Admin UI - Visual source management
- OAuth - Google authentication with allowlist
- Markdown Export - Download docs as combined markdown
Quality Metrics
Production results from indexed sources:
| Source | Chunks | Cruft Rate |
|---|---|---|
| Vercel AI SDK | 1,199 | 0.25% |
| Hono | 525 | 0% |
| shadcn/ui | 490 | 0.6% |
Two-stage cleanup (regex + AI) achieves < 1% cruft across all sources.
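The source does not define how cruft rate is measured. One plausible reading is the share of indexed chunks flagged as non-documentation content, which the sketch below computes. The function name and the flagged count are hypothetical: 3 flagged chunks is an illustrative input chosen because 3 of 1,199 rounds to the 0.25% shown for Vercel AI SDK, not a figure reported by the project.

```typescript
// Cruft rate as a percentage of indexed chunks (assumed definition).
function cruftRate(flaggedChunks: number, totalChunks: number): number {
  if (totalChunks <= 0) return 0; // avoid dividing by zero for empty sources
  return (flaggedChunks / totalChunks) * 100;
}

// Hypothetical input: 3 flagged chunks out of the 1,199 listed in the table.
const exampleRate = cruftRate(3, 1199);
const rounded = Math.round(exampleRate * 100) / 100; // two decimal places
```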
Use Cases
- Claude Code Integration - Add the @context MCP server for instant doc search
- API Lookups - Find exact function signatures and parameters
- Learning New Frameworks - Search concepts across multiple doc sources
- Code Examples - Find implementation patterns and snippets
- Version Tracking - Keep docs current with auto-updates