Document Semantic Search
Process documents and build semantic search with OpenAI embeddings, Gemini, and Qdrant vector database. 780+ views on n8n.
Tags: n8n, OpenAI, Gemini, Qdrant
Overview
An n8n workflow for building semantic search over your documents. It processes PDF, DOCX, and text files, creates embeddings with OpenAI, stores them in Qdrant, and uses Gemini to answer queries from the retrieved chunks.
- Views: 780+
- AI Models: 2
- Vector DB: Qdrant
How It Works
Ingestion Pipeline
- Upload Documents - PDFs, DOCX, TXT, or URLs
- Text Extraction - Convert to plain text
- Chunking - Split into semantic chunks
- Embeddings - Generate vectors with OpenAI
- Storage - Store in Qdrant vector database
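The chunking step can be illustrated with a minimal sketch. It assumes fixed-size windows with overlap, counted in whitespace-separated words as a rough stand-in for tokens; the workflow's actual splitter and its handling of semantic boundaries are not shown on this page, and `chunk_text` is a hypothetical helper name.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping windows of whitespace-separated words.

    Assumption: words approximate tokens. A real pipeline would count
    tokens with the embedding model's tokenizer instead.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")

    words = text.split()
    step = chunk_size - overlap  # how far each window advances
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


if __name__ == "__main__":
    sample = " ".join(f"w{i}" for i in range(10))
    for chunk in chunk_text(sample, chunk_size=4, overlap=1):
        print(chunk)
```

The overlap means the last words of one chunk reappear at the start of the next, so a sentence that straddles a boundary is still retrievable from at least one chunk.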
Search Pipeline
- Query - Receive search query
- Embed Query - Convert to vector
- Vector Search - Find similar chunks in Qdrant
- Context Assembly - Gather relevant chunks
- AI Answer - Gemini synthesises response
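The vector search and context assembly steps can be sketched in plain Python. This is an in-memory stand-in, not the Qdrant client: it assumes cosine similarity (one of the distance metrics Qdrant supports), uses toy 3-dimensional vectors in place of real embeddings, and the names `top_k` and `assemble_context` are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vector: list[float], points: list[dict], k: int = 3) -> list[tuple[float, dict]]:
    """Score every stored point against the query and keep the k best."""
    scored = [(cosine_similarity(query_vector, p["vector"]), p) for p in points]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def assemble_context(hits: list[tuple[float, dict]]) -> str:
    """Join the retrieved chunks, each prefixed with its source, for the LLM."""
    return "\n\n".join(f"[{p['source']}] {p['text']}" for _, p in hits)


if __name__ == "__main__":
    # Toy data: real vectors would come from the embedding model.
    points = [
        {"vector": [1.0, 0.0, 0.0], "text": "Refund rules", "source": "policies.pdf"},
        {"vector": [0.0, 1.0, 0.0], "text": "Office hours", "source": "handbook.pdf"},
        {"vector": [0.9, 0.1, 0.0], "text": "Billing errors", "source": "policies.pdf"},
    ]
    hits = top_k([1.0, 0.0, 0.0], points, k=2)
    print(assemble_context(hits))
```

A vector database replaces the linear scan in `top_k` with an approximate nearest-neighbour index, which is what keeps search fast as the collection grows.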
Workflow Components
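| Component | Count | Description |
|---|---|---|
| Ingestion | 3 | Doc processing pipeline |
| Embeddings | 1 | OpenAI ada-002 |
| Vector DB | 1 | Qdrant storage |
| Search | 1 | Semantic matching |
| AI | 1 | Gemini answers |
| Output | 1 | Formatted response |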
Features
- Multi-Format Support - PDF, DOCX, TXT, Markdown
- Smart Chunking - Preserves semantic boundaries
- Hybrid Search - Combines vector + keyword search
- Source Attribution - Links back to original documents
- Incremental Updates - Add new docs without full reindex
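This page does not say how the vector and keyword results are combined for hybrid search. One common option is reciprocal rank fusion (RRF), sketched below as an assumption rather than as the workflow's actual method. The function name and the chunk IDs are hypothetical; `k = 60` is the constant conventionally used with RRF.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of chunk IDs into one ranking.

    Each list contributes 1 / (k + rank) to a chunk's score, so chunks
    ranked highly by several retrievers rise to the top.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score; break ties by ID for a stable order.
    return sorted(scores, key=lambda chunk_id: (-scores[chunk_id], chunk_id))


if __name__ == "__main__":
    vector_hits = ["c1", "c2", "c3"]   # hypothetical IDs from vector search
    keyword_hits = ["c3", "c1", "c4"]  # hypothetical IDs from keyword search
    print(reciprocal_rank_fusion([vector_hits, keyword_hits]))
```

RRF uses only rank positions, so it avoids having to normalise similarity scores and keyword scores onto a common scale.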
Architecture
Documents → Text Extraction → Chunking → OpenAI Embeddings
↓
Qdrant DB
↓
Query → Embed Query → Vector Search → Gemini Answer
Example Query
User: “What’s our refund policy for software subscriptions?”
System:
Based on the company policies document:
**Software Subscription Refunds**
- Full refund available within 14 days of purchase
- Pro-rata refund for annual plans cancelled after 14 days
- No refund for monthly plans (cancel before renewal)
Special cases:
- Technical issues preventing use: Full refund at any time
- Billing errors: Immediate correction + refund
📄 Source: Company-Policies-2024.pdf (Page 12)
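A response like the one above depends on each retrieved chunk carrying its file name and page number. The sketch below shows one way to build the LLM prompt and the source line from that metadata. The `Chunk` fields, the prompt wording, and both function names are assumptions for illustration, not the workflow's actual prompt.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    source: str  # file name stored as metadata alongside the vector
    page: int


def build_prompt(question: str, chunks: list[Chunk]) -> str:
    """Number each excerpt so the model can cite what it relied on."""
    excerpts = "\n".join(
        f"[{i}] ({c.source}, page {c.page}) {c.text}"
        for i, c in enumerate(chunks, start=1)
    )
    return (
        "Answer the question using only the numbered excerpts below. "
        "Cite the excerpt numbers you relied on.\n\n"
        f"Excerpts:\n{excerpts}\n\n"
        f"Question: {question}"
    )


def format_sources(chunks: list[Chunk]) -> str:
    """List each distinct document and page once, in retrieval order."""
    labels: list[str] = []
    for c in chunks:
        label = f"{c.source} (Page {c.page})"
        if label not in labels:
            labels.append(label)
    return "Source: " + "; ".join(labels)


if __name__ == "__main__":
    chunks = [
        Chunk("Full refund available within 14 days of purchase",
              "Company-Policies-2024.pdf", 12),
    ]
    print(build_prompt("What is the refund policy for software subscriptions?", chunks))
    print(format_sources(chunks))
```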
Use Cases
- Knowledge Base - Search internal documentation
- Legal/Compliance - Find relevant policy sections
- Research - Search academic papers and reports
- Customer Support - Find answers in product docs
Configuration
| Component | Options |
|---|---|
| Embedding Model | text-embedding-ada-002, text-embedding-3-small |
| Chunk Size | 500-2000 tokens |
| Overlap | 50-200 tokens |
| Vector DB | Qdrant (self-hosted or cloud) |
| LLM | Gemini Pro, GPT-4 |
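The options in the table can be captured in a small validated settings object. This is a sketch only: the class name, field names, and default values are assumptions, while the allowed ranges and model names come from the table. Both listed embedding models return 1536-dimensional vectors by default, which is the size the Qdrant collection would need.

```python
from dataclasses import dataclass

# Default output dimensions of the two embedding models in the table.
EMBEDDING_DIMENSIONS = {
    "text-embedding-ada-002": 1536,
    "text-embedding-3-small": 1536,
}


@dataclass(frozen=True)
class SearchConfig:
    # Defaults are illustrative picks from within the documented ranges.
    embedding_model: str = "text-embedding-3-small"
    chunk_size: int = 1000  # tokens, documented range 500-2000
    overlap: int = 100      # tokens, documented range 50-200

    def __post_init__(self) -> None:
        if self.embedding_model not in EMBEDDING_DIMENSIONS:
            raise ValueError(f"unsupported embedding model: {self.embedding_model}")
        if not 500 <= self.chunk_size <= 2000:
            raise ValueError("chunk_size must be between 500 and 2000 tokens")
        if not 50 <= self.overlap <= 200:
            raise ValueError("overlap must be between 50 and 200 tokens")

    @property
    def vector_size(self) -> int:
        """Dimension to use when creating the vector collection."""
        return EMBEDDING_DIMENSIONS[self.embedding_model]


if __name__ == "__main__":
    config = SearchConfig(chunk_size=800, overlap=80)
    print(config, config.vector_size)
```

Validating at construction time surfaces a bad chunk size or model name before any documents are embedded, which matters because changing these settings later means re-embedding the collection.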