Spring AI with RAG and Vector Search
Spring AI brings retrieval-augmented generation (RAG) and vector search to the Java ecosystem. Spring AI 1.0, released in 2025, provides first-class support for connecting language models to your proprietary data through vector databases, making it straightforward to add AI features to existing Spring Boot applications.
This guide walks you through building a production RAG pipeline — from ingesting documents and generating embeddings to querying vector stores and augmenting LLM prompts with relevant context. If your team already runs Spring Boot in production, Spring AI is the fastest path to adding intelligent search and Q&A capabilities.
Understanding the RAG Architecture
RAG solves a fundamental limitation of language models: they only know what they were trained on. By retrieving relevant documents from your data and injecting them into the prompt, you get accurate answers grounded in your specific content.
RAG Pipeline Flow:

1. INGESTION (offline)
   Documents → Chunking → Embedding Model → Vector Database

2. RETRIEVAL (runtime)
   User Query → Embedding → Vector Similarity Search → Top-K Documents

3. AUGMENTATION (runtime)
   System Prompt + Retrieved Documents + User Query → LLM → Response
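To make the retrieval step concrete, here is the core similarity computation in plain Java. This is an illustrative sketch only — in practice the vector database computes and indexes this for you — and the class and method names are our own:

```java
public class SimilaritySketch {

    // Cosine similarity: dot(a, b) / (|a| * |b|), ranging over [-1, 1].
    // Higher means the two embeddings point in more similar directions.
    static double cosineSimilarity(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        float[] query = {1f, 0f};
        float[] closeDoc = {0.9f, 0.1f}; // nearly parallel to the query
        float[] farDoc   = {0f, 1f};     // orthogonal to the query
        // The "close" document scores higher and would rank first in top-K
        System.out.println(cosineSimilarity(query, closeDoc)
            > cosineSimilarity(query, farDoc)); // true
    }
}
```

Top-K retrieval is then just sorting stored embeddings by this score against the query embedding and keeping the K best.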
Setting Up Spring AI with PGVector
<!-- pom.xml dependencies (Spring AI 1.0 starter artifact names) -->
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-openai</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-vector-store-pgvector</artifactId>
</dependency>
# application.yml
spring:
  ai:
    openai:
      api-key: ${OPENAI_API_KEY}
      chat:
        options:
          model: gpt-4o
          temperature: 0.3
      embedding:
        options:
          model: text-embedding-3-small
    vectorstore:
      pgvector:
        initialize-schema: true
        dimensions: 1536
        index-type: HNSW
        distance-type: COSINE_DISTANCE
  datasource:
    url: jdbc:postgresql://localhost:5432/aiapp
Document Ingestion Pipeline
@Service
public class DocumentIngestionService {

    private static final Logger log =
        LoggerFactory.getLogger(DocumentIngestionService.class);

    private final VectorStore vectorStore;
    private final TokenTextSplitter textSplitter;

    public DocumentIngestionService(VectorStore vectorStore) {
        this.vectorStore = vectorStore;
        this.textSplitter = new TokenTextSplitter(
            800,   // target chunk size (tokens)
            200,   // minimum chunk size (characters)
            5,     // minimum chunk length to embed
            10000, // maximum number of chunks
            true   // keep separators
        );
    }

    public void ingestDocuments(List<Resource> resources) {
        // TikaDocumentReader reads one resource at a time
        var documents = resources.stream()
            .flatMap(resource -> new TikaDocumentReader(resource).get().stream())
            .toList();

        // Split into chunks; note TokenTextSplitter does not overlap chunks
        var chunks = textSplitter.apply(documents);

        // Add custom metadata for filtering
        chunks.forEach(chunk -> {
            chunk.getMetadata().putIfAbsent("source", "unknown");
            chunk.getMetadata().put("ingested_at", Instant.now().toString());
        });

        // Store embeddings in PGVector
        vectorStore.add(chunks);
        log.info("Ingested {} chunks from {} documents",
            chunks.size(), documents.size());
    }

    // Ingest from various sources
    public void ingestPDF(String path) {
        ingestDocuments(List.of(new FileSystemResource(path)));
    }

    public void ingestWebPage(String url) {
        var reader = new JsoupDocumentReader(url);
        vectorStore.add(textSplitter.apply(reader.get()));
    }
}
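Chunking with overlap is a common RAG variation: overlapping windows keep a sentence that straddles a chunk boundary retrievable from both sides. As a mental model, here is a minimal word-based sketch in plain Java — this is not Spring AI's TokenTextSplitter (which splits on token counts and does not overlap), and the class name is our own:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class OverlapChunker {

    // Split text into windows of `size` words; each window starts
    // `size - overlap` words after the previous one, so consecutive
    // chunks share `overlap` words.
    static List<String> chunk(String text, int size, int overlap) {
        String[] words = text.split("\\s+");
        List<String> chunks = new ArrayList<>();
        int step = size - overlap;
        for (int start = 0; start < words.length; start += step) {
            int end = Math.min(start + size, words.length);
            chunks.add(String.join(" ", Arrays.copyOfRange(words, start, end)));
            if (end == words.length) break; // last window reached
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Windows of 4 words with 2 words of overlap
        System.out.println(chunk("a b c d e f g h", 4, 2));
        // [a b c d, c d e f, e f g h]
    }
}
```

A production splitter would work on tokens rather than words and avoid cutting mid-sentence, but the windowing logic is the same.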
Building the RAG Query Service
@Service
public class RagQueryService {

    private final ChatClient chatClient;
    private final VectorStore vectorStore;

    public RagQueryService(ChatClient.Builder builder,
                           VectorStore vectorStore) {
        this.chatClient = builder
            .defaultSystem("""
                You are a helpful assistant that answers questions
                based on the provided context. If the context doesn't
                contain relevant information, say so honestly.
                Always cite which document your answer comes from.
                """)
            .build();
        this.vectorStore = vectorStore;
    }

    public String query(String userQuestion) {
        // Retrieve relevant documents
        var searchRequest = SearchRequest.builder()
            .query(userQuestion)
            .topK(5)
            .similarityThreshold(0.7)
            .build();
        var relevantDocs = vectorStore.similaritySearch(searchRequest);

        // Build context from retrieved documents
        String context = relevantDocs.stream()
            .map(doc -> "Source: %s\nContent: %s".formatted(
                doc.getMetadata().get("source"),
                doc.getText()))
            .collect(Collectors.joining("\n---\n"));

        // Augmented prompt with retrieved context
        return chatClient.prompt()
            .user(u -> u.text("""
                Context:
                {context}

                Question: {question}

                Answer based on the context above:
                """)
                .param("context", context)
                .param("question", userQuestion))
            .call()
            .content();
    }
}
Advanced Patterns
Metadata Filtering
// Filter by metadata during retrieval
var b = new FilterExpressionBuilder();
var searchRequest = SearchRequest.builder()
    .query(question)
    .topK(5)
    .filterExpression(b.and(
            b.eq("department", "engineering"),
            b.gte("ingested_at", "2026-01-01"))
        .build())
    .build();
Streaming Responses
@GetMapping(value = "/api/ask", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
public Flux<String> streamAnswer(@RequestParam String question) {
    var context = retrieveContext(question);
    return chatClient.prompt()
        .user(buildPrompt(context, question))
        .stream()
        .content();
}
When NOT to Use RAG
RAG adds complexity and latency. Skip it when the LLM already knows the answer (general knowledge), when your data changes faster than you can re-index it, or when exact keyword search (for example, Elasticsearch) is sufficient. Before reaching for RAG, evaluate whether a simple prompt or a fine-tuned model would serve your use case better.
Key Takeaways
Spring AI brings RAG-powered search to Java applications with minimal boilerplate, and the PGVector integration means you can add vector search to an existing PostgreSQL database. Start with a small document corpus, measure retrieval quality, and expand as you validate the approach with real users.