I’ve built enough RAG systems in Python to know the pain points: wiring embeddings manually, juggling vector DB clients, managing chunkers, and keeping the whole pipeline consistent..

If you have not explored my rag series I highly recommend checking out : Phase 1 and Phase 2.

Now I wanted to see how far the Java ecosystem has come — and Spring AI is the first framework that actually feels engineered for real applications, not just demos.

This post breaks down what Spring AI really gives you, why it’s fundamentally different from the Python-first ecosystem, and how we can assemble a clean, production-ready RAG pipeline using Spring Boot + pgvector.

Why Spring AI?

Most RAG systems fail not because embeddings or LLMs are hard — they fail because the surrounding infrastructure becomes unmanageable. That’s the exact gap Spring AI fills: it gives Java developers a consistent, production-grade abstraction layer for LLM pipelines.

Unified abstractions for LLMs, embeddings, and vector stores → swap providers using config, and not the code.
Auto-configured RAG pipeline: Embeddings, Chunking, and Vector operations work without manual wiring.
Enterprise-grade Spring integration: Security, Observability, Scaling, and Deployment fit into existing Java stacks.

In short ,You get all the goodness of spring in your AI Systems.

Core abstractions that matter

Spring AI ships with a set of interfaces that remove all the manual plumbing:

Class/Interface Name	Parent Class / Interface	Purpose	Examples
ChatClient(I)	NA	Fluent API client for interacting with AI models (LLMs).	details
ChatModel(I)	NA	Low-level interface that connects directly to the AI Provider	details
ChatMemory(I)	NA	Interface for storing and retrieving chat conversation history.	details
JdbcChatMemoryRepository	ChatMemoryRepository(I)	Persists chat memory to a JDBC-compatible database	InMemoryChatMemoryRepository, CassandraChatMemoryRepository
SearchRequest	NA	Request object to define search parameters.	details
Document	NA	Represents a text document with content and metadata.	details
PagePdfDocumentReader	DocumentReader	Reads PDF files and converts them into Document objects.	TikaDocumentReader, TextDocumentReader
TokenTextSplitter	TextSplitter	Splits large documents into smaller chunks based on token count.	TextSplitter (Abstract), RecursiveCharacterTextSplitter

Spring AI's abstractions allow you to switch underlying providers (e.g., swapping OpenAI for Azure or Ollama, or Postgres for Pinecone) with zero code changes to your business logic. They also eliminate boilerplate by providing a unified, fluent API (ChatClient) for common patterns like memory management, RAG, and function calling.

Auto-configuration with SpringBoot

In Python RAG, you have to wire everything manually:

load embedding model
pass embedding model into vector DB client
handle text → embedding conversions yourself

Spring AI kills that entire category of boilerplate.

So, if you include following dependency in you pom.xml

spring-ai-starter-model-openai , spring-ai-starter-vector-store-pgvector, spring-ai-pdf-document-reader, spring-ai-starter-model-chat-memory-repository-jdbc

Spring will:

detect the OpenAiEmbeddingModel
inject it automatically into the PgVectorStore
trigger embeddings implicitly whenever you call vectorStore.add() or .similaritySearch()
Process the documents
Enable memoization by keeping last 'n' messages in the context.

Architecture we’re building

For this RAG baseline, I'm using:

Spring Boot 3 + Java 21
OpenAI via Spring AI starter
pgvector inside PostgreSQL
SpringAI 1.0.3

Technical Walkthrough

1. PDF Ingestion Pipeline

Goal:

Read PDF → split into chunks → attach metadata → persist embeddings.

public void processPdf(File pdfFile, String sessionId) {
    PagePdfDocumentReader reader = new PagePdfDocumentReader(new FileSystemResource(pdfFile));
    List<Document> documents = reader.read();

    TokenTextSplitter splitter = new TokenTextSplitter();
    List<Document> splitDocuments = splitter.apply(documents);

    splitDocuments.forEach(doc ->
        doc.getMetadata().put("session_id", sessionId)
    );

    vectorStore.add(splitDocuments);  // Embeddings done implicitly
}

Points to notice:

The chunker is token-aware
Metadata tagging is mandatory if you want per-session isolation
Embeddings happen automatically under the hood

2. Retrieval + Response Generation

public String query(String sessionId, String question) {
        // save user message for UI history
        MessageEntity userMsg = new MessageEntity();
        userMsg.setSessionId(sessionId);
        userMsg.setRole("user");
        userMsg.setContent(question);
        messageRepo.save(userMsg);

        // 1. Retrieve similar documents
        SearchRequest request = SearchRequest.builder()
                .query(question)
                .topK(6)
                .filterExpression("session_id == '" + sessionId + "'")
                .build();
        List<Document> docs = vectorStore.similaritySearch(request);

        String context = docs.isEmpty() ? ""
                : docs.stream().map(Document::getText).collect(Collectors.joining("\n---\n"));

        // 2. Generate Response using ChatClient with Memory
        String systemText = "You are an AI assistant. Use the context to answer the question.\n" +
                "If the answer is not in the context, respond \"I don't have enough information.\"\n\n" +
                "Context:\n" + context;

        String answer = chatClient.prompt()
                .system(systemText)
                .user(question)
                .advisors(a -> a.param("chat_memory_conversation_id", sessionId)
                        .param("chat_memory_retrieve_size", 10))
                .call()
                .content();

        // save assistant message for UI history
        MessageEntity botMsg = new MessageEntity();
        botMsg.setSessionId(sessionId);
        botMsg.setRole("assistant");
        botMsg.setContent(answer);
        messageRepo.save(botMsg);

        return answer;
    }

Capabilities we’re using:

Top-K retrieval
Session filtering using metadata
Threshold-based recall filtering
Automatic embedding of the query

What to explore next ?

This is where Spring AI becomes more interesting than the standard Python RAG pipeline:

1. Model Context Protocol (MCP)

Standard, structured way for models to call tools or external data sources.

2. Function Calling

Let the LLM run Java methods — not hallucinate answers. This is how you move from RAG → agentic systems.

3. Multimodal ingestion

Images, tables, audio — not just text.

4. RAG evaluation framework

Automated scoring of answer accuracy, relevance, and grounding.

Demo & Source Code

📦 GitHub Repository: https://github.com/tpushkarsingh/rag_springai

🌐 Read more tutorials: https://blog.slayitcoder.in

💼 Connect with me on LinkedIn: https://www.linkedin.com/in/tpushkarsingh

Final Take

Spring AI finally gives Java developers a clean, maintainable, production-grade path to build RAG systems — without reinventing every AI primitive. If you’ve written enough glue code in Python, this feels refreshing.

And more importantly: This setup is stable enough to form the base of a real RAG product.

Subscribe to the SlayItCoder newsletter

Building RAG with Spring AI: A Practical Guide for Java Developers