Spring AI — Artificial Intelligence & Large Language Models

Complete guide to Spring AI: the official Spring framework for integrating AI and Large Language Models (LLMs) into Java applications. Learn architecture, components, and build real-world AI-powered features.

1. What is Spring AI?

Spring AI is an official Spring project that provides a unified abstraction layer for integrating artificial intelligence capabilities into Java applications. It simplifies working with Large Language Models (LLMs), embedding models, vector databases, and AI-powered features while maintaining Spring's familiar patterns: dependency injection, auto-configuration, and testing support.

Instead of writing provider-specific code for OpenAI, Anthropic, or other AI services, Spring AI offers a consistent API that lets you switch between providers or models with minimal code changes. It follows Spring Boot's convention-over-configuration philosophy, making AI integration as straightforward as adding a dependency and configuring properties.

2. Why Use Spring AI?

  • Provider abstraction: write code once and switch between OpenAI, Anthropic, Azure OpenAI, Ollama, and other providers by changing configuration.
  • Spring Boot integration: auto-configuration, property-based setup, and seamless integration with the Spring ecosystem.
  • Modular architecture: include only the components you need (chat, embeddings, vector stores, RAG) to keep dependencies minimal.
  • Production-ready: built-in support for retries, rate limiting, observability, and error handling.
  • Testing support: easy mocking and testing of AI components using Spring's testing framework.
  • RAG support: built-in Retrieval Augmented Generation (RAG) framework for context-aware AI applications.

3. Spring AI Architecture

Spring AI follows a modular, layered architecture that separates concerns and promotes flexibility:

3.1 Architecture Layers

  1. Application Layer: Your Spring Boot application code (controllers, services, repositories).
  2. Spring AI Abstractions: Core interfaces like ChatModel, EmbeddingModel, VectorStore.
  3. Provider Implementations: Concrete implementations for different AI providers (OpenAI, Anthropic, etc.).
  4. AI Provider APIs: External HTTP APIs or local model servers.

This architecture allows you to:

  • Write business logic against stable Spring AI interfaces
  • Switch AI providers without changing application code
  • Test with mock implementations
  • Combine multiple providers in the same application
graph TB
    subgraph "Application Layer"
        A[Controllers] --> B[Services]
        B --> C[Repositories]
    end
    subgraph "Spring AI Abstractions"
        D[ChatClient/ChatModel]
        E[EmbeddingModel]
        F[VectorStore]
        G[Document]
    end
    subgraph "Provider Implementations"
        H[OpenAI Implementation]
        I[Anthropic Implementation]
        J[Ollama Implementation]
        K[PostgreSQL VectorStore]
        L[Pinecone VectorStore]
    end
    subgraph "AI Provider APIs"
        M[OpenAI API]
        N[Anthropic API]
        O[Local Ollama Server]
        P[Vector Database]
    end
    B --> D
    B --> E
    B --> F
    D --> H
    D --> I
    D --> J
    E --> H
    E --> I
    F --> K
    F --> L
    H --> M
    I --> N
    J --> O
    K --> P
    L --> P

4. Core Components

The following diagram illustrates how Spring AI components interact:

graph TB
    subgraph "Spring AI Core Components"
        A[ChatClient] --> B[ChatModel]
        C[EmbeddingModel]
        D[VectorStore]
        E[Document]
        F[PromptTemplate]
    end
    subgraph "Provider Implementations"
        B --> G[OpenAI ChatModel]
        B --> H[Anthropic ChatModel]
        B --> I[Ollama ChatModel]
        C --> J[OpenAI EmbeddingModel]
        C --> K[Anthropic EmbeddingModel]
        D --> L[PostgreSQL VectorStore]
        D --> M[Pinecone VectorStore]
        D --> N[Redis VectorStore]
    end
    subgraph "RAG Components"
        O[Document Loader] --> E
        E --> P[Text Splitter]
        P --> C
        C --> D
        D --> Q[RetrievalAugmentationAdvisor]
        Q --> B
    end
    F --> A

4.1 ChatModel

The ChatModel interface is the core abstraction for interacting with LLMs, and the fluent ChatClient API builds on top of it. Together they handle conversational AI, text generation, and chat-based interactions.

Key Features:

  • Multi-turn conversations with message history
  • System prompts and user messages
  • Streaming responses for real-time interactions
  • Function calling and tool integration

4.2 EmbeddingModel

EmbeddingModel converts text into numerical vectors (embeddings) that capture semantic meaning. Essential for semantic search, similarity matching, and RAG applications.

4.2.1 Understanding Vector Embeddings

Vector Embeddings are numerical representations of text (or other data) in a high-dimensional space. They transform words, sentences, or documents into arrays of numbers that capture semantic meaning.

Imagine representing the meaning of words as numbers:

"cat"    → [0.2, 0.8, 0.1, 0.5, ...]  (1536 numbers for OpenAI)
"dog"    → [0.3, 0.7, 0.2, 0.4, ...]
"car"    → [0.1, 0.2, 0.9, 0.3, ...]

Key Properties:

  • Similar meanings → similar vectors: Words with related meanings produce vectors that are close together in the high-dimensional space
  • Different meanings → different vectors: Unrelated words produce vectors that are far apart
  • Fixed size: Each embedding has the same number of dimensions (e.g., 1536 for OpenAI's text-embedding-ada-002)
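These properties can be checked with a little arithmetic. The block below is an illustrative sketch: the four-dimensional vectors are made up (real embeddings have hundreds or thousands of dimensions produced by a model), but the cosine-similarity formula is the one vector stores actually compute:

```java
public class EmbeddingSimilarity {

    // Cosine similarity: dot(a, b) / (|a| * |b|); ranges from -1 to 1
    public static double cosineSimilarity(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Made-up 4-dimensional "embeddings" for illustration only
        double[] cat = {0.2, 0.8, 0.1, 0.5};
        double[] dog = {0.3, 0.7, 0.2, 0.4};
        double[] car = {0.1, 0.2, 0.9, 0.3};

        System.out.printf("cat vs dog: %.3f%n", cosineSimilarity(cat, dog)); // high (related)
        System.out.printf("cat vs car: %.3f%n", cosineSimilarity(cat, car)); // lower (unrelated)
    }
}
```

With these toy vectors, "cat" and "dog" score noticeably higher than "cat" and "car", which is exactly the behavior a trained embedding model exhibits on real text.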

4.2.2 How Embeddings Work

The embedding process involves three main steps:

  1. Text Input: The model receives text input (e.g., "What is artificial intelligence?")
  2. Embedding Model Processing: The neural network breaks text into tokens, analyzes context and meaning, and generates a numerical representation
  3. Vector Output: The model outputs a fixed-size vector (e.g., 1536 numbers for OpenAI ada-002)

In Spring AI, this process is simplified:

// Spring AI makes this simple:
EmbeddingModel embeddingModel; // Auto-configured by the starter

// Convert text to a vector (returns float[] in Spring AI 1.0)
float[] embedding = embeddingModel.embed("Hello, world!");
// Result: [0.123, -0.456, 0.789, ...] (1536 dimensions)

4.2.3 Embedding Models

An Embedding Model is a neural network that converts text into vectors. Different models have different characteristics:

  • OpenAI text-embedding-ada-002: 1536 dimensions, 8191 token context length, affordable and high quality
  • OpenAI text-embedding-3-large: 3072 dimensions, higher quality
  • OpenAI text-embedding-3-small: 1536 dimensions, faster processing
  • Other providers: Cohere, local models, etc.

Use Cases:

  • Document similarity search
  • Semantic search in knowledge bases
  • Clustering and classification
  • RAG context retrieval

4.3 VectorStore

VectorStore is an abstraction for storing and querying vector embeddings. A Vector Store is a database optimized for storing and searching vectors efficiently.

4.3.1 Why Vector Stores?

Problem: Traditional databases can't efficiently search by similarity. They're designed for exact matches, not semantic similarity.

Solution: Vector stores use specialized indexes (like HNSW) for fast similarity search, allowing you to find documents with similar meanings rather than exact text matches.

4.3.2 How Vector Stores Work

The vector store process involves several steps:

  1. Store Document: Add a document (e.g., "AI is transforming...")
  2. Generate Embedding: Convert the document to a vector [0.123, -0.456, ...]
  3. Store in Vector Store: Save the vector along with the document content and metadata
  4. Query: When searching (e.g., "What is AI?"), generate a query embedding
  5. Find Similar Vectors: Use distance metrics to find the most similar vectors
  6. Return Top Results: Retrieve the most similar documents
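A toy in-memory version makes these steps concrete. Everything here is illustrative (the class and method names are invented, and the caller supplies pre-computed vectors instead of calling an embedding model), but it mirrors what a real vector store does at a much smaller scale:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class TinyVectorStore {

    record Entry(String content, double[] vector) {}

    private final List<Entry> entries = new ArrayList<>();

    // Steps 1-3: store the vector alongside the original content
    public void add(String content, double[] vector) {
        entries.add(new Entry(content, vector));
    }

    // Steps 4-6: score every entry against the query vector, return the top K
    public List<String> similaritySearch(double[] queryVector, int topK) {
        return entries.stream()
            .sorted(Comparator.comparingDouble(
                (Entry e) -> cosine(e.vector(), queryVector)).reversed())
            .limit(topK)
            .map(Entry::content)
            .toList();
    }

    private static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    public static void main(String[] args) {
        TinyVectorStore store = new TinyVectorStore();
        // Hand-made stand-ins for real embeddings
        store.add("AI is transforming software", new double[]{0.9, 0.1});
        store.add("Cooking pasta takes ten minutes", new double[]{0.1, 0.9});
        System.out.println(store.similaritySearch(new double[]{0.8, 0.2}, 1));
    }
}
```

Note that this search is a linear scan over every entry, which is precisely why production vector stores need the specialized index structures discussed in this section.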

4.3.3 Distance Metrics

Distance metrics measure how similar two vectors are. Spring AI supports several metrics:

1. Cosine Distance

  • Formula: 1 - cosine_similarity
  • What it measures: Angle between vectors (ignores magnitude)
  • Range: 0 (identical) to 2 (opposite)
  • Why use it: Focuses on direction, not magnitude; excellent for text embeddings; normalized range

2. Euclidean Distance (L2)

  • Formula: √(Σ(xi - yi)²)
  • What it measures: Straight-line distance between points
  • When to use: When magnitude matters in your use case

3. Dot Product

  • Formula: Σ(xi × yi)
  • What it measures: How aligned vectors are
  • When to use: When you need raw similarity scores
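All three metrics are short enough to write out directly; the sketch below is for illustration. One useful relationship: for vectors normalized to unit length, the dot product equals cosine similarity, which is why many stores normalize embeddings up front.

```java
public class DistanceMetrics {

    // Dot product: raw alignment score, unbounded
    public static double dot(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) sum += a[i] * b[i];
        return sum;
    }

    // Cosine distance = 1 - cosine similarity; range 0 (same direction) to 2 (opposite)
    public static double cosineDistance(double[] a, double[] b) {
        return 1.0 - dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));
    }

    // Euclidean (L2) distance: straight-line distance, sensitive to magnitude
    public static double euclidean(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) sum += (a[i] - b[i]) * (a[i] - b[i]);
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        double[] a = {1.0, 0.0};
        double[] b = {0.0, 1.0};                  // orthogonal to a
        System.out.println(cosineDistance(a, b)); // 1.0: vectors 90 degrees apart
        System.out.println(euclidean(a, b));      // sqrt(2)
        System.out.println(dot(a, b));            // 0.0
    }
}
```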

4.3.4 Index Types

Vector stores use specialized indexes to enable fast similarity search:

HNSW (Hierarchical Navigable Small World)

  • What it is: Graph-based index for fast similarity search
  • Benefits: O(log n) search time, high recall rate, scalable to millions of vectors
  • Trade-offs: Uses more memory, takes time to build the index
  • Best for: Production applications requiring high accuracy and performance

IVFFlat

  • What it is: Inverted file index
  • Benefits: Memory efficient, fast to build
  • Trade-offs: Lower recall than HNSW, slower search for large datasets
  • Best for: Smaller datasets or when memory is constrained

4.3.5 Supported Vector Stores

Spring AI supports multiple vector databases:

  • PostgreSQL with pgvector: Native SQL integration, HNSW index support, cosine distance optimization
  • Pinecone: Managed vector database service
  • Chroma: Open-source vector database
  • Weaviate: Vector search engine
  • Milvus: Open-source vector database
  • Redis: In-memory vector storage
  • Simple in-memory store: For testing and development

4.3.6 PgVectorStore Configuration

When using PostgreSQL with pgvector, you can configure the distance type:

@Bean
public VectorStore vectorStore(JdbcTemplate jdbcTemplate, EmbeddingModel embeddingModel) {
    return PgVectorStore.builder(jdbcTemplate, embeddingModel)
        .distanceType(PgVectorStore.PgDistanceType.COSINE_DISTANCE)
        .build();
}

4.4 Document

The Document class represents text content with metadata. Used for storing and retrieving documents in vector stores for RAG applications.

4.5 RAG (Retrieval Augmented Generation)

RAG enhances LLM responses with relevant context from your documents. It combines retrieval of relevant information with generation of responses based on that context.

4.5.1 The Problem RAG Solves

Without RAG:

User: "What is our refund policy?"
LLM: [Generic answer based on training data, may be outdated or incorrect]

With RAG:

User: "What is our refund policy?"
System: 
  1. Search documents for "refund policy"
  2. Find relevant sections from YOUR documents
  3. Add to prompt: "Based on: [your actual policy]..."
  4. LLM: [Answer based on YOUR current documents]

4.5.2 RAG Components

Spring AI provides a complete RAG framework that combines:

  • Document Loading: Load documents from various sources (PDF, text files, web pages)
  • Text Splitting: Chunk documents into manageable pieces
  • Embedding: Convert chunks to vectors
  • Storage: Store in vector databases
  • Retrieval: Find relevant context for queries
  • Generation: Use retrieved context to generate accurate responses
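Of these steps, text splitting is the easiest to gloss over. Spring AI ships splitter implementations; the character-based sketch below is only an illustration of the core idea of fixed-size chunks with overlap:

```java
import java.util.ArrayList;
import java.util.List;

public class SimpleTextSplitter {

    // Split text into chunks of at most chunkSize characters, overlapping
    // by `overlap` characters so adjacent chunks share context
    public static List<String> split(String text, int chunkSize, int overlap) {
        if (overlap >= chunkSize) {
            throw new IllegalArgumentException("overlap must be smaller than chunkSize");
        }
        List<String> chunks = new ArrayList<>();
        int start = 0;
        while (start < text.length()) {
            int end = Math.min(start + chunkSize, text.length());
            chunks.add(text.substring(start, end));
            if (end == text.length()) break;
            start = end - overlap; // step back so the next chunk repeats the tail
        }
        return chunks;
    }

    public static void main(String[] args) {
        System.out.println(split("abcdefghij", 4, 1)); // [abcd, defg, ghij]
    }
}
```

Real splitters work on tokens rather than characters and try to break on sentence or paragraph boundaries, but the chunk-plus-overlap structure is the same.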

4.5.3 RAG Process Flow

The RAG process follows these steps:

  1. User Query: User asks a question (e.g., "What is machine learning?")
  2. Generate Query Embedding: Convert the query to a vector [0.123, -0.456, ...]
  3. Vector Similarity Search: Find top K similar documents in the vector store
  4. Retrieve Document Context: Extract the content from the most similar documents
  5. Build Prompt with Context: Combine retrieved context with the user's question
  6. Send to LLM: The LLM generates an answer using the provided context
  7. Return Answer: Return the generated response to the user

The RAG workflow is illustrated below:

graph LR subgraph "Document Processing" A[Load Documents] --> B[Split into Chunks] B --> C[Generate Embeddings] C --> D[Store in Vector DB] end subgraph "Query Processing" E[User Query] --> F[Generate Query Embedding] F --> G[Similarity Search] G --> H[Retrieve Top K Documents] end subgraph "Response Generation" H --> I[Build Context] I --> J[Create Prompt with Context] J --> K[LLM Generation] K --> L[Return Response] end D --> G style A fill:#e1f5ff,stroke:#0273bd,stroke-width:2px style B fill:#e1f5ff,stroke:#0273bd,stroke-width:2px style C fill:#e1f5ff,stroke:#0273bd,stroke-width:2px style D fill:#e1f5ff,stroke:#0273bd,stroke-width:2px style E fill:#fff4e1,stroke:#f57c00,stroke-width:2px style F fill:#fff4e1,stroke:#f57c00,stroke-width:2px style G fill:#fff4e1,stroke:#f57c00,stroke-width:2px style H fill:#fff4e1,stroke:#f57c00,stroke-width:2px style I fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px style J fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px style K fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px style L fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px

4.6 Prompt Templates

Spring AI supports prompt templating using StringTemplate, allowing dynamic prompt construction with variables and conditionals.

4.6.1 What are Prompt Templates?

Prompt Templates are reusable text patterns for LLM interactions. They provide a structured way to create prompts with placeholders that get filled with dynamic content.

4.6.2 Why Use Templates?

Without Template (hard to maintain):

String prompt = "Answer: " + question; // Hard to maintain, error-prone

With Template (maintainable and consistent):

PromptTemplate template = new PromptTemplate("""
    Answer the following question based on the provided context.
    If the answer cannot be found in the context, say "I don't know."
    
    Context: {context}
    Question: {question}
    """);

Prompt prompt = template.create(Map.of(
    "context", context,
    "question", question
));

Benefits:

  • Consistency: Same format every time, ensuring predictable LLM behavior
  • Maintainability: Update prompt structure in one place
  • Reusability: Use the same template across different queries and contexts
  • Early validation: Template variables are checked when the prompt is rendered, so a missing or misspelled placeholder fails fast instead of silently producing a malformed prompt

4.6.3 RAG Prompt Template Structure

For RAG (Retrieval-Augmented Generation) applications, prompt templates typically include:

  1. Instructions: "Answer based on context" - tells the LLM how to use the context
  2. Context Placeholder: {context} - filled with retrieved documents
  3. Question Placeholder: {question} - the user's question
  4. Fallback Instructions: "I don't know" if context is insufficient - prevents hallucination
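To see what a template engine is doing, here is a deliberately naive placeholder-filler. This is an illustrative toy, not Spring AI's implementation (the real PromptTemplate is backed by the StringTemplate engine):

```java
import java.util.Map;

public class NaivePromptTemplate {

    private final String template;

    public NaivePromptTemplate(String template) {
        this.template = template;
    }

    // Replace each {name} placeholder with its value from the variables map
    public String render(Map<String, String> variables) {
        String result = template;
        for (Map.Entry<String, String> var : variables.entrySet()) {
            result = result.replace("{" + var.getKey() + "}", var.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        NaivePromptTemplate t = new NaivePromptTemplate(
            "Answer based on the context.\nContext: {context}\nQuestion: {question}");
        System.out.println(t.render(Map.of(
            "context", "Refunds are accepted within 30 days.",
            "question", "What is the refund policy?")));
    }
}
```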

4.7 Model Context Protocol (MCP)

Spring AI supports the Model Context Protocol, enabling AI models to interact with external tools, databases, and services through a standardized interface.

5. Project Setup

To get started with Spring AI, add the Spring AI BOM (Bill of Materials) and the specific dependencies you need.

5.1 Gradle Configuration

Create a build.gradle file with the following configuration:

plugins {
    id 'java'
    id 'org.springframework.boot' version '3.4.5'
    id 'io.spring.dependency-management' version '1.1.4'
}

java {
    sourceCompatibility = '17'
    targetCompatibility = '17'
}

ext {
    springAiVersion = '1.0.0'
}

dependencyManagement {
    imports {
        mavenBom "org.springframework.ai:spring-ai-bom:${springAiVersion}"
    }
}

repositories {
    mavenCentral()
}

dependencies {
    // Spring Boot starters
    implementation 'org.springframework.boot:spring-boot-starter-web'
    implementation 'org.springframework.boot:spring-boot-starter-validation'
    
    // Spring AI OpenAI starter (or spring-ai-starter-model-anthropic, spring-ai-starter-model-ollama, etc.)
    implementation 'org.springframework.ai:spring-ai-starter-model-openai'
    
    // Optional: Vector Store (e.g., PostgreSQL with pgvector)
    implementation 'org.springframework.ai:spring-ai-starter-vector-store-pgvector'
    
    // Testing
    testImplementation 'org.springframework.boot:spring-boot-starter-test'
    testImplementation 'org.mockito:mockito-core'
    testImplementation 'org.mockito:mockito-junit-jupiter'
    testRuntimeOnly 'org.junit.platform:junit-platform-launcher'
}

tasks.named('test') {
    useJUnitPlatform()
    testLogging {
        events "passed", "skipped", "failed"
        exceptionFormat "full"
    }
}

5.2 Maven Configuration (Alternative)

If you prefer Maven, use the following pom.xml:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
         http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>3.4.5</version>
        <relativePath/>
    </parent>

    <properties>
        <spring-ai.version>1.0.0</spring-ai.version>
    </properties>

    <dependencyManagement>
        <dependencies>
            <dependency>
                <groupId>org.springframework.ai</groupId>
                <artifactId>spring-ai-bom</artifactId>
                <version>${spring-ai.version}</version>
                <type>pom</type>
                <scope>import</scope>
            </dependency>
        </dependencies>
    </dependencyManagement>

    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>

        <!-- Spring AI OpenAI -->
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-starter-model-openai</artifactId>
        </dependency>

        <!-- Optional: Vector Store -->
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-starter-vector-store-pgvector</artifactId>
        </dependency>
    </dependencies>
</project>

6. Configuration

Configure Spring AI using application.properties or application.yml. Spring Boot's auto-configuration handles the rest.

6.1 OpenAI Configuration

# application.properties
spring.ai.openai.api-key=${OPENAI_API_KEY}
spring.ai.openai.chat.options.model=gpt-4
spring.ai.openai.chat.options.temperature=0.7
spring.ai.openai.chat.options.max-tokens=500

6.2 Anthropic Configuration

# application.properties
spring.ai.anthropic.api-key=${ANTHROPIC_API_KEY}
spring.ai.anthropic.chat.options.model=claude-3-opus-20240229
spring.ai.anthropic.chat.options.temperature=0.7

6.3 Ollama (Local Models) Configuration

# application.properties
spring.ai.ollama.base-url=http://localhost:11434
spring.ai.ollama.chat.options.model=llama2

6.4 Vector Store Configuration (PostgreSQL)

# application.properties
spring.datasource.url=jdbc:postgresql://localhost:5432/vectordb
spring.datasource.username=postgres
spring.datasource.password=password

spring.ai.vectorstore.pgvector.index-type=HNSW
spring.ai.vectorstore.pgvector.distance-type=COSINE_DISTANCE

7. Real-World Examples

7.1 Example 1: Simple Chat Service

A basic service that uses ChatClient to generate text responses:

package com.example.ai.service;

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.stereotype.Service;

@Service
public class ChatService {
    private final ChatClient chatClient;

    // Expose a ChatClient bean built from the auto-configured ChatClient.Builder,
    // e.g. @Bean ChatClient chatClient(ChatClient.Builder builder) { return builder.build(); }
    public ChatService(ChatClient chatClient) {
        this.chatClient = chatClient;
    }

    public String chat(String userMessage) {
        return chatClient.prompt()
            .user(userMessage)
            .call()
            .content();
    }

    public String chatWithSystemPrompt(String systemPrompt, String userMessage) {
        return chatClient.prompt()
            .system(systemPrompt)
            .user(userMessage)
            .call()
            .content();
    }
}

7.2 Example 2: REST Controller for Chat

Expose the chat service via REST API:

package com.example.ai.controller;

import com.example.ai.service.ChatService;
import org.springframework.web.bind.annotation.*;

@RestController
@RequestMapping("/api/chat")
public class ChatController {
    private final ChatService chatService;

    public ChatController(ChatService chatService) {
        this.chatService = chatService;
    }

    @PostMapping
    public ChatResponse chat(@RequestBody ChatRequest request) {
        String response = chatService.chat(request.message());
        return new ChatResponse(response);
    }

    // DTOs
    public record ChatRequest(String message) {}
    public record ChatResponse(String response) {}
}

7.3 Example 3: Document Embedding and Vector Store

Store documents as embeddings and perform semantic search:

package com.example.ai.service;

import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.stereotype.Service;

import java.util.List;
import java.util.Map;

@Service
public class DocumentService {
    private final VectorStore vectorStore;

    // The VectorStore embeds documents internally via its configured EmbeddingModel
    public DocumentService(VectorStore vectorStore) {
        this.vectorStore = vectorStore;
    }

    public void addDocument(String content, String source) {
        Document document = new Document(content, Map.of("source", source));
        vectorStore.add(List.of(document));
    }

    public List<Document> searchSimilar(String query, int topK) {
        return vectorStore.similaritySearch(
            SearchRequest.builder()
                .query(query)
                .topK(topK)
                .build()
        );
    }
}

7.4 Example 4: RAG Application

Complete RAG implementation that retrieves relevant context before generating responses. The following diagram shows the component interactions:

sequenceDiagram
    participant User
    participant Controller
    participant RAGService
    participant VectorStore
    participant EmbeddingModel
    participant ChatClient
    User->>Controller: POST /api/rag/ask
    Controller->>RAGService: ask(question)
    RAGService->>EmbeddingModel: embed(question)
    EmbeddingModel-->>RAGService: queryVector
    RAGService->>VectorStore: similaritySearch(queryVector)
    VectorStore-->>RAGService: relevantDocuments
    RAGService->>RAGService: buildContext(documents)
    RAGService->>ChatClient: call(prompt with context)
    ChatClient-->>RAGService: response
    RAGService-->>Controller: answer
    Controller-->>User: JSON response

Implementation code:

package com.example.ai.service;

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.prompt.PromptTemplate;
import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.stereotype.Service;

import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

@Service
public class RAGService {
    private final ChatClient chatClient;
    private final VectorStore vectorStore;

    public RAGService(ChatClient chatClient, VectorStore vectorStore) {
        this.chatClient = chatClient;
        this.vectorStore = vectorStore;
    }

    public String ask(String question) {
        // 1. Retrieve relevant documents
        List<Document> relevantDocs = vectorStore.similaritySearch(
            SearchRequest.builder()
                .query(question)
                .topK(5)
                .build()
        );

        // 2. Build context from retrieved documents
        String context = relevantDocs.stream()
            .map(Document::getText)
            .collect(Collectors.joining("\n\n"));

        // 3. Create prompt with context and question
        PromptTemplate template = new PromptTemplate("""
            Answer the following question based on the provided context.
            If the answer cannot be found in the context, say "I don't know."

            Context:
            {context}

            Question: {question}
            """);

        // 4. Generate response
        return chatClient.prompt(template.create(Map.of(
                "context", context,
                "question", question)))
            .call()
            .content();
    }
}

7.5 Example 5: Streaming Chat Response

Stream responses in real-time for better user experience:

package com.example.ai.controller;

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.http.MediaType;
import org.springframework.web.bind.annotation.*;
import reactor.core.publisher.Flux;

@RestController
@RequestMapping("/api/chat")
public class StreamingChatController {
    private final ChatClient chatClient;

    public StreamingChatController(ChatClient chatClient) {
        this.chatClient = chatClient;
    }

    @PostMapping(value = "/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
    public Flux<String> streamChat(@RequestBody ChatRequest request) {
        return chatClient.prompt()
            .user(request.message())
            .stream()
            .content();
    }

    public record ChatRequest(String message) {}
}

7.6 Example 6: Multi-Provider Setup

Use multiple AI providers in the same application:

package com.example.ai.service;

import org.springframework.ai.chat.model.ChatModel;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.stereotype.Service;

@Service
public class MultiProviderService {
    private final ChatModel openAiChatModel;
    private final ChatModel anthropicChatModel;

    public MultiProviderService(
            @Qualifier("openAiChatModel") ChatModel openAiChatModel,
            @Qualifier("anthropicChatModel") ChatModel anthropicChatModel) {
        this.openAiChatModel = openAiChatModel;
        this.anthropicChatModel = anthropicChatModel;
    }

    // ChatModel.call(String) returns the response text directly
    public String useOpenAI(String prompt) {
        return openAiChatModel.call(prompt);
    }

    public String useAnthropic(String prompt) {
        return anthropicChatModel.call(prompt);
    }
}

8. Best Practices

8.1 Error Handling and Retries

Implement retry logic for transient failures:

@Service
public class ResilientChatService {
    private final ChatClient chatClient;
    private final RetryTemplate retryTemplate;

    public ResilientChatService(ChatClient chatClient) {
        this.chatClient = chatClient;
        this.retryTemplate = RetryTemplate.builder()
            .maxAttempts(3)
            .exponentialBackoff(1000, 2, 10000)
            .retryOn(IOException.class)
            .build();
    }

    public String chatWithRetry(String message) {
        return retryTemplate.execute(context ->
            chatClient.prompt().user(message).call().content());
    }
}

8.2 Rate Limiting

Use a client-side rate limiter (here, Guava's RateLimiter) to prevent API quota exhaustion:

@Service
public class RateLimitedChatService {
    private final ChatClient chatClient;
    private final RateLimiter rateLimiter;

    public RateLimitedChatService(ChatClient chatClient) {
        this.chatClient = chatClient;
        this.rateLimiter = RateLimiter.create(10.0); // 10 requests per second
    }

    public String chat(String message) {
        rateLimiter.acquire(); // blocks until a permit is available
        return chatClient.prompt().user(message).call().content();
    }
}
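If pulling in Guava is undesirable, the same idea can be hand-rolled as a token bucket. The sketch below is illustrative, not production code (no fairness, no blocking; callers that get false should back off and retry):

```java
public class TokenBucket {

    private final double capacity;
    private final double refillPerSecond;
    private double tokens;
    private long lastRefillNanos;

    public TokenBucket(double capacity, double refillPerSecond) {
        this.capacity = capacity;
        this.refillPerSecond = refillPerSecond;
        this.tokens = capacity;          // start full: allows an initial burst
        this.lastRefillNanos = System.nanoTime();
    }

    // Try to take one permit; returns false if the bucket is currently empty
    public synchronized boolean tryAcquire() {
        long now = System.nanoTime();
        double elapsedSeconds = (now - lastRefillNanos) / 1_000_000_000.0;
        tokens = Math.min(capacity, tokens + elapsedSeconds * refillPerSecond);
        lastRefillNanos = now;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        TokenBucket bucket = new TokenBucket(2, 10.0); // burst of 2, refills 10/sec
        System.out.println(bucket.tryAcquire()); // true
        System.out.println(bucket.tryAcquire()); // true
        System.out.println(bucket.tryAcquire()); // false: bucket drained
    }
}
```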

8.3 Caching

Cache responses for repeated queries:

@Service
public class CachedChatService {
    private final ChatClient chatClient;

    public CachedChatService(ChatClient chatClient) {
        this.chatClient = chatClient;
    }

    @Cacheable(value = "chatResponses", key = "#message")
    public String chat(String message) {
        return chatClient.prompt().user(message).call().content();
    }
}

8.4 Prompt Engineering

  • Use system prompts to define AI behavior and context
  • Structure prompts with clear instructions and examples
  • Use prompt templates for dynamic content
  • Validate and sanitize user inputs before sending to models
  • Test prompts with different models to ensure consistency

8.5 Observability

  • Log prompts and responses (be careful with PII/sensitive data)
  • Track token usage and costs
  • Monitor latency and error rates
  • Use distributed tracing for debugging

9. Testing

Spring AI components are straightforward to test: mock the underlying ChatModel with Mockito, wrap it in a real ChatClient, and assert on the service's behavior. The following example demonstrates this with JUnit 5 and Mockito:

package com.example.ai.service;

import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.extension.ExtendWith;
import org.mockito.Mock;
import org.mockito.junit.jupiter.MockitoExtension;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.messages.AssistantMessage;
import org.springframework.ai.chat.model.ChatModel;
import org.springframework.ai.chat.model.ChatResponse;
import org.springframework.ai.chat.model.Generation;
import org.springframework.ai.chat.prompt.Prompt;

import java.util.List;

import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.mockito.ArgumentMatchers.any;
import static org.mockito.Mockito.when;

@ExtendWith(MockitoExtension.class)
class ChatServiceTest {
    @Mock
    private ChatModel chatModel;

    private ChatService chatService;

    @BeforeEach
    void setUp() {
        // Wrap the mocked ChatModel in a real ChatClient
        chatService = new ChatService(ChatClient.create(chatModel));
    }

    private static ChatResponse response(String text) {
        return new ChatResponse(List.of(new Generation(new AssistantMessage(text))));
    }

    @Test
    void testChat() {
        when(chatModel.call(any(Prompt.class)))
            .thenReturn(response("Hello, how can I help?"));

        assertEquals("Hello, how can I help?", chatService.chat("Hello"));
    }

    @Test
    void testChatWithSystemPrompt() {
        when(chatModel.call(any(Prompt.class)))
            .thenReturn(response("I'm a helpful assistant."));

        assertEquals("I'm a helpful assistant.",
            chatService.chatWithSystemPrompt("You are a helpful assistant.", "Hello"));
    }
}

Run tests using Gradle:

# Run all tests
./gradlew test

# Run specific test class
./gradlew test --tests ChatServiceTest

# Run with coverage
./gradlew test jacocoTestReport

The component relationships in a typical Spring AI test setup:

graph TB
    subgraph "Test Configuration"
        A[SpringBootTest] --> B[Test Context]
        B --> C[Mock Beans]
    end
    subgraph "Service Under Test"
        D[ChatService] --> E[ChatClient Interface]
    end
    subgraph "Mock Implementation"
        C --> F[Mock ChatClient]
        F --> G[Stubbed Responses]
    end
    subgraph "Test Execution"
        H[Test Method] --> D
        D --> F
        F --> I[Assertions]
    end
    A --> H
    E -.->|injected| F

10. Advanced Concepts

10.1 Token Limits

Tokens are pieces of text that LLMs process. They can be words, subwords, or characters depending on the tokenization method.

10.1.1 Token Limits by Model

  • GPT-4: ~8,192 tokens (input + output combined)
  • GPT-3.5: ~4,096 tokens
  • text-embedding-ada-002: 8,191 tokens per input

10.1.2 Why Token Limits Matter

Problem: LLMs have maximum token limits. If your input exceeds this limit, you'll get an error or the model will truncate the input.

Solutions:

  • Chunking: Split long documents into smaller pieces that fit within token limits
  • Summarization: Summarize content before sending to the LLM
  • Selective Context: Only send the most relevant parts of documents (this is what RAG does!)
  • Streaming: For long outputs, use streaming to handle responses that exceed limits
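A common rough heuristic is that English text averages about four characters per token. The sketch below uses that heuristic to pre-check a prompt against a context window; it is an approximation only, and exact counts require the model's own tokenizer (e.g., tiktoken for OpenAI models):

```java
public class TokenBudget {

    // Rough heuristic: ~4 characters per token for English text
    public static int estimateTokens(String text) {
        return (int) Math.ceil(text.length() / 4.0);
    }

    // Would prompt plus expected completion fit in the model's context window?
    public static boolean fits(String prompt, int maxCompletionTokens, int contextWindow) {
        return estimateTokens(prompt) + maxCompletionTokens <= contextWindow;
    }

    public static void main(String[] args) {
        String prompt = "What is artificial intelligence?";
        System.out.println(estimateTokens(prompt) + " tokens (estimated)");
        System.out.println(fits(prompt, 500, 8192)); // true for a short prompt
    }
}
```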

10.2 Temperature Parameter

Temperature controls the randomness and creativity in LLM responses. It's a parameter you can configure when making API calls.

10.2.1 Temperature Ranges

  • Low (0.1-0.3): Deterministic, focused, consistent responses. Best for factual answers, code generation, or when you need reproducible results.
  • Medium (0.7): Balanced creativity and consistency. Good default for most applications.
  • High (0.9-1.0): Creative, varied, unpredictable. Best for creative writing, brainstorming, or when you want diverse responses.

10.2.2 Example

Prompt: "Complete: The sky is"

Temperature 0.1: "blue" (always the same, most likely answer)
Temperature 0.7: "blue", "cloudy", "clear" (varied but reasonable)
Temperature 1.0: "blue", "purple", "raining cats" (very creative, may be nonsensical)

In Spring AI, you configure temperature in your application properties:

spring.ai.openai.chat.options.temperature=0.7
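Mechanically, temperature divides the model's raw scores (logits) before the softmax that turns them into probabilities. The logits below are invented for illustration, but the arithmetic shows why low temperature concentrates probability on the top candidate while high temperature spreads it out:

```java
public class TemperatureDemo {

    // Softmax over logits scaled by 1/temperature: lower temperature sharpens
    // the distribution, higher temperature flattens it
    public static double[] softmax(double[] logits, double temperature) {
        double[] probs = new double[logits.length];
        double sum = 0;
        for (int i = 0; i < logits.length; i++) {
            probs[i] = Math.exp(logits[i] / temperature);
            sum += probs[i];
        }
        for (int i = 0; i < probs.length; i++) probs[i] /= sum;
        return probs;
    }

    public static void main(String[] args) {
        // Made-up logits for next-word candidates: "blue", "cloudy", "purple"
        double[] logits = {2.0, 1.0, 0.1};
        System.out.println("T=0.1: " + java.util.Arrays.toString(softmax(logits, 0.1)));
        System.out.println("T=1.0: " + java.util.Arrays.toString(softmax(logits, 1.0)));
        // At T=0.1 nearly all probability lands on "blue";
        // at T=1.0 the other candidates get real probability mass
    }
}
```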

10.3 Streaming Responses

Streaming sends responses as they're generated, rather than waiting for the complete response before sending it to the client.

10.3.1 Benefits of Streaming

  • Faster Perceived Response: Users see text immediately, improving perceived performance
  • Better User Experience: Feels more interactive and responsive, similar to ChatGPT
  • Lower Latency: Don't wait for complete response before starting to display content
  • Handles Long Responses: Can handle responses that exceed token limits by streaming chunks

10.3.2 Implementation in Spring AI

Spring AI supports streaming through reactive streams (Flux):

Flux<String> stream = chatClient.prompt()
    .user(userMessage)
    .stream()
    .content();
// Emits chunks as they are generated

This is particularly useful for chat interfaces where users expect to see responses appear in real-time.

11. Production Considerations

11.1 Cost Management

  • Monitor token usage and implement budgets
  • Use smaller models for simple tasks
  • Cache responses when appropriate
  • Batch requests when possible
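Budgeting starts with simple arithmetic over token counts. The per-1K-token rates in this sketch are hypothetical placeholders; substitute your provider's actual published pricing:

```java
public class CostEstimator {

    // Estimate request cost from token counts and per-1K-token prices.
    // The rates passed in below are HYPOTHETICAL; check your provider's pricing page.
    public static double costUsd(int inputTokens, int outputTokens,
                                 double inputPricePer1k, double outputPricePer1k) {
        return inputTokens / 1000.0 * inputPricePer1k
             + outputTokens / 1000.0 * outputPricePer1k;
    }

    public static void main(String[] args) {
        // e.g. 1,200 prompt tokens and 300 completion tokens at $0.03/$0.06 per 1K
        System.out.printf("$%.4f%n", costUsd(1200, 300, 0.03, 0.06));
    }
}
```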

11.2 Security

  • Store API keys securely (use environment variables or secret management)
  • Validate and sanitize all inputs
  • Implement rate limiting to prevent abuse
  • Audit AI interactions for compliance
  • Consider data privacy when sending sensitive information to external providers

11.3 Performance

  • Use streaming for long responses
  • Implement async processing for non-interactive tasks
  • Optimize RAG retrieval with proper chunking and indexing
  • Use connection pooling for vector databases

12. Conclusion

Spring AI brings the power of AI and LLMs to the Spring ecosystem with a clean, provider-agnostic API. Whether you're building chatbots, implementing RAG applications, or adding semantic search capabilities, Spring AI provides the abstractions and tooling you need to build production-ready AI features.

The framework's modular architecture, Spring Boot integration, and comprehensive component support make it an excellent choice for Java developers looking to incorporate AI capabilities into their applications. Start with simple chat services, then explore embeddings, vector stores, and RAG as your needs grow.

For more information, visit the Spring AI Reference Documentation and explore the official examples and guides.
