Complete guide to Spring AI, the official Spring project for integrating AI and Large Language Models (LLMs) into Java applications. Learn the architecture and core components, and build real-world AI-powered features.
1. What is Spring AI?
Spring AI is an official Spring project that provides a unified abstraction layer for integrating artificial intelligence capabilities into Java applications. It simplifies working with Large Language Models (LLMs), embedding models, vector databases, and AI-powered features while maintaining Spring's familiar patterns: dependency injection, auto-configuration, and testing support.
Instead of writing provider-specific code for OpenAI, Anthropic, or other AI services, Spring AI offers a consistent API that lets you switch between providers or models with minimal code changes. It follows Spring Boot's convention-over-configuration philosophy, making AI integration as straightforward as adding a dependency and configuring properties.
2. Why Use Spring AI?
- Provider abstraction: write code once and switch between OpenAI, Anthropic, Azure OpenAI, Ollama, and other providers by changing configuration.
- Spring Boot integration: auto-configuration, property-based setup, and seamless integration with the Spring ecosystem.
- Modular architecture: include only the components you need (chat, embeddings, vector stores, RAG) to keep dependencies minimal.
- Production-ready: built-in support for retries, observability, and error handling; concerns like rate limiting and caching are easy to layer on (see Best Practices).
- Testing support: easy mocking and testing of AI components using Spring's testing framework.
- RAG support: built-in Retrieval Augmented Generation (RAG) framework for context-aware AI applications.
3. Spring AI Architecture
Spring AI follows a modular, layered architecture that separates concerns and promotes flexibility:
3.1 Architecture Layers
- Application Layer: Your Spring Boot application code (controllers, services, repositories).
- Spring AI Abstractions: Core interfaces like ChatModel, EmbeddingModel, and VectorStore.
- Provider Implementations: Concrete implementations for different AI providers (OpenAI, Anthropic, etc.).
- AI Provider APIs: External HTTP APIs or local model servers.
This architecture allows you to:
- Write business logic against stable Spring AI interfaces
- Switch AI providers without changing application code
- Test with mock implementations
- Combine multiple providers in the same application
4. Core Components
4.1 ChatModel
The ChatModel interface is the primary abstraction for interacting with LLMs. It handles conversational AI, text generation, and chat-based interactions.
Key Features:
- Multi-turn conversations with message history
- System prompts and user messages
- Streaming responses for real-time interactions
- Function calling and tool integration
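A minimal sketch of a multi-turn call through this abstraction (named ChatClient in the pre-1.0 releases that this guide's examples use); accessor names vary slightly between releases:
Prompt prompt = new Prompt(List.of(
    new SystemMessage("You are a concise technical assistant."),
    new UserMessage("Explain dependency injection in one sentence.")
));
// call(Prompt) returns a ChatResponse wrapping the generated assistant message
String answer = chatClient.call(prompt).getResult().getOutput().getContent();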
4.2 EmbeddingModel
EmbeddingModel converts text into numerical vectors (embeddings) that capture semantic meaning. Essential for semantic search, similarity matching, and RAG applications.
4.2.1 Understanding Vector Embeddings
Vector Embeddings are numerical representations of text (or other data) in a high-dimensional space. They transform words, sentences, or documents into arrays of numbers that capture semantic meaning.
Imagine representing the meaning of words as numbers:
"cat" → [0.2, 0.8, 0.1, 0.5, ...] (1536 numbers for OpenAI)
"dog" → [0.3, 0.7, 0.2, 0.4, ...]
"car" → [0.1, 0.2, 0.9, 0.3, ...]
Key Properties:
- Similar meanings → Similar vectors: Words with related meanings produce vectors that are close together in the high-dimensional space
- Different meanings → Different vectors: Unrelated words produce vectors that are far apart
- Fixed size: Each embedding has the same number of dimensions (e.g., 1536 for OpenAI's text-embedding-ada-002)
4.2.2 How Embeddings Work
The embedding process involves three main steps:
- Text Input: The model receives text input (e.g., "What is artificial intelligence?")
- Embedding Model Processing: The neural network breaks text into tokens, analyzes context and meaning, and generates a numerical representation
- Vector Output: The model outputs a fixed-size vector (e.g., 1536 numbers for OpenAI ada-002)
In Spring AI, this process is simplified:
// Spring AI makes this simple:
EmbeddingModel embeddingModel; // Auto-configured
// Convert text to vector
List<Double> embedding = embeddingModel.embed("Hello, world!");
// Result: [0.123, -0.456, 0.789, ...] (1536 dimensions)
4.2.3 Embedding Models
An Embedding Model is a neural network that converts text into vectors. Different models have different characteristics:
- OpenAI text-embedding-ada-002: 1536 dimensions, 8191 token context length, affordable and high quality
- OpenAI text-embedding-3-large: 3072 dimensions, higher quality
- OpenAI text-embedding-3-small: 1536 dimensions, faster processing
- Other providers: Cohere, local models, etc.
Use Cases:
- Document similarity search
- Semantic search in knowledge bases
- Clustering and classification
- RAG context retrieval
4.3 VectorStore
VectorStore is an abstraction for storing and querying vector embeddings. A Vector Store is a database optimized for storing and searching vectors efficiently.
4.3.1 Why Vector Stores?
Problem: Traditional databases can't efficiently search by similarity. They're designed for exact matches, not semantic similarity.
Solution: Vector stores use specialized indexes (like HNSW) for fast similarity search, allowing you to find documents with similar meanings rather than exact text matches.
4.3.2 How Vector Stores Work
The vector store process involves several steps:
- Store Document: Add a document (e.g., "AI is transforming...")
- Generate Embedding: Convert the document to a vector [0.123, -0.456, ...]
- Store in Vector Store: Save the vector along with the document content and metadata
- Query: When searching (e.g., "What is AI?"), generate a query embedding
- Find Similar Vectors: Use distance metrics to find the most similar vectors
- Return Top Results: Retrieve the most similar documents
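In code, the whole cycle reduces to two calls on the VectorStore abstraction. A sketch (SearchRequest construction differs across releases; section 7.3 shows a full service):
// Steps 1-3: the store embeds and persists the document
vectorStore.add(List.of(new Document("AI is transforming software development.")));
// Steps 4-6: the query is embedded and the nearest documents are returned
List<Document> results = vectorStore.similaritySearch(
    SearchRequest.builder().withQuery("What is AI?").withTopK(3).build());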
4.3.3 Distance Metrics
Distance metrics measure how similar two vectors are. Spring AI supports several metrics:
1. Cosine Distance
- Formula: 1 - cosine_similarity
- What it measures: Angle between vectors (ignores magnitude)
- Range: 0 (identical) to 2 (opposite)
- Why use it: Focuses on direction, not magnitude; excellent for text embeddings; normalized range
2. Euclidean Distance (L2)
- Formula: √(Σ(xi - yi)²)
- What it measures: Straight-line distance between points
- When to use: When magnitude matters in your use case
3. Dot Product
- Formula: Σ(xi × yi)
- What it measures: How aligned vectors are
- When to use: When you need raw similarity scores
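To make the three metrics concrete, here is a small self-contained sketch computing each one for a pair of vectors (plain Java, no Spring AI types):
static double dotProduct(float[] a, float[] b) {
    double sum = 0;
    for (int i = 0; i < a.length; i++) sum += a[i] * b[i];
    return sum; // raw alignment score
}
static double euclideanDistance(float[] a, float[] b) {
    double sum = 0;
    for (int i = 0; i < a.length; i++) sum += (a[i] - b[i]) * (a[i] - b[i]);
    return Math.sqrt(sum); // straight-line distance between the points
}
static double cosineDistance(float[] a, float[] b) {
    double norms = Math.sqrt(dotProduct(a, a)) * Math.sqrt(dotProduct(b, b));
    return 1.0 - dotProduct(a, b) / norms; // 0 = same direction, 2 = opposite
}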
4.3.4 Index Types
Vector stores use specialized indexes to enable fast similarity search:
HNSW (Hierarchical Navigable Small World)
- What it is: Graph-based index for fast similarity search
- Benefits: O(log n) search time, high recall rate, scalable to millions of vectors
- Trade-offs: Uses more memory, takes time to build the index
- Best for: Production applications requiring high accuracy and performance
IVFFlat
- What it is: Inverted file index
- Benefits: Memory efficient, fast to build
- Trade-offs: Lower recall than HNSW, slower search for large datasets
- Best for: Smaller datasets or when memory is constrained
4.3.5 Supported Vector Stores
Spring AI supports multiple vector databases:
- PostgreSQL with pgvector: Native SQL integration, HNSW index support, cosine distance optimization
- Pinecone: Managed vector database service
- Chroma: Open-source vector database
- Weaviate: Vector search engine
- Milvus: Open-source vector database
- Redis: In-memory vector storage
- Simple in-memory store: For testing and development
4.3.6 PgVectorStore Configuration
When using PostgreSQL with pgvector, you can configure the distance type (builder method names vary slightly across Spring AI releases):
@Bean
public VectorStore vectorStore(JdbcTemplate jdbcTemplate, EmbeddingModel embeddingModel) {
return new PgVectorStore.Builder(jdbcTemplate, embeddingModel)
.withDistanceType(PgVectorStore.PgDistanceType.COSINE_DISTANCE)
.build();
}
4.4 Document
The Document class represents text content with metadata. Used for storing and retrieving documents in vector stores for RAG applications.
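A sketch of constructing one (the metadata keys here are arbitrary examples):
Document document = new Document(
    "Our refund policy allows returns within 30 days of purchase.",
    Map.of("source", "policy.pdf", "section", "refunds"));
// The content is what gets embedded; metadata travels alongside for filtering and attribution.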
4.5 RAG (Retrieval Augmented Generation)
RAG enhances LLM responses with relevant context from your documents. It combines retrieval of relevant information with generation of responses based on that context.
4.5.1 The Problem RAG Solves
Without RAG:
User: "What is our refund policy?"
LLM: [Generic answer based on training data, may be outdated or incorrect]
With RAG:
User: "What is our refund policy?"
System:
1. Search documents for "refund policy"
2. Find relevant sections from YOUR documents
3. Add to prompt: "Based on: [your actual policy]..."
4. LLM: [Answer based on YOUR current documents]
4.5.2 RAG Components
Spring AI provides a complete RAG framework that combines the following (the ingestion side is sketched in code after this list):
- Document Loading: Load documents from various sources (PDF, text files, web pages)
- Text Splitting: Chunk documents into manageable pieces
- Embedding: Convert chunks to vectors
- Storage: Store in vector databases
- Retrieval: Find relevant context for queries
- Generation: Use retrieved context to generate accurate responses
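The query side of this pipeline is shown in full in section 7.4. The ingestion side can be sketched with Spring AI's ETL types (TextReader, TokenTextSplitter); treat the exact signatures as assumptions, since they shifted between releases:
package com.example.ai.service;
import org.springframework.ai.document.Document;
import org.springframework.ai.reader.TextReader;
import org.springframework.ai.transformer.splitter.TokenTextSplitter;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.core.io.Resource;
import org.springframework.stereotype.Service;
import java.util.List;
@Service
public class IngestionService {
    private final VectorStore vectorStore;
    public IngestionService(VectorStore vectorStore) {
        this.vectorStore = vectorStore;
    }
    public void ingest(Resource resource) {
        List<Document> documents = new TextReader(resource).get();        // load
        List<Document> chunks = new TokenTextSplitter().apply(documents); // split into token-sized chunks
        vectorStore.add(chunks);                                          // embed and store (the store embeds internally)
    }
}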
4.5.3 RAG Process Flow
The RAG process follows these steps:
- User Query: User asks a question (e.g., "What is machine learning?")
- Generate Query Embedding: Convert the query to a vector [0.123, -0.456, ...]
- Vector Similarity Search: Find top K similar documents in the vector store
- Retrieve Document Context: Extract the content from the most similar documents
- Build Prompt with Context: Combine retrieved context with the user's question
- Send to LLM: The LLM generates an answer using the provided context
- Return Answer: Return the generated response to the user
Section 7.4 walks through this flow end-to-end in code.
4.6 Prompt Templates
Spring AI supports prompt templating using StringTemplate, allowing dynamic prompt construction with variables and conditionals.
4.6.1 What are Prompt Templates?
Prompt Templates are reusable text patterns for LLM interactions. They provide a structured way to create prompts with placeholders that get filled with dynamic content.
4.6.2 Why Use Templates?
Without Template (ad-hoc string concatenation):
String prompt = "Answer: " + question; // brittle, inconsistent, error-prone
With Template (maintainable and consistent):
PromptTemplate template = new PromptTemplate("""
Answer the following question based on the provided context.
If the answer cannot be found in the context, say "I don't know."
Context: {context}
Question: {question}
""");
Prompt prompt = template.create(Map.of(
"context", context,
"question", question
));
Benefits:
- Consistency: Same format every time, ensuring predictable LLM behavior
- Maintainability: Update prompt structure in one place
- Reusability: Use the same template across different queries and contexts
- Early validation: missing template variables are reported when the prompt is rendered, rather than failing silently
4.6.3 RAG Prompt Template Structure
For RAG (Retrieval-Augmented Generation) applications, prompt templates typically include:
- Instructions: "Answer based on context" - tells the LLM how to use the context
- Context Placeholder: {context} - filled with retrieved documents
- Question Placeholder: {question} - the user's question
- Fallback Instructions: "I don't know" if context is insufficient - prevents hallucination
4.7 Model Context Protocol (MCP)
Spring AI supports the Model Context Protocol, enabling AI models to interact with external tools, databases, and services through a standardized interface.
5. Project Setup
To get started with Spring AI, add the Spring AI BOM (Bill of Materials) and the specific dependencies you need. Note that artifact names and several core interfaces were renamed for the 1.0 GA release; the coordinates and code in this guide follow the pre-1.0 milestone naming, so consult the reference documentation if you target a newer version.
5.1 Gradle Configuration
Create a build.gradle file with the following configuration:
plugins {
id 'java'
id 'org.springframework.boot' version '3.2.0'
id 'io.spring.dependency-management' version '1.1.4'
}
java {
sourceCompatibility = '17'
targetCompatibility = '17'
}
ext {
// 0.8.x matches the pre-1.0 ChatClient API and starter names used throughout this guide;
// the 1.0 GA release renamed both the starters and several core interfaces.
springAiVersion = '0.8.1'
}
dependencyManagement {
imports {
mavenBom "org.springframework.ai:spring-ai-bom:${springAiVersion}"
}
}
repositories {
mavenCentral()
maven { url 'https://repo.spring.io/milestone' } // pre-GA Spring AI artifacts live here
}
dependencies {
// Spring Boot starters
implementation 'org.springframework.boot:spring-boot-starter-web'
implementation 'org.springframework.boot:spring-boot-starter-validation'
// Spring AI OpenAI (or use spring-ai-anthropic, spring-ai-ollama, etc.)
implementation 'org.springframework.ai:spring-ai-openai-spring-boot-starter'
// Optional: Vector Store (e.g., PostgreSQL with pgvector)
implementation 'org.springframework.ai:spring-ai-pgvector-store-spring-boot-starter'
// Testing
testImplementation 'org.springframework.boot:spring-boot-starter-test'
testImplementation 'org.mockito:mockito-core'
testImplementation 'org.mockito:mockito-junit-jupiter'
testRuntimeOnly 'org.junit.platform:junit-platform-launcher'
}
tasks.named('test') {
useJUnitPlatform()
testLogging {
events "passed", "skipped", "failed"
exceptionFormat "full"
}
}
5.2 Maven Configuration (Alternative)
If you prefer Maven, use the following pom.xml:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-parent</artifactId>
<version>3.2.0</version>
<relativePath/>
</parent>
<properties>
<spring-ai.version>0.8.1</spring-ai.version>
</properties>
<dependencyManagement>
<dependencies>
<dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-bom</artifactId>
<version>${spring-ai.version}</version>
<type>pom</type>
<scope>import</scope>
</dependency>
</dependencies>
</dependencyManagement>
<dependencies>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-web</artifactId>
</dependency>
<!-- Spring AI OpenAI -->
<dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-openai-spring-boot-starter</artifactId>
</dependency>
<!-- Optional: Vector Store -->
<dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-pgvector-store-spring-boot-starter</artifactId>
</dependency>
</dependencies>
<!-- Pre-GA Spring AI artifacts are published to the Spring milestone repository -->
<repositories>
<repository>
<id>spring-milestones</id>
<url>https://repo.spring.io/milestone</url>
</repository>
</repositories>
</project>
6. Configuration
Configure Spring AI using application.properties or application.yml. Spring Boot's auto-configuration handles the rest.
6.1 OpenAI Configuration
# application.properties
spring.ai.openai.api-key=${OPENAI_API_KEY}
spring.ai.openai.chat.options.model=gpt-4
spring.ai.openai.chat.options.temperature=0.7
spring.ai.openai.chat.options.max-tokens=500
6.2 Anthropic Configuration
# application.properties
spring.ai.anthropic.api-key=${ANTHROPIC_API_KEY}
spring.ai.anthropic.chat.options.model=claude-3-opus-20240229
spring.ai.anthropic.chat.options.temperature=0.7
6.3 Ollama (Local Models) Configuration
# application.properties
spring.ai.ollama.base-url=http://localhost:11434
spring.ai.ollama.chat.options.model=llama2
6.4 Vector Store Configuration (PostgreSQL)
# application.properties
spring.datasource.url=jdbc:postgresql://localhost:5432/vectordb
spring.datasource.username=postgres
spring.datasource.password=password
spring.ai.vectorstore.pgvector.index-type=HNSW
spring.ai.vectorstore.pgvector.distance-type=COSINE_DISTANCE
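The pgvector extension (CREATE EXTENSION vector) must be installed in the target database before the store's table can be created. Newer Spring AI releases can also create the schema for you via a property (verify availability in your version):
spring.ai.vectorstore.pgvector.initialize-schema=true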
7. Real-World Examples
7.1 Example 1: Simple Chat Service
A basic service that uses ChatClient to generate text responses:
package com.example.ai.service;
import org.springframework.ai.chat.ChatClient;
import org.springframework.ai.chat.messages.SystemMessage;
import org.springframework.ai.chat.messages.UserMessage;
import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.stereotype.Service;
import java.util.List;
@Service
public class ChatService {
private final ChatClient chatClient;
public ChatService(ChatClient chatClient) {
this.chatClient = chatClient;
}
public String chat(String userMessage) {
return chatClient.call(userMessage);
}
public String chatWithSystemPrompt(String systemPrompt, String userMessage) {
Prompt prompt = new Prompt(List.of(
new SystemMessage(systemPrompt),
new UserMessage(userMessage)
));
return chatClient.call(prompt).getResult().getOutput().getContent();
}
}
7.2 Example 2: REST Controller for Chat
Expose the chat service via REST API:
package com.example.ai.controller;
import com.example.ai.service.ChatService;
import org.springframework.web.bind.annotation.*;
@RestController
@RequestMapping("/api/chat")
public class ChatController {
private final ChatService chatService;
public ChatController(ChatService chatService) {
this.chatService = chatService;
}
@PostMapping
public ChatResponse chat(@RequestBody ChatRequest request) {
String response = chatService.chat(request.message());
return new ChatResponse(response);
}
// DTOs
public record ChatRequest(String message) {}
public record ChatResponse(String response) {}
}
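With the application running locally (default port 8080 assumed), the endpoint can be exercised with curl:
# Send a chat message
curl -X POST http://localhost:8080/api/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "Hello!"}'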
7.3 Example 3: Document Embedding and Vector Store
Store documents as embeddings and perform semantic search:
package com.example.ai.service;
import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.stereotype.Service;
import java.util.List;
@Service
public class DocumentService {
// The store computes embeddings internally through its configured EmbeddingModel,
// so the service only needs the VectorStore itself.
private final VectorStore vectorStore;
public DocumentService(VectorStore vectorStore) {
this.vectorStore = vectorStore;
}
public void addDocument(String content, String metadata) {
Document document = new Document(content);
document.getMetadata().put("source", metadata);
vectorStore.add(List.of(document));
}
public List<Document> searchSimilar(String query, int topK) {
// SearchRequest construction differs across Spring AI releases (builder vs. static factory)
return vectorStore.similaritySearch(
SearchRequest.builder()
.withQuery(query)
.withTopK(topK)
.build()
);
}
}
7.4 Example 4: RAG Application
Complete RAG implementation that retrieves relevant context before generating responses:
package com.example.ai.service;
import org.springframework.ai.chat.ChatClient;
import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.ai.chat.prompt.PromptTemplate;
import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.stereotype.Service;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
@Service
public class RAGService {
private final ChatClient chatClient;
private final VectorStore vectorStore;
public RAGService(ChatClient chatClient, VectorStore vectorStore) {
this.chatClient = chatClient;
this.vectorStore = vectorStore;
}
public String ask(String question) {
// 1. Retrieve relevant documents
List<Document> relevantDocs = vectorStore.similaritySearch(
SearchRequest.builder()
.withQuery(question)
.withTopK(5)
.build()
);
// 2. Build context from retrieved documents
String context = relevantDocs.stream()
.map(Document::getContent)
.collect(Collectors.joining("\n\n"));
// 3. Create prompt with context and question
String promptTemplate = """
Answer the following question based on the provided context.
If the answer cannot be found in the context, say "I don't know."
Context:
{context}
Question: {question}
""";
PromptTemplate template = new PromptTemplate(promptTemplate);
Prompt prompt = template.create(Map.of(
"context", context,
"question", question
));
// 4. Generate response
return chatClient.call(prompt).getResult().getOutput().getContent();
}
}
7.5 Example 5: Streaming Chat Response
Stream responses in real-time for better user experience:
package com.example.ai.controller;
import org.springframework.ai.chat.StreamingChatClient;
import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.http.MediaType;
import org.springframework.web.bind.annotation.*;
import reactor.core.publisher.Flux;
@RestController
@RequestMapping("/api/chat")
public class StreamingChatController {
// Streaming lives on the StreamingChatClient interface in pre-1.0 releases;
// provider clients (e.g. OpenAI's) implement both ChatClient and StreamingChatClient.
private final StreamingChatClient chatClient;
public StreamingChatController(StreamingChatClient chatClient) {
this.chatClient = chatClient;
}
@PostMapping(value = "/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
public Flux<String> streamChat(@RequestBody ChatRequest request) {
// stream(Prompt) emits partial ChatResponse chunks as the model generates them
return chatClient.stream(new Prompt(request.message()))
.map(response -> response.getResult().getOutput().getContent());
}
public record ChatRequest(String message) {}
}
7.6 Example 6: Multi-Provider Setup
Use multiple AI providers in the same application. The qualifier values below must match the bean names your auto-configuration actually creates:
package com.example.ai.service;
import org.springframework.ai.chat.ChatModel;
import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.stereotype.Service;
@Service
public class MultiProviderService {
private final ChatModel openAiChatModel;
private final ChatModel anthropicChatModel;
public MultiProviderService(
@Qualifier("openAiChatModel") ChatModel openAiChatModel,
@Qualifier("anthropicChatModel") ChatModel anthropicChatModel) {
this.openAiChatModel = openAiChatModel;
this.anthropicChatModel = anthropicChatModel;
}
public String useOpenAI(String prompt) {
// call(Prompt) returns a ChatResponse wrapping the generated message
return openAiChatModel.call(new Prompt(prompt)).getResult().getOutput().getContent();
}
public String useAnthropic(String prompt) {
return anthropicChatModel.call(new Prompt(prompt)).getResult().getOutput().getContent();
}
}
8. Best Practices
8.1 Error Handling and Retries
Implement retry logic for transient failures:
@Service
public class ResilientChatService {
private final ChatClient chatClient;
private final RetryTemplate retryTemplate;
public ResilientChatService(ChatClient chatClient) {
this.chatClient = chatClient;
this.retryTemplate = RetryTemplate.builder()
.maxAttempts(3)
.exponentialBackoff(1000, 2, 10000)
.retryOn(IOException.class)
.build();
}
public String chatWithRetry(String message) {
return retryTemplate.execute(context -> {
return chatClient.call(message);
});
}
}
8.2 Rate Limiting
Use a client-side rate limiter to avoid exhausting provider quotas (the example below uses Guava's RateLimiter):
@Service
public class RateLimitedChatService {
private final ChatClient chatClient;
private final RateLimiter rateLimiter;
public RateLimitedChatService(ChatClient chatClient) {
this.chatClient = chatClient;
this.rateLimiter = RateLimiter.create(10.0); // Guava's RateLimiter: 10 requests per second
}
public String chat(String message) {
rateLimiter.acquire();
return chatClient.call(message);
}
}
8.3 Caching
Cache responses for repeated queries (requires @EnableCaching and a configured cache):
@Service
public class CachedChatService {
private final ChatClient chatClient;
public CachedChatService(ChatClient chatClient) {
this.chatClient = chatClient;
}
@Cacheable(value = "chatResponses", key = "#message")
public String chat(String message) {
return chatClient.call(message);
}
}
8.4 Prompt Engineering
- Use system prompts to define AI behavior and context
- Structure prompts with clear instructions and examples
- Use prompt templates for dynamic content
- Validate and sanitize user inputs before sending to models
- Test prompts with different models to ensure consistency
8.5 Observability
- Log prompts and responses (be careful with PII/sensitive data); a minimal logging/timing wrapper is sketched after this list
- Track token usage and costs
- Monitor latency and error rates
- Use distributed tracing for debugging
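A minimal logging and timing wrapper, assuming the pre-1.0 ChatClient used throughout this guide (the class name is illustrative; Logger/LoggerFactory are SLF4J):
@Service
public class LoggingChatService {
    private static final Logger log = LoggerFactory.getLogger(LoggingChatService.class); // org.slf4j
    private final ChatClient chatClient;
    public LoggingChatService(ChatClient chatClient) {
        this.chatClient = chatClient;
    }
    public String chat(String message) {
        long start = System.nanoTime();
        try {
            String response = chatClient.call(message);
            // Log sizes and latency rather than raw text, to keep PII out of the logs
            log.info("chat ok: promptChars={} responseChars={} latencyMs={}",
                message.length(), response.length(), (System.nanoTime() - start) / 1_000_000);
            return response;
        } catch (RuntimeException e) {
            log.error("chat failed after {} ms", (System.nanoTime() - start) / 1_000_000, e);
            throw e;
        }
    }
}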
9. Testing
Spring AI makes testing easy with mock implementations. The following example demonstrates testing with JUnit 5 and Mockito:
package com.example.ai.service;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.boot.test.mock.mockito.MockBean;
import org.springframework.ai.chat.ChatClient;
import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.ai.chat.ChatResponse;
import org.springframework.ai.chat.Generation;
import java.util.List;
import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.mockito.ArgumentMatchers.any;
import static org.mockito.Mockito.when;
@SpringBootTest
class ChatServiceTest {
@MockBean
private ChatClient chatClient;
@Autowired
private ChatService chatService;
@Test
void testChat() {
// Mock response
when(chatClient.call("Hello")).thenReturn("Hello, how can I help?");
// Test
String response = chatService.chat("Hello");
assertEquals("Hello, how can I help?", response);
}
@Test
void testChatWithSystemPrompt() {
// Test with system prompt
// ChatResponse wraps a list of Generations; constructor shapes vary across
// Spring AI versions, so adjust this to your release.
ChatResponse mockResponse = new ChatResponse(List.of(
new Generation("I'm a helpful assistant.")
));
when(chatClient.call(any(Prompt.class)))
.thenReturn(mockResponse);
String response = chatService.chatWithSystemPrompt(
"You are a helpful assistant.",
"Hello"
);
assertEquals("I'm a helpful assistant.", response);
}
}
Run tests using Gradle:
# Run all tests
./gradlew test
# Run specific test class
./gradlew test --tests ChatServiceTest
# Run with coverage
./gradlew test jacocoTestReport
10. Advanced Concepts
10.1 Token Limits
Tokens are pieces of text that LLMs process. They can be words, subwords, or characters depending on the tokenization method.
10.1.1 Token Limits by Model
- GPT-4: ~8,192 tokens for the base model (input + output combined); larger-context variants exist
- GPT-3.5: ~4,096 tokens for the base model; 16k variants exist
- text-embedding-ada-002: 8,191 tokens per input
10.1.2 Why Token Limits Matter
Problem: LLMs have maximum token limits. If your input exceeds this limit, you'll get an error or the model will truncate the input.
Solutions:
- Chunking: Split long documents into smaller pieces that fit within token limits (see the sketch after this list)
- Summarization: Summarize content before sending to the LLM
- Selective Context: Only send the most relevant parts of documents (this is what RAG does!)
- Streaming: For long outputs, use streaming to handle responses that exceed limits
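A naive character-window chunker with overlap, as referenced above (illustrative only; production systems usually split on token counts and sentence boundaries, as Spring AI's TokenTextSplitter does):
import java.util.ArrayList;
import java.util.List;
static List<String> chunk(String text, int chunkSize, int overlap) {
    List<String> chunks = new ArrayList<>();
    int step = chunkSize - overlap; // overlap must be smaller than chunkSize
    for (int start = 0; start < text.length(); start += step) {
        chunks.add(text.substring(start, Math.min(text.length(), start + chunkSize)));
    }
    return chunks;
}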
10.2 Temperature Parameter
Temperature controls the randomness and creativity in LLM responses. It's a parameter you can configure when making API calls.
10.2.1 Temperature Ranges
- Low (0.1-0.3): Deterministic, focused, consistent responses. Best for factual answers, code generation, or when you need reproducible results.
- Medium (0.7): Balanced creativity and consistency. Good default for most applications.
- High (0.9-1.0): Creative, varied, unpredictable. Best for creative writing, brainstorming, or when you want diverse responses.
10.2.2 Example
Prompt: "Complete: The sky is"
Temperature 0.1: "blue" (always the same, most likely answer)
Temperature 0.7: "blue", "cloudy", "clear" (varied but reasonable)
Temperature 1.0: "blue", "purple", "raining cats" (very creative, may be nonsensical)
In Spring AI, you configure temperature in your application properties:
spring.ai.openai.chat.options.temperature=0.7
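The property sets an application-wide default. For per-request control, options can be attached to the Prompt itself; a sketch using the OpenAI module's options class and the injected ChatClient from earlier examples (builder method names vary across releases):
// Options attached to the Prompt override the configured default for this call only
Prompt creative = new Prompt("Write a tagline for a coffee shop.",
    OpenAiChatOptions.builder().withTemperature(0.9f).build());
String tagline = chatClient.call(creative).getResult().getOutput().getContent();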
10.3 Streaming Responses
Streaming sends responses as they're generated, rather than waiting for the complete response before sending it to the client.
10.3.1 Benefits of Streaming
- Faster Perceived Response: Users see text immediately, improving perceived performance
- Better User Experience: Feels more interactive and responsive, similar to ChatGPT
- Lower Latency: Don't wait for complete response before starting to display content
- Handles Long Responses: Can handle responses that exceed token limits by streaming chunks
10.3.2 Implementation in Spring AI
Spring AI supports streaming through reactive streams (Flux):
Flux<ChatResponse> stream = chatClient.stream(prompt);
// Emits partial ChatResponse chunks as the model generates them
This is particularly useful for chat interfaces where users expect to see responses appear in real-time.
11. Production Considerations
11.1 Cost Management
- Monitor token usage and implement budgets
- Use smaller models for simple tasks
- Cache responses when appropriate
- Batch requests when possible
11.2 Security
- Store API keys securely (use environment variables or secret management)
- Validate and sanitize all inputs
- Implement rate limiting to prevent abuse
- Audit AI interactions for compliance
- Consider data privacy when sending sensitive information to external providers
11.3 Performance
- Use streaming for long responses
- Implement async processing for non-interactive tasks
- Optimize RAG retrieval with proper chunking and indexing
- Use connection pooling for vector databases