[2026 Deep Dive] Mastering Holistic Observability: Metrics, Logs, and Traces for Spring Boot 4.0 Microservices with OpenTelemetry, Prometheus, and Grafana
Author: Maria
The world of distributed systems, particularly microservices built with Spring Boot, Java 25, and Apache Kafka, introduces unparalleled scalability and resilience. However, this architectural elegance comes with a significant operational challenge: understanding what’s truly happening within your ecosystem. This is where Mastering Holistic Observability becomes not just a best practice, but an absolute necessity. As backend engineers, our ability to quickly diagnose issues, understand performance bottlenecks, and confidently deploy changes hinges on comprehensive visibility across our services.
While we’ve previously explored the intricacies of distributed tracing with OpenTelemetry, that's just one crucial pillar. A truly observable system demands a unified approach to metrics, structured logging, and tracing – the "Observability Triad." In this deep dive, we'll go beyond mere monitoring, delving into how to instrument your Spring Boot 4.0 applications, leverage the power of Java 25's advancements, and integrate with industry-standard tools like OpenTelemetry, Prometheus, and Grafana to build an unparalleled observability stack.
TL;DR
Holistic Observability unifies metrics, logs, and traces for deep system understanding. Leverage Spring Boot 4.0, Java 25, and OpenTelemetry for seamless instrumentation. Integrate with Prometheus for metrics, Grafana for visualization, and structured logging for comprehensive insights.
Why Holistic Observability is Non-Negotiable in Modern Microservices
In a complex microservice architecture, an isolated failure can have a cascading effect. A slow database query in one service might manifest as timeouts in another, leading to a degraded user experience across multiple features. Without proper observability, pinpointing the root cause becomes a frantic, time-consuming effort of sifting through disparate logs and dashboards.
Holistic observability moves beyond simply "seeing if something is up or down." It's about empowering engineers to ask arbitrary questions about the state of their system, understand the "why" behind performance anomalies, and gain a profound insight into every transaction's journey.
Let's break down the three pillars that form this robust foundation:
Metrics: The Numerical Heartbeat of Your System
Metrics are quantitative measurements of your system's behavior over time. They provide aggregatable data points, ideal for dashboards, alerts, and trending. Think request rates, error counts, CPU utilization, memory usage, latency percentiles, and database connection pool sizes.
Spring Boot, especially with its Actuator module, has always had excellent support for metrics via Micrometer. Spring Boot 4.0 continues this tradition, providing even more streamlined integration and default metrics that are invaluable.
Structured Logging: The Narrative of Events
Logs are discrete, timestamped events that describe what happened at a specific point in time within a service. For microservices, unstructured logs quickly become unmanageable. Structured logging, typically in JSON format, makes logs machine-readable and easily searchable, enabling powerful analysis with tools like Elasticsearch (ELK Stack) or Loki. They capture detailed contextual information, crucial for debugging specific requests or understanding application flow.
Distributed Tracing: The Journey's Map
As we covered previously, distributed tracing visualizes the end-to-end flow of a request across multiple services. It shows how different services interact, their individual latencies, and helps pinpoint bottlenecks in a multi-hop operation. OpenTelemetry has emerged as the de-facto standard for instrumenting tracing, providing a vendor-agnostic way to generate, emit, and collect trace data.
The true power of holistic observability emerges when these three pillars are interconnected, typically through a common identifier like a traceId or correlationId. This allows you to jump from a spike in a metric to the specific traces that contributed to it, and then dive into the detailed logs of those traces.
Spring Boot 4.0 and Java 25: A Powerful Observability Foundation
Spring Boot 4.0, coupled with Java 25, brings several enhancements that simplify and empower observability efforts.
Micrometer and Spring Boot Actuator in 4.0
Spring Boot 4.0 leverages Micrometer as its primary metrics facade. Actuator endpoints (/actuator/metrics, /actuator/prometheus) expose a wealth of application-specific and JVM-level metrics automatically. The key here is auto-configuration. Spring Boot 4.0 intelligently wires up Micrometer with common registries like Prometheus, allowing you to get rich metrics with minimal effort.
To enable basic metrics, simply include the spring-boot-starter-actuator dependency:
<!-- pom.xml -->
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
<groupId>io.micrometer</groupId>
<artifactId>micrometer-registry-prometheus</artifactId>
<scope>runtime</scope>
</dependency>
And in your application.yml:
# application.yml
management:
  endpoints:
    web:
      exposure:
        include: 'prometheus' # Expose the Prometheus endpoint
  metrics:
    tags:
      application: ${spring.application.name} # Add common tags
    enable:
      all: true # Enable all metrics by default
# management.endpoints.web.exposure.include: exposes /actuator/prometheus over HTTP
# management.metrics.tags.application: tags every metric with the application name
# management.metrics.enable.all: enables all meters
This simple setup exposes a /actuator/prometheus endpoint that Prometheus can scrape, providing instant visibility into:
- JVM metrics (memory, garbage collection, threads)
- HTTP request metrics (latency, count, errors for each endpoint)
- Spring-specific metrics (task scheduler, cache hit/miss)
- Data source metrics (connection pool usage)
- Kafka consumer/producer metrics (if spring-kafka is in use)
Java 25 and Virtual Threads Impact on Metrics and Logging
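Virtual threads (Project Loom), finalized in Java 21 and refined in later releases up to Java 25, change how concurrent code is written: you can keep a simple blocking, thread-per-request style and still scale to very large numbers of concurrent tasks. They also have implications for observability.
- Context Propagation: Both SLF4J's MDC and OpenTelemetry's Context are thread-local by default, and a virtual thread has its own thread-locals just like a platform thread. Context therefore stays intact while a request runs on a single virtual thread, but it is not copied automatically when work is handed to another thread or executor. The OpenTelemetry Java agent instruments common executors for this; otherwise you wrap the task yourself (for example with Context.current().wrap(runnable)). Keep your logging and tracing libraries up to date so they handle virtual threads correctly.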
- Resource Utilization: Virtual threads are cheap. Metrics like "thread count" become less meaningful for resource consumption. Instead, focus on CPU utilization, active requests, and queue lengths. The underlying platform threads (carrier threads) will still be important for understanding physical resource usage.
- Latency Measurement: Virtual threads reduce context switching overhead, potentially leading to lower perceived latency. Accurately measuring the actual work done, rather than just thread-related delays, becomes even more critical. Micrometer timers are excellent for this.
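The context propagation point above can be illustrated without any library. The following is a minimal sketch, assuming a hypothetical thread-local map that stands in for MDC; `ContextPropagationSketch` and its methods are illustrative names, not part of SLF4J or OpenTelemetry. It shows that a value set on the calling thread is invisible on a worker thread unless the task captures and restores it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical stand-in for a thread-local diagnostic context such as SLF4J's MDC. */
class ContextPropagationSketch {

    private static final ThreadLocal<Map<String, String>> CONTEXT =
            ThreadLocal.withInitial(HashMap::new);

    static void put(String key, String value) {
        CONTEXT.get().put(key, value);
    }

    static String get(String key) {
        return CONTEXT.get().get(key);
    }

    /** Snapshots the caller's context now and restores it around the task on the worker thread. */
    static <T> Callable<T> withCallerContext(Callable<T> task) {
        Map<String, String> captured = new HashMap<>(CONTEXT.get()); // taken on the calling thread
        return () -> {
            Map<String, String> previous = CONTEXT.get();
            CONTEXT.set(new HashMap<>(captured));
            try {
                return task.call();
            } finally {
                CONTEXT.set(previous); // do not leak context into the next task on this thread
            }
        };
    }

    /** Looks up "traceId" on another thread, with or without explicit propagation. */
    static String readTraceIdOnWorker(boolean propagate) {
        // Thread-locals behave the same on any executor; on Java 21+ this could be
        // Executors.newVirtualThreadPerTaskExecutor().
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Callable<String> lookup = () -> get("traceId");
            Callable<String> task = propagate ? withCallerContext(lookup) : lookup;
            return executor.submit(task).get();
        } catch (Exception e) {
            throw new IllegalStateException("worker task failed", e);
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        put("traceId", "abc123");
        System.out.println("without propagation: " + readTraceIdOnWorker(false)); // null
        System.out.println("with propagation:    " + readTraceIdOnWorker(true));  // abc123
    }
}
```

In a real service you would rely on the library equivalents rather than hand-rolling this: the snapshot-and-restore pattern is what executor instrumentation and task wrappers do on your behalf.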
Implementing the Observability Triad
Let's dive into practical implementation details for each pillar.
1. Mastering Metrics with Micrometer and Prometheus
While Actuator provides many out-of-the-box metrics, you'll often need custom ones to track business-specific logic.
Creating Custom Metrics
Micrometer offers various meter types:
- Counters: For incrementing values (e.g., api.calls.failed, order.processed.total).
- Gauges: For current values (e.g., active.users, queue.size). These are typically polled.
- Timers: For measuring durations and frequency (e.g., service.method.latency).
- Distribution Summaries: For tracking the distribution of events (e.g., payload sizes).
Example: Custom Counter for Failed Orders
// src/main/java/com/example/observability/OrderService.java
package com.example.observability;

import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Service;

import java.util.concurrent.ThreadLocalRandom;

@Service
public class OrderService {

    // Log through SLF4J so these events go through Logback and its encoders
    private static final Logger log = LoggerFactory.getLogger(OrderService.class);

    private final Counter failedOrderCounter;    // Counts orders that failed processing
    private final Counter processedOrderCounter; // Counts orders processed successfully
    private final Timer orderProcessingTimer;    // Measures how long order processing takes

    public OrderService(MeterRegistry meterRegistry) {
        this.failedOrderCounter = Counter.builder("order.processing.failed.total") // Metric name
                .description("Total number of orders that failed processing") // Description
                .tags("outcome", "failure", "service", "order-service") // Tags (key-value pairs)
                .register(meterRegistry); // Register with the meter registry
        this.processedOrderCounter = Counter.builder("order.processing.success.total")
                .description("Total number of orders successfully processed")
                .tags("outcome", "success", "service", "order-service")
                .register(meterRegistry);
        this.orderProcessingTimer = Timer.builder("order.processing.duration")
                .description("Duration of order processing operations")
                .tags("service", "order-service")
                .register(meterRegistry);
    }

    public boolean processOrder(String orderId) {
        return orderProcessingTimer.record(() -> { // Record how long the lambda takes
            try {
                // Simulate some complex order processing logic with a random delay
                Thread.sleep(ThreadLocalRandom.current().nextInt(50, 500));
                if (orderId.startsWith("FAIL")) {
                    failedOrderCounter.increment(); // Increment the failure counter
                    log.warn("Order processing failed for: {}", orderId);
                    return false;
                } else {
                    processedOrderCounter.increment(); // Increment the success counter
                    log.info("Order processing succeeded for: {}", orderId);
                    return true;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // Restore the interrupt flag
                failedOrderCounter.increment(); // Count the interruption as a failure
                log.error("Order processing interrupted for: {}", orderId);
                return false;
            }
        });
    }
}
This service now exposes order_processing_failed_total{outcome="failure",service="order-service"} and order_processing_success_total{outcome="success",service="order-service"} metrics, which Prometheus can scrape.
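The dotted meter names in the service do not appear verbatim on the scrape endpoint. As a rough illustration of the renaming, here is a simplified sketch; `PrometheusNameSketch` is a hypothetical helper that only approximates the convention (dots become underscores, counters end in `_total`, timers are reported in seconds) and is not Micrometer's actual implementation.

```java
/** Simplified, hypothetical approximation of how dotted meter names show up in Prometheus. */
class PrometheusNameSketch {

    enum MeterType { COUNTER, TIMER, GAUGE }

    static String toPrometheusName(String meterName, MeterType type) {
        // Characters that Prometheus metric names do not allow (such as dots) become underscores.
        String name = meterName.replaceAll("[^a-zA-Z0-9_:]", "_");
        switch (type) {
            case COUNTER:
                // Counters carry a _total suffix; avoid adding it twice.
                return name.endsWith("_total") ? name : name + "_total";
            case TIMER:
                // Timers are exported in base units of seconds.
                return name.endsWith("_seconds") ? name : name + "_seconds";
            default:
                return name;
        }
    }

    public static void main(String[] args) {
        System.out.println(toPrometheusName("order.processing.failed.total", MeterType.COUNTER));
        System.out.println(toPrometheusName("order.processing.duration", MeterType.TIMER));
    }
}
```

For the timer, expect a family of series built on that base name, such as order_processing_duration_seconds_count, order_processing_duration_seconds_sum, and order_processing_duration_seconds_max.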
2. Structured Logging for Enhanced Debugging
Structured logging converts your log messages into a machine-readable format, typically JSON. This makes parsing, filtering, and querying logs in tools like Kibana or Grafana Loki incredibly efficient.
Configuring Logback for JSON Output
First, add a JSON layout encoder to your pom.xml. Logstash Logback Encoder is a popular choice:
<!-- pom.xml -->
<dependency>
<groupId>net.logstash.logback</groupId>
<artifactId>logstash-logback-encoder</artifactId>
<version>7.4</version> <!-- Use the latest compatible version -->
</dependency>
Then, configure logback-spring.xml (or logback.xml) in src/main/resources:
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <!-- defaults.xml only defines shared properties. base.xml would also attach the plain-text
         CONSOLE appender to the root logger, so every event would be printed twice. -->
    <include resource="org/springframework/boot/logging/logback/defaults.xml"/>
    <!-- Expose Spring Environment properties to Logback -->
    <springProperty name="appName" source="spring.application.name"/>
    <springProperty name="environment" source="spring.profiles.active" defaultValue="default"/>
    <appender name="CONSOLE_JSON" class="ch.qos.logback.core.ConsoleAppender">
        <encoder class="net.logstash.logback.encoder.LogstashEncoder">
            <fieldNames>
                <timestamp>timestamp</timestamp>
                <level>level</level>
                <thread>thread</thread>
                <message>message</message>
                <logger>logger</logger>
                <stackTrace>stacktrace</stackTrace>
            </fieldNames>
            <customFields>{"app_name": "${appName}", "environment": "${environment}"}</customFields>
        </encoder>
    </appender>
    <root level="INFO">
        <appender-ref ref="CONSOLE_JSON"/>
    </root>
    <logger name="com.example" level="DEBUG"/> <!-- Set specific package logging level -->
</configuration>
<!-- CONSOLE_JSON: console appender that writes JSON through LogstashEncoder -->
<!-- fieldNames: renames the default JSON fields; customFields: adds app name and environment -->
<!-- root level INFO is the default; com.example is raised to DEBUG -->
Now, your logs will look like this:
{"timestamp":"2026-07-01T10:30:00.123+0000","level":"INFO","thread":"http-nio-8080-exec-1","logger":"com.example.observability.OrderService","app_name":"my-order-service","environment":"dev","message":"Order processing succeeded for: ORDER-123"}
Correlating Logs with Traces (MDC and OpenTelemetry)
The true power of holistic observability comes from correlating logs with their corresponding traces. OpenTelemetry's context propagation ensures that a traceId and spanId are available in the current context. We can push these into Logback's MDC (Mapped Diagnostic Context) so they appear in every log line.
Spring Boot 4.0, when integrated with OpenTelemetry, can often automatically inject trace IDs into MDC. If not, or for finer control, you can do it manually:
// src/main/java/com/example/observability/TraceIdInjectorFilter.java
package com.example.observability;

import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.SpanContext;
import org.slf4j.MDC;
import org.springframework.core.annotation.Order;
import org.springframework.stereotype.Component;
import org.springframework.web.filter.OncePerRequestFilter;

import jakarta.servlet.FilterChain;
import jakarta.servlet.ServletException;
import jakarta.servlet.http.HttpServletRequest;
import jakarta.servlet.http.HttpServletResponse;
import java.io.IOException;

@Component
@Order(1) // Ensure this filter runs early
public class TraceIdInjectorFilter extends OncePerRequestFilter {

    public static final String TRACE_ID_KEY = "traceId"; // MDC key for the trace ID
    public static final String SPAN_ID_KEY = "spanId";   // MDC key for the span ID

    @Override
    protected void doFilterInternal(HttpServletRequest request, HttpServletResponse response, FilterChain filterChain)
            throws ServletException, IOException {
        try {
            SpanContext spanContext = Span.current().getSpanContext(); // Current span context
            if (spanContext.isValid()) { // Only when a real span is active
                MDC.put(TRACE_ID_KEY, spanContext.getTraceId()); // Add the trace ID to MDC
                MDC.put(SPAN_ID_KEY, spanContext.getSpanId());   // Add the span ID to MDC
            }
            filterChain.doFilter(request, response); // Continue the filter chain
        } finally {
            MDC.remove(TRACE_ID_KEY); // Clear the trace ID after the request
            MDC.remove(SPAN_ID_KEY);  // Clear the span ID after the request
        }
    }
}
// The MDC.remove calls in the finally block are required: without them the IDs leak into
// the next request handled by the same thread.
LogstashEncoder writes every MDC entry as a top-level JSON field by default, so traceId and spanId show up as soon as they are in MDC. To restrict the output to just those keys, list them explicitly in logback-spring.xml:
<appender name="CONSOLE_JSON" class="ch.qos.logback.core.ConsoleAppender">
    <encoder class="net.logstash.logback.encoder.LogstashEncoder">
        <!-- ... existing fieldNames and customFields ... -->
        <includeMdcKeyName>traceId</includeMdcKeyName> <!-- Emit only these MDC keys -->
        <includeMdcKeyName>spanId</includeMdcKeyName>
    </encoder>
</appender>
<!-- includeMdcKeyName: limits MDC output to the listed keys (trace ID, span ID) -->
Now, every log line will include traceId and spanId fields, allowing you to link directly from your log management system to your tracing UI.
3. Deepening Distributed Tracing with OpenTelemetry
Building upon our previous guide, let's explore advanced OpenTelemetry aspects.
OpenTelemetry Agent vs. Manual Instrumentation
- JavaAgent: The easiest way to get started. Download the OpenTelemetry Java Agent, attach it to your JVM (-javaagent:/path/to/opentelemetry-javaagent.jar), and configure it via environment variables. It automatically instruments popular libraries (Spring, JPA, Kafka clients, HTTP clients).
  - Pros: Minimal code changes, broad coverage.
  - Cons: Less granular control, can sometimes have versioning conflicts.
- Manual Instrumentation: Directly use the OpenTelemetry SDK in your code.
  - Pros: Fine-grained control over span names, attributes, and custom spans for specific business logic.
  - Cons: Requires more code, potential for missing instrumentation points if not diligent.
For a comprehensive observability strategy, a hybrid approach is often best: use the agent for baseline coverage and add manual instrumentation for critical business transactions where precise tracing is needed.
Example: Custom Span for Specific Business Logic
// src/main/java/com/example/observability/InventoryService.java
package com.example.observability;

import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.context.Scope;
import org.springframework.stereotype.Service;

import java.util.concurrent.ThreadLocalRandom;

@Service
public class InventoryService {

    private final Tracer tracer; // OpenTelemetry Tracer instance

    public InventoryService(Tracer tracer) {
        this.tracer = tracer; // Injected Tracer
    }

    public boolean checkAndUpdateStock(String productId, int quantity) {
        // Create a custom span for this specific operation so it is visible inside the trace
        Span span = tracer.spanBuilder("InventoryService.checkAndUpdateStock") // Span name
                .setAttribute("product.id", productId) // Custom attribute
                .setAttribute("quantity", quantity)    // Quantity attribute
                .startSpan();                          // Start the span
        try (Scope scope = span.makeCurrent()) { // Make this span the current context
            // Simulate complex inventory logic with a random delay
            Thread.sleep(ThreadLocalRandom.current().nextInt(20, 150));
            if (productId.contains("OUT_OF_STOCK")) {
                span.setAttribute("stock.available", false); // Mark as out of stock
                span.recordException(new RuntimeException("Product out of stock")); // Record the exception
                return false;
            } else {
                span.setAttribute("stock.available", true); // Mark as in stock
                return true;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // Restore the interrupt flag
            span.recordException(e); // Record the exception
            return false;
        } finally {
            span.end(); // Always end the span, otherwise it is never exported
        }
    }
}
// spanBuilder(...) creates the custom span, setAttribute(...) attaches business attributes,
// and span.end() must always be called.
This allows you to zoom into the exact latency and attributes of the checkAndUpdateStock operation within a larger trace, even if it’s nested deep within other service calls.
The Observability Stack: Prometheus, Grafana, and Loki
To make sense of all this data, you need a robust backend.
Prometheus: The Metrics Scraper
Prometheus is a powerful open-source monitoring system with a time-series database. It "scrapes" metrics endpoints (like /actuator/prometheus) from your services at regular intervals.
Docker Compose for Prometheus:
# docker-compose.yml for monitoring stack
version: '3.8'
services:
  prometheus:
    image: prom/prometheus:v2.51.0 # Use a current stable version
    container_name: prometheus
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro # Mount the config file
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--web.console.libraries=/usr/share/prometheus/console_libraries'
      - '--web.console.templates=/usr/share/prometheus/consoles'
    networks:
      - observability-net
  grafana:
    image: grafana/grafana:10.4.2 # Use a current stable version
    container_name: grafana
    ports:
      - "3000:3000"
    volumes:
      - grafana-storage:/var/lib/grafana # Persist Grafana data
      - ./grafana/provisioning:/etc/grafana/provisioning # Provisioning config
    environment:
      - GF_SECURITY_ADMIN_USER=admin # Admin username
      - GF_SECURITY_ADMIN_PASSWORD=admin # Admin password (development only)
    depends_on:
      - prometheus
      - jaeger-all-in-one # Assuming Jaeger for traces (or Tempo)
    networks:
      - observability-net
  jaeger-all-in-one: # For Distributed Tracing (OpenTelemetry compatible)
    image: jaegertracing/all-in-one:1.56 # Jaeger all-in-one image
    container_name: jaeger
    ports:
      - "6831:6831/udp" # UDP sender
      - "16686:16686" # UI
      - "14268:14268" # Jaeger Thrift over HTTP collector (not OTLP)
      - "4317:4317" # OTLP/gRPC receiver
      - "4318:4318" # OTLP/HTTP receiver
    environment:
      - COLLECTOR_OTLP_ENABLED=true # Enable the OTLP receiver
    networks:
      - observability-net
  loki: # For Structured Logging
    image: grafana/loki:2.9.7 # Loki image
    container_name: loki
    ports:
      - "3100:3100"
    command: -config.file=/etc/loki/local-config.yaml # Config file to load
    volumes:
      - ./loki/local-config.yaml:/etc/loki/local-config.yaml # Mount the Loki config
    networks:
      - observability-net
  promtail: # Log collector for Loki
    image: grafana/promtail:2.9.7 # Promtail image
    container_name: promtail
    volumes:
      - /var/log:/var/log # Mount host log directory
      - /var/lib/docker/containers:/var/lib/docker/containers:ro # Mount Docker container logs
      - ./promtail/config.yml:/etc/promtail/config.yml # Mount the Promtail config
    command: -config.file=/etc/promtail/config.yml # Config file to load
    networks:
      - observability-net
    depends_on:
      - loki
networks:
  observability-net:
    driver: bridge
volumes:
  grafana-storage:
prometheus.yml:
# prometheus.yml
global:
  scrape_interval: 15s # How often targets are scraped
  evaluation_interval: 15s
scrape_configs:
  - job_name: 'spring-boot-apps' # Name of the scrape job
    metrics_path: '/actuator/prometheus' # Path of the metrics endpoint
    # Replace with actual service discovery in production (e.g., Kubernetes, Consul)
    static_configs:
      - targets: ['host.docker.internal:8080'] # Your Spring Boot app's host:port
        labels:
          application: my-order-service # Extra label attached to every scraped series
Grafana: The Visualization Maestro
Grafana is an open-source analytics and monitoring solution that allows you to query, visualize, alert on, and understand your metrics and logs. It integrates seamlessly with Prometheus (for metrics), Loki (for logs), and Jaeger (for traces).
Setting up Grafana Dashboards:
- Add Data Sources:
  - Prometheus: Connect to http://prometheus:9090.
  - Loki: Connect to http://loki:3100.
  - Jaeger: Connect to http://jaeger:16686.
- Import Dashboards: You can import pre-built Spring Boot dashboards (e.g., from Grafana Labs) or create custom ones.
- Create Custom Dashboards: Use PromQL (Prometheus Query Language) for metrics, LogQL (Loki Query Language) for logs, and explore trace data.
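A typical first panel for the order counters is a failure ratio, which in PromQL could be written as sum(rate(order_processing_failed_total[5m])) / (sum(rate(order_processing_failed_total[5m])) + sum(rate(order_processing_success_total[5m]))). To make the arithmetic behind such a query concrete, here is a simplified sketch in plain Java. `CounterRateSketch` is a hypothetical helper, and it deliberately ignores the window extrapolation that Prometheus applies in its real rate() function.

```java
/** Hypothetical, simplified illustration of rate-style arithmetic on counter samples. */
class CounterRateSketch {

    /** Per-second increase between two samples of a monotonically increasing counter. */
    static double ratePerSecond(double earlierValue, double laterValue, double elapsedSeconds) {
        if (elapsedSeconds <= 0) {
            throw new IllegalArgumentException("elapsedSeconds must be positive");
        }
        // A counter only goes down when the process restarted; count from zero in that case.
        double increase = laterValue >= earlierValue ? laterValue - earlierValue : laterValue;
        return increase / elapsedSeconds;
    }

    /** Share of failures among all processed orders, given the two per-second rates. */
    static double failureRatio(double failedRate, double successRate) {
        double total = failedRate + successRate;
        return total == 0 ? 0.0 : failedRate / total;
    }

    public static void main(String[] args) {
        // Two scrapes taken 60 seconds apart (made-up sample values).
        double failedRate = ratePerSecond(10, 16, 60);
        double successRate = ratePerSecond(100, 154, 60);
        System.out.printf("failed/s=%.3f success/s=%.3f ratio=%.3f%n",
                failedRate, successRate, failureRatio(failedRate, successRate));
    }
}
```

Working from rates rather than raw counter values is what keeps the panel meaningful across application restarts, since a restarted service begins counting from zero again.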
Grafana Configuration via Provisioning: You can pre-configure data sources and dashboards using Grafana's provisioning feature. Create ./grafana/provisioning/datasources/datasources.yml and ./grafana/provisioning/dashboards/dashboards.yml.
./grafana/provisioning/datasources/datasources.yml:
# ./grafana/provisioning/datasources/datasources.yml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    url: http://prometheus:9090
    access: proxy
    isDefault: true
    version: 1
    editable: true
  - name: Loki
    type: loki
    url: http://loki:3100
    access: proxy
    version: 1
    editable: true
  - name: Jaeger
    type: jaeger
    url: http://jaeger:16686 # Grafana only reads traces through Jaeger's query API
    access: proxy
    version: 1
    editable: true
Loki & Promtail: Log Aggregation for Grafana
Loki is a horizontally scalable, highly available, multi-tenant log aggregation system inspired by Prometheus. It indexes metadata (labels) instead of full log content, making it very efficient. Promtail is the agent that ships logs from your application hosts to Loki.
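Because Loki indexes labels rather than log content, every distinct combination of label values becomes its own stream, and the choice of labels drives index size. The sketch below illustrates that effect with made-up data; `LabelCardinalitySketch` and its sample records are hypothetical and only count label combinations, they do not talk to Loki.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical illustration of how label choice multiplies the number of log streams. */
class LabelCardinalitySketch {

    /** Counts distinct label sets (one stream each) when only the given keys are used as labels. */
    static int countStreams(List<Map<String, String>> logFields, Set<String> labelKeys) {
        Set<Map<String, String>> streams = new HashSet<>();
        for (Map<String, String> fields : logFields) {
            Map<String, String> labels = new TreeMap<>();
            for (String key : labelKeys) {
                if (fields.containsKey(key)) {
                    labels.put(key, fields.get(key));
                }
            }
            streams.add(labels);
        }
        return streams.size();
    }

    /** Builds made-up log records: one per request, each with its own trace ID. */
    static List<Map<String, String>> sampleRequests(int count) {
        List<Map<String, String>> logs = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            logs.add(Map.of(
                    "app_name", "my-order-service",
                    "level", i % 10 == 0 ? "WARN" : "INFO",
                    "traceId", "trace-" + i));
        }
        return logs;
    }

    public static void main(String[] args) {
        List<Map<String, String>> logs = sampleRequests(1000);
        System.out.println("labels app_name+level:         "
                + countStreams(logs, Set.of("app_name", "level")) + " streams");
        System.out.println("labels app_name+level+traceId: "
                + countStreams(logs, Set.of("app_name", "level", "traceId")) + " streams");
    }
}
```

With per-request identifiers as labels the stream count grows with traffic, which is why such fields are usually left in the log line and filtered at query time instead.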
./loki/local-config.yaml:
# ./loki/local-config.yaml
auth_enabled: false
server:
  http_listen_port: 3100
ingester:
  lifecycler:
    address: 127.0.0.1
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1
    final_sleep: 0s
  chunk_idle_period: 5m
  max_transfer_retries: 0
schema_config:
  configs:
    - from: 2020-10-27
      store: boltdb-shipper
      object_store: filesystem
      schema: v11
      index:
        prefix: index_ # Table name prefix, as in Loki's sample configuration
        period: 24h
storage_config:
  boltdb_shipper:
    active_index_directory: /tmp/loki/boltdb-shipper-active
    cache_location: /tmp/loki/boltdb-shipper-cache
    cache_ttl: 24h
    shared_store: filesystem
  filesystem:
    directory: /tmp/loki/chunks
compactor:
  working_directory: /tmp/loki/compactor
  shared_store: filesystem
  retention_enabled: true # Let the compactor enforce the retention period below
limits_config:
  retention_period: 720h # 30 days retention
./promtail/config.yml:
# ./promtail/config.yml
server:
  http_listen_port: 9080
  grpc_listen_port: 0
positions:
  filename: /tmp/positions.yaml # Where Promtail records how far it has read
clients:
  - url: http://loki:3100/loki/api/v1/push # Loki push endpoint
scrape_configs:
  - job_name: system # Host system logs
    static_configs:
      - targets:
          - localhost
        labels:
          job: system-logs # Job label
          __path__: /var/log/*log # Log file path
  - job_name: docker # Docker container logs
    static_configs:
      - targets:
          - localhost
        labels:
          job: docker-logs # Job label
          __path__: /var/lib/docker/containers/*/*log # Docker container log path
    pipeline_stages:
      - docker: {} # Unwrap Docker's JSON log envelope
      - json:
          expressions:
            level: level
            logger: logger
            app_name: app_name
            environment: environment
      - labels: # Only low-cardinality fields become labels
          level:
          logger:
          app_name:
          environment:
      # traceId and spanId are unique per request, so they stay in the log line instead of
      # becoming labels. Query them with LogQL, e.g. {app_name="my-order-service"} | json | traceId="<id>"
Multi-OS Mapping Table for Observability Stack Management
| Feature / OS | Windows (WSL2/Docker Desktop) | macOS (Docker Desktop) | Linux (Docker/Native) |
|---|---|---|---|
| Start Stack | docker-compose up -d (from project root) | docker-compose up -d (from project root) | docker-compose up -d (from project root) |
| Stop Stack | docker-compose down | docker-compose down | docker-compose down |
| View Prometheus | http://localhost:9090 | http://localhost:9090 | http://localhost:9090 |
| View Grafana | http://localhost:3000 (admin/admin) | http://localhost:3000 (admin/admin) | http://localhost:3000 (admin/admin) |
| View Jaeger UI | http://localhost:16686 | http://localhost:16686 | http://localhost:16686 |
| App Scrape Target | host.docker.internal:8080 (for app on host) | host.docker.internal:8080 (for app on host) | 172.17.0.1:8080 (or host IP if app on host) |
| OpenTelemetry Env Var | SET OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318 | export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318 | export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318 |
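Note on host.docker.internal: This special DNS name resolves to the host machine from within a Docker container. If your Spring Boot app is running directly on the host (not in a Docker container itself), this is how the Prometheus container reaches it to scrape metrics. Docker Desktop (Windows/macOS) provides the name out of the box; on Linux, either use the host's IP as shown in the table or map the name yourself with extra_hosts: "host.docker.internal:host-gateway" in docker-compose.yml. Traces flow in the opposite direction: the app pushes them to the OTLP port that the tracing backend publishes on localhost.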
Troubleshooting / What if it doesn't work?
Even with the best intentions, observability setups can be tricky. Here are common issues and their solutions:
"My Prometheus endpoint (/actuator/prometheus) is empty or missing!"
- Check dependencies: Ensure spring-boot-starter-actuator and micrometer-registry-prometheus are in your pom.xml.
- Check configuration: Verify management.endpoints.web.exposure.include: 'prometheus' is correctly set in application.yml.
- Security: If you use Spring Security, ensure /actuator/prometheus is accessible. You might need to permit it explicitly.
"Prometheus isn't scraping my app's metrics!"
- Target IP/Hostname: Double-check the targets in prometheus.yml. If your Spring Boot app is running on your host machine and Prometheus is in Docker, use host.docker.internal:8080 (Windows/macOS) or the host machine's actual IP address (Linux).
- Firewall: Ensure your host firewall isn't blocking incoming connections to your Spring Boot app's port (e.g., 8080) from the Docker bridge network.
- Prometheus UI: Check the "Status -> Targets" page in the Prometheus UI (http://localhost:9090/targets) to see if your target is showing up and if there are any error messages.
"My logs aren't in JSON format or missing correlation IDs!"
- Logback XML: Verify logback-spring.xml is correctly configured with LogstashEncoder and that MDC entries such as traceId/spanId are not excluded from its output.
- MDC Integration: Ensure the TraceIdInjectorFilter (or similar mechanism) is correctly enabled and running for every request, populating MDC before log statements.
- OpenTelemetry Agent: If using the agent, ensure it's properly attached and configured. It often handles MDC injection automatically for supported logging frameworks; note that it uses the MDC keys trace_id and span_id rather than traceId and spanId.
"Traces aren't showing up in Jaeger/Tempo!"
- OpenTelemetry Exporter Endpoint: Check the OTEL_EXPORTER_OTLP_ENDPOINT environment variable in your Spring Boot application. It should point to the OpenTelemetry Collector or directly to Jaeger's OTLP receiver: port 4318 for OTLP/HTTP or 4317 for OTLP/gRPC. Use http://localhost:<port> when the app runs on the host and http://jaeger:<port> when it runs inside the same Docker network, and make sure the Jaeger container actually publishes those ports. Port 14268 is Jaeger's native Thrift-over-HTTP collector endpoint and does not accept OTLP.
- Agent/SDK: Confirm your application is actually sending traces. Check your application logs for any OpenTelemetry-related errors.
- Service Name: Ensure OTEL_SERVICE_NAME is set, as this is how services are identified in the tracing UI.
"Grafana dashboards are empty or showing 'No Data'!"
- Data Source Configuration: Verify your Prometheus, Loki, and Jaeger data sources in Grafana are correctly configured (URL, access mode). Test the connection.
- Query Language: Double-check your PromQL/LogQL queries. Simple typos can lead to no results. Use the "Explore" feature in Grafana to build and test queries.
- Time Range: Ensure the time range selected in Grafana covers the period when data was being generated.
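When the metrics path is in doubt, it can help to check the scrape endpoint directly before looking at Prometheus or Grafana. The following is a small sketch of such a probe using only the JDK; `ScrapeCheckSketch`, the default URL, and the choice of jvm_memory_used_bytes as the metric to look for are assumptions for illustration.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical troubleshooting probe for a Prometheus text-format endpoint. */
class ScrapeCheckSketch {

    /** Extracts metric names from Prometheus text exposition output, ignoring comments and labels. */
    static Set<String> metricNames(String body) {
        Set<String> names = new TreeSet<>();
        for (String line : body.split("\n")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue; // skip blank lines and HELP/TYPE comments
            }
            int end = 0;
            while (end < trimmed.length()
                    && trimmed.charAt(end) != '{' && trimmed.charAt(end) != ' ') {
                end++; // the name ends at the label block or at the value separator
            }
            names.add(trimmed.substring(0, end));
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        // Assumed default; pass another URL as the first argument if your app differs.
        String url = args.length > 0 ? args[0] : "http://localhost:8080/actuator/prometheus";
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<String> response =
                HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        Set<String> names = metricNames(response.body());
        System.out.println("HTTP " + response.statusCode() + ", " + names.size() + " metric names");
        System.out.println("jvm_memory_used_bytes present: " + names.contains("jvm_memory_used_bytes"));
    }
}
```

A 404 or 401 here points at the exposure or security configuration, while a 200 with the expected names means the problem is further downstream, in the scrape target or the dashboard query.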
The Future of Observability with Java 25 and Spring Boot 4.0
Java 25 and Spring Boot 4.0 continue to change how backend services are built and operated. Virtual threads, available as a final feature since Java 21 and more mature in Java 25, significantly alter how we perceive and measure concurrency. Observability tools must adapt, and OpenTelemetry and Micrometer are at the forefront of this evolution.
- Reduced Overhead: Virtual threads reduce the overhead of context switching, meaning your applications can handle more concurrent tasks with fewer physical threads. This impacts how you interpret traditional thread-based metrics.
- Enhanced Debugging: The ability to easily trace a single logical flow across multiple virtual threads simplifies debugging complex asynchronous operations, a key benefit of structured concurrency and distributed tracing.
- Performance Monitoring: Focus shifts from "how many threads" to "how much CPU utilization per logical task" or "how many active virtual threads are performing meaningful work."
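One way to act on the shift from thread counts to logical work is to track tasks in flight yourself. Below is a minimal sketch of such a tracker using only the JDK; `InFlightTrackerSketch` is a hypothetical class, not something Spring Boot or Micrometer provides.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical tracker for logical tasks in flight, independent of how many threads exist. */
class InFlightTrackerSketch {

    private final AtomicInteger inFlight = new AtomicInteger();
    private final AtomicInteger peak = new AtomicInteger();

    /** Runs the task and counts it as in flight for exactly as long as it executes. */
    <T> T track(Supplier<T> task) {
        int now = inFlight.incrementAndGet();
        peak.accumulateAndGet(now, Math::max); // remember the highest concurrency seen
        try {
            return task.get();
        } finally {
            inFlight.decrementAndGet(); // also runs when the task throws
        }
    }

    int inFlight() {
        return inFlight.get();
    }

    int peak() {
        return peak.get();
    }

    public static void main(String[] args) {
        InFlightTrackerSketch tracker = new InFlightTrackerSketch();
        int during = tracker.track(tracker::inFlight);
        System.out.println("in flight during task: " + during);
        System.out.println("in flight afterwards:  " + tracker.inFlight());
        System.out.println("peak:                  " + tracker.peak());
    }
}
```

With Micrometer on the classpath, a value like this could be exposed as a gauge, for example Gauge.builder("orders.inflight", tracker, InFlightTrackerSketch::inFlight).register(meterRegistry), where the metric name is only an illustrative choice.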
As you scale your microservices, investing in a robust, holistic observability strategy is paramount. It’s the difference between reactive firefighting and proactive problem-solving, between opaque systems and transparent operations. Embrace the Observability Triad, configure your Spring Boot 4.0 applications meticulously, and leverage the power of OpenTelemetry, Prometheus, Grafana, and Loki to gain unparalleled insights into your distributed backend.