Observability Basics: Traces, Metrics, Logs — Practical Guide (Feb 21, 2026)
Level: Intermediate
Date: February 21, 2026
Prerequisites
To get the most out of this article, you should be familiar with fundamental software engineering concepts and have some experience with distributed systems or service-oriented architectures. A basic understanding of monitoring systems, logging libraries, and data collection agents is helpful but not mandatory.
This guide primarily references widely used observability concepts supported across platforms like OpenTelemetry (1.8+ stable as of 2026), Prometheus (v2.40+), and popular logging frameworks such as Logback and Fluentd. Some examples span multiple languages where relevant.
Hands-on Steps
Understanding the Three Pillars
Observability in modern applications centres on three data types:
- Traces: Records of requests or operations as they flow through distributed systems.
- Metrics: Numeric measurements collected over time (e.g., CPU usage, request latency).
- Logs: Timestamped event records that capture context or state information about operations.
Each plays a distinct role, and combined, they provide comprehensive insight into system health and behaviour.
Instrumenting Your Application
For this example, we’ll outline how to instrument a simple HTTP microservice using OpenTelemetry in a Java environment. The setup covers capturing traces, exposing metrics compatible with Prometheus, and structured logging.
1. Instrumenting Traces
Using the OpenTelemetry Java SDK (stable since 1.8.0), you can create spans around incoming HTTP requests to track their journey through your service.
import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.StatusCode;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.context.Scope;

// Create tracer (instrumentation scope name and version)
Tracer tracer = GlobalOpenTelemetry.getTracer("com.example.service", "1.0.0");

// In HTTP handler
Span span = tracer.spanBuilder("handleRequest").startSpan();
try (Scope scope = span.makeCurrent()) {
    // Business logic here
    doBusinessLogic();
    span.setStatus(StatusCode.OK);
} catch (Exception e) {
    span.setStatus(StatusCode.ERROR, "Exception occurred");
    span.recordException(e);
    throw e;
} finally {
    span.end();
}
2. Collecting Metrics with Prometheus
Prometheus uses a pull model, where your application exposes metrics via an HTTP endpoint. OpenTelemetry’s metrics SDK supports this, or you can use Prometheus client libraries directly.
Example with Prometheus Java client:
import io.prometheus.client.Counter;
import io.prometheus.client.exporter.HTTPServer;

public class MetricsExample {
    static final Counter requestsTotal = Counter.build()
        .name("http_requests_total").help("Total HTTP requests").register();

    public static void main(String[] args) throws Exception {
        HTTPServer server = new HTTPServer(1234); // exposes /metrics on port 1234
        // Each time a request is handled:
        requestsTotal.inc();
    }
}
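To make the pull model concrete, here is a dependency-free sketch of the text exposition format Prometheus scrapes from /metrics. The class and method names are hypothetical; the client library above generates this output for you.

```java
import java.util.concurrent.atomic.LongAdder;

public class MetricsSketch {
    // Thread-safe counter backing the metric
    static final LongAdder requestsTotal = new LongAdder();

    // Render one counter in Prometheus text exposition format
    // (what a scrape of /metrics returns for this metric).
    static String render(String name, String help, double value) {
        return "# HELP " + name + " " + help + "\n"
             + "# TYPE " + name + " counter\n"
             + name + " " + value + "\n";
    }

    public static void main(String[] args) {
        requestsTotal.increment();
        requestsTotal.increment();
        System.out.print(render("http_requests_total",
                "Total HTTP requests", requestsTotal.doubleValue()));
    }
}
```

Seeing the raw format explains why counters must only go up: Prometheus computes rates from successive scraped values, so a reset or decrement would corrupt derived queries.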
3. Structured Logging
Structured logs include fields such as trace IDs to correlate logs with traces. Using SLF4J with Logback and MDC (Mapped Diagnostic Context) is a common pattern:
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.slf4j.MDC;

Logger logger = LoggerFactory.getLogger(MyService.class);

Span span = tracer.spanBuilder("processing").startSpan();
try (Scope scope = span.makeCurrent()) {
    MDC.put("traceId", span.getSpanContext().getTraceId());
    logger.info("Processing started");
    // business logic
    logger.info("Processing completed successfully");
} finally {
    MDC.clear();
    span.end();
}
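For the traceId to actually appear in log output, the Logback encoder pattern must reference the MDC key via %X. A minimal sketch of an appender configuration (the exact pattern is illustrative):

```xml
<appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
  <encoder>
    <!-- %X{traceId} pulls the value placed into MDC above -->
    <pattern>%d{ISO8601} %-5level traceId=%X{traceId} %logger - %msg%n</pattern>
  </encoder>
</appender>
```

If the key is absent from MDC, %X{traceId} renders as an empty string, which makes missing instrumentation easy to spot in log output.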
Common Pitfalls
- Partial Instrumentation: Instrumenting only traces or only logs reduces the benefits of correlation during troubleshooting.
- High Cardinality in Metrics: Avoid high-cardinality labels (e.g., user IDs) in Prometheus metrics; each distinct label value creates a new time series, inflating storage and slowing queries.
- Ignoring Context Propagation: Not passing trace context across service boundaries breaks end-to-end tracing visibility.
- Verbose Logging in Production: Excessive logging can degrade performance and increase storage costs; ensure log levels are appropriately configured.
- Inconsistent Time Synchronisation: Logs, metrics, and traces depend on accurate timestamps; use synchronised clocks with NTP or PTP.
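On the context-propagation pitfall: what crosses a service boundary is, in the W3C Trace Context standard, a traceparent HTTP header. The sketch below builds one by hand purely to show its shape (the trace and span IDs are the W3C specification's examples); in practice, let OpenTelemetry's propagator inject and extract it for you.

```java
public class TraceparentSketch {
    // W3C Trace Context format: version-traceId-spanId-flags
    static String traceparent(String traceId, String spanId, boolean sampled) {
        return "00-" + traceId + "-" + spanId + (sampled ? "-01" : "-00");
    }

    public static void main(String[] args) {
        // Example IDs taken from the W3C Trace Context specification
        String header = traceparent("4bf92f3577b34da6a3ce929d0e0e4736",
                                    "00f067aa0ba902b7", true);
        // The downstream service extracts this header and continues the trace
        System.out.println(header);
    }
}
```

If this header is dropped anywhere along the call chain (a proxy stripping headers, a hand-rolled HTTP client), the downstream service starts a fresh trace and end-to-end visibility is lost.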
Validation
To confirm that your observability setup works effectively:
- Send test requests to your service and verify spans appear in your tracing UI (e.g., Jaeger, Honeycomb, or Grafana Tempo).
- Query your Prometheus server for application metrics and ensure new data appears at expected intervals.
- Check logs contain trace IDs or other context; cross-reference logs with traces to confirm correlation.
- Simulate errors and observe that errors and exceptions are recorded in traces and logs correctly.
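For the Prometheus check, a query such as the following (using the counter from the earlier example) confirms fresh samples are arriving at the expected scrape interval:

```promql
rate(http_requests_total[5m])
```

An empty or stale result usually points to a scrape-target misconfiguration rather than a problem in the application itself.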
Checklist / TL;DR
- Instrument your code to create meaningful spans for key operations (OpenTelemetry 1.8+ recommended).
- Expose metrics compatible with your monitoring stack (Prometheus v2.40+ is a common choice).
- Use structured logging with consistent key fields (e.g., traceId) for correlation.
- Ensure trace context propagation across service boundaries.
- Avoid high-cardinality labels in metrics and overly verbose logging in production.
- Verify observability data end-to-end by checking UIs and logs after test runs.
When to Choose Traces vs Metrics vs Logs
Traces are ideal for understanding the journey and performance of individual requests across distributed systems, showing timing and relationship between components.
Metrics are best suited for continuous monitoring and alerting on system health and trends over time thanks to their lightweight, numeric nature.
Logs excel at providing detailed, contextual information about specific events and errors, helpful in root cause analysis.
Use all three in tandem for full observability, adjusting focus depending on immediate needs — metrics for overall health, traces for performance bottlenecks, logs for detailed investigation.