High‑cardinality metrics dos and don’ts — Testing Strategy — Practical Guide (Jun 15, 2026)
Level: Intermediate software engineers with experience in observability platforms and reliable metrics instrumentation
Date: June 15, 2026
High‑cardinality metrics, those with many unique label-value combinations, pose unique challenges in software observability. Testing strategies to validate these metrics are crucial to ensure performance, reliability, and actionable monitoring. This article focuses on practical dos and don’ts for testing high‑cardinality metrics in modern software engineering environments as of 2026, referencing current versions of popular monitoring frameworks such as Prometheus 2.51+, OpenTelemetry Collector 0.89+, and Grafana 10.
Prerequisites
- Basic familiarity with metrics in your observability stack—Prometheus exposition format and OpenTelemetry metrics are typical.
- Core understanding of how labels (tags) in metrics affect cardinality and storage overhead.
- Access to a test environment where application instrumentation and metric scraping can be controlled independently from production.
- Familiarity with metric querying—PromQL in Prometheus or equivalent in other platforms—for validation.
- Working knowledge of test automation frameworks commonly integrated with CI/CD pipelines, such as pytest or JUnit, to automate metrics testing.
Hands-on steps
1. Identify potential high-cardinality sources
Start by enumerating the labels in your metrics that can take many distinct values. User IDs, session IDs, request IDs, or error stack traces used as labels are typical culprits: every new value creates a new time series, and the total series count grows as the product of the distinct values across all labels.
# Example: Check current label values for suspicious labels
curl -s http://localhost:9100/metrics | grep '^http_requests_total' | head -20
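To go beyond eyeballing the first twenty lines, the scraped text can be summarised per label. The following is a minimal sketch, assuming the endpoint returns standard Prometheus text exposition format; the function name label_value_counts and the sample payload (including its user_id label) are made up for illustration.

```python
import re
from collections import defaultdict

# Matches one label pair such as method="GET" inside the braces of a sample line.
LABEL_PAIR = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def label_value_counts(exposition_text, metric_name):
    """Return {label_name: number_of_distinct_values} for one metric."""
    values = defaultdict(set)
    for line in exposition_text.splitlines():
        # Skip comments (# HELP, # TYPE) and samples that belong to other metrics.
        if line.startswith("#") or not line.startswith(metric_name + "{"):
            continue
        # Everything between the opening brace and the last closing brace.
        label_block = line[len(metric_name) + 1 : line.rindex("}")]
        for name, value in LABEL_PAIR.findall(label_block):
            values[name].add(value)
    return {name: len(seen) for name, seen in values.items()}


# Hypothetical scrape output used only to demonstrate the function.
SAMPLE = '''\
# HELP http_requests_total Total HTTP requests.
# TYPE http_requests_total counter
http_requests_total{method="GET",user_id="u1"} 3
http_requests_total{method="GET",user_id="u2"} 1
http_requests_total{method="POST",user_id="u3"} 7
'''

counts = label_value_counts(SAMPLE, "http_requests_total")
print(counts)  # {'method': 2, 'user_id': 3}
```

A label whose distinct-value count tracks the number of samples, as user_id does here, is a likely high-cardinality source.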
2. Apply cardinality guards in instrumentation
Avoid or bound labels whose values vary freely. Keep each counter or histogram scoped to labels that have a small, known set of values. For example, instead of adding a user-specific label, aggregate by user region or user tier.
// Avoid:
httpRequests.WithLabelValues("user12345").Inc()
// Prefer:
httpRequests.WithLabelValues("tier_premium").Inc()
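One way to enforce such a guard is an allowlist that collapses unexpected values into a single fallback bucket before they reach the metric. This is a sketch only; the tier names, the "other" bucket, and the helper bounded_label are assumptions, not part of any metrics library.

```python
# Hypothetical set of label values the service is allowed to emit.
ALLOWED_TIERS = frozenset({"basic", "premium", "enterprise"})
FALLBACK = "other"


def bounded_label(value, allowed=ALLOWED_TIERS, fallback=FALLBACK):
    """Return value if it is in the allowlist, otherwise the fallback bucket."""
    return value if value in allowed else fallback


# Known values pass through; anything else lands in one shared bucket,
# so the label can never produce more than len(allowed) + 1 values.
print(bounded_label("premium"))
print(bounded_label("user12345"))
```

With this in place the worst-case cardinality of the label is fixed at the allowlist size plus one, regardless of what callers pass in.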
3. Develop focused unit tests for metric integrity
Write unit tests that verify:
- Labels are restricted to defined sets (enums or validated strings).
- Metrics maintain expected cardinality under normal load.
- Instrumentation functions produce the correct metric types (counter, gauge, histogram).
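# Imported at module level so that both tests can see it.
from myapp.metrics import REQUEST_METRIC

# _labelnames and _type are private attributes (as found in prometheus_client);
# they may change between library versions.

def test_label_values_are_expected():
    labels = REQUEST_METRIC._labelnames
    assert "user_id" not in labels, "User ID should not be a label"

def test_metric_type():
    assert REQUEST_METRIC._type == "counter"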
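The second item in the list, expected cardinality under normal load, can also be checked without a running backend. The sketch below uses a hypothetical test double, SeriesRecorder, that stands in for the real metric and remembers each distinct label combination; the label names and the budget of 6 are illustrative assumptions.

```python
import itertools


class SeriesRecorder:
    """Test double that remembers each distinct label combination it sees."""

    def __init__(self):
        self.series = set()

    def observe(self, **labels):
        # Sort the items so that keyword order does not create duplicates.
        self.series.add(tuple(sorted(labels.items())))


def simulate_requests(recorder, n):
    """Replay n synthetic requests that only use bounded label values."""
    methods = itertools.cycle(["GET", "POST"])
    tiers = itertools.cycle(["basic", "premium", "enterprise"])
    for _ in range(n):
        recorder.observe(method=next(methods), user_tier=next(tiers))


def test_cardinality_stays_within_budget():
    recorder = SeriesRecorder()
    simulate_requests(recorder, 1000)
    # Budget = 2 methods x 3 tiers; more series would mean a label leaked.
    assert len(recorder.series) <= 6
```

The budget is the product of the allowed values per label, so the test fails as soon as instrumentation starts emitting a value outside those sets.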
4. Perform integration tests with synthetic traffic
Generate test traffic that mimics realistic scenarios but controls cardinality. Tools such as Locust or k6 can be used to create repeatable traffic with predictable label distributions.
// k6 example controlling labels
import http from 'k6/http';
import { check } from 'k6';

export const options = {
  vus: 10,
  duration: '1m',
};

export default function () {
  // Alternate between two tiers so the label distribution stays predictable.
  const userTier = __VU % 2 === 0 ? 'basic' : 'premium';
  const res = http.get(`https://myservice/api?user_tier=${userTier}`);
  check(res, {
    'status is 200': (r) => r.status === 200,
  });
}
5. Validate cardinality impact during load tests
Monitor the cardinality growth directly in Prometheus or your monitoring backend during load tests. Use queries to count unique label combinations. For example, in Prometheus:
# Number of distinct user_tier values on the metric
count(count by (user_tier) (http_requests_total))
# Total number of series for the metric (all label combinations)
count(http_requests_total)
Compare results against thresholds you established. For most Prometheus setups, keeping metrics under 100k time-series is advised to limit resource strain, though this depends heavily on hardware and remote storage solutions.
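The comparison against a threshold can be automated from a test script. The sketch below assumes a Prometheus server reachable over HTTP and uses its instant-query endpoint, /api/v1/query; the function names and any base URL or budget passed to them are placeholders to adapt.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url, promql):
    """Build an instant-query URL for the Prometheus HTTP API."""
    query = urllib.parse.urlencode({"query": promql})
    return base_url.rstrip("/") + "/api/v1/query?" + query


def series_count_from_response(payload):
    """Extract the single value of an instant-vector response; 0 if it is empty."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    result = payload["data"]["result"]
    # Each sample is [timestamp, "value"]; the value arrives as a string.
    return int(float(result[0]["value"][1])) if result else 0


def check_series_budget(base_url, metric, budget):
    """Return (count, within_budget) for one metric. Performs a network call."""
    url = build_query_url(base_url, f"count({metric})")
    with urllib.request.urlopen(url, timeout=10) as resp:
        count = series_count_from_response(json.load(resp))
    return count, count <= budget
```

A load-test harness could call check_series_budget at the end of a run and fail the job when the second element of the returned tuple is False.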
Common pitfalls
- Using volatile labels as dimensions — Labels like session IDs or request IDs significantly inflate cardinality and should never be included as labels.
- Lack of cardinality limits in instrumentation — Blindly adding labels without policy leads to explosion in time-series count.
- Ignoring metric storage constraints — Different backend versions handle high cardinality differently; for example, Prometheus 2.x stores time-series in a local TSDB with some built-in limits but may struggle at millions of unique series.
- Insufficient test coverage on metric correctness — Metrics may be emitted, but incorrect labels or types can mislead alerting and monitoring systems.
- Not validating metrics during load — Metrics produced during unit tests only are insufficient; integration and load testing reveal cardinality issues realistically.
Validation
Ensuring your metrics testing strategy is valid involves:
- Automating metric assertions: Integrate metric checks into CI pipelines. Tools like prometheus_mcopy or promtool can help verify syntactic correctness.
- Manually querying cardinality: Use queries like count(count by (...)(metric_name)) over time to identify unusual spikes or unbounded growth.
- Profiling the monitoring backend: Examine resource usage (memory, CPU) of Prometheus or your metrics store to observe if sustained metric volume is within operational norms.
- Stress testing the instrumentation: Use chaos testing or fault injection to verify that metric emission continues gracefully under failure.
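Spotting unbounded growth can be reduced to a simple check over series counts sampled at intervals during a test run. The heuristic below is one possible sketch, not an established rule; the function looks_unbounded, the 1.5x ratio, and the three-interval window are assumptions to tune for your workload.

```python
def looks_unbounded(samples, max_ratio=1.5, window=3):
    """Heuristic: True if the series count is still rising and has outgrown its start.

    samples is a list of series counts taken at regular intervals.
    """
    if len(samples) < window + 1:
        return False  # not enough data to judge
    tail = samples[-(window + 1):]
    # Strictly increasing across the last `window` intervals.
    still_growing = all(later > earlier for earlier, later in zip(tail, tail[1:]))
    # Compare against the first sample, treating an initial 0 as 1.
    grew_too_much = samples[-1] > max_ratio * max(samples[0], 1)
    return still_growing and grew_too_much


# Hypothetical runs: a warm-up that plateaus, and a leaking label.
print(looks_unbounded([2, 6, 6, 6, 6]))
print(looks_unbounded([6, 120, 480, 900, 1500]))
```

Requiring both conditions keeps a normal warm-up, where the count rises and then plateaus, from being flagged.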
Checklist and TL;DR
- ✅ Identify and document labels likely to increase cardinality.
- ✅ Avoid using highly dynamic labels (request_id, user_id, session_id) as metric labels.
- ✅ Implement and enforce limits in instrumentation code.
- ✅ Build unit tests for metric label validation and type correctness.
- ✅ Generate realistic synthetic load to test behaviour at scale.
- ✅ Query metrics backend regularly to monitor cardinality growth.
- ✅ Profile the monitoring backend’s resource usage during tests.
- ✅ Automate metric validation in CI/CD workflows.
When to choose low-cardinality metrics vs dimensional (high-cardinality) metrics:
Low-cardinality metrics should be your first choice: labels with a few stable values (e.g. region, instance_type) keep storage manageable and queries fast. High-cardinality dimensions are justified only when you explicitly need detailed breakdowns and your backend supports efficient aggregation or high-cardinality storage (e.g. via remote storage extensions or data summarization). If your backend or hardware cannot handle such scale, consider summarising or sampling instead.