Elasticsearch/OpenSearch sizing & mappings — Performance Tuning Guide — Practical Guide (Nov 2, 2025)
Level: Intermediate
As of November 2, 2025, this guide focuses primarily on Elasticsearch 8.x and OpenSearch 2.x versions.
Prerequisites
This guide assumes you are operating Elasticsearch 8.x or OpenSearch 2.x clusters and are familiar with basic concepts such as nodes, indices, shards, and mappings. Understanding JSON mapping syntax and cluster monitoring is helpful but not mandatory.
Key knowledge and environment factors to have in place:
- Cluster version: Elasticsearch 8.x (first released in early 2022) or OpenSearch 2.x (first released in mid-2022). Both have largely compatible APIs, but note some differences in features and defaults.
- Business workload characteristics: Document size, ingestion rate, search query profiles (e.g., aggregations, full-text, filters).
- Hardware and infrastructure: CPU, RAM, storage type (SSD recommended), and network topology.
- Basic monitoring: Access to metrics via OpenSearch Dashboards/Kibana, or APIs like _cat/nodes and _cluster/stats.
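As a quick sanity check, the two APIs above can be called directly. This sketch assumes a cluster reachable at localhost:9200 without authentication; adjust host, port, and credentials for your environment:

```shell
# Per-node overview: name, heap usage, RAM, CPU, and roles
curl -s "http://localhost:9200/_cat/nodes?v&h=name,heap.percent,ram.percent,cpu,node.role"

# Cluster-wide stats: shard counts, store size, JVM and OS summaries
curl -s "http://localhost:9200/_cluster/stats?human&pretty"
```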
Hands-on steps
1. Understanding the role of sizing and mappings in performance
Sizing your Elasticsearch/OpenSearch cluster and defining effective index mappings are tightly coupled decisions that influence throughput, latency, storage utilisation, and query accuracy.
- Shard count and size: Oversharding leads to CPU pressure and cluster instability; too few shards reduce parallelism and can bottleneck IO.
- Mapping definitions: Selecting the right field types, enabling/disabling indexing features, or using multi-fields affects memory footprint and search performance.
2. Calculating shard sizing
As a best practice from Elasticsearch 8.x and OpenSearch 2.x documentation:
- Target shard size is typically 10–50 GB for data nodes using standard SSDs; this balance keeps recovery times manageable and query performance stable.
- The total shards per node should be limited: Elasticsearch's guidance is to keep fewer than 20 shards per GB of JVM heap, which works out to roughly 600 shards for a node with a 30 GB heap.
// Example: determining shard count per index
index_size_gb        = 200
target_shard_size_gb = 30
shard_count          = ceil(200 / 30) = ceil(6.67) = 7 shards (round up)
Consider cluster node count and growth: e.g., with 3 data nodes, 7 primary shards per index, and 2 replicas per primary, that index has 7 × (1 + 2) = 21 shard copies spread across the cluster.
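The calculation above can be scripted with integer arithmetic. A minimal sketch with illustrative numbers (200 GB of data, a 30 GB target shard size, 2 replicas; all values are assumptions, not recommendations):

```shell
# Hypothetical sizing inputs
index_size_gb=200
target_shard_size_gb=30
replicas=2

# Ceiling division without floating point: (a + b - 1) / b
primaries=$(( (index_size_gb + target_shard_size_gb - 1) / target_shard_size_gb ))
total_copies=$(( primaries * (1 + replicas) ))

echo "primary shards:     $primaries"     # 7
echo "total shard copies: $total_copies"  # 21
```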
3. Define efficient mappings
Elasticsearch/OpenSearch data mappings are fundamental for performance tuning:
- Field types: Choose correct types, for example keyword for exact values, text for analysed full-text, and date for timestamps.
- Disable unnecessary fields: Set "index": false for fields not needed in queries.
- Use runtime fields sparingly: They add query-time cost, so prefer explicit mappings when possible.
PUT my-index
{
"mappings": {
"properties": {
"user_id": {
"type": "keyword"
},
"message": {
"type": "text",
"analyzer": "standard"
},
"timestamp": {
"type": "date"
},
"raw_json": {
"type": "object",
"enabled": false
}
}
}
}
Note: setting "enabled": false on the raw_json object keeps the field in _source but skips parsing and indexing entirely, so it adds no query-time overhead; the trade-off is that it cannot be searched or aggregated.
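For a single leaf field that must be kept but never searched, "index": false is a lighter-weight alternative to disabling a whole object. A sketch against the my-index example above (the session_token field name is hypothetical; host and port are assumptions):

```shell
# Add a keyword field that is stored in _source but not indexed for search
curl -s -XPUT "http://localhost:9200/my-index/_mapping" \
  -H 'Content-Type: application/json' -d'
{
  "properties": {
    "session_token": {
      "type": "keyword",
      "index": false
    }
  }
}'
```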
4. When to choose dynamic vs explicit mappings
Dynamic mapping is useful for rapid prototyping; however, explicit mappings promote predictability and performance at scale by preventing frequent index structure changes and unnecessary field expansion.
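One middle ground is to lock the mapping down explicitly: with "dynamic": "strict", documents containing unmapped fields are rejected instead of silently expanding the mapping. A minimal sketch (the index name logs-strict and the host are illustrative):

```shell
# Create an index that rejects documents with unmapped fields
curl -s -XPUT "http://localhost:9200/logs-strict" \
  -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "dynamic": "strict",
    "properties": {
      "message":   { "type": "text" },
      "timestamp": { "type": "date" }
    }
  }
}'
```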
Common pitfalls
Shard overallocation and hot nodes
Excess shards per node cause higher heap usage, longer GC pauses, and slower query responses. Avoid tiny shards (a few MB) as well as extremely large ones (>50 GB).
Uncontrolled field explosion
Dynamic mapping can cause index mapping explosion as new fields arrive, leading to degraded indexing and cluster health. Use index templates and explicit mappings where possible.
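As a safety net against mapping explosion, both engines support a per-index cap on the number of mapped fields via index.mapping.total_fields.limit (default 1000). Lowering it makes field explosion fail fast; the value below is illustrative, as are the index name and host:

```shell
# Cap the mapping at 500 fields; a document that would exceed it is rejected
curl -s -XPUT "http://localhost:9200/my-index/_settings" \
  -H 'Content-Type: application/json' \
  -d'{"index.mapping.total_fields.limit": 500}'
```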
Text field misuse
A frequent error is mapping numeric or keyword data as text, which adds unnecessary analysis and indexing cost.
Ignoring refresh interval and replica settings
Default refresh interval (usually 1s) can add indexing overhead under heavy load. Adjust it during bulk ingestion phases and scale replicas based on query load.
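A common bulk-load pattern is to disable refresh (and optionally replicas) for the duration of ingestion and restore them afterwards. A sketch, assuming the index is named my-index and the cluster is at localhost:9200:

```shell
# Before bulk ingestion: disable refresh and drop replicas
curl -s -XPUT "http://localhost:9200/my-index/_settings" \
  -H 'Content-Type: application/json' \
  -d'{"index": {"refresh_interval": "-1", "number_of_replicas": 0}}'

# ... run the bulk load ...

# Afterwards: restore settings and force a refresh so data becomes searchable
curl -s -XPUT "http://localhost:9200/my-index/_settings" \
  -H 'Content-Type: application/json' \
  -d'{"index": {"refresh_interval": "1s", "number_of_replicas": 1}}'
curl -s -XPOST "http://localhost:9200/my-index/_refresh"
```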
Validation
After your cluster is sized and mappings defined, validate the configuration with these steps:
Check shard sizes and distribution
# List shards and sizes per node
curl -s -XGET "http://localhost:9200/_cat/shards?h=index,shard,prirep,state,docs,store,node&s=node"
Monitor indexing/search performance metrics
Use _nodes/stats API or OpenSearch Dashboards to track:
- Query latency and throughput
- GC pause times
- Heap and CPU utilisation
- Cache hit ratios (filter and query caches)
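The metrics above map onto specific _nodes/stats sections; for example (host and port are assumptions):

```shell
# JVM heap and GC stats plus index-level search/indexing timings
curl -s "http://localhost:9200/_nodes/stats/jvm,indices?pretty"

# Narrow the payload to query cache and request cache counters
curl -s "http://localhost:9200/_nodes/stats/indices?filter_path=nodes.*.indices.query_cache,nodes.*.indices.request_cache&pretty"
```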
Review index mapping and settings
curl -s -XGET "http://localhost:9200/my-index/_mapping?pretty"
Validate that unnecessary text fields or unexpected dynamically mapped fields aren't present.
Checklist / TL;DR
- Benchmark or estimate total data size and growth before sizing shards.
- Target shard sizes in the 10–50 GB range with a limit on shards per node.
- Use explicit mappings with precise field types; avoid dynamic mappings in production.
- Disable indexing on fields not used for search to save heap and disk.
- Monitor cluster stats actively; watch for GC, CPU, and shard sizes.
- Adjust refresh intervals and replica counts based on ingestion and query profiles.
- Validate index mappings and size distribution regularly to prevent performance regressions.