Sachith Dassanayake Software Engineering Elasticsearch/OpenSearch sizing & mappings — Performance Tuning Guide — Practical Guide (Nov 2, 2025)

Elasticsearch/OpenSearch sizing & mappings — Performance Tuning Guide

Level: Intermediate

As of November 2, 2025, this guide focuses primarily on Elasticsearch 8.x and OpenSearch 2.x versions.

Prerequisites

This guide assumes you operate Elasticsearch 8.x or OpenSearch 2.x clusters and are familiar with basic concepts such as nodes, indices, shards, and mappings. An understanding of JSON mapping syntax and cluster monitoring is helpful but not mandatory.

Key knowledge and environment factors to have in place:

  • Cluster version: Elasticsearch 8.x (first released in February 2022) or OpenSearch 2.x (first released in May 2022). Both have largely compatible APIs, but note some differences in features and defaults.
  • Business workload characteristics: Document size, ingestion rate, search query profiles (e.g., aggregations, full-text, filters).
  • Hardware and infrastructure: CPU, RAM, storage type (SSD recommended), and network topology.
  • Basic monitoring: Access to metrics via OpenSearch Dashboards/Kibana, or APIs like _cat/nodes and _cluster/stats.

Hands-on steps

1. Understanding the role of sizing and mappings in performance

Sizing your Elasticsearch/OpenSearch cluster and defining effective index mappings are tightly coupled decisions that influence throughput, latency, storage utilisation, and query accuracy.

  • Shard count and size: Oversharding leads to CPU pressure and cluster instability; too few shards reduce parallelism and can bottleneck IO.
  • Mapping definitions: Selecting the right field types, enabling/disabling indexing features, or using multifields affect memory footprint and search performance.

2. Calculating shard sizing

Both the Elasticsearch 8.x and OpenSearch 2.x documentation give similar guidance:

  • Target a shard size of roughly 10–50 GB on data nodes with standard SSDs; this range keeps recovery times manageable and query performance stable.
  • Limit the total number of shards per node. Older Elasticsearch guidance suggested at most 20 shards per GB of heap, which works out to about 600 shards on a node with a 30 GB heap. Both products also enforce cluster.max_shards_per_node, which by default caps the cluster at 1,000 open shards multiplied by the number of data nodes.

// Example: determining the primary shard count for one index
const indexSizeGb = 200;
const targetShardSizeGb = 30;
// 200 / 30 ≈ 6.67, rounded up to 7 primary shards
const shardCount = Math.ceil(indexSizeGb / targetShardSizeGb);

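Consider cluster node count and growth as well: e.g., with 3 data nodes, 7 primary shards, and 2 replicas per primary, the index has 7 × (1 + 2) = 21 shards in total, or 7 per node. Configuring more replicas than the number of data nodes minus one leaves the extra copies unassigned, because two copies of the same shard are never placed on one node.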
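The shard arithmetic can be sketched in a few lines of Python. This is a minimal sketch: the function name `plan_shards` and its parameters are illustrative and not part of any Elasticsearch or OpenSearch client. It assumes total shard copies equal primaries × (1 + replicas), and that shards spread evenly across data nodes.

```python
import math


def plan_shards(index_size_gb, target_shard_size_gb, replicas, data_nodes):
    """Estimate primary count, total shard copies, and per-node load."""
    primaries = math.ceil(index_size_gb / target_shard_size_gb)
    # Two copies of the same shard never share a node, so at most
    # data_nodes - 1 replicas per primary can actually be assigned.
    assignable_replicas = min(replicas, data_nodes - 1)
    configured = primaries * (1 + replicas)
    assignable = primaries * (1 + assignable_replicas)
    return {
        "primaries": primaries,
        "configured_shards": configured,
        "assignable_shards": assignable,
        "max_shards_per_node": math.ceil(assignable / data_nodes),
    }


plan = plan_shards(index_size_gb=200, target_shard_size_gb=30, replicas=2, data_nodes=3)
print(plan)
```

Rerunning the function with projected index sizes is a quick way to see when growth would push an index past the target shard size or push a node past its shard budget.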

3. Define efficient mappings

Elasticsearch/OpenSearch data mappings are fundamental for performance tuning:

  • Field Types: Choose correct types — for example, keyword for exact values, text for analysed full-text, date for timestamps.
  • Disable unnecessary fields: Set "index": false on fields that are never searched, or "enabled": false on object fields whose contents only need to be returned from _source.
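  • Use runtime fields sparingly: They are evaluated at query time, so every search that touches them pays the cost; prefer explicit mappings when possible. Runtime fields are an Elasticsearch feature; OpenSearch offers comparable derived fields in later 2.x releases.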

PUT my-index
{
  "mappings": {
    "properties": {
      "user_id": {
        "type": "keyword"
      },
      "message": {
        "type": "text",
        "analyzer": "standard"
      },
      "timestamp": {
        "type": "date"
      },
      "raw_json": {
        "type": "object",
        "enabled": false
      }
    }
  }
}

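Note: setting "enabled": false on raw_json keeps the data in _source, so it is still returned with search hits, but its contents are neither parsed nor indexed. The field adds no indexing cost, and it cannot be searched, sorted on, or aggregated.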

4. When to choose dynamic vs explicit mappings

Dynamic mapping is useful for rapid prototyping; however, explicit mappings promote predictability and performance at scale by preventing frequent index structure changes and unnecessary field expansion.
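One way to enforce explicit mappings is the `dynamic` mapping parameter: `"strict"` rejects documents that contain unmapped fields, while `false` accepts them but leaves the new fields unindexed. The sketch below only builds the create-index request body; the helper name `explicit_mapping` and the field names are illustrative assumptions.

```python
import json


def explicit_mapping(properties, dynamic="strict"):
    """Build a create-index body that fixes the field set up front."""
    return {"mappings": {"dynamic": dynamic, "properties": properties}}


body = explicit_mapping({
    "user_id": {"type": "keyword"},
    "message": {"type": "text"},
    "timestamp": {"type": "date"},
})
# The body would be sent as: PUT /my-index
print(json.dumps(body, indent=2))
```

Whether `"strict"` or `false` fits better depends on the pipeline: strict mode surfaces schema drift as indexing errors, whereas `false` silently keeps unknown fields in `_source` only.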

Common pitfalls

Shard overallocation and hot nodes

Excess shards per node cause higher heap usage, longer GC pauses, and slower query responses. Avoid tiny shards (few MBs) or extremely large shards (>50GB).

Uncontrolled field explosion

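With dynamic mapping, every previously unseen field name in an incoming document adds a new field to the index mapping. Unbounded key sets (for example, user-supplied attribute names) can inflate the mapping until indexing slows and cluster state updates become expensive. The index.mapping.total_fields.limit setting (default 1000) acts as a safety net; index templates with explicit mappings are the better first defence.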
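To see how close an index is to a mapping explosion, the fields in its mapping can be counted and compared with `index.mapping.total_fields.limit` (default 1000). The sketch below walks the `properties` section of a mapping and counts fields, object fields, and multi-fields. It is an approximation, since it ignores field aliases and runtime fields, and `count_mapped_fields` and the sample mapping are made up for illustration.

```python
def count_mapped_fields(properties):
    """Count fields, object fields, and multi-fields under a properties dict."""
    total = 0
    for definition in properties.values():
        total += 1  # the field (or object) itself
        total += count_mapped_fields(definition.get("properties", {}))
        total += len(definition.get("fields", {}))  # multi-fields such as .keyword
    return total


sample_properties = {
    "user": {
        "properties": {
            "id": {"type": "keyword"},
            "name": {"type": "text", "fields": {"keyword": {"type": "keyword"}}},
        }
    },
    "timestamp": {"type": "date"},
}
FIELD_LIMIT = 1000  # default index.mapping.total_fields.limit
used = count_mapped_fields(sample_properties)
print(f"{used} of {FIELD_LIMIT} fields used")
```

Tracking this number over time is more informative than a single reading: a steadily growing count usually points to a source that emits dynamic keys.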

Text field misuse

A frequent error is mapping numeric or keyword data as text, which adds unnecessary analysis and indexing cost.

Ignoring refresh interval and replica settings

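The default refresh interval of 1s adds indexing overhead under heavy load. Raise index.refresh_interval (for example to 30s, or -1 to disable refreshes) during bulk ingestion phases and restore it afterwards. Scale number_of_replicas with query load, keeping in mind that each replica adds storage and, with the default document replication, repeats the indexing work.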
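A common pattern for large bulk loads is to switch these settings before the load and switch them back afterwards. The sketch below only builds the two request bodies for the index settings API. The helper names are illustrative, and the chosen values (`-1`, `0` replicas, `30s`) are assumptions to adapt to your own durability and latency requirements.

```python
import json


def bulk_load_settings():
    """Settings to apply before a large bulk load: no refresh, no replicas."""
    return {"index": {"refresh_interval": "-1", "number_of_replicas": 0}}


def steady_state_settings(refresh_interval="1s", replicas=1):
    """Settings to restore once the bulk load has finished."""
    return {"index": {"refresh_interval": refresh_interval, "number_of_replicas": replicas}}


# Each body would be sent as: PUT /my-index/_settings
before = bulk_load_settings()
after = steady_state_settings(refresh_interval="30s")
print(json.dumps(before))
print(json.dumps(after))
```

Running with zero replicas means a node failure during the load can lose data, so this trade-off suits loads that can be replayed from the source.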

Validation

After your cluster is sized and mappings defined, validate the configuration with these steps:

Check shard sizes and distribution


# List shards and sizes per node
curl -s -XGET "http://localhost:9200/_cat/shards?h=index,shard,prirep,state,docs,store,node&s=node"
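For larger clusters it can help to post-process the shard listing rather than read it by eye. The sketch below assumes the same endpoint was called with `format=json&bytes=b`, so each row is a dictionary and `store` is a byte count given as a string. The function name `flag_shard_sizes`, the thresholds, and the sample rows are illustrative and not real cluster output.

```python
GIB = 1024 ** 3


def flag_shard_sizes(rows, min_gb=10, max_gb=50):
    """Split started shards into too-small and too-large lists by store size."""
    too_small, too_large = [], []
    for row in rows:
        if row.get("state") != "STARTED" or row.get("store") is None:
            continue  # unassigned or initializing shards report no size
        size_gb = int(row["store"]) / GIB
        label = f'{row["index"]}[{row["shard"]}]{row["prirep"]}'
        if size_gb < min_gb:
            too_small.append(label)
        elif size_gb > max_gb:
            too_large.append(label)
    return too_small, too_large


# Made-up rows in the shape returned by _cat/shards?format=json&bytes=b
sample_rows = [
    {"index": "logs-a", "shard": "0", "prirep": "p", "state": "STARTED", "store": str(30 * GIB)},
    {"index": "logs-a", "shard": "1", "prirep": "p", "state": "STARTED", "store": "2097152"},
    {"index": "logs-b", "shard": "0", "prirep": "r", "state": "STARTED", "store": str(80 * GIB)},
    {"index": "logs-b", "shard": "1", "prirep": "r", "state": "UNASSIGNED", "store": None},
]
small, large = flag_shard_sizes(sample_rows)
print("too small:", small)
print("too large:", large)
```

Small shards are expected for indices that are still filling or are small by nature, so the "too small" list is a prompt for review, not an error report.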

Monitor indexing/search performance metrics

Use _nodes/stats API or OpenSearch Dashboards to track:

  • Query latency and throughput
  • GC pause times
  • Heap and CPU utilisation
  • Cache hit ratios (node query cache and shard request cache)
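Heap usage and cache hit ratios can be derived from a `_nodes/stats` response. The sketch below assumes the response has already been fetched and parsed into a dictionary. It reads `jvm.mem.heap_used_percent` and the `hit_count` and `miss_count` counters under `indices.query_cache` and `indices.request_cache`. The function names and sample values are made up for illustration.

```python
def hit_ratio(cache_stats):
    """Return hits / (hits + misses), or None when the cache saw no lookups."""
    hits = cache_stats.get("hit_count", 0)
    misses = cache_stats.get("miss_count", 0)
    lookups = hits + misses
    return hits / lookups if lookups else None


def summarise_nodes(stats):
    """Extract heap usage and cache hit ratios per node from a _nodes/stats body."""
    summary = {}
    for node in stats["nodes"].values():
        indices = node["indices"]
        summary[node["name"]] = {
            "heap_used_percent": node["jvm"]["mem"]["heap_used_percent"],
            "query_cache_hit_ratio": hit_ratio(indices["query_cache"]),
            "request_cache_hit_ratio": hit_ratio(indices["request_cache"]),
        }
    return summary


# Made-up sample in the shape of a _nodes/stats response
sample_stats = {
    "nodes": {
        "node-id-1": {
            "name": "data-1",
            "jvm": {"mem": {"heap_used_percent": 62}},
            "indices": {
                "query_cache": {"hit_count": 750, "miss_count": 250},
                "request_cache": {"hit_count": 0, "miss_count": 0},
            },
        }
    }
}
report = summarise_nodes(sample_stats)
print(report)
```

The counters are cumulative since node start, so comparing two snapshots taken some minutes apart gives a better picture of current behaviour than a single reading.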

Review index mapping and settings


curl -s -XGET "http://localhost:9200/my-index/_mapping?pretty"

Check that the mapping contains no unexpected text fields and no fields that dynamic mapping added without your planning for them.
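Part of this review can be automated. The sketch below walks the `properties` section of a mapping response and lists every `text` field. It uses a heuristic to flag fields that look dynamically mapped: by default, dynamic mapping turns strings into `text` with a `keyword` sub-field limited by `ignore_above: 256`. The function name `audit_mapping` and the sample mapping are illustrative.

```python
def audit_mapping(properties, prefix=""):
    """Yield (path, note) pairs for fields worth a manual review."""
    for name, definition in properties.items():
        path = f"{prefix}{name}"
        if definition.get("type") == "text":
            sub = definition.get("fields", {}).get("keyword", {})
            if sub.get("type") == "keyword" and sub.get("ignore_above") == 256:
                yield path, "text + keyword(256): likely created by dynamic mapping"
            else:
                yield path, "text: confirm full-text search is needed"
        # Recurse into object fields
        yield from audit_mapping(definition.get("properties", {}), prefix=f"{path}.")


# Made-up sample; with a real response, pass response["my-index"]["mappings"]["properties"]
sample_mapping = {
    "message": {"type": "text"},
    "status": {"type": "text", "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}},
    "user": {"properties": {"id": {"type": "keyword"}, "bio": {"type": "text"}}},
}
findings = list(audit_mapping(sample_mapping))
for path, note in findings:
    print(f"{path}: {note}")
```

A flagged field is not necessarily wrong; the list is meant to shorten the manual review, not replace it.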

Checklist / TL;DR

  • Benchmark or estimate total data size and growth before sizing shards.
  • Target shard sizes in the 10–50 GB range with a limit on shards per node.
  • Use explicit mappings with precise field types; avoid dynamic mappings in production.
  • Disable indexing on fields not used for search to save heap and disk.
  • Monitor cluster stats actively; watch for GC, CPU, and shard sizes.
  • Adjust refresh intervals and replica counts based on ingestion and query profiles.
  • Validate index mappings and size distribution regularly to prevent performance regressions.
