Indexes Are Data Structures, Not Magic Switches
An index is not a shortcut that makes data instantly accessible. It is a separate data structure, most commonly a B-tree or one of its variants, that must be stored, updated, and navigated.
Every index duplicates part of the underlying table:
- Column values
- Row identifiers
- Structural metadata
This duplication has a cost. The database must maintain consistency between the table and every index on every write operation.
Indexes speed up reads only when their structure aligns with how data is queried.
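As a concrete sketch (PostgreSQL syntax; the users table and email column are hypothetical), the index below is its own on-disk structure with its own size and maintenance cost:

```sql
-- Hypothetical table; the index below is a separate on-disk B-tree, not a flag on the table.
CREATE TABLE users (
    id    bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    email text NOT NULL
);

CREATE INDEX users_email_idx ON users (email);

-- Both structures have their own storage footprint, and every write maintains both.
SELECT pg_size_pretty(pg_relation_size('users'))           AS table_size,
       pg_size_pretty(pg_relation_size('users_email_idx')) AS index_size;
```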
The Optimizer Chooses, You Do Not
Developers often assume that if an index exists, the database will use it. In reality, the query optimizer evaluates multiple possible execution plans and chooses the one with the lowest estimated cost.
When many indexes exist:
- The optimizer has more plans to evaluate
- Cost estimation becomes noisier
- Small statistical inaccuracies compound
This increases planning time and raises the probability of choosing a suboptimal plan, especially when data distribution changes over time.
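A hedged illustration, assuming PostgreSQL and a hypothetical orders table: EXPLAIN exposes the plan the optimizer actually chose, and refreshing statistics can change that choice even though the query and the indexes stay the same.

```sql
-- EXPLAIN shows the plan the optimizer chose and its cost estimates;
-- an existing index on status does not force an index scan.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM orders
WHERE status = 'shipped';

-- Refreshing statistics can change which plan wins, with no change to the query or indexes.
ANALYZE orders;
```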
Index Selectivity Is the Core Constraint
Indexes work best when they significantly reduce the search space. This property is called selectivity.
An index on a column with:
- High cardinality (many unique values) → high selectivity
- Low cardinality (few distinct values) → low selectivity
Low-selectivity indexes often look useful but rarely are. The database may scan large portions of the index and still touch most rows in the table, so the overhead of traversing the index makes the query slower than a plain sequential scan.
Adding indexes on boolean flags, status fields, or small enums often produces negative returns.
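One way to gauge selectivity before indexing, assuming PostgreSQL and the same illustrative orders table, is to look at the planner's own distinct-value estimates:

```sql
-- The planner's estimate of distinct values per column.
-- Positive n_distinct is an absolute count; negative values are a fraction of the row count.
SELECT attname, n_distinct
FROM pg_stats
WHERE schemaname = 'public'
  AND tablename  = 'orders';
```

Columns with only a handful of distinct values are the ones where an index is most likely to cost more than it saves.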
Write Amplification Happens Quietly
Every insert, update, or delete must update:
- The table itself
- Every index referencing the modified columns
As indexes accumulate, write operations become increasingly expensive. This is known as write amplification.
The symptoms are subtle:
- Inserts slow down gradually
- Background maintenance increases
- Lock contention rises
- Latency spikes appear under load
Read performance may improve slightly, but overall system throughput declines.
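A rough way to see what each write has to maintain, assuming PostgreSQL and a hypothetical orders table, is to list every index attached to it along with its size:

```sql
-- Every structure listed here must be updated when rows in orders change.
SELECT indexrelname,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE relname = 'orders';
```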
Cache Pollution and Memory Pressure
Indexes consume memory. When too many indexes exist, they compete for cache space with:
- Frequently accessed table pages
- Other critical indexes
- Execution plans
This leads to cache churn: useful pages are evicted more often, forcing disk reads, and queries that were previously fast start showing inconsistent latency.
The paradox appears:
more indexes → less effective caching → slower reads.
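If the pg_buffercache extension is available (a PostgreSQL assumption), a query like the following sketch shows which relations, indexes included, are occupying shared buffers:

```sql
-- Which relations (tables and indexes) currently occupy shared buffers, largest first.
CREATE EXTENSION IF NOT EXISTS pg_buffercache;

SELECT c.relname,
       count(*) AS buffers
FROM pg_buffercache b
JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid)
WHERE b.reldatabase = (SELECT oid FROM pg_database WHERE datname = current_database())
GROUP BY c.relname
ORDER BY buffers DESC
LIMIT 10;
```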
Composite Indexes Are Commonly Misused
Composite indexes depend on column order. An index on (a, b, c) is not equivalent to indexes on (b, a) or (a, c).
Many systems accumulate overlapping composite indexes created to “cover all cases.” In practice:
- Only the leftmost prefix is used
- Redundant indexes provide no benefit
- Maintenance cost multiplies
A small number of well-designed composite indexes often outperforms dozens of overlapping ones.
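A minimal sketch of the leftmost-prefix rule, assuming PostgreSQL and illustrative column names:

```sql
-- One composite index; only queries constraining a leftmost prefix can use it efficiently.
CREATE INDEX orders_cust_date_idx ON orders (customer_id, order_date, status);

-- Can use the index: the predicate leads with customer_id.
EXPLAIN SELECT * FROM orders
WHERE customer_id = 42 AND order_date >= '2024-01-01';

-- Usually cannot use it efficiently: the leading column is not constrained.
EXPLAIN SELECT * FROM orders
WHERE status = 'shipped';
```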
Index Fragmentation Increases Tree Depth
Indexes change over time. Inserts and deletes fragment their internal structure. As fragmentation grows:
- Tree depth increases
- Cache locality decreases
- Range scans become more expensive
More indexes mean more structures to fragment and maintain. Without regular maintenance, performance degrades even if query patterns remain unchanged.
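Fragmentation can be measured rather than guessed at. The sketch below assumes PostgreSQL with the pgstattuple extension and reuses the hypothetical index from the previous example:

```sql
-- Leaf density and fragmentation for a B-tree index (requires the pgstattuple extension).
CREATE EXTENSION IF NOT EXISTS pgstattuple;

SELECT avg_leaf_density, leaf_fragmentation
FROM pgstatindex('orders_cust_date_idx');

-- Rebuilding restores a compact structure; CONCURRENTLY avoids blocking writes (PostgreSQL 12+).
REINDEX INDEX CONCURRENTLY orders_cust_date_idx;
```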
Statistics Drift Breaks Index Effectiveness
The optimizer relies on statistics to estimate selectivity and cost. These statistics age as data changes.
With many indexes:
- Statistics updates become heavier
- Sampling errors increase
- Plans chosen yesterday may be wrong today
This leads to unpredictable behavior: the same query performs differently depending on timing, data churn, or cache state.
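To see how stale statistics have become, assuming PostgreSQL, the bookkeeping views record when each table was last analyzed and how many rows have changed since:

```sql
-- When statistics were last refreshed, and how many rows have changed since.
SELECT relname,
       last_analyze,
       last_autoanalyze,
       n_mod_since_analyze
FROM pg_stat_user_tables
ORDER BY n_mod_since_analyze DESC;

-- Refresh statistics explicitly for a heavily churned table.
ANALYZE orders;
```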
Indexes Can Inhibit Better Plans
Sometimes the presence of an index prevents the optimizer from choosing a better execution strategy.
Examples include:
- Choosing an index scan instead of a faster sequential scan
- Forcing nested loops where hash joins would perform better
- Overusing index-only scans that still hit disk due to visibility checks
The optimizer rejects these alternatives because the index-based plan appears cheaper on paper, even when it is not in practice.
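One hedged way to test this, assuming PostgreSQL: temporarily disable index-based plans for a session and compare actual runtimes, not just estimates.

```sql
-- Plan and runtime the optimizer prefers.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE status = 'shipped';

-- Disable index-based plans for this session and compare measured runtimes.
SET enable_indexscan  = off;
SET enable_bitmapscan = off;

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE status = 'shipped';

RESET enable_indexscan;
RESET enable_bitmapscan;
```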
Why Performance Problems Feel Random
From the outside, systems with too many indexes behave erratically:
- Queries are fast one hour, slow the next
- Performance changes after deployments with no query changes
- Load spikes cause disproportionate slowdowns
This randomness reflects internal contention: memory pressure, lock competition, planner uncertainty, and cache churn interacting unpredictably.
What Actually Improves Index Performance
Effective indexing is subtractive, not additive.
It requires:
- Measuring real query patterns
- Removing unused or low-selectivity indexes
- Aligning composite indexes with actual WHERE and ORDER BY clauses
- Monitoring write latency alongside read speed
- Periodic re-evaluation as data grows
The best-performing systems usually have fewer indexes than expected, not more.
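A starting point for the subtractive approach, assuming PostgreSQL: find indexes that have never been scanned since statistics were last reset.

```sql
-- Indexes never scanned since statistics were last reset, largest first.
SELECT schemaname,
       relname,
       indexrelname,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
```

Zero scans makes an index a candidate for removal, not an automatic one: indexes that enforce constraints or support rare but critical queries should stay.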
Why “Just Add an Index” Is a Trap
Adding an index feels safe because it does not break functionality. But performance degradation is gradual and delayed.
By the time the system feels slow, the cause is buried under layers of historical decisions. Indexes added for short-term wins become long-term liabilities.
Understanding this dynamic turns performance tuning from guesswork into engineering.