Best Practices on RDS Indexes

6 min readFeb 21, 2024

Creating and maintaining indexes on Amazon RDS (or any relational database) can significantly improve query performance, but it’s important to follow best practices to avoid potential issues. Here are some best practices:

Understand Your Workload: Analyze your application’s queries to understand which columns are frequently used in WHERE, JOIN, ORDER BY, and GROUP BY clauses. These are good candidates for indexing.

Avoid Over-Indexing: While indexes can speed up read operations, they can slow down write operations (INSERT, UPDATE, DELETE) because the index also needs to be updated. Therefore, create indexes judiciously.

Consider Composite Indexes: If you often query multiple columns together, consider creating a composite index that includes those columns. The order of columns in the index can also affect its performance.

Cardinality: Columns with high cardinality (many unique values) often make good candidates for leading columns in an index.
Order of Columns: The order of columns in the index and order of columns in the where clause should be the same
Covering Indexes: If a composite index includes all the columns that a query needs, the database can retrieve the query results from the index without having to access the table. This is known as a “covering index” and can significantly improve query performance.
Avoid Redundant Indexes: If you have a composite index on (col1, col2), you don’t need another index on col1. The composite index can be used for queries on col1 alone, but not for queries on col2 alone.

Analyze Your Queries: The indexes should be designed around your application’s queries. Use the EXPLAIN statement to understand how your queries use indexes.

EXPLAIN Vs EXPLAIN ANALYZE: The former doesn’t execute the query to generate the plan, but the later execure the query as part of generating the plan.
If you see “filesort” in the EXPLAIN statement, it means the engine could not find a suitable index and had to use a temp table to do the operation. This will typically be realted to ORDER BY , GROUP BY etc.

The following are some of the standard practices on indexing columns on a query.

Between 2 tables, if there is a FK relationship, create that relationship. mySql automatically created an index on the FK record. If you are joining using the FK column in your queries, then they are all covered.
If you are joining using 2 columns, it might be a good idea to see if an index covering both columns will improve the query. Make sure index has the same order as the order of columns in the JOIN.
Columns in the WHERE clause and ORDER BY need to be part of a non-clustered index.
Sometimes, even when all the indexes are in place, the index might not be chosen. You can use a FORCE INDEX command to forcefully chose an index.

SELECT [columns] 
FROM TableA a
JOIN TableB b FORCE INDEX (idx_tablea_key_createddate)  on a.Column1 = b.Column1
WHERE b.Key = 'key1'
Order By b.CreatedDate DESC LIMIT 5000;

- FK implicit index will take care of the JOIN
- Need an index for (Key and CreatedDate DESC)on tableB, but it was ending up with 
filesort and hence had to use FORCE INDEX command as a last resort.

Maintain Your Indexes: Over time, as data changes, indexes can become fragmented, which can degrade performance. Use the appropriate commands (like OPTIMIZE TABLE in MySQL or REINDEX in PostgreSQL) to defragment your indexes.

Check Fragmentation First: It’s a good idea to check the fragmentation ratio of each table before running OPTIMIZE TABLE. You can do this by comparing the data_length and data_free values in the information_schema.tables table. If the fragmentation ratio is high, then running OPTIMIZE TABLE could be beneficial.

select ENGINE,
  concat(TABLE_SCHEMA, '.', TABLE_NAME) as table_name,
  round(DATA_LENGTH/1024/1024, 2) as data_length,
  round(INDEX_LENGTH/1024/1024, 2) as index_length,
  round(DATA_FREE/1024/1024, 2) as data_free,
  (data_free/(index_length+data_length)) as frag_ratio
FROM information_schema.tables
WHERE DATA_FREE > 0
AND TABLE_SCHEMA = '<schema_name>'
ORDER BY frag_ratio DESC;

# frag_ratio > 5 is a good candidate for optimization

Monitor Index Usage: Use the performance insights and monitoring tools provided by Amazon RDS to monitor how your indexes are being used. If an index is not being used, consider removing it to save space and improve write performance.

# Find index usage
SELECT 
    OBJECT_SCHEMA AS `Database`, 
    OBJECT_NAME AS `Table`, 
    INDEX_NAME AS `Index`, 
    COUNT_FETCH AS `Index Reads`
FROM 
    performance_schema.table_io_waits_summary_by_index_usage
WHERE 
    INDEX_NAME IS NOT NULL 
    AND OBJECT_SCHEMA like '%<schema_name>%'
ORDER BY 
    COUNT_FETCH DESC;

Slow Query Log: Enable the slow query log in MySQL to track queries that take a long time to execute. You can then analyze these queries to see if performance could be improved with better indexing.

It’s a log file that MySQL generates and writes to directly. You can enable it by setting certain system variables, either at startup or dynamically while the server is running. Once the slow query log is enabled, MySQL will start logging any queries that take longer than long_query_time to execute to the specified log file.

SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 2;
SET GLOBAL slow_query_log_file = 'logfile.log';

Please note that the slow query log can grow quite large on a busy server, so it’s a good idea to monitor its size and rotate it periodically.

Basics to Know Before Optimization

What is table statistics? In the context of databases, statistics are not considered an external entity. Instead, they are an integral part of the database system itself. Statistics are metadata that the database management system (DBMS) collects and maintains to help the query optimizer make informed decisions about the most efficient way to execute queries.
How is statistis data collected by DBMS? The DBMS periodically collects and updates statistics through operations like ANALYZE TABLE in MySQL or UPDATE STATISTICS in SQL Server. These operations scan the data and update the metadata used by the query optimizer.
What is Index Selectivity? Index selectivity refers to how well an index can distinguish between different rows in a table. High selectivity means that the index can narrow down the search to a few rows, while low selectivity means that the index will return many rows. The query optimizer uses statistics to estimate the selectivity of indexes.
What does Query Optimizer do? The query optimizer uses statistics to choose the best execution plan for a query. For example, it might decide whether to use an index scan or a full table scan based on the estimated selectivity of the index.
a

Periodic Maintenances

OPTIMIZE TABLE: This command performs several actions as follows. Use this when you need to reclaim space and defragment the table, especially after large deletions or updates.

- Reclaims space by defragmenting the data file.
- Rebuilds the table and its indexes.
- Updates the table statistics.

OPTIMIZE TABLE your_table_name;sw

ANALYZE TABLE : Use this when you only need to update the table statistics without the overhead of defragmentation and rebuilding indexes.

ANALYZE TABLE your_table_name;

Rebuild Indexes : Using ALTER TABLE your_table_name ENGINE=InnoDB; to rebuild indexes is a specific maintenance task that can be beneficial under certain conditions. If you notice a significant degradation in query performance, rebuilding indexes might help. This can be particularly useful if the table has undergone substantial changes. If you only need to rebuild indexes and update statistics without defragmenting the table, using ALTER TABLE ... ENGINE=InnoDB can be a more efficient option.

ALTER TABLE your_table_name ENGINE=InnoDB;

CHECK TABLE: The CHECK TABLE command in MySQL is used to check a table or tables for errors. It can be used to verify the integrity of the table structure and the data within it. This command is particularly useful for diagnosing and troubleshooting issues with tables, such as corruption or inconsistencies.

 CHECK TABLE your_table_name;

Sample o/p:
+-----------------+-------+----------+----------+
| Table           | Op    | Msg_type | Msg_text |
+-----------------+-------+----------+----------+
| your_db.your_table_name | check | status   | OK       |
+-----------------+-------+----------+----------+

QnA:

I have a query that already has required indexes defined, but when I execute the query, it takes too long and times out. How to analyze this situation?

First run the query with EXPLAIN and see the rows index will scan
If it is scanning huge no of rows, then run the query with first column in the where clause and see the result with a LIMIT.
If the first column response looks good, keep adding columns and see which column causes timeout.
Some cases, recreating the index by bringing a range column as the first column can help

Select col1, col2, col3
FROM some_table
WHERE status = 'active' and date > '2023-11-01' AND date < '2023-11-02'

# If the tablke has millions of records with status = active, 
# then the query might take longer time
# create a new index with - date, status

Select col1, col2, col3
FROM some_table
WHERE (date > '2023-11-01' AND date < '2023-11-02') AND status = 'active'

Best Practices on RDS Indexes

Basics to Know Before Optimization

Periodic Maintenances

Written by Sudheer Kumar

No responses yet