The clustered index can be beneficial for the queries that read large result sets of ordered sequential data. In this case, the SQL Server Engine will locate the row with the first requested value using the clustered index, and continue sequentially to retrieve the rest of rows that are physically adjacent within the index pages with the correct order, without consuming the SQL Server Engine time and resources in sorting the data that is already sorted in the clustered index, affecting the overall query performance positively.
- Short: Although SQL Server allows us to add up to 16 columns to the clustered index key, with maximum key size of 900 bytes, the typical clustered index key is much smaller than what is allowed, with as few columns as possible. The wide Clustered index key will also affect all non-clustered indexes built over that clustered index, as the clustered index key will be used as a lookup key for all the non-clustered indexes pointing to it.
- Static: It is recommended to choose the columns that are not changed frequently in the clustered index key. Changing the clustered index key values means that the whole row will be moved to the new proper page to keep the data values in the correct order.
- Increasing: Using an increasing column, such as the IDENTITY column, as a clustered index key will help in improving the INSERT process, that will directly insert the new values at the logical end of the table. This highly recommended choice will help in reducing the amount of memory required for the page buffers, minimize the need to split the page into two pages to fit the newly inserted values and the fragmentation occurrence, that required rebuilding or reorganizing the index again.
- Unique: It is recommended to declare the clustered index key column or combination of columns as unique to improve the queries performance. Otherwise, SQL Server will automatically add a uniqueifier column to enforce the clustered index key uniqueness.
- Accessed frequently: This is due to the fact that the rows will be stored in the clustered index in a sorted order based on that index key that is used to access the data.
- Used in the ORDER BY clause: In this case, no need for the SQL Server Engine to sort the data in order to display it, as the rows are already sorted based on the index key used in the ORDER BY clause.
Clustered index key appropriate data types
When designing a clustered index, you should consider that some data types are generally better than other data types to be used as clustering keys. For instance, the columns with SMALLINT, INT and BIGINT data types are the best choices as clustered index keys, especially when used in conjunction with the IDENTITY constraint, that enforces their values to increase sequentially. In addition, the IDENTITY integer values are narrow, due to its small size, unique, if you enforce the column uniqueness with a constraint, and static, as they are generated automatically by the system and not visible to the users.
Although the GUIDs values, that are stored in the uniqueidentifier columns, are commonly used as clustered index key, there are some challenges that accompany that design. The main challenge that affects the clustered index key sorting performance is the nature of the GUID value that is larger than the integer data types, with 16 bytes size, and that it is generated in random manner, different from the IDENTITY integer values that are increasing continuously. The large size and randomness generation of the GUID values will always lead to the page splitting and index fragmentation problems, which negatively affect the clustered index usage performance.
The Character data types can be also used, but not recommended, as clustered index keys. This is due to the limited sorting performance of the character data types, the large size, non-increasing values, non-static values that often tend to change in the business applications and not compared as binary values during the sorting process, as the characters comparison mechanism depends on the used collation. Even though the Date data types are not unique, it has a small size and provides good sorting performance, especially for the queries that search for data ranges.
Conclusion
A clustered index that is designed properly can reduce the I/O and CPU resources consumption amount, therefore improving the queries and overall system performance.
Take into consideration that, when you create a clustered index on a table, all non-clustered indexes created on that heap table will be rebuilt to replace the row identifier (RID) with the clustered index key. So that, it is better always to start with creating the clustered index then proceed with creating the non-clustered index over it.
Comments