A Non-clustered index is built using the same 8K-page B-tree structure that is used to build a Clustered index, except that the data and the Non-clustered index are stored separately. A Non-clustered index is different from a Clustered index in that, the underlying table rows will not be stored and sorted based on the Non-clustered key, and the leaf level nodes of the Non-clustered index are made of index pages instead of data pages. The index pages of the Non-clustered index contain Non-clustered index key values with pointers to the storage location of these rows in the underlying heap table or the Clustered index.
Non-clustered index design considerations
The main goal of creating a Non-clustered index is to improve query performance by speeding up the data retrieval process. Although SQL Server allows us to create multiple Non-clustered indexes, up to 999 Non-clustered on each table that can cover our queries, any index added to the table will negatively impact data modification performance on that table. This is due to the fact that, when you modify a key column in the underlying table, the Non-clustered indexes should be adjusted appropriately as well.
When the index contains all columns required by the query, the SQL Server Query Optimizer will retrieve all column values from the index itself, without the need to perform lookup operations to retrieve the rest of columns in the underlying table or the Clustered index, reducing the costly disk I/O operations. In addition, if the Non-clustered index is built over a Clustered table, the columns that participate in the Clustered index will be appended automatically to the end of each Non-clustered index on that Clustered table, without the need to include these columns to the Non-clustered index key or non-key columns to cover the queries.
Rather than creating a Non-clustered index with a wide key, large columns that are used to cover the query can be included to the Non-clustered index as non-key columns, up to 1023 non-key columns, using the INCLUDE clause of the CREATE INDEX T-SQL statement, that is introduced in SQL Server 2005 version, with a minimum of one key column. The INCLUDE feature extends the functionality of the Non-clustered index, by allowing us to cover more queries by adding the columns as non-key columns to be stored and sorted only in the leaf level of the index, without considering that columns values in the root and intermediate levels of the Non-clustered index. In this case, the SQL Server Query Optimizer will locate all required columns from that index, without the need for any extra lookups.
With all these capabilities provided by SQL Server, it is highly recommended to avoid adding too many key or non-key columns to the Non-clustered index that are not required by the queries. This is due to the large disk space that is required to store that index and the large number of pages required to store the index data, because adding too many columns to the index will result with fewer number of rows that can be fit in each data page, increasing the I/O overhead and reducing the caching efficiency. You can also imagine the data modification overhead resulted from such large indexes.
The candidate columns for the Non-clustered index key are the ones that are frequently involved in the GROUP BY clause or in the JOIN or WHERE conditions, that will cover the submitted queries and return exact match values, rather than returning a large set of data. The semi-unique columns that have a largenumberofdistinct values are good candidates also as Non-clustered index key columns. For the column that has few numbers of distinct values, such as the Gender column, you can take benefits from creating a filtered index.
Comments