Monday, December 8, 2014

Covering Indices

I had a question posed to me where a person was working on a procedure which was running great until he added one new column to the table and query. To start at the end and work my way forwards, the issue was that the new column cause the query to no longer be "covered". What the heck does that mean? In short, it means that if you look up something in a given index, every piece of information SQL Server needs to return that data can be found in the index. To understand how that's accomplished you'll need a little primer on index structure. This is not an in-depth review of indices, but the concepts will be good enough. Take the following table, with some test data put in:
if object_id('tempdb.dbo.#Users') is not null drop table #Users
create table #Users
(
    UserID int identity(1,1) primary key clustered,
    FirstName varchar(100),
    LastName varchar(100),
    Birthday date
)

insert into #Users (FirstName, LastName, Birthday)
values ('Gabe', 'Tower', '1986-08-29'),
    ('Abe', 'Lincoln', '1809-02-12'),
    ('Michael', 'Phelps', '1985-06-30'),
    ('Ving', 'Rhames', '1959-05-12')

insert into #Users
(
    FirstName,
    LastName,
    Birthday
)
select top 100000
    'FirstName_' + cast(newid() as char(36)),
    'LastName_' + cast(newid() as char(36)),
    Birthday = a.create_date
from sys.objects a, sys.objects b, sys.objects c
The clustered index IS the table, sorted by the primary key (in this case, UserID). You can envision this structure pretty intuitively because it look exactly like your data:
Any search on the clustered index is, by definition, a covered query, because the clustered index consists of everything in the table. Take the folowing simple query:
select UserID, FirstName, LastName, Birthday
from #Users
where UserID = 2
Internally, SQL reads through the clustered index till it finds UserID: 2 and it's done.
Now what happens if you want to search by last name (a more realistic example)? Well lets build an index to support that:
create nonclustered index #idx_#Users_LastName_NonCovering on #Users (LastName)
Internally, that index consists of your key column(s) (in this case, just LastName), and a pointer to the row on which it lives. Something like this:

Now let's search this table for the last name, Lincoln, using, for now, only the columns the index covers.
select UserID, LastName
from #Users
where LastName = 'Lincoln'
You get a nice speedy lookup, just as you'd expect. When it finds the LastName "Lincoln" the index literally contains everything needed to satisfy this query, i.e. the LastName (as part of the index key) and the UserID (as part of the Clustered index pointer).
But what happens if I want to return my FirstName when I lookup my last name? The nonclustered index doesn't have this information directly available. It has to lookup this info in the clustered index by means of the clustered index pointer. This is a costly operation and can bring a query to its knees very quickly if you don't know what you're looking for. But lets say I know that most of the time I lookup a last name, I also care about the first name. Wouldn't it be nice not to have to make that additional jump to look up that value in the clustered index?
select UserID, LastName, FirstName
from #Users
where LastName = 'Lincoln'

Enter Include columns. By using the INCLUDE keyword in an index declaration, you can sort of pin additional values on to the index for just this purpose Let's take our original Nonclustered index we added and include FirstName in it. I'll remind you at this point that an index can cover a query by virtue of being part of the KEY of the index and not an included column, but included columns are a much more lightweight way to get at this data if you're not needing to search on those values.
create nonclustered index #idx_#Users_LastName_Covering on #Users (LastName) 
include (FirstName)
This index now looks like the first, but with included values in-line, so it doesn't ever need to go anywhere else, and once again, the search on LastName including FirstName is "covered".



When I first started developing, I found it hard to think about indexes as physical objects, but once I got a handle for what a nonclustered index looked like and how it related to the clustered index, this sort of thing started to make perfect sense to me. I hope this gives you an inkling as to what's going on here when a query isn't covered, and perhaps, more importantly, what the heck your developer friends mean when they use the word.

No comments:

Post a Comment