There seems to be a lot of confusion around the differences between temp tables and table variables (in the classic sense; I'm not talking about in memory table types or anything here). Some people say you should only use table variables. Some people say you should only use temp tables. Most people caution a more nuanced approach. Here I want to cover what I think are the biggest issues in an attempt to shed some light on this surprisingly tricky topic.
Statistics
This is usually the most important factor when considering which one to use. It boils down to two facts, and everything else flows from these:
Temp Tables maintain statistics, Table Variables do not.
Table Variables
Because table variables maintain no statistics, the Query Optimizer always assumes the contain
exactly one row. As a result, any joins against a table variable, unless explicitly told to behave otherwise will (probably) always
be a nested loop join.
As long as the number of rows in the table variable are small, that's not really an issue as nested loop joins are ideal for such cases. But unless you literally have one row of data, you
know the optimizer is getting an incorrect query plan. So most of the problems with table variables arise because of this.
Alright, so you've now sworn off table variables, and are ready to join #TeamTempTable right? Hold that thought because it's time to discuss...
Temp Tables
Without going into a full description of a temp tables, you can basically think of them almost like any other table. They can be indexed, constrained, altered, and they maintain statistics. This is obviously quite helpful if you want to overcome that one-row assumption that you're saddled with when using a table variable.
But this strength is also one of the most compelling reasons to avoid them
in certain cases. Variable workloads with high volume incur some extra overhead to maintain those, but more importantly, they can stick you with a bad query plan. The first time you run your procedure, you might have ten rows of data and the second time you might have ten million. The optimal plans for those two queries will probably not be the same.
Using a table variable over a temp table is a lightweight way to avoid having to litter your code with OPTION (RECOMPILE) or other hints to try to bypass the statistics on your temp table.
Lesser Considerations
In-Memory Myth
At some point the myth that Table Variables live exclusively in memory seems to have been propagated. This is often touted as a reason to favor Table Variables over their disk-based Temp Table counterparts.
This is simply not true. Both temp tables and table variables live on disk in tempdb and store in-use pages in memory. This myth is sometimes also used to justify keeping the number of rows in a table variable low when compared to a disk based table, but again, it's just a faulty assumption.
Participation in Transactions
All other factors being equal, table variables are more lightweight
primarily because they don't have the same statistics overhead, and they
don't participate in explicit transactions. Neither of these by
themselves make much difference, but if it's being frequently called,
those milliseconds can add up.
Which Do I Use?
This is the part where you get the "it depends" answer. But assuming that's not why you're reading this, here are some general rules I used when deciding which to use (pretty much in order)
- If there are no joins or correlated subqueries (i.e. it's just an intermediate place to store data, or all the operations take place on the table itself) use a table variable.
- Basically anywhere where the cardinality estimate isn't going to come into play, or it would have to do a table scan anyway
- A good example would be storing the OUTPUT of a DML statement which is then read out or insert into a log table later.
- If there are joins or correlated subqueries
- Reach for a table variable if the rows are under about 100 (no, this isn't a hard rule)
- Otherwise, a temp table with statistics you can leverage to get a better query plan or get different join types may work better
- If you have a highly variable workload, consider a table variable as it's lack of statistics may actually be a benefit to you so you don't get a bad query plan.
- If your proc is called very infrequently and it's more beneficial to be able to add indexes to the temp table for a large workload, a temp table is probably preferable.
- Literally everything else being equal, a table variable has less overhead, so it would be my default go to. But the caveat that everything else must be equal is a pretty high bar to hit.
Comments
Post a Comment