Which comes first, the chicken or the egg? And what happens first – a statistics update or a recompilation?
The Statistics life cycle
As part of the regular life of our database, the data in our tables changes, and so the statistics on those tables become less accurate. After a certain amount of change (which I will discuss later in this post), SQL Server marks the statistics as non-valid.
This only marks the statistics as non-valid; it doesn’t actually update them. Only when a query is compiled does SQL Server update the relevant statistics (if auto update statistics is on, of course). After the statistics are updated, all queries that rely on them are marked for recompilation.
So the actual process is (a small query for watching it in action follows the list):
1. Many changes (updates/inserts/deletes) –> causes non-valid statistics
2. Query compilation (happens independently) –> causes a statistics update –> causes all related queries to be recompiled
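A quick way to see where a table sits in this cycle is to check how many changes have piled up since the last statistics update. Here is a minimal sketch, using dbo.Orders as a placeholder table name; sys.dm_db_stats_properties needs SQL Server 2008 R2 SP2 or 2012 SP1 and later, and on older builds you can look at rowmodctr in sys.sysindexes instead:

-- How many rows changed since each statistic on dbo.Orders was last updated?
-- (dbo.Orders is just a placeholder name)
SELECT  s.name                  AS stats_name,
        sp.last_updated,        -- when the statistic was last updated
        sp.rows,                -- rows in the table at that time
        sp.modification_counter -- changes accumulated since then
FROM    sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE   s.object_id = OBJECT_ID('dbo.Orders');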
When are Statistics marked as non-valid?
After how many changes to the table are Statistics marked as non-valid?
Run a small test and you’ll find that SQL Server marks statistics as non-valid once more than 20% of the rows have changed (it counts only changes to the columns the statistics are built on).
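Here is a minimal sketch of such a test; the table, column, and row counts are mine, and I’m assuming AUTO_CREATE_STATISTICS and AUTO_UPDATE_STATISTICS are both ON for the database. Strictly speaking, for tables with more than 500 rows the trigger point is 500 rows plus 20% of the table, which is effectively 20% once the table is big:

-- Throw-away table for watching the threshold
CREATE TABLE dbo.StatsDemo (Id INT IDENTITY(1,1) PRIMARY KEY, Val INT);

-- Load 100,000 rows
INSERT INTO dbo.StatsDemo (Val)
SELECT TOP (100000) ABS(CHECKSUM(NEWID())) % 1000
FROM sys.all_objects AS a CROSS JOIN sys.all_objects AS b;

-- Filtering on Val makes SQL Server auto-create a statistic on that column
SELECT COUNT(*) FROM dbo.StatsDemo WHERE Val = 500;

-- Change about 10% of the rows: still below the threshold, the statistic stays as it is
UPDATE TOP (10000) dbo.StatsDemo SET Val = Val + 1;
SELECT COUNT(*) FROM dbo.StatsDemo WHERE Val = 500;   -- last_updated does not move

-- Change another 15% (now past 500 + 20% in total), then compile a query that uses
-- the column: the statistic is refreshed as part of that compilation
UPDATE TOP (15000) dbo.StatsDemo SET Val = Val + 1;
SELECT COUNT(*) FROM dbo.StatsDemo WHERE Val = 500;   -- last_updated moves now

-- Between the steps, check last_updated and modification_counter with the
-- sys.dm_db_stats_properties query shown above (pointed at dbo.StatsDemo)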
Trace Flag 2371
A 20% change threshold sounds crazy high if the table has a lot of rows!
Consider, for instance, a table containing data from the past 7 years. You would have to wait a whole 1.4 years for enough new data to arrive before the statistics get updated (assuming only inserts, and the same number of inserts each year). So if you don’t update your statistics manually and have very large tables, you should take a close look at Trace Flag 2371. It replaces the 20% threshold with a smaller one that shrinks as the number of rows in the table grows. The following chart shows the new thresholds:
The trace flag is currently available in SQL Server 2008 R2 SP1 and in SQL Server 2012.
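To try it, the flag can be enabled for the whole instance with DBCC TRACEON (or added as a -T2371 startup parameter so it survives restarts). As a rough illustration of how much lower the new threshold gets on big tables, it is commonly described as roughly SQRT(1000 * rows) once the table passes about 25,000 rows; that formula is an approximation I’m assuming here, not a figure taken from the chart:

-- Turn trace flag 2371 on for the whole instance
DBCC TRACEON (2371, -1);

-- Rough comparison of the old 20% threshold with the commonly cited
-- SQRT(1000 * rows) approximation of the new dynamic threshold
SELECT  n.rows_in_table,
        CAST(0.20 * n.rows_in_table AS BIGINT)         AS old_threshold_rows,
        CAST(SQRT(1000.0 * n.rows_in_table) AS BIGINT) AS approx_new_threshold_rows
FROM    (VALUES (100000), (1000000), (10000000), (100000000)) AS n(rows_in_table);

For a 100-million-row table that is the difference between waiting for roughly 20 million changes and roughly 316,000 changes before the statistics refresh.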
I recommend reading two very interesting posts on update statistics:
2. Changes to automatic update statistics in SQL Server – traceflag 2371 (Juergen Thomas)