In show #84 of the SQL Server Radio Hebrew Edition, Matan and I talked about how to calculate running totals in SQL Server. I promised to upload a script with a complete demo of all the possible methods. So here it is.
The script first creates the “RunningTotals” database. Notice that you first need to create the “C:\RunningTotals” folder, because this is where SQL Server will try to store the database files. Alternatively, you can modify the CREATE DATABASE statement in the script to store the files in a different folder. Then the script creates the “Billing.Transactions” table, and populates it with test data. The table includes the “Balance” column, which is set to NULL before each method is being used. The goal is to populate the “Balance” column with the balance in each account after each transaction.
Here are the methods that I demonstrate in the script in descending order of execution time:
1. Cursor with an UPDATE statement in each iteration
2. Cursor with a temporary table and an INSERT statement in each iteration
3. Correlated sub-query
4. Window Function – SUM () OVER…
5. In-place UPDATE with ORDER BY
6. In-place UPDATE with an index hint
The nice thing about this script is that the execution time of each method is better than the previous one. We start with several minutes for the first method, and finish with just 7 seconds (on my laptop) to calculate the running totals for the entire table (one million transactions).
It’s very important to note that you can’t trust the two last methods (in-place update). In order to verify that these methods are reliable, we need to verify that we execute the update in an ordered fashion.
Method #5 uses a sub-query with a TOP 100 PERCENT and an ORDER BY in order to achieve that. This method is not supported by SQL Server (it compiles and runs successfully, but there is no guarantee regarding the order of rows). In recent versions of SQL Server, the optimizer completely ignores it.
Method #6 handles the sort order by specifying an index hint with the clustered index name. Again, it compiles and runs, but it doesn’t guarantee the sort order of the query.
The reason the two methods almost always produce the correct results is because the clustered index is already sorted in the desired way, and the storage engine simply scans the clustered index form left to right. It always works in the demo, because we have just created the index. In a real-life scenario, where there is fragmentation in the index, and pages are moved around, these methods might produce incorrect results.
Also, these two methods are not ANSI-SQL compliant. ANSI-SQL does not support the syntax of updating a column and a local variable in the same statement.
So the best solution, which is also reliable and supported, is method #4 – using the SUM aggregate function as a window function.
Comentarios