In my work, I see a lot of databases and T-SQL code, and with it many coding mistakes. One of them is counting without actual necessity.
For example, counting the rows inserted or updated in a table, basing an IF-ELSE statement on a count, or counting all the rows in a table.
Let's talk about it!
The number of affected rows
Many times, I have seen the same job done twice, or even more often.
For example, some processes run the same predicate twice: first, a Data Manipulation Language (DML) statement, like an insert, update, or delete, is executed; then, with the same predicate, a second query counts the number of rows affected by the first statement.
Another example is executing the same SELECT logic several times: once to count the rows and a second time to return the actual result set.
Well, both examples are totally unnecessary work, and we have a built-in function for this! @@ROWCOUNT and ROWCOUNT_BIG() return the number of rows affected by the last statement.
Just place it right after your statement, like this:
SELECT ...
FROM myTable;
SELECT @@ROWCOUNT;
or
UPDATE T
SET T.Column1 = N'X',
    T.Column5 = N'Y'
FROM
    [dbo].[CommandLog] AS T
WHERE
    T.Column9 = GETDATE();
SELECT @@ROWCOUNT;
and that will return the number of rows that were affected or read.
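One thing to keep in mind: @@ROWCOUNT and ROWCOUNT_BIG() reflect only the most recent statement, so if you need the value later in the batch, capture it into a variable right away. Here is a minimal sketch, with hypothetical table and column names:
DECLARE @affected BIGINT;

UPDATE dbo.MyTable
SET MyColumn = N'X'
WHERE OtherColumn = N'Y';

-- Capture the value immediately: almost any following statement resets it
SET @affected = ROWCOUNT_BIG(); -- @@ROWCOUNT works the same way when INT is enough

SELECT @affected AS AffectedRows;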
IF ELSE statement based on the count
In many cases, it's some variation of this:
IF (SELECT COUNT(column1) FROM MyTable WHERE columnX = @something) > 0
BEGIN
    ...do something
END
ELSE
BEGIN
    ...do another something
END
Let's break down the logic and determine what we are doing here.
The SELECT statement counts how many rows in the table match the specified condition. For that, SQL Server must scan the entire table or index! And the problem with counting is that it's a blocking operation!
It doesn't matter whether we count ALL values or only DISTINCT ones: the count has to find every matching row in the table to determine the total. How fast that happens depends heavily on the database structure and settings.
The next step after the counting is evaluating the condition that drives the IF-ELSE statement. That condition is "greater than" a specific number ... a ZERO! In other words, the condition is met if we discover even one matching row in the table. So, why count all the matches? We can stop at the first match and move on to the next step. Right?
Here are some reasons to stop:
As already discussed, it's a blocking operation.
The lock is taken on an entire object; it can be a table or only an index, but in both cases, it's locked.
For the counting process, the data pages must be loaded from disk into memory.
During that loading, data pages are read, so add an IO operation to the list.
And counting requires CPU work.
So locks, IO, memory, and CPU are all spent completely unnecessarily, especially when we know that a single match is enough.
Let's stop this waste! How? Check for the presence of a single match instead of counting all the rows in the table.
IF EXISTS (SELECT 1 FROM MyTable WHERE columnX = @something)
BEGIN
    ...do something
END
ELSE
BEGIN
    ...do another something
END
This example uses the same logic as the previous one but relies on the EXISTS operator. That operator's role is exactly what we need - to test for the existence of a match. EXISTS stops execution at the first match. In other words, once we find one matching row, we know there are more than zero rows, and at that point we can stop and trigger the "begin" part.
Recently, I saw a customer environment that reduced CPU utilization by 10% just by making this simple change in one central function!
Nice, isn't it?
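If you want to verify the difference on your own data, the standard SET STATISTICS IO / TIME counters make it easy to compare the two patterns. The table, column, and variable names below are just placeholders:
DECLARE @something INT = 42; -- placeholder value

SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Counts every matching row before comparing the total to zero
IF (SELECT COUNT(column1) FROM MyTable WHERE columnX = @something) > 0
    PRINT 'found (COUNT)';

-- Stops scanning at the first matching row
IF EXISTS (SELECT 1 FROM MyTable WHERE columnX = @something)
    PRINT 'found (EXISTS)';

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
Compare the logical reads and elapsed time reported for the two statements.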
Are we done yet? No, let's discuss one more scenario where we can stop counting.
Count all rows in the table
Some time ago, I saw a data analyst querying something from one of the biggest log tables in the instance. At that moment, his query had already been running for more than 10 minutes!
Any guesses what exactly he was querying for? Yep, he was counting all the rows in the table.
Unfortunately, it was a non-partitioned heap table without any key or index. Resource usage, remember? So yes, this query hurt everything: the disk, the CPU, the memory, and every database user working in that area of the schema.
If you need to know the number of rows in a table, you do not need to count them! There is someone who already knows the answer, at every single moment, for each table, even for temporary ones! Any guesses who that is?
The system views! Yep, the system catalog and Dynamic Management Views already hold this information!
This query can tell us the number of rows in each specific table:
USE [MyDB]
GO
SELECT
    i.rowcnt
FROM
    sys.sysindexes AS i
    INNER JOIN sys.sysobjects AS o ON i.id = o.id
WHERE
    i.indid < 2
    AND o.[name] = 'MyTable';
GO
or
USE [MyDB]
GO
SELECT
    SUM(d.row_count)
FROM
    sys.indexes AS i
    INNER JOIN sys.objects AS o ON i.OBJECT_ID = o.OBJECT_ID
    INNER JOIN sys.dm_db_partition_stats AS d ON i.OBJECT_ID = d.OBJECT_ID AND i.index_id = d.index_id
WHERE
    i.index_id < 2
    AND o.[name] = 'MyTable';
GO
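And if all you need is the row count for a single table, a slightly shorter variant (using the same MyDB / MyTable placeholders) reads the rows column of sys.partitions directly:
USE [MyDB]
GO
SELECT
    SUM(p.[rows]) AS TotalRows
FROM
    sys.partitions AS p
WHERE
    p.[object_id] = OBJECT_ID(N'dbo.MyTable')
    AND p.index_id IN (0, 1); -- heap (0) or clustered index (1) only, so rows are not counted twice
GO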
Yep, simple :)
If you do one of these, stop it now!
I'm sure that there are other examples of resource waste due to unnecessary work. Start with these changes, and your performance will thank you :)