Today, I investigated an interesting issue.
One of my clients called me and complained about high CPU on his server. Server with 0 load at this time frame.
Using my favorite "Activity" script and the sys.dm_os_ring_buffers DMV, I clearly saw that the CPU is being used by a non-SQL server service. Next, I've asked to connect SQL Server machine and opened a task manager.
Surprise, surprise! The CPU was being used by the SQL Server process! Well, kind of...
Database Mail is a feature of SQL Server, but it operates as a separate service.
This feature allows the database engine to send emails using SMTP (Simple Mail Transfer Protocol).
Database Mail - How It Works: (High-level explanation)
SQL Server utilizing stored procedures to send email;
Each email placed as a request in the msdb database's mail queue;
The SQL Server Service Broker manages the queue;
The external Database Mail service processes the request and sends the email via the specified SMTP server;
The status of Database Mail and emails is logged in the msdb database for tracking purposes.
Root Cause:
When SQL Server Database Mail sends emails, it uses a queue processor to handle those messages. In each iteration of this queue processing loop, a System.Timers.Timer object is created.
This is a .NET class used to raise an event at specified intervals. In the context of Database Mail, it is used to manage the timing of email processing tasks.
Due to a bug in early versions of SQL Server 2016, those timer objects are not disposed of after use. As a result, orphaned timer objects accumulated over time.
As more emails are sent, the number of orphaned timer objects increases. These orphaned objects consume CPU resources, leading to unexpectedly high CPU usage. As more emails are sent, the CPU usage increases significantly.
These are objects that are no longer needed, but have not disposed of and leading to resource leaks. CPU in this case.
Interesting! I heard about it but had never encountered a similar issue before.
Investigation:
First let's check SQL Server version:
SELECT @@VERSION;
The answer is: 13.0.7050.2
According to sqlserverbuilds.blogspot.com this build was released in 2016-11-16 and something like 75 additional builds are released after that. ... Cumulative Update 2 for SQL Server 2016 SP1 and Cumulative Update 4 for SQL Server 2016 would resolve the issue.
Now, let's check if the Database Mail is activated on the server and there is no mistake here. Simplest way to do it is to execute system procedure:
EXEC msdb.dbo.sysmail_help_status_sp;
The answer was: STARTED
I wondered if the database was sending a large number of emails. The following script gave me an answer:
SELECT
YEAR(send_request_date),
MONTH(send_request_date),
DAY(send_request_date),
COUNT(*)
FROM
msdb.dbo.sysmail_allitems WITH (NOLOCK)
WHERE
send_request_date > DATEADD(YEAR, -1, GETDATE())
GROUP BY
YEAR(send_request_date),
MONTH(send_request_date),
DAY(send_request_date)
ORDER BY
YEAR(send_request_date) DESC,
MONTH(send_request_date) DESC,
DAY(send_request_date) DESC;
The last month, Database Mail sent almost 27K emails, compared to 4-5K months before. Moreover, most of them sent in the last days of the month!
Resolution:
After some discussion with the client, I realized that there will be no update in the near future. We need search for another way, some work around which can temporarily resolve the problem.
Well, let's try to stop the Database Mail queue.
EXEC msdb.dbo.sysmail_stop_sp;
Execution of this stored procedure stops the Database Mail queue, that holds outgoing message requests. Also, turns off Service Broker activation for the external program.
When the queues are stopped, the Database Mail external program doesn't process messages.
This allows us to stop Database Mail for troubleshooting or maintenance purposes.
I checked the Database Mail error logs, SQL Server error log and system error logs, but did not find any abnormal situations.
Let's try to start the Database Mail queue again.
EXEC msdb.dbo.sysmail_start_sp;
Execution of this stored procedure starts the Database Mail queue.
Additionally, enables the Service Broker activation for the external program.
A simple stop and restart of the Database Mail queue helped. The CPU is dropped to 0 and not back in the next last hours of my additional monitoring.
Steps to fix the Issue:
Apply the Latest Updates: Ensure that your SQL Server instance is updated with the latest cumulative updates that include the fix for this and another issues.
Monitor CPU Usage: Use tools like Task Manager and Performance Monitor to keep an eye on CPU usage.
Restart Database Mail: If for some reason, your SQL Server engine is not up to date and you notice high CPU usage, restarting the Database Mail service can temporarily alleviate the issue.
Hope this helps!
Additional reading and sources:
Ring buffers for health information on availability groups - SQL Server Always On | Microsoft Learn
sysmail_help_status_sp (Transact-SQL) - SQL Server | Microsoft Learn
sysmail_event_log (Transact-SQL) - SQL Server | Microsoft Learn
sysmail_stop_sp (Transact-SQL) - SQL Server | Microsoft Learn
sysmail_start_sp (Transact-SQL) - SQL Server | Microsoft Learn
Among the chaos, this game bitlife online serves as a haven of peace. You can relax and get away from the everyday grind in this tranquil haven.