Everyone’s talking about “Big Data”, huge tables and data explosion, right?
But can a table cause problems by being too small?
Imagine you have a very small table (up to a few thousand rows) that fits into a single SQL Server page, is constantly read and updated, and demands very high throughput. You can run into a few interesting problems:
Blocking
If a few sessions are constantly updating and reading from the table, blocking can be an issue. When a session reads or updates the table and the lock is taken at the page level, the whole table is blocked for other sessions, because the entire table lives on that one page. This seriously hurts concurrency.
Solution: Disable page locks by rebuilding the indexes WITH (ALLOW_PAGE_LOCKS = OFF). This tells SQL Server to work only with row locks. However, when a session acquires about 5,000 locks on a table, SQL Server tries to escalate them to a table lock, and in order to prevent that, you should also use the ALTER TABLE command to disable lock escalation for that table.
When you do that, you need to watch out for the amount of memory consumed by locking.
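A minimal sketch of both steps, using placeholder names (dbo.HotTable is not from the original post):

-- Work with row locks only: disable page-level locks on the table's indexes
ALTER INDEX ALL ON dbo.HotTable
REBUILD WITH (ALLOW_PAGE_LOCKS = OFF);

-- Prevent escalation from ~5,000 row locks to a table lock
ALTER TABLE dbo.HotTable SET (LOCK_ESCALATION = DISABLE);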
Latching
Locks are used in order to protect the data from logical errors. Latches, however, are used to protect the physical in-memory structures.
Whenever a session reads or updates a page, it has to put a latch on it in order to prevent other sessions from changing the underlying page at the same time. This problem can be seen, for example, when we use an ever-increasing key and all sessions need to get a latch on the last page of the index.
When we have a single-page table and many concurrent sessions working against it, we can get into a latch contention state, where sessions queue up waiting for a latch on that single page.
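One quick way to spot the symptom (a common diagnostic, not part of the original post) is to look for sessions currently waiting on page latches for the same page:

-- Sessions waiting on a page latch, and the page they are waiting for
SELECT session_id, wait_type, wait_resource
FROM sys.dm_exec_requests
WHERE wait_type LIKE 'PAGELATCH%';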
Solutions:
Make the table bigger: If we spread the rows of the table over a few pages instead of just one, we will reduce the number of concurrent sessions that need a latch on any single page, thus spreading the load and improving concurrency. We can do that by rebuilding the index with a lower fill factor, which leaves each page mostly empty. Sure, the table will be bigger, but it will still be so small that the extra space won’t make a difference (see the sketch after this list).
Use In-Memory OLTP: The In-Memory OLTP engine uses lock-free and latch-free data structures, among other benefits, so latches are eliminated altogether. However, In-Memory OLTP requires application changes and has some limitations in SQL Server 2014, so watch out. Many of those limitations will be removed in SQL Server 2016.
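A rough sketch of both options, with placeholder names (the index, table, columns and bucket count below are illustrative, not taken from the post):

-- Option 1: spread the rows over more pages with a low fill factor
ALTER INDEX PK_HotTable ON dbo.HotTable
REBUILD WITH (FILLFACTOR = 10);

-- Option 2: a memory-optimized version of the table (SQL Server 2014+,
-- requires a MEMORY_OPTIMIZED_DATA filegroup in the database)
CREATE TABLE dbo.HotTable_InMem
(
    Id INT NOT NULL PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 4096),
    Payload BIGINT NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);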
Statistics and Recompilation
The whitepaper Statistics Used by the Query Optimizer in Microsoft SQL Server 2008 explains when a statistics object is considered out of date:
The table size has gone from 0 to >0 rows (test 1).
The number of rows in the table when the statistics were gathered was 500 or less, and the colmodctr of the leading column of the statistics object has changed by more than 500 since then (test 2).
The table had more than 500 rows when the statistics were gathered, and the colmodctr of the leading column of the statistics object has changed by more than 500 + 20% of the number of rows in the table when the statistics were gathered (test 3).
In such small but frequently-updated tables, we can get into a situation where the statistics are constantly considered out of date. For example, for a 2,000-row table, about 900 modifications (500 + 20% of 2,000) are enough to invalidate the statistics, and a busy workload can generate that in seconds. In such a case, and if the Auto Update Statistics option is on, many or all of our queries against the table will trigger the Update Statistics operation and will suffer from degraded performance. Furthermore, stored procedures that reference the table will be frequently recompiled, because when statistics on a table change, the referencing stored procedures are recompiled on their next execution. On small tables, updating statistics and recompiling are not worth the overhead of executing them.
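One way to watch how quickly modifications pile up against the table’s statistics (not from the original post; sys.dm_db_stats_properties is available on recent builds of SQL Server 2008 R2 and later, and dbo.HotTable is a placeholder name):

-- Rows, modification counter and last update time for each statistics object
SELECT s.name AS stats_name, sp.rows, sp.modification_counter, sp.last_updated
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE s.object_id = OBJECT_ID('dbo.HotTable');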
Solutions:
Disable Auto Update Statistics
Use the KEEPFIXED PLAN query hint. This option is less recommended, but possible. It makes the optimizer ignore any statistics change and keep using the execution plan that is already in cache. Both options are sketched below.
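For illustration only (placeholder table and column names):

-- Option 1: keep the statistics on the table from being auto-updated
UPDATE STATISTICS dbo.HotTable WITH NORECOMPUTE;

-- Option 2: keep the cached plan even when the statistics change
SELECT Id, Payload
FROM dbo.HotTable
WHERE Id = 42
OPTION (KEEPFIXED PLAN);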
WRITELOG Waits
As you might know, SQL Server has to persist every transaction to disk before it is considered committed. With such small tables, the only synchronous disk-based piece of the process is writing to the transaction log, and that can make things much slower, since disk is slower than memory by a few orders of magnitude.
Solutions:
Optimize transaction management by using explicit transactions, as I showed here.
Use SQL Server 2014 Delayed Durability. Instead of writing the transaction to disk immediately, Delayed Durability puts it in an in-memory buffer that is flushed to disk only when it reaches a size of 60KB. This can dramatically improve performance, but carries the risk of a small data loss if the server crashes. Both solutions are sketched below.
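A sketch combining both ideas, with made-up database, table and column names:

-- Allow delayed durable commits at the database level (SQL Server 2014+)
ALTER DATABASE MyAppDb SET DELAYED_DURABILITY = ALLOWED;

-- Batch several modifications in one explicit transaction and
-- opt in to delayed durability when committing
BEGIN TRANSACTION;
    UPDATE dbo.HotTable SET Payload = Payload + 1 WHERE Id = 1;
    UPDATE dbo.HotTable SET Payload = Payload + 1 WHERE Id = 2;
COMMIT TRANSACTION WITH (DELAYED_DURABILITY = ON);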
In conclusion:
Small tables with high throughput can have challenges of their own. By understanding the internals and applying the right changes, you can overcome them.
However, you should also think about whether SQL Server is the right place to store such a table. Can it just work in the application memory? Can it fit better inside a NoSQL database? Can you use some kind of streaming technology for processing the table? There are more options today, and it’s worth knowing them too.