Source: https://michaeljswart.com/2014/09/take-care-when-scripting-batches/

The Straight Query

Suppose we want to remove sales data from FactOnlineSales for the “Worcester Company” whose CustomerKey = 19036. That’s a simple delete statement:

DELETE FactOnlineSales WHERE CustomerKey = 19036;

This delete statement runs an unacceptably long time. It scans the clustered index and performs 46,650 logical reads and I’m worried about concurrency issues.

Naive Batching

So I try to delete 1,000 rows at a time. This implementation seems reasonable on the surface:

DECLARE @RC INT = 1;   WHILE (@RC > 0) BEGIN   DELETE TOP (1000) FactOnlineSales WHERE CustomerKey = 19036;   SET @RC = @@ROWCOUNT   END

Unfortunately, this method does poorly. It scans the clustered index in order to find 1,000 rows to delete. The first few batches complete quickly, but later batches gradually get slower as it takes longer and longer to scan the index to find rows to delete. By the time the script gets to the last batch, SQL Server has to delete rows near the very end of the clustered index and to find them, SQL Server has to scan the entire table.

In fact, this last batch performs 46,521 logical reads (just 100 fewer reads than the straight delete). And the entire script performed 1,486,285 logical reads in total. If concurrency is what I’m after, this script is actually worse than the simple DELETE statement.

Careful Batching

But I know something about the indexes on this table. I can make use of this knowledge by keeping track of my progress through the clustered index so that I can continue where I left off:

DECLARE @LargestKeyProcessed INT = -1, @NextBatchMax INT, @RC INT = 1;   WHILE (@RC > 0) BEGIN   SELECT TOP (1000) @NextBatchMax = OnlineSalesKey FROM FactOnlineSales WHERE OnlineSalesKey > @LargestKeyProcessed AND CustomerKey = 19036 ORDER BY OnlineSalesKey ASC;   DELETE FactOnlineSales WHERE CustomerKey = 19036 AND OnlineSalesKey > @LargestKeyProcessed AND OnlineSalesKey <= @NextBatchMax;   SET @RC = @@ROWCOUNT; SET @LargestKeyProcessed = @NextBatchMax;   END

The delete statements in this script performed 46,796 logical reads in total but no individual delete statement performed more than 6,363.

Graphically that looks like:Logical Reads Per Delete Statement

Logical Reads Per Delete

The careful batching method runs in roughly the same time as the straight delete statement but ensures that locks are not held for long.
The naive batching method runs with an order of  complexity (compared to the expected complexity of n) and can hold locks just as long as the straight delete statement.
This underlines the importance of testing for performance.

Last modified: November 25, 2024

Author

Comments

Write a Reply or Comment