Evan Reiser's Blog: When to not not use Cursors Part 2

Thursday, May 24, 2007

When to not not use Cursors Part 2

If you're a new hire or intern of ours at Capital IQ, stop reading: never use cursors. Ever.

Well, my last attempt to prove that cursors can sometimes be useful was derailed by my co-worker Mike Forman.

This time I have a better (read: valid) example.

I have a table, tempdb.dbo.BugCounts (created by the code further down), with one row per employee per day: empid, workDay, and bugsFixed.

It shows a bunch of employees at an unnamed company and the number of bugs they've fixed each day. For each of these employees I want to determine the running total of bugs fixed, each day. I can use the results of this query to create a pretty graph of bugs fixed over time per developer.

Now, using a cursor-based solution we scan each row once, which guarantees O(n) performance: as the number of rows in our table increases, we can still produce the desired results in time proportional to the number of rows. A set-based solution suffers from O(n^2) performance (assuming there is no index on empid, workDay). Even with an index, a scan will result in (developers * (days + days^2)/2) rows scanned, because each day's total has to re-read every earlier day for that developer, which for a fixed number of developers still simplifies to O(n^2).
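To make the arithmetic concrete, here is a rough worked version of that row count. The symbols d, k and n are my own shorthand, not anything SQL Server reports, and the numbers assume every one of the 25 employee ids in the test data below ends up with all 300 possible days, which the random population script will not quite reach.

```latex
% d = number of developers, k = days per developer, n = d k rows (notation assumed here)
\begin{align*}
\text{rows read for one developer} &= \sum_{i=1}^{k} i = \frac{k + k^2}{2} \\
\text{rows read in total} &= d \cdot \frac{k + k^2}{2} = \frac{n + n^2/d}{2} \\
\text{with } d = 25,\ k = 300: \quad & 25 \cdot \frac{300 + 90000}{2} = 1{,}128{,}750
\quad \text{versus } n = 7{,}500 \text{ for a single pass}
\end{align*}
```

So the quadratic term shrinks as the number of developers grows, but for a fixed set of developers it still dominates.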

Now, since cursors involve some overhead, the cursor solution will lose to the set-based solution for a small number of rows. However, because of the performance limitations described above, the cursor solution is the only scalable choice for this problem.
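For reference, here is a sketch of the kind of set-based query I'm comparing against. This post doesn't show one, so the correlated-subquery form below is an assumption about what "set based" means here, not a quote from anywhere. Wrapping it in SET STATISTICS IO and SET STATISTICS TIME is one way to see the logical reads and elapsed time for yourself, and the same wrapper can go around the cursor solution.

```sql
-- Assumption: a correlated-subquery form of the set-based running total.
SET STATISTICS IO ON;    -- report logical reads for each statement
SET STATISTICS TIME ON;  -- report CPU and elapsed time for each statement

SELECT
    b.empid,
    b.workDay,
    b.bugsFixed,
    -- For every row, re-read all earlier rows for the same employee.
    runbugsFixed = (SELECT SUM(a.bugsFixed)
                    FROM tempdb.dbo.BugCounts AS a
                    WHERE a.empid = b.empid
                      AND a.workDay <= b.workDay)
FROM tempdb.dbo.BugCounts AS b
ORDER BY b.empid, b.workDay;

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```

Where the crossover point falls depends on your data and hardware, so measure rather than trust my guess.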

Now watch Mike beat this using the OVER clause...

Code to create this table:

--CREATE TABLE
IF OBJECT_ID('tempdb.dbo.BugCounts') IS NOT NULL
    DROP TABLE tempdb.dbo.BugCounts;
GO

CREATE TABLE tempdb.dbo.BugCounts
(
    empid INT NOT NULL,
    workDay SMALLDATETIME NOT NULL,
    bugsFixed INT NOT NULL,
    PRIMARY KEY(empid, workDay)
);

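--POPULATE TABLE: up to 10K data points
--(assumes an existing numbers_tbl with an integer column n)
DECLARE
    @newn AS INT, @newempid AS INT,
    @newworkDay AS SMALLDATETIME, @newbugsFixed AS INT;
DECLARE C CURSOR FAST_FORWARD FOR
    SELECT TOP 10000 n
    FROM numbers_tbl;
OPEN C;
FETCH NEXT FROM C INTO @newn;
WHILE @@fetch_status = 0
BEGIN
    SELECT @newempid = @newn % 25,
           @newworkDay = DATEADD(day, RAND() * -300, GETDATE()),
           @newbugsFixed = CAST(RAND() * 100 AS INT);
    -- Random dates can repeat for an employee; skip those rows so the
    -- primary key on (empid, workDay) is not violated.
    IF NOT EXISTS (SELECT * FROM tempdb.dbo.BugCounts
                   WHERE empid = @newempid AND workDay = @newworkDay)
        INSERT INTO tempdb.dbo.BugCounts (empid, workDay, bugsFixed)
        VALUES (@newempid, @newworkDay, @newbugsFixed);
    FETCH NEXT FROM C INTO @newn;
END
CLOSE C;
DEALLOCATE C;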

Cursor Solution:

DECLARE @Result
    TABLE(empid INT, workDay SMALLDATETIME, bugsFixed INT, runbugsFixed INT);

DECLARE
    @empid AS INT, @prvempid AS INT, @workDay AS SMALLDATETIME,
    @bugsFixed AS INT, @runbugsFixed AS INT;

-- Read the rows in (empid, workDay) order so each employee's days arrive together
DECLARE C CURSOR FAST_FORWARD FOR
    SELECT empid, workDay, bugsFixed
    FROM tempdb.dbo.BugCounts
    ORDER BY empid, workDay;

OPEN C;

FETCH NEXT FROM C INTO @empid, @workDay, @bugsFixed;
SELECT @prvempid = @empid, @runbugsFixed = 0;

WHILE @@fetch_status = 0
BEGIN
    -- New employee: restart the running total
    IF @empid <> @prvempid
        SELECT @prvempid = @empid, @runbugsFixed = 0;

    SET @runbugsFixed = @runbugsFixed + @bugsFixed;

    INSERT INTO @Result
    VALUES(@empid, @workDay, @bugsFixed, @runbugsFixed);

    FETCH NEXT FROM C
    INTO @empid, @workDay, @bugsFixed;
END

CLOSE C;
DEALLOCATE C;

SELECT *
FROM @Result
ORDER BY empid, workDay;

1 comment:

Evan Reiser said...: OK, well, I guess cursors are useless. Mike beat me again with a set-based solution:

with Ordered as
(
    select
        empid
        , workDay
        , bugsFixed
        , rn = row_number() over (partition by empId order by workDay)
    from tempdb.dbo.BugCounts
)
select
    b.empId
    , b.workDay
    , bugsFixed = max(case when a.rn = b.rn then a.bugsFixed end)
    , runbugsFixed = sum(a.bugsFixed)
from Ordered a
inner join Ordered b
    on a.empid = b.empid
    and a.rn <= b.rn
group by b.empId, b.workDay
order by b.empId, b.workDay;

6:48 PM
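
The self-join above still pairs each row with every earlier row for the same employee. As a sketch only, and assuming SQL Server 2012 or later (newer than anything available when this was posted), the running total can also be written with a windowed SUM, which lets the engine compute it in a single ordered pass per employee. This is not Mike's query, just the same result expressed with an ordered window frame.

```sql
-- Assumes SQL Server 2012 or later, where aggregate window functions
-- accept ORDER BY and a ROWS frame inside OVER.
select
    empid
    , workDay
    , bugsFixed
    , runbugsFixed = sum(bugsFixed) over (
          partition by empid
          order by workDay
          rows between unbounded preceding and current row)
from tempdb.dbo.BugCounts
order by empid, workDay;
```

The explicit ROWS frame matters: without it the default frame is RANGE, which behaves the same here only because (empid, workDay) is unique.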
