Showing posts with label Cursors. Show all posts
Showing posts with label Cursors. Show all posts

Thursday, May 24, 2007

When to not not use Cursors Part 2

If you're a new-hire or intern of ours at Capital IQ, stop reading, never use cursors. Ever.

Well my last attempt to prove that cursors can sometime be useful was derailed by my co-worker Mike Forman

This time i have a better (read: valid) example

I have the following table:


Which shows a bunch of employees at an unamed company, and the number of bugs they've fixed each day. For each of these employees i want to determine the running total of bugs fixed, each day. I can use the results of this query to create a pretty graph of bugs fixed over time per developer.


Now, using a cursor based solution we scan each piece of data once, which garauntees we have O(n) performance, meaning that as the number of entries in our table increase we can still create the desired results in time proportional to the number of rows. A set based solution suffers from O(n^2) performance (assuming there is no index on empid, BugsFixed). Even if there was an index a scan will results in (developers * (days + days^2)/2) rows scaned...which basically simplifies to O(n^2).

Now since cursors involve some overhead, the cursor solution will lose to the set based solution for a small number of items, however due to the performance limitations described above, the cursor solution is the only scalable choice to solve this problem.

Now watch Mike beat this using the OVER Clause.....


Code to Create this table:

--CREATE TABLE
IF OBJECT_ID('tempdb.dbo.BugCounts') IS NOT NULL
DROP TABLE tempdb.dbo.BugCounts;
GO

CREATE TABLE tempdb.dbo.BugCounts
(
empid INT NOT NULL,
workDay smalldatetime NOT NULL,
bugsFixed INT NOT NULL,

PRIMARY KEY(empid, workDay)
);

--POPULATE TABLE create 10K data points
DECLARE
@newn AS INT, @newempid AS INT,
@newworkDay As INT, @newbugsFixed As INT

DECLARE C CURSOR FAST_FORWARD FOR
SELECT top 10000 n
FROM numbers_tbl
OPEN C
FETCH NEXT FROM C INTO @newn;
WHILE @@fetch_status = 0
BEGIN
INSERT
INTO tempdb.dbo.BugCounts
VALUES (@newn%25, dateadd(day, rand()*-300, GetDAte()), CAST(RAND()*100 AS INTEGER))
FETCH NEXT FROM C INTO @newn;
END
CLOSE
C;
DEALLOCATE C;



Cursor Solution:

DECLARE
@Result
TABLE
(empid INT, workDay SMALLDATETIME, bugsFixed INT, runbugsFixed INT);

DECLARE
@empid AS INT,@prvempid AS INT, @workDay SMALLDATETIME,
@bugsFixed AS INT, @runbugsFixed AS INT;


DECLARE C CURSOR FAST_FORWARD FOR
SELECT empid, workDay, bugsFixed
FROM tempdb.dbo.BugCounts
ORDER BY empid, workDay;

OPEN C

FETCH NEXT FROM C INTO @empid, @workDay, @bugsFixed;
SELECT @prvempid = @empid, @runbugsFixed = 0;

WHILE @@fetch_status = 0
BEGIN
IF
@empid <> @prvempid
SELECT @prvempid = @empid, @runbugsFixed = 0;

SET @runbugsFixed = @runbugsFixed + @bugsFixed;

INSERT INTO @Result
VALUES(@empid, @workDay, @bugsFixed, @runbugsFixed);

FETCH NEXT FROM C
INTO @empid, @workDay, @bugsFixed;
END

CLOSE
C;
DEALLOCATE C;
select *
from @result
order by empid, workday;

Monday, May 21, 2007

When to not not use Cursors

Cursors are usually bad to use; almost always there is a more efficient way to solve your problem using a set based solution. However there are some cases when cursors allow you to create a solution that is exponentially easier to implement than a set based solution. In general you should only resort to using cursors when a difficult set based solution becomes trivial when solved using cursors.

A classic example is that you have 5 classrooms of various sizes, and 3 classes of various sizes, and you want to assign a class to each classroom utilizing the minimum space required.


we'll simplify this problem and leave dates and times out of this issue :). Now this problem is very difficult to solve using a set based solution (it is possible, google itzik). However this is pretty trivial to solve using a cursor based solution, psuedo code below

  1. Declare 2 cursors, one of the list of classrooms (lets call it: RoomsCursor) sorted by increasing capacity (number of seats), and another cursor for the list of classes (ClassesCursor) sorted by increasing number of students.
  2. Now Fetch the first (smallest since you sorted ascending) class from the RoomsCursor
  3. While the fetch returned a class that needs a classroom
    1. Fetch the smallest unused classroom from RoomsCursor. if there is no available room, or the room is too small, continue and fetch the next smallest. Repeat fetching new rooms until you find a room that has fit or run out of rooms
    2. If you didnt run out of rooms (and the last fetch yielded a room and the number of seats in the room is smaller than the number of students in the current room:
      1. if you found a big enough room, schedule the class
      2. else, you ran out of rooms!
      3. fetch another Class
  4. Return the scheduled events
In this case we are scanning both the classrooms and the classes in order. We never back up the cursor. We schedule classes by matching classes to class rooms until we either run out of classses to find classrooms for or we run out of rooms to accomidate classes. The only time there is an error is when no solution exists.

This solution runs in O(N) time since we are simply stepping through the cursor, the worst case solution is that we look at each class or classroom once.

this will set up the problem

USE tempdb;
GO
IF OBJECT_ID('dbo.Classes') IS NOT NULL
DROP TABLE dbo.Classes;
GO
IF OBJECT_ID('dbo.Classrooms') IS NOT NULL
DROP TABLE dbo.Classrooms;
GO

CREATE TABLE dbo.Classrooms
(
classroomid VARCHAR(10) NOT NULL PRIMARY KEY,
classSize INT NOT NULL
);

INSERT INTO dbo.Classrooms(classroomid, classSize) VALUES('C001', 2000);
INSERT INTO dbo.Classrooms(classroomid, classSize) VALUES('B101', 1500);
INSERT INTO dbo.Classrooms(classroomid, classSize) VALUES('B102', 100);
INSERT INTO dbo.Classrooms(classroomid, classSize) VALUES('R103', 40);
INSERT INTO dbo.Classrooms(classroomid, classSize) VALUES('R104', 40);
INSERT INTO dbo.Classrooms(classroomid, classSize) VALUES('B201', 1000);
INSERT INTO dbo.Classrooms(classroomid, classSize) VALUES('R202', 100);
INSERT INTO dbo.Classrooms(classroomid, classSize) VALUES('R203', 50);
INSERT INTO dbo.Classrooms(classroomid, classSize) VALUES('B301', 600);
INSERT INTO dbo.Classrooms(classroomid, classSize) VALUES('R302', 55);
INSERT INTO dbo.Classrooms(classroomid, classSize) VALUES('R303', 55);

CREATE TABLE dbo.Classes
(
classid INT NOT NULL PRIMARY KEY,
eventdesc VARCHAR(25) NOT NULL,
attendees INT NOT NULL
);

INSERT INTO dbo.Classes(classid, eventdesc, attendees)
VALUES(1, 'Mikes Adv T-SQL Seminar', 193);
INSERT INTO dbo.Classes(classid, eventdesc, attendees)
VALUES(2, 'CIQ .NET Pages', 51);
INSERT INTO dbo.Classes(classid, eventdesc, attendees)
VALUES(3, 'How to Break the DB', 232);
INSERT INTO dbo.Classes(classid, eventdesc, attendees)
VALUES(4, 'XAML ROCKS', 89);
INSERT INTO dbo.Classes(classid, eventdesc, attendees)
VALUES(5, 'CIQ Security Issues', 897);
INSERT INTO dbo.Classes(classid, eventdesc, attendees)
VALUES(6, 'Data Modeling 101', 46);
GO

CREATE INDEX idx_att_eid_edesc
ON dbo.Classes(attendees, classid, eventdesc);
CREATE INDEX idx_classSize_rid
ON dbo.Classrooms(classSize, classroomid);
GO
Cursor Solution:

DECLARE
@classroomid AS VARCHAR(10), @classSize AS INT,
@classid AS INT, @attendees AS INT;

DECLARE @Result TABLE(classroomid VARCHAR(10), classid INT);

DECLARE CClassrooms CURSOR FAST_FORWARD FOR
SELECT classroomid, classSize FROM dbo.Classrooms
ORDER BY classSize, classroomid;
DECLARE CClasses CURSOR FAST_FORWARD FOR
SELECT classid, attendees FROM dbo.Classes
ORDER BY attendees, classid;

OPEN CClassrooms;
OPEN CClasses;

FETCH NEXT FROM CClasses INTO @classid, @attendees;
WHILE @@FETCH_STATUS = 0
BEGIN
FETCH NEXT FROM CClassrooms INTO @classroomid, @classSize;

WHILE @@FETCH_STATUS = 0 AND @classSize < @attendees
FETCH NEXT FROM CClassrooms INTO @classroomid, @classSize;

IF @@FETCH_STATUS = 0
INSERT INTO @Result(classroomid, classid) VALUES(@classroomid, @classid);
ELSE
BEGIN
RAISERROR('Not enough Classrooms for Classes.', 16, 1);
BREAK;
END

FETCH NEXT FROM CClasses INTO @classid, @attendees;
END

CLOSE CClassrooms;
CLOSE CClasses;

DEALLOCATE CClassrooms;
DEALLOCATE CClasses;

SELECT classroomid, classid FROM @Result;
GO