Thursday, July 05, 2007

Calculating Medians in TSQL using 2005 Window Functions

Calculating Medians can be a common and expensive operation in many applications, inefficiently

The hard part about medians is that you need to take the middle value for an odd number of elements and the 2 middle values for an even number of elements.

The cool thing here is that when you can take these two sequences sorted in opposite directions the absolute difference between the two is smaller than or equal to 1 only for elements that are required for the median calculation.

In this case you’re using ‘memberid’ (or some other unique value) as the tiebreaker to guarantee determinism of the row number calculations. This is required for using this trick to figure out the median.

Once you grab only the values you need for the median calculation you can isolate them by grouping them by groupid and calculate the average for each group.

WITH RN AS
(
SELECT groupid, val,
ROW_NUMBER() OVER(PARTITION BY groupid ORDER BY val, memberid) AS rna,
ROW_NUMBER() OVER(PARTITION BY groupid ORDER BY val DESC, memberid DESC) AS rnd
FROM dbo.Groups
)

SELECT groupid, AVG(1.*val) AS median
FROM
RN
WHERE
ABS(rna - rnd) <= 1
GROUP
BY groupid;



Here’s the temp table I made to test this out

USE tempdb;
GO

IF OBJECT_ID('dbo.Groups') IS NOT NULL
DROP TABLE dbo.Groups;
GO

CREATE TABLE dbo.Groups
(
groupid VARCHAR(10) NOT NULL,
memberid INT NOT NULL,
string VARCHAR(10) NOT NULL,
val INT NOT NULL,
PRIMARY KEY (groupid, memberid)

);

INSERT INTO dbo.Groups(groupid, memberid, string, val)
VALUES('a', 3, 'stra1', 6);

INSERT INTO dbo.Groups(groupid, memberid, string, val)
VALUES('a', 9, 'stra2', 7);

INSERT INTO dbo.Groups(groupid, memberid, string, val)
VALUES('b', 2, 'strb1', 3);

INSERT INTO dbo.Groups(groupid, memberid, string, val)
VALUES('b', 4, 'strb2', 7);

INSERT INTO dbo.Groups(groupid, memberid, string, val)
VALUES('b', 5, 'strb3', 3);

INSERT INTO dbo.Groups(groupid, memberid, string, val)
VALUES('b', 9, 'strb4', 11);

INSERT INTO dbo.Groups(groupid, memberid, string, val)
VALUES('c', 3, 'strc1', 8);

INSERT INTO dbo.Groups(groupid, memberid, string, val)
VALUES('c', 7, 'strc2', 10);

INSERT INTO dbo.Groups(groupid, memberid, string, val)
VALUES('c', 9, 'strc3', 12);
GO

No comments: