Thursday, May 31, 2007

Generating a Numbers table

A numbers table can be really useful for lots of reasons, but i'm not going to go over that here, i'm just going to document my journey trying to create one.

Unfortunately, I'm a .NET programmer so i normally first think of a procedural way to do things. When i wanted to create a numbers table I started off with a looping solution, its not pretty, but it works, and you only have to run it once so who cares. so i started off with this

SET NOCOUNT ON
BEGIN TRAN
DECLARE @LoopCounter INT
SET @LoopCounter = 1
WHILE @LoopCounter <= 10000
BEGIN
INSERT numbers_tbl
VALUES(@LoopCounter)
SET @LoopCounter = @LoopCounter + 1
END

COMMIT WORK
GO


This took 29 seconds on our development DB server, not bad for a one time cost, but there must be a different way, i tried using a recursive query that used a table expression. This looked like a pretty sweet query

DECLARE @n AS BIGINT;
SET @n = 1000000;

WITH Nums AS
(
SELECT 1 AS n
UNION ALL
SELECT n + 1
FROM Nums
WHERE n < @n

)
INSERT INTO numbers_tbl
SELECT n
FROM Nums OPTION(MAXRECURSION 0);
GO

>Table 'numbers_tbl'. Scan count 0, logical reads 1009505
>Table 'Worktable'. Scan count 2, logical reads 6000001


Turned out this took 45 seconds, it ended up being a bit slower than my procedural version. Now this approach is kinda lame, really all we need to do is generate the first 1000 rows and then do a cross join on itself to generate the million rows required. However these rows won't have the right numbers per se, but we can use the row number function to give each row a number yielding us a numbers table from 1 to a million.

DECLARE
@n AS BIGINT;
SET @n = 1000000;
WITH
Base AS
(
SELECT 1 AS n
UNION ALL
SELECT n + 1 FROM Base WHERE n <>

),
Expand AS
(
SELECT 1 AS c
FROM Base AS B1, Base AS B2

),
Nums AS
(
SELECT ROW_NUMBER() OVER(ORDER BY c) AS n
FROM Expand

)
INSERT INTO numbers_tbl
SELECT n FROM Nums
WHERE n <= @n
OPTION(MAXRECURSION 0);


Ah success this took 18 seconds, 9K reads opposed to 7M, i'm guessing as the number of entries in the numbers table goes up, this query will scale much better

This is good, but its not perfect. (Not perfect enough for Voodoo-Itzik-black-sql-magic arts practitioners). The problem with the solution before was that we still are stuck doing recursive selects for the first 1000 rows, why start off with the square root? why not add an additional crosss join, allowing us to start off at the sqrt(sqrt(@n)). In fact lets just start off with 2, and will continue to cross join these together (increasing by an exponential factor of 2 on each join). With 5 joins we can generate 4.2B rows, and we use the minimal number of initial selects.

In this case, We start with a CTE that only has 2 rows, and multiple it by the number of rows with each following CTE by cross-joining two instances of the previous CTE. This results in 2^2^N rows, we can use the same row number trick to generate the actually numbers to insert into our table.

DECLARE @n AS BIGINT;
SET @n = 1000000;
WITH
L0 AS(SELECT 1 AS c UNION ALL SELECT 1),
L1 AS(SELECT 1 AS c FROM L0 AS A, L0 AS B),
L2 AS(SELECT 1 AS c FROM L1 AS A, L1 AS B),
L3 AS(SELECT 1 AS c FROM L2 AS A, L2 AS B),
L4 AS(SELECT 1 AS c FROM L3 AS A, L3 AS B),
L5 AS(SELECT 1 AS c FROM L4 AS A, L4 AS B),
Nums AS(SELECT ROW_NUMBER() OVER(ORDER BY c) AS n FROM L5)

INSERT INTO numbers_tbl
SELECT n FROM Nums
WHERE n <= @n;
GO



This query ran in only 2 seconds, if you want more than 4B rows you can add an additional level (L6) for 2^64 rows. Chances are no machine can even store that much so i've left off L6.

Now you can have the largest numbers table at your company.

No comments: