Wednesday, October 29, 2008

How to gain storage and then performance from data type

During the design phase of a database the choice of the proper data type can have a sizable impact on data base size then on overall performance.

1- Datetime

You may want to choose smalldatetime over datetime for the following reasons:
smalldatetime data type uses half the space (4 bytes) of datetime data type(8 bytes)
datetime stores up to the milliseconds and smalldatetime stores up to the minute,
Use smalldatetime if you need to store date from January 1, 1900 to June 6, 2079; datetime stores date from January 1, 1753, to December 31, 9999

2- Integer

Most of the time developer chose bigint over int as data type for identity key on transaction detail table.

bigint is probably the single most misused data type.
Here are some reasons why to use int over bigint:
Int uses half the space (4 bytes) of bigint (8 bytes); in a 100 million row table you could save about 400MB worth of space by choosing int instead of bigint.
Int can store values from -2147483648 through 2.147.483.647 and bigint can store values from -9223372036854775808 through 9223372036854775807; a huge number.

So it’s clear you should go for bigint only if you need to store values over 2.1 billion; most systems will not grow to this number.

3- numeric/money

It is highly advised to choose money type for currency instead of numeric or decimal. The default precision (number to the left of the decimal point) for both numeric and decimal is 18 not far to the maximum of 28 digits; the default scale (number to the right of the decimal point) is 0.

More than often developers leave untouched the default precision value of 18, higher that the maximum of 15 digits for the money data type. Money data type size is 8 bytes with a precision of 15 and scale of 4 digits while decimal and numeric data type use from 2 to 17 bytes.

4- Character, Text

It is recommended to use char data type over varchar data type in very precise circumstances: data in column must have the same size and contain no null values. Char data type size on disk is fixed, the space for null values remain allocated and smaller string occupied the defined size. varchar data type size on disk is variable and string data of n characters is stored in n bytes, giving a better handling of data size on disk.

Unicode data types nchar and nvarchar must be chosen carefully as they use twice as much space as the non-Unicode data type char and varchar. While char and varchar size is up to 8000 bytes nchar and nvarchar store only up to 4000 bytes. This means you will potentially store the same data on twice the space up to half the capacity.

It is highly advised to use Unicode data type only if Unicode data will be stored in the system. Please note that the proper database collation must be selected to match the Unicode set to avoid string manipulation mal function that may occur.


How is all this space saving impact the overall performance?

A good choice of data type will allow data to be stored in smaller space in the database, this will account for smaller result set and better performance in queries.

No comments: