IT Weblog: Tips to get optimum performance from large dataset manipulation

Here is a non exhaustive list of ways to get the best performance from large dataset DML.
Depending on system configuration and design, the effect of these recommendations may vary.

1-Do not create more than 16 exact match joins.

2- make sure indexes are created on exact match columns only if necessary for reporting

3- Avoid join on large views use temporary tables instead; Proceed by loading data from the view into a temp table then use the temp table in the join instead.

4- Avoid as much as possible creating indexes on temporary tables as it will impact insert performance

5- Avoid grouping as much as possible, if necessary make sure that the non- aggregated columns actually regroup data.

Most of the times as developement advances more columns are added onto the query. At some point the columns in the 'group by' clause may become candidate key to the dataset causing grouping to be ineffective; on a large query the effect on performance can be very important.

--The classics

6- Avoid table scans

7- Avoid nested Sql queries; join are more effective.

8- Avoid cursors. An effective use of Case statement can replace the use of cursors and deliver much better performance.
This sample query is for illustration only:

SELECT VENDORID, PRODUCTID,

SUM(CASE WHEN MONTH(SALESDATE)=1
THEN SUM(ISNULL(SALESAMOUNT,0))
ELSE 0 END) AS JAN_SALES,

SUM(CASE WHEN MONTH(SALESDATE)=2
THEN SUM(ISNULL(SALESAMOUNT,0))
ELSE 0 END) AS FEB_SALES,

SUM(CASE WHEN MONTH(SALESDATE)=3
THEN SUM(ISNULL(SALESAMOUNT,0))
ELSE 0 END) AS MAR_SALES

--...

FROM SALES S (NOLOCK)
JOIN PRODUCT P (NOLOCK) ON P.PRODUCTID=S.PRODUCTID
JOIN VENDOR V (NOLOCK) ON V.VENDORID=P.VENDORID
WHERE SALESDATE BETWEEN @PERIODSTART AND @PERIODEND
GROUP BY V.VENDORID,P.PRODUCTID

IT Weblog

Saturday, November 1, 2008

Tips to get optimum performance from large dataset manipulation

No comments:

Followers

Blog Archive