Monday, October 27, 2008

Using SSIS to Transfer Data from an SSAS Cube

Scenario
A process needs to be built that takes the aggregate data from your SSAS cube and delivers that data as a source for a third-party application. The process needs to run on a schedule and also be able to run on demand.

Solution
SSIS is a great fit for these requirements: you can build the process once, schedule it as needed, and run the package on demand.

With the tool chosen, the question becomes how to get SSIS to connect to the cube and then work with the results.

The following walks through, step by step, making the initial connection to the SSAS cube, a sample MDX query against the cube, and the cleanup steps within SSIS that make the returned data usable going forward.

Data Connection
The connection to set up is an ADO.NET connection to the SSAS database. Using the OLE DB connector runs into repeated warnings and, in my experience, will not actually retrieve the data, either erroring out or simply hanging.

With the same connection credentials, however, the ADO.NET connection will connect to the SSAS cube and retrieve the data.
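
As a rough sketch, the ADO.NET connection manager can point at the Analysis Services OLE DB provider with a connection string along the following lines; the server name, database name and provider version are placeholders and will depend on your environment:
Provider=MSOLAP.3;Data Source=MySSASServer;Initial Catalog=MySSASDatabase;Integrated Security=SSPI;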

Data Source
The DataReader source component pulls the data from the SSAS cube using the ADO.NET connection. The property to note within the DataReader is the SQL Command on the Component Properties tab; the SQL Command will be an MDX query that retrieves the data.

A sample MDX query to retrieve data from two dimensions (product and region) across a measure group could be the following:
SELECT [MEASURES].[RETAIL SALES AMOUNT] ON COLUMNS,
NONEMPTY(CROSSJOIN ([DIM PRODUCT].[PRODUCT CATEGORY].CHILDREN, [DIM REGION].[REGION NAME].CHILDREN)) ON ROWS
FROM [SSAS CUBE]

As a result of this MDX query, when you look at the metadata of the DataReader you will see the following names for the columns being returned:
[[MEASURES].[RETAIL SALES AMOUNT]]
[[DIM PRODUCT].[PRODUCT CATEGORY].[PRODUCT CATEGORY].[MEMBER CAPTION]]
[[DIM REGION].[REGION NAME].[REGION NAME].[MEMBER CAPTION]]

Also, each column will be of type DT_NTEXT, which becomes an issue when the destination is a table for the third-party tool and data type mismatch errors start appearing.

Clean Up Process
To solve the data type issue, two methods can be used.

The first method uses a Data Conversion transformation to change the data type. However, the data will not convert directly from DT_NTEXT to DT_STR or DT_NUMERIC as you would hope. Instead, convert each field to DT_WSTR and then add a second Data Conversion transformation to translate from DT_WSTR to DT_STR or DT_NUMERIC.

The problem with this method is that nine columns are created from the original three, since each conversion adds a new column under a different output name; the Data Conversion transformation will not replace the original column. An illustration of the resulting column list is shown below.
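
To illustrate with just the region column (the output aliases here are made up), the pipeline ends up carrying three versions of the same value, and the same happens for the product and measure columns:
[[DIM REGION].[REGION NAME].[REGION NAME].[MEMBER CAPTION]]   (original, DT_NTEXT)
Region Name WSTR   (first conversion, DT_WSTR)
Region Name STR   (second conversion, DT_STR)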

The second method uses the Copy Column transformation to get an appropriately named field and then the Derived Column transformation to perform the cast from DT_NTEXT through DT_WSTR to DT_STR in one expression, which looks similar to the following:
(DT_STR, 50, 1252)(DT_WSTR, 50)[Region Name]
where Region Name is the output name given in the preceding Copy Column step.
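
As a sketch, assuming the Copy Column outputs were named Retail Sales Amount, Product Category and Region Name, and that a length of 50 and a precision of 18,2 fit your data, the three Derived Column expressions could look like this, with the measure cast to DT_NUMERIC so it maps to a numeric destination column:
(DT_NUMERIC, 18, 2)(DT_WSTR, 50)[Retail Sales Amount]
(DT_STR, 50, 1252)(DT_WSTR, 50)[Product Category]
(DT_STR, 50, 1252)(DT_WSTR, 50)[Region Name]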

The question then becomes: why is the Copy Column needed at all? Unfortunately, the Derived Column transformation will not accept the [[DIM REGION].[REGION NAME].[REGION NAME].[MEMBER CAPTION]] output name inside an expression, so the Copy Column transformation creates a new column with a name the Derived Column transformation can handle. This still doubles up the column list, but it saves creating one extra group of columns compared to the first method.

Conclusion
At this point, the package has successfully extracted the data from the cube, assigned an appropriate name to each column and set up a correct data type for each field in the result. The package can now continue processing the data as the requirement states.
