By Joe Toscano, Senior Software Engineer
During the last decade, large volumes of data have been accumulated and stored in databases. As a result, many organizations have become data-rich but knowledge-poor. You may already be living in this scenario. Perhaps you are currently generating insightful reports using tools such as Reporting Services and/or PerformancePoint Server. So, where would SQL Server 2005 Data Mining fit in, and what value could it add? The goal of this blog entry is to help answer those questions and then to provide a direction for those who like what they hear and wish to dig deeper.
What does Data Mining Promise?
Report viewers or decision makers can develop hypotheses by drilling through the data and digging for cause-and-effect relationships. Wouldn’t it be nice if you had a tool that could determine those relationships for you? How about one that predicts future events, spots bad data and allows you to analyze data in ways that were not possible before? This can be accomplished through the use of data mining. Data mining can help us answer questions such as: What products may sell together? What sales were the result of a marketing campaign? What are the odds that certain products will sell? What are the odds that customers will go elsewhere (churn) under various circumstances? Simply put, you can gain additional business insight that may help you make crucial business decisions and perhaps gain a competitive advantage. Data mining technologies may help you retain existing customers and acquire new ones by turning your wealth of data into actionable information.
There’s an excellent white paper that serves as a starting point: it introduces SQL Server 2005 Data Mining and provides a great overview of what it can deliver. This paper can be found at the following link:
http://msdn.microsoft.com/en-us/library/ms345131(SQL.90).aspx
Would you like to know more?
If you like what you’ve read so far, there are numerous tutorials that walk you through the initial SQL Server Analysis Services setup and then through building a targeted email campaign scenario, a forecasting scenario, a market basket scenario and finally a sequence clustering scenario. You will use the Business Intelligence Development Studio (BIDS) to create a new data mining solution. These excellent tutorials use the AdventureWorksDW data warehouse and can be found at the following MSDN site:
http://msdn.microsoft.com/en-us/library/ms167488(SQL.90).aspx
These represent a great second step after you digest the introductory white paper referenced above. Just one thing to keep in mind: in order to set up and run through the tutorials, you will have to install the SQL Server 2005 Data Mining Viewer Controls and the Microsoft SQL Server 2005 Analysis Services 9.0 OLE DB Provider. Both of these are free downloads and are included in the SQL Server 2005 Feature Pack.
A quick look at one tutorial
If you are still not sure you are ready to dive into the tutorials, we can take a quick peek at portions of the first tutorial. Remember, the finished product is a Business Intelligence Development Studio solution. When you create an Analysis Services project and have satisfied the data mining requirements mentioned above, you will notice a Mining Structures folder, as seen under Solution Explorer below:
One algorithm we are exposed to in the tutorial is the Microsoft Decision Trees algorithm. This algorithm calculates the odds of an outcome based on attribute values. For example, what are the chances that a person will purchase a bike based on the number of cars owned? As you would expect, the person who is most likely to purchase a bike currently owns 0 cars, as seen below:
Below is an overview of several Microsoft Data Mining algorithms:
Decision Trees
This algorithm calculates the odds of an outcome based on attribute values. For example, what are the chances that a high schooler will attend college given parental encouragement, their gender, their parents’ income level, and so on?
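To make the idea of "odds of an outcome based on attribute values" a little more concrete, here is a minimal Python sketch. It is not the Microsoft Decision Trees implementation; it only illustrates the general decision tree idea of measuring how well each attribute separates the outcome, using information gain as the score. The records, attribute names and function names are all invented for illustration.

```python
from collections import Counter, defaultdict
from math import log2

# Invented toy records (an assumption, not tutorial data):
# parental encouragement, income level, and whether the student attended college.
RECORDS = [
    {"encouraged": "yes", "income": "high", "college": True},
    {"encouraged": "yes", "income": "high", "college": True},
    {"encouraged": "yes", "income": "low", "college": True},
    {"encouraged": "yes", "income": "low", "college": True},
    {"encouraged": "no", "income": "high", "college": True},
    {"encouraged": "no", "income": "high", "college": False},
    {"encouraged": "no", "income": "low", "college": False},
    {"encouraged": "no", "income": "low", "college": False},
]


def outcome_odds(records, attribute, target="college"):
    """Return the share of positive outcomes for each value of one attribute."""
    totals = Counter()
    positives = Counter()
    for record in records:
        totals[record[attribute]] += 1
        if record[target]:
            positives[record[attribute]] += 1
    return {value: positives[value] / totals[value] for value in totals}


def entropy(records, target="college"):
    """Binary entropy of the target within a group of records."""
    if not records:
        return 0.0
    p = sum(1 for record in records if record[target]) / len(records)
    if p in (0.0, 1.0):
        return 0.0
    return -(p * log2(p) + (1 - p) * log2(1 - p))


def information_gain(records, attribute, target="college"):
    """How much splitting on the attribute reduces uncertainty about the target."""
    groups = defaultdict(list)
    for record in records:
        groups[record[attribute]].append(record)
    remainder = sum(
        len(group) / len(records) * entropy(group, target)
        for group in groups.values()
    )
    return entropy(records, target) - remainder


def best_split(records, attributes, target="college"):
    """Pick the attribute a tree would split on first under this scoring."""
    return max(attributes, key=lambda a: information_gain(records, a, target))
```

Calling outcome_odds(RECORDS, "encouraged") gives the per-value odds, and best_split(RECORDS, ["encouraged", "income"]) names the attribute that separates the outcome most cleanly in this toy data. A real tree repeats that split recursively on each resulting group.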
Naïve Bayes
The Naïve Bayes algorithm is used to clearly show how a particular variable differs across various data elements. For example, let’s assume we offer a course on Data Mining and wish to track the course evaluations. Which variable or question asked in the evaluation form can most effectively be used as a predictor of future courseware purchasing? Personally, I’ve always suspected that the ‘Would you recommend this course to a friend or co-worker’ question is a possible indicator of a return visit to the classroom. This algorithm can be used to validate my suspicions, and it excels at showing the differences between groups of students who churn (jump ship to a competitor) and those who don’t.
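For readers who want to see the mechanics, below is a small Python sketch of a textbook Naïve Bayes classifier applied to the course evaluation example. It is a generic illustration under my own assumptions, not the Microsoft Naive Bayes algorithm itself: the evaluation rows, the question names ("recommend", "pace") and the add-one smoothing are all invented for the example.

```python
from collections import Counter, defaultdict

# Invented evaluation rows (an assumption): answers to two questions,
# plus whether the student later returned or left.
EVALUATIONS = [
    ({"recommend": "yes", "pace": "good"}, "returned"),
    ({"recommend": "yes", "pace": "fast"}, "returned"),
    ({"recommend": "yes", "pace": "good"}, "returned"),
    ({"recommend": "no", "pace": "good"}, "left"),
    ({"recommend": "no", "pace": "fast"}, "left"),
    ({"recommend": "yes", "pace": "fast"}, "left"),
]


def train(rows):
    """Count how often each answer appears within each outcome group."""
    class_counts = Counter(label for _, label in rows)
    answer_counts = defaultdict(Counter)  # (label, question) -> answer counts
    answers_seen = defaultdict(set)       # question -> distinct answers
    for answers, label in rows:
        for question, answer in answers.items():
            answer_counts[(label, question)][answer] += 1
            answers_seen[question].add(answer)
    return class_counts, answer_counts, answers_seen


def predict(model, answers):
    """Return the normalized probability of each outcome for one evaluation."""
    class_counts, answer_counts, answers_seen = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior: share of students with this outcome
        for question, answer in answers.items():
            seen = answer_counts[(label, question)][answer]
            # Add-one smoothing so an unseen answer does not zero the score.
            score *= (seen + 1) / (count + len(answers_seen[question]))
        scores[label] = score
    norm = sum(scores.values())
    return {label: score / norm for label, score in scores.items()}
```

With this toy data, predict(train(EVALUATIONS), {"recommend": "yes", "pace": "good"}) favors "returned", which is the kind of evidence that would support (or refute) a hunch about the recommendation question.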
Sequence Clustering
The Sequence Clustering algorithm is used to group or cluster data based on a sequence of prior events. For example, users of a web application can often follow a variety of paths through a site. This algorithm can be used to group customers based on the sequence of pages they visit, to help determine if some paths are more profitable than others. This is an algorithm that many other data mining vendors do not offer.
More information can be found at the following sites:
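As a rough illustration of grouping by page sequence, the Python sketch below models each group of sessions as a simple table of page-to-page transition counts and assigns a new session to the group whose transitions explain it best. This covers only the assignment step and assumes the groups already exist; a full sequence clustering algorithm also has to discover the groups. The page names, sessions, group names and smoothing are all invented for the example and do not describe how the Microsoft algorithm is implemented.

```python
from collections import Counter, defaultdict
from math import log

# Invented page names and pre-grouped sessions (assumptions for illustration).
PAGES = ["home", "search", "product", "cart", "checkout"]
GROUPS = {
    "buyers": [
        ["home", "product", "cart", "checkout"],
        ["home", "search", "product", "cart", "checkout"],
    ],
    "browsers": [
        ["home", "search", "search", "product"],
        ["home", "search", "home", "search"],
    ],
}


def transition_counts(sessions):
    """Count how often each page is followed by each other page."""
    counts = defaultdict(Counter)
    for session in sessions:
        for current_page, next_page in zip(session, session[1:]):
            counts[current_page][next_page] += 1
    return counts


def log_likelihood(session, counts, pages=PAGES):
    """Score a session against one group's transitions (add-one smoothed)."""
    total = 0.0
    for current_page, next_page in zip(session, session[1:]):
        row = counts.get(current_page, Counter())
        total += log((row[next_page] + 1) / (sum(row.values()) + len(pages)))
    return total


def closest_group(session, groups=GROUPS):
    """Return the name of the group whose page transitions best fit the session."""
    models = {name: transition_counts(s) for name, s in groups.items()}
    return max(models, key=lambda name: log_likelihood(session, models[name]))
```

For instance, closest_group(["home", "product", "cart"]) lands in the hypothetical "buyers" group, while a session that keeps bouncing between home and search lands in "browsers".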