Data Mining Using SAS® Enterprise Miner Data Sets

Introduction

·

Home Equity Loan Data Set: This supervised data set is used to fit many of the modeling nodes, in which the models predict either the interval-valued variable DEBTINC, the debt-to-income ratio, or the binary-valued variable BAD, which flags bad creditors. The data set is located in the SAMPSIO directory within the folder in which your SAS software is installed.

·

2004 Major League Baseball Hitters: This unsupervised data set is used to generate the principal components model. The SAS data set covers all major league hitters who had at least 150 at bats during the 2004 baseball season.

Supervised Training Data Sets

·

HMEQ: This data set is used throughout my book to explain many of the supervised training techniques performed in Enterprise Miner, such as traditional linear regression modeling, decision tree modeling, neural network modeling, nearest neighbor modeling, and two-stage modeling. In other words, the data was used in many of the statistical modeling designs to determine whether an applicant can be approved for a home equity loan, or to estimate the probability of a client defaulting on the loan. The data consists of applicants granted credit for a certain home equity loan.

·

Lead Production: The data set consists of monthly totals of U.S. lead production, measured in tons, from January 1986 to September 1992. The Time Series node was used to transform the data in preparation for time series modeling. The same data set was used in the User Defined node to generate monthly forecasting estimates of U.S. lead production over time.
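
As a quick sanity check on the length of the lead production series, the number of monthly observations implied by the stated date range can be worked out directly. The following is a minimal sketch in Python. The function name months_inclusive is purely illustrative and is not part of Enterprise Miner or SAS, and the count assumes one observation for every calendar month with no gaps.

```python
def months_inclusive(start_year, start_month, end_year, end_month):
    """Count calendar months from the start month through the end month, inclusive."""
    return (end_year - start_year) * 12 + (end_month - start_month) + 1


# Date range stated for the lead production data: January 1986 to September 1992.
n_obs = months_inclusive(1986, 1, 1992, 9)
print(n_obs)  # 81 monthly observations, i.e. six years and nine months
```

Knowing the expected count makes it easy to confirm that no months are missing before the series is passed to a time series node.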

Unsupervised Training Data Sets

·

MLB_2004: The 2004 MLB data set was used for principal components analysis in the Principal Components/Dmneural node.
