SAS Score Code from Data Mining Using SAS® Enterprise Miner

 

Explore Nodes

·

Link Analysis Node (Nodes): The scoring code that generates the clustering assignments from the frequency counts of the displayed nodes.

·

Link Analysis Node (Links): The scoring code that generates the clustering assignments from the frequency counts of the displayed links.

·

Link Analysis Node (Nodes and Links): The scoring code that generates the clustering assignments from the frequency counts of the displayed nodes and links.

Modify Nodes

·

Replacement Node: The scoring code that imputes missing values and replaces values in the HMEQ data set for fitting the neural network and logistic regression models that are compared in the Assessment node.

·

Clustering Node: The scoring code that generates the clustering assignments from the k-means clustering procedure for the 2004 Major League Baseball hitters.

·

SOM/Kohonen Node (SOM): The scoring code that generates the clustering assignments from the Kohonen SOM clustering procedure for the 2004 Major League Baseball hitters.

·

SOM/Kohonen Node (VQ): The scoring code that generates the clustering assignments from the Kohonen VQ clustering procedure for the 2004 Major League Baseball hitters.

·

Variable Selection Node (R-Square): The scoring code from the variable selection routine that selects the best combination of input variables for the model based on the R-square model selection criterion, fitting the interval-valued target variable, DEBTINC, from the HMEQ data set.

·

Variable Selection Node (Chi-Sq): The scoring code from the variable selection routine that selects the best combination of input variables for the model based on the chi-square model selection criterion, fitting the binary-valued target variable, BAD, from the HMEQ data set.
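
For the Clustering Node entry above, scoring a new observation with a k-means solution amounts to assigning it to the closest cluster seed. The following is a minimal sketch of that assignment step, written in Python rather than SAS and not taken from the generated score code. The function name, the seed values, and the use of plain Euclidean distance on unstandardized inputs are all assumptions made for illustration.

```python
import math


def assign_cluster(obs, seeds):
    """Return (cluster_id, distance) for the seed closest to obs.

    obs   -- sequence of input values for one observation
    seeds -- dict mapping a cluster id to its seed (cluster center);
             the values used below are hypothetical, not fitted estimates
    """
    best_id, best_dist = None, math.inf
    for cluster_id, seed in seeds.items():
        # Euclidean distance between the observation and this seed.
        dist = math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, seed)))
        if dist < best_dist:
            best_id, best_dist = cluster_id, dist
    return best_id, best_dist


# Hypothetical two-cluster solution on two inputs.
example_seeds = {1: (0.0, 0.0), 2: (10.0, 10.0)}
example_assignment = assign_cluster((3.0, 4.0), example_seeds)
```

In practice the inputs would usually be standardized with the same constants used when the clusters were built before any distances are computed.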

Model Nodes

·

Regression Node (Least-Squares): The scoring code that generates the multiple linear regression estimates from the HMEQ data set. The score code can be used to calculate new prediction estimates by specifying entirely different values for the input variables in the multiple linear regression model. The score code first checks for missing values in each of the input variables in the model. If any input variable has a missing value, then the target variable is estimated by its own average value. The score code then lists the least-squares model: each input variable is multiplied by its associated parameter estimate, and the intercept term is added to calculate the predicted values. The residual values are then calculated as the difference between the target values and the fitted values.

·

Regression Node (Logistic): The scoring code that generates the logistic regression estimates from the HMEQ data set. The logistic regression model is one of the models compared in the Assessment node.

·

Tree Node: The scoring code that generates the decision tree modeling estimates by fitting the binary-valued target variable (bad creditors) from the HMEQ data set. The SAS scoring code displays the recursive splits as a series of if-then partitioning rules, with the listed target proportions, that are created from the ranges of values of the input variables in the model to predict the binary-valued target variable. To calculate entirely different classification estimates, simply copy the SAS program code into a separate SAS program and apply the decision tree model to new data with a new set of input values. The decision tree model is one of the models compared in the Assessment node.

·

Neural Networks Node: The scoring code that generates the neural network estimates from the HMEQ data set. If any input variable in the model has a missing value, the scoring code estimates the target variable by its own mean. The score code then standardizes each input variable in the model. Next, the code generates, for each hidden layer unit, the linear combination of the input layer weight estimates and the previously computed standardized input variables. The input layer bias term is added to each hidden layer unit, and the activation function is applied to the result. The hidden layer units are then multiplied by the hidden layer weight estimates and added together, along with the hidden layer bias term, to generate the final neural network estimates. The neural network model is one of the models compared in the Assessment node.

·

Princomp/Dmneural Node (Dmneural): The scoring code that generates the dmneural network modeling estimates from the HMEQ data set. The scoring code first displays the separate dummy variables that are created for each class level of the categorical-valued input variables in the model. This is followed by imputing missing values for the interval-valued input variables in the model. The interval-valued input variables are then standardized, since the input variables display a wide range of values. The code then displays the principal component scores for each input variable in the model at each stage of the iterative model. The code then calculates the fitted values from the squared activation function that is selected at each stage of the model. The predicted values from the additive nonlinear model are calculated by adding together the fitted values from the first stage and the residual values from the following stages of the iterative model.

·

Princomp/Dmneural Node (Principal Components): The scoring code that generates the principal components estimates for the 2004 Major League Baseball hitters. The scoring code displays up to two separate principal components that were selected from the node and the corresponding scree plots that are generated from the node.

·

User-Defined Node (PROC GENMOD): The scoring code that generates the user-defined modeling estimates from the GENMOD procedure by fitting the logistic regression model to predict the binary-valued target variable, BAD (bad clients), from the HMEQ data set.

·

User-Defined Node (time series): The scoring code that generates the user-defined modeling estimates from the ARIMA procedure by fitting the time series model to predict lead production over time.

·

Ensemble Node (Combined): The scoring code that generates the ensemble modeling estimates by combining the previous logistic regression, neural network, and decision tree models. The ensemble model is one of the models compared in the Assessment node.

·

Ensemble Node (Combined): The scoring code that generates the ensemble modeling estimates by combining the modeling estimates from the multiple linear regression model and the neural network model from the HMEQ data set. In other words, the code displays the corresponding scoring code from the multiple linear regression and neural network models. The fitted values of the ensemble model are calculated by taking the average of the two separate fitted values.

·

Ensemble Node (Stratified): The scoring code from the stratified modeling technique, which combines the multiple linear regression modeling estimates by separating or partitioning the training data set that you want to fit. In other words, separate models are created for each level of segmentation or partitioning of the data that you want to fit.

·

Ensemble Node (Bagging): The scoring code that generates the bagging estimates. Bagging is analogous to bootstrapping: separate prediction estimates are created by resampling the data that you want to fit, and the prediction estimates for the multiple linear regression model are then combined.

·

Ensemble Node (Boosting): The scoring code that generates the boosting model from the logistic regression model by fitting the categorical-valued target variable where the observations are weighted. In other words, the weights are increased for each observation that was misclassified in the previous fit.

·

Memory-Based Reasoning Node: The scoring code that generates the nearest neighbor modeling estimates from the HMEQ data set. The scoring code displays the PMBR procedure with the listed option settings, such as the smoothing constant, for the nearest neighbor model.

·

Two-Stage Model Node: The scoring code that generates the two-stage modeling estimates by fitting the decision tree classification model, then fitting the subsequent multiple linear regression model from the HMEQ data set.
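
The scoring steps described in the Regression Node (Least-Squares), Neural Networks Node, and second Ensemble Node (Combined) entries above can be sketched as follows. This is an illustration in Python, not the SAS score code that Enterprise Miner generates. All function names, input names (x1, x2), and numeric values are hypothetical, and the tanh hidden-unit activation and linear output are assumptions, since the entries do not name the activation function.

```python
import math


def score_least_squares(row, intercept, coefs, target_mean, target=None):
    """Return (predicted, residual) for one observation."""
    # If any input is missing, estimate the target by its own mean.
    if any(row.get(name) is None for name in coefs):
        predicted = target_mean
    else:
        # Intercept plus each input times its parameter estimate.
        predicted = intercept + sum(b * row[name] for name, b in coefs.items())
    # Residual = target value minus fitted value, when the target is known.
    residual = None if target is None else target - predicted
    return predicted, residual


def score_neural_net(row, stats, hidden, output_weights, output_bias, target_mean):
    """Forward pass of a network with one hidden layer.

    stats  -- {input name: (mean, std)} used to standardize the inputs
    hidden -- list of (bias, {input name: weight}), one pair per hidden unit
    """
    names = list(stats)
    # Same missing-value rule as above: fall back to the target mean.
    if any(row.get(n) is None for n in names):
        return target_mean
    # Standardize each input variable.
    z = {n: (row[n] - stats[n][0]) / stats[n][1] for n in names}
    # Each hidden unit: activation(bias + weighted sum of standardized inputs).
    # tanh is an assumed activation function.
    h = [math.tanh(bias + sum(w[n] * z[n] for n in names)) for bias, w in hidden]
    # Output: bias plus weighted sum of the hidden units (assumed linear output).
    return output_bias + sum(v * hj for v, hj in zip(output_weights, h))


def ensemble_average(*predictions):
    """Combine separate fitted values by taking their average."""
    return sum(predictions) / len(predictions)


# Hypothetical parameter estimates and one hypothetical observation.
row = {"x1": 3.0, "x2": 4.0}
reg_pred, reg_resid = score_least_squares(
    row, intercept=1.0, coefs={"x1": 2.0, "x2": -1.0}, target_mean=5.0, target=3.5
)
nn_pred = score_neural_net(
    row,
    stats={"x1": (1.0, 2.0), "x2": (4.0, 1.0)},
    hidden=[(0.0, {"x1": 0.0, "x2": 1.0}), (0.0, {"x1": 1.0, "x2": 0.0})],
    output_weights=[1.0, 3.0],
    output_bias=2.0,
    target_mean=5.0,
)
combined = ensemble_average(reg_pred, nn_pred)
```

Keeping the missing-value check at the top of each scoring function mirrors the order of operations that the entries describe: the fallback to the target mean is decided before any parameter estimate is applied.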


© copyright www.sasenterpriseminer.com - SAS data mining score code.