< Processing procedures >

•  Variable reduction:

1. Based on the number of nonzero instances per variable, select 33 numerical variables and 18 categorical variables.

2. Because Churn, Appetency, and Up-selling are three separate binary classification problems, I split the 33 numerical variables and 18 categorical variables into three groups, one per target, based on their arithmetic mean and standard deviation. As a result, an attribute that scores relatively high and would be useful for predicting two or more customer propensities is used to predict only one of them.

3. In the end, each of Churn, Appetency, and Up-selling is predicted from its own set of 17 variables: 11 numerical and 6 categorical.
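Step 1 ranks variables by how many nonzero instances they have. The exact rule and cutoff are not given here, so the following is only a minimal sketch under assumptions: missing values (`None` or empty strings) are treated as not nonzero, and the function names and the toy columns `Var1` to `Var3` are hypothetical.

```python
def count_nonzero(column):
    """Count entries that are present and not zero.

    Assumption: None and empty strings are treated as missing,
    so they do not count as nonzero instances.
    """
    return sum(1 for v in column if v not in (None, "", 0, 0.0))


def top_k_by_nonzero(columns, k):
    """Return the names of the k columns with the most nonzero entries."""
    counts = {name: count_nonzero(values) for name, values in columns.items()}
    return sorted(counts, key=counts.get, reverse=True)[:k]


# Hypothetical toy data, not taken from the real data set.
toy_columns = {
    "Var1": [0, 3, 5, None],
    "Var2": [1, 2, 3, 4],
    "Var3": [0, 0, None, 7],
}

selected = top_k_by_nonzero(toy_columns, 2)
print(selected)
```

In the procedure above, the same ranking would be applied separately to the numerical and the categorical variables, with k = 33 and k = 18.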

•  Instance reduction:

1. Select 10,000 instances (the training set evaluated below).

< My evaluation on the training set and the test set, with confusion matrix and AUC >

1. Appetency

=== Evaluation on training set ===
=== Confusion Matrix ===

    a     b   <-- classified as
 9194   629 |    a = -1
  147    30 |    b = 1

=== Evaluation on test set ===
=== Detailed Accuracy By Class ===

               TP Rate   FP Rate   Precision   Recall  F-Measure   ROC Area  Class
                 0.938     0.844      0.984     0.938     0.96       0.642    -1
                 0.156     0.062      0.044     0.156     0.068      0.642    1
Weighted Avg.    0.924     0.83       0.967     0.924     0.945      0.642

=== Confusion Matrix ===

     a      b   <-- classified as
 46064   3046 |     a = -1
   751    139 |     b = 1

2. Churn

=== Evaluation on training set ===
=== Confusion Matrix ===

    a     b   <-- classified as
 8756   494 |    a = -1
  699    51 |    b = 1

=== Evaluation on test set ===
=== Detailed Accuracy By Class ===

               TP Rate   FP Rate   Precision   Recall  F-Measure   ROC Area  Class
                 0.947     0.943      0.927     0.947     0.937      0.555    -1
                 0.057     0.053      0.079     0.057     0.066      0.555    1
Weighted Avg.    0.882     0.878      0.865     0.882     0.873      0.555

=== Confusion Matrix ===

     a      b   <-- classified as
 43892   2436 |     a = -1
  3464    208 |     b = 1

3. Up-selling

=== Evaluation on training set ===
=== Confusion Matrix ===

    a     b   <-- classified as
 5462  3827 |    a = -1
  188   523 |    b = 1

=== Evaluation on test set ===
=== Detailed Accuracy By Class ===

               TP Rate   FP Rate   Precision   Recall  F-Measure   ROC Area  Class
                 0.591     0.266      0.965     0.591     0.733      0.692    -1
                 0.734     0.409      0.125     0.734     0.213      0.692    1
Weighted Avg.    0.601     0.276      0.904     0.601     0.695      0.692

=== Confusion Matrix ===

     a      b   <-- classified as
 27364  18954 |     a = -1
   978   2704 |     b = 1
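The per-class TP rate, FP rate, precision, and F-measure in the detailed accuracy tables follow directly from the test-set confusion matrices. As a minimal sketch, the arithmetic for the positive class (class 1) can be reproduced as below. The function and variable names are illustrative only, and the ROC area is left out because it cannot be derived from a single confusion matrix.

```python
def class_metrics(tp, fn, fp, tn):
    """Metrics for one class, given its counts from a 2x2 confusion matrix."""
    tp_rate = tp / (tp + fn)      # recall: share of actual positives found
    fp_rate = fp / (fp + tn)      # share of actual negatives flagged positive
    precision = tp / (tp + fp)    # share of predicted positives that are right
    f_measure = 2 * precision * tp_rate / (precision + tp_rate)
    return {
        "tp_rate": tp_rate,
        "fp_rate": fp_rate,
        "precision": precision,
        "f_measure": f_measure,
    }


# Test-set confusion matrices as reported above.
# Rows are the actual class (-1, 1); columns are the predicted class (-1, 1).
TEST_MATRICES = {
    "Appetency": [[46064, 3046], [751, 139]],
    "Churn": [[43892, 2436], [3464, 208]],
    "Up-selling": [[27364, 18954], [978, 2704]],
}


def positive_class_metrics(matrix):
    """Treat class 1 as the positive class and compute its metrics."""
    (tn, fp), (fn, tp) = matrix
    return class_metrics(tp, fn, fp, tn)


for name, matrix in TEST_MATRICES.items():
    r = positive_class_metrics(matrix)
    print(
        f"{name}: TP rate={r['tp_rate']:.3f} FP rate={r['fp_rate']:.3f} "
        f"precision={r['precision']:.3f} F-measure={r['f_measure']:.3f}"
    )
```

For Appetency, for example, the class 1 precision is 139 / (139 + 3046), about 0.044. Only 139 of the 3,185 instances predicted as class 1 are correct, even though the weighted averages look high because class -1 dominates.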