Research on KDD Cup 2009 Based on BP Neural Network

D9715009 Hsu Fu-Yuan

1.      Introduction

KDD Cup is the well-known data mining competition of the annual ACM SIGKDD Int. Conference on Knowledge Discovery and Data Mining (KDD), the leading professional organization of data miners. Customer Relationship Management (CRM) is a key element of modern marketing strategies. This research will use the KDD Cup 2009 ( dataset which is a large marketing databases from the French Telecom company Orange to predict the propensity of customers to switch provider (churn), buy new products or services (appetency), or buy upgrades or add-ons proposed to them to make the sale more profitable (up-selling).

This research need to deal with a very large database, including heterogeneous noisy data (numerical and categorical variables), and unbalanced class distributions. Time efficiency is often a crucial point. Therefore part of the KDD Cup 2009 will be time-constrained to test the ability of the participants to deliver solutions quickly.

2.      Related work

2.1 CRM Applications with data mining techniques

E.W.T. Ngai et al. (2009) review an academic database of literature between the period of 20002006 covering 24 journals and proposes a classification scheme to classify the articles. Nine hundred articles were identified and reviewed for their direct relevance to applying data mining techniques to CRM. Eighty-seven articles were subsequently selected, reviewed and classified. Each of the 87 selected papers was categorized on four CRM dimensions (Customer Identification, Customer Attraction, Customer Retention and Customer Development) and seven data mining functions (Association, Classification, Clustering, Forecasting, Regression, Sequence Discovery and Visualization).

S.-Y.Hung et al. (2006) summarizes some data mining functionalities, techniques, and applications in the CRM space, see Table1 below.

Table 1: Data Mining Functionalities, Techniques, and CRM Applications

2.2 Artificial neural network

Frank Rosenblatt's (1969) perceptron was one of the earliest neural networks models. The perceptron consists of the weights, the summation processor, and the adjustable threshold processor.

Figure 1 Artificial Neuron, PERCEPTRON

In recent year, people pay much attention to ANN because of its distinctive information disposal ability and its unique calculation ability. This makes ANN have broad application prospect. ANN has some features that other models don’t process. The merit of ANN model just compensates for the insufficient aspect of early-warning system exited now. The appearance of ANN theory and method offers the possibility of the early-warning system to get over the insufficient of traditional way.

2.3 Backpropagation network algorithm

Rumelhart et. al. (1968) introduced the back-propagation algorithm (abbreviated as BP network) which is a multilayer perceptron with a learning rule called generalised delta rule and a sigmoid transfer function. Until 1990, BPN had become one of the most popular and highly utilized neural network model. It is an example of a supervised neural network. It has very strong ability of simulation of non-linear system and it’s the artificial neural network that has been most widespread applied. The characteristic of BP algorithm is that the signal calculates forward and back-propagation of the error. It shows the characteristic of signal flow direction of algorithm in Figure.2

3.      Research methodology

3.1 Procedure

This research will divide into four steps: (1) Data cleaning & preprocessing, (2) Data reduction & transformation, (3) Classification, (4) Evaluation.

Data cleaning & preprocessing include sample quality control and assurance, normalization, peak detection, peak alignment…etc. Preprocessing is necessary for further analysis. Data reduction & transformation want to filter out data noise and select most discriminative peaks from samples. Several statistical methods are applied here.

After data reduction & transformation, classification algorithm will builds a model that discriminate normal samples from patient. Many machine learning algorithms are applied, such as decision tree, support vector machine, neural network…etc, This research will select neural network (back propagation network, BPN) as data mining techniques to build predictive models. Finally, we will give an evaluation for result.

3.2 Schedule

3.3 Research tool

We’ll use MATLAB 2008a with Neural Network Toolbox as modeling tool to design and simulate neural networks. Neural Network Toolbox extends MATLAB with tools for designing, implementing, visualizing, and simulating neural networks. Neural networks are invaluable for applications where formal analysis would be difficult or impossible, such as pattern recognition and nonlinear system identification and control. Neural Network Toolbox software provides comprehensive support for many proven network paradigms, as well as graphical user interfaces (GUIs) that enable you to design and manage your networks. The modular, open, and extensible design of the toolbox simplifies the creation of customized functions and networks.

Figure.3 The Neural Network Fitting Tool (top) and a performance plot.

3.4 Evaluation

The performances are evaluated according to the arithmetic mean of the AUC for the three tasks (churn, appetency. and up-selling). The prediction of each target variable is thought of as a separate classification problem. The results of classification, obtained by thresholding the prediction score, may be represented in a confusion matrix, where tp (true positive), fn (false negative), tn (true negative) and fp (false positive) represent the number of examples falling into each possible outcome:


Class +1

Class -1


Class +1



Class -1



The sensitivity (also called true positive rate or hit rate) and the specificity (true negative rate) as:
Sensitivity = tp/pos
Specificity = tn/neg
where pos=tp+fn is the total number of positive examples and neg=tn+fp the total number of negative examples.

The results will be evaluated with the so-called Area Under Curve (AUC). It corresponds to the area under the curve obtained by plotting sensitivity against specificity by varying a threshold on the prediction values to determine the classification result.

4. References

[1]       S.-Y. Hung, D.C. Yen, H.-Y. Wang, “Applying data mining to telecom churn management”, Journal of Expert Systems with Applications, vol.31, pp.515-524, 2006.

[2]       Rumelhart, D. E., McClelland, J. L., & The PDP Research Group (1986). Parallel Distributed Processing: Explorations in the Microstructure of Cognition. London: The MIT Press.

[3]       Rumelhart, D.E., Hintont, G.E., Williams R.J., 1986. Learning representations by back- propagating errors. Nature 323, 533 – 536.