Neural Network Homework 2: KDD Cup 2009 CRM Prediction

Instructor: Dr. Hahn-Ming

Wei, Sin-Jong  D9704008

 

                        

Abstract

KDD Cup 2009 focuses on customer relationship management (CRM) prediction for three targets: churn, appetency, and up-selling. The large dataset, provided by the French telecom company Orange, contains 15,000 variables and 50,000 instances. It includes both numerical and categorical variables and has unbalanced class distributions. The challenge consists of three prediction tasks in which both accuracy and time efficiency matter.

 

 

Method

   My prediction method for the tasks is listed below.

A. System

1. Implementation: MATLAB

2. Network: Back-propagation

3. Hidden layers: 1

4. Epochs: 10000

5. Learning rate: 0.8

6. Momentum: 0.5
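
To illustrate how the learning rate and momentum listed above interact, here is a minimal sketch of gradient descent with momentum on a toy one-weight error function. It is not the network used in this homework: the weight `w`, its starting value, and the toy error 0.5*(w - 2)^2 are assumptions made for illustration, and the update rule is one common form of the momentum rule, which may differ in detail from the toolbox's.

```matlab
% Toy illustration of gradient descent with momentum (not the report's network).
lr = 0.8;            % learning rate, as listed above
mc = 0.5;            % momentum constant, as listed above
w  = 5;              % hypothetical single weight, arbitrary starting value
dw = 0;              % previous weight change
for epoch = 1:100
    g  = w - 2;                     % gradient of the toy error 0.5*(w - 2)^2
    dw = mc*dw - lr*(1 - mc)*g;     % blend the previous step with the new gradient step
    w  = w + dw;                    % apply the weight change
end
fprintf('w after 100 epochs: %.4f\n', w);
```

With these two values the weight oscillates around the minimum at w = 2 while the oscillation shrinks each epoch; on a real network a learning rate as high as 0.8 may not behave this well.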

 

B. Processing

1. Feature selection: remove features dominated by missing values, outliers, and noise so that fewer than 5000 features remain.

2. Select subsets of the data at random, at least two times.

3. Normalize the data to reduce repeated values.

4. Run the back-propagation neural network on the data until it returns an optimal set of parameters to the system.

5. The predicted ratings for each task are based on time series analysis; the test datasets are also predicted with the trained models.

6. Use 10-fold cross-validation to find the best model by comparing the accuracy rates of the folds.
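
The 10-fold cross-validation in step 6 can be sketched as follows. This is only an outline of the index bookkeeping, under assumed sizes: `N`, `perm`, `foldId`, and `foldSize` are hypothetical names, and the training and scoring of a model inside the loop is left as a comment.

```matlab
% Sketch of a 10-fold split over instance indices (hypothetical sizes).
N = 50;                          % hypothetical number of instances
K = 10;                          % number of folds
perm = randperm(N);              % shuffle the instance indices
foldId = mod(0:N-1, K) + 1;      % fold label 1..K for each shuffled position
foldSize = zeros(1, K);
for k = 1:K
    testIdx  = perm(foldId == k);    % held-out instances for this fold
    trainIdx = perm(foldId ~= k);    % remaining instances used for training
    % A model would be trained on trainIdx and scored on testIdx here.
    foldSize(k) = numel(testIdx);    % record how many instances were held out
end
```

Each instance is held out exactly once, so the ten accuracy rates can be compared or averaged without counting any instance twice.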

 

[Figure: flowchart with the nodes "Datasets" (terminator), "Performance evaluation" (decision, with "no" and "yes" branches), and "Analyzing" (terminator)]

System framework

 

 

Analyzing the Prediction Result

The datasets were split into 10 files. The first task was feature selection: I needed to know the type of each variable and the frequencies of its values. If a feature was mostly absent or unusable (missing values, outliers, or noise), I discarded it. I also discarded any variable with zero values in more than 40,000 instances. Even so, about 5,000 variables remained, which is still a large dataset, so I randomly selected some of them and normalized them by mapping each row's minimum and maximum values to [-1 1]. Pre-processing took too much time, and unfortunately the run failed. My processing is laid out below.
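
The filtering and normalization described above can be sketched on a tiny matrix. This is an illustration under assumptions, not the script I ran: the matrix `X`, the 80% threshold (mirroring 40,000 of 50,000 instances), and all variable names are hypothetical, and missing values are assumed to be stored as NaN.

```matlab
% Hypothetical small matrix: rows are instances, columns are variables.
X = [1 0 NaN 4;
     2 0 NaN 8;
     3 0 NaN 6;
     4 0 NaN 2;
     5 0 NaN 0];
maxBad = 0.8 * size(X, 1);           % 80% threshold, mirroring 40000 of 50000

nZero    = sum(X == 0, 1);           % zero count per variable
nMissing = sum(isnan(X), 1);         % missing count per variable
keep = (nZero <= maxBad) & (nMissing <= maxBad);
Xk = X(:, keep);                     % drop mostly-zero and mostly-missing variables

P  = Xk';                            % variables x instances layout
lo = min(P, [], 2);                  % row minimum
hi = max(P, [], 2);                  % row maximum
% Map each row linearly so its minimum becomes -1 and its maximum becomes 1.
% A constant row (hi == lo) would divide by zero and needs separate handling.
PN = 2 * bsxfun(@rdivide, bsxfun(@minus, P, lo), hi - lo) - 1;
disp(PN);
```

Here the second and third variables are dropped, and the two remaining rows are rescaled to span exactly [-1 1].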

 

Prediction uses the similarity matrix; the neural network parameters are designed as follows:
 

 

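NodeNum = 100;                   % number of hidden-layer nodes
TypeNum = 3;                     % output dimension

% Load the training data, training labels, and test data
P1 = load('C:\Program Files\MATLAB\R2008a\work\orange_large_train');
T1 = load('C:\Program Files\MATLAB\R2008a\work\orange_large_train_appetency.labels');
P2 = load('C:\Program Files\MATLAB\R2008a\work\orange_large_test');

% Map each row to [-1 1]; reuse the training settings for the test data
[PN1, ps] = mapminmax(P1);
PN2 = mapminmax('apply', P2, ps);

TF1 = 'tansig'; TF2 = 'purelin'; % transfer functions
net = newff(minmax(PN1),[NodeNum TypeNum],{TF1 TF2});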


net.trainFcn = 'traingdm';          % gradient descent with momentum (plain 'traingd' would be overwritten)
net.trainParam.show = 1;            % display interval during training
net.trainParam.lr = 0.8;            % learning rate (traingd, traingdm)
net.trainParam.mc = 0.5;            % momentum constant (traingdm, traingdx)
net.trainParam.mem_reduc = 10;      % block-wise Hessian computation (Levenberg-Marquardt only; unused here)
net.trainParam.epochs = 10000;      % maximum number of training epochs
net.trainParam.goal = 1e-8;         % mean squared error goal
net.trainParam.min_grad = 1e-20;    % minimum gradient
net.trainParam.time = inf;          % maximum training time
net = train(net,PN1,T1);            % train the network
%---------------------------------------------------
% Testing
 
Y1 = sim(net,PN1);             % network output for the training samples
Y2 = sim(net,PN2);             % network output for the test samples
 
Y1 = full(compet(Y1));         % competitive (winner-take-all) output
Y2 = full(compet(Y2));
%---------------------------------------------------

Analyzing Result

 
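Result = ~sum(abs(T1-Y1))                  % 1 marks a correctly classified sample
Percent1 = sum(Result)/length(Result)      % classification accuracy on the training samples
 
Result = ~sum(abs(T2-Y2))                  % 1 marks a correctly classified sample (needs test labels T2, which are not loaded above)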
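
The accuracy check above can be followed on a small worked example. The targets `T` and outputs `Y` below are made-up one-hot matrices (3 classes as rows, 4 samples as columns), not results from this homework.

```matlab
% Hypothetical one-hot targets and outputs: 3 classes (rows), 4 samples (columns).
T = [1 0 0 1;
     0 1 0 0;
     0 0 1 0];
Y = [1 0 0 0;
     0 1 0 1;
     0 0 1 0];
Result  = ~sum(abs(T - Y), 1);             % 1 where a column matches its target exactly
Percent = sum(Result) / length(Result);    % fraction of samples classified correctly
fprintf('Accuracy: %.2f\n', Percent);
```

The first three columns match and the fourth does not, so three of the four samples are counted as correct.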
 

 

 

Conclusion

The final run did not complete, so no result was produced. One possible cause is that I did not choose good parameters. Another is that the data were too large for the prediction accuracy to be evaluated, and feature selection may have removed important variables. I believe the main problem is that I spent too much time on pre-processing and on training with a conveniently small training set chosen to lower the use of system resources. In addition, the learning time should be shorter. Finally, this was hard work for me, and I was not lucky.

 

 

 

 
