Preface:

  Customer Relationship Management (CRM) is a key element of modern marketing strategies. KDD Cup 2009 provides large-scale marketing databases from the French telecom company Orange. Our purpose is to predict customers' preferences. With those predictions we can suggest related products to customers and make the offers more profitable. We will use a neural network to analyze this large database.

 

New neural network theories and architectures are still being proposed all the time. As computing capacity increases, neural networks become more powerful. A neural network has an input layer, a hidden layer, and an output layer. Given the expected input and output data, training finds the relations that the hidden layer must represent. We use an enormous amount of data for the computation and a series of training runs.

 

 Training on this much data makes the results of the neural network more and more precise. The choice of algorithm also affects the training and the results. From the trained neural network we will be able to build a prediction platform. The main characteristic of this platform is its ability to scale on very large datasets with hundreds of thousands of instances and thousands of variables. The rapid and robust detection of the variables that have most contributed to the output prediction can be a key factor in a marketing application.

 

Software used:

SPSS 17 (Statistical Package for the Social Sciences 17)

    SPSS 17 is a modular, tightly integrated, full-featured product line for the analytical process—planning, data collecting, data access, data management and preparation, analysis, reporting, and deployment.

 

    Using SPSS with a combination of add-on modules and stand-alone software that work seamlessly with the base product enhances the capabilities of SPSS. The intuitive interface makes it easy to use—yet it gives you all of the data management, statistics, and reporting methods you need to do even your toughest analysis.

 

 

Research steps:

1.     Data handling

MISSING VALUES

We use SPSS to compute the missing-value statistics of each variable. The file includes 15,000 variables and 10,000 records. To increase the amount of usable information, I delete every variable with more than 90% missing values. In the remaining variables, missing values are replaced with the series mean, which should keep each series continuous. Data prepared this way should not produce singular-point problems during training.
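The two rules above can be sketched in a few lines of Python. This is only an illustration, not the SPSS procedure itself: the dictionary layout, the use of None for a missing entry, and the names clean_columns, Var1 and Var2 are assumptions made for the example.

```python
from statistics import mean

def clean_columns(columns, max_missing=0.90):
    """Drop variables with more than max_missing missing values; mean-fill the rest.

    `columns` maps a variable name to a list of values, where None marks a
    missing entry (a hypothetical layout, not the SPSS file format).
    """
    cleaned = {}
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        missing_fraction = (len(values) - len(present)) / len(values)
        if missing_fraction > max_missing or not present:
            continue  # too sparse (or completely empty): drop the variable
        fill = mean(present)  # series mean of the observed values
        cleaned[name] = [fill if v is None else v for v in values]
    return cleaned

data = {
    "Var1": [2.0, None, 4.0] + [3.0] * 17,  # 5% missing: kept and mean-filled
    "Var2": [None] * 19 + [4.0],            # 95% missing: dropped
}
result = clean_columns(data)
```

Only Var1 survives, and its single gap is filled with the mean of its observed values.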

[Figure: partial screenshot]

REGRESSION ANALYSIS

The main use of regression analysis here is to measure linearity. High linearity between the variables and the target is a major advantage for neural network training, because it makes the predicted values more accurate. Variables with relatively poor linearity may be removed. The following is an introduction to multiple regression.

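 y = a * x1 + b * x2 + c * x3 + … + m * xn + b0

 Here y is the dependent variable, x1 … xn are the independent variables, a … m are their coefficients, and b0 is the intercept.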

    Multiple regression is the instrument of choice when the researcher believes several independent variables interact to predict the value of a dependent variable. The test measures the degree to which each of the independent variables contributes to the prediction. 
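As a rough illustration of how such coefficients can be estimated, the sketch below fits a multiple regression by ordinary least squares through the normal equations. It is a minimal example on made-up data, not the SPSS implementation; the function names and the sample rows are assumptions, and the solver will fail with a division by zero if the independent variables are exactly collinear.

```python
def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], so row operations act on both sides at once.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_multiple_regression(rows, y):
    """Least-squares fit of y = c1*x1 + ... + cn*xn + intercept."""
    design = [list(r) + [1.0] for r in rows]  # trailing 1.0 carries the intercept
    k = len(design[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)
    return beta[:-1], beta[-1]

# Made-up data generated from y = 2*x1 - 3*x2 + 1, so the fit should recover it.
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
y = [2 * x1 - 3 * x2 + 1 for x1, x2 in rows]
coefficients, intercept = fit_multiple_regression(rows, y)
```

Because the sample data is exactly linear, the recovered coefficients match 2 and -3 and the intercept matches 1 up to rounding error.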

Multiple regression assumes:

·        the independent variables are not highly correlated with each other

·        the independent variables predict the dependent variable, but the reverse is not true; the dependent variable cannot predict the values of the independent variables
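The first assumption can be checked before fitting. The sketch below computes the Pearson correlation for every pair of independent variables and flags the pairs above a cutoff. The cutoff of 0.9, the function names, and the sample columns are assumptions chosen for illustration; a constant column would need to be removed first, since its standard deviation is zero.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def highly_correlated_pairs(columns, threshold=0.9):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    names = list(columns)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if abs(r) >= threshold:
                flagged.append((a, b, r))
    return flagged

columns = {
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "x2": [2.1, 3.9, 6.2, 8.0, 9.8],  # roughly 2 * x1, so nearly collinear with x1
    "x3": [5.0, 1.0, 4.0, 2.0, 3.0],
}
flagged = highly_correlated_pairs(columns)
```

Here only the pair x1 and x2 is flagged, which suggests keeping one of the two rather than both.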

 

    Multiple regression is normally implemented using one of two techniques. The first technique, called forward stepwise regression, starts by measuring the degree to which one independent variable (usually the one the researcher believes is the strongest predictor) correlates to the dependent variable.

 

    One by one, additional independent variables are added to the equation, and the degree (if any) to which each predicts the dependent variable is noted. Backward stepwise regression, a related approach, begins with an examination of the combined effect of all of the independent variables on the dependent variable.

 

    One by one, independent variables (usually starting with the weakest predictor) are removed, and a new analysis is performed. The results provide coefficients for each independent variable, signifying the degree to which each one, when combined with the others, contributes to predicting the dependent variable.
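The forward variant can be sketched as a greedy loop. This version is a simplified stand-in and makes two assumptions that differ from a statistics package: it picks the next variable by the largest drop in the residual sum of squares (instead of starting from the researcher's preferred predictor), and it stops when that drop falls below a hypothetical min_gain threshold rather than using a significance test. All names and data are made up for the example.

```python
def solve(a, b):
    """Gaussian elimination with partial pivoting for a * x = b."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def rss(columns, names, y):
    """Residual sum of squares of a least-squares fit on `names` plus an intercept."""
    design = [[columns[n][i] for n in names] + [1.0] for i in range(len(y))]
    k = len(names) + 1
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((t - sum(b * v for b, v in zip(beta, d))) ** 2 for d, t in zip(design, y))

def forward_stepwise(columns, y, min_gain=1e-6):
    """Add one variable at a time while the fit still improves by more than min_gain."""
    selected, remaining = [], list(columns)
    best = rss(columns, selected, y)  # intercept-only model
    while remaining:
        scores = {n: rss(columns, selected + [n], y) for n in remaining}
        candidate = min(scores, key=scores.get)
        if best - scores[candidate] <= min_gain:
            break  # no remaining variable improves the fit enough
        selected.append(candidate)
        remaining.remove(candidate)
        best = scores[candidate]
    return selected

columns = {
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "x2": [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0],
    "x3": [5.0, 3.0, 8.0, 1.0, 7.0, 2.0, 6.0, 4.0],  # unrelated to y
}
y = [2 * a + 3 * b for a, b in zip(columns["x1"], columns["x2"])]
selected = forward_stepwise(columns, y)
```

On this toy data the loop selects x1 and x2 and leaves x3 out, because adding x3 does not reduce the residual any further. The backward variant would start from all three variables and remove the weakest one at each step.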

 

 

 

2.     Neural network

I use the neural network tools built into MATLAB for the simulation. At the beginning we have five files, each with approximately 10,000 or 15,000 variables. We preprocess each file individually in MATLAB and then simulate each of the five files separately; that is, the five files train five independent neural networks. We do this because there is not enough memory to train a single neural network on all of the data at once.
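The idea of one network per file can be sketched as follows. This is not the MATLAB toolbox: it is a small one-hidden-layer network written in plain Python, and the chunk generator, layer size, learning rate and epoch count are all assumptions picked so the example runs quickly on toy data.

```python
import math
import random

def train_network(rows, targets, hidden=3, epochs=200, lr=0.5, seed=0):
    """Train a one-hidden-layer sigmoid network by stochastic gradient descent."""
    rng = random.Random(seed)
    n_in = len(rows[0])
    # Each weight list ends with a bias term.
    w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(hidden)]
    w_out = [rng.uniform(-1, 1) for _ in range(hidden + 1)]

    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    def forward(x):
        h = [sigmoid(sum(w * v for w, v in zip(ws, x)) + ws[-1]) for ws in w_hidden]
        out = sigmoid(sum(w * v for w, v in zip(w_out, h)) + w_out[-1])
        return h, out

    for _ in range(epochs):
        for x, t in zip(rows, targets):
            h, out = forward(x)
            delta_out = (out - t) * out * (1 - out)  # squared-error gradient at the output
            for j in range(hidden):
                # Hidden-layer gradient uses the output weight before it is updated.
                delta_h = delta_out * w_out[j] * h[j] * (1 - h[j])
                for i in range(n_in):
                    w_hidden[j][i] -= lr * delta_h * x[i]
                w_hidden[j][-1] -= lr * delta_h
                w_out[j] -= lr * delta_out * h[j]
            w_out[-1] -= lr * delta_out
    return lambda x: forward(x)[1]

data_rng = random.Random(42)

def make_chunk(n=20):
    """Toy stand-in for one data file: two inputs, target is 1 when their sum exceeds 1."""
    rows = [(data_rng.random(), data_rng.random()) for _ in range(n)]
    targets = [1.0 if a + b > 1.0 else 0.0 for a, b in rows]
    return rows, targets

chunks = [make_chunk() for _ in range(5)]  # stand-ins for the five files
# One independent network per chunk, so only one chunk is needed in memory at a time.
models = [train_network(rows, targets, seed=i) for i, (rows, targets) in enumerate(chunks)]
predictions = [model((0.9, 0.8)) for model in models]
```

Each model only ever sees its own chunk, which keeps memory use bounded by the size of one file. How the five outputs are combined afterwards is a separate decision that this sketch does not make.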

 

3. Large data processing

MATLAB limits the number of variables it can handle, so variables are deleted at this step. I consider deleting any variable whose missing values reach 85% or more, or that contains too many zero elements, and I use the regression method to explore the nature of each variable. These steps reduce the amount of data, and the screened variables are then used for neural network training.
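A minimal sketch of this screening step is shown below. The 85% missing-value limit comes from the text; the 95% limit for zeros is an assumption, since the text only says "too many", and the function and variable names are made up for the example.

```python
def screen_variables(columns, max_missing=0.85, max_zero=0.95):
    """Split variables into kept names and dropped names with the reason for dropping.

    None marks a missing entry. max_zero is a hypothetical threshold for
    "too many zeros"; the source does not give a number.
    """
    kept, dropped = [], {}
    for name, values in columns.items():
        n = len(values)
        missing_fraction = sum(v is None for v in values) / n
        zero_fraction = sum(v == 0 for v in values if v is not None) / n
        if missing_fraction >= max_missing:
            dropped[name] = "missing"
        elif zero_fraction >= max_zero:
            dropped[name] = "zeros"
        else:
            kept.append(name)
    return kept, dropped

columns = {
    "Var1": [float(i) for i in range(1, 21)],  # complete and varied: kept
    "Var2": [None] * 18 + [1.0, 2.0],          # 90% missing: dropped
    "Var3": [0.0] * 20,                        # all zeros: dropped
}
kept, dropped = screen_variables(columns)
```

Recording the reason for each removal makes it easier to revisit the thresholds later without rerunning the whole analysis.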

Result:

 

4. Small data processing