**Preface:**

Customer Relationship Management (CRM) is a key element of modern marketing strategies. KDD Cup 2009 provides large-scale marketing databases from the French telecom company Orange. Our purpose is to predict customers' preferences, so that we can recommend related products to them and make these offers more profitable. We use a neural network to analyze this large database.

Neural networks are still an active field: new theories and architectures are proposed constantly, and growing computing capacity keeps making neural networks more powerful. A neural network has an input layer, a hidden layer, and an output layer. Given input data and the expected output, training finds the relations encoded in the hidden layer. This takes an enormous amount of data and computation over a series of learning iterations.
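To make the three-layer structure concrete, here is a minimal sketch of a forward pass through an input layer, one hidden layer, and an output layer. The weights are hand-picked for illustration (they make the network compute XOR) and are not learned; in practice training would have to find such weights from input and expected-output pairs. All names here are illustrative assumptions, not part of the MATLAB or SPSS workflow described in this report.

```python
import math

def sigmoid(z):
    """Logistic activation, squashing any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases):
    """One fully connected layer followed by a sigmoid activation."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

# Hand-picked weights (an assumption for illustration, not learned values).
W_HIDDEN = [[20.0, 20.0], [-20.0, -20.0]]  # 2 inputs -> 2 hidden units
B_HIDDEN = [-10.0, 30.0]
W_OUTPUT = [[20.0, 20.0]]                  # 2 hidden units -> 1 output
B_OUTPUT = [-30.0]

def forward(x):
    """Input layer -> hidden layer -> output layer."""
    hidden = layer(x, W_HIDDEN, B_HIDDEN)
    return layer(hidden, W_OUTPUT, B_OUTPUT)[0]

for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, round(forward(x), 3))
```

The hidden units here act roughly as OR and NAND detectors, and the output unit combines them. A trained network arrives at comparable internal relations by adjusting the weights from data.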

As a result, the output of the neural network becomes more and more precise as training proceeds. The choice of learning algorithm also affects training and results. From a trained neural network we can build a prediction platform. The main characteristic of this platform is its ability to scale on very large datasets with hundreds of thousands of instances and thousands of variables. The rapid and robust detection of the variables that have most contributed to the output prediction can be a key factor in a marketing application.

**Software used:**

**SPSS 17 (Statistical Package for the Social Sciences 17)**

SPSS 17 is a modular, tightly integrated, full-featured product line for the analytical process—planning, data collecting, data access, data management and preparation, analysis, reporting, and deployment.

Using SPSS with a combination of add-on modules and stand-alone software that work seamlessly with the base product enhances the capabilities of SPSS. The intuitive interface makes it easy to use—yet it gives you all of the data management, statistics, and reporting methods you need to do even your toughest analysis.

**Research steps:**

**1. Data handling**

◎MISSING VALUES：

We use SPSS to compute missing-value statistics for each variable. The file includes 15,000 variables and 10,000 records. To increase the share of usable information, we delete variables with more than 90% missing values. For the remaining variables, missing values are replaced with the series mean, which should keep the data continuous. Training on data treated this way should not run into singular-point problems.
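The missing-value step can be sketched in a few lines. This is not the SPSS procedure itself, only a minimal stdlib illustration of the same idea: drop any variable whose missing fraction exceeds 90%, then fill the remaining gaps with that variable's mean. The function name, the dictionary layout, and the use of `None` for a missing value are assumptions made for the example.

```python
def drop_and_impute(columns, max_missing=0.90):
    """Drop sparse variables, then mean-fill the rest.

    columns: dict mapping variable name -> list of numbers, None = missing.
    Returns a new dict containing only the retained, fully filled variables.
    """
    cleaned = {}
    for name, values in columns.items():
        if not values:
            continue  # nothing to keep for an empty variable
        present = [v for v in values if v is not None]
        missing_fraction = 1.0 - len(present) / len(values)
        if missing_fraction > max_missing or not present:
            continue  # more than 90% missing: delete the variable
        mean = sum(present) / len(present)
        cleaned[name] = [mean if v is None else v for v in values]
    return cleaned

# Hypothetical toy data: Var2 is 95% missing and is removed.
data = {
    "Var1": [1.0, None, 3.0, None],
    "Var2": [None] * 19 + [5.0],
}
result = drop_and_impute(data)
print(result)
```

Mean filling keeps every record usable, at the cost of shrinking the variance of heavily imputed variables.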

[Figure: partial excerpt of the results]

◎ REGRESSION ANALYSIS：

We use regression analysis mainly to improve linearity. Higher linearity between the variables and the target is a major advantage in neural network training, because it makes the predicted values more accurate. Variables with relatively poor linearity may be removed. Multiple regression is introduced below.

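$$y = a x_1 + b x_2 + c x_3 + \dots + m x_n + b_0$$

Here $y$ is the dependent variable, $x_1, \dots, x_n$ are the independent variables, $a, b, c, \dots, m$ are their coefficients, and $b_0$ is the constant term.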
Multiple regression is the instrument of choice when the researcher believes several independent variables interact to predict the value of a dependent variable. The test measures the degree to which each of the independent variables contributes to the prediction.

Multiple regression assumes:

· the independent variables are not highly correlated with each other

· the independent variables predict the dependent variable, but the reverse is not true; the dependent variable cannot predict the values of the independent variables
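The first assumption can be checked before fitting. A minimal sketch, assuming the Pearson correlation coefficient as the measure and 0.9 as the cutoff for "highly correlated" (neither is specified in this report):

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column has no defined correlation
    return cov / math.sqrt(var_x * var_y)

def highly_correlated_pairs(columns, threshold=0.9):
    """List pairs of independent variables with |r| >= threshold."""
    names = sorted(columns)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if abs(r) >= threshold:
                pairs.append((a, b, r))
    return pairs

# Hypothetical toy data: x2 is exactly twice x1.
columns = {
    "x1": [1, 2, 3, 4, 5],
    "x2": [2, 4, 6, 8, 10],
    "x3": [5, 1, 4, 2, 3],
}
pairs = highly_correlated_pairs(columns)
print(pairs)
```

When a pair is flagged, one of the two variables is usually dropped before running the regression.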

Multiple regression is normally implemented using one of two techniques. The first technique, called forward stepwise regression, starts by measuring the degree to which one independent variable (usually the one the researcher believes is the strongest predictor) correlates to the dependent variable. One by one, additional independent variables are added to the equation, and the degree (if any) to which each predicts the dependent variable is noted.

Backwards stepwise regression, a related approach, begins with an examination of the combined effect of all of the independent variables on the dependent variable. One by one, independent variables (usually starting with the weakest predictor) are removed, and a new analysis is performed. The results provide coefficients for each independent variable, signifying the degree to which each one, when combined with the others, contributes to predicting the dependent variable.
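Forward stepwise selection can be sketched as follows. This is a simplified greedy variant, not the SPSS implementation: each round adds the variable that lowers the residual sum of squares (RSS) the most, and stops when the improvement falls below a fraction `min_gain` of the total variation. Statistical packages typically use F-tests or p-values for entry and removal instead; the stopping rule and all names here are assumptions for the example.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def rss(columns, y):
    """Residual sum of squares of a least-squares fit with a constant term."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)  # normal equations; assumes no exact collinearity
    return sum((t - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, t in zip(rows, y))

def forward_stepwise(predictors, y, min_gain=0.01):
    """Greedily add predictors while each one reduces RSS noticeably."""
    selected = []
    total = current = rss([], y)  # constant-only model
    remaining = dict(predictors)
    while remaining:
        best_name, best_rss = None, current
        for name, col in remaining.items():
            candidate = rss([predictors[s] for s in selected] + [col], y)
            if candidate < best_rss:
                best_name, best_rss = name, candidate
        if best_name is None or current - best_rss < min_gain * total:
            break
        selected.append(best_name)
        current = best_rss
        del remaining[best_name]
    return selected

# Hypothetical toy data: y depends on x1 and x2 only (y = 2*x1 + 3*x2).
predictors = {
    "x1": [1, 2, 3, 4, 5, 6],
    "x2": [2, 1, 4, 3, 6, 5],
    "x3": [3, 1, 2, 3, 1, 2],
}
y = [8, 7, 18, 17, 28, 27]
selected = forward_stepwise(predictors, y)
print(selected)
```

Backward stepwise regression would run the same loop in reverse, starting from all variables and removing the one whose removal increases RSS the least.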

**2. Neural network**

We run the simulations with the neural network tools built into MATLAB. At the beginning we have five files, each with approximately 10,000 or 15,000 variables. We preprocess each file individually in MATLAB and then simulate each file individually, which means the five files are used to train five independent neural networks. We do this because there is not enough memory to train a single neural network on all of the data at once.
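The per-file strategy can be outlined as a loop that holds only one file in memory at a time. The sketch below is in Python rather than MATLAB and uses a placeholder in place of real network training (it just returns column means), so it only illustrates the control flow. The file contents, function names, and CSV layout are all hypothetical.

```python
import csv
import io

def load_chunk(handle):
    """Read one chunk file into a list of numeric rows."""
    return [[float(v) for v in row] for row in csv.reader(handle)]

def train_placeholder(rows):
    """Hypothetical stand-in for network training: per-column means."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def train_per_chunk(open_chunk, chunk_ids):
    """Train one independent model per chunk, one chunk in memory at a time."""
    models = {}
    for chunk_id in chunk_ids:
        with open_chunk(chunk_id) as handle:
            rows = load_chunk(handle)
        models[chunk_id] = train_placeholder(rows)
        # rows is rebound on the next iteration, so the old chunk can be freed
    return models

# In-memory stand-ins for the data files (hypothetical contents).
FAKE_FILES = {"chunk1": "1,2\n3,4\n", "chunk2": "10,20\n30,40\n"}
models = train_per_chunk(lambda cid: io.StringIO(FAKE_FILES[cid]),
                         ["chunk1", "chunk2"])
print(models)
```

The trade-off is that each model sees only its own share of the records, so the independent models still have to be combined or compared afterwards.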

**3. Large data processing**

Because MATLAB limits the number of variables it can handle, we delete variables at this stage. A variable with 85% or more missing values, or with too many zero elements, is considered for deletion. We then use the regression method to explore the nature of the remaining variables. These steps reduce the amount of information to process, and the screened variables are used for neural network training.
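A minimal sketch of this screening rule follows. The 85% missing-value cutoff comes from the text; the cutoff for "too many zeros" is not stated, so the 95% used here is purely an assumption, as are the function name and the use of `None` for a missing value.

```python
def screen_variables(columns, max_missing=0.85, max_zero=0.95):
    """Return the names of variables that pass both screening rules.

    columns: dict mapping variable name -> list of numbers, None = missing.
    max_zero is an assumed threshold; the report does not give one.
    """
    kept = []
    for name, values in columns.items():
        n = len(values)
        if n == 0:
            continue
        missing = sum(v is None for v in values) / n
        zeros = sum(v == 0 for v in values if v is not None) / n
        if missing >= max_missing or zeros >= max_zero:
            continue  # too sparse or almost entirely zero
        kept.append(name)
    return kept

# Hypothetical toy data.
columns = {
    "a": [1, 2, None, 4],
    "b": [None] * 9 + [1],   # 90% missing
    "c": [0] * 20,           # all zeros
}
kept = screen_variables(columns)
print(kept)
```

Only the variables returned by this filter would go on to the regression check and then to network training.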

**Result:**

**4. Small data processing**