**Academic Year 97, Second Semester: Neural Networks Research Proposal**

Professor: Hahn-Ming Lee | Student ID: M9715002 | Student: Po-Lin Yeh

**Introduction**

The goal of KDD Cup 2009 is to train neural networks on Customer Relationship Management (CRM) data, which includes information on more than 50,000 customers, and to predict CRM rankings from different parameters.

**Task Description**

The task is to estimate the **churn**, **appetency**
and **up-selling** probability of customers, hence
there are three target values to be predicted. The challenge is staged in
phases to test the rapidity with which each team is able to produce results. A
large number of variables (15,000) is made available
for prediction. However, to engage participants having access to less computing
power, a smaller version of the dataset with only 230 variables will be made
available in the second part of the challenge.

To handle this much information, the data must first be reduced to a smaller set of rules for training. LVQ (Learning Vector Quantization) is a method combining supervised and unsupervised learning: it finds a set of weight (codebook) vectors from the dataset, which are then used to predict the class of future data.

**1. CRM and KDD CUP 2009**

**1.1 Background**

For many companies, factors such as world trade, warehousing costs, and customer needs are all important operating conditions. Customer Relationship Management (CRM) is an approach that can assist organizations to serve their customers better. CRM helps to identify valuable customers, assess their needs, and provide more personalized service. It also streamlines the handling of enquiries and requests, resulting in higher operational efficiency and more rapid responses to customers.

CRM provides a highly automated system to generate and track sales leads, the performance of individual products or sales professionals, and the results of sales campaigns across a wide range of parameters.

**1.2 Why CRM?**

The highest goal of CRM is to keep customers happy, discovering and solving problems as they come up, in order to produce robust revenue and profits.

**1.3 Proposal**

We use the existing dataset to predict which customers have future potential. The customers are divided into three types: churn, appetency, and up-selling. In the future, this makes it possible to develop these three types of customers and to meet their needs.

**2. Method (LVQ)**

**2.1 Vector Quantization**

Vector quantization (VQ) networks are
intended to be used for classification. Like unsupervised networks, the VQ
network is based on a set of *codebook vectors*. Each class has a subset
of the codebook vectors associated to it, and a *data vector* is assigned to
the class to which the closest codebook vector belongs. In the neural network
literature, the codebook vectors are often called the *neurons* of the VQ
network. In contrast to unsupervised nets, VQ networks are trained with
supervised training algorithms. This means that you need to supply output data
indicating the correct class of any particular input vector during the
training.
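As an illustration, the nearest-codebook classification rule described above can be sketched as follows; the codebook values and class labels here are invented for the example and are not from the KDD Cup data:

```python
import numpy as np

# Hypothetical codebook: 2 codebook vectors (neurons) per class, 3 classes.
# Values are illustrative only.
codebook = np.array([
    [0.1, 0.2], [0.3, 0.1],   # class 0 (e.g. churn)
    [0.8, 0.9], [0.7, 0.8],   # class 1 (e.g. appetency)
    [0.1, 0.9], [0.2, 0.8],   # class 2 (e.g. up-selling)
])
labels = np.array([0, 0, 1, 1, 2, 2])  # class of each codebook vector

def vq_classify(x, codebook, labels):
    """Assign x to the class of the nearest codebook vector."""
    dists = np.linalg.norm(codebook - x, axis=1)
    return labels[np.argmin(dists)]

print(vq_classify(np.array([0.75, 0.85]), codebook, labels))  # prints 1
```

During training, only the positions of the codebook vectors change; the classification rule itself stays this simple nearest-neighbor assignment.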

**2.2 Unsupervised and Vector Quantization (VQ) Networks**

Unsupervised algorithms
are used to find structures in the data. They can, for instance, be used to
find clusters of data points, or to find a one-dimensional relation in the
data. If such a structure exists, it can be used to describe the data in a more
compact way.

Most
network models in the package are trained with *supervised* training
algorithms. This means that the desired output must be available for each input
vector used in the training. *Unsupervised* networks, or *self-organizing
networks*, rely only on input data and try to find structures in the input
data space. The training algorithms are therefore called *unsupervised*.

Since
there is no "correct" output, there will also not be any
"incorrect" outputs. This fact leaves a lot of responsibility to the
user. After an unsupervised network has been trained, it must be tested to show
that it makes sense, that is, if the obtained structure is really representing
the data. This validation can be very tricky, especially if you work in a high-
dimensional space. In two- or three-dimensional problems, you can always plot
the data and the obtained structure and simply examine them. Another test that
can be applied in any number of dimensions is to check for the mean distance
between the data points and the obtained cluster centers. A small mean distance
means that the data is well represented by the clusters.
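The mean-distance validation test described above can be sketched as follows; the data points and cluster centers are illustrative:

```python
import numpy as np

def mean_cluster_distance(data, centers):
    """Mean distance from each data point to its nearest cluster center.
    A small value means the centers represent the data well."""
    # distance from every point to every center, then the minimum per point
    d = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
    return d.min(axis=1).mean()

# Illustrative data: two tight clusters around (0, 0) and (1, 1)
data = np.array([[0.0, 0.1], [0.1, 0.0], [1.0, 0.9], [0.9, 1.0]])
good_centers = np.array([[0.05, 0.05], [0.95, 0.95]])  # one center per cluster
bad_centers = np.array([[0.5, 0.5]])                   # one center between them

print(mean_cluster_distance(data, good_centers))  # small
print(mean_cluster_distance(data, bad_centers))   # noticeably larger
```

This check works in any number of dimensions, unlike visual inspection.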

An
unsupervised network consists of a number of *codebook vectors,* which
constitute cluster centers. The codebook vectors are of the same dimension as
the input space, and their components are the parameters of the unsupervised
network. The codebook vectors are called the *neurons* of the unsupervised
network.

An
unsupervised network can employ a *neighbor feature*. This gives rise to a
*self-organizing map* (SOM). For SOM networks not only the mean distance
between the data and the nearest codebook vector is minimized, but also the
distance between the codebook vectors. In this way it is possible to define
one- or two-dimensional relations among the codebook vectors, and the obtained
SOM unsupervised network becomes a nonlinear mapping from the original data
space to the one- or two-dimensional feature space defined by the codebook
vectors. Self-organizing maps are often called *self-organizing feature maps*,
or *Kohonen networks*.

When
the data set has been mapped by a SOM to a one- or two-dimensional space, it
can be plotted and investigated visually.
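A minimal 1-D SOM training loop, as a sketch of the idea only; all parameter values (neuron count, learning rate, neighborhood radius) are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def train_som_1d(data, n_neurons=5, epochs=50, lr=0.5, radius=1.0):
    """Minimal 1-D self-organizing map (Kohonen network) sketch.
    Codebook vectors are arranged on a line; the winner and its
    line neighbors are pulled toward each sample."""
    codebook = rng.uniform(data.min(), data.max(), size=(n_neurons, data.shape[1]))
    for _ in range(epochs):
        for x in rng.permutation(data):
            winner = np.argmin(np.linalg.norm(codebook - x, axis=1))
            for j in range(n_neurons):
                # Gaussian neighborhood function on the 1-D grid index
                h = np.exp(-((j - winner) ** 2) / (2 * radius ** 2))
                codebook[j] += lr * h * (x - codebook[j])
        lr *= 0.95  # decay the learning rate each epoch
    return codebook

# toy 1-D data forming two clusters near 0 and near 1
data = np.array([[0.0], [0.05], [0.95], [1.0]])
som = train_som_1d(data)
```

After training, plotting `som` against its grid index would show the 1-D relation the map has found in the data.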

Another
neural network type that has some similarities to the unsupervised one is the *Vector
Quantization* (VQ) network, whose intended use is classification. Like
unsupervised networks, the VQ network is based on a set of *codebook vectors*.
Each class has a subset of the codebook vectors associated to it, and a *data
vector* is classified to be in the class to which the closest codebook
vector belongs. In the neural network literature, the codebook vectors are
often called the *neurons* of the VQ network.

Each
of the codebook vectors has a part of the space "belonging" to it.
These subsets form polygons and are called Voronoi
cells. In two-dimensional problems you can plot these Voronoi
cells.

**2.3 Learning Vector Quantization (LVQ)**

The
positions of the codebook vectors are obtained with a supervised training
algorithm, and you have two different ones to choose from. The default one is
called *Learning Vector Quantization* (LVQ) and it adjusts the positions
of the codebook vectors using both the correct and incorrect classified data.
The second training algorithm is the competitive training algorithm, which is
also used for unsupervised networks. For VQ networks this training algorithm
can be used by considering the data and the codebook vectors of a specific
class independently of the rest of the data and the rest of the codebook
vectors. In contrast to the unsupervised networks, the output data indicating
the correct class is also necessary. They are used to divide the input data
among the different classes.
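A minimal sketch of the LVQ1 update just described: the winning codebook vector moves toward a correctly classified sample and away from an incorrectly classified one. The codebook, labels, and learning rate are illustrative:

```python
import numpy as np

def lvq1_step(codebook, cb_labels, x, y, lr=0.1):
    """One LVQ1 update: move the winning codebook vector toward x
    if its class matches y, away from x otherwise."""
    w = np.argmin(np.linalg.norm(codebook - x, axis=1))  # winning neuron
    sign = 1.0 if cb_labels[w] == y else -1.0
    codebook[w] += sign * lr * (x - codebook[w])
    return codebook

# illustrative two-class codebook
codebook = np.array([[0.0, 0.0], [1.0, 1.0]])
cb_labels = np.array([0, 1])
# a class-0 sample near codebook 0: the winner is pulled toward it
lvq1_step(codebook, cb_labels, np.array([0.2, 0.0]), y=0)
# a class-0 sample near codebook 1: the mismatched winner is pushed away
lvq1_step(codebook, cb_labels, np.array([0.9, 1.0]), y=0)
```

Repeating this step over the training set, with a decaying learning rate, positions the codebook vectors near the class boundaries.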

LVQ is an acronym for Learning Vector Quantization. It combines unsupervised and supervised learning and is used specifically for classification. An LVQ neural network has two layers: the first is a competitive layer that assigns the input to a subclass, and the second is a linear layer that maps each subclass to a main class. The LVQ network architecture is as shown in the figure.
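The two-layer structure can be sketched like this: the first (competitive) layer outputs 1 for the winning subclass neuron and 0 for the rest, and the second (linear) layer maps subclasses to main classes through a fixed 0/1 matrix. All weights here are invented for illustration:

```python
import numpy as np

# competitive layer: 6 subclass neurons (codebook vectors), illustrative values
subclass_weights = np.array([
    [0.1, 0.2], [0.3, 0.1],   # subclasses of main class 0 (e.g. churn)
    [0.8, 0.9], [0.7, 0.8],   # subclasses of main class 1 (e.g. appetency)
    [0.1, 0.9], [0.2, 0.8],   # subclasses of main class 2 (e.g. up-selling)
])
# linear layer: fixed 0/1 matrix mapping each subclass to its main class
W2 = np.zeros((3, 6))
W2[0, 0:2] = W2[1, 2:4] = W2[2, 4:6] = 1.0

def lvq_forward(x):
    """Layer 1: winner-take-all competition (winner outputs 1, losers 0).
    Layer 2: linear mapping from subclass to main class."""
    a1 = np.zeros(len(subclass_weights))
    a1[np.argmin(np.linalg.norm(subclass_weights - x, axis=1))] = 1.0
    return W2 @ a1  # one-hot vector over the 3 main classes

print(lvq_forward(np.array([0.75, 0.85])))  # prints [0. 1. 0.]
```

Training only moves the first-layer codebook vectors; the second-layer matrix stays fixed, since it merely records which subclass belongs to which main class.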

**2.4 Difficulties**

**2.4.1 Computation**

Vector quantization was first described by Gray in 1984. Its drawback is that its computation time and memory requirements grow as the number of prototype points and feature variables increases.

Even when VQ does not use a large codebook, its efficiency is still good, but it has its shortcomings. To overcome the limitations of VQ and improve its efficiency, many VQ-based methods have been proposed, such as finite-state VQ (FS-VQ), address-VQ, predictive-VQ, SOC-VQ, LCIC-VQ, STC-VQ, DST-VQ, and so on; these approaches, which exploit the correlation between neighboring blocks, have appeared in succession. However, some of these methods have a higher distortion rate, some improve compression only slightly, and most of them require a large amount of computation.

Since we have no way here to reduce the computational complexity itself, we can only reduce the amount of data in order to shorten the training time.

**2.4.2 About the Dataset**

From each customer's ID we look at features such as the number of purchases, salary upgrades, and other conditions to determine the weight distribution, and then use VQ to compute the classification weight matrix in order to achieve this purpose.

**2.4.3 Noise Filter**

We also consider a noise category: some customers probably have no development potential, so we should not waste too much time on them. The data can therefore be divided into four or more categories, but this must be determined from the data and confirmed by observation.

**3. Expected Results**

**3.1 Churn, Appetency and Up-selling**

The data are expected to be divided into the three main categories, and we then observe which type each customer belongs to.

Since the first layer of LVQ is a competitive layer, the competition must produce a winner and losers: the winning neuron outputs 1 and the losing neurons output 0. The network must converge, meaning each input is always assigned to the category with the highest similarity, without exception.

**3.2 Reliability**

We want correct classification, but after the sub-categorization the accuracy cannot be guaranteed. Therefore, each of the three broad categories can be further divided into two sub-categories; distinguishing between these two sub-categories improves the accuracy of the classification.