Academic Year 97 (2008-2009), Second Semester: Neural Networks Research Proposal

Professor: Hahn-Ming Lee

Student ID: M9715002

Student: Po-Lin Yeh

Introduction

The goal of this project for KDD Cup 2009 is to train neural networks on Customer Relationship Management (CRM) data that include information on more than 50,000 customers, and to predict the CRM targets from the different parameters.

 

Task Description

The task is to estimate the churn, appetency and up-selling probability of customers, hence there are three target values to be predicted. The challenge is staged in phases to test the rapidity with which each team is able to produce results. A large number of variables (15,000) is made available for prediction. However, to engage participants having access to less computing power, a smaller version of the dataset with only 230 variables will be made available in the second part of the challenge.

 

For us to handle such a large amount of information, the training must be guided by a smaller set of learned rules. LVQ (Learning Vector Quantization) is a method that combines supervised and unsupervised learning to find a set of weight (codebook) vectors from the dataset, which can then be used to classify future data correctly.

 

1. CRM and KDD Cup 2009

1.1 Background

For many companies, factors such as world trade, warehousing costs, and customer needs are all important conditions affecting their operations. Customer Relationship Management (CRM) is an approach that can assist organizations to serve their customers better. CRM helps to identify valuable customers, assess their needs, and provide more personalized service. It also streamlines the handling of enquiries and requests, resulting in higher operational efficiency and more rapid responses to customers.

 

CRM provides a highly automated system to generate and track sales leads, to monitor the performance of individual products or sales professionals, and to measure the results of sales campaigns on a wide range of parameters.

 

1.2 Why CRM?

The highest goal of CRM is to make sure your organization keeps its customers happy, discovering and solving problems as they come up in order to produce robust revenue and profits.

 

1.3 Proposal

This proposal uses the existing dataset to predict which customers have future potential. Each customer is divided into one of three types, namely churn, appetency, and up-selling; with these three types identified, the company will be able to develop these customers and meet their needs.

 

2. Method (LVQ)

2.1 Vector Quantization

Vector quantization (VQ) networks are intended to be used for classification. Like unsupervised networks, the VQ network is based on a set of codebook vectors. Each class has a subset of the codebook vectors associated with it, and a data vector is assigned to the class to which the closest codebook vector belongs. In the neural network literature, the codebook vectors are often called the neurons of the VQ network. In contrast to unsupervised nets, VQ networks are trained with supervised training algorithms. This means that you need to supply output data indicating the correct class of any particular input vector during the training.
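As a concrete illustration, the following minimal Python sketch classifies a data vector by its nearest codebook vector; it assumes NumPy, and the codebook, its class labels, and the test points are made-up stand-ins rather than anything taken from the KDD Cup data.

import numpy as np

# Hypothetical codebook: each row is a codebook vector (neuron), and
# labels[i] is the class that codebook vector i is associated with.
codebook = np.array([[0.0, 0.0],
                     [1.0, 1.0],
                     [4.0, 5.0]])
labels = np.array([0, 0, 1])  # two neurons for class 0, one for class 1

def vq_classify(x, codebook, labels):
    """Assign x to the class of the nearest codebook vector."""
    distances = np.linalg.norm(codebook - x, axis=1)
    return labels[np.argmin(distances)]

print(vq_classify(np.array([0.8, 0.9]), codebook, labels))  # -> 0
print(vq_classify(np.array([3.5, 4.0]), codebook, labels))  # -> 1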

2.2 Unsupervised and Vector Quantization (VQ) Networks

Unsupervised algorithms are used to find structures in the data. They can, for instance, be used to find clusters of data points, or to find a one-dimensional relation in the data. If such a structure exists, it can be used to describe the data in a more compact way.

Most network models in the package are trained with supervised training algorithms. This means that the desired output must be available for each input vector used in the training. Unsupervised networks, or self-organizing networks, rely only on input data and try to find structures in the input data space. The training algorithms are therefore called unsupervised.

Since there is no "correct" output, there will also not be any "incorrect" outputs. This fact leaves a lot of responsibility to the user. After an unsupervised network has been trained, it must be tested to show that it makes sense, that is, if the obtained structure is really representing the data. This validation can be very tricky, especially if you work in a high-dimensional space. In two- or three-dimensional problems, you can always plot the data and the obtained structure and simply examine them. Another test that can be applied in any number of dimensions is to check for the mean distance between the data points and the obtained cluster centers. A small mean distance means that the data is well represented by the clusters.
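The mean-distance check described above can be computed directly. The following short Python sketch assumes NumPy and uses randomly generated stand-in data and cluster centers rather than real results.

import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(200, 3))    # 200 stand-in points in 3 dimensions
centers = rng.normal(size=(5, 3))   # 5 hypothetical cluster centers

# Distance from every data point to every center, then keep the nearest.
dists = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
mean_quantization_error = dists.min(axis=1).mean()
print(mean_quantization_error)  # smaller means the clusters fit the data better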

An unsupervised network consists of a number of codebook vectors, which constitute cluster centers. The codebook vectors are of the same dimension as the input space, and their components are the parameters of the unsupervised network. The codebook vectors are called the neurons of the unsupervised network.

An unsupervised network can employ a neighbor feature. This gives rise to a self-organizing map (SOM). For SOM networks not only the mean distance between the data and nearest codebook vector is minimized, but also the distance between the codebook vectors. In this way it is possible to define one- or two-dimensional relations among the codebook vectors, and the obtained SOM unsupervised network becomes a nonlinear mapping from the original data space to the one- or two-dimensional feature space defined by the codebook vectors. Self-organizing maps are often called self-organizing feature maps, or Kohonen networks.

When the data set has been mapped by a SOM to a one- or two-dimensional space, it can be plotted and investigated visually.
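To make the neighbor feature concrete, the sketch below performs one SOM training step on a one-dimensional grid of codebook vectors, moving the winner and its grid neighbors toward the input; the grid size, learning rate, and neighborhood radius are illustrative assumptions, not values chosen for this project.

import numpy as np

rng = np.random.default_rng(1)
som = rng.normal(size=(10, 3))   # 10 codebook vectors on a 1-D grid, 3-D inputs

def som_step(som, x, lr=0.1, radius=2):
    """Move the winning codebook vector and its grid neighbors toward x."""
    winner = np.argmin(np.linalg.norm(som - x, axis=1))
    for i in range(len(som)):
        grid_dist = abs(i - winner)  # distance along the 1-D grid, not in data space
        if grid_dist <= radius:
            h = np.exp(-grid_dist**2 / (2 * radius**2))  # neighborhood weight
            som[i] += lr * h * (x - som[i])
    return som

som = som_step(som, np.array([0.5, -0.2, 1.0]))

Because neighbors on the grid are pulled along with the winner, nearby codebook vectors end up representing nearby regions of the data space, which is what makes the one- or two-dimensional mapping possible.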

Another neural network type that has some similarities to the unsupervised one is the Vector Quantization (VQ) network, whose intended use is classification. Like unsupervised networks, the VQ network is based on a set of codebook vectors. Each class has a subset of the codebook vectors associated with it, and a data vector is classified into the class to which the closest codebook vector belongs. In the neural network literature, the codebook vectors are often called the neurons of the VQ network.

Each of the codebook vectors has a part of the space "belonging" to it. These subsets form polygons and are called Voronoi cells. In two-dimensional problems you can plot these Voronoi cells.
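For example, with SciPy and Matplotlib available, the Voronoi cells of a small set of two-dimensional codebook vectors can be plotted as follows; the codebook here is a made-up example.

import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial import Voronoi, voronoi_plot_2d

codebook = np.array([[0, 0], [1, 1], [4, 5], [2, 0], [0, 3]], dtype=float)
vor = Voronoi(codebook)        # each region "belongs" to one codebook vector
voronoi_plot_2d(vor)
plt.title("Voronoi cells of the codebook vectors")
plt.show()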

2.3 Learning Vector Quantization (LVQ)

The positions of the codebook vectors are obtained with a supervised training algorithm, and you have two different ones to choose from. The default one is called Learning Vector Quantization (LVQ), and it adjusts the positions of the codebook vectors using both correctly and incorrectly classified data. The second training algorithm is the competitive training algorithm, which is also used for unsupervised networks. For VQ networks this training algorithm can be used by considering the data and the codebook vectors of a specific class independently of the rest of the data and the rest of the codebook vectors. In contrast to the unsupervised networks, output data indicating the correct class are also necessary; they are used to divide the input data among the different classes.

LVQ is an acronym for Learning Vector Quantization; since a translated name does not convey the idea well, it is referred to here simply as LVQ. It combines unsupervised and supervised learning and is used specifically for classification. An LVQ neural network has two layers in total: the first is a competitive layer that assigns inputs to subclasses, and the second is a linear layer that maps the subclasses onto the main categories. The LVQ neural network architecture is shown in the figure below.

[Figure: LVQ neural network architecture]
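A minimal Python sketch of the LVQ1 update rule follows, assuming NumPy; the two-class toy data, learning rate, and codebook size are illustrative assumptions rather than the actual KDD Cup 2009 configuration.

import numpy as np

def lvq1_train(X, y, codebook, codebook_labels, lr=0.05, epochs=20):
    """LVQ1 rule: move the winning codebook vector toward a correctly
    classified input and away from an incorrectly classified one."""
    for _ in range(epochs):
        for x, label in zip(X, y):
            winner = np.argmin(np.linalg.norm(codebook - x, axis=1))
            sign = 1.0 if codebook_labels[winner] == label else -1.0
            codebook[winner] += sign * lr * (x - codebook[winner])
    return codebook

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(4, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
codebook = rng.normal(2, 2, (4, 2))        # two subclass neurons per class
codebook_labels = np.array([0, 0, 1, 1])   # fixed class label of each neuron
codebook = lvq1_train(X, y, codebook, codebook_labels)

Moving the winner toward correctly classified inputs and away from misclassified ones is what distinguishes LVQ from purely competitive (unsupervised) training.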

2.4 Difficulties

2.4.1 Computation

Vector Quantization was first mentioned by Gray in 1984; its drawback is that the computation time and memory requirements of its implementation grow with the number of prototype points and the number of feature variables.
Even when VQ does not use a large codebook, its efficiency is still excellent; nevertheless, it has shortcomings. To overcome the limitations of VQ and improve its efficiency, many VQ-based methods have been put forward, such as finite-state vector quantization (FS-VQ), address-VQ, predictive-VQ, SOC-VQ, LCIC-VQ, STC-VQ, DST-VQ, and so on; approaches based on the correlation between neighboring blocks have appeared in succession. However, some of these methods have a higher distortion rate, some achieve only a small improvement in compression, and most of them require a large amount of computation.
Since there is no way here to reduce that computational complexity itself, the only option is to reduce the amount of data in order to shorten the training time.
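As a simple illustration of reducing the amount of data, the following Python sketch randomly subsamples a stand-in dataset of roughly the KDD Cup scale; the feature matrix is synthetic and the 10% sampling rate is an assumption for illustration only.

import numpy as np

rng = np.random.default_rng(3)
n_samples = 50000                       # roughly the KDD Cup 2009 scale
X = rng.normal(size=(n_samples, 20))    # stand-in feature matrix
y = rng.integers(0, 2, size=n_samples)  # stand-in labels

keep = rng.choice(n_samples, size=n_samples // 10, replace=False)
X_small, y_small = X[keep], y[keep]     # train on 10% of the data
print(X_small.shape)                    # (5000, 20)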

 

2.4.2 About the Dataset

For each customer ID, conditions such as the number of purchases and salary level determine the weight distribution, so VQ is used to compute the weights of the classification matrix in order to achieve this goal.

 

2.4.3 Noise Filter

A noise category must also be considered: some clients probably have no development potential, and too much time should not be wasted on such clients. The data can therefore be divided into four or more categories, but this must be determined from the data and confirmed by observation.

 

3. Expected Results

3.1 Churn, Appetency, and Up-selling

The data are expected to be divided into the three main categories, after which we observe which type each customer belongs to.
Since the first layer of LVQ is a competitive layer, the competition must produce a winner and losers: the winner outputs 1 and the losers output 0. The network must also converge; that is, each input is always assigned to the category with the highest similarity, without exception.
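The winner-outputs-1, losers-output-0 behavior and the linear layer that maps subclasses to main categories can be sketched in Python as follows; the codebook and the subclass-to-class matrix are made-up illustrations, not the trained network.

import numpy as np

def lvq_forward(x, codebook, subclass_to_class):
    """Two-layer LVQ forward pass: competitive layer then linear layer."""
    winner = np.argmin(np.linalg.norm(codebook - x, axis=1))
    competitive_out = np.zeros(len(codebook))
    competitive_out[winner] = 1.0            # winner outputs 1, losers output 0
    # Linear layer as a 0/1 matrix: rows are main classes, columns subclasses.
    return subclass_to_class @ competitive_out

codebook = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 5.0], [5.0, 4.0]])
# 2 main classes, 4 subclasses: subclasses 0,1 -> class 0; 2,3 -> class 1.
subclass_to_class = np.array([[1, 1, 0, 0],
                              [0, 0, 1, 1]], dtype=float)
print(lvq_forward(np.array([0.9, 1.1]), codebook, subclass_to_class))  # [1. 0.]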

 

3.2 Reliability
We want a correct classification, but after the sub-categorization is finished the accuracy cannot be guaranteed. Therefore, each of the three broad categories can be further divided into two sub-categories, and distinguishing between these two sub-categories improves the accuracy of the results.