Artificial Neural Networks Research Proposal, Spring Semester, Academic Year 97 (First-year CS master's student Wang, M9715053)

I. Abstract (in Chinese and English): please summarize the main points of this proposal and define keywords according to its nature. (Within 500 words)

1. Background

Because the customer base is a key factor in company profitability, every company must do its best to retain old customers and attract new ones. Many marketing strategies exist, such as CRM (Customer Relationship Management), in which a company interacts closely with its customers to understand and influence their behavior. It is a business management approach aimed at improving customer acquisition, customer retention, customer loyalty, and customer profitability.

2. Task Description

The task is to estimate the probabilities of the three types of customers. There are three target values to be predicted, and a large number of variables (15,000) are made available for prediction. The data set includes numerical and categorical variables and has unbalanced class distributions, so time efficiency is a crucial point.

3. Data Set

(1) Instances: 50,000 (in both the training set and the testing set)

(2) Variables: 14,740 numerical and 260 categorical

(3) Types: churn, appetency, up-selling

(4) Labels: 1 refers to positive and -1 to negative for each type

(5) There are some missing values in the data set

(6) Churn, appetency, and up-selling are three separate binary classification problems.

4. Three types of customers

(1) Churn: It is one of two
primary factors that determine the steady-state level of customers a business
will support. In its broadest sense, churn rate is a measure of the number of
individuals or items moving into or out of a collection over a specific period
of time.

(2) Appetency: Appetency is the propensity to buy a service or a product.

(3) Up-selling: Up-selling is a
sales technique whereby a salesman attempts to have the customer purchase more
expensive items, upgrades, or other add-ons in an attempt to make a more
profitable sale.

II. Research Proposal Content:

(1) Background and objectives of the proposal. Please describe in detail the background, objectives, and importance of this proposal, the state of related research at home and abroad, and a review of important references.

(2) Research methods, procedure, and schedule. Please list: 1. the research methods adopted in this proposal and the reasons for them; 2. anticipated difficulties and ways to resolve them.

(3) Expected work items and results. Please list: 1. the expected work items.

1. Task Description

(1) The task is to estimate the probabilities of the three types of customers. The data set comes from large marketing databases of the French telecom company Orange, and the goal is to predict the propensity of customers to switch providers (churn), buy new products or services (appetency), or buy upgrades or add-ons proposed to them to make the sale more profitable (up-selling). Hence, there are three target values to be predicted, and a large number of variables (15,000) are made available for prediction. The data set includes numerical and categorical variables and has unbalanced class distributions.

(2) The main objective is to
make good predictions of the target variables.

(3) Predictions are evaluated according to the arithmetic mean of the AUC for the three labels (churn, appetency, and up-selling).

- Sensitivity and specificity: the official rules define sensitivity (also called true positive rate or hit rate) and specificity (true negative rate) as follows:

Sensitivity = tp/pos

Specificity = tn/neg

where pos = tp + fn is the total number of positive examples and neg = tn + fp is the total number of negative examples.

- Area Under Curve (AUC): it corresponds to the area under the curve obtained by plotting sensitivity against specificity while varying a threshold on the prediction values to determine the classification result.
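These definitions can be sketched in a few lines of Python. The rank-based AUC below (a correctly ordered positive-negative pair counts 1, a tie counts 1/2) is a standard equivalent of the threshold-sweep definition; the function names are illustrative, not part of the challenge:

```python
def sensitivity_specificity(y_true, y_pred):
    """Labels are +1/-1, as in the challenge data."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == -1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == -1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)  # tp/pos, tn/neg


def auc(y_true, scores):
    # Probability that a randomly drawn positive example is scored above a
    # randomly drawn negative one; this rank statistic equals the ROC area.
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == -1]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The final challenge score is then the arithmetic mean of auc() over the churn, appetency, and up-selling labels.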

2. Method

To process the large data set, the first step is to reduce the data size. By analyzing the data set, I discovered several properties, described below.

(1) Data Sets

- Small Training Set

  - Appetency: 890

  - Churn: 3672

  - Up-selling: 3682

  - Non-target: 41756 (83%)

  - Null columns: 18

  - Each instance carries each label

  - Some whole columns are almost evenly divisible by an integer

  - Many columns are positively correlated

- Large Training Set

  - Appetency: 890

  - Churn: 3672

  - Up-selling: 3682

  - Non-target: 41756

  - Null columns: 107

  - Each instance carries each label

- The bias of the training data set is very obvious.

(2) Data Preprocessing

- Data Cleaning

  - Delete null columns

- Nominal to Numeric

  - Convert each string by concatenating the ASCII codes of its characters

  - Ex: ra_5c → 11497955399

- Normalization

  - Standard-deviation normalization

  - Used to fill missing values

- Continuous values divided by some integer

  - Some attribute values are evenly divisible
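The nominal-to-numeric and normalization steps above can be sketched as follows. Filling a missing value with the column mean (a z-score of 0) is my reading of "to fill missing values", and the function names are illustrative:

```python
def nominal_to_numeric(s):
    # Concatenate the ASCII codes of each character,
    # reproducing the example: "ra_5c" -> 11497955399
    return int("".join(str(ord(ch)) for ch in s))


def standardize(column):
    # Standard-deviation (z-score) normalization over the known values;
    # missing entries (None) are filled with the mean, i.e. a z-score of 0.
    known = [v for v in column if v is not None]
    mean = sum(known) / len(known)
    var = sum((v - mean) ** 2 for v in known) / len(known)
    std = var ** 0.5 or 1.0  # guard against constant columns
    return [0.0 if v is None else (v - mean) / std for v in column]
```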

(3) Feature Selection

- Various attributes: 51 attributes

- Monotonous attributes: 161 attributes

- GainRatio score

  - Pick the top 15 attributes

- Normalized numeric only + 3 labels

  - 174 attributes

- For the large data set

  - Min, Max, Mean, Stdev

  - Null count, Numeric count, Nominal count
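The GainRatio ranking is computed by Weka in practice; as a sketch of what the score measures, gain ratio is the information gain of an attribute divided by its split information (the entropy of the attribute itself). All names below are mine:

```python
from collections import Counter
from math import log2


def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def gain_ratio(feature, labels):
    # Information gain of splitting `labels` by the discrete `feature`,
    # normalized by the split information (entropy of the feature).
    n = len(labels)
    groups = {}
    for f, y in zip(feature, labels):
        groups.setdefault(f, []).append(y)
    info_gain = entropy(labels) - sum(
        len(g) / n * entropy(g) for g in groups.values())
    split_info = entropy(feature)
    return info_gain / split_info if split_info else 0.0


def top_k(columns, labels, k=15):
    # Rank attributes by gain ratio and keep the best k (k=15 in this plan).
    ranked = sorted(columns,
                    key=lambda name: gain_ratio(columns[name], labels),
                    reverse=True)
    return ranked[:k]
```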

(4) Training Model

- MultilayerPerceptron

  - A classifier that uses backpropagation to classify instances. The network can be built by hand, created by an algorithm, or both, and it can also be monitored and modified during training. The nodes in this network are all sigmoid.

- BayesNet
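The experiments use Weka's MultilayerPerceptron; purely as a toy illustration of the all-sigmoid backpropagation network described above (the class name, architecture, and learning rate here are my own choices, not Weka's):

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyMLP:
    # One hidden layer, all-sigmoid nodes, trained with plain backpropagation.
    def __init__(self, n_in, n_hid, lr=0.5, seed=0):
        rnd = random.Random(seed)
        self.w1 = [[rnd.uniform(-1, 1) for _ in range(n_in + 1)]
                   for _ in range(n_hid)]            # hidden weights (+ bias)
        self.w2 = [rnd.uniform(-1, 1) for _ in range(n_hid + 1)]
        self.lr = lr

    def forward(self, x):
        xb = x + [1.0]                               # append bias input
        self.h = [sigmoid(sum(w * v for w, v in zip(ws, xb)))
                  for ws in self.w1]
        hb = self.h + [1.0]
        self.out = sigmoid(sum(w * v for w, v in zip(self.w2, hb)))
        return self.out

    def train_step(self, x, target):
        out = self.forward(x)
        d_out = (out - target) * out * (1 - out)     # sigmoid derivative
        d_hid = [d_out * self.w2[j] * h * (1 - h)
                 for j, h in enumerate(self.h)]
        hb = self.h + [1.0]
        for j in range(len(self.w2)):                # output-layer update
            self.w2[j] -= self.lr * d_out * hb[j]
        xb = x + [1.0]
        for j, dh in enumerate(d_hid):               # hidden-layer update
            for i in range(len(xb)):
                self.w1[j][i] -= self.lr * dh * xb[i]
        return (out - target) ** 2
```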

(5) Evaluation

- The baseline is Naïve Bayes, whose performance is shown below.

(6) Challenges

- It is difficult to process the large data set efficiently with simple analysis such as feature distributions. A personal computer cannot handle the large data in a short time, so I will reduce the data size first.

- Second, I have no CRM background knowledge. In fact, randomly selecting features could result in bad performance.

- Hence, I must consider both the data bias and correct feature selection to achieve good performance.
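One common way to address such bias (an assumption of mine, not necessarily the strategy used in these experiments) is to downsample the non-target majority separately for each label, keeping all positives and only a matching number of negatives:

```python
import random


def downsample_majority(instances, labels, target, ratio=1.0, seed=0):
    # Keep every positive example of `target` and randomly keep only
    # ratio * n_pos negatives, countering the 83% non-target bias.
    rnd = random.Random(seed)
    pos = [(x, y) for x, y in zip(instances, labels) if y == target]
    neg = [(x, y) for x, y in zip(instances, labels) if y != target]
    keep = rnd.sample(neg, min(len(neg), int(ratio * len(pos))))
    sample = pos + keep
    rnd.shuffle(sample)
    return sample
```

With ratio=1.0 this yields a balanced training set per target, at the cost of discarding most negative examples.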

3. Experiment Environment & Results

(1) Preprocessing:

- Re-label: appetency → a, churn → c, up-selling → u, the others → n

(2) Results

- 51 (various) / 161 (monotonous) attributes: the effective attributes are hidden among the various and monotonous attributes.

- MultiLayerPerceptron (1): 50000 instances, 212 normalized attributes

- MultiLayerPerceptron (2): 50000 instances, 174 normalized numerical attributes

- It seems that converting nominal values to numeric with ASCII codes does not work.

- BayesNet (1): more information is hidden within the various attributes.

- BayesNet (2)

- Obviously, BayesNet fits the data set better than MultiLayerPerceptron.

- BayesNet (3): the result is amazing, but I cannot apply the same processes to the testing data set. Besides, it is obviously over-fitting.

- MultiLayerPerceptron (4), large training data set: my strategy does not work.

4. Conclusion

(1) Categorical attributes may be the key factor

(2) Overcome the bias of the data set without any background knowledge

(3) Adjust parameters to fit the data set

(4) A Probabilistic Neural Network (PNN) may be applicable

(5) Different targets call for different models

(6) Understanding the tool's parameters is important

5. References

(1) 類神經網路模式應用與實作 (Applications and Implementation of Neural Network Models), Yeh I-Cheng, Ru-Lin Books

(2) Training set optimization methods for a probabilistic neural network, Mark H. Hammond, Chemometrics and Intelligent Laboratory Systems 71, 2004, pp. 73-78

(3) MapReduce: Simplified Data Processing on Large Clusters, Jeffrey Dean and Sanjay Ghemawat, OSDI'04: Sixth Symposium on Operating System Design and Implementation, 2004, pp. 107-113