Department: Electrical Engineering (first-year master's)  Student ID: M9707303  Name: 林志餘

 

Neural Networks Proposal

 

Abstract

This proposal aims to learn about neural networks and data mining through the KDD Cup 2009 challenge. KDD Cup is the leading data mining and knowledge discovery competition in the world. It is organized by ACM SIGKDD (the Special Interest Group on Knowledge Discovery and Data Mining), the leading professional organization of data miners. The KDD Cup 2009 challenge uses a Customer Relationship Management (CRM) dataset, and CRM is a key element of modern marketing strategies. In the large version of the dataset, the first 14,740 variables are numerical and the last 260 are categorical. We will use this dataset and the Weka data mining tool to train models that predict the propensity of customers to switch provider (churn), to buy new products or services (appetency), or to buy upgrades or add-ons that make the sale more profitable (up-selling).

Wikipedia definitions:

Churn: Churn rate is also sometimes called attrition rate. It is one of two primary factors that determine the steady-state level of customers a business will support. In its broadest sense, churn rate is a measure of the number of individuals or items moving into or out of a collection over a specific period of time. The term is used in many contexts, but is most widely applied in business with respect to a contractual customer base. For instance, it is an important factor for any business with a subscriber-based service model, including mobile telephone networks and pay TV operators. The term is also used to refer to participant turnover in peer-to-peer networks.

 

Appetency: In our context, the appetency is the propensity to buy a service or a product.

 

Up-selling: Up-selling is a sales technique whereby a salesman attempts to have the customer purchase more expensive items, upgrades, or other add-ons in an attempt to make a more profitable sale. Up-selling usually involves marketing more profitable services or products, but up-selling can also be simply exposing the customer to other options he or she may not have considered previously. Up-selling can imply selling something additional, or selling something that is more profitable or otherwise preferable for the seller instead of the original sale.
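As a small worked example of the churn definition above: a common convention, assumed here because the definition does not fix one, is churn rate = (customers lost during the period) / (customers at the start of the period). Other conventions, such as dividing by the average number of customers during the period, also exist. The Python function churn_rate below is a hypothetical helper, not part of the KDD Cup data or of Weka.

```python
def churn_rate(customers_at_start: int, customers_lost: int) -> float:
    """Fraction of the starting customer base lost during one period.

    Assumes the convention churn = lost / customers at the start of the period.
    """
    if customers_at_start <= 0:
        raise ValueError("customers_at_start must be positive")
    if not 0 <= customers_lost <= customers_at_start:
        raise ValueError("customers_lost must be between 0 and customers_at_start")
    return customers_lost / customers_at_start


if __name__ == "__main__":
    # A subscriber service starts the month with 2,000 customers and loses 50.
    print(f"{churn_rate(2000, 50):.1%}")  # prints 2.5%
```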

 

Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka is an open-source tool. Development of its original version started in 1993 at the University of Waikato in New Zealand, and today it is widely used for data mining.
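Weka's standard input format is ARFF (Attribute-Relation File Format), a text file that declares each attribute as NUMERIC or nominal before listing the data rows. As a sketch of how a table could be prepared for Weka, the hypothetical Python helper to_arff below writes such a file. It assumes missing values are given as None (written as ARFF's missing-value marker ?) and that attribute names and nominal values contain no spaces, commas, or quotes, since those would need quoting.

```python
from typing import Sequence, Union

Value = Union[float, str, None]


def to_arff(relation: str, names: Sequence[str], numeric: Sequence[bool],
            rows: Sequence[Sequence[Value]]) -> str:
    """Render a small table as ARFF text (hypothetical helper, not part of Weka).

    numeric[i] True  -> attribute i is declared NUMERIC.
    numeric[i] False -> attribute i is declared nominal, listing the distinct
                        non-missing values found in that column (sorted).
    None values are written as '?', ARFF's marker for a missing value.
    """
    lines = [f"@RELATION {relation}", ""]
    for i, (name, is_numeric) in enumerate(zip(names, numeric)):
        if is_numeric:
            lines.append(f"@ATTRIBUTE {name} NUMERIC")
        else:
            levels = sorted({str(r[i]) for r in rows if r[i] is not None})
            lines.append(f"@ATTRIBUTE {name} {{{','.join(levels)}}}")
    lines += ["", "@DATA"]
    for r in rows:
        lines.append(",".join("?" if v is None else str(v) for v in r))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Toy table: one numeric and one categorical variable, with missing values.
    text = to_arff("crm_sample", ["Var1", "Var191"], [True, False],
                   [[0.5, "a"], [None, "b"], [2.0, None]])
    print(text)
```

The resulting text could be saved with an .arff extension and opened in the Weka Explorer.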

 

 

History (Wikipedia):

·     In 1993, the University of Waikato in New Zealand started development of the original version of Weka (which became a mixture of TCL/TK, C, and Makefiles).

·     All-time ranking on Sourceforge.net as of 2008-11-21: 257 (with 1,362,483 downloads)

 

Study proposal

1. Study background

The goal of this proposal is to take part in the KDD Cup 2009 challenge. The challenge provides a Customer Relationship Management dataset in which the first 14,740 variables are numerical and the last 260 are categorical. We will use the Weka data mining tool to train models on this dataset and try to predict churn, appetency, and up-selling. The ACM SIGKDD conference is the premier international forum for data mining researchers and practitioners from academia, industry, and government to share their ideas, research results, and experiences. Data mining is currently a very active research area; it is used to find useful information in large datasets. Knowledge discovery and data mining is an interdisciplinary field that requires cross-fertilization of several fields, so SIGKDD works closely with other related ACM SIGs, including SIGMOD, SIGART, SIGMIS, SIGIR, SIGCHI, SIGGRAPH, etc., as well as with non-ACM societies such as AAAI, IEEE, and the ASA (American Statistical Association).

 

2.  Study method

We plan to proceed in three steps. The first step is to analyze the dataset. The KDD Cup 2009 dataset is very large, so we must reduce it: we propose to load only part of the data into Weka and use Weka to observe which portions of the data may be removed.
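One way to load less data into Weka without first reading the whole large file into memory is to draw a uniform random sample of rows in a single pass. The sketch below uses reservoir sampling (Algorithm R), which is our choice of technique rather than something the challenge prescribes; the function name sample_rows, the file name in the comment, and the one-example-per-line file layout are assumptions.

```python
import random
from typing import Iterable, List, Optional


def sample_rows(lines: Iterable[str], k: int, seed: Optional[int] = None) -> List[str]:
    """Keep a uniform random sample of k lines from a stream (reservoir sampling).

    At most k lines are held in memory, so the data file never has to fit in RAM.
    If the stream has fewer than k lines, all of them are returned in order.
    """
    rng = random.Random(seed)
    reservoir: List[str] = []
    for i, line in enumerate(lines):
        if i < k:
            reservoir.append(line)
        else:
            # Line i (0-based) replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = line
    return reservoir


if __name__ == "__main__":
    # Hypothetical usage on a large data file (file name is an assumption):
    # with open("train.data") as f:
    #     header = next(f)
    #     subset = sample_rows(f, 5000, seed=1)
    subset = sample_rows((f"row {i}" for i in range(100_000)), 5, seed=1)
    print(subset)
```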

 

Fig 1. Observing the data in Weka.

 

                       

Fig 2. Observing the data in Weka: (a) all zeros, (b) two values, (c) continuous data.
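The three kinds of numeric variables in Fig 2 can also be separated automatically, by counting distinct values and computing the Shannon entropy H = -Σ p(v) log2 p(v) over the values v of a column. A constant column such as an all-zero one has H = 0 and carries no information. The rule in the sketch below (at most one distinct value = constant, two = binary, otherwise continuous) and the function names are hypothetical; missing values are simply skipped.

```python
import math
from collections import Counter
from typing import Hashable, Optional, Sequence


def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy in bits, written as H = sum over values of p * log2(1 / p)."""
    counts = Counter(values)
    n = len(values)
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def column_kind(values: Sequence[Optional[float]]) -> str:
    """Label a numeric column like the panels of Fig 2 (hypothetical rule).

    'constant'   - at most one distinct value, e.g. all 0; entropy 0, candidate to drop
    'binary'     - exactly two distinct values; may behave like a class variable
    'continuous' - more than two distinct values
    Missing values (None) are ignored.
    """
    present = [v for v in values if v is not None]
    distinct = len(set(present))
    if distinct <= 1:
        return "constant"
    if distinct == 2:
        return "binary"
    return "continuous"


if __name__ == "__main__":
    print(column_kind([0, 0, 0, 0]), entropy([0, 0, 0, 0]))  # constant 0.0
    print(column_kind([0, 1, 1, 0]), entropy([0, 1, 1, 0]))  # binary 1.0
    print(column_kind([0.3, 1.7, 2.2, None]))                # continuous
```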

 

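Judging from the data format, variables that are all 0 may be removed: a constant variable has an entropy of zero, so it carries no information for prediction. A variable with only two values may be a categorical (class-like) variable. The second step is to select the training model. The dataset does not include the output values, so we cannot use a supervised learning network to train the neural network; in addition, the dataset does not describe the attributes of the variables, so we cannot understand the relationship between the variables and the targets. Therefore we cannot yet determine which category of model to use; our guess is that the model will be an unsupervised learning network. The third step is the result: completing the prediction of churn, appetency, and up-selling.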
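The text above expects an unsupervised learning network. As one illustration of what such a network does, the sketch below implements winner-take-all competitive learning, the basic update rule of a Kohonen self-organizing map without the neighbourhood function. It is not a Weka component, and the unit count, learning rate, epoch count, and toy data are arbitrary assumptions. For each input, only the closest unit (the winner) moves part of the way toward it, so the units drift toward the centres of groups of similar records.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def nearest_unit(weights: Sequence[Sequence[float]], x: Sequence[float]) -> int:
    """Index of the unit whose weight vector is closest to x (Euclidean distance)."""
    return min(range(len(weights)), key=lambda k: math.dist(weights[k], x))


def train_competitive(data: Sequence[Sequence[float]], n_units: int,
                      epochs: int = 20, rate: float = 0.1, seed: int = 0) -> List[Vector]:
    """Winner-take-all competitive learning (no neighbourhood function).

    Units start on randomly chosen input records; for each input only the
    winning unit moves a fraction `rate` of the way toward that input.
    """
    rng = random.Random(seed)
    weights = [list(map(float, p)) for p in rng.sample(list(data), n_units)]
    order = list(range(len(data)))
    for _ in range(epochs):
        rng.shuffle(order)  # present the records in a new random order each epoch
        for i in order:
            x = data[i]
            w = weights[nearest_unit(weights, x)]
            for d in range(len(w)):
                w[d] += rate * (x[d] - w[d])
    return weights


if __name__ == "__main__":
    # Toy two-group data standing in for (already reduced) numeric CRM variables.
    data = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
    units = train_competitive(data, n_units=2)
    print(units)
    print([nearest_unit(units, x) for x in data])  # group index of each record
```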

 

 

            TABLE 1. PROBLEM AND SOLUTION

3. Anticipated completion scheme

The goal of this proposal is to complete the KDD Cup 2009 challenge of predicting churn, appetency, and up-selling. So far we have only completed an initial study of Weka: we can load data and train on it with a Back Propagation Network. We are now examining the characteristics of the dataset; the current state of this study is described in point 2. TABLE 2 shows my scheme.
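To make the term concrete, the sketch below shows what training a Back Propagation Network involves: one hidden layer of sigmoid units, trained by gradient descent on the squared error E = ½(t - y)², with the output error propagated back to the hidden weights. It is plain Python for illustration and not Weka's implementation (in Weka, back-propagation networks are provided by the MultilayerPerceptron classifier). The class and function names, layer size, learning rate, and epoch count are assumptions, and the toy OR data stands in for the CRM data.

```python
import math
import random
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class BackPropNet:
    """Minimal network: one hidden layer, one sigmoid output (illustrative only)."""

    def __init__(self, n_in: int, n_hidden: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Each hidden unit: n_in input weights followed by a bias weight.
        self.w_h = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                    for _ in range(n_hidden)]
        # Output unit: n_hidden weights followed by a bias weight.
        self.w_o = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(self, x: Sequence[float]) -> Tuple[List[float], float]:
        """Return (hidden activations, output) for one input vector."""
        xb = list(x) + [1.0]  # constant 1.0 input feeds the bias weight
        h = [sigmoid(sum(w * v for w, v in zip(ws, xb))) for ws in self.w_h]
        y = sigmoid(sum(w * v for w, v in zip(self.w_o, h + [1.0])))
        return h, y

    def train_step(self, x: Sequence[float], t: float, rate: float) -> None:
        """One gradient-descent update on E = 0.5 * (t - y)**2 for a single example."""
        h, y = self.forward(x)
        xb, hb = list(x) + [1.0], h + [1.0]
        delta_o = (y - t) * y * (1.0 - y)  # dE/d(net input of the output unit)
        # Hidden deltas use the output weights *before* this step's update.
        delta_h = [delta_o * self.w_o[j] * h[j] * (1.0 - h[j]) for j in range(len(h))]
        for j in range(len(self.w_o)):
            self.w_o[j] -= rate * delta_o * hb[j]
        for j, ws in enumerate(self.w_h):
            for i in range(len(ws)):
                ws[i] -= rate * delta_h[j] * xb[i]

    def total_error(self, data: Sequence[Example]) -> float:
        return sum(0.5 * (t - self.forward(x)[1]) ** 2 for x, t in data)


def train(net: BackPropNet, data: Sequence[Example], epochs: int, rate: float) -> None:
    """Online (per-example) back-propagation for a fixed number of epochs."""
    for _ in range(epochs):
        for x, t in data:
            net.train_step(x, t, rate)


if __name__ == "__main__":
    data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]
    net = BackPropNet(n_in=2, n_hidden=3, seed=0)
    print("error before:", net.total_error(data))
    train(net, data, epochs=5000, rate=0.5)
    print("error after: ", net.total_error(data))
```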

 

TABLE 2. SCHEME