Class: Electrical Engineering, Master's Year 1    ID: M9707303    Name: 林志餘

Proposal

In the original proposal, we planned to use Weka to complete this study. The first problem was memory. I tried to open the dataset in Office 2003, SQL Server 2008, and Access 2007, and found that only UltraEdit and Office 2007 could open it completely, so we used Office 2007 to convert the dataset to .csv files and increased Weka's maximum heap size. Adjusting the heap did not let Weka load the full dataset, however: the largest file it could load was close to 300 MB. I therefore decided to reduce the dataset to about 300 MB. The method is listed below.

 

1.    Re-save the data as 60 files, each containing 250 variables and 50,000 instances.

2.    Load each file into Weka, review it, and delete the variables with more than 90% missing values and those with more than 90% zeros.

3.    Randomly select variables so that 230 variables remain.

4.    Load the data into Weka and use NaiveBayes to train and predict.
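Steps 1 to 3 of the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually used (the splitting was done in C++ and the deletion in the Weka GUI): the function names are hypothetical, missing values are assumed to appear as blanks or "?", and "more than 90%" is applied to missing values and to zeros separately.

```python
import random

MISSING = {"", "?"}  # assumption: blanks or "?" mark missing values


def column_chunks(n_columns, chunk_size=250):
    """Yield (start, stop) index ranges that cut n_columns into chunks."""
    for start in range(0, n_columns, chunk_size):
        yield start, min(start + chunk_size, n_columns)


def mostly_missing_or_zero(values, threshold=0.9):
    """True if more than `threshold` of the values are missing, or are zero."""
    n = len(values)
    if n == 0:
        return True
    missing = zeros = 0
    for v in values:
        v = v.strip()
        if v in MISSING:
            missing += 1
            continue
        try:
            if float(v) == 0.0:
                zeros += 1
        except ValueError:
            pass  # non-numeric (categorical) value, so not a zero
    return missing / n > threshold or zeros / n > threshold


def select_columns(header, rows, keep=230, threshold=0.9, seed=0):
    """Drop sparse columns, then randomly keep at most `keep` of the rest."""
    columns = list(zip(*rows))  # transpose: one tuple per column
    usable = [i for i, col in enumerate(columns)
              if not mostly_missing_or_zero(col, threshold)]
    if len(usable) > keep:
        rng = random.Random(seed)  # fixed seed so the choice is repeatable
        usable = sorted(rng.sample(usable, keep))
    new_header = [header[i] for i in usable]
    new_rows = [[row[i] for i in usable] for row in rows]
    return new_header, new_rows
```

With `chunk_size=250`, `column_chunks(60 * 250)` yields 60 ranges, which matches the 60 files of 250 variables described above. `select_columns` expects the header and rows as lists of strings, for example as read with Python's `csv.reader`.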

 

Procedure

1.    Use C++ to open the dataset and re-save the data as 60 files.

2.    Load the files into Weka one by one and delete variables.

[Figure: 未命名.bmp]

(1)  Select the variables to delete.

(2)  Click Remove to delete the selected variables.

(3)  Save the data.

3.    Use Office 2007 to combine the data into a single file.

4.    Load the file and save it as an .arff file.

[Figure: 未命名2.bmp]

(1)  Choose Save as.

(2)  Select the file type.

(3)  Save.

5.    Change the label attribute from numeric to class (nominal).

[Figure: 未命名3.bmp]

6.    Load the file into Weka, then train and predict.

[Figure: 未命名4.bmp]

(1)  Select the training model.

(2)  Choose whether to evaluate on the training set or a test set.

(3)  Start training.

7.    Save the result.

[Figure: 未命名5.bmp]
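The label conversion step above (changing the label attribute from a number to a class) amounts to editing one line of the .arff header so that the attribute is declared with a value set instead of as numeric. The following is a minimal sketch of one way to do that outside the Weka GUI; the function name and the attribute name `label` are hypothetical, and the class values -1 and 1 are taken from the result tables in this report.

```python
def make_label_nominal(arff_text, label_name="label", classes=("-1", "1")):
    """Rewrite the header declaration of `label_name` as a nominal attribute."""
    out = []
    in_header = True
    for line in arff_text.splitlines():
        parts = line.split()
        if (in_header and len(parts) >= 3
                and parts[0].lower() == "@attribute"
                and parts[1].strip("'\"") == label_name):
            # e.g. "@attribute label numeric" -> "@attribute label {-1,1}"
            line = "@attribute {} {{{}}}".format(parts[1], ",".join(classes))
        elif parts and parts[0].lower() == "@data":
            in_header = False  # data rows follow and are left untouched
        out.append(line)
    return "\n".join(out) + "\n"
```

Only the header line is changed; the data rows stay as they are, since the values -1 and 1 are already valid members of the declared set.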

Result

 

appetency     Class 1    Class -1
Class 1           122         786
Class -1         4569       44541
AUC           Sensitivity    0.134361
              Specificity    0.906964

 

churn         Class 1    Class -1
Class 1          2867         805
Class -1        31270       15058
AUC           Sensitivity    0.780773
              Specificity    0.32503

 

up-selling    Class 1    Class -1
Class 1          3101         581
Class -1        31567       14751
AUC           Sensitivity    0.842205
              Specificity    0.318472

 

appetency 10%    Class 1    Class -1
Class 1               15          87
Class -1             431        4467
AUC              Sensitivity    0.147059
                 Specificity    0.912005

 

churn 10%     Class 1    Class -1
Class 1           291          70
Class -1         3480        1159
AUC           Sensitivity    0.806094
              Specificity    0.249838

 

up-selling 10%    Class 1    Class -1
Class 1               295          88
Class -1             3012        1605
AUC               Sensitivity    0.770235
                  Specificity    0.347628
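The sensitivity and specificity values reported above can be recomputed from the four counts of each confusion matrix. The sketch below assumes that each row is the actual class and each column the predicted class, a reading that is consistent with the reported numbers; the function name is hypothetical.

```python
def sensitivity_specificity(tp, fn, fp, tn):
    """Return (sensitivity, specificity) from the four confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of actual Class 1 predicted as Class 1
    specificity = tn / (fp + tn)  # share of actual Class -1 predicted as Class -1
    return sensitivity, specificity


# Counts of the full appetency matrix, read row by row.
sens, spec = sensitivity_specificity(tp=122, fn=786, fp=4569, tn=44541)
print(f"{sens:.6f} {spec:.6f}")  # prints "0.134361 0.906964"
```

The same call with the counts of any other matrix reproduces the values listed with it, for example `sensitivity_specificity(2867, 805, 31270, 15058)` for churn.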

Prediction results