Conclusions - A Machine Learning Approach to Phishing Detection and Defense (2015)

Chapter 6. Conclusions

Abstract

This is the concluding chapter of this study, and it recaps the previous chapters. First, a concluding remark discusses the importance of this study in mitigating the risk that phishing websites pose to online users. The objectives of the study are then summarized according to the phases of the research methodology discussed in Chapter 3. Second, our contributions to this research area are analyzed and discussed for the different stages of implementation. Because the research area is broad, in Chapter 1 we limited the scope of our research to what was feasible in the available time. During the course of the study, we also identified prospective areas in which our research could be expanded in the future. This brings us to the concluding section of our work, which discusses our recommendations and future research areas that can improve website phishing detection and mitigation. Finally, a closing remark briefly restates the problem and the effectiveness of our solution approach.

Keywords

fraud

confidentiality

classifier

performance

ensemble

cross-validation

6.1. Concluding remarks

Phishing is an effective tool for attackers: among other uses, it lets them defraud online users and trick them into divulging confidential information. Phishing detection tools therefore play a vital role in ensuring a secure online experience. Unfortunately, many existing phishing-detection tools, especially those that depend on a blacklist, suffer from limitations such as low detection accuracy and a high false alarm rate. These are often caused by delays in updating the blacklist, because classification involves a human verification process, or by human error during classification, which can lead to instances being assigned to the wrong class. These critical issues have drawn many researchers to work on approaches that improve the detection accuracy of phishing attacks and minimize the false alarm rate. The inconsistent nature of attack behaviors and continuously changing phishing URL patterns require the reference model to be updated in a timely manner. An effective technique is therefore required to regulate retraining, so that the machine learning algorithm can actively adapt to changes in phishing patterns.

This study focuses on investigating a better detection approach and designing an ensemble of classifiers suitable for phishing detection. Figure 6.1 summarizes the design and implementation phases leading to the proposed detection model.

FIG. 6.1 Design and development phases leading to the proposed model.

Phase 1 focuses on dataset gathering, preprocessing, and feature extraction; its objective is to prepare the data for use in Phase 2. Data were gathered manually using Google crawler and Phishtank, and each of these gathering methods was tested to ensure valid output. After gathering, the dataset is first validated and then normalized; features are then extracted, and finally the dataset is divided. Nine features were selected for this project, both to obtain optimum results from the classifiers and because a small feature set speeds up training and the classification of new instances. The features were chosen by weighting the contribution of each feature with the information gain algorithm, so that only the best features were retained. This phase ensures that the dataset is preprocessed appropriately for the selected models.
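The information gain ranking used to pick the nine features can be sketched as follows. This is a minimal illustration that assumes discrete (for example, binary) feature values; the chapter does not list its nine features here, so the feature names and toy rows below are hypothetical, as are the helper names `entropy`, `information_gain`, and `top_k_features`.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """IG(Y; X) = H(Y) - sum over values v of P(X = v) * H(Y | X = v)."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def top_k_features(rows, labels, names, k=9):
    """Rank feature columns by information gain and keep the k highest-scoring ones."""
    scores = {
        name: information_gain([row[j] for row in rows], labels)
        for j, name in enumerate(names)
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical binary URL features; not the study's actual feature set.
rows = [(1, 0, 1), (1, 1, 0), (0, 0, 1), (0, 1, 0)]
labels = ["phish", "phish", "legit", "legit"]
names = ["has_ip_address", "long_url", "has_at_symbol"]
print(top_k_features(rows, labels, names, k=1))  # ['has_ip_address']
```

In the toy data only `has_ip_address` separates the two classes, so it receives the full 1 bit of gain while the other two columns score zero.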

Phase 2 focuses on the design and implementation of training and validation using single classifiers. A predefined set of performance metrics, namely accuracy, precision, recall, and F-measure, is used for measurement. The objective of this phase is to test the performance of the individual classifiers on the varying dataset divisions described in Chapter 4 and to select the best-performing reference classifier. K-NN obtained an accuracy of 99.37%, the highest among the reference classifiers. K-NN and C4.5 performed within a close range of each other, but the remaining two classifiers lagged behind. The performance of K-NN is not surprising, since the dataset used is small: K-NN often performs well on small datasets, but its performance decreases as the dataset size increases (Kim and Huh, 2011). In addition, because the performance of K-NN is determined primarily by the choice of K, the best K was found by varying K from 1 to 7; K-NN performed best with K = 1. This also contributed to the high accuracy of K-NN compared with the other classifiers.
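The evaluation in Phase 2 (the four metrics, plus the search for K over 1 to 7) can be sketched as below. This is a from-scratch illustration under our own assumptions: Euclidean distance, label `1` as the phishing class, and K chosen on a held-out split; the function names and toy points are hypothetical and are not the study's data or tooling.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Label x by majority vote among its k nearest training points (Euclidean distance)."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    return Counter(train_y[i] for i in order[:k]).most_common(1)[0][0]


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F-measure, with `positive` as the phishing class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f_measure": f_measure}


def best_k(train_X, train_y, test_X, test_y, k_values=range(1, 8)):
    """Evaluate K = 1..7 and return (best K, accuracy per K); ties go to the smallest K."""
    accuracy = {}
    for k in k_values:
        preds = [knn_predict(train_X, train_y, x, k) for x in test_X]
        accuracy[k] = binary_metrics(test_y, preds)["accuracy"]
    return max(accuracy, key=accuracy.get), accuracy


# Toy two-class data (1 = phishing, 0 = legitimate); not the study's dataset.
train_X = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6)]
train_y = [0, 0, 0, 0, 1, 1, 1, 1]
k, per_k = best_k(train_X, train_y, [(0.5, 0.5), (5.5, 5.5)], [0, 1])
print(k, per_k[k])
```

On this well-separated toy data every K scores the same, so the smallest K wins the tie; on real data the accuracy curve over K is what drives the choice.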

Phase 3, which corresponds to the third objective, has two parts: the ensemble design, and a comparative study between the best ensemble and the best individual classifier selected in Phase 2. Because majority voting was selected as the ensemble algorithm, each committee must contain an odd number of algorithms so that a vote between the two classes cannot end in a tie; three algorithms are therefore used in each ensemble. Committees of three algorithms are formed until all the algorithms have been combined evenly. The designed ensembles performed very well, with an accuracy of 99.31% for the best-performing ensemble, and this result was then compared with the result obtained in Phase 2. The comparison suggests that if the K-NN algorithm is removed, or if the size of the dataset is increased, the ensemble will most likely perform better than the individual algorithm. This investigation is left for future work.
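Majority voting over a committee of three can be sketched as follows. This is a minimal illustration, not the study's implementation: `majority_vote` is a hypothetical helper, and the per-instance predictions are made-up values used only to show the mechanics. The odd-size check reflects the reasoning above: with two classes and an odd number of voters, one label always has a strict majority.

```python
from collections import Counter


def majority_vote(*predictions):
    """Combine per-instance labels from an odd-sized committee by simple majority."""
    if len(predictions) % 2 == 0:
        raise ValueError("use an odd number of classifiers so a two-class vote cannot tie")
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


# Illustrative predictions (1 = phishing, 0 = legitimate) from a committee of three.
knn = [1, 0, 1, 1]
c45 = [1, 1, 0, 1]
svm = [0, 0, 0, 1]
print(majority_vote(knn, c45, svm))  # [1, 0, 0, 1]
```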

6.2. Research contribution

This section lists the contributions that resulted from this research. The following subsections discuss the major contributions.

6.2.1. Dataset Preprocessing Technique

The non-phishing URLs used in this project were collected manually, and the phishing URLs were extracted from the Phishtank repository; the remaining preprocessing and feature extraction were carried out as part of the objectives of this project. During preprocessing, every URL in the dataset was tested and confirmed to be alive, since phishing websites are often online for only a limited time and most of them go offline after a couple of days. This ensures that no bogus results are presented and that the results collected in the implementation phase are accurate. A major problem observed in similar research is that there is no assurance that the data collected from phishing and non-phishing repositories were tested to be alive, so the results may be subject to error.
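A liveness filter of the kind described above might look like the sketch below. It is an assumption-laden illustration rather than the study's procedure: it uses the Python standard library's `urllib.request` with an HTTP `HEAD` request and a 5-second timeout, and the helper names `is_alive` and `filter_alive` are hypothetical. Some servers reject `HEAD`, so a production check might need to fall back to `GET`.

```python
import http.client
import urllib.request


def is_alive(url, timeout=5.0):
    """Return True if a HEAD request to url succeeds with a 2xx/3xx status."""
    try:
        request = urllib.request.Request(url, method="HEAD")  # raises ValueError on malformed URLs
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (OSError, ValueError, http.client.HTTPException):
        # URLError, HTTPError (4xx/5xx) and timeouts are all OSError subclasses.
        return False


def filter_alive(urls, check=is_alive):
    """Keep only the URLs that `check` reports as reachable."""
    return [url for url in urls if check(url)]


# Offline demonstration with a stand-in checker, so no network access is needed.
urls = ["http://example.com/login", "http://example.org/verify"]
print(filter_alive(urls, check=lambda url: url.endswith("/login")))
```

Passing the checker as a parameter keeps the filtering logic testable without network access; in practice `filter_alive(urls)` would call `is_alive` for each URL.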

6.2.2. Validation Technique

The dataset was validated with the individual algorithms nine times to ensure that the right validation setting was selected. The cross-validation settings were [10, 20, 30, …, 90]; the results for each performance metric were then averaged and their standard deviation calculated, to confirm that the deviation between settings was small enough to justify the cross-validation setting used. In most related research, cross-validation is set within the range [5, 10, 15]; a wider range was tested here to verify the performance of the algorithms and to show beyond doubt that the cross-validation setting used is the most efficient. Although it turned out that any of the tested settings could be used, since the deviation between the results is negligible, the importance of certainty cannot be overemphasized.
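The comparison across cross-validation settings can be sketched as below. We assume that the values 10, 20, …, 90 are numbers of folds in k-fold cross-validation, and we report accuracy only, whereas the study averaged every performance metric. The 1-nearest-neighbour stand-in classifier, the synthetic one-feature data, and all function names are hypothetical.

```python
import random
import statistics


def k_fold_accuracy(X, y, k, fit_predict, seed=0):
    """Mean accuracy over k folds; fit_predict(train_X, train_y, test_X) returns labels."""
    if not 2 <= k <= len(X):
        raise ValueError("k must be between 2 and the number of instances")
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint, near-equal folds
    fold_acc = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        preds = fit_predict([X[i] for i in train], [y[i] for i in train], [X[i] for i in fold])
        fold_acc.append(sum(p == y[i] for p, i in zip(preds, fold)) / len(fold))
    return statistics.mean(fold_acc)


def compare_fold_settings(X, y, fit_predict, fold_counts=range(10, 100, 10)):
    """Accuracy for each k in 10, 20, ..., 90, plus the mean and standard deviation across k."""
    per_k = {k: k_fold_accuracy(X, y, k, fit_predict) for k in fold_counts}
    scores = list(per_k.values())
    return per_k, statistics.mean(scores), statistics.stdev(scores)


def one_nn(train_X, train_y, test_X):
    """Stand-in classifier: 1-nearest neighbour on a single numeric feature."""
    return [train_y[min(range(len(train_X)), key=lambda i: abs(train_X[i] - x))]
            for x in test_X]


# Synthetic one-feature data; illustrative only, not the study's dataset.
rng = random.Random(42)
X = [rng.random() for _ in range(200)]
y = [int(x > 0.5) for x in X]
per_k, mean_acc, std_acc = compare_fold_settings(X, y, one_nn)
print(f"mean accuracy {mean_acc:.4f}, std across settings {std_acc:.4f}")
```

A small standard deviation across the nine settings is the signal the section describes: it indicates that the choice of setting barely changes the measured performance.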

6.2.3. Design Ensemble Method

In most research, especially work involving majority voting, four algorithms are often used and the least-performing classifier is removed. In this project, however, the last two algorithms performed almost the same, so one of them could not be removed without bias. For this reason, each algorithm was matched with the other algorithms in committees of three, which yields four ensembles. This resolves the problem of selecting one of two closely performing classifiers.
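Forming every committee of three from four classifiers is a direct use of combinations, sketched below with Python's `itertools.combinations`. The chapter names K-NN, C4.5, and SVM but not the fourth reference classifier, so `"D"` is a hypothetical placeholder, and `all_committees` is our own helper name. Choosing 3 of 4 gives exactly four committees, and each classifier sits on three of them, which is what keeps the design unbiased.

```python
from collections import Counter
from itertools import combinations


def all_committees(classifiers, size=3):
    """Return every committee of `size` classifiers, so none has to be dropped."""
    if size % 2 == 0:
        raise ValueError("majority voting needs an odd committee size")
    committees = list(combinations(classifiers, size))
    # Every classifier sits on the same number of committees, C(n - 1, size - 1).
    appearances = Counter(name for committee in committees for name in committee)
    assert len(set(appearances.values())) == 1
    return committees


# "D" is a hypothetical placeholder for the fourth reference classifier.
CLASSIFIERS = ("K-NN", "C4.5", "SVM", "D")
for committee in all_committees(CLASSIFIERS):
    print(committee)
```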

6.3. Research implication

Most studies have focused on phishing detection using data that was already preprocessed. When a selected set of features is extracted during preprocessing, it is easier to develop a dataset that is entirely suitable for phishing detection. This was carried out in Chapter 4, where each selected feature was scrutinized on the basis of its weighted impact. Hence, a more satisfactory classification rate was achieved.

6.4. Recommendations for future research

This research uncovers the following possible areas for further research:

1. Other types of voting could be implemented in order to find the most efficient of them.

2. A dataset with wider variation could be used to improve the performance of C4.5 and SVM, as both tend to perform better as the dataset size increases.

3. Other features could be substituted for the current features, and the impact of increasing the number of features could be studied, to better understand the classifiers' ability to classify the dataset correctly under varying feature selections.
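As one example of the alternative voting schemes mentioned in the first recommendation, weighted voting lets each classifier's vote count in proportion to a weight, such as its validation accuracy. The sketch below is only an illustration of that general technique, not something evaluated in this study; the classifier names `A`, `B`, `C`, their weights, and the helper `weighted_vote` are all hypothetical.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Per instance, return the label backed by the largest total classifier weight.

    predictions: classifier name -> list of predicted labels (equal lengths)
    weights: classifier name -> non-negative weight, e.g. a validation accuracy
    """
    names = list(predictions)
    n_instances = len(predictions[names[0]])
    combined = []
    for i in range(n_instances):
        support = defaultdict(float)
        for name in names:
            support[predictions[name][i]] += weights[name]
        combined.append(max(support, key=support.get))
    return combined


# Hypothetical classifiers and weights, chosen only to illustrate the mechanics.
predictions = {"A": [1, 0], "B": [0, 0], "C": [0, 1]}
weights = {"A": 0.9, "B": 0.3, "C": 0.4}
print(weighted_vote(predictions, weights))  # [1, 0]; a plain majority would give [0, 0]
```

With equal weights this reduces to simple majority voting, which makes it a natural candidate for the comparison the recommendation proposes.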

6.5. Closing note

This book addresses the research problem comprehensively through the synergy of different machine learning algorithms, used both individually and in designed ensembles for training and validation. The issue of low detection rate has been addressed through selective feature extraction, a well-performing machine learning algorithm, and an unbiased committee of ensembles. This has also resolved the problem of high false alarm rates during classification. Finally, the ensemble committees were composed without bias, by designing ensembles in which all the reference algorithms participate. In addition, this research opens up new research opportunities to enhance the ensemble model by testing new ensemble methods, so that the best possible ensemble of classifiers is used.