EDA-DB discretization method to extract essential features for endoscopic gastritis dataset
Abstract
Since most real-world applications of classification learning involve continuous-valued attributes, extracting data patterns from raw data is an important task. The main purpose of this project is to build a discretization algorithm that extracts essential features using a boundary cut-point technique known as Entropy-based Discretization According to Distribution of Boundary Points (EDA-DB). A cut point is defined as the midpoint between each successive pair of values in the sorted sequence of attribute values, and a boundary cut point is a cut point involving examples of different classes. The project is developed using MATLAB and WEKA, with the Decision Stump and Random Forest classifiers, on an endoscopic gastritis data set. EDA-DB selects minimum-entropy boundary cut points that are spread out within an interval. As a result of the discretization process, well-generalized data patterns of endoscopic gastritis are generated, and essential features are also produced. Thus, determining discretized data patterns from the extracted endoscopic gastritis features may improve the overall classification process.
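
The two definitions above (cut point and boundary cut point) can be illustrated with a minimal Python sketch. It is not the EDA-DB algorithm itself: it only enumerates boundary cut points for one attribute and picks the one with the lowest class-weighted entropy, which is the usual entropy criterion and is assumed here for illustration. The function names (`boundary_cut_points`, `split_entropy`, `best_boundary_cut`) and the toy data are hypothetical and do not come from the project.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def boundary_cut_points(values, labels):
    """Midpoints between successive distinct values that involve different classes."""
    # Collect the set of classes observed at each distinct attribute value.
    classes_at = {}
    for v, y in zip(values, labels):
        classes_at.setdefault(v, set()).add(y)
    distinct = sorted(classes_at)
    cuts = []
    for lo, hi in zip(distinct, distinct[1:]):
        # A cut is a boundary cut if its two neighbours do not share one single class.
        if len(classes_at[lo] | classes_at[hi]) > 1:
            cuts.append((lo + hi) / 2)
    return cuts


def split_entropy(values, labels, cut):
    """Class entropy of the two sides of a cut, weighted by their sizes."""
    left = [y for v, y in zip(values, labels) if v <= cut]
    right = [y for v, y in zip(values, labels) if v > cut]
    n = len(labels)
    return len(left) / n * entropy(left) + len(right) / n * entropy(right)


def best_boundary_cut(values, labels):
    """Boundary cut point with minimum weighted entropy, or None if there is none."""
    cuts = boundary_cut_points(values, labels)
    if not cuts:
        return None
    return min(cuts, key=lambda c: split_entropy(values, labels, c))


# Toy attribute (assumed data, not taken from the gastritis data set).
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
labels = ["a", "a", "a", "b", "b", "b"]
print(boundary_cut_points(values, labels))
print(best_boundary_cut(values, labels))
```

Restricting the search to boundary cut points reduces the number of candidates that need an entropy evaluation; how EDA-DB then uses the distribution of those points within an interval is not covered by this sketch.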