FEATURE SUBSET SELECTION FOR HIGH DIMENSIONAL DATA BASED ON CLUSTERING
Keywords:
Markov Blanket, MST Creation, Gaussian Distribution, Shannon Infogain, Bayesian Probability, Fuzzy Logic
Abstract
Feature selection is the process of evaluating the available data and extracting the desired part of it, grouped into
subsets that retain the integrity of the original data. A feature selection algorithm should be both efficient and effective:
efficient means that it requires minimal time, and effective means that the quality of the generated subset is not
compromised. We propose an algorithm that consists of the following steps: Markov Blanket, Shannon Infogain
(information gain), Minimum Spanning Tree, Tree Partition, Gaussian Distribution, and Bayesian Probability. Applying
these steps yields the desired subset from the resulting clusters. Our system removes irrelevant data as well as
redundant data, which most existing systems fail to do.
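
To illustrate how such a pipeline can remove both irrelevant and redundant features, the following is a minimal Python sketch of three of the listed steps only: Shannon information gain, minimum spanning tree construction, and tree partition. It is not the implementation described in this paper. The abstract does not specify the details, so the function names (entropy, info_gain, select_features), the thresholds (relevance_min, cut_above), the distance definition, and the toy data are all assumptions made for illustration. The Markov Blanket, Gaussian Distribution, and Bayesian Probability steps are left out because the abstract gives no details about them.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def info_gain(x, y):
    """Information gain between two discrete variables: H(X) + H(Y) - H(X, Y)."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def select_features(features, target, relevance_min=0.1, cut_above=0.5):
    """Hypothetical helper: features maps a name to a list of discrete values."""
    # Step 1 (irrelevance): keep features that carry information about the target.
    gain = {name: info_gain(col, target) for name, col in features.items()}
    relevant = [name for name in features if gain[name] > relevance_min]
    if not relevant:
        return []

    # Assumed distance: 0 for fully redundant features, 1 for independent ones.
    def dist(a, b):
        h = max(entropy(features[a]), entropy(features[b]))
        return 1.0 if h == 0 else 1.0 - info_gain(features[a], features[b]) / h

    # Step 2 (MST): Prim's algorithm over the complete graph of relevant features.
    in_tree = {relevant[0]}
    edges = []
    while len(in_tree) < len(relevant):
        w, a, b = min(
            (dist(a, b), a, b)
            for a in in_tree
            for b in relevant
            if b not in in_tree
        )
        in_tree.add(b)
        edges.append((w, a, b))

    # Step 3 (tree partition): drop long edges; what stays connected is a cluster.
    parent = {name: name for name in relevant}

    def find(name):
        while parent[name] != name:
            name = parent[name]
        return name

    for w, a, b in edges:
        if w <= cut_above:
            parent[find(a)] = find(b)

    clusters = {}
    for name in relevant:
        clusters.setdefault(find(name), []).append(name)

    # Step 4 (redundancy): keep one representative per cluster, the most relevant.
    return sorted(max(members, key=gain.get) for members in clusters.values())


# Toy data (made up): the target is determined by the two independent bits a and b.
a = [0, 0, 0, 0, 1, 1, 1, 1]
b = [0, 0, 1, 1, 0, 0, 1, 1]
target = [2 * ai + bi for ai, bi in zip(a, b)]
features = {
    "a": a,
    "a_copy": [1 - v for v in a],        # redundant: carries the same information as a
    "b": b,
    "noise": [0, 1, 0, 1, 0, 1, 0, 1],   # irrelevant: independent of the target
}
selected = select_features(features, target)
print(selected)
```

In this toy run, "noise" is dropped in the first step because its information gain with respect to the target is zero, and "a" and "a_copy" fall into the same cluster, so only one of them is kept alongside "b". The thresholds would need tuning on real data.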