A REVIEW OF FEATURE EXTRACTION METHODS FOR TEXT CLASSIFICATION

Resham N. Waykole; Anuradha D. Thakare

Authors

Resham N. Waykole, Department of Computer Engineering, Pimpri Chinchwad College of Engineering
Anuradha D. Thakare, Department of Computer Engineering, Pimpri Chinchwad College of Engineering

Keywords:

Natural Language Processing, Feature Extraction, Classification, Bag of Words, TF-IDF, Word2Vec, Logistic Regression, Random Forest Classifier

Abstract

Natural Language Processing (NLP) and Machine Learning are widely recognised in today’s digitalization
of data. Over time, the value of data keeps changing, and capturing that value is important for in-depth
research in various domains. Over the past decade, natural language processing has gained importance because it
reveals information in texts that would otherwise go unseen. It is difficult to discover the information of interest
in a huge volume of text data, so information extraction based on computational text processing is necessary. For many
information management goals, recognising the phrases and words in free text that fall under particular classes of
interest is an important first step. It is crucial to manage the huge amount of text that is being generated, for
example clinical and biomedical text. Features can be extracted from documents in order to classify them. Feature
extraction selects an important subset of features from the data to improve the classification task, so correctly
identifying the relevant features in a text is important. Applying and extending NLP techniques can therefore help to
better understand and study the data. This paper analyses the clinical literature on cancer. The feature extraction
methods bag of words, TF-IDF, and Word2Vec are compared for clinical text analysis. The extracted features are
evaluated with Logistic Regression and Random Forest classifiers.
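
To make two of the compared feature extraction methods concrete, the following minimal sketch computes bag-of-words counts and TF-IDF weights using only the Python standard library. It is an illustration under stated assumptions, not the authors' implementation: the three toy sentences, the whitespace tokenizer, the function names, and the unsmoothed weighting tf × log(N/df) are all assumed here, and common libraries use smoothed or normalised variants. Word2Vec and the two classifiers are left out because they require a trained model or an external library.

```python
import math
from collections import Counter

# Toy corpus: hypothetical sentences, not taken from the paper's data.
docs = [
    "tumor cells spread to the lung",
    "the patient received chemotherapy",
    "lung tumor responded to chemotherapy",
]


def tokenize(text):
    """Lowercase and split on whitespace (an assumed, very simple tokenizer)."""
    return text.lower().split()


def build_vocabulary(documents):
    """Return the sorted list of distinct tokens across all documents."""
    return sorted({tok for doc in documents for tok in tokenize(doc)})


def bag_of_words(doc, vocab):
    """Represent one document as raw term counts over the vocabulary."""
    counts = Counter(tokenize(doc))
    return [counts[term] for term in vocab]


def tf_idf(documents, vocab):
    """Weight each term by tf * log(N / df), the plain unsmoothed variant."""
    n_docs = len(documents)
    tokenized = [tokenize(d) for d in documents]
    # Document frequency: number of documents containing the term.
    df = {t: sum(1 for toks in tokenized if t in toks) for t in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append(
            [(counts[t] / len(toks)) * math.log(n_docs / df[t]) for t in vocab]
        )
    return vectors


vocab = build_vocabulary(docs)
bow = [bag_of_words(d, vocab) for d in docs]
tfidf = tf_idf(docs, vocab)
```

Each row of `bow` or `tfidf` is a fixed-length feature vector that a classifier could take as input. In this toy corpus, a term that occurs in only one document (such as "cells") receives a higher TF-IDF weight than a term shared by two documents (such as "tumor"), whereas bag of words gives both the same count.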
