The Power of Communities: A Text Classification Model with Automated Labelling Using Network Community Detection

Text classification is one of the most critical areas of machine learning and artificial intelligence research. A recurring problem in developing text classification models is that model performance depends on the quality of the labels, which are typically assigned by humans. In this study, we propose a new approach based on network community detection that automatically labels text data and classifies it into multiple classes. Specifically, we build a network with sentences as nodes and pairwise cosine similarities between sentences as link weights. We use the Louvain method [1] to group sentences into classes, and train Support Vector Machine [2] and Random Forest [3] models for classification using the community labels as part of the features. Models trained on data labeled by network community detection outperformed models trained on human-labeled data, with classification accuracy 2.68–3.75% higher on the test data provided by Pypestream. Our method may help in developing more accurate conversational intelligence systems and other text classification systems.
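
The labelling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes plain bag-of-words sentence vectors, and in place of the full Louvain method it runs only Louvain's first (local-moving) phase, without the community aggregation step. The function names `cosine`, `build_network`, and `local_moving` are hypothetical, and classifier training is omitted.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def build_network(sentences):
    """Sentences are nodes; w[i][j] is the cosine similarity of i and j."""
    # Assumption: whitespace tokenisation and raw term counts.
    vecs = [Counter(s.lower().split()) for s in sentences]
    w = {i: {} for i in range(len(vecs))}
    for i in range(len(vecs)):
        for j in range(i + 1, len(vecs)):
            sim = cosine(vecs[i], vecs[j])
            if sim > 0:
                w[i][j] = w[j][i] = sim
    return w


def local_moving(w):
    """Louvain phase 1 only: greedily move each node to the neighbouring
    community with the largest modularity gain until nothing moves."""
    degree = {i: sum(nbrs.values()) for i, nbrs in w.items()}
    m2 = sum(degree.values())  # twice the total link weight
    comm = {i: i for i in w}   # start with one community per node
    if m2 == 0:
        return comm
    tot = dict(degree)         # total weighted degree per community
    improved = True
    while improved:
        improved = False
        for i in w:
            old = comm[i]
            tot[old] -= degree[i]  # take i out of its community
            # Link weight from i to each neighbouring community.
            links = Counter()
            for j, wij in w[i].items():
                links[comm[j]] += wij
            # Gain is proportional to k_in - tot * k_i / (2m).
            best = old
            best_gain = links[old] - tot[old] * degree[i] / m2
            for c, k_in in links.items():
                gain = k_in - tot[c] * degree[i] / m2
                if gain > best_gain + 1e-12:
                    best, best_gain = c, gain
            tot[best] += degree[i]
            comm[i] = best
            if best != old:
                improved = True
    return comm


if __name__ == "__main__":
    # Made-up example sentences, not data from the study.
    sentences = [
        "reset account password",
        "forgot account password",
        "track order shipment",
        "order shipment delayed",
    ]
    labels = local_moving(build_network(sentences))
    for idx, sentence in enumerate(sentences):
        print(labels[idx], sentence)
```

The resulting community ids play the role of automatically generated labels. In the pipeline the abstract describes, they would then be used as part of the features for Support Vector Machine and Random Forest classifiers, a step this sketch leaves out.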

Session: 
Authors: Minjun Kim and Hiroki Sayama
Room: 4
Date: Monday, September 24, 2018 - 12:45 to 13:00
