A Neutral Zone Classifier for Three Classes with an Application to Text Mining
Conference
64th ISI World Statistics Congress
Format: IPS Abstract
Keywords: classification, text analysis
Session: IPS 421 - Data Science in Statistics: methodological and applied issues
Thursday 20 July 10 a.m. - noon (Canada/Eastern)
Abstract
A classifier may be limited more by its conditional misclassification rates than by its achievable overall misclassification rate. When one or more of the conditional misclassification rates are high, a neutral zone may be introduced to lower, and possibly balance, those rates. In this talk we discuss a novel neutral zone for three-class classifiers and examine some of its properties.
Our application is the analysis of student evaluations of teaching. While the use of numerical Likert scale response data has been fervently discussed in the literature, comparatively little attention has been given to how written comments could be used to provide a fuller picture of student satisfaction.
We show how our neutral zone classifier can work in this setting by labeling individual comments as reflecting a positive, mixed, or negative overall experience in the class, and then assigning a comment to the neutral zone when the evidence for any one of the three labels is not sufficiently strong. The proportions of comments within a given course that are positive, mixed, or negative can then serve as a summary statistic for what could otherwise be a large corpus of text comments that might not be read. In addition, we analyze the distributions of positive, mixed, and negative comments across gender and ethnicity groups to examine any differences that might parallel those found in analyses of the numerical Likert scale questions.
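The labeling and summary steps can be illustrated with a minimal sketch. The abstract does not specify the neutral zone rule, so the rule used here (withhold a label unless the largest estimated class probability reaches a cutoff) is only an assumed stand-in, not the classifier presented in the talk. The function names, the cutoff of 0.6, and the example probabilities are all hypothetical.

```python
from collections import Counter

LABELS = ("positive", "mixed", "negative")
NEUTRAL = "neutral"


def classify_with_neutral_zone(probs, threshold=0.6):
    """Return the most probable label, or NEUTRAL if the evidence is weak.

    probs: dict mapping each label in LABELS to an estimated probability.
    threshold: assumed cutoff; the talk's actual rule may differ.
    """
    best = max(LABELS, key=lambda label: probs[label])
    return best if probs[best] >= threshold else NEUTRAL


def label_proportions(assigned):
    """Proportion of comments in each category, including the neutral zone."""
    counts = Counter(assigned)
    n = len(assigned)
    return {label: counts[label] / n for label in LABELS + (NEUTRAL,)}


# Hypothetical class probabilities for four comments from one course.
comments = [
    {"positive": 0.85, "mixed": 0.10, "negative": 0.05},
    {"positive": 0.40, "mixed": 0.35, "negative": 0.25},  # no clear winner
    {"positive": 0.05, "mixed": 0.20, "negative": 0.75},
    {"positive": 0.15, "mixed": 0.70, "negative": 0.15},
]

assigned = [classify_with_neutral_zone(p) for p in comments]
summary = label_proportions(assigned)
print(assigned)
print(summary)
```

Raising the cutoff sends more comments to the neutral zone, which would typically lower the misclassification rates among the comments that do receive a label, at the cost of labeling fewer of them.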