Max-difference maximization criterion: A feature selection method for text categorization

Text categorization depends on feature selection to pick a set of features (terms) with high discriminative power. One common text feature selection measure, Accuracy2 (ACC2), gives the same score to terms that have the same absolute difference in document rates between classes even when their discriminative power differs, which is unreasonable. Existing improvements on ACC2, namely the normalized difference measure (NDM), max-min ratio (MMR) and trigonometric comparison measure (TCM), can misjudge the importance of rare and sparse terms because their parameters are difficult to choose.
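The ACC2 limitation described above can be illustrated with a minimal sketch. It assumes the commonly used definition of ACC2 as the absolute difference between a term's document rate in the positive class and its document rate in the negative class; the function name and the document counts below are hypothetical and are not taken from the paper.

```python
def acc2(tp, fp, n_pos, n_neg):
    """Assumed ACC2 definition: |tpr - fpr| for a single term."""
    tpr = tp / n_pos  # share of positive-class documents containing the term
    fpr = fp / n_neg  # share of negative-class documents containing the term
    return abs(tpr - fpr)

# Hypothetical counts with 100 documents in each class.
# A term that is frequent in both classes:
common_term = acc2(tp=75, fp=50, n_pos=100, n_neg=100)
# A term that appears only in the positive class:
exclusive_term = acc2(tp=25, fp=0, n_pos=100, n_neg=100)

print(common_term, exclusive_term)
```

Both terms score 0.25, although the second one never occurs in the negative class and is arguably the more discriminative of the two. This is the tie that weighted variants of ACC2 try to break.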

To address these problems, a research team led by Li Zhang published new research in Frontiers of Computer Science.

The team proposed the max-difference maximization criterion (MDMC), which introduces a new weight based on class information occupancy and combines it with ACC2 to estimate the importance of terms. As a result, MDMC avoids overestimating sparse terms.

In the study, the researchers analyzed the weight distributions of ACC2, NDM, MMR, TCM and MDMC to show intuitively how MDMC estimates the importance of terms; this analysis is available in the paper's online resources. Experiments demonstrate that MDMC, which requires no parameters, captures more discriminative terms than the other filter methods regardless of the classifier used. It also outperforms other dimensionality reduction methods: the improved sine cosine algorithm (ISCA), principal component analysis (PCA) and non-negative matrix factorization (NMF).

More information: Lingbin Jin et al., Max-difference maximization criterion: a feature selection method for text categorization, Frontiers of Computer Science (2023). DOI: 10.1007/s11704-022-2154-x

Provided by Higher Education Press