| Depression Classification with Imbalanced Data Problems: Literature Survey | |
|---|---|
| DOI | |
| Creator | Artitayaporn Rojarath |
| Title | Depression Classification with Imbalanced Data Problems: Literature Survey |
| Contributor | Wararat Songpan, Olarik Surinta |
| Publisher | Faculty of Engineering Mahasarakham University |
| Publication Year | 2568 BE (2025 CE) |
| Journal Title | Engineering Access |
| Journal Vol. | 11 |
| Journal No. | 2 |
| Page no. | 185-199 |
| Keyword | Depressive classification, Imbalanced data, Resampling method, Oversampling technique, Machine learning |
| URL Website | https://ph02.tci-thaijo.org/index.php/mijet/index |
| Website title | THAIJO Engineering Access |
| ISSN | 2730-4175 |
| Abstract | Depression is an increasingly serious global mental health concern, with the number of affected individuals rising steadily. In Thailand, more than 70% of the working-age population is at risk of developing depressive conditions, as reported by the Thai Depression Center. A significant challenge in depression research is the issue of imbalanced datasets, where the number of depressive cases (minority class) is significantly lower than non-depressive cases (majority class). This imbalance often results in biased classification models that favor the majority class, thereby reducing the accuracy and effectiveness of depression classification. This literature survey addresses critical gaps in the field by focusing on the imbalanced data problem in depression classification. While previous studies have primarily relied on traditional oversampling and undersampling techniques, these approaches often intensify the problem of overfitting and lead to the loss of valuable information. Our research explores these issues by reviewing various resampling methods, with a particular emphasis on advanced oversampling techniques that aim to preserve data integrity while mitigating overfitting. The survey also presents a comparative analysis of evaluation metrics, including accuracy, precision, recall, F1-score, and AUC, to provide a more nuanced understanding of classifier performance in the context of imbalanced data. Our findings indicate that while oversampling methods are generally effective, careful implementation is essential to avoid overfitting, which can distort the predictive accuracy of the model. |
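
The abstract's point that imbalance biases a model toward the majority class, and that accuracy alone can hide this, can be illustrated with a small sketch. The data below is hypothetical (95 non-depressive and 5 depressive labels, chosen only for illustration and not taken from the survey), and `binary_metrics` is an illustrative helper name, not a function from the paper. AUC is omitted because it needs predicted scores rather than hard labels.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical toy labels: 0 = non-depressive (majority), 1 = depressive (minority).
y_true = [0] * 95 + [1] * 5
# A degenerate classifier that always predicts the majority class.
y_pred = [0] * 100

metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

On this toy data the always-majority classifier reaches high accuracy while its recall and F1 for the depressive class are zero, which is why the minority-class metrics are worth reporting alongside accuracy.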