Construction-accident narrative classification using shallow and deep learning

Qiao, J; Wang, C; Guan, S and Shuran, L (2022) Construction-accident narrative classification using shallow and deep learning. Journal of Construction Engineering and Management, 148(9), ISSN 0733-9364

Abstract

It is crucial to extract knowledge from past accidents to prevent future ones. To this end, accident narratives must be classified, a key text-mining task. This autocoding process can be framed as a multiclass classification problem on an imbalanced data set. We evaluated the performance of several state-of-the-art machine learning methods, including 10 shallow learning methods (Rocchio, k-nearest neighbors, linear regression, naive Bayes, decision tree, random forest, gradient boosting, bootstrap aggregating, support vector machine (SVM), and shallow neural network) and five deep learning methods [deep neural network, convolutional neural network (CNN), recurrent neural network with long short-term memory, recurrent neural network with gated recurrent units, and recurrent CNN]. The input data set contained 4,770 construction accident reports from the Occupational Safety and Health Administration (OSHA). After the narratives were relabeled based on the Occupational Injury and Illness Classification System (OIICS), the accuracy of every shallow classifier improved significantly over that reported in previous studies. SVM and CNN achieved the highest accuracy among the shallow and deep learning methods, at 0.91 and 0.90, respectively. Misclassifications arise because the training data lack diversity for minority classes, some cases belong to multiple classes, and some divisions share the same key feature words. When new data sets become available, the learned patterns can be used to classify them with high accuracy in practice.
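Rocchio is the first of the shallow classifiers listed above. The following is a minimal sketch of its common nearest-centroid form, offered only to make the autocoding idea concrete. It assumes TF-IDF features with cosine similarity, since the abstract does not state the feature representation, and it omits the negative-example weighting of the full Rocchio formula. The narratives are invented toy data, not the OSHA reports. The label strings are OIICS event divisions chosen purely for illustration, and the helper names (`fit_rocchio`, `predict`, and so on) are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

# Invented toy narratives (NOT from the OSHA data set); labels are OIICS
# event divisions used purely for illustration. Deliberately imbalanced: 3/2/1.
TRAIN = [
    ("Employee fell from the roof edge while installing shingles", "Falls, slips, trips"),
    ("Worker fell through a floor opening to the level below", "Falls, slips, trips"),
    ("Laborer slipped on a wet ladder and fell", "Falls, slips, trips"),
    ("Worker was struck by a falling steel beam", "Contact with objects and equipment"),
    ("Employee was caught between the excavator and a wall", "Contact with objects and equipment"),
    ("Worker contacted an energized power line and was electrocuted",
     "Exposure to harmful substances or environments"),
]


def tokenize(text):
    """Lowercase a narrative and keep alphabetic word tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def fit_idf(docs):
    """Smoothed IDF, ln((1 + n) / (1 + df)) + 1, for each training term."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}


def tfidf(doc, idf):
    """Unit-length sparse TF-IDF vector; terms unseen in training are dropped."""
    tf = Counter(t for t in tokenize(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def fit_rocchio(docs, labels):
    """Nearest-centroid Rocchio: each class prototype is its mean TF-IDF vector."""
    idf = fit_idf(docs)
    sums, counts = defaultdict(Counter), Counter(labels)
    for doc, label in zip(docs, labels):
        sums[label].update(tfidf(doc, idf))  # accumulate float weights per term
    centroids = {c: {t: v / counts[c] for t, v in s.items()} for c, s in sums.items()}
    return idf, centroids


def predict(doc, idf, centroids):
    """Return the class whose centroid has the highest cosine similarity to doc."""
    vec = tfidf(doc, idf)

    def cosine(label):
        cent = centroids[label]
        dot = sum(v * cent.get(t, 0.0) for t, v in vec.items())
        norm = math.sqrt(sum(v * v for v in cent.values())) or 1.0
        return dot / norm  # vec is already unit length

    return max(centroids, key=cosine)


if __name__ == "__main__":
    idf, centroids = fit_rocchio([d for d, _ in TRAIN], [y for _, y in TRAIN])
    for text in ["Roofer fell from the roof edge",
                 "A steel beam struck the worker",
                 "Worker was electrocuted by a power line"]:
        print(predict(text, idf, centroids))
```

In this sketch the "Exposure" class has only one training narrative, so its centroid rests on a single example. This is a small-scale version of the minority-class diversity problem the abstract identifies as a source of misclassification.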

Item Type: Article
Uncontrolled Keywords: autocoding; data mining; machine learning; natural language processing; text mining
Date Deposited: 11 Apr 2025 19:49
Last Modified: 11 Apr 2025 19:49