Kim, J; Chung, S and Chi, S (2024) Cross-lingual information retrieval from multilingual construction documents using pretrained language models. Journal of Construction Engineering and Management, 150(6), ISSN 0733-9364
Abstract
The growth of the global construction market has drawn international companies into overseas projects. Such projects are highly dynamic and carry numerous uncertainties, which raises the need to collect information about construction in host countries. Because the construction industry produces vast amounts of text data, an automated method, specifically information retrieval, is required to find the necessary information. Previous studies have proposed automated methods for reviewing various construction documents. However, these studies required substantial manual effort and focused mainly on a single language, so vital information buried in documents written in the host country's language was lost. To address these limitations, this study proposes a cross-lingual information retrieval (CLIR) framework that uses pretrained Bidirectional Encoder Representations from Transformers (BERT) models to retrieve information from multilingual construction documents. The proposed framework employs three types of language models (monolingual, multilingual, and cross-lingual) and trains them on a construction data set to improve their handling of construction-specific text. The framework achieved reliable retrieval performance, even with minimal additional training on domain-specific data. The results indicate that training on the domain data set improves retrieval performance, increasing the mean reciprocal rank on a specific task by up to 0.2128. By pairing a monolingual model with machine translation, CLIR in a specific domain could be performed effectively without a labeled data set. The suggested CLIR framework offers a practical alternative for handling construction documents in overseas projects, reducing time and cost while improving risk identification and mitigation.
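The abstract reports its gains in mean reciprocal rank (MRR). As a reminder of what that metric measures, the following is a minimal sketch, not the authors' evaluation code: the function name `mean_reciprocal_rank` and the toy document IDs are assumptions made for illustration. For each query, the reciprocal rank is 1 divided by the position of the first relevant document in the ranked list (0 if none is retrieved), and MRR is the average over all queries.

```python
from typing import Hashable, Sequence, Set


def mean_reciprocal_rank(
    rankings: Sequence[Sequence[Hashable]],
    relevant: Sequence[Set[Hashable]],
) -> float:
    """Average of 1/rank of the first relevant document per query.

    rankings[i] is the ranked list of document IDs returned for query i;
    relevant[i] is the set of IDs judged relevant for that query.
    A query with no relevant document in its ranking contributes 0.
    """
    if len(rankings) != len(relevant):
        raise ValueError("rankings and relevant must have the same length")
    if not rankings:
        return 0.0

    total = 0.0
    for ranked_ids, relevant_ids in zip(rankings, relevant):
        for position, doc_id in enumerate(ranked_ids, start=1):
            if doc_id in relevant_ids:
                total += 1.0 / position
                break  # only the first relevant hit counts
    return total / len(rankings)


# Hypothetical toy data: two queries with three retrieved documents each.
toy_rankings = [["d1", "d2", "d3"], ["d7", "d8", "d9"]]
toy_relevant = [{"d2"}, {"d9"}]
toy_mrr = mean_reciprocal_rank(toy_rankings, toy_relevant)
```

In this toy case the first relevant documents sit at ranks 2 and 3, so the value is (1/2 + 1/3) / 2. Under this definition, an improvement of 0.2128 in MRR corresponds to the first relevant document appearing noticeably higher in the ranked list on average.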
| Item Type: | Article |
|---|---|
| Date Deposited: | 11 Apr 2025 19:50 |
| Last Modified: | 11 Apr 2025 19:50 |