Construction legal support for differing site conditions (DSC) through statistical modeling and machine learning (ML)

Mahfouz, T S (2009) Construction legal support for differing site conditions (DSC) through statistical modeling and machine learning (ML). Unpublished PhD thesis, Iowa State University, USA.

Abstract

The objective of this dissertation is to provide a coherent and integrated methodology for construction legal decision support for Differing Site Conditions (DSC) disputes through statistical modeling and machine learning. To attain this goal, the current study designed and implemented a 4-step methodology targeting the following goals: (1) to extract a comprehensive set of legal factors that govern DSC litigation outcomes in the construction industry; (2) to devise a litigation prediction model for DSC disputes in the construction industry based on the extracted set of legal factors; (3) to create a methodology for automated extraction of significant legal factors that govern DSC litigation outcomes from case documents; and (4) to develop an automated retrieval model for identifying DSC precedent cases from a large corpus based on similarity to newly introduced ones. The 4 steps of this methodology were implemented incrementally, and each step relied on the outcome of its predecessor. First, a comprehensive set of significant legal factors that govern verdicts in DSC litigation cases was extracted through statistical modeling. Binary Probit and Logit Choice Models were developed (a) to identify the effect of each extracted factor on the prediction of the winning party; (b) to identify the best combination of factors with the highest significance on the prediction model; and (c) to perform a sensitivity analysis to prioritize the most significant legal factors. Among the main findings of this step are (1) in general, in cases in which the Federal Government is a party to the dispute, judgments favor the government (owner) over the contractor; (2) "the presence of evident facts that the encountered conditions caused a change in the nature and cost of the contract" had the highest impact among variables causing a decrease in the prediction of judgment in favor of the owner, causing an increase of 17.77% in the prediction in favor of the contractor; (3) "the presence of evident facts that the specifications included a warning against the presence of DSC from those conveyed in the contract documents" caused the highest increase in the prediction of judgment in favor of the owner, amounting to an increase of 56.56%; and (4) the development of Binary Probit and Logit Choice Models extracted a joint set of 13 statistically significant legal factors related to DSC disputes in the construction industry. This set provided the grounds for the other three steps of the current research methodology. Second, an automated litigation prediction model for DSC disputes in the construction industry through machine learning was developed based on the factors identified in the first step. The framework under this step incorporates analysis of different machine learning methodologies, including support vector machines (SVM), Naïve Bayes (NB), and rule induction classifiers such as Decision Trees (DT), Boosted Decision Trees (AD Tree), and PART. Ten machine learning models were developed using these methodologies to evaluate the best methodology for predicting litigation outcomes. The analysis of all developed models showed that the SVM model with a third-degree polynomial kernel has the best performance. This model attained an overall prediction accuracy of 98%. Third, an automated significant legal factors extraction model for DSC disputes in the construction industry through machine learning was developed. The framework under this step (1) developed 24 machine learning models in which 4 weighting schemes, namely Term Frequency (tf), Logarithmic Term Frequency (ltf), Augmented Term Frequency (atf), and Term Frequency Inverse Document Frequency (tf.idf), were implemented for each type of classifier; and (2) developed two C++ algorithms for the preparation of the corpus and implementation of the required weighting mechanisms. The highest prediction rate of 84% was attained by the NB classifier while implementing tf.idf weighting. The model was further validated by testing on previously unencountered cases, and a prediction precision of 81.8% was attained. Finally, the fourth step of the methodology developed an automated machine learning model for the retrieval of supporting DSC precedent cases from large corpora. This step, therefore, (1) implemented the Latent Semantic Analysis (LSA) algorithm; and (2) developed 9 reduced feature spaces with feature sizes of 5, 10, 15, 20, 100, 200, 300, 400, and 500 for analysis and validation of the implemented algorithm. Among the findings of this step are (1) low-dimension reduced feature spaces are more representative of documents closely related to the domain problem; (2) high-dimension reduced feature spaces are more representative of domain problems modeling dispersed and unrelated document collections; and (3) the LSA reduced feature space of 10 features is the best reduced feature space to adopt for automating the extraction of similar DSC cases from a large corpus. (Abstract shortened by UMI.)
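
The four term-weighting schemes named in the third step can be illustrated with a minimal sketch. The abstract does not give the formulas used in the thesis, so the definitions below are the common textbook variants and are assumptions: ltf is taken as 1 + ln(tf), atf as 0.5 + 0.5 * tf / max_tf within a document, and tf.idf as tf * ln(N / df). The function name `term_weights` and the toy documents are hypothetical, and the sketch is written in Python rather than the C++ used in the thesis.

```python
import math
from collections import Counter


def term_weights(docs, scheme="tf"):
    """Return one {term: weight} dict per tokenized document.

    Assumed (textbook) definitions, not taken from the thesis:
      tf    : raw count of the term in the document
      ltf   : 1 + ln(tf)
      atf   : 0.5 + 0.5 * tf / (largest tf in the same document)
      tfidf : tf * ln(N / df), N = number of documents,
              df = number of documents containing the term
    """
    n_docs = len(docs)
    counts = [Counter(doc) for doc in docs]
    # Each document contributes at most 1 to a term's document frequency.
    df = Counter(term for c in counts for term in c)

    weighted = []
    for c in counts:
        max_tf = max(c.values()) if c else 1
        row = {}
        for term, tf in c.items():
            if scheme == "tf":
                row[term] = float(tf)
            elif scheme == "ltf":
                row[term] = 1.0 + math.log(tf)
            elif scheme == "atf":
                row[term] = 0.5 + 0.5 * tf / max_tf
            elif scheme == "tfidf":
                row[term] = tf * math.log(n_docs / df[term])
            else:
                raise ValueError(f"unknown weighting scheme: {scheme}")
        weighted.append(row)
    return weighted


if __name__ == "__main__":
    # Toy, made-up sentences standing in for case documents.
    toy_docs = [
        "rock rock excavation contractor claim".split(),
        "contract warning subsurface rock owner".split(),
        "owner denied claim excavation cost".split(),
    ]
    for scheme in ("tf", "ltf", "atf", "tfidf"):
        print(scheme, term_weights(toy_docs, scheme)[0])
```

Under these assumed definitions, a term that occurs in every document receives a tf.idf weight of zero, which is one reason that scheme can suppress vocabulary common to all cases in a corpus.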

Item Type: Thesis (Doctoral)
Thesis advisor: Kandil, A
Uncontrolled Keywords: accuracy; highway; linear construction; scheduling
Date Deposited: 16 Apr 2025 19:28
Last Modified: 16 Apr 2025 19:28