Abstract
Soil classification is the first step in designing reliable structures, and it involves costly and time-consuming work, including laboratory tests. Machine learning (ML), which is widely used in many scientific fields, can facilitate soil classification. This study provides a concrete example of the use of ML for soil classification. The dataset comprises 805 soil samples taken from the soil drillings for the construction of the new Gayrettepe–Istanbul Airport metro line. The dataset contains missing values and is class-imbalanced. In the data preprocessing stage, the missing values were handled first: two imputation techniques were tested, and the data were finally imputed with the KNN imputer. Class balance was then achieved with the synthetic minority oversampling technique (SMOTE). After preprocessing, a series of ML algorithms was tested with 10-fold cross-validation. Unlike previous studies, this work also tested recent gradient-boosting methods, namely XGBoost, LightGBM, and CatBoost. Classification accuracies exceeding 90% were observed, a significant improvement in prediction accuracy over previous research. © 2023 by the authors.
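The two preprocessing steps named above, KNN imputation followed by SMOTE, can be illustrated with a minimal sketch. This is not the authors' implementation: it is a simplified, standard-library-only version written for illustration, and the toy feature values, the class labels `class_a` and `class_b`, and the function names `knn_impute` and `smote` are all hypothetical. In practice, library implementations such as scikit-learn's `KNNImputer` and imbalanced-learn's `SMOTE` would normally be used, and the classifier and cross-validation steps are omitted here.

```python
import math
import random


def knn_impute(rows, k=2):
    """Fill None entries with the mean of that feature over the k nearest rows.

    Distance is a root-mean-square difference over the features observed in
    both rows, a simplification of the NaN-aware distance in library imputers.
    """
    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf  # no common observed feature: treat as farthest
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    imputed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Donors are the other rows in which feature j is observed.
            donors = [(distance(row, other), other[j])
                      for m, other in enumerate(rows)
                      if m != i and other[j] is not None]
            donors.sort(key=lambda d: d[0])
            nearest = [v for _, v in donors[:k]]
            if nearest:
                imputed[i][j] = sum(nearest) / len(nearest)
    return imputed


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic samples by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy data (hypothetical): two features, None marks a missing value.
X = [[1.0, 2.0], [1.2, None], [0.9, 2.1],
     [5.0, 6.0], [None, 6.2], [5.1, 5.9], [5.2, 6.1], [4.9, 6.0]]
y = ["class_a"] * 3 + ["class_b"] * 5

# Step 1: impute missing values. Step 2: oversample the minority class.
X_filled = knn_impute(X, k=2)
minority = [x for x, label in zip(X_filled, y) if label == "class_a"]
extra = smote(minority, n_new=2, k=2)

X_balanced = X_filled + extra
y_balanced = y + ["class_a"] * len(extra)
```

The synthetic samples lie on line segments between existing minority samples, which is why SMOTE adds new points rather than duplicating existing ones.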