Hybrid machine learning models: A comprehensive, data-driven evaluation with diverse data partitioning strategies for net radiation estimation | |
---|---|
DOI | |
Creator | 1. Kristian Lorenz Bajao 2. Kittisak Phetpan 3. Ponlawat Chophuk 4. Rattapong Suwalak |
Title | Hybrid machine learning models: A comprehensive, data-driven evaluation with diverse data partitioning strategies for net radiation estimation |
Publisher | Faculty of Engineering, Khon Kaen University |
Publication Year | 2025 (B.E. 2568) |
Journal Title | Engineering and Applied Science Research |
Journal Vol. | 52 |
Journal No. | 3 |
Page no. | 240-250 |
Keyword | Artificial intelligence, Crop water requirement, Smart irrigation, Climate change, Net radiation, Data partitioning |
URL Website | https://ph01.tci-thaijo.org/index.php/easr/index |
Website title | Engineering and Applied Science Research |
ISSN | 2539-6161 |
Abstract | Surface net radiation (Rn) is crucial for climate modeling and agricultural management, but measurements are often not readily available, especially in regions like Thailand. Accurate prediction of Rn is essential for estimating evapotranspiration, which in turn is vital for irrigation planning and agricultural productivity. This study develops a hybrid machine learning framework that combines K-Nearest Neighbors (KNN) for missing data imputation, Random Forest-Recursive Feature Elimination (RF-RFE) for feature selection, and three machine learning models (Multi-layer Perceptron, K-Nearest Neighbors, and Random Forest) for prediction. The research evaluates several data partitioning methods, namely hold-out split, K-fold cross-validation, and growing-window forward-validation (gwFV), alongside hyperparameter tuning with GridSearch to enhance model robustness and prevent overfitting. The primary objectives are to develop and evaluate the hybrid ML models for daily Rn estimation from basic meteorological inputs (temperature, relative humidity, and sunshine duration), to assess the impact of different input combinations on prediction accuracy in Sawi, Chumphon, Thailand, and to compare data partitioning techniques in order to identify the best-performing configuration. Using FAO56PM-calculated Rn as the reference, the study finds that the Random Forest model with average temperature and sunshine duration as inputs (M2), evaluated under the gwFV method, achieves the highest stability together with high accuracy (R² of 0.972, RMSE of 0.457 MJ m⁻² day⁻¹, and MAPE of 3.50%). The Random Forest model demonstrates strong generalization capabilities, making it a reliable choice. Even models using only sunshine duration (M3) perform adequately, offering an option when data are scarce. The study concludes that hybrid machine learning models, combined with careful data partitioning, significantly improve Rn estimation. These advancements provide valuable insights for climate modeling, agricultural management, and irrigation scheduling, particularly in data-scarce regions. |
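
The abstract names growing-window forward-validation (gwFV) and three error metrics (R², RMSE, MAPE) without giving details. The sketch below illustrates one common way to build growing-window splits for a chronologically ordered daily series, along with the standard definitions of the three metrics. It is not the authors' implementation: the function names, the `min_train` parameter, and the rule that the last fold absorbs any leftover samples are assumptions made for illustration only.

```python
from math import sqrt


def growing_window_splits(n_samples, n_folds, min_train):
    """Yield (train_indices, test_indices) pairs in chronological order.

    Hypothetical helper: the training window always starts at index 0 and
    grows by one test block per fold; each test block directly follows its
    training window, so no future sample is ever used for training.
    """
    if min_train < 1 or n_folds < 1 or n_samples - min_train < n_folds:
        raise ValueError("not enough samples for the requested folds")
    block = (n_samples - min_train) // n_folds
    for k in range(n_folds):
        train_end = min_train + k * block
        # Assumption: the final fold takes any remainder samples.
        test_end = train_end + block if k < n_folds - 1 else n_samples
        yield list(range(train_end)), list(range(train_end, test_end))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error, in the units of the target."""
    n = len(y_true)
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mape(y_true, y_pred):
    """Mean absolute percentage error in percent (y_true must be nonzero)."""
    n = len(y_true)
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n


if __name__ == "__main__":
    for train, test in growing_window_splits(n_samples=10, n_folds=3, min_train=4):
        print(len(train), test)
```

In a setup like this, a model would be refit on each growing training window and scored on the block that follows it; how consistent the per-fold scores are is one way to read the "stability" the abstract refers to.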