TY - JOUR
T1 - On Lack of Robustness in Hydrological Model Development Due to Absence of Guidelines for Selecting Calibration and Evaluation Data
T2 - Demonstration for Data-Driven Models
AU - Zheng, Feifei
AU - Maier, Holger R.
AU - Wu, Wenyan
AU - Dandy, Graeme C.
AU - Gupta, Hoshin V.
AU - Zhang, Tuqiao
N1 - Funding Information:
Zheng acknowledges funding support from The National Natural Science Foundation of China (grant 51708491), and Gupta acknowledges partial support from the Australian Research Council through the Centre of Excellence for Climate System Science (grant CE110001028). We greatly appreciate the constructive comments from Keith Beven, Saman Razavi, and the two other anonymous reviewers, which helped us to improve the quality of this paper significantly. We also gratefully acknowledge the data for the 432 U.S. catchments provided by Thibault Mathevet, which can also be accessed through ftp://hydrology.nws.noaa.gov/pub/gcip/mopex/US_Data/, with details of this data set given at http://www.nws.noaa.gov/ohd/mopex/mo_datasets.htm. Data for the Australian catchments are synthesized based on data given in Chiew et al. (2009) and have been submitted as supporting information.
Publisher Copyright:
© 2018. American Geophysical Union. All Rights Reserved.
PY - 2018/2
Y1 - 2018/2
N2 - Hydrological models are used for a wide variety of engineering purposes, including streamflow forecasting and flood-risk estimation. To develop such models, it is common to allocate the available data to calibration and evaluation data subsets. Surprisingly, the issue of how this allocation can affect model evaluation performance has been largely ignored in the research literature. This paper discusses the evaluation performance bias that can arise from how available data are allocated to calibration and evaluation subsets. As a first step to assessing this issue in a statistically rigorous fashion, we present a comprehensive investigation of the influence of data allocation on the development of data-driven artificial neural network (ANN) models of streamflow. Four well-known formal data splitting methods are applied to 754 catchments from Australia and the U.S. to develop 902,483 ANN models. Results clearly show that the choice of the method used for data allocation has a significant impact on model performance, particularly for runoff data that are more highly skewed, highlighting the importance of considering the impact of data splitting when developing hydrological models. The statistical behavior of the data splitting methods investigated is discussed and guidance is offered on the selection of the most appropriate data splitting methods to achieve representative evaluation performance for streamflow data with different statistical properties. Although our results are obtained for data-driven models, they highlight the fact that this issue is likely to have a significant impact on all types of hydrological models, especially conceptual rainfall-runoff models.
AB - Hydrological models are used for a wide variety of engineering purposes, including streamflow forecasting and flood-risk estimation. To develop such models, it is common to allocate the available data to calibration and evaluation data subsets. Surprisingly, the issue of how this allocation can affect model evaluation performance has been largely ignored in the research literature. This paper discusses the evaluation performance bias that can arise from how available data are allocated to calibration and evaluation subsets. As a first step to assessing this issue in a statistically rigorous fashion, we present a comprehensive investigation of the influence of data allocation on the development of data-driven artificial neural network (ANN) models of streamflow. Four well-known formal data splitting methods are applied to 754 catchments from Australia and the U.S. to develop 902,483 ANN models. Results clearly show that the choice of the method used for data allocation has a significant impact on model performance, particularly for runoff data that are more highly skewed, highlighting the importance of considering the impact of data splitting when developing hydrological models. The statistical behavior of the data splitting methods investigated is discussed and guidance is offered on the selection of the most appropriate data splitting methods to achieve representative evaluation performance for streamflow data with different statistical properties. Although our results are obtained for data-driven models, they highlight the fact that this issue is likely to have a significant impact on all types of hydrological models, especially conceptual rainfall-runoff models.
KW - artificial neural networks (ANNs)
KW - calibration and evaluation
KW - data allocation
KW - data splitting
KW - hydrological models
KW - model evaluation bias
UR - http://www.scopus.com/inward/record.url?scp=85044400485&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85044400485&partnerID=8YFLogxK
U2 - 10.1002/2017WR021470
DO - 10.1002/2017WR021470
M3 - Article
AN - SCOPUS:85044400485
VL - 54
SP - 1013
EP - 1030
JO - Water Resources Research
JF - Water Resources Research
SN - 0043-1397
IS - 2
ER -