As already mentioned, this study aimed to examine the applicability of the previously developed UF model, which had been validated against data from the PSEs, to the full-scale Qinghe WRP. However, since the existing monitoring program was not specifically designed for modeling the UF system, the available historical data described previously posed two major challenges to this model test study. First, the water quality indicator used to represent organics in the UF system of the WRP was TOC, whereas UV254 had been used in the PSEs. The model parameters therefore needed to be recalibrated while the model structure remained unchanged, under the assumption that TOC and UV254 were equivalent indicators of organics and behaved similarly during the UF process. The second and more critical challenge was that the available monitoring data were too sparse and incomplete for model development and validation. On the one hand, water samples from the UF system of the WRP were collected on a monthly basis from separate operation cycles with different feed TOC concentrations, which was far sparser than in the PSEs, where water samples were collected consecutively over a whole operation cycle under well-controlled conditions. On the other hand, the sampling times of the water samples in the WRP were not recorded, which defeated the authors' efforts to derive the corresponding sampling moments within the operation cycles from online monitoring data such as transmembrane pressure.

Data sparsity is not uncommon in environmental modeling studies. It can result from a lack of appropriate monitoring instruments or capacity, as well as from inadequate temporal/spatial resolution, poor representativeness, or low accuracy of the monitoring instruments [19–22]. There are also cases where model parameters or inputs are difficult to monitor, e.g. diffuse pollution loads [23], or even unknown, e.g. sources of groundwater pollution [24–26]. Different methods have been developed and applied to deal with data sparsity in modeling studies. A straightforward method is to estimate the missing model parameters or inputs with other theoretical or empirical methods. For example, Nyeko [22] estimated missing records of solar radiation for hydrological modeling with an empirical equation. Inverse modeling is another widely applied method for determining missing model input. For example, Hörmann et al. [27] estimated the fraction of wetland in a catchment through inverse runs of a hydrological model, while Herrnegger et al. [21] used inverse rainfall-runoff modeling to obtain additional information on the mean areal rainfall of the studied area. Inverse modeling, however, generally requires that the model already be calibrated. If model parameters and inputs are both unknown, or only partially known with uncertainty, simultaneous identification of parameters and inputs is normally applied. For example, owing to the difficulty of detecting groundwater pollution sources, many researchers [24–26] have developed methods to identify source characteristics (e.g. location, magnitude, duration) while simultaneously estimating unknown aquifer parameters. Similarly, Jun et al. [23] used an optimization algorithm in river water quality modeling to simultaneously estimate kinetic constants and diffuse loads of total nitrogen and phosphorus. To reduce the uncertainty arising from precipitation observations and parameter estimation in hydrological modeling, Pluntke et al. [20] used an ensemble approach based on a series of models, one of which allowed the precipitation input to be calibrated together with the other model parameters.

Since the UF system in the Qinghe WRP operates in cycles, the sampling times are clearly critical model inputs. In light of the existing approaches to data sparsity outlined above, simultaneous identification of model parameters and sampling times was the only feasible choice in this case. Before proceeding to the detailed algorithm design, a further assumption had to be made: that the UF process in the WRP operated in a cyclic yet steady state, which is justified given that no abnormality was reported during the monitoring period. Under this assumption, two strategies for model identification were proposed, as shown in Fig 1, with details given below. A common feature of the two strategies is that both are based on the RSA approach, so as to account for the gross uncertainty associated with the model parameters and sampling times. In both strategies, RSA was performed with the Hornberger-Spear-Young (HSY) algorithm, whose procedure is detailed in [18] and [28], based on a Latin Hypercube Sampling (LHS) approach.
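As a minimal sketch of this shared sampling backbone, the snippet below draws LHS candidates for the twelve unknowns (the four parameters a, b, c, f and the eight sampling times t1–t8) from uniform ranges. The ranges and sample size shown are illustrative placeholders, not the values given in Table 1.

```python
import numpy as np
from scipy.stats import qmc

# Illustrative ranges only; the actual ranges come from Table 1.
# Order: four model parameters (a, b, c, f), then eight sampling times t1..t8.
lower = np.array([0.0, 0.0, 0.0, 0.0] + [0.0] * 8)
upper = np.array([1.0, 1.0, 1.0, 1.0] + [60.0] * 8)  # e.g. minutes within a cycle

n_samples = 100_000
sampler = qmc.LatinHypercube(d=len(lower), seed=1)
candidates = qmc.scale(sampler.random(n_samples), lower, upper)
# Each row of `candidates` is one uniform LHS draw of (a, b, c, f, t1..t8).
```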

Similar to the current practice of simultaneous identification of model parameters and inputs, e.g. [20] and [23], Strategy 1 treated the missing model inputs, i.e. the sampling times of the observed data (ti, i = 1, 2, …, 8), as "parameters" to be identified together with the four model parameters a, b, c, and f. Following the procedure of the HSY algorithm, as shown in Fig 1(a), the initial ranges of the model parameters and sampling times were first determined (Table 1), according to values reported in the literature (see [17]) and the operating conditions in the Qinghe WRP, respectively. For simplicity, uniform distributions were assumed for all model parameters and sampling times. Then system behavior was defined to set the criterion for accepting a model simulation as behavior-giving: a behavior-giving simulation was one in which at least 60% of the data points had absolute relative errors of less than 20% between the observed and simulated cp. In the third step, the model parameters and sampling times were randomly and independently sampled with the LHS approach according to their designated ranges and probability distributions, and these values were substituted into Eq (1) for model simulation. The simulation results, together with the parameters and sampling times, were then classified into a behavior-giving set and a non-behavior-giving set according to the definition of system behavior. The simulation continued, i.e. Steps 3 and 4 in Fig 1(a), until enough behavior-giving simulations had been obtained, and finally posterior probability distributions (PPDs) of the parameters and sampling times were derived for further analysis.
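The classification step of Strategy 1 might be sketched as follows, continuing from the LHS draw above. Here simulate_cp stands in for Eq (1) and cp_obs for the eight observed permeate concentrations; both are placeholders, not definitions from the paper.

```python
def is_behavior_giving(cp_obs, cp_sim, frac=0.60, tol=0.20):
    """Accept if at least `frac` of the points have an absolute
    relative error below `tol` between observed and simulated cp."""
    rel_err = np.abs(cp_sim - cp_obs) / np.abs(cp_obs)
    return np.mean(rel_err < tol) >= frac

behavior, non_behavior = [], []
for x in candidates:
    params, times = x[:4], x[4:]  # (a, b, c, f) and t1..t8
    # simulate_cp is a placeholder for Eq (1): cp at time t given the parameters.
    cp_sim = np.array([simulate_cp(params, t) for t in times])
    (behavior if is_behavior_giving(cp_obs, cp_sim) else non_behavior).append(x)

behavior = np.asarray(behavior)        # empirical PPD samples
non_behavior = np.asarray(non_behavior)
```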

Strategy 1 calibrated the four model parameters and eight sampling times simultaneously, which increased the dimension of the identification problem; considerable uncertainty was therefore expected to remain in these parameters and inputs. Given that the four model parameters had a global impact on all eight observations while each sampling time had only a local impact on its specific observation, the dimension of the identification problem could be temporarily reduced by separating the identification of model parameters from that of sampling times at different stages. Based on this idea, Strategy 2 was designed as shown in Fig 1(b).

In the first step, the same initial ranges and probability distributions as in Strategy 1 were assigned to the four model parameters and eight sampling times. The definition of system behavior from Strategy 1 was also adopted at the stage of model parameter identification. At the stage of model input identification, however, a slightly stricter definition was applied: a behavior-giving simulation was one in which at least 60% of the data points had absolute relative errors of less than 15% between the observed and simulated cp. The first round of model parameter identification then started by randomly and independently sampling the eight sampling times, only once, according to their designated ranges and probability distributions in Table 1. These values were then fixed in this round of simulation as if they were known model inputs, while the four model parameters were calibrated following the same HSY algorithm as in Strategy 1. At the end of this round, the PPDs of the four model parameters were derived. This was followed by the first round of model input identification, in which the four model parameters were assumed to have known probability distributions, i.e. their first-round PPDs, while the eight sampling times were calibrated, also following the same HSY algorithm. Since the eight observations were independently obtained from different operation cycles of the UF system, the eight sampling times could be calibrated individually; that is, only one unknown sampling time had to be calibrated for each observation. This round of simulation likewise ended with the PPDs of the eight sampling times, which were then used for the second round of model parameter identification as shown in Fig 1(b).

When the second-round PPDs of both model parameters and sampling times had been obtained, they were compared with their first-round PPDs through the Kolmogorov-Smirnov (K-S) test, at a 0.05 significance level, to examine whether the PPD of each parameter and sampling time differed significantly between the two rounds. If any parameter or sampling time showed a significant difference, another round of model parameter identification and input identification was conducted, i.e. Steps 5 and 6 in Fig 1(b), and the newly derived PPDs were compared with the previous ones to examine convergence for all model parameters and inputs. The iterations terminated when no statistically significant differences were detected in the PPDs of any model parameter or sampling time between two consecutive iterations.
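A sketch of this alternating scheme and its convergence check is given below; identify_params, identify_times, and draw_times_once are hypothetical helpers wrapping the HSY runs described above, so only the K-S convergence test is spelled out in full.

```python
from scipy.stats import ks_2samp

def ppds_converged(prev, curr, alpha=0.05):
    """True when no marginal PPD (one column per parameter or sampling
    time) differs significantly between two consecutive rounds."""
    return all(ks_2samp(prev[:, j], curr[:, j]).pvalue >= alpha
               for j in range(prev.shape[1]))

# Alternating identification; the helpers below are hypothetical
# wrappers around the HSY runs described in the text.
params_ppd = identify_params(times=draw_times_once())  # round 1: times fixed once
times_ppd = identify_times(params_ppd)                 # round 1: inputs, params from PPDs
while True:
    new_params = identify_params(times_ppd)
    new_times = identify_times(new_params)
    if ppds_converged(params_ppd, new_params) and ppds_converged(times_ppd, new_times):
        break
    params_ppd, times_ppd = new_params, new_times
```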

The two strategies were then compared with respect to their model performance and the sensitivity, identifiability, and uncertainty of the model parameters as well as of the missing model inputs, i.e. the sampling times of the observed data. Model performance was evaluated by the absolute relative errors between the observed and simulated data. The sensitivity, identifiability, and uncertainty of the model parameters can be used to judge a model's reliability [18]: a model with a large proportion of sensitive parameters has a balanced model structure, and the model is more trustworthy when its sensitive parameters can be well identified with low uncertainty. In this study, the regional sensitivity of each model parameter was characterized by the statistical difference between the PPDs of the behavior-giving set and the non-behavior-giving set, assessed with the K-S test at a 0.05 significance level. The greater the difference between the two distributions, the more sensitive and identifiable the parameter. Furthermore, the standard deviation of the behavior-giving set was used as an indicator of the uncertainty of each model parameter; a smaller standard deviation usually indicates better identifiability and lower uncertainty. Since the sampling times were also identified in this study, their sensitivity, identifiability, and uncertainty were evaluated in the same way.
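Continuing the sketch above, the regional sensitivity and uncertainty indicators might be computed as follows from the behavior-giving and non-behavior-giving sets of the Strategy 1 snippet; the variable names are carried over from there and remain illustrative.

```python
from scipy.stats import ks_2samp

names = ["a", "b", "c", "f"] + [f"t{i}" for i in range(1, 9)]
for j, name in enumerate(names):
    stat, p = ks_2samp(behavior[:, j], non_behavior[:, j])
    verdict = "sensitive" if p < 0.05 else "insensitive"  # regional sensitivity
    spread = behavior[:, j].std(ddof=1)                   # uncertainty indicator
    print(f"{name}: K-S D = {stat:.3f}, p = {p:.3g} ({verdict}), sd = {spread:.3g}")
```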
