MULTIVARIATE MODELS FOR PREDICTING RAINFALL EROSIVITY FROM ANNUAL RAINFALL AND GEOGRAPHICAL COORDINATES IN A REGION WITH A NON- UNIFORM PLUVIAL REGIME

Soil erosion by water is a major land degradation problem because it threatens the farmer’s livelihood and ecosystem's integrity. Rainfall erosivity is one of the major controlling factors inducing this process. One obstacle of estimating the R-factor is the lack of detailed rainfall intensity data worldwide. To overcome the problem of data scarceness for individual analysis of storm events for developing the country with a non-uniform pluvial regime like the upper part of Iraq, multivariate models were derived for estimating annual rainfall erosivity. They were based on annual rainfall data and geographical coordinates of a group of meteorological stations distributed over the study area. A host of statistical indices were selected to evaluate adequately the model's performance. Further, the models were cross-validated using k-fold procedure and unseen data. Subsequently, four linear models were proposed for estimating the annual erosivity for the study area. Good correspondence was found between the measured and predicted values. Among the proposed models, the model with the combination of annual rainfall, latitude and longitude outperformed the remaining proposed ones. After calculating the annual, the ArcMap software ver. 10.2 was applied to map the spatial variability of the Rfactor over the study region.


INTRODUCTION
Natural resources degradation is a challenging problem worldwide and particularly in Iraq (15). Soil erosion by water can be considered as a major global land degradation problem (39) because it has a profound effect on many issues like the sustainability of natural resources, water quality, reservoir sedimentation and global food production (31). Understanding the driving forces on controlling erosion is a prerequisite to addressing this problem strategically (5). Rainfall erosivity is one of the major controlling factors inducing water erosion, which defined as the product of the total storm kinetic energy and maximum 30-minute rainfall intensity (28). Renard et al. (27) have shown that among the USLE factors, rainfall erosivity is the one most exactly computed from input data: rainfall amounts and intensities. As many parts of the world still do not have detailed rainfall intensity data available, many types of study have been carried out to estimate the R-factor based on available rainfall data (10). The obstacle of estimating the R-factors for areas without sufficient data has existed since the introduction of USLE (26). Using multivariate models in data-poor regions of the world have become a new challenge for erosion models users. This implies that multivariate models enable users to predict values of climatic variables for a specific place with reasonable accuracy (22). A host of equations have been published for estimating the R-factor from rainfall data and other variables. One caveat stems from the fact that data of these equations had a large percentage of other countries records. The resulting accuracy of this factor might be better for their locations (24). Angulo-Martinez et al. (2) fitted a multiple regression model to predict the R-factor based on geographical coordination and solar radiation. Meusburger et al. (23) evaluated for predicting the R-factor for Switzerland and noticed that latitude and longitude were not significant and consequently, they were excluded from the model. The majority of the study area is a mountain land where the country of Iraq, Turkey, and Iran join. High, rugged ridges of the great Taurus-Zagros mountain arc, 3,000-4,000 m amsl, extend eastward from the Mediterranean and thence southeastward to the Arabic Gulf, separating the Mesopotamian lowland from the Anatolian-Iranian Plateaus. Precipitation in this region is controlled principally by the general altitude of the land area (38). However, despite the abundance of comprehensive studies conducted across different regions of the world, limited research has been conducted to evaluate temporal and spatial variations in rainfall erosivity factor in Iraqi Kurdistan Region. Accordingly, this study was conducted with the main objective of deriving a multivariate model to estimate rainfall erosivity factor for the upper part of Iraq from annual rainfall data and some other predictors and creating a rainfall erosivity map based on the calculation of annual R.

Study area
The study area is located in the upper part of Iraq spanning from 34 o 28  10  N to 37 o 22  40  N and from 42 o 22  15  E to 46 o 20  35  E and has a total area of about 47,000 Km 2 . It is draining its water into the Tigris River and its tributaries (Khabour, the Greater Zab, the Less Zab, and Sirwan). The elevation ranges from less than 250 m in the wide plains to more than 3600 m at the Iraqi-Turkish and Iraqi-Iranian borders. It abuts Turkey in the north, Iran in the northeast, and Diyala and Tikrit provinces in the south and Syria in the northwest. Figure  1 shows that the majority of employed stations are within Duhok, Erbil and Sulaymani Provinces.

Fig. 1. Majority of employed stations in the study area
Topography plays a major role in creating disparate microclimates ranging from arid to semiarid. The spatial distribution of rainfall is highly affected by orography. The arid zone receives rainfall less than 250 mm while the semi-wet zone receives 900-1000 mm and over. The area above the timberline is covered with snow in winter for several months. The rainfall has a unimodal distribution. In general, the annual distribution shows a dry season lasting from June to September and a wet season from October to April. There is a surplus of water from mid of November to about mid of April. On the other hand, there is a water deficit over the remaining period of the year. Maximum occurrences of rainfall take place from November to April, which accounts for more than 90% of the total rainfall in the region. The majority of the study sites fall in Semi-arid to Semi-wet classes according to Emberger scheme (12). On the other hand, most of the study sites fall in Csa (warm temperate rainy climate) to Csb (rainy winters warm temperate summer) according to the scheme proposed by Köppen (18). The UNESCO aridity index (AI), which is based on the ratio of annual precipitation and potential evapotranspiration rates ranges from 0.1 at the lower part to 0.83 at borders. However, most of the sites can be classified as semiarid (0.2<AI<0.5) (30). Elsahookie et al. (11) reported similar results.

Database
The first phase of this analysis was dedicated to rainfall data collection. The data set consists of rainfall records during 2000 -2018 at 25 rain gauges located in the study region and its peripheral areas (Fig. 1). It is worth noting that the number of stations in service has changed over the years. About 50% of the data set obtained from pluviographic stations. At these stations, the measuring device was recording rain gauge of the tipping-bucket type that is relatively well distributed over the study region. The available data with 15-minute interval was gathered from electronic rain gauges. The data were checked and filtered to remove spurious data before its release.

Data processing
The unit rainfall energy for a given interval was based on the equation developed by Wischmeier and Smith (37): Er = 0.119 + 0.0873 log (i) [1] When: i<76 mmh -1 , er=the unit rainfall energy (MJha -1 mm -1 ) and i= the rainfall intensity during the time increment (mmh -1 ). Subsequently, the unit rainfall erosivity was used to calculate the rainfall erosivity (EI30) for an event j (Rj) according to Panagos et al. Where: Vr=the rainfall depth (mm) during the rth time interval of the rainfall event that has been subdivided into k segments. I30=the maximum-30 min. rainfall intensity (mmh -1 ). The annual rainfall erosivity over a given year (Ry) was obtained by summing the rainfall erosivity over the year in question: The average annual rainfall was obtained according to the equation proposed by Wischmeier and Smith (36): Where: n=the number of years recorded, mj=the number of erosive events during a given year j, and k=an index of a single event with its corresponding erosivity (EI30). For the sake of comparison, the above procedure was repeated for calculating the annual rainfall erosivity of the different stations after replacing the equation [1] by the equation proposed by Brown and Foster (7): Where: Ir=the rainfall intensity during the time increment (mmh -1 ), and er=the unit rainfall energy (MJha -1 h -1 mm -1 ). The pluvial regime of study area was assessed after calculating the precipitation concentration index (PCI) for each station according to the following equation (22): Beside the seasonality index (SI) was computed using the following formula (32); (17): [7] Derivation of multivariate models Multivariate linear models were developed for predicting annual rainfall erosivity that are applicable for the region under study using IBM SPSS-22 as a function of annual precipitation, latitude, longitude, and altitude. Only the variables were selected that have contributed to improving the prediction accuracy of the proposed models for all possible cases using the stepwise multiregression method.

Assessment of the models
Additionally, a host of statistical indices was selected to evaluate adequately the model's performance. The indicators encompassed: mean biased error (MBE), mean absolute percentage error (MAPE), root mean square error (RMSE), coefficient of variation (CV), coefficient of agreement (d), Akaike information criteria (AIC), coefficient of residual mass (CRM), and symmetric mean percentage of error (SMAPE) (3); (19); (1).

Cross validation of the proposed model
The dataset was subdivided into five folds and the proposed models was fitted to 4 folds and validated using the remaining fold. This process was repeated until every fold served as a test set. In the end, the average of some selected performance indicators was taken to test the effectiveness of the proposed models. Seven stations have been also used as part of the cross-validation (9).

Mapping the spatial variability of the R-Factor
After calculating the R-values at annual Rfactor, the ArcMap software ver. 10.2 was applied to map the spatial variability of the Rfactor over the region using Kriging interpolation method.

RESULTS AND DISCUSSION General aspects of annual rainfall and annual erosivity
The average annual rainfall in the study area varied from 120.4 to1048.0 mm, with a high coefficient of variability (35.7%), indicating high spatial variability of rainfall in the study region. This result is concordant with the finding of Jawad et al. (16), who observed a significant fluctuation of precipitation from its average value. A total of 5634 rainfall events were analyzed. The 25 calibration stations encompassed 5088 erosive events and the 7 validation stations had 546 erosive events. The precipitation concentration index varies from as low as 10.43 at Pirmam to as high as 21.54 at Darbandikhan. With one exception, the stations showed either seasonal (PCI=10-15) or highly seasonal (PCI= 15-20) distribution ( Table 1). Most of the stations within the wide intermountain valleys are characterized by having a higher degree of seasonality compared with stations located in the mountainous area. Further, based on the classification scheme proposed by Walsh and Lawler (32) with a few exceptions, the study stations fell within seasonal (0.60<SI<0.79) and markedly seasonal (0.80<SI<0.99) ( Table  1). In general, the daily rainfall was less than 50 mm. Overall, the rainfall pattern is of intermediate type and the rainfall intensity hardly exceeded 10 mmhr -1 . This result supports that earlier findings of Hussein and Othman (13) who revealed that except for short duration spring showers rainfall intensity in the foothills of Iraq seldom exceeded 10 mmhr -1 . The Durbin-Watson test was used for detecting first-order autocorrelation in the annual rainfall of some selected stations (not shown here) and the results elucidated that the value of this statistic was close to the lower critical value (dL) and fell in the inclusive zone. As the data set of different stations were mixed together, the overall data became free of autocorrelation. The measured rainfall erosivity in the context of USLE and RUSLE ranged from 16.6 to 112.3 and from 12.7 to 87.0 metric ton.m ha -1 yr -1 cm h -1 (hereinafter referred to metric unit) respectively. To convert this values from metric unit to MJmmha -1 h -1 yr -1 , multiple the former by 9.81. With no exception, the rainfall erosivity of all the station falls within the low erosivity class based on the scheme proposed by Carvalho (8) (R<2452 MJmmha -1 h -1 yr -1 ).
Overall, the measured values were lower than those obtained by Hussein (14) when he applied the model proposed by Arnoldus (4) to calculate the annual R based on average monthly and annual rainfall for 49 stations across Iraq.

Table 1. Classification of precipitation distribution for study stations based on precipitation concentration index (PCI) and seasonality index (SI).
Within Asia, the Middle East has the lowest erosivity values, with a mean annual R-factor less than 220 MJmmha −1 h −1 yr −1 in Jordan, Saudi Arabia, Kuwait, Syria, Iran and Iraq (25).

Sensitivity analysis
Before performing calibration, a simple sensitivity analysis was conducted without considering interaction into account to identify non-influential variables that can be omitted from the calibration. Table 2 presents the correlation matrix using all possible cases procedure. The regressors encompassed annual rainfall, altitude, longitude, and latitude. As can be seen in Table 2, annual rainfall erosivity exhibited the strongest correlation with rainfall erosivity in the context of USLE and RUSLE followed by altitude, latitude, and longitude. It also observed that the latitude offered the least correlation coefficient with the response variables. Further, the results indicated that longitude was high significantly (P<=0.01) and negatively correlated with latitude. In spite of a highly significant correlation between longitude and latitude, the former was not disregarded because the multicollinearity analysis revealed that the variance inflation factor was less than 10. As the rainfall erosivity in the context of RULSE exhibited lower correlation coefficients with the study regressors compared to the rainfall erosivity from RUSLE, so no RUSLE results are presented hereafter.

Model calibration
The results showed that among the two parameter equations Model 1 performed better than models 2, 3 and 4. The improved relation was attributed to the characteristics of the two indices. Overall, it can be concluded the P is a significant discriminator in the prediction of rainfall erosivity. It was also observed that among the threeparameter models, model 5 exhibited results that are more acceptable and Model 7 offered the second-highest performance. Conversely, model 8 provided the least performance, followed by model 6, 10 and 9. This group used a combination of two variables out of four input variables. Additionally, it was noticed that among the four-parameter models model 11 with a combination of annual rainfall, latitude and longitude generated more statistically significant results followed by model 13, 12 and 14. The findings also revealed that the insertion of all the study variables did not give rise to a considerable improvement in the accuracy of prediction. Use of stepwise, backward, and forward techniques through the IBM SPSS-22 allowed simplification of the regression models obtained with a slight loss of accuracy. This analysis considered only annual rainfall and altitude.

Test of model performance
It is worthy to note that some commonly used correlation measures such as Pearson's correlation coefficient and its square, (R 2 ) and test of statistical significance are often misleading when used to compare the predicted and observed values. Various measures seem to contain appropriate and insightful information (35). Accordingly, the performance of the relationships were assessed by computing a host of indicators (Table 3). As can be observed in Table 3

Table 3. Regression coefficients for the relation between Rainfall erosivity (USLE-R) and each of annual rainfall and geographical coordinates along with several efficiency criteria.
By contrast, the indicated model gave the largest value for of index of agreement (d=0.949) and with one exception the largest value for R 2 (0.820). The index of agreement suggests that model 11 calibrated well enough to simulate the annual rainfall erosivity. Hence, Model 11 and 5 would stand as the most appropriate models for the study area.
Most of the candidates models including model 11, 5 neither overestimated nor underestimated the annual rainfall erosivity, CRM values were zero or close to zero.Because, time, effort money expenses are not involved in obtaining geographical coordinates, it is recommended to use model 11 as a local model for estimating rainfall erosivity for the region under study. Based on the classification scheme proposed by Wilding (34) the coefficient of variability of the predicted and observed rainfall erosivity for all the candidate models are moderate (15%<CV<30%). Model 15 exhibited the lowest value for CV (20.413%) followed by model 11, 5 and 1. To further investigate the degree of agreement between the observed and predicted values, the predicted values from each of model 1, 5, 11 and 15 were plotted versus the observed values of rainfall erosivity in relation to line 1:1. As can be seen from Fig.  2 that the majority of the plotted points falls on or close to the line 1:1. It can also be noticed from Fig.2 that the slope of the regression line is close to unity. Overall, there is limited data scattering over the lower range of rainfall erosivity. Conversely, there is a wider scatter at the middle and the upper rainfall erosivity ranges. The plot of the bias from Models 1, 5, 11 and 15 versus the estimated values of rainfall erosivity revealed that the residuals had no a systematic distribution (Fig.3). This implies that these models are appropriate for estimating rainfall erosivity. Additionally, Kolmogorov-Smirnov proved that the residuals yielded by the indicated models are normally distributed (Table 4). It is commendable to mention that the Shapiro-Wilk test was replaced by Kolmogorov-Smirnov test because the former is suited for relatively small sized data. Table 4. Tests for examining normality for the residuals and for detecting of multicollinearity among the inputs of the proposed models for predicting rainfall erosivity Attempts were also made to improve on the fit by using non-linear least squares technique, but no further improvement in modeling prediction was obtained (Table  5). Accordingly, these equations were not considered in testing the model's validations.

Model validation
The entire data was split into five folds. Three to ten folds are recommended for most applications (33). Then each of model 1,5,11 and 15 fitted to four folds and each model was validated using the remaining fold. In the current study, R 2 -values were noted down in Table 6 as one of the performance indicators. Thereafter, this process was repeated until every K-fold serve as the test set. Finally, the average value of the indicator was taken. The results of Table 6 show that each model retains its original accuracy upon excluding each of the five folds.    The proposed models were also tested for their effectiveness on some unseen data (Table 7).
To achieve this goal, some data (a sample) kept aside. This implies that a portion of the data was not used to train the models, but used for the validation process. The size of the test set was about 8% of the total sample, but the size is typically about 20% of the total sample. The reason behind this measure was to keep a sample of large size for the training. However, close inspection of Table 7 disclosed that the proposed models during the current study are powerful enough to capture the salient pattern of both testing and training sets. In other words, the models caused neither underfitting nor underoverfitting. The mean absolute percentage error (MAPE) was below 20%. According to the scheme proposed by Lewis (21), the proposed models are categorized under potentially good class (MAPE<20%). The mean absolute percentage error (MAPE) is one of the most widely used measures of forecast accuracy, due to its advantages of scale-independency and interpretability (29). Table 7. Validation of some proposed models for predicting annual R by using a set of test data out of the training set

Regression Analysis Using advanced (Non-Classical) Techniques
Additional trials were made to allow retention of all the study explanatory variables, even if they are slightly or highly collinear, through employing both principal component regression and ridge regression (Tables 8 and  9). Substantial reduction in the values of the parameters of some selected models were noticed (Model 11 and 15), but the use of this advanced tool did not enhance the predictive ability of the regression models. The mean absolute error of prediction under ridge regression was slightly higher that of the models derived by employing ordinary least squares method. The results displayed in Table 9 revealed that by following principal component analysis, the number of explanatory variables for model 15 can be reduced from 4 to 2. Under this situation, no input variable was excluded. The two principle components explained only 49% of variation in the annual rainfall erosivity.

Spatial distribution of annual R
The spatial variability of the annual rainfall erosivity was evaluated by computing its coefficient of variation, which was higher than that of the annual rainfall. The CV factor in the context of USLE was 51%. To further investigate the spatial distribution of annual rainfall erosivity the map of Fig. 4 was developed based on the observed rainfall erosivity in the context of USLE. Close inspection of Fig. 4 shows a decreasing pattern for rainfall erosivity from west to east and from south to north within Sulaymani province, which is related to the decrease in the rainfall amount in the same directions. Fig.  4. also indicates a clear north-south-oriented trend with Erbil province. This trend is also following the rainfall pattern. Unlike the distribution of rainfall erosivity within Sulaymani and Erbil provinces, the distribution of this parameter within Duhok did not exhibit an obvious trend.

Fig. 4. Spatial distribution of annual R in study area Conclusions
Based on the obtained results from the current study, it can be concluded that the annual rainfall is a significant discriminator in the prediction of rainfall erosivity. Out of all possible cases, four linear multivariate models offered the highest performance for estimating annual R. To the view of the authors model 11 (the model with a combination of annual rainfall, latitude, and longitude) outperformed the remaining proposed models because nearly all the assumptions of linear multiple modeling are considered in this model.