Water received as rainfall is a crucial natural resource for agriculture, the hydrological cycle, and municipal purposes. Changing rainfall patterns are an essential aspect of assessing the impact of climate change on water resources planning and management. Climate change has affected the entire world, and in particular India’s fragile Himalayan mountain region, which is highly significant as a climatic indicator. The water from Himalayan rivers is essential for the 1.4 billion people living downstream. Earlier studies modeled either temperature or rainfall for the Himalayan area; the combined influence of both has not been analyzed over the long term using Deep Learning (DL). The present investigation analyzes the time series and correlation of temperature (1796–2013) and rainfall (1901–2015) changes over the Himalayan states of India. The Climate Deep Long Short-Term Memory (CDLSTM) model was developed and optimized to forecast temperature and rainfall values for all Himalayan states. Facebook’s Prophet (FB-Prophet) model was also implemented to forecast the same variables and to benchmark the developed CDLSTM model. Both models were assessed using several performance metrics and showed high accuracy and low error rates.
Climate change has affected India, particularly the fragile Himalayan mountain region, which is highly significant as a climatic indicator, as the origin of rivers, and for municipal, agricultural, and hydroelectric uses, minerals, and tourism [
Machine learning (ML) approaches now have a significant presence in almost every area, and the successful use of deep learning (DL) has opened new dimensions for efficient time series forecasting. ML/DL has been used extensively for climate analysis and forecasting [
LSTM networks are a type of Recurrent Neural Network (RNN) that uses special units (cells) and standard units to overcome the limitation of traditional RNN [
Input node
Input gate
Forget gate
Output gate
Cell state
Hidden state
Output layer
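The components listed above can be tied together in a minimal sketch of one LSTM time step. It uses the commonly cited LSTM formulation for a single unit with scalar input and state; the function names, the parameter layout, and the weight values are illustrative assumptions and are not taken from the CDLSTM model.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x_t, h_prev, c_prev, params):
    """One forward step of a single-unit LSTM cell (scalar input and state)."""
    def pre(name):
        # Each component has an input weight w, a recurrent weight u, and a bias b.
        w, u, b = params[name]
        return w * x_t + u * h_prev + b

    g_t = math.tanh(pre("input_node"))   # input node: candidate values
    i_t = sigmoid(pre("input_gate"))     # input gate: how much candidate to write
    f_t = sigmoid(pre("forget_gate"))    # forget gate: how much old state to keep
    o_t = sigmoid(pre("output_gate"))    # output gate: how much state to expose
    c_t = f_t * c_prev + i_t * g_t       # cell state update
    h_t = o_t * math.tanh(c_t)           # hidden state
    return h_t, c_t

def output_layer(h_t, w_out, b_out):
    """Output layer: a linear map from the hidden state to the prediction."""
    return w_out * h_t + b_out

# Hypothetical weights, chosen only for illustration.
params = {
    "input_node": (0.5, 0.1, 0.0),
    "input_gate": (0.4, 0.2, 0.0),
    "forget_gate": (0.3, 0.3, 1.0),
    "output_gate": (0.6, 0.1, 0.0),
}

h, c = 0.0, 0.0
for x in [0.2, -0.1, 0.4]:  # a short, already scaled input sequence
    h, c = lstm_step(x, h, c, params)
y_hat = output_layer(h, w_out=1.0, b_out=0.0)
print(h, c, y_hat)
```

Because the forget gate multiplies the previous cell state rather than repeatedly squashing it, information can persist across many steps, which is how this cell design addresses the limitation of traditional RNNs noted above.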
Facebook’s Prophet is an open-source forecasting tool based on a decomposable additive model, similar to a generalized additive model (GAM). Prophet can fit nonlinear time series with seasonality. The Prophet forecast model can be expressed as
Prophet offers two trend models: a logistic growth model (LGM) and a piece-wise linear (PWL) model. The choice between them depends on the time series data. The LGM is suitable if the time series shows non-linearity and saturation, with no further change after reaching the saturation point. If the time series exhibits a linear tendency with a past record of growth and decline, then PWL is the better option. The LGM can be expressed as
The monthly rainfall dataset was obtained from more than 3000 rain-gauge stations spread over India, covering 115 years (Jan 1901–Dec 2015). The dataset was released by Indian Meteorological Department (IMD) (
The Berkeley Earth monthly average data from Jan 1796–Aug 2013 was procured from
For in-depth analysis of seasonal patterns of temperature and rainfall, the data was divided into four seasons based on India’s meteorological and international standards, i.e., Dec–Feb as winter, Mar–May as spring, Jun–Sep as monsoon, and Oct–Nov as post-monsoon or autumn. Because both temperature and rainfall data were analyzed, the term monsoon was used instead of summer for consistency with the rainfall analysis. Data transformation is crucial before implementing any ML model, and three transformations were applied in the current investigation. The first transformation replaced missing values with the average value of the respective records. The second transformation framed the time series as input and output pairs, so that the value at the previous time step serves as the input for forecasting the value at the current time step. As described earlier, the common period of the time series covered 1352 monthly values. For all Himalayan states, the first 980 months were used for training, 240 months for testing, and 120 months for validation of the LSTM model; the remaining twelve months were kept separate from the training process for unbiased external validation of the LSTM prediction. The third transformation scaled the time series data to the range –1 to 1. These three transformations were inverted after the prediction step to recover values at the original scale so that the uncertainty calculation could be adequately assessed.
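The three transformations and the data split described above can be sketched as follows. This is a minimal illustration in plain Python, not the study's code: the function names, the synthetic series, the lag of one month, and the choice to fit the scaler on the training portion only are all assumptions.

```python
import math

def fill_missing_with_mean(values):
    """Transformation 1: replace missing entries (None) with the record mean."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

def to_supervised(series, lag=1):
    """Transformation 2: pair each value with the `lag` values preceding it."""
    return [(series[i - lag:i], series[i]) for i in range(lag, len(series))]

def fit_scaler(values, lo=-1.0, hi=1.0):
    """Transformation 3: build functions mapping [min, max] to [lo, hi] and back."""
    v_min, v_max = min(values), max(values)
    span = v_max - v_min

    def scale(v):
        return lo + (v - v_min) * (hi - lo) / span

    def unscale(s):
        return v_min + (s - lo) * span / (hi - lo)

    return scale, unscale

# Synthetic stand-in for one state's 1352 monthly values (not real data).
raw = [20.0 + 8.0 * math.sin(2 * math.pi * m / 12) for m in range(1352)]
raw[5] = None  # simulate a missing record
series = fill_missing_with_mean(raw)

# Split sizes as stated in the text: 980 + 240 + 120 + 12 = 1352 months.
train, test = series[:980], series[980:1220]
valid, holdout = series[1220:1340], series[1340:]

# Assumption: the scaler is fitted on the training portion only.
scale, unscale = fit_scaler(train)
train_pairs = to_supervised([scale(v) for v in train], lag=1)
```

Calling `unscale` on a model output inverts the scaling step, which corresponds to recovering forecasts at the original scale before computing error metrics.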
Mann-Kendall tests [
Here,
A correlation analysis based on the Pearson product-moment correlation coefficient (MCC) was carried out between temperature and rainfall values for all twelve Himalayan states from Jan 1901 to Aug 2013, the period for which a common temporal dataset was available.
The MCC summarizes the direction and degree of linear relations between actual and modeled datasets. The correlation coefficient can take values between –1 (perfectly negative correlation) through 0 (no correlation) to +1 (perfectly positive correlation). The MCC formula to compute the correlation coefficient is given in
Here, N represents the number of pairs of data, and X and Y are the two variables being compared.
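As a worked illustration, the sketch below computes the coefficient with the standard computational form of the product-moment correlation, using the same N, X, and Y as above. The function name and the sample values are hypothetical and are not data from the study.

```python
import math

def pearson_r(x, y):
    """Product-moment correlation:
    r = (N*sum(XY) - sum(X)*sum(Y))
        / sqrt((N*sum(X^2) - sum(X)^2) * (N*sum(Y^2) - sum(Y)^2))
    """
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_xx = sum(a * a for a in x)
    sum_yy = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2))
    return numerator / denominator

# Hypothetical monthly temperature (deg C) and rainfall (mm) pairs.
temperature = [12.1, 15.4, 19.8, 24.3, 26.9, 27.5]
rainfall = [22.0, 31.5, 48.0, 110.2, 240.7, 310.4]
print(round(pearson_r(temperature, rainfall), 3))
```

A value close to +1 for such a pair would indicate that warmer months tend to be wetter, while a value near 0 would indicate no linear relationship.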
The Keras library with TensorFlow was used in Python to develop the LSTM models in the current study. The other libraries used in the current investigation were Plotly, NumPy, Seaborn, Pandas, Matplotlib, and scikit-learn. A four-step procedure was applied to develop the LSTM model.
The first step was to define the LSTM network. Eight LSTM layers were used in the current investigation, of which four were dense layers and three were dropout layers, see
Several hyper-parameters such as optimizer, number of units, learning rate, momentum, and activation functions must be chosen
The Prophet forecast model looks straightforward; however, the computation can be complex because of the parameters that must be selected. The choice of the LGM or PWL model depends on the time series data. The LGM was applied to the rainfall data because of the non-linear character of that time series, whereas the PWL model was applied for temperature forecasting because the temperature series exhibits a linear tendency.
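The two trend forms can be sketched as plain functions. The formulas follow the standard Prophet formulation (a saturating logistic curve, and a linear trend whose slope changes at changepoints); the parameter names `cap`, `k`, `m`, `changepoints`, and `deltas` and all numeric values are illustrative assumptions rather than settings from this study.

```python
import math

def logistic_trend(t, cap, k, m):
    """Logistic growth: saturates at carrying capacity `cap`,
    with growth rate `k` and midpoint offset `m`."""
    return cap / (1.0 + math.exp(-k * (t - m)))

def piecewise_linear_trend(t, k, m, changepoints, deltas):
    """Piece-wise linear trend: base rate `k` and offset `m`, with the
    rate adjusted by deltas[j] after each changepoints[j]."""
    rate, offset = k, m
    for s, d in zip(changepoints, deltas):
        if t >= s:
            rate += d
            offset -= s * d  # keeps the trend continuous at the changepoint
    return rate * t + offset

# Hypothetical parameters, for illustration only.
for t in (0, 10, 20, 40):
    print(t,
          round(logistic_trend(t, cap=300.0, k=0.3, m=10.0), 2),
          round(piecewise_linear_trend(t, k=0.02, m=24.0,
                                       changepoints=[15], deltas=[0.01]), 2))
```

The logistic curve flattens as it approaches `cap`, which matches the saturation behavior described for the LGM, while the piece-wise linear trend can keep growing or shrinking at a rate that changes over time.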
The uncertainty in the forecasting values can be obtained by forwarding the GAM model, which can be expressed as
The FB-Prophet model was imported. The Prophet model was fitted with training data, and forecasting was implemented based on 12 periods and month start (MS) as frequency.
The MCC, Root Mean Square Error (RMSE), Mean Absolute Percentage Error (MAPE), and Nash-Sutcliffe coefficient (NSE) were utilized to evaluate the uncertainty of the LSTM model output. The Mean Absolute Deviation (MAD) was also considered to analyze the LSTM model’s accuracy between measured and predicted values.
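A minimal sketch of these error metrics in their common textbook forms is given here for reference. The function names and sample values are hypothetical. Two assumptions are made: MAD is taken as the mean absolute difference between measured and predicted values, and MAPE uses absolute errors (a signed variant would allow negative percentages).

```python
import math

def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def mape(obs, pred):
    """Mean absolute percentage error, in percent (observations must be non-zero)."""
    return 100.0 / len(obs) * sum(abs((o - p) / o) for o, p in zip(obs, pred))

def mad(obs, pred):
    """Mean absolute difference between measured and predicted values (assumed form)."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)

def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 minus residual variance over observed variance."""
    mean_obs = sum(obs) / len(obs)
    residual = sum((o - p) ** 2 for o, p in zip(obs, pred))
    variance = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - residual / variance

# Hypothetical measured and predicted values.
measured = [10.0, 20.0, 30.0]
predicted = [12.0, 18.0, 33.0]
print(rmse(measured, predicted), mape(measured, predicted),
      mad(measured, predicted), nse(measured, predicted))
```

An NSE of 1 indicates a perfect match, 0 means the model is no better than predicting the observed mean, and negative values mean it is worse than the mean.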
RMSE quantifies the prediction error of a model as the standard deviation of the residuals
Here, P_i is the ith LSTM predicted value, and
The MAPE method was used to calculate the prediction accuracy of the LSTM forecast. The absolute difference between each original value and the value forecasted by the LSTM was divided by the original value. These ratios were then summed, divided by the number of observations, and multiplied by 100 to obtain the percentage error (18) [
Here,
The NSE, or efficiency coefficient, compares the magnitude of the residual variance with the variance of the actual data; its value ranges from –∞ to 1, see
Monthly precipitation was found to have decreased over the period 1901–2015 in the Himalayan states: a 20 mm decrease was observed, from 180 mm to 160 mm. The decrease in precipitation occurred after July 1995. The highest monthly precipitation (742 mm) was received in July 1948. Feb 2005 had the lowest average temperature for the UK, Him, J&K, and A&M. The highest average temperature was 28.28°C for A&M in June 2013, followed by NMMT (27.87°C) in the same month. The highest rainfall among all Himalayan states (1347.2 mm) occurred for NMMT in Aug 1969, followed by A&M (995.2 mm) in July 1984. High temperatures became more frequent after 2000, and occurrences of high rainfall decreased after the 1990s. The climate is evidently changing rapidly, especially in the Himalayan states. Results with a confidence factor ≥ 90% indicate a significant trend in the rainfall averaged over the Himalayan states, see
Himalayan States (R) | Monsoon | Autumn | Winter | Spring | Annual |
---|---|---|---|---|---|
J & K | 0.1 | 0.52 | 0.68 | –0.13 | 0.41 |
Him | –2.16 | –0.37 | 0.72 | –0.37 | –0.62 |
UK | 0.33 | 0.18 | –0.29 | 0.32 | 0.27 |
A&M | –2.14 | 0.31 | 0.16 | –0.37 | –1.02 |
WB&S | 1.78 | 0.03 | 0.09 | 0.17 | 0.69 |
Arun | –4.76 | –0.73 | 0.12 | –0.31 | –2.76 |
NMMT | –1.42 | 0.08 | 0.15 | 0.62 | –1.12 |
The mean annual temperature of the Himalayan states was observed to have increased by around 1.07°C between 1796 and 2013. Remarkably, it increased by only 0.98°C for India as a whole over the same period; the temperature of the Himalayan states is thus increasing faster than in the rest of the country. The average winter temperature rose by 1.27°C over the past century, while the post-monsoon temperature increased by 1.03°C, see
The MCC was computed to understand the relationship between the temperatures of the twelve Himalayan states from Jan 1901 to Aug 2013, as per the common temporal dataset availability
Himalayan States (T) | Monsoon | Autumn | Winter | Spring | Annual |
---|---|---|---|---|---|
J & K | 1.02 | 1.01 | 0.94 | 0.64 | 0.95 |
Him | 1.65 | 1.07 | 0.33 | 1.21 | 1.17 |
UK | 1.12 | 1.17 | 1.87 | 1.27 | 1.34 |
Assam | 1.38 | 1.82 | 2.12 | 1.13 | 1.63 |
Meghalaya | 0.13 | 1.16 | 1.03 | 1.18 | 1.03 |
West Bengal | 0.09 | 0.38 | 0.94 | 0.63 | 0.54 |
Sikkim | 0.11 | 0.23 | 0.74 | 1.05 | 0.64 |
Arun | 1.67 | 1.98 | 2.17 | 1.11 | 1.38 |
Nagaland | 1.71 | 2.12 | 2.16 | 0.98 | 1.83 |
Manipur | 2.07 | 1.56 | 1.78 | 1.21 | 1.53 |
Mizoram | –0.19 | 0.12 | 0.54 | 1.19 | 0.51 |
Tripura | –0.12 | –0.23 | 0.67 | 0.59 | 0.34 |
Because each model’s output was fed back into it as input when predicting future values, the forecasts of the LSTM and FB-Prophet models could become uncertain. Therefore, we forecasted the temperature from Sep 2012 to Aug 2013 and compared the forecasts with the actual values based on the coefficient of determination, RMSE, MSE, MAPE, MAD, and NSE, see
States | R2 | MSE | RMSE | MAPE | MAD | NSE |
---|---|---|---|---|---|---|
J & K | 0.85 | 17.64 | 4.20 | 4.83% | 1.19 | 0.88 |
Him | 0.92 | 8.94 | 2.99 | 6.74% | 1.03 | 0.90 |
UK | 0.98 | 0.44 | 0.66 | 1.25% | 0.32 | 0.99 |
Assam | 0.95 | 2.25 | 1.5 | 1.86% | 0.29 | 0.94 |
Meghalaya | 0.95 | 2.50 | 1.58 | 1.57% | 0.32 | 0.93 |
West Bengal | 0.95 | 2.40 | 1.55 | 1.23% | 0.35 | 0.94 |
Sikkim | 0.99 | 0.66 | 0.81 | 0.21% | 0.13 | 0.99 |
Arun | 0.93 | 6.00 | 2.45 | 1.16% | 0.42 | 0.95 |
Nagaland | 0.90 | 5.76 | 2.4 | 1.96% | 0.58 | 0.93 |
Manipur | 0.97 | 0.86 | 0.93 | 0.34% | 0.32 | 0.95 |
Mizoram | 0.96 | 2.50 | 1.58 | 0.29% | 0.38 | 0.97 |
Tripura | 0.92 | 5.20 | 2.28 | 1.08% | 0.63 | 0.95 |
States | R2 | MSE | RMSE | MAPE | MAD | NSE |
---|---|---|---|---|---|---|
J & K | 0.99 | 0.37 | 0.61 | –0.30% | 0.23 | 0.99 |
Him | 0.98 | 0.41 | 0.64 | –0.34% | 0.32 | 0.99 |
UK | 0.99 | 0.64 | 0.80 | –0.27% | 0.30 | 0.98 |
Assam | 0.97 | 0.66 | 0.81 | –0.47% | 0.48 | 0.98 |
Meghalaya | 0.95 | 0.59 | 0.77 | –0.57% | 0.59 | 0.96 |
West Bengal | 0.99 | 0.20 | 0.45 | –0.29% | 0.28 | 0.97 |
Sikkim | 0.99 | 0.32 | 0.57 | –0.31% | 0.27 | 0.98 |
Arun | 0.98 | 0.81 | 0.90 | –0.34% | 0.29 | 0.99 |
Nagaland | 0.94 | 0.86 | 0.93 | –0.51% | 0.38 | 0.96 |
Manipur | 0.91 | 0.72 | 0.85 | –0.82% | 0.73 | 0.93 |
Mizoram | 0.98 | 0.59 | 0.77 | –0.42% | 0.34 | 0.95 |
Tripura | 0.98 | 0.49 | 0.70 | –0.43% | 0.46 | 0.97 |
The developed CDLSTM model was used to forecast the rainfall values for all Himalayan states. The training and testing performance of the CDLSTM model is shown in
The forecasting performance of the current study was compared with other benchmark studies based on R2 and RMSE values, see
The FB-Prophet model implemented in the present investigation with the PWL algorithm showed remarkably efficient performance based on accuracy metrics, see
States | R2 | MSE | RMSE | MAPE | MAD | NSE |
---|---|---|---|---|---|---|
J & K | 0.71 | 871.43 | 29.52 | –5.90% | 0.72 | 0.76 |
Him | 0.82 | 149.57 | 12.23 | –2.15% | 1.76 | 0.86 |
UK | 0.83 | 221.12 | 14.87 | –2.25% | 1.27 | 0.88 |
A&M | 0.87 | 69.22 | 8.32 | –1.75% | 1.19 | 0.89 |
WB&S | 0.79 | 192.38 | 13.87 | –3.05% | 1.14 | 0.82 |
Arun | 0.74 | 262.76 | 16.21 | –4.49% | 1.97 | 0.76 |
NMMT | 0.76 | 172.66 | 13.14 | –3.12% | 1.26 | 0.79 |
| Ref | State / model | Reference study forecasting RMSE or R2 | CDLSTM-based forecasting RMSE or R2 (current study) | FB-Prophet-based forecasting RMSE or R2 (current study) | Remarks |
|---|---|---|---|---|---|
| [ | Him | 16.02 | 12.23 | 2.61 | Selection of months for categorizing the seasons, lack of data-preprocessing |
| | UK | 16.07 | 14.87 | 2.64 | |
| | A&M | 16.14 | 8.32 | 1.8 | |
| | WB&S | 16.66 | 13.87 | 1.81 | |
| | Arun | 47.34 | 16.21 | 1.77 | |
| | NMMT | 18.91 | 13.14 | 1.45 | |
| | J&K | 5.15 | 29.52 | 3.57 | |
| [ | ANN | 112.72 | 15.45 | 2.23 | For Ca Mau, stations in Vietnam, high RMSE values |
| | SANN | 97.11 | 15.45 | 2.23 | |
| | LSTM | 58.8 | 15.45 | 2.23 | |
| [ | ARIMA | 19.23 | 21.67 | 0.96 | Only for one state, good accuracy |
| [ | SVM | 6.67 | 15.45 | 2.23 | Very small dataset for only one station |
| | ANN | 3.1 | 15.45 | 2.23 | |
| | RNN | 1.41 | 15.45 | 2.23 | |
| [ | RNN | 304 | 15.45 | 2.23 | Model for entire India, high RMSE value |
| | LSTM | 270 | 15.45 | 2.23 | |
| [ | UK | 0.47 | 0.9 | 0.98 | Weak R2 of rainfall with IMD data |
| [ | UK | 0.34 | 0.76 | 0.97 | Weak R2 of rainfall with TRMM data |
| [ | UK | 11.46 | 14.95 | 3.67 | Good RMSE of the historical temperature trend |
| [ | MLP | 0.5 | 0.93 | 0.97 | Weak R2 values for MLP and CNN for different models |
| | CNN | 0.58 | 0.93 | 0.97 | |
| | GRU | 0.6 | 0.93 | 0.97 | |
| | BLSTM | 0.75 | 0.93 | 0.97 | |
| | LSTM | 0.77 | 0.93 | 0.97 | |
| | Prop | 0.87 | 0.93 | 0.97 | |
After optimization, the CDLSTM model took 45 s 21 ms/step for 20 epochs, i.e., a total of about 907 seconds (roughly 15 minutes) to complete the training. Optimization reduces the computation cost by selecting the best values of the parameters, including the number of epochs. The optimized model needed only about 40% of the computational time of the 50-epoch run, which took 40 minutes. The imported FB-Prophet model took three minutes to produce the results, about 20% of the optimized CDLSTM training time.
The present investigation provides an understanding of the long-term historical and forecasted temperature and rainfall data for India’s Himalayan states. A DL-based LSTM model was developed with rigorous hyper-parameter tuning to forecast temperature and rainfall. The correlation coefficient, MSE, RMSE, MAPE, NSE, and MAD were obtained to evaluate the CDLSTM model performance. All twelve Himalayan states showed increasing temperatures after 2000 and a decrease in rainfall after 1990. Arun and NMMT showed decreasing trends for rainfall; however, rainfall over J&K, UK, and WB&S showed an increasing trend. The Himalayan state with the highest average rainfall was Arun, while the lowest average rainfall was for J&K. The mean annual temperature of the Himalayan states increased by around 1.07°C over the last two centuries; interestingly, it increased by 0.98°C for India as a whole over the same period. The Himalayan states are therefore experiencing more severe impacts of global warming. The present investigation found a strong correlation (0.98) among the average temperature trends of all the Himalayan states. The correlation coefficient between temperature and rainfall was strong for the Northeastern Himalayan states A&M (0.80), WB&S (0.78), NMMT (0.76), and Arun (0.62); however, it was weak for the Northwestern Himalayan states UK (0.5), Him (0.39), and J&K (0.18).
The present investigation developed the CDLSTM model containing eight LSTM layers, of which four were dense layers and three were dropout layers. The CDLSTM model was optimized through rigorous parameter tuning. The developed CDLSTM model showed promising performance based on various metrics such as R2, MSE, RMSE, MAPE, MAD, and NSE. Given its reliability, the developed CDLSTM model is likely to estimate possible future values of temperature and rainfall accurately. The FB-Prophet model implemented in the present investigation with the PWL algorithm showed remarkably efficient performance based on accuracy metrics. As per the available literature, the performance achieved by the FB-Prophet model in the current investigation for temperature and rainfall forecasting is the highest, based on accuracy metrics. The developed CDLSTM model has lower accuracy than the FB-Prophet model; however, the CDLSTM model performed better than other models applied in previous studies. Both the CDLSTM and FB-Prophet models produced good forecasts for all months, including Jan 2013, when the temperature was low due to the peak winter season. The future scope of the present investigation is to add more data on snow retreat, glacier melt, agricultural yield, and demographics to assess the complete cycle of climate change for the Himalayan region. Another future scope of the present investigation is to implement and assimilate the latest state-of-the-art models for climate modeling and forecasting [
The significant limitations of the present study include the following. (1) Although the performance of the developed CDLSTM model was significantly higher than that of previous studies, the imported FB-Prophet model with the PWL algorithm performed better than the developed CDLSTM model. (2) The computation of the tuned CDLSTM model took 15 minutes for 20 epochs, so an improvement in computational efficiency is required. (3) The reasons for choosing LSTM in the present investigation are its capability to deal with the vanishing gradient problem and its better control, flexibility, and performance compared with traditional RNN. (4) The LSTM model has limitations such as the requirement of high memory bandwidth due to linear layers; it is also more prone to overfitting, and applying dropout to it is complex. (5) The effect of Gulf Stream weakening on climate change and on agricultural productivity will be a future scope, as parts of the US and Europe are influenced by the Gulf Stream.
The author is thankful to IMD and Berkeley Earth for providing climate datasets.