1 Introduction

Rainfall is one of the most variable climatic elements, and it varies both spatially and temporally. India is a tropical country, and its agricultural planning and water utilization depend mainly on monsoon rainfall. More than 75% of India's rainfall occurs during the monsoon season. The agriculture of the state depends on the rainfall received, and rainfall characteristics such as magnitude, frequency and intensity vary both spatially and temporally. The random nature of rainfall occurrence calls for statistical analysis and logical interpretation. In particular, knowledge of the monthly rainfall of a region helps farmers decide when and where to sow and reap, so that cultivation succeeds with proper utilization of available water and irrigation facilities. The Eastern agro-climatic zone of Haryana receives high rainfall (>400 mm), whereas the Western agro-climatic zone receives less (200-400 mm); the maximum rainfall reaches 800 mm in the northern districts of Panchkula, Ambala, Yamunanagar, Kurukshetra etc. (Agro climatic Atlas of Haryana, Technical Bulletin No. 15, 2010). There are two main cropping seasons in Haryana, i.e. Rabi and Kharif. Wheat is the main crop of the Rabi season. The second main cropping season, Kharif, coincides with the hot weather and the south-west monsoon season; its main crops are rice (Eastern agro-climatic zone) and cotton (Western agro-climatic zone). Frequent abnormalities in the magnitude and distribution of rainfall make cropping more risky.

Multivariate techniques are very useful tools for finding hydrologically homogeneous regions and for classifying regions based on meteorological data such as rainfall. Gadgil & Iyengar (1980) applied principal component analysis to derive patterns of temporal variation of the rainfall at fifty-three stations in peninsular India and found eight clusters.
Further, Kulkarni & Rao (2000) used the Common Principal Components (CPC) approach to classify the 20 districts of Andhra Pradesh based on monthly rainfall data. Similarly, Kulkarni & Reddy (1994) used the average linkage method to group the districts of Andhra Pradesh and found that the districts fell into 5 to 7 clusters, depending on the season. Muñoz-Díaz & Rodrigo (2004) used Ward's clustering method and principal component analysis to find climatically homogeneous zones based on seasonal rainfall for 32 Spanish localities, and found cluster analysis to be more suitable than principal component analysis. Cluster analysis has also been used by various researchers in various regions of India to identify homogeneous rainfall regimes; among the most popular works are Venkatesh & Jose, 2007 (Western Ghats region of Karnataka); Yashwant & Sananse, 2015 (Marathwada region in Maharashtra) and Shirin & Thomas, 2016 (Kerala). Oliveira-Júnior et al. (2017) identified three homogeneous rainfall regions in Tocantins State, Brazil, using Ward's algorithm of cluster analysis. Similarly, Terassi & Galvani (2017) identified the homogeneous rainfall regions in the eastern watersheds of the State of Paraná, Brazil. Recently, Siraj-Ud-Doulah & Islam (2019) analyzed monthly rainfall data from 34 climate stations of Bangladesh using five agglomerative hierarchical clustering measures and found that Ward's method based on Euclidean distance, K-means and fuzzy clustering were the most suitable methods in that particular case. They found seven different climate zones in Bangladesh. Similarly, Gonçalves et al.
(2018) used annual mean precipitation and found six homogeneous regions through cluster analysis using Ward's agglomeration method, applied to a historical series of 31 years (1960-1990) at 413 satellite monitoring points in the state of Pará, in the Amazon, where the selected years occurred during an El Niño or a La Niña event.

The aim of this study was to identify homogeneous regions (rain-gauge stations) in Haryana using cluster analysis and common principal component analysis. Monthly rainfall data for 42 years (1970-2011), covering 27 rain gauge stations of Haryana, were used for this purpose.

2 Material and Methods

2.1. Location of study and Data

The state of Haryana is located in north-western India and occupies 1.3 per cent of the geographical area of the country. The state extends between latitudes 27°39' and 30°55'N and longitudes 74°27' and 77°36'E. The data for this study are monthly rainfall data obtained from the India Meteorological Department (IMD), Pune, covering 27 rain gauge stations scattered across all the districts of Haryana state. Depending on availability, 42 years of data (1970-2011) were obtained for the rain gauge stations Fatehabad, Gurgaon, Sohana, Jind, Narwana, Firozpur Jhirka, Nuh, Panipat, Rohtak, Sonipat, Sirsa, Hisar, Bawal, Karnal, Ambala and Kaithal. For the stations Tohana, Jhajjar, Dujana, Kalka, Dadupur and Jagadhari, the data for 2006 were missing. For the stations Dadri, Ballabgarh, Thanesar, Hassanpur and Narnaul, data were available for 36 years (data for 1984-1990 were not available).

2.2. Ward’s Cluster Method

Cluster analysis (CA) is a convenient method for identifying homogeneous groups of objects, called clusters. A number of methods can be used to carry out cluster analysis; in this study, Ward’s (1963) method, also known as the “minimum variance method”, was used. This method differs from other hierarchical clustering methods because it uses an analysis of variance approach to evaluate the distances between clusters. In Ward’s method the within-cluster sum of squares is minimized, and the clusters with the minimum between-cluster distance are merged. Suppose two clusters $C_k$ and $C_l$ are merged to form a new cluster $C_m$; then the Euclidean distance between the new cluster and another cluster $C_j$ is given by the formula

$$d_{j,m} = \frac{(n_j + n_k)\,d_{jk} + (n_j + n_l)\,d_{jl} - n_j\,d_{kl}}{n_j + n_m}$$

where $n_j$, $n_k$, $n_l$ and $n_m$
are the numbers of objects in clusters $j$, $k$, $l$ and $m$, respectively, and $d_{jk}$, $d_{jl}$ and $d_{kl}$ represent the distances between the observations in clusters $j$ and $k$, between $j$ and $l$, and between $k$ and $l$, respectively (Ramos, 2001). Thus, Ward’s algorithm can be implemented by updating stored Euclidean distances between cluster centroids. Although clustering results may be sensitive to the chosen method, Blashfield (1976) found that Ward’s method provides the most accurate solutions among the hierarchical methods.

2.3.1. Clustering under Multiple Sampling

The common principal components approach (Kulkarni & Rao, 2000) was used and is described as follows. Suppose there are $n$ objects which are to be classified into $k$ ($< n$) homogeneous groups. Suppose that the $j$-th object ($j = 1, ..., n$) has observations which are recorded by drawing a random sample of size $N_j$ from it. Let $X$ be the random vector consisting of $p$ variables; then $X_{ij}$ represents the $i$-th observation vector on the $j$-th object ($i = 1, ..., N_j$; $j = 1, ..., n$). Thus, on the basis of the observation vectors $X_{ij}$, the $n$ objects are to be classified into $k$ ($< n$) distinct groups. This approach involves determining a vector subspace which represents the vector subspaces of all the objects as closely as possible. Several developments have taken place in this field (Flury, 1988). Suppose that principal component analysis has been carried out for each of the $n$ objects, and that the first $q$ ($< p$) principal components are adequate for summarizing the total variance of each of the covariance matrices. Let $L_t$ ($q \times p$) be the matrix for the $t$-th object ($t = 1, ..., n$) whose rows are the eigenvectors of its first $q$ principal components. Let $\Sigma_t$ be the covariance matrix of the $t$-th object, and let $H$ ($p \times p$) $= \sum_{t=1}^{n} L_t' L_t$ be a matrix whose first $q$ ($< p$) principal components represent the “common principal components”.
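The construction of $H$ described above can be sketched numerically. The following is a minimal illustration, assuming that $H$ is accumulated as the sum of $L_t' L_t$ over all objects and using synthetic gamma-distributed data in place of the IMD records. The function names (`leading_eigvecs`, `common_components`), the number of objects and the sample size are hypothetical and are not taken from the study; the final projection of mean vectors is only one possible reading of "scores based on mean rainfall".

```python
import numpy as np

def leading_eigvecs(cov, q):
    """Return a (q x p) matrix whose rows are the q leading eigenvectors of cov."""
    vals, vecs = np.linalg.eigh(cov)          # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:q]        # indices of the q largest eigenvalues
    return vecs[:, order].T

def common_components(samples, q):
    """Accumulate H = sum_t L_t' L_t and return it with its sorted eigen-decomposition."""
    p = samples[0].shape[1]
    H = np.zeros((p, p))
    for X in samples:                         # X has shape (N_t, p)
        L_t = leading_eigvecs(np.cov(X, rowvar=False), q)
        H += L_t.T @ L_t
    roots, vecs = np.linalg.eigh(H)
    order = np.argsort(roots)[::-1]           # largest latent root first
    return H, roots[order], vecs[:, order]

# Synthetic stand-in data (assumption): 5 objects, 40 observations, p = 4 variables.
rng = np.random.default_rng(0)
samples = [rng.gamma(shape=2.0, scale=50.0, size=(40, 4)) for _ in range(5)]

q = 3
H, roots, vecs = common_components(samples, q)

# Assumed scoring rule: project each object's mean vector onto the
# first q common components, giving one q-dimensional score per object.
means = np.array([X.mean(axis=0) for X in samples])
scores = means @ vecs[:, :q]
```

Because each $L_t' L_t$ is a projection onto a $q$-dimensional subspace, every latent root of $H$ lies between 0 and $n$ and the roots sum to $nq$; a root close to $n$ indicates a direction shared by nearly all objects.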
To obtain the common principal components, the covariance matrix of each rain gauge station was subjected to principal component analysis. It was observed that the first 3 PCs accounted for at least 85 per cent of the sample variance and so adequately summarized the total variance of the 4 rainfall variables in all the 27 stations. Hence the matrix $L_t$ ($t = 1, 2, ..., 27$) was defined on the basis of the first 3 components. Using these CPCs, component scores for the stations (based on mean rainfall) were obtained, and clustering was carried out on the basis of these scores.

3 Results and Discussion

For clustering the rainfall stations on monsoon-period (June-September) data, the common principal components approach was used. It was observed that the first 3 PCs accounted for at least 86 per cent of the sample variance and so adequately summarized the total variance of the 4 rainfall variables (June-September) in all the 27 stations. Hence the matrix $L_t$ ($t = 1, ..., 27$) was defined on the basis of the first 3 components. The results of the principal component analysis of the matrix $H = \sum_{t} L_t' L_t$, which give the common principal components, are presented in Table 1.
It can be observed that the latent roots of $H$, which measure the similarity between the CPCs and the vector subspaces of all the 27 stations, were similar for the first 3 components ($\lambda_1 = 20$), whereas the latent root of the fourth component was considerably lower ($\lambda_4 = 12.1$). The results thus indicate that all the stations were close together along the first 3 CPCs (i.e., the first three components of $H$). The first common principal component is heavily loaded on September rainfall (loading = 0.650), while June rainfall (loading = 0.988) and July rainfall (loading = -0.750) dominate the second and third components, respectively. This behavior was exhibited by all the stations in their three-dimensional subspaces (i.e. three principal components). These results indicate that only the first three common principal components can be considered common to the vector subspaces of all the stations. These three components also reveal the common causes of variation in the rainfall of the stations, viz. the rainfall of June, July and September. Using these CPCs, component scores for the stations (based on mean rainfall) were obtained and are given in Table 2. Cluster analysis of the scores was carried out using Ward’s method, and the dendrogram based on the common principal component scores is shown in Figure 1. The dendrogram revealed four clusters of rain gauge stations with similar monsoon rainfall spread over Haryana. Cluster I consisted of six stations, i.e. Ballabgarh, Gurgaon, Ambala, Karnal, Firozpur Jhirka and Sonipat; Cluster II of eight stations, i.e. Hassanpur, Fatehabad, Tohana, Sirsa, Hisar, Jind, Narwana and Narnaul; Cluster III of 10 stations, i.e. Sohana, Thanesar, Panipat, Rohtak, Bawal, Dujana, Jhajjar, Nuh, Kaithal and Dadri; and Cluster IV of three stations, i.e. Kalka, Dadupur and Jagadhari. Thus, Haryana can be divided into four rainfall zones based on common principal component scores.
3.2 Hierarchical Clustering Analysis (Ward’s Method)

Ward’s method of hierarchical clustering was also applied to classify the 27 rain gauge stations of Haryana based on average monthly rainfall for the period 1970-2011. Three periods, viz. monsoon (June-September), pre-monsoon (March-May) and the overall period (June-May), were considered in the present study. The post-monsoon (October-December) and winter (January-February) periods were not considered for classification, as in most years the rainfall during these months was low (in November in particular it was close to zero). Dendrograms based on the pre-monsoon, monsoon and overall periods are presented in Figures 2, 3 and 4, respectively. From the analysis of these dendrograms, it was concluded that a 4-cluster solution is appropriate for grouping the stations. Further, the stations classified under a cluster need not be from the same region. About 80 per cent of the annual rainfall in Haryana comes from the south-west monsoon in the months of June to September; hence the monsoon rainfall is of primary interest. For monsoon rainfall, the dendrogram (Figure 3) suggests 4 clusters. Cluster I (C1) consisted of five stations, i.e. Ballabgarh, Gurgaon, Karnal, Firozpur Jhirka and Sonipat. Cluster II (C2) consisted of eight stations, i.e. Hassanpur, Fatehabad, Tohana, Sirsa, Hisar, Jind, Narwana and Narnaul. Cluster III (C3) consisted of 10 stations, i.e. Sohana, Thanesar, Panipat, Rohtak, Bawal, Dujana, Jhajjar, Nuh, Kaithal and Dadri. Cluster IV (C4) consisted of four stations, i.e. Ambala, Kalka, Dadupur and Jagadhari. The results of the cluster analysis of rain gauge stations based on monsoon rainfall in Haryana are given in Table 3, and the distances between the cluster centroids are given in Table 4.
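As an illustration of how such a grouping can be produced, the sketch below runs a naive Ward agglomeration: it repeatedly merges the closest pair of clusters and updates the remaining distances with the recurrence given in Section 2.2. It assumes squared Euclidean distances between single stations as the starting dissimilarities, which is the usual convention for this recurrence but is not stated in the text. The function name `ward_clusters`, the station labels and the rainfall values are invented for the example and are not the study's data.

```python
def ward_clusters(points, n_clusters):
    """Agglomerate points into n_clusters groups with Ward's update rule."""
    members = {i: [i] for i in range(len(points))}
    # Start from squared Euclidean distances between single points (assumption).
    dist = {}
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            dist[(i, j)] = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))

    next_id = len(points)
    while len(members) > n_clusters:
        k, l = min(dist, key=dist.get)            # closest pair of clusters
        d_kl = dist.pop((k, l))
        n_k, n_l = len(members[k]), len(members[l])
        updated = {}
        for j in members:
            if j in (k, l):
                continue
            n_j = len(members[j])
            d_jk = dist.pop((min(j, k), max(j, k)))
            d_jl = dist.pop((min(j, l), max(j, l)))
            # Update from Section 2.2, with n_m = n_k + n_l for the merged cluster.
            updated[(j, next_id)] = (
                (n_j + n_k) * d_jk + (n_j + n_l) * d_jl - n_j * d_kl
            ) / (n_j + n_k + n_l)
        members[next_id] = members.pop(k) + members.pop(l)
        dist.update(updated)
        next_id += 1
    return sorted(sorted(group) for group in members.values())

# Invented June-September rainfall (mm) for eight hypothetical stations.
names = ["S1", "S2", "S3", "S4", "S5", "S6", "S7", "S8"]
rain = [
    [30, 110, 120, 60], [32, 115, 118, 62],
    [50, 160, 170, 85], [52, 158, 175, 88],
    [70, 210, 220, 110], [68, 205, 225, 108],
    [100, 300, 310, 150], [105, 295, 305, 155],
]
groups = [[names[i] for i in g] for g in ward_clusters(rain, 4)]
print(groups)
```

Stopping the merging at four clusters corresponds to cutting the dendrogram at the level where four branches remain. For real data, an established library routine would normally be preferred over this quadratic-memory sketch.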
It is interesting to note that the results of Ward’s method of hierarchical clustering based on common principal component scores and those based on mean monthly rainfall data of the monsoon period are almost in agreement, as depicted in Figure 1 and Figure 3. The cluster profile of rain gauge stations based on the monsoon rainfall period is presented in Table 3. It shows that the mean monsoon rainfall was minimum for Cluster II (333.6) and maximum for Cluster IV (839.2), the largest contrast between any two clusters. The minimum difference in mean monsoon rainfall was found between Cluster III (461.7) and Cluster I (582.8), showing the similarity between these clusters. Recently, Swaminathan & Meganathan (2018) applied the EM method and K-means to 100 years of rainfall data from the Thanjavur region in Tamil Nadu and found the EM method to be more accurate than K-means. Singh et al. (2010) reported that the eastern agro-climatic zone of Haryana, also called the wet zone, receives a high amount of rainfall (>400 mm), with maxima of >800 mm in the northern districts of Panchkula, Ambala, Yamunanagar, Kurukshetra etc., whereas the western agro-climatic zone receives less rainfall (200-400 mm). These results are mostly in agreement with the present study. The distances between the cluster centroids presented in Table 4 indicate that the maximum distance of 272.01 was observed between clusters 2 and 4, showing that these clusters differed most from each other, and the minimum distance of 65.63 between clusters 1 and 3, showing that these were the most similar clusters. An objective of this study was to classify the entire state of Haryana into a limited number of homogeneous zones based on monthly rainfall. Clustering is the process of dividing the area under consideration into a limited number of climatologically homogeneous zones, based on any hydrologic parameter.
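The comparison of cluster means quoted above can be checked with a few lines of arithmetic. The sketch below uses only the four mean monsoon rainfall values reported in the text for Table 3; the variable names are illustrative.

```python
from itertools import combinations

# Mean monsoon rainfall per cluster, as quoted in the text for Table 3.
cluster_means = {"I": 582.8, "II": 333.6, "III": 461.7, "IV": 839.2}

# Absolute difference in mean rainfall for every pair of clusters.
gaps = {
    (a, b): round(abs(cluster_means[a] - cluster_means[b]), 1)
    for a, b in combinations(cluster_means, 2)
}

closest = min(gaps, key=gaps.get)
farthest = max(gaps, key=gaps.get)
print(closest, gaps[closest])
print(farthest, gaps[farthest])
```

The closest pair by mean rainfall (Clusters I and III) and the most distant pair (Clusters II and IV) are the same pairs singled out by the centroid distances of Table 4 (65.63 and 272.01, respectively).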
It was found that Haryana can be grouped into four clusters based on monsoon rainfall.

Conclusion

For clustering the rainfall stations for the monsoon period (June-September), the common principal components (CPCs) approach was used. Using these CPCs, component scores for the stations (based on mean rainfall) were obtained. Clustering was carried out on these scores using Ward’s method, and a dendrogram was prepared. The results indicated four clusters of rain gauge stations with similar monsoon rainfall spread over Haryana. Cluster I consisted of six stations, i.e. Ballabgarh, Gurgaon, Ambala, Karnal, Firozpur Jhirka and Sonipat; Cluster II consisted of eight stations, i.e. Hassanpur, Fatehabad, Tohana, Sirsa, Hisar, Jind, Narwana and Narnaul; Cluster III consisted of 10 stations, i.e. Sohana, Thanesar, Panipat, Rohtak, Bawal, Dujana, Jhajjar, Nuh, Kaithal and Dadri; and Cluster IV consisted of three stations, i.e. Kalka, Dadupur and Jagadhari. Thus, Haryana can be divided into four rainfall zones based on common principal component scores. The alternative approach clustered the rainfall stations for the monsoon period on mean monthly rainfall using Ward’s method. Again the results indicated 4 clusters. Cluster I consisted of five stations, i.e. Ballabgarh, Gurgaon, Karnal, Firozpur Jhirka and Sonipat; Cluster II consisted of eight stations, i.e. Hassanpur, Fatehabad, Tohana, Sirsa, Hisar, Jind, Narwana and Narnaul; Cluster III consisted of 10 stations, i.e. Sohana, Thanesar, Panipat, Rohtak, Bawal, Dujana, Jhajjar, Nuh, Kaithal and Dadri; and Cluster IV consisted of four stations, i.e. Ambala, Kalka, Dadupur and Jagadhari. Ward’s method of hierarchical clustering gave almost the same result whether it was based on common principal component scores or on mean monthly rainfall data of the monsoon period.

Conflict of Interest

The authors declare that there is no conflict of interest.