Open Journal of Geology
Vol.08 No.10(2018), Article ID:87520,15 pages

Accurate Imputation for Relative Humidity over Pakistan Gathered from AQUA Satellite

Usman Saleem1, Mian Sohail Akram1, Muhammad Fahad Ullah2, Faisal Rehman2*

1Institute of Geology, University of the Punjab, Lahore, Pakistan

2Department of Earth Sciences, University of Sargodha, Sargodha, Pakistan

Copyright © 2018 by authors and Scientific Research Publishing Inc.

This work is licensed under the Creative Commons Attribution International License (CC BY 4.0).

Received: August 16, 2018; Accepted: September 23, 2018; Published: September 26, 2018


Abstract

The relative humidity data captured by the AQUA satellite contain missing values. To fill these gaps, four popular imputation techniques were tested: Bilinear, Inverse Distance Weighting, Natural Neighbor and Nearest Neighbor interpolation. Mean Absolute Error (MAE), Root Mean Square Error (RMSE), Coefficient of Determination (R2) and Correlation Coefficient (Corr) were used to check the accuracy of these interpolations. Inverse Distance Weighting and Nearest Neighbor interpolation proved unsuitable. Natural Neighbor interpolation gave more accurate results than these two interpolations. Missing values of relative humidity were most accurately refilled with Bilinear Interpolation. This interpolation produced an RMSE of ±0.543 for relative humidity at 100, 150, 200, 250, 300, 400 and 500 hPa, while for 600, 700, 850 and 925 hPa the RMSE remained near 1. A near-perfect fit to the surface and a very strong correlation (value near 0.99) were found between actual and imputed relative humidity data with Bilinear Interpolation. It was therefore concluded that Bilinear Interpolation is the most accurate imputation method for missing values of relative humidity from the 100 to 1000 hPa levels.


Keywords

Imputation, Humidity, Aqua Satellite, Pakistan

1. Introduction

Meteorological data collected from satellites mostly contain gaps. Such gaps in the data set occur due to inefficient sampling by the satellite subsystem [1] [2] . Using data with gaps may produce misleading research outcomes [3] . To make the best use of relative humidity data, it is necessary to estimate the gaps in satellite-captured data as accurately as possible with imputation techniques [1] [4] [5] [6] . Saleem [6] noted that filling such data requires a sound understanding of the spatial and temporal variations of climatic variables. Robeson [7] ; Junninen, Niska [1] ; Norazian [4] ; Yozgatligil, Aslan [8] ; and Saleem [5] refilled meteorological variables with average values; however, this method compromises the integrity and quality of the data set. Besides mean value imputation, other imputation methods have also been developed for filling gaps in meteorological data sets [1] [9] [10] .

According to Sun and Oort [11] ; Cho, Newell [12] ; McCarthy and Toumi [13] ; Gettelman, Weinstock [14] ; and Dessler and Sherwood [15] , water vapor is a major contributor to cloud formation and the greenhouse effect in the atmosphere, and its variation in the upper troposphere plays an important role in the daily radiation budget of the earth [11] [15] [16] . Valuable work on relative humidity in the troposphere was carried out by Lindzen [17] ; Shine and Sinha [18] ; Del Genio, Kovari [19] ; Sun and Oort [11] ; Peixoto and Oort [20] ; Harries [21] ; and Gettelman, Collins [22] . In the past, relative humidity was measured with radiosondes, which was not an accurate method [23] [24] . Over time, technological development introduced artificial satellites as a platform to observe water vapor in the atmosphere. The first meteorological satellite was the Mariner 2 Venus probe, tasked with determining the water content of the planet Venus [25] . After this successful experiment, two further satellites, Cosmos 243 and Cosmos 384, were launched to measure the relative humidity of the earth [26] . Relative humidity data are now captured by a number of remote sensing satellites with high accuracy and precision [22] [27] .

Relative humidity is defined as the amount of water vapor in the atmosphere expressed as a percentage of the amount required for saturation at the same temperature. It varies quantitatively and qualitatively throughout the atmosphere. Relative humidity can change either through a change in the amount of water vapor or through a variation of temperature in the atmosphere [20] .

Pakistan extends latitudinally from the Arabian Sea in the south to the Himalayan Mountains in the north, and longitudinally between Afghanistan in the west and India in the east (Figure 1 and Figure 2). Pakistan is located in the subtropics, partially in the temperate region, and is home to about 200 million people. A large portion of the country has been facing climate change for many decades. Pakistan is an arid to semi-arid territory with changing meteorological variables such as temperature and humidity [28] . Rainfall patterns vary widely across the country, with an average annual rainfall of 10 inches [5] . The monsoon rain is the only dominant hydro-meteorological resource, contributing 59% of the annual rainfall [29] . Most of the Himalayan regions receive precipitation in the form of snow and ice in winter. The coastal climate is confined to a narrow belt along the coast, and a rise in temperature of 0.60˚C to 1.00˚C has occurred since 1900 [30] . The coastline of Pakistan faced four major cyclones during 1999-2010 [31] . The hottest months are May and June, with an average temperature of 51˚C, while winter peaks in February, with an average temperature of 60˚C [30] .

Figure 1. Location map of Pakistan and its host region [5] .

Figure 2. Altitude map of Pakistan with elevations in meters [5] .

The main thrust of this research was to devise a workable methodology for the scientific observation of upper atmospheric meteorology over Pakistan despite the lack of modern equipment and technological resources. Upper-air meteorological observation and monitoring have also not been available in Pakistan. However, Saleem [5] ; Saleem [6] and Wazir [32] are among the few notable efforts on upper-level atmospheric observations.

2. Material and Methods

2.1. Data Used

The AQUA satellite has been capturing atmospheric water content since September 2002, and AIRS (Atmospheric Infrared Sounder) is a sensor mounted on it [22] [33] . AIRS operates in the IR (infrared) and MW (microwave) regions and has nearly 2400 bands in the thermal and visible regions. AIRS can also operate under a cloud fraction of 70% [22] [34] . The ground resolution of the data is 45 km2 and the grid size is 1˚ by 1˚ in latitude and longitude [35] .

The current study was carried out using AIRS Version 6 Level 3 data for monthly average relative humidity over the 1000 to 100 hPa pressure levels. Validation studies of this data set have been carried out using balloons, radiosondes, and aircraft observations. Divakarla, Barnet [36] and Tobin, Revercomb [37] strongly recommended checking AIRS data in the lower troposphere.

2.2. Imputations of Missing Dataset in Relative Humidity

To produce the best estimates for the missing data, 30% of the relative humidity samples were withheld and interpolated from the remaining 70% of known samples. Mean Absolute Error (MAE), Root Mean Square Error (RMSE), Coefficient of Determination (R2) and Correlation Coefficient (Corr) were used as performance indicators in this research.
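The 70/30 arrangement described above might be set up along the following lines. This is only a minimal sketch: the function name `holdout_split`, the random selection of withheld samples, the fixed seed, and the sample tuples `(lat, lon, RH)` are all assumptions made for illustration, since the paper does not state how the 30% were chosen.

```python
import random

def holdout_split(samples, withheld_fraction=0.3, seed=0):
    """Split samples into (known, withheld) lists.

    The withheld samples play the role of gaps: they are hidden from the
    interpolation and later compared with the interpolated values.
    Random selection is an assumption, not the paper's stated procedure.
    """
    rng = random.Random(seed)
    indices = list(range(len(samples)))
    rng.shuffle(indices)
    n_withheld = round(withheld_fraction * len(samples))
    withheld_idx = set(indices[:n_withheld])
    known = [s for i, s in enumerate(samples) if i not in withheld_idx]
    withheld = [s for i, s in enumerate(samples) if i in withheld_idx]
    return known, withheld

# Hypothetical (latitude, longitude, relative humidity %) samples.
samples = [(24.5 + i, 66.5 + j, 40.0 + i + 2 * j)
           for i in range(5) for j in range(4)]
known, withheld = holdout_split(samples)
print(len(known), len(withheld))
```

The withheld values are kept so that each interpolation can be scored against them with the performance indicators.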

1) Inverse Distance Weighting Interpolation (IDW)

It is a deterministic spatial interpolation based on Tobler's First Law of Geography [7] . Ferrari and Ozaki [38] used Equation (1) for inverse distance weighting, as given below:

$$RH(x_j) = \frac{\sum_{i=1}^{t} RH(x_i)\, S_{ij}^{-r}}{\sum_{i=1}^{t} S_{ij}^{-r}} \tag{1}$$

where $RH(x_j)$ represents a missing sample of relative humidity, $S_{ij}^{-r}$ is the weight factor for the $RH(x_i)$ samples, $t$ is the total number of relative humidity samples and $r$ is the degree of the weighting factor. The IDW algorithm developed by Langella [39] was used in the present research work.
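Equation (1) can be illustrated with a short sketch. It assumes that the weight is built from the Euclidean distance between the missing location and each known sample, raised to the power of minus r; the function name `idw_estimate` and the sample values are hypothetical, and this is not the Langella [39] implementation used in the study.

```python
import math

def idw_estimate(target, known, r=2.0):
    """Estimate RH at target (x, y) from known samples [(x, y, rh), ...].

    Weights are distance ** (-r); taking the weight base as the Euclidean
    distance is an assumption made for this illustration.
    """
    numerator = 0.0
    denominator = 0.0
    for x, y, rh in known:
        dist = math.hypot(target[0] - x, target[1] - y)
        if dist == 0.0:
            return rh  # target coincides with a known sample
        weight = dist ** (-r)
        numerator += rh * weight
        denominator += weight
    return numerator / denominator

# Two hypothetical samples at equal distance from the gap.
known = [(0.0, 0.0, 40.0), (2.0, 0.0, 60.0)]
print(idw_estimate((1.0, 0.0), known))  # 50.0, the equally weighted mean
```

A larger r makes nearby samples dominate the estimate more strongly.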

2) Nearest Neighbor Interpolation (NNI)

NNI replaces gaps in the dataset with the value of the nearest sample [38] [40] .
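As a rough illustration of nearest neighbor filling, the sketch below copies the value of the closest known sample into the gap. Euclidean distance in grid coordinates, the function name `nearest_neighbor_fill` and the sample values are assumptions for this example.

```python
import math

def nearest_neighbor_fill(target, known):
    """Return the RH value of the known sample closest to target (x, y).

    known is a list of (x, y, rh) tuples; Euclidean distance is assumed.
    """
    nearest = min(
        known,
        key=lambda s: math.hypot(target[0] - s[0], target[1] - s[1]),
    )
    return nearest[2]

# Hypothetical known samples around two gaps.
known = [(0.0, 0.0, 40.0), (2.0, 0.0, 60.0)]
print(nearest_neighbor_fill((0.5, 0.0), known))  # 40.0
print(nearest_neighbor_fill((1.6, 0.0), known))  # 60.0
```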

3) Bilinear Interpolation (BI)

BI refills the gaps in a dataset along the best-fit straight line through the neighboring samples. Junninen, Niska [1] used Equation (2) for linear interpolation, as given below:

$$RH = RH_{y_1} + m\,(RH_x - RH_{x_1}) \tag{2}$$

$$m = \frac{RH_{y_2} - RH_{y_1}}{RH_{x_2} - RH_{x_1}}$$

$$x_1 < x < x_2 \quad \text{and} \quad y_1 < y < y_2$$

Equation (2) is the simple straight-line equation through the sample points $(RH_{x_1}, RH_{y_1})$ and $(RH_{x_2}, RH_{y_2})$, with $m$ as their gradient.
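The straight-line form of Equation (2) can be sketched as below. The function `linear_fill` follows the equation directly. The function `bilinear_fill`, which applies the same line equation first along one grid axis and then along the other, is the standard bilinear construction but is an assumption here, since the paper only writes out the one-dimensional form. All names and values are hypothetical.

```python
def linear_fill(x, x1, y1, x2, y2):
    """Evaluate the straight line through (x1, y1) and (x2, y2) at x."""
    if x1 == x2:
        raise ValueError("x1 and x2 must differ")
    m = (y2 - y1) / (x2 - x1)  # gradient between the two sample points
    return y1 + m * (x - x1)

def bilinear_fill(x, y, x1, x2, y1, y2, q11, q21, q12, q22):
    """Interpolate inside a grid cell from its four corner values.

    q11 is the value at (x1, y1), q21 at (x2, y1),
    q12 at (x1, y2) and q22 at (x2, y2).
    """
    bottom = linear_fill(x, x1, q11, x2, q21)   # along x at y = y1
    top = linear_fill(x, x1, q12, x2, q22)      # along x at y = y2
    return linear_fill(y, y1, bottom, y2, top)  # along y between the two

print(linear_fill(1.5, 1.0, 40.0, 2.0, 50.0))  # 45.0
print(bilinear_fill(0.5, 0.5, 0.0, 1.0, 0.0, 1.0, 40.0, 50.0, 60.0, 70.0))  # 55.0
```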

4) Natural Neighbor Interpolation (NI)

In this method, the missing sample takes its value from its natural neighbors, and Delaunay triangulation is used to select the natural neighbor samples around the missing value [41] .

2.3. Performance Indicators for Each Interpolation

Robeson [7] ; Price, McKenney [42] ; Junninen, Niska [1] ; Perry and Hollis [43] ; Norazian [4] ; Hofstra, Haylock [44] ; Rahman and Islam [2] ; Ferrari and Ozaki [38] ; and Saleem and Ahmed [34] have frequently used Mean Absolute Error (MAE), Root Mean Square Error (RMSE), Coefficient of Determination (R2) and Correlation Coefficient (Corr) as performance indicators for these interpolations. The present study was also carried out in line with the same standard procedure.

1) Root Mean Square Error (RMSE)

Norazian [4] used Equation (3) for RMSE, as given below:

$$RMSE = \left\{ \frac{1}{t} \sum_{i=1}^{t} \left[ RH_{oi} - RH_{pi} \right]^2 \right\}^{1/2} \tag{3}$$

In Equation (3), $t$ is the total number of samples [1] . RMSE measures the difference between the original ($RH_{oi}$) and imputed ($RH_{pi}$) relative humidity samples, and a low value indicates accurate refilling of relative humidity [41] .
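A minimal sketch of Equation (3) follows. The function name `rmse` and the three sample values are hypothetical and serve only to show the arithmetic.

```python
import math

def rmse(observed, imputed):
    """Root mean square error between observed and imputed RH samples."""
    t = len(observed)
    squared_sum = sum((o - p) ** 2 for o, p in zip(observed, imputed))
    return math.sqrt(squared_sum / t)

# Hypothetical withheld values and their interpolated counterparts.
observed = [40.0, 50.0, 60.0]
imputed = [41.0, 49.0, 62.0]
print(rmse(observed, imputed))  # sqrt((1 + 1 + 4) / 3) = sqrt(2)
```

Because the differences are squared before averaging, a single large miss raises RMSE more than it raises MAE.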

2) Mean Absolute Error (MAE)

Junninen, Niska [1] and Norazian [4] gave Equation (4) for MAE as follows:

$$MAE = \frac{1}{t} \sum_{i=1}^{t} \left| RH_{oi} - RH_{pi} \right| \tag{4}$$

An MAE value near 0 indicates precise refilling of the dataset.
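Equation (4) can be sketched in the same way. The function name `mae` and the sample values are hypothetical.

```python
def mae(observed, imputed):
    """Mean absolute error between observed and imputed RH samples."""
    t = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, imputed)) / t

# Hypothetical withheld values and their interpolated counterparts.
observed = [40.0, 50.0, 60.0]
imputed = [41.0, 49.0, 62.0]
print(mae(observed, imputed))  # (1 + 1 + 2) / 3
```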

3) Correlation Coefficient (Corr)

A value of +1 shows a very good correlation and a good replacement of missing data. A Corr value near 0 indicates very poor imputation. Fisher [45] and Kendall [46] used the following formula for Corr:

$$corr = \frac{\operatorname{cov}(RH_{pi}, RH_{oi})}{\sigma_{RH_{pi}}\, \sigma_{RH_{oi}}} \tag{5}$$

where $\operatorname{cov}(RH_{pi}, RH_{oi})$ represents the covariance of $RH_{pi}$ and $RH_{oi}$, while $\sigma_{RH_{pi}}\, \sigma_{RH_{oi}}$ is the product of their standard deviations.
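The correlation coefficient of Equation (5) might be computed as in the sketch below. Population (1/t) forms of the covariance and standard deviation are assumed, and the function name `pearson_corr` and the test values are hypothetical.

```python
import math

def pearson_corr(imputed, observed):
    """Covariance of the two series divided by the product of their
    standard deviations (population forms, an assumption here)."""
    t = len(observed)
    mean_p = sum(imputed) / t
    mean_o = sum(observed) / t
    cov = sum((p - mean_p) * (o - mean_o)
              for p, o in zip(imputed, observed)) / t
    sd_p = math.sqrt(sum((p - mean_p) ** 2 for p in imputed) / t)
    sd_o = math.sqrt(sum((o - mean_o) ** 2 for o in observed) / t)
    return cov / (sd_p * sd_o)

# An imputed series that is an exact linear function of the observed one
# gives a correlation of 1 up to floating-point rounding.
print(pearson_corr([2.0, 4.0, 6.0], [1.0, 2.0, 3.0]))
```

Corr measures how well the imputed values track the observed ones, not whether they match in magnitude, which is why it is reported alongside RMSE and MAE.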

4) Coefficient of Determination (R2)

It indicates the degree of correlation in the dataset [2] . A value close to 1 indicates a near-perfect fit to the surface. Norazian [4] used Equation (6) for R2, as given below:

$$R^2 = \left[ \frac{1}{t} \sum_{i=1}^{t} \frac{(RH_{pi} - RH_{pi.m})(RH_{oi} - RH_{oi.m})}{\sigma_p\, \sigma_o} \right]^2 \tag{6}$$

where $RH_{pi.m}$ is the average value of the imputed samples, $RH_{oi.m}$ is the mean of the observed samples, and $\sigma_p$ and $\sigma_o$ are the standard deviations of the imputed and observed samples.
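Equation (6) can be sketched as the square of the averaged, standardized product of deviations. Reading the equation as a squared quantity and using population (1/t) standard deviations are assumptions of this sketch, and the function name `r_squared` and the test values are hypothetical.

```python
import math

def r_squared(imputed, observed):
    """Coefficient of determination following the form of Equation (6)."""
    t = len(observed)
    mean_p = sum(imputed) / t
    mean_o = sum(observed) / t
    sd_p = math.sqrt(sum((p - mean_p) ** 2 for p in imputed) / t)
    sd_o = math.sqrt(sum((o - mean_o) ** 2 for o in observed) / t)
    inner = sum((p - mean_p) * (o - mean_o)
                for p, o in zip(imputed, observed)) / (t * sd_p * sd_o)
    return inner ** 2

# Close to 1 for a perfectly linear relation, 0 when the series are uncorrelated.
print(r_squared([2.0, 4.0, 6.0], [1.0, 2.0, 3.0]))
print(r_squared([1.0, 0.0, 1.0], [1.0, 2.0, 3.0]))
```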

3. Results

The imputation over each pressure level was determined and the results are presented below.

3.1. Inverse Distance Weighting (IDW)

This interpolation technique showed good performance indicators for refilling of relative humidity for 200, 250, 300, 400, and 500 hPa levels (Table 1).

3.2. Bilinear Interpolation (BI)

The performance indicators reveal that refilling of relative humidity at 100, 150, 200, 250, 300, 400 and 500 hPa was accurate with BI. For the remaining pressure levels (600, 700, 850 and 925 hPa), the results were also very accurate. A strong correlation (0.995) and an R2 close to 1 indicate very good imputation of relative humidity for these pressure levels in the atmosphere (Table 2).

3.3. Natural Neighbor Interpolation (NI)

Natural Neighbor Interpolation was well suited to refilling relative humidity at 100, 150, 200, 250, 300 and 400 hPa, with RMSE values below ±0.5. Refilling at the other pressure levels (500, 600, 700, 850 and 925 hPa) also showed very good results: RMSE values remained close to ±1, with an MAE of 0.339 and a very strong correlation (0.985). This technique showed poor refilling of the relative humidity data set at the 1000 hPa level (Table 3).

3.4. Nearest Neighbor Interpolation (NNI)

Nearest Neighbor Interpolation refilled the dataset accurately at the 150, 200, 250, 300 and 400 hPa levels. It did not prove very accurate for the remaining pressure levels: 100, 500, 600, 700, 850, 925 and 1000 hPa (Table 4).

Table 1. Inverse Distance Weighting Interpolation outcome and its performance indicators.

Table 2. Bilinear Interpolation outcome and its performance indicators.

Table 3. Natural Neighbor Interpolation outcome and its performance indicators.

Table 4. Nearest Neighbor Interpolation outcome and its performance indicators.

Figure 3. (a) Natural Neighbors Interpolation for imputation of relative humidity [100 hPa to 400 hPa]; (b) Natural Neighbors Interpolation for imputation of relative humidity [500 hPa to 1000 hPa].

Figure 4. (a) Bilinear Interpolation for imputation of relative humidity [100 hPa to 400 hPa]; (b) Bilinear Interpolation for imputation of relative humidity [500 hPa to 1000 hPa].

4. Discussions

Scatter plots were used to identify the more accurate of the Natural Neighbor and Bilinear interpolations. Natural Neighbor Interpolation gave good results for refilling relative humidity at 100, 150, 200, 250, 300 and 400 hPa (Figure 3(a)).

However, Natural Neighbor Interpolation was not able to accurately refill the missing relative humidity data at the 600, 700, 850, 925 and 1000 hPa pressure levels (Figure 3(b)).

Filling of gaps in the data with BI appeared good for the 100, 150, 200, 250, 300 and 400 hPa levels in every month of the year (Figure 4(a)).

For the remaining pressure levels (500, 600, 700, 850, 925 and 1000 hPa), BI was also the best suited for refilling (Figure 4(b)).

5. Conclusion

Based on a critical evaluation of the interpolations and their outputs, it is concluded that Bilinear Interpolation was the best and most accurate for all pressure levels, while Natural Neighbor Interpolation proved to be the second best interpolation for substituting missing relative humidity from 100 to 1000 hPa.


Acknowledgements

We are very grateful to the AQUA-AIRS team for their assistance in interpolating the AIRS data set. The authors also wish to thank Mr. Alessio Martion, University of Rome La Sapienza, Italy, for his important recommendations to improve this research.

Conflicts of Interest

The authors declare no conflicts of interest regarding the publication of this paper.

Cite this paper

Saleem, U., Akram, M.S., Ullah, M.F. and Rehman, F. (2018) Accurate Imputation for Relative Humidity over Pakistan Gathered from AQUA Satellite. Open Journal of Geology, 8, 987-1001.


References

  1. Junninen, H., Niska, H., Tuppurainen, K., Ruuskanen, J. and Kolehmainen, M. (2004) Methods for Imputation of Missing Values in Air Quality Data Sets. Atmospheric Environment, 38, 2895-2907.

  2. Rahman, G. and Islam, Z. (2011) A Decision Tree-Based Missing Value Imputation Technique for Data Pre-Processing. Proceedings of the Ninth Australasian Data Mining Conference, Ballarat, Australia, 1-2 December 2011, 41-50.

  3. Muralidhar, K., Parsa, R. and Sarathy, R. (1999) A General Additive Data Perturbation Method for Database Security. Management Science, 45, 1399-1415.

  4. Noor, N.M. (2015) Comparison of Linear Interpolation Method and Mean Method to Replace the Missing Values in Environmental Data Set. Materials Science Forum, 803, 278-281.

  5. Saleem, M.U. (2015) Atmospheric Ducts: Their Applications in Radio Frequency Propagation Using Satellite Remote Sensing Techniques. LAP Lambert Academic Publishing, Germany, 56 p.

  6. Saleem, M.U. (2016) Statistical Investigation and Mapping of Monthly Modified Refractivity Gradient over Pakistan at the 700 Hectopascal Level. Open Journal of Antennas and Propagation, 4, 46-63.

  7. Robeson, S.M. (1994) Influence of Spatial Sampling and Interpolation on Estimates of Air Temperature Change. Climate Research, 4, 119-126.

  8. Yozgatligil, C., Aslan, S., Iyigun, C. and Batmaz, I. (2013) Comparison of Missing Value Imputation Methods in Time Series: The Case of Turkish Meteorological Data. Theoretical and Applied Climatology, 112, 143-167.

  9. Schneider, T. (2001) Analysis of Incomplete Climate Data: Estimation of Mean Values and Covariance Matrices and Imputation of Missing Values. Journal of Climate, 14, 853-871.

  10. Hakamada, K., Okamoto, M. and Hanai, T. (2006) Novel Technique for Preprocessing High Dimensional Time-Course Data from DNA Microarray: Mathematical Model-Based Clustering. Bioinformatics, 22, 843-848.

  11. Sun, D.-Z. and Oort, A.H. (1995) Humidity-Temperature Relationships in the Tropical Troposphere. Journal of Climate, 8, 1974-1987.

  12. Cho, J.Y., Newell, R.E. and Sachse, G.W. (2000) Anomalous Scaling of Mesoscale Tropospheric Humidity Fluctuations. Geophysical Research Letters, 27, 377-380.

  13. McCarthy, M.P. and Toumi, R. (2004) Observed Interannual Variability of Tropical Troposphere Relative Humidity. Journal of Climate, 17, 3181-3191.

  14. Gettelman, A., Weinstock, E., Fetzer, E., Irion, F., Eldering, A., Richard, E., et al. (2004) Validation of Aqua Satellite Data in the Upper Troposphere and Lower Stratosphere with in Situ Aircraft Instruments. Geophysical Research Letters, 31, L22107.

  15. Dessler, A.E. and Sherwood, S.C. (2009) Atmospheric Science. A Matter of Humidity. Science, 323, 1020-1021.

  16. Gaffen, D.J. and Ross, R.J. (1999) Climatology and Trends of US Surface Humidity and Temperature. Journal of Climate, 12, 811-828.

  17. Lindzen, R.S. (1990) Some Coolness Concerning Global Warming. Bulletin of the American Meteorological Society, 71, 288-299.

  18. Shine, K.P. and Sinha, A. (1991) Sensitivity of the Earth’s Climate to Height-Dependent Changes in the Water Vapour Mixing Ratio. Nature, 354, 382-384.

  19. Del Genio, A.D., Kovari, W. and Yao, M.S. (1994) Climatic Implications of the Seasonal Variation of Upper Troposphere Water Vapor. Geophysical Research Letters, 21, 2701-2704.

  20. Peixoto, J. and Oort, A.H. (1996) The Climatology of Relative Humidity in the Atmosphere. Journal of Climate, 9, 3443-3463.

  21. Harries, J. (1997) Atmospheric Radiation and Atmospheric Humidity. Quarterly Journal of the Royal Meteorological Society, 123, 2173-2186.

  22. Gettelman, A., Collins, W.D., Fetzer, E.J., Eldering, A., Irion, F.W., Duffy, P.B., et al. (2006) Climatology of Upper-Tropospheric Relative Humidity from the Atmospheric Infrared Sounder and Implications for Climate. Journal of Climate, 19, 6104-6121.

  23. Elliott, W.P. and Gaffen, D.J. (1991) On the Utility of Radiosonde Humidity Archives for Climate Studies. Bulletin of the American Meteorological Society, 72, 1507-1520.

  24. Miloshevich, L.M., Vomel, H., Whiteman, D.N., Lesht, B.M., Schmidlin, F. and Russo, F. (2006) Absolute Accuracy of Water Vapor Measurements from Six Operational Radiosonde Types Launched during AWEX-G and Implications for AIRS Validation. Journal of Geophysical Research: Atmospheres, 111, 1-25.

  25. Eyre, J. (1964) Progress Achieved on Assimilation of Satellite Data in Numerical Weather Prediction over the Last 30 Years. ECMWF Seminar Proceedings: Recent Developments in Use of Satellite Observations in Numerical Weather Prediction, Shinfield Park, Reading, 3-7 September 2007, 1-27.

  26. Basharinov, A.E., Gurvich, A.S. and Egorov, S.T. (1969) Determination of Geophysical Parameters from Thermal Radio Emission Measurements on the Artificial Earth Satellite “Cosmos-243”. Doklady Akademii Nauk SSSR, 188, 1273-1276.

  27. Staelin, D., Kunzi, K., Pettyjohn, R., Poon, R., Wilcox, R. and Waters, J. (1976) Remote Sensing of Atmospheric Water Vapor and Liquid Water with the Nimbus 5 Microwave Spectrometer. Journal of Applied Meteorology, 15, 1204-1214.

  28. Cheema, S.B., Rasul, G., Ali, G. and Kazmi, D.H. (2011) A Comparison of Minimum Temperature Trends with Model Projections. Pakistan Journal of Meteorology, 8, 39-52.

  29. Farooqi, A.B., Khan, A.H. and Mir, H. (2005) Climate Change Perspective in Pakistan. Pakistan Journal of Meteorology, 2, 11-21.

  30. Iqbal, M.J. and Quamar, J. (2011) Measuring Temperature Variability of Five Major Cities of Pakistan. Arabian Journal of Geosciences, 4, 595-606.

  31. Sarfaraz, S. and Dube, S. (2012) Numerical Simulation of Storm Surges Associated with Severe Cyclones Land Falling Pakistan Coast during 1999-2010. Pakistan Journal of Meteorology, 8, 11-20.

  32. Wazir, M.A. (2011) Estimation of Regional Stratospheric Ozone Concerning Pakistan. Pakistan Journal of Meteorology, 7, 33-43.

  33. Tian, B., Manning, E., Fetzer, E., Olsen, E., Wong, S., Susskind, J., et al. (2013) AIRS/AMSU/HSB Version 6 Level 3 Product User Guide. Jet Propulsion Laboratory, Pasadena, CA.

  34. Saleem, M.U. and Ahmed, S.R. (2016) Missing Data Imputations for Upper Air Temperature at 24 Standard Pressure Levels over Pakistan Collected from Aqua Satellite. Journal of Data Analysis and Information Processing, 4, 132-146.

  35. Susskind, J., Barnet, C.D. and Blaisdell, J.M. (2003) Retrieval of Atmospheric and Surface Parameters from AIRS/AMSU/HSB Data in the Presence of Clouds. IEEE Transactions on Geoscience and Remote Sensing, 41, 390-409.

  36. Divakarla, M.G., Barnet, C.D., Goldberg, M.D., McMillin, L.M., Maddy, E., Wolf, W., et al. (2006) Validation of Atmospheric Infrared Sounder Temperature and Water Vapor Retrievals with Matched Radiosonde Measurements and Forecasts. Journal of Geophysical Research: Atmospheres, 111, D09S15.

  37. Tobin, D.C., Revercomb, H.E., Knuteson, R.O., Lesht, B.M., Strow, L.L., Hannon, S.E., et al. (2006) Atmospheric Radiation Measurement Site Atmospheric State Best Estimates for Atmospheric Infrared Sounder Temperature and Water Vapor Retrieval Validation. Journal of Geophysical Research: Atmospheres, 111, D09S14.

  38. Ferrari, G.T. and Ozaki, V. (2014) Missing Data Imputation of Climate Datasets: Implications to Modeling Extreme Drought Events. Revista Brasileira de Meteorologia, 29, 21-28.

  39. Langella, G. (2010) Inverse Distance Weighted (IDW) or Simple Moving Average (SMA) INTERPOLATION. File Exchange Version

  40. Stahl, K., Moore, R., Floyer, J., Asplin, M. and McKendry, I. (2006) Comparison of Approaches for Spatial Interpolation of Daily Air Temperature in a Large Region with Complex Topography and Highly Variable Station Density. Agricultural and Forest Meteorology, 139, 224-236.

  41. Boissonnat, J.-D. and Cazals, F. (2002) Smooth Surface Reconstruction via Natural Neighbour Interpolation of Distance Functions. Computational Geometry, 22, 185-203.

  42. Price, D.T., McKenney, D.W., Nalder, I.A., Hutchinson, M.F. and Kesteven, J.L. (2000) A Comparison of Two Statistical Methods for Spatial Interpolation of Canadian Monthly Mean Climate Data. Agricultural and Forest Meteorology, 101, 81-94.

  43. Perry, M. and Hollis, D. (2005) The Generation of Monthly Gridded Datasets for a Range of Climatic Variables over the UK. International Journal of Climatology: A Journal of the Royal Meteorological Society, 25, 1041-1054.

  44. Hofstra, N., Haylock, M., New, M., Jones, P. and Frei, C. (2008) Comparison of Six Methods for the Interpolation of Daily, European Climate Data. Journal of Geophysical Research: Atmospheres, 113, D21110.

  45. Fisher, R.A. (1958) Statistical Methods for Research Workers. Hafner, New York.

  46. Kendall, M.G. (1979) The Advanced Theory of Statistics. Macmillan, New York.