Saturday, December 26, 2015

Data generation

Hi
Recently, I reconstructed some missing data in my runoff and lake water depth time series to fill the gaps in the records. For this, I used Frequency Domain Analysis (FDA) combined with autoregressive (AR) models to capture the remaining information (persistence) in the FDA residuals.

  • First, a primary data set at least 100 months long is selected, so that the generated time series can reproduce its main properties: the moments of the distribution and the time dependency.
  • In the second stage, the time series is transformed to a Normal distribution and checked for stationarity. The runoff time series is log-transformed, while the lake water level time series is transformed with a Box-Cox transformation.
  • Then the line spectrum (LS) is calculated and its spikes are analyzed to capture the main periodicities in the data. For instance, the LS of runoff in the Simine River is shown in Fig. 1.
Fig. 1. LS of Simine River
  • Because the spikes in the line spectrum are not statistically consistent estimates, a Tukey window is used to transform the LS into the power spectrum (PS), and only the statistically significant spikes in the PS are used (Fig. 2).
Fig. 2. PS of Simine River
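The steps up to this point can be sketched in a few lines of Python. This is only a minimal illustration under my own assumptions, not the exact procedure used for the Simine River: the monthly series is synthetic, the function names `line_spectrum` and `tukey_power_spectrum` are hypothetical, the Tukey window is taken to be the Tukey (Hanning) lag window applied to the autocovariance, and the truncation lag of 30 months is an arbitrary choice.

```python
import numpy as np

def line_spectrum(x):
    """Raw periodogram of the demeaned series at the Fourier frequencies."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    freqs = np.fft.rfftfreq(n, d=1.0)            # cycles per month
    power = np.abs(np.fft.rfft(x)) ** 2 / n
    return freqs[1:], power[1:]                  # drop the zero frequency

def tukey_power_spectrum(x, max_lag, freqs):
    """Smoothed spectrum: autocovariance weighted by a Tukey (Hanning) lag window."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    lags = np.arange(max_lag + 1)
    # Biased sample autocovariance for lags 0..max_lag
    acov = np.array([np.dot(x[: n - k], x[k:]) / n for k in lags])
    # Assumed lag window: w(k) = 0.5 * (1 + cos(pi * k / M))
    window = 0.5 * (1.0 + np.cos(np.pi * lags / max_lag))
    weighted = acov * window
    cosines = np.cos(2.0 * np.pi * np.outer(freqs, lags[1:]))
    return weighted[0] + 2.0 * cosines @ weighted[1:]

# Synthetic monthly "runoff" with an annual cycle (illustrative data only)
rng = np.random.default_rng(0)
t = np.arange(120)
runoff = np.exp(1.0 + 0.8 * np.sin(2 * np.pi * t / 12) + 0.3 * rng.standard_normal(120))

z = np.log(runoff)                               # log-transformation for runoff
freqs, ls = line_spectrum(z)
ps = tukey_power_spectrum(z, max_lag=30, freqs=freqs)
ls_peak_period = 1.0 / freqs[np.argmax(ls)]      # dominant period in months (LS)
ps_peak_period = 1.0 / freqs[np.argmax(ps)]      # dominant period in months (PS)
```

A shorter truncation lag gives a smoother but less sharply resolved PS, so in practice the lag would be tuned to the record length.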
    • Then the residual series of the selected Fourier series is calculated and checked for being white noise. If information remains in the residuals, an AR model is fitted to them. After that, a normally distributed random series with zero mean and the same standard deviation as the residual series is generated.
    • Fig. 3 shows some of the Fourier series used in the analysis.
    Fig. 3. Fourier series used in the analysis of Simine River
    • After that, the estimated time series is calculated as the selected Fourier series plus the random series, and its accuracy is checked against the selected transformed time series.
    • Following that, the Fourier series with the most adequate properties is used to fill the gaps in the data; the original data are not altered at all.
    • Finally, the statistical properties of the generated time series are tested against those of the primary time series. Fig. 3 shows some of the Fourier series for the Simine River.
    • A comparison of the primary and generated time series is shown in Table 1 and Fig. 4. The goal is to capture as much information as possible and transfer it to the final time series.
    Table 1. Statistics of the primary and generated time series (Fourier, and combined Fourier and AR)
    • Thus, for this case a combined Fourier and AR model is used for generating data. The same approach is also used for the other runoff and lake water depth time series.
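
As a rough illustration of the combined Fourier and AR idea, here is a minimal Python sketch. It rests on several assumptions of mine that are not from the analysis above: the data are synthetic, the harmonics (12 and 6 months) are fixed by hand rather than picked from the PS, the residual model is a simple AR(1), and the names `harmonic_matrix` and `fill_gaps` are hypothetical.

```python
import numpy as np

def harmonic_matrix(t, periods):
    """Design matrix: a constant plus a cosine/sine pair for each period."""
    cols = [np.ones_like(t, dtype=float)]
    for p in periods:
        cols.append(np.cos(2.0 * np.pi * t / p))
        cols.append(np.sin(2.0 * np.pi * t / p))
    return np.column_stack(cols)

def fill_gaps(z, periods, rng):
    """Fill NaN gaps in a transformed series with Fourier series + AR(1) noise."""
    z = np.asarray(z, dtype=float)
    t = np.arange(len(z), dtype=float)
    observed = ~np.isnan(z)

    # Least-squares fit of the selected harmonics to the observed values only
    X = harmonic_matrix(t, periods)
    coef, *_ = np.linalg.lstsq(X[observed], z[observed], rcond=None)
    fourier = X @ coef
    resid = z - fourier                          # NaN inside the gaps

    # AR(1) coefficient from pairs of consecutive observed residuals
    pair = observed[:-1] & observed[1:]
    r0, r1 = resid[:-1][pair], resid[1:][pair]
    phi = np.dot(r0, r1) / np.dot(r0, r0)
    sigma_e = np.std(r1 - phi * r0)              # innovation standard deviation

    # Generate values only where data are missing; observed data stay untouched
    filled = z.copy()
    prev = 0.0
    for i in range(len(z)):
        if observed[i]:
            prev = resid[i]
        else:
            prev = phi * prev + sigma_e * rng.standard_normal()
            filled[i] = fourier[i] + prev
    return filled, fourier, phi

# Synthetic transformed series with persistence and two gaps (illustrative only)
rng = np.random.default_rng(1)
n = 144
t = np.arange(n)
noise = np.zeros(n)
for i in range(1, n):
    noise[i] = 0.5 * noise[i - 1] + 0.2 * rng.standard_normal()
z = 2.0 + 0.8 * np.sin(2.0 * np.pi * t / 12.0) + noise
z_gappy = z.copy()
z_gappy[40:46] = np.nan
z_gappy[100:103] = np.nan

filled, fourier, phi = fill_gaps(z_gappy, periods=[12.0, 6.0], rng=rng)
```

For a log-transformed runoff series, the filled values would then be back-transformed with `np.exp(filled)` before comparing statistics with the primary series.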
    Thankfully yours
    Babak




