Energy forecasting under missing data: Comparative evaluation of augmented representations and decoder-only time-series imputation
Data-related issues, including missing values and irregular measurements, reduce the accuracy of short-term energy forecasting in smart grids. In data-scarce scenarios, two approaches are commonly considered, but their relative strengths and weaknesses have not been fully mapped. Embedding-based models learn joint representations from heterogeneous data and compensate for missing time-series measurements with additional contextual or external sources, whereas imputation pipelines restore temporal continuity but may smooth variability or produce implausible values. To address this gap, we propose a unified forecasting framework for energy systems that combines a shared Temporal Fusion Transformer forecasting model with a controlled degradation protocol that simulates realistic missing-data patterns. This enables a fair and systematic comparison between two pipelines: representation-augmented learning and decoder-only time-series imputation. The former integrates TS2Vec temporal embeddings and BERT-based static contextual representations to provide a richer forecasting space without explicit reconstruction of missing values. The latter uses a Chronos-2 model to reconstruct missing time-series segments, followed by a physics-based correction that enforces physically plausible outputs. Evaluating both pipelines under the same degradation protocol maps the trade-offs between representation learning and restoring data continuity through imputation. We use real-world datasets of non-residential building electricity consumption and wind generation. The imputation-based pipeline achieves a mean sMAPE of 10.14% and an MAE of 8.43 kWh across 100 buildings, compared to 12.11% and 10.89 kWh for the representation-based approach (p < 0.01). On the wind generation dataset, imputation also improves predictive accuracy (R² = 0.870 vs. R² = 0.794).
However, representation-based models remain competitive in scenarios with irregular, spike-dominated, or event-driven consumption patterns, where imputation provides limited additional benefit.
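As a rough illustration of the evaluation setup described above, the following minimal sketch shows one way a controlled degradation step and the two reported error metrics could be written. It is not the authors' implementation: the function names (`degrade`, `smape`, `mae`), the parameters (`point_rate`, `n_blocks`, `block_len`), and the mix of isolated and block-wise gaps are all assumptions made for illustration, and the sMAPE variant shown (mean of absolute values in the denominator) is only one common definition, which the abstract does not specify.

```python
import math
import random


def degrade(series, point_rate=0.05, n_blocks=2, block_len=24, seed=0):
    """Return a copy of `series` with NaN gaps (hypothetical degradation scheme).

    Two assumed gap types: isolated missing readings drawn at `point_rate`,
    and `n_blocks` contiguous outages of `block_len` steps each.
    """
    rng = random.Random(seed)
    out = [float(v) for v in series]
    n = len(out)
    # Isolated missing readings, e.g. dropped meter transmissions.
    for i in range(n):
        if rng.random() < point_rate:
            out[i] = math.nan
    # Contiguous outages, e.g. sensor downtime; blocks may overlap.
    for _ in range(n_blocks):
        start = rng.randrange(0, max(1, n - block_len + 1))
        for i in range(start, min(n, start + block_len)):
            out[i] = math.nan
    return out


def smape(y_true, y_pred):
    """Symmetric MAPE in percent; a term is 0 when both values are 0."""
    terms = []
    for t, p in zip(y_true, y_pred):
        denom = (abs(t) + abs(p)) / 2.0
        terms.append(abs(p - t) / denom if denom > 0 else 0.0)
    return 100.0 * sum(terms) / len(terms)


def mae(y_true, y_pred):
    """Mean absolute error, in the units of the series (e.g. kWh)."""
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    hourly_load = [100.0 + 10.0 * math.sin(h / 24 * 2 * math.pi) for h in range(200)]
    degraded = degrade(hourly_load)
    print("missing readings:", sum(math.isnan(v) for v in degraded))
    print("sMAPE (%):", smape([100.0, 200.0], [110.0, 180.0]))
    print("MAE:", mae([100.0, 200.0], [110.0, 180.0]))
```

Fixing the seed keeps the gap pattern identical across pipelines, which is what makes a comparison on degraded inputs controlled; in this sketch both pipelines would receive the same `degraded` series and be scored against the untouched original.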