Signal or Noise?

Understanding Generative Models for Real-World Sensor Time Series

¹University of California, Los Angeles    ²Carnegie Mellon University    ³Massachusetts Institute of Technology
*Equal contribution    Corresponding author

The first open, large-scale study of sensor-signal generation — 14 settings across 4 domains, 7 datasets, and 12 modalities, evaluating 5 model families under one unified protocol.

14 Generation Settings
4 Real-World Domains
7 Datasets
12 Signal Modalities
5 Model Families

Why SensorGen

The broadest open study of sensor-signal generation

Generative modeling of sensor signals has been fragmented across modalities, datasets, and task formulations, with each method tuned to a single signal type. SensorGen consolidates this landscape into one platform — shared data processing, task construction, training, and evaluation — so that model families, signal properties, and design choices can finally be compared apples-to-apples.

Comparison of SensorGen against prior studies across model families and task types
SensorGen vs. prior work on sensor-signal generation. Earlier studies cover one domain and a single task type; SensorGen is the first to span all four task categories and five model families at scale.

Signal Diversity & Scale

Sampling Frequency

0.0033 Hz → 256 Hz

Sequence Length

10² → 10⁴ steps

Time Span

seconds → 7 days

Four Real-World Data Regimes

Emergency Department

MIMIC-IV ECG — clinical 12-lead electrocardiograms paired with diagnostic reports.

Daily Life

CAPTURE-24, PPG-DaLiA, Metabonet — free-living accelerometry, wearable physiology, and longitudinal metabolic traces.

Lab Study

PhyMER, SHHS — controlled affective-state recordings and overnight polysomnography (EEG, ECG, EMG, EOG).

Operating Room

VitalDB — intra-operative waveforms, medication records, arterial blood pressure, PPG, ECG, and NIBP.

Signal Modalities


These datasets and modalities are organized into 4 task categories and 14 concrete generation settings, described under Task Design.

Task Design

One generation interface, four task categories

We cast every task as conditional generation $\hat{x} \sim p_\theta(x \mid C)$ with $C \subseteq \{c_1, c_2\}$, where $c_1$ is a semantic condition (text, labels, metadata) and $c_2$ is observed signal context (history, source channels). Each setting is built from a real sensing bottleneck and validated for clinical grounding, application value, and generative suitability.
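To make the interface concrete, here is a minimal sketch of how a condition set C might be represented. The `Condition` class and the example values are hypothetical illustrations, not SensorGen's API, and the assumption that intervention-conditioned forecasting uses both c1 and c2 is ours.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Condition:
    """Condition set C, a subset of {c1, c2}; either part may be absent."""
    semantic: Optional[str] = None             # c1: text, label, or metadata
    context: Optional[Sequence[float]] = None  # c2: observed history or source channels

    def parts(self) -> set:
        """Return which of c1 / c2 are present in this condition set."""
        present = set()
        if self.semantic is not None:
            present.add("c1")
        if self.context is not None:
            present.add("c2")
        return present


# Hypothetical instantiations of three settings named on this page.
text_to_signal = Condition(semantic="sinus rhythm, normal ECG")
forecasting = Condition(context=[0.1, 0.3, 0.2])
intervention_forecasting = Condition(
    semantic="drug bolus at t=0",  # assumed: the intervention enters as c1
    context=[0.1, 0.3, 0.2],
)
```

Under this reading, moving between task categories changes only which fields are filled in, while the generator signature stays the same.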

Semantic-to-Signal

Generate realistic, semantically consistent signals from text, labels, or metadata.

  • Text-to-Signal Generation
  • Label-Conditioned Generation

Interpolation & Extrapolation

Predict future segments or reconstruct missing spans from observed temporal context.

  • Future Forecasting
  • Intervention-Conditioned Forecasting
  • Temporal Imputation

Translation

Generate missing or hard-to-acquire target channels from accessible source channels.

  • Signal Translation
  • Proxy-to-Target Reconstruction

Editing

Restore, refine, or selectively modify recordings while preserving non-target attributes.

  • Super-Resolution
  • Denoising
  • Semantic Signal Editing

Task taxonomy: how each setting instantiates the conditional-generation interface
Each setting instantiates the same $\hat{x} \sim p_\theta(x \mid C)$ interface; only the role of the conditions changes.

Coverage

Bars are sized by the number of available samples.

What the Study Reveals

Findings

Rather than crowning a single best model, SensorGen turns a fragmented field into evidence. Each takeaway below is grounded in the benchmark and paired with the table or figure that supports it — practical lessons for building and evaluating sensor generators.

01 Which generative paradigm transfers best?

Takeaway 1

Flow matching is the strongest default across tasks.

Across all four categories, SiT (flow matching) gives the best aggregate performance — best precision/recall on semantic-to-signal, and best MSE/MAE/PSNR/SMSE/SSIM on translation and interpolation/extrapolation. No single method wins everywhere, but flow matching is the reliable baseline for heterogeneous sensor generation.
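For readers new to flow matching, the sketch below builds one training pair under a linear noise-to-data path. It is a generic illustration rather than SiT's exact interpolant or training code: the time direction (noise at t = 0, data at t = 1), the function names, and the toy signal are all assumptions.

```python
import random


def flow_matching_pair(x1, t, rng):
    """Return (x_t, v) for a linear path from Gaussian noise x0 to data x1."""
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]                # noise endpoint
    xt = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]  # point on the path
    v = [b - a for a, b in zip(x0, x1)]                   # constant target velocity
    return xt, v


def velocity_loss(pred, target):
    """Mean squared error between predicted and target velocity."""
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(target)


rng = random.Random(0)
signal = [0.0, 0.5, 1.0, 0.5, 0.0]  # toy stand-in for a sensor window
xt, v = flow_matching_pair(signal, t=0.25, rng=rng)
```

A network would be trained to predict `v` from `xt`, `t`, and the condition set; at sampling time the learned velocity is integrated from noise toward data.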

Aggregate results for semantic-to-signal and editing
Aggregate results — Semantic-to-Signal & Editing.
Aggregate results for translation and interpolation/extrapolation
Aggregate results — Translation & Interpolation/Extrapolation.
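Three of the reconstruction metrics named above have standard definitions, sketched here for a single channel. The `data_range` argument for PSNR is an assumption (2.0 would suit signals scaled to [-1, 1]); SMSE and SSIM are omitted because this page does not define them.

```python
import math


def mse(y, y_hat):
    """Mean squared error."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)


def mae(y, y_hat):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def psnr(y, y_hat, data_range):
    """Peak signal-to-noise ratio in dB for a known value range."""
    err = mse(y, y_hat)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(data_range ** 2 / err)


target = [0.0, 0.5, 1.0, 0.5]
generated = [0.1, 0.5, 0.9, 0.5]
scores = (mse(target, generated), mae(target, generated),
          psnr(target, generated, data_range=2.0))
```

Lower is better for MSE and MAE, higher is better for PSNR.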

02 What signal properties matter?

Takeaway 2

Target-signal context tames long-sequence generation.

Quality degrades as sequences grow for semantic-to-signal and translation, but stays stable for imputation: surrounding observed segments anchor the missing region, so generation is far more tractable when target-signal context is available.
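The difference between imputation and forecasting context can be pictured with an observation mask. This is a hypothetical sketch, not the benchmark's task-construction code: the function names and the zero fill value are assumptions.

```python
def observation_mask(length, missing_start, missing_end):
    """1 = observed, 0 = to be generated; the missing span is [start, end)."""
    return [0 if missing_start <= i < missing_end else 1 for i in range(length)]


def masked_context(signal, mask, fill=0.0):
    """Keep observed samples and blank out the span the model must generate."""
    return [x if m else fill for x, m in zip(signal, mask)]


signal = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
imputation_mask = observation_mask(len(signal), 3, 5)  # context on both sides
forecast_mask = observation_mask(len(signal), 5, 8)    # context on the left only
imputation_input = masked_context(signal, imputation_mask)
```

In the imputation case the generated span is bracketed by real samples on both ends, which matches the anchoring effect described above.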

Short vs long sequence generation across settings
Short vs. long sequences across imputation, IMU generation, and PPG-to-ECG translation.

Takeaway 3

High-frequency signals need time-frequency modeling.

For high-frequency signals such as EEG, models reach high precision but low recall — plausible waveforms that miss spectral variability. Adding an STFT spectrogram as a time-frequency conditioning signal improves high-frequency channel fidelity.
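As a rough picture of what a time-frequency conditioning signal looks like, the sketch below computes an STFT magnitude spectrogram with the standard library only. The frame length, hop size, Hann window, and test tone are illustrative assumptions; the page does not state the STFT parameters used in the study, and a real pipeline would likely rely on an FFT library.

```python
import cmath
import math


def stft_magnitude(signal, frame_len, hop):
    """Magnitude spectrogram: one row per frame, frame_len // 2 + 1 bins per row."""
    # Periodic Hann window to reduce spectral leakage at frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = [signal[start + n] * window[n] for n in range(frame_len)]
        row = []
        for k in range(frame_len // 2 + 1):
            # Direct DFT of the windowed frame at frequency bin k.
            acc = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            row.append(abs(acc))
        frames.append(row)
    return frames


fs = 256  # Hz, illustrative sampling rate
tone = [math.sin(2 * math.pi * 32 * n / fs) for n in range(fs)]  # 1 s, 32 Hz tone
spec = stft_magnitude(tone, frame_len=64, hop=32)
peak_bin = max(range(len(spec[0])), key=spec[0].__getitem__)
peak_hz = peak_bin * fs / 64  # bin spacing is fs / frame_len
```

Each row of `spec` describes the spectral content of one short window, which is the kind of information a time-domain-only condition does not expose directly.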

Spectrogram conditioning for high-frequency generation
Spectrogram conditioning improves high-frequency channel generation.

Takeaway 4

Demographic covariates help — only when normalized.

Raw demographic attributes hurt longitudinal glucose generation (overfitting), but dataset-level normalization flips them into a useful global context that beats the no-demographic baseline. Subject metadata helps only when encoded in a properly normalized form.
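The page does not specify the form of the dataset-level normalization. One plausible reading, sketched below as an assumption, is to standardize each demographic attribute across all subjects before feeding it to the generator; the attribute names and values are made up.

```python
from statistics import mean, pstdev


def normalize_covariates(subjects, keys):
    """Standardize each attribute across the whole dataset (assumed scheme)."""
    out = [dict(s) for s in subjects]  # copy so the raw records stay untouched
    for k in keys:
        values = [s[k] for s in subjects]
        mu, sigma = mean(values), pstdev(values)
        for row in out:
            # Guard against constant attributes, which carry no information.
            row[k] = (row[k] - mu) / sigma if sigma > 0 else 0.0
    return out


cohort = [
    {"age": 30.0, "bmi": 22.0},
    {"age": 50.0, "bmi": 28.0},
    {"age": 70.0, "bmi": 34.0},
]
normalized = normalize_covariates(cohort, ["age", "bmi"])
```

After this step, attributes with very different raw scales (years versus kg/m²) land on a comparable scale, so no single covariate dominates the conditioning vector.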

Effect of demographic encoding on longitudinal generation
Raw vs. normalized demographic conditioning on glucose generation.

Finding

Fixed-range normalization is a more stable target.

z-score normalization can bias generation (e.g., upward-shifted targets in blood-pressure translation). Range-based [−1, 1] min-max normalization consistently improves quality across models — sensor generation may need different normalization than sensor representation learning.
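The two normalization schemes can be compared side by side. In this sketch the fixed bounds `lo` and `hi` are hypothetical per-modality limits (20 to 220 mmHg for arterial pressure is our assumption, not a value from the study), and the samples are made up.

```python
from statistics import mean, pstdev


def zscore(x):
    """Standardize to zero mean and unit variance; the output range is unbounded."""
    mu, sigma = mean(x), pstdev(x)
    return [(v - mu) / sigma for v in x]


def minmax_fixed_range(x, lo, hi):
    """Map the fixed interval [lo, hi] linearly onto [-1, 1]."""
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in x]


def inverse_minmax(y, lo, hi):
    """Map normalized values in [-1, 1] back to physical units."""
    return [(v + 1.0) * (hi - lo) / 2.0 + lo for v in y]


abp = [80.0, 95.0, 120.0, 110.0, 85.0]  # made-up arterial pressure samples, mmHg
scaled = minmax_fixed_range(abp, lo=20.0, hi=220.0)
restored = inverse_minmax(scaled, lo=20.0, hi=220.0)
standardized = zscore(abp)
```

With fixed bounds, the same physical value always maps to the same normalized value, whereas a z-score depends on the statistics of whatever data the mean and standard deviation were computed from.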

Range-based vs z-score normalization
Range-based normalization outperforms z-score for sensor generation.

03 Are generated signals useful beyond realism?

Finding

Synthetic signals deliver downstream utility.

ECGs from a text-to-ECG generator improve both ECG-text pre-training transfer and supervised disease classification under data scarcity. Moderate augmentation (~10%) helps most; excessive synthetic data can amplify generator artifacts.
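A moderate augmentation ratio can be sketched as follows. Here "~10%" is interpreted as synthetic samples amounting to 10% of the real training set size; that interpretation, the function name, and the placeholder records are assumptions.

```python
import random


def mix_with_synthetic(real, synthetic, ratio, seed=0):
    """Add round(ratio * len(real)) synthetic samples to the real training set."""
    n_syn = min(len(synthetic), round(ratio * len(real)))
    rng = random.Random(seed)
    mixed = list(real) + rng.sample(list(synthetic), n_syn)
    rng.shuffle(mixed)  # avoid a block of synthetic samples at the end
    return mixed


real = [("real", i) for i in range(200)]        # placeholder real records
synthetic = [("syn", i) for i in range(500)]    # placeholder generated records
train = mix_with_synthetic(real, synthetic, ratio=0.10)
n_synthetic = sum(1 for source, _ in train if source == "syn")
```

Keeping the ratio as an explicit parameter makes it straightforward to sweep it and observe where added synthetic data stops helping.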

Downstream utility of synthetic sensor signals
Synthetic ECGs improve supervised classification and self-supervised transfer.

Finding

Sensor generators scale with compute and capacity.

On text-to-ECG, FID decreases consistently with more training steps, and larger models improve fidelity further — sensor generation benefits from both compute and capacity scaling.

Scaling behavior with training compute and model size
Generation quality improves with training steps and model size.

04 Generated signals, up close

Across settings, samples capture recognizable signal structure rather than degenerating into noise.

Model Families Evaluated

Diffusion

Iterative denoising — DiT.

Flow Matching

Continuous transport — SiT.

Autoregressive

Sequential tokens — MAR.

Normalizing Flows

Invertible likelihood — TarFlow.

Hierarchical

Coarse-to-fine — FractalGen & Imagen.

Unified Protocol

Shared tasks, splits, and metrics for every model.

Open & Reproducible

Get started

SensorGen ships standardized tasks, training pipelines, and pretrained checkpoints so you can reproduce results or build on them directly.

Quick start

  1. Get the code. Clone the SensorGen GitHub repository and follow the README to set up the environment and standardized tasks.
  2. Download a checkpoint. Pull a pretrained SensorGen-SiT checkpoint from Hugging Face, then run generation and evaluation as described in the repo.

Citation

BibTeX

@article{shuai2026sensorgen,
  title   = {Signal or Noise? Understanding Generative Models for Real-World Sensor Time Series},
  author  = {Shuai, Zitao and Xu, Zongzhe and Wu, Yuntian and Li, Sirui and Li, Tianhong and Yang, Yuzhe},
  journal = {Preprint},
  year    = {2026}
}