SensorGen — Signal or Noise? Generative Models for Real-World Sensor Time Series

Why SensorGen

The broadest open study of sensor-signal generation

Generative modeling of sensor signals has been fragmented across modalities, datasets, and task formulations, with each method tuned to a single signal type. SensorGen consolidates this landscape into one platform — shared data processing, task construction, training, and evaluation — so that model families, signal properties, and design choices can finally be compared apples-to-apples.

Figure: SensorGen vs. prior work on sensor-signal generation. Earlier studies each cover one domain and a single task type; SensorGen is the first to span all four task categories and five model families at scale.

Signal Diversity & Scale

Sampling Frequency

0.0033 Hz → 256 Hz

Sequence Length

10² → 10⁴ steps

Time Span

seconds → 7 days
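The three ranges are linked by one relation: sequence length = sampling frequency × time span. A minimal sketch of that arithmetic, assuming the 0.0033 Hz floor corresponds to one sample every 5 minutes (1/300 Hz, a common continuous-glucose-monitoring interval); the helper name `num_steps` is hypothetical.

```python
def num_steps(sampling_hz: float, duration_s: float) -> int:
    """Number of samples in a window of `duration_s` seconds at `sampling_hz` (assumed helper)."""
    return round(sampling_hz * duration_s)

SECONDS_PER_DAY = 24 * 60 * 60

# Slowest regime (assumed): one sample every 5 minutes, recorded for 7 days.
slow = num_steps(1 / 300, 7 * SECONDS_PER_DAY)

# Fastest regime: at 256 Hz, a 10,000-step window covers only about 39 seconds.
fast_window_s = 10_000 / 256

print(slow)                      # 2016
print(round(fast_window_s, 1))   # 39.1
```

So both ends of the 10² → 10⁴ step range arise naturally: a week-long 5-minute trace lands near 2 × 10³ steps, while a 256 Hz signal reaches 10⁴ steps in well under a minute.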

Four Real-World Data Regimes

Emergency Department

MIMIC-IV ECG — clinical 12-lead electrocardiograms paired with diagnostic reports.

Daily Life

CAPTURE-24, PPG-DaLiA, Metabonet — free-living accelerometry, wearable physiology, and longitudinal metabolic traces.

Lab Study

PhyMER, SHHS — controlled affective-state recordings and overnight polysomnography (EEG, ECG, EMG, EOG).

Operating Room

VitalDB — intra-operative waveforms, medication records, arterial blood pressure, PPG, ECG, and NIBP.

Signal Modalities


Together, these data regimes are organized into 4 task categories comprising 14 concrete task settings.

Task Design

One generation interface, four task categories

We cast every task as conditional generation x̂ ∼ p_θ(x | C), C ⊆ {c₁, c₂}, where c₁ is a semantic condition (text, labels, metadata) and c₂ is observed signal context (history, source channels). Each setting is built from a real sensing bottleneck and validated for clinical grounding, application value, and generative suitability.
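To make the interface concrete, here is a minimal sketch of the conditioning side only, assuming a simple container in which either condition may be absent (so C can be ∅, {c₁}, {c₂}, or {c₁, c₂}). The class `Conditions`, its fields, and the example payloads are hypothetical, not SensorGen's API; the generator p_θ itself is omitted.

```python
from dataclasses import dataclass
from typing import Optional

import numpy as np


@dataclass
class Conditions:
    """Hypothetical container for C ⊆ {c1, c2}."""
    semantic: Optional[dict] = None        # c1: text, labels, or metadata
    context: Optional[np.ndarray] = None   # c2: observed signal (history or source channels)

    def active(self) -> frozenset:
        """Return which conditions are present, i.e. the subset C."""
        present = set()
        if self.semantic is not None:
            present.add("c1")
        if self.context is not None:
            present.add("c2")
        return frozenset(present)


# Text-to-ECG: semantic condition only (example text is illustrative).
text_to_ecg = Conditions(semantic={"text": "sinus rhythm"})
# PPG-to-ECG translation: observed source channel only, shaped (channels, time).
ppg_to_ecg = Conditions(context=np.zeros((1, 1000)))
# Semantic signal editing: a target attribute plus the recording to modify.
editing = Conditions(semantic={"label": "atrial fibrillation"}, context=np.zeros((1, 1000)))
```

Keeping both conditions optional in one object lets a single model signature serve every task; each setting only decides which fields it fills.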

Semantic-to-Signal

Generate realistic, semantically consistent signals from text, labels, or metadata.

Text-to-Signal Generation
Label-Conditioned Generation

Interpolation & Extrapolation

Predict future segments or reconstruct missing spans from observed temporal context.

Future Forecasting
Intervention-Conditioned Forecasting
Temporal Imputation
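Forecasting and imputation both draw the target x and the observed context c₂ from the same recording; they differ only in where the hidden span sits. A minimal NumPy sketch, assuming time is the last axis and that the missing span is zero-filled and marked with a boolean mask (both are assumptions; the helper names are hypothetical). For intervention-conditioned forecasting, a semantic condition c₁ such as a medication record would presumably be supplied alongside the history.

```python
import numpy as np


def split_forecast(x: np.ndarray, horizon: int):
    """Future forecasting: the last `horizon` steps are the target; earlier steps are context."""
    return x[..., :-horizon], x[..., -horizon:]


def split_impute(x: np.ndarray, start: int, stop: int):
    """Temporal imputation: hide x[..., start:stop]; observed segments surround the gap."""
    mask = np.ones(x.shape[-1], dtype=bool)
    mask[start:stop] = False               # False marks the span to generate
    context = np.where(mask, x, 0.0)       # zero-fill the hidden span (an arbitrary choice)
    target = x[..., start:stop]
    return context, mask, target


x = np.arange(10, dtype=float)
history, future = split_forecast(x, horizon=3)
context, mask, gap = split_impute(x, start=4, stop=6)
```

In the forecasting split the context sits on one side of the target only, while in the imputation split it brackets the target on both sides.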

Translation

Generate missing or hard-to-acquire target channels from accessible source channels.

Signal Translation
Proxy-to-Target Reconstruction

Editing

Restore, refine, or selectively modify recordings while preserving non-target attributes.

Super-Resolution
Denoising
Semantic Signal Editing
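Super-resolution and denoising pair a degraded observation c₂ with a clean target x. A minimal sketch of how such pairs could be built synthetically, assuming plain decimation for super-resolution and additive Gaussian noise for denoising; the benchmark may instead use real low-rate or noisy recordings, and the helper names are hypothetical.

```python
import numpy as np


def make_superres_pair(x: np.ndarray, factor: int):
    """Super-resolution: low-rate observation (c2) -> full-rate signal (target).

    Naive decimation; a real pipeline may low-pass filter first to avoid aliasing.
    """
    low = x[..., ::factor]
    return low, x


def make_denoise_pair(x: np.ndarray, noise_std: float, rng: np.random.Generator):
    """Denoising: noisy observation (c2) -> clean signal (target)."""
    noisy = x + rng.normal(0.0, noise_std, size=x.shape)
    return noisy, x


x = np.arange(12, dtype=float)
low, full = make_superres_pair(x, factor=4)
noisy, clean = make_denoise_pair(x, noise_std=0.1, rng=np.random.default_rng(0))
```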

Figure: Task taxonomy. Each setting instantiates the same x̂ ∼ p_θ(x | C) interface; only the role of the conditions changes.

Coverage

Figure: Coverage per task category, with bars sized by the number of available samples.

What the Study Reveals

Findings

Rather than crowning a single best model, SensorGen distills a fragmented field into evidence-backed lessons for building and evaluating sensor generators. Each takeaway below is grounded in the benchmark and paired with the table or figure that supports it.

01 Which generative paradigm transfers best?

Takeaway 1

Flow matching is the strongest default across tasks.

Aggregated across the four task categories, SiT (flow matching) gives the best overall performance: the best precision/recall on semantic-to-signal tasks, and the best MSE/MAE/PSNR/SMSE/SSIM on translation and interpolation/extrapolation. No single method wins every setting, but flow matching is a reliable default for heterogeneous sensor generation.
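For readers less familiar with flow matching, here is a minimal NumPy sketch of a generic conditional flow-matching objective with a linear interpolant, plus Euler sampling. This is the textbook form, not SiT's exact configuration (its interpolant, time sampling, and network are not specified on this page); `velocity_fn` is a hypothetical stand-in for a conditional network v_θ(x_t, t, C).

```python
import numpy as np


def flow_matching_loss(velocity_fn, x1, cond, rng):
    """Monte Carlo estimate of the flow-matching loss with a linear interpolant.

    x0 ~ N(0, I) is noise and x1 is a data batch; x_t = (1 - t) * x0 + t * x1,
    so the regression target is the constant velocity dx_t/dt = x1 - x0.
    """
    x0 = rng.standard_normal(x1.shape)
    # One time per example, broadcast over the remaining (channel, time) axes.
    t = rng.uniform(0.0, 1.0, size=(x1.shape[0],) + (1,) * (x1.ndim - 1))
    xt = (1.0 - t) * x0 + t * x1
    pred = velocity_fn(xt, t, cond)
    return float(np.mean((pred - (x1 - x0)) ** 2))


def sample_euler(velocity_fn, shape, cond, rng, steps=50):
    """Integrate dx/dt = v(x, t, cond) from t = 0 (noise) to t = 1 (data) with Euler steps."""
    x = rng.standard_normal(shape)
    dt = 1.0 / steps
    for i in range(steps):
        t = np.full((shape[0],) + (1,) * (len(shape) - 1), i * dt)
        x = x + dt * velocity_fn(x, t, cond)
    return x


# Sanity check: if all data equal a constant c, the exact velocity field is (c - x) / (1 - t).
c = 0.5
oracle = lambda x, t, cond: (c - x) / (1.0 - t)
rng = np.random.default_rng(0)
data = np.full((8, 1, 16), c)  # batch x channels x time
loss = flow_matching_loss(oracle, data, None, rng)
samples = sample_euler(oracle, data.shape, None, rng)
```

With the exact velocity the loss is zero and the final Euler step lands on c, which is a quick check that both functions implement the same interpolant.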

Table: Aggregate results for Semantic-to-Signal and Editing.
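The precision/recall reported for semantic-to-signal tasks is not defined on this page; a common choice for generative models is the k-nearest-neighbour estimator of Kynkäänniemi et al. (2019), sketched below under that assumption. Precision is the fraction of generated samples that fall inside the real-data manifold (plausibility); recall is the fraction of real samples covered by the generated manifold (diversity). `real` and `fake` stand for feature embeddings from an unspecified encoder.

```python
import numpy as np


def _pairwise_dist(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Euclidean distances between rows of a (n, d) and b (m, d)."""
    return np.sqrt(np.maximum(((a[:, None, :] - b[None, :, :]) ** 2).sum(-1), 0.0))


def _knn_radius(feats: np.ndarray, k: int) -> np.ndarray:
    """Distance from each point to its k-th nearest neighbour (column 0 is the self-distance)."""
    return np.sort(_pairwise_dist(feats, feats), axis=1)[:, k]


def _coverage(query: np.ndarray, ref: np.ndarray, ref_radius: np.ndarray) -> float:
    """Fraction of query points inside at least one reference k-NN ball."""
    d = _pairwise_dist(query, ref)
    return float(np.mean((d <= ref_radius[None, :]).any(axis=1)))


def precision_recall(real: np.ndarray, fake: np.ndarray, k: int = 3):
    precision = _coverage(fake, real, _knn_radius(real, k))  # fakes on the real manifold
    recall = _coverage(real, fake, _knn_radius(fake, k))     # reals on the fake manifold
    return precision, recall


# Collapsed generator: samples are plausible but cover only a small region of the data.
rng = np.random.default_rng(0)
real = rng.normal(size=(200, 2))
fake = rng.normal(scale=0.1, size=(200, 2))
p, r = precision_recall(real, fake)
```

A generator that concentrates on a narrow region scores high precision but low recall, which is the pattern the metric is designed to separate.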

Table: Aggregate results for Translation and Interpolation/Extrapolation.
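The point-wise metrics behind the translation and interpolation/extrapolation results can be sketched in a few lines. A minimal NumPy version of MSE, MAE, and PSNR, assuming PSNR's peak-to-peak `data_range` is 2 because signals are scaled to [−1, 1]; the benchmark's exact conventions (averaging axes, range, and the SMSE and SSIM variants, omitted here) are not specified on this page.

```python
import numpy as np


def mse(pred: np.ndarray, target: np.ndarray) -> float:
    """Mean squared error over all elements."""
    return float(np.mean((pred - target) ** 2))


def mae(pred: np.ndarray, target: np.ndarray) -> float:
    """Mean absolute error over all elements."""
    return float(np.mean(np.abs(pred - target)))


def psnr(pred: np.ndarray, target: np.ndarray, data_range: float = 2.0) -> float:
    """Peak signal-to-noise ratio in dB; data_range=2.0 assumes signals in [-1, 1]."""
    err = mse(pred, target)
    if err == 0.0:
        return float("inf")
    return float(10.0 * np.log10(data_range ** 2 / err))


pred = np.full(100, 0.1)
target = np.zeros(100)
```

For a constant error of 0.1, MSE is 0.01 and PSNR is 10·log₁₀(4 / 0.01) ≈ 26.02 dB.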

02 What signal properties matter?

Takeaway 2

Target-signal context tames long-sequence generation.

Quality degrades as sequences grow for semantic-to-signal and translation, but stays stable for imputation: surrounding observed segments anchor the missing region, so generation is far more tractable when target-signal context is available.

Figure: Short vs. long sequences across imputation, IMU generation, and PPG-to-ECG translation.

Takeaway 3

High-frequency signals need time-frequency modeling.

For high-frequency signals such as EEG, models reach high precision but low recall — plausible waveforms that miss spectral variability. Adding an STFT spectrogram as a time-frequency conditioning signal improves high-frequency channel fidelity.
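A log-magnitude STFT spectrogram makes explicit how energy is distributed across frequency over time, which is the spectral variability described above. A minimal NumPy sketch, assuming a Hann window; `n_fft` and `hop` are illustrative values, and how the spectrogram is fed to the generator (for example, as an extra conditioning input) is not specified on this page.

```python
import numpy as np


def log_spectrogram(x: np.ndarray, n_fft: int = 256, hop: int = 64) -> np.ndarray:
    """Log-magnitude STFT of a 1-D signal, shaped (n_fft // 2 + 1 frequency bins, frames).

    Requires len(x) >= n_fft; trailing samples that do not fill a frame are dropped.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    magnitude = np.abs(np.fft.rfft(frames, axis=-1))  # (frames, frequency bins)
    return np.log1p(magnitude).T


# Example: an 8 s, 10 Hz sinusoid (alpha-band-like) sampled at 256 Hz.
fs = 256
t = np.arange(8 * fs) / fs
spec = log_spectrogram(np.sin(2 * np.pi * 10 * t))
peak_hz = np.argmax(spec.mean(axis=1)) * fs / 256  # bin spacing is fs / n_fft = 1 Hz
```

With `n_fft` equal to the sampling rate, each frequency bin spans 1 Hz, so the 10 Hz rhythm shows up in bin 10.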

Figure: Spectrogram conditioning improves high-frequency channel generation.

Takeaway 4

Demographic covariates help — only when normalized.

Raw demographic attributes hurt longitudinal glucose generation through overfitting, but dataset-level normalization turns them into useful global context that outperforms the no-demographic baseline. Subject metadata helps only when encoded in a properly normalized form.
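The exact normalization scheme is not spelled out here; one plausible reading of dataset-level normalization is standardizing each numeric covariate with statistics computed over the training subjects, so that, for instance, age and BMI enter the model on a comparable scale. A minimal sketch under that assumption; the covariate columns and helper names are hypothetical.

```python
import numpy as np


def fit_covariate_stats(train_covariates: np.ndarray):
    """Per-column mean and std over training subjects (shape: subjects x covariates)."""
    mean = train_covariates.mean(axis=0)
    std = train_covariates.std(axis=0)
    std = np.where(std > 0, std, 1.0)  # guard against constant columns
    return mean, std


def encode_covariates(cov: np.ndarray, mean: np.ndarray, std: np.ndarray) -> np.ndarray:
    """Map raw covariates (e.g. age in years, BMI) to a dataset-normalized context vector."""
    return (cov - mean) / std


# Hypothetical columns: [age, BMI] for two training subjects.
train = np.array([[30.0, 20.0], [50.0, 30.0]])
mean, std = fit_covariate_stats(train)
```

Fitting the statistics on training subjects only and reusing them at evaluation time keeps the encoding consistent across splits.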

Figure: Raw vs. normalized demographic conditioning on glucose generation.

Finding

Fixed-range normalization is a more stable target.

z-score normalization can bias generation (e.g., upward-shifted targets in blood-pressure translation). Range-based [−1, 1] min-max normalization consistently improves quality across models — sensor generation may need different normalization than sensor representation learning.
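A minimal sketch contrasting the two schemes, assuming fixed per-channel bounds `lo` and `hi` (for example, a physiological range for arterial blood pressure; the bounds actually used are not given here). One property worth noting: a per-recording z-score maps two recordings with identical shape but different baselines to the same values, whereas a fixed range keeps their absolute levels apart and is exactly invertible back to physical units.

```python
import numpy as np


def minmax_scale(x: np.ndarray, lo: float, hi: float) -> np.ndarray:
    """Map the fixed range [lo, hi] to [-1, 1]."""
    return 2.0 * (x - lo) / (hi - lo) - 1.0


def minmax_unscale(y: np.ndarray, lo: float, hi: float) -> np.ndarray:
    """Invert minmax_scale to recover physical units."""
    return (y + 1.0) / 2.0 * (hi - lo) + lo


def zscore(x: np.ndarray) -> np.ndarray:
    """Per-recording z-score, shown for comparison."""
    return (x - x.mean()) / x.std()


# Hypothetical blood-pressure values (mmHg) with an assumed 0-200 mmHg range.
bp = np.array([60.0, 100.0, 140.0])
scaled = minmax_scale(bp, 0.0, 200.0)
```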

Figure: Range-based normalization outperforms z-score normalization for sensor generation.

03 Are generated signals useful beyond realism?

Finding

Synthetic signals deliver downstream utility.

ECGs from a text-to-ECG generator improve both ECG-text pre-training transfer and supervised disease classification under data scarcity. Moderate augmentation (~10%) helps most; excessive synthetic data can amplify generator artifacts.
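A minimal sketch of building an augmented training set, assuming the ~10% ratio is measured relative to the number of real samples and that each synthetic ECG carries the label used to prompt the generator; both are assumptions, and the helper name is hypothetical.

```python
import numpy as np


def mix_real_and_synthetic(real_x, real_y, syn_x, syn_y, ratio, rng):
    """Append round(ratio * len(real_x)) synthetic samples, drawn without replacement."""
    n_syn = int(round(ratio * len(real_x)))
    if n_syn > len(syn_x):
        raise ValueError("not enough synthetic samples for the requested ratio")
    idx = rng.choice(len(syn_x), size=n_syn, replace=False)
    x = np.concatenate([real_x, syn_x[idx]], axis=0)
    y = np.concatenate([real_y, syn_y[idx]], axis=0)
    return x, y


rng = np.random.default_rng(0)
real_x, real_y = np.zeros((100, 4)), np.zeros(100, dtype=int)
syn_x, syn_y = np.ones((50, 4)), np.ones(50, dtype=int)
x, y = mix_real_and_synthetic(real_x, real_y, syn_x, syn_y, ratio=0.1, rng=rng)
```

Exposing the ratio as a parameter makes it straightforward to sweep it and observe the trade-off between moderate and excessive augmentation.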

Figure: Synthetic ECGs improve supervised classification and self-supervised transfer.

Finding

Sensor generators scale with compute and capacity.

On text-to-ECG, FID decreases consistently with more training steps, and larger models improve fidelity further — sensor generation benefits from both compute and capacity scaling.
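FID compares Gaussians fitted to feature embeddings of real and generated samples: ‖μ_r − μ_g‖² + tr(Σ_r + Σ_g − 2(Σ_r Σ_g)^{1/2}). A minimal NumPy sketch; the ECG feature extractor behind the reported FID is not specified on this page, so the inputs are generic (n × d) feature arrays. The matrix square root uses an eigendecomposition rather than `scipy.linalg.sqrtm`, relying on the fact that (Σ_r Σ_g)^{1/2} has the same trace as (Σ_r^{1/2} Σ_g Σ_r^{1/2})^{1/2}.

```python
import numpy as np


def _sqrtm_psd(a: np.ndarray) -> np.ndarray:
    """Square root of a symmetric positive semi-definite matrix via eigendecomposition."""
    w, v = np.linalg.eigh(a)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.T


def frechet_distance(feats_real: np.ndarray, feats_fake: np.ndarray) -> float:
    """Fréchet distance between Gaussians fitted to two feature sets (n x d each)."""
    mu1, mu2 = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    s1 = np.cov(feats_real, rowvar=False)
    s2 = np.cov(feats_fake, rowvar=False)
    r1 = _sqrtm_psd(s1)
    # tr((s1 s2)^{1/2}) equals tr((r1 s2 r1)^{1/2}); the latter matrix is symmetric PSD.
    covmean_trace = np.trace(_sqrtm_psd(r1 @ s2 @ r1))
    return float(((mu1 - mu2) ** 2).sum() + np.trace(s1) + np.trace(s2) - 2.0 * covmean_trace)


rng = np.random.default_rng(0)
feats = rng.normal(size=(500, 4))
```

Identical feature sets give a distance of zero, and shifting every feature by 3 in four dimensions adds exactly 4 × 3² = 36, since a shift changes the means but not the covariances.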

Figure: Generation quality improves with training steps and model size.

04 Generated signals, up close

Across settings, samples capture recognizable signal structure rather than degenerating into noise.

Figure: Text-to-ECG generation.

Figure: Sleep-signal imputation.

Figure: Medication-intervened forecasting across ECG, PLETH, AWP, CO2, and BIS-EEG channels.

Figure: Blood-pressure translation.

Figure: Glucose-signal translation.

Figure: Glucose super-resolution.

Model Families Evaluated

Diffusion

Iterative denoising — DiT.

Flow Matching

Continuous transport — SiT.

Autoregressive

Sequential tokens — MAR.

Normalizing Flows

Invertible likelihood — TarFlow.

Hierarchical

Coarse-to-fine — FractalGen & Imagen.

Unified Protocol

Shared tasks, splits, and metrics for every model.

Open & Reproducible

Get started

SensorGen ships standardized tasks, training pipelines, and pretrained checkpoints so you can reproduce results or build on them directly.

Quick start

Set up the code. Clone the SensorGen GitHub repository and follow the README to install the environment and prepare the standardized tasks.
Download a checkpoint. Pull a pretrained SensorGen-SiT checkpoint from Hugging Face, then run generation and evaluation as described in the repo.

Citation

BibTeX

@article{shuai2026sensorgen,
  title   = {Signal or Noise? Understanding Generative Models for Real-World Sensor Time Series},
  author  = {Shuai, Zitao and Xu, Zongzhe and Wu, Yuntian and Li, Sirui and Li, Tianhong and Yang, Yuzhe},
  journal = {Preprint},
  year    = {2026}
}
