Synthetic Data Generation for Linear Regression

To evaluate machine learning models, we often generate synthetic datasets for which the ground-truth relationship is known. For a linear regression task, we can draw the entries of a design matrix $\mathbf{X}$ of features from a standard normal distribution. The corresponding labels $\mathbf{y}$ are computed by applying a ground-truth linear function, defined by true weights $\mathbf{w}$ and bias $b$, and then corrupting the output with additive noise $\boldsymbol{\epsilon}$ drawn from a normal distribution with mean $\mu = 0$ and standard deviation $\sigma = 0.01$: $\mathbf{y} = \mathbf{X}\mathbf{w} + b + \boldsymbol{\epsilon}$. Because the true parameters are known, the values a model learns can be compared against them directly, while the noise term mimics the random variation present in real observations.
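
The procedure above can be sketched in a few lines of plain Python. This is a minimal illustration rather than a reference implementation: the function name `synthetic_data`, the `seed` argument, and the particular values chosen for the true weights and bias are assumptions made for the example, and only the standard library is used so that it runs without a numerical package.

```python
import random


def synthetic_data(w, b, num_examples, noise_std=0.01, seed=0):
    """Generate features X and labels y = Xw + b + noise.

    `synthetic_data` and `seed` are illustrative names, not taken from the text.
    """
    rng = random.Random(seed)  # seeded generator so the sketch is reproducible
    # Each feature is drawn independently from a standard normal distribution.
    X = [[rng.gauss(0.0, 1.0) for _ in w] for _ in range(num_examples)]
    # Apply the ground-truth linear function, then add zero-mean Gaussian noise.
    y = [
        sum(x_j * w_j for x_j, w_j in zip(row, w)) + b + rng.gauss(0.0, noise_std)
        for row in X
    ]
    return X, y


# Hypothetical ground-truth parameters chosen only for this example.
true_w = [2.0, -3.4]
true_b = 4.2
X, y = synthetic_data(true_w, true_b, num_examples=1000)

# Each label should differ from the noiseless prediction by only a small residual.
residuals = [
    y_i - (sum(x_j * w_j for x_j, w_j in zip(row, true_w)) + true_b)
    for row, y_i in zip(X, y)
]
```

Here `X` is a list of 1000 rows with two features each and `y` holds one label per row. The residuals recover the noise term $\boldsymbol{\epsilon}$, so with $\sigma = 0.01$ they should all be close to zero.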

Updated 2026-05-03

Tags

D2L

Dive into Deep Learning @ D2L