Data Generation
Learn about the data generation process that is being used.
The data generation process
We know our model already. To generate synthetic data for it, we need to pick values for its parameters. In our case, we chose b = 1 and w = 2 (as in, thousands of $).
First, let us generate our feature (x). We use NumPy’s rand method to randomly generate 100 (N) points between 0 and 1.
Then, we plug our feature (x) and our parameters b and w into our equation to compute our labels (y). But we need to add some Gaussian noise (epsilon) as well; otherwise, our synthetic dataset would be a perfectly straight line. We can generate noise using NumPy’s randn method, which draws samples from a standard normal distribution (mean 0 and variance 1), and then multiply those samples by a factor to adjust the level of noise. Since we do not want to add too much noise, we pick 0.1 as our factor.
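To make the noise level concrete, here is a short worked form of the scaling step. It assumes the model is the simple linear regression y = b + wx + epsilon, which the text refers to only as "our equation"; the symbol z for the unscaled standard normal sample is introduced here purely for illustration.

```latex
\begin{aligned}
y &= b + w x + \epsilon, \qquad \epsilon = 0.1\, z, \qquad z \sim \mathcal{N}(0, 1) \\
\mathrm{Var}(\epsilon) &= 0.1^2 \cdot \mathrm{Var}(z) = 0.01, \qquad \mathrm{sd}(\epsilon) = 0.1
\end{aligned}
```

Under that assumption, the noise has a standard deviation of 0.1, which is small compared with the noiseless labels: with b = 1, w = 2, and x between 0 and 1, the line b + wx runs from 1 to 3.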
Synthetic data generation
The following code generates our synthetic data:
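As a minimal sketch of the steps described above, the generation could look like the block below. It assumes the model is y = b + wx + epsilon; the variable names (true_b, true_w, epsilon), the column-vector shape (N, 1), and the seed value 42 are illustrative choices, not taken from the lesson.

```python
import numpy as np

# Parameter values chosen in the text
true_b = 1
true_w = 2
N = 100  # number of data points

# Assumption: fix a seed so the data is reproducible (42 is arbitrary)
np.random.seed(42)

# Feature: N values drawn uniformly from [0, 1), shaped as a column vector
x = np.random.rand(N, 1)

# Gaussian noise: standard normal samples scaled by the 0.1 factor
epsilon = 0.1 * np.random.randn(N, 1)

# Labels from the assumed linear model y = b + w * x + epsilon
y = true_b + true_w * x + epsilon
```

Without the epsilon term, every point (x, y) would fall exactly on the line y = 1 + 2x; the scaled noise scatters the points slightly around it.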