Vector Space in Data Science

From abstract to real vector space

All data (text, images, videos, sound) is made up of data points or samples, which can be represented as column vectors having dd real numbers. That said, we already know that all d-dimensional vectors compose the Rd\R^d vector space over the field R\R. So, we can say that Rd\R^d is the most popular vector space used in data science because all data is represented using vectors that span Rd\R^d. Here, dd represents the dimension of the vector space, Rd\R^d.

Image is a point in space

Consider a 4×44 \times 4 matrix of intensities of a grayscale image shown in the figure below. Each intensity value can be considered a feature of the image, so we can say that it has a total of 16 features. We can represent these features as a column vector in R16\R^{16} by flattening the image, say, row-wise. Even if we don’t flatten the image, it’s still a vector/point in the space of 4×44 \times 4 matrices denoted by M4×4M^{4 \times 4}.

Get hands-on with 1400+ tech skills courses.