Optimizing BCE Loss

Learn how to minimize BCE loss using gradient descent.

Optimization

Logistic regression aims to learn a parameter vector $\bold{w}$ by minimizing a chosen loss function. While the squared loss $L_s(\bold{w})=\sum_{i=1}^n\big(y_i-\frac{1}{1+e^{-\bold{w}^T\phi(\bold{x}_i)}}\big)^2$ might appear to be a natural choice, it's not convex in $\bold{w}$. Fortunately, we have the flexibility to consider alternative loss functions that are convex. One such loss function is the binary cross-entropy (BCE) loss, denoted as $L_{BCE}$, which is convex. The BCE loss is defined as:

$$L_{BCE}(\bold{w})=-\sum_{i=1}^n\Big(y_i\log(\hat y_i)+(1-y_i)\log(1-\hat y_i)\Big)$$

where $\hat y_i=\frac{1}{1+e^{-\bold{w}^T\phi(\bold{x}_i)}}$ is the model's predicted probability that $y_i=1$.

Explanation of BCE loss

Let’s delve into the explanation of the BCE loss. For a single example in a dataset with a target label $y_i$, if $y_i=1$ and the prediction $\hat{y}_i \approx 1$, the loss $-\log(\hat{y}_i) \approx 0$. Conversely, if $\hat{y}_i \approx 0$, the loss $-\log(\hat{y}_i)$ becomes significantly large. Similarly, we can evaluate the pairs $(y_i=0,\ \hat{y}_i \approx 0)$ and $(y_i=0,\ \hat{y}_i \approx 1)$. The code snippet provided below illustrates the computation of the BCE loss for a single example:
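The interactive snippet is not reproduced here; the following is a minimal sketch of that computation, assuming NumPy and an illustrative function name `bce_single`:

```python
import numpy as np

def bce_single(y, y_hat, eps=1e-12):
    # BCE loss for one example: -(y*log(y_hat) + (1 - y)*log(1 - y_hat))
    y_hat = np.clip(y_hat, eps, 1 - eps)  # guard against log(0)
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

# The four cases discussed above
print(bce_single(1, 0.99))  # y=1, y_hat close to 1 -> loss close to 0
print(bce_single(1, 0.01))  # y=1, y_hat close to 0 -> large loss
print(bce_single(0, 0.01))  # y=0, y_hat close to 0 -> loss close to 0
print(bce_single(0, 0.99))  # y=0, y_hat close to 1 -> large loss
```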

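To connect the loss back to the goal of this lesson, minimizing $L_{BCE}$ with gradient descent, here is a minimal NumPy sketch. The function names (`sigmoid`, `bce_loss`, `gradient_descent`), the toy data, and the hyperparameters are illustrative assumptions rather than part of the original lesson; it uses the standard gradient $\nabla L_{BCE}(\bold{w})=\sum_{i=1}^n(\hat y_i-y_i)\phi(\bold{x}_i)$.

```python
import numpy as np

def sigmoid(z):
    # Logistic function mapping scores to probabilities
    return 1.0 / (1.0 + np.exp(-z))

def bce_loss(w, Phi, y, eps=1e-12):
    # Phi: (n, d) matrix whose rows are phi(x_i); y: (n,) labels in {0, 1}
    y_hat = np.clip(sigmoid(Phi @ w), eps, 1 - eps)  # avoid log(0)
    return -np.sum(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

def gradient_descent(Phi, y, lr=0.1, n_iters=500):
    # Batch gradient descent on the BCE loss
    w = np.zeros(Phi.shape[1])
    for _ in range(n_iters):
        y_hat = sigmoid(Phi @ w)
        grad = Phi.T @ (y_hat - y)  # sum_i (y_hat_i - y_i) * phi(x_i)
        w -= lr * grad
    return w

# Toy usage: the first column of Phi acts as a bias feature
Phi = np.array([[1.0,  0.5,  1.2],
                [1.0, -1.0,  0.3],
                [1.0,  2.0, -0.7],
                [1.0, -0.5, -1.5]])
y = np.array([1.0, 0.0, 1.0, 0.0])
w = gradient_descent(Phi, y)
print("learned w:", w, " BCE loss:", bce_loss(w, Phi, y))
```

Because $L_{BCE}$ is convex in $\bold{w}$, gradient descent with a suitable learning rate drives the loss toward its global minimum rather than getting stuck in a local one.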