Exponential Linear Unit (ELU) is an activation function that improves a model's accuracy and reduces training time. It is mathematically represented as follows:

$$
\text{ELU}(x) =
\begin{cases}
x, & x > 0 \\
\alpha \,(e^{x} - 1), & x \le 0
\end{cases}
$$

In the formula above, $x$ is the input and $\alpha$ is a hyperparameter (commonly set to 1.0) that controls the value the function saturates to for large negative inputs.
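As a quick numerical check (assuming the common default $\alpha = 1.0$, which the code below also uses):

$$
\text{ELU}(2) = 2,
\qquad
\text{ELU}(-2) = 1.0 \cdot (e^{-2} - 1) \approx -0.86
$$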
The activation function ReLU became popular because it mitigates the vanishing gradient problem: the gradients of saturating activation functions, such as the sigmoid, become very small, making it difficult to train deeper models.
At the same time, however, ReLU introduced a problem of its own, called the dying ReLU problem: a neuron can get stuck outputting 0 for every input, at which point its gradient is also 0 and it stops learning.
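The following minimal NumPy sketch shows why ELU helps here; the printed values are simply the standard derivatives of ReLU and ELU (this comparison is an illustration, not part of the original text).

```python
import numpy as np

def relu_grad(x):
    # Derivative of ReLU: 1 for positive inputs, 0 otherwise.
    return (x > 0).astype(float)

def elu_grad(x, alpha=1.0):
    # Derivative of ELU: 1 for positive inputs, alpha * exp(x) otherwise.
    return np.where(x > 0, 1.0, alpha * np.exp(x))

x = np.array([-3.0, -1.0, 0.5, 2.0])
print(relu_grad(x))  # [0. 0. 1. 1.] -> no gradient flows for negative inputs
print(elu_grad(x))   # approx. [0.05 0.37 1. 1.] -> small but nonzero gradient
```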
In contrast, ELU (like batch normalization) produces negative values that push the mean activation closer to 0, which improves the training speed.
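A small sketch illustrating this mean shift on synthetic, zero-mean inputs (the random inputs and the approximate output means are illustrative assumptions, not taken from the article):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=1.0, size=100_000)  # zero-mean inputs

relu_out = np.maximum(x, 0.0)
elu_out = np.where(x > 0, x, np.exp(x) - 1.0)  # alpha = 1.0

# ReLU discards all negative values, so its output mean is pushed well above 0;
# ELU keeps bounded negative outputs, so its mean stays closer to 0.
print(f"mean ReLU output: {relu_out.mean():.2f}")  # roughly 0.40
print(f"mean ELU output:  {elu_out.mean():.2f}")   # roughly 0.16
```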
Even though Parametric ReLU and Leaky ReLU also produce negative values, they are not smooth functions. ELU, by contrast, is a smooth function for negative values that gradually saturates, which makes it more robust to noise.
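The difference is easiest to see in the limit of large negative inputs (the 0.01 slope for Leaky ReLU is a common default, assumed here purely for illustration):

$$
\lim_{x \to -\infty} \text{ELU}(x) = \lim_{x \to -\infty} \alpha\,(e^{x} - 1) = -\alpha,
\qquad
\lim_{x \to -\infty} \text{LeakyReLU}(x) = \lim_{x \to -\infty} 0.01\,x = -\infty
$$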
Here, we implement ELU in Python:
```python
import numpy as np
import matplotlib.pyplot as plt

# Initializing the constant (the ELU hyperparameter)
alpha = 1.0

def ELU(x):
    # Identity for positive inputs, exponential saturation otherwise
    if x > 0:
        return x
    return alpha * (np.exp(x) - 1)

# Evenly spaced inputs between -5.0 and 5.0
x = np.linspace(-5.0, 5.0)

# Apply ELU to each input value
result = []
for i in x:
    result.append(ELU(i))

# Plot the activation function and save the figure
plt.plot(x, result)
plt.title("ELU activation function")
plt.xlabel("Input")
plt.ylabel("Output")
plt.grid(True)
plt.savefig('output/elu_plot.png')
```
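The Python loop above works fine for producing the plot, but the same computation can also be expressed as a single vectorized operation over the whole array. A sketch of that alternative (a rewrite for illustration, not part of the original code):

```python
import numpy as np

def elu_vectorized(x, alpha=1.0):
    # Elementwise ELU: x where positive, alpha * (exp(x) - 1) otherwise.
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

x = np.linspace(-5.0, 5.0)
result = elu_vectorized(x)
```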
In the code above, we use np.linspace to generate evenly spaced numbers between -5.0 and 5.0, apply ELU to each of them, and plot and save the resulting curve.
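For reference, np.linspace returns 50 evenly spaced points by default; the optional num argument controls how many points are generated:

```python
import numpy as np

print(np.linspace(-5.0, 5.0, num=5))  # [-5.  -2.5  0.   2.5  5. ]
```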