Because an image is composed of millions of pixels, deriving and detecting features with traditional approaches such as convolutional neural networks requires analyzing every pixel. Furthermore, since a single pixel covers only a tiny part of a large image, the pixels in its neighborhood must also be examined to detect patterns and, more broadly, to describe features.
Narrowing down to object detection, the features of a particular object can be described in terms of the difference in light intensity between neighboring regions.
For instance, among facial features, the lips have a darker intensity than the upper-lip area above them and the chin area below them. Similarly, the edge of the forehead has a higher intensity than the hair on the head.
Each region is viewed as a rectangular frame, and the intensities are summed over each frame. The difference between neighboring frames is then calculated: the greater the difference between the two regions' intensities, the more likely it is that the frames span a transition from one feature to another. A threshold is set on this difference, and a difference that exceeds it is classified as a Haar-like feature.
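The frame-sum-and-threshold idea can be sketched in a few lines. This is a minimal illustration, not a reference implementation: it assumes the image is a list of rows of gray-scale values (0 is black, 255 is white), compares a frame with the frame directly below it, and uses hypothetical helper names (`frame_sum`, `frame_difference`, `is_haar_feature`) and an arbitrary threshold chosen only for the example.

```python
# Minimal sketch: sum intensities over two vertically adjacent rectangular
# frames and flag a Haar-like feature when their difference is large enough.
# Helper names, frame layout and threshold are illustrative assumptions.

def frame_sum(img, top, left, height, width):
    """Sum the gray-scale intensities inside one rectangular frame."""
    return sum(
        img[r][c]
        for r in range(top, top + height)
        for c in range(left, left + width)
    )


def frame_difference(img, top, left, height, width):
    """Upper frame sum minus the sum of the same-sized frame directly below it."""
    upper = frame_sum(img, top, left, height, width)
    lower = frame_sum(img, top + height, left, height, width)
    return upper - lower


def is_haar_feature(img, top, left, height, width, threshold):
    """Treat a large enough intensity difference (either sign) as a feature."""
    return abs(frame_difference(img, top, left, height, width)) >= threshold


# Toy 4x4 patch: a dark band (hair) above a light band (forehead).
patch = [
    [30, 30, 30, 30],
    [30, 30, 30, 30],
    [200, 200, 200, 200],
    [200, 200, 200, 200],
]
print(frame_difference(patch, 0, 0, 2, 4))                 # -1360
print(is_haar_feature(patch, 0, 0, 2, 4, threshold=500))   # True
```

The absolute value is used so that a dark-over-light and a light-over-dark transition are both detected; a real detector would typically keep the sign, since it tells which side is darker.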
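The difference in intensities between a darker region $D$ and a lighter region $L$ is therefore $\Delta = \sum_{x \in D} I(x) - \sum_{x \in L} I(x)$, where $I(x)$ is the gray-scale intensity of pixel $x$.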
For example, the difference between the intensities of the forehead edge and the hair would naively be expected to be greater than the difference between the intensities of the cheekbone area and the lower cheeks. A greater difference in intensities indicates a more distinct boundary between features.
Traditionally, three major types of Haar-like features are used in face detection:
Line features: Line features detect a run of intensities that varies light-dark-light or dark-light-dark. The idea is to detect a region of different intensity enclosed between two symmetric regions. An example is the lips, which are positioned between the lighter upper-lip and chin areas.
Edge features: Edge features capture sharply changing intensities, such as a direct transition from a region of darker intensity to a region of higher intensity. An example is the facial edge, detected from the difference between the intensities of the darker hair and the comparatively lighter facial skin.
Four rectangular features: Finer regions and patterns on the face, which can be viewed as a mesh of diagonally varying intensities, can be identified using four rectangular features. An example could be the jawline and cheekbone areas.
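The three feature types can be expressed as small functions over a rectangular window. This is a sketch under stated assumptions: it uses mean intensities instead of raw sums so that bands of different sizes are comparable (a normalization chosen for this example), assumes window heights and widths divisible by 2 or 3 as needed, and the function names are hypothetical.

```python
# Sketch of the three broad Haar-like feature types on a gray-scale window.
# Means (sum / area) are used so that a uniform window scores 0 for every type.

def region_mean(img, top, left, height, width):
    """Mean gray-scale intensity inside one rectangle."""
    total = sum(
        img[r][c]
        for r in range(top, top + height)
        for c in range(left, left + width)
    )
    return total / (height * width)


def edge_feature(img, top, left, height, width):
    """Two stacked rectangles: upper half minus lower half."""
    half = height // 2
    return (region_mean(img, top, left, half, width)
            - region_mean(img, top + half, left, half, width))


def line_feature(img, top, left, height, width):
    """Three stacked bands: middle band minus the average of the outer bands."""
    third = height // 3
    upper = region_mean(img, top, left, third, width)
    middle = region_mean(img, top + third, left, third, width)
    lower = region_mean(img, top + 2 * third, left, third, width)
    return middle - (upper + lower) / 2


def four_rect_feature(img, top, left, height, width):
    """2x2 grid: main diagonal minus anti-diagonal."""
    h, w = height // 2, width // 2
    tl = region_mean(img, top, left, h, w)
    tr = region_mean(img, top, left + w, h, w)
    bl = region_mean(img, top + h, left, h, w)
    br = region_mean(img, top + h, left + w, h, w)
    return (tl + br) - (tr + bl)


hair_over_skin = [[40, 40], [190, 190]]                  # edge: dark above light
lips = [[200, 200, 200], [60, 60, 60], [200, 200, 200]]  # line: light-dark-light
checker = [[50, 200], [200, 50]]                         # four rectangles: diagonal

print(edge_feature(hair_over_skin, 0, 0, 2, 2))   # -150.0
print(line_feature(lips, 0, 0, 3, 3))             # -140.0
print(four_rect_feature(checker, 0, 0, 2, 2))     # -300.0
```

A negative value here means the "positive" part of the pattern (upper half, middle band, or main diagonal) is the darker one; comparing the magnitude against a threshold, as in the previous sketch, turns each value into a detected or not-detected feature.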
It is worth noting that the categories listed above are only the major, broad Haar-like features originally used for face detection. Extending the idea, additional Haar-like features are introduced depending on the object being classified.
A popular application of Haar-like features is face detection, most notably in the Viola-Jones object detection framework proposed in 2001.
In outline, the algorithm slides a rectangular window over a gray-scale image of the face and sums the intensities within each window. The differences between these sums are used to detect all possible Haar-like features (of the three types: line, edge and four rectangular features).
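The window scan can be sketched as follows. Viola-Jones computes rectangle sums with an integral image (summed-area table), so each sum costs four lookups regardless of rectangle size; this sketch borrows that idea. It is deliberately simplified: it scans a single window size and checks only one edge feature, whereas the full algorithm evaluates many features at several scales. The helper names `integral_image`, `rect_sum` and `scan_edge_features`, and the threshold, are hypothetical.

```python
# Sketch: slide a fixed-size window over a gray-scale image and report where
# an upper-half vs lower-half edge feature exceeds a threshold.

def integral_image(img):
    """ii[r][c] holds the sum of img over rows < r and columns < c."""
    rows, cols = len(img), len(img[0])
    ii = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        row_total = 0
        for c in range(cols):
            row_total += img[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_total
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of any rectangle using four integral-image lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def scan_edge_features(img, win_h, win_w, threshold):
    """Return (top, left) of every window whose halves differ by >= threshold."""
    ii = integral_image(img)
    half = win_h // 2
    hits = []
    for top in range(len(img) - win_h + 1):
        for left in range(len(img[0]) - win_w + 1):
            upper = rect_sum(ii, top, left, half, win_w)
            lower = rect_sum(ii, top + half, left, half, win_w)
            if abs(upper - lower) >= threshold:
                hits.append((top, left))
    return hits


# Dark hair (rows 0-1) above light skin (rows 2-3).
image = [
    [20, 20, 20],
    [20, 20, 20],
    [220, 220, 220],
    [220, 220, 220],
]
print(scan_edge_features(image, win_h=2, win_w=3, threshold=300))  # [(1, 0)]
```

Only the window straddling rows 1 and 2 fires, because it is the only position where one half covers hair and the other covers skin.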
Note: In simplified terms, once the Haar-like features corresponding to all the facial features normally expected, such as the two eyes, the lips and the facial edges, have been identified, the algorithm concludes that the object in question is a face.
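The simplified decision rule above can be illustrated with a toy check. This is only a sketch of that rule, not of the actual Viola-Jones classifier, which combines many weak feature-based classifiers with AdaBoost and arranges them in a cascade that rejects a window as soon as one stage fails. The 6x4 patch layout, the band positions, the contrast threshold and the function names are all hypothetical.

```python
# Toy version of "it is a face if all expected features are present".
# Band positions (forehead, eyes, cheeks, lips, chin) are illustrative assumptions.

def band_mean(patch, row):
    """Mean intensity of one full-width row of the patch."""
    return sum(patch[row]) / len(patch[row])


def eyes_present(patch, min_contrast=50):
    # Eyes (row 1) should be darker than the forehead (row 0): an edge feature.
    return band_mean(patch, 0) - band_mean(patch, 1) >= min_contrast


def lips_present(patch, min_contrast=50):
    # Lips (row 4) should be darker than the bands above (row 3) and below
    # (row 5): a light-dark-light line feature.
    outer = (band_mean(patch, 3) + band_mean(patch, 5)) / 2
    return outer - band_mean(patch, 4) >= min_contrast


STAGES = [("eyes", eyes_present), ("lips", lips_present)]


def looks_like_face(patch, stages=STAGES):
    """Accept the patch only if every expected feature is detected.
    Stopping at the first failed check loosely mirrors a cascade's early rejection."""
    for _name, detect in stages:
        if not detect(patch):
            return False
    return True


face_patch = [
    [200, 200, 200, 200],  # forehead
    [60, 60, 60, 60],      # eyes
    [190, 190, 190, 190],  # cheeks
    [190, 190, 190, 190],  # cheeks
    [80, 80, 80, 80],      # lips
    [190, 190, 190, 190],  # chin
]
blank_patch = [[150] * 4 for _ in range(6)]

print(looks_like_face(face_patch))   # True
print(looks_like_face(blank_patch))  # False
```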