
Inputs, Features, and Targets

Explore the inputs, features, and targets that form the foundation of supervised machine learning applications.

Before a machine learning model can learn anything, it needs structured data to observe patterns. In supervised learning, we train a model using examples where we provide both the raw information and the desired outcome.

  • The input is the raw data presented to the model.

  • The features are the specific, measurable pieces of information extracted from the inputs that the model learns from.

  • The target is the desired output that the model is trying to predict or classify.
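A minimal sketch can make these three terms concrete. The raw "image", feature names, and target label below are all hypothetical toy values, not from any real dataset:

```python
# Input: the raw data — here, pixel rows of a tiny 3x3 grayscale "image".
raw_input = [
    [0.1, 0.9, 0.1],
    [0.1, 0.9, 0.1],
    [0.1, 0.9, 0.1],
]

# Features: measurable quantities derived from the raw input.
pixels = [p for row in raw_input for p in row]
features = {
    "mean_brightness": sum(pixels) / len(pixels),
    "max_brightness": max(pixels),
}

# Target: the desired output the model should learn to predict.
target = "person_A"

print(features, "->", target)
```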

Feature magic in machine learning

Under the umbrella of machine learning, inputs and features are closely related because the input (commonly called the dataset) is processed to derive more insightful features. For example, in a facial recognition application, the input is the image of the person we want to identify. Features such as edges, texture patterns, the distance between the eyes, the shape of the nose, and so on can be extracted by applying different transformations to the input image. Finally, the target is the name of the person. To build this application, we need the model to learn people's identities, and for that, we must provide it with a mapping function between the inputs and the identities.

Extracting insightful features from the input helps the model learn faster and more accurately. Our model might use the picture’s brightness or standard deviation as a feature, or it might divide the image into small patches, as in a four-quadrant scheme. In this manner, we get local information that comes in handy when the model struggles to understand the whole picture. Facial recognition is typically a classification task, where the model classifies the input image into one of several distinct person identities (the targets).
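The brightness, standard deviation, and four-quadrant ideas above can be sketched in a few lines. The toy image values here are illustrative, and the feature choices are simply the ones mentioned in the text:

```python
import numpy as np

def quadrant_features(img):
    """Split a 2-D image into four quadrants and return the mean
    brightness of each, plus global brightness and standard deviation."""
    h2, w2 = img.shape[0] // 2, img.shape[1] // 2
    quads = [
        img[:h2, :w2],  # top-left
        img[:h2, w2:],  # top-right
        img[h2:, :w2],  # bottom-left
        img[h2:, w2:],  # bottom-right
    ]
    feats = [q.mean() for q in quads]
    feats += [img.mean(), img.std()]
    return feats

# A toy 4x4 "image": dark left half, bright right half.
img = np.array([
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
])
print(quadrant_features(img))  # quadrant means: 0.0, 1.0, 0.0, 1.0
```

The four quadrant means capture local information (the right half is bright) that the global mean alone (0.5) would hide.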

Feature selection process

In classical machine learning, feature selection was heavily dependent upon hand-engineered features, which took a lot of time. For instance, considering the color as an input feature, we might get useful results while classifying images of flowers (where color is a key distinguishing feature between species). However, this feature won’t be handy while identifying shirts or trousers, as they’re available in almost all colors. Therefore, we can’t restrict our model to call an object a shirt just because it’s red.

In modern machine learning, the model is designed to identify the best features without human intervention. The latest machine learning algorithms focus on creating models without requiring much prior knowledge of the features; instead, the features are extracted from the inputs automatically.

To select the best features, we need to understand that not all inputs are equally useful for a machine learning model. For example, in email spam detection, the presence of certain keywords like “free” or “limited offer” can help identify spam messages, whereas the font color or email background is irrelevant.
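Such keyword-presence features for spam detection can be sketched as binary indicators. The keyword list and email text below are purely illustrative:

```python
SPAM_KEYWORDS = ["free", "limited offer", "winner"]  # illustrative list

def keyword_features(email_text):
    """One binary feature per keyword: 1 if present, 0 otherwise.
    Irrelevant attributes such as font colour are deliberately ignored."""
    text = email_text.lower()
    return [int(kw in text) for kw in SPAM_KEYWORDS]

print(keyword_features("Claim your FREE prize, limited offer!"))  # [1, 1, 0]
print(keyword_features("Meeting moved to 3pm"))                   # [0, 0, 0]
```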

This shows us that feature selection depends on:

  • Relevance: Does the feature really help distinguish between classes?

  • Consistency: Does the feature remain stable across multiple samples of the same object/person?

  • Generalization: Does it work well across different datasets?

Traditionally, experts handpicked features (e.g., shape, edges, keywords) through trial and error. Modern ML models (like deep learning) now learn useful features automatically from raw input data, reducing the need for manual selection.

So, the “best” features are those that make the model’s job easier by highlighting meaningful differences between classes while avoiding irrelevant details.
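One way to see "relevance" in action is a crude score comparing how far apart two classes sit on a feature, relative to that feature's spread. This is a simplified, hypothetical stand-in for the statistical measures (such as Fisher's score) used in real feature-selection pipelines, and the flower measurements are made up:

```python
from statistics import mean, stdev

def relevance_score(class_a, class_b):
    """Gap between class means relative to the average spread.
    Higher means the feature separates the two classes better."""
    spread = (stdev(class_a) + stdev(class_b)) / 2
    return abs(mean(class_a) - mean(class_b)) / (spread or 1.0)

# A feature like petal length (cm) separates two flower species cleanly...
good = relevance_score([1.4, 1.3, 1.5], [4.7, 4.5, 4.9])
# ...while a noisy, irrelevant feature does not.
bad = relevance_score([0.5, 0.4, 0.6], [0.5, 0.6, 0.4])
print(good > bad)  # True
```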

Custom feature selection application

To create a reliable person identifier, we need a unique feature set that remains consistent across all pictures of the same person. At the same time, a clear difference in the feature set must be observed for a different person.

Machine learning algorithms analyze and process data to identify custom features, which are used to train models for classification, prediction, and other tasks. This improves the accuracy and reliability of models, helps distinguish between individuals, objects, and events, and improves understanding of complex systems and processes.

Therefore, the following application is designed in a way that allows the user to select features by clicking anywhere on two different persons’ faces. As a result, the application will output the difference between the same features but on two different faces.

1.  Press the **Run** button and wait till the connection gets 
     established. 

2.  Select the same set of features for both faces in order to 
     observe the difference between them.

3.  Select any feature point by clicking on the image of a person.  

4.  Hit **Enter** after selecting the features to move on to the next
     person.

5.  Open the **Terminal** tab to observe the difference.

6.  Run the `python3 /usercode/main.py` command to execute it again.

Custom feature selection for facial recognition
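A rough sketch of the kind of comparison the application performs: measuring how far apart the same feature point lies on two different faces. The coordinates below are made-up values, not output from the application:

```python
import math

# Hypothetical (x, y) feature points clicked on two different faces:
face_a = {"left_eye": (120, 85), "right_eye": (180, 85), "nose_tip": (150, 130)}
face_b = {"left_eye": (118, 90), "right_eye": (185, 88), "nose_tip": (152, 140)}

def feature_differences(a, b):
    """Euclidean distance between the same feature point on two faces."""
    return {name: math.dist(a[name], b[name]) for name in a}

for name, diff in feature_differences(face_a, face_b).items():
    print(f"{name}: {diff:.2f}")
```

For a good identifier, these distances should stay small between two pictures of the same person and grow large between different people.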

After playing with the application above, it’s easy to see that feature selection is, in itself, a time-consuming process.

Note: While adding more relevant features can improve accuracy, adding irrelevant or redundant features can lead to overfitting (known as the curse of dimensionality), where the model performs poorly on new, unseen data.

Complex ML applications

Now that we have covered the basics of supervised learning, let’s briefly look at how these concepts apply in advanced generative models.

Deepfake

Deepfake is a machine learning technique that’s commonly used to replace a person’s face with someone else’s face in a given image or video.

  • Input: An image of a person’s face (the source image) that will replace the original face in the image or video, along with the original image/video.

  • Target (y): The desired output, which is the final modified image or video with the face replaced. The individual pixels of the modified image are the elements of the target output.

In the case of creating a deepfake GIF of a statue, a single image of the statue is sufficient to make it appear to move. However, multiple images of a person’s face can improve the results, since more data generally benefits machine learning models.

Activity recognition

Activity recognition is a machine learning technique used to recognize an activity from a given sequence of input image frames. Unlike single-image tasks, an activity recognition algorithm conventionally takes more than one image as input.

  • Input: A sequence of image frames (a video clip) showing a person performing an action.
  • Target (y): The output is a textual string describing the activity in layman’s terms, such as “Running” or “Waving”. This is typically a sequence classification problem.
Activity recognition with labeled bounding boxes
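The input/target structure above can be illustrated with a toy heuristic classifier. The motion threshold and label set here are invented for the sketch; a real model would learn spatio-temporal features rather than rely on a hand-coded rule:

```python
import numpy as np

ACTIVITIES = ["Standing", "Running"]  # illustrative label set

def classify_activity(frames):
    """Toy heuristic: average pixel change between consecutive frames.
    Lots of motion -> 'Running'; little motion -> 'Standing'."""
    motion = np.abs(np.diff(frames, axis=0)).mean()
    return ACTIVITIES[1] if motion > 0.1 else ACTIVITIES[0]

# Input: a clip of 8 frames of a tiny 4x4 "video".
still_clip = np.zeros((8, 4, 4))
moving_clip = np.random.default_rng(0).random((8, 4, 4))

print(classify_activity(still_clip))   # Standing
print(classify_activity(moving_clip))  # Running
```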

Text-to-image generation

As the title indicates, several open-source models are available nowadays that take a caption in string form as input and create a synthetic image depicting the captioned scenario. In other words, they generate an image that corresponds to, or illustrates, the content of the given caption.

  • Input (X): The textual caption (e.g., “A dog wearing a wizard hat”).
  • Target (y): The synthetic image that is generated to match the caption.


Conclusion

We have explored the fundamental concepts of inputs, features, and targets, which form the bedrock of supervised machine learning. Understanding how to derive meaningful features from raw inputs is crucial for creating effective models, whether the task is classification (like identifying a person) or prediction/regression (like predicting house prices based on features). The transition from manual feature engineering to automatic feature learning in modern ML demonstrates the evolution of the field and its growing capability to handle highly complex, raw data.