Displaying media

Gradio has many other components for interacting with users in the UI. In this lesson, we will go over some common Gradio components that are useful for displaying media and that let users interact with the UI in meaningful ways. We will see how Gradio abstracts away the complexities of this interaction and makes it easy to work with images and audio files.

Image

Images are common in almost any application. An application might simply display images, or it might let users interact with them, for example by uploading image files or taking snapshots from a webcam. Images are also frequently used in machine learning pipelines, particularly in data work and generative AI, so good support for handling images is important. In this lesson, we will look at how Gradio allows us to interact with images.

Image in Gradio

Gradio has an Image component that allows users to upload images as input or display images as output. This can be achieved in just a few lines of code, and we can use the Interface class to wrap it all together into a UI.

We can instantiate the Image component using gr.Image(), or using the shortcut string 'image'.
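As a minimal sketch of this, the example below wires an image input to a text output with the Interface class. The names describe_image and build_demo are hypothetical, chosen only for illustration, and Gradio is imported inside build_demo so that the helper function can also be run on its own.

```python
def describe_image(image):
    """Return a short text description of the uploaded image."""
    # Gradio passes None when the user submits without an image.
    if image is None:
        return "No image uploaded."
    # With the default type="numpy", the image arrives as an array
    # shaped (height, width, channels).
    height, width = image.shape[:2]
    return f"Image size: {width} x {height} pixels"


def build_demo():
    """Build the UI (hypothetical wrapper, not part of Gradio)."""
    # Imported here so describe_image can be used without Gradio installed.
    import gradio as gr

    # "image" and "text" are shortcut strings; inputs=gr.Image() and
    # outputs=gr.Textbox() would create the same components explicitly.
    return gr.Interface(fn=describe_image, inputs="image", outputs="text")
```

Calling build_demo().launch() should start the app locally. The shortcut string is convenient when the defaults are enough; gr.Image() is the form to use once we want to pass initialization parameters.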

Many initialization parameters allow us to customize the Image component. For example, we can specify whether we want color or black-and-white images, or where the image comes from, such as a file upload or a webcam snapshot. The Gradio Image component is highly customizable, and all of these options are set when we initialize the component. We will see different examples of using the Image component throughout this course.
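The sketch below shows how a few of these parameters might be set. It assumes Gradio 4.x or later, where the image source is given as a list through the sources parameter (Gradio 3.x used a single string, source, instead). The helpers image_options and build_image_input are hypothetical names used only to keep the example self-contained.

```python
def image_options(grayscale=False, webcam=False):
    """Collect keyword arguments for gr.Image (hypothetical helper)."""
    return {
        # "L" is single-channel grayscale; "RGB" is three-channel color.
        "image_mode": "L" if grayscale else "RGB",
        # Assumes Gradio 4.x or later; 3.x used source="webcam" instead.
        "sources": ["webcam"] if webcam else ["upload"],
        # Pass the image to our function as a PIL image, not a NumPy array.
        "type": "pil",
        # Text shown above the component in the UI.
        "label": "Property photo",
    }


def build_image_input(grayscale=False, webcam=False):
    """Create a configured Image component (hypothetical wrapper)."""
    # Imported here so image_options can be inspected without Gradio installed.
    import gradio as gr

    return gr.Image(**image_options(grayscale, webcam))
```

For example, build_image_input(grayscale=True, webcam=True) would give a component that takes black-and-white webcam snapshots, which could then be passed as the inputs argument of an Interface.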

Property viewer app: Identifying property attributes from images

When looking at property listings, one of the first things we look at is the images. The images of a property tell us many of the things we are looking for. For example, we might spot amenities such as a swimming pool, patio, or gym in the listing images. In this lesson, we will build a new page in our property viewer application that allows users to upload images and feeds these images into a prebuilt model, which identifies any attributes it can find in them. Let’s get started!

We will introduce a few new features and concepts in this example. We will be using a well-known model from Hugging Face as our image classifier. Hugging Face is a collaboration platform that hosts many trained models that can be accessed easily. We will take a deeper dive into Hugging Face and its integration with Gradio in future lessons. For now, it suffices to know that Gradio is well integrated with Hugging Face and offers several ways to use Hugging Face models in a few lines of code.

We will use a Vision Transformer model pre-trained on ImageNet-21k (14 million images, 21,843 classes) for our image classification. This is a general-purpose image classifier, not tuned specifically for property images, so it would not be surprising if it misses some of the property attributes we are looking for. This could be an area to improve upon!

The basic steps involved in building this image classification application are:

  • We will load the pre-trained model from Hugging Face into Gradio.

  • We will pass Gradio Image components into the pre-trained model.

  • The identified objects will be shown in the output.
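The three steps above can be sketched as follows. This is a minimal sketch under stated assumptions, not necessarily the exact code used in the lesson: it assumes a recent Gradio version that provides gr.load, and it assumes the Hugging Face checkpoint google/vit-base-patch16-224 as the Vision Transformer model, since the text does not name a specific model ID. The helpers hub_name and build_classifier are hypothetical.

```python
# Assumed checkpoint: the lesson does not name a specific model ID.
MODEL_ID = "google/vit-base-patch16-224"


def hub_name(model_id):
    """Prefix a model ID so gr.load looks it up among Hugging Face models."""
    return f"models/{model_id}"


def build_classifier(model_id=MODEL_ID):
    """Load a hosted image classifier as a ready-made Gradio app."""
    # Imported here so hub_name can be used without Gradio installed.
    import gradio as gr

    # gr.load builds the whole UI from the hosted model: an Image component
    # as input and the predicted labels as output. This needs network access
    # and depends on the model being served by Hugging Face.
    return gr.load(hub_name(model_id))
```

Calling build_classifier().launch() should open a page where an uploaded image is sent to the model and the predicted labels are displayed. Older Gradio 3.x releases exposed the same idea as gr.Interface.load.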

Let’s look at a line by line breakdown of the code.
