Generating Recipes

Explore the research related to generating recipes with deep learning.

As eating habits and culinary practices have evolved over time, with a shift toward consuming more meals from third-party sources like takeaways and restaurants, obtaining detailed information about prepared food has become increasingly challenging.

Behind each meal, there is a story described in a complex recipe, but unfortunately, we cannot access its preparation process by simply looking at the image. This limitation underscores the need for a system capable of inverting the cooking process—deriving ingredients and cooking instructions from an image of a prepared meal. This is where generative models come in.

One advanced application of generating textual descriptions of images using GANs involves producing a structured description of an image with multiple components, such as generating a recipe for food (Salvador, Amaia, Michal Drozdzal, Xavier Giro-i-Nieto, and Adriana Romero. 2019. "Inverse Cooking: Recipe Generation from Food Images." arXiv:1812.06164, https://arxiv.org/abs/1812.06164), as shown in the image below.

A recipe generated from an image of food

As shown in the figure above, this kind of description is more complex than a simple caption because its components (the instructions) must appear in a particular sequence to be coherent.

Generating recipes from images

Generating a recipe (including its title, list of ingredients, and preparation instructions) from an image is challenging. It necessitates a comprehensive grasp of the ingredients and the various processes they undergo, such as slicing, blending, or mixing.

One way to achieve this is, instead of obtaining the recipe directly from the image, to add an intermediate step that predicts the ingredient list. The sequence of instructions is then generated conditioned on both the image and its predicted ingredients, where the interplay between the two can provide additional insight into how the ingredients were processed to produce the resulting dish. This process is called inverse cooking: recreating a cooking recipe from a food image.
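The two-stage pipeline above can be sketched in a few lines of numpy. This is only a shape-level illustration under assumed dimensions—the real model uses a CNN image encoder and learned ingredient embeddings, and all the names and weight matrices here (`W_ing`, `W_img`, `E_ing`) are hypothetical stand-ins:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (not from the paper).
IMG_DIM = 512         # size of the image feature vector
NUM_INGREDIENTS = 20  # size of the ingredient vocabulary
EMB_DIM = 64          # embedding size fed to the instruction decoder

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Stage 1: multi-label ingredient prediction from image features.
# (A random linear map stands in for a trained predictor.)
W_ing = rng.standard_normal((NUM_INGREDIENTS, IMG_DIM)) * 0.01

def predict_ingredients(img_feat, threshold=0.5):
    """Each ingredient is independently predicted as present (1) or absent (0)."""
    probs = sigmoid(W_ing @ img_feat)
    return (probs >= threshold).astype(int)

# Stage 2: embed the image and the predicted ingredients; both
# embeddings would then condition the instruction decoder.
W_img = rng.standard_normal((EMB_DIM, IMG_DIM)) * 0.01
E_ing = rng.standard_normal((NUM_INGREDIENTS, EMB_DIM)) * 0.01

def encode_for_decoder(img_feat, ingredient_mask):
    img_emb = W_img @ img_feat         # visual embedding
    ing_emb = ingredient_mask @ E_ing  # sum of embeddings of predicted ingredients
    return img_emb, ing_emb

img_feat = rng.standard_normal(IMG_DIM)   # stand-in for CNN features
ingredients = predict_ingredients(img_feat)
img_emb, ing_emb = encode_for_decoder(img_feat, ingredients)
print(ingredients.shape, img_emb.shape, ing_emb.shape)
```

The key design choice mirrored here is that the ingredient set is predicted first and then re-embedded, so the decoder sees both what the dish looks like and what it contains.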

Inverse cooking

As the figure below demonstrates, the inverse cooking problem has also been studied using generative models (Salvador et al., 2019, arXiv:1812.06164). The suggested recipe generator takes a food image as an input and outputs a sequence of cooking instructions, which are generated by means of an instruction decoder that takes two embeddings as input:

  • One embedding represents visual features extracted from an image.

  • The second one encodes the ingredients extracted from the image.

The instruction decoder is composed of transformer blocks, each containing two attention layers—a mechanism that improves model performance by focusing on relevant information—followed by a linear layer. The first attention layer applies self-attention, which captures dependencies ...
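The block structure described above can be sketched in numpy. This is a simplified, untrained illustration, not the paper's implementation: the dimensions and weight matrices are made up, a single shared projection stands in for separate query/key/value projections, and layer normalization and multi-head splitting are omitted. It assumes the second attention layer attends over the image and ingredient embeddings (cross-attention):

```python
import numpy as np

rng = np.random.default_rng(1)
D = 32  # model dimension (illustrative)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    """Scaled dot-product attention."""
    scores = (q @ k.T) / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def decoder_block(tokens, conditioning, W_self, W_cross, W_lin):
    # 1) self-attention: each instruction token attends to all tokens
    h = tokens + attention(tokens @ W_self, tokens @ W_self, tokens)
    # 2) cross-attention: tokens attend to the image/ingredient embeddings
    h = h + attention(h @ W_cross, conditioning @ W_cross, conditioning)
    # 3) position-wise linear layer
    return h @ W_lin

tokens = rng.standard_normal((5, D))        # 5 instruction tokens
conditioning = rng.standard_normal((2, D))  # image + ingredient embeddings
W_self = rng.standard_normal((D, D)) * 0.1
W_cross = rng.standard_normal((D, D)) * 0.1
W_lin = rng.standard_normal((D, D)) * 0.1

out = decoder_block(tokens, conditioning, W_self, W_cross, W_lin)
print(out.shape)
```

Each token leaves the block with the same dimensionality it entered with, which is what lets these blocks be stacked.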