Generating the Evaluation Code in Python

Learn how to generate the Python code that evaluates the pictionary bot responses using Google Gemini's text-to-code generation capability.

Behind the scenes

Text-to-code generation might seem like a trivial problem to solve, given how well LLMs can generate text from textual prompts. However, LLMs’ text-to-code generation relies on extensive training and fine-tuning of models to understand and generate both natural language and code. Here are some key differences between a text-to-text model and a text-to-code model:

  • Complexity: Code generation requires understanding programming syntax, logic, and how different parts of the code interact. This makes it more complex than text-to-text generation, which primarily deals with the semantics and structure of human languages.

  • Output Format: Text-to-code models generate code snippets or a complete program in a specific programming language, as compared to plain text for text-to-text models.

  • Application: Text-to-code models can automate repetitive coding tasks, assist programmers, or help beginners learn. Text-to-text models, on the other hand, are used for tasks such as creative content writing, information summarization, or language translation.

Gemini currently supports around 20 programming languages. However, given its large context window and its ability to reason about logic, Gemini can also be used to generate and query code in languages that are not officially supported.

Devin AI was announced by Cognition Labs as the world’s first AI programmer on March 12, 2024. This announcement created a lot of hype and panic, as engineers feared that AI might take over their jobs. While Devin AI did showcase some impressive code generation and debugging capabilities, it has yet to prove itself to be a capable standalone AI programmer.

Generating code with LLMs

Before we start generating code, it is important to understand some limitations of LLMs. These models exhibit great creativity, which can be helpful for creative purposes; however, this might be a hindrance for functional use cases such as code generation. Given the complexity of some coding tasks, LLMs might overlook the intricacies of logic flow in coding paradigms. For code that needs to be plugged into an existing codebase, the LLM might be unable to provide code that works in that specific environment. Lastly, the code that the model generates might be prone to vulnerabilities or follow unsafe programming practices. Therefore, it is important to always review the code generated by an LLM.

These caveats might suggest that LLMs would struggle to generate code effectively. However, like any problem, it becomes manageable when you know how to approach it. Let’s look at a few prompt guidelines for generating code:

  • Define your goal clearly: Before interacting with the LLM, we need to have a clear understanding of what we want the code to achieve. This includes the program’s functionality, desired inputs and outputs, and any specific libraries we might want to use.

  • Break down the problem: Divide the program’s functionality into smaller, more manageable tasks. Then, build upon the generated code step by step. This enables the LLM to create smaller code snippets that are easier for the user to review and test. Once validated, all the code snippets can be sent to the LLM to stitch together.

  • Use comments: When sending code snippets to the LLM for stitching, be sure to add comments that explain their functionality and logic. This will improve readability and allow the LLM to understand the intent behind the code.
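
As a rough illustration of the last two guidelines, the sketch below assembles a stitching prompt from snippets that have already been reviewed and commented. The helper name build_stitch_prompt and the wording of the instructions are hypothetical, chosen only for this example; they are not part of Gemini's API.

```python
def build_stitch_prompt(goal: str, snippets: list[str]) -> str:
    """Assemble one prompt that asks an LLM to combine reviewed snippets.

    Hypothetical helper for illustration; the instruction wording is an
    assumption, not a prescribed format.
    """
    parts = [
        f"Goal: {goal}",
        "Combine the reviewed Python snippets below into one program.",
        "Keep the existing comments and do not change the tested logic.",
    ]
    # Label each snippet so the model can refer to them individually.
    for index, snippet in enumerate(snippets, start=1):
        parts.append(f"--- Snippet {index} ---\n{snippet.strip()}")
    return "\n\n".join(parts)


snippets = [
    "# Parse the bot's JSON reply into a dictionary\ndata = json.loads(reply)",
    "# Lowercase the guess so the comparison ignores case\nguess = data['guess'].lower()",
]
prompt = build_stitch_prompt("Check a pictionary guess", snippets)
print(prompt)
```

Because every snippet carries its own comment, the model receives the intent along with the code, which is the point of the "Use comments" guideline.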

Generating Python code with Gemini

Our pictionary app has reached a functional level. Now, we need to add the evaluation component. We’ve learned that using Python code could be a reliable and straightforward approach. The generate_content() method will come in handy again and allow us to generate the code with a prompt.

Let’s map out the logic of our code. We’ll create a function named check_guess():

  1. The function will take two inputs:

    1. Gemini’s response

    2. The correct answer

  2. Next, it will extract the value of guess from the JSON response.

  3. It will then normalize both strings to make them easier to compare. This includes converting them to lowercase and removing any whitespace.

  4. Finally, it should return True if the guess is correct; otherwise, it should return False.
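
For reference, the steps above could be written by hand roughly as follows. This is a minimal sketch, not the code Gemini will produce: it assumes the response arrives as a plain JSON string with a "guess" key, and it treats malformed or unexpected responses as incorrect guesses, which is a design choice the requirements do not specify.

```python
import json


def check_guess(response_text: str, correct_answer: str) -> bool:
    """Return True if the bot's guess matches the correct answer."""
    # Step 2: extract the value of "guess" from the JSON response.
    # Assumption: the response is a bare JSON object, e.g. '{"guess": "cat"}'.
    try:
        data = json.loads(response_text)
    except json.JSONDecodeError:
        return False  # Assumption: an unparsable response counts as wrong.
    if not isinstance(data, dict):
        return False
    guess = data.get("guess")
    if not isinstance(guess, str):
        return False

    def normalize(text: str) -> str:
        # Step 3: lowercase and remove all whitespace, including inner spaces.
        return "".join(text.lower().split())

    # Step 4: compare the normalized strings.
    return normalize(guess) == normalize(correct_answer)


print(check_guess('{"guess": " Hot Dog "}', "hotdog"))
print(check_guess('{"guess": "cat"}', "dog"))
```

Having a hand-written version like this is also useful when reviewing the generated code, since both should agree on the same inputs.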

Now that our requirements are clear, let’s prompt Gemini to generate the code.
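
A prompt for this task might look like the sketch below. The prompt wording and the helper name generate_check_guess_code are illustrative assumptions; the only Gemini call used is generate_content(), whose result exposes the generated text through its text attribute.

```python
# Illustrative prompt that restates the requirements for check_guess().
PROMPT = """Write a Python function named check_guess.
It takes two inputs: a JSON string containing a "guess" key, and the correct answer.
Extract the value of "guess" from the JSON string.
Convert both strings to lowercase and remove all whitespace before comparing.
Return True if the guess matches the correct answer; otherwise, return False.
Add comments that explain each step."""


def generate_check_guess_code(model) -> str:
    """Send the prompt to a model object and return the generated text.

    `model` is assumed to provide generate_content(prompt), as
    google.generativeai.GenerativeModel does.
    """
    response = model.generate_content(PROMPT)
    return response.text


# Example wiring with the google-generativeai SDK. The model name and the
# API key placeholder are assumptions; use whatever the course setup provides.
# import google.generativeai as genai
# genai.configure(api_key="YOUR_API_KEY")
# model = genai.GenerativeModel("gemini-1.5-flash")
# print(generate_check_guess_code(model))
```

The output is text, not verified code, so it should be reviewed and tested before it is added to the app.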
