Evals API - Image Input Cookbook #1950

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

Merged

daisyshe-oai merged 3 commits into main from daisyshe/evals-images-cookbook

Jul 16, 2025

Contributor

daisyshe-oai commented Jul 15, 2025

Summary

This cookbook demonstrates how to conduct sampling and model grading using image inputs in our evals API.

Motivation

I added support for image inputs for EvalsAPI. There is no cookbook to demonstrate this new functionality. Hence, this cookbook aims to share examples using image inputs with our evals product.

For new content

When contributing new content, read through our contribution guidelines, and mark the following action items as completed:

I have added a new entry in registry.yaml (and, optionally, in authors.yaml) so that my content renders on the cookbook website.
I have conducted a self-review of my content based on the contribution guidelines:
- Relevance: This content is related to building with OpenAI technologies and is useful to others.
- Uniqueness: I have searched for related examples in the OpenAI Cookbook, and verified that my content offers new insights or unique information compared to existing documentation.
- Spelling and Grammar: I have checked for spelling or grammatical mistakes.
- Clarity: I have done a final read-through and verified that my submission is well-organized and easy to understand.
- Correctness: The information I include is correct and all of my code executes successfully.
- Completeness: I have explained everything fully, including all necessary references and citations.

We will rate each of these areas on a scale from 1 to 4, and will only accept contributions that score 3 or higher on all areas. Refer to our contribution guidelines for more details.


          cookbook and info

9ca3d00

daisyshe-oai requested review from Andrew-peng, anoop-openai and prashantmital-openai

July 15, 2025 20:18

josiah-openai approved these changes

View reviewed changes

Contributor

josiah-openai left a comment

Left some nits!

examples/evaluation/use-cases/EvalsAPI_Image_Inputs.ipynb Outdated

+                  "\n",
+                  "OpenAI’s Evals API now supports image inputs, in its step toward multimodal functionality! API users can use OpenAI's Evals API to evaluate their image use cases to see how their LLM integration is performing and improve it.\n",
+                  "\n",
+                  "In this cookbook, we'll walk through an image example with the Evals API. More specifically, we will use Evals API to evaluate model-generated responses to an image and its corresponding prompt, using **sampling** to generate model responses and **model grading** (LLM as a Judge) to score those model responses against the image and reference answer.\n",

Contributor

josiah-openai Jul 15, 2025

delete this.
In this cookbooks, we will use Evals API

That "we'll walk through..." is redundant.

examples/evaluation/use-cases/EvalsAPI_Image_Inputs.ipynb Outdated

+                  "\n",
+                  "In this cookbook, we'll walk through an image example with the Evals API. More specifically, we will use Evals API to evaluate model-generated responses to an image and its corresponding prompt, using **sampling** to generate model responses and **model grading** (LLM as a Judge) to score those model responses against the image and reference answer.\n",
+                  "\n",
+                  "Based on your use case, you might only need the sampling functionality or the model grader, and you can revise what you pass in during the eval and run creation to fit your needs. "

Contributor

josiah-openai Jul 15, 2025

nit: this line isn't needed, people probably understand that this is a starting point!

examples/evaluation/use-cases/EvalsAPI_Image_Inputs.ipynb Outdated

+                  "\n",
+                  "client = OpenAI(\n",
+                  "    api_key=os.getenv(\"OPENAI_API_KEY\"),\n",
+                  "    base_url=\"https://api.openai.com/v1\",\n",

Contributor

josiah-openai Jul 15, 2025

nit: don't need the base url here

examples/evaluation/use-cases/EvalsAPI_Image_Inputs.ipynb Outdated

+                 "cell_type": "markdown",
+                 "metadata": {},
+                 "source": [
+                  "To create the run, we pass in the eval object id and the data source (i.e., the data we compiled earlier) in addition to the chat message trajectory we'd like for sampling to get the model response. While we won't dive into it in this cookbook, EvalsAPI also supports stored completions containing images as a data source. \n",

Contributor

josiah-openai Jul 15, 2025

trajectory is kind of an "inside of evalapi" term, I would chat say messages or input

examples/evaluation/use-cases/EvalsAPI_Image_Inputs.ipynb Outdated

+                 "source": [
+                  "To create the run, we pass in the eval object id and the data source (i.e., the data we compiled earlier) in addition to the chat message trajectory we'd like for sampling to get the model response. While we won't dive into it in this cookbook, EvalsAPI also supports stored completions containing images as a data source. \n",
+                  "\n",
+                  "Here's the sampling message trajectory we'll use for this example."

Contributor

josiah-openai Jul 15, 2025

Same here for trajectory

examples/evaluation/use-cases/EvalsAPI_Image_Inputs.ipynb Outdated

+                  "\n",
+                  "OpenAI’s Evals API now supports image inputs, in its step toward multimodal functionality! API users can use OpenAI's Evals API to evaluate their image use cases to see how their LLM integration is performing and improve it.\n",
+                  "\n",
+                  "In this cookbook, we'll walk through an image example with the Evals API. More specifically, we will use Evals API to evaluate model-generated responses to an image and its corresponding prompt, using **sampling** to generate model responses and **model grading** (LLM as a Judge) to score those model responses against the image and reference answer.\n",

Contributor

josiah-openai Jul 15, 2025

Probably want to put the use-case for this cookbook at the very top, like:
Something like:

In this cookbook we will grade the classification of an image against its ground truth


          revised from feedback

cbfe474

anoop-openai approved these changes

View reviewed changes

anoop-openai left a comment

looks good, thanks!


          revisions from Shikhar's feedback

fee6383

daisyshe-oai merged commit 6ba23ee into main

1 check passed

daisyshe-oai deleted the daisyshe/evals-images-cookbook branch

July 16, 2025 23:36

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet