Skip to content

Evals API - Image Input Cookbook #1950

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jul 16, 2025
Merged

Conversation

daisyshe-oai
Copy link
Contributor

Summary

This cookbook demonstrates how to conduct sampling and model grading using image inputs in our evals API.

Motivation

I added support for image inputs for EvalsAPI. There is no cookbook to demonstrate this new functionality. Hence, this cookbook aims to share examples using image inputs with our evals product.


For new content

When contributing new content, read through our contribution guidelines, and mark the following action items as completed:

  • I have added a new entry in registry.yaml (and, optionally, in authors.yaml) so that my content renders on the cookbook website.
  • I have conducted a self-review of my content based on the contribution guidelines:
    • Relevance: This content is related to building with OpenAI technologies and is useful to others.
    • Uniqueness: I have searched for related examples in the OpenAI Cookbook, and verified that my content offers new insights or unique information compared to existing documentation.
    • Spelling and Grammar: I have checked for spelling or grammatical mistakes.
    • Clarity: I have done a final read-through and verified that my submission is well-organized and easy to understand.
    • Correctness: The information I include is correct and all of my code executes successfully.
    • Completeness: I have explained everything fully, including all necessary references and citations.

We will rate each of these areas on a scale from 1 to 4, and will only accept contributions that score 3 or higher on all areas. Refer to our contribution guidelines for more details.

Copy link
Contributor

@josiah-openai josiah-openai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some nits!

"\n",
"OpenAI’s Evals API now supports image inputs, in its step toward multimodal functionality! API users can use OpenAI's Evals API to evaluate their image use cases to see how their LLM integration is performing and improve it.\n",
"\n",
"In this cookbook, we'll walk through an image example with the Evals API. More specifically, we will use Evals API to evaluate model-generated responses to an image and its corresponding prompt, using **sampling** to generate model responses and **model grading** (LLM as a Judge) to score those model responses against the image and reference answer.\n",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

delete this.
In this cookbooks, we will use Evals API

That "we'll walk through..." is redundant.

"\n",
"In this cookbook, we'll walk through an image example with the Evals API. More specifically, we will use Evals API to evaluate model-generated responses to an image and its corresponding prompt, using **sampling** to generate model responses and **model grading** (LLM as a Judge) to score those model responses against the image and reference answer.\n",
"\n",
"Based on your use case, you might only need the sampling functionality or the model grader, and you can revise what you pass in during the eval and run creation to fit your needs. "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this line isn't needed, people probably understand that this is a starting point!

"\n",
"client = OpenAI(\n",
" api_key=os.getenv(\"OPENAI_API_KEY\"),\n",
" base_url=\"https://api.openai.com/v1\",\n",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: don't need the base url here

"cell_type": "markdown",
"metadata": {},
"source": [
"To create the run, we pass in the eval object id and the data source (i.e., the data we compiled earlier) in addition to the chat message trajectory we'd like for sampling to get the model response. While we won't dive into it in this cookbook, EvalsAPI also supports stored completions containing images as a data source. \n",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trajectory is kind of an "inside of evalapi" term, I would chat say messages or input

"source": [
"To create the run, we pass in the eval object id and the data source (i.e., the data we compiled earlier) in addition to the chat message trajectory we'd like for sampling to get the model response. While we won't dive into it in this cookbook, EvalsAPI also supports stored completions containing images as a data source. \n",
"\n",
"Here's the sampling message trajectory we'll use for this example."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here for trajectory

"\n",
"OpenAI’s Evals API now supports image inputs, in its step toward multimodal functionality! API users can use OpenAI's Evals API to evaluate their image use cases to see how their LLM integration is performing and improve it.\n",
"\n",
"In this cookbook, we'll walk through an image example with the Evals API. More specifically, we will use Evals API to evaluate model-generated responses to an image and its corresponding prompt, using **sampling** to generate model responses and **model grading** (LLM as a Judge) to score those model responses against the image and reference answer.\n",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably want to put the use-case for this cookbook at the very top, like:
Something like:

In this cookbook we will grade the classification of an image against its ground truth

Copy link

@anoop-openai anoop-openai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, thanks!

@daisyshe-oai daisyshe-oai merged commit 6ba23ee into main Jul 16, 2025
1 check passed
@daisyshe-oai daisyshe-oai deleted the daisyshe/evals-images-cookbook branch July 16, 2025 23:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants