Evals API - Image Input Cookbook #1950
Conversation
Left some nits!
"\n", | ||
"OpenAI’s Evals API now supports image inputs, in its step toward multimodal functionality! API users can use OpenAI's Evals API to evaluate their image use cases to see how their LLM integration is performing and improve it.\n", | ||
"\n", | ||
"In this cookbook, we'll walk through an image example with the Evals API. More specifically, we will use Evals API to evaluate model-generated responses to an image and its corresponding prompt, using **sampling** to generate model responses and **model grading** (LLM as a Judge) to score those model responses against the image and reference answer.\n", |
Delete this. "In this cookbook, we will use the Evals API..." already says it, so the "we'll walk through..." sentence is redundant.
"\n", | ||
"In this cookbook, we'll walk through an image example with the Evals API. More specifically, we will use Evals API to evaluate model-generated responses to an image and its corresponding prompt, using **sampling** to generate model responses and **model grading** (LLM as a Judge) to score those model responses against the image and reference answer.\n", | ||
"\n", | ||
"Based on your use case, you might only need the sampling functionality or the model grader, and you can revise what you pass in during the eval and run creation to fit your needs. " |
nit: this line isn't needed, people probably understand that this is a starting point!
"\n", | ||
"client = OpenAI(\n", | ||
" api_key=os.getenv(\"OPENAI_API_KEY\"),\n", | ||
" base_url=\"https://api.openai.com/v1\",\n", |
nit: don't need the base url here
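For reference, a minimal sketch of the client setup without the explicit base_url (the SDK already defaults to the public API endpoint):

```python
import os

from openai import OpenAI

# The SDK defaults to https://api.openai.com/v1, so only the key is needed.
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
```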
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"To create the run, we pass in the eval object id and the data source (i.e., the data we compiled earlier) in addition to the chat message trajectory we'd like for sampling to get the model response. While we won't dive into it in this cookbook, EvalsAPI also supports stored completions containing images as a data source. \n", |
"trajectory" is kind of an "inside of Evals API" term; I would say chat messages or input.
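For illustration, a run-creation call along those lines might look like the sketch below; the ids are placeholders and the exact data_source fields should be checked against the Evals API reference.

```python
# Hypothetical sketch of creating an eval run; "eval_xxx" and "file_xxx"
# are placeholder ids. client is the OpenAI client created earlier.
run = client.evals.runs.create(
    eval_id="eval_xxx",
    name="image-input-run",
    data_source={
        "type": "completions",
        "model": "gpt-4o-mini",
        # the data we compiled earlier, uploaded as a file
        "source": {"type": "file_id", "id": "file_xxx"},
        # the chat messages ("input") used for sampling; a fuller
        # image-bearing example is shown further down
        "input_messages": {
            "type": "template",
            "template": [
                {
                    "role": "user",
                    "type": "message",
                    "content": {"type": "input_text", "text": "{{item.prompt}}"},
                },
            ],
        },
    },
)
```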
"source": [ | ||
"To create the run, we pass in the eval object id and the data source (i.e., the data we compiled earlier) in addition to the chat message trajectory we'd like for sampling to get the model response. While we won't dive into it in this cookbook, EvalsAPI also supports stored completions containing images as a data source. \n", | ||
"\n", | ||
"Here's the sampling message trajectory we'll use for this example." |
Same here for trajectory
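As a rough sketch, such sampling input could be a message template carrying a text prompt plus the image under test; the {{item.*}} variables are assumptions about the fields in the data source.

```python
# Illustrative sampling messages: a text prompt and an image. The
# {{item.*}} template variables are assumed to match fields in the
# uploaded data; content part types follow Evals API conventions.
sampling_messages = [
    {
        "role": "user",
        "type": "message",
        "content": {"type": "input_text", "text": "{{item.prompt}}"},
    },
    {
        "role": "user",
        "type": "message",
        "content": {
            "type": "input_image",
            "image_url": "{{item.image_url}}",
            "detail": "auto",
        },
    },
]
```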
"\n", | ||
"OpenAI’s Evals API now supports image inputs, in its step toward multimodal functionality! API users can use OpenAI's Evals API to evaluate their image use cases to see how their LLM integration is performing and improve it.\n", | ||
"\n", | ||
"In this cookbook, we'll walk through an image example with the Evals API. More specifically, we will use Evals API to evaluate model-generated responses to an image and its corresponding prompt, using **sampling** to generate model responses and **model grading** (LLM as a Judge) to score those model responses against the image and reference answer.\n", |
Probably want to put the use case for this cookbook at the very top. Something like: "In this cookbook, we will grade the classification of an image against its ground truth."
looks good, thanks!
Summary
This cookbook demonstrates how to conduct sampling and model grading using image inputs in our Evals API.
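For context, the model-grading half could use a score_model grader (LLM as a Judge) along these lines; the prompt, threshold, and field values are illustrative assumptions, not the cookbook's exact configuration.

```python
# Hypothetical "score_model" grader config. {{sample.output_text}} is
# the sampled model response; {{item.reference_answer}} is an assumed
# field name in the data source.
grader = {
    "type": "score_model",
    "name": "image-response-judge",
    "model": "gpt-4o",
    "input": [
        {
            "role": "system",
            "content": (
                "Score how well the response answers the prompt about "
                "the image against the reference answer. Return a score "
                "between 0 and 1."
            ),
        },
        {
            "role": "user",
            "content": (
                "Response: {{sample.output_text}}\n"
                "Reference answer: {{item.reference_answer}}"
            ),
        },
    ],
    "range": [0, 1],
    "pass_threshold": 0.8,
}
```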
Motivation
I added support for image inputs to the Evals API. There was no cookbook demonstrating this new functionality, so this cookbook shares examples of using image inputs with our evals product.