-
-
Notifications
You must be signed in to change notification settings - Fork 0
feat:Update Cohere embed endpoints to support flexible input and new parameters #175
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughThe OpenAPI specification for the Cohere API's embedding endpoints has been updated to support more flexible and detailed input structures. The Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant API
participant EmbedModel
Client->>API: POST /v1/embed or /v2/embed (inputs, max_tokens, output_dimension)
API->>EmbedModel: Process array of EmbedInput (text/image), apply max_tokens, output_dimension
EmbedModel-->>API: Embedding results
API-->>Client: Embedding response
Possibly related PRs
Poem
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (2)
src/libs/Cohere/openapi.yaml (2)
8016-8020
:output_dimension
parameter: well-defined, but consider enumerating allowed values.The
output_dimension
property is described as only available forembed-v4
and newer models, with allowed values of 256, 512, 1024, and 1536. However, the schema currently only specifiestype: integer
and a description.Recommendation:
To improve API documentation and client validation, consider using anenum
to explicitly list the allowed values foroutput_dimension
. This will help prevent invalid requests and make the contract clearer for SDK generators.Example:
output_dimension: type: integer enum: [256, 512, 1024, 1536] description: ...
15271-15275
:EmbedInput.content
property: description is accurate, but clarify structure for mixed content.The description for the
content
property ("An array of objects containing the input data for the model to embed.") is accurate, but could be more explicit about the expected structure, especially for clients unfamiliar with multimodal input.Recommendation:
Consider clarifying that each item in the array can be either a text or image object, and provide a short example in the description. This will help API consumers understand how to construct mixed-content inputs.Example addition:
Each item in the array should be an object of type
EmbedText
orEmbedImage
. For example:[{"type": "text", "text": "hello"}, {"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}}]
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (8)
src/libs/Cohere/Generated/Cohere.CohereClient.Embedv2.g.cs
is excluded by!**/generated/**
src/libs/Cohere/Generated/Cohere.ICohereClient.Embedv2.g.cs
is excluded by!**/generated/**
src/libs/Cohere/Generated/Cohere.Models.EmbedContent.g.cs
is excluded by!**/generated/**
src/libs/Cohere/Generated/Cohere.Models.EmbedImage.g.cs
is excluded by!**/generated/**
src/libs/Cohere/Generated/Cohere.Models.EmbedInput.g.cs
is excluded by!**/generated/**
src/libs/Cohere/Generated/Cohere.Models.EmbedText.g.cs
is excluded by!**/generated/**
src/libs/Cohere/Generated/Cohere.Models.Embedv2Request.g.cs
is excluded by!**/generated/**
src/libs/Cohere/Generated/JsonSerializerContextTypes.g.cs
is excluded by!**/generated/**
📒 Files selected for processing (1)
src/libs/Cohere/openapi.yaml
(4 hunks)
🔇 Additional comments (2)
src/libs/Cohere/openapi.yaml (2)
15255-15255
:EmbedImage
schema: description addition is clear and helpful.The addition of the description "Image content of the input." to the
EmbedImage
schema is clear and improves the documentation.
15360-15360
:EmbedText
schema: description addition is clear and helpful.The addition of the description "Text content of the input." to the
EmbedText
schema is clear and improves the documentation.
inputs: | ||
type: array | ||
items: | ||
$ref: '#/components/schemas/EmbedInput' | ||
description: An array of inputs for the model to embed. Maximum number of inputs per call is `96`. An input can contain a mix of text and image components. | ||
x-fern-audiences: | ||
- public | ||
max_tokens: | ||
type: integer | ||
description: 'The maximum number of tokens to embed per input. If the input text is longer than this, it will be truncated according to the `truncate` parameter.' | ||
x-fern-audiences: | ||
- public |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
❓ Verification inconclusive
New inputs
, max_tokens
, and output_dimension
fields: strong improvement, but clarify interaction with legacy fields.
The addition of the inputs
array (of EmbedInput
), max_tokens
, and output_dimension
to the /v2/embed
request schema is well-structured and aligns with modern multimodal embedding API design. The descriptions are clear and the types are correct.
However, there is potential for confusion regarding the interaction between the new inputs
field and the legacy texts
and images
fields, which are still present in the schema. It's not clear:
- Whether
inputs
can be used together withtexts
/images
, or if they are mutually exclusive. - What happens if both are provided—does one take precedence, or is it an error?
- If
inputs
is now the preferred way, consider markingtexts
/images
as deprecated or clarifying in the documentation.
Recommendation:
Add explicit documentation (either in the property descriptions or as a top-level note) clarifying the intended usage and precedence of inputs
vs. texts
/images
. If possible, mark texts
and images
as deprecated for new clients.
Would you like a suggested wording for this clarification, or a diff to mark the legacy fields as deprecated?
Action Required: Clarify Interaction Between New and Legacy Fields
The new addition of the inputs
array (containing EmbedInput
objects), max_tokens
, and output_dimension
to the /v2/embed
request schema is well structured and clearly described. However, there remains some ambiguity regarding how these new fields interact with the legacy texts
and images
fields. In particular, it’s not clear:
- Whether
inputs
is intended to be used exclusively, or if it can be combined withtexts
/images
. - Which field takes precedence if both are provided, or if using them together should trigger an error.
- If
inputs
should be promoted as the standard input method, whether the legacy fields ought to be marked as deprecated.
Recommendation:
Update the OpenAPI documentation by adding an explicit note—either as a property description or a top-level comment—detailing the intended usage and precedence. If appropriate, mark texts
and images
as deprecated for new clients.
Would you like a suggested diff to implement these clarifications?
Summary by CodeRabbit
New Features
Documentation