Hi, I'm not familiar with llama-cpp-python (or with C++, actually), but I have to use a GGUF model for my project.
I want to generate an answer from pre-computed embedding vectors (a torch.Tensor of shape (1, n_tokens, 4096)) rather than from query text. By embedding vectors I mean text embeddings produced by torch.nn.Embedding()
(just like the inputs_embeds argument of a transformers model's generate() function).
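For clarity, this is the kind of call I mean on the transformers side (a minimal sketch; gpt2 is just a stand-in model):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Normally: text -> ids -> embedding lookup -> generate. With inputs_embeds,
# the embedding lookup is done by hand and generate() skips it.
ids = tokenizer("Hello", return_tensors="pt").input_ids
embeds = model.get_input_embeddings()(ids)  # shape (1, n_tokens, hidden_size)

out = model.generate(inputs_embeds=embeds, max_new_tokens=16)
print(tokenizer.decode(out[0]))
```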
What I want to do is just skip steps 1 and 2 below (see the sketch after the list):
1. tokenize the input string
2. make text embeddings from the tokens
3. model inference
4. get output tokens
5. detokenize
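From reading llama.h, it looks like llama_batch_init(n_tokens, embd, n_seq_max) can allocate a batch that carries raw embeddings (batch.embd) instead of token ids when embd != 0, so something like the sketch below is what I imagine through the low-level ctypes bindings. Everything here (the model path, the exact batch bookkeeping, the signatures in my installed version) is my assumption, not verified code:

```python
import ctypes
import numpy as np
import llama_cpp

llama_cpp.llama_backend_init()  # older versions take a numa flag here
model = llama_cpp.llama_load_model_from_file(
    b"model.gguf", llama_cpp.llama_model_default_params()  # hypothetical path
)
ctx = llama_cpp.llama_new_context_with_model(
    model, llama_cpp.llama_context_default_params()
)

n_embd = 4096  # hidden size of the model (4096 for 7B-class LLaMA)
embeds = np.zeros((8, n_embd), dtype=np.float32)  # stand-in for my (n_tokens, 4096) tensor

# Passing embd != 0 should make the batch carry embeddings instead of token ids.
batch = llama_cpp.llama_batch_init(embeds.shape[0], n_embd, 1)
batch.n_tokens = embeds.shape[0]
ctypes.memmove(batch.embd, embeds.ctypes.data, embeds.nbytes)
for i in range(batch.n_tokens):
    batch.pos[i] = i          # position of each embedding in the sequence
    batch.n_seq_id[i] = 1
    batch.seq_id[i][0] = 0    # everything belongs to sequence 0
    batch.logits[i] = i == batch.n_tokens - 1  # logits only for the last position

if llama_cpp.llama_decode(ctx, batch) == 0:
    logits = llama_cpp.llama_get_logits(ctx)  # n_vocab floats for the last position
    n_vocab = llama_cpp.llama_n_vocab(model)
    next_token = int(np.argmax(np.ctypeslib.as_array(logits, shape=(n_vocab,))))
llama_cpp.llama_batch_free(batch)
```

If the high-level Llama class already exposes something like this, that would be even better.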
Is this feature already implemented? If not, could anyone point me to where I should begin?