feat: include tokens usage for streamed output #4282
Conversation
…to get the proper usage data in reply streaming mode at the last [DONE] frame
@@ -39,11 +39,15 @@ func ChatEndpoint(cl *config.BackendConfigLoader, ml *model.ModelLoader, startup
	responses <- initialMessage

	ComputeChoices(req, s, config, startupOptions, loader, func(s string, c *[]schema.Choice) {}, func(s string, usage backend.TokenUsage) bool {
		choices := []schema.Choice{}
		if s != "" {
I'm not entirely sure about this, as I didn't check what OpenAI does in this case (and it's hard to replicate).
If the token is empty, does it send back an empty choice, or just skip it? It probably doesn't matter either way; I just want to make sure we keep an eye on not breaking any user flow.
I don't really know, but that token callback function is only used in streaming reply mode.
In that mode we chop the token into runes, and only if there are some do we call that function, so before this commit the tokenCallback was never called with an empty first argument in that specific case, which is exactly what I check for here.
The problem is that the real usage is sent by llama in the final reply, where the message bytes field is empty,
and in that specific case I explicitly call tokenCallback just to capture the usage part.
Next, about choices: before this change they were always strictly set to a slice with exactly one element,
so I used that as an opportunity to avoid rewriting much of the code and to flag that everything except the usage part should be ignored, so the real final reply can still be constructed after receiving from the backend, depending on the tools handling etc.
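To make that convention concrete, here is a minimal, self-contained sketch of the flow described above. It assumes the callback shape shown in the diff; `Choice` and `TokenUsage` are simplified stand-ins for LocalAI's `schema.Choice` and `backend.TokenUsage`, and the callback body is illustrative rather than the actual PR code:

```go
package main

import "fmt"

// Simplified stand-ins for LocalAI's schema.Choice and backend.TokenUsage;
// the real types live in the project's own packages.
type Choice struct {
	Delta string
}

type TokenUsage struct {
	Prompt     int
	Completion int
}

// tokenCallback mirrors the convention discussed above (a sketch, not the PR code):
// during streaming it is called once per decoded token chunk, plus one extra call with
// an empty string when the backend sends its final reply, which carries only usage counters.
func tokenCallback(s string, usage TokenUsage, emit func([]Choice, TokenUsage)) bool {
	choices := []Choice{}
	if s != "" {
		// Normal streamed token: forward it as a single choice, exactly as before.
		choices = append(choices, Choice{Delta: s})
	}
	// An empty choices slice flags that everything except the usage part
	// should be ignored by the caller.
	emit(choices, usage)
	return true
}

func main() {
	emit := func(c []Choice, u TokenUsage) {
		fmt.Printf("choices=%v usage=%+v\n", c, u)
	}
	tokenCallback("Hello", TokenUsage{}, emit)                      // regular streamed chunk
	tokenCallback("", TokenUsage{Prompt: 12, Completion: 34}, emit) // final, usage-only call
}
```

The empty-choices convention keeps the existing ComputeChoices signature intact while still letting the final, usage-only call reach the streaming handler.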
Description
This PR fixes: no existing issue was found for this.
I found out that the usage part of the response body is always zeroed when LocalAI is used through the WebUI, i.e. when the output is streamed.
The issue is that the gRPC server is supposed to return a stream of Reply structures, but for some reason they were only used to retrieve the message bytes of each chunk and nothing more.
However, the logic for holding (so far always empty) usage data for every chunk is present, even though the real values are only sent in the last closing chunk; that's why I had to resort to the hack with empty choices to retrieve just the usage data through the tokenCallback, as sketched below.
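For illustration, a rough sketch of consuming such a reply stream while capturing the usage from the final chunk; the `Reply` struct here is a stand-in, and its field names are an assumption for illustration rather than the actual proto definition:

```go
package main

import "fmt"

// A rough stand-in for the backend's gRPC Reply message; the real definition lives in
// LocalAI's proto files, and these field names are assumed for the sake of the example.
type Reply struct {
	Message      []byte
	Tokens       int32 // completion tokens, only populated on the final reply
	PromptTokens int32 // prompt tokens, only populated on the final reply
}

// drainStream shows the idea: previously only Message bytes were consumed per chunk; to
// surface usage, the counters of the last reply (whose Message is empty) must be kept.
func drainStream(replies <-chan Reply, onToken func(string)) (prompt, completion int32) {
	for r := range replies {
		if len(r.Message) > 0 {
			onToken(string(r.Message))
		}
		// The backend reports the real usage on its final reply,
		// so simply remember whatever the latest chunk said.
		prompt, completion = r.PromptTokens, r.Tokens
	}
	return
}

func main() {
	replies := make(chan Reply, 3)
	replies <- Reply{Message: []byte("Hel")}
	replies <- Reply{Message: []byte("lo")}
	replies <- Reply{Message: nil, PromptTokens: 12, Tokens: 34} // final, usage-only reply
	close(replies)

	p, c := drainStream(replies, func(t string) { fmt.Print(t) })
	fmt.Printf("\nprompt tokens=%d completion tokens=%d\n", p, c)
}
```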
I don't really know how to test the tools part (or what it actually is, lol), but since it uses the usage data returned from the ComputeChoices function, I have verified that it indeed returns a non-empty usage result.
At the same time, the plain Predict (non-streaming) path uses protocol.Reply all the way down through the layers of abstraction and correctly shows the token usage data.