---
title: InferenceProxy
emoji: 💾
colorFrom: blue
colorTo: pink
sdk: docker
pinned: false
app_port: 4040
---
# inference-proxy

Lightweight proxy to store LLM traces in a Hugging Face Dataset.
### How it works

This API acts as a proxy for OpenAI-compatible endpoints. Two environment variables control how traces are batched before being pushed to the dataset (a sketch of this follows the list):

- `BATCH_SIZE_LIMIT` - the maximum number of buffered traces before pushing to the dataset
- `BATCH_TIME_LIMIT` - the maximum time to wait before pushing to the dataset
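
The sketch below shows one way these two limits could drive batching. It assumes `BATCH_SIZE_LIMIT` counts buffered traces and `BATCH_TIME_LIMIT` is in seconds, and the push helper is a placeholder; check the Space source for the exact semantics.

```js
// Hypothetical batching sketch (not the actual proxy source).
// Assumes BATCH_SIZE_LIMIT counts buffered traces and BATCH_TIME_LIMIT is in seconds.
const BATCH_SIZE_LIMIT = Number(process.env.BATCH_SIZE_LIMIT ?? 16);
const BATCH_TIME_LIMIT = Number(process.env.BATCH_TIME_LIMIT ?? 60);

let batch = [];
let timer = null;

// Placeholder for whatever call actually appends rows to the Hugging Face Dataset.
async function pushRowsToDataset(rows) {
  console.log(`would push ${rows.length} trace(s) to the dataset`);
}

function flush() {
  if (timer) clearTimeout(timer);
  timer = null;
  if (batch.length === 0) return;
  const rows = batch;
  batch = [];
  pushRowsToDataset(rows);
}

function addTrace(trace) {
  batch.push(trace);
  if (batch.length >= BATCH_SIZE_LIMIT) {
    flush(); // size limit reached: push immediately
  } else if (!timer) {
    // otherwise push whatever has accumulated once the time limit elapses
    timer = setTimeout(flush, BATCH_TIME_LIMIT * 1000);
  }
}
```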
### Required Environment Variables

- `HF_ACCESS_TOKEN` - Hugging Face access token used to push traces to the dataset
- `USER_NAME` - Used to ensure the proxy only processes requests from this user (see the illustrative check below)
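
For illustration only, here is one way a `USER_NAME` check could be implemented, assuming the proxy resolves the caller's Bearer token via the Hugging Face `whoami-v2` API; the function name and exact logic are hypothetical, not taken from this Space.

```js
// Hypothetical sketch of a USER_NAME gate (the Space may implement this differently).
// Resolves the caller's Bearer token against the Hugging Face whoami-v2 endpoint and
// accepts the request only if the token belongs to the configured user.
async function isAllowedCaller(authHeader) {
  const token = authHeader?.replace(/^Bearer\s+/i, "");
  if (!token) return false;
  const res = await fetch("https://huggingface.co/api/whoami-v2", {
    headers: { Authorization: `Bearer ${token}` },
  });
  if (!res.ok) return false;
  const who = await res.json();
  return who.name === process.env.USER_NAME;
}
```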
### Example

```js
import { OpenAI } from "openai";

// Point the OpenAI client at the proxy instead of the provider directly.
const client = new OpenAI({
  baseURL: "http://localhost:4040/fireworks-ai/inference/v1",
  apiKey: process.env.HF_API_KEY,
});

let out = "";

const stream = await client.chat.completions.create({
  model: "accounts/fireworks/models/deepseek-v3",
  messages: [
    {
      role: "user",
      content: "What is the capital of France?",
    },
  ],
  stream: true,
  max_tokens: 500,
});

for await (const chunk of stream) {
  if (chunk.choices && chunk.choices.length > 0) {
    // The delta may omit `content` (e.g. role-only or final chunks), so default to "".
    const newContent = chunk.choices[0].delta.content ?? "";
    out += newContent;
    console.log(newContent);
  }
}
```