---
title: InferenceProxy
emoji: 💾
colorFrom: blue
colorTo: pink
sdk: docker
pinned: false
app_port: 4040
---
# inference-proxy

Lightweight proxy to store LLM traces in a Hugging Face Dataset.
### How it works

This API acts as a proxy for OpenAI-compatible endpoints. Two environment variables control how traces are batched before being pushed to the dataset (a sketch of this follows the list):

- `BATCH_SIZE_LIMIT` - the maximum number of buffered traces before pushing to the dataset
- `BATCH_TIME_LIMIT` - the maximum time to wait before pushing to the dataset
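
The sketch below shows one way these two limits could drive batching. It assumes `BATCH_SIZE_LIMIT` counts buffered traces and `BATCH_TIME_LIMIT` is in seconds, and the push helper is a placeholder; check the Space source for the exact semantics.

```js
// Hypothetical batching sketch (not the actual proxy source).
// Assumes BATCH_SIZE_LIMIT counts buffered traces and BATCH_TIME_LIMIT is in seconds.
const BATCH_SIZE_LIMIT = Number(process.env.BATCH_SIZE_LIMIT ?? 16);
const BATCH_TIME_LIMIT = Number(process.env.BATCH_TIME_LIMIT ?? 60);

let batch = [];
let timer = null;

// Placeholder for whatever call actually appends rows to the Hugging Face Dataset.
async function pushRowsToDataset(rows) {
  console.log(`would push ${rows.length} trace(s) to the dataset`);
}

function flush() {
  if (timer) clearTimeout(timer);
  timer = null;
  if (batch.length === 0) return;
  const rows = batch;
  batch = [];
  pushRowsToDataset(rows);
}

function addTrace(trace) {
  batch.push(trace);
  if (batch.length >= BATCH_SIZE_LIMIT) {
    flush(); // size limit reached: push immediately
  } else if (!timer) {
    // otherwise push whatever has accumulated once the time limit elapses
    timer = setTimeout(flush, BATCH_TIME_LIMIT * 1000);
  }
}
```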
### Required Environment Variables

- `HF_ACCESS_TOKEN` - Hugging Face access token used to push traces to the dataset
- `USER_NAME` - Used to ensure the proxy only processes requests from this user (see the illustrative check below)
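
For illustration only, here is one way a `USER_NAME` check could be implemented, assuming the proxy resolves the caller's Bearer token via the Hugging Face `whoami-v2` API; the function name and exact logic are hypothetical, not taken from this Space.

```js
// Hypothetical sketch of a USER_NAME gate (the Space may implement this differently).
// Resolves the caller's Bearer token against the Hugging Face whoami-v2 endpoint and
// accepts the request only if the token belongs to the configured user.
async function isAllowedCaller(authHeader) {
  const token = authHeader?.replace(/^Bearer\s+/i, "");
  if (!token) return false;
  const res = await fetch("https://huggingface.co/api/whoami-v2", {
    headers: { Authorization: `Bearer ${token}` },
  });
  if (!res.ok) return false;
  const who = await res.json();
  return who.name === process.env.USER_NAME;
}
```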
### Example

```js
import { OpenAI } from "openai";

// Point the OpenAI client at the proxy instead of the provider directly.
const client = new OpenAI({
  baseURL: "http://localhost:4040/fireworks-ai/inference/v1",
  apiKey: process.env.HF_API_KEY,
});

let out = "";

const stream = await client.chat.completions.create({
  model: "accounts/fireworks/models/deepseek-v3",
  messages: [
    {
      role: "user",
      content: "What is the capital of France?",
    },
  ],
  stream: true,
  max_tokens: 500,
});

for await (const chunk of stream) {
  if (chunk.choices && chunk.choices.length > 0) {
    // The delta may omit `content` (e.g. role-only or final chunks), so default to "".
    const newContent = chunk.choices[0].delta.content ?? "";
    out += newContent;
    console.log(newContent);
  }
}
```