# Local LangChain with FastChat

[LangChain](https://python.langchain.com/en/latest/index.html) is a library that facilitates the development of applications by leveraging large language models (LLMs) and enabling their composition with other sources of computation or knowledge.
FastChat's OpenAI-compatible [API server](openai_api.md) enables using LangChain with open models seamlessly.
## Launch RESTful API Server

Here are the steps to launch a local OpenAI API server for LangChain.

First, launch the controller:

```bash
python3 -m fastchat.serve.controller
```
LangChain uses OpenAI model names by default, so we need to assign some faux OpenAI model names to our local model.
Here, we use Vicuna as an example and use it for three endpoints: chat completion, completion, and embedding.
`--model-path` can be a local folder or a Hugging Face repo name.
See a full list of supported models [here](../README.md#supported-models).

```bash
python3 -m fastchat.serve.model_worker --model-names "gpt-3.5-turbo,text-davinci-003,text-embedding-ada-002" --model-path lmsys/vicuna-7b-v1.5
```
Finally, launch the RESTful API server:

```bash
python3 -m fastchat.serve.openai_api_server --host localhost --port 8000
```
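
To confirm the server is reachable, you can query it with the `openai` Python package. This is a minimal sketch, assuming the pre-1.0 `openai` interface; the port matches the `--port 8000` used above.

~~~py
import openai

# Point the client at the local FastChat server instead of api.openai.com.
openai.api_base = "http://localhost:8000/v1"
openai.api_key = "EMPTY"  # the local server does not require a real key

# List the faux model names registered by the model worker.
print(openai.Model.list())

# Simple chat completion served by the local Vicuna model.
completion = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(completion.choices[0].message.content)
~~~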
## Set OpenAI Environment

You can set your environment with the following commands.

Set the OpenAI base URL:

```bash
export OPENAI_API_BASE=http://localhost:8000/v1
```

Set the OpenAI API key:

```bash
export OPENAI_API_KEY=EMPTY
```
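
If you are working in a notebook or script rather than a shell, the same configuration can be set from Python before the OpenAI client is used (a minimal sketch; the values mirror the exports above):

~~~py
import os

# Equivalent to the shell exports above; set these before importing/using the client.
os.environ["OPENAI_API_BASE"] = "http://localhost:8000/v1"
os.environ["OPENAI_API_KEY"] = "EMPTY"
~~~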
If you encounter the following OOM error while creating embeddings, set a smaller batch size via an environment variable.

~~~bash
openai.error.APIError: Invalid response object from API: '{"object":"error","message":"**NETWORK ERROR DUE TO HIGH TRAFFIC. PLEASE REGENERATE OR REFRESH THIS PAGE.**\\n\\n(CUDA out of memory. Tried to allocate xxx MiB (GPU 0; xxx GiB total capacity; xxx GiB already allocated; xxx MiB free; xxx GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF)","code":50002}' (HTTP response code was 400)
~~~

You can try `export FASTCHAT_WORKER_API_EMBEDDING_BATCH_SIZE=1`.
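
To check that embeddings are being served correctly, you can hit the embedding endpoint directly (a minimal sketch, again assuming the pre-1.0 `openai` interface and the environment variables set above):

~~~py
import openai

# Uses OPENAI_API_BASE / OPENAI_API_KEY from the environment set above.
response = openai.Embedding.create(
    model="text-embedding-ada-002",
    input="Hello, FastChat!",
)
print(len(response["data"][0]["embedding"]))  # dimensionality of the returned embedding
~~~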
## Try local LangChain

Here is a question answering example.

Download a text file:

```bash
wget https://raw.githubusercontent.com/hwchase17/langchain/v0.0.200/docs/modules/state_of_the_union.txt
```
Run LangChain:

~~~py
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import TextLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.indexes import VectorstoreIndexCreator

# Both the embedding model and the chat model resolve to the local Vicuna worker
# through the faux OpenAI model names registered above.
embedding = OpenAIEmbeddings(model="text-embedding-ada-002")
loader = TextLoader("state_of_the_union.txt")
index = VectorstoreIndexCreator(embedding=embedding).from_loaders([loader])
llm = ChatOpenAI(model="gpt-3.5-turbo")

questions = [
    "Who is the speaker",
    "What did the president say about Ketanji Brown Jackson",
    "What are the threats to America",
    "Who are mentioned in the speech",
    "Who is the vice president",
    "How many projects were announced",
]

for query in questions:
    print("Query:", query)
    print("Answer:", index.query(query, llm=llm))
~~~
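
The same `ChatOpenAI` instance can also be used on its own, without the index, for plain chat against the local model (a minimal sketch, assuming the same langchain 0.0.x API used above):

~~~py
from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage

llm = ChatOpenAI(model="gpt-3.5-turbo")

# Calling the chat model directly returns an AIMessage produced by the local Vicuna worker.
reply = llm([HumanMessage(content="Summarize the speech in one sentence.")])
print(reply.content)
~~~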