Files
lang-agent/README.md

183 lines
6.0 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Lang Agent Chat API
这是一个基于FastAPI的聊天API服务使用OpenAI格式的请求来调用pipeline.invoke方法进行聊天。
## 安装依赖
```bash
# recommended to install as dev to easily modify the configs in ./config
python -m pip install -e .
```
## 环境变量
make a `.env` with:
```bash
ALI_API_KEY=<ALI API KEY>
ALI_BASE_URL="https://dashscope.aliyuncs.com/compatible-mode/v1"
LANGSMITH_API_KEY=<LANG SMITH API KEY> # for testing only
```
### Hardware tools
update the link to xiaozhi server in `configs/mcp_config.json`
## Configure for Xiaozhi
0. Start the `fastapi_server/server_dashscope.py` file
1. Make a new model entry in `xiaozhi` with AliBL as provider.
2. Fill in the `base_url` entry. The other entries (`API_KEY`, `APP_ID`) can be garbage
- for local computer `base_url=http://127.0.0.1:8588/api/`
- if inside docker, it needs to be `base_url=http://{computer_ip}:8588/api/`
## 运行服务
#### API key setup
`server_dashcop.py` and `server_openai.py` both require api key; generate one and set
```bash
FAST_AUTH_KEYS=API_KEY1,API_KEY2 # at least one
```
`FAST_AUTH_KEYS` will be used as the api-key for authentication when the api is requested.
```bash
# for easy debug; streams full message internally for visibility
python fastapi_server/fake_stream_server_dashscopy.py
# for live production; this is streaming
python fastapi_server/server_dashscope.py
# start server with chatty tool node; NOTE: streaming only!
python fastapi_server/server_dashscope.py route chatty_tool_node
# this supports openai-api;
python fastapi_server/server_openai.py
```
see sample usage in `fastapi_server/test_dashscope_client.py` to see how to communicate with `fake_stream_server_dashscopy.py` or `server_dashscope.py` service
## Conversation Viewer
A web UI to visualize and browse conversations stored in the PostgreSQL database.
### Setup
1. Ensure your database is set up (see `scripts/init_user.sql` and `scripts/recreate_table.sql`)
2. Set the `CONN_STR` environment variable:
```bash
export CONN_STR="postgresql://myapp_user:secure_password_123@localhost/ai_conversations"
```
### Running the Viewer
```bash
python fastapi_server/server_viewer.py
```
Then open your browser and navigate to:
```
http://localhost:8590
```
### Features
- **Left Sidebar**: Lists all conversations with message counts and last updated timestamps
- **Main View**: Displays messages in a chat-style interface
- Human messages appear on the right (blue bubbles)
- AI messages appear on the left (green bubbles)
- Tool messages appear on the left (orange bubbles with border)
The viewer automatically loads all conversations from the `messages` table and allows you to browse through them interactively.
### Openai API differences
For the python `openai` package it does not handle memory. Ours does, so each call remembers what happens previously. For managing memory, pass in a `thread_id` to manager the conversations
```python
from openai import OpenAI
client = OpenAI(
base_url=BASE_URL,
api_key="test-key" # see put a key in .env and put it here; see above
)
client.chat.completions.create(
model="qwen-plus",
messages=messages,
stream=True,
extra_body={"thread_id":"2000"} # pass in a thread id; must be string
)
```
## Runnables
everything in scripts:
- For sample usage see `scripts/demo_chat.py`.
- To evaluate the current default config `scripts/eval.py`
- To make a dataset for eval `scripts/make_eval_dataset.py`
## Registering MCP service
put the links in `configs/mcp_config.json`
## Modifying LLM prompts
Refer to model above when modifying the prompts.
they are in `configs/route_sys_prompts`
- `chat_prompt.txt`: controls `chat_model_call`
- `route_prompt.txt`: controls `router_call`
- `tool_prompt.txt`: controls `tool_model_call`
- `chatty_prompt.txt`: controls how the model say random things when tool use is in progress. Ignore this for now as model architecture is not yet configurable
## Frontend (Conversation Viewer UI)
The React-based frontend for browsing conversations lives in the `frontend` directory.
### Install dependencies
```bash
cd frontend
npm install
```
### Start the `front_apis` server
The frontend talks to the `front_apis` FastAPI service, which by default listens on `http://127.0.0.1:8001`.
From the project root:
```bash
uvicorn fastapi_server.front_apis:app --reload --host 0.0.0.0 --port 8001
```
You can change the URL by setting `VITE_FRONT_API_BASE_URL` in `frontend/.env` (defaults to `http://127.0.0.1:8001`).
### Start the development server
```bash
cd frontend
npm run dev
```
By default, Vite will start the app on `http://localhost:5173` (or the next available port).
## Stress Test results
### Dashscope server summary
#### Non-Streaming
| Concurrency | Requests | Success % | Throughput (req/s) | Avg Latency (ms) | p95 (ms) | p99 (ms) |
|-----------:|---------:|----------:|-------------------:|-----------------:|---------:|---------:|
| 1 | 10 | 100.00% | 0.77 | 1293.14 | 1460.48 | 1476.77 |
| 5 | 25 | 100.00% | 2.74 | 1369.23 | 1827.11 | 3336.25 |
| 10 | 50 | 100.00% | 6.72 | 1344.48 | 1964.75 | 2165.77 |
| 20 | 100 | 100.00% | 10.90 | 1688.06 | 2226.49 | 2747.19 |
| 50 | 200 | 100.00% | 11.75 | 3877.01 | 4855.45 | 5178.52 |
#### Streaming
| Concurrency | Requests | Success % | Throughput (req/s) | Avg Latency (ms) | p95 (ms) | p99 (ms) |
|-----------:|---------:|----------:|-------------------:|-----------------:|---------:|---------:|
| 1 | 10 | 100.00% | 0.73 | 1374.08 | 1714.61 | 1715.82 |
| 10 | 50 | 100.00% | 5.97 | 1560.63 | 1925.01 | 2084.21 |
| 20 | 100 | 100.00% | 9.28 | 2012.03 | 2649.72 | 2934.84 |
Interpretation - Handling concurrently 20 conversations should be ok