lang-agent/README.md

# Lang Agent Chat API

这是一个基于FastAPI的聊天API服务，使用OpenAI格式的请求来调用pipeline.invoke方法进行聊天。

## Docker Installation

For production deployment using Docker, see the [Installation Guide](README_INSTALL.md).

## 安装依赖

```bash
# recommended to install as dev to easily modify the configs in ./config
python -m pip install -e .
```

## 环境变量

make a `.env` with:

```bash
ALI_API_KEY=<ALI API KEY>
ALI_BASE_URL="https://dashscope.aliyuncs.com/compatible-mode/v1"
LANGSMITH_API_KEY=<LANG SMITH API KEY> # for testing only
```

### Hardware tools
update the link to xiaozhi server in `configs/mcp_config.json`

## Configure for Xiaozhi
0. Start the `fastapi_server/server_dashscope.py` file
1. Make a new model entry in `xiaozhi` with AliBL as provider.
2. Fill in the `base_url` entry. The other entries (`API_KEY`, `APP_ID`) can be garbage
    - for local computer `base_url=http://127.0.0.1:8588/api/`
    - if inside docker, it needs to be `base_url=http://{computer_ip}:8588/api/`


## 运行服务

#### API key setup
`server_dashcop.py` and `server_openai.py` both require api key; generate one and set

```bash
FAST_AUTH_KEYS=API_KEY1,API_KEY2    # at least one
```
`FAST_AUTH_KEYS` will be used as the api-key for authentication when the api is requested.

```bash
# for easy debug; streams full message internally for visibility
python fastapi_server/fake_stream_server_dashscopy.py

# for live production; this is streaming
python fastapi_server/server_dashscope.py

# start server with chatty tool node; NOTE: streaming only!
python fastapi_server/server_dashscope.py route chatty_tool_node

# this supports openai-api;
python fastapi_server/server_openai.py
```
see sample usage in `fastapi_server/test_dashscope_client.py` to see how to communicate with `fake_stream_server_dashscopy.py` or `server_dashscope.py` service

## Conversation Viewer

A web UI to visualize and browse conversations stored in the PostgreSQL database.

### Setup

1. Ensure your database is set up (see `scripts/init_user.sql` and `scripts/recreate_table.sql`)
2. Set the `CONN_STR` environment variable:
   ```bash
   export CONN_STR="postgresql://myapp_user:secure_password_123@localhost/ai_conversations"
   ```

### Running the Viewer

```bash
python fastapi_server/server_viewer.py
```

Then open your browser and navigate to:
```
http://localhost:8590
```

### Features

- **Left Sidebar**: Lists all conversations with message counts and last updated timestamps
- **Main View**: Displays messages in a chat-style interface
  - Human messages appear on the right (blue bubbles)
  - AI messages appear on the left (green bubbles)
  - Tool messages appear on the left (orange bubbles with border)

The viewer automatically loads all conversations from the `messages` table and allows you to browse through them interactively.

### Openai API differences
For the python `openai` package it does not handle memory. Ours does, so each call remembers what happens previously. For managing memory, pass in a `thread_id` to manager the conversations
```python
from openai import OpenAI

client = OpenAI(
        base_url=BASE_URL,
        api_key="test-key"  # see put a key in .env and put it here; see above
    )

client.chat.completions.create(
            model="qwen-plus",
            messages=messages,
            stream=True,
            extra_body={"thread_id":"2000"}  # pass in a thread id; must be string
        )
```


## Runnables
everything in scripts:
- For sample usage see `scripts/demo_chat.py`.
- To evaluate the current default config `scripts/eval.py`
- To make a dataset for eval `scripts/make_eval_dataset.py`


## Registering MCP service
put the links in `configs/mcp_config.json`

## Modifying LLM prompts
Refer to model above when modifying the prompts.
they are in `configs/route_sys_prompts`
- `chat_prompt.txt`: controls `chat_model_call`
- `route_prompt.txt`: controls `router_call`
- `tool_prompt.txt`: controls `tool_model_call`
- `chatty_prompt.txt`: controls how the model say random things when tool use is in progress. Ignore this for now as model architecture is not yet configurable

## Frontend (Conversation Viewer UI)

The React-based frontend for browsing conversations lives in the `frontend` directory.

### Install dependencies

```bash
cd frontend
npm install
```

### Start the `front_apis` server

The frontend talks to the `front_apis` FastAPI service, which by default listens on `http://127.0.0.1:8500`.

From the project root:

```bash
uvicorn fastapi_server.front_apis:app --reload --host 0.0.0.0 --port 8500
```

Or run directly:
```bash
python fastapi_server/front_apis.py
```

### Backend run modes

Run whichever backend mode you need from the project root:

```bash
# admin/control plane only (/v1/... frontend APIs)
uvicorn fastapi_server.front_apis:app --reload --host 0.0.0.0 --port 8500

# DashScope chat runtime only (/apps/... and /v1/apps/... APIs)
uvicorn fastapi_server.server_dashscope:app --reload --host 0.0.0.0 --port 8588

# combined mode: one process serves both front_apis + DashScope endpoints
uvicorn fastapi_server.combined:app --reload --host 0.0.0.0 --port 8500
```

You can change the URL by setting `VITE_FRONT_API_BASE_URL` in `frontend/.env` (defaults to `http://127.0.0.1:8500`).

### Start the development server

```bash
cd frontend
npm run dev
```

By default, Vite will start the app on `http://localhost:5173` (or the next available port).

## Stress Test results
### Dashscope server summary

#### Non-Streaming

| Concurrency | Requests | Success % | Throughput (req/s) | Avg Latency (ms) | p95 (ms) | p99 (ms) |
|-----------:|---------:|----------:|-------------------:|-----------------:|---------:|---------:|
| 1          | 10       | 100.00%   | 0.77               | 1293.14          | 1460.48  | 1476.77  |
| 5          | 25       | 100.00%   | 2.74               | 1369.23          | 1827.11  | 3336.25  |
| 10         | 50       | 100.00%   | 6.72               | 1344.48          | 1964.75  | 2165.77  |
| 20         | 100      | 100.00%   | 10.90              | 1688.06          | 2226.49  | 2747.19  |
| 50         | 200      | 100.00%   | 11.75              | 3877.01          | 4855.45  | 5178.52  |

#### Streaming

| Concurrency | Requests | Success % | Throughput (req/s) | Avg Latency (ms) | p95 (ms) | p99 (ms) |
|-----------:|---------:|----------:|-------------------:|-----------------:|---------:|---------:|
| 1          | 10       | 100.00%   | 0.73               | 1374.08          | 1714.61  | 1715.82  |
| 10         | 50       | 100.00%   | 5.97               | 1560.63          | 1925.01  | 2084.21  |
| 20         | 100      | 100.00%   | 9.28               | 2012.03          | 2649.72  | 2934.84  |

Interpretation - Handling concurrently 20 conversations should be ok