diff --git a/fastapi_server/OpenAI_API_README.md b/fastapi_server/OpenAI_API_README.md
deleted file mode 100644
index c59f511..0000000
--- a/fastapi_server/OpenAI_API_README.md
+++ /dev/null
@@ -1,220 +0,0 @@
-# Lang Agent OpenAI-Compatible API
-
-This is a chat API that conforms to the OpenAI interface specification, allowing users to access your Lang Agent service in the same way as the OpenAI API.
-
-## Quick Start
-
-### 1. Start the server
-
-```bash
-cd /path/to/lang-agent/fastapi_server
-python server.py
-```
-
-The server will start at `http://localhost:8488`.
-
-### 2. Use the API
-
-#### Using curl
-
-```bash
-curl -X POST "http://localhost:8488/v1/chat/completions" \
- -H "Authorization: Bearer 123tangledup-ai" \
- -H "Content-Type: application/json" \
- -d '{
-   "model": "qwen-plus",
-   "messages": [
-     {
-       "role": "system",
-       "content": "You are a helpful assistant."
-     },
-     {
-       "role": "user",
-       "content": "你是谁?"
-     }
-   ]
- }'
-```
-
-#### Using Python requests
-
-```python
-import requests
-
-API_BASE_URL = "http://localhost:8488"
-API_KEY = "123tangledup-ai"
-
-headers = {
-    "Authorization": f"Bearer {API_KEY}",
-    "Content-Type": "application/json"
-}
-
-data = {
-    "model": "qwen-plus",
-    "messages": [
-        {
-            "role": "system",
-            "content": "You are a helpful assistant."
-        },
-        {
-            "role": "user",
-            "content": "你是谁?"
-        }
-    ]
-}
-
-response = requests.post(f"{API_BASE_URL}/v1/chat/completions", headers=headers, json=data)
-print(response.json())
-```
-
-#### Using the OpenAI Python library
-
-```python
-from openai import OpenAI
-
-client = OpenAI(
-    api_key="123tangledup-ai",
-    base_url="http://localhost:8488/v1"
-)
-
-response = client.chat.completions.create(
-    model="qwen-plus",
-    messages=[
-        {"role": "system", "content": "You are a helpful assistant."},
-        {"role": "user", "content": "你是谁?"}
-    ]
-)
-
-print(response.choices[0].message.content)
-```
-
-## API Endpoints
-
-### 1. Chat completions `/v1/chat/completions`
-
-Fully compatible with OpenAI's chat completions API.
-
-**Request parameters:**
-
-| Parameter | Type | Required | Default | Description |
-|-----------|------|----------|---------|-------------|
-| model | string | yes | - | Model name |
-| messages | array | yes | - | List of messages |
-| temperature | number | no | 0.7 | Sampling temperature |
-| max_tokens | integer | no | 500 | Maximum number of tokens to generate |
-| stream | boolean | no | false | Whether to stream the response |
-| thread_id | integer | no | 3 | Thread ID, used for multi-turn conversations |
-
-**Response format:**
-
-```json
-{
-  "id": "chatcmpl-abc123",
-  "object": "chat.completion",
-  "created": 1677652288,
-  "model": "qwen-plus",
-  "choices": [
-    {
-      "index": 0,
-      "message": {
-        "role": "assistant",
-        "content": "您好!我是一个AI助手..."
-      },
-      "finish_reason": "stop"
-    }
-  ],
-  "usage": {
-    "prompt_tokens": 56,
-    "completion_tokens": 31,
-    "total_tokens": 87
-  }
-}
-```
-
-### 2. Health check `/health`
-
-Checks the API service status.
-
-**Request:**
-```bash
-GET /health
-```
-
-**Response:**
-```json
-{
-  "status": "healthy"
-}
-```
-
-### 3. API info `/`
-
-Returns basic information about the API.
-
-**Request:**
-```bash
-GET /
-```
-
-**Response:**
-```json
-{
-  "message": "Lang Agent Chat API",
-  "version": "1.0.0",
-  "description": "Chat API that calls pipeline.invoke using the OpenAI format",
-  "authentication": "Bearer Token (API Key)",
-  "endpoints": {
-    "/v1/chat/completions": "POST - chat completions endpoint, OpenAI-compatible, requires an API key",
-    "/": "GET - API info",
-    "/health": "GET - health check endpoint"
-  }
-}
-```
-
-## Authentication
-
-The API uses Bearer token authentication. The default API key is `123tangledup-ai`.
-
-Include the following request header:
-```
-Authorization: Bearer 123tangledup-ai
-```
-
-## Test Scripts
-
-The project provides two test scripts:
-
-1. **Bash script** (`test_openai_api.sh`) - tests the API using curl
-2. **Python script** (`test_openai_api.py`) - tests the API using the Python requests library
-
-To run the test scripts:
-
-```bash
-# Run the Bash test script
-chmod +x test_openai_api.sh
-./test_openai_api.sh
-
-# Run the Python test script
-python test_openai_api.py
-```
-
-## Compatibility with the OpenAI API
-
-This API is fully compatible with OpenAI's chat completions API, so you can:
-
-1. Use any client library that supports the OpenAI API
-2. Change the base_url to `http://localhost:8488/v1`
-3. Authenticate with the provided API key
-
-## Notes
-
-1. Make sure the server is running and reachable
-2. Streaming responses (stream=true) may not be fully supported yet
-3. The model parameter is mainly an identifier; the model actually used is determined by the server configuration
-4. Multi-turn conversations use the thread_id parameter to maintain context
-
-## Troubleshooting
-
-1. **Connection errors**: make sure the server is running, and check that the URL and port are correct
-2. **Authentication errors**: check that the API key is set correctly
-3. **Request format errors**: make sure the request body is valid JSON and contains all required fields
\ No newline at end of file
diff --git a/fastapi_server/README.md b/fastapi_server/README.md
deleted file mode 100644
index e767853..0000000
--- a/fastapi_server/README.md
+++ /dev/null
@@ -1,179 +0,0 @@
-# Lang Agent Chat API
-
-This is a FastAPI-based chat API service that uses OpenAI-format requests to call the pipeline.invoke method for chat.
-
-## Features
-
-- Chat endpoint compatible with the OpenAI API format
-- Multi-turn conversations (via thread_id)
-- Uses the qwen-flash model
-- Streaming and non-streaming responses
-- Health check endpoint
-
-## Installing Dependencies
-
-```bash
-pip install -r requirements.txt
-```
-
-## Environment Variables
-
-Make sure the following environment variable is set:
-
-```bash
-export ALI_API_KEY="your_ali_api_key"
-```
-
-## Running the Service
-
-### Option 1: use the start script
-
-```bash
-./start_server.sh
-```
-
-### Option 2: run the Python file directly
-
-```bash
-python server.py
-```
-
-The service will start at `http://localhost:8000`.
-
-## API Endpoints
-
-### Chat completions
-
-**Endpoint**: `POST /v1/chat/completions`
-
-**Request format**:
-```json
-{
-  "model": "qwen-flash",
-  "messages": [
-    {
-      "role": "system",
-      "content": "你是一个有用的助手。"
-    },
-    {
-      "role": "user",
-      "content": "你好,请介绍一下你自己。"
-    }
-  ],
-  "temperature": 0.7,
-  "max_tokens": 1000,
-  "stream": false,
-  "thread_id": 3
-}
-```
-
-**Response format**:
-```json
-{
-  "id": "chatcmpl-abc123",
-  "object": "chat.completion",
-  "created": 1677652288,
-  "model": "qwen-flash",
-  "choices": [
-    {
-      "index": 0,
-      "message": {
-        "role": "assistant",
-        "content": "你好!我是小盏,是半盏青年茶馆的智能助手..."
-      },
-      "finish_reason": "stop"
-    }
-  ]
-}
-```
-
-### API info
-
-**Endpoint**: `GET /`
-
-Returns basic information about the API.
-
-### Health check
-
-**Endpoint**: `GET /health`
-
-Returns the health status of the service.
-
-## Usage Examples
-
-### Using the OpenAI Python client library
-
-First install the OpenAI library:
-
-```bash
-pip install openai
-```
-
-Then use the following code:
-
-```python
-from openai import OpenAI
-
-# Set the API base URL and API key (any value works for the key, since this API does not implement authentication)
-client = OpenAI(
-    api_key="your-api-key",
-    base_url="http://localhost:8000/v1"
-)
-
-# Send a chat request. The OpenAI SDK rejects unknown keyword arguments,
-# so the custom thread_id field must be passed via extra_body.
-response = client.chat.completions.create(
-    model="qwen-flash",
-    messages=[
-        {"role": "system", "content": "你是一个有用的助手。"},
-        {"role": "user", "content": "你好,请介绍一下你自己。"}
-    ],
-    temperature=0.7,
-    extra_body={"thread_id": 1}  # used for multi-turn conversations
-)
-
-print(response.choices[0].message.content)
-```
-
-### Using curl
-
-```bash
-curl -X POST "http://localhost:8000/v1/chat/completions" \
--H "Content-Type: application/json" \
--d '{
-  "model": "qwen-flash",
-  "messages": [
-    {
-      "role": "user",
-      "content": "你好,请介绍一下你自己。"
-    }
-  ]
-}'
-```
-
-### Using Python requests
-
-```python
-import requests
-
-url = "http://localhost:8000/v1/chat/completions"
-headers = {"Content-Type": "application/json"}
-data = {
-    "model": "qwen-flash",
-    "messages": [
-        {
-            "role": "user",
-            "content": "你好,请介绍一下你自己。"
-        }
-    ]
-}
-
-response = requests.post(url, headers=headers, json=data)
-print(response.json())
-```
-
-## Notes
-
-1. Make sure the correct API key environment variable is set
-2. The qwen-flash model is used by default; edit the configuration in the code to change it
-3. thread_id is used for multi-turn conversations; requests with the same thread_id share conversation context
-4. At present, setting stream to true still returns a non-streaming response (streaming can be implemented further as needed)
\ No newline at end of file
diff --git a/fastapi_server/openai_client_example.py b/fastapi_server/openai_client_example.py
deleted file mode 100644
index 5622129..0000000
--- a/fastapi_server/openai_client_example.py
+++ /dev/null
@@ -1,129 +0,0 @@
-#!/usr/bin/env python3
-"""
-Example of calling our FastAPI chat API with the OpenAI Python client library
-"""
-
-from openai import OpenAI
-import os
-
-# Set the API base URL and API key (any value works for the key, since this API does not implement authentication)
-client = OpenAI(
-    api_key="your-api-key",
-    base_url="http://localhost:8000/v1"
-)
-
-def simple_chat():
-    """Simple chat example"""
-    print("=" * 50)
-    print("Simple chat example")
-    print("=" * 50)
-
-    # The OpenAI SDK rejects unknown keyword arguments, so the custom
-    # thread_id field is passed via extra_body.
-    response = client.chat.completions.create(
-        model="qwen-flash",
-        messages=[
-            {"role": "user", "content": "你好,请介绍一下你自己。"}
-        ],
-        temperature=0.7,
-        extra_body={"thread_id": 1}
-    )
-
-    print(f"Assistant: {response.choices[0].message.content}")
-    print("\n")
-
-def multi_turn_chat():
-    """Multi-turn conversation example"""
-    print("=" * 50)
-    print("Multi-turn conversation example")
-    print("=" * 50)
-
-    # First turn
-    print("First turn:")
-    response1 = client.chat.completions.create(
-        model="qwen-flash",
-        messages=[
-            {"role": "user", "content": "你推荐什么茶?"}
-        ],
-        temperature=0.7,
-        extra_body={"thread_id": 2}
-    )
-
-    print("User: 你推荐什么茶?")
-    print(f"Assistant: {response1.choices[0].message.content}")
-
-    # Second turn, using the same thread_id
-    print("\nSecond turn:")
-    response2 = client.chat.completions.create(
-        model="qwen-flash",
-        messages=[
-            {"role": "user", "content": "为什么推荐这个茶?"}
-        ],
-        temperature=0.7,
-        extra_body={"thread_id": 2}  # same thread_id keeps the conversation context
-    )
-
-    print("User: 为什么推荐这个茶?")
-    print(f"Assistant: {response2.choices[0].message.content}")
-    print("\n")
-
-def system_prompt_example():
-    """Example using a system prompt"""
-    print("=" * 50)
-    print("System prompt example")
-    print("=" * 50)
-
-    response = client.chat.completions.create(
-        model="qwen-flash",
-        messages=[
-            {"role": "system", "content": "你是一个专业的茶艺师,用简洁的语言回答问题,不超过50字。"},
-            {"role": "user", "content": "请介绍一下普洱茶。"}
-        ],
-        temperature=0.3,
-        extra_body={"thread_id": 3}
-    )
-
-    print("User: 请介绍一下普洱茶。")
-    print(f"Assistant: {response.choices[0].message.content}")
-    print("\n")
-
-def interactive_chat():
-    """Interactive chat example"""
-    print("=" * 50)
-    print("Interactive chat (type 'quit' to exit)")
-    print("=" * 50)
-
-    thread_id = 4  # assign a fixed thread_id to this session
-
-    while True:
-        user_input = input("You: ")
-        if user_input.lower() == 'quit':
-            break
-
-        try:
-            response = client.chat.completions.create(
-                model="qwen-flash",
-                messages=[
-                    {"role": "user", "content": user_input}
-                ],
-                temperature=0.7,
-                extra_body={"thread_id": thread_id}
-            )
-
-            print(f"Assistant: {response.choices[0].message.content}")
-        except Exception as e:
-            print(f"Error: {str(e)}")
-
-if __name__ == "__main__":
-    print("Example: calling the FastAPI chat API with the OpenAI client library")
-    print("Note: make sure the server is running at http://localhost:8000\n")
-
-    # Simple chat example
-    simple_chat()
-
-    # Multi-turn conversation example
-    multi_turn_chat()
-
-    # System prompt example
-    system_prompt_example()
-
-    # Interactive chat example
-    interactive_chat()
\ No newline at end of file
diff --git a/fastapi_server/requirements.txt b/fastapi_server/requirements.txt
deleted file mode 100644
index ad49bad..0000000
--- a/fastapi_server/requirements.txt
+++ /dev/null
@@ -1,25 +0,0 @@
-fastapi
-uvicorn
-pydantic>=2.0.0,<2.12
-loguru>=0.7.0
-python-dotenv>=1.0.0
-langchain==1.0
-langchain-core>=0.1.0
-langchain-community
-langchain-openai
-openai>=1.0.0
-langchain-mcp-adapters
-langgraph>=0.0.40
-tyro>=0.7.0
-commentjson>=0.9.0
-matplotlib>=3.7.0
-Pillow>=10.0.0
-jax>=0.4.0
-httpx[socks]
-dashscope
-websockets>=11.0.3
-mcp>=1.8.1
-mcp-proxy>=0.8.2
-faiss-cpu
-fastmcp
-pandas
diff --git a/fastapi_server/server.py b/fastapi_server/server.py
deleted file mode 100644
index 0a96c72..0000000
--- a/fastapi_server/server.py
+++ /dev/null
@@ -1,315 +0,0 @@
-from fastapi import FastAPI, HTTPException, Depends, Security
-from fastapi.middleware.cors import CORSMiddleware
-from fastapi.security import HTTPBearer, HTTPAuthorizationCredentials
-from fastapi.responses import StreamingResponse
-from pydantic import BaseModel, Field
-from typing import List, Optional, Dict, Any, Union
-import os
-import sys
-import time
-import uvicorn
-import httpx
-import openai
-import json
-from loguru import logger
-
-# Add the parent directory to the system path so the lang_agent module can be imported
-sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
-
-from lang_agent.pipeline import Pipeline, PipelineConfig
-
-# OpenAI-format request models
-class ChatMessage(BaseModel):
-    role: str = Field(..., description="Message role: 'system', 'user', or 'assistant'")
-    content: str = Field(..., description="Message content")
-
-class ChatCompletionRequest(BaseModel):
-    model: str = Field(default="qwen-flash", description="Model name")
-    messages: List[ChatMessage] = Field(..., description="List of conversation messages")
-    temperature: Optional[float] = Field(default=0.7, description="Sampling temperature")
-    max_tokens: Optional[int] = Field(default=500, description="Maximum number of tokens to generate")
-    stream: Optional[bool] = Field(default=False, description="Whether to stream the response")
-    thread_id: Optional[int] = Field(default=3, description="Thread ID, used for multi-turn conversations")
-    llm_provider: Optional[str] = Field(default="openai", description="LLM provider")
-    base_url: Optional[str] = Field(default="https://dashscope.aliyuncs.com/compatible-mode/v1", description="Base URL of the LLM API")
-
-class ChatCompletionResponseChoice(BaseModel):
-    index: int
-    message: ChatMessage
-    finish_reason: str
-
-class ChatCompletionResponseUsage(BaseModel):
-    prompt_tokens: int
-    completion_tokens: int
-    total_tokens: int
-
-class ChatCompletionResponse(BaseModel):
-    id: str
-    object: str = "chat.completion"
-    created: int
-    model: str
-    choices: List[ChatCompletionResponseChoice]
-    usage: Optional[ChatCompletionResponseUsage] = None
-
-# OpenAI client wrapper
-class OpenAIClientWrapper:
-    def __init__(
-        self,
-        api_key: Optional[str] = None,
-        base_url: Optional[str] = None,
-        timeout: float = 60.0,
-        model_name: str = "qwen-flash",
-        max_tokens: int = 500,
-        temperature: float = 0.7,
-        top_p: float = 1.0,
-        frequency_penalty: float = 0.0,
-    ):
-        """
-        Initialize the OpenAI client wrapper
-
-        Args:
-            api_key: API key; if None, read from the OPENAI_API_KEY environment variable
-            base_url: Base URL of the API; if None, read from the OPENAI_BASE_URL environment variable
-            timeout: Request timeout in seconds
-            model_name: Default model name
-            max_tokens: Default maximum number of tokens
-            temperature: Default sampling temperature
-            top_p: Default top_p parameter
-            frequency_penalty: Default frequency penalty
-        """
-        self.api_key = api_key or os.getenv("OPENAI_API_KEY", "")
-        self.base_url = base_url or os.getenv("OPENAI_BASE_URL", None)
-        self.timeout = timeout
-        self.model_name = model_name
-        self.max_tokens = max_tokens
-        self.temperature = temperature
-        self.top_p = top_p
-        self.frequency_penalty = frequency_penalty
-
-        self.client = openai.OpenAI(
-            api_key=self.api_key,
-            base_url=self.base_url,
-            timeout=httpx.Timeout(self.timeout)
-        )
-
-    def response(self, session_id: str, dialogue: List[Dict[str, str]], **kwargs):
-        """
-        Generate a chat response (streaming)
-
-        Args:
-            session_id: Session ID
-            dialogue: List of conversation messages, in the form [{"role": "user", "content": "..."}, ...]
-            **kwargs: Extra parameters that override the default max_tokens, temperature, top_p, and frequency_penalty
-
-        Returns:
-            An OpenAI streaming response object
-        """
-        try:
-            responses = self.client.chat.completions.create(
-                model=self.model_name,
-                messages=dialogue,
-                stream=True,
-                max_tokens=kwargs.get("max_tokens", self.max_tokens),
-                temperature=kwargs.get("temperature", self.temperature),
-                top_p=kwargs.get("top_p", self.top_p),
-                frequency_penalty=kwargs.get("frequency_penalty", self.frequency_penalty),
-            )
-            return responses
-        except Exception as e:
-            logger.error(f"OpenAI client response error: {str(e)}")
-            raise
-
-# Initialize the FastAPI app
-app = FastAPI(title="Lang Agent Chat API", description="Chat API that calls pipeline.invoke using the OpenAI format")
-
-# API key
-API_KEY = "123tangledup-ai"
-
-# Security scheme
-security = HTTPBearer()
-
-# Dependency that verifies the API key
-# async def verify_api_key(credentials: HTTPAuthorizationCredentials = Security(security)):
-#     if credentials.credentials != API_KEY:
-#         raise HTTPException(
-#             status_code=401,
-#             detail="Invalid API key",
-#             headers={"WWW-Authenticate": "Bearer"},
-#         )
-#     return credentials
-
-# CORS middleware
-app.add_middleware(
-    CORSMiddleware,
-    allow_origins=["*"],
-    allow_credentials=True,
-    allow_methods=["*"],
-    allow_headers=["*"],
-)
-
-# Initialize the Pipeline
-pipeline_config = PipelineConfig()
-pipeline_config.llm_name = "qwen-flash"
-pipeline_config.llm_provider = "openai"
-pipeline_config.base_url = "https://dashscope.aliyuncs.com/compatible-mode/v1"
-
-pipeline = Pipeline(pipeline_config)
-
-# Initialize the OpenAI client wrapper (optional, for calling the OpenAI API directly)
-openai_client = OpenAIClientWrapper(
-    api_key=os.getenv("OPENAI_API_KEY"),
-    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
-    timeout=60.0,
-    model_name="qwen-flash",
-    max_tokens=500,
-    temperature=0.7,
-    top_p=1.0,
-    frequency_penalty=0.0,
-)
-
-def generate_streaming_chunks(full_text: str, response_id: str, model: str, chunk_size: int = 10):
-    """
-    Generate streaming chunks from a non-streaming result
-    """
-    created_time = int(time.time())
-
-    # Stream content chunks
-    for i in range(0, len(full_text), chunk_size):
-        chunk = full_text[i:i + chunk_size]
-        if chunk:
-            chunk_data = {
-                "id": response_id,
-                "object": "chat.completion.chunk",
-                "created": created_time,
-                "model": model,
-                "choices": [
-                    {
-                        "index": 0,
-                        "delta": {"content": chunk},
-                        "finish_reason": None
-                    }
-                ]
-            }
-            yield f"data: {json.dumps(chunk_data)}\n\n"
-
-    # Send the final chunk with finish_reason
-    final_chunk = {
-        "id": response_id,
-        "object": "chat.completion.chunk",
-        "created": created_time,
-        "model": model,
-        "choices": [
-            {
-                "index": 0,
-                "delta": {},
-                "finish_reason": "stop"
-            }
-        ]
-    }
-    yield f"data: {json.dumps(final_chunk)}\n\n"
-    yield "data: [DONE]\n\n"
-
-@app.post("/v1/chat/completions")
-async def chat_completions(
-    request: ChatCompletionRequest#,
-    # credentials: HTTPAuthorizationCredentials = Depends(verify_api_key)
-):
-    """
-    OpenAI-format chat completions API
-    """
-    try:
-        # Extract the user and system messages
-        user_message = None
-        system_message = None
-
-        # TODO: wrap these as proper human and system messages
-        for message in request.messages:
-            if message.role == "user":
-                user_message = message.content
-            elif message.role == "system" or message.role == "assistant":
-                system_message = message.content
-
-        if not user_message:
-            raise HTTPException(status_code=400, detail="Missing user message")
-
-        # Call the pipeline's chat method (always get the non-streaming result)
-        response_content = pipeline.chat(
-            inp=user_message,
-            as_stream=False,  # Always get the full result, then chunk it if streaming
-            thread_id=request.thread_id
-        )
-
-        # Ensure response_content is a string
-        if not isinstance(response_content, str):
-            response_content = str(response_content)
-
-        logger.info(f"Pipeline response - Length: {len(response_content)}, Content: {repr(response_content[:200])}")
-
-        if len(response_content) == 0:
-            logger.warning("Pipeline returned empty response!")
-
-        response_id = f"chatcmpl-{os.urandom(12).hex()}"
-
-        # If streaming was requested, return a streaming response
-        if request.stream:
-            return StreamingResponse(
-                generate_streaming_chunks(
-                    full_text=response_content,
-                    response_id=response_id,
-                    model=request.model,
-                    chunk_size=10
-                ),
-                media_type="text/event-stream"
-            )
-
-        # Otherwise return a normal response
-        response = ChatCompletionResponse(
-            id=response_id,
-            created=int(time.time()),
-            model=request.model,
-            choices=[
-                ChatCompletionResponseChoice(
-                    index=0,
-                    message=ChatMessage(role="assistant", content=response_content),
-                    finish_reason="stop"
-                )
-            ]
-        )
-
-        return response
-
-    except HTTPException:
-        # Re-raise HTTPExceptions (e.g. the 400 above) instead of turning them into 500s
-        raise
-    except Exception as e:
-        logger.error(f"Error while handling chat request: {str(e)}")
-        raise HTTPException(status_code=500, detail=f"Internal server error: {str(e)}")
-
-@app.get("/")
-async def root():
-    """
-    Root path; returns API information
-    """
-    return {
-        "message": "Lang Agent Chat API",
-        "version": "1.0.0",
-        "description": "Chat API that calls pipeline.invoke using the OpenAI format",
-        "authentication": "Bearer Token (API Key)",
-        "endpoints": {
-            "/v1/chat/completions": "POST - chat completions endpoint, OpenAI-compatible, requires an API key",
-            "/": "GET - API info",
-            "/health": "GET - health check endpoint"
-        }
-    }
-
-@app.get("/health")
-async def health_check():
-    """
-    Health check endpoint
-    """
-    return {"status": "healthy"}
-
-if __name__ == "__main__":
-    uvicorn.run(
-        "server:app",
-        host="0.0.0.0",
-        port=8488,
-        reload=True
-    )
\ No newline at end of file
diff --git a/fastapi_server/start_server.sh b/fastapi_server/start_server.sh
deleted file mode 100755
index 852ab95..0000000
--- a/fastapi_server/start_server.sh
+++ /dev/null
@@ -1,19 +0,0 @@
-#!/bin/bash
-
-echo "Starting the Lang Agent Chat API server..."
-
-# Check the Python environment
-if ! command -v python &> /dev/null; then
-    echo "Error: Python not found. Make sure Python is installed and on your PATH."
-    exit 1
-fi
-
-# Check environment variables
-if [ -z "$ALI_API_KEY" ]; then
-    echo "Warning: the ALI_API_KEY environment variable is not set. Make sure it is set."
-    echo "For example: export ALI_API_KEY='your_api_key'"
-fi
-
-# Start the server
-cd "$(dirname "$0")"
-python server.py
\ No newline at end of file
diff --git a/fastapi_server/test_openai_client.py b/fastapi_server/test_openai_client.py
deleted file mode 100644
index a2d345e..0000000
--- a/fastapi_server/test_openai_client.py
+++ /dev/null
@@ -1,79 +0,0 @@
-#!/usr/bin/env python3
-"""
-Simple test for OpenAI client chat.completions.create
-"""
-import os
-import httpx
-import openai
-from dotenv import load_dotenv
-
-load_dotenv()
-
-print("Initializing OpenAI client...")
-print("Base URL: http://localhost:8488/v1")
-print(f"API Key set: {'Yes' if os.getenv('ALI_API_KEY') else 'No'}")
-
-# Initialize the client (pointing at the FastAPI server from server.py)
-client = openai.OpenAI(
-    api_key=os.getenv("ALI_API_KEY"),
-    base_url="http://localhost:8488/v1",
-    timeout=httpx.Timeout(60.0)
-)
-
-print("\nTesting chat completion (non-streaming)...")
-# try:
-#     # Test chat completion (non-streaming first)
-#     response = client.chat.completions.create(
-#         model="qwen-flash",
-#         messages=[
-#             {'role': 'system', 'content': 'your name is steve'},
-#             {"role": "user", "content": "Say hello!"}],
-#         stream=False,
-#         max_tokens=100,
-#         temperature=0.7
-#     )
-
-#     print(f"Response ID: {response.id}")
-#     print(f"Model: {response.model}")
-#     print(f"Content: {response.choices[0].message.content}")
-#     print("\n✓ Non-streaming test successful!")
-
-# except Exception as e:
-#     print(f"\n✗ Error: {str(e)}")
-#     import traceback
-#     traceback.print_exc()
-
-print("\nTesting chat completion (streaming)...")
-try:
-    # Test streaming with the same message as the non-streaming test
-    response = client.chat.completions.create(
-        model="qwen-flash",
-        messages=[
-            {'role': 'system', 'content': 'your name is steve'},
-            {"role": "user", "content": "Say hello!"}
-        ],
-        stream=True,
-        max_tokens=100,
-        temperature=0.7
-    )
-
-    print("Streaming response:")
-    full_content = ""
-    chunk_count = 0
-    for chunk in response:
-        chunk_count += 1
-        if hasattr(chunk, 'choices') and len(chunk.choices) > 0:
-            if hasattr(chunk.choices[0], 'delta') and chunk.choices[0].delta.content:
-                content = chunk.choices[0].delta.content
-                print(content, end="", flush=True)
-                full_content += content
-
-    print(f"\n\nTotal chunks received: {chunk_count}")
-    print(f"Full content: {repr(full_content)}")
-    print(f"Content length: {len(full_content)}")
-    print("\n✓ Streaming test successful!")
-
-except Exception as e:
-    print(f"\n✗ Error: {str(e)}")
-    import traceback
-    traceback.print_exc()
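
The deleted server fakes streaming by re-chunking the complete pipeline reply into OpenAI-style `chat.completion.chunk` SSE events. That framing can be exercised without FastAPI or a live server; below is a minimal, self-contained sketch of the same chunking logic (the function name `sse_chunks` is mine; the chunk size, field names, and `[DONE]` sentinel follow `generate_streaming_chunks` from the deleted `server.py`):

```python
import json
import time

def sse_chunks(full_text: str, response_id: str, model: str, chunk_size: int = 10):
    """Re-chunk a complete reply into OpenAI-style chat.completion.chunk SSE events."""
    created = int(time.time())
    # Emit the reply in fixed-size content deltas
    for i in range(0, len(full_text), chunk_size):
        piece = full_text[i:i + chunk_size]
        event = {
            "id": response_id,
            "object": "chat.completion.chunk",
            "created": created,
            "model": model,
            "choices": [{"index": 0, "delta": {"content": piece}, "finish_reason": None}],
        }
        yield f"data: {json.dumps(event)}\n\n"
    # Terminal chunk with finish_reason, then the [DONE] sentinel
    final = {
        "id": response_id,
        "object": "chat.completion.chunk",
        "created": created,
        "model": model,
        "choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}],
    }
    yield f"data: {json.dumps(final)}\n\n"
    yield "data: [DONE]\n\n"

# Reassemble the text client-side, the way test_openai_client.py accumulates deltas
events = list(sse_chunks("Hello from the tea house!", "chatcmpl-test", "qwen-flash"))
text = "".join(
    json.loads(e[len("data: "):])["choices"][0]["delta"].get("content", "")
    for e in events
    if e.startswith("data: ") and "[DONE]" not in e
)
print(text)  # reassembles the original reply
```

Because every event is a complete `data: ...\n\n` frame, any OpenAI-compatible client that iterates the stream sees the same shape a real streaming backend would produce, which is why the streaming test in `test_openai_client.py` works against this server.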