ASR WebSocket服务实现：1.FunASR本地方案 2.豆包

1.已实现音频文件的处理，包括小文件直接转换以及大文件分割识别。 2.豆包接入流式识别，但封装仍需要修改 3.线程管理 4.websocket集中管理

ASR WebSocket服务实现：1.FunASR本地方案 2.豆包
1.已实现音频文件的处理，包括小文件直接转换以及大文件分割识别。 2.豆包接入流式识别，但封装仍需要修改 3.线程管理 4.websocket集中管理
冯杨
Commit f274f8c1bf0dfbc8491d74927f0f2bdec43d99b7 f274f8c1 1 parent e2fc4261
Showing 17 changed files with 4150 additions and 1 deletions
.gitignore
asr/doubao/README.md
asr/doubao/__init__.py
asr/doubao/__main__.py
asr/doubao/asr_client.py
asr/doubao/audio_utils.py
asr/doubao/config.json
asr/doubao/config_manager.py
asr/doubao/example.py
asr/doubao/example_optimized.py
asr/doubao/protocol.py
asr/doubao/result_processor.py
asr/doubao/service_factory.py
doc/process/update.log
funasr_asr_sync.py
scheduler/__init__.py
scheduler/thread_manager.py
--- a/.gitignore
View file @f274f8c
+++ b/.gitignore
View file @f274f8c
@@ -3,6 +3,8 @@ build/
 *.egg-info/
 *.so
 *.mp4
+*.mp3
+*.wav
 tmp*
 trial*/
@@ -12,7 +14,6 @@ data_utils/face_tracking/3DMM/*
 data_utils/face_parsing/79999_iter.pth
 pretrained
-*.mp4
 .DS_Store
 workspace/log_ngp.txt
 .idea
@@ -21,3 +22,4 @@ models/
 *.log
 dist
 .vscode/launch.json
+/speech.wav
--- a/asr/doubao/README.md 0 → 100644
View file @f274f8c
+++ b/asr/doubao/README.md 0 → 100644
View file @f274f8c
+# AIfeng/2025-07-11 13:36:00
+
+# 豆包ASR语音识别服务
+
+基于豆包（Doubao）语音识别API的通用ASR服务，支持流式和非流式语音识别，提供简洁易用的Python接口。
+
+## 🚀 特性
+
+- **多种识别模式**: 支持流式和非流式语音识别
+- **多格式支持**: 支持WAV、MP3、PCM等音频格式
+- **灵活配置**: 支持配置文件、环境变量、代码配置等多种方式
+- **异步支持**: 基于asyncio的异步API，支持高并发
+- **实时回调**: 流式识别支持实时结果回调
+- **错误处理**: 完善的错误处理和重试机制
+- **易于集成**: 简洁的API设计，易于集成到现有项目
+
+## 📦 安装
+
+### 依赖要求
+
+```bash
+pip install websockets aiofiles
+```
+
+### 项目结构
+
+```
+asr/doubao/
+├── __init__.py          # 模块初始化和公共API
+├── config.json          # 默认配置文件
+├── config_manager.py    # 配置管理器
+├── protocol.py          # 豆包协议处理
+├── audio_utils.py       # 音频处理工具
+├── asr_client.py        # ASR客户端核心
+├── service_factory.py   # 服务工厂和便捷接口
+├── example.py           # 使用示例
+└── README.md           # 项目文档
+```
+
+## 🔧 配置
+
+### 1. 获取API密钥
+
+访问[豆包开放平台](https://www.volcengine.com/docs/6561/1354869)获取：
+- `app_key`: 应用密钥
+- `access_key`: 访问密钥
+
+### 2. 配置方式
+
+#### 方式1: 环境变量（推荐）
+
+```bash
+export DOUBAO_APP_KEY="your_app_key"
+export DOUBAO_ACCESS_KEY="your_access_key"
+```
+
+#### 方式2: 配置文件
+
+创建 `config.json`：
+
+```json
+{
+  "auth_config": {
+    "app_key": "your_app_key",
+    "access_key": "your_access_key"
+  },
+  "asr_config": {
+    "streaming_mode": true,
+    "enable_punc": true,
+    "seg_duration": 200
+  }
+}
+```
+
+#### 方式3: 代码配置
+
+```python
+service = create_asr_service(
+    app_key="your_app_key",
+    access_key="your_access_key"
+)
+```
+
+## 🎯 快速开始
+
+### 1. 简单文件识别
+
+```python
+import asyncio
+from asr.doubao import recognize_file
+
+async def simple_recognition():
+    result = await recognize_file(
+        audio_path="path/to/your/audio.wav",
+        app_key="your_app_key",
+        access_key="your_access_key",
+        streaming=True
+    )
+    print(f"识别结果: {result}")
+
+# 运行
+asyncio.run(simple_recognition())
+```
+
+### 2. 流式识别with实时回调
+
+```python
+import asyncio
+from asr.doubao import create_asr_service
+
+async def streaming_recognition():
+    # 定义结果回调函数
+    def on_result(result):
+        if result.get('payload_msg'):
+            print(f"实时结果: {result['payload_msg']}")
+    
+    # 创建服务实例
+    service = create_asr_service(
+        app_key="your_app_key",
+        access_key="your_access_key",
+        streaming=True
+    )
+    
+    try:
+        result = await service.recognize_file(
+            "path/to/your/audio.wav",
+            result_callback=on_result
+        )
+        print(f"最终结果: {result}")
+    finally:
+        await service.close()
+
+# 运行
+asyncio.run(streaming_recognition())
+```
+
+### 2.1 优化的文本输出（推荐）
+
+针对豆包ASR输出完整报文但只需要文本的问题，提供了专门的结果处理器：
+
+```python
+import asyncio
+from asr.doubao import create_asr_service
+from asr.doubao.result_processor import create_text_only_callback
+
+async def optimized_streaming():
+    # 只处理文本内容的回调函数
+    def on_text(text: str):
+        print(f"识别文本: {text}")
+        # 流式数据特点：后一次覆盖前一次，最终结果会不停刷新
+    
+    # 创建优化的回调（自动提取text字段）
+    optimized_callback = create_text_only_callback(
+        user_callback=on_text,
+        enable_streaming_log=False  # 关闭中间结果日志
+    )
+    
+    service = create_asr_service(
+        app_key="your_app_key",
+        access_key="your_access_key",
+        streaming=True
+    )
+    
+    try:
+        await service.recognize_file(
+            "path/to/your/audio.wav",
+            result_callback=optimized_callback
+        )
+    finally:
+        await service.close()
+
+# 运行
+asyncio.run(optimized_streaming())
+```
+
+### 3. 音频数据识别
+
+```python
+import asyncio
+from asr.doubao import recognize_audio_data
+
+async def data_recognition():
+    # 读取音频数据
+    with open("path/to/your/audio.wav", "rb") as f:
+        audio_data = f.read()
+    
+    result = await recognize_audio_data(
+        audio_data=audio_data,
+        audio_format="wav",
+        app_key="your_app_key",
+        access_key="your_access_key"
+    )
+    print(f"识别结果: {result}")
+
+# 运行
+asyncio.run(data_recognition())
+```
+
+### 4. 同步方式（简单场景）
+
+```python
+from asr.doubao import run_recognition
+
+# 同步识别
+result = run_recognition(
+    audio_path="path/to/your/audio.wav",
+    app_key="your_app_key",
+    access_key="your_access_key"
+)
+print(f"识别结果: {result}")
+```
+
+## 📚 详细用法
+
+### 服务实例管理
+
+```python
+from asr.doubao import DoubaoASRService, create_asr_service
+
+# 创建服务实例
+service = create_asr_service(
+    app_key="your_app_key",
+    access_key="your_access_key",
+    streaming=True,
+    debug=True
+)
+
+# 执行多次识别（复用连接）
+result1 = await service.recognize_file("audio1.wav")
+result2 = await service.recognize_file("audio2.wav")
+
+# 关闭服务
+await service.close()
+```
+
+### 批量识别
+
+```python
+async def batch_recognition(audio_files):
+    service = create_asr_service(
+        app_key="your_app_key",
+        access_key="your_access_key"
+    )
+    
+    results = []
+    try:
+        for audio_file in audio_files:
+            result = await service.recognize_file(audio_file)
+            results.append({
+                'file': audio_file,
+                'result': result
+            })
+    finally:
+        await service.close()
+    
+    return results
+
+# 使用
+audio_files = ["audio1.wav", "audio2.wav", "audio3.wav"]
+results = await batch_recognition(audio_files)
+```
+
+### 自定义配置
+
+```python
+custom_config = {
+    'asr_config': {
+        'enable_punc': True,
+        'seg_duration': 300,  # 自定义分段时长
+        'streaming_mode': True
+    },
+    'connection_config': {
+        'timeout': 60,  # 自定义超时时间
+        'retry_times': 5
+    },
+    'logging_config': {
+        'enable_debug': True
+    }
+}
+
+service = create_asr_service(
+    app_key="your_app_key",
+    access_key="your_access_key",
+    custom_config=custom_config
+)
+```
+
+### 配置文件使用
+
+```python
+# 使用配置文件
+result = await recognize_file(
+    audio_path="audio.wav",
+    config_path="asr/doubao/config.json"
+)
+
+# 或者
+service = create_asr_service(config_path="asr/doubao/config.json")
+```
+
+## 🔧 配置参数
+
+### ASR配置 (asr_config)
+
+| 参数 | 类型 | 默认值 | 说明 |
+|------|------|--------|------|
+| `ws_url` | str | wss://openspeech.bytedance.com/api/v3/sauc/bigmodel | 流式识别WebSocket URL |
+| `ws_url_nostream` | str | wss://openspeech.bytedance.com/api/v3/sauc/bigmodel_nostream | 非流式识别WebSocket URL |
+| `resource_id` | str | volc.bigasr.sauc.duration | 资源ID |
+| `model_name` | str | bigmodel | 模型名称 |
+| `enable_punc` | bool | true | 是否启用标点符号 |
+| `streaming_mode` | bool | true | 是否启用流式模式 |
+| `seg_duration` | int | 200 | 音频分段时长(ms) |
+| `mp3_seg_size` | int | 1000 | MP3分段大小(bytes) |
+
+### 认证配置 (auth_config)
+
+| 参数 | 类型 | 说明 |
+|------|------|------|
+| `app_key` | str | 应用密钥 |
+| `access_key` | str | 访问密钥 |
+
+### 音频配置 (audio_config)
+
+| 参数 | 类型 | 默认值 | 说明 |
+|------|------|--------|------|
+| `default_format` | str | wav | 默认音频格式 |
+| `default_rate` | int | 16000 | 默认采样率 |
+| `default_bits` | int | 16 | 默认位深度 |
+| `default_channel` | int | 1 | 默认声道数 |
+| `supported_formats` | list | ["wav", "mp3", "pcm"] | 支持的音频格式 |
+
+### 连接配置 (connection_config)
+
+| 参数 | 类型 | 默认值 | 说明 |
+|------|------|--------|------|
+| `max_size` | int | 1000000000 | 最大消息大小 |
+| `timeout` | int | 30 | 连接超时时间(秒) |
+| `retry_times` | int | 3 | 重试次数 |
+| `retry_delay` | int | 1 | 重试延迟(秒) |
+
+## 🎵 支持的音频格式
+
+| 格式 | 扩展名 | 说明 |
+|------|--------|------|
+| WAV | .wav | 无损音频格式，推荐使用 |
+| MP3 | .mp3 | 压缩音频格式 |
+| PCM | .pcm | 原始音频数据 |
+
+### 音频要求
+
+- **采样率**: 推荐16kHz，支持8kHz、16kHz、24kHz、48kHz
+- **位深度**: 推荐16bit
+- **声道**: 推荐单声道(mono)
+- **编码**: PCM编码
+
+## 🔍 API参考
+
+### 便捷函数
+
+#### `recognize_file()`
+
+```python
+async def recognize_file(
+    audio_path: str,
+    app_key: str = None,
+    access_key: str = None,
+    config_path: str = None,
+    streaming: bool = True,
+    result_callback: callable = None,
+    **kwargs
+) -> dict:
+```
+
+识别音频文件。
+
+**参数:**
+- `audio_path`: 音频文件路径
+- `app_key`: 应用密钥
+- `access_key`: 访问密钥
+- `config_path`: 配置文件路径
+- `streaming`: 是否使用流式识别
+- `result_callback`: 结果回调函数
+- `**kwargs`: 其他配置参数
+
+**返回:** 识别结果字典
+
+#### `recognize_audio_data()`
+
+```python
+async def recognize_audio_data(
+    audio_data: bytes,
+    audio_format: str,
+    app_key: str = None,
+    access_key: str = None,
+    config_path: str = None,
+    streaming: bool = True,
+    result_callback: callable = None,
+    **kwargs
+) -> dict:
+```
+
+识别音频数据。
+
+**参数:**
+- `audio_data`: 音频数据(bytes)
+- `audio_format`: 音频格式("wav", "mp3", "pcm")
+- 其他参数同`recognize_file()`
+
+#### `run_recognition()`
+
+```python
+def run_recognition(
+    audio_path: str = None,
+    audio_data: bytes = None,
+    audio_format: str = None,
+    **kwargs
+) -> dict:
+```
+
+同步方式执行识别。
+
+#### `create_asr_service()`
+
+```python
+def create_asr_service(
+    app_key: str = None,
+    access_key: str = None,
+    config_path: str = None,
+    custom_config: dict = None,
+    **kwargs
+) -> DoubaoASRService:
+```
+
+创建ASR服务实例。
+
+### 核心类
+
+#### `DoubaoASRService`
+
+主要服务类，提供高级API。
+
+**方法:**
+- `recognize_file(audio_path, result_callback=None)`: 识别文件
+- `recognize_audio_data(audio_data, audio_format, result_callback=None)`: 识别音频数据
+- `get_status()`: 获取服务状态
+- `close()`: 关闭服务
+
+#### `DoubaoASRClient`
+
+底层客户端类，处理WebSocket通信。
+
+#### `ConfigManager`
+
+配置管理器，处理配置加载、验证、合并等。
+
+**方法:**
+- `load_config(config_path)`: 加载配置文件
+- `save_config(config, config_path)`: 保存配置文件
+- `validate_config(config)`: 验证配置
+- `merge_configs(base_config, override_config)`: 合并配置
+- `create_default_config()`: 创建默认配置
+
+#### `DoubaoResultProcessor`
+
+结果处理器，专门处理豆包ASR流式识别结果，解决输出完整报文但只需要文本的问题。
+
+**特性:**
+- 自动提取`payload_msg.result.text`字段
+- 处理流式数据覆盖更新特性
+- 可配置日志输出级别
+- 支持自定义文本回调函数
+
+**方法:**
+- `extract_text_from_result(result)`: 从完整结果中提取文本
+- `process_streaming_result(result)`: 处理流式结果
+- `create_optimized_callback(user_callback)`: 创建优化的回调函数
+- `get_current_result()`: 获取当前识别状态
+- `reset()`: 重置处理器状态
+
+**便捷函数:**
+- `create_text_only_callback(user_callback, enable_streaming_log)`: 创建只处理文本的回调
+- `extract_text_only(result)`: 快速提取文本内容
+
+**使用示例:**
+```python
+from asr.doubao.result_processor import DoubaoResultProcessor, create_text_only_callback
+
+# 方式1: 使用处理器类
+processor = DoubaoResultProcessor(text_only=True, enable_streaming_log=False)
+callback = processor.create_optimized_callback(lambda text: print(f"文本: {text}"))
+
+# 方式2: 使用便捷函数
+callback = create_text_only_callback(
+    user_callback=lambda text: print(f"文本: {text}"),
+    enable_streaming_log=False
+)
+
+# 在ASR服务中使用
+service = create_asr_service(...)
+await service.recognize_file("audio.wav", result_callback=callback)
+```
+
+## 🧪 测试
+
+运行测试套件：
+
+```bash
+python -m pytest test/test_doubao_asr.py -v
+```
+
+或者直接运行测试文件：
+
+```bash
+python test/test_doubao_asr.py
+```
+
+测试包括：
+- 单元测试
+- 集成测试
+- 性能测试
+- 错误处理测试
+
+## 🔧 故障排除
+
+### 常见问题
+
+#### 1. 认证失败
+
+```
+AuthenticationError: Invalid app_key or access_key
+```
+
+**解决方案:**
+- 检查app_key和access_key是否正确
+- 确认密钥是否已激活
+- 检查网络连接
+
+#### 2. 音频格式不支持
+
+```
+AudioFormatError: Unsupported audio format
+```
+
+**解决方案:**
+- 确认音频格式为WAV、MP3或PCM
+- 检查音频文件是否损坏
+- 转换音频格式到支持的格式
+
+#### 3. 连接超时
+
+```
+ConnectionTimeoutError: Connection timeout
+```
+
+**解决方案:**
+- 检查网络连接
+- 增加timeout配置
+- 检查防火墙设置
+
+#### 4. 音频文件过大
+
+```
+FileSizeError: Audio file too large
+```
+
+**解决方案:**
+- 分割音频文件
+- 压缩音频质量
+- 使用流式识别
+
+### 调试模式
+
+启用调试模式获取详细日志：
+
+```python
+service = create_asr_service(
+    app_key="your_app_key",
+    access_key="your_access_key",
+    debug=True
+)
+```
+
+或在配置文件中设置：
+
+```json
+{
+  "logging_config": {
+    "enable_debug": true,
+    "log_requests": true,
+    "log_responses": true
+  }
+}
+```
+
+## 📈 性能优化
+
+### 1. 连接复用
+
+对于批量识别，使用服务实例复用连接：
+
+```python
+service = create_asr_service(...)
+try:
+    for audio_file in audio_files:
+        result = await service.recognize_file(audio_file)
+finally:
+    await service.close()
+```
+
+### 2. 并发处理
+
+使用asyncio进行并发识别：
+
+```python
+import asyncio
+
+async def concurrent_recognition(audio_files):
+    tasks = []
+    for audio_file in audio_files:
+        task = recognize_file(audio_file, ...)
+        tasks.append(task)
+    
+    results = await asyncio.gather(*tasks)
+    return results
+```
+
+### 3. 音频预处理
+
+- 使用合适的音频格式和参数
+- 预先分割大文件
+- 去除静音段
+
+## 🤝 贡献
+
+欢迎提交Issue和Pull Request！
+
+### 开发环境设置
+
+1. 克隆项目
+2. 安装依赖：`pip install -r requirements.txt`
+3. 运行测试：`python -m pytest`
+4. 提交代码前请确保测试通过
+
+## 📄 许可证
+
+MIT License
+
+## 🔗 相关链接
+
+- [豆包语音识别API文档](https://www.volcengine.com/docs/6561/1354869)
+- [豆包开放平台](https://www.volcengine.com/)
+- [WebSocket协议](https://tools.ietf.org/html/rfc6455)
+
+## 📞 支持
+
+如有问题，请：
+
+1. 查看本文档的故障排除部分
+2. 搜索已有的Issue
+3. 创建新的Issue并提供详细信息
+4. 联系技术支持
+
+---
+
+**作者**: AIfeng  
+**版本**: 1.0.0  
+**更新时间**: 2025-07-11
--- a/asr/doubao/__init__.py 0 → 100644
View file @f274f8c
+++ b/asr/doubao/__init__.py 0 → 100644
View file @f274f8c
+# AIfeng/2025-07-11 13:36:00
+"""
+豆包ASR语音识别服务模块
+提供完整的语音识别功能，支持流式和非流式识别
+"""
+
+__version__ = "1.0.0"
+__author__ = "AIfeng"
+__description__ = "豆包ASR语音识别服务模块"
+
+# 导入核心类和函数
+from .asr_client import DoubaoASRClient
+from .config_manager import ConfigManager
+from .service_factory import (
+    DoubaoASRService,
+    create_asr_service,
+    recognize_file,
+    recognize_audio_data,
+    run_recognition
+)
+from .protocol import DoubaoProtocol, MessageType, MessageFlags, SerializationMethod, CompressionType
+from .audio_utils import AudioProcessor
+from .result_processor import (
+    DoubaoResultProcessor,
+    ASRResult,
+    create_text_only_callback,
+    extract_text_only
+)
+
+# 公共API
+__all__ = [
+    # 核心类
+    'DoubaoASRClient',
+    'DoubaoASRService',
+    'ConfigManager',
+    'DoubaoProtocol',
+    'AudioProcessor',
+    'DoubaoResultProcessor',
+    'ASRResult',
+    
+    # 便捷函数
+    'create_asr_service',
+    'recognize_file',
+    'recognize_audio_data',
+    'run_recognition',
+    'create_text_only_callback',
+    'extract_text_only',
+    
+    # 协议常量
+    'MessageType',
+    'MessageFlags',
+    'SerializationMethod',
+    'CompressionType',
+    
+    # 版本信息
+    '__version__',
+    '__author__',
+    '__description__'
+]
+
+
+# 快速开始示例
+def get_quick_start_example() -> str:
+    """
+    获取快速开始示例代码
+    
+    Returns:
+        str: 示例代码
+    """
+    return '''
+# 豆包ASR快速开始示例
+
+import asyncio
+from asr.doubao import recognize_file, create_asr_service
+
+# 方式1: 使用便捷函数（推荐用于简单场景）
+async def simple_recognition():
+    result = await recognize_file(
+        audio_path="path/to/your/audio.wav",
+        app_key="your_app_key",
+        access_key="your_access_key",
+        streaming=True
+    )
+    print(result)
+
+# 方式2: 使用服务实例（推荐用于复杂场景）
+async def advanced_recognition():
+    # 创建服务实例
+    service = create_asr_service(
+        app_key="your_app_key",
+        access_key="your_access_key",
+        streaming=True,
+        debug=True
+    )
+    
+    # 定义结果回调函数
+    def on_result(result):
+        if result.get('payload_msg'):
+            print(f"实时结果: {result['payload_msg']}")
+    
+    try:
+        # 执行识别
+        result = await service.recognize_file(
+            "path/to/your/audio.wav",
+            result_callback=on_result
+        )
+        print(f"最终结果: {result}")
+    finally:
+        await service.close()
+
+# 方式3: 使用配置文件
+async def config_based_recognition():
+    result = await recognize_file(
+        audio_path="path/to/your/audio.wav",
+        config_path="path/to/config.json"
+    )
+    print(result)
+
+# 同步方式（简单场景）
+def sync_recognition():
+    from asr.doubao import run_recognition
+    
+    result = run_recognition(
+        audio_path="path/to/your/audio.wav",
+        app_key="your_app_key",
+        access_key="your_access_key"
+    )
+    print(result)
+
+# 运行示例
+if __name__ == "__main__":
+    # 选择一种方式运行
+    asyncio.run(simple_recognition())
+    # asyncio.run(advanced_recognition())
+    # asyncio.run(config_based_recognition())
+    # sync_recognition()
+'''
+
+
+def get_config_template() -> str:
+    """
+    获取配置文件模板
+    
+    Returns:
+        str: 配置文件模板
+    """
+    return '''
+{
+  "asr_config": {
+    "ws_url": "wss://openspeech.bytedance.com/api/v3/sauc/bigmodel",
+    "ws_url_nostream": "wss://openspeech.bytedance.com/api/v3/sauc/bigmodel_nostream",
+    "resource_id": "volc.bigasr.sauc.duration",
+    "model_name": "bigmodel",
+    "enable_punc": true,
+    "streaming_mode": true,
+    "seg_duration": 200,
+    "mp3_seg_size": 1000
+  },
+  "auth_config": {
+    "app_key": "your_app_key_here",
+    "access_key": "your_access_key_here"
+  },
+  "audio_config": {
+    "default_format": "wav",
+    "default_rate": 16000,
+    "default_bits": 16,
+    "default_channel": 1,
+    "default_codec": "raw",
+    "supported_formats": ["wav", "mp3", "pcm"]
+  },
+  "connection_config": {
+    "max_size": 1000000000,
+    "timeout": 30,
+    "retry_times": 3,
+    "retry_delay": 1
+  },
+  "logging_config": {
+    "enable_debug": false,
+    "log_requests": true,
+    "log_responses": true
+  }
+}
+'''
+
+
+def print_info():
+    """
+    打印模块信息
+    """
+    print(f"豆包ASR语音识别服务模块 v{__version__}")
+    print(f"作者: {__author__}")
+    print(f"描述: {__description__}")
+    print("\n支持的功能:")
+    print("- 流式语音识别")
+    print("- 非流式语音识别")
+    print("- 多种音频格式支持 (WAV, MP3, PCM)")
+    print("- 灵活的配置管理")
+    print("- 异步和同步API")
+    print("- 实时结果回调")
+    print("\n快速开始:")
+    print("from asr.doubao import recognize_file")
+    print("result = await recognize_file('audio.wav', app_key='...', access_key='...')")
+
+
+if __name__ == "__main__":
+    print_info()
--- a/asr/doubao/__main__.py 0 → 100644
View file @f274f8c
+++ b/asr/doubao/__main__.py 0 → 100644
View file @f274f8c
+# AIfeng/2025-07-11 14:15:00
+"""
+豆包ASR模块主入口
+支持通过 python -m asr.doubao 运行示例
+"""
+
+if __name__ == '__main__':
+    from .example import run_all_examples
+    import asyncio
+    
+    print("=== 豆包ASR语音识别服务示例 ===")
+    print("正在运行所有示例...")
+    
+    try:
+        asyncio.run(run_all_examples())
+    except KeyboardInterrupt:
+        print("\n用户中断执行")
+    except Exception as e:
+        print(f"执行失败: {e}")
+        print("请确保已设置环境变量: DOUBAO_APP_KEY, DOUBAO_ACCESS_KEY")
+        print("并准备好测试音频文件")
--- a/asr/doubao/asr_client.py 0 → 100644
View file @f274f8c
+++ b/asr/doubao/asr_client.py 0 → 100644
View file @f274f8c
+# AIfeng/2025-07-11 13:36:00
+"""
+豆包ASR客户端核心模块
+提供完整的语音识别服务接口，支持流式和非流式识别
+"""
+
+import asyncio
+import json
+import logging
+import time
+import uuid
+from pathlib import Path
+from typing import Dict, Any, Optional, Callable, AsyncGenerator
+
+import aiofiles
+import websockets
+from websockets.exceptions import ConnectionClosedError, WebSocketException
+
+from .protocol import DoubaoProtocol, MessageType
+from .audio_utils import AudioProcessor
+
+
+class DoubaoASRClient:
+    """豆包ASR客户端"""
+    
+    def __init__(self, config: Dict[str, Any]):
+        """
+        初始化ASR客户端
+        
+        Args:
+            config: 配置字典
+        """
+        self.config = config
+        self.asr_config = config.get('asr_config', {})
+        self.auth_config = config.get('auth_config', {})
+        self.audio_config = config.get('audio_config', {})
+        self.connection_config = config.get('connection_config', {})
+        self.logging_config = config.get('logging_config', {})
+        
+        # 设置日志
+        self.logger = self._setup_logger()
+        
+        # 协议处理器
+        self.protocol = DoubaoProtocol()
+        
+        # 音频处理器
+        self.audio_processor = AudioProcessor()
+        
+        # 连接状态
+        self.is_connected = False
+        self.current_session_id = None
+    
+    def _setup_logger(self) -> logging.Logger:
+        """设置日志记录器"""
+        logger = logging.getLogger('doubao_asr')
+        if not logger.handlers:
+            handler = logging.StreamHandler()
+            formatter = logging.Formatter(
+                '%(asctime)s - %(name)s - %(levelname)s - %(message)s'
+            )
+            handler.setFormatter(formatter)
+            logger.addHandler(handler)
+        
+        if self.logging_config.get('enable_debug', False):
+            logger.setLevel(logging.DEBUG)
+        else:
+            logger.setLevel(logging.INFO)
+        
+        return logger
+    
+    def _get_ws_url(self, streaming: bool = True) -> str:
+        """获取WebSocket URL"""
+        if streaming:
+            return self.asr_config.get('ws_url', 'wss://openspeech.bytedance.com/api/v3/sauc/bigmodel')
+        else:
+            return self.asr_config.get('ws_url_nostream', 'wss://openspeech.bytedance.com/api/v3/sauc/bigmodel_nostream')
+    
+    def _build_auth_headers(self, request_id: str) -> Dict[str, str]:
+        """构建认证头部"""
+        headers = {
+            'X-Api-Resource-Id': self.asr_config.get('resource_id', 'volc.bigasr.sauc.duration'),
+            'X-Api-Access-Key': self.auth_config.get('access_key', ''),
+            'X-Api-App-Key': self.auth_config.get('app_key', ''),
+            'X-Api-Request-Id': request_id
+        }
+        return headers
+    
+    def _build_request_params(
+        self,
+        request_id: str,
+        audio_format: str = 'wav',
+        sample_rate: int = 16000,
+        bits: int = 16,
+        channels: int = 1,
+        uid: str = 'default_user'
+    ) -> Dict[str, Any]:
+        """构建请求参数"""
+        return {
+            'user': {
+                'uid': uid
+            },
+            'audio': {
+                'format': audio_format,
+                'sample_rate': sample_rate,
+                'bits': bits,
+                'channel': channels,
+                'codec': self.audio_config.get('default_codec', 'raw')
+            },
+            'request': {
+                'model_name': self.asr_config.get('model_name', 'bigmodel'),
+                'enable_punc': self.asr_config.get('enable_punc', True)
+            }
+        }
+    
+    async def recognize_file(
+        self,
+        audio_path: str,
+        streaming: bool = True,
+        result_callback: Optional[Callable[[Dict[str, Any]], None]] = None,
+        **kwargs
+    ) -> Dict[str, Any]:
+        """
+        识别音频文件
+        
+        Args:
+            audio_path: 音频文件路径
+            streaming: 是否使用流式识别
+            result_callback: 结果回调函数
+            **kwargs: 其他参数
+            
+        Returns:
+            Dict: 识别结果
+        """
+        try:
+            # 读取音频文件
+            async with aiofiles.open(audio_path, mode='rb') as f:
+                audio_data = await f.read()
+            
+            self.logger.info(f"开始识别音频文件: {audio_path}, 大小: {len(audio_data)} 字节")
+            
+            # 识别音频数据
+            return await self.recognize_audio_data(
+                audio_data,
+                streaming=streaming,
+                result_callback=result_callback,
+                **kwargs
+            )
+        
+        except Exception as e:
+            self.logger.error(f"识别音频文件失败: {e}")
+            return {
+                'success': False,
+                'error': str(e),
+                'audio_path': audio_path
+            }
+    
+    async def recognize_audio_data(
+        self,
+        audio_data: bytes,
+        streaming: bool = True,
+        result_callback: Optional[Callable[[Dict[str, Any]], None]] = None,
+        **kwargs
+    ) -> Dict[str, Any]:
+        """
+        识别音频数据
+        
+        Args:
+            audio_data: 音频数据
+            streaming: 是否使用流式识别
+            result_callback: 结果回调函数
+            **kwargs: 其他参数
+            
+        Returns:
+            Dict: 识别结果
+        """
+        request_id = str(uuid.uuid4())
+        self.current_session_id = request_id
+        
+        try:
+            # 准备音频数据
+            audio_format, segment_size, metadata = self.audio_processor.prepare_audio_for_recognition(
+                audio_data,
+                segment_duration_ms=self.asr_config.get('seg_duration', 200)
+            )
+            
+            self.logger.info(f"音频格式: {audio_format}, 分片大小: {segment_size}, 元数据: {metadata}")
+            
+            # 构建请求参数
+            request_params = self._build_request_params(
+                request_id,
+                audio_format=audio_format,
+                sample_rate=metadata.get('sample_rate', 16000),
+                bits=metadata.get('sample_width', 2) * 8,
+                channels=metadata.get('channels', 1),
+                uid=kwargs.get('uid', 'default_user')
+            )
+            
+            # 执行识别
+            if streaming:
+                return await self._streaming_recognize(
+                    audio_data,
+                    request_params,
+                    segment_size,
+                    request_id,
+                    result_callback
+                )
+            else:
+                return await self._non_streaming_recognize(
+                    audio_data,
+                    request_params,
+                    request_id
+                )
+        
+        except Exception as e:
+            self.logger.error(f"识别音频数据失败: {e}")
+            return {
+                'success': False,
+                'error': str(e),
+                'request_id': request_id
+            }
+    
+    async def _streaming_recognize(
+        self,
+        audio_data: bytes,
+        request_params: Dict[str, Any],
+        segment_size: int,
+        request_id: str,
+        result_callback: Optional[Callable[[Dict[str, Any]], None]] = None
+    ) -> Dict[str, Any]:
+        """流式识别处理"""
+        ws_url = self._get_ws_url(streaming=True)
+        headers = self._build_auth_headers(request_id)
+        
+        results = []
+        final_result = None
+        
+        try:
+            # 兼容不同版本的websockets库
+            connect_kwargs = {
+                'uri': ws_url,
+                'max_size': self.connection_config.get('max_size', 1000000000)
+            }
+            
+            # 尝试使用新版本的additional_headers参数
+            try:
+                async with websockets.connect(
+                    **connect_kwargs,
+                    additional_headers=headers
+                ) as ws:
+                    await self._handle_streaming_connection(ws, audio_data, request_params, segment_size, request_id, result_callback, results, final_result)
+            except TypeError:
+                # 回退到旧版本的extra_headers参数
+                async with websockets.connect(
+                    **connect_kwargs,
+                    extra_headers=headers
+                ) as ws:
+                    await self._handle_streaming_connection(ws, audio_data, request_params, segment_size, request_id, result_callback, results, final_result)
+            
+            return {
+                'success': True,
+                'request_id': request_id,
+                'results': results,
+                'final_result': final_result,
+                'total_results': len(results)
+            }
+        
+        except ConnectionClosedError as e:
+            self.logger.error(f"WebSocket连接关闭: {e.code} - {e.reason}")
+            return {
+                'success': False,
+                'error': f"连接关闭: {e.reason}",
+                'error_code': e.code,
+                'request_id': request_id
+            }
+        
+        except WebSocketException as e:
+            self.logger.error(f"WebSocket异常: {e}")
+            return {
+                'success': False,
+                'error': str(e),
+                'request_id': request_id
+            }
+        
+        except Exception as e:
+            self.logger.error(f"流式识别异常: {e}")
+            return {
+                'success': False,
+                'error': str(e),
+                'request_id': request_id
+            }
+        
+        finally:
+            self.is_connected = False
+    
+    async def _handle_streaming_connection(
+        self,
+        ws,
+        audio_data: bytes,
+        request_params: Dict[str, Any],
+        segment_size: int,
+        request_id: str,
+        result_callback: Optional[Callable[[Dict[str, Any]], None]],
+        results: list,
+        final_result: Any
+    ):
+        """处理流式连接的核心逻辑"""
+        self.is_connected = True
+        self.logger.info(f"WebSocket连接建立成功")
+        
+        # 发送初始请求
+        seq = 1
+        full_request = self.protocol.build_full_request(request_params, seq)
+        await ws.send(full_request)
+        
+        # 接收初始响应
+        response = await ws.recv()
+        result = self.protocol.parse_response(response)
+        
+        if self.logging_config.get('log_responses', True):
+            self.logger.debug(f"初始响应: {result}")
+        
+        # 分片发送音频数据
+        for chunk, is_last in self.audio_processor.slice_audio_data(audio_data, segment_size):
+            seq += 1
+            if is_last:
+                seq = -seq
+            
+            start_time = time.time()
+            
+            # 构建音频请求
+            audio_request = self.protocol.build_audio_request(
+                chunk, seq, is_last
+            )
+            
+            # 发送音频数据
+            await ws.send(audio_request)
+            
+            # 接收响应
+            response = await ws.recv()
+            result = self.protocol.parse_response(response)
+            
+            # 处理结果
+            if result.get('payload_msg'):
+                results.append(result)
+                
+                # 调用回调函数
+                if result_callback:
+                    try:
+                        result_callback(result)
+                    except Exception as e:
+                        self.logger.warning(f"回调函数执行失败: {e}")
+            
+            if result.get('is_last_package'):
+                final_result = result
+                break
+            
+            # 流式识别延时控制
+            if self.asr_config.get('streaming_mode', True):
+                elapsed = time.time() - start_time
+                sleep_time = max(0, (self.asr_config.get('seg_duration', 200) / 1000.0) - elapsed)
+                if sleep_time > 0:
+                    await asyncio.sleep(sleep_time)
+    
+    async def _non_streaming_recognize(
+        self,
+        audio_data: bytes,
+        request_params: Dict[str, Any],
+        request_id: str
+    ) -> Dict[str, Any]:
+        """非流式识别处理"""
+        ws_url = self._get_ws_url(streaming=False)
+        headers = self._build_auth_headers(request_id)
+        
+        try:
+            # 兼容不同版本的websockets库
+            connect_kwargs = {
+                'uri': ws_url,
+                'max_size': self.connection_config.get('max_size', 1000000000)
+            }
+            
+            # 尝试使用新版本的additional_headers参数
+            try:
+                async with websockets.connect(
+                    **connect_kwargs,
+                    additional_headers=headers
+                ) as ws:
+                    return await self._handle_non_streaming_connection(ws, audio_data, request_params, request_id)
+            except TypeError:
+                # 回退到旧版本的extra_headers参数
+                async with websockets.connect(
+                    **connect_kwargs,
+                    extra_headers=headers
+                ) as ws:
+                    return await self._handle_non_streaming_connection(ws, audio_data, request_params, request_id)
+        
+        except Exception as e:
+            self.logger.error(f"非流式识别异常: {e}")
+            return {
+                'success': False,
+                'error': str(e),
+                'request_id': request_id
+            }
+        
+        finally:
+            self.is_connected = False
+    
+    async def _handle_non_streaming_connection(
+        self,
+        ws,
+        audio_data: bytes,
+        request_params: Dict[str, Any],
+        request_id: str
+    ) -> Dict[str, Any]:
+        """处理非流式连接的核心逻辑"""
+        self.is_connected = True
+        self.logger.info(f"WebSocket连接建立成功")
+        
+        # 发送完整请求（包含音频数据）
+        full_request = self.protocol.build_full_request(request_params, 1)
+        await ws.send(full_request)
+        
+        # 发送音频数据
+        audio_request = self.protocol.build_audio_request(
+            audio_data, -1, is_last=True
+        )
+        await ws.send(audio_request)
+        
+        # 接收最终结果
+        response = await ws.recv()
+        result = self.protocol.parse_response(response)
+        
+        self.is_connected = False
+        
+        return {
+            'success': True,
+            'request_id': request_id,
+            'result': result
+        }
+    
+    async def close(self):
+        """关闭客户端"""
+        self.is_connected = False
+        self.current_session_id = None
+        self.logger.info("ASR客户端已关闭")
+    
+    def get_status(self) -> Dict[str, Any]:
+        """获取客户端状态"""
+        return {
+            'is_connected': self.is_connected,
+            'current_session_id': self.current_session_id,
+            'config': {
+                'ws_url': self._get_ws_url(),
+                'model_name': self.asr_config.get('model_name'),
+                'streaming_mode': self.asr_config.get('streaming_mode')
+            }
+        }
--- a/asr/doubao/audio_utils.py 0 → 100644
View file @f274f8c
+++ b/asr/doubao/audio_utils.py 0 → 100644
View file @f274f8c
+# AIfeng/2025-07-11 13:36:00
+"""
+豆包ASR音频处理工具模块
+提供音频格式检测、分片处理、元数据提取等功能
+"""
+
+import wave
+from io import BytesIO
+from typing import Tuple, Generator, Dict, Any
+
+
+class AudioProcessor:
+    """音频处理器"""
+    
+    @staticmethod
+    def read_wav_info(audio_data: bytes) -> Tuple[int, int, int, int, bytes]:
+        """
+        读取WAV文件信息
+        
+        Args:
+            audio_data: WAV音频数据
+            
+        Returns:
+            Tuple: (声道数, 采样宽度, 采样率, 帧数, 音频字节数据)
+        """
+        try:
+            with BytesIO(audio_data) as audio_io:
+                with wave.open(audio_io, 'rb') as wave_fp:
+                    nchannels, sampwidth, framerate, nframes = wave_fp.getparams()[:4]
+                    wave_bytes = wave_fp.readframes(nframes)
+            return nchannels, sampwidth, framerate, nframes, wave_bytes
+        except Exception as e:
+            raise ValueError(f"读取WAV文件失败: {e}")
+    
+    @staticmethod
+    def is_wav_format(audio_data: bytes) -> bool:
+        """
+        检查是否为WAV格式
+        
+        Args:
+            audio_data: 音频数据
+            
+        Returns:
+            bool: 是否为WAV格式
+        """
+        if len(audio_data) < 44:
+            return False
+        return audio_data[0:4] == b"RIFF" and audio_data[8:12] == b"WAVE"
+    
+    @staticmethod
+    def detect_audio_format(audio_data: bytes) -> str:
+        """
+        检测音频格式
+        
+        Args:
+            audio_data: 音频数据
+            
+        Returns:
+            str: 音频格式 ('wav', 'mp3', 'pcm', 'unknown')
+        """
+        if len(audio_data) < 4:
+            return 'unknown'
+        
+        # 检查WAV格式
+        if AudioProcessor.is_wav_format(audio_data):
+            return 'wav'
+        
+        # 检查MP3格式
+        if audio_data[0:3] == b"ID3" or audio_data[0:2] == b"\xff\xfb":
+            return 'mp3'
+        
+        # 默认为PCM
+        return 'pcm'
+    
+    @staticmethod
+    def slice_audio_data(
+        audio_data: bytes, 
+        chunk_size: int
+    ) -> Generator[Tuple[bytes, bool], None, None]:
+        """
+        将音频数据分片
+        
+        Args:
+            audio_data: 音频数据
+            chunk_size: 分片大小
+            
+        Yields:
+            Tuple[bytes, bool]: (音频片段, 是否为最后一片)
+        """
+        data_len = len(audio_data)
+        offset = 0
+        
+        while offset + chunk_size < data_len:
+            yield audio_data[offset:offset + chunk_size], False
+            offset += chunk_size
+        
+        # 最后一片
+        if offset < data_len:
+            yield audio_data[offset:data_len], True
+    
+    @staticmethod
+    def calculate_segment_size(
+        audio_format: str,
+        sample_rate: int = 16000,
+        channels: int = 1,
+        bits: int = 16,
+        segment_duration_ms: int = 200,
+        mp3_seg_size: int = 1000
+    ) -> int:
+        """
+        计算音频分片大小
+        
+        Args:
+            audio_format: 音频格式
+            sample_rate: 采样率
+            channels: 声道数
+            bits: 位深度
+            segment_duration_ms: 分片时长(毫秒)
+            mp3_seg_size: MP3分片大小
+            
+        Returns:
+            int: 分片大小(字节)
+        """
+        if audio_format == 'mp3':
+            return mp3_seg_size
+        elif audio_format == 'wav':
+            # 计算每秒字节数
+            bytes_per_second = channels * (bits // 8) * sample_rate
+            return int(bytes_per_second * segment_duration_ms / 1000)
+        elif audio_format == 'pcm':
+            # PCM格式计算
+            return int(sample_rate * (bits // 8) * channels * segment_duration_ms / 1000)
+        else:
+            raise ValueError(f"不支持的音频格式: {audio_format}")
+    
+    @staticmethod
+    def extract_wav_metadata(audio_data: bytes) -> Dict[str, Any]:
+        """
+        提取WAV文件元数据
+        
+        Args:
+            audio_data: WAV音频数据
+            
+        Returns:
+            Dict: 音频元数据
+        """
+        try:
+            nchannels, sampwidth, framerate, nframes, _ = AudioProcessor.read_wav_info(audio_data)
+            duration = nframes / framerate
+            
+            return {
+                'format': 'wav',
+                'channels': nchannels,
+                'sample_width': sampwidth,
+                'sample_rate': framerate,
+                'frames': nframes,
+                'duration': duration,
+                'size': len(audio_data)
+            }
+        except Exception as e:
+            return {
+                'format': 'wav',
+                'error': str(e),
+                'size': len(audio_data)
+            }
+    
+    @staticmethod
+    def validate_audio_params(
+        audio_format: str,
+        sample_rate: int,
+        channels: int,
+        bits: int
+    ) -> bool:
+        """
+        验证音频参数
+        
+        Args:
+            audio_format: 音频格式
+            sample_rate: 采样率
+            channels: 声道数
+            bits: 位深度
+            
+        Returns:
+            bool: 参数是否有效
+        """
+        # 支持的格式
+        supported_formats = ['wav', 'mp3', 'pcm']
+        if audio_format not in supported_formats:
+            return False
+        
+        # 采样率范围
+        if sample_rate < 8000 or sample_rate > 48000:
+            return False
+        
+        # 声道数
+        if channels < 1 or channels > 2:
+            return False
+        
+        # 位深度
+        if bits not in [8, 16, 24, 32]:
+            return False
+        
+        return True
+    
+    @staticmethod
+    def prepare_audio_for_recognition(
+        audio_data: bytes,
+        target_format: str = 'wav',
+        segment_duration_ms: int = 200
+    ) -> Tuple[str, int, Dict[str, Any]]:
+        """
+        为识别准备音频数据
+        
+        Args:
+            audio_data: 原始音频数据
+            target_format: 目标格式
+            segment_duration_ms: 分片时长
+            
+        Returns:
+            Tuple: (检测到的格式, 分片大小, 音频元数据)
+        """
+        # 检测音频格式
+        detected_format = AudioProcessor.detect_audio_format(audio_data)
+        
+        # 提取元数据
+        if detected_format == 'wav':
+            metadata = AudioProcessor.extract_wav_metadata(audio_data)
+            segment_size = AudioProcessor.calculate_segment_size(
+                detected_format,
+                metadata.get('sample_rate', 16000),
+                metadata.get('channels', 1),
+                metadata.get('sample_width', 2) * 8,
+                segment_duration_ms
+            )
+        else:
+            # 对于非WAV格式，使用默认参数
+            metadata = {
+                'format': detected_format,
+                'size': len(audio_data)
+            }
+            segment_size = AudioProcessor.calculate_segment_size(
+                detected_format,
+                segment_duration_ms=segment_duration_ms
+            )
+        
+        return detected_format, segment_size, metadata
--- a/asr/doubao/config.json 0 → 100644
View file @f274f8c
+++ b/asr/doubao/config.json 0 → 100644
View file @f274f8c
+{
+  "asr_config": {
+    "ws_url": "wss://openspeech.bytedance.com/api/v3/sauc/bigmodel",
+    "ws_url_nostream": "wss://openspeech.bytedance.com/api/v3/sauc/bigmodel_nostream",
+    "resource_id": "volc.bigasr.sauc.duration",
+    "resource_id_concurrent": "volc.bigasr.sauc.concurrent",
+    "model_name": "bigmodel",
+    "enable_punc": true,
+    "streaming_mode": true,
+    "seg_duration": 200,
+    "mp3_seg_size": 1000
+  },
+  "auth_config": {
+    "app_key": "1549099156",
+    "access_key": "0GcKVco6j09bThrIgQWTWa3g1nA91_9C"
+  },
+  "audio_config": {
+    "default_format": "wav",
+    "default_rate": 16000,
+    "default_bits": 16,
+    "default_channel": 1,
+    "default_codec": "raw",
+    "supported_formats": ["wav", "mp3", "pcm"]
+  },
+  "connection_config": {
+    "max_size": 1000000000,
+    "timeout": 30,
+    "retry_times": 3,
+    "retry_delay": 1
+  },
+  "logging_config": {
+    "enable_debug": false,
+    "log_requests": true,
+    "log_responses": true
+  }
+}
--- a/asr/doubao/config_manager.py 0 → 100644
View file @f274f8c
+++ b/asr/doubao/config_manager.py 0 → 100644
View file @f274f8c
+# AIfeng/2025-07-11 13:36:00
+"""
+豆包ASR配置管理模块
+提供配置文件加载、验证、合并和环境变量支持
+"""
+
+import json
+import os
+from pathlib import Path
+from typing import Dict, Any, Optional
+
+
+class ConfigManager:
+    """配置管理器"""
+    
+    DEFAULT_CONFIG = {
+        "asr_config": {
+            "ws_url": "wss://openspeech.bytedance.com/api/v3/sauc/bigmodel",
+            "ws_url_nostream": "wss://openspeech.bytedance.com/api/v3/sauc/bigmodel_nostream",
+            "resource_id": "volc.bigasr.sauc.duration",
+            "resource_id_concurrent": "volc.bigasr.sauc.concurrent",
+            "model_name": "bigmodel",
+            "enable_punc": True,
+            "streaming_mode": True,
+            "seg_duration": 200,
+            "mp3_seg_size": 1000
+        },
+        "auth_config": {
+            "app_key": "",
+            "access_key": ""
+        },
+        "audio_config": {
+            "default_format": "wav",
+            "default_rate": 16000,
+            "default_bits": 16,
+            "default_channel": 1,
+            "default_codec": "raw",
+            "supported_formats": ["wav", "mp3", "pcm"]
+        },
+        "connection_config": {
+            "max_size": 1000000000,
+            "timeout": 30,
+            "retry_times": 3,
+            "retry_delay": 1
+        },
+        "logging_config": {
+            "enable_debug": False,
+            "log_requests": True,
+            "log_responses": True
+        }
+    }
+    
+    def __init__(self, config_path: Optional[str] = None):
+        """
+        初始化配置管理器
+        
+        Args:
+            config_path: 配置文件路径
+        """
+        self.config_path = config_path
+        self.config = self.DEFAULT_CONFIG.copy()
+        
+        if config_path:
+            self.load_config(config_path)
+        
+        # 从环境变量加载配置
+        self._load_from_env()
+    
+    def load_config(self, config_path: str) -> Dict[str, Any]:
+        """
+        加载配置文件
+        
+        Args:
+            config_path: 配置文件路径
+            
+        Returns:
+            Dict: 配置字典
+        """
+        try:
+            config_file = Path(config_path)
+            if not config_file.exists():
+                raise FileNotFoundError(f"配置文件不存在: {config_path}")
+            
+            with open(config_file, 'r', encoding='utf-8') as f:
+                file_config = json.load(f)
+            
+            # 合并配置
+            self.config = self._merge_config(self.config, file_config)
+            
+            # 验证配置
+            self._validate_config()
+            
+            return self.config
+        
+        except Exception as e:
+            raise ValueError(f"加载配置文件失败: {e}")
+    
+    def _merge_config(self, base_config: Dict[str, Any], new_config: Dict[str, Any]) -> Dict[str, Any]:
+        """
+        合并配置字典
+        
+        Args:
+            base_config: 基础配置
+            new_config: 新配置
+            
+        Returns:
+            Dict: 合并后的配置
+        """
+        merged = base_config.copy()
+        
+        for key, value in new_config.items():
+            if key in merged and isinstance(merged[key], dict) and isinstance(value, dict):
+                merged[key] = self._merge_config(merged[key], value)
+            else:
+                merged[key] = value
+        
+        return merged
+    
+    def _load_from_env(self):
+        """从环境变量加载配置"""
+        # ASR配置
+        if os.getenv('DOUBAO_WS_URL'):
+            self.config['asr_config']['ws_url'] = os.getenv('DOUBAO_WS_URL')
+        
+        if os.getenv('DOUBAO_MODEL_NAME'):
+            self.config['asr_config']['model_name'] = os.getenv('DOUBAO_MODEL_NAME')
+        
+        if os.getenv('DOUBAO_SEG_DURATION'):
+            try:
+                self.config['asr_config']['seg_duration'] = int(os.getenv('DOUBAO_SEG_DURATION'))
+            except ValueError:
+                pass
+        
+        # 认证配置
+        if os.getenv('DOUBAO_APP_KEY'):
+            self.config['auth_config']['app_key'] = os.getenv('DOUBAO_APP_KEY')
+        
+        if os.getenv('DOUBAO_ACCESS_KEY'):
+            self.config['auth_config']['access_key'] = os.getenv('DOUBAO_ACCESS_KEY')
+        
+        # 日志配置
+        if os.getenv('DOUBAO_DEBUG'):
+            self.config['logging_config']['enable_debug'] = os.getenv('DOUBAO_DEBUG').lower() == 'true'
+    
+    def _validate_config(self):
+        """验证配置"""
+        # 验证必需的认证信息
+        auth_config = self.config.get('auth_config', {})
+        if not auth_config.get('app_key'):
+            raise ValueError("缺少必需的配置: auth_config.app_key")
+        
+        if not auth_config.get('access_key'):
+            raise ValueError("缺少必需的配置: auth_config.access_key")
+        
+        # 验证ASR配置
+        asr_config = self.config.get('asr_config', {})
+        if not asr_config.get('ws_url'):
+            raise ValueError("缺少必需的配置: asr_config.ws_url")
+        
+        # 验证音频配置
+        audio_config = self.config.get('audio_config', {})
+        supported_formats = audio_config.get('supported_formats', [])
+        default_format = audio_config.get('default_format')
+        
+        if default_format and default_format not in supported_formats:
+            raise ValueError(f"默认音频格式 {default_format} 不在支持的格式列表中: {supported_formats}")
+        
+        # 验证数值范围
+        seg_duration = asr_config.get('seg_duration', 200)
+        if not (50 <= seg_duration <= 1000):
+            raise ValueError(f"分片时长必须在50-1000ms之间，当前值: {seg_duration}")
+        
+        sample_rate = audio_config.get('default_rate', 16000)
+        if sample_rate not in [8000, 16000, 22050, 44100, 48000]:
+            raise ValueError(f"不支持的采样率: {sample_rate}")
+    
+    def get_config(self) -> Dict[str, Any]:
+        """
+        获取完整配置
+        
+        Returns:
+            Dict: 配置字典
+        """
+        return self.config.copy()
+    
+    def get_asr_config(self) -> Dict[str, Any]:
+        """
+        获取ASR配置
+        
+        Returns:
+            Dict: ASR配置
+        """
+        return self.config.get('asr_config', {}).copy()
+    
+    def get_auth_config(self) -> Dict[str, Any]:
+        """
+        获取认证配置
+        
+        Returns:
+            Dict: 认证配置
+        """
+        return self.config.get('auth_config', {}).copy()
+    
+    def get_audio_config(self) -> Dict[str, Any]:
+        """
+        获取音频配置
+        
+        Returns:
+            Dict: 音频配置
+        """
+        return self.config.get('audio_config', {}).copy()
+    
+    def update_config(self, new_config: Dict[str, Any]):
+        """
+        更新配置
+        
+        Args:
+            new_config: 新配置
+        """
+        self.config = self._merge_config(self.config, new_config)
+        self._validate_config()
+    
+    def save_config(self, output_path: Optional[str] = None):
+        """
+        保存配置到文件
+        
+        Args:
+            output_path: 输出文件路径，默认使用原配置文件路径
+        """
+        save_path = output_path or self.config_path
+        if not save_path:
+            raise ValueError("未指定保存路径")
+        
+        try:
+            with open(save_path, 'w', encoding='utf-8') as f:
+                json.dump(self.config, f, indent=2, ensure_ascii=False)
+        except Exception as e:
+            raise ValueError(f"保存配置文件失败: {e}")
+    
+    def create_default_config(self, output_path: str) -> Dict[str, Any]:
+        """
+        创建默认配置文件
+        
+        Args:
+            output_path: 输出文件路径
+            
+        Returns:
+            Dict[str, Any]: 默认配置字典
+        """
+        try:
+            with open(output_path, 'w', encoding='utf-8') as f:
+                json.dump(self.DEFAULT_CONFIG, f, indent=2, ensure_ascii=False)
+            return self.DEFAULT_CONFIG.copy()
+        except Exception as e:
+            raise ValueError(f"创建默认配置文件失败: {e}")
+    
+    def get_env_template(self) -> str:
+        """
+        获取环境变量模板
+        
+        Returns:
+            str: 环境变量模板
+        """
+        template = """
+# 豆包ASR环境变量配置模板
+
+# ASR服务配置
+DOUBAO_WS_URL=wss://openspeech.bytedance.com/api/v3/sauc/bigmodel
+DOUBAO_MODEL_NAME=bigmodel
+DOUBAO_SEG_DURATION=200
+
+# 认证配置（必需）
+DOUBAO_APP_KEY=your_app_key_here
+DOUBAO_ACCESS_KEY=your_access_key_here
+
+# 调试配置
+DOUBAO_DEBUG=false
+"""
+        return template.strip()
+    
+    @classmethod
+    def from_dict(cls, config_dict: Dict[str, Any]) -> 'ConfigManager':
+        """
+        从字典创建配置管理器
+        
+        Args:
+            config_dict: 配置字典
+            
+        Returns:
+            ConfigManager: 配置管理器实例
+        """
+        manager = cls()
+        manager.config = manager._merge_config(manager.DEFAULT_CONFIG, config_dict)
+        manager._validate_config()
+        return manager
+    
+    @classmethod
+    def from_env(cls) -> 'ConfigManager':
+        """
+        仅从环境变量创建配置管理器
+        
+        Returns:
+            ConfigManager: 配置管理器实例
+        """
+        manager = cls()
+        return manager
--- a/asr/doubao/example.py 0 → 100644
View file @f274f8c
+++ b/asr/doubao/example.py 0 → 100644
View file @f274f8c
+# AIfeng/2025-07-11 13:36:00
+"""
+豆包ASR语音识别服务使用示例
+演示各种使用场景和最佳实践
+"""
+
+import asyncio
+import os
+import logging
+from pathlib import Path
+from typing import Dict, Any, Optional
+
+# 导入ASR服务 - 支持相对导入和绝对导入
+try:
+    # 尝试相对导入（作为包运行时）
+    from . import (
+        recognize_file,
+        recognize_audio_data,
+        create_asr_service,
+        run_recognition,
+        ConfigManager
+    )
+except ImportError:
+    # 回退到绝对导入（独立运行时）
+    try:
+        from asr_client import (
+            recognize_file,
+            recognize_audio_data,
+            create_asr_service,
+            run_recognition
+        )
+        from config_manager import ConfigManager
+    except ImportError:
+        # 最后尝试直接导入
+        import sys
+        from pathlib import Path
+        
+        # 添加当前目录到路径
+        current_dir = Path(__file__).parent
+        sys.path.insert(0, str(current_dir))
+        
+        from asr_client import (
+            recognize_file,
+            recognize_audio_data,
+            create_asr_service,
+            run_recognition
+        )
+        from config_manager import ConfigManager
+
+
+# 配置日志
+logging.basicConfig(
+    level=logging.INFO,
+    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
+)
+logger = logging.getLogger(__name__)
+
+
+class ASRExamples:
+    """
+    ASR使用示例集合
+    """
+    
+    def __init__(self, app_key: str, access_key: str):
+        self.app_key = app_key
+        self.access_key = access_key
+    
+    """
+        示例1: 简单文件识别
+    """
+    async def example_1_simple_file_recognition(self, audio_path: str):
+        
+        logger.info("=== 示例1: 简单文件识别 ===")
+        
+        try:
+            result = await recognize_file(
+                audio_path=audio_path,
+                app_key=self.app_key,
+                access_key=self.access_key,
+                streaming=True
+            )
+            
+            logger.info(f"识别结果: {result}")
+            return result
+            
+        except Exception as e:
+            logger.error(f"识别失败: {e}")
+            return None
+    
+    """
+    示例2: 流式识别with实时回调 - 简化版流式输出演示
+    """
+    async def example_2_streaming_with_callback(self, audio_path: str):
+       
+        logger.info("=== 示例2: 流式识别with实时回调 - 简化版流式输出演示 ===")
+        
+        # 流式输出状态
+        self.current_text = ""
+        self.result_count = 0
+        
+        def clear_line():
+            """清除当前行"""
+            print('\r' + ' ' * 100 + '\r', end='', flush=True)
+        
+        def print_streaming_result(text: str, is_final: bool = False):
+            """打印流式结果"""
+            clear_line()
+            status = "[最终]" if is_final else "[流式]"
+            timestamp = f"[{self.result_count:02d}]"
+            print(f"{timestamp}{status} {text}", end='', flush=True)
+            if is_final:
+                print()  # 最终结果换行
+        
+        # 定义结果回调函数 - 演示实时文本更新
+        def on_result(result: Dict[str, Any]):
+            self.result_count += 1
+            
+            if result.get('payload_msg'):
+                payload = result['payload_msg']
+                
+                # 检查是否有识别结果
+                if 'result' in payload and 'text' in payload['result']:
+                    new_text = payload['result']['text']
+                    
+                    # 显示累积文本更新
+                    print_streaming_result(new_text, False)
+                    self.current_text = new_text
+            
+            # 检查是否为最终结果
+            if result.get('is_last_package', False):
+                print_streaming_result(self.current_text, True)
+                logger.info(f"识别完成，共收到{self.result_count}次流式结果")
+        
+        print("\n观察流式文本累积更新效果:")
+        print()
+        
+        # 创建服务实例
+        service = create_asr_service(
+            app_key=self.app_key,
+            access_key=self.access_key,
+            streaming=True,
+            debug=False  # 关闭调试日志以便观察输出
+        )
+        
+        try:
+            result = await service.recognize_file(
+                audio_path,
+                result_callback=on_result
+            )
+            
+            print()
+            logger.info(f"=== 识别结果摘要 ===")
+            logger.info(f"最终文本: {self.current_text}")
+            logger.info(f"流式更新次数: {self.result_count}")
+            
+            return result
+            
+        except Exception as e:
+            logger.error(f"识别失败: {e}")
+            return None
+        finally:
+            await service.close()
+    
+    
+    """
+    示例3: 非流式识别
+    """
+    async def example_3_non_streaming_recognition(self, audio_path: str):
+
+        logger.info("=== 示例3: 非流式识别 ===")
+        
+        try:
+            result = await recognize_file(
+                audio_path=audio_path,
+                app_key=self.app_key,
+                access_key=self.access_key,
+                streaming=False  # 非流式
+            )
+            
+            logger.info(f"识别结果: {result}")
+            return result
+            
+        except Exception as e:
+            logger.error(f"识别失败: {e}")
+            return None
+    
+    """
+        示例4: 音频数据识别
+    """
+    async def example_4_audio_data_recognition(self, audio_data: bytes, audio_format: str = "wav"):
+
+        logger.info("=== 示例4: 音频数据识别 ===")
+        
+        try:
+            result = await recognize_audio_data(
+                audio_data=audio_data,
+                audio_format=audio_format,
+                app_key=self.app_key,
+                access_key=self.access_key,
+                streaming=True
+            )
+            
+            logger.info(f"识别结果: {result}")
+            return result
+            
+        except Exception as e:
+            logger.error(f"识别失败: {e}")
+            return None
+    
+    """
+        示例5: 基于配置文件的识别
+    """
+    async def example_5_config_based_recognition(self, audio_path: str, config_path: str):
+        """
+        示例5: 基于配置文件的识别
+        """
+        logger.info("=== 示例5: 基于配置文件的识别 ===")
+        
+        try:
+            result = await recognize_file(
+                audio_path=audio_path,
+                config_path=config_path
+            )
+            
+            logger.info(f"识别结果: {result}")
+            return result
+            
+        except Exception as e:
+            logger.error(f"识别失败: {e}")
+            return None
+    
+    """
+    示例6: 批量识别
+    """
+    async def example_6_batch_recognition(self, audio_files: list):
+        
+        logger.info("=== 示例6: 批量识别 ===")
+        
+        results = []
+        
+        # 创建服务实例（复用连接）
+        service = create_asr_service(
+            app_key=self.app_key,
+            access_key=self.access_key,
+            streaming=True
+        )
+        
+        try:
+            for i, audio_file in enumerate(audio_files):
+                logger.info(f"处理文件 {i+1}/{len(audio_files)}: {audio_file}")
+                
+                try:
+                    result = await service.recognize_file(audio_file)
+                    results.append({
+                        'file': audio_file,
+                        'result': result,
+                        'status': 'success'
+                    })
+                    
+                except Exception as e:
+                    logger.error(f"文件 {audio_file} 识别失败: {e}")
+                    results.append({
+                        'file': audio_file,
+                        'result': None,
+                        'status': 'failed',
+                        'error': str(e)
+                    })
+            
+            logger.info(f"批量识别完成，成功: {sum(1 for r in results if r['status'] == 'success')}/{len(results)}")
+            return results
+            
+        finally:
+            await service.close()
+    
+    """
+        示例7: 同步识别（简单场景）
+    """
+    def example_7_sync_recognition(self, audio_path: str):
+       
+        logger.info("=== 示例7: 同步识别 ===")
+        
+        try:
+            result = run_recognition(
+                audio_path=audio_path,
+                app_key=self.app_key,
+                access_key=self.access_key,
+                streaming=True
+            )
+            
+            logger.info(f"识别结果: {result}")
+            return result
+            
+        except Exception as e:
+            logger.error(f"识别失败: {e}")
+            return None
+    
+    """
+        示例8: 自定义配置识别
+    """
+    async def example_8_custom_config_recognition(self, audio_path: str):
+        
+        logger.info("=== 示例8: 自定义配置识别 ===")
+        
+        # 自定义配置
+        custom_config = {
+            'asr_config': {
+                'enable_punc': True,
+                'seg_duration': 300,  # 自定义分段时长
+                'streaming_mode': True
+            },
+            'audio_config': {
+                'default_rate': 16000,
+                'default_bits': 16,
+                'default_channel': 1
+            },
+            'connection_config': {
+                'timeout': 60,  # 自定义超时时间
+                'retry_times': 5
+            },
+            'logging_config': {
+                'enable_debug': True
+            }
+        }
+        
+        service = create_asr_service(
+            app_key=self.app_key,
+            access_key=self.access_key,
+            custom_config=custom_config
+        )
+        
+        try:
+            result = await service.recognize_file(audio_path)
+            logger.info(f"识别结果: {result}")
+            return result
+            
+        except Exception as e:
+            logger.error(f"识别失败: {e}")
+            return None
+        finally:
+            await service.close()
+
+
+def create_sample_config(config_path: str, app_key: str, access_key: str):
+    """
+    创建示例配置文件
+    """
+    config_manager = ConfigManager()
+    
+    # 创建配置
+    config = config_manager.create_default_config(config_path)
+    config['auth_config']['app_key'] = app_key
+    config['auth_config']['access_key'] = access_key
+    config['logging_config']['enable_debug'] = True
+    
+    # 更新配置管理器的配置并保存
+    config_manager.update_config(config)
+    config_manager.save_config(config_path)
+    logger.info(f"示例配置文件已创建: {config_path}")
+
+
+async def run_all_examples():
+    """
+    运行所有示例
+    """
+    # 从环境变量获取密钥
+    app_key = os.getenv('DOUBAO_APP_KEY', '1549099156')
+    access_key = os.getenv('DOUBAO_ACCESS_KEY', '0GcKVco6j09bThrIgQWTWa3g1nA91_9C')
+    
+    if app_key == 'your_app_key_here' or access_key == 'your_access_key_here':
+        logger.warning("请设置环境变量 DOUBAO_APP_KEY 和 DOUBAO_ACCESS_KEY")
+        logger.info("或者直接修改代码中的密钥")
+        return
+    
+    # 示例音频文件路径（请替换为实际路径）
+    audio_path = "E:\\fengyang\\eman_one\\speech.wav"
+    
+    if not Path(audio_path).exists():
+        logger.warning(f"音频文件不存在: {audio_path}")
+        logger.info("请替换为实际的音频文件路径")
+        return
+    
+    # 创建示例实例
+    examples = ASRExamples(app_key, access_key)
+    
+    # 创建示例配置文件
+    config_path = "example_config.json"
+    create_sample_config(config_path, app_key, access_key)
+    
+    try:
+        # 运行示例
+        # await examples.example_1_simple_file_recognition(audio_path)
+        await examples.example_2_streaming_with_callback(audio_path)
+        # await examples.example_3_non_streaming_recognition(audio_path)
+        
+        # 音频数据示例（需要实际音频数据）
+        # with open(audio_path, 'rb') as f:
+        #     audio_data = f.read()
+        # await examples.example_4_audio_data_recognition(audio_data)
+        
+        # await examples.example_5_config_based_recognition(audio_path, config_path)
+        
+        # 批量识别示例
+        # audio_files = [audio_path]  # 添加更多文件
+        # await examples.example_6_batch_recognition(audio_files)
+        
+        # 同步识别示例
+        # examples.example_7_sync_recognition(audio_path)
+        
+        # await examples.example_8_custom_config_recognition(audio_path)
+        
+    except Exception as e:
+        logger.error(f"示例运行失败: {e}")
+    
+    finally:
+        # 清理示例配置文件
+        if Path(config_path).exists():
+            os.remove(config_path)
+            logger.info(f"已清理示例配置文件: {config_path}")
+
+
+if __name__ == "__main__":
+    # 运行所有示例
+    asyncio.run(run_all_examples())
+    
+    # 或者运行单个示例
+    # app_key = "your_app_key"
+    # access_key = "your_access_key"
+    # audio_path = "path/to/audio.wav"
+    # 
+    # examples = ASRExamples(app_key, access_key)
+    # asyncio.run(examples.example_1_simple_file_recognition(audio_path))
--- a/asr/doubao/example_optimized.py 0 → 100644
View file @f274f8c
+++ b/asr/doubao/example_optimized.py 0 → 100644
View file @f274f8c
+# AIfeng/2025-07-17 13:58:00
+"""
+豆包ASR优化使用示例
+演示如何使用结果处理器来优化ASR输出，只获取文本内容
+"""
+
+import asyncio
+import logging
+from pathlib import Path
+from typing import Dict, Any
+
+from .service_factory import create_asr_service
+from .result_processor import DoubaoResultProcessor, create_text_only_callback, extract_text_only
+
+# 设置日志
+logging.basicConfig(level=logging.INFO)
+logger = logging.getLogger(__name__)
+
+
+async def example_optimized_streaming():
+    """优化的流式识别示例 - 只输出文本内容"""
+    logger.info("=== 优化流式识别示例 ===")
+    
+    # 创建结果处理器
+    processor = DoubaoResultProcessor(
+        text_only=True,
+        enable_streaming_log=False  # 关闭流式日志，减少输出
+    )
+    
+    # 用户自定义文本处理函数
+    def on_text_result(text: str):
+        """只处理文本内容的回调函数"""
+        print(f"识别文本: {text}")
+        # 这里可以添加你的业务逻辑
+        # 比如：发送到WebSocket、保存到数据库等
+    
+    # 创建优化的回调函数
+    optimized_callback = processor.create_optimized_callback(on_text_result)
+    
+    # 创建ASR服务
+    service = create_asr_service(
+        app_key="your_app_key",
+        access_key="your_access_key",
+        streaming=True,
+        debug=False  # 关闭调试模式，减少日志输出
+    )
+    
+    try:
+        # 识别音频文件
+        audio_path = "path/to/your/audio.wav"
+        result = await service.recognize_file(
+            audio_path,
+            result_callback=optimized_callback
+        )
+        
+        # 获取最终状态
+        final_status = processor.get_current_result()
+        logger.info(f"识别完成: {final_status}")
+        
+    finally:
+        await service.close()
+
+
+async def example_simple_text_extraction():
+    """简单文本提取示例"""
+    logger.info("=== 简单文本提取示例 ===")
+    
+    # 使用便捷函数创建只处理文本的回调
+    def print_text(text: str):
+        print(f">> {text}")
+    
+    text_callback = create_text_only_callback(
+        user_callback=print_text,
+        enable_streaming_log=False
+    )
+    
+    service = create_asr_service(
+        app_key="your_app_key",
+        access_key="your_access_key"
+    )
+    
+    try:
+        audio_path = "path/to/your/audio.wav"
+        await service.recognize_file(
+            audio_path,
+            result_callback=text_callback
+        )
+    finally:
+        await service.close()
+
+
+async def example_manual_text_extraction():
+    """手动文本提取示例"""
+    logger.info("=== 手动文本提取示例 ===")
+    
+    def manual_callback(result: Dict[str, Any]):
+        """手动提取文本的回调函数"""
+        # 使用便捷函数提取文本
+        text = extract_text_only(result)
+        if text:
+            print(f"提取的文本: {text}")
+            
+            # 检查是否为最终结果
+            is_final = result.get('is_last_package', False)
+            if is_final:
+                print(f"[最终结果] {text}")
+    
+    service = create_asr_service(
+        app_key="your_app_key",
+        access_key="your_access_key"
+    )
+    
+    try:
+        audio_path = "path/to/your/audio.wav"
+        await service.recognize_file(
+            audio_path,
+            result_callback=manual_callback
+        )
+    finally:
+        await service.close()
+
+
+async def example_streaming_with_websocket():
+    """流式识别结合WebSocket示例"""
+    logger.info("=== 流式识别WebSocket示例 ===")
+    
+    # 模拟WebSocket连接
+    class MockWebSocket:
+        def __init__(self):
+            self.messages = []
+        
+        async def send_text(self, text: str):
+            self.messages.append(text)
+            print(f"WebSocket发送: {text}")
+    
+    websocket = MockWebSocket()
+    
+    # 创建处理器
+    processor = DoubaoResultProcessor(
+        text_only=True,
+        enable_streaming_log=False
+    )
+    
+    async def websocket_handler(text: str):
+        """WebSocket文本处理器"""
+        await websocket.send_text(text)
+    
+    # 创建回调
+    callback = processor.create_optimized_callback(
+        lambda text: asyncio.create_task(websocket_handler(text))
+    )
+    
+    service = create_asr_service(
+        app_key="your_app_key",
+        access_key="your_access_key"
+    )
+    
+    try:
+        audio_path = "path/to/your/audio.wav"
+        await service.recognize_file(
+            audio_path,
+            result_callback=callback
+        )
+        
+        print(f"WebSocket消息历史: {websocket.messages}")
+        
+    finally:
+        await service.close()
+
+
+async def example_comparison():
+    """对比示例：原始输出 vs 优化输出"""
+    logger.info("=== 对比示例 ===")
+    
+    print("\n1. 原始完整输出:")
+    def original_callback(result: Dict[str, Any]):
+        print(f"完整结果: {result}")
+    
+    print("\n2. 优化文本输出:")
+    def optimized_callback(text: str):
+        print(f"文本: {text}")
+    
+    # 这里只是演示，实际使用时选择其中一种即可
+    text_callback = create_text_only_callback(optimized_callback)
+    
+    service = create_asr_service(
+        app_key="your_app_key",
+        access_key="your_access_key"
+    )
+    
+    try:
+        audio_path = "path/to/your/audio.wav"
+        
+        # 使用优化回调
+        await service.recognize_file(
+            audio_path,
+            result_callback=text_callback
+        )
+        
+    finally:
+        await service.close()
+
+
+def run_example(example_name: str = "optimized"):
+    """运行指定示例"""
+    examples = {
+        "optimized": example_optimized_streaming,
+        "simple": example_simple_text_extraction,
+        "manual": example_manual_text_extraction,
+        "websocket": example_streaming_with_websocket,
+        "comparison": example_comparison
+    }
+    
+    if example_name not in examples:
+        print(f"可用示例: {list(examples.keys())}")
+        return
+    
+    asyncio.run(examples[example_name]())
+
+
+if __name__ == "__main__":
+    # 运行优化示例
+    run_example("optimized")
+    
+    # 或者运行其他示例
+    # run_example("simple")
+    # run_example("manual")
+    # run_example("websocket")
+    # run_example("comparison")
--- a/asr/doubao/protocol.py 0 → 100644
View file @f274f8c
+++ b/asr/doubao/protocol.py 0 → 100644
View file @f274f8c
+# AIfeng/2025-07-11 13:36:00
+"""
+豆包语音识别WebSocket协议处理模块
+实现二进制协议的编解码、消息类型定义和数据包处理
+"""
+
+import gzip
+import json
+from typing import Dict, Any, Tuple, Optional
+
+
+# 协议版本和头部大小
+PROTOCOL_VERSION = 0b0001
+DEFAULT_HEADER_SIZE = 0b0001
+
+# 消息类型定义
+class MessageType:
+    FULL_CLIENT_REQUEST = 0b0001
+    AUDIO_ONLY_REQUEST = 0b0010
+    FULL_SERVER_RESPONSE = 0b1001
+    SERVER_ACK = 0b1011
+    SERVER_ERROR_RESPONSE = 0b1111
+
+# 消息类型特定标志
+class MessageFlags:
+    NO_SEQUENCE = 0b0000
+    POS_SEQUENCE = 0b0001
+    NEG_SEQUENCE = 0b0010
+    NEG_WITH_SEQUENCE = 0b0011
+
+# 序列化方法
+class SerializationMethod:
+    NO_SERIALIZATION = 0b0000
+    JSON = 0b0001
+
+# 压缩方法
+class CompressionType:
+    NO_COMPRESSION = 0b0000
+    GZIP = 0b0001
+
+
+class DoubaoProtocol:
+    """豆包ASR WebSocket协议处理器"""
+    
+    @staticmethod
+    def generate_header(
+        message_type: int = MessageType.FULL_CLIENT_REQUEST,
+        message_type_specific_flags: int = MessageFlags.NO_SEQUENCE,
+        serial_method: int = SerializationMethod.JSON,
+        compression_type: int = CompressionType.GZIP,
+        reserved_data: int = 0x00
+    ) -> bytearray:
+        """
+        生成协议头部
+        
+        Args:
+            message_type: 消息类型
+            message_type_specific_flags: 消息类型特定标志
+            serial_method: 序列化方法
+            compression_type: 压缩类型
+            reserved_data: 保留字段
+            
+        Returns:
+            bytearray: 4字节协议头部
+        """
+        header = bytearray()
+        header_size = 1
+        header.append((PROTOCOL_VERSION << 4) | header_size)
+        header.append((message_type << 4) | message_type_specific_flags)
+        header.append((serial_method << 4) | compression_type)
+        header.append(reserved_data)
+        return header
+    
+    @staticmethod
+    def generate_sequence_payload(sequence: int) -> bytearray:
+        """
+        生成序列号载荷
+        
+        Args:
+            sequence: 序列号
+            
+        Returns:
+            bytearray: 4字节序列号数据
+        """
+        payload = bytearray()
+        payload.extend(sequence.to_bytes(4, 'big', signed=True))
+        return payload
+    
+    @staticmethod
+    def parse_response(response_data: bytes) -> Dict[str, Any]:
+        """
+        解析服务器响应数据
+        
+        Args:
+            response_data: 服务器响应的二进制数据
+            
+        Returns:
+            Dict: 解析后的响应数据
+        """
+        if len(response_data) < 4:
+            raise ValueError("响应数据长度不足")
+        
+        # 解析头部
+        protocol_version = response_data[0] >> 4
+        header_size = response_data[0] & 0x0f
+        message_type = response_data[1] >> 4
+        message_type_specific_flags = response_data[1] & 0x0f
+        serialization_method = response_data[2] >> 4
+        message_compression = response_data[2] & 0x0f
+        reserved = response_data[3]
+        
+        # 解析扩展头部和载荷
+        header_extensions = response_data[4:header_size * 4]
+        payload = response_data[header_size * 4:]
+        
+        result = {
+            'protocol_version': protocol_version,
+            'header_size': header_size,
+            'message_type': message_type,
+            'message_type_specific_flags': message_type_specific_flags,
+            'serialization_method': serialization_method,
+            'message_compression': message_compression,
+            'is_last_package': False,
+            'payload_msg': None,
+            'payload_size': 0
+        }
+        
+        # 处理序列号
+        if message_type_specific_flags & 0x01:
+            if len(payload) >= 4:
+                seq = int.from_bytes(payload[:4], "big", signed=True)
+                result['payload_sequence'] = seq
+                payload = payload[4:]
+        
+        # 检查是否为最后一包
+        if message_type_specific_flags & 0x02:
+            result['is_last_package'] = True
+        
+        # 根据消息类型解析载荷
+        payload_msg = None
+        payload_size = 0
+        
+        if message_type == MessageType.FULL_SERVER_RESPONSE:
+            if len(payload) >= 4:
+                payload_size = int.from_bytes(payload[:4], "big", signed=True)
+                payload_msg = payload[4:]
+        elif message_type == MessageType.SERVER_ACK:
+            if len(payload) >= 4:
+                seq = int.from_bytes(payload[:4], "big", signed=True)
+                result['seq'] = seq
+                if len(payload) >= 8:
+                    payload_size = int.from_bytes(payload[4:8], "big", signed=False)
+                    payload_msg = payload[8:]
+        elif message_type == MessageType.SERVER_ERROR_RESPONSE:
+            if len(payload) >= 8:
+                code = int.from_bytes(payload[:4], "big", signed=False)
+                result['code'] = code
+                payload_size = int.from_bytes(payload[4:8], "big", signed=False)
+                payload_msg = payload[8:]
+        
+        # 解压缩和反序列化载荷
+        if payload_msg is not None:
+            if message_compression == CompressionType.GZIP:
+                try:
+                    payload_msg = gzip.decompress(payload_msg)
+                except Exception as e:
+                    result['decompress_error'] = str(e)
+                    return result
+            
+            if serialization_method == SerializationMethod.JSON:
+                try:
+                    payload_msg = json.loads(payload_msg.decode('utf-8'))
+                except Exception as e:
+                    result['json_parse_error'] = str(e)
+                    return result
+            elif serialization_method != SerializationMethod.NO_SERIALIZATION:
+                payload_msg = payload_msg.decode('utf-8')
+        
+        result['payload_msg'] = payload_msg
+        result['payload_size'] = payload_size
+        return result
+    
+    @staticmethod
+    def build_full_request(
+        request_params: Dict[str, Any],
+        sequence: int = 1,
+        compression: bool = True
+    ) -> bytearray:
+        """
+        构建完整客户端请求
+        
+        Args:
+            request_params: 请求参数字典
+            sequence: 序列号
+            compression: 是否启用压缩
+            
+        Returns:
+            bytearray: 完整的请求数据包
+        """
+        # 序列化请求参数
+        payload_bytes = json.dumps(request_params).encode('utf-8')
+        
+        # 压缩载荷
+        compression_type = CompressionType.GZIP if compression else CompressionType.NO_COMPRESSION
+        if compression:
+            payload_bytes = gzip.compress(payload_bytes)
+        
+        # 生成头部
+        header = DoubaoProtocol.generate_header(
+            message_type=MessageType.FULL_CLIENT_REQUEST,
+            message_type_specific_flags=MessageFlags.POS_SEQUENCE,
+            compression_type=compression_type
+        )
+        
+        # 构建完整请求
+        request = bytearray(header)
+        request.extend(DoubaoProtocol.generate_sequence_payload(sequence))
+        request.extend(len(payload_bytes).to_bytes(4, 'big'))
+        request.extend(payload_bytes)
+        
+        return request
+    
+    @staticmethod
+    def build_audio_request(
+        audio_data: bytes,
+        sequence: int,
+        is_last: bool = False,
+        compression: bool = True
+    ) -> bytearray:
+        """
+        构建音频数据请求
+        
+        Args:
+            audio_data: 音频数据
+            sequence: 序列号
+            is_last: 是否为最后一包
+            compression: 是否启用压缩
+            
+        Returns:
+            bytearray: 音频请求数据包
+        """
+        # 压缩音频数据
+        compression_type = CompressionType.GZIP if compression else CompressionType.NO_COMPRESSION
+        payload_bytes = gzip.compress(audio_data) if compression else audio_data
+        
+        # 确定消息标志
+        if is_last:
+            flags = MessageFlags.NEG_WITH_SEQUENCE
+            sequence = -abs(sequence)
+        else:
+            flags = MessageFlags.POS_SEQUENCE
+        
+        # 生成头部
+        header = DoubaoProtocol.generate_header(
+            message_type=MessageType.AUDIO_ONLY_REQUEST,
+            message_type_specific_flags=flags,
+            compression_type=compression_type
+        )
+        
+        # 构建音频请求
+        request = bytearray(header)
+        request.extend(DoubaoProtocol.generate_sequence_payload(sequence))
+        request.extend(len(payload_bytes).to_bytes(4, 'big'))
+        request.extend(payload_bytes)
+        
+        return request
--- a/asr/doubao/result_processor.py 0 → 100644
View file @f274f8c
+++ b/asr/doubao/result_processor.py 0 → 100644
View file @f274f8c
+# AIfeng/2025-07-17 13:58:00
+"""
+豆包ASR识别结果处理器
+专门处理豆包ASR流式识别结果，提取关键信息并优化日志输出
+"""
+
+import logging
+from typing import Dict, Any, Optional, Callable
+from dataclasses import dataclass
+from datetime import datetime
+
+
+@dataclass
+class ASRResult:
+    """ASR识别结果数据类"""
+    text: str
+    is_final: bool
+    confidence: float = 0.0
+    timestamp: datetime = None
+    sequence: int = 0
+    
+    def __post_init__(self):
+        if self.timestamp is None:
+            self.timestamp = datetime.now()
+
+
+class DoubaoResultProcessor:
+    """豆包ASR结果处理器"""
+    
+    def __init__(self, 
+                 text_only: bool = True,
+                 log_level: str = 'INFO',
+                 enable_streaming_log: bool = False):
+        """
+        初始化结果处理器
+        
+        Args:
+            text_only: 是否只输出文本内容
+            log_level: 日志级别
+            enable_streaming_log: 是否启用流式日志（会频繁输出中间结果）
+        """
+        self.text_only = text_only
+        self.enable_streaming_log = enable_streaming_log
+        self.logger = self._setup_logger(log_level)
+        
+        # 流式结果管理
+        self.current_text = ""
+        self.last_sequence = 0
+        self.result_count = 0
+        
+    def _setup_logger(self, log_level: str) -> logging.Logger:
+        """设置日志记录器"""
+        logger = logging.getLogger(f"DoubaoResultProcessor_{id(self)}")
+        logger.setLevel(getattr(logging, log_level.upper()))
+        
+        if not logger.handlers:
+            handler = logging.StreamHandler()
+            formatter = logging.Formatter(
+                '%(asctime)s - %(name)s - %(levelname)s - %(message)s'
+            )
+            handler.setFormatter(formatter)
+            logger.addHandler(handler)
+            
+        return logger
+    
+    def extract_text_from_result(self, result: Dict[str, Any]) -> Optional[ASRResult]:
+        """
+        从豆包ASR完整结果中提取文本信息
+        
+        Args:
+            result: 豆包ASR返回的完整结果字典
+            
+        Returns:
+            ASRResult: 提取的结果对象，如果无有效文本则返回None
+        """
+        try:
+            # 检查是否有payload_msg
+            payload_msg = result.get('payload_msg')
+            if not payload_msg:
+                return None
+            
+            # 提取result字段
+            result_data = payload_msg.get('result')
+            if not result_data:
+                return None
+            
+            # 提取文本
+            text = result_data.get('text', '').strip()
+            if not text:
+                return None
+            
+            # 提取其他信息
+            is_final = result.get('is_last_package', False)
+            confidence = result_data.get('confidence', 0.0)
+            sequence = result.get('payload_sequence', 0)
+            
+            return ASRResult(
+                text=text,
+                is_final=is_final,
+                confidence=confidence,
+                sequence=sequence
+            )
+            
+        except Exception as e:
+            self.logger.error(f"提取ASR结果文本失败: {e}")
+            return None
+    
+    def process_streaming_result(self, result: Dict[str, Any]) -> Optional[str]:
+        """
+        处理流式识别结果
+        
+        Args:
+            result: 豆包ASR返回的完整结果字典
+            
+        Returns:
+            str: 当前识别文本，如果无变化则返回None
+        """
+        asr_result = self.extract_text_from_result(result)
+        if not asr_result:
+            return None
+        
+        self.result_count += 1
+        
+        # 流式结果：后一次覆盖前一次
+        previous_text = self.current_text
+        self.current_text = asr_result.text
+        self.last_sequence = asr_result.sequence
+        
+        # 根据配置决定是否记录日志
+        if asr_result.is_final:
+            self.logger.info(f"[最终结果] {asr_result.text}")
+        elif self.enable_streaming_log:
+            self.logger.debug(f"[流式更新 #{self.result_count}] {asr_result.text}")
+        
+        # 返回文本（如果与上次不同）
+        return asr_result.text if asr_result.text != previous_text else None
+    
+    def create_optimized_callback(self, 
+                                  user_callback: Optional[Callable[[str], None]] = None) -> Callable[[Dict[str, Any]], None]:
+        """
+        创建优化的回调函数
+        
+        Args:
+            user_callback: 用户自定义回调函数，接收文本参数
+            
+        Returns:
+            Callable: 优化后的回调函数
+        """
+        def optimized_callback(result: Dict[str, Any]):
+            """优化的回调函数，只处理文本内容"""
+            try:
+                # 处理流式结果
+                text = self.process_streaming_result(result)
+                
+                # 如果有文本变化且用户提供了回调函数
+                if text and user_callback:
+                    user_callback(text)
+                    
+            except Exception as e:
+                self.logger.error(f"处理ASR回调失败: {e}")
+        
+        return optimized_callback
+    
+    def get_current_result(self) -> Dict[str, Any]:
+        """
+        获取当前识别状态
+        
+        Returns:
+            Dict: 当前状态信息
+        """
+        return {
+            'current_text': self.current_text,
+            'last_sequence': self.last_sequence,
+            'result_count': self.result_count,
+            'text_length': len(self.current_text)
+        }
+    
+    def reset(self):
+        """重置处理器状态"""
+        self.current_text = ""
+        self.last_sequence = 0
+        self.result_count = 0
+        self.logger.info("结果处理器已重置")
+
+
+# 便捷函数
+def create_text_only_callback(user_callback: Optional[Callable[[str], None]] = None,
+                              enable_streaming_log: bool = False) -> Callable[[Dict[str, Any]], None]:
+    """
+    创建只处理文本的回调函数
+    
+    Args:
+        user_callback: 用户回调函数
+        enable_streaming_log: 是否启用流式日志
+        
+    Returns:
+        Callable: 优化的回调函数
+    """
+    processor = DoubaoResultProcessor(
+        text_only=True,
+        enable_streaming_log=enable_streaming_log
+    )
+    return processor.create_optimized_callback(user_callback)
+
+
+def extract_text_only(result: Dict[str, Any]) -> Optional[str]:
+    """
+    从豆包ASR结果中只提取文本
+    
+    Args:
+        result: 豆包ASR完整结果
+        
+    Returns:
+        str: 提取的文本，如果无文本则返回None
+    """
+    try:
+        return result.get('payload_msg', {}).get('result', {}).get('text', '').strip() or None
+    except Exception:
+        return None
--- a/asr/doubao/service_factory.py 0 → 100644
View file @f274f8c
+++ b/asr/doubao/service_factory.py 0 → 100644
View file @f274f8c
+# AIfeng/2025-07-11 13:36:00
+"""
+豆包ASR服务工厂模块
+提供简化的API接口和服务实例管理
+"""
+
+import asyncio
+from pathlib import Path
+from typing import Dict, Any, Optional, Callable, Union
+
+from .config_manager import ConfigManager
+from .asr_client import DoubaoASRClient
+
+
+class DoubaoASRService:
+    """豆包ASR服务工厂"""
+    
+    _instances = {}
+    
+    def __init__(self, config: Union[str, Dict[str, Any], ConfigManager]):
+        """
+        初始化ASR服务
+        
+        Args:
+            config: 配置文件路径、配置字典或配置管理器实例
+        """
+        if isinstance(config, str):
+            self.config_manager = ConfigManager(config)
+        elif isinstance(config, dict):
+            self.config_manager = ConfigManager.from_dict(config)
+        elif isinstance(config, ConfigManager):
+            self.config_manager = config
+        else:
+            raise ValueError("配置参数类型错误")
+        
+        self.client = DoubaoASRClient(self.config_manager.get_config())
+    
+    async def recognize_file(
+        self,
+        audio_path: str,
+        streaming: bool = True,
+        result_callback: Optional[Callable[[Dict[str, Any]], None]] = None,
+        **kwargs
+    ) -> Dict[str, Any]:
+        """
+        识别音频文件
+        
+        Args:
+            audio_path: 音频文件路径
+            streaming: 是否使用流式识别
+            result_callback: 结果回调函数
+            **kwargs: 其他参数
+            
+        Returns:
+            Dict: 识别结果
+        """
+        return await self.client.recognize_file(
+            audio_path,
+            streaming=streaming,
+            result_callback=result_callback,
+            **kwargs
+        )
+    
+    async def recognize_audio_data(
+        self,
+        audio_data: bytes,
+        streaming: bool = True,
+        result_callback: Optional[Callable[[Dict[str, Any]], None]] = None,
+        **kwargs
+    ) -> Dict[str, Any]:
+        """
+        识别音频数据
+        
+        Args:
+            audio_data: 音频数据
+            streaming: 是否使用流式识别
+            result_callback: 结果回调函数
+            **kwargs: 其他参数
+            
+        Returns:
+            Dict: 识别结果
+        """
+        return await self.client.recognize_audio_data(
+            audio_data,
+            streaming=streaming,
+            result_callback=result_callback,
+            **kwargs
+        )
+    
+    def get_status(self) -> Dict[str, Any]:
+        """
+        获取服务状态
+        
+        Returns:
+            Dict: 服务状态
+        """
+        return self.client.get_status()
+    
+    async def close(self):
+        """关闭服务"""
+        await self.client.close()
+    
+    @classmethod
+    def create_service(
+        cls,
+        config: Union[str, Dict[str, Any], ConfigManager],
+        instance_name: str = 'default'
+    ) -> 'DoubaoASRService':
+        """
+        创建或获取服务实例
+        
+        Args:
+            config: 配置
+            instance_name: 实例名称
+            
+        Returns:
+            DoubaoASRService: 服务实例
+        """
+        if instance_name not in cls._instances:
+            cls._instances[instance_name] = cls(config)
+        return cls._instances[instance_name]
+    
+    @classmethod
+    def get_service(cls, instance_name: str = 'default') -> Optional['DoubaoASRService']:
+        """
+        获取已创建的服务实例
+        
+        Args:
+            instance_name: 实例名称
+            
+        Returns:
+            DoubaoASRService: 服务实例或None
+        """
+        return cls._instances.get(instance_name)
+    
+    @classmethod
+    async def close_all_services(cls):
+        """关闭所有服务实例"""
+        for service in cls._instances.values():
+            await service.close()
+        cls._instances.clear()
+
+
+# 便捷函数
+def create_asr_service(
+    config_path: Optional[str] = None,
+    app_key: Optional[str] = None,
+    access_key: Optional[str] = None,
+    **kwargs
+) -> DoubaoASRService:
+    """
+    创建ASR服务的便捷函数
+    
+    Args:
+        config_path: 配置文件路径
+        app_key: 应用密钥
+        access_key: 访问密钥
+        **kwargs: 其他配置参数
+        
+    Returns:
+        DoubaoASRService: ASR服务实例
+    """
+    if config_path:
+        return DoubaoASRService(config_path)
+    
+    # 从参数构建配置
+    config = {
+        'auth_config': {
+            'app_key': app_key or '',
+            'access_key': access_key or ''
+        }
+    }
+    
+    # 添加其他配置参数
+    if kwargs:
+        if 'asr_config' not in config:
+            config['asr_config'] = {}
+        if 'audio_config' not in config:
+            config['audio_config'] = {}
+        if 'connection_config' not in config:
+            config['connection_config'] = {}
+        if 'logging_config' not in config:
+            config['logging_config'] = {}
+        
+        # 映射常用参数
+        param_mapping = {
+            'streaming': ('asr_config', 'streaming_mode'),
+            'seg_duration': ('asr_config', 'seg_duration'),
+            'model_name': ('asr_config', 'model_name'),
+            'enable_punc': ('asr_config', 'enable_punc'),
+            'sample_rate': ('audio_config', 'default_rate'),
+            'audio_format': ('audio_config', 'default_format'),
+            'timeout': ('connection_config', 'timeout'),
+            'debug': ('logging_config', 'enable_debug')
+        }
+        
+        for param, (section, key) in param_mapping.items():
+            if param in kwargs:
+                config[section][key] = kwargs[param]
+    
+    return DoubaoASRService(config)
+
+
+async def recognize_file(
+    audio_path: str,
+    config_path: Optional[str] = None,
+    app_key: Optional[str] = None,
+    access_key: Optional[str] = None,
+    streaming: bool = True,
+    result_callback: Optional[Callable[[Dict[str, Any]], None]] = None,
+    **kwargs
+) -> Dict[str, Any]:
+    """
+    识别音频文件的便捷函数
+    
+    Args:
+        audio_path: 音频文件路径
+        config_path: 配置文件路径
+        app_key: 应用密钥
+        access_key: 访问密钥
+        streaming: 是否使用流式识别
+        result_callback: 结果回调函数
+        **kwargs: 其他参数
+        
+    Returns:
+        Dict: 识别结果
+    """
+    service = create_asr_service(
+        config_path=config_path,
+        app_key=app_key,
+        access_key=access_key,
+        **kwargs
+    )
+    
+    try:
+        return await service.recognize_file(
+            audio_path,
+            streaming=streaming,
+            result_callback=result_callback
+        )
+    finally:
+        await service.close()
+
+
+async def recognize_audio_data(
+    audio_data: bytes,
+    config_path: Optional[str] = None,
+    app_key: Optional[str] = None,
+    access_key: Optional[str] = None,
+    streaming: bool = True,
+    result_callback: Optional[Callable[[Dict[str, Any]], None]] = None,
+    **kwargs
+) -> Dict[str, Any]:
+    """
+    识别音频数据的便捷函数
+    
+    Args:
+        audio_data: 音频数据
+        config_path: 配置文件路径
+        app_key: 应用密钥
+        access_key: 访问密钥
+        streaming: 是否使用流式识别
+        result_callback: 结果回调函数
+        **kwargs: 其他参数
+        
+    Returns:
+        Dict: 识别结果
+    """
+    service = create_asr_service(
+        config_path=config_path,
+        app_key=app_key,
+        access_key=access_key,
+        **kwargs
+    )
+    
+    try:
+        return await service.recognize_audio_data(
+            audio_data,
+            streaming=streaming,
+            result_callback=result_callback
+        )
+    finally:
+        await service.close()
+
+
+def run_recognition(
+    audio_path: str,
+    config_path: Optional[str] = None,
+    app_key: Optional[str] = None,
+    access_key: Optional[str] = None,
+    streaming: bool = True,
+    result_callback: Optional[Callable[[Dict[str, Any]], None]] = None,
+    **kwargs
+) -> Dict[str, Any]:
+    """
+    同步方式识别音频文件
+    
+    Args:
+        audio_path: 音频文件路径
+        config_path: 配置文件路径
+        app_key: 应用密钥
+        access_key: 访问密钥
+        streaming: 是否使用流式识别
+        result_callback: 结果回调函数
+        **kwargs: 其他参数
+        
+    Returns:
+        Dict: 识别结果
+    """
+    return asyncio.run(
+        recognize_file(
+            audio_path,
+            config_path=config_path,
+            app_key=app_key,
+            access_key=access_key,
+            streaming=streaming,
+            result_callback=result_callback,
+            **kwargs
+        )
+    )
--- a/doc/process/update.log
View file @f274f8c
+++ b/doc/process/update.log
View file @f274f8c
--- a/funasr_asr_sync.py 0 → 100644
View file @f274f8c
+++ b/funasr_asr_sync.py 0 → 100644
View file @f274f8c
+# -*- coding: utf-8 -*-
+"""
+AIfeng/2025-01-02 10:27:06
+FunASR语音识别模块 - 同步版本
+基于eman-Fay-main-copy项目的同步实现模式
+"""
+
+from threading import Thread
+import websocket
+import json
+import time
+import ssl
+import _thread as thread
+import os
+import asyncio
+import threading
+
+from core import get_web_instance, get_instance
+from utils import config_util as cfg
+from utils import util
+
+class FunASRSync:
+    """FunASR同步客户端 - 基于参考项目实现"""
+    
+    def __init__(self, username):
+        self.__URL = "ws://{}:{}".format(cfg.local_asr_ip, cfg.local_asr_port)
+        self.__ws = None
+        self.__connected = False
+        self.__frames = []
+        self.__state = 0
+        self.__closing = False
+        self.__task_id = ''
+        self.done = False
+        self.finalResults = ""
+        self.__reconnect_delay = 1
+        self.__reconnecting = False
+        self.username = username
+        self.started = True
+        self.__result_callback = None  # 添加结果回调
+        
+        util.log(1, f"FunASR同步客户端初始化完成，用户: {username}")
+    
+    def on_message(self, ws, message):
+        """收到websocket消息的处理"""
+        try:
+            util.log(1, f"收到FunASR消息: {message}")
+            
+            # 尝试解析JSON消息以区分状态消息和识别结果
+            try:
+                import json
+                parsed_message = json.loads(message)
+                
+                # 检查是否为状态消息（如分块准备消息）
+                if isinstance(parsed_message, dict) and 'status' in parsed_message:
+                    status = parsed_message.get('status')
+                    if status == 'ready':
+                        util.log(1, f"收到分块准备状态: {parsed_message.get('message', '')}")
+                        return  # 状态消息不触发回调
+                    elif status in ['processing', 'chunk_received']:
+                        util.log(1, f"收到处理状态: {status}")
+                        return  # 处理状态消息不触发回调
+                    elif status == 'error':
+                        util.log(3, f"收到错误状态: {parsed_message.get('message', '')}")
+                        return
+                
+                # 如果是字典但不是状态消息，可能是结构化的识别结果
+                if isinstance(parsed_message, dict) and 'text' in parsed_message:
+                    # 结构化识别结果
+                    recognition_text = parsed_message.get('text', '')
+                    if recognition_text.strip():  # 只有非空结果才处理
+                        self.done = True
+                        self.finalResults = recognition_text
+                        util.log(1, f"收到结构化识别结果: {recognition_text}")
+                        self._trigger_result_callback()
+                    return
+                    
+            except json.JSONDecodeError:
+                # 不是JSON格式，可能是纯文本识别结果
+                pass
+            
+            # 处理纯文本识别结果
+            if isinstance(message, str) and message.strip():
+                # 过滤掉明显的状态消息
+                if any(keyword in message.lower() for keyword in ['status', 'ready', '准备接收', 'processing', 'chunk']):
+                    util.log(1, f"跳过状态消息: {message}")
+                    return
+                
+                # 这是真正的识别结果
+                self.done = True
+                self.finalResults = message
+                util.log(1, f"收到文本识别结果: {message}")
+                self._trigger_result_callback()
+                
+        except Exception as e:
+            util.log(3, f"处理识别结果时出错: {e}")
+
+        if self.__closing:
+            try:
+                self.__ws.close()
+            except Exception as e:
+                util.log(2, f"关闭WebSocket时出错: {e}")
+    
+    def _trigger_result_callback(self):
+        """触发结果回调函数"""
+        if self.__result_callback:
+            try:
+                # 创建chat_message直接推送
+                chat_message = {
+                    "type":"chat_message",
+                    "sender":"回音",
+                    "text": self.finalResults,
+                    "Username": self.username,
+                    "model_info":"Funasr"
+                }
+                
+                self.__result_callback(chat_message)
+                util.log(1, f"已触发结果回调: {self.finalResults}")
+            except Exception as e:
+                util.log(3, f"调用结果回调时出错: {e}")
+            
+            # 发送到Web客户端（改进的异步调用方式）
+            # try:
+            #     # 先检查WSA服务是否已初始化
+            #     web_instance = get_web_instance()
+            #     if web_instance and web_instance.is_connected(self.username):
+            #         # 创建chat_message直接推送
+            #         chat_message = {
+            #             "type":"chat_message",
+            #             "sender":"回音",
+            #             "content": self.finalResults,
+            #             "Username": self.username,
+            #             "model_info":"Funasr"
+            #         }
+            #         # 方案1: 使用add_cmd推送wsa_command类型数据
+            #         # web_instance.add_cmd(chat_message)
+                    
+            #         util.log(1, f"FunASR识别结果已推送到Web客户端[{self.username}]: {self.finalResults}")
+            #     else:
+            #         util.log(2, f"用户{self.username}未连接到Web客户端，跳过推送")
+            # except RuntimeError as e:
+            #     # WSA服务未初始化，这是正常情况（服务启动顺序问题）
+            #     util.log(2, f"WSA服务未初始化，跳过Web客户端通知: {e}")
+            # except Exception as e:
+            #     util.log(3, f"发送到Web客户端时出错: {e}")
+            
+            # Human客户端通知改为日志记录（避免重复通知当前服务）
+            # util.log(1, f"FunASR识别结果[{self.username}]: {self.finalResults}")
+
+        if self.__closing:
+            try:
+                self.__ws.close()
+            except Exception as e:
+                util.log(2, f"关闭WebSocket时出错: {e}")
+
+    def on_close(self, ws, code, msg):
+        """收到websocket关闭的处理"""
+        self.__connected = False
+        util.log(2, f"FunASR连接关闭: {msg}")
+        self.__ws = None
+
+    def on_error(self, ws, error):
+        """收到websocket错误的处理"""
+        self.__connected = False
+        util.log(3, f"FunASR连接错误: {error}")
+        self.__ws = None
+
+    def __attempt_reconnect(self):
+        """重连机制"""
+        if not self.__reconnecting:
+            self.__reconnecting = True
+            util.log(1, "尝试重连FunASR...")
+            while not self.__connected:
+                time.sleep(self.__reconnect_delay)
+                self.start()
+                self.__reconnect_delay *= 2  
+            self.__reconnect_delay = 1  
+            self.__reconnecting = False
+
+    def on_open(self, ws):
+        """收到websocket连接建立的处理"""
+        self.__connected = True
+        util.log(1, "FunASR WebSocket连接建立")
+
+        def run(*args):
+            while self.__connected:
+                try:
+                    if len(self.__frames) > 0:
+                        frame = self.__frames[0]
+                        self.__frames.pop(0)
+                        
+                        if type(frame) == dict:
+                            ws.send(json.dumps(frame))
+                        elif type(frame) == bytes:
+                            ws.send(frame, websocket.ABNF.OPCODE_BINARY)
+                            
+                except Exception as e:
+                    util.log(3, f"发送帧数据时出错: {e}")
+                # 优化发送间隔，从0.04秒减少到0.02秒提高效率
+                time.sleep(0.02)
+
+        thread.start_new_thread(run, ())
+
+    def __connect(self):
+        """建立WebSocket连接"""
+        self.finalResults = ""
+        self.done = False
+        self.__frames.clear()
+        websocket.enableTrace(False)
+        
+        self.__ws = websocket.WebSocketApp(
+            self.__URL, 
+            on_message=self.on_message,
+            on_close=self.on_close,
+            on_error=self.on_error
+        )
+        self.__ws.on_open = self.on_open
+        self.__ws.run_forever(sslopt={"cert_reqs": ssl.CERT_NONE})
+
+    def add_frame(self, frame):
+        """添加帧到发送队列"""
+        self.__frames.append(frame)
+
+    def send(self, buf):
+        """发送音频数据"""
+        self.__frames.append(buf)
+
+    def send_url(self, url):
+        """发送音频文件URL"""
+        # 确保使用绝对路径，相对路径对funasr服务无效
+        absolute_url = os.path.abspath(url)
+        frame = {'url': absolute_url}
+        if self.__ws and self.__connected:
+            util.log(1, f"发送音频文件URL到FunASR: {absolute_url}")
+            self.__ws.send(json.dumps(frame))
+            util.log(1, f"音频文件URL已发送: {frame}")
+        else:
+            util.log(2, f"WebSocket未连接，无法发送URL: {absolute_url}")
+    
+    def send_audio_data(self, audio_bytes, filename="audio.wav"):
+        """发送音频数据（支持大文件分块）"""
+        import base64
+        import math
+        
+        try:
+            # 确保audio_bytes是bytes类型，避免memoryview缓冲区问题
+            if hasattr(audio_bytes, 'tobytes'):
+                audio_bytes = bytes(audio_bytes.tobytes())  # Fix BufferError: memoryview has 1 exported buffer
+            elif isinstance(audio_bytes, memoryview):
+                audio_bytes = bytes(audio_bytes)
+            
+            total_size = len(audio_bytes)
+            
+            # 大文件阈值：1MB，超过则使用分块发送
+            large_file_threshold = 512 * 1024  # aiohttp限制默认1M，但再处理base64,会增加33%
+            
+            if total_size > large_file_threshold:
+                util.log(1, f"检测到大文件({total_size} bytes)，使用分块发送模式")
+                return self._send_audio_data_chunked(audio_bytes, filename)
+            else:
+                # 小文件使用原有方式
+                return self._send_audio_data_simple(audio_bytes, filename)
+                
+        except Exception as e:
+            util.log(3, f"发送音频数据时出错: {e}")
+            return False
+    
+    def _send_audio_data_simple(self, audio_bytes, filename):
+        """简单发送模式（小文件）"""
+        import base64
+        
+        try:
+            # 将音频字节数据编码为Base64
+            audio_data_b64 = base64.b64encode(audio_bytes).decode('utf-8')
+            
+            # 构造发送格式，与funasr服务的process_audio_data函数兼容
+            frame = {
+                'audio_data': audio_data_b64,
+                'filename': filename
+            }
+            
+            if self.__ws and self.__connected:
+                util.log(1, f"发送音频数据到FunASR: {filename}, 大小: {len(audio_bytes)} bytes")
+                success = self._send_frame_with_retry(frame)
+                if success:
+                    util.log(1, f"音频数据已发送: {filename}")
+                    return True
+                else:
+                    util.log(3, f"音频数据发送失败: {filename}")
+                    return False
+            else:
+                util.log(2, f"WebSocket未连接，无法发送音频数据: {filename}")
+                return False
+                
+        except Exception as e:
+            util.log(3, f"简单发送音频数据时出错: {e}")
+            return False
+    
+    def _send_audio_data_chunked(self, audio_bytes, filename, chunk_size=512*1024):
+        """分块发送音频数据（大文件）"""
+        import base64
+        import math
+        
+        try:
+            total_size = len(audio_bytes)
+            total_chunks = math.ceil(total_size / chunk_size)
+            
+            util.log(1, f"开始分块发送: {filename}, 总大小: {total_size} bytes, 分块数: {total_chunks}")
+            
+            # 发送开始信号
+            start_frame = {
+                'type': 'audio_start',
+                'filename': filename,
+                'total_size': total_size,
+                'total_chunks': total_chunks,
+                'chunk_size': chunk_size
+            }
+            
+            if not self._send_frame_with_retry(start_frame):
+                util.log(3, f"发送开始信号失败: {filename}")
+                return False
+            
+            # 分块发送
+            for i in range(total_chunks):
+                start_pos = i * chunk_size
+                end_pos = min(start_pos + chunk_size, total_size)
+                chunk_data = audio_bytes[start_pos:end_pos]
+                
+                # Base64编码分块
+                chunk_b64 = base64.b64encode(chunk_data).decode('utf-8')
+                
+                chunk_frame = {
+                    'type': 'audio_chunk',
+                    'filename': filename,
+                    'chunk_index': i,
+                    'chunk_data': chunk_b64,
+                    'is_last': (i == total_chunks - 1)
+                }
+                
+                # 发送分块并检查结果
+                success = self._send_frame_with_retry(chunk_frame)
+                if not success:
+                    util.log(3, f"分块 {i+1}/{total_chunks} 发送失败")
+                    return False
+                
+                # 进度日志
+                if (i + 1) % 10 == 0 or i == total_chunks - 1:
+                    progress = ((i + 1) / total_chunks) * 100
+                    util.log(1, f"发送进度: {progress:.1f}% ({i+1}/{total_chunks})")
+                
+                # 流控延迟
+                time.sleep(0.01)
+            
+            # 发送结束信号
+            end_frame = {
+                'type': 'audio_end',
+                'filename': filename
+            }
+            
+            if self._send_frame_with_retry(end_frame):
+                util.log(1, f"音频数据分块发送完成: {filename}")
+                return True
+            else:
+                util.log(3, f"发送结束信号失败: {filename}")
+                return False
+            
+        except Exception as e:
+            util.log(3, f"分块发送音频数据时出错: {e}")
+            return False
+    
+    def _send_frame_with_retry(self, frame, max_retries=3, timeout=10):
+        """带重试的帧发送"""
+        for attempt in range(max_retries):
+            try:
+                if self.__ws and self.__connected:
+                    # 设置发送超时
+                    start_time = time.time()
+                    self.__ws.send(json.dumps(frame))
+                    
+                    # 简单的发送确认检查
+                    time.sleep(0.05)  # 等待发送完成
+                    
+                    if time.time() - start_time < timeout:
+                        return True
+                    else:
+                        util.log(2, f"发送超时，尝试 {attempt + 1}/{max_retries}")
+                else:
+                    util.log(2, f"连接不可用，尝试 {attempt + 1}/{max_retries}")
+                    
+            except Exception as e:
+                util.log(2, f"发送失败，尝试 {attempt + 1}/{max_retries}: {e}")
+                
+            if attempt < max_retries - 1:
+                time.sleep(0.5 * (attempt + 1))  # 指数退避
+        
+        return False
+    
+    def set_result_callback(self, callback):
+        """设置结果回调函数"""
+        self.__result_callback = callback
+        util.log(1, f"已设置结果回调函数")
+    
+
+    def connect(self):
+        """连接到FunASR服务（同步版本）"""
+        try:
+            if not self.__connected:
+                self.start()  # 调用现有的start方法
+                
+                # 等待连接建立，最多等待30秒（针对大文件处理优化）
+                max_wait_time = 30.0
+                wait_interval = 0.1
+                waited_time = 0.0
+                
+                while not self.__connected and waited_time < max_wait_time:
+                    time.sleep(wait_interval)
+                    waited_time += wait_interval
+                    
+                    # 每5秒输出一次等待日志
+                    if waited_time % 5.0 < wait_interval:
+                        util.log(1, f"等待FunASR连接中... {waited_time:.1f}s/{max_wait_time}s")
+                
+                if self.__connected:
+                    util.log(1, f"FunASR连接成功，耗时: {waited_time:.2f}秒")
+                else:
+                    util.log(3, f"FunASR连接超时，等待了{waited_time:.2f}秒")
+                
+                return self.__connected
+            return True
+        except Exception as e:
+            util.log(3, f"连接FunASR服务时出错: {e}")
+            return False
+ 
+    def start(self):
+        """启动FunASR客户端"""
+        Thread(target=self.__connect, args=[]).start()
+        data = {
+            'vad_need': False,
+            'state': 'StartTranscription'
+        }
+        self.add_frame(data)
+        util.log(1, "FunASR客户端启动")
+
+    def is_connected(self):
+        """检查连接状态"""
+        return self.__connected
+    
+    def end(self):
+        """结束FunASR客户端"""
+        if self.__connected:
+            try:
+                # 发送剩余帧
+                for frame in self.__frames:
+                    self.__frames.pop(0)
+                    if type(frame) == dict:
+                        self.__ws.send(json.dumps(frame))
+                    elif type(frame) == bytes:
+                        self.__ws.send(frame, websocket.ABNF.OPCODE_BINARY)
+                        
+                self.__frames.clear()
+                
+                # 发送停止信号
+                frame = {'vad_need': False, 'state': 'StopTranscription'}
+                self.__ws.send(json.dumps(frame))
+                
+            except Exception as e:
+                util.log(3, f"结束FunASR时出错: {e}")
+                
+        self.__closing = True
+        self.__connected = False
+        util.log(1, "FunASR客户端结束")
+
+    
--- a/scheduler/__init__.py 0 → 100644
View file @f274f8c
+++ b/scheduler/__init__.py 0 → 100644
View file @f274f8c
+#!/usr/bin/env python3
+# -*- coding: utf-8 -*-
+"""
+AIfeng/2025-07-02 11:24:08
+Scheduler模块初始化文件
+"""
+
+from .thread_manager import MyThread
+
+__all__ = ['MyThread']
--- a/scheduler/thread_manager.py 0 → 100644
View file @f274f8c
+++ b/scheduler/thread_manager.py 0 → 100644
View file @f274f8c
+#!/usr/bin/env python3
+# -*- coding: utf-8 -*-
+"""
+AIfeng/2025-07-02 11:24:08
+线程管理器 - 提供增强的线程功能
+"""
+
+import threading
+import time
+import traceback
+from typing import Callable, Any, Optional
+from utils.util import log
+
+
+class MyThread(threading.Thread):
+    """增强的线程类，提供更好的错误处理和监控功能"""
+    
+    def __init__(self, target: Optional[Callable] = None, name: Optional[str] = None, 
+                 args: tuple = (), kwargs: dict = None, daemon: bool = True):
+        """
+        初始化线程
+        
+        Args:
+            target: 目标函数
+            name: 线程名称
+            args: 位置参数
+            kwargs: 关键字参数
+            daemon: 是否为守护线程
+        """
+        super().__init__(target=target, name=name, args=args, kwargs=kwargs or {}, daemon=daemon)
+        self._target = target
+        self._args = args
+        self._kwargs = kwargs or {}
+        self._result = None
+        self._exception = None
+        self._start_time = None
+        self._end_time = None
+        self._running = False
+    
+    def run(self):
+        """重写run方法，添加错误处理和监控"""
+        self._start_time = time.time()
+        self._running = True
+        
+        try:
+            if self._target:
+                log(1, f"线程 {self.name} 开始执行")
+                self._result = self._target(*self._args, **self._kwargs)
+                log(1, f"线程 {self.name} 执行完成")
+        except Exception as e:
+            self._exception = e
+            log(3, f"线程 {self.name} 执行出错: {e}")
+            log(3, f"错误详情: {traceback.format_exc()}")
+        finally:
+            self._end_time = time.time()
+            self._running = False
+            duration = self._end_time - self._start_time
+            log(1, f"线程 {self.name} 运行时长: {duration:.2f}秒")
+    
+    def get_result(self) -> Any:
+        """获取线程执行结果"""
+        if self.is_alive():
+            raise RuntimeError("线程仍在运行中")
+        
+        if self._exception:
+            raise self._exception
+        
+        return self._result
+    
+    def get_exception(self) -> Optional[Exception]:
+        """获取线程执行过程中的异常"""
+        return self._exception
+    
+    def get_duration(self) -> Optional[float]:
+        """获取线程运行时长（秒）"""
+        if self._start_time is None:
+            return None
+        
+        end_time = self._end_time or time.time()
+        return end_time - self._start_time
+    
+    def is_running(self) -> bool:
+        """检查线程是否正在运行"""
+        return self._running and self.is_alive()
+    
+    def stop_gracefully(self, timeout: float = 5.0) -> bool:
+        """优雅地停止线程
+        
+        Args:
+            timeout: 等待超时时间（秒）
+        
+        Returns:
+            bool: 是否成功停止
+        """
+        if not self.is_alive():
+            return True
+        
+        log(1, f"正在停止线程 {self.name}")
+        
+        # 等待线程自然结束
+        self.join(timeout=timeout)
+        
+        if self.is_alive():
+            log(2, f"线程 {self.name} 在 {timeout} 秒内未能自然结束")
+            return False
+        else:
+            log(1, f"线程 {self.name} 已成功停止")
+            return True
+    
+    def __str__(self) -> str:
+        """字符串表示"""
+        status = "运行中" if self.is_running() else "已停止"
+        duration = self.get_duration()
+        duration_str = f", 运行时长: {duration:.2f}秒" if duration else ""
+        return f"MyThread(name={self.name}, status={status}{duration_str})"
+    
+    def __repr__(self) -> str:
+        """详细字符串表示"""
+        return self.__str__()
+
+
+class ThreadManager:
+    """线程管理器，用于管理多个线程"""
+    
+    def __init__(self):
+        self._threads = {}
+        self._lock = threading.Lock()
+    
+    def create_thread(self, name: str, target: Callable, args: tuple = (), 
+                     kwargs: dict = None, daemon: bool = True) -> MyThread:
+        """创建新线程
+        
+        Args:
+            name: 线程名称
+            target: 目标函数
+            args: 位置参数
+            kwargs: 关键字参数
+            daemon: 是否为守护线程
+        
+        Returns:
+            MyThread: 创建的线程对象
+        """
+        with self._lock:
+            if name in self._threads:
+                raise ValueError(f"线程名称 '{name}' 已存在")
+            
+            thread = MyThread(target=target, name=name, args=args, 
+                            kwargs=kwargs or {}, daemon=daemon)
+            self._threads[name] = thread
+            return thread
+    
+    def start_thread(self, name: str) -> bool:
+        """启动指定线程
+        
+        Args:
+            name: 线程名称
+        
+        Returns:
+            bool: 是否成功启动
+        """
+        with self._lock:
+            if name not in self._threads:
+                log(3, f"线程 '{name}' 不存在")
+                return False
+            
+            thread = self._threads[name]
+            if thread.is_alive():
+                log(2, f"线程 '{name}' 已在运行中")
+                return False
+            
+            try:
+                thread.start()
+                log(1, f"线程 '{name}' 启动成功")
+                return True
+            except Exception as e:
+                log(3, f"启动线程 '{name}' 失败: {e}")
+                return False
+    
+    def stop_thread(self, name: str, timeout: float = 5.0) -> bool:
+        """停止指定线程
+        
+        Args:
+            name: 线程名称
+            timeout: 等待超时时间
+        
+        Returns:
+            bool: 是否成功停止
+        """
+        with self._lock:
+            if name not in self._threads:
+                log(3, f"线程 '{name}' 不存在")
+                return False
+            
+            thread = self._threads[name]
+            return thread.stop_gracefully(timeout)
+    
+    def stop_all_threads(self, timeout: float = 5.0) -> bool:
+        """停止所有线程
+        
+        Args:
+            timeout: 每个线程的等待超时时间
+        
+        Returns:
+            bool: 是否所有线程都成功停止
+        """
+        log(1, "正在停止所有线程...")
+        success = True
+        
+        with self._lock:
+            for name, thread in self._threads.items():
+                if thread.is_alive():
+                    if not thread.stop_gracefully(timeout):
+                        success = False
+        
+        if success:
+            log(1, "所有线程已成功停止")
+        else:
+            log(2, "部分线程未能在指定时间内停止")
+        
+        return success
+    
+    def get_thread_status(self) -> dict:
+        """获取所有线程状态
+        
+        Returns:
+            dict: 线程状态信息
+        """
+        status = {}
+        with self._lock:
+            for name, thread in self._threads.items():
+                status[name] = {
+                    'alive': thread.is_alive(),
+                    'running': thread.is_running(),
+                    'duration': thread.get_duration(),
+                    'exception': str(thread.get_exception()) if thread.get_exception() else None
+                }
+        return status
+    
+    def cleanup_finished_threads(self):
+        """清理已完成的线程"""
+        with self._lock:
+            finished_threads = [name for name, thread in self._threads.items() 
+                              if not thread.is_alive()]
+            
+            for name in finished_threads:
+                del self._threads[name]
+                log(1, f"已清理完成的线程: {name}")
+    
+    def __len__(self) -> int:
+        """返回线程数量"""
+        with self._lock:
+            return len(self._threads)
+    
+    def __contains__(self, name: str) -> bool:
+        """检查是否包含指定名称的线程"""
+        with self._lock:
+            return name in self._threads
+
+
+# 全局线程管理器实例
+thread_manager = ThreadManager()