戒酒的李白

Update README.

@@ -16,41 +16,437 @@ @@ -16,41 +16,437 @@
16 <img src="https://github.com/666ghj/Weibo_PublicOpinion_AnalysisSystem/blob/main/static/image/banner_compressed.png" alt="banner" width="800"> 16 <img src="https://github.com/666ghj/Weibo_PublicOpinion_AnalysisSystem/blob/main/static/image/banner_compressed.png" alt="banner" width="800">
17 </div> 17 </div>
18 18
19 - ### **[Important Announcement] Refactoring Plan for Weibo_PublicOpinion_AnalysisSystem** 19 +## Project Overview
20 20
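21 -Dear all contributors, users, and followers, 21 +**Weibo Public Opinion Multi-Agent Analysis System** is a public opinion analysis platform built from scratch on a multi-agent collaboration architecture. It aims to provide accurate, real-time, and comprehensive monitoring and analysis of public opinion on Weibo. Several specialized AI agents work together to automate the full pipeline, from data collection and sentiment analysis to report generation.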
22 22
23 -Hello everyone, 23 +### Key Features
24 24
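25 -I am the initiator and main developer of this project. First and foremost, I want to personally thank you for your continued attention, contributions, and enthusiasm for the `Weibo_PublicOpinion_AnalysisSystem` project. 25 +- **Multi-agent collaboration architecture**: 5 specialized agents work together, each with its own role
  26 +- **Comprehensive data collection**: combines a Weibo crawler, news search, and web information as data sources
  27 +- **In-depth sentiment analysis**: sentiment recognition based on fine-tuned BERT/GPT-2/Qwen models
  28 +- **Automated report generation**: generates structured HTML analysis reports automatically
  29 +- **Agent forum**: the Forum Engine gives agents a platform for sharing information and making decisions collaboratively
  30 +- **High-performance asynchronous processing**: handles multiple public opinion tasks concurrently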
26 31
27 -Over the past period, as the project has expanded, I have noticed several challenges that require attention: 32 +## System Architecture
28 33
29 -1. **Architectural and Module Issues:** Through rapid iteration, many modules have been integrated. However, a lack of unified top-level design has led to some module conflicts and a need for structural optimization.  
30 -2. **High Barrier to Entry:** A significant current challenge is that users need to configure their own crawlers and scrape data from scratch. This makes the deployment and startup process relatively complex, creating an inconvenience for many new users.  
31 -3. **Development and Presentation Limitations:** The development progress of various functional modules has been uneven. Additionally, the existing dashboard paradigm has limitations in compatibility and scalability that hinder my future development goals.  
32 -4. **Constraints of the Self-Trained Model:** Considering its size and maintenance costs, the previously trained model has become a constraint on the project's long-term development. 34 +### Overall Architecture Diagram
33 35
34 -After a careful evaluation of these points, and in light of current technological trends (especially in LLMs, and Agents), I have decided to initiate a **comprehensive, bottom-up architectural refactoring** of the project, with the goal of providing a more user-friendly tool for everyone. 36 +```mermaid
  37 +graph TB
  38 + subgraph "Frontend Presentation Layer"
  39 + UI[Web UI<br/>Flask + Streamlit]
  40 + end
35 41
36 -**My next update plan will focus on:** 42 + subgraph "Multi-Agent Collaboration Layer"
  43 + QE[QueryEngine<br/>News Search Agent]
  44 + ME[MediaEngine<br/>Multimedia Search Agent]
  45 + IE[InsightEngine<br/>Deep Insight Agent]
  46 + RE[ReportEngine<br/>Report Generation Agent]
  47 + Forum[ForumEngine<br/>Agent Forum Hub]
  48 + end
37 49
38 -1. **Optimizing the Core Architecture:** I will be moving away from the current dashboard-centric presentation to design a more lightweight and flexible system framework.  
39 -2. **Focusing on Core Competencies:** The new architecture will refocus my efforts on the crawling, processing, and in-depth analysis of Weibo data, aiming to build a stable and efficient data core.  
40 -3. **Integrating Advanced Large Language Models (LLMs):** I plan to discontinue maintenance of the self-trained model and will instead utilize APIs to call mainstream large language models for analysis tasks, enhancing the system's analytical capabilities and flexibility.  
41 -4. **The Ultimate Goal: A New Model of "Deployable Core + Online Service":**  
42 - - **For Developers:** I aim to refine the project into a **"minimal, user-friendly, low-cost, modular"** public opinion analysis **core engine** to facilitate secondary development and private deployment.  
43 - - **For General Users:** Leveraging the new architecture, I **plan to introduce a new "Online Service" version, designed to address the challenges of deployment and data acquisition.**  
44 - - **Providing a Shared Database:** I will begin building and maintaining a **continuously updated, shared database**. This will allow users to access our data source directly, **removing the need to configure and run their own crawlers.**  
45 - - **Simplifying the User Experience:** This will eliminate the need for a complex local setup, enabling a **click-to-use** experience.  
46 - - **Retaining Personalized Analysis:** Users will still be able to configure their own LLM API keys in the online service to perform personalized, in-depth analysis with our data core. 50 + subgraph "Data Processing Layer"
  51 + MS[MindSpider<br/>Weibo Crawler System]
  52 + SA[SentimentAnalysis<br/>Sentiment Analysis Models]
  53 + DB[(MySQL<br/>Database)]
  54 + end
47 55
48 -This refactoring is a necessary step in our development. I understand this will require adjusting and, in some cases, rewriting code to which many of you have contributed. However, for the long-term health of the project and to make it accessible to a broader audience, I believe this step is essential. 56 + subgraph "External Services Layer"
  57 + LLM[LLM API<br/>DeepSeek/Kimi/Gemini]
  58 + Search[Search API<br/>Tavily/Bocha]
  59 + end
49 60
50 -In the coming weeks, I will begin to outline the new project blueprint and will keep the community updated on my progress. I value your wisdom and support now more than ever. 61 + UI --> QE
  62 + UI --> ME
  63 + UI --> IE
  64 + UI --> RE
51 65
52 -Thank you once again for your understanding and support! Let's look forward to the next evolution of `Weibo_PublicOpinion_AnalysisSystem`. 66 + QE --> Search
  67 + ME --> Search
  68 + IE --> MS
  69 + IE --> SA
53 70
54 -Sincerely, 71 + QE --> LLM
  72 + ME --> LLM
  73 + IE --> LLM
  74 + RE --> LLM
55 75
56 -Project Initiator 76 + MS --> DB
  77 + SA --> DB
  78 +
  79 + %% Agent forum communication
  80 + QE <--> Forum
  81 + ME <--> Forum
  82 + IE <--> Forum
  83 + RE <--> Forum
  84 +
  85 + style UI fill:#e1f5fe
  86 + style QE fill:#fff3e0
  87 + style ME fill:#fff3e0
  88 + style IE fill:#fff3e0
  89 + style RE fill:#f3e5f5
  90 + style Forum fill:#e8f5e9
  91 + style MS fill:#fce4ec
  92 + style SA fill:#fce4ec
  93 + style DB fill:#fff9c4
  94 + style LLM fill:#e3f2fd
  95 + style Search fill:#e3f2fd
  96 +```
  97 +
  98 +### Data Flow Diagram
  99 +
  100 +```mermaid
  101 +sequenceDiagram
  102 + participant User as User
  103 + participant UI as Web UI
  104 + participant QE as QueryEngine
  105 + participant ME as MediaEngine
  106 + participant IE as InsightEngine
  107 + participant Forum as ForumEngine
  108 + participant RE as ReportEngine
  109 + participant DB as Database
  110 +
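  111 + User->>UI: Enter query keywords
  112 + UI->>QE: Send search request
  113 + UI->>ME: Send search request
  114 + UI->>IE: Send search request
  115 +
  116 + Note over QE,IE: Agents read forum messages before running
  117 + QE->>Forum: Read forum messages
  118 + ME->>Forum: Read forum messages
  119 + IE->>Forum: Read forum messages
  120 +
  121 + par Parallel processing with continuous chain-of-thought exchange
  122 + Note over QE: Structure planning → reflective search → ongoing exchange
  123 + QE->>QE: Decide news search structure
  124 + QE->>Forum: Chain-of-thought exchange (structure planning)
  125 + QE->>QE: Multi-step reflection and search analysis
  126 + QE->>Forum: Chain-of-thought exchange (search progress)
  127 + QE->>QE: Generate summary report
  128 + QE->>Forum: Chain-of-thought exchange (key findings)
  129 + and
  130 + Note over ME: Structure planning → reflective search → ongoing exchange
  131 + ME->>ME: Decide multimedia search structure
  132 + ME->>Forum: Chain-of-thought exchange (structure planning)
  133 + ME->>ME: Multi-step reflection and search analysis
  134 + ME->>Forum: Chain-of-thought exchange (search progress)
  135 + ME->>ME: Generate summary report
  136 + ME->>Forum: Chain-of-thought exchange (key findings)
  137 + and
  138 + Note over IE: Structure planning → reflective search → ongoing exchange
  139 + IE->>IE: Decide insight analysis structure
  140 + IE->>Forum: Chain-of-thought exchange (structure planning)
  141 + IE->>DB: Query Weibo data
  142 + IE->>IE: Multi-step reflection and sentiment insight
  143 + IE->>Forum: Chain-of-thought exchange (insight progress)
  144 + IE->>IE: Generate summary report
  145 + IE->>Forum: Chain-of-thought exchange (key findings)
  146 + end
  147 +
  148 + Note over Forum: Forum aggregates agent messages
  149 + Forum->>RE: Trigger report generation
  150 + RE->>Forum: Read all agent messages
  151 + RE->>QE: Fetch QueryEngine summary report
  152 + RE->>ME: Fetch MediaEngine summary report
  153 + RE->>IE: Fetch InsightEngine summary report
  154 +
  155 + Note over RE: ReportEngine report generation
  156 + RE->>RE: Load the template and style libraries and select from them
  157 + RE->>RE: Generate each report section step by step
  158 + RE->>RE: Assemble the final report
  159 + RE->>UI: Deliver the combined HTML report
  160 + UI->>User: Display analysis results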
  161 +```
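The flow in the sequence diagram can be sketched in a few lines of Python. This is a minimal illustration only, not the project's implementation: the `Forum` class, `run_agent`, and `analyze` are hypothetical names, the forum is an in-memory list, and `asyncio.sleep(0)` stands in for the real search and LLM calls.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Forum:
    """Hypothetical in-memory stand-in for ForumEngine."""
    posts: list = field(default_factory=list)

    def read(self):
        return list(self.posts)

    def post(self, agent, stage, text):
        self.posts.append((agent, stage, text))


async def run_agent(name, forum, query):
    forum.read()  # agents read the forum before acting
    for stage in ("structure", "progress", "key findings"):
        await asyncio.sleep(0)  # placeholder for search / LLM work
        forum.post(name, stage, f"{name} {stage} for {query!r}")
    return f"{name} summary for {query!r}"


async def analyze(query):
    forum = Forum()
    names = ("QueryEngine", "MediaEngine", "InsightEngine")
    # The three agents run concurrently, mirroring the `par` block.
    summaries = await asyncio.gather(*(run_agent(n, forum, query) for n in names))
    # ReportEngine step: gather forum posts plus each agent's summary.
    return {"forum_posts": forum.read(), "summaries": dict(zip(names, summaries))}


result = asyncio.run(analyze("example topic"))
```

Each agent posts three times, so `result["forum_posts"]` holds nine entries that a report step could combine with the three summaries.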
  162 +
  163 +## Project Structure
  164 +
  165 +```
  166 +Weibo_PublicOpinion_AnalysisSystem/
  167 +├── QueryEngine/ # Web query engine agent
  168 +│ ├── agent.py # Main agent logic
  169 +│ ├── llms/ # LLM interface wrappers
  170 +│ ├── nodes/ # Processing nodes
  171 +│ ├── tools/ # Search tools
  172 +│ └── utils/ # Utility functions
  173 +├── MediaEngine/ # Media engine agent
  174 +│ └── (similar structure)
  175 +├── InsightEngine/ # Database engine agent
  176 +│ └── (similar structure)
  177 +├── ReportEngine/ # Report generation agent
  178 +│ ├── report_template/ # Report templates
  179 +│ └── flask_interface.py # API interface
  180 +├── ForumEngine/ # Forum agent
  181 +│ └── monitor.py # Forum communication manager
  182 +├── MindSpider/ # Weibo crawler system
  183 +│ ├── BroadTopicExtraction/ # Topic extraction
  184 +│ ├── DeepSentimentCrawling/ # Deep crawling
  185 +│ └── schema/ # Database schema
  186 +├── SentimentAnalysisModel/ # Sentiment analysis models
  187 +│ ├── BertTopicDetection_Finetuned/
  188 +│ ├── WeiboSentiment_Finetuned/
  189 +│ └── WeiboSentiment_MachineLearning/
  190 +├── SingleEngineApp/ # Streamlit apps
  191 +├── templates/ # Flask templates
  192 +├── static/ # Static assets
  193 +├── logs/ # Runtime logs
  194 +├── app.py # Main application entry point
  195 +├── config.py # Configuration file
  196 +└── requirements.txt # Dependencies
  197 +```
  198 +
  199 +## Quick Start
  200 +
  201 +### Requirements
  202 +
  203 +- **Operating system**: Windows 10/11
  204 +- **Python**: 3.11+
  205 +- **Conda**: Anaconda or Miniconda
  206 +- **Database**: MySQL 8.0+
  207 +- **Memory**: 8 GB or more recommended
  208 +
  209 +### 1. Create the Conda Environment
  210 +
  211 +```bash
  212 +# Create a conda environment named pytorch_python11
  213 +conda create -n pytorch_python11 python=3.11
  214 +conda activate pytorch_python11
  215 +```
  216 +
  217 +### 2. Install Dependencies
  218 +
  219 +```bash
  220 +# Install the base dependencies
  221 +pip install -r requirements.txt
  222 +
  223 +# For sentiment analysis, install PyTorch (pick the build that matches your CUDA version)
  224 +# CPU build
  225 +pip install torch torchvision torchaudio
  226 +
  227 +# CUDA 11.8 build
  228 +pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
  229 +
  230 +# Install transformers (for the BERT/GPT models)
  231 +pip install transformers scikit-learn xgboost
  232 +```
  233 +
  234 +### 3. Install the Playwright Browser Driver
  235 +
  236 +```bash
  237 +# Install the browser driver (used by the crawler)
  238 +playwright install chromium
  239 +```
  240 +
  241 +### 4. Configure the System
  242 +
  243 +#### 4.1 Configure API Keys
  244 +
  245 +Edit `config.py` and fill in your database settings and API keys:
  246 +
  247 +```python
  248 +# MySQL database configuration
  249 +DB_HOST = "localhost"
  250 +DB_PORT = 3306
  251 +DB_USER = "your_username"
  252 +DB_PASSWORD = "your_password"
  253 +DB_NAME = "weibo_analysis"
  254 +DB_CHARSET = "utf8mb4"
  255 +
  256 +# DeepSeek API (sign up: https://www.deepseek.com/)
  257 +DEEPSEEK_API_KEY = "your_deepseek_api_key"
  258 +
  259 +# Tavily search API (sign up: https://www.tavily.com/)
  260 +TAVILY_API_KEY = "your_tavily_api_key"
  261 +
  262 +# Kimi API (sign up: https://www.kimi.com/)
  263 +KIMI_API_KEY = "your_kimi_api_key"
  264 +
  265 +# Gemini API (sign up: https://api.chataiapi.com/)
  266 +GEMINI_API_KEY = "your_gemini_api_key"
  267 +
  268 +# Bocha search API (sign up: https://open.bochaai.com/)
  269 +BOCHA_Web_Search_API_KEY = "your_bocha_api_key"
  270 +
  271 +# SiliconFlow API (sign up: https://siliconflow.cn/)
  272 +GUIJI_QWEN3_API_KEY = "your_guiji_api_key"
  273 +```
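Before starting the system, it can help to check that no placeholder values were left in the configuration. The sketch below is an assumption-laden illustration: `find_unset_keys` is a hypothetical helper that is not part of the project, and it treats any empty string or any value starting with `your_` as unset.

```python
def find_unset_keys(settings):
    """Return the names of settings that are empty or still hold a 'your_...' placeholder."""
    return sorted(
        name
        for name, value in settings.items()
        if isinstance(value, str) and (not value.strip() or value.startswith("your_"))
    )


# Example values only; a real run would pass the upper-case names from config.py.
example = {
    "DB_HOST": "localhost",
    "DB_PORT": 3306,
    "DEEPSEEK_API_KEY": "your_deepseek_api_key",
    "TAVILY_API_KEY": "tvly-example",
}
missing = find_unset_keys(example)
print("Unset settings:", missing)
```

With the real module, one option is `find_unset_keys({k: v for k, v in vars(config).items() if k.isupper()})` after `import config`.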
  274 +
  275 +#### 4.2 Initialize the Database
  276 +
  277 +```bash
  278 +cd MindSpider
  279 +python schema/init_database.py
  280 +```
  281 +
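  282 +### 5. Start the System
  283 +
  284 +#### Option 1: Start the full system (recommended)
  285 +
  286 +```bash
  287 +# From the project root, activate the conda environment
  288 +conda activate pytorch_python11
  289 +
  290 +# Start the main application (starts all agents automatically)
  291 +python app.py
  292 +```
  293 +
  294 +Open http://localhost:5000 to use the system.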
  295 +
  296 +#### Option 2: Start a single agent
  297 +
  298 +```bash
  299 +# Start QueryEngine
  300 +streamlit run SingleEngineApp/query_engine_streamlit_app.py --server.port 8503
  301 +
  302 +# Start MediaEngine
  303 +streamlit run SingleEngineApp/media_engine_streamlit_app.py --server.port 8502
  304 +
  305 +# Start InsightEngine
  306 +streamlit run SingleEngineApp/insight_engine_streamlit_app.py --server.port 8501
  307 +```
  308 +
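  309 +## Usage Guide
  310 +
  311 +### Basic Workflow
  312 +
  313 +1. **Start the system**: run `python app.py`; all agents start automatically
  314 +
  315 +2. **Enter a query**: type the public opinion keywords to analyze into the search box in the web UI
  316 +
  317 +3. **Agent collaboration**
  318 + - QueryEngine: searches news and official reports, and posts key findings to the forum
  319 + - MediaEngine: searches multimedia content and shares important information with the other agents
  320 + - InsightEngine: analyzes Weibo data and sentiment, and discusses its insights on the forum
  321 + - ForumEngine: provides the communication platform for the agents and aggregates their messages
  322 +
  323 +4. **View results**
  324 + - Agent forum: watch the real-time exchange of information between agents
  325 + - Analysis report: read the combined HTML report produced from the agents' collaboration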
  326 +
  327 +### Advanced Configuration
  328 +
  329 +#### Configure the Crawler
  330 +
  331 +1. **Set crawler parameters**
  332 +```python
  333 +# MindSpider/config.py
  334 +CRAWLER_CONFIG = {
  335 + 'max_pages': 100, # Maximum number of pages to crawl
  336 + 'delay': 1, # Delay between requests (seconds)
  337 + 'timeout': 30, # Timeout (seconds)
  338 + 'use_proxy': False, # Whether to use a proxy
  339 +}
  340 +```
  341 +
  342 +2. **Run the crawler**
  343 +```bash
  344 +cd MindSpider
  345 +python main.py --topic "topic keywords" --days 7
  346 +```
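To make the crawler parameters concrete, here is a rough sketch of how a paging loop might honor `max_pages`, `delay`, and `timeout`. It is an illustration under assumptions, not MindSpider's actual code: `crawl` and `fake_fetch` are hypothetical, and the fetch function is passed in so the loop can be tried without any network access.

```python
import time

CRAWLER_CONFIG = {"max_pages": 100, "delay": 1, "timeout": 30, "use_proxy": False}


def crawl(fetch_page, config, sleep=time.sleep):
    """Call fetch_page(page, timeout) until it returns nothing or max_pages is reached."""
    items = []
    for page in range(1, config["max_pages"] + 1):
        batch = fetch_page(page, config["timeout"])
        if not batch:
            break  # no more results
        items.extend(batch)
        sleep(config["delay"])  # pause between requests
    return items


def fake_fetch(page, timeout):
    # Hypothetical stand-in for a real request: three pages of one post each.
    return [f"post-{page}"] if page <= 3 else []


posts = crawl(fake_fetch, {**CRAWLER_CONFIG, "delay": 0})
print(posts)
```

Passing `sleep` as a parameter keeps the delay easy to disable in tests.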
  347 +
  348 +#### Configure the Sentiment Analysis Model
  349 +
  350 +1. **Choose a model**
  351 + - Fine-tuned BERT (high accuracy)
  352 + - GPT-2 LoRA (fast)
  353 + - Small Qwen model (balanced)
  354 + - Machine-learning baseline (lightweight)
  355 +
  356 +2. **Switch models**
  357 +```python
  358 +# InsightEngine/tools/sentiment_analyzer.py
  359 +MODEL_TYPE = "bert" # Options: "bert", "gpt2", "qwen", "ml"
  360 +```
  361 +
  362 +#### Custom Report Templates
  363 +
  364 +Create a new template in the `ReportEngine/report_template/` directory:
  365 +
  366 +```markdown
  367 +# Custom Report Template
  368 +## Public Opinion Overview
  369 +${overview}
  370 +
  371 +## Sentiment Analysis
  372 +${sentiment_analysis}
  373 +
  374 +## Key Insights
  375 +${key_insights}
  376 +
  377 +## Trend Forecast
  378 +${trend_prediction}
  379 +```
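The README does not say how ReportEngine fills these placeholders. As an illustration only, the `${name}` syntax happens to match Python's standard `string.Template`, so a template like the one above could be rendered as follows; `render_report` and the sample values are hypothetical.

```python
from string import Template

template_text = """# Custom Report Template
## Public Opinion Overview
${overview}

## Sentiment Analysis
${sentiment_analysis}
"""


def render_report(text, values):
    # safe_substitute leaves unknown placeholders untouched instead of raising KeyError.
    return Template(text).safe_substitute(values)


report = render_report(template_text, {"overview": "Example overview text."})
print(report)
```

Using `safe_substitute` means a partially filled report still renders, with any missing sections left visible as `${...}` markers.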
  380 +
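  381 +### Monitoring and Logs
  382 +
  383 +#### View System Logs
  384 +
  385 +All log files are in the `logs/` directory:
  386 +- `query.log`: QueryEngine runtime log
  387 +- `media.log`: MediaEngine runtime log
  388 +- `insight.log`: InsightEngine runtime log
  389 +- `forum.log`: ForumEngine forum communication log
  390 +- `report.log`: ReportEngine report generation log
  391 +
  392 +#### Agent Forum
  393 +
  394 +ForumEngine supports communication among the agents:
  395 +1. Agents read forum messages before acting
  396 +2. After reasoning, each agent decides whether to share its key findings
  397 +3. The forum aggregates the messages from all agents
  398 +4. The aggregated messages give ReportEngine its collaborative data basis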
  399 +
  400 +## Troubleshooting
  401 +
  402 +### Common Issues
  403 +
  404 +#### 1. Port Already in Use
  405 +```bash
  406 +# Check which process is using the port (Windows)
  407 +netstat -ano | findstr :5000
  408 +netstat -ano | findstr :8501
  409 +
  410 +# Kill the process holding the port
  411 +taskkill /F /PID <PID>
  412 +```
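The same check can be done from Python before launching the apps. This is a small standard-library sketch; `port_in_use` is a hypothetical helper, and the port list simply repeats the ports mentioned in this README.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


for port in (5000, 8501, 8502, 8503):
    state = "in use" if port_in_use(port) else "free"
    print(f"port {port}: {state}")
```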
  413 +
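  414 +#### 2. Encoding Issues
  415 +```python
  416 +# Add at the top of the entry script. Python reads these variables at startup, so setting them here only affects child processes; set them in the shell to cover the current process too.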
  417 +import sys
  418 +import os
  419 +os.environ['PYTHONIOENCODING'] = 'utf-8'
  420 +os.environ['PYTHONUTF8'] = '1'
  421 +```
  422 +
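  423 +#### 3. Playwright Installation Fails
  424 +```bash
  425 +# Install manually
  426 +python -m playwright install chromium --with-deps
  427 +```
  428 +
  429 +#### 4. MySQL Connection Fails
  430 +- Check that the MySQL service is running
  431 +- Confirm that the user's permissions are configured correctly
  432 +- Check the firewall settings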
  433 +
  434 +## Contributing
  435 +
  436 +Contributions of all kinds are welcome!
  437 +
  438 +1. Fork the project
  439 +2. Create a feature branch (`git checkout -b feature/AmazingFeature`)
  440 +3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
  441 +4. Push to the branch (`git push origin feature/AmazingFeature`)
  442 +5. Open a Pull Request
  443 +
  444 +## License
  445 +
  446 +This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.
  447 +
  448 +## Contact
  449 +
  450 +- Project: [https://github.com/666ghj/Weibo_PublicOpinion_AnalysisSystem](https://github.com/666ghj/Weibo_PublicOpinion_AnalysisSystem)
  451 +- Email: 670939375@qq.com
  452 +- Issues: [Project Issues](https://github.com/666ghj/Weibo_PublicOpinion_AnalysisSystem/issues)
1 -# Dependencies for the multi-agent public opinion analysis system 1 +# ========================================
  2 +# Dependencies for the Weibo Public Opinion Multi-Agent Analysis System
  3 +# For Windows + Conda deployment
  4 +# ========================================
2 5
3 -# Core dependencies  
4 -streamlit>=1.28.0  
5 -requests>=2.31.0  
6 -python-dotenv>=1.0.0 6 +# ===== Core web framework =====
  7 +flask==2.3.3
  8 +flask-socketio==5.3.6
  9 +streamlit==1.28.1
  10 +python-socketio==5.8.0
  11 +eventlet==0.33.3
  12 +
  13 +# ===== HTTP requests and async =====
  14 +requests==2.31.0
  15 +httpx==0.28.1
  16 +aiofiles==23.2.1
  17 +aiohttp>=3.8.0
7 18
8 -# LLM interface 19 +# ===== LLM interface =====
9 openai>=1.3.0 20 openai>=1.3.0
10 -deepseek-ai>=0.1.0 21 +# deepseek-ai>=0.1.0 # uses the OpenAI-compatible format
11 22
12 -# Search API 23 +# ===== Search API =====
13 tavily-python>=0.3.0 24 tavily-python>=0.3.0
14 25
15 -# Data processing 26 +# ===== Data processing =====
16 pandas>=2.0.0 27 pandas>=2.0.0
17 numpy>=1.24.0 28 numpy>=1.24.0
  29 +regex>=2023.8.8
  30 +jieba==0.42.1
  31 +
  32 +# ===== Database =====
  33 +pymysql==1.1.0
  34 +aiomysql==0.2.0
  35 +aiosqlite==0.21.0
  36 +redis>=4.6.0
  37 +
  38 +# ===== Crawler =====
  39 +playwright==1.45.0
  40 +Pillow==9.5.0
  41 +opencv-python>=4.8.0
  42 +beautifulsoup4>=4.12.0
  43 +lxml>=4.9.0
  44 +parsel==1.9.1
  45 +pyexecjs==1.5.1
18 46
19 -# Visualization 47 +# ===== Visualization =====
20 plotly>=5.17.0 48 plotly>=5.17.0
  49 +matplotlib==3.9.0
  50 +wordcloud==1.9.3
21 51
22 -# Utilities  
23 -python-dateutil>=2.8.2  
24 -uuid>=1.30 52 +# ===== Machine learning (optional, for sentiment analysis) =====
  53 +# torch>=2.0.0 # install the CUDA build separately
  54 +# transformers>=4.30.0
  55 +# scikit-learn>=1.3.0
  56 +# xgboost>=2.0.0
25 57
26 -# Bocha API  
27 -# Note: the Bocha API uses plain HTTP requests and needs no extra package 58 +# ===== Utilities =====
  59 +python-dotenv>=1.0.0
  60 +python-dateutil>=2.8.2
  61 +pytz>=2023.3
  62 +tqdm>=4.65.0
  63 +tenacity==8.2.2
  64 +loguru>=0.7.0
  65 +pydantic==2.5.2
28 66
29 -# Development tools 67 +# ===== Development tools (optional) =====
30 pytest>=7.4.0 68 pytest>=7.4.0
31 black>=23.0.0 69 black>=23.0.0
32 flake8>=6.0.0 70 flake8>=6.0.0
33 71
34 -# Flask web application  
35 -flask==2.3.3  
36 -flask-socketio==5.3.6  
37 -streamlit==1.28.1  
38 -requests==2.31.0  
39 -python-socketio==5.8.0  
40 -eventlet==0.33.3  
  72 +# ===== Web server =====
  73 +fastapi==0.110.2
  74 +uvicorn==0.29.0