戒酒的李白

Update README.

... ... @@ -16,41 +16,437 @@
<img src="https://github.com/666ghj/Weibo_PublicOpinion_AnalysisSystem/blob/main/static/image/banner_compressed.png" alt="banner" width="800">
</div>
### **[Important Announcement] Refactoring Plan for Weibo_PublicOpinion_AnalysisSystem**
## 项目概述
Dear all contributors, users, and followers,
**Weibo舆情分析多智能体系统** 是一个从零构建的创新型舆情分析平台,采用多Agent协作架构,致力于提供准确、实时、全面的微博舆情监测与分析服务。系统通过多个专门化的AI Agent协同工作,实现了从数据采集、情感分析到报告生成的全流程自动化。
Hello everyone,
### 核心特色
I am the initiator and main developer of this project. First and foremost, I want to personally thank you for your continued attention, contributions, and enthusiasm for the `Weibo_PublicOpinion_AnalysisSystem` project.
- **多智能体协作架构**:5个专门化Agent协同工作,各司其职
- **全方位数据采集**:整合微博爬虫、新闻搜索、网络信息多维度数据源
- **深度情感分析**:基于微调BERT/GPT-2/Qwen模型的精准情感识别
- **智能报告生成**:自动生成结构化HTML分析报告
- **Agent论坛交流**:Forum Engine提供Agent间信息共享和协作决策平台
- **高性能异步处理**:支持并发处理多个舆情任务
Over the past period, as the project has expanded, I have noticed several challenges that require attention:
## 系统架构
1. **Architectural and Module Issues:** Through rapid iteration, many modules have been integrated. However, a lack of unified top-level design has led to some module conflicts and a need for structural optimization.
2. **High Barrier to Entry:** A significant current challenge is that users need to configure their own crawlers and scrape data from scratch. This makes the deployment and startup process relatively complex, creating an inconvenience for many new users.
3. **Development and Presentation Limitations:** The development progress of various functional modules has been uneven. Additionally, the existing dashboard paradigm has limitations in compatibility and scalability that hinder my future development goals.
4. **Constraints of the Self-Trained Model:** Considering its size and maintenance costs, the previously trained model has become a constraint on the project's long-term development.
### 整体架构图
After a careful evaluation of these points, and in light of current technological trends (especially in LLMs, and Agents), I have decided to initiate a **comprehensive, bottom-up architectural refactoring** of the project, with the goal of providing a more user-friendly tool for everyone.
```mermaid
graph TB
subgraph "前端展示层"
UI[Web界面<br/>Flask + Streamlit]
end
**My next update plan will focus on:**
subgraph "多Agent协作层"
QE[QueryEngine<br/>新闻搜索Agent]
ME[MediaEngine<br/>多媒体搜索Agent]
IE[InsightEngine<br/>深度洞察Agent]
RE[ReportEngine<br/>报告生成Agent]
Forum[ForumEngine<br/>Agent论坛交流中心]
end
1. **Optimizing the Core Architecture:** I will be moving away from the current dashboard-centric presentation to design a more lightweight and flexible system framework.
2. **Focusing on Core Competencies:** The new architecture will refocus my efforts on the crawling, processing, and in-depth analysis of Weibo data, aiming to build a stable and efficient data core.
3. **Integrating Advanced Large Language Models (LLMs):** I plan to discontinue maintenance of the self-trained model and will instead utilize APIs to call mainstream large language models for analysis tasks, enhancing the system's analytical capabilities and flexibility.
4. **The Ultimate Goal: A New Model of "Deployable Core + Online Service":**
- **For Developers:** I aim to refine the project into a **"minimal, user-friendly, low-cost, modular"** public opinion analysis **core engine** to facilitate secondary development and private deployment.
- **For General Users:** Leveraging the new architecture, I **plan to introduce a new "Online Service" version, designed to address the challenges of deployment and data acquisition.**
- **Providing a Shared Database:** I will begin building and maintaining a **continuously updated, shared database**. This will allow users to access our data source directly, **removing the need to configure and run their own crawlers.**
- **Simplifying the User Experience:** This will eliminate the need for a complex local setup, enabling a **click-to-use** experience.
- **Retaining Personalized Analysis:** Users will still be able to configure their own LLM API keys in the online service to perform personalized, in-depth analysis with our data core.
subgraph "数据处理层"
MS[MindSpider<br/>微博爬虫系统]
SA[SentimentAnalysis<br/>情感分析模型]
DB[(MySQL<br/>数据库)]
end
This refactoring is a necessary step in our development. I understand this will require adjusting and, in some cases, rewriting code to which many of you have contributed. However, for the long-term health of the project and to make it accessible to a broader audience, I believe this step is essential.
subgraph "外部服务层"
LLM[LLM API<br/>DeepSeek/Kimi/Gemini]
Search[搜索API<br/>Tavily/Bocha]
end
In the coming weeks, I will begin to outline the new project blueprint and will keep the community updated on my progress. I value your wisdom and support now more than ever.
UI --> QE
UI --> ME
UI --> IE
UI --> RE
Thank you once again for your understanding and support! Let's look forward to the next evolution of `Weibo_PublicOpinion_AnalysisSystem`.
QE --> Search
ME --> Search
IE --> MS
IE --> SA
Sincerely,
QE --> LLM
ME --> LLM
IE --> LLM
RE --> LLM
Project Initiator
MS --> DB
SA --> DB
%% Agent论坛交流机制
QE <--> Forum
ME <--> Forum
IE <--> Forum
RE <--> Forum
style UI fill:#e1f5fe
style QE fill:#fff3e0
style ME fill:#fff3e0
style IE fill:#fff3e0
style RE fill:#f3e5f5
style Forum fill:#e8f5e9
style MS fill:#fce4ec
style SA fill:#fce4ec
style DB fill:#fff9c4
style LLM fill:#e3f2fd
style Search fill:#e3f2fd
```
### 数据流程图
```mermaid
sequenceDiagram
participant User as 用户
participant UI as Web界面
participant QE as QueryEngine
participant ME as MediaEngine
participant IE as InsightEngine
participant Forum as ForumEngine
participant RE as ReportEngine
participant DB as 数据库
User->>UI: 输入查询关键词
UI->>QE: 发起搜索请求
UI->>ME: 发起搜索请求
UI->>IE: 发起搜索请求
Note over QE,IE: Agent执行前先读取论坛信息
QE->>Forum: 读取论坛交流信息
ME->>Forum: 读取论坛交流信息
IE->>Forum: 读取论坛交流信息
par 并行处理与持续思维链交流
Note over QE: 结构思考→反思搜索→持续交流
QE->>QE: 确定新闻搜索结构
QE->>Forum: 思维链交流(结构思考)
QE->>QE: 多步反思与搜索分析
QE->>Forum: 思维链交流(搜索进展)
QE->>QE: 生成汇总报告
QE->>Forum: 思维链交流(关键发现)
and
Note over ME: 结构思考→反思搜索→持续交流
ME->>ME: 确定多媒体搜索结构
ME->>Forum: 思维链交流(结构思考)
ME->>ME: 多步反思与搜索分析
ME->>Forum: 思维链交流(搜索进展)
ME->>ME: 生成汇总报告
ME->>Forum: 思维链交流(关键发现)
and
Note over IE: 结构思考→反思搜索→持续交流
IE->>IE: 确定洞察分析结构
IE->>Forum: 思维链交流(结构思考)
IE->>DB: 查询微博数据
IE->>IE: 多步反思与情感洞察
IE->>Forum: 思维链交流(洞察进展)
IE->>IE: 生成汇总报告
IE->>Forum: 思维链交流(关键发现)
end
Note over Forum: 论坛汇总Agent交流信息
Forum->>RE: 触发报告生成
RE->>Forum: 读取所有Agent的交流信息
RE->>QE: 获取QueryEngine汇总报告
RE->>ME: 获取MediaEngine汇总报告
RE->>IE: 获取InsightEngine汇总报告
Note over RE: ReportEngine智能报告生成
RE->>RE: 读取模板库与样式库并选择
RE->>RE: 分步思考生成报告各部分
RE->>RE: 整合生成最终报告
RE->>UI: 生成综合HTML报告
UI->>User: 展示分析结果
```
## 项目结构
```
Weibo_PublicOpinion_AnalysisSystem/
├── QueryEngine/ # web查询引擎Agent
│ ├── agent.py # Agent主逻辑
│ ├── llms/ # LLM接口封装
│ ├── nodes/ # 处理节点
│ ├── tools/ # 搜索工具
│ └── utils/ # 工具函数
├── MediaEngine/ # 媒体引擎Agent
│ └── (类似结构)
├── InsightEngine/ # 数据库引擎Agent
│ └── (类似结构)
├── ReportEngine/ # 报告生成Agent
│ ├── report_template/ # 报告模板
│ └── flask_interface.py # API接口
├── ForumEgine/ # 论坛交流Agent
│ └── monitor.py # 论坛交流管理器
├── MindSpider/ # 微博爬虫系统
│ ├── BroadTopicExtraction/ # 话题提取
│ ├── DeepSentimentCrawling/ # 深度爬取
│ └── schema/ # 数据库结构
├── SentimentAnalysisModel/ # 情感分析模型
│ ├── BertTopicDetection_Finetuned/
│ ├── WeiboSentiment_Finetuned/
│ └── WeiboSentiment_MachineLearning/
├── SingleEngineApp/ # Streamlit应用
├── templates/ # Flask模板
├── static/ # 静态资源
├── logs/ # 运行日志
├── app.py # 主应用入口
├── config.py # 配置文件
└── requirements.txt # 依赖包
```
## 快速开始
### 环境要求
- **操作系统**: Windows 10/11
- **Python版本**: 3.11+
- **Conda**: Anaconda或Miniconda
- **数据库**: MySQL 8.0+
- **内存**: 建议8GB以上
### 1. 创建Conda环境
```bash
# 创建名为pytorch_python11的conda环境
conda create -n pytorch_python11 python=3.11
conda activate pytorch_python11
```
### 2. 安装依赖包
```bash
# 基础依赖安装
pip install -r requirements.txt
# 如果需要情感分析功能,安装PyTorch(根据CUDA版本选择)
# CPU版本
pip install torch torchvision torchaudio
# CUDA 11.8版本
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
# 安装transformers(用于BERT/GPT模型)
pip install transformers scikit-learn xgboost
```
### 3. 安装Playwright浏览器驱动
```bash
# 安装浏览器驱动(用于爬虫功能)
playwright install chromium
```
### 4. 配置系统
#### 4.1 配置API密钥
编辑 `config.py` 文件,填入您的API密钥:
```python
# MySQL数据库配置
DB_HOST = "localhost"
DB_PORT = 3306
DB_USER = "your_username"
DB_PASSWORD = "your_password"
DB_NAME = "weibo_analysis"
DB_CHARSET = "utf8mb4"
# DeepSeek API(申请地址:https://www.deepseek.com/)
DEEPSEEK_API_KEY = "your_deepseek_api_key"
# Tavily搜索API(申请地址:https://www.tavily.com/)
TAVILY_API_KEY = "your_tavily_api_key"
# Kimi API(申请地址:https://www.kimi.com/)
KIMI_API_KEY = "your_kimi_api_key"
# Gemini API(申请地址:https://api.chataiapi.com/)
GEMINI_API_KEY = "your_gemini_api_key"
# 博查搜索API(申请地址:https://open.bochaai.com/)
BOCHA_Web_Search_API_KEY = "your_bocha_api_key"
# 硅基流动API(申请地址:https://siliconflow.cn/)
GUIJI_QWEN3_API_KEY = "your_guiji_api_key"
```
#### 4.2 初始化数据库
```bash
cd MindSpider
python schema/init_database.py
```
### 5. 启动系统
#### 方式一:完整系统启动(推荐)
```bash
# 在项目根目录下,激活conda环境
conda activate pytorch_python11
# 启动主应用(自动启动所有Agent)
python app.py
```
访问 http://localhost:5000 即可使用系统
#### 方式二:单独启动某个Agent
```bash
# 启动QueryEngine
streamlit run SingleEngineApp/query_engine_streamlit_app.py --server.port 8503
# 启动MediaEngine
streamlit run SingleEngineApp/media_engine_streamlit_app.py --server.port 8502
# 启动InsightEngine
streamlit run SingleEngineApp/insight_engine_streamlit_app.py --server.port 8501
```
## 使用指南
### 基础使用流程
1. **启动系统**:运行 `python app.py`,系统会自动启动所有Agent
2. **输入查询**:在Web界面搜索框输入要分析的舆情关键词
3. **Agent协作**
- QueryEngine:搜索新闻和官方报道,将关键发现发布到论坛
- MediaEngine:搜索多媒体内容,与其他Agent分享重要信息
- InsightEngine:分析微博数据和情感,在论坛中交流洞察
- ForumEngine:提供Agent间交流平台,汇总协作信息
4. **查看结果**
- Agent论坛交流:查看Agent间的实时信息交换
- 分析报告:查看基于Agent协作的综合HTML报告
### 高级配置
#### 配置爬虫系统
1. **配置爬虫参数**
```python
# MindSpider/config.py
CRAWLER_CONFIG = {
'max_pages': 100, # 最大爬取页数
'delay': 1, # 请求延迟(秒)
'timeout': 30, # 超时时间(秒)
'use_proxy': False, # 是否使用代理
}
```
2. **运行爬虫**
```bash
cd MindSpider
python main.py --topic "话题关键词" --days 7
```
#### 配置情感分析模型
1. **选择模型**
- BERT微调模型(精度高)
- GPT-2 LoRA(速度快)
- Qwen小模型(平衡型)
- 机器学习基线(轻量级)
2. **模型切换**
```python
# InsightEngine/tools/sentiment_analyzer.py
MODEL_TYPE = "bert" # 可选: "bert", "gpt2", "qwen", "ml"
```
#### 自定义报告模板
`ReportEngine/report_template/` 目录下创建新模板:
```markdown
# 自定义报告模板
## 舆情概览
${overview}
## 情感分析
${sentiment_analysis}
## 关键观点
${key_insights}
## 趋势预测
${trend_prediction}
```
### 监控与日志
#### 查看系统日志
所有日志文件位于 `logs/` 目录:
- `query.log`: QueryEngine运行日志
- `media.log`: MediaEngine运行日志
- `insight.log`: InsightEngine运行日志
- `forum.log`: ForumEngine论坛交流日志
- `report.log`: ReportEngine生成日志
#### Agent论坛交流
ForumEngine提供多Agent协作交流功能:
1. Agent行动前读取论坛交流信息
2. Agent思考后决定是否分享关键发现
3. 汇总所有Agent的交流信息
4. 为ReportEngine提供协作数据基础
## 故障排除
### 常见问题
#### 1. 端口占用
```bash
# 查看端口占用(Windows)
netstat -ano | findstr :5000
netstat -ano | findstr :8501
# 结束占用进程
taskkill /F /PID <进程ID>
```
#### 2. 编码问题
```python
# 在代码开头添加
import sys
import os
os.environ['PYTHONIOENCODING'] = 'utf-8'
os.environ['PYTHONUTF8'] = '1'
```
#### 3. Playwright安装失败
```bash
# 手动安装
python -m playwright install chromium --with-deps
```
#### 4. MySQL连接失败
- 检查MySQL服务是否启动
- 确认用户权限配置
- 检查防火墙设置
## 贡献指南
我们欢迎所有形式的贡献!
1. Fork项目
2. 创建Feature分支 (`git checkout -b feature/AmazingFeature`)
3. 提交更改 (`git commit -m 'Add some AmazingFeature'`)
4. 推送到分支 (`git push origin feature/AmazingFeature`)
5. 开启Pull Request
## 许可证
本项目采用 MIT 许可证。详见 [LICENSE](LICENSE) 文件。
## 联系我们
- 项目地址:[https://github.com/666ghj/Weibo_PublicOpinion_AnalysisSystem](https://github.com/666ghj/Weibo_PublicOpinion_AnalysisSystem)
- 邮箱:670939375@qq.com
- Issues:[项目Issues](https://github.com/666ghj/Weibo_PublicOpinion_AnalysisSystem/issues)
... ...
# 多Agent舆情分析协作系统依赖包
# ========================================
# Weibo舆情分析多智能体系统依赖包
# 适用于Windows环境 + Conda部署
# ========================================
# 核心依赖
streamlit>=1.28.0
requests>=2.31.0
python-dotenv>=1.0.0
# ===== 核心Web框架 =====
flask==2.3.3
flask-socketio==5.3.6
streamlit==1.28.1
python-socketio==5.8.0
eventlet==0.33.3
# ===== HTTP请求和异步 =====
requests==2.31.0
httpx==0.28.1
aiofiles==23.2.1
aiohttp>=3.8.0
# LLM接口
# ===== LLM接口 =====
openai>=1.3.0
deepseek-ai>=0.1.0
# deepseek-ai>=0.1.0 # 使用OpenAI格式
# 搜索API
# ===== 搜索API =====
tavily-python>=0.3.0
# 数据处理
# ===== 数据处理 =====
pandas>=2.0.0
numpy>=1.24.0
regex>=2023.8.8
jieba==0.42.1
# ===== 数据库 =====
pymysql==1.1.0
aiomysql==0.2.0
aiosqlite==0.21.0
redis>=4.6.0
# ===== 爬虫相关 =====
playwright==1.45.0
Pillow==9.5.0
opencv-python>=4.8.0
beautifulsoup4>=4.12.0
lxml>=4.9.0
parsel==1.9.1
pyexecjs==1.5.1
# 可视化
# ===== 可视化 =====
plotly>=5.17.0
matplotlib==3.9.0
wordcloud==1.9.3
# 工具库
python-dateutil>=2.8.2
uuid>=1.30
# ===== 机器学习(可选,用于情感分析) =====
# torch>=2.0.0 # 需要单独安装CUDA版本
# transformers>=4.30.0
# scikit-learn>=1.3.0
# xgboost>=2.0.0
# 博查API相关
# 注意:博查API使用标准的HTTP请求,不需要额外的包
# ===== 工具库 =====
python-dotenv>=1.0.0
python-dateutil>=2.8.2
pytz>=2023.3
tqdm>=4.65.0
tenacity==8.2.2
loguru>=0.7.0
pydantic==2.5.2
# 开发工具
# ===== 开发工具(可选) =====
pytest>=7.4.0
black>=23.0.0
flake8>=6.0.0
# Flask Web应用
flask==2.3.3
flask-socketio==5.3.6
streamlit==1.28.1
requests==2.31.0
python-socketio==5.8.0
eventlet==0.33.3
\ No newline at end of file
# ===== Web服务器 =====
fastapi==0.110.2
uvicorn==0.29.0
\ No newline at end of file
... ...