戒酒的李白

Update readme.

<div align="center">

# 📊 Weibo Public Opinion Multi-Agent Analysis System

<img src="static/image/logo_compressed.png" alt="Weibo Public Opinion Analysis System Logo" width="600">

[![GitHub Stars](https://img.shields.io/github/stars/666ghj/Weibo_PublicOpinion_AnalysisSystem?style=flat-square)](https://github.com/666ghj/Weibo_PublicOpinion_AnalysisSystem/stargazers)
[![GitHub Forks](https://img.shields.io/github/forks/666ghj/Weibo_PublicOpinion_AnalysisSystem?style=flat-square)](https://github.com/666ghj/Weibo_PublicOpinion_AnalysisSystem/network)
[![GitHub Issues](https://img.shields.io/github/issues/666ghj/Weibo_PublicOpinion_AnalysisSystem?style=flat-square)](https://github.com/666ghj/Weibo_PublicOpinion_AnalysisSystem/issues)
[![GitHub License](https://img.shields.io/github/license/666ghj/Weibo_PublicOpinion_AnalysisSystem?style=flat-square)](https://github.com/666ghj/Weibo_PublicOpinion_AnalysisSystem/blob/main/LICENSE)

[English](./README-EN.md) | [中文文档](./README.md)

</div>

<div align="center">
<img src="static/image/banner_compressed.png" alt="banner" width="800">
</div>

## 📝 Project Overview

**Weibo Public Opinion Multi-Agent Analysis System** is an innovative public opinion analysis platform built from scratch on a multi-agent collaborative architecture, providing accurate, real-time, and comprehensive Weibo public opinion monitoring and analysis. Through the collaboration of five specialized AI agents, the system automates the full pipeline from data collection and sentiment analysis to report generation.

### 🚀 Key Features

- **Multi-Agent Collaborative Architecture**: 5 specialized agents work together to cover the full public opinion analysis workflow
- **Comprehensive Data Collection**: integrates Weibo crawlers, news search, multimedia content, and other multi-dimensional data sources
- **Deep Sentiment Analysis**: precise multilingual sentiment recognition based on fine-tuned BERT/GPT-2/Qwen models
- **Intelligent Report Generation**: automatically generates structured HTML analysis reports, with support for custom templates
- **Agent Forum Communication**: ForumEngine gives the agents a shared platform for information exchange and collaborative decision-making
- **High-Performance Asynchronous Processing**: concurrent processing of multiple public opinion tasks with real-time status monitoring
- **Cloud Data Support**: convenient cloud database service with 100,000+ real data records per day

## 🏗️ System Architecture

### Overall Architecture Diagram

```mermaid
graph TB
    subgraph "Frontend Display Layer"
        UI[Web Interface<br/>Flask + Streamlit]
    end

    subgraph "Multi-Agent Collaboration Layer"
        QE[QueryEngine<br/>News Search Agent]
        ME[MediaEngine<br/>Multimedia Search Agent]
        IE[InsightEngine<br/>Deep Insight Agent]
        RE[ReportEngine<br/>Report Generation Agent]
        Forum[ForumEngine<br/>Agent Forum Communication Center]
    end

    subgraph "Data Processing Layer"
        MS[MindSpider<br/>Weibo Crawler System]
        SA[SentimentAnalysis<br/>Sentiment Analysis Model Collection]
        DB[(MySQL<br/>Database)]
    end

    subgraph "External Service Layer"
        LLM[LLM API<br/>DeepSeek/Kimi/Gemini]
        Search[Search API<br/>Tavily/Bocha]
    end

    UI --> QE
    UI --> ME
    UI --> IE
    UI --> RE

    QE --> Search
    ME --> Search
    IE --> MS
    IE --> SA

    QE --> LLM
    ME --> LLM
    IE --> LLM
    RE --> LLM

    MS --> DB
    SA --> DB

    %% Agent Forum Communication Mechanism
    QE <--> Forum
    ME <--> Forum
    IE <--> Forum
    RE <--> Forum
```

### Agent Collaboration Workflow

The system's core workflow is built on multi-agent collaboration:

1. **QueryEngine (News Query Agent)**: uses the Tavily API to search authoritative news reports, providing official information sources
2. **MediaEngine (Multimedia Search Agent)**: performs multimodal content search through the Bocha API to gather social media perspectives
3. **InsightEngine (Deep Insight Agent)**: queries the local Weibo database and combines multiple sentiment analysis models for deep analysis
4. **ForumEngine (Forum Monitoring Agent)**: monitors the agents' log output in real time, extracts key information, and promotes collaboration
5. **ReportEngine (Report Generation Agent)**: uses the Gemini LLM to generate a comprehensive HTML report from all agents' analysis results

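The forum-mediated loop above can be sketched in a few lines of Python. The class and method names below are illustrative stand-ins for the project's actual modules, not its real API:

```python
# Minimal sketch of the forum-mediated collaboration pattern described above.
# Forum/Agent are illustrative stand-ins, not the project's actual classes.

class Forum:
    """Shared message board the agents read from and write to."""
    def __init__(self):
        self.messages = []

    def post(self, agent, finding):
        self.messages.append({"agent": agent, "finding": finding})

    def read(self):
        return list(self.messages)

class Agent:
    def __init__(self, name, forum):
        self.name, self.forum = name, forum

    def run(self, query):
        context = self.forum.read()          # read peers' findings first
        finding = f"{self.name} analyzed '{query}' with {len(context)} prior notes"
        self.forum.post(self.name, finding)  # share own key findings
        return finding

forum = Forum()
agents = [Agent(n, forum) for n in ("QueryEngine", "MediaEngine", "InsightEngine")]
summaries = [a.run("some topic") for a in agents]
# ReportEngine's final pass simply consumes the accumulated messages
report = "\n".join(m["finding"] for m in forum.read())
```

Each agent reads the forum before acting, so later agents see earlier findings; the report step at the end consumes everything the agents shared.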
### Project Code Structure

```
Weibo_PublicOpinion_AnalysisSystem/
├── QueryEngine/                        # News Query Engine Agent
│   ├── agent.py                        # Agent main logic
│   ├── llms/                           # LLM interface wrapper
│   ├── nodes/                          # Processing nodes
│   ├── tools/                          # Search tools
│   └── utils/                          # Utility functions
├── MediaEngine/                        # Multimedia Search Engine Agent
│   ├── agent.py                        # Agent main logic
│   ├── llms/                           # LLM interfaces
│   ├── tools/                          # Search tools
│   └── ...                             # Other modules
├── InsightEngine/                      # Data Insight Engine Agent
│   ├── agent.py                        # Agent main logic
│   ├── llms/                           # LLM interface wrapper
│   │   ├── deepseek.py                 # DeepSeek API
│   │   ├── kimi.py                     # Kimi API
│   │   ├── openai_llm.py               # OpenAI-format API
│   │   └── base.py                     # LLM base class
│   ├── nodes/                          # Processing nodes
│   │   ├── first_search_node.py        # First search node
│   │   ├── reflection_node.py          # Reflection node
│   │   ├── summary_nodes.py            # Summary nodes
│   │   ├── search_node.py              # Search node
│   │   ├── sentiment_node.py           # Sentiment analysis node
│   │   └── insight_node.py             # Insight generation node
│   ├── tools/                          # Database query and analysis tools
│   │   ├── media_crawler_db.py         # Database query tool
│   │   └── sentiment_analyzer.py       # Sentiment analysis integration tool
│   ├── state/                          # State management
│   │   ├── __init__.py
│   │   └── state.py                    # Agent state definition
│   ├── prompts/                        # Prompt templates
│   │   ├── __init__.py
│   │   └── prompts.py                  # Various prompts
│   └── utils/                          # Utility functions
│       ├── __init__.py
│       ├── config.py                   # Configuration management
│       └── helpers.py                  # Helper functions
├── ReportEngine/                       # Report Generation Engine Agent
│   ├── agent.py                        # Agent main logic
│   ├── llms/                           # LLM interfaces
│   │   └── gemini.py                   # Gemini API dedicated
│   ├── nodes/                          # Report generation nodes
│   │   ├── template_selection.py       # Template selection node
│   │   └── html_generation.py          # HTML generation node
│   ├── report_template/                # Report template library
│   │   ├── 社会公共热点事件分析.md
│   │   ├── 商业品牌舆情监测.md
│   │   └── ...                         # More templates
│   └── flask_interface.py              # Flask API interface
├── ForumEngine/                        # Forum Communication Engine Agent
│   └── monitor.py                      # Log monitoring and forum management
├── MindSpider/                         # Weibo Crawler System
│   ├── main.py                         # Crawler main program
│   ├── BroadTopicExtraction/           # Topic extraction module
│   │   ├── get_today_news.py           # Today's news fetching
│   │   └── topic_extractor.py          # Topic extractor
│   ├── DeepSentimentCrawling/          # Deep sentiment crawling
│   │   ├── MediaCrawler/               # Media crawler core
│   │   └── platform_crawler.py         # Platform crawler management
│   └── schema/                         # Database schema
│       └── init_database.py            # Database initialization
├── SentimentAnalysisModel/             # Sentiment Analysis Model Collection
│   ├── WeiboSentiment_Finetuned/       # Fine-tuned BERT/GPT-2 models
│   ├── WeiboMultilingualSentiment/     # Multilingual sentiment analysis
│   ├── WeiboSentiment_SmallQwen/       # Small Qwen model
│   └── WeiboSentiment_MachineLearning/ # Traditional machine learning methods
├── SingleEngineApp/                    # Individual Agent Streamlit apps
│   ├── query_engine_streamlit_app.py
│   ├── media_engine_streamlit_app.py
│   └── insight_engine_streamlit_app.py
├── templates/                          # Flask templates
│   └── index.html                      # Main interface template
├── static/                             # Static resources
├── logs/                               # Runtime log directory
├── app.py                              # Flask main application entry
├── config.py                           # Global configuration file
└── requirements.txt                    # Python dependency list
```

## 🚀 Quick Start

### System Requirements

- **Operating System**: Windows 10/11 (Linux/macOS also supported)
- **Python Version**: 3.11+
- **Conda**: Anaconda or Miniconda
- **Database**: MySQL 8.0+ (or choose our cloud database service)
- **Memory**: 8GB+ recommended

### 1. Create Conda Environment

```bash
# Create a conda environment named pytorch_python11
conda create -n pytorch_python11 python=3.11
conda activate pytorch_python11
```

### 2. Install Dependencies

```bash
# Install basic dependencies
pip install -r requirements.txt

# If you need local sentiment analysis functionality, install PyTorch
# CPU version
pip install torch torchvision torchaudio

# CUDA 11.8 version (if you have a GPU)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# Install transformers and other AI-related dependencies
pip install transformers scikit-learn xgboost
```

### 3. Install Playwright Browser Drivers

```bash
# Install browser drivers (for crawler functionality)
playwright install chromium
```

### 4. System Configuration

#### 4.1 Configure API Keys

Edit the `config.py` file and fill in your API keys:

```python
# MySQL database configuration
DB_HOST = "localhost"
DB_PORT = 3306
DB_USER = "your_username"
DB_PASSWORD = "your_password"
DB_NAME = "weibo_analysis"
DB_CHARSET = "utf8mb4"

# DeepSeek API (apply at: https://www.deepseek.com/)
DEEPSEEK_API_KEY = "your_deepseek_api_key"

# Tavily Search API (apply at: https://www.tavily.com/)
TAVILY_API_KEY = "your_tavily_api_key"

# Kimi API (apply at: https://www.kimi.com/)
KIMI_API_KEY = "your_kimi_api_key"

# Gemini API (apply at: https://api.chataiapi.com/)
GEMINI_API_KEY = "your_gemini_api_key"

# Bocha Search API (apply at: https://open.bochaai.com/)
BOCHA_Web_Search_API_KEY = "your_bocha_api_key"

# SiliconFlow API (apply at: https://siliconflow.cn/)
GUIJI_QWEN3_API_KEY = "your_guiji_api_key"
```

#### 4.2 Database Initialization

**Option 1: Use a Local Database**
```bash
# Local MySQL database initialization
cd MindSpider
python schema/init_database.py
```

**Option 2: Use the Cloud Database Service (Recommended)**

We provide a convenient cloud database service with 100,000+ real Weibo records per day, currently **free to apply for** during the promotion period!

- Real Weibo data, updated in real time
- Pre-processed sentiment-annotated data
- Multi-dimensional tag classification
- High-availability cloud service
- Professional technical support

**Contact us to apply for free cloud database access: 📧 670939375@qq.com**

### 5. Launch System

#### 5.1 Complete System Launch (Recommended)

```bash
# In the project root directory, activate the conda environment
conda activate pytorch_python11

# Start the main application (automatically starts all agents)
python app.py
```

Visit http://localhost:5000 to use the complete system.

#### 5.2 Launch Individual Agents

```bash
# Start QueryEngine
streamlit run SingleEngineApp/query_engine_streamlit_app.py --server.port 8503

# Start MediaEngine
streamlit run SingleEngineApp/media_engine_streamlit_app.py --server.port 8502

# Start InsightEngine
streamlit run SingleEngineApp/insight_engine_streamlit_app.py --server.port 8501
```

#### 5.3 Standalone Crawler System

```bash
# Enter the crawler directory
cd MindSpider

# Project initialization
python main.py --setup

# Run the complete crawler workflow
python main.py --complete --date 2024-01-20

# Run topic extraction only
python main.py --broad-topic --date 2024-01-20

# Run deep crawling only
python main.py --deep-sentiment --platforms xhs dy wb
```

## 💾 Database Configuration

### Local Database Configuration

1. **Install MySQL 8.0+**
2. **Create the database**:
   ```sql
   CREATE DATABASE weibo_analysis CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
   ```
3. **Run the initialization script**:
   ```bash
   cd MindSpider
   python schema/init_database.py
   ```

### Auto-Crawling Configuration

Configure automatic crawling tasks for continuous data updates:

```python
# Configure crawler parameters in MindSpider/config.py
CRAWLER_CONFIG = {
    'max_pages': 200,              # Maximum pages to crawl
    'delay': 1,                    # Request delay (seconds)
    'timeout': 30,                 # Timeout (seconds)
    'platforms': ['xhs', 'dy', 'wb', 'bili'],  # Platforms to crawl
    'daily_keywords': 100,         # Daily keyword count
    'max_notes_per_keyword': 50,   # Max content items per keyword
    'use_proxy': False,            # Whether to use a proxy
}
```
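For unattended continuous updates, the complete workflow entry point shown in section 5.3 (`python main.py --complete --date …`) can be wrapped in a small scheduler. The helper below is a sketch under that assumption; it is not part of MindSpider itself:

```python
# Hypothetical daily scheduler; assumes the MindSpider entry point
# `python main.py --complete --date <YYYY-MM-DD>` shown in section 5.3.
import subprocess
from datetime import date

def build_crawl_command(day=None):
    """Build the MindSpider invocation for a given ISO date (defaults to today)."""
    day = day or date.today().isoformat()
    return ["python", "main.py", "--complete", "--date", day]

def run_daily_crawl(day=None):
    # check=False: MindSpider logs its own failures; a bad day should not crash the loop
    return subprocess.run(build_crawl_command(day), cwd="MindSpider", check=False)
```

Hook `run_daily_crawl()` into cron, Windows Task Scheduler, or a long-running loop, whichever suits your deployment.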

### Cloud Database Service (Recommended)

**Why choose our cloud database service?**

- **Rich Data Sources**: 100,000+ real Weibo records per day covering hot topics across all industries
- **High-Quality Annotations**: sentiment data manually annotated by a professional team with 95%+ accuracy
- **Multi-Dimensional Analysis**: topic classification, sentiment tendency, influence scoring, and other multi-dimensional tags
- **Real-Time Updates**: 24/7 continuous data collection ensuring timeliness
- **Technical Support**: professional team providing technical support and customization services

**How to apply**:
📧 Email: 670939375@qq.com
📝 Subject: Apply for Weibo Public Opinion Cloud Database Access
📝 Content: please describe your use case and expected data volume

**Promotion-period benefits**:
- Free basic cloud database access
- Free technical support and deployment guidance
- Priority access to new features

## ⚙️ Advanced Configuration

### Modify Key Parameters

#### Agent Configuration Parameters

Each agent has a dedicated configuration file that can be adjusted as needed:

```python
# QueryEngine/utils/config.py
class Config:
    max_reflections = 2         # Reflection rounds
    max_search_results = 15     # Maximum search results
    max_content_length = 8000   # Maximum content length

# MediaEngine/utils/config.py
class Config:
    comprehensive_search_limit = 10  # Comprehensive search limit
    web_search_limit = 15            # Web search limit

# InsightEngine/utils/config.py
class Config:
    default_search_topic_globally_limit = 200  # Global search limit
    default_get_comments_limit = 500           # Comment retrieval limit
    max_search_results_for_llm = 50            # Max results passed to the LLM
```

#### Sentiment Analysis Model Configuration

```python
# InsightEngine/tools/sentiment_analyzer.py
SENTIMENT_CONFIG = {
    'model_type': 'multilingual',   # Options: 'bert', 'multilingual', 'qwen'
    'confidence_threshold': 0.8,    # Confidence threshold
    'batch_size': 32,               # Batch size
    'max_sequence_length': 512,     # Max sequence length
}
```
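As a rough illustration of how these fields are typically consumed, the sketch below batches texts by `batch_size` and filters labels by `confidence_threshold`; the `predict` callable is a stand-in for any of the bundled models, not the tool's actual interface:

```python
# Illustrative batching wrapper; `predict` stands in for a real model call.
def analyze_in_batches(texts, predict, batch_size=32, confidence_threshold=0.8):
    """Run predict() over texts in batches; mark low-confidence labels as uncertain."""
    results = []
    for i in range(0, len(texts), batch_size):
        for label, score in predict(texts[i:i + batch_size]):
            results.append(label if score >= confidence_threshold else "uncertain")
    return results

# Toy predictor standing in for a real model: "好" (good) means positive
fake_predict = lambda batch: [("positive" if "好" in t else "negative", 0.9)
                              for t in batch]
labels = analyze_in_batches(["真好", "糟糕"], fake_predict)
```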

### Integrate Different LLM Models

The system supports multiple LLM providers, switchable in each agent's configuration:

```python
# Configure in each Engine's utils/config.py
class Config:
    default_llm_provider = "deepseek"  # Options: "deepseek", "openai", "kimi", "gemini"

    # DeepSeek configuration
    deepseek_api_key = "your_api_key"
    deepseek_model = "deepseek-chat"

    # OpenAI-compatible configuration
    openai_api_key = "your_api_key"
    openai_model = "gpt-3.5-turbo"
    openai_base_url = "https://api.openai.com/v1"

    # Kimi configuration
    kimi_api_key = "your_api_key"
    kimi_model = "moonshot-v1-8k"

    # Gemini configuration
    gemini_api_key = "your_api_key"
    gemini_model = "gemini-pro"
```
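Conceptually, switching providers amounts to a small dispatch on `default_llm_provider`. The sketch below shows that idea with a trimmed-down `Config`; the attribute names follow the block above, but the factory itself is illustrative, not the project's actual wiring:

```python
# Illustrative provider dispatch keyed on Config.default_llm_provider.
PROVIDERS = {
    "deepseek": ("deepseek_api_key", "deepseek_model"),
    "openai":   ("openai_api_key",   "openai_model"),
    "kimi":     ("kimi_api_key",     "kimi_model"),
    "gemini":   ("gemini_api_key",   "gemini_model"),
}

def resolve_llm(config):
    """Return the (api_key, model) pair for the configured provider."""
    if config.default_llm_provider not in PROVIDERS:
        raise ValueError(f"unknown provider: {config.default_llm_provider!r}")
    key_attr, model_attr = PROVIDERS[config.default_llm_provider]
    return getattr(config, key_attr), getattr(config, model_attr)

class Config:  # trimmed-down example config
    default_llm_provider = "kimi"
    kimi_api_key = "your_api_key"
    kimi_model = "moonshot-v1-8k"

key, model = resolve_llm(Config)
```

Keeping the mapping in one table means adding a provider is a one-line change plus the corresponding config attributes.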

### Change Sentiment Analysis Models

The system integrates multiple sentiment analysis methods, selectable based on your needs:

#### 1. BERT-based Fine-tuned Model (Highest Accuracy)

```bash
# Use the BERT Chinese model
cd SentimentAnalysisModel/WeiboSentiment_Finetuned/BertChinese-Lora
python predict.py --text "This product is really great"
```

#### 2. GPT-2 LoRA Fine-tuned Model (Faster)

```bash
cd SentimentAnalysisModel/WeiboSentiment_Finetuned/GPT2-Lora
python predict.py --text "I'm not feeling great today"
```

#### 3. Small Qwen Model (Balanced)

```bash
cd SentimentAnalysisModel/WeiboSentiment_SmallQwen
python predict_universal.py --text "This event was very successful"
```

#### 4. Traditional Machine Learning Methods (Lightweight)

```bash
cd SentimentAnalysisModel/WeiboSentiment_MachineLearning
python predict.py --model_type "svm" --text "Service attitude needs improvement"
```

#### 5. Multilingual Sentiment Analysis (Supports 22 Languages)

```bash
cd SentimentAnalysisModel/WeiboMultilingualSentiment
python predict.py --text "This product is amazing!" --lang "en"
```

### Integrate a Custom Business Database

#### 1. Modify the Database Connection Configuration

```python
# Add your business database configuration in config.py
BUSINESS_DB_HOST = "your_business_db_host"
BUSINESS_DB_PORT = 3306
BUSINESS_DB_USER = "your_business_user"
BUSINESS_DB_PASSWORD = "your_business_password"
BUSINESS_DB_NAME = "your_business_database"
```

#### 2. Create Custom Data Access Tools

```python
# InsightEngine/tools/custom_db_tool.py
import pymysql

import config


class CustomBusinessDBTool:
    """Custom business database query tool."""

    def __init__(self):
        self.connection_config = {
            'host': config.BUSINESS_DB_HOST,
            'port': config.BUSINESS_DB_PORT,
            'user': config.BUSINESS_DB_USER,
            'password': config.BUSINESS_DB_PASSWORD,
            'database': config.BUSINESS_DB_NAME,
        }

    def search_business_data(self, query: str, table: str):
        """Query business data."""
        # Implement your business logic here, e.g. a parameterized query
        # via pymysql.connect(**self.connection_config)
        pass

    def get_customer_feedback(self, product_id: str):
        """Get customer feedback data."""
        # Implement your customer feedback query logic
        pass
```

#### 3. Integrate into InsightEngine

```python
# Register the custom tool in InsightEngine/agent.py
from .tools.custom_db_tool import CustomBusinessDBTool

class DeepSearchAgent:
    def __init__(self, config=None):
        # ... other initialization code
        self.custom_db_tool = CustomBusinessDBTool()

    def execute_custom_search(self, query: str):
        """Execute a custom business data search."""
        return self.custom_db_tool.search_business_data(query, "your_table")
```

### Custom Report Templates

#### 1. Create Template Files

Create new Markdown templates in the `ReportEngine/report_template/` directory:

```markdown
<!-- Enterprise Brand Monitoring Report.md -->
# Enterprise Brand Public Opinion Monitoring Report

## 📊 Executive Summary
{executive_summary}

## 🔍 Brand Mention Analysis
### Mention Volume Trends
{mention_trend}

### Sentiment Distribution
{sentiment_distribution}

## 📈 Competitor Analysis
{competitor_analysis}

## 🎯 Key Insights Summary
{key_insights}

## ⚠️ Risk Alerts
{risk_alerts}

## 📋 Improvement Recommendations
{recommendations}

---
*Report Type: Enterprise Brand Public Opinion Monitoring*
*Generation Time: {generation_time}*
*Data Sources: {data_sources}*
```

#### 2. Use in the Web Interface

The system supports uploading custom template files (.md or .txt format), selectable when generating reports.
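The `{placeholder}` slots in these templates can be filled with a standard `str.format_map` pass; the sketch below shows the idea with stand-in section text (in the real pipeline the slots are filled from the agents' generated summaries):

```python
# Minimal sketch of filling a template's {placeholder} slots; the section
# text here is a stand-in for the agents' generated content.
class _KeepMissing(dict):
    def __missing__(self, key):
        return "{" + key + "}"   # leave unfilled slots visible in the output

def render_template(template_text, sections):
    return template_text.format_map(_KeepMissing(sections))

md = "## 📊 Executive Summary\n{executive_summary}\n\n*Generation Time: {generation_time}*"
rendered = render_template(md, {"executive_summary": "All quiet this week."})
```

Leaving unknown slots intact (rather than raising `KeyError`) makes partially filled templates easy to spot during review.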

## 🤝 Contributing Guide

We welcome contributions of all kinds!

### How to Contribute

1. **Fork the project** to your GitHub account
2. **Create a feature branch**: `git checkout -b feature/AmazingFeature`
3. **Commit your changes**: `git commit -m 'Add some AmazingFeature'`
4. **Push to the branch**: `git push origin feature/AmazingFeature`
5. **Open a Pull Request**

### Contribution Types

- 🐛 Bug fixes
- ✨ New feature development
- 📚 Documentation improvements
- 🎨 UI/UX improvements
- ⚡ Performance optimization
- 🧪 Test case additions

### Development Standards

- Code follows the PEP 8 style guide
- Commit messages are clear descriptions in Chinese or English
- New features come with corresponding test cases
- Related documentation is kept up to date

## 📄 License

This project is licensed under the [MIT License](LICENSE); see the LICENSE file for details.

## 🎉 Support & Contact

### Get Help

- **Project Homepage**: [GitHub Repository](https://github.com/666ghj/Weibo_PublicOpinion_AnalysisSystem)
- **Issue Reporting**: [Issues Page](https://github.com/666ghj/Weibo_PublicOpinion_AnalysisSystem/issues)
- **Feature Requests**: [Discussions Page](https://github.com/666ghj/Weibo_PublicOpinion_AnalysisSystem/discussions)

### Contact Information

- 📧 **Email**: 670939375@qq.com
- 💬 **QQ Group**: [Join Technical Discussion Group]
- 🐦 **WeChat**: [Scan QR Code for Technical Support]

### Business Cooperation

- 🏢 **Enterprise Custom Development**
- 📊 **Big Data Services**
- 🎓 **Academic Collaboration**
- 💼 **Technical Training**

### Cloud Service Application

**Free cloud database service application**:
📧 Send email to: 670939375@qq.com
📝 Subject: Weibo Public Opinion Cloud Database Application
📝 Description: your use case and requirements

## 👥 Contributors

Thanks to these excellent contributors:

[![Contributors](https://contrib.rocks/image?repo=666ghj/Weibo_PublicOpinion_AnalysisSystem)](https://github.com/666ghj/Weibo_PublicOpinion_AnalysisSystem/graphs/contributors)

---

<div align="center">

**⭐ If this project helps you, please give us a star!**

Made with ❤️ by the [Weibo Public Opinion Analysis Team](https://github.com/666ghj)

</div>
1 <div align="center"> 1 <div align="center">
2 2
3 - <!-- # 📊 Weibo Public Opinion Analysis System --> 3 +<img src="static/image/logo_compressed.png" alt="Weibo Public Opinion Analysis System Logo" width="600">
4 4
5 - <img src="https://github.com/666ghj/Weibo_PublicOpinion_AnalysisSystem/blob/main/static/image/logo_compressed.png" alt="Weibo Public Opinion Analysis System Logo" width="800"> 5 +# 微舆 - 致力于打造简洁通用的舆情分析平台
6 6
7 - [![GitHub Stars](https://img.shields.io/github/stars/666ghj/Weibo_PublicOpinion_AnalysisSystem?style=flat-square)](https://github.com/666ghj/Weibo_PublicOpinion_AnalysisSystem/stargazers)  
8 - [![GitHub Forks](https://img.shields.io/github/forks/666ghj/Weibo_PublicOpinion_AnalysisSystem?style=flat-square)](https://github.com/666ghj/Weibo_PublicOpinion_AnalysisSystem/network)  
9 - [![GitHub Issues](https://img.shields.io/github/issues/666ghj/Weibo_PublicOpinion_AnalysisSystem?style=flat-square)](https://github.com/666ghj/Weibo_PublicOpinion_AnalysisSystem/issues)  
10 - [![GitHub Contributors](https://img.shields.io/github/contributors/666ghj/Weibo_PublicOpinion_AnalysisSystem?style=flat-square)](https://github.com/666ghj/Weibo_PublicOpinion_AnalysisSystem/graphs/contributors)  
11 - [![GitHub License](https://img.shields.io/github/license/666ghj/Weibo_PublicOpinion_AnalysisSystem?style=flat-square)](https://github.com/666ghj/Weibo_PublicOpinion_AnalysisSystem/blob/main/LICENSE) 7 +[![GitHub Stars](https://img.shields.io/github/stars/666ghj/Weibo_PublicOpinion_AnalysisSystem?style=flat-square)](https://github.com/666ghj/Weibo_PublicOpinion_AnalysisSystem/stargazers)
  8 +[![GitHub Forks](https://img.shields.io/github/forks/666ghj/Weibo_PublicOpinion_AnalysisSystem?style=flat-square)](https://github.com/666ghj/Weibo_PublicOpinion_AnalysisSystem/network)
  9 +[![GitHub Issues](https://img.shields.io/github/issues/666ghj/Weibo_PublicOpinion_AnalysisSystem?style=flat-square)](https://github.com/666ghj/Weibo_PublicOpinion_AnalysisSystem/issues)
  10 +[![GitHub License](https://img.shields.io/github/license/666ghj/Weibo_PublicOpinion_AnalysisSystem?style=flat-square)](https://github.com/666ghj/Weibo_PublicOpinion_AnalysisSystem/blob/main/LICENSE)
  11 +
  12 +[English](./README-EN.md) | [中文文档](./README.md)
12 13
13 </div> 14 </div>
14 15
15 <div align="center"> 16 <div align="center">
16 -<img src="https://github.com/666ghj/Weibo_PublicOpinion_AnalysisSystem/blob/main/static/image/banner_compressed.png" alt="banner" width="800"> 17 +<img src="static/image/system_schematic.png" alt="banner" width="800">
17 </div> 18 </div>
18 19
19 -## 项目概述 20 +## 📝 项目概述
20 21
21 -**Weibo舆情分析多智能体系统** 是一个从零构建的创新型舆情分析平台,采用多Agent协作架构,致力于提供准确、实时、全面的微博舆情监测与分析服务。系统通过多个专门化的AI Agent协同工作,实现了从数据采集、情感分析到报告生成的全流程自动化。 22 +**微博舆情分析多智能体系统**是一个从零构建的创新型舆情分析平台,采用多Agent协作架构,致力于提供准确、实时、全面的微博舆情监测与分析服务。系统通过五个专门化的AI Agent协同工作,实现了从数据采集、情感分析到报告生成的全流程自动化。
22 23
23 -### 核心特色 24 +### 🚀 核心亮点
24 25
25 -- **多智能体协作架构**:5个专门化Agent协同工作,各司其职  
26 -- **全方位数据采集**:整合微博爬虫、新闻搜索、网络信息多维度数据源  
27 -- **深度情感分析**:基于微调BERT/GPT-2/Qwen模型的精准情感识别  
28 -- **智能报告生成**:自动生成结构化HTML分析报告  
29 -- **Agent论坛交流**:Forum Engine提供Agent间信息共享和协作决策平台  
30 -- **高性能异步处理**:支持并发处理多个舆情任务 26 +- **多智能体协作架构**:5个专门化Agent各司其职,协同工作完成舆情分析全流程
  27 +- **全方位数据采集**:整合微博爬虫、新闻搜索、多媒体内容等多维度数据源
  28 +- **深度情感分析**:基于微调BERT/GPT-2/Qwen模型的精准多语言情感识别
  29 +- **智能报告生成**:自动生成结构化HTML分析报告,支持自定义模板
  30 +- **Agent论坛交流**:ForumEngine提供Agent间信息共享和协作决策平台
  31 +- **高性能异步处理**:支持并发处理多个舆情任务,实时状态监控
  32 +- **云端数据支持**:提供便捷云数据库服务,日均10万+真实数据
31 33
32 -## 系统架构 34 +## 🏗️ 系统架构
33 35
34 ### 整体架构图 36 ### 整体架构图
35 37
@@ -49,7 +51,7 @@ graph TB @@ -49,7 +51,7 @@ graph TB
49 51
50 subgraph "数据处理层" 52 subgraph "数据处理层"
51 MS[MindSpider<br/>微博爬虫系统] 53 MS[MindSpider<br/>微博爬虫系统]
52 - SA[SentimentAnalysis<br/>情感分析模型] 54 + SA[SentimentAnalysis<br/>情感分析模型集合]
53 DB[(MySQL<br/>数据库)] 55 DB[(MySQL<br/>数据库)]
54 end 56 end
55 57
@@ -81,129 +83,110 @@ graph TB @@ -81,129 +83,110 @@ graph TB
81 ME <--> Forum 83 ME <--> Forum
82 IE <--> Forum 84 IE <--> Forum
83 RE <--> Forum 85 RE <--> Forum
84 -  
85 - style UI fill:#e1f5fe  
86 - style QE fill:#fff3e0  
87 - style ME fill:#fff3e0  
88 - style IE fill:#fff3e0  
89 - style RE fill:#f3e5f5  
90 - style Forum fill:#e8f5e9  
91 - style MS fill:#fce4ec  
92 - style SA fill:#fce4ec  
93 - style DB fill:#fff9c4  
94 - style LLM fill:#e3f2fd  
95 - style Search fill:#e3f2fd  
96 ``` 86 ```
97 87
98 -### 数据流程图 88 +### Agent协作流程
99 89
100 -```mermaid  
101 -sequenceDiagram  
102 - participant User as 用户  
103 - participant UI as Web界面  
104 - participant QE as QueryEngine  
105 - participant ME as MediaEngine  
106 - participant IE as InsightEngine  
107 - participant Forum as ForumEngine  
108 - participant RE as ReportEngine  
109 - participant DB as 数据库  
110 -  
111 - User->>UI: 输入查询关键词  
112 - UI->>QE: 发起搜索请求  
113 - UI->>ME: 发起搜索请求  
114 - UI->>IE: 发起搜索请求  
115 -  
116 - Note over QE,IE: Agent执行前先读取论坛信息  
117 - QE->>Forum: 读取论坛交流信息  
118 - ME->>Forum: 读取论坛交流信息  
119 - IE->>Forum: 读取论坛交流信息  
120 -  
121 - par 并行处理与持续思维链交流  
122 - Note over QE: 结构思考→反思搜索→持续交流  
123 - QE->>QE: 确定新闻搜索结构  
124 - QE->>Forum: 思维链交流(结构思考)  
125 - QE->>QE: 多步反思与搜索分析  
126 - QE->>Forum: 思维链交流(搜索进展)  
127 - QE->>QE: 生成汇总报告  
128 - QE->>Forum: 思维链交流(关键发现)  
129 - and  
130 - Note over ME: 结构思考→反思搜索→持续交流  
131 - ME->>ME: 确定多媒体搜索结构  
132 - ME->>Forum: 思维链交流(结构思考)  
133 - ME->>ME: 多步反思与搜索分析  
134 - ME->>Forum: 思维链交流(搜索进展)  
135 - ME->>ME: 生成汇总报告  
136 - ME->>Forum: 思维链交流(关键发现)  
137 - and  
138 - Note over IE: 结构思考→反思搜索→持续交流  
139 - IE->>IE: 确定洞察分析结构  
140 - IE->>Forum: 思维链交流(结构思考)  
141 - IE->>DB: 查询微博数据  
142 - IE->>IE: 多步反思与情感洞察  
143 - IE->>Forum: 思维链交流(洞察进展)  
144 - IE->>IE: 生成汇总报告  
145 - IE->>Forum: 思维链交流(关键发现)  
146 - end  
147 -  
148 - Note over Forum: 论坛汇总Agent交流信息  
149 - Forum->>RE: 触发报告生成  
150 - RE->>Forum: 读取所有Agent的交流信息  
151 - RE->>QE: 获取QueryEngine汇总报告  
152 - RE->>ME: 获取MediaEngine汇总报告  
153 - RE->>IE: 获取InsightEngine汇总报告  
154 -  
155 - Note over RE: ReportEngine智能报告生成  
156 - RE->>RE: 读取模板库与样式库并选择  
157 - RE->>RE: 分步思考生成报告各部分  
158 - RE->>RE: 整合生成最终报告  
159 - RE->>UI: 生成综合HTML报告  
160 - UI->>User: 展示分析结果  
161 -``` 90 +系统核心工作流程基于多Agent协作模式:
162 91
163 -## 项目结构 92 +1. **QueryEngine(新闻查询Agent)**:使用Tavily API搜索权威新闻报道,提供官方信息源
  93 +2. **MediaEngine(多媒体搜索Agent)**:通过Bocha API进行多模态内容搜索,获取社交媒体观点
  94 +3. **InsightEngine(深度洞察Agent)**:查询本地微博数据库,结合多种情感分析模型进行深度分析
  95 +4. **ForumEngine(论坛监控Agent)**:实时监控各Agent日志输出,提取关键信息并促进协作
  96 +5. **ReportEngine(报告生成Agent)**:基于所有Agent的分析结果,使用Gemini LLM生成综合HTML报告
  97 +
  98 +### 项目代码结构
164 99
```
Weibo_PublicOpinion_AnalysisSystem/
├── QueryEngine/                  # News query engine agent
│   ├── agent.py                  # Agent main logic
│   ├── llms/                     # LLM interface wrappers
│   ├── nodes/                    # Processing nodes
│   ├── tools/                    # Search tools
│   └── utils/                    # Utility functions
├── MediaEngine/                  # Multimedia search engine agent
│   ├── agent.py                  # Agent main logic
│   ├── llms/                     # LLM interfaces
│   ├── tools/                    # Search tools
│   └── ...                       # Other modules
├── InsightEngine/                # Data insight engine agent
│   ├── agent.py                  # Agent main logic
│   ├── llms/                     # LLM interface wrappers
│   │   ├── deepseek.py           # DeepSeek API
│   │   ├── kimi.py               # Kimi API
│   │   ├── openai_llm.py         # OpenAI-compatible API
│   │   └── base.py               # LLM base class
│   ├── nodes/                    # Processing nodes
│   │   ├── first_search_node.py  # Initial search node
│   │   ├── reflection_node.py    # Reflection node
│   │   ├── summary_nodes.py      # Summary nodes
│   │   ├── search_node.py        # Search node
│   │   ├── sentiment_node.py     # Sentiment analysis node
│   │   └── insight_node.py       # Insight generation node
│   ├── tools/                    # Database query and analysis tools
│   │   ├── media_crawler_db.py   # Database query tool
│   │   └── sentiment_analyzer.py # Sentiment analysis integration
│   ├── state/                    # State management
│   │   ├── __init__.py
│   │   └── state.py              # Agent state definitions
│   ├── prompts/                  # Prompt templates
│   │   ├── __init__.py
│   │   └── prompts.py            # Prompt collections
│   └── utils/                    # Utility functions
│       ├── __init__.py
│       ├── config.py             # Configuration management
│       └── helpers.py            # Helper functions
├── ReportEngine/                 # Report generation engine agent
│   ├── agent.py                  # Agent main logic
│   ├── llms/                     # LLM interfaces
│   │   └── gemini.py             # Gemini API (dedicated)
│   ├── nodes/                    # Report generation nodes
│   │   ├── template_selection.py # Template selection node
│   │   └── html_generation.py    # HTML generation node
│   ├── report_template/          # Report template library
│   │   ├── 社会公共热点事件分析.md
│   │   ├── 商业品牌舆情监测.md
│   │   └── ...                   # More templates
│   └── flask_interface.py        # Flask API interface
├── ForumEngine/                  # Forum exchange engine agent
│   └── monitor.py                # Log monitoring and forum management
├── MindSpider/                   # Weibo crawler system
│   ├── main.py                   # Crawler entry point
│   ├── BroadTopicExtraction/     # Topic extraction module
│   │   ├── get_today_news.py     # Daily news fetcher
│   │   └── topic_extractor.py    # Topic extractor
│   ├── DeepSentimentCrawling/    # Deep sentiment crawling
│   │   ├── MediaCrawler/         # MediaCrawler core
│   │   └── platform_crawler.py   # Per-platform crawler manager
│   └── schema/                   # Database schema
│       └── init_database.py      # Database initialization
├── SentimentAnalysisModel/       # Sentiment analysis model collection
│   ├── WeiboSentiment_Finetuned/       # Fine-tuned BERT/GPT-2 models
│   ├── WeiboMultilingualSentiment/     # Multilingual sentiment analysis
│   ├── WeiboSentiment_SmallQwen/       # Compact Qwen models
│   └── WeiboSentiment_MachineLearning/ # Traditional ML methods
├── SingleEngineApp/              # Standalone Streamlit apps per agent
│   ├── query_engine_streamlit_app.py
│   ├── media_engine_streamlit_app.py
│   └── insight_engine_streamlit_app.py
├── templates/                    # Flask templates
│   └── index.html                # Main UI template
├── static/                       # Static assets
├── logs/                         # Runtime log directory
├── app.py                        # Flask application entry point
├── config.py                     # Global configuration
└── requirements.txt              # Python dependency list
```

## 🚀 Quick Start

### Requirements

- **Operating system**: Windows 10/11 (Linux/macOS are also supported)
- **Python**: 3.11+
- **Conda**: Anaconda or Miniconda
- **Database**: MySQL 8.0+ (or our optional cloud database service)
- **Memory**: 8 GB or more recommended

### 1. Create a Conda Environment
```bash
conda create -n pytorch_python11 python=3.11
conda activate pytorch_python11
```

### 2. Install Dependencies

```bash
# Install the base dependencies
pip install -r requirements.txt

# For local sentiment analysis, install PyTorch
# CPU build
pip install torch torchvision torchaudio

# CUDA 11.8 build (if you have a GPU)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# Install transformers and other AI dependencies
pip install transformers scikit-learn xgboost
```

#### 4.1 Configure API Keys

Fill in your search API keys in `config.py`:

```python
# config.py (excerpt): search API keys
BOCHA_Web_Search_API_KEY = "your_bocha_api_key"
GUIJI_QWEN3_API_KEY = "your_guiji_api_key"
```

#### 4.2 Database Initialization

**Option 1: Local database**
```bash
# Initialize a local MySQL database
cd MindSpider
python schema/init_database.py
```

**Option 2: Cloud database service (recommended)**

We offer a convenient cloud database service with 100,000+ real Weibo posts per day; during the current promotion, access is **free on request**:

- Real Weibo data, updated in real time
- Pre-processed sentiment-labeled data
- Multi-dimensional tag classification
- Highly available cloud service
- Professional technical support

**Contact us to apply for free cloud database access: 📧 670939375@qq.com**

### 5. Start the System

#### 5.1 Full System Launch (Recommended)

```bash
# From the project root, activate the conda environment
conda activate pytorch_python11

# Launch the main application
python app.py
```

Visit http://localhost:5000 to use the full system.

#### 5.2 Launch an Individual Agent

```bash
# Launch QueryEngine
streamlit run SingleEngineApp/query_engine_streamlit_app.py

# Launch MediaEngine
streamlit run SingleEngineApp/media_engine_streamlit_app.py --server.port 8502

# Launch InsightEngine
streamlit run SingleEngineApp/insight_engine_streamlit_app.py --server.port 8501
```

#### 5.3 Running the Crawler Standalone

```bash
# Enter the crawler directory
cd MindSpider

# Initialize the project
python main.py --setup

# Run the complete crawling pipeline
python main.py --complete --date 2024-01-20

# Topic extraction only
python main.py --broad-topic --date 2024-01-20

# Deep crawling only
python main.py --deep-sentiment --platforms xhs dy wb
```

## 💾 Database Configuration

### Local Database Setup

1. **Install MySQL 8.0+**
2. **Create the database**
   ```sql
   CREATE DATABASE weibo_analysis CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
   ```
3. **Run the initialization script**
   ```bash
   cd MindSpider
   python schema/init_database.py
   ```

### Automated Crawling

Configure scheduled crawl jobs to keep the data continuously updated:

```python
# Crawler parameters in MindSpider/config.py
CRAWLER_CONFIG = {
    'max_pages': 200,             # Maximum pages to crawl
    'delay': 1,                   # Request delay (seconds)
    'timeout': 30,                # Timeout (seconds)
    'platforms': ['xhs', 'dy', 'wb', 'bili'],  # Platforms to crawl
    'daily_keywords': 100,        # Number of daily keywords
    'max_notes_per_keyword': 50,  # Max posts per keyword
    'use_proxy': False,           # Whether to use a proxy
}
```
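For the scheduling itself, a minimal loop can wrap the MindSpider `main.py --complete` CLI shown above. This is a hedged sketch (the daily interval and the `MindSpider` working directory are assumptions); production setups would more likely use cron or Windows Task Scheduler.

```python
# Minimal daily-crawl scheduler sketch (assumes MindSpider/main.py's
# --complete/--date flags as documented in this README).
import datetime
import subprocess
import time

def build_crawl_command(date: str) -> list:
    """Build the CLI invocation for one full crawl run."""
    return ["python", "main.py", "--complete", "--date", date]

def run_daily(forever: bool = True) -> None:
    """Run one full crawl per day inside the MindSpider directory."""
    while True:
        today = datetime.date.today().isoformat()
        subprocess.run(build_crawl_command(today), cwd="MindSpider", check=False)
        if not forever:
            break
        time.sleep(24 * 60 * 60)  # wait one day between runs
```

A cron entry such as `0 3 * * * cd /path/to/MindSpider && python main.py --complete` achieves the same with less machinery.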

### Cloud Database Service (Recommended)

**Why choose our cloud database service?**

- **Rich data sources**: 100,000+ real Weibo posts per day, covering trending topics across industries
- **High-quality annotation**: sentiment data labeled by a professional team, with 95%+ accuracy
- **Multi-dimensional analysis**: topic categories, sentiment polarity, influence scores, and more
- **Real-time updates**: round-the-clock data collection keeps the data fresh
- **Technical support**: a professional team provides support and customization

**How to apply**
📧 Email: 670939375@qq.com
📝 Subject: Application for Weibo public opinion cloud database access
📝 Body: describe your use case and expected data volume

**Promotional benefits**
- Free access to the basic cloud database tier
- Free technical support and deployment guidance
- Early access to new features

## ⚙️ Advanced Configuration

### Tuning Key Parameters

#### Agent Configuration

Each agent has its own configuration file, which you can adjust as needed:

```python
# QueryEngine/utils/config.py
class Config:
    max_reflections = 2        # Number of reflection rounds
    max_search_results = 15    # Maximum search results
    max_content_length = 8000  # Maximum content length

# MediaEngine/utils/config.py
class Config:
    comprehensive_search_limit = 10  # Comprehensive search limit
    web_search_limit = 15            # Web search limit

# InsightEngine/utils/config.py
class Config:
    default_search_topic_globally_limit = 200  # Global topic search limit
    default_get_comments_limit = 500           # Comment fetch limit
    max_search_results_for_llm = 50            # Max results passed to the LLM
```

#### Sentiment Model Configuration

```python
# InsightEngine/tools/sentiment_analyzer.py
SENTIMENT_CONFIG = {
    'model_type': 'multilingual',  # Options: 'bert', 'multilingual', 'qwen'
    'confidence_threshold': 0.8,   # Confidence threshold
    'batch_size': 32,              # Batch size
    'max_sequence_length': 512,    # Maximum sequence length
}
```
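To make the two most impactful settings concrete, here is an illustrative sketch of how `batch_size` and `confidence_threshold` are typically applied. The helper names are ours; the real logic lives in `InsightEngine/tools/sentiment_analyzer.py`.

```python
# Illustrative use of batch_size and confidence_threshold
# (helper names are ours, not the analyzer's actual API).

def batched(texts, batch_size):
    """Split inputs into model-sized batches."""
    for i in range(0, len(texts), batch_size):
        yield texts[i:i + batch_size]

def keep_confident(predictions, threshold):
    """Drop predictions whose confidence falls below the threshold."""
    return [p for p in predictions if p["confidence"] >= threshold]
```

Raising `confidence_threshold` trades recall for precision: fewer posts get a label, but the labels kept are more trustworthy.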

### Switching LLM Providers

The system supports multiple LLM providers, switchable in each agent's configuration:

```python
# Configure in each engine's utils/config.py
class Config:
    default_llm_provider = "deepseek"  # Options: "deepseek", "openai", "kimi", "gemini"

    # DeepSeek settings
    deepseek_api_key = "your_api_key"
    deepseek_model = "deepseek-chat"

    # OpenAI-compatible settings
    openai_api_key = "your_api_key"
    openai_model = "gpt-3.5-turbo"
    openai_base_url = "https://api.openai.com/v1"

    # Kimi settings
    kimi_api_key = "your_api_key"
    kimi_model = "moonshot-v1-8k"

    # Gemini settings
    gemini_api_key = "your_api_key"
    gemini_model = "gemini-pro"
```
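A small factory is one common way to act on `default_llm_provider`. The sketch below is illustrative only (the class and mapping are ours, not the repo's actual wiring): it pairs each provider name with the default model listed above and rejects unknown names early.

```python
# Illustrative provider-selection sketch (not the repo's actual wiring):
# resolve default_llm_provider to a provider/model pair.
from dataclasses import dataclass

PROVIDER_MODELS = {
    "deepseek": "deepseek-chat",
    "openai": "gpt-3.5-turbo",
    "kimi": "moonshot-v1-8k",
    "gemini": "gemini-pro",
}

@dataclass
class LLMClient:
    provider: str
    model: str

def make_llm(provider: str) -> LLMClient:
    """Return a client descriptor for a configured provider name."""
    if provider not in PROVIDER_MODELS:
        raise ValueError(f"unknown provider: {provider!r}")
    return LLMClient(provider=provider, model=PROVIDER_MODELS[provider])
```

Failing fast on an unknown provider name keeps misconfiguration errors at startup rather than mid-analysis.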

### Choosing a Sentiment Analysis Model

The system integrates several sentiment analysis approaches; choose one based on your needs:

#### 1. Fine-tuned BERT model (highest accuracy)

```bash
# Use the fine-tuned Chinese BERT model
cd SentimentAnalysisModel/WeiboSentiment_Finetuned/BertChinese-Lora
python predict.py --text "这个产品真的很不错"
```

#### 2. GPT-2 LoRA fine-tuned model (faster)

```bash
cd SentimentAnalysisModel/WeiboSentiment_Finetuned/GPT2-Lora
python predict.py --text "今天心情不太好"
```

#### 3. Compact Qwen model (balanced)

```bash
cd SentimentAnalysisModel/WeiboSentiment_SmallQwen
python predict_universal.py --text "这次活动办得很成功"
```

#### 4. Traditional machine learning (lightweight)

```bash
cd SentimentAnalysisModel/WeiboSentiment_MachineLearning
python predict.py --model_type "svm" --text "服务态度需要改进"
```

#### 5. Multilingual sentiment analysis (supports 22 languages)

```bash
cd SentimentAnalysisModel/WeiboMultilingualSentiment
python predict.py --text "This product is amazing!" --lang "en"
```

### Connecting a Custom Business Database

#### 1. Update the database connection settings

```python
# Add your business database settings to config.py
BUSINESS_DB_HOST = "your_business_db_host"
BUSINESS_DB_PORT = 3306
BUSINESS_DB_USER = "your_business_user"
BUSINESS_DB_PASSWORD = "your_business_password"
BUSINESS_DB_NAME = "your_business_database"
```

#### 2. Create a custom data access tool

```python
# InsightEngine/tools/custom_db_tool.py
import config  # the project's global configuration module

class CustomBusinessDBTool:
    """Custom business database query tool."""

    def __init__(self):
        self.connection_config = {
            'host': config.BUSINESS_DB_HOST,
            'port': config.BUSINESS_DB_PORT,
            'user': config.BUSINESS_DB_USER,
            'password': config.BUSINESS_DB_PASSWORD,
            'database': config.BUSINESS_DB_NAME,
        }

    def search_business_data(self, query: str, table: str):
        """Query business data."""
        # Implement your business logic here
        pass

    def get_customer_feedback(self, product_id: str):
        """Fetch customer feedback data."""
        # Implement your feedback query logic here
        pass
```

#### 3. Integrate it into InsightEngine

```python
# Register the custom tool in InsightEngine/agent.py
from .tools.custom_db_tool import CustomBusinessDBTool

class DeepSearchAgent:
    def __init__(self, config=None):
        # ... other initialization code
        self.custom_db_tool = CustomBusinessDBTool()

    def execute_custom_search(self, query: str):
        """Run a search against the custom business data."""
        return self.custom_db_tool.search_business_data(query, "your_table")
```

### Custom Report Templates

#### 1. Create a template file

Create a new Markdown template under the `ReportEngine/report_template/` directory:

```markdown
<!-- 企业品牌监测报告.md -->
# Corporate Brand Monitoring Report

## 📊 Executive Summary
{executive_summary}

## 🔍 Brand Mention Analysis
### Mention Volume Trend
{mention_trend}

### Sentiment Distribution
{sentiment_distribution}

## 📈 Competitor Comparison
{competitor_analysis}

## 🎯 Key Insights
{key_insights}

## ⚠️ Risk Alerts
{risk_alerts}

## 📋 Recommendations
{recommendations}

---
*Report type: corporate brand public opinion monitoring*
*Generated at: {generation_time}*
*Data sources: {data_sources}*
```

#### 2. Use it in the web UI

The system supports uploading custom template files (.md or .txt), which can be selected when generating a report.

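The `{placeholder}` fields used in these templates can be filled with ordinary `str.format`-style substitution. A minimal sketch (field names taken from the example template; the content values here are invented):

```python
# Minimal placeholder-substitution sketch for a report template
# (field names from the example template; content values are invented).
template = (
    "## 📊 Executive Summary\n"
    "{executive_summary}\n\n"
    "*Generated at: {generation_time}*\n"
)

section = template.format(
    executive_summary="Overall sentiment was mildly positive this week.",
    generation_time="2024-01-20 09:00",
)
```

In the full system, ReportEngine fills these fields from the agents' summary reports before rendering the result to HTML.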
## 🤝 Contributing

We welcome contributions of all kinds!

### How to contribute

1. **Fork** the project to your GitHub account
2. **Create a feature branch**: `git checkout -b feature/AmazingFeature`
3. **Commit your changes**: `git commit -m 'Add some AmazingFeature'`
4. **Push the branch**: `git push origin feature/AmazingFeature`
5. **Open a Pull Request**

### Contribution types

- 🐛 Bug fixes
- ✨ New features
- 📚 Documentation improvements
- 🎨 UI/UX improvements
- ⚡ Performance optimization
- 🧪 Additional test cases

### Development conventions

- Follow the PEP 8 style guide
- Write clear commit messages (Chinese or English)
- Include test cases for new features
- Update the relevant documentation

## 📄 License

This project is released under the [MIT License](LICENSE). See the LICENSE file for details.

## 🎉 Support & Contact

### Getting help

- **Project homepage**: [GitHub repository](https://github.com/666ghj/Weibo_PublicOpinion_AnalysisSystem)
- **Bug reports**: [Issues](https://github.com/666ghj/Weibo_PublicOpinion_AnalysisSystem/issues)
- **Feature requests**: [Discussions](https://github.com/666ghj/Weibo_PublicOpinion_AnalysisSystem/discussions)

### Contact

- 📧 **Email**: 670939375@qq.com
- 💬 **QQ group**: [join the technical discussion group]
- 🐦 **WeChat**: [scan the QR code to add technical support]

### Business cooperation

- 🏢 **Custom enterprise development**
- 📊 **Big data services**
- 🎓 **Academic collaboration**
- 💼 **Technical training**

### Cloud service application

**To apply for the free cloud database service**:
📧 Send an email to: 670939375@qq.com
📝 Subject: Weibo public opinion cloud database application
📝 Body: your use case and requirements

## 👥 Contributors

Thanks to these excellent contributors:

[![Contributors](https://contrib.rocks/image?repo=666ghj/Weibo_PublicOpinion_AnalysisSystem)](https://github.com/666ghj/Weibo_PublicOpinion_AnalysisSystem/graphs/contributors)

---

<div align="center">

**⭐ If this project helps you, please give us a star!**

Made with ❤️ by the [Weibo Public Opinion Analysis Team](https://github.com/666ghj)

</div>
