
# 📊 Weibo Public Opinion Multi-Agent Analysis System


[![GitHub Stars](https://img.shields.io/github/stars/666ghj/Weibo_PublicOpinion_AnalysisSystem?style=flat-square)](https://github.com/666ghj/Weibo_PublicOpinion_AnalysisSystem/stargazers)
[![GitHub Forks](https://img.shields.io/github/forks/666ghj/Weibo_PublicOpinion_AnalysisSystem?style=flat-square)](https://github.com/666ghj/Weibo_PublicOpinion_AnalysisSystem/network)
[![GitHub Issues](https://img.shields.io/github/issues/666ghj/Weibo_PublicOpinion_AnalysisSystem?style=flat-square)](https://github.com/666ghj/Weibo_PublicOpinion_AnalysisSystem/issues)
[![GitHub License](https://img.shields.io/github/license/666ghj/Weibo_PublicOpinion_AnalysisSystem?style=flat-square)](https://github.com/666ghj/Weibo_PublicOpinion_AnalysisSystem/blob/main/LICENSE)

[English](./README-EN.md) | [中文文档](./README.md)


📝 Project Overview

The Weibo Public Opinion Multi-Agent Analysis System is a public opinion analysis platform built from scratch on a multi-agent collaborative architecture. It provides accurate, real-time, and comprehensive monitoring and analysis of Weibo public opinion. Five specialized AI agents collaborate to automate the full process, from data collection and sentiment analysis to report generation.


🚀 Key Features


Multi-Agent Collaborative Architecture: Five specialized agents work together to cover the full public opinion analysis process

Comprehensive Data Collection: Integrates Weibo crawlers, news search, multimedia content, and other data sources

Deep Sentiment Analysis: Precise multilingual sentiment recognition based on fine-tuned BERT/GPT-2/Qwen models

Intelligent Report Generation: Automatically generates structured HTML analysis reports, with custom template support

Agent Forum Communication: ForumEngine gives agents a platform for information sharing and collaborative decision-making

High-Performance Asynchronous Processing: Supports concurrent processing of multiple public opinion tasks with real-time status monitoring

Cloud Data Support: A cloud database service with 100,000+ real Weibo records per day


🏗️ System Architecture


Overall Architecture Diagram

graph TB
    subgraph "Frontend Display Layer"
        UI[Web Interface<br/>Flask + Streamlit]
    end

    subgraph "Multi-Agent Collaboration Layer"
        QE[QueryEngine<br/>News Search Agent]
        ME[MediaEngine<br/>Multimedia Search Agent]
        IE[InsightEngine<br/>Deep Insight Agent]
        RE[ReportEngine<br/>Report Generation Agent]
        Forum[ForumEngine<br/>Agent Forum Communication Center]
    end

    subgraph "Data Processing Layer"
        MS[MindSpider<br/>Weibo Crawler System]
        SA[SentimentAnalysis<br/>Sentiment Analysis Model Collection]
        DB[(MySQL<br/>Database)]
    end

    subgraph "External Service Layer"
        LLM[LLM API<br/>DeepSeek/Kimi/Gemini]
        Search[Search API<br/>Tavily/Bocha]
    end

    UI --> QE
    UI --> ME
    UI --> IE
    UI --> RE

    QE --> Search
    ME --> Search
    IE --> MS
    IE --> SA

    QE --> LLM
    ME --> LLM
    IE --> LLM
    RE --> LLM

    MS --> DB
    SA --> DB

    %% Agent Forum Communication Mechanism
    QE <--> Forum
    ME <--> Forum
    IE <--> Forum
    RE <--> Forum


Agent Collaboration Workflow

The system's core workflow is based on multi-agent collaboration:


QueryEngine (News Query Agent): Uses the Tavily API to search authoritative news reports, providing official information sources

MediaEngine (Multimedia Search Agent): Runs multimodal content searches through the Bocha API to gather social media perspectives

InsightEngine (Deep Insight Agent): Queries the local Weibo database and combines multiple sentiment analysis models for deep analysis

ForumEngine (Forum Monitoring Agent): Monitors agent log outputs in real time, extracts key information, and promotes collaboration

ReportEngine (Report Generation Agent): Uses the Gemini LLM to generate comprehensive HTML reports from the analysis results of all agents
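
The ForumEngine step can be pictured as a scan over agent log lines. The following is only a minimal sketch: the `[AgentName] message` log format, the keyword list, and the `extract_forum_posts` helper are all assumptions made for illustration, and the actual logic in `ForumEngine/monitor.py` may differ.

```python
import re
from collections import defaultdict

# Hypothetical log format "[AgentName] message"; the real logs may differ.
LOG_LINE = re.compile(r"^\[(?P<agent>\w+)\]\s+(?P<message>.*)$")

# Hypothetical markers for lines worth sharing with the other agents.
KEYWORDS = ("summary", "finding", "conclusion")


def extract_forum_posts(log_lines):
    """Group noteworthy log messages by the agent that wrote them."""
    posts = defaultdict(list)
    for line in log_lines:
        match = LOG_LINE.match(line.strip())
        if not match:
            continue  # skip lines that do not follow the assumed format
        message = match.group("message")
        if any(keyword in message.lower() for keyword in KEYWORDS):
            posts[match.group("agent")].append(message)
    return dict(posts)


sample_logs = [
    "[QueryEngine] Searching news for topic",
    "[QueryEngine] Summary: three official statements found",
    "[InsightEngine] Finding: negative sentiment rose on day two",
    "unstructured line without an agent tag",
]
forum_posts = extract_forum_posts(sample_logs)
```

Here `forum_posts` keeps one message each for QueryEngine and InsightEngine and drops the routine and unstructured lines.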


Project Code Structure

Weibo_PublicOpinion_AnalysisSystem/
├── QueryEngine/                    # News Query Engine Agent
│   ├── agent.py                   # Agent main logic
│   ├── llms/                      # LLM interface wrapper
│   ├── nodes/                     # Processing nodes
│   ├── tools/                     # Search tools
│   └── utils/                     # Utility functions
├── MediaEngine/                    # Multimedia Search Engine Agent
│   ├── agent.py                   # Agent main logic
│   ├── llms/                      # LLM interfaces
│   ├── tools/                     # Search tools
│   └── ...                       # Other modules
├── InsightEngine/                 # Data Insight Engine Agent
│   ├── agent.py                   # Agent main logic
│   ├── llms/                      # LLM interface wrapper
│   │   ├── deepseek.py           # DeepSeek API
│   │   ├── kimi.py               # Kimi API
│   │   ├── openai_llm.py         # OpenAI format API
│   │   └── base.py               # LLM base class
│   ├── nodes/                     # Processing nodes
│   │   ├── first_search_node.py  # First search node
│   │   ├── reflection_node.py    # Reflection node
│   │   ├── summary_nodes.py      # Summary nodes
│   │   ├── search_node.py        # Search node
│   │   ├── sentiment_node.py     # Sentiment analysis node
│   │   └── insight_node.py       # Insight generation node
│   ├── tools/                     # Database query and analysis tools
│   │   ├── media_crawler_db.py   # Database query tool
│   │   └── sentiment_analyzer.py # Sentiment analysis integration tool
│   ├── state/                     # State management
│   │   ├── __init__.py
│   │   └── state.py              # Agent state definition
│   ├── prompts/                   # Prompt templates
│   │   ├── __init__.py
│   │   └── prompts.py            # Various prompts
│   └── utils/                     # Utility functions
│       ├── __init__.py
│       ├── config.py             # Configuration management
│       └── helpers.py            # Helper functions
├── ReportEngine/                  # Report Generation Engine Agent
│   ├── agent.py                   # Agent main logic
│   ├── llms/                      # LLM interfaces
│   │   └── gemini.py             # Gemini API dedicated
│   ├── nodes/                     # Report generation nodes
│   │   ├── template_selection.py # Template selection node
│   │   └── html_generation.py    # HTML generation node
│   ├── report_template/           # Report template library
│   │   ├── 社会公共热点事件分析.md
│   │   ├── 商业品牌舆情监测.md
│   │   └── ...                   # More templates
│   └── flask_interface.py         # Flask API interface
├── ForumEngine/                   # Forum Communication Engine Agent
│   └── monitor.py                 # Log monitoring and forum management
├── MindSpider/                    # Weibo Crawler System
│   ├── main.py                    # Crawler main program
│   ├── BroadTopicExtraction/      # Topic extraction module
│   │   ├── get_today_news.py     # Today's news fetching
│   │   └── topic_extractor.py    # Topic extractor
│   ├── DeepSentimentCrawling/     # Deep sentiment crawling
│   │   ├── MediaCrawler/         # Media crawler core
│   │   └── platform_crawler.py  # Platform crawler management
│   └── schema/                    # Database schema
│       └── init_database.py      # Database initialization
├── SentimentAnalysisModel/        # Sentiment Analysis Model Collection
│   ├── WeiboSentiment_Finetuned/  # Fine-tuned BERT/GPT-2 models
│   ├── WeiboMultilingualSentiment/ # Multilingual sentiment analysis
│   ├── WeiboSentiment_SmallQwen/   # Small Qwen model
│   └── WeiboSentiment_MachineLearning/ # Traditional machine learning methods
├── SingleEngineApp/               # Individual Agent Streamlit apps
│   ├── query_engine_streamlit_app.py
│   ├── media_engine_streamlit_app.py
│   └── insight_engine_streamlit_app.py
├── templates/                     # Flask templates
│   └── index.html                # Main interface template
├── static/                        # Static resources
├── logs/                         # Runtime log directory
├── app.py                        # Flask main application entry
├── config.py                     # Global configuration file
└── requirements.txt              # Python dependency list


🚀 Quick Start


System Requirements


Operating System: Windows 10/11 (Linux/macOS also supported)

Python Version: 3.11+

Conda: Anaconda or Miniconda

Database: MySQL 8.0+ (or choose our cloud database service)

Memory: 8GB+ recommended


1. Create Conda Environment

# Create conda environment named pytorch_python11
conda create -n pytorch_python11 python=3.11
conda activate pytorch_python11


2. Install Dependencies

# Install basic dependencies
pip install -r requirements.txt

# If you need local sentiment analysis functionality, install PyTorch
# CPU version
pip install torch torchvision torchaudio

# CUDA 11.8 version (if you have GPU)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# Install transformers and other AI-related dependencies
pip install transformers scikit-learn xgboost


3. Install Playwright Browser Drivers

# Install browser drivers (for crawler functionality)
playwright install chromium


4. System Configuration


4.1 Configure API Keys

Edit the config.py file and fill in your API keys:

# MySQL Database Configuration
DB_HOST = "localhost"
DB_PORT = 3306
DB_USER = "your_username"
DB_PASSWORD = "your_password"
DB_NAME = "weibo_analysis"
DB_CHARSET = "utf8mb4"

# DeepSeek API (Apply at: https://www.deepseek.com/)
DEEPSEEK_API_KEY = "your_deepseek_api_key"

# Tavily Search API (Apply at: https://www.tavily.com/)
TAVILY_API_KEY = "your_tavily_api_key"

# Kimi API (Apply at: https://www.kimi.com/)
KIMI_API_KEY = "your_kimi_api_key"

# Gemini API (Apply at: https://api.chataiapi.com/)
GEMINI_API_KEY = "your_gemini_api_key"

# Bocha Search API (Apply at: https://open.bochaai.com/)
BOCHA_Web_Search_API_KEY = "your_bocha_api_key"

# Silicon Flow API (Apply at: https://siliconflow.cn/)
GUIJI_QWEN3_API_KEY = "your_guiji_api_key"
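
Before launching, it can help to confirm that no placeholder values are left in the configuration. The sketch below assumes that unfilled values still start with `your_`, as in the sample above; `find_unset_keys` is a hypothetical helper and not part of the project.

```python
def find_unset_keys(settings):
    """Return names of settings that are empty or still hold a "your_..." placeholder."""
    unset = []
    for name, value in settings.items():
        if not isinstance(value, str):
            continue  # numeric settings such as DB_PORT are not checked
        if value == "" or value.startswith("your_"):
            unset.append(name)
    return sorted(unset)


# Example values only; a real check would read them from config.py.
example_settings = {
    "DB_HOST": "localhost",
    "DB_PORT": 3306,
    "DB_PASSWORD": "your_password",
    "DEEPSEEK_API_KEY": "sk-example",
    "TAVILY_API_KEY": "your_tavily_api_key",
}
missing = find_unset_keys(example_settings)
```

In this example `missing` lists `DB_PASSWORD` and `TAVILY_API_KEY`, the two entries that still need real values.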


4.2 Database Initialization

Option 1: Use Local Database

# Local MySQL database initialization
cd MindSpider
python schema/init_database.py


Option 2: Use Cloud Database Service (Recommended)

We provide a cloud database service with 100,000+ real Weibo records per day. During the promotion period, you can apply for access free of charge.


Real Weibo data, updated in real-time
Pre-processed sentiment annotation data
Multi-dimensional tag classification
High-availability cloud service
Professional technical support


Contact us to apply for free cloud database access: 📧 670939375@qq.com


5. Launch System


5.1 Complete System Launch (Recommended)

# In project root directory, activate conda environment
conda activate pytorch_python11

# Start main application (automatically starts all agents)
python app.py


Visit http://localhost:5000 to use the complete system


5.2 Launch Individual Agents

# Start QueryEngine
streamlit run SingleEngineApp/query_engine_streamlit_app.py --server.port 8503

# Start MediaEngine  
streamlit run SingleEngineApp/media_engine_streamlit_app.py --server.port 8502

# Start InsightEngine
streamlit run SingleEngineApp/insight_engine_streamlit_app.py --server.port 8501


5.3 Standalone Crawler System

# Enter crawler directory
cd MindSpider

# Project initialization
python main.py --setup

# Run complete crawler workflow
python main.py --complete --date 2024-01-20

# Run topic extraction only
python main.py --broad-topic --date 2024-01-20

# Run deep crawling only
python main.py --deep-sentiment --platforms xhs dy wb
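
For readers who want to script these commands, the flags above could be parsed roughly as follows. This is a sketch with `argparse`, not the actual code in `MindSpider/main.py`: treating the four modes as mutually exclusive and parsing `--date` as an ISO date are assumptions.

```python
import argparse
from datetime import date


def build_parser():
    """Build a parser that mirrors the flags shown in the commands above."""
    parser = argparse.ArgumentParser(prog="main.py")
    mode = parser.add_mutually_exclusive_group()  # assumption: one mode per run
    mode.add_argument("--setup", action="store_true")
    mode.add_argument("--complete", action="store_true")
    mode.add_argument("--broad-topic", action="store_true")
    mode.add_argument("--deep-sentiment", action="store_true")
    # Assumption: dates are given as YYYY-MM-DD, as in the examples above.
    parser.add_argument("--date", type=date.fromisoformat)
    parser.add_argument("--platforms", nargs="+", default=[])
    return parser


parser = build_parser()
deep_args = parser.parse_args(["--deep-sentiment", "--platforms", "xhs", "dy", "wb"])
complete_args = parser.parse_args(["--complete", "--date", "2024-01-20"])
```

With this parser, a malformed date is rejected with a usage error before any crawling starts.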


💾 Database Configuration


Local Database Configuration


Install MySQL 8.0+

Create Database:
CREATE DATABASE weibo_analysis CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;


Run Initialization Script:
cd MindSpider
python schema/init_database.py


Auto-Crawling Configuration

Configure automatic crawling tasks for continuous data updates:

# Configure crawler parameters in MindSpider/config.py
CRAWLER_CONFIG = {
    'max_pages': 200,         # Maximum pages to crawl
    'delay': 1,               # Request delay (seconds)
    'timeout': 30,            # Timeout (seconds)
    'platforms': ['xhs', 'dy', 'wb', 'bili'],  # Crawling platforms
    'daily_keywords': 100,    # Daily keywords count
    'max_notes_per_keyword': 50,  # Max content per keyword
    'use_proxy': False,       # Whether to use proxy
}
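
To get a feel for what these settings imply, the sketch below computes an upper bound on the items collected per day. It assumes that every daily keyword is searched on every platform and that `max_notes_per_keyword` applies per platform; neither is confirmed by the project, so treat the number as a rough ceiling.

```python
# Values copied from the sample configuration above.
CRAWLER_CONFIG = {
    "platforms": ["xhs", "dy", "wb", "bili"],
    "daily_keywords": 100,
    "max_notes_per_keyword": 50,
}


def estimate_daily_ceiling(config):
    """Upper bound on items per day, assuming every keyword runs on every platform."""
    return (
        len(config["platforms"])
        * config["daily_keywords"]
        * config["max_notes_per_keyword"]
    )


ceiling = estimate_daily_ceiling(CRAWLER_CONFIG)  # 4 * 100 * 50
```

Under these assumptions the ceiling is 20,000 items per day. Lowering `daily_keywords` or trimming `platforms` reduces it proportionally.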


Cloud Database Service (Recommended)

Why Choose Our Cloud Database Service?


Rich Data Sources: 100,000+ real Weibo records per day, covering hot topics across all industries

High-Quality Annotations: Sentiment data manually annotated by a professional team, with 95%+ accuracy

Multi-Dimensional Analysis: Tags for topic classification, sentiment tendency, influence scoring, and more

Real-Time Updates: 24/7 continuous data collection keeps the data current

Technical Support: A professional team provides technical support and customization services


Application Method:
📧 Email Contact: 670939375@qq.com
📝 Email Subject: Apply for Weibo Public Opinion Cloud Database Access
📝 Email Content: Please describe your use case and expected data volume requirements

Promotion Period Benefits:


Free basic cloud database access
Free technical support and deployment guidance
Priority access to new features


⚙️ Advanced Configuration


Modify Key Parameters


Agent Configuration Parameters

Each agent has its own configuration file, which can be adjusted as needed:

# QueryEngine/utils/config.py
class Config:
    max_reflections = 2           # Reflection rounds
    max_search_results = 15       # Maximum search results
    max_content_length = 8000     # Maximum content length

# MediaEngine/utils/config.py  
class Config:
    comprehensive_search_limit = 10  # Comprehensive search limit
    web_search_limit = 15           # Web search limit

# InsightEngine/utils/config.py
class Config:
    default_search_topic_globally_limit = 200    # Global search limit
    default_get_comments_limit = 500             # Comment retrieval limit
    max_search_results_for_llm = 50              # Max results for LLM


Sentiment Analysis Model Configuration

# InsightEngine/tools/sentiment_analyzer.py
SENTIMENT_CONFIG = {
    'model_type': 'multilingual',     # Options: 'bert', 'multilingual', 'qwen'
    'confidence_threshold': 0.8,      # Confidence threshold
    'batch_size': 32,                 # Batch size
    'max_sequence_length': 512,       # Max sequence length
}
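
The sketch below illustrates how `batch_size` and `confidence_threshold` might be applied. The two helpers and the `(label, confidence)` output shape are hypothetical, and whether the real analyzer treats the threshold as inclusive is an assumption.

```python
# Values copied from the sample configuration above.
SENTIMENT_CONFIG = {"confidence_threshold": 0.8, "batch_size": 32}


def make_batches(texts, batch_size):
    """Split texts into consecutive batches of at most batch_size items."""
    return [texts[i:i + batch_size] for i in range(0, len(texts), batch_size)]


def keep_confident(predictions, threshold):
    """Keep (label, confidence) pairs whose confidence reaches the threshold."""
    return [(label, score) for label, score in predictions if score >= threshold]


texts = [f"post {n}" for n in range(70)]
batches = make_batches(texts, SENTIMENT_CONFIG["batch_size"])
batch_sizes = [len(batch) for batch in batches]

# Hypothetical model output: (label, confidence) pairs.
predictions = [("positive", 0.93), ("negative", 0.61), ("neutral", 0.80)]
confident = keep_confident(predictions, SENTIMENT_CONFIG["confidence_threshold"])
```

Seventy texts are split into batches of 32, 32, and 6, and the 0.61 prediction is dropped. A higher threshold trades coverage for precision.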


Integrate Different LLM Models

The system supports multiple LLM providers, switchable in each agent's configuration:

# Configure in each Engine's utils/config.py
class Config:
    default_llm_provider = "deepseek"  # Options: "deepseek", "openai", "kimi", "gemini"

    # DeepSeek configuration
    deepseek_api_key = "your_api_key"
    deepseek_model = "deepseek-chat"

    # OpenAI compatible configuration
    openai_api_key = "your_api_key"
    openai_model = "gpt-3.5-turbo"
    openai_base_url = "https://api.openai.com/v1"

    # Kimi configuration
    kimi_api_key = "your_api_key"  
    kimi_model = "moonshot-v1-8k"

    # Gemini configuration
    gemini_api_key = "your_api_key"
    gemini_model = "gemini-pro"
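
Because the settings follow a `<provider>_api_key` / `<provider>_model` naming pattern, provider switching can be sketched as a simple attribute lookup. The `resolve_llm_settings` helper and the trimmed-down `Config` below are hypothetical; the engines may select providers differently.

```python
from dataclasses import dataclass


@dataclass
class Config:
    """Trimmed-down stand-in for an engine's Config class."""
    default_llm_provider: str = "deepseek"
    deepseek_api_key: str = "your_api_key"
    deepseek_model: str = "deepseek-chat"
    kimi_api_key: str = "your_api_key"
    kimi_model: str = "moonshot-v1-8k"


def resolve_llm_settings(config, provider=None):
    """Look up the API key and model for a provider via the <provider>_* pattern."""
    name = provider or config.default_llm_provider
    try:
        return {
            "provider": name,
            "api_key": getattr(config, f"{name}_api_key"),
            "model": getattr(config, f"{name}_model"),
        }
    except AttributeError:
        raise ValueError(f"No configuration found for provider {name!r}") from None


default_settings = resolve_llm_settings(Config())
kimi_settings = resolve_llm_settings(Config(), "kimi")
```

Failing early with a `ValueError` for an unconfigured provider surfaces typos in `default_llm_provider` before any API call is made.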


Change Sentiment Analysis Models

The system integrates multiple sentiment analysis methods, selectable based on needs:


1. BERT-based Fine-tuned Model (Highest Accuracy)

# Use BERT Chinese model
cd SentimentAnalysisModel/WeiboSentiment_Finetuned/BertChinese-Lora
python predict.py --text "This product is really great"


2. GPT-2 LoRA Fine-tuned Model (Faster Speed)

cd SentimentAnalysisModel/WeiboSentiment_Finetuned/GPT2-Lora
python predict.py --text "I'm not feeling great today"


3. Small Qwen Model (Balanced)

cd SentimentAnalysisModel/WeiboSentiment_SmallQwen
python predict_universal.py --text "This event was very successful"


4. Traditional Machine Learning Methods (Lightweight)

cd SentimentAnalysisModel/WeiboSentiment_MachineLearning
python predict.py --model_type "svm" --text "Service attitude needs improvement"


5. Multilingual Sentiment Analysis (Supports 22 Languages)

cd SentimentAnalysisModel/WeiboMultilingualSentiment
python predict.py --text "This product is amazing!" --lang "en"


Integrate Custom Business Database


1. Modify Database Connection Configuration

# Add your business database configuration in config.py
BUSINESS_DB_HOST = "your_business_db_host"
BUSINESS_DB_PORT = 3306
BUSINESS_DB_USER = "your_business_user"
BUSINESS_DB_PASSWORD = "your_business_password"
BUSINESS_DB_NAME = "your_business_database"


2. Create Custom Data Access Tools

# InsightEngine/tools/custom_db_tool.py
import config  # global config.py from the project root (assumes the root is on sys.path)


class CustomBusinessDBTool:
    """Custom business database query tool"""

    def __init__(self):
        self.connection_config = {
            'host': config.BUSINESS_DB_HOST,
            'port': config.BUSINESS_DB_PORT,
            'user': config.BUSINESS_DB_USER,
            'password': config.BUSINESS_DB_PASSWORD,
            'database': config.BUSINESS_DB_NAME,
        }

    def search_business_data(self, query: str, table: str):
        """Query business data"""
        # Implement your business logic
        pass

    def get_customer_feedback(self, product_id: str):
        """Get customer feedback data"""
        # Implement customer feedback query logic
        pass
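
One way to fill in `search_business_data` is to build a parameterized query and pass it to your MySQL driver. The sketch below only constructs the SQL and its parameters. The table names, the `content` column, and the `build_search_query` helper are hypothetical, and the `%s` placeholder style assumes a driver such as PyMySQL.

```python
ALLOWED_TABLES = {"customer_feedback", "orders"}  # hypothetical table names


def build_search_query(query, table, limit=50):
    """Return (sql, params) for a keyword search, rejecting unknown tables."""
    if table not in ALLOWED_TABLES:
        raise ValueError(f"Table {table!r} is not allowed")
    # Table names cannot be bound as parameters, hence the allow-list above.
    # "content" is a hypothetical text column; adjust it to your schema.
    sql = f"SELECT * FROM {table} WHERE content LIKE %s LIMIT %s"
    return sql, (f"%{query}%", limit)


sql, params = build_search_query("refund", "customer_feedback")
```

The resulting pair can be handed to a DB-API cursor as `cursor.execute(sql, params)`, which keeps user input out of the SQL text itself.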


3. Integrate into InsightEngine

# Integrate custom tools in InsightEngine/agent.py
from .tools.custom_db_tool import CustomBusinessDBTool

class DeepSearchAgent:
    def __init__(self, config=None):
        # ... other initialization code
        self.custom_db_tool = CustomBusinessDBTool()

    def execute_custom_search(self, query: str):
        """Execute custom business data search"""
        return self.custom_db_tool.search_business_data(query, "your_table")


Custom Report Templates


1. Create Template Files

Create new Markdown templates in the ReportEngine/report_template/ directory:

<!-- Enterprise Brand Monitoring Report.md -->
# Enterprise Brand Public Opinion Monitoring Report

## 📊 Executive Summary
{executive_summary}

## 🔍 Brand Mention Analysis
### Mention Volume Trends
{mention_trend}

### Sentiment Distribution
{sentiment_distribution}

## 📈 Competitor Analysis
{competitor_analysis}

## 🎯 Key Insights Summary
{key_insights}

## ⚠️ Risk Alerts
{risk_alerts}

## 📋 Improvement Recommendations
{recommendations}

---
*Report Type: Enterprise Brand Public Opinion Monitoring*  
*Generation Time: {generation_time}*  
*Data Sources: {data_sources}*
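
The `{placeholder}` fields in such a template can be pictured as plain string substitution. The sketch below uses Python's `str.format_map` and leaves unknown fields untouched. It only illustrates the placeholder idea: how ReportEngine actually combines templates with LLM output is not specified here, and `render_template` is a hypothetical helper.

```python
from datetime import datetime


class MissingAsPlaceholder(dict):
    """Leave unknown placeholders untouched instead of raising KeyError."""

    def __missing__(self, key):
        return "{" + key + "}"


def render_template(template_text, values):
    """Substitute {name} placeholders with str.format_map."""
    return template_text.format_map(MissingAsPlaceholder(values))


template = (
    "## Executive Summary\n{executive_summary}\n\n"
    "## Risk Alerts\n{risk_alerts}\n\n"
    "*Generation Time: {generation_time}*"
)
report = render_template(
    template,
    {
        "executive_summary": "Mentions rose after the product launch.",
        "generation_time": datetime(2024, 1, 20, 9, 30).isoformat(),
    },
)
```

Since `risk_alerts` was not supplied, it stays in the output as `{risk_alerts}`, which makes unfilled sections easy to spot. Note that literal braces in a template would need to be doubled for this approach to work.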


2. Use in Web Interface

The web interface accepts custom template files (.md or .txt), which can then be selected when generating a report.


🤝 Contributing Guide

We welcome all forms of contributions!


How to Contribute


Fork the project to your GitHub account

Create a feature branch: git checkout -b feature/AmazingFeature

Commit your changes: git commit -m 'Add some AmazingFeature'

Push to the branch: git push origin feature/AmazingFeature

Open a Pull Request


Contribution Types


🐛 Bug fixes
✨ New feature development
📚 Documentation improvements
🎨 UI/UX improvements
⚡ Performance optimization
🧪 Test case additions


Development Standards


Code follows PEP8 standards
Commit messages use clear Chinese/English descriptions
New features need corresponding test cases
Update related documentation


📄 License

This project is licensed under the MIT License. Please see the LICENSE file for details.


🎉 Support & Contact


Get Help


Project Homepage: GitHub Repository


Issue Reporting: Issues Page


Feature Requests: Discussions Page


Contact Information


📧 Email: 670939375@qq.com
💬 QQ Group: [Join Technical Discussion Group]
🐦 WeChat: [Scan QR Code for Technical Support]


Business Cooperation


🏢 Enterprise Custom Development

📊 Big Data Services

🎓 Academic Collaboration

💼 Technical Training


Cloud Service Application

Free Cloud Database Service Application:
📧 Send email to: 670939375@qq.com

📝 Subject: Weibo Public Opinion Cloud Database Application

📝 Description: Your use case and requirements  


👥 Contributors

Thanks to these excellent contributors:


**⭐ If this project helps you, please give us a star!**

Made with ❤️ by [Weibo Public Opinion Analysis Team](https://github.com/666ghj)