Doiiars
Committed by GitHub

Merge pull request #157 from luojiyin1987/feature/add-docker-documentation

Feature/add docker documentation
@@ -301,9 +301,170 @@ We provide convenient cloud database service with 100,000+ daily real public opi @@ -301,9 +301,170 @@ We provide convenient cloud database service with 100,000+ daily real public opi
301 301
302 > To conduct a data compliance review and service upgrade, we are suspending new applications for the cloud database, effective October 1, 2025. 302 > To conduct a data compliance review and service upgrade, we are suspending new applications for the cloud database, effective October 1, 2025.
303 303
304 -### 5. Launch System 304 +### 5. Docker Deployment (Recommended)
305 305
306 -#### 5.1 Complete System Launch (Recommended) 306 +The project provides complete Docker support, including application and database services, for easy deployment and environment isolation.
  307 +
  308 +#### 5.1 Docker Requirements
  309 +
  310 +- **Docker**: 20.10+
  311 +- **Docker Compose**: 2.0+
  312 +- **Available Memory**: 4GB+ recommended
  313 +- **Available Disk Space**: 10GB+ recommended
  314 +
  315 +#### 5.2 Docker Quick Start
  316 +
  317 +1. **Clone project and enter directory**
  318 +```bash
  319 +git clone https://github.com/666ghj/Weibo_PublicOpinion_AnalysisSystem.git
  320 +cd Weibo_PublicOpinion_AnalysisSystem
  321 +```
  322 +
  323 +2. **Configure environment variables**
  324 +```bash
  325 +# Copy environment variable template
  326 +cp .env.example .env
  327 +
  328 +# Edit environment variable file and fill in required configurations
  329 +vim .env
  330 +```
  331 +
  332 +> **Note:** The application reads database settings from `.env`. Keep `DB_DIALECT=postgresql` when using the bundled PostgreSQL service; change it only if you switch to another database engine.
  333 +
  334 +**Key environment variables**:
  335 +```bash
  336 +# LLM API configuration (required)
  337 +INSIGHT_ENGINE_API_KEY="your_api_key"
  338 +INSIGHT_ENGINE_BASE_URL="https://api.moonshot.cn/v1"
  339 +INSIGHT_ENGINE_MODEL_NAME="kimi-k2-0711-preview"
  340 +
  341 +# Media Agent configuration
  342 +MEDIA_ENGINE_API_KEY="your_api_key"
  343 +MEDIA_ENGINE_BASE_URL="https://api.moonshot.cn/v1"
  344 +MEDIA_ENGINE_MODEL_NAME="kimi-k2-0711-preview"
  345 +
  346 +# Query Agent configuration
  347 +QUERY_ENGINE_API_KEY="your_api_key"
  348 +QUERY_ENGINE_BASE_URL="https://api.moonshot.cn/v1"
  349 +QUERY_ENGINE_MODEL_NAME="kimi-k2-0711-preview"
  350 +
  351 +# Report Agent configuration
  352 +REPORT_ENGINE_API_KEY="your_api_key"
  353 +REPORT_ENGINE_BASE_URL="https://api.moonshot.cn/v1"
  354 +REPORT_ENGINE_MODEL_NAME="kimi-k2-0711-preview"
  355 +
  356 +# Database configuration (using built-in Docker PostgreSQL)
  357 +POSTGRES_USER=bettafish
  358 +POSTGRES_PASSWORD=bettafish
  359 +POSTGRES_DB=bettafish
  360 +POSTGRES_PORT=5444
  361 +```
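
Before starting the containers, it can help to confirm that the required keys in `.env` are actually filled in. The following is a minimal sketch, not part of the project: the helper name `check_env_keys` is hypothetical, and it assumes `.env` uses plain `KEY=value` lines like the template above.

```bash
# Hypothetical helper (not part of the project): report keys that are
# missing, empty, or still set to the "your_api_key" placeholder.
# Usage: check_env_keys <env-file> KEY [KEY...]
check_env_keys() {
    env_file=$1
    shift
    missing=0
    for key in "$@"; do
        # Take the last assignment for the key, then strip surrounding double quotes.
        value=$(sed -n "s/^${key}=//p" "$env_file" | tail -n 1 | sed 's/^"\(.*\)"$/\1/')
        if [ -z "$value" ] || [ "$value" = "your_api_key" ]; then
            echo "Not configured: $key"
            missing=1
        fi
    done
    return "$missing"
}

# Example call from the project root (skipped when .env does not exist yet).
if [ -f .env ]; then
    check_env_keys .env INSIGHT_ENGINE_API_KEY MEDIA_ENGINE_API_KEY \
        QUERY_ENGINE_API_KEY REPORT_ENGINE_API_KEY ||
        echo "Edit .env before running docker-compose up"
fi
```

The function returns a non-zero status when any key is unset, so it can also gate a start-up script.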
  362 +
  363 +3. **Start Docker services**
  364 +```bash
  365 +# Build and start all services
  366 +docker-compose up -d
  367 +
  368 +# Check service status
  369 +docker-compose ps
  370 +
  371 +# View logs
  372 +docker-compose logs -f bettafish
  373 +```
  374 +
  375 +4. **Access applications**
  376 +- **Main Application**: http://localhost:5000
  377 +- **Insight Engine**: http://localhost:8501
  378 +- **Media Engine**: http://localhost:8502
  379 +- **Query Engine**: http://localhost:8503
  380 +
  381 +#### 5.3 Docker Management Commands
  382 +
  383 +```bash
  384 +# Start all services
  385 +docker-compose up -d
  386 +
  387 +# Stop all services
  388 +docker-compose down
  389 +
  390 +# Stop and remove containers and Compose-managed volumes (use with caution)
  391 +docker-compose down -v
  392 +
  393 +# Rebuild and start
  394 +docker-compose up --build -d
  395 +
  396 +# View real-time logs
  397 +docker-compose logs -f
  398 +
  399 +# View specific service logs
  400 +docker-compose logs -f bettafish
  401 +docker-compose logs -f db
  402 +
  403 +# Enter container
  404 +docker-compose exec bettafish bash
  405 +
  406 +# Backup database
  407 +docker-compose exec -T db pg_dump -U bettafish bettafish > backup.sql
  408 +
  409 +# Restore database
  410 +docker-compose exec -T db psql -U bettafish bettafish < backup.sql
  411 +```
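
The backup command above always writes to the same `backup.sql`, so each run replaces the previous dump. A minimal sketch of timestamped backups follows. The helper names `backup_name` and `backup_db` are hypothetical, and the service name `db` and the `bettafish` user and database are taken from the commands above.

```bash
# Hypothetical helpers (not part of the project).
# Build a file name of the form backup_YYYYMMDD_HHMMSS.sql from the current time.
backup_name() {
    printf 'backup_%s.sql\n' "$(date +%Y%m%d_%H%M%S)"
}

# Dump the bundled PostgreSQL database to a timestamped file.
# -T disables pseudo-TTY allocation so the redirected output stays clean.
backup_db() {
    target=$(backup_name)
    docker-compose exec -T db pg_dump -U bettafish bettafish > "$target" &&
        echo "Wrote $target"
}
```

Run `backup_db` from the project root while the services are up. Restoring works as before, with the timestamped file in place of `backup.sql`.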
  412 +
  413 +#### 5.4 Docker Data Persistence
  414 +
  415 +The project configures the following data volumes:
  416 +- `./logs`: Application log files
  417 +- `./final_reports`: Generated analysis reports
  418 +- `./insight_engine_streamlit_reports`: Insight Engine reports
  419 +- `./media_engine_streamlit_reports`: Media Engine reports
  420 +- `./query_engine_streamlit_reports`: Query Engine reports
  421 +- `./db_data`: PostgreSQL database data
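
The `./` prefixes suggest these are host directories mounted into the containers, although that depends on how `docker-compose.yml` declares them. If so, creating them before the first start means they exist and belong to the current user, which may avoid some of the permission issues covered in section 5.5. A minimal sketch, with the hypothetical helper name `prepare_data_dirs`:

```bash
# Hypothetical helper (not part of the project): create the host
# directories listed above under the given root (default: current directory).
prepare_data_dirs() {
    root=${1:-.}
    for dir in logs final_reports insight_engine_streamlit_reports \
        media_engine_streamlit_reports query_engine_streamlit_reports db_data; do
        mkdir -p "$root/$dir" || return 1
    done
}
```

Calling `prepare_data_dirs` from the project root before `docker-compose up -d` is enough. Existing directories are left untouched.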
  422 +
  423 +#### 5.5 Docker Troubleshooting
  424 +
  425 +**Common issues and solutions**:
  426 +
  427 +1. **Port conflicts**
  428 +```bash
  429 +# Check port usage
  430 +netstat -tulpn | grep :5000
  431 +# Or modify port mapping in docker-compose.yml
  432 +```
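
The check above covers only port 5000, and `netstat` is not installed by default on some recent Linux distributions. The sketch below checks all four ports listed in section 5.2 using `ss -ltn` instead. The helper name `port_listed` is hypothetical, and the pattern assumes the usual `address:port` layout of `ss` output.

```bash
# Hypothetical helper (not part of the project): read a socket listing on
# stdin and succeed if some local address ends in ":<port>".
port_listed() {
    grep -Eq ":$1[[:space:]]"
}

# Report which of the documented ports already have a listener.
for port in 5000 8501 8502 8503; do
    if ss -ltn 2>/dev/null | port_listed "$port"; then
        echo "Port $port is already in use"
    fi
done
```

If a port is reported as taken, either stop the process holding it or change the host side of the mapping in `docker-compose.yml`.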
  433 +
  434 +2. **Insufficient memory**
  435 +```bash
  436 +# Increase Docker memory limits
  437 +# Adjust resource allocation in Docker Desktop
  438 +```
  439 +
  440 +3. **Permission issues**
  441 +```bash
  442 +# Ensure scripts have execute permissions
  443 +chmod +x scripts/*.sh
  444 +
  445 +# Ensure data directory permissions are correct
  446 +sudo chown -R $USER:$USER ./
  447 +```
  448 +
  449 +4. **Build failures**
  450 +```bash
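  451 +# Clear the Docker cache and rebuild (caution: `prune -a` removes all stopped containers and unused images and networks on the host)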
  452 +docker system prune -a
  453 +docker-compose build --no-cache
  454 +```
  455 +
  456 +5. **Service won't start**
  457 +```bash
  458 +# Check logs to troubleshoot
  459 +docker-compose logs bettafish
  460 +
  461 +# Check environment variable configuration
  462 +docker-compose config
  463 +```
  464 +
  465 +### 6. Traditional Deployment
  466 +
  467 +#### 6.1 Complete System Launch
307 468
308 ```bash 469 ```bash
309 # In project root directory, activate conda environment 470 # In project root directory, activate conda environment
@@ -324,13 +485,13 @@ python app.py @@ -324,13 +485,13 @@ python app.py
324 485
325 > Note 1: After a run is terminated, the Streamlit app might not shut down correctly and may still be occupying the port. If this occurs, find the process that is holding the port and kill it. 486 > Note 1: After a run is terminated, the Streamlit app might not shut down correctly and may still be occupying the port. If this occurs, find the process that is holding the port and kill it.
326 487
327 -> Note 2: Data scraping needs to be performed as a separate operation. Please refer to the instructions in section 5.3. 488 +> Note 2: Data scraping needs to be performed as a separate operation. Please refer to the instructions in section 6.3.
328 489
329 > Note 3: If page display issues occur during remote server deployment, see [PR#45](https://github.com/666ghj/BettaFish/pull/45) 490 > Note 3: If page display issues occur during remote server deployment, see [PR#45](https://github.com/666ghj/BettaFish/pull/45)
330 491
331 Visit http://localhost:5000 to use the complete system 492 Visit http://localhost:5000 to use the complete system
332 493
333 -#### 5.2 Launch Individual Agents 494 +#### 6.2 Launch Individual Agents
334 495
335 ```bash 496 ```bash
336 # Start QueryEngine 497 # Start QueryEngine
@@ -343,7 +504,7 @@ streamlit run SingleEngineApp/media_engine_streamlit_app.py --server.port 8502 @@ -343,7 +504,7 @@ streamlit run SingleEngineApp/media_engine_streamlit_app.py --server.port 8502
343 streamlit run SingleEngineApp/insight_engine_streamlit_app.py --server.port 8501 504 streamlit run SingleEngineApp/insight_engine_streamlit_app.py --server.port 8501
344 ``` 505 ```
345 506
346 -#### 5.3 Crawler System Standalone Use 507 +#### 6.3 Crawler System Standalone Use
347 508
348 This section has detailed configuration documentation: [MindSpider Usage Guide](./MindSpider/README.md) 509 This section has detailed configuration documentation: [MindSpider Usage Guide](./MindSpider/README.md)
349 510
@@ -300,9 +300,170 @@ python main.py --setup @@ -300,9 +300,170 @@ python main.py --setup
300 300
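301 > To conduct a data compliance review and service upgrade, we are suspending new applications for the cloud database, effective October 1, 2025. 301 > To conduct a data compliance review and service upgrade, we are suspending new applications for the cloud database, effective October 1, 2025.
302 302
303 -### 5. Launch System 303 +### 5. Docker Deployment (Recommended)
304 304
305 -#### 5.1 Complete System Launch (Recommended) 305 +The project provides complete Docker support, including application and database services, for quick deployment and environment isolation.
  306 +
  307 +#### 5.1 Docker Requirements
  308 +
  309 +- **Docker**: 20.10+
  310 +- **Docker Compose**: 2.0+
  311 +- **Available Memory**: 4GB+ recommended
  312 +- **Available Disk Space**: 10GB+ recommended
  313 +
  314 +#### 5.2 Docker Quick Start
  315 +
  316 +1. **Clone the project and enter its directory**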
  317 +```bash
  318 +git clone https://github.com/666ghj/Weibo_PublicOpinion_AnalysisSystem.git
  319 +cd Weibo_PublicOpinion_AnalysisSystem
  320 +```
  321 +
  322 +2. **Configure environment variables**
  323 +```bash
  324 +# Copy the environment variable template
  325 +cp .env.example .env
  326 +
  327 +# Edit the environment variable file and fill in the required settings
  328 +vim .env
  329 +```
  330 +
  331 +> **Note:** The application reads database settings from `.env`. Keep `DB_DIALECT=postgresql` when using the bundled PostgreSQL service; change it only if you switch to another database engine.
  332 +
  333 +**Key environment variables**
  334 +```bash
  335 +# LLM API configuration (required)
  336 +INSIGHT_ENGINE_API_KEY="your_api_key"
  337 +INSIGHT_ENGINE_BASE_URL="https://api.moonshot.cn/v1"
  338 +INSIGHT_ENGINE_MODEL_NAME="kimi-k2-0711-preview"
  339 +
  340 +# Media Agent configuration
  341 +MEDIA_ENGINE_API_KEY="your_api_key"
  342 +MEDIA_ENGINE_BASE_URL="https://api.moonshot.cn/v1"
  343 +MEDIA_ENGINE_MODEL_NAME="kimi-k2-0711-preview"
  344 +
  345 +# Query Agent configuration
  346 +QUERY_ENGINE_API_KEY="your_api_key"
  347 +QUERY_ENGINE_BASE_URL="https://api.moonshot.cn/v1"
  348 +QUERY_ENGINE_MODEL_NAME="kimi-k2-0711-preview"
  349 +
  350 +# Report Agent configuration
  351 +REPORT_ENGINE_API_KEY="your_api_key"
  352 +REPORT_ENGINE_BASE_URL="https://api.moonshot.cn/v1"
  353 +REPORT_ENGINE_MODEL_NAME="kimi-k2-0711-preview"
  354 +
  355 +# Database configuration (using the built-in Docker PostgreSQL)
  356 +POSTGRES_USER=bettafish
  357 +POSTGRES_PASSWORD=bettafish
  358 +POSTGRES_DB=bettafish
  359 +POSTGRES_PORT=5444
  360 +```
  361 +
  362 +3. **Start Docker services**
  363 +```bash
  364 +# Build and start all services
  365 +docker-compose up -d
  366 +
  367 +# Check service status
  368 +docker-compose ps
  369 +
  370 +# View logs
  371 +docker-compose logs -f bettafish
  372 +```
  373 +
  374 +4. **Access applications**
  375 +- **Main Application**: http://localhost:5000
  376 +- **Insight Engine**: http://localhost:8501
  377 +- **Media Engine**: http://localhost:8502
  378 +- **Query Engine**: http://localhost:8503
  379 +
  380 +#### 5.3 Docker Management Commands
  381 +
  382 +```bash
  383 +# Start all services
  384 +docker-compose up -d
  385 +
  386 +# Stop all services
  387 +docker-compose down
  388 +
  389 +# Stop and remove containers and Compose-managed volumes (use with caution)
  390 +docker-compose down -v
  391 +
  392 +# Rebuild and start
  393 +docker-compose up --build -d
  394 +
  395 +# View real-time logs
  396 +docker-compose logs -f
  397 +
  398 +# View specific service logs
  399 +docker-compose logs -f bettafish
  400 +docker-compose logs -f db
  401 +
  402 +# Enter container
  403 +docker-compose exec bettafish bash
  404 +
  405 +# Backup database
  406 +docker-compose exec -T db pg_dump -U bettafish bettafish > backup.sql
  407 +
  408 +# Restore database
  409 +docker-compose exec -T db psql -U bettafish bettafish < backup.sql
  410 +```
  411 +
  412 +#### 5.4 Docker Data Persistence
  413 +
  414 +The project configures the following data volumes:
  415 +- `./logs`: Application log files
  416 +- `./final_reports`: Generated analysis reports
  417 +- `./insight_engine_streamlit_reports`: Insight Engine reports
  418 +- `./media_engine_streamlit_reports`: Media Engine reports
  419 +- `./query_engine_streamlit_reports`: Query Engine reports
  420 +- `./db_data`: PostgreSQL database data
  421 +
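  422 +#### 5.5 Docker Troubleshooting
  423 +
  424 +**Common issues and solutions**
  425 +
  426 +1. **Port conflicts**
  427 +```bash
  428 +# Check port usage
  429 +netstat -tulpn | grep :5000
  430 +# Or modify port mapping in docker-compose.yml
  431 +```
  432 +
  433 +2. **Insufficient memory**
  434 +```bash
  435 +# Increase Docker memory limits
  436 +# Adjust resource allocation in Docker Desktop
  437 +```
  438 +
  439 +3. **Permission issues**
  440 +```bash
  441 +# Ensure scripts have execute permissions
  442 +chmod +x scripts/*.sh
  443 +
  444 +# Ensure data directory permissions are correct
  445 +sudo chown -R $USER:$USER ./
  446 +```
  447 +
  448 +4. **Build failures**
  449 +```bash
  450 +# Clear the Docker cache and rebuild (caution: `prune -a` removes all stopped containers and unused images and networks on the host)
  451 +docker system prune -a
  452 +docker-compose build --no-cache
  453 +```
  454 +
  455 +5. **Service won't start**
  456 +```bash
  457 +# Check logs to troubleshoot
  458 +docker-compose logs bettafish
  459 +
  460 +# Check environment variable configuration
  461 +docker-compose config
  462 +```
  463 +
  464 +### 6. Traditional Deployment
  465 +
  466 +#### 6.1 Complete System Launch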
306 467
307 ```bash 468 ```bash
308 # In project root directory, activate conda environment 469 # In project root directory, activate conda environment
@@ -323,13 +484,13 @@ python app.py @@ -323,13 +484,13 @@ python app.py
323 484
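324 > Note 1: After a run is terminated, the Streamlit app might not shut down correctly and may still be occupying the port. If this occurs, find the process that is holding the port and kill it. 485 > Note 1: After a run is terminated, the Streamlit app might not shut down correctly and may still be occupying the port. If this occurs, find the process that is holding the port and kill it.
325 486
326 -> Note 2: Data scraping needs to be performed as a separate operation. Please refer to the instructions in section 5.3. 487 +> Note 2: Data scraping needs to be performed as a separate operation. Please refer to the instructions in section 6.3.
327 488
328 > Note 3: If page display issues occur during remote server deployment, see [PR#45](https://github.com/666ghj/BettaFish/pull/45) 489 > Note 3: If page display issues occur during remote server deployment, see [PR#45](https://github.com/666ghj/BettaFish/pull/45)
329 490
330 Visit http://localhost:5000 to use the complete system 491 Visit http://localhost:5000 to use the complete system
331 492
332 -#### 5.2 Launch Individual Agents 493 +#### 6.2 Launch Individual Agents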
333 494
334 ```bash 495 ```bash
335 # Start QueryEngine 496 # Start QueryEngine
@@ -342,7 +503,7 @@ streamlit run SingleEngineApp/media_engine_streamlit_app.py --server.port 8502 @@ -342,7 +503,7 @@ streamlit run SingleEngineApp/media_engine_streamlit_app.py --server.port 8502
342 streamlit run SingleEngineApp/insight_engine_streamlit_app.py --server.port 8501 503 streamlit run SingleEngineApp/insight_engine_streamlit_app.py --server.port 8501
343 ``` 504 ```
344 505
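345 -#### 5.3 Crawler System Standalone Use 506 +#### 6.3 Crawler System Standalone Use
346 507
347 This section has detailed configuration documentation: [MindSpider Usage Guide](./MindSpider/README.md) 508 This section has detailed configuration documentation: [MindSpider Usage Guide](./MindSpider/README.md)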
348 509