马一丁

Update README-EN.md

Showing 1 changed file with 112 additions and 103 deletions
@@ -123,115 +123,124 @@ Solomon LionCC BettaFish WeiYu Benefits: Open codecodex.ai Lion Programming Chan
123 123
124 ``` 124 ```
125 BettaFish/ 125 BettaFish/
126 -├── QueryEngine/ # Domestic and international news breadth search Agent  
127 -│ ├── agent.py # Agent main logic  
128 -│ ├── llms/ # LLM interface wrapper  
129 -│ ├── nodes/ # Processing nodes  
130 -│ ├── tools/ # Search tools  
131 -│ ├── utils/ # Utility functions  
132 -│ └── ... # Other modules  
133 -├── MediaEngine/ # Powerful multimodal understanding Agent  
134 -│ ├── agent.py # Agent main logic  
135 -│ ├── nodes/ # Processing nodes  
136 -│ ├── llms/ # LLM interfaces  
137 -│ ├── tools/ # Search tools  
138 -│ ├── utils/ # Utility functions  
139 -│ └── ... # Other modules  
140 -├── InsightEngine/ # Private database mining Agent  
141 -│ ├── agent.py # Agent main logic  
142 -│ ├── llms/ # LLM interface wrapper  
143 -│ │ └── base.py # Unified OpenAI-compatible client  
144 -│ ├── nodes/ # Processing nodes  
145 -│ │ ├── base_node.py # Base node class  
146 -│ │ ├── formatting_node.py # Formatting node  
147 -│ │ ├── report_structure_node.py # Report structure node  
148 -│ │ ├── search_node.py # Search node  
149 -│ │ └── summary_node.py # Summary node  
150 -│ ├── tools/ # Database query and analysis tools  
151 -│ │ ├── keyword_optimizer.py # Qwen keyword optimization middleware  
152 -│ │ ├── search.py # Database operation toolkit  
153 -│ │ └── sentiment_analyzer.py # Sentiment analysis integration tool  
154 -│ ├── state/ # State management
  126 +├── QueryEngine/ # Domestic and international news breadth search Agent
  127 +│ ├── agent.py # Agent main logic
  128 +│ ├── llms/ # LLM interface wrapper
  129 +│ ├── nodes/ # Processing nodes
  130 +│ ├── tools/ # Search tools
  131 +│ ├── utils/ # Utility functions
  132 +│ └── ... # Other modules
  133 +├── MediaEngine/ # Powerful multimodal understanding Agent
  134 +│ ├── agent.py # Agent main logic
  135 +│ ├── nodes/ # Processing nodes
  136 +│ ├── llms/ # LLM interfaces
  137 +│ ├── tools/ # Search tools
  138 +│ ├── utils/ # Utility functions
  139 +│ └── ... # Other modules
  140 +├── InsightEngine/ # Private database mining Agent
  141 +│ ├── agent.py # Agent main logic
  142 +│ ├── llms/ # LLM interface wrapper
  143 +│ │ └── base.py # Unified OpenAI-compatible client
  144 +│ ├── nodes/ # Processing nodes
  145 +│ │ ├── base_node.py # Base node class
  146 +│ │ ├── formatting_node.py # Formatting node
  147 +│ │ ├── report_structure_node.py # Report structure node
  148 +│ │ ├── search_node.py # Search node
  149 +│ │ └── summary_node.py # Summary node
  150 +│ ├── tools/ # Database query and analysis tools
  151 +│ │ ├── keyword_optimizer.py # Qwen keyword optimization middleware
  152 +│ │ ├── search.py # Database operation toolkit
  153 +│ │ └── sentiment_analyzer.py # Sentiment analysis integration tool
  154 +│ ├── state/ # State management
155 │ │ ├── __init__.py 155 │ │ ├── __init__.py
156 -│ │ └── state.py # Agent state definition  
157 -│ ├── prompts/ # Prompt templates
  156 +│ │ └── state.py # Agent state definition
  157 +│ ├── prompts/ # Prompt templates
158 │ │ ├── __init__.py 158 │ │ ├── __init__.py
159 -│ │ └── prompts.py # Various prompts  
160 -│ └── utils/ # Utility functions
  159 +│ │ └── prompts.py # Various prompts
  160 +│ └── utils/ # Utility functions
161 │   ├── __init__.py 161 │   ├── __init__.py
162 -│   ├── config.py # Configuration management
163 -│   └── text_processing.py # Text processing tools
164 -├── ReportEngine/ # Multi-round report generation Agent  
165 -│ ├── agent.py # Orchestrates template → layout → budget → chapter → render pipeline  
166 -│ ├── flask_interface.py # Flask/SSE facade handling task queueing and streaming events  
167 -│ ├── llms/ # OpenAI-compatible LLM wrappers  
168 -│ │ └── base.py # Unified streaming/retry client  
169 -│ ├── core/ # Template slicing, chapter storage, document stitching  
170 -│ │ ├── template_parser.py # Markdown slicer and slug generator  
171 -│ │ ├── chapter_storage.py # Run directory + manifest + raw streaming writer  
172 -│ │ └── stitcher.py # Document IR composer injecting anchors/metadata  
173 -│ ├── ir/ # Report IR contract & validator  
174 -│ │ ├── schema.py # Block/mark schema constants  
175 -│ │ └── validator.py # Chapter JSON structure validator  
176 -│ ├── nodes/ # Reasoning nodes for the whole pipeline  
177 -│ │ ├── base_node.py # Base class with logging/state hooks  
178 -│ │ ├── template_selection_node.py # Gather candidates and ask LLM to pick  
179 -│ │ ├── document_layout_node.py # Title/TOC/theme designer  
180 -│ │ ├── word_budget_node.py # Word plan & directives per chapter  
181 -│ │ └── chapter_generation_node.py # Chapter-level JSON generation + validation  
182 -│ ├── prompts/ # Prompt library and schema notes  
183 -│ │ └── prompts.py # Templates for selection/layout/budget/chapters  
184 -│ ├── renderers/ # IR renderers  
185 -│ │ └── html_renderer.py # Document IR → interactive HTML  
186 -│ ├── state/ # Task and metadata state models  
187 -│ │ └── state.py # ReportState plus serialization helpers  
188 -│ ├── utils/ # Config/log helpers  
189 -│ │ └── config.py # Pydantic settings + printer  
190 -│ ├── report_template/ # Markdown template library  
191 -│ │ ├── 社会公共热点事件分析.md  
192 -│ │ ├── 商业品牌舆情监测.md  
193 -│ │ └── ... # More templates  
194 -│ └── ... # Misc caches, __init__.py, etc.  
195 -├── ForumEngine/ # Forum engine simple implementation  
196 -│ ├── monitor.py # Log monitoring and forum management  
197 -│ └── llm_host.py # Forum host LLM module  
198 -├── MindSpider/ # Weibo crawler system  
199 -│ ├── main.py # Crawler main program  
200 -│ ├── config.py # Crawler configuration file  
201 -│ ├── BroadTopicExtraction/ # Topic extraction module  
202 -│ │ ├── database_manager.py # Database manager  
203 -│ │ ├── get_today_news.py # Today's news fetching  
204 -│ │ ├── main.py # Topic extraction main program  
205 -│ │ └── topic_extractor.py # Topic extractor  
206 -│ ├── DeepSentimentCrawling/ # Deep sentiment crawling  
207 -│ │ ├── keyword_manager.py # Keyword manager  
208 -│ │ ├── main.py # Deep crawling main program  
209 -│ │ ├── MediaCrawler/ # Media crawler core  
210 -│ │ └── platform_crawler.py # Platform crawler management  
211 -│ └── schema/ # Database schema  
212 -│   ├── db_manager.py # Database manager
213 -│   ├── init_database.py # Database initialization
214 -│   └── mindspider_tables.sql # Database table structure
215 -├── SentimentAnalysisModel/ # Sentiment analysis model collection  
216 -│ ├── WeiboSentiment_Finetuned/ # Fine-tuned BERT/GPT-2 models  
217 -│ ├── WeiboMultilingualSentiment/# Multilingual sentiment analysis (recommended)  
218 -│ ├── WeiboSentiment_SmallQwen/ # Small parameter Qwen3 fine-tuning  
219 -│ └── WeiboSentiment_MachineLearning/ # Traditional machine learning methods  
220 -├── SingleEngineApp/ # Individual Agent Streamlit applications
  162 +│   ├── config.py # Configuration management
  163 +│   ├── db.py # SQLAlchemy async engine + read-only query helpers
  164 +│   └── text_processing.py # Text processing tools
  165 +├── ReportEngine/ # Multi-round report generation Agent
  166 +│ ├── agent.py # Orchestrates template → layout → budget → chapter → render pipeline
  167 +│ ├── flask_interface.py # Flask/SSE facade handling task queueing and streaming events
  168 +│ ├── llms/ # OpenAI-compatible LLM wrappers
  169 +│ │ └── base.py # Unified streaming/retry client
  170 +│ ├── core/ # Template slicing, chapter storage, document stitching
  171 +│ │ ├── template_parser.py # Markdown slicer and slug generator
  172 +│ │ ├── chapter_storage.py # Run directory + manifest + raw streaming writer
  173 +│ │ └── stitcher.py # Document IR composer injecting anchors/metadata
  174 +│ ├── ir/ # Report IR contract & validator
  175 +│ │ ├── schema.py # Block/mark schema constants
  176 +│ │ └── validator.py # Chapter JSON structure validator
  177 +│ ├── nodes/ # Reasoning nodes for the whole pipeline
  178 +│ │ ├── base_node.py # Base class with logging/state hooks
  179 +│ │ ├── template_selection_node.py # Gather candidates and ask LLM to pick
  180 +│ │ ├── document_layout_node.py # Title/TOC/theme designer
  181 +│ │ ├── word_budget_node.py # Word plan & directives per chapter
  182 +│ │ └── chapter_generation_node.py # Chapter-level JSON generation + validation
  183 +│ ├── prompts/ # Prompt library and schema notes
  184 +│ │ └── prompts.py # Templates for selection/layout/budget/chapters
  185 +│ ├── renderers/ # IR renderers
  186 +│ │ └── html_renderer.py # Document IR → interactive HTML
  187 +│ ├── state/ # Task and metadata state models
  188 +│ │ └── state.py # ReportState plus serialization helpers
  189 +│ ├── utils/ # Config/log helpers
  190 +│ │ └── config.py # Pydantic settings + printer
  191 +│ ├── report_template/ # Markdown template library
  192 +│ └── ... # Misc caches, __init__.py, etc.
  193 +├── ForumEngine/ # Forum engine simple implementation
  194 +│ ├── monitor.py # Log monitoring and forum management
  195 +│ └── llm_host.py # Forum host LLM module
  196 +├── MindSpider/ # Weibo crawler system
  197 +│ ├── main.py # Crawler main program
  198 +│ ├── config.py # Crawler configuration file
  199 +│ ├── BroadTopicExtraction/ # Topic extraction module
  200 +│ │ ├── database_manager.py # Database manager
  201 +│ │ ├── get_today_news.py # Today's news fetching
  202 +│ │ ├── main.py # Topic extraction main program
  203 +│ │ └── topic_extractor.py # Topic extractor
  204 +│ ├── DeepSentimentCrawling/ # Deep sentiment crawling
  205 +│ │ ├── keyword_manager.py # Keyword manager
  206 +│ │ ├── main.py # Deep crawling main program
  207 +│ │ ├── MediaCrawler/ # Media crawler core
  208 +│ │ └── platform_crawler.py # Platform crawler management
  209 +│ └── schema/ # Database schema
  210 +│   ├── db_manager.py # Database manager
  211 +│   ├── init_database.py # Database initialization
  212 +│   ├── mindspider_tables.sql # Database table structure
  213 +│   ├── models_bigdata.py # SQLAlchemy models for large media crawling tables
  214 +│   └── models_sa.py # ORM base and topic/task models
  215 +├── SentimentAnalysisModel/ # Sentiment analysis model collection
  216 +│ ├── WeiboSentiment_Finetuned/ # Fine-tuned BERT/GPT-2 models
  217 +│ ├── WeiboMultilingualSentiment/ # Multilingual sentiment analysis (recommended)
  218 +│ ├── WeiboSentiment_SmallQwen/ # Small parameter Qwen3 fine-tuning
  219 +│ └── WeiboSentiment_MachineLearning/ # Traditional machine learning methods
  220 +├── SingleEngineApp/ # Individual Agent Streamlit applications
221 │ ├── query_engine_streamlit_app.py 221 │ ├── query_engine_streamlit_app.py
222 │ ├── media_engine_streamlit_app.py 222 │ ├── media_engine_streamlit_app.py
223 │ └── insight_engine_streamlit_app.py 223 │ └── insight_engine_streamlit_app.py
224 -├── templates/ # Flask templates  
225 -│ └── index.html # Main interface frontend  
226 -├── static/ # Static resources  
227 -├── logs/ # Runtime log directory  
228 -├── final_reports/ # Final generated HTML report files  
229 -├── utils/ # Common utility functions  
230 -│ ├── forum_reader.py # Agent forum communication  
231 -│ └── retry_helper.py # Network request retry mechanism tool  
232 -├── app.py # Flask main application entry  
233 -├── config.py # Global configuration file  
234 -└── requirements.txt # Python dependency list
  224 +├── query_engine_streamlit_reports/ # QueryEngine Streamlit outputs (Markdown + state)
  225 +├── media_engine_streamlit_reports/ # MediaEngine Streamlit outputs (Markdown + state)
  226 +├── insight_engine_streamlit_reports/ # InsightEngine Streamlit outputs (Markdown + state)
  227 +├── templates/ # Flask templates
  228 +│ └── index.html # Main interface frontend
  229 +├── static/ # Static resources
  230 +├── logs/ # Runtime log directory
  231 +├── final_reports/ # Final generated HTML report files
  232 +├── utils/ # Common utility functions
  233 +│ ├── forum_reader.py # Agent forum communication
  234 +│ ├── github_issues.py # Helper to prefill GitHub issue links and errors
  235 +│ └── retry_helper.py # Network request retry mechanism tool
  236 +├── tests/ # Targeted pytest suites
  237 +│ ├── run_tests.py # pytest entry helper
  238 +│ ├── test_monitor.py # ForumEngine monitor tests
  239 +│ └── test_report_engine_sanitization.py # ReportEngine sanitization tests
  240 +├── app.py # Flask main application entry
  241 +├── config.py # Global configuration file
  242 +├── docker-compose.yml # Orchestrates multi-service deployment
  243 +└── requirements.txt # Python dependency list
235 ``` 244 ```
236 245
237 ## 🚀 Quick Start (Docker) 246 ## 🚀 Quick Start (Docker)