马一丁

Update README-EN.md

Showing 1 changed file with 146 additions and 76 deletions
@@ -124,123 +124,193 @@ Solomon LionCC BettaFish WeiYu Benefits: Open codecodex.ai Lion Programming Chan
 ```
 BettaFish/
 ├── QueryEngine/ # Domestic and international news breadth search Agent
-│   ├── agent.py # Agent main logic
+│   ├── agent.py # Agent main logic, coordinates search and analysis workflow
 │   ├── llms/ # LLM interface wrapper
-│   ├── nodes/ # Processing nodes
+│   │   └── base.py # Unified OpenAI-compatible client
+│   ├── nodes/ # Processing nodes: search, formatting, summarization, etc.
+│   │   ├── base_node.py # Base node class
+│   │   ├── search_node.py # Search node
+│   │   ├── formatting_node.py # Formatting node
+│   │   ├── report_structure_node.py # Report structure node
+│   │   └── summary_node.py # Summary node
 │   ├── tools/ # Search tools
+│   │   └── search.py # Web search toolkit
 │   ├── utils/ # Utility functions
-│   └── ... # Other modules
+│   │   ├── config.py # Configuration management
+│   │   └── text_processing.py # Text processing utilities
+│   ├── state/ # State management
+│   │   └── state.py # Agent state definition
+│   ├── prompts/ # Prompt templates
+│   │   └── prompts.py # Various prompt templates
+│   └── __init__.py
 ├── MediaEngine/ # Powerful multimodal understanding Agent
-│   ├── agent.py # Agent main logic
-│   ├── nodes/ # Processing nodes
-│   ├── llms/ # LLM interfaces
-│   ├── tools/ # Search tools
+│   ├── agent.py # Agent main logic, handles video/image multimodal content
+│   ├── llms/ # LLM interface wrapper
+│   │   └── base.py # Unified OpenAI-compatible client
+│   ├── nodes/ # Processing nodes: search, formatting, summarization, etc.
+│   │   ├── base_node.py # Base node class
+│   │   ├── search_node.py # Search node
+│   │   ├── formatting_node.py # Formatting node
+│   │   ├── report_structure_node.py # Report structure node
+│   │   └── summary_node.py # Summary node
+│   ├── tools/ # Multimodal search tools
+│   │   └── search.py # Multimodal content search toolkit
 │   ├── utils/ # Utility functions
-│   └── ... # Other modules
+│   │   ├── config.py # Configuration management
+│   │   └── text_processing.py # Text processing utilities
+│   ├── state/ # State management
+│   │   └── state.py # Agent state definition
+│   ├── prompts/ # Prompt templates
+│   │   └── prompts.py # Various prompt templates
+│   └── __init__.py
 ├── InsightEngine/ # Private database mining Agent
-│   ├── agent.py # Agent main logic
+│   ├── agent.py # Agent main logic, coordinates database queries and analysis
 │   ├── llms/ # LLM interface wrapper
 │   │   └── base.py # Unified OpenAI-compatible client
-│   ├── nodes/ # Processing nodes
+│   ├── nodes/ # Processing nodes: search, formatting, summarization, etc.
 │   │   ├── base_node.py # Base node class
+│   │   ├── search_node.py # Search node
 │   │   ├── formatting_node.py # Formatting node
 │   │   ├── report_structure_node.py # Report structure node
-│   │   ├── search_node.py # Search node
 │   │   └── summary_node.py # Summary node
 │   ├── tools/ # Database query and analysis tools
 │   │   ├── keyword_optimizer.py # Qwen keyword optimization middleware
-│   │   ├── search.py # Database operation toolkit
+│   │   ├── search.py # Database operation toolkit (topic search, comment retrieval, etc.)
 │   │   └── sentiment_analyzer.py # Sentiment analysis integration tool
+│   ├── utils/ # Utility functions
+│   │   ├── config.py # Configuration management
+│   │   ├── db.py # SQLAlchemy async engine + read-only query wrapper
+│   │   └── text_processing.py # Text processing utilities
 │   ├── state/ # State management
-│   │   ├── __init__.py
 │   │   └── state.py # Agent state definition
 │   ├── prompts/ # Prompt templates
-│   │   ├── __init__.py
-│   │   └── prompts.py # Various prompts
-│   └── utils/ # Utility functions
-│       ├── __init__.py
-│       ├── config.py # Configuration management
-│       ├── db.py # SQLAlchemy async engine + read-only query helpers
-│       └── text_processing.py # Text processing tools
+│   │   └── prompts.py # Various prompt templates
+│   └── __init__.py
 ├── ReportEngine/ # Multi-round report generation Agent
-│   ├── agent.py # Orchestrates template → layout → budget → chapter → render pipeline
-│   ├── flask_interface.py # Flask/SSE facade handling task queueing and streaming events
+│   ├── agent.py # Master orchestrator: template selection → layout → budget → chapter → render
+│   ├── flask_interface.py # Flask/SSE entry point, manages task queuing and streaming events
 │   ├── llms/ # OpenAI-compatible LLM wrappers
 │   │   └── base.py # Unified streaming/retry client
-│   ├── core/ # Template slicing, chapter storage, document stitching
-│   │   ├── template_parser.py # Markdown slicer and slug generator
-│   │   ├── chapter_storage.py # Run directory + manifest + raw streaming writer
-│   │   └── stitcher.py # Document IR composer injecting anchors/metadata
-│   ├── ir/ # Report IR contract & validator
-│   │   ├── schema.py # Block/mark schema constants
+│   ├── core/ # Core functionalities: template parsing, chapter storage, document stitching
+│   │   ├── template_parser.py # Markdown template slicer and slug generator
+│   │   ├── chapter_storage.py # Chapter run directory, manifest, and raw stream writer
+│   │   └── stitcher.py # Document IR stitcher, adds anchors/metadata
+│   ├── ir/ # Report Intermediate Representation (IR) contract & validation
+│   │   ├── schema.py # Block/mark schema constant definitions
 │   │   └── validator.py # Chapter JSON structure validator
-│   ├── nodes/ # Reasoning nodes for the whole pipeline
-│   │   ├── base_node.py # Base class with logging/state hooks
-│   │   ├── template_selection_node.py # Gather candidates and ask LLM to pick
+│   ├── nodes/ # Full workflow reasoning nodes
+│   │   ├── base_node.py # Node base class + logging/state hooks
+│   │   ├── template_selection_node.py # Template candidate collection and LLM selection
 │   │   ├── document_layout_node.py # Title/TOC/theme designer
-│   │   ├── word_budget_node.py # Word plan & directives per chapter
+│   │   ├── word_budget_node.py # Word budget planning and chapter directive generation
 │   │   └── chapter_generation_node.py # Chapter-level JSON generation + validation
-│   ├── prompts/ # Prompt library and schema notes
-│   │   └── prompts.py # Templates for selection/layout/budget/chapters
+│   ├── prompts/ # Prompt library and schema descriptions
+│   │   └── prompts.py # Template selection/layout/budget/chapter prompts
 │   ├── renderers/ # IR renderers
-│   │   └── html_renderer.py # Document IR → interactive HTML
-│   ├── state/ # Task and metadata state models
-│   │   └── state.py # ReportState plus serialization helpers
-│   ├── utils/ # Config/log helpers
-│   │   └── config.py # Pydantic settings + printer
+│   │   ├── html_renderer.py # Document IR → interactive HTML
+│   │   ├── pdf_renderer.py # HTML → PDF export (WeasyPrint)
+│   │   ├── pdf_layout_optimizer.py # PDF layout optimizer
+│   │   └── chart_to_svg.py # Chart to SVG conversion tool
+│   ├── state/ # Task/metadata state models
+│   │   └── state.py # ReportState and serialization utilities
+│   ├── utils/ # Configuration and helper utilities
+│   │   ├── config.py # Pydantic settings + printer helper
+│   │   ├── dependency_check.py # Dependency checking tool
+│   │   ├── json_parser.py # JSON parsing utilities
+│   │   ├── chart_validator.py # Chart validation tool
+│   │   └── chart_repair_api.py # Chart repair API
 │   ├── report_template/ # Markdown template library
-│   └── ... # Misc caches, __init__.py, etc.
-├── ForumEngine/ # Forum engine simple implementation
-│   ├── monitor.py # Log monitoring and forum management
-│   └── llm_host.py # Forum host LLM module
-├── MindSpider/ # Weibo crawler system
-│   ├── main.py # Crawler main program
+│   │   ├── 企业品牌声誉分析报告.md
+│   │   └── ...
+│   └── __init__.py
+├── ForumEngine/ # Forum engine: Agent collaboration mechanism
+│   ├── monitor.py # Log monitoring and forum management core
+│   ├── llm_host.py # Forum moderator LLM module
+│   └── __init__.py
+├── MindSpider/ # Social media crawler system
+│   ├── main.py # Crawler main program entry
 │   ├── config.py # Crawler configuration file
 │   ├── BroadTopicExtraction/ # Topic extraction module
-│   │   ├── database_manager.py # Database manager
-│   │   ├── get_today_news.py # Today's news fetching
 │   │   ├── main.py # Topic extraction main program
+│   │   ├── database_manager.py # Database manager
+│   │   ├── get_today_news.py # Today's news fetcher
 │   │   └── topic_extractor.py # Topic extractor
-│   ├── DeepSentimentCrawling/ # Deep sentiment crawling
-│   │   ├── keyword_manager.py # Keyword manager
+│   ├── DeepSentimentCrawling/ # Deep sentiment crawling module
 │   │   ├── main.py # Deep crawling main program
-│   │   ├── MediaCrawler/ # Media crawler core
-│   │   └── platform_crawler.py # Platform crawler management
-│   └── schema/ # Database schema
+│   │   ├── keyword_manager.py # Keyword manager
+│   │   ├── platform_crawler.py # Platform crawler manager
+│   │   └── MediaCrawler/ # Media crawler core (Weibo/TikTok/Xiaohongshu, etc.)
+│   │       ├── main.py
+│   │       ├── config/ # Platform configurations
+│   │       ├── media_platform/ # Platform crawler implementations
+│   │       └── ...
+│   └── schema/ # Database schema definitions
 │       ├── db_manager.py # Database manager
-│       ├── init_database.py # Database initialization
-│       ├── mindspider_tables.sql # Database table structure
-│       ├── models_bigdata.py # SQLAlchemy models for large media crawling tables
-│       └── models_sa.py # ORM base and topic/task models
+│       ├── init_database.py # Database initialization script
+│       ├── mindspider_tables.sql # Database table structure SQL
+│       ├── models_bigdata.py # SQLAlchemy mappings for large-scale media opinion tables
+│       └── models_sa.py # ORM models for DailyTopic/Task extension tables
 ├── SentimentAnalysisModel/ # Sentiment analysis model collection
 │   ├── WeiboSentiment_Finetuned/ # Fine-tuned BERT/GPT-2 models
+│   │   ├── BertChinese-Lora/ # BERT Chinese LoRA fine-tuning
+│   │   │   ├── train.py
+│   │   │   ├── predict.py
+│   │   │   └── ...
+│   │   └── GPT2-Lora/ # GPT-2 LoRA fine-tuning
+│   │       ├── train.py
+│   │       ├── predict.py
+│   │       └── ...
 │   ├── WeiboMultilingualSentiment/ # Multilingual sentiment analysis (recommended)
+│   │   ├── train.py
+│   │   ├── predict.py
+│   │   └── ...
 │   ├── WeiboSentiment_SmallQwen/ # Small parameter Qwen3 fine-tuning
+│   │   ├── train.py
+│   │   ├── predict_universal.py
+│   │   └── ...
 │   └── WeiboSentiment_MachineLearning/ # Traditional machine learning methods
+│       ├── train.py
+│       ├── predict.py
+│       └── ...
 ├── SingleEngineApp/ # Individual Agent Streamlit applications
-│   ├── query_engine_streamlit_app.py
-│   ├── media_engine_streamlit_app.py
-│   └── insight_engine_streamlit_app.py
-├── query_engine_streamlit_reports/ # QueryEngine Streamlit outputs (Markdown + state)
-├── media_engine_streamlit_reports/ # MediaEngine Streamlit outputs (Markdown + state)
-├── insight_engine_streamlit_reports/ # InsightEngine Streamlit outputs (Markdown + state)
-├── templates/ # Flask templates
-│   └── index.html # Main interface frontend
+│   ├── query_engine_streamlit_app.py # QueryEngine standalone app
+│   ├── media_engine_streamlit_app.py # MediaEngine standalone app
+│   └── insight_engine_streamlit_app.py # InsightEngine standalone app
+├── query_engine_streamlit_reports/ # QueryEngine standalone app outputs
+├── media_engine_streamlit_reports/ # MediaEngine standalone app outputs
+├── insight_engine_streamlit_reports/ # InsightEngine standalone app outputs
+├── templates/ # Flask frontend templates
+│   └── index.html # Main interface HTML
 ├── static/ # Static resources
+│   └── image/ # Image resources
+│       ├── logo_compressed.png
+│       ├── framework.png
+│       └── ...
 ├── logs/ # Runtime log directory
-├── final_reports/ # Final generated HTML report files
+├── final_reports/ # Final generated report files
+│   ├── ir/ # Report IR JSON files
+│   └── *.html # Final HTML reports
 ├── utils/ # Common utility functions
-│   ├── forum_reader.py # Agent forum communication
-│   ├── github_issues.py # Helper to prefill GitHub issue links and errors
-│   └── retry_helper.py # Network request retry mechanism tool
-├── tests/ # Targeted pytest suites
-│   ├── run_tests.py # pytest entry helper
-│   ├── test_monitor.py # ForumEngine monitor tests
-│   └── test_report_engine_sanitization.py # ReportEngine sanitization tests
-├── app.py # Flask main application entry
-├── config.py # Global configuration file
-├── docker-compose.yml # Orchestrates multi-service deployment
-└── requirements.txt # Python dependency list
+│   ├── forum_reader.py # Agent inter-communication forum tool
+│   ├── github_issues.py # Unified GitHub issue link generator and error formatter
+│   └── retry_helper.py # Network request retry mechanism utility
+├── tests/ # Unit tests and integration tests
+│   ├── run_tests.py # pytest entry script
+│   ├── test_monitor.py # ForumEngine monitoring unit tests
+│   ├── test_report_engine_sanitization.py # ReportEngine security tests
+│   └── ...
+├── app.py # Flask main application entry point
+├── config.py # Global configuration file (unified LLM/DB config management)
+├── .env.example # Environment variable example file
+├── docker-compose.yml # Docker multi-service orchestration config
+├── Dockerfile # Docker image build file
+├── requirements.txt # Python dependency list
+├── regenerate_latest_pdf.py # PDF regeneration utility script
+├── README.md # Chinese documentation
+├── README-EN.md # English documentation
+├── CONTRIBUTING.md # Chinese contribution guide
+├── CONTRIBUTING-EN.md # English contribution guide
+└── LICENSE # GPL-2.0 open source license
 ```
 
 ## 🚀 Quick Start (Docker)
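In the tree, `ReportEngine/agent.py` is described as a master orchestrator that runs template selection → layout → budget → chapter → render. The diff does not show how the stages are wired together. The sketch below shows one way to express that ordering: each stage takes a shared state object and returns it. Every name here is hypothetical (`SketchState`, `select_template`, `run_pipeline`, and so on), and the stage bodies are stubs, not the project's LLM-backed nodes.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SketchState:
    """Hypothetical, simplified stand-in for a report task's state."""
    query: str
    template: str = ""
    layout: Dict[str, object] = field(default_factory=dict)
    budget: Dict[str, int] = field(default_factory=dict)
    chapters: List[str] = field(default_factory=list)
    html: str = ""
    trace: List[str] = field(default_factory=list)  # records which stages ran


def select_template(state: SketchState) -> SketchState:
    # Stub: a real node would ask an LLM to choose among candidate templates.
    state.template = "brand_reputation"
    state.trace.append("template")
    return state


def design_layout(state: SketchState) -> SketchState:
    # Stub: title plus a fixed table of contents.
    state.layout = {"title": f"Report: {state.query}", "toc": ["Overview", "Findings"]}
    state.trace.append("layout")
    return state


def plan_budget(state: SketchState, total_words: int = 1000) -> SketchState:
    # Split the total word budget evenly across the TOC entries.
    toc = state.layout["toc"]
    state.budget = {title: total_words // len(toc) for title in toc}
    state.trace.append("budget")
    return state


def generate_chapters(state: SketchState) -> SketchState:
    # Stub: one placeholder chapter per budget entry.
    state.chapters = [f"{title} ({words} words)" for title, words in state.budget.items()]
    state.trace.append("chapter")
    return state


def render(state: SketchState) -> SketchState:
    # Stub: wrap each chapter in a <section> under a single <h1>.
    body = "".join(f"<section>{chapter}</section>" for chapter in state.chapters)
    state.html = f"<h1>{state.layout['title']}</h1>{body}"
    state.trace.append("render")
    return state


# Stage order mirrors the description of ReportEngine/agent.py in the tree.
PIPELINE: List[Callable[[SketchState], SketchState]] = [
    select_template, design_layout, plan_budget, generate_chapters, render,
]


def run_pipeline(query: str) -> SketchState:
    state = SketchState(query=query)
    for stage in PIPELINE:
        state = stage(state)
    return state


result = run_pipeline("brand X")
print(result.trace)   # → ['template', 'layout', 'budget', 'chapter', 'render']
print(result.budget)  # → {'Overview': 500, 'Findings': 500}
```

Keeping the stages in a plain list makes the order explicit. It also lets a single stage be swapped out or tested in isolation.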
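The tree lists `ReportEngine/core/template_parser.py` as a "Markdown template slicer and slug generator", but its code is not part of this diff. The sketch below shows one plausible approach. It assumes templates are split at `##` headings, that any text before the first such heading is discarded, and that duplicate titles receive numeric suffixes. `slice_template` and `slugify` are hypothetical names, not the project's API.

```python
import re
from typing import Dict, List, Tuple

# Matches ATX headings such as "## Overview"; group 1 holds the hashes.
HEADING = re.compile(r"^(#{1,6})\s+(.+?)\s*$")


def slugify(title: str, seen: Dict[str, int]) -> str:
    """Lowercase, collapse non-word runs to '-', and de-duplicate via `seen`.

    In Python 3, str patterns are Unicode-aware, so `\\w` keeps CJK characters.
    """
    base = re.sub(r"[^\w]+", "-", title.strip().lower()).strip("-") or "section"
    count = seen.get(base, 0)
    seen[base] = count + 1
    return base if count == 0 else f"{base}-{count}"


def slice_template(markdown: str, level: int = 2) -> List[Tuple[str, str, str]]:
    """Split Markdown into (slug, title, body) chunks at headings of `level`.

    Text before the first heading of that level is discarded in this sketch.
    """
    sections: List[Tuple[str, str, str]] = []
    seen: Dict[str, int] = {}
    title, buf = None, []
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match and len(match.group(1)) == level:
            if title is not None:  # close the previous section
                sections.append((slugify(title, seen), title, "\n".join(buf).strip()))
            title, buf = match.group(2), []
        elif title is not None:
            buf.append(line)
    if title is not None:  # close the last section
        sections.append((slugify(title, seen), title, "\n".join(buf).strip()))
    return sections


template = """# Brand Report
## Overview
Summary here.
## Risks & Issues
- item
## Overview
Again.
"""
for slug, title, body in slice_template(template):
    print(slug, "|", title, "|", body)
# → overview | Overview | Summary here.
# → risks-issues | Risks & Issues | - item
# → overview-1 | Overview | Again.
```

Stable, de-duplicated slugs matter because they can double as HTML anchors when chapters are later stitched into one document.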
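`utils/retry_helper.py` is described only as a "network request retry mechanism utility". A common pattern for this kind of helper is a decorator that retries on transient errors and applies exponential backoff. The sketch below takes that approach. The decorator name `with_retry`, its parameters, and the choice of `ConnectionError`/`TimeoutError` as retryable exceptions are all assumptions, not the project's actual interface.

```python
import functools
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retry(
    max_attempts: int = 3,
    base_delay: float = 0.5,
    exceptions: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
) -> Callable[[Callable[..., T]], Callable[..., T]]:
    """Hypothetical decorator: retry on `exceptions`, doubling the delay each time."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    def decorator(func: Callable[..., T]) -> Callable[..., T]:
        @functools.wraps(func)
        def wrapper(*args, **kwargs) -> T:
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == max_attempts:
                        raise  # out of attempts: surface the last error
                    # Backoff: base_delay, 2*base_delay, 4*base_delay, ...
                    time.sleep(base_delay * 2 ** (attempt - 1))
            raise AssertionError("unreachable")  # the loop always returns or raises
        return wrapper
    return decorator


calls = {"n": 0}


@with_retry(max_attempts=3, base_delay=0.0)
def flaky() -> str:
    # Fails twice with a transient error, then succeeds.
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


print(flaky(), calls["n"])  # → ok 3
```

Errors outside `exceptions`, such as a `ValueError` caused by bad input, propagate immediately. Retrying those would only delay a failure that is certain to recur.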