More model support, including OpenAI and Claude, with corresponding updates to the README documentation.
Showing 4 changed files with 191 additions and 26 deletions.
README (Chinese version; prose translated to English here):

````diff
@@ -42,6 +42,7 @@
 - [MySQL](https://www.mysql.com/) database
 - [Conda](https://docs.conda.io/en/latest/) (optional, for environment management)
 - A valid Weibo account (for data collection)
+- An OpenAI API key or Anthropic (Claude) API key (for the AI analysis features)
 
 ### Installation Steps
 
@@ -68,7 +69,20 @@
 - Run `createTables.sql` to create the required database tables.
 - Modify the database connection settings in `config.py` to match your MySQL configuration.
 
-4. Start the Flask application:
+4. Configure AI analysis (optional):
+
+   Set the environment variables required by the AI analysis features:
+   ```bash
+   # OpenAI API configuration (required for GPT models)
+   export OPENAI_API_KEY="your-openai-key"
+
+   # Anthropic API configuration (required for Claude models)
+   export ANTHROPIC_API_KEY="your-anthropic-key"
+   ```
+
+   Note: at least one API key must be configured to use the AI analysis features.
+
+5. Start the Flask application:
 
 ```bash
 python app.py
@@ -90,6 +104,8 @@
 - **[Matplotlib](https://matplotlib.org/)** - Data visualization library.
 - **[Scikit-learn](https://scikit-learn.org/)** - Machine learning library for model training and evaluation.
 - **[TensorFlow](https://www.tensorflow.org/)** or **[PyTorch](https://pytorch.org/)** - Deep learning frameworks for advanced model development.
+- **[OpenAI GPT](https://openai.com/)** - Advanced language models for text analysis.
+- **[Anthropic Claude](https://www.anthropic.com/)** - AI models for complex text analysis.
 
 ## 🤝 Contribution
 
````
README (English version):

````diff
@@ -40,6 +40,7 @@ Follow the steps below to run the project on your system.
 - [MySQL](https://www.mysql.com/) Database
 - [Conda](https://docs.conda.io/en/latest/) (optional, for environment management)
 - A valid Weibo account (for data collection)
+- An OpenAI API key or Anthropic (Claude) API key for AI analysis features
 
 ### Installation Steps
 
@@ -66,13 +67,26 @@ Follow the steps below to run the project on your system.
 - Run `createTables.sql` to create the necessary database tables.
 - Modify the database connection settings in `config.py` to match your MySQL configuration.
 
-5. Start the Flask application:
+5. Configure AI Analysis (Optional):
+
+   Set up environment variables for AI analysis features:
+   ```bash
+   # For OpenAI API (Required for GPT models)
+   export OPENAI_API_KEY="your-openai-key"
+
+   # For Anthropic API (Required for Claude models)
+   export ANTHROPIC_API_KEY="your-anthropic-key"
+   ```
+
+   Note: At least one API key must be configured to use AI analysis features.
+
+6. Start the Flask application:
 
 ```bash
 python app.py
 ```
 
-6. Access the application: Open your browser and navigate to http://localhost:5000 to use the system.
+7. Access the application: Open your browser and navigate to http://localhost:5000 to use the system.
 
 ## 🛠️ Technology Stack
 
@@ -88,6 +102,8 @@ The Weibo Public Opinion Analysis and Prediction System employs a range of modern technologies.
 - **[Matplotlib](https://matplotlib.org/)** - A data visualization library.
 - **[Scikit-learn](https://scikit-learn.org/)** - A machine learning library used for model training and evaluation.
 - **[TensorFlow](https://www.tensorflow.org/)** or **[PyTorch](https://pytorch.org/)** - Deep learning frameworks used for advanced model development.
+- **[OpenAI GPT](https://openai.com/)** - Advanced language models for text analysis.
+- **[Anthropic Claude](https://www.anthropic.com/)** - AI models for sophisticated text analysis.
 
 ## 🤝 Contribution
 
````
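Since `AIAnalyzer.__init__` (in the analyzer diff below) raises a `ValueError` when neither key is present, a quick preflight check before launching the app can surface a misconfiguration early. A minimal sketch, not part of the commit; the `available_providers` helper name is hypothetical:

```python
import os

def available_providers() -> list:
    """Return the AI providers usable with the currently set environment variables."""
    providers = []
    if os.getenv("OPENAI_API_KEY"):
        providers.append("openai")
    if os.getenv("ANTHROPIC_API_KEY"):
        providers.append("anthropic")
    return providers

if __name__ == "__main__":
    providers = available_providers()
    if not providers:
        # Mirrors the condition under which AIAnalyzer.__init__ raises ValueError
        print("Set OPENAI_API_KEY or ANTHROPIC_API_KEY before starting the app")
    else:
        print("AI analysis enabled for: " + ", ".join(providers))
```

This mirrors the "at least one API key" rule stated in both READMEs without importing the analyzer itself.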
AI analyzer module (Python; comments, docstrings, and user-facing strings translated to English):

````diff
@@ -1,4 +1,5 @@
 import openai
+import anthropic
 import json
 from typing import List, Dict
 import os
@@ -8,11 +9,34 @@ from utils.logger import app_logger as logging
 class AIAnalyzer:
     def __init__(self):
         # Read the API keys from environment variables
-        self.api_key = os.getenv('OPENAI_API_KEY')
-        if not self.api_key:
-            raise ValueError("Please set the OPENAI_API_KEY environment variable")
+        self.openai_key = os.getenv('OPENAI_API_KEY')
+        self.claude_key = os.getenv('ANTHROPIC_API_KEY')
 
-        openai.api_key = self.api_key
+        if not self.openai_key and not self.claude_key:
+            raise ValueError("Please set at least one API key (OPENAI_API_KEY or ANTHROPIC_API_KEY)")
+
+        if self.openai_key:
+            openai.api_key = self.openai_key
+        if self.claude_key:
+            self.claude_client = anthropic.Anthropic(api_key=self.claude_key)
+
+        # Supported models
+        self.supported_models = {
+            # OpenAI models
+            'gpt-3.5-turbo': {'provider': 'openai', 'max_tokens': 2000, 'cost_per_1k': 0.0015},
+            'gpt-3.5-turbo-16k': {'provider': 'openai', 'max_tokens': 16000, 'cost_per_1k': 0.003},
+            'gpt-4': {'provider': 'openai', 'max_tokens': 8000, 'cost_per_1k': 0.03},
+            'gpt-4-32k': {'provider': 'openai', 'max_tokens': 32000, 'cost_per_1k': 0.06},
+            'gpt-4-turbo-preview': {'provider': 'openai', 'max_tokens': 128000, 'cost_per_1k': 0.01},
+
+            # Claude models
+            'claude-3-opus-20240229': {'provider': 'anthropic', 'max_tokens': 4000, 'cost_per_1k': 0.015},
+            'claude-3-sonnet-20240229': {'provider': 'anthropic', 'max_tokens': 3000, 'cost_per_1k': 0.003},
+            'claude-3-haiku-20240307': {'provider': 'anthropic', 'max_tokens': 2000, 'cost_per_1k': 0.0025},
+            'claude-2.1': {'provider': 'anthropic', 'max_tokens': 100000, 'cost_per_1k': 0.008},
+            'claude-2.0': {'provider': 'anthropic', 'max_tokens': 100000, 'cost_per_1k': 0.008},
+            'claude-instant-1.2': {'provider': 'anthropic', 'max_tokens': 100000, 'cost_per_1k': 0.0015}
+        }
 
         # Prompt templates for the different analysis depths
         self.prompt_templates = {
@@ -73,46 +97,142 @@ class AIAnalyzer:
                          analysis_depth: str = "standard") -> List[Dict]:
         """Analyze a batch of messages and return the analysis results"""
         try:
+            if model_type not in self.supported_models:
+                raise ValueError(f"Unsupported model type: {model_type}")
+
+            model_info = self.supported_models[model_type]
+            provider = model_info['provider']
+            max_tokens = model_info['max_tokens']
+
+            # Adjust the batch size to the selected model
+            adjusted_batch_size = min(batch_size, self._get_optimal_batch_size(model_type))
+            if adjusted_batch_size != batch_size:
+                logging.info(f"Batch size adjusted from {batch_size} to {adjusted_batch_size}")
+
             all_results = []
+            total_cost = 0
 
             # Process the messages in batches
-            for i in range(0, len(messages), batch_size):
-                batch = messages[i:i + batch_size]
+            for i in range(0, len(messages), adjusted_batch_size):
+                batch = messages[i:i + adjusted_batch_size]
                 formatted_messages = []
                 for msg in batch:
                     formatted_messages.append(f"Message ID: {msg['id']}\nContent: {msg['content']}")
 
                 messages_text = "\n---\n".join(formatted_messages)
-
-                # Get the prompt for the requested analysis depth
                 system_prompt = self.prompt_templates.get(analysis_depth, self.prompt_templates['standard'])
 
-                # Call the OpenAI API
+                if provider == 'openai':
+                    result = await self._analyze_with_openai(
+                        messages_text,
+                        system_prompt,
+                        model_type,
+                        max_tokens
+                    )
+                else:  # anthropic
+                    result = await self._analyze_with_claude(
+                        messages_text,
+                        system_prompt,
+                        model_type,
+                        max_tokens
+                    )
+
+                if result:
+                    all_results.extend(result)
+                    # Cost for this batch
+                    batch_cost = self._calculate_cost(len(messages_text), model_type)
+                    total_cost += batch_cost
+                    logging.info(f"Batch processed, cost: ${batch_cost:.4f}")
+
+            logging.info(f"Analysis finished, total cost: ${total_cost:.4f}")
+            return all_results
+
+        except Exception as e:
+            logging.error(f"AI analysis failed: {e}")
+            return []
+
+    def _get_optimal_batch_size(self, model_type: str) -> int:
+        """Pick an optimal batch size for the given model"""
+        model_info = self.supported_models[model_type]
+        max_tokens = model_info['max_tokens']
+
+        # Assume an average of 200 tokens per message
+        avg_tokens_per_message = 200
+
+        # Reserve 20% of the tokens for the system prompt and the response
+        available_tokens = int(max_tokens * 0.8)
+
+        # Compute the optimal batch size
+        optimal_batch_size = max(1, min(100, available_tokens // avg_tokens_per_message))
+
+        return optimal_batch_size
+
+    def _calculate_cost(self, input_length: int, model_type: str) -> float:
+        """Estimate the cost of an API call"""
+        model_info = self.supported_models[model_type]
+        cost_per_1k = model_info['cost_per_1k']
+
+        # Estimate the token count (roughly 4 characters per token)
+        estimated_tokens = input_length // 4
+
+        # Cost in USD
+        cost = (estimated_tokens / 1000) * cost_per_1k
+
+        return cost
+
+    async def _analyze_with_openai(self, messages_text: str, system_prompt: str,
+                                   model: str, max_tokens: int) -> List[Dict]:
+        """Run the analysis through the OpenAI API"""
+        try:
             response = await openai.ChatCompletion.acreate(
-                model=model_type,
+                model=model,
                 messages=[
                     {"role": "system", "content": system_prompt},
                     {"role": "user", "content": f"Please analyze the following messages:\n{messages_text}"}
                 ],
-                temperature=0.3,  # reduce randomness
-                max_tokens=2000 if analysis_depth != 'deep' else 3000,
-                n=1
+                temperature=0.3,
+                max_tokens=max_tokens,
+                n=1,
+                response_format={"type": "json_object"}  # force a JSON response
             )
 
-            try:
             result = json.loads(response.choices[0].message.content)
             if isinstance(result, dict) and 'analysis_results' in result:
-                all_results.extend(result['analysis_results'])
+                return result['analysis_results']
             else:
-                logging.error(f"Malformed API response: {response.choices[0].message.content}")
-            except json.JSONDecodeError as e:
-                logging.error(f"JSON parsing failed: {e}")
-                continue
+                logging.error(f"Malformed OpenAI API response: {response.choices[0].message.content}")
+                return []
 
-            return all_results
+        except Exception as e:
+            logging.error(f"OpenAI API call failed: {e}")
+            return []
+
+    async def _analyze_with_claude(self, messages_text: str, system_prompt: str,
+                                   model: str, max_tokens: int) -> List[Dict]:
+        """Run the analysis through the Claude API"""
+        try:
+            response = await self.claude_client.messages.create(
+                model=model,
+                max_tokens=max_tokens,
+                temperature=0.3,
+                system=system_prompt,
+                messages=[
+                    {
+                        "role": "user",
+                        "content": f"Please analyze the following messages:\n{messages_text}"
+                    }
+                ]
+            )
+
+            result = json.loads(response.content[0].text)
+            if isinstance(result, dict) and 'analysis_results' in result:
+                return result['analysis_results']
+            else:
+                logging.error(f"Malformed Claude API response: {response.content[0].text}")
+                return []
 
         except Exception as e:
-            logging.error(f"AI analysis failed: {e}")
+            logging.error(f"Claude API call failed: {e}")
             return []
 
     def format_analysis_for_display(self, analysis: Dict) -> Dict:
````
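The batch-size and cost heuristics introduced in this commit are plain arithmetic, so they are easy to check in isolation. A self-contained sketch of the same logic; the function names are mine, and the 200-tokens-per-message and 4-characters-per-token figures are the assumptions hard-coded in the diff above. (Note also that `_analyze_with_claude` awaits a call on a synchronous `anthropic.Anthropic` client; the SDK's `anthropic.AsyncAnthropic` would be needed for that `await` to work.)

```python
def optimal_batch_size(max_tokens: int,
                       avg_tokens_per_message: int = 200,
                       reserve_ratio: float = 0.2) -> int:
    """Largest batch that fits the model's token budget, capped at 100,
    reserving a share of tokens for the system prompt and the response."""
    available = int(max_tokens * (1 - reserve_ratio))
    return max(1, min(100, available // avg_tokens_per_message))

def estimated_cost(input_chars: int, cost_per_1k: float) -> float:
    """Rough cost in USD, assuming ~4 characters per token."""
    estimated_tokens = input_chars // 4
    return (estimated_tokens / 1000) * cost_per_1k

# gpt-3.5-turbo per the table above: max_tokens=2000, cost_per_1k=0.0015
print(optimal_batch_size(2000))      # 8 messages per batch (1600 // 200)
print(estimated_cost(4000, 0.0015))  # 0.0015 (4000 chars ≈ 1000 tokens)
```

For the long-context Claude 2.x entries (`max_tokens=100000`) the same formula would yield 400, which is why the `min(100, ...)` cap matters.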
Frontend template (model selector; group labels translated to English):

````diff
@@ -467,8 +467,21 @@
             </div>
             <div class="form-group mx-2 mb-0">
                 <select id="modelType" class="form-control form-control-sm">
-                    <option value="gpt-3.5-turbo" selected>GPT-3.5</option>
-                    <option value="gpt-4">GPT-4</option>
+                    <optgroup label="OpenAI models">
+                        <option value="gpt-3.5-turbo">GPT-3.5-Turbo ($0.0015/1K tokens)</option>
+                        <option value="gpt-3.5-turbo-16k">GPT-3.5-Turbo-16K ($0.003/1K tokens)</option>
+                        <option value="gpt-4">GPT-4 ($0.03/1K tokens)</option>
+                        <option value="gpt-4-32k">GPT-4-32K ($0.06/1K tokens)</option>
+                        <option value="gpt-4-turbo-preview">GPT-4-Turbo ($0.01/1K tokens)</option>
+                    </optgroup>
+                    <optgroup label="Claude models">
+                        <option value="claude-3-opus-20240229">Claude-3 Opus ($0.015/1K tokens)</option>
+                        <option value="claude-3-sonnet-20240229">Claude-3 Sonnet ($0.003/1K tokens)</option>
+                        <option value="claude-3-haiku-20240307">Claude-3 Haiku ($0.0025/1K tokens)</option>
+                        <option value="claude-2.1">Claude-2.1 ($0.008/1K tokens)</option>
+                        <option value="claude-2.0">Claude-2.0 ($0.008/1K tokens)</option>
+                        <option value="claude-instant-1.2">Claude Instant ($0.0015/1K tokens)</option>
+                    </optgroup>
                 </select>
             </div>
             <div class="form-group mx-2 mb-0">
````
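Each `<option value="...">` above must resolve to a provider on the backend before the request is dispatched to OpenAI or Anthropic. A sketch of that lookup, mirroring the `supported_models` table from the analyzer diff; `resolve_provider` is a hypothetical helper, not code from this commit:

```python
# Model -> provider map, with keys matching the <select> option values above
SUPPORTED_MODELS = {
    "gpt-3.5-turbo": "openai",
    "gpt-3.5-turbo-16k": "openai",
    "gpt-4": "openai",
    "gpt-4-32k": "openai",
    "gpt-4-turbo-preview": "openai",
    "claude-3-opus-20240229": "anthropic",
    "claude-3-sonnet-20240229": "anthropic",
    "claude-3-haiku-20240307": "anthropic",
    "claude-2.1": "anthropic",
    "claude-2.0": "anthropic",
    "claude-instant-1.2": "anthropic",
}

def resolve_provider(model_type: str) -> str:
    """Map the modelType value submitted by the form to an API provider."""
    try:
        return SUPPORTED_MODELS[model_type]
    except KeyError:
        # Same failure mode as analyze_messages for unknown model types
        raise ValueError(f"Unsupported model type: {model_type}") from None

print(resolve_provider("claude-3-opus-20240229"))  # anthropic
```

Validating the submitted value against a server-side table like this also guards against tampered form values reaching the paid APIs.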