戒酒的李白

More model support, including OpenAI and Claude, with corresponding updates to the README documentation.
@@ -42,6 +42,7 @@
 - [MySQL](https://www.mysql.com/) database
 - [Conda](https://docs.conda.io/en/latest/) (optional, for environment management)
 - A valid Weibo account (for data collection)
+- An OpenAI API key or an Anthropic (Claude) API key (for the AI analysis features)
 
 ### Installation Steps
 
@@ -68,7 +69,20 @@
 - Run `createTables.sql` to create the required database tables.
 - Update the database connection settings in `config.py` to match your MySQL setup.
 
-4. Start the Flask application:
+4. Configure the AI analysis features (optional):
+
+   Set the environment variables required by the AI analysis features:
+   ```bash
+   # OpenAI API configuration (required for GPT models)
+   export OPENAI_API_KEY="your-openai-key"
+
+   # Anthropic API configuration (required for Claude models)
+   export ANTHROPIC_API_KEY="your-anthropic-key"
+   ```
+
+   Note: at least one API key must be configured to use the AI analysis features.
+
+5. Start the Flask application:
 
 ```bash
 python app.py
@@ -90,6 +104,8 @@
 - **[Matplotlib](https://matplotlib.org/)** - Data visualization library.
 - **[Scikit-learn](https://scikit-learn.org/)** - Machine learning library for model training and evaluation.
 - **[TensorFlow](https://www.tensorflow.org/)** / **[PyTorch](https://pytorch.org/)** - Deep learning frameworks for advanced model development.
+- **[OpenAI GPT](https://openai.com/)** - Advanced language models for text analysis.
+- **[Anthropic Claude](https://www.anthropic.com/)** - AI models for sophisticated text analysis.
 
 ## 🤝 Contribution
 
@@ -40,6 +40,7 @@ Follow the steps below to run the project on your system.
 - [MySQL](https://www.mysql.com/) Database
 - [Conda](https://docs.conda.io/en/latest/) (optional, for environment management)
 - A valid Weibo account (for data collection)
+- An OpenAI API key or an Anthropic (Claude) API key for the AI analysis features
 
 ### Installation Steps
 
@@ -66,13 +67,26 @@ Follow the steps below to run the project on your system.
 - Run `createTables.sql` to create the necessary database tables.
 - Modify the database connection settings in `config.py` to match your MySQL configuration.
 
-5. Start the Flask application:
+5. Configure AI Analysis (optional):
+
+   Set up the environment variables for the AI analysis features:
+   ```bash
+   # For the OpenAI API (required for GPT models)
+   export OPENAI_API_KEY="your-openai-key"
+
+   # For the Anthropic API (required for Claude models)
+   export ANTHROPIC_API_KEY="your-anthropic-key"
+   ```
+
+   Note: at least one API key must be configured to use the AI analysis features.
+
+6. Start the Flask application:
 
 ```bash
 python app.py
 ```
 
-6. Access the application: Open your browser and navigate to http://localhost:5000 to use the system.
+7. Access the application: open your browser and navigate to http://localhost:5000 to use the system.
 
 ## 🛠️ Technology Stack
 
@@ -88,6 +102,8 @@ The Weibo Public Opinion Analysis and Prediction System employs a range of modern
 - **[Matplotlib](https://matplotlib.org/)** - A data visualization library.
 - **[Scikit-learn](https://scikit-learn.org/)** - A machine learning library used for model training and evaluation.
 - **[TensorFlow](https://www.tensorflow.org/)** / **[PyTorch](https://pytorch.org/)** - Deep learning frameworks used for advanced model development.
+- **[OpenAI GPT](https://openai.com/)** - Advanced language models for text analysis.
+- **[Anthropic Claude](https://www.anthropic.com/)** - AI models for sophisticated text analysis.
 
 ## 🤝 Contribution
 
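The "at least one API key" rule from the installation steps can be sketched as a small preflight check. This is a hypothetical helper for illustration, not part of the project; `configured_providers` and its return values are assumptions:

```python
import os

def configured_providers(env) -> list:
    """Return the AI providers usable with the given environment mapping."""
    providers = []
    if env.get("OPENAI_API_KEY"):
        providers.append("openai")
    if env.get("ANTHROPIC_API_KEY"):
        providers.append("anthropic")
    if not providers:
        # Mirrors the error the backend raises when no key is configured
        raise ValueError("Set at least one of OPENAI_API_KEY or ANTHROPIC_API_KEY")
    return providers

# Example with only the Anthropic key set; in practice pass os.environ
print(configured_providers({"ANTHROPIC_API_KEY": "sk-ant-..."}))  # ['anthropic']
```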
@@ -1,4 +1,5 @@
 import openai
+import anthropic
 import json
 from typing import List, Dict
 import os
@@ -8,11 +9,34 @@ from utils.logger import app_logger as logging
 class AIAnalyzer:
     def __init__(self):
         # Read the API keys from environment variables
-        self.api_key = os.getenv('OPENAI_API_KEY')
-        if not self.api_key:
-            raise ValueError("Please set the OPENAI_API_KEY environment variable")
+        self.openai_key = os.getenv('OPENAI_API_KEY')
+        self.claude_key = os.getenv('ANTHROPIC_API_KEY')
 
-        openai.api_key = self.api_key
+        if not self.openai_key and not self.claude_key:
+            raise ValueError("Set at least one API key (OPENAI_API_KEY or ANTHROPIC_API_KEY)")
+
+        if self.openai_key:
+            openai.api_key = self.openai_key
+        if self.claude_key:
+            # AsyncAnthropic, because messages.create is awaited below
+            self.claude_client = anthropic.AsyncAnthropic(api_key=self.claude_key)
+
+        # Supported models
+        self.supported_models = {
+            # OpenAI models
+            'gpt-3.5-turbo': {'provider': 'openai', 'max_tokens': 2000, 'cost_per_1k': 0.0015},
+            'gpt-3.5-turbo-16k': {'provider': 'openai', 'max_tokens': 16000, 'cost_per_1k': 0.003},
+            'gpt-4': {'provider': 'openai', 'max_tokens': 8000, 'cost_per_1k': 0.03},
+            'gpt-4-32k': {'provider': 'openai', 'max_tokens': 32000, 'cost_per_1k': 0.06},
+            'gpt-4-turbo-preview': {'provider': 'openai', 'max_tokens': 128000, 'cost_per_1k': 0.01},
+
+            # Claude models
+            'claude-3-opus-20240229': {'provider': 'anthropic', 'max_tokens': 4000, 'cost_per_1k': 0.015},
+            'claude-3-sonnet-20240229': {'provider': 'anthropic', 'max_tokens': 3000, 'cost_per_1k': 0.003},
+            'claude-3-haiku-20240307': {'provider': 'anthropic', 'max_tokens': 2000, 'cost_per_1k': 0.0025},
+            'claude-2.1': {'provider': 'anthropic', 'max_tokens': 100000, 'cost_per_1k': 0.008},
+            'claude-2.0': {'provider': 'anthropic', 'max_tokens': 100000, 'cost_per_1k': 0.008},
+            'claude-instant-1.2': {'provider': 'anthropic', 'max_tokens': 100000, 'cost_per_1k': 0.0015}
+        }
 
         # Prompt templates for each analysis depth
         self.prompt_templates = {
@@ -73,46 +97,142 @@ class AIAnalyzer:
                              analysis_depth: str = "standard") -> List[Dict]:
         """Analyze a batch of messages and return the analysis results."""
         try:
+            if model_type not in self.supported_models:
+                raise ValueError(f"Unsupported model type: {model_type}")
+
+            model_info = self.supported_models[model_type]
+            provider = model_info['provider']
+            max_tokens = model_info['max_tokens']
+
+            # Adjust the batch size to fit the selected model
+            adjusted_batch_size = min(batch_size, self._get_optimal_batch_size(model_type))
+            if adjusted_batch_size != batch_size:
+                logging.info(f"Batch size adjusted from {batch_size} to {adjusted_batch_size}")
+
             all_results = []
+            total_cost = 0
 
             # Process the messages in batches
-            for i in range(0, len(messages), batch_size):
-                batch = messages[i:i + batch_size]
+            for i in range(0, len(messages), adjusted_batch_size):
+                batch = messages[i:i + adjusted_batch_size]
                 formatted_messages = []
                 for msg in batch:
                     formatted_messages.append(f"Message ID: {msg['id']}\nContent: {msg['content']}")
 
                 messages_text = "\n---\n".join(formatted_messages)
-
-                # Pick the prompt for the requested analysis depth
                 system_prompt = self.prompt_templates.get(analysis_depth, self.prompt_templates['standard'])
 
-                # Call the OpenAI API
+                if provider == 'openai':
+                    result = await self._analyze_with_openai(
+                        messages_text,
+                        system_prompt,
+                        model_type,
+                        max_tokens
+                    )
+                else:  # anthropic
+                    result = await self._analyze_with_claude(
+                        messages_text,
+                        system_prompt,
+                        model_type,
+                        max_tokens
+                    )
+
+                if result:
+                    all_results.extend(result)
+                    # Accumulate the cost of this batch
+                    batch_cost = self._calculate_cost(len(messages_text), model_type)
+                    total_cost += batch_cost
+                    logging.info(f"Batch processed, cost: ${batch_cost:.4f}")
+
+            logging.info(f"Analysis finished, total cost: ${total_cost:.4f}")
+            return all_results
+
+        except Exception as e:
+            logging.error(f"AI analysis failed: {e}")
+            return []
+
+    def _get_optimal_batch_size(self, model_type: str) -> int:
+        """Return the optimal batch size for the given model."""
+        model_info = self.supported_models[model_type]
+        max_tokens = model_info['max_tokens']
+
+        # Assume an average of 200 tokens per message
+        avg_tokens_per_message = 200
+
+        # Reserve 20% of the window for the system prompt and the response
+        available_tokens = int(max_tokens * 0.8)
+
+        # Derive the batch size, capped between 1 and 100
+        optimal_batch_size = max(1, min(100, available_tokens // avg_tokens_per_message))
+
+        return optimal_batch_size
+
+    def _calculate_cost(self, input_length: int, model_type: str) -> float:
+        """Estimate the cost of an API call in USD."""
+        model_info = self.supported_models[model_type]
+        cost_per_1k = model_info['cost_per_1k']
+
+        # Rough token estimate: about 4 characters per token
+        estimated_tokens = input_length // 4
+
+        # Cost in US dollars
+        cost = (estimated_tokens / 1000) * cost_per_1k
+
+        return cost
+
+    async def _analyze_with_openai(self, messages_text: str, system_prompt: str,
+                                   model: str, max_tokens: int) -> List[Dict]:
+        """Run the analysis through the OpenAI API."""
+        try:
             response = await openai.ChatCompletion.acreate(
-                model=model_type,
+                model=model,
                 messages=[
                     {"role": "system", "content": system_prompt},
                     {"role": "user", "content": f"Please analyze the following messages:\n{messages_text}"}
                 ],
-                temperature=0.3,  # reduce randomness
-                max_tokens=2000 if analysis_depth != 'deep' else 3000,
-                n=1
+                temperature=0.3,
+                max_tokens=max_tokens,
+                n=1,
+                response_format={"type": "json_object"}  # force a JSON response
             )
 
-            try:
             result = json.loads(response.choices[0].message.content)
             if isinstance(result, dict) and 'analysis_results' in result:
-                all_results.extend(result['analysis_results'])
+                return result['analysis_results']
             else:
-                logging.error(f"Unexpected API response format: {response.choices[0].message.content}")
-            except json.JSONDecodeError as e:
-                logging.error(f"JSON parsing failed: {e}")
-                continue
+                logging.error(f"Unexpected OpenAI API response format: {response.choices[0].message.content}")
+                return []
 
-            return all_results
+        except Exception as e:
+            logging.error(f"OpenAI API call failed: {e}")
+            return []
+
+    async def _analyze_with_claude(self, messages_text: str, system_prompt: str,
+                                   model: str, max_tokens: int) -> List[Dict]:
+        """Run the analysis through the Claude API."""
+        try:
+            response = await self.claude_client.messages.create(
+                model=model,
+                max_tokens=max_tokens,
+                temperature=0.3,
+                system=system_prompt,
+                messages=[
+                    {
+                        "role": "user",
+                        "content": f"Please analyze the following messages:\n{messages_text}"
+                    }
+                ]
+            )
+
+            result = json.loads(response.content[0].text)
+            if isinstance(result, dict) and 'analysis_results' in result:
+                return result['analysis_results']
+            else:
+                logging.error(f"Unexpected Claude API response format: {response.content[0].text}")
+                return []
 
         except Exception as e:
-            logging.error(f"AI analysis failed: {e}")
+            logging.error(f"Claude API call failed: {e}")
             return []
 
     def format_analysis_for_display(self, analysis: Dict) -> Dict:
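The batch-size and cost heuristics in this diff can be checked in isolation. A minimal sketch, reusing the `gpt-4` entry from `supported_models` and the same assumptions the code makes (200 tokens per message, roughly 4 characters per token):

```python
# gpt-4 entry copied from supported_models for illustration
MODEL = {'max_tokens': 8000, 'cost_per_1k': 0.03}

def optimal_batch_size(max_tokens: int, avg_tokens_per_message: int = 200) -> int:
    # Reserve 20% of the context window for the system prompt and the response
    available = int(max_tokens * 0.8)
    return max(1, min(100, available // avg_tokens_per_message))

def estimate_cost(input_chars: int, cost_per_1k: float) -> float:
    # Roughly 4 characters per token
    tokens = input_chars // 4
    return (tokens / 1000) * cost_per_1k

print(optimal_batch_size(MODEL['max_tokens']))                # 32 messages per batch
print(round(estimate_cost(8000, MODEL['cost_per_1k']), 4))    # 0.06 (8000 chars ≈ 2000 tokens)
```

Note that the estimate only covers input characters; response tokens are billed too, so real costs will run somewhat higher than `_calculate_cost` reports.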
@@ -467,8 +467,21 @@
         </div>
         <div class="form-group mx-2 mb-0">
             <select id="modelType" class="form-control form-control-sm">
-                <option value="gpt-3.5-turbo" selected>GPT-3.5</option>
-                <option value="gpt-4">GPT-4</option>
+                <optgroup label="OpenAI models">
+                    <option value="gpt-3.5-turbo">GPT-3.5-Turbo ($0.0015/1K tokens)</option>
+                    <option value="gpt-3.5-turbo-16k">GPT-3.5-Turbo-16K ($0.003/1K tokens)</option>
+                    <option value="gpt-4">GPT-4 ($0.03/1K tokens)</option>
+                    <option value="gpt-4-32k">GPT-4-32K ($0.06/1K tokens)</option>
+                    <option value="gpt-4-turbo-preview">GPT-4-Turbo ($0.01/1K tokens)</option>
+                </optgroup>
+                <optgroup label="Claude models">
+                    <option value="claude-3-opus-20240229">Claude-3 Opus ($0.015/1K tokens)</option>
+                    <option value="claude-3-sonnet-20240229">Claude-3 Sonnet ($0.003/1K tokens)</option>
+                    <option value="claude-3-haiku-20240307">Claude-3 Haiku ($0.0025/1K tokens)</option>
+                    <option value="claude-2.1">Claude-2.1 ($0.008/1K tokens)</option>
+                    <option value="claude-2.0">Claude-2.0 ($0.008/1K tokens)</option>
+                    <option value="claude-instant-1.2">Claude Instant ($0.0015/1K tokens)</option>
+                </optgroup>
             </select>
         </div>
         <div class="form-group mx-2 mb-0">
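Because the backend rejects unknown model types, the dropdown values above must stay in sync with the keys of `supported_models`. A small consistency sketch (both lists copied here by hand for illustration; in a real test they would be read from the template and the class):

```python
# Option values from the template's <select id="modelType">
TEMPLATE_OPTIONS = [
    "gpt-3.5-turbo", "gpt-3.5-turbo-16k", "gpt-4", "gpt-4-32k",
    "gpt-4-turbo-preview",
    "claude-3-opus-20240229", "claude-3-sonnet-20240229",
    "claude-3-haiku-20240307", "claude-2.1", "claude-2.0",
    "claude-instant-1.2",
]

# Keys of AIAnalyzer.supported_models
SUPPORTED_MODELS = {
    "gpt-3.5-turbo", "gpt-3.5-turbo-16k", "gpt-4", "gpt-4-32k",
    "gpt-4-turbo-preview", "claude-3-opus-20240229",
    "claude-3-sonnet-20240229", "claude-3-haiku-20240307",
    "claude-2.1", "claude-2.0", "claude-instant-1.2",
}

# Any option missing from the backend would trigger "Unsupported model type"
missing = [m for m in TEMPLATE_OPTIONS if m not in SUPPORTED_MODELS]
print("all options supported" if not missing else f"missing: {missing}")
```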