万朱浩 / Venue-Ops
Authored by 戒酒的李白, 2025-02-11 22:41:09 +0800
Commit 12a8732bd284e3e3cba78724fe045500edf31e01 (12a8732), 1 parent: a60a0f32
More model support, including OpenAI and Claude, with corresponding updates to the README documentation.
Showing 4 changed files with 191 additions and 26 deletions:
- README-CN.md
- README.md
- utils/ai_analyzer.py
- views/page/templates/yuqingpredict.html
README-CN.md (view file @ 12a8732)
...
@@ -42,6 +42,7 @@
 - [MySQL](https://www.mysql.com/) database
 - [Conda](https://docs.conda.io/en/latest/) (optional, for environment management)
 - A valid Weibo account (for data collection)
+- An OpenAI API key or an Anthropic (Claude) API key (for the AI analysis features)

 ### Installation Steps
...
@@ -68,7 +69,20 @@
    - Run `createTables.sql` to create the required database tables.
    - Edit the database connection settings in `config.py` to make sure they match your MySQL setup.
-4. Start the Flask application:
+4. Configure the AI analysis features (optional):
+
+   Set the environment variables required by the AI analysis features:
+
+   ```bash
+   # OpenAI API configuration (required for GPT models)
+   export OPENAI_API_KEY="your-openai-key"
+
+   # Anthropic API configuration (required for Claude models)
+   export ANTHROPIC_API_KEY="your-anthropic-key"
+   ```
+
+   Note: at least one API key must be configured to use the AI analysis features.
+
+5. Start the Flask application:
    ```bash
    python app.py
    ```
...
@@ -90,6 +104,8 @@
 - **[Matplotlib](https://matplotlib.org/)** - a data visualization library.
 - **[Scikit-learn](https://scikit-learn.org/)** - a machine learning library for model training and evaluation.
 - **[TensorFlow](https://www.tensorflow.org/)** or **[PyTorch](https://pytorch.org/)** - deep learning frameworks for advanced model development.
+- **[OpenAI GPT](https://openai.com/)** - advanced language models for text analysis.
+- **[Anthropic Claude](https://www.anthropic.com/)** - capable AI models for complex text analysis.

 ## 🤝 Contribution
...
README.md (view file @ 12a8732)
...
@@ -40,6 +40,7 @@ Follow the steps below to run the project on your system.
 - [MySQL](https://www.mysql.com/) Database
 - [Conda](https://docs.conda.io/en/latest/) (optional, for environment management)
 - A valid Weibo account (for data collection)
+- An OpenAI API key or an Anthropic (Claude) API key for the AI analysis features

 ### Installation Steps
...
@@ -66,13 +67,26 @@ Follow the steps below to run the project on your system.
    - Run `createTables.sql` to create the necessary database tables.
    - Modify the database connection settings in `config.py` to match your MySQL configuration.
-5. Start the Flask application:
+5. Configure AI analysis (optional):
+
+   Set up environment variables for the AI analysis features:
+
+   ```bash
+   # For the OpenAI API (required for GPT models)
+   export OPENAI_API_KEY="your-openai-key"
+
+   # For the Anthropic API (required for Claude models)
+   export ANTHROPIC_API_KEY="your-anthropic-key"
+   ```
+
+   Note: at least one API key must be configured to use the AI analysis features.
+
+6. Start the Flask application:
    ```bash
    python app.py
    ```
-6. Access the application: Open your browser and navigate to http://localhost:5000 to use the system.
+7. Access the application: Open your browser and navigate to http://localhost:5000 to use the system.
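For readers following the setup steps above, a quick way to check which providers are configured is a short snippet like the one below. The environment variable names come from the README; the helper function itself is illustrative and not part of the project.

```python
import os

def configured_providers() -> list:
    # Which of the two provider keys (named as in the README) are present?
    return [name for name in ('OPENAI_API_KEY', 'ANTHROPIC_API_KEY')
            if os.getenv(name)]

if __name__ == '__main__':
    keys = configured_providers()
    if not keys:
        print("No AI API key configured; AI analysis features will be unavailable.")
    else:
        print("Configured: " + ", ".join(keys))
```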
## 🛠️ Technology Stack
...
...
@@ -88,6 +102,8 @@ The Weibo Public Opinion Analysis and Prediction System employs a range of modern
 - **[Matplotlib](https://matplotlib.org/)** - A data visualization library.
 - **[Scikit-learn](https://scikit-learn.org/)** - A machine learning library used for model training and evaluation.
 - **[TensorFlow](https://www.tensorflow.org/)** or **[PyTorch](https://pytorch.org/)** - Deep learning frameworks used for advanced model development.
+- **[OpenAI GPT](https://openai.com/)** - Advanced language models for text analysis.
+- **[Anthropic Claude](https://www.anthropic.com/)** - AI models for sophisticated text analysis.

 ## 🤝 Contribution
...
utils/ai_analyzer.py (view file @ 12a8732)
 import openai
+import anthropic
 import json
 from typing import List, Dict
 import os
...
@@ -8,11 +9,34 @@ from utils.logger import app_logger as logging
 class AIAnalyzer:
     def __init__(self):
-        # Get the API key from the environment
-        self.api_key = os.getenv('OPENAI_API_KEY')
-        if not self.api_key:
-            raise ValueError("请设置OPENAI_API_KEY环境变量")
-        openai.api_key = self.api_key
+        # Read the API keys from the environment
+        self.openai_key = os.getenv('OPENAI_API_KEY')
+        self.claude_key = os.getenv('ANTHROPIC_API_KEY')
+        if not self.openai_key and not self.claude_key:
+            raise ValueError("请至少设置一个API密钥 (OPENAI_API_KEY 或 ANTHROPIC_API_KEY)")
+        if self.openai_key:
+            openai.api_key = self.openai_key
+        if self.claude_key:
+            self.claude_client = anthropic.Anthropic(api_key=self.claude_key)
+
+        # Supported models
+        self.supported_models = {
+            # OpenAI models
+            'gpt-3.5-turbo': {'provider': 'openai', 'max_tokens': 2000, 'cost_per_1k': 0.0015},
+            'gpt-3.5-turbo-16k': {'provider': 'openai', 'max_tokens': 16000, 'cost_per_1k': 0.003},
+            'gpt-4': {'provider': 'openai', 'max_tokens': 8000, 'cost_per_1k': 0.03},
+            'gpt-4-32k': {'provider': 'openai', 'max_tokens': 32000, 'cost_per_1k': 0.06},
+            'gpt-4-turbo-preview': {'provider': 'openai', 'max_tokens': 128000, 'cost_per_1k': 0.01},
+            # Claude models
+            'claude-3-opus-20240229': {'provider': 'anthropic', 'max_tokens': 4000, 'cost_per_1k': 0.015},
+            'claude-3-sonnet-20240229': {'provider': 'anthropic', 'max_tokens': 3000, 'cost_per_1k': 0.003},
+            'claude-3-haiku-20240307': {'provider': 'anthropic', 'max_tokens': 2000, 'cost_per_1k': 0.0025},
+            'claude-2.1': {'provider': 'anthropic', 'max_tokens': 100000, 'cost_per_1k': 0.008},
+            'claude-2.0': {'provider': 'anthropic', 'max_tokens': 100000, 'cost_per_1k': 0.008},
+            'claude-instant-1.2': {'provider': 'anthropic', 'max_tokens': 100000, 'cost_per_1k': 0.0015}
+        }

         # Prompt templates for the different analysis depths
         self.prompt_templates = {
...
...
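The supported_models registry introduced above drives a simple provider-dispatch pattern. A minimal standalone sketch with an abbreviated copy of the registry (the helper function is hypothetical, not part of the module):

```python
# Abbreviated copy of the registry from the diff; two entries suffice here.
SUPPORTED_MODELS = {
    'gpt-4': {'provider': 'openai', 'max_tokens': 8000, 'cost_per_1k': 0.03},
    'claude-2.1': {'provider': 'anthropic', 'max_tokens': 100000, 'cost_per_1k': 0.008},
}

def provider_for(model_type: str) -> str:
    # Unknown model names fail fast, mirroring the check in analyze_messages
    if model_type not in SUPPORTED_MODELS:
        raise ValueError(f"Unsupported model type: {model_type}")
    return SUPPORTED_MODELS[model_type]['provider']

print(provider_for('gpt-4'))       # openai
print(provider_for('claude-2.1'))  # anthropic
```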
@@ -73,46 +97,142 @@ class AIAnalyzer:
                            analysis_depth: str = "standard") -> List[Dict]:
         """Analyze a batch of messages and return the results"""
         try:
+            if model_type not in self.supported_models:
+                raise ValueError(f"不支持的模型类型: {model_type}")
+
+            model_info = self.supported_models[model_type]
+            provider = model_info['provider']
+            max_tokens = model_info['max_tokens']
+
+            # Adjust the batch size to suit the model
+            adjusted_batch_size = min(batch_size, self._get_optimal_batch_size(model_type))
+            if adjusted_batch_size != batch_size:
+                logging.info(f"已将批处理大小从 {batch_size} 调整为 {adjusted_batch_size}")
+
             all_results = []
+            total_cost = 0

             # Process the messages in batches
-            for i in range(0, len(messages), batch_size):
-                batch = messages[i:i + batch_size]
+            for i in range(0, len(messages), adjusted_batch_size):
+                batch = messages[i:i + adjusted_batch_size]
                 formatted_messages = []
                 for msg in batch:
                     formatted_messages.append(f"消息ID: {msg['id']}\n内容: {msg['content']}")
                 messages_text = "\n---\n".join(formatted_messages)

                 # Pick the prompt for the requested analysis depth
                 system_prompt = self.prompt_templates.get(analysis_depth, self.prompt_templates['standard'])

-                # Call the OpenAI API
+                # Dispatch to the provider for this model
+                if provider == 'openai':
+                    result = await self._analyze_with_openai(messages_text, system_prompt, model_type, max_tokens)
+                else:  # anthropic
+                    result = await self._analyze_with_claude(messages_text, system_prompt, model_type, max_tokens)
+
+                if result:
+                    all_results.extend(result)
+
+                # Cost of this batch
+                batch_cost = self._calculate_cost(len(messages_text), model_type)
+                total_cost += batch_cost
+                logging.info(f"批次处理完成,成本: ${batch_cost:.4f}")

+            logging.info(f"分析完成,总成本: ${total_cost:.4f}")
             return all_results
         except Exception as e:
             logging.error(f"AI分析过程出错: {e}")
             return []
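The batching loop above can be exercised in isolation; this sketch reuses its slicing and message-formatting logic (make_batches is a hypothetical helper, not part of the module):

```python
def make_batches(messages, batch_size):
    # Same slicing as the loop in analyze_messages
    return [messages[i:i + batch_size]
            for i in range(0, len(messages), batch_size)]

msgs = [{'id': n, 'content': f'post {n}'} for n in range(5)]
batches = make_batches(msgs, 2)
print(len(batches))  # 3 batches: sizes 2, 2, 1

# Each batch is then joined into one prompt string, as in the diff:
text = "\n---\n".join(f"消息ID: {m['id']}\n内容: {m['content']}"
                      for m in batches[0])
```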
+    def _get_optimal_batch_size(self, model_type: str) -> int:
+        """Get the optimal batch size for the given model type"""
+        model_info = self.supported_models[model_type]
+        max_tokens = model_info['max_tokens']
+        # Assume an average of 200 tokens per message
+        avg_tokens_per_message = 200
+        # Reserve 20% of the tokens for the system prompt and the response
+        available_tokens = int(max_tokens * 0.8)
+        # Compute the optimal batch size
+        optimal_batch_size = max(1, min(100, available_tokens // avg_tokens_per_message))
+        return optimal_batch_size
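To make the heuristic concrete, here is the same arithmetic as a standalone function with two worked cases; the max_tokens inputs come from the model registry above.

```python
def optimal_batch_size(max_tokens: int, avg_tokens_per_message: int = 200) -> int:
    # Reserve 20% of the token budget for the system prompt and the response
    available_tokens = int(max_tokens * 0.8)
    # At least 1 message per batch, at most 100
    return max(1, min(100, available_tokens // avg_tokens_per_message))

print(optimal_batch_size(2000))    # gpt-3.5-turbo: 1600 // 200 = 8
print(optimal_batch_size(128000))  # gpt-4-turbo-preview: capped at 100
```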
+    def _calculate_cost(self, input_length: int, model_type: str) -> float:
+        """Estimate the cost of an API call"""
+        model_info = self.supported_models[model_type]
+        cost_per_1k = model_info['cost_per_1k']
+        # Estimate the token count (roughly 4 characters per token)
+        estimated_tokens = input_length // 4
+        # Cost in US dollars
+        cost = (estimated_tokens / 1000) * cost_per_1k
+        return cost
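The cost estimate is simple arithmetic: characters divided by 4 to approximate tokens, then priced per 1K tokens. A worked example using the gpt-4 rate from the registry above:

```python
def estimate_cost(input_length: int, cost_per_1k: float) -> float:
    # Roughly 4 characters per token
    estimated_tokens = input_length // 4
    return (estimated_tokens / 1000) * cost_per_1k

# 8,000 characters sent to gpt-4 ($0.03 per 1K tokens):
print(estimate_cost(8000, 0.03))  # 2000 tokens -> $0.06
```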
     async def _analyze_with_openai(self, messages_text: str, system_prompt: str,
                                    model: str, max_tokens: int) -> List[Dict]:
         """Run the analysis through the OpenAI API"""
         try:
             response = await openai.ChatCompletion.acreate(
-                model=model_type,
+                model=model,
                 messages=[
                     {"role": "system", "content": system_prompt},
                     {"role": "user", "content": f"请分析以下消息:\n{messages_text}"}
                 ],
-                temperature=0.3,  # Reduce randomness
-                max_tokens=2000 if analysis_depth != 'deep' else 3000,
-                n=1
+                temperature=0.3,
+                max_tokens=max_tokens,
+                n=1,
+                response_format={"type": "json_object"}  # Force a JSON response
             )
-            try:
-                result = json.loads(response.choices[0].message.content)
-                if isinstance(result, dict) and 'analysis_results' in result:
-                    all_results.extend(result['analysis_results'])
-                else:
-                    logging.error(f"API返回格式不正确: {response.choices[0].message.content}")
-            except json.JSONDecodeError as e:
-                logging.error(f"JSON解析失败: {e}")
-                continue
-            return all_results
+            result = json.loads(response.choices[0].message.content)
+            if isinstance(result, dict) and 'analysis_results' in result:
+                return result['analysis_results']
+            logging.error(f"OpenAI API返回格式不正确: {response.choices[0].message.content}")
+            return []
         except Exception as e:
             logging.error(f"OpenAI API调用失败: {e}")
             return []
     async def _analyze_with_claude(self, messages_text: str, system_prompt: str,
                                    model: str, max_tokens: int) -> List[Dict]:
         """Run the analysis through the Claude API"""
         try:
             response = await self.claude_client.messages.create(
                 model=model,
                 max_tokens=max_tokens,
                 temperature=0.3,
                 system=system_prompt,
                 messages=[
                     {"role": "user", "content": f"请分析以下消息:\n{messages_text}"}
                 ]
             )
             result = json.loads(response.content[0].text)
             if isinstance(result, dict) and 'analysis_results' in result:
                 return result['analysis_results']
             else:
                 logging.error(f"Claude API返回格式不正确: {response.content[0].text}")
                 return []
         except Exception as e:
-            logging.error(f"AI分析过程出错: {e}")
+            logging.error(f"Claude API调用失败: {e}")
             return []
     def format_analysis_for_display(self, analysis: Dict) -> Dict:
...
views/page/templates/yuqingpredict.html (view file @ 12a8732)
...
...
@@ -467,8 +467,21 @@
                 </div>
                 <div class="form-group mx-2 mb-0">
                     <select id="modelType" class="form-control form-control-sm">
-                        <option value="gpt-3.5-turbo" selected>GPT-3.5</option>
-                        <option value="gpt-4">GPT-4</option>
+                        <optgroup label="OpenAI 模型">
+                            <option value="gpt-3.5-turbo">GPT-3.5-Turbo ($0.0015/1K tokens)</option>
+                            <option value="gpt-3.5-turbo-16k">GPT-3.5-Turbo-16K ($0.003/1K tokens)</option>
+                            <option value="gpt-4">GPT-4 ($0.03/1K tokens)</option>
+                            <option value="gpt-4-32k">GPT-4-32K ($0.06/1K tokens)</option>
+                            <option value="gpt-4-turbo-preview">GPT-4-Turbo ($0.01/1K tokens)</option>
+                        </optgroup>
+                        <optgroup label="Claude 模型">
+                            <option value="claude-3-opus-20240229">Claude-3 Opus ($0.015/1K tokens)</option>
+                            <option value="claude-3-sonnet-20240229">Claude-3 Sonnet ($0.003/1K tokens)</option>
+                            <option value="claude-3-haiku-20240307">Claude-3 Haiku ($0.0025/1K tokens)</option>
+                            <option value="claude-2.1">Claude-2.1 ($0.008/1K tokens)</option>
+                            <option value="claude-2.0">Claude-2.0 ($0.008/1K tokens)</option>
+                            <option value="claude-instant-1.2">Claude Instant ($0.0015/1K tokens)</option>
+                        </optgroup>
                     </select>
                 </div>
                 <div class="form-group mx-2 mb-0">
...
...
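Since the dropdown values must match keys in the analyzer's supported_models registry, the server-side handler that receives the form value presumably validates it. A hedged sketch of such a guard (the function name and SUPPORTED set are illustrative, not taken from the project):

```python
# Option values from the <select> above; they must match AIAnalyzer.supported_models
SUPPORTED = {
    'gpt-3.5-turbo', 'gpt-3.5-turbo-16k', 'gpt-4', 'gpt-4-32k',
    'gpt-4-turbo-preview', 'claude-3-opus-20240229',
    'claude-3-sonnet-20240229', 'claude-3-haiku-20240307',
    'claude-2.1', 'claude-2.0', 'claude-instant-1.2',
}

def validate_model(value: str) -> str:
    # Reject anything a tampered form might submit
    if value not in SUPPORTED:
        raise ValueError(f"Unsupported model type: {value}")
    return value

print(validate_model('claude-3-opus-20240229'))
```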