{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
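"# anhuidianxinzhidao overview\n",
"0. **Download:** [Baidu Netdisk](https://pan.baidu.com/s/1nrg5SRU3Xy1VN85dd85-vg)\n",
"1. **Data overview:** about 156,000 telecom Q&A records\n",
"2. **Suggested experiment:** an FAQ question-answering system\n",
"3. **Source:** Baidu Zhidao\n",
"4. **Processing:**\n",
"   1. Removed the id, url, qid, reply_t, and user fields\n",
"   2. Masked sensitive information in the question and reply fields"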
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
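"# Replace with the path to the anhuidianxinzhidao folder; keep the trailing slash,\n",
"# because the file name is appended to it by plain string concatenation\n",
"path = 'path/to/anhuidianxinzhidao/'"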
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 1. `anhuidianxinzhidao_filter.csv`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Load the data"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"pd_all = pd.read_csv(path + 'anhuidianxinzhidao_filter.csv')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
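"## Field descriptions\n",
"\n",
"| Field | Description |\n",
"| ---- | ---- |\n",
"| title | Question title |\n",
"| question | Question details (may be empty) |\n",
"| reply | A reply to the question |\n",
"| is_best | Whether the reply is the best answer (1 = yes, 0 = no) |"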
]
},
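{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of turning these fields into question-answer pairs. It assumes `pd_all` has been loaded as above and that `is_best` holds the integers 0 and 1, as in the sample output below. Only replies flagged as best are kept, and `title` is used as the question text whenever `question` is empty. The column name `query` is a hypothetical choice for illustration."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Keep only replies flagged as the best answer (is_best == 1)\n",
"best = pd_all[pd_all['is_best'] == 1].copy()\n",
"# 'question' may be empty (NaN), so fall back to 'title' row by row\n",
"best['query'] = best['question'].fillna(best['title'])\n",
"# Hypothetical column name 'query'; keep just the question-answer pair\n",
"best = best[['query', 'reply']].reset_index(drop=True)\n",
"best.head()"
]
},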
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>title</th>\n",
" <th>question</th>\n",
" <th>reply</th>\n",
" <th>is_best</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>129754</th>\n",
" <td>红米no##4x</td>\n",
" <td>NaN</td>\n",
" <td>可以,</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>15843</th>\n",
" <td>为什么不能同时用两个电信卡</td>\n",
" <td>NaN</td>\n",
" <td>您好不可以的,目前推出的手机都是不能同时支持两张电信手机卡的,即使是全网通手机也只能在其中的...</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>23985</th>\n",
" <td>电信181、177、133哪个号段好?</td>\n",
" <td>NaN</td>\n",
" <td>133的</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>72065</th>\n",
" <td>华*荣耀7x和魅蓝note6哪个好</td>\n",
" <td>NaN</td>\n",
" <td>荣耀畅玩7X很不错,性价比很高,以下是手机的配置:1、外观方面:荣耀畅玩7X采用5.93英寸...</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11843</th>\n",
" <td>p8青春版电信版多少钱</td>\n",
" <td>NaN</td>\n",
" <td>您好,这款手机价格参考如下</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3280</th>\n",
" <td>华为di####00叫什么</td>\n",
" <td>华为di####00叫什么</td>\n",
" <td>DI####00是华为畅享6S全网通版。华为畅享6S性价比高,是一款很不错的手机。电信新出流...</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>143200</th>\n",
" <td>电信版酷派9190L双卡双通可以用移动网络吗</td>\n",
" <td>NaN</td>\n",
" <td>您好电信版双卡双待手机只能使用电信手机卡上网,卡槽2的移动或联通手机卡只能支持2G网络,一般...</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>120692</th>\n",
" <td>苹果微信载图怎么载图</td>\n",
" <td>苹果微信载图怎么载图</td>\n",
" <td>您说的应该是截图吧。您可以直接通过苹果手机截图组合按键进行截图操作。直接同时安装电源键和ho...</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>109786</th>\n",
" <td>天翼网关的wifi被我关了又没有邦定客户端怎么办想再连wifi该怎么办</td>\n",
" <td>NaN</td>\n",
" <td>您好电信光纤猫的无线网络一般需要破解才能使用的,但破解可能会到帐宽带不稳定或不能正常上网,建...</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>29030</th>\n",
" <td>v*v*x21是不是全网通</td>\n",
" <td>v*v*x21是不是全网通</td>\n",
" <td>vi###21系列是有vi###21A全网通版本与vi###21移动全网通版本的;此两款机型...</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>72603</th>\n",
" <td>电信网上营业厅手机卡办理步骤</td>\n",
" <td>NaN</td>\n",
" <td>中*电信目前是支持网上办理手机号的,下面分享下网上营业厅办理号卡的步骤:1、首先打开浏览器,...</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>103229</th>\n",
" <td>花呗可以充话费吗</td>\n",
" <td>NaN</td>\n",
" <td>您好,是可以的,目前花呗进行充值话费,每个月只能使用花呗一次,最高不超过500元,如果您已经...</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>91507</th>\n",
" <td>荣耀8好还是三星noT4好</td>\n",
" <td>NaN</td>\n",
" <td>如果我选择三星,华为去论坛发个意见都很尴尬。</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>143504</th>\n",
" <td>ios10.2.1能降级吗ios10.2.1怎么降级</td>\n",
" <td>NaN</td>\n",
" <td>IOS设备一旦升级IOS系统就无法降级了,因为:1、IOS采用推荐升级、强制保持最新的升级策...</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>21999</th>\n",
" <td>电信校园网宽带超一分钟多少钱</td>\n",
" <td>NaN</td>\n",
" <td>由于各地业务情况不同,建议用户通过当地的电信网是营业厅或者手机营业厅了解,也可以直接到附近的...</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7644</th>\n",
" <td>有没有人办过开发区的电信卡</td>\n",
" <td>NaN</td>\n",
" <td>您好目前使用电信手机卡的用户非常多,电信手机卡资费更优惠、网络更稳定、网速更快,请放心办理使...</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>76835</th>\n",
" <td>请问67###18这个电话号码是哪里的</td>\n",
" <td>NaN</td>\n",
" <td>查吧</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>76752</th>\n",
" <td>电信,铁通,移动,广电。那个网速好呢?</td>\n",
" <td>NaN</td>\n",
" <td>办理宽带推荐您办理电信宽带使用。由于中*电信的服务器、网络架设等较完善,且每年都在不断完善和...</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>94290</th>\n",
" <td>三星s8+好用不</td>\n",
" <td>NaN</td>\n",
" <td>S8+的主要特征:1.全视曲面屏:超窄边框、沉浸感视效、双曲面侧屏的显示屏,为您带来更纯粹的...</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>79345</th>\n",
" <td>一加手机5玩王者会卡吗?</td>\n",
" <td>NaN</td>\n",
" <td>不会卡,我也推荐你买一加5,它运行内存有8G,玩游戏的时候就能感受到性能有多好,手机不卡,丢...</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" title question \\\n",
"129754 红米no##4x NaN \n",
"15843 为什么不能同时用两个电信卡 NaN \n",
"23985 电信181、177、133哪个号段好? NaN \n",
"72065 华*荣耀7x和魅蓝note6哪个好 NaN \n",
"11843 p8青春版电信版多少钱 NaN \n",
"3280 华为di####00叫什么 华为di####00叫什么 \n",
"143200 电信版酷派9190L双卡双通可以用移动网络吗 NaN \n",
"120692 苹果微信载图怎么载图 苹果微信载图怎么载图 \n",
"109786 天翼网关的wifi被我关了又没有邦定客户端怎么办想再连wifi该怎么办 NaN \n",
"29030 v*v*x21是不是全网通 v*v*x21是不是全网通 \n",
"72603 电信网上营业厅手机卡办理步骤 NaN \n",
"103229 花呗可以充话费吗 NaN \n",
"91507 荣耀8好还是三星noT4好 NaN \n",
"143504 ios10.2.1能降级吗ios10.2.1怎么降级 NaN \n",
"21999 电信校园网宽带超一分钟多少钱 NaN \n",
"7644 有没有人办过开发区的电信卡 NaN \n",
"76835 请问67###18这个电话号码是哪里的 NaN \n",
"76752 电信,铁通,移动,广电。那个网速好呢? NaN \n",
"94290 三星s8+好用不 NaN \n",
"79345 一加手机5玩王者会卡吗? NaN \n",
"\n",
" reply is_best \n",
"129754 可以, 0 \n",
"15843 您好不可以的,目前推出的手机都是不能同时支持两张电信手机卡的,即使是全网通手机也只能在其中的... 1 \n",
"23985 133的 0 \n",
"72065 荣耀畅玩7X很不错,性价比很高,以下是手机的配置:1、外观方面:荣耀畅玩7X采用5.93英寸... 1 \n",
"11843 您好,这款手机价格参考如下 1 \n",
"3280 DI####00是华为畅享6S全网通版。华为畅享6S性价比高,是一款很不错的手机。电信新出流... 1 \n",
"143200 您好电信版双卡双待手机只能使用电信手机卡上网,卡槽2的移动或联通手机卡只能支持2G网络,一般... 1 \n",
"120692 您说的应该是截图吧。您可以直接通过苹果手机截图组合按键进行截图操作。直接同时安装电源键和ho... 1 \n",
"109786 您好电信光纤猫的无线网络一般需要破解才能使用的,但破解可能会到帐宽带不稳定或不能正常上网,建... 1 \n",
"29030 vi###21系列是有vi###21A全网通版本与vi###21移动全网通版本的;此两款机型... 0 \n",
"72603 中*电信目前是支持网上办理手机号的,下面分享下网上营业厅办理号卡的步骤:1、首先打开浏览器,... 1 \n",
"103229 您好,是可以的,目前花呗进行充值话费,每个月只能使用花呗一次,最高不超过500元,如果您已经... 0 \n",
"91507 如果我选择三星,华为去论坛发个意见都很尴尬。 0 \n",
"143504 IOS设备一旦升级IOS系统就无法降级了,因为:1、IOS采用推荐升级、强制保持最新的升级策... 1 \n",
"21999 由于各地业务情况不同,建议用户通过当地的电信网是营业厅或者手机营业厅了解,也可以直接到附近的... 1 \n",
"7644 您好目前使用电信手机卡的用户非常多,电信手机卡资费更优惠、网络更稳定、网速更快,请放心办理使... 1 \n",
"76835 查吧 0 \n",
"76752 办理宽带推荐您办理电信宽带使用。由于中*电信的服务器、网络架设等较完善,且每年都在不断完善和... 1 \n",
"94290 S8+的主要特征:1.全视曲面屏:超窄边框、沉浸感视效、双曲面侧屏的显示屏,为您带来更纯粹的... 1 \n",
"79345 不会卡,我也推荐你买一加5,它运行内存有8G,玩游戏的时候就能感受到性能有多好,手机不卡,丢... 1 "
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd_all.sample(n=20)"
]
},
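{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of the FAQ experiment suggested at the top of this notebook. It assumes scikit-learn is installed and is a simple retrieval baseline, not a method provided with the dataset. Each best-answer `title` is treated as an FAQ question, and the titles are indexed with character-level TF-IDF. A user query is then matched to the most similar titles by cosine similarity. The helper `answer`, its `top_k` parameter, and the sample query are all hypothetical."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Assumes scikit-learn is installed; it is not used elsewhere in this notebook\n",
"from sklearn.feature_extraction.text import TfidfVectorizer\n",
"from sklearn.metrics.pairwise import linear_kernel\n",
"\n",
"# FAQ entries: keep only best answers; titles are the questions to match against\n",
"faq = pd_all[pd_all['is_best'] == 1].reset_index(drop=True)\n",
"\n",
"# Character uni- and bigrams avoid needing a Chinese word segmenter\n",
"vectorizer = TfidfVectorizer(analyzer='char', ngram_range=(1, 2))\n",
"title_vecs = vectorizer.fit_transform(faq['title'].fillna(''))\n",
"\n",
"def answer(query, top_k=3):\n",
"    # Hypothetical helper: TF-IDF rows are L2-normalised, so the dot product is cosine similarity\n",
"    scores = linear_kernel(vectorizer.transform([query]), title_vecs).ravel()\n",
"    top = scores.argsort()[::-1][:top_k]\n",
"    return faq.loc[top, ['title', 'reply']].assign(score=scores[top])\n",
"\n",
"# Hypothetical query: 'how do I apply for a telecom SIM card'\n",
"answer('电信卡怎么办理')"
]
},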
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.0"
}
},
"nbformat": 4,
"nbformat_minor": 2
}