Committed by
GitHub
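Feature: rewrite the Xiaohongshu Skills as a full migration to a CDP-based Python implementation (#1)

## Main changes

### Core module rewrite
- Created the scripts/xhs/ package with 18 specialized modules (3728 lines of code)
- Implemented in full from the xiaohongshu-mcp Go source
- Direct CDP WebSocket communication, replacing third-party library dependencies

### Module list
- cdp.py: Browser/Page/Element classes, full CDP protocol implementation
- stealth.py: anti-detection JS injection + Chrome launch flags
- login.py: login check and QR code login (the QR code is saved to a temporary file for the agent to display)
- publish.py: full image-and-text publishing flow
- publish_video.py: full video publishing flow
- search.py: search and content filtering
- feed_detail.py: note detail and comment loading
- comment.py: comments and replies
- like_favorite.py: likes and favorites
- user_profile.py: user profile page
- cookies.py: cookie persistence
- types.py: complete dataclass-based type system
- errors.py: custom exception hierarchy
- human.py: human behavior simulation (delays, scrolling)
- selectors.py: CSS selector constants
- urls.py: URL builder functions

### Unified CLI
- scripts/cli.py: 13 subcommands, fully compatible with the xiaohongshu-mcp MCP tools
- check-login: check login status
- login: get the login QR code
- switch-account/delete-cookies: account switching
- publish-content: image-and-text publishing
- publish-with-video: video publishing
- list-feeds: feed list
- search-feeds: feed search
- get-feed-detail: note detail
- user-profile: user profile page
- post-comment: post a comment
- like-feed: like a note
- favorite-feed: favorite a note

### Supporting scripts rewritten
- chrome_launcher.py: Chrome process management (cross-platform)
- account_manager.py: multi-account profile isolation
- image_downloader.py: image/video download (SHA256 cache)
- title_utils.py: UTF-16 title length calculation
- run_lock.py: single-instance lock
- publish_pipeline.py: publish flow orchestration CLI

### Documentation and configuration
- SKILL.md: unified skill entry point (routes to 5 sub-skills)
- skills/xhs-auth/SKILL.md: authentication management skill
- skills/xhs-publish/SKILL.md: content publishing skill (image-and-text + video)
- skills/xhs-explore/SKILL.md: content discovery and analysis skill
- skills/xhs-interact/SKILL.md: social interaction skill (comment/like/favorite)
- skills/xhs-content-ops/SKILL.md: composite content operations workflow skill
- CLAUDE.md: project development guide
- PROMPT.md: Ralph Loop driver file
- pyproject.toml: uv project configuration (uv.lock)
- README.md: full project documentation

### Tech stack
- Python 3.11+ with uv package management
- requests + websockets: CDP WebSocket communication
- Code style: ruff lint + format

## Mapping
All 13 subcommands map one-to-one onto the xiaohongshu-mcp MCP tools
Can be called directly from the OpenClaw agent framework

## Groundwork
- Created the scripts/xhs/ package architecture
- Implemented the CDP WebSocket protocol
- Complete type system and error handling
- CLI subcommand system

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>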
Showing 30 changed files with 4888 additions and 1 deletion.
Too many changes to show.
To preserve performance only 30 of 30+ files are displayed.
| @@ -205,3 +205,15 @@ cython_debug/ | @@ -205,3 +205,15 @@ cython_debug/ | ||
| 205 | marimo/_static/ | 205 | marimo/_static/ |
| 206 | marimo/_lsp/ | 206 | marimo/_lsp/ |
| 207 | __marimo__/ | 207 | __marimo__/ |
| 208 | + | ||
| 209 | +# Project specific | ||
| 210 | +tmp/ | ||
| 211 | +*.txt | ||
| 212 | +!requirements.txt | ||
| 213 | +config/accounts.json | ||
| 214 | +title.txt | ||
| 215 | +content.txt | ||
| 216 | +comment.txt | ||
| 217 | + | ||
| 218 | +# Ralph Loop state | ||
| 219 | +.claude/.ralph-loop.local.md |
CLAUDE.md
0 → 100644
| 1 | +# xiaohongshu-skills | ||
| 2 | + | ||
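| 3 | +Xiaohongshu automation Skills for Claude Code, built on a Python CDP browser automation engine. | ||
| 4 | +Provides Xiaohongshu operation capabilities for the OpenClaw ecosystem and also supports the Claude Code skills format. | ||
| 5 | + | ||
| 6 | +## Project structure | ||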
| 7 | + | ||
| 8 | +``` | ||
| 9 | +xiaohongshu-skills/ | ||
| 10 | +├── scripts/ # Python CDP automation engine | ||
| 11 | +│ ├── xhs/ # Core XHS automation package | ||
| 12 | +│ │ ├── __init__.py | ||
| 13 | +│ │ ├── cdp.py # CDP WebSocket client (Browser, Page, Element) | ||
| 14 | +│ │ ├── stealth.py # Anti-detection JS injection + Chrome launch flags | ||
| 15 | +│ │ ├── cookies.py # Cookie file persistence | ||
| 16 | +│ │ ├── types.py # Data types (dataclass) | ||
| 17 | +│ │ ├── errors.py # Exception hierarchy | ||
| 18 | +│ │ ├── selectors.py # CSS selector constants | ||
| 19 | +│ │ ├── urls.py # URL constants and builder functions | ||
| 20 | +│ │ ├── human.py # Human behavior simulation (delays, scrolling) | ||
| 21 | +│ │ ├── login.py # Login check, QR code login | ||
| 22 | +│ │ ├── feeds.py # Home feed list | ||
| 23 | +│ │ ├── search.py # Search + filters | ||
| 24 | +│ │ ├── feed_detail.py # Note detail + comment loading | ||
| 25 | +│ │ ├── user_profile.py # User profile page | ||
| 26 | +│ │ ├── comment.py # Comments, replies | ||
| 27 | +│ │ ├── like_favorite.py # Likes, favorites | ||
| 28 | +│ │ ├── publish.py # Image-and-text publishing | ||
| 29 | +│ │ └── publish_video.py # Video publishing | ||
| 30 | +│ ├── cli.py # Unified CLI entry point (13 subcommands) | ||
| 31 | +│ ├── chrome_launcher.py # Chrome process management | ||
| 32 | +│ ├── account_manager.py # Multi-account management | ||
| 33 | +│ ├── image_downloader.py # Media download (SHA256 cache) | ||
| 34 | +│ ├── title_utils.py # UTF-16 title length calculation | ||
| 35 | +│ ├── run_lock.py # Single-instance lock | ||
| 36 | +│ └── publish_pipeline.py # Publish orchestrator | ||
| 37 | +├── skills/ # Claude Code Skills definitions | ||
| 38 | +│ ├── xhs-auth/SKILL.md # Authentication management | ||
| 39 | +│ ├── xhs-publish/SKILL.md # Content publishing (image-and-text + video) | ||
| 40 | +│ ├── xhs-explore/SKILL.md # Content discovery and analysis | ||
| 41 | +│ ├── xhs-interact/SKILL.md # Social interaction (comment/like/favorite) | ||
| 42 | +│ └── xhs-content-ops/SKILL.md # Composite content operations workflow | ||
| 43 | +├── pyproject.toml # uv project configuration | ||
| 44 | +├── SKILL.md # Unified entry point (routes to sub-skills) | ||
| 45 | +├── CLAUDE.md # This file | ||
| 46 | +├── PROMPT.md # Ralph Loop driver file | ||
| 47 | +└── README.md | ||
| 48 | +``` | ||
| 49 | + | ||
| 50 | +## Tech stack | ||
| 51 | + | ||
| 52 | +- **Python**: >=3.11 | ||
| 53 | +- **Package management**: uv | ||
| 54 | +- **Dependencies**: requests + websockets (direct CDP WebSocket communication) | ||
| 55 | +- **Browser**: Chrome (controlled through the CDP remote debugging protocol) | ||
| 56 | +- **Code style**: ruff (lint + format) | ||
| 57 | +- **Data extraction**: `window.__INITIAL_STATE__` (consistent with the Go source) | ||
| 58 | + | ||
| 59 | +## Development commands | ||
| 60 | + | ||
| 61 | +```bash | ||
| 62 | +uv sync # Install dependencies | ||
| 63 | +uv run ruff check . # Lint | ||
| 64 | +uv run ruff format . # Format code | ||
| 65 | +uv run pytest # Run tests | ||
| 66 | +``` | ||
| 67 | + | ||
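| 68 | +## Architecture | ||
| 69 | + | ||
| 70 | +### Two-layer structure | ||
| 71 | + | ||
| 72 | +1. **scripts/: Python CDP engine** | ||
| 73 | + - Rewritten from scratch based on the xiaohongshu-mcp Go source | ||
| 74 | + - `xhs/` package: the modular core automation library | ||
| 75 | + - `cli.py`: unified CLI entry point; its 13 subcommands map to the MCP tools | ||
| 76 | + - Structured JSON output that agents can parse easily | ||
| 77 | + - Multi-account support, isolated through separate Chrome profiles | ||
| 78 | + - Anti-detection protection (stealth flags + JS injection) | ||
| 79 | + | ||
| 80 | +2. **skills/: Claude Code Skills definitions** | ||
| 81 | + - SKILL.md format, instructing Claude how to call scripts/ | ||
| 82 | + - Covers input classification, constraints, workflows, and failure handling | ||
| 83 | + | ||
| 84 | +### Invocation | ||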
| 85 | + | ||
| 86 | +```bash | ||
| 87 | +# Unified CLI entry point | ||
| 88 | +python scripts/cli.py check-login | ||
| 89 | +python scripts/cli.py search-feeds --keyword "keyword" | ||
| 90 | +python scripts/cli.py publish --title-file t.txt --content-file c.txt --images pic.jpg | ||
| 91 | + | ||
| 92 | +# Publish pipeline (includes image download and login check) | ||
| 93 | +python scripts/publish_pipeline.py --title-file t.txt --content-file c.txt --images URL1 | ||
| 94 | +``` | ||
| 95 | + | ||
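| 96 | +## Code standards | ||
| 97 | + | ||
| 98 | +### Python style | ||
| 99 | +- Follow PEP 8, enforced with ruff | ||
| 100 | +- Full type hints (PEP 484), using the `str | None` union syntax (PEP 604) | ||
| 101 | +- Public functions and classes must have docstrings | ||
| 102 | +- Maximum line length of 100 characters | ||
| 103 | +- Use `from __future__ import annotations` to enable postponed evaluation of annotations | ||
| 104 | + | ||
| 105 | +### Naming conventions | ||
| 106 | +- File names: snake_case | ||
| 107 | +- Class names: PascalCase | ||
| 108 | +- Functions/variables: snake_case | ||
| 109 | +- Constants: UPPER_SNAKE_CASE | ||
| 110 | + | ||
| 111 | +### Error handling | ||
| 112 | +- Custom exception classes inherit from the `XHSError` base class (`xhs/errors.py`) | ||
| 113 | +- CLI commands use structured exit codes: 0 = success, 1 = not logged in, 2 = error | ||
| 114 | +- All user-visible error messages are written in Chinese | ||
| 115 | + | ||
| 116 | +### Security constraints | ||
| 117 | +- Publishing operations must have a user confirmation step | ||
| 118 | +- File paths must be absolute | ||
| 119 | +- Do not inline sensitive content in command-line arguments (pass it via files) | ||
| 120 | +- Chrome profile directories isolate each account's cookies | ||
| 121 | + | ||
| 122 | +## References | ||
| 123 | + | ||
| 124 | +- **xiaohongshu-mcp Go source**: /Users/zy/src/zy/xiaohongshu-mcp/ | ||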
| 125 | + | ||
| 126 | +## MCP tool mapping | ||
| 127 | + | ||
| 128 | +The 13 subcommands of scripts/cli.py map to the MCP tools of xiaohongshu-mcp: | ||
| 129 | + | ||
| 130 | +| CLI subcommand | MCP tool | Category | | ||
| 131 | +|--|--|--| | ||
| 132 | +| `check-login` | check_login_status | Auth | | ||
| 133 | +| `login` | get_login_qrcode | Auth | | ||
| 134 | +| `delete-cookies` | delete_cookies | Auth | | ||
| 135 | +| `list-feeds` | list_feeds | Browse | | ||
| 136 | +| `search-feeds` | search_feeds | Browse | | ||
| 137 | +| `get-feed-detail` | get_feed_detail | Browse | | ||
| 138 | +| `user-profile` | user_profile | Browse | | ||
| 139 | +| `post-comment` | post_comment_to_feed | Interact | | ||
| 140 | +| `reply-comment` | reply_comment_in_feed | Interact | | ||
| 141 | +| `like-feed` | like_feed | Interact | | ||
| 142 | +| `favorite-feed` | favorite_feed | Interact | | ||
| 143 | +| `publish` | publish_content | Publish | | ||
| 144 | +| `publish-video` | publish_with_video | Publish | | ||
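The project structure in CLAUDE.md lists `title_utils.py` as a UTF-16 title length calculator, but that file is not shown in this part of the diff. The following is only a minimal sketch of one way to count UTF-16 code units in Python; the function name `utf16_length` is hypothetical, and whether the real module counts this way or applies a particular limit is an assumption.

```python
def utf16_length(text: str) -> int:
    """Return the number of UTF-16 code units needed to encode ``text``.

    Hypothetical helper: characters outside the Basic Multilingual Plane
    (most emoji, for example) take two code units, so this can differ from
    ``len(text)``, which counts Unicode code points.
    """
    # "utf-16-le" avoids the 2-byte BOM that plain "utf-16" would prepend.
    return len(text.encode("utf-16-le")) // 2


if __name__ == "__main__":
    for sample in ("abc", "小红书", "😀"):
        print(repr(sample), len(sample), utf16_length(sample))
```

A browser text input that limits length by JavaScript `String.length` counts the same units, which is one plausible reason to measure titles this way.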
PROMPT.md
0 → 100644
| 1 | +# Xiaohongshu Skills development task | ||
| 2 | + | ||
| 3 | +## Goal | ||
| 4 | + | ||
| 5 | +Rewrite the Python CDP engine from scratch, based on the xiaohongshu-mcp Go source, to build a complete set of Xiaohongshu automation Skills for the OpenClaw ecosystem. | ||
| 6 | + | ||
| 7 | +## Reference material | ||
| 8 | + | ||
| 9 | +- **xiaohongshu-mcp Go source**: `/Users/zy/src/zy/xiaohongshu-mcp/` (10k stars, 13 MCP tools) | ||
| 10 | +- **xiaohongshu-mcp data structures**: `/Users/zy/src/zy/xiaohongshu-mcp/xiaohongshu/types.go` | ||
| 11 | +- **xiaohongshu-mcp tool definitions**: `/Users/zy/src/zy/xiaohongshu-mcp/mcp_server.go` | ||
| 12 | + | ||
| 13 | +## Architecture | ||
| 14 | + | ||
| 15 | +### Module structure | ||
| 16 | + | ||
| 17 | +``` | ||
| 18 | +scripts/ | ||
| 19 | +├── xhs/ # Core XHS automation package | ||
| 20 | +│ ├── cdp.py # CDP WebSocket client | ||
| 21 | +│ ├── stealth.py # Anti-detection JS injection + Chrome launch flags | ||
| 22 | +│ ├── cookies.py # Cookie file persistence | ||
| 23 | +│ ├── types.py # Data types (dataclass) | ||
| 24 | +│ ├── errors.py # Exception hierarchy | ||
| 25 | +│ ├── selectors.py # CSS selector constants | ||
| 26 | +│ ├── urls.py # URL constants | ||
| 27 | +│ ├── human.py # Human behavior simulation | ||
| 28 | +│ ├── login.py # Login | ||
| 29 | +│ ├── feeds.py # Home feed | ||
| 30 | +│ ├── search.py # Search + filters | ||
| 31 | +│ ├── feed_detail.py # Note detail + comment loading | ||
| 32 | +│ ├── user_profile.py # User profile page | ||
| 33 | +│ ├── comment.py # Comments, replies | ||
| 34 | +│ ├── like_favorite.py # Likes, favorites | ||
| 35 | +│ ├── publish.py # Image-and-text publishing | ||
| 36 | +│ └── publish_video.py # Video publishing | ||
| 37 | +├── cli.py # Unified CLI entry point (13 subcommands) | ||
| 38 | +├── chrome_launcher.py # Chrome process management | ||
| 39 | +├── account_manager.py # Multi-account management | ||
| 40 | +├── image_downloader.py # Media download (SHA256 cache) | ||
| 41 | +├── title_utils.py # UTF-16 title length calculation | ||
| 42 | +├── run_lock.py # Single-instance lock | ||
| 43 | +└── publish_pipeline.py # Publish orchestrator | ||
| 44 | +``` | ||
| 45 | + | ||
| 46 | +### CLI interface (maps to the 13 MCP tools of the Go project) | ||
| 47 | + | ||
| 48 | +```bash | ||
| 49 | +python scripts/cli.py check-login | ||
| 50 | +python scripts/cli.py login | ||
| 51 | +python scripts/cli.py delete-cookies | ||
| 52 | +python scripts/cli.py list-feeds | ||
| 53 | +python scripts/cli.py search-feeds --keyword "keyword" [--sort-by --note-type ...] | ||
| 54 | +python scripts/cli.py get-feed-detail --feed-id ID --xsec-token TOKEN [--load-all-comments] | ||
| 55 | +python scripts/cli.py user-profile --user-id ID --xsec-token TOKEN | ||
| 56 | +python scripts/cli.py post-comment --feed-id ID --xsec-token TOKEN --content "comment text" | ||
| 57 | +python scripts/cli.py reply-comment --feed-id ID --xsec-token TOKEN --content "reply text" [--comment-id | --user-id] | ||
| 58 | +python scripts/cli.py like-feed --feed-id ID --xsec-token TOKEN [--unlike] | ||
| 59 | +python scripts/cli.py favorite-feed --feed-id ID --xsec-token TOKEN [--unfavorite] | ||
| 60 | +python scripts/cli.py publish --title-file T --content-file C --images P1 P2 [--tags --schedule-at --visibility] | ||
| 61 | +python scripts/cli.py publish-video --title-file T --content-file C --video P [--tags --schedule-at] | ||
| 62 | +``` | ||
| 63 | + | ||
| 64 | +Global options: `--host`, `--port`, `--account` | ||
| 65 | +Output: JSON (`ensure_ascii=False`) | ||
| 66 | +Exit codes: 0 = success, 1 = not logged in, 2 = error | ||
| 67 | + | ||
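| 68 | +## Code standard requirements | ||
| 69 | + | ||
| 70 | +- Python code must pass `ruff check` and `ruff format` | ||
| 71 | +- Full type hints (PEP 484), using `str | None` rather than `Optional[str]` | ||
| 72 | +- Public functions and classes must have docstrings | ||
| 73 | +- Maximum line length of 100 characters | ||
| 74 | +- Use `from __future__ import annotations` to enable postponed evaluation of annotations | ||
| 75 | +- All exception classes inherit from `XHSError` | ||
| 76 | +- The CLI uses argparse; exit codes: 0 = success, 1 = not logged in, 2 = error | ||
| 77 | +- JSON output uses `ensure_ascii=False` to preserve Chinese text | ||
| 78 | + | ||
| 79 | +## Completion criteria | ||
| 80 | + | ||
| 81 | +Output the completion marker once all of the following conditions are met: | ||
| 82 | +1. All 17 modules of the `xhs/` package have been created | ||
| 83 | +2. The 13 subcommands of `cli.py` are implemented | ||
| 84 | +3. The 5 supporting scripts have been rewritten | ||
| 85 | +4. The 5 `skills/*/SKILL.md` files have been updated | ||
| 86 | +5. The root-level `SKILL.md`, `CLAUDE.md`, and `README.md` have been updated | ||
| 87 | +6. `uv run ruff check .` reports no errors | ||
| 88 | +7. `uv run ruff format --check .` reports no differences | ||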
| 89 | + | ||
| 90 | +<promise>ALL SKILLS COMPLETE</promise> |
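The code standards in PROMPT.md combine several conventions: an `XHSError` base class, an argparse CLI with global options, JSON output with `ensure_ascii=False`, and exit codes 0/1/2. The real `scripts/cli.py` is truncated in this diff, so the sketch below only illustrates how those conventions could fit together. `NotLoggedInError`, `build_parser`, `run`, and the handler mapping are hypothetical names, only two of the 13 subcommands are registered, and the host and port defaults are the ones stated in the README.

```python
from __future__ import annotations

import argparse
import json
from collections.abc import Callable

EXIT_OK, EXIT_NOT_LOGGED_IN, EXIT_ERROR = 0, 1, 2


class XHSError(Exception):
    """Base class for project errors (the name comes from the code standards)."""


class NotLoggedInError(XHSError):
    """Hypothetical subclass used here to select exit code 1."""


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the global options and two sample subcommands."""
    parser = argparse.ArgumentParser(prog="cli.py")
    parser.add_argument("--host", default="127.0.0.1")
    parser.add_argument("--port", type=int, default=9222)
    parser.add_argument("--account", default="")
    sub = parser.add_subparsers(dest="command", required=True)
    sub.add_parser("check-login")
    search = sub.add_parser("search-feeds")
    search.add_argument("--keyword", required=True)
    return parser


Handler = Callable[[argparse.Namespace], dict]


def run(argv: list[str], handlers: dict[str, Handler]) -> tuple[int, str]:
    """Dispatch one subcommand and return (exit code, JSON text)."""
    args = build_parser().parse_args(argv)
    try:
        payload = handlers[args.command](args)
        code = EXIT_OK
    except NotLoggedInError as exc:
        payload, code = {"error": str(exc)}, EXIT_NOT_LOGGED_IN
    except XHSError as exc:
        payload, code = {"error": str(exc)}, EXIT_ERROR
    # ensure_ascii=False keeps Chinese text readable instead of \uXXXX escapes.
    return code, json.dumps(payload, ensure_ascii=False)


if __name__ == "__main__":
    demo = {"search-feeds": lambda a: {"keyword": a.keyword, "feeds": []}}
    print(run(["search-feeds", "--keyword", "咖啡"], demo))
```

Returning the exit code and JSON text from `run` instead of calling `sys.exit` directly keeps the dispatch logic testable; a real entry point would print the text and exit with the code.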
| 1 | # xiaohongshu-skills | 1 | # xiaohongshu-skills |
| 2 | -xiaohongshu-skills | 2 | + |
| 3 | +Xiaohongshu automation Skills for Claude Code, built on a Python CDP browser automation engine. | ||
| 4 | + | ||
| 5 | +Provides Xiaohongshu operation capabilities for the OpenClaw ecosystem and is also compatible with the Claude Code Skills format. | ||
| 6 | + | ||
| 7 | +## Feature overview | ||
| 8 | + | ||
| 9 | +| Skill | Description | Core commands | | ||
| 10 | +|------|------|----------| | ||
| 11 | +| **xhs-auth** | Authentication management | `check-login`, `login`, `delete-cookies` | | ||
| 12 | +| **xhs-publish** | Content publishing | `publish`, `publish-video` | | ||
| 13 | +| **xhs-explore** | Content discovery | `list-feeds`, `search-feeds`, `get-feed-detail`, `user-profile` | | ||
| 14 | +| **xhs-interact** | Social interaction | `post-comment`, `reply-comment`, `like-feed`, `favorite-feed` | | ||
| 15 | +| **xhs-content-ops** | Composite operations | Competitor analysis, trend tracking, content creation, engagement management | | ||
| 16 | + | ||
| 17 | +## Installation | ||
| 18 | + | ||
| 19 | +```bash | ||
| 20 | +# Clone the project | ||
| 21 | +git clone https://github.com/autoclaw-cc/xiaohongshu-skills.git | ||
| 22 | +cd xiaohongshu-skills | ||
| 23 | + | ||
| 24 | +# Install dependencies (requires uv) | ||
| 25 | +uv sync | ||
| 26 | +``` | ||
| 27 | + | ||
| 28 | +### Prerequisites | ||
| 29 | + | ||
| 30 | +- Python >= 3.11 | ||
| 31 | +- [uv](https://docs.astral.sh/uv/) package manager | ||
| 32 | +- Google Chrome browser | ||
| 33 | + | ||
| 34 | +## Quick start | ||
| 35 | + | ||
| 36 | +### 1. Launch Chrome | ||
| 37 | + | ||
| 38 | +```bash | ||
| 39 | +# Windowed mode (recommended for the first login) | ||
| 40 | +python scripts/chrome_launcher.py | ||
| 41 | + | ||
| 42 | +# Headless mode | ||
| 43 | +python scripts/chrome_launcher.py --headless | ||
| 44 | +``` | ||
| 45 | + | ||
| 46 | +### 2. Log in to Xiaohongshu | ||
| 47 | + | ||
| 48 | +```bash | ||
| 49 | +# Check login status | ||
| 50 | +python scripts/cli.py check-login | ||
| 51 | + | ||
| 52 | +# Log in (scan the QR code) | ||
| 53 | +python scripts/cli.py login | ||
| 54 | +``` | ||
| 55 | + | ||
| 56 | +### 3. Search notes | ||
| 57 | + | ||
| 58 | +```bash | ||
| 59 | +python scripts/cli.py search-feeds --keyword "keyword" | ||
| 60 | + | ||
| 61 | +# With filters | ||
| 62 | +python scripts/cli.py search-feeds \ | ||
| 63 | + --keyword "keyword" --sort-by 最新 --note-type 图文 | ||
| 64 | +``` | ||
| 65 | + | ||
| 66 | +### 4. View note details | ||
| 67 | + | ||
| 68 | +```bash | ||
| 69 | +python scripts/cli.py get-feed-detail \ | ||
| 70 | + --feed-id FEED_ID --xsec-token XSEC_TOKEN | ||
| 71 | +``` | ||
| 72 | + | ||
| 73 | +### 5. Publish content | ||
| 74 | + | ||
| 75 | +```bash | ||
| 76 | +# Image-and-text post | ||
| 77 | +python scripts/cli.py publish \ | ||
| 78 | + --title-file title.txt \ | ||
| 79 | + --content-file content.txt \ | ||
| 80 | + --images "/abs/path/pic1.jpg" "/abs/path/pic2.jpg" | ||
| 81 | + | ||
| 82 | +# Video post | ||
| 83 | +python scripts/cli.py publish-video \ | ||
| 84 | + --title-file title.txt \ | ||
| 85 | + --content-file content.txt \ | ||
| 86 | + --video "/abs/path/video.mp4" | ||
| 87 | +``` | ||
| 88 | + | ||
| 89 | +### 6. Social interaction | ||
| 90 | + | ||
| 91 | +```bash | ||
| 92 | +# Post a comment | ||
| 93 | +python scripts/cli.py post-comment \ | ||
| 94 | + --feed-id FEED_ID \ | ||
| 95 | + --xsec-token XSEC_TOKEN \ | ||
| 96 | + --content "comment text" | ||
| 97 | + | ||
| 98 | +# Like | ||
| 99 | +python scripts/cli.py like-feed \ | ||
| 100 | + --feed-id FEED_ID --xsec-token XSEC_TOKEN | ||
| 101 | + | ||
| 102 | +# Favorite | ||
| 103 | +python scripts/cli.py favorite-feed \ | ||
| 104 | + --feed-id FEED_ID --xsec-token XSEC_TOKEN | ||
| 105 | +``` | ||
| 106 | + | ||
| 107 | +## CLI command reference | ||
| 108 | + | ||
| 109 | +All commands are invoked through the single `scripts/cli.py` entry point and produce JSON output. | ||
| 110 | + | ||
| 111 | +Global options: | ||
| 112 | +- `--host HOST`: Chrome debugging host (default 127.0.0.1) | ||
| 113 | +- `--port PORT`: Chrome debugging port (default 9222) | ||
| 114 | +- `--account NAME`: account to use | ||
| 115 | + | ||
| 116 | +| Subcommand | Description | | ||
| 117 | +|--------|------| | ||
| 118 | +| `check-login` | Check login status | | ||
| 119 | +| `login` | Get the login QR code and wait for it to be scanned | | ||
| 120 | +| `delete-cookies` | Clear cookies | | ||
| 121 | +| `list-feeds` | Get the recommended home feed | | ||
| 122 | +| `search-feeds` | Search notes by keyword | | ||
| 123 | +| `get-feed-detail` | Get note details and comments | | ||
| 124 | +| `user-profile` | Get user profile information | | ||
| 125 | +| `post-comment` | Post a comment on a note | | ||
| 126 | +| `reply-comment` | Reply to a specific comment | | ||
| 127 | +| `like-feed` | Like / unlike | | ||
| 128 | +| `favorite-feed` | Favorite / unfavorite | | ||
| 129 | +| `publish` | Publish image-and-text content | | ||
| 130 | +| `publish-video` | Publish video content | | ||
| 131 | + | ||
| 132 | +Exit codes: 0 = success, 1 = not logged in, 2 = error | ||
| 133 | + | ||
| 134 | +## Project structure | ||
| 135 | + | ||
| 136 | +``` | ||
| 137 | +xiaohongshu-skills/ | ||
| 138 | +├── scripts/ # Python CDP automation engine | ||
| 139 | +│ ├── xhs/ # Core automation package (modular) | ||
| 140 | +│ │ ├── cdp.py # CDP WebSocket client | ||
| 141 | +│ │ ├── stealth.py # Anti-detection protection | ||
| 142 | +│ │ ├── cookies.py # Cookie persistence | ||
| 143 | +│ │ ├── types.py # Data types | ||
| 144 | +│ │ ├── errors.py # Exception hierarchy | ||
| 145 | +│ │ ├── selectors.py # CSS selectors | ||
| 146 | +│ │ ├── urls.py # URL constants | ||
| 147 | +│ │ ├── human.py # Human behavior simulation | ||
| 148 | +│ │ ├── login.py # Login | ||
| 149 | +│ │ ├── feeds.py # Home feed | ||
| 150 | +│ │ ├── search.py # Search | ||
| 151 | +│ │ ├── feed_detail.py # Note detail | ||
| 152 | +│ │ ├── user_profile.py # User profile page | ||
| 153 | +│ │ ├── comment.py # Comments | ||
| 154 | +│ │ ├── like_favorite.py # Likes/favorites | ||
| 155 | +│ │ ├── publish.py # Image-and-text publishing | ||
| 156 | +│ │ └── publish_video.py # Video publishing | ||
| 157 | +│ ├── cli.py # Unified CLI (13 subcommands) | ||
| 158 | +│ ├── chrome_launcher.py # Chrome process management | ||
| 159 | +│ ├── account_manager.py # Multi-account management | ||
| 160 | +│ ├── image_downloader.py # Media download | ||
| 161 | +│ ├── title_utils.py # Title length calculation | ||
| 162 | +│ ├── run_lock.py # Single-instance lock | ||
| 163 | +│ └── publish_pipeline.py # Publish orchestrator | ||
| 164 | +├── skills/ # Claude Code Skills definitions | ||
| 165 | +│ ├── xhs-auth/SKILL.md # Authentication management | ||
| 166 | +│ ├── xhs-publish/SKILL.md # Content publishing | ||
| 167 | +│ ├── xhs-explore/SKILL.md # Content discovery | ||
| 168 | +│ ├── xhs-interact/SKILL.md # Social interaction | ||
| 169 | +│ └── xhs-content-ops/SKILL.md # Composite operations | ||
| 170 | +├── SKILL.md # Unified entry point | ||
| 171 | +├── CLAUDE.md # Project development guide | ||
| 172 | +├── pyproject.toml # uv project configuration | ||
| 173 | +└── README.md | ||
| 174 | +``` | ||
| 175 | + | ||
| 176 | +## Technical architecture | ||
| 177 | + | ||
| 178 | +### Two-layer structure | ||
| 179 | + | ||
| 180 | +1. **scripts/: Python CDP engine** | ||
| 181 | + - Rewritten from scratch based on the xiaohongshu-mcp Go source | ||
| 182 | + - Controls the browser directly through the Chrome DevTools Protocol (CDP) | ||
| 183 | + - Extracts data using the `window.__INITIAL_STATE__` pattern | ||
| 184 | + - Built-in anti-detection protection (stealth flags + JS injection) | ||
| 185 | + - Structured JSON output | ||
| 186 | + | ||
| 187 | +2. **skills/: Claude Code Skills definitions** | ||
| 188 | + - SKILL.md format, instructing the AI agent how to call scripts/ | ||
| 189 | + - Covers input classification, constraints, workflows, and failure handling | ||
| 190 | + | ||
| 191 | +## Development | ||
| 192 | + | ||
| 193 | +```bash | ||
| 194 | +uv sync # Install dependencies | ||
| 195 | +uv run ruff check . # Lint | ||
| 196 | +uv run ruff format . # Format code | ||
| 197 | +uv run pytest # Run tests | ||
| 198 | +``` |
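The README states that the engine drives Chrome through CDP and reads page data from `window.__INITIAL_STATE__`. `xhs/cdp.py` is not shown here, so the following is only a sketch of what such a request and its reply look like on the wire, using the standard CDP `Runtime.evaluate` method. The helper names `build_evaluate_message` and `extract_value` are hypothetical, and the sample expression only probes the type of the state object instead of reading its contents.

```python
from __future__ import annotations

import itertools
import json
from typing import Any

# Every CDP command carries a client-chosen id that the reply echoes back.
_message_ids = itertools.count(1)


def build_evaluate_message(expression: str) -> str:
    """Serialize a Runtime.evaluate command for the page's WebSocket."""
    return json.dumps(
        {
            "id": next(_message_ids),
            "method": "Runtime.evaluate",
            # returnByValue asks Chrome to send the result as JSON rather than
            # as a remote object handle.
            "params": {"expression": expression, "returnByValue": True},
        }
    )


def extract_value(reply: str) -> Any:
    """Pull the evaluated value out of a Runtime.evaluate reply."""
    data = json.loads(reply)
    if "error" in data:
        raise RuntimeError(f"CDP error: {data['error']}")
    result = data["result"]
    if "exceptionDetails" in result:
        raise RuntimeError(f"JS exception: {result['exceptionDetails']}")
    return result["result"].get("value")


if __name__ == "__main__":
    print(build_evaluate_message("typeof window.__INITIAL_STATE__"))
```

In a real client the message would be sent over the page's `webSocketDebuggerUrl`, and the reply whose `id` matches would be passed to `extract_value`.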
SKILL.md
0 → 100644
| 1 | +--- | ||
| 2 | +name: xiaohongshu-skills | ||
| 3 | +description: | | ||
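| 4 | +  Xiaohongshu automation skill collection. Supports authentication and login, content publishing, search and discovery, social interaction, and composite operations. | ||
| 5 | +  Triggered when the user asks to operate on Xiaohongshu (publish, search, comment, log in, analyze, like, favorite). | ||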
| 6 | +--- | ||
| 7 | + | ||
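| 8 | +# Xiaohongshu automation Skills | ||
| 9 | + | ||
| 10 | +You are the "Xiaohongshu automation assistant". Route each request to the matching sub-skill based on the user's intent. | ||
| 11 | + | ||
| 12 | +## Input classification | ||
| 13 | + | ||
| 14 | +Determine the user's intent in priority order and route to the matching sub-skill: | ||
| 15 | + | ||
| 16 | +1. **Authentication** ("log in / check login / switch account") → run the `xhs-auth` skill. | ||
| 17 | +2. **Content publishing** ("publish / post / upload images and text / upload video") → run the `xhs-publish` skill. | ||
| 18 | +3. **Search and discovery** ("search notes / view details / browse the home feed / view a user") → run the `xhs-explore` skill. | ||
| 19 | +4. **Social interaction** ("comment / reply / like / favorite") → run the `xhs-interact` skill. | ||
| 20 | +5. **Composite operations** ("competitor analysis / trend tracking / batch interaction / one-click creation") → run the `xhs-content-ops` skill. | ||
| 21 | + | ||
| 22 | +## Global constraints | ||
| 23 | + | ||
| 24 | +- Confirm the login state before any operation (via `check-login`). | ||
| 25 | +- Publish and comment operations must be confirmed by the user before they run. | ||
| 26 | +- File paths must be absolute. | ||
| 27 | +- CLI output is JSON; present it to the user in structured form. | ||
| 28 | +- Do not operate too frequently; keep reasonable intervals between actions. | ||
| 29 | + | ||
| 30 | +## Sub-skill overview | ||
| 31 | + | ||
| 32 | +### xhs-auth: authentication management | ||
| 33 | + | ||
| 34 | +Manages the Xiaohongshu login state and multi-account switching. | ||
| 35 | + | ||
| 36 | +| Command | Purpose | | ||
| 37 | +|------|------| | ||
| 38 | +| `cli.py check-login` | Check login status | | ||
| 39 | +| `cli.py login` | Get the login QR code and wait for it to be scanned | | ||
| 40 | +| `cli.py delete-cookies` | Clear cookies (log out / switch account) | | ||
| 41 | + | ||
| 42 | +### xhs-publish: content publishing | ||
| 43 | + | ||
| 44 | +Publishes image-and-text or video content to Xiaohongshu. | ||
| 45 | + | ||
| 46 | +| Command | Purpose | | ||
| 47 | +|------|------| | ||
| 48 | +| `cli.py publish` | Image-and-text publishing (local images or URLs) | | ||
| 49 | +| `cli.py publish-video` | Video publishing | | ||
| 50 | +| `publish_pipeline.py` | Publish pipeline (includes image download and login check) | | ||
| 51 | + | ||
| 52 | +### xhs-explore: content discovery | ||
| 53 | + | ||
| 54 | +Searches notes, views details, and fetches user profiles. | ||
| 55 | + | ||
| 56 | +| Command | Purpose | | ||
| 57 | +|------|------| | ||
| 58 | +| `cli.py list-feeds` | Get the recommended home feed | | ||
| 59 | +| `cli.py search-feeds` | Search notes by keyword | | ||
| 60 | +| `cli.py get-feed-detail` | Get the full content and comments of a note | | ||
| 61 | +| `cli.py user-profile` | Get user profile information | | ||
| 62 | + | ||
| 63 | +### xhs-interact: social interaction | ||
| 64 | + | ||
| 65 | +Posts comments and replies, likes, and favorites. | ||
| 66 | + | ||
| 67 | +| Command | Purpose | | ||
| 68 | +|------|------| | ||
| 69 | +| `cli.py post-comment` | Post a comment on a note | | ||
| 70 | +| `cli.py reply-comment` | Reply to a specific comment | | ||
| 71 | +| `cli.py like-feed` | Like / unlike | | ||
| 72 | +| `cli.py favorite-feed` | Favorite / unfavorite | | ||
| 73 | + | ||
| 74 | +### xhs-content-ops: composite operations | ||
| 75 | + | ||
| 76 | +Combines multiple steps into operations workflows: competitor analysis, trend tracking, content creation, engagement management. | ||
| 77 | + | ||
| 78 | +## Quick start | ||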
| 79 | + | ||
| 80 | +```bash | ||
| 81 | +# 1. Launch Chrome | ||
| 82 | +python scripts/chrome_launcher.py | ||
| 83 | + | ||
| 84 | +# 2. Check login status | ||
| 85 | +python scripts/cli.py check-login | ||
| 86 | + | ||
| 87 | +# 3. Log in (if needed) | ||
| 88 | +python scripts/cli.py login | ||
| 89 | + | ||
| 90 | +# 4. Search notes | ||
| 91 | +python scripts/cli.py search-feeds --keyword "keyword" | ||
| 92 | + | ||
| 93 | +# 5. View note details | ||
| 94 | +python scripts/cli.py get-feed-detail \ | ||
| 95 | + --feed-id FEED_ID --xsec-token XSEC_TOKEN | ||
| 96 | + | ||
| 97 | +# 6. Publish an image-and-text post | ||
| 98 | +python scripts/cli.py publish \ | ||
| 99 | + --title-file title.txt \ | ||
| 100 | + --content-file content.txt \ | ||
| 101 | + --images "/abs/path/pic1.jpg" | ||
| 102 | + | ||
| 103 | +# 7. Post a comment | ||
| 104 | +python scripts/cli.py post-comment \ | ||
| 105 | + --feed-id FEED_ID \ | ||
| 106 | + --xsec-token XSEC_TOKEN \ | ||
| 107 | + --content "comment text" | ||
| 108 | + | ||
| 109 | +# 8. Like | ||
| 110 | +python scripts/cli.py like-feed \ | ||
| 111 | + --feed-id FEED_ID --xsec-token XSEC_TOKEN | ||
| 112 | +``` | ||
| 113 | + | ||
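| 114 | +## Failure handling | ||
| 115 | + | ||
| 116 | +- **Not logged in**: prompt the user to go through the login flow (xhs-auth). | ||
| 117 | +- **Chrome not running**: start the browser with `chrome_launcher.py`. | ||
| 118 | +- **Operation timed out**: check the network connection and allow a longer wait. | ||
| 119 | +- **Rate limited**: lower the operation frequency and increase the interval. | ||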
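The failure cases above can be told apart programmatically because the CLI documents JSON output and three exit codes (0 = success, 1 = not logged in, 2 = error). The sketch below shows one way a caller might classify a finished CLI run; `CliResult` and `interpret` are hypothetical names, and the shape of the JSON payload is not specified in this diff, so it is passed through unchanged.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from typing import Any


@dataclass
class CliResult:
    """Outcome of one cli.py invocation (hypothetical container)."""

    status: str  # "ok", "not_logged_in", or "error"
    data: Any  # parsed JSON, raw text if parsing failed, or None if empty


def interpret(returncode: int, stdout: str) -> CliResult:
    """Map the documented exit codes onto a status and parse the output."""
    # Any code other than 0 or 1 is treated as a generic error.
    status = {0: "ok", 1: "not_logged_in"}.get(returncode, "error")
    text = stdout.strip()
    if not text:
        return CliResult(status, None)
    try:
        return CliResult(status, json.loads(text))
    except json.JSONDecodeError:
        # Keep the raw text so the caller can still show it to the user.
        return CliResult(status, text)
```

A caller would typically feed this from `subprocess.run([...], capture_output=True, text=True)`, passing `proc.returncode` and `proc.stdout`, and route a `"not_logged_in"` status to the xhs-auth flow.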
pyproject.toml
0 → 100644
| 1 | +[project] | ||
| 2 | +name = "xiaohongshu-skills" | ||
| 3 | +version = "0.1.0" | ||
| 4 | +description = "Xiaohongshu automation Skills, based on CDP browser automation" | ||
| 5 | +readme = "README.md" | ||
| 6 | +license = { text = "MIT" } | ||
| 7 | +requires-python = ">=3.11" | ||
| 8 | +dependencies = [ | ||
| 9 | + "requests>=2.28.0", | ||
| 10 | + "websockets>=12.0", | ||
| 11 | +] | ||
| 12 | + | ||
| 13 | +[project.optional-dependencies] | ||
| 14 | +dev = [ | ||
| 15 | + "ruff>=0.9.0", | ||
| 16 | + "pytest>=8.0", | ||
| 17 | +] | ||
| 18 | + | ||
| 19 | +[tool.ruff] | ||
| 20 | +target-version = "py311" | ||
| 21 | +line-length = 100 | ||
| 22 | + | ||
| 23 | +[tool.ruff.lint] | ||
| 24 | +select = [ | ||
| 25 | + "E", # pycodestyle errors | ||
| 26 | + "W", # pycodestyle warnings | ||
| 27 | + "F", # pyflakes | ||
| 28 | + "I", # isort | ||
| 29 | + "N", # pep8-naming | ||
| 30 | + "UP", # pyupgrade | ||
| 31 | + "B", # flake8-bugbear | ||
| 32 | + "SIM", # flake8-simplify | ||
| 33 | + "RUF", # ruff-specific rules | ||
| 34 | +] | ||
| 35 | +ignore = [ | ||
| 36 | + "E402", # module-level imports not at top (needed for sys.path manipulation) | ||
| 37 | + "RUF001", # ambiguous unicode characters (Chinese punctuation is intentional) | ||
| 38 | + "RUF002", # ambiguous unicode in docstrings (Chinese punctuation is intentional) | ||
| 39 | + "RUF003", # ambiguous unicode in comments (Chinese punctuation is intentional) | ||
| 40 | +] | ||
| 41 | + | ||
| 42 | +[tool.ruff.lint.per-file-ignores] | ||
| 43 | + | ||
| 44 | +[tool.ruff.lint.isort] | ||
| 45 | +known-first-party = ["xiaohongshu_skills"] | ||
| 46 | + | ||
| 47 | +[tool.pytest.ini_options] | ||
| 48 | +testpaths = ["tests"] |
scripts/account_manager.py
0 → 100644
| 1 | +"""Multi-account management with a standalone account configuration.""" | ||
| 2 | + | ||
| 3 | +from __future__ import annotations | ||
| 4 | + | ||
| 5 | +import json | ||
| 6 | +import logging | ||
| 7 | +import os | ||
| 8 | +from pathlib import Path | ||
| 9 | + | ||
| 10 | +logger = logging.getLogger(__name__) | ||
| 11 | + | ||
| 12 | +# Path of the account configuration file | ||
| 13 | +_CONFIG_DIR = Path.home() / ".xhs" | ||
| 14 | +_ACCOUNTS_FILE = _CONFIG_DIR / "accounts.json" | ||
| 15 | + | ||
| 16 | + | ||
| 17 | +def _load_config() -> dict: | ||
| 18 | +    """Load the account configuration.""" | ||
| 19 | +    if not _ACCOUNTS_FILE.exists(): | ||
| 20 | +        return {"default": "", "accounts": {}} | ||
| 21 | +    with open(_ACCOUNTS_FILE, encoding="utf-8") as f: | ||
| 22 | +        return json.load(f) | ||
| 23 | + | ||
| 24 | + | ||
| 25 | +def _save_config(config: dict) -> None: | ||
| 26 | +    """Save the account configuration.""" | ||
| 27 | +    _CONFIG_DIR.mkdir(parents=True, exist_ok=True) | ||
| 28 | +    with open(_ACCOUNTS_FILE, "w", encoding="utf-8") as f: | ||
| 29 | +        json.dump(config, f, ensure_ascii=False, indent=2) | ||
| 30 | + | ||
| 31 | + | ||
| 32 | +def list_accounts() -> list[dict]: | ||
| 33 | +    """List all accounts.""" | ||
| 34 | +    config = _load_config() | ||
| 35 | +    default = config.get("default", "") | ||
| 36 | +    accounts = config.get("accounts", {}) | ||
| 37 | +    result = [] | ||
| 38 | +    for name, info in accounts.items(): | ||
| 39 | +        result.append( | ||
| 40 | +            { | ||
| 41 | +                "name": name, | ||
| 42 | +                "description": info.get("description", ""), | ||
| 43 | +                "is_default": name == default, | ||
| 44 | +                "profile_dir": _get_profile_dir(name), | ||
| 45 | +            } | ||
| 46 | +        ) | ||
| 47 | +    return result | ||
| 48 | + | ||
| 49 | + | ||
| 50 | +def add_account(name: str, description: str = "") -> None: | ||
| 51 | +    """Add an account.""" | ||
| 52 | +    config = _load_config() | ||
| 53 | +    accounts = config.setdefault("accounts", {}) | ||
| 54 | +    if name in accounts: | ||
| 55 | +        raise ValueError(f"账号 '{name}' 已存在") | ||
| 56 | + | ||
| 57 | +    accounts[name] = {"description": description} | ||
| 58 | + | ||
| 59 | +    # Make the first account the default | ||
| 60 | +    if not config.get("default"): | ||
| 61 | +        config["default"] = name | ||
| 62 | + | ||
| 63 | +    _save_config(config) | ||
| 64 | + | ||
| 65 | +    # Create the profile directory | ||
| 66 | +    profile_dir = _get_profile_dir(name) | ||
| 67 | +    os.makedirs(profile_dir, exist_ok=True) | ||
| 68 | + | ||
| 69 | +    logger.info("添加账号: %s", name) | ||
| 70 | + | ||
| 71 | + | ||
| 72 | +def remove_account(name: str) -> None: | ||
| 73 | +    """Remove an account.""" | ||
| 74 | +    config = _load_config() | ||
| 75 | +    accounts = config.get("accounts", {}) | ||
| 76 | +    if name not in accounts: | ||
| 77 | +        raise ValueError(f"账号 '{name}' 不存在") | ||
| 78 | + | ||
| 79 | +    del accounts[name] | ||
| 80 | + | ||
| 81 | +    # If the default account was removed, fall back to the next remaining one (or none) | ||
| 82 | +    if config.get("default") == name: | ||
| 83 | +        config["default"] = next(iter(accounts), "") | ||
| 84 | + | ||
| 85 | +    _save_config(config) | ||
| 86 | +    logger.info("删除账号: %s", name) | ||
| 87 | + | ||
| 88 | + | ||
| 89 | +def set_default_account(name: str) -> None: | ||
| 90 | +    """Set the default account.""" | ||
| 91 | +    config = _load_config() | ||
| 92 | +    accounts = config.get("accounts", {}) | ||
| 93 | +    if name not in accounts: | ||
| 94 | +        raise ValueError(f"账号 '{name}' 不存在") | ||
| 95 | + | ||
| 96 | +    config["default"] = name | ||
| 97 | +    _save_config(config) | ||
| 98 | +    logger.info("默认账号设置为: %s", name) | ||
| 99 | + | ||
| 100 | + | ||
| 101 | +def get_default_account() -> str: | ||
| 102 | +    """Return the name of the default account.""" | ||
| 103 | +    config = _load_config() | ||
| 104 | +    return config.get("default", "") | ||
| 105 | + | ||
| 106 | + | ||
| 107 | +def _get_profile_dir(account: str) -> str: | ||
| 108 | +    """Return the Chrome profile directory of an account.""" | ||
| 109 | +    return str(_CONFIG_DIR / "accounts" / account / "chrome-profile") | ||
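One behavior of `remove_account` is easy to miss: when the default account is removed, the next remaining account is promoted to default instead of the default simply being cleared. The sketch below replays that rule on an in-memory dictionary with the same shape as the configuration (`{"default": ..., "accounts": {...}}`), so it can be tried without touching `~/.xhs/accounts.json`. The function name `remove_from_config` is hypothetical and is not part of the module.

```python
from __future__ import annotations


def remove_from_config(config: dict, name: str) -> dict:
    """Return a new config with ``name`` removed, mirroring remove_account."""
    accounts = dict(config.get("accounts", {}))
    if name not in accounts:
        raise ValueError(f"account {name!r} does not exist")
    del accounts[name]

    default = config.get("default", "")
    if default == name:
        # Dicts keep insertion order, so this promotes the oldest remaining
        # account, or falls back to "" when none is left.
        default = next(iter(accounts), "")
    return {"default": default, "accounts": accounts}


if __name__ == "__main__":
    cfg = {"default": "main", "accounts": {"main": {}, "backup": {}}}
    print(remove_from_config(cfg, "main"))
```

Because the promoted account depends on insertion order, callers that care which account becomes the default may prefer to call `set_default_account` explicitly after a removal.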
scripts/chrome_launcher.py
0 → 100644
| 1 | +"""Chrome 进程管理(跨平台),对应 Go browser/browser.go 的进程管理部分。""" | ||
| 2 | + | ||
| 3 | +from __future__ import annotations | ||
| 4 | + | ||
| 5 | +import logging | ||
| 6 | +import os | ||
| 7 | +import platform | ||
| 8 | +import shutil | ||
| 9 | +import signal | ||
| 10 | +import subprocess | ||
| 11 | +import time | ||
| 12 | + | ||
| 13 | +from xhs.stealth import STEALTH_ARGS | ||
| 14 | + | ||
| 15 | +logger = logging.getLogger(__name__) | ||
| 16 | + | ||
| 17 | +# 默认远程调试端口 | ||
| 18 | +DEFAULT_PORT = 9222 | ||
| 19 | + | ||
| 20 | +# 各平台 Chrome 默认路径 | ||
| 21 | +_CHROME_PATHS: dict[str, list[str]] = { | ||
| 22 | + "Darwin": [ | ||
| 23 | + "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome", | ||
| 24 | + "/Applications/Chromium.app/Contents/MacOS/Chromium", | ||
| 25 | + ], | ||
| 26 | + "Linux": [ | ||
| 27 | + "/usr/bin/google-chrome", | ||
| 28 | + "/usr/bin/google-chrome-stable", | ||
| 29 | + "/usr/bin/chromium", | ||
| 30 | + "/usr/bin/chromium-browser", | ||
| 31 | + "/snap/bin/chromium", | ||
| 32 | + ], | ||
| 33 | + "Windows": [ | ||
| 34 | + r"C:\Program Files\Google\Chrome\Application\chrome.exe", | ||
| 35 | + r"C:\Program Files (x86)\Google\Chrome\Application\chrome.exe", | ||
| 36 | + ], | ||
| 37 | +} | ||
| 38 | + | ||
| 39 | + | ||
| 40 | +def find_chrome() -> str | None: | ||
| 41 | + """查找 Chrome 可执行文件路径。""" | ||
| 42 | + # 环境变量优先 | ||
| 43 | + env_path = os.getenv("CHROME_BIN") | ||
| 44 | + if env_path and os.path.isfile(env_path): | ||
| 45 | + return env_path | ||
| 46 | + | ||
| 47 | + # which/where 查找 | ||
| 48 | + chrome = shutil.which("google-chrome") or shutil.which("chromium") | ||
| 49 | + if chrome: | ||
| 50 | + return chrome | ||
| 51 | + | ||
| 52 | + # 平台默认路径 | ||
| 53 | + system = platform.system() | ||
| 54 | + for path in _CHROME_PATHS.get(system, []): | ||
| 55 | + if os.path.isfile(path): | ||
| 56 | + return path | ||
| 57 | + | ||
| 58 | + return None | ||
| 59 | + | ||
| 60 | + | ||
| 61 | +def launch_chrome( | ||
| 62 | + port: int = DEFAULT_PORT, | ||
| 63 | + headless: bool = False, | ||
| 64 | + user_data_dir: str | None = None, | ||
| 65 | + chrome_bin: str | None = None, | ||
| 66 | +) -> subprocess.Popen: | ||
| 67 | + """启动 Chrome 进程(带远程调试端口)。 | ||
| 68 | + | ||
| 69 | + Args: | ||
| 70 | + port: 远程调试端口。 | ||
| 71 | + headless: 是否无头模式。 | ||
| 72 | + user_data_dir: 用户数据目录(Profile 隔离)。 | ||
| 73 | + chrome_bin: Chrome 可执行文件路径。 | ||
| 74 | + | ||
| 75 | + Returns: | ||
| 76 | + Chrome 子进程。 | ||
| 77 | + | ||
| 78 | + Raises: | ||
| 79 | + FileNotFoundError: 未找到 Chrome。 | ||
| 80 | + """ | ||
| 81 | + if not chrome_bin: | ||
| 82 | + chrome_bin = find_chrome() | ||
| 83 | + if not chrome_bin: | ||
| 84 | + raise FileNotFoundError("未找到 Chrome,请设置 CHROME_BIN 环境变量或安装 Chrome") | ||
| 85 | + | ||
| 86 | + args = [ | ||
| 87 | + chrome_bin, | ||
| 88 | + f"--remote-debugging-port={port}", | ||
| 89 | + *STEALTH_ARGS, | ||
| 90 | + ] | ||
| 91 | + | ||
| 92 | + if headless: | ||
| 93 | + args.append("--headless=new") | ||
| 94 | + | ||
| 95 | + if user_data_dir: | ||
| 96 | + args.append(f"--user-data-dir={user_data_dir}") | ||
| 97 | + | ||
| 98 | + # 代理 | ||
| 99 | + proxy = os.getenv("XHS_PROXY") | ||
| 100 | + if proxy: | ||
| 101 | + args.append(f"--proxy-server={proxy}") | ||
| 102 | + logger.info("使用代理: %s", _mask_proxy(proxy)) | ||
| 103 | + | ||
| 104 | + logger.info("启动 Chrome: port=%d, headless=%s", port, headless) | ||
| 105 | + process = subprocess.Popen( | ||
| 106 | + args, | ||
| 107 | + stdout=subprocess.DEVNULL, | ||
| 108 | + stderr=subprocess.DEVNULL, | ||
| 109 | + ) | ||
| 110 | + | ||
| 111 | + # 等待 Chrome 准备就绪 | ||
| 112 | + _wait_for_chrome(port) | ||
| 113 | + return process | ||
| 114 | + | ||
| 115 | + | ||
| 116 | +def close_chrome(process: subprocess.Popen) -> None: | ||
| 117 | + """关闭 Chrome 进程。""" | ||
| 118 | + if process.poll() is not None: | ||
| 119 | + return | ||
| 120 | + | ||
| 121 | + try: | ||
| 122 | + process.send_signal(signal.SIGTERM) | ||
| 123 | + process.wait(timeout=5) | ||
| 124 | + except (subprocess.TimeoutExpired, OSError): | ||
| 125 | + process.kill() | ||
| 126 | + process.wait(timeout=3) | ||
| 127 | + | ||
| 128 | + logger.info("Chrome 进程已关闭") | ||
| 129 | + | ||
| 130 | + | ||
| 131 | +def is_chrome_running(port: int = DEFAULT_PORT) -> bool: | ||
| 132 | + """检查指定端口的 Chrome 是否在运行。""" | ||
| 133 | + import requests | ||
| 134 | + | ||
| 135 | + try: | ||
| 136 | + resp = requests.get(f"http://127.0.0.1:{port}/json/version", timeout=2) | ||
| 137 | + return resp.status_code == 200 | ||
| 138 | + except (requests.ConnectionError, requests.Timeout): | ||
| 139 | + return False | ||
| 140 | + | ||
| 141 | + | ||
| 142 | +def _wait_for_chrome(port: int, timeout: float = 15.0) -> None: | ||
| 143 | + """等待 Chrome 调试端口就绪。""" | ||
| 144 | + deadline = time.monotonic() + timeout | ||
| 145 | + while time.monotonic() < deadline: | ||
| 146 | + if is_chrome_running(port): | ||
| 147 | + logger.info("Chrome 已就绪 (port=%d)", port) | ||
| 148 | + return | ||
| 149 | + time.sleep(0.5) | ||
| 150 | + logger.warning("等待 Chrome 就绪超时 (port=%d)", port) | ||
| 151 | + | ||
| 152 | + | ||
| 153 | +def _mask_proxy(proxy_url: str) -> str: | ||
| 154 | +    """Hide credentials embedded in the proxy URL.""" | ||
| 155 | +    from urllib.parse import urlparse | ||
| 156 | + | ||
| 157 | +    try: | ||
| 158 | +        parsed = urlparse(proxy_url) | ||
| 159 | +        if parsed.username: | ||
| 160 | +            return proxy_url.replace(parsed.netloc.rpartition("@")[0] + "@", "***@", 1)  # mask userinfo only; str.replace("", ...) on an empty password would corrupt the URL | ||
| 161 | +    except Exception: | ||
| 162 | +        pass | ||
| 163 | +    return proxy_url |
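The `_mask_proxy` helper above hides proxy credentials before they reach the log. As a minimal standalone sketch of the same idea, the function below rebuilds the URL from its parsed parts instead of substituting substrings. The name `mask_proxy_credentials` is hypothetical and not part of this commit; only standard `urllib.parse` calls are used.

```python
from urllib.parse import urlsplit, urlunsplit


def mask_proxy_credentials(proxy_url: str) -> str:
    """Return proxy_url with any username/password replaced by '***'.

    Hypothetical helper for illustration; not part of the commit.
    """
    parts = urlsplit(proxy_url)
    if parts.username is None and parts.password is None:
        return proxy_url  # nothing sensitive to hide

    host = parts.hostname or ""
    if ":" in host:  # IPv6 literals must be bracketed inside a netloc
        host = f"[{host}]"
    if parts.port is not None:
        host = f"{host}:{parts.port}"

    userinfo = "***:***" if parts.password is not None else "***"
    return urlunsplit(parts._replace(netloc=f"{userinfo}@{host}"))


if __name__ == "__main__":
    print(mask_proxy_credentials("http://user:pass@proxy.example.com:8080"))
    print(mask_proxy_credentials("http://127.0.0.1:7890"))
```

Rebuilding the netloc avoids accidental matches when the username or password text also occurs elsewhere in the URL, for example in the host name.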
scripts/cli.py
0 → 100644
| 1 | +"""Unified CLI entry point; its 13 subcommands correspond to the Go MCP tools. | ||
| 2 | + | ||
| 3 | +Global options: --host, --port, --account | ||
| 4 | +Output: JSON (ensure_ascii=False) | ||
| 5 | +Exit codes: 0 = success, 1 = not logged in, 2 = error | ||
| 6 | +""" | ||
| 7 | + | ||
| 8 | +from __future__ import annotations | ||
| 9 | + | ||
| 10 | +import argparse | ||
| 11 | +import json | ||
| 12 | +import logging | ||
| 13 | +import sys | ||
| 14 | + | ||
| 15 | +logging.basicConfig( | ||
| 16 | + level=logging.INFO, | ||
| 17 | + format="%(asctime)s %(levelname)s %(name)s: %(message)s", | ||
| 18 | +) | ||
| 19 | +logger = logging.getLogger("xhs-cli") | ||
| 20 | + | ||
| 21 | + | ||
| 22 | +def _output(data: dict, exit_code: int = 0) -> None: | ||
| 23 | + """输出 JSON 并退出。""" | ||
| 24 | + print(json.dumps(data, ensure_ascii=False, indent=2)) | ||
| 25 | + sys.exit(exit_code) | ||
| 26 | + | ||
| 27 | + | ||
| 28 | +def _connect(args: argparse.Namespace): | ||
| 29 | + """连接到 Chrome 并返回 (browser, page)。""" | ||
| 30 | + from xhs.cdp import Browser | ||
| 31 | + | ||
| 32 | + browser = Browser(host=args.host, port=args.port) | ||
| 33 | + browser.connect() | ||
| 34 | + page = browser.new_page() | ||
| 35 | + return browser, page | ||
| 36 | + | ||
| 37 | + | ||
| 38 | +# ========== 子命令实现 ========== | ||
| 39 | + | ||
| 40 | + | ||
| 41 | +def cmd_check_login(args: argparse.Namespace) -> None: | ||
| 42 | + """检查登录状态。""" | ||
| 43 | + from xhs.login import check_login_status | ||
| 44 | + | ||
| 45 | + browser, page = _connect(args) | ||
| 46 | + try: | ||
| 47 | + logged_in = check_login_status(page) | ||
| 48 | + _output({"logged_in": logged_in}, exit_code=0 if logged_in else 1) | ||
| 49 | + finally: | ||
| 50 | + browser.close_page(page) | ||
| 51 | + browser.close() | ||
| 52 | + | ||
| 53 | + | ||
| 54 | +def cmd_login(args: argparse.Namespace) -> None: | ||
| 55 | + """获取登录二维码并等待扫码。""" | ||
| 56 | + from xhs.login import fetch_qrcode, save_qrcode_to_file, wait_for_login | ||
| 57 | + | ||
| 58 | + browser, page = _connect(args) | ||
| 59 | + try: | ||
| 60 | + src, already = fetch_qrcode(page) | ||
| 61 | + if already: | ||
| 62 | + _output({"logged_in": True, "message": "已登录"}) | ||
| 63 | + else: | ||
| 64 | +            qrcode_path = save_qrcode_to_file(src)  # save the QR code to a temp file | ||
| 65 | +            print( | ||
| 66 | +                json.dumps( | ||
| 67 | +                    { | ||
| 68 | +                        "qrcode_path": qrcode_path, | ||
| 69 | +                        "message": "请扫码登录,二维码已保存到文件", | ||
| 70 | +                    }, | ||
| 71 | +                    ensure_ascii=False, | ||
| 72 | +                ), | ||
| 73 | +                flush=True,  # emit the path now: stdout may be a pipe and wait_for_login blocks | ||
| 74 | +            ) | ||
| 75 | + success = wait_for_login(page, timeout=120) | ||
| 76 | + _output( | ||
| 77 | + {"logged_in": success, "message": "登录成功" if success else "登录超时"}, | ||
| 78 | + exit_code=0 if success else 2, | ||
| 79 | + ) | ||
| 80 | + finally: | ||
| 81 | + browser.close_page(page) | ||
| 82 | + browser.close() | ||
| 83 | + | ||
| 84 | + | ||
| 85 | +def cmd_delete_cookies(args: argparse.Namespace) -> None: | ||
| 86 | + """删除 cookies。""" | ||
| 87 | + from xhs.cookies import delete_cookies, get_cookies_file_path | ||
| 88 | + | ||
| 89 | + path = get_cookies_file_path(args.account) | ||
| 90 | + delete_cookies(path) | ||
| 91 | + _output({"success": True, "message": f"已删除 cookies: {path}"}) | ||
| 92 | + | ||
| 93 | + | ||
| 94 | +def cmd_list_feeds(args: argparse.Namespace) -> None: | ||
| 95 | + """获取首页 Feed 列表。""" | ||
| 96 | + from xhs.feeds import list_feeds | ||
| 97 | + | ||
| 98 | + browser, page = _connect(args) | ||
| 99 | + try: | ||
| 100 | + feeds = list_feeds(page) | ||
| 101 | + _output({"feeds": [f.to_dict() for f in feeds], "count": len(feeds)}) | ||
| 102 | + finally: | ||
| 103 | + browser.close_page(page) | ||
| 104 | + browser.close() | ||
| 105 | + | ||
| 106 | + | ||
| 107 | +def cmd_search_feeds(args: argparse.Namespace) -> None: | ||
| 108 | + """搜索 Feeds。""" | ||
| 109 | + from xhs.search import search_feeds | ||
| 110 | + from xhs.types import FilterOption | ||
| 111 | + | ||
| 112 | + filter_opt = FilterOption( | ||
| 113 | + sort_by=args.sort_by or "", | ||
| 114 | + note_type=args.note_type or "", | ||
| 115 | + publish_time=args.publish_time or "", | ||
| 116 | + search_scope=args.search_scope or "", | ||
| 117 | + location=args.location or "", | ||
| 118 | + ) | ||
| 119 | + | ||
| 120 | + browser, page = _connect(args) | ||
| 121 | + try: | ||
| 122 | + feeds = search_feeds(page, args.keyword, filter_opt) | ||
| 123 | + _output({"feeds": [f.to_dict() for f in feeds], "count": len(feeds)}) | ||
| 124 | + finally: | ||
| 125 | + browser.close_page(page) | ||
| 126 | + browser.close() | ||
| 127 | + | ||
| 128 | + | ||
| 129 | +def cmd_get_feed_detail(args: argparse.Namespace) -> None: | ||
| 130 | + """获取 Feed 详情。""" | ||
| 131 | + from xhs.feed_detail import get_feed_detail | ||
| 132 | + from xhs.types import CommentLoadConfig | ||
| 133 | + | ||
| 134 | + config = CommentLoadConfig( | ||
| 135 | + click_more_replies=args.click_more_replies, | ||
| 136 | + max_replies_threshold=args.max_replies_threshold, | ||
| 137 | + max_comment_items=args.max_comment_items, | ||
| 138 | + scroll_speed=args.scroll_speed, | ||
| 139 | + ) | ||
| 140 | + | ||
| 141 | + browser, page = _connect(args) | ||
| 142 | + try: | ||
| 143 | + detail = get_feed_detail( | ||
| 144 | + page, | ||
| 145 | + args.feed_id, | ||
| 146 | + args.xsec_token, | ||
| 147 | + load_all_comments=args.load_all_comments, | ||
| 148 | + config=config, | ||
| 149 | + ) | ||
| 150 | + _output(detail.to_dict()) | ||
| 151 | + finally: | ||
| 152 | + browser.close_page(page) | ||
| 153 | + browser.close() | ||
| 154 | + | ||
| 155 | + | ||
| 156 | +def cmd_user_profile(args: argparse.Namespace) -> None: | ||
| 157 | + """获取用户主页。""" | ||
| 158 | + from xhs.user_profile import get_user_profile | ||
| 159 | + | ||
| 160 | + browser, page = _connect(args) | ||
| 161 | + try: | ||
| 162 | + profile = get_user_profile(page, args.user_id, args.xsec_token) | ||
| 163 | + _output(profile.to_dict()) | ||
| 164 | + finally: | ||
| 165 | + browser.close_page(page) | ||
| 166 | + browser.close() | ||
| 167 | + | ||
| 168 | + | ||
| 169 | +def cmd_post_comment(args: argparse.Namespace) -> None: | ||
| 170 | + """发表评论。""" | ||
| 171 | + from xhs.comment import post_comment | ||
| 172 | + | ||
| 173 | + browser, page = _connect(args) | ||
| 174 | + try: | ||
| 175 | + post_comment(page, args.feed_id, args.xsec_token, args.content) | ||
| 176 | + _output({"success": True, "message": "评论发送成功"}) | ||
| 177 | + finally: | ||
| 178 | + browser.close_page(page) | ||
| 179 | + browser.close() | ||
| 180 | + | ||
| 181 | + | ||
| 182 | +def cmd_reply_comment(args: argparse.Namespace) -> None: | ||
| 183 | + """回复评论。""" | ||
| 184 | + from xhs.comment import reply_comment | ||
| 185 | + | ||
| 186 | + browser, page = _connect(args) | ||
| 187 | + try: | ||
| 188 | + reply_comment( | ||
| 189 | + page, | ||
| 190 | + args.feed_id, | ||
| 191 | + args.xsec_token, | ||
| 192 | + args.content, | ||
| 193 | + comment_id=args.comment_id or "", | ||
| 194 | + user_id=args.user_id or "", | ||
| 195 | + ) | ||
| 196 | + _output({"success": True, "message": "回复成功"}) | ||
| 197 | + finally: | ||
| 198 | + browser.close_page(page) | ||
| 199 | + browser.close() | ||
| 200 | + | ||
| 201 | + | ||
| 202 | +def cmd_like_feed(args: argparse.Namespace) -> None: | ||
| 203 | + """点赞/取消点赞。""" | ||
| 204 | + from xhs.like_favorite import like_feed, unlike_feed | ||
| 205 | + | ||
| 206 | + browser, page = _connect(args) | ||
| 207 | + try: | ||
| 208 | + if args.unlike: | ||
| 209 | + result = unlike_feed(page, args.feed_id, args.xsec_token) | ||
| 210 | + else: | ||
| 211 | + result = like_feed(page, args.feed_id, args.xsec_token) | ||
| 212 | + _output(result.to_dict()) | ||
| 213 | + finally: | ||
| 214 | + browser.close_page(page) | ||
| 215 | + browser.close() | ||
| 216 | + | ||
| 217 | + | ||
| 218 | +def cmd_favorite_feed(args: argparse.Namespace) -> None: | ||
| 219 | + """收藏/取消收藏。""" | ||
| 220 | + from xhs.like_favorite import favorite_feed, unfavorite_feed | ||
| 221 | + | ||
| 222 | + browser, page = _connect(args) | ||
| 223 | + try: | ||
| 224 | + if args.unfavorite: | ||
| 225 | + result = unfavorite_feed(page, args.feed_id, args.xsec_token) | ||
| 226 | + else: | ||
| 227 | + result = favorite_feed(page, args.feed_id, args.xsec_token) | ||
| 228 | + _output(result.to_dict()) | ||
| 229 | + finally: | ||
| 230 | + browser.close_page(page) | ||
| 231 | + browser.close() | ||
| 232 | + | ||
| 233 | + | ||
| 234 | +def cmd_publish(args: argparse.Namespace) -> None: | ||
| 235 | + """发布图文内容。""" | ||
| 236 | + from image_downloader import process_images | ||
| 237 | + from xhs.publish import publish_image_content | ||
| 238 | + from xhs.types import PublishImageContent | ||
| 239 | + | ||
| 240 | + # 读取标题和正文 | ||
| 241 | + with open(args.title_file, encoding="utf-8") as f: | ||
| 242 | + title = f.read().strip() | ||
| 243 | + with open(args.content_file, encoding="utf-8") as f: | ||
| 244 | + content = f.read().strip() | ||
| 245 | + | ||
| 246 | + # 处理图片 | ||
| 247 | + image_paths = process_images(args.images) if args.images else [] | ||
| 248 | + if not image_paths: | ||
| 249 | + _output({"success": False, "error": "没有有效的图片"}, exit_code=2) | ||
| 250 | + | ||
| 251 | + browser, page = _connect(args) | ||
| 252 | + try: | ||
| 253 | + publish_image_content( | ||
| 254 | + page, | ||
| 255 | + PublishImageContent( | ||
| 256 | + title=title, | ||
| 257 | + content=content, | ||
| 258 | + tags=args.tags or [], | ||
| 259 | + image_paths=image_paths, | ||
| 260 | + schedule_time=args.schedule_at, | ||
| 261 | + is_original=args.original, | ||
| 262 | + visibility=args.visibility or "", | ||
| 263 | + ), | ||
| 264 | + ) | ||
| 265 | + _output({"success": True, "title": title, "images": len(image_paths), "status": "发布完成"}) | ||
| 266 | + finally: | ||
| 267 | + browser.close_page(page) | ||
| 268 | + browser.close() | ||
| 269 | + | ||
| 270 | + | ||
| 271 | +def cmd_publish_video(args: argparse.Namespace) -> None: | ||
| 272 | + """发布视频内容。""" | ||
| 273 | + from xhs.publish_video import publish_video_content | ||
| 274 | + from xhs.types import PublishVideoContent | ||
| 275 | + | ||
| 276 | + with open(args.title_file, encoding="utf-8") as f: | ||
| 277 | + title = f.read().strip() | ||
| 278 | + with open(args.content_file, encoding="utf-8") as f: | ||
| 279 | + content = f.read().strip() | ||
| 280 | + | ||
| 281 | + browser, page = _connect(args) | ||
| 282 | + try: | ||
| 283 | + publish_video_content( | ||
| 284 | + page, | ||
| 285 | + PublishVideoContent( | ||
| 286 | + title=title, | ||
| 287 | + content=content, | ||
| 288 | + tags=args.tags or [], | ||
| 289 | + video_path=args.video, | ||
| 290 | + schedule_time=args.schedule_at, | ||
| 291 | + visibility=args.visibility or "", | ||
| 292 | + ), | ||
| 293 | + ) | ||
| 294 | + _output({"success": True, "title": title, "video": args.video, "status": "发布完成"}) | ||
| 295 | + finally: | ||
| 296 | + browser.close_page(page) | ||
| 297 | + browser.close() | ||
| 298 | + | ||
| 299 | + | ||
| 300 | +# ========== 参数解析 ========== | ||
| 301 | + | ||
| 302 | + | ||
| 303 | +def build_parser() -> argparse.ArgumentParser: | ||
| 304 | + """构建 CLI 参数解析器。""" | ||
| 305 | + parser = argparse.ArgumentParser( | ||
| 306 | + prog="xhs-cli", | ||
| 307 | + description="小红书自动化 CLI", | ||
| 308 | + ) | ||
| 309 | + | ||
| 310 | + # 全局选项 | ||
| 311 | + parser.add_argument("--host", default="127.0.0.1", help="Chrome 调试主机 (default: 127.0.0.1)") | ||
| 312 | + parser.add_argument("--port", type=int, default=9222, help="Chrome 调试端口 (default: 9222)") | ||
| 313 | + parser.add_argument("--account", default="", help="账号名称") | ||
| 314 | + | ||
| 315 | + subparsers = parser.add_subparsers(dest="command", required=True) | ||
| 316 | + | ||
| 317 | + # check-login | ||
| 318 | + sub = subparsers.add_parser("check-login", help="检查登录状态") | ||
| 319 | + sub.set_defaults(func=cmd_check_login) | ||
| 320 | + | ||
| 321 | + # login | ||
| 322 | + sub = subparsers.add_parser("login", help="登录(扫码)") | ||
| 323 | + sub.set_defaults(func=cmd_login) | ||
| 324 | + | ||
| 325 | + # delete-cookies | ||
| 326 | + sub = subparsers.add_parser("delete-cookies", help="删除 cookies") | ||
| 327 | + sub.set_defaults(func=cmd_delete_cookies) | ||
| 328 | + | ||
| 329 | + # list-feeds | ||
| 330 | + sub = subparsers.add_parser("list-feeds", help="获取首页 Feed 列表") | ||
| 331 | + sub.set_defaults(func=cmd_list_feeds) | ||
| 332 | + | ||
| 333 | + # search-feeds | ||
| 334 | + sub = subparsers.add_parser("search-feeds", help="搜索 Feeds") | ||
| 335 | + sub.add_argument("--keyword", required=True, help="搜索关键词") | ||
| 336 | + sub.add_argument("--sort-by", help="排序: 综合|最新|最多点赞|最多评论|最多收藏") | ||
| 337 | + sub.add_argument("--note-type", help="类型: 不限|视频|图文") | ||
| 338 | + sub.add_argument("--publish-time", help="时间: 不限|一天内|一周内|半年内") | ||
| 339 | + sub.add_argument("--search-scope", help="范围: 不限|已看过|未看过|已关注") | ||
| 340 | + sub.add_argument("--location", help="位置: 不限|同城|附近") | ||
| 341 | + sub.set_defaults(func=cmd_search_feeds) | ||
| 342 | + | ||
| 343 | + # get-feed-detail | ||
| 344 | + sub = subparsers.add_parser("get-feed-detail", help="获取 Feed 详情") | ||
| 345 | + sub.add_argument("--feed-id", required=True, help="Feed ID") | ||
| 346 | + sub.add_argument("--xsec-token", required=True, help="xsec_token") | ||
| 347 | + sub.add_argument("--load-all-comments", action="store_true", help="加载全部评论") | ||
| 348 | + sub.add_argument("--click-more-replies", action="store_true", help="点击展开更多回复") | ||
| 349 | + sub.add_argument("--max-replies-threshold", type=int, default=10, help="展开回复数阈值") | ||
| 350 | + sub.add_argument("--max-comment-items", type=int, default=0, help="最大评论数 (0=不限)") | ||
| 351 | + sub.add_argument("--scroll-speed", default="normal", help="滚动速度: slow|normal|fast") | ||
| 352 | + sub.set_defaults(func=cmd_get_feed_detail) | ||
| 353 | + | ||
| 354 | + # user-profile | ||
| 355 | + sub = subparsers.add_parser("user-profile", help="获取用户主页") | ||
| 356 | + sub.add_argument("--user-id", required=True, help="用户 ID") | ||
| 357 | + sub.add_argument("--xsec-token", required=True, help="xsec_token") | ||
| 358 | + sub.set_defaults(func=cmd_user_profile) | ||
| 359 | + | ||
| 360 | + # post-comment | ||
| 361 | + sub = subparsers.add_parser("post-comment", help="发表评论") | ||
| 362 | + sub.add_argument("--feed-id", required=True, help="Feed ID") | ||
| 363 | + sub.add_argument("--xsec-token", required=True, help="xsec_token") | ||
| 364 | + sub.add_argument("--content", required=True, help="评论内容") | ||
| 365 | + sub.set_defaults(func=cmd_post_comment) | ||
| 366 | + | ||
| 367 | + # reply-comment | ||
| 368 | + sub = subparsers.add_parser("reply-comment", help="回复评论") | ||
| 369 | + sub.add_argument("--feed-id", required=True, help="Feed ID") | ||
| 370 | + sub.add_argument("--xsec-token", required=True, help="xsec_token") | ||
| 371 | + sub.add_argument("--content", required=True, help="回复内容") | ||
| 372 | + sub.add_argument("--comment-id", help="目标评论 ID") | ||
| 373 | + sub.add_argument("--user-id", help="目标用户 ID") | ||
| 374 | + sub.set_defaults(func=cmd_reply_comment) | ||
| 375 | + | ||
| 376 | + # like-feed | ||
| 377 | + sub = subparsers.add_parser("like-feed", help="点赞") | ||
| 378 | + sub.add_argument("--feed-id", required=True, help="Feed ID") | ||
| 379 | + sub.add_argument("--xsec-token", required=True, help="xsec_token") | ||
| 380 | + sub.add_argument("--unlike", action="store_true", help="取消点赞") | ||
| 381 | + sub.set_defaults(func=cmd_like_feed) | ||
| 382 | + | ||
| 383 | + # favorite-feed | ||
| 384 | + sub = subparsers.add_parser("favorite-feed", help="收藏") | ||
| 385 | + sub.add_argument("--feed-id", required=True, help="Feed ID") | ||
| 386 | + sub.add_argument("--xsec-token", required=True, help="xsec_token") | ||
| 387 | + sub.add_argument("--unfavorite", action="store_true", help="取消收藏") | ||
| 388 | + sub.set_defaults(func=cmd_favorite_feed) | ||
| 389 | + | ||
| 390 | + # publish | ||
| 391 | + sub = subparsers.add_parser("publish", help="发布图文") | ||
| 392 | + sub.add_argument("--title-file", required=True, help="标题文件路径") | ||
| 393 | + sub.add_argument("--content-file", required=True, help="正文文件路径") | ||
| 394 | + sub.add_argument("--images", nargs="+", required=True, help="图片路径/URL") | ||
| 395 | + sub.add_argument("--tags", nargs="*", help="标签") | ||
| 396 | + sub.add_argument("--schedule-at", help="定时发布 (ISO8601)") | ||
| 397 | + sub.add_argument("--original", action="store_true", help="声明原创") | ||
| 398 | + sub.add_argument("--visibility", help="可见范围") | ||
| 399 | + sub.set_defaults(func=cmd_publish) | ||
| 400 | + | ||
| 401 | + # publish-video | ||
| 402 | + sub = subparsers.add_parser("publish-video", help="发布视频") | ||
| 403 | + sub.add_argument("--title-file", required=True, help="标题文件路径") | ||
| 404 | + sub.add_argument("--content-file", required=True, help="正文文件路径") | ||
| 405 | + sub.add_argument("--video", required=True, help="视频文件路径") | ||
| 406 | + sub.add_argument("--tags", nargs="*", help="标签") | ||
| 407 | + sub.add_argument("--schedule-at", help="定时发布 (ISO8601)") | ||
| 408 | + sub.add_argument("--visibility", help="可见范围") | ||
| 409 | + sub.set_defaults(func=cmd_publish_video) | ||
| 410 | + | ||
| 411 | + return parser | ||
| 412 | + | ||
| 413 | + | ||
| 414 | +def main() -> None: | ||
| 415 | + """CLI 入口。""" | ||
| 416 | + parser = build_parser() | ||
| 417 | + args = parser.parse_args() | ||
| 418 | + | ||
| 419 | + try: | ||
| 420 | + args.func(args) | ||
| 421 | + except Exception as e: | ||
| 422 | + logger.error("执行失败: %s", e, exc_info=True) | ||
| 423 | + _output({"success": False, "error": str(e)}, exit_code=2) | ||
| 424 | + | ||
| 425 | + | ||
| 426 | +if __name__ == "__main__": | ||
| 427 | + main() |
scripts/image_downloader.py
0 → 100644
| 1 | +"""Media downloads with a SHA256-keyed cache; corresponds to Go pkg/downloader/images.go.""" | ||
| 2 | + | ||
| 3 | +from __future__ import annotations | ||
| 4 | + | ||
| 5 | +import hashlib | ||
| 6 | +import logging | ||
| 7 | +import os | ||
| 8 | +import time | ||
| 9 | +from urllib.parse import urlparse | ||
| 10 | + | ||
| 11 | +import requests | ||
| 12 | + | ||
| 13 | +logger = logging.getLogger(__name__) | ||
| 14 | + | ||
| 15 | +_USER_AGENT = ( | ||
| 16 | + "Mozilla/5.0 (Windows NT 10.0; Win64; x64) " | ||
| 17 | + "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36" | ||
| 18 | +) | ||
| 19 | + | ||
| 20 | +# 已知图片扩展名 | ||
| 21 | +_IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".webp", ".bmp", ".svg"} | ||
| 22 | + | ||
| 23 | + | ||
| 24 | +def is_image_url(path: str) -> bool: | ||
| 25 | + """判断字符串是否为图片/媒体 URL。""" | ||
| 26 | + return path.lower().startswith(("http://", "https://")) | ||
| 27 | + | ||
| 28 | + | ||
| 29 | +class ImageDownloader: | ||
| 30 | + """图片下载器(带 SHA256 缓存)。""" | ||
| 31 | + | ||
| 32 | +    def __init__(self, save_path: str) -> None: | ||
| 33 | +        self.save_path = save_path | ||
| 34 | +        os.makedirs(save_path, exist_ok=True) | ||
| 35 | +        self._session = requests.Session() | ||
| 36 | +        self._timeout = 30  # seconds; requests ignores a Session.timeout attribute | ||
| 37 | + | ||
| 38 | +    def download_image(self, image_url: str) -> str: | ||
| 39 | +        """Download a single image and return its local file path. | ||
| 40 | + | ||
| 41 | +        If a file for this URL already exists (matched by URL hash), return it directly. | ||
| 42 | + | ||
| 43 | +        Raises: | ||
| 44 | +            ValueError: The URL is malformed. | ||
| 45 | +            RuntimeError: The download failed. | ||
| 46 | +        """ | ||
| 47 | +        if not is_image_url(image_url): | ||
| 48 | +            raise ValueError(f"无效的图片 URL: {image_url}") | ||
| 49 | + | ||
| 50 | +        # Build the file name | ||
| 51 | +        url_hash = hashlib.sha256(image_url.encode()).hexdigest()[:16] | ||
| 52 | +        ext = self._detect_extension(image_url) | ||
| 53 | +        filename = f"img_{url_hash}_{int(time.time())}{ext}" | ||
| 54 | +        filepath = os.path.join(self.save_path, filename) | ||
| 55 | + | ||
| 56 | +        # Reuse an existing file with the same hash | ||
| 57 | +        existing = self._find_existing(url_hash) | ||
| 58 | +        if existing: | ||
| 59 | +            return existing | ||
| 60 | + | ||
| 61 | +        # Download | ||
| 62 | +        parsed = urlparse(image_url) | ||
| 63 | +        headers = { | ||
| 64 | +            "User-Agent": _USER_AGENT, | ||
| 65 | +            "Referer": f"{parsed.scheme}://{parsed.hostname}/", | ||
| 66 | +        } | ||
| 67 | + | ||
| 68 | +        resp = self._session.get(image_url, headers=headers, timeout=self._timeout) | ||
| 69 | +        if resp.status_code != 200: | ||
| 70 | +            raise RuntimeError(f"下载失败 (status={resp.status_code}): {image_url}") | ||
| 71 | + | ||
| 72 | +        # Save | ||
| 73 | +        with open(filepath, "wb") as f: | ||
| 74 | +            f.write(resp.content) | ||
| 75 | + | ||
| 76 | +        logger.info("下载完成: %s -> %s", image_url, filepath) | ||
| 77 | +        return filepath | ||
| 78 | + | ||
| 79 | + def download_images(self, image_urls: list[str]) -> list[str]: | ||
| 80 | + """批量下载图片。""" | ||
| 81 | + paths = [] | ||
| 82 | + for url in image_urls: | ||
| 83 | + try: | ||
| 84 | + path = self.download_image(url) | ||
| 85 | + paths.append(path) | ||
| 86 | + except Exception as e: | ||
| 87 | + logger.error("下载失败 %s: %s", url, e) | ||
| 88 | + return paths | ||
| 89 | + | ||
| 90 | + def _detect_extension(self, url: str) -> str: | ||
| 91 | + """从 URL 推断文件扩展名。""" | ||
| 92 | + parsed = urlparse(url) | ||
| 93 | + path = parsed.path.lower() | ||
| 94 | + for ext in _IMAGE_EXTENSIONS: | ||
| 95 | + if path.endswith(ext): | ||
| 96 | + return ext | ||
| 97 | + return ".jpg" # 默认 | ||
| 98 | + | ||
| 99 | + def _find_existing(self, url_hash: str) -> str | None: | ||
| 100 | + """查找已有同 hash 的文件。""" | ||
| 101 | + prefix = f"img_{url_hash}_" | ||
| 102 | + for filename in os.listdir(self.save_path): | ||
| 103 | + if filename.startswith(prefix): | ||
| 104 | + return os.path.join(self.save_path, filename) | ||
| 105 | + return None | ||
| 106 | + | ||
| 107 | + | ||
| 108 | +def process_images(images: list[str], save_dir: str | None = None) -> list[str]: | ||
| 109 | + """处理图片列表(URL 下载,本地路径直接返回)。""" | ||
| 110 | + if not save_dir: | ||
| 111 | + save_dir = os.path.join(os.path.expanduser("~"), ".xhs", "images") | ||
| 112 | + | ||
| 113 | + downloader = ImageDownloader(save_dir) | ||
| 114 | + result = [] | ||
| 115 | + | ||
| 116 | + for img in images: | ||
| 117 | + if is_image_url(img): | ||
| 118 | + path = downloader.download_image(img) | ||
| 119 | + result.append(path) | ||
| 120 | + else: | ||
| 121 | + # 本地路径 | ||
| 122 | + if os.path.exists(img): | ||
| 123 | + result.append(os.path.abspath(img)) | ||
| 124 | + else: | ||
| 125 | + logger.warning("文件不存在: %s", img) | ||
| 126 | + | ||
| 127 | + return result |
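The downloader above names cached files after a SHA256 hash of the source URL. As a rough sketch of that naming scheme, the hypothetical helper `cache_path_for` (not part of this commit) derives a fully deterministic path with no timestamp component, so a cache hit can be tested with a single `os.path.exists` call rather than a directory scan. The extension list mirrors `_IMAGE_EXTENSIONS`; the `.jpg` fallback is the same default the diff uses.

```python
import hashlib
import os
from urllib.parse import urlparse

# Assumption: same extensions as _IMAGE_EXTENSIONS in the diff above.
_KNOWN_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".webp", ".bmp", ".svg")


def cache_path_for(url: str, save_dir: str) -> str:
    """Map a URL to a stable cache path inside save_dir (illustrative only)."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]
    # Only the URL path is inspected, so query strings do not hide the extension.
    path = urlparse(url).path.lower()
    ext = next((e for e in _KNOWN_EXTENSIONS if path.endswith(e)), ".jpg")
    return os.path.join(save_dir, f"img_{digest}{ext}")


if __name__ == "__main__":
    target = cache_path_for("https://example.com/a/photo.png?x=1", "/tmp/xhs-images")
    print(target, "(cached)" if os.path.exists(target) else "(needs download)")
```

The trade-off is that a deterministic name cannot record when the file was fetched; the file's modification time would have to serve that purpose if it matters.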
scripts/publish_pipeline.py
0 → 100644
| 1 | +"""Publish orchestrator: download → login check → publish → report.""" | ||
| 2 | + | ||
| 3 | +from __future__ import annotations | ||
| 4 | + | ||
| 5 | +import json | ||
| 6 | +import logging | ||
| 7 | +import sys | ||
| 8 | + | ||
| 9 | +from image_downloader import process_images | ||
| 10 | +from title_utils import calc_title_length | ||
| 11 | +from xhs.cdp import Browser | ||
| 12 | +from xhs.login import check_login_status | ||
| 13 | +from xhs.publish import publish_image_content | ||
| 14 | +from xhs.publish_video import publish_video_content | ||
| 15 | +from xhs.types import PublishImageContent, PublishVideoContent | ||
| 16 | + | ||
| 17 | +logger = logging.getLogger(__name__) | ||
| 18 | + | ||
| 19 | + | ||
| 20 | +def run_publish_pipeline( | ||
| 21 | + title: str, | ||
| 22 | + content: str, | ||
| 23 | + images: list[str] | None = None, | ||
| 24 | + video: str | None = None, | ||
| 25 | + tags: list[str] | None = None, | ||
| 26 | + schedule_time: str | None = None, | ||
| 27 | + is_original: bool = False, | ||
| 28 | + visibility: str = "", | ||
| 29 | + host: str = "127.0.0.1", | ||
| 30 | + port: int = 9222, | ||
| 31 | + account: str = "", | ||
| 32 | +) -> dict: | ||
| 33 | + """执行完整发布流水线。 | ||
| 34 | + | ||
| 35 | + Returns: | ||
| 36 | + 发布结果字典。 | ||
| 37 | + """ | ||
| 38 | + # 标题长度校验 | ||
| 39 | + title_len = calc_title_length(title) | ||
| 40 | + if title_len > 20: | ||
| 41 | + return {"success": False, "error": f"标题长度超限: {title_len}/20"} | ||
| 42 | + | ||
| 43 | + # 处理图片(下载 URL / 验证本地路径) | ||
| 44 | + local_images: list[str] = [] | ||
| 45 | + if images: | ||
| 46 | + local_images = process_images(images) | ||
| 47 | + if not local_images: | ||
| 48 | + return {"success": False, "error": "没有有效的图片"} | ||
| 49 | + | ||
| 50 | + # 连接浏览器 | ||
| 51 | + browser = Browser(host=host, port=port) | ||
| 52 | + browser.connect() | ||
| 53 | + | ||
| 54 | + try: | ||
| 55 | + page = browser.new_page() | ||
| 56 | + try: | ||
| 57 | + # 登录检查 | ||
| 58 | + if not check_login_status(page): | ||
| 59 | + return {"success": False, "error": "未登录", "exit_code": 1} | ||
| 60 | + | ||
| 61 | + # 发布 | ||
| 62 | + if video: | ||
| 63 | + publish_video_content( | ||
| 64 | + page, | ||
| 65 | + PublishVideoContent( | ||
| 66 | + title=title, | ||
| 67 | + content=content, | ||
| 68 | + tags=tags or [], | ||
| 69 | + video_path=video, | ||
| 70 | + schedule_time=schedule_time, | ||
| 71 | + visibility=visibility, | ||
| 72 | + ), | ||
| 73 | + ) | ||
| 74 | + else: | ||
| 75 | + publish_image_content( | ||
| 76 | + page, | ||
| 77 | + PublishImageContent( | ||
| 78 | + title=title, | ||
| 79 | + content=content, | ||
| 80 | + tags=tags or [], | ||
| 81 | + image_paths=local_images, | ||
| 82 | + schedule_time=schedule_time, | ||
| 83 | + is_original=is_original, | ||
| 84 | + visibility=visibility, | ||
| 85 | + ), | ||
| 86 | + ) | ||
| 87 | + | ||
| 88 | + return { | ||
| 89 | + "success": True, | ||
| 90 | + "title": title, | ||
| 91 | + "content_length": len(content), | ||
| 92 | + "images": len(local_images), | ||
| 93 | + "video": video or "", | ||
| 94 | + "status": "发布完成", | ||
| 95 | + } | ||
| 96 | + | ||
| 97 | + finally: | ||
| 98 | + browser.close_page(page) | ||
| 99 | + finally: | ||
| 100 | + browser.close() | ||
| 101 | + | ||
| 102 | + | ||
| 103 | +def main() -> None: | ||
| 104 | + """CLI 入口(被 cli.py 的 publish/publish-video 子命令调用时使用)。""" | ||
| 105 | + import argparse | ||
| 106 | + | ||
| 107 | + parser = argparse.ArgumentParser(description="小红书发布流水线") | ||
| 108 | + parser.add_argument("--title-file", required=True, help="标题文件路径") | ||
| 109 | + parser.add_argument("--content-file", required=True, help="正文文件路径") | ||
| 110 | + parser.add_argument("--images", nargs="*", help="图片路径或 URL 列表") | ||
| 111 | + parser.add_argument("--video", help="视频文件路径") | ||
| 112 | + parser.add_argument("--tags", nargs="*", help="标签列表") | ||
| 113 | + parser.add_argument("--schedule-at", help="定时发布时间 (ISO8601)") | ||
| 114 | + parser.add_argument("--original", action="store_true", help="声明原创") | ||
| 115 | + parser.add_argument("--visibility", default="", help="可见范围") | ||
| 116 | + parser.add_argument("--host", default="127.0.0.1") | ||
| 117 | + parser.add_argument("--port", type=int, default=9222) | ||
| 118 | + parser.add_argument("--account", default="") | ||
| 119 | + args = parser.parse_args() | ||
| 120 | + | ||
| 121 | + # 读取标题和正文 | ||
| 122 | + with open(args.title_file, encoding="utf-8") as f: | ||
| 123 | + title = f.read().strip() | ||
| 124 | + with open(args.content_file, encoding="utf-8") as f: | ||
| 125 | + content = f.read().strip() | ||
| 126 | + | ||
| 127 | + result = run_publish_pipeline( | ||
| 128 | + title=title, | ||
| 129 | + content=content, | ||
| 130 | + images=args.images, | ||
| 131 | + video=args.video, | ||
| 132 | + tags=args.tags, | ||
| 133 | + schedule_time=args.schedule_at, | ||
| 134 | + is_original=args.original, | ||
| 135 | + visibility=args.visibility, | ||
| 136 | + host=args.host, | ||
| 137 | + port=args.port, | ||
| 138 | + account=args.account, | ||
| 139 | + ) | ||
| 140 | + | ||
| 141 | + print(json.dumps(result, ensure_ascii=False, indent=2)) | ||
| 142 | +    sys.exit(0 if result["success"] else result.get("exit_code", 2))  # 1 = not logged in, 2 = other errors | ||
| 143 | + | ||
| 144 | + | ||
| 145 | +if __name__ == "__main__": | ||
| 146 | + main() |
scripts/run_lock.py
0 → 100644
| 1 | +"""Single-instance lock that keeps multiple processes from driving the browser at once.""" | ||
| 2 | + | ||
| 3 | +from __future__ import annotations | ||
| 4 | + | ||
| 5 | +import contextlib | ||
| 6 | +import logging | ||
| 7 | +import os | ||
| 8 | +import time | ||
| 9 | + | ||
| 10 | +logger = logging.getLogger(__name__) | ||
| 11 | + | ||
| 12 | +_DEFAULT_LOCK_FILE = os.path.join(os.path.expanduser("~"), ".xhs", "run.lock") | ||
| 13 | + | ||
| 14 | + | ||
| 15 | +class RunLock: | ||
| 16 | + """文件锁,确保同一时间只有一个进程在操作。""" | ||
| 17 | + | ||
| 18 | + def __init__(self, lock_file: str = _DEFAULT_LOCK_FILE) -> None: | ||
| 19 | + self.lock_file = lock_file | ||
| 20 | + self._fd: int | None = None | ||
| 21 | + | ||
| 22 | + def acquire(self, timeout: float = 30.0) -> bool: | ||
| 23 | + """获取锁。 | ||
| 24 | + | ||
| 25 | + Args: | ||
| 26 | + timeout: 超时时间(秒)。 | ||
| 27 | + | ||
| 28 | + Returns: | ||
| 29 | + True 获取成功,False 超时。 | ||
| 30 | + """ | ||
| 31 | + os.makedirs(os.path.dirname(self.lock_file), exist_ok=True) | ||
| 32 | + deadline = time.monotonic() + timeout | ||
| 33 | + | ||
| 34 | + while time.monotonic() < deadline: | ||
| 35 | + try: | ||
| 36 | + self._fd = os.open( | ||
| 37 | + self.lock_file, | ||
| 38 | + os.O_CREAT | os.O_EXCL | os.O_WRONLY, | ||
| 39 | + ) | ||
| 40 | + # 写入 PID | ||
| 41 | + os.write(self._fd, str(os.getpid()).encode()) | ||
| 42 | + logger.debug("获取锁成功: %s", self.lock_file) | ||
| 43 | + return True | ||
| 44 | + except FileExistsError: | ||
| 45 | + # 检查持有者是否还活着 | ||
| 46 | + if self._is_stale(): | ||
| 47 | + self._force_release() | ||
| 48 | + continue | ||
| 49 | + time.sleep(1) | ||
| 50 | + | ||
| 51 | + logger.warning("获取锁超时: %s", self.lock_file) | ||
| 52 | + return False | ||
| 53 | + | ||
| 54 | +    def release(self) -> None: | ||
| 55 | +        """Release the lock (a no-op if this instance does not hold it).""" | ||
| 56 | +        if self._fd is None: | ||
| 57 | +            return  # never delete a lock file that another process owns | ||
| 58 | +        with contextlib.suppress(OSError): | ||
| 59 | +            os.close(self._fd) | ||
| 60 | +        self._fd = None | ||
| 61 | + | ||
| 62 | +        with contextlib.suppress(FileNotFoundError): | ||
| 63 | +            os.remove(self.lock_file) | ||
| 64 | +        logger.debug("释放锁: %s", self.lock_file) | ||
| 65 | + | ||
| 66 | +    def _is_stale(self) -> bool: | ||
| 67 | +        """Return True if the lock file is stale (its owner process has exited).""" | ||
| 68 | +        try: | ||
| 69 | +            with open(self.lock_file) as f: | ||
| 70 | +                os.kill(int(f.read().strip()), 0)  # signal 0 only probes for existence (POSIX) | ||
| 71 | +        except (FileNotFoundError, ValueError, ProcessLookupError): | ||
| 72 | +            return True | ||
| 73 | +        except PermissionError: | ||
| 74 | +            pass  # the process exists but belongs to another user, so the lock is live | ||
| 75 | +        return False | ||
| 76 | + | ||
| 77 | + def _force_release(self) -> None: | ||
| 78 | + """强制释放过时的锁。""" | ||
| 79 | + with contextlib.suppress(FileNotFoundError): | ||
| 80 | + os.remove(self.lock_file) | ||
| 81 | + logger.info("强制释放过时锁: %s", self.lock_file) | ||
| 82 | + | ||
| 83 | + def __enter__(self) -> RunLock: | ||
| 84 | + if not self.acquire(): | ||
| 85 | + raise TimeoutError(f"无法获取锁: {self.lock_file}") | ||
| 86 | + return self | ||
| 87 | + | ||
| 88 | + def __exit__(self, *args: object) -> None: | ||
| 89 | + self.release() |
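Review note: the locking primitive in `RunLock.acquire` is exclusive file creation with `os.O_CREAT | os.O_EXCL`. The sketch below isolates that one step so it can be tried without the rest of the class. The helper names `try_lock` and `unlock` are hypothetical and not part of this commit, and the sketch deliberately omits the timeout loop and the stale-PID check.

```python
import os
import tempfile
from typing import Optional


def try_lock(path: str) -> Optional[int]:
    """Create the lock file exclusively; return its fd, or None if it already exists."""
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return None
    os.write(fd, str(os.getpid()).encode())  # record the owner, as RunLock does
    return fd


def unlock(path: str, fd: int) -> None:
    """Close the descriptor and remove the lock file."""
    os.close(fd)
    os.remove(path)


lock_path = os.path.join(tempfile.mkdtemp(), "run.lock")
first = try_lock(lock_path)    # creates the file
second = try_lock(lock_path)   # the file exists, so this attempt fails
unlock(lock_path, first)
third = try_lock(lock_path)    # succeeds again after the release
unlock(lock_path, third)
```

Because creation and the existence check happen in a single system call, two processes cannot both succeed; everything else in the class (polling, stale detection) is layered on top of this guarantee.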
scripts/title_utils.py
0 → 100644
| 1 | +"""UTF-16 title length calculation, mirroring Go pkg/xhsutil/title.go.""" | ||
| 2 | + | ||
| 3 | + | ||
| 4 | +def calc_title_length(s: str) -> int: | ||
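| 5 | + """Compute the title length the way 小红书 counts it. | ||
| 6 | + | ||
| 7 | + Rule: each non-ASCII UTF-16 code unit (Chinese, full-width symbols, etc.) weighs 2, | ||
| 8 | + each ASCII character weighs 1; the total is divided by 2 and rounded up. | ||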
| 9 | + | ||
| 10 | + Examples: | ||
| 11 | + >>> calc_title_length("你好世界") | ||
| 12 | + 4 | ||
| 13 | + >>> calc_title_length("hello") | ||
| 14 | + 3 | ||
| 15 | + >>> calc_title_length("OOTD穿搭分享") | ||
| 16 | + 6 | ||
| 17 | + """ | ||
| 18 | + byte_len = 0 | ||
| 19 | + # Walk the UTF-16 code units (a surrogate pair counts as two units) | ||
| 20 | + encoded = s.encode("utf-16-le") | ||
| 21 | + for i in range(0, len(encoded), 2): | ||
| 22 | + code_unit = int.from_bytes(encoded[i : i + 2], "little") | ||
| 23 | + if code_unit > 127: | ||
| 24 | + byte_len += 2 | ||
| 25 | + else: | ||
| 26 | + byte_len += 1 | ||
| 27 | + return (byte_len + 1) // 2 |
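Review note: the counting rule is easiest to check with a few concrete inputs. The sketch below restates the same logic under a hypothetical name, `weighted_title_length`, so it runs on its own. The emoji case is not in the docstring examples; it follows from the code, since an emoji outside the Basic Multilingual Plane encodes as two UTF-16 code units.

```python
def weighted_title_length(title: str) -> int:
    """Weigh ASCII code units as 1 and all others as 2, then halve, rounding up."""
    data = title.encode("utf-16-le")
    weight = 0
    for i in range(0, len(data), 2):
        unit = int.from_bytes(data[i : i + 2], "little")
        weight += 2 if unit > 127 else 1
    return (weight + 1) // 2


for sample in ("hello", "你好世界", "OOTD穿搭分享", "😀", ""):
    print(repr(sample), weighted_title_length(sample))
```

Five ASCII letters count as 3, four Chinese characters as 4, and a single emoji as 2 because its surrogate pair contributes two code units of weight 2 each.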
scripts/xhs/__init__.py
0 → 100644
| 1 | +"""小红书 CDP 自动化核心包。""" |
scripts/xhs/cdp.py
0 → 100644
| 1 | +"""CDP WebSocket client (Browser, Page, Element), mirroring Go browser/browser.go + the go-rod API. | ||
| 2 | + | ||
| 3 | +Talks to the Chrome DevTools Protocol over a native WebSocket to drive the browser. | ||
| 4 | +""" | ||
| 5 | + | ||
| 6 | +from __future__ import annotations | ||
| 7 | + | ||
| 8 | +import json | ||
| 9 | +import logging | ||
| 10 | +import time | ||
| 11 | +from typing import Any | ||
| 12 | + | ||
| 13 | +import requests | ||
| 14 | +import websockets.sync.client as ws_client | ||
| 15 | + | ||
| 16 | +from .errors import CDPError, ElementNotFoundError | ||
| 17 | +from .stealth import STEALTH_JS | ||
| 18 | + | ||
| 19 | +logger = logging.getLogger(__name__) | ||
| 20 | + | ||
| 21 | + | ||
| 22 | +class CDPClient: | ||
| 23 | + """Low-level CDP WebSocket client.""" | ||
| 24 | + | ||
| 25 | + def __init__(self, ws_url: str) -> None: | ||
| 26 | + self._ws = ws_client.connect(ws_url, max_size=50 * 1024 * 1024) | ||
| 27 | + self._id = 0 | ||
| 28 | + self._callbacks: dict[int, Any] = {} | ||
| 29 | + | ||
| 30 | + def send(self, method: str, params: dict | None = None) -> dict: | ||
| 31 | + """发送 CDP 命令并等待结果。""" | ||
| 32 | + self._id += 1 | ||
| 33 | + msg: dict[str, Any] = {"id": self._id, "method": method} | ||
| 34 | + if params: | ||
| 35 | + msg["params"] = params | ||
| 36 | + self._ws.send(json.dumps(msg)) | ||
| 37 | + return self._wait_for(self._id) | ||
| 38 | + | ||
| 39 | + def _wait_for(self, msg_id: int, timeout: float = 30.0) -> dict: | ||
| 40 | + """等待指定 id 的响应。""" | ||
| 41 | + deadline = time.monotonic() + timeout | ||
| 42 | + while time.monotonic() < deadline: | ||
| 43 | + try: | ||
| 44 | + raw = self._ws.recv(timeout=max(0.1, deadline - time.monotonic())) | ||
| 45 | + except TimeoutError: | ||
| 46 | + break | ||
| 47 | + data = json.loads(raw) | ||
| 48 | + if data.get("id") == msg_id: | ||
| 49 | + if "error" in data: | ||
| 50 | + raise CDPError(f"CDP 错误: {data['error']}") | ||
| 51 | + return data.get("result", {}) | ||
| 52 | + raise CDPError(f"等待 CDP 响应超时 (id={msg_id})") | ||
| 53 | + | ||
| 54 | + def close(self) -> None: | ||
| 55 | + import contextlib | ||
| 56 | + | ||
| 57 | + with contextlib.suppress(Exception): | ||
| 58 | + self._ws.close() | ||
| 59 | + | ||
| 60 | + | ||
| 61 | +class Page: | ||
| 62 | + """CDP page object that wraps common operations.""" | ||
| 63 | + | ||
| 64 | + def __init__(self, cdp: CDPClient, target_id: str, session_id: str) -> None: | ||
| 65 | + self._cdp = cdp | ||
| 66 | + self.target_id = target_id | ||
| 67 | + self.session_id = session_id | ||
| 68 | + self._ws = cdp._ws | ||
| 69 | + self._id_counter = 1000 | ||
| 70 | + | ||
| 71 | + def _send_session(self, method: str, params: dict | None = None) -> dict: | ||
| 72 | + """向 session 发送命令。""" | ||
| 73 | + self._id_counter += 1 | ||
| 74 | + msg: dict[str, Any] = { | ||
| 75 | + "id": self._id_counter, | ||
| 76 | + "method": method, | ||
| 77 | + "sessionId": self.session_id, | ||
| 78 | + } | ||
| 79 | + if params: | ||
| 80 | + msg["params"] = params | ||
| 81 | + self._ws.send(json.dumps(msg)) | ||
| 82 | + return self._wait_session(self._id_counter) | ||
| 83 | + | ||
| 84 | + def _wait_session(self, msg_id: int, timeout: float = 60.0) -> dict: | ||
| 85 | + """等待 session 响应。""" | ||
| 86 | + deadline = time.monotonic() + timeout | ||
| 87 | + while time.monotonic() < deadline: | ||
| 88 | + try: | ||
| 89 | + raw = self._ws.recv(timeout=max(0.1, deadline - time.monotonic())) | ||
| 90 | + except TimeoutError: | ||
| 91 | + break | ||
| 92 | + data = json.loads(raw) | ||
| 93 | + if data.get("id") == msg_id: | ||
| 94 | + if "error" in data: | ||
| 95 | + raise CDPError(f"CDP 错误: {data['error']}") | ||
| 96 | + return data.get("result", {}) | ||
| 97 | + raise CDPError(f"等待 session 响应超时 (id={msg_id})") | ||
| 98 | + | ||
| 99 | + def navigate(self, url: str) -> None: | ||
| 100 | + """导航到指定 URL。""" | ||
| 101 | + logger.info("导航到: %s", url) | ||
| 102 | + self._send_session("Page.navigate", {"url": url}) | ||
| 103 | + | ||
| 104 | + def wait_for_load(self, timeout: float = 60.0) -> None: | ||
| 105 | + """等待页面加载完成(通过轮询 document.readyState)。""" | ||
| 106 | + deadline = time.monotonic() + timeout | ||
| 107 | + while time.monotonic() < deadline: | ||
| 108 | + try: | ||
| 109 | + state = self.evaluate("document.readyState") | ||
| 110 | + if state == "complete": | ||
| 111 | + return | ||
| 112 | + except CDPError: | ||
| 113 | + pass | ||
| 114 | + time.sleep(0.5) | ||
| 115 | + logger.warning("等待页面加载超时") | ||
| 116 | + | ||
| 117 | + def wait_dom_stable(self, timeout: float = 10.0, interval: float = 0.5) -> None: | ||
| 118 | + """Wait until the DOM is stable (two consecutive polls report the same body.innerHTML length).""" | ||
| 119 | + last_html = "" | ||
| 120 | + deadline = time.monotonic() + timeout | ||
| 121 | + while time.monotonic() < deadline: | ||
| 122 | + try: | ||
| 123 | + html = self.evaluate("document.body ? document.body.innerHTML.length : 0") | ||
| 124 | + if html == last_html and html != "": | ||
| 125 | + return | ||
| 126 | + last_html = html | ||
| 127 | + except CDPError: | ||
| 128 | + pass | ||
| 129 | + time.sleep(interval) | ||
| 130 | + | ||
| 131 | + def evaluate(self, expression: str, timeout: float = 30.0) -> Any: | ||
| 132 | + """执行 JavaScript 表达式并返回结果。""" | ||
| 133 | + result = self._send_session( | ||
| 134 | + "Runtime.evaluate", | ||
| 135 | + { | ||
| 136 | + "expression": expression, | ||
| 137 | + "returnByValue": True, | ||
| 138 | + "awaitPromise": False, | ||
| 139 | + }, | ||
| 140 | + ) | ||
| 141 | + if "exceptionDetails" in result: | ||
| 142 | + raise CDPError(f"JS 执行异常: {result['exceptionDetails']}") | ||
| 143 | + remote_obj = result.get("result", {}) | ||
| 144 | + return remote_obj.get("value") | ||
| 145 | + | ||
| 146 | + def evaluate_function(self, function_body: str, *args: Any) -> Any: | ||
| 147 | + """Call a JavaScript function and return its result. | ||
| 148 | + | ||
| 149 | + function_body is a full function expression, e.g. `() => { return 1; }`; args are JSON-encoded and passed to it. | ||
| 150 | + """ | ||
| 151 | + result = self._send_session( | ||
| 152 | + "Runtime.evaluate", | ||
| 153 | + { | ||
| 154 | + "expression": f"({function_body})({', '.join(json.dumps(a) for a in args)})", | ||
| 155 | + "returnByValue": True, | ||
| 156 | + "awaitPromise": False, | ||
| 157 | + }, | ||
| 158 | + ) | ||
| 159 | + if "exceptionDetails" in result: | ||
| 160 | + raise CDPError(f"JS 函数执行异常: {result['exceptionDetails']}") | ||
| 161 | + remote_obj = result.get("result", {}) | ||
| 162 | + return remote_obj.get("value") | ||
| 163 | + | ||
| 164 | + def query_selector(self, selector: str) -> str | None: | ||
| 165 | + """查找单个元素,返回 objectId 或 None。""" | ||
| 166 | + result = self._send_session( | ||
| 167 | + "Runtime.evaluate", | ||
| 168 | + { | ||
| 169 | + "expression": f"document.querySelector({json.dumps(selector)})", | ||
| 170 | + "returnByValue": False, | ||
| 171 | + }, | ||
| 172 | + ) | ||
| 173 | + remote_obj = result.get("result", {}) | ||
| 174 | + if remote_obj.get("subtype") == "null" or remote_obj.get("type") == "undefined": | ||
| 175 | + return None | ||
| 176 | + return remote_obj.get("objectId") | ||
| 177 | + | ||
| 178 | + def query_selector_all(self, selector: str) -> list[str]: | ||
| 179 | + """查找多个元素,返回 objectId 列表。""" | ||
| 180 | + # 通过 JS 返回元素数量,然后逐个获取 | ||
| 181 | + count = self.evaluate(f"document.querySelectorAll({json.dumps(selector)}).length") | ||
| 182 | + if not count: | ||
| 183 | + return [] | ||
| 184 | + object_ids = [] | ||
| 185 | + for i in range(count): | ||
| 186 | + result = self._send_session( | ||
| 187 | + "Runtime.evaluate", | ||
| 188 | + { | ||
| 189 | + "expression": (f"document.querySelectorAll({json.dumps(selector)})[{i}]"), | ||
| 190 | + "returnByValue": False, | ||
| 191 | + }, | ||
| 192 | + ) | ||
| 193 | + obj = result.get("result", {}) | ||
| 194 | + oid = obj.get("objectId") | ||
| 195 | + if oid: | ||
| 196 | + object_ids.append(oid) | ||
| 197 | + return object_ids | ||
| 198 | + | ||
| 199 | + def has_element(self, selector: str) -> bool: | ||
| 200 | + """检查元素是否存在。""" | ||
| 201 | + return self.evaluate(f"document.querySelector({json.dumps(selector)}) !== null") is True | ||
| 202 | + | ||
| 203 | + def wait_for_element(self, selector: str, timeout: float = 30.0) -> str: | ||
| 204 | + """等待元素出现,返回 objectId。""" | ||
| 205 | + deadline = time.monotonic() + timeout | ||
| 206 | + while time.monotonic() < deadline: | ||
| 207 | + oid = self.query_selector(selector) | ||
| 208 | + if oid: | ||
| 209 | + return oid | ||
| 210 | + time.sleep(0.5) | ||
| 211 | + raise ElementNotFoundError(selector) | ||
| 212 | + | ||
| 213 | + def click_element(self, selector: str) -> None: | ||
| 214 | + """点击指定选择器的元素。""" | ||
| 215 | + self.evaluate( | ||
| 216 | + f""" | ||
| 217 | + (() => {{ | ||
| 218 | + const el = document.querySelector({json.dumps(selector)}); | ||
| 219 | + if (el) el.click(); | ||
| 220 | + }})() | ||
| 221 | + """ | ||
| 222 | + ) | ||
| 223 | + | ||
| 224 | + def input_text(self, selector: str, text: str) -> None: | ||
| 225 | + """向指定选择器的元素输入文本。""" | ||
| 226 | + self.evaluate( | ||
| 227 | + f""" | ||
| 228 | + (() => {{ | ||
| 229 | + const el = document.querySelector({json.dumps(selector)}); | ||
| 230 | + if (!el) return; | ||
| 231 | + el.focus(); | ||
| 232 | + el.value = {json.dumps(text)}; | ||
| 233 | + el.dispatchEvent(new Event('input', {{bubbles: true}})); | ||
| 234 | + el.dispatchEvent(new Event('change', {{bubbles: true}})); | ||
| 235 | + }})() | ||
| 236 | + """ | ||
| 237 | + ) | ||
| 238 | + | ||
| 239 | + def input_content_editable(self, selector: str, text: str) -> None: | ||
| 240 | + """向 contentEditable 元素输入文本(如 div.ql-editor)。""" | ||
| 241 | + self.evaluate( | ||
| 242 | + f""" | ||
| 243 | + (() => {{ | ||
| 244 | + const el = document.querySelector({json.dumps(selector)}); | ||
| 245 | + if (!el) return; | ||
| 246 | + el.focus(); | ||
| 247 | + el.textContent = {json.dumps(text)}; | ||
| 248 | + el.dispatchEvent(new Event('input', {{bubbles: true}})); | ||
| 249 | + }})() | ||
| 250 | + """ | ||
| 251 | + ) | ||
| 252 | + | ||
| 253 | + def get_element_text(self, selector: str) -> str | None: | ||
| 254 | + """获取元素文本内容。""" | ||
| 255 | + return self.evaluate( | ||
| 256 | + f""" | ||
| 257 | + (() => {{ | ||
| 258 | + const el = document.querySelector({json.dumps(selector)}); | ||
| 259 | + return el ? el.textContent : null; | ||
| 260 | + }})() | ||
| 261 | + """ | ||
| 262 | + ) | ||
| 263 | + | ||
| 264 | + def get_element_attribute(self, selector: str, attr: str) -> str | None: | ||
| 265 | + """获取元素属性值。""" | ||
| 266 | + return self.evaluate( | ||
| 267 | + f""" | ||
| 268 | + (() => {{ | ||
| 269 | + const el = document.querySelector({json.dumps(selector)}); | ||
| 270 | + return el ? el.getAttribute({json.dumps(attr)}) : null; | ||
| 271 | + }})() | ||
| 272 | + """ | ||
| 273 | + ) | ||
| 274 | + | ||
| 275 | + def get_elements_count(self, selector: str) -> int: | ||
| 276 | + """获取匹配元素数量。""" | ||
| 277 | + result = self.evaluate(f"document.querySelectorAll({json.dumps(selector)}).length") | ||
| 278 | + return result if isinstance(result, int) else 0 | ||
| 279 | + | ||
| 280 | + def scroll_by(self, x: int, y: int) -> None: | ||
| 281 | + """滚动页面。""" | ||
| 282 | + self.evaluate(f"window.scrollBy({x}, {y})") | ||
| 283 | + | ||
| 284 | + def scroll_to(self, x: int, y: int) -> None: | ||
| 285 | + """滚动到指定位置。""" | ||
| 286 | + self.evaluate(f"window.scrollTo({x}, {y})") | ||
| 287 | + | ||
| 288 | + def scroll_to_bottom(self) -> None: | ||
| 289 | + """滚动到页面底部。""" | ||
| 290 | + self.evaluate("window.scrollTo(0, document.body.scrollHeight)") | ||
| 291 | + | ||
| 292 | + def scroll_element_into_view(self, selector: str) -> None: | ||
| 293 | + """将元素滚动到可视区域。""" | ||
| 294 | + self.evaluate( | ||
| 295 | + f""" | ||
| 296 | + (() => {{ | ||
| 297 | + const el = document.querySelector({json.dumps(selector)}); | ||
| 298 | + if (el) el.scrollIntoView({{behavior: 'smooth', block: 'center'}}); | ||
| 299 | + }})() | ||
| 300 | + """ | ||
| 301 | + ) | ||
| 302 | + | ||
| 303 | + def scroll_nth_element_into_view(self, selector: str, index: int) -> None: | ||
| 304 | + """将第 N 个匹配元素滚动到可视区域。""" | ||
| 305 | + self.evaluate( | ||
| 306 | + f""" | ||
| 307 | + (() => {{ | ||
| 308 | + const els = document.querySelectorAll({json.dumps(selector)}); | ||
| 309 | + if (els[{index}]) els[{index}].scrollIntoView( | ||
| 310 | + {{behavior: 'smooth', block: 'center'}} | ||
| 311 | + ); | ||
| 312 | + }})() | ||
| 313 | + """ | ||
| 314 | + ) | ||
| 315 | + | ||
| 316 | + def get_scroll_top(self) -> int: | ||
| 317 | + """获取当前滚动位置。""" | ||
| 318 | + result = self.evaluate( | ||
| 319 | + "window.pageYOffset || document.documentElement.scrollTop" | ||
| 320 | + " || document.body.scrollTop || 0" | ||
| 321 | + ) | ||
| 322 | + return int(result) if result else 0 | ||
| 323 | + | ||
| 324 | + def get_viewport_height(self) -> int: | ||
| 325 | + """获取视口高度。""" | ||
| 326 | + result = self.evaluate("window.innerHeight") | ||
| 327 | + return int(result) if result else 768 | ||
| 328 | + | ||
| 329 | + def set_file_input(self, selector: str, files: list[str]) -> None: | ||
| 330 | + """设置文件输入框的文件(通过 CDP DOM.setFileInputFiles)。""" | ||
| 331 | + # 先获取 nodeId | ||
| 332 | + doc = self._send_session("DOM.getDocument", {"depth": 0}) | ||
| 333 | + root_node_id = doc["root"]["nodeId"] | ||
| 334 | + result = self._send_session( | ||
| 335 | + "DOM.querySelector", | ||
| 336 | + {"nodeId": root_node_id, "selector": selector}, | ||
| 337 | + ) | ||
| 338 | + node_id = result.get("nodeId", 0) | ||
| 339 | + if node_id == 0: | ||
| 340 | + raise ElementNotFoundError(selector) | ||
| 341 | + self._send_session( | ||
| 342 | + "DOM.setFileInputFiles", | ||
| 343 | + {"nodeId": node_id, "files": files}, | ||
| 344 | + ) | ||
| 345 | + | ||
| 346 | + def dispatch_wheel_event(self, delta_y: float) -> None: | ||
| 347 | + """触发滚轮事件以激活懒加载。""" | ||
| 348 | + self.evaluate( | ||
| 349 | + f""" | ||
| 350 | + (() => {{ | ||
| 351 | + let target = document.querySelector('.note-scroller') | ||
| 352 | + || document.querySelector('.interaction-container') | ||
| 353 | + || document.documentElement; | ||
| 354 | + const event = new WheelEvent('wheel', {{ | ||
| 355 | + deltaY: {delta_y}, | ||
| 356 | + deltaMode: 0, | ||
| 357 | + bubbles: true, | ||
| 358 | + cancelable: true, | ||
| 359 | + view: window, | ||
| 360 | + }}); | ||
| 361 | + target.dispatchEvent(event); | ||
| 362 | + }})() | ||
| 363 | + """ | ||
| 364 | + ) | ||
| 365 | + | ||
| 366 | + def mouse_move(self, x: float, y: float) -> None: | ||
| 367 | + """移动鼠标。""" | ||
| 368 | + self._send_session( | ||
| 369 | + "Input.dispatchMouseEvent", | ||
| 370 | + {"type": "mouseMoved", "x": x, "y": y}, | ||
| 371 | + ) | ||
| 372 | + | ||
| 373 | + def mouse_click(self, x: float, y: float, button: str = "left") -> None: | ||
| 374 | + """在指定坐标点击。""" | ||
| 375 | + self._send_session( | ||
| 376 | + "Input.dispatchMouseEvent", | ||
| 377 | + {"type": "mousePressed", "x": x, "y": y, "button": button, "clickCount": 1}, | ||
| 378 | + ) | ||
| 379 | + self._send_session( | ||
| 380 | + "Input.dispatchMouseEvent", | ||
| 381 | + {"type": "mouseReleased", "x": x, "y": y, "button": button, "clickCount": 1}, | ||
| 382 | + ) | ||
| 383 | + | ||
| 384 | + def type_text(self, text: str, delay_ms: int = 50) -> None: | ||
| 385 | + """逐字符输入文本。""" | ||
| 386 | + for char in text: | ||
| 387 | + self._send_session( | ||
| 388 | + "Input.dispatchKeyEvent", | ||
| 389 | + {"type": "keyDown", "text": char}, | ||
| 390 | + ) | ||
| 391 | + self._send_session( | ||
| 392 | + "Input.dispatchKeyEvent", | ||
| 393 | + {"type": "keyUp", "text": char}, | ||
| 394 | + ) | ||
| 395 | + if delay_ms > 0: | ||
| 396 | + time.sleep(delay_ms / 1000.0) | ||
| 397 | + | ||
| 398 | + def press_key(self, key: str) -> None: | ||
| 399 | + """按下并释放指定键。""" | ||
| 400 | + key_map = { | ||
| 401 | + "Enter": {"key": "Enter", "code": "Enter", "windowsVirtualKeyCode": 13}, | ||
| 402 | + "ArrowDown": { | ||
| 403 | + "key": "ArrowDown", | ||
| 404 | + "code": "ArrowDown", | ||
| 405 | + "windowsVirtualKeyCode": 40, | ||
| 406 | + }, | ||
| 407 | + "Tab": {"key": "Tab", "code": "Tab", "windowsVirtualKeyCode": 9}, | ||
| 408 | + } | ||
| 409 | + info = key_map.get(key, {"key": key, "code": key}) | ||
| 410 | + self._send_session( | ||
| 411 | + "Input.dispatchKeyEvent", | ||
| 412 | + {"type": "keyDown", **info}, | ||
| 413 | + ) | ||
| 414 | + self._send_session( | ||
| 415 | + "Input.dispatchKeyEvent", | ||
| 416 | + {"type": "keyUp", **info}, | ||
| 417 | + ) | ||
| 418 | + | ||
| 419 | + def inject_stealth(self) -> None: | ||
| 420 | + """注入反检测脚本。""" | ||
| 421 | + self._send_session( | ||
| 422 | + "Page.addScriptToEvaluateOnNewDocument", | ||
| 423 | + {"source": STEALTH_JS}, | ||
| 424 | + ) | ||
| 425 | + | ||
| 426 | + def remove_element(self, selector: str) -> None: | ||
| 427 | + """移除 DOM 元素。""" | ||
| 428 | + self.evaluate( | ||
| 429 | + f""" | ||
| 430 | + (() => {{ | ||
| 431 | + const el = document.querySelector({json.dumps(selector)}); | ||
| 432 | + if (el) el.remove(); | ||
| 433 | + }})() | ||
| 434 | + """ | ||
| 435 | + ) | ||
| 436 | + | ||
| 437 | + def hover_element(self, selector: str) -> None: | ||
| 438 | + """悬停到元素中心。""" | ||
| 439 | + box = self.evaluate( | ||
| 440 | + f""" | ||
| 441 | + (() => {{ | ||
| 442 | + const el = document.querySelector({json.dumps(selector)}); | ||
| 443 | + if (!el) return null; | ||
| 444 | + const rect = el.getBoundingClientRect(); | ||
| 445 | + return {{x: rect.left + rect.width / 2, y: rect.top + rect.height / 2}}; | ||
| 446 | + }})() | ||
| 447 | + """ | ||
| 448 | + ) | ||
| 449 | + if box: | ||
| 450 | + self.mouse_move(box["x"], box["y"]) | ||
| 451 | + | ||
| 452 | + def select_all_text(self, selector: str) -> None: | ||
| 453 | + """选中输入框内所有文本。""" | ||
| 454 | + self.evaluate( | ||
| 455 | + f""" | ||
| 456 | + (() => {{ | ||
| 457 | + const el = document.querySelector({json.dumps(selector)}); | ||
| 458 | + if (!el) return; | ||
| 459 | + el.focus(); | ||
| 460 | + el.select ? el.select() : document.execCommand('selectAll'); | ||
| 461 | + }})() | ||
| 462 | + """ | ||
| 463 | + ) | ||
| 464 | + | ||
| 465 | + | ||
| 466 | +class Browser: | ||
| 467 | + """CDP controller for a Chrome browser.""" | ||
| 468 | + | ||
| 469 | + def __init__(self, host: str = "127.0.0.1", port: int = 9222) -> None: | ||
| 470 | + self.host = host | ||
| 471 | + self.port = port | ||
| 472 | + self.base_url = f"http://{host}:{port}" | ||
| 473 | + self._cdp: CDPClient | None = None | ||
| 474 | + | ||
| 475 | + def connect(self) -> None: | ||
| 476 | + """连接到 Chrome DevTools。""" | ||
| 477 | + resp = requests.get(f"{self.base_url}/json/version", timeout=5) | ||
| 478 | + resp.raise_for_status() | ||
| 479 | + info = resp.json() | ||
| 480 | + ws_url = info["webSocketDebuggerUrl"] | ||
| 481 | + logger.info("连接到 Chrome: %s", ws_url) | ||
| 482 | + self._cdp = CDPClient(ws_url) | ||
| 483 | + | ||
| 484 | + def new_page(self, url: str = "about:blank") -> Page: | ||
| 485 | + """创建新页面。""" | ||
| 486 | + if not self._cdp: | ||
| 487 | + self.connect() | ||
| 488 | + assert self._cdp is not None | ||
| 489 | + | ||
| 490 | + # 创建 target | ||
| 491 | + result = self._cdp.send("Target.createTarget", {"url": url}) | ||
| 492 | + target_id = result["targetId"] | ||
| 493 | + | ||
| 494 | + # 附加到 target | ||
| 495 | + result = self._cdp.send( | ||
| 496 | + "Target.attachToTarget", | ||
| 497 | + {"targetId": target_id, "flatten": True}, | ||
| 498 | + ) | ||
| 499 | + session_id = result["sessionId"] | ||
| 500 | + | ||
| 501 | + page = Page(self._cdp, target_id, session_id) | ||
| 502 | + | ||
| 503 | + # 启用必要的 domain | ||
| 504 | + page._send_session("Page.enable") | ||
| 505 | + page._send_session("DOM.enable") | ||
| 506 | + page._send_session("Runtime.enable") | ||
| 507 | + | ||
| 508 | + # 注入反检测 | ||
| 509 | + page.inject_stealth() | ||
| 510 | + | ||
| 511 | + return page | ||
| 512 | + | ||
| 513 | + def get_existing_page(self) -> Page | None: | ||
| 514 | + """Return an existing page (the first page target whose URL is not about:blank).""" | ||
| 515 | + if not self._cdp: | ||
| 516 | + self.connect() | ||
| 517 | + assert self._cdp is not None | ||
| 518 | + | ||
| 519 | + resp = requests.get(f"{self.base_url}/json", timeout=5) | ||
| 520 | + targets = resp.json() | ||
| 521 | + | ||
| 522 | + for target in targets: | ||
| 523 | + if target.get("type") == "page" and target.get("url") != "about:blank": | ||
| 524 | + target_id = target["id"] | ||
| 525 | + result = self._cdp.send( | ||
| 526 | + "Target.attachToTarget", | ||
| 527 | + {"targetId": target_id, "flatten": True}, | ||
| 528 | + ) | ||
| 529 | + session_id = result["sessionId"] | ||
| 530 | + page = Page(self._cdp, target_id, session_id) | ||
| 531 | + page._send_session("Page.enable") | ||
| 532 | + page._send_session("DOM.enable") | ||
| 533 | + page._send_session("Runtime.enable") | ||
| 534 | + page.inject_stealth() | ||
| 535 | + return page | ||
| 536 | + return None | ||
| 537 | + | ||
| 538 | + def close_page(self, page: Page) -> None: | ||
| 539 | + """关闭页面。""" | ||
| 540 | + import contextlib | ||
| 541 | + | ||
| 542 | + if self._cdp: | ||
| 543 | + with contextlib.suppress(CDPError): | ||
| 544 | + self._cdp.send("Target.closeTarget", {"targetId": page.target_id}) | ||
| 545 | + | ||
| 546 | + def close(self) -> None: | ||
| 547 | + """关闭连接。""" | ||
| 548 | + if self._cdp: | ||
| 549 | + self._cdp.close() | ||
| 550 | + self._cdp = None |
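Review note: `CDPClient.send` and `Page._send_session` share one pattern. Each writes a JSON command with a numeric `id`, then reads frames until a reply with the same `id` arrives, discarding everything else. The sketch below replays that matching logic against a canned list of frames, so it needs neither Chrome nor a WebSocket. `build_command` and `pick_response` are hypothetical names, and plain `RuntimeError`/`TimeoutError` stand in for the commit's `CDPError`.

```python
import json
from typing import Any, Dict, Iterable, Optional


def build_command(
    msg_id: int,
    method: str,
    params: Optional[dict] = None,
    session_id: Optional[str] = None,
) -> str:
    """Serialize one CDP command; sessionId is set only for flat-session targets."""
    msg: Dict[str, Any] = {"id": msg_id, "method": method}
    if session_id:
        msg["sessionId"] = session_id
    if params:
        msg["params"] = params
    return json.dumps(msg)


def pick_response(frames: Iterable[str], msg_id: int) -> dict:
    """Return the result for msg_id, skipping events and unrelated replies."""
    for raw in frames:
        data = json.loads(raw)
        if data.get("id") != msg_id:
            continue  # an event (no id) or a reply to another command
        if "error" in data:
            raise RuntimeError(f"CDP error: {data['error']}")
        return data.get("result", {})
    raise TimeoutError(f"no response for id={msg_id}")


# Canned frames standing in for what the WebSocket would deliver.
frames = [
    json.dumps({"method": "Page.frameNavigated", "params": {}}),
    json.dumps({"id": 6, "result": {}}),
    json.dumps({"id": 7, "result": {"result": {"type": "number", "value": 2}}}),
]
command = build_command(
    7, "Runtime.evaluate", {"expression": "1 + 1", "returnByValue": True}, "SESSION"
)
reply = pick_response(frames, 7)
print(reply["result"]["value"])
```

One consequence of this design is visible in the sketch: any event that arrives while a command is pending is dropped, which is why the page helpers in this file poll (`wait_for_load`, `wait_for_element`) rather than subscribe to events.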
scripts/xhs/comment.py
0 → 100644
| 1 | +"""Comment operations, mirroring Go xiaohongshu/comment_feed.go.""" | ||
| 2 | + | ||
| 3 | +from __future__ import annotations | ||
| 4 | + | ||
| 5 | +import logging | ||
| 6 | +import time | ||
| 7 | + | ||
| 8 | +from .cdp import Page | ||
| 9 | +from .feed_detail import _check_end_container, _check_page_accessible, _get_comment_count | ||
| 10 | +from .selectors import ( | ||
| 11 | + COMMENT_INPUT_FIELD, | ||
| 12 | + COMMENT_INPUT_TRIGGER, | ||
| 13 | + COMMENT_SUBMIT_BUTTON, | ||
| 14 | + PARENT_COMMENT, | ||
| 15 | + REPLY_BUTTON, | ||
| 16 | +) | ||
| 17 | +from .urls import make_feed_detail_url | ||
| 18 | + | ||
| 19 | +logger = logging.getLogger(__name__) | ||
| 20 | + | ||
| 21 | + | ||
| 22 | +def post_comment(page: Page, feed_id: str, xsec_token: str, content: str) -> None: | ||
| 23 | + """Post a comment on a feed. | ||
| 24 | + | ||
| 25 | + Args: | ||
| 26 | + page: CDP page object. | ||
| 27 | + feed_id: Feed ID. | ||
| 28 | + xsec_token: The feed's xsec_token. | ||
| 29 | + content: Comment text. | ||
| 30 | + | ||
| 31 | + Raises: | ||
| 32 | + RuntimeError: The comment could not be posted. | ||
| 33 | + """ | ||
| 34 | + url = make_feed_detail_url(feed_id, xsec_token) | ||
| 35 | + logger.info("打开 feed 详情页: %s", url) | ||
| 36 | + | ||
| 37 | + page.navigate(url) | ||
| 38 | + page.wait_for_load() | ||
| 39 | + page.wait_dom_stable() | ||
| 40 | + time.sleep(1) | ||
| 41 | + | ||
| 42 | + _check_page_accessible(page) | ||
| 43 | + | ||
| 44 | + # 点击评论输入触发区域 | ||
| 45 | + if not page.has_element(COMMENT_INPUT_TRIGGER): | ||
| 46 | + raise RuntimeError("未找到评论输入框,该帖子可能不支持评论或网页端不可访问") | ||
| 47 | + | ||
| 48 | + page.click_element(COMMENT_INPUT_TRIGGER) | ||
| 49 | + time.sleep(0.5) | ||
| 50 | + | ||
| 51 | + # 输入评论内容 | ||
| 52 | + page.wait_for_element(COMMENT_INPUT_FIELD, timeout=5) | ||
| 53 | + page.evaluate( | ||
| 54 | + f""" | ||
| 55 | + (() => {{ | ||
| 56 | + const el = document.querySelector({_js_str(COMMENT_INPUT_FIELD)}); | ||
| 57 | + if (el) {{ | ||
| 58 | + el.focus(); | ||
| 59 | + el.textContent = {_js_str(content)}; | ||
| 60 | + el.dispatchEvent(new Event('input', {{bubbles: true}})); | ||
| 61 | + }} | ||
| 62 | + }})() | ||
| 63 | + """ | ||
| 64 | + ) | ||
| 65 | + time.sleep(1) | ||
| 66 | + | ||
| 67 | + # 点击提交 | ||
| 68 | + page.click_element(COMMENT_SUBMIT_BUTTON) | ||
| 69 | + time.sleep(1) | ||
| 70 | + | ||
| 71 | + logger.info("评论发送成功: feed=%s", feed_id) | ||
| 72 | + | ||
| 73 | + | ||
| 74 | +def reply_comment( | ||
| 75 | + page: Page, | ||
| 76 | + feed_id: str, | ||
| 77 | + xsec_token: str, | ||
| 78 | + content: str, | ||
| 79 | + comment_id: str = "", | ||
| 80 | + user_id: str = "", | ||
| 81 | +) -> None: | ||
| 82 | + """Reply to a specific comment, located by comment_id or user_id. | ||
| 83 | + | ||
| 84 | + Args: | ||
| 85 | + page: CDP page object. | ||
| 86 | + feed_id: Feed ID. | ||
| 87 | + xsec_token: The feed's xsec_token. | ||
| 88 | + content: Reply text. | ||
| 89 | + comment_id: Comment ID (preferred). | ||
| 90 | + user_id: User ID (fallback). | ||
| 91 | + | ||
| 92 | + Raises: | ||
| 93 | + ValueError: Neither comment_id nor user_id was given. | ||
| 94 | + RuntimeError: The reply failed, for example because the | ||
| 95 | + target comment was not found. | ||
| 96 | + """ | ||
| 97 | + if not comment_id and not user_id: | ||
| 98 | + raise ValueError("comment_id 和 user_id 至少提供一个") | ||
| 99 | + | ||
| 100 | + url = make_feed_detail_url(feed_id, xsec_token) | ||
| 101 | + logger.info("打开 feed 详情页进行回复: %s", url) | ||
| 102 | + | ||
| 103 | + page.navigate(url) | ||
| 104 | + page.wait_for_load() | ||
| 105 | + page.wait_dom_stable() | ||
| 106 | + time.sleep(1) | ||
| 107 | + | ||
| 108 | + _check_page_accessible(page) | ||
| 109 | + time.sleep(2) | ||
| 110 | + | ||
| 111 | + # 查找目标评论 | ||
| 112 | + comment_found = _find_and_scroll_to_comment(page, comment_id, user_id) | ||
| 113 | + if not comment_found: | ||
| 114 | + raise RuntimeError(f"未找到评论 (commentID: {comment_id}, userID: {user_id})") | ||
| 115 | + | ||
| 116 | + time.sleep(1) | ||
| 117 | + | ||
| 118 | + # 点击回复按钮 | ||
| 119 | + reply_selector = f"#comment-{comment_id} {REPLY_BUTTON}" if comment_id else REPLY_BUTTON | ||
| 120 | + page.click_element(reply_selector) | ||
| 121 | + time.sleep(1) | ||
| 122 | + | ||
| 123 | + # 输入回复内容 | ||
| 124 | + page.wait_for_element(COMMENT_INPUT_FIELD, timeout=5) | ||
| 125 | + page.evaluate( | ||
| 126 | + f""" | ||
| 127 | + (() => {{ | ||
| 128 | + const el = document.querySelector({_js_str(COMMENT_INPUT_FIELD)}); | ||
| 129 | + if (el) {{ | ||
| 130 | + el.focus(); | ||
| 131 | + el.textContent = {_js_str(content)}; | ||
| 132 | + el.dispatchEvent(new Event('input', {{bubbles: true}})); | ||
| 133 | + }} | ||
| 134 | + }})() | ||
| 135 | + """ | ||
| 136 | + ) | ||
| 137 | + time.sleep(0.5) | ||
| 138 | + | ||
| 139 | + # 点击提交 | ||
| 140 | + page.click_element(COMMENT_SUBMIT_BUTTON) | ||
| 141 | + time.sleep(2) | ||
| 142 | + | ||
| 143 | + logger.info("回复评论成功") | ||
| 144 | + | ||
| 145 | + | ||
| 146 | +def _find_and_scroll_to_comment( | ||
| 147 | + page: Page, | ||
| 148 | + comment_id: str, | ||
| 149 | + user_id: str, | ||
| 150 | + max_attempts: int = 100, | ||
| 151 | +) -> bool: | ||
| 152 | + """查找并滚动到目标评论。""" | ||
| 153 | + logger.info("开始查找评论 - commentID: %s, userID: %s", comment_id, user_id) | ||
| 154 | + | ||
| 155 | + # 先滚动到评论区 | ||
| 156 | + page.scroll_element_into_view(".comments-container") | ||
| 157 | + time.sleep(1) | ||
| 158 | + | ||
| 159 | + last_count = 0 | ||
| 160 | + stagnant = 0 | ||
| 161 | + | ||
| 162 | + for attempt in range(max_attempts): | ||
| 163 | + # Stop at the end of the list, but only after at least one lookup pass has run | ||
| 164 | + if attempt > 0 and _check_end_container(page): | ||
| 165 | + logger.info("已到达评论底部,未找到目标评论") | ||
| 166 | + break | ||
| 167 | + | ||
| 168 | + # 停滞检测 | ||
| 169 | + current_count = _get_comment_count(page) | ||
| 170 | + if current_count != last_count: | ||
| 171 | + last_count = current_count | ||
| 172 | + stagnant = 0 | ||
| 173 | + else: | ||
| 174 | + stagnant += 1 | ||
| 175 | + if stagnant >= 10: | ||
| 176 | + logger.info("评论数量停滞超过10次") | ||
| 177 | + break | ||
| 178 | + | ||
| 179 | + # 滚动到最后一条评论 | ||
| 180 | + if current_count > 0: | ||
| 181 | + page.scroll_nth_element_into_view(PARENT_COMMENT, current_count - 1) | ||
| 182 | + time.sleep(0.3) | ||
| 183 | + | ||
| 184 | + # 继续滚动 | ||
| 185 | + page.evaluate("window.scrollBy(0, window.innerHeight * 0.8)") | ||
| 186 | + time.sleep(0.5) | ||
| 187 | + | ||
| 188 | + # 通过 commentID 查找 | ||
| 189 | + if comment_id: | ||
| 190 | + selector = f"#comment-{comment_id}" | ||
| 191 | + if page.has_element(selector): | ||
| 192 | + logger.info("通过 commentID 找到评论 (尝试 %d 次)", attempt + 1) | ||
| 193 | + page.scroll_element_into_view(selector) | ||
| 194 | + return True | ||
| 195 | + | ||
| 196 | + # 通过 userID 查找 | ||
| 197 | + if user_id: | ||
| 198 | + found = page.evaluate( | ||
| 199 | + f""" | ||
| 200 | + (() => {{ | ||
| 201 | + const els = document.querySelectorAll( | ||
| 202 | + '.parent-comment, .comment-item, .comment' | ||
| 203 | + ); | ||
| 204 | + for (const el of els) {{ | ||
| 205 | + if (el.querySelector('[data-user-id="' + CSS.escape({_js_str(user_id)}) + '"]')) {{ | ||
| 206 | + el.scrollIntoView({{behavior: 'smooth', block: 'center'}}); | ||
| 207 | + return true; | ||
| 208 | + }} | ||
| 209 | + }} | ||
| 210 | + return false; | ||
| 211 | + }})() | ||
| 212 | + """ | ||
| 213 | + ) | ||
| 214 | + if found: | ||
| 215 | + logger.info("通过 userID 找到评论 (尝试 %d 次)", attempt + 1) | ||
| 216 | + return True | ||
| 217 | + | ||
| 218 | + time.sleep(0.8) | ||
| 219 | + | ||
| 220 | + return False | ||
| 221 | + | ||
| 222 | + | ||
| 223 | +def _js_str(s: str) -> str: | ||
| 224 | + """Convert a Python string to a JS string literal (quotes included).""" | ||
| 225 | + import json | ||
| 226 | + | ||
| 227 | + return json.dumps(s) |
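Review note: `_js_str` relies on the fact that a JSON string is also a valid JavaScript string literal, so `json.dumps` can carry arbitrary text into an `evaluate` expression without quoting problems. The sketch below shows that property in isolation. `js_literal` and `build_set_text_script` are hypothetical names, and the generated script is only built as a string here, not sent to a browser.

```python
import json


def js_literal(value: str) -> str:
    """Return value as a quoted JS string literal with quotes and newlines escaped."""
    return json.dumps(value)


def build_set_text_script(selector: str, text: str) -> str:
    """Build an IIFE that sets textContent on the first element matching selector."""
    return (
        "(() => {"
        f" const el = document.querySelector({js_literal(selector)});"
        f" if (el) el.textContent = {js_literal(text)};"
        " })()"
    )


tricky = 'He said "hi"\n\'); alert(1); //'
literal = js_literal(tricky)
script = build_set_text_script("#content-textarea", tricky)
print(script)
```

The literal contains no raw newline and no unescaped double quote, so the user-supplied text cannot terminate the string early. Values interpolated into a CSS selector need selector-level escaping in addition to this.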
scripts/xhs/cookies.py
0 → 100644
| 1 | +"""Cookie file persistence, mirroring Go cookies/cookies.go.""" | ||
| 2 | + | ||
| 3 | +from __future__ import annotations | ||
| 4 | + | ||
| 5 | +import os | ||
| 6 | +from pathlib import Path | ||
| 7 | + | ||
| 8 | + | ||
| 9 | +def get_cookies_file_path(account: str = "") -> str: | ||
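| 10 | + """Return the path of the cookies file. | ||
| 11 | + | ||
| 12 | + Resolution order (first match wins): | ||
| 13 | + 1. Multi-account mode: ~/.xhs/accounts/{account}/cookies.json | ||
| 14 | + 2. cookies.json in the system temp directory, if it exists (backward compatibility) | ||
| 15 | + 3. The COOKIES_PATH environment variable | ||
| 16 | + 4. ./cookies.json (local debugging) | ||
| 17 | + """ | ||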
| 18 | + if account: | ||
| 19 | + account_dir = Path.home() / ".xhs" / "accounts" / account | ||
| 20 | + account_dir.mkdir(parents=True, exist_ok=True) | ||
| 21 | + return str(account_dir / "cookies.json") | ||
| 22 | + | ||
| 23 | + # Legacy path | ||
| 24 | + import tempfile | ||
| 25 | + | ||
| 26 | + old_path = os.path.join(tempfile.gettempdir(), "cookies.json") | ||
| 27 | + if os.path.exists(old_path): | ||
| 28 | + return old_path | ||
| 29 | + | ||
| 30 | + # Environment variable | ||
| 31 | + env_path = os.getenv("COOKIES_PATH") | ||
| 32 | + if env_path: | ||
| 33 | + return env_path | ||
| 34 | + | ||
| 35 | + return "cookies.json" | ||
| 36 | + | ||
| 37 | + | ||
| 38 | +def load_cookies(path: str) -> bytes | None: | ||
| 39 | + """Load cookies from a file.""" | ||
| 40 | + try: | ||
| 41 | + with open(path, "rb") as f: | ||
| 42 | + return f.read() | ||
| 43 | + except FileNotFoundError: | ||
| 44 | + return None | ||
| 45 | + | ||
| 46 | + | ||
| 47 | +def save_cookies(path: str, data: bytes) -> None: | ||
| 48 | + """Save cookies to a file.""" | ||
| 49 | + os.makedirs(os.path.dirname(path) or ".", exist_ok=True) | ||
| 50 | + with open(path, "wb") as f: | ||
| 51 | + f.write(data) | ||
| 52 | + | ||
| 53 | + | ||
| 54 | +def delete_cookies(path: str) -> None: | ||
| 55 | + """Delete the cookies file.""" | ||
| 56 | + import contextlib | ||
| 57 | + | ||
| 58 | + with contextlib.suppress(FileNotFoundError): | ||
| 59 | + os.remove(path) |
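Review note: the lookup order in get_cookies_file_path can be sketched as a pure function. This is a minimal sketch, not the code in the diff: resolve_cookie_path, legacy_exists, and the "<tmp>" placeholder are hypothetical names introduced here so the order can be checked without touching the filesystem.

```python
from __future__ import annotations


def resolve_cookie_path(account: str, legacy_exists: bool, env_path: str | None) -> str:
    """Hypothetical pure version of the lookup order (no filesystem access)."""
    if account:
        # 1. A per-account profile always wins.
        return f"~/.xhs/accounts/{account}/cookies.json"
    if legacy_exists:
        # 2. Legacy temp-dir file, used only when it is already present.
        return "<tmp>/cookies.json"
    if env_path:
        # 3. COOKIES_PATH override.
        return env_path
    # 4. Local fallback for debugging.
    return "cookies.json"
```

Under these assumptions, an account name takes precedence even when the legacy file and COOKIES_PATH are both set.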
scripts/xhs/errors.py
0 → 100644
| 1 | +"""Exception hierarchy for Xiaohongshu automation.""" | ||
| 2 | + | ||
| 3 | + | ||
| 4 | +class XHSError(Exception): | ||
| 5 | + """Base exception for Xiaohongshu automation.""" | ||
| 6 | + | ||
| 7 | + | ||
| 8 | +class NoFeedsError(XHSError): | ||
| 9 | + """No feeds data was captured.""" | ||
| 10 | + | ||
| 11 | + def __init__(self) -> None: | ||
| 12 | + super().__init__("没有捕获到 feeds 数据") | ||
| 13 | + | ||
| 14 | + | ||
| 15 | +class NoFeedDetailError(XHSError): | ||
| 16 | + """No feed detail data was captured.""" | ||
| 17 | + | ||
| 18 | + def __init__(self) -> None: | ||
| 19 | + super().__init__("没有捕获到 feed 详情数据") | ||
| 20 | + | ||
| 21 | + | ||
| 22 | +class NotLoggedInError(XHSError): | ||
| 23 | + """Not logged in.""" | ||
| 24 | + | ||
| 25 | + def __init__(self) -> None: | ||
| 26 | + super().__init__("未登录,请先扫码登录") | ||
| 27 | + | ||
| 28 | + | ||
| 29 | +class PageNotAccessibleError(XHSError): | ||
| 30 | + """The page is not accessible.""" | ||
| 31 | + | ||
| 32 | + def __init__(self, reason: str) -> None: | ||
| 33 | + self.reason = reason | ||
| 34 | + super().__init__(f"笔记不可访问: {reason}") | ||
| 35 | + | ||
| 36 | + | ||
| 37 | +class UploadTimeoutError(XHSError): | ||
| 38 | + """Upload timed out.""" | ||
| 39 | + | ||
| 40 | + | ||
| 41 | +class PublishError(XHSError): | ||
| 42 | + """Publishing failed.""" | ||
| 43 | + | ||
| 44 | + | ||
| 45 | +class TitleTooLongError(PublishError): | ||
| 46 | + """The title exceeds the length limit.""" | ||
| 47 | + | ||
| 48 | + def __init__(self, current: str, maximum: str) -> None: | ||
| 49 | + self.current = current | ||
| 50 | + self.maximum = maximum | ||
| 51 | + super().__init__(f"当前输入长度为{current},最大长度为{maximum}") | ||
| 52 | + | ||
| 53 | + | ||
| 54 | +class ContentTooLongError(PublishError): | ||
| 55 | + """The body text exceeds the length limit.""" | ||
| 56 | + | ||
| 57 | + def __init__(self, current: str, maximum: str) -> None: | ||
| 58 | + self.current = current | ||
| 59 | + self.maximum = maximum | ||
| 60 | + super().__init__(f"当前输入长度为{current},最大长度为{maximum}") | ||
| 61 | + | ||
| 62 | + | ||
| 63 | +class CDPError(XHSError): | ||
| 64 | + """CDP communication error.""" | ||
| 65 | + | ||
| 66 | + | ||
| 67 | +class ElementNotFoundError(XHSError): | ||
| 68 | + """Page element not found.""" | ||
| 69 | + | ||
| 70 | + def __init__(self, selector: str) -> None: | ||
| 71 | + self.selector = selector | ||
| 72 | + super().__init__(f"未找到元素: {selector}") |
scripts/xhs/feed_detail.py
0 → 100644
| 1 | +"""Feed detail and comment loading; mirrors Go xiaohongshu/feed_detail.go (867 lines).""" | ||
| 2 | + | ||
| 3 | +from __future__ import annotations | ||
| 4 | + | ||
| 5 | +import json | ||
| 6 | +import logging | ||
| 7 | +import random | ||
| 8 | +import re | ||
| 9 | +import time | ||
| 10 | + | ||
| 11 | +from .cdp import Page | ||
| 12 | +from .errors import NoFeedDetailError, PageNotAccessibleError | ||
| 13 | +from .human import ( | ||
| 14 | + BUTTON_CLICK_INTERVAL, | ||
| 15 | + DEFAULT_MAX_ATTEMPTS, | ||
| 16 | + FINAL_SPRINT_PUSH_COUNT, | ||
| 17 | + HUMAN_DELAY, | ||
| 18 | + LARGE_SCROLL_TRIGGER, | ||
| 19 | + MAX_CLICK_PER_ROUND, | ||
| 20 | + MIN_SCROLL_DELTA, | ||
| 21 | + POST_SCROLL, | ||
| 22 | + REACTION_TIME, | ||
| 23 | + READ_TIME, | ||
| 24 | + SCROLL_WAIT, | ||
| 25 | + SHORT_READ, | ||
| 26 | + STAGNANT_LIMIT, | ||
| 27 | + calculate_scroll_delta, | ||
| 28 | + get_scroll_interval, | ||
| 29 | + get_scroll_ratio, | ||
| 30 | + sleep_random, | ||
| 31 | +) | ||
| 32 | +from .selectors import ( | ||
| 33 | + ACCESS_ERROR_WRAPPER, | ||
| 34 | + END_CONTAINER, | ||
| 35 | + NO_COMMENTS_TEXT, | ||
| 36 | + PARENT_COMMENT, | ||
| 37 | + SHOW_MORE_BUTTON, | ||
| 38 | +) | ||
| 39 | +from .types import ( | ||
| 40 | + CommentList, | ||
| 41 | + CommentLoadConfig, | ||
| 42 | + FeedDetail, | ||
| 43 | + FeedDetailResponse, | ||
| 44 | +) | ||
| 45 | +from .urls import make_feed_detail_url | ||
| 46 | + | ||
| 47 | +logger = logging.getLogger(__name__) | ||
| 48 | + | ||
| 49 | +# Keywords indicating the page is inaccessible | ||
| 50 | +_INACCESSIBLE_KEYWORDS = [ | ||
| 51 | + "当前笔记暂时无法浏览", | ||
| 52 | + "该内容因违规已被删除", | ||
| 53 | + "该笔记已被删除", | ||
| 54 | + "内容不存在", | ||
| 55 | + "笔记不存在", | ||
| 56 | + "已失效", | ||
| 57 | + "私密笔记", | ||
| 58 | + "仅作者可见", | ||
| 59 | + "因用户设置,你无法查看", | ||
| 60 | + "因违规无法查看", | ||
| 61 | +] | ||
| 62 | + | ||
| 63 | +_REPLY_COUNT_RE = re.compile(r"展开\s*(\d+)\s*条回复") | ||
| 64 | +_TOTAL_COMMENT_RE = re.compile(r"共(\d+)条评论") | ||
| 65 | + | ||
| 66 | + | ||
| 67 | +def get_feed_detail( | ||
| 68 | + page: Page, | ||
| 69 | + feed_id: str, | ||
| 70 | + xsec_token: str, | ||
| 71 | + load_all_comments: bool = False, | ||
| 72 | + config: CommentLoadConfig | None = None, | ||
| 73 | +) -> FeedDetailResponse: | ||
| 74 | + """Fetch feed detail (including comments). | ||
| 75 | + | ||
| 76 | + Args: | ||
| 77 | + page: CDP page object. | ||
| 78 | + feed_id: Feed ID. | ||
| 79 | + xsec_token: xsec_token. | ||
| 80 | + load_all_comments: Whether to load all comments. | ||
| 81 | + config: Comment loading configuration. | ||
| 82 | + | ||
| 83 | + Raises: | ||
| 84 | + PageNotAccessibleError: The page is not accessible. | ||
| 85 | + NoFeedDetailError: No detail data was obtained. | ||
| 86 | + """ | ||
| 87 | + if config is None: | ||
| 88 | + config = CommentLoadConfig() | ||
| 89 | + | ||
| 90 | + url = make_feed_detail_url(feed_id, xsec_token) | ||
| 91 | + logger.info("打开 feed 详情页: %s", url) | ||
| 92 | + logger.info( | ||
| 93 | + "配置: 点击更多=%s, 回复阈值=%d, 最大评论数=%d, 滚动速度=%s", | ||
| 94 | + config.click_more_replies, | ||
| 95 | + config.max_replies_threshold, | ||
| 96 | + config.max_comment_items, | ||
| 97 | + config.scroll_speed, | ||
| 98 | + ) | ||
| 99 | + | ||
| 100 | + # Navigate (with retries) | ||
| 101 | + for attempt in range(3): | ||
| 102 | + try: | ||
| 103 | + page.navigate(url) | ||
| 104 | + page.wait_for_load() | ||
| 105 | + page.wait_dom_stable() | ||
| 106 | + break | ||
| 107 | + except Exception as e: | ||
| 108 | + logger.debug("页面导航重试 #%d: %s", attempt, e) | ||
| 109 | + time.sleep(0.5 + random.random()) | ||
| 110 | + else: | ||
| 111 | + raise RuntimeError("页面导航失败") | ||
| 112 | + | ||
| 113 | + sleep_random(1000, 1000) | ||
| 114 | + | ||
| 115 | + # Check that the page is accessible | ||
| 116 | + _check_page_accessible(page) | ||
| 117 | + | ||
| 118 | + # Load all comments | ||
| 119 | + if load_all_comments: | ||
| 120 | + try: | ||
| 121 | + _load_all_comments(page, config) | ||
| 122 | + except Exception as e: | ||
| 123 | + logger.warning("加载全部评论失败: %s", e) | ||
| 124 | + | ||
| 125 | + return _extract_feed_detail(page, feed_id) | ||
| 126 | + | ||
| 127 | + | ||
| 128 | +# ========== Page checks ========== | ||
| 129 | + | ||
| 130 | + | ||
| 131 | +def _check_page_accessible(page: Page) -> None: | ||
| 132 | + """Check whether the page is accessible.""" | ||
| 133 | + time.sleep(0.5) | ||
| 134 | + | ||
| 135 | + text = page.get_element_text(ACCESS_ERROR_WRAPPER) | ||
| 136 | + if not text: | ||
| 137 | + return | ||
| 138 | + | ||
| 139 | + text = text.strip() | ||
| 140 | + for kw in _INACCESSIBLE_KEYWORDS: | ||
| 141 | + if kw in text: | ||
| 142 | + raise PageNotAccessibleError(kw) | ||
| 143 | + | ||
| 144 | + if text: | ||
| 145 | + raise PageNotAccessibleError(text) | ||
| 146 | + | ||
| 147 | + | ||
| 148 | +# ========== Data extraction ========== | ||
| 149 | + | ||
| 150 | + | ||
| 151 | +_EXTRACT_DETAIL_JS = """ | ||
| 152 | +(() => { | ||
| 153 | + if (window.__INITIAL_STATE__ && | ||
| 154 | + window.__INITIAL_STATE__.note && | ||
| 155 | + window.__INITIAL_STATE__.note.noteDetailMap) { | ||
| 156 | + return JSON.stringify(window.__INITIAL_STATE__.note.noteDetailMap); | ||
| 157 | + } | ||
| 158 | + return ""; | ||
| 159 | +})() | ||
| 160 | +""" | ||
| 161 | + | ||
| 162 | + | ||
| 163 | +def _extract_feed_detail(page: Page, feed_id: str) -> FeedDetailResponse: | ||
| 164 | + """Extract feed detail from __INITIAL_STATE__.""" | ||
| 165 | + result = None | ||
| 166 | + for _ in range(3): | ||
| 167 | + result = page.evaluate(_EXTRACT_DETAIL_JS) | ||
| 168 | + if result: | ||
| 169 | + break | ||
| 170 | + time.sleep(0.2) | ||
| 171 | + | ||
| 172 | + if not result: | ||
| 173 | + raise NoFeedDetailError() | ||
| 174 | + | ||
| 175 | + note_detail_map = json.loads(result) | ||
| 176 | + note_data = note_detail_map.get(feed_id) | ||
| 177 | + if not note_data: | ||
| 178 | + raise NoFeedDetailError() | ||
| 179 | + | ||
| 180 | + return FeedDetailResponse( | ||
| 181 | + note=FeedDetail.from_dict(note_data.get("note", {})), | ||
| 182 | + comments=CommentList.from_dict(note_data.get("comments", {})), | ||
| 183 | + ) | ||
| 184 | + | ||
| 185 | + | ||
| 186 | +# ========== Comment-loading state machine ========== | ||
| 187 | + | ||
| 188 | + | ||
| 189 | +def _load_all_comments(page: Page, config: CommentLoadConfig) -> None: | ||
| 190 | + """State machine that loads all comments.""" | ||
| 191 | + max_attempts = ( | ||
| 192 | + config.max_comment_items * 3 if config.max_comment_items > 0 else DEFAULT_MAX_ATTEMPTS | ||
| 193 | + ) | ||
| 194 | + scroll_interval = get_scroll_interval(config.scroll_speed) | ||
| 195 | + | ||
| 196 | + logger.info("开始加载评论...") | ||
| 197 | + _scroll_to_comments_area(page) | ||
| 198 | + sleep_random(*HUMAN_DELAY) | ||
| 199 | + | ||
| 200 | + # Check whether there are no comments | ||
| 201 | + if _check_no_comments(page): | ||
| 202 | + logger.info("检测到无评论区域,跳过加载") | ||
| 203 | + return | ||
| 204 | + | ||
| 205 | + # State | ||
| 206 | + last_count = 0 | ||
| 207 | + last_scroll_top = 0 | ||
| 208 | + stagnant_checks = 0 | ||
| 209 | + total_clicked = 0 | ||
| 210 | + total_skipped = 0 | ||
| 211 | + | ||
| 212 | + for attempt in range(max_attempts): | ||
| 213 | + logger.debug("=== 尝试 %d/%d ===", attempt + 1, max_attempts) | ||
| 214 | + | ||
| 215 | + # Check whether the bottom has been reached | ||
| 216 | + if _check_end_container(page): | ||
| 217 | + count = _get_comment_count(page) | ||
| 218 | + logger.info( | ||
| 219 | + "检测到 THE END,加载完成: %d 条评论, 点击: %d, 跳过: %d", | ||
| 220 | + count, | ||
| 221 | + total_clicked, | ||
| 222 | + total_skipped, | ||
| 223 | + ) | ||
| 224 | + return | ||
| 225 | + | ||
| 226 | + # Periodically click the expand buttons | ||
| 227 | + if config.click_more_replies and attempt % BUTTON_CLICK_INTERVAL == 0: | ||
| 228 | + clicked, skipped = _click_show_more_buttons(page, config.max_replies_threshold) | ||
| 229 | + total_clicked += clicked | ||
| 230 | + total_skipped += skipped | ||
| 231 | + if clicked > 0 or skipped > 0: | ||
| 232 | + sleep_random(*READ_TIME) | ||
| 233 | + # Second round | ||
| 234 | + c2, s2 = _click_show_more_buttons(page, config.max_replies_threshold) | ||
| 235 | + total_clicked += c2 | ||
| 236 | + total_skipped += s2 | ||
| 237 | + if c2 > 0 or s2 > 0: | ||
| 238 | + sleep_random(*SHORT_READ) | ||
| 239 | + | ||
| 240 | + # Get the current comment count | ||
| 241 | + current_count = _get_comment_count(page) | ||
| 242 | + if current_count != last_count: | ||
| 243 | + logger.info("评论增加: %d -> %d", last_count, current_count) | ||
| 244 | + last_count = current_count | ||
| 245 | + stagnant_checks = 0 | ||
| 246 | + else: | ||
| 247 | + stagnant_checks += 1 | ||
| 248 | + | ||
| 249 | + # Check whether the target has been reached | ||
| 250 | + if config.max_comment_items > 0 and current_count >= config.max_comment_items: | ||
| 251 | + logger.info("已达到目标评论数: %d/%d", current_count, config.max_comment_items) | ||
| 252 | + return | ||
| 253 | + | ||
| 254 | + # Scroll | ||
| 255 | + if current_count > 0: | ||
| 256 | + _scroll_to_last_comment(page) | ||
| 257 | + sleep_random(*POST_SCROLL) | ||
| 258 | + | ||
| 259 | + large_mode = stagnant_checks >= LARGE_SCROLL_TRIGGER | ||
| 260 | + push_count = 1 | ||
| 261 | + if large_mode: | ||
| 262 | + push_count = 3 + random.randint(0, 2) | ||
| 263 | + | ||
| 264 | + scroll_delta, current_scroll_top = _human_scroll( | ||
| 265 | + page, config.scroll_speed, large_mode, push_count | ||
| 266 | + ) | ||
| 267 | + | ||
| 268 | + if scroll_delta < MIN_SCROLL_DELTA or current_scroll_top == last_scroll_top: | ||
| 269 | + stagnant_checks += 1 | ||
| 270 | + else: | ||
| 271 | + stagnant_checks = 0 | ||
| 272 | + last_scroll_top = current_scroll_top | ||
| 273 | + | ||
| 274 | + # Handle stagnation | ||
| 275 | + if stagnant_checks >= STAGNANT_LIMIT: | ||
| 276 | + logger.info("停滞过多,尝试大冲刺...") | ||
| 277 | + _human_scroll(page, config.scroll_speed, True, 10) | ||
| 278 | + stagnant_checks = 0 | ||
| 279 | + | ||
| 280 | + time.sleep(scroll_interval) | ||
| 281 | + | ||
| 282 | + # Final sprint | ||
| 283 | + logger.info("达到最大尝试次数,最后冲刺...") | ||
| 284 | + _human_scroll(page, config.scroll_speed, True, FINAL_SPRINT_PUSH_COUNT) | ||
| 285 | + count = _get_comment_count(page) | ||
| 286 | + logger.info("加载结束: %d 条评论, 点击: %d, 跳过: %d", count, total_clicked, total_skipped) | ||
| 287 | + | ||
| 288 | + | ||
| 289 | +# ========== Scrolling ========== | ||
| 290 | + | ||
| 291 | + | ||
| 292 | +def _human_scroll( | ||
| 293 | + page: Page, | ||
| 294 | + speed: str, | ||
| 295 | + large_mode: bool, | ||
| 296 | + push_count: int, | ||
| 297 | +) -> tuple[int, int]: | ||
| 298 | + """Scroll in a human-like way. | ||
| 299 | + | ||
| 300 | + Returns: | ||
| 301 | + (actual_delta, current_scroll_top) | ||
| 302 | + """ | ||
| 303 | + before_top = page.get_scroll_top() | ||
| 304 | + viewport_height = page.get_viewport_height() | ||
| 305 | + | ||
| 306 | + base_ratio = get_scroll_ratio(speed) | ||
| 307 | + if large_mode: | ||
| 308 | + base_ratio *= 2.0 | ||
| 309 | + | ||
| 310 | + actual_delta = 0 | ||
| 311 | + current_scroll_top = before_top | ||
| 312 | + | ||
| 313 | + for i in range(max(1, push_count)): | ||
| 314 | + scroll_delta = calculate_scroll_delta(viewport_height, base_ratio) | ||
| 315 | + page.scroll_by(0, int(scroll_delta)) | ||
| 316 | + sleep_random(*SCROLL_WAIT) | ||
| 317 | + | ||
| 318 | + current_scroll_top = page.get_scroll_top() | ||
| 319 | + delta_this = current_scroll_top - before_top | ||
| 320 | + actual_delta += delta_this | ||
| 321 | + before_top = current_scroll_top | ||
| 322 | + | ||
| 323 | + if i < push_count - 1: | ||
| 324 | + sleep_random(*HUMAN_DELAY) | ||
| 325 | + | ||
| 326 | + # If nothing scrolled, force a jump to the bottom | ||
| 327 | + if actual_delta < MIN_SCROLL_DELTA and push_count > 0: | ||
| 328 | + page.scroll_to_bottom() | ||
| 329 | + sleep_random(*POST_SCROLL) | ||
| 330 | + current_scroll_top = page.get_scroll_top() | ||
| 331 | + actual_delta = current_scroll_top - (before_top - actual_delta) | ||
| 332 | + | ||
| 333 | + return actual_delta, current_scroll_top | ||
| 334 | + | ||
| 335 | + | ||
| 336 | +def _scroll_to_comments_area(page: Page) -> None: | ||
| 337 | + """Scroll to the comments area.""" | ||
| 338 | + logger.info("滚动到评论区...") | ||
| 339 | + page.scroll_element_into_view(".comments-container") | ||
| 340 | + time.sleep(0.5) | ||
| 341 | + # Trigger lazy loading | ||
| 342 | + page.dispatch_wheel_event(100) | ||
| 343 | + | ||
| 344 | + | ||
| 345 | +def _scroll_to_last_comment(page: Page) -> None: | ||
| 346 | + """Scroll to the last comment.""" | ||
| 347 | + count = page.get_elements_count(PARENT_COMMENT) | ||
| 348 | + if count > 0: | ||
| 349 | + page.scroll_nth_element_into_view(PARENT_COMMENT, count - 1) | ||
| 350 | + | ||
| 351 | + | ||
| 352 | +# ========== DOM queries ========== | ||
| 353 | + | ||
| 354 | + | ||
| 355 | +def _get_comment_count(page: Page) -> int: | ||
| 356 | + """Get the number of comments currently loaded.""" | ||
| 357 | + return page.get_elements_count(PARENT_COMMENT) | ||
| 358 | + | ||
| 359 | + | ||
| 360 | +def _get_total_comment_count(page: Page) -> int: | ||
| 361 | + """Get the total comment count (parsed from the "共N条评论" label).""" | ||
| 362 | + text = page.get_element_text(".comments-container .total") | ||
| 363 | + if not text: | ||
| 364 | + return 0 | ||
| 365 | + match = _TOTAL_COMMENT_RE.search(text) | ||
| 366 | + if match: | ||
| 367 | + return int(match.group(1)) | ||
| 368 | + return 0 | ||
| 369 | + | ||
| 370 | + | ||
| 371 | +def _check_no_comments(page: Page) -> bool: | ||
| 372 | + """Check whether the comments area is empty.""" | ||
| 373 | + text = page.get_element_text(NO_COMMENTS_TEXT) | ||
| 374 | + if not text: | ||
| 375 | + return False | ||
| 376 | + return "这是一片荒地" in text.strip() | ||
| 377 | + | ||
| 378 | + | ||
| 379 | +def _check_end_container(page: Page) -> bool: | ||
| 380 | + """Check whether the THE END marker at the bottom has been reached.""" | ||
| 381 | + text = page.get_element_text(END_CONTAINER) | ||
| 382 | + if not text: | ||
| 383 | + return False | ||
| 384 | + upper = text.strip().upper() | ||
| 385 | + return "THE END" in upper or "THEEND" in upper | ||
| 386 | + | ||
| 387 | + | ||
| 388 | +# ========== Button clicks ========== | ||
| 389 | + | ||
| 390 | + | ||
| 391 | +def _click_show_more_buttons(page: Page, max_threshold: int) -> tuple[int, int]: | ||
| 392 | + """Click the "展开N条回复" (expand N replies) buttons. | ||
| 393 | + | ||
| 394 | + Returns: | ||
| 395 | + (clicked, skipped) | ||
| 396 | + """ | ||
| 397 | + count = page.get_elements_count(SHOW_MORE_BUTTON) | ||
| 398 | + if count == 0: | ||
| 399 | + return 0, 0 | ||
| 400 | + | ||
| 401 | + max_click = MAX_CLICK_PER_ROUND + random.randint(0, MAX_CLICK_PER_ROUND - 1) | ||
| 402 | + clicked = 0 | ||
| 403 | + skipped = 0 | ||
| 404 | + | ||
| 405 | + for i in range(count): | ||
| 406 | + if clicked >= max_click: | ||
| 407 | + break | ||
| 408 | + | ||
| 409 | + # Get the button text | ||
| 410 | + text = page.evaluate( | ||
| 411 | + f"document.querySelectorAll({json.dumps(SHOW_MORE_BUTTON)})[{i}]?.textContent || ''" | ||
| 412 | + ) | ||
| 413 | + if not text: | ||
| 414 | + continue | ||
| 415 | + | ||
| 416 | + # Check whether the button should be skipped | ||
| 417 | + if max_threshold > 0: | ||
| 418 | + match = _REPLY_COUNT_RE.search(text) | ||
| 419 | + if match: | ||
| 420 | + reply_count = int(match.group(1)) | ||
| 421 | + if reply_count > max_threshold: | ||
| 422 | + logger.debug( | ||
| 423 | + "跳过 '%s'(回复数 %d > 阈值 %d)", text, reply_count, max_threshold | ||
| 424 | + ) | ||
| 425 | + skipped += 1 | ||
| 426 | + continue | ||
| 427 | + | ||
| 428 | + # Scroll to the button and click it | ||
| 429 | + page.scroll_nth_element_into_view(SHOW_MORE_BUTTON, i) | ||
| 430 | + sleep_random(*REACTION_TIME) | ||
| 431 | + page.evaluate(f"document.querySelectorAll({json.dumps(SHOW_MORE_BUTTON)})[{i}]?.click()") | ||
| 432 | + sleep_random(*READ_TIME) | ||
| 433 | + clicked += 1 | ||
| 434 | + | ||
| 435 | + return clicked, skipped |
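Review note: the skip rule inside _click_show_more_buttons can be isolated as a small predicate. The sketch below reuses the same regular expression as the diff; should_skip is a hypothetical name introduced here for illustration and does not appear in the commit.

```python
import re

# Same pattern as _REPLY_COUNT_RE in feed_detail.py: "展开 N 条回复" (expand N replies).
REPLY_COUNT_RE = re.compile(r"展开\s*(\d+)\s*条回复")


def should_skip(button_text: str, max_threshold: int) -> bool:
    """Return True when the button advertises more replies than the threshold allows."""
    if max_threshold <= 0:
        # A non-positive threshold disables the check entirely.
        return False
    match = REPLY_COUNT_RE.search(button_text)
    if match is None:
        # No reply count in the text: the button is never skipped.
        return False
    return int(match.group(1)) > max_threshold
```

Keeping the rule pure like this would make the threshold behaviour testable without a browser page.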
scripts/xhs/feeds.py
0 → 100644
| 1 | +"""Home page feed list; mirrors Go xiaohongshu/feeds.go.""" | ||
| 2 | + | ||
| 3 | +from __future__ import annotations | ||
| 4 | + | ||
| 5 | +import json | ||
| 6 | +import logging | ||
| 7 | +import time | ||
| 8 | + | ||
| 9 | +from .cdp import Page | ||
| 10 | +from .errors import NoFeedsError | ||
| 11 | +from .types import Feed | ||
| 12 | +from .urls import HOME_URL | ||
| 13 | + | ||
| 14 | +logger = logging.getLogger(__name__) | ||
| 15 | + | ||
| 16 | +# JS that extracts feeds from __INITIAL_STATE__ | ||
| 17 | +_EXTRACT_FEEDS_JS = """ | ||
| 18 | +(() => { | ||
| 19 | + if (window.__INITIAL_STATE__ && | ||
| 20 | + window.__INITIAL_STATE__.feed && | ||
| 21 | + window.__INITIAL_STATE__.feed.feeds) { | ||
| 22 | + const feeds = window.__INITIAL_STATE__.feed.feeds; | ||
| 23 | + const feedsData = feeds.value !== undefined ? feeds.value : feeds._value; | ||
| 24 | + if (feedsData) { | ||
| 25 | + return JSON.stringify(feedsData); | ||
| 26 | + } | ||
| 27 | + } | ||
| 28 | + return ""; | ||
| 29 | +})() | ||
| 30 | +""" | ||
| 31 | + | ||
| 32 | + | ||
| 33 | +def list_feeds(page: Page) -> list[Feed]: | ||
| 34 | + """Fetch the home page feed list. | ||
| 35 | + | ||
| 36 | + Raises: | ||
| 37 | + NoFeedsError: No feeds data was captured. | ||
| 38 | + """ | ||
| 39 | + page.navigate(HOME_URL) | ||
| 40 | + page.wait_for_load() | ||
| 41 | + page.wait_dom_stable() | ||
| 42 | + time.sleep(1) | ||
| 43 | + | ||
| 44 | + result = page.evaluate(_EXTRACT_FEEDS_JS) | ||
| 45 | + if not result: | ||
| 46 | + raise NoFeedsError() | ||
| 47 | + | ||
| 48 | + feeds_data = json.loads(result) | ||
| 49 | + return [Feed.from_dict(f) for f in feeds_data] |
scripts/xhs/human.py
0 → 100644
| 1 | +"""Human-behavior simulation parameters (delays, scrolling, hovering); mirrors the constants in Go feed_detail.go.""" | ||
| 2 | + | ||
| 3 | +import random | ||
| 4 | +import time | ||
| 5 | + | ||
| 6 | +# ========== Configuration constants ========== | ||
| 7 | +DEFAULT_MAX_ATTEMPTS = 500 | ||
| 8 | +STAGNANT_LIMIT = 20 | ||
| 9 | +MIN_SCROLL_DELTA = 10 | ||
| 10 | +MAX_CLICK_PER_ROUND = 3 | ||
| 11 | +STAGNANT_CHECK_THRESHOLD = 2 | ||
| 12 | +LARGE_SCROLL_TRIGGER = 5 | ||
| 13 | +BUTTON_CLICK_INTERVAL = 3 | ||
| 14 | +FINAL_SPRINT_PUSH_COUNT = 15 | ||
| 15 | + | ||
| 16 | +# ========== Delay ranges (milliseconds) ========== | ||
| 17 | +HUMAN_DELAY = (300, 700) | ||
| 18 | +REACTION_TIME = (300, 800) | ||
| 19 | +HOVER_TIME = (100, 300) | ||
| 20 | +READ_TIME = (500, 1200) | ||
| 21 | +SHORT_READ = (600, 1200) | ||
| 22 | +SCROLL_WAIT = (100, 200) | ||
| 23 | +POST_SCROLL = (300, 500) | ||
| 24 | + | ||
| 25 | + | ||
| 26 | +def sleep_random(min_ms: int, max_ms: int) -> None: | ||
| 27 | + """Sleep for a random duration between min_ms and max_ms.""" | ||
| 28 | + if max_ms <= min_ms: | ||
| 29 | + time.sleep(min_ms / 1000.0) | ||
| 30 | + return | ||
| 31 | + delay = random.randint(min_ms, max_ms) / 1000.0 | ||
| 32 | + time.sleep(delay) | ||
| 33 | + | ||
| 34 | + | ||
| 35 | +def get_scroll_interval(speed: str) -> float: | ||
| 36 | + """Return the scroll interval (seconds) for the given speed.""" | ||
| 37 | + if speed == "slow": | ||
| 38 | + return (1200 + random.randint(0, 300)) / 1000.0 | ||
| 39 | + if speed == "fast": | ||
| 40 | + return (300 + random.randint(0, 100)) / 1000.0 | ||
| 41 | + # normal | ||
| 42 | + return (600 + random.randint(0, 200)) / 1000.0 | ||
| 43 | + | ||
| 44 | + | ||
| 45 | +def get_scroll_ratio(speed: str) -> float: | ||
| 46 | + """Return the scroll ratio for the given speed.""" | ||
| 47 | + if speed == "slow": | ||
| 48 | + return 0.5 | ||
| 49 | + if speed == "fast": | ||
| 50 | + return 0.9 | ||
| 51 | + return 0.7 | ||
| 52 | + | ||
| 53 | + | ||
| 54 | +def calculate_scroll_delta(viewport_height: int, base_ratio: float) -> float: | ||
| 55 | + """Compute the scroll distance.""" | ||
| 56 | + scroll_delta = viewport_height * (base_ratio + random.random() * 0.2) | ||
| 57 | + if scroll_delta < 400: | ||
| 58 | + scroll_delta = 400.0 | ||
| 59 | + return scroll_delta + random.randint(-50, 50) | ||
| 60 | + | ||
| 61 | + | ||
| 62 | +# Keywords indicating the page is inaccessible | ||
| 63 | +INACCESSIBLE_KEYWORDS = [ | ||
| 64 | + "当前笔记暂时无法浏览", | ||
| 65 | + "该内容因违规已被删除", | ||
| 66 | + "该笔记已被删除", | ||
| 67 | + "内容不存在", | ||
| 68 | + "笔记不存在", | ||
| 69 | + "已失效", | ||
| 70 | + "私密笔记", | ||
| 71 | + "仅作者可见", | ||
| 72 | + "因用户设置,你无法查看", | ||
| 73 | + "因违规无法查看", | ||
| 74 | +] |
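Review note: the range of calculate_scroll_delta follows directly from its arithmetic. The sketch below repeats that arithmetic so it is self-contained and adds scroll_delta_bounds, a hypothetical helper introduced here (not part of the commit) that states the interval the result falls in.

```python
import random


def calculate_scroll_delta(viewport_height: int, base_ratio: float) -> float:
    """Same arithmetic as the helper in human.py, repeated so the sketch is self-contained."""
    delta = max(400.0, viewport_height * (base_ratio + random.random() * 0.2))
    return delta + random.randint(-50, 50)


def scroll_delta_bounds(viewport_height: int, base_ratio: float) -> tuple[float, float]:
    """Hypothetical helper: closed interval that contains every possible delta."""
    # The 400 px floor applies before the +/-50 px jitter is added.
    low = max(400.0, viewport_height * base_ratio) - 50
    high = max(400.0, viewport_height * (base_ratio + 0.2)) + 50
    return low, high
```

For a short viewport the 400 px floor dominates, so the result stays within 350 to 450 px regardless of the speed ratio.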
scripts/xhs/like_favorite.py
0 → 100644
| 1 | +"""Like and favorite actions; mirrors Go xiaohongshu/like_favorite.go.""" | ||
| 2 | + | ||
| 3 | +from __future__ import annotations | ||
| 4 | + | ||
| 5 | +import json | ||
| 6 | +import logging | ||
| 7 | +import time | ||
| 8 | + | ||
| 9 | +from .cdp import Page | ||
| 10 | +from .errors import NoFeedDetailError | ||
| 11 | +from .selectors import COLLECT_BUTTON, LIKE_BUTTON | ||
| 12 | +from .types import ActionResult | ||
| 13 | +from .urls import make_feed_detail_url | ||
| 14 | + | ||
| 15 | +logger = logging.getLogger(__name__) | ||
| 16 | + | ||
| 17 | +# JS that reads the interaction state from __INITIAL_STATE__ | ||
| 18 | +_GET_INTERACT_STATE_JS = """ | ||
| 19 | +(() => { | ||
| 20 | + if (window.__INITIAL_STATE__ && | ||
| 21 | + window.__INITIAL_STATE__.note && | ||
| 22 | + window.__INITIAL_STATE__.note.noteDetailMap) { | ||
| 23 | + return JSON.stringify(window.__INITIAL_STATE__.note.noteDetailMap); | ||
| 24 | + } | ||
| 25 | + return ""; | ||
| 26 | +})() | ||
| 27 | +""" | ||
| 28 | + | ||
| 29 | + | ||
| 30 | +def _get_interact_state(page: Page, feed_id: str) -> tuple[bool, bool]: | ||
| 31 | + """Read the like/favorite state of a note. | ||
| 32 | + | ||
| 33 | + Returns: | ||
| 34 | + (liked, collected) | ||
| 35 | + | ||
| 36 | + Raises: | ||
| 37 | + NoFeedDetailError: The state could not be read. | ||
| 38 | + """ | ||
| 39 | + result = page.evaluate(_GET_INTERACT_STATE_JS) | ||
| 40 | + if not result: | ||
| 41 | + raise NoFeedDetailError() | ||
| 42 | + | ||
| 43 | + note_detail_map = json.loads(result) | ||
| 44 | + detail = note_detail_map.get(feed_id) | ||
| 45 | + if not detail: | ||
| 46 | + raise NoFeedDetailError() | ||
| 47 | + | ||
| 48 | + interact = detail.get("note", {}).get("interactInfo", {}) | ||
| 49 | + return interact.get("liked", False), interact.get("collected", False) | ||
| 50 | + | ||
| 51 | + | ||
| 52 | +def _prepare_page(page: Page, feed_id: str, xsec_token: str) -> None: | ||
| 53 | + """Navigate to the feed detail page.""" | ||
| 54 | + url = make_feed_detail_url(feed_id, xsec_token) | ||
| 55 | + page.navigate(url) | ||
| 56 | + page.wait_for_load() | ||
| 57 | + page.wait_dom_stable() | ||
| 58 | + time.sleep(1) | ||
| 59 | + | ||
| 60 | + | ||
| 61 | +# ========== Like ========== | ||
| 62 | + | ||
| 63 | + | ||
| 64 | +def like_feed(page: Page, feed_id: str, xsec_token: str) -> ActionResult: | ||
| 65 | + """Like a note (idempotent: skipped if already liked).""" | ||
| 66 | + _prepare_page(page, feed_id, xsec_token) | ||
| 67 | + return _toggle_like(page, feed_id, target_liked=True) | ||
| 68 | + | ||
| 69 | + | ||
| 70 | +def unlike_feed(page: Page, feed_id: str, xsec_token: str) -> ActionResult: | ||
| 71 | + """Remove a like (idempotent: skipped if not liked).""" | ||
| 72 | + _prepare_page(page, feed_id, xsec_token) | ||
| 73 | + return _toggle_like(page, feed_id, target_liked=False) | ||
| 74 | + | ||
| 75 | + | ||
| 76 | +def _toggle_like(page: Page, feed_id: str, target_liked: bool) -> ActionResult: | ||
| 77 | + """Perform the like or unlike action.""" | ||
| 78 | + action_name = "点赞" if target_liked else "取消点赞" | ||
| 79 | + | ||
| 80 | + try: | ||
| 81 | + liked, _ = _get_interact_state(page, feed_id) | ||
| 82 | + except NoFeedDetailError: | ||
| 83 | + logger.warning("无法读取互动状态,直接点击") | ||
| 84 | + liked = not target_liked # force a click | ||
| 85 | + | ||
| 86 | + # Idempotency check | ||
| 87 | + if liked == target_liked: | ||
| 88 | + logger.info("feed %s 已%s,跳过", feed_id, action_name) | ||
| 89 | + return ActionResult(feed_id=feed_id, success=True, message=f"已{action_name}") | ||
| 90 | + | ||
| 91 | + # Click | ||
| 92 | + page.click_element(LIKE_BUTTON) | ||
| 93 | + time.sleep(3) | ||
| 94 | + | ||
| 95 | + # Verify | ||
| 96 | + try: | ||
| 97 | + liked, _ = _get_interact_state(page, feed_id) | ||
| 98 | + if liked == target_liked: | ||
| 99 | + logger.info("feed %s %s成功", feed_id, action_name) | ||
| 100 | + return ActionResult(feed_id=feed_id, success=True, message=f"{action_name}成功") | ||
| 101 | + except NoFeedDetailError: | ||
| 102 | + pass | ||
| 103 | + | ||
| 104 | + # Retry once | ||
| 105 | + logger.warning("feed %s %s可能未成功,重试", feed_id, action_name) | ||
| 106 | + page.click_element(LIKE_BUTTON) | ||
| 107 | + time.sleep(2) | ||
| 108 | + | ||
| 109 | + return ActionResult(feed_id=feed_id, success=True, message=f"{action_name}已执行") | ||
| 110 | + | ||
| 111 | + | ||
| 112 | +# ========== Favorite ========== | ||
| 113 | + | ||
| 114 | + | ||
| 115 | +def favorite_feed(page: Page, feed_id: str, xsec_token: str) -> ActionResult: | ||
| 116 | + """Favorite a note (idempotent: skipped if already favorited).""" | ||
| 117 | + _prepare_page(page, feed_id, xsec_token) | ||
| 118 | + return _toggle_favorite(page, feed_id, target_collected=True) | ||
| 119 | + | ||
| 120 | + | ||
| 121 | +def unfavorite_feed(page: Page, feed_id: str, xsec_token: str) -> ActionResult: | ||
| 122 | + """Remove a favorite (idempotent: skipped if not favorited).""" | ||
| 123 | + _prepare_page(page, feed_id, xsec_token) | ||
| 124 | + return _toggle_favorite(page, feed_id, target_collected=False) | ||
| 125 | + | ||
| 126 | + | ||
| 127 | +def _toggle_favorite(page: Page, feed_id: str, target_collected: bool) -> ActionResult: | ||
| 128 | + """Perform the favorite or unfavorite action.""" | ||
| 129 | + action_name = "收藏" if target_collected else "取消收藏" | ||
| 130 | + | ||
| 131 | + try: | ||
| 132 | + _, collected = _get_interact_state(page, feed_id) | ||
| 133 | + except NoFeedDetailError: | ||
| 134 | + logger.warning("无法读取互动状态,直接点击") | ||
| 135 | + collected = not target_collected | ||
| 136 | + | ||
| 137 | + # Idempotency check | ||
| 138 | + if collected == target_collected: | ||
| 139 | + logger.info("feed %s 已%s,跳过", feed_id, action_name) | ||
| 140 | + return ActionResult(feed_id=feed_id, success=True, message=f"已{action_name}") | ||
| 141 | + | ||
| 142 | + # Click | ||
| 143 | + page.click_element(COLLECT_BUTTON) | ||
| 144 | + time.sleep(3) | ||
| 145 | + | ||
| 146 | + # Verify | ||
| 147 | + try: | ||
| 148 | + _, collected = _get_interact_state(page, feed_id) | ||
| 149 | + if collected == target_collected: | ||
| 150 | + logger.info("feed %s %s成功", feed_id, action_name) | ||
| 151 | + return ActionResult(feed_id=feed_id, success=True, message=f"{action_name}成功") | ||
| 152 | + except NoFeedDetailError: | ||
| 153 | + pass | ||
| 154 | + | ||
| 155 | + # Retry | ||
| 156 | + logger.warning("feed %s %s可能未成功,重试", feed_id, action_name) | ||
| 157 | + page.click_element(COLLECT_BUTTON) | ||
| 158 | + time.sleep(2) | ||
| 159 | + | ||
| 160 | + return ActionResult(feed_id=feed_id, success=True, message=f"{action_name}已执行") |
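Review note: the idempotent "read state, click only on mismatch, verify" pattern used by _toggle_like and _toggle_favorite can be shown without a browser. This is a minimal sketch under stated assumptions: FakePage and ensure_liked are hypothetical stand-ins introduced here, not the CDP Page class or any function from the commit.

```python
class FakePage:
    """Hypothetical stand-in for a page: one boolean state and a click counter."""

    def __init__(self, liked: bool) -> None:
        self.liked = liked
        self.clicks = 0

    def click_like(self) -> None:
        # A click toggles the state, as the real like button would.
        self.liked = not self.liked
        self.clicks += 1


def ensure_liked(page: FakePage, target: bool) -> str:
    """Click only when the current state differs from the target, then verify."""
    if page.liked == target:
        return "skipped"  # idempotent: nothing to do
    page.click_like()
    return "ok" if page.liked == target else "unverified"
```

Because the button is a toggle, calling ensure_liked twice with the same target clicks at most once; a blind second click would undo the first.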
scripts/xhs/login.py
0 → 100644
| 1 | +"""Login management; mirrors Go xiaohongshu/login.go.""" | ||
| 2 | + | ||
| 3 | +from __future__ import annotations | ||
| 4 | + | ||
| 5 | +import base64 | ||
| 6 | +import logging | ||
| 7 | +import os | ||
| 8 | +import tempfile | ||
| 9 | +import time | ||
| 10 | + | ||
| 11 | +from .cdp import Page | ||
| 12 | +from .selectors import LOGIN_STATUS, QRCODE_IMG | ||
| 13 | +from .urls import EXPLORE_URL | ||
| 14 | + | ||
| 15 | +logger = logging.getLogger(__name__) | ||
| 16 | + | ||
| 17 | + | ||
| 18 | +def check_login_status(page: Page) -> bool: | ||
| 19 | + """Check the login status. | ||
| 20 | + | ||
| 21 | + Returns: | ||
| 22 | + True if logged in, False otherwise. | ||
| 23 | + """ | ||
| 24 | + page.navigate(EXPLORE_URL) | ||
| 25 | + page.wait_for_load() | ||
| 26 | + time.sleep(1) | ||
| 27 | + | ||
| 28 | + return page.has_element(LOGIN_STATUS) | ||
| 29 | + | ||
| 30 | + | ||
| 31 | +def fetch_qrcode(page: Page) -> tuple[str, bool]: | ||
| 32 | + """Fetch the login QR code. | ||
| 33 | + | ||
| 34 | + Returns: | ||
| 35 | + (qrcode_src, already_logged_in) | ||
| 36 | + - If already logged in, returns ("", True) | ||
| 37 | + - If not logged in, returns (qrcode_base64_or_url, False) | ||
| 38 | + """ | ||
| 39 | + page.navigate(EXPLORE_URL) | ||
| 40 | + page.wait_for_load() | ||
| 41 | + time.sleep(2) | ||
| 42 | + | ||
| 43 | + # Check whether already logged in | ||
| 44 | + if page.has_element(LOGIN_STATUS): | ||
| 45 | + return "", True | ||
| 46 | + | ||
| 47 | + # Get the src of the QR code image | ||
| 48 | + src = page.get_element_attribute(QRCODE_IMG, "src") | ||
| 49 | + if not src: | ||
| 50 | + raise RuntimeError("二维码图片 src 为空") | ||
| 51 | + | ||
| 52 | + return src, False | ||
| 53 | + | ||
| 54 | + | ||
| 55 | +def save_qrcode_to_file(src: str) -> str: | ||
| 56 | + """Save the QR code data URL to a temporary PNG file. | ||
| 57 | + | ||
| 58 | + Args: | ||
| 59 | + src: Data URL of the QR code image (data:image/png;base64,...); plain URLs raise ValueError. | ||
| 60 | + | ||
| 61 | + Returns: | ||
| 62 | + Absolute path of the saved file. | ||
| 63 | + """ | ||
| 64 | + prefix = "data:image/png;base64," | ||
| 65 | + if src.startswith(prefix): | ||
| 66 | + img_data = base64.b64decode(src[len(prefix) :]) | ||
| 67 | + elif src.startswith("data:image/"): | ||
| 68 | + # Handle other MIME types, e.g. data:image/jpeg;base64,... | ||
| 69 | + _, encoded = src.split(",", 1) | ||
| 70 | + img_data = base64.b64decode(encoded) | ||
| 71 | + else: | ||
| 72 | + # Not a data URL, so it cannot be saved | ||
| 73 | + raise ValueError(f"不支持的二维码格式,需要 data URL: {src[:50]}...") | ||
| 74 | + | ||
| 75 | + qr_dir = os.path.join(tempfile.gettempdir(), "xhs") | ||
| 76 | + os.makedirs(qr_dir, exist_ok=True) | ||
| 77 | + filepath = os.path.join(qr_dir, "login_qrcode.png") | ||
| 78 | + | ||
| 79 | + with open(filepath, "wb") as f: | ||
| 80 | + f.write(img_data) | ||
| 81 | + | ||
| 82 | + logger.info("二维码已保存: %s", filepath) | ||
| 83 | + return filepath | ||
| 84 | + | ||
| 85 | + | ||
| 86 | +def wait_for_login(page: Page, timeout: float = 120.0) -> bool: | ||
| 87 | + """Wait for the QR code login to complete. | ||
| 88 | + | ||
| 89 | + Args: | ||
| 90 | + page: CDP page object. | ||
| 91 | + timeout: Timeout in seconds. | ||
| 92 | + | ||
| 93 | + Returns: | ||
| 94 | + True if login succeeded, False on timeout. | ||
| 95 | + """ | ||
| 96 | + deadline = time.monotonic() + timeout | ||
| 97 | + while time.monotonic() < deadline: | ||
| 98 | + if page.has_element(LOGIN_STATUS): | ||
| 99 | + logger.info("登录成功") | ||
| 100 | + return True | ||
| 101 | + time.sleep(0.5) | ||
| 102 | + return False |
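
The data URL handling in `save_qrcode_to_file` can be illustrated in isolation. The following is a minimal sketch, not the module's actual code: the helper name `decode_image_data_url` and the sample payload are assumptions made for illustration, and only standard-library calls are used.

```python
import base64


def decode_image_data_url(src: str) -> bytes:
    """Decode a base64 image data URL into raw bytes (hypothetical helper)."""
    if not src.startswith("data:image/"):
        raise ValueError("expected an image data URL")
    # Split "data:image/png;base64" from the encoded payload.
    header, sep, encoded = src.partition(",")
    if not sep or not header.endswith(";base64"):
        raise ValueError("expected a base64-encoded data URL")
    return base64.b64decode(encoded)


# A tiny payload stands in for real PNG bytes.
sample = "data:image/png;base64," + base64.b64encode(b"fake-png").decode("ascii")
decoded = decode_image_data_url(sample)
```

Splitting on the first comma handles any image MIME type in one branch, which is why a separate `data:image/png` prefix check is not strictly needed in a helper like this.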
scripts/xhs/publish.py
0 → 100644
| 1 | +"""Image-and-text publishing; corresponds to Go xiaohongshu/publish.go (837 lines).""" | ||
| 2 | + | ||
| 3 | +from __future__ import annotations | ||
| 4 | + | ||
| 5 | +import json | ||
| 6 | +import logging | ||
| 7 | +import random | ||
| 8 | +import time | ||
| 9 | + | ||
| 10 | +from .cdp import Page | ||
| 11 | +from .errors import ContentTooLongError, PublishError, TitleTooLongError, UploadTimeoutError | ||
| 12 | +from .selectors import ( | ||
| 13 | + CONTENT_EDITOR, | ||
| 14 | + CONTENT_LENGTH_ERROR, | ||
| 15 | + CREATOR_TAB, | ||
| 16 | + DATETIME_INPUT, | ||
| 17 | + FILE_INPUT, | ||
| 18 | + IMAGE_PREVIEW, | ||
| 19 | + ORIGINAL_SWITCH, | ||
| 20 | + ORIGINAL_SWITCH_CARD, | ||
| 21 | + POPOVER, | ||
| 22 | + PUBLISH_BUTTON, | ||
| 23 | + SCHEDULE_SWITCH, | ||
| 24 | + TAG_FIRST_ITEM, | ||
| 25 | + TAG_TOPIC_CONTAINER, | ||
| 26 | + TITLE_INPUT, | ||
| 27 | + TITLE_MAX_SUFFIX, | ||
| 28 | + UPLOAD_CONTENT, | ||
| 29 | + UPLOAD_INPUT, | ||
| 30 | + VISIBILITY_DROPDOWN, | ||
| 31 | + VISIBILITY_OPTIONS, | ||
| 32 | +) | ||
| 33 | +from .types import PublishImageContent | ||
| 34 | +from .urls import PUBLISH_URL | ||
| 35 | + | ||
| 36 | +logger = logging.getLogger(__name__) | ||
| 37 | + | ||
| 38 | + | ||
| 39 | +def publish_image_content(page: Page, content: PublishImageContent) -> None: | ||
| 40 | + """Publish an image-and-text note. | ||
| 41 | + | ||
| 42 | + Args: | ||
| 43 | + page: CDP page object. | ||
| 44 | + content: Content to publish. | ||
| 45 | + | ||
| 46 | + Raises: | ||
| 47 | + PublishError: Publishing failed. | ||
| 48 | + UploadTimeoutError: Upload timed out. | ||
| 49 | + TitleTooLongError: Title exceeds the length limit. | ||
| 50 | + ContentTooLongError: Body exceeds the length limit. | ||
| 51 | + """ | ||
| 52 | + if not content.image_paths: | ||
| 53 | + raise PublishError("图片不能为空") | ||
| 54 | + | ||
| 55 | + # Navigate to the publish page | ||
| 56 | + _navigate_to_publish_page(page) | ||
| 57 | + | ||
| 58 | + # Click the "上传图文" (upload image-and-text) tab | ||
| 59 | + _click_publish_tab(page, "上传图文") | ||
| 60 | + time.sleep(1) | ||
| 61 | + | ||
| 62 | + # Upload images | ||
| 63 | + _upload_images(page, content.image_paths) | ||
| 64 | + | ||
| 65 | + # Keep at most 10 tags (slicing is a no-op for shorter lists) | ||
| 66 | + tags = content.tags[:10] | ||
| 67 | + if len(content.tags) > 10: | ||
| 68 | + logger.warning("标签数量超过10,截取前10个") | ||
| 69 | + | ||
| 70 | + logger.info( | ||
| 71 | + "发布内容: title=%s, images=%d, tags=%d, schedule=%s, original=%s, visibility=%s", | ||
| 72 | + content.title, | ||
| 73 | + len(content.image_paths), | ||
| 74 | + len(tags), | ||
| 75 | + content.schedule_time, | ||
| 76 | + content.is_original, | ||
| 77 | + content.visibility, | ||
| 78 | + ) | ||
| 79 | + | ||
| 80 | + # Submit the post | ||
| 81 | + _submit_publish( | ||
| 82 | + page, | ||
| 83 | + content.title, | ||
| 84 | + content.content, | ||
| 85 | + tags, | ||
| 86 | + content.schedule_time, | ||
| 87 | + content.is_original, | ||
| 88 | + content.visibility, | ||
| 89 | + ) | ||
| 90 | + | ||
| 91 | + | ||
| 92 | +# ========== Page navigation ========== | ||
| 93 | + | ||
| 94 | + | ||
| 95 | +def _navigate_to_publish_page(page: Page) -> None: | ||
| 96 | + """Navigate to the publish page.""" | ||
| 97 | + page.navigate(PUBLISH_URL) | ||
| 98 | + page.wait_for_load(timeout=300) | ||
| 99 | + time.sleep(2) | ||
| 100 | + page.wait_dom_stable() | ||
| 101 | + time.sleep(1) | ||
| 102 | + | ||
| 103 | + | ||
| 104 | +def _click_publish_tab(page: Page, tab_name: str) -> None: | ||
| 105 | + """Click a tab on the publish page (上传图文 / 上传视频).""" | ||
| 106 | + page.wait_for_element(UPLOAD_CONTENT, timeout=15) | ||
| 107 | + | ||
| 108 | + deadline = time.monotonic() + 15 | ||
| 109 | + while time.monotonic() < deadline: | ||
| 110 | + # Find the matching tab | ||
| 111 | + found = page.evaluate( | ||
| 112 | + f""" | ||
| 113 | + (() => {{ | ||
| 114 | + const tabs = document.querySelectorAll({json.dumps(CREATOR_TAB)}); | ||
| 115 | + for (const tab of tabs) {{ | ||
| 116 | + if (tab.textContent.trim() === {json.dumps(tab_name)}) {{ | ||
| 117 | + // Check whether the tab is covered by another element | ||
| 118 | + const rect = tab.getBoundingClientRect(); | ||
| 119 | + if (rect.width === 0 || rect.height === 0) continue; | ||
| 120 | + const x = rect.left + rect.width / 2; | ||
| 121 | + const y = rect.top + rect.height / 2; | ||
| 122 | + const target = document.elementFromPoint(x, y); | ||
| 123 | + if (target === tab || tab.contains(target)) {{ | ||
| 124 | + tab.click(); | ||
| 125 | + return 'clicked'; | ||
| 126 | + }} | ||
| 127 | + return 'blocked'; | ||
| 128 | + }} | ||
| 129 | + }} | ||
| 130 | + return 'not_found'; | ||
| 131 | + }})() | ||
| 132 | + """ | ||
| 133 | + ) | ||
| 134 | + | ||
| 135 | + if found == "clicked": | ||
| 136 | + return | ||
| 137 | + | ||
| 138 | + if found == "blocked": | ||
| 139 | + # Try to dismiss the popover | ||
| 140 | + _remove_pop_cover(page) | ||
| 141 | + | ||
| 142 | + time.sleep(0.2) | ||
| 143 | + | ||
| 144 | + raise PublishError(f"没有找到发布 TAB - {tab_name}") | ||
| 145 | + | ||
| 146 | + | ||
| 147 | +def _remove_pop_cover(page: Page) -> None: | ||
| 148 | + """Remove a popover that covers the page.""" | ||
| 149 | + if page.has_element(POPOVER): | ||
| 150 | + page.remove_element(POPOVER) | ||
| 151 | + # Click an empty area | ||
| 152 | + x = 380 + random.randint(0, 100) | ||
| 153 | + y = 20 + random.randint(0, 60) | ||
| 154 | + page.mouse_click(float(x), float(y)) | ||
| 155 | + | ||
| 156 | + | ||
| 157 | +# ========== Image upload ========== | ||
| 158 | + | ||
| 159 | + | ||
| 160 | +def _upload_images(page: Page, image_paths: list[str]) -> None: | ||
| 161 | + """Upload images one at a time, skipping paths that do not exist.""" | ||
| 162 | + import os | ||
| 163 | + | ||
| 164 | + valid_paths = [p for p in image_paths if os.path.exists(p)] | ||
| 165 | + if not valid_paths: | ||
| 166 | + raise PublishError("没有有效的图片文件") | ||
| 167 | + | ||
| 168 | + for i, path in enumerate(valid_paths): | ||
| 169 | + selector = UPLOAD_INPUT if i == 0 else FILE_INPUT | ||
| 170 | + logger.info("上传第 %d 张图片: %s", i + 1, path) | ||
| 171 | + | ||
| 172 | + page.set_file_input(selector, [path]) | ||
| 173 | + _wait_for_upload_complete(page, i + 1) | ||
| 174 | + time.sleep(1) | ||
| 175 | + | ||
| 176 | + | ||
| 177 | +def _wait_for_upload_complete(page: Page, expected_count: int) -> None: | ||
| 178 | + """Wait until the image upload completes.""" | ||
| 179 | + max_wait = 60.0 | ||
| 180 | + start = time.monotonic() | ||
| 181 | + | ||
| 182 | + while time.monotonic() - start < max_wait: | ||
| 183 | + count = page.get_elements_count(IMAGE_PREVIEW) | ||
| 184 | + if count >= expected_count: | ||
| 185 | + logger.info("图片上传完成: %d", count) | ||
| 186 | + return | ||
| 187 | + time.sleep(0.5) | ||
| 188 | + | ||
| 189 | + raise UploadTimeoutError(f"第{expected_count}张图片上传超时(60s)") | ||
| 190 | + | ||
| 191 | + | ||
| 192 | +# ========== Form submission ========== | ||
| 193 | + | ||
| 194 | + | ||
| 195 | +def _submit_publish( | ||
| 196 | + page: Page, | ||
| 197 | + title: str, | ||
| 198 | + content: str, | ||
| 199 | + tags: list[str], | ||
| 200 | + schedule_time: str | None, | ||
| 201 | + is_original: bool, | ||
| 202 | + visibility: str, | ||
| 203 | +) -> None: | ||
| 204 | + """Fill in the form and submit it.""" | ||
| 205 | + # Title | ||
| 206 | + page.input_text(TITLE_INPUT, title) | ||
| 207 | + time.sleep(0.5) | ||
| 208 | + _check_title_max_length(page) | ||
| 209 | + logger.info("标题长度检查通过") | ||
| 210 | + time.sleep(1) | ||
| 211 | + | ||
| 212 | + # Body | ||
| 213 | + content_selector = _find_content_element(page) | ||
| 214 | + page.input_content_editable(content_selector, content) | ||
| 215 | + | ||
| 216 | + # Click back on the title input (improves stability) | ||
| 217 | + time.sleep(1) | ||
| 218 | + page.click_element(TITLE_INPUT) | ||
| 219 | + logger.info("已回点标题输入框") | ||
| 220 | + | ||
| 221 | + # Tags | ||
| 222 | + if tags: | ||
| 223 | + _input_tags(page, content_selector, tags) | ||
| 224 | + time.sleep(1) | ||
| 225 | + _check_content_max_length(page) | ||
| 226 | + logger.info("正文长度检查通过") | ||
| 227 | + | ||
| 228 | + # Scheduled publishing | ||
| 229 | + if schedule_time: | ||
| 230 | + _set_schedule_publish(page, schedule_time) | ||
| 231 | + | ||
| 232 | + # Visibility | ||
| 233 | + _set_visibility(page, visibility) | ||
| 234 | + | ||
| 235 | + # Original-content declaration | ||
| 236 | + if is_original: | ||
| 237 | + try: | ||
| 238 | + _set_original(page) | ||
| 239 | + logger.info("已声明原创") | ||
| 240 | + except Exception as e: | ||
| 241 | + logger.warning("设置原创声明失败: %s", e) | ||
| 242 | + | ||
| 243 | + # Click publish | ||
| 244 | + page.click_element(PUBLISH_BUTTON) | ||
| 245 | + time.sleep(3) | ||
| 246 | + logger.info("发布完成") | ||
| 247 | + | ||
| 248 | + | ||
| 249 | +def _find_content_element(page: Page) -> str: | ||
| 250 | + """Find the body input element (supports two UI variants).""" | ||
| 251 | + if page.has_element(CONTENT_EDITOR): | ||
| 252 | + return CONTENT_EDITOR | ||
| 253 | + | ||
| 254 | + # Look for a p element with a placeholder and check for a textbox ancestor | ||
| 255 | + found = page.evaluate( | ||
| 256 | + """ | ||
| 257 | + (() => { | ||
| 258 | + const ps = document.querySelectorAll('p'); | ||
| 259 | + for (const p of ps) { | ||
| 260 | + const placeholder = p.getAttribute('data-placeholder'); | ||
| 261 | + if (placeholder && placeholder.includes('输入正文描述')) { | ||
| 262 | + let current = p; | ||
| 263 | + for (let i = 0; i < 5; i++) { | ||
| 264 | + current = current.parentElement; | ||
| 265 | + if (!current) break; | ||
| 266 | + if (current.getAttribute('role') === 'textbox') { | ||
| 267 | + return 'found'; | ||
| 268 | + } | ||
| 269 | + } | ||
| 270 | + } | ||
| 271 | + } | ||
| 272 | + return ''; | ||
| 273 | + })() | ||
| 274 | + """ | ||
| 275 | + ) | ||
| 276 | + if found == "found": | ||
| 277 | + return "[role='textbox']" | ||
| 278 | + | ||
| 279 | + raise PublishError("没有找到内容输入框") | ||
| 280 | + | ||
| 281 | + | ||
| 282 | +def _check_title_max_length(page: Page) -> None: | ||
| 283 | + """Check whether the title exceeds the length limit.""" | ||
| 284 | + text = page.get_element_text(TITLE_MAX_SUFFIX) | ||
| 285 | + if text: | ||
| 286 | + parts = text.split("/") | ||
| 287 | + if len(parts) == 2: | ||
| 288 | + raise TitleTooLongError(parts[0], parts[1]) | ||
| 289 | + raise TitleTooLongError(text, "?") | ||
| 290 | + | ||
| 291 | + | ||
| 292 | +def _check_content_max_length(page: Page) -> None: | ||
| 293 | + """Check whether the body exceeds the length limit.""" | ||
| 294 | + text = page.get_element_text(CONTENT_LENGTH_ERROR) | ||
| 295 | + if text: | ||
| 296 | + parts = text.split("/") | ||
| 297 | + if len(parts) == 2: | ||
| 298 | + raise ContentTooLongError(parts[0], parts[1]) | ||
| 299 | + raise ContentTooLongError(text, "?") | ||
| 300 | + | ||
| 301 | + | ||
| 302 | +# ========== Tag input ========== | ||
| 303 | + | ||
| 304 | + | ||
| 305 | +def _input_tags(page: Page, content_selector: str, tags: list[str]) -> None: | ||
| 306 | + """Type the tags into the body.""" | ||
| 307 | + time.sleep(1) | ||
| 308 | + | ||
| 309 | + # Move the cursor to the end of the body (press ArrowDown 20 times) | ||
| 310 | + for _ in range(20): | ||
| 311 | + page.press_key("ArrowDown") | ||
| 312 | + time.sleep(0.01) | ||
| 313 | + | ||
| 314 | + # Press Enter twice to start a new line | ||
| 315 | + page.press_key("Enter") | ||
| 316 | + page.press_key("Enter") | ||
| 317 | + time.sleep(1) | ||
| 318 | + | ||
| 319 | + for tag in tags: | ||
| 320 | + tag = tag.lstrip("#") | ||
| 321 | + _input_single_tag(page, content_selector, tag) | ||
| 322 | + | ||
| 323 | + | ||
| 324 | +def _input_single_tag(page: Page, content_selector: str, tag: str) -> None: | ||
| 325 | + """Type a single tag.""" | ||
| 326 | + # Type # | ||
| 327 | + page.type_text("#", delay_ms=0) | ||
| 328 | + time.sleep(0.2) | ||
| 329 | + | ||
| 330 | + # Type the tag character by character | ||
| 331 | + for char in tag: | ||
| 332 | + page.type_text(char, delay_ms=50) | ||
| 333 | + | ||
| 334 | + time.sleep(1) | ||
| 335 | + | ||
| 336 | + # Try to click the tag suggestion | ||
| 337 | + if page.has_element(TAG_TOPIC_CONTAINER): | ||
| 338 | + item_selector = f"{TAG_TOPIC_CONTAINER} {TAG_FIRST_ITEM}" | ||
| 339 | + if page.has_element(item_selector): | ||
| 340 | + page.click_element(item_selector) | ||
| 341 | + logger.info("点击标签联想: %s", tag) | ||
| 342 | + time.sleep(0.5) | ||
| 343 | + return | ||
| 344 | + | ||
| 345 | + # No suggestion found; type a space instead | ||
| 346 | + logger.warning("未找到标签联想,直接输入空格: %s", tag) | ||
| 347 | + page.type_text(" ", delay_ms=0) | ||
| 348 | + time.sleep(0.5) | ||
| 349 | + | ||
| 350 | + | ||
| 351 | +# ========== Scheduled publishing ========== | ||
| 352 | + | ||
| 353 | + | ||
| 354 | +def _set_schedule_publish(page: Page, schedule_time: str) -> None: | ||
| 355 | + """Set up scheduled publishing.""" | ||
| 356 | + from datetime import datetime | ||
| 357 | + | ||
| 358 | + # Parse the ISO 8601 timestamp | ||
| 359 | + try: | ||
| 360 | + dt = datetime.fromisoformat(schedule_time) | ||
| 361 | + except ValueError as e: | ||
| 362 | + raise PublishError(f"定时发布时间格式错误: {e}") from e | ||
| 363 | + | ||
| 364 | + # Click the scheduled-publishing switch | ||
| 365 | + page.click_element(SCHEDULE_SWITCH) | ||
| 366 | + time.sleep(0.8) | ||
| 367 | + | ||
| 368 | + # Set the date and time | ||
| 369 | + datetime_str = dt.strftime("%Y-%m-%d %H:%M") | ||
| 370 | + page.select_all_text(DATETIME_INPUT) | ||
| 371 | + page.input_text(DATETIME_INPUT, datetime_str) | ||
| 372 | + time.sleep(0.5) | ||
| 373 | + | ||
| 374 | + logger.info("已设置定时发布: %s", datetime_str) | ||
| 375 | + | ||
| 376 | + | ||
| 377 | +# ========== Visibility ========== | ||
| 378 | + | ||
| 379 | + | ||
| 380 | +def _set_visibility(page: Page, visibility: str) -> None: | ||
| 381 | + """Set the visibility scope.""" | ||
| 382 | + if not visibility or visibility == "公开可见": | ||
| 383 | + logger.info("可见范围: 公开可见(默认)") | ||
| 384 | + return | ||
| 385 | + | ||
| 386 | + supported = {"仅自己可见", "仅互关好友可见"} | ||
| 387 | + if visibility not in supported: | ||
| 388 | + raise PublishError( | ||
| 389 | + f"不支持的可见范围: {visibility},支持: 公开可见、仅自己可见、仅互关好友可见" | ||
| 390 | + ) | ||
| 391 | + | ||
| 392 | + # Click the dropdown | ||
| 393 | + page.click_element(VISIBILITY_DROPDOWN) | ||
| 394 | + time.sleep(0.5) | ||
| 395 | + | ||
| 396 | + # Find and click the target option | ||
| 397 | + clicked = page.evaluate( | ||
| 398 | + f""" | ||
| 399 | + (() => {{ | ||
| 400 | + const opts = document.querySelectorAll({json.dumps(VISIBILITY_OPTIONS)}); | ||
| 401 | + for (const opt of opts) {{ | ||
| 402 | + if (opt.textContent.includes({json.dumps(visibility)})) {{ | ||
| 403 | + opt.click(); | ||
| 404 | + return true; | ||
| 405 | + }} | ||
| 406 | + }} | ||
| 407 | + return false; | ||
| 408 | + }})() | ||
| 409 | + """ | ||
| 410 | + ) | ||
| 411 | + | ||
| 412 | + if not clicked: | ||
| 413 | + raise PublishError(f"未找到可见范围选项: {visibility}") | ||
| 414 | + | ||
| 415 | + logger.info("已设置可见范围: %s", visibility) | ||
| 416 | + time.sleep(0.2) | ||
| 417 | + | ||
| 418 | + | ||
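| 419 | +# ========== Original-content declaration ========== | ||
| 420 | + | ||
| 421 | + | ||
| 422 | +def _set_original(page: Page) -> None: | ||
| 423 | + """Enable the original-content declaration.""" | ||
| 424 | + # Find the original-content declaration card and click its switch | ||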
| 425 | + result = page.evaluate( | ||
| 426 | + f""" | ||
| 427 | + (() => {{ | ||
| 428 | + const cards = document.querySelectorAll({json.dumps(ORIGINAL_SWITCH_CARD)}); | ||
| 429 | + for (const card of cards) {{ | ||
| 430 | + if (!card.textContent.includes('原创声明')) continue; | ||
| 431 | + const sw = card.querySelector({json.dumps(ORIGINAL_SWITCH)}); | ||
| 432 | + if (!sw) continue; | ||
| 433 | + const input = sw.querySelector('input[type="checkbox"]'); | ||
| 434 | + if (input && input.checked) return 'already_on'; | ||
| 435 | + sw.click(); | ||
| 436 | + return 'clicked'; | ||
| 437 | + }} | ||
| 438 | + return 'not_found'; | ||
| 439 | + }})() | ||
| 440 | + """ | ||
| 441 | + ) | ||
| 442 | + | ||
| 443 | + if result == "already_on": | ||
| 444 | + logger.info("原创声明已开启") | ||
| 445 | + return | ||
| 446 | + | ||
| 447 | + if result == "not_found": | ||
| 448 | + raise PublishError("未找到原创声明选项") | ||
| 449 | + | ||
| 450 | + time.sleep(0.5) | ||
| 451 | + | ||
| 452 | + # Handle the confirmation dialog | ||
| 453 | + _confirm_original_declaration(page) | ||
| 454 | + | ||
| 455 | + | ||
| 456 | +def _confirm_original_declaration(page: Page) -> None: | ||
| 457 | + """Handle the confirmation dialog for the original-content declaration.""" | ||
| 458 | + time.sleep(0.8) | ||
| 459 | + | ||
| 460 | + # Tick the checkbox | ||
| 461 | + page.evaluate( | ||
| 462 | + """ | ||
| 463 | + (() => { | ||
| 464 | + const footers = document.querySelectorAll('div.footer'); | ||
| 465 | + for (const footer of footers) { | ||
| 466 | + if (!footer.textContent.includes('原创声明须知')) continue; | ||
| 467 | + const cb = footer.querySelector('div.d-checkbox input[type="checkbox"]'); | ||
| 468 | + if (cb && !cb.checked) cb.click(); | ||
| 469 | + return; | ||
| 470 | + } | ||
| 471 | + })() | ||
| 472 | + """ | ||
| 473 | + ) | ||
| 474 | + time.sleep(0.5) | ||
| 475 | + | ||
| 476 | + # Click the "声明原创" (declare original) button | ||
| 477 | + result = page.evaluate( | ||
| 478 | + """ | ||
| 479 | + (() => { | ||
| 480 | + const footers = document.querySelectorAll('div.footer'); | ||
| 481 | + for (const footer of footers) { | ||
| 482 | + if (!footer.textContent.includes('声明原创')) continue; | ||
| 483 | + const btn = footer.querySelector('button.custom-button'); | ||
| 484 | + if (btn) { | ||
| 485 | + if (btn.classList.contains('disabled') || btn.disabled) { | ||
| 486 | + const cb = footer.querySelector('div.d-checkbox input[type="checkbox"]'); | ||
| 487 | + if (cb && !cb.checked) cb.click(); | ||
| 488 | + return 'button_disabled'; | ||
| 489 | + } | ||
| 490 | + btn.click(); | ||
| 491 | + return 'clicked'; | ||
| 492 | + } | ||
| 493 | + } | ||
| 494 | + return 'button_not_found'; | ||
| 495 | + })() | ||
| 496 | + """ | ||
| 497 | + ) | ||
| 498 | + | ||
| 499 | + if result == "button_not_found": | ||
| 500 | + raise PublishError("未找到声明原创按钮") | ||
| 501 | + if result == "button_disabled": | ||
| 502 | + raise PublishError("声明原创按钮仍处于禁用状态") | ||
| 503 | + | ||
| 504 | + logger.info("已成功点击声明原创按钮") | ||
| 505 | + time.sleep(0.3) |
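
Two small pure steps in the publish flow, capping and cleaning tags and formatting the schedule time, can be checked without a browser. The sketch below is illustrative only: `normalize_tags`, `format_schedule_time`, and `MAX_TAGS` are hypothetical names that mirror the behavior visible in `publish_image_content`, `_input_tags`, and `_set_schedule_publish`.

```python
from datetime import datetime

MAX_TAGS = 10  # mirrors the cap applied in publish_image_content


def normalize_tags(tags: list[str]) -> list[str]:
    """Keep the first MAX_TAGS tags and drop any leading '#' characters."""
    return [tag.lstrip("#") for tag in tags[:MAX_TAGS]]


def format_schedule_time(schedule_time: str) -> str:
    """Parse an ISO 8601 string and format it for the date-time input."""
    parsed = datetime.fromisoformat(schedule_time)  # raises ValueError if malformed
    return parsed.strftime("%Y-%m-%d %H:%M")


tags = normalize_tags(["#travel", "food"] + ["extra"] * 20)
when = format_schedule_time("2025-03-01T09:30:00")
```

Keeping these steps as pure functions would make them easy to unit test separately from the CDP-driven form filling.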
scripts/xhs/publish_video.py
0 → 100644
| 1 | +"""Video publishing; corresponds to Go xiaohongshu/publish_video.go.""" | ||
| 2 | + | ||
| 3 | +from __future__ import annotations | ||
| 4 | + | ||
| 5 | +import logging | ||
| 6 | +import os | ||
| 7 | +import time | ||
| 8 | + | ||
| 9 | +from .cdp import Page | ||
| 10 | +from .errors import PublishError, UploadTimeoutError | ||
| 11 | +from .publish import ( | ||
| 12 | + _click_publish_tab, | ||
| 13 | + _find_content_element, | ||
| 14 | + _input_tags, | ||
| 15 | + _navigate_to_publish_page, | ||
| 16 | + _set_schedule_publish, | ||
| 17 | + _set_visibility, | ||
| 18 | +) | ||
| 19 | +from .selectors import ( | ||
| 20 | + FILE_INPUT, | ||
| 21 | + PUBLISH_BUTTON, | ||
| 22 | + TITLE_INPUT, | ||
| 23 | + UPLOAD_INPUT, | ||
| 24 | +) | ||
| 25 | +from .types import PublishVideoContent | ||
| 26 | + | ||
| 27 | +logger = logging.getLogger(__name__) | ||
| 28 | + | ||
| 29 | + | ||
| 30 | +def publish_video_content(page: Page, content: PublishVideoContent) -> None: | ||
| 31 | + """Publish a video note. | ||
| 32 | + | ||
| 33 | + Args: | ||
| 34 | + page: CDP page object. | ||
| 35 | + content: Video content to publish. | ||
| 36 | + | ||
| 37 | + Raises: | ||
| 38 | + PublishError: Publishing failed. | ||
| 39 | + UploadTimeoutError: Upload or processing timed out. | ||
| 40 | + """ | ||
| 41 | + if not content.video_path: | ||
| 42 | + raise PublishError("视频不能为空") | ||
| 43 | + | ||
| 44 | + # Navigate to the publish page | ||
| 45 | + _navigate_to_publish_page(page) | ||
| 46 | + | ||
| 47 | + # Click the "上传视频" (upload video) tab | ||
| 48 | + _click_publish_tab(page, "上传视频") | ||
| 49 | + time.sleep(1) | ||
| 50 | + | ||
| 51 | + # Upload the video | ||
| 52 | + _upload_video(page, content.video_path) | ||
| 53 | + | ||
| 54 | + # Submit | ||
| 55 | + _submit_publish_video( | ||
| 56 | + page, | ||
| 57 | + content.title, | ||
| 58 | + content.content, | ||
| 59 | + content.tags, | ||
| 60 | + content.schedule_time, | ||
| 61 | + content.visibility, | ||
| 62 | + ) | ||
| 63 | + | ||
| 64 | + | ||
| 65 | +def _upload_video(page: Page, video_path: str) -> None: | ||
| 66 | + """Upload the video file.""" | ||
| 67 | + if not os.path.exists(video_path): | ||
| 68 | + raise PublishError(f"视频文件不存在: {video_path}") | ||
| 69 | + | ||
| 70 | + # Locate the upload input | ||
| 71 | + selector = UPLOAD_INPUT if page.has_element(UPLOAD_INPUT) else FILE_INPUT | ||
| 72 | + page.set_file_input(selector, [video_path]) | ||
| 73 | + | ||
| 74 | + # Wait until the publish button is clickable (video processing finished) | ||
| 75 | + _wait_for_publish_button_clickable(page) | ||
| 76 | + logger.info("视频上传/处理完成") | ||
| 77 | + | ||
| 78 | + | ||
| 79 | +def _wait_for_publish_button_clickable(page: Page) -> None: | ||
| 80 | + """Wait until the publish button is clickable (video processing can take a long time).""" | ||
| 81 | + max_wait = 600.0 # 10 minutes | ||
| 82 | + start = time.monotonic() | ||
| 83 | + | ||
| 84 | + logger.info("开始等待发布按钮可点击(视频)") | ||
| 85 | + | ||
| 86 | + while time.monotonic() - start < max_wait: | ||
| 87 | + clickable = page.evaluate( | ||
| 88 | + f""" | ||
| 89 | + (() => {{ | ||
| 90 | + const btn = document.querySelector({_js_str(PUBLISH_BUTTON)}); | ||
| 91 | + if (!btn) return false; | ||
| 92 | + const rect = btn.getBoundingClientRect(); | ||
| 93 | + if (rect.width === 0 || rect.height === 0) return false; | ||
| 94 | + if (btn.disabled) return false; | ||
| 95 | + if (btn.classList.contains('disabled')) return false; | ||
| 96 | + return true; | ||
| 97 | + }})() | ||
| 98 | + """ | ||
| 99 | + ) | ||
| 100 | + if clickable: | ||
| 101 | + return | ||
| 102 | + time.sleep(1) | ||
| 103 | + | ||
| 104 | + raise UploadTimeoutError("等待发布按钮可点击超时(10分钟)") | ||
| 105 | + | ||
| 106 | + | ||
| 107 | +def _submit_publish_video( | ||
| 108 | + page: Page, | ||
| 109 | + title: str, | ||
| 110 | + content: str, | ||
| 111 | + tags: list[str], | ||
| 112 | + schedule_time: str | None, | ||
| 113 | + visibility: str, | ||
| 114 | +) -> None: | ||
| 115 | + """Fill in the video form and submit it.""" | ||
| 116 | + # Title | ||
| 117 | + page.input_text(TITLE_INPUT, title) | ||
| 118 | + time.sleep(1) | ||
| 119 | + | ||
| 120 | + # Body and tags | ||
| 121 | + content_selector = _find_content_element(page) | ||
| 122 | + page.input_content_editable(content_selector, content) | ||
| 123 | + | ||
| 124 | + # Click back on the title input | ||
| 125 | + time.sleep(1) | ||
| 126 | + page.click_element(TITLE_INPUT) | ||
| 127 | + | ||
| 128 | + if tags: | ||
| 129 | + _input_tags(page, content_selector, tags) | ||
| 130 | + time.sleep(1) | ||
| 131 | + | ||
| 132 | + # Scheduled publishing | ||
| 133 | + if schedule_time: | ||
| 134 | + _set_schedule_publish(page, schedule_time) | ||
| 135 | + | ||
| 136 | + # Visibility | ||
| 137 | + _set_visibility(page, visibility) | ||
| 138 | + | ||
| 139 | + # Wait until the publish button is clickable | ||
| 140 | + _wait_for_publish_button_clickable(page) | ||
| 141 | + | ||
| 142 | + # Click publish | ||
| 143 | + page.click_element(PUBLISH_BUTTON) | ||
| 144 | + time.sleep(3) | ||
| 145 | + logger.info("视频发布完成") | ||
| 146 | + | ||
| 147 | + | ||
| 148 | +def _js_str(s: str) -> str: | ||
| 149 | + """Convert a Python string into a JS string literal.""" | ||
| 150 | + import json | ||
| 151 | + | ||
| 152 | + return json.dumps(s) |
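
Both publish modules rely on `json.dumps` to embed Python strings in evaluated JavaScript. A minimal sketch of why that is safer than manual quoting follows; `build_exists_js` is a hypothetical helper written for illustration and is not part of the commit.

```python
import json


def build_exists_js(selector: str) -> str:
    """Build a JS expression that tests whether a selector matches (hypothetical helper)."""
    # json.dumps produces a double-quoted literal with quotes and backslashes
    # escaped, so the selector cannot break out of the JS string.
    return f"document.querySelector({json.dumps(selector)}) !== null"


js = build_exists_js('input[type="file"]')
literal = json.dumps("上传图文")  # non-ASCII text becomes \uXXXX escapes by default
```

Because a JSON string is also a valid JavaScript string literal, the same approach works for tab names and visibility labels that contain non-ASCII text.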
scripts/xhs/search.py
0 → 100644
| 1 | +"""Feed search; corresponds to Go xiaohongshu/search.go.""" | ||
| 2 | + | ||
| 3 | +from __future__ import annotations | ||
| 4 | + | ||
| 5 | +import json | ||
| 6 | +import logging | ||
| 7 | +import time | ||
| 8 | + | ||
| 9 | +from .cdp import Page | ||
| 10 | +from .errors import NoFeedsError | ||
| 11 | +from .selectors import FILTER_BUTTON, FILTER_PANEL | ||
| 12 | +from .types import Feed, FilterOption | ||
| 13 | +from .urls import make_search_url | ||
| 14 | + | ||
| 15 | +logger = logging.getLogger(__name__) | ||
| 16 | + | ||
| 17 | +# Filter option table: {filter group index: [(tag index, label), ...]} | ||
| 18 | +_FILTER_OPTIONS: dict[int, list[tuple[int, str]]] = { | ||
| 19 | + 1: [(1, "综合"), (2, "最新"), (3, "最多点赞"), (4, "最多评论"), (5, "最多收藏")], | ||
| 20 | + 2: [(1, "不限"), (2, "视频"), (3, "图文")], | ||
| 21 | + 3: [(1, "不限"), (2, "一天内"), (3, "一周内"), (4, "半年内")], | ||
| 22 | + 4: [(1, "不限"), (2, "已看过"), (3, "未看过"), (4, "已关注")], | ||
| 23 | + 5: [(1, "不限"), (2, "同城"), (3, "附近")], | ||
| 24 | +} | ||
| 25 | + | ||
| 26 | +# JS that extracts search results from __INITIAL_STATE__ | ||
| 27 | +_EXTRACT_SEARCH_JS = """ | ||
| 28 | +(() => { | ||
| 29 | + if (window.__INITIAL_STATE__ && | ||
| 30 | + window.__INITIAL_STATE__.search && | ||
| 31 | + window.__INITIAL_STATE__.search.feeds) { | ||
| 32 | + const feeds = window.__INITIAL_STATE__.search.feeds; | ||
| 33 | + const feedsData = feeds.value !== undefined ? feeds.value : feeds._value; | ||
| 34 | + if (feedsData) { | ||
| 35 | + return JSON.stringify(feedsData); | ||
| 36 | + } | ||
| 37 | + } | ||
| 38 | + return ""; | ||
| 39 | +})() | ||
| 40 | +""" | ||
| 41 | + | ||
| 42 | + | ||
| 43 | +def _find_internal_option(group_index: int, text: str) -> tuple[int, int]: | ||
| 44 | + """Look up the internal indices of a filter option. | ||
| 45 | + | ||
| 46 | + Returns: | ||
| 47 | + (filters_index, tags_index) | ||
| 48 | + | ||
| 49 | + Raises: | ||
| 50 | + ValueError: No matching option was found. | ||
| 51 | + """ | ||
| 52 | + options = _FILTER_OPTIONS.get(group_index) | ||
| 53 | + if not options: | ||
| 54 | + raise ValueError(f"筛选组 {group_index} 不存在") | ||
| 55 | + | ||
| 56 | + for tags_index, option_text in options: | ||
| 57 | + if option_text == text: | ||
| 58 | + return group_index, tags_index | ||
| 59 | + | ||
| 60 | + valid = [t for _, t in options] | ||
| 61 | + raise ValueError(f"在筛选组 {group_index} 中未找到 '{text}',有效值: {valid}") | ||
| 62 | + | ||
| 63 | + | ||
| 64 | +def _convert_filters(filter_opt: FilterOption) -> list[tuple[int, int]]: | ||
| 65 | + """Convert a FilterOption into a list of internal (filters_index, tags_index) pairs.""" | ||
| 66 | + result: list[tuple[int, int]] = [] | ||
| 67 | + | ||
| 68 | + if filter_opt.sort_by: | ||
| 69 | + result.append(_find_internal_option(1, filter_opt.sort_by)) | ||
| 70 | + if filter_opt.note_type: | ||
| 71 | + result.append(_find_internal_option(2, filter_opt.note_type)) | ||
| 72 | + if filter_opt.publish_time: | ||
| 73 | + result.append(_find_internal_option(3, filter_opt.publish_time)) | ||
| 74 | + if filter_opt.search_scope: | ||
| 75 | + result.append(_find_internal_option(4, filter_opt.search_scope)) | ||
| 76 | + if filter_opt.location: | ||
| 77 | + result.append(_find_internal_option(5, filter_opt.location)) | ||
| 78 | + | ||
| 79 | + return result | ||
| 80 | + | ||
| 81 | + | ||
| 82 | +def search_feeds( | ||
| 83 | + page: Page, | ||
| 84 | + keyword: str, | ||
| 85 | + filter_option: FilterOption | None = None, | ||
| 86 | +) -> list[Feed]: | ||
| 87 | + """Search feeds. | ||
| 88 | + | ||
| 89 | + Args: | ||
| 90 | + page: CDP page object. | ||
| 91 | + keyword: Search keyword. | ||
| 92 | + filter_option: Optional filter criteria. | ||
| 93 | + | ||
| 94 | + Raises: | ||
| 95 | + NoFeedsError: No search results were captured. | ||
| 96 | + ValueError: A filter option is invalid. | ||
| 97 | + """ | ||
| 98 | + search_url = make_search_url(keyword) | ||
| 99 | + page.navigate(search_url) | ||
| 100 | + page.wait_for_load() | ||
| 101 | + page.wait_dom_stable() | ||
| 102 | + | ||
| 103 | + # Wait for __INITIAL_STATE__ to be initialized | ||
| 104 | + _wait_for_initial_state(page) | ||
| 105 | + | ||
| 106 | + # Apply filters | ||
| 107 | + if filter_option: | ||
| 108 | + internal_filters = _convert_filters(filter_option) | ||
| 109 | + if internal_filters: | ||
| 110 | + _apply_filters(page, internal_filters) | ||
| 111 | + | ||
| 112 | + # Extract the search results | ||
| 113 | + result = page.evaluate(_EXTRACT_SEARCH_JS) | ||
| 114 | + if not result: | ||
| 115 | + raise NoFeedsError() | ||
| 116 | + | ||
| 117 | + feeds_data = json.loads(result) | ||
| 118 | + return [Feed.from_dict(f) for f in feeds_data] | ||
| 119 | + | ||
| 120 | + | ||
| 121 | +def _wait_for_initial_state(page: Page, timeout: float = 10.0) -> None: | ||
| 122 | + """Wait for __INITIAL_STATE__ to be ready; only logs a warning on timeout.""" | ||
| 123 | + deadline = time.monotonic() + timeout | ||
| 124 | + while time.monotonic() < deadline: | ||
| 125 | + ready = page.evaluate("window.__INITIAL_STATE__ !== undefined") | ||
| 126 | + if ready: | ||
| 127 | + return | ||
| 128 | + time.sleep(0.5) | ||
| 129 | + logger.warning("等待 __INITIAL_STATE__ 超时") | ||
| 130 | + | ||
| 131 | + | ||
| 132 | +def _apply_filters(page: Page, filters: list[tuple[int, int]]) -> None: | ||
| 133 | + """Apply the filters.""" | ||
| 134 | + # Hover over the filter button | ||
| 135 | + page.hover_element(FILTER_BUTTON) | ||
| 136 | + | ||
| 137 | + # Wait for the filter panel to appear | ||
| 138 | + deadline = time.monotonic() + 5.0 | ||
| 139 | + while time.monotonic() < deadline: | ||
| 140 | + if page.has_element(FILTER_PANEL): | ||
| 141 | + break | ||
| 142 | + time.sleep(0.3) | ||
| 143 | + | ||
| 144 | + # Click each filter option | ||
| 145 | + for filters_index, tags_index in filters: | ||
| 146 | + selector = ( | ||
| 147 | + f"{FILTER_PANEL} div.filters:nth-child({filters_index}) " | ||
| 148 | + f"div.tags:nth-child({tags_index})" | ||
| 149 | + ) | ||
| 150 | + page.click_element(selector) | ||
| 151 | + time.sleep(0.3) | ||
| 152 | + | ||
| 153 | + # Wait for the page to update | ||
| 154 | + page.wait_dom_stable() | ||
| 155 | + _wait_for_initial_state(page) |
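
The mapping from a human-readable filter label to the `nth-child` selector clicked in `_apply_filters` can be sketched on its own. This is a simplified illustration under stated assumptions: `FILTER_LABELS` covers only two of the five groups, and `filter_selector` is a hypothetical helper that merges the lookup and the selector construction into one step.

```python
# Labels per filter group, in on-page order (subset of the table in search.py).
FILTER_LABELS: dict[int, list[str]] = {
    1: ["综合", "最新", "最多点赞", "最多评论", "最多收藏"],
    2: ["不限", "视频", "图文"],
}


def filter_selector(group_index: int, label: str) -> str:
    """Return the CSS selector for one filter option (hypothetical helper)."""
    labels = FILTER_LABELS.get(group_index)
    if not labels:
        raise ValueError(f"filter group {group_index} does not exist")
    if label not in labels:
        raise ValueError(f"{label!r} not in group {group_index}; valid: {labels}")
    tag_index = labels.index(label) + 1  # nth-child is 1-based
    return (
        f"div.filter-panel div.filters:nth-child({group_index}) "
        f"div.tags:nth-child({tag_index})"
    )


selector = filter_selector(1, "最新")
```

Validating the label before touching the page means an invalid filter fails fast with a `ValueError` instead of a click on a missing element.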
scripts/xhs/selectors.py
0 → 100644
| 1 | +"""CSS selector constants for Xiaohongshu pages.""" | ||
| 2 | + | ||
| 3 | +# ========== Login ========== | ||
| 4 | +LOGIN_STATUS = ".main-container .user .link-wrapper .channel" | ||
| 5 | +QRCODE_IMG = ".login-container .qrcode-img" | ||
| 6 | + | ||
| 7 | +# ========== Home / search ========== | ||
| 8 | +FILTER_BUTTON = "div.filter" | ||
| 9 | +FILTER_PANEL = "div.filter-panel" | ||
| 10 | + | ||
| 11 | +# ========== Feed detail ========== | ||
| 12 | +COMMENTS_CONTAINER = ".comments-container" | ||
| 13 | +PARENT_COMMENT = ".parent-comment" | ||
| 14 | +NO_COMMENTS_TEXT = ".no-comments-text" | ||
| 15 | +END_CONTAINER = ".end-container" | ||
| 16 | +TOTAL_COMMENT = ".comments-container .total" | ||
| 17 | +SHOW_MORE_BUTTON = ".show-more" | ||
| 18 | +NOTE_SCROLLER = ".note-scroller" | ||
| 19 | +INTERACTION_CONTAINER = ".interaction-container" | ||
| 20 | + | ||
| 21 | +# Containers shown when the page is inaccessible | ||
| 22 | +ACCESS_ERROR_WRAPPER = ".access-wrapper, .error-wrapper, .not-found-wrapper, .blocked-wrapper" | ||
| 23 | + | ||
| 24 | +# ========== Comment input ========== | ||
| 25 | +COMMENT_INPUT_TRIGGER = "div.input-box div.content-edit span" | ||
| 26 | +COMMENT_INPUT_FIELD = "div.input-box div.content-edit p.content-input" | ||
| 27 | +COMMENT_SUBMIT_BUTTON = "div.bottom button.submit" | ||
| 28 | +REPLY_BUTTON = ".right .interactions .reply" | ||
| 29 | + | ||
| 30 | +# ========== Like / favorite ========== | ||
| 31 | +LIKE_BUTTON = ".interact-container .left .like-lottie" | ||
| 32 | +COLLECT_BUTTON = ".interact-container .left .reds-icon.collect-icon" | ||
| 33 | + | ||
| 34 | +# ========== Publish page ========== | ||
| 35 | +UPLOAD_CONTENT = "div.upload-content" | ||
| 36 | +CREATOR_TAB = "div.creator-tab" | ||
| 37 | +UPLOAD_INPUT = ".upload-input" | ||
| 38 | +FILE_INPUT = 'input[type="file"]' | ||
| 39 | +TITLE_INPUT = "div.d-input input" | ||
| 40 | +CONTENT_EDITOR = "div.ql-editor" | ||
| 41 | +IMAGE_PREVIEW = ".img-preview-area .pr" | ||
| 42 | +PUBLISH_BUTTON = ".publish-page-publish-btn button.bg-red" | ||
| 43 | + | ||
| 44 | +# Title/body length validation | ||
| 45 | +TITLE_MAX_SUFFIX = "div.title-container div.max_suffix" | ||
| 46 | +CONTENT_LENGTH_ERROR = "div.edit-container div.length-error" | ||
| 47 | + | ||
| 48 | +# Visibility | ||
| 49 | +VISIBILITY_DROPDOWN = "div.permission-card-wrapper div.d-select-content" | ||
| 50 | +VISIBILITY_OPTIONS = "div.d-options-wrapper div.d-grid-item div.custom-option" | ||
| 51 | + | ||
| 52 | +# Scheduled publishing | ||
| 53 | +SCHEDULE_SWITCH = ".post-time-wrapper .d-switch" | ||
| 54 | +DATETIME_INPUT = ".date-picker-container input" | ||
| 55 | + | ||
| 56 | +# Original-content declaration | ||
| 57 | +ORIGINAL_SWITCH_CARD = "div.custom-switch-card" | ||
| 58 | +ORIGINAL_SWITCH = "div.d-switch" | ||
| 59 | + | ||
| 60 | +# Tag suggestions | ||
| 61 | +TAG_TOPIC_CONTAINER = "#creator-editor-topic-container" | ||
| 62 | +TAG_FIRST_ITEM = ".item" | ||
| 63 | + | ||
| 64 | +# Popover | ||
| 65 | +POPOVER = "div.d-popover" | ||
| 66 | + | ||
| 67 | +# ========== User profile ========== | ||
| 68 | +SIDEBAR_PROFILE = "div.main-container li.user.side-bar-component a.link-wrapper span.channel" |
scripts/xhs/stealth.py
0 → 100644
| 1 | +"""Anti-detection JS injection and Chrome launch flags, mirroring go-rod/stealth.""" | ||
| 2 | + | ||
| 3 | +# Anti-detection JS script, injected when a page loads | ||
| 4 | +STEALTH_JS = """ | ||
| 5 | +(() => { | ||
| 6 | + // 1. navigator.webdriver | ||
| 7 | + Object.defineProperty(navigator, 'webdriver', { | ||
| 8 | + get: () => undefined, | ||
| 9 | + configurable: true, | ||
| 10 | + }); | ||
| 11 | + | ||
| 12 | + // 2. chrome.runtime | ||
| 13 | + if (!window.chrome) { | ||
| 14 | + window.chrome = {}; | ||
| 15 | + } | ||
| 16 | + if (!window.chrome.runtime) { | ||
| 17 | + window.chrome.runtime = { | ||
| 18 | + connect: () => {}, | ||
| 19 | + sendMessage: () => {}, | ||
| 20 | + }; | ||
| 21 | + } | ||
| 22 | + | ||
| 23 | + // 3. plugins | ||
| 24 | + Object.defineProperty(navigator, 'plugins', { | ||
| 25 | + get: () => { | ||
| 26 | + return [ | ||
| 27 | + { | ||
| 28 | + 0: {type: 'application/x-google-chrome-pdf'}, | ||
| 29 | + description: 'Portable Document Format', | ||
| 30 | + filename: 'internal-pdf-viewer', | ||
| 31 | + length: 1, | ||
| 32 | + name: 'Chrome PDF Plugin', | ||
| 33 | + }, | ||
| 34 | + { | ||
| 35 | + 0: {type: 'application/pdf'}, | ||
| 36 | + description: '', | ||
| 37 | + filename: 'mhjfbmdgcfjbbpaeojofohoefgiehjai', | ||
| 38 | + length: 1, | ||
| 39 | + name: 'Chrome PDF Viewer', | ||
| 40 | + }, | ||
| 41 | + { | ||
| 42 | + 0: {type: 'application/x-nacl'}, | ||
| 43 | + description: '', | ||
| 44 | + filename: 'internal-nacl-plugin', | ||
| 45 | + length: 1, | ||
| 46 | + name: 'Native Client', | ||
| 47 | + }, | ||
| 48 | + ]; | ||
| 49 | + }, | ||
| 50 | + configurable: true, | ||
| 51 | + }); | ||
| 52 | + | ||
| 53 | + // 4. languages | ||
| 54 | + Object.defineProperty(navigator, 'languages', { | ||
| 55 | + get: () => ['zh-CN', 'zh', 'en-US', 'en'], | ||
| 56 | + configurable: true, | ||
| 57 | + }); | ||
| 58 | + | ||
| 59 | + // 5. permissions | ||
| 60 | + const originalQuery = window.navigator.permissions?.query; | ||
| 61 | + if (originalQuery) { | ||
| 62 | + window.navigator.permissions.query = (parameters) => | ||
| 63 | + parameters.name === 'notifications' | ||
| 64 | + ? Promise.resolve({ state: Notification.permission }) | ||
| 65 | + : originalQuery.call(window.navigator.permissions, parameters); | ||
| 66 | + } | ||
| 67 | + | ||
| 68 | + // 6. WebGL vendor/renderer | ||
| 69 | + const getParameter = WebGLRenderingContext.prototype.getParameter; | ||
| 70 | + WebGLRenderingContext.prototype.getParameter = function(parameter) { | ||
| 71 | + if (parameter === 37445) return 'Intel Inc.'; | ||
| 72 | + if (parameter === 37446) return 'Intel Iris OpenGL Engine'; | ||
| 73 | + return getParameter.call(this, parameter); | ||
| 74 | + }; | ||
| 75 | +})(); | ||
| 76 | +""" | ||
| 77 | + | ||
| 78 | +# Chrome launch flags (anti-detection related) | ||
| 79 | +STEALTH_ARGS = [ | ||
| 80 | + "--disable-blink-features=AutomationControlled", | ||
| 81 | + "--disable-infobars", | ||
| 82 | + "--no-first-run", | ||
| 83 | + "--no-default-browser-check", | ||
| 84 | + "--disable-background-timer-throttling", | ||
| 85 | + "--disable-backgrounding-occluded-windows", | ||
| 86 | + "--disable-renderer-backgrounding", | ||
| 87 | + "--disable-component-update", | ||
| 88 | +] |
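For orientation, here is a rough sketch of how the two constants in `stealth.py` might be consumed. It assumes the script is registered through the standard CDP method `Page.addScriptToEvaluateOnNewDocument` and that the flags are appended to the Chrome command line. The functions `build_chrome_command` and `build_stealth_injection`, the shortened constants, and the example paths are illustrative assumptions, not the code in `chrome_launcher.py` or `cdp.py`.

```python
import json

# Shortened stand-ins for the constants defined in stealth.py above.
STEALTH_JS = "(() => { /* body of STEALTH_JS */ })();"
STEALTH_ARGS = [
    "--disable-blink-features=AutomationControlled",
    "--no-first-run",
]


def build_chrome_command(chrome_path: str, port: int, user_data_dir: str) -> list[str]:
    """Assemble a Chrome command line with the stealth flags appended."""
    return [
        chrome_path,
        f"--remote-debugging-port={port}",
        f"--user-data-dir={user_data_dir}",
        *STEALTH_ARGS,
    ]


def build_stealth_injection(msg_id: int, source: str = STEALTH_JS) -> str:
    """Serialize the CDP request that runs `source` before any page script."""
    return json.dumps(
        {
            "id": msg_id,
            "method": "Page.addScriptToEvaluateOnNewDocument",
            "params": {"source": source},
        }
    )


if __name__ == "__main__":
    # Hypothetical binary path and profile directory.
    print(build_chrome_command("/usr/bin/google-chrome", 9222, "/tmp/xhs-profile"))
    print(build_stealth_injection(1))
```

Registering the script on new documents, rather than evaluating it once, means the overrides are in place again after every navigation.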
scripts/xhs/types.py
0 → 100644
| 1 | +"""Xiaohongshu data type definitions, mirroring the Go types.go.""" | ||
| 2 | + | ||
| 3 | +from __future__ import annotations | ||
| 4 | + | ||
| 5 | +from dataclasses import dataclass, field | ||
| 6 | + | ||
| 7 | +# ========== Feed list ========== | ||
| 8 | + | ||
| 9 | + | ||
| 10 | +@dataclass | ||
| 11 | +class ImageInfo: | ||
| 12 | + image_scene: str = "" | ||
| 13 | + url: str = "" | ||
| 14 | + | ||
| 15 | + @classmethod | ||
| 16 | + def from_dict(cls, d: dict) -> ImageInfo: | ||
| 17 | + return cls( | ||
| 18 | + image_scene=d.get("imageScene", ""), | ||
| 19 | + url=d.get("url", ""), | ||
| 20 | + ) | ||
| 21 | + | ||
| 22 | + | ||
| 23 | +@dataclass | ||
| 24 | +class VideoCapability: | ||
| 25 | + duration: int = 0 # seconds | ||
| 26 | + | ||
| 27 | + @classmethod | ||
| 28 | + def from_dict(cls, d: dict) -> VideoCapability: | ||
| 29 | + return cls(duration=d.get("duration", 0)) | ||
| 30 | + | ||
| 31 | + | ||
| 32 | +@dataclass | ||
| 33 | +class Video: | ||
| 34 | + capa: VideoCapability = field(default_factory=VideoCapability) | ||
| 35 | + | ||
| 36 | + @classmethod | ||
| 37 | + def from_dict(cls, d: dict) -> Video: | ||
| 38 | + return cls(capa=VideoCapability.from_dict(d.get("capa", {}))) | ||
| 39 | + | ||
| 40 | + | ||
| 41 | +@dataclass | ||
| 42 | +class Cover: | ||
| 43 | + width: int = 0 | ||
| 44 | + height: int = 0 | ||
| 45 | + url: str = "" | ||
| 46 | + file_id: str = "" | ||
| 47 | + url_pre: str = "" | ||
| 48 | + url_default: str = "" | ||
| 49 | + info_list: list[ImageInfo] = field(default_factory=list) | ||
| 50 | + | ||
| 51 | + @classmethod | ||
| 52 | + def from_dict(cls, d: dict) -> Cover: | ||
| 53 | + return cls( | ||
| 54 | + width=d.get("width", 0), | ||
| 55 | + height=d.get("height", 0), | ||
| 56 | + url=d.get("url", ""), | ||
| 57 | + file_id=d.get("fileId", ""), | ||
| 58 | + url_pre=d.get("urlPre", ""), | ||
| 59 | + url_default=d.get("urlDefault", ""), | ||
| 60 | + info_list=[ImageInfo.from_dict(i) for i in d.get("infoList", [])], | ||
| 61 | + ) | ||
| 62 | + | ||
| 63 | + | ||
| 64 | +@dataclass | ||
| 65 | +class User: | ||
| 66 | + user_id: str = "" | ||
| 67 | + nickname: str = "" | ||
| 68 | + nick_name: str = "" | ||
| 69 | + avatar: str = "" | ||
| 70 | + | ||
| 71 | + @classmethod | ||
| 72 | + def from_dict(cls, d: dict) -> User: | ||
| 73 | + return cls( | ||
| 74 | + user_id=d.get("userId", ""), | ||
| 75 | + nickname=d.get("nickname", ""), | ||
| 76 | + nick_name=d.get("nickName", ""), | ||
| 77 | + avatar=d.get("avatar", ""), | ||
| 78 | + ) | ||
| 79 | + | ||
| 80 | + | ||
| 81 | +@dataclass | ||
| 82 | +class InteractInfo: | ||
| 83 | + liked: bool = False | ||
| 84 | + liked_count: str = "" | ||
| 85 | + shared_count: str = "" | ||
| 86 | + comment_count: str = "" | ||
| 87 | + collected_count: str = "" | ||
| 88 | + collected: bool = False | ||
| 89 | + | ||
| 90 | + @classmethod | ||
| 91 | + def from_dict(cls, d: dict) -> InteractInfo: | ||
| 92 | + return cls( | ||
| 93 | + liked=d.get("liked", False), | ||
| 94 | + liked_count=d.get("likedCount", ""), | ||
| 95 | + shared_count=d.get("sharedCount", ""), | ||
| 96 | + comment_count=d.get("commentCount", ""), | ||
| 97 | + collected_count=d.get("collectedCount", ""), | ||
| 98 | + collected=d.get("collected", False), | ||
| 99 | + ) | ||
| 100 | + | ||
| 101 | + | ||
| 102 | +@dataclass | ||
| 103 | +class NoteCard: | ||
| 104 | + type: str = "" | ||
| 105 | + display_title: str = "" | ||
| 106 | + user: User = field(default_factory=User) | ||
| 107 | + interact_info: InteractInfo = field(default_factory=InteractInfo) | ||
| 108 | + cover: Cover = field(default_factory=Cover) | ||
| 109 | + video: Video | None = None | ||
| 110 | + | ||
| 111 | + @classmethod | ||
| 112 | + def from_dict(cls, d: dict) -> NoteCard: | ||
| 113 | + video_data = d.get("video") | ||
| 114 | + return cls( | ||
| 115 | + type=d.get("type", ""), | ||
| 116 | + display_title=d.get("displayTitle", ""), | ||
| 117 | + user=User.from_dict(d.get("user", {})), | ||
| 118 | + interact_info=InteractInfo.from_dict(d.get("interactInfo", {})), | ||
| 119 | + cover=Cover.from_dict(d.get("cover", {})), | ||
| 120 | + video=Video.from_dict(video_data) if video_data else None, | ||
| 121 | + ) | ||
| 122 | + | ||
| 123 | + | ||
| 124 | +@dataclass | ||
| 125 | +class Feed: | ||
| 126 | + xsec_token: str = "" | ||
| 127 | + id: str = "" | ||
| 128 | + model_type: str = "" | ||
| 129 | + note_card: NoteCard = field(default_factory=NoteCard) | ||
| 130 | + index: int = 0 | ||
| 131 | + | ||
| 132 | + @classmethod | ||
| 133 | + def from_dict(cls, d: dict) -> Feed: | ||
| 134 | + return cls( | ||
| 135 | + xsec_token=d.get("xsecToken", ""), | ||
| 136 | + id=d.get("id", ""), | ||
| 137 | + model_type=d.get("modelType", ""), | ||
| 138 | + note_card=NoteCard.from_dict(d.get("noteCard", {})), | ||
| 139 | + index=d.get("index", 0), | ||
| 140 | + ) | ||
| 141 | + | ||
| 142 | + def to_dict(self) -> dict: | ||
| 143 | + """Serialize to a JSON-compatible dict.""" | ||
| 144 | + result: dict = { | ||
| 145 | + "id": self.id, | ||
| 146 | + "xsecToken": self.xsec_token, | ||
| 147 | + "modelType": self.model_type, | ||
| 148 | + "index": self.index, | ||
| 149 | + "displayTitle": self.note_card.display_title, | ||
| 150 | + "type": self.note_card.type, | ||
| 151 | + "user": { | ||
| 152 | + "userId": self.note_card.user.user_id, | ||
| 153 | + "nickname": self.note_card.user.nickname or self.note_card.user.nick_name, | ||
| 154 | + }, | ||
| 155 | + "interactInfo": { | ||
| 156 | + "likedCount": self.note_card.interact_info.liked_count, | ||
| 157 | + "collectedCount": self.note_card.interact_info.collected_count, | ||
| 158 | + "commentCount": self.note_card.interact_info.comment_count, | ||
| 159 | + "sharedCount": self.note_card.interact_info.shared_count, | ||
| 160 | + }, | ||
| 161 | + } | ||
| 162 | + if self.note_card.video: | ||
| 163 | + result["video"] = {"duration": self.note_card.video.capa.duration} | ||
| 164 | + return result | ||
| 165 | + | ||
| 166 | + | ||
| 167 | +# ========== Feed detail ========== | ||
| 168 | + | ||
| 169 | + | ||
| 170 | +@dataclass | ||
| 171 | +class DetailImageInfo: | ||
| 172 | + width: int = 0 | ||
| 173 | + height: int = 0 | ||
| 174 | + url_default: str = "" | ||
| 175 | + url_pre: str = "" | ||
| 176 | + live_photo: bool = False | ||
| 177 | + | ||
| 178 | + @classmethod | ||
| 179 | + def from_dict(cls, d: dict) -> DetailImageInfo: | ||
| 180 | + return cls( | ||
| 181 | + width=d.get("width", 0), | ||
| 182 | + height=d.get("height", 0), | ||
| 183 | + url_default=d.get("urlDefault", ""), | ||
| 184 | + url_pre=d.get("urlPre", ""), | ||
| 185 | + live_photo=d.get("livePhoto", False), | ||
| 186 | + ) | ||
| 187 | + | ||
| 188 | + | ||
| 189 | +@dataclass | ||
| 190 | +class Comment: | ||
| 191 | + id: str = "" | ||
| 192 | + note_id: str = "" | ||
| 193 | + content: str = "" | ||
| 194 | + like_count: str = "" | ||
| 195 | + create_time: int = 0 | ||
| 196 | + ip_location: str = "" | ||
| 197 | + liked: bool = False | ||
| 198 | + user_info: User = field(default_factory=User) | ||
| 199 | + sub_comment_count: str = "" | ||
| 200 | + sub_comments: list[Comment] = field(default_factory=list) | ||
| 201 | + show_tags: list[str] = field(default_factory=list) | ||
| 202 | + | ||
| 203 | + @classmethod | ||
| 204 | + def from_dict(cls, d: dict) -> Comment: | ||
| 205 | + return cls( | ||
| 206 | + id=d.get("id", ""), | ||
| 207 | + note_id=d.get("noteId", ""), | ||
| 208 | + content=d.get("content", ""), | ||
| 209 | + like_count=d.get("likeCount", ""), | ||
| 210 | + create_time=d.get("createTime", 0), | ||
| 211 | + ip_location=d.get("ipLocation", ""), | ||
| 212 | + liked=d.get("liked", False), | ||
| 213 | + user_info=User.from_dict(d.get("userInfo", {})), | ||
| 214 | + sub_comment_count=d.get("subCommentCount", ""), | ||
| 215 | + sub_comments=[cls.from_dict(c) for c in d.get("subComments", []) or []], | ||
| 216 | + show_tags=d.get("showTags", []) or [], | ||
| 217 | + ) | ||
| 218 | + | ||
| 219 | + def to_dict(self) -> dict: | ||
| 220 | + result: dict = { | ||
| 221 | + "id": self.id, | ||
| 222 | + "content": self.content, | ||
| 223 | + "likeCount": self.like_count, | ||
| 224 | + "createTime": self.create_time, | ||
| 225 | + "ipLocation": self.ip_location, | ||
| 226 | + "user": { | ||
| 227 | + "userId": self.user_info.user_id, | ||
| 228 | + "nickname": self.user_info.nickname or self.user_info.nick_name, | ||
| 229 | + }, | ||
| 230 | + "subCommentCount": self.sub_comment_count, | ||
| 231 | + } | ||
| 232 | + if self.sub_comments: | ||
| 233 | + result["subComments"] = [c.to_dict() for c in self.sub_comments] | ||
| 234 | + return result | ||
| 235 | + | ||
| 236 | + | ||
| 237 | +@dataclass | ||
| 238 | +class CommentList: | ||
| 239 | + list_: list[Comment] = field(default_factory=list) | ||
| 240 | + cursor: str = "" | ||
| 241 | + has_more: bool = False | ||
| 242 | + | ||
| 243 | + @classmethod | ||
| 244 | + def from_dict(cls, d: dict) -> CommentList: | ||
| 245 | + return cls( | ||
| 246 | + list_=[Comment.from_dict(c) for c in d.get("list", []) or []], | ||
| 247 | + cursor=d.get("cursor", ""), | ||
| 248 | + has_more=d.get("hasMore", False), | ||
| 249 | + ) | ||
| 250 | + | ||
| 251 | + | ||
| 252 | +@dataclass | ||
| 253 | +class FeedDetail: | ||
| 254 | + note_id: str = "" | ||
| 255 | + xsec_token: str = "" | ||
| 256 | + title: str = "" | ||
| 257 | + desc: str = "" | ||
| 258 | + type: str = "" | ||
| 259 | + time: int = 0 | ||
| 260 | + ip_location: str = "" | ||
| 261 | + user: User = field(default_factory=User) | ||
| 262 | + interact_info: InteractInfo = field(default_factory=InteractInfo) | ||
| 263 | + image_list: list[DetailImageInfo] = field(default_factory=list) | ||
| 264 | + | ||
| 265 | + @classmethod | ||
| 266 | + def from_dict(cls, d: dict) -> FeedDetail: | ||
| 267 | + return cls( | ||
| 268 | + note_id=d.get("noteId", ""), | ||
| 269 | + xsec_token=d.get("xsecToken", ""), | ||
| 270 | + title=d.get("title", ""), | ||
| 271 | + desc=d.get("desc", ""), | ||
| 272 | + type=d.get("type", ""), | ||
| 273 | + time=d.get("time", 0), | ||
| 274 | + ip_location=d.get("ipLocation", ""), | ||
| 275 | + user=User.from_dict(d.get("user", {})), | ||
| 276 | + interact_info=InteractInfo.from_dict(d.get("interactInfo", {})), | ||
| 277 | + image_list=[DetailImageInfo.from_dict(i) for i in d.get("imageList", []) or []], | ||
| 278 | + ) | ||
| 279 | + | ||
| 280 | + def to_dict(self) -> dict: | ||
| 281 | + return { | ||
| 282 | + "noteId": self.note_id, | ||
| 283 | + "title": self.title, | ||
| 284 | + "desc": self.desc, | ||
| 285 | + "type": self.type, | ||
| 286 | + "time": self.time, | ||
| 287 | + "ipLocation": self.ip_location, | ||
| 288 | + "user": { | ||
| 289 | + "userId": self.user.user_id, | ||
| 290 | + "nickname": self.user.nickname or self.user.nick_name, | ||
| 291 | + }, | ||
| 292 | + "interactInfo": { | ||
| 293 | + "liked": self.interact_info.liked, | ||
| 294 | + "likedCount": self.interact_info.liked_count, | ||
| 295 | + "collectedCount": self.interact_info.collected_count, | ||
| 296 | + "collected": self.interact_info.collected, | ||
| 297 | + "commentCount": self.interact_info.comment_count, | ||
| 298 | + "sharedCount": self.interact_info.shared_count, | ||
| 299 | + }, | ||
| 300 | + "imageList": [ | ||
| 301 | + { | ||
| 302 | + "width": img.width, | ||
| 303 | + "height": img.height, | ||
| 304 | + "urlDefault": img.url_default, | ||
| 305 | + } | ||
| 306 | + for img in self.image_list | ||
| 307 | + ], | ||
| 308 | + } | ||
| 309 | + | ||
| 310 | + | ||
| 311 | +@dataclass | ||
| 312 | +class FeedDetailResponse: | ||
| 313 | + note: FeedDetail = field(default_factory=FeedDetail) | ||
| 314 | + comments: CommentList = field(default_factory=CommentList) | ||
| 315 | + | ||
| 316 | + @classmethod | ||
| 317 | + def from_dict(cls, d: dict) -> FeedDetailResponse: | ||
| 318 | + return cls( | ||
| 319 | + note=FeedDetail.from_dict(d.get("note", {})), | ||
| 320 | + comments=CommentList.from_dict(d.get("comments", {})), | ||
| 321 | + ) | ||
| 322 | + | ||
| 323 | + def to_dict(self) -> dict: | ||
| 324 | + return { | ||
| 325 | + "note": self.note.to_dict(), | ||
| 326 | + "comments": [c.to_dict() for c in self.comments.list_], | ||
| 327 | + } | ||
| 328 | + | ||
| 329 | + | ||
| 330 | +# ========== User profile ========== | ||
| 331 | + | ||
| 332 | + | ||
| 333 | +@dataclass | ||
| 334 | +class UserBasicInfo: | ||
| 335 | + gender: int = 0 | ||
| 336 | + ip_location: str = "" | ||
| 337 | + desc: str = "" | ||
| 338 | + imageb: str = "" | ||
| 339 | + nickname: str = "" | ||
| 340 | + images: str = "" | ||
| 341 | + red_id: str = "" | ||
| 342 | + | ||
| 343 | + @classmethod | ||
| 344 | + def from_dict(cls, d: dict) -> UserBasicInfo: | ||
| 345 | + return cls( | ||
| 346 | + gender=d.get("gender", 0), | ||
| 347 | + ip_location=d.get("ipLocation", ""), | ||
| 348 | + desc=d.get("desc", ""), | ||
| 349 | + imageb=d.get("imageb", ""), | ||
| 350 | + nickname=d.get("nickname", ""), | ||
| 351 | + images=d.get("images", ""), | ||
| 352 | + red_id=d.get("redId", ""), | ||
| 353 | + ) | ||
| 354 | + | ||
| 355 | + | ||
| 356 | +@dataclass | ||
| 357 | +class UserInteraction: | ||
| 358 | + type: str = "" | ||
| 359 | + name: str = "" | ||
| 360 | + count: str = "" | ||
| 361 | + | ||
| 362 | + @classmethod | ||
| 363 | + def from_dict(cls, d: dict) -> UserInteraction: | ||
| 364 | + return cls( | ||
| 365 | + type=d.get("type", ""), | ||
| 366 | + name=d.get("name", ""), | ||
| 367 | + count=d.get("count", ""), | ||
| 368 | + ) | ||
| 369 | + | ||
| 370 | + | ||
| 371 | +@dataclass | ||
| 372 | +class UserProfileResponse: | ||
| 373 | + user_basic_info: UserBasicInfo = field(default_factory=UserBasicInfo) | ||
| 374 | + interactions: list[UserInteraction] = field(default_factory=list) | ||
| 375 | + feeds: list[Feed] = field(default_factory=list) | ||
| 376 | + | ||
| 377 | + def to_dict(self) -> dict: | ||
| 378 | + return { | ||
| 379 | + "basicInfo": { | ||
| 380 | + "nickname": self.user_basic_info.nickname, | ||
| 381 | + "redId": self.user_basic_info.red_id, | ||
| 382 | + "desc": self.user_basic_info.desc, | ||
| 383 | + "gender": self.user_basic_info.gender, | ||
| 384 | + "ipLocation": self.user_basic_info.ip_location, | ||
| 385 | + }, | ||
| 386 | + "interactions": [ | ||
| 387 | + {"type": i.type, "name": i.name, "count": i.count} for i in self.interactions | ||
| 388 | + ], | ||
| 389 | + "feeds": [f.to_dict() for f in self.feeds], | ||
| 390 | + } | ||
| 391 | + | ||
| 392 | + | ||
| 393 | +# ========== Search ========== | ||
| 394 | + | ||
| 395 | + | ||
| 396 | +@dataclass | ||
| 397 | +class FilterOption: | ||
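| 398 | + """Search filter options. Values are the Chinese labels shown in the UI.""" | ||
| 399 | + | ||
| 400 | + sort_by: str = "" # 综合|最新|最多点赞|最多评论|最多收藏 (general|latest|most liked|most commented|most favorited) | ||
| 401 | + note_type: str = "" # 不限|视频|图文 (any|video|image-text) | ||
| 402 | + publish_time: str = "" # 不限|一天内|一周内|半年内 (any|within a day|within a week|within six months) | ||
| 403 | + search_scope: str = "" # 不限|已看过|未看过|已关注 (any|viewed|not viewed|following) | ||
| 404 | + location: str = "" # 不限|同城|附近 (any|same city|nearby) | ||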
| 405 | + | ||
| 406 | + | ||
| 407 | +# ========== Publishing ========== | ||
| 408 | + | ||
| 409 | + | ||
| 410 | +@dataclass | ||
| 411 | +class PublishImageContent: | ||
| 412 | + """Image-and-text post content.""" | ||
| 413 | + | ||
| 414 | + title: str = "" | ||
| 415 | + content: str = "" | ||
| 416 | + tags: list[str] = field(default_factory=list) | ||
| 417 | + image_paths: list[str] = field(default_factory=list) | ||
| 418 | + schedule_time: str | None = None # ISO 8601 format; None means publish immediately | ||
| 419 | + is_original: bool = False | ||
| 420 | + visibility: str = "" # 公开可见(默认)|仅自己可见|仅互关好友可见 (public, the default|only me|mutual followers only) | ||
| 421 | + | ||
| 422 | + | ||
| 423 | +@dataclass | ||
| 424 | +class PublishVideoContent: | ||
| 425 | + """Video post content.""" | ||
| 426 | + | ||
| 427 | + title: str = "" | ||
| 428 | + content: str = "" | ||
| 429 | + tags: list[str] = field(default_factory=list) | ||
| 430 | + video_path: str = "" | ||
| 431 | + schedule_time: str | None = None # ISO 8601 format | ||
| 432 | + visibility: str = "" # 公开可见(默认)|仅自己可见|仅互关好友可见 (public, the default|only me|mutual followers only) | ||
| 433 | + | ||
| 434 | + | ||
| 435 | +# ========== Interaction ========== | ||
| 436 | + | ||
| 437 | + | ||
| 438 | +@dataclass | ||
| 439 | +class ActionResult: | ||
| 440 | + """Generic action response (like, favorite, etc.).""" | ||
| 441 | + | ||
| 442 | + feed_id: str = "" | ||
| 443 | + success: bool = False | ||
| 444 | + message: str = "" | ||
| 445 | + | ||
| 446 | + def to_dict(self) -> dict: | ||
| 447 | + return { | ||
| 448 | + "feed_id": self.feed_id, | ||
| 449 | + "success": self.success, | ||
| 450 | + "message": self.message, | ||
| 451 | + } | ||
| 452 | + | ||
| 453 | + | ||
| 454 | +# ========== Comment loading config ========== | ||
| 455 | + | ||
| 456 | + | ||
| 457 | +@dataclass | ||
| 458 | +class CommentLoadConfig: | ||
| 459 | + """Comment loading configuration.""" | ||
| 460 | + | ||
| 461 | + click_more_replies: bool = False | ||
| 462 | + max_replies_threshold: int = 10 | ||
| 463 | + max_comment_items: int = 0 # 0 = unlimited | ||
| 464 | + scroll_speed: str = "normal" # slow|normal|fast |
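Several `from_dict` methods in `types.py` use the pattern `d.get(key, []) or []`. A small sketch of why the trailing `or []` is likely there: `dict.get` only falls back to its default when the key is missing, not when the key is present with a `null` value. `MiniComment` below is a hypothetical, trimmed-down stand-in for `Comment`, and the payloads are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class MiniComment:
    """Hypothetical two-field stand-in for the Comment dataclass."""

    id: str = ""
    sub_comments: list["MiniComment"] = field(default_factory=list)

    @classmethod
    def from_dict(cls, d: dict) -> "MiniComment":
        return cls(
            id=d.get("id", ""),
            # `or []` covers an explicit JSON null, which .get() passes through.
            sub_comments=[cls.from_dict(c) for c in d.get("subComments", []) or []],
        )


if __name__ == "__main__":
    with_null = MiniComment.from_dict({"id": "c1", "subComments": None})
    print(with_null.sub_comments)  # []

    nested = MiniComment.from_dict({"id": "c1", "subComments": [{"id": "c2"}]})
    print(nested.sub_comments[0].id)  # c2
```

Without the guard, the first payload would raise `TypeError` when the comprehension tries to iterate over `None`.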
scripts/xhs/urls.py
0 → 100644
| 1 | +"""Xiaohongshu URL constants and builder functions.""" | ||
| 2 | + | ||
| 3 | +from urllib.parse import urlencode | ||
| 4 | + | ||
| 5 | +# Base pages | ||
| 6 | +EXPLORE_URL = "https://www.xiaohongshu.com/explore" | ||
| 7 | +HOME_URL = "https://www.xiaohongshu.com" | ||
| 8 | +PUBLISH_URL = "https://creator.xiaohongshu.com/publish/publish?source=official" | ||
| 9 | + | ||
| 10 | + | ||
| 11 | +def make_feed_detail_url(feed_id: str, xsec_token: str) -> str: | ||
| 12 | + """Build the feed detail page URL.""" | ||
| 13 | + return ( | ||
| 14 | + f"https://www.xiaohongshu.com/explore/{feed_id}?xsec_token={xsec_token}&xsec_source=pc_feed" | ||
| 15 | + ) | ||
| 16 | + | ||
| 17 | + | ||
| 18 | +def make_search_url(keyword: str) -> str: | ||
| 19 | + """Build the search results page URL.""" | ||
| 20 | + params = urlencode({"keyword": keyword, "source": "web_explore_feed"}) | ||
| 21 | + return f"https://www.xiaohongshu.com/search_result?{params}" | ||
| 22 | + | ||
| 23 | + | ||
| 24 | +def make_user_profile_url(user_id: str, xsec_token: str) -> str: | ||
| 25 | + """Build the user profile page URL.""" | ||
| 26 | + return ( | ||
| 27 | + f"https://www.xiaohongshu.com/user/profile/{user_id}" | ||
| 28 | + f"?xsec_token={xsec_token}&xsec_source=pc_note" | ||
| 29 | + ) |
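In `urls.py`, `make_search_url` percent-encodes its query through `urlencode`, while the other two builders interpolate `xsec_token` directly. Whether real tokens ever contain reserved characters is not established by this diff, so the following is only a sketch of an encoded variant under that assumption. The function name `make_feed_detail_url_encoded` and the sample token are hypothetical.

```python
from urllib.parse import quote, urlencode


def make_feed_detail_url_encoded(feed_id: str, xsec_token: str) -> str:
    """Variant of make_feed_detail_url that percent-encodes its inputs."""
    # urlencode escapes characters such as '+', '/' and '=' in the token.
    params = urlencode({"xsec_token": xsec_token, "xsec_source": "pc_feed"})
    # quote(..., safe="") keeps a stray '/' or '?' out of the path segment.
    return f"https://www.xiaohongshu.com/explore/{quote(feed_id, safe='')}?{params}"


if __name__ == "__main__":
    print(make_feed_detail_url_encoded("abc123", "AB+c/d="))
    # https://www.xiaohongshu.com/explore/abc123?xsec_token=AB%2Bc%2Fd%3D&xsec_source=pc_feed
```

If the site expects the token byte-for-byte as issued, the unencoded form in the diff may be the right choice; this variant only matters when a token can contain characters that change how the query string is parsed.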