zy
Committed by GitHub

Feature: rewrite the Xiaohongshu Skills as a complete CDP-based Python implementation (#1)

## Main changes

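### Core module rewrite
- Created the scripts/xhs/ package with 18 dedicated modules (3,728 lines of code)
- Implemented end to end from the xiaohongshu-mcp Go source
- Talks to Chrome directly over the CDP WebSocket, replacing third-party library dependencies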

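### Module list
- cdp.py: Browser/Page/Element classes wrapping the CDP protocol
- stealth.py: anti-detection JS injection + Chrome launch flags
- login.py: login check and QR-code login (the QR code is saved to a temp file for the agent to display)
- feeds.py: home feed list
- publish.py: complete image-and-text publishing flow
- publish_video.py: complete video publishing flow
- search.py: search and result filtering
- feed_detail.py: note details and comment loading
- comment.py: comments and replies
- like_favorite.py: likes and favorites
- user_profile.py: user profile pages
- cookies.py: cookie persistence
- types.py: complete dataclass-based type system
- errors.py: custom exception hierarchy
- human.py: human behavior simulation (delays, scrolling)
- selectors.py: CSS selector constants
- urls.py: URL builder functions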

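### Unified CLI
- scripts/cli.py: 13 subcommands, one per xiaohongshu-mcp MCP tool
  - check-login: check login status
  - login: fetch the login QR code
  - delete-cookies: clear cookies (log out / switch accounts)
  - list-feeds: home feed list
  - search-feeds: feed search
  - get-feed-detail: note details
  - user-profile: user profile
  - post-comment: post a comment
  - reply-comment: reply to a comment
  - like-feed: like a note
  - favorite-feed: favorite a note
  - publish: publish an image-and-text note
  - publish-video: publish a video note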
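The subcommand layout can be sketched with `argparse` subparsers. This is a hypothetical reduction to 3 of the 13 subcommands, not the real `scripts/cli.py` parser; the option names come from the usage examples in this commit and the `--host`/`--port` defaults from the README's global-options list, while the `--account` default is an assumption.

```python
import argparse

# Exit codes documented for scripts/cli.py
EXIT_OK, EXIT_NOT_LOGGED_IN, EXIT_ERROR = 0, 1, 2


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical, reduced version of the cli.py parser (3 of the 13 subcommands)."""
    parser = argparse.ArgumentParser(prog="cli.py")
    # Global options live on the top-level parser
    parser.add_argument("--host", default="127.0.0.1")
    parser.add_argument("--port", type=int, default=9222)
    parser.add_argument("--account", default="")

    sub = parser.add_subparsers(dest="command", required=True)
    sub.add_parser("check-login")

    search = sub.add_parser("search-feeds")
    search.add_argument("--keyword", required=True)

    like = sub.add_parser("like-feed")
    like.add_argument("--feed-id", required=True)
    like.add_argument("--xsec-token", required=True)
    like.add_argument("--unlike", action="store_true")
    return parser
```

With this layout, global options must come before the subcommand name (`cli.py --port 9333 like-feed ...`), because argparse only recognises them on the top-level parser.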

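### Supporting scripts rewritten
- chrome_launcher.py: Chrome process management (cross-platform)
- account_manager.py: multi-account isolation via separate Chrome profiles
- image_downloader.py: image/video download (SHA-256 cache)
- title_utils.py: title length calculation in UTF-16 code units
- run_lock.py: single-instance lock
- publish_pipeline.py: CLI that orchestrates the publishing flow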
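Counting title length in UTF-16 code units matters because JavaScript's `String.length` counts code units, so an emoji outside the Basic Multilingual Plane counts as 2 there while Python's `len()` counts it as 1. A minimal sketch; the function names are hypothetical, and the 20-unit default limit is an assumption, not a value stated in this commit.

```python
def utf16_len(text: str) -> int:
    """Length in UTF-16 code units, i.e. what JavaScript's String.length reports."""
    # UTF-16-LE adds no BOM, so every code unit is exactly two bytes
    return len(text.encode("utf-16-le")) // 2


def truncate_utf16(text: str, limit: int) -> str:
    """Cut text to at most `limit` UTF-16 units without splitting a surrogate pair."""
    out: list[str] = []
    used = 0
    for ch in text:
        units = 2 if ord(ch) > 0xFFFF else 1  # astral chars (e.g. most emoji) take 2
        if used + units > limit:
            break
        out.append(ch)
        used += units
    return "".join(out)


def title_fits(title: str, limit: int = 20) -> bool:
    """Hypothetical check; the 20-unit default is an assumed limit."""
    return utf16_len(title) <= limit
```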

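### Docs and configuration
- SKILL.md: unified skill entry point (routes to 5 sub-skills)
- skills/xhs-auth/SKILL.md: authentication management skill
- skills/xhs-publish/SKILL.md: content publishing skill (image-and-text + video)
- skills/xhs-explore/SKILL.md: content discovery and analysis skill
- skills/xhs-interact/SKILL.md: social interaction skill (comment/like/favorite)
- skills/xhs-content-ops/SKILL.md: composite content-operations workflow skill
- CLAUDE.md: project development guide
- PROMPT.md: Ralph Loop driver file
- pyproject.toml: uv project configuration (uv.lock)
- README.md: full project documentation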

### Tech stack
- Python 3.11+ with uv for package management
- requests + websockets: CDP WebSocket communication
- Code style: ruff lint + format

## Mapping
All 13 subcommands map one-to-one onto the xiaohongshu-mcp MCP tools.
They can be invoked directly by the OpenClaw agent framework.

## Groundwork
- Created the scripts/xhs/ package layout
- Implemented the CDP WebSocket protocol
- Complete type system and error handling
- CLI subcommand system

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>


@@ -205,3 +205,15 @@ cython_debug/
205 205 marimo/_static/
206 206 marimo/_lsp/
207 207 __marimo__/
  208 +
  209 +# Project specific
  210 +tmp/
  211 +*.txt
  212 +!requirements.txt
  213 +config/accounts.json
  214 +title.txt
  215 +content.txt
  216 +comment.txt
  217 +
  218 +# Ralph Loop state
  219 +.claude/.ralph-loop.local.md
  1 +# xiaohongshu-skills
  2 +
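  3 +Xiaohongshu automation Claude Code Skills, built on a Python CDP browser-automation engine.
  4 +Provides Xiaohongshu operations to the OpenClaw ecosystem while also supporting the Claude Code Skills format.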
  5 +
  6 +## Project structure
  7 +
  8 +```
  9 +xiaohongshu-skills/
  10 +├── scripts/ # Python CDP automation engine
  11 +│ ├── xhs/ # Core XHS automation package
  12 +│ │ ├── __init__.py
  13 +│ │ ├── cdp.py # CDP WebSocket client (Browser, Page, Element)
  14 +│ │ ├── stealth.py # Anti-detection JS injection + Chrome launch flags
  15 +│ │ ├── cookies.py # Cookie file persistence
  16 +│ │ ├── types.py # Data types (dataclasses)
  17 +│ │ ├── errors.py # Exception hierarchy
  18 +│ │ ├── selectors.py # CSS selector constants
  19 +│ │ ├── urls.py # URL constants and builder functions
  20 +│ │ ├── human.py # Human behavior simulation (delays, scrolling)
  21 +│ │ ├── login.py # Login check, QR-code login
  22 +│ │ ├── feeds.py # Home feed list
  23 +│ │ ├── search.py # Search + filters
  24 +│ │ ├── feed_detail.py # Note details + comment loading
  25 +│ │ ├── user_profile.py # User profile page
  26 +│ │ ├── comment.py # Comments, replies
  27 +│ │ ├── like_favorite.py # Likes, favorites
  28 +│ │ ├── publish.py # Image-and-text publishing
  29 +│ │ └── publish_video.py # Video publishing
  30 +│ ├── cli.py # Unified CLI entry point (13 subcommands)
  31 +│ ├── chrome_launcher.py # Chrome process management
  32 +│ ├── account_manager.py # Multi-account management
  33 +│ ├── image_downloader.py # Media download (SHA-256 cache)
  34 +│ ├── title_utils.py # UTF-16 title length calculation
  35 +│ ├── run_lock.py # Single-instance lock
  36 +│ └── publish_pipeline.py # Publishing orchestrator
  37 +├── skills/ # Claude Code Skills definitions
  38 +│ ├── xhs-auth/SKILL.md # Authentication management
  39 +│ ├── xhs-publish/SKILL.md # Content publishing (image-and-text + video)
  40 +│ ├── xhs-explore/SKILL.md # Content discovery and analysis
  41 +│ ├── xhs-interact/SKILL.md # Social interaction (comment/like/favorite)
  42 +│ └── xhs-content-ops/SKILL.md # Composite content-operations workflows
  43 +├── pyproject.toml # uv project configuration
  44 +├── SKILL.md # Unified entry point (routes to sub-skills)
  45 +├── CLAUDE.md # This file
  46 +├── PROMPT.md # Ralph Loop driver file
  47 +└── README.md
  48 +```
  49 +
  50 +## Tech stack
  51 +
  52 +- **Python**: >=3.11
  53 +- **Package management**: uv
  54 +- **Dependencies**: requests + websockets (direct CDP WebSocket communication)
  55 +- **Browser**: Chrome (controlled via the CDP remote debugging protocol)
  56 +- **Code style**: ruff (lint + format)
  57 +- **Data extraction**: `window.__INITIAL_STATE__` (same approach as the Go source)
  58 +
  59 +## Development commands
  60 +
  61 +```bash
  62 +uv sync # Install dependencies
  63 +uv run ruff check . # Lint
  64 +uv run ruff format . # Format code
  65 +uv run pytest # Run tests
  66 +```
  67 +
  68 +## Architecture
  69 +
  70 +### Two-layer structure
  71 +
  72 +1. **scripts/: Python CDP engine**
  73 +   - Rewritten from scratch based on the xiaohongshu-mcp Go source
  74 +   - `xhs/` package: modular core automation library
  75 +   - `cli.py`: unified CLI entry point, 13 subcommands mapping to the MCP tools
  76 +   - Structured JSON output that agents can parse easily
  77 +   - Multi-account support with an isolated Chrome profile per account
  78 +   - Anti-detection protection (stealth flags + JS injection)
  79 +
  80 +2. **skills/: Claude Code Skills definitions**
  81 +   - SKILL.md files that tell Claude how to invoke scripts/
  82 +   - Cover input classification, constraint rules, workflows, and failure handling
  83 +
  84 +### Invocation
  85 +
  86 +```bash
  87 +# Unified CLI entry point
  88 +python scripts/cli.py check-login
  89 +python scripts/cli.py search-feeds --keyword "keyword"
  90 +python scripts/cli.py publish --title-file t.txt --content-file c.txt --images pic.jpg
  91 +
  92 +# Publishing pipeline (includes image download and login check)
  93 +python scripts/publish_pipeline.py --title-file t.txt --content-file c.txt --images URL1
  94 +```
  95 +
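  96 +## Code conventions
  97 +
  98 +### Python style
  99 +- Follow PEP 8, enforced by ruff
  100 +- Full type hints (PEP 484), using the `str | None` syntax
  101 +- Public functions and classes must have docstrings
  102 +- Maximum line length of 100 characters
  103 +- Use `from __future__ import annotations` to enable postponed evaluation of annotations
  104 +
  105 +### Naming conventions
  106 +- File names: snake_case
  107 +- Class names: PascalCase
  108 +- Functions/variables: snake_case
  109 +- Constants: UPPER_SNAKE_CASE
  110 +
  111 +### Error handling
  112 +- Custom exception classes inherit from the `XHSError` base class (`xhs/errors.py`)
  113 +- CLI commands use structured exit codes: 0 = success, 1 = not logged in, 2 = error
  114 +- All user-facing error messages are in Chinese
  115 +
  116 +### Safety constraints
  117 +- Publishing operations must include a user-confirmation step
  118 +- File paths must be absolute
  119 +- Do not inline sensitive content in command-line arguments (pass it via files)
  120 +- Chrome profile directories keep each account's cookies isolated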
  121 +
  122 +## References
  123 +
  124 +- **xiaohongshu-mcp Go source**: /Users/zy/src/zy/xiaohongshu-mcp/
  125 +
  126 +## MCP tool mapping
  127 +
  128 +The 13 subcommands of scripts/cli.py map to the MCP tools of xiaohongshu-mcp:
  129 +
  130 +| CLI subcommand | MCP tool | Category |
  131 +|--|--|--|
  132 +| `check-login` | check_login_status | Auth |
  133 +| `login` | get_login_qrcode | Auth |
  134 +| `delete-cookies` | delete_cookies | Auth |
  135 +| `list-feeds` | list_feeds | Browse |
  136 +| `search-feeds` | search_feeds | Browse |
  137 +| `get-feed-detail` | get_feed_detail | Browse |
  138 +| `user-profile` | user_profile | Browse |
  139 +| `post-comment` | post_comment_to_feed | Interact |
  140 +| `reply-comment` | reply_comment_in_feed | Interact |
  141 +| `like-feed` | like_feed | Interact |
  142 +| `favorite-feed` | favorite_feed | Interact |
  143 +| `publish` | publish_content | Publish |
  144 +| `publish-video` | publish_with_video | Publish |
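CLAUDE.md states that data is extracted from `window.__INITIAL_STATE__`, as in the Go source. One way to do that over CDP is a `Runtime.evaluate` call with `returnByValue: true` whose expression serialises the object with `JSON.stringify`. The sketch below builds the params and decodes the reply; the helper names are hypothetical, and whether the real state object serialises cleanly (it may hold cycles) is an assumption the code only guards against, not something this commit confirms.

```python
import json
from typing import Any

# Serialise inside the page so the value comes back as a plain JSON string
INITIAL_STATE_JS = "JSON.stringify(window.__INITIAL_STATE__ ?? null)"


def evaluate_params(expression: str) -> dict[str, Any]:
    """Params for a CDP Runtime.evaluate command that returns the result by value."""
    return {"expression": expression, "returnByValue": True}


def parse_initial_state(reply: dict[str, Any]) -> Any:
    """Decode the state from a Runtime.evaluate reply ({"id": ..., "result": {...}})."""
    result = reply.get("result", {})
    if "exceptionDetails" in result:
        # e.g. JSON.stringify throwing on a circular structure
        raise RuntimeError("reading window.__INITIAL_STATE__ failed in the page")
    remote = result.get("result", {})
    if remote.get("type") != "string":
        return None
    # "null" (state missing) decodes to None
    return json.loads(remote["value"])
```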
  1 +# Xiaohongshu Skills development task
  2 +
  3 +## Goal
  4 +
  5 +Rewrite the Python CDP engine from scratch, based on the xiaohongshu-mcp Go source, to build a complete set of Xiaohongshu automation Skills for the OpenClaw ecosystem.
  6 +
  7 +## References
  8 +
  9 +- **xiaohongshu-mcp Go source**: `/Users/zy/src/zy/xiaohongshu-mcp/` (10k stars, 13 MCP tools)
  10 +- **xiaohongshu-mcp data structures**: `/Users/zy/src/zy/xiaohongshu-mcp/xiaohongshu/types.go`
  11 +- **xiaohongshu-mcp tool definitions**: `/Users/zy/src/zy/xiaohongshu-mcp/mcp_server.go`
  12 +
  13 +## Architecture
  14 +
  15 +### Module layout
  16 +
  17 +```
  18 +scripts/
  19 +├── xhs/ # Core XHS automation package
  20 +│ ├── cdp.py # CDP WebSocket client
  21 +│ ├── stealth.py # Anti-detection JS injection + Chrome launch flags
  22 +│ ├── cookies.py # Cookie file persistence
  23 +│ ├── types.py # Data types (dataclasses)
  24 +│ ├── errors.py # Exception hierarchy
  25 +│ ├── selectors.py # CSS selector constants
  26 +│ ├── urls.py # URL constants
  27 +│ ├── human.py # Human behavior simulation
  28 +│ ├── login.py # Login
  29 +│ ├── feeds.py # Home feed
  30 +│ ├── search.py # Search + filters
  31 +│ ├── feed_detail.py # Note details + comment loading
  32 +│ ├── user_profile.py # User profile page
  33 +│ ├── comment.py # Comments, replies
  34 +│ ├── like_favorite.py # Likes, favorites
  35 +│ ├── publish.py # Image-and-text publishing
  36 +│ └── publish_video.py # Video publishing
  37 +├── cli.py # Unified CLI entry point (13 subcommands)
  38 +├── chrome_launcher.py # Chrome process management
  39 +├── account_manager.py # Multi-account management
  40 +├── image_downloader.py # Media download (SHA-256 cache)
  41 +├── title_utils.py # UTF-16 title length calculation
  42 +├── run_lock.py # Single-instance lock
  43 +└── publish_pipeline.py # Publishing orchestrator
  44 +```
  45 +
  46 +### CLI interface (maps to the 13 Go MCP tools)
  47 +
  48 +```bash
  49 +python scripts/cli.py check-login
  50 +python scripts/cli.py login
  51 +python scripts/cli.py delete-cookies
  52 +python scripts/cli.py list-feeds
  53 +python scripts/cli.py search-feeds --keyword "keyword" [--sort-by --note-type ...]
  54 +python scripts/cli.py get-feed-detail --feed-id ID --xsec-token TOKEN [--load-all-comments]
  55 +python scripts/cli.py user-profile --user-id ID --xsec-token TOKEN
  56 +python scripts/cli.py post-comment --feed-id ID --xsec-token TOKEN --content "text"
  57 +python scripts/cli.py reply-comment --feed-id ID --xsec-token TOKEN --content "text" [--comment-id | --user-id]
  58 +python scripts/cli.py like-feed --feed-id ID --xsec-token TOKEN [--unlike]
  59 +python scripts/cli.py favorite-feed --feed-id ID --xsec-token TOKEN [--unfavorite]
  60 +python scripts/cli.py publish --title-file T --content-file C --images P1 P2 [--tags --schedule-at --visibility]
  61 +python scripts/cli.py publish-video --title-file T --content-file C --video P [--tags --schedule-at]
  62 +```
  63 +
  64 +Global options: `--host`, `--port`, `--account`
  65 +Output: JSON (`ensure_ascii=False`)
  66 +Exit codes: 0 = success, 1 = not logged in, 2 = error
  67 +
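  68 +## Code requirements
  69 +
  70 +- Python code must pass `ruff check` and `ruff format`
  71 +- Full type hints (PEP 484), using `str | None` rather than `Optional[str]`
  72 +- Public functions and classes must have docstrings
  73 +- Maximum line length of 100 characters
  74 +- Use `from __future__ import annotations` to enable postponed evaluation of annotations
  75 +- All exception classes inherit from `XHSError`
  76 +- The CLI uses argparse; exit codes: 0 = success, 1 = not logged in, 2 = error
  77 +- JSON output uses `ensure_ascii=False` to keep Chinese characters readable
  78 +
  79 +## Completion signal
  80 +
  81 +Output the completion signal once all of the following conditions are met:
  82 +1. All 17 modules of the `xhs/` package have been created
  83 +2. All 13 `cli.py` subcommands are implemented
  84 +3. The 5 supporting scripts have been rewritten
  85 +4. The 5 `skills/*/SKILL.md` files have been updated
  86 +5. The root `SKILL.md`, `CLAUDE.md`, and `README.md` have been updated
  87 +6. `uv run ruff check .` reports no errors
  88 +7. `uv run ruff format --check .` reports no differences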
  89 +
  90 +<promise>ALL SKILLS COMPLETE</promise>
1 1 # xiaohongshu-skills
2 -xiaohongshu-skills
  2 +
  3 +Xiaohongshu automation Claude Code Skills, built on a Python CDP browser-automation engine.
  4 +
  5 +Provides Xiaohongshu operations to the OpenClaw ecosystem while remaining compatible with the Claude Code Skills format.
  6 +
  7 +## Feature overview
  8 +
  9 +| Skill | Description | Core commands |
  10 +|------|------|----------|
  11 +| **xhs-auth** | Authentication management | `check-login`, `login`, `delete-cookies` |
  12 +| **xhs-publish** | Content publishing | `publish`, `publish-video` |
  13 +| **xhs-explore** | Content discovery | `list-feeds`, `search-feeds`, `get-feed-detail`, `user-profile` |
  14 +| **xhs-interact** | Social interaction | `post-comment`, `reply-comment`, `like-feed`, `favorite-feed` |
  15 +| **xhs-content-ops** | Composite operations | Competitor analysis, trend tracking, content creation, engagement management |
  16 +
  17 +## Installation
  18 +
  19 +```bash
  20 +# Clone the project
  21 +git clone https://github.com/autoclaw-cc/xiaohongshu-skills.git
  22 +cd xiaohongshu-skills
  23 +
  24 +# Install dependencies (requires uv)
  25 +uv sync
  26 +```
  27 +
  28 +### Prerequisites
  29 +
  30 +- Python >= 3.11
  31 +- [uv](https://docs.astral.sh/uv/) package manager
  32 +- Google Chrome browser
  33 +
  34 +## Quick start
  35 +
  36 +### 1. Launch Chrome
  37 +
  38 +```bash
  39 +# Headed mode (recommended for the first login)
  40 +python scripts/chrome_launcher.py
  41 +
  42 +# Headless mode
  43 +python scripts/chrome_launcher.py --headless
  44 +```
  45 +
  46 +### 2. Log in to Xiaohongshu
  47 +
  48 +```bash
  49 +# Check login status
  50 +python scripts/cli.py check-login
  51 +
  52 +# Log in (scan the QR code)
  53 +python scripts/cli.py login
  54 +```
  55 +
  56 +### 3. Search notes
  57 +
  58 +```bash
  59 +python scripts/cli.py search-feeds --keyword "keyword"
  60 +
  61 +# With filters (最新 = newest, 图文 = image-and-text)
  62 +python scripts/cli.py search-feeds \
  63 +  --keyword "keyword" --sort-by 最新 --note-type 图文
  64 +```
  65 +
  66 +### 4. View note details
  67 +
  68 +```bash
  69 +python scripts/cli.py get-feed-detail \
  70 +  --feed-id FEED_ID --xsec-token XSEC_TOKEN
  71 +```
  72 +
  73 +### 5. Publish content
  74 +
  75 +```bash
  76 +# Image-and-text post
  77 +python scripts/cli.py publish \
  78 +  --title-file title.txt \
  79 +  --content-file content.txt \
  80 +  --images "/abs/path/pic1.jpg" "/abs/path/pic2.jpg"
  81 +
  82 +# Video post
  83 +python scripts/cli.py publish-video \
  84 +  --title-file title.txt \
  85 +  --content-file content.txt \
  86 +  --video "/abs/path/video.mp4"
  87 +```
  88 +
  89 +### 6. Social interaction
  90 +
  91 +```bash
  92 +# Post a comment
  93 +python scripts/cli.py post-comment \
  94 +  --feed-id FEED_ID \
  95 +  --xsec-token XSEC_TOKEN \
  96 +  --content "comment text"
  97 +
  98 +# Like
  99 +python scripts/cli.py like-feed \
  100 +  --feed-id FEED_ID --xsec-token XSEC_TOKEN
  101 +
  102 +# Favorite
  103 +python scripts/cli.py favorite-feed \
  104 +  --feed-id FEED_ID --xsec-token XSEC_TOKEN
  105 +```
  106 +
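  107 +## CLI command reference
  108 +
  109 +All commands are invoked through the unified `scripts/cli.py` entry point and print JSON.
  110 +
  111 +Global options:
  112 +- `--host HOST`: Chrome debugging host (default 127.0.0.1)
  113 +- `--port PORT`: Chrome debugging port (default 9222)
  114 +- `--account NAME`: account to use
  115 +
  116 +| Subcommand | Description |
  117 +|--------|------|
  118 +| `check-login` | Check login status |
  119 +| `login` | Fetch the login QR code and wait for it to be scanned |
  120 +| `delete-cookies` | Clear cookies |
  121 +| `list-feeds` | Get the recommended home feed |
  122 +| `search-feeds` | Search notes by keyword |
  123 +| `get-feed-detail` | Get note details and comments |
  124 +| `user-profile` | Get user profile information |
  125 +| `post-comment` | Comment on a note |
  126 +| `reply-comment` | Reply to a specific comment |
  127 +| `like-feed` | Like / unlike |
  128 +| `favorite-feed` | Favorite / unfavorite |
  129 +| `publish` | Publish image-and-text content |
  130 +| `publish-video` | Publish video content |
  131 +
  132 +Exit codes: 0 = success, 1 = not logged in, 2 = error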
  133 +
  134 +## Project structure
  135 +
  136 +```
  137 +xiaohongshu-skills/
  138 +├── scripts/ # Python CDP automation engine
  139 +│ ├── xhs/ # Core automation package (modular)
  140 +│ │ ├── cdp.py # CDP WebSocket client
  141 +│ │ ├── stealth.py # Anti-detection protection
  142 +│ │ ├── cookies.py # Cookie persistence
  143 +│ │ ├── types.py # Data types
  144 +│ │ ├── errors.py # Exception hierarchy
  145 +│ │ ├── selectors.py # CSS selectors
  146 +│ │ ├── urls.py # URL constants
  147 +│ │ ├── human.py # Human behavior simulation
  148 +│ │ ├── login.py # Login
  149 +│ │ ├── feeds.py # Home feed
  150 +│ │ ├── search.py # Search
  151 +│ │ ├── feed_detail.py # Note details
  152 +│ │ ├── user_profile.py # User profile page
  153 +│ │ ├── comment.py # Comments
  154 +│ │ ├── like_favorite.py # Likes/favorites
  155 +│ │ ├── publish.py # Image-and-text publishing
  156 +│ │ └── publish_video.py # Video publishing
  157 +│ ├── cli.py # Unified CLI (13 subcommands)
  158 +│ ├── chrome_launcher.py # Chrome process management
  159 +│ ├── account_manager.py # Multi-account management
  160 +│ ├── image_downloader.py # Media download
  161 +│ ├── title_utils.py # Title length calculation
  162 +│ ├── run_lock.py # Single-instance lock
  163 +│ └── publish_pipeline.py # Publishing orchestrator
  164 +├── skills/ # Claude Code Skills definitions
  165 +│ ├── xhs-auth/SKILL.md # Authentication management
  166 +│ ├── xhs-publish/SKILL.md # Content publishing
  167 +│ ├── xhs-explore/SKILL.md # Content discovery
  168 +│ ├── xhs-interact/SKILL.md # Social interaction
  169 +│ └── xhs-content-ops/SKILL.md # Composite operations
  170 +├── SKILL.md # Unified entry point
  171 +├── CLAUDE.md # Project development guide
  172 +├── pyproject.toml # uv project configuration
  173 +└── README.md
  174 +```
  175 +
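  176 +## Technical architecture
  177 +
  178 +### Two-layer structure
  179 +
  180 +1. **scripts/: Python CDP engine**
  181 +   - Rewritten from scratch based on the xiaohongshu-mcp Go source
  182 +   - Controls the browser directly through the Chrome DevTools Protocol (CDP)
  183 +   - Extracts data from `window.__INITIAL_STATE__`
  184 +   - Built-in anti-detection protection (stealth flags + JS injection)
  185 +   - Structured JSON output
  186 +
  187 +2. **skills/: Claude Code Skills definitions**
  188 +   - SKILL.md files that tell AI agents how to invoke scripts/
  189 +   - Cover input classification, constraint rules, workflows, and failure handling
  190 +
  191 +## Development
  192 +
  193 +```bash
  194 +uv sync # Install dependencies
  195 +uv run ruff check . # Lint
  196 +uv run ruff format . # Format code
  197 +uv run pytest # Run tests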
  198 +```
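An agent that shells out to the CLI needs both the exit code and the JSON on stdout. A minimal sketch of such a caller; `run_cli`, `interpret` and `CliResult` are hypothetical names, not part of this repository. It parses every JSON document on stdout because `cli.py login` (shown further down in this commit) prints an interim document with the QR-code path before its final result, and it assumes the repository root is the working directory.

```python
import json
import subprocess
import sys
from dataclasses import dataclass, field
from typing import Any

# Exit codes documented in the README: 0 = success, 1 = not logged in, 2 = error
STATUS_BY_EXIT = {0: "ok", 1: "not_logged_in", 2: "error"}


@dataclass
class CliResult:
    status: str
    documents: list[Any] = field(default_factory=list)

    @property
    def final(self) -> Any:
        """The last JSON document printed (the command's final result)."""
        return self.documents[-1] if self.documents else None


def parse_json_stream(text: str) -> list[Any]:
    """Parse zero or more concatenated JSON documents."""
    decoder = json.JSONDecoder()
    docs: list[Any] = []
    idx = 0
    while True:
        # raw_decode does not skip leading whitespace, so skip it here
        while idx < len(text) and text[idx].isspace():
            idx += 1
        if idx >= len(text):
            return docs
        obj, idx = decoder.raw_decode(text, idx)
        docs.append(obj)


def interpret(returncode: int, stdout: str) -> CliResult:
    """Map an exit code plus stdout to a status and the decoded documents."""
    return CliResult(STATUS_BY_EXIT.get(returncode, "error"), parse_json_stream(stdout))


def run_cli(*args: str) -> CliResult:
    """Run scripts/cli.py; its logs go to stderr, so stdout carries only JSON."""
    proc = subprocess.run(
        [sys.executable, "scripts/cli.py", *args], capture_output=True, text=True
    )
    return interpret(proc.returncode, proc.stdout)
```

For example, `run_cli("check-login").status == "not_logged_in"` tells the agent to switch to the xhs-auth login flow.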
  1 +---
  2 +name: xiaohongshu-skills
  3 +description: |
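  4 +  Xiaohongshu automation skill set: login/authentication, content publishing, search and discovery, social interaction, and composite operations.
  5 +  Triggered when the user asks to do something on Xiaohongshu (publish, search, comment, log in, analyze, like, favorite).
  6 +---
  7 +
  8 +# Xiaohongshu Automation Skills
  9 +
  10 +You are the "Xiaohongshu automation assistant". Route each request to the matching sub-skill based on the user's intent.
  11 +
  12 +## Input classification
  13 +
  14 +Classify the user's intent in the following priority order and route to the matching sub-skill:
  15 +
  16 +1. **Authentication** ("log in / check login / switch account") → run the `xhs-auth` skill.
  17 +2. **Content publishing** ("publish / post / upload images and text / upload a video") → run the `xhs-publish` skill.
  18 +3. **Search and discovery** ("search notes / view details / browse the home feed / view a user") → run the `xhs-explore` skill.
  19 +4. **Social interaction** ("comment / reply / like / favorite") → run the `xhs-interact` skill.
  20 +5. **Composite operations** ("competitor analysis / trend tracking / bulk engagement / one-click creation") → run the `xhs-content-ops` skill.
  21 +
  22 +## Global constraints
  23 +
  24 +- Confirm the login status (via `check-login`) before any operation.
  25 +- Publishing and commenting must be confirmed by the user before they are executed.
  26 +- File paths must be absolute.
  27 +- CLI output is JSON; present it to the user in a structured form.
  28 +- Do not operate too frequently; keep reasonable intervals between actions.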
  29 +
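  30 +## Sub-skill overview
  31 +
  32 +### xhs-auth: authentication management
  33 +
  34 +Manages the Xiaohongshu login state and switching between accounts.
  35 +
  36 +| Command | Function |
  37 +|------|------|
  38 +| `cli.py check-login` | Check login status |
  39 +| `cli.py login` | Fetch the login QR code and wait for it to be scanned |
  40 +| `cli.py delete-cookies` | Clear cookies (log out / switch accounts) |
  41 +
  42 +### xhs-publish: content publishing
  43 +
  44 +Publishes image-and-text or video content to Xiaohongshu.
  45 +
  46 +| Command | Function |
  47 +|------|------|
  48 +| `cli.py publish` | Image-and-text publishing (local images or URLs) |
  49 +| `cli.py publish-video` | Video publishing |
  50 +| `publish_pipeline.py` | Publishing pipeline (includes image download and login check) |
  51 +
  52 +### xhs-explore: content discovery
  53 +
  54 +Searches notes, views details, and fetches user profiles.
  55 +
  56 +| Command | Function |
  57 +|------|------|
  58 +| `cli.py list-feeds` | Get the recommended home feed |
  59 +| `cli.py search-feeds` | Search notes by keyword |
  60 +| `cli.py get-feed-detail` | Get the full content and comments of a note |
  61 +| `cli.py user-profile` | Get user profile information |
  62 +
  63 +### xhs-interact: social interaction
  64 +
  65 +Posts comments and replies, likes, and favorites.
  66 +
  67 +| Command | Function |
  68 +|------|------|
  69 +| `cli.py post-comment` | Comment on a note |
  70 +| `cli.py reply-comment` | Reply to a specific comment |
  71 +| `cli.py like-feed` | Like / unlike |
  72 +| `cli.py favorite-feed` | Favorite / unfavorite |
  73 +
  74 +### xhs-content-ops: composite operations
  75 +
  76 +Combines multiple steps into operations workflows: competitor analysis, trend tracking, content creation, and engagement management.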
  77 +
  78 +## Quick start
  79 +
  80 +```bash
  81 +# 1. Launch Chrome
  82 +python scripts/chrome_launcher.py
  83 +
  84 +# 2. Check login status
  85 +python scripts/cli.py check-login
  86 +
  87 +# 3. Log in (if needed)
  88 +python scripts/cli.py login
  89 +
  90 +# 4. Search notes
  91 +python scripts/cli.py search-feeds --keyword "keyword"
  92 +
  93 +# 5. View note details
  94 +python scripts/cli.py get-feed-detail \
  95 +  --feed-id FEED_ID --xsec-token XSEC_TOKEN
  96 +
  97 +# 6. Publish an image-and-text note
  98 +python scripts/cli.py publish \
  99 +  --title-file title.txt \
  100 +  --content-file content.txt \
  101 +  --images "/abs/path/pic1.jpg"
  102 +
  103 +# 7. Post a comment
  104 +python scripts/cli.py post-comment \
  105 +  --feed-id FEED_ID \
  106 +  --xsec-token XSEC_TOKEN \
  107 +  --content "comment text"
  108 +
  109 +# 8. Like
  110 +python scripts/cli.py like-feed \
  111 +  --feed-id FEED_ID --xsec-token XSEC_TOKEN
  112 +```
  113 +
  114 +## Failure handling
  115 +
  116 +- **Not logged in**: prompt the user to go through the login flow (xhs-auth).
  117 +- **Chrome not running**: start the browser with `chrome_launcher.py`.
  118 +- **Operation timeout**: check the network connection and increase wait times as appropriate.
  119 +- **Rate limiting**: lower the operation frequency and widen the intervals.
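The rule to keep reasonable intervals between operations can be enforced mechanically instead of being left to the agent's judgement. A minimal sketch of a minimum-interval throttle; the class name and the idea of wrapping each CLI call with it are assumptions rather than part of the committed skill, and this commit does not state what interval is appropriate for Xiaohongshu.

```python
import time
from collections.abc import Callable


class MinIntervalThrottle:
    """Hypothetical helper: enforce a minimum gap between consecutive actions."""

    def __init__(
        self,
        interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.interval = interval
        self._clock = clock  # injectable so tests can use a fake clock
        self._sleep = sleep
        self._last: float | None = None

    def wait(self) -> float:
        """Block until `interval` seconds have passed since the last call; return time slept."""
        slept = 0.0
        if self._last is not None:
            remaining = self.interval - (self._clock() - self._last)
            if remaining > 0:
                self._sleep(remaining)
                slept = remaining
        self._last = self._clock()
        return slept
```

For example, calling `throttle.wait()` before each `like-feed` in a batch-engagement loop guarantees at least `interval` seconds between actions; adding a randomised extra delay on top would avoid a perfectly regular rhythm.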
  1 +[project]
  2 +name = "xiaohongshu-skills"
  3 +version = "0.1.0"
  4 +description = "Xiaohongshu automation Skills built on CDP browser automation"
  5 +readme = "README.md"
  6 +license = { text = "MIT" }
  7 +requires-python = ">=3.11"
  8 +dependencies = [
  9 + "requests>=2.28.0",
  10 + "websockets>=12.0",
  11 +]
  12 +
  13 +[project.optional-dependencies]
  14 +dev = [
  15 + "ruff>=0.9.0",
  16 + "pytest>=8.0",
  17 +]
  18 +
  19 +[tool.ruff]
  20 +target-version = "py311"
  21 +line-length = 100
  22 +
  23 +[tool.ruff.lint]
  24 +select = [
  25 + "E", # pycodestyle errors
  26 + "W", # pycodestyle warnings
  27 + "F", # pyflakes
  28 + "I", # isort
  29 + "N", # pep8-naming
  30 + "UP", # pyupgrade
  31 + "B", # flake8-bugbear
  32 + "SIM", # flake8-simplify
  33 + "RUF", # ruff-specific rules
  34 +]
  35 +ignore = [
  36 + "E402", # module-level imports not at top (needed for sys.path manipulation)
  37 + "RUF001", # ambiguous unicode characters (Chinese punctuation is intentional)
  38 + "RUF002", # ambiguous unicode in docstrings (Chinese punctuation is intentional)
  39 + "RUF003", # ambiguous unicode in comments (Chinese punctuation is intentional)
  40 +]
  41 +
  42 +[tool.ruff.lint.per-file-ignores]
  43 +
  44 +[tool.ruff.lint.isort]
  45 +known-first-party = ["xhs"]
  46 +
  47 +[tool.pytest.ini_options]
  48 +testpaths = ["tests"]
  1 +"""Multi-account management backed by a standalone account configuration file."""
  2 +
  3 +from __future__ import annotations
  4 +
  5 +import json
  6 +import logging
  7 +import os
  8 +from pathlib import Path
  9 +
  10 +logger = logging.getLogger(__name__)
  11 +
  12 +# Location of the account configuration file
  13 +_CONFIG_DIR = Path.home() / ".xhs"
  14 +_ACCOUNTS_FILE = _CONFIG_DIR / "accounts.json"
  15 +
  16 +
  17 +def _load_config() -> dict:
  18 +    """Load the account configuration."""
  19 +    if not _ACCOUNTS_FILE.exists():
  20 +        return {"default": "", "accounts": {}}
  21 +    with open(_ACCOUNTS_FILE, encoding="utf-8") as f:
  22 +        return json.load(f)
  23 +
  24 +
  25 +def _save_config(config: dict) -> None:
  26 +    """Save the account configuration."""
  27 +    _CONFIG_DIR.mkdir(parents=True, exist_ok=True)
  28 +    with open(_ACCOUNTS_FILE, "w", encoding="utf-8") as f:
  29 +        json.dump(config, f, ensure_ascii=False, indent=2)
  30 +
  31 +
  32 +def list_accounts() -> list[dict]:
  33 +    """List all accounts."""
  34 +    config = _load_config()
  35 +    default = config.get("default", "")
  36 +    accounts = config.get("accounts", {})
  37 +    result = []
  38 +    for name, info in accounts.items():
  39 +        result.append(
  40 +            {
  41 +                "name": name,
  42 +                "description": info.get("description", ""),
  43 +                "is_default": name == default,
  44 +                "profile_dir": _get_profile_dir(name),
  45 +            }
  46 +        )
  47 +    return result
  48 +
  49 +
  50 +def add_account(name: str, description: str = "") -> None:
  51 +    """Add an account."""
  52 +    config = _load_config()
  53 +    accounts = config.setdefault("accounts", {})
  54 +    if name in accounts:
  55 +        raise ValueError(f"账号 '{name}' 已存在")
  56 +
  57 +    accounts[name] = {"description": description}
  58 +
  59 +    # The first account added becomes the default
  60 +    if not config.get("default"):
  61 +        config["default"] = name
  62 +
  63 +    _save_config(config)
  64 +
  65 +    # Create the account's Chrome profile directory
  66 +    profile_dir = _get_profile_dir(name)
  67 +    os.makedirs(profile_dir, exist_ok=True)
  68 +
  69 +    logger.info("添加账号: %s", name)
  70 +
  71 +
  72 +def remove_account(name: str) -> None:
  73 +    """Remove an account."""
  74 +    config = _load_config()
  75 +    accounts = config.get("accounts", {})
  76 +    if name not in accounts:
  77 +        raise ValueError(f"账号 '{name}' 不存在")
  78 +
  79 +    del accounts[name]
  80 +
  81 +    # If the default was removed, promote the first remaining account (or clear it)
  82 +    if config.get("default") == name:
  83 +        config["default"] = next(iter(accounts), "")
  84 +
  85 +    _save_config(config)
  86 +    logger.info("删除账号: %s", name)
  87 +
  88 +
  89 +def set_default_account(name: str) -> None:
  90 +    """Set the default account."""
  91 +    config = _load_config()
  92 +    accounts = config.get("accounts", {})
  93 +    if name not in accounts:
  94 +        raise ValueError(f"账号 '{name}' 不存在")
  95 +
  96 +    config["default"] = name
  97 +    _save_config(config)
  98 +    logger.info("默认账号设置为: %s", name)
  99 +
  100 +
  101 +def get_default_account() -> str:
  102 +    """Return the name of the default account."""
  103 +    config = _load_config()
  104 +    return config.get("default", "")
  105 +
  106 +
  107 +def _get_profile_dir(account: str) -> str:
  108 +    """Return the Chrome profile directory of an account."""
  109 +    return str(_CONFIG_DIR / "accounts" / account / "chrome-profile")
  1 +"""Cross-platform Chrome process management (cf. process handling in Go browser/browser.go)."""
  2 +
  3 +from __future__ import annotations
  4 +
  5 +import logging
  6 +import os
  7 +import platform
  8 +import shutil
  9 +import signal
  10 +import subprocess
  11 +import time
  12 +
  13 +from xhs.stealth import STEALTH_ARGS
  14 +
  15 +logger = logging.getLogger(__name__)
  16 +
  17 +# Default remote debugging port
  18 +DEFAULT_PORT = 9222
  19 +
  20 +# Default Chrome locations per platform
  21 +_CHROME_PATHS: dict[str, list[str]] = {
  22 +    "Darwin": [
  23 +        "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome",
  24 +        "/Applications/Chromium.app/Contents/MacOS/Chromium",
  25 +    ],
  26 +    "Linux": [
  27 +        "/usr/bin/google-chrome",
  28 +        "/usr/bin/google-chrome-stable",
  29 +        "/usr/bin/chromium",
  30 +        "/usr/bin/chromium-browser",
  31 +        "/snap/bin/chromium",
  32 +    ],
  33 +    "Windows": [
  34 +        r"C:\Program Files\Google\Chrome\Application\chrome.exe",
  35 +        r"C:\Program Files (x86)\Google\Chrome\Application\chrome.exe",
  36 +    ],
  37 +}
  38 +
  39 +
  40 +def find_chrome() -> str | None:
  41 +    """Locate the Chrome executable."""
  42 +    # The CHROME_BIN environment variable takes precedence
  43 +    env_path = os.getenv("CHROME_BIN")
  44 +    if env_path and os.path.isfile(env_path):
  45 +        return env_path
  46 +
  47 +    # Look it up on PATH (which/where)
  48 +    chrome = shutil.which("google-chrome") or shutil.which("chromium")
  49 +    if chrome:
  50 +        return chrome
  51 +
  52 +    # Fall back to the platform's default install locations
  53 +    system = platform.system()
  54 +    for path in _CHROME_PATHS.get(system, []):
  55 +        if os.path.isfile(path):
  56 +            return path
  57 +
  58 +    return None
  59 +
  60 +
  61 +def launch_chrome(
  62 +    port: int = DEFAULT_PORT,
  63 +    headless: bool = False,
  64 +    user_data_dir: str | None = None,
  65 +    chrome_bin: str | None = None,
  66 +) -> subprocess.Popen:
  67 +    """Start a Chrome process with a remote debugging port.
  68 +
  69 +    Args:
  70 +        port: Remote debugging port.
  71 +        headless: Whether to run in headless mode.
  72 +        user_data_dir: User data directory (per-account profile isolation).
  73 +        chrome_bin: Path to the Chrome executable.
  74 +
  75 +    Returns:
  76 +        The Chrome child process.
  77 +
  78 +    Raises:
  79 +        FileNotFoundError: Chrome could not be found.
  80 +    """
  81 +    if not chrome_bin:
  82 +        chrome_bin = find_chrome()
  83 +    if not chrome_bin:
  84 +        raise FileNotFoundError("未找到 Chrome,请设置 CHROME_BIN 环境变量或安装 Chrome")
  85 +
  86 +    args = [
  87 +        chrome_bin,
  88 +        f"--remote-debugging-port={port}",
  89 +        *STEALTH_ARGS,
  90 +    ]
  91 +
  92 +    if headless:
  93 +        args.append("--headless=new")
  94 +
  95 +    if user_data_dir:
  96 +        args.append(f"--user-data-dir={user_data_dir}")
  97 +
  98 +    # Optional proxy
  99 +    proxy = os.getenv("XHS_PROXY")
  100 +    if proxy:
  101 +        args.append(f"--proxy-server={proxy}")
  102 +        logger.info("使用代理: %s", _mask_proxy(proxy))
  103 +
  104 +    logger.info("启动 Chrome: port=%d, headless=%s", port, headless)
  105 +    process = subprocess.Popen(
  106 +        args,
  107 +        stdout=subprocess.DEVNULL,
  108 +        stderr=subprocess.DEVNULL,
  109 +    )
  110 +
  111 +    # Wait for the debugging endpoint (only logs a warning on timeout)
  112 +    _wait_for_chrome(port)
  113 +    return process
  114 +
  115 +
  116 +def close_chrome(process: subprocess.Popen) -> None:
  117 +    """Terminate the Chrome process, killing it if it does not exit in time."""
  118 +    if process.poll() is not None:
  119 +        return
  120 +
  121 +    try:
  122 +        process.send_signal(signal.SIGTERM)
  123 +        process.wait(timeout=5)
  124 +    except (subprocess.TimeoutExpired, OSError):
  125 +        process.kill()
  126 +        process.wait(timeout=3)
  127 +
  128 +    logger.info("Chrome 进程已关闭")
  129 +
  130 +
  131 +def is_chrome_running(port: int = DEFAULT_PORT) -> bool:
  132 +    """Check whether Chrome is listening on the given debugging port."""
  133 +    import requests
  134 +
  135 +    try:
  136 +        resp = requests.get(f"http://127.0.0.1:{port}/json/version", timeout=2)
  137 +        return resp.status_code == 200
  138 +    except (requests.ConnectionError, requests.Timeout):
  139 +        return False
  140 +
  141 +
  142 +def _wait_for_chrome(port: int, timeout: float = 15.0) -> None:
  143 +    """Wait for the Chrome debugging port to become ready."""
  144 +    deadline = time.monotonic() + timeout
  145 +    while time.monotonic() < deadline:
  146 +        if is_chrome_running(port):
  147 +            logger.info("Chrome 已就绪 (port=%d)", port)
  148 +            return
  149 +        time.sleep(0.5)
  150 +    logger.warning("等待 Chrome 就绪超时 (port=%d)", port)
  151 +
  152 +
  153 +def _mask_proxy(proxy_url: str) -> str:
  154 +    """Hide credentials embedded in a proxy URL before it is logged."""
  155 +    from urllib.parse import urlparse
  156 +
  157 +    try:
  158 +        parsed = urlparse(proxy_url)
  159 +    except ValueError:
  160 +        return proxy_url
  161 +    if "@" not in parsed.netloc:
  162 +        return proxy_url
  163 +    # Rebuild netloc instead of str.replace: replacing an empty password ("") would
  164 +    # insert "***" between every character, and a username may also occur in the host.
  165 +    userinfo, _, hostport = parsed.netloc.rpartition("@")
  166 +    masked = "***:***" if ":" in userinfo else "***"
  167 +    return parsed._replace(netloc=f"{masked}@{hostport}").geturl()
  1 +"""统一 CLI 入口,对应 Go MCP 工具的 13 个子命令。
  2 +
  3 +全局选项: --host, --port, --account
  4 +输出: JSON(ensure_ascii=False)
  5 +退出码: 0=成功, 1=未登录, 2=错误
  6 +"""
  7 +
  8 +from __future__ import annotations
  9 +
  10 +import argparse
  11 +import json
  12 +import logging
  13 +import sys
  14 +
  15 +logging.basicConfig(
  16 + level=logging.INFO,
  17 + format="%(asctime)s %(levelname)s %(name)s: %(message)s",
  18 +)
  19 +logger = logging.getLogger("xhs-cli")
  20 +
  21 +
  22 +def _output(data: dict, exit_code: int = 0) -> None:
  23 + """输出 JSON 并退出。"""
  24 + print(json.dumps(data, ensure_ascii=False, indent=2))
  25 + sys.exit(exit_code)
  26 +
  27 +
  28 +def _connect(args: argparse.Namespace):
  29 + """连接到 Chrome 并返回 (browser, page)。"""
  30 + from xhs.cdp import Browser
  31 +
  32 + browser = Browser(host=args.host, port=args.port)
  33 + browser.connect()
  34 + page = browser.new_page()
  35 + return browser, page
  36 +
  37 +
  38 +# ========== 子命令实现 ==========
  39 +
  40 +
  41 +def cmd_check_login(args: argparse.Namespace) -> None:
  42 + """检查登录状态。"""
  43 + from xhs.login import check_login_status
  44 +
  45 + browser, page = _connect(args)
  46 + try:
  47 + logged_in = check_login_status(page)
  48 + _output({"logged_in": logged_in}, exit_code=0 if logged_in else 1)
  49 + finally:
  50 + browser.close_page(page)
  51 + browser.close()
  52 +
  53 +
  54 +def cmd_login(args: argparse.Namespace) -> None:
  55 + """获取登录二维码并等待扫码。"""
  56 + from xhs.login import fetch_qrcode, save_qrcode_to_file, wait_for_login
  57 +
  58 + browser, page = _connect(args)
  59 + try:
  60 + src, already = fetch_qrcode(page)
  61 + if already:
  62 + _output({"logged_in": True, "message": "已登录"})
  63 + else:
  64 + # 保存二维码到临时文件
  65 + qrcode_path = save_qrcode_to_file(src)
  66 + print(
  67 + json.dumps(
  68 + {
  69 + "qrcode_path": qrcode_path,
  70 + "message": "请扫码登录,二维码已保存到文件",
  71 + },
  72 + ensure_ascii=False,
  73 + )
  74 + )
  75 + success = wait_for_login(page, timeout=120)
  76 + _output(
  77 + {"logged_in": success, "message": "登录成功" if success else "登录超时"},
  78 + exit_code=0 if success else 2,
  79 + )
  80 + finally:
  81 + browser.close_page(page)
  82 + browser.close()
  83 +
  84 +
  85 +def cmd_delete_cookies(args: argparse.Namespace) -> None:
  86 + """删除 cookies。"""
  87 + from xhs.cookies import delete_cookies, get_cookies_file_path
  88 +
  89 + path = get_cookies_file_path(args.account)
  90 + delete_cookies(path)
  91 + _output({"success": True, "message": f"已删除 cookies: {path}"})
  92 +
  93 +
  94 +def cmd_list_feeds(args: argparse.Namespace) -> None:
  95 + """获取首页 Feed 列表。"""
  96 + from xhs.feeds import list_feeds
  97 +
  98 + browser, page = _connect(args)
  99 + try:
  100 + feeds = list_feeds(page)
  101 + _output({"feeds": [f.to_dict() for f in feeds], "count": len(feeds)})
  102 + finally:
  103 + browser.close_page(page)
  104 + browser.close()
  105 +
  106 +
  107 +def cmd_search_feeds(args: argparse.Namespace) -> None:
  108 + """搜索 Feeds。"""
  109 + from xhs.search import search_feeds
  110 + from xhs.types import FilterOption
  111 +
  112 + filter_opt = FilterOption(
  113 + sort_by=args.sort_by or "",
  114 + note_type=args.note_type or "",
  115 + publish_time=args.publish_time or "",
  116 + search_scope=args.search_scope or "",
  117 + location=args.location or "",
  118 + )
  119 +
  120 + browser, page = _connect(args)
  121 + try:
  122 + feeds = search_feeds(page, args.keyword, filter_opt)
  123 + _output({"feeds": [f.to_dict() for f in feeds], "count": len(feeds)})
  124 + finally:
  125 + browser.close_page(page)
  126 + browser.close()
  127 +
  128 +
  129 +def cmd_get_feed_detail(args: argparse.Namespace) -> None:
  130 + """获取 Feed 详情。"""
  131 + from xhs.feed_detail import get_feed_detail
  132 + from xhs.types import CommentLoadConfig
  133 +
  134 + config = CommentLoadConfig(
  135 + click_more_replies=args.click_more_replies,
  136 + max_replies_threshold=args.max_replies_threshold,
  137 + max_comment_items=args.max_comment_items,
  138 + scroll_speed=args.scroll_speed,
  139 + )
  140 +
  141 + browser, page = _connect(args)
  142 + try:
  143 + detail = get_feed_detail(
  144 + page,
  145 + args.feed_id,
  146 + args.xsec_token,
  147 + load_all_comments=args.load_all_comments,
  148 + config=config,
  149 + )
  150 + _output(detail.to_dict())
  151 + finally:
  152 + browser.close_page(page)
  153 + browser.close()
  154 +
  155 +
  156 +def cmd_user_profile(args: argparse.Namespace) -> None:
  157 + """Fetch a user's profile page."""
  158 + from xhs.user_profile import get_user_profile
  159 +
  160 + browser, page = _connect(args)
  161 + try:
  162 + profile = get_user_profile(page, args.user_id, args.xsec_token)
  163 + _output(profile.to_dict())
  164 + finally:
  165 + browser.close_page(page)
  166 + browser.close()
  167 +
  168 +
  169 +def cmd_post_comment(args: argparse.Namespace) -> None:
  170 + """Post a comment on a feed."""
  171 + from xhs.comment import post_comment
  172 +
  173 + browser, page = _connect(args)
  174 + try:
  175 + post_comment(page, args.feed_id, args.xsec_token, args.content)
  176 + _output({"success": True, "message": "评论发送成功"})
  177 + finally:
  178 + browser.close_page(page)
  179 + browser.close()
  180 +
  181 +
  182 +def cmd_reply_comment(args: argparse.Namespace) -> None:
  183 + """Reply to a comment on a feed."""
  184 + from xhs.comment import reply_comment
  185 +
  186 + browser, page = _connect(args)
  187 + try:
  188 + reply_comment(
  189 + page,
  190 + args.feed_id,
  191 + args.xsec_token,
  192 + args.content,
  193 + comment_id=args.comment_id or "",
  194 + user_id=args.user_id or "",
  195 + )
  196 + _output({"success": True, "message": "回复成功"})
  197 + finally:
  198 + browser.close_page(page)
  199 + browser.close()
  200 +
  201 +
  202 +def cmd_like_feed(args: argparse.Namespace) -> None:
  203 + """Like a feed, or unlike it with --unlike."""
  204 + from xhs.like_favorite import like_feed, unlike_feed
  205 +
  206 + browser, page = _connect(args)
  207 + try:
  208 + if args.unlike:
  209 + result = unlike_feed(page, args.feed_id, args.xsec_token)
  210 + else:
  211 + result = like_feed(page, args.feed_id, args.xsec_token)
  212 + _output(result.to_dict())
  213 + finally:
  214 + browser.close_page(page)
  215 + browser.close()
  216 +
  217 +
  218 +def cmd_favorite_feed(args: argparse.Namespace) -> None:
  219 + """Favorite a feed, or unfavorite it with --unfavorite."""
  220 + from xhs.like_favorite import favorite_feed, unfavorite_feed
  221 +
  222 + browser, page = _connect(args)
  223 + try:
  224 + if args.unfavorite:
  225 + result = unfavorite_feed(page, args.feed_id, args.xsec_token)
  226 + else:
  227 + result = favorite_feed(page, args.feed_id, args.xsec_token)
  228 + _output(result.to_dict())
  229 + finally:
  230 + browser.close_page(page)
  231 + browser.close()
  232 +
  233 +
  234 +def cmd_publish(args: argparse.Namespace) -> None:
  235 + """Publish an image post."""
  236 + from image_downloader import process_images
  237 + from xhs.publish import publish_image_content
  238 + from xhs.types import PublishImageContent
  239 +
  240 + # Read the title and body text
  241 + with open(args.title_file, encoding="utf-8") as f:
  242 + title = f.read().strip()
  243 + with open(args.content_file, encoding="utf-8") as f:
  244 + content = f.read().strip()
  245 +
  246 + # Resolve images (download URLs, keep existing local paths)
  247 + image_paths = process_images(args.images) if args.images else []
  248 + if not image_paths:
  249 + _output({"success": False, "error": "没有有效的图片"}, exit_code=2)
  250 +
  251 + browser, page = _connect(args)
  252 + try:
  253 + publish_image_content(
  254 + page,
  255 + PublishImageContent(
  256 + title=title,
  257 + content=content,
  258 + tags=args.tags or [],
  259 + image_paths=image_paths,
  260 + schedule_time=args.schedule_at,
  261 + is_original=args.original,
  262 + visibility=args.visibility or "",
  263 + ),
  264 + )
  265 + _output({"success": True, "title": title, "images": len(image_paths), "status": "发布完成"})
  266 + finally:
  267 + browser.close_page(page)
  268 + browser.close()
  269 +
  270 +
  271 +def cmd_publish_video(args: argparse.Namespace) -> None:
  272 + """Publish a video post."""
  273 + from xhs.publish_video import publish_video_content
  274 + from xhs.types import PublishVideoContent
  275 +
  276 + with open(args.title_file, encoding="utf-8") as f:
  277 + title = f.read().strip()
  278 + with open(args.content_file, encoding="utf-8") as f:
  279 + content = f.read().strip()
  280 +
  281 + browser, page = _connect(args)
  282 + try:
  283 + publish_video_content(
  284 + page,
  285 + PublishVideoContent(
  286 + title=title,
  287 + content=content,
  288 + tags=args.tags or [],
  289 + video_path=args.video,
  290 + schedule_time=args.schedule_at,
  291 + visibility=args.visibility or "",
  292 + ),
  293 + )
  294 + _output({"success": True, "title": title, "video": args.video, "status": "发布完成"})
  295 + finally:
  296 + browser.close_page(page)
  297 + browser.close()
  298 +
  299 +
  300 +# ========== Argument parsing ==========
  301 +
  302 +
  303 +def build_parser() -> argparse.ArgumentParser:
  304 + """Build the CLI argument parser."""
  305 + parser = argparse.ArgumentParser(
  306 + prog="xhs-cli",
  307 + description="Xiaohongshu automation CLI",
  308 + )
  309 +
  310 + # Global options
  311 + parser.add_argument("--host", default="127.0.0.1", help="Chrome remote-debugging host (default: 127.0.0.1)")
  312 + parser.add_argument("--port", type=int, default=9222, help="Chrome remote-debugging port (default: 9222)")
  313 + parser.add_argument("--account", default="", help="Account name")
  314 +
  315 + subparsers = parser.add_subparsers(dest="command", required=True)
  316 +
  317 + # check-login
  318 + sub = subparsers.add_parser("check-login", help="Check login status")
  319 + sub.set_defaults(func=cmd_check_login)
  320 +
  321 + # login
  322 + sub = subparsers.add_parser("login", help="Log in (scan the QR code)")
  323 + sub.set_defaults(func=cmd_login)
  324 +
  325 + # delete-cookies
  326 + sub = subparsers.add_parser("delete-cookies", help="Delete saved cookies")
  327 + sub.set_defaults(func=cmd_delete_cookies)
  328 +
  329 + # list-feeds
  330 + sub = subparsers.add_parser("list-feeds", help="List home page feeds")
  331 + sub.set_defaults(func=cmd_list_feeds)
  332 +
  333 + # search-feeds
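  334 + sub = subparsers.add_parser("search-feeds", help="Search feeds")
  335 + sub.add_argument("--keyword", required=True, help="Search keyword")
  336 + sub.add_argument("--sort-by", help="Sort order: 综合 (general)|最新 (latest)|最多点赞 (most liked)|最多评论 (most commented)|最多收藏 (most favorited)")
  337 + sub.add_argument("--note-type", help="Note type: 不限 (any)|视频 (video)|图文 (image)")
  338 + sub.add_argument("--publish-time", help="Publish time: 不限 (any)|一天内 (past day)|一周内 (past week)|半年内 (past six months)")
  339 + sub.add_argument("--search-scope", help="Search scope: 不限 (any)|已看过 (viewed)|未看过 (not viewed)|已关注 (followed)")
  340 + sub.add_argument("--location", help="Location: 不限 (any)|同城 (same city)|附近 (nearby)")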
  341 + sub.set_defaults(func=cmd_search_feeds)
  342 +
  343 + # get-feed-detail
  344 + sub = subparsers.add_parser("get-feed-detail", help="Get feed details")
  345 + sub.add_argument("--feed-id", required=True, help="Feed ID")
  346 + sub.add_argument("--xsec-token", required=True, help="xsec_token")
  347 + sub.add_argument("--load-all-comments", action="store_true", help="Load all comments")
  348 + sub.add_argument("--click-more-replies", action="store_true", help="Click to expand more replies")
  349 + sub.add_argument("--max-replies-threshold", type=int, default=10, help="Reply-count threshold for expanding replies")
  350 + sub.add_argument("--max-comment-items", type=int, default=0, help="Maximum number of comments (0 = unlimited)")
  351 + sub.add_argument("--scroll-speed", default="normal", help="Scroll speed: slow|normal|fast")
  352 + sub.set_defaults(func=cmd_get_feed_detail)
  353 +
  354 + # user-profile
  355 + sub = subparsers.add_parser("user-profile", help="Get a user's profile")
  356 + sub.add_argument("--user-id", required=True, help="User ID")
  357 + sub.add_argument("--xsec-token", required=True, help="xsec_token")
  358 + sub.set_defaults(func=cmd_user_profile)
  359 +
  360 + # post-comment
  361 + sub = subparsers.add_parser("post-comment", help="Post a comment")
  362 + sub.add_argument("--feed-id", required=True, help="Feed ID")
  363 + sub.add_argument("--xsec-token", required=True, help="xsec_token")
  364 + sub.add_argument("--content", required=True, help="Comment text")
  365 + sub.set_defaults(func=cmd_post_comment)
  366 +
  367 + # reply-comment
  368 + sub = subparsers.add_parser("reply-comment", help="Reply to a comment")
  369 + sub.add_argument("--feed-id", required=True, help="Feed ID")
  370 + sub.add_argument("--xsec-token", required=True, help="xsec_token")
  371 + sub.add_argument("--content", required=True, help="Reply text")
  372 + sub.add_argument("--comment-id", help="ID of the comment to reply to")
  373 + sub.add_argument("--user-id", help="ID of the user to reply to")
  374 + sub.set_defaults(func=cmd_reply_comment)
  375 +
  376 + # like-feed
  377 + sub = subparsers.add_parser("like-feed", help="Like a feed")
  378 + sub.add_argument("--feed-id", required=True, help="Feed ID")
  379 + sub.add_argument("--xsec-token", required=True, help="xsec_token")
  380 + sub.add_argument("--unlike", action="store_true", help="Remove the like instead")
  381 + sub.set_defaults(func=cmd_like_feed)
  382 +
  383 + # favorite-feed
  384 + sub = subparsers.add_parser("favorite-feed", help="Favorite a feed")
  385 + sub.add_argument("--feed-id", required=True, help="Feed ID")
  386 + sub.add_argument("--xsec-token", required=True, help="xsec_token")
  387 + sub.add_argument("--unfavorite", action="store_true", help="Remove from favorites instead")
  388 + sub.set_defaults(func=cmd_favorite_feed)
  389 +
  390 + # publish
  391 + sub = subparsers.add_parser("publish", help="Publish an image post")
  392 + sub.add_argument("--title-file", required=True, help="Path to the title file")
  393 + sub.add_argument("--content-file", required=True, help="Path to the body text file")
  394 + sub.add_argument("--images", nargs="+", required=True, help="Image paths or URLs")
  395 + sub.add_argument("--tags", nargs="*", help="Tags")
  396 + sub.add_argument("--schedule-at", help="Scheduled publish time (ISO 8601)")
  397 + sub.add_argument("--original", action="store_true", help="Declare the post as original content")
  398 + sub.add_argument("--visibility", help="Visibility scope")
  399 + sub.set_defaults(func=cmd_publish)
  400 +
  401 + # publish-video
  402 + sub = subparsers.add_parser("publish-video", help="Publish a video post")
  403 + sub.add_argument("--title-file", required=True, help="Path to the title file")
  404 + sub.add_argument("--content-file", required=True, help="Path to the body text file")
  405 + sub.add_argument("--video", required=True, help="Path to the video file")
  406 + sub.add_argument("--tags", nargs="*", help="Tags")
  407 + sub.add_argument("--schedule-at", help="Scheduled publish time (ISO 8601)")
  408 + sub.add_argument("--visibility", help="Visibility scope")
  409 + sub.set_defaults(func=cmd_publish_video)
  410 +
  411 + return parser
  412 +
  413 +
  414 +def main() -> None:
  415 + """CLI entry point: parse arguments and dispatch to the subcommand handler."""
  416 + parser = build_parser()
  417 + args = parser.parse_args()
  418 +
  419 + try:
  420 + args.func(args)
  421 + except Exception as e:
  422 + logger.error("执行失败: %s", e, exc_info=True)
  423 + _output({"success": False, "error": str(e)}, exit_code=2)
  424 +
  425 +
  426 +if __name__ == "__main__":
  427 + main()
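Every `cmd_*` handler above repeats the same `_connect(args)` / `try` / `finally` cleanup. A minimal sketch of factoring that into a context manager, assuming only what the handlers already rely on: a `connect` callable returning a `(browser, page)` pair whose browser has `close_page(page)` and `close()`. `page_session`, `BrowserLike`, and `FakeBrowser` are hypothetical names for illustration, not part of this commit.

```python
from __future__ import annotations

from collections.abc import Callable, Iterator
from contextlib import contextmanager
from typing import Any, Protocol


class BrowserLike(Protocol):
    """The two cleanup methods the cmd_* handlers call on the browser."""

    def close_page(self, page: Any) -> None: ...

    def close(self) -> None: ...


@contextmanager
def page_session(connect: Callable[[], tuple[BrowserLike, Any]]) -> Iterator[Any]:
    """Open (browser, page) via `connect`, yield the page, and always clean up."""
    browser, page = connect()
    try:
        yield page
    finally:
        try:
            browser.close_page(page)
        finally:
            # Close the browser connection even if close_page() itself raises
            browser.close()


class FakeBrowser:
    """Stand-in browser that records cleanup calls (demonstration only)."""

    def __init__(self) -> None:
        self.events: list[tuple[Any, ...]] = []

    def close_page(self, page: Any) -> None:
        self.events.append(("close_page", page))

    def close(self) -> None:
        self.events.append(("close",))
```

In a handler this would read `with page_session(lambda: _connect(args)) as page: ...`. The nested `finally` is a small deviation from the handlers above, which skip `browser.close()` if `close_page()` raises.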
  1 +"""Media downloader with SHA256-keyed caching; corresponds to Go pkg/downloader/images.go."""
  2 +
  3 +from __future__ import annotations
  4 +
  5 +import hashlib
  6 +import logging
  7 +import os
  8 +import time
  9 +from urllib.parse import urlparse
  10 +
  11 +import requests
  12 +
  13 +logger = logging.getLogger(__name__)
  14 +
  15 +_USER_AGENT = (
  16 + "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
  17 + "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
  18 +)
  19 +
  20 +# Known image file extensions
  21 +_IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".webp", ".bmp", ".svg"}
  22 +
  23 +
  24 +def is_image_url(path: str) -> bool:
  25 + """Return True if the string is an http(s) URL, which is treated as a remote image/media URL."""
  26 + return path.lower().startswith(("http://", "https://"))
  27 +
  28 +
  29 +class ImageDownloader:
  30 + """Image downloader with SHA256-keyed caching."""
  31 +
  32 + def __init__(self, save_path: str) -> None:
  33 + self.save_path = save_path
  34 + os.makedirs(save_path, exist_ok=True)
  35 + self._session = requests.Session()
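  36 + self._timeout = 30  # seconds; requests.Session has no default-timeout setting, so it is passed per request
  37 +
  38 + def download_image(self, image_url: str) -> str:
  39 + """Download a single image and return its local file path.
  40 +
  41 + If a file for this URL already exists (matched by the URL's SHA256 hash), its path is returned without downloading.
  42 +
  43 + Raises:
  44 + ValueError: the URL is not an http(s) URL.
  45 + RuntimeError: non-200 response (network errors and timeouts propagate as requests exceptions).
  46 + """
  47 + if not is_image_url(image_url):
  48 + raise ValueError(f"无效的图片 URL: {image_url}")
  49 +
  50 + # Build the file name: img_<first 16 hex chars of SHA256(url)>_<unix time><ext>
  51 + url_hash = hashlib.sha256(image_url.encode()).hexdigest()[:16]
  52 + ext = self._detect_extension(image_url)
  53 + filename = f"img_{url_hash}_{int(time.time())}{ext}"
  54 + filepath = os.path.join(self.save_path, filename)
  55 +
  56 + # Reuse an existing file with the same hash prefix (cache hit)
  57 + existing = self._find_existing(url_hash)
  58 + if existing:
  59 + return existing
  60 +
  61 + # Download with a browser User-Agent and the URL's own scheme and host as Referer
  62 + parsed = urlparse(image_url)
  63 + headers = {
  64 + "User-Agent": _USER_AGENT,
  65 + "Referer": f"{parsed.scheme}://{parsed.hostname}/",
  66 + }
  67 +
  68 + resp = self._session.get(image_url, headers=headers, timeout=self._timeout)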
  69 + if resp.status_code != 200:
  70 + raise RuntimeError(f"下载失败 (status={resp.status_code}): {image_url}")
  71 +
  72 + # Save to disk
  73 + with open(filepath, "wb") as f:
  74 + f.write(resp.content)
  75 +
  76 + logger.info("下载完成: %s -> %s", image_url, filepath)
  77 + return filepath
  78 +
  79 + def download_images(self, image_urls: list[str]) -> list[str]:
  80 + """Download several images; failures are logged and skipped."""
  81 + paths = []
  82 + for url in image_urls:
  83 + try:
  84 + path = self.download_image(url)
  85 + paths.append(path)
  86 + except Exception as e:
  87 + logger.error("下载失败 %s: %s", url, e)
  88 + return paths
  89 +
  90 + def _detect_extension(self, url: str) -> str:
  91 + """Infer the file extension from the URL path, defaulting to .jpg."""
  92 + parsed = urlparse(url)
  93 + path = parsed.path.lower()
  94 + for ext in _IMAGE_EXTENSIONS:
  95 + if path.endswith(ext):
  96 + return ext
  97 + return ".jpg"  # default
  98 +
  99 + def _find_existing(self, url_hash: str) -> str | None:
  100 + """Return the path of a previously downloaded file with this hash, if any."""
  101 + prefix = f"img_{url_hash}_"
  102 + for filename in os.listdir(self.save_path):
  103 + if filename.startswith(prefix):
  104 + return os.path.join(self.save_path, filename)
  105 + return None
  106 +
  107 +
  108 +def process_images(images: list[str], save_dir: str | None = None) -> list[str]:
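  109 + """Resolve images: download URLs (errors propagate) and return absolute paths of existing local files; missing local files are skipped with a warning."""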
  110 + if not save_dir:
  111 + save_dir = os.path.join(os.path.expanduser("~"), ".xhs", "images")
  112 +
  113 + downloader = ImageDownloader(save_dir)
  114 + result = []
  115 +
  116 + for img in images:
  117 + if is_image_url(img):
  118 + path = downloader.download_image(img)
  119 + result.append(path)
  120 + else:
  121 + # Local path
  122 + if os.path.exists(img):
  123 + result.append(os.path.abspath(img))
  124 + else:
  125 + logger.warning("文件不存在: %s", img)
  126 +
  127 + return result
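The cache is keyed only by the URL: `download_image` names files `img_<first 16 hex digits of SHA256(url)>_<unix time><ext>`, and because the timestamp varies between downloads, `_find_existing` can only match on the `img_<hash>_` prefix. A standalone sketch of that naming and lookup scheme, mirroring `download_image`, `_detect_extension`, and `_find_existing` without the network part; the function names are hypothetical.

```python
from __future__ import annotations

import hashlib
import os
from urllib.parse import urlparse

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".webp", ".bmp", ".svg")


def cache_prefix(url: str) -> str:
    """`img_<first 16 hex chars of SHA256(url)>_`, the part every cached name starts with."""
    return f"img_{hashlib.sha256(url.encode()).hexdigest()[:16]}_"


def detect_extension(url: str) -> str:
    """Extension from the URL path (query string ignored), defaulting to .jpg."""
    path = urlparse(url).path.lower()
    return next((ext for ext in IMAGE_EXTENSIONS if path.endswith(ext)), ".jpg")


def cached_name(url: str, now: int) -> str:
    """File name a fresh download of `url` at unix time `now` would get."""
    return f"{cache_prefix(url)}{now}{detect_extension(url)}"


def find_cached(save_dir: str, url: str) -> str | None:
    """Path of any earlier download of `url` in `save_dir`, else None."""
    prefix = cache_prefix(url)
    for name in os.listdir(save_dir):
        if name.startswith(prefix):
            return os.path.join(save_dir, name)
    return None
```

One consequence of keying on the URL alone: an image replaced at the same URL keeps being served from the cache until the cached file is deleted.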
  1 +"""Publish orchestrator: download → login check → publish → report."""
  2 +
  3 +from __future__ import annotations
  4 +
  5 +import json
  6 +import logging
  7 +import sys
  8 +
  9 +from image_downloader import process_images
  10 +from title_utils import calc_title_length
  11 +from xhs.cdp import Browser
  12 +from xhs.login import check_login_status
  13 +from xhs.publish import publish_image_content
  14 +from xhs.publish_video import publish_video_content
  15 +from xhs.types import PublishImageContent, PublishVideoContent
  16 +
  17 +logger = logging.getLogger(__name__)
  18 +
  19 +
  20 +def run_publish_pipeline(
  21 + title: str,
  22 + content: str,
  23 + images: list[str] | None = None,
  24 + video: str | None = None,
  25 + tags: list[str] | None = None,
  26 + schedule_time: str | None = None,
  27 + is_original: bool = False,
  28 + visibility: str = "",
  29 + host: str = "127.0.0.1",
  30 + port: int = 9222,
  31 + account: str = "",
  32 +) -> dict:
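  33 + """Run the full publish pipeline: validate the title, resolve media, check login, then publish.
  34 +
  35 + Returns:
  36 + A result dict. Failures have "success": False and an "error" message; "exit_code" is 1 when not logged in.
  37 + """
  38 + # Validate the title length (limit: 20, as computed by calc_title_length)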
  39 + title_len = calc_title_length(title)
  40 + if title_len > 20:
  41 + return {"success": False, "error": f"标题长度超限: {title_len}/20"}
  42 +
  43 + # Resolve images (download URLs / verify local paths); a video post does not need them
  44 + local_images: list[str] = []
  45 + if images:
  46 + local_images = process_images(images)
  47 + if not video and not local_images:
  48 + return {"success": False, "error": "没有有效的图片"}
  49 +
  50 + # Connect to the browser
  51 + browser = Browser(host=host, port=port)
  52 + browser.connect()
  53 +
  54 + try:
  55 + page = browser.new_page()
  56 + try:
  57 + # Check login; exit code 1 signals "not logged in"
  58 + if not check_login_status(page):
  59 + return {"success": False, "error": "未登录", "exit_code": 1}
  60 +
  61 + # Publish (a video takes precedence if both video and images are given)
  62 + if video:
  63 + publish_video_content(
  64 + page,
  65 + PublishVideoContent(
  66 + title=title,
  67 + content=content,
  68 + tags=tags or [],
  69 + video_path=video,
  70 + schedule_time=schedule_time,
  71 + visibility=visibility,
  72 + ),
  73 + )
  74 + else:
  75 + publish_image_content(
  76 + page,
  77 + PublishImageContent(
  78 + title=title,
  79 + content=content,
  80 + tags=tags or [],
  81 + image_paths=local_images,
  82 + schedule_time=schedule_time,
  83 + is_original=is_original,
  84 + visibility=visibility,
  85 + ),
  86 + )
  87 +
  88 + return {
  89 + "success": True,
  90 + "title": title,
  91 + "content_length": len(content),
  92 + "images": len(local_images),
  93 + "video": video or "",
  94 + "status": "发布完成",
  95 + }
  96 +
  97 + finally:
  98 + browser.close_page(page)
  99 + finally:
  100 + browser.close()
  101 +
  102 +
  103 +def main() -> None:
  104 + """CLI entry point (used when invoked for cli.py's publish/publish-video subcommands)."""
  105 + import argparse
  106 +
  107 + parser = argparse.ArgumentParser(description="Xiaohongshu publish pipeline")
  108 + parser.add_argument("--title-file", required=True, help="Path to the title file")
  109 + parser.add_argument("--content-file", required=True, help="Path to the body text file")
  110 + parser.add_argument("--images", nargs="*", help="Image paths or URLs")
  111 + parser.add_argument("--video", help="Path to the video file")
  112 + parser.add_argument("--tags", nargs="*", help="Tags")
  113 + parser.add_argument("--schedule-at", help="Scheduled publish time (ISO 8601)")
  114 + parser.add_argument("--original", action="store_true", help="Declare the post as original content")
  115 + parser.add_argument("--visibility", default="", help="Visibility scope")
  116 + parser.add_argument("--host", default="127.0.0.1")
  117 + parser.add_argument("--port", type=int, default=9222)
  118 + parser.add_argument("--account", default="")
  119 + args = parser.parse_args()
  120 +
  121 + # Read the title and body text
  122 + with open(args.title_file, encoding="utf-8") as f:
  123 + title = f.read().strip()
  124 + with open(args.content_file, encoding="utf-8") as f:
  125 + content = f.read().strip()
  126 +
  127 + result = run_publish_pipeline(
  128 + title=title,
  129 + content=content,
  130 + images=args.images,
  131 + video=args.video,
  132 + tags=args.tags,
  133 + schedule_time=args.schedule_at,
  134 + is_original=args.original,
  135 + visibility=args.visibility,
  136 + host=args.host,
  137 + port=args.port,
  138 + account=args.account,
  139 + )
  140 +
  141 + print(json.dumps(result, ensure_ascii=False, indent=2))
  142 + sys.exit(0 if result["success"] else result.get("exit_code", 2))
  143 +
  144 +
  145 +if __name__ == "__main__":
  146 + main()
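`run_publish_pipeline` reports failures as data rather than exceptions: every result carries `success`, the not-logged-in case adds `exit_code: 1`, and other failures carry no code. A minimal sketch of turning such a result into a process exit status under that convention, treating 2 as the generic failure code as `main()` does; `exit_code_for` and `report` are hypothetical helpers.

```python
from __future__ import annotations

import json
from typing import Any


def exit_code_for(result: dict[str, Any]) -> int:
    """0 on success; otherwise the result's own exit_code (1 = not logged in), defaulting to 2."""
    if result.get("success"):
        return 0
    return int(result.get("exit_code", 2))


def report(result: dict[str, Any]) -> int:
    """Print the JSON report for the calling agent and return the exit status to use."""
    # ensure_ascii=False keeps Chinese titles and messages readable in the output
    print(json.dumps(result, ensure_ascii=False, indent=2))
    return exit_code_for(result)
```

A caller could then end with `sys.exit(report(result))`, letting an agent tell "log in first" (1) apart from other failures (2).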
  1 +"""Single-instance lock that prevents multiple processes from driving the browser at once."""
  2 +
  3 +from __future__ import annotations
  4 +
  5 +import contextlib
  6 +import logging
  7 +import os
  8 +import time
  9 +
  10 +logger = logging.getLogger(__name__)
  11 +
  12 +_DEFAULT_LOCK_FILE = os.path.join(os.path.expanduser("~"), ".xhs", "run.lock")
  13 +
  14 +
  15 +class RunLock:
  16 + """File lock ensuring that only one process operates at a time."""
  17 +
  18 + def __init__(self, lock_file: str = _DEFAULT_LOCK_FILE) -> None:
  19 + self.lock_file = lock_file
  20 + self._fd: int | None = None
  21 +
  22 + def acquire(self, timeout: float = 30.0) -> bool:
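  23 + """Acquire the lock, retrying once per second until the timeout.
  24 +
  25 + Args:
  26 + timeout: Maximum time to wait, in seconds.
  27 +
  28 + Returns:
  29 + True if the lock was acquired, False on timeout.
  30 + """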
  31 + os.makedirs(os.path.dirname(self.lock_file), exist_ok=True)
  32 + deadline = time.monotonic() + timeout
  33 +
  34 + while time.monotonic() < deadline:
  35 + try:
  36 + self._fd = os.open(
  37 + self.lock_file,
  38 + os.O_CREAT | os.O_EXCL | os.O_WRONLY,
  39 + )
  40 + # Record our PID so other processes can detect a stale lock
  41 + os.write(self._fd, str(os.getpid()).encode())
  42 + logger.debug("获取锁成功: %s", self.lock_file)
  43 + return True
  44 + except FileExistsError:
  45 + # Check whether the holder is still alive; if not, remove the lock and retry
  46 + if self._is_stale():
  47 + self._force_release()
  48 + continue
  49 + time.sleep(1)
  50 +
  51 + logger.warning("获取锁超时: %s", self.lock_file)
  52 + return False
  53 +
  54 + def release(self) -> None:
  55 + """Release the lock; a no-op if this instance does not hold it."""
  56 + if self._fd is None:
  57 + return  # never delete a lock file that another process holds
  58 + with contextlib.suppress(OSError):
  59 + os.close(self._fd)
  60 + self._fd = None
  61 + with contextlib.suppress(FileNotFoundError):
  62 + os.remove(self.lock_file)
  63 +
  64 + logger.debug("释放锁: %s", self.lock_file)
  65 +
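  66 + def _is_stale(self) -> bool:
  67 + """Return True if the lock file is stale, i.e. its holder process has exited."""
  68 + try:
  69 + with open(self.lock_file) as f:
  70 + pid = int(f.read().strip())
  71 + os.kill(pid, 0)  # POSIX: signal 0 only checks existence (on Windows, 0 is CTRL_C_EVENT)
  72 + return False
  73 + except (FileNotFoundError, ValueError, ProcessLookupError, PermissionError) as e:
  74 + # PermissionError: the process exists but belongs to another user, so the lock is live
  75 + return not isinstance(e, PermissionError)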
  76 +
  77 + def _force_release(self) -> None:
  78 + """Forcibly remove a stale lock file."""
  79 + with contextlib.suppress(FileNotFoundError):
  80 + os.remove(self.lock_file)
  81 + logger.info("强制释放过时锁: %s", self.lock_file)
  82 +
  83 + def __enter__(self) -> RunLock:
  84 + if not self.acquire():
  85 + raise TimeoutError(f"无法获取锁: {self.lock_file}")
  86 + return self
  87 +
  88 + def __exit__(self, *args: object) -> None:
  89 + self.release()
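`RunLock.acquire` depends on `os.O_CREAT | os.O_EXCL`: creating the lock file is atomic, so exactly one process succeeds and every other one gets `FileExistsError`. A stripped-down sketch of just that primitive, with no polling or stale-lock recovery; `try_create_lock` and `release_lock` are hypothetical names.

```python
from __future__ import annotations

import os


def try_create_lock(path: str) -> int | None:
    """Atomically create `path` and write our PID; return the fd, or None if it already exists."""
    try:
        # O_EXCL makes the create fail if the file exists, with no check-then-create race
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return None
    os.write(fd, str(os.getpid()).encode())
    return fd


def release_lock(path: str, fd: int) -> None:
    """Close our descriptor, then delete the lock file so the next process can acquire it."""
    os.close(fd)
    os.remove(path)
```

The class itself is meant to be used as `with RunLock(): ...`, which raises `TimeoutError` if the lock cannot be acquired within the default 30 seconds.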
  1 +"""UTF-16 title length calculation; corresponds to Go pkg/xhsutil/title.go."""
  2 +
  3 +
  4 +def calc_title_length(s: str) -> int:
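  5 + """Compute the length of a Xiaohongshu title.
  6 +
  7 + Rule: every UTF-16 code unit above 127 (CJK, full-width symbols, etc.) counts as 2 and every ASCII
  8 + character as 1; the total is then halved, rounding up. Emoji outside the BMP are two code units (weight 4).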
  9 +
  10 + Examples:
  11 + >>> calc_title_length("你好世界")
  12 + 4
  13 + >>> calc_title_length("hello")
  14 + 3
  15 + >>> calc_title_length("OOTD穿搭分享")
  16 + 6
  17 + """
  18 + byte_len = 0
  19 + # Walk the UTF-16 code units, so a surrogate pair counts as two units
  20 + encoded = s.encode("utf-16-le")
  21 + for i in range(0, len(encoded), 2):
  22 + code_unit = int.from_bytes(encoded[i : i + 2], "little")
  23 + if code_unit > 127:
  24 + byte_len += 2
  25 + else:
  26 + byte_len += 1
  27 + return (byte_len + 1) // 2
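Since the loop weighs UTF-16 code units, the same rule can be restated per character: ASCII weighs 1, any other BMP character 2, and a character outside the BMP (most emoji, which encode as a surrogate pair) 4; the title length is half the total, rounded up. A sketch of that equivalent per-code-point formulation (`title_length_by_codepoint` is a hypothetical name). It also makes the pipeline's limit of 20 easy to reason about: at most 20 CJK characters or 40 ASCII characters.

```python
def title_length_by_codepoint(s: str) -> int:
    """Per-character restatement of calc_title_length's UTF-16 rule."""
    weight = 0
    for ch in s:
        cp = ord(ch)
        if cp < 0x80:
            weight += 1  # ASCII: one UTF-16 unit <= 127
        elif cp <= 0xFFFF:
            weight += 2  # other BMP characters (CJK, full-width): one unit > 127
        else:
            weight += 4  # outside the BMP: a surrogate pair, two units > 127
    return (weight + 1) // 2  # halve, rounding up
```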
  1 +"""Core package for Xiaohongshu CDP automation."""
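`cdp.py` below multiplexes everything over one WebSocket: each command is a JSON object with an integer `id`, a `method`, optional `params`, and, for page-level commands, a `sessionId`; the reply echoes the `id`, while events carry a `method` but no `id`. Because replies are matched by `id` alone, ids need to be unique across every session sharing the connection. A minimal sketch of one shared counter plus reply matching, with no network I/O; `CommandIds` and `match_response` are hypothetical names, not the commit's API.

```python
from __future__ import annotations

import itertools
import json
from typing import Any


class CommandIds:
    """One id counter per WebSocket connection, shared by every session on it."""

    def __init__(self) -> None:
        self._counter = itertools.count(1)

    def build(
        self, method: str, params: dict[str, Any] | None = None, session_id: str | None = None
    ) -> tuple[int, str]:
        """Return (id, JSON text) for a CDP command."""
        msg_id = next(self._counter)
        msg: dict[str, Any] = {"id": msg_id, "method": method}
        if params:
            msg["params"] = params
        if session_id:
            msg["sessionId"] = session_id  # routes the command to an attached page session
        return msg_id, json.dumps(msg)


def match_response(raw: str, msg_id: int) -> dict[str, Any] | None:
    """Return the result for `msg_id`; None for events or other replies; raise on a CDP error."""
    data = json.loads(raw)
    if data.get("id") != msg_id:
        return None
    if "error" in data:
        raise RuntimeError(f"CDP error: {data['error']}")
    return data.get("result", {})
```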
  1 +"""CDP WebSocket client (Browser, Page, Element); corresponds to Go browser/browser.go and the go-rod API.
  2 +
  3 +Communicates with the Chrome DevTools Protocol over a raw WebSocket to automate the browser.
  4 +"""
  5 +
  6 +from __future__ import annotations
  7 +
  8 +import json
  9 +import logging
  10 +import time
  11 +from typing import Any
  12 +
  13 +import requests
  14 +import websockets.sync.client as ws_client
  15 +
  16 +from .errors import CDPError, ElementNotFoundError
  17 +from .stealth import STEALTH_JS
  18 +
  19 +logger = logging.getLogger(__name__)
  20 +
  21 +
  22 +class CDPClient:
  23 + """Low-level CDP WebSocket client."""
  24 +
  25 + def __init__(self, ws_url: str) -> None:
  26 + self._ws = ws_client.connect(ws_url, max_size=50 * 1024 * 1024)
  27 + self._id = 0
  28 + self._callbacks: dict[int, Any] = {}
  29 +
  30 + def send(self, method: str, params: dict | None = None) -> dict:
  31 + """Send a CDP command and block until its result arrives."""
  32 + self._id += 1
  33 + msg: dict[str, Any] = {"id": self._id, "method": method}
  34 + if params:
  35 + msg["params"] = params
  36 + self._ws.send(json.dumps(msg))
  37 + return self._wait_for(self._id)
  38 +
  39 + def _wait_for(self, msg_id: int, timeout: float = 30.0) -> dict:
  40 + """Wait for the response with the given id; events and other messages received meanwhile are discarded."""
  41 + deadline = time.monotonic() + timeout
  42 + while time.monotonic() < deadline:
  43 + try:
  44 + raw = self._ws.recv(timeout=max(0.1, deadline - time.monotonic()))
  45 + except TimeoutError:
  46 + break
  47 + data = json.loads(raw)
  48 + if data.get("id") == msg_id:
  49 + if "error" in data:
  50 + raise CDPError(f"CDP 错误: {data['error']}")
  51 + return data.get("result", {})
  52 + raise CDPError(f"等待 CDP 响应超时 (id={msg_id})")
  53 +
  54 + def close(self) -> None:
  55 + import contextlib
  56 +
  57 + with contextlib.suppress(Exception):
  58 + self._ws.close()
  59 +
  60 +
  61 +class Page:
  62 + """A CDP page session wrapping common operations."""
  63 +
  64 + def __init__(self, cdp: CDPClient, target_id: str, session_id: str) -> None:
  65 + self._cdp = cdp
  66 + self.target_id = target_id
  67 + self.session_id = session_id
  68 + self._ws = cdp._ws
  69 + self._id_counter = 1000
  70 +
  71 + def _send_session(
  72 + self, method: str, params: dict | None = None, timeout: float = 60.0
  73 + ) -> dict:
  74 + """Send a command to this page's session and wait for its response."""
  75 + # Take ids from the shared CDPClient counter so they stay unique across all pages on the socket
  76 + self._cdp._id += 1
  77 + msg_id = self._cdp._id
  78 + msg: dict[str, Any] = {"id": msg_id, "method": method, "sessionId": self.session_id}
  79 + if params:
  80 + msg["params"] = params
  81 + self._ws.send(json.dumps(msg))
  82 + return self._wait_session(msg_id, timeout=timeout)
  83 +
  84 + def _wait_session(self, msg_id: int, timeout: float = 60.0) -> dict:
  85 + """Wait for the session response with the given id; other messages are discarded."""
  86 + deadline = time.monotonic() + timeout
  87 + while time.monotonic() < deadline:
  88 + try:
  89 + raw = self._ws.recv(timeout=max(0.1, deadline - time.monotonic()))
  90 + except TimeoutError:
  91 + break
  92 + data = json.loads(raw)
  93 + if data.get("id") == msg_id:
  94 + if "error" in data:
  95 + raise CDPError(f"CDP 错误: {data['error']}")
  96 + return data.get("result", {})
  97 + raise CDPError(f"等待 session 响应超时 (id={msg_id})")
  98 +
  99 + def navigate(self, url: str) -> None:
  100 + """Navigate to the given URL; does not wait for the page to load (see wait_for_load)."""
  101 + logger.info("导航到: %s", url)
  102 + self._send_session("Page.navigate", {"url": url})
  103 +
  104 + def wait_for_load(self, timeout: float = 60.0) -> None:
  105 + """Poll document.readyState until it is "complete"; log a warning on timeout."""
  106 + deadline = time.monotonic() + timeout
  107 + while time.monotonic() < deadline:
  108 + try:
  109 + state = self.evaluate("document.readyState")
  110 + if state == "complete":
  111 + return
  112 + except CDPError:
  113 + pass
  114 + time.sleep(0.5)
  115 + logger.warning("等待页面加载超时")
  116 +
  117 + def wait_dom_stable(self, timeout: float = 10.0, interval: float = 0.5) -> None:
  118 + """Wait until body.innerHTML.length is the same on two consecutive polls; return silently on timeout."""
  119 + last_len: int | None = None
  120 + deadline = time.monotonic() + timeout
  121 + while time.monotonic() < deadline:
  122 + try:
  123 + length = self.evaluate("document.body ? document.body.innerHTML.length : 0")
  124 + if length is not None and length == last_len:
  125 + return
  126 + last_len = length
  127 + except CDPError:
  128 + pass
  129 + time.sleep(interval)
  130 +
  131 + def evaluate(self, expression: str, timeout: float = 30.0) -> Any:
  132 + """Evaluate a JavaScript expression and return its value (serialized with returnByValue)."""
  133 + result = self._send_session(
  134 + "Runtime.evaluate",
  135 + {
  136 + "expression": expression,
  137 + "returnByValue": True,
  138 + "awaitPromise": False,
  139 + },
  140 + timeout=timeout,
  141 + )
  142 + if "exceptionDetails" in result:
  143 + raise CDPError(f"JS 执行异常: {result['exceptionDetails']}")
  144 + return result.get("result", {}).get("value")
  145 +
  146 + def evaluate_function(self, function_body: str, *args: Any) -> Any:
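  147 + """Call a JavaScript function expression and return its result.
  148 +
  149 + function_body is a full function expression such as `(a, b) => a + b`; JSON-serializable *args are passed to it.
  150 + """
  151 + result = self._send_session(
  152 + "Runtime.evaluate",
  153 + {
  154 + "expression": f"({function_body})(...{json.dumps(list(args))})",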
  155 + "returnByValue": True,
  156 + "awaitPromise": False,
  157 + },
  158 + )
  159 + if "exceptionDetails" in result:
  160 + raise CDPError(f"JS 函数执行异常: {result['exceptionDetails']}")
  161 + remote_obj = result.get("result", {})
  162 + return remote_obj.get("value")
  163 +
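  164 + def query_selector(self, selector: str) -> str | None:
  165 + """Find the first matching element and return its objectId, or None if nothing matches."""
  166 + result = self._send_session(
  167 + "Runtime.evaluate",
  168 + {"expression": f"document.querySelector({json.dumps(selector)})", "returnByValue": False},
  169 + )
  170 + if "exceptionDetails" in result:
  171 + # e.g. an invalid CSS selector throws a SyntaxError; don't return the error object's handle
  172 + raise CDPError(f"querySelector failed: {result['exceptionDetails']}")
  173 + remote_obj = result.get("result", {})
  174 + if remote_obj.get("subtype") == "null" or remote_obj.get("type") == "undefined":
  175 + return None
  176 + return remote_obj.get("objectId")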
  177 +
  178 + def query_selector_all(self, selector: str) -> list[str]:
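  179 + """Find all matching elements and return their objectIds."""
  180 + # Get the match count via JS, then fetch each element by index (one CDP round trip per element)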
  181 + count = self.evaluate(f"document.querySelectorAll({json.dumps(selector)}).length")
  182 + if not count:
  183 + return []
  184 + object_ids = []
  185 + for i in range(count):
  186 + result = self._send_session(
  187 + "Runtime.evaluate",
  188 + {
  189 + "expression": (f"document.querySelectorAll({json.dumps(selector)})[{i}]"),
  190 + "returnByValue": False,
  191 + },
  192 + )
  193 + obj = result.get("result", {})
  194 + oid = obj.get("objectId")
  195 + if oid:
  196 + object_ids.append(oid)
  197 + return object_ids
  198 +
  199 + def has_element(self, selector: str) -> bool:
  200 + """Return True if at least one element matches the selector."""
  201 + return self.evaluate(f"document.querySelector({json.dumps(selector)}) !== null") is True
  202 +
  203 + def wait_for_element(self, selector: str, timeout: float = 30.0) -> str:
  204 + """Poll until a matching element appears and return its objectId; raise ElementNotFoundError on timeout."""
  205 + deadline = time.monotonic() + timeout
  206 + while time.monotonic() < deadline:
  207 + oid = self.query_selector(selector)
  208 + if oid:
  209 + return oid
  210 + time.sleep(0.5)
  211 + raise ElementNotFoundError(selector)
  212 +
  213 + def click_element(self, selector: str) -> None:
  214 + """Click the first matching element via element.click(); a no-op if nothing matches."""
  215 + self.evaluate(
  216 + f"""
  217 + (() => {{
  218 + const el = document.querySelector({json.dumps(selector)});
  219 + if (el) el.click();
  220 + }})()
  221 + """
  222 + )
  223 +
  224 + def input_text(self, selector: str, text: str) -> None:
  225 + """Set the value of the first matching input and dispatch input/change events; a no-op if nothing matches."""
  226 + self.evaluate(
  227 + f"""
  228 + (() => {{
  229 + const el = document.querySelector({json.dumps(selector)});
  230 + if (!el) return;
  231 + el.focus();
  232 + el.value = {json.dumps(text)};
  233 + el.dispatchEvent(new Event('input', {{bubbles: true}}));
  234 + el.dispatchEvent(new Event('change', {{bubbles: true}}));
  235 + }})()
  236 + """
  237 + )
  238 +
  239 + def input_content_editable(self, selector: str, text: str) -> None:
  240 + """Set the text of a contentEditable element (e.g. div.ql-editor) and dispatch an input event."""
  241 + self.evaluate(
  242 + f"""
  243 + (() => {{
  244 + const el = document.querySelector({json.dumps(selector)});
  245 + if (!el) return;
  246 + el.focus();
  247 + el.textContent = {json.dumps(text)};
  248 + el.dispatchEvent(new Event('input', {{bubbles: true}}));
  249 + }})()
  250 + """
  251 + )
  252 +
  253 + def get_element_text(self, selector: str) -> str | None:
  254 + """Return the first matching element's textContent, or None if nothing matches."""
  255 + return self.evaluate(
  256 + f"""
  257 + (() => {{
  258 + const el = document.querySelector({json.dumps(selector)});
  259 + return el ? el.textContent : null;
  260 + }})()
  261 + """
  262 + )
  263 +
  264 + def get_element_attribute(self, selector: str, attr: str) -> str | None:
  265 + """Return an attribute of the first matching element, or None if the element or attribute is missing."""
  266 + return self.evaluate(
  267 + f"""
  268 + (() => {{
  269 + const el = document.querySelector({json.dumps(selector)});
  270 + return el ? el.getAttribute({json.dumps(attr)}) : null;
  271 + }})()
  272 + """
  273 + )
  274 +
  275 + def get_elements_count(self, selector: str) -> int:
  276 + """Return the number of elements matching the selector (0 if the result is not an int)."""
  277 + result = self.evaluate(f"document.querySelectorAll({json.dumps(selector)}).length")
  278 + return result if isinstance(result, int) else 0
  279 +
  280 + def scroll_by(self, x: int, y: int) -> None:
  281 + """Scroll the window by (x, y) pixels."""
  282 + self.evaluate(f"window.scrollBy({x}, {y})")
  283 +
  284 + def scroll_to(self, x: int, y: int) -> None:
  285 + """Scroll the window to the absolute position (x, y)."""
  286 + self.evaluate(f"window.scrollTo({x}, {y})")
  287 +
  288 + def scroll_to_bottom(self) -> None:
  289 + """Scroll the window to the bottom of the page."""
  290 + self.evaluate("window.scrollTo(0, document.body.scrollHeight)")
  291 +
  292 + def scroll_element_into_view(self, selector: str) -> None:
  293 + """Smoothly scroll the first matching element to the center of the viewport."""
  294 + self.evaluate(
  295 + f"""
  296 + (() => {{
  297 + const el = document.querySelector({json.dumps(selector)});
  298 + if (el) el.scrollIntoView({{behavior: 'smooth', block: 'center'}});
  299 + }})()
  300 + """
  301 + )
  302 +
  303 + def scroll_nth_element_into_view(self, selector: str, index: int) -> None:
  304 + """Smoothly scroll the index-th (0-based) matching element to the center of the viewport."""
  305 + self.evaluate(
  306 + f"""
  307 + (() => {{
  308 + const els = document.querySelectorAll({json.dumps(selector)});
  309 + if (els[{index}]) els[{index}].scrollIntoView(
  310 + {{behavior: 'smooth', block: 'center'}}
  311 + );
  312 + }})()
  313 + """
  314 + )
  315 +
  316 + def get_scroll_top(self) -> int:
  317 + """Return the current vertical scroll offset in pixels."""
  318 + result = self.evaluate(
  319 + "window.pageYOffset || document.documentElement.scrollTop"
  320 + " || document.body.scrollTop || 0"
  321 + )
  322 + return int(result) if result else 0
  323 +
  324 + def get_viewport_height(self) -> int:
  325 + """Return the viewport height in pixels (768 if it cannot be read)."""
  326 + result = self.evaluate("window.innerHeight")
  327 + return int(result) if result else 768
  328 +
  329 + def set_file_input(self, selector: str, files: list[str]) -> None:
  330 + """Set the files of a file input via CDP DOM.setFileInputFiles."""
  331 + # Resolve the element's nodeId from the document root first
  332 + doc = self._send_session("DOM.getDocument", {"depth": 0})
  333 + root_node_id = doc["root"]["nodeId"]
  334 + result = self._send_session(
  335 + "DOM.querySelector",
  336 + {"nodeId": root_node_id, "selector": selector},
  337 + )
  338 + node_id = result.get("nodeId", 0)
  339 + if node_id == 0:
  340 + raise ElementNotFoundError(selector)
  341 + self._send_session(
  342 + "DOM.setFileInputFiles",
  343 + {"nodeId": node_id, "files": files},
  344 + )
  345 +
  346 + def dispatch_wheel_event(self, delta_y: float) -> None:
  347 + """Dispatch a synthetic wheel event to trigger lazy loading."""
  348 + self.evaluate(
  349 + f"""
  350 + (() => {{
  351 + let target = document.querySelector('.note-scroller')
  352 + || document.querySelector('.interaction-container')
  353 + || document.documentElement;
  354 + const event = new WheelEvent('wheel', {{
  355 + deltaY: {delta_y},
  356 + deltaMode: 0,
  357 + bubbles: true,
  358 + cancelable: true,
  359 + view: window,
  360 + }});
  361 + target.dispatchEvent(event);
  362 + }})()
  363 + """
  364 + )
  365 +
  366 + def mouse_move(self, x: float, y: float) -> None:
  367 + """Move the mouse pointer to (x, y)."""
  368 + self._send_session(
  369 + "Input.dispatchMouseEvent",
  370 + {"type": "mouseMoved", "x": x, "y": y},
  371 + )
  372 +
  373 + def mouse_click(self, x: float, y: float, button: str = "left") -> None:
  374 + """Click at the given coordinates (mousePressed followed by mouseReleased)."""
  375 + self._send_session(
  376 + "Input.dispatchMouseEvent",
  377 + {"type": "mousePressed", "x": x, "y": y, "button": button, "clickCount": 1},
  378 + )
  379 + self._send_session(
  380 + "Input.dispatchMouseEvent",
  381 + {"type": "mouseReleased", "x": x, "y": y, "button": button, "clickCount": 1},
  382 + )
  383 +
  384 + def type_text(self, text: str, delay_ms: int = 50) -> None:
  385 + """Type text one character at a time, pausing delay_ms between characters."""
  386 + for char in text:
  387 + self._send_session(
  388 + "Input.dispatchKeyEvent",
  389 + {"type": "keyDown", "text": char},
  390 + )
  391 + self._send_session(
  392 + "Input.dispatchKeyEvent",
  393 + {"type": "keyUp", "text": char},
  394 + )
  395 + if delay_ms > 0:
  396 + time.sleep(delay_ms / 1000.0)
  397 +
  398 + def press_key(self, key: str) -> None:
  399 + """Press and release the given key."""
  400 + key_map = {
  401 + "Enter": {"key": "Enter", "code": "Enter", "windowsVirtualKeyCode": 13},
  402 + "ArrowDown": {
  403 + "key": "ArrowDown",
  404 + "code": "ArrowDown",
  405 + "windowsVirtualKeyCode": 40,
  406 + },
  407 + "Tab": {"key": "Tab", "code": "Tab", "windowsVirtualKeyCode": 9},
  408 + }
  409 + info = key_map.get(key, {"key": key, "code": key})
  410 + self._send_session(
  411 + "Input.dispatchKeyEvent",
  412 + {"type": "keyDown", **info},
  413 + )
  414 + self._send_session(
  415 + "Input.dispatchKeyEvent",
  416 + {"type": "keyUp", **info},
  417 + )
  418 +
  419 + def inject_stealth(self) -> None:
  420 + """Register the anti-detection script to run in every new document."""
  421 + self._send_session(
  422 + "Page.addScriptToEvaluateOnNewDocument",
  423 + {"source": STEALTH_JS},
  424 + )
  425 +
  426 + def remove_element(self, selector: str) -> None:
  427 + """Remove the first matching DOM element."""
  428 + self.evaluate(
  429 + f"""
  430 + (() => {{
  431 + const el = document.querySelector({json.dumps(selector)});
  432 + if (el) el.remove();
  433 + }})()
  434 + """
  435 + )
  436 +
  437 + def hover_element(self, selector: str) -> None:
  438 + """Hover over the center of an element (no-op if it is not found)."""
  439 + box = self.evaluate(
  440 + f"""
  441 + (() => {{
  442 + const el = document.querySelector({json.dumps(selector)});
  443 + if (!el) return null;
  444 + const rect = el.getBoundingClientRect();
  445 + return {{x: rect.left + rect.width / 2, y: rect.top + rect.height / 2}};
  446 + }})()
  447 + """
  448 + )
  449 + if box:
  450 + self.mouse_move(box["x"], box["y"])
  451 +
  452 + def select_all_text(self, selector: str) -> None:
  453 + """Focus an input and select all of its text."""
  454 + self.evaluate(
  455 + f"""
  456 + (() => {{
  457 + const el = document.querySelector({json.dumps(selector)});
  458 + if (!el) return;
  459 + el.focus();
  460 + el.select ? el.select() : document.execCommand('selectAll');
  461 + }})()
  462 + """
  463 + )
  464 +
  465 +
  466 +class Browser:
  467 + """CDP controller for a Chrome browser."""
  468 +
  469 + def __init__(self, host: str = "127.0.0.1", port: int = 9222) -> None:
  470 + self.host = host
  471 + self.port = port
  472 + self.base_url = f"http://{host}:{port}"
  473 + self._cdp: CDPClient | None = None
  474 +
  475 + def connect(self) -> None:
  476 + """Connect to Chrome DevTools via the browser WebSocket URL from /json/version."""
  477 + resp = requests.get(f"{self.base_url}/json/version", timeout=5)
  478 + resp.raise_for_status()
  479 + info = resp.json()
  480 + ws_url = info["webSocketDebuggerUrl"]
  481 + logger.info("连接到 Chrome: %s", ws_url)
  482 + self._cdp = CDPClient(ws_url)
  483 +
  484 + def new_page(self, url: str = "about:blank") -> Page:
  485 + """Create a new page, attach to it, enable domains and inject the stealth script."""
  486 + if not self._cdp:
  487 + self.connect()
  488 + assert self._cdp is not None
  489 +
  490 + # Create the target
  491 + result = self._cdp.send("Target.createTarget", {"url": url})
  492 + target_id = result["targetId"]
  493 +
  494 + # Attach to the target (flatten=True: commands are routed by sessionId)
  495 + result = self._cdp.send(
  496 + "Target.attachToTarget",
  497 + {"targetId": target_id, "flatten": True},
  498 + )
  499 + session_id = result["sessionId"]
  500 +
  501 + page = Page(self._cdp, target_id, session_id)
  502 +
  503 + # Enable the required domains
  504 + page._send_session("Page.enable")
  505 + page._send_session("DOM.enable")
  506 + page._send_session("Runtime.enable")
  507 +
  508 + # Inject the anti-detection script
  509 + page.inject_stealth()
  510 +
  511 + return page
  512 +
  513 + def get_existing_page(self) -> Page | None:
  514 + """Attach to an existing page: the first page target whose URL is not about:blank."""
  515 + if not self._cdp:
  516 + self.connect()
  517 + assert self._cdp is not None
  518 +
  519 + resp = requests.get(f"{self.base_url}/json", timeout=5)
  520 + resp.raise_for_status()
  521 + targets = resp.json()
  522 + for target in targets:
  523 + if target.get("type") == "page" and target.get("url") != "about:blank":
  524 + target_id = target["id"]
  525 + result = self._cdp.send(
  526 + "Target.attachToTarget",
  527 + {"targetId": target_id, "flatten": True},
  528 + )
  529 + session_id = result["sessionId"]
  530 + page = Page(self._cdp, target_id, session_id)
  531 + page._send_session("Page.enable")
  532 + page._send_session("DOM.enable")
  533 + page._send_session("Runtime.enable")
  534 + page.inject_stealth()
  535 + return page
  536 + return None
  537 +
  538 + def close_page(self, page: Page) -> None:
  539 + """Close a page; CDP errors are ignored."""
  540 + import contextlib
  541 +
  542 + if self._cdp:
  543 + with contextlib.suppress(CDPError):
  544 + self._cdp.send("Target.closeTarget", {"targetId": page.target_id})
  545 +
  546 + def close(self) -> None:
  547 + """Close the CDP connection."""
  548 + if self._cdp:
  549 + self._cdp.close()
  550 + self._cdp = None
  1 +"""Comment operations, mirroring Go xiaohongshu/comment_feed.go."""
  2 +
  3 +from __future__ import annotations
  4 +
  5 +import logging
  6 +import time
  7 +
  8 +from .cdp import Page
  9 +from .feed_detail import _check_end_container, _check_page_accessible, _get_comment_count
  10 +from .selectors import (
  11 + COMMENT_INPUT_FIELD,
  12 + COMMENT_INPUT_TRIGGER,
  13 + COMMENT_SUBMIT_BUTTON,
  14 + PARENT_COMMENT,
  15 + REPLY_BUTTON,
  16 +)
  17 +from .urls import make_feed_detail_url
  18 +
  19 +logger = logging.getLogger(__name__)
  20 +
  21 +
  22 +def post_comment(page: Page, feed_id: str, xsec_token: str, content: str) -> None:
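  23 + """Post a comment on a feed.
  24 +
  25 + Args:
  26 + page: CDP page object.
  27 + feed_id: Feed ID.
  28 + xsec_token: xsec_token for the feed.
  29 + content: Comment text.
  30 +
  31 + Raises:
  32 + RuntimeError: The comment input was not found (comments disabled or page not accessible).
  33 + """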
  34 + url = make_feed_detail_url(feed_id, xsec_token)
  35 + logger.info("打开 feed 详情页: %s", url)
  36 +
  37 + page.navigate(url)
  38 + page.wait_for_load()
  39 + page.wait_dom_stable()
  40 + time.sleep(1)
  41 +
  42 + _check_page_accessible(page)
  43 +
  44 + # Click the area that opens the comment input
  45 + if not page.has_element(COMMENT_INPUT_TRIGGER):
  46 + raise RuntimeError("未找到评论输入框,该帖子可能不支持评论或网页端不可访问")
  47 +
  48 + page.click_element(COMMENT_INPUT_TRIGGER)
  49 + time.sleep(0.5)
  50 +
  51 + # Fill in the comment text (set textContent, then fire an input event)
  52 + page.wait_for_element(COMMENT_INPUT_FIELD, timeout=5)
  53 + page.evaluate(
  54 + f"""
  55 + (() => {{
  56 + const el = document.querySelector({_js_str(COMMENT_INPUT_FIELD)});
  57 + if (el) {{
  58 + el.focus();
  59 + el.textContent = {_js_str(content)};
  60 + el.dispatchEvent(new Event('input', {{bubbles: true}}));
  61 + }}
  62 + }})()
  63 + """
  64 + )
  65 + time.sleep(1)
  66 +
  67 + # Submit
  68 + page.click_element(COMMENT_SUBMIT_BUTTON)
  69 + time.sleep(1)
  70 +
  71 + logger.info("评论发送成功: feed=%s", feed_id)
  72 +
  73 +
  74 +def reply_comment(
  75 + page: Page,
  76 + feed_id: str,
  77 + xsec_token: str,
  78 + content: str,
  79 + comment_id: str = "",
  80 + user_id: str = "",
  81 +) -> None:
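  82 + """Reply to a comment, located by comment_id (preferred) or user_id.
  83 +
  84 + Args:
  85 + page: CDP page object.
  86 + feed_id: Feed ID.
  87 + xsec_token: xsec_token for the feed.
  88 + content: Reply text.
  89 + comment_id: Comment ID (preferred).
  90 + user_id: User ID of the comment author (fallback).
  91 +
  92 + Raises:
  93 + ValueError: Neither comment_id nor user_id was provided.
  94 + RuntimeError: The target comment was not found.
  95 + PageNotAccessibleError: The note cannot be viewed.
  96 + """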
  97 + if not comment_id and not user_id:
  98 + raise ValueError("comment_id 和 user_id 至少提供一个")
  99 +
  100 + url = make_feed_detail_url(feed_id, xsec_token)
  101 + logger.info("打开 feed 详情页进行回复: %s", url)
  102 +
  103 + page.navigate(url)
  104 + page.wait_for_load()
  105 + page.wait_dom_stable()
  106 + time.sleep(1)
  107 +
  108 + _check_page_accessible(page)
  109 + time.sleep(2)
  110 +
  111 + # Locate the target comment
  112 + comment_found = _find_and_scroll_to_comment(page, comment_id, user_id)
  113 + if not comment_found:
  114 + raise RuntimeError(f"未找到评论 (commentID: {comment_id}, userID: {user_id})")
  115 +
  116 + time.sleep(1)
  117 +
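  118 + # Click the reply button (without comment_id this is the first REPLY_BUTTON on the page, not necessarily the target)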
  119 + reply_selector = f"#comment-{comment_id} {REPLY_BUTTON}" if comment_id else REPLY_BUTTON
  120 + page.click_element(reply_selector)
  121 + time.sleep(1)
  122 +
  123 + # Fill in the reply text
  124 + page.wait_for_element(COMMENT_INPUT_FIELD, timeout=5)
  125 + page.evaluate(
  126 + f"""
  127 + (() => {{
  128 + const el = document.querySelector({_js_str(COMMENT_INPUT_FIELD)});
  129 + if (el) {{
  130 + el.focus();
  131 + el.textContent = {_js_str(content)};
  132 + el.dispatchEvent(new Event('input', {{bubbles: true}}));
  133 + }}
  134 + }})()
  135 + """
  136 + )
  137 + time.sleep(0.5)
  138 +
  139 + # Submit
  140 + page.click_element(COMMENT_SUBMIT_BUTTON)
  141 + time.sleep(2)
  142 +
  143 + logger.info("回复评论成功")
  144 +
  145 +
  146 +def _find_and_scroll_to_comment(
  147 + page: Page,
  148 + comment_id: str,
  149 + user_id: str,
  150 + max_attempts: int = 100,
  151 +) -> bool:
  152 + """Scroll through the comments until the target comment is found."""
  153 + logger.info("开始查找评论 - commentID: %s, userID: %s", comment_id, user_id)
  154 +
  155 + # Scroll to the comments section first
  156 + page.scroll_element_into_view(".comments-container")
  157 + time.sleep(1)
  158 +
  159 + last_count = 0
  160 + stagnant = 0
  161 +
  162 + for attempt in range(max_attempts):
  163 + # Stop at the end-of-comments marker
  164 + if _check_end_container(page):
  165 + logger.info("已到达评论底部,未找到目标评论")
  166 + break
  167 +
  168 + # Stagnation detection
  169 + current_count = _get_comment_count(page)
  170 + if current_count != last_count:
  171 + last_count = current_count
  172 + stagnant = 0
  173 + else:
  174 + stagnant += 1
  175 + if stagnant >= 10:
  176 + logger.info("评论数量停滞超过10次")
  177 + break
  178 +
  179 + # Scroll to the last loaded comment
  180 + if current_count > 0:
  181 + page.scroll_nth_element_into_view(PARENT_COMMENT, current_count - 1)
  182 + time.sleep(0.3)
  183 +
  184 + # Keep scrolling
  185 + page.evaluate("window.scrollBy(0, window.innerHeight * 0.8)")
  186 + time.sleep(0.5)
  187 +
  188 + # Look up by commentID
  189 + if comment_id:
  190 + selector = f"#comment-{comment_id}"
  191 + if page.has_element(selector):
  192 + logger.info("通过 commentID 找到评论 (尝试 %d 次)", attempt + 1)
  193 + page.scroll_element_into_view(selector)
  194 + return True
  195 +
  196 + # Look up by userID
  197 + if user_id:
  198 + found = page.evaluate(
  199 + f"""
  200 + (() => {{
  201 + const els = document.querySelectorAll(
  202 + '.parent-comment, .comment-item, .comment'
  203 + );
  204 + for (const el of els) {{
  205 + if (el.querySelector('[data-user-id="' + CSS.escape({_js_str(user_id)}) + '"]')) {{
  206 + el.scrollIntoView({{behavior: 'smooth', block: 'center'}});
  207 + return true;
  208 + }}
  209 + }}
  210 + return false;
  211 + }})()
  212 + """
  213 + )
  214 + if found:
  215 + logger.info("通过 userID 找到评论 (尝试 %d 次)", attempt + 1)
  216 + return True
  217 +
  218 + time.sleep(0.8)
  219 +
  220 + return False
  221 +
  222 +
  223 +def _js_str(s: str) -> str:
  224 + """Convert a Python string into a JS string literal (quotes included)."""
  225 + import json
  226 +
  227 + return json.dumps(s)
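`_find_and_scroll_to_comment` builds its user-ID lookup by string formatting, so the ID has to be escaped before it reaches JavaScript. A minimal sketch of the pattern the module already uses through `_js_str`: `json.dumps` yields a valid JS string literal, and the browser's `CSS.escape` keeps the value safe inside the attribute selector. `build_find_by_user_js` is a hypothetical helper; the container selector list is copied from the code above.

```python
import json

# Selector list copied from _find_and_scroll_to_comment above.
COMMENT_CONTAINERS = ".parent-comment, .comment-item, .comment"


def build_find_by_user_js(user_id: str) -> str:
    """Build JS that scrolls to the first comment containing [data-user-id=user_id]."""
    uid = json.dumps(user_id)  # JS string literal; quotes and backslashes are escaped
    containers = json.dumps(COMMENT_CONTAINERS)
    # Doubled braces are literal JS braces inside the f-string.
    return f"""
(() => {{
  const uid = {uid};
  for (const el of document.querySelectorAll({containers})) {{
    if (el.querySelector('[data-user-id="' + CSS.escape(uid) + '"]')) {{
      el.scrollIntoView({{behavior: 'smooth', block: 'center'}});
      return true;
    }}
  }}
  return false;
}})()
"""
```

The returned string can be passed straight to `page.evaluate`; an ID containing quotes or backslashes then stays data instead of terminating the JS string.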
  1 +"""Cookie file persistence, mirroring Go cookies/cookies.go."""
  2 +
  3 +from __future__ import annotations
  4 +
  5 +import os
  6 +from pathlib import Path
  7 +
  8 +
  9 +def get_cookies_file_path(account: str = "") -> str:
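  10 + """Return the cookies file path.
  11 +
  12 + Priority (as implemented below):
  13 + 1. Multi-account mode: ~/.xhs/accounts/{account}/cookies.json (created if missing)
  14 + 2. <system temp dir>/cookies.json, if it already exists (backward compatibility)
  15 + 3. COOKIES_PATH environment variable
  16 + 4. ./cookies.json (local debugging)
  17 + """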
  18 + if account:
  19 + account_dir = Path.home() / ".xhs" / "accounts" / account
  20 + account_dir.mkdir(parents=True, exist_ok=True)
  21 + return str(account_dir / "cookies.json")
  22 +
  23 + # Legacy path in the system temp directory
  24 + import tempfile
  25 +
  26 + old_path = os.path.join(tempfile.gettempdir(), "cookies.json")
  27 + if os.path.exists(old_path):
  28 + return old_path
  29 +
  30 + # Environment variable
  31 + env_path = os.getenv("COOKIES_PATH")
  32 + if env_path:
  33 + return env_path
  34 +
  35 + return "cookies.json"
  36 +
  37 +
  38 +def load_cookies(path: str) -> bytes | None:
  39 + """Load cookies from a file; return None if it does not exist."""
  40 + try:
  41 + with open(path, "rb") as f:
  42 + return f.read()
  43 + except FileNotFoundError:
  44 + return None
  45 +
  46 +
  47 +def save_cookies(path: str, data: bytes) -> None:
  48 + """Save cookies to a file, creating parent directories as needed."""
  49 + os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
  50 + with open(path, "wb") as f:
  51 + f.write(data)
  52 +
  53 +
  54 +def delete_cookies(path: str) -> None:
  55 + """Delete the cookies file (no error if it is already gone)."""
  56 + import contextlib
  57 +
  58 + with contextlib.suppress(FileNotFoundError):
  59 + os.remove(path)
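`save_cookies` writes directly into the target file, so a crash mid-write can leave a truncated `cookies.json` that `load_cookies` will later return. A common alternative, sketched here as an assumption rather than what this commit does, is write-to-temp-then-rename: `os.replace` is atomic on POSIX when source and destination are on the same filesystem, which is why the temp file is created in the destination directory. `save_cookies_atomic` is a hypothetical name.

```python
import contextlib
import os
import tempfile


def save_cookies_atomic(path: str, data: bytes) -> None:
    """Write cookies to a temp file next to `path`, then rename it over `path`."""
    directory = os.path.dirname(path) or "."
    os.makedirs(directory, exist_ok=True)
    # Same directory => same filesystem, so os.replace is a single rename.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".cookies-", suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # flush bytes to disk before the rename
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the temp file on any failure, then re-raise.
        with contextlib.suppress(FileNotFoundError):
            os.remove(tmp_path)
        raise
```

Readers either see the old file or the complete new one, never a partial write.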
  1 +"""Exception hierarchy for Xiaohongshu automation."""
  2 +
  3 +
  4 +class XHSError(Exception):
  5 + """Base exception for Xiaohongshu automation."""
  6 +
  7 +
  8 +class NoFeedsError(XHSError):
  9 + """No feeds data was captured."""
  10 +
  11 + def __init__(self) -> None:
  12 + super().__init__("没有捕获到 feeds 数据")
  13 +
  14 +
  15 +class NoFeedDetailError(XHSError):
  16 + """No feed detail data was captured."""
  17 +
  18 + def __init__(self) -> None:
  19 + super().__init__("没有捕获到 feed 详情数据")
  20 +
  21 +
  22 +class NotLoggedInError(XHSError):
  23 + """Not logged in."""
  24 +
  25 + def __init__(self) -> None:
  26 + super().__init__("未登录,请先扫码登录")
  27 +
  28 +
  29 +class PageNotAccessibleError(XHSError):
  30 + """The note page is not accessible."""
  31 +
  32 + def __init__(self, reason: str) -> None:
  33 + self.reason = reason
  34 + super().__init__(f"笔记不可访问: {reason}")
  35 +
  36 +
  37 +class UploadTimeoutError(XHSError):
  38 + """Upload timed out."""
  39 +
  40 +
  41 +class PublishError(XHSError):
  42 + """Publishing failed."""
  43 +
  44 +
  45 +class TitleTooLongError(PublishError):
  46 + """The title exceeds the length limit."""
  47 +
  48 + def __init__(self, current: str, maximum: str) -> None:
  49 + self.current = current
  50 + self.maximum = maximum
  51 + super().__init__(f"当前输入长度为{current},最大长度为{maximum}")
  52 +
  53 +
  54 +class ContentTooLongError(PublishError):
  55 + """The body text exceeds the length limit."""
  56 +
  57 + def __init__(self, current: str, maximum: str) -> None:
  58 + self.current = current
  59 + self.maximum = maximum
  60 + super().__init__(f"当前输入长度为{current},最大长度为{maximum}")
  61 +
  62 +
  63 +class CDPError(XHSError):
  64 + """CDP communication error."""
  65 +
  66 +
  67 +class ElementNotFoundError(XHSError):
  68 + """Page element not found."""
  69 +
  70 + def __init__(self, selector: str) -> None:
  71 + self.selector = selector
  72 + super().__init__(f"未找到元素: {selector}")
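Because every error derives from `XHSError`, and the length errors derive from `PublishError`, a caller such as a CLI can handle them from most to least specific. A minimal sketch with trimmed stand-in classes; the exit-code values and `exit_code_for` are hypothetical and not taken from `cli.py`.

```python
# Trimmed stand-ins mirroring the hierarchy above.
class XHSError(Exception):
    pass


class NotLoggedInError(XHSError):
    pass


class PublishError(XHSError):
    pass


class TitleTooLongError(PublishError):
    pass


# Hypothetical exit codes, ordered from most to least specific class.
_EXIT_CODES: list[tuple[type[BaseException], int]] = [
    (NotLoggedInError, 2),
    (PublishError, 3),  # also covers TitleTooLongError / ContentTooLongError
    (XHSError, 1),
]


def exit_code_for(exc: BaseException) -> int:
    """Return the first matching exit code; 70 for unexpected (non-XHS) errors."""
    for cls, code in _EXIT_CODES:
        if isinstance(exc, cls):
            return code
    return 70
```

Ordering matters: placing `XHSError` first would swallow every subclass into the generic code.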
  1 +"""Feed detail and comment loading, mirroring Go xiaohongshu/feed_detail.go (867 lines)."""
  2 +
  3 +from __future__ import annotations
  4 +
  5 +import json
  6 +import logging
  7 +import random
  8 +import re
  9 +import time
  10 +
  11 +from .cdp import Page
  12 +from .errors import NoFeedDetailError, PageNotAccessibleError
  13 +from .human import (
  14 + BUTTON_CLICK_INTERVAL,
  15 + DEFAULT_MAX_ATTEMPTS,
  16 + FINAL_SPRINT_PUSH_COUNT,
  17 + HUMAN_DELAY,
  18 + LARGE_SCROLL_TRIGGER,
  19 + MAX_CLICK_PER_ROUND,
  20 + MIN_SCROLL_DELTA,
  21 + POST_SCROLL,
  22 + REACTION_TIME,
  23 + READ_TIME,
  24 + SCROLL_WAIT,
  25 + SHORT_READ,
  26 + STAGNANT_LIMIT,
  27 + calculate_scroll_delta,
  28 + get_scroll_interval,
  29 + get_scroll_ratio,
  30 + sleep_random,
  31 +)
  32 +from .selectors import (
  33 + ACCESS_ERROR_WRAPPER,
  34 + END_CONTAINER,
  35 + NO_COMMENTS_TEXT,
  36 + PARENT_COMMENT,
  37 + SHOW_MORE_BUTTON,
  38 +)
  39 +from .types import (
  40 + CommentList,
  41 + CommentLoadConfig,
  42 + FeedDetail,
  43 + FeedDetailResponse,
  44 +)
  45 +from .urls import make_feed_detail_url
  46 +
  47 +logger = logging.getLogger(__name__)
  48 +
  49 +# Page texts indicating the note is not accessible (matched verbatim; do not translate)
  50 +_INACCESSIBLE_KEYWORDS = [
  51 + "当前笔记暂时无法浏览",
  52 + "该内容因违规已被删除",
  53 + "该笔记已被删除",
  54 + "内容不存在",
  55 + "笔记不存在",
  56 + "已失效",
  57 + "私密笔记",
  58 + "仅作者可见",
  59 + "因用户设置,你无法查看",
  60 + "因违规无法查看",
  61 +]
  62 +
  63 +_REPLY_COUNT_RE = re.compile(r"展开\s*(\d+)\s*条回复")
  64 +_TOTAL_COMMENT_RE = re.compile(r"共(\d+)条评论")
  65 +
  66 +
  67 +def get_feed_detail(
  68 + page: Page,
  69 + feed_id: str,
  70 + xsec_token: str,
  71 + load_all_comments: bool = False,
  72 + config: CommentLoadConfig | None = None,
  73 +) -> FeedDetailResponse:
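  74 + """Fetch a feed's detail, including comments.
  75 +
  76 + Args:
  77 + page: CDP page object.
  78 + feed_id: Feed ID.
  79 + xsec_token: xsec_token for the feed.
  80 + load_all_comments: Whether to scroll until all comments are loaded.
  81 + config: Comment-loading configuration (defaults to CommentLoadConfig()).
  82 +
  83 + Raises:
  84 + PageNotAccessibleError: The page is not accessible.
  85 + NoFeedDetailError: No detail data was found in __INITIAL_STATE__.
  86 + """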
  87 + if config is None:
  88 + config = CommentLoadConfig()
  89 +
  90 + url = make_feed_detail_url(feed_id, xsec_token)
  91 + logger.info("打开 feed 详情页: %s", url)
  92 + logger.info(
  93 + "配置: 点击更多=%s, 回复阈值=%d, 最大评论数=%d, 滚动速度=%s",
  94 + config.click_more_replies,
  95 + config.max_replies_threshold,
  96 + config.max_comment_items,
  97 + config.scroll_speed,
  98 + )
  99 +
  100 + # Navigate (up to 3 attempts)
  101 + for attempt in range(3):
  102 + try:
  103 + page.navigate(url)
  104 + page.wait_for_load()
  105 + page.wait_dom_stable()
  106 + break
  107 + except Exception as e:
  108 + logger.debug("页面导航重试 #%d: %s", attempt, e)
  109 + time.sleep(0.5 + random.random())
  110 + else:
  111 + raise RuntimeError("页面导航失败")
  112 +
  113 + sleep_random(1000, 1000)
  114 +
  115 + # Check page accessibility
  116 + _check_page_accessible(page)
  117 +
  118 + # Load all comments (failures are logged, not raised)
  119 + if load_all_comments:
  120 + try:
  121 + _load_all_comments(page, config)
  122 + except Exception as e:
  123 + logger.warning("加载全部评论失败: %s", e)
  124 +
  125 + return _extract_feed_detail(page, feed_id)
  126 +
  127 +
  128 +# ========== Page checks ==========
  129 +
  130 +
  131 +def _check_page_accessible(page: Page) -> None:
  132 + """Raise PageNotAccessibleError if the page shows an access-error message."""
  133 + time.sleep(0.5)
  134 +
  135 + text = page.get_element_text(ACCESS_ERROR_WRAPPER)
  136 + if not text:
  137 + return
  138 +
  139 + text = text.strip()
  140 + for kw in _INACCESSIBLE_KEYWORDS:
  141 + if kw in text:
  142 + raise PageNotAccessibleError(kw)
  143 +
  144 + if text:
  145 + raise PageNotAccessibleError(text)
  146 +
  147 +
  148 +# ========== Data extraction ==========
  149 +
  150 +
  151 +_EXTRACT_DETAIL_JS = """
  152 +(() => {
  153 + if (window.__INITIAL_STATE__ &&
  154 + window.__INITIAL_STATE__.note &&
  155 + window.__INITIAL_STATE__.note.noteDetailMap) {
  156 + return JSON.stringify(window.__INITIAL_STATE__.note.noteDetailMap);
  157 + }
  158 + return "";
  159 +})()
  160 +"""
  161 +
  162 +
  163 +def _extract_feed_detail(page: Page, feed_id: str) -> FeedDetailResponse:
  164 + """Extract the feed detail from window.__INITIAL_STATE__."""
  165 + result = None
  166 + for _ in range(3):
  167 + result = page.evaluate(_EXTRACT_DETAIL_JS)
  168 + if result:
  169 + break
  170 + time.sleep(0.2)
  171 +
  172 + if not result:
  173 + raise NoFeedDetailError()
  174 +
  175 + note_detail_map = json.loads(result)
  176 + note_data = note_detail_map.get(feed_id)
  177 + if not note_data:
  178 + raise NoFeedDetailError()
  179 +
  180 + return FeedDetailResponse(
  181 + note=FeedDetail.from_dict(note_data.get("note", {})),
  182 + comments=CommentList.from_dict(note_data.get("comments", {})),
  183 + )
  184 +
  185 +
  186 +# ========== Comment-loading state machine ==========
  187 +
  188 +
  189 +def _load_all_comments(page: Page, config: CommentLoadConfig) -> None:
  190 + """State machine that scrolls and expands replies until all comments are loaded."""
  191 + max_attempts = (
  192 + config.max_comment_items * 3 if config.max_comment_items > 0 else DEFAULT_MAX_ATTEMPTS
  193 + )
  194 + scroll_interval = get_scroll_interval(config.scroll_speed)
  195 +
  196 + logger.info("开始加载评论...")
  197 + _scroll_to_comments_area(page)
  198 + sleep_random(*HUMAN_DELAY)
  199 +
  200 + # Skip if the note has no comments
  201 + if _check_no_comments(page):
  202 + logger.info("检测到无评论区域,跳过加载")
  203 + return
  204 +
  205 + # Loop state
  206 + last_count = 0
  207 + last_scroll_top = 0
  208 + stagnant_checks = 0
  209 + total_clicked = 0
  210 + total_skipped = 0
  211 +
  212 + for attempt in range(max_attempts):
  213 + logger.debug("=== 尝试 %d/%d ===", attempt + 1, max_attempts)
  214 +
  215 + # 检查是否到达底部
  216 + if _check_end_container(page):
  217 + count = _get_comment_count(page)
  218 + logger.info(
  219 + "检测到 THE END,加载完成: %d 条评论, 点击: %d, 跳过: %d",
  220 + count,
  221 + total_clicked,
  222 + total_skipped,
  223 + )
  224 + return
  225 +
  226 + # Every BUTTON_CLICK_INTERVAL attempts, click "show more replies" buttons
  227 + if config.click_more_replies and attempt % BUTTON_CLICK_INTERVAL == 0:
  228 + clicked, skipped = _click_show_more_buttons(page, config.max_replies_threshold)
  229 + total_clicked += clicked
  230 + total_skipped += skipped
  231 + if clicked > 0 or skipped > 0:
  232 + sleep_random(*READ_TIME)
  233 + # 第二轮
  234 + c2, s2 = _click_show_more_buttons(page, config.max_replies_threshold)
  235 + total_clicked += c2
  236 + total_skipped += s2
  237 + if c2 > 0 or s2 > 0:
  238 + sleep_random(*SHORT_READ)
  239 +
  240 + # Current number of loaded comments
  241 + current_count = _get_comment_count(page)
  242 + if current_count != last_count:
  243 + logger.info("评论增加: %d -> %d", last_count, current_count)
  244 + last_count = current_count
  245 + stagnant_checks = 0
  246 + else:
  247 + stagnant_checks += 1
  248 +
  249 + # Stop once the target count is reached
  250 + if config.max_comment_items > 0 and current_count >= config.max_comment_items:
  251 + logger.info("已达到目标评论数: %d/%d", current_count, config.max_comment_items)
  252 + return
  253 +
  254 + # Scroll
  255 + if current_count > 0:
  256 + _scroll_to_last_comment(page)
  257 + sleep_random(*POST_SCROLL)
  258 +
  259 + large_mode = stagnant_checks >= LARGE_SCROLL_TRIGGER
  260 + push_count = 1
  261 + if large_mode:
  262 + push_count = 3 + random.randint(0, 2)
  263 +
  264 + scroll_delta, current_scroll_top = _human_scroll(
  265 + page, config.scroll_speed, large_mode, push_count
  266 + )
  267 +
  268 + if scroll_delta < MIN_SCROLL_DELTA or current_scroll_top == last_scroll_top:
  269 + stagnant_checks += 1
  270 + else:
  271 + stagnant_checks = 0
  272 + last_scroll_top = current_scroll_top
  273 +
  274 + # Handle stagnation
  275 + if stagnant_checks >= STAGNANT_LIMIT:
  276 + logger.info("停滞过多,尝试大冲刺...")
  277 + _human_scroll(page, config.scroll_speed, True, 10)
  278 + stagnant_checks = 0
  279 +
  280 + time.sleep(scroll_interval)
  281 +
  282 + # Final sprint
  283 + logger.info("达到最大尝试次数,最后冲刺...")
  284 + _human_scroll(page, config.scroll_speed, True, FINAL_SPRINT_PUSH_COUNT)
  285 + count = _get_comment_count(page)
  286 + logger.info("加载结束: %d 条评论, 点击: %d, 跳过: %d", count, total_clicked, total_skipped)
  287 +
  288 +
  289 +# ========== Scrolling ==========
  290 +
  291 +
  292 +def _human_scroll(
  293 + page: Page,
  294 + speed: str,
  295 + large_mode: bool,
  296 + push_count: int,
  297 +) -> tuple[int, int]:
  298 + """Scroll in one or more human-like pushes.
  299 +
  300 + Returns:
  301 + (actual_delta, current_scroll_top): total pixels scrolled and the final scroll offset.
  302 + """
  303 + before_top = page.get_scroll_top()
  304 + viewport_height = page.get_viewport_height()
  305 +
  306 + base_ratio = get_scroll_ratio(speed)
  307 + if large_mode:
  308 + base_ratio *= 2.0
  309 +
  310 + actual_delta = 0
  311 + current_scroll_top = before_top
  312 +
  313 + for i in range(max(1, push_count)):
  314 + scroll_delta = calculate_scroll_delta(viewport_height, base_ratio)
  315 + page.scroll_by(0, int(scroll_delta))
  316 + sleep_random(*SCROLL_WAIT)
  317 +
  318 + current_scroll_top = page.get_scroll_top()
  319 + delta_this = current_scroll_top - before_top
  320 + actual_delta += delta_this
  321 + before_top = current_scroll_top
  322 +
  323 + if i < push_count - 1:
  324 + sleep_random(*HUMAN_DELAY)
  325 +
  326 + # 如果没有滚动,强制到底部
  327 + if actual_delta < MIN_SCROLL_DELTA and push_count > 0:
  328 + page.scroll_to_bottom()
  329 + sleep_random(*POST_SCROLL)
  330 + current_scroll_top = page.get_scroll_top()
  331 + actual_delta = current_scroll_top - (before_top - actual_delta)
  332 +
  333 + return actual_delta, current_scroll_top
  334 +
  335 +
  336 +def _scroll_to_comments_area(page: Page) -> None:
  337 + """Scroll to the comments section."""
  338 + logger.info("滚动到评论区...")
  339 + page.scroll_element_into_view(".comments-container")
  340 + time.sleep(0.5)
  341 + # Trigger lazy loading
  342 + page.dispatch_wheel_event(100)
  343 +
  344 +
  345 +def _scroll_to_last_comment(page: Page) -> None:
  346 + """Scroll to the last loaded top-level comment."""
  347 + count = page.get_elements_count(PARENT_COMMENT)
  348 + if count > 0:
  349 + page.scroll_nth_element_into_view(PARENT_COMMENT, count - 1)
  350 +
  351 +
  352 +# ========== DOM queries ==========
  353 +
  354 +
  355 +def _get_comment_count(page: Page) -> int:
  356 + """Return the number of loaded top-level comments."""
  357 + return page.get_elements_count(PARENT_COMMENT)
  358 +
  359 +
  360 +def _get_total_comment_count(page: Page) -> int:
  361 + """Return the total comment count parsed from "共N条评论" ("N comments in total"), or 0."""
  362 + text = page.get_element_text(".comments-container .total")
  363 + if not text:
  364 + return 0
  365 + match = _TOTAL_COMMENT_RE.search(text)
  366 + if match:
  367 + return int(match.group(1))
  368 + return 0
  369 +
  370 +
  371 +def _check_no_comments(page: Page) -> bool:
  372 + """Return True if the empty-comments placeholder ("这是一片荒地") is shown."""
  373 + text = page.get_element_text(NO_COMMENTS_TEXT)
  374 + if not text:
  375 + return False
  376 + return "这是一片荒地" in text.strip()
  377 +
  378 +
  379 +def _check_end_container(page: Page) -> bool:
  380 + """Return True if the "THE END" marker at the bottom has been reached."""
  381 + text = page.get_element_text(END_CONTAINER)
  382 + if not text:
  383 + return False
  384 + upper = text.strip().upper()
  385 + return "THE END" in upper or "THEEND" in upper
  386 +
  387 +
  388 +# ========== Button clicks ==========
  389 +
  390 +
  391 +def _click_show_more_buttons(page: Page, max_threshold: int) -> tuple[int, int]:
  392 + """Click "展开N条回复" ("show N replies") buttons, skipping those above max_threshold.
  393 +
  394 + Returns:
  395 + (clicked, skipped)
  396 + """
  397 + count = page.get_elements_count(SHOW_MORE_BUTTON)
  398 + if count == 0:
  399 + return 0, 0
  400 +
  401 + max_click = MAX_CLICK_PER_ROUND + random.randint(0, MAX_CLICK_PER_ROUND - 1)
  402 + clicked = 0
  403 + skipped = 0
  404 +
  405 + for i in range(count):
  406 + if clicked >= max_click:
  407 + break
  408 +
  409 + # Read the button text
  410 + text = page.evaluate(
  411 + f"document.querySelectorAll({json.dumps(SHOW_MORE_BUTTON)})[{i}]?.textContent || ''"
  412 + )
  413 + if not text:
  414 + continue
  415 +
  416 + # Skip buttons whose reply count exceeds the threshold
  417 + if max_threshold > 0:
  418 + match = _REPLY_COUNT_RE.search(text)
  419 + if match:
  420 + reply_count = int(match.group(1))
  421 + if reply_count > max_threshold:
  422 + logger.debug(
  423 + "跳过 '%s'(回复数 %d > 阈值 %d)", text, reply_count, max_threshold
  424 + )
  425 + skipped += 1
  426 + continue
  427 +
  428 + # Scroll to the button and click it
  429 + page.scroll_nth_element_into_view(SHOW_MORE_BUTTON, i)
  430 + sleep_random(*REACTION_TIME)
  431 + page.evaluate(f"document.querySelectorAll({json.dumps(SHOW_MORE_BUTTON)})[{i}]?.click()")
  432 + sleep_random(*READ_TIME)
  433 + clicked += 1
  434 +
  435 + return clicked, skipped
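The skip rule in `_click_show_more_buttons` depends only on the button text, so it can be checked without a browser. The sketch below restates that rule as a pure function (`should_click` is a hypothetical name); the regex is copied from `_REPLY_COUNT_RE`, and a threshold of 0 or less disables skipping, as in the loop above.

```python
import re

# Copied from _REPLY_COUNT_RE: matches "展开 N 条回复" ("show N replies").
REPLY_COUNT_RE = re.compile(r"展开\s*(\d+)\s*条回复")


def should_click(button_text: str, max_threshold: int) -> bool:
    """Mirror of the skip check: click unless the reply count exceeds a positive threshold."""
    if not button_text:
        return False  # the loop above skips empty buttons without counting them
    if max_threshold <= 0:
        return True  # threshold disabled
    match = REPLY_COUNT_RE.search(button_text)
    if match is None:
        return True  # no number in the text: the loop above still clicks it
    return int(match.group(1)) <= max_threshold
```

Extracting the rule like this makes threshold changes testable against real button texts.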
  1 +"""Home-page feed list, mirroring Go xiaohongshu/feeds.go."""
  2 +
  3 +from __future__ import annotations
  4 +
  5 +import json
  6 +import logging
  7 +import time
  8 +
  9 +from .cdp import Page
  10 +from .errors import NoFeedsError
  11 +from .types import Feed
  12 +from .urls import HOME_URL
  13 +
  14 +logger = logging.getLogger(__name__)
  15 +
  16 +# JS that extracts feeds from __INITIAL_STATE__ (the list sits under .value or ._value)
  17 +_EXTRACT_FEEDS_JS = """
  18 +(() => {
  19 + if (window.__INITIAL_STATE__ &&
  20 + window.__INITIAL_STATE__.feed &&
  21 + window.__INITIAL_STATE__.feed.feeds) {
  22 + const feeds = window.__INITIAL_STATE__.feed.feeds;
  23 + const feedsData = feeds.value !== undefined ? feeds.value : feeds._value;
  24 + if (feedsData) {
  25 + return JSON.stringify(feedsData);
  26 + }
  27 + }
  28 + return "";
  29 +})()
  30 +"""
  31 +
  32 +
  33 +def list_feeds(page: Page) -> list[Feed]:
  34 + """Return the home-page feed list.
  35 +
  36 + Raises:
  37 + NoFeedsError: No feeds data was captured.
  38 + """
  39 + page.navigate(HOME_URL)
  40 + page.wait_for_load()
  41 + page.wait_dom_stable()
  42 + time.sleep(1)
  43 +
  44 + result = page.evaluate(_EXTRACT_FEEDS_JS)
  45 + if not result:
  46 + raise NoFeedsError()
  47 +
  48 + feeds_data = json.loads(result)
  49 + return [Feed.from_dict(f) for f in feeds_data]
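`_EXTRACT_FEEDS_JS` reads the list from either `feeds.value` or `feeds._value`, which suggests (an assumption, not confirmed by the source) that the page keeps it in a reactive wrapper whose field name varies. The same fallback restated in Python, handy for testing `Feed.from_dict` against a captured `__INITIAL_STATE__` dump without a browser; `extract_feeds` is hypothetical and returns an empty list where `list_feeds` would raise `NoFeedsError`.

```python
import json
from typing import Any


def extract_feeds(initial_state: dict[str, Any]) -> list[Any]:
    """Python restatement of _EXTRACT_FEEDS_JS for a captured __INITIAL_STATE__ dict."""
    feeds = (initial_state.get("feed") or {}).get("feeds")
    if not isinstance(feeds, dict):
        return []  # missing, or not a wrapper object: nothing to unwrap
    # Prefer .value; fall back to ._value, as the JS does.
    data = feeds["value"] if "value" in feeds else feeds.get("_value")
    return data or []
```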
  1 +"""Human-behaviour simulation parameters (delays, scrolling, hovering), mirroring constants in Go feed_detail.go."""
  2 +
  3 +import random
  4 +import time
  5 +
  6 +# ========== Configuration constants ==========
  7 +DEFAULT_MAX_ATTEMPTS = 500
  8 +STAGNANT_LIMIT = 20
  9 +MIN_SCROLL_DELTA = 10
  10 +MAX_CLICK_PER_ROUND = 3
  11 +STAGNANT_CHECK_THRESHOLD = 2
  12 +LARGE_SCROLL_TRIGGER = 5
  13 +BUTTON_CLICK_INTERVAL = 3
  14 +FINAL_SPRINT_PUSH_COUNT = 15
  15 +
  16 +# ========== Delay ranges (milliseconds) ==========
  17 +HUMAN_DELAY = (300, 700)
  18 +REACTION_TIME = (300, 800)
  19 +HOVER_TIME = (100, 300)
  20 +READ_TIME = (500, 1200)
  21 +SHORT_READ = (600, 1200)
  22 +SCROLL_WAIT = (100, 200)
  23 +POST_SCROLL = (300, 500)
  24 +
  25 +
  26 +def sleep_random(min_ms: int, max_ms: int) -> None:
  27 + """Sleep for a random whole number of milliseconds in [min_ms, max_ms]."""
  28 + if max_ms <= min_ms:
  29 + time.sleep(min_ms / 1000.0)
  30 + return
  31 + delay = random.randint(min_ms, max_ms) / 1000.0
  32 + time.sleep(delay)
  33 +
  34 +
  35 +def get_scroll_interval(speed: str) -> float:
  36 + """Return the pause between scrolls, in seconds, for the given speed."""
  37 + if speed == "slow":
  38 + return (1200 + random.randint(0, 300)) / 1000.0
  39 + if speed == "fast":
  40 + return (300 + random.randint(0, 100)) / 1000.0
  41 + # normal
  42 + return (600 + random.randint(0, 200)) / 1000.0
  43 +
  44 +
  45 +def get_scroll_ratio(speed: str) -> float:
  46 + """Return the base scroll ratio (fraction of viewport height) for the given speed."""
  47 + if speed == "slow":
  48 + return 0.5
  49 + if speed == "fast":
  50 + return 0.9
  51 + return 0.7
  52 +
  53 +
  54 +def calculate_scroll_delta(viewport_height: int, base_ratio: float) -> float:
  55 + """Compute a scroll distance in pixels (floored at 400 before a ±50 px jitter)."""
  56 + scroll_delta = viewport_height * (base_ratio + random.random() * 0.2)
  57 + if scroll_delta < 400:
  58 + scroll_delta = 400.0
  59 + return scroll_delta + random.randint(-50, 50)
  60 +
  61 +
  62 +# Page texts indicating the note is not accessible (matched verbatim; do not translate)
  63 +INACCESSIBLE_KEYWORDS = [
  64 + "当前笔记暂时无法浏览",
  65 + "该内容因违规已被删除",
  66 + "该笔记已被删除",
  67 + "内容不存在",
  68 + "笔记不存在",
  69 + "已失效",
  70 + "私密笔记",
  71 + "仅作者可见",
  72 + "因用户设置,你无法查看",
  73 + "因违规无法查看",
  74 +]
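The constants at the top of this module (`DEFAULT_MAX_ATTEMPTS`, `STAGNANT_LIMIT`) together with `calculate_scroll_delta` point to a "scroll until the page stops growing" loop, which lives in feed_detail.py and is not part of this excerpt. The sketch below shows that general pattern under stated assumptions: `get_height` and `scroll_by` are hypothetical callbacks standing in for CDP calls, and the real loop very likely does more (clicking "show more" buttons, large scrolls, a final sprint). `scroll_delta` restates the formula above with an injectable RNG so runs are reproducible.

```python
import random
from typing import Callable, Optional

SCROLL_FLOOR = 400.0  # same floor as calculate_scroll_delta above


def scroll_delta(viewport_height: int, base_ratio: float, rng: random.Random) -> float:
    """Randomized scroll distance: ratio jitter, a 400 px floor, then +/-50 px noise."""
    delta = viewport_height * (base_ratio + rng.random() * 0.2)
    delta = max(delta, SCROLL_FLOOR)
    return delta + rng.randint(-50, 50)


def scroll_until_stagnant(
    get_height: Callable[[], int],
    scroll_by: Callable[[float], None],
    viewport_height: int,
    base_ratio: float = 0.7,
    max_attempts: int = 500,
    stagnant_limit: int = 20,
    seed: Optional[int] = None,
) -> int:
    """Scroll until the page height has not grown for `stagnant_limit` rounds.

    Hypothetical helper; returns the number of scroll attempts made.
    """
    rng = random.Random(seed)
    last_height = get_height()
    stagnant = 0
    attempts = 0
    while attempts < max_attempts and stagnant < stagnant_limit:
        attempts += 1
        scroll_by(scroll_delta(viewport_height, base_ratio, rng))
        height = get_height()
        if height > last_height:
            # New content loaded: reset the stagnation counter.
            last_height = height
            stagnant = 0
        else:
            stagnant += 1
    return attempts
```

With a seeded `random.Random`, two runs over the same page produce the same sequence of scroll distances, which keeps the loop testable; an unseeded RNG keeps the jitter unpredictable in production.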
  1 +"""Like/favorite actions, mirroring Go xiaohongshu/like_favorite.go."""
  2 +
  3 +from __future__ import annotations
  4 +
  5 +import json
  6 +import logging
  7 +import time
  8 +
  9 +from .cdp import Page
  10 +from .errors import NoFeedDetailError
  11 +from .selectors import COLLECT_BUTTON, LIKE_BUTTON
  12 +from .types import ActionResult
  13 +from .urls import make_feed_detail_url
  14 +
  15 +logger = logging.getLogger(__name__)
  16 +
  17 +# JS that reads the interaction state from __INITIAL_STATE__
  18 +_GET_INTERACT_STATE_JS = """
  19 +(() => {
  20 + if (window.__INITIAL_STATE__ &&
  21 + window.__INITIAL_STATE__.note &&
  22 + window.__INITIAL_STATE__.note.noteDetailMap) {
  23 + return JSON.stringify(window.__INITIAL_STATE__.note.noteDetailMap);
  24 + }
  25 + return "";
  26 +})()
  27 +"""
  28 +
  29 +
  30 +def _get_interact_state(page: Page, feed_id: str) -> tuple[bool, bool]:
  31 + """Read the note's like/favorite state.
  32 +
  33 + Returns:
  34 + (liked, collected)
  35 +
  36 + Raises:
  37 + NoFeedDetailError: the state could not be read.
  38 + """
  39 + result = page.evaluate(_GET_INTERACT_STATE_JS)
  40 + if not result:
  41 + raise NoFeedDetailError()
  42 +
  43 + note_detail_map = json.loads(result)
  44 + detail = note_detail_map.get(feed_id)
  45 + if not detail:
  46 + raise NoFeedDetailError()
  47 +
  48 + interact = detail.get("note", {}).get("interactInfo", {})
  49 + return interact.get("liked", False), interact.get("collected", False)
  50 +
  51 +
  52 +def _prepare_page(page: Page, feed_id: str, xsec_token: str) -> None:
  53 + """Navigate to the feed detail page."""
  54 + url = make_feed_detail_url(feed_id, xsec_token)
  55 + page.navigate(url)
  56 + page.wait_for_load()
  57 + page.wait_dom_stable()
  58 + time.sleep(1)
  59 +
  60 +
  61 +# ========== Like ==========
  62 +
  63 +
  64 +def like_feed(page: Page, feed_id: str, xsec_token: str) -> ActionResult:
  65 + """Like a note (idempotent: skipped if already liked)."""
  66 + _prepare_page(page, feed_id, xsec_token)
  67 + return _toggle_like(page, feed_id, target_liked=True)
  68 +
  69 +
  70 +def unlike_feed(page: Page, feed_id: str, xsec_token: str) -> ActionResult:
  71 + """Unlike a note (idempotent: skipped if not liked)."""
  72 + _prepare_page(page, feed_id, xsec_token)
  73 + return _toggle_like(page, feed_id, target_liked=False)
  74 +
  75 +
  76 +def _toggle_like(page: Page, feed_id: str, target_liked: bool) -> ActionResult:
  77 + """Perform the like/unlike action."""
  78 + action_name = "点赞" if target_liked else "取消点赞"
  79 +
  80 + try:
  81 + liked, _ = _get_interact_state(page, feed_id)
  82 + except NoFeedDetailError:
  83 + logger.warning("无法读取互动状态,直接点击")
  84 + liked = not target_liked # force the click
  85 +
  86 + # Idempotency check
  87 + if liked == target_liked:
  88 + logger.info("feed %s 已%s,跳过", feed_id, action_name)
  89 + return ActionResult(feed_id=feed_id, success=True, message=f"已{action_name}")
  90 +
  91 + # Click
  92 + page.click_element(LIKE_BUTTON)
  93 + time.sleep(3)
  94 +
  95 + # Verify
  96 + try:
  97 + liked, _ = _get_interact_state(page, feed_id)
  98 + if liked == target_liked:
  99 + logger.info("feed %s %s成功", feed_id, action_name)
  100 + return ActionResult(feed_id=feed_id, success=True, message=f"{action_name}成功")
  101 + except NoFeedDetailError:
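  102 + return ActionResult(feed_id=feed_id, success=True, message=f"{action_name}已执行") # state unreadable: do not retry, a second click could undo the first
  103 +
  104 + # Retry once (the state was read and is still not the target)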
  105 + logger.warning("feed %s %s可能未成功,重试", feed_id, action_name)
  106 + page.click_element(LIKE_BUTTON)
  107 + time.sleep(2)
  108 +
  109 + return ActionResult(feed_id=feed_id, success=True, message=f"{action_name}已执行")
  110 +
  111 +
  112 +# ========== Favorite ==========
  113 +
  114 +
  115 +def favorite_feed(page: Page, feed_id: str, xsec_token: str) -> ActionResult:
  116 + """Favorite (collect) a note (idempotent: skipped if already collected)."""
  117 + _prepare_page(page, feed_id, xsec_token)
  118 + return _toggle_favorite(page, feed_id, target_collected=True)
  119 +
  120 +
  121 +def unfavorite_feed(page: Page, feed_id: str, xsec_token: str) -> ActionResult:
  122 + """Unfavorite a note (idempotent: skipped if not collected)."""
  123 + _prepare_page(page, feed_id, xsec_token)
  124 + return _toggle_favorite(page, feed_id, target_collected=False)
  125 +
  126 +
  127 +def _toggle_favorite(page: Page, feed_id: str, target_collected: bool) -> ActionResult:
  128 + """Perform the favorite/unfavorite action."""
  129 + action_name = "收藏" if target_collected else "取消收藏"
  130 +
  131 + try:
  132 + _, collected = _get_interact_state(page, feed_id)
  133 + except NoFeedDetailError:
  134 + logger.warning("无法读取互动状态,直接点击")
  135 + collected = not target_collected
  136 +
  137 + # Idempotency check
  138 + if collected == target_collected:
  139 + logger.info("feed %s 已%s,跳过", feed_id, action_name)
  140 + return ActionResult(feed_id=feed_id, success=True, message=f"已{action_name}")
  141 +
  142 + # 点击
  143 + page.click_element(COLLECT_BUTTON)
  144 + time.sleep(3)
  145 +
  146 + # 验证
  147 + try:
  148 + _, collected = _get_interact_state(page, feed_id)
  149 + if collected == target_collected:
  150 + logger.info("feed %s %s成功", feed_id, action_name)
  151 + return ActionResult(feed_id=feed_id, success=True, message=f"{action_name}成功")
  152 + except NoFeedDetailError:
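  153 + return ActionResult(feed_id=feed_id, success=True, message=f"{action_name}已执行") # state unreadable: do not retry, a second click could undo the first
  154 +
  155 + # Retry once (the state was read and is still not the target)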
  156 + logger.warning("feed %s %s可能未成功,重试", feed_id, action_name)
  157 + page.click_element(COLLECT_BUTTON)
  158 + time.sleep(2)
  159 +
  160 + return ActionResult(feed_id=feed_id, success=True, message=f"{action_name}已执行")
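`_toggle_like` and `_toggle_favorite` share one shape: read the state, skip if it already matches the target, click, verify, retry. The sketch below restates that shape without a browser, assuming hypothetical `read_state`/`click` callbacks in place of the CDP calls and a simplified `ActionResult` stand-in (fields taken from the calls above). Unlike the listing, it re-reads the state after the retry and reports failure if nothing changed; that is a design choice of this sketch, not the original behavior.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ActionResult:
    """Simplified stand-in for xhs.types.ActionResult (fields assumed from usage above)."""

    feed_id: str
    success: bool
    message: str


def idempotent_toggle(
    feed_id: str,
    read_state: Callable[[], Optional[bool]],
    click: Callable[[], None],
    target: bool,
    max_clicks: int = 2,
) -> ActionResult:
    """Drive a toggle button to `target`, clicking at most `max_clicks` times.

    read_state returns True/False, or None when the state cannot be read.
    """
    # An unreadable initial state (None) never equals target, so we click.
    if read_state() == target:
        return ActionResult(feed_id, True, "already in target state")
    for _ in range(max_clicks):
        click()
        state = read_state()
        if state == target:
            return ActionResult(feed_id, True, "done")
        if state is None:
            # Unknown state: stop here; another click could undo the first one.
            return ActionResult(feed_id, True, "clicked, unverified")
    return ActionResult(feed_id, False, "state did not change")
```

Returning early when the state is unreadable trades a possibly missed action for never undoing a successful one, which seems the safer failure mode for a like or favorite toggle.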
  1 +"""Login management, mirroring Go xiaohongshu/login.go."""
  2 +
  3 +from __future__ import annotations
  4 +
  5 +import base64
  6 +import logging
  7 +import os
  8 +import tempfile
  9 +import time
  10 +
  11 +from .cdp import Page
  12 +from .selectors import LOGIN_STATUS, QRCODE_IMG
  13 +from .urls import EXPLORE_URL
  14 +
  15 +logger = logging.getLogger(__name__)
  16 +
  17 +
  18 +def check_login_status(page: Page) -> bool:
  19 + """Check the login status.
  20 +
  21 + Returns:
  22 + True if logged in, False otherwise.
  23 + """
  24 + page.navigate(EXPLORE_URL)
  25 + page.wait_for_load()
  26 + time.sleep(1)
  27 +
  28 + return page.has_element(LOGIN_STATUS)
  29 +
  30 +
  31 +def fetch_qrcode(page: Page) -> tuple[str, bool]:
  32 + """Fetch the login QR code.
  33 +
  34 + Returns:
  35 + (qrcode_src, already_logged_in)
  36 + - if already logged in, returns ("", True)
  37 + - otherwise, returns (qrcode_base64_or_url, False)
  38 + """
  39 + page.navigate(EXPLORE_URL)
  40 + page.wait_for_load()
  41 + time.sleep(2)
  42 +
  43 + # Check whether we are already logged in
  44 + if page.has_element(LOGIN_STATUS):
  45 + return "", True
  46 +
  47 + # Read the QR code image src
  48 + src = page.get_element_attribute(QRCODE_IMG, "src")
  49 + if not src:
  50 + raise RuntimeError("二维码图片 src 为空")
  51 +
  52 + return src, False
  53 +
  54 +
  55 +def save_qrcode_to_file(src: str) -> str:
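  56 + """Save the QR code data URL to a temporary PNG file.
  57 +
  58 + Args:
  59 + src: data URL of the QR code image (data:image/png;base64,...). Plain URLs are not supported and raise ValueError.
  60 +
  61 + Returns:
  62 + Absolute path of the saved file.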
  63 + """
  64 + prefix = "data:image/png;base64,"
  65 + if src.startswith(prefix):
  66 + img_data = base64.b64decode(src[len(prefix) :])
  67 + elif src.startswith("data:image/"):
  68 + # Handle other MIME types, e.g. data:image/jpeg;base64,...
  69 + _, encoded = src.split(",", 1)
  70 + img_data = base64.b64decode(encoded)
  71 + else:
  72 + # Not a data URL: cannot be saved
  73 + raise ValueError(f"不支持的二维码格式,需要 data URL: {src[:50]}...")
  74 +
  75 + qr_dir = os.path.join(tempfile.gettempdir(), "xhs")
  76 + os.makedirs(qr_dir, exist_ok=True)
  77 + filepath = os.path.join(qr_dir, "login_qrcode.png")
  78 +
  79 + with open(filepath, "wb") as f:
  80 + f.write(img_data)
  81 +
  82 + logger.info("二维码已保存: %s", filepath)
  83 + return filepath
  84 +
  85 +
  86 +def wait_for_login(page: Page, timeout: float = 120.0) -> bool:
  87 + """Wait for the QR-code scan login to complete.
  88 +
  89 + Args:
  90 + page: CDP page object.
  91 + timeout: timeout in seconds.
  92 +
  93 + Returns:
  94 + True on successful login, False on timeout.
  95 + """
  96 + deadline = time.monotonic() + timeout
  97 + while time.monotonic() < deadline:
  98 + if page.has_element(LOGIN_STATUS):
  99 + logger.info("登录成功")
  100 + return True
  101 + time.sleep(0.5)
  102 + return False
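`save_qrcode_to_file` recognises `data:image/png;base64,` and any other `data:image/...` prefix, but in both branches assumes a base64 payload and always writes a `.png` file. A stricter parser in the spirit of RFC 2397 could look like the sketch below; `decode_image_data_url` is a hypothetical helper, not part of the repository, and it deliberately rejects non-base64 (percent-encoded) data URLs rather than guessing.

```python
import base64
import binascii


def decode_image_data_url(src: str) -> tuple[str, bytes]:
    """Split an image data URL into (mime_type, raw_bytes).

    Only base64-encoded image data URLs are accepted; anything else raises ValueError.
    """
    if not src.startswith("data:"):
        raise ValueError(f"not a data URL: {src[:50]}")
    header, sep, payload = src.partition(",")
    if not sep:
        raise ValueError("malformed data URL: missing ','")
    # header looks like "data:image/png;base64" -> ["image/png", "base64"]
    params = header[len("data:"):].split(";")
    mime = params[0].lower()
    if not mime.startswith("image/"):
        raise ValueError(f"not an image data URL: {mime or '(empty)'}")
    if "base64" not in params[1:]:
        raise ValueError("only base64-encoded data URLs are supported")
    try:
        data = base64.b64decode(payload, validate=True)
    except binascii.Error as exc:
        raise ValueError(f"invalid base64 payload: {exc}") from exc
    return mime, data
```

A caller could then choose the file extension from the returned MIME type instead of always writing `login_qrcode.png`.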
  1 +"""Image-and-text note publishing, mirroring Go xiaohongshu/publish.go (837 lines)."""
  2 +
  3 +from __future__ import annotations
  4 +
  5 +import json
  6 +import logging
  7 +import random
  8 +import time
  9 +
  10 +from .cdp import Page
  11 +from .errors import ContentTooLongError, PublishError, TitleTooLongError, UploadTimeoutError
  12 +from .selectors import (
  13 + CONTENT_EDITOR,
  14 + CONTENT_LENGTH_ERROR,
  15 + CREATOR_TAB,
  16 + DATETIME_INPUT,
  17 + FILE_INPUT,
  18 + IMAGE_PREVIEW,
  19 + ORIGINAL_SWITCH,
  20 + ORIGINAL_SWITCH_CARD,
  21 + POPOVER,
  22 + PUBLISH_BUTTON,
  23 + SCHEDULE_SWITCH,
  24 + TAG_FIRST_ITEM,
  25 + TAG_TOPIC_CONTAINER,
  26 + TITLE_INPUT,
  27 + TITLE_MAX_SUFFIX,
  28 + UPLOAD_CONTENT,
  29 + UPLOAD_INPUT,
  30 + VISIBILITY_DROPDOWN,
  31 + VISIBILITY_OPTIONS,
  32 +)
  33 +from .types import PublishImageContent
  34 +from .urls import PUBLISH_URL
  35 +
  36 +logger = logging.getLogger(__name__)
  37 +
  38 +
  39 +def publish_image_content(page: Page, content: PublishImageContent) -> None:
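  40 + """Publish an image-and-text note.
  41 +
  42 + Args:
  43 + page: CDP page object.
  44 + content: content to publish.
  45 +
  46 + Raises:
  47 + PublishError: publishing failed.
  48 + UploadTimeoutError: an upload timed out.
  49 + TitleTooLongError: the title is too long.
  50 + ContentTooLongError: the body is too long.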
  51 + """
  52 + if not content.image_paths:
  53 + raise PublishError("图片不能为空")
  54 +
  55 + # Navigate to the publish page
  56 + _navigate_to_publish_page(page)
  57 +
  58 + # Click the "上传图文" (upload image-and-text) tab
  59 + _click_publish_tab(page, "上传图文")
  60 + time.sleep(1)
  61 +
  62 + # Upload the images
  63 + _upload_images(page, content.image_paths)
  64 +
  65 + # Cap tags at 10
  66 + tags = content.tags[:10]
  67 + if len(content.tags) > 10:
  68 + logger.warning("标签数量超过10,截取前10个")
  69 +
  70 + logger.info(
  71 + "发布内容: title=%s, images=%d, tags=%d, schedule=%s, original=%s, visibility=%s",
  72 + content.title,
  73 + len(content.image_paths),
  74 + len(tags),
  75 + content.schedule_time,
  76 + content.is_original,
  77 + content.visibility,
  78 + )
  79 +
  80 + # Fill in the form and submit
  81 + _submit_publish(
  82 + page,
  83 + content.title,
  84 + content.content,
  85 + tags,
  86 + content.schedule_time,
  87 + content.is_original,
  88 + content.visibility,
  89 + )
  90 +
  91 +
  92 +# ========== Page navigation ==========
  93 +
  94 +
  95 +def _navigate_to_publish_page(page: Page) -> None:
  96 + """Navigate to the publish page."""
  97 + page.navigate(PUBLISH_URL)
  98 + page.wait_for_load(timeout=300)
  99 + time.sleep(2)
  100 + page.wait_dom_stable()
  101 + time.sleep(1)
  102 +
  103 +
  104 +def _click_publish_tab(page: Page, tab_name: str) -> None:
  105 + """Click a publish-page tab ("上传图文" or "上传视频")."""
  106 + page.wait_for_element(UPLOAD_CONTENT, timeout=15)
  107 +
  108 + deadline = time.monotonic() + 15
  109 + while time.monotonic() < deadline:
  110 + # Find the matching tab
  111 + found = page.evaluate(
  112 + f"""
  113 + (() => {{
  114 + const tabs = document.querySelectorAll({json.dumps(CREATOR_TAB)});
  115 + for (const tab of tabs) {{
  116 + if (tab.textContent.trim() === {json.dumps(tab_name)}) {{
  117 + // Check whether the tab is covered by another element
  118 + const rect = tab.getBoundingClientRect();
  119 + if (rect.width === 0 || rect.height === 0) continue;
  120 + const x = rect.left + rect.width / 2;
  121 + const y = rect.top + rect.height / 2;
  122 + const target = document.elementFromPoint(x, y);
  123 + if (target === tab || tab.contains(target)) {{
  124 + tab.click();
  125 + return 'clicked';
  126 + }}
  127 + return 'blocked';
  128 + }}
  129 + }}
  130 + return 'not_found';
  131 + }})()
  132 + """
  133 + )
  134 +
  135 + if found == "clicked":
  136 + return
  137 +
  138 + if found == "blocked":
  139 + # Try to remove the covering popover
  140 + _remove_pop_cover(page)
  141 +
  142 + time.sleep(0.2)
  143 +
  144 + raise PublishError(f"没有找到发布 TAB - {tab_name}")
  145 +
  146 +
  147 +def _remove_pop_cover(page: Page) -> None:
  148 + """Remove a covering popover."""
  149 + if page.has_element(POPOVER):
  150 + page.remove_element(POPOVER)
  151 + # Click an empty spot on the page
  152 + x = 380 + random.randint(0, 100)
  153 + y = 20 + random.randint(0, 60)
  154 + page.mouse_click(float(x), float(y))
  155 +
  156 +
  157 +# ========== Image upload ==========
  158 +
  159 +
  160 +def _upload_images(page: Page, image_paths: list[str]) -> None:
  161 + """Upload images one at a time (paths that do not exist are skipped)."""
  162 + import os
  163 +
  164 + valid_paths = [p for p in image_paths if os.path.exists(p)]
  165 + if not valid_paths:
  166 + raise PublishError("没有有效的图片文件")
  167 +
  168 + for i, path in enumerate(valid_paths):
  169 + selector = UPLOAD_INPUT if i == 0 else FILE_INPUT
  170 + logger.info("上传第 %d 张图片: %s", i + 1, path)
  171 +
  172 + page.set_file_input(selector, [path])
  173 + _wait_for_upload_complete(page, i + 1)
  174 + time.sleep(1)
  175 +
  176 +
  177 +def _wait_for_upload_complete(page: Page, expected_count: int) -> None:
  178 + """Wait until at least expected_count image previews are shown."""
  179 + max_wait = 60.0
  180 + start = time.monotonic()
  181 +
  182 + while time.monotonic() - start < max_wait:
  183 + count = page.get_elements_count(IMAGE_PREVIEW)
  184 + if count >= expected_count:
  185 + logger.info("图片上传完成: %d", count)
  186 + return
  187 + time.sleep(0.5)
  188 +
  189 + raise UploadTimeoutError(f"第{expected_count}张图片上传超时(60s)")
  190 +
  191 +
  192 +# ========== Form submission ==========
  193 +
  194 +
  195 +def _submit_publish(
  196 + page: Page,
  197 + title: str,
  198 + content: str,
  199 + tags: list[str],
  200 + schedule_time: str | None,
  201 + is_original: bool,
  202 + visibility: str,
  203 +) -> None:
  204 + """Fill in the form and submit it."""
  205 + # Title
  206 + page.input_text(TITLE_INPUT, title)
  207 + time.sleep(0.5)
  208 + _check_title_max_length(page)
  209 + logger.info("标题长度检查通过")
  210 + time.sleep(1)
  211 +
  212 + # Body
  213 + content_selector = _find_content_element(page)
  214 + page.input_content_editable(content_selector, content)
  215 +
  216 + # Click back on the title input (improves stability)
  217 + time.sleep(1)
  218 + page.click_element(TITLE_INPUT)
  219 + logger.info("已回点标题输入框")
  220 +
  221 + # Tags
  222 + if tags:
  223 + _input_tags(page, content_selector, tags)
  224 + time.sleep(1)
  225 + _check_content_max_length(page)
  226 + logger.info("正文长度检查通过")
  227 +
  228 + # Scheduled publishing
  229 + if schedule_time:
  230 + _set_schedule_publish(page, schedule_time)
  231 +
  232 + # Visibility
  233 + _set_visibility(page, visibility)
  234 +
  235 + # Original-content declaration
  236 + if is_original:
  237 + try:
  238 + _set_original(page)
  239 + logger.info("已声明原创")
  240 + except Exception as e:
  241 + logger.warning("设置原创声明失败: %s", e)
  242 +
  243 + # Click publish
  244 + page.click_element(PUBLISH_BUTTON)
  245 + time.sleep(3)
  246 + logger.info("发布完成")
  247 +
  248 +
  249 +def _find_content_element(page: Page) -> str:
  250 + """Find the body input element (supports both editor UIs)."""
  251 + if page.has_element(CONTENT_EDITOR):
  252 + return CONTENT_EDITOR
  253 +
  254 + # Fallback: a p element whose data-placeholder contains "输入正文描述", with a role="textbox" ancestor within five levels
  255 + found = page.evaluate(
  256 + """
  257 + (() => {
  258 + const ps = document.querySelectorAll('p');
  259 + for (const p of ps) {
  260 + const placeholder = p.getAttribute('data-placeholder');
  261 + if (placeholder && placeholder.includes('输入正文描述')) {
  262 + let current = p;
  263 + for (let i = 0; i < 5; i++) {
  264 + current = current.parentElement;
  265 + if (!current) break;
  266 + if (current.getAttribute('role') === 'textbox') {
  267 + return 'found';
  268 + }
  269 + }
  270 + }
  271 + }
  272 + return '';
  273 + })()
  274 + """
  275 + )
  276 + if found == "found":
  277 + return "[role='textbox']"
  278 +
  279 + raise PublishError("没有找到内容输入框")
  280 +
  281 +
  282 +def _check_title_max_length(page: Page) -> None:
  283 + """Check whether the title exceeds the length limit."""
  284 + text = page.get_element_text(TITLE_MAX_SUFFIX)
  285 + if text:
  286 + parts = text.split("/")
  287 + if len(parts) == 2:
  288 + raise TitleTooLongError(parts[0], parts[1])
  289 + raise TitleTooLongError(text, "?")
  290 +
  291 +
  292 +def _check_content_max_length(page: Page) -> None:
  293 + """Check whether the body exceeds the length limit."""
  294 + text = page.get_element_text(CONTENT_LENGTH_ERROR)
  295 + if text:
  296 + parts = text.split("/")
  297 + if len(parts) == 2:
  298 + raise ContentTooLongError(parts[0], parts[1])
  299 + raise ContentTooLongError(text, "?")
  300 +
  301 +
  302 +# ========== Tag input ==========
  303 +
  304 +
  305 +def _input_tags(page: Page, content_selector: str, tags: list[str]) -> None:
  306 + """Type the tags at the end of the body."""
  307 + time.sleep(1)
  308 +
  309 + # Move the cursor to the end of the body (20 ArrowDown presses)
  310 + for _ in range(20):
  311 + page.press_key("ArrowDown")
  312 + time.sleep(0.01)
  313 +
  314 + # Press Enter twice to start a new paragraph
  315 + page.press_key("Enter")
  316 + page.press_key("Enter")
  317 + time.sleep(1)
  318 +
  319 + for tag in tags:
  320 + tag = tag.lstrip("#")
  321 + _input_single_tag(page, content_selector, tag)
  322 +
  323 +
  324 +def _input_single_tag(page: Page, content_selector: str, tag: str) -> None:
  325 + """Type a single tag."""
  326 + # Type the leading #
  327 + page.type_text("#", delay_ms=0)
  328 + time.sleep(0.2)
  329 +
  330 + # Type the tag one character at a time
  331 + for char in tag:
  332 + page.type_text(char, delay_ms=50)
  333 +
  334 + time.sleep(1)
  335 +
  336 + # Try clicking the first tag suggestion
  337 + if page.has_element(TAG_TOPIC_CONTAINER):
  338 + item_selector = f"{TAG_TOPIC_CONTAINER} {TAG_FIRST_ITEM}"
  339 + if page.has_element(item_selector):
  340 + page.click_element(item_selector)
  341 + logger.info("点击标签联想: %s", tag)
  342 + time.sleep(0.5)
  343 + return
  344 +
  345 + # No suggestion: terminate the tag with a space
  346 + logger.warning("未找到标签联想,直接输入空格: %s", tag)
  347 + page.type_text(" ", delay_ms=0)
  348 + time.sleep(0.5)
  349 +
  350 +
  351 +# ========== Scheduled publishing ==========
  352 +
  353 +
  354 +def _set_schedule_publish(page: Page, schedule_time: str) -> None:
  355 + """Enable scheduled publishing at the given time."""
  356 + from datetime import datetime
  357 +
  358 + # Parse the ISO 8601 time
  359 + try:
  360 + dt = datetime.fromisoformat(schedule_time)
  361 + except ValueError as e:
  362 + raise PublishError(f"定时发布时间格式错误: {e}") from e
  363 +
  364 + # Click the scheduled-publishing switch
  365 + page.click_element(SCHEDULE_SWITCH)
  366 + time.sleep(0.8)
  367 +
  368 + # Set the date and time
  369 + datetime_str = dt.strftime("%Y-%m-%d %H:%M")
  370 + page.select_all_text(DATETIME_INPUT)
  371 + page.input_text(DATETIME_INPUT, datetime_str)
  372 + time.sleep(0.5)
  373 +
  374 + logger.info("已设置定时发布: %s", datetime_str)
  375 +
  376 +
  377 +# ========== Visibility ==========
  378 +
  379 +
  380 +def _set_visibility(page: Page, visibility: str) -> None:
  381 + """Set the visibility scope."""
  382 + if not visibility or visibility == "公开可见":
  383 + logger.info("可见范围: 公开可见(默认)")
  384 + return
  385 +
  386 + supported = {"仅自己可见", "仅互关好友可见"}
  387 + if visibility not in supported:
  388 + raise PublishError(
  389 + f"不支持的可见范围: {visibility},支持: 公开可见、仅自己可见、仅互关好友可见"
  390 + )
  391 +
  392 + # Open the dropdown
  393 + page.click_element(VISIBILITY_DROPDOWN)
  394 + time.sleep(0.5)
  395 +
  396 + # Find and click the target option
  397 + clicked = page.evaluate(
  398 + f"""
  399 + (() => {{
  400 + const opts = document.querySelectorAll({json.dumps(VISIBILITY_OPTIONS)});
  401 + for (const opt of opts) {{
  402 + if (opt.textContent.includes({json.dumps(visibility)})) {{
  403 + opt.click();
  404 + return true;
  405 + }}
  406 + }}
  407 + return false;
  408 + }})()
  409 + """
  410 + )
  411 +
  412 + if not clicked:
  413 + raise PublishError(f"未找到可见范围选项: {visibility}")
  414 +
  415 + logger.info("已设置可见范围: %s", visibility)
  416 + time.sleep(0.2)
  417 +
  418 +
  419 +# ========== Original-content declaration ==========
  420 +
  421 +
  422 +def _set_original(page: Page) -> None:
  423 + """Enable the original-content declaration."""
  424 + # Find the declaration card and click its switch
  425 + result = page.evaluate(
  426 + f"""
  427 + (() => {{
  428 + const cards = document.querySelectorAll({json.dumps(ORIGINAL_SWITCH_CARD)});
  429 + for (const card of cards) {{
  430 + if (!card.textContent.includes('原创声明')) continue;
  431 + const sw = card.querySelector({json.dumps(ORIGINAL_SWITCH)});
  432 + if (!sw) continue;
  433 + const input = sw.querySelector('input[type="checkbox"]');
  434 + if (input && input.checked) return 'already_on';
  435 + sw.click();
  436 + return 'clicked';
  437 + }}
  438 + return 'not_found';
  439 + }})()
  440 + """
  441 + )
  442 +
  443 + if result == "already_on":
  444 + logger.info("原创声明已开启")
  445 + return
  446 +
  447 + if result == "not_found":
  448 + raise PublishError("未找到原创声明选项")
  449 +
  450 + time.sleep(0.5)
  451 +
  452 + # Handle the confirmation dialog
  453 + _confirm_original_declaration(page)
  454 +
  455 +
  456 +def _confirm_original_declaration(page: Page) -> None:
  457 + """Handle the original-content declaration confirmation dialog."""
  458 + time.sleep(0.8)
  459 +
  460 + # 勾选 checkbox
  461 + page.evaluate(
  462 + """
  463 + (() => {
  464 + const footers = document.querySelectorAll('div.footer');
  465 + for (const footer of footers) {
  466 + if (!footer.textContent.includes('原创声明须知')) continue;
  467 + const cb = footer.querySelector('div.d-checkbox input[type="checkbox"]');
  468 + if (cb && !cb.checked) cb.click();
  469 + return;
  470 + }
  471 + })()
  472 + """
  473 + )
  474 + time.sleep(0.5)
  475 +
  476 + # 点击声明原创按钮
  477 + result = page.evaluate(
  478 + """
  479 + (() => {
  480 + const footers = document.querySelectorAll('div.footer');
  481 + for (const footer of footers) {
  482 + if (!footer.textContent.includes('声明原创')) continue;
  483 + const btn = footer.querySelector('button.custom-button');
  484 + if (btn) {
  485 + if (btn.classList.contains('disabled') || btn.disabled) {
  486 + const cb = footer.querySelector('div.d-checkbox input[type="checkbox"]');
  487 + if (cb && !cb.checked) cb.click();
  488 + return 'button_disabled';
  489 + }
  490 + btn.click();
  491 + return 'clicked';
  492 + }
  493 + }
  494 + return 'button_not_found';
  495 + })()
  496 + """
  497 + )
  498 +
  499 + if result == "button_not_found":
  500 + raise PublishError("未找到声明原创按钮")
  501 + if result == "button_disabled":
  502 + raise PublishError("声明原创按钮仍处于禁用状态")
  503 +
  504 + logger.info("已成功点击声明原创按钮")
  505 + time.sleep(0.3)
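`publish_image_content` caps tags at 10 and `_input_tags` strips leading `#`, while `_check_title_max_length` and `_check_content_max_length` split the counter text on `/`. A small sketch of stricter helpers for both steps, assuming (as the `split("/")` implies) that the counter reads `current/max`. The names `normalize_tags` and `parse_length_counter` are hypothetical, and dropping duplicate or empty tags is an added choice the listing does not make.

```python
import re
from typing import List, Optional, Tuple

MAX_TAGS = 10  # same cap as publish_image_content above

# Matches counters like "25/20" or " 25 / 20 " and captures both numbers.
_COUNTER_RE = re.compile(r"^\s*(\d+)\s*/\s*(\d+)\s*$")


def normalize_tags(tags: List[str], limit: int = MAX_TAGS) -> List[str]:
    """Strip whitespace and leading '#', drop empty and duplicate tags, keep at most `limit`."""
    seen = set()
    result: List[str] = []
    for raw in tags:
        tag = raw.strip().lstrip("#").strip()
        if not tag or tag in seen:
            continue
        seen.add(tag)
        result.append(tag)
        if len(result) == limit:
            break
    return result


def parse_length_counter(text: str) -> Optional[Tuple[int, int]]:
    """Parse a 'current/max' counter such as '25/20'; return None if it does not match."""
    match = _COUNTER_RE.match(text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

Returning integers instead of the raw string halves lets the caller compare the two values and build a clearer error than the `"?"` fallback used above.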
  1 +"""Video publishing, mirroring Go xiaohongshu/publish_video.go."""
  2 +
  3 +from __future__ import annotations
  4 +
  5 +import logging
  6 +import os
  7 +import time
  8 +
  9 +from .cdp import Page
  10 +from .errors import PublishError, UploadTimeoutError
  11 +from .publish import (
  12 + _click_publish_tab,
  13 + _find_content_element,
  14 + _input_tags,
  15 + _navigate_to_publish_page,
  16 + _set_schedule_publish,
  17 + _set_visibility,
  18 +)
  19 +from .selectors import (
  20 + FILE_INPUT,
  21 + PUBLISH_BUTTON,
  22 + TITLE_INPUT,
  23 + UPLOAD_INPUT,
  24 +)
  25 +from .types import PublishVideoContent
  26 +
  27 +logger = logging.getLogger(__name__)
  28 +
  29 +
  30 +def publish_video_content(page: Page, content: PublishVideoContent) -> None:
  31 + """Publish a video note.
  32 +
  33 + Args:
  34 + page: CDP page object.
  35 + content: video content to publish.
  36 +
  37 + Raises:
  38 + PublishError: publishing failed.
  39 + UploadTimeoutError: upload or processing timed out.
  40 + """
  41 + if not content.video_path:
  42 + raise PublishError("视频不能为空")
  43 +
  44 + # Navigate to the publish page
  45 + _navigate_to_publish_page(page)
  46 +
  47 + # Click the "上传视频" (upload video) tab
  48 + _click_publish_tab(page, "上传视频")
  49 + time.sleep(1)
  50 +
  51 + # Upload the video
  52 + _upload_video(page, content.video_path)
  53 +
  54 + # Fill in the form and submit
  55 + _submit_publish_video(
  56 + page,
  57 + content.title,
  58 + content.content,
  59 + content.tags,
  60 + content.schedule_time,
  61 + content.visibility,
  62 + )
  63 +
  64 +
  65 +def _upload_video(page: Page, video_path: str) -> None:
  66 + """Upload the video file."""
  67 + if not os.path.exists(video_path):
  68 + raise PublishError(f"视频文件不存在: {video_path}")
  69 +
  70 + # Locate the upload input
  71 + selector = UPLOAD_INPUT if page.has_element(UPLOAD_INPUT) else FILE_INPUT
  72 + page.set_file_input(selector, [video_path])
  73 +
  74 + # Wait until the publish button is clickable (video processing finished)
  75 + _wait_for_publish_button_clickable(page)
  76 + logger.info("视频上传/处理完成")
  77 +
  78 +
  79 +def _wait_for_publish_button_clickable(page: Page) -> None:
  80 + """Wait until the publish button is clickable (video processing can take a long time)."""
  81 + max_wait = 600.0 # 10 minutes
  82 + start = time.monotonic()
  83 +
  84 + logger.info("开始等待发布按钮可点击(视频)")
  85 +
  86 + while time.monotonic() - start < max_wait:
  87 + clickable = page.evaluate(
  88 + f"""
  89 + (() => {{
  90 + const btn = document.querySelector({_js_str(PUBLISH_BUTTON)});
  91 + if (!btn) return false;
  92 + const rect = btn.getBoundingClientRect();
  93 + if (rect.width === 0 || rect.height === 0) return false;
  94 + if (btn.disabled) return false;
  95 + if (btn.classList.contains('disabled')) return false;
  96 + return true;
  97 + }})()
  98 + """
  99 + )
  100 + if clickable:
  101 + return
  102 + time.sleep(1)
  103 +
  104 + raise UploadTimeoutError("等待发布按钮可点击超时(10分钟)")
  105 +
  106 +
  107 +def _submit_publish_video(
  108 + page: Page,
  109 + title: str,
  110 + content: str,
  111 + tags: list[str],
  112 + schedule_time: str | None,
  113 + visibility: str,
  114 +) -> None:
  115 + """Fill in the video form and submit it."""
  116 + # Title
  117 + page.input_text(TITLE_INPUT, title)
  118 + time.sleep(1)
  119 +
  120 + # Body + tags
  121 + content_selector = _find_content_element(page)
  122 + page.input_content_editable(content_selector, content)
  123 +
  124 + # Click back on the title input
  125 + time.sleep(1)
  126 + page.click_element(TITLE_INPUT)
  127 +
  128 + if tags:
  129 + _input_tags(page, content_selector, tags)
  130 + time.sleep(1)
  131 +
  132 + # Scheduled publishing
  133 + if schedule_time:
  134 + _set_schedule_publish(page, schedule_time)
  135 +
  136 + # Visibility
  137 + _set_visibility(page, visibility)
  138 +
  139 + # Wait until the publish button is clickable
  140 + _wait_for_publish_button_clickable(page)
  141 +
  142 + # Click publish
  143 + page.click_element(PUBLISH_BUTTON)
  144 + time.sleep(3)
  145 + logger.info("视频发布完成")
  146 +
  147 +
  148 +def _js_str(s: str) -> str:
  149 + """Convert a Python string to a JS string literal (via JSON encoding)."""
  150 + import json
  151 +
  152 + return json.dumps(s)
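`wait_for_login`, `_wait_for_upload_complete`, `_wait_for_publish_button_clickable` and `_wait_for_initial_state` all repeat the same deadline-based polling loop on `time.monotonic()`. A minimal sketch of a shared helper; `wait_until` is a hypothetical name, and the injectable `clock`/`sleep` parameters are an addition so the loop can be tested without real waiting.

```python
import time
from typing import Callable


def wait_until(
    predicate: Callable[[], bool],
    timeout: float,
    interval: float = 0.5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `predicate` every `interval` seconds until it is truthy or `timeout` elapses.

    Returns True as soon as the predicate succeeds, False on timeout.
    """
    deadline = clock() + timeout
    while clock() < deadline:
        if predicate():
            return True
        sleep(interval)
    return False
```

For example, `_wait_for_publish_button_clickable` could become `if not wait_until(lambda: page.evaluate(js), 600.0, 1.0): raise UploadTimeoutError(...)`; this is an illustrative refactor, not code from the repository.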
  1 +"""Feed search, mirroring Go xiaohongshu/search.go."""
  2 +
  3 +from __future__ import annotations
  4 +
  5 +import json
  6 +import logging
  7 +import time
  8 +
  9 +from .cdp import Page
  10 +from .errors import NoFeedsError
  11 +from .selectors import FILTER_BUTTON, FILTER_PANEL
  12 +from .types import Feed, FilterOption
  13 +from .urls import make_search_url
  14 +
  15 +logger = logging.getLogger(__name__)
  16 +
  17 +# Filter option table: {filter group index: [(tag index, label), ...]} (indices are 1-based)
  18 +_FILTER_OPTIONS: dict[int, list[tuple[int, str]]] = {
  19 + 1: [(1, "综合"), (2, "最新"), (3, "最多点赞"), (4, "最多评论"), (5, "最多收藏")],
  20 + 2: [(1, "不限"), (2, "视频"), (3, "图文")],
  21 + 3: [(1, "不限"), (2, "一天内"), (3, "一周内"), (4, "半年内")],
  22 + 4: [(1, "不限"), (2, "已看过"), (3, "未看过"), (4, "已关注")],
  23 + 5: [(1, "不限"), (2, "同城"), (3, "附近")],
  24 +}
  25 +
  26 +# JS that extracts search results from __INITIAL_STATE__
  27 +_EXTRACT_SEARCH_JS = """
  28 +(() => {
  29 + if (window.__INITIAL_STATE__ &&
  30 + window.__INITIAL_STATE__.search &&
  31 + window.__INITIAL_STATE__.search.feeds) {
  32 + const feeds = window.__INITIAL_STATE__.search.feeds;
  33 + const feedsData = feeds.value !== undefined ? feeds.value : feeds._value;
  34 + if (feedsData) {
  35 + return JSON.stringify(feedsData);
  36 + }
  37 + }
  38 + return "";
  39 +})()
  40 +"""
  41 +
  42 +
  43 +def _find_internal_option(group_index: int, text: str) -> tuple[int, int]:
  44 + """Look up the internal indices of a filter option.
  45 +
  46 + Returns:
  47 + (filters_index, tags_index)
  48 +
  49 + Raises:
  50 + ValueError: no matching option was found.
  51 + """
  52 + options = _FILTER_OPTIONS.get(group_index)
  53 + if not options:
  54 + raise ValueError(f"筛选组 {group_index} 不存在")
  55 +
  56 + for tags_index, option_text in options:
  57 + if option_text == text:
  58 + return group_index, tags_index
  59 +
  60 + valid = [t for _, t in options]
  61 + raise ValueError(f"在筛选组 {group_index} 中未找到 '{text}',有效值: {valid}")
  62 +
  63 +
  64 +def _convert_filters(filter_opt: FilterOption) -> list[tuple[int, int]]:
  65 + """Convert a FilterOption into a list of internal (filters_index, tags_index) pairs."""
  66 + result: list[tuple[int, int]] = []
  67 +
  68 + if filter_opt.sort_by:
  69 + result.append(_find_internal_option(1, filter_opt.sort_by))
  70 + if filter_opt.note_type:
  71 + result.append(_find_internal_option(2, filter_opt.note_type))
  72 + if filter_opt.publish_time:
  73 + result.append(_find_internal_option(3, filter_opt.publish_time))
  74 + if filter_opt.search_scope:
  75 + result.append(_find_internal_option(4, filter_opt.search_scope))
  76 + if filter_opt.location:
  77 + result.append(_find_internal_option(5, filter_opt.location))
  78 +
  79 + return result
  80 +
  81 +
  82 +def search_feeds(
  83 + page: Page,
  84 + keyword: str,
  85 + filter_option: FilterOption | None = None,
  86 +) -> list[Feed]:
  87 + """Search feeds.
  88 +
  89 + Args:
  90 + page: CDP page object.
  91 + keyword: search keyword.
  92 + filter_option: optional filter criteria.
  93 +
  94 + Raises:
  95 + NoFeedsError: no search results were captured.
  96 + ValueError: a filter option is invalid.
  97 + """
  98 + search_url = make_search_url(keyword)
  99 + page.navigate(search_url)
  100 + page.wait_for_load()
  101 + page.wait_dom_stable()
  102 +
  103 + # Wait for __INITIAL_STATE__ to be initialized
  104 + _wait_for_initial_state(page)
  105 +
  106 + # Apply the filters
  107 + if filter_option:
  108 + internal_filters = _convert_filters(filter_option)
  109 + if internal_filters:
  110 + _apply_filters(page, internal_filters)
  111 +
  112 + # Extract the search results
  113 + result = page.evaluate(_EXTRACT_SEARCH_JS)
  114 + if not result:
  115 + raise NoFeedsError()
  116 +
  117 + feeds_data = json.loads(result)
  118 + return [Feed.from_dict(f) for f in feeds_data]
  119 +
  120 +
  121 +def _wait_for_initial_state(page: Page, timeout: float = 10.0) -> None:
  122 + """Wait until __INITIAL_STATE__ is defined (logs a warning on timeout)."""
  123 + deadline = time.monotonic() + timeout
  124 + while time.monotonic() < deadline:
  125 + ready = page.evaluate("window.__INITIAL_STATE__ !== undefined")
  126 + if ready:
  127 + return
  128 + time.sleep(0.5)
  129 + logger.warning("等待 __INITIAL_STATE__ 超时")
  130 +
  131 +
  132 +def _apply_filters(page: Page, filters: list[tuple[int, int]]) -> None:
  133 + """Apply the filter options."""
  134 + # Hover over the filter button
  135 + page.hover_element(FILTER_BUTTON)
  136 +
  137 + # Wait for the filter panel to appear
  138 + deadline = time.monotonic() + 5.0
  139 + while time.monotonic() < deadline:
  140 + if page.has_element(FILTER_PANEL):
  141 + break
  142 + time.sleep(0.3)
  143 +
  144 + # Click each filter option
  145 + for filters_index, tags_index in filters:
  146 + selector = (
  147 + f"{FILTER_PANEL} div.filters:nth-child({filters_index}) "
  148 + f"div.tags:nth-child({tags_index})"
  149 + )
  150 + page.click_element(selector)
  151 + time.sleep(0.3)
  152 +
  153 + # Wait for the page to update
  154 + page.wait_dom_stable()
  155 + _wait_for_initial_state(page)
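`_convert_filters` and `_apply_filters` turn human-readable filter labels into `:nth-child()` positions. Because the indices in `_FILTER_OPTIONS` are simply 1..n in list order, a sketch can derive them from the label's position. This assumes, as the original table implies, that each option's position among its siblings matches that order; `:nth-child()` counts every sibling element, so an extra element in the panel would shift the result. `FILTER_GROUPS` and `filter_selectors` are hypothetical names; the field names mirror the `FilterOption` attributes used above.

```python
from typing import Dict, List, Tuple

# field name -> (1-based filter group index, option labels in on-page order),
# copied from _FILTER_OPTIONS above; the labels must match the page text exactly.
FILTER_GROUPS: Dict[str, Tuple[int, List[str]]] = {
    "sort_by": (1, ["综合", "最新", "最多点赞", "最多评论", "最多收藏"]),
    "note_type": (2, ["不限", "视频", "图文"]),
    "publish_time": (3, ["不限", "一天内", "一周内", "半年内"]),
    "search_scope": (4, ["不限", "已看过", "未看过", "已关注"]),
    "location": (5, ["不限", "同城", "附近"]),
}


def filter_selectors(options: Dict[str, str], panel: str = "div.filter-panel") -> List[str]:
    """Map {field: label} to the CSS selectors to click, in the order given."""
    selectors: List[str] = []
    for field, label in options.items():
        if field not in FILTER_GROUPS:
            raise ValueError(f"unknown filter field: {field}")
        group_index, labels = FILTER_GROUPS[field]
        if label not in labels:
            raise ValueError(f"invalid value {label!r} for {field}; expected one of {labels}")
        tag_index = labels.index(label) + 1  # :nth-child() is 1-based
        selectors.append(
            f"{panel} div.filters:nth-child({group_index}) div.tags:nth-child({tag_index})"
        )
    return selectors
```

Validating every label before touching the page lets a bad option fail fast, instead of after navigation as in `search_feeds`.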
  1 +"""CSS selector constants for Xiaohongshu pages."""
  2 +
  3 +# ========== Login ==========
  4 +LOGIN_STATUS = ".main-container .user .link-wrapper .channel"
  5 +QRCODE_IMG = ".login-container .qrcode-img"
  6 +
  7 +# ========== Home / search ==========
  8 +FILTER_BUTTON = "div.filter"
  9 +FILTER_PANEL = "div.filter-panel"
  10 +
  11 +# ========== Feed detail ==========
  12 +COMMENTS_CONTAINER = ".comments-container"
  13 +PARENT_COMMENT = ".parent-comment"
  14 +NO_COMMENTS_TEXT = ".no-comments-text"
  15 +END_CONTAINER = ".end-container"
  16 +TOTAL_COMMENT = ".comments-container .total"
  17 +SHOW_MORE_BUTTON = ".show-more"
  18 +NOTE_SCROLLER = ".note-scroller"
  19 +INTERACTION_CONTAINER = ".interaction-container"
  20 +
  21 +# Containers shown when a page is inaccessible
  22 +ACCESS_ERROR_WRAPPER = ".access-wrapper, .error-wrapper, .not-found-wrapper, .blocked-wrapper"
  23 +
  24 +# ========== Comment input ==========
  25 +COMMENT_INPUT_TRIGGER = "div.input-box div.content-edit span"
  26 +COMMENT_INPUT_FIELD = "div.input-box div.content-edit p.content-input"
  27 +COMMENT_SUBMIT_BUTTON = "div.bottom button.submit"
  28 +REPLY_BUTTON = ".right .interactions .reply"
  29 +
  30 +# ========== Like / favorite ==========
  31 +LIKE_BUTTON = ".interact-container .left .like-lottie"
  32 +COLLECT_BUTTON = ".interact-container .left .reds-icon.collect-icon"
  33 +
  34 +# ========== Publish page ==========
  35 +UPLOAD_CONTENT = "div.upload-content"
  36 +CREATOR_TAB = "div.creator-tab"
  37 +UPLOAD_INPUT = ".upload-input"
  38 +FILE_INPUT = 'input[type="file"]'
  39 +TITLE_INPUT = "div.d-input input"
  40 +CONTENT_EDITOR = "div.ql-editor"
  41 +IMAGE_PREVIEW = ".img-preview-area .pr"
  42 +PUBLISH_BUTTON = ".publish-page-publish-btn button.bg-red"
  43 +
  44 +# Title/body length validation
  45 +TITLE_MAX_SUFFIX = "div.title-container div.max_suffix"
  46 +CONTENT_LENGTH_ERROR = "div.edit-container div.length-error"
  47 +
  48 +# Visibility scope
  49 +VISIBILITY_DROPDOWN = "div.permission-card-wrapper div.d-select-content"
  50 +VISIBILITY_OPTIONS = "div.d-options-wrapper div.d-grid-item div.custom-option"
  51 +
  52 +# Scheduled publishing
  53 +SCHEDULE_SWITCH = ".post-time-wrapper .d-switch"
  54 +DATETIME_INPUT = ".date-picker-container input"
  55 +
  56 +# Original-content declaration
  57 +ORIGINAL_SWITCH_CARD = "div.custom-switch-card"
  58 +ORIGINAL_SWITCH = "div.d-switch"
  59 +
  60 +# Tag (topic) suggestions
  61 +TAG_TOPIC_CONTAINER = "#creator-editor-topic-container"
  62 +TAG_FIRST_ITEM = ".item"
  63 +
  64 +# Popover
  65 +POPOVER = "div.d-popover"
  66 +
  67 +# ========== User profile ==========
  68 +SIDEBAR_PROFILE = "div.main-container li.user.side-bar-component a.link-wrapper span.channel"
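These constants are plain CSS selectors. `ACCESS_ERROR_WRAPPER` is a selector list, so a single `document.querySelector` call matches whichever of the four wrappers is present. Because `cdp.py` is not shown here, the following is only a hypothetical sketch of how a presence check could be sent over CDP with `Runtime.evaluate`. It uses `json.dumps` to turn a selector into a safe JavaScript string literal, which matters for selectors such as `FILE_INPUT` that contain double quotes. The function names are assumptions.

```python
"""Hypothetical sketch: turn a selector constant into a CDP Runtime.evaluate request."""

import json

FILE_INPUT = 'input[type="file"]'
ACCESS_ERROR_WRAPPER = ".access-wrapper, .error-wrapper, .not-found-wrapper, .blocked-wrapper"


def has_element_expression(selector: str) -> str:
    """JS expression that is true when at least one element matches `selector`."""
    # json.dumps escapes quotes and backslashes, yielding a valid JS string literal.
    return f"document.querySelector({json.dumps(selector)}) !== null"


def evaluate_request(msg_id: int, expression: str) -> dict:
    """CDP message asking the page to evaluate `expression` and return its value."""
    return {
        "id": msg_id,
        "method": "Runtime.evaluate",
        "params": {"expression": expression, "returnByValue": True},
    }
```

The dict would be serialized with `json.dumps` and sent over the page's DevTools WebSocket. With `returnByValue`, the boolean comes back in the response under `result.result.value`.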
scripts/xhs/stealth.py
  1 +"""Anti-detection JS injection and Chrome launch flags, the counterpart of go-rod/stealth."""
  2 +
  3 +# Anti-detection script, injected when each page loads
  4 +STEALTH_JS = """
  5 +(() => {
  6 + // 1. navigator.webdriver
  7 + Object.defineProperty(navigator, 'webdriver', {
  8 + get: () => undefined,
  9 + configurable: true,
  10 + });
  11 +
  12 + // 2. chrome.runtime
  13 + if (!window.chrome) {
  14 + window.chrome = {};
  15 + }
  16 + if (!window.chrome.runtime) {
  17 + window.chrome.runtime = {
  18 + connect: () => {},
  19 + sendMessage: () => {},
  20 + };
  21 + }
  22 +
  23 + // 3. plugins
  24 + Object.defineProperty(navigator, 'plugins', {
  25 + get: () => {
  26 + return [
  27 + {
  28 + 0: {type: 'application/x-google-chrome-pdf'},
  29 + description: 'Portable Document Format',
  30 + filename: 'internal-pdf-viewer',
  31 + length: 1,
  32 + name: 'Chrome PDF Plugin',
  33 + },
  34 + {
  35 + 0: {type: 'application/pdf'},
  36 + description: '',
  37 + filename: 'mhjfbmdgcfjbbpaeojofohoefgiehjai',
  38 + length: 1,
  39 + name: 'Chrome PDF Viewer',
  40 + },
  41 + {
  42 + 0: {type: 'application/x-nacl'},
  43 + description: '',
  44 + filename: 'internal-nacl-plugin',
  45 + length: 1,
  46 + name: 'Native Client',
  47 + },
  48 + ];
  49 + },
  50 + configurable: true,
  51 + });
  52 +
  53 + // 4. languages
  54 + Object.defineProperty(navigator, 'languages', {
  55 + get: () => ['zh-CN', 'zh', 'en-US', 'en'],
  56 + configurable: true,
  57 + });
  58 +
  59 + // 5. permissions
  60 + const originalQuery = window.navigator.permissions?.query;
  61 + if (originalQuery) {
  62 + window.navigator.permissions.query = (parameters) =>
  63 + parameters.name === 'notifications'
  64 + ? Promise.resolve({ state: Notification.permission })
  65 + : originalQuery(parameters);
  66 + }
  67 +
  68 + // 6. WebGL vendor/renderer
  69 + const getParameter = WebGLRenderingContext.prototype.getParameter;
  70 + WebGLRenderingContext.prototype.getParameter = function(parameter) {
  71 + if (parameter === 37445) return 'Intel Inc.'; // UNMASKED_VENDOR_WEBGL
  72 + if (parameter === 37446) return 'Intel Iris OpenGL Engine'; // UNMASKED_RENDERER_WEBGL
  73 + return getParameter.call(this, parameter);
  74 + };
  75 +})();
  76 +"""
  77 +
  78 +# Chrome launch flags (anti-detection)
  79 +STEALTH_ARGS = [
  80 + "--disable-blink-features=AutomationControlled",
  81 + "--disable-infobars",
  82 + "--no-first-run",
  83 + "--no-default-browser-check",
  84 + "--disable-background-timer-throttling",
  85 + "--disable-backgrounding-occluded-windows",
  86 + "--disable-renderer-backgrounding",
  87 + "--disable-component-update",
  88 +]
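`STEALTH_JS` only helps if it runs before the site's own scripts. Here is a minimal sketch. It assumes the launcher passes `STEALTH_ARGS` on the Chrome command line and injects the script with the CDP method `Page.addScriptToEvaluateOnNewDocument`, which evaluates a script in every new document before that document's scripts run. The function names, port, and profile path are hypothetical, and `chrome_launcher.py` and `cdp.py` are not shown here.

```python
"""Hypothetical sketch: wire STEALTH_ARGS and STEALTH_JS into Chrome via CDP."""

import json

STEALTH_ARGS = [
    "--disable-blink-features=AutomationControlled",
    "--disable-infobars",
    "--no-first-run",
    "--no-default-browser-check",
    "--disable-background-timer-throttling",
    "--disable-backgrounding-occluded-windows",
    "--disable-renderer-backgrounding",
    "--disable-component-update",
]


def build_chrome_command(chrome_path: str, port: int, profile_dir: str) -> list[str]:
    """Argument vector for subprocess.Popen: debugging port, profile, then stealth flags."""
    return [
        chrome_path,
        f"--remote-debugging-port={port}",
        f"--user-data-dir={profile_dir}",
        *STEALTH_ARGS,
    ]


def inject_script_message(msg_id: int, source: str) -> str:
    """Serialized CDP request that runs `source` in every new document before page scripts."""
    return json.dumps(
        {
            "id": msg_id,
            "method": "Page.addScriptToEvaluateOnNewDocument",
            "params": {"source": source},
        }
    )
```

The script is registered on a per-page session, so under this approach it would need to be sent once for each new tab before that tab navigates.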
scripts/xhs/types.py
  1 +"""Xiaohongshu data type definitions, mirroring Go types.go."""
  2 +
  3 +from __future__ import annotations
  4 +
  5 +from dataclasses import dataclass, field
  6 +
  7 +# ========== Feed list ==========
  8 +
  9 +
  10 +@dataclass
  11 +class ImageInfo:
  12 + image_scene: str = ""
  13 + url: str = ""
  14 +
  15 + @classmethod
  16 + def from_dict(cls, d: dict) -> ImageInfo:
  17 + return cls(
  18 + image_scene=d.get("imageScene", ""),
  19 + url=d.get("url", ""),
  20 + )
  21 +
  22 +
  23 +@dataclass
  24 +class VideoCapability:
  25 + duration: int = 0 # seconds
  26 +
  27 + @classmethod
  28 + def from_dict(cls, d: dict) -> VideoCapability:
  29 + return cls(duration=d.get("duration", 0))
  30 +
  31 +
  32 +@dataclass
  33 +class Video:
  34 + capa: VideoCapability = field(default_factory=VideoCapability)
  35 +
  36 + @classmethod
  37 + def from_dict(cls, d: dict) -> Video:
  38 + return cls(capa=VideoCapability.from_dict(d.get("capa", {})))
  39 +
  40 +
  41 +@dataclass
  42 +class Cover:
  43 + width: int = 0
  44 + height: int = 0
  45 + url: str = ""
  46 + file_id: str = ""
  47 + url_pre: str = ""
  48 + url_default: str = ""
  49 + info_list: list[ImageInfo] = field(default_factory=list)
  50 +
  51 + @classmethod
  52 + def from_dict(cls, d: dict) -> Cover:
  53 + return cls(
  54 + width=d.get("width", 0),
  55 + height=d.get("height", 0),
  56 + url=d.get("url", ""),
  57 + file_id=d.get("fileId", ""),
  58 + url_pre=d.get("urlPre", ""),
  59 + url_default=d.get("urlDefault", ""),
  60 + info_list=[ImageInfo.from_dict(i) for i in d.get("infoList", [])],
  61 + )
  62 +
  63 +
  64 +@dataclass
  65 +class User:
  66 + user_id: str = ""
  67 + nickname: str = ""
  68 + nick_name: str = ""
  69 + avatar: str = ""
  70 +
  71 + @classmethod
  72 + def from_dict(cls, d: dict) -> User:
  73 + return cls(
  74 + user_id=d.get("userId", ""),
  75 + nickname=d.get("nickname", ""),
  76 + nick_name=d.get("nickName", ""),
  77 + avatar=d.get("avatar", ""),
  78 + )
  79 +
  80 +
  81 +@dataclass
  82 +class InteractInfo:
  83 + liked: bool = False
  84 + liked_count: str = ""
  85 + shared_count: str = ""
  86 + comment_count: str = ""
  87 + collected_count: str = ""
  88 + collected: bool = False
  89 +
  90 + @classmethod
  91 + def from_dict(cls, d: dict) -> InteractInfo:
  92 + return cls(
  93 + liked=d.get("liked", False),
  94 + liked_count=d.get("likedCount", ""),
  95 + shared_count=d.get("sharedCount", ""),
  96 + comment_count=d.get("commentCount", ""),
  97 + collected_count=d.get("collectedCount", ""),
  98 + collected=d.get("collected", False),
  99 + )
  100 +
  101 +
  102 +@dataclass
  103 +class NoteCard:
  104 + type: str = ""
  105 + display_title: str = ""
  106 + user: User = field(default_factory=User)
  107 + interact_info: InteractInfo = field(default_factory=InteractInfo)
  108 + cover: Cover = field(default_factory=Cover)
  109 + video: Video | None = None
  110 +
  111 + @classmethod
  112 + def from_dict(cls, d: dict) -> NoteCard:
  113 + video_data = d.get("video")
  114 + return cls(
  115 + type=d.get("type", ""),
  116 + display_title=d.get("displayTitle", ""),
  117 + user=User.from_dict(d.get("user", {})),
  118 + interact_info=InteractInfo.from_dict(d.get("interactInfo", {})),
  119 + cover=Cover.from_dict(d.get("cover", {})),
  120 + video=Video.from_dict(video_data) if video_data else None,
  121 + )
  122 +
  123 +
  124 +@dataclass
  125 +class Feed:
  126 + xsec_token: str = ""
  127 + id: str = ""
  128 + model_type: str = ""
  129 + note_card: NoteCard = field(default_factory=NoteCard)
  130 + index: int = 0
  131 +
  132 + @classmethod
  133 + def from_dict(cls, d: dict) -> Feed:
  134 + return cls(
  135 + xsec_token=d.get("xsecToken", ""),
  136 + id=d.get("id", ""),
  137 + model_type=d.get("modelType", ""),
  138 + note_card=NoteCard.from_dict(d.get("noteCard", {})),
  139 + index=d.get("index", 0),
  140 + )
  141 +
  142 + def to_dict(self) -> dict:
  143 + """Serialize to a JSON-compatible dict."""
  144 + result: dict = {
  145 + "id": self.id,
  146 + "xsecToken": self.xsec_token,
  147 + "modelType": self.model_type,
  148 + "index": self.index,
  149 + "displayTitle": self.note_card.display_title,
  150 + "type": self.note_card.type,
  151 + "user": {
  152 + "userId": self.note_card.user.user_id,
  153 + "nickname": self.note_card.user.nickname or self.note_card.user.nick_name,
  154 + },
  155 + "interactInfo": {
  156 + "likedCount": self.note_card.interact_info.liked_count,
  157 + "collectedCount": self.note_card.interact_info.collected_count,
  158 + "commentCount": self.note_card.interact_info.comment_count,
  159 + "sharedCount": self.note_card.interact_info.shared_count,
  160 + },
  161 + }
  162 + if self.note_card.video:
  163 + result["video"] = {"duration": self.note_card.video.capa.duration}
  164 + return result
  165 +
  166 +
  167 +# ========== Feed detail ==========
  168 +
  169 +
  170 +@dataclass
  171 +class DetailImageInfo:
  172 + width: int = 0
  173 + height: int = 0
  174 + url_default: str = ""
  175 + url_pre: str = ""
  176 + live_photo: bool = False
  177 +
  178 + @classmethod
  179 + def from_dict(cls, d: dict) -> DetailImageInfo:
  180 + return cls(
  181 + width=d.get("width", 0),
  182 + height=d.get("height", 0),
  183 + url_default=d.get("urlDefault", ""),
  184 + url_pre=d.get("urlPre", ""),
  185 + live_photo=d.get("livePhoto", False),
  186 + )
  187 +
  188 +
  189 +@dataclass
  190 +class Comment:
  191 + id: str = ""
  192 + note_id: str = ""
  193 + content: str = ""
  194 + like_count: str = ""
  195 + create_time: int = 0
  196 + ip_location: str = ""
  197 + liked: bool = False
  198 + user_info: User = field(default_factory=User)
  199 + sub_comment_count: str = ""
  200 + sub_comments: list[Comment] = field(default_factory=list)
  201 + show_tags: list[str] = field(default_factory=list)
  202 +
  203 + @classmethod
  204 + def from_dict(cls, d: dict) -> Comment:
  205 + return cls(
  206 + id=d.get("id", ""),
  207 + note_id=d.get("noteId", ""),
  208 + content=d.get("content", ""),
  209 + like_count=d.get("likeCount", ""),
  210 + create_time=d.get("createTime", 0),
  211 + ip_location=d.get("ipLocation", ""),
  212 + liked=d.get("liked", False),
  213 + user_info=User.from_dict(d.get("userInfo", {})),
  214 + sub_comment_count=d.get("subCommentCount", ""),
  215 + sub_comments=[cls.from_dict(c) for c in d.get("subComments", []) or []],
  216 + show_tags=d.get("showTags", []) or [],
  217 + )
  218 +
  219 + def to_dict(self) -> dict:
  220 + result: dict = {
  221 + "id": self.id,
  222 + "content": self.content,
  223 + "likeCount": self.like_count,
  224 + "createTime": self.create_time,
  225 + "ipLocation": self.ip_location,
  226 + "user": {
  227 + "userId": self.user_info.user_id,
  228 + "nickname": self.user_info.nickname or self.user_info.nick_name,
  229 + },
  230 + "subCommentCount": self.sub_comment_count,
  231 + }
  232 + if self.sub_comments:
  233 + result["subComments"] = [c.to_dict() for c in self.sub_comments]
  234 + return result
  235 +
  236 +
  237 +@dataclass
  238 +class CommentList:
  239 + list_: list[Comment] = field(default_factory=list)
  240 + cursor: str = ""
  241 + has_more: bool = False
  242 +
  243 + @classmethod
  244 + def from_dict(cls, d: dict) -> CommentList:
  245 + return cls(
  246 + list_=[Comment.from_dict(c) for c in d.get("list", []) or []],
  247 + cursor=d.get("cursor", ""),
  248 + has_more=d.get("hasMore", False),
  249 + )
  250 +
  251 +
  252 +@dataclass
  253 +class FeedDetail:
  254 + note_id: str = ""
  255 + xsec_token: str = ""
  256 + title: str = ""
  257 + desc: str = ""
  258 + type: str = ""
  259 + time: int = 0
  260 + ip_location: str = ""
  261 + user: User = field(default_factory=User)
  262 + interact_info: InteractInfo = field(default_factory=InteractInfo)
  263 + image_list: list[DetailImageInfo] = field(default_factory=list)
  264 +
  265 + @classmethod
  266 + def from_dict(cls, d: dict) -> FeedDetail:
  267 + return cls(
  268 + note_id=d.get("noteId", ""),
  269 + xsec_token=d.get("xsecToken", ""),
  270 + title=d.get("title", ""),
  271 + desc=d.get("desc", ""),
  272 + type=d.get("type", ""),
  273 + time=d.get("time", 0),
  274 + ip_location=d.get("ipLocation", ""),
  275 + user=User.from_dict(d.get("user", {})),
  276 + interact_info=InteractInfo.from_dict(d.get("interactInfo", {})),
  277 + image_list=[DetailImageInfo.from_dict(i) for i in d.get("imageList", []) or []],
  278 + )
  279 +
  280 + def to_dict(self) -> dict:
  281 + return {
  282 + "noteId": self.note_id,
  283 + "title": self.title,
  284 + "desc": self.desc,
  285 + "type": self.type,
  286 + "time": self.time,
  287 + "ipLocation": self.ip_location,
  288 + "user": {
  289 + "userId": self.user.user_id,
  290 + "nickname": self.user.nickname or self.user.nick_name,
  291 + },
  292 + "interactInfo": {
  293 + "liked": self.interact_info.liked,
  294 + "likedCount": self.interact_info.liked_count,
  295 + "collectedCount": self.interact_info.collected_count,
  296 + "collected": self.interact_info.collected,
  297 + "commentCount": self.interact_info.comment_count,
  298 + "sharedCount": self.interact_info.shared_count,
  299 + },
  300 + "imageList": [
  301 + {
  302 + "width": img.width,
  303 + "height": img.height,
  304 + "urlDefault": img.url_default,
  305 + }
  306 + for img in self.image_list
  307 + ],
  308 + }
  309 +
  310 +
  311 +@dataclass
  312 +class FeedDetailResponse:
  313 + note: FeedDetail = field(default_factory=FeedDetail)
  314 + comments: CommentList = field(default_factory=CommentList)
  315 +
  316 + @classmethod
  317 + def from_dict(cls, d: dict) -> FeedDetailResponse:
  318 + return cls(
  319 + note=FeedDetail.from_dict(d.get("note", {})),
  320 + comments=CommentList.from_dict(d.get("comments", {})),
  321 + )
  322 +
  323 + def to_dict(self) -> dict:
  324 + return {
  325 + "note": self.note.to_dict(),
  326 + "comments": [c.to_dict() for c in self.comments.list_],
  327 + }
  328 +
  329 +
  330 +# ========== User profile ==========
  331 +
  332 +
  333 +@dataclass
  334 +class UserBasicInfo:
  335 + gender: int = 0
  336 + ip_location: str = ""
  337 + desc: str = ""
  338 + imageb: str = ""
  339 + nickname: str = ""
  340 + images: str = ""
  341 + red_id: str = ""
  342 +
  343 + @classmethod
  344 + def from_dict(cls, d: dict) -> UserBasicInfo:
  345 + return cls(
  346 + gender=d.get("gender", 0),
  347 + ip_location=d.get("ipLocation", ""),
  348 + desc=d.get("desc", ""),
  349 + imageb=d.get("imageb", ""),
  350 + nickname=d.get("nickname", ""),
  351 + images=d.get("images", ""),
  352 + red_id=d.get("redId", ""),
  353 + )
  354 +
  355 +
  356 +@dataclass
  357 +class UserInteraction:
  358 + type: str = ""
  359 + name: str = ""
  360 + count: str = ""
  361 +
  362 + @classmethod
  363 + def from_dict(cls, d: dict) -> UserInteraction:
  364 + return cls(
  365 + type=d.get("type", ""),
  366 + name=d.get("name", ""),
  367 + count=d.get("count", ""),
  368 + )
  369 +
  370 +
  371 +@dataclass
  372 +class UserProfileResponse:
  373 + user_basic_info: UserBasicInfo = field(default_factory=UserBasicInfo)
  374 + interactions: list[UserInteraction] = field(default_factory=list)
  375 + feeds: list[Feed] = field(default_factory=list)
  376 +
  377 + def to_dict(self) -> dict:
  378 + return {
  379 + "basicInfo": {
  380 + "nickname": self.user_basic_info.nickname,
  381 + "redId": self.user_basic_info.red_id,
  382 + "desc": self.user_basic_info.desc,
  383 + "gender": self.user_basic_info.gender,
  384 + "ipLocation": self.user_basic_info.ip_location,
  385 + },
  386 + "interactions": [
  387 + {"type": i.type, "name": i.name, "count": i.count} for i in self.interactions
  388 + ],
  389 + "feeds": [f.to_dict() for f in self.feeds],
  390 + }
  391 +
  392 +
  393 +# ========== Search ==========
  394 +
  395 +
  396 +@dataclass
  397 +class FilterOption:
  398 + """Search filter options."""
  399 +
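  400 + sort_by: str = "" # 综合 (general)|最新 (newest)|最多点赞 (most liked)|最多评论 (most commented)|最多收藏 (most saved)
  401 + note_type: str = "" # 不限 (any)|视频 (video)|图文 (image post)
  402 + publish_time: str = "" # 不限 (any)|一天内 (past day)|一周内 (past week)|半年内 (past 6 months)
  403 + search_scope: str = "" # 不限 (any)|已看过 (viewed)|未看过 (not viewed)|已关注 (following)
  404 + location: str = "" # 不限 (any)|同城 (same city)|附近 (nearby)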
  405 +
  406 +
  407 +# ========== Publish ==========
  408 +
  409 +
  410 +@dataclass
  411 +class PublishImageContent:
  412 + """Image-and-text post content."""
  413 +
  414 + title: str = ""
  415 + content: str = ""
  416 + tags: list[str] = field(default_factory=list)
  417 + image_paths: list[str] = field(default_factory=list)
  418 + schedule_time: str | None = None # ISO 8601; None means publish immediately
  419 + is_original: bool = False
  420 + visibility: str = "" # 公开可见 (public, default)|仅自己可见 (only me)|仅互关好友可见 (mutual followers only)
  421 +
  422 +
  423 +@dataclass
  424 +class PublishVideoContent:
  425 + """Video post content."""
  426 +
  427 + title: str = ""
  428 + content: str = ""
  429 + tags: list[str] = field(default_factory=list)
  430 + video_path: str = ""
  431 + schedule_time: str | None = None # ISO 8601 format
  432 + visibility: str = "" # 公开可见 (public, default)|仅自己可见 (only me)|仅互关好友可见 (mutual followers only)
  433 +
  434 +
  435 +# ========== Interactions ==========
  436 +
  437 +
  438 +@dataclass
  439 +class ActionResult:
  440 + """Generic action result (like, favorite, etc.)."""
  441 +
  442 + feed_id: str = ""
  443 + success: bool = False
  444 + message: str = ""
  445 +
  446 + def to_dict(self) -> dict:
  447 + return {
  448 + "feed_id": self.feed_id,
  449 + "success": self.success,
  450 + "message": self.message,
  451 + }
  452 +
  453 +
  454 +# ========== Comment loading config ==========
  455 +
  456 +
  457 +@dataclass
  458 +class CommentLoadConfig:
  459 + """Options for loading comments."""
  460 +
  461 + click_more_replies: bool = False
  462 + max_replies_threshold: int = 10
  463 + max_comment_items: int = 0 # 0 = unlimited
  464 + scroll_speed: str = "normal" # slow|normal|fast
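Most `from_dict` methods above read nested objects with `d.get(key, {})`. That default applies only when the key is absent. If the API sends an explicit `null`, `d.get` returns `None` and the nested `from_dict` fails on `None.get`. `Comment` already guards its lists with `or []`. Below is a minimal sketch of applying the same guard to nested objects, using cut-down copies of `User` and `Comment` with fields trimmed for brevity. The source does not show whether the real payloads contain such nulls.

```python
"""Sketch: null-tolerant from_dict for nested camelCase payloads."""

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class User:
    user_id: str = ""
    nickname: str = ""

    @classmethod
    def from_dict(cls, d: Optional[dict]) -> "User":
        d = d or {}  # treat JSON null exactly like a missing object
        return cls(user_id=d.get("userId", ""), nickname=d.get("nickname", ""))


@dataclass
class Comment:
    id: str = ""
    user_info: User = field(default_factory=User)
    sub_comments: list["Comment"] = field(default_factory=list)

    @classmethod
    def from_dict(cls, d: Optional[dict]) -> "Comment":
        d = d or {}
        return cls(
            id=d.get("id", ""),
            # Pass the raw value through; User.from_dict handles None itself.
            user_info=User.from_dict(d.get("userInfo")),
            sub_comments=[cls.from_dict(c) for c in d.get("subComments") or []],
        )
```

Moving the `or {}` into each `from_dict` keeps callers simple: they can pass `d.get("key")` without choosing a default.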
scripts/xhs/urls.py
  1 +"""Xiaohongshu URL constants and builder functions."""
  2 +
  3 +from urllib.parse import urlencode
  4 +
  5 +# Base pages
  6 +EXPLORE_URL = "https://www.xiaohongshu.com/explore"
  7 +HOME_URL = "https://www.xiaohongshu.com"
  8 +PUBLISH_URL = "https://creator.xiaohongshu.com/publish/publish?source=official"
  9 +
  10 +
  11 +def make_feed_detail_url(feed_id: str, xsec_token: str) -> str:
  12 + """Build the feed detail page URL."""
  13 + return (
  14 + f"https://www.xiaohongshu.com/explore/{feed_id}?xsec_token={xsec_token}&xsec_source=pc_feed"
  15 + )
  16 +
  17 +
  18 +def make_search_url(keyword: str) -> str:
  19 + """Build the search results page URL."""
  20 + params = urlencode({"keyword": keyword, "source": "web_explore_feed"})
  21 + return f"https://www.xiaohongshu.com/search_result?{params}"
  22 +
  23 +
  24 +def make_user_profile_url(user_id: str, xsec_token: str) -> str:
  25 + """Build the user profile page URL."""
  26 + return (
  27 + f"https://www.xiaohongshu.com/user/profile/{user_id}"
  28 + f"?xsec_token={xsec_token}&xsec_source=pc_note"
  29 + )
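`make_search_url` builds its query string with `urlencode`, while `make_feed_detail_url` and `make_user_profile_url` interpolate `xsec_token` directly. The difference only matters when a token contains characters that are special in a query string. The sketch below uses a made-up token containing `+`, `/` and `=`; the source does not show the format of real `xsec_token` values. It parses both URLs back with `urllib.parse`: the interpolated `+` comes back as a space, while the `urlencode` version round-trips.

```python
"""Sketch: interpolated vs urlencode-built query strings for xsec_token."""

from urllib.parse import parse_qs, urlencode, urlsplit

BASE = "https://www.xiaohongshu.com/explore"


def feed_url_interpolated(feed_id: str, xsec_token: str) -> str:
    # Same shape as make_feed_detail_url: the token is inserted verbatim.
    return f"{BASE}/{feed_id}?xsec_token={xsec_token}&xsec_source=pc_feed"


def feed_url_encoded(feed_id: str, xsec_token: str) -> str:
    # Percent-encodes reserved characters such as "+", "/" and "=".
    params = urlencode({"xsec_token": xsec_token, "xsec_source": "pc_feed"})
    return f"{BASE}/{feed_id}?{params}"


def token_as_parsed(url: str) -> str:
    """Decode xsec_token the way a standard form-style query parser would."""
    return parse_qs(urlsplit(url).query)["xsec_token"][0]


token = "AB+cd/ef="  # hypothetical token value
print(token_as_parsed(feed_url_interpolated("abc123", token)))  # AB cd/ef=
print(token_as_parsed(feed_url_encoded("abc123", token)))  # AB+cd/ef=
```

Whether the server actually decodes `+` as a space depends on its query parser, so this shows the risk only under standard form decoding. For purely alphanumeric tokens, both builders produce identical URLs.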