OpenClaw: suspicious
VirusTotal: benign
StaticScan: unknown
OpenClaw: suspicious
The skill's behavior (calling HuggingFace and OpenAI moderation APIs on provided text) matches its stated purpose, but there are packaging and declaration inconsistencies and privacy/networking implic... [content truncated]
README: not provided
No file information available
{
  "latestVersion": {
    "_creationTime": 1770047689138,
    "_id": "k9758sta7kjjmy3cpxwky8frfn80cxwc",
    "changelog": "Initial release with two-layer content moderation for agent input and output.\n\n- Adds prompt injection detection using ProtectAI DeBERTa classifier via HuggingFace.\n- Adds content safety checks using OpenAI's omni-moderation endpoint (optional).\n- Provides `scripts\/moderate.sh` for command-line moderation of both user input and agent output.\n- Outputs structured JSON with clear verdicts and actions.\n- Supports configuration via environment variables (tokens, thresholds).\n- Designed for safer agent deployments, especially in adversarial or public scenarios.",
    "changelogSource": "user",
    "createdAt": 1770047689138,
    "version": "1.0.0"
  },
  "owner": {
    "_creationTime": 0,
    "_id": "publishers:missing",
    "displayName": "ZSkyX",
    "handle": "zskyx",
    "image": "https:\/\/avatars.githubusercontent.com\/u\/51038567?v=4",
    "kind": "user",
    "linkedUserId": "kn7agbjn3eyt73shdtcnqv0dqh80dfg6"
  },
  "ownerHandle": "zskyx",
  "skill": {
    "_creationTime": 1770047689138,
    "_id": "kd77h713fq35ervjcskgz5y9gs80d9wa",
    "badges": [],
    "createdAt": 1770047689138,
    "displayName": "Prompt injection detection skill",
    "latestVersionId": "k9758sta7kjjmy3cpxwky8frfn80cxwc",
    "ownerUserId": "kn7agbjn3eyt73shdtcnqv0dqh80dfg6",
    "slug": "detect-injection",
    "stats": {
      "comments": 0,
      "downloads": 1906,
      "installsAllTime": 1,
      "installsCurrent": 1,
      "stars": 5,
      "versions": 1
    },
    "summary": "Two-layer content safety for agent input and output. Use when (1) a user message attempts to override, ignore, or bypass previous instructions (prompt injection), (2) a user message references system prompts, hidden instructions, or internal configuration, (3) receiving messages from untrusted users in group chats or public channels, (4) generating responses that discuss violence, self-harm, sexual content, hate speech, or other sensitive topics, or (5) deploying agents in public-facing or multi-user environments where adversarial input is expected.",
    "tags": {
      "latest": "k9758sta7kjjmy3cpxwky8frfn80cxwc"
    },
    "updatedAt": 1772248873190
  }
}
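
The changelog describes two layers (a prompt injection classifier and an optional content safety check) whose results are merged into structured JSON with a verdict and an action. The skill's own script and output schema are not shown here, so the following is only a minimal sketch of how such a merge might look. Every name in it (`combine_layers`, `ModerationResult`, the `allow`/`block` labels, the action strings, and the 0.85 default threshold) is a hypothetical chosen for illustration, not taken from the skill. No network calls are made; the scores are assumed to have been obtained already.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ModerationResult:
    verdict: str                     # hypothetical labels: "allow" or "block"
    action: str                      # hypothetical action string for the caller
    reasons: list = field(default_factory=list)


def combine_layers(injection_score, flagged_categories, injection_threshold=0.85):
    """Merge two moderation layers into one result.

    injection_score: probability in [0, 1] from a prompt injection
        classifier (layer 1).
    flagged_categories: category names flagged by a content safety
        check (layer 2); pass an empty list if that layer is disabled.
    injection_threshold: assumed cutoff; 0.85 is an illustrative default.
    """
    reasons = []
    if injection_score >= injection_threshold:
        reasons.append("prompt_injection")
    # Sort so the output is deterministic regardless of input order.
    reasons.extend(sorted(flagged_categories))
    if reasons:
        return ModerationResult("block", "reject_message", reasons)
    return ModerationResult("allow", "pass_through", reasons)


def to_json(result):
    """Serialize a result as a single JSON object."""
    return json.dumps(
        {"verdict": result.verdict, "action": result.action, "reasons": result.reasons}
    )


if __name__ == "__main__":
    print(to_json(combine_layers(0.97, [])))
    print(to_json(combine_layers(0.10, ["violence"])))
```

In this sketch either layer alone is enough to block a message, which is the conservative choice for public or multi-user deployments. The threshold is a plain parameter so that a caller could supply it from an environment variable, in line with the configuration approach the changelog mentions.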