Risk Score

94/100 (Very Low)

OpenClaw: benign
VirusTotal: benign
StaticScan: unknown

Agent Evaluation

Author: rustyorb
Slug: agent-evaluation
Version: 1.0.0
Updated: 2026-02-26 07:00:35

Risk Information

OpenClaw: benign

OpenClaw analysis summary:
The skill is an instruction-only evaluation framework for LLM agents and its requested resources and instructions are coherent with that purpose.

VirusTotal: benign

Static scan: unknown

README

README not provided

File list

No file information

Raw JSON data
{
    "latestVersion": {
        "_creationTime": 1770674841158,
        "_id": "k971gwmyjapsntsac2j2vbapx980tyae",
        "changelog": "- Initial release of agent-evaluation skill for testing and benchmarking LLM agents.\n- Supports behavioral testing, capability assessment, reliability metrics, and production monitoring.\n- Includes practical testing patterns: statistical test evaluation, behavioral contract testing, and adversarial testing.\n- Highlights common anti-patterns and sharp edges in LLM agent evaluation.\n- Designed for use alongside related skills such as multi-agent orchestration and autonomous agents.",
        "changelogSource": "auto",
        "createdAt": 1770674841158,
        "version": "1.0.0"
    },
    "owner": {
        "_creationTime": 0,
        "_id": "publishers:missing",
        "displayName": "rustyorb",
        "handle": "rustyorb",
        "image": "https:\/\/avatars.githubusercontent.com\/u\/111198602?v=4",
        "kind": "user",
        "linkedUserId": "kn76pzx058jtj181fzkk729zp5801nac"
    },
    "ownerHandle": "rustyorb",
    "skill": {
        "_creationTime": 1770674841158,
        "_id": "kd7byngtmstb21ph6zwpc6grs580va2x",
        "badges": [],
        "createdAt": 1770674841158,
        "displayName": "Agent Evaluation",
        "latestVersionId": "k971gwmyjapsntsac2j2vbapx980tyae",
        "ownerUserId": "kn76pzx058jtj181fzkk729zp5801nac",
        "slug": "agent-evaluation",
        "stats": {
            "comments": 0,
            "downloads": 3123,
            "installsAllTime": 43,
            "installsCurrent": 42,
            "stars": 6,
            "versions": 1
        },
        "summary": "Testing and benchmarking LLM agents including behavioral testing, capability assessment, reliability metrics, and production monitoring—where even top agents achieve less than 50% on real-world benchmarks Use when: agent testing, agent evaluation, benchmark agents, agent reliability, test agent.",
        "tags": {
            "latest": "k971gwmyjapsntsac2j2vbapx980tyae"
        },
        "updatedAt": 1772060435906
    }
}
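
The `createdAt` and `updatedAt` fields in the record above are Unix timestamps in milliseconds. As a minimal sketch, the conversion to a readable time might look like the following. The helper name `format_epoch_ms` is hypothetical, and the UTC+8 display offset is an assumption inferred from the update time shown in the listing, not something the source states.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the listing renders times in UTC+8. This is inferred from the
# displayed update time and is not stated anywhere in the record itself.
DISPLAY_TZ = timezone(timedelta(hours=8))


def format_epoch_ms(epoch_ms: int, tz: timezone = DISPLAY_TZ) -> str:
    """Format a millisecond Unix timestamp as 'YYYY-MM-DD HH:MM:SS' in tz.

    Hypothetical helper for illustration; not part of the skill or registry.
    """
    seconds = epoch_ms // 1000  # drop the millisecond part
    return datetime.fromtimestamp(seconds, tz=tz).strftime("%Y-%m-%d %H:%M:%S")


# Values copied from the "skill" object in the record above.
updated_at = 1772060435906
created_at = 1770674841158

print(format_epoch_ms(updated_at))                # 2026-02-26 07:00:35
print(format_epoch_ms(created_at, timezone.utc))  # creation time in UTC
```

Under that assumption, `updatedAt` converts to the same update time the listing displays.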