Agent Evaluation

Risk score

94/100 (Very Low)

OpenClaw: benign
VirusTotal: benign
StaticScan: unknown

Author: rustyorb

Slug: agent-evaluation

Version: 1.0.0

Updated: 2026-02-26 07:00:35

Risk information

OpenClaw: benign

OpenClaw analysis summary

The skill is an instruction-only evaluation framework for LLM agents and its requested resources and instructions are coherent with that purpose.

VirusTotal: benign

Static scan: unknown

README

README not provided

File list

No file information

Raw JSON data

{
    "latestVersion": {
        "_creationTime": 1770674841158,
        "_id": "k971gwmyjapsntsac2j2vbapx980tyae",
        "changelog": "- Initial release of agent-evaluation skill for testing and benchmarking LLM agents.\n- Supports behavioral testing, capability assessment, reliability metrics, and production monitoring.\n- Includes practical testing patterns: statistical test evaluation, behavioral contract testing, and adversarial testing.\n- Highlights common anti-patterns and sharp edges in LLM agent evaluation.\n- Designed for use alongside related skills such as multi-agent orchestration and autonomous agents.",
        "changelogSource": "auto",
        "createdAt": 1770674841158,
        "version": "1.0.0"
    },
    "owner": {
        "_creationTime": 0,
        "_id": "publishers:missing",
        "displayName": "rustyorb",
        "handle": "rustyorb",
        "image": "https:\/\/avatars.githubusercontent.com\/u\/111198602?v=4",
        "kind": "user",
        "linkedUserId": "kn76pzx058jtj181fzkk729zp5801nac"
    },
    "ownerHandle": "rustyorb",
    "skill": {
        "_creationTime": 1770674841158,
        "_id": "kd7byngtmstb21ph6zwpc6grs580va2x",
        "badges": [],
        "createdAt": 1770674841158,
        "displayName": "Agent Evaluation",
        "latestVersionId": "k971gwmyjapsntsac2j2vbapx980tyae",
        "ownerUserId": "kn76pzx058jtj181fzkk729zp5801nac",
        "slug": "agent-evaluation",
        "stats": {
            "comments": 0,
            "downloads": 3123,
            "installsAllTime": 43,
            "installsCurrent": 42,
            "stars": 6,
            "versions": 1
        },
        "summary": "Testing and benchmarking LLM agents including behavioral testing, capability assessment, reliability metrics, and production monitoring—where even top agents achieve less than 50% on real-world benchmarks Use when: agent testing, agent evaluation, benchmark agents, agent reliability, test agent.",
        "tags": {
            "latest": "k971gwmyjapsntsac2j2vbapx980tyae"
        },
        "updatedAt": 1772060435906
    }
}
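
The timestamps in the record above (`_creationTime`, `createdAt`, `updatedAt`) look like Unix epoch values in milliseconds. The sketch below shows one way to read them, using a trimmed copy of the record. The helper `ms_to_datetime` and the constant `ASSUMED_PAGE_TZ` are hypothetical names introduced here for illustration. The page does not state a time zone; UTC+8 is an assumption, chosen because it reproduces the "Updated" value shown near the top of the listing.

```python
import json
from datetime import datetime, timedelta, timezone

# Trimmed copy of the record above; only the fields used here are kept.
record = json.loads("""
{
    "latestVersion": {"createdAt": 1770674841158, "version": "1.0.0"},
    "skill": {
        "slug": "agent-evaluation",
        "stats": {"downloads": 3123, "installsAllTime": 43, "installsCurrent": 42},
        "updatedAt": 1772060435906
    }
}
""")

# Assumption: the listing renders times in UTC+8. The page does not say so.
ASSUMED_PAGE_TZ = timezone(timedelta(hours=8))


def ms_to_datetime(ms, tz=timezone.utc):
    """Convert epoch milliseconds to an aware datetime, dropping sub-second precision."""
    return datetime.fromtimestamp(ms // 1000, tz=tz)


updated = ms_to_datetime(record["skill"]["updatedAt"], ASSUMED_PAGE_TZ)
updated_text = updated.strftime("%Y-%m-%d %H:%M:%S")
print(record["skill"]["slug"], "updated at", updated_text)

# Installs that are no longer active, derived from the two install counters.
stats = record["skill"]["stats"]
removed_installs = stats["installsAllTime"] - stats["installsCurrent"]
print("removed installs:", removed_installs)
```

Under the UTC+8 assumption, `updated_text` comes out as `2026-02-26 07:00:35`, matching the displayed update time. With a different zone the same `updatedAt` value renders differently, so the zone should be confirmed before relying on the displayed time.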