OpenClaw: benign
VirusTotal: benign
StaticScan: unknown
OpenClaw rationale: the skill is an instruction-only evaluation framework for LLM agents, and its requested resources and instructions are coherent with that purpose.
README: not provided
Files: no file information available
{
"latestVersion": {
"_creationTime": 1770674841158,
"_id": "k971gwmyjapsntsac2j2vbapx980tyae",
"changelog": "- Initial release of agent-evaluation skill for testing and benchmarking LLM agents.\n- Supports behavioral testing, capability assessment, reliability metrics, and production monitoring.\n- Includes practical testing patterns: statistical test evaluation, behavioral contract testing, and adversarial testing.\n- Highlights common anti-patterns and sharp edges in LLM agent evaluation.\n- Designed for use alongside related skills such as multi-agent orchestration and autonomous agents.",
"changelogSource": "auto",
"createdAt": 1770674841158,
"version": "1.0.0"
},
"owner": {
"_creationTime": 0,
"_id": "publishers:missing",
"displayName": "rustyorb",
"handle": "rustyorb",
"image": "https:\/\/avatars.githubusercontent.com\/u\/111198602?v=4",
"kind": "user",
"linkedUserId": "kn76pzx058jtj181fzkk729zp5801nac"
},
"ownerHandle": "rustyorb",
"skill": {
"_creationTime": 1770674841158,
"_id": "kd7byngtmstb21ph6zwpc6grs580va2x",
"badges": [],
"createdAt": 1770674841158,
"displayName": "Agent Evaluation",
"latestVersionId": "k971gwmyjapsntsac2j2vbapx980tyae",
"ownerUserId": "kn76pzx058jtj181fzkk729zp5801nac",
"slug": "agent-evaluation",
"stats": {
"comments": 0,
"downloads": 3123,
"installsAllTime": 43,
"installsCurrent": 42,
"stars": 6,
"versions": 1
},
"summary": "Testing and benchmarking LLM agents including behavioral testing, capability assessment, reliability metrics, and production monitoring—where even top agents achieve less than 50% on real-world benchmarks Use when: agent testing, agent evaluation, benchmark agents, agent reliability, test agent.",
"tags": {
"latest": "k971gwmyjapsntsac2j2vbapx980tyae"
},
"updatedAt": 1772060435906
}
}