The Guardrail Company Got Owned. Skill Provenance Is the Layer Below.
On 2026-05-11 the Mini Shai-Hulud npm/PyPI worm trojanized 404 package versions, including @mistralai/mistralai and guardrails-ai. A runtime LLM-output guardrail cannot catch an install-time daemon. ATR v3.1.0 ships ATR-2026-00525 covering the gh-token-monitor signature.
_On 2026-05-11 the Mini Shai-Hulud npm/PyPI worm trojanized 404 package versions across the JavaScript and Python ecosystems. Two of the victims are the kind of vendor most teams trust by default: Mistral AI (frontier model lab) and Guardrails AI (a runtime LLM-output guardrail). A guardrail at the runtime layer cannot block an attack delivered at install time. The detection layer has to sit below the guardrail._
What landed
Researchers at Akamai, Wiz, and CSA documented the wave through 2026-05-17. Help Net Security and The Hacker News carried the timeline. The wave compromised at least:
- @mistralai/mistralai: the official Mistral TypeScript SDK on npm. Payload delivered via the npm prepare lifecycle hook (the same TanStack-pattern vector).
- mistralai and guardrails-ai: the official Mistral and Guardrails AI Python SDKs on PyPI, with the payload triggered by an on-import side effect in __init__.py that downloaded /tmp/transformers.pyz from git-tanstack[.]com.
- TanStack packages (@tanstack/router, @tanstack/query, others): same family, npm prepare lifecycle hook executing through a Bun runtime drop.
- UiPath packages (a separate sub-cluster of the same wave) used the npm preinstall hook invoking node setup.mjs as the install vector.
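The two npm vectors above are declared in plain sight in package.json. As a rough illustration of what a manifest-layer check reads, here is a Python sketch that lists the install-time lifecycle scripts declared by every package under a directory. The function names are hypothetical, it is not part of ATR or PanGuard, and it cannot see the PyPI on-import vector, which lives in __init__.py rather than in a manifest.

```python
import json
from pathlib import Path

# npm lifecycle scripts that can run during an install. preinstall/install/postinstall
# run for registry dependencies; prepare runs on a local `npm install` and for git dependencies.
INSTALL_HOOKS = ("preinstall", "install", "postinstall", "prepare")


def install_hooks(package_json_text: str) -> dict[str, str]:
    """Return the install-time lifecycle scripts declared in one package.json."""
    manifest = json.loads(package_json_text)
    scripts = manifest.get("scripts") if isinstance(manifest, dict) else None
    if not isinstance(scripts, dict):
        return {}
    return {name: scripts[name] for name in INSTALL_HOOKS if name in scripts}


def scan_tree(root: Path) -> dict[str, dict[str, str]]:
    """Map every package.json under root (e.g. node_modules) to its install-time hooks."""
    findings = {}
    for manifest_path in sorted(root.rglob("package.json")):
        try:
            hooks = install_hooks(manifest_path.read_text(encoding="utf-8"))
        except (OSError, UnicodeDecodeError, json.JSONDecodeError):
            continue  # unreadable or malformed manifest: skip it rather than abort the scan
        if hooks:
            findings[str(manifest_path)] = hooks
    return findings


if __name__ == "__main__":
    for path, hooks in scan_tree(Path("node_modules")).items():
        print(path, hooks)
```

Every hit still needs a human look: plenty of legitimate packages use postinstall or prepare, so the useful signal is a hook that appears in a version bump or runs something like node setup.mjs.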
The Wiz and Aikido analyses count 403 trojanized package versions across roughly 172 unique packages, against a cumulative 518 million downloads of the affected packages over the prior twelve months.
What the payload does
Three artifacts characterize the wave:
1. `gh-token-monitor` — a persistence daemon installed via macOS LaunchAgent (~/Library/LaunchAgents/com.user.gh-token-monitor.plist) or Linux systemd user service (~/.config/systemd/user/gh-token-monitor.service). The daemon polls api.github.com/user every sixty seconds using the developer's GitHub token.
2. Dead-man's switch — when the polled token returns HTTP 401 (the developer rotated or revoked it), the daemon executes rm -rf ~/ against the entire home directory (per Wiz and JFrog analyses). The destructive action is conditional, which is what makes the daemon hard to catch with static manifest scanning alone.
3. Exfiltration channels — filev2.getsession[.]org (Session Protocol), api.masscan[.]cloud (GitHub Actions secret serialization), and the typosquat git-tanstack[.]com for credential drop. The host at 83.142.209[.]194 was confirmed serving the PyPI credential stealer.
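A crude way to sweep proxy logs, DNS logs, or shell history for the network indicators above is a case-insensitive substring match over the refanged IOC strings. This is a sketch assuming those logs can be exported as text; refang and find_iocs are illustrative names, and because matching is plain substring with no word boundaries, it will also hit strings that merely contain an indicator.

```python
import re

# Indicators from the public writeups, stored defanged as published.
DEFANGED_IOCS = [
    "filev2.getsession[.]org",
    "api.masscan[.]cloud",
    "git-tanstack[.]com",
    "83.142.209[.]194",
    "/tmp/transformers.pyz",
]


def refang(indicator: str) -> str:
    """Turn a defanged indicator like 'example[.]com' back into 'example.com'."""
    return indicator.replace("[.]", ".")


# One case-insensitive alternation; re.escape keeps the dots literal.
IOC_PATTERN = re.compile(
    "|".join(re.escape(refang(ioc)) for ioc in DEFANGED_IOCS), re.IGNORECASE
)


def find_iocs(text: str) -> list[str]:
    """Distinct indicators found in text, lowercased, in first-seen order."""
    seen: list[str] = []
    for match in IOC_PATTERN.finditer(text):
        hit = match.group(0).lower()
        if hit not in seen:
            seen.append(hit)
    return seen
```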
Mistral spun up a public security advisory page at docs.mistral.ai/resources/security-advisories on 2026-05-12, acknowledging the TypeScript SDK compromise. That is the first time Mistral has run a public security advisory page. It is also the right action.
Why a runtime guardrail can't catch this
Guardrails AI's product is exactly what the name says — a runtime guardrail around LLM outputs. It validates that the model response satisfies declared structural and safety constraints before that response reaches the user or a downstream tool. Lakera, PromptArmor, Protect AI, Lasso Security, and the closed-source equivalents do the same thing at the same layer.
These products are good at what they do, but they all live at the same layer in the stack. None of them inspect the contents of the __init__.py that runs when pip install guardrails-ai executes, because by the time the runtime guardrail is loaded, the install-time daemon has already been on disk for three to five minutes. The runtime guardrail is, definitionally, downstream of the install.
This is not a critique of any specific vendor. It is a layering observation. A guardrail vendor was itself compromised at install time, which means the threat model their product addresses does not extend to the supply chain that delivers their product. The detection layer for that threat sits below.
What ATR ships for this
Agent Threat Rules v3.1.0 ships ATR-2026-00525 covering the Mini Shai-Hulud family. The rule fires on three alternative shapes inside any text that reaches an agent runtime or a skill scanner:
1. The literal daemon name gh-token-monitor, gh_token_monitor, or the LaunchAgent label com.github.token.monitor.
2. A polling loop against api.github.com/user paired with a destructive shell primitive (rm -rf, shutil.rmtree, os.system rm) within five hundred characters — the dead-man's-switch shape.
3. An interval or timer construct (setInterval, threading.Timer, asyncio.sleep) paired with the /user poll AND a shell execution primitive (child_process, exec, destructive command) within three hundred characters — the install-time daemon shape.
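To make the three shapes concrete, here is a minimal Python sketch of how such proximity conditions could be evaluated. The regexes are an approximation of the prose above, not the rule text shipped in ATR-2026-00525; the function names are hypothetical, and "within N characters" is interpreted here as the distance between match start offsets.

```python
import re
from itertools import product

# Illustrative approximations of the three conditions; NOT the published rule.
DAEMON_NAME = re.compile(r"gh[-_]token[-_]monitor|com\.github\.token\.monitor", re.I)
USER_POLL = re.compile(r"api\.github\.com/user\b", re.I)  # \b excludes /users/...
DESTRUCTIVE = re.compile(r"\brm\s+-rf\b|shutil\.rmtree|os\.system\(\s*['\"]rm\b")
TIMER = re.compile(r"setInterval|threading\.Timer|asyncio\.sleep")
SHELL_EXEC = re.compile(r"child_process|\bexec\w*\s*\(|" + DESTRUCTIVE.pattern)


def _starts(pattern: re.Pattern, text: str) -> list[int]:
    """Start offsets of every match of pattern in text."""
    return [m.start() for m in pattern.finditer(text)]


def _within(window: int, *groups: list[int]) -> bool:
    """True if one offset from each group fits inside a span of `window` characters."""
    return any(max(combo) - min(combo) <= window for combo in product(*groups))


def match_conditions(text: str) -> list[str]:
    """Return which of the three illustrative conditions fire on text."""
    fired = []
    if DAEMON_NAME.search(text):
        fired.append("daemon_name")
    polls = _starts(USER_POLL, text)
    # Condition 2: /user poll and destructive primitive within 500 characters.
    if _within(500, polls, _starts(DESTRUCTIVE, text)):
        fired.append("dead_mans_switch")
    # Condition 3: timer, /user poll, and shell execution all within 300 characters.
    if _within(300, _starts(TIMER, text), polls, _starts(SHELL_EXEC, text)):
        fired.append("timer_plus_shell")
    return fired
```

The _within check tries every combination of offsets, which is fine for skill-sized text but would want a sliding-window scan for large files.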
Coverage is honest: the rule catches the canonical IOCs that appeared in the public Akamai, Wiz, and Sysdig writeups. It does not catch every possible obfuscation of the dead-man's-switch pattern. The rule's true_negatives block documents the false-positive cases it deliberately accepts and the cases it explicitly rejects. The source is on main in the Agent-Threat-Rule/agent-threat-rules repository, under the skill-compromise category.
Layer-by-layer detection map
For a team building or buying AI security today, the layering looks like this:
| Layer | What it inspects | What this attack looks like at that layer |
|---|---|---|
| Runtime LLM output guardrail | The model response after generation | Cannot see: the payload runs at install and never reaches the model. |
| Tool call gate / function-call inspector | Arguments to tool invocations | Cannot see: the daemon is a background process, not a tool call. |
| MCP / skill content scanner | The text of a SKILL.md or skill code at registration | Catches the daemon-install shape and the dead-man's-switch shape via ATR-2026-00525. |
| Package manifest scanner | package.json, pyproject.toml, setup.py, checked for known IOC strings | Catches the preinstall: node setup.mjs shape if the IOC is current. |
| Process / runtime EDR | LaunchAgent plist additions, suspicious child processes | Catches the daemon at install if the IOC string is current. |
Two of the five layers cannot see this attack at all. The three that can are not where runtime guardrail vendors live. PanGuard's positioning, since v0.1 in March, has been the skill-content layer plus the manifest layer. That positioning is not a marketing line. It is where this specific attack actually surfaces.
What we are not claiming
A few honest disclosures so this post does not drift into vendor cosplay:
- ATR is reactive. The gh-token-monitor string in the rule is an IOC; the next worm will pick a different daemon name, and the rule's first condition will need an update. The rule's second and third conditions (the dead-man's-switch shape and the timer-plus-shell shape) are pattern-based and should generalize better, but pattern-based rules also have false negatives.
- ATR is open source under MIT. It is not unique to PanGuard. Anyone running the rule pack as an npm package, a MISP feed, or a Cisco AI Defense scanner config gets the same coverage. The differentiation is not the rule; it is the loop time from disclosure to shipped rule. For this attack the disclosure was 2026-05-11 and the matching rule shipped 2026-05-23, a twelve-day window. That is too slow. The Microsoft Copilot Semantic Kernel loop on 2026-05-11 closed in two hours sixteen minutes because that disclosure carried a usable PoC. The Mini Shai-Hulud writeups did not, which is the structural reason this loop was slower.
- The honest framing relative to runtime guardrail vendors is "different layer, both needed", not "we replace them". A runtime LLM-output guardrail catches a different attack class than a skill-content scanner. Mature deployments will run both. The point of this post is that running only the runtime guardrail leaves a layer uncovered.
What to do this week
1. If you have @mistralai/mistralai, mistralai, or guardrails-ai in your dependency tree, audit the install: npm ls @mistralai/mistralai and pip show mistralai guardrails-ai. Mistral's advisory page lists the specific bad versions. Pin to the patched releases.
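2. Check for the persistence daemon: launchctl list | grep -i 'token.monitor' on macOS, and systemctl --user list-units --all | grep -i 'token.monitor' on Linux (the unit is a per-user service, so a plain systemctl list-units will not show it). The daemon's plist label was com.github.token.monitor and the systemd unit was named gh-token-monitor.service; the pattern token.monitor matches both, whereas token-monitor would miss the dotted label.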
3. If you ship an agent platform, add a skill-content scan to your registration flow. The ATR rule is one file plus a YAML; it loads into any rule-runner that accepts ATR format. Microsoft Agent Governance Toolkit, Cisco AI Defense, and MISP all run it today.
4. If you operate a runtime guardrail product and you read this far, the honest move is to add a skill-content or package-manifest companion scanner. The runtime layer is necessary. It is not sufficient.
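Steps 1 and 2 can also be scripted for the Python side. The sketch below, assuming Python 3.9+ and that it runs inside the environment being audited, reports the installed versions of mistralai and guardrails-ai via importlib.metadata and looks for files matching *token*monitor* in the LaunchAgents and systemd user directories named earlier. KNOWN_BAD is a hypothetical placeholder left empty on purpose: fill it from Mistral's advisory page rather than trusting any list here. It does not cover the npm side, where npm ls remains the check. Because the daemon wipes the home directory once its token returns 401, it seems prudent to run this kind of check before rotating the GitHub token.

```python
import platform
from importlib import metadata
from pathlib import Path
from typing import Optional

# Hypothetical placeholder: copy the affected versions from the vendors' advisories.
# Deliberately left empty; this sketch does not know which versions were trojanized.
KNOWN_BAD: dict[str, set[str]] = {
    "mistralai": set(),
    "guardrails-ai": set(),
}

# Directories where the writeups place the persistence unit, keyed by platform.system().
PERSISTENCE_DIRS = {
    "Darwin": Path.home() / "Library" / "LaunchAgents",
    "Linux": Path.home() / ".config" / "systemd" / "user",
}


def installed_versions(names) -> dict[str, Optional[str]]:
    """Installed version of each distribution in the current environment, None if absent."""
    versions: dict[str, Optional[str]] = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def flagged(versions, known_bad) -> dict[str, str]:
    """Installed versions that appear in the known-bad sets."""
    return {
        name: version
        for name, version in versions.items()
        if version is not None and version in known_bad.get(name, set())
    }


def persistence_artifacts(system: str, dirs=PERSISTENCE_DIRS) -> list[Path]:
    """Files whose names match *token*monitor* in the platform's persistence directory."""
    directory = dirs.get(system)
    if directory is None or not directory.is_dir():
        return []
    return sorted(directory.glob("*token*monitor*"))


if __name__ == "__main__":
    versions = installed_versions(KNOWN_BAD)
    print("installed:", versions)
    print("matches known-bad list:", flagged(versions, KNOWN_BAD))
    print("possible persistence files:", persistence_artifacts(platform.system()))
```

The glob deliberately matches both the com.user.gh-token-monitor.plist filename and the com.github.token.monitor label, since the writeups name both.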
Closing
If your AI security architecture only inspects runtime outputs, your supply chain is your blind spot. Mini Shai-Hulud is the empirical proof. The guardrail company got owned. The detection layer has to be below.
_Adam Lin maintains Agent Threat Rules, an open MIT-licensed detection corpus for AI agent threats. ATR rules are in production at Microsoft, Cisco, and MISP. Founder of Panguard AI, a Delaware C-Corp building the commercial product around the ATR standard._