
Topic-Graph Cross-References: A Deterministic Alternative to LLM Augmentation

KUAN-HSIN LIN · May 11, 2026 · 6 min

Regex extraction linked only 33% of the catalog's controls. A topic-graph approach using the AI RMF Playbook's own 46-topic taxonomy lifted coverage to 78%, without an LLM in the loop.

When we first shipped the AI RMF OSCAL catalog v0.3 with 72 controls, our regex-based cross-reference extractor surfaced 31 links across just 24 of the 72 controls. That is 33% coverage. Two-thirds of the catalog had no machine-discoverable relationships at all, which makes the catalog far less useful for downstream profile authors and OSCAL tools.

The obvious next move was to throw an LLM at it. We did not. Here is why, and what we shipped instead.

Why Deterministic Beats LLM for Standards Work

OSCAL catalogs that feed federal-adjacent standards work need to be reproducible. A reviewer at NIST should be able to regenerate the entire catalog from public source data and get bit-identical output. LLMs cannot promise that. Same input, different sampling, different output. Same input, different model version, different output.
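One way to make "bit-identical" checkable is to hash a canonical serialization of the generated catalog and compare digests across runs. The sketch below is an illustration under stated assumptions, not the project's actual tooling: `catalog_digest` and the miniature catalogs are hypothetical, and it assumes the generator already emits lists in a stable order (key order is normalized here, list order is not).

```python
import hashlib
import json


def catalog_digest(catalog: dict) -> str:
    """Return a SHA-256 digest of a canonical JSON serialization."""
    # sort_keys and fixed separators remove key-order and whitespace
    # differences, so equal content always serializes to equal bytes.
    canonical = json.dumps(
        catalog, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical miniature catalogs standing in for two independent runs.
run_a = {"controls": [{"id": "GOVERN-1.1", "links": ["MANAGE-4.3"]}]}
run_b = {"controls": [{"links": ["MANAGE-4.3"], "id": "GOVERN-1.1"}]}

# Same content, different key order: the digests still match.
assert catalog_digest(run_a) == catalog_digest(run_b)
```

A reviewer who regenerates the catalog can then compare one hex string instead of reading two files side by side.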

There is also a defensibility problem. If we list a cross-reference between GOVERN-1.1 and MANAGE-4.3, and a reviewer asks why, the answer cannot be "the model decided." It has to be a rule you can read and re-run.

So we built src/topic_cross_references.py. It uses the AI RMF Playbook's own 46-topic taxonomy, the same topics NIST already publishes for each subcategory. If two controls share topics, they are candidates for a cross-reference. No external knowledge, no inference, just graph traversal over public data.
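The traversal can be pictured as an inverted index from topic to controls, followed by pairing up the controls that sit under the same topic. This is a minimal sketch of that idea, not the contents of the actual script: the function name `shared_topics` and the sample tags are hypothetical, and real tags would come from the Playbook data.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical topic tags for illustration only.
CONTROL_TOPICS = {
    "GOVERN-1.1": {"Risk Management", "Legal and Regulatory"},
    "MANAGE-4.3": {"Risk Management", "Incident Response"},
    "MAP-1.1": {"Context of Use"},
}


def shared_topics(control_topics):
    """Map each unordered control pair to the set of topics both carry."""
    # Inverted index: topic -> controls tagged with it.
    by_topic = defaultdict(set)
    for control, topics in control_topics.items():
        for topic in topics:
            by_topic[topic].add(control)

    pairs = defaultdict(set)
    # Sorting topics and controls keeps iteration order stable across runs.
    for topic in sorted(by_topic):
        for a, b in combinations(sorted(by_topic[topic]), 2):
            pairs[(a, b)].add(topic)
    return dict(pairs)


candidates = shared_topics(CONTROL_TOPICS)
```

With the sample tags, the only pair produced is GOVERN-1.1 and MANAGE-4.3, joined by "Risk Management"; MAP-1.1 shares nothing and gets no candidates.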

Conservative Inverse-Frequency Thresholds

A naive "any shared topic = link" approach floods the catalog. Risk Management appears on dozens of controls. So we weight by inverse frequency: rare topics are stronger signals than common ones.

Eligibility for a candidate link:

● 3+ shared topics, regardless of frequency, OR
● 2+ shared topics where at least one is rare (appears on <=5 controls), OR
● 1+ ultra-rare shared topic (appears on <=3 controls)
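The three rules above can be sketched as a single predicate over the set of topics two controls share. The thresholds come from the list; the function and constant names are hypothetical, and the frequency table in the usage lines is made up for illustration.

```python
from collections import Counter

RARE_MAX = 5        # "rare": topic appears on <= 5 controls
ULTRA_RARE_MAX = 3  # "ultra-rare": topic appears on <= 3 controls


def topic_frequencies(control_topics):
    """Count how many controls carry each topic."""
    # set() guards against a control listing the same topic twice.
    return Counter(t for topics in control_topics.values() for t in set(topics))


def is_eligible(shared, freq):
    """Apply the three eligibility rules to a set of shared topics."""
    rare = [t for t in shared if freq[t] <= RARE_MAX]
    ultra_rare = [t for t in shared if freq[t] <= ULTRA_RARE_MAX]
    return (
        len(shared) >= 3                          # rule 1
        or (len(shared) >= 2 and len(rare) >= 1)  # rule 2
        or len(ultra_rare) >= 1                   # rule 3
    )


# Made-up frequencies: one very common topic, one rare, one ultra-rare.
freq = {"Risk Management": 40, "T-rare": 4, "T-ultra": 2, "T-common": 10}

assert not is_eligible({"Risk Management"}, freq)
assert is_eligible({"Risk Management", "T-rare"}, freq)
assert is_eligible({"T-ultra"}, freq)
```

Note that a single shared topic as common as Risk Management never qualifies on its own, which is the flooding case the thresholds are meant to block.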

We also cap each source control at its top K = 4 links. That prevents one heavily-tagged subcategory from dominating its own neighborhood.
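The post does not spell out how candidates are ranked before the cap is applied, so the sketch below assumes one plausible scoring: the sum of inverse topic frequencies over the shared topics, with ties broken by control ID. Both the scoring rule and the name `top_k_links` are assumptions made for illustration.

```python
from fractions import Fraction

TOP_K = 4


def top_k_links(candidates, freq, k=TOP_K):
    """Keep the k strongest candidate targets for one source control.

    candidates maps target control id -> set of topics shared with the source.
    """
    def score(target):
        # Assumed scoring: rarer shared topics contribute more.
        # Fractions keep the sum exact and independent of iteration order.
        return sum(Fraction(1, freq[t]) for t in candidates[target])

    # Descending score, then id, so ties resolve the same way on every run.
    ranked = sorted(candidates, key=lambda target: (-score(target), target))
    return ranked[:k]


# Made-up frequencies and candidates for one source control.
freq = {"a": 2, "b": 5, "c": 40}
candidates = {"X": {"c"}, "Y": {"a"}, "Z": {"b", "c"}, "W": {"a", "b"}, "V": {"b"}}

kept = top_k_links(candidates, freq)
```

Using exact fractions rather than floats is a deliberate choice here: it avoids rounding differences changing the ranking between machines.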

The Numbers

Topic-graph extraction added 145 new links covering 32 additional controls. Together with the 24 controls already reached by the 31 regex-derived links, total coverage is now 56 of 72 controls (78%). The catalog keeps the two classes of link distinguishable: topic-derived entries carry a text field listing the shared topic names, so a downstream consumer can tell at a glance whether the link came from text-pattern matching or from taxonomy overlap.
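A downstream consumer could separate the two classes with a check as small as the one below. This is a hedged sketch: it assumes regex-derived links carry no text field at all, and the `href` format, the `rel` value, and both function names are hypothetical rather than taken from the published catalog.

```python
def topic_link(target_id, shared):
    """Build a link entry for a topic-derived cross-reference."""
    return {
        "href": f"#{target_id}",            # assumed href convention
        "rel": "related",                   # assumed rel value
        "text": ", ".join(sorted(shared)),  # shared topic names, sorted
    }


def link_origin(link):
    """Classify a link by whether it carries the shared-topic text field."""
    return "topic-graph" if "text" in link else "regex"


# One link of each class, with made-up topic names.
regex_link = {"href": "#MANAGE-4.3", "rel": "related"}
graph_link = topic_link("MANAGE-4.3", {"Risk Management", "Incident Response"})

origins = [link_origin(regex_link), link_origin(graph_link)]
```

Sorting the topic names before joining them keeps the text field stable across runs, which matters if the catalog is meant to regenerate byte for byte.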

Reproducibility, Defensibility, and What Comes Next

The script is ~120 lines of Python. Anyone can pull the repo, run python src/topic_cross_references.py, and reproduce the exact 145 links. The thresholds are tunable but versioned in the source. When we adjust them, the diff against the published catalog is reviewable line by line.
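To illustrate what "reviewable line by line" can look like, the sketch below renders two link sets as sorted text and diffs them with the standard library's `difflib`. The link pairs and the helper names are hypothetical; the real catalog is OSCAL, so an actual review would diff the generated file itself.

```python
import difflib


def link_lines(links):
    """Render (source, target) pairs as sorted, one-per-line text."""
    return [f"{src} -> {dst}" for src, dst in sorted(links)]


def review_diff(published, regenerated):
    """Return unified-diff lines between two link sets."""
    return list(difflib.unified_diff(
        link_lines(published),
        link_lines(regenerated),
        fromfile="published",
        tofile="regenerated",
        lineterm="",
    ))


# Hypothetical link sets: a threshold change drops one link.
published = [("GOVERN-1.1", "MANAGE-4.3"), ("MAP-1.1", "MEASURE-2.1")]
regenerated = [("GOVERN-1.1", "MANAGE-4.3")]

for line in review_diff(published, regenerated):
    print(line)
```

An unchanged run produces an empty diff, and a threshold change shows up as a handful of added or removed lines that a reviewer can question one at a time.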

For NIST OSCAL Team review (thread usnistgov/OSCAL#2234), this matters. The choice between "we used an LLM" and "we used inverse-frequency topic overlap with these published thresholds" is the difference between a reviewer's nod and a reviewer's question.

There are cases where LLM augmentation will eventually help — distinguishing genuine semantic links from coincidental topic overlap, for example. But those cases benefit from being layered on top of a deterministic baseline, not replacing it.

Source: topic_cross_references.py · Catalog repo · OSCAL Team thread