NIST SP 800-61r3: What Changed and What It Means for Your Tabletop Program

In April 2025, NIST published SP 800-61r3, Incident Response Recommendations and Considerations for Cybersecurity Risk Management: A CSF 2.0 Community Profile. It supersedes SP 800-61r2, which had stood unchanged since August 2012. R2 gave incident handlers step-by-step operational guidance. R3 operates at a higher level of abstraction, organizing its recommendations as a CSF 2.0 Community Profile and integrating incident response across all six Cybersecurity Framework functions. For practitioners who built programs around r2, several structural shifts require attention.

From a Four-Phase Lifecycle to Six CSF 2.0 Functions

R2 organized incident response into four sequential phases: Preparation; Detection and Analysis; Containment, Eradication, and Recovery; and Post-Incident Activity. Improvements identified during post-incident activities fed back into the Preparation phase, completing the loop.

R3 replaces this model with the six CSF 2.0 Functions: Govern, Identify, Protect, Detect, Respond, and Recover. The document arranges these functions into three tiers:

Foundation: Govern, Identify, and Protect. The broader cybersecurity risk management activities that support incident response.
Incident response lifecycle: Detect, Respond, and Recover. The core operational sequence.
Connective layer: the Improvement Category (ID.IM) within the Identify Function. Continuous improvement sits here, fed by lessons learned from all six functions.

Under r2, improvement was the final phase of a sequential process. Under r3, Improvement is a persistent Category connected to every Function through bidirectional feedback: lessons learned from any activity, at any stage, flow into Improvement and back out to all six Functions. The return-to-Preparation loop gives way to continuous feedback, though NIST notes that organizations may continue using the life cycle framework that suits them best.
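
For readers who think in data structures, the arrangement described above can be sketched as a small lookup table plus a set of feedback edges. This is an illustrative model only: the names Tier, FUNCTION_TIER, and feedback_edges are assumptions made for this sketch and do not appear in SP 800-61r3.

```python
from enum import Enum


class Tier(Enum):
    """The two groupings of CSF 2.0 Functions described in the text."""
    FOUNDATION = "foundation"
    LIFECYCLE = "incident response lifecycle"


# Hypothetical lookup table: each CSF 2.0 Function and the tier it sits in.
FUNCTION_TIER = {
    "Govern": Tier.FOUNDATION,
    "Identify": Tier.FOUNDATION,
    "Protect": Tier.FOUNDATION,
    "Detect": Tier.LIFECYCLE,
    "Respond": Tier.LIFECYCLE,
    "Recover": Tier.LIFECYCLE,
}

# The Improvement Category, which acts as the connective layer.
IMPROVEMENT = "ID.IM"


def feedback_edges():
    """Return directed (source, target) pairs for lesson flows.

    Every Function sends lessons to ID.IM and receives them back,
    which is what "bidirectional feedback" means in this sketch.
    """
    edges = []
    for function in FUNCTION_TIER:
        edges.append((function, IMPROVEMENT))
        edges.append((IMPROVEMENT, function))
    return edges
```

Under the r2 model the equivalent structure would hold a single edge, from Post-Incident Activity back to Preparation. Here every Function has an edge in each direction.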

NIST states the practical driver directly: incidents today "occur frequently and cause far more damage" than they did in 2012, and recovery "often takes weeks or months." A linear model built for rare, contained incidents cannot serve a landscape where incidents are continuous and complex.

Where Tabletop Exercises Sit in R3

Under the new model, tabletop exercises fall within the Identify Function, inside the Improvement Category. Subcategory ID.IM-02 states:

Improvements are identified from security tests and exercises, including those done in coordination with suppliers and relevant third parties.

NIST assigns ID.IM-02 a High priority within the context of incident response. The accompanying guidance notes that "incident response exercises and tests may provide helpful information for program evaluation and prepare staff and involved third parties (such as critical service providers and product suppliers) for future incident response activities."

Under r2, exercises belonged to the Preparation phase, positioned before incidents occurred. Under r3, improvements identified from exercises fall under ID.IM-02, a Subcategory of the Improvement Category (ID.IM) within the Identify Function. That Category exchanges lessons with all six Functions continuously and in both directions. An exercise is therefore an improvement activity: it draws lessons from every stage of incident response and feeds them back to every Function.

The explicit inclusion of suppliers and relevant third parties in ID.IM-02 underscores that incident response now involves a distributed network of actors: incident handlers, legal counsel, managed security service providers, cloud service providers, and product suppliers. Exercises limited to the internal security team miss the third-party coordination that ID.IM-02 and GV.SC-08 both highlight as valuable. GV.SC-08 specifically addresses including relevant suppliers and third parties in incident planning, response, and recovery activities.

SP 800-84 Remains the Authoritative Guidance on Exercise Design

R3 does not specify how to design and conduct exercises. That remains the domain of NIST SP 800-84, Guide to Test, Training, and Exercise Programs for IT Plans and Capabilities. The ID.IM-02 guidance note is direct: "See [SP 800-84] for more information on simulations, tabletop discussions, and other forms of exercises."

Practitioners who need procedural guidance on exercise planning, scenario development, facilitator roles, and after-action reporting should treat SP 800-84 as the operational companion to r3's strategic framework. R3 defines where exercises fit and why they matter. SP 800-84 defines how to run them.

Three Practical Implications

Frequency. ID.IM connects all functions continuously. Organizations whose programs consist of a single annual tabletop should examine whether that cadence reflects the continuous improvement model r3 describes. NIST does not prescribe a specific exercise frequency, but the shift from a terminal Post-Incident Activity phase to a persistent Improvement Category suggests that lessons should be gathered more often than a once-a-year exercise allows.

Scope. ID.IM-02 explicitly names suppliers and third parties. Exercises conducted only with internal teams leave an unaddressed gap. At minimum, exercise programs should include the critical service providers and product suppliers that carry incident response responsibilities under existing agreements.

Traceability. R3 connects improvement to all six functions. Exercise findings should therefore be traceable to specific gaps in Govern, Identify, Protect, Detect, Respond, and Recover. After-action reports that produce only general recommendations fall short of the function-level traceability the r3 model makes possible.
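
One way to make function-level traceability concrete is to tag each after-action finding with the CSF 2.0 Function it belongs to and then check which Functions an exercise never touched. The sketch below is a minimal illustration under stated assumptions: the Finding record, its field names, and both helper functions are hypothetical, and the sample findings are invented for the example rather than drawn from any real exercise.

```python
from dataclasses import dataclass

CSF_FUNCTIONS = ("Govern", "Identify", "Protect", "Detect", "Respond", "Recover")


@dataclass(frozen=True)
class Finding:
    """A single after-action finding (hypothetical record layout)."""
    summary: str
    csf_function: str
    subcategory: str = ""  # optional CSF 2.0 Subcategory ID, e.g. "GV.SC-08"

    def __post_init__(self):
        # Reject findings that cannot be traced to a CSF 2.0 Function.
        if self.csf_function not in CSF_FUNCTIONS:
            raise ValueError(f"unknown CSF Function: {self.csf_function}")


def group_by_function(findings):
    """Group findings under each of the six Functions, keeping empty groups."""
    grouped = {function: [] for function in CSF_FUNCTIONS}
    for finding in findings:
        grouped[finding.csf_function].append(finding)
    return grouped


def untested_functions(findings):
    """List the Functions for which the exercise produced no findings."""
    grouped = group_by_function(findings)
    return [function for function in CSF_FUNCTIONS if not grouped[function]]


# Invented sample data for illustration only.
sample = [
    Finding("Supplier contact list was out of date", "Govern", "GV.SC-08"),
    Finding("Escalation to legal counsel was delayed", "Respond"),
]
print(untested_functions(sample))
```

An empty group is ambiguous: it may mean the Function performed well, or that the scenario never exercised it. Listing those Functions explicitly in the report keeps that question visible for the next exercise cycle.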

How R3 Requirements Map to Practice: Reflex Security

The requirements in ID.IM-02 and the continuous improvement model of r3 align with specific capabilities in the Reflex Security platform. This section examines each requirement against documented platform features.

Frequency and cost structure. Consulting-led tabletops typically cost between $50,000 and $100,000 per engagement. At that price, most organizations run one per year. Reflex's subscription model supports monthly or quarterly exercises at a fraction of the cost. IBM and Ponemon Institute data from 2025 found that organizations testing incident response at least twice a year reduced breach costs by $1.49 million on average. Operationalizing the continuous improvement intent of ID.IM at scale is difficult without a cost structure that sustains repeated exercise cycles.

Third-party inclusion. ID.IM-02 highlights exercises done in coordination with suppliers and relevant third parties as a source of improvement, and GV.SC-08 specifically calls for relevant suppliers and third parties to be included in incident planning, response, and recovery activities. Reflex deploys AI agents that simulate the roles of legal counsel, public communications teams, cyber insurers, executives, and external stakeholders. These agents can participate regardless of whether the actual personnel are available, removing the scheduling friction that typically excludes third-party roles from exercises.

CSF 2.0 traceability. Reflex generates after-action reports that map participant behaviors and findings to NIST CSF functions. Reports draw on timestamped activity logs, producing audit-ready documentation aligned with the CSF 2.0 structure. Findings include evidence excerpts, priority rankings, and assigned action items.

Longitudinal improvement data. The continuous model of ID.IM assumes organizations track performance trends across exercise cycles. Reflex captures quantitative metrics across repeated exercises: response time trends, recurring gaps, and maturity levels mapped to industry frameworks. This produces the longitudinal record necessary to demonstrate that improvements are being identified and acted upon.
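
The kind of longitudinal tracking described above can be illustrated with two small calculations: a trend line over a response-time metric and a count of gaps that recur across cycles. This is a generic sketch, not a description of how the Reflex platform computes its metrics. The function names, the choice of a least-squares slope, and the sample figures are all assumptions made for the example.

```python
from collections import Counter


def trend_slope(values):
    """Least-squares slope of a metric across consecutive exercise cycles.

    A negative slope means the metric (for example, minutes to escalate)
    is falling from one cycle to the next.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two exercise cycles")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    numerator = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    denominator = sum((i - mean_x) ** 2 for i in range(n))
    return numerator / denominator


def recurring_gaps(cycles, min_cycles=2):
    """Return gaps that appear in at least min_cycles separate exercises."""
    counts = Counter(gap for cycle in cycles for gap in set(cycle))
    return sorted(gap for gap, seen in counts.items() if seen >= min_cycles)


# Invented sample data: minutes to escalate in four quarterly exercises.
minutes_to_escalate = [90, 75, 60, 45]
print(trend_slope(minutes_to_escalate))

# Invented sample data: gap labels recorded in each exercise.
gaps_per_cycle = [
    ["no supplier contact", "unclear decision authority"],
    ["no supplier contact"],
    ["late regulator notice", "no supplier contact"],
]
print(recurring_gaps(gaps_per_cycle))
```

A gap that keeps reappearing is evidence that an improvement was identified but not acted upon, which is the distinction a longitudinal record is meant to surface.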

Realistic scenario design. SP 800-84 emphasizes realistic exercise design. Reflex builds scenarios from OSINT data about the organization's actual technology stack. AI adversaries respond dynamically to participant decisions, producing cascading consequences that evolve with each choice. The result is a more accurate assessment of team readiness and a stronger basis for identifying improvements.