When to Simulate
Not Every Problem Deserves a Model
A rural health network’s chief operating officer asks: “We’re thinking about consolidating two EDs. Can you model that?” The analytics team spends eight weeks building a discrete-event simulation with 14 entity types, 6 staffing configurations, and time-varying arrival distributions. The model is rigorous. It answers the question. It cost $85,000 in staff time and consulting fees.
The same COO asks: “How many nurses do we need on night shift?” The team reaches for the simulation platform again. Another six weeks. Another model. The answer — 4 nurses, given the arrival rate and target wait time — could have been produced in 20 minutes with an Erlang-C calculator and a spreadsheet.
Simulation is powerful. It is also expensive in time, data, expertise, and organizational attention. The first question warranted it. The second did not. Knowing the difference is an operational skill that most healthcare organizations lack, which means they either under-invest (spreadsheet averages for problems that require simulation) or over-invest (full DES models for problems that Little’s Law could resolve before lunch). Both failures cost real money and real time.
Three Tiers of Analytical Complexity
Tier 1 — Back of Envelope
Methods: Little’s Law (Module 2), basic utilization calculation, Erlang-C staffing tables, simple sensitivity analysis, breakeven arithmetic.
When it fits: Single resource type. Relatively stable demand. The question is directional or the stakes are moderate. You need an answer today, not next quarter.
Tools: Calculator, spreadsheet, published Erlang-C tables.
Healthcare examples:
- How many nurses on night shift? With an arrival rate of 3 patients/hour, an average service time of 45 minutes (an offered load of 3 x 0.75 = 2.25 nurses' worth of work), and a target probability of waiting under 10 minutes, Erlang-C gives you the answer directly.
- Is our grant budget at risk of overrun? With well-bounded line items, a simple sensitivity analysis (what if personnel costs are 15% above estimate?) may suffice without Monte Carlo.
- What is our ED’s current patient census driven by? Little’s Law: L = arrival rate x average length of stay. If 5 patients/hour arrive and each stays 4 hours on average, the average census is 20 patients. No model needed.
The virtue of Tier 1: Transparency. Anyone can follow the arithmetic. The assumptions are visible. The answer arrives fast enough to inform the decision it was meant to support.
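The census and night-shift examples above can be sketched in a few lines of Python. This is a minimal illustration, not a staffing tool: it assumes Poisson arrivals and exponential service times (the standard M/M/c setting behind Erlang-C), treats each nurse as one server, and uses a hypothetical target of 80% of patients waiting under 10 minutes, since the example does not state a target value. The function names are illustrative.

```python
import math

def littles_law_census(arrival_rate_per_hour, avg_stay_hours):
    """Little's Law: average number in system = arrival rate * average time in system."""
    return arrival_rate_per_hour * avg_stay_hours

def erlang_c_wait_probability(servers, offered_load):
    """Probability that an arrival has to wait in an M/M/c queue (Erlang-C)."""
    if servers <= offered_load:
        return 1.0  # unstable: the queue grows without bound
    rho = offered_load / servers
    below = sum(offered_load ** k / math.factorial(k) for k in range(servers))
    top = offered_load ** servers / math.factorial(servers) / (1 - rho)
    return top / (below + top)

def service_level(servers, arrival_rate, service_rate, max_wait_hours):
    """P(wait <= max_wait_hours) for an M/M/c queue."""
    offered_load = arrival_rate / service_rate
    if servers <= offered_load:
        return 0.0
    p_wait = erlang_c_wait_probability(servers, offered_load)
    decay = math.exp(-(servers * service_rate - arrival_rate) * max_wait_hours)
    return 1 - p_wait * decay

def min_servers(arrival_rate, service_rate, max_wait_hours, target):
    """Smallest number of servers whose service level meets the target."""
    servers = math.floor(arrival_rate / service_rate) + 1
    while service_level(servers, arrival_rate, service_rate, max_wait_hours) < target:
        servers += 1
    return servers

# ED census example: 5 patients/hour, 4-hour average stay.
census = littles_law_census(5, 4)

# Night-shift example: 3 patients/hour, 45-minute service, 10-minute wait limit.
# The 0.80 target is an assumption made for this sketch.
nurses = min_servers(3, 1 / 0.75, 10 / 60, target=0.80)
print(census, nurses)
```

Under these assumptions the census comes out at 20 patients and the staffing search stops at 4 nurses. A stricter target would raise the staffing number, which is why the target belongs among the stated assumptions.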
Tier 2 — Analytical Models
Methods: M/M/c queueing models, linear and integer programming (Module 3), network flow analysis (Module 4), CPM/PERT scheduling (Module 5), Monte Carlo simulation (Module 6).
When it fits: Well-defined problem structure. Standard or estimable probability distributions. Multiple decision variables but tractable constraints. You need an optimal or near-optimal solution, not just a directional estimate.
Tools: Specialized spreadsheet models, optimization solvers (Excel Solver, open-source LP/IP solvers), Monte Carlo add-ins or scripts.
Healthcare examples:
- How should we allocate $2M across 6 program investments? This is a constrained optimization problem. Define the objective (maximize population health impact, minimize access gaps), specify constraints (budget, minimum per program, FTE availability), solve with LP or IP.
- Is our grant budget at risk of overrun? When line items have genuine uncertainty ranges and interact (e.g., delayed hiring shifts costs to later periods), Monte Carlo simulation propagates that uncertainty into a probability distribution of total cost. You get “18% chance of exceeding budget by more than 10%” instead of “we think it’s fine.”
- What is the critical path in our 18-month transformation program? CPM analysis identifies which task sequences govern the timeline and which have float — preventing the team from treating all 47 tasks as equally urgent.
The virtue of Tier 2: Rigor without massive infrastructure. These methods produce defensible, often optimal answers. They require analytical skill but not simulation software or months of calendar time.
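The grant-budget example above can be sketched as a small Monte Carlo run using only the Python standard library. Everything specific here is hypothetical: the line items, their low / most-likely / high ranges, the triangular distributions, and the budget figure. The sketch also samples line items independently, so it does not capture the interactions (such as delayed hiring shifting costs) that the example mentions; those would need to be modeled explicitly.

```python
import random

# Hypothetical line items: (low, most likely, high) cost in dollars.
LINE_ITEMS = {
    "personnel": (400_000, 450_000, 560_000),
    "equipment": (80_000, 100_000, 150_000),
    "travel": (15_000, 20_000, 30_000),
}

def overrun_probability(line_items, budget, overrun_fraction=0.10,
                        trials=20_000, seed=42):
    """Estimate P(total cost > budget * (1 + overrun_fraction))."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    threshold = budget * (1 + overrun_fraction)
    exceed = 0
    for _ in range(trials):
        # random.triangular takes (low, high, mode), in that order.
        total = sum(rng.triangular(low, high, mode)
                    for low, mode, high in line_items.values())
        if total > threshold:
            exceed += 1
    return exceed / trials

p = overrun_probability(LINE_ITEMS, budget=600_000)
print(f"Estimated chance of exceeding budget by more than 10%: {p:.1%}")
```

The output is a probability rather than a single point estimate, which is the shift in framing the example describes.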
Tier 3 — Simulation
Methods: Discrete-event simulation, agent-based modeling, system dynamics (all Module 6).
When it fits: Multiple interacting resource types. Non-standard or empirical distributions. Time-varying demand. Complex routing logic (if-then rules, priority overrides, conditional transfers). Feedback loops where congestion in one part of the system changes behavior in another. The stakes are high enough to justify the investment. You need to test interventions before deploying them in the real system.
Tools: Simulation software (Arena, Simul8, AnyLogic, FlexSim), custom code (Python SimPy, R simmer).
Healthcare examples:
- What happens if we consolidate two EDs? Ambulance routing, triage distributions, boarding dynamics, staffing patterns, and transport times interact in ways no formula captures. DES is the right tool.
- How will adding a behavioral health crisis stabilization unit affect ED boarding? The answer depends on diversion rates, length-of-stay distributions, staffing coverage, and time-of-day effects that produce emergent system behavior.
The virtue of Tier 3: It handles reality’s full complexity. It reveals emergent behaviors — like the transport-time mortality gap in the ED consolidation example from Module 6’s simulation foundations page — that simpler methods cannot detect.
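To give a feel for what simulation adds, here is a minimal queue simulation in plain Python. It is a sketch well short of a real DES model: it has one resource pool, no routing, and invented parameters (the hourly arrival pattern, the 70/30 mix of quick and complex cases, and the service-time means are all hypothetical). It illustrates only two of the Tier 3 triggers, time-varying demand and bimodal service times, neither of which fits standard queueing formulas.

```python
import heapq
import random

def hourly_rate(t):
    """Hypothetical arrival rate (patients/hour): busier from 10:00 to 22:00."""
    return 6.0 if 10 <= t % 24 < 22 else 2.0

def draw_service_hours(rng):
    """Hypothetical bimodal service: 70% quick visits, 30% complex cases."""
    mean_hours = 0.25 if rng.random() < 0.7 else 2.0
    return rng.expovariate(1 / mean_hours)

def simulate_mean_wait(servers, horizon_hours=24 * 30, max_rate=6.0, seed=1):
    """Mean wait (hours) in a FIFO queue with `servers` identical servers."""
    rng = random.Random(seed)
    free_at = [0.0] * servers  # min-heap: when each server next becomes free
    heapq.heapify(free_at)
    t, waits = 0.0, []
    while True:
        # Thinning: propose arrivals at max_rate, keep each one with
        # probability hourly_rate(t) / max_rate.
        t += rng.expovariate(max_rate)
        if t >= horizon_hours:
            break
        if rng.random() > hourly_rate(t) / max_rate:
            continue
        service = draw_service_hours(rng)
        start = max(t, heapq.heappop(free_at))  # earliest-free server takes the patient
        waits.append(start - t)
        heapq.heappush(free_at, start + service)
    return sum(waits) / len(waits)

for c in (4, 5, 6):
    print(c, "servers -> mean wait (hours):", round(simulate_mean_wait(c), 2))
```

Because the seed fixes the arrival and service sequence, runs with different server counts face the same simulated patients, which makes the comparison between staffing levels easier to read.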
Five Questions That Route You to the Right Tier
Work through these in order. The first “yes” that triggers an escalation moves you up.
1. Are there multiple resource types that interact? A single nurse pool is Tier 1. Nurses, beds, physicians, and imaging equipment sharing patients with conditional routing is Tier 3. Two independent resource pools that don’t interact may be two separate Tier 1 problems.
2. Are the distributions non-standard or unknown? If arrivals are roughly Poisson and service times are roughly exponential, standard queueing formulas apply (Tier 1-2). If service times are bimodal (quick visits and complex cases), or arrivals surge unpredictably, you need simulation to capture the tail behavior (Tier 3).
3. Does demand vary significantly over time? Steady-state assumptions underpin most Tier 1-2 methods. If the system never reaches steady state — because demand peaks and troughs faster than the system can equilibrate — simulation captures the transient dynamics that formulas miss.
4. Is the routing complex? Patients who arrive, get served, and leave are queueing problems. Patients who arrive, get triaged to one of four tracks, may return to the queue after imaging, may board for admission, and may be transferred — that routing complexity demands simulation.
5. Are the stakes high enough to justify the cost? A staffing question for next Tuesday does not need eight weeks of model development. A $15 million facility consolidation decision does. Match the analytical investment to the decision’s reversibility and financial magnitude.
If you answered “no” to all five, you are in Tier 1. Solve it with a calculator and move on.
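As a rough sketch, the checklist above can be encoded as an ordered list of yes/no answers. The function and constant names are hypothetical, and the sketch deliberately stops at identifying the first escalation trigger; it does not try to choose between Tier 2 and Tier 3, which still takes judgment about the problem structure.

```python
QUESTIONS = [
    "multiple interacting resource types",
    "non-standard or unknown distributions",
    "demand varies significantly over time",
    "complex routing",
    "stakes high enough to justify the cost",
]

def first_escalation_trigger(answers):
    """Return the first question answered 'yes', or None if all are 'no' (Tier 1)."""
    if len(answers) != len(QUESTIONS):
        raise ValueError("expected one answer per question")
    for question, is_yes in zip(QUESTIONS, answers):
        if is_yes:
            return question
    return None

# Night-shift staffing: every answer is 'no', so stay in Tier 1.
print(first_escalation_trigger([False, False, False, False, False]))

# ED consolidation: several answers are 'yes'; the first one is reported.
print(first_escalation_trigger([True, True, True, True, True]))
```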
The Cost of Getting It Wrong
Over-engineering wastes time and obscures insight. A DES model built to answer “how many FTEs do we need in the call center?” takes weeks to develop, requires input data collection, demands validation runs, and produces an answer that Erlang-C would have delivered in an afternoon. Worse, the complexity of the model makes the answer opaque — stakeholders cannot follow the logic, so they either accept it on faith or reject it on suspicion. Neither response is useful.
Under-engineering produces confidently wrong answers. A spreadsheet that averages daily ED arrivals and multiplies by average length of stay to compute “required beds” will systematically underestimate need. It yields the average census, not the capacity needed to absorb peaks, because it ignores variability, the force that drives the utilization-delay curve (Module 2). The answer looks precise. It is precisely wrong. And because it came from a spreadsheet that leadership can follow, it carries unwarranted credibility.
The pattern: over-engineering burns calendar time and analyst capacity on problems that did not need it. Under-engineering burns operational capacity and patient outcomes on answers that should not have been trusted. The five routing questions above exist to prevent both.
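The under-engineering failure is easy to see with the simplest possible queue. The sketch below assumes a single-server M/M/1 queue (Poisson arrivals, exponential service), where the mean wait in queue, measured in multiples of the mean service time, is utilization / (1 - utilization). Module 2's curve may rest on a different model; this only shows the shape.

```python
def mean_wait_in_service_times(utilization):
    """Mean M/M/1 queue wait, expressed in multiples of the mean service time."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1) for a stable queue")
    return utilization / (1 - utilization)

for rho in (0.50, 0.75, 0.90, 0.95):
    waits = mean_wait_in_service_times(rho)
    print(f"utilization {rho:.0%}: mean wait = {waits:.1f} service times")
```

At 50% utilization the mean wait equals one service time; at 75% it is three, at 90% nine, and at 95% nineteen. An average-based spreadsheet sees spare capacity in every one of those cases.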
Warning Signs
- Every analytical request becomes a simulation project. If the team’s only tool is DES, every problem looks like a DES problem. Most operational questions have simpler answers.
- No one can explain the model’s logic to a non-analyst. Simulation models that cannot be summarized in plain language are either solving the wrong problem or solving it at the wrong level of complexity.
- Tier 1 answers are dismissed because they “seem too simple.” Little’s Law is simple. It also holds exactly for long-run averages in any stable system, with no distributional assumptions. Simplicity is a feature, not a deficiency.
- Spreadsheet averages are used for decisions involving high variability. Any time someone says “on average we have capacity,” check whether the utilization is above 75%. If it is, the average is masking a delay problem that only stochastic analysis reveals.
- The model takes longer to build than the decision window allows. A simulation that delivers results after the decision has been made is an academic exercise, not a decision tool.
Product Implications
Surface the tier recommendation, not just the answer. When an operator asks a capacity question, the product should indicate whether the answer was derived from a formula, an optimization, or a simulation — and why. Transparency about method builds trust and prevents misapplied precision.
Embed Tier 1 calculations natively. Little’s Law, utilization calculations, and Erlang-C staffing should be built into operational dashboards. These require no specialized expertise and answer 60-70% of routine capacity questions without analyst involvement.
Gate simulation requests through a complexity checklist. Before committing analyst time to a simulation project, the product or process should require answers to the five routing questions above. This prevents the reflexive escalation to Tier 3 that wastes analytical capacity.
Earliest degradation metric: the gap between Tier 1 estimates and observed reality. When a simple Erlang-C staffing estimate predicts adequate capacity but wait times are climbing, the gap signals that the system has complexity (interactions, routing, time-variance) that Tier 1 methods are not capturing. That gap is the trigger to escalate; until it appears, stay with the simpler method.
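One way the gap metric could be tracked is sketched below. The function names, the 25% threshold, and the three-consecutive-periods rule are all hypothetical choices for illustration; a real product would tune them against its own data.

```python
def relative_gap(predicted_wait, observed_wait):
    """Observed wait relative to the Tier 1 prediction (0.25 means 25% above)."""
    if predicted_wait <= 0:
        raise ValueError("predicted wait must be positive")
    return (observed_wait - predicted_wait) / predicted_wait

def should_escalate(predicted_wait, observed_waits, threshold=0.25, consecutive=3):
    """True if the last `consecutive` observations all exceed the prediction
    by more than `threshold` (both defaults are assumptions)."""
    recent = observed_waits[-consecutive:]
    if len(recent) < consecutive:
        return False  # not enough history to call it a sustained gap
    return all(relative_gap(predicted_wait, w) > threshold for w in recent)

# Tier 1 predicted a 10-minute wait; observed weekly averages are climbing.
print(should_escalate(10, [9, 11, 14, 15, 16]))
print(should_escalate(10, [9, 11, 12]))
```

Requiring several consecutive periods above the threshold is one simple guard against escalating on a single noisy week.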
Integration Hooks
Module 2: Queueing Theory and Wait-Time Dynamics. Tier 1 methods draw directly from Module 2. Little’s Law, utilization calculations, and Erlang-C are the back-of-envelope tools that handle single-resource, steady-state problems. The utilization-delay curve from 02-utilization-delay-curve.md is the single most important Tier 1 insight — it explains why systems fail before they appear full.
Module 6: Simulation and Scenario Analysis. This page is the gatekeeper for Module 6. The simulation foundations page (06-simulation-foundations.md) opens with a problem that clearly requires Tier 3 — the ED consolidation with interacting ambulance zones, triage distributions, and boarding dynamics. This page defines when that level of complexity is justified and when simpler methods from Modules 2-5 suffice. Read Module 6 as the “how.” Read this page as the “whether.”