Prior Authorization as a Queueing System

An Engineerable Process Disguised as Paperwork

A 15-provider multi-specialty practice submits 400 prior authorization requests per month. Each request enters a multi-stage queue: submission, payer review, decision (approval or denial), and, for denied requests, appeal. The practice’s average time-to-resolution is 8.4 business days. The denial rate is 18%. Of denied requests, 60% are appealed, and of those appeals, 60% are ultimately approved. Two full-time staff (2.0 FTE, roughly 80 hours per week combined) manage the PA process: collecting clinical documentation, completing payer-specific forms, following up on pending requests, and filing appeals.

This is not an administrative annoyance. It is a queueing system with measurable arrival rates, service times, utilization, rework loops, and throughput. Every parameter is observable. Every bottleneck is identifiable. And the system’s behavior follows the same queueing dynamics that govern ED flow, bed management, and referral networks. The difference is that almost no one treats it that way. Prior authorization is managed as a clerical burden to be endured rather than an operations problem to be engineered.

The AMA’s 2022 Prior Authorization Physician Survey found that physicians and their staff spend the equivalent of nearly two business days each week completing prior authorizations, and that practices complete an average of 45 prior authorizations per physician per week. Gottlieb et al., writing in Health Affairs, documented that PA-related administrative burden consumes between 12% and 18% of total practice operating costs in high-authorization specialties like rheumatology, oncology, and cardiology. These are not sunk costs. They are capacity diverted from patient care, and operations research (OR) methods can quantify that capacity and partially reclaim it.

The Multi-Stage Queue

Prior authorization operates as a tandem queue — a series of sequential processing stages, each with its own service rate, variability, and failure mode.

Stage 1: Submission. The practice assembles clinical documentation, completes the payer-specific form, and transmits the request. This stage is labor-intensive and variable. A straightforward radiology PA for an established protocol might take 15 minutes. A complex medication PA requiring chart review, a letter of medical necessity, and step-therapy documentation can take 90 minutes or more. The service time distribution at this stage has high variance (a coefficient of variation well above 1.0), which, per the Pollaczek-Khinchine formula, inflates queue length even at moderate utilization: mean wait scales with (1 + CV^2) / 2. When one staff member handles all PAs, they become a single-server queue with highly variable service times, one of the worst configurations for wait-time performance.
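The effect of service-time variability can be sketched with the Pollaczek-Khinchine mean-wait formula for an M/G/1 queue. This is a minimal illustration, not a model of any real practice: the arrival rate and mean service time below are hypothetical, and Poisson arrivals to a single server are assumed.

```python
def mg1_mean_wait(arrival_rate: float, mean_service: float, cv: float) -> float:
    """Mean time waiting in queue (excluding service) for an M/G/1 queue.

    Pollaczek-Khinchine: Wq = rho / (1 - rho) * (1 + cv^2) / 2 * mean_service
    """
    rho = arrival_rate * mean_service  # server utilization
    if not 0 <= rho < 1:
        raise ValueError("utilization must be in [0, 1) for a stable queue")
    return (rho / (1 - rho)) * ((1 + cv ** 2) / 2) * mean_service


# Hypothetical single PA specialist: 1 request per hour, 45-minute mean service time.
ARRIVAL_RATE = 1.0   # requests per hour (assumed)
MEAN_SERVICE = 0.75  # hours per request (assumed)

for cv in (0.5, 1.0, 1.5):
    wait = mg1_mean_wait(ARRIVAL_RATE, MEAN_SERVICE, cv)
    print(f"CV = {cv}: mean wait in queue = {wait:.2f} hours")
```

Utilization is identical in all three cases (75%); only the variability of service time changes, and the mean wait grows with it.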

Stage 2: Payer Review. The request sits in the payer’s queue. Processing times vary by payer, by authorization type, and by whether the request triggers medical director review. CMS data on Medicare Advantage prior authorization shows median decision times of 3-5 business days, but the distribution is right-skewed: 10-15% of requests take 10+ days. The practice has no control over this stage’s service rate, but it can influence arrival quality — a complete, correctly coded submission avoids the “returned for additional information” loop that restarts the clock.

Stage 3: Decision. Approval clears the queue. Denial creates a branching path: accept the denial, appeal, or pursue a peer-to-peer review. This is the fork that determines whether the request exits the system or re-enters it as rework.

Stage 4: Appeal. Denied requests that are appealed re-enter a slower queue. Appeal processing times typically run 2-4x longer than initial review. The appeal stage has its own denial rate, and in some payer systems, a second-level appeal is available — creating a potential third pass through the queue.

Each stage transition introduces delay, variability, and the possibility of information loss (incomplete documentation forwarded, clinical rationale not carried through to appeal). The total system behaves as a series of M/G/1 queues with rework loops — a structure well-studied in manufacturing operations research (Buzacott and Shanthikumar’s foundational work on queueing networks with feedback) but rarely formalized in healthcare administration.

Denial and Rework: The Feedback Loop That Inflates Demand

When 18% of 400 monthly submissions are denied, that produces 72 denials. If 60% are appealed, 43 requests re-enter the queue as appeals. These 43 cases consume staff time for appeal preparation (typically 1.5-2x the effort of initial submission), occupy payer review capacity, and extend the total time-to-resolution for those patients.

The critical insight: denials do not reduce system load. They increase it. Each denial that triggers an appeal adds a new work item to the queue without adding a new patient or a new clinical need. The effective arrival rate to the PA processing system is not 400/month. It is 400 initial submissions plus 43 appeal submissions, or 443 equivalent work units. Weighting each appeal at 1.5-2x the effort of an initial submission, the labor-equivalent load is roughly 465 to 486 initial-submission-equivalents.

This is a rework multiplier, and it follows a geometric series. If the initial denial rate is d, the appeal rate is a, and appealed requests face the same denial and appeal probabilities on each pass, the rework multiplier on effective arrivals is approximately 1 / (1 - d * a). At d = 0.18 and a = 0.60, the multiplier is 1 / (1 - 0.108) = 1.12, a 12% inflation in effective workload from rework alone. At higher denial rates, the multiplier grows. A practice facing a 30% denial rate with a 70% appeal rate sees a multiplier of 1.27: its workload is inflated by more than a quarter, and roughly a fifth of everything it processes is pure rework, the same clinical need going through the system a second time.
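The multiplier is easy to compute directly. The sketch below assumes, as the geometric series does, that every pass through the queue faces the same denial rate and the same appeal rate; the function names are illustrative only.

```python
def rework_multiplier(denial_rate: float, appeal_rate: float) -> float:
    """Effective arrivals per initial submission: 1 / (1 - d * a)."""
    loop_probability = denial_rate * appeal_rate
    if not 0 <= loop_probability < 1:
        raise ValueError("d * a must be in [0, 1)")
    return 1.0 / (1.0 - loop_probability)


def rework_share(denial_rate: float, appeal_rate: float) -> float:
    """Fraction of total processed work items that are rework."""
    return 1.0 - 1.0 / rework_multiplier(denial_rate, appeal_rate)


for d, a in ((0.18, 0.60), (0.30, 0.70)):
    m = rework_multiplier(d, a)
    share = rework_share(d, a)
    print(f"d = {d:.2f}, a = {a:.2f}: multiplier = {m:.2f}, rework share = {share:.0%}")
```

The two numbers answer different questions: the multiplier says how much workload is inflated relative to initial submissions, while the rework share says what fraction of all processed items are repeats.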

For the revenue cycle, each rework case also delays the associated procedure or medication start. If the average PA-to-procedure interval is 5 days for approved requests and 18 days for denied-then-appealed requests, the 43 monthly rework cases each carry 13 additional days of revenue delay. For a practice where the average procedure value is $2,400, that is 43 x $2,400 = $103,200 of revenue pushed back by 13 days each month (559 case-days of delay), a measurable cash-flow drag. The financial cost of PA delay is not hypothetical: it appears in days in accounts receivable and in procedures that slip from one reporting period to the next.
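The delay arithmetic can be written out as a small sketch using only the figures in the paragraph above. The function names are illustrative, and no cost of capital is applied because the text does not give one.

```python
def delayed_revenue(appealed_cases: int, avg_case_value: float) -> float:
    """Revenue whose collection is pushed back by the denial-appeal cycle."""
    return appealed_cases * avg_case_value


def delay_case_days(appealed_cases: int, extra_days_per_case: int) -> int:
    """Total additional days of delay across all appealed cases."""
    return appealed_cases * extra_days_per_case


APPEALED_CASES = 43      # appeals per month, from the text
EXTRA_DAYS = 18 - 5      # appealed interval minus approved interval, in days
AVG_CASE_VALUE = 2400.0  # average procedure value in dollars, from the text

print(f"Revenue delayed per month: ${delayed_revenue(APPEALED_CASES, AVG_CASE_VALUE):,.0f}")
print(f"Case-days of delay per month: {delay_case_days(APPEALED_CASES, EXTRA_DAYS)}")
```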

Auto-Approval Thresholds: A Signal Detection Problem

Not all prior authorization categories carry meaningful clinical risk. Many are high-volume, low-risk requests where the historical denial rate is below 2% — routine imaging for standard indications, generic medication renewals, established therapy continuations. Reviewing these requests costs more than approving them. The payer’s review labor, the provider’s submission labor, and the patient’s delay collectively exceed the cost of the small number of inappropriate requests that review would catch.

This is a signal detection problem, the same sensitivity/specificity framework described in Human Factors Module 3. The prior authorization review is a diagnostic test: it attempts to distinguish inappropriate requests (signal) from appropriate ones (noise). When the base rate of inappropriate requests is very low (<2%), even a review process with high specificity can generate more false positives (legitimate requests delayed or denied) than true positives (inappropriate requests correctly blocked). The positive predictive value of the review collapses as prevalence drops, a direct application of Bayes’ theorem to administrative process design.
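The collapse of positive predictive value at low prevalence follows directly from Bayes’ theorem. The sensitivity and specificity values below are hypothetical, chosen only to illustrate the effect; no payer’s actual review performance is implied.

```python
def review_ppv(prevalence: float, sensitivity: float, specificity: float) -> float:
    """Probability that a flagged (denied) request is truly inappropriate."""
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * (1 - specificity)
    flagged = true_positives + false_positives
    if flagged == 0:
        raise ValueError("review flags nothing; PPV is undefined")
    return true_positives / flagged


SENSITIVITY = 0.90  # assumed share of inappropriate requests the review catches
SPECIFICITY = 0.95  # assumed share of appropriate requests the review passes

for prevalence in (0.20, 0.05, 0.02):
    ppv = review_ppv(prevalence, SENSITIVITY, SPECIFICITY)
    print(f"Inappropriate-request base rate {prevalence:.0%}: PPV = {ppv:.0%}")
```

Under these assumed values, at a 2% base rate most flagged requests are legitimate ones, which is the pattern the paragraph describes.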

The optimal threshold is calculable. When the cost of reviewing a request (staff time on both sides, patient delay, revenue cycle drag) exceeds the expected cost of approving an inappropriate request (the base rate of inappropriate requests multiplied by the cost of the inappropriate service), review destroys value. For a PA category with a 1.5% denial rate, where each review costs $45 in combined payer-provider labor and each inappropriate approval costs $800 in unnecessary services, the expected cost of blind approval is 0.015 x $800 = $12 per request. The review costs $45 to save $12. Auto-approval is the rational threshold.
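The threshold comparison can be sketched as a break-even rule. This simplification assumes the review would catch every inappropriate request (perfect sensitivity), which overstates the benefit of review; the function names are illustrative.

```python
def expected_cost_of_blind_approval(inappropriate_rate: float,
                                    cost_per_inappropriate: float) -> float:
    """Expected cost per request of approving without review."""
    return inappropriate_rate * cost_per_inappropriate


def should_auto_approve(review_cost: float, inappropriate_rate: float,
                        cost_per_inappropriate: float) -> bool:
    """Auto-approve when review costs more than the waste it could prevent."""
    return review_cost > expected_cost_of_blind_approval(
        inappropriate_rate, cost_per_inappropriate)


def break_even_rate(review_cost: float, cost_per_inappropriate: float) -> float:
    """Inappropriate-request rate above which review starts to pay for itself."""
    return review_cost / cost_per_inappropriate


# Figures from the example in the text.
print(expected_cost_of_blind_approval(0.015, 800))  # expected cost per request
print(should_auto_approve(45, 0.015, 800))          # review vs. blind approval
print(f"{break_even_rate(45, 800):.1%}")            # break-even base rate
```

With a $45 review cost and an $800 cost per inappropriate approval, review only pays for itself once the inappropriate-request rate exceeds 45 / 800, or about 5.6%.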

CMS has recognized this logic in its 2024 final rule on prior authorization for Medicare Advantage, which requires MA plans to implement electronic PA systems and establishes transparency requirements for denial rates by service category — creating the data infrastructure needed for threshold-based auto-approval decisions.

Gold-Carding: Queue Discipline Optimization

Gold-carding exempts providers with consistently high approval rates from the PA queue entirely. Texas enacted gold-carding legislation (HB 3459, effective 2022) requiring health plans to exempt providers whose prior authorization approval rate exceeds 90% for a given service category over the preceding six months.

In queueing terms, gold-carding is a queue discipline optimization. It identifies a subset of arrivals — requests from high-approval providers — that consume service capacity (payer review time) without meaningfully changing outcomes (they would be approved anyway). Removing these arrivals from the queue reduces system load proportionally.

If 40% of a payer’s PA volume comes from providers who would qualify for gold-card status, exempting them reduces payer review queue arrival rate by 40%. Per the utilization-delay curve (Module 2), this reduction in arrival rate — if it moves the payer’s review system from, say, 92% utilization to 55% utilization — would collapse wait times for the remaining 60% of requests. The benefit is nonlinear: removing load from a heavily utilized queue produces wait-time reductions far larger than the proportional decrease in volume.
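The nonlinearity can be illustrated with the simplest queueing model. The sketch assumes an M/M/1 queue, where mean wait in queue is proportional to rho / (1 - rho); a real payer review operation has many reviewers and would need a multi-server model, so the numbers indicate the shape of the effect rather than a forecast.

```python
def mm1_wait_factor(utilization: float) -> float:
    """Mean wait in queue, in units of mean service time, for an M/M/1 queue."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return utilization / (1 - utilization)


BEFORE = 0.92              # utilization before gold-carding, from the text
VOLUME_REMOVED = 0.40      # share of arrivals exempted, from the text
after = BEFORE * (1 - VOLUME_REMOVED)

wait_before = mm1_wait_factor(BEFORE)
wait_after = mm1_wait_factor(after)
print(f"Utilization {BEFORE:.0%} -> {after:.0%}")
print(f"Wait factor {wait_before:.1f} -> {wait_after:.1f} service times")
print(f"Wait reduced by a factor of {wait_before / wait_after:.1f}")
```

A 40% cut in arrivals produces a far larger cut in waiting under this model, because the starting point sits on the steep part of the utilization-delay curve.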

For the provider practice, gold-carding eliminates submission labor entirely for qualifying categories. A practice that achieves gold-card status for 30% of its PA volume recovers approximately 30% of its PA staff time — time that can be redirected to clinical support, complex case management, or reducing backlogs in remaining PA categories.

Worked Example: Engineering the PA Process

Return to the 15-provider multi-specialty practice. Baseline: 400 PAs/month, 18% denial rate, 60% appeal rate on denials, 60% appeal success rate. Staff: 2.0 FTE dedicated to PA processing. Mean time-to-resolution: 8.4 business days.

Intervention 1: Payer-specific submission templates. Analysis of denial reasons reveals that 55% of denials result from incomplete documentation and another 20% from coding errors, not clinical inappropriateness. The practice builds payer-specific submission templates that pre-populate required fields and attach standard clinical documentation by procedure type. Denial rate drops from 18% to 10%.

Effect: denials fall from 72/month to 40/month. Appeals fall from 43/month to 24/month. The rework multiplier drops from 1.12 to 1.06. Effective workload decreases by 19 work units per month (from 443 to 424), or roughly 29 to 38 initial-submission-equivalents once appeals are weighted at 1.5-2x effort. At an average of 45 minutes per initial-submission-equivalent (submission + follow-up), that is roughly 21 to 29 staff-hours/month, or about 0.12 to 0.17 FTE recovered. More importantly, 32 fewer patients receive a denial each month, and 19 fewer go through the denial-appeal cycle and its 13-day delay extension.

Intervention 2: Auto-approval for low-risk categories. The practice identifies three PA categories (routine MRI for established orthopedic indications, generic SSRI renewals, physical therapy continuations) that collectively represent 25% of volume (100 PAs/month) with historical denial rates below 2%. Working with payers to auto-approve these categories (or, where payer cooperation is unavailable, delegating them to the lowest-touch processing path) reduces active PA workload by 100 units/month.

Intervention 3: Gold-card status pursuit. For the remaining PA categories, the practice tracks approval rates by provider and service line, targets 90%+ approval rates through template compliance and clinical documentation standards, and applies for gold-card exemptions where available.

Combined effect of interventions 1 and 2 alone: effective PA workload drops from 443 equivalent units/month to approximately 324 (300 actively processed submissions plus 24 appeals), a 27% reduction. Staff time freed: approximately 0.4 FTE. Mean time-to-resolution improves because the rework loop processes fewer cases and the remaining cases have lower documentation-related denial rates.
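The workload arithmetic for the worked example can be reproduced in a few lines. The sketch counts one work unit per actively processed submission and one per appeal, and it assumes the auto-approved categories contribute no appeals (their historical denial rate is below 2%); both are simplifications.

```python
def work_units(active_submissions: float, denials: float, appeal_rate: float) -> float:
    """Work units per month: processed submissions plus appeals filed."""
    return active_submissions + denials * appeal_rate


SUBMISSIONS = 400
APPEAL_RATE = 0.60
AUTO_APPROVED = 100  # low-risk PAs removed from active processing

baseline = work_units(SUBMISSIONS, SUBMISSIONS * 0.18, APPEAL_RATE)
after_templates = work_units(SUBMISSIONS, SUBMISSIONS * 0.10, APPEAL_RATE)
after_both = work_units(SUBMISSIONS - AUTO_APPROVED, SUBMISSIONS * 0.10, APPEAL_RATE)

print(f"Baseline:             {baseline:.0f} units/month")
print(f"After templates:      {after_templates:.0f} units/month")
print(f"After templates+auto: {after_both:.0f} units/month")
print(f"Combined reduction:   {1 - after_both / baseline:.0%}")
```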

Product Implications

Track PA cycle time by stage, payer, and category. The aggregate “average days to PA resolution” conceals the variation that drives operational decisions. A PA dashboard should decompose cycle time into submission preparation time, payer review time, and appeal time — by payer and by service category. The payer with 3-day median review for imaging but 14-day median for specialty medications requires a different workflow than one with uniform 5-day turnaround.
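A minimal sketch of that decomposition is shown below. The record layout (field names such as prep_days and review_days) and the sample data are hypothetical; a real dashboard would read these from the practice’s PA tracking system.

```python
from collections import defaultdict
from statistics import median

STAGES = ("prep_days", "review_days", "appeal_days")  # hypothetical field names


def median_cycle_times(records):
    """Median days per stage, grouped by (payer, category)."""
    groups = defaultdict(lambda: defaultdict(list))
    for rec in records:
        key = (rec["payer"], rec["category"])
        for stage in STAGES:
            value = rec.get(stage)
            if value is not None:  # e.g. no appeal stage for approved requests
                groups[key][stage].append(value)
    return {
        key: {stage: median(values) for stage, values in stages.items()}
        for key, stages in groups.items()
    }


# Illustrative records only.
sample = [
    {"payer": "Payer A", "category": "imaging", "prep_days": 1, "review_days": 2},
    {"payer": "Payer A", "category": "imaging", "prep_days": 1, "review_days": 3},
    {"payer": "Payer A", "category": "imaging", "prep_days": 2, "review_days": 4},
    {"payer": "Payer A", "category": "specialty_rx", "prep_days": 3, "review_days": 12},
    {"payer": "Payer A", "category": "specialty_rx", "prep_days": 4, "review_days": 14,
     "appeal_days": 20},
    {"payer": "Payer A", "category": "specialty_rx", "prep_days": 3, "review_days": 16},
]

for key, stages in median_cycle_times(sample).items():
    print(key, stages)
```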

Surface denial reasons, not just denial rates. A denial rate of 18% is a symptom. The diagnosis requires knowing that 55% of denials are documentation-related (fixable with templates), 25% are clinical appropriateness disputes (requiring peer-to-peer review), and 20% are coding errors (fixable with validation logic). Each category has a different intervention. Software that classifies denial reasons and tracks them over time converts a compliance headache into an improvement signal.
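The classification step can be sketched as a simple tally that pairs each denial category with its intervention. The category labels and the sample list are hypothetical; the interventions are the ones named in the paragraph above.

```python
from collections import Counter

# Category labels are illustrative; interventions follow the text.
INTERVENTION = {
    "documentation": "payer-specific templates",
    "clinical": "peer-to-peer review",
    "coding": "validation logic",
}


def denial_mix(reasons):
    """Share of denials by reason, most common first."""
    counts = Counter(reasons)
    total = sum(counts.values())
    return {reason: n / total for reason, n in counts.most_common()}


# Illustrative month of 20 classified denials.
sample_reasons = ["documentation"] * 11 + ["clinical"] * 5 + ["coding"] * 4

for reason, share in denial_mix(sample_reasons).items():
    action = INTERVENTION.get(reason, "unclassified")
    print(f"{reason}: {share:.0%} -> {action}")
```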

Calculate and display the rework multiplier. Show the practice not just their denial rate but the effective workload inflation it causes. A denial rate of 18% sounds manageable. “Your denial rate adds 43 cases/month of rework and delays $103,000 in monthly revenue” changes the conversation.

Alert on gold-card eligibility thresholds. For each provider-payer-service combination, track the rolling approval rate. Alert when a provider is within striking distance of gold-card qualification (e.g., 87% when the threshold is 90%) and identify the specific denial patterns preventing qualification. This converts gold-carding from a regulatory compliance exercise into an operational improvement target.
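The alert logic might look like the sketch below. The 5-point alert margin is an arbitrary illustration, and whether the threshold itself is inclusive depends on the applicable statute or plan rules; the comparison used here is an assumption.

```python
def gold_card_status(decisions: list[bool], threshold: float = 0.90,
                     alert_margin: float = 0.05) -> str:
    """Classify a provider-payer-service combination by rolling approval rate.

    decisions: True for each approved request in the evaluation window.
    threshold: qualifying approval rate (treated as inclusive, an assumption).
    alert_margin: how far below the threshold still triggers an alert (assumed).
    """
    if not decisions:
        return "no data"
    approval_rate = sum(decisions) / len(decisions)
    if approval_rate >= threshold:
        return "eligible"
    if approval_rate >= threshold - alert_margin:
        return "approaching"
    return "below"


# Illustrative window: 87 approvals out of 100 requests.
window = [True] * 87 + [False] * 13
print(gold_card_status(window))
```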

Earliest degradation signal: rising appeal volume as a percentage of submissions. When this ratio climbs, it usually means denials are increasing, rework is inflating, and staff capacity is being consumed by reprocessing rather than new submissions. This signal leads revenue impact by 30-60 days and staff burnout by 90 days.

Warning Signs

PA processing treated as clerical work with no performance metrics. If no one tracks cycle time, denial rates by category, or staff hours per PA, the process cannot be improved because it is not measured.
Uniform submission process across all payers. Each payer has different documentation requirements, coding preferences, and review criteria. A one-size-fits-all submission process maximizes documentation-related denials.
Denial rate accepted as a fixed environmental parameter. “Our denial rate is 18% because that is what payers do” treats a controllable variable as a given. The portion of denials driven by incomplete documentation or coding errors is entirely within the practice’s control.
Appeal decisions made case-by-case without tracking ROI. If the practice does not know its appeal success rate by payer and category, it cannot determine whether appealing a given denial is worth the staff time. Some denial categories have 80% appeal success (always appeal). Others have 15% (rarely appeal). Without data, every denial gets the same ad-hoc treatment.
No financial quantification of PA delay. If the practice cannot state the average revenue delay per PA case, it cannot make a business case for process investment. The PA queue is a revenue cycle bottleneck, but it is rarely connected to revenue cycle metrics.
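The appeal ROI question in the list above reduces to an expected-value comparison. The sketch below uses hypothetical staff time and hourly cost figures, and it ignores the opportunity cost of delaying other PA work, so it shows the decision rule rather than prescribing one.

```python
def appeal_expected_value(success_rate: float, case_value: float,
                          appeal_hours: float, hourly_cost: float) -> float:
    """Expected net value of filing one appeal."""
    return success_rate * case_value - appeal_hours * hourly_cost


def break_even_success_rate(case_value: float, appeal_hours: float,
                            hourly_cost: float) -> float:
    """Minimum appeal success rate at which filing pays for the staff time."""
    return (appeal_hours * hourly_cost) / case_value


APPEAL_HOURS = 1.25  # assumed staff time per appeal
HOURLY_COST = 40.0   # assumed loaded staff cost in dollars per hour

for case_value, success_rate in ((2400.0, 0.80), (2400.0, 0.15), (250.0, 0.15)):
    ev = appeal_expected_value(success_rate, case_value, APPEAL_HOURS, HOURLY_COST)
    floor = break_even_success_rate(case_value, APPEAL_HOURS, HOURLY_COST)
    print(f"value ${case_value:,.0f}, success {success_rate:.0%}: "
          f"EV ${ev:,.2f}, break-even success rate {floor:.1%}")
```

Under these assumed costs, a low success rate alone does not rule out an appeal; it is the combination of low success rate and low case value that makes the expected value negative, which is why tracking both by payer and category matters.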

Integration Hooks

Human Factors M3 (Signal Detection and Diagnostic Reasoning). Auto-approval thresholds are a direct application of signal detection theory. The PA review process is a binary classifier: approve or deny. When the base rate of truly inappropriate requests is very low, the review becomes a screening test with poor positive predictive value — most denials are false positives (appropriate requests incorrectly denied). Setting auto-approval thresholds is equivalent to choosing the operating point on an ROC curve: how much sensitivity (catching inappropriate requests) are you willing to sacrifice for specificity (not delaying appropriate ones)? The answer depends on the relative costs of each error type, which are calculable for any given PA category. Ignoring this framework means the threshold is set by payer policy defaults rather than by rational cost-benefit analysis.

Module 2 (Queueing Theory and Wait-Time Dynamics). Prior authorization is a tandem queue with feedback — one of the canonical structures in queueing network theory. The denial-appeal loop is a rework feedback loop that inflates effective arrival rate, exactly as rework loops inflate throughput requirements in manufacturing systems. The utilization-delay curve applies at every stage: payer review queues operating above 85% utilization will produce explosive wait times for the remaining non-gold-card requests. Gold-carding and auto-approval are both demand-side interventions that reduce arrival rate — and per the nonlinear utilization-delay relationship, small reductions in arrival rate near the steep part of the curve produce disproportionate improvements in processing time for all remaining requests.