I've been reading Yes to the Mess by Frank Barrett, and I'm confused about a queueing problem mentioned in it. He gives an example from Kip Hawley's book Permanent Emergency, which I'll quote here:
The Powder River Basin was the single biggest leverage point for increasing profitability in Union Pacific territory [...] one of the most important coal-producing areas in the United States - a place where our trains always seemed to get bottlenecked at a single line of rail leading to the coal fields while transporting coal to many of the nation's electric power plants.
Because on-time performance from this particular spot was so important - a serious delay in delivery could endanger the supply of electricity to the entire city of Atlanta - Union Pacific spent enormous energies trying to improve efficiency. We rushed high-priority coal cars to a continuous queue just outside the single-point entry to the basin. We advanced new, empty cars right after the previous train moved out loaded with coal. But instead of maximizing efficiency, we were overdoing it. One of the consequences of focusing so much operational and tactical energy on wringing every last second out of the process is that we left ourselves precious little slack when something did go wrong. [...]
It is simply the nature of large, heterogeneous systems like a railroad network to have things go wrong all the time. And as soon as something went wrong with one train, the other trains we'd stacked up behind it were stuck. Lining up all the trains in a row, we realized, had effectively squeezed all room for error out of the system and was slowing down our delivery schedule.
After letting that conundrum soak in, one of our brainstorming teams proposed a solution that directly contradicted the time-maximization mode we'd been toiling in. What if, rather than rushing the empties to the gridlock point, we staged the coal cars far away from the troublesome intersection and then flowed them in, so they arrived when the intersection was clear? Rather than trying to cram in as many priority trains as possible, we dispatched the cars to a collection of holding points dispersed across the railroad, making sure that the Powder River Basin's access point wasn't idle for very long. It worked. Not only did it clear up the gridlock, it also increased the number of daily coal trains by 30 percent. [...]
Another way of thinking about that solution was that railroad dispatchers were building resilience into the process. Previously, we'd put all our eggs in one perfect basket, leaving us no viable secondary options if the basket filled up. It was true that our new system of flowing in trains was not technically as time-efficient as the first system, but by accounting for the time eaten up by unpredictable problems that plague any complex network, it was ultimately more successful.
I don't understand how this can possibly be true.
I understand how this could improve average on-time performance. Keeping all available coal trains queued at the Powder River Basin means they are not available elsewhere, so a delay at that single bottleneck cascades throughout the network. The first train in the queue probably raises utilization of the one coal-field line substantially, but the tenth improves it much less in expectation, while it could easily be improving performance quite a lot in emptier areas of the network.
What I don't understand is how this could possibly improve throughput at the intersection, unless there's some key factor I'm missing. It's conceivable that the bottleneck could have been overutilized, or that there was more than one line, with an alternate route that farther-away trains could have reached, or that the section of track the trains were queueing on was also sometimes needed as an exit, but as far as I can tell Hawley mentions nothing like that. The claim seems to be, simply, that shorter queues caused higher throughput.
What am I missing?
If I'm understanding correctly, they kept the logical structure the same but moved the physical waiting points farther from the bottleneck.
This had the downside that if they miscalculated the travel time from the waiting point to the bottleneck, the bottleneck track would stand idle.
It had the upside that they could reorder the queue until the last minute just by issuing orders, or at worst by backing up a single train a short distance.
Also, it meant that a train entering the bottleneck would be rolling through rather than starting from a stop, which lowers the chance of a mechanical problem surfacing at the worst possible moment.
If I understand the situation correctly, the issue was that, when the line was queued up at the intersection, with train A in front of B and on through to Z, a mechanical issue with any one train blocked every train behind it. It's a single long queue.
Instead, the trains were queued up behind other intersections, such that A-C are in one queue, D-F are in another, and so on, minimizing the amount of time any given train spends on the bottlenecked rail(s), thus minimizing the odds any given train would have a mechanical failure on those bottlenecked rails. (They'd instead fail on a non-bottlenecked rail.) So it's multiple parallel queues all feeding into a single short queue.
I'd hazard a guess that mechanical failures occur most frequently when a train is stopping or starting, concentrating failures even further in the parallel section of queues, particularly if the schedule guaranteed that no stopping was necessary at the bottleneck intersection.
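The exposure argument can be made concrete with a quick calculation (a sketch using the 2% per-train figure from the math attempt below; the independence assumption is mine): if each of n queued trains fails independently with probability p, the whole line behind them is blocked with probability 1 - (1 - p)^n, which grows quickly with queue length.

```python
# Probability that at least one of n queued trains fails, blocking
# everything behind it. Assumes independent failures with per-train
# probability p (an illustrative simplification).
def p_blocked(n: int, p: float = 0.02) -> float:
    return 1 - (1 - p) ** n

for n in (1, 10, 50):
    print(f"queue of {n:2d}: {p_blocked(n):.1%} chance of being blocked")
```

A single exposed train risks about 2%, but a standing queue of 50 is blocked roughly 64% of the time, which is why concentrating all the waiting at the bottleneck is so costly.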
Attempting the math, although I probably got something wrong here:
Say 400 trains need to go through per day, 200 full, 200 empty.
If you maintain a standing queue of 50 at the bottleneck, 50 trains get through per hour.
The bottleneck region takes 2 minutes to traverse at speed, so trains flowing in from a distance give a throughput of only 30 per hour.
The clear choice is to queue the trains up.
Add a 2% chance of mechanical failure per train-hour, however, and assume a failure takes an hour to fix. With the long queue of 50, the odds of at least one failure are approximately 64% per hour (1 - 0.98^50), reducing the effective throughput from a theoretical maximum of 1,200 trains per day to roughly 432 (1,200 × 36%). (Again, I probably messed the math up somewhere, but you get the idea.) On an unlucky day, you're screwed.
The odds of a failure for the short queues, however, are only 2% per hour, since just one train at a time is exposed at the bottleneck. Your average throughput is about 705 trains per day (720 × 98%).
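To sanity-check those numbers, here's a minimal Monte Carlo version of the same model (my formulation of the assumptions above, not anything from Hawley: each hour, either some exposed train fails and that whole hour is lost, or trains pass at full rate).

```python
import random

HOURS_PER_DAY = 24
P_FAIL = 0.02        # failure chance per train-hour
RATE_QUEUED = 50     # trains/hour with a standing queue at the bottleneck
RATE_FLOWED = 30     # trains/hour flowing in at speed
QUEUE_LONG = 50      # trains exposed when queued at the bottleneck
QUEUE_SHORT = 1      # only the train actually in the bottleneck is exposed

def day_throughput(exposed: int, rate: int, rng: random.Random) -> int:
    """Trains delivered in one day; an hour is lost whenever any of the
    `exposed` trains fails during it (simplified: a failure costs exactly
    the hour it occurs in)."""
    delivered = 0
    for _ in range(HOURS_PER_DAY):
        if not any(rng.random() < P_FAIL for _ in range(exposed)):
            delivered += rate
    return delivered

def mean_throughput(exposed: int, rate: int,
                    days: int = 2000, seed: int = 1) -> float:
    rng = random.Random(seed)
    return sum(day_throughput(exposed, rate, rng) for _ in range(days)) / days

print("long queue :", mean_throughput(QUEUE_LONG, RATE_QUEUED))
print("short queue:", mean_throughput(QUEUE_SHORT, RATE_FLOWED))
```

Under these assumptions the long queue averages roughly 437 trains per day (1,200 × 0.98^50) and the flowed-in scheme roughly 706 (720 × 0.98), so the nominally slower arrangement wins, consistent with the 432 vs. 705 figures above.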