Counter Rollover Bites Boeing 787

Counter rollover is a classic mistake in computer software.  And, it just bit the Boeing 787.

The Problem:

The Boeing 787 aircraft's electrical power control units shut down if powered without interruption for 248 days (a bit over 8 months). In the likely case that all the control units were turned on at about the same time, that means they all shut down at the same time -- potentially in the middle of a flight. Fortunately, the power is usually not left on for 8 continuous months, so apparently this has not actually happened in flight.  But the problem was seen in a long-duration simulation and could happen in a real aircraft. (There are backup power supplies, but do you really want to be relying on them over the middle of an ocean?  I thought not.) The fix is turning off the power and turning it back on every 120 days.

That's right -- the FAA is telling the airlines they have to do a maintenance reboot of their planes every 120 days.

(Sources: NY Times ; FAA)


Analysis:

Just for fun, let's do the math and figure out what's going on.
248 days * 24 hours/day * 60 minute/hour * 60 seconds/minute = 21,427,200
Hmmm ... what if those systems keep time as an 32-bit signed integer in hundredths of a second? The maximum positive value for such a counter would give:
0x7FFFFFFF = 2147483647 / (24*60*60) = 24855 / 100 = 248.55 days.
Bingo!

If they had used a 32-bit unsigned it would still overflow after twice as long = 497.1 days.


Other Examples:

This is not the first time a counter rollover has caused a problem.  Some examples are:

  • IBM: Interface adapters hang after 497 days of uptime [IBM]
  • Windows 95: hang after 49.7 days without reboot, counting in milliseconds [Microsoft]  
  • Hong Kong rail service outage [Blog]
There are also plenty of date roll-over bugs:
  • Y2K: on 1 January 2000 (overflow of 2-digit year from 99 to 00)   [Wikipedia]
  • GPS: 1024 week rollover on 22 August 1999 [USCG]
  • Year 2038: Unix time will roll over on 19 January 2038 [Wikipedia]

There are also somewhat related capacity overflow issues such as 512K day for IPv4 routers.

If you want to dig further, there is a "zoo" of related problems on Wikipedia:  "Time formatting and storage bugs"


0 nhận xét:

Đăng nhận xét