FREE BOOKS

Author's List




PREV.   NEXT  
|<   38   39   40   41   42   43   44   45   46   47   48   49   50   51   52   53   54   55   56   57   58   59   60   61   62  
63   64   65   66   67   68   69   70   71   72   73   74   75   76   77   78   79   80   81   82   83   84   85   86   87   >>   >|  
rid itself of all its calls, drop everything temporarily, and re-boot its software from scratch. Starting over from scratch will generally rid the switch of any software problems that may have developed in the course of running the system. Bugs that arise will be simply wiped out by this process. It is a clever idea. This process of automatically re-booting from scratch is known as the "normal fault recovery routine." Since AT&T's software is in fact exceptionally stable, systems rarely have to go into "fault recovery" in the first place; but AT&T has always boasted of its "real world" reliability, and this tactic is a belt-and-suspenders routine. The 4ESS switch used its new software to monitor its fellow switches as they recovered from faults. As other switches came back on line after recovery, they would send their "OK" signals to the switch. The switch would make a little note to that effect in its "status map," recognizing that the fellow switch was back and ready to go, and should be sent some calls and put back to regular work. Unfortunately, while it was busy bookkeeping with the status map, the tiny flaw in the brand-new software came into play. The flaw caused the 4ESS switch to interact, subtly but drastically, with incoming telephone calls from human users. If--and only if--two incoming phone-calls happened to hit the switch within a hundredth of a second, then a small patch of data would be garbled by the flaw. But the switch had been programmed to monitor itself constantly for any possible damage to its data. When the switch perceived that its data had been somehow garbled, then it too would go down, for swift repairs to its software. It would signal its fellow switches not to send any more work. It would go into the fault-recovery mode for four to six seconds. And then the switch would be fine again, and would send out its "OK, ready for work" signal. However, the "OK, ready for work" signal was the VERY THING THAT HAD CAUSED THE SWITCH TO GO DOWN IN THE FIRST PLACE. And ALL the System 7 switches had the same flaw in their status-map software. As soon as they stopped to make the bookkeeping note that their fellow switch was "OK," then they too would become vulnerable to the slight chance that two phone-calls would hit them within a hundredth of a second. At approximately 2:25 P.M. EST on Monday, January 15, one of AT&T's 4ESS toll switching systems in New York City had an actual,
PREV.   NEXT  
|<   38   39   40   41   42   43   44   45   46   47   48   49   50   51   52   53   54   55   56   57   58   59   60   61   62  
63   64   65   66   67   68   69   70   71   72   73   74   75   76   77   78   79   80   81   82   83   84   85   86   87   >>   >|  



Top keywords:
switch
 

software

 

fellow

 

recovery

 

switches

 
signal
 
scratch
 

status

 

systems

 

monitor


bookkeeping

 
garbled
 

process

 

hundredth

 

routine

 

incoming

 

perceived

 

repairs

 

damage

 

constantly


programmed
 

approximately

 

slight

 
chance
 
Monday
 
January
 
actual
 

switching

 

vulnerable

 

CAUSED


SWITCH

 
However
 

stopped

 

System

 

seconds

 
recognizing
 

normal

 

booting

 

automatically

 
clever

exceptionally

 

boasted

 

stable

 
rarely
 

simply

 

Starting

 

temporarily

 

generally

 

problems

 
system