Applicable release versions: AP, R83
Description: A discussion of frame faults and GFEs.
Contributed by Harvey E. Rodstein
(Originally ran in PickWorld Magazine)
It still amazes me that with each new install of the Pick Operating System I come across, at least half of the new users are surprised that they can't just reboot or power down the system whenever a small glitch occurs, like a terminal or printer hanging. Unfortunately, no one has taken the time to explain to them the shortcomings as well as the virtues of a virtual system. This problem is most apparent on R83 PC-based systems. It's hard for the experienced DOS user (and first-time Pick user) to stop looking at his or her microprocessor as a Personal Computer and start looking at it as a micro-mainframe. The DOS user is quite used to pressing <ctrl>+<alt>+<del> whenever a problem occurs. The support call usually sounds something like this:
Voice on the phone: "I just got this Pick System last week and now every time I try to enter data I'm getting messages all over the place about something called a GFE. I NEVER had a problem like this when I was running FoxBase."
Me: "Did you have to reboot the system?"
Voice: "One of the terminals stopped working so we pressed the reset button."
Me: "Were there any other users doing work on the system?"
Voice: "I suppose so."
Me: "Get last night's file save tape and do a full restore."
Voice: "Last night's what?"
The importance of formatted system saves aside, there are a lot of other problems implied in this snippet of conversation. The major one is that the poor user was never educated on the most important aspects of maintaining a Pick System.
"The fault, my friends, is not in ourselves but in our frames."
The Pick System divides the disk into memory-sized buffers called frames. All data and programs are stored on disk in these frames. A frame fault occurs whenever a process references data which is not already in physical memory. A disk frame is given a sequential number called a frame id (FID). When a frame is brought into memory it is assigned to an available memory buffer address. This memory address is transparent to the user and can vary based on which memory buffers are available at the current point in time.
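To make the idea concrete, here is a minimal sketch in Python of a frame fault. It is a toy model, not Pick code: the class name `FrameTable`, its methods, and the FID values are all invented for illustration, and buffer replacement is deliberately left out.

```python
class FrameTable:
    """Toy model: maps frame ids (FIDs) to whichever buffer holds them."""

    def __init__(self, buffer_count):
        self.free_buffers = list(range(buffer_count))  # unused buffer addresses
        self.fid_to_buffer = {}                        # resident frames only
        self.frame_faults = 0

    def reference(self, fid):
        """Return the buffer address for fid, faulting it in if needed."""
        if fid in self.fid_to_buffer:
            return self.fid_to_buffer[fid]   # already resident: no fault
        self.frame_faults += 1               # frame fault: not in memory
        if not self.free_buffers:
            raise MemoryError("no free buffer; replacement is not modeled here")
        buffer_address = self.free_buffers.pop(0)
        self.fid_to_buffer[fid] = buffer_address
        return buffer_address


table = FrameTable(buffer_count=4)
first = table.reference(1207)    # fault: frame is brought into a buffer
second = table.reference(1207)   # hit: same buffer, no disk read
print(first == second, table.frame_faults)
```

The caller only ever asks for a FID. Which buffer address comes back depends on what happened to be free at the time, which is the sense in which the address is transparent to the user.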
Physical memory (RAM, or Random Access Memory) is managed by a program called the Pick Monitor. The Monitor is the traffic cop (or hall monitor, if you've ever tried to go to the bathroom without a pass). The Monitor is the only part of the Pick Operating System which is always memory resident. If a requested frame is already memory resident, the FID and buffer address should be found in the set of FID tables which the Monitor uses to quickly find a specific frame in memory. The physical location is attached and the user process can continue until its timeslice (the maximum time allotted to each process/port/user) is used up.
If the frame is not in memory, the system has some work to do. Any time a frame is referenced and is not in memory, the Monitor has to find an available buffer in which to deposit it. If an AVAILABLE buffer is found, a disk read is scheduled. At this point, the Monitor terminates the process timeslice and activates another process on the queue. It doesn't matter if the process has a timeslice of 30 milliseconds and only 3 milliseconds have transpired: the need for disk I/O cuts the timeslice short. On completion of the disk read, the corresponding memory management tables are set up and the process is available to be reactivated.
What if there are no available buffers in memory? The Monitor must find the least recently used buffer which is not memory locked.
Each machine has some kind of scheme for tracking the activity of each buffer in memory and will indicate the most inactive buffer. When looking for a free buffer, the system takes the oldest frame not marked as WRITE REQUIRED. A buffer that is not WRITE REQUIRED is immediately available to be used. When a WRITE REQUIRED frame is encountered, it is scheduled to be written, but the system goes on looking for a free buffer and does not wait for the write to complete.
Memory Buffer Status:
* AVAILABLE -- the buffer is free to be used for a frame from disk.
* IO BUSY -- the buffer has a frame being read from or written to disk.
* MEM LOCKED -- the buffer is locked in memory and not available for virtual paging. This may also be called CORE LOCKED as a leftover from bygone days.
* WRITE REQUIRED -- the contents of the buffer have changed and are ready to be updated to the disk. More on this when talking about frame faults.
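The search for a free buffer can be sketched roughly as follows. This is an illustration in Python under stated assumptions, not the Monitor's actual algorithm: the `Buffer` fields, the `find_buffer` function, and the use of a simple `last_used` counter as the measure of inactivity are all hypothetical, since each machine has its own scheme.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Buffer:
    fid: Optional[int] = None      # frame currently held, None if empty
    last_used: int = 0             # smaller value = used longer ago
    mem_locked: bool = False
    io_busy: bool = False
    write_required: bool = False


def find_buffer(buffers: List[Buffer], write_queue: List[int]) -> Optional[Buffer]:
    """Pick a buffer for an incoming frame, oldest candidates first."""
    for buf in sorted(buffers, key=lambda b: b.last_used):
        if buf.mem_locked or buf.io_busy:
            continue                   # not eligible for paging right now
        if buf.write_required:
            # Schedule the write, but keep searching rather than waiting.
            write_queue.append(buf.fid)
            buf.io_busy = True
            continue
        return buf                     # unchanged contents: reuse at once
    return None                        # nothing free on this pass


buffers = [
    Buffer(fid=10, last_used=1, mem_locked=True),
    Buffer(fid=11, last_used=2, write_required=True),
    Buffer(fid=12, last_used=3),
    Buffer(fid=13, last_used=4),
]
pending_writes: List[int] = []
victim = find_buffer(buffers, pending_writes)
print(victim.fid, pending_writes)   # 12 [11]
```

In this run the oldest buffer is skipped because it is locked, the next is queued for writing because it is WRITE REQUIRED, and the third is handed back for reuse without waiting on the disk.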
Any time the contents of a frame are changed, the corresponding memory buffer is flagged WRITE REQUIRED. The system does not immediately write that frame to disk; if it did, system throughput would be alarmingly slow. WRITE REQUIRED indicates that the frame is ready to be written when time or necessity allows. At what point? In general, there are three situations where a write occurs. The first is the automatic frame write. Automatic frame writes occur when the system stands idle. Idle means that no terminal I/O requests have occurred within a reasonable period of time. The term "reasonable" is purposely vague. The Monitor uses this time on its hands to force WRITE REQUIRED buffers to disk. This definitely doesn't occur during peak use hours. The second is a result of demand paging, when the Monitor is searching for a free memory buffer. If a buffer has the WRITE REQUIRED status, its contents are scheduled to be written to disk while the search continues. The third is when disk writes are forced at predetermined intervals. This is usually coded in the Monitor by the Pick OEM. Basically, the Monitor will attempt to write a single WRITE REQUIRED frame, or a set of them, at predetermined times between process activations. McDonnell Douglas has a verb called SET-WRITES. SET-WRITES tells the Monitor to force frame writes after a given number of passes through the SNU (Select Next User) loop.
>SET-WRITES 10000 -- A typical setting which designates a forced write every 10,000 times through the queue.
>SET-WRITES 3 -- Designates every 3 times through the queue and creates massive system degradation.
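To see why the small setting hurts, consider a rough simulation in Python. It is a sketch under loud assumptions: the function `run_snu_loop` is invented, one frame is assumed to become WRITE REQUIRED on every pass, and every forced write is assumed to flush all such frames at once. The real Monitor may write only one frame or a subset.

```python
def run_snu_loop(passes, write_interval, dirty_per_pass=1):
    """Count forced flushes over a number of passes through the queue.

    write_interval plays the role of the SET-WRITES argument;
    dirty_per_pass is a made-up rate at which frames become WRITE REQUIRED.
    """
    dirty = 0              # buffers currently flagged WRITE REQUIRED
    forced_flushes = 0
    frames_written = 0
    for pass_number in range(1, passes + 1):
        dirty += dirty_per_pass                 # processes modify frames
        if pass_number % write_interval == 0:   # forced-write point reached
            forced_flushes += 1
            frames_written += dirty
            dirty = 0
    return forced_flushes, frames_written, dirty


print(run_snu_loop(passes=30000, write_interval=10000))   # (3, 30000, 0)
print(run_snu_loop(passes=30000, write_interval=3))       # (10000, 30000, 0)
```

Under these assumptions both settings push the same number of frames to disk, but the second interrupts the queue 10,000 times instead of 3. The trade-off is that a long interval leaves more changed frames sitting in volatile memory between writes.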
FINAL WORDS OF CAUTION: Don't trip the juggler! Current systems use MOS (Metal Oxide Semiconductor) memory. This RAM is fast but volatile. When power is lost, so is memory. During normal operation, portions of the data base may exist in a state of flux. Frames are constantly moving from disk to memory and from memory to disk.
The Monitor can be likened to a juggler. If your system goes down because of a power failure, or hangs due to an internal system bug or hardware problem, then it is quite possible that the Monitor has been caught in mid-juggle. This can corrupt the data base. This corruption is called a GFE (group format error). The worst case occurs when the overflow table (a table of unused disk frames) has been changed in physical memory, but not yet flushed to disk before the system went down.
In order to avoid data problems, the virtual machine must be given a chance to perform a graceful shutdown. The verb for this may be POWER-OFF, or SHUTDOWN, or :WARMSTOP, depending on the implementation. These processes make sure there are no active users and subsequently flush all WRITE REQUIRED buffers so that the system can be powered down with no data corruption.
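The difference between a graceful shutdown and the reset button can be shown with a toy write-back model in Python. Everything here is hypothetical: `ToySystem`, its methods, and the frame contents are invented for illustration and say nothing about how POWER-OFF, SHUTDOWN, or :WARMSTOP are actually implemented.

```python
class ToySystem:
    """Write-back model: changes live in memory until flushed to disk."""

    def __init__(self):
        self.disk = {}                # fid -> contents as stored on disk
        self.memory = {}              # fid -> contents in a memory buffer
        self.write_required = set()

    def update(self, fid, contents):
        self.memory[fid] = contents
        self.write_required.add(fid)  # flagged, not yet written

    def graceful_shutdown(self):
        for fid in sorted(self.write_required):
            self.disk[fid] = self.memory[fid]   # flush before power-down
        self.write_required.clear()
        self.memory.clear()

    def hard_reset(self):
        self.write_required.clear()   # volatile RAM is simply lost
        self.memory.clear()


def scenario(shutdown_method):
    system = ToySystem()
    system.disk = {1: "old A", 2: "old B"}
    system.update(1, "new A")
    system.update(2, "new B")
    system.disk[1] = system.memory[1]   # only frame 1 was flushed so far
    system.write_required.discard(1)
    getattr(system, shutdown_method)()
    return system.disk


print(scenario("graceful_shutdown"))   # {1: 'new A', 2: 'new B'}
print(scenario("hard_reset"))          # {1: 'new A', 2: 'old B'}
```

After the hard reset the disk holds half of an update: frame 1 is new and frame 2 is old. When two such frames are supposed to agree with each other, that mismatch is the kind of inconsistency that surfaces later as corruption.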
Certain UPSs (Uninterruptible Power Supplies) will detect a power problem and signal the machine to initiate a shutdown before everything goes to the dogs.
The moral of the story is to avoid a radical system reboot at all costs, especially if the system is in heavy use at the time. A terminal "hang" can be as simple as an X-OFF character (<ctrl>+S) having been pressed on the keyboard (X-ON, <ctrl>+Q, should clear that up), or it may be as complex as a failed board.
If a non-graceful reboot is required, the number of GFEs which might occur will vary based on the number of users that were addressing the disk at the time the system went down. In any case, don't let anyone on the system after it has been restarted, at least until the state of the system is resolved. GFEs are like booby traps hidden in the jungle. They may not show themselves until someone tries to use that part of the disk. Let people on slowly (one at a time). If no GFEs appear in the first few hours, you should be home free. In the worst case scenario, it may be necessary to do a full restore from the most recent file save and start the day over again.