CHAPTER 7

THE ULTIMATE SOLUTION TO THE FRAGMENTATION PROBLEM

As noted at the beginning of the previous chapter, the first defragmenter became available for VMS in 1986. Within a year, there were seven on the market. During the next few years, defragmenter competition and a skeptical marketplace weeded out the weakest of the products and one defragmenter, DISKEEPER, from Executive Software, rose to dominate the field, achieving an installed base roughly double that of all others combined. Obviously, this defragmenter had something going for it.

I believe that the reason for the success of DISKEEPER is that careful market research was done to find out what System Managers needed and wanted most in a defragmenter, careful technical research was done to determine whether that could be provided, and we then delivered to the System Managers all they had asked for and more. Since that time, we have always tried to outdo ourselves, enhancing the product to fit customer needs and take advantage of new technology, always striving for the seemingly unobtainable goal of the utter elimination of fragmentation as a System Manager headache.

We have come very close, and we are not finished yet.

I have nothing to say about other defragmenters. Most of them have disappeared from the market, for a variety of reasons. The ones that remain are offered by well-intentioned people who really believe their product is the best for you. I know that it is the customer who decides and, in a free market, he who serves the customer best wins. I want to win, but only if you win, too. By giving you everything you want and more, at a fair price, with first-class service, we all win.

Design Goals

The driving requirement for DISKEEPER was that it run automatically, in the background, safely reorganizing files as needed to keep a disk performing optimally, while users continued accessing files on the same disk.

DISKEEPER was designed with the following goals in mind:

1. The product must be completely safe to use.

2. It must make OpenVMS operate more efficiently.

3. It should process any OpenVMS-supported ODS-2 disk.

4. It should process live disks without interfering with user access to files on that disk.

5. It should operate while OpenVMS is running without affecting performance.

6. It should process system disks as well as user disks.

7. It should run without operator intervention.

The implementation of each of these design goals is discussed in detail below.

Goal 1: Safe to Use

The foremost design goal was to make sure that no data is ever lost - neither user data nor VMS internal information. To accomplish this, the DISKEEPER proprietary method for relocating files was developed, which applies strict criteria before touching any file.


The program was designed to err on the side of caution. In other words, the program only moves file information on the disk when it is absolutely certain that no data will be lost, including file attributes. The only change to file attribute information is the physical location of the file on the disk. None of the file dates are changed and no reserved fields in the header are used to store DISKEEPER information. Placement control is not changed unless DISKEEPER is explicitly instructed to do so by the System Manager.

If your system crashes while DISKEEPER is running, or if DISKEEPER aborts abnormally, the worst that can happen is that some empty disk blocks may be left marked as allocated even though they are not part of any file. DISKEEPER properly deallocates any such blocks left over from the interruption.

With OpenVMS V5.5, Digital introduced a mechanism for moving files that is guaranteed safe by Digital. This mechanism, called the MOVEFILE primitive, moves a file only when a strict set of safety conditions is met.


When DISKEEPER is run on OpenVMS V5.5 or higher, you may select either of these methods of relocating files - proprietary or MOVEFILE.

Goal 2: Make OpenVMS More Efficient

When a file is moved by DISKEEPER, it is made contiguous or, at the very least, less fragmented. If it is already contiguous, the file is not moved unless moving it would markedly improve the arrangement of free space on the disk.

With plenty of contiguous free space, file creations are faster and new files tend to be created contiguously, or nearly so. To demonstrate this, try copying a large file on a non-DISKEEPER disk (use the DCL COPY command), then do the same on a disk processed by DISKEEPER (or run DISKEEPER on the same disk and COPY the same file). Use the DCL DUMP /HEADER command to examine the file headers of the copied files. You should see fewer map pointers for the file created on the DISKEEPER disk than on the other.

All this adds up to better performance because files are created faster and files can be accessed more quickly because they are contiguous.

Note that the goal was not "to make every file contiguous" or "to combine all free spaces into one large contiguous free space." Disk perfection is not a requirement for getting better performance from OpenVMS. In fact, a perfect disk will perform no better than a nearly perfect disk. While a single giant contiguous free space will allow the creation of a single giant contiguous file, it does no more for performance than a small number of relatively large contiguous free spaces. It is not the choice between one 100,000-block space and four 25,000-block spaces that matters for performance; it is the 30,000 three-block spaces that really hurt.
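A quick calculation illustrates why it is the many tiny free spaces that hurt. If a disk's free space consists only of extents of a given size, a new file needs roughly one fragment (one retrieval pointer, and potentially one extra seek) per extent it occupies. The following Python sketch is illustrative arithmetic only, not anything DISKEEPER computes:

```python
def fragments_needed(file_blocks, extent_size):
    """Fragments required to hold a file when every free extent
    on the disk holds extent_size blocks (ceiling division)."""
    return -(-file_blocks // extent_size)

# A 1,500-block file built out of 3-block free spaces: 500 fragments.
# The same file in a single 25,000-block free space: 1 fragment.
# One 100,000-block space vs. four 25,000-block spaces makes almost
# no difference: 1 fragment vs. at most 4, even for a very large file.
```

The numbers make the point in the text concrete: consolidating thousands of three-block holes matters enormously, while merging a few already-large spaces into one giant space buys almost nothing.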

Nonetheless, DISKEEPER will do an excellent job of consolidating free space on your disks. But do not use this as a yardstick for measuring defragmentation benefits; it is the number of fragments into which your files are broken that really impacts disk I/O performance.

How much better will performance be? That depends on your particular circumstances. If your system is not I/O bound, the gains may be slight. If it is, the gains should be dramatic. It is not unreasonable to expect a 20% improvement in CPU utilization and disk I/O from even a well-managed system. Some sites may achieve a much greater improvement.

Goal 3: Process any OpenVMS ODS-2 Disk

This design goal was accomplished by using OpenVMS itself to do the "diskeeping" wherever possible.

DISKEEPER supports the entire range of OpenVMS ODS-2 disk types: system disks, common system disks, quorum disks, user disks, volume sets, stripesets and shadow sets. DISKEEPER supports fixed, removable, and floppy disks. It works in clusters whether the disk is on a local controller, an HSC, MSCP served, or LAVC-style MSCP served. It can deal with empty or full disks and anything in between.

DISKEEPER works with all Digital and third-party disk controllers.

DISKEEPER is designed for any Digital-supported configuration.

Note that system disks and common system disks really are processed. DISKEEPER does not merely exclude all files in system-rooted directories. DISKEEPER actually processes all files on a system disk except open files and a few reserved files that cannot be moved while OpenVMS is running from that disk. The same applies to common system disks.

Goal 4: Process Live Disks Without Interfering With User Access To Files

As covered earlier, it is not acceptable to force users off the disk while defragmenting it. To do so would be a case of the cure being worse than the disease. Access to fragmented files is better than no access at all.

The only acceptable solution is to defragment on-line with users active on the same disk. DISKEEPER was designed with this in mind, and accomplishes the task without compromise, primarily due to the following features:

No File Access Conflict

During most of the time DISKEEPER is processing a file, it shares the file with any other users that may access the same file. The last step of processing the file, however, involves locking the file for a very brief period, the duration of two QIO operations, a matter of milliseconds. If another user requests a file that DISKEEPER has locked, that request is suspended for the brief period until DISKEEPER releases the file. Then the request is serviced. There is never an interruption of either process as a result of this delay.

I/O Throttling

DISKEEPER limits its own I/O to the equivalent of disk I/O "idle time." This feature, especially important for the MicroVAX RQDXn disk controller, makes the impact of DISKEEPER on the load of your VAX or Alpha AXP virtually unnoticeable, even during peak levels of activity. This feature is particularly important on any system where I/O to the disks is usually at or close to the maximum possible throughput. Suspending defragmentation activity when users most need access to their data assures maximum system performance.

Exclusion List

DISKEEPER gives the System Manager the option of excluding certain files from processing. The Exclusion List is evaluated at the start of each set of multiple passes and the files specified (in the list) are skipped over by DISKEEPER.

On-Line Directory Moves

DISKEEPER moves directory files, provided the directory is not open. This allows larger contiguous free spaces to be made which, in turn, allows larger files to be defragmented by DISKEEPER, or created contiguously by the user.

Caches Updated

DISKEEPER does take into account the file header cache, and makes sure that the file headers are correctly updated so that no data is lost. The extent cache is not changed.

Open Files Ignored

Files that are always held open are not processed by DISKEEPER. These files can be made contiguous safely only by DCL COPY /CONTIGUOUS, by backup and restore, or by closing the files so DISKEEPER can process them. As long as the files remain open, they will be untouched by DISKEEPER.

Goal 5: Operate While OpenVMS Is Running Without Affecting Performance

Three steps were taken to assure that DISKEEPER overhead had the lowest possible impact on system performance:

First, DISKEEPER is designed to run as a detached process at priority 2. With the typical OpenVMS system running user jobs at priority 4 and batch jobs at priority 3, DISKEEPER will use only CPU time that would otherwise be idle. Priority 1 remains available for jobs you want to run at even lower priority, so that they do not interfere with DISKEEPER.

Second, advanced system programming techniques were used to write DISKEEPER, to assure the highest possible performance. It uses QIOs for I/O instead of high-overhead RMS services, and it copies a file only once - directly from its original location to the new location. No intermediate copies are made, so no scratch space or second device is required.

Third, DISKEEPER includes a built-in I/O throttling capability. DISKEEPER monitors I/O on the disk being processed and adjusts its own I/O accordingly. If the I/O rate increases, DISKEEPER reduces its own I/O. If the I/O rate decreases, DISKEEPER raises its I/O level. This mechanism effectively limits DISKEEPER I/O to the equivalent of disk "idle time."
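The throttling mechanism described above is a simple feedback loop: measure user I/O, compute the disk's remaining "idle" throughput, and move the defragmenter's own I/O budget toward that level. The sketch below is not DISKEEPER's actual code; the function, its one-request step size, and the requests-per-second units are illustrative assumptions:

```python
def throttle(own_io, user_io, capacity, step=1):
    """Adjust a defragmenter's own I/O budget toward disk idle time.

    own_io:   requests/sec the defragmenter issued in the last interval
    user_io:  requests/sec issued by everyone else in the last interval
    capacity: rough requests/sec the disk can sustain
    Returns the budget for the next interval.
    """
    idle = capacity - user_io          # the disk's unused throughput
    if own_io > idle:
        return max(0, own_io - step)   # users got busier: back off
    if own_io < idle:
        return own_io + step           # disk went quieter: speed up
    return own_io
```

Run repeatedly, a loop like this converges on issuing I/O only when the disk would otherwise be idle, which is exactly the behavior the text describes.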

As proof of its efficiency, DISKEEPER typically requires only a few minutes of CPU time per day to keep an active 456 MB RA81 disk defragmented. This constitutes overhead of a small fraction of 1%.

Goal 6: Process System Disks As Well As User Disks

A system disk by itself has little need for defragmentation because few files are ever created on the system disk. The only files ordinarily created on the system disk are log files. These do not particularly affect performance because they are rarely, if ever, read. Some sites, however, put user files on the system disk, and small systems such as MicroVAXes sometimes have only one disk for both system and user files. DISKEEPER can be run on such a shared system/data disk without having to shut the system down and without making the system unusable during the processing.

DISKEEPER is automatically prevented from moving any system file that OpenVMS expects to find in a particular location on the disk. This is done in three different ways.

First, any file that is currently open is not moved. In addition to open user files, this includes INDEXF.SYS on every disk and such files as PAGEFILE.SYS and all SYS$MANAGER:*.LOG files currently in use on a system disk. This includes installed images that are installed with the /OPEN qualifier, such as License Management, Cluster Server, Audit Server, Logical Name Server, and many other operating system components.

Second, some files are excluded from DISKEEPER processing by file specification. Wildcard file specifications are used to look up and identify the specific files on each disk to be excluded in this manner.

One system file is too critical to trust to exclusion by file specification. That is the boot image, VMB.EXE. Because it is possible for the boot image to have a different file name, DISKEEPER identifies the file by way of the boot block in the INDEXF.SYS file, rather than by file name, then excludes that file from DISKEEPER processing. This assures that the boot image is 100% safe, regardless of its file name.

DISKEEPER, running on any CPU in a cluster with separate or common system disks, can process all disks accessible by that node, including system disks.

Goal 7: Run Without Operator Intervention

Regardless of how much a defragmenter increases system performance, the System Manager has no need or desire for the added problem of figuring out how to run the defragmenter and taking the time to baby-sit it. System Managers need less work, not more. Accordingly, one of the primary design goals of DISKEEPER was for it to do its job with little or no intervention by a human operator.

We accomplished that in our design so well that a System Manager can literally install the software, start it up and just forget about fragmentation (and DISKEEPER) thereafter. DISKEEPER cleans up the existing fragmentation and then prevents it from returning.

I remember calling up one of our customers to see how he liked the product. I was calling specifically to find out how his life had changed now that he had had DISKEEPER on his three VAXes for six months. I was particularly interested in this fellow because he was the System Manager for a Computer Aided Design facility that depended so heavily on contiguous files that he had to backup his disks to tape and restore them every night. I thought, if anyone would love my product, this would be the guy.

When I asked him about DISKEEPER, he at first didn't know what I was talking about! Then he remembered and burst out laughing. "You know," he said, "I haven't spent even one evening in the office since DISKEEPER took over the defragmentation chores." DISKEEPER is so automatic, he had forgotten it was there.

How does DISKEEPER determine when to defragment a disk? It uses a heuristic formula, which means a formula based on feedback from the real world. Each time DISKEEPER defragments a disk, it waits a while and runs again. It compares the two passes and determines whether it had to work harder or not as hard the second time. If it had to work harder the second time, then clearly it waited too long, so it adjusts itself to wait a little less and work a little more often. If it had less work to do the second time, it adjusts itself to wait a little longer between passes. The waiting between passes saves DISKEEPER from incurring unnecessary overhead. This automatic mechanism keeps DISKEEPER running at just the right frequency to keep your disks optimally defragmented all the time with the smallest amount of system overhead.
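The adjust-the-wait idea in the paragraph above can be sketched in a few lines. This is not DISKEEPER's actual formula; the adjustment factor, the minute units, and the clamping bounds are all assumptions made for the example:

```python
def next_wait(wait_minutes, work_prev, work_curr,
              factor=1.25, floor=15, ceil=240):
    """Pick the wait before the next defragmentation pass.

    work_prev / work_curr: how much work (e.g., files moved) the last
    two passes required. More work than last time means we waited too
    long; less work means we can wait longer. The result is clamped to
    an assumed [floor, ceil] range of minutes.
    """
    if work_curr > work_prev:
        wait_minutes = wait_minutes / factor   # run more often
    elif work_curr < work_prev:
        wait_minutes = wait_minutes * factor   # run less often
    return max(floor, min(ceil, wait_minutes))
```

The feedback keeps the pass frequency hovering around the point where each pass finds roughly the same small amount of work - the "just right" frequency the text describes.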

Special File Handling

Certain file types are processed by DISKEEPER differently from others. These include partial files, multi-header files, multi-volume files, placed files, directory files, INDEXF.SYS, page files and swap files. The differences are explained below.

Partial Files

If a fragmented file cannot be made contiguous, DISKEEPER can make the file less fragmented by partially defragmenting it. It uses the largest free spaces on the disk and moves successive file fragments into these spaces. This feature allows DISKEEPER to process a file even when the file is bigger than the largest free space on the disk. DISKEEPER uses this mechanism to process a file to obtain the minimum number of fragments that can be achieved within free space constraints.
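The strategy described above amounts to a greedy packing: take the largest free extents first, so the file ends up in the fewest fragments the current free space allows. The sketch below is an illustration of the idea, not DISKEEPER's code; the in-memory representation of free space as a list of extent sizes is an assumption:

```python
def plan_partial_defrag(file_blocks, free_extents):
    """Greedy plan: place a file's blocks into the largest free
    extents, yielding the fewest fragments the free space allows.

    free_extents: sizes (in blocks) of the free extents on the disk
    Returns the list of extent sizes used, largest first.
    """
    plan, remaining = [], file_blocks
    for size in sorted(free_extents, reverse=True):
        if remaining <= 0:
            break
        used = min(size, remaining)    # fill this extent, or finish here
        plan.append(used)
        remaining -= used
    if remaining > 0:
        raise ValueError("not enough free space on the disk")
    return plan
```

Note that a 100-block file can be reduced to three fragments even when the largest free extent holds only 60 blocks - which is the point: the file need not fit in any single free space to be made dramatically less fragmented.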

Multi-Header Files

Sometimes file fragmentation can become so bad that all the pointers to the pieces of a badly fragmented file will not fit in a single Files-11 file header. When this occurs, OpenVMS allocates a second file header for the same file and the file becomes known as a multi-header file.

When DISKEEPER encounters a multi-header file, it defragments the file segments that are associated with each of the file's headers. Having done that, it cannot accomplish further defragmentation of a multi-header file because it cannot safely consolidate the segments of the file mapped to different file headers. To consolidate two or more file headers would mean having to do multiple I/Os to the disk to complete the header consolidation. DISKEEPER accomplishes all defragmentation using only atomic (uninterruptable, indivisible) operations for critical actions such as updating file headers. This is not possible with a multi-header file.

There are two manual methods by which you can consolidate multi-header files: the file can be copied with either the COPY command or the BACKUP utility. Both approaches, however, have drawbacks.

DISKEEPER includes a Multi-Header Consolidate Utility (MHC). With MHC, the System Manager has a third and better method available for consolidating multi-header files, one that protects the files from the risks of automatic consolidation.


MHC allows the System Manager to consolidate all eligible multi-header files on a disk, one by one, without the drawbacks of using COPY or BACKUP.

Multi-Volume Files

DISKEEPER does not process a volume set as a set. Each disk in the volume set is processed separately and defragmented as an individual disk. Files are not relocated from one volume in the set to another.

A single file that spans two or more disk drives in a volume set, however, presents a particularly delicate problem for relocation. Often, the spanning is deliberately planned because of the unusually large size of the file. In this case, relocating the entire file to one disk may actually worsen performance. For this reason, DISKEEPER compresses each component of the multi-volume file separately and retains the component on its original disk volume. In other words, a multi-volume file remains a multi-volume file after processing by DISKEEPER, but the portion of the file on each volume is normally made contiguous.

Placed Files

Placed files are files that are deliberately located at a particular place on the disk by the System Manager. Usually, this is done only in a real-time environment where file placement is critical for maximum speed. On an interactive OpenVMS system, placement control is not beneficial and can even worsen performance.

DISKEEPER leaves placed files where they are unless it is told to move them. Its Disk Analysis Utility can be used to list the placed files on your disk, if any exist. Then DISKEEPER can be used to remove the placement control from the files and relocate them as needed.

Directory Files

DISKEEPER moves directory files, unless forbidden by an override. As with any other files, directory files are moved only if moving them would improve the arrangement of free space on the disk.

Some people believe that placing directory files near the physical middle of a disk enhances performance. While this is true for some other operating systems, OpenVMS caches directories. If properly tuned, the directory cache hit rate should be at least 90%, meaning that directories are accessed from memory, not from disk. Therefore, the physical location of directory files on the disk is irrelevant for optimizing directory lookup time.

If directory files are not moved, it is more difficult for DISKEEPER to make a large contiguous free space. The free space tends to be broken up by immovable directory files.

INDEXF.SYS

INDEXF.SYS is used by OpenVMS not only for file headers but also as a container file for the OpenVMS home blocks. These blocks are scattered in physically distant locations to maximize the probability that one of them will be intact following a physical failure of the disk. Accordingly, it is neither possible nor desirable to make the INDEXF.SYS file contiguous and DISKEEPER does not do so, nor does any other means of defragmentation, such as backup and restore.

DISKEEPER holds INDEXF.SYS open for the duration of each defragmentation pass.

Page Files and Swap Files

PAGEFILE.SYS and SWAPFILE.SYS are not defragmented when DISKEEPER is run on-line. These two files and their alternates should be created contiguously initially and should remain so.

Alternate page and swap files can be processed by DISKEEPER when they are not installed. When they are installed, DISKEEPER detects them as unprocessable and skips over them, whether they are on the system disk or any other disk.

Note: Fragmentation of PAGEFILE.SYS should not be confused with fragmentation of the record space within that file. This latter form of fragmentation is reported by OpenVMS with the message "PAGEFRAG, pagefile badly fragmented, system continuing." The condition warned about by this message cannot be resolved by defragmenting the page file; it indicates that the page file is probably too small. The condition can be alleviated temporarily merely by rebooting the system, which causes the page file to be flushed and reloaded. To correct the condition permanently, it is necessary to extend the page file or create a new one of sufficient size. DISKEEPER can be used effectively to create a contiguous free space large enough for a page file of sufficient size.

The Impact of Moving Files

When DISKEEPER relocates a file on a disk, only the mapping pointers in the file header are changed. The mapping pointers tell OpenVMS where the file is located on the disk. The file ID is not changed; the creation, modification, expiration and backup dates are not changed; and no data in the file is ever changed.

No reserved fields in the file header are used by DISKEEPER. The file in its new location is bit-for-bit the same as before the move. No change is made to the file's allocation, either. Even if excess blocks are allocated to the file, DISKEEPER leaves the allocation the same.

Only with this hands-off approach can you be confident that your data is safe.

What Makes DISKEEPER Unique?

DISKEEPER is rich with features, but lives up to its reputation for being "elegant in its simplicity." By using a simple approach, useful features are incorporated, yet system overhead is kept to a minimum. Nearly twice as many VAXes are defragmented with DISKEEPER as with all other defragmenters combined.

DISKEEPER was designed with the basic assumption that the files on a disk are constantly changing. In a typical OpenVMS timesharing environment, new files are being created, and existing files are being accessed, updated, and extended by a large number of diverse users. DISKEEPER was designed to operate under these conditions without adversely affecting the performance of applications on the system.

DISKEEPER is designed to run as a detached process, alternating between brief periods of defragmenting the disk and long periods of inactivity. It automatically determines the frequency of defragmentation periods, based on the file activity experienced on each disk.

DISKEEPER can keep a reasonably active 456 MB RA81 disk, for example, defragmented in just a few minutes of CPU time per day. If it took an hour of CPU time to defragment a disk, that hour could cost more than the performance benefits of defragmenting are worth, and the cure would be worse than the disease.

DISKEEPER does not waste valuable system resources by attempting to "optimize" a disk.

DISKEEPER adjusts the level of its own direct I/O to assure that it does not interfere with the I/O requirements of application processes on the system. It typically runs as a detached process at priority 2, so it uses what would otherwise be CPU "idle" time.

DISKEEPER defragments one file at a time, choosing a new location for the file that also best consolidates the disk's free space. In the course of defragmenting that file, it is never in an unsafe state. The data in the file remains accessible to application programs, without risk, at all times.

DISKEEPER has a unique method for checking the integrity of data blocks on a disk. This feature provides the System Manager with an early warning system for detecting potential problems. It does this by indicating to the System Manager the presence of invalid data blocks on a disk. The DISKEEPER validation procedure checks for:

1. Multiply allocated blocks. These are blocks allocated to more than one file at the same time.

2. Blocks that are allocated to a file but appear to be free according to the storage bitmap.

Based on the information it finds in the validation procedure, DISKEEPER decides whether or not to run, and lets the System Manager know exactly where the problem blocks are located and in which files, so that the System Manager can take steps to handle the situation.
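Both validation checks amount to cross-checking the storage bitmap against the extents mapped by each file header. The following is an illustrative sketch, not DISKEEPER's validation code; representing the bitmap and each file's extents as sets of logical block numbers is an assumption made for the example:

```python
def validate(bitmap, files):
    """Cross-check a storage bitmap against per-file extent maps.

    bitmap: set of LBNs marked allocated in the storage bitmap
    files:  {filename: set of LBNs the file's header maps}
    Returns (multiply_allocated, allocated_but_marked_free).
    """
    seen, multi, mismatched = {}, set(), set()
    for name, blocks in files.items():
        for lbn in blocks:
            if lbn in seen:
                multi.add(lbn)        # block mapped by two files at once
            seen[lbn] = name
            if lbn not in bitmap:
                mismatched.add(lbn)   # file claims it; bitmap says free
    return multi, mismatched
```

Because `seen` records which file first claimed each block, a report built from it can tell the System Manager exactly which files are involved, which is what makes the warning actionable.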

DISKEEPER includes an interactive utility for safely consolidating multi-header files on a disk, without risk to file attributes or Access Control List (ACL) data.

DISKEEPER includes a 100% full satisfaction money-back guarantee.

DISKEEPER technical support is available 24 hours a day, 7 days a week.

DISKEEPER is the ultimate answer to your fragmentation problem. From the day you install it, you will never have to concern yourself with fragmentation again.

Unless you already have DISKEEPER installed and running on your OpenVMS disks, fragmentation is costing you time, money and performance every day. If you follow the advice in Appendix B on the cost of fragmentation, you will see that the cost is substantial - certainly more so than the price of DISKEEPER.

System Managers sometimes see clearly the need for and benefits of DISKEEPER, but have a hard time communicating these effectively to management, who may view this important product as merely a nice-to-have. If this sounds familiar, see Appendix C on justifying the purchase of a defragmenter to management.

Conclusion

In this book, I have explained the ins and outs and the terminology of disks. I have explained fragmentation in considerable detail. I have shown you how to detect fragmentation and explained what is wrong with it. I've shown you what you can do about it and how to get the computer to clean up after itself. I have even included methods for calculating the cost of fragmentation and justifying the cost to management. In short, I have told you just about everything I know about fragmentation and defragmentation.

I have done what I can do. The rest is up to you.

My purpose in doing this has been to educate. I believe that the more you know about fragmentation, about defragmentation and about System Management, the better off you will be.

If I have missed something, if you have any questions, or if you just want to communicate, write to me at:

Executive Software
701 North Brand Boulevard, 6th Floor
P.O. Box 29077
Glendale, California 91209-9077
