CHAPTER 5

WHAT CAN YOU DO ABOUT IT?


What can you do about fragmentation? Get rid of it, of course.

How? There are several ways, all of which will be explained here. It's not hopeless. Something can be done about it.


Clear off Disks

First off, you could keep your disks half empty. This discipline, enacted as a matter of policy, would keep enough free space so files would not fragment so badly on creation. It is at the moment of file creation that the fragmentation problem begins. When a disk is nearly full, the free space on the disk tends to fragment badly. This greatly increases the likelihood that a newly created file will be created in many small fragments. When a disk is half empty, the free space tends to occur in larger pools (because there is more of it), increasing the chances that newly created files will be created in a single contiguous piece or, at worst, in only a few larger fragments. So a policy of keeping disks half empty reduces the fragmentation problem by prevention.
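
As a quick check on how you are doing against such a policy, the free space on a disk can be computed with a couple of DCL lexical functions. This is a minimal sketch; the device name DUA1: is a placeholder for one of your own disks:

$ ! Report the percentage of free space on a disk
$ free = F$GETDVI("DUA1:","FREEBLOCKS")
$ total = F$GETDVI("DUA1:","MAXBLOCK")
$ percent_free = (free * 100) / total
$ WRITE SYS$OUTPUT "DUA1: is ''percent_free'% free"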

Of course, this solution carries the drawback of having to have twice as much disk space as you really need. Nice if you have the budget.


Copy / Contiguous

A second solution is to use the DCL command DUMP /HEADER to examine files that are known to be in heavy use and, when a fragmented file is found, use the DCL command COPY /CONTIGUOUS to defragment the file, purging the old copy once the new one is made. This is a simple and inexpensive solution, but tedious, to say the least.

It has the additional drawback of changing the creation and backup dates of each file copied, which means your incremental backups are going to swell with files that have not materially changed. It also changes the File I.D. for the file, which may cause problems for your batch and print queues. Further, you must be very sure not to attempt this at the same time an application is accessing the file. At best, the application could be locked out and may abort processing. At worst, the application could update the old copy of the file during your copy-making and the updates would be lost in the purge.

Another major drawback of this technique is that it marks the file as having to be contiguous. This causes OpenVMS to recopy the file to another area of the disk whenever the file is extended if there is not enough contiguous free space immediately following the file. Still another major problem is alias file names. OpenVMS allows a file to have two or more different names, called aliases. The file must not be open when this method of defragmentation is used, but it is possible, if the file has an alias, that the file could be open under the alias and so this technique could fail.

The exact commands for this technique are:

$ ! Examine the file header; the map area shows how many fragments
$ ! (retrieval pointers) the file has
$ DUMP /HEADER /BLOCKS=END:0 filespec
$ ! Test whether the file is in use; if the OPEN fails, branch to FILE_OPEN
$ OPEN /ERROR=FILE_OPEN INPUT_FILE filespec
$ CLOSE INPUT_FILE
$ ! Make a contiguous copy (a new version), then purge the old version
$ COPY /CONTIGUOUS filespec filespec
$ PURGE filespec
$ EXIT
$
$FILE_OPEN:
$ WRITE SYS$OUTPUT "File open, try again later"
$ EXIT



Backup and Restore

From the time Digital's users first started complaining about fragmentation, the officially recommended remedy was something called "backup and restore." The "and restore" part of this is critical, but omitted from the phrase "backup and restore" altogether is an equally critical middle step - initialize.

"Backup" is easy. You already do that anyway, so it doesn't take any more time or effort than you already expend just to make sure your files are safely backed up. Backing up a disk, however, does absolutely nothing for fragmentation. To cure the fragmentation, it is necessary to then reinitialize the disk after the backup and then restore all the files to the disk.

Initializing the disk, of course, effectively deletes every file from the disk. The data can then be restored from the backup tape (or other media), and the data is restored in a clean, unfragmented, contiguous fashion.

There are drawbacks to this solution, too. Not the least of these is the time it takes to restore the information to the disk. This takes just about as long as the backup process itself, which is not exactly quick.

Another drawback is the absolute requirement that the backup be precisely accurate in every respect. If the tape is badly flawed or the drive has malfunctioned, your data is lost. You simply cannot get that data back. So you are strongly encouraged to verify your backup tapes before initializing the disk. The verify pass, of course, also takes quite a long time.

Perhaps the most aggravating drawback of backup and restore is that it has to be done after hours. You can't very well erase all the users' files while they are using them, and people get really upset when you take away access to their files during the workday. So you stay late at night or come in on the weekend to handle this chore.

Now, back when I was a System Manager, my friends weren't working nights and weekends. They used these times for sensible activities like having fun with their families or nights out on the town. And there is nothing like long hours of tedious, boring backup and restore to remind you of that fact. They're out having fun and you're not.

To compound the aggravation, it is nearly impossible to get any other useful work done while doing a backup. If you're using a 9-track tape drive, you have to jump up every ten minutes or so to change the tape. There isn't much useful work you can do that can be interrupted every ten minutes, so you are reduced to an awful lot of busy work or just plain sitting around. And this goes on for hours.

I know at least one System Manager who would have been divorced by his wife if he hadn't solved this after-hours backup and restore problem.

To look at this from the dollars and cents viewpoint, if your system doesn't sit idle every night, what does it cost your organization to shut the thing down for a night or two to defragment the disks? It's not cheap. Even a small system costs enough to make you want to avoid downtime like the plague.

For those who don't mind these difficulties or who have no better solution, the commands for doing a backup and restore operation are:

$ BACKUP /IMAGE /VERIFY disk-device tape-device:saveset.bck
$ INITIALIZE disk-device label_name
$ BACKUP /IMAGE /NOINITIALIZE /VERIFY tape-device:saveset.bck/SAVE_SET disk-device


Initialization Procedures

As long as you are initializing the disk anyway, there are several things you can do at the same time to make that disk less susceptible to fragmentation and better performing, too. These are explained in detail in the Prevention section, later in this chapter.


Disk-to-Disk Copy

A much faster, abbreviated form of the backup and restore technique is the disk-to-disk copy. This technique requires a spare disk drive exactly like the one you want to defragment.

What you do is make an image backup (a logical copy) of the fragmented disk onto the spare disk drive. The BACKUP utility automatically initializes the new disk unless you initialize it yourself and suppress BACKUP's initialization with its /NOINITIALIZE qualifier. BACKUP then copies the files contiguously to the new disk, leaving all the free space in two large areas. Then you physically swap the drives' unit number plugs so OpenVMS finds the data under the original device name. Unfortunately, you also have to power down and restart the disk drives for the plug swap to take effect.

This technique is very fast - as fast as you can copy one entire disk onto another. The obvious drawback is the expense: it requires having a spare disk drive. Another drawback is that you still have to back up the data separately, unless you can afford to keep the spare disk drive tied up as a backup to the defragmented disk. Yet another drawback is that, to ensure the data doesn't change in the middle of the copying, you have to block access to both disks, depriving the users of access to their files for the duration of the process.

The commands for this technique are:

$ ! disk-1 is the fragmented disk; disk-2 is the spare
$ INITIALIZE disk-2 label_name
$ BACKUP /IMAGE /NOINITIALIZE /VERIFY disk-1 disk-2

The initialization advice given for the backup and restore technique earlier in this chapter applies equally to this method.


Defragmentation Software Products

There are software products available that you can use to defragment disks. These are referred to as defragmenters. They come in two forms: off-line defragmenters and on-line defragmenters. We'll examine each separately.

Off-Line Defragmenters

An off-line defragmenter is a computer program used to defragment a disk. It is differentiated from on-line defragmenters in that you have to take a disk out of service (off-line) to use the defragmenter on it.

Why? This type of defragmenter analyzes a disk to determine the state of fragmentation and then maps out a rearrangement of the files on the disk that will reduce or eliminate the fragmentation. After mapping out where the files should go, it rearranges them. This type of defragmentation has to be done off-line because of the drawbacks inherent in such a method:

1. Having a separate analysis pass and then the actual file rearrangement pass presents the biggest problem. If, after calculating the ideal position for each file on the disk, some user application comes along and deletes a file, adds a new file or extends an existing file, the analysis is instantly obsolete and the planned rearrangement is unlikely to provide ideal results. In fact, rearranging files with an obsolete analysis is downright dangerous. If the defragmenter were to write data into an area it thinks is free but that has become occupied since the analysis, user data could be lost. By taking the disk out of service, so no user application can access any file on the disk, this danger is eliminated.

2. This type of defragmentation is like throwing all the files up in the air and slipping them into the right slots as they come back down. What if something happens while the files are up in the air? I am not talking about adding, changing or deleting a file. I am talking about a disaster. Suppose the system goes down or the disk fails? What happens to the data files that are "up in the air?" Most likely, they are lost.

The remedy for this is a logging facility that keeps track of what files are "up in the air" at any given moment, keeping copies of the files in a scratch space so the file can be reconstructed following a catastrophic interruption. Logging and reconstruction such as this is extremely complicated in a constantly changing environment, so such a defragmenter must be run off-line in an unchanging, laboratory-like environment.

3. Since many sites tend to keep their disks very nearly full, there may not be enough room for the defragmenter to make a copy of a file to be defragmented, particularly a large file. For this reason, the off-line type of defragmenter often uses a scratch area on a second disk for copies of files being defragmented. This may require taking two disks out of service - the one being defragmented and the one with the scratch area. It certainly requires that the defragmentation be done off-line to reduce the risk of data loss. Even so, a power failure may leave you with an important file (is there any chance it would be an unimportant file?) stranded out in the scratch area, with recovery dependent upon a special procedure you need to run to get the file back. But what if the power failure is in the middle of the night when you're not around? And what if the stranded file is the program image containing the recovery procedure?

Of course, the ultimate drawback for this off-line method of defragmentation is taking the disk out of service. Taking the disk out of service, obviously, means no one can use it. The disk, if not the entire system, is "down" for the duration of the defragmentation activity. The users' data is inaccessible. Like backup and restore, this solution carries a heavy penalty.

Now take a look at this: Let's say it takes two hours to do that defragmentation job (and I have seen reports of off-line defragmentation efforts taking ten times that long). That's two hours of lost computer time. How much performance increase does your defragmenter have to achieve to make up for two hours of complete downtime? You're right, it's a lot. It seems to me that the cure is worse than the disease.

Because of these serious drawbacks, because of the outrageous cost of shutting down a computer system for the duration of the defragmentation process and because a much better solution arrived, off-line defragmenters have all but disappeared from the market.

On-Line Defragmenters

An on-line defragmenter is one that processes disks while user jobs are active, even while user applications are accessing files on the same disk that is being defragmented. It is not necessary to take the disk off-line or allocate it to the defragmenter process.

An on-line defragmenter eliminates the drawbacks inherent in off-line defragmenters. There is no analysis pass to become obsolete when users add, change or delete files. Rather, each file is analyzed individually as the defragmenter turns its attention to defragmenting that particular file. The files are not "thrown up in the air" and juggled. Instead, each file is copied into a new location and, once safely there, removed from the old location. It doesn't use a scratch area in which files can get lost. The file being defragmented is kept intact in its original position while the new, contiguous copy is created elsewhere on the same disk.

But the real advantage of an on-line defragmenter is that of keeping the disk in service while the defragmenting is done. No more hours of downtime; no more downtime at all. Only with this type of defragmenting can the system performance improve without sacrificing an equal or greater amount of system resources to do so.

How long should a defragmenter take to do its job? Less than the time and resources being lost to fragmentation. If your system loses 20% of its resources to fragmentation, a defragmenter that consumed even 19% would be worthwhile (though not much). Clearly, the fewer resources that are consumed, the more worthwhile that defragmenter would be. The point is that some defragmenters consume 21% or even more of your system's resources. So this cost of defragmentation must be weighed against the cost of performance losses due to fragmentation.

Another major factor to consider is the amount of time and effort spent by you, the System Manager, in managing the defragmenter. The ideal on-line defragmenter would be one of the "set it and forget it" variety. You just install it on your system and it takes care of everything from then on.


Prevention

There are several major performance problems that should be handled before addressing the question of defragmentation. These are multi-header files, user files on the system disk, and unnecessary data checking. It is also well worth your while to invest a little system management time in organizing the system so fragmentation occurs less often and is a little slower to creep in. Techniques for doing this are discussed later in this section.

Like the old saying, "An ounce of prevention is worth a pound of cure," a little care taken before fragmentation becomes a problem can save a lot of time, effort and headaches cleaning up the mess later.

Clearing unneeded files off the disks is by far the most effective means of preventing fragmentation. The more space you can keep free on the disks, the less likely files are to become fragmented. It should be noted, however, that reducing disk space in use below 50% does not improve things much - not enough to warrant the cost of maintaining so much unutilized disk capacity. On the other hand, even if you cannot keep the disks half empty, keeping them 40% empty, or 30% or even 20% helps a lot. In my own experience, I have observed that VAX disk I/O performance is great when the disk is 50% empty or more. Performance worsens slightly as the disk fills to 80% of capacity. Above 80%, performance really goes to the dogs quite rapidly and, if you are running above 90% full, you have a built-in performance problem of severe proportions.

When I see a disk above 90% full, I lose all interest in fragmentation and get busy clearing off some space on that disk. The performance gains from doing so are clearly noticeable.

It is not as difficult to free up 10% of a disk as you might think. I have personally accomplished this feat numerous times by the simple expedient of issuing a notice to all users (via a NOTICE file in the login procedure). The notice says, "The disks on our computer system are too full. Delete all unnecessary files from the system within 24 hours. If sufficient space is not freed up by then, the System Manager will delete files from the directories of the worst offenders until sufficient space is available. Files not needed now which might be needed later can be stored in archives until they are needed."

This notice always gets good results. Sometimes, it brings 20% or more free space with no further action than that. Of course, you have to actually enforce it once in a while, but a check of each user's total file sizes usually reveals a user or two who is abusing the system wholesale, with dozens of versions of old, old files that haven't been touched in months. This is the guy who saves everything and consumes as much space as half the other users combined.

To detect this varmint, merely enable disk quotas briefly and run off a report of how many disk blocks each user is consuming. Then you can turn off disk quotas if you don't really need them on. (Disk quotas cost overhead.)
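
One way to run off such a report is with the DISKQUOTA commands in SYSMAN. This is only a sketch: DUA1: is a placeholder, the CREATE step is needed only if the disk has never had a quota file, and REBUILD tallies current usage for each UIC before the SHOW:

$ RUN SYS$SYSTEM:SYSMAN
SYSMAN> DISKQUOTA CREATE /DEVICE=DUA1:
SYSMAN> DISKQUOTA ENABLE /DEVICE=DUA1:
SYSMAN> DISKQUOTA REBUILD /DEVICE=DUA1:
SYSMAN> DISKQUOTA SHOW [*,*] /DEVICE=DUA1:
SYSMAN> DISKQUOTA DISABLE /DEVICE=DUA1:
SYSMAN> EXIT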

Another source of wasted space is users who have left the organization. Archive their files and delete them from the system.

Time spent in clearing off disks to 20% or more free space will be the best investment you can make in improving disk I/O performance and preventing fragmentation.


Preventive Measures When Initializing a Disk

Disk Cluster Size

Keep in mind that initializing a disk erases everything on that disk. Therefore, it is advisable to use the INITIALIZE command only after first doing the BACKUP/VERIFY step, to ensure that you have a backup of the disk and that its data integrity has been verified.

Make sure you choose the correct cluster size for the intended use of the disk. Disks larger than 50,000 blocks default to a cluster size of three when initialized, but this may not be the best value for your intended use. A cluster size of one incurs the maximum possible overhead in disk I/O, but assures the availability of every last block on the disk. A cluster size of three reduces the size of the storage bitmap on the disk by a factor of three and speeds file allocation, but one or two disk blocks are wasted for every file that is not a multiple of three blocks in size. If your average file size is one, this could be a tremendous waste - two-thirds of the disk!

Here is a table displaying the amount of disk space wasted for various cluster sizes when the file sizes vary randomly:

Cluster    Avg Blocks     Max Files on     Blocks
Size       Wasted         456MB Disk       Wasted
           per File       at 80% Full
   1          0             712,858              0
   2          0.5           356,429        178,214
   3          1.0           237,619        237,619
   4          1.5           178,214        267,322
   5          2.0           142,572        285,143
   6          2.5           118,810        297,024
   7          3.0           101,837        305,510
   8          3.5            89,107        311,875
   9          4.0            79,206        316,826
  10          4.5            71,286        320,786
  11          5.0            64,805        324,026
  12          5.5            59,405        326,726
  13          6.0            54,835        329,011
  14          6.5            50,918        330,970
  15          7.0            47,524        332,667
  16          7.5            44,554        334,152


Table 5-1 Wasted Space Due To Cluster Size Setting
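
The figures in Table 5-1 appear to follow from a simple model: with randomly varying file sizes, each file wastes an average of (cluster size - 1) / 2 blocks, and the maximum file count assumes each file occupies at least one cluster of the roughly 712,858 blocks in use at 80% of an 891,072-block (456MB) disk. A rough DCL sketch of that arithmetic (integer division, so the results may differ from the table by a block or a file here and there):

$ ! Recompute Table 5-1 for an 891,072-block (456MB) disk at 80% full
$ usable = (891072 * 8) / 10
$ cluster = 1
$LOOP:
$ max_files = usable / cluster
$ wasted = (max_files * (cluster - 1)) / 2
$ WRITE SYS$OUTPUT cluster, "    ", max_files, "    ", wasted
$ cluster = cluster + 1
$ IF cluster .LE. 16 THEN GOTO LOOP
$ EXIT
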
Check the current cluster size for your disks using this DCL command:

$ SHOW DEVICES /FULL

When choosing the cluster size, consider first what is most important with respect to that disk: speed of access (large cluster size) or maximum utilization of the available space (small cluster size). Then consider what the average size of a file will be on that disk. If most files will be small and saving space is important, use a small cluster size - perhaps half the size of an average file. If most files will be large and speed is more important than saving space, use a large cluster size - perhaps 16. The maximum is 1/100th the size of the disk. Research shows that the typical disk has an average file size of eight blocks.

The command for setting the cluster size when initializing a disk is:

$ INITIALIZE /CLUSTER_SIZE=n diskname label_name

Data Checking

When initializing a disk, do not use the /DATA_CHECK qualifier unless you really need the extra level of safety it affords. /DATA_CHECK increases disk I/O by causing read-after-write operation and, optionally, read-after-read to verify data integrity. That is to say, every time an application reads from or writes to the disk, a follow-up read is performed and the data compared for accuracy. Data checking need not be turned on for this feature to be used. Critical applications can use it any time, whether it is turned on or not, by specifying data checking in their I/O routines. Having data checking turned on causes this feature to be in effect for every I/O to that disk. If this is not what you want, make sure it is turned off. The default is /NODATA_CHECK, so chances are you have not been using this feature anyway.

The command for turning off data checking when initializing a disk is:

$ INITIALIZE /NODATA_CHECK diskname label_name

Directory File Pre-allocation

The DCL INITIALIZE command allows you to pre-allocate space for directories. Unfortunately, it defaults to only sixteen directories, so most disks require additional directory file space to be allocated. The additional directories are created smack in the middle of production processing, disrupting application I/O and scattering the newly-created directory files all around the disk. When you initialize a disk, estimate how many directories will be created on it, and specify a slightly larger number with the INITIALIZE /DIRECTORIES=n qualifier.

The command for preallocating directory space when initializing a disk is:

$ INITIALIZE /DIRECTORIES=n diskname label_name

File Header Pre-allocation

The DCL INITIALIZE command allows you to pre-allocate space for file headers in INDEXF.SYS. Unfortunately, like INITIALIZE /DIRECTORIES, it defaults to only sixteen files, so most disks require additional space to be allocated to INDEXF.SYS after the disk has been initialized. The extra space allocated is often not contiguous to INDEXF.SYS, so this all-important file becomes fragmented right from the start. When you initialize a disk, it is very important for consolidation of free space on your disk that you estimate how many files will be created on it and specify a slightly larger number with the INITIALIZE /HEADERS=n qualifier.

The command for preallocating space for file headers when initializing a disk is:

$ INITIALIZE /HEADERS=n diskname label_name

Index File Location

When initializing a disk, the disk's index file can be forced to the beginning of the disk (toward LBN 0), the middle of the disk, the end of the disk, or to any specific block desired. I recommend that the index file be placed at the end of the disk using the INITIALIZE /INDEX=END qualifier. This frees up the maximum amount of space near the beginning of the disk, where OpenVMS will be allocating new files. Having few or no files near the beginning of the disk guarantees the fastest possible file creation times and increases the likelihood that new files will be contiguous when created.

The command for locating the index file when initializing a disk is:

$ INITIALIZE /INDEX=END diskname label_name



Preventive Measures After a Disk Has Been Initialized

Even if you cannot reinitialize your disks to obtain better performance, you can modify volume characteristics to improve the situation somewhat. The following commands can be used after a volume has been initialized and should be considered for use on all your disks.

Turn Off Data Checking

Use the DCL command SHOW DEVICES /FULL to find out whether a disk has data checking enabled for read, write or both. If it does, you will see in the SHOW DEVICES display heading the words, "data check on reads," "data check on writes" or both. Data checking increases disk I/O by causing read-after-write operations and, optionally, read-after-read to verify data integrity. If you have data checking enabled and you do not really need the extra level of safety it affords, disable data checking with the following DCL command:

$ SET VOLUME /DATA_CHECK=(NOREAD,NOWRITE) diskname

This is the default condition for a newly initialized disk.

Turn Off Erase On Delete

Use the DCL command SHOW DEVICES /FULL to find whether a disk has erase on delete enabled. If it does, you will see in the SHOW DEVICES display footing the words, "erase on delete". Erase on delete increases disk I/O by causing a system-specified pattern to be written into a file area when the file is deleted. The pattern makes it harder to figure out what was in the file before it was deleted. Some sites, particularly in the defense industry, require this for security purposes. If you have erase on delete enabled and you do not really need the security it affords, disable it with the following DCL command:

$ SET VOLUME /NOERASE_ON_DELETE diskname

Increase Extend Quantities

Use the DCL command SHOW RMS_DEFAULT to find out what the RMS extend quantity is. The RMS extend quantity determines how many blocks are allocated each time a file is extended. This should be raised to a large value, such as 100. If a file, such as a log file, is extended many times, and the extend quantity is small, the file is likely to become extremely fragmented because the (small) extents are created in different places on the disk. If the extend quantity were large, the file would be less fragmented because the pieces of the file would be larger. There is little adverse impact to this action, as excess allocation is truncated when the file is closed. Use this DCL command to set the RMS extend quantity:

$ SET RMS_DEFAULT /SYSTEM /EXTEND=100

Note that the volume extension quantity set with the DCL command SET VOLUME /EXTENSION=100 is overridden by the RMS_DEFAULT value.

Other Hints

With all these preventive measures, it is important to bear in mind that the whole purpose for defragmenting is to speed system performance and responsiveness. While fragmentation is guaranteed to ruin system performance, it is not the only thing that causes performance problems. If one of the items covered in the two sections above is neglected, the resulting performance degradation may be as bad as fragmentation-induced problems, or worse. Needless to say, if your purpose is to improve performance, these remedies should be used in addition to defragmenting to get all the gains you can get.

Unneeded data checking, for example, can double the number of I/O operations on a disk. If the extra safety of data checking is not needed on your system, enormous performance gains can be had by the one simple measure of disabling data checking.

Along these same lines, you should know that system disks are the worst place to store user files. System disks work pretty hard for OpenVMS and storing user files on the system disk causes excess disk I/O whenever a user is accessing those files. Worse, that user's disk I/O, since it is I/O to the system disk, affects the performance of the entire system. The situation is compounded for a common system disk, as two or more OpenVMS systems are working from the same system disk. Give yourself a big performance boost by moving all user files off the system disk.
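
As a sketch of what such a move might look like, the following copies one user's directory tree to a user disk and then points the account at its new home. The username SMITH, the target disk DUA1: and the directory names are placeholders; verify the copy before deleting anything from the system disk:

$ ! Copy the user's directory tree from the system disk to a user disk
$ BACKUP SYS$SYSDEVICE:[SMITH...]*.*;* DUA1:[SMITH...]*.*;*
$ ! Point the account at its new default device and directory
$ RUN SYS$SYSTEM:AUTHORIZE
UAF> MODIFY SMITH /DEVICE=DUA1: /DIRECTORY=[SMITH]
UAF> EXIT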

One last recommendation I have is that you reduce subdirectory levels. Each time a file is accessed, the file itself is not the only disk access required. The Master File Directory must be accessed, the main-level directory file, and any subdirectory files as well. Each access requires a disk read, unless the directory is already in the directory cache. Fortunately, directories are usually in the cache (often 90% of the time), so this is a minor problem. Nonetheless, a definite performance improvement can be obtained by reducing the number of subdirectory levels on a disk.
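
If you want to see how well the directory cache is actually doing on your system before reorganizing anything, the MONITOR utility's file system cache class displays the directory cache hit rates:

$ MONITOR FILE_SYSTEM_CACHE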


When Would You Not Want to Defragment a Disk?

The simple answer to this question is that the only time you would ever want to not defragment a disk is when it is already defragmented. It is hard to imagine a circumstance when you would want to leave a disk fragmented. But I'll do my best. . . .

The INDEXF.SYS file, as noted earlier, is deliberately fragmented into four pieces. Leave it that way if you want your disk to be accessible after the next reboot. If the index file is fragmented into more than four pieces, however, you can improve free space consolidation quite a bit by defragmenting it. The only way this can be done is by backup and restore, but when you reinitialize the disk between the backup and the restore, you must pre-allocate sufficient file headers (see the section File Header Pre-allocation earlier in this chapter).
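
Putting the earlier pieces together, defragmenting the index file by this route might look like the following sketch. The device names, the save set name and the value of n are placeholders; the qualifiers are the ones discussed earlier in this chapter:

$ BACKUP /IMAGE /VERIFY disk-device tape-device:saveset.bck
$ INITIALIZE /HEADERS=n /INDEX=END disk-device label_name
$ BACKUP /IMAGE /NOINITIALIZE /VERIFY tape-device:saveset.bck/SAVE_SET disk-device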

OpenVMS contains support for locking a file at a specific LBN. Such a file is called a placed file. Theoretically, in a realtime environment, you could lay out a disk with files placed at specific, known points around the surface of the disk and thus guarantee the fastest possible transition of the disk head from one file to the next. I say theoretically because I have never heard of anyone actually doing this and it certainly would be a meaningless exercise for a disk in an on-line, interactive user environment. The random motions of the head seeking data for multiple user applications would defeat utterly any pre-planned movement of the head amongst placed files.

Finally, I return to the same old argument: you would not want to defragment a disk when the cost of doing so exceeds the return you would get in terms of performance. So after each defragmentation pass, some little time must pass and some amount of performance degradation must occur for the cost of the next defragmentation pass to be justified. Let me tell you, we are really getting down to the picky details now. All I am saying is that you would not want to attempt to defragment a disk immediately after you had just defragmented it.


The Cost of Fragmentation

To determine whether a defragmentation pass is worth it or even whether the purchase of a defragmenter is worthwhile, you need to know the cost of fragmentation. There is no flat answer to this. It is different for every system. A little-used system in a non-critical environment has a lower cost for performance losses than one in a round-the-clock, mission critical application.

To sort this out for your system, this book includes an appendix (Appendix B) devoted to a step-by-step calculation of the costs of fragmentation and justification of the cost of handling it.


Conclusion

The conclusion is, inescapably, that while you can do something to prevent fragmentation, the prevention is incomplete and temporary. And, while there are solutions built into OpenVMS, these solutions are incomplete and tedious in the extreme. A defragmenter is the only solution that directly addresses and is specifically designed to solve the fragmentation problem. Accordingly, the next chapter is devoted to a detailed discussion of defragmentation by means of a defragmenter.
