You can skip this chapter if you already know all about disks.
However, you need to understand disks well to really understand
fragmentation. The rest of this book depends heavily upon the
terms and concepts presented in this chapter. Rather than skipping
this chapter altogether, I recommend that the experienced System
Manager skim it or read it quickly, if only to ensure that we
agree on all the essential terminology.
There are many different kinds of disks. We can omit some of them
from discussion. Read-only disks, such as CD-ROMs, for example,
cannot become fragmented after manufacture; if they are already
fragmented when created, there is nothing you can do about it.
They are unchangeable. RAM
disks and semiconductor disks are really memory,
not disks, though they appear as disks
to the computer system. While it is theoretically possible for
such disks to become fragmented, little additional overhead is
caused by their fragmentation, since there are no moving parts
to cause delays finding data in "distant" parts of the
disk. Floppy disks are usually too small to suffer from fragmentation
and they are not ordinarily used for on-line storage.
They are typically used once to store a small amount of data and
then stored in a box on a shelf somewhere. Such usage does not
suffer from the effects of fragmentation.
The disks with which we will concern ourselves are hard (not floppy) disks used for storing frequently-accessed information in a computer. In an OpenVMS system, the disk is an essential, central part of the computer itself. Without a hard disk, there could be no Virtual Memory System, which is the VAX operating system itself.
Throughout this book, when the word disk is used, the above
is the type of disk being referred to.
Now let's take a look at the basic parts of a disk and the terminology that describes them:
A disk looks like this, conceptually:
[Figure: conceptual diagram of a disk]
Any disk has, as a critical part, a surface on which to
record data. That surface is usually magnetic, meaning
that it is capable of storing a small amount of magnetism. Perhaps
the most remarkable aspect of disk technology is the number of
distinct amounts of magnetism that can be stored on a single surface
of a disk. At the time of this writing, commercially available
disks selling for a few hundred dollars are capable of storing
hundreds of millions of separate, distinguishable bits of information
on each square inch of surface. These numbers have been multiplying
so fast for so long that I dare not even speculate on the future
of disk storage capacities.
The surface of a disk is shaped like a platter, similar to a phonograph record.
Each single magnetic entity is used by the computer
as a binary symbol. Binary means "having two and
only two possible states" such as on or off, true
or false, and so on. Each such entity is called a bit,
which is short for binary digit. Binary digits
are represented in written communication as zeros and ones. The
use of numbers to represent bits, however, is only a convenience
and does not mean that bits have anything to do with mathematics.
They do not. Bits are in the realm of logic, which is a matter
of reasoning and not arithmetic calculation.
When eight bits are considered together, they are referred to
as a byte. A single eight-bit byte is the amount of computer
storage typically used to store a single letter of the alphabet
or other symbol of human communication. The word animal could
thus be stored in six bytes of computer storage.
The byte is so common to computers that disks are referred to
in terms of how many millions of bytes can be stored on them.
Disks that can hold hundreds or even thousands of millions of
bytes are commonplace today.
The surface of a disk is divided into sections. This sectioning
is not a physical marking on the surface, but rather it is just
an idea that the disk is so divided. These sections are called
sectors or blocks. The term sector is more
common to personal computers and VAX or Alpha AXP hardware, while
block is common OpenVMS terminology. In OpenVMS, a block
is a collection of 512 bytes. OpenVMS disks are formatted
into blocks of 512 bytes each. With a disk so formatted, it is
common to talk about blocks of disk capacity. A 456 megabyte
disk, for example, could also be said to have a capacity of about
890,000 blocks (890,000 x 512 = about 456 MB).
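The block arithmetic above can be sketched in a few lines of Python (a rough illustration; the helper name is mine, not an OpenVMS utility):

```python
BLOCK_SIZE = 512  # bytes per OpenVMS block

def blocks_for(megabytes):
    """Approximate number of 512-byte blocks on a disk of the given size in MB."""
    return megabytes * 1_000_000 // BLOCK_SIZE

print(blocks_for(456))  # 890625, i.e. about 890,000 blocks for a 456 MB disk
```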
With larger disk capacities, it is inefficient for the computer
system to deal with millions of individual blocks one at a time.
The operating system's map of the disk's blocks is too big to
be useful unless single bits in the map can represent more than
one disk block. Accordingly, disk blocks are grouped into clusters,
which are groups of blocks allocated as a unit. In
other words, a cluster is the minimum allocation
quantity for a disk. The cluster size, in terms of number
of blocks per cluster, can be varied by reinitializing the disk.
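Because the cluster is the minimum allocation quantity, even a tiny file occupies at least one whole cluster. The rounding involved can be sketched as follows (the names are illustrative, not OpenVMS calls):

```python
BLOCK_SIZE = 512  # bytes per block

def clusters_needed(file_bytes, blocks_per_cluster):
    """Minimum clusters that must be allocated to hold a file of this size."""
    blocks = -(-file_bytes // BLOCK_SIZE)        # blocks needed, rounded up
    return -(-blocks // blocks_per_cluster)      # clusters needed, rounded up

# On a disk with 3-block clusters, a 1000-byte file still takes a whole cluster.
print(clusters_needed(1000, 3))   # 1
print(clusters_needed(2000, 3))   # 2 (2000 bytes = 4 blocks = 2 clusters)
```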
The blocks and clusters of storage space are arranged in groups
referred to as tracks. A single track is one strip of disk
space beginning at one point on the surface and continuing around
in a circle ending at the same point. The tracks are concentric
rings, not a spiral like the grooves on a phonograph record. Each
surface has many tracks.
A disk may consist of one or more platters, each of which may
be recorded on both sides. The platter spins like a phonograph
record on a turntable.
The tracks at the same radius on each platter, taken together,
are referred to as a cylinder. If you visualized these
tracks without any other part of the disk, they would form the
shape of a hollow cylinder.
To detect magnetic information on the recording surface, the disk
has one or more heads. A head is a tiny magnetic device
capable of reading or writing magnetic bits on the disk surface.
The platter spins near the head(s), so that a single track of
recorded information is continuously passing under the head, available
for reading or writing. The head never touches the surface. Rather,
it floats on a cushion of air so thin that a human hair or even
a particle of cigarette smoke cannot pass between the head and
the surface. As foreign particles that small would cause the disk
to fail, such disks are sealed in air-tight containers.
While some disks have had one head hovering over each track, it
is far more common to have movable heads capable of moving from
track to track as needed. The movement of a head from one track
to another is called a seek. This term will be important
when we are talking about disk speeds. The time it takes for a
head to seek is one of the most critical factors in determining
the speed of a disk.
Disk heads are mounted on arms that hold the heads close
to the platter surface at precisely the right point to read or
write data. There may be one arm for each head, but on multiple-platter
disks a single arm may support two heads - one for the platter
above the arm and one for the platter below. Some disks mount
all the heads on a group of arms that move in unison. Imagine
your spread fingers moving between the fanned pages of a book
and you will get the idea of multiple disk arms moving together
in and out of disk platters.
A disk platter is attached to a spindle around which it
rotates like a wheel on the axle of a car. The spindle is at the
exact center of the platter. The arm moves the head from the outer
edge of the platter toward the spindle at the center and back
out again. Though most disks have only one spindle, some complex
disks are made up of two or more single-spindle disks treated
as one large disk. These are called multi-spindle disks. However,
no platter ever has more than one spindle.
Electronic circuitry is required to sense and record the magnetism
on the surface of the platters and to move the heads. This circuitry
is commonly referred to as the electronics of the disk.
The electronics communicate data between the physical disk and
the computer.
The combination of one or more spindles, arms, heads, platters and electronics into a single physical device for storing and retrieving data is known as a disk drive. In this book, the term drive will be used often to refer to the disk drive. Viewed outside its cabinet, a disk drive is a sealed metal object that looks something like this:
[Figure: a disk drive in its sealed container]
The electronics in the disk drive are connected to circuitry in
the computer by means of cables, which are no more than
wires with a certain type of connector on each end. Often, the
individual wires are color-coded for clarity.
While the cables attach directly to the electronics of the disk
drive on one end, they do not really attach directly to the computer
on the other end. On the computer end, the cables attach to a
controller, which is sometimes referred to as an interface.
The controller, which is attached to the computer, decodes instructions
from the computer and issues instructions to the disk drive to
do what the computer has instructed. The controller also receives
data and status information from the disk drive, which it passes
on to the computer in a form the computer can understand. A single
controller may service more than one disk drive.
Disk controllers range in complexity from a very simple controller
that merely relays instructions and data, to an intelligent controller
that uses its information about the status of the disk to help
the computer process data faster. Two examples of intelligent
disk controller functions are seek ordering and data
caching.
By keeping track of the exact position of
the heads at all times, the controller can determine which one
of multiple requests from the computer can be serviced in the
shortest time. Then, instead of servicing the computer's requests
in the order received, the controller can service first the requests
for data nearest the heads and then the requests for data farther
away. This is called seek ordering, which simply means
putting the seeks in a better order.
For an over-simplified example, let's say the head is hovering
over track 7. The computer is waiting for data from track 2 and
from track 5. To go from track 7 to track 2 to service the first
request and then back to track 5 for the second would require
more time than it would to just stop off at track 5 on the way
to track 2. So, the intelligent controller reorders the two requests
and services the track 5 request first, then the track 2 request.
The result is faster access to disk data on average.
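The reordering described above can be sketched as a greedy nearest-request loop (a simplification for illustration; an intelligent controller also accounts for rotational position and outstanding work):

```python
def seek_ordered(current_track, requested_tracks):
    """Order requests so the head always services the nearest track next."""
    pending = list(requested_tracks)
    order = []
    while pending:
        nearest = min(pending, key=lambda track: abs(track - current_track))
        pending.remove(nearest)
        order.append(nearest)
        current_track = nearest       # the head is now over this track
    return order

# Head at track 7, requests pending for tracks 2 and 5: service 5 first.
print(seek_ordered(7, [2, 5]))  # [5, 2]
```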
Dramatic performance improvements can be gained by placing a significant
amount of memory inside the disk controller. This local memory
is called a cache and is used
to store data recently retrieved from the disk by the computer.
Then, if the computer should happen to request exactly the same
data again, the controller can service the request from the local
cache at memory speed (microseconds)
instead of at disk speed (milliseconds). Of
course, these dramatic gains are only available when the same
disk block is retrieved more than once and when that block has
been saved by the controller in its local cache. The amount of
performance gain from such a system is wildly variable, being
largely application-dependent.
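The caching idea can be sketched with a small least-recently-used cache (the LRU policy and the names here are assumptions for illustration; real controller caches are considerably more elaborate):

```python
from collections import OrderedDict

class BlockCache:
    """Tiny least-recently-used cache keyed by block number.

    A stand-in for the controller's local memory, nothing more.
    """
    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()

    def read(self, block_number, read_from_disk):
        if block_number in self.store:            # hit: memory speed (microseconds)
            self.store.move_to_end(block_number)
            return self.store[block_number]
        data = read_from_disk(block_number)       # miss: disk speed (milliseconds)
        self.store[block_number] = data
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)        # evict least recently used block
        return data

disk_reads = []
def read_disk(n):
    disk_reads.append(n)                          # each call models a slow disk access
    return "data for block %d" % n

cache = BlockCache(capacity=2)
cache.read(27990, read_disk)
cache.read(27990, read_disk)                      # second read served from the cache
print(len(disk_reads))                            # 1: the disk was touched only once
```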
Just as there are many different types of disks, there are also
many different types of disk controllers. Different computer instructions
are needed to deal with each different type of controller. The
set of instructions used to manipulate a controller is a software
component known as a driver. The driver resides at the
lowest levels of a computer's operating system, where it can interact
directly with the hardware. The driver translates the instruction
codes of the disk controller into standardized instructions recognizable
and usable by the more generalized parts of the computer system,
and vice versa. Thus the driver enables an application
program to issue a generic "get data" instruction, for
example, to the disk without having to concern itself with the
peculiarities of that particular disk (the number of tracks, platters,
and so on). A single disk driver may service more than one disk
controller.
A driver has associated with it a queue for
holding input/output (I/O) requests. This
queue is merely a data structure enabling
the computer to store an I/O request while it carries on with
its work without having to wait for the I/O processing to complete.
Entries can be added to the queue or deleted from it and, under
certain conditions, the entries in the queue can be rearranged.
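A driver queue of this kind can be sketched in a few lines (illustrative only; a real driver's queue entries carry far more state than shown here):

```python
from collections import deque

io_queue = deque()   # the driver's queue of pending I/O requests

def queue_request(request):
    """Store the request and return immediately; the caller keeps working."""
    io_queue.append(request)

def next_request():
    """The driver takes the next request when the device is free."""
    return io_queue.popleft() if io_queue else None

queue_request({"op": "read", "lbn": 27990, "count": 12})
queue_request({"op": "write", "lbn": 5, "count": 6})
print(next_request()["op"])   # read (requests normally come off in order)
```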
The OpenVMS operating system contains a mechanism for queuing (inserting) an I/O request to the queue of a driver. This mechanism is called the $QIO system service. The dollar sign indicates that this abbreviation is Digital's. QIO stands for "Queue Input Output," where queue is used as a verb. An application program or a higher level of the operating system uses the $QIO system service to cause I/O to occur.
An application is a computer program which controls the
computer system to perform some useful work for the user.
OpenVMS deals with disk blocks from two
different points of view.
The actual arrangement of information on the surface of a disk
platter is referred to as a physical block. The Physical
Block Number (PBN) is an address used for identifying a particular
block on the surface of the disk.
When the blocks on a disk are considered from a programming point
of view, they are viewed as logical blocks. The address
of a logical block on a disk is its Logical Block Number (LBN).
LBN 0 (zero) is the first LBN on a disk. Logical blocks correspond
one-for-one to physical blocks, but a given logical block number
does not necessarily correspond to the physical block with the same number.
OpenVMS drivers, controllers and some disk electronics are capable
of detecting a physical block that is unreliable for storage of
data and replacing it with a spare block from the same disk. When
this occurs, the logical block number does not change, even though
the physical block number is now different. In this book, when
discussing blocks or block numbers, we will be referring to logical
blocks unless otherwise specified.
In OpenVMS, the word volume refers to a structured (formatted) disk. When considering a disk as a logical (conceptual) unit of storage, rather than a physical unit of storage, it is referred to as a volume. OpenVMS has the capability of treating one or more physical disk drives as one disk. This capability is implemented by the use of software and does not involve any additional electronics or cabling. When two or more disks are so combined, the combination is referred to as a volume set, which is described more fully later.
For each disk, OpenVMS maintains a map indicating which clusters
of logical blocks are in use and which are free. Each cluster
consists of one or more logical blocks. Each bit in the map represents
one cluster. Therefore, a cluster is the minimum amount of disk
space that can be allocated to anything. The map
is called the Storage Bitmap. When OpenVMS allocates space
on the disk for a file, it can scan the storage bitmap to find
available clusters.
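A scan of such a bitmap can be sketched as follows (the convention used here, a 1 bit meaning a free cluster, is an assumption for illustration):

```python
def find_free_run(bitmap, clusters_wanted):
    """Index of the first run of consecutive free clusters, or None."""
    run_start, run_length = None, 0
    for index, bit in enumerate(bitmap):
        if bit:                               # 1 = free cluster (assumed here)
            if run_length == 0:
                run_start = index
            run_length += 1
            if run_length == clusters_wanted:
                return run_start
        else:                                 # 0 = cluster in use
            run_length = 0
    return None

# One bit per cluster: clusters 3-5 form the first run of three free clusters.
print(find_free_run([0, 1, 0, 1, 1, 1, 0], 3))  # 3
```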
When OpenVMS allocates space on a disk for a file, it is not always
possible to allocate all the needed space contiguously on the
disk. Sometimes it is necessary to allocate part of a file in
one place and the remainder in another. Files have been known
to be allocated in dozens and even hundreds of pieces scattered
around the disk. Each piece of a file so allocated is called an
extent. The concept of an extent is inherent in a study
of file fragmentation, as the allocation of multiple extents for
a single file is file fragmentation. A contiguous file
has only one extent.
The extent cache is a portion of the system's memory that
is set aside solely for the use of the OpenVMS
file allocation mechanism. The extent cache stores the addresses
of deallocated clusters, making it fast for OpenVMS to find free
disk space by reusing these same clusters. This saves the overhead
of scanning the Storage Bitmap of a disk to find free space. The
extent cache, however, cannot store the logical block numbers
of all deleted clusters. Ordinarily, there is room for only 64
LBNs to be stored, though this number can be changed by the System
Manager.
Information stored on a disk is ordinarily stored in a file.
In fact, for any OpenVMS disk using the ODS-2 structure,
no information can be retrieved from a disk unless it is contained
in a file. A file is "a collection of
related records treated as a unit and usually referenced by a
. . . name." While a record is a unit of data within
a file, an extent (see above) is a unit of the file as a container
for data.
A file header is "a block in the index file that describes a file on a . . . disk. Every file residing on the disk has at least one header, which provides the location of the file's extents." While the header actually contains much more information than this, this is all we need to know about it for the purposes of this book.
Within the file header, the information critical to our discussion
of file fragmentation is the section headed Retrieval Pointers.
These pointers indicate where the file's data is located on the
disk. Each pointer consists of the LBN of the first data block
and a count of how many successive contiguous blocks contain data
for that file. For example, this file has a single retrieval pointer:
Map area
    Retrieval pointers
        Count: 12      LBN: 27990
The first data block of this file is at Logical Block Number 27990.
The entire file is contained in that block plus the 11 blocks
following it in LBN sequence. This 12-block file is contiguous
(not fragmented). It has only one extent.
In the following example, the file is broken into four fragments.
The first consists of 6 blocks starting at LBN 5; the second is
3 blocks at LBN 297; the third, 3 blocks at LBN 200460; and the
fourth, 4104 blocks at LBN 200760. This file is fragmented. It
has four extents.
Map area
    Retrieval pointers
        Count: 6       LBN: 5
        Count: 3       LBN: 297
        Count: 3       LBN: 200460
        Count: 4104    LBN: 200760
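The way retrieval pointers are used can be sketched as follows, based on the four-extent example above. OpenVMS numbers the blocks within a file as virtual block numbers starting at 1; the function itself is an illustration, not an actual OpenVMS routine.

```python
# (count, starting LBN) pairs from the fragmented four-extent example.
pointers = [(6, 5), (3, 297), (3, 200460), (4104, 200760)]

def vbn_to_lbn(vbn, retrieval_pointers):
    """Map a file-relative block number (first block = 1) to its LBN."""
    offset = vbn - 1
    for count, lbn in retrieval_pointers:
        if offset < count:                 # the block lies within this extent
            return lbn + offset
        offset -= count                    # skip past this extent
    raise ValueError("block is beyond the end of the file")

print(vbn_to_lbn(1, pointers))   # 5      (first block of the first extent)
print(vbn_to_lbn(7, pointers))   # 297    (first block of the second extent)
```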
A directory is "a file that briefly catalogs a set of files stored on a disk. . ." From the user's point of view, a directory is a catalog of the names of files that are grouped in a particular way.
As mentioned earlier, the movement of a disk head from one track to another is called a seek. The time it takes for a head to seek is one of the most critical factors in determining the speed of a disk. This is known as the disk's seek time. It consists of three parts: the time to start the head in motion and get it up to speed, the time to move the head from one track to another, and the time it takes to stop the head. Additional overhead is required to identify the correct track and, if necessary, reposition the head.
As a disk platter spins around the spindle,
the blocks in a single track of recorded data are brought near
a disk head. The head can only read or write a block when that
block is immediately under the head. Accordingly,
the time to access a block of data on the disk varies. It is much
quicker to access a block that is currently or about to be under
the head than it is to access a block that has recently passed
under the head and is moving away. The block that has just passed
under the head has to wait nearly a full rotation of the disk
for another access opportunity. This delay is known as rotational
latency.
A common disk rotational speed is 3600 RPM, meaning that the disk spins completely around 3600 times each minute. At this speed, each revolution of the disk takes 16.67 milliseconds. Naturally, rotational latency on this disk varies from zero to 16.67 milliseconds, and the average rotational latency is half the maximum, or 8.33 milliseconds.
To compute the time it takes to access any block on a disk, it is necessary to combine the seek time and the rotational latency. Clearly, if the head is over the proper track and the desired block is immediately under the head, the access time is approximately zero. If the head is over the proper track but the desired block is elsewhere on it, the access time is just the rotational latency. If the head has to move from one track to another, however, you must add the rotational latency to the seek time to compute the access time. The average access time for modern-day disks falls roughly in the range of 8 to 25 milliseconds.
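Both quantities can be computed directly; the figures below reproduce the 3600 RPM example (the function names are mine, for illustration):

```python
def rotational_latency_ms(rpm):
    """Average rotational latency: half of one full revolution."""
    revolution_ms = 60_000 / rpm          # milliseconds per revolution
    return revolution_ms / 2

def average_access_ms(seek_ms, rpm):
    """Average access time when a seek is needed: seek plus average latency."""
    return seek_ms + rotational_latency_ms(rpm)

print(round(rotational_latency_ms(3600), 2))    # 8.33 ms
print(round(average_access_ms(16, 3600), 2))    # 24.33 ms for a 16 ms seek
```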
Several methods of combining disks into sets are in current use.
One reason for combining disks into sets is to improve performance
by reducing the average access time when figured for the set as
a whole.
A volume set is a group of two or more disks combined so as to be treated by OpenVMS as a single disk equal in capacity to the total capacities of all the disks in the set. In a volume set, a file can span two or more disks in the set. The primary benefit of a volume set over separate disks is size - the volume set can accommodate a file larger than any individual disk in the set. The volume set also reduces average access time by spreading the disk accesses over two (or more) disks instead of one. Theoretically, half the disk accesses will occur on one disk in a two-volume set, while the other half occur on the other disk. If the theory were borne out in practice, this would allow twice as many disk operations in the same time, but actual results are not greatly improved.
A shadow set is a group of two (or more) identical disks
combined so as to be treated by OpenVMS as
a single disk equal in capacity to only one of the disks in the
group. Each time data is written to the shadow set, the same data
is written to all the disks in the set. That way, the data can
be retrieved even if all but one of the disks fails. The primary
benefit of a shadow set over separate disks is safety. The likelihood
of losing the data on two or more disks at the same time is much
more remote than that of losing the data on just one disk.
A shadow set also alters performance by increasing the time needed
to write data (as it has to be written to all disks in the set)
and by reducing the time it takes to read data. The two or more
disks in the set are unlikely to have exactly the same block under
their heads at exactly the same time, so OpenVMS will direct the
read to the one which can retrieve the data faster, either because
the data is closer to its head or because there is less work outstanding
for that drive to do. Therefore, on average, the read time for
the set is faster than that for either disk.
Two or more disks can also be combined into a
stripeset. A stripeset is similar to a volume set in
that the full capacity of all the disks in the set is available
for data storage, and the whole set is treated
by OpenVMS as a single large disk. In addition, the virtual disk
(the single large disk as OpenVMS sees it) is divided into "chunks"
rather than clusters.
The main difference is that, while each file extent must reside
entirely on a single disk in a volume set, in a stripeset file
extents are deliberately spread across multiple disks. One chunk
(which could contain one or more extents) resides on one disk,
the next chunk in sequence resides on the next disk, the next
chunk on the next disk, and so on, starting over at the first
disk when the last disk in the set is reached.
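The round-robin placement of chunks amounts to simple modular arithmetic (numbering chunks and disks from zero is an assumption made here for illustration):

```python
def chunk_location(chunk_number, disks_in_set):
    """Which disk holds a chunk, and the chunk's position on that disk."""
    disk = chunk_number % disks_in_set        # chunks rotate across the disks
    position = chunk_number // disks_in_set   # how far down that disk it falls
    return disk, position

# Chunks 0 through 5 on a three-disk stripeset land on disks 0,1,2,0,1,2.
print([chunk_location(chunk, 3)[0] for chunk in range(6)])  # [0, 1, 2, 0, 1, 2]
```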
The primary benefit of a stripeset is application performance. Data read from a single file will come from several disks simultaneously, greatly reducing the application wait time. For example, if each disk has a 24 millisecond (ms) access time, the average time to collect a chunk from two disks simultaneously is the same - 24 ms. But that is 24 ms for two chunks instead of one.
Perhaps the most interesting question that comes up with respect
to fragmentation is, "Why is the OpenVMS operating system
designed so badly as to suffer from fragmentation?"
The answer to this question is that age-old saw, "It's a
feature, not a bug!"
It's true. Originally, the VAX/VMS operating system, as it was
called then, was deliberately designed to allow fragmentation
and to deal with it efficiently. This design was actually the
solution to an older and more serious problem.
You see, when computers first appeared on the scene, they had
no disks at all. Obviously, disk fragmentation was not a problem.
No one had even had the idea yet, so the operating systems of
that age had no mechanism to deal with fragmentation at all. If
the computer even had an operating system.
This ancient era extended into the late 1960's, before the VAX
and before even the PDP-11, on which the VAX
was based. In the Digital world, these were the days of the PDP-8.
Then, disks happened. The first disks were small - unbelievably
small by today's standards. They were measured in thousands of
bytes rather than millions, never mind the billion-byte disks
now available. Nevertheless, this was a fabulous advance in the
days when memory sold for a dollar a byte. (At this writing,
memory sells for $40 a megabyte retail - 25,000 times less!)
Later, an early PDP-11 operating system called RT-11 (Real
Time-11) became capable of storing data on disks
in files, and the files were organized in a formal file structure.
This file structure, however, required that all files be
contiguous. That is, no file could be split into two or more pieces.
It was not a question of performance; the system simply had no
capability to create or access a file split into pieces. A file
had a single location on a disk and that was that.
This requirement for contiguous files meant that a newly-created file had to fit within a single gap on the disk or not at all. It was not possible to allocate part of the file to one gap and the rest to another, as can be done with today's OpenVMS. Even when the total free space on the disk far exceeded the size of the new file, the file could not be created unless some individual gap was large enough to hold it. There had to be a large enough contiguous free space, period.
Naturally, this problem (of not enough contiguous free space to
create a file) occurred every time a disk filled up, and small
disks fill up very fast. With frequent file deletions,
it was not unusual to have a disk reach the point where no more
files could be created even though the disk was little over half
full.
The solution to this problem was the SQUEEZE command. SQUEEZE compacted the disk, rearranging the files so they were all together near the beginning (LBN 0) of the disk, leaving all the free space in one large, contiguous area at the end. After a SQUEEZE, any file could be created, provided it would fit in the total free space remaining.
Nothing else could be done on that disk while the SQUEEZE was in progress, but this was not a big drawback, as RT-11 was a single-user system anyway. The only person inconvenienced was the one who wanted that new file created, so the benefit always outweighed the inconvenience.
Then a wondrous new operating system came along, one that allowed
multiple simultaneous users of the same PDP-11 computer - RSX-11.
Now the inconvenience of a SQUEEZE would certainly outweigh
the benefit, as all the users would have to stop working,
not just the one who wanted to create the new file. Clearly, SQUEEZE
was no longer a viable solution.
The designers of RSX-11 cleverly created a file structure that
included the revolutionary capability to locate parts of a file
in different places on the disk. Each file had a header that gave
the location and size of each piece of the file, so the file could
be in pieces scattered around the disk. Now a file could be created
anytime there was sufficient free space anywhere on the
disk; the space did not have to be contiguous.
Nor was there any drawback to this mechanism whatsoever. It really
was "a feature, not a bug." Performance losses due to
fragmentation, even when taken to extremes, caused very little
difficulty for anyone. You must realize that at this time, in
the early 1970's, disks were very small. Let's take a look at
a real example:
The RK05 disk, which was in common use at the time, held 2½
megabytes (5,000 blocks). Suppose the disk was totally and utterly
fragmented, that is, no two consecutive data blocks were contiguous.
Every single disk access likely required moving the head and waiting
for rotational latency. Even so, the whole disk could be read
in 250 seconds (50 ms times 5,000 blocks). That's a little over
four minutes - worst case. The same action on today's 700MB disk,
even with a 16 ms access time, takes over 6 hours. During
the same period of time, disk capacities have increased to over
500 times that of the RK05, and CPU speeds have increased to over
400 times that of the original PDP-11. Even though disk speeds
have increased by a factor of three, they have not kept pace with
the rest of computer technology. This makes the speed of the disk
a major bottleneck in the computer system. This point is critical
to an understanding of the fragmentation problem.
Clearly, today's larger disks brought with them more than higher capacities and speeds. They brought a susceptibility to a new kind of computer disease - fragmentation.
Here is a table showing the time required to access every block
on a disk:
TIME REQUIRED TO ACCESS EVERY BLOCK ON A DISK

                              Average access time in milliseconds
  Disk size |      70        50        40        33        25        16        10         8
  ----------+------------------------------------------------------------------------------
   0.032 MB |  4.48 s    3.20 s    2.56 s    2.11 s    1.60 s    1.02 s    0.64 s    0.51 s
     0.5 MB | 70.00 s   50.00 s   40.00 s   33.00 s   25.00 s   16.00 s   10.00 s    8.00 s
       1 MB |  2.33 m  100.00 s   80.00 s   66.00 s   50.00 s   32.00 s   20.00 s   16.00 s
       2 MB |  4.67 m    3.33 m    2.67 m    2.20 m  100.00 s   64.00 s   40.00 s   32.00 s
       5 MB | 11.67 m    8.33 m    6.67 m    5.50 m    4.17 m    2.67 m  100.00 s   80.00 s
      10 MB | 23.33 m   16.67 m   13.33 m   11.00 m    8.33 m    5.33 m    3.33 m    2.67 m
      20 MB | 46.67 m   33.33 m   26.67 m   22.00 m   16.67 m   10.67 m    6.67 m    5.33 m
      40 MB | 93.33 m   66.67 m   53.33 m   44.00 m   33.33 m   21.33 m   13.33 m   10.67 m
     100 MB |  3.89 h    2.78 h    2.22 h  110.00 m   83.33 m   53.33 m   33.33 m   26.67 m
     206 MB |  8.01 h    5.72 h    4.58 h    3.78 h    2.86 h  109.87 m   68.67 m   54.93 m
     456 MB | 17.73 h   12.67 h   10.13 h    8.36 h    6.33 h    4.05 h    2.53 h    2.03 h
     700 MB | 27.22 h   19.44 h   15.56 h   12.83 h    9.72 h    6.22 h    3.89 h    3.11 h
    1200 MB | 46.67 h   33.33 h   26.67 h   22.00 h   16.67 h   10.67 h    6.67 h    5.33 h

  Legend: s = seconds, m = minutes, h = hours
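Assuming a round figure of 2,000 blocks per megabyte (the approximation the table's entries appear to use, given 512-byte blocks), each entry reduces to a single multiplication:

```python
def seconds_to_access_every_block(size_mb, access_ms, blocks_per_mb=2000):
    """Worst case: every block on a fully fragmented disk needs its own access."""
    return size_mb * blocks_per_mb * access_ms / 1000

print(seconds_to_access_every_block(2.5, 50))                   # 250.0 s, the RK05 worst case
print(round(seconds_to_access_every_block(700, 16) / 3600, 2))  # 6.22 hours, the 700 MB example
```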
So the ability to deal with fragmented files, which was carried
over from the RSX-11 operating system to the OpenVMS operating
system, was a solution to an earlier problem that failed to anticipate
the enormous capacities to which disks would grow. There is no
end to this growth in sight. Deliberate fragmentation is no longer
only a feature; it is now a problem, too.
In this chapter, the inner workings of a disk have been explained, with care taken to define the terms needed to truly understand fragmentation. To be sure, there is a lot more to know about disks, but our concern here is not to learn disk design, construction or maintenance. Our purpose is to understand enough about disks so that fragmentation and its cure make sense. To this end, the first chapter has also devoted time to the basic concepts of files and file structure. With these concepts and terms well understood, we are ready to tackle the real problem - the fragmentation disease.