Operating System
White Paper
Abstract
The Microsoft Windows 2000 operating system includes key enhancements to the Windows NT storage architecture and features. These improvements address enterprise concerns about the escalating storage costs in large environments and the scalability requirements of mission-critical applications, while providing support for third party storage management solutions. The Windows 2000 Server operating system provides an enhanced storage subsystem architecture, an improved NTFS file system, and an extensive list of new storage services and tools.
This white paper provides an overview of current trends in the storage industry and the business challenges related to storage management. It then describes the storage management features in the Windows 2000 Server operating system and explains how they address the digital storage requirements of the enterprise.
© 1999 Microsoft Corporation. All rights reserved.
The information contained in this document represents the current view of Microsoft Corporation on the issues discussed as of the date of publication. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information presented after the date of publication.
This White Paper is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS DOCUMENT.
Microsoft, Active Directory, IntelliMirror, MS-DOS, Win32, Windows, and Windows NT are registered trademarks or trademarks of Microsoft Corporation.
Other product or company names mentioned herein may be the trademarks of their respective owners.
Microsoft Corporation .
0999
Introduction
Storage: A Business and Market Overview
Windows 2000 Storage
Windows 2000 Storage Subsystem Enhancements
Infrastructure Component Descriptions
Storage Subsystem Enhancement Details
Volume Management
File System Support
Reparse Points and File System Filter Drivers
Developer Information
New Storage Features
Storage Feature Descriptions
Storage Feature Details
Volume Mount Points
File Compression
Encrypting File System
NTFS Change Journal
Disk Quotas
NTFS Sparse File Support
Chkdsk Utility
Storage Management Tools in Windows 2000
Storage Tools and Service Descriptions
Storage Application Details
Remote Storage
Removable Storage
Media Pools
Indexing Service
Backup Utility
Distributed File System
File Replication Service
Offline Files
Conclusion
For More Information
Introduction
This white paper begins with a brief overview of trends in the storage industry and the business challenges related to storage management. Then, it describes the storage management features in the Microsoft Windows 2000 Server operating system, and explains how they address business computing storage needs.
The Windows NT Server 4.0 operating system addressed the storage needs of enterprise file/print and applications servers by supporting multiple file systems (including NTFS), and including a fault-tolerant disk manager and management tools, such as NTBackup. This open architecture also integrated with a wide variety of third-party management tools that provided additional storage-related functionality.
While these features are essential to enterprise computing, there are several factors that necessitate enhancements to this storage architecture and feature set. These factors include the cost of supporting growing storage requirements in large environments, the scalability requirements of mission-critical applications, and the continued need to support third-party management solutions in the storage market. To address these needs, the Windows 2000 Server operating system provides an enhanced storage subsystem architecture, an improved NTFS file system, and an extensive list of new storage services and applications.
Storage: A Business and Market Overview
Information technology (IT) systems are experiencing rapid growth in numbers of users supported and system complexity. The IT community must handle the requirements of mission-critical applications, capacity growth rates that exceed 50 percent annually, excessive downtime, and increasing business dependence on computer systems. To add a further layer of complexity, systems management is now an issue: how can a system administrator or network manager monitor and control all of the computing resources? Without centralized management tools, he or she cannot effectively manage this data.
The quantity of data being stored on distributed systems has increased exponentially over the last decade. From 1996 to 1998, Windows NT-based server data has grown from 11 to 39 petabytes worldwide. This data explosion shows no signs of slowing down; typical server capacities that exceed 100 gigabytes are not far in the future. By 2002, data stored on the Windows Server operating system is projected to exceed 260 petabytes worldwide. Much of the data stored on Windows-based servers is business-critical. In a recent Strategic Research study of over 200 sites, 31 percent of the servers were running Windows NT version 4.0 or earlier to host mission-critical databases.
The migration of mission-critical systems to distributed environments, increases in the number of Web-based applications, and general growth in the enterprise end-user community all contribute to this rapid growth. As the number of client/server systems increases in an organization, so does the number of storage subsystems. Unfortunately, tools for remote management and management standards have only recently become mainstream in the distributed systems marketplace.
The type of data being stored on client/server systems is changing as well. Growth in the Internet/intranet space and 32-bit/64-bit architectures gives impetus to the changes in data seen in the distributed network. First, the increase in popularity of Web-based applications has resulted in an increase in multimedia data types (including large video streams) that have significant storage requirements. Moreover, the ease of use and low cost of ownership of Web-based applications are sparking a trend toward publishing data that is content-oriented, rather than application-oriented, as was previously the case. Content-style data ranges from static (for example, publishing a book online) to dynamic (such as publishing a daily newspaper). Finally, 32-bit/64-bit platform support results in the development of more powerful, graphically intense applications, which in turn create large volumes of data that present significant storage management challenges.
As the popularity of the client/server infrastructure increases, the cost and complexity of managing distributed storage increases as well. The LAN environment has seen a 60 percent yearly growth rate of storage management expenditures since 1993. Today, corporations are spending more than $120 billion to store and manage data in distributed systems.
Storage management and storage recovery are concerns for many IT managers. While most recent IT magazines and articles address concerns with desktop application management and software distribution, a large portion of enterprise computing budgets is spent on storage issues. As much as 25 percent of a typical computing budget is dedicated to storage, storage management, and other storage-related activity.
Storage limitations can also constrain other areas in enterprise computing. For example, the ultimate scalability of application implementations is often limited by the effectiveness of the storage and storage-recovery mechanisms in place. Without centralized control and management of distributed systems, applications experience excessive downtime, much of which is caused by storage-related failures. Server outages and inconsistent access to data directly affect productivity. Through interviews with hundreds of IT managers, Strategic Research has determined that centralized sites (those using central control and management tools to manage distributed and host storage) typically experience less than half of the downtime of sites without centralized control and management. A representative centralized site experiences 26 hours of downtime annually, compared to 54 hours for a decentralized site. Conservative estimates place downtime costs at $80,000 per hour, which means that a centralized site saves over $2.2 million in downtime-related costs each year.
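The savings figure follows directly from the downtime hours and hourly cost quoted above. The short sketch below simply reproduces that arithmetic; the function and constant names are illustrative only and are not part of any Microsoft tool.

```python
# Figures quoted in the paragraph above (Strategic Research estimates).
CENTRALIZED_HOURS = 26      # annual downtime at a centralized site
DECENTRALIZED_HOURS = 54    # annual downtime at a decentralized site
COST_PER_HOUR = 80_000      # conservative downtime cost, in dollars


def annual_downtime_savings(decentralized_hours, centralized_hours, cost_per_hour):
    """Return the yearly downtime cost avoided by the site with less downtime."""
    return (decentralized_hours - centralized_hours) * cost_per_hour


savings = annual_downtime_savings(DECENTRALIZED_HOURS, CENTRALIZED_HOURS, COST_PER_HOUR)
print(f"${savings:,}")
```

The difference of 28 hours at $80,000 per hour is what yields the "over $2.2 million" estimate.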
The cost of managing a megabyte of storage on distributed systems has decreased from $8 per megabyte in 1994 to $3 per megabyte in 1998, largely as a result of storage management tools that support centralized storage management in distributed systems.
Innovations are emerging in the storage industry to meet the growing demands of client/server computing. These innovations include new storage devices, media types, data transfer protocols, and management standards. Storage concepts such as hierarchical storage management (HSM), bulk media changers/libraries, data vaulting managers, and fault-tolerant storage subsystems are being introduced by a variety of vendors. Storage requirements will continue to increase because the trends described above are just the beginning of a new wave in the need for increased storage and storage management. As enterprise storage becomes more complex, it becomes essential for IT managers to manage it effectively in order to meet their short-term and long-term computing goals.
Windows 2000 Storage
In response to the expanding needs for enterprise storage in distributed computing environments, the Windows 2000 Server operating system includes improved storage management technologies. The remainder of this paper provides a high-level technical overview of these enhancements to the Windows 2000 operating system, and describes their business benefits.
The remainder of the paper is divided into three sections that cover the following topics:
Windows 2000 storage subsystem enhancements.
Windows 2000 storage-related features.
Windows 2000 storage applications.
Windows 2000 Storage Subsystem Enhancements
Microsoft has enhanced the storage subsystem in the Windows 2000 Server operating system for several reasons.
First, many of the enhancements provide independent software vendors (ISVs) with the infrastructure they need to write enterprise-class storage applications and features. Microsoft has a long history of providing this type of functionality, using well-documented APIs, system services and features, and a variety of development tool kits and frameworks. These subsystem components allow ISVs to invest their time in developing business solutions, rather than requiring them to invent items that should be core operating system services. As a result, customers who invest in Windows-based platforms benefit from solutions that are built on a consistent and well-documented set of system services and interfaces.
Second, changes to the overall storage subsystem architecture provide the Windows 2000 Server operating system with a more manageable storage subsystem. This helps customers to deal with storage demands in their environment today and in the future.
A list of the core storage components and improvements delivered in the Windows 2000 Server operating system is provided in Table 1.
Table 1. Core Storage Components in the Windows 2000 Server Operating System
Infrastructure Component | Features
Volume Management | Better manageability and recoverability: Disk RAID objects (RAID 5, 0 and 1) can be created, broken and rebuilt online. File systems can be extended across new partitions while the file system is in use. Microsoft Management Console (MMC) snap-in allows both local and remote management. Backwards compatibility: Both the new volume manager and the legacy fault-tolerant disk manager may manage disks.
NTFS file system enhancements | Sophisticated file encryption system. Change Journal allows applications such as indexing or backup tools to track changes to the file system. Distributed link tracking allows a shortcut to remain correct, even though the target file moves to a new drive or computer system. Reparse points allow file system functionality to be enhanced by ISV-provided file system filter drivers.
UDF file system added | Enables DVD-Video and other UDF-formatted disks to be read by Windows 2000-based systems.
FAT32 file system added | Windows 95/98 compatibility.
Plug and Play | Disk volumes can dynamically appear and disappear.
Storage connectivity choices | Fibre Channel support. IEEE 1394 support.
Distributed File System | The distributed file system (Dfs) allows distributed file systems to be united into a single name space. Its uses include higher data availability, load balancing, name transparency, and flexible volume administration.
Offline Files | This caching mechanism allows users to see and modify files on network shares, even when disconnected from the network.
Developer information, including the Platform SDK and Microsoft Developer Network, fully documents the new features of the Windows 2000 operating system. An Installable File System (IFS) Kit is available that provides documentation and sample source code to enable the development of installable file system filter drivers. See the Developing for Windows Operating Systems site on the Microsoft Web site.
This section describes in greater detail how existing storage subsystems have been enhanced in the Windows 2000 Server operating system.
Volume management in an operating system provides logical volumes of disk sectors that are used by a file system (or sometimes a database) to store user data. A logical volume may be a piece of a physical volume (disk) or may be composed of pieces of many physical volumes. The Volume Manager in the Windows 2000 operating system performs this abstraction, and is also able to provide functionality such as RAID-5, disk mirroring and disk striping.
In the Windows 2000 Server operating system, there are significant enhancements in the architecture of volume management. The goal of this updated architecture is to improve the manageability and recoverability of volumes in a Windows 2000 environment. The components of the volume management architecture are illustrated in Figure 1.
Figure 1. Volume management components
The FT Disk driver manages all partitions and fault-tolerant volumes originally created with the MS-DOS, Windows, or Windows NT operating systems.
Disks managed by FT Disk are called basic disks. Basic disks may contain simple partitions, extended partitions, or fault-tolerant sets. All partitions on a basic disk are hard partitions: the underlying disk structure is statically allocated into contiguous extents.
Simple partitions and extended partitions may be created on the hard drive of a computer running Windows 2000, but once created their size is fixed. A basic disk may be converted to a dynamic disk at any time.
Logical Disk Manager (LDM) is designed to reduce the total cost of ownership (TCO) and reduce system downtime by being highly flexible and available. The features and benefits of LDM are described in Table 2.
Table 2. Logical Disk Manager
LDM Features | LDM Benefits
Better manageability and recoverability: Disk RAID objects (RAID 5, 0 and 1) can be created, broken and rebuilt online. | No downtime required to reconfigure the level of performance and redundancy you require for a file system. No additional hardware required to achieve online RAID management.
File systems can be extended over new partitions while the file system is in use. | No downtime required to extend the size of a disk volume.
Microsoft Management Console (MMC) snap-in allows both local and remote management. | The MMC architecture allows this function to be delegated to appropriate administration staff.
Architecture is designed to support third-party volume managers and tools for Windows-based systems. These include the ability to disperse hot spots or active data areas on arrays, dynamically extend arrays, and support hot-plug storage. | ISVs are able to cleanly and elegantly extend the volume management capabilities of the Windows 2000 operating system.
LDM-managed disks are self-describing: changes to SCSI IDs, LUNs and host adapter ordering will not compromise the system configuration. | Ease of storage system reconfiguration.
Disks managed by LDM are called dynamic disks. Dynamic disks may contain simple volumes, concatenated volume sets, stripe sets, mirror sets, or RAID-5 (parity stripe) sets. Simple volumes may be extended dynamically at any time without reboot. A simple volume may be converted to a mirror set at any time or a mirror removed from a mirror set again without requiring system reboot.
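To illustrate why a RAID-5 (parity stripe) set can be rebuilt after a disk is lost, the sketch below shows the general XOR parity principle on in-memory byte blocks. It is a simplified illustration of the technique, not a description of how LDM lays out or rebuilds data on disk; the block contents and names are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three hypothetical data blocks, one per disk in the stripe.
data_blocks = [b"DISK0...", b"DISK1...", b"DISK2..."]

# The parity block is the XOR of all data blocks in the stripe.
parity = xor_blocks(data_blocks)

# Simulate losing the second disk and rebuilding it from the survivors plus parity.
survivors = [data_blocks[0], data_blocks[2], parity]
rebuilt = xor_blocks(survivors)
```

Because XOR is its own inverse, combining the surviving blocks with the parity block reproduces the missing block exactly.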
LDM manages dynamic disks as a collection or group. Each disk in the group contains information on all other disks in the group in small replicated transactional databases. (Small is considered to be one cylinder, typically between 1 and 8 MB in size.) This allows LDM to control the recovery of mirror or parity stripe sets. It also means that volume configuration information is no longer contained in the registry.
LDM creates volumes on dynamic disks using soft partitions. The disk contains only one hard partition or extent. The LDM database contains all actual volume configuration. When basic disks are converted to dynamic disks, the underlying hard partitions are conserved and these volumes cannot be extended. For disks other than system or boot disks, the recommended way to convert from basic disks to dynamic disks is to first back up or copy all basic disk data to another dynamic disk; delete all basic partitions on the basic disk; and then convert the basic disk to dynamic.
Dynamic disks may be transported between systems. The disks are self-identifying, and the receiving system can determine all volume configuration information. Importing disks from one system to another causes the databases to be merged transactionally with no loss of data.
The Windows 2000 Server operating system includes a new MMC snap-in, the Disk Management snap-in. Basic and dynamic disks can be managed remotely. Moreover, volume configuration no longer requires that the server be rebooted. Volume management is integrated with Plug and Play to allow hot plugging and hot sparing of disks. This is invaluable in terms of the centralized management of storage in a distributed environment.
The Windows 2000 operating system continues to support all previous file systems and introduces support for new ones. As a result, several options and conversion paths are available to address a variety of customer needs related to storage file systems. These are explained next.
FAT16 or FAT is a file system that has been a part of Microsoft operating systems since the MS-DOS operating system. To maintain an upgrade path from previous operating systems, the Windows 2000 operating system continues to support FAT. The maximum volume size for a FAT16 partition in a computer running Windows 2000 is 4 gigabytes; this is unchanged from the Windows NT operating system.
Windows 2000 introduces support for the FAT32 file system. This support offers the same format and features as those available in the Windows 95 OSR2 and Windows 98 operating systems. FAT32 in computers running Windows 95 and 98 supports volumes up to 127.53 GB and uses smaller clusters. The reduced cluster size in FAT32 results in a 20 to 30 percent increase in disk space efficiency compared to FAT16 volumes.
Note that the Windows 2000 FAT32 implementation will not create volumes larger than 32 gigabytes. FAT32 volumes larger than 32 GB (created by Windows 98, for example) can still be mounted and used, however. This limitation on the size of new FAT32 volumes in Windows 2000 exists because NTFS is available on this platform and is far more appropriate for volumes of this size.
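The link between cluster size and disk space efficiency can be sketched with a small calculation. Each file occupies a whole number of clusters, so the unused tail of the last cluster is wasted. The example below assumes 32 KB clusters for a large FAT16 volume and 4 KB clusters for FAT32; both cluster sizes and the sample file sizes are illustrative assumptions, and the result is not intended to reproduce the 20 to 30 percent figure quoted above.

```python
import math


def allocated_bytes(file_size, cluster_size):
    """Space consumed on disk: the file size rounded up to whole clusters."""
    return math.ceil(file_size / cluster_size) * cluster_size


def efficiency(file_sizes, cluster_size):
    """Fraction of allocated space that actually holds file data."""
    used = sum(file_sizes)
    allocated = sum(allocated_bytes(size, cluster_size) for size in file_sizes)
    return used / allocated


# Hypothetical file sizes in bytes.
sample_files = [1_500, 10_000, 40_000, 100_000, 250_000]

fat16_like = efficiency(sample_files, 32 * 1024)  # assumed 32 KB clusters
fat32_like = efficiency(sample_files, 4 * 1024)   # assumed 4 KB clusters
print(f"32 KB clusters: {fat16_like:.1%}, 4 KB clusters: {fat32_like:.1%}")
```

The gain is largest on volumes dominated by small files, where most of each large cluster would otherwise go unused.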
Compact Disk File System (CDFS) support allows a computer running Windows 2000 to read data from CD-ROM devices. The Microsoft implementation of CDFS support adheres to the ISO 9660 specification with additional support for
Universal Disk Format (UDF) is a file system defined by the Optical Storage Technology Association (OSTA). UDF is compliant with the ISO-13346 standard and is the successor to the CD-ROM file system (CDFS or ISO-9660).
UDF is targeted for removable disk media like DVD, CD, and Magneto-Optical (MO). Since UDF is based on open standards, it is intended to facilitate data interchange between operating systems, and between consumer devices. The standard supports a number of advanced features, including:
Long and Unicode file names, access control lists (ACLs), and streams.
Read-write (not just mastering).
Sparse files.
Support for a wide range of media types including DVD, WORM, and CD-ROM.
Windows 98 can read UDF version 1.02 disks.
Windows 2000 can read UDF version 1.02 and version 1.50.
Future versions of the Windows operating system will support writing to UDF media.
NTFS was introduced in the Windows NT operating system. Its goals were to provide file system features that were essential to enterprise computing. These features include support for file system security, transacted operations for recoverability, support for large volumes, support for long Unicode naming, and increased performance.
The on-disk format for NTFS has been enhanced in the Windows 2000 operating system to enable new functionality. The upgrade to the new on-disk format occurs when a computer running Windows 2000 mounts an existing NTFS volume. The upgrade is made quickly and silently. The conversion time is independent of volume size. Note that FAT16 volumes can be converted to NTFS format at any time.
The upgraded NTFS on-disk format is natively recognized by Windows NT 4.0 with Service Pack 4 (SP4) or later. Versions of the Windows NT operating system prior to Windows NT 4.0 with SP4 report these upgraded NTFS volumes as unknown. Configurations affected by this scenario include:
Volumes on removable media.
Volumes used with multiboot configurations.
Volumes shared within clustered configurations.
Clients that access NTFS volumes on remote systems using network protocols are not affected.
A reparse point is a specialized tag that may be applied to an NTFS file or directory.
Reparse points are used to trigger extended functionality in the I/O subsystem. This extended functionality is implemented in an installable file system filter driver. Each file system filter driver has a corresponding unique reparse point tag value associated with it.
The key benefit of the reparse point architecture is that the filter driver is effectively dormant until a specific file or directory that it enhances is opened. This allows functionality to be extended without adversely affecting performance for existing file and directory types.
Reparse points and installable file system filter drivers are introduced in the Windows 2000 storage subsystem so that independent software vendors (ISVs) have a consistent mechanism for extending storage functionality. This prevents ISVs from having to write proprietary system functionality to provide value to customers. Several features included in the Windows 2000 operating system are based on reparse points and installable file system filter drivers. These features include the following:
NTFS mount points.
Remote Storage.
Single Instance Store (part of Remote Installation Services).
To differentiate reparse points, Microsoft assigns reparse tags to ISVs. When a file system object with a reparse point attribute is encountered during pathname resolution, it is passed back up the driver stack for an I/O reparse. The file system filter driver handles the I/O reparse, which includes identifying the ISV reparse tag. Vendor-specific file system filter drivers are responsible for executing specific I/O functionality.
The following steps and the diagram in Figure 2 illustrate how a reparse point works. The example is based on a Windows 2000 feature called NTFS volume mount point. NTFS volume mount points are based on reparse points.
A user double-clicks in Windows Explorer to open a directory called Products on an NTFS volume. The directory has an NTFS volume mount point reparse point associated with it.
The call goes from user mode to kernel mode where it reaches the file system object and encounters the reparse point attribute containing the Microsoft-specific reparse tag.
Each installable file system filter driver in the Windows 2000 I/O stack examines the tag associated with the reparse point. If there is a match, the associated file system filter driver intercepts the call. (File system filters examine both inbound and outbound calls. In this example, there is no functionality associated with the inbound call, so it is not referenced in the diagram.)
The NTFS volume mount point filter driver intercepts the call and executes the enhanced functionality associated with the reparse point. In the case of an NTFS volume mount point, the driver mounts the D drive on this system. (Drives do not require letters in Windows 2000. See the "Volume Mount Points" section in this paper.)
The file system driver returns the call to the calling application. Because this example involves an NTFS volume mount point, the file system driver mounts another name space and returns a handle to the calling function.
Figure 2. I/O reparse process flow
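The tag-matching step in the flow above can be sketched as a simple dispatch loop. The following is a toy user-mode model only: the class, function, and tag names are hypothetical, the tag values are placeholder strings rather than real reparse tags, and actual filter drivers run in kernel mode against the I/O manager rather than Python dictionaries.

```python
# Toy model of reparse-tag dispatch. All names and tag values are hypothetical
# and do not correspond to real Windows structures or constants.
MOUNT_POINT_TAG = "mount-point"    # stands in for a Microsoft-assigned tag
ARCHIVE_TAG = "vendor-archive"     # stands in for an ISV-assigned tag


class FilterDriver:
    """A filter that handles exactly one reparse tag."""

    def __init__(self, tag, handler):
        self.tag = tag
        self.handler = handler


def open_object(fs_object, filter_stack):
    """Open a file system object, letting a matching filter handle its reparse tag."""
    tag = fs_object.get("reparse_tag")
    if tag is None:
        # No reparse point: filters stay dormant and the open proceeds normally.
        return f"opened {fs_object['name']} directly"
    for driver in filter_stack:
        if driver.tag == tag:
            return driver.handler(fs_object)
    return f"no filter registered for tag {tag!r}"


filter_stack = [
    FilterDriver(ARCHIVE_TAG, lambda obj: f"recalled {obj['name']} from archive"),
    FilterDriver(MOUNT_POINT_TAG, lambda obj: f"redirected {obj['name']} to {obj['target']}"),
]

products = {"name": "Products", "reparse_tag": MOUNT_POINT_TAG, "target": "volume D"}
readme = {"name": "readme.txt"}

print(open_object(products, filter_stack))
print(open_object(readme, filter_stack))
```

Objects without a reparse tag never reach a handler, which mirrors the point that filter functionality adds no cost for ordinary files and directories.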
Microsoft provides complete documentation for its operating system services so that third-party developers can create value-added solutions for customers who use the Windows 2000 operating system. Resources include the Microsoft Platform SDK, the Microsoft Windows Installable File System Kit, the Microsoft Developer Network, and other storage-related white papers and articles available from the Windows 2000 Server Web site.
New Storage Features
Microsoft introduces several new storage-related features in the Windows 2000 operating system. These features provide more flexibility, enhanced security, improved administrative control, and more efficient usage of network resources. Implementing these features correctly can help improve security and reduce the management costs associated with storage.
The following is a list of the storage-related features included in the Windows 2000 operating system.
Volume mount points: These are new system objects in the Windows 2000 operating system internal name space that represent storage volumes in a persistent manner. The placement of a volume mount point on an empty NTFS directory allows an administrator to graft new volumes into the name space without requiring additional drive letters.
Storage compression: This standard NTFS feature continues to be supported in the Windows 2000 Server operating system.
Encrypting File System (EFS): This integrated service allows data to be stored encrypted on NTFS volumes. The Win32 APIs and utilities have been enhanced to handle encrypted files.
Change journal: This feature tracks changes made to files on NTFS volumes. ISV system-level developers can use the change journal to provide enhanced functionality in their applications. Applications that can benefit from the change journal include file system indexing engines, content replication engines, and storage archiving and migration applications. Although fully documented, the change journal is a complex mechanism in Windows 2000 and is not intended for use by corporate developers.
Disk quotas: This new NTFS feature allows an administrator to monitor and control how much disk space a user can occupy on a file system. Quotas can be used to either track or enforce limits on a per-user basis for a specific NTFS volume.
Sparse files: Support for these files, which are often very large, is introduced in NTFS version 5, the version included in the Windows 2000 operating system. Sparse data has large, consecutive areas of 0 bits. A user or administrator can mark these files as sparse and allocate space only for the meaningful data. The file system stores only range information that describes where the sparse data would be if it were allocated. Any time one of these ranges is accessed, the file system returns zeroes by default. This improves storage efficiency for sparse data.
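The sparse file behavior described in the last item can be modelled in a few lines. The sketch below is a toy in-memory model, not the NTFS on-disk format or the Win32 API: the class and method names are hypothetical, and it assumes writes do not overlap. It stores only the ranges that were written and returns zeroes for everything else.

```python
class SparseFile:
    """Toy model: store only written ranges; unwritten ranges read back as zeroes.

    Simplifying assumption: writes never overlap one another.
    """

    def __init__(self, logical_size):
        self.logical_size = logical_size
        self.extents = {}  # offset -> bytes actually stored

    def write(self, offset, data):
        if offset + len(data) > self.logical_size:
            raise ValueError("write past end of file")
        self.extents[offset] = bytes(data)

    def read(self, offset, length):
        length = max(0, min(length, self.logical_size - offset))
        buffer = bytearray(length)  # zero-filled by default
        for start, data in self.extents.items():
            lo = max(start, offset)
            hi = min(start + len(data), offset + length)
            if lo < hi:
                # Copy the part of this stored extent that overlaps the request.
                buffer[lo - offset:hi - offset] = data[lo - start:hi - start]
        return bytes(buffer)

    def allocated_size(self):
        """Bytes of real storage consumed, as opposed to the logical size."""
        return sum(len(data) for data in self.extents.values())


sparse = SparseFile(logical_size=1_000_000)
sparse.write(500_000, b"data")
```

In this model a file with a logical size of 1,000,000 bytes consumes only the four bytes that were actually written.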
The following sections describe each of the new Windows 2000 storage features in detail.
Unlike earlier versions of the Windows operating system, the storage system in Windows 2000 is not limited to 26 file system volumes. A new mechanism called volume mount points allows administrators to mount a file system to an NTFS directory instead of (or as well as) to a drive letter.
Volume mount points are directories that point to specified volumes in a persistent manner. The directory that hosts the mount point must be on an NTFS volume because the underlying mechanism uses NTFS reparse points. However, the file system that is being mounted can be FAT, FAT32, NTFS, CDFS, or UDF.
Mount points allow users or system administrators to add storage space without disrupting the name space. This helps to simplify management activities related to storage.
An example of mount point usage is when a portable computer user creates two physical drives on the portable device: one for the operating system and personal use and a second for storing work-related data. Since most personal productivity tools are set to open or save work at a common directory such as C:\My Documents, it would be disruptive to have to change drives, depending on whether personal or work-related data was being used. An NTFS directory junction that resolves to a volume mount point, therefore, could be placed on a C:\My Documents\Work directory so that this directory and all subdirectories would use physical disk space on Drive 2. Changing directories to C:\My Documents\Personal would physically place the user on drive 1.
Figure 3. Volume mount points
The example above is based on a simplistic use of NTFS volume mount points in an end-user computer environment. Though this example is a valid use of an NTFS volume mount point, the real power of NTFS volume mount points is seen in enterprise server environments where volumes can be added on a monthly basis to allow dynamic data storage growth.
This mechanism also allows administrators to build drive letters with a rich variety of storage classes. For example, a D: drive could be stored on a RAID-5 volume, with a D:\TEMP directory (actually a volume mount point) that is striped for performance and a D:\DB directory that is mirrored for maximum availability. Volume mount points are sticky in that the Windows 2000 operating system automatically prevents resolution problems caused by changes in the internal device name of the target volume (for example, changes due to hardware device reconfiguration activities). This means that a mount point identifies the target volume itself, in the same way that a drive letter does.
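The D: drive example can be sketched as a lookup that finds the deepest mount point above a given path. This is only a conceptual model: the mount table, volume descriptions, and function name are hypothetical, and real resolution is performed by NTFS and the I/O subsystem through reparse points rather than by string matching.

```python
from pathlib import PureWindowsPath

# Hypothetical mount table modelled on the D: drive example above.
MOUNT_TABLE = {
    PureWindowsPath("D:/"): "RAID-5 volume",
    PureWindowsPath("D:/TEMP"): "striped volume",
    PureWindowsPath("D:/DB"): "mirrored volume",
}


def resolve_volume(path, mount_table=MOUNT_TABLE):
    """Return the volume that backs a path: the deepest mount point above it."""
    path = PureWindowsPath(path)
    # The path itself followed by its ancestors, from deepest to shallowest.
    candidates = [path, *path.parents]
    for candidate in candidates:
        if candidate in mount_table:
            return mount_table[candidate]
    raise LookupError(f"no volume mounted for {path}")


print(resolve_volume(r"D:\TEMP\scratch.tmp"))
print(resolve_volume(r"D:\Reports\q3.doc"))
```

Users and applications see a single D: name space, while each subtree is served by the storage class chosen for it.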
File compression has not changed in the Windows 2000 operating system. File compression is mutually exclusive, however, with file system encryption. (See the discussion of Encrypting File System below for more details.)
File or directory level encryption is implemented in NTFS for enhanced security in NTFS volumes. Today, NTFS provides C2-compliant security for files and directories on NTFS volumes. However, it is possible to remove a physical storage volume and mount it on another system. Once mounted by another system, the administrator of this system can take ownership of all data, effectively bypassing NTFS security. Portable computers are particularly vulnerable to this scenario. In computers running Windows 2000, Encrypting File System (EFS) stores actual data in encrypted form, thus providing security in cases where the storage media is removed from a Windows-based system.
Encryption keys are implemented on a per domain-user basis. A recovery mechanism exists so that certain nominated accounts (administrator by default, but this may be changed) have the ability to recover encrypted data in case of a forgotten password, employment termination, or security check. This recovery feature is achieved without using a key escrow scheme: the recovery agent does not actually have access to the user's private encryption keys. A 128-bit version of DESX is the current cryptography scheme that is implemented. (A 40-bit scheme is implemented for international configurations.) A flexible architecture, however, allows for the current DESX-based implementation to change in future versions of the Windows operating system.
For detailed information regarding EFS, see the white paper entitled "Encrypting File System for Windows 2000," located with other Windows 2000 white papers on the Windows 2000 Server Web site.
The change journal software is available only with the Windows 2000 operating system. It conveys significant scalability benefits to applications that would otherwise need to scan an entire volume for changes, such as backup tools, indexing services, and virus scanning applications.
The change journal describes the nature of any changes to files on the volume. When any file or directory is created, modified, or deleted, NTFS guarantees that a record will be added to the change journal for that volume.
Applications that would normally need to rescan an entire volume to determine changes can now do the scan once, and subsequently learn of changes from this journal. The I/O cost of these applications now depends on how many files have changed, not on how many files exist on the volume.
Each record in the change journal takes approximately 80 to 100 bytes of space, but the change journal has a configurable maximum size that it will never exceed on disk. When this size is reached, a proportion of the oldest records are discarded.
The APIs are fully documented and can be used by ISVs. The change journal supports any services in the Windows 2000 operating system that track changes to a volume, such as the Indexing Service. ISVs can use this feature to enhance a range of products including backup, virus protection, and auditing tools.
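The bounded journal behavior described above can be illustrated with a small model. The following Python sketch is a toy, in-memory illustration only; it is not the NTFS on-disk format or the Windows change journal API. The class and method names, the 100-byte record estimate, and the fraction of records trimmed at the cap are all assumptions made for the example.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeRecord:
    usn: int     # increasing sequence number (illustrative)
    path: str
    reason: str  # "create", "modify", or "delete"


class ToyChangeJournal:
    """In-memory model of a size-capped change journal (not the NTFS format)."""

    RECORD_BYTES = 100  # assumed per-record cost, upper end of the 80 to 100 byte estimate

    def __init__(self, max_bytes, trim_fraction=0.25):
        self.max_records = max(1, max_bytes // self.RECORD_BYTES)
        self.trim_count = max(1, int(self.max_records * trim_fraction))
        self.records = deque()
        self.next_usn = 1

    def append(self, path, reason):
        # When the cap is reached, discard a batch of the oldest records first.
        if len(self.records) >= self.max_records:
            for _ in range(self.trim_count):
                self.records.popleft()
        usn = self.next_usn
        self.records.append(ChangeRecord(usn, path, reason))
        self.next_usn += 1
        return usn

    def changes_since(self, last_seen_usn):
        """Return (records, complete).

        complete is False when records the caller has not yet seen were
        discarded, in which case the caller must fall back to a full scan.
        """
        oldest = self.records[0].usn if self.records else self.next_usn
        complete = last_seen_usn + 1 >= oldest
        newer = [r for r in self.records if r.usn > last_seen_usn]
        return newer, complete
```

An application such as a backup tool would remember the last sequence number it processed and call changes_since on its next run, so the work done is proportional to the number of changes rather than to the number of files on the volume.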
Disk quotas provide system administrators with a powerful tool for managing storage growth in distributed environments.
Disk quotas allow administrators to track or enforce the amount of disk space occupied by users on a given volume. User context can be defined as a domain user or a local user.
When disk quotas are being tracked, NTFS builds additional metadata in the NTFS volume which tracks disk usage on a per-volume basis. The tracking is done deep in the NTFS system so the effect on performance is kept to a minimum.
When disk quotas are being enforced, the quota limits the amount of free space reported to a user. For example, if a volume has 100 GB of free space, the quota is configured so that only 10 GB of space is available to a given user, and that user already has 2 GB of data on the volume, then the system will report to that user context that only 8 GB is available on that volume.
By comparison, if the quotas were only being tracked (and not enforced), then the user would see the full 100 GB of free space. However, when the user exceeds the quota, an administrative event would be logged in the system event log for the Administrator's attention.
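The arithmetic in the example above can be written out as a short sketch. The Python function below is purely illustrative and is not part of any Windows interface; the function name, its parameters, and the rule that the reported figure never exceeds the real free space on the volume are assumptions.

```python
GB = 1024 ** 3


def reported_free_bytes(volume_free, quota_limit, user_used, enforced):
    """Free space shown to one user on one volume (illustrative model).

    volume_free : bytes actually free on the volume
    quota_limit : the user's quota limit in bytes
    user_used   : bytes already charged to the user
    enforced    : True if quotas are enforced, False if only tracked
    """
    if not enforced:
        # Tracking only: the user sees the real free space on the volume.
        return volume_free
    remaining = max(0, quota_limit - user_used)
    # Assumption: the user can never be shown more than is really free.
    return min(volume_free, remaining)


# The scenario from the text: 100 GB free, 10 GB limit, 2 GB already used.
enforced_view = reported_free_bytes(100 * GB, 10 * GB, 2 * GB, enforced=True)
tracked_view = reported_free_bytes(100 * GB, 10 * GB, 2 * GB, enforced=False)
```

With these inputs, enforced_view is 8 GB and tracked_view is the full 100 GB, matching the two cases described above.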
Note that at any point, an administrator may review quota usage on a volume using the Quota Entries tool. This is accessed from the property page for the appropriate NTFS drive.
Enforced disk quotas are otherwise transparent to the user. This design provides two benefits:
Users will have less cause to ask system administrators for more disk space if they do not know how much free space really exists.
Applications will be aware of how much space is available to the current user on a volume, and any existing application rules on how much temporary file space to use will work correctly.
The disk quota facility of the Windows 2000 operating system includes storage usage in facilities such as Remote Storage and sparse files in its quota calculations. Quota tracking is protected by the NTFS metadata transaction design. This ensures that disk quotas are a reliable mechanism for controlling storage on Windows-based systems. Also, since the quota information is stored inside the NTFS volume's data structures, quotas are naturally compatible with enterprise configurations such as Windows Clustering and Storage Area Networks (SANs).
The disk quota options included in the Windows 2000 operating system provide reports or enforcement on a per-user basis. In addition, the NTFS quota system exposes both COM and Windows Script Host interfaces. This allows developers to configure quotas and extract quota usage information. Such information can be used to build scripts or applications that report per-group quotas or cross-volume quotas; it can also be used for tracking and chargeback purposes.
Finally, the Windows 2000 File System Filter architecture allows ISVs to build additional quota management tools that provide more features than the built-in quota system, so that solutions can be made available for servers that require active quota enforcement on a per group, per share, or even cross-volume basis.
A sparse file is a file with one or more regions of unallocated data in it. An application will see these unallocated regions as containing bytes with the value zero, but there is actually no disk space used to represent these zeros. In other words, all meaningful or nonzero data is allocated, whereas all nonmeaningful data (large strings of data composed of 0s) is not allocated. When a sparse file is read, allocated data is returned as stored and nonallocated data is returned, by default, as 0s, in accordance with the C2 security requirement specification. Sparse file support allows data to be deallocated from anywhere in the file.
An example of sparse file use is a scientific application that might require 1 terabyte of storage for data used in a matrix. Actual meaningful data in the matrix may only account for 1 MB. With the sparse file attribute set, the file system can deallocate 0-filled space from anywhere in the file. If a calling application requests this data, the file system identifies the 0 data by range, instead of storing and returning actual data. The result is that file access requests are satisfied with the correct bits, and disk space is managed efficiently. The example is illustrated in Figure 4.
Figure 4. Sparse file storage allocation comparison
Sparse files are also useful for many other purposes. For example, both Remote Storage and the NTFS change journal use sparse files.
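The idea of returning zeros for ranges that were never allocated can be shown with a small model. The following Python sketch is a toy, in-memory illustration, not the NTFS implementation or the Windows sparse file API; the class name, its methods, and the 4096-byte allocation unit are assumptions chosen for the example.

```python
class ToySparseFile:
    """In-memory model: only blocks that have been written occupy space."""

    BLOCK = 4096  # assumed allocation unit

    def __init__(self, logical_size):
        self.logical_size = logical_size
        self.blocks = {}  # block index -> bytearray of BLOCK bytes

    def write(self, offset, data):
        if offset < 0 or offset + len(data) > self.logical_size:
            raise ValueError("write outside logical file size")
        pos = 0
        while pos < len(data):
            index, start = divmod(offset + pos, self.BLOCK)
            chunk = data[pos:pos + self.BLOCK - start]
            # Allocate the block only when something is written into it.
            block = self.blocks.setdefault(index, bytearray(self.BLOCK))
            block[start:start + len(chunk)] = chunk
            pos += len(chunk)

    def read(self, offset, length):
        end = min(offset + length, self.logical_size)
        out = bytearray()
        pos = offset
        while pos < end:
            index, start = divmod(pos, self.BLOCK)
            take = min(self.BLOCK - start, end - pos)
            block = self.blocks.get(index)
            # Unallocated ranges read back as zeros without being stored.
            out += block[start:start + take] if block is not None else bytes(take)
            pos += take
        return bytes(out)

    def allocated_bytes(self):
        return len(self.blocks) * self.BLOCK


# A file with a 1-terabyte logical size that holds a few meaningful bytes.
matrix = ToySparseFile(1024 ** 4)
matrix.write(5 * 1024 ** 3, b"matrix")
```

In this model the file reports a logical size of 1 terabyte while allocated_bytes returns a single block. The sketch does not model deallocating a range that was previously written.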
The NTFS Chkdsk utility has been rewritten for the Windows 2000 operating system to support the new features, provide more appropriate progress information, and also to provide significantly faster performance. The exact performance gains will depend on the size of the volume and the number of files, but on volumes with millions of files, performance gains of up to 10 times are possible.
In addition, the Chkdsk utility gains two new options to disable parts of the Chkdsk process: /I reduces index checking, and /C removes cycle checking. These are non-essential, but time-consuming stages that an administrator may wish to disable, and re-enable later.
Storage Management tools in Windows 2000
Several storage management tools and services are included in the Windows 2000 operating system. These cost-effective technologies are designed to reduce the costs associated with managing storage and writing storage applications. Some of the tools are enhanced versions of Windows NT-based tools, while others are new.
The following is a list of the storage-related tools and services included in the Windows 2000 operating system:
Remote Storage-This is a storage management service designed to lower storage costs by trading latency for media cost. Based on established criteria, data can automatically be migrated from local volumes to directly attached jukeboxes of tape-based storage libraries. Reparse points specifically related to remote storage remain on the primary storage so that migrated files can be recalled from secondary storage devices.
Removable Storage-This core I/O service in the Windows 2000 operating system manages removable storage media, bulk storage libraries, and storage jukeboxes. Removable Storage provides access to storage devices through a single set of APIs. Removable Storage eliminates the need for ISVs to support bulk media devices on a per device basis. Removable Storage enables multiple applications to share a bulk storage device. Removable Storage abstracts bulk media, so that storage application developers can concentrate on customer-related features, rather than hardware issues.
Backup-The Windows 2000 Server Backup tool is integrated with the core Windows 2000 Server distributed services. It provides support for Windows 2000 services such as the Active Directory directory service, encrypted files, and sparse files. Using this tool, backups can be made to tape libraries (using Removable Storage), or to a backup file on a disk.
Indexing Service-Indexing Service is a base service in the Windows 2000 operating system. Indexing Service tracks file system objects across volumes, computers, and Web sites so that Windows 2000-based file systems can become rich data stores for Internet and intranet searches.
Two-way File System Replication-The Windows 2000 Server operating system allows any file system object and/or directory attribute to be duplicated on another server in a consistent fashion. File system replication provides a powerful mechanism for creating several replicas of data in the Windows 2000 distributed file system (Dfs) that are kept in a synchronized state.
Distributed File System-The Distributed File System (Dfs) is a network server component of the Windows 2000 Server operating system that presents a logical view of distributed physical storage. Dfs allows distributed file systems to be united into a single name space. Its usage provides higher data availability, load balancing, name transparency, and flexible volume administration.
Offline Files-This caching capability allows network users to access files on network shares, even if the user is disconnected from the network. The user can still see the shares, and the files and directories that have been cached on the client. If changes are made while disconnected, the changes will be transferred to the server when the system reconnects. If the file changed on the server, the client who is reconnected can resolve the conflict by choosing which version to keep.
The following sections describe each of the storage-related tools and services included in the Windows 2000 operating system in detail.
Remote Storage is a storage management service that transparently migrates data between local disk storage and slower, but less expensive, tape libraries. This service is a Hierarchical Storage Management (HSM) system, because it contains two defined levels of storage: local storage and remote storage. Local storage refers to the NTFS disk volumes on the computer running Remote Storage on Windows 2000 Server. The remote storage level refers to the tape library or stand-alone tape drive that is typically connected to the computer running Windows 2000 Server by a SCSI-2 or SCSI-3 cable connector. Remote Storage reduces the overall cost of storage by ensuring that infrequently used data is automatically archived to tape, and requires little or no administrator intervention to retrieve the data. Other benefits of Remote Storage are listed here:
Users keep their data on file servers, and are not tempted to copy excess data to local disks or backup media where it might be lost.
The location of the files (disk or tape) is transparent to the user, so that migrated files are still shown in folder listings and can be accessed by the appropriate application.
Hard disk storage is not typically used for infrequently accessed data.
The underlying tape library is provided by the Removable Storage subsystem (described later in this paper) so that Remote Storage can share a tape library with another storage management feature, such as a backup utility.
The Windows 2000 Server operating system not only includes an HSM service, but computers running Windows 2000 Professional are also HSM-aware. Many elements of the storage, application, and GUI layers are aware of HSM and work together to ensure that HSM is fully integrated into the user experience. This integration includes (but is not limited to) the following scenarios:
The Windows 2000 graphical user interface (GUI) indicates that access to a remotely stored file incurs retrieval latency by applying a clock icon to such files.
When files are retrieved from Remote Storage, the user is notified that the file is being retrieved.
Network timeouts are automatically extended for files that are stored on tape.
Backup is aware of Remote Storage files and will not de-migrate files just to back them up. (Remote Storage has an additional mechanism for duplicating its own tapes.)
The NTFS change journal includes flags to indicate when a file shifts between storage tiers. In this way, Indexing Service is aware of Remote Storage files and understands that a migration between disk and tape does not require the file to be re-indexed.
Remote Storage can detect a rogue application that is de-migrating the entire jukebox, and can terminate the application if the configured runaway recall limit is encountered.
Reparse points ensure that no extra code is invoked for non-migrated files in the I/O subsystem.
Figure 5. Remote Storage process
Remote Storage, illustrated in Figure 5, is a two-tiered remote storage solution. Third-party software developers are able to develop multi-tiered remote storage solutions that make use of the new system infrastructure such as the high latency file attribute.
The Windows 2000 operating system provides an efficient and user-friendly platform for hierarchical storage management, both by running Remote Storage on the Windows 2000-based computer and by supporting the development of ISV products.
Removable Storage provides a single set of APIs that allow applications to catalog all removable media (except floppy disks and similar small-capacity media), whether housed on shelves (offline) or in robotic libraries (online). In disguising the complexities of underlying robotic library systems and bulk media libraries, Removable Storage, illustrated in Figure 6, lowers development costs of storage applications and provides consistency to customers who purchase these applications.
Figure 6. Removable Storage architecture
Removable Storage uses media pools to organize media. Media pools have several functions in the management of a media server. They control the selection of media and media type, allow media to be shared across applications, and allow such sharing to be tracked. (See Figure 7.)
There are two classes of media pools: system and application. The system media pools are categorized as free, import, and unrecognized. A free pool holds media that contains no useful data and is freely available to any application. The import and unrecognized pools are holding places for media newly added to the system. If Removable Storage recognizes the format of a new piece of media, it adds it to the import pool. If Removable Storage does not recognize the format, the media is added to the unrecognized pool. Applications then move media from these pools into either the free pool or the second class of pools: application pools. Application pools are pools that applications create to hold media for their own use. For example, Backup, a backup application that is included in the Windows 2000 operating system, creates its own application pool for backup media. Remote Storage creates an application pool in the same manner.
Figure 7. System and application media pools
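The movement of media between pools described above can be modeled in a few lines. The Python sketch below is a simplified illustration and does not use the Removable Storage APIs; the class, its methods, the format names, and the allocate step (an application drawing media from the free pool) are assumptions made for the example.

```python
from enum import Enum


class SystemPool(Enum):
    FREE = "free"
    IMPORT = "import"
    UNRECOGNIZED = "unrecognized"


class ToyMediaLibrary:
    """Tracks which pool each piece of media belongs to (illustrative only)."""

    def __init__(self, known_formats):
        self.known_formats = set(known_formats)
        self.pool_of = {}  # media id -> SystemPool, or an application pool name

    def insert_media(self, media_id, media_format):
        # New media lands in a holding pool based on whether its format is recognized.
        if media_format in self.known_formats:
            pool = SystemPool.IMPORT
        else:
            pool = SystemPool.UNRECOGNIZED
        self.pool_of[media_id] = pool
        return pool

    def move(self, media_id, target):
        # Media leaves a holding pool for the free pool or an application pool.
        if target is not SystemPool.FREE and not isinstance(target, str):
            raise ValueError("target must be the free pool or an application pool name")
        self.pool_of[media_id] = target

    def allocate(self, app_pool):
        # Assumed step: an application takes any media from the free pool.
        for media_id, pool in self.pool_of.items():
            if pool is SystemPool.FREE:
                self.pool_of[media_id] = app_pool
                return media_id
        return None


library = ToyMediaLibrary(known_formats={"format-A"})  # placeholder format name
first = library.insert_media("tape1", "format-A")
second = library.insert_media("tape2", "format-B")
library.move("tape2", SystemPool.FREE)
taken = library.allocate("Backup")
```

Here tape1 waits in the import pool, while tape2 starts in the unrecognized pool, is released to the free pool, and is then claimed by an application pool named Backup.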
The Indexing Service is part of the core Windows 2000 operating system. Indexing Service uses a document filter to extract the content and properties of files across volumes, computers, and the Web, and stores the information in an index. Users can then easily and efficiently search the Windows file systems, or the Web, whether they are using the Windows 2000 Search function, the Indexing Service query form, or a Web browser.
Indexing Service is an integral part of the Windows 2000 operating system, particularly when used with NTFS:
When indexing NTFS volumes, the Indexing Service uses the change journal to provide efficient index updates; there is no need to scan an entire volume's directory information to learn if each file has changed.
On NTFS volumes, the Indexing Service works with Windows Explorer to generate thumbnails of image files, and store the thumbnails in an NTFS stream on the file. If Remote Storage migrates the files to tape, then the thumbnails will not be migrated, and so the user is able to view these thumbnails without need for a recall.
On NTFS volumes, if a user has no access to read a file, Indexing Service will not return that file in the results list to the user. Nor will Indexing Service indicate that a match was found in a file that the user cannot access. Indexing Service does not compromise information security.
Indexing Service is integrated with the Windows 2000 Search function, so it is easy to use.
Indexing Service is available and installed by default on all versions of the Windows 2000 operating system. However, since Indexing Service can require noticeable levels of I/O to build the initial index, it is disabled by default. Users and system administrators should enable Indexing Service as soon as they have the opportunity to do so.
The Windows 2000 Backup utility, an enhanced version of NTBackup, is included in the Windows 2000 operating system. Backup provides support for Windows 2000 features, such as Active Directory, encrypted files, and sparse files. Encrypted files are stored on the backup media in their encrypted format. The Windows 2000 operating system also provides ways for users to back up their encryption keys separately. Users can now back up to tape libraries using Removable Storage, or to a backup file on a disk. Backup also features a new user interface, and several wizards that make backing up and restoring data easier.
The Distributed File System (Dfs) is a network server component that presents a logical view of distributed physical storage. Dfs allows distributed file systems to be united into a single name space. Its usage provides higher data availability, load balancing, name transparency, and flexible volume administration.
The Dfs server component presents distributed volumes in a logical manner by mapping physical UNC names to logical paths. Thus, \\MS_Server\Public\Users\Bob as a logical path could actually have the following physical topology underneath:
Dfs logical path | Physical location | Explanation
\\MS_Server\Public | \\MS_Server\Public | Root share
\\MS_Server\Public\Users | \\MS_Users1\Employees | Junction to employee directories
\\MS_Server\Public\Users\Bob | \\Bob_Wkst\_Bob_Public | Junction to Bob's computer
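The mapping in the table above can be read as a longest-prefix lookup: the deepest junction that matches the start of a logical path determines the physical location. The Python sketch below illustrates that reading only; it is not the Dfs implementation, and the function name, the case-insensitive comparison, and the sample file names are assumptions.

```python
# Junctions from the table above: logical path -> physical location.
JUNCTIONS = {
    r"\\MS_Server\Public": r"\\MS_Server\Public",
    r"\\MS_Server\Public\Users": r"\\MS_Users1\Employees",
    r"\\MS_Server\Public\Users\Bob": r"\\Bob_Wkst\_Bob_Public",
}


def resolve(logical_path, junctions=JUNCTIONS):
    """Map a logical path to a physical UNC path by longest matching prefix."""
    lookup = {key.lower(): value for key, value in junctions.items()}
    parts = logical_path.rstrip("\\").split("\\")
    # Try the longest prefix first so the deepest junction wins.
    for end in range(len(parts), 0, -1):
        prefix = "\\".join(parts[:end]).lower()
        if prefix in lookup:
            remainder = parts[end:]
            return "\\".join([lookup[prefix], *remainder])
    raise LookupError(f"no Dfs root matches {logical_path!r}")


bob_file = resolve(r"\\MS_Server\Public\Users\Bob\report.doc")
alice_dir = resolve(r"\\MS_Server\Public\Users\Alice")
```

In this sketch, bob_file resolves to a path on \\Bob_Wkst\_Bob_Public, while alice_dir falls through to the employee directories on \\MS_Users1\Employees because no deeper junction exists for that user.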
For more information on Dfs, see the "Distributed File System: A Logical View of Physical Storage" white paper on the Windows NT Server Web site.
Full two-way file replication for NTFS is introduced in the Windows 2000 operating system. File replication provides a mechanism for duplicating any file system object and/or directory attribute to another server in a loose but consistent fashion. The Windows 2000 File Replication Service (FRS) is based on the NTFS change journal.
Although Active Directory uses file replication for internal needs, the true power of FRS in enterprise storage lies in the distributed file system. Dfs allows up to 32 alternate volumes (physical locations) to be configured for a single logical share point. Dfs alternate volumes provide both scalability and fault tolerance for enterprise data. The File Replication service ensures that updatable data on these alternate volumes is kept consistent across physical locations by automatically replicating files when they are modified and closed. For more information, see "Distributed File System: A Logical View of Physical Storage," on the Windows NT Server site.
Offline Files is an IntelliMirror management technology, included in the Windows 2000 operating system, that allows network users to access files on network shares, even when the client computer is disconnected from the network. When a mobile user views the share while disconnected, he or she can still browse, read, and edit files, because they have been cached on the client computer. When the user later connects to the server, the system reconciles the changes with the server. This process is highly configurable: synchronization policy and behavior can be defined based on the time of day and network connection type, using Synchronization Manager. For example, synchronization might happen when the user logs on to a direct LAN connection, but only at the user's request when using a dial-in connection.
Offline Files also provides performance advantages for networks: While connected to the network, clients can still read files from the local cache, reducing the amount of data transferred over the network.
Offline Files solves a dilemma facing most enterprise organizations today. Many organizations implement a backup policy that requires all user data to be stored on managed servers. Data stored on local disks is often not backed up by the organization's IT group. This becomes a problem for mobile users of portable computers. In order to access data when offline, they need some mechanism to replicate data between the portable computer and the servers. Some organizations use the Windows Briefcase; others use batch files, or even manual procedures. In the Windows 2000 operating system, replication between client and server is managed automatically. Files may be accessed while offline, and are automatically synchronized with the managed server.
The Windows 2000 operating system provides administrators with three options to control the type of caching behavior available to clients:
Manual Caching for Documents-This option provides offline access to only those files that someone using your shared folder specifically (manually) identifies. This caching option is ideal for a shared network folder containing files that are to be accessed and modified by several people. This is the default option when you set up a shared folder to be used offline.
Automatic Caching for Documents-This option makes every file that someone opens from your shared folder available to them offline. However, this setting does not make every file in your shared folder available to them offline, only those files that are opened. Files that are not opened are not available offline.
Automatic Caching for Programs-This option provides offline access to shared folders containing files that are not to be changed. This caching option is ideal for making files (.dll and .exe) available offline that are read, referenced, or run, but that are not changed in the process. Automatic Caching for Programs reduces network traffic because offline files are opened directly, without accessing the network versions in any way, and generally start and run faster than the network versions.
Conclusion
Rapidly evolving technologies and business needs are placing more demand than ever on distributed systems. As the demand for data and services provided by distributed systems escalates, storage availability and manageability become increasingly important. The Windows 2000 operating system offers significant enhancements in its storage subsystem architecture. These enhancements provide a foundation for storage-based features, applications, and services designed to help manage storage more effectively and efficiently at the enterprise level. Better storage management results in better control, easier growth and expansion, increased data availability, and improved recoverability. In addition, the Windows 2000 operating system includes significant features and tools to assist in this process. Microsoft is also working with independent software and hardware vendors to provide a wide range of value-added storage applications and options that address the special needs of customers in a variety of computing environments.
For the latest information on Windows 2000 Server, visit the Microsoft Windows 2000 Server Web site or the Windows 2000 Server Forum on the Microsoft Network (GO WORD: MSNTS).