This note should be read together with the Windows File System tutorial that follows. File management in general is covered in the File Management ppt slide.
Windows File Systems
File System Components
Volume
Directory
File
A file is a logical grouping of related data. A file is an entity of data in the file system that a user can access and manage.
A file must have a unique name in its directory. It consists of one or more streams of bytes that hold a set of related data, plus a set of attributes (also called properties) that describe the file or the data within the file. The creation time of a file is an example of a file attribute.
When a file is created, one unnamed default stream is created to store all data written to the file while it is open.
You can also create additional streams within the file. These additional streams are referred to as alternate streams.
File attributes are not stored in the data streams with the file data, but are stored elsewhere and managed by the operating system.
All file system data, including the system bootstrap code and directories, are stored by NTFS in files.
Other file systems store this information in disk regions external to the file system. An advantage of storing this information in files is that Windows can locate, access, and maintain the information easily.
Other advantages are that each of these files can be protected by a security descriptor and, in the case of partial disk corruption, they can be quickly relocated to a safer part of the disk.
The fundamental storage unit of all supported file systems is a cluster, which is a group of sectors.
This allows the file system to optimize the administration of disk data independently of the disk sector size set by the hardware disk controller.
If the disk to be administered is large and large amounts of data are moved and organized in a single operation, the administrator can adjust the cluster size to accommodate this.
Windows manages files through file objects, file handles, and file pointers.
Storage devices and partitions are not part of the file system, but are the required physical foundation for the logical file system components.
The following file systems are supported by Windows:
NTFS
FAT32
FAT16 and FAT12
UDF
The Universal Disk Format (UDF) is defined by the Optical Storage Technology Association (OSTA).
It is designed to replace CDFS and add support for DVD-ROM. UDF is included in the DVD specification and is more flexible than CDFS. Support for UDF was introduced in Windows 2000.
The file system implementation is compliant with ISO 13346 and supports UDF versions 1.02 and 1.5. The UDF file system has the following traits:
File names can be 255 characters long.
File names can be uppercase and lowercase.
The maximum path length is 1023 characters.
For Windows 2000, the UDF implementation was read-only. Starting with Windows XP, there is read and write support.
CDFS
The CD-ROM file system (CDFS) is a relatively simple format that was defined in 1988 as the read-only formatting standard for CD-ROM media.
Support for CDFS was introduced in Windows NT 4.0.
The Windows implementation includes long file name support defined by Level 2 of the ISO 9660 standard. Because of its simplicity, the CDFS format has the following restrictions:
Directory and file names must be fewer than 32 characters long.
Directory trees can be no more than eight levels deep.
CDFS is considered a legacy format because the industry has adopted the Universal Disk Format (UDF) as the standard for read-only media.
The following table lists three main features of the NTFS, FAT32, and FAT16 file systems and how they differ.
Feature | NTFS | FAT32 | FAT16 |
MS-DOS compatibility | No | No | Yes |
Disk quotas | Yes | No | No |
File compression | Yes | No | No |
The following table lists some of the limits imposed by the NTFS, FAT32 and FAT16 file systems, and how they differ. Note that these are theoretical, not tested, limits.
Limit | NTFS | FAT32 | FAT16 |
File size | 2^64 - 1 bytes | 2^32 - 1 bytes | 2^32 - 1 bytes |
Minimum cluster size | 512 bytes | 512 bytes | 512 bytes |
Maximum cluster size | 64 KB | 64 KB | 64 KB |
Minimum volume size | 1 MB | 2 GB | 2,091,520 bytes |
Maximum volume size | 2^32 allocation units | 4,177,198 clusters | 4 GB (Windows Me/98/95: 2 GB) |
Files per volume | 2^32 - 1 | 2^28 | 2^16 |
Files or directories per directory | Unlimited | 2^16 - 2 | 2^16 - 2 |
The following are limited only by the amount of available memory on all file systems:
The maximum amount of storage space.
The maximum number of disk drives per server.
The maximum number of open local files.
The maximum number of simultaneous file locks.
The file I/O functions enable applications to access files regardless of the underlying file system. However, capabilities may vary depending on the file system and/or operating system in use.
For example, the CreateFile() function includes a security parameter that provides no security benefits for files not residing on an NTFS volume.
The first time a file I/O function accesses a volume and whenever a diskette is placed in a floppy-disk drive, the operating system examines the volume to determine its file system.
Thereafter, the operating system manages all I/O to that volume through the device driver supporting the file system.
Because files are securable objects, access to them is regulated by the access-control model that governs access to all other securable objects in Windows.
You can specify a security descriptor for a file or directory when you call the CreateFile(), CreateDirectory(), or CreateDirectoryEx() function.
If you specify NULL for the lpSecurityAttributes parameter, the file or directory gets a default security descriptor.
The Access Control Lists (ACLs) in the default security descriptor for a file or directory are inherited from its parent directory. Note that a default security descriptor is assigned only when a file or directory is newly created, and not when it is renamed or moved.
To retrieve the security descriptor of a file or directory object, call the GetNamedSecurityInfo() or GetSecurityInfo() function. To change the security descriptor of a file or directory object, call the SetNamedSecurityInfo() or SetSecurityInfo() function.
The valid access rights for files and directories include the DELETE, READ_CONTROL, WRITE_DAC, and WRITE_OWNER standard access rights. The following table lists the access rights that are specific to files and directories.
Access right | Description |
FILE_ADD_FILE | For a directory, the right to create a file in the directory. |
FILE_ADD_SUBDIRECTORY | For a directory, the right to create a subdirectory. |
FILE_ALL_ACCESS | All possible access rights for a file. |
FILE_APPEND_DATA | For a file object, the right to append data to the file. For a directory object, the right to create a subdirectory. |
FILE_CREATE_PIPE_INSTANCE | For a named pipe, the right to create a pipe. |
FILE_DELETE_CHILD | For a directory, the right to delete a directory and all the files it contains, including read-only files. |
FILE_EXECUTE | For a native code file, the right to execute the file. This access right given to scripts may cause the script to be executable, depending on the script interpreter. |
FILE_LIST_DIRECTORY | For a directory, the right to list the contents of the directory. |
FILE_READ_ATTRIBUTES | The right to read file attributes. |
FILE_READ_DATA | For a file object, the right to read the corresponding file data. For a directory object, the right to read the corresponding directory data. |
FILE_READ_EA | The right to read extended file attributes. |
FILE_TRAVERSE | For a directory, the right to traverse the directory. By default, users are assigned the BYPASS_TRAVERSE_CHECKING privilege, which ignores the FILE_TRAVERSE access right. See the remarks later in this section for more information. |
FILE_WRITE_ATTRIBUTES | The right to write file attributes. |
FILE_WRITE_DATA | For a file object, the right to write data to the file. For a directory object, the right to create a file in the directory. |
FILE_WRITE_EA | The right to write extended file attributes. |
STANDARD_RIGHTS_READ | Includes READ_CONTROL, which is the right to read the information in the file or directory object's security descriptor. This does not include the information in the SACL. |
STANDARD_RIGHTS_WRITE | Includes WRITE_CONTROL, which is the right to write to the directory object's security descriptor. This does not include the information in the SACL. |
SYNCHRONIZE | The right to specify a file handle in one of the wait functions. However, for asynchronous file I/O operations, you should wait on the event handle in an OVERLAPPED structure rather than using the file handle for synchronization. |
The following are the generic access rights for files and directories.
Access right | Description |
GENERIC_EXECUTE | FILE_READ_ATTRIBUTES STANDARD_RIGHTS_EXECUTE SYNCHRONIZE |
GENERIC_READ | FILE_READ_ATTRIBUTES FILE_READ_DATA FILE_READ_EA STANDARD_RIGHTS_READ SYNCHRONIZE |
GENERIC_WRITE | FILE_APPEND_DATA FILE_WRITE_ATTRIBUTES FILE_WRITE_DATA FILE_WRITE_EA STANDARD_RIGHTS_WRITE SYNCHRONIZE |
Windows compares the requested access rights and the information in the thread's access token with the information in the file or directory object's security descriptor.
If the comparison does not prohibit any of the requested access rights from being granted, a handle to the object is returned to the thread and the access rights are granted.
In the Win32 subsystem, authorization for access to a file or directory, by default, is controlled strictly by the ACLs in the security descriptor associated with that file or directory.
In particular, the security descriptor of a parent directory is not used to control access to any child file or directory. The FILE_TRAVERSE access right can be enforced by removing the BYPASS_TRAVERSE_CHECKING privilege from users. This is not recommended in the general case, as many programs do not correctly handle directory traversal errors.
The primary use for the FILE_TRAVERSE access right on directories is to enable conformance to certain IEEE and ISO POSIX standards when interoperability with Unix systems is a requirement.
The Windows security model provides a way for a child directory to inherit, or to be prevented from inheriting, one or more of the Access Control Entries (ACEs) in the parent directory's security descriptor.
Each ACE contains information that determines how it can be inherited, and whether it will have an effect on the inheriting directory object.
For example, some inherited ACEs control access to the inherited directory object, and these are called effective ACEs. All other ACEs are called inherit-only ACEs.
The Windows security model also enforces the automatic inheritance of ACEs to child objects according to the ACE inheritance rules. This automatic inheritance, along with the inheritance information in each ACE, determines how security restrictions are passed down the directory hierarchy.
Note that you cannot use an access-denied ACE to deny only GENERIC_READ or only GENERIC_WRITE access to a file. This is because, for file objects, the generic mappings for both GENERIC_READ and GENERIC_WRITE include the SYNCHRONIZE access right.
If an ACE denies GENERIC_WRITE access to a trustee, and the trustee requests GENERIC_READ access, the request will fail because the request implicitly includes SYNCHRONIZE access which is implicitly denied by the ACE, and vice versa.
Instead of using access-denied ACEs, use access-allowed ACEs to explicitly allow the permitted access rights.
Another means of managing access to storage objects is encryption. The implementation of file system encryption in Windows is the Encrypted File System, or EFS.
EFS encrypts files only, not directories. The advantage of encryption is that it provides additional protection that is applied to the data on the media, rather than through the standard Windows access control architecture.
In most cases, the ability to read and write the security settings of a file or directory object is restricted to kernel-mode processes. Clearly, you would not want any user process to be able to change the ownership or access restriction on your private file or directory.
A Backup Story
To back up your file, the backup application would use the following CreateFile() call syntax when opening your file to be read.
hFile = CreateFile(fileName,
READ_CONTROL,
0,
NULL,
OPEN_EXISTING,
FILE_FLAG_BACKUP_SEMANTICS,
NULL);
This will allow the backup application process to open your file and override the standard security checking. To restore your file, the backup application would use the following CreateFile() call syntax when opening your file to be written.
hFile = CreateFile(fileName,
WRITE_OWNER|WRITE_DAC,
0,
NULL,
CREATE_ALWAYS,
FILE_FLAG_BACKUP_SEMANTICS,
NULL);
There are situations when a backup application must be able to change the access control settings of a file or directory.
An example is when the access control settings of the disk-resident copy of a file or directory are different from those of the backup copy. This would happen if the settings were changed after the file or directory was backed up, or if the disk-resident copy was corrupted.
The FILE_FLAG_BACKUP_SEMANTICS flag specified in the call to CreateFile() gives the backup application process permission to read the access-control settings of the file or directory.
With this permission, the backup application process can then call GetKernelObjectSecurity() and SetKernelObjectSecurity() to read and then reset the access-control settings.
If a backup application must have access to the system-level access control settings, the ACCESS_SYSTEM_SECURITY flag must be specified in the dwDesiredAccess parameter value passed to CreateFile().
Backup applications call BackupRead() to read the files and directories specified for the restore operation, and BackupWrite() to write them.
Windows provides the ability to perform input and output (I/O) operations on storage components located on local and remote computers.
There are two types of file I/O synchronization: synchronous file I/O and asynchronous file I/O. Asynchronous file I/O is also referred to as overlapped I/O.
In synchronous file I/O, a thread starts an I/O operation and immediately enters a wait state until the I/O request has completed. A thread performing asynchronous file I/O sends an I/O request to the kernel.
If the request is accepted by the kernel, the thread continues processing another job until the kernel signals to the thread that the I/O operation is complete. It then interrupts its current job and processes the data from the I/O operation as necessary. These two synchronization types are illustrated in the following figure.
In situations where an I/O request is expected to take a large amount of time, such as a refresh or backup of a large database, asynchronous I/O is generally a good way to optimize processing efficiency.
However, for relatively fast I/O operations, the overhead of processing kernel I/O requests and kernel signals may make asynchronous I/O less beneficial, particularly if many fast I/O operations need to be made. In this case, synchronous I/O would be better.
A process opens a file for asynchronous I/O in its call to CreateFile() by specifying the FILE_FLAG_OVERLAPPED flag in the dwFlagsAndAttributes parameter.
If FILE_FLAG_OVERLAPPED is not specified, the file is opened for synchronous I/O. When the file has been opened for asynchronous I/O, a pointer to an OVERLAPPED structure is passed into the call to ReadFile() and WriteFile(). The structure is not passed in calls to ReadFile() and WriteFile() when performing synchronous I/O.
Handles to directory objects are obtained by calling CreateFile() with the FILE_FLAG_BACKUP_SEMANTICS flag (CreateDirectory() and CreateDirectoryEx() create directories but do not return handles). Directory handles are almost never used. Backup applications are one of the few applications that typically access them.
After opening the file object for asynchronous I/O by calling CreateFile(), an instance of the OVERLAPPED structure must be created and passed into each call to ReadFile() and WriteFile().
Keep the following in mind when using this structure in asynchronous read and write operations:
Do not de-allocate the OVERLAPPED structure or the data buffer until all asynchronous I/O operations to the file object have been completed. If it is de-allocated prematurely, ReadFile() or WriteFile() may incorrectly report that the I/O operation is complete.
If you declare your pointer to the OVERLAPPED structure as a local variable, do not exit the local function until all asynchronous I/O operations to the file object have been completed. If the local function is exited prematurely, the OVERLAPPED structure will go out of scope, leaving any ReadFile() or WriteFile() operations still in progress with a dangling pointer to it.
You can also create an event and put the handle in the OVERLAPPED structure; the wait functions can then be used to wait for the I/O operation to complete by waiting on the event handle.
An application can also wait on the file handle to synchronize the completion of an I/O operation, but doing so requires extreme caution. Each time an I/O operation is started, the operating system sets the file handle to the non-signaled state.
Each time an I/O operation is completed, the operating system sets the file handle to the signaled state.
Therefore, if an application starts two I/O operations and waits on the file handle, there is no way to determine which operation is finished when the handle is set to the signaled state.
If an application must perform multiple asynchronous I/O operations on a single file, it should wait on the event handle in the OVERLAPPED structure for each I/O operation, rather than on the file handle.
To cancel all pending asynchronous I/O operations, use the CancelIo() function. This function only cancels operations issued by the calling thread for the specified file handle.
The ReadFileEx() and WriteFileEx() functions enable an application to specify a routine to execute when the asynchronous I/O request is completed.
By default, Windows caches file data that is read from disks and written to disks. This implies that read operations read file data from an area in system memory known as the system file cache, rather than from the physical disk.
Correspondingly, write operations write file data to the system file cache rather than to the disk, and this type of cache is referred to as a write-back cache. Caching is managed per file object.
Caching occurs under the direction of the cache manager, which operates continuously while Windows is running.
File data in the system file cache is written to the disk at intervals determined by the operating system, and the memory previously used by that file data is freed. This is referred to as flushing the cache.
The policy of delaying the writing of the data to the file and holding it in the cache until the cache is flushed is called lazy writing, and it is triggered by the cache manager at a determinate time interval.
The time at which a block of file data is flushed is partially based on the amount of time it has been stored in the cache and the amount of time since the data was last accessed in a read operation.
This ensures that file data that is frequently read will stay accessible in the system file cache for the maximum amount of time.
This file data caching process is illustrated in the following figure.
As depicted by the solid arrows in the previous figure, a 256KB region of data is read into a 256KB cache "slot" in system address space when it is first requested by the cache manager during a file read operation.
A user-mode process then copies the data in this slot to its own address space. When the process has completed its data access, it writes the altered data back to the same slot in the system cache, as shown by the dotted arrow between the process address space and the system cache.
When the cache manager has determined that the data will no longer be needed for a certain amount of time, it writes the altered data back to the file on the disk, as shown by the dotted arrow between the system cache and the disk.
The amount of I/O performance improvement that file data caching offers depends on the size of the file data block being read or written. When large blocks of file data are read and written, it is more likely that disk reads and writes will be necessary to finish the I/O operation. I/O performance will be increasingly impaired as more of this kind of I/O operation occurs.
In these situations, caching can be turned off. This is done at the time the file is opened by passing FILE_FLAG_NO_BUFFERING as a value for the dwFlagsAndAttributes parameter of CreateFile().
When caching is disabled, all read and write operations directly access the physical disk. However, the file metadata may still be cached. To flush the metadata to disk, use the FlushFileBuffers() function.
The frequency at which flushing occurs is an important consideration that balances system performance with system reliability.
If the system flushes the cache too often, the large number of write operations that flushing incurs will degrade system performance significantly.
If the system is not flushed often enough, then the likelihood is greater that either system memory will be depleted by the cache, or a sudden system failure (such as a loss of power to the computer) will happen before the flush. In the latter instance, the cached data will be lost.
To ensure that the right amount of flushing occurs, the cache manager spawns a process every second called a lazy writer.
The lazy writer process queues one-eighth of the pages that have not been flushed recently to be written to disk. It constantly reevaluates the amount of data being flushed for optimal system performance, and if more data needs to be written it queues more data.
Lazy writers do not flush temporary files, because the assumption is that they will be deleted by the application or system.
Some applications, such as virus-checking software, require that their write operations be flushed to disk immediately; Windows provides this ability through write-through caching.
A process enables write-through caching for a specific I/O operation by passing the FILE_FLAG_WRITE_THROUGH flag into its call to CreateFile(). With write-through caching enabled, data is still written into the cache, but the cache manager writes the data immediately to disk rather than incurring a delay by using the lazy writer.
A process can also force a flush of a file it has opened by calling the FlushFileBuffers() function.
File system metadata is always cached. Therefore, to store any metadata changes to disk, the file must either be flushed or be opened with FILE_FLAG_WRITE_THROUGH.
Alertable I/O is the method by which application threads process asynchronous I/O requests only when they are in an alertable state. To understand when a thread is in an alertable state, consider the following scenario:
A thread initiates an asynchronous read request by calling ReadFileEx() with a pointer to a callback function.
The thread initiates an asynchronous write request by calling WriteFileEx() with a pointer to a callback function.
The thread calls a function that fetches a row of data from a remote database server.
In this scenario, the calls to ReadFileEx() and WriteFileEx() will most likely return before the function call in step 3.
When they do, the kernel places the pointers to the callback functions on the thread's Asynchronous Procedure Call (APC) queue. The kernel maintains this queue specifically to hold returned I/O request data until it can be processed by the corresponding thread.
When the row fetch is complete and the thread returns from the function, its highest priority is to process the returned I/O requests on the queue by calling the callback functions.
To do this, it must enter an alertable state. A thread can only do this by calling one of the following functions with the appropriate flags:
SleepEx()
WaitForSingleObjectEx()
WaitForMultipleObjectsEx()
MsgWaitForMultipleObjectsEx()
SignalObjectAndWait()
When the thread enters an alertable state, the following events occur:
The kernel checks the thread's APC queue. If the queue contains callback function pointers, the kernel removes the pointer from the queue and sends it to the thread.
The thread executes the callback function.
Steps 1 and 2 are repeated for each pointer remaining in the queue.
When the queue is empty, the thread returns from the function that placed it in an alertable state.
In this scenario, once the thread enters an alertable state it will call the callback functions passed to ReadFileEx() and WriteFileEx(), then return from the function that placed it in an alertable state.
If a thread enters an alertable state while its APC queue is empty, the thread's execution will be suspended by the kernel until one of the following occurs:
The kernel object that is being waited on becomes signaled.
A callback function pointer is placed in the APC queue.
A thread that uses alertable I/O processes asynchronous I/O requests more efficiently than one that simply waits for the event in the OVERLAPPED structure to be set, and the alertable I/O mechanism is less complicated to use than I/O completion ports.
However, alertable I/O returns the result of the I/O request only to the thread that initiated it. I/O completion ports do not have this limitation.
I/O completion ports are the mechanism by which an application uses a pool of threads that was created when the application was started to process asynchronous I/O requests.
These threads are created for the sole purpose of processing I/O requests. Applications that process many concurrent asynchronous I/O requests can do so more quickly and efficiently by using I/O completion ports than by creating threads at the time of the I/O request.
The CreateIoCompletionPort() function associates an I/O completion port with one or more file handles. When an asynchronous I/O operation started on a file handle associated with a completion port is completed, an I/O completion packet is queued to the port.
This can be used to combine the synchronization point for multiple file handles into a single object.
A thread uses the GetQueuedCompletionStatus() function to wait for a completion packet to be queued to the completion port, rather than waiting directly for the asynchronous I/O to complete.
Threads that block their execution on a completion port are released in last-in-first-out (LIFO) order. This means that when a completion packet is queued to the completion port, the system releases the last thread to block its execution on the port.
When a thread calls GetQueuedCompletionStatus(), it is associated with the specified completion port until it exits, specifies a different completion port, or frees the completion port. A thread can be associated with at most one completion port.
The most important property of a completion port is the concurrency value. The concurrency value of a completion port is specified when the completion port is created. This value limits the number of runnable threads associated with the completion port.
When the total number of runnable threads associated with the completion port reaches the concurrency value, the system blocks the execution of any subsequent threads that specify the completion port until the number of runnable threads associated with the completion port drops below the concurrency value.
The most efficient scenario occurs when there are completion packets waiting in the queue, but no waits can be satisfied because the port has reached its concurrency limit. In this case, when a running thread calls GetQueuedCompletionStatus(), it will immediately pick up the queued completion packet. No context switches will occur, because the running thread is continually picking up completion packets and the other threads are unable to run.
The best value to pick for the concurrency value is the number of CPUs on the computer. If your transaction requires a lengthy computation, a larger concurrency value will allow more threads to run.
Each transaction will take longer to complete, but more transactions will be processed at the same time. It is easy to experiment with the concurrency value to achieve the best effect for your application.
The PostQueuedCompletionStatus() function allows an application to queue its own special-purpose I/O completion packets to the completion port without starting an asynchronous I/O operation. This is useful for notifying worker threads of external events.
The completion port is freed when there are no more references to it. The completion port handle and every file handle associated with the completion port reference the completion port.
All the handles must be closed to free the completion port. To close the port handle, call the CloseHandle() function.
When the underlying network protocol and redirector support I/O operations, you can use the file API to perform network I/O.
The following figure illustrates the process of a network I/O operation under Windows.
When an application calls a file I/O function to access a file on a remote computer, the following events occur:
The I/O request is intercepted by a network redirector, also referred to simply as a redirector, on the local computer. This is depicted in the preceding figure by the solid arrow between the application and the client redirector.
The redirector constructs a data packet containing all of the information about the request, and sends it to the server where the file is located. This is depicted in the preceding figure by the solid arrow between the client redirector and the server redirector.
The redirector on the server receives the packet from the client, authenticates the access to the file required by the I/O request, and, if authenticated, executes the request on behalf of the client. If not, it returns an error code to the redirector on the client. This is depicted in the preceding figure by the curved solid arrow between the server redirector and the file.
When the request has been executed, the redirector on the server sends any data resulting from the I/O request to the redirector on the client along with a success notification. This is depicted in the preceding figure by the dotted arrow between the server and the client redirector.
The redirector on the client receives the packet from the server and passes the data in the packet to the application along with a success notification. This is depicted in the preceding figure by the dotted arrow between the client redirector and the application.
The Server Message Block (SMB) Protocol is a network file sharing protocol, and as implemented in Microsoft Windows is known as Microsoft SMB Protocol.
The set of message packets that defines a particular version of the protocol is called a dialect. CIFS refers to the dialect that was implemented in the Windows NT4 operating system. SMB and CIFS are also available on VMS, several versions of Unix, and other operating systems.
Although its main purpose is file sharing, additional Microsoft SMB Protocol functionality includes the following:
Determining other Microsoft SMB Protocol servers on the network, or network browsing.
Printing over a network.
File, directory, and share access authentication.
File and record locking.
File and directory change notification.
Dialect negotiation.
Extended file attribute handling.
Unicode support.
Opportunistic locks.
In the OSI networking model, Microsoft SMB Protocol is most often used as an Application/Presentation layer protocol, and it relies on lower-level protocols for transport.
The transport layer protocol that Microsoft SMB Protocol is most often used with is NetBIOS over TCP/IP, or NBT.
However, Microsoft SMB Protocol can also be used without a separate transport protocol; the Microsoft SMB Protocol/NBT combination is generally used for backward compatibility.
Microsoft SMB Protocol is a client-server implementation and consists of a set of data packets, each containing a request sent by the client or a response sent by the server.
These packets can be broadly classified as follows:
Session control packets. Establish and discontinue a connection to shared server resources.
File access packets. Access and manipulate files and directories on the remote server.
General message packets. Send data to print queues, mailslots, and named pipes, and provide data about the status of print queues.
Some message packets may be grouped and sent in one transmission to reduce response latency and make better use of network bandwidth. This is called "batching."
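Every one of these request and response packets begins with the same fixed-size SMB1 message header. As a rough illustration of what such a packet looks like on the wire (field layout as published in the SMB/CIFS specification; the function name and defaults below are my own), this Python sketch packs the 32-byte SMB1 header for a NEGOTIATE request, the message that starts dialect negotiation:

```python
import struct

# SMB1 fixed header, little-endian, 32 bytes total:
# Protocol(4) Command(1) Status(4) Flags(1) Flags2(2) PIDHigh(2)
# SecurityFeatures(8) Reserved(2) TID(2) PIDLow(2) UID(2) MID(2)
SMB1_HEADER = struct.Struct("<4s B I B H H 8s H H H H H")

SMB_COM_NEGOTIATE = 0x72  # command code for dialect negotiation

def smb1_header(command, tid=0, pid=0, uid=0, mid=0):
    """Build a 32-byte SMB1 header for a client request (illustrative only)."""
    return SMB1_HEADER.pack(
        b"\xffSMB",    # protocol magic: 0xFF 'S' 'M' 'B'
        command,       # one-byte command code
        0,             # Status: zero in a request
        0x18,          # Flags: canonical paths + case-insensitive names
        0,             # Flags2
        0,             # PIDHigh
        b"\x00" * 8,   # SecurityFeatures
        0,             # Reserved
        tid,           # tree (share) identifier
        pid & 0xFFFF,  # low 16 bits of the process id
        uid,           # user identifier assigned at session setup
        mid)           # multiplex id, matches a response to its request

hdr = smb1_header(SMB_COM_NEGOTIATE, pid=1234, mid=1)
print(len(hdr), hdr[:4])  # 32 b'\xffSMB'
```

The multiplex ID (MID) field is what makes batching workable: the client can have several requests outstanding in one transmission and still match each response to the request that produced it.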
An opportunistic lock (also called an oplock) is a lock placed by a client on a file residing on a server.
In most cases, a client requests an opportunistic lock so it can cache data locally, thus reducing network traffic and improving apparent response time.
Opportunistic locks are used by network redirectors on clients with remote servers, as well as by client applications on local servers.
Opportunistic locks coordinate data caching and coherency between clients and servers and among multiple clients.
Data that is coherent is data that is the same across the network. In other words, if data is coherent, data on the server and all the clients is synchronized.
Opportunistic locks are not commands by the client to the server. They are requests from the client to the server. From the point of view of the client, they are opportunistic.
In other words, the server grants such locks whenever other factors make the locks possible.
When a local application requests access to a remote file, the implementation of opportunistic locks is transparent to the application.
The network redirector and the server involved open and close the opportunistic locks automatically. However, opportunistic locks can also be used when a local application requests access to a local file and access by other applications and processes must be delegated to prevent corruption of the file.
In this case, the local application directly requests an opportunistic lock from the local file system and caches the file locally.
When used in this way, the opportunistic lock is effectively a semaphore managed by the local server, and is mainly used for the purposes of data coherency in the file and file access notification.
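The grant-and-break behavior described above can be modeled with a small simulation. This is a conceptual sketch only, not the real Windows API or SMB message flow; every class and method name here is invented for illustration:

```python
class Server:
    """Toy model of a server granting and breaking an oplock."""
    def __init__(self):
        self.holder = None   # client currently holding the oplock, if any
        self.data = "v1"     # the server's authoritative copy of the file

    def open_file(self, client):
        # Before letting another client in, break the existing oplock so the
        # holder can flush its cached writes; this keeps the data coherent.
        if self.holder is not None and self.holder is not client:
            self.holder.on_break(self)
            self.holder = None
        # The oplock is a request, not a command: grant it only when possible.
        if self.holder is None:
            self.holder = client
            client.granted = True
        return self.data

class Client:
    def __init__(self, name):
        self.name = name
        self.granted = False
        self.cache = None

    def open(self, server):
        self.cache = server.open_file(server and self)  # read, maybe cache

    def write(self, server, data):
        if self.granted:
            self.cache = data     # buffered locally: no network round trip
        else:
            server.data = data    # no oplock: must write through to the server

    def on_break(self, server):
        server.data = self.cache  # flush the local cache back to the server
        self.granted = False

server = Server()
a, b = Client("A"), Client("B")
a.open(server)            # A is granted the oplock and caches the file
a.write(server, "v2")     # buffered in A's cache only
b.open(server)            # forces a break: A flushes "v2", then B reads it
print(server.data, b.cache)  # v2 v2
```

The key point the model shows is the one made above: client A's cached write never becomes visible to B until the server breaks A's lock, and the break is exactly what forces the flush that keeps all copies coherent.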
The maximum number of concurrent opportunistic locks that you can create is limited only by the amount of available memory. The maximum number created in test conditions with Windows 2000 and NTFS is 359,000 locks.
Local applications should not attempt to request opportunistic locks from remote servers; DeviceIoControl returns an error if they do.
Opportunistic locks are of very limited use for applications. The only practical use is to test a network redirector or a server opportunistic lock handler.
Typically, file systems implement support for opportunistic locks. Applications generally leave opportunistic lock management to the file system drivers. Anyone implementing a file system should use the Installable File System (IFS) Kit for Windows 2000.
Anyone developing a device driver other than an installable file system should use the Windows 2000 Driver Development Kit (DDK).
Opportunistic locks and the associated operations are a superset of the opportunistic lock portion of the Common Internet File System (CIFS) protocol, an Internet Draft.
The CIFS protocol is an enhanced version of the Server Message Block (SMB) protocol. Note that the CIFS Internet Draft explicitly provides that a CIFS implementation may implement opportunistic locks by refusing to grant them.
An application consists of one or more processes. A process, in the simplest terms, is an executing program. One or more threads run in the context of the process.
A thread is the basic unit to which the operating system allocates processor time. A thread can execute any part of the process code, including parts currently being executed by another thread.
Each process provides the resources needed to execute a program. A process has a virtual address space, executable code, data, object handles, environment variables, a base priority, and minimum and maximum working set sizes.
Each process is started with a single thread, often called the primary thread, but can create additional threads from any of its threads.
All threads of a process share its virtual address space and system resources. In addition, each thread maintains exception handlers, a scheduling priority, and a set of structures the system will use to save the thread context until it is scheduled.
The thread context includes the thread's set of machine registers, the kernel stack, a thread environment block, and a user stack in the address space of the thread's process.
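The fact that all threads of a process share one virtual address space can be demonstrated in a few lines. Python is used here purely for brevity; threads created with the Windows CreateThread function behave the same way with respect to shared process memory:

```python
import threading

counter = 0                  # one variable in the process's address space
lock = threading.Lock()      # coordinates the threads' access to it

def worker(n):
    global counter
    for _ in range(n):
        with lock:           # without the lock, concurrent updates could be lost
            counter += 1

# Each thread executes the same code and mutates the same variable,
# because every thread of a process shares its virtual address space.
threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 40000
```

Note that what is shared is the address space and system resources; each thread still keeps its own scheduling priority, stacks, and saved register context, as described above.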
---------------------------------------End of story---------------------------------