- Feature Name: fs2
- Start Date: 2015-04-04
- RFC PR: rust-lang/rfcs#1044
- Rust Issue: rust-lang/rust#24796
Summary
Expand the scope of the std::fs module by enhancing existing functionality,
exposing lower-level representations, and adding a few new functions.
Motivation
The current std::fs module serves many of the basic needs of interacting with
a filesystem, but is missing a lot of useful functionality. For example, none of
these operations are possible in stable Rust today:
- Inspecting a file's modification/access times
- Reading low-level information like that contained in libc::stat
- Inspecting the unix permission bits on a file
- Blanket setting the unix permission bits on a file
- Leveraging DirEntry for the extra metadata it might contain
- Reading the metadata of a symlink (not what it points at)
- Resolving all symlinks in a path
There is some more functionality listed in the RFC issue, but this RFC
will not attempt to solve the entirety of that issue at this time. This RFC
strives to expose APIs for much of the functionality listed above that is on the
track to becoming #[stable] soon.
Non-goals of this RFC
There are a few areas of the std::fs API surface which are not considered
goals for this RFC. It will be left for future RFCs to add new APIs for these
areas:
- Enhancing copy to copy directories recursively or configuring how copying happens.
- Enhancing or stabilizing walk and its functionality.
- Temporary files or directories
Detailed design
First, a vision for lowering APIs in general will be presented, and then a number of specific APIs will each be proposed. Many of the proposed APIs are independent of one another, and this RFC may not be implemented all in one go but instead piecemeal over time, allowing the designs to evolve slightly in the meantime.
Lowering APIs
The vision for the os module
One of the principles of IO reform was to:
Provide hooks for integrating with low-level and/or platform-specific APIs.
The original RFC went into some amount of detail for how this would look, in
particular by use of the os module. Part of the goal of this RFC is to flesh
out that vision in more detail.
Ultimately, the organization of os is planned to look something like the
following:
os
  unix          applicable to all cfg(unix) platforms; high- and low-level APIs
    io            extensions to std::io
    fs            extensions to std::fs
    net           extensions to std::net
    env           extensions to std::env
    process       extensions to std::process
    ...
  linux         applicable to linux only
    io, fs, net, env, process, ...
  macos         ...
  windows       ...
APIs whose behavior is platform-specific are provided only within the std::os
hierarchy, making it easy to audit for usage of such APIs. Organizing the
platform modules internally in the same way as std makes it easy to find
relevant extensions when working with std.
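As a rough sketch of how this layout keeps platform-specific code easy to audit, the helper below reads a file's size through the per-platform extension traits. The name `platform_size` is made up for illustration, and the accessors used (`size` on unix, `file_size` on Windows) are the ones found in today's `std`, which need not match the final shape proposed here.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Size of the file in bytes, read through the unix extension trait.
#[cfg(unix)]
fn platform_size(path: &Path) -> io::Result<u64> {
    // The `std::os::unix` import is the only platform-specific line,
    // so a search for `std::os::` finds every non-portable spot.
    use std::os::unix::fs::MetadataExt;
    Ok(fs::metadata(path)?.size()) // `st_size`
}

/// Size of the file in bytes, read through the Windows extension trait.
#[cfg(windows)]
fn platform_size(path: &Path) -> io::Result<u64> {
    use std::os::windows::fs::MetadataExt;
    Ok(fs::metadata(path)?.file_size()) // `nFileSizeHigh`/`nFileSizeLow`
}
```

Callers see one portable signature, while each body opts in to exactly one `std::os` module.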
It is emphatically not the goal of the std::os::* modules to provide
bindings to all system APIs for each platform; this work is left to external
crates. The goals are rather to:
-
Facilitate interop between abstract types like File that std provides and the underlying system. This is done via "lowering": extension traits like AsRawFd allow you to extract low-level, platform-specific representations out of std types like File and TcpStream.
-
Provide high-level but platform-specific APIs that feel like those in the rest of std. Just as with the rest of std, the goal here is not to include all possible functionality, but rather the most commonly-used or fundamental.
Lowering makes it possible for external crates to provide APIs that work
"seamlessly" with std abstractions. For example, a crate for Linux might
provide an epoll facility that can work directly with std::fs::File and
std::net::TcpStream values, completely hiding the internal use of file
descriptors. Eventually, such a crate could even be merged into std::os::unix,
with minimal disruption -- there is little distinction between std and other
crates in this regard.
Concretely, lowering has two ingredients:
-
Introducing one or more "raw" types that are generally direct aliases for C types (more on this in the next section).
-
Providing an extension trait that makes it possible to extract a raw type from a std type. In some cases, it's possible to go the other way around as well. The conversion can be by reference or by value, where the latter is used mainly to avoid the destructor associated with a std type (e.g. to extract a file descriptor from a File and eliminate the File object, without closing the file).
While we do not seek to exhaustively bind types or APIs from the underlying
system, it is a goal to provide lowering operations for every high-level type
to a system-level data type, whenever applicable. This RFC proposes several such
lowerings that are currently missing from std::fs.
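The two ingredients can be seen in the file-descriptor lowering that `std` already provides on unix. The sketch below is unix-only and uses the existing `AsRawFd`, `IntoRawFd` and `FromRawFd` traits; the helper names `peek_fd` and `round_trip` are hypothetical.

```rust
use std::fs::File;
use std::os::unix::io::{AsRawFd, FromRawFd, IntoRawFd, RawFd};

/// Lowering by reference: the `File` keeps ownership of the descriptor
/// and will still close it when dropped.
fn peek_fd(file: &File) -> RawFd {
    file.as_raw_fd()
}

/// Lowering by value, then going the other way around.
/// `into_raw_fd` consumes the `File` without running its destructor,
/// so the descriptor stays open until the rebuilt `File` is dropped.
fn round_trip(file: File) -> File {
    let fd: RawFd = file.into_raw_fd();
    // SAFETY: `fd` was just released by `into_raw_fd`, so it is open
    // and nothing else owns it.
    unsafe { File::from_raw_fd(fd) }
}
```

An external crate could accept `&File` or `&TcpStream`, call `as_raw_fd` internally, and never show a descriptor in its own API.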
std::os::platform::raw
Each of the primitives in the standard library will expose the ability to be lowered into its component abstraction, which makes it necessary to define these abstractions and organize them in the platform-specific modules. This RFC proposes the following guidelines for doing so:
- Each platform will have a raw module inside of std::os which houses all of its platform specific definitions.
- Only type definitions will be contained in raw modules, no function bindings, methods, or trait implementations.
- Cross-platform types (e.g. those shared on all unix platforms) will be located in the respective cross-platform module. Types which only differ in the width of an integer type are considered to be cross-platform.
- Platform-specific types will exist only in the raw module for that platform. A platform-specific type may have different field names, components, or just not exist on other platforms.
Differences in integer widths are not considered to be enough of a platform
difference to define in each separate platform's module, meaning that it will be
possible to write code that uses os::unix but doesn't compile on all Unix
platforms. It is believed that most consumers of these types will continue to
store the same type (e.g. not assume it's an i32) throughout the application
or immediately cast it to a known type.
To reiterate, it is not planned for each raw module to provide exhaustive
bindings to each platform. Only those abstractions which the standard library is
lowering into will be defined in each raw module.
Lowering Metadata (all platforms)
Currently the Metadata structure exposes very few pieces of information about
a file. Some of this is because the information is not available across all
platforms, but some of it is also because the standard library does not have the
appropriate abstraction to return at this time (e.g. time stamps). The raw
contents of Metadata (a stat on Unix), however, should be accessible via
lowering no matter what.
The following trait hierarchy and new structures will be added to the standard library.
mod os::windows::fs {
    pub trait MetadataExt {
        fn file_attributes(&self) -> u32; // `dwFileAttributes` field
        fn creation_time(&self) -> u64; // `ftCreationTime` field
        fn last_access_time(&self) -> u64; // `ftLastAccessTime` field
        fn last_write_time(&self) -> u64; // `ftLastWriteTime` field
        fn file_size(&self) -> u64; // `nFileSizeHigh`/`nFileSizeLow` fields
    }
    impl MetadataExt for fs::Metadata { ... }
}

mod os::unix::fs {
    pub trait MetadataExt {
        fn as_raw(&self) -> &Metadata;
    }
    impl MetadataExt for fs::Metadata { ... }

    pub struct Metadata(raw::stat);
    impl Metadata {
        // Accessors for fields available in `raw::stat` for *all* unix platforms
        fn dev(&self) -> raw::dev_t; // st_dev field
        fn ino(&self) -> raw::ino_t; // st_ino field
        fn mode(&self) -> raw::mode_t; // st_mode field
        fn nlink(&self) -> raw::nlink_t; // st_nlink field
        fn uid(&self) -> raw::uid_t; // st_uid field
        fn gid(&self) -> raw::gid_t; // st_gid field
        fn rdev(&self) -> raw::dev_t; // st_rdev field
        fn size(&self) -> raw::off_t; // st_size field
        fn blksize(&self) -> raw::blksize_t; // st_blksize field
        fn blocks(&self) -> raw::blkcnt_t; // st_blocks field
        fn atime(&self) -> (i64, i32); // st_atime field, (sec, nsec)
        fn mtime(&self) -> (i64, i32); // st_mtime field, (sec, nsec)
        fn ctime(&self) -> (i64, i32); // st_ctime field, (sec, nsec)
    }
}

// st_flags, st_gen, st_lspare, st_birthtim, st_qspare

mod os::{linux, macos, freebsd, ...}::fs {
    pub mod raw {
        pub type dev_t = ...;
        pub type ino_t = ...;
        // ...
        pub struct stat {
            // ... same public fields as libc::stat
        }
    }
    pub trait MetadataExt {
        fn as_raw_stat(&self) -> &raw::stat;
    }
    impl MetadataExt for os::unix::fs::Metadata { ... }
    impl MetadataExt for fs::Metadata { ... }
}
The goal of this hierarchy is to expose all of the information in the OS-level metadata in as cross-platform of a method as possible while adhering to the design principles of the standard library.
The interesting part about working in a "cross platform" manner here is that the
makeup of libc::stat on unix platforms can vary quite a bit between platforms.
For example some platforms have a st_birthtim field while others do not.
To enable as much ergonomic usage as possible, the os::unix module will expose
the intersection of metadata available in libc::stat across all unix
platforms. The information is still exposed in a raw fashion (in terms of the
values returned), but methods are required as the raw structure is not exposed.
The unix platforms then leverage the more fine-grained modules in std::os
(e.g. linux and macos) to return the raw libc::stat structure. This will
allow full access to the information in libc::stat in all platforms with clear
opt-in to when you're using platform-specific information.
One of the major goals of the os::unix::fs design is to enable as much
functionality as possible when programming against "unix in general" while still
allowing applications to choose to only program against macos, for example.
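To give a feel for programming against "unix in general", here is a minimal unix-only sketch. It is written against the `MetadataExt` trait as it exists in today's `std::os::unix::fs`, where the accessors sit directly on the extension trait and times come as separate seconds and nanoseconds methods, rather than against the exact `as_raw` shape proposed above. `UnixSummary` and `summarize` are hypothetical names.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::MetadataExt;
use std::path::Path;

/// A handful of `stat` fields that exist on every unix platform.
struct UnixSummary {
    ino: u64,          // st_ino
    mode: u32,         // st_mode (file type bits plus permission bits)
    size: u64,         // st_size
    mtime: (i64, i64), // st_mtime as (sec, nsec)
}

fn summarize(path: &Path) -> io::Result<UnixSummary> {
    let meta = fs::metadata(path)?;
    Ok(UnixSummary {
        ino: meta.ino(),
        mode: meta.mode(),
        size: meta.size(),
        mtime: (meta.mtime(), meta.mtime_nsec()),
    })
}
```

Fields outside the common intersection, such as `st_birthtim`, would still require one of the finer-grained platform modules.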
Fate of Metadata::{accessed, modified}
At this time there is no suitable type in the standard library to represent the return type of these two functions. The type would either have to be some form of time stamp or moment in time, both of which are difficult abstractions to add lightly.
Consequently, both of these functions will be deprecated in favor of
requiring platform-specific code to access the modification/access time of
files. This information is all available via the MetadataExt traits listed
above.
Eventually, once a std type for cross-platform timestamps is available, these
methods will be re-instated as returning that type.
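For readers on a current toolchain: `std` did later gain `std::time::SystemTime`, and `Metadata::modified` and `Metadata::accessed` return `io::Result<SystemTime>` there. Under that assumption, a cross-platform use might look like the sketch below; `time_since_modified` is a hypothetical helper.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::time::{Duration, SystemTime};

/// How long ago the file at `path` was last modified.
fn time_since_modified(path: &Path) -> io::Result<Duration> {
    let modified: SystemTime = fs::metadata(path)?.modified()?;
    // A timestamp in the future (clock skew) is clamped to zero.
    Ok(SystemTime::now()
        .duration_since(modified)
        .unwrap_or(Duration::from_secs(0)))
}
```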
Lowering and setting Permissions (Unix)
Note: this section only describes behavior on unix.
Currently there is no stable method of inspecting the permission bits on a file,
and it is unclear whether the current unstable methods of doing so,
PermissionsExt::mode, should be stabilized. The main question around this
piece of functionality is whether to provide a higher level abstraction (e.g.
similar to the bitflags crate) for the permission bits on unix.
This RFC proposes considering the methods for stabilization as-is and not pursuing a higher level abstraction of the unix permission bits. To facilitate their inspection and manipulation, however, the following constants will be added:
mod os::unix::fs {
    pub const USER_READ: raw::mode_t;
    pub const USER_WRITE: raw::mode_t;
    pub const USER_EXECUTE: raw::mode_t;
    pub const USER_RWX: raw::mode_t;
    pub const OTHER_READ: raw::mode_t;
    pub const OTHER_WRITE: raw::mode_t;
    pub const OTHER_EXECUTE: raw::mode_t;
    pub const OTHER_RWX: raw::mode_t;
    pub const GROUP_READ: raw::mode_t;
    pub const GROUP_WRITE: raw::mode_t;
    pub const GROUP_EXECUTE: raw::mode_t;
    pub const GROUP_RWX: raw::mode_t;
    pub const ALL_READ: raw::mode_t;
    pub const ALL_WRITE: raw::mode_t;
    pub const ALL_EXECUTE: raw::mode_t;
    pub const ALL_RWX: raw::mode_t;
    pub const SETUID: raw::mode_t;
    pub const SETGID: raw::mode_t;
    pub const STICKY_BIT: raw::mode_t;
}
Finally, the set_permissions function of the std::fs module is also proposed
to be marked #[stable] soon as a method of blanket setting permissions for a
file.
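A minimal unix-only sketch of inspecting and blanket-setting the bits with `PermissionsExt::mode`, `PermissionsExt::set_mode` and `fs::set_permissions`. The constants are defined locally with the conventional octal values because this sketch does not assume they exist in `std`; their names simply mirror the ones proposed above, and `strip_write_bits` is a hypothetical helper.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::PermissionsExt;
use std::path::Path;

// Local stand-ins for the proposed constants (conventional unix values).
const USER_WRITE: u32 = 0o200;
const GROUP_WRITE: u32 = 0o020;
const OTHER_WRITE: u32 = 0o002;
const ALL_WRITE: u32 = USER_WRITE | GROUP_WRITE | OTHER_WRITE;

/// Clears every write bit on `path` and returns the resulting
/// permission bits (file type bits masked off).
fn strip_write_bits(path: &Path) -> io::Result<u32> {
    let mut perms = fs::metadata(path)?.permissions();
    let mode = perms.mode(); // inspect the raw bits
    perms.set_mode(mode & !ALL_WRITE); // manipulate them as plain integers
    fs::set_permissions(path, perms)?; // blanket-set on the file
    Ok(fs::metadata(path)?.permissions().mode() & 0o7777)
}
```

Because the bits are plain integers, ordinary bitwise operators are enough here, which is the trade-off this section makes against a higher level abstraction.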
Constructing Permissions
Currently there is no method to construct an instance of Permissions on any
platform. This RFC proposes adding the following APIs:
mod os::unix::fs {
    pub trait PermissionsExt {
        fn from_mode(mode: raw::mode_t) -> Self;
    }
    impl PermissionsExt for Permissions { ... }
}
This RFC does not propose yet adding a cross-platform way to construct a
Permissions structure due to the radical differences between how unix and
windows handle permissions.
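A short unix-only sketch of constructing a `Permissions` value from scratch with `from_mode` and applying it. It assumes the `u32` mode type used by today's `PermissionsExt`; `owner_only` and `make_private` are hypothetical helpers.

```rust
use std::fs::{self, Permissions};
use std::io;
use std::os::unix::fs::PermissionsExt;
use std::path::Path;

/// Builds permissions without first reading them from an existing file:
/// read and write for the owner, nothing for group or others.
fn owner_only() -> Permissions {
    Permissions::from_mode(0o600)
}

/// Applies the constructed permissions to `path`.
fn make_private(path: &Path) -> io::Result<()> {
    fs::set_permissions(path, owner_only())
}
```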
Creating directories with permissions
Currently the standard library does not expose an API which allows setting the
permission bits on unix or security attributes on Windows. This RFC proposes
adding the following API to std::fs:
pub struct DirBuilder { ... }

impl DirBuilder {
    /// Creates a new set of options with default mode/security settings for all
    /// platforms and also non-recursive.
    pub fn new() -> Self;

    /// Indicate that directories should be created recursively, creating
    /// all parent directories if they do not exist with the same security and
    /// permissions settings.
    pub fn recursive(&mut self, recursive: bool) -> &mut Self;

    /// Create the specified directory with the options configured in this
    /// builder.
    pub fn create<P: AsRef<Path>>(&self, path: P) -> io::Result<()>;
}

mod os::unix::fs {
    pub trait DirBuilderExt {
        fn mode(&mut self, mode: raw::mode_t) -> &mut Self;
    }
    impl DirBuilderExt for DirBuilder { ... }
}

mod os::windows::fs {
    // once a `SECURITY_ATTRIBUTES` abstraction exists, this will be added
    pub trait DirBuilderExt {
        fn security_attributes(&mut self, ...) -> &mut Self;
    }
    impl DirBuilderExt for DirBuilder { ... }
}
This sort of builder is also extendable to other flavors of functions in the future, such as C++'s template parameter:
/// Use the specified directory as a "template" for permissions and security
/// settings of the new directories to be created.
///
/// On unix this will issue a `stat` of the specified directory and new
/// directories will be created with the same permission bits. On Windows
/// this will trigger the use of the `CreateDirectoryEx` function.
pub fn template<P: AsRef<Path>>(&mut self, path: P) -> &mut Self;
At this time, however, it is not proposed to add this method to
DirBuilder.
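As an illustration of the builder on unix, the sketch below creates a directory tree that only the owner can enter. It assumes the `u32` mode type of today's `DirBuilderExt`, and `create_private_tree` is a hypothetical helper. The process umask can still clear bits from the requested mode.

```rust
use std::fs::DirBuilder;
use std::io;
use std::os::unix::fs::DirBuilderExt;
use std::path::Path;

/// Creates `path` and any missing parents, requesting mode 0o700.
fn create_private_tree(path: &Path) -> io::Result<()> {
    DirBuilder::new()
        .recursive(true) // cross-platform option
        .mode(0o700) // unix-only option from `DirBuilderExt`
        .create(path)
}
```

The portable options and the platform-specific ones chain on the same builder, and only the `use` line reveals that unix-specific behavior is involved.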
Adding FileType
Currently there is no enumeration or newtype representing a list of "file types" on the local filesystem. This is partly because the need has not been pressing so far. In some situations, however, it is more efficient to learn the file type once than to test for each individual file type separately.
For example some platforms' DirEntry type can know the FileType without an
extra syscall. If code were to test a DirEntry separately for whether it's a
file or a directory, it may issue more syscalls than necessary, compared with
learning the type once and then testing whether it is a file or a directory.
The full set of file types, however, is not always known nor portable across platforms, so this RFC proposes the following hierarchy:
#[derive(Copy, Clone, PartialEq, Eq, Hash)]
pub struct FileType(..);

impl FileType {
    pub fn is_dir(&self) -> bool;
    pub fn is_file(&self) -> bool;
    pub fn is_symlink(&self) -> bool;
}
Extension traits can be added in the future for testing for other more flavorful kinds of files on various platforms (such as unix sockets on unix platforms).
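A small unix-only sketch of the "learn the type once, then test it" pattern. It assumes the `FileTypeExt` extension trait found in today's `std::os::unix::fs` for the unix-specific kinds; `describe` and `describe_path` are hypothetical helpers.

```rust
use std::fs::{self, FileType};
use std::io;
use std::os::unix::fs::FileTypeExt;
use std::path::Path;

/// Classifies an already-fetched `FileType`; no further syscalls happen here.
fn describe(file_type: FileType) -> &'static str {
    if file_type.is_dir() {
        "directory"
    } else if file_type.is_file() {
        "file"
    } else if file_type.is_symlink() {
        "symlink"
    } else if file_type.is_socket() {
        "socket" // unix-only test from `FileTypeExt`
    } else if file_type.is_fifo() {
        "fifo" // unix-only test from `FileTypeExt`
    } else {
        "other"
    }
}

/// One metadata lookup, then any number of in-memory tests.
fn describe_path(path: &Path) -> io::Result<&'static str> {
    Ok(describe(fs::symlink_metadata(path)?.file_type()))
}
```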
Dealing with is_{file,dir} and file_type methods
Currently the fs::Metadata structure exposes stable is_file and is_dir
accessors. The struct will also grow a file_type accessor for this newtype
struct being added. It is proposed that Metadata will retain the
is_{file,dir} convenience methods, but no other "file type testers" will be
added.
Enhancing symlink support
Currently the std::fs module provides a soft_link and read_link function,
but there is no method of doing other symlink related tasks such as:
- Testing whether a file is a symlink
- Reading the metadata of a symlink, not what it points to
The following APIs will be added to std::fs:
/// Returns the metadata of the file at `p`. Unlike `metadata`, this function
/// will **not** follow symlinks: if `p` is a symlink, the metadata of the link
/// itself is returned.
pub fn symlink_metadata<P: AsRef<Path>>(p: P) -> io::Result<Metadata>;
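The difference between the two lookups can be sketched as follows (unix-only, since it creates the link with `std::os::unix::fs::symlink`). `link_vs_target` and the file names inside it are made up for illustration.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::symlink;
use std::path::Path;

/// Creates `dir/target.txt` and a symlink `dir/link` pointing at it, then
/// returns `(link itself is a symlink, followed link is a regular file)`.
fn link_vs_target(dir: &Path) -> io::Result<(bool, bool)> {
    let target = dir.join("target.txt");
    let link = dir.join("link");
    fs::write(&target, b"data")?;
    symlink(&target, &link)?;

    let of_link = fs::symlink_metadata(&link)?; // describes the link itself
    let of_target = fs::metadata(&link)?; // follows the link to the file
    Ok((
        of_link.file_type().is_symlink(),
        of_target.file_type().is_file(),
    ))
}
```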
Binding realpath
There's a long-standing issue that the unix function realpath is
not bound, and this RFC proposes adding the following API to the fs module:
/// Canonicalizes the given file name to an absolute path with all `..`, `.`,
/// and symlink components resolved.
///
/// On unix this function corresponds to the return value of the `realpath`
/// function, and on Windows this corresponds to the `GetFullPathName` function.
///
/// Note that relative paths given to this function will use the current working
/// directory as a base, and the current working directory is not managed in a
/// thread-local fashion, so this function may need to be synchronized with
/// other calls to `env::set_current_dir`.
pub fn canonicalize<P: AsRef<Path>>(p: P) -> io::Result<PathBuf>;
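One common use of canonicalization is deciding whether two spellings of a path name the same location. A minimal sketch, with `same_location` as a hypothetical helper; note that `fs::canonicalize` returns an error when the path does not exist.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// True when `a` and `b` resolve to the same absolute path once all
/// `.`, `..` and symlink components have been resolved.
fn same_location(a: &Path, b: &Path) -> io::Result<bool> {
    Ok(fs::canonicalize(a)? == fs::canonicalize(b)?)
}
```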
Tweaking PathExt
Currently the PathExt trait is unstable, yet it is quite convenient! The main
motivation for its #[unstable] tag is that it is unclear how much
functionality should be on PathExt versus the std::fs module itself.
Currently a small subset of functionality is offered, but it is unclear what the
guiding principle for the contents of this trait is.
This RFC proposes a few guiding principles for this trait:
-
Only read-only operations in std::fs will be exposed on PathExt. All operations which require modifications to the filesystem will require calling methods through std::fs itself.
-
Some inspection methods on Metadata will be exposed on PathExt, but only those where it logically makes sense for Path to be the self receiver. For example PathExt::len will not exist (size of the file), but PathExt::is_dir will exist.
Concretely, the PathExt trait will be expanded to:
pub trait PathExt {
    fn exists(&self) -> bool;
    fn is_dir(&self) -> bool;
    fn is_file(&self) -> bool;
    fn metadata(&self) -> io::Result<Metadata>;
    fn symlink_metadata(&self) -> io::Result<Metadata>;
    fn canonicalize(&self) -> io::Result<PathBuf>;
    fn read_link(&self) -> io::Result<PathBuf>;
    fn read_dir(&self) -> io::Result<ReadDir>;
}

impl PathExt for Path { ... }
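A sketch of the read-only style this enables. In today's `std` these methods are available directly on `Path` as inherent methods, so the example calls them that way rather than importing a `PathExt` trait; `count_entries` is a hypothetical helper.

```rust
use std::io;
use std::path::Path;

/// Number of entries in the directory at `path`, or `None` when `path`
/// is missing or is not a directory.
fn count_entries(path: &Path) -> io::Result<Option<usize>> {
    // `is_dir` is also false for paths that do not exist.
    if !path.is_dir() {
        return Ok(None);
    }
    let mut count = 0;
    for entry in path.read_dir()? {
        entry?; // surface per-entry I/O errors
        count += 1;
    }
    Ok(Some(count))
}
```

Everything here only inspects the filesystem, in line with the first guiding principle; creating or removing entries would go through `std::fs` functions.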
Expanding DirEntry
Currently the DirEntry API is quite minimalistic, exposing very few of the
underlying attributes. Platforms like Windows actually contain an entire
Metadata inside of a DirEntry, enabling much more efficient walking of
directories in some situations.
The following APIs will be added to DirEntry:
impl DirEntry {
    /// This function will return the filesystem metadata for this directory
    /// entry. This is equivalent to calling `fs::symlink_metadata` on the
    /// path returned.
    ///
    /// On Windows this function will always return `Ok` and will not issue a
    /// system call, but on unix this will always issue a call to `stat` to
    /// return metadata.
    pub fn metadata(&self) -> io::Result<Metadata>;

    /// Return what file type this `DirEntry` contains.
    ///
    /// On some platforms this may not require reading the metadata of the
    /// underlying file from the filesystem, but on other platforms it may be
    /// required to do so.
    pub fn file_type(&self) -> io::Result<FileType>;

    /// Returns the file name for this directory entry.
    pub fn file_name(&self) -> OsString;
}

mod os::unix::fs {
    pub trait DirEntryExt {
        fn ino(&self) -> raw::ino_t; // read the d_ino field
    }
    impl DirEntryExt for fs::DirEntry { ... }
}
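Put together, a directory listing can lean on the entry itself instead of issuing a separate metadata lookup per path. The unix-only sketch below assumes the `u64` inode type of today's `DirEntryExt`; `list` is a hypothetical helper.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::DirEntryExt;
use std::path::Path;

/// Lists `dir` as `(name, is_dir, inode)` rows, sorted by name.
fn list(dir: &Path) -> io::Result<Vec<(String, bool, u64)>> {
    let mut rows = Vec::new();
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        // The file name is available without joining and re-splitting paths.
        let name = entry.file_name().to_string_lossy().into_owned();
        // May be answered from the directory entry without a metadata lookup.
        let is_dir = entry.file_type()?.is_dir();
        // `d_ino`, read through the unix extension trait.
        rows.push((name, is_dir, entry.ino()));
    }
    rows.sort();
    Ok(rows)
}
```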
Drawbacks
-
This is quite a bit of surface area being added to the std::fs API, and it may perhaps be best to scale it back and add it in a more incremental fashion instead of all at once. Most of it, however, is fairly straightforward, so it seems prudent to schedule many of these features for the 1.1 release.
-
Exposing raw information such as libc::stat or WIN32_FILE_ATTRIBUTE_DATA possibly can hamstring altering the implementation in the future. At this point, however, it seems unlikely that the exposed pieces of information will be changing much.
Alternatives
-
Instead of exposing accessor methods in MetadataExt on Windows, the raw WIN32_FILE_ATTRIBUTE_DATA could be returned. We may change, however, to using BY_HANDLE_FILE_INFORMATION one day which would make the return value from this function more difficult to implement.
-
A std::os::MetadataExt trait could be added to access truly common information such as modification/access times across all platforms. The return value would likely be a u64 "something" and would be clearly documented as being a lossy abstraction and also only having a platform-specific meaning.
-
The PathExt trait could perhaps be implemented on DirEntry, but it doesn't necessarily seem appropriate for all the methods and using inherent methods also seems more logical.
Unresolved questions
- What is the ultimate role of crates like liblibc, and how do we draw the line between them and std::os definitions?