OCaml files
As a simple example, consider a program that needs to deal with multiple identifiers like usernames and hostnames. If you just represent these as strings, then it becomes easy to confuse one with the other.
A better approach is to mint new abstract types for each identifier, where those types are, under the covers, just implemented as strings. That way, the type system will prevent you from confusing a username with a hostname, and if you do need to convert, you can do so using explicit conversions to and from the string type. The conversion functions are there purely as part of the discipline that they enforce on the code through the type system. We also chose to put in an equality function, so you can check if two usernames match.
We could have written this slightly differently, by giving the signature its own top-level module type declaration, making it possible to create multiple distinct types with the same underlying implementation in a lightweight way. Now suppose some code has a bug: it compares the username in one session to the host in the other session, when it should be comparing the usernames in both cases.
Because of how we defined our types, however, the compiler will flag this bug for us. This is a trivial example, but confusing different kinds of identifiers is a very real source of bugs, and the approach of minting abstract types for different classes of identifiers is an effective way of avoiding such issues.
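A minimal sketch of the idea, using only the standard library. The names ID, String_id, session_info and sessions_have_same_user are illustrative assumptions, not definitions taken from this text.

```ocaml
(* A shared signature for string-backed identifiers (hypothetical name). *)
module type ID = sig
  type t
  val of_string : string -> t
  val to_string : t -> string
  val ( = ) : t -> t -> bool
end

(* One implementation; the conversions are just the identity function. *)
module String_id = struct
  type t = string
  let of_string x = x
  let to_string x = x
  let ( = ) = String.equal
end

(* Sealing with ID makes each t abstract, so the two types are distinct. *)
module Username : ID = String_id
module Hostname : ID = String_id

type session_info = { user : Username.t; host : Hostname.t }

let sessions_have_same_user s1 s2 =
  (* Writing [Username.( = ) s1.user s2.host] here would be rejected
     by the type checker, because Hostname.t is not Username.t. *)
  Username.( = ) s1.user s2.user
```

Because both modules are sealed with the same signature, adding a third kind of identifier is a one-line declaration.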
Most of the time, you refer to values and types within a module by using the module name as an explicit qualifier. For example, you prefix a name with List. to refer to something defined in the List module. Sometimes, though, you want to be able to refer to the contents of a module without this explicit qualification. That is what opening a module is for. In general, opening a module adds the contents of that module to the environment that the compiler looks at to find the definition of various identifiers.
Opening a module is basically a trade-off between terseness and explicitness—the more modules you open, the fewer module qualifications you need, and the harder it is to look at an identifier and figure out where it comes from. Opening modules at the toplevel of a module should be done quite sparingly, and generally only with modules that have been specifically designed to be opened, like Base or Option. There are two syntaxes for local opens.
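The two local-open forms can be sketched as follows. The functions average and longest are hypothetical examples chosen for illustration, and they use the standard library's List and String rather than Base.

```ocaml
(* Syntax 1: [let open M in e] opens M for the rest of the expression. *)
let average xs =
  let open List in
  fold_left ( + ) 0 xs / length xs

(* Syntax 2: [M.(e)] opens M only inside the parentheses. *)
let longest a b = String.(if length a >= length b then a else b)
```

In both cases the open is confined to a small scope, so a reader does not have to search far to see where length comes from.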
Rebinding modules to very short names at the top level of your module is usually a mistake. While opening a module affects the environment used to search for identifiers, including a module is a way of adding new identifiers to a module proper.
Consider a simple module for representing a range of integer values. We can use the include directive to create a new, extended version of the Interval module.
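A sketch of what such a pair of modules might look like. The constructor names, the create function and the added contains function are assumptions made for illustration.

```ocaml
(* A range of integers; Empty represents a range with no elements. *)
module Interval = struct
  type t =
    | Interval of int * int
    | Empty

  let create low high = if high < low then Empty else Interval (low, high)
end

(* include copies every definition of Interval into the new module,
   so Extended_interval has t, create, and the new contains. *)
module Extended_interval = struct
  include Interval

  let contains t x =
    match t with
    | Empty -> false
    | Interval (low, high) -> x >= low && x <= high
end
```

Had we written open Interval instead of include Interval, contains would still compile, but Extended_interval.create would not exist.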
Now, how do we write an interface for this new module? It turns out that include works on signatures as well, so we can pull essentially the same trick to write our mli. The only issue is that we need to get our hands on the signature for the Option module. This can be done using module type of, which computes a signature from a module. Note that the order of declarations in the mli does not need to match the order of declarations in the ml. The order of declarations in the ml mostly matters insofar as it affects which values are shadowed.
If we wanted to replace a function in Option with a new function of the same name, the declaration of that function in the ml would have to come after the include Option declaration. When OCaml compiles a program with an ml and an mli, it will complain if it detects a mismatch between the two. The simplest kind of error is where the type specified in the signature does not match the type in the implementation of the module.
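As a hedged illustration, the same trick can be written in a single file with an inline signature. This sketch assumes the standard library's Option module (OCaml 4.08 or later) rather than the one from Base, and the names Ext_option and apply are hypothetical.

```ocaml
module Ext_option : sig
  (* Everything Option already exposes... *)
  include module type of Option

  (* ...plus one new function (hypothetical). *)
  val apply : ('a -> 'b) t -> 'a -> 'b t
end = struct
  include Option

  (* Declared after the include, so it would shadow any Option.apply. *)
  let apply f_opt x =
    match f_opt with
    | None -> None
    | Some f -> Some (f x)
end
```

The declarations in the signature and the structure appear in the same order here, but they would not have to.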
As an example, if we replace the val declaration in counter. We might decide that we want a new function in Counter for pulling out the frequency count of a given string. We could add that to the mli by adding the following line. Type definitions that show up in an mli need to match up with corresponding definitions in the ml.
Consider again the example of the type median. The order of the declaration of variants matters to the OCaml compiler, so a definition of median in the implementation that lists those options in a different order will be reported as a mismatch. Order is similarly important to other type declarations, including the order in which record fields are declared and the order of arguments (including labeled and optional arguments) to a function. If you want to create cyclic definitions, you typically have to mark them specially.
The same is true at the module level. By default, cyclic dependencies between modules are not allowed, and cyclic dependencies among files are never allowed. The simplest example of a forbidden circular reference is a module referring to its own module name.
So, if we tried to add a reference to Counter from within counter. The problem manifests in a different way if we create cyclic references between files. We could create such a situation by adding a reference to Freq from counter. The module system is a key part of how an OCaml program is structured. When designing an mli, one choice that you need to make is whether to expose the concrete definition of your types or leave them abstract.
Most of the time, abstraction is the right choice, for two reasons: it enhances the flexibility of your design, and it makes it possible to enforce invariants on the use of your module. Abstraction enhances flexibility by restricting how users can interact with your types, thus reducing the ways in which users can depend on the details of your implementation.
There are two ways of accessing a file. The first is by its file name or path name in the file system hierarchy.
Due to hard links, a file can have many different names. Names are values of type string. For example, the system calls unlink, link, symlink and rename all operate at the file name level. The second way of accessing a file is by a file descriptor. Access to a file via its descriptor is independent from the access via its name. In particular, once we have a file descriptor, the file can be destroyed or renamed but the descriptor still points to the original file. When a program is executed, three descriptors are allocated and tied to the variables stdin, stdout and stderr of the Unix module.
They correspond, respectively, to the standard input, standard output and standard error of the process. When a program is executed on the command line without any redirections, the three descriptors refer to the terminal. The system calls stat, lstat and fstat return the meta-attributes of a file; that is, information about the node itself rather than its content. Among other things, this information contains the identity of the file, the type of file, the access rights, the time and date of last access and other information.
The system calls stat and lstat take a file name as an argument while fstat takes a previously opened descriptor and returns information about the file it points to. The result of these three calls is a record of type stats whose fields are described in table 1. We can look up users and groups by name in a portable manner with the functions getpwnam and getgrnam, or by id with getpwuid and getgrgid. The name of the user of a running process and all the groups to which it belongs can be retrieved with the commands getlogin and getgroups.
The call chown changes the owner (second argument) and the group (third argument) of a file (first argument). If we have a file descriptor, fchown can be used instead. Only the super user can change this information arbitrarily. Access rights specify special bits and read, write and execution rights for the user owner, the group owner and the other users as a vector of bits. The permissions on a file are the union of all these individual rights, as shown in table 2.
For files, the meaning of read, write and execute permissions is obvious. For a directory, the execute permission means the right to enter it (to chdir to it) and read permission the right to list its contents. Read permission on a directory is however not needed to read its files or sub-directories, but we then need to know their names.
The special bits do not have meaning unless the x bit is set (if present without x set, they do not give additional rights). This is why their representation is superimposed on the bit x and the letters S and T are used instead of s and t whenever x is not set.
The bit t allows sub-directories to inherit the permissions of the parent directory. The process also preserves its original identities unless it has super user privileges, in which case setuid and setgid change both its effective and original user and group identities.
The original identity is preserved to allow the process to subsequently recover it as its effective identity without needing further privileges. The system calls getuid and getgid return the original identities and geteuid and getegid return the effective identities. A process also has a file creation mask, encoded the same way file permissions are. As its name suggests, the mask specifies prohibitions (rights to remove): during file creation, a bit set to 1 in the mask is set to 0 in the permissions of the created file.
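The effect of the mask is plain bit arithmetic, which can be sketched without any system call. The function name apply_mask and the sample values are illustrative assumptions.

```ocaml
(* Permissions actually given to a new file: every bit set in the mask
   is cleared from the requested permissions. *)
let apply_mask ~requested ~mask = requested land (lnot mask)

let () =
  (* rw-rw-rw- requested, mask forbids write for group and others *)
  Printf.printf "%03o\n" (apply_mask ~requested:0o666 ~mask:0o022); (* prints 644 *)
  (* same request, mask only forbids write for others *)
  Printf.printf "%03o\n" (apply_mask ~requested:0o666 ~mask:0o002)  (* prints 664 *)
```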
The mask can be consulted and changed with the system call umask. Like many system calls that modify system variables, the modifying function returns the old value of the variable.
Thus, to just look up the value we need to call the function twice: once with an arbitrary value to get the mask and a second time to put it back. File access permissions can be modified with the system calls chmod and fchmod, and tested with access. The function access raises an error if the access rights are not granted.
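The two-call idiom for reading the creation mask can be sketched as below. It assumes the program is linked against the unix library, and the name current_umask is a hypothetical helper.

```ocaml
(* Requires the unix library. *)
let current_umask () =
  (* First call: install a temporary mask and get the real one back. *)
  let old = Unix.umask 0 in
  (* Second call: put the real mask back in place. *)
  ignore (Unix.umask old);
  old
```

Between the two calls the process briefly runs with a mask of 0, so this sketch is best kept away from code that creates files concurrently.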
Note that the information inferred by access may be more restrictive than the information returned by lstat because a file system may be mounted with restricted rights — for example in read-only mode.
In that case access will deny a write permission on a file whose meta-attributes would allow it. Only the kernel can write in directories when files are created.
Thus opening a directory in write mode is prohibited. In certain versions of Unix a directory may be opened in read only mode and read with read, but other versions prohibit it. However, even if this is possible, it is preferable not to do so because the format of directory entries varies between Unix versions and is often complex.
The functions opendir, readdir, rewinddir and closedir allow reading a directory sequentially in a portable manner. The system call opendir returns a directory descriptor for a directory. A library function in Misc iterates a function f over the entries of the directory dirname. To create a directory or remove an empty directory, we have mkdir and rmdir. The second argument of mkdir determines the access rights of the new directory.
Note that we can only remove a directory that is already empty. To remove a directory and its contents, it is thus necessary to first recursively empty the contents of the directory and then remove the directory.
The Unix command find lists the files of a hierarchy matching certain criteria (file name, type and permissions etc.). In this section we develop a library function Findlib.
The paths found under the root r include r as a prefix. Each found path p is given to the function action along with the data returned by Unix. The function action returns a boolean indicating, for directories, whether the search should continue for its contents (true) or not (false).
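A minimal sketch of iterating over directory entries with opendir, readdir and closedir. It assumes the unix library is linked. The name iter_dir is an assumption, and this is not necessarily how the Misc function is written.

```ocaml
(* Requires the unix library. *)
let iter_dir f dirname =
  let d = Unix.opendir dirname in
  (* Fun.protect makes sure the descriptor is closed even if f raises. *)
  Fun.protect
    ~finally:(fun () -> Unix.closedir d)
    (fun () ->
      try
        (* readdir raises End_of_file once every entry has been returned. *)
        while true do
          f (Unix.readdir d)
        done
      with End_of_file -> ())
```

One caveat of this sketch: if f itself raises End_of_file, the iteration stops silently, so a more careful version would separate the call to readdir from the call to f.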
Whenever an error occurs the arguments of the exception are given to the handler function and the traversal continues. However when an exception is raised by the functions action or handler themselves, we immediately stop the traversal and let it propagate to the caller.
A directory is identified by the id pair made of its device and inode number. The list visiting keeps track of the directories that have already been visited. In fact this information is only needed if symbolic links are followed. It is now easy to program the find command.
The essential part of the code parses the command line arguments with the Arg module. Although our find command is quite limited, the library function FindLib. Use the function FindLib. The function getcwd is not a system call but is defined in the Unix module. First describe the principle of your algorithm with words and then implement it (you should avoid repeating the same system call).
The openfile function allows us to obtain a descriptor for a file of a given name (the corresponding system call is open, however open is a keyword in OCaml). The first argument is the name of the file to open. These flags determine whether read or write calls can be done on the descriptor.
The call openfile fails if a process requests an open in write resp. Most programs use 0o666 for the third argument to openfile. This means rw-rw-rw- in symbolic notation.
With the default creation mask of 0o022, the file is thus created with the permissions rw-r--r--. With a more lenient mask of 0o002, the file is created with the permissions rw-rw-r--. If the file will contain executable code e.
If the file must be confidential e. The last group of flags specifies how to synchronize read and write operations. By default these operations are not synchronized. The system calls read and write read and write bytes in a file.
The first argument is the file descriptor to act on. The third argument is the position in the string of the first byte to be written or read. The fourth argument is the number of the bytes to be read or written. After the system call, the current position is advanced by the number of bytes read or written.
For writes, the number of bytes actually written is usually the number of bytes requested. However there are exceptions: (i) if it is not possible to write the bytes e. The reason for (iii) is that internally OCaml uses auxiliary buffers whose size is bounded by a maximal value.
If this value is exceeded the write will be partial. To work around this problem OCaml also provides the function write which iterates the writes until all the data is written or an error occurs.
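What "iterating the writes" means can be sketched by hand with Unix.single_write, which attempts one write and reports how many bytes went through. This assumes the unix library is linked, and the name write_all is a hypothetical helper, not a library function.

```ocaml
(* Requires the unix library. *)
let rec write_all fd buf ofs len =
  if len > 0 then begin
    (* single_write may write fewer than len bytes. *)
    let n = Unix.single_write fd buf ofs len in
    (* Resume right after the bytes that were actually written. *)
    write_all fd buf (ofs + n) (len - n)
  end
```

An error surfaces as a Unix.Unix_error exception from single_write, and at that point the caller still knows from ofs how much had been written.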
For reads, it is possible that the number of bytes actually read is smaller than the number of requested bytes. For example when the end of file is near, that is when the number of bytes between the current position and the end of file is less than the number of requested bytes. In particular, when the current position is at the end of file, read returns zero. For example, read on a terminal returns zero if we issue a ctrl-D on the input.
Another example is when we read from a terminal. In that case, read blocks until an entire line is available. If the line length is smaller than the requested bytes, read returns immediately with the line, without waiting for more data to reach the number of requested bytes.
This is the default behavior for terminals, but it can be changed to read character-by-character instead of line-by-line, see section 2. A single call to read can fetch at most a fixed number of characters from standard input and return them as a string. Official documentation for the modules of interest: the core library including the initially opened module Stdlib, Printf.
The normal way of opening a file in OCaml returns a channel. There are two kinds of channels: channels that write to a file (type out_channel) and channels that read from a file (type in_channel). Whenever you write or read something to or from a channel, the current position changes to the next character after what you just wrote or read. Occasionally, you may want to skip to a particular position in the file, or restart reading from the beginning.
File manipulation
This is a guide to basic file manipulation in OCaml using only the standard library.
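Reading a bounded number of bytes from a descriptor and returning them as a string can be sketched as below. It assumes the unix library is linked. The helper name read_some and its parameter n are illustrative.

```ocaml
(* Requires the unix library. *)
let read_some fd n =
  let buf = Bytes.create n in
  (* read may return fewer than n bytes, and returns 0 at end of file. *)
  let got = Unix.read fd buf 0 n in
  (* Keep only the bytes that were actually read. *)
  Bytes.sub_string buf 0 got
```

Calling read_some Unix.stdin n on a terminal would return at most one line, for the reasons given above.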
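A short sketch of both kinds of channel, using only the standard library. The helper names write_lines and read_first_line_twice are assumptions made for illustration.

```ocaml
(* Write each string on its own line through an out_channel. *)
let write_lines path lines =
  let oc = open_out path in
  Fun.protect
    ~finally:(fun () -> close_out oc)
    (fun () ->
      List.iter
        (fun l ->
          output_string oc l;
          output_char oc '\n')
        lines)

(* Read the first line, jump back to position 0, and read it again. *)
let read_first_line_twice path =
  let ic = open_in path in
  Fun.protect
    ~finally:(fun () -> close_in ic)
    (fun () ->
      let first = input_line ic in
      seek_in ic 0;
      let again = input_line ic in
      (first, again))
```

Closing the output channel matters here: output is buffered, so the data may not reach the file until the channel is flushed or closed.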