The Linux VFS Model

Naming structure

Abstract structure of the Linux Virtual Filesystem (VFS) is presented. Only the top-level part – the naming structure – is described, providing detailed pathname resolution semantics. We formalize the hardlink structure, the mount structure and describe derived structures used for pathname resolution. We proceed by stepwise refinement – we establish a system of (classes of) structures. This system contains a main development line in which each structure is an extension of the previous one.

This way we partially provide an abstract specification of the Linux Virtual Filesystem. In fact, this document can be considered to be an abstract specification of the Linux VFS in itself, with only few specification steps (the most abstract ones) performed.

Author
Ondřej Pavlata
Jablonec nad Nisou
Czech Republic
Document date
Initial releaseJune 2, 2011
Last major release June 2, 2011
Last updateJune 2, 2011
(see Document history)
Warning
  1. This document has been created without any prepublication review except those made by the author himself.
  2. The author has no experience with the Linux operating system except for simple tests.
Linux version
This document refers to Linux 2.6.38. The following configuration has been used for testing:
DistributionUbuntu 11.04 natty
Desktop evironmentGNOME 2.32.1 (Ubuntu 2011-04-14)
KernelLinux 2.6.38-8-generic
Platformi686
GNU bashversion 4.2.8(1)-release (i686-pc-linux-gnu)

Table of contents

Introduction

VFS versus RTFS
As of time of writing this document, the acronym RTFS has two meanings (known to the author) in the context of file systems: Gray color indicates that we are concerned with the second meaning here. We start with the following premise:
User-observable semantics of the Virtual Filesystem should be described outside the RTFS country.
The main objective of this document is to contribute to the following shift in proportion relating the VFS semantics (the non-RTFS country is referred to as Abstract specification):
Abstract specification

RTFS country

    →    

Abstract specification

RTFS country

Preliminaries

Some familiarity with elementary algebra and order theory is assumed. This involves such notions as structure, substructure, sort, generating set/structure, (partial) function, relation, domain, range, restriction, extension, injectivity, reflexivity, transitivity, monotonicity, isomorphism, (order) chain, closure operator.
General conventions
Our notation mostly follows set-theoretic conventions as shown in the following table.
Notation Description
X × Y The Cartesian product of X and Y.
UU.x The universal class – the set-theoretic class of all sets.
Id The identity class – the set-theoretic class of all pairs (x,x).
Dom(R) The (definition) domain of a relation R – the least X, w.r.t. , such that R X × UU.
Rng(R) The range of a relation R – the least Y such that R UU × Y.
R X The (domain) restriction of the relation R to X,   R X = R (X × UU).
R S The composition of relations. Inner triangle indicates the left-to-right direction (first R, then S). For functions with left-dot notation, we write .r.s instead of .r .s.
R−1 The inverse (opposite) of the relation R.
X Y The set (class) of partial functions from X to Y. We write .f : X Y instead of .f X Y.
X The set (class) of all partial functions on X, shorthand for X X.
X.r Usually, the image of X under .r, X.r = { x.r | x X }.
R The reflexive transitive closure of the relation R (with the reflexive closure depending on the context).
R S The binary equi-join or relations R and S.   R S = { (x,y,z) | (x,y) R, (x,z) S }. For functions .r, .s,   .r .s is the map x (x.r, x.s).
X Y X Y = Z   iff   X Y = Z and X Y = .
Mixture of partial functions
We introduce mixture operators as follows (f, g are relations).
Notation Definition Terminology Correspondents
MySQL Ruby
f g (f (Dom(f) Dom(g))) g patch-mixture of g into f REPLACE Hash#update
f g g f INSERT IGNORE
f g f g = h   iff   f g = h
and Dom(f) Dom(g) =
def-mixture of f and g INSERT
Obviously, if f and g are partial functions, then so is f g (resp. f g if defined).
Ruby conventions
We sometimes use conventions from the Ruby programming language [], in particular, in the following cases:
Lists
By a list we mean a function whose domain, the index set, is a down-set of (, ), i.e. either the set of all natural numbers, or a set {0, …, n-1} for some natural n. A list I X is said to be a list over X. The following table shows notation used for lists over a set X.
Notation Description
X finite lists over X
X+ non-empty finite lists over X
X infinite lists over X
[a,b,c] the list {(0,a),(1,b),(2,c)}
let x, y X, A, B X
x[i] the function application x(i)
x.length number of members of x
x.last x.last[x.length-1]
x + y concatenation of lists x and y
A + B { a + b | a A, b B }
x + B [x] + B
x.shift x without the first member
x.pop x without the last member
x.uniq x without duplicate values
x.select{block} the selection from x according to block
Monounary algebra
A monounary algebra is a structure (X, .p¯) such that We introduce the following notation and terminology:
Function Terminology Definition / description
.p¯ : X X x.p¯ is the
direct successor
parent-or-root
of x
x.p¯ = x.p if x.p is defined,
x.p¯ = x otherwise.
.p : X X x.p is the
strict direct successor
parent
of x
.p is the restriction of .p¯ to non-fixed points:
x.p = x.p¯ if x.p¯ ≠ x,
x.p is undefined, otherwise.
.p¯() : X × X x.p¯(i) is the i-th successor of x x.p¯(i) is the i-th application of .p¯ to x:
x.p¯(0) = x,
x.p¯(i) = x.p¯(i-1).p¯   for i > 0.
.p() : X × X x.p(i) is the
strict i-th successor
i-th ancestor
of x
.p() is a restriction of .p¯():
x.p(i) = x.p¯(i) if i = 0 or x.p¯(i-1) ≠ x.p¯(i),
x.p(i) is undefined, otherwise.
.p¯s¯ : X X x.p¯s¯ is the successor list of x x.p¯s¯[i] = x.p¯(i)
.ps : X X X x.ps is the
strict successor list
ancestor list
of x
x.ps[i] = x.p(i)
.r : X X x.r is the
endpoint
root
of x
x.r = x.ps.last if x.ps is finite,
x.r is undefined, otherwise.
Algebraic forest
By an algebraic forest (or just forest) we mean a structure which has one of the following equivalent forms:
  1. (1) A partial order whose every principal up-set is a finite chain:
    A partial order (X, ) such that for every x X, the set x.ps = { y X | x y } is finite and totally ordered by .
  2. (2) A monounary algebra whose every non-empty subalgebra has a fixed point:
    An algebra (X, .p¯) with just one unary operator, .p¯ : X X, such that for every x X, x.p¯(i) = x.p¯(i+1) for some natural i.
  3. (3) A partial monounary algebra without proper (non-empty) non-partial subalgebras:
    A partial algebra (X, .p) with just one partial unary operator, the parent partial function .p: X X, such that for every x X, x.p(i) is defined for only finitely many i.
Because the set x.ps from (1) is a finite chain in , we can regard it as a finite list. Then the correspondence (1)↔(2)↔(3) is established as follows:
(2)←(1)
  • x.p¯(i) = x.ps[i] if i x.ps.length,
  • x.p¯(i) = x.ps.last otherwise,
(1)←(2)
  • x y iff x.p¯(i) = y for some i,   i.e.
i.e. .p¯ is the smallest function on X (w.r.t. ) satisfying (), (X,) is the reflexive transitive closure of (X, .p¯), ()


(3)←(1)
  • x.p(i) = x.ps[i] if i x.ps.length - 1,
  • x.p(i) is undefined otherwise.
(1)←(3)
  • x y iff x.p(i) = y for some i,   i.e.
i.e. .p is the smallest relation on X satisfying (), the
reflexive transitive reduction of (X,) (a.k.a Hasse diagram),
(X,) is the reflexive transitive closure of (X, .p) ().

The root map .r: X X is defined by x.r = x.ps.last. Obviously, .r is a closure operator w.r.t. (X, ). An element x is a root if it satisfies any of the following equivalent conditions:
  1. x is .r-closed,
  2. x is a fixed point w.r.t. .p¯,
  3. x has undefined parent x.p.

Notes:

  1. x.p¯(i) (resp. x.p(i)) means the i-th application of .p¯ (resp. .p) to x.
  2. The form (3) shows that an algebraic forest can be viewed as a special case of a directed acyclic graph (DAG) which in turn is a special case of a digraph (a directed graph). In this context the term algebraic forest is equivalent to that of a rooted forest [].
Algebraic tree
By an algebraic tree we mean an algebraic forest with exactly one root. As an algebra, it is a structure (X, .p¯, r), such that
  1. (X, .p¯) is an algebraic forest,
  2. r is the only fixed point of .p¯.
Pointed forest
By a pointed algebraic forest we mean an algebraic forest equipped with an artificial root. As an algebra, it is a structure (X, .p¯, r), such that
  1. (X, .p¯) is an algebraic forest,
  2. r is an arbitrary element of X.
Component-wise finite linear order
For a partial order (X, ), the following are equivalent: For such a case we introduce bidirectional terminology and notation according to the following table.
First Previous Next Last
.end(−) root in (X, ≥) .nei(−) parent in (X, ≥) .nei(+) parent in (X, ) .end(+) root in (X, )
Ordered tree
By a finite ordered tree we mean a structure which has one of the following equivalent forms:
  1. (1) A finite algebraic tree together with sibling ordering: A structure (X, ≤V, ≤S) such that
    1. (X,≤V) is a finite algebraic tree.
    2. (X,≤S) is a component-wise linear order.
    3. .rS.pV = .pV, i.e. components of S are sibling-sets of V.
  2. (2) A finite algebraic tree together with a complementary partial order: A structure (X, ≤V, ≤H) such that
    1. (X,≤V) is a finite algebraic tree.
    2. (X,≤H) is a partial order.
    3. V and H are complementary, i.e. (<V) (>V) (<H) (>H) = X × X Id.
  3. (3+) A finite algebraic tree together with a linear extension: A structure (X, ≤V, ≤L+) such that
    1. (X,≤V) is a finite algebraic tree.
    2. (X,≤L+) is a linear order.
    3. (≤V) (≤L+), i.e. L+ is an extension of V.
  4. (3−) A finite algebraic tree together with a linear extension of its opposite: A structure (X, ≤V, ≤L) such that
    1. (X,≤V) is a finite algebraic tree.
    2. (X,≤L) is a linear order.
    3. (≥V) (≤L), i.e. L is an extension of the opposite (inverse) of V.
  5. (4) A finite parent-to-children-list map: A structure (X, .qs) such that
    1. X is a finite non-empty set.
    2. .qs is function X X.uniq, i.e. .qs maps X to non-repetitive lists over X.
    3. The set { (x,y) | x y.qs } is the parent partial function of an algebraic tree on X.
The correspondence (1)–(4) is obtained as follows.
Ordered forest
By a finite ordered forest we mean a finite disjoint union of finite ordered trees. In form (3+), it is a structure (X, ≤V, ≤L+) such that
  1. (X,≤V) is a finite algebraic forest.
  2. (X,≤L+) is a component-wise linear order.
  3. (X,≤V) and (X,≤L+) have equal components.
  4. (≤V) (≤L+).
Tree navigation
Operators for ordered tree (or forest) navigation can be sumarized as follows.
Order Operators
First Previous Next / Parent Last / Root
V Vertical order relevant: .qs .pV parent .rV root
S Sibling order .endS(−) first sibling .neiS(−) previous sibling .neiS(+) next sibling .endS(+) last sibling
L+ Linear order +: children-before-parents .endL+(−) leftmost .neiL+(−) nested predecessor .neiL+(+) nested successor .endL+(+) rightmost (equal to .rV)
L Linear order −: parents-before-children .endL(−) leftmost (equal to .rV) .neiL(−) nested predecessor .neiL(+) nested successor .endL(+) rightmost
H Horizontal order
Labelled transition system
A labelled transition system is a structure (S, Λ,
)
where Notation x
α
y
is used for (x, α, y) (
)
. A labelled transition system is called deterministic if
is a partial function on S × Λ.
Multidigraph
A multidigraph is a structure (N, A, .s, .t) where
Naming multidigraph
By a naming multidigraph we mean a structure (N, A, Σ, .σ, .s, .t) where The structure is subject to the following naming consistency condition:
(NC) .s is injective, i.e. different arrows with equal source have different names.

 

Lookup system

By a lookup system we mean a structure (X, Σ, .ℓ()) where

Proposition: The following structures are equivalent:

  1. A lookup system.
  2. A deterministic labelled transition system.
  3. A naming multidigraph, up to identification of arrows.  
Using the multidigraph source-target terminology and assuming finiteness of .ℓ(), a lookup system can be interpreted as a relational database table of the following form:
Domains →X Σ X
source name target
   
   
   
Underline in source and name indicates a primary key.
Pathnames
Elements of Σ (the set of all finite, possibly empty, sequences of names) are called pathnames or paths. Given a lookup system (X, Σ, .ℓ()), the induced Σ-pathname-lookup .ℓ() : X × Σ X is naturally defined by

Proposition: For a partial function .f() : X × Σ X the following are equivalent:

  1. (a) The map p .f(p) is a monoid homomorphism between , +, [ ]) and (X, , Id X), i.e.
  2. (b) .f() is a Σ-pathname lookup, i.e. .f() = .ℓ() for some .ℓ() : X × Σ X.
    (Necessarily, .ℓ() is given by .ℓ(α) = .f([α]).)  
Again, we can interpret the induced pathname lookup as a relational database table, this time better considered as a virtual, possibly infinite, database view.
Domains →X Σ X
source pathname target
   
   
   
Pathname resolution system
By a pathname resolution system we mean a structure (X, Σ, .ϖ()) where

Proposition:

  1. The domain of .ϖ() is pathname-prefix closed, i.e. for every (x,p) X × Σ,
  2. Let .ℓ() : X × Σ X be a lookup defined by x.ℓ(α) = x.ϖ([α]). Then .ϖ() is a restriction of .ℓ().

Corollary:

  1. A pathname resolution system can be viewed as a lookup system (X, Σ, .ℓ()) equipped with a restriction .ϖ() of the induced pathname lookup .ℓ() such that the domain of .ϖ()
  2. A lookup system can be viewed as a pathname resolution system such that .ℓ() = .ϖ().
Reachability
Given a pathname resolution system (a lookup system) we introduce reachability relations as subsets of X × X according to the following table (we let P Σ, Τ Σ):
Notation Definition Terminology
.ℓ(P) { (x,y) | x.ℓ(p) = y for some pathname p P } reachability (by .ℓ()) over pathnames from P
.ℓ .ℓ) reachability (by .ℓ())
.ϖ(P) as above, with replaced by ϖ.
.ϖ
.ℓ(Τ) { (x,y) | x.ℓ(α) = y for some name α Τ } single step reachability (succession) over names from Τ
.ℓ .ℓ(Σ) single step reachability / succession
If x.ℓ y then y is said to be reachable from x.
(Double-)pointed lookup system
By a double-pointed lookup system we mean a structure (X, Σ, .ℓ(), ‹/›, r, d) where The structure is subject to the following condition: By a pointed lookup system we mean a double-pointed lookup system with r = d.

The (global) resolution target partial function .rt : Σ X maps pathnames to X as follows:

Naming Forest

Forest Naming Domain
By a forest naming domain we mean a set Σ of names such that
Σ = Σdown Σspec
ΣΌ = Σdown Σ
where
Naming forest
By a naming forest we mean a structure (A, , Σ, .σ) where For x A, the down-name x.σ, if defined, is said to be the name of x. The structure is subject to the following conditions: The induced down-lookup operator .ℓdown() : A × Σdown A is defined as the inverse to .p , i.e.
x.ℓdown(α) = y   iff   y.p = x and y.σ = α.

Proposition: Given the forest naming domain Σ, the following structures are equivalent:

  1. A naming forest (A, , Σ, .σ).
  2. A lookup system (A, Σdown, .ℓdown()) such that
  By a pointed naming forest we mean a naming forest equipped with an artificial root r A. By a naming tree we mean a pointed naming forest such that {r} = A.r.

The default lookup operator .ℓ() : A × Σ A is defined as an extension of .ℓdown() according to the following table:

Name
α
Semantics
Terminology .ℓ(α)
a down-name α-named child .ℓdown(α)
‹p› parent.p
‹p¯ parent-or-root.p¯
‹r›component root.r
‹·› selfId A
‹··› dotdot-parent
.p¯ {(r,r)} if r is defined,
.p¯otherwise.
‹/› current root
A × {r} if r is defined,
otherwise.
Pathnames
The set Σ of pathnames over a forest naming domain Σ has a (special) nomenclature according to the following table.
Terminology for
a pathname p P
Pathname set P
expression / description
Examples Unix-style correspondents
ordinary
(proper)
absolute [‹/›] + (Σdown {‹··›, ‹·›}) [‹/›, "garden", ‹··›, "beds"] "/garden/../beds"
[‹/›] "/"
  canonic [‹/›] + Σdown [‹/›, "garden", "beds"] "/garden/beds"
[‹/›] "/"
relative [‹/›] +down {‹··›, ‹·›})+ ["garden", ‹·›, "beds"] "garden/./beds"
extraordinary pathnames p that are not ordinary:
  • (a) the empty pathname,
  • (b) p with a non-leading root-name ‹/›,
  • (c) p with a special name other than ‹/›, ‹··› or ‹·›.
 
(a) [ ]
(b) ["garden", ‹/›]
(c) [‹f›, "garden"]
We also introduce notation Ϲ, Ά and Ό for the sets of all canonical, absolute and ordinary pathnames, respectively. This leads to the following inclusion chain:
canonical absolute ordinary all pathnames
Ϲ     Ά     Ό     Σ
Note that the set of all ordinary pathnames can be expressed as Ό = {[ ], [‹/›]} + (Σdown {‹··›, ‹·›}) {[ ]}.
Pathname reduction
We introduce pathname reduction operators Σ Σ as follows:
Notation Terminology Definition (reduction rules)
.reduct(‹··›)
dot-dot
pop-dot
reduction
  • (p + [‹/›, ‹··›] + q).reduct(‹··›) = p + [‹/›] + q,
  • (p + [α, ‹··›] + q).reduct(‹··›) = p + q    if α Σdown,
  • s.reduct(‹··›) = s if the pathname s does not have a sub-pathname of the form [‹/›, ‹··›] or [α, ‹··›], α Σdown.
(I.e. the following rules apply: [‹/›, ‹··›] [‹/›] and [α, ‹··›] [ ], α Σdown.)
.reduct(‹··›) Let α = ‹··›. Then
  • p.reduct(α) = p.reduct(α) + [‹·›]    if p.last = α and either
    • p.reduct(α) = [ ] or
    • p.reduct(α).last Σdown,
  • p.reduct(α) = p.reduct(α)    otherwise.
.reduct(‹·›) dot reduction
  • (p + [‹·›] + q).reduct(‹·›) = p + q,
  • s.reduct(‹·›) = s    if ‹·› s.
(All dot names ‹·› are removed.)
.reduct(‹·›) The same rules apply as for .reduct(‹··›) but with α = ‹·›.
(Any non-final dot name ‹·› is removed. The final ‹·›, if any, is only removed if preceded by ‹/› or ‹··›.)
.reduct(‹/›) root reduction
  • (p + [‹/›] + q).reduct(‹/›) = [‹/›] + q,
  • s.reduct(‹/›) = s     if ‹/› s.shift.
(All names preceding the last ‹/› are removed.)
.reduct reduction
  • .reduct = .reduct(‹/›).reduct(‹··›).reduct(‹·›).
.reduct
  • .reduct = .reduct(‹/›).reduct(‹··›).reduct(‹·›).

Notes:

  1. The symbol means care of trailing dot. The -variants of reduction preserve pathname characteristics of ending with a down-name, see proposition (B). This is important for enforcing directory/non-directory (leaf/non-leaf) pathname resolution logic.
  2. Pathname reduction .reduct(‹·›).reduct(‹··›) is used in URI resolution []. The following example shows the impact of the feature.
    URI URI (after reduction)
    http://tools.ietf.org/pdf/rfc2606.pdf/some/.. http://tools.ietf.org/pdf/rfc2606.pdf/
    http://tools.ietf.org/pdf/some/../rfc2606.pdf http://tools.ietf.org/pdf/rfc2606.pdf

Proposition:

Canonical pathname tree
Given the forest naming domain Σ, the (free) canonical pathname tree is the structure C = (Ϲ, .p, Σ, .σ, [‹/›]) where Obviously, C is a naming tree with root [‹/›]. The corresponding lookup .ℓϹ() : Ϲ × Σ Ϲ is defined as follows:
Name α Value p.ℓϹ(α)
a down-name p + [α]
‹·› p
‹··›
[‹/›]if p = [‹/›],
p.popotherwise
‹/› [‹/›]

Proposition: For every canonical pathname p and every ordinary pathname q,

Induced pathname tree
Given a pointed lookup system (X, Σ, .ℓ(), r) where with the induced pathname resolution map .rt : Σ X, we denote

Proposition:

  1. Ͼ.p Ͼ,
  2. (Ͼ, .p, Σ, .σ, [‹/›]) is a naming tree – we call it the induced pathname tree,
  3. .ℓϾ() is the corresponding lookup.  
Dot-dot handling modes

Proposition: Let (X, Σ, .ℓ(), r) be a pointed lookup system with a forest naming domain Σ such that

Let p be an ordinary pathname and consider the 3 operators .rt.ℓ(p), .ℓϹ(p).rt, .ℓϾ(p).rt : Ϲ X.
  1. If ‹··› p then the operators are equal.
  2. If ‹··› p then the operators are pairwise different in general.
  We might say that the resolution Ϲ × Ό X depends on dot-dot handling mode. This is summarized in the following table.
Operator Ϲ × Ό X Mode name Application example
The cd utility
.rt.ℓ()physical [] or [] with the -P switch
.ℓϹ().rt logicalfree [] with the -L switch URI resolution []
.ℓϾ().rt restricted [] with the -L switch

Note: For differences between [] and [], see step 8b.

Filesystem forest

Filesystem forest
By a filesystem forest we mean a structure (N, A, , .t) where The structure is subject to the following conditions:
  1. (1) (A, ) is an algebraic forest.
  2. (2) .t is injective on A.p, i.e. only leaf arrows a ≠ b can have equal target.
Filesystem forest as a multidigraph
Similarly to algebraic forests, a filesystem forest can be considered of having one of more equivalent forms:
(1a) (N, A, , .t) – the original form,
(1b) (N, A, .p¯, .t) with .p¯ : A A,
(1c) (N, A, .p, .t) with .p : A A,
(2b) (N, A, .s¯, .t) – the multidigraph (quiver) form, with .s¯ : A N,
(2c) (N, A, .s, .t) – the partial multidigraph form, with .s : A N.
The correspondence (1)–(2) is given by
.p¯ = .s¯.t−1     .s¯ = .p¯.t
.p = .s.t−1 .s = .p.t

 

S1: Filesystem forest

By an S1 structure we mean a filesystem forest N = (N, A, , .t) such that the following is satisfied:
(S1~1) Components of (A, ) are finite.
(S1~2) Components of (A, ) have disjoint .t-images (i.e. .t−1.r is a partial function N A.r).
Arrows from A are also called hardlinks. Condition (S1~2) means that cross-tree hardlinks are not allowed.

The following table shows basic correspondence of the S1 structure to its in-memory representation in the Linux kernel.

In-memory representation
DomainFunctionType (structure)Reference
N inode
.t−1.r .i_sb->s_root
A dentry
.t .d_inode
.p¯ .d_parent
.r .d_sb->s_root

S2: Mounted filesystem forest

By an S2 structure we mean an S1 structure equipped with a mount structure M = (M, , ≤mpl, .s¯, .t) where The structure is subject to the following conditions:
(S2~1) (M, ) is an algebraic forest.
(S2~2) (x.p,x.s) = (y.p,y.s) iff x.rmpl = y.rmpl for every mounts x, y (i.e. mounts have equal mountpoint source iff they are comparable in mpl).
(S2~3) M.t A.r, i.e. mount targets are roots in (A, ).
(S2~4) m.s¯.r = m.p¯.t for every mount m  (i.e. m's source belongs to m parent's target tree).
(S2~5) m.s¯.t = a.t = b.t implies a = b for every mount m and arrows a, b (i.e. (targets of) mount sources cannot be multiply hardlinked).
By .s : M A we mean the restriction of .s¯ to non-root mounts.

The following table shows basic correspondence of the mount structure to the Linux kernel source code.

In-memory representation
DomainFunctionType (structure)Reference
M vfsmount
.s¯ .mnt_mountpoint
.t .mnt_root
.p¯ .mnt_parent
.r .mnt_ns->root
.neimpl(−) .mnt_hash.prev (*)
.neimpl(+) .mnt_hash.next (*)

(*) Notes:

  1. Application of the list_entry macro is required: x.neimpl(−) corresponds to list_entry(y, struct vfsmount, mnt_hash) iff y equals x.mnt_hash.prev and is not the head of x.mnt_hash.
  2. Mountpoint lists (i.e. components of (M, ≤mpl) do not have direct one-to-one correspondence to .mnt_hash lists in general – a hash list can be merged from several mountpoint lists. The direct corresponence is established by a mountpoint equality check in __lookup_mnt.
Derived arrows
We denote the set of derived arrows (or namespace points). Informally, a derived arrow is a hardlink in the context of one of the (possibly many) mounts that bind the filesystem tree which the hardlink belongs to.

Proposition: Ѧ is finite.  

We use the (m,a) signature for derived arrows, so that .m and .a are the natural projections, e.g. (m,a).a = a for (m,a) Ѧ.

Derived arrows Ѧ have the following form in the Linux source:

struct path {
        struct vfsmount *mnt;
        struct dentry *dentry;
};
Derived nodes
For the sake of completeness we also define the set of derived nodes. Derived arrows are mapped to derived nodes by the target function .t : Ѧ N defined by (m,a).t = (m,a.t).
Mountpoints
The mountpoint list function .mpl : Ѧ Ѧ is defined by Derived arrows with non-empty mountpoint list are called mountpoints.
Mount-front and mount-back
The mount-front-predecessor partial function .mf : Ѧ Ѧ is defined by

Note: The mpl order just provides deterministic selection from the set X.

The mount-front function .f : Ѧ Ѧ is defined recursively by

The mount-back-parent partial function .mb : Ѧ Ѧ is defined by The mount-back function .b : Ѧ Ѧ is defined recursively by

Proposition:

  1. (i) .f and .b are root maps of forests (Ѧ, .mf) and (Ѧ, .mb), respectively. In addition, .mf is injective.
  2. (ii) .f.b = .b.
  3. (iii) .b.p.a = .b.a.p.
  4. (iv) A derived arrow x is a mountpoint iff x Ѧ.f.
 
The fine forest
The parent function .p¯ : Ѧ Ѧ is defined by We denote (Ѧ, ) the reflexive transitive closure of (Ѧ, p¯) and .p the restriction of .p¯ to non-fixed points.

Proposition:

  1. (i) (Ѧ, ) is an algebraic forest.
  2. (ii) .f is idempotent and decreasing, .b is idempotent, increasing and monotone (a closure operator) in (Ѧ, ). For every x,y Ѧ,
  3. (iii) (N, Ѧ, , .t) is a filesystem forest.
  4. (iv) (Ѧ, .mf−1) (Ѧ, .mb) (Ѧ, .p)   (forest inclusion).

Proof: (i) Follows from the fact that (Ѧ, .p) is a weak substructure of the lexicographic product of (M, .p) and (A, .p), respectively.  

The up-front forest
The up-front parent partial function .pf : Ѧ Ѧ is defined by
.pf = .b.p.f.
We denote (Ѧ, ≤f) the reflexive transitive closure of (Ѧ, pf).

Proposition:

  1. (i) x.b < x.pf.b whenever x.pf is defined.
  2. (ii) (Ѧ, ≤f) is an algebraic forest.
  3. (iii) .b is strictly monotone w.r.t. (Ѧ, ≤f) and (Ѧ, ), i.e. x <f y implies x.b < y.b.
  4. (iv) (N, Ѧ, ≤f, .t) is a filesystem forest.

Proof: (i) (x.pf).b = (x.b.p.f).b = x.b.p.(f.b) = x.b.p.b > x.b.p > x.b.  

The down-front forest
The down-front parent partial function .pt : Ѧ Ѧ is defined by
.pt   =   .b.p Ѧ.b.f,
i.e. x.pt = y  iff  x = x.b.f and y = x.b.p. We denote (Ѧ, ≤t) the reflexive transitive closure of (Ѧ, pt).

Proposition: Same statements apply as for the up-front forest, with f replaced by t.  

S3: Naming

By an S3 structure we mean an S2 structure equipped with a naming structure (Σ, .σ) assigning each non-root hardlink a name that is distinct within sibling hardlinks, i.e.
(S3~1) (A, , Σ, .σ) is a naming forest.
This in particular defines the lookup function .ℓ() : A × Σ A for the hardlink forest (A, ).
The fine lookup
The fine lookup operator .ℓ() : Ѧ × Σ Ѧ defines navigation in the derived forest (Ѧ, ) according to the following table.
α x.ℓ(α)
where x = (m,a)
Correspondent function
from the Linux kernel
a down-name (m, a.ℓ(α)) __d_lookup
‹r›x.r
‹p›x.p
‹p¯x.p¯ follow_dotdot without
  • equality check of nd->path and nd->root,
  • follow_mount
‹mfx.mf lookup_mnt / follow_down_one
‹f›x.f follow_mount / follow_down
‹mbx.mb follow_up
‹b›x.b iteration of follow_up in follow_dotdot

Note: There is a shift in function naming between Linux versions 2.6.37 and 2.6.38.

Down-front naming
We define partial functions t : Ѧ Σdown and .ℓt() : Ѧ × Σdown Ѧ by (Recall that .a is the projection (m,a) a).

Proposition: (Ѧ, ≤t, Σ, .σt) is a naming forest with .ℓt() the correspondent down-lookup.  

The .ℓ3 lookup
The partial function .ℓ3() : Ѧ × Σ Ѧ is defined as follows
α .ℓ3(α)
a down-name .ℓ(α).f (= .ℓt(α))
‹r› .rf
‹p¯ .p¯f
Informally, up-steps are performed in the up-front forest, down-steps are performed in the down-front forest.

S4: Current root

By an S4 structure we mean an S3 structure equipped with (r) where We have the following correspondence:
Our notation Linux kernel
r current->fs->root
The .ℓ4 lookup
The lookup operator .ℓ4() : Ѧ × Σ Ѧ is defined according to the following table:
α x.ℓ4(α) Linux kernel function
a down-name x.ℓ(α).f do_lookup
‹r›x.r.f
‹/›r
‹··›
r.fif x r x.b,
x.b.p¯.fotherwise
follow_dotdot

Proposition:

Ancestor pathname
The function .apn : Ѧ Σ+ is defined according to the (first two columns of the) following table.
x.apn Condition String representation of
x.apn[0] in Linux VFS
[‹/›] x.f = x.r4 = r.f x r "/"
[‹/›] + x.ps4.pop.reverse.map{ |x| x.b.a.σ } x.f ≠ x.r4 = r.f
[‹r›] x.f = x.r4r.f x r "(unreachable)/"
[‹r›] + x.ps4.pop.reverse.map{ |x| x.b.a.σ } x.f ≠ x.r4r.f
The pathname x.apn is said to be the ancestor pathname of x.

Proposition: In general, .apn is NOT injective.  

The canonical tree Ѧ
Going from the pointed lookup system (Ѧ, Σ, .ℓ4(), r), we denote .rt4 : Σ Ѧ the induced pathname resolution map and Ͼ4 the set of all resolvable canonical paths.

We denote Ѧ = r.ℓ4down), the canonical tree.

Proposition: The maps Ͼ4

.rt4
Ѧ and Ѧ
.apn
Ͼ4
are inverse to each other.  
The reachable (double-)tree Ѧ
We denote Ѧ = Ѧ r.ℓ4([‹··›] + Σdown), the (root-)reachable (double)-tree.

Proposition:

  1. Ѧ = r.ℓ4(Ό) i.e. Ѧ is the set of all derived arrows that are reachable from the current root r by an .ℓ4()-lookup over ordinary pathnames.
  2. , ≤f) = (Ѧ, ≤t), i.e. restrictions of the up-front forest and the down-front forest to Ѧ are equal.
  3. The set of roots of the forest , ≤f) equals { r, r.f }. The forest , ≤f) is a tree   if r is not a mountpoint,   otherwise it has two components:
  4. The restriction of .ℓ4() to Ѧ × Σdown is a down-lookup of , ≤f).
  5. The structure , .p¯4, r.f) is an algebraic tree.
  6. Ѧ {r} Ѧ.f, i.e. except possibly for the current root, an element of Ѧ is never a mountpoint.

    Note: This is in contrast to the common terminology []: If an absolute path p is said to refer to a mountpoint via pathname resolution .ϖ() then it is meant that the mount-back r.ϖ(p).b is a mountpoint, not just r.ϖ(p).

 

Example: The following diagram shows a forest , ≤f) having both the canonical / standard tree (left, blue) and the front tree (right, green).

  / (r)
bin boot usr var
  /.. (r.f )
hello   uproot
The front tree can be created by the following shell commands (root privileges are assumed).
root@paon:~# dir / bin boot usr var The standard root directory content (abbreviated)
root@paon:~# dir /.. bin boot usr var Initially, / and /.. point to the same directory.
root@paon:~# mount -t ramfs none / Mount a ramfs to the root /.
root@paon:~# dir / bin boot usr var The pathname / still points to the same directory as before.
root@paon:~# dir /.. The pathname /.. now points to a different directory – an empty directory.
root@paon:~# mkdir /../hello root@paon:~# mkdir /../uproot root@paon:~# dir /.. hello uproot The front-root directory pointed to by /.. now contains just 2 newly created directories.
cd-navigation between Ѧ and Ѧ Ѧ
As of GNU bash version 4.2.8, the switches -P and -L play role for the cd(1)-navigation to/from the front tree. This is shown in the following example (note that symbolic links are NOT involved):
root@paon:/# mount -t ramfs none / Mount a ramfs to the root /.
root@paon:/# mkdirhier /tmp/test/standard-tree root@paon:/# mkdirhier /../tmp/test/front-tree root@paon:/# mkdirhier /../hello/front-tree Create a simple testing directory hierarchy.
root@paon:/# dir /../tmp/test front-tree /../tmp/test points inside the front tree
root@paon:/# dir /tmp/test standard-tree /tmp/test points inside the standard tree
root@paon:/# cd /../tmp/test root@paon:/tmp/test# dir standard-tree cd /../tmp/test without the -P switch navigates to the standard tree.
root@paon:/tmp/test# cd -P /../tmp/test root@paon:/tmp/test# dir front-tree Using the -P switch navigates to the front tree.
root@paon:/tmp/test# cd -P . root@paon:/tmp/test# dir front-tree We stay in place but only due to the -P switch.
root@paon:/tmp/test# cd . root@paon:/tmp/test# dir standard-tree (!!!) cd . actually changes the current directory.
root@paon:/tmp/test# cd /../hello/../tmp/test root@paon:/tmp/test# dir front-tree The -P switch can be omitted when navigating to the front tree via a path that is unresolvable in the standard tree.
Root reachability stratification
We denote Ѧ = r.p−1 = { x Ѧ | x r }. We can now summarize the root-reachability stratification
Ѧ Ѧ Ѧ Ѧ = Ѧ
by .ℓ4-lookup over ordinary pathnames as follows.
Stratum Description Ancestor pathname characteristics
      Ѧ = r.ℓ4down) The canonical tree: derived arrows reachable via a canonical path. r.ℓ4(x.apn) = x
    Ѧ = Ѧ r.ℓ4([‹··›] + Σdown) The reachable (double-)tree: derived arrows reachable via an absolute path. r.ℓ4(x.apn) = x or
r.ℓ4([‹··›] + x.apn.shift) = x
  Ѧ = r.p−1 The inverse reachable tree: derived arrows under the current root. x.apn[0] = ‹/›
Ѧ = Ѧ

Notes:

  1. It is possible to have x Ѧ, y Ѧ Ѧ, and z Ѧ Ѧ such that x.apn = y.apn = z.apn.
  2. The following Ruby code shows a derived arrow (originally pointed to by /home) outside Ѧ.
    Dir.chdir "/home"
    Dir.chroot "/usr"
    puts Dir.pwd  #-> (unreachable)/home
    

S5: Basic file types

By an S5 structure we mean an S4 structure equipped with (.btype) where We introduce the following terminology / notation based on .btype:
Value y of x.btype Terminology for x Notation for the set
{ x X | x.btype = y }
Linux kernel code:
x.i_mode & S_IFMT
'-' regular file S_IFREG
'd' directory X.dirs S_IFDIR
'l' symbolic link
or symlink
X.links S_IFLNK
'b', 'c', 'p', 's', … not introduced
In particular, N.dirs denotes the set of all directories. The structure is subject to the following conditions:
(S5~1) A.s¯ N.dirs,   i.e. hardlink sources are directories.
(S5~2) M.s¯.t N.dirs,   i.e. mount sources point to directories.
(S5~3) r.a.t N.dirs,   i.e. the current root points to a directory.
The .ℓ5 lookup
The .ℓ5() lookup operator is defined as the restriction of .ℓ4() to directories. More precisely,
x.ℓ5(α) = y   iff   y = x.ℓ4(α) and x.a.t is a directory.

S6: Symbolic links

By an S6 structure we mean an S5 structure equipped with (.lpath) where If x is a symbolic link, then x.lpath is called the target pathname of x.
Path resolution stack
By a path resolution stack we mean an element of Ѧ × Σ+, i.e. it is a pair (a, paths) where We denote PRS the set of all path resolution stacks. A path resolution stack corresponds to the nameidata structure from the Linux kernel source code according to the following table.
The PRS structure (a,paths) The nameidata structure
a path
paths saved_names
paths.length depth
We define the partial operator .pp : Σ+ Σ+ by This means that .pp removes the first name from the last pathname in paths and if this results in an empty last list then this list is removed too.

The partial operation .pF : PRS PRS is then defined by

  1. (A) (a,paths).pF is undefined if paths is empty and a.a.t is not a symbolic link.
  2. (B) Else if a.a.t is a symbolic link
  3. (C) Else if a.a.t is a directory, denote α = paths.last[0].
  4. (D) Otherwise (a.a.t is neither a directory nor a symbolic link)

Note: Using 5 instead of 4 would result in a shorter prescription, with (C) and (D) merged.

For x PRS, the value x.pF, if defined, is called the full resolution successor of x. The partial operation of a normal resolution successor .pN : PRS PRS is defined as a restriction of .pF to Ѧ × Σ++.

Definition domains of .pF and .pN can be summarized as follows.

Condition for x = (a, paths) PRS,
α = paths.last[0]
POSIX
error code
Full resolution
successor x.pF
Normal resolution
successor x.pN
paths = [ ] and a.a.t N.links defined undefined
a.a.t N.links undefined
paths ≠ [ ] and a.a.t N.dirs N.links ENOTDIR
a.a.t N.dirs and a.ℓ4(α) is undefined ENOENT
otherwise defined and x.pF = x.pN

Proposition:

  1. .pF (resp. .pN) has no fixed points.
  2. The monounary algebra (PRS, p¯F) (resp. (PRS, p¯N)) is NOT an algebraic forest in general.
  In the following subsections, we will use the endpoint (alias root) partial functions .rF, .rN : PRS PRS induced by monounary algebras (PRS, p¯F) and (PRS, p¯N), respectively.
Link follower
The (strict direct) link follower partial function .plnk : Ѧ Ѧ is defined by
a.plnk = b iff
  • (a, [ ]).pF.rN = (b, [ ]), and
  • a ≠ b.
Note that only symbolic links can have link followers.
Link target
The link target partial function .lt : Ѧ Ѧ is defined by
a.lt = b iff (a, [ ]).rF = (b, [ ]).

Proposition: a.lt = b   iff   a.rlnk = b and b.a.t N.links (i.e. b is the link target of a   iff   b is the last link follower of a and does not point to a symlink).  

Linked lookup
The linked lookup operators .ℓF(), .ℓN() : Ѧ × Σ Ѧ are defined by
a.ℓF(α) = b iff (a, [[α]]).rF = (b, [ ])
(.ℓN() is defined the same way, with F replaced by N).

Proposition: Let α Σ, d Ѧ such that d.a.t is directory. Then

.ℓN(α) = .lt.ℓ5(α)   d.ℓN(α) = d.ℓ5(α)
.ℓF(α) = .lt.ℓ5(α).lt d.ℓF(α) = d.ℓ5(α).lt
 
Pathname resolution
The pathname resolution operators .ϖF(), .ϖN() : Ѧ × Σ Ѧ are defined by
a.ϖF(p) = b iff
  • p = [ ] and a = b, or
  • p ≠ [ ] and (a, [p]).rF = (b, [ ]),
and a.ϖN(p) by the same way, with F replaced by N. This means that a nonempty pathname p is resolved to the derived arrow b relative to a derived arrow a if the path resolution stack (a, [p]) resolves to the endpoint (b, [ ]).

Proposition:

  1. .ϖF() = .ϖN().lt.
  2. .ϖF() = .ℓF()   (the pathname lookup induced by .ℓF()).
  3. The previous statement with F replaced by N.
  Considering the pair of partial maps .ϖF() and .ϖN() as a single partial map ϖ : Ѧ × Σ × {F, N} Ѧ, we obtain the following correspondence with the Linux kernel source code:
Linux kernel source code
FunctionArguments Return
value
FunctionArguments
InputOutput
ϖ vfs_path_lookup
a.m mnt
a.a dentry
p name
rtype flags & LOOKUP_FOLLOW
b nd.path
F / N modes in use
The following table shows examples of Linux system calls and shell commands divided according to which pathname resolution mode they use.
F mode
(full pathname resolution .ϖF())
N mode
(normal pathname resolution .ϖN())
stat(2) lstat(2)
stat(1) -L stat(1)
file(1) -L file(1) -h
realpath(3)
readlink(3)
Induced pathname trees
Going from pointed lookup systems (Ѧ, Σ, .ℓF(), r) and (Ѧ, Σ, .ℓN(), r), we denote .rtF, .rtN : Σ Ѧ the induced pathname resolution maps and ϾF, ϾN the induced pathname trees, i.e. ϾF and ϾN are the sets all canonical pathnames resolvable via .ϖF() and .ϖN(), respectively.

Proposition:

  1. ϾN.p¯ ϾF ϾN.
  2. Ͼ4 ϾN.
  3. .rtN Ͼ4   =   .rt4 Ͼ4.
 
Virtual identity layers
There are four basic layers of identity of filesystem objects, corresponding to the function chain
ϾF
.rtF
Ѧ
.a
A
.t
N
    (or ϾN
.rtN
Ѧ
.a
A
.t
N
).
Identity layers are summarized in the following table.
Notation Terminology
N Filesystem nodes
A Hardlinks Low-level filesystem identity
Ѧ Derived arrows (namespace points) Operating system identity
ϾF / ϾN Resolvable canonical pathnames Browsing identity
Hierarchical structures of particular layers do NOT correspond in general. More precisely, the following diagram does not necessarily commute.
ϾF
.rtF
Ѧ
.a
A
.t
N
.p¯ .p¯4 .p¯
ϾF
.rtF
Ѧ
.a
A
.t
N
The non-commutativity of the left square relates to the difference between logical (.p¯.rtF) and physical (.rtF.p¯4) dot-dot handling [], [], [].
Further resolvability restrictions
We define pathname resolution operators .ϖ6F() and .ϖ6N() as restrictions of .ϖF() and .ϖN(), respectively, by imposing symbolic link limits (group B) according to the following table (only the F-variant is shown):
Condition for x = (a, [p]),   a Ѧ, p Σ POSIX
error code
Group Terminology Description
A Endpoint existence x.rF is defined ELOOP
Intermediate resolvability
/ pathname vanishment
x.rF.paths = [ ] ENOTDIR
ENOENT
B Symlink nesting limit y.paths.length 8 for every y x.psF ELOOP
Symlink count limit x.psF.select{ |y| y.a.a.t N.links }.length 40 ELOOP
C Pathname size limit Not expressible in the S6 structure ENAMETOOLONG
Access restriction Restriction of .ℓ4(), not expressible in the S6 structure EACCES
D Pathname properness (a) p is non-empty ENOENT
(b) ‹/› p.shift
(c) p has no extraordinary name
For illustration, other restrictive conditions are listed:

Proposition:

  1. .ϖ6F(p+q) .ϖ6F(p).ϖ6F(q) for every pathnames p, q. Due to the symlink count limit, equality does not hold in general. (I.e. .ϖ6F() is a pathname resolution operator, but – in contrast to .ϖF() – not a pathname lookup operator.)
  2. The previous statement with F replaced by N.

 

Bibliographic references

Tim Berners-Lee, Roy Fielding, Larry Masinter, Uniform Resource Identifier (URI): Generic Syntax, STD 66, RFC 3986, 2005, http://labs.apache.org/webarch/uri/rfc/rfc3986.html
Daniel P. Bovet, Marco Cesati, Understanding the Linux Kernel, 3rd Edition, O'Reilly, 2005
Andries Brower, The Linux kernel: Some remarks on the Linux kernel, 2003, http://www.win.tue.nl/~aeb/linux/lk/lk.html
R.T.E.H Dresens, Testing the Linux VFS with two mCRL2 models, Master thesis, 2006,
Miguel A. Ferreira, José N. Oliveira, An Integrated Formal Methods Tool-Chain and its Application to Verifying a File System Model, Formal Methods: Foundations and Applications, LNCS 5902, 2009,
Bruce Fields, Notes on the Linux kernel, http://www.fieldses.org/~bfields/kernel/
Glenn S. Fowler, David G. Korn, K.-Phong Vo, An Efficient File Hierarchy Walker, In Proc. of the Summer '89 Usenix Conference, 1989,
Andy Galloway, Gerald Lüttgen, Jan Tobias Mühlberg, Radu I. Siminiceanu, Model-Checking the Linux Virtual File System, 10th International Conference, VMCAI, 2009,
IEEE and The Open Group, IEEE Std. 1003.1-2004: Portable Operating System Interface (POSIX) Base Specifications, Issue 6, 2004, http://pubs.opengroup.org/onlinepubs/009695399/
IEEE and The Open Group, IEEE Std. 1003.1-2008: Portable Operating System Interface (POSIX) Base Specifications, Issue 7, 2008, http://pubs.opengroup.org/onlinepubs/9699919799/
Java Community Process, Day Software AG, JCR v2.0 Specification, 2009, http://www.day.com/specs/jcr/2.0/
Michael Kerrisk, The Linux Programming Interface: A Linux and UNIX System Programming Handbook, No Starch Press, 2010,
(The GNU/Linux community) Linux man pages, 2011, http://linux.die.net/man/
Robert Love, Linux Kernel Development, 2nd Edition, Sams Publishing, 2005
Wolfgang Mauerer, Professional Linux Kernel Architecture, Wiley Publishing, Inc., 2008
David Flanagan, Yukihiro Matsumoto, The Ruby Programming Language, O'Reilly 2008
(nLab authors), The nLab, 2011, http://ncatlab.org/nlab/
Ondřej Pavlata, The Ruby Object Model: Data Structure in Detail, 2011, http://www.atalon.cz/rb-om/ruby-object-model
Oracle Corporation, MySQL 5.0 Reference Manual, 2011, http://dev.mysql.com/doc/refman/5.0/en/
Chet Ramey, Bash - the GNU shell, ;login: the USENIX Association newsletter, December 1994, http://tiswww.case.edu/php/chet/bash/article.pdf
Claudia Salzberg Rodriguez, Gordon Fischer, Steven Smolski, The Linux Kernel Primer: A Top-Down Approach for x86 and PowerPC Architectures, Prentice Hall PTR, 2005
Gayathri Tk, Jayush Luniya, RTFS : Real Time File System, 2005, http://moss.csc.ncsu.edu/~mueller/rt/rt05/readings/g7/finalReport.pdf
Al Viro, Re: silent semantic changes with reiser4, LWN.net, 2004, http://lwn.net/Articles/100183/
Browser compatibility
To be viewed correctly, this document requires advanced browser features to be supported, including
Document history
June22011 The initial release.
License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License.