The Linux VFS Model
Naming structure
Abstract structure of the Linux Virtual Filesystem (VFS) is presented.
Only the top-level part – the naming structure – is described,
providing detailed pathname resolution semantics.
We formalize the hardlink structure, the mount structure and
describe derived structures used for pathname resolution.
We proceed by stepwise refinement –
we establish a system of (classes of) structures.
This system contains a
main development line
in which each structure is an extension of the previous one.
This way we partially provide an abstract specification of the
Linux Virtual Filesystem.
In fact, this document can be considered to be an abstract specification
of the Linux VFS in itself,
with only few specification steps (the most abstract ones) performed.
Author
| Ondřej Pavlata |
| Jablonec nad Nisou |
| Czech Republic |
|
|
|
Document date
Initial release | June 2, 2011 |
Last major release |
June 2, 2011 |
Last update | September 2, 2018 |
(see Document history)
|
Warning
-
This document has been created without any prepublication review
except those made by the author himself.
-
The author has no experience with the Linux operating system
except for simple tests.
Linux version
This document refers to Linux 2.6.38.
The following configuration has been used for testing:
Distribution | Ubuntu 11.04 natty |
Desktop evironment | GNOME 2.32.1 (Ubuntu 2011-04-14) |
Kernel | Linux 2.6.38-8-generic |
Platform | i686 |
|
GNU bash | version 4.2.8(1)-release (i686-pc-linux-gnu) |
Table of contents
Introduction
VFS versus RTFS
As of time of writing this document,
the acronym RTFS has two meanings (known to the author)
in the context of file systems:
- (A)
The Real Time File System ([]).
- (B)
As used in [],
within the term RTFS country.
Gray color indicates that we are concerned with the second meaning here.
We start with the following premise:
User-observable semantics of the Virtual Filesystem should be described
outside the RTFS country.
The main objective of this document is to contribute to the following shift in
proportion relating the VFS semantics
(the non-RTFS country is referred to as Abstract specification):
Abstract specification |
RTFS country
|
|
→
|
Abstract specification
|
RTFS country |
|
Preliminaries
Some familiarity with elementary algebra and order theory is assumed.
This involves such notions as
structure,
substructure,
sort,
generating set/structure,
(partial) function,
relation,
domain,
range,
restriction,
extension,
injectivity,
reflexivity,
transitivity,
monotonicity,
isomorphism,
(order) chain,
closure operator.
General conventions
Our notation mostly follows set-theoretic conventions as shown in the following table.
Notation |
Description |
X × Y |
The Cartesian product of X and Y. |
UU |
The universal class – the set-theoretic class of all sets. |
Id |
The identity class –
the set-theoretic class of all pairs (x,x). |
Dom(R) |
The (definition) domain of a relation R
– the least X, w.r.t. ⊆, such that
R ⊆ X × UU. |
Rng(R) |
The range of a relation R
– the least Y such that
R ⊆ UU × Y. |
R ↾ X |
The (domain) restriction of the relation R to
X,
R ↾ X = R ∩ (X × UU).
|
R ○‣ S |
The composition of relations.
Inner triangle indicates the left-to-right direction
(first R, then S).
For functions with left-dot notation,
we write .r.s instead of .r ○‣ .s.
|
R−1 |
The inverse (opposite) of the relation R. |
X ↷ Y |
The set (class) of partial functions from X to Y.
We write .f : X ↷ Y instead of .f ∈ X ↷ Y.
|
X↷ |
The set (class) of all partial functions on X, shorthand for X ↷ X. |
X.r |
Usually, the image of X under .r,
X.r = { x.r | x ∈ X }. |
R∗ |
The reflexive transitive closure of the relation R
(with the reflexive closure depending on the context). |
R ⋈ S |
The binary equi-join of relations R and S.
R ⋈ S = { (x,y,z) | (x,y) ∈ R, (x,z) ∈ S }.
For functions .r, .s,
.r ⋈ .s is the map x ↦ (x.r, x.s).
|
X ⊎ Y |
X ⊎ Y = Z
iff X ∪ Y = Z and X ∩ Y = ∅. |
Mixture of partial functions
We introduce mixture
operators as follows
(f, g are relations).
Notation |
Definition |
Terminology |
Correspondents |
MySQL |
Ruby |
f ◑ g |
(f ↾ (Dom(f) ∖ Dom(g))) ∪ g |
patch-mixture of g into f |
REPLACE |
Hash#update |
f ◐ g |
g ◑ f |
|
INSERT IGNORE |
|
f ◖◗ g |
f ◖◗ g = h iff
f ∪ g = h and Dom(f) ∩ Dom(g) = ∅ |
def-mixture of f and g |
INSERT |
|
Obviously, if f and g are partial functions,
then so is f ◑ g
(resp. f ◖◗ g if defined).
Ruby conventions
We sometimes use conventions from the Ruby programming language
[],
in particular, in the following cases:
-
List handling functions (see Lists).
-
Ruby blocks are used for function definition:
{ |x| expression(x) }
means the map x ↦ expression(x).
Lists
By a list we mean a function whose domain, the index set, is
a down-set of (ℕ, ≤), i.e.
either the set ℕ of all natural numbers, or
a set {0, …, n-1} for some natural n.
A list I → X is said to be a list over X.
The following table shows notation used for lists over a set X.
Notation |
Description |
X∗ |
finite lists over X |
X+ |
non-empty finite lists over X |
X∞ |
infinite lists over X |
|
[a,b,c] |
the list {(0,a),(1,b),(2,c)} |
let x, y ∈ X∗, A, B ⊆ X∗
|
x[i] |
the function application x(i) |
x.length |
number of members of x |
x.last |
x.last[x.length-1] |
x + y |
concatenation of lists x and y |
A + B |
{ a + b | a ∈ A, b ∈ B } |
x + B |
[x] + B |
x.shift |
x without the first member |
x.pop |
x without the last member |
x.uniq |
x without duplicate values |
x.select{block} |
the selection from x according to block |
Monounary algebra
A monounary algebra is a structure
(X, .p¯) such that
-
X is a set,
-
.p¯ is a function X → X.
We introduce the following notation and terminology:
Function |
Terminology |
Definition / description |
.p¯ : X → X |
x.p¯ is the
direct successor |
parent-or-root |
of x |
x.p¯ = x.p if x.p is defined,
x.p¯ = x otherwise.
|
.p : X ↷ X |
x.p is the
strict direct successor |
parent |
of x |
.p is the restriction of .p¯ to non-fixed points:
x.p = x.p¯ if x.p¯ ≠ x,
x.p is undefined, otherwise.
|
.p¯() : X × ℕ → X |
x.p¯(i) is the i-th successor of x |
x.p¯(i) is the i-th application of .p¯ to x:
x.p¯(0) = x,
x.p¯(i) = x.p¯(i-1).p¯ for i > 0.
|
.p() : X × ℕ ↷ X |
x.p(i) is the
strict i-th successor |
i-th ancestor |
of x
|
.p() is a restriction of .p¯():
x.p(i) = x.p¯(i) if i = 0 or
x.p¯(i-1) ≠ x.p¯(i),
x.p(i) is undefined, otherwise.
|
.p¯s¯ : X → X∞ |
x.p¯s¯ is the successor list of x |
x.p¯s¯[i] = x.p¯(i)
|
.ps : X → X∗ ∪ X∞ |
x.ps is the
strict successor list |
ancestor list |
of x
|
x.ps[i] = x.p(i)
|
.r : X ↷ X |
x.r is the
of x |
x.r = x.ps.last if x.ps is finite,
x.r is undefined, otherwise.
|
Algebraic forest
By an algebraic forest (or just forest)
we mean a structure which has one of the following equivalent forms:
- (1)
A partial order whose every principal up-set is a finite chain:
A partial order (X, ≤) such that
for every x ∈ X,
the set x.ps = { y ∈ X | x ≤ y }
is finite and totally ordered by ≤.
- (2)
A monounary algebra whose every non-empty subalgebra has a fixed point:
An algebra (X, .p¯)
with just one unary operator, .p¯ : X → X,
such that for every x ∈ X,
x.p¯(i) = x.p¯(i+1) for some natural i.
- (3)
A partial monounary algebra without total non-empty subalgebras:
A partial algebra (X, .p)
with just one partial unary operator,
the parent partial function .p: X ↷ X,
such that for every x ∈ X,
x.p(i) is defined for only finitely many i.
Because the set x.ps from (1) is a finite chain in ≤,
we can regard it as a finite list.
Then the correspondence (1)↔(2)↔(3) is established as follows:
(2)←(1)
|
-
x.p¯(i) = x.ps[i] if i ≤ x.ps.length,
-
x.p¯(i) = x.ps.last otherwise,
|
|
|
(1)←(2)
|
-
x ≤ y iff x.p¯(i) = y for some i,
i.e.
|
i.e. .p¯ is the smallest function on X
(w.r.t. ⊆)
satisfying (∗),
|
(X,≤) is the reflexive transitive closure of
(X, .p¯), (∗)
|
|
|
(3)←(1)
|
-
x.p(i) = x.ps[i] if i ≤ x.ps.length - 1,
-
x.p(i) is undefined otherwise.
|
(1)←(3)
|
-
x ≤ y iff x.p(i) = y for some i,
i.e.
|
i.e. .p is the smallest relation on X
satisfying (∗), the
reflexive transitive reduction of (X,≤)
(a.k.a Hasse diagram),
|
(X,≤) is the reflexive transitive closure of
(X, .p) (∗).
|
The root map .r: X → X is defined by
x.r = x.ps.last.
Obviously,
.r is a closure operator w.r.t. (X, ≤).
An element x is a root if it satisfies any of the following
equivalent conditions:
-
x is .r-closed,
-
x is a fixed point w.r.t. .p¯,
-
x has undefined parent x.p.
-
x.p¯(i) (resp. x.p(i)) means the
i-th application of .p¯
(resp. .p) to x.
-
The form (3) shows that an algebraic forest
can be viewed as a special case of a directed acyclic graph
(DAG) which in turn is a special case of a
digraph (a directed graph).
In this context the term
algebraic forest
is equivalent to
that of a rooted forest
[].
Algebraic tree
By an algebraic tree we mean an algebraic forest with exactly one root.
As an algebra, it is a structure (X, .p¯, r), such that
-
(X, .p¯) is an algebraic forest,
-
r is the only fixed point of .p¯.
An algebraic tree is said to be locally finite
iff
for each node x, the set {x} has finite preimage under .p¯
(i.e. each node has only finite number of children).
Pointed forest
By a pointed algebraic forest we mean an algebraic forest
equipped with an artificial
root.
As an algebra, it is a structure (X, .p¯, r), such that
-
(X, .p¯) is an algebraic forest,
-
r is an arbitrary element of X.
Component-wise finite linear order
For a partial order (X, ≤), the following are equivalent:
- (a)
Each component of (X, ≤) is a finite chain.
- (b)
Both (X, ≤) and its opposite (X, ≥) are algebraic forests.
For such a case we introduce bidirectional
terminology and notation according to the following table.
First |
Previous |
Next |
Last |
.end(−) |
root in (X, ≥) |
.nei(−) |
parent in (X, ≥) |
.nei(+) |
parent in (X, ≤) |
.end(+) |
root in (X, ≤) |
Ordered tree
By a locally finite ordered tree
we mean a structure which has one of the following equivalent forms:
- (1)
A locally finite algebraic tree together with sibling ordering:
A structure (X, ≤V, ≤S) such that
-
(X,≤V) is a locally finite algebraic tree.
-
(X,≤S) is a component-wise linear order.
-
.rS.pV = .pV,
i.e. components of ≤S are sibling-sets of ≤V.
- (2)
A locally finite algebraic tree together with a complementary
horizontal
ordering:
A structure (X, ≤V, ≤H) such that
-
(X,≤V) is a locally finite algebraic tree.
-
(X,≤H) is a partial order.
-
≤V and ≤H are complementary, i.e.
(<V) ⊎ (>V) ⊎ (<H) ⊎ (>H)
= X × X ∖ Id.
-
≤V and ≤H are consistent:
(<H) = (≤V) ○‣ (<H) ○‣ (≥V).
(That is, for every nodes x, y such that x <H y,
all descendants of x precede all the descendants of y.)
- (3)
A locally finite algebraic tree together with a consistent linear extension:
A structure (X, ≤V, ≤L) such that
-
(X,≤V) is a locally finite algebraic tree.
-
(X,≤L) is a linear order.
-
(≤V) ⊆ (≤L),
i.e. ≤L is an extension of ≤V.
-
The same consistency condition as in (2) holds,
with (≤H) defined as (≤L) ∖ (<V).
- (4)
A parent-to-children-list map:
A structure (X, .qs) such that
-
X is a non-empty set.
-
.qs is function X → X∗.uniq,
i.e. .qs maps X to non-repetitive lists over X.
-
The set { (x,y) | x ∈ y.qs } is the parent partial function
of a locally finite algebraic tree on X.
The correspondence (1)–(4) is obtained as follows.
-
(<H) = (≤V) ○‣ (<S) ○‣ (≥V)
= (≤L) ∖ (<V)
-
(<L) = (<H) ⊎ (<V)
-
x <S y iff
x.pV = y.pV and x <L y.
-
x <S y iff
x appears before y in some list from X.qs.
-
x.pV = y iff
x ∈ y.qs.
-
x.qs = |
[ ] |
if x ∉ X.pV, |
x.qs = |
y.psS |
otherwise,
where y is the unique element of X such that
y.pV = x and y ∉ X.pS. |
Ordered forest
By a finite ordered forest we mean a finite disjoint union of
locally finite ordered trees.
Tree navigation
Operators for ordered tree (or forest) navigation can be sumarized as follows.
Order |
Operators |
First |
Previous |
Next / Parent |
Last / Root |
≤V |
Vertical order |
|
relevant: .qs |
.pV |
parent |
.rV |
root |
≤S |
Sibling order |
.endS(−) |
first sibling |
.neiS(−) |
previous sibling |
.neiS(+) |
next sibling |
.endS(+) |
last sibling |
≤L |
Linear order |
.endL(−) |
leftmost |
.neiL(−) |
nested predecessor |
.neiL(+) |
nested successor |
.endL(+) |
rightmost (equal to .rV) |
≤H |
Horizontal order |
|
Labelled transition system
A labelled transition system is a structure
(S, Λ, ) where
-
S is a set of states,
-
Λ is a set of labels,
-
is a subset of S × Λ × S,
elements of are called transitions.
Notation x y is used for (x, α, y) ∈ ().
A labelled transition system is called deterministic if
is a partial function on S × Λ.
Multidigraph
A multidigraph is a structure
(N, A, .s, .t) where
-
N is a set of nodes,
-
A is a set of arrows, disjoint from N,
-
.s and .t are functions A → N
assigning to each arrow its source and target node, respectively.
Naming multidigraph
By a naming multidigraph we mean a structure
(N, A, Σ, .σ, .s, .t) where
-
(N, A, .s, .t) is a multidigraph,
-
Σ is a naming domain – a set of names,
-
.σ is a function A → Σ, called (arrow) naming,
assigning to each arrow its name.
The structure is subject to the following naming consistency condition:
|
|
(NC) |
.s ⋈ .σ is injective,
i.e. different arrows with equal source have different names.
|
Lookup system
By a lookup system we mean a structure
(X, Σ, .ℓ()) where
-
X is a set (the lookup domain),
-
Σ is a set of names,
-
.ℓ() is a partial function X × Σ ↷ X,
called the Σ-lookup.
Proposition:
The following structures are equivalent:
- A lookup system.
- A deterministic labelled transition system.
- A naming multidigraph, up to identification of arrows.
□
Using the multidigraph source-target terminology
and assuming finiteness of .ℓ(),
a lookup system can be interpreted as a relational database table of the following form:
Domains →X |
Σ |
X |
source |
name |
target |
| | |
| | |
| | |
Underline in
source
and
name
indicates a primary key
.
Pathnames
Elements of Σ∗
(the set of all finite, possibly empty, sequences of names)
are called pathnames or paths.
Given a lookup system (X, Σ, .ℓ()),
the induced Σ-pathname-lookup
.ℓ∗() : X × Σ∗ ↷ X
is naturally defined by
- .ℓ∗([ ]) = Id ↾ X,
- .ℓ∗([α1, …, αn]) =
.ℓ(α1) ⋯ .ℓ(αn).
Proposition:
For a partial function .f() : X × Σ∗ ↷ X the following
are equivalent:
- (a)
The map p ↦ .f(p)
is a monoid homomorphism between
(Σ∗, +, [ ]) and
(X↷, ○‣, Id ↾ X), i.e.
- .f(p).f(q) = .f(p + q),
for every p, q ∈ Σ∗, and
- .f([ ]) = Id ↾ X.
- (b)
.f() is a Σ-pathname lookup,
i.e.
.f() = .ℓ∗() for some .ℓ() : X × Σ ↷ X.
(Necessarily, .ℓ() is given by
.ℓ(α) = .f([α]).)
□
Again, we can interpret the induced pathname lookup as a relational database table,
this time better considered as a virtual, possibly infinite, database view.
Domains →X |
Σ∗ |
X |
source |
pathname |
target |
| | |
| | |
| | |
Pathname resolution system
By a pathname resolution system we mean a structure
(X, Σ, .ϖ()) where
-
X, Σ are disjoint sets
(with the same semantics as in a lookup system),
-
.ϖ() is a partial function X × Σ∗ ↷ X
such that for every x ∈ X and every pathnames p, q,
- x.ϖ([ ]) = x,
- .ϖ(p + q) ⊆ .ϖ(p).ϖ(q).
(If x.ϖ(p + q) is defined then
x.ϖ(p) and x.ϖ(p).ϖ(q) are defined and
x.ϖ(p).ϖ(q) = x.ϖ(p + q).)
Proposition:
-
The domain of .ϖ() is pathname-prefix closed, i.e.
for every (x,p) ∈ X × Σ∗,
- if x.ϖ(p) is defined then x.ϖ(p.pop) is defined.
-
Let .ℓ() : X × Σ ↷ X
be a lookup defined by x.ℓ(α) = x.ϖ([α]).
Then .ϖ() is a restriction of .ℓ∗().
Corollary:
-
A pathname resolution system can be viewed as a
lookup system (X, Σ, .ℓ())
equipped with a restriction .ϖ()
of the induced pathname lookup .ℓ∗() such that
the domain of .ϖ()
- is pathname-prefix closed, and
- coincides with the domain of .ℓ∗() on pathnames of length ≤ 1.
-
A lookup system can be viewed as a pathname resolution system such that
.ℓ∗() = .ϖ().
Reachability
Given a pathname resolution system (a lookup system)
we introduce reachability relations
as subsets of X × X
according to the following table
(we let P ⊆ Σ∗, Τ ⊆ Σ):
Notation |
Definition |
Terminology |
.ℓ∗(P) |
{ (x,y) | x.ℓ∗(p) = y for some pathname p ∈ P } |
reachability (by .ℓ∗()) over pathnames from P |
.ℓ∗ |
.ℓ∗(Σ∗) |
reachability (by .ℓ∗()) |
|
.ϖ(P) |
as above,
with ℓ∗ replaced by ϖ. |
.ϖ |
|
.ℓ(Τ) |
{ (x,y) | x.ℓ(α) = y for some name α ∈ Τ } |
single step reachability (succession) over names from Τ |
.ℓ |
.ℓ(Σ) |
single step reachability / succession |
If x.ℓ∗ ∋ y then y is said to be reachable from
x.
(Double-)pointed lookup system
By a double-pointed lookup system we mean a structure
(X, Σ, .ℓ(), /, r, d) where
-
(X, Σ, .ℓ()) is a lookup system,
-
/ is a distinguished element of Σ,
-
r and d are distinguished elements of X
(called the current root and the current directory,
respectively).
The structure is subject to the following condition:
-
{r, d} × {r}
⊆
.ℓ(/)
⊆ X × {r}
(in particular, .ℓ(/) maps constantly to r).
By a pointed lookup system
we mean a double-pointed lookup system with r = d.
The (global) resolution target partial function
.rt : Σ∗ ↷ X
maps pathnames to X as follows:
-
p.rt = r.ℓ∗(p) if p[0] = r,
-
p.rt = d.ℓ∗(p) otherwise.
Naming Forest
Forest Naming Domain
By a forest naming domain we mean a set Σ of names
such that
Σ |
= |
Σdown |
⊎ |
Σspec |
∪ |
|
‖ |
|
∪ |
ΣΌ |
= |
Σdown |
⊎ |
Σ‰ |
where
- Σdown is the set of down names,
- Σspec is the set op special names
with forest-related semantics.
They are considered logical constants
(cf. []),
We distinguish these names by enclosing guillemets, like in
r or ·.
- Σ‰ is the set op permil names
containing just 3 special names ·,
/ and ··,
- ΣΌ is the set of ordinary names.
Naming forest
By a naming forest we mean a structure
(A, ≤, Σ, .σ) where
- (A, ≤) is a forest,
- Σ is a forest naming domain,
- .σ
is a function A ↷ Σdown called down-naming.
For x ∈ A, the down-name x.σ, if defined,
is said to be the name of x.
The structure is subject to the following conditions:
- (1)
Dom(.σ) = Dom(.p),
i.e.
x has a name iff x has a parent.
- (2)
.p ⋈ .σ is injective, i.e.
different siblings have different names.
The induced down-lookup operator
.ℓdown() : A × Σdown ↷ A
is defined as the inverse to .p ⋈ .σ, i.e.
x.ℓdown(α) = y iff
y.p = x and y.σ = α.
Proposition:
Given the forest naming domain Σ, the
following structures are equivalent:
-
A naming forest (A, ≤, Σ, .σ).
-
A lookup system (A, Σdown, .ℓdown()) such that
- .ℓdown() is injective,
- .ℓdown (the single
down
-step reachability)
is inverse to the parent partial function of a forest.
□
By a pointed naming forest we mean
a naming forest equipped with an artificial root r ∈ A.
By a naming tree we mean a pointed naming forest such that
{r} = A.r.
The default lookup operator
.ℓ() : A × Σ ↷ A
is defined as an extension of .ℓdown() according to the
following table:
Name α |
Semantics |
Terminology |
.ℓ(α) |
a down-name | α-named child |
.ℓdown(α) |
p | parent | .p |
p¯ | parent-or-root | .p¯ |
r | component root | .r |
· | self | Id ↾ A |
·· | dotdot-parent |
.p¯ ◑ {(r,r)} |
if r is defined, |
.p¯ | otherwise. |
|
/ | current root |
A × {r} |
if r is defined, |
∅ | otherwise. |
|
Pathnames
The set Σ∗ of pathnames over a forest naming domain Σ
has a (special) nomenclature according to the following table.
Terminology for a pathname p ∈ P |
Pathname set P expression / description |
Examples |
Unix-style correspondents |
ordinary (proper) |
absolute |
[/] + (Σdown ∪ {··, ·})∗
|
[/, "garden", ··, "beds"] |
"/garden/../beds " |
[/] |
"/ " |
|
canonic |
[/] + Σdown∗ |
[/, "garden", "beds"] |
"/garden/beds " |
[/] |
"/ " |
relative |
[/] +
(Σdown ∪ {··, ·})+
|
["garden", ·, "beds"] |
"garden/./beds " |
extraordinary |
pathnames p that are not ordinary:
- (a)
the empty pathname,
- (b)
p with a non-leading root-name /,
- (c)
p with a special name other than
/, ·· or ·.
|
|
|
(a) [ ] |
(b) ["garden", /] |
(c) [f, "garden"] |
We also introduce notation Ϲ, Ά and
Ό
for the sets of all canonical, absolute and ordinary pathnames, respectively.
This leads to the following inclusion chain:
canonical |
|
absolute |
|
ordinary |
|
all pathnames |
Ϲ |
⊆ |
Ά |
⊆ |
Ό |
⊆ |
Σ∗ |
Note that the set of all ordinary pathnames can be expressed as
Ό = {[ ], [/]} + (Σdown ∪ {··, ·})∗
∖ {[ ]}.
Pathname reduction
We introduce pathname reduction operators
Σ∗ → Σ∗ as follows:
Notation |
Terminology |
Definition (reduction rules) |
.reduct(··) |
reduction
|
-
(p + [/, ··] + q).reduct(··) = p + [/] + q,
-
(p + [α, ··] + q).reduct(··) = p + q
if α ∈ Σdown,
-
s.reduct(··) = s if the pathname s does not have
a sub-pathname of the form [/, ··]
or [α, ··],
α ∈ Σdown.
(I.e. the following rules apply:
[/, ··] ↦ [/]
and [α, ··] ↦ [ ], α ∈ Σdown.)
|
.reduct℅(··) |
Let α = ··. Then
-
p.reduct℅(α) = p.reduct(α) + [·]
if p.last = α and either
- p.reduct(α) = [ ] or
- p.reduct(α).last ∈ Σdown,
-
p.reduct℅(α) = p.reduct(α) otherwise.
|
.reduct(·) |
dot reduction |
-
(p + [·] + q).reduct(·) = p + q,
-
s.reduct(·) = s if · ∉ s.
(All dot names · are removed.)
|
.reduct℅(·) |
The same rules apply as for .reduct℅(··) but with
α = ·.
(Any non-final dot name · is removed.
The final ·, if any,
is only removed if preceded by / or ··.)
|
.reduct(/) |
root reduction |
-
(p + [/] + q).reduct(/) = [/] + q,
-
s.reduct(/) = s if / ∉ s.shift.
(All names preceding the last / are removed.)
|
.reduct℅ |
reduction |
-
.reduct℅ = .reduct(/).reduct℅(··).reduct℅(·).
|
.reduct |
-
.reduct = .reduct(/).reduct(··).reduct(·).
|
-
The ℅ symbol means
care of trailing dot
.
The ℅-variants of reduction preserve pathname characteristics
of ending with a down-name, see proposition (B).
This is important for enforcing directory/non-directory (leaf/non-leaf)
pathname resolution logic.
-
Pathname reduction .reduct℅(·).reduct℅(··) is used
in URI resolution
[].
The following example shows the impact of the ℅ feature.
Proposition:
- (A)
For every
.f, .g ∈ {.reduct(··), , reduct℅(··), …, .reduct},
p, q, r ∈ Σ∗,
- .f.f = .f,
- .f.g = .g.f,
- (p + q + r).f = (p + q.f + r).f.
- (B)
For every
.f ∈ { reduct℅(·), reduct℅(··), reduct℅},
p ∈ Σ∗,
- p.last ∈ Σdown iff p.f.last ∈ Σdown.
- (C)
Ά.reduct = Ϲ,
i.e. .reduct maps absolute pathnames to canonical pathnames.
Canonical pathname tree
Given the forest naming domain Σ,
the (free) canonical pathname tree
is the structure
C = (Ϲ, .p, Σ, .σ, [/])
where
-
.p = .pop ↾ Ϲ ∖ {[/]},
-
.σ = .last ↾ Ϲ ∖ {[/]}.
Obviously, C is a naming tree with root [/].
The corresponding lookup
.ℓϹ() : Ϲ × Σ ↷ Ϲ
is defined as follows:
Name α |
Value p.ℓϹ(α) |
a down-name | p + [α] |
· | p |
·· |
[/] | if p = [/], |
p.pop | otherwise |
|
/ | [/] |
Proposition:
For every canonical pathname p and every ordinary pathname q,
- p.p¯ = (p + [··]).reduct,
- p.ℓϹ∗(q) = (p + q).reduct.
Induced pathname tree
Given a pointed lookup system
(X, Σ, .ℓ(), r) where
with the induced pathname resolution map
.rt : Σ∗ ↷ X,
we denote
-
Ͼ = { q ∈ Ϲ | q.rt is defined },
(i.e.
Ͼ is the set of all resolvable canonical pathnames)
-
.ℓϾ() = .ℓϹ() ∩
Ͼ × Σ × Ͼ.
Proposition:
-
Ͼ.p ⊆ Ͼ,
-
(Ͼ, .p, Σ, .σ, [/])
is a naming tree
– we call it the induced pathname tree,
-
.ℓϾ() is the corresponding lookup.
□
Dot-dot handling modes
Proposition:
Let (X, Σ, .ℓ(), r) be a pointed lookup system
with a forest naming domain Σ
such that
- .ℓ(··) is defined for all x ∈ X,
but does not necessarily have the semantics of a naming forest,
- .ℓ(·) is identity on X.
Let p be an ordinary pathname and consider
the 3 operators
.rt.ℓ∗(p),
.ℓϹ∗(p).rt,
.ℓϾ∗(p).rt : Ϲ ↷ X.
- If ·· ∉ p then the operators are equal.
- If ·· ∈ p then
the operators are pairwise different in general.
□
We might say that the resolution Ϲ × Ό ↷ X
depends on dot-dot handling mode.
This is summarized in the following table.
Operator Ϲ × Ό ↷ X |
Mode name |
Application example |
The cd utility | |
.rt.ℓ∗() | physical |
[]
or
[]
with the -P switch
|
|
.ℓϹ∗().rt |
logical | free |
[]
with the -L switch
|
URI resolution [] |
.ℓϾ∗().rt |
restricted |
[]
with the -L switch
|
|
Filesystem forest
Filesystem forest
By a filesystem forest we mean a structure
(N, A, ≤, .t)
where
- N is a set of nodes,
- A is a set of arrows, disjoint from N,
- ≤ is a partial order between arrows,
- .t is a function A → N
assigning each arrow its target node.
The structure is subject to the following conditions:
- (1)
(A, ≤) is an algebraic forest.
- (2)
.t is injective on A.p,
i.e.
only leaf arrows a ≠ b can have equal target.
Filesystem forest as a multidigraph
Similarly to algebraic forests, a filesystem forest can be considered
of having one of more equivalent forms:
(1a) |
(N, A, ≤, .t) |
– the original form, |
(1b) |
(N, A, .p¯, .t) |
with .p¯ : A → A, |
(1c) |
(N, A, .p, .t) |
with .p : A ↷ A, |
|
(2b) |
(N, A, .s¯, .t) |
– the multidigraph (quiver) form, with .s¯ : A → N, |
(2c) |
(N, A, .s, .t) |
–
the partial multidigraph form, with .s : A ↷ N. |
The correspondence (1)–(2) is given by
.p¯ = .s¯.t−1 |
|
.s¯ = .p¯.t |
.p = .s.t−1 |
.s = .p.t |
S1: Filesystem forest
By an S1 structure we mean a filesystem forest
N = (N, A, ≤, .t)
such that the following is satisfied:
|
|
(S1~1) |
Components of (A, ≤) are finite.
|
(S1~2) |
Components of (A, ≤) have disjoint .t-images
(i.e. .t−1.r is a partial function N ↷ A.r).
|
Arrows from A are also called hardlinks.
Condition (S1~2) means that cross-tree hardlinks are not allowed.
The following table shows basic correspondence of the S1 structure
to its in-memory representation in the Linux kernel.
| In-memory representation |
Domain | Function | Type (structure) | Reference |
N | |
inode | |
| .t−1.r |
| .i_sb->s_root |
A | |
dentry | |
| .t |
| .d_inode |
| .p¯ |
| .d_parent |
| .r |
| .d_sb->s_root |
S2: Mounted filesystem forest
By an S2 structure we mean an S1 structure
equipped with a mount structure
M = (M, ≤, ≤mpl, .s¯, .t)
where
- M
is a finite set of mounts, disjoint from both N and A,
- ≤
is a partial order between mounts,
- ≤mpl
is a component-wise linear order between mounts,
encoding mountpoint lists,
- .s¯, .t
are functions M → A assigning each mount its source
and target, respectively
(i.e. (M, A, .s¯, .t) is a multidigraph).
The structure is subject to the following conditions:
|
|
(S2~1) |
(M, ≤) is an algebraic forest.
|
(S2~2) |
(x.p,x.s) = (y.p,y.s)
iff
x.rmpl = y.rmpl for every mounts x, y
(i.e.
mounts have equal mountpoint source iff they are comparable in ≤mpl).
|
(S2~3) |
M.t ⊆ A.r,
i.e. mount targets are roots in (A, ≤).
|
(S2~4) |
m.s¯.r = m.p¯.t for every mount m
(i.e. m's source belongs to m parent's target tree).
|
(S2~5) |
m.s¯.t = a.t = b.t implies a = b
for every mount m and arrows a, b
(i.e. (targets of) mount sources cannot be multiply hardlinked).
|
By .s : M ↷ A we mean the restriction of .s¯
to non-root mounts.
The following table shows basic correspondence of the mount structure
to the Linux kernel source code.
| In-memory representation |
Domain | Function | Type (structure) | Reference |
M | |
vfsmount | |
| .s¯ |
| .mnt_mountpoint |
| .t |
| .mnt_root |
| .p¯ |
| .mnt_parent |
| .r |
| .mnt_ns->root |
| .neimpl(−) |
| .mnt_hash.prev (*) |
| .neimpl(+) |
| .mnt_hash.next (*) |
-
Application of the
list_entry
macro is required:
x.neimpl(−)
corresponds to list_entry(y, struct vfsmount, mnt_hash)
iff y
equals x.mnt_hash.prev
and is not the head of x.mnt_hash
.
-
Mountpoint lists (i.e. components of (M, ≤mpl) do not
have direct one-to-one correspondence to
.mnt_hash
lists
in general
–
a hash list can be merged from several mountpoint lists.
The direct corresponence is established by a mountpoint equality check
in __lookup_mnt
.
Derived arrows
We denote
- Ѧ =
{ (m,a) ∈ M × A | m.t = a.r }
the set of derived arrows (or namespace points).
Informally,
a derived arrow is a hardlink in the context of one of the (possibly many)
mounts that bind the filesystem tree which the hardlink belongs to.
Proposition:
Ѧ is finite.
□
We use the (m,a) signature for derived arrows,
so that .m and .a are
the natural projections,
e.g. (m,a).a = a for (m,a) ∈ Ѧ.
Derived arrows Ѧ have the following form in the Linux source:
struct path {
struct vfsmount *mnt;
struct dentry *dentry;
};
Derived nodes
For the sake of completeness we also define the set
- N′ =
{ (m,a.t) ∈ M × N | (m,a) ∈ Ѧ }
of derived nodes.
Derived arrows are mapped to derived nodes by the
target function
.t : Ѧ ↠ N′ defined by (m,a).t = (m,a.t).
Mountpoints
The mountpoint list function
.mpl : Ѧ ↷ Ѧ∗ is defined by
-
x ∈ (m,a).mpl iff
(x.p,x.s) = (m,a),
-
x appears before y in (m,a).mpl iff
x ≤mpl y.
Derived arrows with non-empty mountpoint list are called mountpoints.
Mount-front and mount-back
The mount-front-predecessor partial function
.mf : Ѧ ↷ Ѧ is defined by
-
(m,a).mf is undefined
if (m,a) is not a mountpoint
(i.e. the set X = { x ∈ M | (x.p,x.s) = (m,a) } is empty),
-
(m,a).mf = (y, y.t) otherwise,
where y = (m,a).mpl.last
(the last mount in the mountpoint list of (m,a)).
The mount-front function
.f : Ѧ → Ѧ is defined recursively by
-
x.f = x
if
x.mf is undefined,
-
x.f = x.mf.f otherwise.
The mount-back-parent partial function
.mb : Ѧ ↷ Ѧ is defined by
-
(m,a).mb is undefined
if
a ∉ A.r or (m,a) ∈ M.r × A.r,
-
(m,a).mb = (m.p, m.s) otherwise.
The mount-back function
.b : Ѧ → Ѧ is defined recursively by
-
x.b = x
if
x.mb is undefined,
-
x.b = x.mb.b otherwise.
Proposition:
- (i)
.f and .b are root maps
of forests (Ѧ, .mf) and (Ѧ, .mb),
respectively.
In addition, .mf is injective.
- (ii)
.f.b = .b.
- (iii)
.b.p.a = .b.a.p.
- (iv)
A derived arrow x is a mountpoint iff x ∉ Ѧ.f.
□
The fine forest
The parent function .p¯ : Ѧ → Ѧ is defined by
-
(m,a).p¯ = (m,a.p) if a ∉ A.r, else
-
(m,a).p¯ = (m.p, m.s) if m ∉ M.r, else
-
(m,a).p¯ = (m,a).
We denote (Ѧ, ≤)
the reflexive transitive closure of (Ѧ, p¯)
and
.p the restriction of .p¯ to non-fixed points.
Proposition:
- (i)
(Ѧ, ≤) is an algebraic forest.
- (ii)
.f is idempotent and decreasing,
.b is idempotent, increasing and monotone
(a closure operator) in (Ѧ, ≤).
For every x,y ∈ Ѧ,
-
x.f.f = x.f ≤ x ≤ x.b = x.b.b,
-
x ≤ y implies x.b ≤ y.b.
- (iii)
(N′, Ѧ, ≤, .t) is a filesystem forest.
- (iv)
(Ѧ, .mf−1)
⊆ (Ѧ, .mb)
⊆ (Ѧ, .p)
(forest inclusion).
Proof:
(i)
Follows from the fact that
(Ѧ, .p) is a weak substructure of
the lexicographic product of (M, .p) and (A, .p),
respectively.
□
The up-front forest
The up-front parent partial function
.pf : Ѧ ↷ Ѧ is defined by
.pf = .b.p.f.
We denote (Ѧ, ≤f) the reflexive transitive closure of
(Ѧ, pf).
Proposition:
- (i)
x.b < x.pf.b whenever x.pf is defined.
- (ii)
(Ѧ, ≤f) is an algebraic forest.
- (iii)
.b is strictly monotone w.r.t. (Ѧ, ≤f)
and (Ѧ, ≤),
i.e.
x <f y implies x.b < y.b.
- (iv)
(N′, Ѧ, ≤f, .t) is a filesystem forest.
Proof:
(i)
(x.pf).b = (x.b.p.f).b
= x.b.p.(f.b)
= x.b.p.b
> x.b.p > x.b.
□
The down-front forest
The down-front parent partial function
.pt : Ѧ ↷ Ѧ is defined by
.pt = .b.p ↾ Ѧ.b.f,
i.e.
x.pt = y iff
x = x.b.f and y = x.b.p.
We denote (Ѧ, ≤t) the reflexive transitive closure of
(Ѧ, pt).
Proposition:
Same statements apply as for the up-front forest,
with f replaced by t.
□
S3: Naming
By an S3 structure we mean an S2 structure
equipped with a naming structure
(Σ, .σ)
assigning each non-root hardlink a name that is distinct within sibling hardlinks,
i.e.
|
|
(S3~1) |
(A, ≤, Σ, .σ) is a naming forest.
|
This in particular defines the lookup function
.ℓ() : A × Σ ↷ A
for the hardlink forest (A, ≤).
The fine lookup
The fine lookup operator
.ℓ() : Ѧ × Σ ↷ Ѧ
defines navigation in the derived forest
(Ѧ, ≤) according to the following table.
α |
x.ℓ(α) where x = (m,a) |
Correspondent function from the Linux kernel |
a down-name |
(m, a.ℓ(α)) |
__d_lookup |
r | x.r | |
p | x.p | |
p¯ | x.p¯ |
follow_dotdot without
equality check of nd->path and nd->root ,
follow_mount
|
mf | x.mf |
lookup_mnt / follow_down_one |
f | x.f |
follow_mount / follow_down |
mb | x.mb |
follow_up |
b | x.b |
iteration of follow_up in follow_dotdot |
-
The function
follow_down
-version-2.6.37 corresponds to
follow_down_one
-version-2.6.38.
-
The function
follow_down
-version-2.6.38 does not seem to have
a counterpart in version 2.6.37.
Down-front naming
We define partial functions .σt : Ѧ ↷ Σdown
and
.ℓt() : Ѧ × Σdown ↷ Ѧ
by
-
.σt = .b.a.σ ↾ Ѧ.f,
-
.ℓt(α) = .ℓ(α).f
for every down-name α.
(Recall that .a is the projection (m,a) ↦ a).
Proposition:
(Ѧ, ≤t, Σ, .σt)
is a naming forest with
.ℓt() the correspondent down-lookup.
□
The .ℓ3 lookup
The partial function
.ℓ3() : Ѧ × Σ ↷ Ѧ
is defined as follows
α |
.ℓ3(α) |
a down-name |
.ℓ(α).f (= .ℓt(α)) |
r |
.rf |
p¯ |
.p¯f |
Informally,
up-steps are performed in the up-front forest,
down-steps are performed in the down-front forest.
S4: Current root
By an S4 structure we mean an S3 structure
equipped with (r)
where
- r is a derived arrow called the current root.
We have the following correspondence:
Our notation |
Linux kernel |
r |
current->fs->root |
The .ℓ4 lookup
The lookup operator
.ℓ4() : Ѧ × Σ ↷ Ѧ
is defined according to the following table:
α |
x.ℓ4(α) |
Linux kernel function |
a down-name |
x.ℓ(α).f |
do_lookup |
r | x.r.f | |
/ | r | |
·· |
r.f | if x ≤ r ≤ x.b, |
x.b.p¯.f | otherwise |
|
follow_dotdot |
Proposition:
- (A)
The following holds for differences
between .ℓ4() and .ℓ3():
-
The down lookup is coincident.
-
Root lookups .ℓ4(r) and .ℓ3(r)
coincide
iff
Ѧ.r ⊆ Ѧ.f.
-
Root lookups .ℓ4(/) and .ℓ3(r)
coincide
iff
{r} = Ѧ.r.
-
Parent lookups
.ℓ4(··) and .ℓ3(p¯) coincide
iff
r ∈ Ѧ.r ⊆ Ѧ.f.
- (B)
Denote .p¯4 = .ℓ4(··).
-
(Ѧ, .p¯4) is a forest.
-
Ѧ.r4 = Ѧ.r.f ∪ {r.f}.
□
Ancestor pathname
The function .apn : Ѧ → Σ+
is defined according to the (first two columns of the) following table.
x.apn |
Condition |
String representation of
x.apn[0] in Linux VFS
|
[/] |
x.f = x.r4 = r.f |
x ≤ r |
"/ " |
[/] + x.ps4.pop.reverse.map{ |x| x.b.a.σ } |
x.f ≠ x.r4 = r.f |
[r] |
x.f = x.r4 ≠ r.f |
x ≰ r |
"(unreachable)/ " |
[r] + x.ps4.pop.reverse.map{ |x| x.b.a.σ } |
x.f ≠ x.r4 ≠ r.f |
The pathname x.apn is said to be the
ancestor pathname of x.
Proposition:
In general, .apn is NOT injective.
□
The canonical tree ѦⅠ
Going from the pointed lookup system
(Ѧ, Σ, .ℓ4(), r),
we denote
.rt4 : Σ∗ ↷ Ѧ
the induced pathname resolution map
and
Ͼ4 the set of all resolvable canonical paths.
We denote
ѦⅠ = r.ℓ4∗(Σdown∗),
the canonical tree.
Proposition:
The maps
Ͼ4 ѦⅠ
and
ѦⅠ Ͼ4
are inverse to each other.
□
The reachable (double-)tree ѦⅡ
We denote
ѦⅡ =
ѦⅠ ∪ r.ℓ4∗([··] + Σdown∗),
the (root-)reachable (double)-tree.
Proposition:
-
ѦⅡ = r.ℓ4∗(Ό)
i.e.
ѦⅡ is the set of all derived arrows that
are reachable from the current root r by
an .ℓ4∗()-lookup over ordinary pathnames.
-
(ѦⅡ, ≤f) = (ѦⅡ, ≤t),
i.e. restrictions of the up-front forest and the down-front forest to ѦⅡ
are equal.
-
The set of roots of the forest (ѦⅡ, ≤f) equals
{ r, r.f }.
The forest (ѦⅡ, ≤f) is a tree
if r is not a mountpoint,
otherwise it has two components:
-
r.ℓ4∗(Σdown∗) = ѦⅠ –
the
canonical
/ standard
tree,
-
r.f.ℓ4∗(Σdown∗)
= ѦⅡ ∖ ѦⅠ
– the
front
tree.
-
The restriction of .ℓ4() to
ѦⅡ × Σdown
is a down-lookup of (ѦⅡ, ≤f).
-
The structure (ѦⅡ, .p¯4, r.f)
is an algebraic tree.
-
ѦⅡ ∖ {r} ⊆ Ѧ.f,
i.e.
except possibly for the current root,
an element of ѦⅡ is never a mountpoint.
□
/
(r) |
bin |
boot |
… |
usr |
var |
|
… |
|
|
… |
|
|
|
… |
|
|
… |
|
|
|
The front
tree can be created by
the following shell commands (root privileges are assumed).
cd
-navigation between
ѦⅠ and
ѦⅡ ∖ ѦⅠ
As of GNU bash version 4.2.8,
the switches -P and -L
play role for
the cd
(1)-navigation to/from the front
tree.
This is shown in the following example
(note that symbolic links are NOT involved):
Root reachability stratification
We denote ѦⅢ = r.p∗−1
= { x ∈ Ѧ | x ≤ r }.
We can now summarize
the root-reachability stratification
ѦⅠ ⊆ ѦⅡ ⊆ ѦⅢ ⊆ ѦⅣ =
Ѧ
by .ℓ4∗-lookup over ordinary pathnames
as follows.
Stratum |
Description |
Ancestor pathname characteristics |
|
|
|
ѦⅠ = r.ℓ4∗(Σdown∗) |
The canonical tree:
derived arrows reachable via a canonical path. |
r.ℓ4∗(x.apn) = x |
|
|
ѦⅡ =
ѦⅠ ∪ r.ℓ4∗([··] + Σdown∗)
|
The reachable (double-)tree:
derived arrows reachable via an absolute path. |
r.ℓ4∗(x.apn) = x or
r.ℓ4∗([··] + x.apn.shift) = x |
|
ѦⅢ = r.p∗−1 |
The inverse reachable tree:
derived arrows under the current root. |
x.apn[0] = / |
ѦⅣ = Ѧ |
|
|
-
It is possible to have
x ∈ ѦⅠ,
y ∈ ѦⅡ ∖ ѦⅠ, and
z ∈ ѦⅢ ∖ ѦⅡ
such that
x.apn = y.apn = z.apn.
-
The following Ruby code shows a derived arrow
(originally pointed to by
/home
) outside
ѦⅢ.
Dir.chdir "/home"
Dir.chroot "/usr"
puts Dir.pwd #-> (unreachable)/home
S5: Basic file types
By an S5 structure we mean an S4 structure
equipped with (.btype)
where
- .btype is an attribute of nodes N
assigning each node its basic type, also called
the unix file type .
We introduce the following terminology / notation based on .btype:
Value y of x.btype |
Terminology for x |
Notation for the set { x ∈ X | x.btype = y } |
Linux kernel code:
x.i_mode & S_IFMT |
'-' |
regular file |
|
S_IFREG |
'd' |
directory |
X.dirs |
S_IFDIR |
'l' |
symbolic link or symlink |
X.links |
S_IFLNK |
'b' , 'c' , 'p' , 's' ,
… |
not introduced |
|
In particular, N.dirs denotes the set of all directories.
The structure is subject to the following conditions:
|
|
(S5~1) |
A.s¯ ⊆ N.dirs,
i.e. hardlink sources are directories.
|
(S5~2) |
M.s¯.t ⊆ N.dirs,
i.e. mount sources point to directories.
|
(S5~3) |
r.a.t ∈ N.dirs,
i.e. the current root points to a directory.
|
The .ℓ5 lookup
The .ℓ5() lookup operator is defined as
the restriction of .ℓ4() to directories.
More precisely,
x.ℓ5(α) = y
iff
y = x.ℓ4(α) and x.a.t is a directory.
S6: Symbolic links
By an S6 structure we mean an S5 structure
equipped with (.lpath)
where
- .lpath is a function N.links → Σ∗.
If x is a symbolic link, then x.lpath is called the
target pathname of x.
Path resolution stack
By a path resolution stack
we mean an element of Ѧ × Σ+∗,
i.e. it is a pair
(a, paths) where
-
a is a derived arrow,
-
paths is a (possibly empty) list of non-empty pathnames.
We denote PRS the set of all path resolution stacks.
A path resolution stack corresponds to the nameidata
structure
from the Linux kernel source code according to the following table.
The PRS structure (a,paths) |
The nameidata structure |
a |
path |
paths |
saved_names |
paths.length |
depth |
We define the partial operator
.pp : Σ+∗ ↷ Σ+∗
by
-
paths.pp is undefined if paths is empty, else
-
paths.pp = paths.pop if paths.last.length = 1, else
-
paths.pp = paths.pop + [paths.last.shift].
This means that .pp removes the first name from the last pathname in
paths and if this results in an empty last list then this list is removed too.
The partial operation .pF : PRS ↷ PRS is then defined
by
- (A)
(a,paths).pF is undefined if paths is empty and
a.a.t is not a symbolic link.
- (B)
Else if a.a.t is a symbolic link
-
(a,paths).pF = (a.p, paths)
if a.a.t.lpath is empty, else
-
(a,paths).pF = (a.p, paths + [a.a.t.lpath]).
- (C)
Else if a.a.t is a directory,
denote α = paths.last[0].
-
(a,paths).pF is undefined if a.ℓ4(α) is undefined, else
-
(a,paths).pF = (a.ℓ4(α), paths.pp).
- (D)
Otherwise (a.a.t is neither a directory nor a symbolic link)
-
(a,paths).pF is undefined.
For x ∈ PRS, the value x.pF, if defined,
is called the full resolution successor of x.
The partial operation of a normal resolution successor
.pN : PRS ↷ PRS is defined as a restriction
of .pF
to Ѧ × Σ++.
Definition domains of .pF and .pN can
be summarized as follows.
Condition for x = (a, paths) ∈ PRS,
α = paths.last[0] |
POSIX error code |
Full resolution successor x.pF |
Normal resolution successor x.pN |
paths = [ ] and |
a.a.t ∈ N.links |
|
defined |
undefined |
a.a.t ∉ N.links |
undefined |
paths ≠ [ ] and |
a.a.t ∉ N.dirs ∪ N.links |
ENOTDIR |
a.a.t ∈ N.dirs and
a.ℓ4(α) is undefined
| ENOENT |
otherwise |
|
defined and x.pF = x.pN
|
Proposition:
-
.pF (resp. .pN) has no fixed points.
-
The monounary algebra (PRS, p¯F)
(resp. (PRS, p¯N)) is NOT an algebraic forest in general.
□
In the following subsections,
we will use the endpoint (alias root) partial functions
.rF, .rN : PRS ↷ PRS
induced by monounary algebras (PRS, p¯F)
and (PRS, p¯N), respectively.
Link follower
The (strict direct) link follower partial function
.plnk : Ѧ ↷ Ѧ
is defined by
a.plnk = b |
iff |
- (a, [ ]).pF.rN = (b, [ ]), and
- a ≠ b.
|
Note that only symbolic links can have link followers.
Link target
The link target partial function
.lt : Ѧ ↷ Ѧ
is defined by
a.lt = b |
iff |
(a, [ ]).rF = (b, [ ]).
|
Proposition:
a.lt = b iff
a.rlnk = b and b.a.t ∉ N.links
(i.e. b is the link target of a iff
b is the last link follower of a and does not point to a symlink).
□
Linked lookup
The linked lookup operators
.ℓF(), .ℓN() : Ѧ × Σ ↷ Ѧ
are defined by
a.ℓF(α) = b |
iff |
(a, [[α]]).rF = (b, [ ])
|
(.ℓN() is defined the same way,
with F replaced by N).
Proposition:
Let α ∈ Σ,
d ∈ Ѧ such that d.a.t is directory. Then
.ℓN(α) = |
.lt.ℓ5(α) |
|
d.ℓN(α) = |
d.ℓ5(α) |
.ℓF(α) = |
.lt.ℓ5(α).lt |
d.ℓF(α) = |
d.ℓ5(α).lt |
□
Pathname resolution
The pathname resolution operators
.ϖF(), .ϖN() : Ѧ × Σ∗ ↷ Ѧ
are defined by
a.ϖF(p) = b |
iff |
- p = [ ] and a = b, or
- p ≠ [ ] and (a, [p]).rF = (b, [ ]),
|
and a.ϖN(p) by the same way,
with F replaced by N.
This means that a nonempty pathname p is resolved to the derived arrow
b relative to a derived arrow a
if the path resolution stack (a, [p])
resolves to the endpoint (b, [ ]).
Proposition:
-
.ϖF() = .ϖN().lt.
-
.ϖF() = .ℓF∗()
(the pathname lookup induced by .ℓF()).
-
The previous statement with F replaced by N.
□
Considering the pair of partial maps
.ϖF() and .ϖN()
as a single partial map
ϖ :
Ѧ × Σ∗ × {F, N} ↷ Ѧ,
we obtain the following correspondence with the Linux kernel source code:
|
Linux kernel source code |
Function | Arguments |
Return value |
Function | Arguments |
Input | Output |
ϖ |
|
|
vfs_path_lookup |
|
|
| a.m |
|
| mnt |
|
| a.a |
|
| dentry |
|
| p |
|
| name |
|
| rtype |
|
| flags & LOOKUP_FOLLOW |
|
| |
b |
| |
nd.path |
F / N modes in use
The following table shows examples of Linux system calls and shell commands
divided according to which pathname resolution mode they use.
F mode (full pathname resolution .ϖF()) |
N mode (normal pathname resolution .ϖN()) |
stat (2) |
lstat (2) |
stat (1) -L |
stat (1) |
file (1) -L |
file (1) -h |
realpath (3) |
|
|
readlink (3) |
Induced pathname trees
Going from pointed lookup systems
(Ѧ, Σ, .ℓF(), r) and
(Ѧ, Σ, .ℓN(), r),
we denote
.rtF, .rtN : Σ∗ ↷ Ѧ
the induced pathname resolution maps
and
ϾF, ϾN the induced pathname trees,
i.e. ϾF and ϾN are
the sets all canonical pathnames resolvable via
.ϖF() and .ϖN(), respectively.
Proposition:
-
ϾN.p¯ ⊆ ϾF ⊆ ϾN.
-
Ͼ4 ⊆ ϾN.
-
.rtN ↾ Ͼ4
=
.rt4 ↾ Ͼ4.
□
Virtual identity layers
There are four basic layers of identity of filesystem objects,
corresponding to the function chain
Identity layers are summarized in the following table.
Notation |
Terminology |
N |
Filesystem nodes |
|
A |
Hardlinks |
Low-level filesystem identity |
Ѧ |
Derived arrows (namespace points) |
Operating system identity |
ϾF / ϾN |
Resolvable canonical pathnames |
Browsing identity |
Hierarchical structures of particular layers do NOT correspond in general.
More precisely, the following diagram does not necessarily commute.
ϾF |
|
Ѧ |
|
A |
|
N |
↑ |
.p¯ |
↑ |
.p¯4 |
↑ |
.p¯ |
⤉ |
ϾF |
|
Ѧ |
|
A |
|
N |
The non-commutativity of the left square relates to
the difference between
logical (.p¯.rtF)
and
physical (.rtF.p¯4)
dot-dot handling
[],
[],
[].
Further resolvability restrictions
We define pathname resolution operators
.ϖ6F() and
.ϖ6N() as restrictions of
.ϖF() and
.ϖN(), respectively,
by imposing symbolic link limits (group B) according to the following table
(only the F-variant is shown):
Condition for x = (a, [p]),
a ∈ Ѧ, p ∈ Σ∗ |
POSIX error code |
Group |
Terminology |
Description |
A |
Endpoint existence |
x.rF is defined |
ELOOP |
Intermediate resolvability / pathname vanishment |
x.rF.paths = [ ] |
ENOTDIR |
ENOENT |
B |
Symlink nesting limit |
y.paths.length ≤ 8 for every y ∈ x.psF |
ELOOP |
Symlink count limit |
x.psF.select{ |y| y.a.a.t ∈ N.links }.length ≤ 40 |
ELOOP |
C |
Pathname size limit |
Not expressible in the S6 structure |
ENAMETOOLONG |
Access restriction |
Restriction of .ℓ4(), not expressible in the S6 structure |
EACCES |
D |
Pathname properness |
(a) p is non-empty |
ENOENT |
(b) / ∉ p.shift |
|
(c) p has no extraordinary name |
For illustration, other restrictive conditions are listed:
-
(A)
Restrictions already applied to .ϖF().
-
(C)
Restrictions to be introduced in a further refinement of the S6 structure.
-
(D)
Rejecting extraordinary pathnames.
Proposition:
-
.ϖ6F(p+q) ⊆ .ϖ6F(p).ϖ6F(q)
for every pathnames p, q.
Due to the symlink count limit,
equality does not hold in general.
(I.e. .ϖ6F() is a pathname resolution operator,
but – in contrast to .ϖF() –
not a pathname lookup operator.)
-
The previous statement with F replaced by N.
Bibliographic references
|
|
Tim Berners-Lee, Roy Fielding, Larry Masinter,
Uniform Resource Identifier (URI): Generic Syntax,
STD 66, RFC 3986,
2005,
http://labs.apache.org/webarch/uri/rfc/rfc3986.html
|
|
Daniel P. Bovet, Marco Cesati,
Understanding the Linux Kernel, 3rd Edition,
O'Reilly,
2005
|
|
Andries Brower,
The Linux kernel: Some remarks on the Linux kernel,
2003,
http://www.win.tue.nl/~aeb/linux/lk/lk.html
|
|
R.T.E.H Dresens,
Testing the Linux VFS with two mCRL2 models,
Master thesis,
2006,
|
|
Miguel A. Ferreira, José N. Oliveira,
An Integrated Formal Methods Tool-Chain and its Application to Verifying a File System Model,
Formal Methods: Foundations and Applications, LNCS 5902,
2009,
|
|
Bruce Fields,
Notes on the Linux kernel,
http://www.fieldses.org/~bfields/kernel/
|
|
Glenn S. Fowler, David G. Korn, K.-Phong Vo,
An Efficient File Hierarchy Walker,
In Proc. of the Summer '89 Usenix Conference,
1989,
|
|
Andy Galloway, Gerald Lüttgen, Jan Tobias Mühlberg, Radu I. Siminiceanu,
Model-Checking the Linux Virtual File System,
10th International Conference, VMCAI,
2009,
|
|
Wim H. Hesselink, Muhammad Ikram Lali,
Formalizing a hierarchical file system,
Electronic Notes in Theoretical Computer Science (ENTCS), Volume 259,
2009,
|
|
IEEE and The Open Group,
IEEE Std. 1003.1-2004: Portable Operating System Interface (POSIX) Base Specifications, Issue 6,
2004,
http://pubs.opengroup.org/onlinepubs/009695399/
|
|
IEEE and The Open Group,
IEEE Std. 1003.1-2008: Portable Operating System Interface (POSIX) Base Specifications, Issue 7,
2008,
http://pubs.opengroup.org/onlinepubs/9699919799/
|
|
Java Community Process, Day Software AG,
JCR v2.0 Specification,
2009,
http://www.day.com/specs/jcr/2.0/
|
|
Michael Kerrisk,
The Linux Programming Interface: A Linux and UNIX System Programming Handbook,
No Starch Press,
2010,
|
|
(The GNU/Linux community)
Linux man pages,
2011,
http://linux.die.net/man/
|
|
Robert Love,
Linux Kernel Development, 2nd Edition,
Sams Publishing,
2005
|
|
Wolfgang Mauerer,
Professional Linux Kernel Architecture,
Wiley Publishing, Inc.,
2008
|
|
David Flanagan, Yukihiro Matsumoto,
The Ruby Programming Language,
O'Reilly
2008
|
|
(nLab authors),
The nLab,
2011,
http://ncatlab.org/nlab/
|
|
Ondřej Pavlata,
The Ruby Object Model: Data Structure in Detail,
2011,
http://www.atalon.cz/rb-om/ruby-object-model
|
|
Oracle Corporation,
MySQL 5.0 Reference Manual,
2011,
http://dev.mysql.com/doc/refman/5.0/en/
|
|
Chet Ramey,
Bash - the GNU shell,
;login: the USENIX Association newsletter,
December 1994,
http://tiswww.case.edu/php/chet/bash/article.pdf
|
|
Claudia Salzberg Rodriguez, Gordon Fischer, Steven Smolski,
The Linux Kernel Primer: A Top-Down Approach for x86 and PowerPC Architectures,
Prentice Hall PTR,
2005
|
|
Gayathri Tk, Jayush Luniya,
RTFS : Real Time File System,
2005,
http://moss.csc.ncsu.edu/~mueller/rt/rt05/readings/g7/finalReport.pdf
|
|
Al Viro,
Re: silent semantic changes with reiser4,
LWN.net,
2004,
http://lwn.net/Articles/100183/
|
Browser compatibility
To be viewed correctly,
this document requires advanced browser features to be supported, including
-
CSS level 3,
-
HTML entities (Ϲ, Ͼ, ⤉, Ѧ, |, , , ¯, ·, ř, ‖, ‣, ›, ℅, ℓ, ℕ, Ⅰ, Ⅱ, Ⅲ, Ⅳ, →, ↠, ↦, ↷, ↾, ⇡, ∅, ∋, ∖, ∗, ∞, ≠, ≰, ⊎, ⋈, ⋯, Ά, Ό, □, ○, ◐, ◑, ◖, ◗, ϖ, Λ, Σ, Τ, α, ∩, ∪, é, ≥, >, ↔, …, ∈, ←, ≤, <, —, −, , –, ∉, ‰, ′, →, σ, ⊆, ×, ↑, ü, ),
-
JavaScript.
Document history
June | 2 | 2011 |
The initial release.
|
August | 27 | 2018 |
Correction in the definition of a multidigraph:
.s and .t are functions A → N.
|
September | 2 | 2018 |
Corrections in the Ordered tree section.
The ≤L+ order renamed to ≤L,
the ≤L− order removed.
|
License
This work is licensed under a
Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License.