Web File-Manager
A proposal of a web application
A web file-manager is proposed, mostly in the form of a JavaScript demo-application.
A simple naming structure is designed, utilizing the concept of web symbolic links.
The demo-application focuses on request-response logic
of naming structure manipulation.
There are 6 types of operation presented: the usual F5–F8 copy, move, mkdir and delete operations and the special, link-related, mklink and edlink.
Operation logic is described via characteristic cases
which are organized in specially tailored tables.
The demo-application can be considered as part of the design
of a real web application.
Online demo
The demo application is best accessed via project tables, see
Projects.
Author
| Ondřej Pavlata |
| Jablonec nad Nisou |
| Czech Republic |
Document date
Initial release | July 15, 2011 |
Last major release | July 15, 2011 |
Last update | July 15, 2011 |
Warning
- This document has been created without any prepublication review except those made by the author himself.
- The author is a habitual user of Total Commander [].
Introduction
Advocacy of hierarchical filesystems
Over the last 30 years, the hierarchical filesystem has been the most comprehensible and the most widespread, arguably even the most used, model of data storage.
The model can be understood as an abstraction of spatial containment.
As in the physical world, a directory might mean either of the following:
- a container,
- a container plus its contents.
Note that this ambiguity also applies to the semantics of words like box [] or pack [] (presumably, most human languages contain ambiguities of this sort).
Operational semantics follows the conventions of the physical world:
- Deleting a directory means destroying both the container and its contents.
- Moving a directory means moving both the container and its contents.
- If a, b are directories not contained in each other, then a change in content of a does not affect the content of b.
Limitations of hierarchical filesystems have been increasingly recognized, in particular the single classification constraint:
users are forced to choose just one of possibly many classifications for the filed data.
The problem is illustrated in the following example.
A: 1-dimensional structure |
Sibling directories → | agency-x | causa-rabbit | course-en | my-work | info |

B: 2-dimensional structure |
Topic ↓ / Source → | my-work | info |
agency-x | | |
causa-rabbit | | |
course-en | | |
Suppose that a report has to be worked out as part of a collaboration with agency X.
Suppose that this is a complex report that does not fit suitably in a single filesystem object: it is rather a directory, say report-12, containing multiple files.
In the A case, the problem is that both agency-x and my-work are candidates for where report-12 should be put.
In the B case, the problem is that an artificial order must be imposed on the multiple criteria (here Source and Topic), so that the report is then stored either in agency-x/my-work or in my-work/agency-x.
Numerous solutions have been proposed to support multiple classification within file systems, giving rise to so-called semantic file systems.
Two main approaches can be distinguished [], []:
- Augmented approach. The file system consists of a hierarchical filesystem (observable to the user) equipped with a tool-set that provides alternative view(s) of data.
- Integrated approach. This approach tries to bypass hierarchical filesystems altogether. An integrated semantic file system has no intrinsic subsystem corresponding to a hierarchical filesystem.
As of 2011, after at least two decades of research in the field, proposed solutions ([], [], [], [], []) seem to possess significant limitations. In particular:
- Insufficient support for spatial containment. Classification (tagging) semantics is established at the expense of containment semantics.
- Insufficient support for local naming. Names that identify file system objects have global scope, so that the possibility of naming conflicts is substantially higher than with local identifiers.
As a consequence, none of the solutions seems to be acceptable for practical use, even at the conceptual level.
Experience in the field of semantic file systems shows that the idea of bypassing hierarchical filesystems is not very reasonable.
Here is a quotation from the conclusion made by Yoann Padioleau, one of the researchers behind the Logic File System []:
Trying to replace hierarchical filesystems by a Logic File System
was a noble research idea, but a bad idea.
Using LFS as an additional filesystem,
an additional way to access your actual hierarchical filesystem is far better;
less ambitious, but better.
We can draw our conclusion as follows:
Filesystem of the Web
There are 3 main alternatives for what can be understood by the term hierarchy:
- (A) A tree [], used in the context of hierarchical filesystems.
- (B) A forest [].
- (C) A partially ordered set, used e.g. within the term hierarchy of concepts [].
We can write (A) ⊂ (B) ⊂ (C), meaning that (A) is the strictest sense.
If we use the (B) interpretation we can arguably claim that
a significant portion of the World Wide Web
is constituted of (or is expressible as) a hierarchical filesystem.
It is a global filesystem, just as the Web itself is global.
The Web can be considered as a huge storage medium consisting of web servers.
Most servers host hierarchical filesystems or at least contain data that can be
meaningfully expressed (and manipulated) as hierarchical filesystems.
To discover how the concept of the Web as a filesystem is currently supported, we pose the following questions:
- Which standards (documents, research papers, concepts) describe the filesystem view of the Web?
- Which web application provides global filesystem access to the Web? (Give the URL of such an application.)
As of 2011, the answers to these questions seem to be: 1. None, 2. None.
Main goals
This proposal aims at providing a filesystem view of the Web.
It is considered natural that such a view is provided by a web application: a Web File Manager accessible via a web browser.
Main goals for the file manager can be summarized as follows.
- Provide a world-wide web-based Virtual File System accessible via a web browser.
- Use existing filesystem-conformant parts of the web, in particular, FTP sites.
- Make the filesystem operation as convenient as with desktop applications (so that the user can have the feeling of the web being added to his/her computer).
Data model
Data is categorized into layers and spheres.
This yields a 2d-partition as follows.
Layers ↓ / Spheres → | Session data | local | World-Wide-Web |
Account layer (accounts) | | | |
Base layer (base nodes) | | | |
The base layer
The base layer is a huge web-wide forest of base nodes
(or just nodes).
Thus, it can be written as a structure (N, ≤),
where
- N is the set of base nodes,
- ≤ is a partial order (a descendancy relation) on N
such that every component is a finite tree in the usual sense.
Three basic types of base nodes are recognized:
- a directory,
- a (regular) file, and
- a (rich (symbolic)) link.
As usual, non-directory nodes are not allowed to have descendants
(i.e. they have to be among the leaves of the forest).
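A minimal JavaScript sketch of the base forest may help fix ideas. The names NodeType, makeNode, addChild and isDescendantOrSelf are illustrative assumptions, not part of the proposal or of the demo-application. The check on child names anticipates the naming conditions given under Names.

```javascript
// Hypothetical node model; all identifiers here are illustrative only.
const NodeType = Object.freeze({ DIR: "dir", FILE: "file", LINK: "link" });

// A root node has no parent; its name may be null.
function makeNode(type, name) {
  return {
    type,
    name,
    parent: null,
    // Only directories carry a child map; files and links stay leaves.
    children: type === NodeType.DIR ? new Map() : null,
  };
}

// Attach `child` under `parent`, enforcing the leaf constraint.
function addChild(parent, child) {
  if (parent.type !== NodeType.DIR) {
    throw new Error("non-directory nodes must be leaves");
  }
  if (parent.children.has(child.name)) {
    throw new Error("name already used in this directory");
  }
  child.parent = parent;
  parent.children.set(child.name, child);
  return child;
}

// n ≤ m: n is m itself or a descendant of m.
function isDescendantOrSelf(n, m) {
  for (let x = n; x !== null; x = x.parent) {
    if (x === m) return true;
  }
  return false;
}
```

Each tree of the forest is then simply a node without a parent together with everything reachable through its child maps.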
The account layer
The account layer is a web-wide set of hosts and accounts,
denoted H and A, respectively.
There is the usual containment relation between hosts and accounts:
each account is contained
in exactly one host.
A forest structure (A, ≤) might be assumed,
with per-host components,
but in comparison with the base forest,
the descendancy of accounts is supposed to be quite shallow and less important.
Accounts are partially mapped to directories of the base forest
via the target-directory function .t : A ↷ N.
There can be accounts with the target directory undefined.
There is an initial account a
for which .t is always defined.
Most accounts have a connection state: they can be connected or disconnected.
Contexted nodes
We introduce the set Ѧ of (account-)contexted nodes by
- Ѧ = { (a,n) ∈ A × N | n ≤ a.t },
i.e. a contexted node (a, n) is a base node n together with the context of an account a such that n belongs to the tree rooted at a's target directory.
Obviously, the structure (Ѧ, ≤) where
- (a,n) ≤ (b,m) iff a = b and n ≤ m
is a component-wise finite forest.
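The two definitions above translate almost literally into code. The following is a sketch under assumed object shapes (a base node is `{ parent }`, an account is `{ t }`); the function names are hypothetical.

```javascript
// Assumed shapes: a base node is { parent }, an account is { t }, where t is
// the target directory, or undefined when .t is not defined for the account.

// n ≤ m in the base forest: n is m itself or a descendant of m.
function isBelow(n, m) {
  for (let x = n; x; x = x.parent) {
    if (x === m) return true;
  }
  return false;
}

// (a, n) belongs to Ѧ iff a.t is defined and n ≤ a.t.
function isContextedNode(a, n) {
  return a.t !== undefined && isBelow(n, a.t);
}

// (a, n) ≤ (b, m) iff a = b and n ≤ m.
function contextedLeq([a, n], [b, m]) {
  return a === b && isBelow(n, m);
}
```

Note that the same base node can appear in several contexted nodes when the target directories of different accounts lie on the same branch.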
Names
Accounts and base nodes are labelled with component names
or just names (or also components).
The naming domain Σ
consists of three mutually disjoint domains of component names:
- Σdown: local names (e.g. orange, docs), also called down names,
- Σroot: global names (e.g. ftp.wave.vw::timber), also called root names,
- Σspec: special names (e.g. .. or .).
The following conditions apply to name labelling:
- Each non-root base node (i.e. any node which has a parent directory) has a local name which is unique within its parent directory.
- Each account has a unique global name.
- The initial account is named by :: .
Global names
Global names are of the form <host name>::<user name>, where <host name> is the host name and <user name> is the account name within the host.
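As an illustration, a global name can be split into its two parts as sketched below. The function name parseGlobalName is hypothetical, and splitting at the first `::` is an assumption: the proposal does not state validity rules for host or user names.

```javascript
// Split a global name of the form <host name>::<user name>.
// Returns null when the string contains no "::" separator.
function parseGlobalName(name) {
  const i = name.indexOf("::");
  if (i < 0) return null; // not a global name
  return { host: name.slice(0, i), user: name.slice(i + 2) };
}

// Example from the text, plus the name of the initial account.
console.log(parseGlobalName("ftp.wave.vw::timber"));
console.log(parseGlobalName("::"));
```

Under this reading, the initial account name `::` is the global name with both parts empty.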
Lookup
Navigation between contexted nodes is established using lookup operators .ℓ1(), .ℓ2() : Ѧ × Σ ↷ Ѧ as follows.
- n.ℓ1(α) = m if one of the following holds:
  - α is a local name and m is an α-labelled child of n,
  - α is a global name and m equals x.t for an α-labelled account x,
  - α is ".." and m is the parent of n,
  - α is "." and m = n.
- .ℓ2() is a restriction of .ℓ1() such that:
  - Only directory nodes are allowed to be in the domain.
  - Accounts which have a connection state must be connected.
  - For a global name α, the partial application .ℓ2(α) : Ѧ ↷ Ѧ is constant (or empty).
- A further version of the lookup operator is parametrized with connection passdata that can be used for establishing an account connection on the fly.
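The lookup rules can be sketched as two functions. This is an illustrative reading, not the demo-application's code: the object shapes, the `accounts` map from global names to accounts, and the test "contains `::`" for recognising a global name are all assumptions. An account without a connection state is modelled by leaving `connected` undefined, and the sketch checks the connection state of the account of the resulting node only.

```javascript
// Assumed shapes:
//   node    = { type: "dir" | "file" | "link", parent, children: Map | null }
//   account = { t: node | undefined, connected: true | false | undefined }
//   ctx     = { account, node }   (a contexted node)

// ℓ1: partial lookup; returns undefined where the operator is not defined.
function lookup1(ctx, alpha, accounts) {
  const { account, node } = ctx;
  if (alpha === ".") return ctx;
  if (alpha === "..") {
    // Within Ѧ, the target directory of the account has no parent.
    if (node === account.t || !node.parent) return undefined;
    return { account, node: node.parent };
  }
  if (alpha.includes("::")) {
    // Global name: jump to the target directory of the named account.
    const x = accounts.get(alpha);
    return x && x.t ? { account: x, node: x.t } : undefined;
  }
  // Local name: α-labelled child of the current node, if any.
  const child = node.children ? node.children.get(alpha) : undefined;
  return child ? { account, node: child } : undefined;
}

// ℓ2: restriction of ℓ1 to directories and to connected accounts.
function lookup2(ctx, alpha, accounts) {
  if (ctx.node.type !== "dir") return undefined;
  const result = lookup1(ctx, alpha, accounts);
  if (result && result.account.connected === false) return undefined;
  return result;
}
```

For a global name, the result of `lookup2` does not depend on `ctx` (as long as `ctx` is a directory), which matches the requirement that the partial application be constant or empty.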
Pathnames
Pathnames, or paths for short, are just (finite) sequences of component names; the set of all paths is denoted Σ∗.
The usual nomenclature of pathnames applies. In particular:
- A path starting with a global name is called absolute.
- If, in addition, all its other names are local, then it is called canonic.
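The two notions can be checked mechanically once paths are represented as arrays of component names. The sketch below assumes, for illustration only, that a global name is recognised by containing `::` and that the special names are exactly `.` and `..`; the function names are hypothetical.

```javascript
// Classify a component name into one of the three naming domains.
function nameKind(component) {
  if (component === "." || component === "..") return "spec";
  return component.includes("::") ? "root" : "down";
}

// A path is absolute if it starts with a global (root) name.
function isAbsolute(path) {
  return path.length > 0 && nameKind(path[0]) === "root";
}

// A path is canonic if it is absolute and all other names are local.
function isCanonic(path) {
  return isAbsolute(path) && path.slice(1).every((c) => nameKind(c) === "down");
}

console.log(isCanonic(["ftp.wave.vw::timber", "docs", "orange"]));
console.log(isCanonic(["ftp.wave.vw::timber", "..", "orange"]));
```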
Pathname resolution
Paths are used to address nodes by path resolution, which is a function based on composition of .ℓ2() lookups.
There are several kinds of path resolution.
The most important criterion is how links are involved.
Similarly to POSIX symlinks, any link node n provides a path, n.lpath.
If the link is encountered, this path is potentially subject to the resolution process.
The following modes of link resolution (of path resolution) are considered:
- literal: perform no link resolution,
- full: resolve all links,
- normal: resolve all links except for the last member of the path.
See appendix for details.
Resolvability of absolute paths is independent of the directory where the resolution starts.
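The three modes can be illustrated by a simplified resolver. This is only a sketch: account contexts and connection states are left out, global names are looked up in an assumed `roots` map, a link path is resolved relative to the directory containing the link (as with POSIX symlinks), and the bound MAX_LINK_DEPTH is an assumption introduced here to stop link cycles. None of these details are taken from the appendix.

```javascript
const MAX_LINK_DEPTH = 8; // assumed bound against link cycles

// Helper for building test trees: node = { type, parent, children, lpath }.
function mkNode(type, parent, name, lpath) {
  const node = { type, parent, children: type === "dir" ? new Map() : null, lpath };
  if (parent) parent.children.set(name, node);
  return node;
}

// One lookup step from a directory; undefined signals a lookup fault.
function step(dir, name, roots) {
  if (dir.type !== "dir") return undefined;
  if (name === ".") return dir;
  if (name === "..") return dir.parent || undefined;
  if (name.includes("::")) return roots.get(name);
  return dir.children.get(name);
}

// mode is "literal", "full" or "normal".
function resolve(start, path, mode, roots, depth = 0) {
  let node = start;
  for (let i = 0; i < path.length; i++) {
    node = step(node, path[i], roots);
    if (!node) return undefined;
    const isLast = i === path.length - 1;
    const follow = mode === "full" || (mode === "normal" && !isLast);
    if (node.type === "link" && follow) {
      if (depth >= MAX_LINK_DEPTH) return undefined;
      // Resolve the link path starting at the directory containing the link.
      node = resolve(node.parent, node.lpath, "full", roots, depth + 1);
      // A link cannot be resolved to a non-directory.
      if (!node || node.type !== "dir") return undefined;
    }
  }
  return node;
}
```

In literal mode a link in the middle of a path stops the resolution, because the next step requires a directory. In normal mode the last member of the path is returned as the link itself, which is what link-manipulating operations such as edlink would need.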
The logical forest
The logical forest (or logical view, cf. [], []) corresponds to canonical paths resolvable in normal link resolution, i.e. it is a structure (Ͼ, ≤) where
- Ͼ is the set of canonical paths resolvable in normal resolution mode, and
- ≤ is a descendancy relation defined by path extension, i.e. p is a descendant of q iff p, as a sequence, is an extension of q.
Links
Links are similar to POSIX symbolic links.
The main differences are as follows:
- A link cannot be resolved to a non-directory.
- A link can provide additional parameters for global name resolution.
There are two basic kinds of links: plain and non-plain.
A plain link just contains the lpath.
The content of a non-plain link is roughly of the form
<global name> + <connection parameters> + <subsequent path>.
The components <global name> and <subsequent path> determine the lpath.
Connection parameters provide data to accomplish global name resolution pertinent to <global name>.
In particular, they may contain
- the name of the protocol suggested to access resources,
- a password.
Note that the only way the connection parameters can affect the path resolution is (un)restriction.
Non-plain links provide a persistent counterpart to a login form.
The persistence does not extend to the password:
by default, the password is not stored with the link;
it is only used for establishing an account connection.
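A possible in-memory shape for the two kinds of links is sketched below. The field names (kind, globalName, params, subpath) and both function names are hypothetical; the proposal only says that the content is roughly of the stated form.

```javascript
// Build a non-plain link from login-form-like input.
// By default the password is used only for connecting and is not stored.
function makeNonPlainLink({ host, user, protocol, password, subpath }, storePassword = false) {
  const link = {
    kind: "non-plain",
    globalName: `${host}::${user}`,
    params: { protocol },
    subpath: [...subpath],
  };
  if (storePassword) link.params.password = password;
  return link;
}

// The lpath of a plain link is stored directly; for a non-plain link it is
// determined by the global name and the subsequent path.
function lpathOf(link) {
  return link.kind === "plain" ? link.lpath : [link.globalName, ...link.subpath];
}
```

With this shape, the lpath of a non-plain link always starts with a global name, so it is an absolute path.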
VFS spheres
The VFS operates on world-wide-web data.
However, part of the data structure is session data that is pertinent only to the VFS instance.
In particular, the root account (named by ::) and the relevant base node tree are transient.
Data manipulation (VFS methods)
The VFS provides methods to view and modify its data.
The following table lists modification methods which are considered mandatory for the manager.
Method Name | Brief Description | Analogue: HTTP / WebDAV | Analogue: POSIX Utility [] |
delete | Delete items from the VFS. For each item, resolve it to a tree of the base forest, and delete the tree, node after node. | DELETE | rm |
copy | Copy items to a given location. For each item, resolve it to a tree of the base forest, and copy the tree, node after node, to the requested location. | COPY | cp |
move | Rename or move items. For each item, resolve it to a tree of the base forest, and either rename the root of the tree or move the tree, node after node, to the requested location, or both. | MOVE | mv |
mkdir | Create a directory or directories with the requested (path)name. | MKCOL | mkdir |
mklink | Create a new link and store it in the requested location. | | ln -s (*) |
edlink | Update a link: edit the content of an existing link. | | |
- Data retrieval methods are not listed; they are not defined by this proposal.
- No methods for account retrieval / manipulation are defined by this proposal.
- Unlike HTTP/WebDAV methods, which work on a single-resource-tree basis, delete, copy, move, and mkdir work on multiple items.
- (*) Since Windows Vista / Windows Server 2008, Windows provides the mklink command for creating symbolic links on NTFS volumes [].
WFM operation
WFM versus VFS
The Virtual Filesystem can be considered a sub-application of the Web File Manager.
The WFM can be understood in both the broad and narrow sense:
[Diagram: the WFM in the broad sense comprises the WFM in the narrow sense and the VFS.]
The file manager allows a user to view and/or modify the VFS.
The manager and the VFS act in the client-server manner.
The VFS receives requests from the manager, processes them and returns responses.
Tasks and turns
The manager provides VFS manipulation via manipulation tasks, or simply tasks.
A task can be considered an interactive counterpart to a VFS method.
The interactivity is achieved by the following:
- adding UI (dialog) control,
- iteration.
Task operation is illustrated by the following diagram:
[Diagram: within the File Manager, the user selects items in a Directory View and starts a task; a Dialog sends a Request to the VFS; the VFS processes the request (applies a VFS method) and returns a Response to the Dialog, from which either the Directory View is updated or the next Request is built.]
Task iteration cycles are called turns.
The VFS works on a turn basis:
performance of a VFS method corresponds to a part of a single turn
(green-bordered boxes).
The WFM (in the narrow sense) is responsible for
- updating directory views,
- displaying the response obtained from the VFS,
- providing the most suitable interface for user control, in particular, for building the next turn request if the task is unfinished.
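The turn logic can be sketched as a loop on the manager side. Everything below is illustrative: the VFS is replaced by a hypothetical stub that handles at most `budget` items per turn (standing in for whatever makes a turn stop early, e.g. a naming conflict), and the request and response fields are assumptions. The one detail taken from the project descriptions is that the stop position of a turn becomes the start position of the subsequent turn.

```javascript
// Hypothetical VFS stub: handles at most `budget` names per turn.
function makeVfsStub(names, budget) {
  return {
    process(request) {
      const start = request.startPos;
      const end = Math.min(start + budget, names.length);
      return {
        deleted: names.slice(start, end), // what this turn accomplished
        finished: end === names.length,   // is the whole task complete?
        stopPos: end,                     // where the next turn should start
      };
    },
  };
}

// Manager side: iterate turns until the task is finished or the user stops.
// `confirmNextTurn` stands in for the dialog shown between turns.
function runTask(vfs, confirmNextTurn) {
  const turns = [];
  let request = { method: "delete", startPos: 0 };
  for (;;) {
    const response = vfs.process(request);
    turns.push(response.deleted);
    if (response.finished || !confirmNextTurn(response)) return turns;
    // The stop position of this turn is the start position of the next one.
    request = { method: "delete", startPos: response.stopPos };
  }
}
```

Keeping the position in the request rather than in the VFS means that the VFS needs no per-task state between turns, which fits the request-response style described above.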
Demo application
Main goals
- Provide a starting point for application development.
In particular:
- Describe proposed VFS methods.
- Describe request-response iteration (the turn logic).
- Give an impression of the possible look and feel.
Technical parameters
- No server side is present; the demo-app runs completely in the browser as a JavaScript application.
- PHP is used only as a dynamic assemblage tool.
- No (external) JavaScript library or Ajax framework is used.
- More than 30 thousand lines of code, more than 1 thousand classes.
Browser support
The application has been developed for use in the newest (as of 2011) Gecko or WebKit
based browsers.
The following provides browser compatibility status according to tests.
Browser | Version | Status |
Firefox | 3.6, 4.0 | Recommended (the development browser) |
Google Chrome | (12.0) | Recommended |
Safari | 5.0 | Recommended |
Opera | 11.11 | Usable, with deficiencies (most notably, there are screen update problems). |
Internet Explorer | 7.0, 8.0 | Usable, with layout deficiencies. |
Projects
A project is a data entry containing complete VFS + manager state.
The demo-application is equipped with a set of projects such that each project
exhibits some characteristics of operational logic.
Projects are organized in a naming hierarchy,
and can be accessed / applied
via project tables which are tailored for reasoning about operational logic.
Path resolution
PathResol > * > *
The main table contains the following items:
PathResol > * > * > Tail
The table refers to cases which can occur when path resolution stops under all of the following conditions simultaneously:
- at zero link-depth,
- at a directory,
- due to a local lookup fault.
The unresolved rest of the path is called the path tail.
PathResol > * > * > RootUp
The table refers to parent lookup faults.
Going up the initial root
Going up a root (after a link)
Going up a root (after a root-name)
PathResol > * > * > RootName
The table refers to the very unusual cases when the path (directly) contains
a root-name.
Final to a (root-named) directory
Going up a root (after a link)
Going up a root (after a root-name)
Link manipulation
Link > * > *
The table refers to cases of link creation / modification.
Entirely resolvable
Link content is resolvable to a target directory.
Empty Host and/or User Name
The Host and/or User Name field has empty value.
Invalid Host and/or User Name
The Host and/or User Name field has non-empty invalid value.
Host not found
There is no account with the requested host.
User not found
There is no account with the given root-name.
Empty Password
A password is required for the root-name resolution
and the given password is empty.
Invalid Password
The given password is invalid.
Invalid Directory Path
The subsequent directory path does not conform to the naming domain.
Unresolved Directory Path
The subsequent directory path is unresolved.
No broken link is encountered - the resolution stops at zero link-depth.
Subsequently Broken Link
The subsequent directory path refers to a broken link.
Operation
Oper > *
Oper > * > Overwrite
The table refers to cases of name conflicts encountered in copy/move operations.
Overwrite files (as items)
Both source item path and target item path are resolved to regular files.
Overwrite files (as nodes during merge)
A source-target directory merge occurs.
A subpath is resolved to a regular file in both the source and target directory trees.
Merge directories
Both source item path and target item path are resolved to directories
so that a source-target directory merge is suggested.
Merge directories, overwrite files
A combination of the previous two cases.
(The overwrite files as nodes case occurs after multiple iterations.)
Type mismatch (items)
Either the source item path is resolved to a directory and
the target item path is resolved to a non-directory or vice versa.
Type mismatch (nodes, during merge)
A source-target directory merge occurs.
A subpath is resolved to a directory in the source tree
and to a non-directory in the target tree or vice versa.
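The case distinction of this table can be summarised by a small classifier over node types. The function name and the result labels are hypothetical, and the "unspecified" branch marks combinations (such as a link meeting a regular file) that the cases above do not cover.

```javascript
// Classify a name conflict between a source node and an existing target node.
// Types are "dir", "file" or "link".
function classifyConflict(sourceType, targetType) {
  const sourceIsDir = sourceType === "dir";
  const targetIsDir = targetType === "dir";
  if (sourceIsDir && targetIsDir) return "merge";
  if (sourceIsDir !== targetIsDir) return "type-mismatch";
  if (sourceType === "file" && targetType === "file") return "overwrite";
  return "unspecified"; // e.g. link vs. file: not covered by the table
}

console.log(classifyConflict("dir", "dir"));
console.log(classifyConflict("file", "dir"));
```

The same classifier applies both to items and to nodes met during a merge; only the point in the traversal at which it is called differs.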
Oper > * > StartPos
The table refers to cases of start position within the first item's (source) tree.
Start before the whole item (default)
By default, the start position is placed before the item.
Start after the whole item (abnormal)
The opposite of the previous case.
This is an artificial case – it should not occur during standard user operation.
Start inside the item
The start position is placed before/after some node within the first item's (source) tree.
Such a request is created after a turn with incomplete operation
(the stop position of a turn becomes the start position of the subsequent turn).
Invalid subpath component
The start position subpath contains an invalid component.
This is an artificial case.
Nonexistent node
The start position subpath is not resolved.
This can occur if the subpath's node is deleted by another agent between turns.
Subpath having a dotdot
The start position subpath contains '..'.
This is an artificial case.
Subpath having a root-name
The start position subpath contains a root-name.
This is an artificial case.
Oper > * > StartPos > Point
The table refers to cases of visit logic.
During tree deletion, copy or move, tree nodes are both pre-visited and post-visited.
Node copy occurs on pre-visit, node deletion occurs on post-visit.
Before pre-visit
The start position is in the pre-visit phase before the node referred to by the given subpath.
After pre-visit
The start position is in the pre-visit phase after the node.
Before post-visit
The start position is in the post-visit phase before the node.
After post-visit
The start position is in the post-visit phase after the node.
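The visit order can be sketched with a generator that emits a pre-visit and a post-visit event for every node. The tree shape (`{ name, children }` with an array of children) and the function names are assumptions; treating a move as "copy on pre-visit, delete on post-visit" is one reading of the rule stated above.

```javascript
// Yield a pre-visit before and a post-visit after the subtree of each node.
function* visits(node) {
  yield { phase: "pre", node };
  if (node.children) {
    for (const child of node.children) yield* visits(child);
  }
  yield { phase: "post", node };
}

// Log the node operations of a move: copy on pre-visit, delete on post-visit.
function moveTreeLog(root) {
  const log = [];
  for (const { phase, node } of visits(root)) {
    log.push(phase === "pre" ? `copy ${node.name}` : `delete ${node.name}`);
  }
  return log;
}

const sampleTree = { name: "a", children: [{ name: "b" }, { name: "c", children: [{ name: "d" }] }] };
console.log(moveTreeLog(sampleTree).join(", "));
```

A start position is then a point in this event sequence: before or after the pre-visit of a node, or before or after its post-visit, which gives the four cases listed above.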
Basic tables
For the most part, the following tables present a re-grouping of already described
cases.
Delete > Basic
Leaf node
Leaf node deletion.
Non-empty directory
Non-leaf node deletion.
Resilient node
A non-deletable node encountered during tree deletion.
Incorrect start position
See Oper > * > StartPos > Nonexistent node.
Iterated non-leaf node
A non-leaf node is encountered on post-visit during tree deletion.
This case can occur for internal start positions.
MkDir > Basic
The table refers to cases of directory creation.
Single new directory (standard)
A single new directory is requested and created.
Multiple items (nonstop)
Multiple new directories created without any naming conflict.
Existing directory
Creation of an already existing directory is requested.
Existing directory (special names)
Directories referred to by '..', '.' and '/' are requested to be created.
Missing ancestors
Some intermediate paths refer to non-existing directories.
Multiple items (various stops)
Combination of cases (a) existing directories and (b) missing ancestors
via multiple items.
Existing non-directory
A naming conflict with an existing non-directory.
Non-directory parent
Path prefix refers to a regular file.
Non-directory parent (via a link)
Path prefix refers to a regular file (via a link).
New subpath having a dot-segment
Directory chain creation; the tail subpath contains a '.'.
New name having a trail
Directory chain creation; the tail subpath ends with '/'.
Invalid name
A name outside the naming domain is requested.
New subpath having dotdot
The requested tail subpath contains '..' (not supported).
Root name
Creation of a root-named directory is requested.
Multiple errors
Multiple errors occur in combination.
* > Basic
Single directory copied/moved
A single directory copy/move.
Single directory copied/moved & renamed
A single directory copy/move with a new end name.
Missing target directory (missing ancestors)
The requested target directory has to be (auto)created.
Invalid target directory path
The target directory path contains an invalid name.
Non-directory target parent
The target directory path points to a regular file.
Invalid target name
The requested target name is invalid.
Incorrect start position
The start position subpath does not point to an existing node.
Target-source descendancy
Requested target directory is a descendant of the source directory in the base forest.
Target-source identity
Requested target directory is identical to the source directory.
* > Overwrite
The following can be considered further examples of already described cases.
Overwrite files (as items)
Overwrite files (as nodes during merge)
Merge directories
Merge directories, overwrite files
Type mismatch (items)
Type mismatch (nodes, during merge)
* > Links
Copy / Move link items
An example of link copy/move.
Default
Initial state
Suggested initial state of the file manager.
A link named 'a' is to be created in the i-links built-in directory.
MultiTask
DirView
The table refers to cases of directory view state.
Bibliographic references
- Stephan Bloehdorn, Max Völkel, TagFS – Tag Semantics for Hierarchical File Systems, 2006
- Lisa Dusseault, WebDAV: Next-Generation Collaborative Web Authoring, Prentice Hall PTR, 2003
- L.M. Dusseault (Ed.), HTTP Extensions for Web Distributed Authoring and Versioning (WebDAV), RFC 4918, 2007, http://www.webdav.org/specs/rfc4918.html
- Farlex, Inc., The Free Dictionary, http://www.thefreedictionary.com/
- R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, T. Berners-Lee, Hypertext Transfer Protocol -- HTTP/1.1, RFC 2616, 1999, http://labs.apache.org/webarch/http/draft-fielding-http/rfc2616.html
- Christian Ghisler, Total Commander, http://www.ghisler.com/
- Burra Gopal, Udi Manber, Integrating Content-Based Access Mechanisms with Hierarchical File Systems, 1999, http://www.cis.upenn.edu/~bcpierce/courses/dd/papers/gopal.pdf
- IEEE and The Open Group, IEEE Std. 1003.1-2008: Portable Operating System Interface (POSIX) Base Specifications, Issue 7, 2008, http://pubs.opengroup.org/onlinepubs/9699919799/
- Georges N'dou Kouame, Kouassi N'goran, Semantic file Systems, 2005, http://stromboli.it-sudparis.eu/~bernard/ASR/04-05/projets/systeme-fichiers-semantique/systeme-fichiers-semantique.pdf
- David Ingram, Insight: A Semantic File System, 2008, http://www.dmi.me.uk/code/insight/final-report.pdf
- Microsoft Corporation, TechNet, http://technet.microsoft.com/
- Object Services and Consulting, Inc., Semantic File Systems, 1997, http://www.objs.com/survey/OFSExt.htm
- Yoann Padioleau, Logic File System (wiki), 2008, http://padator.org/wiki/wiki-LFS/doku.php
- Yoann Padioleau, Olivier Ridoux, A Logic File System, 2003, http://www.usenix.org/event/usenix03/tech/full_papers/full_papers/padioleau/padioleau.pdf
- Ondřej Pavlata, The Linux VFS Model: Naming structure, 2011, http://www.atalon.cz/vfs-m/linux-vfs-model/
- Chet Ramey, Bash - the GNU shell, ;login: the USENIX Association newsletter, December 1994, http://tiswww.case.edu/php/chet/bash/article.pdf
- Tx0, Tagsistant, 2010, http://www.tagsistant.net/
- J. Whitehead, G. Clemm, J. Reschke, Web Distributed Authoring and Versioning (WebDAV) Redirect Reference Resources, RFC 4437, 2006, http://greenbytes.de/tech/webdav/rfc4437.html
- R. Wille, Restructuring lattice theory: an approach based on hierarchies of concepts, 1982
Epilogue
Don't deceive yourself. You did know it – you have always known it.
George Orwell, 1984
License
This work is licensed under a
Creative Commons Attribution 3.0 License.