Understanding the Keybase filesystem
(For a gentler introduction to KBFS, see our launch announcement.)
The Keybase filesystem (KBFS) is a distributed filesystem with end-to-end encryption and a global namespace. The KBFS code is open source.
“Distributed” means you can access it from any device.
“Filesystem” means that there is no sync model -- files stream in and out on demand. Among other things, that means that files on KBFS don’t permanently take up space on your devices. (KBFS does use the local disk for temporary and transient data; see the "Local disk usage policy" section below for more details.)
“End-to-end encryption” means that all data stored in KBFS have guaranteed integrity and authentication, and also confidentiality when desired, and that only the people intended to read or write a piece of data can do so. In particular, we (Keybase) cannot change, read, or even know the names of your private files.
“Global namespace” means that each file on KBFS has a single unique path, regardless of the device from which you access it.
The Keybase namespace
A Keybase path has the form
/keybase/public/canonical_top_level_folder_name/subpath or
/keybase/private/canonical_top_level_folder_name/subpath. Files and
folders under /keybase/public are signed, and files and folders under
/keybase/private are encrypted in addition to being signed.
A top-level folder (TLF) is a subdirectory of either /keybase/public or
/keybase/private. A canonical name for a public folder is in the form
writer1,writer2,..., and a canonical name for a private folder is in the form
writer1,writer2,...#reader1,reader2,..., where each writer or reader
is a keybase username, and the lists of writers and readers are
alphabetized separately. The writers for a TLF can both read and write
to that TLF, whereas readers can only read.
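The naming rule above can be sketched in a few lines of Python. This is only an illustration, not KBFS code: the function names are hypothetical, and it assumes usernames are already lowercase and that "alphabetized" means a plain lexicographic sort.

```python
def canonical_tlf_name(writers, readers=()):
    """Build a canonical TLF name from Keybase usernames (hypothetical helper)."""
    # Writers and readers are sorted separately, as the text describes.
    name = ",".join(sorted(set(writers)))
    if readers:
        # Readers, when present, follow a '#' separator.
        name += "#" + ",".join(sorted(set(readers)))
    return name


def tlf_path(writers, readers=(), public=False):
    """Return the full path of a TLF under /keybase/public or /keybase/private."""
    root = "/keybase/public" if public else "/keybase/private"
    return f"{root}/{canonical_tlf_name(writers, readers)}"
```

For example, tlf_path(["you", "me"]) gives /keybase/private/me,you regardless of the order in which the writers are supplied.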
The canonical name for a TLF encodes which keybase users can read or write to it. For a public folder, it is guaranteed that only the writers for that folder have written to it. This is verified by the keybase client, which checks the signatures of the updates to that folder against the writers’ public keys.
For a private folder, it is guaranteed that only the writers for that folder have written to it, and only the writers and readers for that folder can read it. This is also verified by the keybase client.
For further details on the crypto design of KBFS, see https://keybase.io/docs/kbfs-crypto.
You can also access keybase files through paths with non-canonical TLF names. The simplest example of a non-canonical TLF name is one where the order of the writers or readers is not alphabetical, but more useful examples involve using assertions instead of usernames. See https://keybase.io/docs/command_line for assertion syntax.
- akalin trivially resolves to the Keybase user akalin.
- fakalin@twitter resolves to the Keybase user who has proven ownership of the fakalin account at Twitter.
- fakalin@twitter+akalin@github resolves to the Keybase user who has proven ownership of the fakalin account at Twitter as well as the akalin account at GitHub.
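To make the shape of these assertions concrete, here is a minimal parsing sketch. It covers only the three forms shown above (a bare username, name@service, and terms joined with +); the real assertion syntax is richer, and the function name and the "keybase" label for bare usernames are assumptions made for illustration.

```python
def parse_assertion(expr):
    """Split an assertion such as 'fakalin@twitter+akalin@github' into
    (name, service) pairs. Hypothetical helper, not the Keybase parser."""
    parts = []
    for term in expr.split("+"):
        name, sep, service = term.partition("@")
        if not name or (sep and not service):
            raise ValueError(f"malformed assertion term: {term!r}")
        # A bare name is treated as a Keybase username (an assumption here).
        parts.append((name, service if sep else "keybase"))
    return parts
```

Every pair in the result must be satisfied by the same Keybase user for the assertion to resolve.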
At the filesystem level, each non-canonical TLF name resolves as a symlink to the canonical TLF name.
At a high level, end-user devices are trusted and Keybase/KBFS/other servers are untrusted. On desktops, we run all KBFS processes as the current user and use OS-level secret stores, but we don’t attempt to protect against other processes owned by the same user or root.
The KBFS client doesn’t trust any data coming from Keybase or KBFS servers, and verifies any received data against the relevant users’ public keys. For the nitty-gritty details, see the KBFS crypto doc. In particular, the KBFS servers cannot see into the contents or structure of your (non-public) files.
That having been said, the KBFS servers know which users can access which data, and will only serve data to an authorized reader of a TLF with a valid session. Furthermore, they will only serve historical (archived) data to writers of a TLF, even for public TLFs.
A TLF’s security is defined on a per-user basis, but implemented on a per-device basis. Most of the time this is invisible, but not always. In particular, when you install Keybase on a new device, that new device won’t be able to read private data unless another of your devices is online (and it might take a few minutes to rekey). If it’s your first device, you won’t be able to read from pre-existing shared private folders until another writer’s device is online. When you revoke a device, new data will be encrypted such that the revoked device has no access, but old data is not re-encrypted (though our servers won’t serve it to revoked devices).
There are no guarantees as to the relative ordering of operations in two different TLFs.
A TLF is best thought of as a linear sequence of changes. If only a single device is operating on a TLF, then each change it makes appends to this sequence, which is called the “master branch”. However, if there are multiple devices, then a change from another device may be added to the master branch before one from the current device. In that case, a separate branch exclusive to the local device is forked off. In this state (called “staged”), operations on the local device aren’t normally visible to any other devices, although they’re still persisted locally (or, with journaling turned off, on KBFS servers) in case of process or device restarts. Then a background process periodically attempts to merge this branch into the master branch, resolving conflicts as necessary. When this succeeds, the changes from the local device are visible to everyone else (but see the section on conflict resolution below).
Within a single device, KBFS then behaves more or less like a normal (i.e., POSIX-compliant) filesystem, except for the exceptions listed below. However, it’s difficult to make general statements about the relative ordering and visibility of operations in a TLF between two different devices. But in general, once an application does an fsync that is sent to the KBFS servers successfully from a device, all the previously-written data will eventually be visible to a second device. Note that a file write is a purely local operation: writes to a file on a device will be invisible to a second device until the next sync or close, and after the sync they will eventually be visible to a second device. There is also a background process that occasionally syncs file data, in case the application does not sync or close the files.
Currently, KBFS requires network connectivity, and no offline reads are possible unless the data being read happens to be cached in memory. In theory offline writes are possible, and will be queued up on your local disk until network access is available. However, writing often involves reading first, especially for updating the directory that contains the file, so in practice the lack of offline reads could hinder offline writes.
For performance reasons, when KBFS receives a write call (either a file write or a directory modification), KBFS buffers the write in memory and responds successfully, before the new data is persisted on disk or on the servers. This is POSIX-compliant, but due to the latencies involved, KBFS holds data in memory for longer than most mounted file systems. By default, it holds data for up to 1 second (or until 100 operations, or 25 MB of file data, have been buffered).
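The three default thresholds can be pictured as a simple flush policy. The class below is a hypothetical sketch, not the KBFS implementation: the names are invented, and it assumes "25 MB" means 25 MiB and that the one-second window starts at the first buffered operation.

```python
import time


class WriteBuffer:
    """Toy model of a buffer that flushes on age, operation count, or size."""

    MAX_AGE_SECONDS = 1.0
    MAX_OPS = 100
    MAX_BYTES = 25 * 1024 * 1024  # assumption: "25 MB" read as 25 MiB

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._ops = []
        self._bytes = 0
        self._first_op_time = None

    def add(self, data: bytes) -> bool:
        """Buffer one write; return True if the buffer should now be flushed."""
        if self._first_op_time is None:
            self._first_op_time = self._clock()
        self._ops.append(data)
        self._bytes += len(data)
        return self.should_flush()

    def should_flush(self) -> bool:
        if not self._ops:
            return False
        age = self._clock() - self._first_op_time
        return (age >= self.MAX_AGE_SECONDS
                or len(self._ops) >= self.MAX_OPS
                or self._bytes >= self.MAX_BYTES)

    def flush(self):
        """Hand back the buffered operations and reset all counters."""
        ops, self._ops = self._ops, []
        self._bytes = 0
        self._first_op_time = None
        return ops
```

Whichever limit is reached first triggers the flush, which is why a burst of many small writes can be persisted well before the one-second mark.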
An fsync call causes all buffered data to be flushed immediately
and synchronously to either the local disk or the KBFS server,
depending on the journal configuration (see below). KBFS doesn't
currently have any optimizations for syncing individual files -- it's
all or nothing.
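From an application's point of view, forcing that flush is an ordinary fsync on the open file. A minimal sketch using only the Python standard library follows; the function name is hypothetical, and where the data lands after the call still depends on the journal configuration.

```python
import os


def write_durably(path, data: bytes):
    """Write data and ask the filesystem to flush it before returning."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # push Python's own buffer down to the OS
        os.fsync(f.fileno())  # ask the filesystem to flush its buffered data
```

Because KBFS syncs everything at once, a call like this may take longer than the size of the single file would suggest.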
Applications may inspect the currently-buffered data in the special
.kbfs_status file that can be read within your TLF (e.g.,
/keybase/private/me,you/.kbfs_status). In that file, there is a list
of "DirtyPaths", indicating which files and directories have data that
is held only in memory.
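A script could check for unflushed data along these lines. The sketch assumes the status file is JSON with a top-level "DirtyPaths" list; the text above only says the file contains such a list, so treat the format and the helper names as assumptions.

```python
import json


def dirty_paths(status_text):
    """Extract the DirtyPaths list from status text (JSON format assumed)."""
    status = json.loads(status_text)
    # A missing or null entry is treated as "nothing dirty".
    return list(status.get("DirtyPaths") or [])


def read_dirty_paths(tlf_dir):
    """Read .kbfs_status from a TLF directory such as /keybase/private/me,you."""
    with open(f"{tlf_dir}/.kbfs_status", encoding="utf-8") as f:
        return dirty_paths(f.read())
```

An empty result suggests that nothing in that TLF is being held only in memory at the moment of the read.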
Journaled writes
By default, KBFS uses a persistent journal on your local disk to store any changes you make to a TLF temporarily, until they can be saved on our servers. This makes the writes faster, decouples your network latency from your file system latency, and provides KBFS opportunities for rolling several changes together and saving you bandwidth. This applies both to file writes and directory updates. Note that all data in this journal are encrypted before being written to disk.
The use of a journal means that a sync or close of a file does NOT
ensure the data has made it to our KBFS servers and will soon be
visible to other devices. Data in the journal are flushed to the
servers in the background. Your Keybase app icon will change to
include an up arrow while data is uploading from the journal. You can
also check the status of your journal through the TLF's .kbfs_status
file (see above). In that file, you can also see which local
directory is being used for the journal; see the "Local disk usage
policy" section below for more details.
If you want stronger semantics, or if you want to avoid using any disk space for KBFS data even temporarily, you can disable journaling altogether, or on a per-TLF basis. For example, on Linux and macOS you can do the following:
- Persistently turn off all journaling for TLFs accessed in the future:
echo 1 > /keybase/.kbfs_disable_auto_journals. (This doesn't affect TLFs that might already be using journaling; you'll have to disable each of those manually.)
- Turn off journaling for one TLF:
echo 1 > /keybase/private/me,you/.kbfs_disable_journal. This only works if the journal is empty.
In comparison to sync-based systems like Dropbox, the use of a journal
gives stronger ordering guarantees between file system operations on
the same device, since KBFS strictly uploads data in the order it was
written. If you know there will only be one device at a time writing
to a particular TLF, this means it's fairly safe to run something like
git in a KBFS folder, even if other devices can be reading from it
at the same time, since you're not at risk of repo corruption if the
read happens at the wrong time or the device stalls or fails. (We
have future work on our roadmap for making git safe to use for
writing across multiple devices.)
Conflict resolution
Our conflict resolution strategy is similar to Dropbox's, but because we have stronger filesystem semantics than they do, we're able to do things in a slightly different way that more closely matches the behavior of a local FS when two users are updating it concurrently (only relevant for corner cases). Here is roughly what the resolutions look like:
- When both devices make non-conflicting changes to the same directory, those will get merged trivially.
- When both devices write to the same file, the "loser" file will get copied to a new name, marked with the name of the user who did the write, and the time at which the resolution happened. So if both users write to file a/b.txt, but user "bob" loses the race, after resolution you should see a/b.conflicted (bob’s macbook copy 2015-11-24).txt, where “bob” is the name of the user who wrote the file, and “macbook” is the public name of the device on which bob wrote the file. KBFS does not currently attempt to merge the contents of the two copies, for any type of file.
- When both devices create a file with the same name, the same resolution as above will happen.
- When both devices create a directory with the same name, they will be intelligently merged (dealing with children conflicts recursively).
- If one device creates a directory, and another creates a file, using the same name, the file is always renamed with the .conflicted... suffix mentioned above, in order to preserve the directory structure as best as possible.
- Unlike Dropbox, if "alice" creates a file a/foo, but "bob" does mv a b, the resolution should only have the file b/foo. That is, the updates to a directory follow that directory across renames. This is the same way it works in the terminal, if you're in a directory and someone moves that directory out from under you.
- If the devices cause a rename cycle, it's resolved with symlinks. So for example, "alice" does mv b/ a/ and bob does mv a/ b/. If alice wins the initial race, the resolution will look like a/b/a, where the second a is a symlink pointing to
- One weirdness is when alice does mv a/ b/ and bob concurrently does mkdir b. Ideally, we would merge those two directories, but implementing that is very tricky and expensive, so right now the code treats it as a conflict.
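The renaming pattern for the "loser" copy can be reproduced with a short helper. This is a sketch inferred from the single a/b.txt example in the list above, not the KBFS implementation; the function name is hypothetical, and the handling of files without an extension is an assumption.

```python
import datetime
import posixpath


def conflicted_name(path, user, device, when):
    """Build a name like 'a/b.conflicted (bob's macbook copy 2015-11-24).txt'.

    `when` is a datetime.date; the pattern is inferred from one example.
    """
    directory, filename = posixpath.split(path)
    stem, ext = posixpath.splitext(filename)
    # \u2019 is the typographic apostrophe used in the example name.
    renamed = f"{stem}.conflicted ({user}\u2019s {device} copy {when.isoformat()}){ext}"
    return posixpath.join(directory, renamed)
```

Keeping the original extension at the end means the conflicted copy still opens in the same application as the winning file.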
Deviations from POSIX
Permissions are determined entirely by the TLF name, and so there are no POSIX-style permissions.
Hard links are not supported. Symbolic links are supported, but will only be globally meaningful if they refer to other KBFS paths.
O_APPEND are not supported, although they may work if
the file is operated on only from a single device. In particular,
nothing that does file locking (like git) should be used from multiple
devices yet, and appending to a single shared file (e.g., a log file)
from multiple devices should be avoided. We may support either or both
of these in the future.
KBFS does not support atime, since that turns a read into a write and would require all readers to also be writers. Also, it'd be slow.
Typical POSIX attributes like file owner, group, and permissions don't make much sense in KBFS. KBFS sets the owner of all files and directories to the UID of the local user that is running the KBFS process. Read and write permissions are set based on whether the user has read or write access to the TLF. For example:
- Non-executable files in writable private TLF:
- Executable files in writable private TLF:
- Private subdirectories:
- Public subdirectories that user has write access to:
- Read-only public subdirectories:
Note that permissions and ownership of a TLF itself may be incorrect until one has accessed the TLF.
Any requests to change the owner or group (e.g., via
chown) or to
set permissions (excluding the executable bit) will appear to succeed,
but the change will not be saved or propagated to other clients.
Ideally we would fail these calls, but too many applications (such as
unzip) fail miserably when those calls fail. In addition,
any attribute change request that doesn't result in a real change to
the underlying KBFS metadata (including setting the executable bit
when it is already set, for example) does not update the corresponding
ctime for the directory entry. This is a violation of POSIX, but it's
an important optimization for some common workloads (e.g.,
Symbolic links can lead outside a TLF
The Keybase client allows symbolic links that lead outside a TLF. This is by design, and we envision a variety of great use cases:
- a subfolder of your public folder, where you link to friends' public folders you endorse
- storing private links to all your favorite private folders
- a link entirely outside KBFS to another global filesystem you endorse. For example, to something in IPFS.
You should therefore take care not to blindly follow, without noticing, a symbolic link created by someone you don't trust. As an example, if you ran a webserver and naively served content in someone's
/keybase/public/ folder, with your server configured to follow symbolic links, a user could trick you into serving your own secret files back out.
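One way to guard against this, sketched below with the Python standard library, is to resolve every path before serving it and refuse anything that lands outside the folder you meant to expose. The function name is hypothetical, and a real server would also need to consider races between the check and the read.

```python
import os


def resolves_inside(root, candidate):
    """Return True if candidate, after following symlinks, is inside root."""
    real_root = os.path.realpath(root)
    real_target = os.path.realpath(candidate)
    # commonpath compares whole path components, so /srv/pub2 is not
    # mistaken for something inside /srv/pub.
    return os.path.commonpath([real_root, real_target]) == real_root
```

A webserver would call this with the shared folder as root and the requested file as candidate, and return an error when the result is False.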
Storage, Quotas and History
The KBFS servers store your data in opaque blobs called blocks. Both files and directories within a TLF are stored as blocks, and the servers can't tell which blocks belong to which file or directory within the TLF. The data in these blocks are encrypted, and their sizes are increased (i.e., padded) to avoid leaking information to our servers.
Each user has a quota, expressed in a number of bytes. Whenever you
write to a file or change a TLF’s structure, only the blocks that you
change count against your quota. The complete size of each block
(including encryption and padding) is what counts towards your quota.
Note that due to KBFS internal data structures, changing a file or
directory also changes all of the directories on the path back to the
root of the TLF. So, for example, if you edit a file at
/keybase/private/you/a/b/c/foo, you've ended up changing at least
five blocks: the TLF root directory block for /keybase/private/you,
the subdirectory blocks for a, a/b, and a/b/c, and the file
block for foo.
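The "at least five blocks" count follows directly from the path depth. The hypothetical helper below lists one block per path component from the TLF root down to the edited entry; it ignores the fact that a large file may span several blocks, which is why the text says "at least".

```python
def changed_blocks(path):
    """List the paths whose blocks change when `path` is edited (sketch)."""
    parts = path.strip("/").split("/")
    if len(parts) < 3 or parts[0] != "keybase" or parts[1] not in ("private", "public"):
        raise ValueError("expected /keybase/{private,public}/<tlf>/...")
    # The TLF root directory block always changes.
    blocks = ["/" + "/".join(parts[:3])]
    # So does every directory on the way down, plus the entry itself.
    for i in range(4, len(parts) + 1):
        blocks.append("/" + "/".join(parts[:i]))
    return blocks
```

For /keybase/private/you/a/b/c/foo this yields five entries: the TLF root, the three nested directories, and the file.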
You can check your quota usage using df on Linux and macOS. Note
that du also works (though it might be very slow); however, du
only counts the plaintext size of the files, and includes data written
by any user, not just you.
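The same numbers that df prints can be read programmatically. This sketch uses shutil.disk_usage from the Python standard library; the default /keybase mount path and the function name are assumptions, and it presumes the mount reports quota through the same filesystem statistics that df relies on.

```python
import shutil


def quota_summary(path="/keybase"):
    """Return total, used, and free bytes as reported for the given mount."""
    usage = shutil.disk_usage(path)
    return {"total": usage.total, "used": usage.used, "free": usage.free}
```

Unlike walking the tree with du, this is a single filesystem query, so it should not be slowed down by large folders.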
Blocks that are deprecated due to a new change (e.g., if you overwrite a previous version of a file, or add a new directory entry) are marked for cleanup, and a background process on each client cleans up these old blocks periodically, subtracting their size from the original writer's quota usage. Normally this will happen about a minute after the blocks are marked for cleanup, but could be delayed if all clients currently accessing that folder go offline.
Coming soon: a way to tune this cleanup process so that old, historical versions of the TLF can still be accessed just as they were.
Local disk usage policy
As mentioned above, KBFS streams data into and out of your device on demand, and doesn't store data permanently on your disk. However, for performance reasons, KBFS does use your local disk in two different ways, limiting the amount of space it uses based on the amount of disk space currently available on the disk partition storing your local home directory. (This only applies to desktop devices at the moment, since KBFS is not yet available on mobile devices.) All data stored to disk is first encrypted.
- Temporary local writes: After files are written to KBFS, but before they are uploaded to the servers, they will temporarily use disk space on your device -- see the "Journaled writes" section above. We limit this usage to 85% of your available disk space, up to a maximum of 170 GB. These files will be deleted as soon as they sync successfully to the KBFS servers.
- On-disk transient cache: KBFS also stores data in a transient cache on disk to improve performance. This is limited to 10% of your available disk space, up to a maximum of 20 GB. If other applications start using more of your disk space, data will be evicted from this cache automatically to maintain the overall usage percentage.
There is currently no way to adjust any of these limits, or the locations of the directories.
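The two limits described above are each a percentage of available space with a fixed cap. A small sketch of that arithmetic follows; the function names are hypothetical, and it assumes decimal gigabytes, since the text does not say which unit is meant.

```python
GB = 10**9  # assumption: decimal gigabytes


def journal_limit(available_bytes):
    """85% of available disk space, capped at 170 GB."""
    return min(available_bytes * 85 // 100, 170 * GB)


def cache_limit(available_bytes):
    """10% of available disk space, capped at 20 GB."""
    return min(available_bytes * 10 // 100, 20 * GB)
```

Under these assumptions both caps take effect once roughly 200 GB is available, since 85% of 200 GB is 170 GB and 10% of 200 GB is 20 GB.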
In addition, the KBFS process writes log files to your Keybase log directory. The KBFS logs are limited to about 400 MB total.
Each Keybase user has their own list of "favorites" that appear under
/keybase/public. Whenever you access a new
directory (e.g., you run
ls /keybase/public/malgorithms@twitter), it
will be added to your favorites list under its canonical name (e.g.,
You can remove entries from your favorites list using
rmdir for the
canonical TLF name (e.g.,
(There is a known bug in current releases that sometimes prevents you
from re-adding a favorite right after you've deleted it from a
different device. If you experience this, try
ls /keybase/public on the new device to refresh your favorites
list, and access the folder in question again to re-add it.)
For most users,
keybase log send should suffice. This packages up
some log files and sends them to Keybase admins.
Log files may contain metadata (names, sizes, etc.) about your files.
For the more curious, on OS X, the easiest way to access the KBFS logs
is via Console.app). On
the left under FILES, it should be under
keybase.service.log may also be useful. You can
then either copy and paste a portion of the displayed lines, or drag
and drop the “keybase.kbfs.log” to attach the entire file, or
right-click and select “Reveal in Finder” to find the actual file.
There are a number of special invisible KBFS files that either have
debugging info or turn KBFS settings on and off. They all start with
.kbfs_, and KBFS won’t let you create files with that prefix.
Since these files aren’t listed by default, you’ll need to use the terminal to access these.
From any folder, the following files are accessible:
- .kbfs_error: contains a list of the last few errors and their stack traces.
- .kbfs_metrics: contains a list of some metrics (mostly RPC-related).
- .kbfs_profiles/: contains files representing Golang profiles.
From within a TLF the following additional files are also accessible:
- .kbfs_status: lists some status info about the current TLF.
- .kbfs_update_history: shows a JSON-formatted list of all the revisions for this TLF, including what operations were done when, and by which authorized TLF writer. This fetches all revisions from the server, and may be very slow for TLFs with long histories. It contains a lot of internal debugging information and may be hard to read by someone who's not a KBFS developer; making a friendlier version is future work.
- .kbfs_fileinfo_XXX (where XXX is the name of a file or directory in that TLF subdirectory): shows some debugging information about the given file, including who last claimed to have written it (this is shown without explicit cryptographic verification -- verification is done at the TLF-level, not at the individual file level).