mirror of
https://github.com/gluster/glusterdocs.git
synced 2026-02-05 15:47:01 +01:00
adding missing feature planning pages to 3.5 planning
Signed-off-by: shravantc <shravantc@ymail.com>
151
Feature Planning/GlusterFS 3.5/Brick Failure Detection.md
Normal file
@@ -0,0 +1,151 @@
Feature
-------

Brick Failure Detection

Summary
-------

This feature attempts to identify storage/file system failures and
disable the failed brick without disrupting the remainder of the node's
operation.

Owners
------

Vijay Bellur with help from Niels de Vos (or the other way around)

Current status
--------------

Currently, if the underlying storage or file system fails, a brick
process continues to function. In some cases, a brick can hang due to
failures in the underlying system. Such hangs in brick processes can in
turn hang applications running on glusterfs clients.

Detailed Description
--------------------

Detecting failures on the filesystem that a brick uses makes it possible
to handle errors that are caused from outside of the Gluster
environment.

There have been hanging brick processes when the underlying storage of a
brick became unavailable. A hanging brick process can still use the
network and respond to clients, but actual I/O to the storage is
impossible and can cause noticeable delays on the client side.

Benefit to GlusterFS
--------------------

Provide better detection of storage subsystem failures and prevent bricks
from hanging.

Scope
-----

### Nature of proposed change

Add a health-checker to the posix xlator that periodically checks the
status of the filesystem (which implies checking that the storage
hardware is functional).

### Implications on manageability

When a brick process detects that the underlying storage is not
responding anymore, the process will exit. The brick process is not
restarted automatically; the sysadmin will need to fix the problem with
the storage first.

After correcting the storage (hardware or filesystem) issue, the
following command will start the brick process again:

    # gluster volume start <VOLNAME> force

### Implications on presentation layer

None

### Implications on persistence layer

None

### Implications on 'GlusterFS' backend

None

### Modification to GlusterFS metadata

None

### Implications on 'glusterd'

'glusterd' can detect that the brick process has exited, and
`gluster volume status` will show that the brick process is not running
anymore. System administrators checking the logs should be able to
triage the cause.

How To Test
-----------

The health-checker thread that is part of each brick process is started
automatically when a volume is started. Its functionality can be
verified in different ways.

On virtual hardware:

- disconnect the disk from the VM that holds the brick

On real hardware:

- simulate a RAID-card failure by unplugging the card or cables

On a system that uses LVM for the bricks:

- use device-mapper to load an error-table for the disk, see [this
  description](http://review.gluster.org/5176).

On any system (writing to random offsets of the block device, more
difficult to trigger):

1. cause corruption on the filesystem that holds the brick
2. read contents from the brick, hoping to hit the corrupted area
3. the filesystem should abort after hitting a bad spot, and the
   health-checker should notice that shortly afterwards

User Experience
---------------

No more hanging brick processes when storage-hardware or the filesystem
fails.

Dependencies
------------

Posix translator, not available for the BD-xlator.

Documentation
-------------

The health-checker is enabled by default and runs a check every 30
seconds. This interval can be changed per volume with:

    # gluster volume set <VOLNAME> storage.health-check-interval <SECONDS>

If `SECONDS` is set to 0, the health-checker will be disabled.

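The behaviour described for the health-checker can be sketched roughly as follows. This is a minimal illustration in Python, not the posix xlator's actual C implementation: the names `brick_is_healthy` and `run_health_checker`, the use of `os.stat()` as the probe, and the `max_checks` and `sleep` parameters are assumptions made for the example. Only the 30-second default and the meaning of an interval of 0 come from the text above.

```python
import os
import time

DEFAULT_INTERVAL = 30  # seconds, the default stated above


def brick_is_healthy(brick_path):
    """Probe the brick directory; any OSError counts as a failure (assumed probe)."""
    try:
        os.stat(brick_path)
        return True
    except OSError:
        return False


def run_health_checker(brick_path, interval=DEFAULT_INTERVAL,
                       max_checks=None, sleep=time.sleep):
    """Check the brick every `interval` seconds.

    Returns "disabled" when interval is 0, "failed" on the first failed
    probe (the point where a real brick process would exit), or "ok" once
    `max_checks` probes have all succeeded. `max_checks` and `sleep` are
    hypothetical knobs that only keep the sketch testable.
    """
    if interval == 0:
        return "disabled"
    checks = 0
    while max_checks is None or checks < max_checks:
        if not brick_is_healthy(brick_path):
            return "failed"
        checks += 1
        sleep(interval)
    return "ok"
```

A shorter interval detects a failed brick sooner at the cost of more frequent probes; the sketch leaves that trade-off to the caller, as the volume option does.
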
For further details refer to:
<https://forge.gluster.org/glusterfs-core/glusterfs/blobs/release-3.5/doc/features/brick-failure-detection.md>

Status
------

glusterfs-3.4 and newer include a health-checker for the posix xlator,
which was introduced with [bug
971774](https://bugzilla.redhat.com/971774):

- [posix: add a simple
  health-checker](http://review.gluster.org/5176)

Comments and Discussion
-----------------------
443
Feature Planning/GlusterFS 3.5/Disk Encryption.md
Normal file
@@ -0,0 +1,443 @@
Feature
=======

Transparent encryption. Allows a volume to be encrypted "at rest" on the
server using keys only available on the client.

1 Summary
=========

Distributed systems impose tighter requirements on at-rest encryption.
This is because your encrypted data will be stored on servers, which are
de facto untrusted. In particular, your private encrypted data can be
subjected to analysis and tampering, which will eventually lead to its
disclosure if it is not properly protected. Specifically, it is usually
not enough to just encrypt data. In distributed systems, serious
protection of your personal data is possible only in conjunction with a
special process, which is called authentication. GlusterFS provides such
an enhanced service: in GlusterFS, encryption is enhanced with
authentication. Currently we provide protection from "silent tampering".
This is a kind of tampering that is hard to detect, because it doesn't
break POSIX compliance. Specifically, we protect the encryption-specific
metadata of a file. Such metadata includes the file's unique object id
(GFID), cipher algorithm id, cipher block size and other attributes used
by the encryption process.

1.1 Restrictions
----------------

1. We encrypt only file content. The feature of transparent encryption
doesn't protect file names: they are neither encrypted nor verified.
Protection of file names is not as critical as protection of the
encryption-specific file metadata: any attack based on tampering with
file names will break POSIX compliance and result in massive corruption,
which is easy to detect.

2. The feature of transparent encryption doesn't work in NFS mounts of
GlusterFS volumes: NFS file handles introduce security issues which are
hard to resolve. NFS mounts of encrypted GlusterFS volumes will result
in failed file operations (see section "Encryption in different types of
mount sessions" for more details).

3. The feature of transparent encryption is incompatible with the
GlusterFS performance translators quick-read, write-behind and
open-behind.

2 Owners
========

Jeff Darcy <jdarcy@redhat.com>
Edward Shishkin <eshishki@redhat.com>

3 Current status
================

Merged to the upstream.

4 Detailed Description
======================

See Summary.

5 Benefit to GlusterFS
======================

Besides the justifications that have applied to on-disk encryption just
about forever, recent events have raised awareness significantly.
Encryption using keys that are physically present at the server leaves
data vulnerable to physical seizure of the server. Encryption using keys
that are kept by the same organizational entity leaves data vulnerable
to "insider threat" plus coercion or capture at the organization level.
For many, especially various kinds of service providers, only pure
client-side encryption provides the necessary levels of privacy and
deniability.

Competitively, other projects - most notably
[Tahoe-LAFS](https://leastauthority.com/) - are already using recently
heightened awareness of these issues to attract users who would be
better served by our performance/scalability, usability, and diversity
of interfaces. Only the lack of proper encryption holds us back in these
cases.

6 Scope
=======

6.1. Nature of proposed change
------------------------------

This is a new client-side translator, using user-provided key
information plus information stored in xattrs to encrypt data
transparently as it's written and decrypt when it's read.

6.2. Implications on manageability
----------------------------------

The user needs to manage a per-volume master key (MK). That is:

1) Generate an independent MK for every volume which is to be
encrypted. Note that one MK is created for the whole life of the
volume.

2) Provide the MK on the client side at every mount, at the location
which was specified at volume create time or overridden via the
respective mount option (see section "Getting Started With Crypt
Translator").

3) Keep the MK between mount sessions. Note that after a successful
mount the MK may be removed from the specified location. In this case
the user should retain the MK safely until the next mount session.

The MK is a 256-bit secret string, which is known only to the user.
Generating and retaining the MK is the user's responsibility.

WARNING!!! Losing the MK will make the content of all regular files of
your volume inaccessible. It is possible to mount a volume with an
improper MK; however, such mount sessions will allow access only to file
names, as they are not encrypted.

Recommendations on MK generation

The MK has to be a high-entropy key, appropriately generated by a key
derivation algorithm. One possible way is to use rand(1) provided by the
OpenSSL package. You need to specify the option "-hex" for the proper
output format. For example, the following command prints a generated key
to the standard output:

    $ openssl rand -hex 32

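For clients where the key is produced from a script rather than the shell, the same kind of key can be generated along these lines. This is a sketch under assumptions: the function names and the owner-only file mode are illustrative choices, not something the crypt translator prescribes. `secrets.token_hex(32)` draws 32 bytes from the operating system's random source and prints them as 64 lowercase hex symbols, the same shape of output as the `openssl` command above.

```python
import os
import secrets


def generate_master_key():
    """Return 32 random bytes encoded as 64 lowercase hex symbols."""
    return secrets.token_hex(32)


def write_master_key(path, key_hex):
    """Create `path` (failing if it already exists) with owner-only access
    and store the key at the beginning of the file."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w") as key_file:
        key_file.write(key_hex + "\n")
```

`O_EXCL` is used so that an existing key file is never silently overwritten, since losing the MK makes the file contents of the volume inaccessible.
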
6.3. Implications on presentation layer
---------------------------------------

N/A

6.4. Implications on persistence layer
--------------------------------------

N/A

6.5. Implications on 'GlusterFS' backend
----------------------------------------

All encrypted files on the servers contain padding at the end of the
file. That is, the size of every encrypted file on the servers is a
multiple of the cipher block size. The real file size is stored as a
file xattr with the key "trusted.glusterfs.crypt.att.size". The
translation padded-file-size -\> real-file-size (and backward) is
performed by the crypt translator.

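The size translation can be illustrated with a little arithmetic. The sketch below assumes that padding rounds the real size up to the next multiple of the cipher block size and that a file whose size is already a multiple receives no extra padding; the text above does not spell out either detail, and the function names are hypothetical. The 16-byte default is the block size of AES.

```python
AES_BLOCK_SIZE = 16  # bytes; the block size of AES


def padded_size(real_size, cipher_block_size=AES_BLOCK_SIZE):
    """Round real_size up to the next multiple of cipher_block_size."""
    if real_size < 0 or cipher_block_size <= 0:
        raise ValueError("size must be non-negative and block size positive")
    blocks = (real_size + cipher_block_size - 1) // cipher_block_size
    return blocks * cipher_block_size


def padding_length(real_size, cipher_block_size=AES_BLOCK_SIZE):
    """Number of padding bytes stored after the real file content."""
    return padded_size(real_size, cipher_block_size) - real_size
```

Under these assumptions a 17-byte file occupies 32 bytes on the brick, and the value 17 is what would be kept in the size xattr.
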
6.6. Modification to GlusterFS metadata
---------------------------------------

Encryption-specific metadata in a specified format is stored as a file
xattr with the key "trusted.glusterfs.crypt.att.cfmt". The current
format of the metadata string is described in slide \#27 of the
following [design
document](http://www.gluster.org/community/documentation/index.php/File:GlusterFS_transparent_encryption.pdf)

6.7. Options of the crypt translator
------------------------------------

- data-cipher-alg

Specifies the cipher algorithm for file data encryption. Currently only
one option is available: AES\_XTS. This is a hidden option.

- block-size

Specifies the size (in bytes) of the logical chunk which is encrypted as
a whole unit in the file body. If cipher modes with initialization
vectors are used for encryption, then the initialization vector gets
reset for every such chunk. Available values are: "512", "1024", "2048"
and "4096". The default value is "4096".

- data-key-size

Specifies the size (in bits) of the data cipher key. For AES\_XTS the
available values are: "256" and "512". The default value is "256". The
larger key size ("512") is for stronger security.

- master-key

Specifies the pathname of a regular file or symlink. Defines the
location of the master volume key on the trusted client machine.

7 Getting Started With Crypt Translator
=======================================

1. Create a volume `<vol_name>`.

2. Turn on the crypt xlator:

    # gluster volume set <vol_name> encryption on

3. Turn off the performance xlators that encryption is currently
incompatible with:

    # gluster volume set <vol_name> performance.quick-read off
    # gluster volume set <vol_name> performance.write-behind off
    # gluster volume set <vol_name> performance.open-behind off

4. (optional) Set the location of the volume master key:

    # gluster volume set <vol_name> encryption.master-key <master_key_location>

where `<master_key_location>` is the absolute pathname of the file which
will contain the volume master key (see section "Implications on
manageability").

5. (optional) Override the default options of the crypt xlator:

    # gluster volume set <vol_name> encryption.data-key-size <data_key_size>

where `<data_key_size>` should have one of the following values: "256"
(default), "512".

    # gluster volume set <vol_name> encryption.block-size <block_size>

where `<block_size>` should have one of the following values: "512",
"1024", "2048", "4096" (default).

6. Define the location of the master key on your client machine
(`<master_key_new_location>`) if it wasn't specified in step 4 above, or
if you want it to be different from the `<master_key_location>`
specified in step 4.

7. On the client side, make sure that the file named
`<master_key_location>` (or `<master_key_new_location>` defined in step
6) exists and contains the respective per-volume master key (see section
"Implications on manageability"). This key has to be in hex form, i.e.
it should be represented by 64 symbols from the set {'0', ..., '9', 'a',
..., 'f'}. The key should start at the beginning of the file. All
symbols at offsets \>= 64 are ignored.

NOTE: `<master_key_location>` (or `<master_key_new_location>` defined in
step 6) can be a symlink. In this case make sure that the target file of
this symlink exists and contains the respective per-volume master key.

8. Mount the volume `<vol_name>` on the client side as usual. If you
specified a location of the master key in step 6, then use the mount
option

    --xlator-option=<suffixed_vol_name>.master-key=<master_key_new_location>

where `<master_key_new_location>` is the location of the master key
specified in step 6, and `<suffixed_vol_name>` is `<vol_name>` suffixed
with "-crypt". For example, if you created a volume "myvol" in step 1,
then `<suffixed_vol_name>` is "myvol-crypt".

9. During mount your client machine receives configuration info from
the untrusted server, so this step is extremely important! Check that
your volume is really encrypted, and that it is encrypted with the
proper master key (see FAQ \#1, \#2).

10. (optional) After a successful mount, the file which contains the
master key may be removed. NOTE: The next mount session will require the
master key again. Keeping the master key between mount sessions is the
user's responsibility (see section "Implications on manageability").

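Steps 7 and 8 lend themselves to a small pre-mount check on the client. The following is a sketch, not part of GlusterFS: `read_master_key` and `master_key_mount_option` are hypothetical helper names. The first applies the key-file rules from step 7 (64 symbols from 0-9 and a-f at the start of the file, everything after offset 64 ignored), the second assembles the mount option from step 8.

```python
HEX_DIGITS = set("0123456789abcdef")
KEY_HEX_LENGTH = 64  # a 256-bit key written as hex


def read_master_key(path):
    """Return the 32-byte key stored as 64 hex symbols at the start of `path`.

    Symbols at offsets >= 64 are ignored. open() follows symlinks, so
    `path` may be a symlink to the real key file.
    """
    with open(path, "r", encoding="ascii", errors="replace") as key_file:
        head = key_file.read(KEY_HEX_LENGTH)
    if len(head) != KEY_HEX_LENGTH or not set(head) <= HEX_DIGITS:
        raise ValueError("master key must be 64 symbols from 0-9 and a-f")
    return bytes.fromhex(head)


def master_key_mount_option(vol_name, key_path):
    """Build the mount option from step 8 for a key stored at `key_path`."""
    return "--xlator-option={}-crypt.master-key={}".format(vol_name, key_path)
```

For a volume "myvol" and a key at /home/edward/mykey, the second helper yields `--xlator-option=myvol-crypt.master-key=/home/edward/mykey`. Running the first helper before mounting catches a truncated or malformed key file early.
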
8 How to test
=============

From a correctness standpoint, it's sufficient to run normal tests with
encryption enabled. From a security standpoint, there's a whole
discipline devoted to analysing the stored data for weaknesses, and
engagement with practitioners of that discipline will be necessary to
develop the right tests.

9 Dependencies
==============

The crypt translator requires OpenSSL version \>= 1.0.1

10 Documentation
================

10.1 Basic design concepts
--------------------------

The basic design concepts are described in the following [pdf
slides](http://www.gluster.org/community/documentation/index.php/File:GlusterFS_transparent_encryption.pdf)

10.2 Procedure of security open
-------------------------------

In accordance with the basic design concepts above, before every access
to a file's body (by read(2), write(2), truncate(2), etc.) we need to
make sure that the file's metadata is trusted. Otherwise, we risk
dealing with untrusted file data.

To make sure that the file's metadata is trusted, the file is subjected
to a special procedure of security open. The procedure of security open
is performed by the crypt translator at FOP-\>open() (crypt\_open) time
by the function open\_format(). Currently this is a hardcoded
composition of 2 checks:

1. verification of the file's GFID by the file name;
2. verification of the file's metadata by the verified GFID.

If the security open succeeds, then the cache of the trusted client
machine is replenished with the file descriptor and the file's inode,
and the user can access the file's content through the read(2),
write(2), ftruncate(2), etc. system calls, which accept a file
descriptor as an argument.

However, the file API also allows accessing the file body without
opening the file. An example is truncate(2), which accepts a pathname
instead of a file descriptor. To make sure that the file's metadata is
trusted in this case, we create a temporary file descriptor and always
call crypt\_open() before truncating the file's body.

10.3 Encryption in different types of mount sessions
----------------------------------------------------

Everything described in the section above is valid only for FUSE mounts.
GlusterFS also supports so-called NFS mounts. From the standpoint of
security, the key difference between these types of mount sessions is
that in NFS mount sessions file operations accept a so-called file
handle (which is actually the GFID) instead of a file name. This creates
problems, since the file name is the basic point for verification. As
follows from the section above, using step 1 we can replenish the cache
of the trusted machine with trusted file handles (GFIDs), and perform a
security open only by trusted GFID (by step 2). However, in this case we
need to make sure that there are no leaks of non-trusted GFIDs (and,
moreover, that such leaks won't be introduced by the development process
in the future). This is possible only with a changed GFID format:
everywhere in GlusterFS a GFID should appear as a pair (uuid,
is\_verified), where is\_verified is a boolean variable which is true if
this GFID has passed the procedure of verification (step 1 in the
section above).

The next problem is that the current NFS protocol doesn't encrypt the
channel between the NFS client and the NFS server. It means that in NFS
mounts of GlusterFS volumes the NFS client and the GlusterFS client
should be the same (trusted) machine.

Taking into account the described problems, encryption in GlusterFS is
not supported in NFS mount sessions.

10.4 Class of cipher algorithms for file data encryption that can be supported by the crypt translator
------------------------------------------------------------------------------------------------------

We'll assume that any symmetric block cipher algorithm is completely
determined by a pair (alg\_id, mode\_id), where alg\_id is an algorithm
defined on elementary cipher blocks (e.g. AES), and mode\_id is a mode
of operation (e.g. ECB, XTS, etc).

Technically, the crypt translator is able to support any symmetric block
cipher algorithm via additional options of the crypt translator.
However, in practice the set of supported algorithms is narrowed because
of various security and organizational issues. Currently we support only
one algorithm: AES\_XTS.

10.5 Bibliography
-----------------

1. Recommendation for Block Cipher Modes of Operation (NIST Special
   Publication 800-38A).
2. Recommendation for Block Cipher Modes of Operation: The XTS-AES Mode
   for Confidentiality on Storage Devices (NIST Special Publication
   800-38E).
3. Recommendation for Key Derivation Using Pseudorandom Functions (NIST
   Special Publication 800-108).
4. Recommendation for Block Cipher Modes of Operation: The CMAC Mode
   for Authentication (NIST Special Publication 800-38B).
5. Recommendation for Block Cipher Modes of Operation: Methods for Key
   Wrapping (NIST Special Publication 800-38F).
6. FIPS PUB 198-1 The Keyed-Hash Message Authentication Code (HMAC).
7. David A. McGrew, John Viega "The Galois/Counter Mode of Operation
   (GCM)".

11 FAQ
======

**1. How to make sure that my volume is really encrypted?**

Check the respective graph of translators on your trusted client
machine. This graph is created at mount time and is stored by default in
the file /usr/local/var/log/glusterfs/mountpoint.log

Here "mountpoint" is the absolute name of the mountpoint, where "/" are
replaced with "-". For example, if your volume is mounted to
/mnt/testfs, then you'll need to check the file
/usr/local/var/log/glusterfs/mnt-testfs.log

Make sure that this graph contains the crypt translator, which looks
like the following:

    13: volume xvol-crypt
    14:     type encryption/crypt
    15:     option master-key /home/edward/mykey
    16:     subvolumes xvol-dht
    17: end-volume

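The check can also be scripted. The sketch below assumes the default log directory named above and a graph dump shaped like the excerpt; `mount_log_path` and `crypt_master_key` are hypothetical helper names, and real log files may carry additional lines that this simple parser just skips.

```python
import re

LOG_DIR = "/usr/local/var/log/glusterfs"  # default location named above


def mount_log_path(mountpoint, log_dir=LOG_DIR):
    """Map an absolute mountpoint such as /mnt/testfs to its client log file."""
    name = mountpoint.strip("/").replace("/", "-")
    return "{}/{}.log".format(log_dir, name)


def crypt_master_key(log_text):
    """Return the master-key path configured for the crypt translator in the
    logged graph, or None if no crypt translator with that option is found."""
    in_crypt = False
    for line in log_text.splitlines():
        # Drop the "NN: " prefix that precedes each graph line in the log.
        words = re.sub(r"^\s*\d+:\s*", "", line).split()
        if words[:2] == ["type", "encryption/crypt"]:
            in_crypt = True
        elif in_crypt and words[:2] == ["option", "master-key"] and len(words) == 3:
            return words[2]
        elif words[:1] == ["end-volume"]:
            in_crypt = False
    return None
```

A result of `None` means the graph delivered by the server either has no crypt translator or names no master key, so the volume should not be treated as encrypted. A returned path should be compared with the location where the master key really lives on the trusted client.
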
**2. How to make sure that my volume is encrypted with a proper master
key?**

Check the graph of translators on your trusted client machine (see
FAQ \#1). Make sure that the option "master-key" of the crypt translator
specifies the correct location of the master key on your trusted client
machine.

**3. Can I change the encryption status of a volume?**

You can change the encryption status (enable/disable encryption) only
for empty volumes. Otherwise it will be incorrect (you'll end up with IO
errors, data corruption and security problems). We strongly recommend
deciding once and for all, at volume creation time, whether your volume
has to be encrypted or not.

**4. I am able to mount my encrypted volume with improper master keys
and get the list of file names for every directory. Is it normal?**

Yes, it is normal. It doesn't contradict the announced functionality: we
encrypt only file content. File names are not encrypted, so it doesn't
make sense to hide them on the trusted client machine.

**5. What is the reason for only supporting AES-XTS? This mode is not
using Intel's AES-NI instruction thus not utilizing hardware feature..**

Distributed file systems impose tighter requirements on at-rest
encryption. We offer more than "at-rest encryption". We offer "at-rest
encryption and authentication in distributed systems with non-trusted
servers". Data and metadata on the server can easily be subjected to
tampering and analysis with the purpose of revealing the user's secret
data. We have to resist this tampering by performing data and metadata
authentication.

Unfortunately, it is technically hard to implement full-fledged data
authentication via a stackable file system (GlusterFS translator), so we
have decided to perform a "light" authentication by using a special
cipher mode which is resistant to tampering. Currently OpenSSL supports
only one such mode: XTS. Tampering with ciphertext created in XTS mode
will lead to unpredictable changes in the plain text. That is, the user
will see "unpredictable gibberish" on the client side. Of course, this
is not an "official way" to detect tampering, but it is much better than
nothing. The "official way" (creating/checking MACs) is what we use for
metadata authentication.

Other modes supported by OpenSSL, like CBC, CFB, OFB, etc., are strongly
discouraged for use in distributed systems with non-trusted servers. For
example, CBC mode doesn't "survive" an overwrite of a logical block in a
file. It means that with every such overwrite (a standard file system
operation) we'll need to re-encrypt the whole(!) file with a different
key. CFB and OFB modes are sensitive to tampering: there is a way to
perform \*predictable\* changes in the plaintext, which is unacceptable.

Yes, XTS is slow (at least its current implementation in OpenSSL), but
we don't promise that CFB or OFB with full-fledged authentication will
be faster. So..
101
Feature Planning/GlusterFS 3.5/File Snapshot.md
Normal file
@@ -0,0 +1,101 @@
Feature
-------

File Snapshots in GlusterFS

### Summary

Ability to take snapshots of files in GlusterFS

### Owners

Anand Avati

### Source code

Patch for this feature - <http://review.gluster.org/5367>

### Detailed Description

The feature adds file snapshotting support to GlusterFS. *To use this
feature the file format should be QCOW2 (from QEMU).* The patch takes
the block layer code from QEMU and converts it into a translator in
gluster.

### Benefit to GlusterFS

Better integration with OpenStack Cinder, and in general the ability to
take snapshots of files (typically VM images)

### Usage

*To take a snapshot of a file, the file format should be QCOW2. To set
the file type to qcow2, see step \#2 below.*

1. Turning on the snapshot feature:

    gluster volume set <vol_name> features.file-snapshot on

2. To set the qcow2 file format:

    setfattr -n trusted.glusterfs.block-format -v qcow2:10GB <file_name>

3. To create a snapshot:

    setfattr -n trusted.glusterfs.block-snapshot-create -v <image_name> <file_name>

4. To apply/revert back to a snapshot:

    setfattr -n trusted.glusterfs.block-snapshot-goto -v <image_name> <file_name>

### Scope

#### Nature of proposed change

The work is going to be a new translator. Very minimal changes to
existing code (minor change in syncops)

#### Implications on manageability

Will need the ability to load/unload the translator in the stack.

#### Implications on presentation layer

Feature must be presentation layer independent.

#### Implications on persistence layer

No implications

#### Implications on 'GlusterFS' backend

Internal snapshots - No implications. External snapshots - there will be
hidden directories added.

#### Modification to GlusterFS metadata

A new xattr will be added to distinguish files which are 'snapshot
managed' from raw files.

#### Implications on 'glusterd'

Yet another turn on/off feature for glusterd. Volgen will have to add a
new translator in the generated graph.

### How To Test

Snapshots can be tested by taking a snapshot along with a checksum of
the state of the file, making further changes, going back to the old
snapshot and verifying the checksum again.

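The test procedure can be sketched as a small harness. Everything here is illustrative: `file_checksum` and `snapshot_roundtrip_ok` are hypothetical names, SHA-256 is an arbitrary choice of checksum, and the `create_snapshot`, `goto_snapshot` and `modify` callables are supplied by the caller (for instance as wrappers around the `setfattr` commands from the Usage section), so the sketch itself makes no call into GlusterFS.

```python
import hashlib


def file_checksum(path):
    """SHA-256 of the file content as seen through the mount."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot_roundtrip_ok(path, create_snapshot, goto_snapshot, modify,
                          image_name="snap1"):
    """Snapshot the file, change it, revert, and compare checksums.

    Returns True when the content after the revert matches the content
    at snapshot time.
    """
    create_snapshot(path, image_name)
    before = file_checksum(path)
    modify(path)
    if file_checksum(path) == before:
        raise RuntimeError("modify() did not change the file")
    goto_snapshot(path, image_name)
    return file_checksum(path) == before
```

Checking that the modification really changed the checksum guards against a test that passes trivially because nothing was written.
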
### Dependencies

Dependent QEMU code is imported into the codebase.

### Documentation

<http://review.gluster.org/#/c/7488/6/doc/features/file-snapshot.md>

### Status

Merged in master and available in GlusterFS 3.5
@@ -0,0 +1,96 @@
Feature
=======

On-Wire Compression/Decompression

1. Summary
==========

Translator to compress/decompress data in flight between client and
server.

2. Owners
=========

- Venky Shankar <vshankar@redhat.com>
- Prashanth Pai <ppai@redhat.com>

3. Current Status
=================

Code has already been merged. Needs more testing.

The [initial submission](http://review.gluster.org/3251) contained a
`compress` option, which introduced [some
confusion](https://bugzilla.redhat.com/1053670). [A correction has been
sent](http://review.gluster.org/6765) to rename the user visible options
to start with `network.compression`.

TODO

- Make xlator pluggable to add support for other compression methods
- Add support for lz4 compression: <https://code.google.com/p/lz4/>

4. Detailed Description
=======================

- When a writev call occurs, the client compresses the data before
  sending it to the server. On the server, the compressed data is
  decompressed. Similarly, when a readv call occurs, the server
  compresses the data before sending it to the client. On the client,
  the compressed data is decompressed. Thus the amount of data sent over
  the wire is minimized.

- Compression/decompression is done using the Zlib library.

- During normal operation, this is the format of data sent over the
  wire: `<compressed-data>` + trailer (8 bytes). The trailer contains
  the CRC32 checksum and the length of the original uncompressed data.
  This is used for validation.

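The payload layout described in the last item can be illustrated with Python's `zlib` bindings. This is a sketch under assumptions, not the xlator's code: the text only says that the 8-byte trailer holds the CRC32 and the original length, so the field order (CRC32 first), the little-endian encoding, and the use of the zlib container format are guesses made for the example, and the function names are hypothetical.

```python
import struct
import zlib

# Assumed trailer layout: CRC32 then original length, 4 bytes each.
TRAILER = struct.Struct("<II")


def pack_payload(data, level=zlib.Z_DEFAULT_COMPRESSION):
    """Compress `data` and append the 8-byte validation trailer."""
    trailer = TRAILER.pack(zlib.crc32(data) & 0xFFFFFFFF, len(data))
    return zlib.compress(data, level) + trailer


def unpack_payload(payload):
    """Decompress a payload and validate it against its trailer."""
    if len(payload) < TRAILER.size:
        raise ValueError("payload shorter than the trailer")
    body, trailer = payload[:-TRAILER.size], payload[-TRAILER.size:]
    crc, length = TRAILER.unpack(trailer)
    data = zlib.decompress(body)
    if len(data) != length or (zlib.crc32(data) & 0xFFFFFFFF) != crc:
        raise ValueError("trailer does not match the decompressed data")
    return data
```

The sender would call `pack_payload` on the write or read buffer and the receiver `unpack_payload` on what arrives. A mismatch in either trailer field is reported instead of handing back data that may be damaged.
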
5. Usage
========

Turning on compression xlator:

    # gluster volume set <vol_name> network.compression on

Configurable options:

    # gluster volume set <vol_name> network.compression.compression-level 8
    # gluster volume set <vol_name> network.compression.min-size 50

6. Benefits to GlusterFS
========================

Fewer bytes transferred over the network.

7. Issues
=========

- Issues with striped volumes: the compression xlator cannot work with
  striped volumes.

- Issues with write-behind: the mount point hangs when writing a file
  with the write-behind xlator turned on. To overcome this, turn off
  write-behind entirely OR set "performance.strict-write-ordering" to
  on.

- Issues with AFR: AFR v1 currently does not propagate xdata
  (<https://bugzilla.redhat.com/show_bug.cgi?id=951800>). This issue has
  been resolved in AFR v2.

8. Dependencies
===============

Zlib library

9. Documentation
================

<http://review.gluster.org/#/c/7479/3/doc/network_compression.md>

10. Status
==========

Code merged upstream.
99
Feature Planning/GlusterFS 3.5/Quota Scalability.md
Normal file
@@ -0,0 +1,99 @@
|
||||
Feature
|
||||
-------
|
||||
|
||||
Quota Scalability
|
||||
|
||||
Summary
|
||||
-------
|
||||
|
||||
Support up to 65536 quota configurations per volume.
|
||||
|
||||
Owners
|
||||
------
|
||||
|
||||
Krishnan Parthasarathi
|
||||
Vijay Bellur
|
||||
|
||||
Current status
|
||||
--------------
|
||||
|
||||
The current implementation of Directory Quota cannot scale beyond a few
hundred configured limits per volume. The aim of this feature is to
support up to 65536 quota configurations per volume.
|
||||
|
||||
Detailed Description
|
||||
--------------------
|
||||
|
||||
TBD
|
||||
|
||||
Benefit to GlusterFS
|
||||
--------------------
|
||||
|
||||
More quotas can be configured in a single volume, which makes GlusterFS
suitable for use cases such as home directories.
|
||||
|
||||
Scope
|
||||
-----
|
||||
|
||||
### Nature of proposed change
|
||||
|
||||
- Move quota enforcement translator to the server
|
||||
- Introduce a new quota daemon which helps in aggregating directory
|
||||
consumption on the server
|
||||
- Enhance marker's accounting to be modular
|
||||
- Revamp configuration persistence and CLI listing for better scale
|
||||
- Allow configuration of soft limits in addition to hard limits.
|
||||
|
||||
### Implications on manageability
|
||||
|
||||
The CLI will be mostly backward compatible. Any new CLI commands that
are introduced need to be enumerated here.
|
||||
|
||||
### Implications on presentation layer
|
||||
|
||||
None
|
||||
|
||||
### Implications on persistence layer
|
||||
|
||||
None
|
||||
|
||||
### Implications on 'GlusterFS' backend
|
||||
|
||||
None
|
||||
|
||||
### Modification to GlusterFS metadata
|
||||
|
||||
- Addition of a new extended attribute for storing configured hard and
|
||||
soft limits on directories.
|
||||
|
||||
### Implications on 'glusterd'
|
||||
|
||||
- New file based configuration persistence
|
||||
|
||||
How To Test
|
||||
-----------
|
||||
|
||||
TBD
|
||||
|
||||
User Experience
|
||||
---------------
|
||||
|
||||
TBD
|
||||
|
||||
Dependencies
|
||||
------------
|
||||
|
||||
None
|
||||
|
||||
Documentation
|
||||
-------------
|
||||
|
||||
TBD
|
||||
|
||||
Status
|
||||
------
|
||||
|
||||
In development
|
||||
|
||||
Comments and Discussion
|
||||
-----------------------
|
||||
192
Feature Planning/GlusterFS 3.5/Zerofill.md
Normal file
@@ -0,0 +1,192 @@
|
||||
Feature
|
||||
-------
|
||||
|
||||
zerofill API for GlusterFS
|
||||
|
||||
Summary
|
||||
-------
|
||||
|
||||
zerofill() API would allow creation of pre-allocated and zeroed-out
|
||||
files on GlusterFS volumes by offloading the zeroing part to server
|
||||
and/or storage (storage offloads use SCSI WRITESAME).
|
||||
|
||||
Owners
|
||||
------
|
||||
|
||||
Bharata B Rao
|
||||
M. Mohankumar
|
||||
|
||||
Current status
|
||||
--------------
|
||||
|
||||
Patch on gerrit: <http://review.gluster.org/5327>
|
||||
|
||||
Detailed Description
|
||||
--------------------
|
||||
|
||||
Add support for a new ZEROFILL fop. Zerofill writes zeroes to a file in
the specified range. This fop is useful when a whole file needs to be
initialized with zeroes, for example when provisioning zero-filled VM
disk images or scrubbing VM disk images.

A client or application can issue this fop to zero out a range. The
Gluster server then zeroes out the required range of bytes, i.e. the
zeroing is offloaded to the server. In the absence of this fop, the
client or application has to issue write fops with zero-filled buffers
to the server repeatedly, which is inefficient because of the overhead
of the RPC calls and acknowledgements involved.

WRITESAME is a SCSI T10 command that takes a block of data as input and
writes the same data to other blocks. The write is handled completely
within the storage and is hence known as an offload. Linux now supports
the SCSI WRITESAME command, which is exposed to the user in the form of
the BLKZEROOUT ioctl. The BD xlator can exploit the BLKZEROOUT ioctl to
implement this fop, so zeroing operations can be offloaded completely
to the storage device, making them highly efficient.

The fop takes two arguments, offset and size. It zeroes out 'size'
bytes in an opened file, starting at position 'offset'.
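
To make the offset/size semantics concrete, here is a minimal Python sketch of the repeated-write approach that the fop is meant to replace. It works on a local file descriptor and does not call GlusterFS at all; the function name `zerofill_by_writes` and the `chunk` parameter are hypothetical and chosen only for illustration.

```python
import os


def zerofill_by_writes(fd: int, offset: int, size: int, chunk: int = 1 << 20) -> None:
    """Zero out `size` bytes of `fd` starting at `offset` using plain writes.

    Each loop iteration stands in for one write request that a client
    would otherwise have to send to the server.
    """
    zeros = bytes(chunk)
    written = 0
    while written < size:
        n = min(chunk, size - written)
        # os.pwrite may write fewer bytes than requested, so advance by
        # the number of bytes it reports.
        written += os.pwrite(fd, zeros[:n], offset + written)
```

With a server-side zerofill, the loop above collapses into a single request carrying only the offset and the size, which is where the savings in RPC round trips come from.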
|
||||
|
||||
Benefit to GlusterFS
|
||||
--------------------
|
||||
|
||||
Benefits GlusterFS in virtualization by providing the ability to quickly
create pre-allocated and zeroed-out VM disk images by using
server/storage offloads.
|
||||
|
||||
### Scope
|
||||
|
||||
Nature of proposed change
|
||||
-------------------------
|
||||
|
||||
An FOP supported in libgfapi and FUSE.
|
||||
|
||||
Implications on manageability
|
||||
-----------------------------
|
||||
|
||||
None.
|
||||
|
||||
Implications on presentation layer
|
||||
----------------------------------
|
||||
|
||||
N/A
|
||||
|
||||
Implications on persistence layer
|
||||
---------------------------------
|
||||
|
||||
N/A
|
||||
|
||||
Implications on 'GlusterFS' backend
|
||||
-----------------------------------
|
||||
|
||||
N/A
|
||||
|
||||
Modification to GlusterFS metadata
|
||||
----------------------------------
|
||||
|
||||
N/A
|
||||
|
||||
Implications on 'glusterd'
|
||||
--------------------------
|
||||
|
||||
N/A
|
||||
|
||||
How To Test
|
||||
-----------
|
||||
|
||||
Test server offload by measuring the time taken for creating a fully
|
||||
allocated and zeroed file on Posix backend.
|
||||
|
||||
Test storage offload by measuring the time taken for creating a fully
|
||||
allocated and zeroed file on BD backend.
|
||||
|
||||
User Experience
|
||||
---------------
|
||||
|
||||
Fast provisioning of VM images when GlusterFS is used as a file system
|
||||
backend for KVM virtualization.
|
||||
|
||||
Dependencies
|
||||
------------
|
||||
|
||||
zerofill() support in BD backend depends on the new BD translator -
|
||||
<http://review.gluster.org/#/c/4809/>
|
||||
|
||||
Documentation
|
||||
-------------
|
||||
|
||||
This feature adds support for a new ZEROFILL fop. Zerofill writes zeroes
to a file in the specified range. This fop is useful when a whole file
needs to be initialized with zeroes, for example when provisioning
zero-filled VM disk images or scrubbing VM disk images.

A client or application can issue this fop to zero out a range. The
Gluster server then zeroes out the required range of bytes, i.e. the
zeroing is offloaded to the server. In the absence of this fop, the
client or application has to issue write fops with zero-filled buffers
to the server repeatedly, which is inefficient because of the overhead
of the RPC calls and acknowledgements involved.

WRITESAME is a SCSI T10 command that takes a block of data as input and
writes the same data to other blocks. The write is handled completely
within the storage and is hence known as an offload. Linux now supports
the SCSI WRITESAME command, which is exposed to the user in the form of
the BLKZEROOUT ioctl. The BD xlator can exploit the BLKZEROOUT ioctl to
implement this fop, so zeroing operations can be offloaded completely
to the storage device, making them highly efficient.

The fop takes two arguments, offset and size. It zeroes out 'size'
bytes in an opened file, starting at position 'offset'.
|
||||
|
||||
This feature adds zerofill support to the following areas:
|
||||
|
||||
- libglusterfs
|
||||
- io-stats
|
||||
- performance/md-cache,open-behind
|
||||
- quota
|
||||
- cluster/afr,dht,stripe
|
||||
- rpc/xdr
|
||||
- protocol/client,server
|
||||
- io-threads
|
||||
- marker
|
||||
- storage/posix
|
||||
- libgfapi
|
||||
|
||||
Client applications can exploit this fop by using glfs\_zerofill,
introduced in libgfapi. FUSE support for this fop has not been added, as
there is no system call for this fop.
|
||||
|
||||
Here is a performance comparison of server-offloaded zerofill vs. zeroing
out using repeated writes.
|
||||
|
||||
[root@llmvm02 remote]# time ./offloaded aakash-test log 20
|
||||
|
||||
real 3m34.155s
|
||||
user 0m0.018s
|
||||
sys 0m0.040s
|
||||
|
||||
|
||||
[root@llmvm02 remote]# time ./manually aakash-test log 20
|
||||
|
||||
real 4m23.043s
|
||||
user 0m2.197s
|
||||
sys 0m14.457s
|
||||
[root@llmvm02 remote]# time ./offloaded aakash-test log 25;
|
||||
|
||||
real 4m28.363s
|
||||
user 0m0.021s
|
||||
sys 0m0.025s
|
||||
[root@llmvm02 remote]# time ./manually aakash-test log 25
|
||||
|
||||
real 5m34.278s
|
||||
user 0m2.957s
|
||||
sys 0m18.808s
|
||||
|
||||
The argument `log` is the file used for logging, and the third argument
is the size in GB.
|
||||
|
||||
As we can see there is a performance improvement of around 20% with this
|
||||
fop.
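
The "around 20%" figure can be checked directly from the four `real` timings listed above. The short Python calculation below uses only those numbers; the helper names `seconds` and `reduction_percent` are illustrative.

```python
def seconds(minutes: int, secs: float) -> float:
    """Convert the `time` output format (XmY.YYYs) to seconds."""
    return minutes * 60 + secs


# size in GB -> (offloaded wall-clock time, manual wall-clock time)
runs = {
    20: (seconds(3, 34.155), seconds(4, 23.043)),
    25: (seconds(4, 28.363), seconds(5, 34.278)),
}


def reduction_percent(offloaded: float, manual: float) -> float:
    """Wall-clock time saved by offloading, relative to the manual run."""
    return (manual - offloaded) / manual * 100


results = {size: round(reduction_percent(*times), 1) for size, times in runs.items()}
print(results)  # {20: 18.6, 25: 19.7}
```

So the wall-clock time drops by about 18.6% for the 20 GB file and about 19.7% for the 25 GB file.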
|
||||
|
||||
Status
|
||||
------
|
||||
|
||||
Patch : <http://review.gluster.org/5327> Status : Merged
|
||||
89
Feature Planning/GlusterFS 3.5/gfid access.md
Normal file
@@ -0,0 +1,89 @@
|
||||
### Instructions
|
||||
|
||||
**Feature**
|
||||
|
||||
'gfid-access' translator to provide access to data in glusterfs using a virtual path.
|
||||
|
||||
**1 Summary**
|
||||
|
||||
This translator is designed to provide direct access to files in glusterfs using their gfid. A 'GFID' is glusterfs's equivalent of an inode number: it uniquely identifies a file.
|
||||
|
||||
**2 Owners**
|
||||
|
||||
Amar Tumballi <atumball@redhat.com>
|
||||
Raghavendra G <rgowdapp@redhat.com>
|
||||
Anand Avati <aavati@redhat.com>
|
||||
|
||||
**3 Current status**
|
||||
|
||||
With glusterfs-3.4.0, glusterfs provides only path-based access. A feature has been added in the 'fuse' layer in the current master branch,
but it is desirable to have it as a separate translator for long-term
maintenance.
|
||||
|
||||
**4 Detailed Description**
|
||||
|
||||
With this method, we can consume the data in changelog translator
|
||||
(which is logging 'gfid' internally) very efficiently.
|
||||
|
||||
**5 Benefit to GlusterFS**
|
||||
|
||||
Provides a way to access files quickly with direct gfid.
|
||||
|
||||
**6. Scope**
|
||||
|
||||
6.1. Nature of proposed change
|
||||
|
||||
* A new translator.
|
||||
* Fixes in 'glusterfsd.c' to add this translator automatically based
|
||||
on mount time option.
|
||||
* Change to mount.glusterfs to parse this new option
(single-digit number of lines changed)
|
||||
|
||||
6.2. Implications on manageability
|
||||
|
||||
* No CLI required.
|
||||
* mount.glusterfs script gets a new option.
|
||||
|
||||
6.3. Implications on presentation layer
|
||||
|
||||
* A new virtual access path is made available. All access protocols continue to work seamlessly, as the complexities are handled internally.
|
||||
|
||||
6.4. Implications on persistence layer
|
||||
|
||||
* None
|
||||
|
||||
6.5. Implications on 'GlusterFS' backend
|
||||
|
||||
* None
|
||||
|
||||
6.6. Modification to GlusterFS metadata
|
||||
|
||||
* None
|
||||
|
||||
6.7. Implications on 'glusterd'
|
||||
|
||||
* None
|
||||
|
||||
7 How To Test
|
||||
|
||||
* Mount glusterfs client with '-o aux-gfid-mount' and access files using '/mount/point/.gfid/<actual-canonical-gfid-of-the-file>'.
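
As a small illustration of the virtual path layout, the Python helper below builds the `.gfid` path for a given gfid. It assumes that the canonical form is the usual lowercase, hyphenated UUID string; the helper name `gfid_path` and the mount point used in the example are hypothetical.

```python
import os
import uuid


def gfid_path(mount_point: str, gfid: str) -> str:
    """Return the virtual '.gfid' path for a file, given its gfid.

    uuid.UUID() accepts the gfid with or without hyphens and
    normalises it to the lowercase hyphenated form.
    """
    canonical = str(uuid.UUID(gfid))
    return os.path.join(mount_point, ".gfid", canonical)


# Hypothetical mount point and gfid, for illustration only.
print(gfid_path("/mnt/gv0", "0123456789ABCDEF0123456789ABCDEF"))
```

On a client mounted with '-o aux-gfid-mount', the resulting path could then be passed to ordinary file operations such as `open()` or `os.stat()`.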
|
||||
|
||||
8 User Experience
|
||||
|
||||
* A new virtual path available for users.
|
||||
|
||||
9 Dependencies
|
||||
|
||||
* None
|
||||
|
||||
10 Documentation
|
||||
|
||||
This wiki.
|
||||
|
||||
11 Status
|
||||
|
||||
Patch sent upstream. More review comments required. (http://review.gluster.org/5497)
|
||||
|
||||
12 Comments and Discussion
|
||||
|
||||
Please do give comments :-)
|
||||
@@ -14,6 +14,16 @@ GlusterFS 3.5
|
||||
|
||||
- [Features/AFR CLI enhancements](./AFR CLI enhancements.md)
|
||||
- [Features/exposing volume capabilities](./Exposing Volume Capabilities.md)
|
||||
- [Features/File Snapshot](./File Snapshot.md)
|
||||
- [Features/gfid-access](./gfid access.md)
|
||||
- [Features/On-Wire Compression + Decompression](./Onwire Compression-Decompression.md)
|
||||
- [Features/Quota Scalability](./Quota Scalability.md)
|
||||
- [Features/readdir ahead](./readdir ahead.md)
|
||||
- [Features/zerofill](./Zerofill.md)
|
||||
- [Features/Brick Failure Detection](./Brick Failure Detection.md)
|
||||
- [Features/disk-encryption](./Disk-Encryption.md)
|
||||
- Changelog based parallel geo-replication
|
||||
- Improved block device translator
|
||||
|
||||
Proposing New Features
|
||||
----------------------
|
||||
|
||||
117
Feature Planning/GlusterFS 3.5/readdir ahead.md
Normal file
@@ -0,0 +1,117 @@
|
||||
Feature
|
||||
-------
|
||||
|
||||
readdir-ahead
|
||||
|
||||
Summary
|
||||
-------
|
||||
|
||||
Provide read-ahead support for directories to improve sequential
|
||||
directory read performance.
|
||||
|
||||
Owners
|
||||
------
|
||||
|
||||
Brian Foster
|
||||
|
||||
Current status
|
||||
--------------
|
||||
|
||||
Gluster currently does not attempt to improve directory read
|
||||
performance. As a result, simple operations (e.g., ls) on large
|
||||
directories are slow.
|
||||
|
||||
Detailed Description
|
||||
--------------------
|
||||
|
||||
The read-ahead feature for directories is analogous to read-ahead for
|
||||
files. The objective is to detect sequential directory read operations
|
||||
and establish a pipeline for directory content. When a readdir request
|
||||
is received and fulfilled, preemptively issue subsequent readdir
|
||||
requests to the server in anticipation of those requests from the user.
|
||||
If sequential readdir requests are received, the directory content is
|
||||
already immediately available in the client. If subsequent requests are
|
||||
not sequential or not received, said data is simply dropped and the
|
||||
optimization is bypassed.
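
The pipeline described above can be sketched as a small Python class. This is a simplified, synchronous model under stated assumptions, not the translator itself: the `fetch` callback stands in for a readdir request to the server, only one request is kept in flight, and the names `ReaddirAhead` and `Fetch` are hypothetical. A real implementation would issue the anticipatory request asynchronously.

```python
from typing import Callable, List, Optional, Tuple

# fetch(offset) -> (entries, next_offset); models one readdir request.
Fetch = Callable[[int], Tuple[List[str], int]]


class ReaddirAhead:
    def __init__(self, fetch: Fetch) -> None:
        self._fetch = fetch
        # (offset the data belongs to, entries, offset after those entries)
        self._prefetched: Optional[Tuple[int, List[str], int]] = None

    def readdir(self, offset: int) -> Tuple[List[str], int]:
        cached, self._prefetched = self._prefetched, None
        if cached is not None and cached[0] == offset:
            # Sequential request: the content is already on the client.
            entries, next_offset = cached[1], cached[2]
        else:
            # Non-sequential request: prefetched data is dropped.
            entries, next_offset = self._fetch(offset)
        if entries:
            # Anticipate the next sequential request.
            ahead_entries, ahead_next = self._fetch(next_offset)
            self._prefetched = (next_offset, ahead_entries, ahead_next)
        return entries, next_offset
```

A sequential reader is served from the prefetched buffer on every call after the first, while a reader that seeks elsewhere simply falls back to a direct request.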
|
||||
|
||||
Benefit to GlusterFS
|
||||
--------------------
|
||||
|
||||
Improved read performance of large directories.
|
||||
|
||||
### Scope
|
||||
|
||||
Nature of proposed change
|
||||
-------------------------
|
||||
|
||||
readdir-ahead support is enabled through a new client-side translator.
|
||||
|
||||
Implications on manageability
|
||||
-----------------------------
|
||||
|
||||
None beyond the ability to enable and disable the translator.
|
||||
|
||||
Implications on presentation layer
|
||||
----------------------------------
|
||||
|
||||
N/A
|
||||
|
||||
Implications on persistence layer
|
||||
---------------------------------
|
||||
|
||||
N/A
|
||||
|
||||
Implications on 'GlusterFS' backend
|
||||
-----------------------------------
|
||||
|
||||
N/A
|
||||
|
||||
Modification to GlusterFS metadata
|
||||
----------------------------------
|
||||
|
||||
N/A
|
||||
|
||||
Implications on 'glusterd'
|
||||
--------------------------
|
||||
|
||||
N/A
|
||||
|
||||
How To Test
|
||||
-----------
|
||||
|
||||
Performance testing. Verify that sequential reads of large directories
|
||||
complete faster (e.g., ls, xfs\_io -c readdir).
|
||||
|
||||
User Experience
|
||||
---------------
|
||||
|
||||
Improved performance on sequential read workloads. The translator should
|
||||
otherwise be invisible and not degrade performance or disrupt behavior
|
||||
in any way.
|
||||
|
||||
Dependencies
|
||||
------------
|
||||
|
||||
N/A
|
||||
|
||||
Documentation
|
||||
-------------
|
||||
|
||||
Set the associated config option to enable or disable directory
|
||||
read-ahead on a volume:
|
||||
|
||||
gluster volume set <vol> readdir-ahead [enable|disable]
|
||||
|
||||
readdir-ahead is disabled by default.
|
||||
|
||||
Status
|
||||
------
|
||||
|
||||
Development complete for the initial version. Minor changes and bug
|
||||
fixes likely.
|
||||
|
||||
Future versions might expand to provide generic caching and more
|
||||
flexible behavior.
|
||||
|
||||
Comments and Discussion
|
||||
-----------------------
|
||||
@@ -184,6 +184,14 @@ pages:
|
||||
- ['Feature Planning/GlusterFS 3.5/index.md','Feature Planning 3.5','index']
|
||||
- ['Feature Planning/GlusterFS 3.5/AFR CLI enhancements.md','Feature Planning 3.5','AFR CLI enhancements']
|
||||
- ['Feature Planning/GlusterFS 3.5/Exposing Volume Capabilities.md','Feature Planning 3.5','Exposing Volume Capabilities']
|
||||
- ['Feature Planning/GlusterFS 3.5/File Snapshot.md','Feature Planning 3.5','File Snapshot']
|
||||
- ['Feature Planning/GlusterFS 3.5/gfid access.md','Feature Planning 3.5','gfid access']
|
||||
- ['Feature Planning/GlusterFS 3.5/Onwire Compression-Decompression.md','Feature Planning 3.5','On wire Compression + Decompression']
|
||||
- ['Feature Planning/GlusterFS 3.5/Quota Scalability.md','Feature Planning 3.5','Quota Scalability']
|
||||
- ['Feature Planning/GlusterFS 3.5/readdir ahead.md','Feature Planning 3.5','readdir ahead']
|
||||
- ['Feature Planning/GlusterFS 3.5/Zerofill.md','Feature Planning 3.5','Zerofill']
|
||||
- ['Feature Planning/GlusterFS 3.5/Brick Failure Detection.md','Feature Planning 3.5','Brick Failure Detection']
|
||||
- ['Feature Planning/GlusterFS 3.5/Disk Encryption.md','Feature Planning 3.5','Disk Encryption']
|
||||
|
||||
#GlusterFS Tools
|
||||
- ['GlusterFS Tools/README.md', 'GlusterFS Tools', 'GlusterFS Tools List']
|
||||
|
||||