] [snap-max-soft-limit snapshot-directory
-3. Accessing from windows:
+3. Accessing from Windows:
The glusterfs volumes can be made accessible to Windows via samba. (The
glusterfs plugin for samba helps achieve this, without having to re-export
@@ -242,11 +238,12 @@ Snapshots can be accessed in 2 ways.
also be viewed in Windows Explorer.
There are 2 ways:
- * Give the path of the entry point directory
+
+ - Give the path of the entry point directory
(``) in the run command
window
- * Go to the samba share via windows explorer. Make hidden files and folders
+  - Go to the samba share via Windows Explorer. Make hidden files and folders
visible so that in the root of the samba share a folder icon for the entry point
can be seen.
@@ -256,28 +253,28 @@ the path should be provided in the run command window.
For snapshots to be accessible from Windows, the below 2 options can be used.
-1. The glusterfs plugin for samba should give the option "snapdir-entry-path"
- while starting. The option is an indication to glusterfs, that samba is loading
- it and the value of the option should be the path that is being used as the
- share for windows.
+1. The glusterfs plugin for samba should pass the option "snapdir-entry-path"
+   while starting. The option is an indication to glusterfs that samba is loading
+   it, and the value of the option should be the path that is being used as the
+   share for Windows.
Ex: Say there is a glusterfs volume, and a directory called "export" at the
root of the volume is being used as the samba share. Then samba has to load
glusterfs with this option as well.
- ret = glfs_set_xlator_option(
- fs,
- "*-snapview-client",
- "snapdir-entry-path", "/export"
- );
+ ret = glfs_set_xlator_option(
+ fs,
+ "*-snapview-client",
+ "snapdir-entry-path", "/export"
+ );
The xlator option "snapdir-entry-path" is not exposed via volume set options,
cannot be changed from CLI. Its an option that has to be provided at the time of
mounting glusterfs or when samba loads glusterfs.
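As a fuller illustration, below is a minimal libgfapi sketch of how a samba-like consumer could pass this option before initialising the volume. The volume name `vol`, the server `server1`, and the default management port 24007 are assumptions for illustration; only the `/export` share path comes from the example above.

    #include <stdio.h>
    #include <glusterfs/api/glfs.h>

    int main(void)
    {
        /* Handle for the (hypothetical) volume "vol". */
        glfs_t *fs = glfs_new("vol");
        if (!fs)
            return 1;

        /* Point the handle at the management daemon. */
        glfs_set_volfile_server(fs, "tcp", "server1", 24007);

        /* Tell snapview-client which directory is exported as the samba
           share, so snapshots are reachable from the root of that share. */
        int ret = glfs_set_xlator_option(fs, "*-snapview-client",
                                         "snapdir-entry-path", "/export");
        if (ret) {
            fprintf(stderr, "failed to set snapdir-entry-path\n");
            glfs_fini(fs);
            return 1;
        }

        /* Fetch the volfile and bring the volume online for this process. */
        if (glfs_init(fs)) {
            fprintf(stderr, "glfs_init failed\n");
            glfs_fini(fs);
            return 1;
        }

        glfs_fini(fs);
        return 0;
    }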
-2. The accessibility of snapshots via root of the samba share from windows
- is configurable. By default it is turned off. It is a volume set option which can
- be changed via CLI.
+2. The accessibility of snapshots via the root of the samba share from Windows
+   is configurable. By default it is turned off. It is a volume set option which can
+   be changed via the CLI.
`gluster volume set <volname> features.show-snapshot-directory <on|off>`. By
default it is off.
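For example, assuming a hypothetical volume named `test-vol` that is exported through samba, the entry point can be made visible with:

    gluster volume set test-vol features.show-snapshot-directory on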
diff --git a/docs/Administrator-Guide/Managing-Volumes.md b/docs/Administrator-Guide/Managing-Volumes.md
index 24e29c4..a36f5f1 100644
--- a/docs/Administrator-Guide/Managing-Volumes.md
+++ b/docs/Administrator-Guide/Managing-Volumes.md
@@ -15,6 +15,7 @@ operations, including the following:
- [Non Uniform File Allocation (NUFA)](#non-uniform-file-allocation)
+
## Configuring Transport Types for a Volume
A volume can support one or more transport types for communication between clients and brick processes.
@@ -24,21 +25,22 @@ To change the supported transport types of a volume, follow the procedure:
1. Unmount the volume on all the clients using the following command:
- `# umount mount-point`
+ umount mount-point
2. Stop the volumes using the following command:
- `# gluster volume stop `
+        gluster volume stop <VOLNAME>
3. Change the transport type. For example, to enable both tcp and rdma execute the following command:
- `# gluster volume set test-volume config.transport tcp,rdma OR tcp OR rdma`
+ gluster volume set test-volume config.transport tcp,rdma OR tcp OR rdma
4. Mount the volume on all the clients. For example, to mount using rdma transport, use the following command:
- `# mount -t glusterfs -o transport=rdma server1:/test-volume /mnt/glusterfs`
+ mount -t glusterfs -o transport=rdma server1:/test-volume /mnt/glusterfs
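Putting the steps above together, a consolidated sketch for a hypothetical volume `test-volume` mounted at `/mnt/glusterfs` could look like this (note that the volume also has to be started again before it can be remounted):

    umount /mnt/glusterfs
    gluster volume stop test-volume
    gluster volume set test-volume config.transport tcp,rdma
    gluster volume start test-volume
    mount -t glusterfs -o transport=rdma server1:/test-volume /mnt/glusterfs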
+
## Expanding Volumes
You can expand volumes, as needed, while the cluster is online and
@@ -49,8 +51,7 @@ of the GlusterFS volume.
Similarly, you might want to add a group of bricks to a distributed
replicated volume, increasing the capacity of the GlusterFS volume.
-> **Note**
->
+> **Note**
> When expanding distributed replicated and distributed dispersed volumes,
> you need to add a number of bricks that is a multiple of the replica
> or disperse count. For example, to expand a distributed replicated
@@ -62,7 +63,7 @@ replicated volume, increasing the capacity of the GlusterFS volume.
1. If they are not already part of the TSP, probe the servers which contain the bricks you
want to add to the volume using the following command:
- `# gluster peer probe `
+        gluster peer probe <SERVERNAME>
For example:
@@ -71,7 +72,7 @@ replicated volume, increasing the capacity of the GlusterFS volume.
2. Add the brick using the following command:
- `# gluster volume add-brick `
+        gluster volume add-brick <VOLNAME> <NEW-BRICK>
For example:
@@ -80,7 +81,7 @@ replicated volume, increasing the capacity of the GlusterFS volume.
3. Check the volume information using the following command:
- `# gluster volume info `
+        gluster volume info <VOLNAME>
The command displays information similar to the following:
@@ -100,14 +101,14 @@ replicated volume, increasing the capacity of the GlusterFS volume.
You can use the rebalance command as described in [Rebalancing Volumes](#rebalancing-volumes)
+
## Shrinking Volumes
You can shrink volumes, as needed, while the cluster is online and
available. For example, you might need to remove a brick that has become
inaccessible in a distributed volume due to hardware or network failure.
-> **Note**
->
+> **Note**
> Data residing on the brick that you are removing will no longer be
> accessible at the Gluster mount point. Note however that only the
> configuration information is removed - you can continue to access the
@@ -128,7 +129,7 @@ operation to migrate data from the removed-bricks to the rest of the volume.
1. Remove the brick using the following command:
- `# gluster volume remove-brick start`
+        gluster volume remove-brick <VOLNAME> <BRICKNAME> start
For example, to remove server2:/exp2:
@@ -138,7 +139,7 @@ operation to migrate data from the removed-bricks to the rest of the volume.
2. View the status of the remove brick operation using the
following command:
- `# gluster volume remove-brick status`
+        gluster volume remove-brick <VOLNAME> <BRICKNAME> status
For example, to view the status of remove brick operation on
server2:/exp2 brick:
@@ -150,7 +151,7 @@ operation to migrate data from the removed-bricks to the rest of the volume.
3. Once the status displays "completed", commit the remove-brick operation
- # gluster volume remove-brick commit
+        gluster volume remove-brick <VOLNAME> <BRICKNAME> commit
In this example:
@@ -162,7 +163,7 @@ operation to migrate data from the removed-bricks to the rest of the volume.
4. Check the volume information using the following command:
- `# gluster volume info `
+        gluster volume info <VOLNAME>
The command displays information similar to the following:
@@ -176,15 +177,15 @@ operation to migrate data from the removed-bricks to the rest of the volume.
Brick3: server3:/exp3
Brick4: server4:/exp4
-
+
## Replace faulty brick
-**Replacing a brick in a *pure* distribute volume**
+**Replacing a brick in a _pure_ distribute volume**
To replace a brick on a distribute only volume, add the new brick and then remove the brick you want to replace. This will trigger a rebalance operation which will move data from the removed brick.
-> NOTE: Replacing a brick using the 'replace-brick' command in gluster is supported only for distributed-replicate or *pure* replicate volumes.
+> NOTE: Replacing a brick using the 'replace-brick' command in gluster is supported only for distributed-replicate or _pure_ replicate volumes.
Steps to remove brick Server1:/home/gfs/r2_1 and add Server1:/home/gfs/r2_2:
@@ -200,10 +201,8 @@ Steps to remove brick Server1:/home/gfs/r2_1 and add Server1:/home/gfs/r2_2:
Brick1: Server1:/home/gfs/r2_0
Brick2: Server1:/home/gfs/r2_1
-
2. Here are the files that are present on the mount:
-
# ls
1 10 2 3 4 5 6 7 8 9
@@ -220,13 +219,11 @@ Steps to remove brick Server1:/home/gfs/r2_1 and add Server1:/home/gfs/r2_2:
5. Wait until remove-brick status indicates that it is complete.
-
# gluster volume remove-brick r2 Server1:/home/gfs/r2_1 status
Node Rebalanced-files size scanned failures skipped status run time in secs
--------- ----------- ----------- ----------- ----------- ----------- ------------ --------------
localhost 5 20Bytes 15 0 0 completed 0.00
-
6. Now we can safely remove the old brick, so commit the changes:
# gluster volume remove-brick r2 Server1:/home/gfs/r2_1 commit
@@ -266,58 +263,57 @@ This section of the document describes how brick: `Server1:/home/gfs/r2_0` is re
Brick3: Server1:/home/gfs/r2_2
Brick4: Server2:/home/gfs/r2_3
-
Steps:
1. Make sure there is no data in the new brick Server1:/home/gfs/r2_5
2. Check that all the bricks are running. It is okay if the brick that is going to be replaced is down.
3. Bring the brick that is going to be replaced down if not already.
- - Get the pid of the brick by executing 'gluster volume status'
+ - Get the pid of the brick by executing 'gluster volume status'
- # gluster volume status
- Status of volume: r2
- Gluster process Port Online Pid
- ------------------------------------------------------------------------------
- Brick Server1:/home/gfs/r2_0 49152 Y 5342
- Brick Server2:/home/gfs/r2_1 49153 Y 5354
- Brick Server1:/home/gfs/r2_2 49154 Y 5365
- Brick Server2:/home/gfs/r2_3 49155 Y 5376
+ # gluster volume status
+ Status of volume: r2
+ Gluster process Port Online Pid
+ ------------------------------------------------------------------------------
+ Brick Server1:/home/gfs/r2_0 49152 Y 5342
+ Brick Server2:/home/gfs/r2_1 49153 Y 5354
+ Brick Server1:/home/gfs/r2_2 49154 Y 5365
+ Brick Server2:/home/gfs/r2_3 49155 Y 5376
- - Login to the machine where the brick is running and kill the brick.
+   - Log in to the machine where the brick is running and kill the brick.
- # kill -15 5342
+ # kill -15 5342
- - Confirm that the brick is not running anymore and the other bricks are running fine.
+ - Confirm that the brick is not running anymore and the other bricks are running fine.
- # gluster volume status
- Status of volume: r2
- Gluster process Port Online Pid
- ------------------------------------------------------------------------------
- Brick Server1:/home/gfs/r2_0 N/A N 5342 <<---- brick is not running, others are running fine.
- Brick Server2:/home/gfs/r2_1 49153 Y 5354
- Brick Server1:/home/gfs/r2_2 49154 Y 5365
- Brick Server2:/home/gfs/r2_3 49155 Y 5376
+ # gluster volume status
+ Status of volume: r2
+ Gluster process Port Online Pid
+ ------------------------------------------------------------------------------
+ Brick Server1:/home/gfs/r2_0 N/A N 5342 <<---- brick is not running, others are running fine.
+ Brick Server2:/home/gfs/r2_1 49153 Y 5354
+ Brick Server1:/home/gfs/r2_2 49154 Y 5365
+ Brick Server2:/home/gfs/r2_3 49155 Y 5376
4. Using the gluster volume fuse mount (in this example: `/mnt/r2`), set up metadata so that data will be synced to the new brick (in this case from `Server1:/home/gfs/r2_1` to `Server1:/home/gfs/r2_5`)
- - Create a directory on the mount point that doesn't already exist. Then delete that directory, do the same for metadata changelog by doing setfattr. This operation marks the pending changelog which will tell self-heal damon/mounts to perform self-heal from `/home/gfs/r2_1` to `/home/gfs/r2_5`.
+   - Create a directory on the mount point that doesn't already exist. Then delete that directory, and do the same for the metadata changelog by doing setfattr. This operation marks the pending changelog which will tell the self-heal daemon/mounts to perform self-heal from `/home/gfs/r2_1` to `/home/gfs/r2_5`.
- mkdir /mnt/r2/
- rmdir /mnt/r2/
- setfattr -n trusted.non-existent-key -v abc /mnt/r2
- setfattr -x trusted.non-existent-key /mnt/r2
+            mkdir /mnt/r2/<name-of-nonexistent-dir>
+            rmdir /mnt/r2/<name-of-nonexistent-dir>
+ setfattr -n trusted.non-existent-key -v abc /mnt/r2
+ setfattr -x trusted.non-existent-key /mnt/r2
- - Check that there are pending xattrs on the replica of the brick that is being replaced:
+ - Check that there are pending xattrs on the replica of the brick that is being replaced:
- getfattr -d -m. -e hex /home/gfs/r2_1
- # file: home/gfs/r2_1
- security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
- trusted.afr.r2-client-0=0x000000000000000300000002 <<---- xattrs are marked from source brick Server2:/home/gfs/r2_1
- trusted.afr.r2-client-1=0x000000000000000000000000
- trusted.gfid=0x00000000000000000000000000000001
- trusted.glusterfs.dht=0x0000000100000000000000007ffffffe
- trusted.glusterfs.volume-id=0xde822e25ebd049ea83bfaa3c4be2b440
+ getfattr -d -m. -e hex /home/gfs/r2_1
+ # file: home/gfs/r2_1
+ security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
+ trusted.afr.r2-client-0=0x000000000000000300000002 <<---- xattrs are marked from source brick Server2:/home/gfs/r2_1
+ trusted.afr.r2-client-1=0x000000000000000000000000
+ trusted.gfid=0x00000000000000000000000000000001
+ trusted.glusterfs.dht=0x0000000100000000000000007ffffffe
+ trusted.glusterfs.volume-id=0xde822e25ebd049ea83bfaa3c4be2b440
5. Volume heal info will show that '/' needs healing. (There could be more entries based on the workload, but '/' must exist.)
@@ -337,23 +333,23 @@ Steps:
6. Replace the brick with 'commit force' option. Please note that other variants of replace-brick command are not supported.
- - Execute replace-brick command
+ - Execute replace-brick command
- # gluster volume replace-brick r2 Server1:/home/gfs/r2_0 Server1:/home/gfs/r2_5 commit force
- volume replace-brick: success: replace-brick commit successful
+ # gluster volume replace-brick r2 Server1:/home/gfs/r2_0 Server1:/home/gfs/r2_5 commit force
+ volume replace-brick: success: replace-brick commit successful
- - Check that the new brick is now online
+ - Check that the new brick is now online
- # gluster volume status
- Status of volume: r2
- Gluster process Port Online Pid
- ------------------------------------------------------------------------------
- Brick Server1:/home/gfs/r2_5 49156 Y 5731 <<<---- new brick is online
- Brick Server2:/home/gfs/r2_1 49153 Y 5354
- Brick Server1:/home/gfs/r2_2 49154 Y 5365
- Brick Server2:/home/gfs/r2_3 49155 Y 5376
+ # gluster volume status
+ Status of volume: r2
+ Gluster process Port Online Pid
+ ------------------------------------------------------------------------------
+ Brick Server1:/home/gfs/r2_5 49156 Y 5731 <<<---- new brick is online
+ Brick Server2:/home/gfs/r2_1 49153 Y 5354
+ Brick Server1:/home/gfs/r2_2 49154 Y 5365
+ Brick Server2:/home/gfs/r2_3 49155 Y 5376
- - Users can track the progress of self-heal using: `gluster volume heal [volname] info`.
+ - Users can track the progress of self-heal using: `gluster volume heal [volname] info`.
Once self-heal completes the changelogs will be removed.
# getfattr -d -m. -e hex /home/gfs/r2_1
@@ -366,22 +362,23 @@ Steps:
trusted.glusterfs.dht=0x0000000100000000000000007ffffffe
trusted.glusterfs.volume-id=0xde822e25ebd049ea83bfaa3c4be2b440
- - `# gluster volume heal info` will show that no heal is required.
+   - `# gluster volume heal <VOLNAME> info` will show that no heal is required.
- # gluster volume heal r2 info
- Brick Server1:/home/gfs/r2_5
- Number of entries: 0
+ # gluster volume heal r2 info
+ Brick Server1:/home/gfs/r2_5
+ Number of entries: 0
- Brick Server2:/home/gfs/r2_1
- Number of entries: 0
+ Brick Server2:/home/gfs/r2_1
+ Number of entries: 0
- Brick Server1:/home/gfs/r2_2
- Number of entries: 0
+ Brick Server1:/home/gfs/r2_2
+ Number of entries: 0
- Brick Server2:/home/gfs/r2_3
- Number of entries: 0
+ Brick Server2:/home/gfs/r2_3
+ Number of entries: 0
+
## Rebalancing Volumes
After expanding a volume using the add-brick command, you may need to rebalance the data
@@ -393,11 +390,11 @@ layout and/or data.
This section describes how to rebalance GlusterFS volumes in your
storage environment, using the following common scenarios:
-- **Fix Layout** - Fixes the layout to use the new volume topology so that files can
- be distributed to newly added nodes.
+- **Fix Layout** - Fixes the layout to use the new volume topology so that files can
+ be distributed to newly added nodes.
-- **Fix Layout and Migrate Data** - Rebalances volume by fixing the layout
- to use the new volume topology and migrating the existing data.
+- **Fix Layout and Migrate Data** - Rebalances volume by fixing the layout
+ to use the new volume topology and migrating the existing data.
### Rebalancing Volume to Fix Layout Changes
@@ -410,27 +407,27 @@ When this command is issued, all the file stat information which is
already cached will get revalidated.
As of GlusterFS 3.6, the assignment of files to bricks will take into account
-the sizes of the bricks. For example, a 20TB brick will be assigned twice as
-many files as a 10TB brick. In versions before 3.6, the two bricks were
+the sizes of the bricks. For example, a 20TB brick will be assigned twice as
+many files as a 10TB brick. In versions before 3.6, the two bricks were
treated as equal regardless of size, and would have been assigned an equal
share of files.
A fix-layout rebalance will only fix the layout changes and does not
-migrate data. If you want to migrate the existing data,
+migrate data. If you want to migrate the existing data,
use `gluster volume rebalance start` command to rebalance data among
the servers.
**To rebalance a volume to fix layout**
-- Start the rebalance operation on any Gluster server using the
- following command:
+- Start the rebalance operation on any Gluster server using the
+ following command:
- `# gluster volume rebalance fix-layout start`
+   `# gluster volume rebalance <VOLNAME> fix-layout start`
- For example:
+ For example:
- # gluster volume rebalance test-volume fix-layout start
- Starting rebalance on volume test-volume has been successful
+ # gluster volume rebalance test-volume fix-layout start
+ Starting rebalance on volume test-volume has been successful
### Rebalancing Volume to Fix Layout and Migrate Data
@@ -439,29 +436,29 @@ among the servers. A remove-brick command will automatically trigger a rebalance
**To rebalance a volume to fix layout and migrate the existing data**
-- Start the rebalance operation on any one of the server using the
- following command:
+- Start the rebalance operation on any one of the server using the
+ following command:
- `# gluster volume rebalance start`
+   `# gluster volume rebalance <VOLNAME> start`
- For example:
+ For example:
- # gluster volume rebalance test-volume start
- Starting rebalancing on volume test-volume has been successful
+ # gluster volume rebalance test-volume start
+ Starting rebalancing on volume test-volume has been successful
-- Start the migration operation forcefully on any one of the servers
- using the following command:
+- Start the migration operation forcefully on any one of the servers
+ using the following command:
- `# gluster volume rebalance start force`
+   `# gluster volume rebalance <VOLNAME> start force`
- For example:
+ For example:
- # gluster volume rebalance test-volume start force
- Starting rebalancing on volume test-volume has been successful
+ # gluster volume rebalance test-volume start force
+ Starting rebalancing on volume test-volume has been successful
-A rebalance operation will attempt to balance the diskusage across nodes, therefore it will skip
-files where the move will result in a less balanced volume. This leads to link files that are still
-left behind in the system and hence may cause performance issues. The behaviour can be overridden
+A rebalance operation will attempt to balance the disk usage across nodes, and will therefore skip
+files where the move would result in a less balanced volume. This leads to link files that are still
+left behind in the system and hence may cause performance issues. The behaviour can be overridden
with the `force` argument.
### Displaying the Status of Rebalance Operation
@@ -469,56 +466,57 @@ with the `force` argument.
You can display the status information about rebalance volume operation,
as needed.
-- Check the status of the rebalance operation, using the following
- command:
+- Check the status of the rebalance operation, using the following
+ command:
- `# gluster volume rebalance status`
+   `# gluster volume rebalance <VOLNAME> status`
- For example:
+ For example:
- # gluster volume rebalance test-volume status
- Node Rebalanced-files size scanned status
- --------- ---------------- ---- ------- -----------
- 617c923e-6450-4065-8e33-865e28d9428f 416 1463 312 in progress
+ # gluster volume rebalance test-volume status
+ Node Rebalanced-files size scanned status
+ --------- ---------------- ---- ------- -----------
+ 617c923e-6450-4065-8e33-865e28d9428f 416 1463 312 in progress
- The time to complete the rebalance operation depends on the number
- of files on the volume along with the corresponding file sizes.
- Continue checking the rebalance status, verifying that the number of
- files rebalanced or total files scanned keeps increasing.
+ The time to complete the rebalance operation depends on the number
+ of files on the volume along with the corresponding file sizes.
+ Continue checking the rebalance status, verifying that the number of
+ files rebalanced or total files scanned keeps increasing.
- For example, running the status command again might display a result
- similar to the following:
+ For example, running the status command again might display a result
+ similar to the following:
- # gluster volume rebalance test-volume status
- Node Rebalanced-files size scanned status
- --------- ---------------- ---- ------- -----------
- 617c923e-6450-4065-8e33-865e28d9428f 498 1783 378 in progress
+ # gluster volume rebalance test-volume status
+ Node Rebalanced-files size scanned status
+ --------- ---------------- ---- ------- -----------
+ 617c923e-6450-4065-8e33-865e28d9428f 498 1783 378 in progress
- The rebalance status displays the following when the rebalance is
- complete:
+ The rebalance status displays the following when the rebalance is
+ complete:
- # gluster volume rebalance test-volume status
- Node Rebalanced-files size scanned status
- --------- ---------------- ---- ------- -----------
- 617c923e-6450-4065-8e33-865e28d9428f 502 1873 334 completed
+ # gluster volume rebalance test-volume status
+ Node Rebalanced-files size scanned status
+ --------- ---------------- ---- ------- -----------
+ 617c923e-6450-4065-8e33-865e28d9428f 502 1873 334 completed
### Stopping an Ongoing Rebalance Operation
You can stop the rebalance operation, if needed.
-- Stop the rebalance operation using the following command:
+- Stop the rebalance operation using the following command:
- `# gluster volume rebalance stop`
+   `# gluster volume rebalance <VOLNAME> stop`
- For example:
+ For example:
- # gluster volume rebalance test-volume stop
- Node Rebalanced-files size scanned status
- --------- ---------------- ---- ------- -----------
- 617c923e-6450-4065-8e33-865e28d9428f 59 590 244 stopped
- Stopped rebalance process on volume test-volume
+ # gluster volume rebalance test-volume stop
+ Node Rebalanced-files size scanned status
+ --------- ---------------- ---- ------- -----------
+ 617c923e-6450-4065-8e33-865e28d9428f 59 590 244 stopped
+ Stopped rebalance process on volume test-volume
+
## Stopping Volumes
1. Stop the volume using the following command:
@@ -536,6 +534,7 @@ You can stop the rebalance operation, if needed.
Stopping volume test-volume has been successful
+
## Deleting Volumes
1. Delete the volume using the following command:
@@ -553,6 +552,7 @@ You can stop the rebalance operation, if needed.
Deleting volume test-volume has been successful
+
## Triggering Self-Heal on Replicate
In replicate module, previously you had to manually trigger a self-heal
@@ -561,133 +561,134 @@ replicas in sync. Now the pro-active self-heal daemon runs in the
background, diagnoses issues and automatically initiates self-healing
every 10 minutes on the files which require *healing*.
-You can view the list of files that need *healing*, the list of files
-which are currently/previously *healed*, list of files which are in
+You can view the list of files that need _healing_, the list of files
+which are currently/previously _healed_, list of files which are in
split-brain state, and you can manually trigger self-heal on the entire
-volume or only on the files which need *healing*.
+volume or only on the files which need _healing_.
-- Trigger self-heal only on the files which requires *healing*:
+- Trigger self-heal only on the files which require _healing_:
- `# gluster volume heal `
+   `# gluster volume heal <VOLNAME>`
- For example, to trigger self-heal on files which requires *healing*
- of test-volume:
+   For example, to trigger self-heal on files which require _healing_
+ of test-volume:
- # gluster volume heal test-volume
- Heal operation on volume test-volume has been successful
+ # gluster volume heal test-volume
+ Heal operation on volume test-volume has been successful
-- Trigger self-heal on all the files of a volume:
+- Trigger self-heal on all the files of a volume:
- `# gluster volume heal full`
+   `# gluster volume heal <VOLNAME> full`
- For example, to trigger self-heal on all the files of of
- test-volume:
+   For example, to trigger self-heal on all the files of
+ test-volume:
- # gluster volume heal test-volume full
- Heal operation on volume test-volume has been successful
+ # gluster volume heal test-volume full
+ Heal operation on volume test-volume has been successful
-- View the list of files that needs *healing*:
+- View the list of files that need _healing_:
- `# gluster volume heal info`
+   `# gluster volume heal <VOLNAME> info`
- For example, to view the list of files on test-volume that needs
- *healing*:
+   For example, to view the list of files on test-volume that need
+ _healing_:
- # gluster volume heal test-volume info
- Brick server1:/gfs/test-volume_0
- Number of entries: 0
+ # gluster volume heal test-volume info
+ Brick server1:/gfs/test-volume_0
+ Number of entries: 0
- Brick server2:/gfs/test-volume_1
- Number of entries: 101
- /95.txt
- /32.txt
- /66.txt
- /35.txt
- /18.txt
- /26.txt
- /47.txt
- /55.txt
- /85.txt
- ...
+ Brick server2:/gfs/test-volume_1
+ Number of entries: 101
+ /95.txt
+ /32.txt
+ /66.txt
+ /35.txt
+ /18.txt
+ /26.txt
+ /47.txt
+ /55.txt
+ /85.txt
+ ...
-- View the list of files that are self-healed:
+- View the list of files that are self-healed:
- `# gluster volume heal info healed`
+   `# gluster volume heal <VOLNAME> info healed`
- For example, to view the list of files on test-volume that are
- self-healed:
+ For example, to view the list of files on test-volume that are
+ self-healed:
- # gluster volume heal test-volume info healed
- Brick Server1:/gfs/test-volume_0
- Number of entries: 0
+ # gluster volume heal test-volume info healed
+ Brick Server1:/gfs/test-volume_0
+ Number of entries: 0
- Brick Server2:/gfs/test-volume_1
- Number of entries: 69
- /99.txt
- /93.txt
- /76.txt
- /11.txt
- /27.txt
- /64.txt
- /80.txt
- /19.txt
- /41.txt
- /29.txt
- /37.txt
- /46.txt
- ...
+ Brick Server2:/gfs/test-volume_1
+ Number of entries: 69
+ /99.txt
+ /93.txt
+ /76.txt
+ /11.txt
+ /27.txt
+ /64.txt
+ /80.txt
+ /19.txt
+ /41.txt
+ /29.txt
+ /37.txt
+ /46.txt
+ ...
-- View the list of files of a particular volume on which the self-heal
- failed:
+- View the list of files of a particular volume on which the self-heal
+ failed:
- `# gluster volume heal info failed`
+   `# gluster volume heal <VOLNAME> info failed`
- For example, to view the list of files of test-volume that are not
- self-healed:
+ For example, to view the list of files of test-volume that are not
+ self-healed:
- # gluster volume heal test-volume info failed
- Brick Server1:/gfs/test-volume_0
- Number of entries: 0
+ # gluster volume heal test-volume info failed
+ Brick Server1:/gfs/test-volume_0
+ Number of entries: 0
- Brick Server2:/gfs/test-volume_3
- Number of entries: 72
- /90.txt
- /95.txt
- /77.txt
- /71.txt
- /87.txt
- /24.txt
- ...
+ Brick Server2:/gfs/test-volume_3
+ Number of entries: 72
+ /90.txt
+ /95.txt
+ /77.txt
+ /71.txt
+ /87.txt
+ /24.txt
+ ...
-- View the list of files of a particular volume which are in
- split-brain state:
+- View the list of files of a particular volume which are in
+ split-brain state:
- `# gluster volume heal info split-brain`
+   `# gluster volume heal <VOLNAME> info split-brain`
- For example, to view the list of files of test-volume which are in
- split-brain state:
+ For example, to view the list of files of test-volume which are in
+ split-brain state:
- # gluster volume heal test-volume info split-brain
- Brick Server1:/gfs/test-volume_2
- Number of entries: 12
- /83.txt
- /28.txt
- /69.txt
- ...
+ # gluster volume heal test-volume info split-brain
+ Brick Server1:/gfs/test-volume_2
+ Number of entries: 12
+ /83.txt
+ /28.txt
+ /69.txt
+ ...
- Brick Server2:/gfs/test-volume_3
- Number of entries: 12
- /83.txt
- /28.txt
- /69.txt
- ...
+ Brick Server2:/gfs/test-volume_3
+ Number of entries: 12
+ /83.txt
+ /28.txt
+ /69.txt
+ ...
+
## Non Uniform File Allocation
The NUFA translator, or Non Uniform File Access translator, is designed to give higher preference
to a local drive when used in an HPC type of environment. It can be applied to Distribute and Replica translators;
-in the latter case it ensures that *one* copy is local if space permits.
+in the latter case it ensures that _one_ copy is local if space permits.
When a client on a server creates files, the files are allocated to a brick in the volume based on the file name.
This allocation may not be ideal, as there is higher latency and unnecessary network traffic for read/write operations
@@ -723,17 +724,17 @@ The NUFA scheduler also exists, for use with the Unify translator; see below.
##### NUFA additional options
-- lookup-unhashed
+- lookup-unhashed
- This is an advanced option where files are looked up in all subvolumes if they are missing on the subvolume matching the hash value of the filename. The default is on.
+ This is an advanced option where files are looked up in all subvolumes if they are missing on the subvolume matching the hash value of the filename. The default is on.
-- local-volume-name
+- local-volume-name
- The volume name to consider local and prefer file creations on. The default is to search for a volume matching the hostname of the system.
+ The volume name to consider local and prefer file creations on. The default is to search for a volume matching the hostname of the system.
-- subvolumes
+- subvolumes
- This option lists the subvolumes that are part of this 'cluster/nufa' volume. This translator requires more than one subvolume.
+ This option lists the subvolumes that are part of this 'cluster/nufa' volume. This translator requires more than one subvolume.
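The options above are translator (volfile) options rather than `volume set` keys, so they are placed in the 'cluster/nufa' section of a hand-written volfile. A minimal sketch is shown below; the section name `test-nufa` and the subvolume names `brick1-client`/`brick2-client` are hypothetical protocol/client subvolumes assumed to be defined earlier in the same volfile.

    volume test-nufa
        type cluster/nufa
        # prefer file creations on this (hypothetical) local subvolume
        option local-volume-name brick1-client
        # default shown explicitly: look up in all subvolumes on a hash miss
        option lookup-unhashed on
        subvolumes brick1-client brick2-client
    end-volume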
## BitRot Detection
@@ -748,41 +749,43 @@ sub-commands.
1. To enable bitrot detection for a given volume:
- `# gluster volume bitrot enable`
+   `# gluster volume bitrot <VOLNAME> enable`
- and similarly to disable bitrot use:
+ and similarly to disable bitrot use:
- `# gluster volume bitrot disable`
+   `# gluster volume bitrot <VOLNAME> disable`
> NOTE: Enabling bitrot spawns the Signer & Scrubber daemon per node. Signer is responsible
+
for signing (calculating the checksum for each file) an object and the scrubber verifies the
calculated checksum against the object's data.
-2. Scrubber daemon has three (3) throttling modes that adjusts the rate at which objects
- are verified.
+2. The scrubber daemon has three (3) throttling modes that adjust the rate at which objects
+ are verified.
- # volume bitrot scrub-throttle lazy
- # volume bitrot scrub-throttle normal
- # volume bitrot scrub-throttle aggressive
+        # volume bitrot <VOLNAME> scrub-throttle lazy
+        # volume bitrot <VOLNAME> scrub-throttle normal
+        # volume bitrot <VOLNAME> scrub-throttle aggressive
-3. By default scrubber scrubs the filesystem biweekly. It's possible to tune it to scrub
- based on predefined frequency such as monthly, etc. This can be done as shown below:
+3. By default, the scrubber scrubs the filesystem biweekly. It's possible to tune it to scrub
+   based on a predefined frequency such as monthly, etc. This can be done as shown below:
- # volume bitrot scrub-frequency daily
- # volume bitrot scrub-frequency weekly
- # volume bitrot scrub-frequency biweekly
- # volume bitrot scrub-frequency monthly
+        # volume bitrot <VOLNAME> scrub-frequency daily
+        # volume bitrot <VOLNAME> scrub-frequency weekly
+        # volume bitrot <VOLNAME> scrub-frequency biweekly
+        # volume bitrot <VOLNAME> scrub-frequency monthly
> NOTE: Daily scrubbing would not be available with GA release.
4. Scrubber daemon can be paused and later resumed when required. This can be done as
shown below:
- `# volume bitrot scrub pause`
+   `# volume bitrot <VOLNAME> scrub pause`
and to resume scrubbing:
- `# volume bitrot scrub resume`
+   `# volume bitrot <VOLNAME> scrub resume`
> NOTE: Signing cannot be paused (and resumed) and would always be active as long as
+
bitrot is enabled for that particular volume.
diff --git a/docs/Administrator-Guide/Mandatory-Locks.md b/docs/Administrator-Guide/Mandatory-Locks.md
index 1273748..e552fb2 100644
--- a/docs/Administrator-Guide/Mandatory-Locks.md
+++ b/docs/Administrator-Guide/Mandatory-Locks.md
@@ -1,8 +1,9 @@
-Mandatory Locks
-===============
+# Mandatory Locks
+
Support for mandatory locks inside GlusterFS does not converge all by itself to what Linux kernel provides to user space file systems. Here we enforce core mandatory lock semantics with and without the help of file mode bits. Please read through the [design specification](https://github.com/gluster/glusterfs-specs/blob/master/done/GlusterFS%203.8/Mandatory%20Locks.md) which explains the whole concept behind the mandatory locks implementation done for GlusterFS.
## Implications and Usage
+
By default, mandatory locking will be disabled for a volume, and a volume set option is available to configure the volume to operate under 3 different mandatory locking modes.
## Volume Option
@@ -11,22 +12,24 @@ By default, mandatory locking will be disabled for a volume and a volume set opt
gluster volume set <volname> locks.mandatory-locking <off|file|forced|optimal>
```
-**off** - Disable mandatory locking for specified volume.
-**file** - Enable Linux kernel style mandatory locking semantics with the help of mode bits (not well tested)
-**forced** - Check for conflicting byte range locks for every data modifying operation in a volume
-**optimal** - Combinational mode where POSIX clients can live with their advisory lock semantics which will still honour the mandatory locks acquired by other clients like SMB.
+**off** - Disable mandatory locking for specified volume.
+**file** - Enable Linux kernel style mandatory locking semantics with the help of mode bits (not well tested)
+**forced** - Check for conflicting byte range locks for every data modifying operation in a volume
+**optimal** - Combinational mode where POSIX clients can live with their advisory lock semantics which will still honour the mandatory locks acquired by other clients like SMB.
**Note**: Please refer to the design doc for more information on these key values.
#### Points to be remembered
-* Valid key values available with mandatory-locking volume set option are taken into effect only after a subsequent start/restart of the volume.
-* Due to some outstanding issues, it is recommended to turn off the performance translators in order to have the complete functionality of mandatory-locks when volume is configured in any one of the above described mandatory-locking modes. Please see the 'Known issue' section below for more details.
+
+- Valid key values available with the mandatory-locking volume set option take effect only after a subsequent start/restart of the volume.
+- Due to some outstanding issues, it is recommended to turn off the performance translators in order to have the complete functionality of mandatory-locks when volume is configured in any one of the above described mandatory-locking modes. Please see the 'Known issue' section below for more details.
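For example, assuming a hypothetical volume named `test-vol`, selecting the `optimal` mode and making it take effect could look like:

    gluster volume set test-vol locks.mandatory-locking optimal
    gluster volume stop test-vol
    gluster volume start test-vol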
#### Known issues
-* Since the whole logic of mandatory-locks are implemented within the locks translator loaded at the server side, early success returned to fops like open, read, write to upper/application layer by performance translators residing at the client side will impact the intended functionality of mandatory-locks. One such issue is being tracked in the following bugzilla report:
-
+- Since the whole logic of mandatory-locks is implemented within the locks translator loaded at the server side, early success returned to fops like open, read, write to the upper/application layer by performance translators residing at the client side will impact the intended functionality of mandatory-locks. One such issue is being tracked in the following bugzilla report:
-* There is a possible race window uncovered with respect to mandatory locks and an ongoing read/write operation. For more details refer the bug report given below:
+
-
+- There is a possible race window uncovered with respect to mandatory locks and an ongoing read/write operation. For more details, refer to the bug report given below:
+
+
diff --git a/docs/Administrator-Guide/Monitoring-Workload.md b/docs/Administrator-Guide/Monitoring-Workload.md
index 514cee0..e84b398 100644
--- a/docs/Administrator-Guide/Monitoring-Workload.md
+++ b/docs/Administrator-Guide/Monitoring-Workload.md
@@ -23,11 +23,12 @@ system.
This section describes how to run GlusterFS Volume Profile command by
performing the following operations:
-- [Start Profiling](#start-profiling)
-- [Displaying the I/0 Information](#displaying-io)
-- [Stop Profiling](#stop-profiling)
+- [Start Profiling](#start-profiling)
+- [Displaying the I/O Information](#displaying-io)
+- [Stop Profiling](#stop-profiling)
+
### Start Profiling
You must start the Profiling to view the File Operation information for
@@ -35,7 +36,7 @@ each brick.
To start profiling, use the following command:
-`# gluster volume profile start `
+`# gluster volume profile <VOLNAME> start`
For example, to start profiling on test-volume:
@@ -49,11 +50,12 @@ options are displayed in the Volume Info:
diagnostics.latency-measurement: on
+
### Displaying the I/O Information
You can view the I/O information of each brick by using the following command:
-`# gluster volume profile info`
+`# gluster volume profile <VOLNAME> info`
For example, to see the I/O information on test-volume:
@@ -107,6 +109,7 @@ For example, to see the I/O information on test-volume:
BytesWritten : 195571980
+
### Stop Profiling
You can stop profiling the volume, if you do not need profiling
@@ -132,15 +135,16 @@ top command displays up to 100 results.
This section describes how to run and view the results for the following
GlusterFS Top commands:
-- [Viewing Open fd Count and Maximum fd Count](#open-fd-count)
-- [Viewing Highest File Read Calls](#file-read)
-- [Viewing Highest File Write Calls](#file-write)
-- [Viewing Highest Open Calls on Directories](#open-dir)
-- [Viewing Highest Read Calls on Directory](#read-dir)
-- [Viewing List of Read Performance on each Brick](#read-perf)
-- [Viewing List of Write Performance on each Brick](#write-perf)
+- [Viewing Open fd Count and Maximum fd Count](#open-fd-count)
+- [Viewing Highest File Read Calls](#file-read)
+- [Viewing Highest File Write Calls](#file-write)
+- [Viewing Highest Open Calls on Directories](#open-dir)
+- [Viewing Highest Read Calls on Directory](#read-dir)
+- [Viewing List of Read Performance on each Brick](#read-perf)
+- [Viewing List of Write Performance on each Brick](#write-perf)
+
### Viewing Open fd Count and Maximum fd Count
You can view both current open fd count (list of files that are
@@ -151,224 +155,229 @@ servers are up and running). If the brick name is not specified, then
open fd metrics of all the bricks belonging to the volume will be
displayed.
-- View open fd count and maximum fd count using the following command:
+- View open fd count and maximum fd count using the following command:
- `# gluster volume top open [brick ] [list-cnt ]`
+   `# gluster volume top <VOLNAME> open [brick <BRICKNAME>] [list-cnt <cnt>]`
- For example, to view open fd count and maximum fd count on brick
- server:/export of test-volume and list top 10 open calls:
+ For example, to view open fd count and maximum fd count on brick
+ server:/export of test-volume and list top 10 open calls:
- `# gluster volume top open brick list-cnt `
+   `# gluster volume top <VOLNAME> open brick <BRICKNAME> list-cnt <cnt>`
- `Brick: server:/export/dir1 `
+ `Brick: server:/export/dir1 `
- `Current open fd's: 34 Max open fd's: 209 `
+ `Current open fd's: 34 Max open fd's: 209 `
- ==========Open file stats========
+ ==========Open file stats========
- open file name
- call count
+ open file name
+ call count
- 2 /clients/client0/~dmtmp/PARADOX/
- COURSES.DB
+ 2 /clients/client0/~dmtmp/PARADOX/
+ COURSES.DB
- 11 /clients/client0/~dmtmp/PARADOX/
- ENROLL.DB
+ 11 /clients/client0/~dmtmp/PARADOX/
+ ENROLL.DB
- 11 /clients/client0/~dmtmp/PARADOX/
- STUDENTS.DB
+ 11 /clients/client0/~dmtmp/PARADOX/
+ STUDENTS.DB
- 10 /clients/client0/~dmtmp/PWRPNT/
- TIPS.PPT
+ 10 /clients/client0/~dmtmp/PWRPNT/
+ TIPS.PPT
- 10 /clients/client0/~dmtmp/PWRPNT/
- PCBENCHM.PPT
+ 10 /clients/client0/~dmtmp/PWRPNT/
+ PCBENCHM.PPT
- 9 /clients/client7/~dmtmp/PARADOX/
- STUDENTS.DB
+ 9 /clients/client7/~dmtmp/PARADOX/
+ STUDENTS.DB
- 9 /clients/client1/~dmtmp/PARADOX/
- STUDENTS.DB
+ 9 /clients/client1/~dmtmp/PARADOX/
+ STUDENTS.DB
- 9 /clients/client2/~dmtmp/PARADOX/
- STUDENTS.DB
+ 9 /clients/client2/~dmtmp/PARADOX/
+ STUDENTS.DB
- 9 /clients/client0/~dmtmp/PARADOX/
- STUDENTS.DB
+ 9 /clients/client0/~dmtmp/PARADOX/
+ STUDENTS.DB
- 9 /clients/client8/~dmtmp/PARADOX/
- STUDENTS.DB
+ 9 /clients/client8/~dmtmp/PARADOX/
+ STUDENTS.DB
+
### Viewing Highest File Read Calls
You can view the highest read calls on each brick. If the brick name is not
specified, then by default, a list of 100 files will be displayed.
-- View highest file Read calls using the following command:
+- View highest file Read calls using the following command:
- `# gluster volume top read [brick ] [list-cnt ] `
+   `# gluster volume top <VOLNAME> read [brick <BRICKNAME>] [list-cnt <cnt>]`
- For example, to view highest Read calls on brick server:/export of
- test-volume:
+ For example, to view highest Read calls on brick server:/export of
+ test-volume:
- `# gluster volume top read brick list-cnt `
+   `# gluster volume top <VOLNAME> read brick <BRICKNAME> list-cnt <cnt>`
- `Brick:` server:/export/dir1
+ `Brick:` server:/export/dir1
- ==========Read file stats========
+ ==========Read file stats========
- read filename
- call count
+ read filename
+ call count
- 116 /clients/client0/~dmtmp/SEED/LARGE.FIL
+ 116 /clients/client0/~dmtmp/SEED/LARGE.FIL
- 64 /clients/client0/~dmtmp/SEED/MEDIUM.FIL
+ 64 /clients/client0/~dmtmp/SEED/MEDIUM.FIL
- 54 /clients/client2/~dmtmp/SEED/LARGE.FIL
+ 54 /clients/client2/~dmtmp/SEED/LARGE.FIL
- 54 /clients/client6/~dmtmp/SEED/LARGE.FIL
+ 54 /clients/client6/~dmtmp/SEED/LARGE.FIL
- 54 /clients/client5/~dmtmp/SEED/LARGE.FIL
+ 54 /clients/client5/~dmtmp/SEED/LARGE.FIL
- 54 /clients/client0/~dmtmp/SEED/LARGE.FIL
+ 54 /clients/client0/~dmtmp/SEED/LARGE.FIL
- 54 /clients/client3/~dmtmp/SEED/LARGE.FIL
+ 54 /clients/client3/~dmtmp/SEED/LARGE.FIL
- 54 /clients/client4/~dmtmp/SEED/LARGE.FIL
+ 54 /clients/client4/~dmtmp/SEED/LARGE.FIL
- 54 /clients/client9/~dmtmp/SEED/LARGE.FIL
+ 54 /clients/client9/~dmtmp/SEED/LARGE.FIL
- 54 /clients/client8/~dmtmp/SEED/LARGE.FIL
+ 54 /clients/client8/~dmtmp/SEED/LARGE.FIL
+
### Viewing Highest File Write Calls
You can view the list of files which have the highest file write calls on each
brick. If the brick name is not specified, then by default, a list of 100
files will be displayed.
-- View highest file Write calls using the following command:
+- View highest file Write calls using the following command:
- `# gluster volume top write [brick ] [list-cnt ] `
+   `# gluster volume top <VOLNAME> write [brick <BRICKNAME>] [list-cnt <cnt>]`
- For example, to view highest Write calls on brick server:/export of
- test-volume:
+ For example, to view highest Write calls on brick server:/export of
+ test-volume:
- `# gluster volume top write brick list-cnt `
+   `# gluster volume top <VOLNAME> write brick <BRICKNAME> list-cnt <cnt>`
- `Brick: server:/export/dir1 `
+ `Brick: server:/export/dir1 `
- ==========Write file stats========
- write call count filename
+ ==========Write file stats========
+ write call count filename
- 83 /clients/client0/~dmtmp/SEED/LARGE.FIL
+ 83 /clients/client0/~dmtmp/SEED/LARGE.FIL
- 59 /clients/client7/~dmtmp/SEED/LARGE.FIL
+ 59 /clients/client7/~dmtmp/SEED/LARGE.FIL
- 59 /clients/client1/~dmtmp/SEED/LARGE.FIL
+ 59 /clients/client1/~dmtmp/SEED/LARGE.FIL
- 59 /clients/client2/~dmtmp/SEED/LARGE.FIL
+ 59 /clients/client2/~dmtmp/SEED/LARGE.FIL
- 59 /clients/client0/~dmtmp/SEED/LARGE.FIL
+ 59 /clients/client0/~dmtmp/SEED/LARGE.FIL
- 59 /clients/client8/~dmtmp/SEED/LARGE.FIL
+ 59 /clients/client8/~dmtmp/SEED/LARGE.FIL
- 59 /clients/client5/~dmtmp/SEED/LARGE.FIL
+ 59 /clients/client5/~dmtmp/SEED/LARGE.FIL
- 59 /clients/client4/~dmtmp/SEED/LARGE.FIL
+ 59 /clients/client4/~dmtmp/SEED/LARGE.FIL
- 59 /clients/client6/~dmtmp/SEED/LARGE.FIL
+ 59 /clients/client6/~dmtmp/SEED/LARGE.FIL
- 59 /clients/client3/~dmtmp/SEED/LARGE.FIL
+ 59 /clients/client3/~dmtmp/SEED/LARGE.FIL
+
### Viewing Highest Open Calls on Directories
You can view the list of directories which have the highest open calls
on each brick. If the brick name is not specified, then the metrics of all
the bricks belonging to that volume will be displayed.
-- View list of open calls on each directory using the following
- command:
+- View list of open calls on each directory using the following
+ command:
- `# gluster volume top opendir [brick ] [list-cnt ] `
+   `# gluster volume top <VOLNAME> opendir [brick <BRICKNAME>] [list-cnt <cnt>]`
- For example, to view open calls on brick server:/export/ of
- test-volume:
+ For example, to view open calls on brick server:/export/ of
+ test-volume:
- `# gluster volume top opendir brick list-cnt `
+   `# gluster volume top <VOLNAME> opendir brick <BRICKNAME> list-cnt <cnt>`
- `Brick: server:/export/dir1 `
+ `Brick: server:/export/dir1 `
- ==========Directory open stats========
+ ==========Directory open stats========
- Opendir count directory name
+ Opendir count directory name
- 1001 /clients/client0/~dmtmp
+ 1001 /clients/client0/~dmtmp
- 454 /clients/client8/~dmtmp
+ 454 /clients/client8/~dmtmp
- 454 /clients/client2/~dmtmp
+ 454 /clients/client2/~dmtmp
- 454 /clients/client6/~dmtmp
+ 454 /clients/client6/~dmtmp
- 454 /clients/client5/~dmtmp
+ 454 /clients/client5/~dmtmp
- 454 /clients/client9/~dmtmp
+ 454 /clients/client9/~dmtmp
- 443 /clients/client0/~dmtmp/PARADOX
+ 443 /clients/client0/~dmtmp/PARADOX
- 408 /clients/client1/~dmtmp
+ 408 /clients/client1/~dmtmp
- 408 /clients/client7/~dmtmp
+ 408 /clients/client7/~dmtmp
- 402 /clients/client4/~dmtmp
+ 402 /clients/client4/~dmtmp
+
### Viewing Highest Read Calls on Directory
You can view the list of directories which have the highest directory read calls on
each brick. If the brick name is not specified, then the metrics of all the
bricks belonging to that volume will be displayed.
-- View list of highest directory read calls on each brick using the
- following command:
+- View list of highest directory read calls on each brick using the
+ following command:
- `# gluster volume top test-volume readdir [brick BRICK] [list-cnt {0..100}] `
+ `# gluster volume top test-volume readdir [brick BRICK] [list-cnt {0..100}] `
- For example, to view highest directory read calls on brick
- server:/export of test-volume:
+ For example, to view highest directory read calls on brick
+ server:/export of test-volume:
- `# gluster volume top test-volume readdir brick server:/export list-cnt 10`
+ `# gluster volume top test-volume readdir brick server:/export list-cnt 10`
- `Brick: `
+ `Brick: `
- ==========Directory readdirp stats========
+ ==========Directory readdirp stats========
- readdirp count directory name
+ readdirp count directory name
- 1996 /clients/client0/~dmtmp
+ 1996 /clients/client0/~dmtmp
- 1083 /clients/client0/~dmtmp/PARADOX
+ 1083 /clients/client0/~dmtmp/PARADOX
- 904 /clients/client8/~dmtmp
+ 904 /clients/client8/~dmtmp
- 904 /clients/client2/~dmtmp
+ 904 /clients/client2/~dmtmp
- 904 /clients/client6/~dmtmp
+ 904 /clients/client6/~dmtmp
- 904 /clients/client5/~dmtmp
+ 904 /clients/client5/~dmtmp
- 904 /clients/client9/~dmtmp
+ 904 /clients/client9/~dmtmp
- 812 /clients/client1/~dmtmp
+ 812 /clients/client1/~dmtmp
- 812 /clients/client7/~dmtmp
+ 812 /clients/client7/~dmtmp
- 800 /clients/client4/~dmtmp
+ 800 /clients/client4/~dmtmp
+
### Viewing List of Read Performance on each Brick
You can view the read throughput of files on each brick. If brick name
@@ -413,56 +422,57 @@ volume will be displayed. The output will be the read throughput.
This command will initiate a dd for the specified count and block size
and measures the corresponding throughput.
-- View list of read performance on each brick using the following
- command:
+- View list of read performance on each brick using the following
+ command:
- `# gluster volume top read-perf [bs count ] [brick ] [list-cnt ]`
+   `# gluster volume top <VOLNAME> read-perf [bs <size> count <count>] [brick <BRICKNAME>] [list-cnt <cnt>]`
- For example, to view read performance on brick server:/export/ of
- test-volume, 256 block size of count 1, and list count 10:
+ For example, to view read performance on brick server:/export/ of
+ test-volume, 256 block size of count 1, and list count 10:
- `# gluster volume top read-perf bs 256 count 1 brick list-cnt `
+   `# gluster volume top <VOLNAME> read-perf bs 256 count 1 brick <BRICKNAME> list-cnt <cnt>`
- `Brick: server:/export/dir1 256 bytes (256 B) copied, Throughput: 4.1 MB/s `
+ `Brick: server:/export/dir1 256 bytes (256 B) copied, Throughput: 4.1 MB/s `
- ==========Read throughput file stats========
+ ==========Read throughput file stats========
- read filename Time
- through
- put(MBp
- s)
+ read filename Time
+ through
+ put(MBp
+ s)
- 2912.00 /clients/client0/~dmtmp/PWRPNT/ -2011-01-31
- TRIDOTS.POT 15:38:36.896486
+ 2912.00 /clients/client0/~dmtmp/PWRPNT/ -2011-01-31
+ TRIDOTS.POT 15:38:36.896486
- 2570.00 /clients/client0/~dmtmp/PWRPNT/ -2011-01-31
- PCBENCHM.PPT 15:38:39.815310
+ 2570.00 /clients/client0/~dmtmp/PWRPNT/ -2011-01-31
+ PCBENCHM.PPT 15:38:39.815310
- 2383.00 /clients/client2/~dmtmp/SEED/ -2011-01-31
- MEDIUM.FIL 15:52:53.631499
+ 2383.00 /clients/client2/~dmtmp/SEED/ -2011-01-31
+ MEDIUM.FIL 15:52:53.631499
- 2340.00 /clients/client0/~dmtmp/SEED/ -2011-01-31
- MEDIUM.FIL 15:38:36.926198
+ 2340.00 /clients/client0/~dmtmp/SEED/ -2011-01-31
+ MEDIUM.FIL 15:38:36.926198
- 2299.00 /clients/client0/~dmtmp/SEED/ -2011-01-31
- LARGE.FIL 15:38:36.930445
+ 2299.00 /clients/client0/~dmtmp/SEED/ -2011-01-31
+ LARGE.FIL 15:38:36.930445
- 2259.00 /clients/client0/~dmtmp/PARADOX/ -2011-01-31
- COURSES.X04 15:38:40.549919
+ 2259.00 /clients/client0/~dmtmp/PARADOX/ -2011-01-31
+ COURSES.X04 15:38:40.549919
- 2221.00 /clients/client9/~dmtmp/PARADOX/ -2011-01-31
- STUDENTS.VAL 15:52:53.298766
+ 2221.00 /clients/client9/~dmtmp/PARADOX/ -2011-01-31
+ STUDENTS.VAL 15:52:53.298766
- 2221.00 /clients/client8/~dmtmp/PARADOX/ -2011-01-31
- COURSES.DB 15:39:11.776780
+ 2221.00 /clients/client8/~dmtmp/PARADOX/ -2011-01-31
+ COURSES.DB 15:39:11.776780
- 2184.00 /clients/client3/~dmtmp/SEED/ -2011-01-31
- MEDIUM.FIL 15:39:10.251764
+ 2184.00 /clients/client3/~dmtmp/SEED/ -2011-01-31
+ MEDIUM.FIL 15:39:10.251764
- 2184.00 /clients/client5/~dmtmp/WORD/ -2011-01-31
- BASEMACH.DOC 15:39:09.336572
+ 2184.00 /clients/client5/~dmtmp/WORD/ -2011-01-31
+ BASEMACH.DOC 15:39:09.336572
+
### Viewing List of Write Performance on each Brick
You can view list of write throughput of files on each brick. If brick
@@ -473,107 +483,107 @@ This command will initiate a dd for the specified count and block size
and measures the corresponding throughput. To view list of write
performance on each brick:
-- View list of write performance on each brick using the following
- command:
+- View list of write performance on each brick using the following
+ command:
- `# gluster volume top write-perf [bs count ] [brick ] [list-cnt ] `
+   `# gluster volume top <VOLNAME> write-perf [bs <size> count <count>] [brick <BRICKNAME>] [list-cnt <cnt>]`
- For example, to view write performance on brick server:/export/ of
- test-volume, 256 block size of count 1, and list count 10:
+ For example, to view write performance on brick server:/export/ of
+ test-volume, 256 block size of count 1, and list count 10:
- `# gluster volume top write-perf bs 256 count 1 brick list-cnt `
+   `# gluster volume top <VOLNAME> write-perf bs 256 count 1 brick <BRICKNAME> list-cnt <cnt>`
- `Brick`: server:/export/dir1
+ `Brick`: server:/export/dir1
- `256 bytes (256 B) copied, Throughput: 2.8 MB/s `
+ `256 bytes (256 B) copied, Throughput: 2.8 MB/s `
- ==========Write throughput file stats========
+ ==========Write throughput file stats========
- write filename Time
- throughput
- (MBps)
+ write filename Time
+ throughput
+ (MBps)
- 1170.00 /clients/client0/~dmtmp/SEED/ -2011-01-31
- SMALL.FIL 15:39:09.171494
+ 1170.00 /clients/client0/~dmtmp/SEED/ -2011-01-31
+ SMALL.FIL 15:39:09.171494
- 1008.00 /clients/client6/~dmtmp/SEED/ -2011-01-31
- LARGE.FIL 15:39:09.73189
+ 1008.00 /clients/client6/~dmtmp/SEED/ -2011-01-31
+ LARGE.FIL 15:39:09.73189
- 949.00 /clients/client0/~dmtmp/SEED/ -2011-01-31
- MEDIUM.FIL 15:38:36.927426
+ 949.00 /clients/client0/~dmtmp/SEED/ -2011-01-31
+ MEDIUM.FIL 15:38:36.927426
- 936.00 /clients/client0/~dmtmp/SEED/ -2011-01-31
- LARGE.FIL 15:38:36.933177
- 897.00 /clients/client5/~dmtmp/SEED/ -2011-01-31
- MEDIUM.FIL 15:39:09.33628
+ 936.00 /clients/client0/~dmtmp/SEED/ -2011-01-31
+ LARGE.FIL 15:38:36.933177
+ 897.00 /clients/client5/~dmtmp/SEED/ -2011-01-31
+ MEDIUM.FIL 15:39:09.33628
- 897.00 /clients/client6/~dmtmp/SEED/ -2011-01-31
- MEDIUM.FIL 15:39:09.27713
+ 897.00 /clients/client6/~dmtmp/SEED/ -2011-01-31
+ MEDIUM.FIL 15:39:09.27713
- 885.00 /clients/client0/~dmtmp/SEED/ -2011-01-31
- SMALL.FIL 15:38:36.924271
+ 885.00 /clients/client0/~dmtmp/SEED/ -2011-01-31
+ SMALL.FIL 15:38:36.924271
- 528.00 /clients/client5/~dmtmp/SEED/ -2011-01-31
- LARGE.FIL 15:39:09.81893
+ 528.00 /clients/client5/~dmtmp/SEED/ -2011-01-31
+ LARGE.FIL 15:39:09.81893
- 516.00 /clients/client6/~dmtmp/ACCESS/ -2011-01-31
- FASTENER.MDB 15:39:01.797317
+ 516.00 /clients/client6/~dmtmp/ACCESS/ -2011-01-31
+ FASTENER.MDB 15:39:01.797317
## Displaying Volume Information
You can display information about a specific volume, or all volumes, as
needed.
-- Display information about a specific volume using the following
- command:
+- Display information about a specific volume using the following
+ command:
- `# gluster volume info VOLNAME`
+ `# gluster volume info VOLNAME`
- For example, to display information about test-volume:
+ For example, to display information about test-volume:
- # gluster volume info test-volume
- Volume Name: test-volume
- Type: Distribute
- Status: Created
- Number of Bricks: 4
- Bricks:
- Brick1: server1:/exp1
- Brick2: server2:/exp2
- Brick3: server3:/exp3
- Brick4: server4:/exp4
+ # gluster volume info test-volume
+ Volume Name: test-volume
+ Type: Distribute
+ Status: Created
+ Number of Bricks: 4
+ Bricks:
+ Brick1: server1:/exp1
+ Brick2: server2:/exp2
+ Brick3: server3:/exp3
+ Brick4: server4:/exp4
-- Display information about all volumes using the following command:
+- Display information about all volumes using the following command:
- `# gluster volume info all`
+ `# gluster volume info all`
- # gluster volume info all
+ # gluster volume info all
- Volume Name: test-volume
- Type: Distribute
- Status: Created
- Number of Bricks: 4
- Bricks:
- Brick1: server1:/exp1
- Brick2: server2:/exp2
- Brick3: server3:/exp3
- Brick4: server4:/exp4
+ Volume Name: test-volume
+ Type: Distribute
+ Status: Created
+ Number of Bricks: 4
+ Bricks:
+ Brick1: server1:/exp1
+ Brick2: server2:/exp2
+ Brick3: server3:/exp3
+ Brick4: server4:/exp4
- Volume Name: mirror
- Type: Distributed-Replicate
- Status: Started
- Number of Bricks: 2 X 2 = 4
- Bricks:
- Brick1: server1:/brick1
- Brick2: server2:/brick2
- Brick3: server3:/brick3
- Brick4: server4:/brick4
+ Volume Name: mirror
+ Type: Distributed-Replicate
+ Status: Started
+ Number of Bricks: 2 X 2 = 4
+ Bricks:
+ Brick1: server1:/brick1
+ Brick2: server2:/brick2
+ Brick3: server3:/brick3
+ Brick4: server4:/brick4
- Volume Name: Vol
- Type: Distribute
- Status: Started
- Number of Bricks: 1
- Bricks:
- Brick: server:/brick6
+ Volume Name: Vol
+ Type: Distribute
+ Status: Started
+ Number of Bricks: 1
+ Bricks:
+ Brick: server:/brick6
## Performing Statedump on a Volume
@@ -584,52 +594,52 @@ and nfs server process of a volume using the statedump command. The
following options can be used to determine what information is to be
dumped:
-- **mem** - Dumps the memory usage and memory pool details of the
- bricks.
+- **mem** - Dumps the memory usage and memory pool details of the
+ bricks.
-- **iobuf** - Dumps iobuf details of the bricks.
+- **iobuf** - Dumps iobuf details of the bricks.
-- **priv** - Dumps private information of loaded translators.
+- **priv** - Dumps private information of loaded translators.
-- **callpool** - Dumps the pending calls of the volume.
+- **callpool** - Dumps the pending calls of the volume.
-- **fd** - Dumps the open fd tables of the volume.
+- **fd** - Dumps the open fd tables of the volume.
-- **inode** - Dumps the inode tables of the volume.
+- **inode** - Dumps the inode tables of the volume.
**To display volume statedump**
-- Display statedump of a volume or NFS server using the following
- command:
+- Display statedump of a volume or NFS server using the following
+ command:
- `# gluster volume statedump [nfs] [all|mem|iobuf|callpool|priv|fd|inode]`
+ `# gluster volume statedump [nfs] [all|mem|iobuf|callpool|priv|fd|inode]`
- For example, to display statedump of test-volume:
+ For example, to display statedump of test-volume:
- # gluster volume statedump test-volume
- Volume statedump successful
+ # gluster volume statedump test-volume
+ Volume statedump successful
- The statedump files are created on the brick servers in the` /tmp`
- directory or in the directory set using `server.statedump-path`
- volume option. The naming convention of the dump file is
- `..dump`.
+     The statedump files are created on the brick servers in the `/tmp`
+ directory or in the directory set using `server.statedump-path`
+ volume option. The naming convention of the dump file is
+ `..dump`.
-- By defult, the output of the statedump is stored at
- ` /tmp/` file on that particular server. Change
- the directory of the statedump file using the following command:
+- By default, the output of the statedump is stored in the
+     `/tmp` directory on that particular server. Change
+ the directory of the statedump file using the following command:
- `# gluster volume set server.statedump-path `
+ `# gluster volume set server.statedump-path `
- For example, to change the location of the statedump file of
- test-volume:
+ For example, to change the location of the statedump file of
+ test-volume:
- # gluster volume set test-volume server.statedump-path /usr/local/var/log/glusterfs/dumps/
- Set volume successful
+ # gluster volume set test-volume server.statedump-path /usr/local/var/log/glusterfs/dumps/
+ Set volume successful
- You can view the changed path of the statedump file using the
- following command:
+ You can view the changed path of the statedump file using the
+ following command:
- `# gluster volume info `
+ `# gluster volume info `
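
As a quick illustration (a sketch, not part of the original text), the commands below trigger a full statedump of test-volume and then look for the generated dump files on a brick server. The directories shown assume the default statedump path; adjust them if `server.statedump-path` has been changed.

    # Trigger a statedump and list the newest dump files on this brick server.
    gluster volume statedump test-volume all
    ls -lt /tmp/*.dump* /var/run/gluster/*.dump* 2>/dev/null | head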
## Displaying Volume Status
@@ -640,252 +650,252 @@ Status information can also be used to monitor and debug the volume
information. You can view status of the volume along with the following
details:
-- **detail** - Displays additional information about the bricks.
+- **detail** - Displays additional information about the bricks.
-- **clients** - Displays the list of clients connected to the volume.
+- **clients** - Displays the list of clients connected to the volume.
-- **mem** - Displays the memory usage and memory pool details of the
- bricks.
+- **mem** - Displays the memory usage and memory pool details of the
+ bricks.
-- **inode** - Displays the inode tables of the volume.
+- **inode** - Displays the inode tables of the volume.
-- **fd** - Displays the open fd (file descriptors) tables of the
- volume.
+- **fd** - Displays the open fd (file descriptors) tables of the
+ volume.
-- **callpool** - Displays the pending calls of the volume.
+- **callpool** - Displays the pending calls of the volume.
**To display volume status**
-- Display information about a specific volume using the following
- command:
+- Display information about a specific volume using the following
+ command:
- `# gluster volume status [all| []] [detail|clients|mem|inode|fd|callpool]`
+ `# gluster volume status [all| []] [detail|clients|mem|inode|fd|callpool]`
- For example, to display information about test-volume:
+ For example, to display information about test-volume:
- # gluster volume status test-volume
- STATUS OF VOLUME: test-volume
- BRICK PORT ONLINE PID
- --------------------------------------------------------
- arch:/export/1 24009 Y 22445
- --------------------------------------------------------
- arch:/export/2 24010 Y 22450
+ # gluster volume status test-volume
+ STATUS OF VOLUME: test-volume
+ BRICK PORT ONLINE PID
+ --------------------------------------------------------
+ arch:/export/1 24009 Y 22445
+ --------------------------------------------------------
+ arch:/export/2 24010 Y 22450
-- Display information about all volumes using the following command:
+- Display information about all volumes using the following command:
- `# gluster volume status all`
+ `# gluster volume status all`
- # gluster volume status all
- STATUS OF VOLUME: volume-test
- BRICK PORT ONLINE PID
- --------------------------------------------------------
- arch:/export/4 24010 Y 22455
+ # gluster volume status all
+ STATUS OF VOLUME: volume-test
+ BRICK PORT ONLINE PID
+ --------------------------------------------------------
+ arch:/export/4 24010 Y 22455
- STATUS OF VOLUME: test-volume
- BRICK PORT ONLINE PID
- --------------------------------------------------------
- arch:/export/1 24009 Y 22445
- --------------------------------------------------------
- arch:/export/2 24010 Y 22450
+ STATUS OF VOLUME: test-volume
+ BRICK PORT ONLINE PID
+ --------------------------------------------------------
+ arch:/export/1 24009 Y 22445
+ --------------------------------------------------------
+ arch:/export/2 24010 Y 22450
-- Display additional information about the bricks using the following
- command:
+- Display additional information about the bricks using the following
+ command:
- `# gluster volume status detail`
+ `# gluster volume status detail`
- For example, to display additional information about the bricks of
- test-volume:
+ For example, to display additional information about the bricks of
+ test-volume:
- # gluster volume status test-volume details
- STATUS OF VOLUME: test-volume
- -------------------------------------------
- Brick : arch:/export/1
- Port : 24009
- Online : Y
- Pid : 16977
- File System : rootfs
- Device : rootfs
- Mount Options : rw
- Disk Space Free : 13.8GB
- Total Disk Space : 46.5GB
- Inode Size : N/A
- Inode Count : N/A
- Free Inodes : N/A
+       # gluster volume status test-volume detail
+ STATUS OF VOLUME: test-volume
+ -------------------------------------------
+ Brick : arch:/export/1
+ Port : 24009
+ Online : Y
+ Pid : 16977
+ File System : rootfs
+ Device : rootfs
+ Mount Options : rw
+ Disk Space Free : 13.8GB
+ Total Disk Space : 46.5GB
+ Inode Size : N/A
+ Inode Count : N/A
+ Free Inodes : N/A
- Number of Bricks: 1
- Bricks:
- Brick: server:/brick6
-- Display the list of clients accessing the volumes using the
- following command:
+- Display the list of clients accessing the volumes using the
+ following command:
- `# gluster volume status test-volume clients`
+ `# gluster volume status test-volume clients`
- For example, to display the list of clients connected to
- test-volume:
+ For example, to display the list of clients connected to
+ test-volume:
- # gluster volume status test-volume clients
- Brick : arch:/export/1
- Clients connected : 2
- Hostname Bytes Read BytesWritten
- -------- --------- ------------
- 127.0.0.1:1013 776 676
- 127.0.0.1:1012 50440 51200
+ # gluster volume status test-volume clients
+ Brick : arch:/export/1
+ Clients connected : 2
+ Hostname Bytes Read BytesWritten
+ -------- --------- ------------
+ 127.0.0.1:1013 776 676
+ 127.0.0.1:1012 50440 51200
-- Display the memory usage and memory pool details of the bricks using
- the following command:
+- Display the memory usage and memory pool details of the bricks using
+ the following command:
- `# gluster volume status test-volume mem`
+ `# gluster volume status test-volume mem`
- For example, to display the memory usage and memory pool details of
- the bricks of test-volume:
+ For example, to display the memory usage and memory pool details of
+ the bricks of test-volume:
- Memory status for volume : test-volume
- ----------------------------------------------
- Brick : arch:/export/1
- Mallinfo
- --------
- Arena : 434176
- Ordblks : 2
- Smblks : 0
- Hblks : 12
- Hblkhd : 40861696
- Usmblks : 0
- Fsmblks : 0
- Uordblks : 332416
- Fordblks : 101760
- Keepcost : 100400
+ Memory status for volume : test-volume
+ ----------------------------------------------
+ Brick : arch:/export/1
+ Mallinfo
+ --------
+ Arena : 434176
+ Ordblks : 2
+ Smblks : 0
+ Hblks : 12
+ Hblkhd : 40861696
+ Usmblks : 0
+ Fsmblks : 0
+ Uordblks : 332416
+ Fordblks : 101760
+ Keepcost : 100400
- Mempool Stats
- -------------
- Name HotCount ColdCount PaddedSizeof AllocCount MaxAlloc
- ---- -------- --------- ------------ ---------- --------
- test-volume-server:fd_t 0 16384 92 57 5
- test-volume-server:dentry_t 59 965 84 59 59
- test-volume-server:inode_t 60 964 148 60 60
- test-volume-server:rpcsvc_request_t 0 525 6372 351 2
- glusterfs:struct saved_frame 0 4096 124 2 2
- glusterfs:struct rpc_req 0 4096 2236 2 2
- glusterfs:rpcsvc_request_t 1 524 6372 2 1
- glusterfs:call_stub_t 0 1024 1220 288 1
- glusterfs:call_stack_t 0 8192 2084 290 2
- glusterfs:call_frame_t 0 16384 172 1728 6
+ Mempool Stats
+ -------------
+ Name HotCount ColdCount PaddedSizeof AllocCount MaxAlloc
+ ---- -------- --------- ------------ ---------- --------
+ test-volume-server:fd_t 0 16384 92 57 5
+ test-volume-server:dentry_t 59 965 84 59 59
+ test-volume-server:inode_t 60 964 148 60 60
+ test-volume-server:rpcsvc_request_t 0 525 6372 351 2
+ glusterfs:struct saved_frame 0 4096 124 2 2
+ glusterfs:struct rpc_req 0 4096 2236 2 2
+ glusterfs:rpcsvc_request_t 1 524 6372 2 1
+ glusterfs:call_stub_t 0 1024 1220 288 1
+ glusterfs:call_stack_t 0 8192 2084 290 2
+ glusterfs:call_frame_t 0 16384 172 1728 6
-- Display the inode tables of the volume using the following command:
+- Display the inode tables of the volume using the following command:
- `# gluster volume status inode`
+ `# gluster volume status inode`
- For example, to display the inode tables of the test-volume:
+ For example, to display the inode tables of the test-volume:
- # gluster volume status test-volume inode
- inode tables for volume test-volume
- ----------------------------------------------
- Brick : arch:/export/1
- Active inodes:
- GFID Lookups Ref IA type
- ---- ------- --- -------
- 6f3fe173-e07a-4209-abb6-484091d75499 1 9 2
- 370d35d7-657e-44dc-bac4-d6dd800ec3d3 1 1 2
+ # gluster volume status test-volume inode
+ inode tables for volume test-volume
+ ----------------------------------------------
+ Brick : arch:/export/1
+ Active inodes:
+ GFID Lookups Ref IA type
+ ---- ------- --- -------
+ 6f3fe173-e07a-4209-abb6-484091d75499 1 9 2
+ 370d35d7-657e-44dc-bac4-d6dd800ec3d3 1 1 2
- LRU inodes:
- GFID Lookups Ref IA type
- ---- ------- --- -------
- 80f98abe-cdcf-4c1d-b917-ae564cf55763 1 0 1
- 3a58973d-d549-4ea6-9977-9aa218f233de 1 0 1
- 2ce0197d-87a9-451b-9094-9baa38121155 1 0 2
+ LRU inodes:
+ GFID Lookups Ref IA type
+ ---- ------- --- -------
+ 80f98abe-cdcf-4c1d-b917-ae564cf55763 1 0 1
+ 3a58973d-d549-4ea6-9977-9aa218f233de 1 0 1
+ 2ce0197d-87a9-451b-9094-9baa38121155 1 0 2
-- Display the open fd tables of the volume using the following
- command:
+- Display the open fd tables of the volume using the following
+ command:
- `# gluster volume status fd`
+ `# gluster volume status fd`
- For example, to display the open fd tables of the test-volume:
+ For example, to display the open fd tables of the test-volume:
- # gluster volume status test-volume fd
+ # gluster volume status test-volume fd
- FD tables for volume test-volume
- ----------------------------------------------
- Brick : arch:/export/1
- Connection 1:
- RefCount = 0 MaxFDs = 128 FirstFree = 4
- FD Entry PID RefCount Flags
- -------- --- -------- -----
- 0 26311 1 2
- 1 26310 3 2
- 2 26310 1 2
- 3 26311 3 2
+ FD tables for volume test-volume
+ ----------------------------------------------
+ Brick : arch:/export/1
+ Connection 1:
+ RefCount = 0 MaxFDs = 128 FirstFree = 4
+ FD Entry PID RefCount Flags
+ -------- --- -------- -----
+ 0 26311 1 2
+ 1 26310 3 2
+ 2 26310 1 2
+ 3 26311 3 2
- Connection 2:
- RefCount = 0 MaxFDs = 128 FirstFree = 0
- No open fds
+ Connection 2:
+ RefCount = 0 MaxFDs = 128 FirstFree = 0
+ No open fds
- Connection 3:
- RefCount = 0 MaxFDs = 128 FirstFree = 0
- No open fds
+ Connection 3:
+ RefCount = 0 MaxFDs = 128 FirstFree = 0
+ No open fds
-- Display the pending calls of the volume using the following command:
+- Display the pending calls of the volume using the following command:
- `# gluster volume status callpool`
+ `# gluster volume status callpool`
- Each call has a call stack containing call frames.
+ Each call has a call stack containing call frames.
- For example, to display the pending calls of test-volume:
+ For example, to display the pending calls of test-volume:
- # gluster volume status test-volume
+       # gluster volume status test-volume callpool
- Pending calls for volume test-volume
- ----------------------------------------------
- Brick : arch:/export/1
- Pending calls: 2
- Call Stack1
- UID : 0
- GID : 0
- PID : 26338
- Unique : 192138
- Frames : 7
- Frame 1
- Ref Count = 1
- Translator = test-volume-server
- Completed = No
- Frame 2
- Ref Count = 0
- Translator = test-volume-posix
- Completed = No
- Parent = test-volume-access-control
- Wind From = default_fsync
- Wind To = FIRST_CHILD(this)->fops->fsync
- Frame 3
- Ref Count = 1
- Translator = test-volume-access-control
- Completed = No
- Parent = repl-locks
- Wind From = default_fsync
- Wind To = FIRST_CHILD(this)->fops->fsync
- Frame 4
- Ref Count = 1
- Translator = test-volume-locks
- Completed = No
- Parent = test-volume-io-threads
- Wind From = iot_fsync_wrapper
- Wind To = FIRST_CHILD (this)->fops->fsync
- Frame 5
- Ref Count = 1
- Translator = test-volume-io-threads
- Completed = No
- Parent = test-volume-marker
- Wind From = default_fsync
- Wind To = FIRST_CHILD(this)->fops->fsync
- Frame 6
- Ref Count = 1
- Translator = test-volume-marker
- Completed = No
- Parent = /export/1
- Wind From = io_stats_fsync
- Wind To = FIRST_CHILD(this)->fops->fsync
- Frame 7
- Ref Count = 1
- Translator = /export/1
- Completed = No
- Parent = test-volume-server
- Wind From = server_fsync_resume
- Wind To = bound_xl->fops->fsync
+ Pending calls for volume test-volume
+ ----------------------------------------------
+ Brick : arch:/export/1
+ Pending calls: 2
+ Call Stack1
+ UID : 0
+ GID : 0
+ PID : 26338
+ Unique : 192138
+ Frames : 7
+ Frame 1
+ Ref Count = 1
+ Translator = test-volume-server
+ Completed = No
+ Frame 2
+ Ref Count = 0
+ Translator = test-volume-posix
+ Completed = No
+ Parent = test-volume-access-control
+ Wind From = default_fsync
+ Wind To = FIRST_CHILD(this)->fops->fsync
+ Frame 3
+ Ref Count = 1
+ Translator = test-volume-access-control
+ Completed = No
+ Parent = repl-locks
+ Wind From = default_fsync
+ Wind To = FIRST_CHILD(this)->fops->fsync
+ Frame 4
+ Ref Count = 1
+ Translator = test-volume-locks
+ Completed = No
+ Parent = test-volume-io-threads
+ Wind From = iot_fsync_wrapper
+ Wind To = FIRST_CHILD (this)->fops->fsync
+ Frame 5
+ Ref Count = 1
+ Translator = test-volume-io-threads
+ Completed = No
+ Parent = test-volume-marker
+ Wind From = default_fsync
+ Wind To = FIRST_CHILD(this)->fops->fsync
+ Frame 6
+ Ref Count = 1
+ Translator = test-volume-marker
+ Completed = No
+ Parent = /export/1
+ Wind From = io_stats_fsync
+ Wind To = FIRST_CHILD(this)->fops->fsync
+ Frame 7
+ Ref Count = 1
+ Translator = /export/1
+ Completed = No
+ Parent = test-volume-server
+ Wind From = server_fsync_resume
+ Wind To = bound_xl->fops->fsync
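
The status output also lends itself to simple health checks. The sketch below (an illustration, not from the original document) flags any brick of test-volume whose Online column is not "Y"; the awk field positions are an assumption and should be verified against the status output of your release.

    # Report bricks of test-volume that are not online.
    # The Online flag is assumed to be the second-to-last column of each Brick line.
    gluster volume status test-volume | awk '/^Brick / && $(NF-1) != "Y" {print "OFFLINE:", $1, $2}'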
diff --git a/docs/Administrator-Guide/NFS-Ganesha-GlusterFS-Integration.md b/docs/Administrator-Guide/NFS-Ganesha-GlusterFS-Integration.md
index 94f0e80..3861aba 100644
--- a/docs/Administrator-Guide/NFS-Ganesha-GlusterFS-Integration.md
+++ b/docs/Administrator-Guide/NFS-Ganesha-GlusterFS-Integration.md
@@ -2,32 +2,33 @@
NFS-Ganesha is a user-space file server for the NFS protocol with support for NFSv3, v4, v4.1, pNFS. It provides a FUSE-compatible File System Abstraction Layer(FSAL) to allow the file-system developers to plug in their storage mechanism and access it from any NFS client. NFS-Ganesha can access the FUSE filesystems directly through its FSAL without copying any data to or from the kernel, thus potentially improving response times.
-## Installing nfs-ganesha
+## Installing nfs-ganesha
#### Gluster RPMs (>= 3.10)
-> glusterfs-server
-
-> glusterfs-api
+> glusterfs-server
+> glusterfs-api
> glusterfs-ganesha
#### Ganesha RPMs (>= 2.5)
-> nfs-ganesha
+> nfs-ganesha
> nfs-ganesha-gluster
## Start NFS-Ganesha manually
- To start NFS-Ganesha manually, use the command:
- - *service nfs-ganesha start*
+ `service nfs-ganesha start`
+
```sh
where:
/var/log/ganesha.log is the default log file for the ganesha process.
/etc/ganesha/ganesha.conf is the default configuration file
NIV_EVENT is the default log level.
```
-- If the user wants to run ganesha in a preferred mode, execute the following command :
- - *#ganesha.nfsd -f \ -L \ -N \*
+
+- If the user wants to run ganesha in a preferred mode, execute the following command:
+ `ganesha.nfsd -f -L -N `
```sh
For example:
@@ -37,6 +38,7 @@ nfs-ganesha.log is the log file for the ganesha.nfsd process.
nfs-ganesha.conf is the configuration file
NIV_DEBUG is the log level.
```
+
- By default, the export list for the server will be Null
```sh
@@ -52,12 +54,14 @@ NFS_Core_Param {
#Enable_RQUOTA = false;
}
```
+
## Step by step procedures to exporting GlusterFS volume via NFS-Ganesha
#### step 1 :
-To export any GlusterFS volume or directory inside a volume, create the EXPORT block for each of those entries in an export configuration file. The following parameters are required to export any entry.
-- *#cat export.conf*
+To export any GlusterFS volume or directory inside a volume, create the EXPORT block for each of those entries in an export configuration file. The following parameters are required to export any entry.
+
+- `cat export.conf`
```sh
EXPORT{
@@ -83,7 +87,8 @@ EXPORT{
#### step 2 :
Now include the export configuration file in the ganesha configuration file (by default). This can be done by adding the line below at the end of the file
- - %include “\”
+
+- `%include “”`
```sh
Note :
@@ -95,33 +100,40 @@ Also, it will add the above entry to ganesha.conf
```
#### step 3 :
+
Turn on features.cache-invalidation for that volume
-- gluster volume set \ features.cache-invalidation on
+
+- `gluster volume set features.cache-invalidation on`
#### step 4 :
+
dbus commands are used to export/unexport volume
+
- export
- - *#dbus-send --system --print-reply --dest=org.ganesha.nfsd /org/ganesha/nfsd/ExportMgr org.ganesha.nfsd.exportmgr.AddExport string:/exports/export..conf string:"EXPORT(Path=/\)"*
+
+ - `dbus-send --system --print-reply --dest=org.ganesha.nfsd /org/ganesha/nfsd/ExportMgr org.ganesha.nfsd.exportmgr.AddExport string:/exports/export..conf string:"EXPORT(Path=/)"`
- unexport
- - *#dbus-send --system --dest=org.ganesha.nfsd /org/ganesha/nfsd/ExportMgr org.ganesha.nfsd.exportmgr.RemoveExport uint16:\*
+ - `dbus-send --system --dest=org.ganesha.nfsd /org/ganesha/nfsd/ExportMgr org.ganesha.nfsd.exportmgr.RemoveExport uint16:`
```sh
Note :
Step 4 can be performed via following script
#/usr/libexec/ganesha/dbus-send.sh [on|off]
```
+
The above scripts (mentioned in step 3 and step 4) are available in the glusterfs 3.10 rpms.
You can download it from [here](https://github.com/gluster/glusterfs/blob/release-3.10/extras/ganesha/scripts/)
#### step 5 :
- - To check if the volume is exported, run
- - *#showmount -e localhost*
- - Or else use the following dbus command
- - *#dbus-send --type=method_call --print-reply --system --dest=org.ganesha.nfsd /org/ganesha/nfsd/ExportMgr org.ganesha.nfsd.exportmgr.ShowExports*
- - To see clients
- - *#dbus-send --type=method_call --print-reply --system --dest=org.ganesha.nfsd /org/ganesha/nfsd/ClientMgr org.ganesha.nfsd.clientmgr.ShowClients*
+
+- To check if the volume is exported, run
+ - `showmount -e localhost`
+- Or else use the following dbus command
+ - `dbus-send --type=method_call --print-reply --system --dest=org.ganesha.nfsd /org/ganesha/nfsd/ExportMgr org.ganesha.nfsd.exportmgr.ShowExports`
+- To see clients
+ - `dbus-send --type=method_call --print-reply --system --dest=org.ganesha.nfsd /org/ganesha/nfsd/ClientMgr org.ganesha.nfsd.clientmgr.ShowClients`
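
Putting step 4 and the checks of step 5 together, a rough end-to-end sketch might look like the following. The export configuration file path and the volume name `demo` are placeholders for whatever was created in step 1, not values mandated by NFS-Ganesha.

```sh
# Export the volume "demo" via dbus and confirm it is visible to NFS clients.
# /etc/ganesha/exports/export.demo.conf is a placeholder for the file from step 1.
dbus-send --system --print-reply --dest=org.ganesha.nfsd \
    /org/ganesha/nfsd/ExportMgr org.ganesha.nfsd.exportmgr.AddExport \
    string:/etc/ganesha/exports/export.demo.conf string:"EXPORT(Path=/demo)"

# Verify the export list.
showmount -e localhost
```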
## Using Highly Available Active-Active NFS-Ganesha And GlusterFS cli
@@ -132,69 +144,72 @@ The cluster is maintained using Pacemaker and Corosync. Pacemaker acts as a reso
Data coherency across the multi-head NFS-Ganesha servers in the cluster is achieved using the UPCALL infrastructure. UPCALL infrastructure is a generic and extensible framework that sends notifications to the respective glusterfs clients (in this case NFS-Ganesha server) in case of any changes detected in the backend filesystem.
The Highly Available cluster is configured in the following three stages:
+
### Creating the ganesha-ha.conf file
+
The ganesha-ha.conf.example is created in the following location /etc/ganesha when Gluster Storage is installed. Rename the file to ganesha-ha.conf and make the changes as suggested in the following example:
sample ganesha-ha.conf file:
-> \# Name of the HA cluster created. must be unique within the subnet
-
-> HA_NAME="ganesha-ha-360"
-
-> \# The subset of nodes of the Gluster Trusted Pool that form the ganesha HA cluster.
-
-> \# Hostname is specified.
-
-> HA_CLUSTER_NODES="server1,server2,..."
-
-> \#HA_CLUSTER_NODES="server1.lab.redhat.com,server2.lab.redhat.com,..."
-
-> \# Virtual IPs for each of the nodes specified above.
-
-> VIP_server1="10.0.2.1"
-
+> \# Name of the HA cluster created. must be unique within the subnet
+> HA_NAME="ganesha-ha-360"
+> \# The subset of nodes of the Gluster Trusted Pool that form the ganesha HA cluster.
+> \# Hostname is specified.
+> HA_CLUSTER_NODES="server1,server2,..."
+> \#HA_CLUSTER_NODES="server1.lab.redhat.com,server2.lab.redhat.com,..."
+> \# Virtual IPs for each of the nodes specified above.
+> VIP_server1="10.0.2.1"
> VIP_server2="10.0.2.2"
### Configuring NFS-Ganesha using gluster CLI
+
The HA cluster can be set up or torn down using gluster CLI. Also, it can export and unexport specific volumes. For more information, see section Configuring NFS-Ganesha using gluster CLI.
### Modifying the HA cluster using the `ganesha-ha.sh` script
+
Post the cluster creation any further modification can be done using the `ganesha-ha.sh` script. For more information, see the section Modifying the HA cluster using the `ganesha-ha.sh` script.
## Step-by-step guide
+
### Configuring NFS-Ganesha using Gluster CLI
+
#### Pre-requisites to run NFS-Ganesha
+
Ensure that the following pre-requisites are taken into consideration before you run NFS-Ganesha in your environment:
- * A Gluster Storage volume must be available for export and NFS-Ganesha rpms are installed on all the nodes.
- * IPv6 must be enabled on the host interface which is used by the NFS-Ganesha daemon. To enable IPv6 support, perform the following steps:
- - Comment or remove the line options ipv6 disable=1 in the /etc/modprobe.d/ipv6.conf file.
- - Reboot the system.
+- A Gluster Storage volume must be available for export and the NFS-Ganesha rpms must be installed on all the nodes.
+- IPv6 must be enabled on the host interface which is used by the NFS-Ganesha daemon. To enable IPv6 support, perform the following steps:
-* Ensure that all the nodes in the cluster are DNS resolvable. For example, you can populate the /etc/hosts with the details of all the nodes in the cluster.
-* Disable and stop NetworkManager service.
-* Enable and start network service on all machines.
-* Create and mount a gluster shared volume.
- * `gluster volume set all cluster.enable-shared-storage enable`
-* Install Pacemaker and Corosync on all machines.
-* Set the cluster auth password on all the machines.
-* Passwordless ssh needs to be enabled on all the HA nodes. Follow these steps,
+  - Comment out or remove the line `options ipv6 disable=1` in the /etc/modprobe.d/ipv6.conf file.
+ - Reboot the system.
- - On one (primary) node in the cluster, run:
- - ssh-keygen -f /var/lib/glusterd/nfs/secret.pem
- - Deploy the pubkey ~root/.ssh/authorized keys on _all_ nodes, run:
- - ssh-copy-id -i /var/lib/glusterd/nfs/secret.pem.pub root@$node
- - Copy the keys to _all_ nodes in the cluster, run:
- - scp /var/lib/glusterd/nfs/secret.* $node:/var/lib/glusterd/nfs/
-* Create a directory named "nfs-ganesha" in shared storage path and create ganesha.conf & ganesha-ha.conf in it (from glusterfs 3.9 onwards)
+- Ensure that all the nodes in the cluster are DNS resolvable. For example, you can populate the /etc/hosts with the details of all the nodes in the cluster.
+- Disable and stop NetworkManager service.
+- Enable and start network service on all machines.
+- Create and mount a gluster shared volume.
+ - `gluster volume set all cluster.enable-shared-storage enable`
+- Install Pacemaker and Corosync on all machines.
+- Set the cluster auth password on all the machines.
+- Passwordless ssh needs to be enabled on all the HA nodes. Follow these steps (see the sketch after this list):
+
+ - On one (primary) node in the cluster, run:
+ - `ssh-keygen -f /var/lib/glusterd/nfs/secret.pem`
+  - Deploy the pubkey to ~root/.ssh/authorized_keys on _all_ nodes, run:
+ - `ssh-copy-id -i /var/lib/glusterd/nfs/secret.pem.pub root@$node`
+ - Copy the keys to _all_ nodes in the cluster, run:
+    - `scp /var/lib/glusterd/nfs/secret.* $node:/var/lib/glusterd/nfs/`
+
+- Create a directory named "nfs-ganesha" in the shared storage path and create ganesha.conf & ganesha-ha.conf in it (from glusterfs 3.9 onwards).
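
A compact sketch of the passwordless-ssh steps above, run from the primary node; the `NODES` list is a placeholder for the other members of the HA cluster, and the key is generated without a passphrase here.

```sh
# Generate the key once on the primary node, then push it to every other HA node.
NODES="server2 server3"   # placeholder hostnames
ssh-keygen -f /var/lib/glusterd/nfs/secret.pem -t rsa -N ''
for node in $NODES; do
    ssh-copy-id -i /var/lib/glusterd/nfs/secret.pem.pub root@"$node"
    scp /var/lib/glusterd/nfs/secret.* "$node":/var/lib/glusterd/nfs/
done
```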
#### Configuring the HA Cluster
+
To set up the HA cluster, enable NFS-Ganesha by executing the following command:
- #gluster nfs-ganesha enable
+ gluster nfs-ganesha enable
To tear down the HA cluster, execute the following command:
- #gluster nfs-ganesha disable
+ gluster nfs-ganesha disable
+
```sh
Note :
Enable command performs the following
@@ -209,28 +224,32 @@ Also if gluster nfs-ganesha [enable/disable] fails of please check following log
```
#### Exporting Volumes through NFS-Ganesha using cli
+
To export a Red Hat Gluster Storage volume, execute the following command:
- #gluster volume set ganesha.enable on
+ gluster volume set ganesha.enable on
To unexport a Red Hat Gluster Storage volume, execute the following command:
- #gluster volume set ganesha.enable off
+ gluster volume set ganesha.enable off
This command unexports the Red Hat Gluster Storage volume without affecting other exports.
To verify the status of the volume set options, follow the guidelines mentioned below:
-* Check if NFS-Ganesha is started by executing the following command:
- - `ps aux | grep ganesha.nfsd`
-* Check if the volume is exported.
- - `showmount -e localhost`
+- Check if NFS-Ganesha is started by executing the following command:
+ - `ps aux | grep ganesha.nfsd`
+- Check if the volume is exported.
+ - `showmount -e localhost`
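
Putting the export command and the two checks above together, a short sketch (the volume name `demo-volume` is a placeholder):

```sh
# Export the volume through NFS-Ganesha and confirm the daemon and export are up.
gluster volume set demo-volume ganesha.enable on
ps aux | grep '[g]anesha.nfsd'
showmount -e localhost
```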
The logs of the ganesha.nfsd daemon are written to /var/log/ganesha.log. Check the log file if you notice any unexpected behavior.
### Modifying the HA cluster using the ganesha-ha.sh script
+
To modify the existing HA cluster and to change the default values of the exports use the ganesha-ha.sh script located at /usr/libexec/ganesha/.
+
#### Adding a node to the cluster
+
Before adding a node to the cluster, ensure all the prerequisites mentioned in the section `Pre-requisites to run NFS-Ganesha` are met. To add a node to the cluster, execute the following command on any of the nodes in the existing NFS-Ganesha cluster:
#./ganesha-ha.sh --add
@@ -238,7 +257,9 @@ Before adding a node to the cluster, ensure all the prerequisites mentioned in s
HA_CONF_DIR: The directory path containing the ganesha-ha.conf file.
HOSTNAME: Hostname of the new node to be added
NODE-VIP: Virtual IP of the new node to be added.
+
#### Deleting a node in the cluster
+
To delete a node from the cluster, execute the following command on any of the nodes in the existing NFS-Ganesha cluster:
#./ganesha-ha.sh --delete
@@ -246,22 +267,25 @@ To delete a node from the cluster, execute the following command on any of the n
where,
HA_CONF_DIR: The directory path containing the ganesha-ha.conf file.
    HOSTNAME: Hostname of the node to be deleted
+
#### Modifying the default export configuration
+
To modify the default export configurations perform the following steps on any of the nodes in the existing ganesha cluster:
-* Edit/add the required fields in the corresponding export file located at `/etc/ganesha/exports`.
+- Edit/add the required fields in the corresponding export file located at `/etc/ganesha/exports`.
-* Execute the following command:
+- Execute the following command:
- #./ganesha-ha.sh --refresh-config
+ #./ganesha-ha.sh --refresh-config
- where,
- HA_CONF_DIR: The directory path containing the ganesha-ha.conf file.
- volname: The name of the volume whose export configuration has to be changed.
+ where,
+ HA_CONF_DIR: The directory path containing the ganesha-ha.conf file.
+ volname: The name of the volume whose export configuration has to be changed.
- Note:
- The export ID must not be changed.
-
+ Note:
+ The export ID must not be changed.
+
+
### Configure ganesha ha cluster outside of gluster nodes
@@ -269,39 +293,43 @@ Currently, ganesha HA cluster creating tightly integrated with glusterd. So here
Exporting/unexporting should be performed without using the glusterd cli (follow the manual steps; before performing step 4, replace localhost with the required hostname/IP in the "hostname=localhost;" line of the export configuration file).
## Configuring Gluster volume for pNFS
+
The Parallel Network File System (pNFS) is part of the NFS v4.1 protocol that allows computing clients to access storage devices directly and in parallel. The pNFS cluster consists of MDS (Meta-Data-Server) and DS (Data-Server). The client sends all the read/write requests directly to the DS and all other operations are handled by the MDS.
### Step by step guide
- - Turn on `feature.cache-invalidation` for the volume.
- - gluster v set \ features.cache-invalidation on
+- Turn on `feature.cache-invalidation` for the volume.
+
+ - `gluster v set features.cache-invalidation on`
+
+- Select one of the nodes in the cluster as MDS and configure it adding the following block to ganesha configuration file
-- Select one of the nodes in the cluster as MDS and configure it adding the following block to ganesha configuration file
```sh
GLUSTER
{
PNFS_MDS = true;
}
```
-- Manually start NFS-Ganesha in every node in the cluster.
+
+- Manually start NFS-Ganesha in every node in the cluster.
- Check whether the volume is exported via nfs-ganesha in all the nodes.
- - *#showmount -e localhost*
-- Mount the volume using NFS version 4.1 protocol with the ip of MDS
- - *#mount -t nfs4 -o minorversion=1 \:/\ \*
+ - `showmount -e localhost`
+
+- Mount the volume using the NFS version 4.1 protocol with the IP of the MDS (see the sketch after this list)
+ - `mount -t nfs4 -o minorversion=1 : `
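
A concrete form of the mount step, with placeholder values for the MDS address, volume name and mount point:

```sh
# Mount the volume over NFSv4.1 (pNFS) using the MDS as the server.
# 192.168.1.10, demo-volume and /mnt/pnfs are placeholders.
mount -t nfs4 -o minorversion=1 192.168.1.10:/demo-volume /mnt/pnfs
```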
### Points to be Noted
- - The current architecture supports only a single MDS and multiple DS. The server with which client mounts will act as MDS and all servers including MDS can act as DS.
+- The current architecture supports only a single MDS and multiple DS. The server with which client mounts will act as MDS and all servers including MDS can act as DS.
- - Currently, HA is not supported for pNFS (more specifically MDS). Although it is configurable, consistency is guaranteed across the cluster.
+- Currently, HA is not supported for pNFS (more specifically MDS). Although it is configurable, consistency is guaranteed across the cluster.
- - If any of the DS goes down, then MDS will handle those I/O's.
+- If any of the DS goes down, then MDS will handle those I/O's.
- - Hereafter, all the subsequent NFS clients need to use the same server for mounting that volume via pNFS. i.e more than one MDS for a volume is not preferred
+- Hereafter, all the subsequent NFS clients need to use the same server for mounting that volume via pNFS, i.e., more than one MDS per volume is not preferred
- - pNFS support is only tested with distributed, replicated, or distribute-replicate volumes
-
- - It is tested and verified with RHEL 6.5 , fedora 20, fedora 21 nfs clients. It is always better to use latest nfs-clients
+- pNFS support is only tested with distributed, replicated, or distribute-replicate volumes
+- It is tested and verified with RHEL 6.5, Fedora 20 and Fedora 21 NFS clients. It is always better to use the latest NFS clients
diff --git a/docs/Administrator-Guide/Network-Configurations-Techniques.md b/docs/Administrator-Guide/Network-Configurations-Techniques.md
index f4beb65..cef47fe 100644
--- a/docs/Administrator-Guide/Network-Configurations-Techniques.md
+++ b/docs/Administrator-Guide/Network-Configurations-Techniques.md
@@ -1,23 +1,24 @@
# Network Configurations Techniques
+
#### Bonding best practices
Bonded network interfaces incorporate multiple physical interfaces into a single logical bonded interface, with a single IP address. An N-way bonded interface can survive loss of N-1 physical interfaces, and performance can be improved in some cases.
###### When to bond?
-- Need high availability for network link
-- Workload: sequential access to large files (most time spent reading/writing)
-- Network throughput limit of client/server \<\< storage throughput limit
- - 1 GbE (almost always)
- - 10-Gbps links or faster -- for writes, replication doubles the load on the network and replicas are usually on different peers to which the client can transmit in parallel.
-- LIMITATION: Bonding mode 6 doesn't improve throughput if network peers are not on the same VLAN.
+- Need high availability for network link
+- Workload: sequential access to large files (most time spent reading/writing)
+- Network throughput limit of client/server \<\< storage throughput limit
+ - 1 GbE (almost always)
+ - 10-Gbps links or faster -- for writes, replication doubles the load on the network and replicas are usually on different peers to which the client can transmit in parallel.
+- LIMITATION: Bonding mode 6 doesn't improve throughput if network peers are not on the same VLAN.
###### How to configure
-- [Bonding-howto](http://www.linuxquestions.org/linux/answers/Networking/Linux_bonding_howto_0)
-- Best bonding mode for Gluster client is mode 6 (balance-alb), this allows client to transmit writes in parallel on separate NICs much of the time. A peak throughput of 750 MB/s on writes from a single client was observed with bonding mode 6 on 2 10-GbE NICs with jumbo frames. That's 1.5 GB/s of network traffic.
-- Another way to balance both transmit and receive traffic is bonding mode 4 (802.3ad) but this requires switch configuration (trunking commands)
-- Still another way to load balance is bonding mode 2 (balance-xor) with option "xmit\_hash\_policy=layer3+4". The bonding modes 6 and 2 will not improve single-connection throughput, but improve aggregate throughput across all connections.
+- [Bonding-howto](http://www.linuxquestions.org/linux/answers/Networking/Linux_bonding_howto_0)
+- The best bonding mode for a Gluster client is mode 6 (balance-alb); this allows the client to transmit writes in parallel on separate NICs much of the time. A peak throughput of 750 MB/s on writes from a single client was observed with bonding mode 6 on 2 10-GbE NICs with jumbo frames. That's 1.5 GB/s of network traffic.
+- Another way to balance both transmit and receive traffic is bonding mode 4 (802.3ad) but this requires switch configuration (trunking commands)
+- Still another way to load balance is bonding mode 2 (balance-xor) with option "xmit_hash_policy=layer3+4". The bonding modes 6 and 2 will not improve single-connection throughput, but improve aggregate throughput across all connections.
##### Jumbo frames
@@ -25,18 +26,18 @@ Jumbo frames are Ethernet (or Infiniband) frames with size greater than the defa
###### When to configure?
-- Any network faster than 1-GbE
-- Workload is sequential large-file reads/writes
-- LIMITATION: Requires all network switches in VLAN must be configured to handle jumbo frames, do not configure otherwise.
+- Any network faster than 1-GbE
+- Workload is sequential large-file reads/writes
+- LIMITATION: All network switches in the VLAN must be configured to handle jumbo frames; do not configure jumbo frames otherwise.
###### How to configure?
-- Edit network interface file at /etc/sysconfig/network-scripts/ifcfg-your-interface
-- Ethernet (on ixgbe driver): add "MTU=9000" (MTU means "maximum transfer unit") record to network interface file
-- Infiniband (on mlx4 driver): add "CONNECTED\_MODE=yes" and "MTU=65520" records to network interface file
-- ifdown your-interface; ifup your-interface
-- Test with "ping -s 16384 other-host-on-VLAN"
-- Switch requires max frame size larger than MTU because of protocol headers, usually 9216 bytes
+- Edit the network interface file at /etc/sysconfig/network-scripts/ifcfg-your-interface (a sketch follows this list)
+- Ethernet (on ixgbe driver): add "MTU=9000" (MTU means "maximum transfer unit") record to network interface file
+- Infiniband (on mlx4 driver): add "CONNECTED_MODE=yes" and "MTU=65520" records to network interface file
+- ifdown your-interface; ifup your-interface
+- Test with "ping -s 16384 other-host-on-VLAN"
+- The switch requires a max frame size larger than the MTU because of protocol headers, usually 9216 bytes
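
For illustration, here is a sketch of an ifcfg file with jumbo frames enabled; the interface name and addressing are placeholders, and the MTU line is the only part this section actually prescribes.

```sh
# /etc/sysconfig/network-scripts/ifcfg-eth2  (placeholder interface and addresses)
DEVICE=eth2
BOOTPROTO=static
IPADDR=192.168.10.11
NETMASK=255.255.255.0
MTU=9000
ONBOOT=yes
```

Bounce the interface afterwards (`ifdown eth2; ifup eth2`) and verify with the ping test above.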
##### Configuring a backend network for storage
@@ -44,10 +45,10 @@ This method lets you add network capacity for multi-protocol sites by segregatin
###### When to configure?
-- For non-Gluster services such as NFS, Swift (REST), CIFS being provided on Gluster servers. It will not help Gluster clients (external nodes with Gluster mountpoints on them).
-- Network port is over-utilized.
+- For non-Gluster services such as NFS, Swift (REST), CIFS being provided on Gluster servers. It will not help Gluster clients (external nodes with Gluster mountpoints on them).
+- Network port is over-utilized.
###### How to configure?
-- Most network cards have multiple ports on them -- make port 1 the non-Gluster port and port 2 the Gluster port.
-- Separate Gluster ports onto a separate VLAN from non-Gluster ports, to simplify configuration.
+- Most network cards have multiple ports on them -- make port 1 the non-Gluster port and port 2 the Gluster port.
+- Separate Gluster ports onto a separate VLAN from non-Gluster ports, to simplify configuration.
diff --git a/docs/Administrator-Guide/Object-Storage.md b/docs/Administrator-Guide/Object-Storage.md
index 71edab6..50c335a 100644
--- a/docs/Administrator-Guide/Object-Storage.md
+++ b/docs/Administrator-Guide/Object-Storage.md
@@ -6,8 +6,7 @@ API to be accessed as files over filesystem interface and vice versa i.e files
created over filesystem interface (NFS/FUSE/native) can be accessed as objects
over Swift's RESTful API.
-SwiftOnFile project was formerly known as `gluster-swift` and also as `UFO
-(Unified File and Object)` before that. More information about SwiftOnFile can
+SwiftOnFile project was formerly known as `gluster-swift` and also as `UFO (Unified File and Object)` before that. More information about SwiftOnFile can
be found [here](https://github.com/swiftonfile/swiftonfile/blob/master/doc/markdown/quick_start_guide.md).
There are differences in working of gluster-swift (now obsolete) and swiftonfile
projects. The older gluster-swift code and relevant documentation can be found
@@ -17,10 +16,9 @@ of swiftonfile repo.
## SwiftOnFile vs gluster-swift
| Gluster-Swift | SwiftOnFile |
-|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|
-| One GlusterFS volume maps to and stores only one Swift account. Mountpoint Hierarchy: `container/object` | One GlusterFS volume or XFS partition can have multiple accounts. Mountpoint Hierarchy: `acc/container/object` |
+| :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
+| One GlusterFS volume maps to and stores only one Swift account. Mountpoint Hierarchy: `container/object` | One GlusterFS volume or XFS partition can have multiple accounts. Mountpoint Hierarchy: `acc/container/object` |
| Overrides the account server, container server and object server. We need to keep in sync with upstream Swift and often may need code changes or workarounds to support new Swift features | Implements only the object-server. Far less need to catch up with Swift, as new features at the proxy, container and account level would very likely be compatible with SwiftOnFile as it's just a storage policy. |
-| Does not use DBs for accounts and container.A container listing involves a filesystem crawl.A HEAD on account/container gives inaccurate or stale results without FS crawl. | Uses Swift's DBs to store account and container information. An account or container listing does not involve FS crawl. Accurate info on HEAD to account/container – ability to support account quotas. |
-| GET on a container and account lists actual files in filesystem. | GET on a container and account only lists objects PUT over Swift. Files created over filesystem interface do not appear in container and object listings. |
-| Standalone deployment required and does not integrate with existing Swift cluster. | Integrates with any existing Swift deployment as a Storage Policy. |
-
+| Does not use DBs for accounts and containers. A container listing involves a filesystem crawl. A HEAD on account/container gives inaccurate or stale results without FS crawl. | Uses Swift's DBs to store account and container information. An account or container listing does not involve FS crawl. Accurate info on HEAD to account/container – ability to support account quotas. |
+| GET on a container and account lists actual files in filesystem. | GET on a container and account only lists objects PUT over Swift. Files created over filesystem interface do not appear in container and object listings. |
+| Standalone deployment required and does not integrate with existing Swift cluster. | Integrates with any existing Swift deployment as a Storage Policy. |
diff --git a/docs/Administrator-Guide/Performance-Testing.md b/docs/Administrator-Guide/Performance-Testing.md
index 0beb8de..3920974 100644
--- a/docs/Administrator-Guide/Performance-Testing.md
+++ b/docs/Administrator-Guide/Performance-Testing.md
@@ -1,5 +1,4 @@
-Gluster performance testing
-===========================
+# Gluster performance testing
Once you have created a Gluster volume, you need to verify that it has
adequate performance for your application, and if it does not, you need
@@ -7,18 +6,18 @@ a way to isolate the root cause of the problem.
There are two kinds of workloads:
-* synthetic - run a test program such as ones below
-* application - run existing application
+- synthetic - run a test program such as ones below
+- application - run existing application
# Profiling tools
-Ideally it's best to use the actual application that you want to run on Gluster, but applications often don't tell the sysadmin much about where the performance problems are, particularly latency (response-time) problems. So there are non-invasive profiling tools built into Gluster that can measure performance as seen by the application, without changing the application. Gluster profiling methods at present are based on the io-stats translator, and include:
+Ideally it's best to use the actual application that you want to run on Gluster, but applications often don't tell the sysadmin much about where the performance problems are, particularly latency (response-time) problems. So there are non-invasive profiling tools built into Gluster that can measure performance as seen by the application, without changing the application. Gluster profiling methods at present are based on the io-stats translator, and include:
-* client-side profiling - instrument a Gluster mountpoint or libgfapi process to sample profiling data. In this case, the io-stats translator is at the "top" of the translator stack, so the profile data truly represents what the application (or FUSE mountpoint) is asking Gluster to do. For example, a single application write is counted once as a WRITE FOP (file operation) call, and the latency for that WRITE FOP includes latency of the data replication done by the AFR translator lower in the stack.
+- client-side profiling - instrument a Gluster mountpoint or libgfapi process to sample profiling data. In this case, the io-stats translator is at the "top" of the translator stack, so the profile data truly represents what the application (or FUSE mountpoint) is asking Gluster to do. For example, a single application write is counted once as a WRITE FOP (file operation) call, and the latency for that WRITE FOP includes latency of the data replication done by the AFR translator lower in the stack.
-* server-side profiling - this is done using the "gluster volume profile" command (and "gluster volume top" can be used to identify particular hot files in use as well). Server-side profiling can measure the throughput of an entire Gluster volume over time, and can measure server-side latencies. However, it does not incorporate network or client-side latencies. It is also hard to infer application behavior because of client-side translators that alter the I/O workload (examples: erasure coding, cache tiering).
+- server-side profiling - this is done using the "gluster volume profile" command (and "gluster volume top" can be used to identify particular hot files in use as well). Server-side profiling can measure the throughput of an entire Gluster volume over time, and can measure server-side latencies. However, it does not incorporate network or client-side latencies. It is also hard to infer application behavior because of client-side translators that alter the I/O workload (examples: erasure coding, cache tiering).
-In short, use client-side profiling for understanding "why is my application unresponsive"? and use server-side profiling for understanding how busy your Gluster volume is, what kind of workload is being applied to it (i.e. is it mostly-read? is it small-file?), and how well the I/O load is spread across the volume.
+In short, use client-side profiling for understanding "why is my application unresponsive?", and use server-side profiling for understanding how busy your Gluster volume is, what kind of workload is being applied to it (i.e. is it mostly-read? is it small-file?), and how well the I/O load is spread across the volume.
## client-side profiling
@@ -27,7 +26,7 @@ To run client-side profiling,
- gluster volume profile your-volume start
- setfattr -n trusted.io-stats-dump -v io-stats-pre.txt /your/mountpoint
-This will generate the specified file (`/var/run/gluster/io-stats-pre.txt`) on the client. A script like [gvp-client.sh](https://github.com/bengland2/gluster-profile-analysis) can automate collection of this data.
+This will generate the specified file (`/var/run/gluster/io-stats-pre.txt`) on the client. A script like [gvp-client.sh](https://github.com/bengland2/gluster-profile-analysis) can automate collection of this data.
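
For example, a sketch of bracketing a workload with two dumps so the before/after counters can be compared (the post-run file name is an arbitrary choice):

```sh
# Enable profiling, dump counters before and after the workload, then compare.
gluster volume profile your-volume start
setfattr -n trusted.io-stats-dump -v io-stats-pre.txt /your/mountpoint
# ... run the application workload here ...
setfattr -n trusted.io-stats-dump -v io-stats-post.txt /your/mountpoint
```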
TBS: what the different FOPs are and what they mean.
@@ -58,11 +57,11 @@ that can be run from a single system. While single-system results are
important, they are far from a definitive measure of the performance
capabilities of a distributed filesystem.
-- [fio](http://freecode.com/projects/fio) - for large file I/O tests.
-- [smallfile](https://github.com/bengland2/smallfile) - for
- pure-workload small-file tests
-- [iozone](http://www.iozone.org) - for pure-workload large-file tests
-- [parallel-libgfapi](https://github.com/bengland2/parallel-libgfapi) - for pure-workload libgfapi tests
+- [fio](http://freecode.com/projects/fio) - for large file I/O tests.
+- [smallfile](https://github.com/bengland2/smallfile) - for
+ pure-workload small-file tests
+- [iozone](http://www.iozone.org) - for pure-workload large-file tests
+- [parallel-libgfapi](https://github.com/bengland2/parallel-libgfapi) - for pure-workload libgfapi tests
The "netmist" mixed-workload generator of SPECsfs2014 may be suitable in some cases, but is not technically an open-source tool. This tool was written by Don Capps, who was an author of iozone.
@@ -78,13 +77,13 @@ And make sure your firewall allows port 8765 through for it. You can now run tes
You can also use it for distributed testing, however, by launching fio instances on separate hosts, taking care to start all fio instances as close to the same time as possible, limiting per-thread throughput, and specifying the run duration rather than the amount of data, so that all fio instances end at around the same time. You can then aggregate the fio results from different hosts to get a meaningful aggregate result.
-fio also has different I/O engines, in particular Huamin Chen authored the ***libgfapi*** engine for fio so that you can use fio to test Gluster performance without using FUSE.
+fio also has different I/O engines, in particular Huamin Chen authored the **_libgfapi_** engine for fio so that you can use fio to test Gluster performance without using FUSE.
Limitations of fio in distributed mode:
-- stonewalling - fio calculates throughput based on when the last thread finishes a test run. In contrast, iozone calculates throughput by default based on when the FIRST thread finishes the workload. This can lead to (deceptively?) higher throughput results for iozone, since there are inevitably some "straggler" threads limping to the finish line later than others. It is possible in some cases to overcome this limitation by specifying a time limit for the test. This works well for random I/O tests, where typically you do not want to read/write the entire file/device anyway.
-- inaccuracy when response times > 1 sec - at least in some cases fio has reported excessively high IOPS when fio threads encounter response times much greater than 1 second, this can happen for distributed storage when there is unfairness in the implementation.
-- io engines are not integrated.
+- stonewalling - fio calculates throughput based on when the last thread finishes a test run. In contrast, iozone calculates throughput by default based on when the FIRST thread finishes the workload. This can lead to (deceptively?) higher throughput results for iozone, since there are inevitably some "straggler" threads limping to the finish line later than others. It is possible in some cases to overcome this limitation by specifying a time limit for the test. This works well for random I/O tests, where typically you do not want to read/write the entire file/device anyway.
+- inaccuracy when response times > 1 sec - at least in some cases fio has reported excessively high IOPS when fio threads encounter response times much greater than 1 second, this can happen for distributed storage when there is unfairness in the implementation.
+- io engines are not integrated.
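
As a starting point, here is a sketch of a single-host fio run against a FUSE mountpoint; the mountpoint path, sizes and thread count are assumptions to adapt, and for distributed runs the same job would be launched on each client as described above.

```sh
# 60-second time-based random-read test with 8 jobs against a Gluster FUSE mount.
fio --name=randread --directory=/mnt/glustervol --rw=randread --bs=4k \
    --size=1g --numjobs=8 --runtime=60 --time_based --group_reporting
```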
### smallfile Distributed I/O Benchmark
@@ -108,10 +107,10 @@ option (below).
The "-a" option for automated testing of all use cases is discouraged,
because:
-- this does not allow you to drop the read cache in server before a
- test.
-- most of the data points being measured will be irrelevant to the
- problem you are solving.
+- this does not allow you to drop the read cache on the server before a
+  test.
+- most of the data points being measured will be irrelevant to the
+ problem you are solving.
Single-thread testing is an important use case, but to fully utilize the
available hardware you typically need to do multi-thread and even
@@ -124,16 +123,16 @@ re-read and re-write tests. "-w" option tells iozone not to delete any
files that it accessed, so that subsequent tests can use them. Specify
these options with each test:
-- -i -- test type, 0=write, 1=read, 2=random read/write
-- -r -- data transfer size -- allows you to simulate I/O size used by
- application
-- -s -- per-thread file size -- choose this to be large enough for the
- system to reach steady state (typically multiple GB needed)
-- -t -- number of threads -- how many subprocesses will be
- concurrently issuing I/O requests
-- -F -- list of files -- what files to write/read. If you do not
- specify then the filenames iozone.DUMMY.\* will be used in the
- default directory.
+- -i -- test type, 0=write, 1=read, 2=random read/write
+- -r -- data transfer size -- allows you to simulate I/O size used by
+ application
+- -s -- per-thread file size -- choose this to be large enough for the
+ system to reach steady state (typically multiple GB needed)
+- -t -- number of threads -- how many subprocesses will be
+ concurrently issuing I/O requests
+- -F -- list of files -- what files to write/read. If you do not
+ specify then the filenames iozone.DUMMY.\* will be used in the
+ default directory.
Example of an 8-thread sequential write test with 64-KB transfer size
and file size of 1 GB to shared Gluster mountpoint directory
@@ -213,11 +212,11 @@ This test exercises Gluster performance using the libgfapi API,
bypassing FUSE - no mountpoints are used. Available
[here](https://github.com/bengland2/parallel-libgfapi).
-To use it, you edit the script parameters in parallel\_gfapi\_test.sh
+To use it, you edit the script parameters in parallel_gfapi_test.sh
script - all of them are above the comment "NO EDITABLE PARAMETERS BELOW
THIS LINE". These include such things as the Gluster volume name, a host
serving that volume, number of files, etc. You then make sure that the
-gfapi\_perf\_test executable is distributed to the client machines at
+gfapi_perf_test executable is distributed to the client machines at
the specified directory, and then run the script. The script starts all
libgfapi workload generator processes in parallel in such a way that
they all start the test at the same time. It waits until they all
@@ -240,8 +239,7 @@ S3 workload generation.
part of the OpenStack Swift toolset and is a command-line tool with a workload
definition file format.
-Workload
---------
+## Workload
An application can be as simple as writing some files, or it can be as
complex as running a cloud on top of Gluster. But all applications have
@@ -253,10 +251,10 @@ application spends most of its time doing with Gluster are called the
the filesystem requests being delivered to Gluster by the application.
There are two ways to look at workload:
-- top-down - what is the application trying to get the filesystem to
- do?
-- bottom-up - what requests is the application actually generating to
- the filesystem?
+- top-down - what is the application trying to get the filesystem to
+ do?
+- bottom-up - what requests is the application actually generating to
+ the filesystem?
### data vs metadata
@@ -277,21 +275,21 @@ Often this is what users will be able to help you with -- for example, a
workload might consist of ingesting a billion .mp3 files. Typical
questions that need to be answered (approximately) are:
-- what is file size distribution? Averages are often not enough - file
- size distributions can be bi-modal (i.e. consist mostly of the very
- large and very small file sizes). TBS: provide pointers to scripts
- that can collect this.
-- what fraction of file accesses are reads vs writes?
-- how cache-friendly is the workload? Do the same files get read
- repeatedly by different Gluster clients, or by different
- processes/threads on these clients?
-- for large-file workloads, what fraction of accesses are
- sequential/random? Sequential file access means that the application
- thread reads/writes the file from start to finish in byte offset
- order, and random file access is the exact opposite -- the thread
- may read/write from any offset at any time. Virtual machine disk
- images are typically accessed randomly, since the VM's filesystem is
- embedded in a Gluster file.
+- what is file size distribution? Averages are often not enough - file
+ size distributions can be bi-modal (i.e. consist mostly of the very
+ large and very small file sizes). TBS: provide pointers to scripts
+ that can collect this.
+- what fraction of file accesses are reads vs writes?
+- how cache-friendly is the workload? Do the same files get read
+ repeatedly by different Gluster clients, or by different
+ processes/threads on these clients?
+- for large-file workloads, what fraction of accesses are
+ sequential/random? Sequential file access means that the application
+ thread reads/writes the file from start to finish in byte offset
+ order, and random file access is the exact opposite -- the thread
+ may read/write from any offset at any time. Virtual machine disk
+ images are typically accessed randomly, since the VM's filesystem is
+ embedded in a Gluster file.
Why do these questions matter? For example, if you have a large-file
sequential read workload, network configuration + Gluster and Linux
@@ -311,20 +309,19 @@ and the bottlenecks which are limiting performance of that workload.
TBS: links to documentation for these tools and scripts that reduce the data to usable form.
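+
+As one readily available illustration (not the full tool list referenced above),
+Gluster's built-in profiling can capture the bottom-up view of per-brick FOP
+counts and latencies while the workload runs; `<volname>` is a placeholder:
+
+```console
+gluster volume profile <volname> start
+# ... run the workload ...
+gluster volume profile <volname> info
+gluster volume profile <volname> stop
+```
+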
-Configuration
--------------
+## Configuration
There are 4 basic hardware dimensions to a Gluster server, listed here
in order of importance:
-- network - possibly the most important hardware component of a
- Gluster site
- - access protocol - what kind of client is used to get to the
- files/objects?
-- storage - this is absolutely critical to get right up front
-- cpu - on client, look for hot threads (see below)
-- memory - can impact performance of read-intensive, cacheable
- workloads
+- network - possibly the most important hardware component of a
+ Gluster site
+ - access protocol - what kind of client is used to get to the
+ files/objects?
+- storage - this is absolutely critical to get right up front
+- cpu - on client, look for hot threads (see below)
+- memory - can impact performance of read-intensive, cacheable
+ workloads
### network testing
@@ -338,7 +335,7 @@ To measure network performance, consider use of a
[netperf-based](http://www.cs.kent.edu/~farrell/dist/ref/Netperf.html)
script.
-The purpose of these two tools is to characterize the capacity of your entire network infrastructure to support the desired level of traffic induced by distributed storage, using multiple network connections in parallel. The latter script is probably the most realistic network workload for distributed storage.
+The purpose of these two tools is to characterize the capacity of your entire network infrastructure to support the desired level of traffic induced by distributed storage, using multiple network connections in parallel. The latter script is probably the most realistic network workload for distributed storage.
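+
+As a quick sanity check before running the full many-to-many script, a single
+netperf TCP stream between one client and one server might look like the sketch
+below (the host name is a placeholder, and `netserver` must already be running
+on the target host):
+
+```console
+netperf -H server1 -l 60 -t TCP_STREAM
+```
+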
The two most common hardware problems impacting distributed storage are,
not surprisingly, disk drive failures and network failures. Some of
@@ -379,7 +376,7 @@ To simulate a mixed read-write workload, use both sets of pairs:
(c1,s1), (c2, s2), (c3, s1), (c4, s2), (s1, c1), (s2, c2), (s1, c3), (s2, c4)
-More complicated flows can model behavior of non-native protocols, where a cluster node acts as a proxy server- it is a server (for non-native protocol) and a client (for native protocol). For example, such protocols often induce full-duplex traffic which can stress the network differently than unidirectional in/out traffic. For example, try adding this set of flows to preceding flow:
+More complicated flows can model behavior of non-native protocols, where a cluster node acts as a proxy server - it is a server (for the non-native protocol) and a client (for the native protocol). Such protocols often induce full-duplex traffic which can stress the network differently than unidirectional in/out traffic. For example, try adding this set of flows to the preceding flows:
(s1, s2), (s2, s3), (s3, s4), (s4, s1)
@@ -391,8 +388,8 @@ do not need ssh access to each other -- they only have to allow
password-less ssh access from the head node. The script does not rely on
root privileges, so you can run it from a non-root account. Just create
a public key on the head node in the right account (usually in
-\$HOME/.ssh/id\_rsa.pub ) and then append this public key to
-\$HOME/.ssh/authorized\_keys on each host participating in the test.
+\$HOME/.ssh/id_rsa.pub ) and then append this public key to
+\$HOME/.ssh/authorized_keys on each host participating in the test.
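+
+A minimal sketch of that key setup, run from the head node (the host-list file
+name is a placeholder; `ssh-copy-id` appends the public key to the remote
+authorized_keys file for you):
+
+```console
+ssh-keygen -t rsa
+for h in $(cat hosts.list); do ssh-copy-id "$h"; done
+```
+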
We input senders and receivers using separate text files, 1 host per
line. For pair (sender[j], receiver[j]), you get sender[j] from line j
@@ -401,23 +398,22 @@ You have to use the IP address/name that corresponds to the interface
you want to test, and you have to be able to ssh to each host from the
head node using this interface.
-Results
--------
+## Results
There are 4 basic forms of performance results, not in order of
importance:
-- throughput -- how much work is done in a unit of time? Best metrics
- typically are workload-dependent:
- - for large-file random: IOPS
- - for large-file sequential: MB/s
- - for small-file: files/sec
-- response time -- IMPORTANT, how long does it take for filesystem
- request to complete?
-- utilization -- how busy is the hardware while the workload is
- running?
-- scalability -- can we linearly scale throughput without sacrificing
- response time as we add servers to a Gluster volume?
+- throughput -- how much work is done in a unit of time? Best metrics
+ typically are workload-dependent:
+ - for large-file random: IOPS
+ - for large-file sequential: MB/s
+ - for small-file: files/sec
+- response time -- IMPORTANT, how long does it take for filesystem
+ request to complete?
+- utilization -- how busy is the hardware while the workload is
+ running?
+- scalability -- can we linearly scale throughput without sacrificing
+ response time as we add servers to a Gluster volume?
Typically throughput results get the most attention, but in a
distributed-storage environment, the hardest goal to achieve may well be
diff --git a/docs/Administrator-Guide/Performance-Tuning.md b/docs/Administrator-Guide/Performance-Tuning.md
index e94d37a..c50cd01 100644
--- a/docs/Administrator-Guide/Performance-Tuning.md
+++ b/docs/Administrator-Guide/Performance-Tuning.md
@@ -1,80 +1,91 @@
# Performance tuning
## Enable Metadata cache
+
Metadata caching improves performance in almost all the workloads, except for use cases
where most of the workload accesses a file simultaneously from multiple clients.
- 1. Execute the following command to enable metadata caching and cache invalidation:
- ```
- # gluster volume set group metadata-cache
+
+1. Execute the following command to enable metadata caching and cache invalidation:
+
+ ```console
+ gluster volume set <volname> group metadata-cache
```
+
This group command enables caching of stat and xattr information of a file or directory.
The caching is refreshed every 10 min, and cache-invalidation is enabled to ensure cache
consistency.
- 2. To increase the number of files that can be cached, execute the following command:
- ```
- # gluster volume set network.inode-lru-limit
+2. To increase the number of files that can be cached, execute the following command:
+
+ ```console
+ gluster volume set <volname> network.inode-lru-limit <n>
```
+
By default, n is set to 50000. It can be increased if the number of active files in the volume
is very high. Increasing this number increases the memory footprint of the brick processes.
- 3. Execute the following command to enable samba specific metadata caching:
- ```
- # gluster volume set cache-samba-metadata on
+3. Execute the following command to enable samba specific metadata caching:
+
+ ```console
+ gluster volume set <volname> cache-samba-metadata on
```
- 4. By default, some xattrs are cached by gluster like: capability xattrs, ima xattrs
+4. By default, some xattrs are cached by gluster, such as capability xattrs, ima xattrs,
ACLs, etc. If there are any other xattrs that are used by the application using
the Gluster storage, execute the following command to add these xattrs to the metadata
cache list:
- ```
- # gluster volume set xattr-cache-list "comma separated xattr list"
+ ```console
+ gluster volume set <volname> xattr-cache-list "comma separated xattr list"
```
Eg:
- ```
- # gluster volume set xattr-cache-list "user.org.netatalk.*,user.swift.metadata"
+ ```console
+ gluster volume set <volname> xattr-cache-list "user.org.netatalk.*,user.swift.metadata"
```
## Directory operations
+
Along with enabling the metadata caching, the following options can be set to
increase performance of directory operations:
- ### Directory listing Performance:
+### Directory listing Performance:
- - Enable `parallel-readdir`
- ```
- # gluster volume set performance.readdir-ahead on
- # gluster volume set performance.parallel-readdir on
- ```
+- Enable `parallel-readdir`
- ### File/Directory Create Performance
+ ```console
+ gluster volume set <volname> performance.readdir-ahead on
+ gluster volume set <volname> performance.parallel-readdir on
+ ```
- - Enable `nl-cache`
- ```
- # gluster volume set group nl-cache
- # gluster volume set nl-cache-positive-entry on
- ```
+### File/Directory Create Performance
+
+- Enable `nl-cache`
+
+ ```console
+ gluster volume set <volname> group nl-cache
+ gluster volume set <volname> nl-cache-positive-entry on
+ ```
The above command also enables cache invalidation and increases the timeout to
10 minutes
## Small file Read operations
+
For use cases with dominant small file reads, enable the following options
- # gluster volume set performance.cache-invalidation on
- # gluster volume set features.cache-invalidation on
- # gluster volume set performance.qr-cache-timeout 600 --> 10 min recommended setting
- # gluster volume set cache-invalidation-timeout 600 --> 10 min recommended setting
+ gluster volume set <volname> performance.cache-invalidation on
+ gluster volume set <volname> features.cache-invalidation on
+ gluster volume set <volname> performance.qr-cache-timeout 600 # 10 min recommended setting
+ gluster volume set <volname> cache-invalidation-timeout 600 # 10 min recommended setting
These commands enable caching of the content of small files in the client cache.
Enabling cache invalidation ensures cache consistency.
The total cache size can be set using
- # gluster volume set cache-size
+ gluster volume set <volname> cache-size <size>
By default, the files with size `<=64KB` are cached. To change this value:
- # gluster volume set performance.cache-max-file-size
+ gluster volume set <volname> performance.cache-max-file-size <size>
Note that the `size` arguments use SI unit suffixes, e.g. `64KB` or `2MB`.
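+
+For example, to raise the limit so that files up to 2MB are cached (the volume
+name is a placeholder):
+
+```console
+gluster volume set <volname> performance.cache-max-file-size 2MB
+```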
diff --git a/docs/Administrator-Guide/RDMA-Transport.md b/docs/Administrator-Guide/RDMA-Transport.md
index 1bc72ac..d1008e3 100644
--- a/docs/Administrator-Guide/RDMA-Transport.md
+++ b/docs/Administrator-Guide/RDMA-Transport.md
@@ -3,13 +3,13 @@
RDMA is no longer supported in Gluster builds. It has been removed from release 8 onwards.
Currently we don't have:
+
1. The expertise to support RDMA
2. Infrastructure to test/verify the performance of each release
-The options are getting discussed here - https://github.com/gluster/glusterfs/issues/2000
+ The options are being discussed here: https://github.com/gluster/glusterfs/issues/2000
It is ready to be enabled as a compile-time option if there is proper support and testing infrastructure.
-
# Introduction
GlusterFS supports using RDMA protocol for communication between glusterfs clients and glusterfs bricks.
@@ -17,20 +17,22 @@ GlusterFS clients include FUSE client, libgfapi clients(Samba and NFS-Ganesha in
NOTE: As of now only FUSE client and gNFS server would support RDMA transport.
-
NOTE:
NFS client to gNFS Server/NFS Ganesha Server communication would still happen over tcp.
CIFS Clients/Windows Clients to Samba Server communication would still happen over tcp.
# Setup
+
Please refer to this external documentation to set up RDMA on your machines:
-http://people.redhat.com/dledford/infiniband_get_started.html
+http://people.redhat.com/dledford/infiniband_get_started.html
## Creating Trusted Storage Pool
+
All the servers in the Trusted Storage Pool must have RDMA devices if either RDMA or TCP,RDMA volumes are created in the storage pool.
The peer probe must be performed using IP/hostname assigned to the RDMA device.
## Ports and Firewall
+
Process glusterd will listen on both tcp and rdma if rdma device is found. Port used for rdma is 24008. Similarly, brick processes will also listen on two ports for a volume created with transport "tcp,rdma".
Make sure you update the firewall to accept packets on these ports.
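+
+As a sketch, with firewalld the glusterd tcp and rdma ports could be opened as
+shown below; the brick port range used by your volumes may need to be added in
+the same way:
+
+```console
+firewall-cmd --permanent --add-port=24007-24008/tcp
+firewall-cmd --reload
+```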
@@ -46,36 +48,49 @@ Creation of test-volume has been successful
Please start the volume to access data.
# Changing Transport of Volume
-To change the supported transport types of a existing volume, follow the procedure:
-NOTE: This is possible only if the volume was created with IP/hostname assigned to RDMA device.
- 1. Unmount the volume on all the clients using the following command:
-`# umount mount-point`
- 2. Stop the volumes using the following command:
-`# gluster volume stop volname`
- 3. Change the transport type.
-For example, to enable both tcp and rdma execute the followimg command:
-`# gluster volume set volname config.transport tcp,rdma`
- 4. Mount the volume on all the clients.
-For example, to mount using rdma transport, use the following command:
-`# mount -t glusterfs -o transport=rdma server1:/test-volume /mnt/glusterfs`
+To change the supported transport types of an existing volume, follow the procedure:
+NOTE: This is possible only if the volume was created with the IP/hostname assigned to the RDMA device.
+
+1. Unmount the volume on all the clients using the following command:
+
+ umount mount-point
+
+2. Stop the volumes using the following command:
+
+ gluster volume stop volname
+
+3. Change the transport type.
+ For example, to enable both tcp and rdma, execute the following command:
+
+ gluster volume set volname config.transport tcp,rdma
+
+4. Mount the volume on all the clients.
+ For example, to mount using rdma transport, use the following command:
+
+ mount -t glusterfs -o transport=rdma server1:/test-volume /mnt/glusterfs
NOTE:
-config.transport option does not have a entry in help of gluster cli.
-`#gluster vol set help | grep config.transport`
-However, the key is a valid one.
+The config.transport option does not have an entry in the help output of the gluster CLI.
+
+```console
+gluster vol set help | grep config.transport
+```
+
+However, the key is a valid one.
# Mounting a Volume using RDMA
You can use the mount option "transport" to specify the transport type that FUSE client must use to communicate with bricks. If the volume was created with only one transport type, then that becomes the default when no value is specified. In case of tcp,rdma volume, tcp is the default.
-For example, to mount using rdma transport, use the following command:
-`# mount -t glusterfs -o transport=rdma server1:/test-volume /mnt/glusterfs`
+For example, to mount using rdma transport, use the following command:
+
+```console
+mount -t glusterfs -o transport=rdma server1:/test-volume /mnt/glusterfs
+```
# Transport used by auxiliary processes
+
All the auxiliary processes like the self-heal daemon, rebalance process, etc. use the default transport. In case you have a tcp,rdma volume it will use tcp.
In case of an rdma volume, rdma will be used.
Configuration options to select the transport used by these processes when the volume is tcp,rdma are not yet available and will come in later releases.
-
-
-
diff --git a/docs/Administrator-Guide/SSL.md b/docs/Administrator-Guide/SSL.md
index a42a8f0..b9cf38b 100644
--- a/docs/Administrator-Guide/SSL.md
+++ b/docs/Administrator-Guide/SSL.md
@@ -2,67 +2,67 @@
GlusterFS allows its communication to be secured using the [Transport Layer
Security][tls] standard (which supersedes Secure Sockets Layer), using the
-[OpenSSL][ossl] library. Setting this up requires a basic working knowledge of
+[OpenSSL][ossl] library. Setting this up requires a basic working knowledge of
some SSL/TLS concepts, which can only be briefly summarized here.
- * "Authentication" is the process of one entity (e.g. a machine, process, or
- person) proving its identity to a second entity.
+- "Authentication" is the process of one entity (e.g. a machine, process, or
+ person) proving its identity to a second entity.
- * "Authorization" is the process of checking whether an entity has permission
- to perform an action.
+- "Authorization" is the process of checking whether an entity has permission
+ to perform an action.
- * TLS provides authentication and encryption. It does not provide
- authorization, though GlusterFS can use TLS-authenticated identities to
- authorize client connections to bricks/volumes.
+- TLS provides authentication and encryption. It does not provide
+ authorization, though GlusterFS can use TLS-authenticated identities to
+ authorize client connections to bricks/volumes.
- * An entity X which must authenticate to a second entity Y does so by sharing
- with Y a *certificate*, which contains information sufficient to prove X's
- identity. X's proof of identity also requires possession of a *private key*
- which matches its certificate, but this key is never seen by Y or anyone
- else. Because the certificate is already public, anyone who has the key can
- claim that identity.
+- An entity X which must authenticate to a second entity Y does so by sharing
+ with Y a _certificate_, which contains information sufficient to prove X's
+ identity. X's proof of identity also requires possession of a _private key_
+ which matches its certificate, but this key is never seen by Y or anyone
+ else. Because the certificate is already public, anyone who has the key can
+ claim that identity.
- * Each certificate contains the identity of its principal (owner) along with
- the identity of a *certifying authority* or CA who can verify the integrity
- of the certificate's contents. The principal and CA can be the same (a
- "self-signed certificate"). If they are different, the CA must *sign* the
- certificate by appending information derived from both the certificate
- contents and the CA's own private key.
+- Each certificate contains the identity of its principal (owner) along with
+ the identity of a _certifying authority_ or CA who can verify the integrity
+ of the certificate's contents. The principal and CA can be the same (a
+ "self-signed certificate"). If they are different, the CA must _sign_ the
+ certificate by appending information derived from both the certificate
+ contents and the CA's own private key.
- * Certificate-signing relationships can extend through multiple levels. For
- example, a company X could sign another company Y's certificate, which could
- then be used to sign a third certificate Z for a specific user or purpose.
- Anyone who trusts X (and is willing to extend that trust through a
- *certificate depth* of two or more) would therefore be able to authenticate
- Y and Z as well.
+- Certificate-signing relationships can extend through multiple levels. For
+ example, a company X could sign another company Y's certificate, which could
+ then be used to sign a third certificate Z for a specific user or purpose.
+ Anyone who trusts X (and is willing to extend that trust through a
+ _certificate depth_ of two or more) would therefore be able to authenticate
+ Y and Z as well.
- * Any entity willing to accept other entities' authentication attempts must
- have some sort of database seeded with the certificates that already accept.
+- Any entity willing to accept other entities' authentication attempts must
+ have some sort of database seeded with the certificates that it already accepts.
In GlusterFS's case, a client or server X uses the following files to contain
TLS-related information:
- * /etc/ssl/glusterfs.pem X's own certificate
+- /etc/ssl/glusterfs.pem X's own certificate
- * /etc/ssl/glusterfs.key X's private key
+- /etc/ssl/glusterfs.key X's private key
- * /etc/ssl/glusterfs.ca concatenation of *others'* certificates
+- /etc/ssl/glusterfs.ca concatenation of _others'_ certificates
-GlusterFS always performs *mutual authentication*, though clients do not
-currently do anything with the authenticated server identity. Thus, if client X
+GlusterFS always performs _mutual authentication_, though clients do not
+currently do anything with the authenticated server identity. Thus, if client X
wants to communicate with server Y, then X's certificate (or that of a signer)
must be in Y's CA file, and vice versa.
For all uses of TLS in GlusterFS, if one side of a connection is configured to
-use TLS then the other side must use it as well. There is no automatic fallback
+use TLS then the other side must use it as well. There is no automatic fallback
to non-TLS communication, or allowance for concurrent TLS and non-TLS access to
-the same resource, because either would be insecure. Instead, any such "mixed
+the same resource, because either would be insecure. Instead, any such "mixed
mode" connections will be rejected by the TLS-using side, sacrificing
availability to maintain security.
-**NOTE**The TLS certificate verification will fail if the machines' date and
-time are not in sync with each other. Certificate verification depends on the
-time of the client as well as the server and if that is not found to be in
+**NOTE:** The TLS certificate verification will fail if the machines' date and
+time are not in sync with each other. Certificate verification depends on the
+time of the client as well as the server and if that is not found to be in
sync then it is deemed to be an invalid certificate. To get the date and times
in sync, tools such as ntpdate can be used.
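+
+For example, a one-shot sync against a public NTP pool (assuming the machine can
+reach it) would be:
+
+    ntpdate pool.ntp.org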
@@ -70,50 +70,50 @@ in sync, tools such as ntpdate can be used.
Certmonger can be used to generate keys, request certs from a CA and then
automatically keep the Gluster certificate and the CA bundle updated as
-required, simplifying deployment. Either a commercial CA or a local CA can
-be used. E.g., FreeIPA (with dogtag CA) is an open-source CA with
+required, simplifying deployment. Either a commercial CA or a local CA can
+be used. E.g., FreeIPA (with dogtag CA) is an open-source CA with
user-friendly tooling.
If using FreeIPA, first add the host. This is required for FreeIPA to issue
certificates. This can be done via the web UI, or the CLI with:
- ipa host-add
+ ipa host-add
If the host has been added the following should show the host:
- ipa host-show
+ ipa host-show
And it should show a kerberos principal for the host in the form of:
- host/
+ host/
Now use certmonger on the gluster server or client to generate the key (if
-required), and submit a CSR to the CA. Certmonger will monitor the request,
-and create and update the files as required. For FreeIPA we need to specify
-the Kerberos principal from above to -K. E.g.:
+required), and submit a CSR to the CA. Certmonger will monitor the request,
+and create and update the files as required. For FreeIPA we need to specify
+the Kerberos principal from above to -K. E.g.:
- getcert request -r \
- -K host/$(hostname) \
- -f /etc/ssl/gluster.pem \
- -k /etc/ssl/gluster.key \
- -D $(hostname) \
- -F /etc/ssl/gluster.ca
+ getcert request -r \
+ -K host/$(hostname) \
+ -f /etc/ssl/gluster.pem \
+ -k /etc/ssl/gluster.key \
+ -D $(hostname) \
+ -F /etc/ssl/gluster.ca
Certmonger should print out an ID for the request, e.g.:
- New signing request "20210801190305" added.
+ New signing request "20210801190305" added.
You can check the status of the request with this ID:
- getcert list -i 20210801190147
+ getcert list -i 20210801190147
If the CA approves the CSR and issues the cert, then the previous command
should print a status field with:
- status: MONITORING
+ status: MONITORING
As this point, the key, the cert and the CA bundle should all be in /etc/ssl
-ready for Gluster to use. Certmonger will renew the certificates as
+ready for Gluster to use. Certmonger will renew the certificates as
required for you.
You do not need to manually concatenate certs to a trusted cert bundle and
@@ -123,7 +123,7 @@ You may need to set the certificate depth to allow the CA signed certs to be
used, if there are intermediate CAs in the signing path. E.g., on every server
and client:
- echo "option transport.socket.ssl-cert-depth 3" > /var/lib/glusterd/secure-access
+ echo "option transport.socket.ssl-cert-depth 3" > /var/lib/glusterd/secure-access
This should not be necessary where a local CA (e.g., FreeIPA) has directly
signed the cert.
@@ -133,45 +133,44 @@ signed the cart.
To enable authentication and encryption between clients and brick servers, two
options must be set:
- gluster volume set MYVOLUME client.ssl on
- gluster volume set MYVOLUME server.ssl on
+ gluster volume set MYVOLUME client.ssl on
+ gluster volume set MYVOLUME server.ssl on
->**Note** that the above options affect only the GlusterFS native protocol.
->For foreign protocols such as NFS, SMB, or Swift the encryption will not be
->affected between:
+> **Note** that the above options affect only the GlusterFS native protocol.
+> For foreign protocols such as NFS, SMB, or Swift the encryption will not be
+> affected between:
>
->1. NFS client and Glusterfs NFS Ganesha Server
->2. SMB client and Glusterfs SMB server
+> 1. NFS client and Glusterfs NFS Ganesha Server
+> 2. SMB client and Glusterfs SMB server
>
->While it affects the encryption between the following:
+> While it affects the encryption between the following:
>
->1. NFS Ganesha server and Glusterfs bricks
->2. Glusterfs SMB server and Glusterfs bricks
-
+> 1. NFS Ganesha server and Glusterfs bricks
+> 2. Glusterfs SMB server and Glusterfs bricks
## Using TLS Identities for Authorization
Once TLS has been enabled on the I/O path, TLS identities can be used instead of
-IP addresses or plain usernames to control access to specific volumes. For
+IP addresses or plain usernames to control access to specific volumes. For
example:
- gluster volume set MYVOLUME auth.ssl-allow Zaphod
+ gluster volume set MYVOLUME auth.ssl-allow Zaphod
Here, we're allowing the TLS-authenticated identity "Zaphod" to access MYVOLUME.
This is intentionally identical to the existing "auth.allow" option, except that
-the name is taken from a TLS certificate instead of a command-line string. Note
+the name is taken from a TLS certificate instead of a command-line string. Note
that infelicities in the gluster CLI preclude using names that include spaces,
which would otherwise be allowed.
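+
+If several identities need access, auth.ssl-allow also accepts a comma-separated
+list of certificate names; a sketch (the second name is a placeholder):
+
+    gluster volume set MYVOLUME auth.ssl-allow 'Zaphod,Trillian'
+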
## Enabling TLS on the Management Path
-Management-daemon traffic is not controlled by an option. Instead, it is
+Management-daemon traffic is not controlled by an option. Instead, it is
controlled by the presence of a file on each machine:
- /var/lib/glusterd/secure-access
+ /var/lib/glusterd/secure-access
Creating this file will cause glusterd connections made from that machine to use
-TLS. Note that even clients must do this to communicate with a remote glusterd
+TLS. Note that even clients must do this to communicate with a remote glusterd
while mounting, but not thereafter.
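+
+For example, on each server and client (a sketch; on servers running systemd,
+glusterd typically needs a restart to pick up the change, while pure clients
+have no glusterd to restart):
+
+    touch /var/lib/glusterd/secure-access
+    systemctl restart glusterd
+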
## Additional Options
@@ -182,22 +181,22 @@ internals.
The first option allows the user to set the certificate depth, as mentioned
above.
- gluster volume set MYVOLUME ssl.certificate-depth 2
+ gluster volume set MYVOLUME ssl.certificate-depth 2
Here, we're setting our certificate depth to two, as in the introductory
-example. By default this value is zero, meaning that only certificates which
+example. By default this value is zero, meaning that only certificates which
are directly specified in the local CA file will be accepted (i.e. no signed
certificates at all).
The second option allows the user to specify the set of allowed TLS ciphers.
- gluster volume set MYVOLUME ssl.cipher-list 'HIGH:!SSLv2'
+ gluster volume set MYVOLUME ssl.cipher-list 'HIGH:!SSLv2'
Cipher lists are negotiated between the two parties to a TLS connection so
-that both sides' security needs are satisfied. In this example, we're setting
+that both sides' security needs are satisfied. In this example, we're setting
the initial cipher list to HIGH, representing ciphers that the cryptography
-community still believes to be unbroken. We are also explicitly disallowing
-ciphers specific to SSL version 2. The default is based on this example but
+community still believes to be unbroken. We are also explicitly disallowing
+ciphers specific to SSL version 2. The default is based on this example but
also excludes CBC-based cipher modes to provide extra mitigation against the
[POODLE][poo] attack.
diff --git a/docs/Administrator-Guide/Setting-Up-Clients.md b/docs/Administrator-Guide/Setting-Up-Clients.md
index 88b10b4..53b46e4 100644
--- a/docs/Administrator-Guide/Setting-Up-Clients.md
+++ b/docs/Administrator-Guide/Setting-Up-Clients.md
@@ -31,12 +31,12 @@ the required modules as follows:
1. Add the FUSE loadable kernel module (LKM) to the Linux kernel:
- `# modprobe fuse`
+ modprobe fuse
2. Verify that the FUSE module is loaded:
- `# dmesg | grep -i fuse `
- `fuse init (API version 7.13)`
+ # dmesg | grep -i fuse
+ fuse init (API version 7.13)
### Installing on Red Hat Package Manager (RPM) Distributions
@@ -45,7 +45,7 @@ To install Gluster Native Client on RPM distribution-based systems
1. Install required prerequisites on the client using the following
command:
- `$ sudo yum -y install openssh-server wget fuse fuse-libs openib libibverbs`
+ sudo yum -y install openssh-server wget fuse fuse-libs openib libibverbs
2. Ensure that TCP and UDP ports 24007 and 24008 are open on all
Gluster servers. Apart from these ports, you need to open one port
@@ -64,13 +64,12 @@ To install Gluster Native Client on RPM distribution-based systems
into effect.
You can use the following chains with iptables:
-~~~
- `$ sudo iptables -A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 24007:24008 -j ACCEPT `
- `$ sudo iptables -A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 49152:49156 -j ACCEPT`
-~~~
- > **Note**
- >
+ sudo iptables -A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 24007:24008 -j ACCEPT
+
+ sudo iptables -A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 49152:49156 -j ACCEPT
+
+ > **Note**
> If you already have iptable chains, make sure that the above
> ACCEPT rules precede the DROP rules. This can be achieved by
> providing a lower rule number than the DROP rule.
@@ -84,15 +83,15 @@ To install Gluster Native Client on RPM distribution-based systems
You can download the software at [GlusterFS download page][1].
-4. Install Gluster Native Client on the client.
-
- **Note**
+4. Install Gluster Native Client on the client.
+
+ **Note**
The package versions listed in the example below may not be the latest release. Please refer to the download page to ensure that you have the recently released packages.
-~~~
- `$ sudo rpm -i glusterfs-3.8.5-1.x86_64`
- `$ sudo rpm -i glusterfs-fuse-3.8.5-1.x86_64`
- `$ sudo rpm -i glusterfs-rdma-3.8.5-1.x86_64`
-~~~
+
+ sudo rpm -i glusterfs-3.8.5-1.x86_64
+ sudo rpm -i glusterfs-fuse-3.8.5-1.x86_64
+ sudo rpm -i glusterfs-rdma-3.8.5-1.x86_64
+
> **Note:**
> The RDMA module is only required when using Infiniband.
@@ -102,7 +101,7 @@ To install Gluster Native Client on Debian-based distributions
1. Install OpenSSH Server on each client using the following command:
- `$ sudo apt-get install openssh-server vim wget`
+ sudo apt-get install openssh-server vim wget
2. Download the latest GlusterFS .deb file and checksum to each client.
@@ -112,14 +111,14 @@ To install Gluster Native Client on Debian-based distributions
and compare it against the checksum for that file in the md5sum
file.
- `$ md5sum GlusterFS_DEB_file.deb `
+ md5sum GlusterFS_DEB_file.deb
The md5sum of the packages is available at: [GlusterFS download page][2]
4. Uninstall GlusterFS v3.1 (or an earlier version) from the client
using the following command:
- `$ sudo dpkg -r glusterfs `
+ sudo dpkg -r glusterfs
(Optional) Run `sudo dpkg --purge glusterfs` to purge the
configuration files.
@@ -127,11 +126,11 @@ To install Gluster Native Client on Debian-based distributions
5. Install Gluster Native Client on the client using the following
command:
- `$ sudo dpkg -i GlusterFS_DEB_file `
+ sudo dpkg -i GlusterFS_DEB_file
For example:
- `$ sudo dpkg -i glusterfs-3.8.x.deb `
+ sudo dpkg -i glusterfs-3.8.x.deb
6. Ensure that TCP and UDP ports 24007 and 24008 are open on all
Gluster servers. Apart from these ports, you need to open one port
@@ -151,12 +150,11 @@ To install Gluster Native Client on Debian-based distributions
You can use the following chains with iptables:
-~~~
- `$ sudo iptables -A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 24007:24008 -j ACCEPT `
- `$ sudo iptables -A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 49152:49156 -j ACCEPT`
-~~~
-> **Note**
->
+ sudo iptables -A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 24007:24008 -j ACCEPT
+
+ sudo iptables -A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 49152:49156 -j ACCEPT
+
+> **Note**
> If you already have iptable chains, make sure that the above
> ACCEPT rules precede the DROP rules. This can be achieved by
> providing a lower rule number than the DROP rule.
@@ -167,10 +165,8 @@ To build and install Gluster Native Client from the source code
1. Create a new directory using the following commands:
-~~~
- `# mkdir glusterfs `
- `# cd glusterfs`
-~~~
+ mkdir glusterfs
+ cd glusterfs
2. Download the source code.
@@ -178,11 +174,11 @@ To build and install Gluster Native Client from the source code
3. Extract the source code using the following command:
- `# tar -xvzf SOURCE-FILE `
+ tar -xvzf SOURCE-FILE
4. Run the configuration utility using the following command:
- `# ./configure `
+ $ ./configure
GlusterFS configure summary
===========================
@@ -198,26 +194,24 @@ To build and install Gluster Native Client from the source code
5. Build the Gluster Native Client software using the following
commands:
-~~~
- `# make `
- `# make install`
-~~~
+
+ make
+ make install
6. Verify that the correct version of Gluster Native Client is
installed, using the following command:
- `# glusterfs --version`
+ glusterfs --version
## Mounting Volumes
After installing the Gluster Native Client, you need to mount Gluster
volumes to access data. There are two methods you can choose:
-- [Manually Mounting Volumes](#manual-mount)
-- [Automatically Mounting Volumes](#auto-mount)
+- [Manually Mounting Volumes](#manual-mount)
+- [Automatically Mounting Volumes](#auto-mount)
-> **Note**
->
+> **Note**
> Server names selected during creation of Volumes should be resolvable
> in the client machine. You can use appropriate /etc/hosts entries or
> DNS server to resolve server names to IP addresses.
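+
+For example, a client-side /etc/hosts sketch (the addresses and names below are
+placeholders) could look like:
+
+    192.168.1.101  server1
+    192.168.1.102  server2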
@@ -226,26 +220,25 @@ volumes to access data. There are two methods you can choose:
### Manually Mounting Volumes
-- To mount a volume, use the following command:
+- To mount a volume, use the following command:
- `# mount -t glusterfs HOSTNAME-OR-IPADDRESS:/VOLNAME MOUNTDIR`
+ mount -t glusterfs HOSTNAME-OR-IPADDRESS:/VOLNAME MOUNTDIR
- For example:
+ For example:
- `# mount -t glusterfs server1:/test-volume /mnt/glusterfs`
+ mount -t glusterfs server1:/test-volume /mnt/glusterfs
- > **Note**
- >
- > The server specified in the mount command is only used to fetch
- > the gluster configuration volfile describing the volume name.
- > Subsequently, the client will communicate directly with the
- > servers mentioned in the volfile (which might not even include the
- > one used for mount).
- >
- > If you see a usage message like "Usage: mount.glusterfs", mount
- > usually requires you to create a directory to be used as the mount
- > point. Run "mkdir /mnt/glusterfs" before you attempt to run the
- > mount command listed above.
+ > **Note**
+ > The server specified in the mount command is only used to fetch
+ > the gluster configuration volfile describing the volume name.
+ > Subsequently, the client will communicate directly with the
+ > servers mentioned in the volfile (which might not even include the
+ > one used for mount).
+ >
+ > If you see a usage message like "Usage: mount.glusterfs", mount
+ > usually requires you to create a directory to be used as the mount
+ > point. Run "mkdir /mnt/glusterfs" before you attempt to run the
+ > mount command listed above.
**Mounting Options**
@@ -253,7 +246,7 @@ You can specify the following options when using the
`mount -t glusterfs` command. Note that you need to separate all options
with commas.
-~~~
+```text
backupvolfile-server=server-name
volfile-max-fetch-attempts=number of attempts
@@ -268,11 +261,11 @@ direct-io-mode=[enable|disable]
use-readdirp=[yes|no]
-~~~
+```
For example:
-`# mount -t glusterfs -o backupvolfile-server=volfile_server2,use-readdirp=no,volfile-max-fetch-attempts=2,log-level=WARNING,log-file=/var/log/gluster.log server1:/test-volume /mnt/glusterfs`
+`mount -t glusterfs -o backupvolfile-server=volfile_server2,use-readdirp=no,volfile-max-fetch-attempts=2,log-level=WARNING,log-file=/var/log/gluster.log server1:/test-volume /mnt/glusterfs`
If the `backupvolfile-server` option is added while mounting the fuse client,
when the first volfile server fails, then the server specified in
@@ -288,6 +281,7 @@ If `use-readdirp` is set to ON, it forces the use of readdirp
mode in fuse kernel module
+
### Automatically Mounting Volumes
You can configure your system to automatically mount the Gluster volume
@@ -298,21 +292,21 @@ gluster configuration volfile describing the volume name. Subsequently,
the client will communicate directly with the servers mentioned in the
volfile (which might not even include the one used for mount).
-- To mount a volume, edit the /etc/fstab file and add the following
- line:
+- To mount a volume, edit the /etc/fstab file and add the following
+ line:
- `HOSTNAME-OR-IPADDRESS:/VOLNAME MOUNTDIR glusterfs defaults,_netdev 0 0 `
+ `HOSTNAME-OR-IPADDRESS:/VOLNAME MOUNTDIR glusterfs defaults,_netdev 0 0 `
- For example:
+ For example:
- `server1:/test-volume /mnt/glusterfs glusterfs defaults,_netdev 0 0`
+ `server1:/test-volume /mnt/glusterfs glusterfs defaults,_netdev 0 0`
**Mounting Options**
You can specify the following options when updating the /etc/fstab file.
Note that you need to separate all options with commas.
-~~~
+```text
log-level=loglevel
log-file=logfile
@@ -322,7 +316,7 @@ transport=transport-type
direct-io-mode=[enable|disable]
use-readdirp=no
-~~~
+```
For example:
@@ -332,40 +326,41 @@ For example:
To test mounted volumes
-- Use the following command:
+- Use the following command:
- `# mount `
+ `# mount `
- If the gluster volume was successfully mounted, the output of the
- mount command on the client will be similar to this example:
+ If the gluster volume was successfully mounted, the output of the
+ mount command on the client will be similar to this example:
- `server1:/test-volume on /mnt/glusterfs type fuse.glusterfs (rw,allow_other,default_permissions,max_read=131072`
+ `server1:/test-volume on /mnt/glusterfs type fuse.glusterfs (rw,allow_other,default_permissions,max_read=131072)`
-- Use the following command:
+- Use the following command:
- `# df`
+ `# df`
- The output of df command on the client will display the aggregated
- storage space from all the bricks in a volume similar to this
- example:
+ The output of df command on the client will display the aggregated
+ storage space from all the bricks in a volume similar to this
+ example:
- # df -h /mnt/glusterfs
- Filesystem Size Used Avail Use% Mounted on
- server1:/test-volume 28T 22T 5.4T 82% /mnt/glusterfs
+ # df -h /mnt/glusterfs
+ Filesystem Size Used Avail Use% Mounted on
+ server1:/test-volume 28T 22T 5.4T 82% /mnt/glusterfs
-- Change to the directory and list the contents by entering the
- following:
-~~~
+- Change to the directory and list the contents by entering the
+ following:
+
+```
`# cd MOUNTDIR `
`# ls`
-~~~
+```
-- For example,
+- For example,
-~~~
+```
`# cd /mnt/glusterfs `
`# ls`
-~~~
+```
# NFS
@@ -388,59 +383,59 @@ mounted successfully.
## Using NFS to Mount Volumes
-
You can use either of the following methods to mount Gluster volumes:
-- [Manually Mounting Volumes Using NFS](#manual-nfs)
-- [Automatically Mounting Volumes Using NFS](#auto-nfs)
+- [Manually Mounting Volumes Using NFS](#manual-nfs)
+- [Automatically Mounting Volumes Using NFS](#auto-nfs)
**Prerequisite**: Install nfs-common package on both servers and clients
(only for Debian-based distribution), using the following command:
-`$ sudo aptitude install nfs-common `
+ sudo aptitude install nfs-common
+
### Manually Mounting Volumes Using NFS
**To manually mount a Gluster volume using NFS**
-- To mount a volume, use the following command:
+- To mount a volume, use the following command:
- `# mount -t nfs -o vers=3 HOSTNAME-OR-IPADDRESS:/VOLNAME MOUNTDIR`
+ mount -t nfs -o vers=3 HOSTNAME-OR-IPADDRESS:/VOLNAME MOUNTDIR
- For example:
+ For example:
- `# mount -t nfs -o vers=3 server1:/test-volume /mnt/glusterfs`
+ mount -t nfs -o vers=3 server1:/test-volume /mnt/glusterfs
- > **Note**
- >
- > Gluster NFS server does not support UDP. If the NFS client you are
- > using defaults to connecting using UDP, the following message
- > appears:
- >
- > `requested NFS version or transport protocol is not supported`.
+ > **Note**
+ > Gluster NFS server does not support UDP. If the NFS client you are
+ > using defaults to connecting using UDP, the following message
+ > appears:
+ >
+ > `requested NFS version or transport protocol is not supported`.
- **To connect using TCP**
+ **To connect using TCP**
-- Add the following option to the mount command:
+- Add the following option to the mount command:
- `-o mountproto=tcp `
+ `-o mountproto=tcp `
- For example:
+ For example:
- `# mount -o mountproto=tcp -t nfs server1:/test-volume /mnt/glusterfs`
+ mount -o mountproto=tcp -t nfs server1:/test-volume /mnt/glusterfs
**To mount Gluster NFS server from a Solaris client**
-- Use the following command:
+- Use the following command:
- `# mount -o proto=tcp,vers=3 nfs://HOSTNAME-OR-IPADDRESS:38467/VOLNAME MOUNTDIR`
+ mount -o proto=tcp,vers=3 nfs://HOSTNAME-OR-IPADDRESS:38467/VOLNAME MOUNTDIR
- For example:
+ For example:
- ` # mount -o proto=tcp,vers=3 nfs://server1:38467/test-volume /mnt/glusterfs`
+ mount -o proto=tcp,vers=3 nfs://server1:38467/test-volume /mnt/glusterfs
+
### Automatically Mounting Volumes Using NFS
You can configure your system to automatically mount Gluster volumes
@@ -448,32 +443,31 @@ using NFS each time the system starts.
**To automatically mount a Gluster volume using NFS**
-- To mount a volume, edit the /etc/fstab file and add the following
- line:
+- To mount a volume, edit the /etc/fstab file and add the following
+ line:
- `HOSTNAME-OR-IPADDRESS:/VOLNAME MOUNTDIR nfs defaults,_netdev,vers=3 0 0`
+ HOSTNAME-OR-IPADDRESS:/VOLNAME MOUNTDIR nfs defaults,_netdev,vers=3 0 0
- For example,
+ For example,
- `server1:/test-volume /mnt/glusterfs nfs defaults,_netdev,vers=3 0 0`
+ `server1:/test-volume /mnt/glusterfs nfs defaults,_netdev,vers=3 0 0`
- > **Note**
- >
- > Gluster NFS server does not support UDP. If the NFS client you are
- > using defaults to connecting using UDP, the following message
- > appears:
- >
- > `requested NFS version or transport protocol is not supported.`
+ > **Note**
+ > Gluster NFS server does not support UDP. If the NFS client you are
+ > using defaults to connecting using UDP, the following message
+ > appears:
+ >
+ > `requested NFS version or transport protocol is not supported.`
- To connect using TCP
+ To connect using TCP
-- Add the following entry in /etc/fstab file :
+- Add the following entry in the /etc/fstab file:
- `HOSTNAME-OR-IPADDRESS:/VOLNAME MOUNTDIR nfs defaults,_netdev,mountproto=tcp 0 0`
+ HOSTNAME-OR-IPADDRESS:/VOLNAME MOUNTDIR nfs defaults,_netdev,mountproto=tcp 0 0
- For example,
+ For example,
- `server1:/test-volume /mnt/glusterfs nfs defaults,_netdev,mountproto=tcp 0 0`
+ `server1:/test-volume /mnt/glusterfs nfs defaults,_netdev,mountproto=tcp 0 0`
**To automount NFS mounts**
@@ -488,31 +482,31 @@ You can confirm that Gluster directories are mounting successfully.
**To test mounted volumes**
-- Use the mount command by entering the following:
+- Use the mount command by entering the following:
- `# mount`
+ `# mount`
- For example, the output of the mount command on the client will
- display an entry like the following:
+ For example, the output of the mount command on the client will
+ display an entry like the following:
- `server1:/test-volume on /mnt/glusterfs type nfs (rw,vers=3,addr=server1)`
+ `server1:/test-volume on /mnt/glusterfs type nfs (rw,vers=3,addr=server1)`
-- Use the df command by entering the following:
+- Use the df command by entering the following:
- `# df`
+ `# df`
- For example, the output of df command on the client will display the
- aggregated storage space from all the bricks in a volume.
+ For example, the output of df command on the client will display the
+ aggregated storage space from all the bricks in a volume.
- # df -h /mnt/glusterfs
- Filesystem Size Used Avail Use% Mounted on
- server1:/test-volume 28T 22T 5.4T 82% /mnt/glusterfs
+ # df -h /mnt/glusterfs
+ Filesystem Size Used Avail Use% Mounted on
+ server1:/test-volume 28T 22T 5.4T 82% /mnt/glusterfs
-- Change to the directory and list the contents by entering the
- following:
+- Change to the directory and list the contents by entering the
+ following:
- `# cd MOUNTDIR`
- `# ls`
+ `# cd MOUNTDIR`
+ `# ls`
# CIFS
@@ -535,14 +529,15 @@ verify that the volume has mounted successfully.
You can use either of the following methods to mount Gluster volumes:
-- [Exporting Gluster Volumes Through Samba](#export-samba)
-- [Manually Mounting Volumes Using CIFS](#cifs-manual)
-- [Automatically Mounting Volumes Using CIFS](#cifs-auto)
+- [Exporting Gluster Volumes Through Samba](#export-samba)
+- [Manually Mounting Volumes Using CIFS](#cifs-manual)
+- [Automatically Mounting Volumes Using CIFS](#cifs-auto)
You can also use Samba for exporting Gluster Volumes through CIFS
protocol.
+
### Exporting Gluster Volumes Through Samba
We recommend you to use Samba for exporting Gluster volumes through the
@@ -560,7 +555,7 @@ CIFS protocol.
smb.conf file in an editor and add the following lines for a simple
configuration:
-~~~
+```
[glustertest]
comment = For testing a Gluster volume exported through CIFS
@@ -570,14 +565,14 @@ CIFS protocol.
read only = no
guest ok = yes
-~~~
+```
Save the changes and start the smb service using your system's init
scripts (/etc/init.d/smb [re]start). The above steps are needed for doing
-multiple mount. If you want only samba mount then in your smb.conf you
+multiple mounts. If you want only a samba mount, then in your smb.conf you
need to add
-~~~
+```
kernel share modes = no
kernel oplocks = no
map archive = no
@@ -585,8 +580,7 @@ need to add
map read only = no
map system = no
store dos attributes = yes
-~~~
-
+```
> **Note**
>
@@ -595,6 +589,7 @@ need to add
> configurations, see Samba documentation.
+
### Manually Mounting Volumes Using CIFS
You can manually mount Gluster volumes using CIFS on Microsoft
@@ -618,6 +613,7 @@ Alternatively, to manually mount a Gluster volume using CIFS by going to
**Start \> Run** and entering Network path manually.
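+
+For example, with the `[glustertest]` share from the smb.conf sample above and a
+Samba server named `sambaserver1` (a placeholder name), the network path entered
+would be:
+
+    \\sambaserver1\glustertest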
+
### Automatically Mounting Volumes Using CIFS
You can configure your system to automatically mount Gluster volumes
diff --git a/docs/Administrator-Guide/Split-brain-and-ways-to-deal-with-it.md b/docs/Administrator-Guide/Split-brain-and-ways-to-deal-with-it.md
index 601f8d2..10e02b5 100644
--- a/docs/Administrator-Guide/Split-brain-and-ways-to-deal-with-it.md
+++ b/docs/Administrator-Guide/Split-brain-and-ways-to-deal-with-it.md
@@ -1,59 +1,73 @@
# Split brain and the ways to deal with it
### Split brain:
+
Split brain is a situation where two or more replicated copies of a file become divergent. When a file is in split brain, there is an inconsistency in either the data or the metadata of the file amongst the bricks of a replica, and there is not enough information to authoritatively pick a copy as being pristine and heal the bad copies, despite all bricks being up and online. For a directory, there is also an entry split brain, where a file inside it can have a different gfid/file-type across the bricks of a replica. Split brain can happen mainly because of 2 reasons:
-1. Due to network disconnect:
-Where a client temporarily loses connection to the bricks.
- - There is a replica pair of 2 bricks, brick1 on server1 and brick2 on server2.
- - Client1 loses connection to brick2 and client2 loses connection to brick1 due to network split.
- - Writes from client1 goes to brick1 and from client2 goes to brick2, which is nothing but split-brain.
-2. Gluster brick processes going down or returning error:
- - Server1 is down and server2 is up: Writes happen on server 2.
- - Server1 comes up, server2 goes down (Heal not happened / data on server 2 is not replicated on server1): Writes happen on server1.
- - Server2 comes up: Both server1 and server2 has data independent of each other.
+
+- Due to network disconnect, where a client temporarily loses connection to the bricks:
+
+    1. There is a replica pair of 2 bricks, brick1 on server1 and brick2 on server2.
+    2. Client1 loses connection to brick2 and client2 loses connection to brick1 due to a network split.
+    3. Writes from client1 go to brick1 and writes from client2 go to brick2, which is nothing but split-brain.
+
+- Gluster brick processes going down or returning an error:
+
+    1. Server1 is down and server2 is up: writes happen on server2.
+    2. Server1 comes up, server2 goes down (heal has not happened / data on server2 is not replicated on server1): writes happen on server1.
+    3. Server2 comes up: both server1 and server2 have data independent of each other.
If we use the replica 2 volume, it is not possible to prevent split-brain without losing availability.
### Ways to deal with split brain:
+
In glusterfs there are ways to resolve split brain. You can see the detailed description of how to resolve a split-brain [here](../Troubleshooting/resolving-splitbrain.md). Moreover, there are ways to reduce the chances of ending up in split-brain situations. They are:
+
1. Replica 3 volume
2. Arbiter volume
Both of these use the client-quorum option of glusterfs to avoid split-brain situations.
### Client quorum:
+
This is a feature implemented in Automatic File Replication (AFR here on) module, to prevent split-brains in the I/O path for replicate/distributed-replicate volumes. By default, if the client-quorum is not met for a particular replica subvol, it becomes read-only. The other subvols (in a dist-rep volume) will still have R/W access. [Here](arbiter-volumes-and-quorum.md#client-quorum) you can see more details about client-quorum.
#### Client quorum in replica 2 volumes:
+
In a replica 2 volume it is not possible to achieve high availability and consistency at the same time, without sacrificing tolerance to partition. If we set the client-quorum option to auto, then the first brick must always be up, irrespective of the status of the second brick. If only the second brick is up, the subvolume becomes read-only.
-If the quorum-type is set to fixed, and the quorum-count is set to 1, then we may end up in split brain.
- - Brick1 is up and brick2 is down. Quorum is met and write happens on brick1.
- - Brick1 goes down and brick2 comes up (No heal happened). Quorum is met, write happens on brick2.
- - Brick1 comes up. Quorum is met, but both the bricks have independent writes - split-brain.
+If the quorum-type is set to fixed, and the quorum-count is set to 1, then we may end up in split brain:
+
+- Brick1 is up and brick2 is down. Quorum is met and write happens on brick1.
+- Brick1 goes down and brick2 comes up (no heal happened). Quorum is met, write happens on brick2.
+- Brick1 comes up. Quorum is met, but both the bricks have independent writes - split-brain.
To avoid this we have to set the quorum-count to 2, which costs us availability. Even if we have one replica brick up and running, the quorum is not met and we end up seeing EROFS.
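+
+As a sketch of this trade-off, the relevant options are set like this (the
+volume name is a placeholder):
+
+```console
+gluster volume set <volname> cluster.quorum-type fixed
+gluster volume set <volname> cluster.quorum-count 2
+```
+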
### 1. Replica 3 volume:
+
When we create a replicated or distributed replicated volume with replica count 3, the cluster.quorum-type option is set to auto by default. That means at least 2 bricks should be up and running to satisfy the quorum and allow the writes. This is the recommended setting for a replica 3 volume and this should not be changed. Here is how it prevents files from ending up in split brain:
B1, B2, and B3 are the 3 bricks of a replica 3 volume.
+
1. B1 & B2 are up and B3 is down. Quorum is met and write happens on B1 & B2.
2. B3 comes up and B2 is down. Quorum is met and write happens on B1 & B3.
3. B2 comes up and B1 goes down. Quorum is met. But when a write request comes, AFR sees that B2 & B3 are blaming each other (B2 says that some writes are pending on B3 and B3 says that some writes are pending on B2), therefore the write is not allowed and is failed with EIO.
Command to create a replica 3 volume:
+
```sh
-$gluster volume create replica 3 host1:brick1 host2:brick2 host3:brick3
+gluster volume create <volname> replica 3 host1:brick1 host2:brick2 host3:brick3
```
### 2. Arbiter volume:
+
Arbiter offers the sweet spot between replica 2 and replica 3, where the user wants the split-brain protection offered by replica 3 but does not want to invest in 3x storage space. Arbiter is also a replica 3 volume where the third brick of the replica is automatically configured as an arbiter node. This means that the third brick stores only the file names and metadata, but not any data. This helps in avoiding split brain while providing the same level of consistency as a normal replica 3 volume.
Command to create an arbiter volume:
+
```sh
-$gluster volume create replica 3 arbiter 1 host1:brick1 host2:brick2 host3:brick3
+gluster volume create <volname> replica 3 arbiter 1 host1:brick1 host2:brick2 host3:brick3
```
-The only difference in the command is, we need to add one more keyword ``` arbiter 1 ``` after the replica count. Since it is also a replica 3 volume, the cluster.quorum-type option is set to auto by default and at least 2 bricks should be up to satisfy the quorum and allow writes.
+The only difference in the command is that we need to add one more keyword `arbiter 1` after the replica count. Since it is also a replica 3 volume, the cluster.quorum-type option is set to auto by default and at least 2 bricks should be up to satisfy the quorum and allow writes.
Since the arbiter brick has only name and metadata of the files, there are some more checks to guarantee consistency. Arbiter works as follows:
1. Clients take full file locks while writing (replica 3 takes range locks).
@@ -65,6 +79,7 @@ Since the arbiter brick has only name and metadata of the files, there are some
You can find more details on arbiter [here](arbiter-volumes-and-quorum.md).
### Differences between replica 3 and arbiter volumes:
+
1. In case of a replica 3 volume, we store the entire file in all the bricks and it is recommended to have bricks of the same size. But in case of arbiter, since we do not store data, the size of the arbiter brick is comparatively smaller than that of the other bricks.
2. Arbiter is a state between a replica 2 and a replica 3 volume. If only the arbiter and one of the other bricks are up, and the arbiter brick blames the other brick, then we cannot proceed with the FOPs.
-4. Replica 3 gives high availability compared to arbiter, because unlike in arbiter, replica 3 has a full copy of the data in all 3 bricks.
+3. Replica 3 gives higher availability compared to arbiter, because unlike arbiter, replica 3 has a full copy of the data in all 3 bricks.
diff --git a/docs/Administrator-Guide/Start-Stop-Daemon.md b/docs/Administrator-Guide/Start-Stop-Daemon.md
index 2fbf712..85c796c 100644
--- a/docs/Administrator-Guide/Start-Stop-Daemon.md
+++ b/docs/Administrator-Guide/Start-Stop-Daemon.md
@@ -19,53 +19,47 @@ following ways:
## Distributions with systemd
+
### Starting and stopping glusterd manually
+
- To start `glusterd` manually:
-```console
-systemctl start glusterd
-```
+ systemctl start glusterd
- To stop `glusterd` manually:
-```console
-systemctl stop glusterd
-```
+ systemctl stop glusterd
+
### Starting glusterd automatically
+
- To enable the glusterd service and start it if stopped:
-```console
-systemctl enable --now glusterd
-```
+ systemctl enable --now glusterd
- To disable the glusterd service and stop it if started:
-```console
-systemctl disable --now glusterd
-```
+ systemctl disable --now glusterd
## Distributions without systemd
+
### Starting and stopping glusterd manually
This section describes how to start and stop glusterd manually
- To start glusterd manually, enter the following command:
-```console
-# /etc/init.d/glusterd start
-```
+ /etc/init.d/glusterd start
-- To stop glusterd manually, enter the following command:
+- To stop glusterd manually, enter the following command:
-```console
-# /etc/init.d/glusterd stop
-```
+ /etc/init.d/glusterd stop
+
### Starting glusterd Automatically
This section describes how to configure the system to automatically
@@ -78,7 +72,7 @@ service every time the system boots, enter the following from the
command line:
```console
-# chkconfig glusterd on
+chkconfig glusterd on
```
#### Debian and derivatives like Ubuntu
@@ -88,7 +82,7 @@ service every time the system boots, enter the following from the
command line:
```console
-# update-rc.d glusterd defaults
+update-rc.d glusterd defaults
```
#### Systems Other than Red Hat and Debian
@@ -98,5 +92,5 @@ the glusterd service every time the system boots, enter the following
entry to the */etc/rc.local* file:
```console
-# echo "glusterd" >> /etc/rc.local
+echo "glusterd" >> /etc/rc.local
```
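On systems without systemd, a simple process check is often enough to confirm that the daemon came up after boot (an illustrative example, not specific to any distribution):

```console
pgrep -l glusterd
```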
diff --git a/docs/Administrator-Guide/Storage-Pools.md b/docs/Administrator-Guide/Storage-Pools.md
index d953384..e3a9644 100644
--- a/docs/Administrator-Guide/Storage-Pools.md
+++ b/docs/Administrator-Guide/Storage-Pools.md
@@ -1,6 +1,5 @@
# Managing Trusted Storage Pools
-
### Overview
A trusted storage pool (TSP) is a trusted network of storage servers. Before you can configure a
@@ -11,19 +10,19 @@ The servers in a TSP are peers of each other.
After installing Gluster on your servers and before creating a trusted storage pool,
each server belongs to a storage pool consisting of only that server.
-- [Adding Servers](#adding-servers)
-- [Listing Servers](#listing-servers)
-- [Viewing Peer Status](#peer-status)
-- [Removing Servers](#removing-servers)
-
-
+- [Managing Trusted Storage Pools](#managing-trusted-storage-pools)
+ - [Overview](#overview)
+ - [Adding Servers](#adding-servers)
+ - [Listing Servers](#listing-servers)
+ - [Viewing Peer Status](#viewing-peer-status)
+ - [Removing Servers](#removing-servers)
**Before you start**:
- The servers used to create the storage pool must be resolvable by hostname.
- The glusterd daemon must be running on all storage servers that you
-want to add to the storage pool. See [Managing the glusterd Service](./Start-Stop-Daemon.md) for details.
+ want to add to the storage pool. See [Managing the glusterd Service](./Start-Stop-Daemon.md) for details.
- The firewall on the servers must be configured to allow access to port 24007.
@@ -31,6 +30,7 @@ The following commands were run on a TSP consisting of 3 servers - server1, serv
and server3.
+
### Adding Servers
To add a server to a TSP, peer probe it from a server already in the pool.
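When building a pool of several servers, the probes can be issued one after another from the same node. The loop below is only a sketch using the example hostnames from this page:

```console
for host in server2 server3 server4; do
    gluster peer probe "$host"
done
```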
@@ -59,9 +59,8 @@ Verify the peer status from the first server (server1):
Uuid: 3e0cabaa-9df7-4f66-8e5d-cbc348f29ff7
State: Peer in Cluster (Connected)
-
-
+
### Listing Servers
To list all nodes in the TSP:
@@ -73,9 +72,8 @@ To list all nodes in the TSP:
1e0ca3aa-9ef7-4f66-8f15-cbc348f29ff7 server3 Connected
3e0cabaa-9df7-4f66-8e5d-cbc348f29ff7 server4 Connected
-
-
+
### Viewing Peer Status
To view the status of the peers in the TSP:
@@ -95,9 +93,8 @@ To view the status of the peers in the TSP:
Uuid: 3e0cabaa-9df7-4f66-8e5d-cbc348f29ff7
State: Peer in Cluster (Connected)
-
-
+
### Removing Servers
To remove a server from the TSP, run the following command from another server in the pool:
@@ -109,7 +106,6 @@ For example, to remove server4 from the trusted storage pool:
server1# gluster peer detach server4
Detach successful
-
Verify the peer status:
server1# gluster peer status
diff --git a/docs/Administrator-Guide/Thin-Arbiter-Volumes.md b/docs/Administrator-Guide/Thin-Arbiter-Volumes.md
index 31a145a..f80de31 100644
--- a/docs/Administrator-Guide/Thin-Arbiter-Volumes.md
+++ b/docs/Administrator-Guide/Thin-Arbiter-Volumes.md
all files as a whole. So, even for a different file, if the write fails on the
other data brick but succeeds on this 'bad' brick, we will return failure for
the write.
-
+- [Thin Arbiter volumes in gluster](#thin-arbiter-volumes-in-gluster)
- [Why Thin Arbiter?](#why-thin-arbiter)
- [Setting UP Thin Arbiter Volume](#setting-up-thin-arbiter-volume)
- [How Thin Arbiter works](#how-thin-arbiter-works)
-
# Why Thin Arbiter?
+
This is a solution for handling stretch cluster kind of workload,
but it can be used for regular workloads as well in case users are
satisfied with this kind of quorum in comparison to arbiter/3-way-replication.
@@ -31,28 +31,34 @@ thin-arbiter only in the case of first failure until heal completes.
# Setting UP Thin Arbiter Volume
The command to run thin-arbiter process on node:
+
+```console
+/usr/local/sbin/glusterfsd -N --volfile-id ta-vol -f /var/lib/glusterd/vols/thin-arbiter.vol --brick-port 24007 --xlator-option ta-vol-server.transport.socket.listen-port=24007
```
-#/usr/local/sbin/glusterfsd -N --volfile-id ta-vol -f /var/lib/glusterd/vols/thin-arbiter.vol --brick-port 24007 --xlator-option ta-vol-server.transport.socket.listen-port=24007
-```
+
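To verify that the thin-arbiter process is up and listening on the chosen port (24007 in the command above), a basic socket check can be used; this is an illustrative example and `ss` may need to be replaced by `netstat` on older systems:

```console
ss -tln | grep 24007
```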
Creating a thin arbiter replica 2 volume:
+
+```console
+glustercli volume create <volname> --replica 2 <host1>:<brick1> <host2>:<brick2> --thin-arbiter <thin-arbiter-host>:<thin-arbiter-brick-path>
```
-#glustercli volume create --replica 2 : : --thin-arbiter :
-```
+
For example:
-```
+
+```console
glustercli volume create testvol --replica 2 server{1..2}:/bricks/brick-{1..2} --thin-arbiter server-3:/bricks/brick_ta --force
volume create: testvol: success: please start the volume to access data
```
# How Thin Arbiter works
+
There will be only one process running on thin arbiter node which will be
used to update replica id file for all replica pairs across all volumes.
Replica id file contains the information of good and bad data bricks in the
form of xattrs. Replica pairs will use its respective replica-id file that
is going to be created during mount.
-1) Read Transactions:
-Reads are allowed when quorum is met. i.e.
+1. Read Transactions:
+ Reads are allowed when quorum is met. i.e.
- When all data bricks and thin arbiter are up: Perform lookup on data bricks to figure out good/bad bricks and
serve content from the good brick.
@@ -65,7 +71,7 @@ Reads are allowed when quorum is met. i.e.
done on the data brick to check if the file is really healthy or not. If the file is good, data will be served from
this brick else an EIO error would be returned to user.
-2) Write transactions:
- Thin arbiter doesn’t participate in I/O, transaction will choose to wind operations on thin-arbiter brick to
- make sure the necessary metadata is kept up-to-date in case of failures. Operation failure will lead to
- updating the replica-id file on thin-arbiter with source/sink information in the xattrs just how it happens in AFR.
+2. Write transactions:
+   Thin arbiter doesn’t participate in I/O; the transaction will choose to wind operations on the thin-arbiter brick to
+   make sure the necessary metadata is kept up-to-date in case of failures. An operation failure will lead to
+   updating the replica-id file on thin-arbiter with source/sink information in the xattrs, just as it happens in AFR.
diff --git a/docs/Administrator-Guide/Trash.md b/docs/Administrator-Guide/Trash.md
index 6dc6807..7ca102f 100644
--- a/docs/Administrator-Guide/Trash.md
+++ b/docs/Administrator-Guide/Trash.md
@@ -1,80 +1,85 @@
-Trash Translator
-================
+# Trash Translator
+
Trash translator will allow users to access deleted or truncated files. Every brick will maintain a hidden .trashcan directory, which will be used to store the files deleted or truncated from the respective brick. The aggregate of all those .trashcan directories can be accessed from the mount point. To avoid name collisions, a timestamp is appended to the original file name while it is being moved to the trash directory.
## Implications and Usage
+
Apart from the primary use-case of accessing files deleted or truncated by the user, the trash translator can be helpful for internal operations such as self-heal and rebalance. During self-heal and rebalance it is possible to lose crucial data. In those circumstances, the trash translator can assist in the recovery of the lost data. The trash translator is designed to intercept unlink, truncate and ftruncate fops, store a copy of the current file in the trash directory, and then perform the fop on the original file. For the internal operations, the files are stored under the 'internal_op' folder inside the trash directory.
## Volume Options
-* ***`gluster volume set features.trash `***
+- **_`gluster volume set <volname> features.trash <on/off>`_**
- This command can be used to enable a trash translator in a volume. If set to on, a trash directory will be created in every brick inside the volume during the volume start command. By default, a translator is loaded during volume start but remains non-functional. Disabling trash with the help of this option will not remove the trash directory or even its contents from the volume.
+ This command can be used to enable a trash translator in a volume. If set to on, a trash directory will be created in every brick inside the volume during the volume start command. By default, a translator is loaded during volume start but remains non-functional. Disabling trash with the help of this option will not remove the trash directory or even its contents from the volume.
-* ***`gluster volume set features.trash-dir `***
+- **_`gluster volume set <volname> features.trash-dir <dirname>`_**
- This command is used to reconfigure the trash directory to a user-specified name. The argument is a valid directory name. The directory will be created inside every brick under this name. If not specified by the user, the trash translator will create the trash directory with the default name “.trashcan”. This can be used only when the trash-translator is on.
+ This command is used to reconfigure the trash directory to a user-specified name. The argument is a valid directory name. The directory will be created inside every brick under this name. If not specified by the user, the trash translator will create the trash directory with the default name “.trashcan”. This can be used only when the trash-translator is on.
-* ***`gluster volume set features.trash-max-filesize `***
+- **_`gluster volume set <volname> features.trash-max-filesize <size>`_**
- This command can be used to filter files entering the trash directory based on their size. Files above trash_max_filesize are deleted/truncated directly. Value for size may be followed by multiplicative suffixes as KB(=1024 bytes), MB(=1024\*1024 bytes) ,and GB(=1024\*1024\*1024 bytes). The default size is set to 5MB.
+  This command can be used to filter files entering the trash directory based on their size. Files above trash_max_filesize are deleted/truncated directly. Value for size may be followed by multiplicative suffixes such as KB (=1024 bytes), MB (=1024\*1024 bytes), and GB (=1024\*1024\*1024 bytes). The default size is set to 5MB.
-* ***`gluster volume set features.trash-eliminate-path [ , , . . . ]`***
+- **_`gluster volume set <volname> features.trash-eliminate-path <path1> [ , <path2> , . . . ]`_**
- This command can be used to set the eliminate pattern for the trash translator. Files residing under this pattern will not be moved to the trash directory during deletion/truncation. The path must be a valid one present in the volume.
+ This command can be used to set the eliminate pattern for the trash translator. Files residing under this pattern will not be moved to the trash directory during deletion/truncation. The path must be a valid one present in the volume.
-* ***`gluster volume set features.trash-internal-op `***
+- **_`gluster volume set <volname> features.trash-internal-op <on/off>`_**
- This command can be used to enable trash for internal operations like self-heal and re-balance. By default set to off.
+ This command can be used to enable trash for internal operations like self-heal and re-balance. By default set to off.
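As a hedged illustration of combining these options (the volume name `test` and the 200MB limit are only examples):

```console
gluster volume set test features.trash on
gluster volume set test features.trash-max-filesize 200MB
gluster volume set test features.trash-internal-op on
```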
## Sample usage
+
The following steps illustrate a simple scenario of deleting a file from a directory.
-1. Create a simple distributed volume and start it.
+1. Create a simple distributed volume and start it.
- # gluster volume create test rhs:/home/brick
- # gluster volume start test
+ gluster volume create test rhs:/home/brick
+ gluster volume start test
-2. Enable trash translator
+2. Enable trash translator
- # gluster volume set test features.trash on
+ gluster volume set test features.trash on
-3. Mount glusterfs volume via native client as follows.
+3. Mount glusterfs volume via native client as follows.
- # mount -t glusterfs rhs:test /mnt
+ mount -t glusterfs rhs:test /mnt
-4. Create a directory and file in the mount.
+4. Create a directory and file in the mount.
- # mkdir mnt/dir
- # echo abc > mnt/dir/file
+ mkdir mnt/dir
+ echo abc > mnt/dir/file
-5. Delete the file from the mount.
+5. Delete the file from the mount.
- # rm mnt/dir/file -rf
+ rm mnt/dir/file -rf
+6. Check inside the trash directory.
+6. Checkout inside the trash directory.
- # ls mnt/.trashcan
+ ls mnt/.trashcan
We can find the deleted file inside the trash directory with a timestamp appended to its filename.
For example,
```console
-# mount -t glusterfs rh-host:/test /mnt/test
-# mkdir /mnt/test/abc
-# touch /mnt/test/abc/file
-# rm /mnt/test/abc/file
-remove regular empty file ‘/mnt/test/abc/file’? y
-# ls /mnt/test/abc
-#
-# ls /mnt/test/.trashcan/abc/
-file2014-08-21_123400
+mount -t glusterfs rh-host:/test /mnt/test
+mkdir /mnt/test/abc
+touch /mnt/test/abc/file
+rm -f /mnt/test/abc/file
+
+ls /mnt/test/abc
+
+ls /mnt/test/.trashcan/abc/
```
+You will see `file2014-08-21_123400` as the output of the last `ls` command.
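To recover the file, its contents can simply be copied back out of the trash directory. The command below is an illustrative example based on the listing above:

```console
cp /mnt/test/.trashcan/abc/file2014-08-21_123400 /mnt/test/abc/file
```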
+
#### Points to be remembered
-* As soon as the volume is started, the trash directory will be created inside the volume and will be visible through the mount. Disabling the trash will not have any impact on its visibility from the mount.
-* Even though deletion of trash-directory is not permitted, currently residing trash contents will be removed on issuing delete on it and only an empty trash-directory exists.
+
+- As soon as the volume is started, the trash directory will be created inside the volume and will be visible through the mount. Disabling the trash will not have any impact on its visibility from the mount.
+- Even though deletion of the trash directory itself is not permitted, issuing a delete on it will remove the currently residing trash contents, leaving behind only an empty trash directory.
#### Known issue
+
Since the trash translator resides on the server side, higher translators like AFR and DHT are unaware of the rename and truncate operations being done by this translator, which eventually moves the files to the trash directory. Unless and until a complete-path-based lookup comes on the trashed files, those files may not be visible from the mount.
diff --git a/docs/Administrator-Guide/Tuning-Volume-Options.md b/docs/Administrator-Guide/Tuning-Volume-Options.md
index 6020f12..cbf0e99 100644
--- a/docs/Administrator-Guide/Tuning-Volume-Options.md
+++ b/docs/Administrator-Guide/Tuning-Volume-Options.md
@@ -1,4 +1,3 @@
-
You can tune volume options, as needed, while the cluster is online and
@@ -34,130 +33,130 @@ description and default value:
> The default options given here are subject to modification at any
> given time and may not be the same for all versions.
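Each option in the table below is applied with `gluster volume set` and can be read back with `gluster volume get`. A hedged example using one of the performance options (`<volname>` is a placeholder):

```console
gluster volume set <volname> performance.cache-size 64MB
gluster volume get <volname> performance.cache-size
```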
-Type | Option | Description | Default Value | Available Options
---- | --- | --- | --- | ---
- | auth.allow | IP addresses of the clients which should be allowed to access the volume. | \* (allow all) | Valid IP address which includes wild card patterns including \*, such as 192.168.1.\*
- | auth.reject | IP addresses of the clients which should be denied to access the volume. | NONE (reject none) | Valid IP address which includes wild card patterns including \*, such as 192.168.2.\*
-Cluster | cluster.self-heal-window-size | Specifies the maximum number of blocks per file on which self-heal would happen simultaneously. | 1 | 0 - 1024 blocks
- | cluster.data-self-heal-algorithm | Specifies the type of self-heal. If you set the option as "full", the entire file is copied from source to destinations. If the option is set to "diff" the file blocks that are not in sync are copied to destinations. Reset uses a heuristic model. If the file does not exist on one of the subvolumes, or a zero-byte file exists (created by entry self-heal) the entire content has to be copied anyway, so there is no benefit from using the "diff" algorithm. If the file size is about the same as page size, the entire file can be read and written with a few operations, which will be faster than "diff" which has to read checksums and then read and write. | reset | full/diff/reset
- | cluster.min-free-disk | Specifies the percentage of disk space that must be kept free. Might be useful for non-uniform bricks | 10% | Percentage of required minimum free disk space
- | cluster.min-free-inodes | Specifies when system has only N% of inodes remaining, warnings starts to appear in log files | 10% | Percentage of required minimum free inodes
- | cluster.stripe-block-size | Specifies the size of the stripe unit that will be read from or written to. | 128 KB (for all files) | size in bytes
- | cluster.self-heal-daemon | Allows you to turn-off proactive self-heal on replicated | On | On/Off
- | cluster.ensure-durability | This option makes sure the data/metadata is durable across abrupt shutdown of the brick. | On | On/Off
- | cluster.lookup-unhashed | This option does a lookup through all the sub-volumes, in case a lookup didn’t return any result from the hashed subvolume. If set to OFF, it does not do a lookup on the remaining subvolumes. | on | auto, yes/no, enable/disable, 1/0, on/off
- | cluster.lookup-optimize | This option enables the optimization of -ve lookups, by not doing a lookup on non-hashed subvolumes for files, in case the hashed subvolume does not return any result. This option disregards the lookup-unhashed setting, when enabled. | on | on/off
- | cluster.randomize-hash-range-by-gfid | Allows to use gfid of directory to determine the subvolume from which hash ranges are allocated starting with 0. Note that we still use a directory/file’s name to determine the subvolume to which it hashes | off | on/off
- | cluster.rebal-throttle | Sets the maximum number of parallel file migrations allowed on a node during the rebalance operation. The default value is normal and allows 2 files to be migrated at a time. Lazy will allow only one file to be migrated at a time and aggressive will allow maxof[(((processing units) - 4) / 2), 4] | normal | lazy/normal/aggressive
- | cluster.background-self-heal-count | Specifies the number of per client self-heal jobs that can perform parallel heals in the background. | 8 | 0-256
- | cluster.heal-timeout | Time interval for checking the need to self-heal in self-heal-daemon | 600 | 5-(signed-int)
- | cluster.eager-lock | If eager-lock is off, locks release immediately after file operations complete, improving performance for some operations, but reducing access efficiency | on | on/off
- | cluster.quorum-type | If value is “fixed” only allow writes if quorum-count bricks are present. If value is “auto” only allow writes if more than half of bricks, or exactly half including the first brick, are present | none | none/auto/fixed
- | cluster.quorum-count | If quorum-type is “fixed” only allow writes if this many bricks are present. Other quorum types will OVERWRITE this value | null | 1-(signed-int)
- | cluster.heal-wait-queue-length | Specifies the number of heals that can be queued for the parallel background self heal jobs. | 128 | 0-10000
- | cluster.favorite-child-policy | Specifies which policy can be used to automatically resolve split-brains without user intervention. “size” picks the file with the biggest size as the source. “ctime” and “mtime” pick the file with the latest ctime and mtime respectively as the source. “majority” picks a file with identical mtime and size in more than half the number of bricks in the replica. | none | none/size/ctime/mtime/majority
- | cluster.use-anonymous-inode | Setting this option heals directory renames efficiently | no | no/yes
-Disperse | disperse.eager-lock | If eager-lock is on, the lock remains in place either until lock contention is detected, or for 1 second in order to check if there is another request for that file from the same client. If eager-lock is off, locks release immediately after file operations complete, improving performance for some operations, but reducing access efficiency. | on | on/off
- | disperse.other-eager-lock | This option is equivalent to the disperse.eager-lock option but applicable only for non regular files. When multiple clients access a particular directory, disabling disperse.other-eager-lockoption for the volume can improve performance for directory access without compromising performance of I/O's for regular files. | off | on/off
- | disperse.shd-max-threads | Specifies the number of entries that can be self healed in parallel on each disperse subvolume by self-heal daemon. | 1 | 1 - 64
- | disperse.shd-wait-qlength | Specifies the number of entries that must be kept in the dispersed subvolume's queue for self-heal daemon threads to take up as soon as any of the threads are free to heal. This value should be changed based on how much memory self-heal daemon process can use for keeping the next set of entries that need to be healed. | 1024 | 1 - 655536
- | disprse.eager-lock-timeout | Maximum time (in seconds) that a lock on an inode is kept held if no new operations on the inode are received. | 1 | 1-60
- | disperse.other-eager-lock-timeout | It’s equivalent to eager-lock-timeout option but for non regular files. | 1 | 1-60
- | disperse.background-heals | This option can be used to control number of parallel heals running in background. | 8 | 0-256
- | disperse.heal-wait-qlength | This option can be used to control number of heals that can wait | 128 | 0-65536
- | disperse.read-policy | inode-read fops happen only on ‘k’ number of bricks in n=k+m disperse subvolume. ‘round-robin’ selects the read subvolume using round-robin algo. ‘gfid-hash’ selects read subvolume based on hash of the gfid of that file/directory. | gfid-hash | round-robin/gfid-hash
- | disperse.self-heal-window-size | Maximum number blocks(128KB) per file for which self-heal process would be applied simultaneously. | 1 | 1-1024
- | disperse.optimistic-change-log | This option Set/Unset dirty flag for every update fop at the start of the fop. If OFF, this option impacts performance of entry or metadata operations as it will set dirty flag at the start and unset it at the end of ALL update fop. If ON and all the bricks are good, dirty flag will be set at the start only for file fops, For metadata and entry fops dirty flag will not be set at the start This does not impact performance for metadata operations and entry operation but has a very small window to miss marking entry as dirty in case it is required to be healed. |on | on/off
- | disperse.parallel-writes | This controls if writes can be wound in parallel as long as it doesn’t modify same stripes | on | on/off
- | disperse.stripe-cache | This option will keep the last stripe of write fop in memory. If next write falls in this stripe, we need not to read it again from backend and we can save READ fop going over the network. This will improve performance, specially for sequential writes. However, this will also lead to extra memory consumption, maximum (cache size * stripe size) Bytes per open file |4 | 0-10
- | disperse.quorum-count | This option can be used to define how many successes on the bricks constitute a success to the application. This count should be in the range [disperse-data-count, disperse-count] (inclusive) | 0 | 0-(signedint)
- | disperse.use-anonymous-inode | Setting this option heals renames efficiently | off | on/off
-Logging | diagnostics.brick-log-level | Changes the log-level of the bricks | INFO | DEBUG/WARNING/ERROR/CRITICAL/NONE/TRACE
- | diagnostics.client-log-level | Changes the log-level of the clients. | INFO | DEBUG/WARNING/ERROR/CRITICAL/NONE/TRACE
- | diagnostics.brick-sys-log-level | Depending on the value defined for this option, log messages at and above the defined level are generated in the syslog and the brick log files. | CRITICAL | INFO/WARNING/ERROR/CRITICAL
- | diagnostics.client-sys-log-level | Depending on the value defined for this option, log messages at and above the defined level are generated in the syslog and the client log files. | CRITICAL | INFO/WARNING/ERROR/CRITICAL
-| diagnostics.brick-log-format | Allows you to configure the log format to log either with a message id or without one on the brick. | with-msg-id | no-msg-id/with-msg-id
-| diagnostics.client-log-format | Allows you to configure the log format to log either with a message ID or without one on the client. | with-msg-id | no-msg-id/with-msg-id
- | diagnostics.brick-log-buf-size | The maximum number of unique log messages that can be suppressed until the timeout or buffer overflow, whichever occurs first on the bricks.| 5 | 0 and 20 (0 and 20 included)
- | diagnostics.client-log-buf-size | The maximum number of unique log messages that can be suppressed until the timeout or buffer overflow, whichever occurs first on the clients.| 5 | 0 and 20 (0 and 20 included)
- | diagnostics.brick-log-flush-timeout | The length of time for which the log messages are buffered, before being flushed to the logging infrastructure (gluster or syslog files) on the bricks. | 120 | 30 - 300 seconds (30 and 300 included)
- | diagnostics.client-log-flush-timeout | The length of time for which the log messages are buffered, before being flushed to the logging infrastructure (gluster or syslog files) on the clients. | 120 | 30 - 300 seconds (30 and 300 included)
-Performance | *features.trash | Enable/disable trash translator | off | on/off
- | *performance.readdir-ahead | Enable/disable readdir-ahead translator in the volume | off | on/off
- | *performance.read-ahead | Enable/disable read-ahead translator in the volume | off | on/off
- | *performance.io-cache | Enable/disable io-cache translator in the volume | off | on/off
- | performance.quick-read | To enable/disable quick-read translator in the volume. | on | off/on
- | performance.md-cache | Enables and disables md-cache translator. | off | off/on
- | performance.open-behind | Enables and disables open-behind translator. | on | off/on
- | performance.nl-cache | Enables and disables nl-cache translator. | off | off/on
- | performance.stat-prefetch | Enables and disables stat-prefetch translator. | on | off/on
- | performance.client-io-threads | Enables and disables client-io-thread translator. | on | off/on
- | performance.write-behind | Enables and disables write-behind translator. | on | off/on
- | performance.write-behind-window-size | Size of the per-file write-behind buffer. | 1MB | Write-behind cache size
- | performance.io-thread-count | The number of threads in IO threads translator. | 16 | 1-64
- | performance.flush-behind | If this option is set ON, instructs write-behind translator to perform flush in background, by returning success (or any errors, if any of previous writes were failed) to application even before flush is sent to backend filesystem. | On | On/Off
- | performance.cache-max-file-size | Sets the maximum file size cached by the io-cache translator. Can use the normal size descriptors of KB, MB, GB,TB or PB (for example, 6GB). Maximum size uint64. | 2 ^ 64 -1 bytes | size in bytes
- | performance.cache-min-file-size | Sets the minimum file size cached by the io-cache translator. Values same as "max" above | 0B | size in bytes
- | performance.cache-refresh-timeout | The cached data for a file will be retained till 'cache-refresh-timeout' seconds, after which data re-validation is performed. | 1s | 0-61
- | performance.cache-size | Size of the read cache. | 32 MB | size in bytes
- | performance.lazy-open | This option requires open-behind to be on. Perform an open in the backend only when a necessary FOP arrives (for example, write on the file descriptor, unlink of the file). When this option is disabled, perform backend open immediately after an unwinding open. | Yes | Yes/No
- | performance.md-cache-timeout | The time period in seconds which controls when metadata cache has to be refreshed. If the age of cache is greater than this time-period, it is refreshed. Every time cache is refreshed, its age is reset to 0. | 1 | 0-600 seconds
- | performance.nfs-strict-write-ordering | Specifies whether to prevent later writes from overtaking earlier writes for NFS, even if the writes do not relate to the same files or locations. | off | on/off
- | performance.nfs.flush-behind | Specifies whether the write-behind translator performs flush operations in the background for NFS by returning (false) success to the application before flush file operations are sent to the backend file system. | on | on/off
- | performance.nfs.strict-o-direct | Specifies whether to attempt to minimize the cache effects of I/O for a file on NFS. When this option is enabled and a file descriptor is opened using the O_DIRECT flag, write-back caching is disabled for writes that affect that file descriptor. When this option is disabled, O_DIRECT has no effect on caching. This option is ignored if performance.write-behind is disabled. | off | on/off
- | performance.nfs.write-behind-trickling-writes | Enables and disables trickling-write strategy for the write-behind translator for NFS clients. | on | off/on
- | performance.nfs.write-behind-window-size | Specifies the size of the write-behind buffer for a single file or inode for NFS. | 1 MB | 512 KB - 1 GB
- | performance.rda-cache-limit | The value specified for this option is the maximum size of cache consumed by the readdir-ahead translator. This value is global and the total memory consumption by readdir-ahead is capped by this value, irrespective of the number/size of directories cached. | 10MB | 0-1GB
- | performance.rda-request-size | The value specified for this option will be the size of buffer holding directory entries in readdirp response. | 128KB | 4KB-128KB
- | performance.resync-failed-syncs-after-fsync | If syncing cached writes that were issued before an fsync operation fails, this option configures whether to reattempt the failed sync operations. |off | on/off
- | performance.strict-o-direct | Specifies whether to attempt to minimize the cache effects of I/O for a file. When this option is enabled and a file descriptor is opened using the O_DIRECT flag, write-back caching is disabled for writes that affect that file descriptor. When this option is disabled, O_DIRECT has no effect on caching. This option is ignored if performance.write-behind is disabled. | on | on/off
- | performance.strict-write-ordering | Specifies whether to prevent later writes from overtaking earlier writes, even if the writes do not relate to the same files or locations. | on | on/off
- | performance.use-anonymous-fd | This option requires open-behind to be on. For read operations, use anonymous file descriptor when the original file descriptor is open-behind and not yet opened in the backend.| Yes | No/Yes
- | performance.write-behind-trickling-writes | Enables and disables trickling-write strategy for the write-behind translator for FUSE clients. | on | off/on
- | performance.write-behind-window-size | Specifies the size of the write-behind buffer for a single file or inode. | 1MB | 512 KB - 1 GB
- | features.read-only | Enables you to mount the entire volume as read-only for all the clients (including NFS clients) accessing it. | Off | On/Off
- | features.quota-deem-statfs | When this option is set to on, it takes the quota limits into consideration while estimating the filesystem size. The limit will be treated as the total size instead of the actual size of filesystem. | on | on/off
- | features.shard | Enables or disables sharding on the volume. Affects files created after volume configuration. | disable | enable/disable
- | features.shard-block-size | Specifies the maximum size of file pieces when sharding is enabled. Affects files created after volume configuration. | 64MB | 4MB-4TB
- | features.uss | This option enable/disable User Serviceable Snapshots on the volume. | off | on/off
- | geo-replication.indexing | Use this option to automatically sync the changes in the filesystem from Primary to Secondary. | Off | On/Off
- | network.frame-timeout | The time frame after which the operation has to be declared as dead, if the server does not respond for a particular operation. | 1800 (30 mins) | 1800 secs
- | network.ping-timeout | The time duration for which the client waits to check if the server is responsive. When a ping timeout happens, there is a network disconnect between the client and server. All resources held by server on behalf of the client get cleaned up. When a reconnection happens, all resources will need to be re-acquired before the client can resume its operations on the server. Additionally, the locks will be acquired and the lock tables updated. This reconnect is a very expensive operation and should be avoided. | 42 Secs | 42 Secs
-nfs | nfs.enable-ino32 | For 32-bit nfs clients or applications that do not support 64-bit inode numbers or large files, use this option from the CLI to make Gluster NFS return 32-bit inode numbers instead of 64-bit inode numbers. | Off | On/Off
- | nfs.volume-access | Set the access type for the specified sub-volume. | read-write | read-write/read-only
- | nfs.trusted-write | If there is an UNSTABLE write from the client, STABLE flag will be returned to force the client to not send a COMMIT request. In some environments, combined with a replicated GlusterFS setup, this option can improve write performance. This flag allows users to trust Gluster replication logic to sync data to the disks and recover when required. COMMIT requests if received will be handled in a default manner by fsyncing. STABLE writes are still handled in a sync manner. | Off | On/Off
- | nfs.trusted-sync | All writes and COMMIT requests are treated as async. This implies that no write requests are guaranteed to be on server disks when the write reply is received at the NFS client. Trusted sync includes trusted-write behavior. | Off | On/Off
- | nfs.export-dir | This option can be used to export specified comma separated subdirectories in the volume. The path must be an absolute path. Along with path allowed list of IPs/hostname can be associated with each subdirectory. If provided connection will allowed only from these IPs. Format: \[(hostspec[hostspec...])][,...]. Where hostspec can be an IP address, hostname or an IP range in CIDR notation. **Note**: Care must be taken while configuring this option as invalid entries and/or unreachable DNS servers can introduce unwanted delay in all the mount calls. | No sub directory exported. | Absolute path with allowed list of IP/hostname
- | nfs.export-volumes | Enable/Disable exporting entire volumes, instead if used in conjunction with nfs3.export-dir, can allow setting up only subdirectories as exports. | On | On/Off
- | nfs.rpc-auth-unix | Enable/Disable the AUTH_UNIX authentication type. This option is enabled by default for better interoperability. However, you can disable it if required. | On | On/Off
- | nfs.rpc-auth-null | Enable/Disable the AUTH_NULL authentication type. It is not recommended to change the default value for this option. | On | On/Off
- | nfs.rpc-auth-allow\ | Allow a comma separated list of addresses and/or hostnames to connect to the server. By default, all clients are disallowed. This allows you to define a general rule for all exported volumes. | Reject All | IP address or Host name
- | nfs.rpc-auth-reject\ | Reject a comma separated list of addresses and/or hostnames from connecting to the server. By default, all connections are disallowed. This allows you to define a general rule for all exported volumes. | Reject All | IP address or Host name
- | nfs.ports-insecure | Allow client connections from unprivileged ports. By default only privileged ports are allowed. This is a global setting in case insecure ports are to be enabled for all exports using a single option. | Off | On/Off
- | nfs.addr-namelookup | Turn-off name lookup for incoming client connections using this option. In some setups, the name server can take too long to reply to DNS queries resulting in timeouts of mount requests. Use this option to turn off name lookups during address authentication. Note, turning this off will prevent you from using hostnames in rpc-auth.addr.* filters. | On | On/Off
- | nfs.register-with-portmap |For systems that need to run multiple NFS servers, you need to prevent more than one from registering with portmap service. Use this option to turn off portmap registration for Gluster NFS. | On | On/Off
- | nfs.port \ | Use this option on systems that need Gluster NFS to be associated with a non-default port number. | NA | 38465-38467
- | nfs.disable | Turn-off volume being exported by NFS | Off | On/Off
-Server | server.allow-insecure | Allow client connections from unprivileged ports. By default only privileged ports are allowed. This is a global setting in case insecure ports are to be enabled for all exports using a single option.| On | On/Off
- | server.statedump-path | Location of the state dump file. | tmp directory of the brick | New directory path
- | server.allow-insecure | Allows FUSE-based client connections from unprivileged ports.By default, this is enabled, meaning that ports can accept and reject messages from insecure ports. When disabled, only privileged ports are allowed. | on | on/off
- | server.anongid | Value of the GID used for the anonymous user when root-squash is enabled. When root-squash is enabled, all the requests received from the root GID (that is 0) are changed to have the GID of the anonymous user. | 65534 (this UID is also known as nfsnobody) | 0 - 4294967295
- | server.anonuid | Value of the UID used for the anonymous user when root-squash is enabled. When root-squash is enabled, all the requests received from the root UID (that is 0) are changed to have the UID of the anonymous user. | 65534 (this UID is also known as nfsnobody) | 0 - 4294967295
- | server.event-threads | Specifies the number of event threads to execute in parallel. Larger values would help process responses faster, depending on available processing power. | 2 | 1-1024
- | server.gid-timeout | The time period in seconds which controls when cached groups has to expire. This is the cache that contains the groups (GIDs) where a specified user (UID) belongs to. This option is used only when server.manage-gids is enabled.| 2 | 0-4294967295 seconds
- | server.manage-gids | Resolve groups on the server-side. By enabling this option, the groups (GIDs) a user (UID) belongs to gets resolved on the server, instead of using the groups that were send in the RPC Call by the client. This option makes it possible to apply permission checks for users that belong to bigger group lists than the protocol supports (approximately 93). | off | on/off
- | server.root-squash | Prevents root users from having root privileges, and instead assigns them the privileges of nfsnobody. This squashes the power of the root users, preventing unauthorized modification of files on the Red Hat Gluster Storage servers. This option is used only for glusterFS NFS protocol. | off | on/off
- | server.statedump-path | Specifies the directory in which the statedumpfiles must be stored. | path to directory | /var/run/gluster (for a default installation)
-Storage | storage.health-check-interval | Number of seconds between health-checks done on the filesystem that is used for the brick(s). Defaults to 30 seconds, set to 0 to disable. | tmp directory of the brick | New directory path
- | storage.linux-io_uring | Enable/Disable io_uring based I/O at the posix xlator on the bricks. | Off | On/Off
- | storage.fips-mode-rchecksum | If enabled, posix_rchecksum uses the FIPS compliant SHA256 checksum, else it uses MD5. | on | on/ off
- | storage.create-mask | Maximum set (upper limit) of permission for the files that will be created. | 0777 | 0000 - 0777
- | storage.create-directory-mask | Maximum set (upper limit) of permission for the directories that will be created. | 0777 | 0000 - 0777
- | storage.force-create-mode | Minimum set (lower limit) of permission for the files that will be created. | 0000 | 0000 - 0777
- | storage.force-create-directory | Minimum set (lower limit) of permission for the directories that will be created. | 0000 | 0000 - 0777
- | storage.health-check-interval | Sets the time interval in seconds for a filesystem health check. You can set it to 0 to disable. | 30 seconds | 0-4294967295 seconds
- | storage.reserve | To reserve storage space at the brick. This option accepts size in form of MB and also in form of percentage. If user has configured the storage.reserve option using size in MB earlier, and then wants to give the size in percentage, it can be done using the same option. Also, the newest set value is considered, if it was in MB before and then if it sent in percentage, the percentage value becomes new value and the older one is over-written | 1 (1% of the brick size) | 0-100
+| Type | Option | Description | Default Value | Available Options |
+| --- | --- | --- | --- | --- |
+| auth.allow | IP addresses of the clients which should be allowed to access the volume. | \* (allow all) | Valid IP address which includes wild card patterns including \*, such as 192.168.1.\* |
+| auth.reject | IP addresses of the clients which should be denied to access the volume. | NONE (reject none) | Valid IP address which includes wild card patterns including \*, such as 192.168.2.\* |
+| Cluster | cluster.self-heal-window-size | Specifies the maximum number of blocks per file on which self-heal would happen simultaneously. | 1 | 0 - 1024 blocks |
+| cluster.data-self-heal-algorithm | Specifies the type of self-heal. If you set the option as "full", the entire file is copied from source to destinations. If the option is set to "diff" the file blocks that are not in sync are copied to destinations. Reset uses a heuristic model. If the file does not exist on one of the subvolumes, or a zero-byte file exists (created by entry self-heal) the entire content has to be copied anyway, so there is no benefit from using the "diff" algorithm. If the file size is about the same as page size, the entire file can be read and written with a few operations, which will be faster than "diff" which has to read checksums and then read and write. | reset | full/diff/reset |
+| cluster.min-free-disk | Specifies the percentage of disk space that must be kept free. Might be useful for non-uniform bricks | 10% | Percentage of required minimum free disk space |
+| cluster.min-free-inodes | Specifies when system has only N% of inodes remaining, warnings starts to appear in log files | 10% | Percentage of required minimum free inodes |
+| cluster.stripe-block-size | Specifies the size of the stripe unit that will be read from or written to. | 128 KB (for all files) | size in bytes |
+| cluster.self-heal-daemon | Allows you to turn-off proactive self-heal on replicated | On | On/Off |
+| cluster.ensure-durability | This option makes sure the data/metadata is durable across abrupt shutdown of the brick. | On | On/Off |
+| cluster.lookup-unhashed | This option does a lookup through all the sub-volumes, in case a lookup didn’t return any result from the hashed subvolume. If set to OFF, it does not do a lookup on the remaining subvolumes. | on | auto, yes/no, enable/disable, 1/0, on/off |
+| cluster.lookup-optimize | This option enables the optimization of -ve lookups, by not doing a lookup on non-hashed subvolumes for files, in case the hashed subvolume does not return any result. This option disregards the lookup-unhashed setting, when enabled. | on | on/off |
+| cluster.randomize-hash-range-by-gfid | Allows to use gfid of directory to determine the subvolume from which hash ranges are allocated starting with 0. Note that we still use a directory/file’s name to determine the subvolume to which it hashes | off | on/off |
+| cluster.rebal-throttle | Sets the maximum number of parallel file migrations allowed on a node during the rebalance operation. The default value is normal and allows 2 files to be migrated at a time. Lazy will allow only one file to be migrated at a time and aggressive will allow maxof[(((processing units) - 4) / 2), 4] | normal | lazy/normal/aggressive |
+| cluster.background-self-heal-count | Specifies the number of per client self-heal jobs that can perform parallel heals in the background. | 8 | 0-256 |
+| cluster.heal-timeout | Time interval for checking the need to self-heal in self-heal-daemon | 600 | 5-(signed-int) |
+| cluster.eager-lock | If eager-lock is off, locks release immediately after file operations complete, improving performance for some operations, but reducing access efficiency | on | on/off |
+| cluster.quorum-type | If value is “fixed” only allow writes if quorum-count bricks are present. If value is “auto” only allow writes if more than half of bricks, or exactly half including the first brick, are present | none | none/auto/fixed |
+| cluster.quorum-count | If quorum-type is “fixed” only allow writes if this many bricks are present. Other quorum types will OVERWRITE this value | null | 1-(signed-int) |
+| cluster.heal-wait-queue-length | Specifies the number of heals that can be queued for the parallel background self heal jobs. | 128 | 0-10000 |
+| cluster.favorite-child-policy | Specifies which policy can be used to automatically resolve split-brains without user intervention. “size” picks the file with the biggest size as the source. “ctime” and “mtime” pick the file with the latest ctime and mtime respectively as the source. “majority” picks a file with identical mtime and size in more than half the number of bricks in the replica. | none | none/size/ctime/mtime/majority |
+| cluster.use-anonymous-inode | Setting this option heals directory renames efficiently | no | no/yes |
+| Disperse | disperse.eager-lock | If eager-lock is on, the lock remains in place either until lock contention is detected, or for 1 second in order to check if there is another request for that file from the same client. If eager-lock is off, locks release immediately after file operations complete, improving performance for some operations, but reducing access efficiency. | on | on/off |
+| disperse.other-eager-lock | This option is equivalent to the disperse.eager-lock option but applicable only for non regular files. When multiple clients access a particular directory, disabling the disperse.other-eager-lock option for the volume can improve performance for directory access without compromising performance of I/O's for regular files. | off | on/off |
+| disperse.shd-max-threads | Specifies the number of entries that can be self healed in parallel on each disperse subvolume by self-heal daemon. | 1 | 1 - 64 |
+| disperse.shd-wait-qlength | Specifies the number of entries that must be kept in the dispersed subvolume's queue for self-heal daemon threads to take up as soon as any of the threads are free to heal. This value should be changed based on how much memory self-heal daemon process can use for keeping the next set of entries that need to be healed. | 1024 | 1 - 65536 |
+| disperse.eager-lock-timeout | Maximum time (in seconds) that a lock on an inode is kept held if no new operations on the inode are received. | 1 | 1-60 |
+| disperse.other-eager-lock-timeout | It’s equivalent to eager-lock-timeout option but for non regular files. | 1 | 1-60 |
+| disperse.background-heals | This option can be used to control number of parallel heals running in background. | 8 | 0-256 |
+| disperse.heal-wait-qlength | This option can be used to control number of heals that can wait | 128 | 0-65536 |
+| disperse.read-policy | inode-read fops happen only on ‘k’ number of bricks in n=k+m disperse subvolume. ‘round-robin’ selects the read subvolume using round-robin algo. ‘gfid-hash’ selects read subvolume based on hash of the gfid of that file/directory. | gfid-hash | round-robin/gfid-hash |
+| disperse.self-heal-window-size | Maximum number blocks(128KB) per file for which self-heal process would be applied simultaneously. | 1 | 1-1024 |
+| disperse.optimistic-change-log | This option Set/Unset dirty flag for every update fop at the start of the fop. If OFF, this option impacts performance of entry or metadata operations as it will set dirty flag at the start and unset it at the end of ALL update fop. If ON and all the bricks are good, dirty flag will be set at the start only for file fops, For metadata and entry fops dirty flag will not be set at the start This does not impact performance for metadata operations and entry operation but has a very small window to miss marking entry as dirty in case it is required to be healed. | on | on/off |
+| disperse.parallel-writes | This controls if writes can be wound in parallel as long as it doesn’t modify same stripes | on | on/off |
+| disperse.stripe-cache | This option will keep the last stripe of write fop in memory. If next write falls in this stripe, we need not to read it again from backend and we can save READ fop going over the network. This will improve performance, specially for sequential writes. However, this will also lead to extra memory consumption, maximum (cache size \* stripe size) Bytes per open file | 4 | 0-10 |
+| disperse.quorum-count | This option can be used to define how many successes on the bricks constitute a success to the application. This count should be in the range [disperse-data-count, disperse-count] (inclusive) | 0 | 0-(signedint) |
+| disperse.use-anonymous-inode | Setting this option heals renames efficiently | off | on/off |
+| Logging | diagnostics.brick-log-level | Changes the log-level of the bricks | INFO | DEBUG/WARNING/ERROR/CRITICAL/NONE/TRACE |
+| diagnostics.client-log-level | Changes the log-level of the clients. | INFO | DEBUG/WARNING/ERROR/CRITICAL/NONE/TRACE |
+| diagnostics.brick-sys-log-level | Depending on the value defined for this option, log messages at and above the defined level are generated in the syslog and the brick log files. | CRITICAL | INFO/WARNING/ERROR/CRITICAL |
+| diagnostics.client-sys-log-level | Depending on the value defined for this option, log messages at and above the defined level are generated in the syslog and the client log files. | CRITICAL | INFO/WARNING/ERROR/CRITICAL |
+| diagnostics.brick-log-format | Allows you to configure the log format to log either with a message id or without one on the brick. | with-msg-id | no-msg-id/with-msg-id |
+| diagnostics.client-log-format | Allows you to configure the log format to log either with a message ID or without one on the client. | with-msg-id | no-msg-id/with-msg-id |
+| diagnostics.brick-log-buf-size | The maximum number of unique log messages that can be suppressed until the timeout or buffer overflow, whichever occurs first on the bricks. | 5 | 0 and 20 (0 and 20 included) |
+| diagnostics.client-log-buf-size | The maximum number of unique log messages that can be suppressed until the timeout or buffer overflow, whichever occurs first on the clients. | 5 | 0 and 20 (0 and 20 included) |
+| diagnostics.brick-log-flush-timeout | The length of time for which the log messages are buffered, before being flushed to the logging infrastructure (gluster or syslog files) on the bricks. | 120 | 30 - 300 seconds (30 and 300 included) |
+| diagnostics.client-log-flush-timeout | The length of time for which the log messages are buffered, before being flushed to the logging infrastructure (gluster or syslog files) on the clients. | 120 | 30 - 300 seconds (30 and 300 included) |
+| Performance | \*features.trash | Enable/disable trash translator | off | on/off |
+| \*performance.readdir-ahead | Enable/disable readdir-ahead translator in the volume | off | on/off |
+| \*performance.read-ahead | Enable/disable read-ahead translator in the volume | off | on/off |
+| \*performance.io-cache | Enable/disable io-cache translator in the volume | off | on/off |
+| performance.quick-read | To enable/disable quick-read translator in the volume. | on | off/on |
+| performance.md-cache | Enables and disables md-cache translator. | off | off/on |
+| performance.open-behind | Enables and disables open-behind translator. | on | off/on |
+| performance.nl-cache | Enables and disables nl-cache translator. | off | off/on |
+| performance.stat-prefetch | Enables and disables stat-prefetch translator. | on | off/on |
+| performance.client-io-threads | Enables and disables client-io-thread translator. | on | off/on |
+| performance.write-behind | Enables and disables write-behind translator. | on | off/on |
+| performance.write-behind-window-size | Size of the per-file write-behind buffer. | 1MB | Write-behind cache size |
+| performance.io-thread-count | The number of threads in IO threads translator. | 16 | 1-64 |
+| performance.flush-behind | If this option is set ON, instructs write-behind translator to perform flush in background, by returning success (or any errors, if any of previous writes were failed) to application even before flush is sent to backend filesystem. | On | On/Off |
+| performance.cache-max-file-size | Sets the maximum file size cached by the io-cache translator. Can use the normal size descriptors of KB, MB, GB,TB or PB (for example, 6GB). Maximum size uint64. | 2 ^ 64 -1 bytes | size in bytes |
+| performance.cache-min-file-size | Sets the minimum file size cached by the io-cache translator. Values same as "max" above | 0B | size in bytes |
+| performance.cache-refresh-timeout | The cached data for a file will be retained till 'cache-refresh-timeout' seconds, after which data re-validation is performed. | 1s | 0-61 |
+| performance.cache-size | Size of the read cache. | 32 MB | size in bytes |
+| performance.lazy-open | This option requires open-behind to be on. Perform an open in the backend only when a necessary FOP arrives (for example, write on the file descriptor, unlink of the file). When this option is disabled, perform backend open immediately after an unwinding open. | Yes | Yes/No |
+| performance.md-cache-timeout | The time period in seconds which controls when metadata cache has to be refreshed. If the age of cache is greater than this time-period, it is refreshed. Every time cache is refreshed, its age is reset to 0. | 1 | 0-600 seconds |
+| performance.nfs-strict-write-ordering | Specifies whether to prevent later writes from overtaking earlier writes for NFS, even if the writes do not relate to the same files or locations. | off | on/off |
+| performance.nfs.flush-behind | Specifies whether the write-behind translator performs flush operations in the background for NFS by returning (false) success to the application before flush file operations are sent to the backend file system. | on | on/off |
+| performance.nfs.strict-o-direct | Specifies whether to attempt to minimize the cache effects of I/O for a file on NFS. When this option is enabled and a file descriptor is opened using the O_DIRECT flag, write-back caching is disabled for writes that affect that file descriptor. When this option is disabled, O_DIRECT has no effect on caching. This option is ignored if performance.write-behind is disabled. | off | on/off |
+| performance.nfs.write-behind-trickling-writes | Enables and disables trickling-write strategy for the write-behind translator for NFS clients. | on | off/on |
+| performance.nfs.write-behind-window-size | Specifies the size of the write-behind buffer for a single file or inode for NFS. | 1 MB | 512 KB - 1 GB |
+| performance.rda-cache-limit | The value specified for this option is the maximum size of cache consumed by the readdir-ahead translator. This value is global and the total memory consumption by readdir-ahead is capped by this value, irrespective of the number/size of directories cached. | 10MB | 0-1GB |
+| performance.rda-request-size | The value specified for this option will be the size of buffer holding directory entries in readdirp response. | 128KB | 4KB-128KB |
+| performance.resync-failed-syncs-after-fsync | If syncing cached writes that were issued before an fsync operation fails, this option configures whether to reattempt the failed sync operations. | off | on/off |
+| performance.strict-o-direct | Specifies whether to attempt to minimize the cache effects of I/O for a file. When this option is enabled and a file descriptor is opened using the O_DIRECT flag, write-back caching is disabled for writes that affect that file descriptor. When this option is disabled, O_DIRECT has no effect on caching. This option is ignored if performance.write-behind is disabled. | on | on/off |
+| performance.strict-write-ordering | Specifies whether to prevent later writes from overtaking earlier writes, even if the writes do not relate to the same files or locations. | on | on/off |
+| performance.use-anonymous-fd | This option requires open-behind to be on. For read operations, use anonymous file descriptor when the original file descriptor is open-behind and not yet opened in the backend. | Yes | No/Yes |
+| performance.write-behind-trickling-writes | Enables and disables trickling-write strategy for the write-behind translator for FUSE clients. | on | off/on |
+| performance.write-behind-window-size | Specifies the size of the write-behind buffer for a single file or inode. | 1MB | 512 KB - 1 GB |
+| features.read-only | Enables you to mount the entire volume as read-only for all the clients (including NFS clients) accessing it. | Off | On/Off |
+| features.quota-deem-statfs | When this option is set to on, it takes the quota limits into consideration while estimating the filesystem size. The limit will be treated as the total size instead of the actual size of filesystem. | on | on/off |
+| features.shard | Enables or disables sharding on the volume. Affects files created after volume configuration. | disable | enable/disable |
+| features.shard-block-size | Specifies the maximum size of file pieces when sharding is enabled. Affects files created after volume configuration. | 64MB | 4MB-4TB |
+| features.uss | This option enables/disables User Serviceable Snapshots on the volume. | off | on/off |
+| geo-replication.indexing | Use this option to automatically sync the changes in the filesystem from Primary to Secondary. | Off | On/Off |
+| network.frame-timeout | The time after which an operation is declared dead if the server does not respond to it. | 1800 (30 mins) | 1800 secs |
+| network.ping-timeout | The time duration for which the client waits to check if the server is responsive. When a ping timeout happens, there is a network disconnect between the client and server. All resources held by the server on behalf of the client get cleaned up. When a reconnection happens, all resources will need to be re-acquired before the client can resume its operations on the server. Additionally, the locks will be re-acquired and the lock tables updated. This reconnect is a very expensive operation and should be avoided. | 42 Secs | 42 Secs |
+| nfs.enable-ino32 | For 32-bit nfs clients or applications that do not support 64-bit inode numbers or large files, use this option from the CLI to make Gluster NFS return 32-bit inode numbers instead of 64-bit inode numbers. | Off | On/Off |
+| nfs.volume-access | Set the access type for the specified sub-volume. | read-write | read-write/read-only |
+| nfs.trusted-write | If there is an UNSTABLE write from the client, the STABLE flag will be returned to force the client not to send a COMMIT request. In some environments, combined with a replicated GlusterFS setup, this option can improve write performance. This flag allows users to trust Gluster replication logic to sync data to the disks and recover when required. COMMIT requests, if received, will be handled in the default manner by fsyncing. STABLE writes are still handled in a sync manner. | Off | On/Off |
+| nfs.trusted-sync | All writes and COMMIT requests are treated as async. This implies that no write requests are guaranteed to be on server disks when the write reply is received at the NFS client. Trusted sync includes trusted-write behavior. | Off | On/Off |
+| nfs.export-dir | This option can be used to export specified comma-separated subdirectories in the volume. The path must be an absolute path. Along with the path, an allowed list of IPs/hostnames can be associated with each subdirectory; if provided, connections will be allowed only from these IPs. Format: \[(hostspec[hostspec...])][,...], where hostspec can be an IP address, a hostname or an IP range in CIDR notation. **Note**: Care must be taken while configuring this option as invalid entries and/or unreachable DNS servers can introduce unwanted delay in all the mount calls. | No sub directory exported. | Absolute path with allowed list of IP/hostname |
+| nfs.export-volumes | Enable/Disable exporting entire volumes. If this is disabled and used in conjunction with nfs.export-dir, only subdirectories can be set up as exports. | On | On/Off |
+| nfs.rpc-auth-unix | Enable/Disable the AUTH_UNIX authentication type. This option is enabled by default for better interoperability. However, you can disable it if required. | On | On/Off |
+| nfs.rpc-auth-null | Enable/Disable the AUTH_NULL authentication type. It is not recommended to change the default value for this option. | On | On/Off |
+| nfs.rpc-auth-allow | Allow a comma separated list of addresses and/or hostnames to connect to the server. By default, all clients are disallowed. This allows you to define a general rule for all exported volumes. | Reject All | IP address or Host name |
+| nfs.rpc-auth-reject | Reject a comma separated list of addresses and/or hostnames from connecting to the server. By default, all connections are disallowed. This allows you to define a general rule for all exported volumes. | Reject All | IP address or Host name |
+| nfs.ports-insecure | Allow client connections from unprivileged ports. By default only privileged ports are allowed. This is a global setting in case insecure ports are to be enabled for all exports using a single option. | Off | On/Off |
+| nfs.addr-namelookup | Turns off name lookup for incoming client connections. In some setups, the name server can take too long to reply to DNS queries, resulting in timeouts of mount requests; use this option to turn off name lookups during address authentication. Note that turning this off will prevent you from using hostnames in rpc-auth.addr.\* filters. | On | On/Off |
+| nfs.register-with-portmap | For systems that need to run multiple NFS servers, you need to prevent more than one from registering with portmap service. Use this option to turn off portmap registration for Gluster NFS. | On | On/Off |
+| nfs.port | Use this option on systems that need Gluster NFS to be associated with a non-default port number. | NA | 38465-38467 |
+| nfs.disable | Turn off the volume being exported by NFS. | Off | On/Off |
+| server.allow-insecure | Allows FUSE-based client connections from unprivileged ports. By default, this is enabled, meaning that ports can accept and reject messages from insecure ports. When disabled, only privileged ports are allowed. | on | on/off |
+| server.anongid | Value of the GID used for the anonymous user when root-squash is enabled. When root-squash is enabled, all the requests received from the root GID (that is 0) are changed to have the GID of the anonymous user. | 65534 (this GID is also known as nfsnobody) | 0 - 4294967295 |
+| server.anonuid | Value of the UID used for the anonymous user when root-squash is enabled. When root-squash is enabled, all the requests received from the root UID (that is 0) are changed to have the UID of the anonymous user. | 65534 (this UID is also known as nfsnobody) | 0 - 4294967295 |
+| server.event-threads | Specifies the number of event threads to execute in parallel. Larger values would help process responses faster, depending on available processing power. | 2 | 1-1024 |
+| server.gid-timeout | The time period in seconds which controls when cached groups have to expire. This is the cache that contains the groups (GIDs) that a specified user (UID) belongs to. This option is used only when server.manage-gids is enabled. | 2 | 0-4294967295 seconds |
+| server.manage-gids | Resolve groups on the server-side. By enabling this option, the groups (GIDs) a user (UID) belongs to get resolved on the server, instead of using the groups that were sent in the RPC call by the client. This option makes it possible to apply permission checks for users that belong to bigger group lists than the protocol supports (approximately 93). | off | on/off |
+| server.root-squash | Prevents root users from having root privileges, and instead assigns them the privileges of nfsnobody. This squashes the power of the root users, preventing unauthorized modification of files on the Red Hat Gluster Storage servers. This option is used only for glusterFS NFS protocol. | off | on/off |
+| server.statedump-path | Specifies the directory in which the statedump files must be stored. | /var/run/gluster (for a default installation) | path to directory |
+| storage.linux-io_uring | Enable/Disable io_uring based I/O at the posix xlator on the bricks. | Off | On/Off |
+| storage.fips-mode-rchecksum | If enabled, posix_rchecksum uses the FIPS compliant SHA256 checksum, else it uses MD5. | on | on/off |
+| storage.create-mask | Maximum set (upper limit) of permission for the files that will be created. | 0777 | 0000 - 0777 |
+| storage.create-directory-mask | Maximum set (upper limit) of permission for the directories that will be created. | 0777 | 0000 - 0777 |
+| storage.force-create-mode | Minimum set (lower limit) of permission for the files that will be created. | 0000 | 0000 - 0777 |
+| storage.force-create-directory | Minimum set (lower limit) of permission for the directories that will be created. | 0000 | 0000 - 0777 |
+| storage.health-check-interval | Sets the time interval in seconds for a filesystem health check. You can set it to 0 to disable. | 30 seconds | 0-4294967295 seconds |
+| storage.reserve | Reserves storage space at the brick. This option accepts a size in MB or a percentage. If the user has configured the storage.reserve option using a size in MB earlier and then wants to give the size as a percentage, this can be done using the same option. The most recently set value takes effect: if the option was set in MB before and is later set as a percentage, the percentage becomes the new value and the older one is overwritten. | 1 (1% of the brick size) | 0-100 |
-> **Note**
+> **Note**
>
-> We've found few performance xlators, options marked with * in above table have been causing more performance regression than improving. These xlators should be turned off for volumes.
+> We've found that a few performance xlators (the options marked with \* in the table above) cause more performance regression than improvement. These xlators should be turned off for volumes, as in the example below.
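+
+As a minimal sketch (using a hypothetical volume named `test-volume`; console output abridged and illustrative), one of the starred xlators can be turned off and the effective value confirmed with `gluster volume get`:
+
+```{ .console .no-copy }
+# gluster volume set test-volume performance.io-cache off
+volume set: success
+
+# gluster volume get test-volume performance.io-cache
+Option                                   Value
+------                                   -----
+performance.io-cache                     off
+```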
diff --git a/docs/Administrator-Guide/io_uring.md b/docs/Administrator-Guide/io_uring.md
index b3de11f..15da3a3 100644
--- a/docs/Administrator-Guide/io_uring.md
+++ b/docs/Administrator-Guide/io_uring.md
@@ -1,17 +1,19 @@
# io_uring support in gluster
io_uring is an asynchronous I/O interface similar to linux-aio, but aims to be more performant.
-Refer https://kernel.dk/io_uring.pdf and https://kernel-recipes.org/en/2019/talks/faster-io-through-io_uring/ for more details.
+Refer to [https://kernel.dk/io_uring.pdf](https://kernel.dk/io_uring.pdf) and [https://kernel-recipes.org/en/2019/talks/faster-io-through-io_uring/](https://kernel-recipes.org/en/2019/talks/faster-io-through-io_uring/) for more details.
-Incorporating io_uring in various layers of gluster is an ongoing activity but beginning with glusterfs-9.0, support has been added to the posix translator via the ```storage.linux-io_uring``` volume option. When this option is enabled, the posix translator in the glusterfs brick process (at the server side) will use io_uring calls for reads, writes and fsyncs as opposed to the normal pread/pwrite based syscalls.
+Incorporating io_uring in various layers of gluster is an ongoing activity but beginning with glusterfs-9.0, support has been added to the posix translator via the `storage.linux-io_uring` volume option. When this option is enabled, the posix translator in the glusterfs brick process (at the server side) will use io_uring calls for reads, writes and fsyncs as opposed to the normal pread/pwrite based syscalls.
#### Example:
- [server~]# gluster volume set testvol storage.linux-io_uring on
- volume set: success
- [server~]#
- [server~]# gluster volume set testvol storage.linux-io_uring off
- volume set: success
+```{ .console .no-copy }
+# gluster volume set testvol storage.linux-io_uring on
+volume set: success
+
+# gluster volume set testvol storage.linux-io_uring off
+volume set: success
+```
This option can be enabled/disabled only when the volume is not running.
-i.e. you can toggle the option when the volume is `Created` or is `Stopped` as indicated in ```gluster volume status $VOLNAME```
+That is, you can toggle the option only when the volume is in the `Created` or `Stopped` state, as indicated by `gluster volume status $VOLNAME`.
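+
+As an illustrative sketch (console output approximate; volume name `testvol` as in the example above), a typical toggle sequence on a volume that is currently running is:
+
+```{ .console .no-copy }
+# gluster volume stop testvol
+Stopping volume will make its data inaccessible. Do you want to continue? (y/n) y
+volume stop: testvol: success
+
+# gluster volume set testvol storage.linux-io_uring on
+volume set: success
+
+# gluster volume start testvol
+volume start: testvol: success
+```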
diff --git a/docs/Administrator-Guide/overview.md b/docs/Administrator-Guide/overview.md
index 8da93cd..300069a 100644
--- a/docs/Administrator-Guide/overview.md
+++ b/docs/Administrator-Guide/overview.md
@@ -1,6 +1,5 @@
### Overview
-
The Administration guide covers day to day management tasks as well as advanced configuration methods for your Gluster setup.
You can manage your Gluster cluster using the [Gluster CLI](../CLI-Reference/cli-main.md)
diff --git a/docs/Administrator-Guide/setting-up-storage.md b/docs/Administrator-Guide/setting-up-storage.md
index 1b7affc..60b9469 100644
--- a/docs/Administrator-Guide/setting-up-storage.md
+++ b/docs/Administrator-Guide/setting-up-storage.md
@@ -3,7 +3,6 @@
A volume is a logical collection of bricks where each brick is an export directory on a server in the trusted storage pool.
Before creating a volume, you need to set up the bricks that will form the volume.
-
- - [Brick Naming Conventions](./Brick-Naming-Conventions.md)
- - [Formatting and Mounting Bricks](./formatting-and-mounting-bricks.md)
- - [Posix ACLS](./Access-Control-Lists.md)
+- [Brick Naming Conventions](./Brick-Naming-Conventions.md)
+- [Formatting and Mounting Bricks](./formatting-and-mounting-bricks.md)
+- [Posix ACLS](./Access-Control-Lists.md)