mirror of https://github.com/gluster/glusterdocs.git synced 2026-02-05 15:47:01 +01:00

Merge branch 'main' into install-new-sntx

This commit is contained in:
Niraj Kumar Yadav
2022-07-25 16:09:18 +05:30
committed by GitHub
161 changed files with 2984 additions and 2690 deletions

View File

@@ -1,46 +1,55 @@
# Split brain and the ways to deal with it
### Split brain:
Split brain is a situation where two or more replicated copies of a file become divergent. When a file is in split brain, there is an inconsistency in either data or metadata of the file amongst the bricks of a replica and do not have enough information to authoritatively pick a copy as being pristine and heal the bad copies, despite all bricks being up and online. For a directory, there is also an entry split brain where a file inside it can have different gfid/file-type across the bricks of a replica. Split brain can happen mainly because of 2 reasons:
1. Due to network disconnect:
Where a client temporarily loses connection to the bricks.
Split brain is a situation where two or more replicated copies of a file become divergent. When a file is in split brain, there is an inconsistency in either the data or the metadata of the file amongst the bricks of a replica, and there is not enough information to authoritatively pick a copy as being pristine and heal the bad copies, despite all bricks being up and online. For a directory, there is also an entry split brain, where a file inside it can have different gfid/file-type across the bricks of a replica.
Split brain can happen mainly because of 2 reasons:
1. Due to network disconnect, where a client temporarily loses connection to the bricks.
- There is a replica pair of 2 bricks, brick1 on server1 and brick2 on server2.
- Client1 loses connection to brick2 and client2 loses connection to brick1 due to network split.
- Writes from client1 go to brick1 and writes from client2 go to brick2, which is nothing but split-brain.
2. Gluster brick processes going down or returning error:
- Server1 is down and server2 is up: Writes happen on server2.
- Server1 comes up, server2 goes down (heal has not happened / data on server2 is not replicated on server1): Writes happen on server1.
- Server2 comes up: Both server1 and server2 have data independent of each other.
If we use the replica 2 volume, it is not possible to prevent split-brain without losing availability.
If we use the `replica 2` volume, it is not possible to prevent split-brain without losing availability.
### Ways to deal with split brain:
In glusterfs there are ways to resolve split brain. You can see the detailed description of how to resolve a split-brain [here](../Troubleshooting/resolving-splitbrain.md). Moreover, there are ways to reduce the chances of ending up in split-brain situations. They are:
1. Replica 3 volume
1. volume with `replica 3`
2. Arbiter volume
Both of these uses the client-quorum option of glusterfs to avoid the split-brain situations.
Both of these use the client-quorum option of glusterfs to avoid the split-brain situations.
### Client quorum:
This is a feature implemented in the Automatic File Replication (AFR from here on) module, to prevent split-brains in the I/O path for replicate/distributed-replicate volumes. By default, if the client-quorum is not met for a particular replica subvol, it becomes read-only. The other subvols (in a dist-rep volume) will still have R/W access. [Here](arbiter-volumes-and-quorum.md#client-quorum) you can see more details about client-quorum.
#### Client quorum in replica 2 volumes:
In a replica 2 volume it is not possible to achieve high availability and consistency at the same time, without sacrificing tolerance to partition. If we set the client-quorum option to auto, then the first brick must always be up, irrespective of the status of the second brick. If only the second brick is up, the subvolume becomes read-only.
In a `replica 2` volume it is not possible to achieve high availability and consistency at the same time, without sacrificing tolerance to partition. If we set the client-quorum option to auto, then the first brick must always be up, irrespective of the status of the second brick. If only the second brick is up, the subvolume becomes read-only.
If the quorum-type is set to fixed, and the quorum-count is set to 1, then we may end up in split brain.
- Brick1 is up and brick2 is down. Quorum is met and write happens on brick1.
- Brick1 goes down and brick2 comes up (No heal happened). Quorum is met, write happens on brick2.
- Brick1 comes up. Quorum is met, but both the bricks have independent writes - split-brain.
To avoid this we have to set the quorum-count to 2, which costs us availability: even if one replica brick is up and running, the quorum is not met and we end up seeing EROFS.
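For illustration, a sketch of how these options would be set on an existing `replica 2` volume (the volume name is a placeholder):
```console
gluster volume set <volname> cluster.quorum-type fixed
gluster volume set <volname> cluster.quorum-count 2
```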
### 1. Replica 3 volume:
When we create a replicated or distributed replicated volume with replica count 3, the cluster.quorum-type option is set to auto by default. That means at least 2 bricks should be up and running to satisfy the quorum and allow the writes. This is the recommended setting for a replica 3 volume and this should not be changed. Here is how it prevents files from ending up in split brain:
When we create a replicated or distributed replicated volume with replica count 3, the cluster.quorum-type option is set to auto by default. That means at least 2 bricks should be up and running to satisfy the quorum and allow the writes. This is the recommended setting for a `replica 3` volume and this should not be changed. Here is how it prevents files from ending up in split brain:
B1, B2, and B3 are the 3 bricks of a replica 3 volume.
1. B1 & B2 are up and B3 is down. Quorum is met and write happens on B1 & B2.
2. B3 comes up and B2 is down. Quorum is met and write happens on B1 & B3.
3. B2 comes up and B1 goes down. Quorum is met. But when a write request comes, AFR sees that B2 & B3 are blaming each other (B2 says that some writes are pending on B3 and B3 says that some writes are pending on B2), therefore the write is not allowed and fails with EIO.
Command to create a replica 3 volume:
Command to create a `replica 3` volume:
```sh
$gluster volume create <volname> replica 3 host1:brick1 host2:brick2 host3:brick3
```
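After starting the volume, the client-quorum setting can be inspected (a sketch, using the same placeholder volume name):
```console
gluster volume start <volname>
gluster volume get <volname> cluster.quorum-type
```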
@@ -65,6 +74,7 @@ Since the arbiter brick has only name and metadata of the files, there are some
You can find more details on arbiter [here](arbiter-volumes-and-quorum.md).
### Differences between replica 3 and arbiter volumes:
1. In case of a replica 3 volume, we store the entire file in all the bricks and it is recommended to have bricks of the same size. But in case of arbiter, since we do not store data, the size of the arbiter brick is comparatively smaller than that of the other bricks.
2. Arbiter is a state between a replica 2 and a replica 3 volume. If only the arbiter and one other brick are up, and the arbiter brick blames the other brick, then we cannot proceed with the FOPs.
3. Replica 3 gives higher availability compared to arbiter, because unlike arbiter, replica 3 has a full copy of the data in all 3 bricks.

View File

@@ -2,7 +2,7 @@
The arbiter volume is a special subset of replica volumes that is aimed at
preventing split-brains and providing the same consistency guarantees as a normal
replica 3 volume without consuming 3x space.
`replica 3` volume without consuming 3x space.
<!-- TOC depthFrom:1 depthTo:6 withLinks:1 updateOnSave:1 orderedList:0 -->
@@ -22,7 +22,7 @@ replica 3 volume without consuming 3x space.
The syntax for creating the volume is:
```
# gluster volume create <VOLNAME> replica 2 arbiter 1 <NEW-BRICK> ...
# gluster volume create <VOLNAME> replica 2 arbiter 1 <NEW-BRICK> ...
```
**Note**: The earlier syntax used to be ```replica 3 arbiter 1``` but that was
leading to confusions among users about the total no. of data bricks. For the
@@ -33,7 +33,7 @@ arbiter volume.
For example:
```
# gluster volume create testvol replica 2 arbiter 1 server{1..6}:/bricks/brick
# gluster volume create testvol replica 2 arbiter 1 server{1..6}:/bricks/brick
volume create: testvol: success: please start the volume to access data
```
@@ -66,9 +66,9 @@ performance.readdir-ahead: on `
```
The arbiter brick will store only the file/directory names (i.e. the tree structure)
and extended attributes (metadata) but not any data. i.e. the file size
and extended attributes (metadata) but not any data, i.e. the file size
(as shown by `ls -l`) will be zero bytes. It will also store other gluster
metadata like the .glusterfs folder and its contents.
metadata like the `.glusterfs` folder and its contents.
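For illustration only (the brick paths and file name are hypothetical), listing the same file on a data brick and on the arbiter brick shows a real size on the former and zero bytes on the latter:
```console
ls -l /bricks/data1/file.txt      # data brick: real file size
ls -l /bricks/arbiter1/file.txt   # arbiter brick: size 0, names and metadata only
```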
_**Note:** Enabling the arbiter feature **automatically** configures_
_client-quorum to 'auto'. This setting is **not** to be changed._
@@ -76,11 +76,10 @@ _client-quorum to 'auto'. This setting is **not** to be changed._
## Arbiter brick(s) sizing
Since the arbiter brick does not store file data, its disk usage will be considerably
less than the other bricks of the replica. The sizing of the brick will depend on
smaller than for the other bricks of the replica. The sizing of the brick will depend on
how many files you plan to store in the volume. A good estimate will be
4KB times the number of files in the replica. Note that the estimate also
depends on the inode space alloted by the underlying filesystem for a given
disk size.
depends on the inode space allocated by the underlying filesystem for a given disk size.
The `maxpct` value in XFS for volumes of size 1TB to 50TB is only 5%.
If you want to store say 300 million files, 4KB x 300M gives us 1.2TB.
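A sketch of checking, and if needed raising, the inode space limit on a hypothetical XFS arbiter brick mounted at /bricks/arbiter1:
```console
xfs_info /bricks/arbiter1 | grep imaxpct   # show the current maxpct value
xfs_growfs -m 25 /bricks/arbiter1          # raise it if the brick will hold many files
```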
@@ -130,7 +129,7 @@ greater than 50%, so that two nodes separated from each other do not believe
they have quorum simultaneously. For a two-node plain replica volume, this would
mean both nodes need to be up and running. So there is no notion of HA/failover.
There are users who create a replica 2 volume from 2 nodes and peer-probe
There are users who create a `replica 2` volume from 2 nodes and peer-probe
a 'dummy' node without bricks and enable server quorum with a ratio of 51%.
This does not prevent files from getting into split-brain. For example, if B1
and B2 are the bricks/nodes of the replica and B3 is the dummy node, we can
@@ -176,7 +175,7 @@ The following volume set options are used to configure it:
to specify the number of bricks to be active to participate in quorum.
If the quorum-type is auto then this option has no significance.
Earlier, when quorm was not met, the replica subvolume turned read-only. But
Earlier, when quorum was not met, the replica subvolume turned read-only. But
since [glusterfs-3.13](https://docs.gluster.org/en/latest/release-notes/3.13.0/#addition-of-checks-for-allowing-lookups-in-afr-and-removal-of-clusterquorum-reads-volume-option) and upwards, the subvolume becomes unavailable, i.e. all
the file operations fail with ENOTCONN error instead of becoming EROFS.
This means the ```cluster.quorum-reads``` volume option is also not supported.
@@ -185,16 +184,16 @@ This means the ```cluster.quorum-reads``` volume option is also not supported.
## Replica 2 and Replica 3 volumes
From the above descriptions, it is clear that client-quorum cannot really be applied
to a replica 2 volume:(without costing HA).
to a `replica 2` volume (without costing HA).
If the quorum-type is set to auto, then by the description
given earlier, the first brick must always be up, irrespective of the status of the
second brick. IOW, if only the second brick is up, the subvol returns ENOTCONN, i.e. no HA.
If quorum-type is set to fixed, then the quorum-count *has* to be two
to prevent split-brains (otherwise a write can succeed in brick1, another in brick2 =>split-brain).
So for all practical purposes, if you want high availability in a replica 2 volume,
So for all practical purposes, if you want high availability in a `replica 2` volume,
it is recommended not to enable client-quorum.
In a replica 3 volume, client-quorum is enabled by default and set to 'auto'.
In a `replica 3` volume, client-quorum is enabled by default and set to 'auto'.
This means 2 bricks need to be up for the write to succeed. Here is how this
configuration prevents files from ending up in split-brain:

View File

@@ -7,5 +7,3 @@ OK, you can do that by editing planet-gluster [feeds](https://github.com/gluster
Please find instructions mentioned in the file and send a pull request.
Once approved, all your gluster related posts will appear in [planet.gluster.org](http://planet.gluster.org) website.

View File

@@ -1,31 +1,29 @@
Before filing an issue
----------------------
## Before filing an issue
If you are facing any issues, these preliminary checks are useful:
- Is SELinux enabled? (you can use `getenforce` to check)
- Are iptables rules blocking any data traffic? (`iptables -L` can
help check)
- Are all the nodes reachable from each other? [ Network problem ]
- Please search [issues](https://github.com/gluster/glusterfs/issues)
to see if the bug has already been reported
- If an issue has been already filed for a particular release and
you found the issue in another release, add a comment in issue.
- Is SELinux enabled? (you can use `getenforce` to check)
- Are iptables rules blocking any data traffic? (`iptables -L` can
help check)
- Are all the nodes reachable from each other? [ Network problem ]
- Please search [issues](https://github.com/gluster/glusterfs/issues)
to see if the bug has already been reported
- If an issue has been already filed for a particular release and you found the issue in another release, add a comment in issue.
Anyone can search in github issues, you don't need an account. Searching
requires some effort, but helps avoid duplicates, and you may find that
your problem has already been solved.
Reporting An Issue
------------------
## Reporting An Issue
- You should have an account with github.com
- Here is the link to file an issue:
[Github](https://github.com/gluster/glusterfs/issues/new)
- You should have an account with github.com
- Here is the link to file an issue:
[Github](https://github.com/gluster/glusterfs/issues/new)
*Note: Please go through all below sections to understand what
_Note: Please go through all below sections to understand what
information we need to put in a bug. So it will help the developer to
root cause and fix it*
root cause and fix it_
### Required Information
@@ -33,84 +31,86 @@ You should gather the information below before creating the bug report.
#### Package Information
- Location from which the packages are used
- Package Info - version of glusterfs package installed
- Location from which the packages are used
- Package Info - version of glusterfs package installed
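For example, a quick sketch of gathering this on an RPM-based system (either command shows the installed glusterfs version):
```console
rpm -qa | grep -i glusterfs
glusterfs --version
```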
#### Cluster Information
- Number of nodes in the cluster
- Hostnames and IPs of the gluster Node [if it is not a security
issue]
- Hostname / IP will help developers in understanding &
correlating with the logs
- Output of `gluster peer status`
- Node IP, from which the "x" operation is done
- "x" here means any operation that causes the issue
- Number of nodes in the cluster
- Hostnames and IPs of the gluster Node [if it is not a security
issue]
- Hostname / IP will help developers in understanding & correlating with the logs
- Output of `gluster peer status`
- Node IP, from which the "x" operation is done
- "x" here means any operation that causes the issue
#### Volume Information
- Number of volumes
- Volume Names
- Volume on which the particular issue is seen [ if applicable ]
- Type of volumes
- Volume options if available
- Output of `gluster volume info`
- Output of `gluster volume status`
- Get the statedump of the volume with the problem
`$ gluster volume statedump <vol-name>`
- Number of volumes
- Volume Names
- Volume on which the particular issue is seen [ if applicable ]
- Type of volumes
- Volume options if available
- Output of `gluster volume info`
- Output of `gluster volume status`
- Get the statedump of the volume with the problem `gluster volume statedump <vol-name>`
This dumps statedump per brick process in `/var/run/gluster`
*NOTE: Collect statedumps from one gluster Node in a directory.*
_NOTE: Collect statedumps from one gluster Node in a directory._
Repeat this on all Nodes containing the bricks of the volume. All the
directories so collected can be archived, compressed, and attached to the bug.
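For example, a sketch of gathering the statedumps from one node, assuming the default statedump location `/var/run/gluster`:
```console
mkdir statedumps-$(hostname)
cp /var/run/gluster/*.dump.* statedumps-$(hostname)/
tar czf statedumps-$(hostname).tar.gz statedumps-$(hostname)/
```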
#### Brick Information
- xfs options when a brick partition was done
- This could be obtained with this command :
- xfs options when a brick partition was done
`$ xfs_info /dev/mapper/vg1-brick`
- This could be obtained with this command: `xfs_info /dev/mapper/vg1-brick`
- Extended attributes on the bricks
- This could be obtained with this command:
- Extended attributes on the bricks
`$ getfattr -d -m. -ehex /rhs/brick1/b1`
- This could be obtained with this command: `getfattr -d -m. -ehex /rhs/brick1/b1`
#### Client Information
- OS Type ( Ubuntu, Fedora, RHEL )
- OS Version: In case of Linux distro get the following :
- OS Type ( Ubuntu, Fedora, RHEL )
- OS Version: In case of Linux distro get the following :
`uname -r`
`cat /etc/issue`
```console
uname -r
cat /etc/issue
```
- Fuse or NFS Mount point on the client with output of mount commands
- Output of `df -Th` command
- Fuse or NFS Mount point on the client with output of mount commands
- Output of `df -Th` command
#### Tool Information
- If any tools are used for testing, provide the info/version about it
- if any IO is simulated using a script, provide the script
- If any tools are used for testing, provide the info/version about it
- if any IO is simulated using a script, provide the script
#### Logs Information
- You can check logs for issues/warnings/errors.
- Self-heal logs
- Rebalance logs
- Glusterd logs
- Brick logs
- NFS logs (if applicable)
- Samba logs (if applicable)
- Client mount log
- Add the entire logs as attachment, if its very large to paste as a
comment
- You can check logs for issues/warnings/errors.
- Self-heal logs
- Rebalance logs
- Glusterd logs
- Brick logs
- NFS logs (if applicable)
- Samba logs (if applicable)
- Client mount log
  - Add the entire logs as an attachment if they are too large to paste as a
    comment
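A sketch of collecting everything under the default log directory into one attachment (the default path `/var/log/glusterfs` is assumed):
```console
tar czf gluster-logs-$(hostname).tar.gz /var/log/glusterfs/
```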
#### SOS report for CentOS/Fedora
- Get the sosreport from the involved gluster Node and Client [ in
case of CentOS /Fedora ]
- Add a meaningful name/IP to the sosreport, by renaming/adding
hostname/ip to the sosreport name
- Get the sosreport from the involved gluster Node and Client [ in
case of CentOS /Fedora ]
- Add a meaningful name/IP to the sosreport, by renaming/adding
hostname/ip to the sosreport name
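A sketch of generating and renaming the report (the `--batch` flag skips the interactive prompts; the output directory and archive name vary with the installed sos version, and the hostname/IP shown is an example):
```console
sosreport --batch
mv /var/tmp/sosreport-*.tar.xz /var/tmp/sosreport-node1-192.0.2.10.tar.xz
```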

View File

@@ -1,25 +1,24 @@
Issues Triage Guidelines
========================
# Issues Triage Guidelines
- Triaging of issues is an important task; when done correctly, it can
reduce the time between reporting an issue and the availability of a
fix enormously.
- Triaging of issues is an important task; when done correctly, it can
reduce the time between reporting an issue and the availability of a
fix enormously.
- Triager should focus on new issues, and try to define the problem
easily understandable and as accurate as possible. The goal of the
triagers is to reduce the time that developers need to solve the bug
report.
- Triager should focus on new issues, and try to define the problem in an
  easily understandable and accurate way. The goal of the triagers is to
  reduce the time that developers need to solve the bug report.
- A triager is like an assistant that helps with the information
gathering and possibly the debugging of a new bug report. Because a
triager helps preparing a bug before a developer gets involved, it
can be a very nice role for new community members that are
interested in technical aspects of the software.
- A triager is like an assistant that helps with the information
gathering and possibly the debugging of a new bug report. Because a
  triager helps prepare a bug before a developer gets involved, it
can be a very nice role for new community members that are
interested in technical aspects of the software.
- Triagers will stumble upon many different kind of issues, ranging
from reports about spelling mistakes, or unclear log messages to
memory leaks causing crashes or performance issues in environments
with several hundred storage servers.
- Triagers will stumble upon many different kinds of issues, ranging
from reports about spelling mistakes, or unclear log messages to
memory leaks causing crashes or performance issues in environments
with several hundred storage servers.
Nobody expects that triagers can prepare all bug reports. Therefore most
developers will be able to assist the triagers, answer questions and
@@ -28,17 +27,16 @@ more experienced and will rely less on developers.
**Issue triage can be summarized as below points:**
- Is the issue a bug? an enhancement request? or a question? Assign the relevant label.
- Is there enough information in the issue description?
- Is it a duplicate issue?
- Is it assigned to correct component of GlusterFS?
- Is the bug summary is correct?
- Assigning issue or Adding people's github handle in the comment, so they get notified.
- Is the issue a bug? an enhancement request? or a question? Assign the relevant label.
- Is there enough information in the issue description?
- Is it a duplicate issue?
- Is it assigned to correct component of GlusterFS?
- Is the bug summary correct?
- Assigning issue or Adding people's github handle in the comment, so they get notified.
The detailed discussion about the above points are below.
Is there enough information?
----------------------------
## Is there enough information?
It's hard to generalize what makes a good report. For "average"
reporters, it is definitely helpful to have good steps to reproduce,
@@ -46,42 +44,38 @@ GlusterFS software version , and information about the test/production
environment, Linux/GNU distribution.
If the reporter is a developer, steps to reproduce can sometimes be
omitted as context is obvious. *However, this can create a problem for
omitted as context is obvious. _However, this can create a problem for
contributors that need to find their way, hence it is strongly advised
to list the steps to reproduce an issue.*
to list the steps to reproduce an issue._
Other tips:
- There should be only one issue per report. Try not to mix related or
similar looking bugs per report.
- There should be only one issue per report. Try not to mix related or
similar looking bugs per report.
- It should be possible to call the described problem fixed at some
point. "Improve the documentation" or "It runs slow" could never be
called fixed, while "Documentation should cover the topic Embedding"
or "The page at <http://en.wikipedia.org/wiki/Example> should load
in less than five seconds" would have a criterion. A good summary of
the bug will also help others in finding existing bugs and prevent
filing of duplicates.
- It should be possible to call the described problem fixed at some
point. "Improve the documentation" or "It runs slow" could never be
called fixed, while "Documentation should cover the topic Embedding"
or "The page at <http://en.wikipedia.org/wiki/Example> should load
in less than five seconds" would have a criterion. A good summary of
the bug will also help others in finding existing bugs and prevent
filing of duplicates.
- If the bug is a graphical problem, you may want to ask for a
screenshot to attach to the bug report. Make sure to ask that the
screenshot should not contain any confidential information.
- If the bug is a graphical problem, you may want to ask for a
screenshot to attach to the bug report. Make sure to ask that the
screenshot should not contain any confidential information.
Is it a duplicate?
------------------
## Is it a duplicate?
If you think that you have found a duplicate but you are not totally
sure, just add a comment like "This issue looks related to issue #NNN" (and
replace NNN by issue-id) so somebody else can take a look and help judging.
Is it assigned with correct label?
----------------------------------
## Is it assigned with correct label?
Go through the labels and assign the appropriate label
Are the fields correct?
-----------------------
## Are the fields correct?
### Description
@@ -89,8 +83,8 @@ Sometimes the description does not summarize the bug itself well. You may
want to update the bug summary to make the report distinguishable. A
good title may contain:
- A brief explanation of the root cause (if it was found)
- Some of the symptoms people are experiencing
- A brief explanation of the root cause (if it was found)
- Some of the symptoms people are experiencing
### Assigning issue or Adding people's github handle in the comment

View File

@@ -15,7 +15,7 @@ Minor releases will have guaranteed backwards compatibilty with earlier minor re
Each GlusterFS major release has a 4-6 month release window, in which changes get merged. This window is split into two phases.
1. An Open phase, where all changes get merged
1. A Stability phase, where only changes that stabilize the release get merged.
2. A Stability phase, where only changes that stabilize the release get merged.
The first 2-4 months of a release window will be the Open phase, and the last month will be the stability phase.
@@ -30,8 +30,8 @@ All changes will be accepted during the Open phase. The changes have a few requi
- a change fixing a bug SHOULD have public test case
- a change introducing a new feature MUST have a disable switch that can disable the feature during a build
#### Stability phase
This phase is used to stabilize any new features introduced in the open phase, or general bug fixes for already existing features.
A new `release-<version>` branch is created at the beginning of this phase. All changes need to be sent to the master branch before getting backported to the new release branch.
@@ -54,6 +54,7 @@ Patches accepted in the Stability phase have the following requirements:
Patches that do not satisfy the above requirements can still be submitted for review, but cannot be merged.
## Release procedure
This procedure is followed by a release maintainer/manager, to perform the actual release.
The release procedure for both major releases and minor releases is nearly the same.
@@ -63,6 +64,7 @@ The procedure for the major releases starts at the beginning of the Stability ph
_TODO: Add the release verification procedure_
### Release steps
The release-manager needs to follow the following steps, to actually perform the release once ready.
#### Create tarball
@@ -73,9 +75,11 @@ The release-manager needs to follow the following steps, to actually perform the
4. create the tarball with the [release job in Jenkins](http://build.gluster.org/job/release/)
#### Notify packagers
Notify the packagers that we need packages created. Provide the link to the source tarball from the Jenkins release job to the [packagers mailinglist](mailto:packaging@gluster.org). A list of the people involved in the package maintenance for the different distributions is in the `MAINTAINERS` file in the sources, all of them should be subscribed to the packagers mailinglist.
#### Create a new Tracker Bug for the next release
The tracker bugs are used as guidance for blocker bugs and should get created when a release is made. To create one:
- Create a [new milestone](https://github.com/gluster/glusterfs/milestones/new)
@@ -83,19 +87,21 @@ The tracker bugs are used as guidance for blocker bugs and should get created wh
- Issues that were not fixed in the previous release, but are in its milestone, should be moved to the new milestone.
#### Create Release Announcement
(Major releases)
The Release Announcement is based off the release notes. This needs to indicate:
* What this release's overall focus is
* Which versions will stop receiving updates as of this release
* Links to the direct download folder
* Feature set
Best practice as of version-8 is to create a collaborative version of the release notes that both the release manager and community lead work on together, and the release manager posts to the mailing lists (gluster-users@, gluster-devel@, announce@).
(Major releases)
The Release Announcement is based off the release notes. This needs to indicate:
- What this release's overall focus is
- Which versions will stop receiving updates as of this release
- Links to the direct download folder
- Feature set
Best practice as of version-8 is to create a collaborative version of the release notes that both the release manager and community lead work on together, and the release manager posts to the mailing lists (gluster-users@, gluster-devel@, announce@).
#### Create Upgrade Guide
(Major releases)
If required, as in the case of a major release, an upgrade guide needs to be available at the same time as the release.
(Major releases)
If required, as in the case of a major release, an upgrade guide needs to be available at the same time as the release.
This document should go under the [Upgrade Guide](https://github.com/gluster/glusterdocs/tree/master/Upgrade-Guide) section of the [glusterdocs](https://github.com/gluster/glusterdocs) repository.
#### Send Release Announcement
@@ -103,13 +109,15 @@ This document should go under the [Upgrade Guide](https://github.com/gluster/glu
Once the Fedora/EL RPMs are ready (and any others that are ready by then), send the release announcement:
- Gluster Mailing lists
- [gluster-announce](https://lists.gluster.org/mailman/listinfo/announce/)
- [gluster-devel](https://lists.gluster.org/mailman/listinfo/gluster-devel)
- [gluster-users](https://lists.gluster.org/mailman/listinfo/gluster-users/)
- [Gluster Blog](https://planet.gluster.org/)
The blog will automatically post to both Facebook and Twitter. Be careful with this!
- [Gluster Twitter account](https://twitter.com/gluster)
- [Gluster Facebook page](https://www.facebook.com/GlusterInc)
- [Gluster LinkedIn group](https://www.linkedin.com/company/gluster/about/)
- [gluster-announce](https://lists.gluster.org/mailman/listinfo/announce/)
- [gluster-devel](https://lists.gluster.org/mailman/listinfo/gluster-devel)
- [gluster-users](https://lists.gluster.org/mailman/listinfo/gluster-users/)
- [Gluster Blog](https://planet.gluster.org/)
The blog will automatically post to both Facebook and Twitter. Be careful with this!
- [Gluster Twitter account](https://twitter.com/gluster)
- [Gluster Facebook page](https://www.facebook.com/GlusterInc)
- [Gluster LinkedIn group](https://www.linkedin.com/company/gluster/about/)

View File

@@ -13,8 +13,10 @@ explicitly called out.
### Guidelines that Maintainers are expected to adhere to
1. Ensure qualitative and timely management of patches sent for review.
2. For merging patches into the repository, it is expected of maintainers to:
1. Ensure qualitative and timely management of patches sent for review.
2. For merging patches into the repository, it is expected of maintainers to:
- Merge patches of owned components only.
- Seek approvals from all maintainers before merging a patchset spanning
multiple components.
@@ -28,14 +30,15 @@ explicitly called out.
quality of the codebase.
- Not merge patches written by themselves until there is a +2 Code Review
vote by other reviewers.
3. The responsibility of merging a patch into a release branch in normal
circumstances will be that of the release maintainer's. Only in exceptional
situations, maintainers & sub-maintainers will merge patches into a release
branch.
4. Release maintainers will ensure approval from appropriate maintainers before
merging a patch into a release branch.
5. Maintainers have a responsibility to the community, it is expected of
maintainers to:
3. The responsibility of merging a patch into a release branch in normal
circumstances will be that of the release maintainer's. Only in exceptional
situations, maintainers & sub-maintainers will merge patches into a release
branch.
4. Release maintainers will ensure approval from appropriate maintainers before
merging a patch into a release branch.
5. Maintainers have a responsibility to the community, it is expected of maintainers to:
- Facilitate the community in all aspects.
- Be very active and visible in the community.
- Be objective and consider the larger interests of the community ahead of
@@ -53,4 +56,3 @@ Any questions or comments regarding these guidelines can be routed to
Github can be used to list patches that need reviews and/or can get
merged from [Pull Requests](https://github.com/gluster/glusterfs/pulls)

View File

@@ -1,28 +1,23 @@
# Workflow Guide
Bug Handling
------------
## Bug Handling
- [Bug reporting guidelines](./Bug-Reporting-Guidelines.md) -
Guideline for reporting a bug in GlusterFS
- [Bug triage guidelines](./Bug-Triage.md) - Guideline on how to
triage bugs for GlusterFS
- [Bug reporting guidelines](./Bug-Reporting-Guidelines.md) -
Guideline for reporting a bug in GlusterFS
- [Bug triage guidelines](./Bug-Triage.md) - Guideline on how to
triage bugs for GlusterFS
Release Process
---------------
## Release Process
- [GlusterFS Release process](./GlusterFS-Release-process.md) -
Our release process / checklist
- [GlusterFS Release process](./GlusterFS-Release-process.md) -
Our release process / checklist
Patch Acceptance
----------------
## Patch Acceptance
- The [Guidelines For Maintainers](./Guidelines-For-Maintainers.md) explains when
maintainers can merge patches.
- The [Guidelines For Maintainers](./Guidelines-For-Maintainers.md) explains when
maintainers can merge patches.
Blogging about gluster
----------------
- The [Adding your gluster blog](./Adding-your-blog.md) explains how to add your
gluster blog to Community blogger.
## Blogging about gluster
- The [Adding your gluster blog](./Adding-your-blog.md) explains how to add your
gluster blog to Community blogger.

View File

@@ -1,4 +1,5 @@
# Backport Guidelines
In the GlusterFS project, as a policy, any new change, bug fix, etc., is to be
made in the 'devel' branch before the release branches. When a bug is fixed in
the devel branch, backporting it to a release branch might be desirable or necessary.
@@ -9,17 +10,17 @@ understand how to request for backport from community.
## Policy
* No feature from devel would be backported to the release branch
* CVE ie., security vulnerability [(listed on the CVE database)](https://cve.mitre.org/cve/search_cve_list.html)
reported in the existing releases would be backported, after getting fixed
in devel branch.
* Only topics which bring about data loss or, unavailability would be
backported to the release.
* For any other issues, the project recommends that the installation be
upgraded to a newer release where the specific bug has been addressed.
- No feature from devel would be backported to the release branch
- CVEs, i.e., security vulnerabilities [(listed on the CVE database)](https://cve.mitre.org/cve/search_cve_list.html)
  reported in the existing releases, would be backported after getting fixed
  in the devel branch.
- Only issues that bring about data loss or unavailability would be
  backported to the release.
- For any other issues, the project recommends that the installation be
upgraded to a newer release where the specific bug has been addressed.
- Gluster provides 'rolling' upgrade support, i.e., one can upgrade their
server version without stopping the application I/O, so we recommend migrating
to higher version.
server version without stopping the application I/O, so we recommend migrating
to higher version.
## Things to pay attention to while backporting a patch.
@@ -27,12 +28,10 @@ If your patch meets the criteria above, or you are a user, who prefer to have a
fix backported because your current setup is facing issues, below are the
steps you need to take to submit a patch on a release branch.
* The patch should have same 'Change-Id'.
- The patch should have the same 'Change-Id'.
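For illustration, a minimal sketch of backporting an already-merged devel fix (the release branch name and commit hash are placeholders); `git cherry-pick -x` copies the original commit message, so the 'Change-Id' line is carried over:
```console
git fetch upstream
git checkout -b backport-issueNNNN upstream/release-11
git cherry-pick -x <devel-commit-sha>
./rfc.sh
```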
### How to contact release owners?
All release owners are part of 'gluster-devel@gluster.org' mailing list.
Please write your expectation from next release there, so we can take that
to consideration while making the release.

View File

@@ -7,9 +7,11 @@ This page describes how to build and install GlusterFS.
The following packages are required for building GlusterFS,
- GNU Autotools
- Automake
- Autoconf
- Libtool
- Automake
- Autoconf
- Libtool
- lex (generally flex)
- GNU Bison
- OpenSSL
@@ -258,9 +260,9 @@ cd extras/LinuxRPM
make glusterrpms
```
This will create rpms from the source in 'extras/LinuxRPM'. *(Note: You
This will create rpms from the source in 'extras/LinuxRPM'. _(Note: You
will need to install the rpmbuild requirements including rpmbuild and
mock)*<br>
mock)_<br>
For CentOS / Enterprise Linux 8 the dependencies can be installed via:
```console

View File

@@ -1,8 +1,8 @@
Developers
==========
# Developers
### Contributing to the Gluster community
-------------------------------------
---
Are you itching to send in patches and participate as a developer in the
Gluster community? Here are a number of starting points for getting
@@ -10,36 +10,37 @@ involved. All you need is your 'github' account to be handy.
Remember that, [Gluster community](https://github.com/gluster) has multiple projects, each of which has its own way of handling PRs and patches. Decide on which project you want to contribute. Below documents are mostly about 'GlusterFS' project, which is the core of Gluster Community.
Workflow
--------
## Workflow
- [Simplified Developer Workflow](./Simplified-Development-Workflow.md)
- A simpler and faster intro to developing with GlusterFS, than the document below
- [Developer Workflow](./Development-Workflow.md)
- Covers detail about requirements from a patch; tools and toolkits used by developers.
This is recommended reading in order to begin contributions to the project.
- [GD2 Developer Workflow](https://github.com/gluster/glusterd2/blob/master/doc/development-guide.md)
- Helps in on-boarding developers to contribute in GlusterD2 project.
- [Simplified Developer Workflow](./Simplified-Development-Workflow.md)
Compiling Gluster
-----------------
- A simpler and faster intro to developing with GlusterFS, than the document below
- [Building GlusterFS](./Building-GlusterFS.md) - How to compile
Gluster from source code.
- [Developer Workflow](./Development-Workflow.md)
Developing
----------
- Covers detail about requirements from a patch; tools and toolkits used by developers.
This is recommended reading in order to begin contributions to the project.
- [Projects](./Projects.md) - Ideas for projects you could
create
- [Fixing issues reported by tools for static code
analysis](./Fixing-issues-reported-by-tools-for-static-code-analysis.md)
- This is a good starting point for developers to fix bugs in
GlusterFS project.
- [GD2 Developer Workflow](https://github.com/gluster/glusterd2/blob/master/doc/development-guide.md)
Releases and Backports
----------------------
- Helps in on-boarding developers to contribute in GlusterD2 project.
- [Backport Guidelines](./Backport-Guidelines.md) describe the steps that branches too.
## Compiling Gluster
- [Building GlusterFS](./Building-GlusterFS.md) - How to compile
Gluster from source code.
## Developing
- [Projects](./Projects.md) - Ideas for projects you could
create
- [Fixing issues reported by tools for static code
analysis](./Fixing-issues-reported-by-tools-for-static-code-analysis.md)
- This is a good starting point for developers to fix bugs in GlusterFS project.
## Releases and Backports
- [Backport Guidelines](./Backport-Guidelines.md) describes the steps needed to backport fixes to release branches.
Some more GlusterFS Developer documentation can be found [in glusterfs documentation directory](https://github.com/gluster/glusterfs/tree/master/doc/developer-guide)

View File

@@ -1,12 +1,10 @@
Development workflow of Gluster
================================
# Development workflow of Gluster
This document provides a detailed overview of the development model
followed by the GlusterFS project. For a simpler overview visit
[Simplified development workflow](./Simplified-Development-Workflow.md).
##Basics
--------
## Basics
The GlusterFS development model largely revolves around the features and
functionality provided by Git version control system, Github and Jenkins
@@ -31,8 +29,7 @@ all builds and tests can be viewed at
'regression' job which is designed to execute test scripts provided as
part of the code change.
##Preparatory Setup
-------------------
## Preparatory Setup
Here is a list of initial one-time steps before you can start hacking on
code.
@@ -46,9 +43,9 @@ Fork [GlusterFS repository](https://github.com/gluster/glusterfs/fork)
Get yourself a working tree by cloning the development repository from
```console
# git clone git@github.com:${username}/glusterfs.git
# cd glusterfs/
# git remote add upstream git@github.com:gluster/glusterfs.git
git clone git@github.com:${username}/glusterfs.git
cd glusterfs/
git remote add upstream git@github.com:gluster/glusterfs.git
```
### Preferred email and set username
@@ -69,13 +66,14 @@ get alerts.
Set up a filter rule in your mail client to tag or classify emails with
the header
```text
list: <glusterfs.gluster.github.com>
```
as mails originating from the github system.
##Development & Other flows
---------------------------
## Development & Other flows
### Issue
@@ -90,17 +88,17 @@ as mails originating from the github system.
- Make sure clang-format is installed and is run on the patch.
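For example (the file path is just an illustration), clang-format can be run in place on the files you modified:
```console
clang-format -i xlators/cluster/afr/src/afr-common.c
```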
### Keep up-to-date
- GlusterFS is a large project with many developers, so new patches land every day.
- It is critical for a developer to stay up-to-date with the devel repo so the PR is conflict-free when it is opened.
- Git provides many options to keep up-to-date; below is one of them:
```console
# git fetch upstream
# git rebase upstream/devel
git fetch upstream
git rebase upstream/devel
```
##Branching policy
------------------
## Branching policy
This section describes both, the branching policies on the public repo
as well as the suggested best-practice for local branching
@@ -130,13 +128,12 @@ change. The name of the branch on your personal fork can start with issueNNNN,
followed by anything of your choice. If you are submitting changes to the devel
branch, first create a local task branch like this -
```console
```{ .console .no-copy }
# git checkout -b issueNNNN upstream/main
... <hack, commit>
```
##Building
----------
## Building
### Environment Setup
@@ -147,18 +144,19 @@ refer : [Building GlusterFS](./Building-GlusterFS.md)
Once the required packages are installed for your system,
generate the build configuration:
```console
# ./autogen.sh
# ./configure --enable-fusermount
./autogen.sh
./configure --enable-fusermount
```
### Build and install
```console
# make && make install
make && make install
```
##Commit policy / PR description
--------------------------------
## Commit policy / PR description
Typically you would have a local branch per task. You will need to
sign-off your commit (git commit -s) before sending the
@@ -169,22 +167,21 @@ CONTRIBUTING file available in the repository root.
Provide a meaningful commit message. Your commit message should be in
the following format
- A short one-line title of format 'component: title', describing what the patch accomplishes
- An empty line following the subject
- Situation necessitating the patch
- Description of the code changes
- Reason for doing it this way (compared to others)
- Description of test cases
- When you open a PR, having a reference Issue for the commit is mandatory in GlusterFS.
- Commit message can have, either Fixes: #NNNN or Updates: #NNNN in a separate line in the commit message.
Here, NNNN is the Issue ID in glusterfs repository.
- Each commit needs the author to have the 'Signed-off-by: Name <email>' line.
Can do this by -s option for git commit.
- If the PR is not ready for review, apply the label work-in-progress.
Check the availability of "Draft PR" is present for you, if yes, use that instead.
- A short one-line title of format 'component: title', describing what the patch accomplishes
- An empty line following the subject
- Situation necessitating the patch
- Description of the code changes
- Reason for doing it this way (compared to others)
- Description of test cases
- When you open a PR, having a reference Issue for the commit is mandatory in GlusterFS.
- Commit message can have, either Fixes: #NNNN or Updates: #NNNN in a separate line in the commit message.
Here, NNNN is the Issue ID in glusterfs repository.
- Each commit needs the author to have the 'Signed-off-by: Name <email>' line.
Can do this by -s option for git commit.
- If the PR is not ready for review, apply the label work-in-progress.
  If the "Draft PR" option is available to you, use that instead.
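For illustration, a sketch of a commit message in the above format (component, wording, and issue number are made up):
```text
afr: avoid spurious data heal on metadata-only change

Problem: a metadata-only update was also marking the entry for data
heal, causing unnecessary heal traffic.

Fix: update only the metadata pending counters in this code path.

Tests: added a .t case verifying that no data heal is queued after a
chmod on a replicated file.

Fixes: #NNNN
Signed-off-by: Name <email>
```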
##Push the change
-----------------
## Push the change
After doing the local commit, it is time to submit the code for review.
There is a script available inside glusterfs.git called rfc.sh. It is
recommended you keep pushing to your repo every day, so you don't lose
any work. You can submit your changes for review by simply executing
```console
# ./rfc.sh
./rfc.sh
```
or
```console
# git push origin HEAD:issueNNN
git push origin HEAD:issueNNN
```
This script rfc.sh does the following:
- The first time it is executed, it downloads a git hook from
<http://review.gluster.org/tools/hooks/commit-msg> and sets it up
locally to generate a Change-Id: tag in your commit message (if it
was not already generated.)
- Rebase your commit against the latest upstream HEAD. This rebase
also causes your commits to undergo massaging from the just
downloaded commit-msg hook.
- Prompt for a Reference Id for each commit (if it was not already provided)
and include it as a "fixes: #n" tag in the commit log. You can just hit
<enter> at this prompt if your submission is purely for review
purposes.
- Push the changes for review. On a successful push, you will see a URL pointing to
the change in [Pull requests](https://github.com/gluster/glusterfs/pulls) section.
- The first time it is executed, it downloads a git hook from
<http://review.gluster.org/tools/hooks/commit-msg> and sets it up
locally to generate a Change-Id: tag in your commit message (if it
was not already generated.)
- Rebase your commit against the latest upstream HEAD. This rebase
also causes your commits to undergo massaging from the just
downloaded commit-msg hook.
- Prompt for a Reference Id for each commit (if it was not already provided)
and include it as a "fixes: #n" tag in the commit log. You can just hit
<enter> at this prompt if your submission is purely for review
purposes.
- Push the changes for review. On a successful push, you will see a URL pointing to
the change in [Pull requests](https://github.com/gluster/glusterfs/pulls) section.
## Test cases and Verification
------------------------------
---
### Auto-triggered tests
@@ -258,13 +258,13 @@ To check and run all regression tests locally, run the below script
from glusterfs root directory.
```console
# ./run-tests.sh
./run-tests.sh
```
To run a single regression test locally, run the below command.
```console
# prove -vf <path_to_the_file>
prove -vf <path_to_the_file>
```
**NOTE:** The testing framework needs perl-Test-Harness package to be installed.
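For example, on Fedora/CentOS the package can be installed with (assuming dnf is the package manager in use):
```console
dnf install perl-Test-Harness
```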
@@ -284,18 +284,17 @@ of the feature. Please go through glusto-tests project to understand
more information on how to write and execute the tests in glusto.
1. Extend/Modify old test cases in existing scripts - This is typically
when present behavior (default values etc.) of code is changed.
when present behavior (default values etc.) of code is changed.
2. No test cases - This is typically when a code change is trivial
(e.g. fixing typos in output strings, code comments).
(e.g. fixing typos in output strings, code comments).
3. Only test case and no code change - This is typically when we are
adding test cases to old code (already existing before this regression
test policy was enforced). More details on how to work with test case
scripts can be found in tests/README.
adding test cases to old code (already existing before this regression
test policy was enforced). More details on how to work with test case
scripts can be found in tests/README.
##Reviewing / Commenting
------------------------
## Reviewing / Commenting
Code review with Github is relatively easy compared to other available
tools. Each change is presented as multiple files and each file can be
@@ -304,8 +303,7 @@ on each line by clicking on '+' icon and writing in your comments in
the text box. Such in-line comments are saved as drafts, till you
finally publish them by Starting a Review.
##Incorporate, rfc.sh, Reverify
--------------------------------------
## Incorporate, rfc.sh, Reverify
Code review comments are notified via email. After incorporating the
changes in code, you can mark each of the inline comments as 'done'
@@ -313,8 +311,9 @@ changes in code, you can mark each of the inline comments as 'done'
commits in the same branch with -
```console
# git commit -a -s
git commit -a -s
```
Push the commit by executing rfc.sh. If your previous push was an "rfc"
push (i.e., without an Issue Id) you will be prompted for an Issue Id
again. You can re-push an rfc change without any other code change too
@@ -332,8 +331,7 @@ comments can be made on the new patch as well, and the same cycle repeats.
If no further changes are necessary, the reviewer can approve the patch.
##Submission Qualifiers
-----------------------
## Submission Qualifiers
GlusterFS project follows 'Squash and Merge' method.
@@ -350,8 +348,7 @@ The project maintainer will merge the changes once a patch
meets these qualifiers. If you feel there is delay, feel free
to add a comment, discuss the same in Slack channel, or send email.
##Submission Disqualifiers
--------------------------
## Submission Disqualifiers
- +2 : is equivalent to "Approve" from the people in the maintainer's group.
- +1 : can be given by a maintainer/reviewer by explicitly stating that in the comment.

View File

@@ -2,8 +2,8 @@
Fixing easy issues is an excellent method to start contributing patches to Gluster.
Sometimes an *Easy Fix* issue has a patch attached. In those cases,
the *Patch* keyword has been added to the bug. These bugs can be
Sometimes an _Easy Fix_ issue has a patch attached. In those cases,
the _Patch_ keyword has been added to the bug. These bugs can be
used by new contributors that would like to verify their workflow. [Bug
1099645](https://bugzilla.redhat.com/1099645) is one example of those.
@@ -11,12 +11,12 @@ All such issues can be found [here](https://github.com/gluster/glusterfs/labels/
### Guidelines for new comers
- While trying to write a patch, do not hesitate to ask questions.
- If something in the documentation is unclear, we do need to know so
that we can improve it.
- There are no stupid questions, and it's more stupid to not ask
questions that others can easily answer. Always assume that if you
have a question, someone else would like to hear the answer too.
- While trying to write a patch, do not hesitate to ask questions.
- If something in the documentation is unclear, we do need to know so
that we can improve it.
- There are no stupid questions, and it's more stupid to not ask
questions that others can easily answer. Always assume that if you
have a question, someone else would like to hear the answer too.
[Reach out](https://www.gluster.org/community/) to the developers
in #gluster on [Gluster Slack](https://gluster.slack.com) channel, or on

View File

@@ -1,7 +1,6 @@
Static Code Analysis Tools
--------------------------
## Static Code Analysis Tools
Bug fixes for issues reported by *Static Code Analysis Tools* should
Bug fixes for issues reported by _Static Code Analysis Tools_ should
follow [Development Work Flow](./Development-Workflow.md)
### Coverity
@@ -9,49 +8,48 @@ follow [Development Work Flow](./Development-Workflow.md)
GlusterFS is part of [Coverity's](https://scan.coverity.com/) scan
program.
- To see Coverity issues you have to be a member of the GlusterFS
project in Coverity scan website.
- Here is the link to [Coverity scan website](https://scan.coverity.com/projects/987)
- Go to above link and subscribe to GlusterFS project (as
contributor). It will send a request to Admin for including you in
the Project.
- Once admins for the GlusterFS Coverity scan approve your request,
you will be able to see the defects raised by Coverity.
- [Issue #1060](https://github.com/gluster/glusterfs/issues/1060)
can be used as a umbrella bug for Coverity issues in master
branch unless you are trying to fix a specific issue.
- When you decide to work on some issue, please assign it to your name
in the same Coverity website. So that we don't step on each others
work.
- When marking a bug intentional in Coverity scan website, please put
an explanation for the same. So that it will help others to
understand the reasoning behind it.
- To see Coverity issues you have to be a member of the GlusterFS
project in Coverity scan website.
- Here is the link to [Coverity scan website](https://scan.coverity.com/projects/987)
- Go to above link and subscribe to GlusterFS project (as
contributor). It will send a request to Admin for including you in
the Project.
- Once admins for the GlusterFS Coverity scan approve your request,
you will be able to see the defects raised by Coverity.
- [Issue #1060](https://github.com/gluster/glusterfs/issues/1060)
  can be used as an umbrella bug for Coverity issues in the master
branch unless you are trying to fix a specific issue.
- When you decide to work on some issue, please assign it to your name
  in the same Coverity website, so that we don't step on each other's
  work.
- When marking a bug intentional in Coverity scan website, please put
an explanation for the same. So that it will help others to
understand the reasoning behind it.
*If you have more questions please send it to
_If you have more questions please send it to
[gluster-devel](https://lists.gluster.org/mailman/listinfo/gluster-devel) mailing
list*
list_
### CPP Check
Cppcheck is available in Fedora and EL's EPEL repo
- Install Cppcheck
- Install Cppcheck
# dnf install cppcheck
dnf install cppcheck
- Clone GlusterFS code
- Clone GlusterFS code
# git clone https://github.com/gluster/glusterfs
git clone https://github.com/gluster/glusterfs
- Run Cpp check
# cppcheck glusterfs/ 2>cppcheck.log
- Run Cpp check
cppcheck glusterfs/ 2>cppcheck.log
### Clang-Scan Daily Runs
We have daily runs of static source code analysis tool clang-scan on
the glusterfs sources. There are daily analyses of the master and
the glusterfs sources. There are daily analyses of the master and
on currently supported branches.
Results are posted at

View File

@@ -3,9 +3,7 @@
This page contains a list of project ideas which will be suitable for
students (for GSOC, internship etc.)
Projects/Features which needs contributors
------------------------------------------
## Projects/Features which need contributors
### RIO
@@ -13,27 +11,23 @@ Issue: https://github.com/gluster/glusterfs/issues/243
This is a new distribution logic, which can scale Gluster to 1000s of nodes.
### Composition xlator for small files
Merge small files into a designated large file using our own custom
semantics. This can improve our small file performance.
### Path based geo-replication
Issue: https://github.com/gluster/glusterfs/issues/460
This would allow remote volume to be of different type (NFS/S3 etc etc) too.
### Project Quota support
Issue: https://github.com/gluster/glusterfs/issues/184
This will make Gluster's Quota faster, and also provide desired behavior.
### Cluster testing framework based on gluster-tester
Repo: https://github.com/aravindavk/gluster-tester

View File

@@ -1,5 +1,4 @@
Simplified development workflow for GlusterFS
=============================================
# Simplified development workflow for GlusterFS
This page gives a simplified model of the development workflow used by
the GlusterFS project. This will give the steps required to get a patch
@@ -8,8 +7,7 @@ accepted into the GlusterFS source.
Visit [Development Work Flow](./Development-Workflow.md) a more
detailed description of the workflow.
##Initial preparation
---------------------
## Initial preparation
The GlusterFS development workflow revolves around
[GitHub](http://github.com/gluster/glusterfs/) and
@@ -17,13 +15,15 @@ The GlusterFS development workflow revolves around
Using these both tools requires some initial preparation.
### Get the source
Git clone the GlusterFS source using
```console
git clone git@github.com:${username}/glusterfs.git
cd glusterfs/
git remote add upstream git@github.com:gluster/glusterfs.git
```{ .console .no-copy }
git clone git@github.com:${username}/glusterfs.git
cd glusterfs/
git remote add upstream git@github.com:gluster/glusterfs.git
```
This will clone the GlusterFS source into a subdirectory named glusterfs
with the devel branch checked out.
distribution specific package manager to install git. After installation,
configure git. At a minimum, set a git user email. To set the email,
do,
```{ .console .no-copy }
git config --global user.name <name>
git config --global user.email <email address>
```
@@ -43,8 +43,7 @@ Next, install the build requirements for GlusterFS. Refer
[Building GlusterFS - Build Requirements](./Building-GlusterFS.md#build-requirements)
for the actual requirements.
## Actual development
The commands in this section are to be run inside the glusterfs source
directory.
@@ -55,23 +54,25 @@ It is recommended to use separate local development branches for each
change you want to contribute to GlusterFS. To create a development
branch, first checkout the upstream branch you want to work on and
update it. More details on the upstream branching model for GlusterFS
can be found at [Development Work Flow - Branching_policy](./Development-Workflow.md#branching-policy).
For example if you want to develop on the devel branch,
```console
git checkout devel
git pull
```
Now, create a new branch from devel and switch to the new branch. It is
recommended to have descriptive branch names. Do,
```{ .console .no-copy }
git branch issueNNNN
git checkout issueNNNN
```
or,
```{ .console .no-copy }
git checkout -b issueNNNN upstream/main
```
@@ -100,8 +101,8 @@ working GlusterFS installation and needs to be run as root. To run the
regression test suite, do
```console
make install
./run-tests.sh
```
or, after uploading the patch, the regression tests would be triggered
@@ -113,7 +114,7 @@ If you haven't broken anything, you can now commit your changes. First
identify the files that you modified/added/deleted using git-status and
stage these files.
```{ .console .no-copy }
git status
git add <list of modified files>
```
@@ -121,7 +122,7 @@ git add <list of modified files>
Now, commit these changes using
```console
git commit -s
```
Provide a meaningful commit message. The commit message policy is
@@ -134,18 +135,19 @@ sign-off the commit with your configured email.
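As an illustration only (the authoritative policy is described in the Development Workflow page linked above), a commit created with `git commit -s` typically ends up looking something like this; the subject line, body text and issue number below are placeholders:

```{ .text .no-copy }
component: short one-line summary of the change

Longer description of what the change does and why it is needed.

Fixes: #NNNN
Signed-off-by: Your Name <you@example.org>
```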
To submit your change for review, run the rfc.sh script,
```console
./rfc.sh
```
or
```{ .console .no-copy }
git push origin HEAD:issueNNN
```
More details on the rfc.sh script are available at
[Development Work Flow - rfc.sh](./Development-Workflow.md#rfc.sh).
## Review process
Your change will now be reviewed by the GlusterFS maintainers and
component owners. You can follow and take part in the review process
@@ -186,8 +188,9 @@ review comments. Build and test to see if the new changes are working.
Stage your changes and commit your new changes in new commits using,
```console
git commit -a -s
```
Now you can resubmit the commit for review using the rfc.sh script or git push.
The formal review process could take a long time. To increase chances


@@ -1,5 +1,4 @@
## How to compile GlusterFS RPMs from git source, for RHEL/CentOS, and Fedora
Creating RPMs of GlusterFS from git source is fairly easy, once you know the steps.
@@ -21,13 +20,13 @@ Specific instructions for compiling are below. If you're using:
### Preparation steps for Fedora 16-20 (only)
1. Install gcc, the python development headers, and python setuptools:

        sudo yum -y install gcc python-devel python-setuptools

2. If you're compiling GlusterFS version 3.4, then install python-swiftclient. Other GlusterFS versions don't need it:

        sudo easy_install simplejson python-swiftclient
Now follow through with the **Common Steps** part below.
@@ -35,15 +34,15 @@ Now follow through with the **Common Steps** part below.
You'll need EPEL installed first and some CentOS-specific packages. The commands below will get that done for you. After that, follow through the "Common steps" section.
1. Install EPEL first:

        curl -OL http://download.fedoraproject.org/pub/epel/5/x86_64/epel-release-5-4.noarch.rpm
        sudo yum -y install epel-release-5-4.noarch.rpm --nogpgcheck

2. Install the packages required only on CentOS 5.x:

        sudo yum -y install buildsys-macros gcc ncurses-devel \
        python-ctypes python-sphinx10 redhat-rpm-config
Now follow through with the **Common Steps** part below.
@@ -51,32 +50,31 @@ Now follow through with the **Common Steps** part below.
You'll need EPEL installed first and some CentOS-specific packages. The commands below will get that done for you. After that, follow through the "Common steps" section.
1. Install EPEL first:

        sudo yum -y install http://download.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm

2. Install the packages required only on CentOS:

        sudo yum -y install python-webob1.0 python-paste-deploy1.5 python-sphinx10 redhat-rpm-config
Now follow through with the **Common Steps** part below.
### Preparation steps for CentOS 8.x (only)
You'll need EPEL installed and then the powertools package enabled.

1. Install EPEL first:

        sudo rpm -ivh https://dl.fedoraproject.org/pub/epel/epel-release-latest-8.noarch.rpm

2. Enable the PowerTools repo and install CentOS 8.x specific packages for building the rpms.

        sudo yum --enablerepo=PowerTools install automake autoconf libtool flex bison openssl-devel \
        libxml2-devel libaio-devel libibverbs-devel librdmacm-devel readline-devel lvm2-devel \
        glib2-devel userspace-rcu-devel libcmocka-devel libacl-devel sqlite-devel fuse-devel \
        redhat-rpm-config rpcgen libtirpc-devel make python3-devel rsync libuuid-devel \
        rpm-build dbench perl-Test-Harness attr libcurl-devel selinux-policy-devel -y
Now follow through from Point 2 in the **Common Steps** part below.
@@ -84,14 +82,14 @@ Now follow through from Point 2 in the **Common Steps** part below.
You'll need EPEL installed first and some RHEL specific packages. The 2 commands below will get that done for you. After that, follow through the "Common steps" section.
1. Install EPEL first:

        sudo yum -y install http://download.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm

2. Install the packages required only on RHEL:

        sudo yum -y --enablerepo=rhel-6-server-optional-rpms install python-webob1.0 \
        python-paste-deploy1.5 python-sphinx10 redhat-rpm-config
Now follow through with the **Common Steps** part below.
@@ -104,64 +102,65 @@ These steps are for both Fedora and RHEL/CentOS. At the end you'll have the comp
- If you're on RHEL/CentOS 5.x and get a message about lvm2-devel not being available, it's ok. You can ignore it. :)
- If you're on RHEL/CentOS 6.x and get any messages about python-eventlet, python-netifaces, python-sphinx and/or pyxattr not being available, it's ok. You can ignore them. :)
- If you're on CentOS 8.x, you can skip step 1 and start from step 2. Also, for CentOS 8.x, the steps have been
tested for the master branch. It is unknown if it would work for older branches.
<br/>
1. Install the needed packages

        sudo yum -y --disablerepo=rhs* --enablerepo=*optional-rpms install git autoconf \
        automake bison dos2unix flex fuse-devel glib2-devel libaio-devel \
        libattr-devel libibverbs-devel librdmacm-devel libtool libxml2-devel lvm2-devel make \
        openssl-devel pkgconfig pyliblzma python-devel python-eventlet python-netifaces \
        python-paste-deploy python-simplejson python-sphinx python-webob pyxattr readline-devel \
        rpm-build systemtap-sdt-devel tar libcmocka-devel
2. Clone the GlusterFS git repository

        git clone git://git.gluster.org/glusterfs
        cd glusterfs
3. Choose which branch to compile

    If you want to compile the latest development code, you can skip this step and go on to the next one. :)

    If instead, you want to compile the code for a specific release of GlusterFS (such as v3.4), get the list of release names here:

        # git branch -a | grep release
        remotes/origin/release-2.0
        remotes/origin/release-3.0
        remotes/origin/release-3.1
        remotes/origin/release-3.2
        remotes/origin/release-3.3
        remotes/origin/release-3.4
        remotes/origin/release-3.5

    Then switch to the correct release using the git "checkout" command, and the name of the release after the "remotes/origin/" bit from the list above:

        git checkout release-3.4
**NOTE -** The CentOS 5.x instructions have only been tested for the master branch in GlusterFS git. It is unknown (yet) if they work for branches older than release-3.5.
***

If you are compiling the latest development code you can skip steps **4** and **5**. Instead, you can run the below command and you will get the RPMs.

    extras/LinuxRPM/make_glusterrpms

***

4. Configure and compile GlusterFS
Now you're ready to compile Gluster:
        ./autogen.sh
        ./configure --enable-fusermount
        make dist
5. Create the GlusterFS RPMs

        cd extras/LinuxRPM
        make glusterrpms
That should complete with no errors, leaving you with a directory containing the RPMs.
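As a quick sanity check (assuming the build left the packages under `extras/LinuxRPM/`, as in the steps above), you can list and, if desired, install them; the exact file names depend on the version you built:

```{ .console .no-copy }
ls extras/LinuxRPM/*.rpm
sudo yum -y install extras/LinuxRPM/glusterfs*.rpm
```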


@@ -1,47 +1,52 @@
# Get core dump on a customer set up without killing the process
### Why do we need this?
Finding the root cause of an issue that occurred in a customer/production setup is a challenging task.
Most of the time we cannot replicate/set up the environment and scenario which led to the issue on
our test setup. In such cases, we have to gather most of the information from the system where the problem
has occurred.
<br>
### What information do we look for and find useful?
Information like a core dump is very helpful for catching the root cause of an issue: we add ASSERT() in
the code at the places where we feel something is wrong and install the custom build on the affected setup.
But the issue is that ASSERT() would kill the process while producing the core dump.
<br>
### Is it a good idea to do ASSERT() on customer setup?
Remember we are seeking help from a customer setup; they are unlikely to agree to kill the process and produce the
core dump for us to root-cause it. It affects the customer's business and nobody agrees with this proposal.
<br>
### What if we have a way to produce a core dump without a kill?
Yes, Glusterfs provides a way to do this. Gluster has a customized ASSERT(), i.e. GF_ASSERT(), in place which helps
in producing the core dump without killing the associated process, and also provides a script which can be run on
the customer setup that produces the core dump without harming the running process (this presumes we already have
GF_ASSERT() at the expected place in the current build running on the customer setup. If not, we need to install a custom
build on that setup by adding GF_ASSERT()).
<br>
### Is GF_ASSERT() newly introduced in Gluster code?
No. GF_ASSERT() was already there in the codebase before this improvement. In the debug build, GF_ASSERT() kills the
process and produces the core dump, but in the production build it just logs the error and moves on. What we have done
is change the implementation, so that in the production build we also get the core dump but the process
won't be killed. For code places that GF_ASSERT() does not yet cover, please add it as per the requirement.
<br>
## Here are the steps to achieve the goal:
- Add GF_ASSERT() in the Gluster code path where you expect something wrong is happening.
- Build the Gluster code, install and mount the Gluster volume (For detailed steps refer: Gluster quick start guide).
- Now, in the other terminal, run the gfcore.py script
`# ./extras/debug/gfcore.py $PID 1 /tmp/` (PID of the gluster process you are interested in, got it by `ps -ef | grep gluster`
in the previous step. For more details, check `# ./extras/debug/gfcore.py --help`)
- Hit the code path where you have introduced GF_ASSERT(). If GF_ASSERT() is in the fuse_write() path, you can hit the code
path by writing to a file present under the Gluster mount. Ex: `# dd if=/dev/zero of=/mnt/glusterfs/abcd bs=1M count=1`
where `/mnt/glusterfs` is the gluster mount
- Go to the terminal where the gdb is running (step 3) and observe that the gdb process is terminated
- Go to the directory where the core-dump is produced. Default would be present working directory.
- Access the core dump using gdb Ex: `# gdb -ex "core-file $GFCORE_FILE" $GLUSTER_BINARY`
(1st arg would be core file name and 2nd arg is o/p of file command in the previous step)
- Observe that the Gluster process is unaffected by checking its process state. Check pid status using `ps -ef | grep gluster`
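Once the core file is loaded in gdb as in the step above, the usual inspection commands apply; this is only a minimal illustration, not part of the gfcore.py workflow itself:

```{ .console .no-copy }
gdb -ex "core-file $GFCORE_FILE" $GLUSTER_BINARY
(gdb) bt                    # backtrace of the thread that hit GF_ASSERT()
(gdb) thread apply all bt   # backtraces of all threads
(gdb) info locals           # variables in the selected frame
```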
Thanks, Xavi Hernandez (jahernan@redhat.com), for the idea. This will ease many Gluster developers'/maintainers' lives.


@@ -1,5 +1,4 @@
## GlusterFS Tools

- [glusterfind](./glusterfind.md)
- [gfind missing files](./gfind-missing-files.md)


@@ -54,15 +54,15 @@ bash gfid_to_path.sh <BRICK_PATH> <GFID_FILE>
## Things to keep in mind when running the tool
1. Running this tool can result in a crawl of the backend filesystem at each
   brick which can be intensive. To ensure there is no impact on ongoing I/O on
   RHS volumes, we recommend that this tool be run at a low I/O scheduling class
   (best-effort) and priority.

        ionice -c 2 -p <pid of gfind_missing_files.sh>
2. We do not recommend interrupting the tool when it is running
   (e.g. by doing CTRL^C). It is better to wait for the tool to finish
   execution. In case it is interrupted, manually unmount the Slave Volume.

        umount <MOUNT_POINT>


@@ -6,11 +6,23 @@ This tool should be run in one of the node, which will get Volume info and gets
## Session Management
Create a glusterfind session to remember the time when the last sync or processing completed. For example, your backup application runs every day and gets incremental results on each run. The tool maintains sessions in `$GLUSTERD_WORKDIR/glusterfind/`; for each session it creates a directory and, under it, a sub-directory with the Volume name. (The default working directory is /var/lib/glusterd; in some systems this location may change. To find the working directory location run
```console
grep working-directory /etc/glusterfs/glusterd.vol
```
or
```console
grep working-directory /usr/local/etc/glusterfs/glusterd.vol
```
if you installed from the source.)
For example, if the session name is "backup" and the volume name is "datavol", then the tool creates `$GLUSTERD_WORKDIR/glusterfind/backup/datavol`. From now on we refer to this directory as `$SESSION_DIR`.
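For instance, on a typical installation you could confirm that the session was created by listing that directory; the exact files kept inside it vary between glusterfind versions, so treat this only as a quick check:

```{ .console .no-copy }
ls /var/lib/glusterd/glusterfind/backup/datavol/
```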
```{ .text .no-copy }
create => pre => post => [delete]
```
@@ -34,13 +46,13 @@ Incremental find uses Changelogs to get the list of GFIDs modified/created. Any
If we set the build-pgfid option on the Volume, GlusterFS starts recording each file's parent directory GFID as an xattr on the file on any ENTRY fop.
```{ .text .no-copy }
trusted.pgfid.<GFID>=NUM_LINKS
```
To convert from GFID to path, we can mount the Volume with the aux-gfid-mount option, and get the path information with a getfattr query.
```{ .console .no-copy }
getfattr -n glusterfs.ancestry.path -e text /mnt/datavol/.gfid/<GFID>
```
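Putting the two together, a minimal sketch of the GFID-to-path lookup could look like the following; the volume name `datavol`, the server `server1` and the mount point are assumptions used only for illustration:

```{ .console .no-copy }
gluster volume set datavol build-pgfid on
mount -t glusterfs -o aux-gfid-mount server1:datavol /mnt/datavol
getfattr -n glusterfs.ancestry.path -e text /mnt/datavol/.gfid/<GFID>
```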
@@ -54,7 +66,7 @@ Tool collects the list of GFIDs failed to convert with above method and does a f
### Create the session
```{ .console .no-copy }
glusterfind create SESSION_NAME VOLNAME [--force]
glusterfind create --help
```
Where SESSION_NAME is any name without spaces, used to identify the session when run a second time.
Examples,
```{ .console .no-copy }
# glusterfind create --help
# glusterfind create backup datavol
# glusterfind create antivirus_scanner datavol
@@ -72,7 +84,7 @@ Examples,
### Pre Command
```{ .console .no-copy }
glusterfind pre SESSION_NAME VOLUME_NAME OUTFILE
glusterfind pre --help
```
@@ -83,7 +95,7 @@ To trigger the full find, call the pre command with `--full` argument. Multiple
Examples,
```{ .console .no-copy }
# glusterfind pre backup datavol /root/backup.txt
# glusterfind pre backup datavol /root/backup.txt --full
@@ -97,27 +109,27 @@ Examples,
Output file contains list of files/dirs relative to the Volume mount, if we need to prefix with any path to have absolute path then,
```console
glusterfind pre backup datavol /root/backup.txt --file-prefix=/mnt/datavol/
```
### List Command
To get the list of sessions and respective session time,
```{ .console .no-copy }
glusterfind list [--session SESSION_NAME] [--volume VOLUME_NAME]
```
Examples,
```{ .console .no-copy }
# glusterfind list
# glusterfind list --session backup
```
Example output,
```{ .text .no-copy }
SESSION VOLUME SESSION TIME
---------------------------------------------------------------------------
backup datavol 2015-03-04 17:35:34
@@ -125,26 +137,26 @@ backup datavol 2015-03-04 17:35:34
### Post Command
```{ .console .no-copy }
glusterfind post SESSION_NAME VOLUME_NAME
```
Examples,
```console
glusterfind post backup datavol
```
### Delete Command
```{ .console .no-copy }
glusterfind delete SESSION_NAME VOLUME_NAME
```
Examples,
```console
glusterfind delete backup datavol
```
## Adding more Crawlers
@@ -170,7 +182,7 @@ Custom crawler can be executable script/binary which accepts volume name, brick
For example,
```{ .console .no-copy }
/root/parallelbrickcrawl SESSION_NAME VOLUME BRICK_PATH OUTFILE START_TIME [--debug]
```
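As a rough sketch of such a crawler (not a supported interface; the exact arguments and the expected output format should be checked against the glusterfind version in use), a trivial shell implementation that emits files changed since START_TIME could look like this:

```sh
#!/bin/bash
# Illustrative only: list files under BRICK_PATH modified after START_TIME
# (epoch seconds), skipping the internal .glusterfs directory.
SESSION_NAME=$1
VOLUME=$2
BRICK_PATH=$3
OUTFILE=$4
START_TIME=$5

find "$BRICK_PATH" -path "$BRICK_PATH/.glusterfs" -prune -o \
     -newermt "@$START_TIME" -type f -print > "$OUTFILE"
```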


@@ -3,6 +3,7 @@
For Gluster to communicate within a cluster, either the firewalls
have to be turned off or communication has to be enabled for each server.
```{ .console .no-copy }
iptables -I INPUT -p all -s `<ip-address>` -j ACCEPT
```
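On distributions that run firewalld instead of plain iptables, and that ship the predefined glusterfs service definition, an equivalent (hedged) alternative would be:

```console
firewall-cmd --permanent --add-service=glusterfs
firewall-cmd --reload
```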
@@ -115,14 +116,12 @@ Brick3: node03.yourdomain.net:/export/sdb1/brick
```
This shows us essentially what we just specified during the volume
creation. The one key output worth noticing is `Status`.
A status of `Created` means that the volume has been created,
but hasn't yet been started, which would cause any attempt to mount the volume to fail.
Now, we should start the volume before we try to mount it.
```console
gluster volume start gv0
```
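If you want to double-check before mounting, the status should now read `Started`; the hostname below is just the one assumed for this example setup:

```{ .console .no-copy }
gluster volume info gv0 | grep Status
mount -t glusterfs node01.yourdomain.net:/gv0 /mnt
```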
Find all documentation [here](../index.md)


@@ -6,7 +6,7 @@ planning but the growth has mostly been ad-hoc and need-based.
Central to the plan of revitalizing the Gluster.org community is the ability to
provide well-maintained infrastructure services with predictable uptimes and
resilience. We're migrating the existing services into the Community Cage. The
implied objective is that the transition would open up ways and means of the
formation of a loose coalition among Infrastructure Administrators who provide
expertise and guidance to the community projects within the OSAS team.


@@ -1,23 +1,24 @@
## Tools We Use
| Service/Tool         |                Purpose                 |       Hosted At |
| :------------------- | :------------------------------------: | --------------: |
| Github               |              Code Review               |          Github |
| Jenkins              |      CI, build-verification-test       | Temporary Racks |
| Backups              |   Website, Gerrit and Jenkins backup   |       Rackspace |
| Docs                 |         Documentation content          |      mkdocs.org |
| download.gluster.org | Official download site of the binaries |       Rackspace |
| Mailman              |             Lists mailman              |       Rackspace |
| www.gluster.org      |               Web asset                |       Rackspace |
## Notes
- download.gluster.org: Resiliency is important for availability and metrics.
  Since it's the official download site, access needs to be restricted as much as possible.
  Only the few developers building the community packages have access. Anyone who requires
  access can raise an issue at [gluster/project-infrastructure](https://github.com/gluster/project-infrastructure/issues/new)
  with a valid reason.
- Mailman: Should be migrated to a separate host. Should be made more redundant
  (i.e., more than 1 MX).
- www.gluster.org: Framework and artifacts now exist under gluster.github.com. Has
  various legacy installations of software (mediawiki, etc.), which are being cleaned up as
  we find them.


@@ -1,9 +1,8 @@
## Troubleshooting Guide
This guide describes some commonly seen issues and steps to recover from them.
If that doesn't help, reach out to the [Gluster community](https://www.gluster.org/community/), in which case the guide also describes what information needs to be provided in order to debug the issue. At a minimum, we need the version of gluster running and the output of `gluster volume info`.
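A minimal set of commands for collecting that information (run on any node of the cluster) would be something like:

```console
gluster --version
gluster volume info
gluster volume status
```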
### Where Do I Start?
Is the issue already listed in the component specific troubleshooting sections?
@@ -15,7 +14,6 @@ Is the issue already listed in the component specific troubleshooting sections?
- [Gluster NFS Issues](./troubleshooting-gnfs.md)
- [File Locks](./troubleshooting-filelocks.md)
If that didn't help, here is how to debug further.
Identifying the problem and getting the necessary information to diagnose it is the first step in troubleshooting your Gluster setup. As Gluster operations involve interactions between multiple processes, this can involve multiple steps.
@@ -25,5 +23,3 @@ Identifying the problem and getting the necessary information to diagnose it is
- An operation failed
- [High Memory Usage](./troubleshooting-memory.md)
- [A Gluster process crashed](./gluster-crash.md)


@@ -8,24 +8,26 @@ normal filesystem. The GFID of a file is stored in its xattr named
#### Special mount using gfid-access translator:
```console
mount -t glusterfs -o aux-gfid-mount vm1:test /mnt/testvol
```
Assuming, you have `GFID` of a file from changelog (or somewhere else).
For trying this out, you can get `GFID` of a file from mountpoint:
```console
getfattr -n glusterfs.gfid.string /mnt/testvol/dir/file
```
---
### Get file path from GFID (Method 1):
**(Lists hardlinks delimited by `:`, returns path as seen from mountpoint)**
#### Turn on build-pgfid option
```console
gluster volume set test build-pgfid on
```
Read virtual xattr `glusterfs.ancestry.path` which contains the file path
@@ -36,7 +38,7 @@ getfattr -n glusterfs.ancestry.path -e text /mnt/testvol/.gfid/<GFID>
**Example:**
```{ .console .no-copy }
[root@vm1 glusterfs]# ls -il /mnt/testvol/dir/
total 1
10610563327990022372 -rw-r--r--. 2 root root 3 Jul 17 18:05 file
@@ -54,6 +56,7 @@ glusterfs.ancestry.path="/dir/file:/dir/file3"
```
### Get file path from GFID (Method 2):
**(Does not list all hardlinks, returns backend brick path)**
```console
@@ -70,4 +73,5 @@ trusted.glusterfs.pathinfo="(<DISTRIBUTE:test-dht> <POSIX(/mnt/brick-test/b):vm1
```
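The query that produces output like the above is a getfattr call on the same aux-gfid mount; a sketch (the mount point and GFID are placeholders) is:

```{ .console .no-copy }
getfattr -n trusted.glusterfs.pathinfo -e text /mnt/testvol/.gfid/<GFID>
```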
#### References and links:
[posix: placeholders for GFID to path conversion](http://review.gluster.org/5951)


@@ -1,13 +1,11 @@
# Debugging a Crash
To find out why a Gluster process terminated abruptly, we need the following:
- A coredump of the process that crashed
- The exact version of Gluster that is running
- The Gluster log files
- The output of `gluster volume info`
- Steps to reproduce the crash if available
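On systemd-based distributions that run systemd-coredump, a hedged sketch for gathering the first two items in the list above could be:

```{ .console .no-copy }
glusterfs --version
coredumpctl list | grep gluster
coredumpctl dump --output=/tmp/gluster-core <PID>
```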
Contact the [community](https://www.gluster.org/community/) with this information or [open an issue](https://github.com/gluster/glusterfs/issues/new)


@@ -1,6 +1,6 @@
# Heal info and split-brain resolution

This document explains the heal info command available in gluster for monitoring pending heals in replicate volumes and the methods available to resolve split-brains.
## Types of Split-Brains:
@@ -9,26 +9,27 @@ is the correct one.
There are three types of split-brains:
- Data split-brain: The data in the file differs on the bricks in the replica set
- Metadata split-brain: The metadata differs on the bricks
- Entry split-brain: The GFID of the file is different on the bricks in the replica or the type of the file is different on the bricks in the replica. Type-mismatch cannot be healed using any of the split-brain resolution methods while gfid split-brains can be.
## 1) Volume heal info:
Usage: `gluster volume heal <VOLNAME> info`
This lists all the files that require healing (and will be processed by the self-heal daemon). It prints either their path or their GFID.
### Interpreting the output
All the files listed in the output of this command need to be healed.
The files listed may also be accompanied by the following tags:
a) 'Is in split-brain'
A file in data or metadata split-brain will
be listed with " - Is in split-brain" appended after its path/GFID. E.g.
"/file4" in the output provided below. However, for a file in GFID split-brain,
the parent directory of the file is shown to be in split-brain and the file
itself is shown to be needing healing, e.g. "/dir" in the output provided below
is in split-brain because of GFID split-brain of file "/dir/a".
Files in split-brain cannot be healed without resolving the split-brain.
@@ -36,11 +37,13 @@ b) 'Is possibly undergoing heal'
When the heal info command is run, it (or to be more specific, the 'glfsheal' binary that is executed when you run the command) takes locks on each file to find if it needs healing. However, if the self-heal daemon had already started healing the file, it would have taken locks which glfsheal wouldn't be able to acquire. In such a case, it could print this message. Another possible case could be multiple glfsheal processes running simultaneously (e.g. multiple users ran a heal info command at the same time) and competing for same lock.
The following is an example of heal info command's output.
### Example
Consider a replica volume "test" with two bricks b1 and b2;
self-heal daemon off, mounted at /mnt.
```{ .console .no-copy }
# gluster volume heal test info
Brick \<hostname:brickpath-b1>
<gfid:aaca219f-0e25-4576-8689-3bfd93ca70c2> - Is in split-brain
@@ -63,24 +66,27 @@ Number of entries: 6
```
### Analysis of the output
It can be seen that

A) from brick b1, four entries need healing:

- file with gfid:6dc78b20-7eb6-49a3-8edb-087b90142246 needs healing
- "aaca219f-0e25-4576-8689-3bfd93ca70c2", "39f301ae-4038-48c2-a889-7dac143e82dd" and "c3c94de2-232d-4083-b534-5da17fc476ac" are in split-brain

B) from brick b2 six entries need healing-

- "a", "file2" and "file3" need healing
- "file1", "file4" & "/dir" are in split-brain
# 2. Volume heal info split-brain
Usage: `gluster volume heal <VOLNAME> info split-brain`
This command only shows the list of files that are in split-brain. The output is therefore a subset of `gluster volume heal <VOLNAME> info`
### Example
```{ .console .no-copy }
# gluster volume heal test info split-brain
Brick <hostname:brickpath-b1>
<gfid:aaca219f-0e25-4576-8689-3bfd93ca70c2>
@@ -95,19 +101,22 @@ Brick <hostname:brickpath-b2>
Number of entries in split-brain: 3
```
Note that similar to the heal info command, for GFID split-brains (same filename but different GFID)
their parent directories are listed to be in split-brain.
# 3. Resolution of split-brain using gluster CLI
Once the files in split-brain are identified, their resolution can be done
from the gluster command line using various policies. Type-mismatch cannot be healed using these methods. Split-brain resolution commands let the user resolve data, metadata, and GFID split-brains.
## 3.1 Resolution of data/metadata split-brain using gluster CLI
Data and metadata split-brains can be resolved using the following policies:
## i) Select the bigger-file as source
This command is useful for per-file healing where it is known/decided that the
file with the bigger size is to be considered as the source.
`gluster volume heal <VOLNAME> split-brain bigger-file <FILE>`
Here, `<FILE>` can be either the full file name as seen from the root of the volume
(or) the GFID-string representation of the file, which sometimes gets displayed
@@ -115,13 +124,14 @@ in the heal info command's output. Once this command is executed, the replica co
size is found and healing is completed with that brick as a source.
### Example :
Consider the earlier output of the heal info split-brain command.
Before healing the file, notice the file size and md5 checksums:
On brick b1:
```{ .console .no-copy }
[brick1]# stat b1/dir/file1
File: b1/dir/file1
Size: 17 Blocks: 16 IO Block: 4096 regular file
@@ -138,7 +148,7 @@ Change: 2015-03-06 13:55:37.206880347 +0530
On brick b2:
```{ .console .no-copy }
[brick2]# stat b2/dir/file1
File: b2/dir/file1
Size: 13 Blocks: 16 IO Block: 4096 regular file
@@ -153,7 +163,7 @@ Change: 2015-03-06 13:52:22.910758923 +0530
cb11635a45d45668a403145059c2a0d5 b2/dir/file1
```
**Healing file1 using the above command** :-
`gluster volume heal test split-brain bigger-file /dir/file1`
Healed /dir/file1.
@@ -161,7 +171,7 @@ After healing is complete, the md5sum and file size on both bricks should be the
On brick b1:
```{ .console .no-copy }
[brick1]# stat b1/dir/file1
File: b1/dir/file1
Size: 17 Blocks: 16 IO Block: 4096 regular file
@@ -178,7 +188,7 @@ Change: 2015-03-06 14:17:12.880343950 +0530
On brick b2:
```{ .console .no-copy }
[brick2]# stat b2/dir/file1
File: b2/dir/file1
Size: 17 Blocks: 16 IO Block: 4096 regular file
@@ -195,7 +205,7 @@ Change: 2015-03-06 14:17:12.881343955 +0530
## ii) Select the file with the latest mtime as source
```{ .console .no-copy }
gluster volume heal <VOLNAME> split-brain latest-mtime <FILE>
```
@@ -203,20 +213,21 @@ As is perhaps self-explanatory, this command uses the brick having the latest mo
## iii) Select one of the bricks in the replica as the source for a particular file
```{ .console .no-copy }
gluster volume heal <VOLNAME> split-brain source-brick <HOSTNAME:BRICKNAME> <FILE>
```
Here, `<HOSTNAME:BRICKNAME>` is selected as source brick and `<FILE>` present in the source brick is taken as the source for healing.
### Example :
Notice the md5 checksums and file size before and after healing.
Before heal :
On brick b1:
```{ .console .no-copy }
[brick1]# stat b1/file4
File: b1/file4
Size: 4 Blocks: 16 IO Block: 4096 regular file
@@ -233,7 +244,7 @@ b6273b589df2dfdbd8fe35b1011e3183 b1/file4
On brick b2:
```{ .console .no-copy }
[brick2]# stat b2/file4
File: b2/file4
Size: 4 Blocks: 16 IO Block: 4096 regular file
@@ -251,7 +262,7 @@ Change: 2015-03-06 13:52:35.769833142 +0530
**Healing the file with gfid c3c94de2-232d-4083-b534-5da17fc476ac using the above command** :
```console
gluster volume heal test split-brain source-brick test-host:/test/b1 gfid:c3c94de2-232d-4083-b534-5da17fc476ac
```
Healed gfid:c3c94de2-232d-4083-b534-5da17fc476ac.
@@ -260,7 +271,7 @@ After healing :
On brick b1:
```{ .console .no-copy }
# stat b1/file4
File: b1/file4
Size: 4 Blocks: 16 IO Block: 4096 regular file
@@ -276,7 +287,7 @@ b6273b589df2dfdbd8fe35b1011e3183 b1/file4
On brick b2:
```{ .console .no-copy }
# stat b2/file4
File: b2/file4
Size: 4 Blocks: 16 IO Block: 4096 regular file
@@ -292,7 +303,7 @@ b6273b589df2dfdbd8fe35b1011e3183 b2/file4
## iv) Select one brick of the replica as the source for all files
```{ .console .no-copy }
gluster volume heal <VOLNAME> split-brain source-brick <HOSTNAME:BRICKNAME>
```
@@ -301,9 +312,10 @@ replica pair is source. As the result of the above command all split-brained
files in `<HOSTNAME:BRICKNAME>` are selected as source and healed to the sink.
### Example:
Consider a volume having three entries "a, b and c" in split-brain.
```{ .console .no-copy }
# gluster volume heal test split-brain source-brick test-host:/test/b1
Healed gfid:944b4764-c253-4f02-b35f-0d0ae2f86c0f.
Healed gfid:3256d814-961c-4e6e-8df2-3a3143269ced.
@@ -312,19 +324,24 @@ Number of healed entries: 3
```
# 3.2 Resolution of GFID split-brain using gluster CLI
GFID split-brains can also be resolved by the gluster command line using the same policies that are used to resolve data and metadata split-brains.
## i) Selecting the bigger-file as source
This method is useful for per-file healing, where you can decide that the file with the bigger size is to be considered as the source.
Run the following command to obtain the path of the file that is in split-brain:
```{ .console .no-copy }
# gluster volume heal VOLNAME info split-brain
```
From the output, identify the files for which file operations performed from the client failed with input/output error.
### Example :
```{ .console .no-copy }
# gluster volume heal testvol info
Brick 10.70.47.45:/bricks/brick2/b0
/f5
@@ -340,19 +357,22 @@ Brick 10.70.47.144:/bricks/brick2/b1
Status: Connected
Number of entries: 2
```
> **Note**
> Entries which are in GFID split-brain may not always be shown as being in split-brain by the heal info or heal info split-brain commands. For entry split-brains, it is the parent directory which is shown as being in split-brain. So one might need to run info split-brain to get the dir names and then heal info to get the list of files under that dir which might be in split-brain (it could just be needing heal without split-brain).
In the above command, testvol is the volume name, b0 and b1 are the bricks.
Execute the below getfattr command on the brick to find out whether a file is in GFID split-brain or not.
```{ .console .no-copy }
# getfattr -d -e hex -m. <path-to-file>
```
### Example :
On brick /b0
```{ .console .no-copy }
# getfattr -d -m . -e hex /bricks/brick2/b0/f5
getfattr: Removing leading '/' from absolute path names
file: bricks/brick2/b0/f5
@@ -364,7 +384,8 @@ trusted.gfid2path.9cde09916eabc845=0x30303030303030302d303030302d303030302d30303
```
On brick /b1
```{ .console .no-copy }
# getfattr -d -m . -e hex /bricks/brick2/b1/f5
getfattr: Removing leading '/' from absolute path names
file: bricks/brick2/b1/f5
@@ -379,7 +400,8 @@ You can notice the difference in GFID for the file f5 in both the bricks.
You can find the differences in the file size by executing stat command on the file from the bricks.
On brick /b0
```{ .console .no-copy }
# stat /bricks/brick2/b0/f5
File: /bricks/brick2/b0/f5
Size: 15 Blocks: 8 IO Block: 4096 regular file
@@ -393,7 +415,8 @@ Birth: -
```
On brick /b1
```{ .console .no-copy }
# stat /bricks/brick2/b1/f5
File: /bricks/brick2/b1/f5
Size: 2 Blocks: 8 IO Block: 4096 regular file
@@ -408,12 +431,13 @@ Birth: -
Execute the following command along with the full filename as seen from the root of the volume which is displayed in the heal info command's output:
```{ .console .no-copy }
# gluster volume heal VOLNAME split-brain bigger-file FILE
```
### Example :
```{ .console .no-copy }
# gluster volume heal testvol split-brain bigger-file /f5
GFID split-brain resolved for file /f5
```
@@ -421,7 +445,8 @@ GFID split-brain resolved for file /f5
After the healing is complete, the GFID of the file on both the bricks must be the same as that of the file which had the bigger size. The following is a sample output of the getfattr command after completion of healing the file.
On brick /b0
```{ .console .no-copy }
# getfattr -d -m . -e hex /bricks/brick2/b0/f5
getfattr: Removing leading '/' from absolute path names
file: bricks/brick2/b0/f5
@@ -431,7 +456,8 @@ trusted.gfid2path.9cde09916eabc845=0x30303030303030302d303030302d303030302d30303
```
On brick /b1
```{ .console .no-copy }
# getfattr -d -m . -e hex /bricks/brick2/b1/f5
getfattr: Removing leading '/' from absolute path names
file: bricks/brick2/b1/f5
@@ -441,14 +467,16 @@ trusted.gfid2path.9cde09916eabc845=0x30303030303030302d303030302d303030302d30303
```
## ii) Selecting the file with latest mtime as source
This method is useful for per-file healing, when you want the file with the latest mtime to be considered as the source.
### Example :
Let's take another file which is in GFID split-brain and try to heal that using the latest-mtime option.
On brick /b0
```{ .console .no-copy }
# getfattr -d -m . -e hex /bricks/brick2/b0/f4
getfattr: Removing leading '/' from absolute path names
file: bricks/brick2/b0/f4
@@ -460,7 +488,8 @@ trusted.gfid2path.364f55367c7bd6f4=0x30303030303030302d303030302d303030302d30303
```
On brick /b1
```{ .console .no-copy }
# getfattr -d -m . -e hex /bricks/brick2/b1/f4
getfattr: Removing leading '/' from absolute path names
file: bricks/brick2/b1/f4
@@ -475,7 +504,8 @@ You can notice the difference in GFID for the file f4 in both the bricks.
You can find the difference in the modification time by executing stat command on the file from the bricks.
On brick /b0
```{ .console .no-copy }
# stat /bricks/brick2/b0/f4
File: /bricks/brick2/b0/f4
Size: 14 Blocks: 8 IO Block: 4096 regular file
@@ -489,7 +519,8 @@ Birth: -
```
On brick /b1
```{ .console .no-copy }
# stat /bricks/brick2/b1/f4
File: /bricks/brick2/b1/f4
Size: 2 Blocks: 8 IO Block: 4096 regular file
@@ -503,12 +534,14 @@ Birth: -
```
Execute the following command:
```{ .console .no-copy }
# gluster volume heal VOLNAME split-brain latest-mtime FILE
```
### Example :
```{ .console .no-copy }
# gluster volume heal testvol split-brain latest-mtime /f4
GFID split-brain resolved for file /f4
```
@@ -516,7 +549,9 @@ GFID split-brain resolved for file /f4
After the healing is complete, the GFID of the files on both bricks must be same. The following is a sample output of the getfattr command after completion of healing the file. You can notice that the file has been healed using the brick having the latest mtime as the source.
On brick /b0
```{ .console .no-copy }
# getfattr -d -m . -e hex /bricks/brick2/b0/f4
getfattr: Removing leading '/' from absolute path names
file: bricks/brick2/b0/f4
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
@@ -525,7 +560,8 @@ trusted.gfid2path.364f55367c7bd6f4=0x30303030303030302d303030302d303030302d30303
```
On brick /b1
```{ .console .no-copy }
# getfattr -d -m . -e hex /bricks/brick2/b1/f4
getfattr: Removing leading '/' from absolute path names
file: bricks/brick2/b1/f4
@@ -535,13 +571,16 @@ trusted.gfid2path.364f55367c7bd6f4=0x30303030303030302d303030302d303030302d30303
```
## iii) Select one of the bricks in the replica as source for a particular file
This method is useful for per-file healing, when you know which copy of the file is good.
### Example :
Let's take another file which is in GFID split-brain and try to heal that using the source-brick option.
On brick /b0
```{ .console .no-copy }
# getfattr -d -m . -e hex /bricks/brick2/b0/f3
getfattr: Removing leading '/' from absolute path names
file: bricks/brick2/b0/f3
@@ -553,7 +592,8 @@ trusted.gfid2path.364f55367c7bd6f4=0x30303030303030302d303030302d303030302d30303
```
On brick /b1
```{ .console .no-copy }
# getfattr -d -m . -e hex /bricks/brick2/b1/f3
getfattr: Removing leading '/' from absolute path names
file: bricks/brick2/b0/f3
@@ -567,14 +607,16 @@ trusted.gfid2path.364f55367c7bd6f4=0x30303030303030302d303030302d303030302d30303
You can notice the difference in GFID for the file f3 in both the bricks.
Execute the following command:
```{ .console .no-copy }
# gluster volume heal VOLNAME split-brain source-brick HOSTNAME:export-directory-absolute-path FILE
```
In this command, FILE present in HOSTNAME : export-directory-absolute-path is taken as source for healing.
### Example :
```{ .console .no-copy }
# gluster volume heal testvol split-brain source-brick 10.70.47.144:/bricks/brick2/b1 /f3
GFID split-brain resolved for file /f3
```
@@ -582,7 +624,8 @@ GFID split-brain resolved for file /f3
After the healing is complete, the GFID of the file on both the bricks should be same as that of the brick which was chosen as source for healing. The following is a sample output of the getfattr command after the file is healed.
On brick /b0
```{ .console .no-copy }
# getfattr -d -m . -e hex /bricks/brick2/b0/f3
getfattr: Removing leading '/' from absolute path names
file: bricks/brick2/b0/f3
@@ -592,7 +635,8 @@ trusted.gfid2path.364f55367c7bd6f4=0x30303030303030302d303030302d303030302d30303
```
On brick /b1
```{ .console .no-copy }
# getfattr -d -m . -e hex /bricks/brick2/b1/f3
getfattr: Removing leading '/' from absolute path names
file: bricks/brick2/b1/f3
@@ -602,19 +646,22 @@ trusted.gfid2path.364f55367c7bd6f4=0x30303030303030302d303030302d303030302d30303
```
> **Note**
> - One cannot use the GFID of the file as an argument with any of the CLI options to resolve GFID split-brain. It should be the absolute path as seen from the mount point to the file considered as source.
>
> - With the source-brick option there is no way to resolve all the GFID split-brains in one shot by not specifying any file path in the CLI, as is done while resolving data or metadata split-brain. For each file in GFID split-brain, run the CLI with the policy you want to use.
>
> - Resolving directory GFID split-brain using CLI with the "source-brick" option in a "distributed-replicated" volume needs to be done on all the sub-volumes explicitly, which are in this state. Since directories get created on all the sub-volumes, using one particular brick as source for directory GFID split-brain heals the directory for that particular sub-volume. Source brick should be chosen in such a way that after heal all the bricks of all the sub-volumes have the same GFID.
## Note:
As mentioned earlier, type-mismatch cannot be resolved using the CLI. Type-mismatch means different st_mode values (for example, the entry is a file in one brick while it is a directory on the other). Trying to heal such an entry would fail.
### Example
The entry named "entry1" is of different types on the bricks of the replica. Lets try to heal that using the split-brain CLI.
```{ .console .no-copy }
# gluster volume heal test split-brain source-brick test-host:/test/b1 /entry1
Healing /entry1 failed:Operation not permitted.
Volume heal failed.
@@ -623,22 +670,23 @@ Volume heal failed.
However, they can be fixed by deleting the file from all but one of the bricks. See [Fixing Directory entry split-brain](#dir-split-brain)
# An overview of how the heal info commands work
When these commands are invoked, a "glfsheal" process is spawned which reads
the entries from the various sub-directories under `/<brick-path>/.glusterfs/indices/` of all
the bricks that are up (that it can connect to) one after another. These
entries are GFIDs of files that might need healing. Once GFID entries from a
brick are obtained, based on the lookup response of this file on each
participating brick of the replica pair & the trusted.afr.\* extended attributes, it is
determined whether the file needs healing, is in split-brain, etc., based on the
requirement of each command and displayed to the user.
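For instance, on any one brick you can see these candidate entries directly; the path below assumes the conventional `xattrop` index directory and is meant only as an illustration:

```{ .console .no-copy }
ls /<brick-path>/.glusterfs/indices/xattrop/
```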
# 4. Resolution of split-brain from the mount point
A set of getfattr and setfattr commands have been provided to detect the data and metadata split-brain status of a file and resolve split-brain, if any, from mount point.
Consider a volume "test", having bricks b0, b1, b2 and b3.
```{ .console .no-copy }
# gluster volume info test
Volume Name: test
@@ -656,7 +704,7 @@ Brick4: test-host:/test/b3
Directory structure of the bricks is as follows:
```{ .console .no-copy }
# tree -R /test/b?
/test/b0
├── dir
@@ -683,7 +731,7 @@ Directory structure of the bricks is as follows:
Some files in the volume are in split-brain.
```{ .console .no-copy }
# gluster v heal test info split-brain
Brick test-host:/test/b0/
/file100
@@ -708,7 +756,7 @@ Number of entries in split-brain: 2
### To know data/metadata split-brain status of a file:
```{ .console .no-copy }
getfattr -n replica.split-brain-status <path-to-file>
```
@@ -716,50 +764,52 @@ The above command executed from mount provides information if a file is in data/
This command is not applicable to gfid/directory split-brain.
### Example:
1. "file100" is in metadata split-brain. Executing the above mentioned command for file100 gives :

```{ .console .no-copy }
# getfattr -n replica.split-brain-status file100
file: file100
replica.split-brain-status="data-split-brain:no metadata-split-brain:yes Choices:test-client-0,test-client-1"
```
2. "file1" is in data split-brain.

```{ .console .no-copy }
# getfattr -n replica.split-brain-status file1
file: file1
replica.split-brain-status="data-split-brain:yes metadata-split-brain:no Choices:test-client-2,test-client-3"
```
3. "file99" is in both data and metadata split-brain.

```{ .console .no-copy }
# getfattr -n replica.split-brain-status file99
file: file99
replica.split-brain-status="data-split-brain:yes metadata-split-brain:yes Choices:test-client-2,test-client-3"
```
4) "dir" is in directory split-brain but as mentioned earlier, the above command is not applicable to such split-brain. So it says that the file is not under data or metadata split-brain.
4. "dir" is in directory split-brain but as mentioned earlier, the above command is not applicable to such split-brain. So it says that the file is not under data or metadata split-brain.
```{ .console .no-copy }
# getfattr -n replica.split-brain-status dir
file: dir
replica.split-brain-status="The file is not under data or metadata split-brain"
```
5) "file2" is not in any kind of split-brain.
5. "file2" is not in any kind of split-brain.
```{ .console .no-copy }
# getfattr -n replica.split-brain-status file2
file: file2
replica.split-brain-status="The file is not under data or metadata split-brain"
```
### To analyze the files in data and metadata split-brain
Trying to do operations (say cat, getfattr, etc.) from the mount on files in split-brain gives an input/output error. To enable users to analyze such files, a setfattr command is provided.
```{ .console .no-copy }
# setfattr -n replica.split-brain-choice -v "choiceX" <path-to-file>
```
@@ -767,9 +817,9 @@ Using this command, a particular brick can be chosen to access the file in split
### Example:
1) "file1" is in data-split-brain. Trying to read from the file gives input/output error.
1. "file1" is in data-split-brain. Trying to read from the file gives input/output error.
```{ .console .no-copy }
# cat file1
cat: file1: Input/output error
```
@@ -778,13 +828,13 @@ Split-brain choices provided for file1 were test-client-2 and test-client-3.
Setting test-client-2 as split-brain choice for file1 serves reads from b2 for the file.
```{ .console .no-copy }
# setfattr -n replica.split-brain-choice -v test-client-2 file1
```
Now, read operations on the file can be done.
```{ .console .no-copy }
# cat file1
xyz
```
@@ -793,18 +843,18 @@ Similarly, to inspect the file from other choice, replica.split-brain-choice is
Trying to inspect the file from a wrong choice errors out.
To undo the split-brain-choice that has been set, the above-mentioned setfattr command can be used
with "none" as the value for the extended attribute.
### Example:
```{ .console .no-copy }
# setfattr -n replica.split-brain-choice -v none file1
```
Now performing cat operation on the file will again result in input/output error, as before.
```{ .console .no-copy }
# cat file1
cat: file1: Input/output error
```
@@ -812,13 +862,13 @@ cat: file1: Input/output error
Once the choice for resolving split-brain is made, the source brick must be set for the healing to be done.
This is done using the following command:
```{ .console .no-copy }
# setfattr -n replica.split-brain-heal-finalize -v <heal-choice> <path-to-file>
```
### Example:
```{ .console .no-copy }
# setfattr -n replica.split-brain-heal-finalize -v test-client-2 file1
```
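Putting the mount-based steps together for the earlier example of file1, a minimal end-to-end sketch (run from the mount point; the brick choice test-client-2 is simply the one picked above) could look like this:

```console
getfattr -n replica.split-brain-status file1                          # list the available choices
setfattr -n replica.split-brain-choice -v test-client-2 file1         # inspect the copy on that brick
cat file1                                                             # verify this copy looks sane
setfattr -n replica.split-brain-heal-finalize -v test-client-2 file1  # mark it as the heal source
```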
@@ -826,18 +876,19 @@ The above process can be used to resolve data and/or metadata split-brain on all
**NOTE**:
1) If "fopen-keep-cache" fuse mount option is disabled then inode needs to be invalidated each time before selecting a new replica.split-brain-choice to inspect a file. This can be done by using:
1. If "fopen-keep-cache" fuse mount option is disabled then inode needs to be invalidated each time before selecting a new replica.split-brain-choice to inspect a file. This can be done by using:
```{ .console .no-copy }
# setfattr -n inode-invalidate -v 0 <path-to-file>
```
2. The above-mentioned process for split-brain resolution from the mount will not work on NFS mounts as they do not provide xattr support.
# 5. Automagic unsplit-brain by [ctime|mtime|size|majority]
The CLI and fuse mount based resolution methods require intervention in the sense that the admin/user needs to run the commands manually. The `cluster.favorite-child-policy` volume option, when set to one of the various policies available, automatically resolves split-brains without user intervention. The default value is 'none', i.e. it is disabled.
```{ .console .no-copy }
# gluster volume set help | grep -A3 cluster.favorite-child-policy
Option: cluster.favorite-child-policy
Default Value: none
@@ -846,40 +897,41 @@ Description: This option can be used to automatically resolve split-brains using
`cluster.favorite-child-policy` applies to all files of the volume. It is assumed that if this option is enabled with a particular policy, you don't care to examine the split-brain files on a per file basis but just want the split-brain to be resolved as and when it occurs based on the set policy.
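For instance, to have split-brains resolved automatically in favour of the copy with the latest modification time on the example volume test (any of ctime/mtime/size/majority can be substituted; set it back to none to disable the behaviour):

```console
gluster volume set test cluster.favorite-child-policy mtime
```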
<a name="manual-split-brain"></a>
# Manual Split-Brain Resolution:
# Quick Start:
1. Get the path of the file that is in split-brain:
> It can be obtained either by
> a) The command `gluster volume heal info split-brain`.
> b) Identify the files for which file operations performed from the client keep failing with Input/Output error.
1. Close the applications that opened this file from the mount point.
In case of VMs, they need to be powered-off.
1. Decide on the correct copy:
> This is done by observing the afr changelog extended attributes of the file on
> the bricks using the getfattr command; then identifying the type of split-brain
> (data split-brain, metadata split-brain, entry split-brain or split-brain due to
> gfid-mismatch); and finally determining which of the bricks contains the 'good copy'
> of the file.
> `getfattr -d -m . -e hex <file-path-on-brick>`.
> It is also possible that one brick might contain the correct data while the
> other might contain the correct metadata.
1. Reset the relevant extended attribute on the brick(s) that contains the
'bad copy' of the file data/metadata using the setfattr command.
> `setfattr -n <attribute-name> -v <attribute-value> <file-path-on-brick>`
1. Trigger self-heal on the file by performing lookup from the client:
> `ls -l <file-path-on-gluster-mount>`
# Detailed Instructions for steps 3 through 5:
To understand how to resolve split-brain we need to know how to interpret the
afr changelog extended attributes.
@@ -887,7 +939,7 @@ Execute `getfattr -d -m . -e hex <file-path-on-brick>`
Example:
```{ .console .no-copy }
[root@store3 ~]# getfattr -d -e hex -m. brick-a/file.txt
# file: brick-a/file.txt
security.selinux=0x726f6f743a6f626a6563745f723a66696c655f743a733000
@@ -900,7 +952,7 @@ The extended attributes with `trusted.afr.<volname>-client-<subvolume-index>`
are used by afr to maintain the changelog of the file. The values of the
`trusted.afr.<volname>-client-<subvolume-index>` are calculated by the glusterfs
client (fuse or nfs-server) processes. When the glusterfs client modifies a file
or directory, the client contacts each brick and updates the changelog extended
attribute according to the response of the brick.
'subvolume-index' is nothing but (brick number - 1) in
@@ -908,7 +960,7 @@ attribute according to the response of the brick.
Example:
```{ .console .no-copy }
[root@pranithk-laptop ~]# gluster volume info vol
Volume Name: vol
Type: Distributed-Replicate
@@ -929,7 +981,7 @@ Example:
In the example above:
```{ .console .no-copy }
Brick | Replica set | Brick subvolume index
----------------------------------------------------------------------------
/gfs/brick-a | 0 | 0
@@ -945,25 +997,25 @@ Brick | Replica set | Brick subvolume index
Each file in a brick maintains the changelog of itself and that of the files
present in all the other bricks in its replica set as seen by that brick.
In the example volume given above, all files in brick-a will have 2 entries,
one for itself and the other for the file present in its replica pair, i.e. brick-b:
trusted.afr.vol-client-0=0x000000000000000000000000 -->changelog for itself (brick-a)
trusted.afr.vol-client-1=0x000000000000000000000000 -->changelog for brick-b as seen by brick-a
Likewise, all files in brick-b will have:
trusted.afr.vol-client-0=0x000000000000000000000000 -->changelog for brick-a as seen by brick-b
trusted.afr.vol-client-1=0x000000000000000000000000 -->changelog for itself (brick-b)
The same can be extended for other replica pairs.
Interpreting Changelog (roughly pending operation count) Value:
Each extended attribute has a value which is 24 hexadecimal digits.
First 8 digits represent changelog of data. Second 8 digits represent changelog
of metadata. Last 8 digits represent Changelog of directory entries.
Pictorially representing the same, we have:
```{ .text .no-copy }
0x 000003d7 00000001 00000000
| | |
| | \_ changelog of directory entries
@@ -971,17 +1023,16 @@ Pictorially representing the same, we have:
\ _ changelog of data
```
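As a quick illustration of how those three counters can be read back out of a value, here is a small shell sketch using the example value above (purely illustrative):

```console
val=000003d70000000100000000            # the 24 hex digits, without the leading 0x
echo "data changelog:     $((16#${val:0:8}))"
echo "metadata changelog: $((16#${val:8:8}))"
echo "entry changelog:    $((16#${val:16:8}))"
```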
For directories, metadata and entry changelogs are valid.
For regular files, data and metadata changelogs are valid.
For special files like device files etc., the metadata changelog is valid.
When a file split-brain happens it could be either a data split-brain, a
metadata split-brain, or both. When a split-brain happens, the changelog of the
file would be something like this:
Example: (Let's consider both data and metadata split-brain on the same file).
```{ .console .no-copy }
[root@pranithk-laptop vol]# getfattr -d -m . -e hex /gfs/brick-?/a
getfattr: Removing leading '/' from absolute path names
# file: gfs/brick-a/a
@@ -1007,7 +1058,7 @@ on itself but failed on /gfs/brick-b/a.
The second 8 digits of trusted.afr.vol-client-0 are
all zeros (0x........00000000........), and the second 8 digits of
trusted.afr.vol-client-1 are not all zeros (0x........00000001........).
So the changelog on /gfs/brick-a/a implies that some metadata operations succeeded
on itself but failed on /gfs/brick-b/a.
#### According to Changelog extended attributes on file /gfs/brick-b/a:
@@ -1029,12 +1080,12 @@ file, it is in both data and metadata split-brain.
#### Deciding on the correct copy:
The user may have to inspect the stat and getfattr output of the files to decide which
metadata to retain and contents of the file to decide which data to retain.
Continuing with the example above, let's say we want to retain the data
of /gfs/brick-a/a and metadata of /gfs/brick-b/a.
#### Resetting the relevant changelogs to resolve the split-brain:
For resolving data-split-brain:
@@ -1068,27 +1119,31 @@ For trusted.afr.vol-client-1
Hence execute
`setfattr -n trusted.afr.vol-client-1 -v 0x000003d70000000000000000 /gfs/brick-a/a`
Thus after the above operations are done, the changelogs look like this:
```{ .console .no-copy }
[root@pranithk-laptop vol]# getfattr -d -m . -e hex /gfs/brick-?/a
getfattr: Removing leading '/' from absolute path names
# file: gfs/brick-a/a
trusted.afr.vol-client-0=0x000000000000000000000000
trusted.afr.vol-client-1=0x000003d70000000000000000
trusted.gfid=0x80acdbd886524f6fbefa21fc356fed57
# file: gfs/brick-b/a
trusted.afr.vol-client-0=0x000000000000000100000000
trusted.afr.vol-client-1=0x000000000000000000000000
trusted.gfid=0x80acdbd886524f6fbefa21fc356fed57
```
## Triggering Self-heal:
Perform `ls -l <file-path-on-gluster-mount>` to trigger healing.
<a name="dir-split-brain"></a>
## Fixing Directory entry split-brain:
Afr has the ability to conservatively merge different entries in the directories
when there is a split-brain on a directory.
If on one brick directory 'd' has entries '1', '2' and has entries '3', '4' on
@@ -1108,9 +1163,11 @@ needs to be removed.The gfid-link files are present in the .glusterfs folder
in the top-level directory of the brick. If the gfid of the file is
0x307a5c9efddd4e7c96e94fd4bcdcbd1b (the trusted.gfid extended attribute got
from the getfattr command earlier), the gfid-link file can be found at
> /gfs/brick-a/.glusterfs/30/7a/307a5c9efddd4e7c96e94fd4bcdcbd1b
#### Word of caution:
Before deleting the gfid-link, we have to ensure that there are no hard links
to the file present on that brick. If hard-links exist, they must be deleted as
well.
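A minimal sketch for locating the gfid-link and checking for hard links before removing it, assuming the brick path and gfid from the example above:

```console
brick=/gfs/brick-a                                   # example brick path
gfid=307a5c9efddd4e7c96e94fd4bcdcbd1b                # trusted.gfid value without the 0x prefix
gfid_link=$brick/.glusterfs/${gfid:0:2}/${gfid:2:2}/$gfid
stat -c 'hard links: %h' "$gfid_link"                # typically 2 (data file + gfid link); more suggests extra hard links
find "$brick" -samefile "$gfid_link"                 # list every path on this brick sharing the same inode
```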
View File
@@ -2,20 +2,18 @@
A statedump is, as the name suggests, a dump of the internal state of a glusterfs process. It captures information about in-memory structures such as frames, call stacks, active inodes, fds, mempools, iobufs, and locks as well as xlator specific data structures. This can be an invaluable tool for debugging memory leaks and hung processes.
- [Generate a Statedump](#generate-a-statedump)
- [Read a Statedump](#read-a-statedump)
- [Debug with a Statedump](#debug-with-statedumps)
---
## Generate a Statedump
Run the command
```console
gluster --print-statedumpdir
```
on a gluster server node to find out which directory the statedumps will be created in. This directory may need to be created if not already present.
@@ -38,7 +36,6 @@ kill -USR1 <pid-of-gluster-mount-process>
There are specific commands to generate statedumps for all brick processes/nfs server/quotad which can be used instead of the above. Run the following
commands on one of the server nodes:
For bricks:
```console
@@ -59,16 +56,17 @@ gluster volume statedump <volname> quotad
The statedumps will be created in `statedump-directory` on each node. The statedumps for brick processes will be created with the filename `hyphenated-brick-path.<pid>.dump.timestamp` while for all other processes it will be `glusterdump.<pid>.dump.timestamp`.
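For example, assuming the default statedump directory of `/var/run/gluster`, the generated files can be listed with:

```console
gluster --print-statedumpdir          # confirm the directory on this node
ls -lt /var/run/gluster/*.dump.*      # most recent statedumps first
```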
---
## Read a Statedump
Statedumps are text files and can be opened in any text editor. The first and last lines of the file contain the start and end time (in UTC) respectively of when the statedump file was written.
### Mallinfo
The mallinfo return status is printed in the following format. Please read _man mallinfo_ for more information about what each field means.
```{.text .no-copy }
[mallinfo]
mallinfo_arena=100020224 /* Non-mmapped space allocated (bytes) */
mallinfo_ordblks=69467 /* Number of free chunks */
@@ -83,19 +81,19 @@ mallinfo_keepcost=133712 /* Top-most, releasable space (bytes) */
```
### Memory accounting stats
Each xlator defines data structures specific to its requirements. The statedump captures information about the memory usage and allocations of these structures for each xlator in the call-stack and prints them in the following format:
For the xlator with the name _glusterfs_
```{.text .no-copy }
[global.glusterfs - Memory usage] #[global.<xlator-name> - Memory usage]
num_types=119 #The number of data types it is using
```
followed by the memory usage for each data-type for that translator. The following example displays a sample for the gf_common_mt_gf_timer_t type
```{.text .no-copy }
[global.glusterfs - usage-type gf_common_mt_gf_timer_t memusage]
#[global.<xlator-name> - usage-type <tag associated with the data-type> memusage]
size=112 #Total size allocated for data-type when the statedump was taken i.e. num_allocs * sizeof (data-type)
@@ -113,7 +111,7 @@ Mempools are an optimization intended to reduce the number of allocations of a d
Memory pool allocations by each xlator are displayed in the following format:
```{.text .no-copy }
[mempool] #Section name
-----=-----
pool-name=fuse:fd_t #pool-name=<xlator-name>:<data-type>
@@ -129,10 +127,9 @@ max-stdalloc=0 #Maximum number of allocations from heap that were in active
This information is also useful while debugging high memory usage issues as large hot_count and cur-stdalloc values may point to an element not being freed after it has been used.
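A quick, illustrative way to pull these pool counters out of a dump for inspection (the dump file name here is hypothetical):

```console
grep -E 'pool-name|hot-count|cold-count|cur-stdalloc|max-stdalloc' glusterdump.1234.dump.1405493251
```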
### Iobufs
```{.text .no-copy }
[iobuf.global]
iobuf_pool=0x1f0d970 #The memory pool for iobufs
iobuf_pool.default_page_size=131072 #The default size of iobuf (if no iobuf size is specified the default size is allocated)
@@ -148,7 +145,7 @@ There are 3 lists of arenas
2. Purge list: arenas that can be purged(no active iobufs, active_cnt == 0).
3. Filled list: arenas without free iobufs.
```{.text .no-copy }
[purge.1] #purge.<S.No.>
purge.1.mem_base=0x7fc47b35f000 #The address of the arena structure
purge.1.active_cnt=0 #The number of iobufs active in that arena
@@ -168,7 +165,7 @@ arena.5.page_size=32768
If the active_cnt of any arena is non zero, then the statedump will also have the iobuf list.
```{.text .no-copy }
[arena.6.active_iobuf.1] #arena.<S.No>.active_iobuf.<iobuf.S.No.>
arena.6.active_iobuf.1.ref=1 #refcount of the iobuf
arena.6.active_iobuf.1.ptr=0x7fdb921a9000 #address of the iobuf
@@ -180,12 +177,11 @@ arena.6.active_iobuf.2.ptr=0x7fdb92189000
A lot of filled arenas at any given point in time could be a sign of iobuf leaks.
### Call stack
The fops received by gluster are handled using call stacks. A call stack contains information about the uid/gid/pid etc of the process that is executing the fop. Each call stack contains different call-frames for each xlator which handles that fop.
```{.text .no-copy }
[global.callpool.stack.3] #global.callpool.stack.<Serial-Number>
stack=0x7fc47a44bbe0 #Stack address
uid=0 #Uid of the process executing the fop
@@ -199,9 +195,10 @@ cnt=9 #Number of frames in this stack.
```
### Call-frame
Each frame will have information about which xlator the frame belongs to, which function it wound to/from and which it will be unwound to, and whether it has unwound.
```{.text .no-copy }
[global.callpool.stack.3.frame.2] #global.callpool.stack.<stack-serial-number>.frame.<frame-serial-number>
frame=0x7fc47a611dbc #Frame address
ref_count=0 #Incremented at the time of wind and decremented at the time of unwind.
@@ -215,12 +212,11 @@ unwind_to=afr_lookup_cbk #Parent xlator function to unwind to
To debug hangs in the system, see which xlator has not yet unwound its fop by checking the value of the _complete_ tag in the statedump. (_complete=0_ indicates the xlator has not yet unwound).
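For example, a dump can be scanned for frames that never unwound like this (the file name is illustrative):

```console
grep -B 4 'complete=0' glusterdump.1234.dump.1405493251    # show each un-unwound frame with some context
```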
### FUSE Operation History
Gluster Fuse maintains a history of the operations that it has performed.
```{.text .no-copy }
[xlator.mount.fuse.history]
TIME=2014-07-09 16:44:57.523364
message=[0] fuse_release: RELEASE(): 4590:, fd: 0x1fef0d8, gfid: 3afb4968-5100-478d-91e9-76264e634c9f
@@ -234,7 +230,7 @@ message=[0] fuse_getattr_resume: 4591, STAT, path: (/iozone.tmp), gfid: (3afb496
### Xlator configuration
```{.text .no-copy }
[cluster/replicate.r2-replicate-0] #Xlator type, name information
child_count=2 #Number of children for the xlator
#Xlator specific configuration below
@@ -255,7 +251,7 @@ wait_count=1
### Graph/inode table
```{.text .no-copy }
[active graph - 1]
conn.1.bound_xl./data/brick01a/homegfs.hashsize=14057
@@ -268,7 +264,7 @@ conn.1.bound_xl./data/brick01a/homegfs.purge_size=0 #Number of inodes present
### Inode
```{.text .no-copy }
[conn.1.bound_xl./data/brick01a/homegfs.active.324] #324th inode in active inode list
gfid=e6d337cf-97eb-44b3-9492-379ba3f6ad42 #Gfid of the inode
nlookup=13 #Number of times lookups happened from the client or from fuse kernel
@@ -285,9 +281,10 @@ ia_type=2
```
### Inode context
Each xlator can store information specific to it in the inode context. This context can also be printed in the statedump. Here is the inode context of the locks xlator
```{.text .no-copy }
[xlator.features.locks.homegfs-locks.inode]
path=/homegfs/users/dfrobins/gfstest/r4/SCRATCH/fort.5102 - path of the file
mandatory=0
@@ -301,10 +298,11 @@ lock-dump.domain.domain=homegfs-replicate-0:metadata #Domain name where metadata
lock-dump.domain.domain=homegfs-replicate-0 #Domain name where entry/data operations take locks to maintain replication consistency
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=11141120, len=131072, pid = 18446744073709551615, owner=080b1ada117f0000, client=0xb7fc30, connection-id=compute-30-029.com-3505-2014/06/29-14:46:12:477358-homegfs-client-0-0-1, granted at Sun Jun 29 11:10:36 2014 #Active lock information
```
---
## Debug With Statedumps
### Memory leaks
Statedumps can be used to determine whether the high memory usage of a process is caused by a leak. To debug the issue, generate statedumps for that process at regular intervals, or before and after running the steps that cause the memory used to increase. Once you have multiple statedumps, compare the memory allocation stats to see if any of them are increasing steadily as those could indicate a potential memory leak.
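One simple, illustrative way to compare two dumps of the same process taken some time apart (the file names are hypothetical):

```console
grep -E 'usage-type|num_allocs' glusterdump.5225.dump.first  > allocs.before
grep -E 'usage-type|num_allocs' glusterdump.5225.dump.second > allocs.after
diff allocs.before allocs.after        # steadily growing num_allocs values are leak suspects
```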
@@ -315,7 +313,7 @@ The following examples walk through using statedumps to debug two different memo
[BZ 1120151](https://bugzilla.redhat.com/show_bug.cgi?id=1120151) reported high memory usage by the self heal daemon whenever one of the bricks was wiped in a replicate volume and a full self-heal was invoked to heal the contents. This issue was debugged using statedumps to determine which data-structure was leaking memory.
A statedump of the self heal daemon process was taken using
```console
kill -USR1 <pid-of-gluster-self-heal-daemon>
@@ -323,7 +321,7 @@ kill -USR1 `<pid-of-gluster-self-heal-daemon>`
On examining the statedump:
```{.text .no-copy }
grep -w num_allocs glusterdump.5225.dump.1405493251
num_allocs=77078
num_allocs=87070
@@ -338,6 +336,7 @@ hot-count=4095
```
On searching for num_allocs with high values in the statedump, a `grep` of the statedump revealed a large number of allocations for the following data-types under the replicate xlator:
1. gf_common_mt_asprintf
2. gf_common_mt_char
3. gf_common_mt_mem_pool.
@@ -345,16 +344,15 @@ On searching for num_allocs with high values in the statedump, a `grep` of the s
On checking the afr-code for allocations with tag `gf_common_mt_char`, it was found that the `data-self-heal` code path does not free one such allocated data structure. `gf_common_mt_mem_pool` suggests that there is a leak in pool memory. The `replicate-0:dict_t`, `glusterfs:data_t` and `glusterfs:data_pair_t` pools are using a lot of memory, i.e. cold_count is `0` and there are too many allocations. Checking the source code of dict.c shows that `key` in `dict` is allocated with `gf_common_mt_char` i.e. `2.` tag and value is created using gf_asprintf which in-turn uses `gf_common_mt_asprintf` i.e. `1.`. Checking the code for leaks in self-heal code paths led to a line which over-writes a variable with a new dictionary even when it was already holding a reference to another dictionary. After fixing these leaks, we ran the same test to verify that none of the `num_allocs` values increased in the statedump of the self-heal daemon after healing 10,000 files.
Please check [http://review.gluster.org/8316](http://review.gluster.org/8316) for more info about the patch/code.
#### Leaks in mempools:
The statedump output of mempools was used to test and verify the fixes for [BZ 1134221](https://bugzilla.redhat.com/show_bug.cgi?id=1134221). On code analysis, dict_t objects were found to be leaking (due to missing unref's) during name self-heal.
Glusterfs was compiled with the -DDEBUG flags to have cold count set to 0 by default. The test involved creating 100 files on plain replicate volume, removing them from one of the backend bricks, and then triggering lookups on them from the mount point. A statedump of the mount process was taken before executing the test case and after it was completed.
Statedump output of the fuse mount process before the test case was executed:
```{.text .no-copy }
pool-name=glusterfs:dict_t
hot-count=0
cold-count=0
@@ -364,12 +362,11 @@ max-alloc=0
pool-misses=33
cur-stdalloc=14
max-stdalloc=18
```
Statedump output of the fuse mount process after the test case was executed:
```{.text .no-copy }
pool-name=glusterfs:dict_t
hot-count=0
cold-count=0
@@ -379,15 +376,15 @@ max-alloc=0
pool-misses=2841
cur-stdalloc=214
max-stdalloc=220
```
Here, as cold count was 0 by default, cur-stdalloc indicates the number of dict_t objects that were allocated from the heap using mem_get(), and are yet to be freed using mem_put(). After running the test case (named selfheal of 100 files), there was a rise in the cur-stdalloc value (from 14 to 214) for dict_t.
After the leaks were fixed, glusterfs was again compiled with -DDEBUG flags and the steps were repeated. Statedumps of the FUSE mount were taken before and after executing the test case to ascertain the validity of the fix. And the results were as follows:
Statedump output of the fuse mount process before executing the test case:
```{.text .no-copy }
pool-name=glusterfs:dict_t
hot-count=0
cold-count=0
@@ -397,11 +394,11 @@ max-alloc=0
pool-misses=33
cur-stdalloc=14
max-stdalloc=18
```
Statedump output of the fuse mount process after executing the test case:
```{.text .no-copy }
pool-name=glusterfs:dict_t
hot-count=0
cold-count=0
@@ -411,17 +408,18 @@ max-alloc=0
pool-misses=2837
cur-stdalloc=14
max-stdalloc=119
```
The value of cur-stdalloc remained 14 after the test, indicating that the fix indeed does what it's supposed to do.
### Hangs caused by frame loss
[BZ 994959](https://bugzilla.redhat.com/show_bug.cgi?id=994959) reported that the Fuse mount hangs on a readdirp operation.
Here are the steps used to locate the cause of the hang using statedump.
Statedumps were taken for all gluster processes after reproducing the issue. The following stack was seen in the FUSE mount's statedump:
```{.text .no-copy }
[global.callpool.stack.1.frame.1]
ref_count=1
translator=fuse
@@ -463,8 +461,8 @@ parent=r2-quick-read
wind_from=qr_readdirp
wind_to=FIRST_CHILD (this)->fops->readdirp
unwind_to=qr_readdirp_cbk
```
`unwind_to` shows that call was unwound to `afr_readdirp_cbk` from the r2-client-1 xlator.
Inspecting that function revealed that afr is not unwinding the stack when fop failed.
Check [http://review.gluster.org/5531](http://review.gluster.org/5531) for more info about patch/code changes.
View File
@@ -8,7 +8,7 @@ The first level of analysis always starts with looking at the log files. Which o
Sometimes, you might need more verbose logging to figure out what's going on:
`gluster volume set $volname client-log-level $LEVEL`
where LEVEL can be any one of `DEBUG, WARNING, ERROR, INFO, CRITICAL, NONE, TRACE`. This should ideally make all the log files mentioned above start logging at `$LEVEL`. The default is `INFO` but you can temporarily toggle it to `DEBUG` or `TRACE` if you want to see under-the-hood messages. Useful when the normal logs don't give a clue as to what is happening.
## Heal related issues:
@@ -20,17 +20,19 @@ Most issues Ive seen on the mailing list and with customers can broadly fit i
If the number of entries is large, then heal info will take longer than usual. While there are performance improvements to heal info being planned, a faster way to get an approx. count of the pending entries is to use the `gluster volume heal $VOLNAME statistics heal-count` command.
**Knowledge Hack:** Since we know that during the write transaction the xattrop folder will capture the gfid-string of the file if it needs heal, we can also do an `ls /brick/.glusterfs/indices/xattrop|wc -l` on each brick to get the approx. no. of entries that need heal. If this number reduces over time, it is a sign that the heal backlog is reducing. You will also see messages whenever a particular type of heal starts/ends for a given gfid, like so:
```{.text .no-copy }
[2019-05-07 12:05:14.460442] I [MSGID: 108026] [afr-self-heal-entry.c:883:afr_selfheal_entry_do] 0-testvol-replicate-0: performing entry selfheal on d120c0cf-6e87-454b-965b-0d83a4c752bb
[2019-05-07 12:05:14.474710] I [MSGID: 108026] [afr-self-heal-common.c:1741:afr_log_selfheal] 0-testvol-replicate-0: Completed entry selfheal on d120c0cf-6e87-454b-965b-0d83a4c752bb. sources=[0] 2 sinks=1
[2019-05-07 12:05:14.493506] I [MSGID: 108026] [afr-self-heal-common.c:1741:afr_log_selfheal] 0-testvol-replicate-0: Completed data selfheal on a9b5f183-21eb-4fb3-a342-287d3a7dddc5. sources=[0] 2 sinks=1
[2019-05-07 12:05:14.494577] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-testvol-replicate-0: performing metadata selfheal on a9b5f183-21eb-4fb3-a342-287d3a7dddc5
[2019-05-07 12:05:14.498398] I [MSGID: 108026] [afr-self-heal-common.c:1741:afr_log_selfheal] 0-testvol-replicate-0: Completed metadata selfheal on a9b5f183-21eb-4fb3-a342-287d3a7dddc5. sources=[0] 2 sinks=1
```
### ii) Self-heal is stuck/ not getting completed.
@@ -38,69 +40,88 @@ If a file seems to be forever appearing in heal info and not healing, check the
- Examine the afr xattrs - do they clearly indicate the good and bad copies? If there isn't at least one good copy, then the file is in split-brain and you would need to use the split-brain resolution CLI.
- Identify which node's shds would be picking up the file for heal. If a file is listed in the heal info output under brick1 and brick2, then the shds on the nodes which host those bricks would attempt (and one of them would succeed) in doing the heal.
- Once the shd is identified, look at the shd logs to see if it is indeed connected to the bricks.
This is good:
```{.text .no-copy }
[2019-05-07 09:53:02.912923] I [MSGID: 114046] [client-handshake.c:1106:client_setvolume_cbk] 0-testvol-client-2: Connected to testvol-client-2, attached to remote volume '/bricks/brick3'
```
This indicates a disconnect:
```{.text .no-copy }
[2019-05-07 11:44:47.602862] I [MSGID: 114018] [client.c:2334:client_rpc_notify] 0-testvol-client-2: disconnected from testvol-client-2. Client process will keep trying to connect to glusterd until brick's port is available
[2019-05-07 11:44:50.953516] E [MSGID: 114058] [client-handshake.c:1456:client_query_portmap_cbk] 0-testvol-client-2: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
```
Alternatively, take a statedump of the self-heal daemon (shd) and check if all client xlators are connected to the respective bricks. The shd must have `connected=1` for all the client xlators, meaning it can talk to all the bricks.
| Shd's statedump entry of a client xlator that is connected to the 3rd brick | Shd's statedump entry of the same client xlator if it is disconnected from the 3rd brick |
| :------------------------------------------------------------------------------------------------------------------------------------------------------: | :------------------------------------------------------------------------------------------------------------------------------------------------------: |
| [xlator.protocol.client.testvol-client-2.priv] connected=1 total_bytes_read=75004 ping_timeout=42 total_bytes_written=50608 ping_msgs_sent=0 msgs_sent=0 | [xlator.protocol.client.testvol-client-2.priv] connected=0 total_bytes_read=75004 ping_timeout=42 total_bytes_written=50608 ping_msgs_sent=0 msgs_sent=0 |
If there are connection issues (i.e. `connected=0`), you would need to investigate and fix them. Check if the pid and the TCP/RDMA port of the brick process from `gluster volume status $VOLNAME` matches that of `ps aux|grep glusterfsd|grep $brick-path`:
```{.text .no-copy }
# gluster volume status
Status of volume: testvol
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 127.0.0.2:/bricks/brick1              49152     0          Y       12527
```

```{.text .no-copy }
# ps aux|grep brick1
root 12527 0.0 0.1 1459208 20104 ? Ssl 11:20 0:01 /usr/local/sbin/glusterfsd -s 127.0.0.2 --volfile-id testvol.127.0.0.2.bricks-brick1 -p /var/run/gluster/vols/testvol/127.0.0.2-bricks-brick1.pid -S /var/run/gluster/70529980362a17d6.socket --brick-name /bricks/brick1 -l /var/log/glusterfs/bricks/bricks-brick1.log --xlator-option *-posix.glusterd-uuid=d90b1532-30e5-4f9d-a75b-3ebb1c3682d4 --process-name brick --brick-port 49152 --xlator-option testvol-server.listen-port=49152
```
Though this will likely match, sometimes there could be a bug leading to stale port usage. A quick workaround would be to restart glusterd on that node and check if things match. Report the issue to the devs if you see this problem.
- I have seen some cases where a file is listed in heal info, and the afr xattrs indicate pending metadata or data heal but the file itself is not present on all bricks. Ideally, the parent directory of the file must have pending entry heal xattrs so that the file either gets created on the missing bricks or gets deleted from the ones where it is present. But if the parent dir doesn't have xattrs, the entry heal can't proceed. In such cases, you can
- Either do a lookup directly on the file from the mount so that name heal is triggered and then shd can pickup the data/metadata heal.
- Or manually set entry xattrs on the parent dir to emulate an entry heal so that the file gets created as a part of it.
- If a brick's underlying filesystem/lvm was damaged and fsck'd to recover it, some files/dirs might be missing on it. If there is a lot of missing info on the recovered bricks, it might be better to just do a replace-brick or reset-brick and let the heal fully sync everything rather than fiddling with afr xattrs of individual entries.
**Hack:** How to trigger heal on _any_ file/directory
Knowing about self-heal logic and index heal from the previous post, we can sort of emulate a heal with the following steps. This is not something that you should be doing on your cluster but it pays to at least know that it is possible when push comes to shove.
1. Pick one brick as good and set the afr pending xattr on it blaming the bad bricks.
2. Capture the gfid inside .glusterfs/indices/xattrop so that the shd can pick it up during index heal.
3. Finally, trigger index heal: `gluster volume heal $VOLNAME`.
_Example:_ Let us say a FILE-1 exists with `trusted.gfid=0x1ad2144928124da9b7117d27393fea5c` on all bricks of a replica 3 volume called testvol. It has no afr xattrs. But you still need to emulate a heal. Let us say you choose brick-2 as the source. Let us do the steps listed above:
1. Make brick-2 blame the other 2 bricks:
setfattr -n trusted.afr.testvol-client-2 -v 0x000000010000000000000000 /bricks/brick2/FILE-1
setfattr -n trusted.afr.testvol-client-1 -v 0x000000010000000000000000 /bricks/brick2/FILE-1
2. Store the gfid string inside xattrop folder as a hardlink to the base entry:
# cd /bricks/brick2/.glusterfs/indices/xattrop/
# ls -li
total 0
17829255 ----------. 1 root root 0 May 10 11:20 xattrop-a400ca91-cec9-4463-a183-aca9eaff9fa7
# ln xattrop-a400ca91-cec9-4463-a183-aca9eaff9fa7 1ad21449-2812-4da9-b711-7d27393fea5c
# ll
total 0
----------. 2 root root 0 May 10 11:20 1ad21449-2812-4da9-b711-7d27393fea5c
----------. 2 root root 0 May 10 11:20 xattrop-a400ca91-cec9-4463-a183-aca9eaff9fa7
3. Trigger heal: `gluster volume heal testvol`
The glustershd.log of node-2 should log about the heal.
[2019-05-10 06:10:46.027238] I [MSGID: 108026] [afr-self-heal-common.c:1741:afr_log_selfheal] 0-testvol-replicate-0: Completed data selfheal on 1ad21449-2812-4da9-b711-7d27393fea5c. sources=[1] sinks=0 2
So the data was healed from the second brick to the first and third brick.
### iii) Self-heal is too slow
@@ -109,7 +130,7 @@ If the heal backlog is decreasing and you see glustershd logging heals but you
Option: cluster.shd-max-threads
Default Value: 1
Description: Maximum number of parallel heals SHD can do per local brick. This can substantially lower heal times, but can also crush your bricks if you don't have the storage hardware to support this.
Option: cluster.shd-wait-qlength
Default Value: 1024
Description: This option can be used to control number of heals that can wait in SHD per subvolume
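For example, to raise the heal parallelism on a volume named testvol (the values are illustrative; increase gradually and watch brick CPU/disk load):

```console
gluster volume set testvol cluster.shd-max-threads 4
gluster volume set testvol cluster.shd-wait-qlength 2048
```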
@@ -118,38 +139,45 @@ Im not covering it here but it is possible to launch multiple shd instances (
### iv) Self-heal is too aggressive and slows down the system.
If shd-max-threads is at the lowest value (i.e. 1) and you see that the CPU usage of the bricks is too high, you can check if the volume's profile info shows a lot of RCHECKSUM fops. Data self-heal does checksum calculation (i.e. the `posix_rchecksum()` FOP) which can be CPU intensive. You can set the `cluster.data-self-heal-algorithm` option to full. This does a full file copy instead of computing rolling checksums and syncing only the mismatching blocks. The tradeoff is that the network consumption will be increased.
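For example (the volume name testvol is illustrative):

```console
gluster volume set testvol cluster.data-self-heal-algorithm full
```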
You can also disable all client-side heals if they are turned on, so that the client bandwidth is consumed entirely by the application FOPs and not by client-side background heals, i.e. turn off `cluster.metadata-self-heal`, `cluster.data-self-heal` and `cluster.entry-self-heal`.
Note: In recent versions of gluster, client-side heals are disabled by default.
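A sketch of turning the three client-side heal options off on an example volume named testvol:

```console
gluster volume set testvol cluster.metadata-self-heal off
gluster volume set testvol cluster.data-self-heal off
gluster volume set testvol cluster.entry-self-heal off
```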
## Mount related issues:
### i) All fops are failing with ENOTCONN
Check mount log/ statedump for loss of quorum, just like for glustershd. If this is a fuse client (as opposed to an nfs/ gfapi client), you can also check the .meta folder to check the connection status to the bricks.
```{.text .no-copy }
# cat /mnt/fuse_mnt/.meta/graphs/active/testvol-client-*/private |grep connected
connected = 0
connected = 1
connected = 1
```
If `connected=0`, the connection to that brick is lost. Find out why. If the client is not connected to a quorum number of bricks, then AFR fails lookups (and therefore any subsequent FOP) with 'Transport endpoint is not connected'.
### ii) FOPs on some files are failing with ENOTCONN
Check mount log for the file being unreadable:
```{.text .no-copy }
[2019-05-10 11:04:01.607046] W [MSGID: 108027] [afr-common.c:2268:afr_attempt_readsubvol_set] 13-testvol-replicate-0: no read subvols for /FILE.txt
[2019-05-10 11:04:01.607775] W [fuse-bridge.c:939:fuse_entry_cbk] 0-glusterfs-fuse: 234: LOOKUP() /FILE.txt => -1 (Transport endpoint is not connected)
```
This means there was only 1 good copy and the client has lost connection to that brick. You need to ensure that the client is connected to all bricks.
### iii) Mount is hung
It can be difficult to pin-point the issue immediately and might require assistance from the developers but the first steps to debugging could be to
- strace the fuse mount; see where it is hung.
- Take a statedump of the mount to see which xlator has frames that are not wound (i.e. complete=0) and for which FOP. Then check the source code to see if there are any unhandled cases where the xlator doesn't wind the FOP to its child.
- Take statedump of bricks to see if there are any stale locks. An indication of stale locks is the same lock being present in multiple statedumps or the granted date being very old.
Excerpt from a brick statedump:
View File
@@ -1,6 +1,4 @@
# Troubleshooting File Locks
Use [statedumps](./statedump.md) to find and list the locks held
on files. The statedump output also provides information on each lock
@@ -13,11 +11,11 @@ lock using the following `clear lock` commands.
1. **Perform statedump on the volume to view the files that are locked
using the following command:**
gluster volume statedump <VOLNAME> inode
For example, to display statedump of test-volume:
gluster volume statedump test-volume
Volume statedump successful
The statedump files are created on the brick servers in the `/tmp`
@@ -58,25 +56,23 @@ lock using the following `clear lock` commands.
2. **Clear the lock using the following command:**
gluster volume clear-locks
For example, to clear the entry lock on `file1` of test-volume:
gluster volume clear-locks test-volume / kind granted entry file1
Volume clear-locks successful
vol-locks: entry blocked locks=0 granted locks=1
3. **Clear the inode lock using the following command:**
gluster volume clear-locks
For example, to clear the inode lock on `file1` of test-volume:
gluster volume clear-locks test-volume /file1 kind granted inode 0,0-0
Volume clear-locks successful
vol-locks: inode blocked locks=0 granted locks=1
Perform statedump on test-volume again to verify that the
above inode and entry locks are cleared.
View File
@@ -8,13 +8,13 @@ to GlusterFS Geo-replication.
For every Geo-replication session, the following three log files are
associated with it (four, if the secondary is a gluster volume):
- **Primary-log-file** - log file for the process which monitors the Primary
volume
- **Secondary-log-file** - log file for process which initiates the changes in
secondary
- **Primary-gluster-log-file** - log file for the maintenance mount point
that Geo-replication module uses to monitor the Primary volume
- **Secondary-gluster-log-file** - is the secondary's counterpart of it
**Primary Log File**
@@ -28,7 +28,7 @@ gluster volume geo-replication <session> config log-file
For example:
```console
gluster volume geo-replication Volume1 example.com:/data/remote_dir config log-file
```
**Secondary Log File**
@@ -38,13 +38,13 @@ running on secondary machine), use the following commands:
1. On primary, run the following command:
gluster volume geo-replication Volume1 example.com:/data/remote_dir config session-owner 5f6e5200-756f-11e0-a1f0-0800200c9a66
Displays the session owner details.
2. On secondary, run the following command:
gluster volume geo-replication /data/remote_dir config log-file /var/log/gluster/${session-owner}:remote-mirror.log
3. Replace the session owner details (output of Step 1) in the output
   of Step 2 to get the location of the log file.
@@ -52,7 +52,7 @@ running on secondary machine), use the following commands:
/var/log/gluster/5f6e5200-756f-11e0-a1f0-0800200c9a66:remote-mirror.log
### Rotating Geo-replication Logs
Administrators can rotate the log file of a particular primary-secondary
session, as needed. When you run geo-replication's `log-rotate`
command, the log file is backed up with the current timestamp suffixed
@@ -61,34 +61,34 @@ log file.
**To rotate a geo-replication log file**
- Rotate log file for a particular primary-secondary session using the
following command:
gluster volume geo-replication log-rotate
For example, to rotate the log file of primary `Volume1` and secondary
`example.com:/data/remote_dir` :
gluster volume geo-replication Volume1 example.com:/data/remote_dir log rotate
log rotate successful
- Rotate log file for all sessions for a primary volume using the
following command:
gluster volume geo-replication log-rotate
For example, to rotate the log file of primary `Volume1`:
gluster volume geo-replication Volume1 log rotate
log rotate successful
- Rotate log file for all sessions using the following command:
gluster volume geo-replication log-rotate
For example, to rotate the log file for all sessions:
gluster volume geo-replication log rotate
log rotate successful
### Synchronization is not complete
@@ -102,16 +102,14 @@ GlusterFS geo-replication begins synchronizing all the data. All files
are compared using checksum, which can be a lengthy and high resource
utilization operation on large data sets.
### Issues in Data Synchronization
**Description**: Geo-replication displays status as OK, but the files do
not get synced; only directories and symlinks get synced, with the
following error message in the log:
```{ .text .no-copy }
[2011-05-02 13:42:13.467644] E [primary:288:regjob] GMaster: failed to sync ./some_file
```
**Solution**: Geo-replication invokes rsync v3.0.0 or higher on the host
@@ -123,7 +121,7 @@ required version.
**Description**: Geo-replication displays status as faulty very often
with a backtrace similar to the following:
```{ .text .no-copy }
2011-04-28 14:06:18.378859] E [syncdutils:131:log_raise_exception]
<top>: FAIL: Traceback (most recent call last): File
"/usr/local/libexec/glusterfs/python/syncdaemon/syncdutils.py", line
@@ -139,28 +137,28 @@ the primary gsyncd module and secondary gsyncd module is broken and this can
happen for various reasons. Check if it satisfies all the following
pre-requisites:
- Password-less SSH is set up properly between the host and the remote
  machine.
- FUSE is installed on the machine, because the geo-replication module
  mounts the GlusterFS volume using FUSE to sync data.
- If the **Secondary** is a volume, check if that volume is started.
- If the Secondary is a plain directory, verify if the directory has been
  created already with the required permissions.
- If GlusterFS 3.2 or higher is not installed in the default location
  (in Primary) and has been prefixed to be installed in a custom
  location, configure the `gluster-command` for it to point to the
  exact location.
- If GlusterFS 3.2 or higher is not installed in the default location
  (in secondary) and has been prefixed to be installed in a custom
  location, configure the `remote-gsyncd-command` for it to point to
  the exact place where gsyncd is located (see the sketch below).
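A rough sketch of setting those two options through the geo-replication `config` interface, assuming a hypothetical custom prefix of `/opt/glusterfs` and the primary/secondary pair used in the examples above:

```console
gluster volume geo-replication Volume1 example.com:/data/remote_dir config gluster-command /opt/glusterfs/sbin/gluster
gluster volume geo-replication Volume1 example.com:/data/remote_dir config remote-gsyncd-command /opt/glusterfs/libexec/glusterfs/gsyncd
```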
### Intermediate Primary goes to Faulty State
**Description**: In a cascading set-up, the intermediate primary goes to
faulty state with the following log:
```{ .text .no-copy }
raise RuntimeError ("aborting on uuid change from %s to %s" % \\
RuntimeError: aborting on uuid change from af07e07c-427f-4586-ab9f-
4bf7d299be81 to de6b5040-8f4e-4575-8831-c4f55bd41154

@@ -4,45 +4,40 @@ The glusterd daemon runs on every trusted server node and is responsible for the
The gluster CLI sends commands to the glusterd daemon on the local node, which executes the operation and returns the result to the user.
<br>
### Debugging glusterd
#### Logs
Start by looking at the log files for clues as to what went wrong when you hit a problem.
The default directory for Gluster logs is /var/log/glusterfs. The logs for the CLI and glusterd are:
- glusterd : /var/log/glusterfs/glusterd.log
- gluster CLI : /var/log/glusterfs/cli.log
#### Statedumps
Statedumps are useful in debugging memory leaks and hangs.
See [Statedump](./statedump.md) for more details.
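For example, a statedump of a running glusterd (or client) process can usually be triggered by sending it SIGUSR1; the dump is written under the statedump directory, typically `/var/run/gluster` (a sketch, not a glusterd-specific tool):

```console
kill -USR1 $(pidof glusterd)
```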
<br>
### Common Issues and How to Resolve Them
**"*Another transaction is in progress for volname*" or "*Locking failed on xxx.xxx.xxx.xxx"***
**"_Another transaction is in progress for volname_" or "_Locking failed on xxx.xxx.xxx.xxx"_**
As Gluster is distributed by nature, glusterd takes locks when performing operations to ensure that configuration changes made to a volume are atomic across the cluster.
These errors are returned when:
- More than one transaction contends on the same lock.

  > _Solution_ : These are likely to be transient errors and the operation will succeed if retried once the other transaction is complete.

- A stale lock exists on one of the nodes.

  > _Solution_ : Repeating the operation will not help until the stale lock is cleaned up. Restart the glusterd process holding the lock, as sketched below.

  - Check the glusterd.log file to find out which node holds the stale lock. Look for the message:
    `lock being held by <uuid>`
  - Run `gluster peer status` to identify the node with the uuid in the log message.
  - Restart glusterd on that node.

<br>
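A minimal sketch of those steps (the log path assumes the default log directory; the uuid is whatever the log reports):

```console
grep "lock being held by" /var/log/glusterfs/glusterd.log   # note the reported uuid
gluster peer status                                          # map that uuid to a peer hostname
systemctl restart glusterd                                   # run this on the node holding the stale lock
```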
**"_Transport endpoint is not connected_" errors but all bricks are up**
@@ -51,51 +46,40 @@ Gluster client processes query glusterd for the ports the bricks processes are l
If the port information in glusterd is incorrect, the client will fail to connect to the brick even though it is up. Operations which
would need to access that brick may fail with "Transport endpoint is not connected".
_Solution_ : Restart the glusterd service.
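For example (assuming a systemd-based distribution; the volume name is a placeholder):

```console
gluster volume status <volname>   # confirm the brick is shown online and note its port
systemctl restart glusterd        # refresh the port information handed out to clients
```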
**"Peer Rejected"**
`gluster peer status` returns "Peer Rejected" for a node.
```{ .text .no-copy }
Hostname: <hostname>
Uuid: <xxxx-xxx-xxxx>
State: Peer Rejected (Connected)
```
This indicates that the volume configuration on the node is not in sync with the rest of the trusted storage pool.
You should see the following message in the glusterd log for the node on which the peer status command was run:
```{ .text .no-copy }
Version of Cksums <vol-name> differ. local cksum = xxxxxx, remote cksum = xxxxyx on peer <hostname>
```
_Solution_: Update the cluster.op-version
- Run `gluster volume get all cluster.max-op-version` to get the latest supported op-version.
- Update the cluster.op-version to the latest supported op-version by executing `gluster volume set all cluster.op-version <op-version>`.
**"Accepted Peer Request"**
If the glusterd handshake fails while expanding a cluster, the view of the cluster will be inconsistent. The state of the peer in `gluster peer status` will be “accepted peer request” and subsequent CLI commands will fail with an error.
Eg. `Volume create command will fail with "volume create: testvol: failed: Host <hostname> is not in 'Peer in Cluster' state`
In this case the value of the state field in `/var/lib/glusterd/peers/<UUID>` will be other than 3.
_Solution_:

- Stop glusterd
- Open `/var/lib/glusterd/peers/<UUID>`
- Change state to 3
- Start glusterd (see the sketch below)
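A rough sketch of that sequence (the UUID is a placeholder; inspect the peer file before editing, since its exact contents vary by version):

```console
systemctl stop glusterd
vi /var/lib/glusterd/peers/<UUID>    # change the "state=" line so that it reads state=3
systemctl start glusterd
```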

@@ -11,14 +11,14 @@ This error is encountered when the server has not started correctly.
On most Linux distributions this is fixed by starting portmap:
```console
/etc/init.d/portmap start
```
On some distributions where portmap has been replaced by rpcbind, the
following command is required:
```console
/etc/init.d/rpcbind start
```
After starting portmap or rpcbind, gluster NFS server needs to be
@@ -32,13 +32,13 @@ This error can arise in case there is already a Gluster NFS server
running on the same machine. This situation can be confirmed from the
log file, if the following error lines exist:
```{ .text .no-copy }
[2010-05-26 23:40:49] E [rpc-socket.c:126:rpcsvc_socket_listen] rpc-socket: binding socket failed:Address already in use
[2010-05-26 23:40:49] E [rpc-socket.c:129:rpcsvc_socket_listen] rpc-socket: Port is already in use
[2010-05-26 23:40:49] E [rpcsvc.c:2636:rpcsvc_stage_program_register] rpc-service: could not create listening connection
[2010-05-26 23:40:49] E [rpcsvc.c:2675:rpcsvc_program_register] rpc-service: stage registration of program failed
[2010-05-26 23:40:49] E [rpcsvc.c:2695:rpcsvc_program_register] rpc-service: Program registration failed: MOUNT3, Num: 100005, Ver: 3, Port: 38465
[2010-05-26 23:40:49] E [nfs.c:125:nfs_init_versions] nfs: Program init failed
[2010-05-26 23:40:49] C [nfs.c:531:notify] nfs: Failed to initialize protocols
```
@@ -50,7 +50,7 @@ multiple NFS servers on the same machine.
If the mount command fails with the following error message:
```{ .text .no-copy }
mount.nfs: rpc.statd is not running but is required for remote locking.
mount.nfs: Either use '-o nolock' to keep locks local, or start statd.
```
@@ -59,7 +59,7 @@ For NFS clients to mount the NFS server, rpc.statd service must be
running on the clients. Start rpc.statd service by running the following command:
```console
rpc.statd
```
### mount command takes too long to finish.
@@ -71,14 +71,14 @@ NFS client. The resolution for this is to start either of these services
by running the following command:
```console
/etc/init.d/portmap start
```
On some distributions where portmap has been replaced by rpcbind, the
following command is required:
```console
/etc/init.d/rpcbind start
```
### NFS server glusterfsd starts but initialization fails with “nfsrpc-service: portmap registration of program failed” error message in the log.
@@ -88,8 +88,8 @@ still fail preventing clients from accessing the mount points. Such a
situation can be confirmed from the following error messages in the log
file:
```{ .text .no-copy }
[2010-05-26 23:33:47] E [rpcsvc.c:2598:rpcsvc_program_register_portmap] rpc-service: Could notregister with portmap
[2010-05-26 23:33:47] E [rpcsvc.c:2682:rpcsvc_program_register] rpc-service: portmap registration of program failed
[2010-05-26 23:33:47] E [rpcsvc.c:2695:rpcsvc_program_register] rpc-service: Program registration failed: MOUNT3, Num: 100005, Ver: 3, Port: 38465
[2010-05-26 23:33:47] E [nfs.c:125:nfs_init_versions] nfs: Program init failed
@@ -104,12 +104,12 @@ file:
On most Linux distributions, portmap can be started using the
following command:
/etc/init.d/portmap start
On some distributions where portmap has been replaced by rpcbind,
run the following command:
/etc/init.d/rpcbind start
After starting portmap or rpcbind, gluster NFS server needs to be
restarted.
@@ -126,8 +126,8 @@ file:
On Linux, kernel NFS servers can be stopped by using either of the
following commands depending on the distribution in use:
/etc/init.d/nfs-kernel-server stop
/etc/init.d/nfs stop
3. **Restart Gluster NFS server**
@@ -135,7 +135,7 @@ file:
mount command fails with following error
```{ .text .no-copy }
mount: mount to NFS server '10.1.10.11' failed: timed out (retrying).
```
@@ -175,14 +175,13 @@ Perform one of the following to resolve this issue:
forcing the NFS client to use version 3. The **vers** option to
the mount command is used for this purpose:

        mount -o vers=3
### showmount fails with clnt_create: RPC: Unable to receive
Check your firewall setting to open ports 111 for portmap
requests/replies and Gluster NFS server requests/replies. Gluster NFS
server operates over the following port numbers: 38465, 38466, and 38467.
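As a sketch, assuming firewalld is in use (adapt to iptables or whatever firewall is deployed):

```console
firewall-cmd --permanent --add-port=111/tcp --add-port=111/udp --add-port=38465-38467/tcp
firewall-cmd --reload
```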
### Application fails with "Invalid argument" or "Value too large for defined data type" error.
@@ -193,9 +192,9 @@ numbers instead: nfs.enable-ino32 \<on|off\>
Applications that will benefit are those that were either:
- built 32-bit and run on 32-bit machines such that they do not
support large files by default
- built 32-bit on 64-bit systems
This option is disabled by default so NFS returns 64-bit inode numbers
by default.
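For example, to turn on 32-bit inode numbers for a volume (the volume name is a placeholder):

```console
gluster volume set <volname> nfs.enable-ino32 on
```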
@@ -203,6 +202,6 @@ by default.
Applications which can be rebuilt from source are recommended to rebuild
using the following flag with gcc:
```console
-D_FILE_OFFSET_BITS=64
```
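A minimal usage sketch (the source file and output name are hypothetical):

```console
gcc -D_FILE_OFFSET_BITS=64 -o myapp myapp.c
```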

@@ -1,5 +1,4 @@
# Troubleshooting High Memory Utilization
If the memory utilization of a Gluster process increases significantly with time, it could be a leak caused by resources not being freed.
If you suspect that you may have hit such an issue, try using [statedumps](./statedump.md) to debug the issue.
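For instance, statedumps of a volume's brick processes can be generated with the command below (the volume name is a placeholder; the dumps typically land under `/var/run/gluster`) and repeated at intervals to compare allocations over time:

```console
gluster volume statedump <volname>
```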
@@ -12,4 +11,3 @@ If you are unable to figure out where the leak is, please [file an issue](https:
- Steps to reproduce the issue if available
- Statedumps for the process collected at intervals as the memory utilization increases
- The Gluster log files for the process (if possible)

@@ -1,32 +1,32 @@
## Upgrading GlusterFS

- [About op-version](./op-version.md)
If you are using GlusterFS version 6.x or above, you can upgrade it to the following:
- [Upgrading to 10](./upgrade-to-10.md)
- [Upgrading to 9](./upgrade-to-9.md)
- [Upgrading to 8](./upgrade-to-8.md)
- [Upgrading to 7](./upgrade-to-7.md)
If you are using GlusterFS version 5.x or above, you can upgrade it to the following:
- [Upgrading to 8](./upgrade-to-8.md)
- [Upgrading to 7](./upgrade-to-7.md)
- [Upgrading to 6](./upgrade-to-6.md)
If you are using GlusterFS version 4.x or above, you can upgrade it to the following:
- [Upgrading to 6](./upgrade-to-6.md)
- [Upgrading to 5](./upgrade-to-5.md)
If you are using GlusterFS version 3.4.x or above, you can upgrade it to the following:
- [Upgrading to 3.5](./upgrade-to-3.5.md)
- [Upgrading to 3.6](./upgrade-to-3.6.md)
- [Upgrading to 3.7](./upgrade-to-3.7.md)
- [Upgrading to 3.9](./upgrade-to-3.9.md)
- [Upgrading to 3.10](./upgrade-to-3.10.md)
- [Upgrading to 3.11](./upgrade-to-3.11.md)
- [Upgrading to 3.12](./upgrade-to-3.12.md)
- [Upgrading to 3.13](./upgrade-to-3.13.md)

@@ -1,6 +1,7 @@
# Generic Upgrade procedure
### Pre-upgrade notes
- Online upgrade is only possible with replicated and distributed replicate volumes
- Online upgrade is not supported for dispersed or distributed dispersed volumes
- Ensure no configuration changes are done during the upgrade
@@ -9,27 +10,28 @@
- It is recommended to have the same client and server, major versions running eventually
### Online upgrade procedure for servers
This procedure involves upgrading **one server at a time**, while keeping the volume(s) online and client IO ongoing. This procedure assumes that multiple replicas of a replica set, are not part of the same server in the trusted storage pool.
> **ALERT:** If there are disperse or, pure distributed volumes in the storage pool being upgraded, this procedure is NOT recommended, use the [Offline upgrade procedure](#offline-upgrade-procedure) instead.
#### Repeat the following steps, on each server in the trusted storage pool, to upgrade the entire pool to new-version :
1. Stop all gluster services, either using the command below, or through other means.

        systemctl stop glusterd
        systemctl stop glustereventsd
        killall glusterfs glusterfsd glusterd

2. Stop all applications that run on this server and access the volumes via gfapi (qemu, NFS-Ganesha, Samba, etc.)

3. Install Gluster new-version; the example below shows how to create a repository on Fedora and use it to upgrade:

    3.1 Create a private repository (assuming the /new-gluster-rpms/ folder has the new rpms):

        createrepo /new-gluster-rpms/

    3.2 Create the .repo file in /etc/yum.d/ :

        cat /etc/yum.d/newglusterrepo.repo
        [newglusterrepo]
@@ -38,76 +40,74 @@ This procedure involves upgrading **one server at a time**, while keeping the vo
        gpgcheck=0
        enabled=1
    3.3 Upgrade glusterfs, for example to upgrade glusterfs-server to the x.y version:

        yum update glusterfs-server-x.y.fc30.x86_64.rpm

4. Ensure that the version reflects new-version in the output of,

        gluster --version

5. Start glusterd on the upgraded server

        systemctl start glusterd

6. Ensure that all gluster processes are online by checking the output of,

        gluster volume status

7. If the glustereventsd service was previously enabled, it is required to start it using the commands below, or, through other means,

        systemctl start glustereventsd

8. Invoke self-heal on all the gluster volumes by running,

        for i in `gluster volume list`; do gluster volume heal $i; done

9. Verify that there is no heal backlog by running the command for all the volumes,

        gluster volume heal <volname> info

    > **NOTE:** Before proceeding to upgrade the next server in the pool it is recommended to check the heal backlog. If there is a heal backlog, it is recommended to wait until the backlog is empty, or, the backlog does not contain any entries requiring a sync to the just upgraded server.

10. Restart any gfapi based application stopped previously in step (2)
### Offline upgrade procedure
This procedure involves cluster downtime and during the upgrade window, clients are not allowed access to the volumes.
#### Steps to perform an offline upgrade:
1. On every server in the trusted storage pool, stop all gluster services, either using the command below, or through other means,

        systemctl stop glusterd
        systemctl stop glustereventsd
        killall glusterfs glusterfsd glusterd

2. Stop all applications that access the volumes via gfapi (qemu, NFS-Ganesha, Samba, etc.), across all servers

3. Install Gluster new-version, on all servers

4. Ensure that version reflects new-version in the output of the following command on all servers,

        gluster --version

5. Start glusterd on all the upgraded servers

        systemctl start glusterd

6. Ensure that all gluster processes are online by checking the output of,

        gluster volume status

7. If the glustereventsd service was previously enabled, it is required to start it using the commands below, or, through other means,

        systemctl start glustereventsd

8. Restart any gfapi based application stopped previously in step (2)
### Post upgrade steps
Perform the following steps post upgrading the entire trusted storage pool,
- It is recommended to update the op-version of the cluster. Refer to the [op-version](./op-version.md) section for further details
@@ -117,12 +117,13 @@ Perform the following steps post upgrading the entire trusted storage pool,
#### If upgrading from a version lesser than Gluster 7.0
> **NOTE:** If you have ever enabled quota on your volumes then after the upgrade
> is done, you will have to restart all the nodes in the cluster one by one so as to
> fix the checksum values in the quota.cksum file under the `/var/lib/glusterd/vols/<volname>/` directory.
> The peers may go into `Peer rejected` state while doing so but once all the nodes are rebooted
> everything will be back to normal.
### Upgrade procedure for clients
Following are the steps to upgrade clients to the new-version version,
1. Unmount all glusterfs mount points on the client

@@ -1,5 +1,5 @@
### op-version
op-version is the operating version of the Gluster which is running.
op-version was introduced to ensure that nodes running different Gluster versions do not run into problems and that backward-compatibility issues can be handled.
@@ -13,19 +13,19 @@ Current op-version can be queried as below:
For 3.10 onwards:
```console
gluster volume get all cluster.op-version
```
For release < 3.10:
```{ .console .no-copy }
# gluster volume get <VOLNAME> cluster.op-version
```
To get the maximum possible op-version a cluster can support, the following query can be used (this is available 3.10 release onwards):
```console
gluster volume get all cluster.max-op-version
```
For example, if some nodes in a cluster have been upgraded to X and some to X+, then the maximum op-version supported by the cluster is X, and the cluster.op-version can be bumped up to X to support new features.
@@ -34,7 +34,7 @@ op-version can be updated as below.
For example, after upgrading to glusterfs-4.0.0, set op-version as:
```console
gluster volume set all cluster.op-version 40000
```
Note:
@@ -46,11 +46,10 @@ When trying to set a volume option, it might happen that one or more of the conn
To check op-version information for the connected clients and find the offending client, the following query can be used for 3.10 release onwards:
```{ .console .no-copy }
# gluster volume status <all|VOLNAME> clients
```
The respective clients can then be upgraded to the required version.
This information could also be used to make an informed decision while bumping up the op-version of a cluster, so that connected clients can support all the new features provided by the upgraded cluster as well.

View File

@@ -10,6 +10,7 @@ Refer, to the [generic upgrade procedure](./generic-upgrade-procedure.md) guide
## Major issues
### The following options are removed from the code base and require to be unset before an upgrade from releases older than release 4.1.0,
- features.lock-heal
@@ -18,7 +19,7 @@ before an upgrade from releases older than release 4.1.0,
To check if these options are set use,
```console
gluster volume info
```
and ensure that the above options are not part of the `Options Reconfigured:`
@@ -26,7 +27,7 @@ section in the output of all volumes in the cluster.
If these are set, then unset them using the following commands,
```{ .console .no-copy }
# gluster volume reset <volname> <option>
```
@@ -40,7 +41,6 @@ If these are set, then unset them using the following commands,
- Tiering support (tier xlator and changetimerecorder)
- Glupy
**NOTE:** Failure to do the above may result in failure during online upgrades,
and the reset of these options to their defaults needs to be done **prior** to
upgrading the cluster.
@@ -48,4 +48,3 @@ upgrading the cluster.
### Deprecated translators and upgrade procedure for volumes using these features
[If you are upgrading from a release prior to release-6 be aware of deprecated xlators and functionality](https://docs.gluster.org/en/latest/Upgrade-Guide/upgrade_to_6/#deprecated-translators-and-upgrade-procedure-for-volumes-using-these-features).

@@ -1,6 +1,7 @@
## Upgrade procedure to Gluster 3.10.0, from Gluster 3.9.x, 3.8.x and 3.7.x
### Pre-upgrade notes
- Online upgrade is only possible with replicated and distributed replicate volumes
- Online upgrade is not supported for dispersed or distributed dispersed volumes
- Ensure no configuration changes are done during the upgrade
@@ -9,83 +10,82 @@
- It is recommended to have the same client and server, major versions running eventually
### Online upgrade procedure for servers
This procedure involves upgrading **one server at a time**, while keeping the volume(s) online and client IO ongoing. This procedure assumes that multiple replicas of a replica set, are not part of the same server in the trusted storage pool.
> **ALERT**: If any of your volumes, in the trusted storage pool that is being upgraded, uses disperse or is a pure distributed volume, this procedure is **NOT** recommended, use the [Offline upgrade procedure](#offline-upgrade-procedure) instead.
#### Repeat the following steps, on each server in the trusted storage pool, to upgrade the entire pool to 3.10 version:
1. Stop all gluster services, either using the command below, or through other means,

        killall glusterfs glusterfsd glusterd

2. Stop all applications that run on this server and access the volumes via gfapi (qemu, NFS-Ganesha, Samba, etc.)

3. Install Gluster 3.10

4. Ensure that version reflects 3.10.0 in the output of,

        gluster --version

5. Start glusterd on the upgraded server

        glusterd

6. Ensure that all gluster processes are online by checking the output of,

        gluster volume status

7. Self-heal all gluster volumes by running

        for i in `gluster volume list`; do gluster volume heal $i; done

8. Ensure that there is no heal backlog by running the below command for all volumes

        gluster volume heal <volname> info

    > NOTE: If there is a heal backlog, wait till the backlog is empty, or the backlog does not have any entries needing a sync to the just upgraded server, before proceeding to upgrade the next server in the pool

9. Restart any gfapi based application stopped previously in step (2)
### Offline upgrade procedure
This procedure involves cluster downtime and during the upgrade window, clients are not allowed access to the volumes.
#### Steps to perform an offline upgrade:
1. On every server in the trusted storage pool, stop all gluster services, either using the command below, or through other means,

        killall glusterfs glusterfsd glusterd

2. Stop all applications that access the volumes via gfapi (qemu, NFS-Ganesha, Samba, etc.), across all servers

3. Install Gluster 3.10, on all servers

4. Ensure that version reflects 3.10.0 in the output of the following command on all servers,

        gluster --version

5. Start glusterd on all the upgraded servers

        glusterd

6. Ensure that all gluster processes are online by checking the output of,

        gluster volume status

7. Restart any gfapi based application stopped previously in step (2)
### Post upgrade steps
Perform the following steps post upgrading the entire trusted storage pool,
- It is recommended to update the op-version of the cluster. Refer to the [op-version](./op-version.md) section for further details
- Proceed to [upgrade the clients](#upgrade-procedure-for-clients) to 3.10 version as well
### Upgrade procedure for clients
Following are the steps to upgrade clients to the 3.10.0 version,
1. Unmount all glusterfs mount points on the client

@@ -3,6 +3,7 @@
**NOTE:** Upgrade procedure remains the same as with the 3.10 release
### Pre-upgrade notes
- Online upgrade is only possible with replicated and distributed replicate volumes
- Online upgrade is not supported for dispersed or distributed dispersed volumes
- Ensure no configuration changes are done during the upgrade
@@ -11,87 +12,86 @@
- It is recommended to have the same client and server, major versions running eventually
### Online upgrade procedure for servers
This procedure involves upgrading **one server at a time**, while keeping the volume(s) online and client IO ongoing. This procedure assumes that multiple replicas of a replica set, are not part of the same server in the trusted storage pool.
> **ALERT**: If any of your volumes, in the trusted storage pool that is being upgraded, uses disperse or is a pure distributed volume, this procedure is **NOT** recommended, use the [Offline upgrade procedure](#offline-upgrade-procedure) instead.
#### Repeat the following steps, on each server in the trusted storage pool, to upgrade the entire pool to 3.11 version:
1. Stop all gluster services, either using the command below, or through other means,

        killall glusterfs glusterfsd glusterd

2. Stop all applications that run on this server and access the volumes via gfapi (qemu, NFS-Ganesha, Samba, etc.)

3. Install Gluster 3.11

4. Ensure that version reflects 3.11.x in the output of,

        gluster --version

    **NOTE:** x is the minor release number for the release

5. Start glusterd on the upgraded server

        glusterd

6. Ensure that all gluster processes are online by checking the output of,

        gluster volume status

7. Self-heal all gluster volumes by running

        for i in `gluster volume list`; do gluster volume heal $i; done

8. Ensure that there is no heal backlog by running the below command for all volumes

        gluster volume heal <volname> info

    > NOTE: If there is a heal backlog, wait till the backlog is empty, or the backlog does not have any entries needing a sync to the just upgraded server, before proceeding to upgrade the next server in the pool

9. Restart any gfapi based application stopped previously in step (2)
### Offline upgrade procedure
This procedure involves cluster downtime and during the upgrade window, clients are not allowed access to the volumes.
#### Steps to perform an offline upgrade:
1. On every server in the trusted storage pool, stop all gluster services, either using the command below, or through other means,

        killall glusterfs glusterfsd glusterd

2. Stop all applications that access the volumes via gfapi (qemu, NFS-Ganesha, Samba, etc.), across all servers

3. Install Gluster 3.11, on all servers

4. Ensure that version reflects 3.11.x in the output of the following command on all servers,

        gluster --version

    **NOTE:** x is the minor release number for the release

5. Start glusterd on all the upgraded servers

        glusterd

6. Ensure that all gluster processes are online by checking the output of,

        gluster volume status

7. Restart any gfapi based application stopped previously in step (2)
### Post upgrade steps
Perform the following steps post upgrading the entire trusted storage pool,
- It is recommended to update the op-version of the cluster. Refer to the [op-version](./op-version.md) section for further details
- Proceed to [upgrade the clients](#upgrade-procedure-for-clients) to 3.11 version as well
### Upgrade procedure for clients
Following are the steps to upgrade clients to the 3.11.x version,
**NOTE:** x is the minor release number for the release

@@ -3,6 +3,7 @@
> **NOTE:** Upgrade procedure remains the same as with 3.11 and 3.10 releases
### Pre-upgrade notes
- Online upgrade is only possible with replicated and distributed replicate volumes
- Online upgrade is not supported for dispersed or distributed dispersed volumes
- Ensure no configuration changes are done during the upgrade
@@ -11,90 +12,96 @@
- It is recommended to have the same client and server, major versions running eventually
### Online upgrade procedure for servers
This procedure involves upgrading **one server at a time**, while keeping the volume(s) online and client IO ongoing. This procedure assumes that multiple replicas of a replica set, are not part of the same server in the trusted storage pool.
> **ALERT:** If there are disperse or, pure distributed volumes in the storage pool being upgraded, this procedure is NOT recommended, use the [Offline upgrade procedure](#offline-upgrade-procedure) instead.
#### Repeat the following steps, on each server in the trusted storage pool, to upgrade the entire pool to 3.12 version:
1. Stop all gluster services, either using the command below, or through other means,

        killall glusterfs glusterfsd glusterd
        systemctl stop glustereventsd

2. Stop all applications that run on this server and access the volumes via gfapi (qemu, NFS-Ganesha, Samba, etc.)

3. Install Gluster 3.12

4. Ensure that version reflects 3.12.x in the output of,

        gluster --version

    > **NOTE:** x is the minor release number for the release

5. Start glusterd on the upgraded server

        glusterd

6. Ensure that all gluster processes are online by checking the output of,

        gluster volume status

7. If the glustereventsd service was previously enabled, it is required to start it using the commands below, or, through other means,

        systemctl start glustereventsd

8. Invoke self-heal on all the gluster volumes by running,

        for i in `gluster volume list`; do gluster volume heal $i; done

9. Verify that there is no heal backlog by running the command for all the volumes,

        gluster volume heal <volname> info

    > **NOTE:** Before proceeding to upgrade the next server in the pool it is recommended to check the heal backlog. If there is a heal backlog, it is recommended to wait until the backlog is empty, or, the backlog does not contain any entries requiring a sync to the just upgraded server.

10. Restart any gfapi based application stopped previously in step (2)
### Offline upgrade procedure
This procedure involves cluster downtime and during the upgrade window, clients are not allowed access to the volumes.
#### Steps to perform an offline upgrade:
1. On every server in the trusted storage pool, stop all gluster services, either using the command below, or through other means,

        killall glusterfs glusterfsd glusterd glustereventsd
        systemctl stop glustereventsd

2. Stop all applications that access the volumes via gfapi (qemu, NFS-Ganesha, Samba, etc.), across all servers

3. Install Gluster 3.12, on all servers

4. Ensure that version reflects 3.12.x in the output of the following command on all servers,

        gluster --version

    > **NOTE:** x is the minor release number for the release

5. Start glusterd on all the upgraded servers

        glusterd

6. Ensure that all gluster processes are online by checking the output of,

        gluster volume status

7. If the glustereventsd service was previously enabled, it is required to start it using the commands below, or, through other means,

        systemctl start glustereventsd

8. Restart any gfapi based application stopped previously in step (2)
### Post upgrade steps
Perform the following steps post upgrading the entire trusted storage pool,
- It is recommended to update the op-version of the cluster. Refer to the [op-version](./op-version.md) section for further details
- Proceed to [upgrade the clients](#upgrade-procedure-for-clients) to 3.12 version as well
### Upgrade procedure for clients
Following are the steps to upgrade clients to the 3.12.x version,
> **NOTE:** x is the minor release number for the release

@@ -3,6 +3,7 @@
**NOTE:** Upgrade procedure remains the same as with 3.12 and 3.10 releases
### Pre-upgrade notes
- Online upgrade is only possible with replicated and distributed replicate volumes
- Online upgrade is not supported for dispersed or distributed dispersed volumes
- Ensure no configuration changes are done during the upgrade
@@ -11,80 +12,86 @@
- It is recommended to have the same client and server, major versions running eventually
### Online upgrade procedure for servers
This procedure involves upgrading **one server at a time**, while keeping the volume(s) online and client IO ongoing. This procedure assumes that multiple replicas of a replica set, are not part of the same server in the trusted storage pool.
> **ALERT**: If any of your volumes, in the trusted storage pool that is being upgraded, uses disperse or is a pure distributed volume, this procedure is **NOT** recommended, use the [Offline upgrade procedure](#offline-upgrade-procedure) instead.
#### Repeat the following steps, on each server in the trusted storage pool, to upgrade the entire pool to 3.13 version:
1. Stop all gluster services, either using the command below, or through other means,

        killall glusterfs glusterfsd glusterd

2. Stop all applications that run on this server and access the volumes via gfapi (qemu, NFS-Ganesha, Samba, etc.)

3. Install Gluster 3.13

4. Ensure that version reflects 3.13.x in the output of,

        gluster --version

    **NOTE:** x is the minor release number for the release

5. Start glusterd on the upgraded server

        glusterd

6. Ensure that all gluster processes are online by checking the output of,

        gluster volume status

7. Self-heal all gluster volumes by running

        for i in `gluster volume list`; do gluster volume heal $i; done

8. Ensure that there is no heal backlog by running the below command for all volumes

        gluster volume heal <volname> info

    > NOTE: If there is a heal backlog, wait till the backlog is empty, or the backlog does not have any entries needing a sync to the just upgraded server, before proceeding to upgrade the next server in the pool

9. Restart any gfapi based application stopped previously in step (2)
### Offline upgrade procedure
This procedure involves cluster downtime and during the upgrade window, clients are not allowed access to the volumes.
#### Steps to perform an offline upgrade:
1. On every server in the trusted storage pool, stop all gluster services, either using the command below, or through other means,

        killall glusterfs glusterfsd glusterd

2. Stop all applications that access the volumes via gfapi (qemu, NFS-Ganesha, Samba, etc.), across all servers

3. Install Gluster 3.13, on all servers

4. Ensure that version reflects 3.13.x in the output of the following command on all servers,

        gluster --version

    **NOTE:** x is the minor release number for the release

5. Start glusterd on all the upgraded servers

        glusterd

6. Ensure that all gluster processes are online by checking the output of,

        gluster volume status

7. Restart any gfapi based application stopped previously in step (2)
### Post upgrade steps
Perform the following steps post upgrading the entire trusted storage pool,
- It is recommended to update the op-version of the cluster. Refer to the [op-version](./op-version.md) section for further details
- Proceed to [upgrade the clients](#upgrade-procedure-for-clients) to 3.13 version as well
### Upgrade procedure for clients
Following are the steps to upgrade clients to the 3.13.x version,
**NOTE:** x is the minor release number for the release

@@ -23,7 +23,7 @@ provided below)
1. Execute "pre-upgrade-script-for-quota.sh" mentioned under "Upgrade Steps For Quota" section.
2. Stop all glusterd, glusterfsd and glusterfs processes on your server.
3. Install GlusterFS 3.5.0
4. Start glusterd.
5. Ensure that all started volumes have processes online in “gluster volume status”.
6. Execute "Post-Upgrade Script" mentioned under "Upgrade Steps For Quota" section.
@@ -77,7 +77,7 @@ The upgrade process for quota involves executing two upgrade scripts:
1. pre-upgrade-script-for-quota.sh, and\
2. post-upgrade-script-for-quota.sh
_Pre-Upgrade Script:_
What it does:
@@ -105,11 +105,11 @@ Invocation:
Invoke the script by executing \`./pre-upgrade-script-for-quota.sh\`
from the shell on any one of the nodes in the cluster.
- Example:

        [root@server1 extras]#./pre-upgrade-script-for-quota.sh
_Post-Upgrade Script:_
What it does:
@@ -164,9 +164,9 @@ In the first case, invoke post-upgrade-script-for-quota.sh from the
shell for each volume with quota enabled, with the name of the volume
passed as an argument in the command-line:
- Example:

  _For a volume "vol1" on which quota is enabled, invoke the script in the following way:_

        [root@server1 extras]#./post-upgrade-script-for-quota.sh vol1
@@ -176,9 +176,9 @@ procedure on each one of them. In this case, invoke
post-upgrade-script-for-quota.sh from the shell with 'all' passed as an
argument in the command-line:
- Example:

        [root@server1 extras]#./post-upgrade-script-for-quota.sh all
Note:

@@ -1,4 +1,5 @@
# GlusterFS upgrade from 3.5.x to 3.6.x
Now that GlusterFS 3.6.0 is out, here is the process to upgrade from
earlier installed versions of GlusterFS.
@@ -8,15 +9,15 @@ GlusterFS clients. If you are not updating your clients to GlusterFS
version 3.6, you need to disable the client self-healing process. You can
perform this with the steps below.
```{ .console .no-copy }
# gluster v set testvol cluster.entry-self-heal off
volume set: success
#
# gluster v set testvol cluster.data-self-heal off
volume set: success
# gluster v set testvol cluster.metadata-self-heal off
volume set: success
#
```
### GlusterFS upgrade from 3.5.x to 3.6.x
@@ -27,7 +28,7 @@ For this approach, schedule a downtime and prevent all your clients from
accessing the servers (umount your volumes, stop gluster volumes, etc.).
1. Stop all glusterd, glusterfsd and glusterfs processes on your server.
2. Install GlusterFS 3.6.0
3. Start glusterd.
4. Ensure that all started volumes have processes online in “gluster volume status”.
@@ -59,7 +60,7 @@ provided below)
1. Execute "pre-upgrade-script-for-quota.sh" mentioned under "Upgrade Steps For Quota" section.
2. Stop all glusterd, glusterfsd and glusterfs processes on your server.
3. Install GlusterFS 3.6.0
4. Start glusterd.
5. Ensure that all started volumes have processes online in “gluster volume status”.
6. Execute "Post-Upgrade Script" mentioned under "Upgrade Steps For Quota" section.
@@ -87,7 +88,7 @@ The upgrade process for quota involves executing two upgrade scripts:
1. pre-upgrade-script-for-quota.sh, and\
2. post-upgrade-script-for-quota.sh
_Pre-Upgrade Script:_
What it does:
@@ -121,7 +122,7 @@ Example:
[root@server1 extras]#./pre-upgrade-script-for-quota.sh
```
_Post-Upgrade Script:_
What it does:
@@ -178,7 +179,7 @@ passed as an argument in the command-line:
Example:
_For a volume "vol1" on which quota is enabled, invoke the script in the following way:_
```console
[root@server1 extras]#./post-upgrade-script-for-quota.sh vol1
@@ -227,7 +228,7 @@ covered in detail here.
**Below are the steps to upgrade:**
1. Stop the geo-replication session in older version ( \< 3.5) using
the below command
# gluster volume geo-replication `<master_vol>` `<slave_host>`::`<slave_vol>` stop

@@ -1,4 +1,5 @@
# GlusterFS upgrade to 3.7.x
Now that GlusterFS 3.7.0 is out, here is the process to upgrade from
earlier installed versions of GlusterFS. Please read the entire howto
before proceeding with an upgrade of your deployment
@@ -13,15 +14,15 @@ version 3.6 along with your servers you would need to disable client
self-healing process before the upgrade. You can perform this with the
steps below.
```{ .console .no-copy }
# gluster v set testvol cluster.entry-self-heal off
volume set: success
#
# gluster v set testvol cluster.data-self-heal off
volume set: success
# gluster v set testvol cluster.metadata-self-heal off
volume set: success
#
```
### GlusterFS upgrade to 3.7.x
@@ -71,11 +72,11 @@ The upgrade process for quota involves the following:
1. Run pre-upgrade-script-for-quota.sh
2. Upgrade to 3.7.0
3. Run post-upgrade-script-for-quota.sh
More details on the scripts are as under.
_Pre-Upgrade Script:_
What it does:
@@ -109,7 +110,7 @@ Example:
[root@server1 extras]#./pre-upgrade-script-for-quota.sh
```
_Post-Upgrade Script:_
What it does:

@@ -1,12 +1,13 @@
## Upgrade procedure from Gluster 3.7.x
### Pre-upgrade Notes
- Online upgrade is only possible with replicated and distributed replicate volumes.
- Online upgrade is not yet supported for dispersed or distributed dispersed volumes.
- Ensure no configuration changes are done during the upgrade.
- If you are using geo-replication, please upgrade the slave cluster(s) before upgrading the master.
- Upgrading the servers ahead of the clients is recommended.
- Upgrade the clients after the servers are upgraded. It is recommended to have the same client and server major versions.
### Online Upgrade Procedure for Servers
@@ -14,7 +15,7 @@ The procedure involves upgrading one server at a time . On every storage server
- Stop all gluster services using the below command or through your favorite way to stop them.
killall glusterfs glusterfsd glusterd
- If you are using gfapi based applications (qemu, NFS-Ganesha, Samba etc.) on the servers, please stop those applications too.
@@ -22,38 +23,39 @@ The procedure involves upgrading one server at a time . On every storage server
- Ensure that version reflects 3.8.x in the output of
gluster --version
- Start glusterd on the upgraded server
glusterd
- Ensure that all gluster processes are online by executing
gluster volume status
- Self-heal all gluster volumes by running
for i in `gluster volume list`; do gluster volume heal $i; done
- Ensure that there is no heal backlog by running the below command for all volumes
gluster volume heal <volname> info
- Restart any gfapi based application stopped previously.
- After the upgrade is complete on all servers, run the following command:
gluster volume set all cluster.op-version 30800
### Offline Upgrade Procedure
For this procedure, schedule a downtime and prevent all your clients from accessing the servers.
On every storage server in your trusted storage pool:
- Stop all gluster services using the below command or through your favorite way to stop them.
killall glusterfs glusterfsd glusterd
- If you are using gfapi based applications (qemu, NFS-Ganesha, Samba etc.) on the servers, please stop those applications too.
@@ -61,25 +63,24 @@ On every storage server in your trusted storage pool:
- Ensure that version reflects 3.8.x in the output of
      gluster --version
- Start glusterd on the upgraded server
      glusterd
- Ensure that all gluster processes are online by executing
      gluster volume status
- Restart any gfapi based application stopped previously.
- After the upgrade is complete on all servers, run the following command:
      gluster volume set all cluster.op-version 30800
### Upgrade Procedure for Clients
- Unmount all glusterfs mount points on the client
- Stop applications using gfapi (qemu etc.)
- Install Gluster 3.8
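The remaining client steps are essentially the reverse of the teardown. As a rough sketch, remounting a volume once the packages are upgraded could look like this (server name, volume name, and mount point are placeholders):

```console
mount -t glusterfs server1:/testvol /mnt/glusterfs
```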
@@ -9,5 +9,5 @@ Note that there is only a single difference, related to the `op-version`:
After the upgrade is complete on all servers, run the following command:
```console
gluster volume set all cluster.op-version 30900
```
@@ -3,6 +3,7 @@
**NOTE:** Upgrade procedure remains the same as with 3.12 and 3.10 releases
### Pre-upgrade notes
- Online upgrade is only possible with replicated and distributed replicate volumes
- Online upgrade is not supported for dispersed or distributed dispersed volumes
- Ensure no configuration changes are done during the upgrade
@@ -11,74 +12,79 @@
- It is recommended to have the same client and server, major versions running eventually
### Online upgrade procedure for servers
This procedure involves upgrading **one server at a time**, while keeping the volume(s) online and client IO ongoing. This procedure assumes that multiple replicas of a replica set are not part of the same server in the trusted storage pool.
> **ALERT**: If any of your volumes, in the trusted storage pool that is being upgraded, uses disperse or is a pure distributed volume, this procedure is **NOT** recommended, use the [Offline upgrade procedure](#offline-upgrade-procedure) instead.
#### Repeat the following steps, on each server in the trusted storage pool, to upgrade the entire pool to 4.0 version:
1. Stop all gluster services, either using the command below, or through other means,

        killall glusterfs glusterfsd glusterd

2. Stop all applications that run on this server and access the volumes via gfapi (qemu, NFS-Ganesha, Samba, etc.)

3. Install Gluster 4.0

4. Ensure that version reflects 4.0.x in the output of,

        gluster --version

    **NOTE:** x is the minor release number for the release

5. Start glusterd on the upgraded server

        glusterd

6. Ensure that all gluster processes are online by checking the output of,

        gluster volume status

7. Self-heal all gluster volumes by running

        for i in `gluster volume list`; do gluster volume heal $i; done

8. Ensure that there is no heal backlog by running the below command for all volumes

        gluster volume heal <volname> info

    > NOTE: If there is a heal backlog, wait till the backlog is empty, or the backlog does not have any entries needing a sync to the just upgraded server, before proceeding to upgrade the next server in the pool

9. Restart any gfapi based application stopped previously in step (2)
### Offline upgrade procedure
This procedure involves cluster downtime and during the upgrade window, clients are not allowed access to the volumes.
#### Steps to perform an offline upgrade:
1. On every server in the trusted storage pool, stop all gluster services, either using the command below, or through other means,

        killall glusterfs glusterfsd glusterd

2. Stop all applications that access the volumes via gfapi (qemu, NFS-Ganesha, Samba, etc.), across all servers

3. Install Gluster 4.0, on all servers

4. Ensure that version reflects 4.0.x in the output of the following command on all servers,

        gluster --version

    **NOTE:** x is the minor release number for the release

5. Start glusterd on all the upgraded servers

        glusterd

6. Ensure that all gluster processes are online by checking the output of,

        gluster volume status

7. Restart any gfapi based application stopped previously in step (2)
### Post upgrade steps
Perform the following steps post upgrading the entire trusted storage pool,
- It is recommended to update the op-version of the cluster. Refer to the [op-version](./op-version.md) section for further details
@@ -86,6 +92,7 @@ Perform the following steps post upgrading the entire trusted storage pool,
- Post upgrading the clients, for replicate volumes, it is recommended to enable the option `gluster volume set <volname> fips-mode-rchecksum on` to turn off usage of MD5 checksums during healing. This enables running Gluster on FIPS compliant systems.
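As a sketch, the option can be applied across volumes in one pass once all clients are upgraded; looping over every volume is a simplification here, since the recommendation applies to replicate volumes:

```console
for vol in $(gluster volume list); do
    gluster volume set "$vol" fips-mode-rchecksum on
done
```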
### Upgrade procedure for clients
Following are the steps to upgrade clients to the 4.0.x version,
**NOTE:** x is the minor release number for the release
@@ -3,6 +3,7 @@
> **NOTE:** Upgrade procedure remains the same as with 3.12 and 3.10 releases
### Pre-upgrade notes
- Online upgrade is only possible with replicated and distributed replicate volumes
- Online upgrade is not supported for dispersed or distributed dispersed volumes
- Ensure no configuration changes are done during the upgrade
@@ -11,88 +12,89 @@
- It is recommended to have the same client and server, major versions running eventually
### Online upgrade procedure for servers
This procedure involves upgrading **one server at a time**, while keeping the volume(s) online and client IO ongoing. This procedure assumes that multiple replicas of a replica set are not part of the same server in the trusted storage pool.
> **ALERT:** If there are disperse or, pure distributed volumes in the storage pool being upgraded, this procedure is NOT recommended, use the [Offline upgrade procedure](#offline-upgrade-procedure) instead.
#### Repeat the following steps, on each server in the trusted storage pool, to upgrade the entire pool to 4.1 version:
1. Stop all gluster services, either using the commands below, or through other means,

        killall glusterfs glusterfsd glusterd
        systemctl stop glustereventsd

2. Stop all applications that run on this server and access the volumes via gfapi (qemu, NFS-Ganesha, Samba, etc.)

3. Install Gluster 4.1

4. Ensure that version reflects 4.1.x in the output of,

        gluster --version

    > **NOTE:** x is the minor release number for the release

5. Start glusterd on the upgraded server

        glusterd

6. Ensure that all gluster processes are online by checking the output of,

        gluster volume status

7. If the glustereventsd service was previously enabled, it is required to start it using the commands below, or, through other means,

        systemctl start glustereventsd

8. Invoke self-heal on all the gluster volumes by running,

        for i in `gluster volume list`; do gluster volume heal $i; done

9. Verify that there is no heal backlog by running the command for all the volumes,

        gluster volume heal <volname> info

    > **NOTE:** Before proceeding to upgrade the next server in the pool it is recommended to check the heal backlog. If there is a heal backlog, it is recommended to wait until the backlog is empty, or, the backlog does not contain any entries requiring a sync to the just upgraded server.

10. Restart any gfapi based application stopped previously in step (2)
### Offline upgrade procedure
This procedure involves cluster downtime and during the upgrade window, clients are not allowed access to the volumes.
#### Steps to perform an offline upgrade:
1. On every server in the trusted storage pool, stop all gluster services, either using the commands below, or through other means,

        killall glusterfs glusterfsd glusterd glustereventsd
        systemctl stop glustereventsd

2. Stop all applications that access the volumes via gfapi (qemu, NFS-Ganesha, Samba, etc.), across all servers

3. Install Gluster 4.1, on all servers

4. Ensure that version reflects 4.1.x in the output of the following command on all servers,

        gluster --version

    > **NOTE:** x is the minor release number for the release

5. Start glusterd on all the upgraded servers

        glusterd

6. Ensure that all gluster processes are online by checking the output of,

        gluster volume status

7. If the glustereventsd service was previously enabled, it is required to start it using the commands below, or, through other means,

        systemctl start glustereventsd

8. Restart any gfapi based application stopped previously in step (2)
### Post upgrade steps
Perform the following steps post upgrading the entire trusted storage pool,
- It is recommended to update the op-version of the cluster. Refer to the [op-version](./op-version.md) section for further details
@@ -100,6 +102,7 @@ Perform the following steps post upgrading the entire trusted storage pool,
- Post upgrading the clients, for replicate volumes, it is recommended to enable the option `gluster volume set <volname> fips-mode-rchecksum on` to turn off usage of MD5 checksums during healing. This enables running Gluster on FIPS compliant systems.
### Upgrade procedure for clients
Following are the steps to upgrade clients to the 4.1.x version,
> **NOTE:** x is the minor release number for the release
@@ -8,15 +8,16 @@ version reference.
### Major issues
1. The following options are removed from the code base and require to be unset
before an upgrade from releases older than release 4.1.0,
- features.lock-heal
- features.grace-timeout
To check if these options are set use,
```console
gluster volume info
```
and ensure that the above options are not part of the `Options Reconfigured:`
@@ -24,7 +25,7 @@ section in the output of all volumes in the cluster.
If these are set, then unset them using the following commands,
```{ .console .no-copy }
# gluster volume reset <volname> <option>
```
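For illustration, checking for and clearing both removed options on a hypothetical volume named `testvol` could look like the following (the grep and its output are illustrative):

```{ .console .no-copy }
# gluster volume info testvol | grep -E 'lock-heal|grace-timeout'
features.lock-heal: on
features.grace-timeout: 10
# gluster volume reset testvol features.lock-heal
# gluster volume reset testvol features.grace-timeout
```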
@@ -11,15 +11,16 @@ version reference.
### Major issues
1. The following options are removed from the code base and require to be unset
before an upgrade from releases older than release 4.1.0,
- features.lock-heal
- features.grace-timeout
To check if these options are set use,
```console
gluster volume info
```
and ensure that the above options are not part of the `Options Reconfigured:`
@@ -27,7 +28,7 @@ section in the output of all volumes in the cluster.
If these are set, then unset them using the following commands,
```{ .console .no-copy }
# gluster volume reset <volname> <option>
```
@@ -10,22 +10,23 @@ documented instructions, replacing 7 when you encounter 4.1 in the guide as the
version reference.
> **NOTE:** If you have ever enabled quota on your volumes then after the upgrade
> is done, you will have to restart all the nodes in the cluster one by one so as to
> fix the checksum values in the quota.cksum file under the `/var/lib/glusterd/vols/<volname>/` directory.
> The peers may go into `Peer rejected` state while doing so but once all the nodes are rebooted
> everything will be back to normal.
### Major issues
1. The following options are removed from the code base and require to be unset
before an upgrade from releases older than release 4.1.0,
- features.lock-heal
- features.grace-timeout
To check if these options are set use,
```console
gluster volume info
```
and ensure that the above options are not part of the `Options Reconfigured:`
@@ -33,7 +34,7 @@ section in the output of all volumes in the cluster.
If these are set, then unset them using the following commands,
```{ .console .no-copy }
# gluster volume reset <volname> <option>
```
@@ -7,17 +7,19 @@ aware of the features and fixes provided with the release.
> With version 8, there are certain changes introduced to the directory structure of changelog files in gluster geo-replication.
> Thus, before the upgrade of geo-rep packages, we need to execute the [upgrade script](https://github.com/gluster/glusterfs/commit/2857fe3fad4d2b30894847088a54b847b88a23b9) with the brick path as argument, as described below:
>
> 1. Stop the geo-rep session
> 2. Run the upgrade script with the brick path as the argument. The script can be used in a loop for multiple bricks (a sketch follows this note).
> 3. Start the upgrade process.
>
> This script will update the existing changelog directory structure and the paths inside the htime files to a new format introduced in version 8.
> If the above mentioned script is not executed, the search algorithm used during the history crawl will fail with wrong results when upgrading from version 7 and below to version 8 and above.
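A minimal sketch of step 2 above, assuming the upgrade script has been copied to the node; the script path and brick paths are placeholders:

```console
for brick in /bricks/brick1 /bricks/brick2; do
    /path/to/upgrade-script "$brick"
done
```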
Refer to the [generic upgrade procedure](./generic-upgrade-procedure.md) guide and follow documented instructions.
## Major issues
### The following options are removed from the code base and require to be unset
before an upgrade from releases older than release 4.1.0,
- features.lock-heal
@@ -26,7 +28,7 @@ before an upgrade from releases older than release 4.1.0,
To check if these options are set use,
```console
gluster volume info
```
and ensure that the above options are not part of the `Options Reconfigured:`
@@ -34,7 +36,7 @@ section in the output of all volumes in the cluster.
If these are set, then unset them using the following commands,
```{ .console .no-copy }
# gluster volume reset <volname> <option>
```
@@ -48,7 +50,6 @@ If these are set, then unset them using the following commands,
- Tiering support (tier xlator and changetimerecorder)
- Glupy
**NOTE:** Failure to do the above may result in failure during online upgrades,
and the reset of these options to their defaults needs to be done **prior** to
upgrading the cluster.
@@ -10,6 +10,7 @@ Refer, to the [generic upgrade procedure](./generic-upgrade-procedure.md) guide
## Major issues
### The following options are removed from the code base and require to be unset
before an upgrade from releases older than release 4.1.0,
- features.lock-heal
@@ -18,7 +19,7 @@ before an upgrade from releases older than release 4.1.0,
To check if these options are set use,
```console
gluster volume info
```
and ensure that the above options are not part of the `Options Reconfigured:`
@@ -26,11 +27,11 @@ section in the output of all volumes in the cluster.
If these are set, then unset them using the following commands,
```{ .console .no-copy }
# gluster volume reset <volname> <option>
```
### Make sure you are not using any of the following deprecated features:
- Block device (bd) xlator
- Decompounder feature
@@ -40,7 +41,6 @@ If these are set, then unset them using the following commands,
- Tiering support (tier xlator and changetimerecorder)
- Glupy
**NOTE:** Failure to do the above may result in failure during online upgrades,
and the reset of these options to their defaults needs to be done **prior** to
upgrading the cluster.
@@ -1,57 +1,58 @@
# Glossary
**Access Control Lists**
: Access Control Lists (ACLs) allow you to assign different permissions
for different users or groups even though they do not correspond to the
original owner or the owning group.
**Block Storage**
: Block special files, or block devices, correspond to devices through which the system moves
data in the form of blocks. These device nodes often represent addressable devices such as
hard disks, CD-ROM drives, or memory regions. GlusterFS requires a filesystem (like XFS) that
supports extended attributes.
**Brick**
: A Brick is the basic unit of storage in GlusterFS, represented by an export directory
on a server in the trusted storage pool.
A brick is expressed by combining a server with an export directory in the following format:
```{ .text .no-copy }
SERVER:EXPORT
For example:
myhostname:/exports/myexportdir/
```
**Client**
: Any machine that mounts a GlusterFS volume. Any applications that use libgfapi access
mechanism can also be treated as clients in GlusterFS context.
**Cluster**
: A trusted pool of linked computers working together, resembling a single computing resource.
In GlusterFS, a cluster is also referred to as a trusted storage pool.
**Distributed File System**
: A file system that allows multiple clients to concurrently access data which is spread across
servers/bricks in a trusted storage pool. Data sharing among multiple locations is fundamental
to all distributed file systems.
**Extended Attributes**
: Extended file attributes (abbreviated xattr) is a filesystem feature that enables
users/programs to associate files/dirs with metadata. Gluster stores metadata in xattrs.
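For example, the xattrs Gluster maintains on a file can be inspected directly on a brick with `getfattr` (the brick path is a placeholder):

```console
getfattr -d -m . -e hex /bricks/brick1/path/to/file
```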
**Filesystem**
: A method of storing and organizing computer files and their data.
Essentially, it organizes these files into a database for the
storage, organization, manipulation, and retrieval by the computer's
operating system.
Source [Wikipedia][wikipedia]
**FUSE**
: Filesystem in Userspace (FUSE) is a loadable kernel module for Unix-like
computer operating systems that lets non-privileged users create their
own file systems without editing kernel code. This is achieved by
running file system code in user space while the FUSE module provides
only a "bridge" to the actual kernel interfaces.
Source: [Wikipedia][1]
**GFID**
@@ -60,156 +61,156 @@ associated with it called the GFID. This is analogous to inode in a
regular filesystem.
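For example, the GFID of a file can be read from its `trusted.gfid` xattr on any brick that holds it (the brick path is a placeholder):

```console
getfattr -n trusted.gfid -e hex /bricks/brick1/path/to/file
```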
**glusterd**
: The Gluster daemon/service that manages volumes and cluster membership. It is required to
run on all the servers in the trusted storage pool.
**Geo-Replication**
: Geo-replication provides a continuous, asynchronous, and incremental
replication service from one site to another over Local Area Networks
(LANs), Wide Area Networks (WANs), and across the Internet.
**Infiniband**
: InfiniBand is a switched fabric computer network communications link
used in high-performance computing and enterprise data centers.
**Metadata**
: Metadata is defined as data providing information about one or more
other pieces of data. There is no special metadata storage concept in
GlusterFS. The metadata is stored with the file data itself, usually in the
form of extended attributes.
**Namespace**
: A namespace is an abstract container or environment created to hold a
logical grouping of unique identifiers or symbols. Each Gluster volume
exposes a single namespace as a POSIX mount point that contains every
file in the cluster.
**Node**
: A server or computer that hosts one or more bricks.
**N-way Replication**
: Local synchronous data replication which is typically deployed across campus
or Amazon Web Services Availability Zones.
**Petabyte**
: A petabyte (derived from the SI prefix peta- ) is a unit of
information equal to one quadrillion (short scale) bytes, or 1000
terabytes. The unit symbol for the petabyte is PB. The prefix peta-
(P) indicates a power of 1000:
```{ .text .no-copy }
1 PB = 1,000,000,000,000,000 B = 1000^5 B = 10^15 B.

The term "pebibyte" (PiB), using a binary prefix, is used for the
corresponding power of 1024.
```
Source: [Wikipedia][3]
**POSIX**
: Portable Operating System Interface (for Unix) is the name of a family
of related standards specified by the IEEE to define the application
programming interface (API), along with shell and utilities interfaces
for software compatible with variants of the Unix operating system.
Gluster exports a POSIX compatible file system.
**Quorum**
: The configuration of quorum in a trusted storage pool determines the
number of server failures that the trusted storage pool can sustain.
If an additional failure occurs, the trusted storage pool becomes
unavailable.
**Quota**
: Quota allows you to set limits on usage of disk space by directories or
by volumes.
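For example, enabling quota and limiting a directory to 10GB on a hypothetical volume named `testvol`:

```console
gluster volume quota testvol enable
gluster volume quota testvol limit-usage /dir 10GB
```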
**RAID**
: Redundant Array of Inexpensive Disks (RAID) is a technology that provides
increased storage reliability through redundancy, combining multiple
low-cost, less-reliable disk drive components into a logical unit where
all drives in the array are interdependent.
**RDMA**
: Remote direct memory access (RDMA) is a direct memory access from the
memory of one computer into that of another without involving either
one's operating system. This permits high-throughput, low-latency
networking, which is especially useful in massively parallel computer
clusters.
**Rebalance**
: The process of redistributing data in a distributed volume when a
brick is added or removed.
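For example, after adding a brick to a hypothetical distributed volume named `testvol`, a rebalance is typically started and monitored with:

```console
gluster volume rebalance testvol start
gluster volume rebalance testvol status
```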
**RRDNS**
: Round Robin Domain Name Service (RRDNS) is a method to distribute load
across application servers. It is implemented by creating multiple A
records with the same name and different IP addresses in the zone file
of a DNS server.
**Samba**
: Samba allows file and print sharing between computers running Windows and
computers running Linux. It is an implementation of several services and
protocols including SMB and CIFS.
**Scale-Up Storage**
: Increases the capacity of the storage device in a single dimension.
For example, adding additional disk capacity to an existing trusted storage pool.
**Scale-Out Storage**
: Scale out systems are designed to scale on both capacity and performance.
It increases the capability of a storage device in multiple dimensions.
For example, adding more systems of the same size, or adding servers to a trusted storage pool
that increases CPU, disk capacity, and throughput for the trusted storage pool.
**Self-Heal**
: The self-heal daemon that runs in the background, identifies
inconsistencies in files/dirs in a replicated or erasure coded volume and then resolves
or heals them. This healing process is usually required when one or more
bricks of a volume goes down and then comes up later.
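For example, on a hypothetical replicated volume named `testvol`, a heal can be triggered and the pending entries inspected with:

```console
gluster volume heal testvol
gluster volume heal testvol info
```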
**Server**
: The machine (virtual or bare metal) that hosts the bricks in which data is stored.
**Split-brain**
: A situation where data on two or more bricks in a replicated
volume start to diverge in terms of content or metadata. In this state,
one cannot determine programmatically which set of data is "right" and
which is "wrong".
**Subvolume**
: A brick after being processed by at least one translator.
**Translator**
: Translators (also called xlators) are stackable modules where each
module has a very specific purpose. Translators are stacked in a
hierarchical structure called a graph. A translator receives data
from its parent translator, performs necessary operations and then
passes the data down to its child translator in the hierarchy.
**Trusted Storage Pool**
: A storage pool is a trusted network of storage servers. When you start
the first server, the storage pool consists of that server alone.
**Userspace**
: Applications running in user space don't directly interact with
hardware, instead using the kernel to moderate access. Userspace
applications are generally more portable than applications in kernel
space. Gluster is a user space application.
**Virtual File System (VFS)**
: VFS is a kernel software layer which handles all system calls related to the standard Linux file system.
It provides a common interface to several kinds of file systems.
**Volume**
: A volume is a logical collection of bricks.
**Vol file**
: Vol files or volume (.vol) files are configuration files that determine the behavior of the
Gluster trusted storage pool. A vol file is a textual representation of a
collection of modules (also known as translators) that together implement the
various functions required.
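For example, the generated vol files for a volume live under glusterd's working directory on each server (the volume name is a placeholder):

```console
ls /var/lib/glusterd/vols/testvol/
```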
[wikipedia]: http://en.wikipedia.org/wiki/Filesystem
[1]: http://en.wikipedia.org/wiki/Filesystem_in_Userspace
[2]: http://en.wikipedia.org/wiki/Open_source
[3]: http://en.wikipedia.org/wiki/Petabyte
@@ -0,0 +1,40 @@
// Add ability to copy the current URL using vim-like shortcuts
// There already exist navigation-related shortcuts like
// F/S -- For Searching
// P/N -- For navigating to previous/next pages
// This patch just extends those features
// Expose the internal notification API of mkdocs
// This API isn't exposed publicly, IDK why
// They use it internally to show notifications when user copies a code block
// I reverse engineered it for ease of use, takes a string arg `msg`
const notifyDOM = (msg) => {
if (typeof alert$ === "undefined") {
console.error("Clipboard notification API not available");
return;
}
alert$.next(msg);
};
// Extend the keyboard shortcut features
keyboard$.subscribe((key) => {
// We want to allow the user to be able to type our modifiers in search
// Disallowing that would be hilarious
if (key.mode === "search") {
return;
}
const keyPressed = key.type.toLowerCase();
// Y is added to honor vim enthusiasts (yank)
if (keyPressed === "c" || keyPressed === "y") {
const currLocation = window.location.href;
if (currLocation) {
navigator.clipboard
.writeText(currLocation)
.then(() => notifyDOM("Address copied to clipboard"))
.catch((e) => console.error(e));
}
}
});
@@ -1,74 +0,0 @@
(function () {
'use strict';
$(document).ready(function () {
fixSearchResults();
fixSearch();
warnDomain();
});
/**
* Adds a TOC-style table to each page in the 'Modules' section.
*/
function fixSearchResults() {
$('#mkdocs-search-results').text('Searching...');
}
/**
* Warn if the domain is gluster.readthedocs.io
*
*/
function warnDomain() {
var domain = window.location.hostname;
if (domain.indexOf('readthedocs.io') != -1) {
$('div.section').prepend('<div class="warning"><p>You are viewing outdated content. We have moved to <a href="http://docs.gluster.org' + window.location.pathname + '">docs.gluster.org.</a></p></div>');
}
}
/*
* RTD messes up MkDocs' search feature by tinkering with the search box defined in the theme, see
* https://github.com/rtfd/readthedocs.org/issues/1088. This function sets up a DOM4 MutationObserver
* to react to changes to the search form (triggered by RTD on doc ready). It then reverts everything
* the RTD JS code modified.
*/
function fixSearch() {
var target = document.getElementById('mkdocs-search-form');
var config = {attributes: true, childList: true};
var observer = new MutationObserver(function(mutations) {
// if it isn't disconnected it'll loop infinitely because the observed element is modified
observer.disconnect();
var form = $('#mkdocs-search-form');
form.empty();
form.attr('action', 'https://' + window.location.hostname + '/en/' + determineSelectedBranch() + '/search.html');
$('<input>').attr({
type: "text",
name: "q",
placeholder: "Search docs"
}).appendTo(form);
});
if (window.location.origin.indexOf('readthedocs') > -1 || window.location.origin.indexOf('docs.gluster.org') > -1) {
observer.observe(target, config);
}
}
/**
* Analyzes the URL of the current page to find out what the selected GitHub branch is. It's usually
* part of the location path. The code needs to distinguish between running MkDocs standalone
* and docs served from RTD. If no valid branch could be determined 'dev' returned.
*
* @returns GitHub branch name
*/
function determineSelectedBranch() {
var branch = 'latest', path = window.location.pathname;
if (window.location.origin.indexOf('readthedocs') > -1) {
// path is like /en/<branch>/<lang>/build/ -> extract 'lang'
// split[0] is an '' because the path starts with the separator
branch = path.split('/')[2];
}
return branch;
}
}());
@@ -7,28 +7,29 @@ This is a major release that includes a range of features, code improvements and
A selection of the key features and changes are documented in this page.
A full list of bugs that have been addressed is included further below.
- [Release notes for Gluster 10.0](#release-notes-for-gluster-100)
- [Announcements](#announcements)
- [Builds are available at -](#builds-are-available-at--)
- [Highlights](#highlights)
- [Bugs addressed](#bugs-addressed)
## Announcements
1. The release that receives maintenance updates post release 10 is release 9
   ([reference](https://www.gluster.org/release-schedule/))
2. Release 10 will receive maintenance updates around the 15th of every alternate month, and release 9 will receive maintenance updates around the 15th every three months.
## Builds are available at -
[https://download.gluster.org/pub/gluster/glusterfs/10/10.0/](https://download.gluster.org/pub/gluster/glusterfs/10/10.0/)
## Highlights
- Major performance improvement of ~20% w.r.t. small files as well as large files testing in controlled lab environments [#2771](https://github.com/gluster/glusterfs/issues/2771)
**NOTE**: The above improvement requires the tcmalloc library to be enabled for building. We have tested and verified tcmalloc on x86_64 platforms and it is enabled only for x86_64 builds in the current release.
- Randomized port selection for bricks, improves startup time [#786](https://github.com/gluster/glusterfs/issues/786)
- Performance improvement with use of readdir instead of readdirp in fix-layout [#2241](https://github.com/gluster/glusterfs/issues/2241)
- Heal time improvement with bigger window size [#2067](https://github.com/gluster/glusterfs/issues/2067)
@@ -37,168 +38,168 @@ A full list of bugs that have been addressed is included further below.
Bugs addressed since release-10 are listed below.
- [#504](https://github.com/gluster/glusterfs/issues/504) AFR: remove memcpy() + ntoh32() pattern
- [#705](https://github.com/gluster/glusterfs/issues/705) gf_backtrace_save inefficiencies
- [#782](https://github.com/gluster/glusterfs/issues/782) Do not explicitly call strerror(errnum) when logging
- [#786](https://github.com/gluster/glusterfs/issues/786) glusterd-pmap binds to 10K ports on startup (using IPv4)
- [#904](https://github.com/gluster/glusterfs/issues/904) [bug:1649037] Translators allocate too much memory in their xlator_
- [#1000](https://github.com/gluster/glusterfs/issues/1000) [bug:1193929] GlusterFS can be improved
- [#1002](https://github.com/gluster/glusterfs/issues/1002) [bug:1679998] GlusterFS can be improved
- [#1052](https://github.com/gluster/glusterfs/issues/1052) [bug:1693692] Increase code coverage from regression tests
- [#1060](https://github.com/gluster/glusterfs/issues/1060) [bug:789278] Issues reported by Coverity static analysis tool
- [#1096](https://github.com/gluster/glusterfs/issues/1096) [bug:1622665] clang-scan report: glusterfs issues
- [#1101](https://github.com/gluster/glusterfs/issues/1101) [bug:1813029] volume brick fails to come online because other proce
- [#1251](https://github.com/gluster/glusterfs/issues/1251) performance: improve __afr_fd_ctx_get() function
- [#1339](https://github.com/gluster/glusterfs/issues/1339) Rebalance status is not shown correctly after node reboot
- [#1358](https://github.com/gluster/glusterfs/issues/1358) features/shard: wrong "inode->ref" leading to ASSERT in inode_unref
- [#1359](https://github.com/gluster/glusterfs/issues/1359) Cleanup --disable-mempool
- [#1380](https://github.com/gluster/glusterfs/issues/1380) fd_unref() optimization - do an atomic decrement outside the lock a
- [#1384](https://github.com/gluster/glusterfs/issues/1384) mount glusterfs volume, files larger than 64Mb only show 64Mb
- [#1406](https://github.com/gluster/glusterfs/issues/1406) shared storage volume fails to mount in ipv6 environment
- [#1415](https://github.com/gluster/glusterfs/issues/1415) Removing problematic language in geo-replication
- [#1423](https://github.com/gluster/glusterfs/issues/1423) shard_make_block_abspath() should be called with a string of of the
- [#1536](https://github.com/gluster/glusterfs/issues/1536) Improve dict_reset() efficiency
- [#1545](https://github.com/gluster/glusterfs/issues/1545) fuse_invalidate_entry() - too many repetitive calls to uuid_utoa()
- [#1583](https://github.com/gluster/glusterfs/issues/1583) Rework stats structure (xl->stats.total.metrics[fop_idx] and friend
- [#1584](https://github.com/gluster/glusterfs/issues/1584) MAINTAINERS file needs to be revisited and updated
- [#1596](https://github.com/gluster/glusterfs/issues/1596) 'this' NULL check relies on 'THIS' not being NULL
- [#1600](https://github.com/gluster/glusterfs/issues/1600) Save and re-use MYUUID
- [#1678](https://github.com/gluster/glusterfs/issues/1678) Improve gf_error_to_errno() and gf_errno_to_error() positive flow
- [#1695](https://github.com/gluster/glusterfs/issues/1695) Rebalance has a redundant lookup operation
- [#1702](https://github.com/gluster/glusterfs/issues/1702) Move GF_CLIENT_PID_GSYNCD check to start of the function.
- [#1703](https://github.com/gluster/glusterfs/issues/1703) Remove trivial check for GF_XATTR_SHARD_FILE_SIZE before calling sh
- [#1707](https://github.com/gluster/glusterfs/issues/1707) PL_LOCAL_GET_REQUESTS access the dictionary twice for the same info
- [#1717](https://github.com/gluster/glusterfs/issues/1717) glusterd: sequence of rebalance and replace/reset-brick presents re
- [#1723](https://github.com/gluster/glusterfs/issues/1723) DHT: further investigation for treating an ongoing mknod's linkto file
- [#1749](https://github.com/gluster/glusterfs/issues/1749) brick-process: call 'notify()' and 'fini()' of brick xlators in a p
- [#1755](https://github.com/gluster/glusterfs/issues/1755) Reduce calls to 'THIS' in fd_destroy() and others, where 'THIS' is
- [#1761](https://github.com/gluster/glusterfs/issues/1761) CONTRIBUTING.md regression can only be run by maintainers
- [#1764](https://github.com/gluster/glusterfs/issues/1764) Slow write on ZFS bricks after healing millions of files due to add
- [#1772](https://github.com/gluster/glusterfs/issues/1772) build: add LTO as a configure option
- [#1773](https://github.com/gluster/glusterfs/issues/1773) DHT/Rebalance - Remove unused variable dht_migrate_file
- [#1779](https://github.com/gluster/glusterfs/issues/1779) Add-brick command should check hostnames with bricks present in vol
- [#1825](https://github.com/gluster/glusterfs/issues/1825) Latency in io-stats should be in nanoseconds resolution, not micros
- [#1872](https://github.com/gluster/glusterfs/issues/1872) Question: How to check heal info without glusterd management layer
- [#1885](https://github.com/gluster/glusterfs/issues/1885) __posix_writev() - reduce memory copies and unneeded zeroing
- [#1888](https://github.com/gluster/glusterfs/issues/1888) GD_OP_VERSION needs to be updated for release-10
- [#1898](https://github.com/gluster/glusterfs/issues/1898) schedule_georep.py resulting in failure when used with python3
- [#1909](https://github.com/gluster/glusterfs/issues/1909) core: Avoid several dict OR key is NULL message in brick logs
- [#1925](https://github.com/gluster/glusterfs/issues/1925) dht_pt_getxattr does not seem to handle virtual xattrs.
- [#1935](https://github.com/gluster/glusterfs/issues/1935) logging to syslog instead of any glusterfs logs
- [#1943](https://github.com/gluster/glusterfs/issues/1943) glusterd-volgen: Add functionality to accept any custom xlator
- [#1952](https://github.com/gluster/glusterfs/issues/1952) posix-aio: implement GF_FOP_FSYNC
- [#1959](https://github.com/gluster/glusterfs/issues/1959) Broken links in the 2 replicas split-brain-issue - [Bug][Enhancemen
- [#1960](https://github.com/gluster/glusterfs/issues/1960) Add missing LOCK_DESTROY() calls
- [#1966](https://github.com/gluster/glusterfs/issues/1966) Can't print trace details due to memory allocation issues
- [#1977](https://github.com/gluster/glusterfs/issues/1977) Inconsistent locking in presence of disconnects
- [#1978](https://github.com/gluster/glusterfs/issues/1978) test case ./tests/bugs/core/bug-1432542-mpx-restart-crash.t is gett
- [#1981](https://github.com/gluster/glusterfs/issues/1981) Reduce posix_fdstat() calls in IO paths
- [#1991](https://github.com/gluster/glusterfs/issues/1991) mdcache: bug causes getxattr() to report ENODATA when fetching samb
- [#1992](https://github.com/gluster/glusterfs/issues/1992) dht: var decommission_subvols_cnt becomes invalid when config is up
- [#1996](https://github.com/gluster/glusterfs/issues/1996) Analyze if spinlocks have any benefit and remove them if not
- [#2001](https://github.com/gluster/glusterfs/issues/2001) Error handling in /usr/sbin/gluster-eventsapi produces AttributeErr
- [#2005](https://github.com/gluster/glusterfs/issues/2005) ./tests/bugs/replicate/bug-921231.t is continuously failing
- [#2013](https://github.com/gluster/glusterfs/issues/2013) dict_t hash-calculation can be removed when hash_size=1
- [#2024](https://github.com/gluster/glusterfs/issues/2024) Remove gfs_id variable or at least set to appropriate value
- [#2025](https://github.com/gluster/glusterfs/issues/2025) list_del() should not set prev and next
- [#2033](https://github.com/gluster/glusterfs/issues/2033) tests/bugs/nfs/bug-1053579.t fails on CentOS 8
- [#2038](https://github.com/gluster/glusterfs/issues/2038) shard_unlink() fails due to no space to create marker file
- [#2039](https://github.com/gluster/glusterfs/issues/2039) Do not allow POSIX IO backend switch when the volume is running
- [#2042](https://github.com/gluster/glusterfs/issues/2042) mount ipv6 gluster volume with serveral backup-volfile-servers,use
- [#2052](https://github.com/gluster/glusterfs/issues/2052) Revert the commit 50e953e2450b5183988c12e87bdfbc997e0ad8a8
- [#2054](https://github.com/gluster/glusterfs/issues/2054) cleanup call_stub_t from unused variables
- [#2063](https://github.com/gluster/glusterfs/issues/2063) Provide autoconf option to enable/disable storage.linux-io_uring du
- [#2067](https://github.com/gluster/glusterfs/issues/2067) Change self-heal-window-size to 1MB by default
- [#2075](https://github.com/gluster/glusterfs/issues/2075) Annotate synctasks with valgrind API if --enable-valgrind[=memcheck
- [#2080](https://github.com/gluster/glusterfs/issues/2080) Glustereventsd default port
- [#2083](https://github.com/gluster/glusterfs/issues/2083) GD_MSG_DICT_GET_FAILED should not include 'errno' but 'ret'
- [#2086](https://github.com/gluster/glusterfs/issues/2086) Move tests/00-geo-rep/00-georep-verify-non-root-setup.t to tests/00
- [#2096](https://github.com/gluster/glusterfs/issues/2096) iobuf_arena structure doesn't need passive and active iobufs, but l
- [#2099](https://github.com/gluster/glusterfs/issues/2099) 'force' option does not work in the replicated volume snapshot crea
- [#2101](https://github.com/gluster/glusterfs/issues/2101) Move 00-georep-verify-non-root-setup.t back to tests/00-geo-rep/
- [#2107](https://github.com/gluster/glusterfs/issues/2107) mount crashes when setfattr -n distribute.fix.layout -v "yes" is ex
- [#2116](https://github.com/gluster/glusterfs/issues/2116) enable quota for multiple volumes take more time
- [#2117](https://github.com/gluster/glusterfs/issues/2117) Concurrent quota enable causes glusterd deadlock
- [#2123](https://github.com/gluster/glusterfs/issues/2123) Implement an I/O framework
- [#2129](https://github.com/gluster/glusterfs/issues/2129) CID 1445996 Null pointer dereferences (FORWARD_NULL) /xlators/mgmt/
- [#2130](https://github.com/gluster/glusterfs/issues/2130) stack.h/c: remove unused variable and reorder struct
- [#2133](https://github.com/gluster/glusterfs/issues/2133) Changelog History Crawl failed after resuming stopped geo-replicati
- [#2134](https://github.com/gluster/glusterfs/issues/2134) Fix spurious failures caused by change in profile info duration to
- [#2138](https://github.com/gluster/glusterfs/issues/2138) glfs_write() dumps a core file file when buffer size is 1GB
- [#2154](https://github.com/gluster/glusterfs/issues/2154) "Operation not supported" doing a chmod on a symlink
- [#2159](https://github.com/gluster/glusterfs/issues/2159) Remove unused component tests
- [#2161](https://github.com/gluster/glusterfs/issues/2161) Crash caused by memory corruption
- [#2169](https://github.com/gluster/glusterfs/issues/2169) Stack overflow when parallel-readdir is enabled
- [#2180](https://github.com/gluster/glusterfs/issues/2180) CID 1446716: Memory - illegal accesses (USE_AFTER_FREE) /xlators/mg
- [#2187](https://github.com/gluster/glusterfs/issues/2187) [Input/output error] IO failure while performing shrink operation w
- [#504](https://github.com/gluster/glusterfs/issues/504) AFR: remove memcpy() + ntoh32() pattern
- [#705](https://github.com/gluster/glusterfs/issues/705) gf_backtrace_save inefficiencies
- [#782](https://github.com/gluster/glusterfs/issues/782) Do not explicitly call strerror(errnum) when logging
- [#786](https://github.com/gluster/glusterfs/issues/786) glusterd-pmap binds to 10K ports on startup (using IPv4)
- [#904](https://github.com/gluster/glusterfs/issues/904) [bug:1649037] Translators allocate too much memory in their xlator\_
- [#1000](https://github.com/gluster/glusterfs/issues/1000) [bug:1193929] GlusterFS can be improved
- [#1002](https://github.com/gluster/glusterfs/issues/1002) [bug:1679998] GlusterFS can be improved
- [#1052](https://github.com/gluster/glusterfs/issues/1052) [bug:1693692] Increase code coverage from regression tests
- [#1060](https://github.com/gluster/glusterfs/issues/1060) [bug:789278] Issues reported by Coverity static analysis tool
- [#1096](https://github.com/gluster/glusterfs/issues/1096) [bug:1622665] clang-scan report: glusterfs issues
- [#1101](https://github.com/gluster/glusterfs/issues/1101) [bug:1813029] volume brick fails to come online because other proce
- [#1251](https://github.com/gluster/glusterfs/issues/1251) performance: improve \_\_afr_fd_ctx_get() function
- [#1339](https://github.com/gluster/glusterfs/issues/1339) Rebalance status is not shown correctly after node reboot
- [#1358](https://github.com/gluster/glusterfs/issues/1358) features/shard: wrong "inode->ref" leading to ASSERT in inode_unref
- [#1359](https://github.com/gluster/glusterfs/issues/1359) Cleanup --disable-mempool
- [#1380](https://github.com/gluster/glusterfs/issues/1380) fd_unref() optimization - do an atomic decrement outside the lock a
- [#1384](https://github.com/gluster/glusterfs/issues/1384) mount glusterfs volume, files larger than 64Mb only show 64Mb
- [#1406](https://github.com/gluster/glusterfs/issues/1406) shared storage volume fails to mount in ipv6 environment
- [#1415](https://github.com/gluster/glusterfs/issues/1415) Removing problematic language in geo-replication
- [#1423](https://github.com/gluster/glusterfs/issues/1423) shard_make_block_abspath() should be called with a string of the
- [#1536](https://github.com/gluster/glusterfs/issues/1536) Improve dict_reset() efficiency
- [#1545](https://github.com/gluster/glusterfs/issues/1545) fuse_invalidate_entry() - too many repetitive calls to uuid_utoa()
- [#1583](https://github.com/gluster/glusterfs/issues/1583) Rework stats structure (xl->stats.total.metrics[fop_idx] and friend
- [#1584](https://github.com/gluster/glusterfs/issues/1584) MAINTAINERS file needs to be revisited and updated
- [#1596](https://github.com/gluster/glusterfs/issues/1596) 'this' NULL check relies on 'THIS' not being NULL
- [#1600](https://github.com/gluster/glusterfs/issues/1600) Save and re-use MYUUID
- [#1678](https://github.com/gluster/glusterfs/issues/1678) Improve gf_error_to_errno() and gf_errno_to_error() positive flow
- [#1695](https://github.com/gluster/glusterfs/issues/1695) Rebalance has a redundant lookup operation
- [#1702](https://github.com/gluster/glusterfs/issues/1702) Move GF_CLIENT_PID_GSYNCD check to start of the function.
- [#1703](https://github.com/gluster/glusterfs/issues/1703) Remove trivial check for GF_XATTR_SHARD_FILE_SIZE before calling sh
- [#1707](https://github.com/gluster/glusterfs/issues/1707) PL_LOCAL_GET_REQUESTS access the dictionary twice for the same info
- [#1717](https://github.com/gluster/glusterfs/issues/1717) glusterd: sequence of rebalance and replace/reset-brick presents re
- [#1723](https://github.com/gluster/glusterfs/issues/1723) DHT: further investigation for treating an ongoing mknod's linkto file
- [#1749](https://github.com/gluster/glusterfs/issues/1749) brick-process: call 'notify()' and 'fini()' of brick xlators in a p
- [#1755](https://github.com/gluster/glusterfs/issues/1755) Reduce calls to 'THIS' in fd_destroy() and others, where 'THIS' is
- [#1761](https://github.com/gluster/glusterfs/issues/1761) CONTRIBUTING.md regression can only be run by maintainers
- [#1764](https://github.com/gluster/glusterfs/issues/1764) Slow write on ZFS bricks after healing millions of files due to add
- [#1772](https://github.com/gluster/glusterfs/issues/1772) build: add LTO as a configure option
- [#1773](https://github.com/gluster/glusterfs/issues/1773) DHT/Rebalance - Remove unused variable dht_migrate_file
- [#1779](https://github.com/gluster/glusterfs/issues/1779) Add-brick command should check hostnames with bricks present in vol
- [#1825](https://github.com/gluster/glusterfs/issues/1825) Latency in io-stats should be in nanoseconds resolution, not micros
- [#1872](https://github.com/gluster/glusterfs/issues/1872) Question: How to check heal info without glusterd management layer
- [#1885](https://github.com/gluster/glusterfs/issues/1885) \_\_posix_writev() - reduce memory copies and unneeded zeroing
- [#1888](https://github.com/gluster/glusterfs/issues/1888) GD_OP_VERSION needs to be updated for release-10
- [#1898](https://github.com/gluster/glusterfs/issues/1898) schedule_georep.py resulting in failure when used with python3
- [#1909](https://github.com/gluster/glusterfs/issues/1909) core: Avoid several dict OR key is NULL message in brick logs
- [#1925](https://github.com/gluster/glusterfs/issues/1925) dht_pt_getxattr does not seem to handle virtual xattrs.
- [#1935](https://github.com/gluster/glusterfs/issues/1935) logging to syslog instead of any glusterfs logs
- [#1943](https://github.com/gluster/glusterfs/issues/1943) glusterd-volgen: Add functionality to accept any custom xlator
- [#1952](https://github.com/gluster/glusterfs/issues/1952) posix-aio: implement GF_FOP_FSYNC
- [#1959](https://github.com/gluster/glusterfs/issues/1959) Broken links in the 2 replicas split-brain-issue - [Bug]Enhancemen
- [#1960](https://github.com/gluster/glusterfs/issues/1960) Add missing LOCK_DESTROY() calls
- [#1966](https://github.com/gluster/glusterfs/issues/1966) Can't print trace details due to memory allocation issues
- [#1977](https://github.com/gluster/glusterfs/issues/1977) Inconsistent locking in presence of disconnects
- [#1978](https://github.com/gluster/glusterfs/issues/1978) test case ./tests/bugs/core/bug-1432542-mpx-restart-crash.t is gett
- [#1981](https://github.com/gluster/glusterfs/issues/1981) Reduce posix_fdstat() calls in IO paths
- [#1991](https://github.com/gluster/glusterfs/issues/1991) mdcache: bug causes getxattr() to report ENODATA when fetching samb
- [#1992](https://github.com/gluster/glusterfs/issues/1992) dht: var decommission_subvols_cnt becomes invalid when config is up
- [#1996](https://github.com/gluster/glusterfs/issues/1996) Analyze if spinlocks have any benefit and remove them if not
- [#2001](https://github.com/gluster/glusterfs/issues/2001) Error handling in /usr/sbin/gluster-eventsapi produces AttributeErr
- [#2005](https://github.com/gluster/glusterfs/issues/2005) ./tests/bugs/replicate/bug-921231.t is continuously failing
- [#2013](https://github.com/gluster/glusterfs/issues/2013) dict_t hash-calculation can be removed when hash_size=1
- [#2024](https://github.com/gluster/glusterfs/issues/2024) Remove gfs_id variable or at least set to appropriate value
- [#2025](https://github.com/gluster/glusterfs/issues/2025) list_del() should not set prev and next
- [#2033](https://github.com/gluster/glusterfs/issues/2033) tests/bugs/nfs/bug-1053579.t fails on CentOS 8
- [#2038](https://github.com/gluster/glusterfs/issues/2038) shard_unlink() fails due to no space to create marker file
- [#2039](https://github.com/gluster/glusterfs/issues/2039) Do not allow POSIX IO backend switch when the volume is running
- [#2042](https://github.com/gluster/glusterfs/issues/2042) mount ipv6 gluster volume with several backup-volfile-servers, use
- [#2052](https://github.com/gluster/glusterfs/issues/2052) Revert the commit 50e953e2450b5183988c12e87bdfbc997e0ad8a8
- [#2054](https://github.com/gluster/glusterfs/issues/2054) cleanup call_stub_t from unused variables
- [#2063](https://github.com/gluster/glusterfs/issues/2063) Provide autoconf option to enable/disable storage.linux-io_uring du
- [#2067](https://github.com/gluster/glusterfs/issues/2067) Change self-heal-window-size to 1MB by default
- [#2075](https://github.com/gluster/glusterfs/issues/2075) Annotate synctasks with valgrind API if --enable-valgrind[=memcheck
- [#2080](https://github.com/gluster/glusterfs/issues/2080) Glustereventsd default port
- [#2083](https://github.com/gluster/glusterfs/issues/2083) GD_MSG_DICT_GET_FAILED should not include 'errno' but 'ret'
- [#2086](https://github.com/gluster/glusterfs/issues/2086) Move tests/00-geo-rep/00-georep-verify-non-root-setup.t to tests/00
- [#2096](https://github.com/gluster/glusterfs/issues/2096) iobuf_arena structure doesn't need passive and active iobufs, but l
- [#2099](https://github.com/gluster/glusterfs/issues/2099) 'force' option does not work in the replicated volume snapshot crea
- [#2101](https://github.com/gluster/glusterfs/issues/2101) Move 00-georep-verify-non-root-setup.t back to tests/00-geo-rep/
- [#2107](https://github.com/gluster/glusterfs/issues/2107) mount crashes when setfattr -n distribute.fix.layout -v "yes" is ex
- [#2116](https://github.com/gluster/glusterfs/issues/2116) enable quota for multiple volumes take more time
- [#2117](https://github.com/gluster/glusterfs/issues/2117) Concurrent quota enable causes glusterd deadlock
- [#2123](https://github.com/gluster/glusterfs/issues/2123) Implement an I/O framework
- [#2129](https://github.com/gluster/glusterfs/issues/2129) CID 1445996 Null pointer dereferences (FORWARD_NULL) /xlators/mgmt/
- [#2130](https://github.com/gluster/glusterfs/issues/2130) stack.h/c: remove unused variable and reorder struct
- [#2133](https://github.com/gluster/glusterfs/issues/2133) Changelog History Crawl failed after resuming stopped geo-replicati
- [#2134](https://github.com/gluster/glusterfs/issues/2134) Fix spurious failures caused by change in profile info duration to
- [#2138](https://github.com/gluster/glusterfs/issues/2138) glfs_write() dumps a core file file when buffer size is 1GB
- [#2154](https://github.com/gluster/glusterfs/issues/2154) "Operation not supported" doing a chmod on a symlink
- [#2159](https://github.com/gluster/glusterfs/issues/2159) Remove unused component tests
- [#2161](https://github.com/gluster/glusterfs/issues/2161) Crash caused by memory corruption
- [#2169](https://github.com/gluster/glusterfs/issues/2169) Stack overflow when parallel-readdir is enabled
- [#2180](https://github.com/gluster/glusterfs/issues/2180) CID 1446716: Memory - illegal accesses (USE_AFTER_FREE) /xlators/mg
- [#2187](https://github.com/gluster/glusterfs/issues/2187) [Input/output error] IO failure while performing shrink operation w
- [#2190](https://github.com/gluster/glusterfs/issues/2190) Move a test case tests/basic/glusterd-restart-shd-mux.t to flaky
- [#2192](https://github.com/gluster/glusterfs/issues/2192) 4+1 arbiter setup is broken
- [#2198](https://github.com/gluster/glusterfs/issues/2198) There are blocked inodelks for a long time
- [#2216](https://github.com/gluster/glusterfs/issues/2216) Fix coverity issues
- [#2232](https://github.com/gluster/glusterfs/issues/2232) "Invalid argument" when reading a directory with gfapi
- [#2234](https://github.com/gluster/glusterfs/issues/2234) Segmentation fault in directory quota daemon for replicated volume
- [#2239](https://github.com/gluster/glusterfs/issues/2239) rebalance crashes in dht on master
- [#2241](https://github.com/gluster/glusterfs/issues/2241) Using readdir instead of readdirp for fix-layout increases performa
- [#2253](https://github.com/gluster/glusterfs/issues/2253) Disable lookup-optimize by default in the virt group
- [#2258](https://github.com/gluster/glusterfs/issues/2258) Provide option to disable fsync in data migration
- [#2260](https://github.com/gluster/glusterfs/issues/2260) failed to list quota info after setting limit-usage
- [#2268](https://github.com/gluster/glusterfs/issues/2268) dht_layout_unref() only uses 'this' to check that 'this->private' i
- [#2278](https://github.com/gluster/glusterfs/issues/2278) nfs-ganesha does not start due to shared storage not ready, but ret
- [#2287](https://github.com/gluster/glusterfs/issues/2287) runner infrastructure fails to provide platform independent error c
- [#2294](https://github.com/gluster/glusterfs/issues/2294) dict.c: remove some strlen() calls if using DICT_LIST_IMP
- [#2308](https://github.com/gluster/glusterfs/issues/2308) Developer sessions for glusterfs
- [#2313](https://github.com/gluster/glusterfs/issues/2313) Long setting names mess up the columns and break parsing
- [#2317](https://github.com/gluster/glusterfs/issues/2317) Rebalance doesn't migrate some sparse files
- [#2328](https://github.com/gluster/glusterfs/issues/2328) "gluster volume set <volname> group samba" needs to include write-b
- [#2330](https://github.com/gluster/glusterfs/issues/2330) gf_msg can cause relock deadlock
- [#2334](https://github.com/gluster/glusterfs/issues/2334) posix_handle_soft() is doing an unnecessary stat
- [#2337](https://github.com/gluster/glusterfs/issues/2337) memory leak observed in lock fop
- [#2348](https://github.com/gluster/glusterfs/issues/2348) Gluster's test suite on RHEL 8 runs slower than on RHEL 7
- [#2351](https://github.com/gluster/glusterfs/issues/2351) glusterd: After upgrade on release 9.1 glusterd protocol is broken
- [#2353](https://github.com/gluster/glusterfs/issues/2353) Permission issue after upgrading to Gluster v9.1
- [#2360](https://github.com/gluster/glusterfs/issues/2360) extras: postscript fails on logrotation of snapd logs
- [#2364](https://github.com/gluster/glusterfs/issues/2364) After the service is restarted, a large number of handles are not r
- [#2370](https://github.com/gluster/glusterfs/issues/2370) glusterd: Issues with custom xlator changes
- [#2378](https://github.com/gluster/glusterfs/issues/2378) Remove sys_fstatat() from posix_handle_unset_gfid() function - not
- [#2380](https://github.com/gluster/glusterfs/issues/2380) Remove sys_lstat() from posix_acl_xattr_set() - not needed
- [#2388](https://github.com/gluster/glusterfs/issues/2388) Geo-replication gets delayed when there are many renames on primary
- [#2394](https://github.com/gluster/glusterfs/issues/2394) Spurious failure in tests/basic/fencing/afr-lock-heal-basic.t
- [#2398](https://github.com/gluster/glusterfs/issues/2398) Bitrot and scrub process showed like unknown in the gluster volume
- [#2404](https://github.com/gluster/glusterfs/issues/2404) Spurious failure of tests/bugs/ec/bug-1236065.t
- [#2407](https://github.com/gluster/glusterfs/issues/2407) configure glitch with CC=clang
- [#2410](https://github.com/gluster/glusterfs/issues/2410) dict_xxx_sizen variant compilation should fail on passing a variabl
- [#2414](https://github.com/gluster/glusterfs/issues/2414) Prefer mallinfo2() to mallinfo() if available
- [#2421](https://github.com/gluster/glusterfs/issues/2421) rsync should not try to sync internal xattrs.
- [#2429](https://github.com/gluster/glusterfs/issues/2429) Use file timestamps with nanosecond precision
- [#2431](https://github.com/gluster/glusterfs/issues/2431) Drop --disable-syslog configuration option
- [#2440](https://github.com/gluster/glusterfs/issues/2440) Geo-replication not working on Ubuntu 21.04
- [#2443](https://github.com/gluster/glusterfs/issues/2443) Core dumps on Gluster 9 - 3 replicas
- [#2446](https://github.com/gluster/glusterfs/issues/2446) client_add_lock_for_recovery() - new_client_lock() should be called
- [#2467](https://github.com/gluster/glusterfs/issues/2467) failed to open /proc/0/status: No such file or directory
- [#2470](https://github.com/gluster/glusterfs/issues/2470) sharding: [inode.c:1255:__inode_unlink] 0-inode: dentry not found
- [#2480](https://github.com/gluster/glusterfs/issues/2480) Brick going offline on another host as well as the host which reboo
- [#2502](https://github.com/gluster/glusterfs/issues/2502) xlator/features/locks/src/common.c has code duplication
- [#2507](https://github.com/gluster/glusterfs/issues/2507) Use appropriate msgid in gf_msg()
- [#2515](https://github.com/gluster/glusterfs/issues/2515) Unable to mount the gluster volume using fuse unless iptables is fl
- [#2522](https://github.com/gluster/glusterfs/issues/2522) ganesha_ha (extras/ganesha/ocf): ganesha_grace RA fails in start()
- [#2540](https://github.com/gluster/glusterfs/issues/2540) delay-gen doesn't work correctly for delays longer than 2 seconds
- [#2551](https://github.com/gluster/glusterfs/issues/2551) Sometimes the lock notification feature doesn't work
- [#2581](https://github.com/gluster/glusterfs/issues/2581) With strict-locks enabled clients which are holding posix locks sti
- [#2590](https://github.com/gluster/glusterfs/issues/2590) trusted.io-stats-dump extended attribute usage description error
- [#2611](https://github.com/gluster/glusterfs/issues/2611) Granular entry self-heal is taking more time than full entry self h
- [#2617](https://github.com/gluster/glusterfs/issues/2617) High CPU utilization of thread glfs_fusenoti and huge delays in som
- [#2620](https://github.com/gluster/glusterfs/issues/2620) Granular entry heal purging of index name trigger two lookups in th
- [#2625](https://github.com/gluster/glusterfs/issues/2625) auth.allow value is corrupted after add-brick operation
- [#2626](https://github.com/gluster/glusterfs/issues/2626) entry self-heal does xattrops unnecessarily in many cases
- [#2649](https://github.com/gluster/glusterfs/issues/2649) glustershd failed in bind with error "Address already in use"
- [#2652](https://github.com/gluster/glusterfs/issues/2652) Removal of deadcode: Pump
- [#2659](https://github.com/gluster/glusterfs/issues/2659) tests/basic/afr/afr-anon-inode.t crashed
- [#2664](https://github.com/gluster/glusterfs/issues/2664) Test suite produce uncompressed logs
- [#2693](https://github.com/gluster/glusterfs/issues/2693) dht: dht_local_wipe is crashed while running rename operation
- [#2771](https://github.com/gluster/glusterfs/issues/2771) Smallfile improvement in glusterfs
- [#2782](https://github.com/gluster/glusterfs/issues/2782) Glustereventsd does not listen on IPv4 when IPv6 is not available
- [#2789](https://github.com/gluster/glusterfs/issues/2789) An improper locking bug(e.g., deadlock) on the lock up_inode_ctx->c
- [#2798](https://github.com/gluster/glusterfs/issues/2798) FUSE mount option for localtime-logging is not exposed
- [#2816](https://github.com/gluster/glusterfs/issues/2816) Glusterfsd memory leak when subdir_mounting a volume
- [#2835](https://github.com/gluster/glusterfs/issues/2835) dht: found anomalies in dht_layout after commit c4cbdbcb3d02fb56a62
- [#2857](https://github.com/gluster/glusterfs/issues/2857) variable twice initialization.

View File

@@ -12,15 +12,18 @@ This is a bugfix and improvement release. The release notes for [10.0](10.0.md)
- Users are highly encouraged to upgrade to newer releases of GlusterFS.
## Important fixes in this release
- Fix missing stripe count issue with upgrade from 9.x to 10.x
- Fix IO failure when shrinking distributed dispersed volume with ongoing IO
- Fix log spam introduced with glusterfs 10.0
- Enable ltcmalloc_minimal instead of ltcmalloc
## Builds are available at -
[https://download.gluster.org/pub/gluster/glusterfs/10/10.1/](https://download.gluster.org/pub/gluster/glusterfs/10/10.1/)
## Bugs addressed
- [#2846](https://github.com/gluster/glusterfs/issues/2846) Avoid redundant logs in gluster
- [#2903](https://github.com/gluster/glusterfs/issues/2903) Fix worker disconnect due to AttributeError in geo-replication
- [#2910](https://github.com/gluster/glusterfs/issues/2910) Check for available ports in port_range in glusterd

View File

@@ -3,22 +3,26 @@
This is a bugfix and improvement release. The release notes for [10.0](10.0.md) and [10.1](10.1.md) contain a listing of all the new features that were added and bugs fixed in the GlusterFS 10 stable release.
**NOTE:**
- Next minor release tentative date: Week of 15th Aug, 2022
- Users are highly encouraged to upgrade to newer releases of GlusterFS.
## Important fixes in this release
- Optimize server functionality by enhancing server_process_event_upcall code path during the handling of upcall event
- Fix all bricks not starting issue on node reboot when brick count is high (>750)
- Fix stale posix locks that appear after client disconnection
## Builds are available at
[https://download.gluster.org/pub/gluster/glusterfs/10/10.2/](https://download.gluster.org/pub/gluster/glusterfs/10/10.2/)
## Bugs addressed
- [#3182](https://github.com/gluster/glusterfs/issues/3182) Fix stale posix locks that appear after client disconnection
- [#3187](https://github.com/gluster/glusterfs/issues/3187) Fix Locks xlator fd leaks
- [#3234](https://github.com/gluster/glusterfs/issues/3234) Fix incorrect directory check in order to successfully locate the SSL certificate
- [#3262](https://github.com/gluster/glusterfs/issues/3262) Synchronize layout_(ref|unref) during layout_(get|set) in dht
- [#3262](https://github.com/gluster/glusterfs/issues/3262) Synchronize layout*(ref|unref) during layout*(get|set) in dht
- [#3321](https://github.com/gluster/glusterfs/issues/3321) Optimize server functionality by enhancing server_process_event_upcall code path during the handling of upcall event
- [#3334](https://github.com/gluster/glusterfs/issues/3334) Fix errors and timeouts when creating qcow2 file via libgfapi
- [#3375](https://github.com/gluster/glusterfs/issues/3375) Fix all bricks not starting issue on node reboot when brick count is high (>750)

View File

@@ -11,28 +11,30 @@ of bugs that has been addressed is included further below.
## Major changes and features
### Brick multiplexing
*Notes for users:*
Multiplexing reduces both port and memory usage. It does *not* improve
_Notes for users:_
Multiplexing reduces both port and memory usage. It does _not_ improve
performance vs. non-multiplexing except when memory is the limiting factor,
though there are other related changes that improve performance overall (e.g.
compared to 3.9).
Multiplexing is off by default. It can be enabled with
Multiplexing is off by default. It can be enabled with
```bash
# gluster volume set all cluster.brick-multiplex on
```
*Limitations:*
_Limitations:_
There are currently no tuning options for multiplexing - it's all or nothing.
This will change in the near future.
*Known Issues:*
_Known Issues:_
The only feature or combination of features known not to work with multiplexing
is USS and SSL. Anyone using that combination should leave multiplexing off.
is USS and SSL. Anyone using that combination should leave multiplexing off.
### Support to display op-version information from clients
*Notes for users:*
_Notes for users:_
To get information on what op-versions are supported by the clients, users can
invoke the `gluster volume status` command for clients. Along with information
on hostname, port, bytes read, bytes written and number of clients connected
@@ -43,12 +45,13 @@ operate. Following is the example usage:
# gluster volume status <VOLNAME|all> clients
```
*Limitations:*
_Limitations:_
*Known Issues:*
_Known Issues:_
### Support to get maximum op-version in a heterogeneous cluster
*Notes for users:*
_Notes for users:_
A heterogeneous cluster operates on a common op-version that can be supported
across all the nodes in the trusted storage pool. Upon upgrade of the nodes in
the cluster, the cluster might support a higher op-version. Users can retrieve
@@ -60,12 +63,13 @@ the `gluster volume get` command on the newly introduced global option,
# gluster volume get all cluster.max-op-version
```
*Limitations:*
_Limitations:_
*Known Issues:*
_Known Issues:_
### Support for rebalance time to completion estimation
*Notes for users:*
_Notes for users:_
Users can now see approximately how much time the rebalance
operation will take to complete across all nodes.
@@ -76,27 +80,27 @@ as part of the rebalance status. Use the command:
# gluster volume rebalance <VOLNAME> status
```
*Limitations:*
_Limitations:_
The rebalance process calculates the time left based on the rate
at which files are processed on the node and the total number of files
on the brick which is determined using statfs. The limitations of this
are:
* A single fs partition must host only one brick. Multiple bricks on
the same fs partition will cause the statfs results to be invalid.
- A single fs partition must host only one brick. Multiple bricks on
the same fs partition will cause the statfs results to be invalid.
* The estimates are dynamic and are recalculated every time the rebalance status
command is invoked. The estimates become more accurate over time, so short-running
rebalance operations may not benefit.
- The estimates are dynamic and are recalculated every time the rebalance status
command is invoked. The estimates become more accurate over time, so short-running
rebalance operations may not benefit.
*Known Issues:*
_Known Issues:_
As glusterfs does not store the number of files on the brick, we use statfs to
guess the number. The .glusterfs directory contents can significantly skew this
number and affect the calculated estimates.
### Separation of tier as its own service
*Notes for users:*
_Notes for users:_
This change is to move the management of the tier daemon into the gluster
service framework, thereby improving its stability and manageability by the
service framework.
@@ -104,24 +108,26 @@ service framework.
This has no change to any of the tier commands or user facing interfaces and
operations.
*Limitations:*
_Limitations:_
*Known Issues:*
_Known Issues:_
### Statedump support for gfapi based applications
*Notes for users:*
_Notes for users:_
gfapi based applications can now dump state information for better
troubleshooting of issues. A statedump can be triggered in two ways:
1. by executing the following on one of the Gluster servers,
```bash
# gluster volume statedump <VOLNAME> client <HOST>:<PID>
```
- `<VOLNAME>` should be replaced by the name of the volume
- `<HOST>` should be replaced by the hostname of the system running the
gfapi application
- `<PID>` should be replaced by the PID of the gfapi application
- `<VOLNAME>` should be replaced by the name of the volume
- `<HOST>` should be replaced by the hostname of the system running the
gfapi application
- `<PID>` should be replaced by the PID of the gfapi application
2. through calling `glfs_sysrq(<FS>, GLFS_SYSRQ_STATEDUMP)` within the
application
@@ -131,7 +137,7 @@ shooting of issues. A statedump can be triggered in two ways:
All statedumps (`*.dump.*` files) will be located at the usual location,
on most distributions this would be `/var/run/gluster/`.
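For illustration, the two pieces above can be put together as follows; the volume name, hostname and PID in the first command are hypothetical placeholders, not values taken from this release note, and the second command simply lists the dump files in the location mentioned above:
```bash
# gluster volume statedump myvol client app01.example.com:4312   # placeholder volume/host/PID
# ls -l /var/run/gluster/*.dump.*
```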
*Limitations:*
_Limitations:_
It is not possible to trigger statedumps from the Gluster CLI when the
gfapi application has lost its management connection to the GlusterD
servers.
@@ -141,24 +147,26 @@ GlusterFS 3.10 is the first release that contains support for the new
debugging will need to be adapted to call this function. At the time of
the release of 3.10, no applications are known to call `glfs_sysrq()`.
*Known Issues:*
_Known Issues:_
### Disabled creation of trash directory by default
*Notes for users:*
_Notes for users:_
From now onwards the trash directory, namely .trashcan, will not be created by
default upon creation of new volumes unless and until the feature is turned ON
and the restrictions on the same will be applicable as long as features.trash
is set for a particular volume.
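For reference, the feature is turned on per volume through the `features.trash` option mentioned above; a minimal sketch:
```bash
# gluster volume set <VOLNAME> features.trash on
```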
*Limitations:*
_Limitations:_
After upgrade, the trash directory of pre-existing volumes will still be present at
the root of the volume. Those who are not interested in this feature may have to
manually delete the directory from the mount point.
*Known Issues:*
_Known Issues:_
### Implemented parallel readdirp with distribute xlator
*Notes for users:*
_Notes for users:_
Currently the directory listing gets slower as the number of bricks/nodes
increases in a volume, though the file/directory numbers remain unchanged.
With this feature, the performance of directory listing is made mostly
@@ -167,28 +175,32 @@ exponentially reduce the directory listing performance. (On a 2, 5, 10, 25 brick
setup we saw ~5, 100, 400, 450% improvement respectively)
To enable this feature:
```bash
# gluster volume set <VOLNAME> performance.readdir-ahead on
# gluster volume set <VOLNAME> performance.parallel-readdir on
```
To disable this feature:
```bash
# gluster volume set <VOLNAME> performance.parallel-readdir off
```
If there are more than 50 bricks in the volume, it is good to increase the cache
size beyond the default value of 10Mb:
```bash
# gluster volume set <VOLNAME> performance.rda-cache-limit <CACHE SIZE>
```
*Limitations:*
_Limitations:_
*Known Issues:*
_Known Issues:_
### md-cache can optionally -ve cache security.ima xattr
*Notes for users:*
_Notes for users:_
From kernel version 3.X or greater, creating a file results in a removexattr
call on the security.ima xattr. This xattr is not set on the file unless the IMA
feature is active. With this patch, the removexattr call returns ENODATA if it is
@@ -197,18 +209,20 @@ not found in the cache.
The end benefit is faster create operations where IMA is not enabled.
To cache this xattr use,
```bash
# gluster volume set <VOLNAME> performance.cache-ima-xattrs on
```
The above option is on by default.
*Limitations:*
_Limitations:_
*Known Issues:*
_Known Issues:_
### Added support for CPU extensions in disperse computations
*Notes for users:*
_Notes for users:_
To improve disperse computations, a new way of generating dynamic code
targeting specific CPU extensions like SSE and AVX on Intel processors is
implemented. The available extensions are detected at run time. This can
@@ -226,18 +240,18 @@ command:
Valid <type> values are:
* none: Completely disable dynamic code generation
* auto: Automatically detect available extensions and use the best one
* x64: Use dynamic code generation using standard 64 bits instructions
* sse: Use dynamic code generation using SSE extensions (128 bits)
* avx: Use dynamic code generation using AVX extensions (256 bits)
- none: Completely disable dynamic code generation
- auto: Automatically detect available extensions and use the best one
- x64: Use dynamic code generation using standard 64 bits instructions
- sse: Use dynamic code generation using SSE extensions (128 bits)
- avx: Use dynamic code generation using AVX extensions (256 bits)
The default value is 'auto'. If a value is specified that is not detected at
run time, it will automatically fall back to the next available option.
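For illustration only, a sketch of selecting a type is given below. The exact option name is not visible in this hunk; `disperse.cpu-extensions` is quoted from memory rather than from this diff, so verify it with `gluster volume set help` before use:
```bash
# gluster volume set <VOLNAME> disperse.cpu-extensions auto   # option name assumed, not shown in this hunk
```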
*Limitations:*
_Limitations:_
*Known Issues:*
_Known Issues:_
To solve a conflict between the dynamic code generator and SELinux, it
has been necessary to create a dynamic file at runtime in the directory
/usr/libexec/glusterfs. This directory only exists if the server package
@@ -271,20 +285,20 @@ Bugs addressed since release-3.9 are listed below.
- [#1325531](https://bugzilla.redhat.com/1325531): Statedump: Add per xlator ref counting for inode
- [#1325792](https://bugzilla.redhat.com/1325792): "gluster vol heal test statistics heal-count replica" seems doesn't work
- [#1330604](https://bugzilla.redhat.com/1330604): out-of-tree builds generate XDR headers and source files in the original directory
- [#1336371](https://bugzilla.redhat.com/1336371): Sequential volume start&stop is failing with SSL enabled setup.
- [#1341948](https://bugzilla.redhat.com/1341948): DHT: Rebalance- Misleading log messages from __dht_check_free_space function
- [#1336371](https://bugzilla.redhat.com/1336371): Sequential volume start&stop is failing with SSL enabled setup.
- [#1341948](https://bugzilla.redhat.com/1341948): DHT: Rebalance- Misleading log messages from \_\_dht_check_free_space function
- [#1344714](https://bugzilla.redhat.com/1344714): removal of file from nfs mount crashes ganesha server
- [#1349385](https://bugzilla.redhat.com/1349385): [FEAT]jbr: Add rollbacking of failed fops
- [#1355956](https://bugzilla.redhat.com/1355956): RFE : move ganesha related configuration into shared storage
- [#1356076](https://bugzilla.redhat.com/1356076): DHT doesn't evenly balance files on FreeBSD with ZFS
- [#1356960](https://bugzilla.redhat.com/1356960): OOM Kill on client when heal is in progress on 1*(2+1) arbiter volume
- [#1356960](https://bugzilla.redhat.com/1356960): OOM Kill on client when heal is in progress on 1\*(2+1) arbiter volume
- [#1357753](https://bugzilla.redhat.com/1357753): JSON output for all Events CLI commands
- [#1357754](https://bugzilla.redhat.com/1357754): Delayed Events if any one Webhook is slow
- [#1358296](https://bugzilla.redhat.com/1358296): tier: breaking down the monolith processing function tier_migrate_using_query_file()
- [#1359612](https://bugzilla.redhat.com/1359612): [RFE] Geo-replication Logging Improvements
- [#1360670](https://bugzilla.redhat.com/1360670): Add output option `--xml` to man page of gluster
- [#1363595](https://bugzilla.redhat.com/1363595): Node remains in stopped state in pcs status with "/usr/lib/ocf/resource.d/heartbeat/ganesha_mon: line 137: [: too many arguments ]" messages in logs.
- [#1363965](https://bugzilla.redhat.com/1363965): geo-replication *changes.log does not respect the log-level configured
- [#1363965](https://bugzilla.redhat.com/1363965): geo-replication \*changes.log does not respect the log-level configured
- [#1364420](https://bugzilla.redhat.com/1364420): [RFE] History Crawl performance improvement
- [#1365395](https://bugzilla.redhat.com/1365395): Support for rc.d and init for Service management
- [#1365740](https://bugzilla.redhat.com/1365740): dht: Update stbuf from servers having layout
@@ -298,7 +312,7 @@ Bugs addressed since release-3.9 are listed below.
- [#1368138](https://bugzilla.redhat.com/1368138): Crash of glusterd when using long username with geo-replication
- [#1368312](https://bugzilla.redhat.com/1368312): Value of `replica.split-brain-status' attribute of a directory in metadata split-brain in a dist-rep volume reads that it is not in split-brain
- [#1368336](https://bugzilla.redhat.com/1368336): [RFE] Tier Events
- [#1369077](https://bugzilla.redhat.com/1369077): The directories get renamed when data bricks are offline in 4*(2+1) volume
- [#1369077](https://bugzilla.redhat.com/1369077): The directories get renamed when data bricks are offline in 4\*(2+1) volume
- [#1369124](https://bugzilla.redhat.com/1369124): fix unused variable warnings from out-of-tree builds generate XDR headers and source files i...
- [#1369397](https://bugzilla.redhat.com/1369397): segment fault in changelog_cleanup_dispatchers
- [#1369403](https://bugzilla.redhat.com/1369403): [RFE]: events from protocol server
@@ -366,14 +380,14 @@ Bugs addressed since release-3.9 are listed below.
- [#1384142](https://bugzilla.redhat.com/1384142): crypt: changes needed for openssl-1.1 (coming in Fedora 26)
- [#1384297](https://bugzilla.redhat.com/1384297): glusterfs can't self heal character dev file for invalid dev_t parameters
- [#1384906](https://bugzilla.redhat.com/1384906): arbiter volume write performance is bad with sharding
- [#1385104](https://bugzilla.redhat.com/1385104): invalid argument warning messages seen in fuse client logs 2016-09-30 06:34:58.938667] W [dict.c:418ict_set] (-->/usr/lib64/glusterfs/3.8.4/xlator/cluster/replicate.so(+0x58722) 0-dict: !this || !value for key=link-count [Invalid argument]
- [#1385104](https://bugzilla.redhat.com/1385104): invalid argument warning messages seen in fuse client logs 2016-09-30 06:34:58.938667] W [dict.c:418ict_set] (-->/usr/lib64/glusterfs/3.8.4/xlator/cluster/replicate.so(+0x58722) 0-dict: !this || !value for key=link-count [Invalid argument]
- [#1385575](https://bugzilla.redhat.com/1385575): pmap_signin event fails to update brickinfo->signed_in flag
- [#1385593](https://bugzilla.redhat.com/1385593): Fix some spelling mistakes in comments and log messages
- [#1385839](https://bugzilla.redhat.com/1385839): Incorrect volume type in the "glusterd_state" file generated using CLI "gluster get-state"
- [#1385839](https://bugzilla.redhat.com/1385839): Incorrect volume type in the "glusterd_state" file generated using CLI "gluster get-state"
- [#1386088](https://bugzilla.redhat.com/1386088): Memory Leaks in snapshot code path
- [#1386097](https://bugzilla.redhat.com/1386097): 4 of 8 bricks (2 dht subvols) crashed on systemic setup
- [#1386097](https://bugzilla.redhat.com/1386097): 4 of 8 bricks (2 dht subvols) crashed on systemic setup
- [#1386123](https://bugzilla.redhat.com/1386123): geo-replica slave node goes faulty for non-root user session due to fail to locate gluster binary
- [#1386141](https://bugzilla.redhat.com/1386141): Error and warning message getting while removing glusterfs-events package
- [#1386141](https://bugzilla.redhat.com/1386141): Error and warning message getting while removing glusterfs-events package
- [#1386188](https://bugzilla.redhat.com/1386188): Asynchronous Unsplit-brain still causes Input/Output Error on system calls
- [#1386200](https://bugzilla.redhat.com/1386200): Log all published events
- [#1386247](https://bugzilla.redhat.com/1386247): [Eventing]: 'gluster volume tier <volname> start force' does not generate a TIER_START event
@@ -417,7 +431,7 @@ Bugs addressed since release-3.9 are listed below.
- [#1395648](https://bugzilla.redhat.com/1395648): ganesha-ha.conf --status should validate if the VIPs are assigned to right nodes
- [#1395660](https://bugzilla.redhat.com/1395660): Checkpoint completed event missing master node detail
- [#1395687](https://bugzilla.redhat.com/1395687): Client side IObuff leaks at a high pace consumes complete client memory and hence making gluster volume inaccessible
- [#1395993](https://bugzilla.redhat.com/1395993): heal info --xml when bricks are down in a systemic environment is not displaying anything even after more than 30minutes
- [#1395993](https://bugzilla.redhat.com/1395993): heal info --xml when bricks are down in a systemic environment is not displaying anything even after more than 30minutes
- [#1396038](https://bugzilla.redhat.com/1396038): refresh-config fails and crashes ganesha when mdcache is enabled on the volume.
- [#1396048](https://bugzilla.redhat.com/1396048): A hard link is lost during rebalance+lookup
- [#1396062](https://bugzilla.redhat.com/1396062): [geo-rep]: Worker crashes seen while renaming directories in loop
@@ -447,11 +461,11 @@ Bugs addressed since release-3.9 are listed below.
- [#1400013](https://bugzilla.redhat.com/1400013): [USS,SSL] .snaps directory is not reachable when I/O encryption (SSL) is enabled
- [#1400026](https://bugzilla.redhat.com/1400026): Duplicate value assigned to GD_MSG_DAEMON_STATE_REQ_RCVD and GD_MSG_BRICK_CLEANUP_SUCCESS messages
- [#1400237](https://bugzilla.redhat.com/1400237): Ganesha services are not stopped when pacemaker quorum is lost
- [#1400613](https://bugzilla.redhat.com/1400613): [GANESHA] failed to create directory of hostname of new node in var/lib/nfs/ganesha/ in already existing cluster nodes
- [#1400818](https://bugzilla.redhat.com/1400818): possible memory leak on client when writing to a file while another client issues a truncate
- [#1400613](https://bugzilla.redhat.com/1400613): [GANESHA] failed to create directory of hostname of new node in var/lib/nfs/ganesha/ in already existing cluster nodes
- [#1400818](https://bugzilla.redhat.com/1400818): possible memory leak on client when writing to a file while another client issues a truncate
- [#1401095](https://bugzilla.redhat.com/1401095): log the error when locking the brick directory fails
- [#1401218](https://bugzilla.redhat.com/1401218): Fix compound fops memory leaks
- [#1401404](https://bugzilla.redhat.com/1401404): [Arbiter] IO's Halted and heal info command hung
- [#1401404](https://bugzilla.redhat.com/1401404): [Arbiter] IO's Halted and heal info command hung
- [#1401777](https://bugzilla.redhat.com/1401777): atime becomes zero when truncating file via ganesha (or gluster-NFS)
- [#1401801](https://bugzilla.redhat.com/1401801): [RFE] Use Host UUID to find local nodes to spawn workers
- [#1401812](https://bugzilla.redhat.com/1401812): RFE: Make readdirp parallel in dht
@@ -463,7 +477,7 @@ Bugs addressed since release-3.9 are listed below.
- [#1402369](https://bugzilla.redhat.com/1402369): Getting the warning message while erasing the gluster "glusterfs-server" package.
- [#1402710](https://bugzilla.redhat.com/1402710): ls and move hung on disperse volume
- [#1402730](https://bugzilla.redhat.com/1402730): self-heal not happening, as self-heal info lists the same pending shards to be healed
- [#1402828](https://bugzilla.redhat.com/1402828): Snapshot: Snapshot create command fails when gluster-shared-storage volume is stopped
- [#1402828](https://bugzilla.redhat.com/1402828): Snapshot: Snapshot create command fails when gluster-shared-storage volume is stopped
- [#1402841](https://bugzilla.redhat.com/1402841): Files remain unhealed forever if shd is disabled and re-enabled while healing is in progress.
- [#1403130](https://bugzilla.redhat.com/1403130): [GANESHA] Adding a node to cluster failed to allocate resource-agents to new node.
- [#1403780](https://bugzilla.redhat.com/1403780): Incorrect incrementation of volinfo refcnt during volume start
@@ -495,7 +509,7 @@ Bugs addressed since release-3.9 are listed below.
- [#1408757](https://bugzilla.redhat.com/1408757): Fix failure of split-brain-favorite-child-policy.t in CentOS7
- [#1408758](https://bugzilla.redhat.com/1408758): tests/bugs/glusterd/bug-913555.t fails spuriously
- [#1409078](https://bugzilla.redhat.com/1409078): RFE: Need a command to check op-version compatibility of clients
- [#1409186](https://bugzilla.redhat.com/1409186): Dict_t leak in dht_migration_complete_check_task and dht_rebalance_inprogress_task
- [#1409186](https://bugzilla.redhat.com/1409186): Dict_t leak in dht_migration_complete_check_task and dht_rebalance_inprogress_task
- [#1409202](https://bugzilla.redhat.com/1409202): Warning messages throwing when EC volume offline brick comes up are difficult to understand for end user.
- [#1409206](https://bugzilla.redhat.com/1409206): Extra lookup/fstats are sent over the network when a brick is down.
- [#1409727](https://bugzilla.redhat.com/1409727): [ganesha + EC]posix compliance rename tests failed on EC volume with nfs-ganesha mount.
@@ -531,7 +545,7 @@ Bugs addressed since release-3.9 are listed below.
- [#1417042](https://bugzilla.redhat.com/1417042): glusterd restart is starting the offline shd daemon on other node in the cluster
- [#1417135](https://bugzilla.redhat.com/1417135): [Stress] : SHD Logs flooded with "Heal Failed" messages,filling up "/" quickly
- [#1417521](https://bugzilla.redhat.com/1417521): [SNAPSHOT] With all USS plugin enable .snaps directory is not visible in cifs mount as well as windows mount
- [#1417527](https://bugzilla.redhat.com/1417527): glusterfind: After glusterfind pre command execution all temporary files and directories /usr/var/lib/misc/glusterfsd/glusterfind/<session>/<volume>/ should be removed
- [#1417527](https://bugzilla.redhat.com/1417527): glusterfind: After glusterfind pre command execution all temporary files and directories /usr/var/lib/misc/glusterfsd/glusterfind/<session>/<volume>/ should be removed
- [#1417804](https://bugzilla.redhat.com/1417804): debug/trace: Print iatts of individual entries in readdirp callback for better debugging experience
- [#1418091](https://bugzilla.redhat.com/1418091): [RFE] Support multiple bricks in one process (multiplexing)
- [#1418536](https://bugzilla.redhat.com/1418536): Portmap allocates way too much memory (256KB) on stack
@@ -555,11 +569,11 @@ Bugs addressed since release-3.9 are listed below.
- [#1420987](https://bugzilla.redhat.com/1420987): warning messages seen in glusterd logs while setting the volume option
- [#1420989](https://bugzilla.redhat.com/1420989): when server-quorum is enabled, volume get returns 0 value for server-quorum-ratio
- [#1420991](https://bugzilla.redhat.com/1420991): Modified volume options not synced once offline nodes comes up.
- [#1421017](https://bugzilla.redhat.com/1421017): CLI option "--timeout" is accepting non numeric and negative values.
- [#1421017](https://bugzilla.redhat.com/1421017): CLI option "--timeout" is accepting non numeric and negative values.
- [#1421956](https://bugzilla.redhat.com/1421956): Disperse: Fallback to pre-compiled code execution when dynamic code generation fails
- [#1422350](https://bugzilla.redhat.com/1422350): glustershd process crashed on systemic setup
- [#1422363](https://bugzilla.redhat.com/1422363): [Replicate] "RPC call decoding failed" leading to IO hang & mount inaccessible
- [#1422391](https://bugzilla.redhat.com/1422391): Gluster NFS server crashing in __mnt3svc_umountall
- [#1422391](https://bugzilla.redhat.com/1422391): Gluster NFS server crashing in \_\_mnt3svc_umountall
- [#1422766](https://bugzilla.redhat.com/1422766): Entry heal messages in glustershd.log while no entries shown in heal info
- [#1422777](https://bugzilla.redhat.com/1422777): DHT doesn't evenly balance files on FreeBSD with ZFS
- [#1422819](https://bugzilla.redhat.com/1422819): [Geo-rep] Recreating geo-rep session with same slave after deleting with reset-sync-time fails to sync

View File

@@ -6,17 +6,17 @@ bugs in the GlusterFS 3.10 stable release.
## Major changes, features and limitations addressed in this release
1. auth-allow setting was broken with 3.10 release and is now fixed (#1429117)
1. auth-allow setting was broken with 3.10 release and is now fixed (#1429117)
## Major issues
1. Expanding a gluster volume that is sharded may cause file corruption
- Sharded volumes are typically used for VM images, if such volumes are
expanded or possibly contracted (i.e add/remove bricks and rebalance)
there are reports of VM images getting corrupted.
- If you are using sharded volumes, DO NOT rebalance them till this is
fixed
- Status of this bug can be tracked here, [#1426508](https://bugzilla.redhat.com/1426508)
- Sharded volumes are typically used for VM images, if such volumes are
expanded or possibly contracted (i.e add/remove bricks and rebalance)
there are reports of VM images getting corrupted.
- If you are using sharded volumes, DO NOT rebalance them till this is
fixed
- Status of this bug can be tracked here, [#1426508](https://bugzilla.redhat.com/1426508)
## Bugs addressed
@@ -28,7 +28,7 @@ A total of 31 patches have been merged, addressing 26 bugs:
- [#1426222](https://bugzilla.redhat.com/1426222): build: fixes to build 3.9.0rc2 on Debian (jessie)
- [#1426323](https://bugzilla.redhat.com/1426323): common-ha: no need to remove nodes one-by-one in teardown
- [#1426329](https://bugzilla.redhat.com/1426329): [Ganesha] : Add comment to Ganesha HA config file ,about cluster name's length limitation
- [#1427387](https://bugzilla.redhat.com/1427387): systemic testing: seeing lot of ping time outs which would lead to splitbrains
- [#1427387](https://bugzilla.redhat.com/1427387): systemic testing: seeing lot of ping time outs which would lead to splitbrains
- [#1427399](https://bugzilla.redhat.com/1427399): [RFE] capture portmap details in glusterd's statedump
- [#1427461](https://bugzilla.redhat.com/1427461): Bricks take up new ports upon volume restart after add-brick op with brick mux enabled
- [#1428670](https://bugzilla.redhat.com/1428670): Disconnects in nfs mount leads to IO hang and mount inaccessible
@@ -36,7 +36,7 @@ A total of 31 patches have been merged, addressing 26 bugs:
- [#1429117](https://bugzilla.redhat.com/1429117): auth failure after upgrade to GlusterFS 3.10
- [#1429402](https://bugzilla.redhat.com/1429402): Restore atime/mtime for symlinks and other non-regular files.
- [#1429773](https://bugzilla.redhat.com/1429773): disallow increasing replica count for arbiter volumes
- [#1430512](https://bugzilla.redhat.com/1430512): /libgfxdr.so.0.0.1: undefined symbol: __gf_free
- [#1430512](https://bugzilla.redhat.com/1430512): /libgfxdr.so.0.0.1: undefined symbol: \_\_gf_free
- [#1430844](https://bugzilla.redhat.com/1430844): build/packaging: Debian and Ubuntu don't have /usr/libexec/; results in bad packages
- [#1431175](https://bugzilla.redhat.com/1431175): volume start command hangs
- [#1431176](https://bugzilla.redhat.com/1431176): USS is broken when multiplexing is on

View File

@@ -6,6 +6,7 @@ the new features that were added and bugs fixed in the GlusterFS
3.10 stable release.
## Major changes, features and limitations addressed in this release
**No Major changes**
## Major issues
@@ -13,10 +14,9 @@ the new features that were added and bugs fixed in the GlusterFS
1. Brick multiplexing is being tested and fixed aggressively but we still have a
few crashes and memory leaks to fix.
## Bugs addressed
Bugs addressed since release-3.10.9 are listed below.
- [#1498081](https://bugzilla.redhat.com/1498081): dht_(f)xattrop does not implement migration checks
- [#1498081](https://bugzilla.redhat.com/1498081): dht\_(f)xattrop does not implement migration checks
- [#1534848](https://bugzilla.redhat.com/1534848): entries not getting cleared post healing of softlinks (stale entries showing up in heal info)

View File

@@ -6,6 +6,7 @@ the new features that were added and bugs fixed in the GlusterFS
3.10 stable release.
## Major changes, features and limitations addressed in this release
**No Major changes**
## Major issues
@@ -13,13 +14,12 @@ the new features that were added and bugs fixed in the GlusterFS
1. Brick multiplexing is being tested and fixed aggressively but we still have a
few crashes and memory leaks to fix.
## Bugs addressed
Bugs addressed since release-3.10.10 are listed below.
- [#1486542](https://bugzilla.redhat.com/1486542): "ganesha.so cannot open" warning message in glusterd log in non ganesha setup.
- [#1486542](https://bugzilla.redhat.com/1486542): "ganesha.so cannot open" warning message in glusterd log in non ganesha setup.
- [#1544461](https://bugzilla.redhat.com/1544461): 3.8 -> 3.10 rolling upgrade fails (same for 3.12 or 3.13) on Ubuntu 14
- [#1544787](https://bugzilla.redhat.com/1544787): tests/bugs/cli/bug-1169302.t fails spuriously
- [#1546912](https://bugzilla.redhat.com/1546912): tests/bugs/posix/bug-990028.t fails in release-3.10 branch
- [#1549482](https://bugzilla.redhat.com/1549482): Quota: After deleting directory from mount point on which quota was configured, quota list command output is blank
- [#1549482](https://bugzilla.redhat.com/1549482): Quota: After deleting directory from mount point on which quota was configured, quota list command output is blank

View File

@@ -8,6 +8,7 @@ GlusterFS 3.10 stable release.
## Major changes, features and limitations addressed in this release
This release contains a fix for a security vulnerability in Gluster as follows,
- http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2018-1088
- https://nvd.nist.gov/vuln/detail/CVE-2018-1088
@@ -24,7 +25,6 @@ See, this [guide](https://docs.gluster.org/en/v3/Administrator%20Guide/SSL/) for
1. Brick multiplexing is being tested and fixed aggressively but we still have a
few crashes and memory leaks to fix.
## Bugs addressed
Bugs addressed since release-3.10.11 are listed below.

View File

@@ -6,18 +6,19 @@ contains a listing of all the new features that were added and
bugs in the GlusterFS 3.10 stable release.
## Major changes, features and limitations addressed in this release
1. Many brick multiplexing and nfs-ganesha+ha bugs have been addressed.
2. Rebalance and remove brick operations have been disabled for sharded volumes
to prevent data corruption.
## Major issues
1. Expanding a gluster volume that is sharded may cause file corruption
- Sharded volumes are typically used for VM images, if such volumes are
expanded or possibly contracted (i.e add/remove bricks and rebalance)
there are reports of VM images getting corrupted.
- Status of this bug can be tracked here, [#1426508](https://bugzilla.redhat.com/1426508)
1. Expanding a gluster volume that is sharded may cause file corruption
- Sharded volumes are typically used for VM images, if such volumes are
expanded or possibly contracted (i.e add/remove bricks and rebalance)
there are reports of VM images getting corrupted.
- Status of this bug can be tracked here, [#1426508](https://bugzilla.redhat.com/1426508)
## Bugs addressed
@@ -40,12 +41,12 @@ A total of 63 patches have been merged, addressing 46 bugs:
- [#1443349](https://bugzilla.redhat.com/1443349): [Eventing]: Unrelated error message displayed when path specified during a 'webhook-test/add' is missing a schema
- [#1441576](https://bugzilla.redhat.com/1441576): [geo-rep]: rsync should not try to sync internal xattrs
- [#1441927](https://bugzilla.redhat.com/1441927): [geo-rep]: Worker crashes with [Errno 16] Device or resource busy: '.gfid/00000000-0000-0000-0000-000000000001/dir.166 while renaming directories
- [#1401877](https://bugzilla.redhat.com/1401877): [GANESHA] Symlinks from /etc/ganesha/ganesha.conf to shared\_storage are created on the non-ganesha nodes in 8 node gluster having 4 node ganesha cluster
- [#1425723](https://bugzilla.redhat.com/1425723): nfs-ganesha volume export file remains stale in shared\_storage\_volume when volume is deleted
- [#1401877](https://bugzilla.redhat.com/1401877): [GANESHA] Symlinks from /etc/ganesha/ganesha.conf to shared_storage are created on the non-ganesha nodes in 8 node gluster having 4 node ganesha cluster
- [#1425723](https://bugzilla.redhat.com/1425723): nfs-ganesha volume export file remains stale in shared_storage_volume when volume is deleted
- [#1427759](https://bugzilla.redhat.com/1427759): nfs-ganesha: Incorrect error message returned when disable fails
- [#1438325](https://bugzilla.redhat.com/1438325): Need to improve remove-brick failure message when the brick process is down.
- [#1438338](https://bugzilla.redhat.com/1438338): glusterd is setting replicate volume property over disperse volume or vice versa
- [#1438340](https://bugzilla.redhat.com/1438340): glusterd is not validating for allowed values while setting "cluster.brick-multiplex" property
- [#1438340](https://bugzilla.redhat.com/1438340): glusterd is not validating for allowed values while setting "cluster.brick-multiplex" property
- [#1441476](https://bugzilla.redhat.com/1441476): Glusterd crashes when restarted with many volumes
- [#1444128](https://bugzilla.redhat.com/1444128): [BrickMultiplex] gluster command not responding and .snaps directory is not visible after executing snapshot related command
- [#1445260](https://bugzilla.redhat.com/1445260): [GANESHA] Volume start and stop having ganesha enable on it,turns off cache-invalidation on volume
@@ -54,10 +55,10 @@ A total of 63 patches have been merged, addressing 46 bugs:
- [#1435779](https://bugzilla.redhat.com/1435779): Inode ref leak on anonymous reads and writes
- [#1440278](https://bugzilla.redhat.com/1440278): [GSS] NFS Sub-directory mount not working on solaris10 client
- [#1450378](https://bugzilla.redhat.com/1450378): GNFS crashed while taking lock on a file from 2 different clients having same volume mounted from 2 different servers
- [#1449779](https://bugzilla.redhat.com/1449779): quota: limit-usage command failed with error " Failed to start aux mount"
- [#1449779](https://bugzilla.redhat.com/1449779): quota: limit-usage command failed with error " Failed to start aux mount"
- [#1450564](https://bugzilla.redhat.com/1450564): glfsheal: crashed(segfault) with disperse volume in RDMA
- [#1443501](https://bugzilla.redhat.com/1443501): Don't wind post-op on a brick where the fop phase failed.
- [#1444892](https://bugzilla.redhat.com/1444892): When either killing or restarting a brick with performance.stat-prefetch on, stat sometimes returns a bad st\_size value.
- [#1444892](https://bugzilla.redhat.com/1444892): When either killing or restarting a brick with performance.stat-prefetch on, stat sometimes returns a bad st_size value.
- [#1449169](https://bugzilla.redhat.com/1449169): Multiple bricks WILL crash after TCP port probing
- [#1440805](https://bugzilla.redhat.com/1440805): Update rfc.sh to check Change-Id consistency for backports
- [#1443010](https://bugzilla.redhat.com/1443010): snapshot: snapshots appear to be failing with respect to secure geo-rep slave
@@ -65,8 +66,7 @@ A total of 63 patches have been merged, addressing 46 bugs:
- [#1444773](https://bugzilla.redhat.com/1444773): explicitly specify executor to be bash for tests
- [#1445407](https://bugzilla.redhat.com/1445407): remove bug-1421590-brick-mux-reuse-ports.t
- [#1440742](https://bugzilla.redhat.com/1440742): Test files clean up for tier during 3.10
- [#1448790](https://bugzilla.redhat.com/1448790): [Tiering]: High and low watermark values when set to the same level, is allowed
- [#1448790](https://bugzilla.redhat.com/1448790): [Tiering]: High and low watermark values when set to the same level, is allowed
- [#1435942](https://bugzilla.redhat.com/1435942): Enabling parallel-readdir causes dht linkto files to be visible on the mount,
- [#1437763](https://bugzilla.redhat.com/1437763): File-level WORM allows ftruncate() on read-only files
- [#1439148](https://bugzilla.redhat.com/1439148): Parallel readdir on Gluster NFS displays less number of dentries

View File

@@ -6,18 +6,20 @@ contain a listing of all the new features that were added and
bugs fixed in the GlusterFS 3.10 stable release.
## Major changes, features and limitations addressed in this release
1. No Major changes
## Major issues
1. Expanding a gluster volume that is sharded may cause file corruption
- Sharded volumes are typically used for VM images, if such volumes are
expanded or possibly contracted (i.e add/remove bricks and rebalance)
there are reports of VM images getting corrupted.
- Status of this bug can be tracked here, [#1426508](https://bugzilla.redhat.com/1426508)
2. Brick multiplexing is being tested and fixed aggressively but we still have a
few crashes and memory leaks to fix.
1. Expanding a gluster volume that is sharded may cause file corruption
- Sharded volumes are typically used for VM images, if such volumes are
expanded or possibly contracted (i.e add/remove bricks and rebalance)
there are reports of VM images getting corrupted.
- Status of this bug can be tracked here, [#1426508](https://bugzilla.redhat.com/1426508)
2. Brick multiplexing is being tested and fixed aggressively but we still have a
few crashes and memory leaks to fix.
## Bugs addressed
@@ -27,13 +29,12 @@ A total of 18 patches have been merged, addressing 13 bugs:
- [#1450773](https://bugzilla.redhat.com/1450773): Quota: After upgrade from 3.7 to higher version , gluster quota list command shows "No quota configured on volume repvol"
- [#1450934](https://bugzilla.redhat.com/1450934): [New] - Replacing an arbiter brick while I/O happens causes vm pause
- [#1450947](https://bugzilla.redhat.com/1450947): Autoconf leaves unexpanded variables in path names of non-shell-script text files
- [#1451371](https://bugzilla.redhat.com/1451371): crash in dht\_rmdir\_do
- [#1451371](https://bugzilla.redhat.com/1451371): crash in dht_rmdir_do
- [#1451561](https://bugzilla.redhat.com/1451561): AFR returns the node uuid of the same node for every file in the replica
- [#1451587](https://bugzilla.redhat.com/1451587): cli xml status of detach tier broken
- [#1451977](https://bugzilla.redhat.com/1451977): Add logs to identify whether disconnects are voluntary or due to network problems
- [#1451995](https://bugzilla.redhat.com/1451995): Log message shows error code as success even when rpc fails to connect
- [#1453056](https://bugzilla.redhat.com/1453056): [DHt] : segfault in dht\_selfheal\_dir\_setattr while running regressions
- [#1453056](https://bugzilla.redhat.com/1453056): [DHt] : segfault in dht_selfheal_dir_setattr while running regressions
- [#1453087](https://bugzilla.redhat.com/1453087): Brick Multiplexing: On reboot of a node Brick multiplexing feature lost on that node as multiple brick processes get spawned
- [#1456682](https://bugzilla.redhat.com/1456682): tierd listens to a port.
- [#1457054](https://bugzilla.redhat.com/1457054): glusterfs client crash on io-cache.so(\_\_ioc\_page\_wakeup+0x44)
- [#1457054](https://bugzilla.redhat.com/1457054): glusterfs client crash on io-cache.so(\_\_ioc_page_wakeup+0x44)

View File

@@ -6,26 +6,28 @@ contain a listing of all the new features that were added and
bugs fixed in the GlusterFS 3.10 stable release.
## Major changes, features and limitations addressed in this release
1. No Major changes
## Major issues
1. Expanding a gluster volume that is sharded may cause file corruption
- Sharded volumes are typically used for VM images, if such volumes are
expanded or possibly contracted (i.e add/remove bricks and rebalance)
there are reports of VM images getting corrupted.
- Status of this bug can be tracked here, [#1426508](https://bugzilla.redhat.com/1426508)
2. Brick multiplexing is being tested and fixed aggressively but we still have a
few crashes and memory leaks to fix.
3. Another rebalance related bug is being worked upon [#1467010](https://bugzilla.redhat.com/1467010)
1. Expanding a gluster volume that is sharded may cause file corruption
- Sharded volumes are typically used for VM images, if such volumes are
expanded or possibly contracted (i.e add/remove bricks and rebalance)
there are reports of VM images getting corrupted.
- Status of this bug can be tracked here, [#1426508](https://bugzilla.redhat.com/1426508)
2. Brick multiplexing is being tested and fixed aggressively but we still have a
few crashes and memory leaks to fix.
3. Another rebalance related bug is being worked upon [#1467010](https://bugzilla.redhat.com/1467010)
## Bugs addressed
A total of 18 patches have been merged, addressing 13 bugs:
- [#1457732](https://bugzilla.redhat.com/1457732): "split-brain observed [Input/output error]" error messages in samba logs during parallel rm -rf
- [#1459760](https://bugzilla.redhat.com/1459760): Glusterd segmentation fault in ' _Unwind_Backtrace' while running peer probe
- [#1459760](https://bugzilla.redhat.com/1459760): Glusterd segmentation fault in ' \_Unwind_Backtrace' while running peer probe
- [#1460649](https://bugzilla.redhat.com/1460649): posix-acl: Whitelist virtual ACL xattrs
- [#1460914](https://bugzilla.redhat.com/1460914): Rebalance estimate time sometimes shows negative values
- [#1460993](https://bugzilla.redhat.com/1460993): Revert CLI restrictions on running rebalance in VM store use case

View File

@@ -6,19 +6,22 @@ contain a listing of all the new features that were added and
bugs fixed in the GlusterFS 3.10 stable release.
## Major changes, features and limitations addressed in this release
**No Major changes**
## Major issues
1. Expanding a gluster volume that is sharded may cause file corruption
- Sharded volumes are typically used for VM images, if such volumes are
expanded or possibly contracted (i.e add/remove bricks and rebalance)
there are reports of VM images getting corrupted.
- The last known cause for corruption [#1467010](https://bugzilla.redhat.com/show_bug.cgi?id=1467010)
has a fix with this release. As further testing is still in progress, the issue
is retained as a major issue.
2. Brick multiplexing is being tested and fixed aggressively but we still have a
few crashes and memory leaks to fix.
1. Expanding a gluster volume that is sharded may cause file corruption
- Sharded volumes are typically used for VM images, if such volumes are
expanded or possibly contracted (i.e add/remove bricks and rebalance)
there are reports of VM images getting corrupted.
- The last known cause for corruption [#1467010](https://bugzilla.redhat.com/show_bug.cgi?id=1467010)
has a fix with this release. As further testing is still in progress, the issue
is retained as a major issue.
2. Brick multiplexing is being tested and fixed aggressively but we still have a
few crashes and memory leaks to fix.
## Bugs addressed
@@ -46,4 +49,4 @@ Bugs addressed since release-3.10.4 are listed below.
- [#1476212](https://bugzilla.redhat.com/1476212): [geo-rep]: few of the self healed hardlinks on master did not sync to slave
- [#1478498](https://bugzilla.redhat.com/1478498): scripts: invalid test in S32gluster_enable_shared_storage.sh
- [#1478499](https://bugzilla.redhat.com/1478499): packaging: /var/lib/glusterd/options should be %config(noreplace)
- [#1480594](https://bugzilla.redhat.com/1480594): nfs process crashed in "nfs3_getattr"
- [#1480594](https://bugzilla.redhat.com/1480594): nfs process crashed in "nfs3_getattr"

View File

@@ -6,18 +6,21 @@ contain a listing of all the new features that were added and
bugs fixed in the GlusterFS 3.10 stable release.
## Major changes, features and limitations addressed in this release
**No Major changes**
## Major issues
1. Expanding a gluster volume that is sharded may cause file corruption
- Sharded volumes are typically used for VM images, if such volumes are
expanded or possibly contracted (i.e add/remove bricks and rebalance)
there are reports of VM images getting corrupted.
- The last known cause for corruption [#1498081](https://bugzilla.redhat.com/show_bug.cgi?id=1498081)
is still pending, and not yet a part of this release.
2. Brick multiplexing is being tested and fixed aggressively but we still have a
few crashes and memory leaks to fix.
1. Expanding a gluster volume that is sharded may cause file corruption
- Sharded volumes are typically used for VM images, if such volumes are
expanded or possibly contracted (i.e add/remove bricks and rebalance)
there are reports of VM images getting corrupted.
- The last known cause for corruption [#1498081](https://bugzilla.redhat.com/show_bug.cgi?id=1498081)
is still pending, and not yet a part of this release.
2. Brick multiplexing is being tested and fixed aggressively but we still have a
few crashes and memory leaks to fix.
## Bugs addressed
@@ -28,7 +31,7 @@ Bugs addressed since release-3.10.5 are listed below.
- [#1482857](https://bugzilla.redhat.com/1482857): glusterd fails to start
- [#1483997](https://bugzilla.redhat.com/1483997): packaging: use rdma-core(-devel) instead of ibverbs, rdmacm; disable rdma on armv7hl
- [#1484443](https://bugzilla.redhat.com/1484443): packaging: /run and /var/run; prefer /run
- [#1486542](https://bugzilla.redhat.com/1486542): "ganesha.so cannot open" warning message in glusterd log in non ganesha setup.
- [#1486542](https://bugzilla.redhat.com/1486542): "ganesha.so cannot open" warning message in glusterd log in non ganesha setup.
- [#1487042](https://bugzilla.redhat.com/1487042): AFR returns the node uuid of the same node for every file in the replica
- [#1487647](https://bugzilla.redhat.com/1487647): with AFR now making both nodes to return UUID for a file will result in georep consuming more resources
- [#1488391](https://bugzilla.redhat.com/1488391): gluster-blockd process crashed and core generated
@@ -38,7 +41,7 @@ Bugs addressed since release-3.10.5 are listed below.
- [#1491691](https://bugzilla.redhat.com/1491691): rpc: TLSv1_2_method() is deprecated in OpenSSL-1.1
- [#1491966](https://bugzilla.redhat.com/1491966): AFR entry self heal removes a directory's .glusterfs symlink.
- [#1491985](https://bugzilla.redhat.com/1491985): Add NULL gfid checks before creating file
- [#1491995](https://bugzilla.redhat.com/1491995): afr: check op_ret value in __afr_selfheal_name_impunge
- [#1491995](https://bugzilla.redhat.com/1491995): afr: check op_ret value in \_\_afr_selfheal_name_impunge
- [#1492010](https://bugzilla.redhat.com/1492010): Launch metadata heal in discover code path.
- [#1495430](https://bugzilla.redhat.com/1495430): Make event-history feature configurable and have it disabled by default
- [#1496321](https://bugzilla.redhat.com/1496321): [afr] split-brain observed on T files post hardlink and rename in x3 volume

View File

@@ -6,18 +6,21 @@ contain a listing of all the new features that were added and
bugs fixed in the GlusterFS 3.10 stable release.
## Major changes, features and limitations addressed in this release
**No Major changes**
## Major issues
1. Expanding a gluster volume that is sharded may cause file corruption
- Sharded volumes are typically used for VM images, if such volumes are
expanded or possibly contracted (i.e add/remove bricks and rebalance)
there are reports of VM images getting corrupted.
- The last known cause for corruption [#1498081](https://bugzilla.redhat.com/show_bug.cgi?id=1498081)
is still pending, and not yet a part of this release.
2. Brick multiplexing is being tested and fixed aggressively but we still have a
few crashes and memory leaks to fix.
1. Expanding a gluster volume that is sharded may cause file corruption
- Sharded volumes are typically used for VM images, if such volumes are
expanded or possibly contracted (i.e add/remove bricks and rebalance)
there are reports of VM images getting corrupted.
- The last known cause for corruption [#1498081](https://bugzilla.redhat.com/show_bug.cgi?id=1498081)
is still pending, and not yet a part of this release.
2. Brick multiplexing is being tested and fixed aggressively but we still have a
few crashes and memory leaks to fix.
## Bugs addressed

View File

@@ -6,18 +6,21 @@ contain a listing of all the new features that were added and
bugs fixed in the GlusterFS 3.10 stable release.
## Major changes, features and limitations addressed in this release
**No Major changes**
## Major issues
1. Expanding a gluster volume that is sharded may cause file corruption
- Sharded volumes are typically used for VM images, if such volumes are
expanded or possibly contracted (i.e add/remove bricks and rebalance)
there are reports of VM images getting corrupted.
- The last known cause for corruption [#1498081](https://bugzilla.redhat.com/show_bug.cgi?id=1498081)
is still pending, and not yet a part of this release.
2. Brick multiplexing is being tested and fixed aggressively but we still have a
few crashes and memory leaks to fix.
1. Expanding a gluster volume that is sharded may cause file corruption
- Sharded volumes are typically used for VM images, if such volumes are
expanded or possibly contracted (i.e add/remove bricks and rebalance)
there are reports of VM images getting corrupted.
- The last known cause for corruption [#1498081](https://bugzilla.redhat.com/show_bug.cgi?id=1498081)
is still pending, and not yet a part of this release.
2. Brick multiplexing is being tested and fixed aggressively but we still have a
few crashes and memory leaks to fix.
## Bugs addressed

View File

@@ -6,18 +6,21 @@ the new features that were added and bugs fixed in the GlusterFS
3.10 stable release.
## Major changes, features and limitations addressed in this release
**No Major changes**
## Major issues
1. Expanding a gluster volume that is sharded may cause file corruption
- Sharded volumes are typically used for VM images, if such volumes are
expanded or possibly contracted (i.e add/remove bricks and rebalance)
there are reports of VM images getting corrupted.
- The last known cause for corruption [#1498081](https://bugzilla.redhat.com/show_bug.cgi?id=1498081)
is still pending, and not yet a part of this release.
2. Brick multiplexing is being tested and fixed aggressively but we still have a
few crashes and memory leaks to fix.
1. Expanding a gluster volume that is sharded may cause file corruption
- Sharded volumes are typically used for VM images, if such volumes are
expanded or possibly contracted (i.e add/remove bricks and rebalance)
there are reports of VM images getting corrupted.
- The last known cause for corruption [#1498081](https://bugzilla.redhat.com/show_bug.cgi?id=1498081)
is still pending, and not yet a part of this release.
2. Brick multiplexing is being tested and fixed aggressively but we still have a
few crashes and memory leaks to fix.
## Bugs addressed

View File

@@ -11,6 +11,7 @@ of bugs that have been addressed is included further below.
## Major changes and features
### Switched to storhaug for ganesha and samba high availability
**Notes for users:**
High Availability (HA) support for NFS-Ganesha (NFS) and Samba (SMB)
@@ -26,6 +27,7 @@ There are many to choose from in most popular Linux distributions.
Choose the one that best fits your environment and use it.
### Added SELinux support for Gluster Volumes
**Notes for users:**
A new xlator has been introduced (`features/selinux`) to allow setting the
@@ -40,17 +42,20 @@ This feature is intended to be the base for implementing Labelled-NFS in
NFS-Ganesha and SELinux support for FUSE mounts in the Linux kernel.
**Limitations:**
- The Linux kernel does not support mounting of FUSE filesystems with SELinux
support, yet.
- NFS-Ganesha does not support Labelled-NFS, yet.
**Known Issues:**
- There has been limited testing, because other projects cannot consume the
functionality yet without being part of a release. So far, no problems have
been observed, but this might change when other projects start to seriously
use this.
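For illustration only, a minimal sketch of switching the new xlator on for a volume; the `features.selinux` option name is an assumption here and should be verified against your build:
```bash
# Assumed option name: features.selinux (verify with `gluster volume get`).
# Turns on the features/selinux xlator so SELinux labels set by clients
# are persisted on the bricks.
gluster volume set <volname> features.selinux on
# Confirm the effective value.
gluster volume get <volname> features.selinux
```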
### Several memory leaks are fixed in gfapi during graph switches
**Notes for users:**
Gluster API (or gfapi) has had a few memory leak issues arising specifically
@@ -59,9 +64,11 @@ addressed in this release, and more work towards ironing out the pending leaks
is in the works across the next few releases.
**Limitations:**
- There are still a few leaks to be addressed when graph switches occur
### get-state CLI is enhanced to provide client and brick capacity related information
**Notes for users:**
The get-state CLI output now optionally accommodates client related information
@@ -80,11 +87,13 @@ bricks as obtained from `gluster volume status <volname>|all detail` has also
been added to the get-state output.
**Limitations:**
- Information for non-local bricks and clients connected to non-local bricks
won't be available. This is a known limitation of the get-state command, since
get-state command doesn't provide information on non-local bricks.
won't be available. This is a known limitation of the get-state command, since
get-state command doesn't provide information on non-local bricks.
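For illustration, a minimal sketch of generating the enhanced state dump; the paths and file names below are placeholders:
```bash
# Dump the local glusterd state, including the client and brick capacity
# details described above, into a file of our choosing.
gluster get-state glusterd odir /var/run/gluster/ file glusterd-state.txt
# Inspect the generated dump.
cat /var/run/gluster/glusterd-state.txt
```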
### Ability to serve negative lookups from cache has been added
**Notes for users:**
Before creating / renaming any file, lookups (around 5-6 when using the SMB
@@ -99,10 +108,13 @@ Execute the following commands to enable negative-lookup cache:
# gluster volume set <volname> features.cache-invalidation-timeout 600
# gluster volume set <volname> nl-cache on
```
**Limitations**
- This feature is supported only for SMB access, for this release
### New xlator to help developers detecting resource leaks has been added
**Notes for users:**
This is intended as a developer feature, and hence there is no direct user
@@ -114,6 +126,7 @@ gfapi and any xlator in between the API and the sink xlator.
More details can be found in [this](http://lists.gluster.org/pipermail/gluster-devel/2017-April/052618.html) thread on the gluster-devel lists
### Feature for metadata-caching/small file performance is production ready
**Notes for users:**
Over the course of releases several fixes and enhancements have been made to
@@ -132,15 +145,18 @@ SMB access, by enabling metadata caching:
- Renaming files
To enable metadata caching execute the following commands:
```bash
# gluster volume set <volname> group metadata-cache
# gluster volume set <volname> network.inode-lru-limit <n>
```
\<n\> is set to 50000 by default. It should be increased if the number of
concurrently accessed files in the volume is very high. Increasing this number
increases the memory footprint of the brick processes.
### "Parallel Readdir" feature introduced in 3.10.0 is production ready
**Notes for users:**
This feature was introduced in 3.10 and was experimental in nature. Over the
@@ -150,6 +166,7 @@ stabilized and is ready for use in production environments.
For further details refer: [3.10.0 release notes](https://github.com/gluster/glusterfs/blob/release-3.10/doc/release-notes/3.10.0.md)
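For reference, the feature remains controlled by the same volume option introduced in 3.10; a minimal sketch of enabling it on a volume:
```bash
# Issue readdir requests to all DHT subvolumes in parallel.
gluster volume set <volname> performance.parallel-readdir on
# Verify the setting.
gluster volume get <volname> performance.parallel-readdir
```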
### Object versioning is enabled only if bitrot is enabled
**Notes for users:**
Object versioning was turned on by default on brick processes by the bitrot
@@ -161,6 +178,7 @@ To fix this, object versioning is disabled by default, and is only enabled as
a part of enabling the bitrot option.
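In practice this means versioning xattrs appear only once bitrot is switched on for a volume, for example:
```bash
# Enabling bitrot implicitly enables object versioning on the bricks.
gluster volume bitrot <volname> enable
# Disabling bitrot stops further object versioning.
gluster volume bitrot <volname> disable
```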
### Distribute layer provides more robust transactions during directory namespace operations
**Notes for users:**
Distribute layer in Gluster creates and maintains directories in all subvolumes
@@ -173,6 +191,7 @@ ensuring better consistency of the file system as a whole, when dealing with
racing operations, operating on the same directory object.
### gfapi extended readdirplus API has been added
**Notes for users:**
An extended readdirplus API `glfs_xreaddirplus` is added to get extra
@@ -184,10 +203,12 @@ involving directory listing.
The API syntax and usage can be found in [`glfs.h`](https://github.com/gluster/glusterfs/blob/v3.11.0rc1/api/src/glfs.h#L810) header file.
**Limitations:**
- This API currently has support to only return stat and handles (`glfs_object`)
for each dirent of the directory, but can be extended in the future.
for each dirent of the directory, but can be extended in the future.
### Improved adoption of standard refcounting functions across the code
**Notes for users:**
This change does not impact users; it is an internal code cleanup activity
@@ -195,10 +216,12 @@ that ensures that we ref count in a standard manner, thus avoiding unwanted
bugs due to different implementations of the same.
**Known Issues:**
- This standardization started with this release and is expected to continue
across releases.
across releases.
### Performance improvements to rebalance have been made
**Notes for users:**
Both crawling and migration improvements have been made in rebalance. The crawler
@@ -209,7 +232,7 @@ both the nodes divide the load among each other giving boost to migration
performance. There have also been some optimizations to avoid redundant
network operations (or RPC calls) in the process of migrating a file.
Further, file migration now avoids syncop framework and is managed entirely by
Further, file migration now avoids syncop framework and is managed entirely by
rebalance threads, giving a performance boost.
Also, there is a change to throttle settings in rebalance. Earlier the user could
@@ -220,21 +243,23 @@ of threads rebalance process will work with, thereby translating to the number
of files being migrated in parallel.
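A hedged sketch of the throttle usage described above; the `cluster.rebal-throttle` option name is assumed and the keyword values shown are the classic ones:
```bash
# Start a rebalance on the volume.
gluster volume rebalance <volname> start
# Adjust the migration throttle (lazy | normal | aggressive); per the notes
# above, newer releases translate this into the number of migration threads.
# Option name assumed: cluster.rebal-throttle.
gluster volume set <volname> cluster.rebal-throttle aggressive
# Watch progress and the estimated time to completion.
gluster volume rebalance <volname> status
```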
### Halo Replication feature in AFR has been introduced
**Notes for users:**
Halo Geo-replication is a feature which allows Gluster or NFS clients to write
locally to their region (as defined by a latency "halo" or threshold if you
like), and have their writes asynchronously propagate from their origin to the
rest of the cluster. Clients can also write synchronously to the cluster
rest of the cluster. Clients can also write synchronously to the cluster
simply by specifying a very large halo-latency (e.g. 10 seconds), which
will include all bricks.
To enable halo feature execute the following commands:
```bash
# gluster volume set <volname> cluster.halo-enabled yes
```
You may have to set the following options to change the defaults.
`cluster.halo-shd-latency`: The threshold below which self-heal daemons will
`cluster.halo-shd-latency`: The threshold below which self-heal daemons will
consider children (bricks) connected.
`cluster.halo-nfsd-latency`: The threshold below which NFS daemons will consider
@@ -249,12 +274,14 @@ If the number of children falls below this threshold the next
best (chosen by latency) shall be swapped in.
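Building on the enable command above, a sketch of tuning the latency thresholds named in these notes; the values below are purely illustrative:
```bash
# Thresholds below which self-heal daemons / NFS daemons consider a brick
# part of the local halo (values are illustrative only).
gluster volume set <volname> cluster.halo-shd-latency 5
gluster volume set <volname> cluster.halo-nfsd-latency 5
```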
### FALLOCATE support with EC
**Notes for users**
Support for FALLOCATE file operation on EC volume is added with this release.
EC volumes can now support basic FALLOCATE functionality.
### Self-heal window-size control option for EC
**Notes for users**
Support to control the maximum size of read/write operation carried out
@@ -262,14 +289,16 @@ during self-heal process has been added with this release. User has to tune
'disperse.self-heal-window-size' option on disperse volume to adjust the size.
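A minimal sketch of tuning this on a disperse volume; the value chosen below is only an example:
```bash
# Increase the amount of data processed per self-heal operation on an EC
# volume; larger values can speed healing at the cost of bigger I/O bursts.
gluster volume set <volname> disperse.self-heal-window-size 4
# Check the effective value.
gluster volume get <volname> disperse.self-heal-window-size
```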
## Major issues
1. Expanding a gluster volume that is sharded may cause file corruption
1. Expanding a gluster volume that is sharded may cause file corruption
- Sharded volumes are typically used for VM images, if such volumes are
expanded or possibly contracted (i.e add/remove bricks and rebalance) there
are reports of VM images getting corrupted.
expanded or possibly contracted (i.e add/remove bricks and rebalance) there
are reports of VM images getting corrupted.
- Status of this bug can be tracked here, #1426508
- Latest series of fixes for the issue (which are present in this release as
well) are not showing the previous corruption, and hence the fixes look
good, but this is maintained on the watch list nevertheless.
well) are not showing the previous corruption, and hence the fixes look
good, but this is maintained on the watch list nevertheless.
## Bugs addressed
@@ -289,7 +318,7 @@ Bugs addressed since release-3.10.0 are listed below.
- [#1328342](https://bugzilla.redhat.com/1328342): [tiering]: gluster v reset of watermark levels can allow low watermark level to have a higher value than hi watermark level
- [#1353952](https://bugzilla.redhat.com/1353952): [geo-rep]: rsync should not try to sync internal xattrs
- [#1356076](https://bugzilla.redhat.com/1356076): DHT doesn't evenly balance files on FreeBSD with ZFS
- [#1359599](https://bugzilla.redhat.com/1359599): BitRot :- bit-rot.signature and bit-rot.version xattr should not be set if bitrot is not enabled on volume
- [#1359599](https://bugzilla.redhat.com/1359599): BitRot :- bit-rot.signature and bit-rot.version xattr should not be set if bitrot is not enabled on volume
- [#1369393](https://bugzilla.redhat.com/1369393): dead loop in changelog_rpc_server_destroy
- [#1383893](https://bugzilla.redhat.com/1383893): glusterd restart is starting the offline shd daemon on other node in the cluster
- [#1384989](https://bugzilla.redhat.com/1384989): libglusterfs : update correct memory segments in glfs-message-id
@@ -304,7 +333,7 @@ Bugs addressed since release-3.10.0 are listed below.
- [#1399593](https://bugzilla.redhat.com/1399593): Obvious typo in cleanup code in rpc_clnt_notify
- [#1401571](https://bugzilla.redhat.com/1401571): bitrot quarantine dir misspelled
- [#1401812](https://bugzilla.redhat.com/1401812): RFE: Make readdirp parallel in dht
- [#1401877](https://bugzilla.redhat.com/1401877): [GANESHA] Symlinks from /etc/ganesha/ganesha.conf to shared_storage are created on the non-ganesha nodes in 8 node gluster having 4 node ganesha cluster
- [#1401877](https://bugzilla.redhat.com/1401877): [GANESHA] Symlinks from /etc/ganesha/ganesha.conf to shared_storage are created on the non-ganesha nodes in 8 node gluster having 4 node ganesha cluster
- [#1402254](https://bugzilla.redhat.com/1402254): compile warning unused variable
- [#1402661](https://bugzilla.redhat.com/1402661): Samba crash when mounting a distributed dispersed volume over CIFS
- [#1404424](https://bugzilla.redhat.com/1404424): The data-self-heal option is not honored in AFR
@@ -317,10 +346,10 @@ Bugs addressed since release-3.10.0 are listed below.
- [#1411334](https://bugzilla.redhat.com/1411334): Improve output of "gluster volume status detail"
- [#1412135](https://bugzilla.redhat.com/1412135): rename of the same file from multiple clients with caching enabled may result in duplicate files
- [#1412549](https://bugzilla.redhat.com/1412549): EXPECT_WITHIN is taking too much time even if the result matches with expected value
- [#1413526](https://bugzilla.redhat.com/1413526): glusterfind: After glusterfind pre command execution all temporary files and directories /usr/var/lib/misc/glusterfsd/glusterfind/<session>/<volume>/ should be removed
- [#1413526](https://bugzilla.redhat.com/1413526): glusterfind: After glusterfind pre command execution all temporary files and directories /usr/var/lib/misc/glusterfsd/glusterfind/<session>/<volume>/ should be removed
- [#1413971](https://bugzilla.redhat.com/1413971): Bonnie test suite failed with "Can't open file" error
- [#1414287](https://bugzilla.redhat.com/1414287): repeated operation failed warnings in gluster mount logs with disperse volume
- [#1414346](https://bugzilla.redhat.com/1414346): Quota: After upgrade from 3.7 to higher version , gluster quota list command shows "No quota configured on volume repvol"
- [#1414346](https://bugzilla.redhat.com/1414346): Quota: After upgrade from 3.7 to higher version , gluster quota list command shows "No quota configured on volume repvol"
- [#1414645](https://bugzilla.redhat.com/1414645): Typo in glusterfs code comments
- [#1414782](https://bugzilla.redhat.com/1414782): Add logs to selfheal code path to be helpful for debug
- [#1414902](https://bugzilla.redhat.com/1414902): packaging: python/python2(/python3) cleanup
@@ -341,7 +370,7 @@ Bugs addressed since release-3.10.0 are listed below.
- [#1418095](https://bugzilla.redhat.com/1418095): Portmap allocates way too much memory (256KB) on stack
- [#1418213](https://bugzilla.redhat.com/1418213): [Ganesha+SSL] : Bonnie++ hangs during rewrites.
- [#1418249](https://bugzilla.redhat.com/1418249): [RFE] Need to have group cli option to set all md-cache options using a single command
- [#1418259](https://bugzilla.redhat.com/1418259): Quota: After deleting directory from mount point on which quota was configured, quota list command output is blank
- [#1418259](https://bugzilla.redhat.com/1418259): Quota: After deleting directory from mount point on which quota was configured, quota list command output is blank
- [#1418417](https://bugzilla.redhat.com/1418417): packaging: remove glusterfs-ganesha subpackage
- [#1418629](https://bugzilla.redhat.com/1418629): glustershd process crashed on systemic setup
- [#1418900](https://bugzilla.redhat.com/1418900): [RFE] Include few more options in virt file
@@ -355,7 +384,7 @@ Bugs addressed since release-3.10.0 are listed below.
- [#1420619](https://bugzilla.redhat.com/1420619): Entry heal messages in glustershd.log while no entries shown in heal info
- [#1420623](https://bugzilla.redhat.com/1420623): [RHV-RHGS]: Application VM paused after add brick operation and VM didn't comeup after power cycle.
- [#1420637](https://bugzilla.redhat.com/1420637): Modified volume options not synced once offline nodes comes up.
- [#1420697](https://bugzilla.redhat.com/1420697): CLI option "--timeout" is accepting non numeric and negative values.
- [#1420697](https://bugzilla.redhat.com/1420697): CLI option "--timeout" is accepting non numeric and negative values.
- [#1420713](https://bugzilla.redhat.com/1420713): glusterd: storhaug, remove all vestiges ganesha
- [#1421023](https://bugzilla.redhat.com/1421023): Binary file gf_attach generated during build process should be git ignored
- [#1421590](https://bugzilla.redhat.com/1421590): Bricks take up new ports upon volume restart after add-brick op with brick mux enabled
@@ -364,9 +393,9 @@ Bugs addressed since release-3.10.0 are listed below.
- [#1421653](https://bugzilla.redhat.com/1421653): dht_setxattr returns EINVAL when a file is deleted during the FOP
- [#1421721](https://bugzilla.redhat.com/1421721): volume start command hangs
- [#1421724](https://bugzilla.redhat.com/1421724): glusterd log is flooded with stale disconnect rpc messages
- [#1421759](https://bugzilla.redhat.com/1421759): Gluster NFS server crashing in __mnt3svc_umountall
- [#1421759](https://bugzilla.redhat.com/1421759): Gluster NFS server crashing in \_\_mnt3svc_umountall
- [#1421937](https://bugzilla.redhat.com/1421937): [Replicate] "RPC call decoding failed" leading to IO hang & mount inaccessible
- [#1421938](https://bugzilla.redhat.com/1421938): systemic testing: seeing lot of ping time outs which would lead to splitbrains
- [#1421938](https://bugzilla.redhat.com/1421938): systemic testing: seeing lot of ping time outs which would lead to splitbrains
- [#1421955](https://bugzilla.redhat.com/1421955): Disperse: Fallback to pre-compiled code execution when dynamic code generation fails
- [#1422074](https://bugzilla.redhat.com/1422074): GlusterFS truncates nanoseconds to microseconds when setting mtime
- [#1422152](https://bugzilla.redhat.com/1422152): Bricks not coming up when ran with address sanitizer
@@ -387,7 +416,7 @@ Bugs addressed since release-3.10.0 are listed below.
- [#1424815](https://bugzilla.redhat.com/1424815): Fix erronous comparaison of flags resulting in UUID always sent
- [#1424894](https://bugzilla.redhat.com/1424894): Some switches don't have breaks causing unintended fall throughs.
- [#1424905](https://bugzilla.redhat.com/1424905): Coverity: Memory issues and dead code
- [#1425288](https://bugzilla.redhat.com/1425288): glusterd is not validating for allowed values while setting "cluster.brick-multiplex" property
- [#1425288](https://bugzilla.redhat.com/1425288): glusterd is not validating for allowed values while setting "cluster.brick-multiplex" property
- [#1425515](https://bugzilla.redhat.com/1425515): tests: quota-anon-fd-nfs.t needs to check if nfs mount is avialable before mounting
- [#1425623](https://bugzilla.redhat.com/1425623): Free all xlator specific resources when xlator->fini() gets called
- [#1425676](https://bugzilla.redhat.com/1425676): gfids are not populated in release/releasedir requests
@@ -415,8 +444,8 @@ Bugs addressed since release-3.10.0 are listed below.
- [#1428510](https://bugzilla.redhat.com/1428510): memory leak in features/locks xlator
- [#1429198](https://bugzilla.redhat.com/1429198): Restore atime/mtime for symlinks and other non-regular files.
- [#1429200](https://bugzilla.redhat.com/1429200): disallow increasing replica count for arbiter volumes
- [#1429330](https://bugzilla.redhat.com/1429330): [crawler]: auxiliary mount remains even after crawler finishes
- [#1429696](https://bugzilla.redhat.com/1429696): ldd libgfxdr.so.0.0.1: undefined symbol: __gf_free
- [#1429330](https://bugzilla.redhat.com/1429330): [crawler]: auxiliary mount remains even after crawler finishes
- [#1429696](https://bugzilla.redhat.com/1429696): ldd libgfxdr.so.0.0.1: undefined symbol: \_\_gf_free
- [#1430042](https://bugzilla.redhat.com/1430042): Transport endpoint not connected error seen on client when glusterd is restarted
- [#1430148](https://bugzilla.redhat.com/1430148): USS is broken when multiplexing is on
- [#1430608](https://bugzilla.redhat.com/1430608): [RFE] Pass slave volume in geo-rep as read-only
@@ -452,7 +481,7 @@ Bugs addressed since release-3.10.0 are listed below.
- [#1438370](https://bugzilla.redhat.com/1438370): rebalance: Allow admin to change thread count for rebalance
- [#1438411](https://bugzilla.redhat.com/1438411): [Ganesha + EC] : Input/Output Error while creating LOTS of smallfiles
- [#1438738](https://bugzilla.redhat.com/1438738): Inode ref leak on anonymous reads and writes
- [#1438772](https://bugzilla.redhat.com/1438772): build: clang/llvm has __builtin_ffs() and __builtin_popcount()
- [#1438772](https://bugzilla.redhat.com/1438772): build: clang/llvm has **builtin_ffs() and **builtin_popcount()
- [#1438810](https://bugzilla.redhat.com/1438810): File-level WORM allows ftruncate() on read-only files
- [#1438858](https://bugzilla.redhat.com/1438858): explicitly specify executor to be bash for tests
- [#1439527](https://bugzilla.redhat.com/1439527): [disperse] Don't count healing brick as healthy brick
@@ -491,7 +520,7 @@ Bugs addressed since release-3.10.0 are listed below.
- [#1449004](https://bugzilla.redhat.com/1449004): [Brick Multiplexing] : Bricks for multiple volumes going down after glusterd restart and not coming back up after volume start force
- [#1449191](https://bugzilla.redhat.com/1449191): Multiple bricks WILL crash after TCP port probing
- [#1449311](https://bugzilla.redhat.com/1449311): [whql][virtio-block+glusterfs]"Disk Stress" and "Disk Verification" job always failed on win7-32/win2012/win2k8R2 guest
- [#1449775](https://bugzilla.redhat.com/1449775): quota: limit-usage command failed with error " Failed to start aux mount"
- [#1449775](https://bugzilla.redhat.com/1449775): quota: limit-usage command failed with error " Failed to start aux mount"
- [#1449921](https://bugzilla.redhat.com/1449921): afr: include quorum type and count when dumping afr priv
- [#1449924](https://bugzilla.redhat.com/1449924): When either killing or restarting a brick with performance.stat-prefetch on, stat sometimes returns a bad st_size value.
- [#1449933](https://bugzilla.redhat.com/1449933): Brick Multiplexing :- resetting a brick bring down other bricks with same PID
@@ -499,25 +528,25 @@ Bugs addressed since release-3.10.0 are listed below.
- [#1450377](https://bugzilla.redhat.com/1450377): GNFS crashed while taking lock on a file from 2 different clients having same volume mounted from 2 different servers
- [#1450565](https://bugzilla.redhat.com/1450565): glfsheal: crashed(segfault) with disperse volume in RDMA
- [#1450729](https://bugzilla.redhat.com/1450729): Brick Multiplexing: seeing Input/Output Error for .trashcan
- [#1450933](https://bugzilla.redhat.com/1450933): [New] - Replacing an arbiter brick while I/O happens causes vm pause
- [#1450933](https://bugzilla.redhat.com/1450933): [New] - Replacing an arbiter brick while I/O happens causes vm pause
- [#1451033](https://bugzilla.redhat.com/1451033): contrib: timer-wheel 32-bit bug, use builtin_fls, license, etc
- [#1451573](https://bugzilla.redhat.com/1451573): AFR returns the node uuid of the same node for every file in the replica
- [#1451586](https://bugzilla.redhat.com/1451586): crash in dht_rmdir_do
- [#1451591](https://bugzilla.redhat.com/1451591): cli xml status of detach tier broken
- [#1451887](https://bugzilla.redhat.com/1451887): Add tests/basic/afr/gfid-mismatch-resolution-with-fav-child-policy.t to bad tests
- [#1452000](https://bugzilla.redhat.com/1452000): Spacing issue in fix-layout status output
- [#1453050](https://bugzilla.redhat.com/1453050): [DHt] : segfault in dht_selfheal_dir_setattr while running regressions
- [#1453050](https://bugzilla.redhat.com/1453050): [DHt] : segfault in dht_selfheal_dir_setattr while running regressions
- [#1453086](https://bugzilla.redhat.com/1453086): Brick Multiplexing: On reboot of a node Brick multiplexing feature lost on that node as multiple brick processes get spawned
- [#1453152](https://bugzilla.redhat.com/1453152): [Parallel Readdir] : Mounts fail when performance.parallel-readdir is set to "off"
- [#1454533](https://bugzilla.redhat.com/1454533): lock_revocation.t Marked as bad in 3.11 for CentOS as well
- [#1454569](https://bugzilla.redhat.com/1454569): [geo-rep + nl]: Multiple crashes observed on slave with "nlc_lookup_cbk"
- [#1454597](https://bugzilla.redhat.com/1454597): [Tiering]: High and low watermark values when set to the same level, is allowed
- [#1454597](https://bugzilla.redhat.com/1454597): [Tiering]: High and low watermark values when set to the same level, is allowed
- [#1454612](https://bugzilla.redhat.com/1454612): glusterd on a node crashed after running volume profile command
- [#1454686](https://bugzilla.redhat.com/1454686): Implement FALLOCATE FOP for EC
- [#1454853](https://bugzilla.redhat.com/1454853): Seeing error "Failed to get the total number of files. Unable to estimate time to complete rebalance" in rebalance logs
- [#1455177](https://bugzilla.redhat.com/1455177): ignore incorrect uuid validation in gd_validate_mgmt_hndsk_req
- [#1455423](https://bugzilla.redhat.com/1455423): dht: dht self heal fails with no hashed subvol error
- [#1455907](https://bugzilla.redhat.com/1455907): heal info shows the status of the bricks as "Transport endpoint is not connected" though bricks are up
- [#1455907](https://bugzilla.redhat.com/1455907): heal info shows the status of the bricks as "Transport endpoint is not connected" though bricks are up
- [#1456224](https://bugzilla.redhat.com/1456224): [gluster-block]:Need a volume group profile option for gluster-block volume to add necessary options to be added.
- [#1456225](https://bugzilla.redhat.com/1456225): gluster-block is not working as expected when shard is enabled
- [#1456331](https://bugzilla.redhat.com/1456331): [Bitrot]: Brick process crash observed while trying to recover a bad file in disperse volume

View File

@@ -7,6 +7,7 @@ GlusterFS 3.11 stable release.
## Major changes, features and limitations addressed in this release
### Improved disperse performance
Fix for bug [#1456259](https://bugzilla.redhat.com/1456259) changes the way
messages are read and processed from the socket layers on the Gluster client.
This has shown performance improvements on disperse volumes, and is applicable
@@ -14,6 +15,7 @@ to other volume types as well, where there maybe multiple applications or users
accessing the same mount point.
### Group settings for enabling negative lookup caching provided
Ability to serve negative lookups from cache was added in 3.11.0 and with
this release, a group volume set option is added for ease in enabling this
feature.
@@ -21,6 +23,7 @@ feature.
See [group-nl-cache](https://github.com/gluster/glusterfs/blob/release-3.11/extras/group-nl-cache) for more details.
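A minimal sketch of applying the new group profile to a volume:
```bash
# Apply the negative-lookup cache group setting shipped as
# extras/group-nl-cache; it sets the individual nl-cache options in one go.
gluster volume set <volname> group nl-cache
```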
### Gluster fuse now implements "-oauto_unmount" feature.
libfuse has an auto_unmount option which, if enabled, ensures that the file
system is unmounted at FUSE server termination by running a separate monitor
process that performs the unmount when that occurs. This release implements that
@@ -30,15 +33,17 @@ Note that "auto unmount" (robust or not) is a leaky abstraction, as the kernel
cannot guarantee that what is mounted at the given path is actually the
toplevel mount at the time of the umount(2) call, for multiple reasons,
among others, see:
- fuse-devel: ["fuse: feasible to distinguish between umount and abort?"](http://fuse.996288.n3.nabble.com/fuse-feasible-to-distinguish-between-umount-and-abort-tt14358.html)
- https://github.com/libfuse/libfuse/issues/122
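For illustration, mounting with the new option via the standard mount helper; host, volume and mount point below are placeholders:
```bash
# Mount a Gluster volume with auto_unmount so the mount is cleaned up
# if the glusterfs client process terminates.
mount -t glusterfs -o auto_unmount server1:/<volname> /mnt/glusterfs
```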
## Major issues
1. Expanding a gluster volume that is sharded may cause file corruption
1. Expanding a gluster volume that is sharded may cause file corruption
- Sharded volumes are typically used for VM images, if such volumes are
expanded or possibly contracted (i.e add/remove bricks and rebalance) there
are reports of VM images getting corrupted.
expanded or possibly contracted (i.e add/remove bricks and rebalance) there
are reports of VM images getting corrupted.
- Status of this bug can be tracked here, #1465123
## Bugs addressed
@@ -46,7 +51,7 @@ among others, see:
Bugs addressed since release-3.11.0 are listed below.
- [#1456259](https://bugzilla.redhat.com/1456259): limited throughput with disperse volume over small number of bricks
- [#1457058](https://bugzilla.redhat.com/1457058): glusterfs client crash on io-cache.so(__ioc_page_wakeup+0x44)
- [#1457058](https://bugzilla.redhat.com/1457058): glusterfs client crash on io-cache.so(\_\_ioc_page_wakeup+0x44)
- [#1457289](https://bugzilla.redhat.com/1457289): tierd listens to a port.
- [#1457339](https://bugzilla.redhat.com/1457339): DHT: slow readdirp performance
- [#1457616](https://bugzilla.redhat.com/1457616): "split-brain observed [Input/output error]" error messages in samba logs during parallel rm -rf
@@ -55,8 +60,8 @@ Bugs addressed since release-3.11.0 are listed below.
- [#1458664](https://bugzilla.redhat.com/1458664): [Geo-rep]: METADATA errors are seen even though everything is in sync
- [#1459090](https://bugzilla.redhat.com/1459090): all: spelling errors (debian package maintainer)
- [#1459095](https://bugzilla.redhat.com/1459095): extras/hook-scripts: non-portable shell syntax (debian package maintainer)
- [#1459392](https://bugzilla.redhat.com/1459392): possible repeatedly recursive healing of same file with background heal not happening when IO is going on
- [#1459759](https://bugzilla.redhat.com/1459759): Glusterd segmentation fault in ' _Unwind_Backtrace' while running peer probe
- [#1459392](https://bugzilla.redhat.com/1459392): possible repeatedly recursive healing of same file with background heal not happening when IO is going on
- [#1459759](https://bugzilla.redhat.com/1459759): Glusterd segmentation fault in ' \_Unwind_Backtrace' while running peer probe
- [#1460647](https://bugzilla.redhat.com/1460647): posix-acl: Whitelist virtual ACL xattrs
- [#1460894](https://bugzilla.redhat.com/1460894): Rebalance estimate time sometimes shows negative values
- [#1460895](https://bugzilla.redhat.com/1460895): Upcall missing invalidations

View File

@@ -10,13 +10,14 @@ There are no major features or changes made in this release.
## Major issues
1. Expanding a gluster volume that is sharded may cause file corruption
1. Expanding a gluster volume that is sharded may cause file corruption
- Sharded volumes are typically used for VM images, if such volumes are
expanded or possibly contracted (i.e add/remove bricks and rebalance) there
are reports of VM images getting corrupted.
expanded or possibly contracted (i.e add/remove bricks and rebalance) there
are reports of VM images getting corrupted.
- The last known cause for corruption (Bug #1465123) has a fix with this
release. As further testing is still in progress, the issue is retained as
a major issue.
release. As further testing is still in progress, the issue is retained as
a major issue.
- Status of this bug can be tracked here, #1465123
## Bugs addressed
@@ -26,8 +27,8 @@ Bugs addressed since release-3.11.0 are listed below.
- [#1463512](https://bugzilla.redhat.com/1463512): USS: stale snap entries are seen when activation/deactivation performed during one of the glusterd's unavailability
- [#1463513](https://bugzilla.redhat.com/1463513): [geo-rep]: extended attributes are not synced if the entry and extended attributes are done within changelog roleover/or entry sync
- [#1463517](https://bugzilla.redhat.com/1463517): Brick Multiplexing:dmesg shows request_sock_TCP: Possible SYN flooding on port 49152 and memory related backtraces
- [#1463528](https://bugzilla.redhat.com/1463528): [Perf] 35% drop in small file creates on smbv3 on *2
- [#1463626](https://bugzilla.redhat.com/1463626): [Ganesha]Bricks got crashed while running posix compliance test suit on V4 mount
- [#1463528](https://bugzilla.redhat.com/1463528): [Perf] 35% drop in small file creates on smbv3 on \*2
- [#1463626](https://bugzilla.redhat.com/1463626): [Ganesha]Bricks got crashed while running posix compliance test suit on V4 mount
- [#1464316](https://bugzilla.redhat.com/1464316): DHT: Pass errno as an argument to gf_msg
- [#1465123](https://bugzilla.redhat.com/1465123): Fd based fops fail with EBADF on file migration
- [#1465854](https://bugzilla.redhat.com/1465854): Regression: Heal info takes longer time when a brick is down
@@ -36,7 +37,7 @@ Bugs addressed since release-3.11.0 are listed below.
- [#1467268](https://bugzilla.redhat.com/1467268): Heal info shows incorrect status
- [#1468118](https://bugzilla.redhat.com/1468118): disperse seek does not correctly handle the end of file
- [#1468200](https://bugzilla.redhat.com/1468200): [Geo-rep]: entry failed to sync to slave with ENOENT errror
- [#1468457](https://bugzilla.redhat.com/1468457): selfheal deamon cpu consumption not reducing when IOs are going on and all redundant bricks are brought down one after another
- [#1468457](https://bugzilla.redhat.com/1468457): selfheal deamon cpu consumption not reducing when IOs are going on and all redundant bricks are brought down one after another
- [#1469459](https://bugzilla.redhat.com/1469459): Rebalance hangs on remove-brick if the target volume changes
- [#1470938](https://bugzilla.redhat.com/1470938): Regression: non-disruptive(in-service) upgrade on EC volume fails
- [#1471025](https://bugzilla.redhat.com/1471025): glusterfs process leaking memory when error occurs

View File

@@ -14,13 +14,14 @@ There are no major features or changes made in this release.
## Major issues
1. Expanding a gluster volume that is sharded may cause file corruption
1. Expanding a gluster volume that is sharded may cause file corruption
- Sharded volumes are typically used for VM images, if such volumes are
expanded or possibly contracted (i.e add/remove bricks and rebalance) there
are reports of VM images getting corrupted.
expanded or possibly contracted (i.e add/remove bricks and rebalance) there
are reports of VM images getting corrupted.
- The last known cause for corruption (Bug #1465123) has a fix with the 3.11.2
release. As further testing is still in progress, the issue is retained as
a major issue.
release. As further testing is still in progress, the issue is retained as
a major issue.
- Status of this bug can be tracked here, #1465123
## Bugs addressed

View File

@@ -20,6 +20,7 @@ captures the list of features that were introduced with 3.11.
## Major changes and features
### Ability to mount sub-directories using the Gluster FUSE protocol
**Notes for users:**
With this release, it is possible define sub-directories to be mounted by
@@ -31,15 +32,19 @@ client. This feature helps sharing a volume among the multiple consumers along
with enabling restricting access to the sub-directory of choice.
Option controlling sub-directory allow/deny rules can be set as follows:
```
# gluster volume set <volname> auth.allow "/subdir1(192.168.1.*),/(192.168.10.*),/subdir2(192.168.8.*)"
```
How to mount from the client:
```
# mount -t glusterfs <hostname>:/<volname>/<subdir> /<mount_point>
```
Or,
```
# mount -t glusterfs <hostname>:/<volname> -osubdir_mount=<subdir> /<mount_point>
```
@@ -47,14 +52,15 @@ Or,
**Limitations:**
- There are no throttling or QoS support for this feature. The feature will
just provide the namespace isolation for the different clients.
just provide the namespace isolation for the different clients.
**Known Issues:**
- Once we cross more than 1000s of subdirs in 'auth.allow' option, the
performance of reconnect / authentication would be impacted.
performance of reconnect / authentication would be impacted.
### GFID to path conversion is enabled by default
**Notes for users:**
Prior to this feature, only when quota was enabled, did the on disk data have
@@ -80,18 +86,20 @@ None
None
### Various enhancements have been made to the output of get-state CLI command
**Notes for users:**
The command `#gluster get-state` has been enhanced to output more information
as below,
- Arbiter bricks are marked more clearly in a volume that has the feature
enabled
enabled
- Ability to get all volume options (both set and defaults) in the get-state
output
output
- Rebalance time estimates, for ongoing rebalance, is captured in the get-state
output
output
- If geo-replication is configured, then get-state now captures the session
details of the same
details of the same
**Limitations:**
@@ -102,6 +110,7 @@ None
None
### Provided an option to set a limit on number of bricks multiplexed in a processes
**Notes for users:**
This release includes a global option to be switched on only if brick
@@ -111,19 +120,22 @@ node. If the limit set by this option is insufficient for a single process,
more processes are spawned for the subsequent bricks.
Usage:
```
#gluster volume set all cluster.max-bricks-per-process <value>
```
### Provided an option to use localtime timestamps in log entries
**Limitations:**
Gluster defaults to UTC timestamps. glusterd, glusterfsd, and server-side
glusterfs daemons will use UTC until one of,
1. command line option is processed,
2. gluster config (/var/lib/glusterd/options) is loaded,
3. admin manually sets localtime-logging (cluster.localtime-logging, e.g.
`#gluster volume set all cluster.localtime-logging enable`).
`#gluster volume set all cluster.localtime-logging enable`).
There is no mount option to make the FUSE client enable localtime logging.
@@ -144,6 +156,7 @@ and also enhancing the ability for file placement in the distribute translator
when used with the option `min-free-disk`.
### Provided a means to resolve GFID split-brain using the gluster CLI
**Notes for users:**
The existing CLI commands to heal files under split-brain did not handle cases
@@ -152,6 +165,7 @@ the same CLI commands can now address GFID split-brain situations based on the
choices provided.
The CLI options that are enhanced to help with this situation are,
```
volume heal <VOLNAME> split-brain {bigger-file <FILE> |
latest-mtime <FILE> |
@@ -167,14 +181,16 @@ None
None
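As an illustration of the enhanced heal commands above, resolving a GFID split-brain for a single file might look like the following; volume and file names are placeholders:
```
# List files currently in split-brain.
gluster volume heal <VOLNAME> info split-brain
# Resolve the reported file by keeping the copy with the latest mtime.
gluster volume heal <VOLNAME> split-brain latest-mtime <FILE>
```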
### Developer related: Added a 'site.h' for more vendor/company specific defaults
**Notes for developers:**
**NOTE**: Also relevant for users building from sources and needing different
defaults for some options
Most people consume Gluster in one of two ways:
* From packages provided by their OS/distribution vendor
* By building themselves from source
- From packages provided by their OS/distribution vendor
- By building themselves from source
For the first group it doesn't matter whether configuration is done in a
configure script, via command-line options to that configure script, or in a
@@ -198,6 +214,7 @@ file. Further guidelines for how to determine whether an option should go in
configure.ac or site.h are explained within site.h itself.
### Developer related: Added xxhash library to libglusterfs for required use
**Notes for developers:**
Function gf_xxh64_wrapper has been added as a wrapper into libglusterfs for
@@ -206,6 +223,7 @@ consumption by interested developers.
Reference to code can be found [here](https://github.com/gluster/glusterfs/blob/v3.12.0alpha1/libglusterfs/src/common-utils.h#L835)
### Developer related: glfs_ipc API in libgfapi is removed as a public interface
**Notes for users:**
glfs_ipc API was maintained as a public API in the GFAPI libraries. This has
@@ -219,14 +237,15 @@ this change.
API
## Major issues
1. Expanding a gluster volume that is sharded may cause file corruption
- Sharded volumes are typically used for VM images; if such volumes are
  expanded or possibly contracted (i.e. add/remove bricks and rebalance) there
  are reports of VM images getting corrupted.
- The last known cause for corruption (Bug #1465123) has a fix with this
  release. As further testing is still in progress, the issue is retained as
  a major issue.
- Status of this bug can be tracked here, #1465123
## Bugs addressed
@@ -243,13 +262,13 @@ Bugs addressed since release-3.11.0 are listed below.
- [#1400924](https://bugzilla.redhat.com/1400924): [RFE] Rsync flags for performance improvements
- [#1402406](https://bugzilla.redhat.com/1402406): Client stale file handle error in dht-linkfile.c under SPEC SFS 2014 VDA workload
- [#1414242](https://bugzilla.redhat.com/1414242): [whql][virtio-block+glusterfs]"Disk Stress" and "Disk Verification" job always failed on win7-32/win2012/win2k8R2 guest
- [#1421938](https://bugzilla.redhat.com/1421938): systemic testing: seeing lot of ping time outs which would lead to splitbrains
- [#1424817](https://bugzilla.redhat.com/1424817): Fix wrong operators, found by coverty
- [#1428061](https://bugzilla.redhat.com/1428061): Halo Replication feature for AFR translator
- [#1428673](https://bugzilla.redhat.com/1428673): possible repeatedly recursive healing of same file with background heal not happening when IO is going on
- [#1430608](https://bugzilla.redhat.com/1430608): [RFE] Pass slave volume in geo-rep as read-only
- [#1431908](https://bugzilla.redhat.com/1431908): Enabling parallel-readdir causes dht linkto files to be visible on the mount,
- [#1433906](https://bugzilla.redhat.com/1433906): quota: limit-usage command failed with error " Failed to start aux mount"
- [#1437748](https://bugzilla.redhat.com/1437748): Spacing issue in fix-layout status output
- [#1438966](https://bugzilla.redhat.com/1438966): Multiple bricks WILL crash after TCP port probing
- [#1439068](https://bugzilla.redhat.com/1439068): Segmentation fault when creating a qcow2 with qemu-img
@@ -270,7 +289,7 @@ Bugs addressed since release-3.11.0 are listed below.
- [#1447826](https://bugzilla.redhat.com/1447826): potential endless loop in function glusterfs_graph_validate_options
- [#1447828](https://bugzilla.redhat.com/1447828): Should use dict_set_uint64 to set fd->pid when dump fd's info to dict
- [#1447953](https://bugzilla.redhat.com/1447953): Remove inadvertently merged IPv6 code
- [#1447960](https://bugzilla.redhat.com/1447960): [Tiering]: High and low watermark values when set to the same level, is allowed
- [#1447966](https://bugzilla.redhat.com/1447966): 'make cscope' fails on a clean tree due to missing generated XDR files
- [#1448150](https://bugzilla.redhat.com/1448150): USS: stale snap entries are seen when activation/deactivation performed during one of the glusterd's unavailability
- [#1448265](https://bugzilla.redhat.com/1448265): use common function iov_length to instead of duplicate code
@@ -286,7 +305,7 @@ Bugs addressed since release-3.11.0 are listed below.
- [#1449329](https://bugzilla.redhat.com/1449329): When either killing or restarting a brick with performance.stat-prefetch on, stat sometimes returns a bad st_size value.
- [#1449348](https://bugzilla.redhat.com/1449348): disperse seek does not correctly handle the end of file
- [#1449495](https://bugzilla.redhat.com/1449495): glfsheal: crashed(segfault) with disperse volume in RDMA
- [#1449610](https://bugzilla.redhat.com/1449610): [New] - Replacing an arbiter brick while I/O happens causes vm pause
- [#1450010](https://bugzilla.redhat.com/1450010): [gluster-block]:Need a volume group profile option for gluster-block volume to add necessary options to be added.
- [#1450559](https://bugzilla.redhat.com/1450559): Error 0-socket.management: socket_poller XX.XX.XX.XX:YYY failed (Input/output error) during any volume operation
- [#1450630](https://bugzilla.redhat.com/1450630): [brick multiplexing] detach a brick if posix health check thread complaints about underlying brick
@@ -299,7 +318,7 @@ Bugs addressed since release-3.11.0 are listed below.
- [#1451724](https://bugzilla.redhat.com/1451724): glusterfind pre crashes with "UnicodeDecodeError: 'utf8' codec can't decode" error when the `--no-encode` is used
- [#1452006](https://bugzilla.redhat.com/1452006): tierd listens to a port.
- [#1452084](https://bugzilla.redhat.com/1452084): [Ganesha] : Stale linkto files after unsuccessfuly hardlinks
- [#1452102](https://bugzilla.redhat.com/1452102): [DHt] : segfault in dht_selfheal_dir_setattr while running regressions
- [#1452378](https://bugzilla.redhat.com/1452378): Cleanup unnecessary logs in fix_quorum_options
- [#1452527](https://bugzilla.redhat.com/1452527): Shared volume doesn't get mounted on few nodes after rebooting all nodes in cluster.
- [#1452956](https://bugzilla.redhat.com/1452956): glusterd on a node crashed after running volume profile command
@@ -307,9 +326,9 @@ Bugs addressed since release-3.11.0 are listed below.
- [#1453977](https://bugzilla.redhat.com/1453977): Brick Multiplexing: Deleting brick directories of the base volume must gracefully detach from glusterfsd without impacting other volumes IO(currently seeing transport end point error)
- [#1454317](https://bugzilla.redhat.com/1454317): [Bitrot]: Brick process crash observed while trying to recover a bad file in disperse volume
- [#1454375](https://bugzilla.redhat.com/1454375): ignore incorrect uuid validation in gd_validate_mgmt_hndsk_req
- [#1454418](https://bugzilla.redhat.com/1454418): Glusterd segmentation fault in ' _Unwind_Backtrace' while running peer probe
- [#1454701](https://bugzilla.redhat.com/1454701): DHT: Pass errno as an argument to gf_msg
- [#1454865](https://bugzilla.redhat.com/1454865): [Brick Multiplexing] heal info shows the status of the bricks as "Transport endpoint is not connected" though bricks are up
- [#1454872](https://bugzilla.redhat.com/1454872): [Geo-rep]: Make changelog batch size configurable
- [#1455049](https://bugzilla.redhat.com/1455049): [GNFS+EC] Unable to release the lock when the other client tries to acquire the lock on the same file
- [#1455104](https://bugzilla.redhat.com/1455104): dht: dht self heal fails with no hashed subvol error
@@ -317,8 +336,8 @@ Bugs addressed since release-3.11.0 are listed below.
- [#1455301](https://bugzilla.redhat.com/1455301): gluster-block is not working as expected when shard is enabled
- [#1455559](https://bugzilla.redhat.com/1455559): [Geo-rep]: METADATA errors are seen even though everything is in sync
- [#1455831](https://bugzilla.redhat.com/1455831): libglusterfs: updates old comment for 'arena_size'
- [#1456361](https://bugzilla.redhat.com/1456361): DHT : for many operation directory/file path is '(null)' in brick log
- [#1456385](https://bugzilla.redhat.com/1456385): glusterfs client crash on io-cache.so(__ioc_page_wakeup+0x44)
- [#1456405](https://bugzilla.redhat.com/1456405): Brick Multiplexing:dmesg shows request_sock_TCP: Possible SYN flooding on port 49152 and memory related backtraces
- [#1456582](https://bugzilla.redhat.com/1456582): "split-brain observed [Input/output error]" error messages in samba logs during parallel rm -rf
- [#1456653](https://bugzilla.redhat.com/1456653): nlc_lookup_cbk floods logs
@@ -333,7 +352,7 @@ Bugs addressed since release-3.11.0 are listed below.
- [#1458197](https://bugzilla.redhat.com/1458197): io-stats usability/performance statistics enhancements
- [#1458539](https://bugzilla.redhat.com/1458539): [Negative Lookup]: negative lookup features doesn't seem to work on restart of volume
- [#1458582](https://bugzilla.redhat.com/1458582): add all as volume option in gluster volume get usage
- [#1458768](https://bugzilla.redhat.com/1458768): [Perf] 35% drop in small file creates on smbv3 on *2
- [#1459402](https://bugzilla.redhat.com/1459402): brick process crashes while running bug-1432542-mpx-restart-crash.t in a loop
- [#1459530](https://bugzilla.redhat.com/1459530): [RFE] Need a way to resolve gfid split brains
- [#1459620](https://bugzilla.redhat.com/1459620): [geo-rep]: Worker crashed with TypeError: expected string or buffer
@@ -349,17 +368,17 @@ Bugs addressed since release-3.11.0 are listed below.
- [#1461655](https://bugzilla.redhat.com/1461655): glusterd crashes when statedump is taken
- [#1461792](https://bugzilla.redhat.com/1461792): lk fop succeeds even when lock is not acquired on at least quorum number of bricks
- [#1461845](https://bugzilla.redhat.com/1461845): [Bitrot]: Inconsistency seen with 'scrub ondemand' - fails to trigger scrub
- [#1462200](https://bugzilla.redhat.com/1462200): glusterd status showing failed when it's stopped in RHEL7
- [#1462241](https://bugzilla.redhat.com/1462241): glusterfind: syntax error due to uninitialized variable 'end'
- [#1462790](https://bugzilla.redhat.com/1462790): with AFR now making both nodes to return UUID for a file will result in georep consuming more resources
- [#1463178](https://bugzilla.redhat.com/1463178): [Ganesha]Bricks got crashed while running posix compliance test suit on V4 mount
- [#1463365](https://bugzilla.redhat.com/1463365): Changes for Maintainers 2.0
- [#1463648](https://bugzilla.redhat.com/1463648): Use GF_XATTR_LIST_NODE_UUIDS_KEY to figure out local subvols
- [#1464072](https://bugzilla.redhat.com/1464072): cns-brick-multiplexing: brick process fails to restart after gluster pod failure
- [#1464091](https://bugzilla.redhat.com/1464091): Regression: Heal info takes longer time when a brick is down
- [#1464110](https://bugzilla.redhat.com/1464110): [Scale] : Rebalance ETA (towards the end) may be inaccurate,even on a moderately large data set.
- [#1464327](https://bugzilla.redhat.com/1464327): glusterfs client crashes when reading large directory
- [#1464359](https://bugzilla.redhat.com/1464359): selfheal deamon cpu consumption not reducing when IOs are going on and all redundant bricks are brought down one after another
- [#1465024](https://bugzilla.redhat.com/1465024): glusterfind: DELETE path needs to be unquoted before further processing
- [#1465075](https://bugzilla.redhat.com/1465075): Fd based fops fail with EBADF on file migration
- [#1465214](https://bugzilla.redhat.com/1465214): build failed with GF_DISABLE_MEMPOOL
@@ -424,7 +443,7 @@ Bugs addressed since release-3.11.0 are listed below.
- [#1479717](https://bugzilla.redhat.com/1479717): Running sysbench on vm disk from plain distribute gluster volume causes disk corruption
- [#1480448](https://bugzilla.redhat.com/1480448): More useful error - replace 'not optimal'
- [#1480459](https://bugzilla.redhat.com/1480459): Gluster puts PID files in wrong place
- [#1481931](https://bugzilla.redhat.com/1481931): [Scale] : I/O errors on multiple gNFS mounts with "Stale file handle" during rebalance of an erasure coded volume.
- [#1482804](https://bugzilla.redhat.com/1482804): Negative Test: glusterd crashes for some of the volume options if set at cluster level
- [#1482835](https://bugzilla.redhat.com/1482835): glusterd fails to start
- [#1483402](https://bugzilla.redhat.com/1483402): DHT: readdirp fails to read some directories.
@@ -432,6 +451,6 @@ Bugs addressed since release-3.11.0 are listed below.
- [#1484440](https://bugzilla.redhat.com/1484440): packaging: /run and /var/run; prefer /run
- [#1484885](https://bugzilla.redhat.com/1484885): [rpc]: EPOLLERR - disconnecting now messages every 3 secs after completing rebalance
- [#1486107](https://bugzilla.redhat.com/1486107): /var/lib/glusterd/peers File had a blank line, Stopped Glusterd from starting
- [#1486110](https://bugzilla.redhat.com/1486110): [quorum]: Replace brick is happened when Quorum not met.
- [#1486120](https://bugzilla.redhat.com/1486120): symlinks trigger faulty geo-replication state (rsnapshot usecase)
- [#1486122](https://bugzilla.redhat.com/1486122): gluster-block profile needs to have strict-o-direct


@@ -1,20 +1,23 @@
# Release notes for Gluster 3.12.1
This is a bugfix release. The [Release Notes for 3.12.0](3.12.0.md),
[3.12.1](3.12.1.md) contain a listing of all the new features that
were added and bugs fixed in the GlusterFS 3.12 stable release.
## Major changes, features and limitations addressed in this release
No Major changes
## Major issues
1. Expanding a gluster volume that is sharded may cause file corruption
- Sharded volumes are typically used for VM images; if such volumes are
  expanded or possibly contracted (i.e. add/remove bricks and rebalance) there
  are reports of VM images getting corrupted.
- The last known cause for corruption (Bug #1465123) has a fix with this
  release. As further testing is still in progress, the issue is retained as
  a major issue.
- Status of this bug can be tracked here, #1465123
## Bugs addressed
@@ -24,7 +27,7 @@ This is a bugfix release. The [Release Notes for 3.12.0](3.12.0.md),
- [#1486538](https://bugzilla.redhat.com/1486538): [geo-rep+qr]: Crashes observed at slave from qr_lookup_sbk during rename/hardlink/rebalance
- [#1486557](https://bugzilla.redhat.com/1486557): Log entry of files skipped/failed during rebalance operation
- [#1487033](https://bugzilla.redhat.com/1487033): rpc: client_t and related objects leaked due to incorrect ref counts
- [#1487319](https://bugzilla.redhat.com/1487319): afr: check op_ret value in __afr_selfheal_name_impunge
- [#1488119](https://bugzilla.redhat.com/1488119): scripts: mount.glusterfs contains non-portable bashisms
- [#1488168](https://bugzilla.redhat.com/1488168): Launch metadata heal in discover code path.
- [#1488387](https://bugzilla.redhat.com/1488387): gluster-blockd process crashed and core generated


@@ -16,11 +16,12 @@ features that were added and bugs fixed in the GlusterFS 3.12 stable release.
Bugs addressed since release-3.12.9 are listed below.
- [#1570475](https://bugzilla.redhat.com/1570475): Rebalance on few nodes doesn't seem to complete - stuck at FUTEX_WAIT
- [#1576816](https://bugzilla.redhat.com/1576816): GlusterFS can be improved
- [#1577164](https://bugzilla.redhat.com/1577164): gfapi: broken symbol versions
- [#1577845](https://bugzilla.redhat.com/1577845): Geo-rep: faulty session due to OSError: [Errno 95] Operation not supported
- [#1577862](https://bugzilla.redhat.com/1577862): [geo-rep]: Upgrade fails, session in FAULTY state
- [#1577868](https://bugzilla.redhat.com/1577868): Glusterd crashed on a few (master) nodes
- [#1577871](https://bugzilla.redhat.com/1577871): [geo-rep]: Geo-rep scheduler fails
- [#1580519](https://bugzilla.redhat.com/1580519): the regression test "tests/bugs/posix/bug-990028.t" fails


@@ -8,6 +8,7 @@ GlusterFS 3.12 stable release.
## Major changes, features and limitations addressed in this release
This release contains a fix for a security vulnerability in Gluster as follows,
- http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2018-10841
- https://nvd.nist.gov/vuln/detail/CVE-2018-10841


@@ -16,6 +16,7 @@ all the new features that were added and bugs fixed in the GlusterFS 3.12 stable
## Bugs addressed
Bugs addressed since release-3.12.12 are listed below
- [#1579673](https://bugzilla.redhat.com/1579673): Remove EIO from the dht_inode_missing macro
- [#1595528](https://bugzilla.redhat.com/1595528): rmdir is leaking softlinks to directories in .glusterfs
- [#1597120](https://bugzilla.redhat.com/1597120): Add quorum checks in post-op


@@ -16,9 +16,9 @@ contain a listing of all the new features that were added and bugs fixed in the
## Bugs addressed
Bugs addressed in release-3.12.13 are listed below
- [#1599788](https://bugzilla.redhat.com/1599788): _is_prefix should return false for 0-length strings
- [#1603093](https://bugzilla.redhat.com/1603093): directories are invisible on client side
- [#1613512](https://bugzilla.redhat.com/1613512): Backport glusterfs-client memory leak fix to 3.12.x
- [#1618838](https://bugzilla.redhat.com/1618838): gluster bash completion leaks TOP=0 into the environment
- [#1618348](https://bugzilla.redhat.com/1618348): [Ganesha] Ganesha crashed in mdcache_alloc_and_check_handle while running bonnie and untars with parallel lookups


@@ -7,7 +7,9 @@ and [3.12.13](3.12.13.md) contain a listing of all the new features that were ad
the GlusterFS 3.12 stable release.
## Major changes, features and limitations addressed in this release
1. This release contains fixes for the following security vulnerabilities,
- https://nvd.nist.gov/vuln/detail/CVE-2018-10904
- https://nvd.nist.gov/vuln/detail/CVE-2018-10907
- https://nvd.nist.gov/vuln/detail/CVE-2018-10911
@@ -21,10 +23,11 @@ the GlusterFS 3.12 stable release.
- https://nvd.nist.gov/vuln/detail/CVE-2018-10930
2. To resolve the security vulnerabilities, the following limitations were made in GlusterFS
   - open, read, write on special files like char and block are no longer permitted
   - io-stat xlator can dump stat info only to the /var/run/gluster directory
3. Addressed an issue that affected copying a file over SSL/TLS in a volume
Installing the updated packages and restarting gluster services on gluster
brick hosts will fix the security issues.
@@ -38,7 +41,7 @@ brick hosts, will fix the security issues.
Bugs addressed since release-3.12.14 are listed below.
- [#1622405](https://bugzilla.redhat.com/1622405): Problem with SSL/TLS encryption on Gluster 4.0 & 4.1
- [#1625286](https://bugzilla.redhat.com/1625286): Information Exposure in posix_get_file_contents function in posix-helpers.c
- [#1625648](https://bugzilla.redhat.com/1625648): I/O to arbitrary devices on storage server
- [#1625654](https://bugzilla.redhat.com/1625654): Stack-based buffer overflow in server-rpc-fops.c allows remote attackers to execute arbitrary code
- [#1625656](https://bugzilla.redhat.com/1625656): Improper deserialization in dict.c:dict_unserialize() can allow attackers to read arbitrary memory


@@ -5,6 +5,7 @@ This is a bugfix release. The release notes for [3.12.0](3.12.0.md), [3.12.1](3.
fixed in the GlusterFS 3.12 stable release.
## Major changes, features and limitations addressed in this release
1. In a pure distribute volume there is no source to heal the replaced brick
from, and hence this would cause a loss of data that was present in the replaced brick.
The CLI has been enhanced to prevent a user from inadvertently using replace brick
@@ -12,31 +13,32 @@ fixed in the GlusterFS 3.12 stable release.
an existing brick in a pure distribute volume.
## Major issues
1. Expanding a gluster volume that is sharded may cause file corruption
- Sharded volumes are typically used for VM images; if such volumes are
  expanded or possibly contracted (i.e. add/remove bricks and rebalance) there
  are reports of VM images getting corrupted.
- The last known cause for corruption #1465123 is still pending, and not yet
  part of this release.
2. Gluster volume restarts fail if the sub directory export feature is in use.
Status of this issue can be tracked here, #1501315
3. Mounting a gluster snapshot will fail when attempting a FUSE based mount of
   the snapshot. So for current users, it is recommended to only access snapshots
   via the ".snaps" directory on a mounted gluster volume.
   Status of this issue can be tracked here, #1501378
## Bugs addressed
A total of 31 patches have been merged, addressing 28 bugs
- [#1490493](https://bugzilla.redhat.com/1490493): Sub-directory mount details are incorrect in /proc/mounts
- [#1491178](https://bugzilla.redhat.com/1491178): GlusterD returns a bad memory pointer in glusterd_get_args_from_dict()
- [#1491292](https://bugzilla.redhat.com/1491292): Provide brick list as part of VOLUME_CREATE event.
- [#1491690](https://bugzilla.redhat.com/1491690): rpc: TLSv1_2_method() is deprecated in OpenSSL-1.1
- [#1492026](https://bugzilla.redhat.com/1492026): set the shard-block-size to 64MB in virt profile
- [#1492061](https://bugzilla.redhat.com/1492061): CLIENT_CONNECT event not being raised
- [#1492066](https://bugzilla.redhat.com/1492066): AFR_SUBVOL_UP and AFR_SUBVOLS_DOWN events not working
- [#1493975](https://bugzilla.redhat.com/1493975): disallow replace brick operation on plain distribute volume

View File

@@ -5,22 +5,22 @@ This is a bugfix release. The release notes for [3.12.0](3.12.0.md), [3.12.1](3.
were added and bugs fixed in the GlusterFS 3.12 stable release.
## Major changes, features and limitations addressed in this release
1. The two regressions related to subdir mount got fixed
   - gluster volume restart failure (#1465123)
   - mounting gluster snapshot via fuse (#1501378)
2. Improvements for the "help" command within the gluster cli (#1509786)
3. Introduction of a new API glfs_fd_set_lkowner() to set the lock owner
## Major issues
1. Expanding a gluster volume that is sharded may cause file corruption
- Sharded volumes are typically used for VM images; if such volumes are
  expanded or possibly contracted (i.e. add/remove bricks and rebalance) there
  are reports of VM images getting corrupted.
- The last known cause for corruption #1465123 is still pending, and not yet
  part of this release.
## Bugs addressed


@@ -5,19 +5,21 @@ This is a bugfix release. The release notes for [3.12.0](3.12.0.md), [3.12.1](3.
the new features that were added and bugs fixed in the GlusterFS 3.12 stable release.
## Major issues
1. Expanding a gluster volume that is sharded may cause file corruption
- Sharded volumes are typically used for VM images; if such volumes are
  expanded or possibly contracted (i.e. add/remove bricks and rebalance) there
  are reports of VM images getting corrupted.
- The last known cause for corruption #1465123 is still pending, and not yet
  part of this release.
## Bugs addressed
A total of 13 patches have been merged, addressing 12 bugs
- [#1478411](https://bugzilla.redhat.com/1478411): Directory listings on fuse mount are very slow due to small number of getdents() entries
- [#1511782](https://bugzilla.redhat.com/1511782): In Replica volume 2*2 when quorum is set, after glusterd restart nfs server is coming up instead of self-heal daemon
- [#1512432](https://bugzilla.redhat.com/1512432): Test bug-1483058-replace-brick-quorum-validation.t fails inconsistently
- [#1513258](https://bugzilla.redhat.com/1513258): NetBSD port
- [#1514380](https://bugzilla.redhat.com/1514380): default timeout of 5min not honored for analyzing split-brain files post setfattr replica.split-brain-heal-finalize

View File

@@ -4,16 +4,19 @@ This is a bugfix release. The release notes for [3.12.0](3.12.0.md), [3.12.1](3.
[3.12.2](3.12.2.md), [3.12.3](3.12.3.md), [3.12.4](3.12.4.md), [3.12.5](3.12.5.md) contain a listing of all the new features that were added and bugs fixed in the GlusterFS 3.12 stable release.
## Major issues
1. Expanding a gluster volume that is sharded may cause file corruption
- Sharded volumes are typically used for VM images; if such volumes are
  expanded or possibly contracted (i.e. add/remove bricks and rebalance) there
  are reports of VM images getting corrupted.
- The last known cause for corruption #1465123 is still pending, and not yet
  part of this release.
## Bugs addressed
A total of 12 patches have been merged, addressing 11 bugs
- [#1489043](https://bugzilla.redhat.com/1489043): The number of bytes of the quota specified in version 3.7 or later is incorrect
- [#1511301](https://bugzilla.redhat.com/1511301): In distribute volume after glusterd restart, brick goes offline
- [#1525850](https://bugzilla.redhat.com/1525850): rdma transport may access an obsolete item in gf_rdma_device_t->all_mr, and causes glusterfsd/glusterfs process crash.

View File

@@ -3,29 +3,32 @@
This is a bugfix release. The release notes for [3.12.0](3.12.0.md), [3.12.1](3.12.1.md), [3.12.2](3.12.2.md), [3.12.3](3.12.3.md), [3.12.4](3.12.4.md), [3.12.5](3.12.5.md), [3.12.5](3.12.6.md) contain a listing of all the new features that were added and bugs fixed in the GlusterFS 3.12 stable release.
## Major issues
1. Expanding a gluster volume that is sharded may cause file corruption
- Sharded volumes are typically used for VM images; if such volumes are
  expanded or possibly contracted (i.e. add/remove bricks and rebalance) there
  are reports of VM images getting corrupted.
- The last known cause for corruption #1465123 is still pending, and not yet
  part of this release.
## Bugs addressed
A total of 16 patches have been merged, addressing 16 bugs
- [#1510342](https://bugzilla.redhat.com/1510342): Not all files synced using geo-replication
- [#1533269](https://bugzilla.redhat.com/1533269): Random GlusterFSD process dies during rebalance
- [#1534847](https://bugzilla.redhat.com/1534847): entries not getting cleared post healing of softlinks (stale entries showing up in heal info)
- [#1536334](https://bugzilla.redhat.com/1536334): [Disperse] Implement open fd heal for disperse volume
- [#1537346](https://bugzilla.redhat.com/1537346): glustershd/glusterd is not using right port when connecting to glusterfsd process
- [#1539516](https://bugzilla.redhat.com/1539516): DHT log messages: Found anomalies in (null) (gfid = 00000000-0000-0000-0000-000000000000). Holes=1 overlaps=0
- [#1540224](https://bugzilla.redhat.com/1540224): dht_(f)xattrop does not implement migration checks
- [#1541267](https://bugzilla.redhat.com/1541267): dht_layout_t leak in dht_populate_inode_for_dentry
- [#1541930](https://bugzilla.redhat.com/1541930): A down brick is incorrectly considered to be online and makes the volume to be started without any brick available
- [#1542054](https://bugzilla.redhat.com/1542054): tests/bugs/cli/bug-1169302.t fails spuriously
- [#1542475](https://bugzilla.redhat.com/1542475): Random failures in tests/bugs/nfs/bug-974972.t
- [#1542601](https://bugzilla.redhat.com/1542601): The used space in the volume increases when the volume is expanded
- [#1542615](https://bugzilla.redhat.com/1542615): tests/bugs/core/multiplex-limit-issue-151.t fails sometimes in upstream master
- [#1542826](https://bugzilla.redhat.com/1542826): Mark tests/bugs/posix/bug-990028.t bad on release-3.12
- [#1542934](https://bugzilla.redhat.com/1542934): Seeing timer errors in the rebalance logs
- [#1543016](https://bugzilla.redhat.com/1543016): dht_lookup_unlink_of_false_linkto_cbk fails with "Permission denied"


@@ -1,17 +1,19 @@
# Release notes for Gluster 3.12.7
This is a bugfix release. The release notes for [3.12.0](3.12.0.md), [3.12.1](3.12.1.md), [3.12.2](3.12.2.md), [3.12.3](3.12.3.md), [3.12.4](3.12.4.md), [3.12.5](3.12.5.md), [3.12.6](3.12.6.md) contain a listing of all the new features that were added and bugs fixed in the GlusterFS 3.12 stable release.
## Major issues
1. Consider a case in which one of the nodes goes down in a gluster cluster with brick multiplexing enabled; if volume operations are performed, then when the node comes back up, brick processes will fail to come up. The issue is tracked in #1543708 and will be fixed by the next release.
## Bugs addressed
A total of 8 patches have been merged, addressing 8 bugs
- [#1517260](https://bugzilla.redhat.com/1517260): Volume wrong size
- [#1543709](https://bugzilla.redhat.com/1543709): Optimize glusterd_import_friend_volume code path
- [#1544635](https://bugzilla.redhat.com/1544635): Though files are in split-brain able to perform writes to the file
- [#1547841](https://bugzilla.redhat.com/1547841): Typo error in __dht_check_free_space function log message
- [#1548078](https://bugzilla.redhat.com/1548078): [Rebalance] "Migrate file failed: <filepath>: failed to get xattr [No data available]" warnings in rebalance logs
- [#1548270](https://bugzilla.redhat.com/1548270): DHT calls dht_lookup_everywhere for 1xn volumes
- [#1549505](https://bugzilla.redhat.com/1549505): Backport patch to reduce duplicate code in server-rpc-fops.c


@@ -1,9 +1,11 @@
# Release notes for Gluster 3.12.8
This is a bugfix release. The release notes for [3.12.0](3.12.0.md), [3.12.1](3.12.1.md), [3.12.2](3.12.2.md), [3.12.3](3.12.3.md), [3.12.4](3.12.4.md), [3.12.5](3.12.5.md), [3.12.6](3.12.6.md), [3.12.7](3.12.7.md) contain a listing of all the new features that were added and bugs fixed in the GlusterFS 3.12 stable release.
## Bugs addressed
A total of 9 patches have been merged, addressing 9 bugs
- [#1543708](https://bugzilla.redhat.com/1543708): glusterd fails to attach brick during restart of the node
- [#1546627](https://bugzilla.redhat.com/1546627): Syntactical errors in hook scripts for managing SELinux context on bricks
- [#1549473](https://bugzilla.redhat.com/1549473): possible memleak in glusterfsd process with brick multiplexing on
@@ -12,4 +14,4 @@ This is a bugfix release. The release notes for [3.12.0](3.12.0.md), [3.12.1](3.
- [#1558352](https://bugzilla.redhat.com/1558352): [EC] Read performance of EC volume exported over gNFS is significantly lower than write performance
- [#1561731](https://bugzilla.redhat.com/1561731): Rebalance failures on a dispersed volume with lookup-optimize enabled
- [#1562723](https://bugzilla.redhat.com/1562723): SHD is not healing entries in halo replication
- [#1565590](https://bugzilla.redhat.com/1565590): timer: Possible race condition between gf_timer_* routines


@@ -7,6 +7,7 @@ features that were added and bugs fixed in the GlusterFS 3.12 stable release.
## Major changes, features and limitations addressed in this release
This release contains a fix for a security vulnerability in Gluster as follows,
- http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2018-1088
- https://nvd.nist.gov/vuln/detail/CVE-2018-1088


@@ -15,11 +15,13 @@ The Gluster heal info CLI now has a 'summary' option displaying the statistics
of entries pending heal, in split-brain and currently being healed, per brick.
Usage:
```
# gluster volume heal <volname> info summary
```
Sample output:
```
Brick <brickname>
Status: Connected
@@ -68,7 +70,7 @@ before, even when only 1 brick is online.
Further reference: [mailing list discussions on topic](http://lists.gluster.org/pipermail/gluster-users/2017-September/032524.html)
### Support for max-port range in glusterd.vol
**Notes for users:**
@@ -102,6 +104,7 @@ endpoint (called gfproxy) on the gluster server nodes, thus thinning the client
stack.
Usage:
```
# gluster volume set <volname> config.gfproxyd enable
```
@@ -110,6 +113,7 @@ The above enables the gfproxy protocol service on the server nodes. To mount a
client that interacts with this end point, use the --thin-client mount option.
Example:
```
# glusterfs --thin-client --volfile-id=<volname> --volfile-server=<host> <mountpoint>
```
@@ -134,6 +138,7 @@ feature is disabled. The option takes a numeric percentage value, that reserves
up to that percentage of disk space.
Usage:
```
# gluster volume set <volname> storage.reserve <number>
```
@@ -146,6 +151,7 @@ Gluster CLI is enhanced with an option to list all connected clients to a volume
volume.
Usage:
```
# gluster volume status <volname/all> client-list
```
@@ -165,6 +171,7 @@ This feature is enabled by default, and can be toggled using the boolean option,
This feature enables users to punch holes in files created on disperse volumes.
Usage:
```
# fallocate -p -o <offset> -l <len> <file_name>
```
@@ -186,7 +193,6 @@ There are currently no statistics included in the `statedump` about the actual
behavior of the memory pools. This means that the efficiency of the memory
pools can not be verified.
### Gluster APIs added to register callback functions for upcalls
**Notes for developers:**
@@ -201,8 +207,8 @@ int glfs_upcall_register (struct glfs *fs, uint32_t event_list,
glfs_upcall_cbk cbk, void *data);
int glfs_upcall_unregister (struct glfs *fs, uint32_t event_list);
```
libgfapi [header](https://github.com/gluster/glusterfs/blob/release-3.13/api/src/glfs.h#L970) files include the complete synopsis of these APIs' definitions and their usage.
**Limitations:**
An application can register only a single callback function for all the upcall
@@ -237,13 +243,15 @@ responses and enable better qualification of the translator stacks.
For usage refer to this [test case](https://github.com/gluster/glusterfs/blob/v3.13.0rc0/tests/features/delay-gen.t).
## Major issues
1. Expanding a gluster volume that is sharded may cause file corruption
- Sharded volumes are typically used for VM images; if such volumes are
  expanded or possibly contracted (i.e. add/remove bricks and rebalance) there
  are reports of VM images getting corrupted.
- The last known cause for corruption (Bug #1515434) has a fix with this
  release. As further testing is still in progress, the issue is retained as
  a major issue.
- Status of this bug can be tracked here, #1515434
## Bugs addressed
@@ -252,13 +260,13 @@ Bugs addressed since release-3.12.0 are listed below.
- [#1248393](https://bugzilla.redhat.com/1248393): DHT: readdirp fails to read some directories.
- [#1258561](https://bugzilla.redhat.com/1258561): Gluster puts PID files in wrong place
- [#1261463](https://bugzilla.redhat.com/1261463): AFR : [RFE] Improvements needed in "gluster volume heal info" commands
- [#1294051](https://bugzilla.redhat.com/1294051): Though files are in split-brain able to perform writes to the file
- [#1328994](https://bugzilla.redhat.com/1328994): When a feature fails needing a higher opversion, the message should state what version it needs.
- [#1335251](https://bugzilla.redhat.com/1335251): mgmt/glusterd: clang compile warnings in glusterd-snapshot.c
- [#1350406](https://bugzilla.redhat.com/1350406): [storage/posix] - posix_do_futimes function not implemented
- [#1365683](https://bugzilla.redhat.com/1365683): Fix crash bug when mnt3_resolve_subdir_cbk fails
- [#1371806](https://bugzilla.redhat.com/1371806): DHT :- inconsistent 'custom extended attributes',uid and gid, Access permission (for directories) if User set/modifies it after bringing one or more sub-volume down
- [#1376326](https://bugzilla.redhat.com/1376326): separating attach tier and add brick
- [#1388509](https://bugzilla.redhat.com/1388509): gluster volume heal info "healed" and "heal-failed" showing wrong information
- [#1395492](https://bugzilla.redhat.com/1395492): trace/error-gen be turned on together while use 'volume set' command to set one of them
@@ -314,14 +322,14 @@ Bugs addressed since release-3.12.0 are listed below.
- [#1480099](https://bugzilla.redhat.com/1480099): More useful error - replace 'not optimal'
- [#1480445](https://bugzilla.redhat.com/1480445): Log entry of files skipped/failed during rebalance operation
- [#1480525](https://bugzilla.redhat.com/1480525): Make choose-local configurable through `volume-set` command
- [#1480591](https://bugzilla.redhat.com/1480591): [Scale] : I/O errors on multiple gNFS mounts with "Stale file handle" during rebalance of an erasure coded volume.
- [#1481199](https://bugzilla.redhat.com/1481199): mempool: run-time crash when built with --disable-mempool
- [#1481600](https://bugzilla.redhat.com/1481600): rpc: client_t and related objects leaked due to incorrect ref counts
- [#1482023](https://bugzilla.redhat.com/1482023): snpashots issues with other processes accessing the mounted brick snapshots
- [#1482344](https://bugzilla.redhat.com/1482344): Negative Test: glusterd crashes for some of the volume options if set at cluster level
- [#1482906](https://bugzilla.redhat.com/1482906): /var/lib/glusterd/peers File had a blank line, Stopped Glusterd from starting
- [#1482923](https://bugzilla.redhat.com/1482923): afr: check op_ret value in __afr_selfheal_name_impunge
- [#1483058](https://bugzilla.redhat.com/1483058): [quorum]: Replace brick is happened when Quorum not met.
- [#1483995](https://bugzilla.redhat.com/1483995): packaging: use rdma-core(-devel) instead of ibverbs, rdmacm; disable rdma on armv7hl
- [#1484215](https://bugzilla.redhat.com/1484215): Add Deepshika has CI Peer
- [#1484225](https://bugzilla.redhat.com/1484225): [rpc]: EPOLLERR - disconnecting now messages every 3 secs after completing rebalance
@@ -344,7 +352,7 @@ Bugs addressed since release-3.12.0 are listed below.
- [#1488909](https://bugzilla.redhat.com/1488909): Fix the type of 'len' in posix.c, clang is showing a warning
- [#1488913](https://bugzilla.redhat.com/1488913): Sub-directory mount details are incorrect in /proc/mounts
- [#1489432](https://bugzilla.redhat.com/1489432): disallow replace brick operation on plain distribute volume
- [#1489823](https://bugzilla.redhat.com/1489823): set the shard-block-size to 64MB in virt profile
- [#1490642](https://bugzilla.redhat.com/1490642): glusterfs client crash when removing directories
- [#1490897](https://bugzilla.redhat.com/1490897): GlusterD returns a bad memory pointer in glusterd_get_args_from_dict()
- [#1491025](https://bugzilla.redhat.com/1491025): rpc: TLSv1_2_method() is deprecated in OpenSSL-1.1
@@ -408,13 +416,13 @@ Bugs addressed since release-3.12.0 are listed below.
- [#1510022](https://bugzilla.redhat.com/1510022): Revert experimental and 4.0 features to prepare for 3.13 release
- [#1511274](https://bugzilla.redhat.com/1511274): Rebalance estimate(ETA) shows wrong details(as intial message of 10min wait reappears) when still in progress
- [#1511293](https://bugzilla.redhat.com/1511293): In distribute volume after glusterd restart, brick goes offline
- [#1511768](https://bugzilla.redhat.com/1511768): In Replica volume 2*2 when quorum is set, after glusterd restart nfs server is coming up instead of self-heal daemon
- [#1512435](https://bugzilla.redhat.com/1512435): Test bug-1483058-replace-brick-quorum-validation.t fails inconsistently
- [#1512460](https://bugzilla.redhat.com/1512460): disperse eager-lock degrades performance for file create workloads
- [#1513259](https://bugzilla.redhat.com/1513259): NetBSD port
- [#1514419](https://bugzilla.redhat.com/1514419): gluster volume splitbrain info needs to display output of each brick in a stream fashion instead of buffering and dumping at the end
- [#1515045](https://bugzilla.redhat.com/1515045): bug-1247563.t is failing on master
- [#1515572](https://bugzilla.redhat.com/1515572): Accessing a file when source brick is down results in that FOP being hung
- [#1516313](https://bugzilla.redhat.com/1516313): Bringing down data bricks in cyclic order results in arbiter brick becoming the source for heal.
- [#1517692](https://bugzilla.redhat.com/1517692): Memory leak in locks xlator
- [#1518257](https://bugzilla.redhat.com/1518257): EC DISCARD doesn't punch hole properly

View File

@@ -5,17 +5,19 @@ contain a listing of all the new features that were added and
bugs fixed in the GlusterFS 3.13 stable release.
## Major changes, features and limitations addressed in this release
**No Major changes**
## Major issues
1. Expanding a gluster volume that is sharded may cause file corruption
- Sharded volumes are typically used for VM images; if such volumes are
  expanded or possibly contracted (i.e. add/remove bricks and rebalance) there
  are reports of VM images getting corrupted.
- The last known cause for corruption (Bug #1515434) is still under review.
- Status of this bug can be tracked here, [#1515434](https://bugzilla.redhat.com/1515434)
## Bugs addressed
Bugs addressed since release-3.13.0 are listed below.


@@ -5,9 +5,11 @@ contain a listing of all the new features that were added and
bugs fixed in the GlusterFS 3.13 stable release.
## Major changes, features and limitations addressed in this release
**No Major changes**
## Major issues
**No Major issues**
## Bugs addressed
@@ -15,7 +17,7 @@ bugs fixed in the GlusterFS 3.13 stable release.
Bugs addressed since release-3.13.1 are listed below.
- [#1511293](https://bugzilla.redhat.com/1511293): In distribute volume after glusterd restart, brick goes offline
- [#1515434](https://bugzilla.redhat.com/1515434): dht_(f)xattrop does not implement migration checks
- [#1516313](https://bugzilla.redhat.com/1516313): Bringing down data bricks in cyclic order results in arbiter brick becoming the source for heal.
- [#1529055](https://bugzilla.redhat.com/1529055): Test case ./tests/bugs/bug-1371806_1.t is failing
- [#1529084](https://bugzilla.redhat.com/1529084): fstat returns ENOENT/ESTALE


@@ -28,6 +28,7 @@ to files in glusterfs using its GFID
For more information refer [here](https://github.com/gluster/glusterfs-specs/blob/master/done/GlusterFS%203.5/gfid%20access.md).
### Prevent NFS restart on Volume change
Earlier, any volume change (volume option, volume start, volume stop, volume
delete, brick add, etc.) required restarting the NFS server.
@@ -48,7 +49,7 @@ directory read performance.
The zerofill feature allows creation of pre-allocated and zeroed-out files on
GlusterFS volumes by offloading the zeroing part to server and/or storage
(storage offloads use SCSI WRITESAME), thereby achieving quick creation of
pre-allocated and zeroed-out VM disk images by using server/storage off-loads.
For more information refer [here](https://github.com/gluster/glusterfs-specs/blob/master/done/GlusterFS%203.5/Zerofill.md).
@@ -93,7 +94,7 @@ The Volume group is represented as directory and logical volumes as files.
The remove-brick CLI earlier used to remove the brick forcefully (without data migration)
when called without any arguments. This mode of the 'remove-brick' cli, without any
arguments, has been deprecated.
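For context, a sketch of the supported remove-brick flow (with data migration) is shown below; the volume and brick names are placeholders.
```
# Start migrating data off the brick that is being removed
gluster volume remove-brick <volname> <host>:<brick-path> start
# Check the migration progress
gluster volume remove-brick <volname> <host>:<brick-path> status
# Commit the removal once migration has completed
gluster volume remove-brick <volname> <host>:<brick-path> commit
# The old forceful behaviour (no data migration) must now be requested explicitly
gluster volume remove-brick <volname> <host>:<brick-path> force
```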
### Experimental Features
@@ -126,24 +127,26 @@ The following features are experimental with this release:
- AUTH support for exported nfs sub-directories added
### Known Issues:
- The following configuration changes are necessary for qemu and samba
integration with libgfapi to work seamlessly:
    ```{ .text .no-copy }
    1) gluster volume set <volname> server.allow-insecure on
    2) Edit /etc/glusterfs/glusterd.vol to contain this line:
           option rpc-auth-allow-insecure on
    Post 1), restarting the volume would be necessary.
    Post 2), restarting glusterd would be necessary.
    ```
- RDMA connection manager needs IPoIB for connection establishment. More
  details can be found [here](https://github.com/gluster/glusterfs-specs/blob/master/done/Features/rdmacm.md).
- For Block Device translator based volumes, the open-behind translator at the
  client side needs to be disabled.
- libgfapi clients calling glfs_fini before a successful glfs_init will cause the client to
hang as reported [here](http://lists.gnu.org/archive/html/gluster-devel/2014-04/msg00179.html).


@@ -15,83 +15,82 @@ additions:
### Bugs Fixed:
* [765202](https://bugzilla.redhat.com/765202): lgetxattr called with invalid keys on the bricks
* [833586](https://bugzilla.redhat.com/833586): inodelk hang from marker_rename_release_newp_lock
* [859581](https://bugzilla.redhat.com/859581): self-heal process can sometimes create directories instead of symlinks for the root gfid file in .glusterfs
* [986429](https://bugzilla.redhat.com/986429): Backupvolfile server option should work internal to GlusterFS framework
* [1039544](https://bugzilla.redhat.com/1039544): [FEAT] "gluster volume heal info" should list the entries that actually required to be healed.
* [1046624](https://bugzilla.redhat.com/1046624): Unable to heal symbolic Links
* [1046853](https://bugzilla.redhat.com/1046853): AFR : For every file self-heal there are warning messages reported in glustershd.log file
* [1063190](https://bugzilla.redhat.com/1063190): Volume was not accessible after server side quorum was met
* [1064096](https://bugzilla.redhat.com/1064096): The old Python Translator code (not Glupy) should be removed
* [1066996](https://bugzilla.redhat.com/1066996): Using sanlock on a gluster mount with replica 3 (quorum-type auto) leads to a split-brain
* [1071191](https://bugzilla.redhat.com/1071191): [3.5.1] Sporadic SIGBUS with mmap() on a sparse file created with open(), seek(), write()
* [1078061](https://bugzilla.redhat.com/1078061): Need ability to heal mismatching user extended attributes without any changelogs
* [1078365](https://bugzilla.redhat.com/1078365): New xlators are linked as versioned .so files, creating <xlator>.so.0.0.0
* [1086743](https://bugzilla.redhat.com/1086743): Add documentation for the Feature: RDMA-connection manager (RDMA-CM)
* [1086748](https://bugzilla.redhat.com/1086748): Add documentation for the Feature: AFR CLI enhancements
- [765202](https://bugzilla.redhat.com/765202): lgetxattr called with invalid keys on the bricks
- [833586](https://bugzilla.redhat.com/833586): inodelk hang from marker_rename_release_newp_lock
- [859581](https://bugzilla.redhat.com/859581): self-heal process can sometimes create directories instead of symlinks for the root gfid file in .glusterfs
- [986429](https://bugzilla.redhat.com/986429): Backupvolfile server option should work internal to GlusterFS framework
- [1039544](https://bugzilla.redhat.com/1039544): [FEAT] "gluster volume heal info" should list the entries that actually required to be healed.
- [1046624](https://bugzilla.redhat.com/1046624): Unable to heal symbolic Links
- [1046853](https://bugzilla.redhat.com/1046853): AFR : For every file self-heal there are warning messages reported in glustershd.log file
- [1063190](https://bugzilla.redhat.com/1063190): Volume was not accessible after server side quorum was met
- [1064096](https://bugzilla.redhat.com/1064096): The old Python Translator code (not Glupy) should be removed
- [1066996](https://bugzilla.redhat.com/1066996): Using sanlock on a gluster mount with replica 3 (quorum-type auto) leads to a split-brain
- [1071191](https://bugzilla.redhat.com/1071191): [3.5.1] Sporadic SIGBUS with mmap() on a sparse file created with open(), seek(), write()
- [1078061](https://bugzilla.redhat.com/1078061): Need ability to heal mismatching user extended attributes without any changelogs
- [1078365](https://bugzilla.redhat.com/1078365): New xlators are linked as versioned .so files, creating <xlator>.so.0.0.0
- [1086743](https://bugzilla.redhat.com/1086743): Add documentation for the Feature: RDMA-connection manager (RDMA-CM)
- [1086748](https://bugzilla.redhat.com/1086748): Add documentation for the Feature: AFR CLI enhancements
- [1086749](https://bugzilla.redhat.com/1086749): Add documentation for the Feature: Exposing Volume Capabilities
- [1086750](https://bugzilla.redhat.com/1086750): Add documentation for the Feature: File Snapshots in GlusterFS
- [1086751](https://bugzilla.redhat.com/1086751): Add documentation for the Feature: gfid-access
- [1086752](https://bugzilla.redhat.com/1086752): Add documentation for the Feature: On-Wire Compression/Decompression
- [1086754](https://bugzilla.redhat.com/1086754): Add documentation for the Feature: Quota Scalability
- [1086755](https://bugzilla.redhat.com/1086755): Add documentation for the Feature: readdir-ahead
- [1086756](https://bugzilla.redhat.com/1086756): Add documentation for the Feature: zerofill API for GlusterFS
- [1086758](https://bugzilla.redhat.com/1086758): Add documentation for the Feature: Changelog based parallel geo-replication
- [1086760](https://bugzilla.redhat.com/1086760): Add documentation for the Feature: Write Once Read Many (WORM) volume
- [1086762](https://bugzilla.redhat.com/1086762): Add documentation for the Feature: BD Xlator - Block Device translator
- [1086766](https://bugzilla.redhat.com/1086766): Add documentation for the Feature: Libgfapi
- [1086774](https://bugzilla.redhat.com/1086774): Add documentation for the Feature: Access Control List - Version 3 support for Gluster NFS
- [1086781](https://bugzilla.redhat.com/1086781): Add documentation for the Feature: Eager locking
- [1086782](https://bugzilla.redhat.com/1086782): Add documentation for the Feature: glusterfs and oVirt integration
- [1086783](https://bugzilla.redhat.com/1086783): Add documentation for the Feature: qemu 1.3 - libgfapi integration
- [1088848](https://bugzilla.redhat.com/1088848): Spelling errors in rpc/rpc-transport/rdma/src/rdma.c
- [1089054](https://bugzilla.redhat.com/1089054): gf-error-codes.h is missing from source tarball
- [1089470](https://bugzilla.redhat.com/1089470): SMB: Crash on brick process during compile kernel.
- [1089934](https://bugzilla.redhat.com/1089934): list dir with more than N files results in Input/output error
- [1091340](https://bugzilla.redhat.com/1091340): Doc: Add glfs_fini known issue to release notes 3.5
- [1091392](https://bugzilla.redhat.com/1091392): glusterfs.spec.in: minor/nit changes to sync with Fedora spec
- [1095256](https://bugzilla.redhat.com/1095256): Excessive logging from self-heal daemon, and bricks
- [1095595](https://bugzilla.redhat.com/1095595): Stick to IANA standard while allocating brick ports
- [1095775](https://bugzilla.redhat.com/1095775): Add support in libgfapi to fetch volume info from glusterd.
- [1095971](https://bugzilla.redhat.com/1095971): Stopping/Starting a Gluster volume resets ownership
- [1096040](https://bugzilla.redhat.com/1096040): AFR : self-heal-daemon not clearing the change-logs of all the sources after self-heal
- [1096425](https://bugzilla.redhat.com/1096425): i/o error when one user tries to access RHS volume over NFS with 100+ GIDs
- [1099878](https://bugzilla.redhat.com/1099878): Need support for handle based Ops to fetch/modify extended attributes of a file
- [1101647](https://bugzilla.redhat.com/1101647): gluster volume heal volname statistics heal-count not giving desired output.
- [1102306](https://bugzilla.redhat.com/1102306): license: xlators/features/glupy dual license GPLv2 and LGPLv3+
- [1103413](https://bugzilla.redhat.com/1103413): Failure in gf_log_init reopening stderr
- [1104592](https://bugzilla.redhat.com/1104592): heal info may give Success instead of transport end point not connected when a brick is down.
- [1104915](https://bugzilla.redhat.com/1104915): glusterfsd crashes while doing stress tests
- [1104919](https://bugzilla.redhat.com/1104919): Fix memory leaks in gfid-access xlator.
- [1104959](https://bugzilla.redhat.com/1104959): Dist-geo-rep : some of the files not accessible on slave after the geo-rep sync from master to slave.
- [1105188](https://bugzilla.redhat.com/1105188): Two instances each, of brick processes, glusterfs-nfs and quotad seen after glusterd restart
- [1105524](https://bugzilla.redhat.com/1105524): Disable nfs.drc by default
- [1107937](https://bugzilla.redhat.com/1107937): quota-anon-fd-nfs.t fails spuriously
- [1109832](https://bugzilla.redhat.com/1109832): I/O fails for for glusterfs 3.4 AFR clients accessing servers upgraded to glusterfs 3.5
- [1110777](https://bugzilla.redhat.com/1110777): glusterfsd OOM - using all memory when quota is enabled

### Known Issues:

- The following configuration changes are necessary for qemu and samba
  integration with libgfapi to work seamlessly:

    1.  gluster volume set <volname> server.allow-insecure on
    2.  restarting the volume is necessary

        ```
        gluster volume stop <volname>
        gluster volume start <volname>
        ```

    3.  Edit `/etc/glusterfs/glusterd.vol` to contain this line:

        ```
        option rpc-auth-allow-insecure on
        ```

    4.  restarting glusterd is necessary

        ```
        service glusterd restart
        ```

    More details are also documented in the Gluster Wiki on the [Libgfapi with qemu libvirt](https://github.com/gluster/glusterfs-specs/blob/master/done/GlusterFS%203.5/libgfapi%20with%20qemu%20libvirt.md) page.
- For Block Device translator based volumes, the open-behind translator (`performance.open-behind`) on the client side needs to be disabled.
@@ -104,5 +103,5 @@ additions:
- After enabling `server.manage-gids`, the volume needs to be stopped and
started again to have the option enabled in the brick processes
        gluster volume stop <volname>
        gluster volume start <volname>

View File

@@ -4,12 +4,12 @@ This is mostly a bugfix release. The [Release Notes for 3.5.0](./3.5.0.md) and [
### Bugs Fixed:
- [1096020](https://bugzilla.redhat.com/1096020): NFS server crashes in \_socket_read_vectored_request
- [1100050](https://bugzilla.redhat.com/1100050): Can't write to quota enable folder
- [1103050](https://bugzilla.redhat.com/1103050): nfs: reset command does not alter the result for nfs options earlier set
- [1105891](https://bugzilla.redhat.com/1105891): features/gfid-access: stat on .gfid virtual directory return EINVAL
- [1111454](https://bugzilla.redhat.com/1111454): creating symlinks generates errors on stripe volume
- [1112111](https://bugzilla.redhat.com/1112111): Self-heal errors with "afr crawl failed for child 0 with ret -1" while performing rolling upgrade.
- [1112348](https://bugzilla.redhat.com/1112348): [AFR] I/O fails when one of the replica nodes go down
- [1112659](https://bugzilla.redhat.com/1112659): Fix inode leaks in gfid-access xlator
- [1112980](https://bugzilla.redhat.com/1112980): NFS subdir authentication doesn't correctly handle multi-(homed,protocol,etc) network addresses
@@ -18,8 +18,8 @@ This is mostly a bugfix release. The [Release Notes for 3.5.0](./3.5.0.md) and [
- [1113749](https://bugzilla.redhat.com/1113749): client_t clienttable cliententries are never expanded when all entries are used
- [1113894](https://bugzilla.redhat.com/1113894): AFR : self-heal of few files not happening when a AWS EC2 Instance is back online after a restart
- [1113959](https://bugzilla.redhat.com/1113959): Spec %post server does not wait for the old glusterd to exit
- [1114501](https://bugzilla.redhat.com/1114501): Dist-geo-rep : deletion of files on master, geo-rep fails to propagate to slaves.
- [1115369](https://bugzilla.redhat.com/1115369): Allow the usage of the wildcard character '\*' to the options "nfs.rpc-auth-allow" and "nfs.rpc-auth-reject"
- [1115950](https://bugzilla.redhat.com/1115950): glfsheal: Improve the way in which we check the presence of replica volumes
- [1116672](https://bugzilla.redhat.com/1116672): Resource cleanup doesn't happen for clients on servers after disconnect
- [1116997](https://bugzilla.redhat.com/1116997): mounting a volume over NFS (TCP) with MOUNT over UDP fails
@@ -32,34 +32,33 @@ This is mostly a bugfix release. The [Release Notes for 3.5.0](./3.5.0.md) and [
- The following configuration changes are necessary for 'qemu' and 'samba vfs
  plugin' integration with libgfapi to work seamlessly:

    1.  gluster volume set <volname> server.allow-insecure on
    2.  restarting the volume is necessary

        ```
        gluster volume stop <volname>
        gluster volume start <volname>
        ```

    3.  Edit `/etc/glusterfs/glusterd.vol` to contain this line:

        ```
        option rpc-auth-allow-insecure on
        ```

    4.  restarting glusterd is necessary

        ```
        service glusterd restart
        ```

    More details are also documented in the Gluster Wiki on the [Libgfapi with qemu libvirt](https://github.com/gluster/glusterfs-specs/blob/master/done/GlusterFS%203.5/libgfapi%20with%20qemu%20libvirt.md) page.
- For Block Device translator based volumes, the open-behind translator at the
  client side needs to be disabled.

        gluster volume set <volname> performance.open-behind disabled
- libgfapi clients calling `glfs_fini` before a successful `glfs_init` will cause the client to
  hang, as reported [here](http://lists.gnu.org/archive/html/gluster-devel/2014-04/msg00179.html).
  The workaround is NOT to call `glfs_fini` for error cases encountered before a successful
  `glfs_init`; a minimal sketch of this pattern follows below.
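A minimal, illustrative C sketch of that workaround is shown below. It is not part of the release notes; the volume name `myvol`, the host `gluster-server.example.com`, the include path, and the compile command in the comment are assumptions made only for this example.

```
/* Illustrative only. Compile with something like: gcc -o client client.c -lgfapi
 * (volume name and host below are placeholders). */
#include <stdio.h>
#include <glusterfs/api/glfs.h>

int main(void)
{
    glfs_t *fs = glfs_new("myvol");   /* hypothetical volume name */
    if (!fs)
        return 1;

    /* hypothetical management host, default glusterd port */
    if (glfs_set_volfile_server(fs, "tcp", "gluster-server.example.com", 24007) != 0) {
        /* No glfs_fini() here either: glfs_init() has not succeeded yet. */
        return 1;
    }

    if (glfs_init(fs) != 0) {
        /* Workaround from the known issue above: do NOT call glfs_fini()
         * on this error path, because glfs_init() has not succeeded and
         * glfs_fini() may hang the client. */
        fprintf(stderr, "glfs_init failed\n");
        return 1;
    }

    /* ... perform I/O through the glfs_* API ... */

    glfs_fini(fs);   /* safe only after a successful glfs_init() */
    return 0;
}
```

In short, the handle is simply not finalized on error paths taken before `glfs_init` has returned success.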

View File

@@ -10,7 +10,7 @@ features that were added and bugs fixed in the GlusterFS 3.5 stable release.
- [1100204](https://bugzilla.redhat.com/1100204): brick failure detection does not work for ext4 filesystems
- [1126801](https://bugzilla.redhat.com/1126801): glusterfs logrotate config file pollutes global config
- [1129527](https://bugzilla.redhat.com/1129527): DHT :- data loss - file is missing on renaming same file from multiple client at same time
- [1129541](https://bugzilla.redhat.com/1129541): [DHT:REBALANCE]: Rebalance failures are seen with error message " remote operation failed: File exists"
- [1132391](https://bugzilla.redhat.com/1132391): NFS interoperability problem: stripe-xlator removes EOF at end of READDIR
- [1133949](https://bugzilla.redhat.com/1133949): Minor typo in afr logging
- [1136221](https://bugzilla.redhat.com/1136221): The memories are exhausted quickly when handle the message which has multi fragments in a single record
@@ -44,27 +44,27 @@ features that were added and bugs fixed in the GlusterFS 3.5 stable release.
- The following configuration changes are necessary for 'qemu' and 'samba vfs
  plugin' integration with libgfapi to work seamlessly:

    1.  gluster volume set <volname> server.allow-insecure on
    2.  restarting the volume is necessary

        ```
        gluster volume stop <volname>
        gluster volume start <volname>
        ```

    3.  Edit `/etc/glusterfs/glusterd.vol` to contain this line:

        ```
        option rpc-auth-allow-insecure on
        ```

    4.  restarting glusterd is necessary

        ```
        service glusterd restart
        ```

    More details are also documented in the Gluster Wiki on the [Libgfapi with qemu libvirt](https://github.com/gluster/glusterfs-specs/blob/master/done/GlusterFS%203.5/libgfapi%20with%20qemu%20libvirt.md) page.
- For Block Device translator based volumes, the open-behind translator at the
  client side needs to be disabled.

Some files were not shown because too many files have changed in this diff.