diff --git a/docs/Administrator-Guide/Split-brain-and-ways-to-deal-with-it.md b/docs/Administrator-Guide/Split-brain-and-ways-to-deal-with-it.md index 601f8d2..85093f0 100644 --- a/docs/Administrator-Guide/Split-brain-and-ways-to-deal-with-it.md +++ b/docs/Administrator-Guide/Split-brain-and-ways-to-deal-with-it.md @@ -1,46 +1,55 @@ # Split brain and the ways to deal with it ### Split brain: -Split brain is a situation where two or more replicated copies of a file become divergent. When a file is in split brain, there is an inconsistency in either data or metadata of the file amongst the bricks of a replica and do not have enough information to authoritatively pick a copy as being pristine and heal the bad copies, despite all bricks being up and online. For a directory, there is also an entry split brain where a file inside it can have different gfid/file-type across the bricks of a replica. Split brain can happen mainly because of 2 reasons: -1. Due to network disconnect: -Where a client temporarily loses connection to the bricks. +Split brain is a situation where two or more replicated copies of a file become divergent. When a file is in split brain, there is an inconsistency in either data or metadata of the file amongst the bricks of a replica and do not have enough information to authoritatively pick a copy as being pristine and heal the bad copies, despite all bricks being up and online. For a directory, there is also an entry split brain where a file inside it can have different gfid/file-type across the bricks of a replica. + +Split brain can happen mainly because of 2 reasons: + +1. Due to network disconnect, where a client temporarily loses connection to the bricks. + - There is a replica pair of 2 bricks, brick1 on server1 and brick2 on server2. - Client1 loses connection to brick2 and client2 loses connection to brick1 due to network split. - Writes from client1 goes to brick1 and from client2 goes to brick2, which is nothing but split-brain. + 2. Gluster brick processes going down or returning error: + - Server1 is down and server2 is up: Writes happen on server 2. - Server1 comes up, server2 goes down (Heal not happened / data on server 2 is not replicated on server1): Writes happen on server1. - Server2 comes up: Both server1 and server2 has data independent of each other. -If we use the replica 2 volume, it is not possible to prevent split-brain without losing availability. +If we use the `replica 2` volume, it is not possible to prevent split-brain without losing availability. ### Ways to deal with split brain: In glusterfs there are ways to resolve split brain. You can see the detailed description of how to resolve a split-brain [here](../Troubleshooting/resolving-splitbrain.md). Moreover, there are ways to reduce the chances of ending up in split-brain situations. They are: -1. Replica 3 volume + +1. volume with `replica 3` 2. Arbiter volume -Both of these uses the client-quorum option of glusterfs to avoid the split-brain situations. +Both of these use the client-quorum option of glusterfs to avoid the split-brain situations. ### Client quorum: This is a feature implemented in Automatic File Replication (AFR here on) module, to prevent split-brains in the I/O path for replicate/distributed-replicate volumes. By default, if the client-quorum is not met for a particular replica subvol, it becomes read-only. The other subvols (in a dist-rep volume) will still have R/W access. [Here](arbiter-volumes-and-quorum.md#client-quorum) you can see more details about client-quorum. 
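For illustration only, these quorum settings are surfaced as regular volume options; the volume name `testvol` below is just a placeholder:

```console
gluster volume get testvol cluster.quorum-type      # 'auto' or 'fixed'
gluster volume set testvol cluster.quorum-type auto
gluster volume set testvol cluster.quorum-count 2   # consulted only when quorum-type is 'fixed'
```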
#### Client quorum in replica 2 volumes: -In a replica 2 volume it is not possible to achieve high availability and consistency at the same time, without sacrificing tolerance to partition. If we set the client-quorum option to auto, then the first brick must always be up, irrespective of the status of the second brick. If only the second brick is up, the subvolume becomes read-only. +In a `replica 2` volume it is not possible to achieve high availability and consistency at the same time, without sacrificing tolerance to partition. If we set the client-quorum option to auto, then the first brick must always be up, irrespective of the status of the second brick. If only the second brick is up, the subvolume becomes read-only. If the quorum-type is set to fixed, and the quorum-count is set to 1, then we may end up in split brain. + - Brick1 is up and brick2 is down. Quorum is met and write happens on brick1. - Brick1 goes down and brick2 comes up (No heal happened). Quorum is met, write happens on brick2. - Brick1 comes up. Quorum is met, but both the bricks have independent writes - split-brain. + To avoid this we have to set the quorum-count to 2, which will cost the availability. Even if we have one replica brick up and running, the quorum is not met and we end up seeing EROFS. ### 1. Replica 3 volume: -When we create a replicated or distributed replicated volume with replica count 3, the cluster.quorum-type option is set to auto by default. That means at least 2 bricks should be up and running to satisfy the quorum and allow the writes. This is the recommended setting for a replica 3 volume and this should not be changed. Here is how it prevents files from ending up in split brain: +When we create a replicated or distributed replicated volume with replica count 3, the cluster.quorum-type option is set to auto by default. That means at least 2 bricks should be up and running to satisfy the quorum and allow the writes. This is the recommended setting for a `replica 3` volume and this should not be changed. Here is how it prevents files from ending up in split brain: B1, B2, and B3 are the 3 bricks of a replica 3 volume. + 1. B1 & B2 are up and B3 is down. Quorum is met and write happens on B1 & B2. 2. B3 comes up and B2 is down. Quorum is met and write happens on B1 & B3. 3. B2 comes up and B1 goes down. Quorum is met. But when a write request comes, AFR sees that B2 & B3 are blaming each other (B2 says that some writes are pending on B3 and B3 says that some writes are pending on B2), therefore the write is not allowed and is failed with EIO. -Command to create a replica 3 volume: +Command to create a `replica 3` volume: ```sh $gluster volume create replica 3 host1:brick1 host2:brick2 host3:brick3 ``` @@ -65,6 +74,7 @@ Since the arbiter brick has only name and metadata of the files, there are some You can find more details on arbiter [here](arbiter-volumes-and-quorum.md). ### Differences between replica 3 and arbiter volumes: + 1. In case of a replica 3 volume, we store the entire file in all the bricks and it is recommended to have bricks of same size. But in case of arbiter, since we do not store data, the size of the arbiter brick is comparatively lesser than the other bricks. 2. Arbiter is a state between replica 2 and replica 3 volume. If we have only arbiter and one of the other brick is up and the arbiter brick blames the other brick, then we can not proceed with the FOPs. 4. 
Replica 3 gives high availability compared to arbiter, because unlike in arbiter, replica 3 has a full copy of the data in all 3 bricks. diff --git a/docs/Administrator-Guide/arbiter-volumes-and-quorum.md b/docs/Administrator-Guide/arbiter-volumes-and-quorum.md index b4108fc..4b87ec2 100644 --- a/docs/Administrator-Guide/arbiter-volumes-and-quorum.md +++ b/docs/Administrator-Guide/arbiter-volumes-and-quorum.md @@ -2,7 +2,7 @@ The arbiter volume is a special subset of replica volumes that is aimed at preventing split-brains and providing the same consistency guarantees as a normal -replica 3 volume without consuming 3x space. +`replica 3` volume without consuming 3x space. @@ -22,7 +22,7 @@ replica 3 volume without consuming 3x space. The syntax for creating the volume is: ``` -# gluster volume create replica 2 arbiter 1 ... +# gluster volume create replica 2 arbiter 1 ... ``` **Note**: The earlier syntax used to be ```replica 3 arbiter 1``` but that was leading to confusions among users about the total no. of data bricks. For the @@ -33,7 +33,7 @@ arbiter volume. For example: ``` -# gluster volume create testvol replica 2 arbiter 1 server{1..6}:/bricks/brick +# gluster volume create testvol replica 2 arbiter 1 server{1..6}:/bricks/brick volume create: testvol: success: please start the volume to access data ``` @@ -66,9 +66,9 @@ performance.readdir-ahead: on ` ``` The arbiter brick will store only the file/directory names (i.e. the tree structure) -and extended attributes (metadata) but not any data. i.e. the file size +and extended attributes (metadata) but not any data, i.e. the file size (as shown by `ls -l`) will be zero bytes. It will also store other gluster -metadata like the .glusterfs folder and its contents. +metadata like the `.glusterfs` folder and its contents. _**Note:** Enabling the arbiter feature **automatically** configures_ _client-quorum to 'auto'. This setting is **not** to be changed._ @@ -76,11 +76,10 @@ _client-quorum to 'auto'. This setting is **not** to be changed._ ## Arbiter brick(s) sizing Since the arbiter brick does not store file data, its disk usage will be considerably -less than the other bricks of the replica. The sizing of the brick will depend on +smaller than for the other bricks of the replica. The sizing of the brick will depend on how many files you plan to store in the volume. A good estimate will be 4KB times the number of files in the replica. Note that the estimate also -depends on the inode space alloted by the underlying filesystem for a given -disk size. +depends on the inode space allocated by the underlying filesystem for a given disk size. The `maxpct` value in XFS for volumes of size 1TB to 50TB is only 5%. If you want to store say 300 million files, 4KB x 300M gives us 1.2TB. @@ -130,7 +129,7 @@ greater than 50%, so that two nodes separated from each other do not believe they have quorum simultaneously. For a two-node plain replica volume, this would mean both nodes need to be up and running. So there is no notion of HA/failover. -There are users who create a replica 2 volume from 2 nodes and peer-probe +There are users who create a `replica 2` volume from 2 nodes and peer-probe a 'dummy' node without bricks and enable server quorum with a ratio of 51%. This does not prevent files from getting into split-brain. 
For example, if B1 and B2 are the bricks/nodes of the replica and B3 is the dummy node, we can @@ -176,7 +175,7 @@ The following volume set options are used to configure it: to specify the number of bricks to be active to participate in quorum. If the quorum-type is auto then this option has no significance. -Earlier, when quorm was not met, the replica subvolume turned read-only. But +Earlier, when quorum was not met, the replica subvolume turned read-only. But since [glusterfs-3.13](https://docs.gluster.org/en/latest/release-notes/3.13.0/#addition-of-checks-for-allowing-lookups-in-afr-and-removal-of-clusterquorum-reads-volume-option) and upwards, the subvolume becomes unavailable, i.e. all the file operations fail with ENOTCONN error instead of becoming EROFS. This means the ```cluster.quorum-reads``` volume option is also not supported. @@ -185,16 +184,16 @@ This means the ```cluster.quorum-reads``` volume option is also not supported. ## Replica 2 and Replica 3 volumes From the above descriptions, it is clear that client-quorum cannot really be applied -to a replica 2 volume:(without costing HA). +to a `replica 2` volume (without costing HA). If the quorum-type is set to auto, then by the description given earlier, the first brick must always be up, irrespective of the status of the second brick. IOW, if only the second brick is up, the subvol returns ENOTCONN, i.e. no HA. If quorum-type is set to fixed, then the quorum-count *has* to be two to prevent split-brains (otherwise a write can succeed in brick1, another in brick2 =>split-brain). -So for all practical purposes, if you want high availability in a replica 2 volume, +So for all practical purposes, if you want high availability in a `replica 2` volume, it is recommended not to enable client-quorum. -In a replica 3 volume, client-quorum is enabled by default and set to 'auto'. +In a `replica 3` volume, client-quorum is enabled by default and set to 'auto'. This means 2 bricks need to be up for the write to succeed. Here is how this configuration prevents files from ending up in split-brain: diff --git a/docs/Contributors-Guide/Adding-your-blog.md b/docs/Contributors-Guide/Adding-your-blog.md index ebcd227..0d17eb3 100644 --- a/docs/Contributors-Guide/Adding-your-blog.md +++ b/docs/Contributors-Guide/Adding-your-blog.md @@ -7,5 +7,3 @@ OK, you can do that by editing planet-gluster [feeds](https://github.com/gluster Please find instructions mentioned in the file and send a pull request. Once approved, all your gluster related posts will appear in [planet.gluster.org](http://planet.gluster.org) website. - - diff --git a/docs/Contributors-Guide/Bug-Reporting-Guidelines.md b/docs/Contributors-Guide/Bug-Reporting-Guidelines.md index fe266d7..4b984e0 100644 --- a/docs/Contributors-Guide/Bug-Reporting-Guidelines.md +++ b/docs/Contributors-Guide/Bug-Reporting-Guidelines.md @@ -1,31 +1,29 @@ -Before filing an issue ----------------------- +## Before filing an issue If you are finding any issues, these preliminary checks as useful: -- Is SELinux enabled? (you can use `getenforce` to check) -- Are iptables rules blocking any data traffic? (`iptables -L` can - help check) -- Are all the nodes reachable from each other? [ Network problem ] -- Please search [issues](https://github.com/gluster/glusterfs/issues) - to see if the bug has already been reported - - If an issue has been already filed for a particular release and - you found the issue in another release, add a comment in issue. +- Is SELinux enabled? 
(you can use `getenforce` to check) +- Are iptables rules blocking any data traffic? (`iptables -L` can + help check) +- Are all the nodes reachable from each other? [ Network problem ] +- Please search [issues](https://github.com/gluster/glusterfs/issues) + to see if the bug has already been reported + + - If an issue has been already filed for a particular release and you found the issue in another release, add a comment in issue. Anyone can search in github issues, you don't need an account. Searching requires some effort, but helps avoid duplicates, and you may find that your problem has already been solved. -Reporting An Issue ------------------- +## Reporting An Issue -- You should have an account with github.com -- Here is the link to file an issue: - [Github](https://github.com/gluster/glusterfs/issues/new) +- You should have an account with github.com +- Here is the link to file an issue: + [Github](https://github.com/gluster/glusterfs/issues/new) -*Note: Please go through all below sections to understand what +_Note: Please go through all below sections to understand what information we need to put in a bug. So it will help the developer to -root cause and fix it* +root cause and fix it_ ### Required Information @@ -33,84 +31,86 @@ You should gather the information below before creating the bug report. #### Package Information -- Location from which the packages are used -- Package Info - version of glusterfs package installed +- Location from which the packages are used +- Package Info - version of glusterfs package installed #### Cluster Information -- Number of nodes in the cluster -- Hostnames and IPs of the gluster Node [if it is not a security - issue] - - Hostname / IP will help developers in understanding & - correlating with the logs -- Output of `gluster peer status` -- Node IP, from which the "x" operation is done - - "x" here means any operation that causes the issue +- Number of nodes in the cluster +- Hostnames and IPs of the gluster Node [if it is not a security + issue] + + - Hostname / IP will help developers in understanding & correlating with the logs + +- Output of `gluster peer status` +- Node IP, from which the "x" operation is done + + - "x" here means any operation that causes the issue #### Volume Information -- Number of volumes -- Volume Names -- Volume on which the particular issue is seen [ if applicable ] -- Type of volumes -- Volume options if available -- Output of `gluster volume info` -- Output of `gluster volume status` -- Get the statedump of the volume with the problem - -`$ gluster volume statedump ` +- Number of volumes +- Volume Names +- Volume on which the particular issue is seen [ if applicable ] +- Type of volumes +- Volume options if available +- Output of `gluster volume info` +- Output of `gluster volume status` +- Get the statedump of the volume with the problem `gluster volume statedump ` This dumps statedump per brick process in `/var/run/gluster` -*NOTE: Collect statedumps from one gluster Node in a directory.* +_NOTE: Collect statedumps from one gluster Node in a directory._ Repeat it in all Nodes containing the bricks of the volume. 
All the so collected directories could be archived, compressed and attached to bug #### Brick Information -- xfs options when a brick partition was done - - This could be obtained with this command : +- xfs options when a brick partition was done -`$ xfs_info /dev/mapper/vg1-brick` + - This could be obtained with this command: `xfs_info /dev/mapper/vg1-brick` -- Extended attributes on the bricks - - This could be obtained with this command: +- Extended attributes on the bricks -`$ getfattr -d -m. -ehex /rhs/brick1/b1` + - This could be obtained with this command: `getfattr -d -m. -ehex /rhs/brick1/b1` #### Client Information -- OS Type ( Ubuntu, Fedora, RHEL ) -- OS Version: In case of Linux distro get the following : +- OS Type ( Ubuntu, Fedora, RHEL ) +- OS Version: In case of Linux distro get the following : -`uname -r` -`cat /etc/issue` +```console +uname -r +cat /etc/issue +``` -- Fuse or NFS Mount point on the client with output of mount commands -- Output of `df -Th` command +- Fuse or NFS Mount point on the client with output of mount commands +- Output of `df -Th` command #### Tool Information -- If any tools are used for testing, provide the info/version about it -- if any IO is simulated using a script, provide the script +- If any tools are used for testing, provide the info/version about it +- if any IO is simulated using a script, provide the script #### Logs Information -- You can check logs for issues/warnings/errors. - - Self-heal logs - - Rebalance logs - - Glusterd logs - - Brick logs - - NFS logs (if applicable) - - Samba logs (if applicable) - - Client mount log -- Add the entire logs as attachment, if its very large to paste as a - comment +- You can check logs for issues/warnings/errors. + + - Self-heal logs + - Rebalance logs + - Glusterd logs + - Brick logs + - NFS logs (if applicable) + - Samba logs (if applicable) + - Client mount log + +- Add the entire logs as attachment, if its very large to paste as a + comment #### SOS report for CentOS/Fedora -- Get the sosreport from the involved gluster Node and Client [ in - case of CentOS /Fedora ] -- Add a meaningful name/IP to the sosreport, by renaming/adding - hostname/ip to the sosreport name +- Get the sosreport from the involved gluster Node and Client [ in + case of CentOS /Fedora ] +- Add a meaningful name/IP to the sosreport, by renaming/adding + hostname/ip to the sosreport name diff --git a/docs/Contributors-Guide/Bug-Triage.md b/docs/Contributors-Guide/Bug-Triage.md index 04e7eaa..7eb167f 100644 --- a/docs/Contributors-Guide/Bug-Triage.md +++ b/docs/Contributors-Guide/Bug-Triage.md @@ -1,25 +1,24 @@ -Issues Triage Guidelines -======================== +# Issues Triage Guidelines -- Triaging of issues is an important task; when done correctly, it can - reduce the time between reporting an issue and the availability of a - fix enormously. +- Triaging of issues is an important task; when done correctly, it can + reduce the time between reporting an issue and the availability of a + fix enormously. -- Triager should focus on new issues, and try to define the problem - easily understandable and as accurate as possible. The goal of the - triagers is to reduce the time that developers need to solve the bug - report. +- Triager should focus on new issues, and try to define the problem + easily understandable and as accurate as possible. The goal of the + triagers is to reduce the time that developers need to solve the bug + report. 
-- A triager is like an assistant that helps with the information - gathering and possibly the debugging of a new bug report. Because a - triager helps preparing a bug before a developer gets involved, it - can be a very nice role for new community members that are - interested in technical aspects of the software. +- A triager is like an assistant that helps with the information + gathering and possibly the debugging of a new bug report. Because a + triager helps preparing a bug before a developer gets involved, it + can be a very nice role for new community members that are + interested in technical aspects of the software. -- Triagers will stumble upon many different kind of issues, ranging - from reports about spelling mistakes, or unclear log messages to - memory leaks causing crashes or performance issues in environments - with several hundred storage servers. +- Triagers will stumble upon many different kind of issues, ranging + from reports about spelling mistakes, or unclear log messages to + memory leaks causing crashes or performance issues in environments + with several hundred storage servers. Nobody expects that triagers can prepare all bug reports. Therefore most developers will be able to assist the triagers, answer questions and @@ -28,17 +27,16 @@ more experienced and will rely less on developers. **Issue triage can be summarized as below points:** -- Is the issue a bug? an enhancement request? or a question? Assign the relevant label. -- Is there enough information in the issue description? -- Is it a duplicate issue? -- Is it assigned to correct component of GlusterFS? -- Is the bug summary is correct? -- Assigning issue or Adding people's github handle in the comment, so they get notified. +- Is the issue a bug? an enhancement request? or a question? Assign the relevant label. +- Is there enough information in the issue description? +- Is it a duplicate issue? +- Is it assigned to correct component of GlusterFS? +- Is the bug summary is correct? +- Assigning issue or Adding people's github handle in the comment, so they get notified. The detailed discussion about the above points are below. -Is there enough information? ----------------------------- +## Is there enough information? It's hard to generalize what makes a good report. For "average" reporters is definitely often helpful to have good steps to reproduce, @@ -46,42 +44,38 @@ GlusterFS software version , and information about the test/production environment, Linux/GNU distribution. If the reporter is a developer, steps to reproduce can sometimes be -omitted as context is obvious. *However, this can create a problem for +omitted as context is obvious. _However, this can create a problem for contributors that need to find their way, hence it is strongly advised -to list the steps to reproduce an issue.* +to list the steps to reproduce an issue._ Other tips: -- There should be only one issue per report. Try not to mix related or - similar looking bugs per report. +- There should be only one issue per report. Try not to mix related or + similar looking bugs per report. -- It should be possible to call the described problem fixed at some - point. "Improve the documentation" or "It runs slow" could never be - called fixed, while "Documentation should cover the topic Embedding" - or "The page at should load - in less than five seconds" would have a criterion. A good summary of - the bug will also help others in finding existing bugs and prevent - filing of duplicates. 
+- It should be possible to call the described problem fixed at some + point. "Improve the documentation" or "It runs slow" could never be + called fixed, while "Documentation should cover the topic Embedding" + or "The page at should load + in less than five seconds" would have a criterion. A good summary of + the bug will also help others in finding existing bugs and prevent + filing of duplicates. -- If the bug is a graphical problem, you may want to ask for a - screenshot to attach to the bug report. Make sure to ask that the - screenshot should not contain any confidential information. +- If the bug is a graphical problem, you may want to ask for a + screenshot to attach to the bug report. Make sure to ask that the + screenshot should not contain any confidential information. -Is it a duplicate? ------------------- +## Is it a duplicate? If you think that you have found a duplicate but you are not totally sure, just add a comment like "This issue looks related to issue #NNN" (and replace NNN by issue-id) so somebody else can take a look and help judging. - -Is it assigned with correct label? ----------------------------------- +## Is it assigned with correct label? Go through the labels and assign the appropriate label -Are the fields correct? ------------------------ +## Are the fields correct? ### Description @@ -89,8 +83,8 @@ Sometimes the description does not summarize the bug itself well. You may want to update the bug summary to make the report distinguishable. A good title may contain: -- A brief explanation of the root cause (if it was found) -- Some of the symptoms people are experiencing +- A brief explanation of the root cause (if it was found) +- Some of the symptoms people are experiencing ### Assigning issue or Adding people's github handle in the comment diff --git a/docs/Contributors-Guide/GlusterFS-Release-process.md b/docs/Contributors-Guide/GlusterFS-Release-process.md index 5888bcf..1ef6bff 100644 --- a/docs/Contributors-Guide/GlusterFS-Release-process.md +++ b/docs/Contributors-Guide/GlusterFS-Release-process.md @@ -15,7 +15,7 @@ Minor releases will have guaranteed backwards compatibilty with earlier minor re Each GlusterFS major release has a 4-6 month release window, in which changes get merged. This window is split into two phases. 1. A Open phase, where all changes get merged -1. A Stability phase, where only changes that stabilize the release get merged. +2. A Stability phase, where only changes that stabilize the release get merged. The first 2-4 months of a release window will be the Open phase, and the last month will be the stability phase. @@ -30,8 +30,8 @@ All changes will be accepted during the Open phase. The changes have a few requi - a change fixing a bug SHOULD have public test case - a change introducing a new feature MUST have a disable switch that can disable the feature during a build - #### Stability phase + This phase is used to stabilize any new features introduced in the open phase, or general bug fixes for already existing features. A new `release-` branch is created at the beginning of this phase. All changes need to be sent to the master branch before getting backported to the new release branch. @@ -54,6 +54,7 @@ Patches accepted in the Stability phase have the following requirements: Patches that do not satisfy the above requirements can still be submitted for review, but cannot be merged. ## Release procedure + This procedure is followed by a release maintainer/manager, to perform the actual release. 
The release procedure for both major releases and minor releases is nearly the same. @@ -63,6 +64,7 @@ The procedure for the major releases starts at the beginning of the Stability ph _TODO: Add the release verification procedure_ ### Release steps + The release-manager needs to follow the following steps, to actually perform the release once ready. #### Create tarball @@ -73,9 +75,11 @@ The release-manager needs to follow the following steps, to actually perform the 4. create the tarball with the [release job in Jenkins](http://build.gluster.org/job/release/) #### Notify packagers + Notify the packagers that we need packages created. Provide the link to the source tarball from the Jenkins release job to the [packagers mailinglist](mailto:packaging@gluster.org). A list of the people involved in the package maintenance for the different distributions is in the `MAINTAINERS` file in the sources, all of them should be subscribed to the packagers mailinglist. #### Create a new Tracker Bug for the next release + The tracker bugs are used as guidance for blocker bugs and should get created when a release is made. To create one - Create a [new milestone](https://github.com/gluster/glusterfs/milestones/new) @@ -83,19 +87,21 @@ The tracker bugs are used as guidance for blocker bugs and should get created wh - issues that were not fixed in previous release, but in milestone should be moved to the new milestone. #### Create Release Announcement -(Major releases) -The Release Announcement is based off the release notes. This needs to indicate: - * What this release's overall focus is - * Which versions will stop receiving updates as of this release - * Links to the direct download folder - * Feature set - -Best practice as of version-8 is to create a collaborative version of the release notes that both the release manager and community lead work on together, and the release manager posts to the mailing lists (gluster-users@, gluster-devel@, announce@). +(Major releases) +The Release Announcement is based off the release notes. This needs to indicate: + +- What this release's overall focus is +- Which versions will stop receiving updates as of this release +- Links to the direct download folder +- Feature set + +Best practice as of version-8 is to create a collaborative version of the release notes that both the release manager and community lead work on together, and the release manager posts to the mailing lists (gluster-users@, gluster-devel@, announce@). #### Create Upgrade Guide -(Major releases) -If required, as in the case of a major release, an upgrade guide needs to be available at the same time as the release. + +(Major releases) +If required, as in the case of a major release, an upgrade guide needs to be available at the same time as the release. This document should go under the [Upgrade Guide](https://github.com/gluster/glusterdocs/tree/master/Upgrade-Guide) section of the [glusterdocs](https://github.com/gluster/glusterdocs) repository. 
#### Send Release Announcement @@ -103,13 +109,15 @@ This document should go under the [Upgrade Guide](https://github.com/gluster/glu Once the Fedora/EL RPMs are ready (and any others that are ready by then), send the release announcement: - Gluster Mailing lists - - [gluster-announce](https://lists.gluster.org/mailman/listinfo/announce/) - - [gluster-devel](https://lists.gluster.org/mailman/listinfo/gluster-devel) - - [gluster-users](https://lists.gluster.org/mailman/listinfo/gluster-users/) - -- [Gluster Blog](https://planet.gluster.org/) -The blog will automatically post to both Facebook and Twitter. Be careful with this! - - [Gluster Twitter account](https://twitter.com/gluster) - - [Gluster Facebook page](https://www.facebook.com/GlusterInc) -- [Gluster LinkedIn group](https://www.linkedin.com/company/gluster/about/) + - [gluster-announce](https://lists.gluster.org/mailman/listinfo/announce/) + - [gluster-devel](https://lists.gluster.org/mailman/listinfo/gluster-devel) + - [gluster-users](https://lists.gluster.org/mailman/listinfo/gluster-users/) + +- [Gluster Blog](https://planet.gluster.org/) + The blog will automatically post to both Facebook and Twitter. Be careful with this! + + - [Gluster Twitter account](https://twitter.com/gluster) + - [Gluster Facebook page](https://www.facebook.com/GlusterInc) + +- [Gluster LinkedIn group](https://www.linkedin.com/company/gluster/about/) diff --git a/docs/Contributors-Guide/Guidelines-For-Maintainers.md b/docs/Contributors-Guide/Guidelines-For-Maintainers.md index ad7366d..f3dc3a9 100644 --- a/docs/Contributors-Guide/Guidelines-For-Maintainers.md +++ b/docs/Contributors-Guide/Guidelines-For-Maintainers.md @@ -13,8 +13,10 @@ explicitly called out. ### Guidelines that Maintainers are expected to adhere to -1. Ensure qualitative and timely management of patches sent for review. -2. For merging patches into the repository, it is expected of maintainers to: +1. Ensure qualitative and timely management of patches sent for review. + +2. For merging patches into the repository, it is expected of maintainers to: + - Merge patches of owned components only. - Seek approvals from all maintainers before merging a patchset spanning multiple components. @@ -28,14 +30,15 @@ explicitly called out. quality of the codebase. - Not merge patches written by themselves until there is a +2 Code Review vote by other reviewers. -3. The responsibility of merging a patch into a release branch in normal - circumstances will be that of the release maintainer's. Only in exceptional - situations, maintainers & sub-maintainers will merge patches into a release - branch. -4. Release maintainers will ensure approval from appropriate maintainers before - merging a patch into a release branch. -5. Maintainers have a responsibility to the community, it is expected of - maintainers to: + +3. The responsibility of merging a patch into a release branch in normal + circumstances will be that of the release maintainer's. Only in exceptional + situations, maintainers & sub-maintainers will merge patches into a release + branch. +4. Release maintainers will ensure approval from appropriate maintainers before + merging a patch into a release branch. + +5. Maintainers have a responsibility to the community, it is expected of maintainers to: - Facilitate the community in all aspects. - Be very active and visible in the community. 
- Be objective and consider the larger interests of the community ahead of @@ -53,4 +56,3 @@ Any questions or comments regarding these guidelines can be routed to Github can be used to list patches that need reviews and/or can get merged from [Pull Requests](https://github.com/gluster/glusterfs/pulls) - diff --git a/docs/Contributors-Guide/Index.md b/docs/Contributors-Guide/Index.md index 4d91291..f198112 100644 --- a/docs/Contributors-Guide/Index.md +++ b/docs/Contributors-Guide/Index.md @@ -1,28 +1,23 @@ # Workflow Guide -Bug Handling ------------- +## Bug Handling -- [Bug reporting guidelines](./Bug-Reporting-Guidelines.md) - - Guideline for reporting a bug in GlusterFS -- [Bug triage guidelines](./Bug-Triage.md) - Guideline on how to - triage bugs for GlusterFS +- [Bug reporting guidelines](./Bug-Reporting-Guidelines.md) - + Guideline for reporting a bug in GlusterFS +- [Bug triage guidelines](./Bug-Triage.md) - Guideline on how to + triage bugs for GlusterFS -Release Process ---------------- +## Release Process -- [GlusterFS Release process](./GlusterFS-Release-process.md) - - Our release process / checklist +- [GlusterFS Release process](./GlusterFS-Release-process.md) - + Our release process / checklist -Patch Acceptance ----------------- +## Patch Acceptance -- The [Guidelines For Maintainers](./Guidelines-For-Maintainers.md) explains when - maintainers can merge patches. +- The [Guidelines For Maintainers](./Guidelines-For-Maintainers.md) explains when + maintainers can merge patches. -Blogging about gluster ----------------- - -- The [Adding your gluster blog](./Adding-your-blog.md) explains how to add your -gluster blog to Community blogger. +## Blogging about gluster +- The [Adding your gluster blog](./Adding-your-blog.md) explains how to add your + gluster blog to Community blogger. diff --git a/docs/Developer-guide/Backport-Guidelines.md b/docs/Developer-guide/Backport-Guidelines.md index f013338..068f9af 100644 --- a/docs/Developer-guide/Backport-Guidelines.md +++ b/docs/Developer-guide/Backport-Guidelines.md @@ -1,4 +1,5 @@ # Backport Guidelines + In GlusterFS project, as a policy, any new change, bug fix, etc., are to be fixed in 'devel' branch before release branches. When a bug is fixed in the devel branch, it might be desirable or necessary in release branch. @@ -9,17 +10,17 @@ understand how to request for backport from community. ## Policy -* No feature from devel would be backported to the release branch -* CVE ie., security vulnerability [(listed on the CVE database)](https://cve.mitre.org/cve/search_cve_list.html) -reported in the existing releases would be backported, after getting fixed -in devel branch. -* Only topics which bring about data loss or, unavailability would be -backported to the release. -* For any other issues, the project recommends that the installation be -upgraded to a newer release where the specific bug has been addressed. +- No feature from devel would be backported to the release branch +- CVE ie., security vulnerability [(listed on the CVE database)](https://cve.mitre.org/cve/search_cve_list.html) + reported in the existing releases would be backported, after getting fixed + in devel branch. +- Only topics which bring about data loss or, unavailability would be + backported to the release. +- For any other issues, the project recommends that the installation be + upgraded to a newer release where the specific bug has been addressed. 
- Gluster provides 'rolling' upgrade support, i.e., one can upgrade their -server version without stopping the application I/O, so we recommend migrating -to higher version. + server version without stopping the application I/O, so we recommend migrating + to higher version. ## Things to pay attention to while backporting a patch. @@ -27,12 +28,10 @@ If your patch meets the criteria above, or you are a user, who prefer to have a fix backported, because your current setup is facing issues, below are the steps you need to take care to submit a patch on release branch. -* The patch should have same 'Change-Id'. - +- The patch should have same 'Change-Id'. ### How to contact release owners? All release owners are part of 'gluster-devel@gluster.org' mailing list. Please write your expectation from next release there, so we can take that to consideration while making the release. - diff --git a/docs/Developer-guide/Building-GlusterFS.md b/docs/Developer-guide/Building-GlusterFS.md index 1a9c9fd..4164761 100644 --- a/docs/Developer-guide/Building-GlusterFS.md +++ b/docs/Developer-guide/Building-GlusterFS.md @@ -7,9 +7,11 @@ This page describes how to build and install GlusterFS. The following packages are required for building GlusterFS, - GNU Autotools - - Automake - - Autoconf - - Libtool + + - Automake + - Autoconf + - Libtool + - lex (generally flex) - GNU Bison - OpenSSL @@ -258,9 +260,9 @@ cd extras/LinuxRPM make glusterrpms ``` -This will create rpms from the source in 'extras/LinuxRPM'. *(Note: You +This will create rpms from the source in 'extras/LinuxRPM'. _(Note: You will need to install the rpmbuild requirements including rpmbuild and -mock)*
+mock)_
For CentOS / Enterprise Linux 8 the dependencies can be installed via: ```console diff --git a/docs/Developer-guide/Developers-Index.md b/docs/Developer-guide/Developers-Index.md index 3f324fa..669576f 100644 --- a/docs/Developer-guide/Developers-Index.md +++ b/docs/Developer-guide/Developers-Index.md @@ -1,8 +1,8 @@ -Developers -========== +# Developers ### Contributing to the Gluster community -------------------------------------- + +--- Are you itching to send in patches and participate as a developer in the Gluster community? Here are a number of starting points for getting @@ -10,36 +10,37 @@ involved. All you need is your 'github' account to be handy. Remember that, [Gluster community](https://github.com/gluster) has multiple projects, each of which has its own way of handling PRs and patches. Decide on which project you want to contribute. Below documents are mostly about 'GlusterFS' project, which is the core of Gluster Community. -Workflow --------- +## Workflow -- [Simplified Developer Workflow](./Simplified-Development-Workflow.md) - - A simpler and faster intro to developing with GlusterFS, than the document below -- [Developer Workflow](./Development-Workflow.md) - - Covers detail about requirements from a patch; tools and toolkits used by developers. - This is recommended reading in order to begin contributions to the project. -- [GD2 Developer Workflow](https://github.com/gluster/glusterd2/blob/master/doc/development-guide.md) - - Helps in on-boarding developers to contribute in GlusterD2 project. +- [Simplified Developer Workflow](./Simplified-Development-Workflow.md) -Compiling Gluster ------------------ + - A simpler and faster intro to developing with GlusterFS, than the document below -- [Building GlusterFS](./Building-GlusterFS.md) - How to compile - Gluster from source code. +- [Developer Workflow](./Development-Workflow.md) -Developing ----------- + - Covers detail about requirements from a patch; tools and toolkits used by developers. + This is recommended reading in order to begin contributions to the project. -- [Projects](./Projects.md) - Ideas for projects you could - create -- [Fixing issues reported by tools for static code - analysis](./Fixing-issues-reported-by-tools-for-static-code-analysis.md) - - This is a good starting point for developers to fix bugs in - GlusterFS project. +- [GD2 Developer Workflow](https://github.com/gluster/glusterd2/blob/master/doc/development-guide.md) -Releases and Backports ----------------------- + - Helps in on-boarding developers to contribute in GlusterD2 project. -- [Backport Guidelines](./Backport-Guidelines.md) describe the steps that branches too. +## Compiling Gluster + +- [Building GlusterFS](./Building-GlusterFS.md) - How to compile + Gluster from source code. + +## Developing + +- [Projects](./Projects.md) - Ideas for projects you could + create +- [Fixing issues reported by tools for static code + analysis](./Fixing-issues-reported-by-tools-for-static-code-analysis.md) + + - This is a good starting point for developers to fix bugs in GlusterFS project. + +## Releases and Backports + +- [Backport Guidelines](./Backport-Guidelines.md) describe the steps that branches too. 
Some more GlusterFS Developer documentation can be found [in glusterfs documentation directory](https://github.com/gluster/glusterfs/tree/master/doc/developer-guide) diff --git a/docs/Developer-guide/Development-Workflow.md b/docs/Developer-guide/Development-Workflow.md index fa6096e..585f712 100644 --- a/docs/Developer-guide/Development-Workflow.md +++ b/docs/Developer-guide/Development-Workflow.md @@ -1,12 +1,10 @@ -Development workflow of Gluster -================================ +# Development workflow of Gluster This document provides a detailed overview of the development model followed by the GlusterFS project. For a simpler overview visit [Simplified development workflow](./Simplified-Development-Workflow.md). -##Basics --------- +## Basics The GlusterFS development model largely revolves around the features and functionality provided by Git version control system, Github and Jenkins @@ -31,8 +29,7 @@ all builds and tests can be viewed at 'regression' job which is designed to execute test scripts provided as part of the code change. -##Preparatory Setup -------------------- +## Preparatory Setup Here is a list of initial one-time steps before you can start hacking on code. @@ -46,9 +43,9 @@ Fork [GlusterFS repository](https://github.com/gluster/glusterfs/fork) Get yourself a working tree by cloning the development repository from ```console -# git clone git@github.com:${username}/glusterfs.git -# cd glusterfs/ -# git remote add upstream git@github.com:gluster/glusterfs.git +git clone git@github.com:${username}/glusterfs.git +cd glusterfs/ +git remote add upstream git@github.com:gluster/glusterfs.git ``` ### Preferred email and set username @@ -69,13 +66,14 @@ get alerts. Set up a filter rule in your mail client to tag or classify emails with the header + ```text list: ``` + as mails originating from the github system. -##Development & Other flows ---------------------------- +## Development & Other flows ### Issue @@ -90,17 +88,17 @@ as mails originating from the github system. - Make sure clang-format is installed and is run on the patch. ### Keep up-to-date + - GlusterFS is a large project with many developers, so there would be one or the other patch everyday. - It is critical for developer to be up-to-date with devel repo to be Conflict-Free when PR is opened. - Git provides many options to keep up-to-date, below is one of them ```console -# git fetch upstream -# git rebase upstream/devel +git fetch upstream +git rebase upstream/devel ``` -##Branching policy ------------------- +## Branching policy This section describes both, the branching policies on the public repo as well as the suggested best-practice for local branching @@ -130,13 +128,12 @@ change. The name of the branch on your personal fork can start with issueNNNN, followed by anything of your choice. If you are submitting changes to the devel branch, first create a local task branch like this - -```console +```{ .console .no-copy } # git checkout -b issueNNNN upstream/main ... 
``` -##Building ----------- +## Building ### Environment Setup @@ -147,18 +144,19 @@ refer : [Building GlusterFS](./Building-GlusterFS.md) Once the required packages are installed for your appropiate system, generate the build configuration: + ```console -# ./autogen.sh -# ./configure --enable-fusermount +./autogen.sh +./configure --enable-fusermount ``` ### Build and install + ```console -# make && make install +make && make install ``` -##Commit policy / PR description --------------------------------- +## Commit policy / PR description Typically you would have a local branch per task. You will need to sign-off your commit (git commit -s) before sending the @@ -169,22 +167,21 @@ CONTRIBUTING file available in the repository root. Provide a meaningful commit message. Your commit message should be in the following format -- A short one-line title of format 'component: title', describing what the patch accomplishes -- An empty line following the subject -- Situation necessitating the patch -- Description of the code changes -- Reason for doing it this way (compared to others) -- Description of test cases -- When you open a PR, having a reference Issue for the commit is mandatory in GlusterFS. -- Commit message can have, either Fixes: #NNNN or Updates: #NNNN in a separate line in the commit message. - Here, NNNN is the Issue ID in glusterfs repository. -- Each commit needs the author to have the 'Signed-off-by: Name ' line. - Can do this by -s option for git commit. -- If the PR is not ready for review, apply the label work-in-progress. - Check the availability of "Draft PR" is present for you, if yes, use that instead. +- A short one-line title of format 'component: title', describing what the patch accomplishes +- An empty line following the subject +- Situation necessitating the patch +- Description of the code changes +- Reason for doing it this way (compared to others) +- Description of test cases +- When you open a PR, having a reference Issue for the commit is mandatory in GlusterFS. +- Commit message can have, either Fixes: #NNNN or Updates: #NNNN in a separate line in the commit message. + Here, NNNN is the Issue ID in glusterfs repository. +- Each commit needs the author to have the 'Signed-off-by: Name ' line. + Can do this by -s option for git commit. +- If the PR is not ready for review, apply the label work-in-progress. + Check the availability of "Draft PR" is present for you, if yes, use that instead. -##Push the change ------------------ +## Push the change After doing the local commit, it is time to submit the code for review. There is a script available inside glusterfs.git called rfc.sh. It is @@ -192,31 +189,34 @@ recommended you keep pushing to your repo every day, so you don't loose any work. You can submit your changes for review by simply executing ```console -# ./rfc.sh +./rfc.sh ``` + or + ```console -# git push origin HEAD:issueNNN +git push origin HEAD:issueNNN ``` This script rfc.sh does the following: -- The first time it is executed, it downloads a git hook from - and sets it up - locally to generate a Change-Id: tag in your commit message (if it - was not already generated.) -- Rebase your commit against the latest upstream HEAD. This rebase - also causes your commits to undergo massaging from the just - downloaded commit-msg hook. -- Prompt for a Reference Id for each commit (if it was not already provided) - and include it as a "fixes: #n" tag in the commit log. You can just hit - at this prompt if your submission is purely for review - purposes. 
-- Push the changes for review. On a successful push, you will see a URL pointing to - the change in [Pull requests](https://github.com/gluster/glusterfs/pulls) section. +- The first time it is executed, it downloads a git hook from + and sets it up + locally to generate a Change-Id: tag in your commit message (if it + was not already generated.) +- Rebase your commit against the latest upstream HEAD. This rebase + also causes your commits to undergo massaging from the just + downloaded commit-msg hook. +- Prompt for a Reference Id for each commit (if it was not already provided) + and include it as a "fixes: #n" tag in the commit log. You can just hit + at this prompt if your submission is purely for review + purposes. +- Push the changes for review. On a successful push, you will see a URL pointing to + the change in [Pull requests](https://github.com/gluster/glusterfs/pulls) section. ## Test cases and Verification ------------------------------- + +--- ### Auto-triggered tests @@ -258,13 +258,13 @@ To check and run all regression tests locally, run the below script from glusterfs root directory. ```console -# ./run-tests.sh +./run-tests.sh ``` To run a single regression test locally, run the below command. ```console -# prove -vf +prove -vf ``` **NOTE:** The testing framework needs perl-Test-Harness package to be installed. @@ -284,18 +284,17 @@ of the feature. Please go through glusto-tests project to understand more information on how to write and execute the tests in glusto. 1. Extend/Modify old test cases in existing scripts - This is typically -when present behavior (default values etc.) of code is changed. + when present behavior (default values etc.) of code is changed. 2. No test cases - This is typically when a code change is trivial -(e.g. fixing typos in output strings, code comments). + (e.g. fixing typos in output strings, code comments). 3. Only test case and no code change - This is typically when we are -adding test cases to old code (already existing before this regression -test policy was enforced). More details on how to work with test case -scripts can be found in tests/README. + adding test cases to old code (already existing before this regression + test policy was enforced). More details on how to work with test case + scripts can be found in tests/README. -##Reviewing / Commenting ------------------------- +## Reviewing / Commenting Code review with Github is relatively easy compared to other available tools. Each change is presented as multiple files and each file can be @@ -304,8 +303,7 @@ on each line by clicking on '+' icon and writing in your comments in the text box. Such in-line comments are saved as drafts, till you finally publish them by Starting a Review. -##Incorporate, rfc.sh, Reverify --------------------------------------- +## Incorporate, rfc.sh, Reverify Code review comments are notified via email. After incorporating the changes in code, you can mark each of the inline comments as 'done' @@ -313,8 +311,9 @@ changes in code, you can mark each of the inline comments as 'done' commits in the same branch with - ```console -# git commit -a -s +git commit -a -s ``` + Push the commit by executing rfc.sh. If your previous push was an "rfc" push (i.e, without a Issue Id) you will be prompted for a Issue Id again. You can re-push an rfc change without any other code change too @@ -332,8 +331,7 @@ comments can be made on the new patch as well, and the same cycle repeats. If no further changes are necessary, the reviewer can approve the patch. 
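As a minimal sketch of the cycle described above (the branch name `issueNNNN` is only an example):

```console
git commit -a -s                  # record the requested changes as a new commit on the same branch
./rfc.sh                          # or: git push origin HEAD:issueNNNN
```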
-##Submission Qualifiers ------------------------ +## Submission Qualifiers GlusterFS project follows 'Squash and Merge' method. @@ -350,8 +348,7 @@ The project maintainer will merge the changes once a patch meets these qualifiers. If you feel there is delay, feel free to add a comment, discuss the same in Slack channel, or send email. -##Submission Disqualifiers --------------------------- +## Submission Disqualifiers - +2 : is equivalent to "Approve" from the people in the maintainer's group. - +1 : can be given by a maintainer/reviewer by explicitly stating that in the comment. diff --git a/docs/Developer-guide/Easy-Fix-Bugs.md b/docs/Developer-guide/Easy-Fix-Bugs.md index 96db08c..54ec30e 100644 --- a/docs/Developer-guide/Easy-Fix-Bugs.md +++ b/docs/Developer-guide/Easy-Fix-Bugs.md @@ -2,8 +2,8 @@ Fixing easy issues is an excellent method to start contributing patches to Gluster. -Sometimes an *Easy Fix* issue has a patch attached. In those cases, -the *Patch* keyword has been added to the bug. These bugs can be +Sometimes an _Easy Fix_ issue has a patch attached. In those cases, +the _Patch_ keyword has been added to the bug. These bugs can be used by new contributors that would like to verify their workflow. [Bug 1099645](https://bugzilla.redhat.com/1099645) is one example of those. @@ -11,12 +11,12 @@ All such issues can be found [here](https://github.com/gluster/glusterfs/labels/ ### Guidelines for new comers -- While trying to write a patch, do not hesitate to ask questions. -- If something in the documentation is unclear, we do need to know so - that we can improve it. -- There are no stupid questions, and it's more stupid to not ask - questions that others can easily answer. Always assume that if you - have a question, someone else would like to hear the answer too. +- While trying to write a patch, do not hesitate to ask questions. +- If something in the documentation is unclear, we do need to know so + that we can improve it. +- There are no stupid questions, and it's more stupid to not ask + questions that others can easily answer. Always assume that if you + have a question, someone else would like to hear the answer too. [Reach out](https://www.gluster.org/community/) to the developers in #gluster on [Gluster Slack](https://gluster.slack.com) channel, or on diff --git a/docs/Developer-guide/Fixing-issues-reported-by-tools-for-static-code-analysis.md b/docs/Developer-guide/Fixing-issues-reported-by-tools-for-static-code-analysis.md index 268e98e..e0d7769 100644 --- a/docs/Developer-guide/Fixing-issues-reported-by-tools-for-static-code-analysis.md +++ b/docs/Developer-guide/Fixing-issues-reported-by-tools-for-static-code-analysis.md @@ -1,7 +1,6 @@ -Static Code Analysis Tools --------------------------- +## Static Code Analysis Tools -Bug fixes for issues reported by *Static Code Analysis Tools* should +Bug fixes for issues reported by _Static Code Analysis Tools_ should follow [Development Work Flow](./Development-Workflow.md) ### Coverity @@ -9,49 +8,48 @@ follow [Development Work Flow](./Development-Workflow.md) GlusterFS is part of [Coverity's](https://scan.coverity.com/) scan program. -- To see Coverity issues you have to be a member of the GlusterFS - project in Coverity scan website. -- Here is the link to [Coverity scan website](https://scan.coverity.com/projects/987) -- Go to above link and subscribe to GlusterFS project (as - contributor). It will send a request to Admin for including you in - the Project. 
-- Once admins for the GlusterFS Coverity scan approve your request, - you will be able to see the defects raised by Coverity. -- [Issue #1060](https://github.com/gluster/glusterfs/issues/1060) - can be used as a umbrella bug for Coverity issues in master - branch unless you are trying to fix a specific issue. -- When you decide to work on some issue, please assign it to your name - in the same Coverity website. So that we don't step on each others - work. -- When marking a bug intentional in Coverity scan website, please put - an explanation for the same. So that it will help others to - understand the reasoning behind it. +- To see Coverity issues you have to be a member of the GlusterFS + project in Coverity scan website. +- Here is the link to [Coverity scan website](https://scan.coverity.com/projects/987) +- Go to above link and subscribe to GlusterFS project (as + contributor). It will send a request to Admin for including you in + the Project. +- Once admins for the GlusterFS Coverity scan approve your request, + you will be able to see the defects raised by Coverity. +- [Issue #1060](https://github.com/gluster/glusterfs/issues/1060) + can be used as a umbrella bug for Coverity issues in master + branch unless you are trying to fix a specific issue. +- When you decide to work on some issue, please assign it to your name + in the same Coverity website. So that we don't step on each others + work. +- When marking a bug intentional in Coverity scan website, please put + an explanation for the same. So that it will help others to + understand the reasoning behind it. -*If you have more questions please send it to +_If you have more questions please send it to [gluster-devel](https://lists.gluster.org/mailman/listinfo/gluster-devel) mailing -list* +list_ ### CPP Check Cppcheck is available in Fedora and EL's EPEL repo -- Install Cppcheck +- Install Cppcheck - # dnf install cppcheck + dnf install cppcheck -- Clone GlusterFS code +- Clone GlusterFS code - # git clone https://github.com/gluster/glusterfs + git clone https://github.com/gluster/glusterfs -- Run Cpp check - - # cppcheck glusterfs/ 2>cppcheck.log +- Run Cpp check + cppcheck glusterfs/ 2>cppcheck.log ### Clang-Scan Daily Runs We have daily runs of static source code analysis tool clang-scan on -the glusterfs sources. There are daily analyses of the master and +the glusterfs sources. There are daily analyses of the master and on currently supported branches. Results are posted at diff --git a/docs/Developer-guide/Projects.md b/docs/Developer-guide/Projects.md index e394315..f204491 100644 --- a/docs/Developer-guide/Projects.md +++ b/docs/Developer-guide/Projects.md @@ -3,9 +3,7 @@ This page contains a list of project ideas which will be suitable for students (for GSOC, internship etc.) -Projects/Features which needs contributors ------------------------------------------- - +## Projects/Features which needs contributors ### RIO @@ -13,27 +11,23 @@ Issue: https://github.com/gluster/glusterfs/issues/243 This is a new distribution logic, which can scale Gluster to 1000s of nodes. - ### Composition xlator for small files Merge small files into a designated large file using our own custom semantics. This can improve our small file performance. - ### Path based geo-replication Issue: https://github.com/gluster/glusterfs/issues/460 This would allow remote volume to be of different type (NFS/S3 etc etc) too. 
- ### Project Quota support Issue: https://github.com/gluster/glusterfs/issues/184 This will make Gluster's Quota faster, and also provide desired behavior. - ### Cluster testing framework based on gluster-tester Repo: https://github.com/aravindavk/gluster-tester diff --git a/docs/Developer-guide/Simplified-Development-Workflow.md b/docs/Developer-guide/Simplified-Development-Workflow.md index c9a9cab..f3261f7 100644 --- a/docs/Developer-guide/Simplified-Development-Workflow.md +++ b/docs/Developer-guide/Simplified-Development-Workflow.md @@ -1,5 +1,4 @@ -Simplified development workflow for GlusterFS -============================================= +# Simplified development workflow for GlusterFS This page gives a simplified model of the development workflow used by the GlusterFS project. This will give the steps required to get a patch @@ -8,8 +7,7 @@ accepted into the GlusterFS source. Visit [Development Work Flow](./Development-Workflow.md) a more detailed description of the workflow. -##Initial preparation ---------------------- +## Initial preparation The GlusterFS development workflow revolves around [GitHub](http://github.com/gluster/glusterfs/) and @@ -17,13 +15,15 @@ The GlusterFS development workflow revolves around Using these both tools requires some initial preparation. ### Get the source + Git clone the GlusterFS source using -```console - git clone git@github.com:${username}/glusterfs.git - cd glusterfs/ - git remote add upstream git@github.com:gluster/glusterfs.git +```{ .console .no-copy } +git clone git@github.com:${username}/glusterfs.git +cd glusterfs/ +git remote add upstream git@github.com:gluster/glusterfs.git ``` + This will clone the GlusterFS source into a subdirectory named glusterfs with the devel branch checked out. @@ -34,7 +34,7 @@ distribution specific package manger to install git. After installation configure git. At the minimum, set a git user email. To set the email do, -```console +```{ .console .no-copy } git config --global user.name git config --global user.email ``` @@ -43,8 +43,7 @@ Next, install the build requirements for GlusterFS. Refer [Building GlusterFS - Build Requirements](./Building-GlusterFS.md#Build Requirements) for the actual requirements. -##Actual development --------------------- +## Actual development The commands in this section are to be run inside the glusterfs source directory. @@ -55,23 +54,25 @@ It is recommended to use separate local development branches for each change you want to contribute to GlusterFS. To create a development branch, first checkout the upstream branch you want to work on and update it. More details on the upstream branching model for GlusterFS -can be found at [Development Work Flow - Branching\_policy](./Development-Workflow.md#branching-policy). +can be found at [Development Work Flow - Branching_policy](./Development-Workflow.md#branching-policy). For example if you want to develop on the devel branch, ```console -# git checkout devel -# git pull +git checkout devel +git pull ``` Now, create a new branch from devel and switch to the new branch. It is recommended to have descriptive branch names. Do, -```console +```{ .console .no-copy } git branch issueNNNN git checkout issueNNNN ``` + or, -```console + +```{ .console .no-copy } git checkout -b issueNNNN upstream/main ``` @@ -100,8 +101,8 @@ working GlusterFS installation and needs to be run as root. 
To run the regression test suite, do ```console -# make install -# ./run-tests.sh +make install +./run-tests.sh ``` or, After uploading the patch The regression tests would be triggered @@ -113,7 +114,7 @@ If you haven't broken anything, you can now commit your changes. First identify the files that you modified/added/deleted using git-status and stage these files. -```console +```{ .console .no-copy } git status git add ``` @@ -121,7 +122,7 @@ git add Now, commit these changes using ```console -# git commit -s +git commit -s ``` Provide a meaningful commit message. The commit message policy is @@ -134,18 +135,19 @@ sign-off the commit with your configured email. To submit your change for review, run the rfc.sh script, ```console -# ./rfc.sh +./rfc.sh ``` + or -```console + +```{ .console .no-copy } git push origin HEAD:issueNNN ``` More details on the rfc.sh script are available at [Development Work Flow - rfc.sh](./Development-Workflow.md#rfc.sh). -##Review process ----------------- +## Review process Your change will now be reviewed by the GlusterFS maintainers and component owners. You can follow and take part in the review process @@ -186,8 +188,9 @@ review comments. Build and test to see if the new changes are working. Stage your changes and commit your new changes in new commits using, ```console -# git commit -a -s +git commit -a -s ``` + Now you can resubmit the commit for review using the rfc.sh script or git push. The formal review process could take a long time. To increase chances diff --git a/docs/Developer-guide/compiling-rpms.md b/docs/Developer-guide/compiling-rpms.md index f28933d..ab4783f 100644 --- a/docs/Developer-guide/compiling-rpms.md +++ b/docs/Developer-guide/compiling-rpms.md @@ -1,5 +1,4 @@ -How to compile GlusterFS RPMs from git source, for RHEL/CentOS, and Fedora --------------------------------------------------------------------------- +## How to compile GlusterFS RPMs from git source, for RHEL/CentOS, and Fedora Creating rpm's of GlusterFS from git source is fairly easy, once you know the steps. @@ -21,13 +20,13 @@ Specific instructions for compiling are below. If you're using: ### Preparation steps for Fedora 16-20 (only) -1. Install gcc, the python development headers, and python setuptools: +1. Install gcc, the python development headers, and python setuptools: - # sudo yum -y install gcc python-devel python-setuptools + sudo yum -y install gcc python-devel python-setuptools -2. If you're compiling GlusterFS version 3.4, then install python-swiftclient. Other GlusterFS versions don't need it: +2. If you're compiling GlusterFS version 3.4, then install python-swiftclient. Other GlusterFS versions don't need it: - # sudo easy_install simplejson python-swiftclient + sudo easy_install simplejson python-swiftclient Now follow through with the **Common Steps** part below. @@ -35,15 +34,15 @@ Now follow through with the **Common Steps** part below. You'll need EPEL installed first and some CentOS-specific packages. The commands below will get that done for you. After that, follow through the "Common steps" section. -1. Install EPEL first: +1. Install EPEL first: - # curl -OL `[`http://download.fedoraproject.org/pub/epel/5/x86_64/epel-release-5-4.noarch.rpm`](http://download.fedoraproject.org/pub/epel/5/x86_64/epel-release-5-4.noarch.rpm) - # sudo yum -y install epel-release-5-4.noarch.rpm --nogpgcheck + curl -OL http://download.fedoraproject.org/pub/epel/5/x86_64/epel-release-5-4.noarch.rpm + sudo yum -y install epel-release-5-4.noarch.rpm --nogpgcheck -2. 
Install the packages required only on CentOS 5.x: +2. Install the packages required only on CentOS 5.x: - # sudo yum -y install buildsys-macros gcc ncurses-devel \ - python-ctypes python-sphinx10 redhat-rpm-config + sudo yum -y install buildsys-macros gcc ncurses-devel \ + python-ctypes python-sphinx10 redhat-rpm-config Now follow through with the **Common Steps** part below. @@ -51,32 +50,31 @@ Now follow through with the **Common Steps** part below. You'll need EPEL installed first and some CentOS-specific packages. The commands below will get that done for you. After that, follow through the "Common steps" section. -1. Install EPEL first: +1. Install EPEL first: - # sudo yum -y install `[`http://download.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm`](http://download.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm) + sudo yum -y install http://download.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm -2. Install the packages required only on CentOS: +2. Install the packages required only on CentOS: - # sudo yum -y install python-webob1.0 python-paste-deploy1.5 python-sphinx10 redhat-rpm-config + sudo yum -y install python-webob1.0 python-paste-deploy1.5 python-sphinx10 redhat-rpm-config Now follow through with the **Common Steps** part below. - ### Preparation steps for CentOS 8.x (only) -You'll need EPEL installed and then the powertools package enabled. +You'll need EPEL installed and then the powertools package enabled. -1. Install EPEL first: - - # sudo rpm -ivh https://dl.fedoraproject.org/pub/epel/epel-release-latest-8.noarch.rpm +1. Install EPEL first: -2. Enable the PowerTools repo and install CentOS 8.x specific packages for building the rpms. + sudo rpm -ivh https://dl.fedoraproject.org/pub/epel/epel-release-latest-8.noarch.rpm - # sudo yum --enablerepo=PowerTools install automake autoconf libtool flex bison openssl-devel \ - libxml2-devel libaio-devel libibverbs-devel librdmacm-devel readline-devel lvm2-devel \ - glib2-devel userspace-rcu-devel libcmocka-devel libacl-devel sqlite-devel fuse-devel \ - redhat-rpm-config rpcgen libtirpc-devel make python3-devel rsync libuuid-devel \ - rpm-build dbench perl-Test-Harness attr libcurl-devel selinux-policy-devel -y +2. Enable the PowerTools repo and install CentOS 8.x specific packages for building the rpms. + + sudo yum --enablerepo=PowerTools install automake autoconf libtool flex bison openssl-devel \ + libxml2-devel libaio-devel libibverbs-devel librdmacm-devel readline-devel lvm2-devel \ + glib2-devel userspace-rcu-devel libcmocka-devel libacl-devel sqlite-devel fuse-devel \ + redhat-rpm-config rpcgen libtirpc-devel make python3-devel rsync libuuid-devel \ + rpm-build dbench perl-Test-Harness attr libcurl-devel selinux-policy-devel -y Now follow through from Point 2 in the **Common Steps** part below. @@ -84,14 +82,14 @@ Now follow through from Point 2 in the **Common Steps** part below. You'll need EPEL installed first and some RHEL specific packages. The 2 commands below will get that done for you. After that, follow through the "Common steps" section. -1. Install EPEL first: +1. Install EPEL first: - # sudo yum -y install `[`http://download.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm`](http://download.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm) + sudo yum -y install http://download.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm -2. Install the packages required only on RHEL: +2. 
Install the packages required only on RHEL: - # sudo yum -y --enablerepo=rhel-6-server-optional-rpms install python-webob1.0 \ - python-paste-deploy1.5 python-sphinx10 redhat-rpm-config + sudo yum -y --enablerepo=rhel-6-server-optional-rpms install python-webob1.0 \ + python-paste-deploy1.5 python-sphinx10 redhat-rpm-config Now follow through with the **Common Steps** part below. @@ -104,64 +102,65 @@ These steps are for both Fedora and RHEL/CentOS. At the end you'll have the comp - If you're on RHEL/CentOS 5.x and get a message about lvm2-devel not being available, it's ok. You can ignore it. :) - If you're on RHEL/CentOS 6.x and get any messages about python-eventlet, python-netifaces, python-sphinx and/or pyxattr not being available, it's ok. You can ignore them. :) - If you're on CentOS 8.x, you can skip step 1 and start from step 2. Also, for CentOS 8.x, the steps have been -tested for the master branch. It is unknown if it would work for older branches. + tested for the master branch. It is unknown if it would work for older branches.
-1. Install the needed packages +1. Install the needed packages - # sudo yum -y --disablerepo=rhs* --enablerepo=*optional-rpms install git autoconf \ - automake bison dos2unix flex fuse-devel glib2-devel libaio-devel \ - libattr-devel libibverbs-devel librdmacm-devel libtool libxml2-devel lvm2-devel make \ - openssl-devel pkgconfig pyliblzma python-devel python-eventlet python-netifaces \ - python-paste-deploy python-simplejson python-sphinx python-webob pyxattr readline-devel \ - rpm-build systemtap-sdt-devel tar libcmocka-devel + sudo yum -y --disablerepo=rhs* --enablerepo=*optional-rpms install git autoconf \ + automake bison dos2unix flex fuse-devel glib2-devel libaio-devel \ + libattr-devel libibverbs-devel librdmacm-devel libtool libxml2-devel lvm2-devel make \ + openssl-devel pkgconfig pyliblzma python-devel python-eventlet python-netifaces \ + python-paste-deploy python-simplejson python-sphinx python-webob pyxattr readline-devel \ + rpm-build systemtap-sdt-devel tar libcmocka-devel -2. Clone the GlusterFS git repository +2. Clone the GlusterFS git repository - # git clone `[`git://git.gluster.org/glusterfs`](git://git.gluster.org/glusterfs) - # cd glusterfs + git clone git://git.gluster.org/glusterfs + cd glusterfs -3. Choose which branch to compile +3. Choose which branch to compile If you want to compile the latest development code, you can skip this step and go on to the next one. :) If instead, you want to compile the code for a specific release of GlusterFS (such as v3.4), get the list of release names here: - # git branch -a | grep release - remotes/origin/release-2.0 - remotes/origin/release-3.0 - remotes/origin/release-3.1 - remotes/origin/release-3.2 - remotes/origin/release-3.3 - remotes/origin/release-3.4 - remotes/origin/release-3.5 + # git branch -a | grep release + remotes/origin/release-2.0 + remotes/origin/release-3.0 + remotes/origin/release-3.1 + remotes/origin/release-3.2 + remotes/origin/release-3.3 + remotes/origin/release-3.4 + remotes/origin/release-3.5 Then switch to the correct release using the git "checkout" command, and the name of the release after the "remotes/origin/" bit from the list above: - # git checkout release-3.4 + git checkout release-3.4 **NOTE -** The CentOS 5.x instructions have only been tested for the master branch in GlusterFS git. It is unknown (yet) if they work for branches older than release-3.5. - --- - If you are compiling the latest development code you can skip steps **4** and **5**. Instead, you can run the below command and you will get the RPMs. + *** - - # extras/LinuxRPM/make_glusterrpms - --- + If you are compiling the latest development code you can skip steps **4** and **5**. Instead, you can run the below command and you will get the RPMs. -4. Configure and compile GlusterFS + extras/LinuxRPM/make_glusterrpms + + *** + +4. Configure and compile GlusterFS Now you're ready to compile Gluster: - # ./autogen.sh - # ./configure --enable-fusermount - # make dist + ./autogen.sh + ./configure --enable-fusermount + make dist -5. Create the GlusterFS RPMs +5. Create the GlusterFS RPMs - # cd extras/LinuxRPM - # make glusterrpms + cd extras/LinuxRPM + make glusterrpms That should complete with no errors, leaving you with a directory containing the RPMs. 
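Once `make glusterrpms` finishes, the freshly built packages sit in that directory. As a rough, illustrative sketch (the exact package names and versions depend on the branch and distribution, and `localinstall` assumes a yum/dnf based system), you can list and locally install them like this:

```console
# See which packages the build produced
ls -1 *.rpm

# Install them for a quick smoke test; yum resolves the dependencies between the packages
sudo yum -y localinstall ./glusterfs*.rpm
```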
diff --git a/docs/Developer-guide/coredump-on-customer-setup.md b/docs/Developer-guide/coredump-on-customer-setup.md index 7c8ae174..734c2b5 100644 --- a/docs/Developer-guide/coredump-on-customer-setup.md +++ b/docs/Developer-guide/coredump-on-customer-setup.md @@ -1,47 +1,52 @@ # Get core dump on a customer set up without killing the process ### Why do we need this? + Finding the root cause of an issue that occurred in the customer/production setup is a challenging task. Most of the time we cannot replicate/setup the environment and scenario which is leading to the issue on our test setup. In such cases, we got to grab most of the information from the system where the problem has occurred. -
+ ### What information we look for and also useful? + The information like a core dump is very helpful to catch the root cause of an issue by adding ASSERT() in the code at the places where we feel something is wrong and install the custom build on the affected setup. But the issue is ASSERT() would kill the process and produce the core dump. -
+ ### Is it a good idea to do ASSERT() on customer setup? -Remember we are seeking help from customer setup, they unlikely agree to kill the process and produce the + +Remember we are seeking help from customer setup, they unlikely agree to kill the process and produce the core dump for us to root cause it. It affects the customer’s business and nobody agrees with this proposal. -
+ ### What if we have a way to produce a core dump without a kill? -Yes, Glusterfs provides a way to do this. Gluster has customized ASSERT() i.e GF_ASSERT() in place which helps -in producing the core dump without killing the associated process and also provides a script which can be run on -the customer set up that produces the core dump without harming the running process (This presumes we already have -GF_ASSERT() at the expected place in the current build running on customer setup. If not, we need to install custom + +Yes, Glusterfs provides a way to do this. Gluster has customized ASSERT() i.e GF_ASSERT() in place which helps +in producing the core dump without killing the associated process and also provides a script which can be run on +the customer set up that produces the core dump without harming the running process (This presumes we already have +GF_ASSERT() at the expected place in the current build running on customer setup. If not, we need to install custom build on that setup by adding GF_ASSERT()). -
+ ### Is GF_ASSERT() newly introduced in Gluster code? -No. GF_ASSERT() is already there in the codebase before this improvement. In the debug build, GF_ASSERT() kills the -process and produces the core dump but in the production build, it just logs the error and moves on. What we have done -is we just changed the implementation of the code and now in production build also we get the core dump but the process + +No. GF_ASSERT() is already there in the codebase before this improvement. In the debug build, GF_ASSERT() kills the +process and produces the core dump but in the production build, it just logs the error and moves on. What we have done +is we just changed the implementation of the code and now in production build also we get the core dump but the process won’t be killed. The code places where GF_ASSERT() is not covered, please add it as per the requirement. -
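Before adding a new assertion, it can help to look at how GF_ASSERT() is already used in the tree. A minimal, illustrative sketch (assuming you are at the top of a glusterfs source checkout; the directories shown are just examples of where assertions commonly appear):

```console
# Show a sample of existing GF_ASSERT() call sites to copy the usual pattern from
grep -rn "GF_ASSERT(" libglusterfs/ xlators/ | head -n 20
```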
## Here are the steps to achieve the goal: -- Add GF_ASSERT() in the Gluster code path where you expect something wrong is happening. -- Build the Gluster code, install and mount the Gluster volume (For detailed steps refer: Gluster quick start guide). -- Now, in the other terminal, run the gfcore.py script - `# ./extras/debug/gfcore.py $PID 1 /tmp/` (PID of the gluster process you are interested in, got it by `ps -ef | grep gluster` - in the previous step. For more details, check `# ./extras/debug/gfcore.py --help`) -- Hit the code path where you have introduced GF_ASSERT(). If GF_ASSERT() is in fuse_write() path, you can hit the code - path by writing on to a file present under Gluster moun. Ex: `# dd if=/dev/zero of=/mnt/glustrefs/abcd bs=1M count=1` - where `/mnt/glusterfs` is the gluster mount -- Go to the terminal where the gdb is running (step 3) and observe that the gdb process is terminated -- Go to the directory where the core-dump is produced. Default would be present working directory. -- Access the core dump using gdb Ex: `# gdb -ex "core-file $GFCORE_FILE" $GLUSTER_BINARY` - (1st arg would be core file name and 2nd arg is o/p of file command in the previous step) -- Observe that the Gluster process is unaffected by checking its process state. Check pid status using `ps -ef | grep gluster` -
-Thanks, Xavi Hernandez(jahernan@redhat.com) for the idea. This will ease many Gluster developer's/maintainer’s life.
+
+- Add GF_ASSERT() in the Gluster code path where you expect something wrong is happening.
+- Build the Gluster code, install and mount the Gluster volume (for detailed steps, refer to the Gluster quick start guide).
+- Now, in another terminal, run the gfcore.py script:
+  `# ./extras/debug/gfcore.py $PID 1 /tmp/` (PID of the gluster process you are interested in, obtained with `ps -ef | grep gluster`
+  in the previous step. For more details, check `# ./extras/debug/gfcore.py --help`)
+- Hit the code path where you have introduced GF_ASSERT(). If GF_ASSERT() is in the fuse_write() path, you can hit the code
+  path by writing to a file present under the Gluster mount. Ex: `# dd if=/dev/zero of=/mnt/glusterfs/abcd bs=1M count=1`
+  where `/mnt/glusterfs` is the Gluster mount.
+- Go to the terminal where gdb is running (step 3) and observe that the gdb process has terminated.
+- Go to the directory where the core dump is produced. By default, this is the present working directory.
+- Access the core dump using gdb. Ex: `# gdb -ex "core-file $GFCORE_FILE" $GLUSTER_BINARY`
+  (the first argument is the core file name and the second is the output of the `file` command in the previous step)
+- Observe that the Gluster process is unaffected by checking its process state. Check the pid status using `ps -ef | grep gluster`.
+
+  Thanks to Xavi Hernandez (jahernan@redhat.com) for the idea. This will ease many Gluster developers'/maintainers' lives.
diff --git a/docs/GlusterFS-Tools/README.md b/docs/GlusterFS-Tools/README.md
index bafd575..5c4cbe9 100644
--- a/docs/GlusterFS-Tools/README.md
+++ b/docs/GlusterFS-Tools/README.md
@@ -1,5 +1,4 @@
-GlusterFS Tools
----------------
+## GlusterFS Tools

-- [glusterfind](./glusterfind.md)
-- [gfind missing files](./gfind-missing-files.md)
+- [glusterfind](./glusterfind.md)
+- [gfind missing files](./gfind-missing-files.md)
diff --git a/docs/GlusterFS-Tools/gfind-missing-files.md b/docs/GlusterFS-Tools/gfind-missing-files.md
index f7f9e08..1275d6a 100644
--- a/docs/GlusterFS-Tools/gfind-missing-files.md
+++ b/docs/GlusterFS-Tools/gfind-missing-files.md
@@ -54,15 +54,15 @@ bash gfid_to_path.sh

## Things to keep in mind when running the tool

-1. Running this tool can result in a crawl of the backend filesystem at each
- brick which can be intensive. To ensure there is no impact on ongoing I/O on
- RHS volumes, we recommend that this tool be run at a low I/O scheduling class
- (best-effort) and priority.
+1. Running this tool can result in a crawl of the backend filesystem at each
+ brick which can be intensive. To ensure there is no impact on ongoing I/O on
+ RHS volumes, we recommend that this tool be run at a low I/O scheduling class
+ (best-effort) and priority.

- ionice -c 2 -p
+ ionice -c 2 -p

-2. We do not recommend interrupting the tool when it is running
- (e.g. by doing CTRL^C). It is better to wait for the tool to finish
+2. We do not recommend interrupting the tool when it is running
+ (e.g. by doing CTRL^C). It is better to wait for the tool to finish
execution. In case it is interrupted, manually unmount the Slave Volume.
- umount
+ umount
diff --git a/docs/GlusterFS-Tools/glusterfind.md b/docs/GlusterFS-Tools/glusterfind.md
index 442b3f4..7424e3e 100644
--- a/docs/GlusterFS-Tools/glusterfind.md
+++ b/docs/GlusterFS-Tools/glusterfind.md
@@ -6,11 +6,23 @@ This tool should be run in one of the node, which will get Volume info and gets

## Session Management

-Create a glusterfind session to remember the time when last sync or processing complete. For example, your backup application runs every day and gets incremental results on each run. The tool maintains session in `$GLUSTERD_WORKDIR/glusterfind/`, for each session it creates and directory and creates a sub directory with Volume name. (Default working directory is /var/lib/glusterd, in some systems this location may change. To find Working dir location run `grep working-directory /etc/glusterfs/glusterd.vol` or `grep working-directory /usr/local/etc/glusterfs/glusterd.vol` if source install)
+Create a glusterfind session to remember the time when the last sync or processing completed. For example, your backup application runs every day and gets incremental results on each run. The tool maintains sessions in `$GLUSTERD_WORKDIR/glusterfind/`; for each session it creates a directory and a sub directory with the Volume name. (The default working directory is /var/lib/glusterd, but on some systems this location may change. To find the working directory location, run
+
+```console
+grep working-directory /etc/glusterfs/glusterd.vol
+```
+
+or
+
+```console
+grep working-directory /usr/local/etc/glusterfs/glusterd.vol
+```
+
+if you installed from source.)

For example, if the session name is "backup" and volume name is "datavol", then the tool creates `$GLUSTERD_WORKDIR/glusterfind/backup/datavol`. Now onwards we refer this directory as `$SESSION_DIR`.

-```text
+```{ .text .no-copy }
create => pre => post => [delete]
```

@@ -34,13 +46,13 @@ Incremental find uses Changelogs to get the list of GFIDs modified/created. Any

If we set build-pgfid option in Volume GlusterFS starts recording each files parent directory GFID as xattr in file on any ENTRY fop.

-```text
+```{ .text .no-copy }
trusted.pgfid.=NUM_LINKS
```

To convert from GFID to path, we can mount Volume with aux-gfid-mount option, and get Path information by a getfattr query.

-```console
+```{ .console .no-copy }
getfattr -n glusterfs.ancestry.path -e text /mnt/datavol/.gfid/
```

@@ -54,7 +66,7 @@ Tool collects the list of GFIDs failed to convert with above method and does a f

### Create the session

-```console
+```{ .console .no-copy }
glusterfind create SESSION_NAME VOLNAME [--force]
glusterfind create --help
```

@@ -63,7 +75,7 @@ Where, SESSION_NAME is any name without space to identify when run second time.

Examples,

-```console
+```{ .console .no-copy }
# glusterfind create --help
# glusterfind create backup datavol
# glusterfind create antivirus_scanner datavol
@@ -72,7 +84,7 @@ Examples,

### Pre Command

-```console
+```{ .console .no-copy }
glusterfind pre SESSION_NAME VOLUME_NAME OUTFILE
glusterfind pre --help
```

@@ -83,7 +95,7 @@ To trigger the full find, call the pre command with `--full` argument.
Multiple Examples, -```console +```{ .console .no-copy } # glusterfind pre backup datavol /root/backup.txt # glusterfind pre backup datavol /root/backup.txt --full @@ -97,27 +109,27 @@ Examples, Output file contains list of files/dirs relative to the Volume mount, if we need to prefix with any path to have absolute path then, ```console -# glusterfind pre backup datavol /root/backup.txt --file-prefix=/mnt/datavol/ +glusterfind pre backup datavol /root/backup.txt --file-prefix=/mnt/datavol/ ``` ### List Command To get the list of sessions and respective session time, -```console +```{ .console .no-copy } glusterfind list [--session SESSION_NAME] [--volume VOLUME_NAME] ``` Examples, -```console +```{ .console .no-copy } # glusterfind list # glusterfind list --session backup ``` Example output, -```console +```{ .text .no-copy } SESSION VOLUME SESSION TIME --------------------------------------------------------------------------- backup datavol 2015-03-04 17:35:34 @@ -125,26 +137,26 @@ backup datavol 2015-03-04 17:35:34 ### Post Command -```console +```{ .console .no-copy } glusterfind post SESSION_NAME VOLUME_NAME ``` Examples, ```console -# glusterfind post backup datavol +glusterfind post backup datavol ``` ### Delete Command -```console +```{ .console .no-copy } glusterfind delete SESSION_NAME VOLUME_NAME ``` Examples, ```console -# glusterfind delete backup datavol +glusterfind delete backup datavol ``` ## Adding more Crawlers @@ -170,7 +182,7 @@ Custom crawler can be executable script/binary which accepts volume name, brick For example, -```console +```{ .console .no-copy } /root/parallelbrickcrawl SESSION_NAME VOLUME BRICK_PATH OUTFILE START_TIME [--debug] ``` diff --git a/docs/Install-Guide/Configure.md b/docs/Install-Guide/Configure.md index d06a24b..9d475d9 100644 --- a/docs/Install-Guide/Configure.md +++ b/docs/Install-Guide/Configure.md @@ -3,6 +3,7 @@ For the Gluster to communicate within a cluster either the firewalls have to be turned off or enable communication for each server. + ```{ .console .no-copy } iptables -I INPUT -p all -s `` -j ACCEPT ``` @@ -115,14 +116,12 @@ Brick3: node03.yourdomain.net:/export/sdb1/brick ``` This shows us essentially what we just specified during the volume -creation. The one this to mention is the `Status`. A status of `Created` -means that the volume has been created, but hasn’t yet been started, -which would cause any attempt to mount the volume fail. +creation. The one key output worth noticing is `Status`. +A status of `Created` means that the volume has been created, +but hasn’t yet been started, which would cause any attempt to mount the volume fail. -Now, we should start the volume. +Now, we should start the volume before we try to mount it. ```console gluster volume start gv0 ``` - -Find all documentation [here](../index.md) diff --git a/docs/Ops-Guide/Overview.md b/docs/Ops-Guide/Overview.md index 743c9dc..fa2374a 100644 --- a/docs/Ops-Guide/Overview.md +++ b/docs/Ops-Guide/Overview.md @@ -6,7 +6,7 @@ planning but the growth has mostly been ad-hoc and need-based. Central to the plan of revitalizing the Gluster.org community is the ability to provide well-maintained infrastructure services with predictable uptimes and -resilience. We're migrating the existing services into the Community Cage. The +resilience. We're migrating the existing services into the Community Cage. 
The implied objective is that the transition would open up ways and means of the formation of a loose coalition among Infrastructure Administrators who provide expertise and guidance to the community projects within the OSAS team. diff --git a/docs/Ops-Guide/Tools.md b/docs/Ops-Guide/Tools.md index e2b5bb8..287e74a 100644 --- a/docs/Ops-Guide/Tools.md +++ b/docs/Ops-Guide/Tools.md @@ -1,23 +1,24 @@ ## Tools We Use -| Service/Tool | Purpose | Hosted At | -|----------------------|----------------------------------------------------|-----------------| -| Github | Code Review | Github | -| Jenkins | CI, build-verification-test | Temporary Racks | -| Backups | Website, Gerrit and Jenkins backup | Rackspace | -| Docs | Documentation content | mkdocs.org | -| download.gluster.org | Official download site of the binaries | Rackspace | -| Mailman | Lists mailman | Rackspace | -| www.gluster.org | Web asset | Rackspace | +| Service/Tool | Purpose | Hosted At | +| :------------------- | :------------------------------------: | --------------: | +| Github | Code Review | Github | +| Jenkins | CI, build-verification-test | Temporary Racks | +| Backups | Website, Gerrit and Jenkins backup | Rackspace | +| Docs | Documentation content | mkdocs.org | +| download.gluster.org | Official download site of the binaries | Rackspace | +| Mailman | Lists mailman | Rackspace | +| www.gluster.org | Web asset | Rackspace | ## Notes -* download.gluster.org: Resiliency is important for availability and metrics. + +- download.gluster.org: Resiliency is important for availability and metrics. Since it's official download, access need to restricted as much as possible. Few developers building the community packages have access. If anyone requires access can raise an issue at [gluster/project-infrastructure](https://github.com/gluster/project-infrastructure/issues/new) with valid reason -* Mailman: Should be migrated to a separate host. Should be made more redundant +- Mailman: Should be migrated to a separate host. Should be made more redundant (ie, more than 1 MX). -* www.gluster.org: Framework, Artifacts now exist under gluster.github.com. Has +- www.gluster.org: Framework, Artifacts now exist under gluster.github.com. Has various legacy installation of software (mediawiki, etc ), being cleaned as we find them. diff --git a/docs/Troubleshooting/README.md b/docs/Troubleshooting/README.md index 0741662..4ec0122 100644 --- a/docs/Troubleshooting/README.md +++ b/docs/Troubleshooting/README.md @@ -1,9 +1,8 @@ -Troubleshooting Guide ---------------------- +## Troubleshooting Guide + This guide describes some commonly seen issues and steps to recover from them. If that doesn’t help, reach out to the [Gluster community](https://www.gluster.org/community/), in which case the guide also describes what information needs to be provided in order to debug the issue. At minimum, we need the version of gluster running and the output of `gluster volume info`. - ### Where Do I Start? Is the issue already listed in the component specific troubleshooting sections? @@ -15,7 +14,6 @@ Is the issue already listed in the component specific troubleshooting sections? - [Gluster NFS Issues](./troubleshooting-gnfs.md) - [File Locks](./troubleshooting-filelocks.md) - If that didn't help, here is how to debug further. Identifying the problem and getting the necessary information to diagnose it is the first step in troubleshooting your Gluster setup. 
As Gluster operations involve interactions between multiple processes, this can involve multiple steps. @@ -25,5 +23,3 @@ Identifying the problem and getting the necessary information to diagnose it is - An operation failed - [High Memory Usage](./troubleshooting-memory.md) - [A Gluster process crashed](./gluster-crash.md) - - diff --git a/docs/Troubleshooting/gfid-to-path.md b/docs/Troubleshooting/gfid-to-path.md index 275fb71..3a25a1b 100644 --- a/docs/Troubleshooting/gfid-to-path.md +++ b/docs/Troubleshooting/gfid-to-path.md @@ -8,24 +8,26 @@ normal filesystem. The GFID of a file is stored in its xattr named #### Special mount using gfid-access translator: ```console -# mount -t glusterfs -o aux-gfid-mount vm1:test /mnt/testvol +mount -t glusterfs -o aux-gfid-mount vm1:test /mnt/testvol ``` Assuming, you have `GFID` of a file from changelog (or somewhere else). For trying this out, you can get `GFID` of a file from mountpoint: ```console -# getfattr -n glusterfs.gfid.string /mnt/testvol/dir/file +getfattr -n glusterfs.gfid.string /mnt/testvol/dir/file ``` --- + ### Get file path from GFID (Method 1): + **(Lists hardlinks delimited by `:`, returns path as seen from mountpoint)** #### Turn on build-pgfid option ```console -# gluster volume set test build-pgfid on +gluster volume set test build-pgfid on ``` Read virtual xattr `glusterfs.ancestry.path` which contains the file path @@ -36,7 +38,7 @@ getfattr -n glusterfs.ancestry.path -e text /mnt/testvol/.gfid/ **Example:** -```console +```{ .console .no-copy } [root@vm1 glusterfs]# ls -il /mnt/testvol/dir/ total 1 10610563327990022372 -rw-r--r--. 2 root root 3 Jul 17 18:05 file @@ -54,6 +56,7 @@ glusterfs.ancestry.path="/dir/file:/dir/file3" ``` ### Get file path from GFID (Method 2): + **(Does not list all hardlinks, returns backend brick path)** ```console @@ -70,4 +73,5 @@ trusted.glusterfs.pathinfo="( info` This lists all the files that require healing (and will be processed by the self-heal daemon). It prints either their path or their GFID. ### Interpreting the output + All the files listed in the output of this command need to be healed. The files listed may also be accompanied by the following tags: a) 'Is in split-brain' -A file in data or metadata split-brain will -be listed with " - Is in split-brain" appended after its path/GFID. E.g. +A file in data or metadata split-brain will +be listed with " - Is in split-brain" appended after its path/GFID. E.g. "/file4" in the output provided below. However, for a file in GFID split-brain, - the parent directory of the file is shown to be in split-brain and the file -itself is shown to be needing healing, e.g. "/dir" in the output provided below +the parent directory of the file is shown to be in split-brain and the file +itself is shown to be needing healing, e.g. "/dir" in the output provided below is in split-brain because of GFID split-brain of file "/dir/a". Files in split-brain cannot be healed without resolving the split-brain. @@ -36,11 +37,13 @@ b) 'Is possibly undergoing heal' When the heal info command is run, it (or to be more specific, the 'glfsheal' binary that is executed when you run the command) takes locks on each file to find if it needs healing. However, if the self-heal daemon had already started healing the file, it would have taken locks which glfsheal wouldn't be able to acquire. In such a case, it could print this message. Another possible case could be multiple glfsheal processes running simultaneously (e.g. 
multiple users ran a heal info command at the same time) and competing for same lock. The following is an example of heal info command's output. + ### Example + Consider a replica volume "test" with two bricks b1 and b2; self-heal daemon off, mounted at /mnt. -```console +```{ .console .no-copy } # gluster volume heal test info Brick \ - Is in split-brain @@ -63,24 +66,27 @@ Number of entries: 6 ``` ### Analysis of the output -It can be seen that -A) from brick b1, four entries need healing: -      1) file with gfid:6dc78b20-7eb6-49a3-8edb-087b90142246 needs healing -      2) "aaca219f-0e25-4576-8689-3bfd93ca70c2", -"39f301ae-4038-48c2-a889-7dac143e82dd" and "c3c94de2-232d-4083-b534-5da17fc476ac" - are in split-brain -B) from brick b2 six entries need healing- -      1) "a", "file2" and "file3" need healing -      2) "file1", "file4" & "/dir" are in split-brain +It can be seen that + +A) from brick b1, four entries need healing: + +- file with gfid:6dc78b20-7eb6-49a3-8edb-087b90142246 needs healing +- "aaca219f-0e25-4576-8689-3bfd93ca70c2", "39f301ae-4038-48c2-a889-7dac143e82dd" and "c3c94de2-232d-4083-b534-5da17fc476ac" are in split-brain + +B) from brick b2 six entries need healing- + +- "a", "file2" and "file3" need healing +- "file1", "file4" & "/dir" are in split-brain # 2. Volume heal info split-brain + Usage: `gluster volume heal info split-brain` This command only shows the list of files that are in split-brain. The output is therefore a subset of `gluster volume heal info` ### Example -```console +```{ .console .no-copy } # gluster volume heal test info split-brain Brick @@ -95,19 +101,22 @@ Brick Number of entries in split-brain: 3 ``` -Note that similar to the heal info command, for GFID split-brains (same filename but different GFID) +Note that similar to the heal info command, for GFID split-brains (same filename but different GFID) their parent directories are listed to be in split-brain. # 3. Resolution of split-brain using gluster CLI + Once the files in split-brain are identified, their resolution can be done from the gluster command line using various policies. Type-mismatch cannot be healed using this methods. Split-brain resolution commands let the user resolve data, metadata, and GFID split-brains. ## 3.1 Resolution of data/metadata split-brain using gluster CLI + Data and metadata split-brains can be resolved using the following policies: ## i) Select the bigger-file as source + This command is useful for per file healing where it is known/decided that the -file with bigger size is to be considered as source. +file with bigger size is to be considered as source. `gluster volume heal split-brain bigger-file ` Here, `` can be either the full file name as seen from the root of the volume (or) the GFID-string representation of the file, which sometimes gets displayed @@ -115,13 +124,14 @@ in the heal info command's output. Once this command is executed, the replica co size is found and healing is completed with that brick as a source. ### Example : + Consider the earlier output of the heal info split-brain command. 
-Before healing the file, notice file size and md5 checksums : +Before healing the file, notice file size and md5 checksums : On brick b1: -```console +```{ .console .no-copy } [brick1]# stat b1/dir/file1 File: ‘b1/dir/file1’ Size: 17 Blocks: 16 IO Block: 4096 regular file @@ -138,7 +148,7 @@ Change: 2015-03-06 13:55:37.206880347 +0530 On brick b2: -```console +```{ .console .no-copy } [brick2]# stat b2/dir/file1 File: ‘b2/dir/file1’ Size: 13 Blocks: 16 IO Block: 4096 regular file @@ -153,7 +163,7 @@ Change: 2015-03-06 13:52:22.910758923 +0530 cb11635a45d45668a403145059c2a0d5 b2/dir/file1 ``` -**Healing file1 using the above command** :- +**Healing file1 using the above command** :- `gluster volume heal test split-brain bigger-file /dir/file1` Healed /dir/file1. @@ -161,7 +171,7 @@ After healing is complete, the md5sum and file size on both bricks should be the On brick b1: -```console +```{ .console .no-copy } [brick1]# stat b1/dir/file1 File: ‘b1/dir/file1’ Size: 17 Blocks: 16 IO Block: 4096 regular file @@ -178,7 +188,7 @@ Change: 2015-03-06 14:17:12.880343950 +0530 On brick b2: -```console +```{ .console .no-copy } [brick2]# stat b2/dir/file1 File: ‘b2/dir/file1’ Size: 17 Blocks: 16 IO Block: 4096 regular file @@ -195,7 +205,7 @@ Change: 2015-03-06 14:17:12.881343955 +0530 ## ii) Select the file with the latest mtime as source -```console +```{ .console .no-copy } gluster volume heal split-brain latest-mtime ``` @@ -203,20 +213,21 @@ As is perhaps self-explanatory, this command uses the brick having the latest mo ## iii) Select one of the bricks in the replica as the source for a particular file -```console +```{ .console .no-copy } gluster volume heal split-brain source-brick ``` Here, `` is selected as source brick and `` present in the source brick is taken as the source for healing. ### Example : + Notice the md5 checksums and file size before and after healing. Before heal : On brick b1: -```console +```{ .console .no-copy } [brick1]# stat b1/file4 File: ‘b1/file4’ Size: 4 Blocks: 16 IO Block: 4096 regular file @@ -233,7 +244,7 @@ b6273b589df2dfdbd8fe35b1011e3183 b1/file4 On brick b2: -```console +```{ .console .no-copy } [brick2]# stat b2/file4 File: ‘b2/file4’ Size: 4 Blocks: 16 IO Block: 4096 regular file @@ -251,7 +262,7 @@ Change: 2015-03-06 13:52:35.769833142 +0530 **Healing the file with gfid c3c94de2-232d-4083-b534-5da17fc476ac using the above command** : ```console -# gluster volume heal test split-brain source-brick test-host:/test/b1 gfid:c3c94de2-232d-4083-b534-5da17fc476ac +gluster volume heal test split-brain source-brick test-host:/test/b1 gfid:c3c94de2-232d-4083-b534-5da17fc476ac ``` Healed gfid:c3c94de2-232d-4083-b534-5da17fc476ac. @@ -260,7 +271,7 @@ After healing : On brick b1: -```console +```{ .console .no-copy } # stat b1/file4 File: ‘b1/file4’ Size: 4 Blocks: 16 IO Block: 4096 regular file @@ -276,7 +287,7 @@ b6273b589df2dfdbd8fe35b1011e3183 b1/file4 On brick b2: -```console +```{ .console .no-copy } # stat b2/file4 File: ‘b2/file4’ Size: 4 Blocks: 16 IO Block: 4096 regular file @@ -292,7 +303,7 @@ b6273b589df2dfdbd8fe35b1011e3183 b2/file4 ## iv) Select one brick of the replica as the source for all files -```console +```{ .console .no-copy } gluster volume heal split-brain source-brick ``` @@ -301,9 +312,10 @@ replica pair is source. As the result of the above command all split-brained files in `` are selected as source and healed to the sink. ### Example: + Consider a volume having three entries "a, b and c" in split-brain. 
-```console +```{ .console .no-copy } # gluster volume heal test split-brain source-brick test-host:/test/b1 Healed gfid:944b4764-c253-4f02-b35f-0d0ae2f86c0f. Healed gfid:3256d814-961c-4e6e-8df2-3a3143269ced. @@ -312,19 +324,24 @@ Number of healed entries: 3 ``` # 3.2 Resolution of GFID split-brain using gluster CLI + GFID split-brains can also be resolved by the gluster command line using the same policies that are used to resolve data and metadata split-brains. ## i) Selecting the bigger-file as source + This method is useful for per file healing and where you can decided that the file with bigger size is to be considered as source. Run the following command to obtain the path of the file that is in split-brain: -```console + +```{ .console .no-copy } # gluster volume heal VOLNAME info split-brain ``` From the output, identify the files for which file operations performed from the client failed with input/output error. + ### Example : -```console + +```{ .console .no-copy } # gluster volume heal testvol info Brick 10.70.47.45:/bricks/brick2/b0 /f5 @@ -340,19 +357,22 @@ Brick 10.70.47.144:/bricks/brick2/b1 Status: Connected Number of entries: 2 ``` + > **Note** > Entries which are in GFID split-brain may not be shown as in split-brain by the heal info or heal info split-brain commands always. For entry split-brains, it is the parent directory which is shown as being in split-brain. So one might need to run info split-brain to get the dir names and then heal info to get the list of files under that dir which might be in split-brain (it could just be needing heal without split-brain). In the above command, testvol is the volume name, b0 and b1 are the bricks. Execute the below getfattr command on the brick to fetch information if a file is in GFID split-brain or not. -```console +```{ .console .no-copy } # getfattr -d -e hex -m. ``` ### Example : + On brick /b0 -```console + +```{ .console .no-copy } # getfattr -d -m . -e hex /bricks/brick2/b0/f5 getfattr: Removing leading '/' from absolute path names file: bricks/brick2/b0/f5 @@ -364,7 +384,8 @@ trusted.gfid2path.9cde09916eabc845=0x30303030303030302d303030302d303030302d30303 ``` On brick /b1 -```console + +```{ .console .no-copy } # getfattr -d -m . -e hex /bricks/brick2/b1/f5 getfattr: Removing leading '/' from absolute path names file: bricks/brick2/b1/f5 @@ -379,7 +400,8 @@ You can notice the difference in GFID for the file f5 in both the bricks. You can find the differences in the file size by executing stat command on the file from the bricks. On brick /b0 -```console + +```{ .console .no-copy } # stat /bricks/brick2/b0/f5 File: ‘/bricks/brick2/b0/f5’ Size: 15 Blocks: 8 IO Block: 4096 regular file @@ -393,7 +415,8 @@ Birth: - ``` On brick /b1 -```console + +```{ .console .no-copy } # stat /bricks/brick2/b1/f5 File: ‘/bricks/brick2/b1/f5’ Size: 2 Blocks: 8 IO Block: 4096 regular file @@ -408,12 +431,13 @@ Birth: - Execute the following command along with the full filename as seen from the root of the volume which is displayed in the heal info command's output: -```console +```{ .console .no-copy } # gluster volume heal VOLNAME split-brain bigger-file FILE ``` ### Example : -```console + +```{ .console .no-copy } # gluster volume heal testvol split-brain bigger-file /f5 GFID split-brain resolved for file /f5 ``` @@ -421,7 +445,8 @@ GFID split-brain resolved for file /f5 After the healing is complete, the GFID of the file on both the bricks must be the same as that of the file which had the bigger size. 
The following is a sample output of the getfattr command after completion of healing the file. On brick /b0 -```console + +```{ .console .no-copy } # getfattr -d -m . -e hex /bricks/brick2/b0/f5 getfattr: Removing leading '/' from absolute path names file: bricks/brick2/b0/f5 @@ -431,7 +456,8 @@ trusted.gfid2path.9cde09916eabc845=0x30303030303030302d303030302d303030302d30303 ``` On brick /b1 -```console + +```{ .console .no-copy } # getfattr -d -m . -e hex /bricks/brick2/b1/f5 getfattr: Removing leading '/' from absolute path names file: bricks/brick2/b1/f5 @@ -441,14 +467,16 @@ trusted.gfid2path.9cde09916eabc845=0x30303030303030302d303030302d303030302d30303 ``` ## ii) Selecting the file with latest mtime as source + This method is useful for per file healing and if you want the file with latest mtime has to be considered as source. ### Example : + Lets take another file which is in GFID split-brain and try to heal that using the latest-mtime option. On brick /b0 -```console +```{ .console .no-copy } # getfattr -d -m . -e hex /bricks/brick2/b0/f4 getfattr: Removing leading '/' from absolute path names file: bricks/brick2/b0/f4 @@ -460,7 +488,8 @@ trusted.gfid2path.364f55367c7bd6f4=0x30303030303030302d303030302d303030302d30303 ``` On brick /b1 -```console + +```{ .console .no-copy } # getfattr -d -m . -e hex /bricks/brick2/b1/f4 getfattr: Removing leading '/' from absolute path names file: bricks/brick2/b1/f4 @@ -475,7 +504,8 @@ You can notice the difference in GFID for the file f4 in both the bricks. You can find the difference in the modification time by executing stat command on the file from the bricks. On brick /b0 -```console + +```{ .console .no-copy } # stat /bricks/brick2/b0/f4 File: ‘/bricks/brick2/b0/f4’ Size: 14 Blocks: 8 IO Block: 4096 regular file @@ -489,7 +519,8 @@ Birth: - ``` On brick /b1 -```console + +```{ .console .no-copy } # stat /bricks/brick2/b1/f4 File: ‘/bricks/brick2/b1/f4’ Size: 2 Blocks: 8 IO Block: 4096 regular file @@ -503,12 +534,14 @@ Birth: - ``` Execute the following command: -```console + +```{ .console .no-copy } # gluster volume heal VOLNAME split-brain latest-mtime FILE ``` ### Example : -```console + +```{ .console .no-copy } # gluster volume heal testvol split-brain latest-mtime /f4 GFID split-brain resolved for file /f4 ``` @@ -516,7 +549,9 @@ GFID split-brain resolved for file /f4 After the healing is complete, the GFID of the files on both bricks must be same. The following is a sample output of the getfattr command after completion of healing the file. You can notice that the file has been healed using the brick having the latest mtime as the source. On brick /b0 -```console# getfattr -d -m . -e hex /bricks/brick2/b0/f4 + +```{ .console .no-copy } +# getfattr -d -m . -e hex /bricks/brick2/b0/f4 getfattr: Removing leading '/' from absolute path names file: bricks/brick2/b0/f4 security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000 @@ -525,7 +560,8 @@ trusted.gfid2path.364f55367c7bd6f4=0x30303030303030302d303030302d303030302d30303 ``` On brick /b1 -```console + +```{ .console .no-copy } # getfattr -d -m . -e hex /bricks/brick2/b1/f4 getfattr: Removing leading '/' from absolute path names file: bricks/brick2/b1/f4 @@ -535,13 +571,16 @@ trusted.gfid2path.364f55367c7bd6f4=0x30303030303030302d303030302d303030302d30303 ``` ## iii) Select one of the bricks in the replica as source for a particular file + This method is useful for per file healing and if you know which copy of the file is good. 
### Example : + Lets take another file which is in GFID split-brain and try to heal that using the source-brick option. On brick /b0 -```console + +```{ .console .no-copy } # getfattr -d -m . -e hex /bricks/brick2/b0/f3 getfattr: Removing leading '/' from absolute path names file: bricks/brick2/b0/f3 @@ -553,7 +592,8 @@ trusted.gfid2path.364f55367c7bd6f4=0x30303030303030302d303030302d303030302d30303 ``` On brick /b1 -```console + +```{ .console .no-copy } # getfattr -d -m . -e hex /bricks/brick2/b1/f3 getfattr: Removing leading '/' from absolute path names file: bricks/brick2/b0/f3 @@ -567,14 +607,16 @@ trusted.gfid2path.364f55367c7bd6f4=0x30303030303030302d303030302d303030302d30303 You can notice the difference in GFID for the file f3 in both the bricks. Execute the following command: -```console + +```{ .console .no-copy } # gluster volume heal VOLNAME split-brain source-brick HOSTNAME:export-directory-absolute-path FILE ``` In this command, FILE present in HOSTNAME : export-directory-absolute-path is taken as source for healing. ### Example : -```console + +```{ .console .no-copy } # gluster volume heal testvol split-brain source-brick 10.70.47.144:/bricks/brick2/b1 /f3 GFID split-brain resolved for file /f3 ``` @@ -582,7 +624,8 @@ GFID split-brain resolved for file /f3 After the healing is complete, the GFID of the file on both the bricks should be same as that of the brick which was chosen as source for healing. The following is a sample output of the getfattr command after the file is healed. On brick /b0 -```console + +```{ .console .no-copy } # getfattr -d -m . -e hex /bricks/brick2/b0/f3 getfattr: Removing leading '/' from absolute path names file: bricks/brick2/b0/f3 @@ -592,7 +635,8 @@ trusted.gfid2path.364f55367c7bd6f4=0x30303030303030302d303030302d303030302d30303 ``` On brick /b1 -```console + +```{ .console .no-copy } # getfattr -d -m . -e hex /bricks/brick2/b1/f3 getfattr: Removing leading '/' from absolute path names file: bricks/brick2/b1/f3 @@ -602,19 +646,22 @@ trusted.gfid2path.364f55367c7bd6f4=0x30303030303030302d303030302d303030302d30303 ``` > **Note** ->- One cannot use the GFID of the file as an argument with any of the CLI options to resolve GFID split-brain. It should be the absolute path as seen from the mount point to the file considered as source. > ->- With source-brick option there is no way to resolve all the GFID split-brain in one shot by not specifying any file path in the CLI as done while resolving data or metadata split-brain. For each file in GFID split-brain, run the CLI with the policy you want to use. +> - One cannot use the GFID of the file as an argument with any of the CLI options to resolve GFID split-brain. It should be the absolute path as seen from the mount point to the file considered as source. > ->- Resolving directory GFID split-brain using CLI with the "source-brick" option in a "distributed-replicated" volume needs to be done on all the sub-volumes explicitly, which are in this state. Since directories get created on all the sub-volumes, using one particular brick as source for directory GFID split-brain heals the directory for that particular sub-volume. Source brick should be chosen in such a way that after heal all the bricks of all the sub-volumes have the same GFID. +> - With source-brick option there is no way to resolve all the GFID split-brain in one shot by not specifying any file path in the CLI as done while resolving data or metadata split-brain. 
For each file in GFID split-brain, run the CLI with the policy you want to use. +> +> - Resolving directory GFID split-brain using CLI with the "source-brick" option in a "distributed-replicated" volume needs to be done on all the sub-volumes explicitly, which are in this state. Since directories get created on all the sub-volumes, using one particular brick as source for directory GFID split-brain heals the directory for that particular sub-volume. Source brick should be chosen in such a way that after heal all the bricks of all the sub-volumes have the same GFID. ## Note: + As mentioned earlier, type-mismatch can not be resolved using CLI. Type-mismatch means different st_mode values (for example, the entry is a file in one brick while it is a directory on the other). Trying to heal such entry would fail. ### Example + The entry named "entry1" is of different types on the bricks of the replica. Lets try to heal that using the split-brain CLI. -```console +```{ .console .no-copy } # gluster volume heal test split-brain source-brick test-host:/test/b1 /entry1 Healing /entry1 failed:Operation not permitted. Volume heal failed. @@ -623,22 +670,23 @@ Volume heal failed. However, they can be fixed by deleting the file from all but one bricks. See [Fixing Directory entry split-brain](#dir-split-brain) # An overview of working of heal info commands -When these commands are invoked, a "glfsheal" process is spawned which reads -the entries from the various sub-directories under `//.glusterfs/indices/` of all -the bricks that are up (that it can connect to) one after another. These -entries are GFIDs of files that might need healing. Once GFID entries from a -brick are obtained, based on the lookup response of this file on each -participating brick of replica-pair & trusted.afr.* extended attributes it is -found out if the file needs healing, is in split-brain etc based on the + +When these commands are invoked, a "glfsheal" process is spawned which reads +the entries from the various sub-directories under `//.glusterfs/indices/` of all +the bricks that are up (that it can connect to) one after another. These +entries are GFIDs of files that might need healing. Once GFID entries from a +brick are obtained, based on the lookup response of this file on each +participating brick of replica-pair & trusted.afr.\* extended attributes it is +found out if the file needs healing, is in split-brain etc based on the requirement of each command and displayed to the user. - # 4. Resolution of split-brain from the mount point + A set of getfattr and setfattr commands have been provided to detect the data and metadata split-brain status of a file and resolve split-brain, if any, from mount point. Consider a volume "test", having bricks b0, b1, b2 and b3. -```console +```{ .console .no-copy } # gluster volume info test Volume Name: test @@ -656,7 +704,7 @@ Brick4: test-host:/test/b3 Directory structure of the bricks is as follows: -```console +```{ .console .no-copy } # tree -R /test/b? /test/b0 ├── dir @@ -683,7 +731,7 @@ Directory structure of the bricks is as follows: Some files in the volume are in split-brain. 
-```console +```{ .console .no-copy } # gluster v heal test info split-brain Brick test-host:/test/b0/ /file100 @@ -708,7 +756,7 @@ Number of entries in split-brain: 2 ### To know data/metadata split-brain status of a file: -```console +```{ .console .no-copy } getfattr -n replica.split-brain-status ``` @@ -716,50 +764,52 @@ The above command executed from mount provides information if a file is in data/ This command is not applicable to gfid/directory split-brain. ### Example: -1) "file100" is in metadata split-brain. Executing the above mentioned command for file100 gives : -```console +1. "file100" is in metadata split-brain. Executing the above mentioned command for file100 gives : + +```{ .console .no-copy } # getfattr -n replica.split-brain-status file100 file: file100 replica.split-brain-status="data-split-brain:no metadata-split-brain:yes Choices:test-client-0,test-client-1" ``` -2) "file1" is in data split-brain. +2. "file1" is in data split-brain. -```console +```{ .console .no-copy } # getfattr -n replica.split-brain-status file1 file: file1 replica.split-brain-status="data-split-brain:yes metadata-split-brain:no Choices:test-client-2,test-client-3" ``` -3) "file99" is in both data and metadata split-brain. +3. "file99" is in both data and metadata split-brain. -```console +```{ .console .no-copy } # getfattr -n replica.split-brain-status file99 file: file99 replica.split-brain-status="data-split-brain:yes metadata-split-brain:yes Choices:test-client-2,test-client-3" ``` -4) "dir" is in directory split-brain but as mentioned earlier, the above command is not applicable to such split-brain. So it says that the file is not under data or metadata split-brain. +4. "dir" is in directory split-brain but as mentioned earlier, the above command is not applicable to such split-brain. So it says that the file is not under data or metadata split-brain. -```console +```{ .console .no-copy } # getfattr -n replica.split-brain-status dir file: dir replica.split-brain-status="The file is not under data or metadata split-brain" ``` -5) "file2" is not in any kind of split-brain. +5. "file2" is not in any kind of split-brain. -```console +```{ .console .no-copy } # getfattr -n replica.split-brain-status file2 file: file2 replica.split-brain-status="The file is not under data or metadata split-brain" ``` ### To analyze the files in data and metadata split-brain + Trying to do operations (say cat, getfattr etc) from the mount on files in split-brain, gives an input/output error. To enable the users analyze such files, a setfattr command is provided. -```console +```{ .console .no-copy } # setfattr -n replica.split-brain-choice -v "choiceX" ``` @@ -767,9 +817,9 @@ Using this command, a particular brick can be chosen to access the file in split ### Example: -1) "file1" is in data-split-brain. Trying to read from the file gives input/output error. +1. "file1" is in data-split-brain. Trying to read from the file gives input/output error. -```console +```{ .console .no-copy } # cat file1 cat: file1: Input/output error ``` @@ -778,13 +828,13 @@ Split-brain choices provided for file1 were test-client-2 and test-client-3. Setting test-client-2 as split-brain choice for file1 serves reads from b2 for the file. -```console +```{ .console .no-copy } # setfattr -n replica.split-brain-choice -v test-client-2 file1 ``` Now, read operations on the file can be done. 
-```console +```{ .console .no-copy } # cat file1 xyz ``` @@ -793,18 +843,18 @@ Similarly, to inspect the file from other choice, replica.split-brain-choice is Trying to inspect the file from a wrong choice errors out. -To undo the split-brain-choice that has been set, the above mentioned setfattr command can be used +To undo the split-brain-choice that has been set, the above mentioned setfattr command can be used with "none" as the value for extended attribute. ### Example: -```console +```{ .console .no-copy } # setfattr -n replica.split-brain-choice -v none file1 ``` Now performing cat operation on the file will again result in input/output error, as before. -```console +```{ .console .no-copy } # cat file cat: file1: Input/output error ``` @@ -812,13 +862,13 @@ cat: file1: Input/output error Once the choice for resolving split-brain is made, source brick is supposed to be set for the healing to be done. This is done using the following command: -```console +```{ .console .no-copy } # setfattr -n replica.split-brain-heal-finalize -v ``` ## Example -```console +```{ .console .no-copy } # setfattr -n replica.split-brain-heal-finalize -v test-client-2 file1 ``` @@ -826,18 +876,19 @@ The above process can be used to resolve data and/or metadata split-brain on all **NOTE**: -1) If "fopen-keep-cache" fuse mount option is disabled then inode needs to be invalidated each time before selecting a new replica.split-brain-choice to inspect a file. This can be done by using: +1. If "fopen-keep-cache" fuse mount option is disabled then inode needs to be invalidated each time before selecting a new replica.split-brain-choice to inspect a file. This can be done by using: -```console +```{ .console .no-copy } # sefattr -n inode-invalidate -v 0 ``` -2) The above mentioned process for split-brain resolution from mount will not work on nfs mounts as it doesn't provide xattrs support. +2. The above mentioned process for split-brain resolution from mount will not work on nfs mounts as it doesn't provide xattrs support. # 5. Automagic unsplit-brain by [ctime|mtime|size|majority] -The CLI and fuse mount based resolution methods require intervention in the sense that the admin/ user needs to run the commands manually. There is a `cluster.favorite-child-policy` volume option which when set to one of the various policies available, automatically resolve split-brains without user intervention. The default value is 'none', i.e. it is disabled. -```console +The CLI and fuse mount based resolution methods require intervention in the sense that the admin/ user needs to run the commands manually. There is a `cluster.favorite-child-policy` volume option which when set to one of the various policies available, automatically resolve split-brains without user intervention. The default value is 'none', i.e. it is disabled. + +```{ .console .no-copy } # gluster volume set help | grep -A3 cluster.favorite-child-policy Option: cluster.favorite-child-policy Default Value: none @@ -846,40 +897,41 @@ Description: This option can be used to automatically resolve split-brains using `cluster.favorite-child-policy` applies to all files of the volume. It is assumed that if this option is enabled with a particular policy, you don't care to examine the split-brain files on a per file basis but just want the split-brain to be resolved as and when it occurs based on the set policy. - - # Manual Split-Brain Resolution: -Quick Start: -============ -1. 
Get the path of the file that is in split-brain: -> It can be obtained either by -> a) The command `gluster volume heal info split-brain`. -> b) Identify the files for which file operations performed - from the client keep failing with Input/Output error. +# Quick Start: -2. Close the applications that opened this file from the mount point. -In case of VMs, they need to be powered-off. +1. Get the path of the file that is in split-brain: -3. Decide on the correct copy: -> This is done by observing the afr changelog extended attributes of the file on -the bricks using the getfattr command; then identifying the type of split-brain -(data split-brain, metadata split-brain, entry split-brain or split-brain due to -gfid-mismatch); and finally determining which of the bricks contains the 'good copy' -of the file. -> `getfattr -d -m . -e hex `. -It is also possible that one brick might contain the correct data while the -other might contain the correct metadata. + > It can be obtained either by + > a) The command `gluster volume heal info split-brain`. + > b) Identify the files for which file operations performed from the client keep failing with Input/Output error. -4. Reset the relevant extended attribute on the brick(s) that contains the -'bad copy' of the file data/metadata using the setfattr command. -> `setfattr -n -v ` +1. Close the applications that opened this file from the mount point. + In case of VMs, they need to be powered-off. -5. Trigger self-heal on the file by performing lookup from the client: -> `ls -l ` +1. Decide on the correct copy: + + > This is done by observing the afr changelog extended attributes of the file on + > the bricks using the getfattr command; then identifying the type of split-brain + > (data split-brain, metadata split-brain, entry split-brain or split-brain due to + > gfid-mismatch); and finally determining which of the bricks contains the 'good copy' + > of the file. + > `getfattr -d -m . -e hex `. + > It is also possible that one brick might contain the correct data while the + > other might contain the correct metadata. + +1. Reset the relevant extended attribute on the brick(s) that contains the + 'bad copy' of the file data/metadata using the setfattr command. + + > `setfattr -n -v ` + +1. Trigger self-heal on the file by performing lookup from the client: + + > `ls -l ` + +# Detailed Instructions for steps 3 through 5: -Detailed Instructions for steps 3 through 5: -=========================================== To understand how to resolve split-brain we need to know how to interpret the afr changelog extended attributes. @@ -887,7 +939,7 @@ Execute `getfattr -d -m . -e hex ` Example: -```console +```{ .console .no-copy } [root@store3 ~]# getfattr -d -e hex -m. brick-a/file.txt \#file: brick-a/file.txt security.selinux=0x726f6f743a6f626a6563745f723a66696c655f743a733000 @@ -900,7 +952,7 @@ The extended attributes with `trusted.afr.-client-` are used by afr to maintain changelog of the file.The values of the `trusted.afr.-client-` are calculated by the glusterfs client (fuse or nfs-server) processes. When the glusterfs client modifies a file -or directory, the client contacts each brick and updates the changelog extended +or directory, the client contacts each brick and updates the changelog extended attribute according to the response of the brick. 'subvolume-index' is nothing but (brick number - 1) in @@ -908,7 +960,7 @@ attribute according to the response of the brick. 
Example: -```console +```{ .console .no-copy } [root@pranithk-laptop ~]# gluster volume info vol Volume Name: vol Type: Distributed-Replicate @@ -929,7 +981,7 @@ Example: In the example above: -```console +```{ .console .no-copy } Brick | Replica set | Brick subvolume index ---------------------------------------------------------------------------- -/gfs/brick-a | 0 | 0 @@ -945,25 +997,25 @@ Brick | Replica set | Brick subvolume index Each file in a brick maintains the changelog of itself and that of the files present in all the other bricks in its replica set as seen by that brick. -In the example volume given above, all files in brick-a will have 2 entries, +In the example volume given above, all files in brick-a will have 2 entries, one for itself and the other for the file present in its replica pair, i.e.brick-b: trusted.afr.vol-client-0=0x000000000000000000000000 -->changelog for itself (brick-a) -trusted.afr.vol-client-1=0x000000000000000000000000 -->changelog for brick-b as seen by brick-a +trusted.afr.vol-client-1=0x000000000000000000000000 -->changelog for brick-b as seen by brick-a Likewise, all files in brick-b will have: trusted.afr.vol-client-0=0x000000000000000000000000 -->changelog for brick-a as seen by brick-b -trusted.afr.vol-client-1=0x000000000000000000000000 -->changelog for itself (brick-b) +trusted.afr.vol-client-1=0x000000000000000000000000 -->changelog for itself (brick-b) -The same can be extended for other replica pairs. +The same can be extended for other replica pairs. Interpreting Changelog (roughly pending operation count) Value: Each extended attribute has a value which is 24 hexa decimal digits. First 8 digits represent changelog of data. Second 8 digits represent changelog -of metadata. Last 8 digits represent Changelog of directory entries. +of metadata. Last 8 digits represent Changelog of directory entries. Pictorially representing the same, we have: -```text +```{ .text .no-copy } 0x 000003d7 00000001 00000000 | | | | | \_ changelog of directory entries @@ -971,17 +1023,16 @@ Pictorially representing the same, we have: \ _ changelog of data ``` - For Directories metadata and entry changelogs are valid. For regular files data and metadata changelogs are valid. For special files like device files etc metadata changelog is valid. When a file split-brain happens it could be either data split-brain or meta-data split-brain or both. When a split-brain happens the changelog of the -file would be something like this: +file would be something like this: Example:(Lets consider both data, metadata split-brain on same file). -```console +```{ .console .no-copy } [root@pranithk-laptop vol]# getfattr -d -m . -e hex /gfs/brick-?/a getfattr: Removing leading '/' from absolute path names \#file: gfs/brick-a/a @@ -1007,7 +1058,7 @@ on itself but failed on /gfs/brick-b/a. The second 8 digits of trusted.afr.vol-client-0 are all zeros (0x........00000000........), and the second 8 digits of trusted.afr.vol-client-1 are not all zeros (0x........00000001........). -So the changelog on /gfs/brick-a/a implies that some metadata operations succeeded +So the changelog on /gfs/brick-a/a implies that some metadata operations succeeded on itself but failed on /gfs/brick-b/a. #### According to Changelog extended attributes on file /gfs/brick-b/a: @@ -1029,12 +1080,12 @@ file, it is in both data and metadata split-brain. 
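Before deciding which copy to keep, it can be handy to split a changelog value into its three counters mechanically rather than by eye. A minimal bash sketch, using the example value 0x000003d70000000100000000 from the pictorial representation above (the variable name is only illustrative):

```console
val=000003d70000000100000000   # changelog value with the leading "0x" stripped
echo "data changelog:     0x${val:0:8}"
echo "metadata changelog: 0x${val:8:8}"
echo "entry changelog:    0x${val:16:8}"
```

A non-zero data or metadata field in the changelog a brick keeps for its peer indicates operations that failed on that peer, which is what the analysis above relies on.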
#### Deciding on the correct copy: -The user may have to inspect stat,getfattr output of the files to decide which +The user may have to inspect stat,getfattr output of the files to decide which metadata to retain and contents of the file to decide which data to retain. Continuing with the example above, lets say we want to retain the data of /gfs/brick-a/a and metadata of /gfs/brick-b/a. -#### Resetting the relevant changelogs to resolve the split-brain: +#### Resetting the relevant changelogs to resolve the split-brain: For resolving data-split-brain: @@ -1068,27 +1119,31 @@ For trusted.afr.vol-client-1 Hence execute `setfattr -n trusted.afr.vol-client-1 -v 0x000003d70000000000000000 /gfs/brick-a/a` -Thus after the above operations are done, the changelogs look like this: -[root@pranithk-laptop vol]# getfattr -d -m . -e hex /gfs/brick-?/a -getfattr: Removing leading '/' from absolute path names -\#file: gfs/brick-a/a -trusted.afr.vol-client-0=0x000000000000000000000000 -trusted.afr.vol-client-1=0x000003d70000000000000000 -trusted.gfid=0x80acdbd886524f6fbefa21fc356fed57 +Thus after the above operations are done, the changelogs look like this: -\#file: gfs/brick-b/a -trusted.afr.vol-client-0=0x000000000000000100000000 -trusted.afr.vol-client-1=0x000000000000000000000000 -trusted.gfid=0x80acdbd886524f6fbefa21fc356fed57 +```{ .console .no-copy } +[root@pranithk-laptop vol]# getfattr -d -m . -e hex /gfs/brick-?/a +getfattr: Removing leading '/' from absolute path names +\#file: gfs/brick-a/a +trusted.afr.vol-client-0=0x000000000000000000000000 +trusted.afr.vol-client-1=0x000003d70000000000000000 +trusted.gfid=0x80acdbd886524f6fbefa21fc356fed57 +\#file: gfs/brick-b/a +trusted.afr.vol-client-0=0x000000000000000100000000 +trusted.afr.vol-client-1=0x000000000000000000000000 +trusted.gfid=0x80acdbd886524f6fbefa21fc356fed57 +``` + +## Triggering Self-heal: -Triggering Self-heal: ---------------------- Perform `ls -l ` to trigger healing. Fixing Directory entry split-brain: ----------------------------------- + +--- + Afr has the ability to conservatively merge different entries in the directories when there is a split-brain on directory. If on one brick directory 'd' has entries '1', '2' and has entries '3', '4' on @@ -1108,9 +1163,11 @@ needs to be removed.The gfid-link files are present in the .glusterfs folder in the top-level directory of the brick. If the gfid of the file is 0x307a5c9efddd4e7c96e94fd4bcdcbd1b (the trusted.gfid extended attribute got from the getfattr command earlier),the gfid-link file can be found at + > /gfs/brick-a/.glusterfs/30/7a/307a5c9efddd4e7c96e94fd4bcdcbd1b #### Word of caution: + Before deleting the gfid-link, we have to ensure that there are no hard links to the file present on that brick. If hard-links exist,they must be deleted as well. diff --git a/docs/Troubleshooting/statedump.md b/docs/Troubleshooting/statedump.md index 3c33810..b89345d 100644 --- a/docs/Troubleshooting/statedump.md +++ b/docs/Troubleshooting/statedump.md @@ -2,20 +2,18 @@ A statedump is, as the name suggests, a dump of the internal state of a glusterfs process. It captures information about in-memory structures such as frames, call stacks, active inodes, fds, mempools, iobufs, and locks as well as xlator specific data structures. This can be an invaluable tool for debugging memory leaks and hung processes. 
+- [Generate a Statedump](#generate-a-statedump) +- [Read a Statedump](#read-a-statedump) +- [Debug with a Statedump](#debug-with-statedumps) - - - [Generate a Statedump](#generate-a-statedump) - - [Read a Statedump](#read-a-statedump) - - [Debug with a Statedump](#debug-with-statedumps) - -************************ - +--- ## Generate a Statedump + Run the command ```console -# gluster --print-statedumpdir +gluster --print-statedumpdir ``` on a gluster server node to find out which directory the statedumps will be created in. This directory may need to be created if not already present. @@ -38,7 +36,6 @@ kill -USR1 There are specific commands to generate statedumps for all brick processes/nfs server/quotad which can be used instead of the above. Run the following commands on one of the server nodes: - For bricks: ```console @@ -59,16 +56,17 @@ gluster volume statedump quotad The statedumps will be created in `statedump-directory` on each node. The statedumps for brick processes will be created with the filename `hyphenated-brick-path..dump.timestamp` while for all other processes it will be `glusterdump..dump.timestamp`. -*** +--- ## Read a Statedump Statedumps are text files and can be opened in any text editor. The first and last lines of the file contain the start and end time (in UTC)respectively of when the statedump file was written. ### Mallinfo + The mallinfo return status is printed in the following format. Please read _man mallinfo_ for more information about what each field means. -``` +```{.text .no-copy } [mallinfo] mallinfo_arena=100020224 /* Non-mmapped space allocated (bytes) */ mallinfo_ordblks=69467 /* Number of free chunks */ @@ -83,19 +81,19 @@ mallinfo_keepcost=133712 /* Top-most, releasable space (bytes) */ ``` ### Memory accounting stats + Each xlator defines data structures specific to its requirements. The statedump captures information about the memory usage and allocations of these structures for each xlator in the call-stack and prints them in the following format: For the xlator with the name _glusterfs_ -``` +```{.text .no-copy } [global.glusterfs - Memory usage] #[global. - Memory usage] num_types=119 #The number of data types it is using ``` - followed by the memory usage for each data-type for that translator. The following example displays a sample for the gf_common_mt_gf_timer_t type -``` +```{.text .no-copy } [global.glusterfs - usage-type gf_common_mt_gf_timer_t memusage] #[global. - usage-type memusage] size=112 #Total size allocated for data-type when the statedump was taken i.e. num_allocs * sizeof (data-type) @@ -113,7 +111,7 @@ Mempools are an optimization intended to reduce the number of allocations of a d Memory pool allocations by each xlator are displayed in the following format: -``` +```{.text .no-copy } [mempool] #Section name -----=----- pool-name=fuse:fd_t #pool-name=: @@ -129,10 +127,9 @@ max-stdalloc=0 #Maximum number of allocations from heap that were in active This information is also useful while debugging high memory usage issues as large hot_count and cur-stdalloc values may point to an element not being freed after it has been used. - ### Iobufs -``` +```{.text .no-copy } [iobuf.global] iobuf_pool=0x1f0d970 #The memory pool for iobufs iobuf_pool.default_page_size=131072 #The default size of iobuf (if no iobuf size is specified the default size is allocated) @@ -148,7 +145,7 @@ There are 3 lists of arenas 2. Purge list: arenas that can be purged(no active iobufs, active_cnt == 0). 3. Filled list: arenas without free iobufs. 
-``` +```{.text .no-copy } [purge.1] #purge. purge.1.mem_base=0x7fc47b35f000 #The address of the arena structure purge.1.active_cnt=0 #The number of iobufs active in that arena @@ -168,7 +165,7 @@ arena.5.page_size=32768 If the active_cnt of any arena is non zero, then the statedump will also have the iobuf list. -``` +```{.text .no-copy } [arena.6.active_iobuf.1] #arena..active_iobuf. arena.6.active_iobuf.1.ref=1 #refcount of the iobuf arena.6.active_iobuf.1.ptr=0x7fdb921a9000 #address of the iobuf @@ -180,12 +177,11 @@ arena.6.active_iobuf.2.ptr=0x7fdb92189000 A lot of filled arenas at any given point in time could be a sign of iobuf leaks. - ### Call stack The fops received by gluster are handled using call stacks. A call stack contains information about the uid/gid/pid etc of the process that is executing the fop. Each call stack contains different call-frames for each xlator which handles that fop. -``` +```{.text .no-copy } [global.callpool.stack.3] #global.callpool.stack. stack=0x7fc47a44bbe0 #Stack address uid=0 #Uid of the process executing the fop @@ -199,9 +195,10 @@ cnt=9 #Number of frames in this stack. ``` ### Call-frame + Each frame will have information about which xlator the frame belongs to, which function it wound to/from and which it will be unwound to, and whether it has unwound. -``` +```{.text .no-copy } [global.callpool.stack.3.frame.2] #global.callpool.stack..frame. frame=0x7fc47a611dbc #Frame address ref_count=0 #Incremented at the time of wind and decremented at the time of unwind. @@ -215,12 +212,11 @@ unwind_to=afr_lookup_cbk #Parent xlator function to unwind to To debug hangs in the system, see which xlator has not yet unwound its fop by checking the value of the _complete_ tag in the statedump. (_complete=0_ indicates the xlator has not yet unwound). - ### FUSE Operation History Gluster Fuse maintains a history of the operations that it has performed. -``` +```{.text .no-copy } [xlator.mount.fuse.history] TIME=2014-07-09 16:44:57.523364 message=[0] fuse_release: RELEASE(): 4590:, fd: 0x1fef0d8, gfid: 3afb4968-5100-478d-91e9-76264e634c9f @@ -234,7 +230,7 @@ message=[0] fuse_getattr_resume: 4591, STAT, path: (/iozone.tmp), gfid: (3afb496 ### Xlator configuration -``` +```{.text .no-copy } [cluster/replicate.r2-replicate-0] #Xlator type, name information child_count=2 #Number of children for the xlator #Xlator specific configuration below @@ -255,7 +251,7 @@ wait_count=1 ### Graph/inode table -``` +```{.text .no-copy } [active graph - 1] conn.1.bound_xl./data/brick01a/homegfs.hashsize=14057 @@ -268,7 +264,7 @@ conn.1.bound_xl./data/brick01a/homegfs.purge_size=0 #Number of inodes present ### Inode -``` +```{.text .no-copy } [conn.1.bound_xl./data/brick01a/homegfs.active.324] #324th inode in active inode list gfid=e6d337cf-97eb-44b3-9492-379ba3f6ad42 #Gfid of the inode nlookup=13 #Number of times lookups happened from the client or from fuse kernel @@ -285,9 +281,10 @@ ia_type=2 ``` ### Inode context + Each xlator can store information specific to it in the inode context. This context can also be printed in the statedump. 
Here is the inode context of the locks xlator -``` +```{.text .no-copy } [xlator.features.locks.homegfs-locks.inode] path=/homegfs/users/dfrobins/gfstest/r4/SCRATCH/fort.5102 - path of the file mandatory=0 @@ -301,10 +298,11 @@ lock-dump.domain.domain=homegfs-replicate-0:metadata #Domain name where metadata lock-dump.domain.domain=homegfs-replicate-0 #Domain name where entry/data operations take locks to maintain replication consistency inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=11141120, len=131072, pid = 18446744073709551615, owner=080b1ada117f0000, client=0xb7fc30, connection-id=compute-30-029.com-3505-2014/06/29-14:46:12:477358-homegfs-client-0-0-1, granted at Sun Jun 29 11:10:36 2014 #Active lock information ``` - -*** + +--- ## Debug With Statedumps + ### Memory leaks Statedumps can be used to determine whether the high memory usage of a process is caused by a leak. To debug the issue, generate statedumps for that process at regular intervals, or before and after running the steps that cause the memory used to increase. Once you have multiple statedumps, compare the memory allocation stats to see if any of them are increasing steadily as those could indicate a potential memory leak. @@ -315,7 +313,7 @@ The following examples walk through using statedumps to debug two different memo [BZ 1120151](https://bugzilla.redhat.com/show_bug.cgi?id=1120151) reported high memory usage by the self heal daemon whenever one of the bricks was wiped in a replicate volume and a full self-heal was invoked to heal the contents. This issue was debugged using statedumps to determine which data-structure was leaking memory. -A statedump of the self heal daemon process was taken using +A statedump of the self heal daemon process was taken using ```console kill -USR1 `` @@ -323,7 +321,7 @@ kill -USR1 `` On examining the statedump: -``` +```{.text .no-copy } grep -w num_allocs glusterdump.5225.dump.1405493251 num_allocs=77078 num_allocs=87070 @@ -338,6 +336,7 @@ hot-count=4095 ``` On searching for num_allocs with high values in the statedump, a `grep` of the statedump revealed a large number of allocations for the following data-types under the replicate xlator: + 1. gf_common_mt_asprintf 2. gf_common_mt_char 3. gf_common_mt_mem_pool. @@ -345,16 +344,15 @@ On searching for num_allocs with high values in the statedump, a `grep` of the s On checking the afr-code for allocations with tag `gf_common_mt_char`, it was found that the `data-self-heal` code path does not free one such allocated data structure. `gf_common_mt_mem_pool` suggests that there is a leak in pool memory. The `replicate-0:dict_t`, `glusterfs:data_t` and `glusterfs:data_pair_t` pools are using a lot of memory, i.e. cold_count is `0` and there are too many allocations. Checking the source code of dict.c shows that `key` in `dict` is allocated with `gf_common_mt_char` i.e. `2.` tag and value is created using gf_asprintf which in-turn uses `gf_common_mt_asprintf` i.e. `1.`. Checking the code for leaks in self-heal code paths led to a line which over-writes a variable with new dictionary even when it was already holding a reference to another dictionary. After fixing these leaks, we ran the same test to verify that none of the `num_allocs` values increased in the statedump of the self-daemon after healing 10,000 files. Please check [http://review.gluster.org/8316](http://review.gluster.org/8316) for more info about the patch/code. 
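The same comparison can be scripted when statedumps are taken at intervals; a rough sketch, where the first dump name is the one from this example and the second is a hypothetical later dump of the same process:

```console
grep -E 'usage-type|num_allocs' glusterdump.5225.dump.1405493251 > allocs.before
grep -E 'usage-type|num_allocs' glusterdump.5225.dump.1405496851 > allocs.after
diff allocs.before allocs.after    # data types whose num_allocs keep rising are leak suspects
```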
- #### Leaks in mempools: -The statedump output of mempools was used to test and verify the fixes for [BZ 1134221](https://bugzilla.redhat.com/show_bug.cgi?id=1134221). On code analysis, dict_t objects were found to be leaking (due to missing unref's) during name self-heal. + +The statedump output of mempools was used to test and verify the fixes for [BZ 1134221](https://bugzilla.redhat.com/show_bug.cgi?id=1134221). On code analysis, dict_t objects were found to be leaking (due to missing unref's) during name self-heal. Glusterfs was compiled with the -DDEBUG flags to have cold count set to 0 by default. The test involved creating 100 files on plain replicate volume, removing them from one of the backend bricks, and then triggering lookups on them from the mount point. A statedump of the mount process was taken before executing the test case and after it was completed. Statedump output of the fuse mount process before the test case was executed: -``` - +```{.text .no-copy } pool-name=glusterfs:dict_t hot-count=0 cold-count=0 @@ -364,12 +362,11 @@ max-alloc=0 pool-misses=33 cur-stdalloc=14 max-stdalloc=18 - ``` + Statedump output of the fuse mount process after the test case was executed: -``` - +```{.text .no-copy } pool-name=glusterfs:dict_t hot-count=0 cold-count=0 @@ -379,15 +376,15 @@ max-alloc=0 pool-misses=2841 cur-stdalloc=214 max-stdalloc=220 - ``` + Here, as cold count was 0 by default, cur-stdalloc indicates the number of dict_t objects that were allocated from the heap using mem_get(), and are yet to be freed using mem_put(). After running the test case (named selfheal of 100 files), there was a rise in the cur-stdalloc value (from 14 to 214) for dict_t. After the leaks were fixed, glusterfs was again compiled with -DDEBUG flags and the steps were repeated. Statedumps of the FUSE mount were taken before and after executing the test case to ascertain the validity of the fix. And the results were as follows: Statedump output of the fuse mount process before executing the test case: -``` +```{.text .no-copy } pool-name=glusterfs:dict_t hot-count=0 cold-count=0 @@ -397,11 +394,11 @@ max-alloc=0 pool-misses=33 cur-stdalloc=14 max-stdalloc=18 - ``` + Statedump output of the fuse mount process after executing the test case: -``` +```{.text .no-copy } pool-name=glusterfs:dict_t hot-count=0 cold-count=0 @@ -411,17 +408,18 @@ max-alloc=0 pool-misses=2837 cur-stdalloc=14 max-stdalloc=119 - ``` + The value of cur-stdalloc remained 14 after the test, indicating that the fix indeed does what it's supposed to do. ### Hangs caused by frame loss + [BZ 994959](https://bugzilla.redhat.com/show_bug.cgi?id=994959) reported that the Fuse mount hangs on a readdirp operation. Here are the steps used to locate the cause of the hang using statedump. Statedumps were taken for all gluster processes after reproducing the issue. The following stack was seen in the FUSE mount's statedump: -``` +```{.text .no-copy } [global.callpool.stack.1.frame.1] ref_count=1 translator=fuse @@ -463,8 +461,8 @@ parent=r2-quick-read wind_from=qr_readdirp wind_to=FIRST_CHILD (this)->fops->readdirp unwind_to=qr_readdirp_cbk - ``` + `unwind_to` shows that call was unwound to `afr_readdirp_cbk` from the r2-client-1 xlator. Inspecting that function revealed that afr is not unwinding the stack when fop failed. Check [http://review.gluster.org/5531](http://review.gluster.org/5531) for more info about patch/code changes. 
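As a general aid when debugging such hangs, the frames that have not yet unwound can be pulled out of a statedump directly; a minimal sketch (the dump file name follows the usual glusterdump.&lt;pid&gt;.dump.&lt;timestamp&gt; pattern and is only indicative):

```console
grep -B 5 -A 5 'complete=0' glusterdump.<pid>.dump.<timestamp>
```

Each match shows the translator and the wind_from/unwind_to functions of a frame that is still pending, which narrows down the xlator to inspect.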
diff --git a/docs/Troubleshooting/troubleshooting-afr.md b/docs/Troubleshooting/troubleshooting-afr.md index 42bc2b4..8d85562 100644 --- a/docs/Troubleshooting/troubleshooting-afr.md +++ b/docs/Troubleshooting/troubleshooting-afr.md @@ -8,7 +8,7 @@ The first level of analysis always starts with looking at the log files. Which o Sometimes, you might need more verbose logging to figure out what’s going on: `gluster volume set $volname client-log-level $LEVEL` -where LEVEL can be any one of `DEBUG, WARNING, ERROR, INFO, CRITICAL, NONE, TRACE`. This should ideally make all the log files mentioned above to start logging at `$LEVEL`. The default is `INFO` but you can temporarily toggle it to `DEBUG` or `TRACE` if you want to see under-the-hood messages. Useful when the normal logs don’t give a clue as to what is happening. +where LEVEL can be any one of `DEBUG, WARNING, ERROR, INFO, CRITICAL, NONE, TRACE`. This should ideally make all the log files mentioned above to start logging at `$LEVEL`. The default is `INFO` but you can temporarily toggle it to `DEBUG` or `TRACE` if you want to see under-the-hood messages. Useful when the normal logs don’t give a clue as to what is happening. ## Heal related issues: @@ -20,17 +20,19 @@ Most issues I’ve seen on the mailing list and with customers can broadly fit i If the number of entries are large, then heal info will take longer than usual. While there are performance improvements to heal info being planned, a faster way to get an approx. count of the pending entries is to use the `gluster volume heal $VOLNAME statistics heal-count` command. -**Knowledge Hack:** Since we know that during the write transaction. the xattrop folder will capture the gfid-string of the file if it needs heal, we can also do an `ls /brick/.glusterfs/indices/xattrop|wc -l` on each brick to get the approx. no of entries that need heal. If this number reduces over time, it is a sign that the heal backlog is reducing. You will also see messages whenever a particular type of heal starts/ends for a given gfid, like so: +**Knowledge Hack:** Since we know that during the write transaction. the xattrop folder will capture the gfid-string of the file if it needs heal, we can also do an `ls /brick/.glusterfs/indices/xattrop|wc -l` on each brick to get the approx. no of entries that need heal. If this number reduces over time, it is a sign that the heal backlog is reducing. You will also see messages whenever a particular type of heal starts/ends for a given gfid, like so: -`[2019-05-07 12:05:14.460442] I [MSGID: 108026] [afr-self-heal-entry.c:883:afr_selfheal_entry_do] 0-testvol-replicate-0: performing entry selfheal on d120c0cf-6e87-454b-965b-0d83a4c752bb` +```{.text .no-copy } +[2019-05-07 12:05:14.460442] I [MSGID: 108026] [afr-self-heal-entry.c:883:afr_selfheal_entry_do] 0-testvol-replicate-0: performing entry selfheal on d120c0cf-6e87-454b-965b-0d83a4c752bb -`[2019-05-07 12:05:14.474710] I [MSGID: 108026] [afr-self-heal-common.c:1741:afr_log_selfheal] 0-testvol-replicate-0: Completed entry selfheal on d120c0cf-6e87-454b-965b-0d83a4c752bb. sources=[0] 2 sinks=1` +[2019-05-07 12:05:14.474710] I [MSGID: 108026] [afr-self-heal-common.c:1741:afr_log_selfheal] 0-testvol-replicate-0: Completed entry selfheal on d120c0cf-6e87-454b-965b-0d83a4c752bb. sources=[0] 2 sinks=1 -`[2019-05-07 12:05:14.493506] I [MSGID: 108026] [afr-self-heal-common.c:1741:afr_log_selfheal] 0-testvol-replicate-0: Completed data selfheal on a9b5f183-21eb-4fb3-a342-287d3a7dddc5. 
sources=[0] 2 sinks=1` +[2019-05-07 12:05:14.493506] I [MSGID: 108026] [afr-self-heal-common.c:1741:afr_log_selfheal] 0-testvol-replicate-0: Completed data selfheal on a9b5f183-21eb-4fb3-a342-287d3a7dddc5. sources=[0] 2 sinks=1 -`[2019-05-07 12:05:14.494577] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-testvol-replicate-0: performing metadata selfheal on a9b5f183-21eb-4fb3-a342-287d3a7dddc5` +[2019-05-07 12:05:14.494577] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-testvol-replicate-0: performing metadata selfheal on a9b5f183-21eb-4fb3-a342-287d3a7dddc5 -`[2019-05-07 12:05:14.498398] I [MSGID: 108026] [afr-self-heal-common.c:1741:afr_log_selfheal] 0-testvol-replicate-0: Completed metadata selfheal on a9b5f183-21eb-4fb3-a342-287d3a7dddc5. sources=[0] 2 sinks=1` +[2019-05-07 12:05:14.498398] I [MSGID: 108026] [afr-self-heal-common.c:1741:afr_log_selfheal] 0-testvol-replicate-0: Completed metadata selfheal on a9b5f183-21eb-4fb3-a342-287d3a7dddc5. sources=[0] 2 sinks=1 +``` ### ii) Self-heal is stuck/ not getting completed. @@ -38,69 +40,88 @@ If a file seems to be forever appearing in heal info and not healing, check the - Examine the afr xattrs- Do they clearly indicate the good and bad copies? If there isn’t at least one good copy, then the file is in split-brain and you would need to use the split-brain resolution CLI. - Identify which node’s shds would be picking up the file for heal. If a file is listed in the heal info output under brick1 and brick2, then the shds on the nodes which host those bricks would attempt (and one of them would succeed) in doing the heal. - - Once the shd is identified, look at the shd logs to see if it is indeed connected to the bricks. +- Once the shd is identified, look at the shd logs to see if it is indeed connected to the bricks. This is good: -`[2019-05-07 09:53:02.912923] I [MSGID: 114046] [client-handshake.c:1106:client_setvolume_cbk] 0-testvol-client-2: Connected to testvol-client-2, attached to remote volume '/bricks/brick3'` + +```{.text .no-copy } +[2019-05-07 09:53:02.912923] I [MSGID: 114046] [client-handshake.c:1106:client_setvolume_cbk] 0-testvol-client-2: Connected to testvol-client-2, attached to remote volume '/bricks/brick3' +``` This indicates a disconnect: -`[2019-05-07 11:44:47.602862] I [MSGID: 114018] [client.c:2334:client_rpc_notify] 0-testvol-client-2: disconnected from testvol-client-2. Client process will keep trying to connect to glusterd until brick's port is available` -`[2019-05-07 11:44:50.953516] E [MSGID: 114058] [client-handshake.c:1456:client_query_portmap_cbk] 0-testvol-client-2: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.` +```{.text .no-copy } +[2019-05-07 11:44:47.602862] I [MSGID: 114018] [client.c:2334:client_rpc_notify] 0-testvol-client-2: disconnected from testvol-client-2. Client process will keep trying to connect to glusterd until brick's port is available + +[2019-05-07 11:44:50.953516] E [MSGID: 114058] [client-handshake.c:1456:client_query_portmap_cbk] 0-testvol-client-2: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. +``` Alternatively, take a statedump of the self-heal daemon (shd) and check if all client xlators are connected to the respective bricks. The shd must have `connected=1` for all the client xlators, meaning it can talk to all the bricks. 
-| Shd’s statedump entry of a client xlator that is connected to the 3rd brick | Shd’s statedump entry of the same client xlator if it is diconnected from the 3rd brick | -|:--------------------------------------------------------------------------------------------------------------------------------------------------------:|:--------------------------------------------------------------------------------------------------------------------------------------------------------:| +| Shd’s statedump entry of a client xlator that is connected to the 3rd brick | Shd’s statedump entry of the same client xlator if it is diconnected from the 3rd brick | +| :------------------------------------------------------------------------------------------------------------------------------------------------------: | :------------------------------------------------------------------------------------------------------------------------------------------------------: | | [xlator.protocol.client.testvol-client-2.priv] connected=1 total_bytes_read=75004 ping_timeout=42 total_bytes_written=50608 ping_msgs_sent=0 msgs_sent=0 | [xlator.protocol.client.testvol-client-2.priv] connected=0 total_bytes_read=75004 ping_timeout=42 total_bytes_written=50608 ping_msgs_sent=0 msgs_sent=0 | If there are connection issues (i.e. `connected=0`), you would need to investigate and fix them. Check if the pid and the TCP/RDMA Port of the brick proceess from gluster volume status $VOLNAME matches that of `ps aux|grep glusterfsd|grep $brick-path` -`[root@tuxpad glusterfs]# gluster volume status` +```{.text .no-copy } +# gluster volume status Status of volume: testvol -Gluster process TCP Port RDMA Port Online Pid ------------------------------------------------------------------------------- -Brick 127.0.0.2:/bricks/brick1 49152 0 Y 12527 +Gluster process TCP Port RDMA Port Online Pid -`[root@tuxpad glusterfs]# ps aux|grep brick1` +--- -`root 12527 0.0 0.1 1459208 20104 ? Ssl 11:20 0:01 /usr/local/sbin/glusterfsd -s 127.0.0.2 --volfile-id testvol.127.0.0.2.bricks-brick1 -p /var/run/gluster/vols/testvol/127.0.0.2-bricks-brick1.pid -S /var/run/gluster/70529980362a17d6.socket --brick-name /bricks/brick1 -l /var/log/glusterfs/bricks/bricks-brick1.log --xlator-option *-posix.glusterd-uuid=d90b1532-30e5-4f9d-a75b-3ebb1c3682d4 --process-name brick --brick-port 49152 --xlator-option testvol-server.listen-port=49152` +Brick 127.0.0.2:/bricks/brick1 49152 0 Y 12527 +``` + +```{.text .no-copy } +# ps aux|grep brick1 + +root 12527 0.0 0.1 1459208 20104 ? Ssl 11:20 0:01 /usr/local/sbin/glusterfsd -s 127.0.0.2 --volfile-id testvol.127.0.0.2.bricks-brick1 -p /var/run/gluster/vols/testvol/127.0.0.2-bricks-brick1.pid -S /var/run/gluster/70529980362a17d6.socket --brick-name /bricks/brick1 -l /var/log/glusterfs/bricks/bricks-brick1.log --xlator-option *-posix.glusterd-uuid=d90b1532-30e5-4f9d-a75b-3ebb1c3682d4 --process-name brick --brick-port 49152 --xlator-option testvol-server.listen-port=49152 +``` Though this will likely match, sometimes there could be a bug leading to stale port usage. A quick workaround would be to restart glusterd on that node and check if things match. Report the issue to the devs if you see this problem. - I have seen some cases where a file is listed in heal info, and the afr xattrs indicate pending metadata or data heal but the file itself is not present on all bricks. 
Ideally, the parent directory of the file must have pending entry heal xattrs so that the file either gets created on the missing bricks or gets deleted from the ones where it is present. But if the parent dir doesn’t have xattrs, the entry heal can’t proceed. In such cases, you can - -- Either do a lookup directly on the file from the mount so that name heal is triggered and then shd can pickup the data/metadata heal. - -- Or manually set entry xattrs on the parent dir to emulate an entry heal so that the file gets created as a part of it. - -- If a brick’s underlying filesystem/lvm was damaged and fsck’d to recovery, some files/dirs might be missing on it. If there is a lot of missing info on the recovered bricks, it might be better to just to a replace-brick or reset-brick and let the heal fully sync everything rather than fiddling with afr xattrs of individual entries. -**Hack:** How to trigger heal on *any* file/directory + - Either do a lookup directly on the file from the mount so that name heal is triggered and then shd can pickup the data/metadata heal. + - Or manually set entry xattrs on the parent dir to emulate an entry heal so that the file gets created as a part of it. + - If a brick’s underlying filesystem/lvm was damaged and fsck’d to recovery, some files/dirs might be missing on it. If there is a lot of missing info on the recovered bricks, it might be better to just to a replace-brick or reset-brick and let the heal fully sync everything rather than fiddling with afr xattrs of individual entries. + +**Hack:** How to trigger heal on _any_ file/directory Knowing about self-heal logic and index heal from the previous post, we can sort of emulate a heal with the following steps. This is not something that you should be doing on your cluster but it pays to at least know that it is possible when push comes to shove. 1. Picking one brick as good and setting the afr pending xattr on it blaming the bad bricks. 2. Capture the gfid inside .glusterfs/indices/xattrop so that the shd can pick it up during index heal. 3. Finally, trigger index heal: gluster volume heal $VOLNAME . -*Example:* Let us say a FILE-1 exists with `trusted.gfid=0x1ad2144928124da9b7117d27393fea5c` on all bricks of a replica 3 volume called testvol. It has no afr xattrs. But you still need to emulate a heal. Let us say you choose brick-2 as the source. Let us do the steps listed above: +_Example:_ Let us say a FILE-1 exists with `trusted.gfid=0x1ad2144928124da9b7117d27393fea5c` on all bricks of a replica 3 volume called testvol. It has no afr xattrs. But you still need to emulate a heal. Let us say you choose brick-2 as the source. Let us do the steps listed above: -1. Make brick-2 blame the other 2 bricks: -[root@tuxpad fuse_mnt]# setfattr -n trusted.afr.testvol-client-2 -v 0x000000010000000000000000 /bricks/brick2/FILE-1 -[root@tuxpad fuse_mnt]# setfattr -n trusted.afr.testvol-client-1 -v 0x000000010000000000000000 /bricks/brick2/FILE-1 +1. Make brick-2 blame the other 2 bricks: -2. Store the gfid string inside xattrop folder as a hardlink to the base entry: -root@tuxpad ~]# cd /bricks/brick2/.glusterfs/indices/xattrop/ -[root@tuxpad xattrop]# ls -li -total 0 -17829255 ----------. 1 root root 0 May 10 11:20 xattrop-a400ca91-cec9-4463-a183-aca9eaff9fa7` -[root@tuxpad xattrop]# ln xattrop-a400ca91-cec9-4463-a183-aca9eaff9fa7 1ad21449-2812-4da9-b711-7d27393fea5c -[root@tuxpad xattrop]# ll -total 0 -----------. 2 root root 0 May 10 11:20 1ad21449-2812-4da9-b711-7d27393fea5c -----------. 
2 root root 0 May 10 11:20 xattrop-a400ca91-cec9-4463-a183-aca9eaff9fa7 + setfattr -n trusted.afr.testvol-client-2 -v 0x000000010000000000000000 /bricks/brick2/FILE-1 + setfattr -n trusted.afr.testvol-client-1 -v 0x000000010000000000000000 /bricks/brick2/FILE-1 -3. Trigger heal: gluster volume heal testvol -The glustershd.log of node-2 should log about the heal. -[2019-05-10 06:10:46.027238] I [MSGID: 108026] [afr-self-heal-common.c:1741:afr_log_selfheal] 0-testvol-replicate-0: Completed data selfheal on 1ad21449-2812-4da9-b711-7d27393fea5c. sources=[1] sinks=0 2 -So the data was healed from the second brick to the first and third brick. +2. Store the gfid string inside xattrop folder as a hardlink to the base entry: + + # cd /bricks/brick2/.glusterfs/indices/xattrop/ + # ls -li + total 0 + 17829255 ----------. 1 root root 0 May 10 11:20 xattrop-a400ca91-cec9-4463-a183-aca9eaff9fa7` + + # ln xattrop-a400ca91-cec9-4463-a183-aca9eaff9fa7 1ad21449-2812-4da9-b711-7d27393fea5c + # ll + total 0 + ----------. 2 root root 0 May 10 11:20 1ad21449-2812-4da9-b711-7d27393fea5c + ----------. 2 root root 0 May 10 11:20 xattrop-a400ca91-cec9-4463-a183-aca9eaff9fa7 + +3. Trigger heal: `gluster volume heal testvol` + + The glustershd.log of node-2 should log about the heal. + + [2019-05-10 06:10:46.027238] I [MSGID: 108026] [afr-self-heal-common.c:1741:afr_log_selfheal] 0-testvol-replicate-0: Completed data selfheal on 1ad21449-2812-4da9-b711-7d27393fea5c. sources=[1] sinks=0 2 + + So the data was healed from the second brick to the first and third brick. ### iii) Self-heal is too slow @@ -109,7 +130,7 @@ If the heal backlog is decreasing and you see glustershd logging heals but you Option: cluster.shd-max-threads Default Value: 1 Description: Maximum number of parallel heals SHD can do per local brick. This can substantially lower heal times, but can also crush your bricks if you don’t have the storage hardware to support this. - + Option: cluster.shd-wait-qlength Default Value: 1024 Description: This option can be used to control number of heals that can wait in SHD per subvolume @@ -118,38 +139,45 @@ I’m not covering it here but it is possible to launch multiple shd instances ( ### iv) Self-heal is too aggressive and slows down the system. -If shd-max-threads are at the lowest value (i.e. 1) and you see if CPU usage of the bricks is too high, you can check if the volume’s profile info shows a lot of RCHECKSUM fops. Data self-heal does checksum calculation (i.e the `posix_rchecksum()` FOP) which can be CPU intensive. You can the `cluster.data-self-heal-algorithm` option to full. This does a full file copy instead of computing rolling checksums and syncing only the mismatching blocks. The tradeoff is that the network consumption will be increased. +If shd-max-threads are at the lowest value (i.e. 1) and you see if CPU usage of the bricks is too high, you can check if the volume’s profile info shows a lot of RCHECKSUM fops. Data self-heal does checksum calculation (i.e the `posix_rchecksum()` FOP) which can be CPU intensive. You can the `cluster.data-self-heal-algorithm` option to full. This does a full file copy instead of computing rolling checksums and syncing only the mismatching blocks. The tradeoff is that the network consumption will be increased. -You can also disable all client-side heals if they are turned on so that the client bandwidth is consumed entirely by the application FOPs and not the ones by client side background heals. i.e. 
turn off `cluster.metadata-self-heal, cluster.data-self-heal and cluster.entry-self-heal`. -Note: In recent versions of gluster, client-side heals are disabled by default. +You can also disable all client-side heals if they are turned on so that the client bandwidth is consumed entirely by the application FOPs and not the ones by client side background heals. i.e. turn off `cluster.metadata-self-heal, cluster.data-self-heal and cluster.entry-self-heal`. +Note: In recent versions of gluster, client-side heals are disabled by default. ## Mount related issues: - ### i) All fops are failing with ENOTCONN + +### i) All fops are failing with ENOTCONN Check mount log/ statedump for loss of quorum, just like for glustershd. If this is a fuse client (as opposed to an nfs/ gfapi client), you can also check the .meta folder to check the connection status to the bricks. -`[root@tuxpad ~]# cat /mnt/fuse_mnt/.meta/graphs/active/testvol-client-*/private |grep connected` -`connected = 0` -`connected = 1` -`connected = 1` +```{.text .no-copy } +# cat /mnt/fuse_mnt/.meta/graphs/active/testvol-client-*/private |grep connected -If `connected=0`, the connection to that brick is lost. Find out why. If the client is not connected to quorum number of bricks, then AFR fails lookups (and therefore any subsequent FOP) with Transport endpoint is not connected +connected = 0 +connected = 1 +connected = 1 +``` + +If `connected=0`, the connection to that brick is lost. Find out why. If the client is not connected to quorum number of bricks, then AFR fails lookups (and therefore any subsequent FOP) with Transport endpoint is not connected ### ii) FOPs on some files are failing with ENOTCONN Check mount log for the file being unreadable: -`[2019-05-10 11:04:01.607046] W [MSGID: 108027] [afr-common.c:2268:afr_attempt_readsubvol_set] 13-testvol-replicate-0: no read subvols for /FILE.txt` -`[2019-05-10 11:04:01.607775] W [fuse-bridge.c:939:fuse_entry_cbk] 0-glusterfs-fuse: 234: LOOKUP() /FILE.txt => -1 (Transport endpoint is not connected)` -This means there was only 1 good copy and the client has lost connection to that brick. You need to ensure that the client is connected to all bricks. +```{.text .no-copy } +[2019-05-10 11:04:01.607046] W [MSGID: 108027] [afr-common.c:2268:afr_attempt_readsubvol_set] 13-testvol-replicate-0: no read subvols for /FILE.txt +[2019-05-10 11:04:01.607775] W [fuse-bridge.c:939:fuse_entry_cbk] 0-glusterfs-fuse: 234: LOOKUP() /FILE.txt => -1 (Transport endpoint is not connected) +``` + +This means there was only 1 good copy and the client has lost connection to that brick. You need to ensure that the client is connected to all bricks. ### iii) Mount is hung It can be difficult to pin-point the issue immediately and might require assistance from the developers but the first steps to debugging could be to - - strace the fuse mount; see where it is hung. - - Take a statedump of the mount to see which xlator has frames that are not wound (i.e. complete=0) and for which FOP. Then check the source code to see if there are any unhanded cases where the xlator doesn’t wind the FOP to its child. - - Take statedump of bricks to see if there are any stale locks. An indication of stale locks is the same lock being present in multiple statedumps or the ‘granted’ date being very old. +- strace the fuse mount; see where it is hung. +- Take a statedump of the mount to see which xlator has frames that are not wound (i.e. complete=0) and for which FOP. 
Then check the source code to see if there are any unhanded cases where the xlator doesn’t wind the FOP to its child. +- Take statedump of bricks to see if there are any stale locks. An indication of stale locks is the same lock being present in multiple statedumps or the ‘granted’ date being very old. Excerpt from a brick statedump: diff --git a/docs/Troubleshooting/troubleshooting-filelocks.md b/docs/Troubleshooting/troubleshooting-filelocks.md index ec5da40..aaf42b5 100644 --- a/docs/Troubleshooting/troubleshooting-filelocks.md +++ b/docs/Troubleshooting/troubleshooting-filelocks.md @@ -1,6 +1,4 @@ -Troubleshooting File Locks -========================== - +# Troubleshooting File Locks Use [statedumps](./statedump.md) to find and list the locks held on files. The statedump output also provides information on each lock @@ -13,11 +11,11 @@ lock using the following `clear lock` commands. 1. **Perform statedump on the volume to view the files that are locked using the following command:** - # gluster volume statedump inode + gluster volume statedump inode For example, to display statedump of test-volume: - # gluster volume statedump test-volume + gluster volume statedump test-volume Volume statedump successful The statedump files are created on the brick servers in the` /tmp` @@ -58,25 +56,23 @@ lock using the following `clear lock` commands. 2. **Clear the lock using the following command:** - # gluster volume clear-locks + gluster volume clear-locks For example, to clear the entry lock on `file1` of test-volume: - # gluster volume clear-locks test-volume / kind granted entry file1 + gluster volume clear-locks test-volume / kind granted entry file1 Volume clear-locks successful vol-locks: entry blocked locks=0 granted locks=1 3. **Clear the inode lock using the following command:** - # gluster volume clear-locks + gluster volume clear-locks For example, to clear the inode lock on `file1` of test-volume: - # gluster volume clear-locks test-volume /file1 kind granted inode 0,0-0 + gluster volume clear-locks test-volume /file1 kind granted inode 0,0-0 Volume clear-locks successful vol-locks: inode blocked locks=0 granted locks=1 Perform statedump on test-volume again to verify that the above inode and entry locks are cleared. - - diff --git a/docs/Troubleshooting/troubleshooting-georep.md b/docs/Troubleshooting/troubleshooting-georep.md index 9ef49fe..cb66538 100644 --- a/docs/Troubleshooting/troubleshooting-georep.md +++ b/docs/Troubleshooting/troubleshooting-georep.md @@ -8,13 +8,13 @@ to GlusterFS Geo-replication. 
For every Geo-replication session, the following three log files are associated to it (four, if the secondary is a gluster volume): -- **Primary-log-file** - log file for the process which monitors the Primary - volume -- **Secondary-log-file** - log file for process which initiates the changes in - secondary -- **Primary-gluster-log-file** - log file for the maintenance mount point - that Geo-replication module uses to monitor the Primary volume -- **Secondary-gluster-log-file** - is the secondary's counterpart of it +- **Primary-log-file** - log file for the process which monitors the Primary + volume +- **Secondary-log-file** - log file for process which initiates the changes in + secondary +- **Primary-gluster-log-file** - log file for the maintenance mount point + that Geo-replication module uses to monitor the Primary volume +- **Secondary-gluster-log-file** - is the secondary's counterpart of it **Primary Log File** @@ -28,7 +28,7 @@ gluster volume geo-replication config log-file For example: ```console -# gluster volume geo-replication Volume1 example.com:/data/remote_dir config log-file +gluster volume geo-replication Volume1 example.com:/data/remote_dir config log-file ``` **Secondary Log File** @@ -38,13 +38,13 @@ running on secondary machine), use the following commands: 1. On primary, run the following command: - # gluster volume geo-replication Volume1 example.com:/data/remote_dir config session-owner 5f6e5200-756f-11e0-a1f0-0800200c9a66 + gluster volume geo-replication Volume1 example.com:/data/remote_dir config session-owner 5f6e5200-756f-11e0-a1f0-0800200c9a66 Displays the session owner details. 2. On secondary, run the following command: - # gluster volume geo-replication /data/remote_dir config log-file /var/log/gluster/${session-owner}:remote-mirror.log + gluster volume geo-replication /data/remote_dir config log-file /var/log/gluster/${session-owner}:remote-mirror.log 3. Replace the session owner details (output of Step 1) to the output of Step 2 to get the location of the log file. @@ -52,7 +52,7 @@ running on secondary machine), use the following commands: /var/log/gluster/5f6e5200-756f-11e0-a1f0-0800200c9a66:remote-mirror.log ### Rotating Geo-replication Logs - + Administrators can rotate the log file of a particular primary-secondary session, as needed. When you run geo-replication's ` log-rotate` command, the log file is backed up with the current timestamp suffixed @@ -61,34 +61,34 @@ log file. 
**To rotate a geo-replication log file** -- Rotate log file for a particular primary-secondary session using the - following command: +- Rotate log file for a particular primary-secondary session using the + following command: - # gluster volume geo-replication log-rotate + gluster volume geo-replication log-rotate - For example, to rotate the log file of primary `Volume1` and secondary - `example.com:/data/remote_dir` : + For example, to rotate the log file of primary `Volume1` and secondary + `example.com:/data/remote_dir` : - # gluster volume geo-replication Volume1 example.com:/data/remote_dir log rotate + gluster volume geo-replication Volume1 example.com:/data/remote_dir log rotate log rotate successful -- Rotate log file for all sessions for a primary volume using the - following command: +- Rotate log file for all sessions for a primary volume using the + following command: - # gluster volume geo-replication log-rotate + gluster volume geo-replication log-rotate - For example, to rotate the log file of primary `Volume1`: + For example, to rotate the log file of primary `Volume1`: - # gluster volume geo-replication Volume1 log rotate + gluster volume geo-replication Volume1 log rotate log rotate successful -- Rotate log file for all sessions using the following command: +- Rotate log file for all sessions using the following command: - # gluster volume geo-replication log-rotate + gluster volume geo-replication log-rotate - For example, to rotate the log file for all sessions: + For example, to rotate the log file for all sessions: - # gluster volume geo-replication log rotate + gluster volume geo-replication log rotate log rotate successful ### Synchronization is not complete @@ -102,16 +102,14 @@ GlusterFS geo-replication begins synchronizing all the data. All files are compared using checksum, which can be a lengthy and high resource utilization operation on large data sets. - ### Issues in Data Synchronization **Description**: Geo-replication display status as OK, but the files do not get synced, only directories and symlink gets synced with the following error message in the log: -```console -[2011-05-02 13:42:13.467644] E [primary:288:regjob] GMaster: failed to -sync ./some\_file\` +```{ .text .no-copy } +[2011-05-02 13:42:13.467644] E [primary:288:regjob] GMaster: failed to sync ./some\_file\` ``` **Solution**: Geo-replication invokes rsync v3.0.0 or higher on the host @@ -123,7 +121,7 @@ required version. **Description**: Geo-replication displays status as faulty very often with a backtrace similar to the following: -```console +```{ .text .no-copy } 2011-04-28 14:06:18.378859] E [syncdutils:131:log\_raise\_exception] \: FAIL: Traceback (most recent call last): File "/usr/local/libexec/glusterfs/python/syncdaemon/syncdutils.py", line @@ -139,28 +137,28 @@ the primary gsyncd module and secondary gsyncd module is broken and this can happen for various reasons. Check if it satisfies all the following pre-requisites: -- Password-less SSH is set up properly between the host and the remote - machine. -- If FUSE is installed in the machine, because geo-replication module - mounts the GlusterFS volume using FUSE to sync data. -- If the **Secondary** is a volume, check if that volume is started. -- If the Secondary is a plain directory, verify if the directory has been - created already with the required permissions. 
-- If GlusterFS 3.2 or higher is not installed in the default location - (in Primary) and has been prefixed to be installed in a custom - location, configure the `gluster-command` for it to point to the - exact location. -- If GlusterFS 3.2 or higher is not installed in the default location - (in secondary) and has been prefixed to be installed in a custom - location, configure the `remote-gsyncd-command` for it to point to - the exact place where gsyncd is located. +- Password-less SSH is set up properly between the host and the remote + machine. +- If FUSE is installed in the machine, because geo-replication module + mounts the GlusterFS volume using FUSE to sync data. +- If the **Secondary** is a volume, check if that volume is started. +- If the Secondary is a plain directory, verify if the directory has been + created already with the required permissions. +- If GlusterFS 3.2 or higher is not installed in the default location + (in Primary) and has been prefixed to be installed in a custom + location, configure the `gluster-command` for it to point to the + exact location. +- If GlusterFS 3.2 or higher is not installed in the default location + (in secondary) and has been prefixed to be installed in a custom + location, configure the `remote-gsyncd-command` for it to point to + the exact place where gsyncd is located. ### Intermediate Primary goes to Faulty State **Description**: In a cascading set-up, the intermediate primary goes to faulty state with the following log: -```console +```{ .text .no-copy } raise RuntimeError ("aborting on uuid change from %s to %s" % \\ RuntimeError: aborting on uuid change from af07e07c-427f-4586-ab9f- 4bf7d299be81 to de6b5040-8f4e-4575-8831-c4f55bd41154 diff --git a/docs/Troubleshooting/troubleshooting-glusterd.md b/docs/Troubleshooting/troubleshooting-glusterd.md index c42936b..dfa2ed7 100644 --- a/docs/Troubleshooting/troubleshooting-glusterd.md +++ b/docs/Troubleshooting/troubleshooting-glusterd.md @@ -4,45 +4,40 @@ The glusterd daemon runs on every trusted server node and is responsible for the The gluster CLI sends commands to the glusterd daemon on the local node, which executes the operation and returns the result to the user. -
- ### Debugging glusterd #### Logs + Start by looking at the log files for clues as to what went wrong when you hit a problem. The default directory for Gluster logs is /var/log/glusterfs. The logs for the CLI and glusterd are: - - glusterd : /var/log/glusterfs/glusterd.log - - gluster CLI : /var/log/glusterfs/cli.log - +- glusterd : /var/log/glusterfs/glusterd.log +- gluster CLI : /var/log/glusterfs/cli.log #### Statedumps + Statedumps are useful in debugging memory leaks and hangs. See [Statedump](./statedump.md) for more details. -
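For instance, a statedump can be taken on demand (a sketch; the volume name `testvol` is an assumption, and dumps are written under `/var/run/gluster` by default):

```console
# Dump the state of all brick processes of a volume
gluster volume statedump testvol

# glusterd itself writes a statedump when it receives SIGUSR1
kill -SIGUSR1 $(pidof glusterd)
```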
-

### Common Issues and How to Resolve Them

-
-**"*Another transaction is in progress for volname*" or "*Locking failed on xxx.xxx.xxx.xxx"***
+**"_Another transaction is in progress for volname_" or "_Locking failed on xxx.xxx.xxx.xxx_"**

As Gluster is distributed by nature, glusterd takes locks when performing operations to ensure that configuration changes made to a volume are atomic across the cluster.
These errors are returned when:

-* More than one transaction contends on the same lock.
-> *Solution* : These are likely to be transient errors and the operation will succeed if retried once the other transaction is complete.
+- More than one transaction contends on the same lock.

-* A stale lock exists on one of the nodes.
-> *Solution* : Repeating the operation will not help until the stale lock is cleaned up. Restart the glusterd process holding the lock
+  > _Solution_ : These are likely to be transient errors and the operation will succeed if retried once the other transaction is complete.

- * Check the glusterd.log file to find out which node holds the stale lock. Look for the message:
-   `lock being held by `
- * Run `gluster peer status` to identify the node with the uuid in the log message.
- * Restart glusterd on that node.
+- A stale lock exists on one of the nodes.
+  > _Solution_ : Repeating the operation will not help until the stale lock is cleaned up. Restart the glusterd process holding the lock:

-
+ - Check the glusterd.log file to find out which node holds the stale lock. Look for the message: + `lock being held by ` + - Run `gluster peer status` to identify the node with the uuid in the log message. + - Restart glusterd on that node. **"_Transport endpoint is not connected_" errors but all bricks are up** @@ -51,51 +46,40 @@ Gluster client processes query glusterd for the ports the bricks processes are l If the port information in glusterd is incorrect, the client will fail to connect to the brick even though it is up. Operations which would need to access that brick may fail with "Transport endpoint is not connected". -*Solution* : Restart the glusterd service. - -
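One way to confirm the mismatch before restarting anything (a sketch; the volume name `testvol` is an assumption) is to compare the port glusterd advertises with the port the brick process is actually listening on:

```console
# Port advertised by glusterd for each brick
gluster volume status testvol

# Ports the brick processes are actually listening on
ss -ltnp | grep glusterfsd
```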
+_Solution_ : Restart the glusterd service. **"Peer Rejected"** `gluster peer status` returns "Peer Rejected" for a node. -```console +```{ .text .no-copy } Hostname: Uuid: State: Peer Rejected (Connected) ``` -This indicates that the volume configuration on the node is not in sync with the rest of the trusted storage pool. +This indicates that the volume configuration on the node is not in sync with the rest of the trusted storage pool. You should see the following message in the glusterd log for the node on which the peer status command was run: -```console +```{ .text .no-copy } Version of Cksums differ. local cksum = xxxxxx, remote cksum = xxxxyx on peer ``` -*Solution*: Update the cluster.op-version +_Solution_: Update the cluster.op-version - * Run `gluster volume get all cluster.max-op-version` to get the latest supported op-version. - * Update the cluster.op-version to the latest supported op-version by executing `gluster volume set all cluster.op-version `. - -
+- Run `gluster volume get all cluster.max-op-version` to get the latest supported op-version. +- Update the cluster.op-version to the latest supported op-version by executing `gluster volume set all cluster.op-version `. **"Accepted Peer Request"** -If the glusterd handshake fails while expanding a cluster, the view of the cluster will be inconsistent. The state of the peer in `gluster peer status` will be “accepted peer request” and subsequent CLI commands will fail with an error. -Eg. `Volume create command will fail with "volume create: testvol: failed: Host is not in 'Peer in Cluster' state` - +If the glusterd handshake fails while expanding a cluster, the view of the cluster will be inconsistent. The state of the peer in `gluster peer status` will be “accepted peer request” and subsequent CLI commands will fail with an error. +Eg. `Volume create command will fail with "volume create: testvol: failed: Host is not in 'Peer in Cluster' state` + In this case the value of the state field in `/var/lib/glusterd/peers/` will be other than 3. -*Solution*: - -* Stop glusterd -* Open `/var/lib/glusterd/peers/` -* Change state to 3 -* Start glusterd - - - - - - +_Solution_: +- Stop glusterd +- Open `/var/lib/glusterd/peers/` +- Change state to 3 +- Start glusterd diff --git a/docs/Troubleshooting/troubleshooting-gnfs.md b/docs/Troubleshooting/troubleshooting-gnfs.md index 7e2c61a..9d7c455 100644 --- a/docs/Troubleshooting/troubleshooting-gnfs.md +++ b/docs/Troubleshooting/troubleshooting-gnfs.md @@ -11,14 +11,14 @@ This error is encountered when the server has not started correctly. On most Linux distributions this is fixed by starting portmap: ```console -# /etc/init.d/portmap start +/etc/init.d/portmap start ``` On some distributions where portmap has been replaced by rpcbind, the following command is required: ```console -# /etc/init.d/rpcbind start +/etc/init.d/rpcbind start ``` After starting portmap or rpcbind, gluster NFS server needs to be @@ -32,13 +32,13 @@ This error can arise in case there is already a Gluster NFS server running on the same machine. 
This situation can be confirmed from the log file, if the following error lines exist: -```text +```{ .text .no-copy } [2010-05-26 23:40:49] E [rpc-socket.c:126:rpcsvc_socket_listen] rpc-socket: binding socket failed:Address already in use -[2010-05-26 23:40:49] E [rpc-socket.c:129:rpcsvc_socket_listen] rpc-socket: Port is already in use -[2010-05-26 23:40:49] E [rpcsvc.c:2636:rpcsvc_stage_program_register] rpc-service: could not create listening connection -[2010-05-26 23:40:49] E [rpcsvc.c:2675:rpcsvc_program_register] rpc-service: stage registration of program failed -[2010-05-26 23:40:49] E [rpcsvc.c:2695:rpcsvc_program_register] rpc-service: Program registration failed: MOUNT3, Num: 100005, Ver: 3, Port: 38465 -[2010-05-26 23:40:49] E [nfs.c:125:nfs_init_versions] nfs: Program init failed +[2010-05-26 23:40:49] E [rpc-socket.c:129:rpcsvc_socket_listen] rpc-socket: Port is already in use +[2010-05-26 23:40:49] E [rpcsvc.c:2636:rpcsvc_stage_program_register] rpc-service: could not create listening connection +[2010-05-26 23:40:49] E [rpcsvc.c:2675:rpcsvc_program_register] rpc-service: stage registration of program failed +[2010-05-26 23:40:49] E [rpcsvc.c:2695:rpcsvc_program_register] rpc-service: Program registration failed: MOUNT3, Num: 100005, Ver: 3, Port: 38465 +[2010-05-26 23:40:49] E [nfs.c:125:nfs_init_versions] nfs: Program init failed [2010-05-26 23:40:49] C [nfs.c:531:notify] nfs: Failed to initialize protocols ``` @@ -50,7 +50,7 @@ multiple NFS servers on the same machine. If the mount command fails with the following error message: -```console +```{ .text .no-copy } mount.nfs: rpc.statd is not running but is required for remote locking. mount.nfs: Either use '-o nolock' to keep locks local, or start statd. ``` @@ -59,7 +59,7 @@ For NFS clients to mount the NFS server, rpc.statd service must be running on the clients. Start rpc.statd service by running the following command: ```console -# rpc.statd +rpc.statd ``` ### mount command takes too long to finish. @@ -71,14 +71,14 @@ NFS client. The resolution for this is to start either of these services by running the following command: ```console -# /etc/init.d/portmap start +/etc/init.d/portmap start ``` On some distributions where portmap has been replaced by rpcbind, the following command is required: ```console -# /etc/init.d/rpcbind start +/etc/init.d/rpcbind start ``` ### NFS server glusterfsd starts but initialization fails with “nfsrpc- service: portmap registration of program failed” error message in the log. @@ -88,8 +88,8 @@ still fail preventing clients from accessing the mount points. 
Such a situation can be confirmed from the following error messages in the log file: -```text -[2010-05-26 23:33:47] E [rpcsvc.c:2598:rpcsvc_program_register_portmap] rpc-service: Could notregister with portmap +```{ .text .no-copy } +[2010-05-26 23:33:47] E [rpcsvc.c:2598:rpcsvc_program_register_portmap] rpc-service: Could notregister with portmap [2010-05-26 23:33:47] E [rpcsvc.c:2682:rpcsvc_program_register] rpc-service: portmap registration of program failed [2010-05-26 23:33:47] E [rpcsvc.c:2695:rpcsvc_program_register] rpc-service: Program registration failed: MOUNT3, Num: 100005, Ver: 3, Port: 38465 [2010-05-26 23:33:47] E [nfs.c:125:nfs_init_versions] nfs: Program init failed @@ -104,12 +104,12 @@ file: On most Linux distributions, portmap can be started using the following command: - # /etc/init.d/portmap start + /etc/init.d/portmap start On some distributions where portmap has been replaced by rpcbind, run the following command: - # /etc/init.d/rpcbind start + /etc/init.d/rpcbind start After starting portmap or rpcbind, gluster NFS server needs to be restarted. @@ -126,8 +126,8 @@ file: On Linux, kernel NFS servers can be stopped by using either of the following commands depending on the distribution in use: - # /etc/init.d/nfs-kernel-server stop - # /etc/init.d/nfs stop + /etc/init.d/nfs-kernel-server stop + /etc/init.d/nfs stop 3. **Restart Gluster NFS server** @@ -135,7 +135,7 @@ file: mount command fails with following error -```console +```{ .text .no-copy } mount: mount to NFS server '10.1.10.11' failed: timed out (retrying). ``` @@ -175,14 +175,13 @@ Perform one of the following to resolve this issue: forcing the NFS client to use version 3. The **vers** option to mount command is used for this purpose: - # mount -o vers=3 + mount -o vers=3 -### showmount fails with clnt\_create: RPC: Unable to receive +### showmount fails with clnt_create: RPC: Unable to receive Check your firewall setting to open ports 111 for portmap requests/replies and Gluster NFS server requests/replies. Gluster NFS -server operates over the following port numbers: 38465, 38466, and -38467. +server operates over the following port numbers: 38465, 38466, and 38467. ### Application fails with "Invalid argument" or "Value too large for defined data type" error. @@ -193,9 +192,9 @@ numbers instead: nfs.enable-ino32 \ Applications that will benefit are those that were either: -- built 32-bit and run on 32-bit machines such that they do not - support large files by default -- built 32-bit on 64-bit systems +- built 32-bit and run on 32-bit machines such that they do not + support large files by default +- built 32-bit on 64-bit systems This option is disabled by default so NFS returns 64-bit inode numbers by default. @@ -203,6 +202,6 @@ by default. Applications which can be rebuilt from source are recommended to rebuild using the following flag with gcc: -``` +```console -D_FILE_OFFSET_BITS=64 ``` diff --git a/docs/Troubleshooting/troubleshooting-memory.md b/docs/Troubleshooting/troubleshooting-memory.md index 12336d5..70d83b2 100644 --- a/docs/Troubleshooting/troubleshooting-memory.md +++ b/docs/Troubleshooting/troubleshooting-memory.md @@ -1,5 +1,4 @@ -Troubleshooting High Memory Utilization -======================================= +# Troubleshooting High Memory Utilization If the memory utilization of a Gluster process increases significantly with time, it could be a leak caused by resources not being freed. 
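For example, a simple way to spot steadily growing memory (a sketch; adjust the process names and sampling interval to your setup) is to record the resident set size of the Gluster processes over time:

```console
# Snapshot the memory footprint; repeat (or wrap in `watch`) and compare RSS values over time
ps -C glusterd,glusterfs,glusterfsd -o pid,rss,vsz,etime,comm
```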
If you suspect that you may have hit such an issue, try using [statedumps](./statedump.md) to debug the issue. @@ -12,4 +11,3 @@ If you are unable to figure out where the leak is, please [file an issue](https: - Steps to reproduce the issue if available - Statedumps for the process collected at intervals as the memory utilization increases - The Gluster log files for the process (if possible) - diff --git a/docs/Upgrade-Guide/README.md b/docs/Upgrade-Guide/README.md index b517fe5..be15fd1 100644 --- a/docs/Upgrade-Guide/README.md +++ b/docs/Upgrade-Guide/README.md @@ -1,32 +1,32 @@ -Upgrading GlusterFS -------------------- -- [About op-version](./op-version.md) +## Upgrading GlusterFS + +- [About op-version](./op-version.md) If you are using GlusterFS version 6.x or above, you can upgrade it to the following: -- [Upgrading to 10](./upgrade-to-10.md) -- [Upgrading to 9](./upgrade-to-9.md) -- [Upgrading to 8](./upgrade-to-8.md) -- [Upgrading to 7](./upgrade-to-7.md) +- [Upgrading to 10](./upgrade-to-10.md) +- [Upgrading to 9](./upgrade-to-9.md) +- [Upgrading to 8](./upgrade-to-8.md) +- [Upgrading to 7](./upgrade-to-7.md) If you are using GlusterFS version 5.x or above, you can upgrade it to the following: -- [Upgrading to 8](./upgrade-to-8.md) -- [Upgrading to 7](./upgrade-to-7.md) -- [Upgrading to 6](./upgrade-to-6.md) +- [Upgrading to 8](./upgrade-to-8.md) +- [Upgrading to 7](./upgrade-to-7.md) +- [Upgrading to 6](./upgrade-to-6.md) If you are using GlusterFS version 4.x or above, you can upgrade it to the following: -- [Upgrading to 6](./upgrade-to-6.md) -- [Upgrading to 5](./upgrade-to-5.md) +- [Upgrading to 6](./upgrade-to-6.md) +- [Upgrading to 5](./upgrade-to-5.md) If you are using GlusterFS version 3.4.x or above, you can upgrade it to following: -- [Upgrading to 3.5](./upgrade-to-3.5.md) -- [Upgrading to 3.6](./upgrade-to-3.6.md) -- [Upgrading to 3.7](./upgrade-to-3.7.md) -- [Upgrading to 3.9](./upgrade-to-3.9.md) -- [Upgrading to 3.10](./upgrade-to-3.10.md) -- [Upgrading to 3.11](./upgrade-to-3.11.md) -- [Upgrading to 3.12](./upgrade-to-3.12.md) -- [Upgrading to 3.13](./upgrade-to-3.13.md) +- [Upgrading to 3.5](./upgrade-to-3.5.md) +- [Upgrading to 3.6](./upgrade-to-3.6.md) +- [Upgrading to 3.7](./upgrade-to-3.7.md) +- [Upgrading to 3.9](./upgrade-to-3.9.md) +- [Upgrading to 3.10](./upgrade-to-3.10.md) +- [Upgrading to 3.11](./upgrade-to-3.11.md) +- [Upgrading to 3.12](./upgrade-to-3.12.md) +- [Upgrading to 3.13](./upgrade-to-3.13.md) diff --git a/docs/Upgrade-Guide/generic-upgrade-procedure.md b/docs/Upgrade-Guide/generic-upgrade-procedure.md index 2829069..3a2ad21 100644 --- a/docs/Upgrade-Guide/generic-upgrade-procedure.md +++ b/docs/Upgrade-Guide/generic-upgrade-procedure.md @@ -1,6 +1,7 @@ # Generic Upgrade procedure ### Pre-upgrade notes + - Online upgrade is only possible with replicated and distributed replicate volumes - Online upgrade is not supported for dispersed or distributed dispersed volumes - Ensure no configuration changes are done during the upgrade @@ -9,27 +10,28 @@ - It is recommended to have the same client and server, major versions running eventually ### Online upgrade procedure for servers + This procedure involves upgrading **one server at a time**, while keeping the volume(s) online and client IO ongoing. This procedure assumes that multiple replicas of a replica set, are not part of the same server in the trusted storage pool. 
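A quick way to check that assumption before starting (a sketch; the volume name `testvol` is an assumption) is to list the bricks and confirm that each replica set spans different servers:

```console
# In a replica 3 volume, every group of three consecutive bricks is one replica set;
# the server part of each brick path in a set should differ
gluster volume info testvol | grep -E '^Brick[0-9]+:'
```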
> **ALERT:** If there are disperse or, pure distributed volumes in the storage pool being upgraded, this procedure is NOT recommended, use the [Offline upgrade procedure](#offline-upgrade-procedure) instead. #### Repeat the following steps, on each server in the trusted storage pool, to upgrade the entire pool to new-version : -1. Stop all gluster services, either using the command below, or through other means. +1. Stop all gluster services, either using the command below, or through other means. - # systemctl stop glusterd - # systemctl stop glustereventsd - # killall glusterfs glusterfsd glusterd + systemctl stop glusterd + systemctl stop glustereventsd + killall glusterfs glusterfsd glusterd -2. Stop all applications that run on this server and access the volumes via gfapi (qemu, NFS-Ganesha, Samba, etc.) +2. Stop all applications that run on this server and access the volumes via gfapi (qemu, NFS-Ganesha, Samba, etc.) -3. Install Gluster new-version, below example shows how to create a repository on fedora and use it to upgrade : +3. Install Gluster new-version, below example shows how to create a repository on fedora and use it to upgrade : - 3.1 Create a private repository (assuming /new-gluster-rpms/ folder has the new rpms ): + 3.1 Create a private repository (assuming /new-gluster-rpms/ folder has the new rpms ): - # createrepo /new-gluster-rpms/ + createrepo /new-gluster-rpms/ - 3.2 Create the .repo file in /etc/yum.d/ : + 3.2 Create the .repo file in /etc/yum.d/ : # cat /etc/yum.d/newglusterrepo.repo [newglusterrepo] @@ -38,76 +40,74 @@ This procedure involves upgrading **one server at a time**, while keeping the vo gpgcheck=0 enabled=1 - 3.3 Upgrade glusterfs, for example to upgrade glusterfs-server to x.y version : + 3.3 Upgrade glusterfs, for example to upgrade glusterfs-server to x.y version : - # yum update glusterfs-server-x.y.fc30.x86_64.rpm + yum update glusterfs-server-x.y.fc30.x86_64.rpm -4. Ensure that version reflects new-version in the output of, +4. Ensure that version reflects new-version in the output of, - # gluster --version + gluster --version -5. Start glusterd on the upgraded server +5. Start glusterd on the upgraded server - # systemctl start glusterd + systemctl start glusterd -6. Ensure that all gluster processes are online by checking the output of, +6. Ensure that all gluster processes are online by checking the output of, - # gluster volume status + gluster volume status -7. If the glustereventsd service was previously enabled, it is required to start it using the commands below, or, through other means, +7. If the glustereventsd service was previously enabled, it is required to start it using the commands below, or, through other means, - # systemctl start glustereventsd + systemctl start glustereventsd -8. Invoke self-heal on all the gluster volumes by running, +8. Invoke self-heal on all the gluster volumes by running, - # for i in `gluster volume list`; do gluster volume heal $i; done + for i in `gluster volume list`; do gluster volume heal $i; done -9. Verify that there are no heal backlog by running the command for all the volumes, +9. Verify that there are no heal backlog by running the command for all the volumes, - # gluster volume heal info + gluster volume heal info > **NOTE:** Before proceeding to upgrade the next server in the pool it is recommended to check the heal backlog. 
If there is a heal backlog, it is recommended to wait until the backlog is empty, or, the backlog does not contain any entries requiring a sync to the just upgraded server. -10. Restart any gfapi based application stopped previously in step (2) +1. Restart any gfapi based application stopped previously in step (2) ### Offline upgrade procedure + This procedure involves cluster downtime and during the upgrade window, clients are not allowed access to the volumes. #### Steps to perform an offline upgrade: -1. On every server in the trusted storage pool, stop all gluster services, either using the command below, or through other means, -```sh +1. On every server in the trusted storage pool, stop all gluster services, either using the command below, or through other means, - # systemctl stop glusterd - # systemctl stop glustereventsd - # killall glusterfs glusterfsd glusterd -``` -2. Stop all applications that access the volumes via gfapi (qemu, NFS-Ganesha, Samba, etc.), across all servers + systemctl stop glusterd + systemctl stop glustereventsd + killall glusterfs glusterfsd glusterd -3. Install Gluster new-version, on all servers +2. Stop all applications that access the volumes via gfapi (qemu, NFS-Ganesha, Samba, etc.), across all servers -4. Ensure that version reflects new-version in the output of the following command on all servers, -```sh - # gluster --version -``` +3. Install Gluster new-version, on all servers -5. Start glusterd on all the upgraded servers -```sh - # systemctl start glusterd -``` -6. Ensure that all gluster processes are online by checking the output of, -```sh - # gluster volume status -``` +4. Ensure that version reflects new-version in the output of the following command on all servers, -7. If the glustereventsd service was previously enabled, it is required to start it using the commands below, or, through other means, -```sh - # systemctl start glustereventsd -``` + gluster --version -8. Restart any gfapi based application stopped previously in step (2) +5. Start glusterd on all the upgraded servers + + systemctl start glusterd + +6. Ensure that all gluster processes are online by checking the output of, + + gluster volume status + +7. If the glustereventsd service was previously enabled, it is required to start it using the commands below, or, through other means, + + systemctl start glustereventsd + +8. Restart any gfapi based application stopped previously in step (2) ### Post upgrade steps + Perform the following steps post upgrading the entire trusted storage pool, - It is recommended to update the op-version of the cluster. Refer, to the [op-version](./op-version.md) section for further details @@ -117,12 +117,13 @@ Perform the following steps post upgrading the entire trusted storage pool, #### If upgrading from a version lesser than Gluster 7.0 > **NOTE:** If you have ever enabled quota on your volumes then after the upgrade -is done, you will have to restart all the nodes in the cluster one by one so as to -fix the checksum values in the quota.cksum file under the `/var/lib/glusterd/vols// directory.` -The peers may go into `Peer rejected` state while doing so but once all the nodes are rebooted -everything will be back to normal. +> is done, you will have to restart all the nodes in the cluster one by one so as to +> fix the checksum values in the quota.cksum file under the `/var/lib/glusterd/vols// directory.` +> The peers may go into `Peer rejected` state while doing so but once all the nodes are rebooted +> everything will be back to normal. 
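A minimal sketch of that rolling restart, assuming systemd-managed nodes:

```console
# On each node, one node at a time
systemctl restart glusterd

# Wait until every peer reports "Peer in Cluster (Connected)" before moving on
gluster peer status
```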
### Upgrade procedure for clients + Following are the steps to upgrade clients to the new-version version, 1. Unmount all glusterfs mount points on the client diff --git a/docs/Upgrade-Guide/op-version.md b/docs/Upgrade-Guide/op-version.md index 606d270..8384ad6 100644 --- a/docs/Upgrade-Guide/op-version.md +++ b/docs/Upgrade-Guide/op-version.md @@ -1,5 +1,5 @@ - ### op-version + op-version is the operating version of the Gluster which is running. op-version was introduced to ensure gluster running with different versions do not end up in a problem and backward compatibility issues can be tackled. @@ -13,19 +13,19 @@ Current op-version can be queried as below: For 3.10 onwards: ```console -# gluster volume get all cluster.op-version +gluster volume get all cluster.op-version ``` For release < 3.10: -```console +```{ .console .no-copy } # gluster volume get cluster.op-version ``` To get the maximum possible op-version a cluster can support, the following query can be used (this is available 3.10 release onwards): ```console -# gluster volume get all cluster.max-op-version +gluster volume get all cluster.max-op-version ``` For example, if some nodes in a cluster have been upgraded to X and some to X+, then the maximum op-version supported by the cluster is X, and the cluster.op-version can be bumped up to X to support new features. @@ -34,7 +34,7 @@ op-version can be updated as below. For example, after upgrading to glusterfs-4.0.0, set op-version as: ```console -# gluster volume set all cluster.op-version 40000 +gluster volume set all cluster.op-version 40000 ``` Note: @@ -46,11 +46,10 @@ When trying to set a volume option, it might happen that one or more of the conn To check op-version information for the connected clients and find the offending client, the following query can be used for 3.10 release onwards: -```console +```{ .console .no-copy } # gluster volume status clients ``` The respective clients can then be upgraded to the required version. This information could also be used to make an informed decision while bumping up the op-version of a cluster, so that connected clients can support all the new features provided by the upgraded cluster as well. - diff --git a/docs/Upgrade-Guide/upgrade-to-10.md b/docs/Upgrade-Guide/upgrade-to-10.md index fb75d79..70be58e 100644 --- a/docs/Upgrade-Guide/upgrade-to-10.md +++ b/docs/Upgrade-Guide/upgrade-to-10.md @@ -10,6 +10,7 @@ Refer, to the [generic upgrade procedure](./generic-upgrade-procedure.md) guide ## Major issues ### The following options are removed from the code base and require to be unset + before an upgrade from releases older than release 4.1.0, - features.lock-heal @@ -18,7 +19,7 @@ before an upgrade from releases older than release 4.1.0, To check if these options are set use, ```console -# gluster volume info +gluster volume info ``` and ensure that the above options are not part of the `Options Reconfigured:` @@ -26,7 +27,7 @@ section in the output of all volumes in the cluster. If these are set, then unset them using the following commands, -```console +```{ .console .no-copy } # gluster volume reset