mirror of https://github.com/gluster/glusterfs.git synced 2026-02-06 09:48:44 +01:00

702 Commits

Author SHA1 Message Date
Yaniv Kaul
bff9b26d8c memory accounting - reduce code in non DEBUG build (#3854)
- gf_mem_update_acct_info() is not needed when not in DEBUG mode
- re-order variables in the structure according to access pattern
- Turn xlator_mem_acct_unref() into xlator_mem_acct_destroy() and call it only when refcnt equals 0, which is quite rare.

Updates: #3855
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
2022-10-18 10:19:48 +02:00
Yaniv Kaul
6b69f4802d afr-no-fsync.t: disable cluster.lookup-optimize and skip initial FSYNCDIR (#3709)
Per #2253 we need to disable lookup-optimize feature.

More importantly, there are some FSYNCDIR (4 of them) that appear for some reason too soon.
Re-order the profile start - give 5 seconds for those to be done with and then start profiling.

Fixes: #3708
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
2022-09-21 11:46:46 +02:00
Yaniv Kaul
364230f587 refactor logging.c - remove 'THIS' where possible, use 'log' instead of 'ctx' where possible (#3451)
* logging: reduce no. of calls to 'THIS', use 'ctx' or 'this' where possible

In many places, instead of calling 'THIS', we can pass it or the context directly.
Changed functions and callers where it made sense.

Updates: #3426
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>

* gf_time_fmt_tv_FT() - pass log pointer instead of ctx

Preparation for using log pointer across logging.c

Updates: #1000
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>

* logging.c: refactor ctx->log and use log where possible

In many functions, we pass ctx and only use ctx->log. Refactor accordingly.

Updates: #3426
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>

* logging.c: appstr could be a pointer, not a pointer to a pointer

Unsure why, but there was an additional indirection which did not seem to be really used anywhere.

Updates: #3426
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>

* logging.c: pass 'buf' parameter instead of all its variables to *repetitions() functions.

Seems easier to read.

Updates: #3426
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>

* Remove unused variable in  _gf_msg_internal()

* logging.h: remove padding from structure

Updates: #3426
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>

* Simplify glusterd_check_log_level() function

Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
2022-09-19 10:54:07 +02:00
Yaniv Kaul
27ade9815f cdc xlator improvements (#3802)
- Minor code movement, removal of unused variables.
- The default for min_file_size should not be 0 - we really should not bother with compressing < 1K files (assuming the idea is to save packets travel, not bandwidth overall)
- The default compression level is too high - zlib isn't that great anyway (performance-wise), so we should use the fastest possible
- deflateInit2() is called repeatedly (within do_cdc_compress() function) instead of once (in cdc_compress() function)
- The fixed size (GF_CDC_DEF_BUFFERSIZE) is too large - if FUSE can only do up to 128K, unsure what's the point in allocating up to 256K iobufs

Updates: #3797
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
2022-09-19 10:34:42 +02:00
mohit84
ab255a8e8d dht[WIP]: Implement seek fop at dht level (#3792)
* dht: Implement seek fop at dht level

Before kernel minor version (.24), fuse does not
wind a seek fop, but after that version it does,
so implement the fop at the dht level.

Fixes: #3373
Change-Id: Ie9ef2f941099157996ab353fc4dc208a28fa8fc6
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
2022-09-13 17:02:45 +05:30
Shree Vatsa N
b09d55178f gfapi: Add support for 'AT_EMPTY_PATH' flag (#3707)
Add support for AT_EMPTY_PATH flag for the following fops,
- glfs_fstatat, glfs_linkat, glfs_fchownat.

Acc. to man pages,
If pathname is an empty string, operate on the file referred to by dirfd,
(which may have been obtained using the open(2) O_PATH flag).
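
The man-page behaviour quoted above can be sketched with the plain POSIX/Linux call rather than gfapi. This is a minimal illustration, assuming a Linux system; `empty_path_matches_fstat` is a hypothetical helper name, and the gfapi functions listed in this commit are not used here.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/stat.h>
#include <unistd.h>

/* AT_EMPTY_PATH is Linux-specific and glibc only exposes it under
 * _GNU_SOURCE; 0x1000 is the value used by the Linux UAPI headers. */
#ifndef AT_EMPTY_PATH
#define AT_EMPTY_PATH 0x1000
#endif

/* Hypothetical helper: stat an open descriptor twice, once with fstat()
 * and once with fstatat(fd, "", ..., AT_EMPTY_PATH), and compare.
 * Returns 1 if both calls describe the same inode, 0 if they differ,
 * and -1 if the file cannot be opened or either call fails. */
int empty_path_matches_fstat(const char *path)
{
    struct stat by_fd;
    struct stat by_empty_path;
    int result = -1;
    int fd;

    assert(path != NULL);

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    if (fstat(fd, &by_fd) == 0 &&
        fstatat(fd, "", &by_empty_path, AT_EMPTY_PATH) == 0) {
        result = (by_fd.st_dev == by_empty_path.st_dev &&
                  by_fd.st_ino == by_empty_path.st_ino);
    }

    close(fd);
    return result;
}
```

With an empty pathname and the flag set, the call operates on the descriptor itself, which is why both results are expected to name the same inode.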

Updates: #2717
Sponsored-By: iXsystems, Inc https://www.ixsystems.com/

Signed-off-by: Shree Vatsa N <vatsa@kadalu.tech>
2022-09-07 11:46:40 +05:30
mohit84
beaec1e478 tests: Increase timeout for sparse_file_rebalance.t test case (#3779)
The test (./tests/basic/distribute/sparse_file_rebalance.t) does
not finish within the default time (200s). It spends its
time calling seek at a 2M offset and trying to copy
a sparse file.

Change-Id: Id174f4a9d66d1caaf69f495c3cf62a2e09e87b80
Fixes: #3778
Solution: To pass regression jobs increase the timeout to 300s
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
2022-09-05 15:05:56 +05:30
Shwetha Acharya
5a507cb957 tests: replace localhost and 127.0.0.1 with valid ip (#2945)
* tests: replace localhost and 127.0.0.1 with valid ip

Problem:  From Fedora 34 it is mandatory to use valid ip
          instead of localhost, 127.0.0.1 or loopback address
          (0.0.0.0 to 0.255.255.255).

Solution: use $ip -o -4 addr and then filter out the valid ip
          (ipv6 addresses to be used once gluster is entirely
           ipv6 compatible)
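
The filtering step described in the solution can be sketched as a small predicate. This is only an illustration, not the test-suite code (which filters the output of `ip -o -4 addr` in shell); `is_usable_test_ipv4` is a hypothetical name, and the sketch treats both the 0.0.0.0/8 block mentioned above and the 127.0.0.0/8 block as unusable.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: classify a dotted-quad IPv4 string.
 * Returns  1 if the address lies outside 0.0.0.0/8 and 127.0.0.0/8,
 *          0 if it falls inside one of those two blocks,
 *         -1 if the text is not a valid IPv4 address. */
int is_usable_test_ipv4(const char *text)
{
    struct in_addr addr;
    uint32_t host_order;
    uint8_t first_octet;

    assert(text != NULL);

    if (inet_pton(AF_INET, text, &addr) != 1)
        return -1;

    /* s_addr is in network byte order; convert before inspecting it. */
    host_order = ntohl(addr.s_addr);
    first_octet = (uint8_t)(host_order >> 24);

    return (first_octet != 0 && first_octet != 127) ? 1 : 0;
}
```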

Fixes: #2944
Change-Id: I282a0a519c650c6848ffa668ade86a4f35f9de42
Signed-off-by: Shwetha K Acharya <sacharya@redhat.com>

* ec/quota.t: correct the syntax of create command

Change-Id: Ie26efe48664737c8dea108e1708d9a66af74ef94
Signed-off-by: Shwetha K Acharya <sacharya@redhat.com>

* revert disperse no in quota.t and replace
HOSTNAME by H0 in user-xlator.t

Change-Id: I3eca83aa1272beb9fc75e0d1d22478588e9f1038
Signed-off-by: Shwetha K Acharya <sacharya@redhat.com>

* bug-824753.t: grep for hostname instead of ip

gluster volume clear-locks returns output
with the hostname, hence the new code change
with respect to the ip does not hold good here

Adding `hostname` in place of ip

Fixes: #2944
Signed-off-by: Shwetha K Acharya <sacharya@redhat.com>

* bug-765380.t: replace $H0 by `hostname`

count_hostname_or_uuid_from_pathinfo() returns
the hostname, so its output should be grepped
for the hostname, not the ip.

Signed-off-by: Shwetha K Acharya <sacharya@redhat.com>

* bug-921072.t: update ip in nfs.rpc-auth-allow and
              nfs.rpc-auth-reject

Signed-off-by: Shwetha K Acharya <sacharya@redhat.com>

* bug-921072.t: update nfs.rpc-auth-allow and reject

Signed-off-by: Shwetha K Acharya <sacharya@redhat.com>
2022-09-01 15:24:05 +02:00
Shree Vatsa N
8a41e0eb1c gfapi: Add support for 'AT_REMOVEDIR' flag (#3703)
Signed-off-by: Shree Vatsa N <vatsa@kadalu.tech>
2022-08-16 10:04:51 +05:30
Shree Vatsa N
c27bdbeb10 gfapi: Implementation of glfs_*at syscalls (#3634)
* gfapi: Implement glfs_faccessat
* gfapi: Implement glfs_fchmodat
* gfapi: Implement glfs_fchownat
* gfapi: Implement glfs_linkat
* gfapi: Implement glfs_mknodat
* gfapi: Implement glfs_readlinkat
* gfapi: Implement glfs_renameat
* gfapi: Implement glfs_renameat2
* gfapi: Implement glfs_symlinkat
* gfapi: Implement glfs_unlinkat
* gfapi: Implement glfs_mkdirat

* glfs-openat: Use  the common methods
 * Add ENOENT & EEXIST checks

Updates: #2717
Sponsored-By: iXsystems, Inc <https://www.ixsystems.com>
Signed-off-by: Shree Vatsa N <vatsa@kadalu.tech>
2022-07-14 12:19:36 +05:30
Aravinda Vishwanathapura
a005a54a93 gfapi: glfs_fstatat implementation (#3542)
Example:

```c
struct stat stbuf = {
    0,
};
fd1 = glfs_open(fs, "/", O_PATH);
glfs_fstatat(fd1, filename, &stbuf, 0);
printf("Size: %lld\n", (long long)stbuf.st_size);
```

Change-Id: I6d4c95ce91191566d2b9a198d5ba382bbad22264
Updates: #2717
Sponsored-By: iXsystems, Inc <https://www.ixsystems.com>
Signed-off-by: Aravinda Vishwanathapura <aravinda@kadalu.tech>
2022-06-30 16:48:15 +05:30
Aravinda Vishwanathapura
f64f400c8d gfapi: Implement glfs_openat()
Example:

```c
const char *buff = "Hello World!";

fd1 = glfs_open(fs, "/", O_PATH);
fd2 = glfs_openat(fd1, filename, O_RDWR, 0);
glfs_write(fd2, buff, strlen(buff), flags);
```

Updates: #2717
Sponsored-By: iXsystems, Inc <https://www.ixsystems.com>
Change-Id: I4f8ad20616587f2d6e5feba4869abec96706a514
Signed-off-by: Aravinda Vishwanathapura <aravinda@kadalu.tech>
2022-06-27 11:12:34 +02:00
Xavi Hernandez
2ba1fda132 tests: improve performance and stability (#3486)
Some improvements that make tests runs faster or more reliably.

Change-Id: If1060d3040c3e9f40c70b4c3c5c6357a4d5d93ef
Updates: #3469
Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
2022-05-04 14:35:06 +05:30
Xavi Hernandez
9fd7afcf86 gfapi-async-calls-tests.c: fix issues (#3474)
Two main fixes:

- Compilation warning because of a type mismatch.
- Use of stack allocated buffers for background operations that may
  complete outside of the function.

Change-Id: I41187578c48987d6b9e890d7bbb8f928efbab15c
Updates: #3469
Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
2022-05-04 11:57:00 +05:30
Xavi Hernandez
d009ce27bf tests: fix duplicated code (#3475)
There were some duplicated functions inside include.rc with identical
code. These have been removed.

There were also two functions with the same name but with different
implementations in include.rc and snapshot.rc. This worked because
some tests did include the files in different order depending on
what they really need.

To address this, the following changes have been applied:

- Moved the function from snapshot.rc to cluster.rc and renamed it.
- Moved the function from include.rc to volume.rc.
- Fixed include order from all scripts.
- Modified scripts that required the renamed function.

Change-Id: I82d220da4e8cd0148d123a49d96ebefbeb7a954c
Updates: #3469
Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
2022-05-04 11:44:34 +05:30
Xavi Hernandez
4e3f7f1f8a tests: remove redundant vers=3 option for nfs (#3473)
'vers=3' option is implicitly added by nfs_mount() function, so it's
redundant to pass it as an argument. Additionally, some versions of
mount scripts complain if the option is passed twice.

Change-Id: Ief7ac73441882403c9c0ce599dfc2bf45795d017
Updates: #3469
Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
2022-05-03 21:06:45 +05:30
Shree Vatsa N
a0ae8799a3 tests: Add O_PATH testcase using glfs_open() (#3402)
Signed-off-by: Shree Vatsa N <vatsa@kadalu.io>
2022-04-14 07:50:45 +05:30
Amar Tumballi
1bc2c76eba Simple quota: based on namespace (#1750)
* the quota setting can happen only on 'namespace' inode.
* once set, the accounting is maintained only at the namespace level for whole tree.
* uses 'simple-quota' key to show the correct quota usage in distributed volume.
* statfs()'s response would be used to set the volume level usage in xlator,
  in setxattr() call, when done through a special mount process.
* the xlator is designed to be on brick graph, and saves only the usage of data
  inside it.
* An option is provided to utilize the backend filesystem's quota feature for
  'accounting', which can improve the performance.
  - This PR expects backend quota to properly return `statfs()` (or `df`) output.
    If `features.simple-quota.use-backend` option is set, then there won't be any
    active accounting in simple-quota translator, but only `setxattr()` and
    `statfs()` is handled for special keys. We expect helper function to
    set 'quota-limit' and also 'namespace' xattr to aid the glusterfs process in
    general to identify the entries.

Updates: #1774
Change-Id: Id4229b720b57cde458b9b36e36ada3ffe2be0ac2
Signed-off-by: Amar Tumballi <amar@kadalu.io>
2022-03-31 11:17:41 +05:30
Yaniv Kaul
55420543c2 Multiple files: cleanup common include files (#3140)
As a continuation to #3130 , there are several .h files that are included in various files.
However, many of the definitions in those include files should be local to specific files, or are simply dead code.
We can easily clean them up.
Note - there are some functions that were deemed useful as common or utilities that were meant to be shared across the codebase.
After 10 or so years, I think it's OK to move some of them that were never shared across the code,
to their own users (and make them static while at it).

This commit specifically cleaned up glusterd-utils.c

    Updates: #3137
    Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
2022-02-28 08:10:54 +05:30
Tamar Shacked
150f0de48b Adapting glusterfs to be endian-independent code (#3062)
Some regression tests fail on IBM s390x machines; the main root cause seems to be
the different endianness: s390x is big-endian, x86 is little-endian.

For DHT, the failures are related to a different hash value, which leads to an unexpected location.

Work scope:
Detect the cases where an assignment returns a different value depending on the architecture and
convert from little-endian to host byte order (use this conversion as the tests are running fine on x86).

Note:
Dict serialize/deserialize code already implements network byte order (big-endian)
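
As a rough illustration of the little-endian to host conversion described in the work scope, the sketch below reads and writes a 32-bit value byte by byte so the result is the same on s390x and x86. The names `load_le32` and `store_le32` are hypothetical and are not taken from the glusterfs tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode a 32-bit value stored in little-endian
 * byte order, independent of the host's own endianness. */
uint32_t load_le32(const uint8_t *src)
{
    assert(src != NULL);
    return (uint32_t)src[0] |
           ((uint32_t)src[1] << 8) |
           ((uint32_t)src[2] << 16) |
           ((uint32_t)src[3] << 24);
}

/* Hypothetical helper: encode a host-order value as little-endian bytes. */
void store_le32(uint8_t *dst, uint32_t value)
{
    assert(dst != NULL);
    dst[0] = (uint8_t)(value & 0xff);
    dst[1] = (uint8_t)((value >> 8) & 0xff);
    dst[2] = (uint8_t)((value >> 16) & 0xff);
    dst[3] = (uint8_t)((value >> 24) & 0xff);
}
```

Because the shifts operate on values rather than on memory layout, a hash computed from bytes decoded this way would not depend on the architecture.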

Updates: #2491

Change-Id: I2900f0f363ab00af8c68900b8aa2e30c574120e5
Signed-off-by: Tamar Shacked <tshacked@redhat.com>
2022-01-24 10:56:40 +01:00
Aravinda Vishwanathapura
8a17e531c0 snapshots: Support for ZFS Snapshots (#2855)
* snapshots: Support for ZFS Snapshots

No separate mount is done for mounting the Snapshot bricks.
`.zfs` virtual directory is used as Snapshot brick path.
For example: If the brick path is `/zpool1/brick1` and Snapshot
name is `s1` then the Snapshot brick path will be
`/zpool1/.zfs/snapshot/s1/brick1`

Cloned bricks will have the path similar to original bricks.
`<zpool>/<clone-name>/<brick-dir>`.
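
The snapshot brick path mapping from the example above can be sketched as follows. This is an assumption-laden illustration: `zfs_snapshot_brick_path` is a hypothetical helper, and it expects the caller to supply the dataset mount point (here `/zpool1`) rather than discovering it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build "<mnt>/.zfs/snapshot/<snap>/<rest-of-brick>".
 * Returns 0 on success, -1 if the brick is not below the mount point
 * or the output buffer is too small. */
int zfs_snapshot_brick_path(const char *dataset_mnt, const char *brick_path,
                            const char *snap_name, char *out, size_t out_len)
{
    size_t mnt_len;
    int written;

    assert(dataset_mnt != NULL && brick_path != NULL);
    assert(snap_name != NULL && out != NULL);

    mnt_len = strlen(dataset_mnt);

    /* The brick must live strictly below the dataset mount point. */
    if (strncmp(brick_path, dataset_mnt, mnt_len) != 0 ||
        brick_path[mnt_len] != '/')
        return -1;

    /* brick_path + mnt_len keeps the leading '/' of the remainder. */
    written = snprintf(out, out_len, "%s/.zfs/snapshot/%s%s",
                       dataset_mnt, snap_name, brick_path + mnt_len);

    return (written < 0 || (size_t)written >= out_len) ? -1 : 0;
}
```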

Change-Id: I912618bdcbaad3f9a971ef9b1fa4c00ef81b1198
Sponsored-By: iXsystems, Inc <https://www.ixsystems.com>
Signed-off-by: Aravinda Vishwanathapura <aravinda@kadalu.io>

* snapshots: ZFS Snapshot integration tests
* snapshots: Change ZFS Pool to Dataset
* snapshots: Fix the ZFS Snapshot tests
2021-12-25 10:38:22 +05:30
Amar Tumballi
063720d1ee inode: implement namespace at inode level (#1763)
With this PR, specially on the brick side graph, inode table would be
properly set with 'namespace' inode reference. It is not guaranteed
with fuse/client side graph due to subdir mount.

To get 'namespace' for a corresponding inode, all one needs to do is,
check `ns_inode` pointer in inode structure.

Currently only special mounts with `PID < 0` can set the namespace
attribute, and in lookup, if namespace attribute is present, we set
the variable in inode.

By default, the ns_inode is set to 'root' inode when inode gets created.

Fixes: #1757
Change-Id: I69157e388538ea5d4b4e45d543575a04ee9ef221
Signed-off-by: Amar Tumballi <amar@kadalu.io>
2021-12-10 11:05:23 +01:00
Pranith Kumar Karampuri
2497f5260d tests: Test phase1 migration of dht operations (#2724)
Change-Id: I20e940c921a9990855ea5591212086ed64ed0ab2
2021-08-25 07:33:51 +05:30
Sheetal Pamecha
c3b960e2cd nl-cache: add test for symbolic link (#2657)
Updates: #1052

Change-Id: I8737d4cb7116c76ef4ec45dc4092166e46aa3a14
Signed-off-by: Sheetal Pamecha <spamecha@redhat.com>
2021-07-27 19:24:07 +05:30
Ravishankar N
36b37221af tests: fix yet another afr-lock-heal-basic.t spurious failure (#2438)
* tests: fix yet another afr-lock-heal-basic.t spurious failure

From the logs, it appears as if the lock info was not present in the
statedump when it was generated. Changed the logic to check for the lock
info in successive statedumps within PROCESS_UP_TIMEOUT.

Fixes: #2394
Change-Id: I5b071299d05a8c68b02735dfd8b510b0485dc9ce
Signed-off-by: Ravishankar N <ravishankar@redhat.com>

* remove sleep

Change-Id: I822446222d2fbf824c6eaf42f3c72808356071e3
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
2021-05-17 12:44:31 +05:30
Ravishankar N
fda14cf655 glusterd: handle custom xlator failure cases
Problem-1:
custom xlator insertion was failing for those xlators in the brick graph
whose dbg_key was NULL in the server_graph_table. Looking at the git log,
the dbg_key was added in commit d1397dbd7d
for inserting debug xlators.

Fix: I think it is fine to define it for all brick xlators below server.

Problem-2:
In the commit-op phase, glusterd_op_set_volume() updates the volinfo
dict with the key-value pairs and then proceeds to create the volfiles.
If any of the steps fail, the volinfo dict retains those key-values,
until glusterd is restarted or `gluster vol reset $VOLNAME` is issued.

Fix:
Make a copy of the volinfo dict and if there are any failures in
proceeding with the set volume logic, restore the dict to its original
state.
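
The copy-and-restore approach of the second fix can be sketched in isolation. This is not the glusterd code: `struct vol_opts`, `set_volume_opts`, and the two callbacks are hypothetical stand-ins for the volinfo dict, the set-volume logic, and volfile generation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the volinfo dict: two numeric options. */
struct vol_opts {
    int opt_a;
    int opt_b;
};

/* Stand-in for a later step (such as volfile creation) that can fail. */
typedef int (*regenerate_fn)(const struct vol_opts *opts);

int regen_ok(const struct vol_opts *opts)   { (void)opts; return 0; }
int regen_fail(const struct vol_opts *opts) { (void)opts; return -1; }

/* Apply the wanted options, then run the follow-up step. If that step
 * fails, restore the saved copy so no partial state is retained. */
int set_volume_opts(struct vol_opts *live, const struct vol_opts *wanted,
                    regenerate_fn regenerate)
{
    struct vol_opts saved;

    assert(live != NULL && wanted != NULL && regenerate != NULL);

    saved = *live;   /* copy taken before any modification */
    *live = *wanted;

    if (regenerate(live) != 0) {
        *live = saved;   /* roll back to the original state */
        return -1;
    }
    return 0;
}
```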

Change-Id: I9010dab33d0139b8e6d603308e331b6d220a4849
Updates: #2370
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
2021-05-05 16:16:58 +02:00
Pranith Kumar Karampuri
91206dd3cf Revert "fuse - remove unnecessary code block" (#2385)
As per the discussion on #2353 it is decided to revert this patch to fix
the acl regression issue.
This reverts commit 07bb13291f.

fixes: #2353
Change-Id: Ie47479d9f894bac9c5d8a83b05c42e1ee98230dc
Signed-off-by: Pranith Kumar K <pranith.karampuri@phonepe.com>
2021-05-05 09:59:35 +05:30
Jamie Nguyen
7a9f7cb3f6 tests: Fix incorrect variables in throttle-rebal.t (#2316)
The final test doesn't test what it is meant to test. It still fails as
expected, but only because at this point `THROTTLE_LEVEL` is still set
to `garbage`.

Easily fixed by correcting the typos in the variable names, and thus
fixes https://github.com/gluster/glusterfs/issues/2315

Signed-off-by: Jamie Nguyen <j@jamielinux.com>
2021-04-08 07:53:44 +05:30
Pranith Kumar Karampuri
088d8a575c cluster/dht: Provide option to disable fsync in data migration (#2259)
At the moment dht rebalance doesn't give any option to disable fsync
after data migration. Making this an option would let admins take
responsibility for the data in a way that is suitable for their cluster.
Default value is still 'on', so that the behavior is intact for people
who don't care about this.

For example: If the data that is going to be migrated is already backed
up or snapshotted, there is no need for fsync to happen right after
migration which can affect active I/O on the volume from applications.

fixes: #2258
Change-Id: I7a50b8d3a2f270d79920ef306ceb6ba6451150c4
Signed-off-by: Pranith Kumar K <pranith.karampuri@phonepe.com>
2021-03-17 11:02:21 +05:30
Pranith Kumar Karampuri
46949c4951 features/index: Optimize link-count fetching code path (#1789)
* features/index: Optimize link-count fetching code path

Problem:
AFR requests 'link-count' in lookup to check if there are any pending
heals. Based on this information, afr will set dirent->inode to NULL in
readdirp when heals are ongoing to prevent serving bad data. When heals
are completed, link-count xattr is leading to doing an opendir of
xattrop directory and then reading the contents to figure out that there
is no healing needed for every lookup. This was not detected until this
github issue because ZFS in some cases can lead to very slow readdir()
calls. Since Glusterfs does a lot of lookups, this was slowing down
all operations increasing load on the system.

Code problem:
index xlator on any xattrop operation adds index to the relevant dirs
and after the xattrop operation is done, will delete/keep the index in
that directory based on the value fetched in xattrop from posix. AFR
sends all-zero xattrop for changelog xattrs. This is leading to
priv->pending_count manipulation which sets the count back to -1. Next
Lookup operation triggers opendir/readdir to find the actual link-count in
lookup because in memory priv->pending_count is -ve.

Fix:
1) Don't add to index on all-zero xattrop for a key.
2) Set pending-count to -1 when the first gfid is added into xattrop
   directory, so that the next lookup can compute the link-count.

fixes: #1764
Change-Id: I8a02c7e811a72c46d78ddb2d9d4fdc2222a444e9
Signed-off-by: Pranith Kumar K <pranith.karampuri@phonepe.com>

* addressed comments

Change-Id: Ide42bb1c1237b525d168bf1a9b82eb1bdc3bc283
Signed-off-by: Pranith Kumar K <pranith.karampuri@phonepe.com>

* tests: Handle base index absence

Change-Id: I3cf11a8644ccf23e01537228766f864b63c49556
Signed-off-by: Pranith Kumar K <pranith.karampuri@phonepe.com>

* Addressed LOCK based comments, .t comments

Change-Id: I5f53e40820cade3a44259c1ac1a7f3c5f2f0f310
Signed-off-by: Pranith Kumar K <pranith.karampuri@phonepe.com>
2021-03-10 10:43:24 +05:30
Ravishankar N
f41f7dec05 cli: syntax check for arbiter volume creation (#2207)
commit 8e7bfd6a58 changed the syntax for
arbiter volume creation to 'replica 2 arbiter 1', while still allowing
the old syntax of 'replica 3 arbiter 1'. But while doing so, it also
removed a conditional check, thereby allowing replica count > 3. This
patch fixes it.

Fixes: #2192
Change-Id: Ie109325adb6d78e287e658fd5f59c26ad002e2d3
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
2021-03-05 11:54:46 +05:30
mohit84
d858f3bc13 tests: Move tests/basic/glusterd-restart-shd-mux.t to flaky (#2191)
The test case ( tests/basic/glusterd-restart-shd-mux.t ) was
introduced as a part of shd mux feature but we observed the
feature is not stable and we already plan to revert the feature.
For the time being I am moving the test case to flaky to
avoid a frequent regression failure.

Fixes: #2190
Change-Id: I4a06a5d9212fb952a864d0f26db8323690978bfc
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
2021-02-24 12:28:10 +05:30
Pranith Kumar Karampuri
2170ac242f Remove tests from components that are no longer in the tree (#2160)
fixes: #2159
Change-Id: Ibaaebc48b803ca6ad4335c11818c0c71a13e9f07
Signed-off-by: Pranith Kumar K <pranith.karampuri@phonepe.com>
2021-02-13 17:44:49 +05:30
Ryo Furuhashi
ea86b664f3 glusterd-volgen: Add functionality to accept any custom xlator (#1974)
* glusterd-volgen: Add functionality to accept any custom xlator

Add a new function which allows users to insert any custom xlator.
This provides a way to add custom processing to file operations.

Users can deploy the plugin (xlator shared object) and integrate it into glusterfsd.

To enable a custom xlator, do the following:

1. put the xlator object (.so file) into "XLATOR_DIR/user/"
2. set the option user.xlator.<xlator> to an existing xlator name to specify its position in the graph
3. restart the gluster volume

Options for a custom xlator can be set via "user.xlator.<xlator>.<optkey>".

Fixes: #1943
Signed-off-by: Ryo Furuhashi <ryo.furuhashi.nh@hitachi.com>
Co-authored-by: Yaniv Kaul <ykaul@redhat.com>
Co-authored-by: Xavi Hernandez <xhernandez@users.noreply.github.com>
2021-02-05 09:26:03 +05:30
Ravishankar N
6c446ee57c tests: remove offensive language
TODO:
Remove 'slave-timeout' and 'slave-gluster-command-dir'.
These variables are defined in geo-replication/gsyncd.conf.in.
So I will remove them when I change that folder.

Change-Id: Ib9167ca586d83e01f8ec755cdf58b3438184c9dd
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
2020-12-30 15:55:22 +05:30
Ravishankar N
b9a4120b2b all: change 'primary' to 'root' where it makes sense
As a part of offensive language removal, we changed 'master' to 'primary' in
some parts of the code that are *not* related to geo-replication via
commits e4c9a14429 and
0fd9246533.

But it is better to use 'root' in some places to distinguish it from the
geo-rep changes which use 'primary/secondary' instead of 'master/slave'.

This patch mainly changes glusterfs_ctx_t->primary to
glusterfs_ctx_t->root. Other places like the meta xlator are also changed.
gf-changelog.c is not changed since it is related to geo-rep.

Updates: #1000
Change-Id: I3cd610f7bea06c7a28ae2c0104f34291023d1daf
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
2020-12-02 13:24:13 +01:00
Ravishankar N
24fbfad8f6 glusterd: fix bug in enabling granular-entry-heal (#1752)
commit f5e1eb87d4 meant to enable the
volume option only for replica volumes but inadvertently enabled
it for all volume types. Fixing it now.

Also found a bug in glusterd where disabling the option on plain
distribute was succeeding even though setting it in the first place
fails. Fixed that too.

Fixes: #1483
Change-Id: Icb6c169a8eec44cc4fb4dd636405d3b3485e91b4
Reported-by: Sheetal Pamecha <spamecha@redhat.com>
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
2020-11-05 18:34:39 +05:30
Ravishankar N
e4c9a14429 xlators: misc conscious language changes (#1715)
core:change xlator_t->ctx->master to xlator_t->ctx->primary
afr: just changed comments.
meta: change .meta/master to .meta/primary. Might break scripts.
changelog: variable/function name changes only.

These are unrelated to geo-rep.
Fixes: #1713

Change-Id: I58eb5fcd75d65fc8269633acc41313503dccf5ff
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
2020-11-02 18:03:01 +05:30
Pranith Kumar Karampuri
8b9c2e1cb6 extras/rebalance: Script to perform directory rebalance (#1676)
* extras/rebalance: Script to perform directory rebalance

How should the script be executed?
$ /path/to/directory-rebalance.py <dir-to-rebalance>
will do rebalance just for that directory. The script assumes that fix-layout
operation is completed for all the directories present inside the
<dir-to-rebalance>

How does it work?
For the given directory path that needs to be rebalanced, full crawl is
performed and the files that need to be healed and the size of each file
is first written to the index. Once building the index is completed, the
index is read and for each file the script executes equivalent of
setfattr -n trusted.distribute.migrate-data -v 1 <path/to/file>
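
The per-file step can be sketched with the Linux `setxattr(2)` call, which is what the `setfattr` command above amounts to. This is a minimal illustration, not the script's implementation (the script is Python); `request_file_migration` is a hypothetical name, and the call is only meaningful on a path inside a GlusterFS mount.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/xattr.h>

/* Extended attribute named in the commit message. */
#define MIGRATE_DATA_XATTR "trusted.distribute.migrate-data"

/* Hypothetical helper: ask dht to migrate one file by setting the
 * xattr to "1", mirroring `setfattr -n ... -v 1 <path/to/file>`.
 * Returns 0 on success and -1 on failure (errno is set by setxattr). */
int request_file_migration(const char *path)
{
    static const char value[] = "1";

    assert(path != NULL);

    /* strlen() excludes the terminating NUL, as setfattr -v does. */
    return setxattr(path, MIGRATE_DATA_XATTR, value, strlen(value), 0);
}
```

Setting attributes in the `trusted.` namespace normally requires elevated privileges, so an unprivileged caller should expect the call to fail.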

Why does the script take two passes?
Printing a sensible ETA has been a primary goal of the script. Without
knowing the approximate size that will be rebalanced, it is difficult to
find ETA. Hence the script does one pass to find files, sizes which it
writes to the index file and then the next pass is done on the
index file. It takes a minute or two for the ETA to converge but in our
testing it has been giving a reasonable ETA

What versions does the script support?
For the script to work correctly, dht should handle
"trusted.distribute.migrate-data" setxattr correctly.

fixes: #1654
Change-Id: Ie5070127bd45f1a1b9cd18ed029e364420c971c1
Signed-off-by: Pranith Kumar K <pranith.karampuri@phonepe.com>
2020-10-30 11:59:04 +05:30
Pranith Kumar Karampuri
02074cfe3b cluster/dht: Perform migrate-file with lk-owner (#1581)
* cluster/dht: Perform migrate-file with lk-owner

1) Added GF_ASSERT() calls in client-xlator to find these
issues sooner.
2) Fuse is setting zero-lkowner with len as 8 when the fop
doesn't have any lk-owner. Changed this to have len as 0
just as we have in fops triggered from xlators lower to
fuse.

* syncop: Avoid frame allocation if we can
* cluster/dht: Set lkowner in daemon rebalance code path
* cluster/afr: Set lkowner for ta-selfheal
* cluster/ec: Destroy frame after heal is done
* Don't assert for lk-owner in lk call
* set lkowner for mandatory lock heal tests

fixes: #1529
Change-Id: Ia803db6b00869316893abb1cf435b898eec31228
Signed-off-by: Pranith Kumar K <pranith.karampuri@phonepe.com>
2020-10-29 09:52:20 +05:30
Dmitry Antipov
e2c609e4c5 tests: exclude more contrib/fuse-lib objects (#1694)
Exclude more contrib/fuse-lib objects to avoid
silly tests/basic/0symbol-check.t breakage.

Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Fixes: #1692
2020-10-27 19:44:56 +05:30
Ravishankar N
f5e1eb87d4 glusterd/afr: enable granular-entry-heal by default (#1621)
1. The option has been enabled and tested for quite some time now in RHHI-V
downstream and I think it is safe to make it 'on' by default. Since it
is not possible to simply change it from 'off' to 'on' without breaking
rolling upgrades, old clients etc., I have made it default only for new volumes
starting from op-version GD_OP_VERSION_9_0.

Note: If you do a volume reset, the option will be turned back off.
This is okay as the dir's gfid will be captured in the 'xattrop' folder and heals
will proceed. There might be stale entries inside the 'entry-changes' folder,
which will be removed when we enable the option again.

2. I encountered a customer issue where entry heal was pending on a directory with
236436 files in it and the glustershd.log output was just stuck at
"performing entry selfheal", so I have added logs to give us
more info in DEBUG level about whether entry heal and data heal are
progressing (metadata heal doesn't take much time). That way, we have a
quick visual indication to say things are not 'stuck' if we briefly
enable debug logs, instead of taking statedumps or checking profile info
etc.

Fixes: #1483
Change-Id: I4f116f8c92f8cd33f209b758ff14f3c7e1981422
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
2020-10-22 15:06:41 +05:30
Pranith Kumar K
61d6b18c82 mount/fuse: Fix graph-switch when reader-thread-count is set
Problem:
The current graph-switch code sets priv->handle_graph_switch to false even
when graph-switch is in progress which leads to crashes in some cases

Fix:
priv->handle_graph_switch should be set to false only when graph-switch
completes.

fixes: #1539
Change-Id: I5b04f7220a0a6e65c5f5afa3e28d1afe9efcdc31
Signed-off-by: Pranith Kumar K <pranith.karampuri@phonepe.com>
2020-10-05 13:15:07 +05:30
Pranith Kumar K
2d58ec8581 cluster/afr: Heal directory rename without rmdir/mkdir
Problem1:
When a directory is renamed while a brick
is down entry-heal always did an rm -rf on that directory on
the sink on old location and did mkdir and created the directory
hierarchy again in the new location. This is inefficient.

Problem2:
Renamedir heal order may lead to a scenario where directory in
the new location could be created before deleting it from old
location leading to 2 directories with same gfid in posix.

Fix:
As part of heal, if oldlocation is healed first and is not present in
source-brick always rename it into a hidden directory inside the
sink-brick so that when heal is triggered in new-location shd can
rename it from this hidden directory to the new-location.

If new-location heal is triggered first and it detects that the
directory already exists in the brick, then it should skip healing the
directory until it appears in the hidden directory.

Credits: Ravi for rename-data-loss.t script

Fixes: #1211
Change-Id: I0cba2006f35cd03d314d18211ce0bd530e254843
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
2020-04-13 19:31:51 +05:30
Môshe van der Sterre
8f0dc56556 gfapi: Move the SECURE_ACCESS_FILE check out of glfs_mgmt_init
glfs_mgmt_init is only called for glfs_set_volfile_server, but
secure_mgmt is also required to use glfs_set_volfile with SSL.

fixes: #829
Change-Id: Ibc769fe634d805e085232f85ce6e1c48bf4acc66
2020-09-28 06:12:32 +02:00
Sheena Artrip
868541b36f metadisp: new translator for data and metadata separation
Summary:

feature/metadisp is an xlator for performing "metadata dispersal" across
multiple children. it does this by flattening the complex
POSIX paths into /$GFID style paths, then forwarding the
metadata operations to its first child and forwarding the
data operations to its second child.

The purpose of this xlator is to allow separation of data and metadata,
in cases where metadata might be stored in another format (embedded kv?),
on another disk (ssd), on another host (dht2).

Change-Id: I392c8bd0c867a3237d144aea327323f700a2728d
Updates: #816
Signed-Off-By: Sheena Artrip <sheenobu@fb.com>
Tested-By: Amar Tumballi <amar@kadalu.io>
2020-01-29 15:12:17 -08:00
Amar Tumballi
5731d25c9f tests: provide an option to mark tests as 'flaky'
* also add some time gap in other tests to see if we get things properly
* create a directory 'tests/000/', which can host any tests, which are flaky.
* move all the tests mentioned in the issue to above directory.
* as the above dir gets tested first, all flaky tests would be reported quickly.
* change `run-tests.sh` to continue tests even if flaky tests fail.

Reference: gluster/project-infrastructure#72
Updates: #1000
Change-Id: Ifdafa38d083ebd80f7ae3cbbc9aa3b68b6d21d0e
Signed-off-by: Amar Tumballi <amar@kadalu.io>
2020-08-18 14:08:20 +05:30
Mohammed Rafi KC
268d6dcdbd afr/split-brain: fix client side split-brain resolution when quorum is enabled
Problem:
If we set favourite child policy, then automatic split-brain resolution
should work in all cases. This was failing when quorum count was set to
a non-zero value. The initial lookup before the read txn was failing
with ENOTCONN. Since we don't have a readable subvol, we were failing it.
We were only looking to the split brain resolution choice set through the
cli command.

Fix:
We will now consider the favourite child policy if split-brain choice
has not been set via cli command.

Change-Id: Id2016c3a90d0763ac6f1a0131571053f595576f0
Fixes: #1404
Signed-off-by: Mohammed Rafi KC <rafi.kavungal@iternity.com>
2020-07-29 13:42:53 +05:30
Pranith Kumar K
bd540db1e7 cluster/afr: Delay post-op for fsync
Problem:
AFR doesn't delay post-op for fsync fop. For fsync heavy workloads
this leads to un-necessary fxattrop/finodelk for every fsync leading
to bad performance.

Fix:
Have delayed post-op for fsync. Add special flag in xdata to indicate
that afr shouldn't delay post-op in cases where either the
process will terminate or graph-switch would happen. Otherwise it leads
to un-necessary heals when the graph-switch/process-termination
happens before delayed-post-op completes.

Fixes: #1253
Change-Id: I531940d13269a111c49e0510d49514dc169f4577
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
2020-05-29 14:24:53 +05:30
Xavi Hernandez
db95388706 open-behind: rewrite of internal logic
There was a critical flaw in the previous implementation of open-behind.

When an open is done in the background, it's necessary to take a
reference on the fd_t object because once we "fake" the open answer,
the fd could be destroyed. However as long as there's a reference,
the release function won't be called. So, if the application closes
the file descriptor without having actually opened it, there will
always remain at least 1 reference, causing a leak.

To avoid this problem, the previous implementation didn't take a
reference on the fd_t, so there were races where the fd could be
destroyed while it was still in use.

To fix this, I've implemented a new xlator cbk that gets called from
fuse when the application closes a file descriptor.

The whole logic of handling background opens has been simplified and
it's more efficient now. Only if the fop needs to be delayed until an
open completes, a stub is created. Otherwise no memory allocations are
needed.

Correctly handling the close request while the open is still pending
has added a bit of complexity, but overall normal operation is simpler.

Change-Id: I6376a5491368e0e1c283cc452849032636261592
Fixes: #1225
Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
2020-05-12 23:54:54 +02:00