glusterfs

mirror of https://github.com/gluster/glusterfs.git synced 2026-02-06 09:48:44 +01:00

Author	SHA1	Message	Date
Yaniv Kaul	bff9b26d8c	memory accounting - reduce code in non DEBUG build (#3854 ) - gf_mem_update_acct_info() is not needed when not in DEBUG mode - re-order variables in the structure according to access pattern - Turn xlator_mem_acct_unref() into xlator_mem_acct_destroy() and call it only when refcnt equals 0 - which is quite rare. Updates: #3855 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>	2022-10-18 10:19:48 +02:00
Yaniv Kaul	6b69f4802d	afr-no-fsync.t: disable cluster.lookup-optimize and skip initial FSYNCDIR (#3709 ) Per #2253 we need to disable lookup-optimize feature. More importantly, there are some FSYNCDIR (4 of them) that appear for some reason too soon. Re-order the profile start - give 5 seconds for those to be done with and then start profiling. Fixes: #3708 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>	2022-09-21 11:46:46 +02:00
Yaniv Kaul	364230f587	refactor logging.c - remove 'THIS' where possible, use 'log' instead of 'ctx' where possible (#3451 ) * logging: reduce no. of calls to 'THIS', use 'ctx' or 'this' where possible In many places, instead of calling 'THIS' we can either pass it or the context directly. Changed functions and callers where it made sense. Updates: #3426 Signed-off-by: Yaniv Kaul <ykaul@redhat.com> * gf_time_fmt_tv_FT() - pass log pointer instead of ctx Preparation for using log pointer across logging.c Updates: #1000 Signed-off-by: Yaniv Kaul <ykaul@redhat.com> * logging.c: refactor ctx->log and use log where possible In many functions, we pass ctx and only use ctx->log. Refactor accordingly. Updates: #3426 Signed-off-by: Yaniv Kaul <ykaul@redhat.com> * logging.c: appstr could be a pointer, not a pointer to a pointer Unsure why, but there was an additional indirection which did not seem to be really used anywhere. Updates: #3426 Signed-off-by: Yaniv Kaul <ykaul@redhat.com> * logging.c: pass 'buf' parameter instead of all its variables to repetitions() functions. Seems easier to read. Updates: #3426 Signed-off-by: Yaniv Kaul <ykaul@redhat.com> Remove unused variable in _gf_msg_internal() * logging.h: remove padding from structure Updates: #3426 Signed-off-by: Yaniv Kaul <ykaul@redhat.com> * Simplify glusterd_check_log_level() function Signed-off-by: Yaniv Kaul <ykaul@redhat.com>	2022-09-19 10:54:07 +02:00
Yaniv Kaul	27ade9815f	cdc xlator improvements (#3802 ) - Minor code movement, removal of unused variables. - The default for min_file_size should not be 0 - we really should not bother with compressing < 1K files (assuming the idea is to save packets travel, not bandwidth overall) - The default compression level is too high - zlib isn't that great anyway (performance-wise), so we should use the fastest possible - deflateInit2() is called repeatedly (within do_cdc_compress() function) instead of once (in cdc_compress() function) - The fixed size (GF_CDC_DEF_BUFFERSIZE) is too large - if FUSE can only do up to 128K, unsure what's the point in allocating up to 256K iobufs Updates: #3797 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>	2022-09-19 10:34:42 +02:00
mohit84	ab255a8e8d	dht[WIP]: Implement seek fop at dht level (#3792 ) * dht: Implement seek fop at dht level Before kernel minor version (.24) fuse does not wind a seek fop but after that fuse winds a seek fop so implement the fop at dht level. Fixes: #3373 Change-Id: Ie9ef2f941099157996ab353fc4dc208a28fa8fc6 Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>	2022-09-13 17:02:45 +05:30
Shree Vatsa N	b09d55178f	gfapi: Add support for 'AT_EMPTY_PATH' flag (#3707 ) Add support for AT_EMPTY_PATH flag for the following fops, - glfs_fstatat, glfs_linkat, glfs_fchownat. Acc. to man pages, If pathname is an empty string, operate on the file referred to by dirfd, (which may have been obtained using the open(2) O_PATH flag). Updates: #2717 Sponsored-By: iXsystems, Inc https://www.ixsystems.com/ Signed-off-by: Shree Vatsa N <vatsa@kadalu.tech>	2022-09-07 11:46:40 +05:30
mohit84	beaec1e478	tests: Increase timeout for sparse_file_rebalance.t test case (#3779 ) The test (./tests/basic/distribute/sparse_file_rebalance.t ) is not finished within default time(200s), It is taking time while a test is calling seek at 2M offset and trying to copy sparse file. Change-Id: Id174f4a9d66d1caaf69f495c3cf62a2e09e87b80 Fixes: #3778 Solution: To pass regression jobs increase the timeout to 300s Signed-off-by: Mohit Agrawal <moagrawa@redhat.com> Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>	2022-09-05 15:05:56 +05:30
Shwetha Acharya	5a507cb957	tests: replace localhost and 127.0.0.1 with valid ip (#2945 ) * tests: replace localhost and 127.0.0.1 with valid ip Problem: From Fedora 34 it is mandatory to use valid ip instead of localhost, 127.0.0.1 or loopback address (0.0.0.0 to 0.255.255.255). Solution: use $ip -o -4 addr and then filter out the valid ip (ipv6 addresses to be used once gluster is entirely ipv6 compatible) Fixes: #2944 Change-Id: I282a0a519c650c6848ffa668ade86a4f35f9de42 Signed-off-by: Shwetha K Acharya <sacharya@redhat.com> * ec/quota.t: correct the syntax of create command Change-Id: Ie26efe48664737c8dea108e1708d9a66af74ef94 Signed-off-by: Shwetha K Acharya <sacharya@redhat.com> * revert disperse no in quota.t and replace HOSTNAME by H0 in user-xlator.t Change-Id: I3eca83aa1272beb9fc75e0d1d22478588e9f1038 Signed-off-by: Shwetha K Acharya <sacharya@redhat.com> * bug-824753.t: grep for hostname instead of ip gluster volume clear-locks returns output with the hostname, hence the new code change with respect to the ip does not hold good here Adding `hostname` in place of ip Fixes: #2944 Signed-off-by: Shwetha K Acharya <sacharya@redhat.com> * bug-765380.t: replace $H0 by `hostname` count_hostname_or_uuid_from_pathinfo() returns hostname, so its output should be grepped for hostname not ip. Signed-off-by: Shwetha K Acharya <sacharya@redhat.com> * bug-921072.t: update ip in nfs.rpc-auth-allow and nfs.rpc-auth-reject Signed-off-by: Shwetha K Acharya <sacharya@redhat.com> * bug-921072.t: update nfs.rpc-auth-allow and reject Signed-off-by: Shwetha K Acharya <sacharya@redhat.com>	2022-09-01 15:24:05 +02:00
Shree Vatsa N	8a41e0eb1c	gfapi: Add support for 'AT_REMOVEDIR' flag (#3703 ) Signed-off-by: Shree Vatsa N <vatsa@kadalu.tech>	2022-08-16 10:04:51 +05:30
Shree Vatsa N	c27bdbeb10	gfapi: Implementation of glfs_at syscalls (#3634 ) gfapi: Implement glfs_faccessat * gfapi: Implement glfs_fchmodat * gfapi: Implement glfs_fchownat * gfapi: Implement glfs_linkat * gfapi: Implement glfs_mknodat * gfapi: Implement glfs_readlinkat * gfapi: Implement glfs_renameat * gfapi: Implement glfs_renameat2 * gfapi: Implement glfs_symlinkat * gfapi: Implement glfs_unlinkat * gfapi: Implement glfs_mkdirat * glfs-openat: Use the common methods * Add ENOENT & EEXIST checks Updates: #2717 Sponsored-By: iXsystems, Inc <https://www.ixsystems.com> Signed-off-by: Shree Vatsa N <vatsa@kadalu.tech>	2022-07-14 12:19:36 +05:30
Aravinda Vishwanathapura	a005a54a93	gfapi: glfs_fstatat implementation (#3542 ) Example: ```c struct stat stbuf = { 0, }; fd1 = glfs_open(fs, "/", O_PATH); glfs_fstatat(fd1, filename, &stbuf, 0); printf("Size: %lld\n", (long long)stbuf.st_size); ``` Change-Id: I6d4c95ce91191566d2b9a198d5ba382bbad22264 Updates: #2717 Sponsored-By: iXsystems, Inc <https://www.ixsystems.com> Signed-off-by: Aravinda Vishwanathapura <aravinda@kadalu.tech>	2022-06-30 16:48:15 +05:30
Aravinda Vishwanathapura	f64f400c8d	gfapi: Implement glfs_openat() Example: ```c const char *buff = "Hello World!"; fd1 = glfs_open(fs, "/", O_PATH); fd2 = glfs_openat(fd1, filename, O_RDWR, 0); glfs_write(fd2, buff, strlen(buff), flags); ``` Updates: #2717 Sponsored-By: iXsystems, Inc <https://www.ixsystems.com> Change-Id: I4f8ad20616587f2d6e5feba4869abec96706a514 Signed-off-by: Aravinda Vishwanathapura <aravinda@kadalu.tech>	2022-06-27 11:12:34 +02:00
Xavi Hernandez	2ba1fda132	tests: improve performance and stability (#3486 ) Some improvements that make tests runs faster or more reliably. Change-Id: If1060d3040c3e9f40c70b4c3c5c6357a4d5d93ef Updates: #3469 Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>	2022-05-04 14:35:06 +05:30
Xavi Hernandez	9fd7afcf86	gfapi-async-calls-tests.c: fix issues (#3474 ) Two main fixes: - Compilation warning because of a type mismatch. - Use of stack allocated buffers for background operations that may complete outside of the function. Change-Id: I41187578c48987d6b9e890d7bbb8f928efbab15c Updates: #3469 Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>	2022-05-04 11:57:00 +05:30
Xavi Hernandez	d009ce27bf	tests: fix duplicated code (#3475 ) There were some duplicated functions inside include.rc with identical code. These have been removed. There were also two functions with the same name but with different implementations in include.rc and snapshot.rc. This worked because some tests did include the files in different order depending on what they really need. To address this, the following changes have been applied: - Moved the function from snapshot.rc to cluster.rc and renamed it. - Moved the function from include.rc to volume.rc. - Fixed include order from all scripts. - Modified scripts that required the renamed function. Change-Id: I82d220da4e8cd0148d123a49d96ebefbeb7a954c Updates: #3469 Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>	2022-05-04 11:44:34 +05:30
Xavi Hernandez	4e3f7f1f8a	tests: remove redundant vers=3 option for nfs (#3473 ) 'vers=3' option is implicitly added by nfs_mount() function, so it's redundant to pass it as an argument. Additionally, some versions of mount scripts complain if the option is passed twice. Change-Id: Ief7ac73441882403c9c0ce599dfc2bf45795d017 Updates: #3469 Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>	2022-05-03 21:06:45 +05:30
Shree Vatsa N	a0ae8799a3	tests: Add O_PATH testcase using glfs_open() (#3402 ) Signed-off-by: Shree Vatsa N <vatsa@kadalu.io>	2022-04-14 07:50:45 +05:30
Amar Tumballi	1bc2c76eba	Simple quota: based on namespace (#1750 ) * the quota setting can happen only on 'namespace' inode. * once set, the accounting is maintained only at the namespace level for whole tree. * uses 'simple-quota' key to show the correct quota usage in distributed volume. * statfs()'s response would be used to set the the volume level usage in xlator, in setxattr() call, when done through a special mount process. * the xlator is designed to be on brick graph, and saves only the usage of data inside it. * An option is provided to utilize the backend filesystem's quota feature for 'accounting', which can improve the performance. - This PR expects backend quota to properly return `statfs()` (or `df`) output. If `features.simple-quota.use-backend` option is set, then there wont be any active accounting in simple-quota translator, but only `setxattr()` and `statfs()` is handled for special keys. We expect helper function to set 'quota-limit' and also 'namespace' xattr to aid the glusterfs process in general to identify the entries. Updates: #1774 Change-Id: Id4229b720b57cde458b9b36e36ada3ffe2be0ac2 Signed-off-by: Amar Tumballi <amar@kadalu.io>	2022-03-31 11:17:41 +05:30
Yaniv Kaul	55420543c2	Multiple files: cleanup common include files (#3140 ) As a continuation to #3130 , there are several .h files that are included in various files. However, many of the definitions in those include files should be local to specific files, or is simply dead code. We can easily clean them up. Note - there are some functions that were deemed useful as common or utilities that were meant to be shared across the codebase. After 10 or so years, I think it's OK to move some of them that were never shared across the code, to their own users (and make them static while at it). This commit specifically cleaned up glusterd-utils.c Updates: #3137 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>	2022-02-28 08:10:54 +05:30
Tamar Shacked	150f0de48b	Adapting glusterfs to be endian-independent code (#3062 ) Some regression tests fail on IBM 390x machine, the main RC seems to be the different endianness: s390x is big-endian, x86 is little-endian. For DHT, the failures are related to a different hash-val, which leads to an unexpected location. Work scope: Detect the cases where an assignment yields a different value depending on the architecture and convert from little-endian to host byte order (this conversion is used because the tests run fine on x86). Note: Dict serialize/deserialize code already implements network byte order (big-endian) Updates: #2491 Change-Id: I2900f0f363ab00af8c68900b8aa2e30c574120e5 Signed-off-by: Tamar Shacked <tshacked@redhat.com>	2022-01-24 10:56:40 +01:00
Aravinda Vishwanathapura	8a17e531c0	snapshots: Support for ZFS Snapshots (#2855 ) * snapshots: Support for ZFS Snapshots No separate mount is done for mounting the Snapshot bricks. `.zfs` virtual directory is used as Snapshot brick path. For example: If the brick path is `/zpool1/brick1` and Snapshot name is `s1` then the Snapshot brick path will be `/zpool1/.zfs/snapshot/s1/brick1` Cloned bricks will have the path similar to original bricks. `<zpool>/<clone-name>/<brick-dir>`. Change-Id: I912618bdcbaad3f9a971ef9b1fa4c00ef81b1198 Sponsored-By: iXsystems, Inc <https://www.ixsystems.com> Signed-off-by: Aravinda Vishwanathapura <aravinda@kadalu.io> * snapshots: ZFS Snapshot integration tests * snapshots: Change ZFS Pool to Dataset * snapshots: Fix the ZFS Snapshot tests	2021-12-25 10:38:22 +05:30
Amar Tumballi	063720d1ee	inode: implement namespace at inode level (#1763 ) With this PR, specially on the brick side graph, inode table would be properly set with 'namespace' inode reference. It is not guaranteed with fuse/client side graph due to subdir mount. To get 'namespace' for a corresponding inode, all one needs to do is, check `ns_inode` pointer in inode structure. Currently only special mounts with `PID < 0` can set the namespace attribute, and in lookup, if namespace attribute is present, we set the variable in inode. By default, the ns_inode is set to 'root' inode when inode gets created. Fixes: #1757 Change-Id: I69157e388538ea5d4b4e45d543575a04ee9ef221 Signed-off-by: Amar Tumballi <amar@kadalu.io>	2021-12-10 11:05:23 +01:00
Pranith Kumar Karampuri	2497f5260d	tests: Test phase1 migration of dht operations (#2724 ) Change-Id: I20e940c921a9990855ea5591212086ed64ed0ab2	2021-08-25 07:33:51 +05:30
Sheetal Pamecha	c3b960e2cd	nl-cache: add test for symbolic link (#2657 ) Updates: #1052 Change-Id: I8737d4cb7116c76ef4ec45dc4092166e46aa3a14 Signed-off-by: Sheetal Pamecha <spamecha@redhat.com>	2021-07-27 19:24:07 +05:30
Ravishankar N	36b37221af	tests: fix yet another afr-lock-heal-basic.t spurious failure (#2438 ) * tests: fix yet another afr-lock-heal-basic.t spurious failure From the logs, it appears as if the lock info was not present in the statedump when it was generated. Changed the logic to check for the lock info in successive statedumps within PROCESS_UP_TIMEOUT. Fixes: #2394 Change-Id: I5b071299d05a8c68b02735dfd8b510b0485dc9ce Signed-off-by: Ravishankar N <ravishankar@redhat.com> * remove sleep Change-Id: I822446222d2fbf824c6eaf42f3c72808356071e3 Signed-off-by: Ravishankar N <ravishankar@redhat.com>	2021-05-17 12:44:31 +05:30
Ravishankar N	fda14cf655	glusterd: handle custom xlator failure cases Problem-1: custom xlator insertion was failing for those xlators in the brick graph whose dbg_key was NULL in the server_graph_table. Looking at the git log, the dbg_key was added in commit `d1397dbd7d` for inserting debug xlators. Fix: I think it is fine to define it for all brick xlators below server. Problem-2: In the commit-op phase, glusterd_op_set_volume() updates the volinfo dict with the key-value pairs and then proceeds to create the volfiles. If any of the steps fail, the volinfo dict retains those key-values, until glusterd is restarted or `gluster vol reset $VOLNAME` is issued. Fix: Make a copy of the volinfo dict and if there are any failures in proceeding with the set volume logic, restore the dict to its original state. Change-Id: I9010dab33d0139b8e6d603308e331b6d220a4849 Updates: #2370 Signed-off-by: Ravishankar N <ravishankar@redhat.com>	2021-05-05 16:16:58 +02:00
Pranith Kumar Karampuri	91206dd3cf	Revert "fuse - remove unnecessary code block" (#2385 ) As per the discussion on #2353 it is decided to revert this patch to fix the acl regression issue. This reverts commit `07bb13291f`. fixes: #2353 Change-Id: Ie47479d9f894bac9c5d8a83b05c42e1ee98230dc Signed-off-by: Pranith Kumar K <pranith.karampuri@phonepe.com>	2021-05-05 09:59:35 +05:30
Jamie Nguyen	7a9f7cb3f6	tests: Fix incorrect variables in throttle-rebal.t (#2316 ) The final test doesn't test what it means to test. It still fails as expected, but only because at this point `THROTTLE_LEVEL` is still set to `garbage`. Easily fixed by correcting the typos in the variable names, and thus fixes https://github.com/gluster/glusterfs/issues/2315 Signed-off-by: Jamie Nguyen <j@jamielinux.com>	2021-04-08 07:53:44 +05:30
Pranith Kumar Karampuri	088d8a575c	cluster/dht: Provide option to disable fsync in data migration (#2259 ) At the moment dht rebalance doesn't give any option to disable fsync after data migration. Making this an option would let admins take responsibility for their data in a way that is suitable for their cluster. Default value is still 'on', so that the behavior is intact for people who don't care about this. For example: If the data that is going to be migrated is already backed up or snapshotted, there is no need for fsync to happen right after migration which can affect active I/O on the volume from applications. fixes: #2258 Change-Id: I7a50b8d3a2f270d79920ef306ceb6ba6451150c4 Signed-off-by: Pranith Kumar K <pranith.karampuri@phonepe.com>	2021-03-17 11:02:21 +05:30
Pranith Kumar Karampuri	46949c4951	features/index: Optimize link-count fetching code path (#1789 ) * features/index: Optimize link-count fetching code path Problem: AFR requests 'link-count' in lookup to check if there are any pending heals. Based on this information, afr will set dirent->inode to NULL in readdirp when heals are ongoing to prevent serving bad data. When heals are completed, link-count xattr is leading to doing an opendir of xattrop directory and then reading the contents to figure out that there is no healing needed for every lookup. This was not detected until this github issue because ZFS in some cases can lead to very slow readdir() calls. Since Glusterfs does lot of lookups, this was slowing down all operations increasing load on the system. Code problem: index xlator on any xattrop operation adds index to the relevant dirs and after the xattrop operation is done, will delete/keep the index in that directory based on the value fetched in xattrop from posix. AFR sends all-zero xattrop for changelog xattrs. This is leading to priv->pending_count manipulation which sets the count back to -1. Next Lookup operation triggers opendir/readdir to find the actual link-count in lookup because in memory priv->pending_count is -ve. Fix: 1) Don't add to index on all-zero xattrop for a key. 2) Set pending-count to -1 when the first gfid is added into xattrop directory, so that the next lookup can compute the link-count. 
fixes: #1764 Change-Id: I8a02c7e811a72c46d78ddb2d9d4fdc2222a444e9 Signed-off-by: Pranith Kumar K <pranith.karampuri@phonepe.com> * addressed comments Change-Id: Ide42bb1c1237b525d168bf1a9b82eb1bdc3bc283 Signed-off-by: Pranith Kumar K <pranith.karampuri@phonepe.com> * tests: Handle base index absence Change-Id: I3cf11a8644ccf23e01537228766f864b63c49556 Signed-off-by: Pranith Kumar K <pranith.karampuri@phonepe.com> * Addressed LOCK based comments, .t comments Change-Id: I5f53e40820cade3a44259c1ac1a7f3c5f2f0f310 Signed-off-by: Pranith Kumar K <pranith.karampuri@phonepe.com>	2021-03-10 10:43:24 +05:30
Ravishankar N	f41f7dec05	cli: syntax check for arbiter volume creation (#2207 ) commit `8e7bfd6a58` changed the syntax for arbiter volume creation to 'replica 2 arbiter 1', while still allowing the old syntax of 'replica 3 arbiter 1'. But while doing so, it also removed a conditional check, thereby allowing replica count > 3. This patch fixes it. Fixes: #2192 Change-Id: Ie109325adb6d78e287e658fd5f59c26ad002e2d3 Signed-off-by: Ravishankar N <ravishankar@redhat.com>	2021-03-05 11:54:46 +05:30
mohit84	d858f3bc13	tests: Move tests/basic/glusterd-restart-shd-mux.t to flaky (#2191 ) The test case ( tests/basic/glusterd-restart-shd-mux.t ) was introduced as a part of shd mux feature but we observed the feature is not stable and we already planned to revert a feature. For the time being I am moving a test case to flaky to avoid a frequent regression failure. Fixes: #2190 Change-Id: I4a06a5d9212fb952a864d0f26db8323690978bfc Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>	2021-02-24 12:28:10 +05:30
Pranith Kumar Karampuri	2170ac242f	Remove tests from components that are no longer in the tree (#2160 ) fixes: #2159 Change-Id: Ibaaebc48b803ca6ad4335c11818c0c71a13e9f07 Signed-off-by: Pranith Kumar K <pranith.karampuri@phonepe.com>	2021-02-13 17:44:49 +05:30
Ryo Furuhashi	ea86b664f3	glusterd-volgen: Add functionality to accept any custom xlator (#1974 ) * glusterd-volgen: Add functionality to accept any custom xlator Add a new function which allows users to insert any custom xlators. This provides a way to add any processing into file operations. Users can deploy the plugin (xlator shared object) and integrate it into glusterfsd. If users want to enable a custom xlator, do the following: 1. put xlator object(.so file) into "XLATOR_DIR/user/" 2. set the option user.xlator.<xlator> to the existing xlator-name to specify its position in the graph 3. restart gluster volume Options for a custom xlator can be set in "user.xlator.<xlator>.<optkey>". Fixes: #1943 Signed-off-by: Ryo Furuhashi <ryo.furuhashi.nh@hitachi.com> Co-authored-by: Yaniv Kaul <ykaul@redhat.com> Co-authored-by: Xavi Hernandez <xhernandez@users.noreply.github.com>	2021-02-05 09:26:03 +05:30
Ravishankar N	6c446ee57c	tests: remove offensive language TODO: Remove 'slave-timeout' and 'slave-gluster-command-dir'. These variables are defined in geo-replication/gsyncd.conf.in. So I will remove them when I change that folder. Change-Id: Ib9167ca586d83e01f8ec755cdf58b3438184c9dd Signed-off-by: Ravishankar N <ravishankar@redhat.com>	2020-12-30 15:55:22 +05:30
Ravishankar N	b9a4120b2b	all: change 'primary' to 'root' where it makes sense As a part of offensive language removal, we changed 'master' to 'primary' in some parts of the code that are not related to geo-replication via commits `e4c9a14429` and `0fd9246533`. But it is better to use 'root' in some places to distinguish it from the geo-rep changes which use 'primary/secondary' instead of 'master/slave'. This patch mainly changes glusterfs_ctx_t->primary to glusterfs_ctx_t->root. Other places like meta xlator is also changed. gf-changelog.c is not changed since it is related to geo-rep. Updates: #1000 Change-Id: I3cd610f7bea06c7a28ae2c0104f34291023d1daf Signed-off-by: Ravishankar N <ravishankar@redhat.com>	2020-12-02 13:24:13 +01:00
Ravishankar N	24fbfad8f6	glusterd: fix bug in enabling granular-entry-heal (#1752 ) commit `f5e1eb87d4` meant to enable the volume option only for replica volumes but inadvertently enabled it for all volume types. Fixing it now. Also found a bug in glusterd where disabling the option on plain distribute was succeeding even though setting it in the first place fails. Fixed that too. Fixes: #1483 Change-Id: Icb6c169a8eec44cc4fb4dd636405d3b3485e91b4 Reported-by: Sheetal Pamecha <spamecha@redhat.com> Signed-off-by: Ravishankar N <ravishankar@redhat.com>	2020-11-05 18:34:39 +05:30
Ravishankar N	e4c9a14429	xlators: misc conscious language changes (#1715 ) core:change xlator_t->ctx->master to xlator_t->ctx->primary afr: just changed comments. meta: change .meta/master to .meta/primary. Might break scripts. changelog: variable/function name changes only. These are unrelated to geo-rep. Fixes: #1713 Change-Id: I58eb5fcd75d65fc8269633acc41313503dccf5ff Signed-off-by: Ravishankar N <ravishankar@redhat.com>	2020-11-02 18:03:01 +05:30
Pranith Kumar Karampuri	8b9c2e1cb6	extras/rebalance: Script to perform directory rebalance (#1676 ) * extras/rebalance: Script to perform directory rebalance How should the script be executed? $ /path/to/directory-rebalance.py <dir-to-rebalance> will do rebalance just for that directory. The script assumes that fix-layout operation is completed for all the directories present inside the <dir-to-rebalance> How does it work? For the given directory path that needs to be rebalanced, full crawl is performed and the files that need to be healed and the size of each file is first written to the index. Once building the index is completed, the index is read and for each file the script executes equivalent of setfattr -n trusted.distribute.migrate-data -v 1 <path/to/file> Why does the script take two passes? Printing a sensible ETA has been a primary goal of the script. Without knowing the approximate size that will be rebalanced, it is difficult to find ETA. Hence the script does one pass to find files, sizes which it writes to the index file and then the next pass is done on the index file. It takes a minute or two for the ETA to converge but in our testing it has been giving a reasonable ETA What versions does the script support? For the script to work correctly, dht should handle "trusted.distribute.migrate-data" setxattr correctly. fixes: #1654 Change-Id: Ie5070127bd45f1a1b9cd18ed029e364420c971c1 Signed-off-by: Pranith Kumar K <pranith.karampuri@phonepe.com>	2020-10-30 11:59:04 +05:30
Pranith Kumar Karampuri	02074cfe3b	cluster/dht: Perform migrate-file with lk-owner (#1581 ) * cluster/dht: Perform migrate-file with lk-owner 1) Added GF_ASSERT() calls in client-xlator to find these issues sooner. 2) Fuse is setting zero-lkowner with len as 8 when the fop doesn't have any lk-owner. Changed this to have len as 0 just as we have in fops triggered from xlators lower to fuse. * syncop: Avoid frame allocation if we can * cluster/dht: Set lkowner in daemon rebalance code path * cluster/afr: Set lkowner for ta-selfheal * cluster/ec: Destroy frame after heal is done * Don't assert for lk-owner in lk call * set lkowner for mandatory lock heal tests fixes: #1529 Change-Id: Ia803db6b00869316893abb1cf435b898eec31228 Signed-off-by: Pranith Kumar K <pranith.karampuri@phonepe.com>	2020-10-29 09:52:20 +05:30
Dmitry Antipov	e2c609e4c5	tests: exclude more contrib/fuse-lib objects (#1694 ) Exclude more contrib/fuse-lib objects to avoid silly tests/basic/0symbol-check.t breakage. Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Fixes: #1692	2020-10-27 19:44:56 +05:30
Ravishankar N	f5e1eb87d4	glusterd/afr: enable granular-entry-heal by default (#1621 ) 1. The option has been enabled and tested for quite some time now in RHHI-V downstream and I think it is safe to make it 'on' by default. Since it is not possible to simply change it from 'off' to 'on' without breaking rolling upgrades, old clients etc., I have made it default only for new volumes starting from op-version GD_OP_VERSION_9_0. Note: If you do a volume reset, the option will be turned back off. This is okay as the dir's gfid will be captured in 'xattrop' folder and heals will proceed. There might be stale entries inside 'entry-changes' folder, which will be removed when we enable the option again. 2. I encountered a cust. issue where entry heal was pending on a dir. with 236436 files in it and the glustershd.log output was just stuck at "performing entry selfheal", so I have added logs to give us more info in DEBUG level about whether entry heal and data heal are progressing (metadata heal doesn't take much time). That way, we have a quick visual indication to say things are not 'stuck' if we briefly enable debug logs, instead of taking statedumps or checking profile info etc. Fixes: #1483 Change-Id: I4f116f8c92f8cd33f209b758ff14f3c7e1981422 Signed-off-by: Ravishankar N <ravishankar@redhat.com>	2020-10-22 15:06:41 +05:30
Pranith Kumar K	61d6b18c82	mount/fuse: Fix graph-switch when reader-thread-count is set Problem: The current graph-switch code sets priv->handle_graph_switch to false even when graph-switch is in progress which leads to crashes in some cases Fix: priv->handle_graph_switch should be set to false only when graph-switch completes. fixes: #1539 Change-Id: I5b04f7220a0a6e65c5f5afa3e28d1afe9efcdc31 Signed-off-by: Pranith Kumar K <pranith.karampuri@phonepe.com>	2020-10-05 13:15:07 +05:30
Pranith Kumar K	2d58ec8581	cluster/afr: Heal directory rename without rmdir/mkdir Problem1: When a directory is renamed while a brick is down entry-heal always did an rm -rf on that directory on the sink on old location and did mkdir and created the directory hierarchy again in the new location. This is inefficient. Problem2: Renamedir heal order may lead to a scenario where directory in the new location could be created before deleting it from old location leading to 2 directories with same gfid in posix. Fix: As part of heal, if oldlocation is healed first and is not present in source-brick always rename it into a hidden directory inside the sink-brick so that when heal is triggered in new-location shd can rename it from this hidden directory to the new-location. If new-location heal is triggered first and it detects that the directory already exists in the brick, then it should skip healing the directory until it appears in the hidden directory. Credits: Ravi for rename-data-loss.t script Fixes: #1211 Change-Id: I0cba2006f35cd03d314d18211ce0bd530e254843 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>	2020-04-13 19:31:51 +05:30
Môshe van der Sterre	8f0dc56556	gfapi: Move the SECURE_ACCESS_FILE check out of glfs_mgmt_init glfs_mgmt_init is only called for glfs_set_volfile_server, but secure_mgmt is also required to use glfs_set_volfile with SSL. fixes: #829 Change-Id: Ibc769fe634d805e085232f85ce6e1c48bf4acc66	2020-09-28 06:12:32 +02:00
Sheena Artrip	868541b36f	metadisp: new translator for data and metadata separation Summary: feature/metadisp is an xlator for performing "metadata dispersal" across multiple children. it does this by flattening the complex POSIX paths into /$GFID style paths, then forwarding the metadata operations to its first child and forwarding the data operations to its second child. The purpose of this xlator is to allow separation of data and metadata, in cases where metadata might be stored in another format (embedded kv?), on another disk (ssd), on another host (dht2). Change-Id: I392c8bd0c867a3237d144aea327323f700a2728d Updates: #816 Signed-Off-By: Sheena Artrip <sheenobu@fb.com> Tested-By: Amar Tumballi <amar@kadalu.io>	2020-01-29 15:12:17 -08:00
Amar Tumballi	5731d25c9f	tests: provide an option to mark tests as 'flaky' * also add some time gap in other tests to see if we get things properly * create a directory 'tests/000/', which can host any tests, which are flaky. * move all the tests mentioned in the issue to above directory. * as the above dir gets tested first, all flaky tests would be reported quickly. * change `run-tests.sh` to continue tests even if flaky tests fail. Reference: gluster/project-infrastructure#72 Updates: #1000 Change-Id: Ifdafa38d083ebd80f7ae3cbbc9aa3b68b6d21d0e Signed-off-by: Amar Tumballi <amar@kadalu.io>	2020-08-18 14:08:20 +05:30
Mohammed Rafi KC	268d6dcdbd	afr/split-brain: fix client side split-brain resolution when quorum is enabled Problem: If we set favourite child policy, then automatic split-brain resolution should work in all cases. This was failing when quorum count was set to a non-zero value. The initial lookup before the read txn was failing with ENOTCONN. Since we don't have a readable subvol, we were failing it. We were only looking to the split brain resolution choice set through the cli command. Fix: We will now consider the favourite child policy if split-brain choice has not been set via cli command. Change-Id: Id2016c3a90d0763ac6f1a0131571053f595576f0 Fixes: #1404 Signed-off-by: Mohammed Rafi KC <rafi.kavungal@iternity.com>	2020-07-29 13:42:53 +05:30
Pranith Kumar K	bd540db1e7	cluster/afr: Delay post-op for fsync Problem: AFR doesn't delay post-op for fsync fop. For fsync heavy workloads this leads to un-necessary fxattrop/finodelk for every fsync leading to bad performance. Fix: Have delayed post-op for fsync. Add special flag in xdata to indicate that afr shouldn't delay post-op in cases where either the process will terminate or graph-switch would happen. Otherwise it leads to un-necessary heals when the graph-switch/process-termination happens before delayed-post-op completes. Fixes: #1253 Change-Id: I531940d13269a111c49e0510d49514dc169f4577 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>	2020-05-29 14:24:53 +05:30
Xavi Hernandez	db95388706	open-behind: rewrite of internal logic There was a critical flaw in the previous implementation of open-behind. When an open is done in the background, it's necessary to take a reference on the fd_t object because once we "fake" the open answer, the fd could be destroyed. However as long as there's a reference, the release function won't be called. So, if the application closes the file descriptor without having actually opened it, there will always remain at least 1 reference, causing a leak. To avoid this problem, the previous implementation didn't take a reference on the fd_t, so there were races where the fd could be destroyed while it was still in use. To fix this, I've implemented a new xlator cbk that gets called from fuse when the application closes a file descriptor. The whole logic of handling background opens have been simplified and it's more efficient now. Only if the fop needs to be delayed until an open completes, a stub is created. Otherwise no memory allocations are needed. Correctly handling the close request while the open is still pending has added a bit of complexity, but overall normal operation is simpler. Change-Id: I6376a5491368e0e1c283cc452849032636261592 Fixes: #1225 Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>	2020-05-12 23:54:54 +02:00


702 Commits