Currently glusterd always sets send-gids to the negated value of
manage-gids. This is not correct when we just want to check user
permissions on the client side. In this case we don't need to enable
manage-gids, but we don't need to enable send-gids either.
Updates: #3781
Change-Id: Ia42825e7f6993896b1000a07d420b84d1d0422b6
Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
* protocol/server: fix the server_getspec to serve the volfiles
This commit fixes the option of serving volume files through
brick processes. While this feature is not required for
deployments that use `glusterd`, it is very useful
in scenarios where glusterd is not present, e.g., container use cases,
and in projects like kadalu, which deal only with management layer
changes.
A few changes made in this commit:
* core: Add 'EVENT_SIGHUP' event to notify framework
* make `volfile` based process also handle SIGHUP
* add port parsing along with server, so we can have process hosted in any port
* test to demonstrate all this.
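The port-parsing item can be pictured with a minimal Python sketch. This is not the glusterfs code: the function name `split_server_spec`, the `host:port` syntax, and the fallback port are assumptions made for illustration, and IPv6 literals are deliberately not handled.

```python
DEFAULT_PORT = 24007  # assumed fallback port for this sketch


def split_server_spec(spec, default_port=DEFAULT_PORT):
    """Split 'host' or 'host:port' into a (host, port) tuple."""
    host, sep, port_text = spec.rpartition(":")
    if not sep:
        # No colon present: the whole string is the host name.
        return spec, default_port
    if not host or not port_text.isdigit():
        raise ValueError(f"invalid server spec: {spec!r}")
    port = int(port_text)
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return host, port
```

With this shape, a process can be reached on any port by writing the server as `host:port`, while a bare host name keeps the default.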
Updates: #3635, #3668
Change-Id: I5c8dfdee7d06b8d5fced4cc99059dfd8bed65260
Signed-off-by: Amar Tumballi <amar@kadalu.io>
- gf_mem_update_acct_info() is not needed when not in DEBUG mode
- re-order variables in the structure according to access pattern
- Turn xlator_mem_acct_unref() into xlator_mem_acct_destroy() and call it only when refcnt equals 0, which is quite rare.
Updates: #3855
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
Problem: It takes a while for /var/log/glusterfs/glustershd.log
to be created and populated after the volume start command is issued.
Hence, a grep expecting a particular value inside
/var/log/glusterfs/glustershd.log always fails when executed soon
after the volume start command.
Solution: Verify online_brick_count to make sure volume start is
complete and the log file is created before grepping for the
values in it.
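The "wait for a condition, then inspect the log" pattern can be sketched as a small polling helper. This is an illustrative Python sketch, not the test framework's own helper; `wait_until` and its parameters are hypothetical names.

```python
import time


def wait_until(check, expected, timeout=10.0, interval=0.5):
    """Poll check() until it returns `expected` or `timeout` seconds pass.

    Returns True if the expected value was seen, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if check() == expected:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A test would call something like `wait_until(count_online_bricks, 3)` first (where `count_online_bricks` is a stand-in for the real check) and only grep the log file once it returns True.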
Fixes: #3836
Signed-off-by: Shwetha K Acharya <sacharya@redhat.com>
Per #2253 we need to disable lookup-optimize feature.
More importantly, some FSYNCDIR calls (4 of them) appear too soon for some reason.
Re-order the profile start: wait 5 seconds for those to finish, then start profiling.
Fixes: #3708
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
There's no need to zero the first sectors of the PV, VG, LV or discard the FS on creation.
They were all just created and are empty anyway.
Updates: #1000
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
* logging: reduce no. of calls to 'THIS', use 'ctx' or 'this' where possible
In many places, instead of calling 'THIS', we can pass it or the context directly.
Changed functions and callers where it made sense.
Updates: #3426
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
* gf_time_fmt_tv_FT() - pass log pointer instead of ctx
Preparation for using log pointer across logging.c
Updates: #1000
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
* logging.c: refactor ctx->log and use log where possible
In many functions, we pass ctx and only use ctx->log. Refactor accordingly.
Updates: #3426
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
* logging.c: appstr could be a pointer, not a pointer to a pointer
Unsure why, but there was an additional indirection which did not seem to be really used anywhere.
Updates: #3426
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
* logging.c: pass 'buf' parameter instead of all its variables to *repetitions() functions.
Seems easier to read.
Updates: #3426
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
* Remove unused variable in _gf_msg_internal()
* logging.h: remove padding from structure
Updates: #3426
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
* Simplify glusterd_check_log_level() function
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
- Minor code movement, removal of unused variables.
- The default for min_file_size should not be 0 - we really should not bother with compressing < 1K files (assuming the idea is to save packets travel, not bandwidth overall)
- The default compression level is too high - zlib isn't that great anyway (performance-wise), so we should use the fastest possible
- deflateInit2() is called repeatedly (within do_cdc_compress() function) instead of once (in cdc_compress() function)
- The fixed size (GF_CDC_DEF_BUFFERSIZE) is too large - if FUSE can only do up to 128K, unsure what's the point in allocating up to 256K iobufs
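Two of the points above (skip tiny payloads, prefer the fastest zlib level) can be illustrated with Python's `zlib` module. This is a sketch under stated assumptions, not the cdc xlator's code: the 1024-byte threshold mirrors the "< 1K" remark, and `maybe_compress` is a hypothetical name.

```python
import zlib

MIN_COMPRESS_SIZE = 1024  # assumed threshold, mirrors the "< 1K" remark


def maybe_compress(payload, level=zlib.Z_BEST_SPEED):
    """Return (data, compressed) and leave small payloads untouched."""
    if len(payload) < MIN_COMPRESS_SIZE:
        # Not worth the CPU cost: send as-is.
        return payload, False
    return zlib.compress(payload, level), True
```

`zlib.Z_BEST_SPEED` is level 1, the fastest setting zlib offers short of storing data uncompressed.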
Updates: #3797
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
* dht: Implement seek fop at dht level
Before kernel minor version (.24), fuse does not
wind a seek fop; from that version on it does,
so implement the fop at the dht level.
Fixes: #3373
Change-Id: Ie9ef2f941099157996ab353fc4dc208a28fa8fc6
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
Issue:
The test case tries to simulate a scenario where the entry creation
succeeds on a single brick by filling the other two bricks. The condition
that checked for file creation failure on the other 2 bricks was
recently changed to check the number of blocks allocated for the file.
In some cases, when a brick gets filled, it might still be able to create
a few empty files but not accommodate data.
Fix:
Change the condition to check whether the file creation itself will fail on
the 2 bricks before exiting the loop, so that the next file creation operation
will fail on both the bricks, simulating the expected scenario.
Change-Id: Ifd2ee5b7cbe6bc713c3e19eae79c31a91238579f
Signed-off-by: karthik-us <ksubrahm@redhat.com>
Fixes: #3793
Add support for the AT_EMPTY_PATH flag for the following fops:
- glfs_fstatat, glfs_linkat, glfs_fchownat.
According to the man pages:
If pathname is an empty string, operate on the file referred to by dirfd,
(which may have been obtained using the open(2) O_PATH flag).
Updates: #2717
Sponsored-By: iXsystems, Inc https://www.ixsystems.com/
Signed-off-by: Shree Vatsa N <vatsa@kadalu.tech>
The test (./tests/basic/distribute/sparse_file_rebalance.t) does
not finish within the default time (200s). It spends
the time calling seek at a 2M offset and trying to copy a
sparse file.
Change-Id: Id174f4a9d66d1caaf69f495c3cf62a2e09e87b80
Fixes: #3778
Solution: To pass regression jobs, increase the timeout to 300s.
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
* tests: replace localhost and 127.0.0.1 with valid ip
Problem: From Fedora 34 it is mandatory to use a valid ip
instead of localhost, 127.0.0.1 or a loopback address
(0.0.0.0 to 0.255.255.255).
Solution: use $ip -o -4 addr and then filter out the valid ip
(ipv6 addresses to be used once gluster is entirely
ipv6 compatible)
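The filtering step can be sketched in Python with the standard `ipaddress` module. This is only an illustration of the idea, not the shell code used by the tests; the function name `first_usable_ipv4` is hypothetical, and the sketch assumes the usual one-line-per-address layout of `ip -o -4 addr`.

```python
import ipaddress

# The 0.0.0.0 to 0.255.255.255 range named in the problem statement.
ZERO_NET = ipaddress.ip_network("0.0.0.0/8")


def first_usable_ipv4(ip_output):
    """Return the first non-loopback IPv4 address found in the text
    produced by `ip -o -4 addr`, or None if there is none."""
    for line in ip_output.splitlines():
        fields = line.split()
        if "inet" not in fields:
            continue
        position = fields.index("inet") + 1
        if position >= len(fields):
            continue
        address = ipaddress.ip_interface(fields[position]).ip
        if address.is_loopback or address in ZERO_NET:
            continue
        return str(address)
    return None
```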
Fixes: #2944
Change-Id: I282a0a519c650c6848ffa668ade86a4f35f9de42
Signed-off-by: Shwetha K Acharya <sacharya@redhat.com>
* ec/quota.t: correct the syntax of create command
Change-Id: Ie26efe48664737c8dea108e1708d9a66af74ef94
Signed-off-by: Shwetha K Acharya <sacharya@redhat.com>
* revert disperse no in quota.t and replace
HOSTNAME by H0 in user-xlator.t
Change-Id: I3eca83aa1272beb9fc75e0d1d22478588e9f1038
Signed-off-by: Shwetha K Acharya <sacharya@redhat.com>
* bug-824753.t: grep for hostname instead of ip
gluster volume clear-locks returns output
with the hostname, hence the new code change
with respect to the ip does not hold good here.
Adding `hostname` in place of ip.
Fixes: #2944
Signed-off-by: Shwetha K Acharya <sacharya@redhat.com>
* bug-765380.t: replace $H0 by `hostname`
count_hostname_or_uuid_from_pathinfo() returns
the hostname, so its output should be grepped
for the hostname, not the ip.
Signed-off-by: Shwetha K Acharya <sacharya@redhat.com>
* bug-921072.t: update ip in nfs.rpc-auth-allow and
nfs.rpc-auth-reject
Signed-off-by: Shwetha K Acharya <sacharya@redhat.com>
* bug-921072.t: update nfs.rpc-auth-allow and reject
Signed-off-by: Shwetha K Acharya <sacharya@redhat.com>
The test ./tests/bugs/posix/bug-1651445.t fails continuously
while running the test suite. The test case reaches a
situation where the brick throws an ENOSPC error, and after cleanup
the test case fails when it tries to create a file. The file creation
fails because the flag (disk_space_full) is reset only every 5s by
the thread posix_ctx_disk_thread_proc.
The test case also fails on centos-8 because LVM reserves more
space on centos-8 compared to centos-7.
Solution: 1) After cleaning up data, wait for 5s for the flag to be reset. Earlier
the test case did the same, but it was changed by the patch (#3637).
2) Change the overwrite condition in posix_writev.
3) In case of centos-8, call the 2nd dd command with a low block size.
Fixes: #3695
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
Change-Id: Ifa0310ba9266651557e29480f5ea476016726e41
The rebalance daemon skipped linkto files if it found that the hashed
and cached subvol are the same. It also skipped cleaning up linkto files
on a decommissioned brick. Due to this behavior, if a user tried
to run rmdir it returned the error ENOTEMPTY. The issue was already resolved
by the patch (https://review.gluster.org/#/c/glusterfs/+/17065/), but the
patch has scope for improvement: that approach slows down RMDIR performance.
I have observed that skipping readdirp during rmdir improves performance
significantly. To improve the RMDIR performance, we first need
to clean up link files during rebalance; for the rmdir-specific
optimization I will send a separate patch.
Solution: Clean up the linkto file in case of a decommissioned brick and
if rebalance has found that the hashed and cached subvol
are the same.
Fixes: #3683
Change-Id: Ib9d70bcd1e36b25e8a3e2b86ed1ea928676e76e3
Credits: Xavi Hernandez <xhernandez@redhat.com>
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
* posix: posix xlator does not respect storage.reserve value
In a small storage environment (brick_root is < 100G) the
POSIX xlator does not respect the storage.reserve value. The flag value
is set only every 5s, so if the client generates data in that window
the posix xlator does not perform the storage.reserve
space check and allows the client to consume the brick space until the
flag has been set by a posixctxres thread.
Solution: Before doing any writev for an external client, check
the current free storage space against the writev buffer and, if
it has surpassed the limit, return ENOSPC. The priv->write_value
parameter is also updated during the unlink
and truncate fops so that the correct value is used.
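The per-write check can be sketched as follows. This is a simplified Python illustration and not the posix xlator's logic: the function name `check_reserve` and the use of plain byte counts for all three inputs are assumptions, and the bookkeeping done through priv->write_value is not modelled.

```python
import errno


def check_reserve(free_bytes, write_bytes, reserve_bytes):
    """Return 0 if the write may proceed, or errno.ENOSPC if it would
    push the free space below the configured reserve."""
    if free_bytes - write_bytes < reserve_bytes:
        return errno.ENOSPC
    return 0
```

Checking on every write, instead of relying only on a flag refreshed every 5s, closes the window in which a fast writer could eat into the reserved space.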
Fixes: #3636
Change-Id: I7e174553c22893dd44438f48406e895e13b5db5e
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
* posix: Resolve reviewer comments
Fixes: #3636
Change-Id: I569b8e5d96f138204d25e9753a92cb19135bd584
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
* posix: Calculate file written size based on (pre|post)op
block size difference to avoid overwrite cases.
Fixes: #3636
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
Change-Id: I87efee72e9cdbd1a20df30b07a6e2587ce0675a6
Co-authored-by: root <root@localhost.localdomai>
When a volume is accessed via a fuse client, access control will take
place at two locations by default: once on client side, once on server
side.
The fuse client unconditionally performs access control (what can be
configured about it is whether to let the kernel do it -- that's the
default, or let the glusterfs client do it -- that's what happens with
'--acl' option). However, server side access control can be turned off
via the 'features.acl' volume option; indeed, this is a desirable
optimization if it can be asserted that only fuse clients will access
the volume (along with setting 'server.manage-gids off', as group data
won't be used by server in this case).
This commit enhances the 'many-groups-for-acl.t' test script to test
this configuration.
- The checks that are performed in a given configuration are extended
by an access attempt that should fail with not having sufficient
permissions. This is needed to demonstrate that access control is
properly in effect.
- A new configuration is added with
'features.acl off; server.manage-gids off'.
Change-Id: I2e2cd804f76550a5e8fbfc84c5de7f81318de073
Updates: #1000
Signed-off-by: Csaba Henk <csaba@redhat.com>
* Do not use a fixed password of 'pass' for nroot
Instead, we use a random enough password, which would be sufficient
for the minute this user is open. There should be at least around 32 000
different passwords, which means testing 533 passwords per second on the
live server. I suspect our builders aren't fast enough for that.
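The estimate above is simple arithmetic, reproduced here as a short Python sketch. The second helper uses Python's `secrets` module purely as a swapped-in illustration of generating a throwaway password; the commit message does not say how the test script generates its password, and both function names are hypothetical.

```python
import secrets


def guesses_per_second(password_space, window_seconds):
    """Whole guesses per second needed to try every password in the window."""
    return password_space // window_seconds


def random_password(nbytes=16):
    """Generate a URL-safe random password from `nbytes` random bytes."""
    return secrets.token_urlsafe(nbytes)
```

`guesses_per_second(32000, 60)` gives the 533 attempts per second quoted in the text.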
* Use a secure location for the script
Since the original script is in /tmp, a regular non root user could create
the file with lax permissions. The call to 'cat' would work, but then the
attacker could simply replace the content fast enough, and it would
be called each time ssh is run, which is quite often given the nature
of the georep feature testing.
This patch fixes that by using mktemp to create a safe directory that
can't be accessed by a non root user, thus preventing the attack.
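The same idea, a private directory created atomically and then a script placed inside it, can be sketched in Python with `tempfile.mkdtemp`, which creates a directory accessible only to the creating user. The patch itself uses the `mktemp` shell command; the names `write_private_script`, `helper.sh` and the `georep-` prefix are made up for this sketch.

```python
import os
import tempfile


def write_private_script(body, prefix="georep-"):
    """Create a private temporary directory, write a script into it,
    and return the script path."""
    directory = tempfile.mkdtemp(prefix=prefix)
    script = os.path.join(directory, "helper.sh")
    # O_EXCL makes the open fail if something already exists at that path.
    fd = os.open(script, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o700)
    with os.fdopen(fd, "w") as handle:
        handle.write(body)
    return script
```

Because other users cannot enter the directory, they cannot pre-create or swap the script the way they could with a fixed path in /tmp.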
Change-Id: I92ef7a3c1cc8e57e9ffea8669fb21883efb81831
Signed-off-by: Michael Scherer <misc@redhat.com>
On a container, global loop devices may be seen by everyone. Instead of
potentially using colliding devices simultaneously and blindly deleting
all of them on cleanup, let the kernel assign the names to avoid
concurrent use of the same device and destroy only those created by
the current container.
Change-Id: Idc61a47665ab009c5b335d04ff6ad3946ef49b79
Updates: #3469
Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
Some improvements that make tests run faster or more reliably.
Change-Id: If1060d3040c3e9f40c70b4c3c5c6357a4d5d93ef
Updates: #3469
Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
It's hard to make LVM work inside a container, but some changes have
been done to simplify it by adding a unique identifier (the hostname)
as part of the volume group name. This way it's possible to not
touch anything created by other containers during cleanup.
Also added some cleanup improvements.
Change-Id: I0fd8ff904faa71e30c4aaec1965b215176ebd7e1
Updates: #3469
Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
Two main fixes:
- Compilation warning because of a type mismatch.
- Use of stack allocated buffers for background operations that may
complete outside of the function.
Change-Id: I41187578c48987d6b9e890d7bbb8f928efbab15c
Updates: #3469
Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
Inside a container, kernel's fuse module may pass 'pid' as 0 in some
cases. This caused a failure of test bugs/protocol/bug-808400-fcntl.c
when run inside a container because it was explicitly checking the pid.
Fixed it by allowing the pid to be the expected value or 0.
Change-Id: I97fbbe1311d0ab093ddb6881abc1da707e581a44
Updates: #3469
Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
There were some duplicated functions inside include.rc with identical
code. These have been removed.
There were also two functions with the same name but with different
implementations in include.rc and snapshot.rc. This worked because
some tests did include the files in different order depending on
what they really need.
To address this, the following changes have been applied:
- Moved the function from snapshot.rc to cluster.rc and renamed it.
- Moved the function from include.rc to volume.rc.
- Fixed include order from all scripts.
- Modified scripts that required the renamed function.
Change-Id: I82d220da4e8cd0148d123a49d96ebefbeb7a954c
Updates: #3469
Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
'vers=3' option is implicitly added by nfs_mount() function, so it's
redundant to pass it as an argument. Additionally, some versions of
mount scripts complain if the option is passed twice.
Change-Id: Ief7ac73441882403c9c0ce599dfc2bf45795d017
Updates: #3469
Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
Some versions of grep consider that the htime file is binary, causing
test failures. It has been fixed by forcing it to interpret the data
as text.
Also fixed some clean up issues.
Change-Id: I20def8d2ea3e75c8db0aca8d6d7a3c4a8ecbba96
Updates: #3469
Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
When we have more than one distribution set, it may not be
a good idea to perform automatic gfid split-brain resolution using
the favourite child policy,
because the gfid that was healed
could be the wrong one: the source of truth can only
be determined by DHT.
It is still possible to resolve the gfid mismatch using the
CLI command.
Change-Id: I4ddc82e6b93da9b8debb74222aff2d7d276673b3
fixes: #3288
Signed-off-by: Mohammed Rafi KC <rafi.kavungal@iternity.com>
* the quota setting can happen only on 'namespace' inode.
* once set, the accounting is maintained only at the namespace level for whole tree.
* uses 'simple-quota' key to show the correct quota usage in distributed volume.
* statfs()'s response would be used to set the volume level usage in xlator,
in setxattr() call, when done through a special mount process.
* the xlator is designed to be on brick graph, and saves only the usage of data
inside it.
* An option is provided to utilize the backend filesystem's quota feature for
'accounting', which can improve the performance.
- This PR expects backend quota to properly return `statfs()` (or `df`) output.
If `features.simple-quota.use-backend` option is set, then there won't be any
active accounting in simple-quota translator, but only `setxattr()` and
`statfs()` is handled for special keys. We expect helper function to
set 'quota-limit' and also 'namespace' xattr to aid the glusterfs process in
general to identify the entries.
Updates: #1774
Change-Id: Id4229b720b57cde458b9b36e36ada3ffe2be0ac2
Signed-off-by: Amar Tumballi <amar@kadalu.io>
It is expected to have differences in time attributes for
Arbiter node as we are not doing most of the operations
on Arbiter node. Hence we don't need to check for mdata
value in the dict while calculating sinks for a metadata
heal.
Change-Id: I326017fcb5810d7ce765e189167a4101c6b0a046
Signed-off-by: Mohammed Rafi KC <rafi.kavungal@iternity.com>
As a continuation to #3130 , there are several .h files that are included in various files.
However, many of the definitions in those include files should be local to specific files, or are simply dead code.
We can easily clean them up.
Note - there are some functions that were deemed useful as common or utilities that were meant to be shared across the codebase.
After 10 or so years, I think it's OK to move some of them that were never shared across the code,
to their own users (and make them static while at it).
This commit specifically cleaned up glusterd-utils.c
Updates: #3137
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
Some regression tests fail on IBM s390x machines; the main root cause seems to be
the different endianness: s390x is big-endian, x86 is little-endian.
For DHT, the failures are related to a different hash value, which leads to an unexpected location.
Work scope:
Detect the cases where an assignment returns a different value across architectures and
use the little-endian to host byte order conversion (use this conversion as the tests are running fine on x86).
Note:
Dict serialize/deserialize code already implements network byte order (big-endian)
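The effect of byte order on a stored 32-bit value can be shown with Python's `struct` module. This is an illustrative sketch only; the function names are hypothetical, and the C code would use the platform's conversion routines.

```python
import struct


def le32_to_host(raw):
    """Decode 4 bytes as a little-endian unsigned 32-bit integer.

    The result is the same on every host, whatever its native byte order.
    """
    return struct.unpack("<I", raw)[0]


def be32_to_host(raw):
    """Decode the same 4 bytes as big-endian, for comparison."""
    return struct.unpack(">I", raw)[0]
```

The bytes `78 56 34 12` decode to 0x12345678 as little-endian but to 0x78563412 as big-endian, which is the kind of mismatch that sends a hash value to an unexpected location.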
Updates: #2491
Change-Id: I2900f0f363ab00af8c68900b8aa2e30c574120e5
Signed-off-by: Tamar Shacked <tshacked@redhat.com>
Use installed xxhash library and xxhsum binary if found,
fallback to use everything from contrib/xxhash otherwise,
adjust gfid2path FUSE and NFS tests accordingly.
Signed-off-by: Dmitry Antipov <dantipov@cloudlinux.com>
Updates: #1000
There is a global list of locks from which you can decide if a lock
needs to be taken or not. There is no need to maintain a local list of
locks per client xlator.
fixes: #2483
Change-Id: I0860afc772f5ad78312d443e31c7310def673a6a
Signed-off-by: Pranith Kumar K <pranith.karampuri@phonepe.com>
* snapshots: Support for ZFS Snapshots
No separate mount is done for mounting the Snapshot bricks.
`.zfs` virtual directory is used as Snapshot brick path.
For example: If the brick path is `/zpool1/brick1` and Snapshot
name is `s1` then the Snapshot brick path will be
`/zpool1/.zfs/snapshot/s1/brick1`
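The path mapping in this example can be sketched in a few lines of Python. This only illustrates the string transformation, assuming the dataset mount point is known; `snapshot_brick_path` is a hypothetical name, not a glusterd function.

```python
import posixpath


def snapshot_brick_path(dataset_mount, brick_path, snap_name):
    """Map a brick path to its location under the `.zfs` snapshot directory."""
    relative = posixpath.relpath(brick_path, dataset_mount)
    if relative == ".." or relative.startswith("../"):
        raise ValueError("brick path is not inside the dataset mount point")
    return posixpath.join(dataset_mount, ".zfs", "snapshot", snap_name, relative)
```

For the values in the text, `snapshot_brick_path("/zpool1", "/zpool1/brick1", "s1")` yields `/zpool1/.zfs/snapshot/s1/brick1`.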
Cloned bricks will have the path similar to original bricks.
`<zpool>/<clone-name>/<brick-dir>`.
Change-Id: I912618bdcbaad3f9a971ef9b1fa4c00ef81b1198
Sponsored-By: iXsystems, Inc <https://www.ixsystems.com>
Signed-off-by: Aravinda Vishwanathapura <aravinda@kadalu.io>
* snapshots: ZFS Snapshot integration tests
* snapshots: Change ZFS Pool to Dataset
* snapshots: Fix the ZFS Snapshot tests
With this PR, specially on the brick side graph, inode table would be
properly set with 'namespace' inode reference. It is not guaranteed
with fuse/client side graph due to subdir mount.
To get 'namespace' for a corresponding inode, all one needs to do is,
check `ns_inode` pointer in inode structure.
Currently only special mounts with `PID < 0` can set the namespace
attribute, and in lookup, if namespace attribute is present, we set
the variable in inode.
By default, the ns_inode is set to 'root' inode when inode gets created.
Fixes: #1757
Change-Id: I69157e388538ea5d4b4e45d543575a04ee9ef221
Signed-off-by: Amar Tumballi <amar@kadalu.io>
**Description:**
Currently the option `cluster-test-mode` is used to set
a user-defined logging directory for gluster related logs, which
doesn't seem to be an appropriate name for the option.
**Fix:**
Updated the option to be `logging-directory` keeping in mind the
naming convention of other options like `working-directory`,
`run-directory`.
**NOTE:**
This option doesn't update the path for the cli and glusterd log files;
that still needs to be set manually via the command line or through the
sysconfig file.
Updates: #2939
Change-Id: I5fbbeff21ea1a89d439537311a81389b57e7acde
Signed-off-by: nik-redhat <nladha@redhat.com>
Instead of using our own implementation, let's use the platform's endian conversion routines.
For example, be32toh() instead of ntoh32().
Fixes: #2735
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
On Linux, O_DIRECT requires the buffer to be aligned at logical sector
size boundary, and both buffer size and file offset should be multiple
of the logical sector size as well. Usually it is equal to 512 bytes and
may be obtained by BLKSSZGET ioctl() on a device special file. On the
other side O_DIRECT may give better performance with chunks aligned/sized
to multiple of the logical filesystem block size. The latter is obtained
via statvfs() in posix_init() and used to check whether the chunk is
suitable for O_DIRECT where applicable.
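The two alignment conditions described above can be sketched as plain predicates. This is a Python illustration, not the posix xlator code: the function names are hypothetical, and the 512-byte default is the usual value mentioned in the text, not something queried from a device.

```python
import os


def is_sector_aligned(buf_addr, size, offset, sector_size=512):
    """True when buffer address, length and file offset are all
    multiples of the logical sector size (the O_DIRECT requirement)."""
    return all(value % sector_size == 0 for value in (buf_addr, size, offset))


def is_block_sized(size, offset, fs_block_size):
    """True when the chunk is aligned and sized to the filesystem block
    size, the case where O_DIRECT may perform better."""
    return size > 0 and size % fs_block_size == 0 and offset % fs_block_size == 0


def filesystem_block_size(path):
    """Filesystem block size as reported by statvfs()."""
    return os.statvfs(path).f_bsize
```

A chunk can satisfy the sector requirement and still fail the block-size check, for example a 512-byte write on a filesystem with 4096-byte blocks.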
Also fix false positive tests/bugs/glusterfs/bug-866459.t to check
whether the test file is not truncated.
Signed-off-by: Dmitry Antipov <dantipov@cloudlinux.com>
Updates: #1000
The test case throws a "No such file or directory"
error while running stat after the add-brick operation.
Solution: Check brick_count after running add-brick to avoid the issue.
Fixes: #2862
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
Tweak dict and xlator interfaces to allow explicit values of 'time_t' type
and consistently use it for timeouts, time intervals, and similar values.
Adjust related tests to prefer 0 over -1 for disabled timeouts.
Signed-off-by: Dmitry Antipov <dantipov@cloudlinux.com>
Updates: #1000
The glusterd_add_peers_to_auth_list function added peer names to the auth.allow
list along with the user configured auth.allow and saved the user configured
auth.allow as old.auth.allow. After add-brick ran successfully, it
called glusterd_replace_old_auth_allow_list to swap the keys and regenerate
the volume graph. The list is corrupted because during dict_del (auth.allow)
(and old.auth.allow) the buffer is cleaned up and the same buffer is used to
save the new key.
Solution: Save auth.allow and old.auth.allow as keys by calling the
function dict_set_dynstr_with_alloc; the function allocates a new buffer
to save a key value in the dictionary.
Fixes: #2625
Change-Id: I359bf906d3521f1644db6b16948c62198a58ac93
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>