It was later allocated anyway, via strdup(), so now it's allocated and
deallocated along with the dentry structure. Switched to using malloc()
instead of calloc() as we were initializing all members. Lastly, removed
the (now unused) pool for dentries.
Updates: #1000
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
* inode.c/h: use flexible array member for inode _ctx struct
We always know the _ctx array size at inode creation, so we can allocate it along with the inode struct itself.
This also moves inode creation from a memory pool (if memory pools are enabled at all) to a regular GF_CALLOC().
Next patch will remove the inode pool.
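A minimal sketch of the flexible-array-member pattern described above. The names `demo_inode`, `demo_ctx` and `demo_inode_new()` are hypothetical stand-ins rather than the real inode.c definitions, and plain calloc() stands in for GF_CALLOC().

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-xlator context slot; not the real inode _ctx layout. */
struct demo_ctx {
    uint64_t value1;
    uint64_t value2;
};

/* Hypothetical inode: the context array is the last member. */
struct demo_inode {
    uint32_t ctx_count;
    struct demo_ctx ctx[]; /* flexible array member, sized at allocation */
};

/* One allocation (and later one free()) covers the struct and its array. */
static struct demo_inode *
demo_inode_new(uint32_t ctx_count)
{
    struct demo_inode *inode;

    inode = calloc(1, sizeof(*inode) + ctx_count * sizeof(inode->ctx[0]));
    if (inode == NULL)
        return NULL;

    inode->ctx_count = ctx_count;
    return inode;
}
```

Because the array lives in the same block as the struct, a single free() releases both and no separate pool is needed for either.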
Updates: #1000
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
* inode: remove unused inode_pool memory pool
Updates: #1000
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
[inode.c] alternative hash strategy that takes different bytes, without
mixing, and then reduces them modulo the hash table size.
The old mechanism used only the last two bytes, allowing a
distribution from 0x0 to 0xffff. A 65536-entry hash table is too small
to maintain sensible performance in some use cases, so this is
insufficient. We require a much larger distribution (at least 4x larger,
ideally 16x larger).
This isn't as predictable: it uses sizeof(int) bytes, which can vary in
theory but is generally 4. So we take the first four completely
random bytes and apply a straight modulo, retaining the requirement
that the modulus be a power of two (which further improves
performance by eliminating the multi-cycle division).
Glusterfs uses libuuid, which by default generates the 16 bytes
randomly and uses a time-based UUID only as a fallback. If we were using
time-based UUID values, this might be a problem when a lot of files are
initially created around the same time, or when files are created in
clusters around the same time. Previously, with that fallback, two
bytes from the node portion of the UUID would have been used, which
would have been extremely bad for performance, so I believe this
should not be a problem.
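The strategy described above can be sketched as follows. This is an illustration only: `demo_gfid_bucket()` is a hypothetical name, not the function in inode.c, and the sketch assumes the table size is passed in by the caller.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: bucket index from the leading bytes of a 16-byte UUID.
 * table_size must be a power of two so the modulo reduces to a bit mask. */
static uint32_t
demo_gfid_bucket(const unsigned char gfid[16], uint32_t table_size)
{
    unsigned int head;

    assert(table_size != 0 && (table_size & (table_size - 1)) == 0);

    /* Copy the first sizeof(int) bytes unmixed; memcpy avoids unaligned reads. */
    memcpy(&head, gfid, sizeof(head));

    return (uint32_t)(head & (table_size - 1));
}
```

With a table of 1 << 18 entries (4x the old 65536) or 1 << 20 entries (16x), the mask keeps 18 or 20 bits of the leading bytes instead of the 16 bits the old scheme could ever produce.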
Signed-off-by: Jaco Kroon <jaco@uls.co.za>
Messages from quiesce xlator have been migrated to the new format as an
example.
This new approach has several advantages:
- Centralized message definition
- Customizable list of additional data per message
- Typed message arguments to enforce correctness
- Consistency between different instances of the same message
- Better uniformity in data type representation
- Compile-time generation of messages
- Easily modify the message and update all references
- All argument preparation is done only if the message will be logged
- Very easy to use
- Code auto-completion friendly
- All extra overhead is optimized well by gcc/clang
The main drawback is that it makes heavy use of some macros, which is
considered a bad practice sometimes, and it uses specific gcc/clang
extensions, but we are already using them in other places.
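The point that argument preparation happens only if the message will be logged can be illustrated with a much simpler macro. This is a hedged sketch, not how the GLFS_* or GF_LOG_* macros are implemented: `DEMO_LOG`, `demo_threshold`, `demo_last` and `demo_expensive()` are all hypothetical names, and the sketch uses only standard C99 variadic macros.

```c
#include <stdio.h>

/* Hypothetical log levels and threshold; not the real gluster log levels. */
enum demo_level { DEMO_ERROR = 1, DEMO_INFO = 2, DEMO_DEBUG = 3 };
static enum demo_level demo_threshold = DEMO_INFO;

/* Last formatted message, kept in a buffer so the result can be inspected. */
static char demo_last[256];

/* The format arguments appear only inside the `if`, so they are evaluated
 * only when the level passes the threshold. */
#define DEMO_LOG(level, ...)                                                 \
    do {                                                                     \
        if ((level) <= demo_threshold)                                       \
            snprintf(demo_last, sizeof(demo_last), __VA_ARGS__);             \
    } while (0)

/* Hypothetical "expensive" argument that counts how often it is evaluated. */
static int demo_calls;

static int
demo_expensive(void)
{
    return ++demo_calls;
}
```

Calling `DEMO_LOG(DEMO_DEBUG, "value=%d", demo_expensive())` with the threshold at DEMO_INFO never calls demo_expensive(), whereas the same call at DEMO_INFO evaluates it once.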
To create a new message:
GLFS_NEW(_comp, _name, _msg, _num, _fields...)
To deprecate an existing message:
GLFS_OLD(_comp, _name, _msg, _num, _fields...)
To permanently remove a message (but keep its ID reserved):
GLFS_GONE(_comp, _name, _msg, _num, _fields...)
To be able to mix messages using old and new interfaces, the messages
using the old interface can be defined this way:
GLFS_MIG(_comp, _name, _msg, _num, _fields...)
Each field is a list composed of the following elements:
(_type, _name, _source, _format, (_values) [, _extra])
- _type is the data type of the field (int32_t, const char *, ...)
- _name is the name of the field (it will also appear in the log
string)
- _source is the origin of the data
- _format is the C format string to represent the field
- _values is a list of values used by _format (generally it's only
_name)
- _extra is an optional extra code to prepare the representation of
the field (check GLFS_UUID() for an example)
There are some predefined macros for common data types.
Example:
Message definition:
GLFS_NEW(LIBGLUSTERFS, LG_MSG_EXAMPLE, "Example message", 3,
GLFS_UINT(number),
GLFS_ERR(error),
GLFS_STR(name)
)
Message invocation:
GF_LOG_I("test", LG_MSG_EXAMPLE(3, -2, "test"));
This will generate a message like this:
"Example message ({number=3}, {error=2 (File not found)}, {name='test'})"
Debug and trace messages are defined directly in the logging place:
GF_LOG_D("test", "Debug message", 3,
GLFS_UINT(number, 3),
GLFS_ERR(error, errno),
GLFS_STR(name, this->name)
);
Change-Id: I8f4bd7b9b90f649a52fe29a62222101eeccf0c68
Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
As a continuation of #3130, there are several .h files that are included in various files.
However, many of the definitions in those include files should be local to specific files, or are simply dead code.
We can easily clean them up.
Note: some functions were deemed useful as common utilities and were meant to be shared across the codebase.
After 10 or so years, I think it's OK to move those that were never shared across the code
to their own users (and make them static while at it).
This commit specifically cleaned up glusterd-utils.c.
Updates: #3137
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
Using-Gluster-Test-Framework.md points to an older
workflow that has been deprecated since the move to GitHub and
docs.gluster.org.
Updating the links with the correct ones.
Fixes: #3054
Change-Id: Id5505f9e5c537f56aa0d52b2d5f4354d58ea8b89
Signed-off-by: karthik-us <ksubrahm@redhat.com>
This session covers fuse and its trade-offs. Details of the session include:
- various parts of fuse code in the glusterfs source tree
- the story FUSE version macros tell
- the tale of FUSE and fuse (historical context, terminology)
- to libfuse or not to libfuse?
- FUSE proto breakdown
- mount and INIT
updates: #2308
Change-Id: I764f925c8de4a2be0461e71d2c720005b9e617e3
Signed-off-by: Pranith Kumar K <pranith.karampuri@phonepe.com>
Description:
The log commands in the gluster man page still include
the `volume log locate` and `volume log filename` commands,
which are deprecated and no longer available.
Fix:
Removed those deprecated commands and fixed the `volume log rotate`
command.
Fixes: #2939
Change-Id: Ica55aa3f532fbfbb7bda8adbbfe20443f4f8464b
Signed-off-by: nik-redhat <nladha@redhat.com>
This session covers posixlk behavior in the locks xlator along with a code
walkthrough.
Change-Id: Icea59a96bd5611a155ff95e4cef878b5f38358de
updates: #2308
Signed-off-by: Pranith Kumar K <pranith.karampuri@phonepe.com>
This session covers self-heal daemon for replication
- Types of heal needed for a given file/directory
- Code walkthrough of data, metadata, entry self-heals
Change-Id: Icea59a96bd5611a155ff95e4cef878b5f38358de
updates: #2308
Signed-off-by: Pranith Kumar K <pranith.karampuri@phonepe.com>
This session covers the self-heal daemon of replication:
- introduction
- crawl types
- code walkthrough
Change-Id: Icea59a96bd5611a155ff95e4cef878b5f38358de
updates: #2308
Signed-off-by: Pranith Kumar K <pranith.karampuri@phonepe.com>
These sessions cover an introduction to the replication xlator, and
update and read transactions along with normal fops, with both
explanation and code walkthrough.
Change-Id: Ic895b02aa2a7021bd423f0b407ff90951ceebc69
updates: #2308
Signed-off-by: Pranith Kumar K <pranith.karampuri@phonepe.com>
...in order for it to be recognized as a valid option.
i.e. mount -t glusterfs $IP:$VOLUME -o localtime-logging /path/to/mount
Change-Id: I8859b31d92b54d1e0f877728a1bfff8b4ac37e56
Fixes: #2798
Signed-off-by: Ravishankar N <ravishankar.n@pavilion.io>
In case of glibc_pool it is good to use glusterfs mempool so revert
the commit 9cd6735ff5.
Change-Id: I780f0a1b7dae815becfd8c072735b6fdecb936f8
Updates: #1000
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
Separate memory allocation type tracking from the core infrastructure of
memory pools and drop the latter along with the '--disable-mempool'
configuration option. Provide '--disable-allocation-tracking' to control
the former instead.
Signed-off-by: Dmitry Antipov <dantipov@cloudlinux.com>
Updates: #1000
This session covers the entrylk part of the locks xlator along with its
connection to inodelk in the deletion code path.
updates: #2308
Change-Id: Ie52af7529b9a744f9352df589291cab6b794a125
Signed-off-by: Pranith Kumar K <pranith.karampuri@phonepe.com>
This session covers an introduction to the locks xlator and an inodelk code
walkthrough.
updates: #2308
Change-Id: Id583af4a29b67fd44d72d81ec11ffe5c4d5b1bc4
Signed-off-by: Pranith Kumar K <pranith.karampuri@phonepe.com>
This patch implements the new I/O framework that supports the current
legacy mode and a new mode based on io_uring in a compatible way so that
users of the framework don't need (in most cases) to have different
implementations depending on what mode the system is using.
This patch only introduces the framework but it's not yet used by any
component. Migration to this framework will happen in future patches.
The framework is integrated with glusterd, glusterfs and glusterfsd.
Updates: #2123
Change-Id: I5f041fdb524d7a299f77c1a468791a6322aa7bfd
Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
This session covers index xlator design and implementation
updates: #2308
Change-Id: I2eef965b073d4a3e95b0db808aee5eb3053dfc1f
Signed-off-by: Pranith Kumar K <pranith.karampuri@phonepe.com>
Added links to the session that covers memory-tracking in glusterfs and
io-threads xlator
Change-Id: I593db4aa3d8613a4b8e6e4596035c8a9dfeff0dc
Signed-off-by: Pranith Kumar K <pranith.karampuri@phonepe.com>
Added links to the session which covers client, server interaction in
connect/disconnect/reconnect code paths.
updates: #2308
Change-Id: I0ed43e5555397e63c29f532d2fc16e84a0398fc6
Signed-off-by: Pranith Kumar K <pranith.karampuri@phonepe.com>
Added links to the session which covers communication layer in
glusterfs.
Change-Id: I144dc8d758db477a3aa1d1994a3a51cb60c74390
updates: #2308
Signed-off-by: Pranith Kumar K <pranith.karampuri@phonepe.com>
Added links to the session which covers lifecycles of inode_t, fd_t and
how to debug ref leaks.
updates: #2308
Change-Id: Ib5378718423e65cd08e43ab7ec3a02bbac578b51
Signed-off-by: Pranith Kumar K <pranith.karampuri@phonepe.com>
Adding links to developer session 4 which covers programming model used
in gluster.
updates: #2308
Change-Id: Ia45b6ea25fb2300e380a6c624dd6821a87e1af52
Signed-off-by: Pranith Kumar K <pranith.karampuri@phonepe.com>
Adding links to recordings, slides of the dev-session 1, 2
updates: #2308
Change-Id: I9e10173e2b3b0d70304fa8fa050734aba06a2c6b
Signed-off-by: Pranith Kumar K <pranith.karampuri@phonepe.com>
We only need passive and active lists, there's no need for a full
iobuf variable.
Also ensured passive_list is before active_list, as it's always accessed
first.
Note: this almost brings us to using 2 cachelines only for that structure.
We can easily make other variables smaller (page_size could be 4 bytes) and fit
exactly 2 cache lines.
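A layout like the one described can be checked at compile time. The sketch below is hypothetical: `demo_arena` and its members are illustrative stand-ins for the real iobuf structure, and a 64-byte cache line is assumed.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_CACHELINE 64 /* assumed cache line size in bytes */

struct demo_list_head {
    struct demo_list_head *next;
    struct demo_list_head *prev;
};

/* Hypothetical arena-like structure; not the real iobuf definitions. */
struct demo_arena {
    struct demo_list_head passive_list; /* accessed first, so placed first */
    struct demo_list_head active_list;
    uint32_t page_size; /* 4 bytes are enough for a page size */
    uint32_t page_count;
    void *mem_base;
};

_Static_assert(offsetof(struct demo_arena, passive_list) <
                   offsetof(struct demo_arena, active_list),
               "passive_list must precede active_list");
_Static_assert(sizeof(struct demo_arena) <= 2 * DEMO_CACHELINE,
               "structure should fit in two cache lines");
```

The static assertions make the build fail if a later change reorders the lists or grows the structure past two cache lines.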
Fixes: #2096
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
Remove the unneeded backslash for glusterfs manpage, so
we can get "PATH" instead of "PATHR":
--dump-fuse=PATHR -> --dump-fuse=PATH
Updates: #1000
Signed-off-by: Liao Pingfang <liao.pingfang@zte.com.cn>
Replace master and slave terminology in geo-replication with primary and
secondary respectively.
Change-Id: I3eb9242d2ce8340435265b764d28221d50f872c8
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
These were the only offensive language occurrences in the code (.c) after
making the changes for geo-rep (which is tracked in issue 1415).
Change-Id: I21cd558fdcf8098e988617991bd3673ef86e120d
Updates: #1000
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
As Issue #1310 pointed out, for fops that
have interrupt handlers the fop handler
needs to pay attention to give the proper
FUSE response when it's interrupted.
This change extends the interrupt documentation
with guidelines regarding the FUSE response.
Also:
- improve wording
- add an 'Overview' section to explain the
code flow before going into the technical
details
Change-Id: I852bfb717b1bde73f220878d6376429564413820
updates: #1374
Signed-off-by: Csaba Henk <csaba@redhat.com>
Rename disk space checking thread to comply with
common convention, adjust related docs as well.
Change-Id: I36d642cf09773a28abd95bbe337ce29134ad96a4
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Fixes: #1248
The number of signing process threads (glfs_brpobj)
is set to 4 by default. The recommendation is to set
it to the number of cores available. This patch makes it
configurable as follows:
gluster vol bitrot <volname> signer-threads <count>
fixes: bz#1797869
Change-Id: Ia883b3e5e34e0bc8d095243508d320c9c9c58adc
Signed-off-by: Kotresh HR <khiremat@redhat.com>
The current lru-limit value still uses memory for
up to 128K inodes.
Reduce the default value of lru-limit to 64K.
Change-Id: Ica2dd4f8f5fde45cb5180d8f02c3d86114ac52b3
Fixes: bz#1753880
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Problem:
While running markdown-link-checker it was
observed that there were a large number of
404 links in the documentation kept
as markdown files in the project.
This was caused by the following:
1. Repos being removed.
2. Typos in markdown links.
3. Restructuring of directories.
Solution:
Fixing all the 404 links present in the project.
fixes: bz#1746810
Change-Id: I30de745f848fca2e9c92eb7493f74738f0890ed9
Signed-off-by: kshithijiyer <kshithij.ki@gmail.com>
If the glusterfs fuse client process is unable to
process the invalidate requests quickly enough, the
number of such requests quickly grows large enough
to use a significant amount of memory.
We are now introducing another option to set an upper
limit on these to prevent runaway memory usage.
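The idea of an upper limit on queued invalidations can be sketched as a bounded counter. All names here are hypothetical, and the sketch assumes that requests beyond the limit are simply dropped and that a limit of 0 means "no limit"; the commit message does not say how the real option behaves once the limit is hit.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bounded counter for queued invalidation requests. */
struct demo_invalidate_queue {
    size_t count; /* requests currently queued */
    size_t limit; /* upper limit; 0 means "no limit" in this sketch */
};

/* Returns true if the request was queued, false if it was dropped because
 * the queue already holds `limit` entries (an assumption of this sketch). */
static bool
demo_invalidate_enqueue(struct demo_invalidate_queue *q)
{
    if (q->limit != 0 && q->count >= q->limit)
        return false;

    q->count++;
    return true;
}

/* Called once a queued request has been processed. */
static void
demo_invalidate_done(struct demo_invalidate_queue *q)
{
    if (q->count > 0)
        q->count--;
}
```

Memory use is then bounded by the limit times the size of one request, regardless of how slowly the client drains the queue.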
Change-Id: Iddfff1ee2de1466223e6717f7abd4b28ed947788
Fixes: bz#1732717
Signed-off-by: N Balachandran <nbalacha@redhat.com>
gluster volume create <VOLNAME> replica 2 thin-arbiter 1 <host1>:<brick1> <host2>:<brick2>
<thin-arbiter-host>:<path-to-store-replica-id-file> [force]
The changes have been made in a way that the last brick in the bricks list
will be treated as the thin-arbiter.
GD1 will be manipulated to treat the replica count as 2 and continue creating the
volume like any other replica 2 volume. However, since thin-arbiter volumes need ta-brick
client xlator entries for each subvolume in the fuse volfile, volfile generation is
modified to inject these entries separately in the volfile for every subvolume.
A few more additions:
1- Save the volinfo with new fields ta_bricks list and thin_arbiter_count.
2- Introduce a new option client.ta-brick-port to add remote-port to ta-brick xlator entry
in fuse volfiles. The option can be set using the following CLI syntax -
gluster volume set <VOLNAME> client.ta-brick-port <PORTNO.>
3- Volume Info will contain a Thin-Arbiter-path entry to distinguish
from other replicate volumes.
Change-Id: Ib434e2313b29716f32476c6c211d282c4ef39406
Updates #687
Signed-off-by: Vishal Pandey <vpandey@redhat.com>
The BD xlator was removed some time ago. Remove it from the graph.
We can also remove the caps settings - only the BD xlator
was using it.
Lastly, remove the caps (which only BD was using) and the document
describing the translator.
Change-Id: Id0adcb2952f4832a5dc6301e726874522e07935d
updates: bz#1193929
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
There are a lot of formatting errors in the markdown
files present under the /doc directory of the project.
Fixing formatting errors and sending a patch.
Fixes: bz#1718273
Change-Id: I08f938088bbaaafddf634f73616ea0dbfe7aedf3
Signed-off-by: kshithijiyer <kshithij.ki@gmail.com>
in both `--help` text and man page
updates: bz#1193929
Change-Id: I9aa9367c6863ac8e2403255280697c9e6be26cf0
Signed-off-by: Amar Tumballi <amarts@redhat.com>
Auto invalidation is necessary when the same (meta)data is shared/accessed
across multiple mounts. However, if (meta)data is not shared, all
relevant I/O goes through the cache of single mount and hence is
coherent with (meta)data on bricks always. So, fuse-auto-invalidation
can be disabled for this case which gives a huge performance boost for
workloads that write data and then immediately read the data they just
wrote.
From glusterfs --help,
<snip>
--auto-invalidation[=BOOL] controls whether fuse-kernel can
auto-invalidate attribute, dentry and page-cache.
Disable this only if same files/directories are
not accessed across two different mounts
concurrently [default: "on"]
</snip>
Details on how disabling auto-invalidation helped to reduce pgbench
init times can be found at [1]. The time taken for pgbench init of scale
8000 was 8340s with auto-invalidation turned off along with other
optimizations, an improvement of 86% (59280s vs 8340s).
Disabling auto-invalidation alone contributed a 56%
improvement by reducing the total time taken by 33260s.
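The percentages above follow from the quoted timings. A small helper (hypothetical name, shown only to make the arithmetic explicit) computes them:

```c
/* Percentage of `baseline` saved when the time drops by `saved` seconds. */
static double
demo_percent_saved(double baseline, double saved)
{
    return 100.0 * saved / baseline;
}
```

`demo_percent_saved(59280, 59280 - 8340)` is roughly 85.9, which rounds to the quoted 86%, and `demo_percent_saved(59280, 33260)` is roughly 56.1, matching the quoted 56%.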
[1] https://www.spinics.net/lists/gluster-devel/msg25907.html
Change-Id: I0ed730dba9064bd9c576ad1800170a21e100e1ce
Signed-off-by: Raghavendra Gowdappa <rgowdapp@redhat.com>
updates: bz#1664934