diff --git a/config-linux.md b/config-linux.md
index 3223e97..6797bc9 100644
--- a/config-linux.md
+++ b/config-linux.md
@@ -483,17 +483,31 @@ The following parameters can be specified to set up the controller:
 
 ### Huge page limits
 
-**`hugepageLimits`** (array of objects, OPTIONAL) represents the `hugetlb` controller which allows to limit the
-HugeTLB usage per control group and enforces the controller limit during page fault.
+**`hugepageLimits`** (array of objects, OPTIONAL) represents the `hugetlb` controller which allows limiting the HugeTLB reservations (if supported) or usage (page fault).
+By default, if supported by the kernel, `hugepageLimits` defines the hugepage sizes and limits for HugeTLB controller
+reservation accounting, which allows limiting the HugeTLB reservations per control group and enforces the controller
+limit at reservation time and at the fault of HugeTLB memory for which no reservation exists.
+Otherwise, if not supported by the kernel, this should fall back to the page fault accounting, which allows users to limit
+the HugeTLB usage (page fault) per control group and enforces the limit during page fault.
+
+Note that reservation limits are superior to page fault limits, since reservation limits are enforced at reservation
+time (on mmap or shmget), and never cause the application to get a SIGBUS signal if the memory was reserved beforehand.
+This allows for easier fallback to alternatives such as non-HugeTLB memory, for example. In the case of page fault
+accounting, it's very hard to avoid processes getting SIGBUS since the sysadmin needs to know precisely the HugeTLB usage
+of all the tasks in the system and make sure there are enough pages to satisfy all requests. Avoiding tasks getting
+SIGBUS on overcommitted systems is practically impossible with page fault accounting.
+
 For more information, see the kernel cgroups documentation about [HugeTLB][cgroup-v1-hugetlb].
 
 Each entry has the following structure:
 
-* **`pageSize`** *(string, REQUIRED)* - hugepage size
+* **`pageSize`** *(string, REQUIRED)* - hugepage size.
     The value has the format `B` (64KB, 2MB, 1GB), and must match the `` of the
-    corresponding control file found in `/sys/fs/cgroup/hugetlb/hugetlb..limit_in_bytes`.
+    corresponding control file found in `/sys/fs/cgroup/hugetlb/hugetlb..rsvd.limit_in_bytes` (if
+    hugetlb_cgroup reservation is supported) or `/sys/fs/cgroup/hugetlb/hugetlb..limit_in_bytes` (if not
+    supported).
     Values of `` are intended to be parsed using base 1024 ("1KB" = 1024, "1MB" = 1048576, etc).
-* **`limit`** *(uint64, REQUIRED)* - limit in bytes of *hugepagesize* HugeTLB usage
+* **`limit`** *(uint64, REQUIRED)* - limit in bytes of *hugepagesize* HugeTLB reservations (if supported) or usage.
 
 #### Example
 
diff --git a/specs-go/config.go b/specs-go/config.go
index 7240772..25f4e6e 100644
--- a/specs-go/config.go
+++ b/specs-go/config.go
@@ -254,12 +254,13 @@ type POSIXRlimit struct {
 	Soft uint64 `json:"soft"`
 }
 
-// LinuxHugepageLimit structure corresponds to limiting kernel hugepages
+// LinuxHugepageLimit structure corresponds to limiting kernel hugepages.
+// Default to reservation limits if supported. Otherwise fall back to page fault limits.
 type LinuxHugepageLimit struct {
-	// Pagesize is the hugepage size
-	// Format: "B' (e.g. 64KB, 2MB, 1GB, etc.)
+	// Pagesize is the hugepage size.
+	// Format: "B" (e.g. 64KB, 2MB, 1GB, etc.).
 	Pagesize string `json:"pageSize"`
-	// Limit is the limit of "hugepagesize" hugetlb usage
+	// Limit is the limit of "hugepagesize" hugetlb reservations (if supported) or usage.
 	Limit uint64 `json:"limit"`
 }
 
@@ -394,7 +395,7 @@ type LinuxResources struct {
 	Pids *LinuxPids `json:"pids,omitempty"`
 	// BlockIO restriction configuration
 	BlockIO *LinuxBlockIO `json:"blockIO,omitempty"`
-	// Hugetlb limit (in bytes)
+	// Hugetlb limits (in bytes). Default to reservation limits if supported.
 	HugepageLimits []LinuxHugepageLimit `json:"hugepageLimits,omitempty"`
 	// Network restriction configuration
 	Network *LinuxNetwork `json:"network,omitempty"`