Root Cause Analysis of CVE-2025-53630
llama.cpp
Project Description
The main goal of llama.cpp is to enable LLM inference with minimal setup and state-of-the-art performance on a wide range of hardware - locally and in the cloud.
llama.cpp is an open source library that enables efficient LLM inference directly on consumer hardware. The framework is built around the GGML tensor library and the GGUF file format, which allows for fast model loading and quantization to reduce memory usage while maintaining performance.
What is GGUF?
GGUF (GPT-Generated Unified Format) is a binary format designed for efficient storage, compression and deployment of large language models (LLMs).
The File Structure
A tensor
In a GGUF file, a tensor represents the fundamental unit of model data, containing the weights, biases, and normalization parameters that define the neural network’s structure. Each tensor is described by a metadata header specifying its name, shape (dimensions), data type and offset, followed immediately by its binary data block which stores the actual numerical values in a packed quantized format.
About the vulnerability
Summary
The vulnerability stems from unchecked arithmetic during the accumulation of tensor sizes. The GGUF parser fails to validate that the sum of all tensor sizes remains within the bounds of UINT64_MAX. This results in an integer overflow, leading to a heap-based buffer overflow during model initialization.
This vulnerability occurs in the GGUF parser, in the gguf_init_from_file_impl() function in ggml/src/gguf.cpp. It is an integer overflow that can lead to a heap out-of-bounds read/write.
gguf_init_from_file_impl() iterates through the tensor information read from the GGUF file and computes the total size required for all tensor data. An integer overflow can occur in this size calculation. The running total is stored in ctx->size, a size_t variable.
The core bug happens here:
// compute the total size of the data section, taking into account the alignment
{
    ctx->size = 0; // [1]
    for (size_t i = 0; i < ctx->info.size(); ++i) {
        const gguf_tensor_info & ti = ctx->info[i];
        if (ti.offset != ctx->size) {
            GGML_LOG_ERROR("%s: tensor '%s' has offset %" PRIu64 ", expected %zu\n",
                __func__, ti.t.name, ti.offset, ctx->size);
            GGML_LOG_ERROR("%s: failed to read tensor data\n", __func__);
            gguf_free(ctx);
            return nullptr;
        }
        ctx->size += GGML_PAD(ggml_nbytes(&ti.t), ctx->alignment); // [2]
    }
}
[1] ctx->size (a size_t) is the variable in which the integer overflow happens. [2] Here the padded size of the current tensor is added to ctx->size with no bounds checking, so if the total tensor data size exceeds UINT64_MAX, the 64-bit unsigned integer wraps around and ctx->size ends up smaller than the real total. Note that the offset check above is only an equality check: to actually trigger the vulnerability, each tensor's offset in the GGUF file must equal the (possibly wrapped) running total accumulated from the previous tensors.
If params.no_alloc is false, memory for the tensor data is allocated using ctx->size, which may be far too small because of the overflow (an under-allocation).
struct ggml_tensor * data = nullptr; // [3]
if (!params.no_alloc) {
    data = ggml_new_tensor_1d(ctx_data, GGML_TYPE_I8, ctx->size); // [4]
[3] data is a ggml_tensor; the struct is defined as follows.
struct ggml_tensor {
    enum ggml_type type;

    struct ggml_backend_buffer * buffer;

    int64_t ne[GGML_MAX_DIMS]; // number of elements
    size_t  nb[GGML_MAX_DIMS]; // stride in bytes:
                               // nb[0] = ggml_type_size(type)
                               // nb[1] = nb[0] * (ne[0] / ggml_blck_size(type)) + padding
                               // nb[i] = nb[i-1] * ne[i-1]

    // compute data
    enum ggml_op op;

    // op params - allocated as int32_t for alignment
    int32_t op_params[GGML_MAX_OP_PARAMS / sizeof(int32_t)];

    int32_t flags;

    struct ggml_tensor * src[GGML_MAX_SRC];

    // source tensor and offset for views
    struct ggml_tensor * view_src;
    size_t               view_offs;

    void * data; // [5]

    char name[GGML_MAX_NAME];

    void * extra; // extra things e.g. for ggml-cuda.cu

    char padding[8];
};
[4] ggml_new_tensor_1d() creates a 1D tensor (often referred to as the data blob), and its data buffer is allocated using the overflowed ctx->size, resulting in a small heap-based buffer ([5] the pointer is data->data).
The data pointer (cur->data) of each tensor is set to point to a location within the buffer allocated earlier.
ggml_set_no_alloc(ctx_data, true); // [6]

// create the tensors
for (size_t i = 0; i < ctx->info.size(); ++i) {
    const struct gguf_tensor_info & info = ctx->info[i]; // [7]
    struct ggml_tensor * cur = ggml_new_tensor(ctx_data, info.t.type, GGML_MAX_DIMS, info.t.ne);

    ok = ok && cur != nullptr;
    if (!ok) {
        break;
    }

    ggml_set_name(cur, info.t.name);

    // point the data member to the appropriate location in the binary blob using the tensor info
    if (!params.no_alloc) {
        cur->data = (char *) data->data + info.offset; // [8] after the integer overflow this can point out of bounds
    }
}
[6] This line temporarily sets no_alloc while the tensors are created. [7] info is the same tensor info (ti) from the loop where the vulnerability is introduced. [8] data->data is the pointer to the allocated heap buffer, and info.offset is the offset read directly from the GGUF file for the current tensor. If info.offset is greater than ctx->size (after the integer overflow), cur->data will point outside the allocated heap buffer.
Any operation that uses the cur->data pointer will then result in a heap out-of-bounds read or write.
About the PoC
There is an already existing PoC, but I rewrote it to better understand how the vulnerability is triggered.
The code snippet from examples/gguf/gguf.cpp (specifically from gguf_ex_read_1() ) attempts to read and print tensor data:
struct ggml_tensor * cur = ggml_get_tensor(ctx_data, name); // [1]

printf("%s: tensor[%d]: n_dims = %d, ne = (%d, %d, %d, %d), name = %s, data = %p\n",
    __func__, i, ggml_n_dims(cur), int(cur->ne[0]), int(cur->ne[1]), int(cur->ne[2]), int(cur->ne[3]), cur->name, cur->data);

// print first 10 elements
const float * data = (const float *) cur->data; // [2]

printf("%s data[:10] : ", name);
for (int j = 0; j < MIN(10, ggml_nelements(cur)); ++j) {
    printf("%f ", data[j]); // [3]
}
printf("\n\n");
[1] cur is the tensor structure. [2] data holds an out-of-bounds pointer. [3] An out-of-bounds read occurs and the result is printed to the console.
The PoC generates a malicious GGUF file to trigger the vulnerability. It manipulates the tensor metadata (names, dimensions, types and offsets) to cause the ctx->size integer overflow while still passing the relevant checks.
It can be found here: https://huggingface.co/yuuoniy/overflow/blob/main/generate_poc.c
I am currently working on exploiting this vulnerability.
