Root Cause Analysis of CVE-2025-53630
llama.cpp
Project Description
The main goal of llama.cpp is to enable LLM inference with minimal setup and state-of-the-art performance on a wide range of hardware - locally and in the cloud.
llama.cpp is an open source library that enables efficient LLM inference directly on consumer hardware. The framework is built around the GGML tensor library and the GGUF file format, which allows for fast model loading and quantization to reduce memory usage while maintaining performance.
What is GGUF?
GGUF (GPT-Generated Unified Format) is a binary format designed for efficient storage, compression and deployment of large language models (LLMs).
The File Structure
A tensor
In a GGUF file, a tensor represents the fundamental unit of model data, containing the weights, biases, and normalization parameters that define the neural network’s structure. Each tensor is described by a metadata header specifying its name, shape (dimensions), data type and offset, followed immediately by its binary data block which stores the actual numerical values in a packed quantized format.
About the vulnerability
Summary
The vulnerability stems from unchecked arithmetic during the accumulation of tensor sizes. The GGUF parser fails to validate that the sum of all tensor sizes remains within the bounds of UINT64_MAX. This results in an integer overflow, leading to a heap-based buffer overflow during model initialization.
This vulnerability occurs in the GGUF parser, in the gguf_init_from_file_impl() function in ggml/src/gguf.cpp. It is an integer overflow that can lead to a heap out-of-bounds read/write.
gguf_init_from_file_impl() iterates through the tensor information read from the GGUF file and computes the total size required for all tensor data. An integer overflow can occur in this size calculation. The running total is stored in ctx->size, a size_t variable.
The core bug happens here:
// compute the total size of the data section, taking into account the alignment
{
    ctx->size = 0; // [1]
    for (size_t i = 0; i < ctx->info.size(); ++i) {
        const gguf_tensor_info & ti = ctx->info[i];
        if (ti.offset != ctx->size) {
            GGML_LOG_ERROR("%s: tensor '%s' has offset %" PRIu64 ", expected %zu\n",
                __func__, ti.t.name, ti.offset, ctx->size);
            GGML_LOG_ERROR("%s: failed to read tensor data\n", __func__);
            gguf_free(ctx);
            return nullptr;
        }
        ctx->size += GGML_PAD(ggml_nbytes(&ti.t), ctx->alignment); // [2]
    }
}
[1] ctx->size (a size_t) is the variable in which the integer overflow happens. [2] Here the padded size of the current tensor is added to ctx->size with no bounds checking, so if the total tensor data size exceeds UINT64_MAX, the 64-bit unsigned integer wraps around and ctx->size ends up smaller than the real total. Note that the offset check above is only an equality check: to actually trigger the vulnerability, each tensor's offset in the GGUF file must equal the (possibly wrapped) running total accumulated from the previous tensors.
If params.no_alloc is false, memory for the tensor data is allocated using ctx->size, which may be far too small because of the overflow (an under-allocation).
struct ggml_tensor * data = nullptr; // [3]
if (!params.no_alloc) {
    data = ggml_new_tensor_1d(ctx_data, GGML_TYPE_I8, ctx->size); // [4]
[3] data is a ggml_tensor; the struct is defined as follows.
struct ggml_tensor {
    enum ggml_type type;

    struct ggml_backend_buffer * buffer;

    int64_t ne[GGML_MAX_DIMS]; // number of elements
    size_t  nb[GGML_MAX_DIMS]; // stride in bytes:
                               // nb[0] = ggml_type_size(type)
                               // nb[1] = nb[0] * (ne[0] / ggml_blck_size(type)) + padding
                               // nb[i] = nb[i-1] * ne[i-1]

    // compute data
    enum ggml_op op;

    // op params - allocated as int32_t for alignment
    int32_t op_params[GGML_MAX_OP_PARAMS / sizeof(int32_t)];

    int32_t flags;

    struct ggml_tensor * src[GGML_MAX_SRC];

    // source tensor and offset for views
    struct ggml_tensor * view_src;
    size_t               view_offs;

    void * data; // [5]

    char name[GGML_MAX_NAME];

    void * extra; // extra things e.g. for ggml-cuda.cu

    char padding[8];
};
[4] ggml_new_tensor_1d() creates a 1D tensor (often referred to as the data blob), and its data buffer is allocated using the overflowed ctx->size, resulting in a small heap-based buffer ([5] the pointer is data->data).
The data pointer (cur->data) of each tensor is set to point to a location within the buffer allocated earlier.
ggml_set_no_alloc(ctx_data, true); // [6]

// create the tensors
for (size_t i = 0; i < ctx->info.size(); ++i) {
    const struct gguf_tensor_info & info = ctx->info[i]; // [7]
    struct ggml_tensor * cur = ggml_new_tensor(ctx_data, info.t.type, GGML_MAX_DIMS, info.t.ne);

    ok = ok && cur != nullptr;
    if (!ok) {
        break;
    }

    ggml_set_name(cur, info.t.name);

    // point the data member to the appropriate location in the binary blob using the tensor info
    if (!params.no_alloc) {
        cur->data = (char *) data->data + info.offset; // [8] after the integer overflow this can point out of bounds
    }
}
[6] This line temporarily sets no_alloc while the tensors are created. [7] info is the same tensor info (ti) from the loop where the vulnerability is introduced. [8] data->data is the pointer to the allocated heap buffer, and info.offset is the offset read directly from the GGUF file for the current tensor. If info.offset is greater than ctx->size (after the integer overflow), cur->data will point outside the allocated heap buffer.
Any operation that uses the cur->data pointer will then result in a heap out-of-bounds read or write.
About the PoC
There is an already existing PoC, but I rewrote it to better understand how the vulnerability is triggered.
The code snippet from examples/gguf/gguf.cpp (specifically from gguf_ex_read_1() ) attempts to read and print tensor data:
struct ggml_tensor * cur = ggml_get_tensor(ctx_data, name); // [1]

printf("%s: tensor[%d]: n_dims = %d, ne = (%d, %d, %d, %d), name = %s, data = %p\n",
    __func__, i, ggml_n_dims(cur), int(cur->ne[0]), int(cur->ne[1]), int(cur->ne[2]), int(cur->ne[3]), cur->name, cur->data);

// print first 10 elements
const float * data = (const float *) cur->data; // [2]

printf("%s data[:10] : ", name);
for (int j = 0; j < MIN(10, ggml_nelements(cur)); ++j) {
    printf("%f ", data[j]); // [3]
}
printf("\n\n");
[1] cur is the tensor structure. [2] data holds an out-of-bounds pointer. [3] An out-of-bounds read occurs and the result is printed to the console.
The PoC generates a malicious GGUF file to trigger the vulnerability. It manipulates the tensor metadata (names, dimensions, types and offsets) to cause the ctx->size integer overflow while still passing the relevant checks.
It can be found here: https://huggingface.co/yuuoniy/overflow/blob/main/generate_poc.c
I am currently working on exploiting this vulnerability.
