CVE-2026-53923 — CVE-2026-53923

CVE-2026-53923

HIGH · CVSS 7.5

EPSS exploitation probability: 24%

Published 2026-06-22T23:16:30.737 · Last modified 2026-06-24T16:51:00.307

Summary

vLLM is an inference and serving engine for large language models (LLMs). From 0.5.5 until 0.23.1rc0, integer truncation of tensor dimensions in vLLM's GGUF dequantize kernels (csrc/quantization/gguf/gguf_kernel.cu) causes partial tensor processing. The output tensor is allocated at full size via torch::empty (uninitialized memory), but the dequantize CUDA kernel processes only a truncated number of elements. The unfilled portion of the output tensor retains whatever was previously in GPU memory. In multi-tenant inference deployments, this residual GPU memory may contain tensor data from other users' inference requests, constituting information disclosure. This vulnerability is fixed in 0.23.1rc0.

Does this affect you?

Add your gear to cvedb and we'll alert you only when a vendor you run ships something exploited.

Check my exposure →

References

This product uses data from the NVD API but is not endorsed or certified by the NVD. Informational only; not professional security advice.