[Bug] Cannot Allocate Memory

Apr 26, 2025 by ADMIN

Introduction

This article documents a bug in which running the sglang.launch_server command fails with "OSError: [Errno 12] Cannot allocate memory". The error is raised in the scheduler process at the call to faulthandler.enable(), and it was reported in the specific environment described below.

Checklist

Before submitting this issue, please ensure that you have:

Searched related issues: Check if a similar issue has been reported and resolved in the past.
Confirmed the bug is not fixed in the latest version: Verify that the bug is not present in the latest version of the software.
Provided environment info and a minimal reproducible demo: Include detailed information about your environment and a minimal reproducible demo to help us reproduce and resolve the issue.
Confirmed the issue is a bug and not a question: If the issue is not a bug, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose.
Used English: Ensure that the issue is reported in English to facilitate communication and resolution.

Describe the Bug

The bug occurs when running the sglang.launch_server command with the following arguments:

python3 -m sglang.launch_server --model-path /data00/models/DeepSeek-R1/ --context-length 131072 --tp 8 --trust-remote-code --mem-fraction-static 0.9 --host 0.0.0.0 --port 8080

The error message is:

Traceback (most recent call last):
  File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/managers/scheduler.py", line 1984, in run_scheduler_process
    faulthandler.enable()
OSError: [Errno 12] Cannot allocate memory
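
Errno 12 is the operating system's ENOMEM code, which Python surfaces here as an OSError. The following is a minimal sketch, not sglang's code, that decodes the errno value and shows how a call to faulthandler.enable() could be guarded so that a failure is reported instead of ending the process. The helper names describe_errno and try_enable_faulthandler are hypothetical.

```python
import errno
import faulthandler
import os


def describe_errno(code: int) -> str:
    """Return the symbolic name and OS message for an errno value."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


def try_enable_faulthandler(file=None) -> bool:
    """Hypothetical helper: enable faulthandler and report failure instead of raising.

    With file=None, faulthandler writes to sys.stderr.
    """
    try:
        faulthandler.enable(file=file)
        return True
    except OSError as exc:
        # In the traceback above, exc.errno would be 12 (ENOMEM).
        print(f"faulthandler.enable() failed: {exc!r}")
        return False


if __name__ == "__main__":
    print(describe_errno(12))
    print("faulthandler enabled:", try_enable_faulthandler())
```

Whether continuing without faulthandler is acceptable depends on the deployment; the guard only keeps an optional diagnostic feature from stopping the process.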

Reproduction

The bug was reproduced in the following environment:

H20, 141G, tp8, DeepSeek R1, 0.4.5.post3

Environment

The environment is:

/usr/local/lib/python3.10/dist-packages/torch/utils/_pytree.py:185: FutureWarning: optree is installed but the version is too old to support PyTorch Dynamo in C++ pytree. C++ pytree support is disabled. Please consider upgrading optree using python3 -m pip install --upgrade 'optree>=0.13.0'.
Python: 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0]
CUDA available: True
GPU 0,1,2,3,4,5,6,7: NVIDIA H20-3e
GPU 0,1,2,3,4,5,6,7 Compute Capability: 9.0
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 12.4, V12.4.131
CUDA Driver Version: 550.144.03
PyTorch: 2.6.0+cu124
sglang: 0.4.5.post3
sgl_kernel: 0.0.9.post2
flashinfer: 0.1.6+cu124torch2.4
triton: 3.2.0
transformers: 4.51.1
torchao: 0.8.0
numpy: 1.26.4
aiohttp: 3.9.3
fastapi: 0.115.8
hf_transfer: 0.1.9
huggingface_hub: 0.30.1
interegular: 0.3.3
modelscope: 1.22.3
orjson: 3.10.15
outlines: 0.0.46
packaging: 23.2
psutil: 5.9.4
pydantic: 2.10.6
multipart: Module Not Found
zmq: Module Not Found
uvicorn: 0.34.0
uvloop: 0.21.0
vllm: 0.6.4.post1
xgrammar: 0.1.17
openai: 1.60.2
tiktoken: 0.7.0
anthropic: 0.45.2
litellm: 1.59.10
decord: 0.6.0
NVIDIA Topology:

Legend:

X: Self
SYS: Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE: Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB: Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB: Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX: Connection traversing at most a single PCIe bridge
NV#: Connection traversing a bonded set of # NVLinks

NIC Legend:

NIC0: mlx5_0
NIC1: mlx5_1
NIC2: mlx5_2
NIC3: mlx5_3
NIC4: mlx5_4

ulimit soft: 1048576

Frequently Asked Questions

Q: What is the "Cannot allocate memory" bug?

A: It is a startup crash of sglang.launch_server, reported with the arguments and environment listed above. The scheduler process raises "OSError: [Errno 12] Cannot allocate memory" at the call to faulthandler.enable().

Q: What are the symptoms of the bug?

A: The symptoms of the bug include:

The sglang.launch_server command crashes with an "OSError: [Errno 12] Cannot allocate memory" error message.
The traceback ends in run_scheduler_process (sglang/srt/managers/scheduler.py, line 1984) at the call to faulthandler.enable(), so the scheduler process exits during startup.

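Q: What are the possible causes of the bug?

A: The root cause is not confirmed in this report. The possible causes include:

A host-side memory shortage or resource limit. Errno 12 (ENOMEM) is returned by the operating system, so this is not a CUDA out-of-memory error.
A memory problem in the sglang library, although the crash occurs at process startup, which makes a gradual leak an unlikely direct trigger.
An outdated optree library. The environment output warns that the installed optree is too old for PyTorch's C++ pytree support, but no link between that warning and the crash has been established.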

Q: How can I reproduce the bug?

A: To reproduce the bug, follow these steps:

Run the sglang.launch_server command shown in "Describe the Bug" in the environment listed above.
Observe the crash and the "OSError: [Errno 12] Cannot allocate memory" error message.
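
Because the traceback ends in a standard-library call, one way to narrow the problem down is to run that call on its own, outside sglang. The sketch below is an assumption about how one might do that, not part of the original report: it starts a fresh interpreter that only calls faulthandler.enable() and captures the exit code and error output. The function name run_faulthandler_probe is hypothetical.

```python
import subprocess
import sys


def run_faulthandler_probe(python: str = sys.executable) -> subprocess.CompletedProcess:
    """Run faulthandler.enable() in a fresh interpreter and capture the result."""
    return subprocess.run(
        [python, "-c", "import faulthandler; faulthandler.enable()"],
        capture_output=True,
        text=True,
        check=False,  # a non-zero exit is information here, not an error
    )


if __name__ == "__main__":
    result = run_faulthandler_probe()
    print("exit code:", result.returncode)
    if result.stderr:
        print("stderr:", result.stderr.strip())
```

A clean exit does not rule out the bug, since the sglang scheduler runs under different conditions; a failure with the same Errno 12 would suggest the problem lies in the host or Python environment rather than in sglang.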

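Q: How can I troubleshoot the bug?

A: To troubleshoot the bug, follow these steps:

Check host memory and per-process resource limits (for example with free -h and ulimit -a), since the error is reported by the operating system.
Review how sglang starts its scheduler processes (run_scheduler_process in sglang/srt/managers/scheduler.py), where the failing call is made.
Verify that the optree library is up to date; the warning in the environment output recommends optree>=0.13.0.
Run the sglang.launch_server command with more verbose logging; consult python3 -m sglang.launch_server --help for the logging options available in your version.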
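
Host-side resource limits are one thing to inspect when the operating system reports ENOMEM. The following is a minimal sketch using Python's standard resource module (Unix only) to print a few soft and hard limits for the current process. It assumes that the "ulimit soft" value in the environment output refers to the open-files limit; the choice of limits shown is illustrative, and read_limits is a hypothetical helper name.

```python
import resource


def _fmt(value: int) -> str:
    """Render a limit value, mapping RLIM_INFINITY to 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def read_limits() -> dict:
    """Return (soft, hard) pairs for a few per-process limits worth checking."""
    names = {
        "open files (RLIMIT_NOFILE)": resource.RLIMIT_NOFILE,
        "address space (RLIMIT_AS)": resource.RLIMIT_AS,
        "stack size (RLIMIT_STACK)": resource.RLIMIT_STACK,
        "locked memory (RLIMIT_MEMLOCK)": resource.RLIMIT_MEMLOCK,
    }
    return {
        label: tuple(_fmt(v) for v in resource.getrlimit(res))
        for label, res in names.items()
    }


if __name__ == "__main__":
    for label, (soft, hard) in read_limits().items():
        print(f"{label}: soft={soft} hard={hard}")
```

Running this inside the same container or shell that launches the server matters, because limits are inherited per process and can differ between sessions.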

Q: How can I fix the bug?

A: No confirmed fix is documented here. Steps worth trying:

Upgrade the optree library as the warning in the environment output suggests, using python3 -m pip install --upgrade 'optree>=0.13.0'. This should remove the warning; whether it affects the crash is unknown.
If the failure is traced to memory handling in the sglang library, correct it there or report the findings upstream.
Confirm whether the bug still occurs in the latest sglang release.

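Q: What are the known workarounds for the bug?

A: No workaround has been verified. Options to try include:

Running the sglang.launch_server command with the --mem-fraction-static flag set to a value lower than 0.9. As far as the sglang documentation describes it, this flag sets the fraction of GPU memory used for model weights and the KV cache pool, so it may not help with an error raised by the host operating system.
Modifying the local sglang installation to use a different memory allocation mechanism; this is untested.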
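
To compare runs with different --mem-fraction-static values, the launch command can be assembled programmatically. This is a minimal sketch that reuses only the flags from the command in "Describe the Bug"; the function name build_launch_command, the example value 0.8, and the range check are assumptions for illustration, not recommendations from the report.

```python
import shlex


def build_launch_command(mem_fraction_static: float) -> list[str]:
    """Build the reported launch command with a chosen --mem-fraction-static."""
    # Assumed sanity check: the value is a fraction strictly between 0 and 1.
    if not 0.0 < mem_fraction_static < 1.0:
        raise ValueError("mem_fraction_static must be between 0 and 1")
    return [
        "python3", "-m", "sglang.launch_server",
        "--model-path", "/data00/models/DeepSeek-R1/",
        "--context-length", "131072",
        "--tp", "8",
        "--trust-remote-code",
        "--mem-fraction-static", str(mem_fraction_static),
        "--host", "0.0.0.0",
        "--port", "8080",
    ]


if __name__ == "__main__":
    # 0.8 is an arbitrary example value lower than the reported 0.9.
    print(shlex.join(build_launch_command(0.8)))
```

The resulting list can be passed to subprocess.run directly, which avoids shell quoting issues when the model path contains spaces.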

Q: What is the known scope of the bug?

A: So far the bug has been reported only under the following conditions:

The environment listed above (8 NVIDIA H20-3e GPUs, sglang 0.4.5.post3, Python 3.10.12).
The sglang.launch_server arguments shown in "Describe the Bug".

Other environments and argument combinations have not been tested in this report.

Q: What are the next steps for the bug?

A: No plans have been announced. Suggested next steps include:

Investigating why faulthandler.enable() fails with Errno 12 in the scheduler process, including the memory allocation mechanisms in the sglang library.
Upgrading the optree library to the latest version and checking whether the crash persists.
Retesting against the latest sglang release.