[Bug] Cannot Allocate Memory
Introduction
This article documents a bug that occurs when running the sglang.launch_server command, resulting in a "Cannot allocate memory" error. The bug is reproducible in a specific environment and is related to memory allocation for the process.
Checklist
Before submitting this issue, please ensure that you have:
- Searched related issues: Check if a similar issue has been reported and resolved in the past.
- Confirmed the bug is not fixed in the latest version: Verify that the bug still occurs in the latest version of the software.
- Provided environment info and a minimal reproducible demo: Include detailed information about your environment and a minimal reproducible demo to help us reproduce and resolve the issue.
- Confirmed the issue is a bug and not a question: If the issue is not a bug, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose.
- Used English: Ensure that the issue is reported in English to facilitate communication and resolution.
Describe the Bug
The bug occurs when running the sglang.launch_server command with the following arguments:
python3 -m sglang.launch_server --model-path /data00/models/DeepSeek-R1/ --context-length 131072 --tp 8 --trust-remote-code --mem-fraction-static 0.9 --host 0.0.0.0 --port 8080
The error message is:
Traceback (most recent call last):
  File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/managers/scheduler.py", line 1984, in run_scheduler_process
    faulthandler.enable()
OSError: [Errno 12] Cannot allocate memory
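Because the traceback ends in faulthandler.enable(), a standard-library call, it can help to check whether that call fails on its own, outside sglang. The following is a minimal sketch and not part of sglang; the helper name check_faulthandler is hypothetical. It should be run with the same Python interpreter, and on the same host or container, as the server.

```python
import errno
import faulthandler


def check_faulthandler():
    """Try to enable faulthandler and return (ok, detail)."""
    was_enabled = faulthandler.is_enabled()
    try:
        faulthandler.enable()
    except (OSError, RuntimeError) as exc:
        # errno 12 is ENOMEM, the value shown in the traceback above.
        code = getattr(exc, "errno", None)
        name = errno.errorcode.get(code, "no errno")
        return False, f"{type(exc).__name__} [{name}]: {exc}"
    finally:
        # Restore the previous state so the check has no side effects.
        if not was_enabled and faulthandler.is_enabled():
            faulthandler.disable()
    return True, "faulthandler.enable() succeeded"


if __name__ == "__main__":
    ok, detail = check_faulthandler()
    print("OK" if ok else "FAILED", "-", detail)
```

If this also reports ENOMEM, the failure is reproducible without sglang, which would suggest looking at the host or Python environment before the model configuration.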
Reproduction
The bug is reproducible on the following environment:
- H20, 141G, tp8, DeepSeek R1, 0.4.5.post3
Environment
The environment is:
- Warning printed while collecting the environment: /usr/local/lib/python3.10/dist-packages/torch/utils/_pytree.py:185: FutureWarning: optree is installed but the version is too old to support PyTorch Dynamo in C++ pytree. C++ pytree support is disabled. Please consider upgrading optree using python3 -m pip install --upgrade 'optree>=0.13.0'.
- Python: 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0]
- CUDA available: True
- GPU 0,1,2,3,4,5,6,7: NVIDIA H20-3e
- GPU 0,1,2,3,4,5,6,7 Compute Capability: 9.0
- CUDA_HOME: /usr/local/cuda
- NVCC: Cuda compilation tools, release 12.4, V12.4.131
- CUDA Driver Version: 550.144.03
- PyTorch: 2.6.0+cu124
- sglang: 0.4.5.post3
- sgl_kernel: 0.0.9.post2
- flashinfer: 0.1.6+cu124torch2.4
- triton: 3.2.0
- transformers: 4.51.1
- torchao: 0.8.0
- numpy: 1.26.4
- aiohttp: 3.9.3
- fastapi: 0.115.8
- hf_transfer: 0.1.9
- huggingface_hub: 0.30.1
- interegular: 0.3.3
- modelscope: 1.22.3
- orjson: 3.10.15
- outlines: 0.0.46
- packaging: 23.2
- psutil: 5.9.4
- pydantic: 2.10.6
- multipart: Module Not Found
- zmq: Module Not Found
- uvicorn: 0.34.0
- uvloop: 0.21.0
- vllm: 0.6.4.post1
- xgrammar: 0.1.17
- openai: 1.60.2
- tiktoken: 0.7.0
- anthropic: 0.45.2
- litellm: 1.59.10
- decord: 0.6.0
- NVIDIA Topology:
Legend:
- X: Self
- SYS: Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
- NODE: Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
- PHB: Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
- PXB: Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
- PIX: Connection traversing at most a single PCIe bridge
- NV#: Connection traversing a bonded set of # NVLinks
NIC Legend:
- NIC0: mlx5_0
- NIC1: mlx5_1
- NIC2: mlx5_2
- NIC3: mlx5_3
- NIC4: mlx5_4
ulimit soft: 1048576
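Since errno 12 (ENOMEM) can be related to per-process resource limits, it may be useful to record more limits than the single ulimit value above. This sketch uses Python's standard resource module (Unix only). Which limits matter for this bug is an assumption, and the helper names are hypothetical.

```python
import resource

# Limits that are plausibly relevant to an ENOMEM failure (an assumption).
LIMIT_NAMES = (
    "RLIMIT_AS",       # total address space
    "RLIMIT_DATA",     # data segment size
    "RLIMIT_STACK",    # main thread stack size
    "RLIMIT_MEMLOCK",  # locked (pinned) memory
    "RLIMIT_NPROC",    # number of processes/threads
    "RLIMIT_NOFILE",   # open file descriptors
)


def format_limit(value):
    """Render a limit value, mapping RLIM_INFINITY to 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def collect_limits(names=LIMIT_NAMES):
    """Return {name: (soft, hard)} for every limit this platform defines."""
    report = {}
    for name in names:
        const = getattr(resource, name, None)
        if const is None:  # not every platform defines every limit
            continue
        soft, hard = resource.getrlimit(const)
        report[name] = (format_limit(soft), format_limit(hard))
    return report


if __name__ == "__main__":
    for name, (soft, hard) in collect_limits().items():
        print(f"{name}: soft={soft} hard={hard}")
```

Running it inside the same container or shell session that launches the server matters, because limits are inherited per process.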
Questions and Answers
Q: What is the "Cannot allocate memory" bug?
A: It is a startup crash. When the sglang.launch_server command is run with the arguments above in the environment above, the scheduler subprocess raises OSError: [Errno 12] Cannot allocate memory and exits.
Q: What are the symptoms of the bug?
A: The symptoms of the bug include:
- The sglang.launch_server command crashes with an "OSError: [Errno 12] Cannot allocate memory" error message.
- The exception is raised by the faulthandler.enable() call in run_scheduler_process (scheduler.py, line 1984), inside a multiprocessing child process.
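Q: What are the possible causes of the bug?
A: None of the following has been confirmed. The possible causes include:
- Insufficient memory available to the process. Note that the error is raised on the host side, in faulthandler.enable(), not during a GPU allocation.
- Memory leaks in the sglang library.
- An incompatible version of the optree library. The environment output contains a FutureWarning that optree is too old, although that warning concerns C++ pytree support and does not mention memory allocation.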
Q: How can I reproduce the bug?
A: To reproduce the bug, follow these steps:
- Run the sglang.launch_server command with the specified arguments in the provided environment.
- Observe the crash and error message.
Q: How can I troubleshoot the bug?
A: To troubleshoot the bug, follow these steps:
- Check how much host memory is available when the server starts, and review the memory allocation and deallocation mechanisms in the sglang library.
- Verify that the optree library is up to date and compatible with the sglang library.
- Run the sglang.launch_server command with more verbose logging (for example --log-level debug, assuming your sglang version provides that option).
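One way to start on the memory check is to look at the host's memory and commit accounting around the time of the failure. This is a minimal sketch, assuming a Linux host where /proc/meminfo is available. The function names are hypothetical and the choice of fields is illustrative.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {key: int}.

    Most values are in kB; a few keys (e.g. HugePages_Total) are plain counts.
    """
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the kernel's memory summary (Linux only)."""
    with open(path) as f:
        return parse_meminfo(f.read())


if __name__ == "__main__":
    try:
        info = read_meminfo()
    except OSError as exc:
        print(f"could not read /proc/meminfo: {exc}")
    else:
        for key in ("MemTotal", "MemAvailable", "CommitLimit", "Committed_AS"):
            print(f"{key}: {info.get(key, 'n/a')} kB")
```

Inside a container these figures describe the host, not the container's cgroup limit, so the container's configured memory limit should be checked separately.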
Q: How can I fix the bug?
A: No fix has been confirmed. The steps suggested so far are:
- Upgrade the optree library, for example with python3 -m pip install --upgrade 'optree>=0.13.0' as recommended by the warning in the environment output.
- Review the memory allocation and deallocation mechanisms in the sglang library and correct them if they turn out to be faulty.
Q: What are the known workarounds for the bug?
A: The suggested workarounds, none of them confirmed, are:
- Running the sglang.launch_server command with the --mem-fraction-static flag set to a value lower than the 0.9 used above.
- Modifying the sglang library to use a different memory allocation mechanism.
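To try the first workaround reproducibly, the launch command can be rebuilt with only the memory fraction changed. The sketch below reuses the arguments from the report verbatim. The helper name build_launch_command and the value 0.8 are illustrative assumptions, not recommended settings, and the range check is a sanity check of this sketch only. Whether a lower value avoids the error has not been confirmed.

```python
import shlex


def build_launch_command(mem_fraction_static):
    """Return the reported launch command with a different memory fraction."""
    if not 0.0 < mem_fraction_static < 1.0:
        raise ValueError("mem_fraction_static must be between 0 and 1")
    return [
        "python3", "-m", "sglang.launch_server",
        "--model-path", "/data00/models/DeepSeek-R1/",
        "--context-length", "131072",
        "--tp", "8",
        "--trust-remote-code",
        "--mem-fraction-static", str(mem_fraction_static),
        "--host", "0.0.0.0",
        "--port", "8080",
    ]


if __name__ == "__main__":
    # Print a shell-ready command line; 0.8 is an arbitrary example value.
    print(shlex.join(build_launch_command(0.8)))
```

Changing one argument at a time keeps each attempt comparable with the original failing command.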
Q: What are the known limitations of the bug?
A: The known limitations of the bug include:
- The bug has only been observed in the environment described above.
- The bug has only been observed when running the sglang.launch_server command with the specified arguments.
Q: What are the known future plans for the bug?
A: The known future plans for the bug include:
- Investigating the memory allocation and deallocation mechanisms in the sglang library.
- Upgrading the optree library to the latest version.
- Modifying the sglang library to use a different memory allocation mechanism.