[Bug] Cannot Allocate Memory

Introduction

This article documents a bug in which the sglang.launch_server command fails with "OSError: [Errno 12] Cannot allocate memory". The error is raised in the scheduler subprocess at the call to faulthandler.enable(), and it has been reported on the specific environment described below.

Checklist

Before submitting this issue, please ensure that you have:

  1. Searched related issues: Check if a similar issue has been reported and resolved in the past.
  2. Confirmed the bug is not fixed in the latest version: Verify that the bug still occurs in the latest version of the software.
  3. Provided environment info and a minimal reproducible demo: Include detailed information about your environment and a minimal reproducible demo to help us reproduce and resolve the issue.
  4. Confirmed the issue is a bug and not a question: If the issue is not a bug, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose.
  5. Used English: Ensure that the issue is reported in English to facilitate communication and resolution.

Describe the Bug

The bug occurs when running the sglang.launch_server command with the following arguments:

python3 -m sglang.launch_server --model-path /data00/models/DeepSeek-R1/ --context-length 131072 --tp 8 --trust-remote-code --mem-fraction-static 0.9 --host 0.0.0.0 --port 8080

The error message is:

Traceback (most recent call last):
  File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/managers/scheduler.py", line 1984, in run_scheduler_process
    faulthandler.enable()
OSError: [Errno 12] Cannot allocate memory
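
The errno in the last line of the traceback can be decoded with the standard library. The following is a minimal sketch, not part of sglang; the helper name describe_oserror is hypothetical. It shows that errno 12 is the operating system's ENOMEM code, which suggests a host-side failure rather than a CUDA out-of-memory error.

```python
import errno
import os


def describe_oserror(exc: OSError) -> str:
    """Return a short description of an OSError: symbolic name, number, message."""
    code = exc.errno
    if code is None:
        # OSError built from a single message carries no errno.
        return f"no errno: {exc}"
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name} ({code}): {os.strerror(code)}"


# Rebuild an exception equivalent to the one in the traceback and decode it.
example = OSError(errno.ENOMEM, "Cannot allocate memory")
print(describe_oserror(example))
```

The same helper can be called from an except OSError block around any call that fails this way.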

Reproduction

The bug is reproducible on the following environment:

  • H20, 141G, tp8, DeepSeek R1, 0.4.5.post3

Environment

The environment is:

  • /usr/local/lib/python3.10/dist-packages/torch/utils/_pytree.py:185: FutureWarning: optree is installed but the version is too old to support PyTorch Dynamo in C++ pytree. C++ pytree support is disabled. Please consider upgrading optree using python3 -m pip install --upgrade 'optree>=0.13.0'.
  • Python: 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0]
  • CUDA available: True
  • GPU 0,1,2,3,4,5,6,7: NVIDIA H20-3e
  • GPU 0,1,2,3,4,5,6,7 Compute Capability: 9.0
  • CUDA_HOME: /usr/local/cuda
  • NVCC: Cuda compilation tools, release 12.4, V12.4.131
  • CUDA Driver Version: 550.144.03
  • PyTorch: 2.6.0+cu124
  • sglang: 0.4.5.post3
  • sgl_kernel: 0.0.9.post2
  • flashinfer: 0.1.6+cu124torch2.4
  • triton: 3.2.0
  • transformers: 4.51.1
  • torchao: 0.8.0
  • numpy: 1.26.4
  • aiohttp: 3.9.3
  • fastapi: 0.115.8
  • hf_transfer: 0.1.9
  • huggingface_hub: 0.30.1
  • interegular: 0.3.3
  • modelscope: 1.22.3
  • orjson: 3.10.15
  • outlines: 0.0.46
  • packaging: 23.2
  • psutil: 5.9.4
  • pydantic: 2.10.6
  • multipart: Module Not Found
  • zmq: Module Not Found
  • uvicorn: 0.34.0
  • uvloop: 0.21.0
  • vllm: 0.6.4.post1
  • xgrammar: 0.1.17
  • openai: 1.60.2
  • tiktoken: 0.7.0
  • anthropic: 0.45.2
  • litellm: 1.59.10
  • decord: 0.6.0
  • NVIDIA Topology:

Legend:

  • X: Self
  • SYS: Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  • NODE: Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  • PHB: Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  • PXB: Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  • PIX: Connection traversing at most a single PCIe bridge
  • NV#: Connection traversing a bonded set of # NVLinks

NIC Legend:

  • NIC0: mlx5_0
  • NIC1: mlx5_1
  • NIC2: mlx5_2
  • NIC3: mlx5_3
  • NIC4: mlx5_4

ulimit soft: 1048576
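
Because errno 12 comes from the operating system, the host's resource limits and memory accounting are worth recording alongside the environment above. The sketch below assumes a Linux host and assumes that the "ulimit soft" line reports the soft limit on open file descriptors (RLIMIT_NOFILE), which the report does not state. The function names are hypothetical.

```python
import resource


def read_limits() -> dict:
    """Return {name: (soft, hard)} for limits that can affect process startup."""
    names = ["RLIMIT_NOFILE", "RLIMIT_AS", "RLIMIT_STACK", "RLIMIT_MEMLOCK", "RLIMIT_NPROC"]
    limits = {}
    for name in names:
        rlimit_id = getattr(resource, name, None)  # some limits are platform-specific
        if rlimit_id is not None:
            limits[name] = resource.getrlimit(rlimit_id)
    return limits


def read_meminfo(path: str = "/proc/meminfo") -> dict:
    """Return selected /proc/meminfo fields in kB; empty dict if unreadable."""
    wanted = {"MemTotal", "MemAvailable", "CommitLimit", "Committed_AS"}
    values = {}
    try:
        with open(path) as fh:
            for line in fh:
                key, _, rest = line.partition(":")
                if key in wanted:
                    values[key] = int(rest.split()[0])
    except OSError:
        pass
    return values


def fmt(value: int) -> str:
    """Render RLIM_INFINITY as 'unlimited', everything else as a number."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


for name, (soft, hard) in read_limits().items():
    print(f"{name}: soft={fmt(soft)} hard={fmt(hard)}")
for key, kb in read_meminfo().items():
    print(f"{key}: {kb} kB")
```

Running this inside the same container or shell that launches the server matters, since limits can differ between sessions.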

Questions and Answers

Q: What is the "Cannot allocate memory" bug?

A: It is a crash of the sglang.launch_server command when run with the arguments above on the environment above. The scheduler subprocess raises "OSError: [Errno 12] Cannot allocate memory" at the call to faulthandler.enable() and exits.

Q: What are the symptoms of the bug?

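A: The symptoms of the bug include:

  • The sglang.launch_server command crashes with an "OSError: [Errno 12] Cannot allocate memory" error message.
  • The traceback ends in run_scheduler_process (sglang/srt/managers/scheduler.py, line 1984) at the call to faulthandler.enable(). Errno 12 is ENOMEM, an error returned by the operating system, so the failure is on the host side and is not a CUDA out-of-memory error.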

Q: What are the possible causes of the bug?

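A: The report does not identify a root cause. Candidates worth checking, none of them confirmed:

  • The operating system refused a request made by faulthandler.enable(), for example because of host memory pressure or a resource limit.
  • A memory problem in the sglang library itself, although the traceback points to faulthandler.enable() and not to sglang's own allocation code.
  • An outdated optree library. This is probably unrelated: the FutureWarning in the environment output concerns PyTorch Dynamo's C++ pytree support, and nothing in the report links it to the error.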

Q: How can I reproduce the bug?

A: To reproduce the bug, follow these steps:

  1. Run the sglang.launch_server command with the specified arguments on the provided environment.
  2. Observe the crash and error message.

Q: How can I troubleshoot the bug?

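A: To troubleshoot the bug, follow these steps:

  1. Check host (CPU) memory and resource limits on the machine, since errno 12 comes from the operating system and not from the GPU allocator.
  2. Call faulthandler.enable() in a fresh Python process on the same machine to see whether the error reproduces outside sglang.
  3. Raise the server's log verbosity (for example with --log-level debug, assuming your sglang version supports that option) and capture the full output of all tp workers.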
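
One way to narrow the problem down is to check whether the failing call from the traceback, faulthandler.enable(), also fails in a fresh Python process outside sglang. The sketch below is an illustration under that assumption; probe_faulthandler is a hypothetical helper. It does not reproduce how sglang starts its scheduler subprocess, so a passing result does not rule out the bug.

```python
import subprocess
import sys


def probe_faulthandler(timeout: float = 30.0):
    """Run faulthandler.enable() in a fresh interpreter; return (returncode, stderr)."""
    result = subprocess.run(
        [sys.executable, "-c", "import faulthandler; faulthandler.enable()"],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.returncode, result.stderr.strip()


returncode, stderr = probe_faulthandler()
if returncode == 0:
    print("faulthandler.enable() succeeded outside sglang")
else:
    print(f"faulthandler.enable() failed (exit code {returncode}):\n{stderr}")
```

If the probe fails with the same OSError, the problem likely lies in the host or container configuration and not in sglang.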

Q: How can I fix the bug?

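A: No confirmed fix is known from this report. Steps worth trying:

  1. Upgrade sglang to the latest version and check whether the error persists.
  2. Upgrade the optree library as the warning in the environment output suggests (python3 -m pip install --upgrade 'optree>=0.13.0'). This removes the warning, but it is not known to affect the error.
  3. If faulthandler.enable() also fails outside sglang, look for the fix in the host or container configuration and not in the sglang library.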

Q: What are the known workarounds for the bug?

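A: No workaround has been confirmed. Options to try:

  • Running the sglang.launch_server command with the --mem-fraction-static flag set to a value lower than 0.9. This flag controls the fraction of GPU memory used for static allocation (model weights and the KV cache pool), so it may have no effect on a host-side error.
  • Patching the local sglang installation to guard the faulthandler.enable() call in run_scheduler_process, at the cost of losing fault tracebacks. This is untested.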
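
To try a lower --mem-fraction-static without retyping the launch command, the value can be swapped programmatically. This is a small sketch using only the standard library; with_flag_value is a hypothetical helper, and 0.8 is an arbitrary illustrative value, not a recommendation from the report.

```python
import shlex

# The launch command from the bug report.
BASE_COMMAND = (
    "python3 -m sglang.launch_server --model-path /data00/models/DeepSeek-R1/ "
    "--context-length 131072 --tp 8 --trust-remote-code "
    "--mem-fraction-static 0.9 --host 0.0.0.0 --port 8080"
)


def with_flag_value(command: str, flag: str, value: str) -> list:
    """Return the command as an argv list with `flag` set to `value`."""
    argv = shlex.split(command)
    if flag in argv:
        idx = argv.index(flag)
        if idx + 1 >= len(argv):
            raise ValueError(f"{flag} has no value")
        argv[idx + 1] = value  # replace the existing value
    else:
        argv += [flag, value]  # append the flag if it was absent
    return argv


argv = with_flag_value(BASE_COMMAND, "--mem-fraction-static", "0.8")
print(shlex.join(argv))
```

The resulting list can be passed directly to subprocess.run to launch the server with the modified value.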

Q: What is the known scope of the bug?

A: So far the bug has been reported only:

  • On the provided environment.
  • When running the sglang.launch_server command with the specified arguments.

Whether it occurs in other environments or with other arguments is not known.

Q: What are the next steps for the bug?

A: No plans have been announced. Possible next steps include:

  • Determining why faulthandler.enable() fails with ENOMEM on this host.
  • Upgrading the optree library to the latest version to rule it out.
  • Checking whether newer sglang versions are affected.