Distributed loading error with `from_pretrained` when `_tp_plan` is None
System Information
- `transformers` version: 4.52.0.dev0
- Platform: Linux-5.15.120.bsk.2-amd64-x86_64-with-glibc2.36
- Python version: 3.10.16
- Huggingface_hub version: 0.30.2
- Safetensors version: 0.5.3
- Accelerate version: 1.6.0
- Accelerate config: not found
- DeepSpeed version: not installed
- PyTorch version (GPU?): 2.6.0+cu124 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using distributed or parallel set-up in script?: True
- Using GPU in script?: True
- GPU type: NVIDIA H100 80GB HBM3
Who Can Help?
- @zucchini-nlp
- @amyeroberts
- @qubvel
Information
- [ ] The official example scripts
- [x] My own modified scripts
Tasks
- [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- [x] My own task or dataset (give details below)
Reproduction
Minimal Code Snippet
```python
import torch.distributed as dist
import torch
from transformers import Qwen2_5OmniForConditionalGeneration, Qwen2_5OmniProcessor


def setup():
    dist.init_process_group(backend="nccl")
    local_rank = dist.get_rank() % dist.get_world_size()
    world_size = dist.get_world_size()
    torch.cuda.set_device(local_rank)
    return local_rank, world_size


def cleanup():
    dist.destroy_process_group()


def main():
    local_rank, world_size = setup()
    model = Qwen2_5OmniForConditionalGeneration.from_pretrained(
        "Qwen/Qwen2.5-Omni-7B", device_map=f"cuda:{local_rank}", torch_dtype="auto"
    )
    cleanup()


if __name__ == "__main__":
    main()
```
Running with
```bash
torchrun --nproc_per_node="8" \
    --nnodes="1" \
    --node_rank="0" \
    --master_addr="127.0.0.1" \
    --master_port="12345" \
    reproduce_tp_script.py
```
Error
```
[W422 02:54:33.854949743 Utils.hpp:165] Warning: Environment variable NCCL_BLOCKING_WAIT is deprecated; use TORCH_NCCL_BLOCKING_WAIT instead (function operator())
[... the same NCCL_BLOCKING_WAIT deprecation warning is emitted once per rank; seven repeats omitted ...]
Unrecognized keys in `rope_scaling` for 'rope_type'='default': {'mrope_section'}
Qwen2_5OmniToken2WavModel does not support eager attention implementation, fall back to sdpa
Unrecognized keys in `rope_scaling` for 'rope_type'='default': {'mrope_section'}
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s][rank6]: Traceback (most recent call last):
[rank6]: File "/opt/tiger/dev/lmms-eval/scripts/reproduce_tp_script.py", line 23, in <module>
[rank6]: main()
[rank6]: File "/opt/tiger/dev/lmms-eval/scripts/reproduce_tp_script.py", line 19, in main
[rank6]: model = Qwen2_5OmniForConditionalGeneration.from_pretrained("Qwen/Qwen2.5-Omni-7B", device_map = f"cuda:{local_rank}", torch_dtype="auto")
[rank6]: File "/home/tiger/miniconda3/envs/reproduce/lib/python3.10/site-packages/transformers/models/qwen2_5_omni/modeling_qwen2_5_omni.py", line 4421, in from_pretrained
[rank6]: model = super().from_pretrained(
[rank6]: File "/home/tiger/miniconda3/envs/reproduce/lib/python3.10/site-packages/transformers/modeling_utils.py", line 282, in _wrapper
[rank6]: return func(*args, **kwargs)
[rank6]: File "/home/tiger/miniconda3/envs/reproduce/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4470, in from_pretrained
[rank6]: ) = cls._load_pretrained_model(
[rank6]: File "/home/tiger/miniconda3/envs/reproduce/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4869, in _load_pretrained_model
[rank6]: caching_allocator_warmup(model_to_load, expanded_device_map, factor=2 if hf_quantizer is None else 4)
[rank6]: File "/home/tiger/miniconda3/envs/reproduce/lib/python3.10/site-packages/transformers/modeling_utils.py", line 5901, in caching_warmup
[rank6]: re.compile("|".join([re.escape(plan) for plan in model._tp_plan]))
[rank6]: TypeError: 'NoneType' object is not iterable
Qwen2_5OmniToken2WavModel does not support eager attention implementation, fall back to sdpa
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Unrecognized keys in `rope_scaling` for 'rope_type'='default': {'mrope_section'}
[rank5]: Traceback (most recent call last):
...
```
The remaining ranks fail with the same traceback as rank 6.
Expected Behavior
`from_pretrained` fails whenever the loaded model's `_tp_plan` is `None`: during checkpoint loading, `caching_allocator_warmup` in `modeling_utils.py` executes `re.compile("|".join([re.escape(plan) for plan in model._tp_plan]))`, and iterating over `None` raises the `TypeError` shown above.

`_tp_plan` holds the model's tensor-parallelism plan, i.e. the sharding scheme used to split large models across multiple GPUs. It is `None` here because `Qwen2_5OmniForConditionalGeneration` does not define such a plan.

The expected behavior is that `from_pretrained` handles a `None` plan gracefully, for example by checking `_tp_plan` before iterating and treating a missing plan as an empty list, as sketched below.
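A minimal sketch of such a guard, assuming only the regex construction visible in the traceback (the helper name and its placement are illustrative, not the actual `transformers` internals):

```python
import re
from typing import Optional


def build_tp_plan_regex(model) -> Optional[re.Pattern]:
    """Return a regex matching the model's TP-plan entries, or None when there is no plan.

    Hedged sketch: mirrors the re.compile("|".join([re.escape(plan) for plan in model._tp_plan]))
    call from the traceback, but treats a missing or None _tp_plan as "no plan" instead of raising.
    """
    tp_plan = getattr(model, "_tp_plan", None) or []
    if not tp_plan:
        return None
    return re.compile("|".join(re.escape(plan) for plan in tp_plan))
```

With a guard like this, `caching_allocator_warmup` could simply skip the TP-plan matching when no plan is defined.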
As a temporary workaround, I manually set `Qwen2_5OmniForConditionalGeneration._tp_plan` to an empty list before calling `from_pretrained`, which lets loading proceed. This looks like a bug in `transformers` itself rather than in the model: any architecture that does not support tensor parallelism and therefore has `_tp_plan = None` will hit the same `TypeError` in `caching_allocator_warmup`. A sketch of the workaround follows.