Distributed loading error in from_pretrained when _tp_plan is None


System Information

  • transformers version: 4.52.0.dev0
  • Platform: Linux-5.15.120.bsk.2-amd64-x86_64-with-glibc2.36
  • Python version: 3.10.16
  • Huggingface_hub version: 0.30.2
  • Safetensors version: 0.5.3
  • Accelerate version: 1.6.0
  • Accelerate config: not found
  • DeepSpeed version: not installed
  • PyTorch version (GPU?): 2.6.0+cu124 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?: True
  • Using GPU in script?: True
  • GPU type: NVIDIA H100 80GB HBM3

Who Can Help?

  • @zucchini-nlp
  • @amyeroberts
  • @qubvel

Information

  • [ ] The official example scripts
  • [x] My own modified scripts

Tasks

  • [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • [x] My own task or dataset (give details below)

Reproduction

Minimal Code Snippet

import torch.distributed as dist
import torch
from transformers import Qwen2_5OmniForConditionalGeneration, Qwen2_5OmniProcessor

def setup():
    # Standard single-node setup: NCCL process group, one GPU per rank.
    dist.init_process_group(backend="nccl")
    local_rank = dist.get_rank() % dist.get_world_size()
    world_size = dist.get_world_size()
    torch.cuda.set_device(local_rank)
    return local_rank, world_size

def cleanup():
    dist.destroy_process_group()

def main():
    local_rank, world_size = setup()
    model = Qwen2_5OmniForConditionalGeneration.from_pretrained("Qwen/Qwen2.5-Omni-7B", device_map = f"cuda:{local_rank}", torch_dtype="auto")
    cleanup()

if __name__ == "__main__":
    main()

Running with

torchrun --nproc_per_node="8" \
    --nnodes="1" \
    --node_rank="0" \
    --master_addr="127.0.0.1" \
    --master_port="12345" \
    reproduce_tp_script.py

Error

[W422 02:54:33.854949743 Utils.hpp:165] Warning: Environment variable NCCL_BLOCKING_WAIT is deprecated; use TORCH_NCCL_BLOCKING_WAIT instead (function operator())
[W422 02:54:33.872220475 Utils.hpp:165] Warning: Environment variable NCCL_BLOCKING_WAIT is deprecated; use TORCH_NCCL_BLOCKING_WAIT instead (function operator())
[W422 02:54:33.046816487 Utils.hpp:165] Warning: Environment variable NCCL_BLOCKING_WAIT is deprecated; use TORCH_NCCL_BLOCKING_WAIT instead (function operator())
[W422 02:54:33.118141019 Utils.hpp:165] Warning: Environment variable NCCL_BLOCKING_WAIT is deprecated; use TORCH_NCCL_BLOCKING_WAIT instead (function operator())
[W422 02:54:33.245757189 Utils.hpp:165] Warning: Environment variable NCCL_BLOCKING_WAIT is deprecated; use TORCH_NCCL_BLOCKING_WAIT instead (function operator())
[W422 02:54:33.367135370 Utils.hpp:165] Warning: Environment variable NCCL_BLOCKING_WAIT is deprecated; use TORCH_NCCL_BLOCKING_WAIT instead (function operator())
[W422 02:54:33.496151955 Utils.hpp:165] Warning: Environment variable NCCL_BLOCKING_WAIT is deprecated; use TORCH_NCCL_BLOCKING_WAIT instead (function operator())
[W422 02:54:33.503352722 Utils.hpp:165] Warning: Environment variable NCCL_BLOCKING_WAIT is deprecated; use TORCH_NCCL_BLOCKING_WAIT instead (function operator())
Unrecognized keys in `rope_scaling` for 'rope_type'='default': {'mrope_section'}
Qwen2_5OmniToken2WavModel does not support eager attention implementation, fall back to sdpa
Unrecognized keys in `rope_scaling` for 'rope_type'='default': {'mrope_section'}

Loading checkpoint shards:   0%|          | 0/5 [00:00<?, ?it/s][rank6]: Traceback (most recent call last):
[rank6]:   File "/opt/tiger/dev/lmms-eval/scripts/reproduce_tp_script.py", line 23, in <module>
[rank6]:     main()
[rank6]:   File "/opt/tiger/dev/lmms-eval/scripts/reproduce_tp_script.py", line 19, in main
[rank6]:     model = Qwen2_5OmniForConditionalGeneration.from_pretrained("Qwen/Qwen2.5-Omni-7B", device_map = f"cuda:{local_rank}", torch_dtype="auto")
[rank6]:   File "/home/tiger/miniconda3/envs/reproduce/lib/python3.10/site-packages/transformers/models/qwen2_5_omni/modeling_qwen2_5_omni.py", line 4421, in from_pretrained
[rank6]:     model = super().from_pretrained(
[rank6]:   File "/home/tiger/miniconda3/envs/reproduce/lib/python3.10/site-packages/transformers/modeling_utils.py", line 282, in _wrapper
[rank6]:     return func(*args, **kwargs)
[rank6]:   File "/home/tiger/miniconda3/envs/reproduce/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4470, in from_pretrained
[rank6]:     ) = cls._load_pretrained_model(
[rank6]:   File "/home/tiger/miniconda3/envs/reproduce/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4869, in _load_pretrained_model
[rank6]:     caching_allocator_warmup(model_to_load, expanded_device_map, factor=2 if hf_quantizer is None else 4)
[rank6]:   File "/home/tiger/miniconda3/envs/reproduce/lib/python3.10/site-packages/transformers/modeling_utils.py", line 5901, in caching_allocator_warmup
[rank6]:     re.compile("|".join([re.escape(plan) for plan in model._tp_plan]))
[rank6]: TypeError: 'NoneType' object is not iterable
Qwen2_5OmniToken2WavModel does not support eager attention implementation, fall back to sdpa

Loading checkpoint shards:   0%|          | 0/5 [00:00<?, ?it/s]
Unrecognized keys in `rope_scaling` for 'rope_type'='default': {'mrope_section'}
[rank5]: Traceback (most recent call last):
[rank5]:   File "/opt/tiger/dev/lmms-eval/scripts/reproduce_tp_script.py", line 23, in <module>
[rank5]:     main()
[rank5]:   File "/opt/tiger/dev/lmms-eval/scripts/reproduce_tp_script.py", line 19, in main
[rank5]:     model = Qwen2_5OmniForConditionalGeneration.from_pretrained("Qwen/Qwen2.5-Omni-7B", device_map = f"cuda:{local_rank}", torch_dtype="auto")
[rank5]:   File "/home/tiger/miniconda3/envs/reproduce/lib/python3.10/site-packages/transformers/models/qwen2_5_omni/modeling_qwen2_5_omni.py", line 4421, in from_pretrained
[rank5]:     model = super().from_pretrained(
[rank5]:   File "/home/tiger/miniconda3/envs/reproduce/lib/python3.10/site-packages/transformers/modeling_utils.py", line 282, in _wrapper
[rank5]:     return func(*args, **kwargs)
[rank5]:   File "/home/tiger/miniconda3/envs/reproduce/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4470, in from_pretrained
[rank5]:     ) = cls._load_pretrained_model(
[rank5]:   File "/home/tiger/miniconda3/envs/reproduce/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4869, in _load_pretrained_model
[rank5]:     caching_allocator_warmup(model_to_load, expanded_device_map, factor=2 if hf_quantizer is None else 4)
[rank5]:   File "/home/tiger/miniconda3/envs/reproduce/lib/python3.10/site-packages/transformers/modeling_utils.py", line 5901, in caching_allocator_warmup
[rank5]:     re.compile("|".join([re.escape(plan) for plan in model._tp_plan]))
[rank5]: TypeError: 'NoneType' object is not iterable
...

Expected Behavior

Q: What is the issue with the from_pretrained method in the transformers library?

A: from_pretrained fails while loading checkpoint shards when the model's _tp_plan attribute is None. caching_allocator_warmup builds a regex by iterating over model._tp_plan, and iterating over None raises TypeError: 'NoneType' object is not iterable.

Q: What is the tp_plan attribute in the transformers library?

A: _tp_plan describes the tensor-parallel sharding plan for a model: which weights get split across GPUs, and how, when tensor parallelism is used to spread a large model over multiple devices. A sketch of what such a plan looks like follows.
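
For context, a tensor-parallel plan is essentially a mapping from parameter-name patterns to sharding strategies. The keys and values below are illustrative only, not copied from any real model config; they just show the shape of the data the warmup code expects:

# Illustrative sketch of a tensor-parallel plan: module-name patterns mapped to
# sharding strategies such as "colwise" / "rowwise". Real plans are defined per model.
example_tp_plan = {
    "layers.*.self_attn.q_proj": "colwise",
    "layers.*.self_attn.k_proj": "colwise",
    "layers.*.self_attn.v_proj": "colwise",
    "layers.*.self_attn.o_proj": "rowwise",
    "layers.*.mlp.down_proj": "rowwise",
}

# A model class that defines no such plan ends up with _tp_plan = None, which is
# exactly what the failing line in the traceback tries to iterate over.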

Q: Why is the tp_plan attribute None in this case?

A: Qwen2_5OmniForConditionalGeneration does not define a tensor-parallel plan, so its _tp_plan attribute stays at the default None.

Q: What is the expected behavior when loading a model with a tp_plan that is None?

A: from_pretrained should treat a None _tp_plan as "no tensor-parallel plan" and skip the plan-dependent warmup logic instead of trying to iterate over it.

Q: How can I fix this issue?

A: Guard the access to _tp_plan before iterating over it: if it is None, treat it as an empty plan (or skip building the regex entirely). A sketch of such a guard follows.
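
For illustration, a guard of roughly this shape around the failing line from the traceback would avoid the crash. This is a sketch only, not the actual upstream fix, and build_tp_plan_regex is a hypothetical helper name:

import re
from typing import Iterable, Optional

def build_tp_plan_regex(tp_plan: Optional[Iterable[str]]) -> Optional["re.Pattern[str]"]:
    # Defensive version of the failing line: a None or empty plan simply means
    # "no tensor parallelism", so no regex needs to be built at all.
    if not tp_plan:
        return None
    return re.compile("|".join(re.escape(plan) for plan in tp_plan))

# Inside caching_allocator_warmup the call site would then look something like:
#     tp_plan_regex = build_tp_plan_regex(model_to_load._tp_plan)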

Q: What is the temporary solution you provided?

A: As a stopgap, manually set Qwen2_5OmniForConditionalGeneration._tp_plan to an empty list before calling from_pretrained, so the warmup code has something iterable to work with. The hack is shown below.
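
Concretely, the workaround looks roughly like this (a sketch based on the reproduction script above; local_rank comes from setup() there):

from transformers import Qwen2_5OmniForConditionalGeneration

# Temporary workaround: give the class an empty tensor-parallel plan so that
# caching_allocator_warmup iterates over [] instead of None.
Qwen2_5OmniForConditionalGeneration._tp_plan = []

model = Qwen2_5OmniForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-Omni-7B",
    device_map=f"cuda:{local_rank}",  # local_rank from setup() in the script above
    torch_dtype="auto",
)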

Q: Is this a bug in the transformers library?

A: Yes. Any code path in from_pretrained that consumes _tp_plan should tolerate a None value rather than crash.

Q: How can I report this bug?

A: You can report this bug by opening an issue on the transformers GitHub repository.

Q: Will this bug affect other models in the transformers library?

A: Potentially, yes: any model whose class leaves _tp_plan as None and goes through the same caching_allocator_warmup path can hit the same TypeError. A quick way to check a given model class is shown below.
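
To see whether a model class ships a tensor-parallel plan in your installed version, something like the following works (illustrative; the _tp_plan attribute name comes from the traceback above):

from transformers import LlamaForCausalLM, Qwen2_5OmniForConditionalGeneration

# None (or a missing attribute) suggests the class defines no tensor-parallel plan
# and can hit the same TypeError on this loading path.
for cls in (Qwen2_5OmniForConditionalGeneration, LlamaForCausalLM):
    print(cls.__name__, getattr(cls, "_tp_plan", None))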

Q: How can I prevent this bug from occurring in the future?

A: The same guard described above covers this: normalize a None _tp_plan to an empty plan (or check for None) before iterating over it anywhere in the loading path.