[Dynamo][Inductor] `detectron2_fcos_r_50_fpn` In Export Config Failure On Dashboard
Introduction
This article documents a failure in the export configuration on the Inductor dashboard for the detectron2_fcos_r_50_fpn model. The failure occurs on both NVIDIA and AMD devices and appears to be related to PyTorch issue #143222. As the traceback below shows, Dynamo cannot guard on the data-dependent expression u0 < 800, which arises when detectron2's ImageList.from_tensors slices the batched tensor using runtime image sizes.
Error Logs
The error logs are as follows:
W0522 18:33:14.567000 1556598 site-packages/torch/fx/experimental/symbolic_shapes.py:6980] [0/0] failed during evaluate_expr(u0 < 800, hint=None, size_oblivious=True, forcing_spec=False
ERROR:common:
Traceback (most recent call last):
File "/home/niromero/docker_workspace/pytorch/benchmarks/dynamo/common.py", line 2257, in check_accuracy
optimized_model_iter_fn = optimize_ctx(
File "/home/niromero/docker_workspace/pytorch/benchmarks/dynamo/common.py", line 1489, in export
ep = torch.export.export(
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/export/__init__.py", line 319, in export
raise e
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/export/__init__.py", line 286, in export
return _export(
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/export/_trace.py", line 1159, in wrapper
raise e
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/export/_trace.py", line 1125, in wrapper
ep = fn(*args, **kwargs)
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/export/exported_program.py", line 123, in wrapper
return fn(*args, **kwargs)
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/export/_trace.py", line 2172, in _export
ep = _export_for_training(
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/export/_trace.py", line 1159, in wrapper
raise e
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/export/_trace.py", line 1125, in wrapper
ep = fn(*args, **kwargs)
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/export/exported_program.py", line 123, in wrapper
return fn(*args, **kwargs)
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/export/_trace.py", line 2033, in _export_for_training
export_artifact = export_func(
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/export/_trace.py", line 1405, in _strict_export
gm_torch_level = _export_to_torch_ir(
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/export/_trace.py", line 798, in _export_to_torch_ir
gm_torch_level, _ = torch._dynamo.export(
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 1788, in inner
result_traced = opt_f(*args, **kwargs)
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 372, in __call__
return super().__call__(*args, **kwargs)
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1767, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1778, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 708, in _fn
raise e.with_traceback(None) from e.__cause__
torch._dynamo.exc.UserError: Could not guard on data-dependent expression u0 < 800 (unhinted: u0 < 800). (Size-like symbols: u0)
Caused by: batched_imgs[i, ..., : img.shape[-2], : img.shape[-1]].copy_(img) # detectron2/structures/image_list.py:127 in from_tensors (_decomp/decompositions.py:744 in slice_forward)
For more information, run with TORCH_LOGS="dynamic"
For extended logs when we create symbols, also add TORCHDYNAMO_EXTENDED_DEBUG_CREATE_SYMBOL="u0"
If you suspect the guard was triggered from C++, add TORCHDYNAMO_EXTENDED_DEBUG_CPP=1
For more debugging help, see https://docs.google.com/document/d/1HSuTTVvYH1pTew89Rtpeu84Ht3nQEFTYhAX3Ypa_xJs/edit?usp=sharing
User Stack (most recent call last):
(snipped, see stack below for prefix)
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/detectron2/modeling/meta_arch/dense_detector.py", line 95, in forward
images = self.preprocess_image(batched_inputs)
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/detectron2/modeling/meta_arch/dense_detector.py", line 129, in preprocess_image
images = ImageList.from_tensors(
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/detectron2/structures/image_list.py", line 127, in from_tensors
batched_imgs[i, ..., : img.shape[-2], : img.shape[-1]].copy_(img)
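The failing line pads variable-sized images into one batched tensor and copies each image into its top-left corner. A minimal sketch of that padding logic, reduced to pure Python lists (names and the size_divisibility default are illustrative, not detectron2's exact implementation), shows why the slice bounds depend on runtime image sizes, which is exactly the data-dependent quantity (u0) that export cannot guard on:

```python
# Sketch of the ImageList.from_tensors padding pattern, torch-free.
# Each "image" is a list of rows (H x W); the batch buffer is sized to
# the max H/W, rounded up to a multiple of size_divisibility.
def from_tensors(images, size_divisibility=32):
    max_h = max(len(img) for img in images)
    max_w = max(len(img[0]) for img in images)
    pad = lambda n: -(-n // size_divisibility) * size_divisibility  # ceil to multiple
    H, W = pad(max_h), pad(max_w)
    batched = [[[0] * W for _ in range(H)] for _ in images]
    for i, img in enumerate(images):
        h, w = len(img), len(img[0])  # data-dependent sizes (the u0 in the error)
        # Equivalent of batched_imgs[i, ..., :h, :w].copy_(img): the slice
        # bounds h and w are only known at runtime, so a symbolic comparison
        # like "h < 800" has no hint the tracer can resolve.
        for r in range(h):
            batched[i][r][:w] = img[r]
    return batched
```

The error therefore comes from the model's preprocessing, not from the backbone itself: any comparison Dynamo must evaluate against these unbacked sizes fails during strict export.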
Versions
The versions of the relevant libraries are as follows:
- PyTorch: 2.8.0a0+git500a710
- CUDA: N/A
- ROCM: 6.4.43482-0f2d60242
- OS: Ubuntu 22.04.5 LTS (x86_64)
- GCC: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
- Clang: 19.0.0git (https://github.com/RadeonOpenCompute/llvm-project roc-6.4.0 25133 c7fe45cf4b819c5991fe208aaa96edf142730f1d)
- CMake: version 3.31.2
- Libc: glibc-2.35
- Python: 3.10.16 (main, Dec 11 2024, 16:24:50) [GCC 11.2.0] (64-bit runtime)
- Python platform: Linux-6.5.0-45-generic-x86_64-with-glibc2.35
- Is CUDA available: True
- CUDA runtime version: Could not collect
- cuDNN version: Could not collect
- HIP runtime version: 6.4.43482
- MIOpen runtime version: 3.4.0
- Is XNNPACK available: True
CPU
The CPU information is as follows:
- Architecture: x86_64
- CPU op-mode(s): 32-bit, 64-bit
- Address sizes: 46 bits physical, 57 bits virtual
- Byte Order: Little Endian
- CPU(s): 192
- On-line CPU(s) list: 0-191
- Vendor ID: GenuineIntel
- Model name: Intel(R) Xeon(R) Platinum 8468
- CPU family: 6
- Model: 143
- Thread(s) per core: 2
- Core(s) per socket: 48
- Socket(s): 2
- Stepping: 8
- BogoMIPS: 4200.00
- Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic