[Dynamo][Inductor] `detectron2_fcos_r_50_fpn` In Export Config Failure On Dashboard
Introduction
This article documents a failure in the export configuration on the Inductor dashboard for the detectron2_fcos_r_50_fpn model. The failure occurs on both NVIDIA and AMD devices and appears to be related to PyTorch issue #143222. As the traceback below shows, Dynamo cannot guard on the data-dependent expression u0 < 800, which arises when detectron2's ImageList.from_tensors slices the batched tensor using runtime image sizes.
Error Logs
The error logs are as follows:
W0522 18:33:14.567000 1556598 site-packages/torch/fx/experimental/symbolic_shapes.py:6980] [0/0] failed during evaluate_expr(u0 < 800, hint=None, size_oblivious=True, forcing_spec=False
ERROR:common:
Traceback (most recent call last):
File "/home/niromero/docker_workspace/pytorch/benchmarks/dynamo/common.py", line 2257, in check_accuracy
optimized_model_iter_fn = optimize_ctx(
File "/home/niromero/docker_workspace/pytorch/benchmarks/dynamo/common.py", line 1489, in export
ep = torch.export.export(
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/export/__init__.py", line 319, in export
raise e
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/export/__init__.py", line 286, in export
return _export(
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/export/_trace.py", line 1159, in wrapper
raise e
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/export/_trace.py", line 1125, in wrapper
ep = fn(*args, **kwargs)
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/export/exported_program.py", line 123, in wrapper
return fn(*args, **kwargs)
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/export/_trace.py", line 2172, in _export
ep = _export_for_training(
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/export/_trace.py", line 1159, in wrapper
raise e
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/export/_trace.py", line 1125, in wrapper
ep = fn(*args, **kwargs)
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/export/exported_program.py", line 123, in wrapper
return fn(*args, **kwargs)
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/export/_trace.py", line 2033, in _export_for_training
export_artifact = export_func(
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/export/_trace.py", line 1405, in _strict_export
gm_torch_level = _export_to_torch_ir(
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/export/_trace.py", line 798, in _export_to_torch_ir
gm_torch_level, _ = torch._dynamo.export(
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 1788, in inner
result_traced = opt_f(*args, **kwargs)
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 372, in __call__
return super().__call__(*args, **kwargs)
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1767, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1778, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 708, in _fn
raise e.with_traceback(None) from e.__cause__
torch._dynamo.exc.UserError: Could not guard on data-dependent expression u0 < 800 (unhinted: u0 < 800). (Size-like symbols: u0)
Caused by: batched_imgs[i, ..., : img.shape[-2], : img.shape[-1]].copy_(img) # detectron2/structures/image_list.py:127 in from_tensors (_decomp/decompositions.py:744 in slice_forward)
For more information, run with TORCH_LOGS="dynamic"
For extended logs when we create symbols, also add TORCHDYNAMO_EXTENDED_DEBUG_CREATE_SYMBOL="u0"
If you suspect the guard was triggered from C++, add TORCHDYNAMO_EXTENDED_DEBUG_CPP=1
For more debugging help, see https://docs.google.com/document/d/1HSuTTVvYH1pTew89Rtpeu84Ht3nQEFTYhAX3Ypa_xJs/edit?usp=sharing
User Stack (most recent call last):
(snipped, see stack below for prefix)
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/detectron2/modeling/meta_arch/dense_detector.py", line 95, in forward
images = self.preprocess_image(batched_inputs)
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/detectron2/modeling/meta_arch/dense_detector.py", line 129, in preprocess_image
images = ImageList.from_tensors(
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/detectron2/structures/image_list.py", line 127, in from_tensors
batched_imgs[i, ..., : img.shape[-2], : img.shape[-1]].copy_(img)
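The failing line pads variable-sized images into one batched tensor and copies each image into its top-left corner. A minimal sketch of that padding logic, reduced to pure Python lists (names and the size_divisibility default are illustrative, not detectron2's exact implementation), shows why the slice bounds depend on runtime image sizes, which is exactly the data-dependent quantity (u0) that export cannot guard on:

```python
# Sketch of the ImageList.from_tensors padding pattern, torch-free.
# Each "image" is a list of rows (H x W); the batch buffer is sized to
# the max H/W, rounded up to a multiple of size_divisibility.
def from_tensors(images, size_divisibility=32):
    max_h = max(len(img) for img in images)
    max_w = max(len(img[0]) for img in images)
    pad = lambda n: -(-n // size_divisibility) * size_divisibility  # ceil to multiple
    H, W = pad(max_h), pad(max_w)
    batched = [[[0] * W for _ in range(H)] for _ in images]
    for i, img in enumerate(images):
        h, w = len(img), len(img[0])  # data-dependent sizes (the u0 in the error)
        # Equivalent of batched_imgs[i, ..., :h, :w].copy_(img): the slice
        # bounds h and w are only known at runtime, so a symbolic comparison
        # like "h < 800" has no hint the tracer can resolve.
        for r in range(h):
            batched[i][r][:w] = img[r]
    return batched
```

The error therefore comes from the model's preprocessing, not from the backbone itself: any comparison Dynamo must evaluate against these unbacked sizes fails during strict export.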
Versions
The versions of the relevant libraries are as follows:
- PyTorch: 2.8.0a0+git500a710
- CUDA: N/A
- ROCM: 6.4.43482-0f2d60242
- OS: Ubuntu 22.04.5 LTS (x86_64)
- GCC: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
- Clang: 19.0.0git (https://github.com/RadeonOpenCompute/llvm-project roc-6.4.0 25133 c7fe45cf4b819c5991fe208aaa96edf142730f1d)
- CMake: version 3.31.2
- Libc: glibc-2.35
- Python: 3.10.16 (main, Dec 11 2024, 16:24:50) [GCC 11.2.0] (64-bit runtime)
- Python platform: Linux-6.5.0-45-generic-x86_64-with-glibc2.35
- Is CUDA available: True
- CUDA runtime version: Could not collect
- cuDNN version: Could not collect
- HIP runtime version: 6.4.43482
- MIOpen runtime version: 3.4.0
- Is XNNPACK available: True
CPU
The CPU information is as follows:
- Architecture: x86_64
- CPU op-mode(s): 32-bit, 64-bit
- Address sizes: 46 bits physical, 57 bits virtual
- Byte Order: Little Endian
- CPU(s): 192
- On-line CPU(s) list: 0-191
- Vendor ID: GenuineIntel
- Model name: Intel(R) Xeon(R) Platinum 8468
- CPU family: 6
- Model: 143
- Thread(s) per core: 2
- Core(s) per socket: 48
- Socket(s): 2
- Stepping: 8
- BogoMIPS: 4200.00
- Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic