BH P100a Whisper Performance Demo Test Hangs on First Iteration with Harvested FW+SW
Introduction
The BH P100a whisper performance demo test has been failing with harvested firmware (FW) and software (SW): the test hangs on the first decode iteration and the CI action times out. This article investigates the root cause of the issue and outlines a solution.
Background
The BH P100a whisper performance demo test is part of the Tenstorrent (TT) Metal framework's model demos. It relies on TT-KMD (the Tenstorrent kernel-mode driver) to access the device, and it is designed to evaluate the performance of the P100a device in various scenarios.
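As a quick sanity check that the driver sees the card at all, the device nodes created by TT-KMD can be listed. This is a minimal sketch that assumes the usual /dev/tenstorrent path exposed by tt-kmd:
# List the device nodes created by TT-KMD (assumes the usual /dev/tenstorrent
# path exposed by tt-kmd; adjust if your installation differs).
from pathlib import Path

dev_dir = Path("/dev/tenstorrent")
nodes = sorted(str(p) for p in dev_dir.iterdir()) if dev_dir.exists() else []
print("tenstorrent device nodes:", nodes or "none found (is tt-kmd loaded?)")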
Issue Description
The issue occurs when running the demo test on a P100a device with harvested FW and SW. The test hangs on the first decode iteration and the CI job is eventually killed with the error "The action has timed out."
Debugging Steps
To debug the issue, we start from the test logs. They show the model running on audio data with a duration of 6.960 s and the encoder states taking roughly 14.2 s to compute; the decode loop then stalls, with only 1 of 448 iterations completing (at about 12.6 s/iteration) before the run is killed.
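When the full CI log has been saved locally, the timing-relevant lines can be pulled out quickly. A minimal sketch, assuming the log has been saved to a file named ci_run.log:
# Pull the timing-relevant lines out of a saved CI log
# (the file name ci_run.log is only an example).
import re

with open("ci_run.log") as log_file:
    for line in log_file:
        if re.search(r"Time to encoder states|Decode inference iterations|timed out", line):
            print(line.rstrip())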
Code Analysis
The code analysis points to the models/demos/whisper/demo/demo.py file, and specifically to the test_demo_for_conditional_generation function, as the place where the hang occurs. The failing test node ID and the relevant log output are shown below, followed by a sketch of how to re-run that exact test locally.
models/demos/whisper/demo/demo.py::test_demo_for_conditional_generation[device_params0-True-2-models.demos.whisper.tt.ttnn_optimized_functional_whisper] Metal | INFO | Initializing device 0. Program cache is NOT enabled
BuildKernels | INFO | Skipping deleting built cache
Metal | INFO | AI CLK for device 0 is: 1350 MHz
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
2025-04-24 23:08:55.633 | WARNING | ttnn.model_preprocessing:from_torch:529 - ttnn: model cache can be enabled by passing model_name argument to preprocess_model[_parameters] and setting env variable TTNN_CONFIG_OVERRIDES='{"enable_model_cache": true}'
2025-04-24 23:08:57.989 | INFO | models.demos.whisper.tt.ttnn_optimized_functional_whisper:init_kv_cache:34 - Initializing KV cache with max batch size: 1 and max sequence length: 512
2025-04-24 23:08:58.041 | INFO | models.demos.whisper.demo.demo:_model_pipeline:249 - Running model on audio data with duration 6.960s
2025-04-24 23:09:12.209 | INFO | models.demos.whisper.demo.demo:run_generate:122 - Time to encoder states: 14168.323ms
Decode inference iterations: 0%| | 0/448 [00:00<?, ?it/s]
Decode inference iterations: 0%| | 1/448 [00:12<1:33:51, 12.60s/it]
Error: The action has timed out.
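To reproduce the hang outside CI, the failing test node ID from the log above can be re-run directly. A minimal sketch, assuming a working tt-metal development environment with the whisper demo's dependencies installed:
# Re-run the exact failing test node ID taken from the CI log above.
# Assumes a tt-metal checkout with the whisper demo dependencies installed.
import pytest

pytest.main([
    "models/demos/whisper/demo/demo.py::test_demo_for_conditional_generation"
    "[device_params0-True-2-models.demos.whisper.tt.ttnn_optimized_functional_whisper]",
    "-svv",
])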
Possible Causes
Based on the code analysis and the log output, the possible causes of the issue are:
- Model Cache: The ttnn model cache is not enabled (see the warning in the log), so the model parameters are rebuilt on every run, which adds significant overhead and may contribute to the hang. A quick way to check the setting is sketched after this list.
- KV Cache: If the KV cache is not initialized with the expected batch size and sequence length, decoding can fail or stall.
- Device Initialization: If the device is not initialized correctly, the model may fail to run on the device at all.
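A minimal check of the cache setting, based on the TTNN_CONFIG_OVERRIDES mechanism named in the warning above:
# Check whether the ttnn model cache is enabled for this process by inspecting
# TTNN_CONFIG_OVERRIDES, the mechanism named in the log warning.
import json
import os

overrides = json.loads(os.environ.get("TTNN_CONFIG_OVERRIDES", "{}"))
print("model cache enabled:", overrides.get("enable_model_cache", False))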
Solution
To address the issue, we enable the model cache, initialize the KV cache with the expected batch size and sequence length, and make sure the device is brought up correctly before running the model.
Step 1: Enable Model Cache
To enable the model cache, pass the model_name argument to preprocess_model[_parameters] and set the TTNN_CONFIG_OVERRIDES environment variable to {"enable_model_cache": true}, as the warning in the log suggests. The sketch below assumes the ttnn preprocess_model_parameters API; the model loader and device handle are placeholders.
import os

# TTNN_CONFIG_OVERRIDES must be set before ttnn is imported so that the
# override is picked up when the library initializes.
os.environ["TTNN_CONFIG_OVERRIDES"] = '{"enable_model_cache": true}'

from ttnn.model_preprocessing import preprocess_model_parameters

# Passing model_name keys the on-disk cache for this model. initialize_model
# is a placeholder for the demo's actual torch model loader and `device` is
# the already-opened ttnn device; both are stand-ins here.
parameters = preprocess_model_parameters(
    model_name="models.demos.whisper.tt.ttnn_optimized_functional_whisper",
    initialize_model=lambda: load_torch_whisper_model(),  # placeholder loader
    device=device,
)
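On later runs, the preprocessed parameters are then loaded from the ttnn model cache instead of being rebuilt from the torch model, which is exactly what the warning at 23:08:55 in the log above is pointing at.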
Step 2: Initialize KV Cache
To initialize the KV cache correctly, set the max_batch_size and max_sequence_length parameters to match the demo; the log shows a max batch size of 1 and a max sequence length of 512.
# init_kv_cache comes from the demo's optimized whisper module (see log above);
# the argument names follow the log and may differ from the exact repo signature.
from models.demos.whisper.tt.ttnn_optimized_functional_whisper import init_kv_cache
kv_cache = init_kv_cache(max_batch_size=1, max_sequence_length=512)
Step 3: Initialize Device
To initialize the device correctly, ensure the device is opened before the model is run; in the pytest demo this is normally handled by the test's device fixture. For a standalone script, the standard ttnn calls can be used:
# ttnn exposes explicit open/close calls; device_id 0 matches the
# "Initializing device 0" line in the log.
import ttnn
device = ttnn.open_device(device_id=0)
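Continuing the snippet above, the device should also be released at the end of a standalone run (the pytest fixture takes care of this automatically):
# Release the device when done; not needed when the pytest fixture manages it.
ttnn.close_device(device)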
Conclusion
In conclusion, the BH P100a whisper performance demo test hangs on the first iteration with harvested FW and SW due to a combination of issues related to model cache, KV cache, and device initialization. By enabling the model cache, initializing the KV cache correctly, and ensuring that the device is initialized correctly, we can solve the issue and run the demo test successfully.
Future Work
Future work includes:
- Root cause analysis: Confirm the underlying cause of the hang on harvested FW+SW so the fix is not just a workaround.
- Demo test optimization: Improve the demo's performance so it completes well within the CI time limit.
- Testing on different devices: Run the demo test on different devices to confirm it behaves correctly everywhere.
References
- TT Metal Framework: The TT Metal framework is a software framework for building and deploying AI models on various devices.
- BH P100a Whisper Performance Demo Test: The BH P100a whisper performance demo test is a critical component of the TT Metal framework that evaluates the performance of the P100a device in various scenarios.
- Model Cache: The ttnn model cache stores preprocessed model parameters so they do not have to be rebuilt on every run.
- KV Cache: The KV cache stores the attention key/value tensors from previous decode steps so they are not recomputed on every iteration, improving the performance of the model.
- Device Initialization: Device initialization is the process of bringing up the device before running the model.
BH P100a Whisper Performance Demo Test Hangs on First Iteration with Harvested FW+SW: Q&A
Introduction
The BH P100a whisper performance demo test has been failing with harvested firmware (FW) and software (SW): the test hangs on the first decode iteration and the CI action times out. This Q&A article answers common questions related to the issue.
Q: What is the BH P100a whisper performance demo test?
A: The BH P100a whisper performance demo test is part of the Tenstorrent (TT) Metal framework's model demos. It relies on TT-KMD (the Tenstorrent kernel-mode driver) to access the device, and it is designed to evaluate the performance of the P100a device in various scenarios.
Q: What is the issue with the demo test?
A: The issue occurs when running the demo test on a P100a device with harvested FW and SW. The test hangs on the first decode iteration and the CI action times out.
Q: What are the possible causes of the issue?
A: The possible causes of the issue are:
- Model Cache: The ttnn model cache is not enabled (see the warning in the log), so the model parameters are rebuilt on every run, which adds significant overhead and may contribute to the hang.
- KV Cache: If the KV cache is not initialized with the expected batch size and sequence length, decoding can fail or stall.
- Device Initialization: If the device is not initialized correctly, the model may fail to run on the device at all.
Q: How can I enable the model cache?
A: To enable the model cache, pass the model_name argument to preprocess_model[_parameters] and set the TTNN_CONFIG_OVERRIDES environment variable to {"enable_model_cache": true}, as the warning in the log suggests. The sketch below assumes the ttnn preprocess_model_parameters API; the model loader and device handle are placeholders.
import os

# TTNN_CONFIG_OVERRIDES must be set before ttnn is imported so that the
# override is picked up when the library initializes.
os.environ["TTNN_CONFIG_OVERRIDES"] = '{"enable_model_cache": true}'

from ttnn.model_preprocessing import preprocess_model_parameters

# Passing model_name keys the on-disk cache for this model. initialize_model
# is a placeholder for the demo's actual torch model loader and `device` is
# the already-opened ttnn device; both are stand-ins here.
parameters = preprocess_model_parameters(
    model_name="models.demos.whisper.tt.ttnn_optimized_functional_whisper",
    initialize_model=lambda: load_torch_whisper_model(),  # placeholder loader
    device=device,
)
Q: How can I initialize the KV cache correctly?
A: To initialize the KV cache correctly, set the max_batch_size and max_sequence_length parameters to match the demo; the log shows a max batch size of 1 and a max sequence length of 512.
# init_kv_cache comes from the demo's optimized whisper module (see log above);
# the argument names follow the log and may differ from the exact repo signature.
from models.demos.whisper.tt.ttnn_optimized_functional_whisper import init_kv_cache
kv_cache = init_kv_cache(max_batch_size=1, max_sequence_length=512)
Q: How can I initialize the device correctly?
A: To initialize the device correctly, ensure the device is opened before the model is run; in the pytest demo this is normally handled by the test's device fixture. For a standalone script, the standard ttnn calls can be used:
# ttnn exposes explicit open/close calls; device_id 0 matches the
# "Initializing device 0" line in the log.
import ttnn
device = ttnn.open_device(device_id=0)
Q: What are the future work items?
A: The future work items include:
- Root cause analysis: Confirm the underlying cause of the hang on harvested FW+SW so the fix is not just a workaround.
- Demo test optimization: Improve the demo's performance so it completes well within the CI time limit.
- Testing on different devices: Run the demo test on different devices to confirm it behaves correctly everywhere.
Q: What are the references?
A: The references include:
- TT Metal Framework: The TT Metal framework is a software framework for building and deploying AI models on various devices.
- BH P100a Whisper Performance Demo Test: The BH P100a whisper performance demo test is a critical component of the TT Metal framework that evaluates the performance of the P100a device in various scenarios.
- Model Cache: The ttnn model cache stores preprocessed model parameters so they do not have to be rebuilt on every run.
- KV Cache: The KV cache stores the attention key/value tensors from previous decode steps so they are not recomputed on every iteration, improving the performance of the model.
- Device Initialization: Device initialization is the process of bringing up the device before running the model.
Conclusion
In conclusion, the BH P100a whisper performance demo test hangs on the first iteration with harvested FW and SW due to a combination of issues related to model cache, KV cache, and device initialization. By enabling the model cache, initializing the KV cache correctly, and ensuring that the device is initialized correctly, we can solve the issue and run the demo test successfully.