TensorFlow Not Working As Expected
Issue Type
Bug
Have You Reproduced the Bug with TensorFlow Nightly?
No
Source
Binary
TensorFlow Version
v2.17.0-4548-g0d5d03d2e83 2.17.0
Custom Code
No
OS Platform and Distribution
Elementary OS 7.1 (Ubuntu 22.04)
Mobile Device
No response
Python Version
No response
Bazel Version
No response
GCC/Compiler Version
No response
CUDA/cuDNN Version
No response
GPU Model and Memory
No response
Current Behavior
My issue is similar to https://github.com/tensorflow/tensorflow/issues/79798. However, the solution proposed there, setting the ROCM_PATH environment variable, does not work for me. Selecting the GPU with the ROCR_VISIBLE_DEVICES environment variable does not help either.

I'm using this guide to install and test TensorFlow: https://rocm.docs.amd.com/projects/radeon/en/latest/docs/install/native_linux/install-tensorflow.html

The test script, tensortest.py, consists of the following:
import tensorflow as tf
print("TensorFlow version:", tf.__version__)
# Load MNIST and scale pixel values from [0, 255] to [0.0, 1.0]
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
# Small fully connected classifier that outputs 10 logits (one per digit)
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10)
])
# Forward pass of the untrained model on one sample; print the results,
# since a plain script (unlike a notebook) discards unprinted values
predictions = model(x_train[:1]).numpy()
print(tf.nn.softmax(predictions).numpy())
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
print(loss_fn(y_train[:1], predictions).numpy())
# Train for 5 epochs, then evaluate on the test set
model.compile(optimizer='adam', loss=loss_fn, metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test, verbose=2)
Standalone Code to Reproduce the Issue
$ echo $ROCM_PATH
$ python3 tensortest.py
Relevant Log Output
/opt/rocm-6.3.4
2025-04-21 15:15:44.686577: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE3 SSE4.1 SSE4.2 AVX AVX2 AVX512F AVX512_VNNI AVX512_BF16 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
TensorFlow version: 2.17.0
/usr/local/lib/python3.10/dist-packages/keras/src/layers/reshaping/flatten.py:37: UserWarning: Do not pass an `input_shape`/`input_dim` argument to a layer. When using Sequential models, prefer using an `Input(shape)` object as the first layer in the model instead.
super().__init__(**kwargs)
2025-04-21 15:15:45.795712: I external/local_xla/xla/stream_executor/rocm/rocm_executor.cc:920] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2025-04-21 15:15:45.795764: I external/local_xla/xla/stream_executor/rocm/rocm_executor.cc:920] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2025-04-21 15:15:47.712177: I external/local_xla/xla/stream_executor/rocm/rocm_executor.cc:920] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2025-04-21 15:15:47.712224: I external/local_xla/xla/stream_executor/rocm/rocm_executor.cc:920] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2025-04-21 15:15:47.712250: I external/local_xla/xla/stream_executor/rocm/rocm_executor.cc:920] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2025-04-21 15:15:47.712281: I external/local_xla/xla/stream_executor/rocm/rocm_executor.cc:920] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2025-04-21 15:15:47.713309: I external/local_xla/xla/stream_executor/rocm/rocm_executor.cc:920] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2025-04-21 15:15:47.713346: I external/local_xla/xla/stream_executor/rocm/rocm_executor.cc:920] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2025-04-21 15:15:47.713384: I external/local_xla/xla/stream_executor/rocm/rocm_executor.cc:920] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2025-04-21 15:15:47.713407: I external/local_xla/xla/stream_executor/rocm/rocm_executor.cc:920] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2025-04-21 15:15:47.713429: I external/local_xla/xla/stream_executor/rocm/rocm_executor.cc:920] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2025-04-21 15:15:47.713452: I external/local_xla/xla/stream_executor/rocm/rocm_executor.cc:920] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2025-04-21 15:15:47.713490: I external/local_xla/xla/stream_executor/rocm/rocm_executor.cc:920] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2025-04-21 15:15:47.713515: I external/local_xla/xla/stream_executor/rocm/rocm_executor.cc:920] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2025-04-21 15:15:47.713540: I external/local_xla/xla/stream_executor/rocm/rocm_executor.cc:920] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2025-04-21 15:15:47.713565: I external/local_xla/xla/stream_executor/rocm/rocm_executor.cc:920] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2025-04-21 15:15:47.713592: I external/local_xla/xla/stream_executor/rocm/rocm_executor.cc:920] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2025-04-21 15:15:47.713604: I tensorflow/core/common_runtime/gpu/gpu_device.cc:2021] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 15310 MB memory: -> device: 0, name: Radeon RX 7900 GRE, pci bus id: 0000:03:00.0
2025-04-21 15:15:47.865754: I external/local_xla/xla/stream_executor/rocm/rocm_executor.cc:920] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2025-04-21 15:15:47.865781: I tensorflow/core
**TensorFlow Not Working as Expected: Q&A**
=============================================
**Q: What is the issue with TensorFlow not working as expected?**
----------------------------------------------------------------
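A: On Elementary OS 7.1 (Ubuntu 22.04) with a Radeon RX 7900 GRE, the ROCm build of TensorFlow 2.17.0 does not work as expected, even though it was installed by following AMD's ROCm installation guide and the fixes suggested for a similar issue (setting `ROCM_PATH` and selecting the GPU with `ROCR_VISIBLE_DEVICES`) were applied.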
**Q: What are the symptoms of the issue?**
------------------------------------------
A: The symptoms of the issue include:
* TensorFlow creates the GPU device at startup (the log shows `device:GPU:0` with 15310 MB of memory), but cannot use the GPU to run operations.
* The GPU code for the model's operations fails to compile.
* The reported errors relate to JIT compilation and device code generation.
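One way to confirm the first symptom is to run a single operation pinned to the GPU and check where it actually executed. This is a minimal sketch assuming a TensorFlow 2.x ROCm build; `device_type` and `run_gpu_smoke_test` are hypothetical helper names, while `tf.debugging.set_log_device_placement`, `tf.device`, and the tensor `.device` attribute are standard TensorFlow.

```python
import re


def device_type(device_name: str) -> str:
    """Return the device type ("CPU", "GPU", ...) from a TensorFlow device string."""
    # Device strings look like "/job:localhost/replica:0/task:0/device:GPU:0".
    match = re.search(r"device:([A-Z]+):\d+", device_name)
    return match.group(1) if match else "UNKNOWN"


def run_gpu_smoke_test():
    """Run one matmul pinned to GPU:0 and report where it ran (hypothetical helper)."""
    import tensorflow as tf  # imported here so device_type() works without TensorFlow

    # Log the device chosen for every op.
    tf.debugging.set_log_device_placement(True)
    with tf.device("/GPU:0"):
        a = tf.random.uniform((256, 256))
        b = tf.random.uniform((256, 256))
        c = tf.matmul(a, b)  # kernel compilation or launch errors surface here
    return device_type(c.device), float(tf.reduce_sum(c))


if __name__ == "__main__":
    try:
        kind, total = run_gpu_smoke_test()
        print(f"matmul ran on {kind}; sum of result = {total:.1f}")
    except ImportError:
        print("TensorFlow is not installed in this Python environment")
```

If this prints `CPU`, TensorFlow fell back to the CPU through soft device placement, which TensorFlow 2 enables by default. If the `matmul` line raises instead, that error message is the part worth quoting in the report.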
**Q: What are the possible causes of the issue?**
------------------------------------------------
A: The possible causes of the issue include:
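* An incorrect TensorFlow installation, for example a TensorFlow ROCm build that does not match the installed ROCm release (`/opt/rocm-6.3.4` in this report).
* Incorrectly set environment variables, such as `ROCM_PATH` or `ROCR_VISIBLE_DEVICES`.
* A system configuration (OS release, kernel driver, Python version) that the TensorFlow build does not support.
* A GPU configuration that the TensorFlow build does not support.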
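The environment-variable cause can be checked from Python before TensorFlow is imported. This is a minimal sketch: `check_rocm_env` is a hypothetical helper, and its two checks (that `ROCM_PATH` names an existing directory, such as `/opt/rocm-6.3.4` here, and that `ROCR_VISIBLE_DEVICES` is not an empty string) are plausibility checks chosen for illustration, not rules taken from the ROCm documentation.

```python
import os
from pathlib import Path


def check_rocm_env(env=None):
    """Return a list of problems found in ROCm-related environment variables.

    The checks are illustrative assumptions, not an official validation.
    """
    env = os.environ if env is None else env
    problems = []

    rocm_path = env.get("ROCM_PATH")
    if not rocm_path:
        problems.append("ROCM_PATH is not set")
    elif not Path(rocm_path).is_dir():
        problems.append(f"ROCM_PATH={rocm_path} is not an existing directory")

    # An empty value is almost certainly a mistake when the goal is to pick a GPU.
    visible = env.get("ROCR_VISIBLE_DEVICES")
    if visible is not None and not visible.strip():
        problems.append("ROCR_VISIBLE_DEVICES is set but empty")

    return problems


if __name__ == "__main__":
    issues = check_rocm_env()
    print("\n".join(issues) if issues else "No obvious problems with ROCm environment variables")
```

Passing a dictionary instead of reading `os.environ` directly keeps the check easy to try against different settings.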
**Q: How to troubleshoot the issue?**
--------------------------------------
A: To troubleshoot the issue, follow these steps:
1. Check that TensorFlow is installed correctly and is a ROCm build (`tf.test.is_built_with_rocm()` returns `True`).
2. Check that the environment variables are set correctly: `ROCM_PATH` should point to the ROCm installation, and `ROCR_VISIBLE_DEVICES`, if set, should select the intended GPU.
3. Check that the system configuration (OS release, kernel driver, Python version) is listed as compatible in the ROCm documentation.
4. Check that the GPU is supported and that ROCm itself detects it, for example with `rocminfo` or `rocm-smi`, before testing TensorFlow.
5. Run the same TensorFlow code on a different system or environment to see whether the issue is specific to the current one.
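Steps 1 and 4 can be partly automated. The sketch below collects build and device details with standard TensorFlow calls (`tf.version`, `tf.test.is_built_with_rocm`, `tf.config.list_physical_devices`); `collect_tensorflow_info` and `diagnose` are hypothetical helper names, and the wording of the findings is illustrative.

```python
import platform


def collect_tensorflow_info():
    """Gather build and device details from the installed TensorFlow."""
    import tensorflow as tf  # imported lazily so diagnose() can be used on its own

    return {
        "python": platform.python_version(),
        "tf_version": tf.version.VERSION,
        "git_version": tf.version.GIT_VERSION,
        "built_with_rocm": tf.test.is_built_with_rocm(),
        "gpus": [d.name for d in tf.config.list_physical_devices("GPU")],
    }


def diagnose(info):
    """Map the collected details to troubleshooting steps 1 and 4 (hypothetical helper)."""
    findings = []
    if not info.get("built_with_rocm"):
        findings.append("Step 1: this TensorFlow build has no ROCm support")
    if not info.get("gpus"):
        findings.append("Step 4: TensorFlow sees no GPU; check the GPU with rocminfo first")
    if not findings:
        findings.append("ROCm build and GPU detected; look at errors raised when ops run")
    return findings


if __name__ == "__main__":
    try:
        details = collect_tensorflow_info()
    except ImportError:
        print("Step 1: TensorFlow cannot be imported in this Python environment")
    else:
        for key, value in details.items():
            print(f"{key}: {value}")
        for finding in diagnose(details):
            print(finding)
```

In this report, the ROCm executor messages and the `Created device ... GPU:0` line suggest both checks would pass, which points toward the JIT compilation errors rather than the installation or GPU detection.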
**Q: What are the possible solutions to the issue?**
---------------------------------------------------
A: The possible solutions to the issue include:
* Reinstalling TensorFlow, using the build that matches the installed ROCm release, and setting up the environment variables correctly.
* Updating the system configuration to one that TensorFlow supports.
* Updating the GPU configuration (for example, the ROCm release or kernel driver) to one that TensorFlow supports.
* Trying a different version of TensorFlow, or a different system or environment.
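When comparing environment settings, it helps to run the test script in a fresh process for each one: ROCm reads variables such as `ROCR_VISIBLE_DEVICES` when the runtime initializes, so changing them inside an already-running script generally has no effect. This is a minimal sketch; `build_env` and `run_with_env` are hypothetical helpers, and the variants assume the script is saved as `tensortest.py` in the current directory.

```python
import os
import subprocess
import sys


def build_env(base, **overrides):
    """Copy `base` and apply overrides; a value of None removes that variable."""
    env = dict(base)
    for name, value in overrides.items():
        if value is None:
            env.pop(name, None)
        else:
            env[name] = value
    return env


def run_with_env(script, env):
    """Run `script` with the current interpreter in a new process; return its exit code."""
    result = subprocess.run([sys.executable, script], env=env, capture_output=True, text=True)
    if result.returncode != 0:
        # Show only the tail of stderr; TensorFlow logs are usually long.
        print(result.stderr[-2000:])
    return result.returncode


if __name__ == "__main__" and os.path.exists("tensortest.py"):
    variants = {
        "as configured": build_env(os.environ),
        "ROCM_PATH=/opt/rocm-6.3.4": build_env(os.environ, ROCM_PATH="/opt/rocm-6.3.4"),
        "ROCR_VISIBLE_DEVICES=0": build_env(os.environ, ROCR_VISIBLE_DEVICES="0"),
    }
    for label, env in variants.items():
        print(label, "->", "exit code", run_with_env("tensortest.py", env))
```

The same runner works for comparing Python virtual environments with different TensorFlow versions: replace `sys.executable` with the interpreter path of each environment.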
**Q: How to prevent the issue from occurring in the future?**
---------------------------------------------------------
A: To prevent the issue from occurring in the future, follow these best practices:
* Follow the official installation guide for your platform; for AMD GPUs, that is the ROCm TensorFlow installation guide.
* Set the environment variables as the guide specifies.
* Confirm that the system configuration is compatible with TensorFlow before installing.
* Confirm that the GPU is supported by the TensorFlow build you install.
* After installing or upgrading, run a small test script, such as the MNIST example above, and compare against a second system or environment when something looks wrong.
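These practices can be backed by a guard at the top of each training script, so a broken GPU setup is reported immediately instead of the model quietly running on the CPU. `require_gpu` is a hypothetical helper, not part of TensorFlow; it takes the device-listing function as an argument so it can be tested without a GPU.

```python
def require_gpu(list_gpus):
    """Fail fast if no GPU is visible instead of silently training on the CPU.

    `list_gpus` is any zero-argument callable that returns the visible GPUs.
    """
    gpus = list_gpus()
    if not gpus:
        raise RuntimeError("No GPU visible to TensorFlow; check the ROCm installation")
    return gpus


if __name__ == "__main__":
    try:
        import tensorflow as tf
    except ImportError:
        print("TensorFlow is not installed in this Python environment")
    else:
        try:
            for gpu in require_gpu(lambda: tf.config.list_physical_devices("GPU")):
                print("Using", gpu.name)
        except RuntimeError as err:
            print("Setup check failed:", err)
```

In a real training script, letting the `RuntimeError` propagate (rather than printing it) is what stops the run before any time is spent on the CPU.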
**Q: What are the next steps to take if the issue persists?**
---------------------------------------------------------
A: If the issue persists, the next steps to take include:
* Seeking help from the TensorFlow community or forums.
* Seeking help from a TensorFlow expert or consultant.
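* Filing a bug report with the TensorFlow team that includes the complete log output (the log in this report is cut off shortly after the GPU device is created).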
* Trying a different version of TensorFlow or a different version of the system or environment.
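A bug report is easier to act on when every template field is filled in; in this report, for instance, the Python version is left blank, although the log path `/usr/local/lib/python3.10/` suggests Python 3.10. The sketch below gathers the TensorFlow version, Python version, and GPU targets. It assumes `rocminfo` is on `PATH` and prints each GPU's target on a `Name:` line whose value starts with `gfx`; that output format is an assumption, and `gfx_targets` and `report_lines` are hypothetical helpers.

```python
import platform
import shutil
import subprocess


def gfx_targets(rocminfo_text):
    """Collect gfx targets from the 'Name:' lines of rocminfo output (assumed format)."""
    targets = []
    for line in rocminfo_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Name:"):
            value = stripped.split(":", 1)[1].strip()
            if value.startswith("gfx") and value not in targets:
                targets.append(value)
    return targets


def report_lines(tf_version, python_version, targets):
    """Format the fields most useful in a bug report."""
    return [
        f"TensorFlow Version: {tf_version}",
        f"Python Version: {python_version}",
        f"GPU targets (rocminfo): {', '.join(targets) or 'none found'}",
    ]


if __name__ == "__main__":
    try:
        import tensorflow as tf
        tf_version = f"{tf.version.GIT_VERSION} {tf.version.VERSION}"
    except ImportError:
        tf_version = "not installed"
    rocminfo = shutil.which("rocminfo")
    text = subprocess.run([rocminfo], capture_output=True, text=True).stdout if rocminfo else ""
    print("\n".join(report_lines(tf_version, platform.python_version(), gfx_targets(text))))
```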
Working through these steps and practices should narrow down the cause of TensorFlow not working as expected and, in many cases, resolve it.