SlicerNNInteractive Docker Fails When g3.L Instance Is Provisioned by MorphoCloud
**Introduction**
The SlicerNNInteractive Docker image packages a server for interactive medical image segmentation. When it is run on a g3.L instance provisioned by MorphoCloud, however, the container fails to start with a CUDA-related error. This article explains the cause of the failure and shows how to get the server running.
**Background**
The SlicerNNInteractive Docker image is a containerized server that backs the SlicerNNInteractive extension for medical image segmentation. It is built on nnInteractive, a deep-learning framework for interactive image segmentation, and is designed to run on NVIDIA GPUs.
**The Problem**
When a g3.L instance is provisioned by MorphoCloud, the container fails at startup. The errors come from PyTorch: no working NVIDIA driver is present on the instance, so `torch.cuda.is_available()` returns False and PyTorch cannot deserialize the model checkpoint, which was saved on a CUDA device.
**Error Messages**
Running the container produces the following output:
```
exouser@instance-247:~$ docker run --gpus all --rm -it -p 1527:1527 coendevente/nninteractive-slicer-server:latest

==========
== CUDA ==
==========

CUDA Version 12.1.0

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

*************************
** DEPRECATION NOTICE! **
*************************
THIS IMAGE IS DEPRECATED and is scheduled for DELETION.
    https://gitlab.com/nvidia/container-images/cuda/blob/master/doc/support-policy.md

nnUNet_raw is not defined and nnU-Net can only be used on data for which preprocessed files are already present on your system. nnU-Net cannot be used for experiment planning and preprocessing like this. If this is not intended, please read documentation/setting_up_paths.md for information on how to set this up properly.
nnUNet_preprocessed is not defined and nnU-Net can not be used for preprocessing or training. If this is not intended, please read documentation/setting_up_paths.md for information on how to set this up.
nnUNet_results is not defined and nnU-Net cannot be used for training or inference. If this is not intended behavior, please read documentation/setting_up_paths.md for information on how to set this up.
dataset.json: 100%|██████████████████████████████████████████████████████| 245/245 [00:00<00:00, 1.35MB/s]
plans.json: 100%|████████████████████████████████████████████████████| 6.29k/6.29k [00:00<00:00, 54.1MB/s]
inference_session_class.json: 100%|██████████████████████████████████████| 121/121 [00:00<00:00, 1.81MB/s]
checkpoint_final.pth: 100%|█████████████████████████████████████████████| 411M/411M [00:01<00:00, 236MB/s]
Fetching 4 files: 100%|█████████████████████████████████████████████████████| 4/4 [00:01<00:00, 2.09it/s]
Traceback (most recent call last):
  File "/opt/server/main.py", line 249, in <module>
    PROMPT_MANAGER = PromptManager()
                     ^^^^^^^^^^^^^^^
  File "/opt/server/main.py", line 129, in __init__
    self.session = self.make_session()
                   ^^^^^^^^^^^^^^^^^^^
  File "/opt/server/main.py", line 154, in make_session
    session.initialize_from_trained_model_folder(model_path)
  File "/opt/conda/envs/nnInteractive/lib/python3.12/site-packages/nnInteractive/inference/inference_session.py", line 666, in initialize_from_trained_model_folder
    checkpoint = torch.load(join(model_training_output_dir, fold_folder, checkpoint_name),
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/nnInteractive/lib/python3.12/site-packages/torch/serialization.py", line 1471, in load
    return _load(
           ^^^^^^
  File "/opt/conda/envs/nnInteractive/lib/python3.12/site-packages/torch/serialization.py", line 1964, in _load
    result = unpickler.load()
             ^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/nnInteractive/lib/python3.12/site-packages/torch/serialization.py", line 1928, in persistent_load
    typed_storage = load_tensor(
                    ^^^^^^^^^^^^
  File "/opt/conda/envs/nnInteractive/lib/python3.12/site-packages/torch/serialization.py", line 1900, in load_tensor
    wrap_storage=restore_location(storage, location),
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/nnInteractive/lib/python3.12/site-packages/torch/serialization.py", line 1806, in restore_location
    return default_restore_location(storage, str(map_location))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/nnInteractive/lib/python3.12/site-packages/torch/serialization.py", line 693, in default_restore_location
    result = fn(storage, location)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/nnInteractive/lib/python3.12/site-packages/torch/serialization.py", line 631, in _deserialize
    device = _validate_device(location, backend_name)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/nnInteractive/lib/python3.12/site-packages/torch/serialization.py", line 600, in _validate_device
    raise RuntimeError(
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
```
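The final message in the traceback names a CPU-only workaround: passing `map_location=torch.device('cpu')` to `torch.load` remaps CUDA storages onto the CPU at load time. A minimal sketch of what that argument does (a standalone tensor round-trip, not the server's own loading code):

```python
import os
import tempfile

import torch

# Save a tensor, then load it back while forcing all storages onto the CPU.
# The same map_location argument lets a checkpoint saved on a GPU be opened
# on a machine where torch.cuda.is_available() is False.
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "checkpoint.pth")
    torch.save(torch.tensor([1.0, 2.0, 3.0]), path)
    t = torch.load(path, map_location=torch.device("cpu"))
    print(t.device)  # cpu
```

Note that this only sidesteps the deserialization error; running nnInteractive inference without a GPU is not the intended configuration, so installing the driver (below) remains the real fix.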
**Solution**
The fix is to install the NVIDIA driver on the MorphoCloud instance:

```
sudo apt-get update
sudo apt-get install nvidia-driver-535
```

Once the driver is installed (a reboot may be required for it to load), the SlicerNNInteractive container should start without errors.
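To confirm the installation worked, a quick check can be run on the host before retrying the container. This is a sketch; `nvidia-driver-535` above is one example version from Ubuntu's repositories, and your instance may carry a different one:

```shell
# Print the GPU name and driver version if the driver is loaded;
# otherwise report that it is missing.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=name,driver_version --format=csv
else
    echo "nvidia-smi not found: the NVIDIA driver is not installed"
fi
```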
**Conclusion**
The SlicerNNInteractive container fails on a freshly provisioned g3.L MorphoCloud instance because the NVIDIA driver is missing, so PyTorch cannot use CUDA when loading the model checkpoint. Installing the driver resolves the error.
**Additional Information**
* SlicerNNInteractive is built on the nnInteractive framework for deep-learning-based interactive segmentation and requires an NVIDIA GPU for inference.
* On a g3.L MorphoCloud instance, the NVIDIA driver is not installed by default, so `torch.cuda.is_available()` returns False inside the container and checkpoint loading fails.
**Troubleshooting**
* Check the driver and CUDA versions on the instance with `nvidia-smi`.
* Check whether the driver package is installed with `dpkg -l | grep nvidia-driver`.
* From Python, check whether PyTorch can see the GPU by evaluating `torch.cuda.is_available()`.
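The PyTorch-side checks above can be bundled into one small diagnostic script, assuming only that `torch` is importable in the environment being tested:

```python
import torch

# Report what PyTorch can see. On a correctly provisioned GPU instance,
# cuda_available is True and at least one device is listed.
cuda_available = torch.cuda.is_available()
print(f"torch version:  {torch.__version__}")
print(f"CUDA available: {cuda_available}")
if cuda_available:
    for i in range(torch.cuda.device_count()):
        print(f"GPU {i}: {torch.cuda.get_device_name(i)}")
else:
    print("No usable NVIDIA driver/GPU detected; "
          "the SlicerNNInteractive server will fail at checkpoint loading.")
```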
**Future Work**
* Investigate why g3.L MorphoCloud instances are provisioned without the NVIDIA driver.
* Pre-install the driver, or automate its installation, so that GPU containers work out of the box.
**References**
* NVIDIA Deep Learning Container License: <https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license>
* nnInteractive framework: <https://nninteractive.org/>
* SlicerNNInteractive docker: <https://github.com/coendevente/nninteractive-slicer-server>
**Q&A**
**Q: What is the SlicerNNInteractive docker and why is it failing on MorphoCloud?**
A: SlicerNNInteractive is a containerized server for medical image segmentation, built on the nnInteractive deep-learning framework and designed to run on NVIDIA GPUs. On a freshly provisioned g3.L MorphoCloud instance the NVIDIA driver is missing, so the container fails at startup.
**Q: What are the error messages that I see when trying to run the SlicerNNInteractive docker on MorphoCloud?**
A: The container aborts with a PyTorch error: `RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False`. This happens because no NVIDIA driver is installed on the instance, so PyTorch cannot access the GPU when loading the model checkpoint.
**Q: How do I install the CUDA driver on MorphoCloud?**
A: Run the following on the instance:

```
sudo apt-get update
sudo apt-get install nvidia-driver-535
```

Once the driver is installed (and the instance rebooted if necessary), the SlicerNNInteractive container should start without errors.
**Q: Why is the CUDA driver not available on MorphoCloud by default?**
A: The CUDA driver is not available on MorphoCloud by default because it is not a standard component of the MorphoCloud image. However, you can install it manually using the command above.
**Q: Why is the torch library not able to load the CUDA driver on MorphoCloud?**
A: PyTorch discovers GPUs through the NVIDIA driver. When no driver is installed, `torch.cuda.is_available()` returns False, and any attempt to deserialize a checkpoint saved on a CUDA device raises a RuntimeError.
**Q: How do I troubleshoot the issue with the SlicerNNInteractive docker on MorphoCloud?**
A: Check the driver and CUDA versions with `nvidia-smi`, confirm the driver package is installed with `dpkg -l | grep nvidia-driver`, and, from Python, check whether PyTorch can see the GPU by evaluating `torch.cuda.is_available()`.
**Q: What are the future plans for the SlicerNNInteractive docker on MorphoCloud?**
A: Planned work includes investigating why the NVIDIA driver is missing from freshly provisioned instances and pre-installing (or automating the installation of) the driver so that GPU containers work out of the box.
**Q: Where can I find more information about the SlicerNNInteractive docker and MorphoCloud?**
A: You can find more information about the SlicerNNInteractive docker and MorphoCloud on the following websites:
* NVIDIA Deep Learning Container License: <https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license>
* nnInteractive framework: <https://nninteractive.org/>
* SlicerNNInteractive docker: <https://github.com/coendevente/nninteractive-slicer-server>
* MorphoCloud: <https://morphocloud.org/>
**Q: How can I get help with the SlicerNNInteractive docker on MorphoCloud?**
A: You can get help with the SlicerNNInteractive docker on MorphoCloud by contacting the MorphoCloud support team or by reaching out to the nnInteractive community. You can also post questions on the MorphoCloud forums or on the nnInteractive GitHub page.