Perfspect Metrics Cgroup Scope Fails Collection On Multiple Containers

Introduction

Perfspect is a tool for collecting and analyzing performance metrics, and it can scope collection to Linux cgroups. When multiple containers run on different cpusets, however, Perfspect may fail to collect metrics from all of them. This article describes the issue, shows how to reproduce it, and presents a proposed fix.

Environment

The issue was observed on the following environment:

  • Perfspect built from commit #343
  • Ubuntu 22.04
  • Kernel 6.5

Problem Description

The problem arises when multiple containers run on different cpusets and Perfspect is launched with the cgroup scope. Perfspect then reports metrics for only one container and omits the others.

Example Use Case

To reproduce the issue, we can use the following script to launch multiple containers on different cpusets:

n=4  # Number of containers, change as needed
cores_per_container=4

for ((i=0; i<n; i++)); do
    start=$((i * cores_per_container))
    end=$((start + cores_per_container - 1))
    cpuset="${start}-${end}"
    echo "Launching container $i on CPU set $cpuset"
    
    docker run --rm --cpuset-cpus="$cpuset" -d severalnines/sysbench bash -c \
        "sysbench --test=memory --memory-total-size=100T --time=120 run"
done
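
The cpuset arithmetic in the loop assumes the host has at least n * cores_per_container logical CPUs; Docker rejects a cpuset that names CPUs the host does not have. The following is a minimal pre-flight sketch of that arithmetic. The helper names cpuset_for and required_cpus are hypothetical and are not part of the original script.

```shell
# Hypothetical helpers (not part of the original script).

# Print the CPU range for container index $1 when each container gets $2 cores.
cpuset_for() {
    echo "$(( $1 * $2 ))-$(( $1 * $2 + $2 - 1 ))"
}

# Print how many logical CPUs are needed for $1 containers of $2 cores each.
required_cpus() {
    echo "$(( $1 * $2 ))"
}

n=4
cores_per_container=4
needed=$(required_cpus "$n" "$cores_per_container")
# Logical CPUs available to this shell.
available=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)

if [ "$available" -lt "$needed" ]; then
    echo "Need $needed CPUs for $n containers, found $available" >&2
else
    i=0
    while [ "$i" -lt "$n" ]; do
        echo "container $i -> cpuset $(cpuset_for "$i" "$cores_per_container")"
        i=$((i + 1))
    done
fi
```

If the check fails, lowering n or cores_per_container keeps every cpuset within the CPUs that exist on the host.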

Then, we can launch Perfspect to collect metrics from the cgroup scope using the following command:

./perfspect metrics --live --scope cgroup --metricfile spr-short.json

Observations

If Perfspect is started after the containers are launched, only one container shows metrics. If Perfspect is started first and the containers are launched afterwards, the first sample set contains data, but subsequent samples revert to data for just one container.

Reducing Metrics

To narrow down the issue, reduce the metric set to 5-10 core metrics taken from the SPR JSON metric file and pass the reduced file with --metricfile. If the problem persists with the smaller set, it is unlikely to be tied to specific metrics.
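
One way to build such a reduced file is sketched below, under stated assumptions: the metric file is assumed to be a top-level JSON array of objects that each carry a "name" field, and jq is assumed to be installed. The select_metrics helper, the file names, and the metric names are all placeholders, so inspect your actual metric file before adapting this.

```shell
# Hypothetical helper: copy only the metrics whose "name" is listed in the
# JSON array $3 from file $1 into file $2.
# Assumes $1 is a top-level JSON array of objects with a "name" field.
select_metrics() {
    jq --argjson keep "$3" \
        'map(select(.name as $n | any($keep[]; . == $n)))' \
        "$1" > "$2"
}

# Example call (file and metric names are placeholders):
# select_metrics spr.json spr-short.json '["metric_a", "metric_b"]'
```

The reduced file can then be passed to Perfspect with --metricfile, as in the command above.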

Solution

The problem lies in how Perfspect collects metrics in the cgroup scope: as run above, it reports metrics for only one container at a time.

The fix described here is to run the cgroup scope with the --all option, which tells Perfspect to collect metrics from all containers in the cgroup scope. Option availability can vary between builds, so confirm that yours accepts --all (for example, by checking the output of ./perfspect metrics --help).

Here is the modified command to launch Perfspect:

./perfspect metrics --live --scope cgroup --all --metricfile spr-short.json
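
Before relying on the command above, it may be worth checking that the build in use actually lists the option. The sketch below assumes that ./perfspect metrics --help prints the supported flags; the help_mentions_flag helper is hypothetical and only searches whatever help text it is given.

```shell
# Hypothetical helper: succeed if the help text on stdin lists flag $1 as a
# complete flag name (so "--all" does not match "--allow-x" or "--all-x").
help_mentions_flag() {
    grep -Eq -e "(^|[[:space:],])${1}([[:space:],=]|\$)"
}

# Example call (assumes the help output lists the flags):
# if ./perfspect metrics --help 2>&1 | help_mentions_flag --all; then
#     echo "--all is listed"
# else
#     echo "--all is not listed; review the help output for the cgroup options"
# fi
```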

Conclusion

In this article, we explored why Perfspect's cgroup scope fails to collect metrics from multiple containers and described a fix: running the cgroup scope with the --all option so that metrics are collected from all containers simultaneously.

By following the steps outlined in this article, you should be able to collect metrics from all containers in your cgroup scope using Perfspect.

Troubleshooting Tips

If you are still experiencing issues with Perfspect collecting metrics from multiple containers, here are some troubleshooting tips to help you resolve the issue:

  • Check the Perfspect version: Make sure you are using the latest version of Perfspect.
  • Verify the cgroup scope: Ensure that the cgroup scope is correctly configured to collect metrics from all containers.
  • Reduce the set of metrics: Try reducing the set of metrics to identify if the issue is related to specific metrics being collected.
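  • Check the container logs: Verify with docker ps that the containers are still running, and look for errors with docker logs. The example containers exit after about 120 seconds (--time=120) and are removed on exit (--rm), so their logs are only available while they run.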
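
The cgroup and container checks above can be sketched as follows. This assumes a Linux host with Docker; the cgroup_version_for helper is hypothetical, and the Docker command is left commented out because it needs a running Docker daemon.

```shell
# Hypothetical helper: map the filesystem type of /sys/fs/cgroup
# (as printed by `stat -fc %T /sys/fs/cgroup`) to a cgroup version label.
cgroup_version_for() {
    case "$1" in
        cgroup2fs) echo "v2" ;;
        tmpfs)     echo "v1" ;;
        *)         echo "unknown" ;;
    esac
}

# Report which cgroup hierarchy the host uses.
echo "cgroup version: $(cgroup_version_for "$(stat -fc %T /sys/fs/cgroup 2>/dev/null)")"

# List each running container with the cpuset it was started with:
# docker ps -q | xargs -r docker inspect --format '{{.Id}} {{.HostConfig.CpusetCpus}}'
```

Comparing the container IDs listed by Docker with the ones that appear in the Perfspect output shows which containers are missing from the collection.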

Introduction

In our previous article, we explored the issue of Perfspect's cgroup scope failing to collect metrics from multiple containers, and described a fix that uses the --all option to collect metrics from all containers simultaneously.

In this article, we will answer some frequently asked questions (FAQs) related to this issue and provide additional information to help you troubleshoot and resolve the problem.

Q: What is the cause of the issue?

A: The issue is caused by the way Perfspect collects metrics in the cgroup scope: it reports metrics for only one container at a time, even when several containers are running.

Q: How can I reproduce the issue?

A: Launch several containers on different cpusets with the script shown under Example Use Case above, then start Perfspect in the cgroup scope with the command that follows that script.

Q: What is the solution to the issue?

A: Run the cgroup scope with the --all option so that Perfspect collects metrics from all containers simultaneously. Here is the modified command to launch Perfspect:

./perfspect metrics --live --scope cgroup --all --metricfile spr-short.json

Q: Why do I need to use the --all option?

A: You need to use the --all option to tell Perfspect to collect metrics from all containers in the cgroup scope. Without this option, Perfspect will only collect metrics from one container at a time.

Q: Can I use the --all option with other scopes?

A: The --all option only applies to scopes that can collect metrics from multiple containers. Check the output of ./perfspect metrics --help to see which scopes and options your build supports.

Q: How can I troubleshoot the issue?

A: To troubleshoot the issue, you can try the following:

  • Check the Perfspect version: Make sure you are using the latest version of Perfspect.
  • Verify the cgroup scope: Ensure that the cgroup scope is correctly configured to collect metrics from all containers.
  • Reduce the set of metrics: Try reducing the set of metrics to identify if the issue is related to specific metrics being collected.
  • Check the container logs: Verify with docker ps that the containers are still running, and look for errors with docker logs.

Conclusion

In this article, we answered some frequently asked questions (FAQs) related to the issue of Perfspect metrics cgroup scope failing to collect metrics from multiple containers. We provided additional information to help you troubleshoot and resolve the problem.

By following the steps outlined in this article, you should be able to collect metrics from all containers in your cgroup scope using Perfspect.

Additional Resources

For more information on Perfspect and its usage, please refer to the following resources: