Writer Degrades To Busy Polling After Reconnection

by ADMIN 51 views

Writer Degradation to Busy Polling After Reconnection: A Critical Issue in Boost.Redis

Introduction

Boost.Redis, the Boost client library for Redis (a popular in-memory data store), has a critical issue that affects its writer task. After a successful reconnection, the writer task degrades to busy polling, which results in high CPU usage. The cause is a change to the expiry of the writer timer that leaves the task in a busy wait. This article covers the root cause of the issue, its implications, and the proposed fix.

The Root Cause

The issue is attributed to the following lines of code in the Boost Redis library:

// https://github.com/boostorg/redis/blob/328ad97a79cf66108a11294462aa636b5abce927/include/boost/redis/connection.hpp#L445-L447
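void connection::writer::set_expiry(time_point expiry) {
    _expiry = expiry;
    // A negative duration (expiry earlier than _now) leaves the timer already expired.
    _timer.expires_from_now(expiry - _now);
    // A wait on an already expired timer completes immediately with no error.
    _timer.async_wait([this](boost::system::error_code ec) {
        if (!ec) {
            _expiry = _now;
        }
    });
}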

As we can see, the set_expiry function sets the expiry of the writer timer from the provided time point. Timers are stateful: they keep the expiration time that was last set. When a wait is issued against a timer whose expiry is already in the past, the wait completes immediately. After a reconnection the writer timer is left in that state, so every wait the writer task issues returns at once. The task therefore degrades to busy polling, which results in high CPU usage.
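
The same behaviour can be reproduced outside the library. The sketch below is a standard-library analogy, not Boost.Redis or Asio code: blocked_for is a hypothetical helper that waits until a given deadline with std::this_thread::sleep_until and reports how long the call actually blocked. It only illustrates the property described above, under the assumption that a stored deadline behaves like a timer expiry.

```cpp
#include <chrono>
#include <thread>

using Clock = std::chrono::steady_clock;

// Hypothetical helper: wait until `deadline` and return how long we blocked.
inline std::chrono::milliseconds blocked_for(Clock::time_point deadline) {
    const auto start = Clock::now();
    // Returns without blocking if the deadline has already passed.
    std::this_thread::sleep_until(deadline);
    return std::chrono::duration_cast<std::chrono::milliseconds>(Clock::now() - start);
}
```

Calling blocked_for(Clock::now() - std::chrono::seconds(1)) returns almost immediately, while blocked_for(Clock::now() + std::chrono::milliseconds(100)) blocks for roughly 100 ms.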

The Busy Wait Situation

The busy wait situation occurs when the writer task checks for notifications in a tight loop instead of suspending until a notification actually arrives. Because each wait returns immediately, the task polls continuously and consumes CPU even when there is nothing to write. This is a critical issue, as the wasted CPU time can degrade the performance of the whole application.
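
To make the cost of such a loop concrete, here is a minimal sketch, again using only the standard library rather than Asio. The function wakeups_within is a hypothetical model of a wait-then-check loop: each iteration waits on the same stored deadline and counts one wakeup. With a stale deadline the loop spins; with a deadline that never arrives it wakes up only when the measurement window ends.

```cpp
#include <algorithm>
#include <chrono>
#include <thread>

using Clock = std::chrono::steady_clock;

// Hypothetical model of a wait-then-check loop. Every iteration waits on the
// same stored `deadline`, then counts one wakeup. The loop runs for `window`.
inline long wakeups_within(Clock::time_point deadline, std::chrono::milliseconds window) {
    const Clock::time_point stop = Clock::now() + window;
    long wakeups = 0;
    while (Clock::now() < stop) {
        // Never sleep past the end of the measurement window.
        std::this_thread::sleep_until(std::min(deadline, stop));
        ++wakeups;
    }
    return wakeups;
}
```

Passing a deadline one second in the past produces a very large wakeup count for even a short window, whereas passing Clock::time_point::max() produces a count close to one.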

The Proposed Fix

The proposed fix is to use a separate timer for waits instead of reusing the writer timer, whose expiry is changed elsewhere. This simplifies the logic of the writer task and prevents the busy wait situation, because the expiry of the wait timer is never left in the past. The change itself is small: create a dedicated timer and issue the waits against it.
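
One way to see why a dedicated timer helps is a small model. The sketch below is not the Boost.Redis implementation: ModelTimer, SharedTimerWriter, and SeparateTimerWriter are hypothetical names that capture only the one property that matters here, namely that a timer remembers its expiry and a wait is ready at once when that expiry is not in the future. It also assumes that the idle wait is meant to stay pending until someone notifies it, which is modeled as an expiry of time_point::max().

```cpp
#include <chrono>

using Clock = std::chrono::steady_clock;

// Hypothetical stand-in for a stateful timer: it remembers its expiry, and a
// wait is ready immediately when that expiry is not in the future.
struct ModelTimer {
    Clock::time_point expiry = Clock::time_point::max();
    bool wait_is_ready(Clock::time_point now) const { return expiry <= now; }
};

// Shared design: one timer carries both a deadline and the idle wait.
struct SharedTimerWriter {
    ModelTimer timer;
    void on_reconnect(Clock::time_point deadline) { timer.expiry = deadline; }
    bool idle_wait_is_ready(Clock::time_point now) const {
        return timer.wait_is_ready(now);  // stale deadline leaks into the wait
    }
};

// Separate design: the idle wait has its own timer that never gets a deadline.
struct SeparateTimerWriter {
    ModelTimer deadline_timer;
    ModelTimer wait_timer;
    void on_reconnect(Clock::time_point deadline) { deadline_timer.expiry = deadline; }
    bool idle_wait_is_ready(Clock::time_point now) const {
        return wait_timer.wait_is_ready(now);  // unaffected by the deadline
    }
};
```

In this model, once the deadline set during reconnection has passed, the shared design reports the idle wait as ready on every call, which is the busy polling pattern. The separate design keeps the idle wait pending regardless of what happened to the deadline.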

Conclusion

In conclusion, the writer degradation to busy polling after reconnection is a critical issue in Boost.Redis. It arises from a change to the expiry of the writer timer, which leads to a busy wait situation. The proposed fix is to use a separate timer for waits, which simplifies the logic of the writer task and prevents the busy wait. A PR addressing the issue will follow shortly, and the issue stays open for tracking in case anyone else has experienced it.

Related Issues

  • #250: The busy polling problem is related to the implementation of tests for this issue.

Implementation Details

  • The fix involves creating a new timer for waits, rather than using the existing writer timer.
  • The new timer is used to wait for notifications, rather than polling the Redis connection.
  • The existing writer timer is used to set the expiry of the writer task.

Code Snippet

// Create a new timer for waits
boost::asio::steady_timer wait_timer(io_service);

// Set the expiry of the wait timer
wait_timer.expires_from_now(expiry - _now);

// Wait for notifications using the new timer
wait_timer.async_wait([this](boost::system::error_code ec) {
    if (!ec) {
        // Handle notification
    }
});

Future Work

  • Further investigation into the root cause of the issue.
  • Implementation of additional fixes to prevent similar issues in the future.
  • Review of the Boost Redis library to ensure that similar issues are not present.


Introduction

In our previous article, we discussed the critical issue of writer degradation to busy polling after reconnection in Boost.Redis. The issue arises from a change to the expiry of the writer timer, which leads to a busy wait situation. This article is a Q&A that addresses common questions and concerns related to the issue.

Q&A

Q: What is the root cause of the writer degradation to busy polling after reconnection?

A: The root cause is a change to the expiry of the writer timer. Timers keep the expiration time that was last set, so when a wait is issued against a timer whose expiry is in the past, it completes immediately. The writer task then loops without ever suspending, which is the busy wait situation.

Q: What is a busy wait situation?

A: A busy wait situation occurs when a task checks for notifications in a tight loop instead of suspending until a notification actually arrives. This leads to high CPU usage, as the task is constantly polling even when there is nothing to do.

Q: How does the proposed fix address the issue?

A: The proposed fix involves using a separate timer for waits, rather than using the existing writer timer. This simplifies the logic of the writer task and prevents the busy wait situation.

Q: What are the implications of the writer degradation to busy polling after reconnection?

A: The implications are high CPU usage after every successful reconnection and, as a consequence, degraded performance of the application.

Q: How can I prevent the writer degradation to busy polling after reconnection?

A: To prevent the writer degradation to busy polling after reconnection, you can use a separate timer for waits, as proposed in the fix.

Q: What are the benefits of using a separate timer for waits?

A: The benefits of using a separate timer for waits include simplified logic, prevention of busy wait situations, and improved performance.

Q: How can I implement the proposed fix in my Redis application?

A: To implement the proposed fix, you can create a new timer for waits and use it to wait for notifications, rather than using the existing writer timer.

Q: What are the related issues to the writer degradation to busy polling after reconnection?

A: The issue is related to the implementation of tests for #250.

Q: What is the current status of the issue?

A: The issue is currently being addressed, and a PR with the fix will be issued shortly.

Conclusion

In conclusion, the writer degradation to busy polling after reconnection is a critical issue in Boost.Redis. The proposed fix involves using a separate timer for waits, which simplifies the logic of the writer task and prevents the busy wait situation. We hope that this Q&A article has provided valuable information and insights into the issue.
