Update `/api/stats` Endpoint To Return Total File Size Of All `WorkflowExecution`-outputted `DataObject`s

by ADMIN

Overview

The /api/stats endpoint in the nmdc-server repository currently returns the total file size of all DataObjects, including those that are not outputs of WorkflowExecutions. This article outlines how to update the endpoint so that it returns the total file size of only those DataObjects that are outputs of WorkflowExecutions.

Current Implementation

The current implementation of the /api/stats endpoint can be found in the aggregations.py file of the nmdc-server repository. Specifically, the data_size is computed using the following code snippet:

data_size = q(func.sum(func.coalesce(models.DataObject.file_size_bytes, 0))).scalar()

This snippet uses func.sum to add up the file_size_bytes values of all DataObjects. The func.coalesce call substitutes 0 for any file_size_bytes value that is NULL in the database (None in Python), so rows without a recorded size contribute nothing to the total.
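To make the aggregate concrete, here is a minimal sketch of the SQL that func.sum and func.coalesce correspond to, run with Python's built-in sqlite3 module. The data_object table and its sample rows are assumptions made for illustration; they are not the nmdc-server schema.

```python
import sqlite3

# Hypothetical stand-in for the DataObject table (not the real schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data_object (id TEXT PRIMARY KEY, file_size_bytes INTEGER)"
)
conn.executemany(
    "INSERT INTO data_object VALUES (?, ?)",
    [("do-1", 100), ("do-2", None), ("do-3", 250)],
)

# SQL equivalent of func.sum(func.coalesce(file_size_bytes, 0)).
SUM_SQL = "SELECT SUM(COALESCE(file_size_bytes, 0)) FROM data_object"

# The NULL size is counted as 0, so only the two known sizes contribute.
total = conn.execute(SUM_SQL).fetchone()[0]
print(total)

# With no rows at all, SUM itself yields NULL, which Python sees as None.
conn.execute("DELETE FROM data_object")
empty_total = conn.execute(SUM_SQL).fetchone()[0]
print(empty_total)
```

One consequence worth keeping in mind: because the coalesce sits inside the sum, an empty table (or a filter that matches no rows) still produces None rather than 0, so callers may want to handle that case.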

Proposed Solution

To update the /api/stats endpoint to return the total file size of only those DataObjects that are outputs of WorkflowExecutions, we have two options:

Option (a): Replace the Current Value

We can replace the current data_size value with the sum of the file_size_bytes values of only those DataObjects that are outputs of WorkflowExecutions. This can be achieved by modifying the aggregations.py file to use a filter to exclude DataObjects that are not outputs of WorkflowExecutions.

Option (b): Introduce an Additional Field

Alternatively, we can introduce an additional field in the object returned by the /api/stats endpoint to contain the sum of the file_size_bytes values of only those DataObjects that are outputs of WorkflowExecutions. This can be achieved by modifying the aggregations.py file to include a new field in the object returned by the endpoint.
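The difference between the two options is easiest to see in the shape of the response. The sketch below is illustrative only: build_stats is a hypothetical helper, and workflow_output_data_size is a placeholder field name, since the article does not name the new field.

```python
from typing import Dict


def build_stats(total_size: int, workflow_output_size: int, option: str) -> Dict[str, int]:
    """Return the size-related part of a stats payload for option 'a' or 'b'."""
    if option == "a":
        # Option (a): the existing key keeps its name but changes meaning.
        return {"data_size": workflow_output_size}
    if option == "b":
        # Option (b): the existing key is untouched; a new key is added.
        # "workflow_output_data_size" is a hypothetical field name.
        return {
            "data_size": total_size,
            "workflow_output_data_size": workflow_output_size,
        }
    raise ValueError(f"unknown option: {option!r}")


option_a = build_stats(390, 140, "a")
option_b = build_stats(390, 140, "b")
print(option_a)
print(option_b)
```

Option (a) silently changes what existing clients read from data_size, while option (b) is additive and leaves existing clients unaffected.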

Implementation Details

To implement either option, we need to modify the aggregations.py file so that the sum only covers DataObjects that are outputs of WorkflowExecutions. SQLAlchemy's func namespace only generates SQL function calls, so func.filter does not filter rows. The restriction belongs in the query's filter() method instead. The example assumes that DataObject has a workflow_execution_id column that is NULL for objects not produced by a WorkflowExecution; the actual nmdc-server schema may model this relationship differently.

Here is an example of how we can modify the aggregations.py file to implement option (a):

# Option (a): data_size now covers only WorkflowExecution outputs.
data_size = (
    q(func.sum(func.coalesce(models.DataObject.file_size_bytes, 0)))
    .filter(models.DataObject.workflow_execution_id.isnot(None))
    .scalar()
)

In this example, the filter() call restricts the query to DataObjects that have a WorkflowExecution associated with them, and func.sum then adds up the file_size_bytes values of the remaining rows. For option (b), the original data_size line stays as it is and the filtered query is assigned to a new field instead.
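The filtering step can also be sketched end to end against an in-memory SQLite database. This is a minimal sketch, not the nmdc-server code: the simplified DataObject model, the workflow_execution_id column (taken from the snippet above), and the helper name workflow_output_data_size are assumptions for illustration. The row restriction is expressed with the query's filter() method.

```python
from sqlalchemy import BigInteger, Column, Integer, String, create_engine, func
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class DataObject(Base):
    """Simplified, hypothetical stand-in for models.DataObject."""

    __tablename__ = "data_object"
    id = Column(Integer, primary_key=True)
    file_size_bytes = Column(BigInteger, nullable=True)
    # Assumed column: NULL when no WorkflowExecution produced the object.
    workflow_execution_id = Column(String, nullable=True)


def workflow_output_data_size(db: Session) -> int:
    """Sum file sizes of DataObjects that have a WorkflowExecution."""
    size_sum = func.sum(func.coalesce(DataObject.file_size_bytes, 0))
    total = (
        db.query(size_sum)
        .filter(DataObject.workflow_execution_id.isnot(None))
        .scalar()
    )
    # SUM over zero matching rows is NULL, so fall back to 0.
    return total or 0


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as db:
    db.add_all(
        [
            DataObject(file_size_bytes=100, workflow_execution_id="wf-1"),
            DataObject(file_size_bytes=None, workflow_execution_id="wf-1"),
            DataObject(file_size_bytes=250, workflow_execution_id=None),
            DataObject(file_size_bytes=40, workflow_execution_id="wf-2"),
        ]
    )
    db.commit()

    # Unfiltered sum, as in the current implementation.
    all_size = db.query(
        func.sum(func.coalesce(DataObject.file_size_bytes, 0))
    ).scalar()
    # Filtered sum, covering workflow outputs only.
    wf_size = workflow_output_data_size(db)

print(all_size, wf_size)
```

The 250-byte object without a WorkflowExecution counts toward the unfiltered total but not toward the filtered one.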

Benefits

Updating the /api/stats endpoint to return the total file size of only those DataObjects that are outputs of WorkflowExecutions has two main benefits. Firstly, it provides more accurate information about the total file size of DataObjects that are outputs of WorkflowExecutions. Secondly, it excludes DataObjects that are not outputs of WorkflowExecutions from the reported total, which is useful when only workflow-generated data is of interest.

Frequently Asked Questions

Q: Why is it necessary to update the /api/stats endpoint?

A: The current implementation of the /api/stats endpoint returns the total file size of all DataObjects, including those that are not outputs of WorkflowExecutions. This can lead to inaccurate information about the total file size of DataObjects that are outputs of WorkflowExecutions.

Q: What is the current implementation of the /api/stats endpoint?

A: The current implementation of the /api/stats endpoint can be found in the aggregations.py file of the nmdc-server repository. Specifically, the data_size is computed using the following code snippet:

data_size = q(func.sum(func.coalesce(models.DataObject.file_size_bytes, 0))).scalar()

Q: What are the proposed solutions to update the /api/stats endpoint?

A: There are two proposed solutions to update the /api/stats endpoint:

  • Option (a): Replace the current value with the sum of the file_size_bytes values of only those DataObjects that are outputs of WorkflowExecutions.
  • Option (b): Introduce an additional field in the object returned by the /api/stats endpoint to contain the sum of the file_size_bytes values of only those DataObjects that are outputs of WorkflowExecutions.

Q: How can we implement option (a)?

A: To implement option (a), we need to modify the aggregations.py file so that the sum excludes DataObjects that are not outputs of WorkflowExecutions. We can achieve this by calling filter() on the query (not func.filter, which only generates a SQL function call) to keep only the DataObjects that have a WorkflowExecution associated with them.

Q: How can we implement option (b)?

A: To implement option (b), we need to modify the aggregations.py file to include a new field in the object returned by the /api/stats endpoint. We can achieve this by using the func.sum function to calculate the sum of the file_size_bytes values of only those DataObjects that are outputs of WorkflowExecutions.

Q: What are the benefits of updating the /api/stats endpoint?

A: Updating the /api/stats endpoint to return the total file size of only those DataObjects that are outputs of WorkflowExecutions has several benefits, including:

  • Providing more accurate information about the total file size of DataObjects that are outputs of WorkflowExecutions.
  • Allowing users to filter out DataObjects that are not outputs of WorkflowExecutions, which can be useful in certain scenarios.

Q: How can we test the updated /api/stats endpoint?

A: To test the updated /api/stats endpoint, we can use a tool such as curl to send a request to the endpoint and verify that the response contains the correct information.
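As a rough sketch of such a check in Python rather than curl, the following fetches the payload and runs basic sanity checks on it. It assumes option (b) with a hypothetical workflow_output_data_size field; fetch_stats, check_stats, and the base URL in the usage note are placeholders, not part of nmdc-server.

```python
import json
from typing import List
from urllib.request import urlopen


def fetch_stats(base_url: str) -> dict:
    """GET <base_url>/api/stats and decode the JSON body."""
    with urlopen(f"{base_url}/api/stats") as resp:
        return json.load(resp)


def check_stats(payload: dict) -> List[str]:
    """Return a list of problems found in a stats payload (empty if none)."""
    problems = []
    # "workflow_output_data_size" is a hypothetical field name (option b).
    for key in ("data_size", "workflow_output_data_size"):
        value = payload.get(key)
        if not isinstance(value, int) or value < 0:
            problems.append(f"{key} should be a non-negative integer, got {value!r}")
    # Workflow outputs are a subset of all DataObjects, so their total
    # size cannot exceed the overall total.
    if not problems and payload["workflow_output_data_size"] > payload["data_size"]:
        problems.append("workflow_output_data_size exceeds data_size")
    return problems


# Checked here against sample payloads instead of a live server.
ok_problems = check_stats({"data_size": 390, "workflow_output_data_size": 140})
bad_problems = check_stats({"data_size": 100, "workflow_output_data_size": 140})
print(ok_problems)
print(bad_problems)
```

Against a running instance, the same check would be check_stats(fetch_stats("http://localhost:8000")), where the host and port are placeholders for wherever the server is deployed.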

Q: What are the next steps after updating the /api/stats endpoint?

A: After updating the /api/stats endpoint, we need to verify that the changes have been successfully deployed and that the endpoint is functioning as expected. We also need to update any documentation or user guides that may be affected by the changes.

Conclusion

In conclusion, updating the /api/stats endpoint to return the total file size of only those DataObjects that are outputs of WorkflowExecutions is a simple yet effective way to provide more accurate information about the total file size of DataObjects that are outputs of WorkflowExecutions. By modifying the aggregations.py file to use a filter to exclude DataObjects that are not outputs of WorkflowExecutions, we can achieve this goal.