Update `/api/stats` Endpoint To Return Total File Size Of All `WorkflowExecution`-outputted `DataObject`s
Overview
The `/api/stats` endpoint in the nmdc-server repository currently returns the total file size of all `DataObject`s, including those that are not outputs of `WorkflowExecution`s. This article outlines how to update the endpoint so that it returns the total file size of only those `DataObject`s that are outputs of `WorkflowExecution`s.
Current Implementation
The current implementation of the `/api/stats` endpoint can be found in the `aggregations.py` file of the nmdc-server repository. Specifically, `data_size` is computed using the following code snippet:
data_size = q(func.sum(func.coalesce(models.DataObject.file_size_bytes, 0))).scalar(),
This code snippet uses `func.sum` to add up the `file_size_bytes` values of all `DataObject`s. `func.coalesce` substitutes 0 wherever a `file_size_bytes` value is `None`.
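To make the behavior of that expression concrete, the following minimal sketch runs the equivalent SQL against an in-memory SQLite database using Python's standard `sqlite3` module. The `data_object` table and its rows are invented for illustration and are not the real nmdc-server schema.

```python
import sqlite3

# Hypothetical stand-in for the DataObject table; only the size column matters here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data_object (id TEXT PRIMARY KEY, file_size_bytes INTEGER)")
conn.executemany(
    "INSERT INTO data_object VALUES (?, ?)",
    [("a", 100), ("b", None), ("c", 250)],  # row "b" has no recorded size
)

# SQL counterpart of func.sum(func.coalesce(file_size_bytes, 0)).
total = conn.execute(
    "SELECT SUM(COALESCE(file_size_bytes, 0)) FROM data_object"
).fetchone()[0]
print(total)  # 350: the row with a missing size contributes 0
```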
Proposed Solution
To update the `/api/stats` endpoint so that it returns the total file size of only those `DataObject`s that are outputs of `WorkflowExecution`s, we have two options:
Option (a): Replace the Current Value
We can replace the current `data_size` value with the sum of the `file_size_bytes` values of only those `DataObject`s that are outputs of `WorkflowExecution`s. This can be achieved by adding a filter to the query in the `aggregations.py` file that excludes `DataObject`s that are not outputs of `WorkflowExecution`s. The shape of the response stays the same, but the meaning of `data_size` changes for existing clients.
Option (b): Introduce an Additional Field
Alternatively, we can add a field to the object returned by the `/api/stats` endpoint that contains the sum of the `file_size_bytes` values of only those `DataObject`s that are outputs of `WorkflowExecution`s, leaving `data_size` unchanged. This can be achieved by computing the new value in the `aggregations.py` file and including it in the object returned by the endpoint.
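As a rough illustration of option (b), the sketch below computes both totals in one query and returns them side by side. It uses the standard `sqlite3` module rather than the project's SQLAlchemy session, and the `workflow_execution_id` column, the `workflow_execution_data_size` field name, and the `StatsSummary` class are all assumptions made for the example, not names confirmed in nmdc-server.

```python
import sqlite3
from dataclasses import asdict, dataclass


@dataclass
class StatsSummary:
    data_size: int                     # existing field: all DataObjects
    workflow_execution_data_size: int  # hypothetical new field: workflow outputs only


def summarize(conn):
    # Conditional aggregation: the CASE yields NULL for non-output rows,
    # and SUM skips NULLs, so only workflow outputs are counted.
    row = conn.execute(
        """
        SELECT
            COALESCE(SUM(file_size_bytes), 0),
            COALESCE(SUM(CASE WHEN workflow_execution_id IS NOT NULL
                              THEN file_size_bytes END), 0)
        FROM data_object
        """
    ).fetchone()
    return StatsSummary(data_size=row[0], workflow_execution_data_size=row[1])


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data_object ("
    "id TEXT PRIMARY KEY, file_size_bytes INTEGER, workflow_execution_id TEXT)"
)
conn.executemany(
    "INSERT INTO data_object VALUES (?, ?, ?)",
    [("a", 100, "wf1"), ("b", 40, None), ("c", None, "wf1"), ("d", 250, "wf2")],
)

summary = asdict(summarize(conn))
print(summary)  # {'data_size': 390, 'workflow_execution_data_size': 350}
```

Keeping `data_size` untouched means existing clients are unaffected, at the cost of a slightly larger response.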
Implementation Details
To implement either option, we need to add a filter in the `aggregations.py` file that excludes `DataObject`s that are not outputs of `WorkflowExecution`s. Note that `func.filter` is not a row filter: SQLAlchemy's `func` namespace only generates SQL function calls. Rows are restricted by calling `.filter()` on the query that `q(...)` returns, so that only `DataObject`s that have a `WorkflowExecution` associated with them are kept.
Here is an example of how we can modify the `aggregations.py` file to implement option (a). It assumes that `DataObject` has a `workflow_execution_id` column that is set only for workflow outputs; if the model tracks outputs differently, the filter condition needs to change accordingly:
# Assumes models.DataObject.workflow_execution_id exists; adjust to the actual schema.
data_size = (
    q(func.sum(func.coalesce(models.DataObject.file_size_bytes, 0)))
    .filter(models.DataObject.workflow_execution_id.isnot(None))
    .scalar()
)
In this example, the `.filter()` call drops `DataObject`s that do not have a `WorkflowExecution` associated with them, and `func.sum` then adds up the `file_size_bytes` values of the remaining `DataObject`s.
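One detail worth checking once a filter is in place is what happens when no row matches it. The sketch below, again using `sqlite3` with an invented `data_object` table and a hypothetical `workflow_execution_id` column, compares a `COALESCE` placed inside the `SUM` with one placed around it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data_object ("
    "id TEXT PRIMARY KEY, file_size_bytes INTEGER, workflow_execution_id TEXT)"
)
# A single DataObject that is not a workflow output, so the filter matches nothing.
conn.execute("INSERT INTO data_object VALUES ('raw', 500, NULL)")

where = "WHERE workflow_execution_id IS NOT NULL"

# COALESCE inside SUM: there are no rows to sum, so the result is NULL (None).
inner = conn.execute(
    f"SELECT SUM(COALESCE(file_size_bytes, 0)) FROM data_object {where}"
).fetchone()[0]

# COALESCE around SUM: the NULL result is replaced by 0.
outer = conn.execute(
    f"SELECT COALESCE(SUM(file_size_bytes), 0) FROM data_object {where}"
).fetchone()[0]

print(inner, outer)  # None 0
```

If the endpoint should report 0 rather than a null value in that situation, the SQLAlchemy expression could be written as `func.coalesce(func.sum(...), 0)`, or the `None` could be handled in Python after `.scalar()` returns.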
Benefits
Updating the `/api/stats` endpoint to return the total file size of only those `DataObject`s that are outputs of `WorkflowExecution`s has two benefits. First, the reported figure covers workflow outputs only, so it is no longer inflated by `DataObject`s that no `WorkflowExecution` produced. Second, users no longer have to filter out `DataObject`s that are not outputs of `WorkflowExecution`s themselves.
Frequently Asked Questions
Q: Why is it necessary to update the `/api/stats` endpoint?
A: The current implementation of the `/api/stats` endpoint returns the total file size of all `DataObject`s, including those that are not outputs of `WorkflowExecution`s. This can lead to inaccurate information about the total file size of `DataObject`s that are outputs of `WorkflowExecution`s.
Q: What is the current implementation of the `/api/stats` endpoint?
A: The current implementation of the `/api/stats` endpoint can be found in the `aggregations.py` file of the nmdc-server repository. Specifically, `data_size` is computed using the following code snippet:
data_size = q(func.sum(func.coalesce(models.DataObject.file_size_bytes, 0))).scalar(),
Q: What are the proposed solutions to update the `/api/stats` endpoint?
A: There are two proposed solutions to update the `/api/stats` endpoint:
- Option (a): Replace the current value with the sum of the `file_size_bytes` values of only those `DataObject`s that are outputs of `WorkflowExecution`s.
- Option (b): Introduce an additional field in the object returned by the `/api/stats` endpoint to contain the sum of the `file_size_bytes` values of only those `DataObject`s that are outputs of `WorkflowExecution`s.
Q: How can we implement option (a)?
A: To implement option (a), we need to add a filter to the query in the `aggregations.py` file that excludes `DataObject`s that are not outputs of `WorkflowExecution`s. We can achieve this by calling `.filter()` on the query (`func.filter` is not a row filter in SQLAlchemy), so that only `DataObject`s that have a `WorkflowExecution` associated with them are summed.
Q: How can we implement option (b)?
A: To implement option (b), we need to modify the `aggregations.py` file to include a new field in the object returned by the `/api/stats` endpoint. The new field holds the `func.sum` of the `file_size_bytes` values of only those `DataObject`s that are outputs of `WorkflowExecution`s, while the existing `data_size` field stays as it is.
Q: What are the benefits of updating the `/api/stats` endpoint?
A: Updating the `/api/stats` endpoint to return the total file size of only those `DataObject`s that are outputs of `WorkflowExecution`s has several benefits, including:
- Providing more accurate information about the total file size of `DataObject`s that are outputs of `WorkflowExecution`s.
- Allowing users to filter out `DataObject`s that are not outputs of `WorkflowExecution`s, which can be useful in certain scenarios.
Q: How can we test the updated `/api/stats` endpoint?
A: To test the updated `/api/stats` endpoint, we can use a tool such as `curl` to send a request to the endpoint and verify that the response contains the correct information.
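Checking the response by eye can be complemented with a small script. The following sketch validates a decoded response body under the assumption that option (b) was chosen; the `workflow_execution_data_size` field name and the `check_stats_payload` helper are hypothetical, and the sample JSON stands in for a real response.

```python
import json


def check_stats_payload(payload):
    """Return a list of problems found in a decoded /api/stats response."""
    problems = []
    total = payload.get("data_size")
    # Hypothetical field name for option (b); rename to match the real response.
    outputs = payload.get("workflow_execution_data_size")
    for name, value in (
        ("data_size", total),
        ("workflow_execution_data_size", outputs),
    ):
        if not isinstance(value, int) or value < 0:
            problems.append(f"{name} should be a non-negative integer, got {value!r}")
    # Workflow outputs are a subset of all DataObjects, so their total cannot be larger.
    if not problems and outputs > total:
        problems.append("workflow output size exceeds the total size")
    return problems


# Sample body standing in for what a curl request to /api/stats might return.
sample = json.loads('{"data_size": 390, "workflow_execution_data_size": 350}')
print(check_stats_payload(sample))  # []
```

An empty list means the payload passed every check; the same function could be called from an automated test once the real field names are known.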
Q: What are the next steps after updating the `/api/stats` endpoint?
A: After updating the `/api/stats` endpoint, we need to verify that the changes have been successfully deployed and that the endpoint is functioning as expected. We also need to update any documentation or user guides that may be affected by the changes.
Conclusion
In conclusion, updating the `/api/stats` endpoint to return the total file size of only those `DataObject`s that are outputs of `WorkflowExecution`s is a simple yet effective way to provide more accurate information about the total file size of `DataObject`s that are outputs of `WorkflowExecution`s. By modifying the `aggregations.py` file to use a filter to exclude `DataObject`s that are not outputs of `WorkflowExecution`s, we can achieve this goal.