Remove Explicit Data Nodes From The PWD And Represent By Edges

May 2, 2025 by ADMIN

Introduction

The current Process Workflow Definition (PWD) contains explicit data nodes with concrete values. This approach has two drawbacks. First, it is inconsistent: workflow inputs are stored as data nodes, while intermediate values and the workflow result are not. Second, it describes one concrete workflow instance rather than the general workflow logic, which makes it difficult to change the input values after a workflow has been loaded into a framework from the PWD. In this article, we propose to remove the explicit data nodes from the PWD and represent them by edges.

The Current State of the PWD

The current PWD contains the following JSON structure:

{
  "nodes": [
    {"id": 0, "function": "workflow.get_prod_and_div"},
    {"id": 1, "function": "workflow.get_sum"},
    {"id": 2, "value": 1},
    {"id": 3, "value": 2}
  ],
  "edges": [
    {"target": 0, "targetPort": "x", "source": 2, "sourcePort": null},
    {"target": 0, "targetPort": "y", "source": 3, "sourcePort": null},
    {"target": 1, "targetPort": "x", "source": 0, "sourcePort": "prod"},
    {"target": 1, "targetPort": "y", "source": 0, "sourcePort": "div"}
  ]
}

As we can see, the PWD contains explicit data nodes with concrete values (the nodes with id 2 and id 3). However, neither the result output nor the intermediate values (prod and div) are part of the graph as data nodes. The intermediate values appear only as source ports on edges, and the result of node 1 does not appear at all.
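To make this asymmetry concrete, here is a minimal Python sketch that loads the structure above and separates the two kinds of nodes. The helper names split_nodes and unconsumed_output_nodes are illustrative only and not part of any PWD tooling; the sketch assumes that data nodes are exactly those nodes carrying a "value" key.

```python
import json

# The current-format PWD from above, embedded as a JSON string.
CURRENT_PWD = json.loads("""
{
  "nodes": [
    {"id": 0, "function": "workflow.get_prod_and_div"},
    {"id": 1, "function": "workflow.get_sum"},
    {"id": 2, "value": 1},
    {"id": 3, "value": 2}
  ],
  "edges": [
    {"target": 0, "targetPort": "x", "source": 2, "sourcePort": null},
    {"target": 0, "targetPort": "y", "source": 3, "sourcePort": null},
    {"target": 1, "targetPort": "x", "source": 0, "sourcePort": "prod"},
    {"target": 1, "targetPort": "y", "source": 0, "sourcePort": "div"}
  ]
}
""")


def split_nodes(pwd):
    """Return (function_nodes, data_nodes), told apart by their keys."""
    function_nodes = [n for n in pwd["nodes"] if "function" in n]
    data_nodes = [n for n in pwd["nodes"] if "value" in n]
    return function_nodes, data_nodes


def unconsumed_output_nodes(pwd):
    """Ids of function nodes that never appear as the source of an edge."""
    sources = {e["source"] for e in pwd["edges"]}
    return [
        n["id"]
        for n in pwd["nodes"]
        if "function" in n and n["id"] not in sources
    ]


function_nodes, data_nodes = split_nodes(CURRENT_PWD)
print("function nodes:", [n["id"] for n in function_nodes])
print("data nodes:", [n["id"] for n in data_nodes])
print("nodes with unrepresented output:", unconsumed_output_nodes(CURRENT_PWD))
```

The last call reports node 1: its output is the workflow result, yet nothing in the graph refers to it.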

The Proposed Change

We propose to remove the explicit data nodes from the "nodes" section of the PWD and instead add the relevant information to the "edges" section. This would give the following PWD:

{
  "nodes": [
    {"id": 0, "function": "arithmetic_workflow.get_prod_and_div"},
    {"id": 1, "function": "arithmetic_workflow.get_sum"}
  ],
  "edges": [
    {"target": 0, "targetPort": "x", "source": null, "sourcePort": null},
    {"target": 0, "targetPort": "y", "source": null, "sourcePort": null},
    {"target": 1, "targetPort": "x", "source": 0, "sourcePort": "prod"},
    {"target": 1, "targetPort": "y", "source": 0, "sourcePort": "div"},
    {"target": null, "targetPort": null, "source": 1, "sourcePort": "result"}
  ]
}

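In this modified PWD, the "global" inputs and outputs of the workflow are represented by "dangling" edges of the workflow graph: an edge whose source is null is a workflow input, and an edge whose target is null is a workflow output. While this is not ideal, we think it is fine for now, because it keeps the modifications minimal.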
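The conversion from the old form to the edge-only form can be sketched in Python as follows. This is an illustration under stated assumptions, not an existing tool: the function name remove_data_nodes and its outputs parameter are hypothetical. Because the old format does not record which ports are workflow outputs, the caller has to name them. The sketch leaves function names untouched.

```python
def remove_data_nodes(pwd, outputs=()):
    """Convert a PWD with explicit data nodes into the edge-only form.

    `outputs` is a sequence of (node id, port name) pairs to expose as
    workflow outputs. Returns (new_pwd, removed_values), where
    removed_values maps (target id, target port) to the dropped value.
    """
    values = {n["id"]: n["value"] for n in pwd["nodes"] if "value" in n}
    nodes = [dict(n) for n in pwd["nodes"] if "value" not in n]
    edges, removed_values = [], {}
    for edge in pwd["edges"]:
        edge = dict(edge)  # shallow copy so the input PWD stays untouched
        if edge["source"] in values:
            key = (edge["target"], edge["targetPort"])
            removed_values[key] = values[edge["source"]]
            edge["source"] = None
            edge["sourcePort"] = None
        edges.append(edge)
    for node_id, port in outputs:
        edges.append(
            {"target": None, "targetPort": None,
             "source": node_id, "sourcePort": port}
        )
    return {"nodes": nodes, "edges": edges}, removed_values


old_pwd = {
    "nodes": [
        {"id": 0, "function": "workflow.get_prod_and_div"},
        {"id": 1, "function": "workflow.get_sum"},
        {"id": 2, "value": 1},
        {"id": 3, "value": 2},
    ],
    "edges": [
        {"target": 0, "targetPort": "x", "source": 2, "sourcePort": None},
        {"target": 0, "targetPort": "y", "source": 3, "sourcePort": None},
        {"target": 1, "targetPort": "x", "source": 0, "sourcePort": "prod"},
        {"target": 1, "targetPort": "y", "source": 0, "sourcePort": "div"},
    ],
}

new_pwd, removed = remove_data_nodes(old_pwd, outputs=[(1, "result")])
```

Returning the removed values separately keeps the concrete inputs available, for example as defaults, without storing them in the workflow definition itself.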

Alternative Approaches

One could instead add keys such as workflowInputName and workflowOutputName to the dangling edges to expose the workflow inputs and outputs explicitly. For example:

{
  "nodes": [
    {"id": 0, "function": "arithmetic_workflow.get_prod_and_div"},
    {"id": 1, "function": "arithmetic_workflow.get_sum"}
  ],
  "edges": [
    {"target": 0, "targetPort": "x", "source": null, "sourcePort": null, "workflowInputName": "a"},
    {"target": 0, "targetPort": "y", "source": null, "sourcePort": null, "workflowInputName": "b"},
    {"target": 1, "targetPort": "x", "source": 0, "sourcePort": "prod"},
    {"target": 1, "targetPort": "y", "source": 0, "sourcePort": "div"},
    {"target": null, "targetPort": null, "source": 1, "sourcePort": "result", "workflowOutputName": "c"}
  ]
}

Another option is to add a global_ports section to the PWD. Both alternatives are left for future consideration.
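With named edges, a consumer could recover the workflow's public interface directly from the edge list. The following Python sketch shows one way to do that; the function name workflow_interface is hypothetical, and the key names follow the example above.

```python
def workflow_interface(pwd):
    """Map workflow-level names to the node ports they are wired to.

    Returns (inputs, outputs): inputs maps an input name to
    (target id, target port), outputs maps an output name to
    (source id, source port).
    """
    inputs, outputs = {}, {}
    for edge in pwd["edges"]:
        if "workflowInputName" in edge:
            inputs[edge["workflowInputName"]] = (
                edge["target"], edge["targetPort"])
        if "workflowOutputName" in edge:
            outputs[edge["workflowOutputName"]] = (
                edge["source"], edge["sourcePort"])
    return inputs, outputs


named_pwd = {
    "nodes": [
        {"id": 0, "function": "arithmetic_workflow.get_prod_and_div"},
        {"id": 1, "function": "arithmetic_workflow.get_sum"},
    ],
    "edges": [
        {"target": 0, "targetPort": "x", "source": None,
         "sourcePort": None, "workflowInputName": "a"},
        {"target": 0, "targetPort": "y", "source": None,
         "sourcePort": None, "workflowInputName": "b"},
        {"target": 1, "targetPort": "x", "source": 0, "sourcePort": "prod"},
        {"target": 1, "targetPort": "y", "source": 0, "sourcePort": "div"},
        {"target": None, "targetPort": None, "source": 1,
         "sourcePort": "result", "workflowOutputName": "c"},
    ],
}

inputs, outputs = workflow_interface(named_pwd)
```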

Questions and Answers

Q: What is the current state of the Process Workflow Definition (PWD)?

A: The current PWD contains explicit data nodes with concrete values. Alongside the function nodes (id 0 and id 1), it has nodes that only hold a value (id 2 and id 3).

Q: Why is this approach problematic?

A: This approach is problematic for two reasons. First, it is inconsistent: workflow inputs are stored as data nodes, while intermediate values and the workflow result are not. Second, it represents a concrete workflow instance rather than the general workflow logic. This makes it difficult to modify the input values after loading a workflow into a framework from the PWD.

Q: What is the proposed change?

A: We propose to remove the explicit data nodes from the "nodes" section of the PWD and instead add the relevant information to the "edges" section. This would give the following PWD:

{
  "nodes": [
    {"id": 0, "function": "arithmetic_workflow.get_prod_and_div"},
    {"id": 1, "function": "arithmetic_workflow.get_sum"}
  ],
  "edges": [
    {"target": 0, "targetPort": "x", "source": null, "sourcePort": null},
    {"target": 0, "targetPort": "y", "source": null, "sourcePort": null},
    {"target": 1, "targetPort": "x", "source": 0, "sourcePort": "prod"},
    {"target": 1, "targetPort": "y", "source": 0, "sourcePort": "div"},
    {"target": null, "targetPort": null, "source": 1, "sourcePort": "result"}
  ]
}

Q: How does this approach represent global inputs and outputs?

A: In this modified PWD, the "global" inputs and outputs of the workflow are represented by "dangling" edges of the workflow graph. A dangling edge is attached to a node at one end only: an input edge has a null source, and an output edge has a null target.
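The rule for telling the edges apart is short enough to write down. The Python sketch below classifies an edge by which of its ends is null; the function name edge_kind and the labels it returns are illustrative choices, not part of the PWD format.

```python
def edge_kind(edge):
    """Classify an edge as 'input', 'output', or 'internal'."""
    has_source = edge["source"] is not None
    has_target = edge["target"] is not None
    if has_source and has_target:
        return "internal"
    if has_target:
        return "input"   # null source: value comes from outside the workflow
    if has_source:
        return "output"  # null target: value leaves the workflow
    raise ValueError("edge has neither source nor target")


edges = [
    {"target": 0, "targetPort": "x", "source": None, "sourcePort": None},
    {"target": 0, "targetPort": "y", "source": None, "sourcePort": None},
    {"target": 1, "targetPort": "x", "source": 0, "sourcePort": "prod"},
    {"target": 1, "targetPort": "y", "source": 0, "sourcePort": "div"},
    {"target": None, "targetPort": None, "source": 1, "sourcePort": "result"},
]

kinds = [edge_kind(e) for e in edges]
```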

Q: Are there alternative approaches to this proposal?

A: Yes. One could instead add keys such as workflowInputName and workflowOutputName to the dangling edges to expose the workflow inputs and outputs explicitly. For example:

{
  "nodes": [
    {"id": 0, "function": "arithmetic_workflow.get_prod_and_div"},
    {"id": 1, "function": "arithmetic_workflow.get_sum"}
  ],
  "edges": [
    {"target": 0, "targetPort": "x", "source": null, "sourcePort": null, "workflowInputName": "a"},
    {"target": 0, "targetPort": "y", "source": null, "sourcePort": null, "workflowInputName": "b"},
    {"target": 1, "targetPort": "x", "source": 0, "sourcePort": "prod"},
    {"target": 1, "targetPort": "y", "source": 0, "sourcePort": "div"},
    {"target": null, "targetPort": null, "source": 1, "sourcePort": "result", "workflowOutputName": "c"}
  ]
}

Another option is to add a global_ports section to the PWD. Both alternatives are left for future consideration.

Q: What are the benefits of this proposal?

A: The proposal makes the PWD consistent and lets it represent general workflow logic rather than a concrete workflow instance. This makes it easier to modify the input values after loading a workflow into a framework from the PWD.
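As an illustration of that benefit, the Python sketch below runs the same edge-only PWD twice with different inputs. Everything here is an assumption made for the example: the two function bodies are guessed from their names, the REGISTRY lookup and the run function are hypothetical, and each function is assumed to return a dict keyed by output port name.

```python
def get_prod_and_div(x, y):
    return {"prod": x * y, "div": x / y}


def get_sum(x, y):
    return {"result": x + y}


# Hypothetical mapping from the names in the PWD to Python callables.
REGISTRY = {
    "arithmetic_workflow.get_prod_and_div": get_prod_and_div,
    "arithmetic_workflow.get_sum": get_sum,
}

PWD = {
    "nodes": [
        {"id": 0, "function": "arithmetic_workflow.get_prod_and_div"},
        {"id": 1, "function": "arithmetic_workflow.get_sum"},
    ],
    "edges": [
        {"target": 0, "targetPort": "x", "source": None, "sourcePort": None},
        {"target": 0, "targetPort": "y", "source": None, "sourcePort": None},
        {"target": 1, "targetPort": "x", "source": 0, "sourcePort": "prod"},
        {"target": 1, "targetPort": "y", "source": 0, "sourcePort": "div"},
        {"target": None, "targetPort": None, "source": 1,
         "sourcePort": "result"},
    ],
}


def run(pwd, inputs):
    """Execute the workflow.

    `inputs` maps (node id, port name) to a value for every input edge.
    Returns a dict mapping (node id, port name) to each workflow output.
    """
    functions = {n["id"]: REGISTRY[n["function"]] for n in pwd["nodes"]}
    results = {}  # node id -> dict of output port -> value
    pending = set(functions)
    while pending:
        progressed = False
        for node_id in sorted(pending):
            incoming = [e for e in pwd["edges"] if e["target"] == node_id]
            # A node is ready once all of its upstream nodes have run.
            if any(e["source"] is not None and e["source"] not in results
                   for e in incoming):
                continue
            kwargs = {}
            for e in incoming:
                if e["source"] is None:
                    kwargs[e["targetPort"]] = inputs[(node_id, e["targetPort"])]
                else:
                    kwargs[e["targetPort"]] = results[e["source"]][e["sourcePort"]]
            results[node_id] = functions[node_id](**kwargs)
            pending.discard(node_id)
            progressed = True
        if not progressed:
            raise ValueError("cycle or reference to an unknown node")
    return {
        (e["source"], e["sourcePort"]): results[e["source"]][e["sourcePort"]]
        for e in pwd["edges"]
        if e["target"] is None
    }


first = run(PWD, {(0, "x"): 1, (0, "y"): 2})
second = run(PWD, {(0, "x"): 6, (0, "y"): 3})
```

The definition is loaded once and the values are supplied per run, which is not possible when they are baked into data nodes.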

Q: What are the next steps?

A: The next steps are to implement this proposal and test it thoroughly. This involves removing the explicit data nodes from the PWD, adding the corresponding dangling edges to the "edges" section, and verifying that workflows loaded from the modified PWD behave as expected.
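One possible starting point for such testing is a structural check of the edge-only form. The Python sketch below is an assumption about what such a check might cover, not a specification: the function name validate_pwd and the three rules it enforces (no data nodes, no edge with both ends null, no reference to a missing node) are chosen for illustration.

```python
def validate_pwd(pwd):
    """Return a list of human-readable problems; empty if none are found."""
    problems = []
    node_ids = {n["id"] for n in pwd["nodes"]}
    for node in pwd["nodes"]:
        if "value" in node:
            problems.append(f"node {node['id']} is an explicit data node")
    for i, edge in enumerate(pwd["edges"]):
        source, target = edge["source"], edge["target"]
        if source is None and target is None:
            problems.append(f"edge {i} has neither source nor target")
        for end in (source, target):
            if end is not None and end not in node_ids:
                problems.append(f"edge {i} references unknown node {end}")
    return problems


proposed = {
    "nodes": [
        {"id": 0, "function": "arithmetic_workflow.get_prod_and_div"},
        {"id": 1, "function": "arithmetic_workflow.get_sum"},
    ],
    "edges": [
        {"target": 0, "targetPort": "x", "source": None, "sourcePort": None},
        {"target": 0, "targetPort": "y", "source": None, "sourcePort": None},
        {"target": 1, "targetPort": "x", "source": 0, "sourcePort": "prod"},
        {"target": 1, "targetPort": "y", "source": 0, "sourcePort": "div"},
        {"target": None, "targetPort": None, "source": 1,
         "sourcePort": "result"},
    ],
}

problems = validate_pwd(proposed)
```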