Notary backpressure and load handling

In high-traffic networks, a notary can receive a large number of notarisation requests in a short period. Once the request rate exceeds the notary's capacity, notarisation response times increase.

To avoid a notarisation request being lost during a temporary notary worker outage or failure, Corda nodes retry requests: if a node does not receive a response within a predefined time period, it resends the notarisation request. If the notary takes longer to complete each request than the retry timer allows, the notarisation queue grows with redundant retries, potentially overloading the notary. The backpressure mechanism is designed to set retry timers that respond to the current load of the notary, and to prevent successive waves of retry attempts from hindering notary performance.

The backpressure mechanism makes the retry timer variable. The timer is based on the timeout and backoffBase configuration settings, the number of previous retry attempts for the notarisation request (capped at a maximum defined by the maxRestartCount configuration setting), and an additional jitter factor. The jitter factor introduces a degree of randomness into the calculation, so that a sudden increase in notarisation requests does not produce a matching, synchronised increase in retry attempts.
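The following is a minimal sketch of how these inputs could combine into a retry timer. This page does not state the exact formula, so the exponential form, the jitter range, and the one-second floor are assumptions made for illustration, and the function name and the rng parameter are hypothetical. Only timeout, backoffBase, and maxRestartCount correspond to the configuration settings named above.

```python
import random


def retry_timeout_seconds(timeout, backoff_base, max_restart_count,
                          retry_count, rng=random.random):
    """Return an illustrative jittered wait, in seconds, before the next retry."""
    # Cap the exponent so the delay stops growing after max_restart_count retries.
    capped_retries = min(retry_count, max_restart_count)
    # Assumed exponential backoff: timeout * backoff_base ** capped_retries.
    base_delay = timeout * backoff_base ** capped_retries
    # Assumed jitter: scale by a random factor in [0.5, 1.0) to spread retries out.
    jittered = base_delay * (1.0 + rng()) / 2.0
    # Assumed floor of one second so the timer never collapses to zero.
    return max(1.0, jittered)


if __name__ == "__main__":
    for attempt in range(4):
        print(attempt, retry_timeout_seconds(30, 1.8, 6, attempt))
```

Because the exponent is capped, a request that has already been retried more than maxRestartCount times gets the same delay range as one retried exactly maxRestartCount times.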

Flow Engine Behaviour

When a node makes a notarisation request, it receives an estimated completion time. This value is an upper bound on the time it should take to process the request, based on the current throughput of the cluster. Once the notarisation request has been processed, the node receives the response. If no response has been received by the time the estimated processing time has elapsed, the node retries the request.

This mechanism means that requests should always be processed in the event of a notary worker failure, whilst notaries under heavy load do not receive premature, redundant retry requests.
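The decision a waiting node makes can be sketched as a small function. This is an illustration of the behaviour described above, not the flow engine's code; the function name, its parameters, and the string results are hypothetical.

```python
def next_action(now, sent_at, estimated_wait, response_received):
    """Decide what a waiting node does at time `now` (all times in seconds)."""
    if response_received:
        # The notary answered, so no retry is needed.
        return "done"
    # Retry only once the notary's own completion estimate has elapsed.
    if now - sent_at >= estimated_wait:
        return "retry"
    # The estimate has not expired yet: keep waiting instead of adding load.
    return "wait"


if __name__ == "__main__":
    print(next_action(now=10, sent_at=0, estimated_wait=30, response_received=False))
    print(next_action(now=30, sent_at=0, estimated_wait=30, response_received=False))
```

A notary under heavy load would report a larger estimate, so the same node would stay in the waiting state for longer before retrying.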

Artemis Messaging Layer Behaviour

A less common scenario is that the notary workers receive so many requests that the Artemis message broker becomes overloaded. For example, if a notary worker receives messages at a greater rate than the maximum rate at which its Artemis message broker can consume messages from the queue, message processing delays can start to occur.

If you are running an HA notary cluster, increasing the number of workers in the cluster helps to mitigate this problem. See Scaling A Notary Cluster for more information. The backpressure within the flow engine should also help to prevent this scenario.
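As a rough worked example of the arithmetic behind this scenario, the sketch below estimates how many messages accumulate when arrivals outpace consumption, and how adding workers changes that. It assumes constant rates and an even split of traffic across workers, neither of which is stated on this page; the function and parameter names are hypothetical.

```python
def backlog_after(seconds, arrival_rate, consume_rate_per_worker, workers=1):
    """Estimate messages left queued after `seconds` across the whole cluster."""
    # Assumption: each worker's broker sees an equal share of incoming traffic.
    per_worker_arrivals = arrival_rate / workers
    # A queue only grows while arrivals exceed what the broker can consume.
    growth_per_second = max(0.0, per_worker_arrivals - consume_rate_per_worker)
    # Total backlog is the per-worker growth summed over all workers.
    return growth_per_second * seconds * workers


if __name__ == "__main__":
    # One worker consuming 1000 msg/s against 1200 msg/s of arrivals.
    print(backlog_after(60, arrival_rate=1200, consume_rate_per_worker=1000))
    # The same traffic spread over two workers.
    print(backlog_after(60, arrival_rate=1200, consume_rate_per_worker=1000, workers=2))
```

With these illustrative figures, a single worker falls behind by 200 messages every second, while two workers each receive 600 messages per second and keep up.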
