Notary health check
This is a simple CorDapp to check if notaries on a Corda network are up and responsive.
Installation
To install the app, copy the notaryhealthcheck-cordapp
and the notaryhealthcheck-contract
JARs to the cordapps
directory
of a node that will run the checks. The notaryhealthcheck-contract
JAR also needs to be installed on all validating
notaries that are to be checked.
Usage
To set up periodic health checks for a notary service identity or a specific HA notary cluster member, there are two options:
- Via CorDapp configuration file: specified targets will be automatically monitored on node startup.
- Via the CLI client: commands provided for starting and stopping monitoring specific targets. Note that the health check app operates in-memory only, and checks will not resume after Corda node restart unless specified in the configuration file.
Configuration file
A standard CorDapp configuration file is used. It must be placed in the cordapps/config
directory have have the same filename as the CorDapp, apart from the extension.
E.g. for notaryhealthcheck-cordapp-4.0.jar
the configuration file should be named notaryhealthcheck-cordapp-4.0.conf
.
# A list of target X500 names to monitor on node startup.
targets = ["O=HA Notary Service, L=Zurich, C=CH"]
# Time between consecutive checks for a single target.
checkIntervalSeconds = 120
# Time to wait for a health check notarisation flow to finish before recording as failure.
checkTimeoutSeconds = 300
If a target is a notary service identity, all individual notary cluster members will be monitored as well.
Command-line client
To simplify controlling the checks, a command line client is provided that will call the respective flows via RPC.
The notaryhealthcheck-client
is built into a fat JAR and can be executed via Java:
java -jar notaryhealthcheck-client-<version>.jar -c <command> [<options]
The command must be one of:
startAll: Starts monitoring all notary nodes and services listed in the network parameters (including all cluster members for clustered notaries)
start: Starts monitoring a specified notary node. Requires use of the
--target
optionstopAll: Stops all running checks
stop: Stops monitoring a specified notary node. Requires use of the
--target
option
Options are:
-u, –user: RPC user to use for the node. This can also be stored in a config file. It must be defined either in config or on the command line.
-P, –password: RPC password. This can also be stored in a config file. It must be defined either in config or on the command line.
-h, –host: hostname or IP address of the node.
-p, –port: RPC port of the node
-w, –wait-period: Time in seconds to wait between to checks. This can also be stored in a config file. Defaults to 120, i.e. 2 minutes
-o, –wait-for-outstanding-flows: This is the time we wait before rechecking a notary for which we have a check flow in-flight, i.e. it is not responding timely or at all. This can also be stored in a config file. Defaults to 300, i.e. 5 minutes
-t, –target: A string representation of the X500 name of the notary node that we want to monitor. Can only be used with
start
orstop
-n, –notary: A string representation of the X500 name of the notary service the target is part of for clustered notaries. This can only be used with
start
orstop
. It will default to the value of--target
, so does not need to be specified for single node notaries.
Monitoring
Logfile
Every state change of the check (starting a check, successful check,
failed check, check still in-flight when the next is scheduled) will generate a log line with all the relevant stats.
By redirecting the logs for the name net.corda.notaryhealthcheck.cordapp
to a separate file,
these can be monitored separately from the logging of the rest of the node.
...
<Appenders>
...
<RollingFile name="HealthCheckFile-Appender"
fileName="${log-path}/healthcheck-${log-name}.log"
filePattern="${archive}/healthcheck-${log-name}.%date{yyyy-MM-dd}-%i.log.gz">
<PatternLayout pattern="[%-5level] %date{ISO8601}{UTC}Z [%t] %c{2}.%method - %msg %X%n"/>
...
</RollingFile>
...
</Appenders>
<Loggers>
...
<Logger name="net.corda.notaryhealthcheck.cordapp" additivity="false" level="info">
<AppenderRef ref="HealthCheckFile-Appender" />
</Logger>
...
</Loggers>
The log events leave lines like below. There are examples for single node notaries (or notary services), and for cluster members of a clustered notary, which will list a node name in addition to the notary (service) name.
The first two lines show successful checks, listing the notary (and possibly node) that was checked along with the duration of how long the notarisation took.
The second set of two lines shows a failed check, showing the notary that was checked, how long the failed notarisation took, and the error message of the failure.
[INFO ] 2019-12-09T16:19:29,394Z [pool-12-thread-13] cordapp.HealthCheckService. - Notary service identity: [O=Raft, L=Zurich, C=CH]: Check successful in 00:00:00.154 {}
[INFO ] 2019-12-09T16:17:29,356Z [pool-12-thread-2] cordapp.HealthCheckService. - Notary service identity: [O=Raft, L=Zurich, C=CH], Node identity: [O=Notary Service 0, L=Zurich, C=CH]: Check successful in 00:00:00.116 {}
[INFO ] 2019-12-09T14:36:42,774Z [pool-13-thread-2] cordapp.HealthCheckService. - Notary service identity: [O=Raft, L=Zurich, C=CH]: Check failed in 00:00:01.185, Failure: java.util.concurrent.ExecutionException: net.corda.core.flows.NotaryException: Unable to notarise transaction <Unknown> : ...
[INFO ] 2019-12-09T15:05:03,692Z [pool-12-thread-3] cordapp.HealthCheckService. - Notary service identity: [O=Raft, L=Zurich, C=CH], Node identity: [O=Notary Service 0, L=Zurich, C=CH]: Check failed in 00:00:25.471, Failure: java.util.concurrent.ExecutionException: net.corda.core.flows.NotaryException: Unable to notarise transaction <Unknown> : ...
JMX/Jolokia
The flow also populates a set of JMX metrics in the namespace net.corda.notaryhealthcheck
that can be used to
monitor notary health via a dashboard or hook up an alerter. As an example, this is the hawtio
view on a failing notary check. Note the metrics for success, fail, inflight, and maxinflightTime for the
notary service and the cluster members on the left hand side.
Was this page helpful?
Thanks for your feedback!
Chat with us
Chat with us on our #docs channel on slack. You can also join a lot of other slack channels there and have access to 1-on-1 communication with members of the R3 team and the online community.
Propose documentation improvements directly
Help us to improve the docs by contributing directly. It's simple - just fork this repository and raise a PR of your own - R3's Technical Writers will review it and apply the relevant suggestions.
We're sorry this page wasn't helpful. Let us know how we can make it better!
Chat with us
Chat with us on our #docs channel on slack. You can also join a lot of other slack channels there and have access to 1-on-1 communication with members of the R3 team and the online community.
Create an issue
Create a new GitHub issue in this repository - submit technical feedback, draw attention to a potential documentation bug, or share ideas for improvement and general feedback.
Propose documentation improvements directly
Help us to improve the docs by contributing directly. It's simple - just fork this repository and raise a PR of your own - R3's Technical Writers will review it and apply the relevant suggestions.