* Don't overload `--metrics-host` and `--metrics-port` for push mode. Instead, add `--metrics-push-enabled`, `--metrics-push-host` and `--metrics-push-port`, rename `--metrics-prometheus-job` to `--metrics-push-prometheus-job`, and remove `--metrics-mode`.
* Show an error if both metrics and metrics-push are enabled.
* Allow shutdown if we cannot communicate with the Push Gateway.
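
The mutual-exclusion check described above could look roughly like the following. This is a minimal sketch, not the code in this change: the `MetricsOptions` class, its fields, and the error message wording are all hypothetical.

```java
import java.util.Optional;

// Hypothetical holder for the parsed flag values; not Pantheon's actual class.
final class MetricsOptions {
  final boolean metricsEnabled;     // value of --metrics-enabled
  final boolean metricsPushEnabled; // value of --metrics-push-enabled

  MetricsOptions(final boolean metricsEnabled, final boolean metricsPushEnabled) {
    this.metricsEnabled = metricsEnabled;
    this.metricsPushEnabled = metricsPushEnabled;
  }

  // Returns an error message when both modes are enabled, otherwise empty.
  // The message text is illustrative only.
  Optional<String> validate() {
    if (metricsEnabled && metricsPushEnabled) {
      return Optional.of(
          "--metrics-enabled and --metrics-push-enabled cannot be used at the same time");
    }
    return Optional.empty();
  }
}
```

A caller would run `validate()` once after parsing the command line and abort startup if a message is present.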
Signed-off-by: Adrian Sutton <adrian.sutton@consensys.net>
Sometimes metrics are hard to poll (for example, Docker containers with
varying IP addresses). The push gateway exists for that case. This
extends the metrics system to support either push or pull mode (but not
both at the same time).
Three new flags:
`--metrics-mode=`<`push`|`pull`> - Whether we are in pull mode (the default), where
Prometheus is expected to poll, or push mode, where Pantheon pushes to
a push gateway.
`--metrics-push-interval=`<_integer_> - The frequency, in seconds, between pushes to
the push gateway. Only relevant in push mode.
`--metrics-prometheus-job=`<_string_> - The name of the job to report in the push gateway.
Also, `--metrics-host=` and `--metrics-port=` gain new meaning in push mode: instead of
the address of the server Pantheon opens, they are the host and port of the push gateway to push to.
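
As an illustration of the push interval, here is a minimal sketch of a periodic push loop. The `MetricsPusher` interface and `PushLoop` class are hypothetical stand-ins, not the classes in this change, and the actual call to the push gateway is left behind the interface.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical abstraction over "send the current metrics to the push gateway".
interface MetricsPusher {
  void push(String job) throws Exception;
}

final class PushLoop {
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  // Schedules a push every intervalSeconds (the --metrics-push-interval value),
  // reporting under the given job name.
  void start(final MetricsPusher pusher, final String job, final long intervalSeconds) {
    scheduler.scheduleAtFixedRate(
        () -> {
          try {
            pusher.push(job);
          } catch (final Exception e) {
            // Swallow the failure: an uncaught exception would cancel all future runs.
            System.err.println("Metrics push failed: " + e.getMessage());
          }
        },
        0,
        intervalSeconds,
        TimeUnit.SECONDS);
  }

  // Stops scheduling; returns true if the scheduler terminated within the timeout.
  boolean stop() {
    scheduler.shutdownNow();
    try {
      return scheduler.awaitTermination(5, TimeUnit.SECONDS);
    } catch (final InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }
}
```

Catching the exception inside the task means an unreachable gateway neither stops later pushes nor prevents `stop()` from completing.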
Signed-off-by: Adrian Sutton <adrian.sutton@consensys.net>
Opens an HTTP service, similar to the JSON-RPC service, that serves only
/metrics requests in a format Prometheus can read.
New CLI flags are --metrics-enabled and --metrics-listen, mirroring the
equivalent --rpc and --ws flags.
--host-whitelist is respected in the same way as for the JSON-RPC endpoint.
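
The request handling described above can be sketched without any HTTP library. Everything here is an assumption for illustration: the `MetricsEndpoint` and `Response` types, the 403 and 404 status codes, and the way the Host header is matched are not taken from this change.

```java
import java.util.List;

final class MetricsEndpoint {
  // Hypothetical response type: HTTP status code plus body text.
  static final class Response {
    final int status;
    final String body;

    Response(final int status, final String body) {
      this.status = status;
      this.body = body;
    }
  }

  private final List<String> hostWhitelist; // values of --host-whitelist

  MetricsEndpoint(final List<String> hostWhitelist) {
    this.hostWhitelist = hostWhitelist;
  }

  // Serves only /metrics, and only to requests whose Host header is whitelisted.
  Response handle(final String path, final String hostHeader, final String metricsText) {
    if (!isHostAllowed(hostHeader)) {
      return new Response(403, "Host not authorized.");
    }
    if (!"/metrics".equals(path)) {
      return new Response(404, "Not found.");
    }
    return new Response(200, metricsText);
  }

  // Compares the host part of the Host header against the whitelist.
  // Simplification: IPv6 literals such as "[::1]" are not handled.
  boolean isHostAllowed(final String hostHeader) {
    if (hostHeader == null) {
      return false;
    }
    final int colon = hostHeader.lastIndexOf(':');
    final String host = colon >= 0 ? hostHeader.substring(0, colon) : hostHeader;
    return hostWhitelist.stream().anyMatch(allowed -> allowed.equalsIgnoreCase(host));
  }
}
```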
Signed-off-by: Adrian Sutton <adrian.sutton@consensys.net>
* Time all tasks
This is fairly high touch, consisting of 3 things:
* Moving to Prometheus's Summary for timers
  * Timing at .2, .5, .8, .9, .99, and 1.0 (I believe 1.0 actually reports the max)
* Timing all abstract EthTasks
* The bulk of the changes: plumbing the timing context everywhere we need it
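
To make the timing context and the quantiles concrete, here is a minimal sketch in plain Java. It is not Prometheus's Summary: it keeps every sample and computes exact nearest-rank quantiles, and the `TaskTimer` and `TimingContext` names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class TaskTimer {
  private final List<Double> samplesSeconds = new ArrayList<>();

  // Timing context: created when a task starts, closed when it finishes.
  final class TimingContext implements AutoCloseable {
    private final long startNanos = System.nanoTime();

    @Override
    public void close() {
      record((System.nanoTime() - startNanos) / 1e9);
    }
  }

  TimingContext startTimer() {
    return new TimingContext();
  }

  synchronized void record(final double seconds) {
    samplesSeconds.add(seconds);
  }

  // Exact nearest-rank quantile over all recorded samples.
  // With this definition, q = 1.0 returns the largest sample.
  synchronized double quantile(final double q) {
    if (samplesSeconds.isEmpty()) {
      return Double.NaN;
    }
    final List<Double> sorted = new ArrayList<>(samplesSeconds);
    Collections.sort(sorted);
    final int rank = (int) Math.ceil(q * sorted.size());
    return sorted.get(Math.max(rank, 1) - 1);
  }
}
```

A task would wrap its work in `try (TaskTimer.TimingContext ignored = timer.startTimer()) { ... }` so the duration is recorded however the task exits.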
Signed-off-by: Adrian Sutton <adrian.sutton@consensys.net>
* Plumb in three more metrics
* add blockchain_height gauge
* add blockchain_difficulty_total gauge
* add blockchain_announcedBlock_ingest histogram
This involved some deep plumbing: the metrics system now needs to be
created in PantheonCommand, with trickle-down effects on the other
consensus engines. This is likely where it should live anyway.
Signed-off-by: Adrian Sutton <adrian.sutton@consensys.net>
Metrics being captured initially:
* Total number of peers ever connected to.
* Total number of peers disconnected, by disconnect reason and whether the disconnect was initiated locally or remotely.
* Current number of peers.
* Timing for processing JSON-RPC requests, broken down by method name.
* Generic JVM and process metrics (memory used, heap size, thread count, time spent in GC, file descriptors opened, CPU time, etc.).
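
A counter broken down by labels, such as disconnects by reason and initiator, can be sketched as below. The `LabelledCounter` class and the label values used with it are hypothetical and only illustrate the idea. They are not the metrics API in this change.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

final class LabelledCounter {
  // One independent count per distinct combination of label values.
  private final Map<String, LongAdder> counts = new ConcurrentHashMap<>();

  // Increments the count for this combination of label values.
  void inc(final String... labelValues) {
    counts.computeIfAbsent(String.join(",", labelValues), key -> new LongAdder()).increment();
  }

  // Returns the count for this combination, or 0 if it was never incremented.
  long get(final String... labelValues) {
    final LongAdder adder = counts.get(String.join(",", labelValues));
    return adder == null ? 0 : adder.sum();
  }
}
```

For example, a disconnect handler could call `disconnects.inc(reason, initiatedLocally ? "local" : "remote")`, where `reason` and `initiatedLocally` are whatever the peer layer reports.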
Signed-off-by: Adrian Sutton <adrian.sutton@consensys.net>