Edit understand metrics page (#1163)

* Edit understand metrics page

Signed-off-by: Alexandra Tran <alexandra.tran@consensys.net>

* more edits

Signed-off-by: Alexandra Tran <alexandra.tran@consensys.net>

* edit footnote

Signed-off-by: Alexandra Tran <alexandra.tran@consensys.net>

* Add reference to snap sync section

Signed-off-by: Alexandra Tran <alexandra.tran@consensys.net>

Signed-off-by: Alexandra Tran <alexandra.tran@consensys.net>
Alexandra Tran 2 years ago committed by GitHub
parent 11e927a449
commit 1a1fa570ab
GPG Key ID: 4AEE18F83AFDEB23
  1. docs/public-networks/how-to/connect/sync-node.md (2 changes)
  2. docs/public-networks/how-to/monitor/understand-metrics.md (109 changes)

@@ -103,6 +103,8 @@ You can't switch from fast sync to snap sync.
If your node is blocked in the middle of a fast sync, you can start over using snap sync instead by stopping the node,
deleting the data directory, and starting over using `--sync-mode=X_SNAP`.
See [how to read the Besu metrics charts](../monitor/understand-metrics.md) when using snap sync.
### Checkpoint synchronization
!!! important

@@ -4,60 +4,62 @@ tags:
- public networks
---
# Understand metrics
When running Besu on Ethereum Mainnet using [snap sync](../connect/sync-node.md#snap-synchronization),
you might notice graphical patterns that stand out in different metrics charts.
These patterns are related to the [CPU usage](#cpu-usage) and [block time](#block-time) of the Besu
sync process.
Read this page to understand how to interpret these patterns.
## CPU usage
The following screenshot from [monitoring Besu with Prometheus and Grafana] shows patterns related
to CPU usage.
![CPU Grafana Besu dashboard patterns screenshot](../../../images/besu-cpu-pattern-during-sync.png)
The CPU pattern is a "staircase" pattern, where each step represents one stage of the Besu sync process.
### 1. Block import and world state download
Step 1 highlights block import and world state download, two tasks executed in parallel in Besu.
Besu manages these two tasks with two different pipelines.
This step is CPU-bound.[^1]
Both pipelines run their stages on multiple threads.
As displayed in the following screenshot (for a VM with 8 CPUs), the CPU load average is about 7.5
and sometimes exceeds 10 (a 100% load for the 8 CPUs is 8).
This means there's more work to be done than the CPUs can handle.
![System load metrics screenshot](../../../images/system-load.png)
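Dividing the load average by the number of CPUs makes the numbers above easier to compare: a result above 1.0 means there's more queued work than the CPUs can handle. The following minimal Python sketch assumes a Unix-like host, where `os.getloadavg()` is available; the helper name `normalized_load` is hypothetical and isn't part of Besu or its dashboards.

```python
import os


def normalized_load(load_average: float, cpu_count: int) -> float:
    """Return the load average per CPU; 1.0 means every CPU is fully busy."""
    if cpu_count <= 0:
        raise ValueError("cpu_count must be positive")
    return load_average / cpu_count


if __name__ == "__main__":
    # Values from the screenshot: an 8-CPU VM with a load of about 7.5 that sometimes exceeds 10.
    print(normalized_load(7.5, 8))   # 0.9375: CPUs almost saturated
    print(normalized_load(10.0, 8))  # 1.25: more work than the CPUs can handle

    # On a Unix-like host, normalize this machine's current 1-minute load average.
    one_minute, _, _ = os.getloadavg()
    print(normalized_load(one_minute, os.cpu_count() or 1))
```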
### 2. World state healing
Step 2, world state healing, starts just after the world state download in step 1 is complete.
The peak in system CPU is related to the high rate of input and output (IO) required during this step.
IO usage is around 61% during healing, compared to only 39% during the rest of the sync.
![IO utilization metrics screenshot](../../../images/io-utilization.png)
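Disk IO utilization is commonly computed as the fraction of wall-clock time during which the device had at least one request in flight; this is what `iostat` reports as `%util`. The following Python sketch assumes a Linux host and reads the "time spent doing I/Os" counter from `/proc/diskstats`. The function names and the device name you pass in (for example `sda`) are assumptions, and the dashboard in the screenshot might compute its IO utilization panel differently.

```python
import time


def read_io_ticks_ms(device: str, path: str = "/proc/diskstats") -> int:
    """Return the total milliseconds `device` has spent doing I/O (Linux only)."""
    with open(path) as stats:
        for line in stats:
            fields = line.split()
            # fields[2] is the device name; fields[12] is "time spent doing I/Os (ms)".
            if len(fields) > 12 and fields[2] == device:
                return int(fields[12])
    raise ValueError(f"device {device!r} not found in {path}")


def io_utilization(ticks_start_ms: int, ticks_end_ms: int, interval_ms: float) -> float:
    """Percentage of the interval during which the device had at least one I/O in flight."""
    if interval_ms <= 0:
        raise ValueError("interval_ms must be positive")
    return 100.0 * (ticks_end_ms - ticks_start_ms) / interval_ms


def sample_io_utilization(device: str, interval_s: float = 1.0) -> float:
    """Sample /proc/diskstats twice, `interval_s` seconds apart, and return the utilization."""
    start_ticks, start_time = read_io_ticks_ms(device), time.monotonic()
    time.sleep(interval_s)
    end_ticks, end_time = read_io_ticks_ms(device), time.monotonic()
    return io_utilization(start_ticks, end_ticks, (end_time - start_time) * 1000.0)


if __name__ == "__main__":
    # 610 ms of I/O activity in a 1,000 ms window is 61% utilization.
    print(io_utilization(0, 610, 1000))
```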
### 3. Block import
After steps 1 and 2, in which Besu downloads and heals the world state, block import continues.
The visible drop in CPU shows that Besu finished downloading the world state nodes.
The block import step is long because Besu can't parallelize block import: it must validate each
parent block before importing a child.
!!! note
    The Besu team is currently working on other algorithms and implementations to make block
    import faster.
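The parent-before-child constraint described above can be illustrated with a toy Python sketch. This isn't Besu's implementation (Besu is written in Java and validates far more than shown here); the `Block` class and `import_chain` function are hypothetical, and SHA-256 stands in for Ethereum's Keccak-256 hashing. The point is that each check needs the already-imported parent, so the loop can't be split across threads.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """Hypothetical, simplified block: real Ethereum blocks carry many more fields."""
    number: int
    parent_hash: str
    payload: str

    @property
    def hash(self) -> str:
        # SHA-256 is a stand-in for Keccak-256 in this sketch.
        data = f"{self.number}:{self.parent_hash}:{self.payload}".encode()
        return hashlib.sha256(data).hexdigest()


def import_chain(genesis: Block, blocks: list[Block]) -> list[Block]:
    """Import blocks one at a time; a child is only accepted once its parent is imported."""
    chain = [genesis]
    for block in blocks:
        parent = chain[-1]
        # This check depends on the previous iteration's result, so it runs sequentially.
        if block.number != parent.number + 1 or block.parent_hash != parent.hash:
            raise ValueError(f"block {block.number} does not extend block {parent.number}")
        chain.append(block)
    return chain


if __name__ == "__main__":
    genesis = Block(0, "0" * 64, "genesis")
    block_1 = Block(1, genesis.hash, "first")
    block_2 = Block(2, block_1.hash, "second")
    print(len(import_chain(genesis, [block_1, block_2])))  # 3
    try:
        import_chain(genesis, [block_2, block_1])  # child offered before its parent
    except ValueError as error:
        print("rejected:", error)
```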
### 4. Block full import
In step 4, Besu executes all transactions of each block.
This is when Besu updates the world state after the healing step.
The number of blocks imported in this step depends on the speed of the sync.
It's the cumulative number of blocks behind the chain head since the last healing step.
@@ -66,53 +68,54 @@ This step consumes less CPU than the previous steps because the sequential part
-- executing transactions on the EVM -- must be single-threaded,
reducing the concurrent work at the CPU level.
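Why transaction execution is sequential can be shown with a toy Python sketch: each transaction reads the world state left by the previous one, so reordering them changes the result. This isn't Besu's EVM; `Transfer` and `execute_block` are hypothetical names, and a plain dictionary of balances stands in for the world state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    """Hypothetical, simplified transaction: move `amount` from `sender` to `recipient`."""
    sender: str
    recipient: str
    amount: int


def execute_block(state: dict[str, int], transactions: list[Transfer]) -> dict[str, int]:
    """Apply transactions one after another and return the updated world state."""
    state = dict(state)  # work on a copy so a failed block leaves the input untouched
    for tx in transactions:
        # Each transaction sees the balances written by the previous one,
        # which is why this loop runs on a single thread.
        if state.get(tx.sender, 0) < tx.amount:
            raise ValueError(f"insufficient balance for {tx.sender}")
        state[tx.sender] = state.get(tx.sender, 0) - tx.amount
        state[tx.recipient] = state.get(tx.recipient, 0) + tx.amount
    return state


if __name__ == "__main__":
    txs = [Transfer("alice", "bob", 10), Transfer("bob", "carol", 5)]
    print(execute_block({"alice": 10}, txs))  # {'alice': 0, 'bob': 5, 'carol': 5}
    # Reversing the order fails: bob has no balance until the first transfer runs.
```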
### 5. Block production and propagation
Once Besu is completely synced, it propagates blocks and executes the transactions inside each block.
Step 5, block production and propagation, shows a reduction in CPU consumption due to the idle time
while waiting for new blocks and the sequential nature of executing transactions on the EVM.
## Block time
The following screenshot shows patterns related to block time as available in the
[Besu Grafana full dashboard](https://grafana.com/grafana/dashboards/16455-besu-full/).
![Block time Grafana Besu dashboard patterns screenshot](../../../images/block-time.png)
The block time metric measures the duration of getting new blocks in Besu.
Block time is closely related to the steps described in the [CPU usage](#cpu-usage) section.
The block time pattern is also a "staircase" pattern.
### 1. Block import time
Step 1 shows block import time, the duration of importing a block.
Import includes:

- Data retrieval over the network.
- Header, body, and receipt validation.
- Persisting the block in the database.

Block import takes from a few milliseconds to tens of milliseconds.
### 2. Block full import time
Step 2 shows block full import time, the duration of importing a block (step 1) and executing all
its transactions.
Block full import takes between 1 and 2 seconds per block, depending on the number and complexity
of the transactions.
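The relationship between block import time and block full import time can be sketched as follows. This is a minimal Python illustration under stated assumptions, not how Besu records these metrics: `fetch`, `validate`, `persist`, and `execute` are hypothetical callables standing in for network retrieval, header/body/receipt validation, database persistence, and transaction execution.

```python
import time
from typing import Callable


def timed(step: Callable[[], object]) -> float:
    """Run `step` and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    step()
    return time.perf_counter() - start


def block_timings(fetch: Callable[[], object], validate: Callable[[], object],
                  persist: Callable[[], object], execute: Callable[[], object]) -> dict[str, float]:
    """Import covers fetch, validate, and persist; full import adds transaction execution."""
    import_time = timed(fetch) + timed(validate) + timed(persist)
    full_import_time = import_time + timed(execute)
    return {"import": import_time, "full_import": full_import_time}


if __name__ == "__main__":
    # Hypothetical stand-ins: a real node would hit the network, validate, write to disk,
    # and run the EVM. Here only "execute" does any (simulated) work.
    timings = block_timings(lambda: None, lambda: None, lambda: None, lambda: time.sleep(0.05))
    print(timings)
```

Because execution dominates, the full import time in this model is mostly the execution time, which matches the gap between the millisecond-scale import and the second-scale full import described above.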
### 3. Block network time
Step 3 shows block network time, the duration of propagating a block over the network and
executing all its transactions.
It usually takes between 13 and 16 seconds.
<!--links-->
[monitoring Besu with Prometheus and Grafana]: ../../../private-networks/tutorials/quickstart.md#monitor-nodes-with-prometheus-and-grafana
[^1]: A task is CPU-bound when the time required to execute it is determined only by the CPU speed.
