Don't overwrite pushgateway metrics, shift mainnet3 cron schedule (#3491)
### Description
Copying from my Discord thread
https://discord.com/channels/935678348330434570/1222226637026885652
> Instead of pushing new metrics to the pushgateway and leaving any
metrics that already exist there in place, we replace all existing
metrics
> We use `.push` instead of `.pushAdd` here
258bf85e43/typescript/infra/src/utils/metrics.ts (L29)
> And because we run two key funders on the mainnet cluster, the
mainnet3 key funder metrics compete with the mainnet2 key funder metrics
> We start them on the same cron job frequency I guess, but we have more
keys on mainnet3 so it typically takes longer than the mainnet2 one, so
the mainnet3 metrics tend to overwrite the mainnet2 ones
> Wonder if we should start them at different times anyway, to reduce
the risk of nonce contention
> For posterity, I found this by port-forwarding to the push gateway
```
kubectl port-forward prometheus-pushgateway-587c8dc779-294wv 9091 -n monitoring
```
> Where I found only mainnet3 metrics at localhost:9091/metrics
```
# HELP hyperlane_wallet_balance Current balance of eth and other tokens in the `tokens` map for the wallet addresses in the `wallets` set
# TYPE hyperlane_wallet_balance gauge
hyperlane_wallet_balance{chain="arbitrum",hyperlane_context="hyperlane",hyperlane_deployment="mainnet3",instance="",job="key-funder",token_name="Native",token_symbol="Native",wallet_address="0xa7ECcdb9Be08178f896c26b7BbD8C3D4E844d9Ba",wallet_name="key-funder"} 9.10653822887378
```
> even though the push gateway had been running for days
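The same check can be scripted against the text served at `/metrics`. This is a minimal sketch, assuming the output has already been fetched into a string; `labelValues` is a hypothetical helper, not part of the infra tooling, and it does not handle escaped quotes inside label values.

```typescript
// Hypothetical helper: list the distinct values of one label across all
// series in Prometheus text exposition output.
function labelValues(metricsText: string, label: string): string[] {
  // Matches `label="value"` when preceded by `{` or `,` inside a label set.
  const pattern = new RegExp("[{,]" + label + '="([^"]*)"', "g");
  const seen: { [value: string]: true } = {};
  for (const line of metricsText.split("\n")) {
    if (line.charAt(0) === "#") continue; // skip # HELP and # TYPE lines
    let match: RegExpExecArray | null;
    while ((match = pattern.exec(line)) !== null) {
      seen[match[1]] = true;
    }
  }
  return Object.keys(seen).sort();
}

// Sample series copied from the pushgateway output above.
const sample =
  '# TYPE hyperlane_wallet_balance gauge\n' +
  'hyperlane_wallet_balance{chain="arbitrum",hyperlane_context="hyperlane",' +
  'hyperlane_deployment="mainnet3",instance="",job="key-funder",' +
  'token_name="Native",token_symbol="Native",' +
  'wallet_address="0xa7ECcdb9Be08178f896c26b7BbD8C3D4E844d9Ba",' +
  'wallet_name="key-funder"} 9.10653822887378';

const deployments = labelValues(sample, "hyperlane_deployment");
console.log(deployments);
```

If both key funders' metrics were present, the result would list both deployments rather than only `mainnet3`.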
So this PR:
* Shifts the cron schedule for the key funder by 30 mins to avoid nonce
clobbering with the mainnet2 key funder
* Changes the default to **not** overwrite existing metrics
I'll do a similar PR for v2 to stop overwriting existing metrics
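To illustrate the difference between the two push modes, here is a minimal in-memory sketch. It assumes the client maps `.push` to an HTTP PUT and `.pushAdd` to an HTTP POST against the pushgateway, as prom-client does. The `MetricGroup` type, the `put` and `post` functions, and the metric names are all hypothetical and only model the documented pushgateway semantics for a single grouping key.

```typescript
// In-memory model of the metrics stored under one pushgateway grouping key
// (for example job="key-funder"). Keys are metric names; values are the
// series lines stored under that name.
type MetricGroup = { [metricName: string]: string[] };

// PUT semantics (`.push`): everything in the group is replaced by the push.
function put(_existing: MetricGroup, pushed: MetricGroup): MetricGroup {
  return { ...pushed };
}

// POST semantics (`.pushAdd`): only metric names present in the push are
// replaced; other metric names in the group are left untouched.
function post(existing: MetricGroup, pushed: MetricGroup): MetricGroup {
  return { ...existing, ...pushed };
}

// Hypothetical metric names, for illustration only.
const existing: MetricGroup = {
  metric_a: ['metric_a{source="first"} 1'],
  metric_b: ['metric_b{source="first"} 2'],
};
const pushed: MetricGroup = {
  metric_a: ['metric_a{source="second"} 3'],
};

const afterPut = put(existing, pushed);
const afterPost = post(existing, pushed);

console.log(Object.keys(afterPut).sort());
console.log(Object.keys(afterPost).sort());
```

In this model `put` drops `metric_b`, while `post` keeps it. Note that, per the pushgateway documentation, a POST still replaces existing series that share a metric name with the pushed ones within the same grouping key, which is why `metric_a` ends up with only the newly pushed series in both cases.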
### Drive-by changes
<!--
Are there any minor or drive-by changes also included?
-->
### Related issues
Related: https://github.com/hyperlane-xyz/issues/issues/1141
### Backward compatibility
<!--
Are these changes backward compatible? Are there any infrastructure
implications, e.g. changes that would prohibit deploying older commits
using this infra tooling?
Yes/No
-->
### Testing
<!--
What kind of testing have these changes undergone?
None/Manual/Unit Tests
-->
pull/3489/head
parent
6e39cf01af
commit
3b520640d7