Don't overwrite pushgateway metrics, shift mainnet3 cron schedule (#3491)

### Description

Copying from my Discord thread
https://discord.com/channels/935678348330434570/1222226637026885652

> Instead of pushing new metrics to the pushgateway and leaving any
metrics already there in place, we replace any existing
metrics
> We use `.push` instead of `.pushAdd` here
258bf85e43/typescript/infra/src/utils/metrics.ts (L29)
> And because we run two key funders on the mainnet cluster, the
mainnet3 key funder metrics compete with the mainnet2 key funder metrics
> We start them on the same cron job frequency I guess, but we have more
keys on mainnet3 so it typically takes longer than the mainnet2 one, so
the mainnet3 metrics tend to overwrite the mainnet2 ones
> Wonder if we should start them at diff times for less nonce contention
risk anyways
> For posterity, I found this by port-forwarding to the push gateway
```
kubectl port-forward prometheus-pushgateway-587c8dc779-294wv 9091 -n monitoring
```

> Where I found only mainnet3 metrics at localhost:9091/metrics
```
# HELP hyperlane_wallet_balance Current balance of eth and other tokens in the `tokens` map for the wallet addresses in the `wallets` set
# TYPE hyperlane_wallet_balance gauge
hyperlane_wallet_balance{chain="arbitrum",hyperlane_context="hyperlane",hyperlane_deployment="mainnet3",instance="",job="key-funder",token_name="Native",token_symbol="Native",wallet_address="0xa7ECcdb9Be08178f896c26b7BbD8C3D4E844d9Ba",wallet_name="key-funder"} 9.10653822887378
```
> even though the push gateway had been running for days
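
For reference, the difference between the two calls can be modelled with a toy in-memory store. This is only a sketch of the documented Pushgateway semantics as I understand them (a PUT, i.e. `push`, replaces every metric in the group; a POST, i.e. `pushAdd`, replaces only metrics with the same name). The `MetricGroup` type, the function bodies, and the metric names are made up for illustration and are not the `prom-client` implementation.

```typescript
// Toy model of a single Pushgateway group (for example job="key-funder").
// Metric names and values are hypothetical.
type MetricGroup = Record<string, number>;

// PUT semantics (`push`): the pushed metrics replace everything in the group.
function push(_group: MetricGroup, pushed: MetricGroup): MetricGroup {
  return { ...pushed };
}

// POST semantics (`pushAdd`): only metrics with the same name are replaced,
// everything else already in the group is kept.
function pushAdd(group: MetricGroup, pushed: MetricGroup): MetricGroup {
  return { ...group, ...pushed };
}

const fromFirstJob: MetricGroup = { metric_a: 1 };
const fromSecondJob: MetricGroup = { metric_b: 2 };

const afterPush = push(fromFirstJob, fromSecondJob);
const afterPushAdd = pushAdd(fromFirstJob, fromSecondJob);

console.log(Object.keys(afterPush)); // metric_a is gone
console.log(Object.keys(afterPushAdd)); // both metrics are present
```

Note that in this model a metric pushed under the same name in the same group is still replaced by `pushAdd`; only differently named metrics survive.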

So this PR:
* Shifts the cron schedule for the mainnet3 key funder by 30 mins to avoid nonce
clobbering with the mainnet2 key funder
* Defaults to **not** overwriting existing metrics

I'll do a similar PR for v2 to stop overwriting existing metrics
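
A quick way to sanity-check the offset between two hourly schedules, as a rough sketch. The helper names are hypothetical, it only handles a plain numeric minute field, and it assumes the mainnet2 key funder still runs on the previous `15 * * * *` schedule.

```typescript
// Extracts the minute field of a five-field cron expression ("m h dom mon dow").
// Only plain numeric minutes are supported in this sketch.
function cronMinute(schedule: string): number {
  const field = schedule.trim().split(/\s+/)[0];
  const minute = Number(field);
  if (field === '' || isNaN(minute) || Math.floor(minute) !== minute || minute < 0 || minute > 59) {
    throw new Error(`Unsupported minute field in "${schedule}"`);
  }
  return minute;
}

// Minutes between two hourly jobs, measured the short way around the hour.
function hourlyOffset(a: string, b: string): number {
  const diff = Math.abs(cronMinute(a) - cronMinute(b));
  return Math.min(diff, 60 - diff);
}

const mainnet2Schedule = '15 * * * *'; // assumption: unchanged mainnet2 schedule
const mainnet3Schedule = '45 * * * *'; // new schedule from this PR

console.log(hourlyOffset(mainnet2Schedule, mainnet3Schedule)); // 30
```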

### Drive-by changes

<!--
Are there any minor or drive-by changes also included?
-->

### Related issues

Related: https://github.com/hyperlane-xyz/issues/issues/1141

### Backward compatibility

<!--
Are these changes backward compatible? Are there any infrastructure
implications, e.g. changes that would prohibit deploying older commits
using this infra tooling?

Yes/No
-->

### Testing

<!--
What kind of testing have these changes undergone?

None/Manual/Unit Tests
-->
pull/3489/head
Trevor Porter 8 months ago committed by GitHub
parent 6e39cf01af
commit 3b520640d7
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194
  1. typescript/infra/config/environments/mainnet3/funding.ts (8 lines changed)
  2. typescript/infra/src/utils/metrics.ts (8 lines changed)

@@ -9,12 +9,12 @@ import { environment } from './chains';
 export const keyFunderConfig: KeyFunderConfig = {
   docker: {
     repo: 'gcr.io/abacus-labs-dev/hyperlane-monorepo',
-    tag: 'c037206-20240220-152500',
+    tag: '7781bce-20240326-173938',
   },
-  // We're currently using the same deployer key as mainnet.
+  // We're currently using the same deployer/key funder key as mainnet2.
   // To minimize nonce clobbering we offset the key funder cron
-  // schedule by 30 minutes.
-  cronSchedule: '15 * * * *', // Every hour at the 15-minute mark
+  // to run 30 mins after the mainnet2 cron.
+  cronSchedule: '45 * * * *', // Every hour at the 45-minute mark
   namespace: environment,
   prometheusPushGateway:
     'http://prometheus-pushgateway.monitoring.svc.cluster.local:9091',

@@ -19,17 +19,17 @@ function getPushGateway(register: Registry): Pushgateway | null {
 export async function submitMetrics(
   register: Registry,
   jobName: string,
-  options?: { appendMode?: boolean },
+  options?: { overwriteAllMetrics?: boolean },
 ) {
   const gateway = getPushGateway(register);
   if (!gateway) return;
   let resp;
   try {
-    if (options?.appendMode) {
-      resp = (await gateway.pushAdd({ jobName })).resp;
-    } else {
+    if (options?.overwriteAllMetrics) {
       resp = (await gateway.push({ jobName })).resp;
+    } else {
+      resp = (await gateway.pushAdd({ jobName })).resp;
     }
   } catch (e) {
     error('Error when pushing metrics', { error: format(e) });
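
The effect of the renamed option on callers can be summarised with a small pure function. This is a sketch only: `pushMethodFor` and `SubmitOptions` are hypothetical names that mirror the branch in `submitMetrics` above and are not part of the change.

```typescript
// Hypothetical stand-in for the options argument of submitMetrics.
type SubmitOptions = { overwriteAllMetrics?: boolean };

// Returns the name of the gateway method the new branch would call.
function pushMethodFor(options?: SubmitOptions): 'push' | 'pushAdd' {
  return options?.overwriteAllMetrics ? 'push' : 'pushAdd';
}

console.log(pushMethodFor()); // pushAdd
console.log(pushMethodFor({ overwriteAllMetrics: true })); // push
```

Callers that pass no options now keep existing metrics, and overwriting has to be requested explicitly.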
