feat: add taint & toleration to mainnet3 relayer node pool (#4780)

### Description

See
https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/

- In GKE, added a taint to the larger relayer node pool: key `component`,
value `relayer`, effect `NoSchedule`
- Relayer workloads now carry a toleration that matches that taint
- As part of the rollout, first applied the taint with `NoExecute`
(evicting everything on the big nodes), then switched to `NoSchedule` (so
that some other pods, like daemonsets, would still get onto these bigger
nodes). I made sure the hyperlane and neutron context relayers have
these tolerations
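
For reviewers less familiar with the matching rules, here is a minimal sketch of how a toleration is compared against a taint, based on the rules in the Kubernetes docs linked above. The `Taint` and `Toleration` types and the `tolerates` / `canSchedule` helpers are hypothetical names used only for illustration; they are not part of this PR or of any Kubernetes client library.

```typescript
// Hypothetical types for illustration only; not part of this PR.
type TaintEffect = 'NoSchedule' | 'PreferNoSchedule' | 'NoExecute';

interface Taint {
  key: string;
  value?: string;
  effect: TaintEffect;
}

interface Toleration {
  key?: string;
  operator?: 'Equal' | 'Exists';
  value?: string;
  effect?: string;
}

// True when the toleration matches the taint:
// - an empty toleration effect matches every effect, otherwise effects must be equal
// - an empty key is only valid with operator Exists and then matches every key
// - operator Equal (the default) also requires equal values
function tolerates(toleration: Toleration, taint: Taint): boolean {
  const operator = toleration.operator ?? 'Equal';
  if (toleration.effect && toleration.effect !== taint.effect) return false;
  if (!toleration.key) return operator === 'Exists';
  if (toleration.key !== taint.key) return false;
  if (operator === 'Exists') return true;
  return (toleration.value ?? '') === (taint.value ?? '');
}

// A pod may be placed on a node only if every hard taint (NoSchedule or
// NoExecute) is tolerated. PreferNoSchedule is a soft preference and is
// ignored in this sketch.
function canSchedule(tolerations: Toleration[], taints: Taint[]): boolean {
  return taints
    .filter((t) => t.effect !== 'PreferNoSchedule')
    .every((taint) => tolerations.some((tol) => tolerates(tol, taint)));
}

// The taint and toleration described in this PR.
const relayerTaint: Taint = {
  key: 'component',
  value: 'relayer',
  effect: 'NoSchedule',
};
const relayerToleration: Toleration = {
  key: 'component',
  operator: 'Equal',
  value: 'relayer',
  effect: 'NoSchedule',
};
```

With these definitions, `canSchedule([relayerToleration], [relayerTaint])` is true and `canSchedule([], [relayerTaint])` is false. Note that a toleration only permits placement on the tainted nodes; it does not by itself steer relayer pods toward them.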

### Drive-by changes

<!--
Are there any minor or drive-by changes also included?
-->

### Related issues

- Fixes https://github.com/hyperlane-xyz/issues/issues/1309

### Backward compatibility

<!--
Are these changes backward compatible? Are there any infrastructure
implications, e.g. changes that would prohibit deploying older commits
using this infra tooling?

Yes/No
-->

### Testing

<!--
What kind of testing have these changes undergone?

None/Manual/Unit Tests
-->
pull/4789/head
Trevor Porter 3 weeks ago committed by GitHub
parent 83a1567a0a
commit fc3818d12a
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194
  1. typescript/infra/src/agents/index.ts (13 lines changed)
  2. typescript/infra/src/config/agent/agent.ts (8 lines changed)

@@ -164,6 +164,19 @@ export class RelayerHelmManager extends OmniscientAgentHelmManager {
      signer: signers[name],
    }));
    if (!values.tolerations) {
      values.tolerations = [];
    }
    // Relayer pods should only be scheduled on nodes with the component label set to relayer.
    // NoSchedule was chosen so that some daemonsets (like the prometheus node exporter) would not be evicted.
    values.tolerations.push({
      key: 'component',
      operator: 'Equal',
      value: 'relayer',
      effect: 'NoSchedule',
    });
    return values;
  }
}

@@ -36,6 +36,7 @@ export interface HelmRootAgentValues {
  image: HelmImageValues;
  hyperlane: HelmHyperlaneValues;
  nameOverride?: string;
  tolerations?: KubernetesToleration[];
}
// See rust/main/helm/values.yaml for the full list of options and their defaults.
@@ -132,6 +133,13 @@ export interface KubernetesComputeResources {
  memory: string;
}
export interface KubernetesToleration {
  key: string;
  operator: string;
  value: string;
  effect: string;
}
export class RootAgentConfigHelper implements AgentContextConfig {
  readonly rawConfig: RootAgentConfig;
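
As a possible follow-up (not part of this commit), the toleration fields could be narrowed from `string` to the literal values Kubernetes accepts, and the push could be made idempotent. The sketch below is only an illustration under those assumptions: `StrictToleration`, `ValuesWithTolerations`, and `addToleration` are hypothetical names that do not exist in the repository.

```typescript
// Hypothetical stricter variant of the KubernetesToleration interface above.
type TolerationOperator = 'Equal' | 'Exists';
type TaintEffect = 'NoSchedule' | 'PreferNoSchedule' | 'NoExecute';

interface StrictToleration {
  key: string;
  operator: TolerationOperator;
  value: string;
  effect: TaintEffect;
}

// Stand-in for the relevant part of HelmRootAgentValues.
interface ValuesWithTolerations {
  tolerations?: StrictToleration[];
}

// Appends the toleration unless an identical entry is already present,
// so calling it repeatedly does not produce duplicates.
function addToleration(
  values: ValuesWithTolerations,
  toleration: StrictToleration,
): void {
  const existing = values.tolerations ?? [];
  const alreadyPresent = existing.some(
    (t) =>
      t.key === toleration.key &&
      t.operator === toleration.operator &&
      t.value === toleration.value &&
      t.effect === toleration.effect,
  );
  values.tolerations = alreadyPresent ? existing : [...existing, toleration];
}

// Same toleration as the one pushed in RelayerHelmManager.
const relayerNodeToleration: StrictToleration = {
  key: 'component',
  operator: 'Equal',
  value: 'relayer',
  effect: 'NoSchedule',
};
```

Calling `addToleration(values, relayerNodeToleration)` twice on the same object leaves a single entry, and a typo such as `effect: 'NoSchedul'` would be rejected at compile time.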
