chore: fix queue metric jaggedness (#4689)
### Description

See https://github.com/hyperlane-xyz/hyperlane-monorepo/issues/4068 for the problem description.

In this fix, whenever an operation is moved from one queue to another, its metric count is decremented from the old queue and incremented for the new one.

My initial implementation approach was to update these metrics inside `queue.push(op)`, but the metrics for the operation's previous queue aren't accessible there. #4068 suggests updating them in `op.set_status`, which can't be done for the same reason, even if `op` has a pointer to the current queue's metric internally. So the fix I went for does store a pointer to the current queue metric internally in `op`, but also adds a new `op.set_status_and_update_metrics(status, new_queue_metric)` method, which **must** be used if the queue metrics are to be correctly calculated.

This works well except for when ops are removed from the confirm queue, because in the absence of a call to `set_status_and_update_metrics`, no metric decrementing is done. I considered using the `Drop` trait to decrement, but it'd have to be implemented individually for each `PendingOperation` type, which isn't very maintainable. I ended up decrementing the metric in `confirm_operation`, which is called for both batches and single submissions and, of course, all implementations of `PendingOperation`.

Here's a screenshot of my local grafana server showing no jaggedness in the e2e run, with prometheus configured to scrape every 2s:

![Screenshot 2024-10-15 at 17 26 56](https://github.com/user-attachments/assets/26004e0e-2ccf-4cec-aa23-ee2d032df25a)

### Drive-by changes

Adds the `prepare_queue` arg of `submit_single_operation` to the `instrument(skip(...))` list so it no longer pollutes logs.

### Related issues

- Fixes https://github.com/hyperlane-xyz/hyperlane-monorepo/issues/4068

### Backward compatibility

Yes

### Testing

Manually, by checking the queue length metric of an e2e run in grafana
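
### Illustrative sketch

The metric handoff described under Description can be sketched roughly as below. This is a minimal, self-contained illustration and not the code from this commit: `Gauge` is a hypothetical stand-in for a Prometheus integer gauge, and `Op`, `Status`, and `confirm` are made-up names (the real code works with `PendingOperation` implementations and `confirm_operation`). Only the method name `set_status_and_update_metrics` and its two arguments are taken from the description.

```rust
use std::sync::atomic::{AtomicI64, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for a Prometheus integer gauge.
#[derive(Default, Debug)]
pub struct Gauge(AtomicI64);

impl Gauge {
    pub fn inc(&self) {
        self.0.fetch_add(1, Ordering::Relaxed);
    }
    pub fn dec(&self) {
        self.0.fetch_sub(1, Ordering::Relaxed);
    }
    pub fn get(&self) -> i64 {
        self.0.load(Ordering::Relaxed)
    }
}

/// Hypothetical, simplified operation status.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Status {
    Preparing,
    Submitting,
    Confirmed,
}

/// Hypothetical operation that remembers the gauge of the queue it sits in.
pub struct Op {
    status: Status,
    queue_metric: Option<Arc<Gauge>>,
}

impl Op {
    pub fn new() -> Self {
        Op { status: Status::Preparing, queue_metric: None }
    }

    pub fn status(&self) -> Status {
        self.status
    }

    /// Sets the status, increments the new queue's gauge, and decrements
    /// the gauge of the queue the op was previously counted in (if any).
    pub fn set_status_and_update_metrics(&mut self, status: Status, new_queue_metric: Arc<Gauge>) {
        self.status = status;
        new_queue_metric.inc();
        if let Some(old) = self.queue_metric.replace(new_queue_metric) {
            old.dec();
        }
    }

    /// Assumed analogue of the decrement done when an op leaves the
    /// confirm queue for good: no new queue, so only decrement.
    pub fn confirm(&mut self) {
        self.status = Status::Confirmed;
        if let Some(old) = self.queue_metric.take() {
            old.dec();
        }
    }
}

fn main() {
    let prepare = Arc::new(Gauge::default());
    let submit = Arc::new(Gauge::default());

    let mut op = Op::new();
    op.set_status_and_update_metrics(Status::Preparing, Arc::clone(&prepare));
    op.set_status_and_update_metrics(Status::Submitting, Arc::clone(&submit));
    println!("prepare={} submit={}", prepare.get(), submit.get());

    op.confirm();
    println!("prepare={} submit={} status={:?}", prepare.get(), submit.get(), op.status());
}
```

Keeping the gauge handle inside the op means each move between queues is a single call that adjusts both counts together, which is why the description says the new method must be used in place of a bare status update.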
parent
9382658db3
commit
efd438f9b2