Describe indexer structure and list existing fetchers (#1763)

* Describe indexer structure and list existing fetchers

  Resolves #1628. A brief description of the indexer structure is added to the `README.md` of the `indexer` application. Installation instructions are removed, as they don't make sense for a sub-app of an umbrella application.

* Add CHANGELOG.md entry
6 years ago · ef1b0cd273
parent 6a96128d94
commit ef1b0cd273
2 changed files with 90 additions and 16 deletions
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -20,6 +20,7 @@
 - [#1718](https://github.com/poanetwork/blockscout/pull/1718) - Flatten indexer module hierarchy and supervisor tree
 - [#1753](https://github.com/poanetwork/blockscout/pull/1753) - Add a check mark to decompiled contract tab
 - [#1744](https://github.com/poanetwork/blockscout/pull/1744) - remove `0x0..0` from tests
+ - [#1763](https://github.com/poanetwork/blockscout/pull/1763) - Describe indexer structure and list existing fetchers


 ## 1.3.9-beta
--- a/apps/indexer/README.md
+++ b/apps/indexer/README.md
@@ -2,22 +2,95 @@

 **TODO: Add description**

-## Installation
-
-If [available in Hex](https://hex.pm/docs/publish), the package can be installed
-by adding `indexer` to your list of dependencies in `mix.exs`:
-
-```elixir
-def deps do
-  [
-    {:indexer, "~> 0.1.0"}
-  ]
-end
-```
-
-Documentation can be generated with [ExDoc](https://github.com/elixir-lang/ex_doc)
-and published on [HexDocs](https://hexdocs.pm). Once published, the docs can
-be found at [https://hexdocs.pm/indexer](https://hexdocs.pm/indexer).
+## Structure
+
+The indexer is split into multiple fetchers. Each fetcher has its own supervision tree with a separate `TaskSupervisor`, which makes memory, message, or blocking problems easier to detect.
+
+Most fetchers have their `Supervisor` module generated automatically by the `use Indexer.Fetcher` macro.
+
+The fetchers are described below, but the final step of almost all of them is importing data into the database.
+A map of lists of different entities is constructed and passed to the `Explorer.Chain.import` function.
+This function assigns runners from the `Explorer.Chain.Import.Runner` namespace by matching each key in the map to the `option_key` attribute of a runner.
+The runners are then executed in the order specified by the stages in `Explorer.Chain.Import.Stage`.
+
+### Transformers
+
+Some data has to be extracted from already fetched data, and there are several transformers in `lib/indexer/transform` that do just that. They normally accept one part of the `Chain.import`-able map and return another part of it.
+
+- `addresses`: extracts all encountered addresses from different entities
+- `address_coin_balances`: detects entities that change coin balances (transactions, minted blocks, etc.) to create coin balance entities for further fetching
+- `token_transfers`: parses logs to extract token transfers
+- `mint_transfers`: parses logs to extract token mint transfers
+- `address_token_balances`: creates token balance entities for further fetching, based on detected token transfers
+- `blocks`: extracts block signer hash from additional data for Clique chains
+
+
+### Root fetchers
+
+- `pending_transaction`: fetches pending transactions (i.e. not yet collated into a block) every second (`pending_transaction_interval`)
+- `block/realtime`: listens for new blocks over a websocket and polls the node for new blocks, importing new ones one by one
+- `block/catchup`: gets unfetched ranges of blocks and imports them in batches
+
+Both block fetchers retrieve/extract the blocks themselves and the following additional data:
+- `block_second_degree_relations`
+- `transactions`
+- `logs`
+- `token_transfers`
+- `addresses`
+
+The following stubs for further async fetching are inserted as well:
+- `block_rewards`
+- `address_coin_balances`
+- `address_token_balances`
+- `tokens`
+
+The realtime fetcher also immediately fetches from the node:
+- current balances for `addresses`
+- `address_coin_balances`
+
+The following async fetchers are launched for importing missing data:
+- `replaced_transaction`
+- `block_reward`
+- `uncle_block`
+- `internal_transaction`
+- `coin_balance` (only in the catchup fetcher)
+- `token_balance`
+- `token`
+- `contract_code`
+
+### Async fetchers
+
+These are responsible for fetching additional block data not retrieved by the root fetchers.
+Most of them are based on `BufferedTask`, and the basic algorithm goes like this:
+1. Make an initial streaming request to the database to fetch the identifiers of all existing unfetched items.
+2. Accept new identifiers for fetching via the `async_fetch()` function.
+3. Split the identifiers into batches and run tasks on the `TaskSupervisor` according to the `max_batch_size` and `max_concurrency` settings.
+4. Make requests using `EthereumJSONRPC`.
+5. Optionally post-process results using transformers.
+6. Optionally pass new identifiers to other async fetchers using `async_fetch`.
+7. Run `Chain.import` with fetched data.
+
+- `replaced_transaction`: not a fetcher per se, but rather an async worker that discards previously pending transactions after they are replaced by new pending transactions with the same nonce or are collated into a block
+- `block_reward`: missing `block_rewards` for consensus blocks
+- `uncle_block`: blocks for `block_second_degree_relations` with null `uncle_fetched_at`
+- `internal_transaction`: for either `blocks` (Parity) or `transactions` with null `internal_transactions_indexed_at`
+- `coin_balance`: for `address_coin_balances` with null `value_fetched_at`
+- `token_balance`: for `address_token_balances` with null `value_fetched_at`. Also upserts `address_current_token_balances`
+- `token`: for `tokens` with `cataloged == false`
+- `contract_code`: for `transactions` with non-null `created_contract_address_hash` and null `created_contract_code_indexed_at`
+
+Additionally:
+- `token_updater` runs every 2 days to update token metadata
+- `coin_balance_on_demand` is triggered from the web UI to ensure that an address balance is as up-to-date as possible
+
+### Temporary workers
+
+These workers fetch information that existing fetchers previously did not fetch, or fetched incorrectly.
+Once all deployed instances have all the needed data, these workers should be deprecated and removed.
+
+- `uncataloged_token_transfers`: extracts token transfers from logs that previously weren't parsed due to an unknown format
+- `addresses_without_codes`: forces a complete refetch of blocks that have created contract addresses without contract code
+- `failed_created_addresses`: forces a refetch of contract code for failed transactions, which previously got incorrectly overwritten

 ## Memory Usage