diff --git a/.gitignore b/.gitignore index 0c9eb11c..fad4b73b 100644 --- a/.gitignore +++ b/.gitignore @@ -10,3 +10,5 @@ dist *.rst *.lock *.svg +laser* +lol.py diff --git a/README.md b/README.md index 561f2b7d..e52952fb 100644 --- a/README.md +++ b/README.md @@ -20,47 +20,24 @@ $ cd mythril $ python setup.py install ``` +Note that Mythril requires Python 3.5 to work. + You also need a [go-ethereum](https://github.com/ethereum/go-ethereum) node that is synced with the network (note that Mythril uses non-standard RPC APIs only supported by go-ethereum, so other clients likely won't work). Start the node as follows: ```bash -$ geth --rpc --rpcapi eth,admin,debug --syncmode fast +$ geth --rpc --rpcapi eth,debug --syncmode fast ``` -### Database initialization - -Mythril builds its own contract database to enable fast search operations. This is to enable operations like those described in the [legendary "Mitch Brenner" blog post](https://medium.com/@rtaylor30/how-i-snatched-your-153-037-eth-after-a-bad-tinder-date-d1d84422a50b) in seconds instead of days. Unfortunately, the initial sync process is slow. You don't need to sync the whole blockchain right away though: If you abort the syncing process with `ctrl+c`, it will be auto-resumed the next time you run the `--init-db` command. - -```bash -$ myth --init-db -Starting synchronization from latest block: 4323706 -Processing block 4323000, 3 individual contracts in database -(...) -``` - -Note that syncing doesn't take quite as long as it first seems, because the blocks get smaller towards the beginning of the chain. - -The default behavior is to only sync contracts with a non-zero balance. You can disable this behavior with the `--sync-all` flag, but be aware that this will result in a huge (as in: dozens of GB) database. - ## Command line usage The Mythril command line tool (aptly named `myth`) allows you to conveniently access some of Mythril's functionality. 
-### Searching the database - -The search feature allows you to find contract instances that contain specific function calls and opcode sequences. It supports simple boolean expressions, such as: - -```bash -$ myth --search "func#changeMultisig(address)#" -$ myth --search "code#PUSH1 0x50,POP#" -$ myth --search "func#changeMultisig(address)# and code#PUSH1 0x50#" -``` - ### Disassembler Use the `-d` flag to disassemble code. The disassembler accepts a bytecode string or a contract address as its input. ```bash -$ myth -d -c "$ ./myth -d -c "5060" +$ myth -d -c "0x6060" 0 PUSH1 0x60 ``` @@ -77,67 +54,61 @@ $ myth -d -a "0x2a0c0dbecc7e4d658f48e01e3fa353f44050c208" 1137 ISZERO ``` -Adding the `-g FILENAME` option will output a call graph: +### Control flow graph + +Mythril integrates the LASER symbolic virtual machine. Right now, this is mainly used for CFG generation. The `-g FILENAME` option generates an [interactive jsViz graph](http://htmlpreview.github.io/?https://github.com/b-mueller/mythril/blob/master/static/mythril.html): ```bash -$ myth -d -a "0xFa52274DD61E1643d2205169732f29114BC240b3" -g ./graph.svg +$ myth -g ./graph.html -a "0xFa52274DD61E1643d2205169732f29114BC240b3" ``` -![callgraph](https://raw.githubusercontent.com/b-mueller/mythril/master/static/callgraph.png "Call graph") +![callgraph](https://raw.githubusercontent.com/b-mueller/mythril/master/static/callgraph6.png "Call graph") -Note that currently, Mythril only processes `JUMP` and `JUMPI` instructions with immediately preceding `PUSH`, but doesn't understand dynamic jumps and function calls. +The "bounce" effect, while awesome (and thus enabled by default), sometimes messes up the graph layout. If that happens, disable the effect with the `--disable-physics` flag. -### Tracing Code +### Contract search -You can run a code trace in the PyEthereum virtual machine. Optionally, input data can be passed via the `--data` flag. +Mythril builds its own contract database to enable fast search operations. 
This is to enable operations like those described in the [legendary "Mitch Brenner" blog post](https://medium.com/@rtaylor30/how-i-snatched-your-153-037-eth-after-a-bad-tinder-date-d1d84422a50b) in ~~seconds~~ minutes instead of days. Unfortunately, the initial sync process is slow. You don't need to sync the whole blockchain right away though: If you abort the syncing process with `ctrl+c`, it will be auto-resumed the next time you run the `--init-db` command. ```bash -$ myth -t -c "0x60506050" -0 PUSH1 0x50; STACK: [] -2 PUSH1 0x50; STACK: [0x50] - -$ myth -t -a "0x3665f2bf19ee5e207645f3e635bf0f4961d661c0" -0 PUSH1 0x60; STACK: [] -2 PUSH1 0x40; STACK: [0x60] -4 MSTORE; STACK: [0x60, 0x40] -5 CALLDATASIZE; STACK: [] +$ myth --init-db +Starting synchronization from latest block: 4323706 +Processing block 4323000, 3 individual contracts in database (...) ``` -#### Finding cross-references +Mythril retrieves contract data over RPC by default. You can switch to IPC using the `--ipc` flag. -It is often useful to find other contracts referenced by a particular contract. Let's assume you want to search for contracts that fulfill conditions similar to the [Parity Multisig Wallet Bug](http://hackingdistributed.com/2017/07/22/deep-dive-parity-bug/). First, you want to find a list of contracts that use the `DELEGATECALL` opcode: +The default behavior is to only sync contracts with a non-zero balance. You can disable this behavior with the `--sync-all` flag, but be aware that this will result in a huge (as in: dozens of GB) database. + +#### Searching from the command line + +The search feature allows you to find contract instances that contain specific function calls and opcode sequences. 
It supports simple boolean expressions, such as: ```bash -$ myth --search "code#DELEGATECALL#" -Matched contract with code hash 07459966443977122e639cbf7804c446 -Address: 0x76799f77587738bfeef09452df215b63d2cfb08a, balance: 1000000000000000 -Address: 0x3582d2a3b67d63ed10f1ecaef0dca71b9283b543, balance: 92000000000000000000 -Address: 0x4b9bc00c35f7cee95c65c3c9836040c37dec9772, balance: 89000000000000000000 -Address: 0x156d5687a201affb3f1e632dcfb9fde4b0128211, balance: 29500000000000000000 -(...) +$ myth --search "func#changeMultisig(address)#" +$ myth --search "code#PUSH1 0x50,POP#" +$ myth --search "func#changeMultisig(address)# and code#PUSH1 0x50#" ``` -Note that "code hash" in the above output refers to the contract's index in the database. The following lines ("Address: ...") list instances of same contract deployed on the blockchain. +#### Finding cross-references -You can then use the `--xrefs` flag to find the addresses of referenced contracts: +It is often useful to find other contracts referenced by a particular contract. E.g.: ```bash +$ myth --search "code#DELEGATECALL#" +Matched contract with code hash 07459966443977122e639cbf7804c446 +Address: 0x76799f77587738bfeef09452df215b63d2cfb08a, balance: 1000000000000000 $ myth --xrefs 07459966443977122e639cbf7804c446 5b9e8728e316bbeb692d22daaab74f6cbf2c4691 ``` -## Custom scripts - -By combining Mythril and [PyEthereum](https://github.com/ethereum/pyethereum) modules, you can automate more complex static and dynamic analysis tasks. Here is an [example](https://github.com/b-mueller/mythril/blob/master/examples/find-fallback-dcl.py). - ## Issues -The RPC database sync solution is not very efficient. I explored some other options, including: +The database sync is currently not very efficient. - Using PyEthereum: I encountered issues syncing PyEthereum with Homestead. Also, PyEthApp only supports Python 2.7, which causes issues with other important packages. 
- Accessing the Go-Ethereum LevelDB: This would be a great option. However, PyEthereum database code seems unable to deal with Go-Ethereum's LevelDB. It would take quite a bit of effort to figure this out. -- IPC might allow for faster sync then RPC - haven't tried it yet. I'm writing this in my spare time, so contributors would be highly welcome! diff --git a/myth b/myth index 1ed9e460..b0ea5a62 100755 --- a/myth +++ b/myth @@ -11,12 +11,15 @@ from mythril.ether.contractstorage import get_persistent_storage from mythril.rpc.client import EthJsonRpc from mythril.ipc.client import EthIpc from ethereum import utils +from laser.ethereum import laserfree +import logging import binascii import sys import argparse import os import re + def searchCallback(code_hash, code, addresses, balances): print("Matched contract with code hash " + code_hash ) @@ -31,32 +34,43 @@ def exitWithError(message): parser = argparse.ArgumentParser(description='Bug hunting on the Ethereum blockchain') -parser.add_argument('-d', '--disassemble', action='store_true', help='disassemble, specify input with -c or -a') -parser.add_argument('-t', '--trace', action='store_true', help='trace, use with -c or -a and --data (optional)') -parser.add_argument('-c', '--code', help='hex-encoded bytecode string ("6060604052...")', metavar='BYTECODE') -parser.add_argument('-a', '--address', help='contract address') -parser.add_argument('-o', '--outfile') -parser.add_argument('-g', '--graph', help='when disassembling, also generate a callgraph', metavar='OUTPUT_FILE') -parser.add_argument('--ipc', help='use ipc interface', action='store_true') -parser.add_argument('--data', help='message call input data for tracing') -parser.add_argument('--search', help='search the contract database') -parser.add_argument('--xrefs', help='get xrefs from contract in database', metavar='CONTRACT_HASH') -parser.add_argument('--hash', help='calculate function signature hash', metavar='SIGNATURE') -parser.add_argument('--init-db', 
action='store_true', help='Initialize the contract database') -parser.add_argument('--sync-all', action='store_true', help='Also sync contracts with zero balance') -parser.add_argument('--rpchost', default='127.0.0.1', help='RPC host') -parser.add_argument('--rpcport', type=int, default=8545, help='RPC port') + +commands = parser.add_argument_group('commands') +commands.add_argument('-d', '--disassemble', action='store_true', help='disassemble, specify input with -c or -a') +commands.add_argument('-t', '--trace', action='store_true', help='trace, use with -c or -a and --data (optional)') +commands.add_argument('-g', '--graph', help='generate a call graph', metavar='OUTPUT_FILE') +commands.add_argument('-l', '--fire-lasers', action='store_true', help='detect vulnerabilities, use with -c or -a') +commands.add_argument('-s', '--search', help='search the contract database') +commands.add_argument('--xrefs', help='get xrefs from contract in database', metavar='CONTRACT_HASH') +commands.add_argument('--hash', help='calculate function signature hash', metavar='SIGNATURE') +commands.add_argument('--init-db', action='store_true', help='initialize the contract database') + +inputs = parser.add_argument_group('input arguments') +inputs.add_argument('-c', '--code', help='hex-encoded bytecode string ("6060604052...")', metavar='BYTECODE') +inputs.add_argument('-a', '--address', help='contract address') +inputs.add_argument('--data', help='message call input data for tracing') + +options = parser.add_argument_group('options') +options.add_argument('--sync-all', action='store_true', help='Also sync contracts with zero balance') +options.add_argument('--rpchost', default='127.0.0.1', help='RPC host') +options.add_argument('--rpcport', type=int, default=8545, help='RPC port') +options.add_argument('--ipc', help='use IPC interface instead of RPC', action='store_true') +options.add_argument('--disable-physics', action='store_true', help='disable graph physics simulation') 
+options.add_argument('-v', type=int, help='log level (0-2)', metavar='LOG_LEVEL') + try: db_dir = os.environ['DB_DIR'] except KeyError: db_dir = None -contract_storage = get_persistent_storage(db_dir) - args = parser.parse_args() -if (args.disassemble): +if (args.v): + if (0 <= args.v < 3): + logging.basicConfig(level=[logging.NOTSET, logging.INFO, logging.DEBUG][args.v]) + +if (args.disassemble or args.graph or args.fire_lasers): if (args.code): encoded_bytecode = args.code @@ -68,7 +82,7 @@ if (args.disassemble): encoded_bytecode = eth.eth_getCode(args.address) except Exception as e: exitWithError("Exception loading bytecode via IPC: " + str(e)) - + else: try: eth = EthJsonRpc(args.rpchost, args.rpcport) @@ -78,24 +92,37 @@ if (args.disassemble): exitWithError("Exception loading bytecode via RPC: " + str(e)) else: - exitWithError("Disassembler: Provide the input bytecode via -c BYTECODE or --id ID") + exitWithError("No input bytecode. Please provide the code via -c BYTECODE or -a address") try: disassembly = Disassembly(encoded_bytecode) - # instruction_list = asm.disassemble(util.safe_decode(encoded_bytecode)) except binascii.Error: exitWithError("Disassembler: Invalid code string.") - easm_text = disassembly.get_easm() + if (args.disassemble): - if (args.outfile): - util.string_to_file(args.outfile, easm_text) - else: + easm_text = disassembly.get_easm() sys.stdout.write(easm_text) - if (args.graph): - generate_callgraph(disassembly, args.graph) + elif (args.graph): + + if (args.disable_physics): + physics = False + else: + physics = True + + html = generate_callgraph(disassembly, physics) + try: + with open(args.graph, "w") as f: + f.write(html) + except Exception as e: + print("Error saving graph: " + str(e)) + + + elif (args.fire_lasers): + + laserfree.fire(disassembly) elif (args.trace): @@ -126,34 +153,36 @@ elif (args.trace): else: print(str(i['pc']) + " " + i['op'] + ";\tSTACK: " + i['stack']) -elif (args.search): +else: + contract_storage = 
get_persistent_storage(db_dir) + if (args.search): - try: - contract_storage.search(args.search, searchCallback) - except SyntaxError: - exitWithError("Syntax error in search expression.") + try: + contract_storage.search(args.search, searchCallback) + except SyntaxError: + exitWithError("Syntax error in search expression.") -elif (args.xrefs): + elif (args.xrefs): - try: - contract_hash = util.safe_decode(args.xrefs) - except binascii.Error: - exitWithError("Invalid contract hash.") + try: + contract_hash = util.safe_decode(args.xrefs) + except binascii.Error: + exitWithError("Invalid contract hash.") - try: - contract = contract_storage.get_contract_by_hash(contract_hash) - print("\n".join(contract.get_xrefs())) - except KeyError: - exitWithError("Contract not found in the database.") - -elif (args.init_db): - if args.ipc: - contract_storage.initialize(args.rpchost, args.rpcport, args.sync_all, args.ipc) - else: - contract_storage.initialize(args.rpchost, args.rpcport, args.sync_all, args.ipc) + try: + contract = contract_storage.get_contract_by_hash(contract_hash) + print("\n".join(contract.get_xrefs())) + except KeyError: + exitWithError("Contract not found in the database.") -elif (args.hash): - print("0x" + utils.sha3(args.hash)[:4].hex()) + elif (args.init_db): + if args.ipc: + contract_storage.initialize(args.rpchost, args.rpcport, args.sync_all, args.ipc) + else: + contract_storage.initialize(args.rpchost, args.rpcport, args.sync_all, args.ipc) -else: - parser.print_help() + elif (args.hash): + print("0x" + utils.sha3(args.hash)[:4].hex()) + + else: + parser.print_help() diff --git a/mythril/disassembler/callgraph.py b/mythril/disassembler/callgraph.py index ceea5d12..0d851bb3 100644 --- a/mythril/disassembler/callgraph.py +++ b/mythril/disassembler/callgraph.py @@ -1,64 +1,140 @@ -import graphviz as gv - - -styles = { - 'graph': { - 'overlap': 'false', - 'fontsize': '16', - 'fontcolor': 'white', - 'bgcolor': '#333333', - 'concentrate':'true', - }, - 
'nodes': { - 'fontname': 'Helvetica', - 'shape': 'box', - 'fontcolor': 'white', - 'color': 'white', - 'style': 'filled', - 'fillcolor': '#006699', - }, - 'edges': { - 'style': 'dashed', - 'dir': 'forward', - 'color': 'white', - 'arrowhead': 'normal', - 'fontname': 'Courier', - 'fontsize': '12', - 'fontcolor': 'white', - } -} - -def apply_styles(graph, styles): - graph.graph_attr.update( - ('graph' in styles and styles['graph']) or {} - ) - graph.node_attr.update( - ('nodes' in styles and styles['nodes']) or {} - ) - graph.edge_attr.update( - ('edges' in styles and styles['edges']) or {} - ) - return graph - - -def generate_callgraph(disassembly, file): - - graph = gv.Graph(format='svg') - - index = 0 - - for block in disassembly.blocks: - easm = block.get_easm().replace("\n", "\l") - - graph.node(str(index), easm) - index += 1 - - for xref in disassembly.xrefs: - - graph.edge(str(xref[0]), str(xref[1])) - - - graph = apply_styles(graph, styles) - - graph.render(file) +from laser.ethereum import svm +from z3 import Z3Exception, simplify +import re + +graph_html = ''' + + + + + + + +

Mythril / Ethereum LASER Symbolic VM

+


+ + + +''' + + +def serialize(_svm): + + nodes = [] + edges = [] + + for n in _svm.nodes: + + code = _svm.nodes[n].as_dict()['code'] + + code = re.sub("([0-9a-f]{8})[0-9a-f]+", lambda m: m.group(1) + "(...)", code) + + nodes.append("{id: " + str(_svm.nodes[n].as_dict()['id']) + ", size: 150, 'label': '" + code + "'}") + + for edge in _svm.edges: + + if (edge.condition is None): + label = "" + else: + + try: + label = str(simplify(edge.condition)).replace("\n", "") + except Z3Exception: + label = str(edge.condition).replace("\n", "") + + label = re.sub("[^_]([[\d]{2}\d+)", lambda m: hex(int(m.group(1))), label) + code = re.sub("([0-9a-f]{8})[0-9a-f]+", lambda m: m.group(1) + "(...)", code) + + edges.append("{from: " + str(edge.as_dict()['from']) + ', to: ' + str(edge.as_dict()['to']) + ", 'arrows': 'to', 'label': '" + label + "', 'smooth': {'type': 'cubicBezier'}}") + + return "var nodes = [\n" + ",\n".join(nodes) + "\n];\nvar edges = [\n" + ",\n".join(edges) + "\n];" + + + +def generate_callgraph(disassembly, physics): + + _svm = svm.SVM(disassembly) + + _svm.sym_exec() + + html = graph_html.replace("[JS]", serialize(_svm)) + html = html.replace("[ENABLE_PHYSICS]", str(physics).lower()) + + return html diff --git a/mythril/disassembler/disassembly.py b/mythril/disassembler/disassembly.py index da3533f0..e378cac5 100644 --- a/mythril/disassembler/disassembly.py +++ b/mythril/disassembler/disassembly.py @@ -48,7 +48,7 @@ class Disassembly: try: func_name = signatures[func_hash] except KeyError: - func_name = "UNK_" + func_hash + func_name = "_function_" + func_hash try: offset = self.instruction_list[i+2]['argument'] diff --git a/mythril/ether/contractstorage.py b/mythril/ether/contractstorage.py index a30fd501..e4d6e6c2 100644 --- a/mythril/ether/contractstorage.py +++ b/mythril/ether/contractstorage.py @@ -74,31 +74,33 @@ class ContractStorage(persistent.Persistent): receipt = eth.eth_getTransactionReceipt(tx['hash']) - contract_address = receipt['contractAddress'] 
+ if receipt is not None: - contract_code = eth.eth_getCode(contract_address) - contract_balance = eth.eth_getBalance(contract_address) + contract_address = receipt['contractAddress'] - if not contract_balance or sync_all: - # skip contracts with zero balance (disable with --sync-all) - continue + contract_code = eth.eth_getCode(contract_address) + contract_balance = eth.eth_getBalance(contract_address) - code = ETHContract(contract_code, tx['input']) + if not contract_balance and not sync_all: + # skip contracts with zero balance (disable with --sync-all) + continue - m = hashlib.md5() - m.update(contract_code.encode('UTF-8')) - contract_hash = m.digest() + code = ETHContract(contract_code, tx['input']) - try: - self.contracts[contract_hash] - except KeyError: - self.contracts[contract_hash] = code - m = InstanceList() - self.instance_lists[contract_hash] = m + m = hashlib.md5() + m.update(contract_code.encode('UTF-8')) + contract_hash = m.digest() - self.instance_lists[contract_hash].add(contract_address, contract_balance) + try: + self.contracts[contract_hash] + except KeyError: + self.contracts[contract_hash] = code + m = InstanceList() + self.instance_lists[contract_hash] = m - transaction.commit() + self.instance_lists[contract_hash].add(contract_address, contract_balance) + + transaction.commit() self.last_block = blockNum blockNum -= 1 diff --git a/mythril/ether/util.py b/mythril/ether/util.py index fccf6a51..4cdc6cca 100644 --- a/mythril/ether/util.py +++ b/mythril/ether/util.py @@ -4,7 +4,6 @@ from ethereum.abi import encode_abi, encode_int from ethereum.utils import zpad from ethereum.abi import method_id - def safe_decode(hex_encoded_string): # print(type(hex_encoded_string)) @@ -35,6 +34,10 @@ def bytecode_from_blockchain(creation_tx_hash, ipc, rpc_host='127.0.0.1', rpc_po raise RuntimeError("Transaction trace didn't return any bytecode") +def fire_lasers(disassembly): + return laserfree.analysis(disassembly) + + def encode_calldata(func_name, arg_types,
args): mid = method_id(func_name, arg_types) diff --git a/requirements.txt b/requirements.txt index b7123222..19d9f446 100644 --- a/requirements.txt +++ b/requirements.txt @@ -1,4 +1,5 @@ ethereum>=2.0.4 ZODB>=5.3.0 -graphviz>=0.8 +z3-solver>=4.5 web3 +laser-ethereum>=0.1.4 diff --git a/setup.py b/setup.py index ecc70085..4df56d5e 100755 --- a/setup.py +++ b/setup.py @@ -219,7 +219,7 @@ security community. setup( name='mythril', - version='0.3.15', + version='0.5.6', description='A reversing and bug hunting framework for the Ethereum blockchain', long_description=long_description, @@ -253,8 +253,10 @@ setup( install_requires=[ 'ethereum>=2.0.4', + 'web3', 'ZODB>=5.3.0', - 'graphviz>=0.8' + 'z3-solver>=4.5', + 'laser-ethereum>=0.1.6' ], python_requires='>=3.5', diff --git a/static/callgraph.png b/static/callgraph.png deleted file mode 100644 index 66312856..00000000 Binary files a/static/callgraph.png and /dev/null differ diff --git a/static/callgraph6.png b/static/callgraph6.png new file mode 100644 index 00000000..6830ed40 Binary files /dev/null and b/static/callgraph6.png differ diff --git a/static/mythril.html b/static/mythril.html new file mode 100644 index 00000000..a9f15e60 --- /dev/null +++ b/static/mythril.html @@ -0,0 +1,121 @@ + + + + + + + + +

Mythril / Ethereum LASER Symbolic VM

+


+ + +
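
Note on the new `serialize()` in `mythril/disassembler/callgraph.py`: before emitting node labels, it shortens long hex runs in the disassembled code so that the graph stays readable. The snippet below is a minimal standalone sketch of that one step, reusing the pattern from the diff. The names `LONG_HEX` and `shorten_hex` are introduced here for illustration only and are not part of Mythril.

```python
import re

# Same pattern as in serialize(): capture the first 8 lowercase hex digits
# of a run and require at least one more hex digit after them.
LONG_HEX = re.compile(r"([0-9a-f]{8})[0-9a-f]+")


def shorten_hex(text):
    """Replace the tail of every run of 9+ lowercase hex digits with "(...)"."""
    return LONG_HEX.sub(lambda m: m.group(1) + "(...)", text)


if __name__ == "__main__":
    # A 20-byte address operand is cut down to its first 8 hex digits.
    print(shorten_hex("PUSH20 0x" + "f" * 40))
    # Runs of 8 digits or fewer are left untouched.
    print(shorten_hex("PUSH4 0x12345678"))
```

Two things to keep in mind with this pattern: it only matches lowercase hex digits, and it does not check for a `0x` prefix, so any sufficiently long lowercase hex-looking word in a label would be shortened as well.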