diff --git a/.gitignore b/.gitignore
index 0c9eb11c..fad4b73b 100644
--- a/.gitignore
+++ b/.gitignore
@@ -10,3 +10,5 @@ dist
 *.rst
 *.lock
 *.svg
+laser*
+lol.py
diff --git a/README.md b/README.md
index 561f2b7d..e52952fb 100644
--- a/README.md
+++ b/README.md
@@ -20,47 +20,24 @@ $ cd mythril
 $ python setup.py install
 ```
 
+Note that Mythril requires Python 3.5 to work.
+
 You also need a [go-ethereum](https://github.com/ethereum/go-ethereum) node that is synced with the network (note that Mythril uses non-standard RPC APIs only supported by go-ethereum, so other clients likely won't work). Start the node as follows:
 
 ```bash
-$ geth --rpc --rpcapi eth,admin,debug --syncmode fast
+$ geth --rpc --rpcapi eth,debug --syncmode fast
 ```
 
-### Database initialization
-
-Mythril builds its own contract database to enable fast search operations. This is to enable operations like those described in the [legendary "Mitch Brenner" blog post](https://medium.com/@rtaylor30/how-i-snatched-your-153-037-eth-after-a-bad-tinder-date-d1d84422a50b) in seconds instead of days. Unfortunately, the initial sync process is slow. You don't need to sync the whole blockchain right away though: If you abort the syncing process with `ctrl+c`, it will be auto-resumed the next time you run the `--init-db` command.
-
-```bash
-$ myth --init-db
-Starting synchronization from latest block: 4323706
-Processing block 4323000, 3 individual contracts in database
-(...)
-```
-
-Note that syncing doesn't take quite as long as it first seems, because the blocks get smaller towards the beginning of the chain.
-
-The default behavior is to only sync contracts with a non-zero balance. You can disable this behavior with the `--sync-all` flag, but be aware that this will result in a huge (as in: dozens of GB) database.
-
 ## Command line usage
 
 The Mythril command line tool (aptly named `myth`) allows you to conveniently access some of Mythril's functionality.
 
-### Searching the database
-
-The search feature allows you to find contract instances that contain specific function calls and opcode sequences. It supports simple boolean expressions, such as:
-
-```bash
-$ myth --search "func#changeMultisig(address)#"
-$ myth --search "code#PUSH1 0x50,POP#"
-$ myth --search "func#changeMultisig(address)# and code#PUSH1 0x50#"
-```
-
 ### Disassembler
 
 Use the `-d` flag to disassemble code. The disassembler accepts a bytecode string or a contract address as its input.
 
 ```bash
-$ myth -d -c "$ ./myth -d -c "5060"
+$ myth -d -c "0x6060"
 0 PUSH1 0x60
 ```
 
@@ -77,67 +54,61 @@ $ myth -d -a "0x2a0c0dbecc7e4d658f48e01e3fa353f44050c208"
 1137 ISZERO
 ```
 
-Adding the `-g FILENAME` option will output a call graph:
+### Control flow graph
+
+Mythril integrates the LASER symbolic virtual machine. Right now, this is mainly used for CFG generation. The `-g FILENAME` option generates an [interactive jsViz graph](http://htmlpreview.github.io/?https://github.com/b-mueller/mythril/blob/master/static/mythril.html):
 
 ```bash
-$ myth -d -a "0xFa52274DD61E1643d2205169732f29114BC240b3" -g ./graph.svg
+$ myth -g ./graph.html -a "0xFa52274DD61E1643d2205169732f29114BC240b3"
 ```
 
-![callgraph](https://raw.githubusercontent.com/b-mueller/mythril/master/static/callgraph.png "Call graph")
+![callgraph](https://raw.githubusercontent.com/b-mueller/mythril/master/static/callgraph6.png "Call graph")
 
-Note that currently, Mythril only processes `JUMP` and `JUMPI` instructions with immediately preceding `PUSH`, but doesn't understand dynamic jumps and function calls.
+The "bounce" effect, while awesome (and thus enabled by default), sometimes messes up the graph layout. If that happens, disable the effect with the `--disable-physics` flag.
 
-### Tracing Code
+### Contract search
 
-You can run a code trace in the PyEthereum virtual machine. Optionally, input data can be passed via the `--data` flag.
+Mythril builds its own contract database to enable fast search operations. This is to enable operations like those described in the [legendary "Mitch Brenner" blog post](https://medium.com/@rtaylor30/how-i-snatched-your-153-037-eth-after-a-bad-tinder-date-d1d84422a50b) in ~~seconds~~ minutes instead of days. Unfortunately, the initial sync process is slow. You don't need to sync the whole blockchain right away though: If you abort the syncing process with `ctrl+c`, it will be auto-resumed the next time you run the `--init-db` command.
 
 ```bash
-$ myth -t -c "0x60506050"
-0 PUSH1 0x50; STACK: []
-2 PUSH1 0x50; STACK: [0x50]
-
-$ myth -t -a "0x3665f2bf19ee5e207645f3e635bf0f4961d661c0"
-0 PUSH1 0x60; STACK: []
-2 PUSH1 0x40; STACK: [0x60]
-4 MSTORE; STACK: [0x60, 0x40]
-5 CALLDATASIZE; STACK: []
+$ myth --init-db
+Starting synchronization from latest block: 4323706
+Processing block 4323000, 3 individual contracts in database
 (...)
 ```
 
-#### Finding cross-references
+Mythril retrieves contract data over RPC by default. You can switch to IPC using the `--ipc` flag.
 
-It is often useful to find other contracts referenced by a particular contract. Let's assume you want to search for contracts that fulfill conditions similar to the [Parity Multisig Wallet Bug](http://hackingdistributed.com/2017/07/22/deep-dive-parity-bug/). First, you want to find a list of contracts that use the `DELEGATECALL` opcode:
+The default behavior is to only sync contracts with a non-zero balance. You can disable this behavior with the `--sync-all` flag, but be aware that this will result in a huge (as in: dozens of GB) database.
+
+#### Searching from the command line
+
+The search feature allows you to find contract instances that contain specific function calls and opcode sequences. It supports simple boolean expressions, such as:
 
 ```bash
-$ myth --search "code#DELEGATECALL#"
-Matched contract with code hash 07459966443977122e639cbf7804c446
-Address: 0x76799f77587738bfeef09452df215b63d2cfb08a, balance: 1000000000000000
-Address: 0x3582d2a3b67d63ed10f1ecaef0dca71b9283b543, balance: 92000000000000000000
-Address: 0x4b9bc00c35f7cee95c65c3c9836040c37dec9772, balance: 89000000000000000000
-Address: 0x156d5687a201affb3f1e632dcfb9fde4b0128211, balance: 29500000000000000000
-(...)
+$ myth --search "func#changeMultisig(address)#"
+$ myth --search "code#PUSH1 0x50,POP#"
+$ myth --search "func#changeMultisig(address)# and code#PUSH1 0x50#"
 ```
 
-Note that "code hash" in the above output refers to the contract's index in the database. The following lines ("Address: ...") list instances of same contract deployed on the blockchain.
+#### Finding cross-references
 
-You can then use the `--xrefs` flag to find the addresses of referenced contracts:
+It is often useful to find other contracts referenced by a particular contract. E.g.:
 
 ```bash
+$ myth --search "code#DELEGATECALL#"
+Matched contract with code hash 07459966443977122e639cbf7804c446
+Address: 0x76799f77587738bfeef09452df215b63d2cfb08a, balance: 1000000000000000
 $ myth --xrefs 07459966443977122e639cbf7804c446
 5b9e8728e316bbeb692d22daaab74f6cbf2c4691
 ```
 
-## Custom scripts
-
-By combining Mythril and [PyEthereum](https://github.com/ethereum/pyethereum) modules, you can automate more complex static and dynamic analysis tasks. Here is an [example](https://github.com/b-mueller/mythril/blob/master/examples/find-fallback-dcl.py).
-
 ## Issues
 
-The RPC database sync solution is not very efficient. I explored some other options, including:
+The database sync is currently not very efficient.
 
 - Using PyEthereum: I encountered issues syncing PyEthereum with Homestead. Also, PyEthApp only supports Python 2.7, which causes issues with other important packages.
 - Accessing the Go-Ethereum LevelDB: This would be a great option. However, PyEthereum database code seems unable to deal with Go-Ethereum's LevelDB. It would take quite a bit of effort to figure this out.
-- IPC might allow for faster sync then RPC - haven't tried it yet.
 
 I'm writing this in my spare time, so contributors would be highly welcome!
 
diff --git a/myth b/myth
index 1ed9e460..b0ea5a62 100755
--- a/myth
+++ b/myth
@@ -11,12 +11,15 @@ from mythril.ether.contractstorage import get_persistent_storage
 from mythril.rpc.client import EthJsonRpc
 from mythril.ipc.client import EthIpc
 from ethereum import utils
+from laser.ethereum import laserfree
+import logging
 import binascii
 import sys
 import argparse
 import os
 import re
 
+
 
 def searchCallback(code_hash, code, addresses, balances):
     print("Matched contract with code hash " + code_hash )
@@ -31,32 +34,43 @@ def exitWithError(message):
 
 
 parser = argparse.ArgumentParser(description='Bug hunting on the Ethereum blockchain')
-parser.add_argument('-d', '--disassemble', action='store_true', help='disassemble, specify input with -c or -a')
-parser.add_argument('-t', '--trace', action='store_true', help='trace, use with -c or -a and --data (optional)')
-parser.add_argument('-c', '--code', help='hex-encoded bytecode string ("6060604052...")', metavar='BYTECODE')
-parser.add_argument('-a', '--address', help='contract address')
-parser.add_argument('-o', '--outfile')
-parser.add_argument('-g', '--graph', help='when disassembling, also generate a callgraph', metavar='OUTPUT_FILE')
-parser.add_argument('--ipc', help='use ipc interface', action='store_true')
-parser.add_argument('--data', help='message call input data for tracing')
-parser.add_argument('--search', help='search the contract database')
-parser.add_argument('--xrefs', help='get xrefs from contract in database', metavar='CONTRACT_HASH')
-parser.add_argument('--hash', help='calculate function signature hash', metavar='SIGNATURE')
-parser.add_argument('--init-db', action='store_true', help='Initialize the contract database')
-parser.add_argument('--sync-all', action='store_true', help='Also sync contracts with zero balance')
-parser.add_argument('--rpchost', default='127.0.0.1', help='RPC host')
-parser.add_argument('--rpcport', type=int, default=8545, help='RPC port')
+
+commands = parser.add_argument_group('commands')
+commands.add_argument('-d', '--disassemble', action='store_true', help='disassemble, specify input with -c or -a')
+commands.add_argument('-t', '--trace', action='store_true', help='trace, use with -c or -a and --data (optional)')
+commands.add_argument('-g', '--graph', help='generate a call graph', metavar='OUTPUT_FILE')
+commands.add_argument('-l', '--fire-lasers', action='store_true', help='detect vulnerabilities, use with -c or -a')
+commands.add_argument('-s', '--search', help='search the contract database')
+commands.add_argument('--xrefs', help='get xrefs from contract in database', metavar='CONTRACT_HASH')
+commands.add_argument('--hash', help='calculate function signature hash', metavar='SIGNATURE')
+commands.add_argument('--init-db', action='store_true', help='initialize the contract database')
+
+inputs = parser.add_argument_group('input arguments')
+inputs.add_argument('-c', '--code', help='hex-encoded bytecode string ("6060604052...")', metavar='BYTECODE')
+inputs.add_argument('-a', '--address', help='contract address')
+inputs.add_argument('--data', help='message call input data for tracing')
+
+options = parser.add_argument_group('options')
+options.add_argument('--sync-all', action='store_true', help='Also sync contracts with zero balance')
+options.add_argument('--rpchost', default='127.0.0.1', help='RPC host')
+options.add_argument('--rpcport', type=int, default=8545, help='RPC port')
+options.add_argument('--ipc', help='use IPC interface instead of RPC', action='store_true')
+options.add_argument('--disable-physics', action='store_true', help='disable graph physics simulation')
+options.add_argument('-v', type=int, help='log level (0-2)', metavar='LOG_LEVEL')
+
 
 try:
     db_dir = os.environ['DB_DIR']
 except KeyError:
     db_dir = None
 
-contract_storage = get_persistent_storage(db_dir)
-
 args = parser.parse_args()
 
-if (args.disassemble):
+if (args.v):
+    if (0 <= args.v < 3):
+        logging.basicConfig(level=[logging.NOTSET, logging.INFO, logging.DEBUG][args.v])
+
+if (args.disassemble or args.graph or args.fire_lasers):
 
     if (args.code):
         encoded_bytecode = args.code
@@ -68,7 +82,7 @@ if (args.disassemble):
                 encoded_bytecode = eth.eth_getCode(args.address)
             except Exception as e:
                 exitWithError("Exception loading bytecode via IPC: " + str(e))
-        
+
         else:
             try:
                 eth = EthJsonRpc(args.rpchost, args.rpcport)
@@ -78,24 +92,37 @@ if (args.disassemble):
                 exitWithError("Exception loading bytecode via RPC: " + str(e))
 
     else:
-        exitWithError("Disassembler: Provide the input bytecode via -c BYTECODE or --id ID")
+        exitWithError("No input bytecode. Please provide the code via -c BYTECODE or -a address")
 
     try:
         disassembly = Disassembly(encoded_bytecode)
-        # instruction_list = asm.disassemble(util.safe_decode(encoded_bytecode))
     except binascii.Error:
         exitWithError("Disassembler: Invalid code string.")
 
-    easm_text = disassembly.get_easm()
+    if (args.disassemble):
 
-    if (args.outfile):
-        util.string_to_file(args.outfile, easm_text)
-    else:
+        easm_text = disassembly.get_easm()
         sys.stdout.write(easm_text)
 
-    if (args.graph):
-        generate_callgraph(disassembly, args.graph)
+    elif (args.graph):
+
+        if (args.disable_physics):
+            physics = False
+        else:
+            physics = True
+
+        html = generate_callgraph(disassembly, physics)
+        try:
+            with open(args.graph, "w") as f:
+                f.write(html)
+        except Exception as e:
+            print("Error saving graph: " + str(e))
+
+
+    elif (args.fire_lasers):
+
+        laserfree.fire(disassembly)
 
 
 elif (args.trace):
@@ -126,34 +153,36 @@ elif (args.trace):
         else:
             print(str(i['pc']) + " " + i['op'] + ";\tSTACK: " + i['stack'])
 
-elif (args.search):
+else:
+    contract_storage = get_persistent_storage(db_dir)
+    if (args.search):
 
-    try:
-        contract_storage.search(args.search, searchCallback)
-    except SyntaxError:
-        exitWithError("Syntax error in search expression.")
+        try:
+            contract_storage.search(args.search, searchCallback)
+        except SyntaxError:
+            exitWithError("Syntax error in search expression.")
 
-elif (args.xrefs):
+    elif (args.xrefs):
 
-    try:
-        contract_hash = util.safe_decode(args.xrefs)
-    except binascii.Error:
-        exitWithError("Invalid contract hash.")
+        try:
+            contract_hash = util.safe_decode(args.xrefs)
+        except binascii.Error:
+            exitWithError("Invalid contract hash.")
 
-    try:
-        contract = contract_storage.get_contract_by_hash(contract_hash)
-        print("\n".join(contract.get_xrefs()))
-    except KeyError:
-        exitWithError("Contract not found in the database.")
-
-elif (args.init_db):
-    if args.ipc:
-        contract_storage.initialize(args.rpchost, args.rpcport, args.sync_all, args.ipc)
-    else:
-        contract_storage.initialize(args.rpchost, args.rpcport, args.sync_all, args.ipc)
+        try:
+            contract = contract_storage.get_contract_by_hash(contract_hash)
+            print("\n".join(contract.get_xrefs()))
+        except KeyError:
+            exitWithError("Contract not found in the database.")
 
-elif (args.hash):
-    print("0x" + utils.sha3(args.hash)[:4].hex())
+    elif (args.init_db):
+        if args.ipc:
+            contract_storage.initialize(args.rpchost, args.rpcport, args.sync_all, args.ipc)
+        else:
+            contract_storage.initialize(args.rpchost, args.rpcport, args.sync_all, args.ipc)
 
-else:
-    parser.print_help()
+    elif (args.hash):
+        print("0x" + utils.sha3(args.hash)[:4].hex())
+
+    else:
+        parser.print_help()
diff --git a/mythril/disassembler/callgraph.py b/mythril/disassembler/callgraph.py
index ceea5d12..0d851bb3 100644
--- a/mythril/disassembler/callgraph.py
+++ b/mythril/disassembler/callgraph.py
@@ -1,64 +1,140 @@
-import graphviz as gv
-
-
-styles = {
-    'graph': {
-        'overlap': 'false',
-        'fontsize': '16',
-        'fontcolor': 'white',
-        'bgcolor': '#333333',
-        'concentrate':'true',
-    },
-    'nodes': {
-        'fontname': 'Helvetica',
-        'shape': 'box',
-        'fontcolor': 'white',
-        'color': 'white',
-        'style': 'filled',
-        'fillcolor': '#006699',
-    },
-    'edges': {
-        'style': 'dashed',
-        'dir': 'forward',
-        'color': 'white',
-        'arrowhead': 'normal',
-        'fontname': 'Courier',
-        'fontsize': '12',
-        'fontcolor': 'white',
-    }
-}
-
-def apply_styles(graph, styles):
-    graph.graph_attr.update(
-        ('graph' in styles and styles['graph']) or {}
-    )
-    graph.node_attr.update(
-        ('nodes' in styles and styles['nodes']) or {}
-    )
-    graph.edge_attr.update(
-        ('edges' in styles and styles['edges']) or {}
-    )
-    return graph
-
-
-def generate_callgraph(disassembly, file):
-
-    graph = gv.Graph(format='svg')
-
-    index = 0
-
-    for block in disassembly.blocks:
-        easm = block.get_easm().replace("\n", "\l")
-
-        graph.node(str(index), easm)
-        index += 1
-
-    for xref in disassembly.xrefs:
-
-        graph.edge(str(xref[0]), str(xref[1]))
-
-
-    graph = apply_styles(graph, styles)
-
-    graph.render(file)
+from laser.ethereum import svm
+from z3 import Z3Exception, simplify
+import re
+
+graph_html = '''
+
+ + + + + + +Mythril / Ethereum LASER Symbolic VM
+Mythril / Ethereum LASER Symbolic VM
+