bytekit-cli
Stefan Esser's Bytekit is an extension to the PHP interpreter that provides a userspace representation of the bytecode that is generated during the PHP interpreter's compilation phase. The extension not only exports the raw bytecode data but also provides control flow information in the form of code flow graphs and basic blocks. One of Bytekit's goals is to provide a foundation for building static and dynamic code analysis tools.
bytekit-cli is such an analysis tool that I started to develop immediately after Stefan released Bytekit 0.1.0 at the International PHP Conference - Spring Edition in Berlin earlier this year. This posting provides an overview of the functionality implemented so far.
Bytecode Analyser
The bytecode analyser applies rules to the bytecode that is generated for a set of files and reports violations of these rules. A rule is an implementation of the Bytekit_Scanner_Rule class and is selected on the CLI with the --rule <rule>:<options> switch. At the moment only two rules, Bytekit_Scanner_Rule_DisallowedOpcodes and Bytekit_Scanner_Rule_DirectOutput, are implemented.
Searching for (disallowed) opcodes
The DisallowedOpcodes rule scans the bytecode that is generated for a set of files for (disallowed) opcodes. In the following example we are looking for occurrences of the EVAL opcode:
sb@ubuntu ~ % bytekit --rule DisallowedOpcodes:EVAL /usr/local/src/phpunit/trunk
bytekit-cli 1.0.0 by Sebastian Bergmann.

- Disallowed opcode "EVAL" in /usr/local/src/phpunit/trunk/PHPUnit/Extensions/PhptTestCase.php:223
- Disallowed opcode "EVAL" in /usr/local/src/phpunit/trunk/PHPUnit/TextUI/Command.php:177
- Disallowed opcode "EVAL" in /usr/local/src/phpunit/trunk/PHPUnit/Framework/TestCase.php:1158
- Disallowed opcode "EVAL" in /usr/local/src/phpunit/trunk/PHPUnit/Framework/TestCase.php:1059
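The idea behind such a rule can be sketched in a few lines of PHP. This is only an illustration and not the actual Bytekit_Scanner_Rule_DisallowedOpcodes implementation: the findDisallowedOpcodes() function and the array layout of the oplines are assumptions made for this example, and the oplines are made up rather than obtained from Bytekit.

```php
<?php
// Hypothetical opline representation: each opline is an array holding the
// opcode mnemonic and the source line it was compiled from. This is NOT
// the data structure that Bytekit itself returns.

/**
 * Returns one violation message per opline whose opcode is disallowed.
 *
 * @param array  $oplines    list of ['opcode' => string, 'line' => int]
 * @param array  $disallowed list of opcode mnemonics, for instance ['EVAL']
 * @param string $file       file name used in the messages
 */
function findDisallowedOpcodes(array $oplines, array $disallowed, string $file): array
{
    $violations = [];

    foreach ($oplines as $opline) {
        if (in_array($opline['opcode'], $disallowed, true)) {
            $violations[] = sprintf(
                'Disallowed opcode "%s" in %s:%d',
                $opline['opcode'],
                $file,
                $opline['line']
            );
        }
    }

    return $violations;
}

// Made-up oplines for illustration.
$oplines = [
    ['opcode' => 'EXT_STMT', 'line' => 2],
    ['opcode' => 'EVAL',     'line' => 2],
    ['opcode' => 'RETURN',   'line' => 3],
];

foreach (findDisallowedOpcodes($oplines, ['EVAL'], 'example.php') as $violation) {
    print '- ' . $violation . "\n";
}
```

Because the check works on opcodes and not on source text, it is not fooled by the string "eval" appearing in a comment or a string literal.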
Searching for direct output of variables
The DirectOutput rule scans the bytecode that is generated for a set of files for ECHO and PRINT opcodes that output variables directly as opposed to outputting the return value of a function or method call:
sb@ubuntu ~ % bytekit --rule DirectOutput /usr/local/src/phpunit/trunk
bytekit-cli 1.0.0 by Sebastian Bergmann.

- Direct output of variable $message in /usr/local/src/phpunit/trunk/PHPUnit/Extensions/Database/UI/Mediums/Text.php:130
- Direct output of variable $buffer in /usr/local/src/phpunit/trunk/PHPUnit/TextUI/TestRunner.php:468
- Direct output of variable $buffer in /usr/local/src/phpunit/trunk/PHPUnit/Util/Printer.php:173
The bytekit command can also report these violations using the Project Mess Detector (PMD) XML format.
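A rough sketch of how such a check might work: an ECHO or PRINT opline whose operand is a compiled variable (written !0, !1, ... in the disassembler output) outputs that variable directly, whereas a temporary (~1) or a literal does not. The findDirectOutput() function, the opline arrays, and the compiled-variable table below are assumptions for illustration only; the real rule may work differently.

```php
<?php
// Hypothetical data layout, not Bytekit's own: operands are stored as the
// strings shown by the disassembler, and $compiledVariables maps the
// "!N" placeholders to PHP variable names.

/**
 * Reports ECHO and PRINT oplines whose first operand is a compiled variable.
 */
function findDirectOutput(array $oplines, array $compiledVariables): array
{
    $violations = [];

    foreach ($oplines as $opline) {
        if ($opline['opcode'] !== 'ECHO' && $opline['opcode'] !== 'PRINT') {
            continue;
        }

        $operand = $opline['operands'][0] ?? null;

        if (is_string($operand) && isset($compiledVariables[$operand])) {
            $violations[] = sprintf(
                'Direct output of variable %s on line %d',
                $compiledVariables[$operand],
                $opline['line']
            );
        }
    }

    return $violations;
}

// Made-up example data.
$compiledVariables = ['!0' => '$buffer'];

$oplines = [
    ['opcode' => 'ECHO',  'operands' => ['!0'],  'line' => 3], // echo $buffer;
    ['opcode' => 'PRINT', 'operands' => ["'*'"], 'line' => 4], // print '*';
    ['opcode' => 'ECHO',  'operands' => ['~1'],  'line' => 5], // echo of a temporary
];

foreach (findDirectOutput($oplines, $compiledVariables) as $violation) {
    print '- ' . $violation . "\n";
}
```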
Bytecode Disassembler
When invoked without --rule switches and applied to a single file, the bytekit command disassembles the bytecode that is generated for that file and dumps a textual representation of it. Take the following file, loop.php, as an example:
<?php
for ($i = 0; $i < 1; $i++) {
print '*';
}
?>
Below is the textual representation of the disassembled bytecode that is generated for the PHP code above:
sb@ubuntu examples % bytekit loop.php
bytekit-cli 1.0.0 by Sebastian Bergmann.

Filename:           /usr/local/src/bytekit-cli/examples/loop.php
Function:           main
Number of oplines:  13
Compiled variables: !0 = $i

  line  #     opcode                           result  operands
  -----------------------------------------------------------------------------
  2     0     EXT_STMT
        1     ASSIGN                                   !0, 0
        2     IS_SMALLER                       ~1      !0, 1
        3     EXT_STMT
        4     JMPZNZ                                   ~1, ->12, ->8
        5     POST_INC                         ~2      !0
        6     FREE                                     ~2
        7     JMP                                      ->2
  3     8     EXT_STMT
        9     PRINT                            ~3      '*'
        10    FREE                                     ~3
  4     11    JMP                                      ->5
  5     12    RETURN                                   1
Dead Code Elimination
The PHP compiler does not perform any bytecode optimizations by default and the generated bytecode can contain basic blocks that are unreachable, so-called dead code. Have a look at the following example:
<?php
return;
print '*';
?>
sb@ubuntu examples % bytekit dead_code.php
bytekit-cli 1.0.0 by Sebastian Bergmann.

Filename:           /usr/local/src/bytekit-cli/examples/dead_code.php
Function:           main
Number of oplines:  6

  line  #     opcode                           result  operands
  -----------------------------------------------------------------------------
  2     0     EXT_STMT
        1     RETURN                                   null
  3     2     EXT_STMT
        3     PRINT                            ~0      '*'
        4     FREE                                     ~0
  4     5     RETURN                                   1
Of the two basic blocks in the example above, only the first basic block can be executed. The second basic block is unreachable because of the unconditional RETURN at the end of the first basic block.
The bytekit command's disassembler can be instructed to hide these unreachable basic blocks by passing the --eliminate-dead-code switch:
sb@ubuntu examples % bytekit --eliminate-dead-code dead_code.php
bytekit-cli 1.0.0 by Sebastian Bergmann.

Filename:           /usr/local/src/bytekit-cli/examples/dead_code.php
Function:           main
Number of oplines:  2

  line  #     opcode                           result  operands
  -----------------------------------------------------------------------------
  2     0     EXT_STMT
        1     RETURN                                   null
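Conceptually, hiding unreachable basic blocks is a reachability problem: start at the entry block, follow the successor edges, and keep only the blocks that were visited. The following is a minimal sketch of that idea, not the code used by bytekit-cli. The eliminateDeadCode() function and the block arrays are hypothetical; the block contents are transcribed by hand from the dead_code.php listing.

```php
<?php
// Hypothetical basic-block representation (not Bytekit's own data structure):
// each block lists its oplines and the ids of the blocks that may run next.

/**
 * Returns only the blocks that can be reached from the entry block.
 */
function eliminateDeadCode(array $blocks, int $entry): array
{
    $reachable = [];
    $worklist  = [$entry];

    while ($worklist !== []) {
        $id = array_pop($worklist);

        // Skip blocks that were already visited or do not exist.
        if (isset($reachable[$id]) || !isset($blocks[$id])) {
            continue;
        }

        $reachable[$id] = true;

        foreach ($blocks[$id]['successors'] as $successor) {
            $worklist[] = $successor;
        }
    }

    // array_intersect_key() keeps the original block order.
    return array_intersect_key($blocks, $reachable);
}

// dead_code.php: block 0 ends in RETURN and therefore has no successors,
// so block 1 (the PRINT and the implicit RETURN) is never reached.
$blocks = [
    0 => ['oplines' => [0, 1],       'successors' => []],
    1 => ['oplines' => [2, 3, 4, 5], 'successors' => []],
];

$live = eliminateDeadCode($blocks, 0);

$numberOfOplines = array_sum(
    array_map(
        function (array $block) {
            return count($block['oplines']);
        },
        $live
    )
);

print 'Number of oplines: ' . $numberOfOplines . "\n";
```

For this input the sketch keeps only block 0, which matches the two oplines shown in the output above.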
Control Flow Graphs
The bytekit command can leverage the control flow information that is provided by Bytekit to generate a control flow graph. This functionality is invoked through the --graph switch that takes a directory as its argument. Graphs will be written to this directory for each function found in the given PHP source file. The output format for the graph files can be specified with the --format switch. To generate images (in SVG or PNG format, for instance), the GraphViz tool needs to be installed.
sb@ubuntu examples % bytekit --graph /tmp --format png loop.php
bytekit-cli 1.0.0 by Sebastian Bergmann.

Wrote "/tmp/main.png".
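To give an impression of what is handed to GraphViz, here is a small sketch that turns a list of basic blocks into a graph description in GraphViz's DOT language. It is not the code used by bytekit-cli: the controlFlowGraphToDot() function and the block arrays are assumptions for this example, and the blocks and edges were derived by hand from the jump targets in the loop.php listing above.

```php
<?php
// Hypothetical basic-block representation, derived by hand from the
// disassembly of loop.php. It is not Bytekit's own data structure.

/**
 * Renders basic blocks and their successor edges as a DOT digraph.
 */
function controlFlowGraphToDot(string $name, array $blocks): string
{
    $lines = ['digraph ' . $name . ' {', '    node [shape=box];'];

    // One node per basic block.
    foreach ($blocks as $id => $block) {
        $label   = str_replace('"', '\\"', $block['label']);
        $lines[] = sprintf('    b%d [label="%s"];', $id, $label);
    }

    // One edge per possible transfer of control.
    foreach ($blocks as $id => $block) {
        foreach ($block['successors'] as $successor) {
            $lines[] = sprintf('    b%d -> b%d;', $id, $successor);
        }
    }

    $lines[] = '}';

    return implode("\n", $lines) . "\n";
}

$blocks = [
    0 => ['label' => 'oplines 0-1',  'successors' => [1]],    // falls through to the condition
    1 => ['label' => 'oplines 2-4',  'successors' => [4, 3]], // JMPZNZ ~1, ->12, ->8
    2 => ['label' => 'oplines 5-7',  'successors' => [1]],    // JMP ->2
    3 => ['label' => 'oplines 8-11', 'successors' => [2]],    // JMP ->5
    4 => ['label' => 'opline 12',    'successors' => []],     // RETURN
];

$dot = controlFlowGraphToDot('main', $blocks);
print $dot;
```

If the output is saved to a file such as main.dot, GraphViz can render it with dot -Tpng main.dot -o main.png.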
Click on the thumbnail below to see the control flow graph for the loop example from above.