bytekit-cli

Sebastian Bergmann, 30 June 2009

Stefan Esser's Bytekit is an extension to the PHP interpreter that provides a userspace representation of the bytecode that is generated during the PHP interpreter's compilation phase. The extension not only exports the raw bytecode data but also provides control flow information in the form of control flow graphs and basic blocks. Among other things, Bytekit aims to provide a foundation for developing all kinds of static and dynamic code analysis tools.

bytekit-cli is such an analysis tool that I started to develop immediately after Stefan released Bytekit 0.1.0 at the International PHP Conference - Spring Edition in Berlin earlier this year. This posting provides an overview of the functionality implemented so far.

Bytecode Analyser

The bytecode analyser applies rules to the bytecode that is generated for a set of files and reports violations of these rules. A rule is an implementation of the Bytekit_Scanner_Rule class and is selected on the CLI with the --rule <rule>:<options> switch. At the moment only two rules, Bytekit_Scanner_Rule_DisallowedOpcodes and Bytekit_Scanner_Rule_DirectOutput, are implemented.
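As an illustration of the <rule>:<options> convention, here is a minimal sketch of how such an argument could be split into a rule class name and its options. It assumes that the name given on the CLI is the suffix of the rule's class name (as with the two rules named above) and that multiple options are separated by commas. The function parseRuleSpec() is hypothetical and not part of bytekit-cli.

```php
<?php
// Hypothetical sketch: resolve a "--rule <rule>:<options>" argument into the
// class name of the rule and its list of options. The comma separator for
// multiple options is an assumption, not taken from bytekit-cli.
function parseRuleSpec(string $spec): array
{
    // Split at the first colon only; everything after it is the option string.
    $parts = explode(':', $spec, 2);
    $name  = $parts[0];

    $options = [];
    if (isset($parts[1]) && $parts[1] !== '') {
        $options = explode(',', $parts[1]);
    }

    return [
        // Assumed naming convention: CLI name = class-name suffix.
        'class'   => 'Bytekit_Scanner_Rule_' . $name,
        'options' => $options,
    ];
}

$rule = parseRuleSpec('DisallowedOpcodes:EVAL');
echo $rule['class'], ' ', implode(',', $rule['options']), PHP_EOL;
// prints "Bytekit_Scanner_Rule_DisallowedOpcodes EVAL"
```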

Searching for (disallowed) opcodes

The DisallowedOpcodes rule scans the bytecode that is generated for a set of files for (disallowed) opcodes. The following example looks for occurrences of the EVAL opcode:

sb@ubuntu ~ % bytekit --rule DisallowedOpcodes:EVAL /usr/local/src/phpunit/trunk
bytekit-cli 1.0.0 by Sebastian Bergmann.

  - Disallowed opcode "EVAL"
    in /usr/local/src/phpunit/trunk/PHPUnit/Extensions/PhptTestCase.php:223

  - Disallowed opcode "EVAL"
    in /usr/local/src/phpunit/trunk/PHPUnit/TextUI/Command.php:177

  - Disallowed opcode "EVAL"
    in /usr/local/src/phpunit/trunk/PHPUnit/Framework/TestCase.php:1158

  - Disallowed opcode "EVAL"
    in /usr/local/src/phpunit/trunk/PHPUnit/Framework/TestCase.php:1059
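At its core, such a check compares each opline's opcode against a list of disallowed names. The sketch below assumes a simplified opline layout (plain arrays with opcode, file and line keys) rather than the data structure Bytekit actually exports; findDisallowedOpcodes() and the sample input are hypothetical.

```php
<?php
// Minimal sketch of a disallowed-opcode check. Oplines are modeled as plain
// arrays with 'opcode', 'file' and 'line' keys; this layout is an assumption
// and not the structure Bytekit returns.
function findDisallowedOpcodes(array $oplines, array $disallowed): array
{
    // Flip the list once so that each lookup is a constant-time isset().
    $lookup     = array_flip($disallowed);
    $violations = [];

    foreach ($oplines as $opline) {
        if (isset($lookup[$opline['opcode']])) {
            $violations[] = sprintf(
                "  - Disallowed opcode \"%s\"\n    in %s:%d",
                $opline['opcode'],
                $opline['file'],
                $opline['line']
            );
        }
    }

    return $violations;
}

// Hypothetical input: two oplines from an imaginary file.
$oplines = [
    ['opcode' => 'EXT_STMT', 'file' => '/tmp/example.php', 'line' => 3],
    ['opcode' => 'EVAL',     'file' => '/tmp/example.php', 'line' => 3],
];

echo implode("\n\n", findDisallowedOpcodes($oplines, ['EVAL'])), PHP_EOL;
```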

Searching for direct output of variables

The DirectOutput rule scans the bytecode that is generated for a set of files for ECHO and PRINT opcodes that output a variable directly, as opposed to outputting the return value of a function or method call:

sb@ubuntu ~ % bytekit --rule DirectOutput /usr/local/src/phpunit/trunk
bytekit-cli 1.0.0 by Sebastian Bergmann.

  - Direct output of variable $message
    in /usr/local/src/phpunit/trunk/PHPUnit/Extensions/Database/UI/Mediums/Text.php:130

  - Direct output of variable $buffer
    in /usr/local/src/phpunit/trunk/PHPUnit/TextUI/TestRunner.php:468

  - Direct output of variable $buffer
    in /usr/local/src/phpunit/trunk/PHPUnit/Util/Printer.php:173
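A rough sketch of the DirectOutput idea follows. It assumes that "variable" means a compiled variable (written !n in the disassembly listings of this post) and models the bytecode by hand; findDirectOutput() and the opline layout are hypothetical, and the real rule may handle more cases.

```php
<?php
// Minimal sketch of a direct-output check. It flags ECHO and PRINT oplines
// whose operand is a compiled variable ("!n"). This is a simplification:
// values that reach ECHO through temporaries (~n) or vars ($n), such as
// function return values, are not flagged.
function findDirectOutput(array $oplines, array $compiledVariables): array
{
    $violations = [];

    foreach ($oplines as $opline) {
        if (!in_array($opline['opcode'], ['ECHO', 'PRINT'], true)) {
            continue;
        }

        $operand = $opline['operands'][0] ?? '';

        // "!n" denotes a compiled variable; map it back to its source name.
        if ($operand !== '' && $operand[0] === '!' && isset($compiledVariables[$operand])) {
            $violations[] = sprintf(
                'Direct output of variable %s in line %d',
                $compiledVariables[$operand],
                $opline['line']
            );
        }
    }

    return $violations;
}

// Hypothetical, simplified bytecode for: echo $message; echo strtoupper($message);
$oplines = [
    ['opcode' => 'ECHO',     'operands' => ['!0'],           'line' => 2],
    ['opcode' => 'SEND_VAR', 'operands' => ['!0'],           'line' => 3],
    ['opcode' => 'DO_FCALL', 'operands' => ["'strtoupper'"], 'line' => 3],
    ['opcode' => 'ECHO',     'operands' => ['$1'],           'line' => 3],
];

print_r(findDirectOutput($oplines, ['!0' => '$message']));
```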

The bytekit command can also report these violations using the Project Mess Detector (PMD) XML format.
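A PMD report groups violations by source file. The sketch below writes a report of that general shape with PHP's XMLWriter extension. It emits only a subset of PMD's attributes (beginline, endline, rule, priority), uses a fixed placeholder priority, and is not bytekit-cli's actual report writer; toPmdXml() is hypothetical.

```php
<?php
// Sketch: write scanner violations as a PMD-style XML report using PHP's
// XMLWriter extension. toPmdXml() is hypothetical; only a subset of PMD's
// attributes is emitted and the priority value is a fixed placeholder.
function toPmdXml(array $violations): string
{
    // PMD reports contain one <file> element per source file.
    $byFile = [];
    foreach ($violations as $violation) {
        $byFile[$violation['file']][] = $violation;
    }

    $xml = new XMLWriter();
    $xml->openMemory();
    $xml->setIndent(true);
    $xml->startDocument('1.0', 'UTF-8');
    $xml->startElement('pmd');

    foreach ($byFile as $file => $fileViolations) {
        $xml->startElement('file');
        $xml->writeAttribute('name', (string) $file);

        foreach ($fileViolations as $violation) {
            $xml->startElement('violation');
            $xml->writeAttribute('beginline', (string) $violation['line']);
            $xml->writeAttribute('endline', (string) $violation['line']);
            $xml->writeAttribute('rule', $violation['rule']);
            $xml->writeAttribute('priority', '1');
            $xml->text($violation['message']);
            $xml->endElement(); // </violation>
        }

        $xml->endElement(); // </file>
    }

    $xml->endElement(); // </pmd>
    $xml->endDocument();

    return $xml->outputMemory();
}

echo toPmdXml([
    ['file' => '/path/to/Printer.php', 'line' => 173, 'rule' => 'DirectOutput',
     'message' => 'Direct output of variable $buffer'],
]);
```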

Bytecode Disassembler

When it is invoked without --rule switches on a single file, the bytekit command disassembles the bytecode that is generated for that PHP file and dumps a textual representation of it.

<?php
for ($i = 0; $i < 1; $i++) {
    print '*';
}
?>

Below is the textual representation of the disassembled bytecode that is generated for the PHP code above:

sb@ubuntu examples % bytekit loop.php
bytekit-cli 1.0.0 by Sebastian Bergmann.

Filename:           /usr/local/src/bytekit-cli/examples/loop.php
Function:           main
Number of oplines:  13
Compiled variables: !0 = $i

  line  #     opcode                           result  operands
  -----------------------------------------------------------------------------
  2     0     EXT_STMT                                 
        1     ASSIGN                                   !0, 0

        2     IS_SMALLER                       ~1      !0, 1
        3     EXT_STMT                                 
        4     JMPZNZ                                   ~1, ->12, ->8

        5     POST_INC                         ~2      !0
        6     FREE                                     ~2
        7     JMP                                      ->2

  3     8     EXT_STMT                                 
        9     PRINT                            ~3      '*'
        10    FREE                                     ~3
  4     11    JMP                                      ->5

  5     12    RETURN                                   1
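The blank lines in the listing above split the 13 oplines into five groups, which appear to mark basic block boundaries. The same grouping falls out of the classic "leaders" algorithm: the first opline, every jump target, and every opline that follows a jump or RETURN start a new basic block. The sketch below assumes a hand-written opline layout (opcode plus jump targets) modeled on the listing; splitIntoBasicBlocks() is hypothetical and not Bytekit's own implementation.

```php
<?php
// Sketch: split oplines into basic blocks with the "leaders" algorithm.
// The opline layout (opcode plus optional jump targets) is an assumption
// modeled on the loop.php disassembly, not Bytekit's data format.
function splitIntoBasicBlocks(array $oplines): array
{
    $count   = count($oplines);
    $leaders = [0 => true]; // the first opline always starts a block

    foreach ($oplines as $i => $opline) {
        $targets = $opline['targets'] ?? [];

        // Every jump target starts a new block ...
        foreach ($targets as $target) {
            $leaders[$target] = true;
        }

        // ... and so does the opline after a jump or a RETURN.
        if (($targets !== [] || $opline['opcode'] === 'RETURN') && $i + 1 < $count) {
            $leaders[$i + 1] = true;
        }
    }

    ksort($leaders);
    $starts = array_keys($leaders);
    $blocks = [];

    // A block runs from one leader up to (but excluding) the next leader.
    foreach ($starts as $k => $start) {
        $end      = isset($starts[$k + 1]) ? $starts[$k + 1] - 1 : $count - 1;
        $blocks[] = range($start, $end);
    }

    return $blocks;
}

// The 13 oplines of loop.php; only opcodes and jump targets matter here.
$loop = [
    ['opcode' => 'EXT_STMT'],
    ['opcode' => 'ASSIGN'],
    ['opcode' => 'IS_SMALLER'],
    ['opcode' => 'EXT_STMT'],
    ['opcode' => 'JMPZNZ', 'targets' => [12, 8]],
    ['opcode' => 'POST_INC'],
    ['opcode' => 'FREE'],
    ['opcode' => 'JMP', 'targets' => [2]],
    ['opcode' => 'EXT_STMT'],
    ['opcode' => 'PRINT'],
    ['opcode' => 'FREE'],
    ['opcode' => 'JMP', 'targets' => [5]],
    ['opcode' => 'RETURN'],
];

foreach (splitIntoBasicBlocks($loop) as $block) {
    echo implode(', ', $block), PHP_EOL;
}
```

Run as is, the sketch prints one block per line: 0, 1 / 2, 3, 4 / 5, 6, 7 / 8, 9, 10, 11 / 12, matching the groups in the listing.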

Dead Code Elimination

The PHP compiler does not perform any bytecode optimizations by default, so the generated bytecode can contain basic blocks that are unreachable, so-called dead code. Have a look at the following example:

<?php
return;
print '*';
?>

sb@ubuntu examples % bytekit dead_code.php
bytekit-cli 1.0.0 by Sebastian Bergmann.

Filename:           /usr/local/src/bytekit-cli/examples/dead_code.php
Function:           main
Number of oplines:  6

  line  #     opcode                           result  operands
  -----------------------------------------------------------------------------
  2     0     EXT_STMT                                 
        1     RETURN                                   null

  3     2     EXT_STMT                                 
        3     PRINT                            ~0      '*'
        4     FREE                                     ~0
  4     5     RETURN                                   1

Of the two basic blocks in the example above, only the first basic block can be executed. The second basic block is unreachable because of the unconditional RETURN at the end of the first basic block.

The bytekit command's disassembler can be instructed to hide these unreachable basic blocks by passing the --eliminate-dead-code switch:

sb@ubuntu examples % bytekit --eliminate-dead-code dead_code.php
bytekit-cli 1.0.0 by Sebastian Bergmann.

Filename:           /usr/local/src/bytekit-cli/examples/dead_code.php
Function:           main
Number of oplines:  2

  line  #     opcode                           result  operands
  -----------------------------------------------------------------------------
  2     0     EXT_STMT                                 
        1     RETURN                                   null
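Deciding which blocks to hide is a reachability question on the control flow graph: a basic block is dead if no path leads to it from the entry block. The following minimal sketch performs a depth-first traversal over hand-written successor lists for dead_code.php. The data layout and reachableBlocks() are assumptions for illustration, not how bytekit-cli implements --eliminate-dead-code.

```php
<?php
// Sketch: dead code elimination as reachability. A basic block is dead if no
// path of control flow edges leads to it from the entry block. The blocks and
// successor lists are written by hand from the dead_code.php listing; the
// layout and reachableBlocks() are hypothetical.
function reachableBlocks(array $successors, int $entry = 0): array
{
    $reachable = [$entry => true];
    $worklist  = [$entry];

    // Depth-first traversal: visit every block reachable from the entry.
    while ($worklist !== []) {
        $block = array_pop($worklist);

        foreach ($successors[$block] ?? [] as $next) {
            if (!isset($reachable[$next])) {
                $reachable[$next] = true;
                $worklist[]       = $next;
            }
        }
    }

    $ids = array_keys($reachable);
    sort($ids);

    return $ids;
}

// Block 0 (oplines 0-1) ends with an unconditional RETURN and therefore has
// no successors; nothing jumps to block 1 (oplines 2-5).
$blocks = [
    0 => ['EXT_STMT', 'RETURN null'],
    1 => ['EXT_STMT', "PRINT '*'", 'FREE', 'RETURN 1'],
];
$successors = [0 => [], 1 => []];

// Print only the oplines of reachable blocks.
foreach (reachableBlocks($successors) as $id) {
    echo implode(PHP_EOL, $blocks[$id]), PHP_EOL;
}
```

For dead_code.php the sketch keeps only block 0, that is the EXT_STMT and RETURN null oplines shown in the output above.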

Control Flow Graphs

The bytekit command can leverage the control flow information that is provided by Bytekit to generate control flow graphs. This functionality is invoked through the --graph switch, which takes a directory as its argument: one graph is written to this directory for each function found in the given PHP source file. The output format for the graph files can be specified with the --format switch. To generate images (in SVG or PNG format, for instance), Graphviz needs to be installed.

sb@ubuntu examples % bytekit --graph /tmp --format png loop.php
bytekit-cli 1.0.0 by Sebastian Bergmann.

Wrote "/tmp/main.png".
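Because image output requires Graphviz, the graphs are presumably rendered from Graphviz's DOT language. As a sketch of that step, the function below turns basic blocks and their successor edges into DOT text, which Graphviz's dot tool can render, for example with dot -Tpng main.dot -o main.png. The block labels and edges are written by hand from the loop.php listing; toDot() is hypothetical, and bytekit-cli's own DOT output may look different.

```php
<?php
// Sketch: render a control flow graph as Graphviz DOT text. toDot() and the
// hand-written blocks below are hypothetical; bytekit-cli's output may differ.
function toDot(string $name, array $labels, array $successors): string
{
    $lines = ['digraph "' . addcslashes($name, '"\\') . '" {', '  node [shape=box];'];

    // One box per basic block; quotes and backslashes in labels are escaped.
    foreach ($labels as $id => $label) {
        $lines[] = sprintf('  b%d [label="%s"];', $id, addcslashes($label, '"\\'));
    }

    // One arrow per control flow edge.
    foreach ($successors as $from => $targets) {
        foreach ($targets as $to) {
            $lines[] = sprintf('  b%d -> b%d;', $from, $to);
        }
    }

    $lines[] = '}';

    return implode("\n", $lines) . "\n";
}

// Basic blocks of loop.php, identified by their opline ranges.
$labels = [
    0 => 'oplines 0-1',   // $i = 0
    1 => 'oplines 2-4',   // $i < 1 ?
    2 => 'oplines 5-7',   // $i++
    3 => 'oplines 8-11',  // print '*'
    4 => 'oplines 12',    // return
];

// Edges: fall-through from 0 to 1, JMPZNZ from 1 to 4 (zero) or 3 (non-zero),
// and the unconditional JMPs from 2 back to 1 and from 3 to 2.
$successors = [0 => [1], 1 => [4, 3], 2 => [1], 3 => [2], 4 => []];

echo toDot('main', $labels, $successors);
```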

The resulting image shows the control flow graph for the loop example from above.
