Thinking of Merging

I have been dealing with some challenges in the last couple of days. For example my practical assignment for DBA, to be implemented in MIL, and styling the GUI for my thesis, which is coming along just fine I guess. Both of these tasks are easy for people who invented/work with it every day, but I sometimes find it hard to wrap my mind around the problem or keeping track of every detail. Good luck for me there exists a topic where I can work on without repeatedly searching on Google, analyzing PHP!

Speaking of analyzing PHP, Martin has written a blog about the generation of rules for operator-precedence in PHP. He mentions some interesting ideas for work on grammar engineering, so if you are looking for a (thesis)-project you should definitely check it out. Otherwise it is just an interesting post to read.

As for my work in operators in PHP, I have finished the rest of the operators in PHP-Sat. Furthermore, the implementation of the constant-propagation regarding operators is revised within PHP-Front. This is done because I am thinking about merging the constant-propagation and the safety-type analysis into one big analysis. I know, it sounds like premature optimization (a.k.a. the root of all evil), but I can explain why it is necessary.

Consider the following piece of code:
  $foo = array(1,2);
  $foo[] = $_GET['bar'];
  echo $foo[2];
When we consider the constant-propagation we first assign the values 1 and 2 to the first two indexes of the array. The value of $_GET['foo'] is then assigned to the third index of the array which is the parameter to echo in the last statement. We know that the value is assigned to the third index because PHP-Front keeps track of the internal index-count of arrays.
Now lets look at the safety-type analysis. We first assign the safety-type IntegerType to the first two indexes of the array. The safety-type of $_GET['foo'] is then assigned to the third index of the array which is the parameter to echo in the last statement. We know that the safety-type is assigned to the third index because PHP-Sat keeps track of the internal index-count of arrays.

You might have noticed that both paragraphs are almost identical, except for the kind of value that is assigned. Thinking about this case, cases for function calls and cases for objects it turns out that performing the safety-analysis involves a lot of bookkeeping. This bookkeeping, for example the internal index-count, is not specific for the propagated values, it encodes the internal semantics of PHP. Therefore, in order to provide a good safety-type analysis, we have to embed this bookkeeping.

In order to avoid code duplication, which might be worse then premature optimization, I believe we must merge the two analyzes together. By separating the assignment of values from the bookkeeping we can visit a node, perform the bookkeeping and then perform as much analyzes on the node as we want. The only thing needed is a list of strategies that encodes a single analysis on the node.

The idea might sound a bit vague, but while I was working on the operators I already saw some duplication creeping in. Some of the strategies for both analysis only differ on the strategies to fetch or add some value, a perfect change to generalize a bit I would say.

No comments: