Where you at?

Now that the propagation of safety-types seems to go smoothly it was time to dive into another subject: accessing the location of terms. In this case, the location of a term is defined as the location of the text in the original file that was parsed to that term. Since the strategy to annotate the AST with position-info is available in the standard libraries nowadays, it should be easy to access these locations and finally solve PSAT-91 right? Lets find out!

The first thing I did was to add a separate module to handle the low-level stuff of getting the location annotations. This module contains several getter-strategies that can retrieve, for example, the number of the start-line. The location info is captured in six different numbers: start-line, end-line, start-column, end-column, offset and length. A getter-strategy is available for all of them. Furthermore, the name of the file in which the term is defined can be retrieved.

Although these getter-strategies are useful, they are not meant to be called directly. I figured that the most common use of these functions would be reporting the values in some kind of (formatted) message. In order to capture this kind of behavior the strategy format-location-string(|message) is defined. This strategy takes a message with holes in the form of [STARTLINE] as parameter and fills these holes with values from the current term. A rather useful strategy if I say so myself.

To practice with this new piece of functionality I have added an extra option to the tool input-vector of the php-tools-project. This option allows the user to choose between the normal list, or the same list with line-numbers printed for each access. More information about this option and how to add an option yourself can be found here.

After this was done I moved to php-sat to make the output more concise. It was actually pretty easy to implement. The algorithm is nothing more then get-terms-with-annotations, make-nice-output. I actually spend more time on creating a test-setup for calling php-sat through a shell-script then on generating the more concise format. The only problem was that the adding of position-info everywhere interfered with the dynamic-rules. A few well-places rm-annotations where needed to fix this. Please let me know if you like the new output, or whether something should be added.

The next applications of the location info is the tracking of where untainted data enters an application. When a function is called with a parameter $foo which is tainted, it would be nice to show when it was tainted. I think this is not too difficult to add, but bugs always seem to lurk in 'I-think-it-is-easy-to-add'-features.

A last remark about locations is a small problem without an actual solution. Eventually php-sat must support function-calls. The algorithm to analyze function-calls is not complicated, but how can bug-patterns within a function be reported? A message before each call to this function? Within the file in which the function is defined? And what about cases in which one call is flagged and the other one isn't? And can we also handle object-creation in the same way? I haven't figured out how to handle this, so if you have any ideas please let me know.

No comments: