November 2006 B
High Productivity Computing Systems and the Path Towards Usable Petascale Computing
Andrew Funk, MIT Lincoln Laboratory
John R. Gilbert, UC Santa Barbara
David Mizell, Cray Inc.
Viral Shah, UC Santa Barbara

6. Tools

Producing models of programmer workflows requires analyzing a large amount of data. We are developing automated tools for visualization, modelling, and simulation of timed Markov models (TMMs) to facilitate the kind of analysis described in earlier sections.

6.1 A tool for automatic TMM generation from collected data

Two main types of data are collected in the experiments. Physical activities such as code edits, compiles, and executions are automatically captured by the instrumented development environment. In some experiments, the students are also asked to record the time they spend during development on logical activities such as thinking, serial coding, parallel coding, and testing. It is these logical activities that we use to create TMMs of the workflows. Alternatively, physical activities can be mapped to logical activities using a set of heuristics.

Figure 6. Mapping logical activities to a TMM.

Whether the logical activities come from student logs [11] or heuristic mapping [9, 12], the end result is a list of activities and associated effort (measured in hours), as shown in Figure 6. We have created a Python program that parses this list of activities for each student and counts the transitions and dwell times for each activity. In the example shown, the student starts in the planning stage and then transitions to serial coding. This is represented in the transition matrix as $T_{12} = 1$. Consecutive entries for the same activity are combined. Thus in the dwell time matrix, the amount of time spent in the planning state before transitioning to the serial coding state is represented as $D_{12} = 1 + 3 = 4$. These transitions and dwell times can be aggregated across students and similar assignments to create a larger sample for analysis.
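The counting step described above can be sketched as follows. This is a minimal illustration, not the authors' program: the state list, the `count_transitions` name, and the sample log are all hypothetical (only the first two log entries, 1 and 3 hours of planning followed by serial coding, come from the example in the text), and it assumes the final activity in a log contributes no transition.

```python
from itertools import groupby

# Hypothetical state numbering: state 1 = planning, state 2 = serial coding,
# matching the T12 / D12 example in the text. The other states are assumptions.
STATES = ["planning", "serial coding", "parallel coding", "testing"]


def count_transitions(log, states=STATES):
    """Return (T, D) for one student's activity log.

    log is a list of (activity, hours) entries in chronological order.
    T[i][j] counts transitions from state i to state j.
    D[i][j] sums the hours spent in state i before moving to state j.
    """
    index = {name: k for k, name in enumerate(states)}
    n = len(states)
    T = [[0] * n for _ in range(n)]
    D = [[0.0] * n for _ in range(n)]

    # Combine consecutive entries for the same activity into a single visit.
    visits = [
        (activity, sum(hours for _, hours in group))
        for activity, group in groupby(log, key=lambda entry: entry[0])
    ]

    # Each adjacent pair of visits is one transition. The last visit has no
    # successor, so it is not counted here (an assumption of this sketch).
    for (src, hours), (dst, _) in zip(visits, visits[1:]):
        i, j = index[src], index[dst]
        T[i][j] += 1
        D[i][j] += hours
    return T, D


# Made-up log: the first two entries reproduce the planning example (1 + 3 hours).
log = [
    ("planning", 1),
    ("planning", 3),
    ("serial coding", 2),
    ("testing", 1),
    ("serial coding", 2),
]
T, D = count_transitions(log)
print(T[0][1], D[0][1])  # transitions and hours for planning -> serial coding
```

Aggregating across students would then amount to summing the per-student `T` and `D` matrices element by element.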

We calculate the probability for each state transition from the transition matrix as:

$$P_{ij} = \frac{T_{ij}}{\sum_{k} T_{ik}}$$


Similarly, the average dwell time for each transition is calculated as:

$$\bar{D}_{ij} = \frac{D_{ij}}{T_{ij}}$$


Once the transition probabilities and dwell times have been computed, the next step is to generate a graph description that can be used to visualize the TMM. Our initial choice for visualization was the Graphviz tool, which uses the DOT language for graph description. Figure 7 shows the student workflow from Figure 6 visualized as a TMM using Graphviz. Using Graphviz we have created a graphical browser for rapid visualization of multiple data sets (see Figure 8).
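Before a graph can be drawn, the raw counts have to be turned into probabilities and average dwell times. The sketch below shows one plausible way to do this, assuming each transition probability is the transition count divided by the total number of transitions out of the same state, and each average dwell time is the accumulated dwell time divided by the transition count. The function name `normalize` and the sample matrices are hypothetical.

```python
def normalize(T, D):
    """Convert transition counts T and accumulated dwell times D into
    transition probabilities P and average dwell times per transition."""
    n = len(T)
    P = [[0.0] * n for _ in range(n)]
    avg_dwell = [[0.0] * n for _ in range(n)]
    for i in range(n):
        row_total = sum(T[i])  # all transitions leaving state i
        for j in range(n):
            if T[i][j] > 0:  # skip unobserved transitions to avoid dividing by zero
                P[i][j] = T[i][j] / row_total
                avg_dwell[i][j] = D[i][j] / T[i][j]
    return P, avg_dwell


# Made-up aggregated counts (rows: source state, columns: destination state).
T = [
    [0, 1, 0, 0],
    [0, 0, 1, 3],
    [0, 0, 0, 1],
    [0, 2, 0, 0],
]
D = [
    [0, 4, 0, 0],
    [0, 0, 2, 9],
    [0, 0, 0, 5],
    [0, 3, 0, 0],
]
P, avg_dwell = normalize(T, D)
for row in P:
    print(row)
```

Unobserved transitions are left at zero in both matrices, so they can simply be omitted when the graph is drawn.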

Figure 7. TMM visualization with Graphviz.
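A DOT description of a TMM could be generated along the following lines. This is only a sketch: the `to_dot` helper, the edge label format (probability followed by average dwell time in hours), and the sample values are assumptions, and the layout of the authors' actual figures may differ.

```python
def to_dot(states, P, avg_dwell, name="TMM"):
    """Build a DOT digraph with one node per state and one edge per
    observed transition, labeled with probability and average dwell time."""
    lines = [f"digraph {name} {{"]
    for state in states:
        lines.append(f'    "{state}";')
    for i, src in enumerate(states):
        for j, dst in enumerate(states):
            if P[i][j] > 0:  # draw only transitions that were observed
                label = f"{P[i][j]:.2f} / {avg_dwell[i][j]:.1f}h"
                lines.append(f'    "{src}" -> "{dst}" [label="{label}"];')
    lines.append("}")
    return "\n".join(lines)


# Made-up three-state example.
states = ["planning", "serial coding", "testing"]
P = [
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
]
avg_dwell = [
    [0.0, 4.0, 0.0],
    [0.0, 0.0, 2.0],
    [0.0, 1.0, 0.0],
]
dot_text = to_dot(states, P, avg_dwell)
print(dot_text)
```

Saving the printed text to a file such as `tmm.dot` and running `dot -Tpng tmm.dot -o tmm.png` would render it with Graphviz.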

Figure 8. TMM visualization GUI.


Reference this article
"Modelling Programmer Workflows with Timed Markov Models," CTWatch Quarterly, Volume 2, Number 4B, November 2006 B. http://www.ctwatch.org/quarterly/articles/2006/11/modelling-programmer-workflows-with-timed-markov-models/
