Ask Reuben

Profiler

Where is my program slow?

What is the profiler?

How do I interpret the profiler results?

The Program Profiler is a very handy tool that helps developers determine where and why their program is running slowly.  It can help you identify bottlenecks in your code, suggest where there is room for potential improvement, and show where you should invest your time to make the program run faster.

Executing a program with the profiler is simple.  In Genero Studio you can right-click on an application node and select “Execute with Profiler”, select “Execute with Profiler” from the Debug menu, or click the appropriate toolbar icon (it is hard to discern, but it is a green triangle over the top of a stopwatch).  From the command line, use fglrun with the -p option e.g.

fglrun -p program-name

Output is written to stdout, and in Genero Studio it is captured in the Output panel.  The output consists of two sections.

Flat Profile

The first section is called the Flat Profile.  It tells you which functions the program has called, how many times each was called, what percentage of time was spent in each function, and how the time inside a function is split between the function itself and its children.

To illustrate this, compile and run the following code using the profiler …

MAIN
    CALL loop10000()
    CALL loop20000()
    CALL loop30000()
END MAIN

FUNCTION loop10000()
DEFINE i INTEGER
    FOR i = 1 TO 10000
    END FOR
END FUNCTION

FUNCTION loop20000()
DEFINE i INTEGER
    FOR i = 1 TO 20000
    END FOR
END FUNCTION

FUNCTION loop30000()
DEFINE i INTEGER
    FOR i = 1 TO 30000
    END FOR
END FUNCTION

… you will get output similar to …

Flat profile (order by self) 
   count  %total  %child   %self  name
       1    49.7     0.1    49.6  askreuben66_part1.loop30000
       1    32.5     0.0    32.5  askreuben66_part1.loop20000
       1    17.5     0.6    16.9  askreuben66_part1.loop10000
       3     0.7     0.0     0.7  <builtin>.rts_forInit
       1    99.9    99.6     0.3  askreuben66_part1.main

… what you should hopefully observe with this example is that in the code we would expect a 1:2:3 ratio of time spent in the loop10000(), loop20000() and loop30000() functions, and this is roughly borne out by the results.  The time spent in each of these functions, as indicated by the percentages in the %self column (16.9%, 32.5%, 49.6%), is roughly the expected 16.7%, 33.3%, 50.0%, plus or minus a couple of percentage points.

Call Graph

The second section of output is called the Call Graph.  This shows more detail for each individual function: in particular, what is calling it, how often and for how long, and in turn what it calls, how often and for how long.

If we take our previous program and modify it so it has a few more levels of functions …

MAIN
    CALL loop10000()
    CALL loop20000()
    CALL loop30000()
END MAIN

FUNCTION loop10000()
    CALL loop(10000)
END FUNCTION

FUNCTION loop20000()
    CALL loop(20000)
END FUNCTION

FUNCTION loop30000()
    CALL loop(30000)
END FUNCTION

FUNCTION loop(X INTEGER)
DEFINE i INTEGER
    FOR i = 1 TO X
    END FOR 
END FUNCTION

… run the profiler again and look at the second section of output labelled “Call graph” …

Call graph (order by self)

index	  %total   %self  %child  calls/of	 name
	   15.82   15.50    0.33      1/3        <-- askreuben66_part2.loop10000
	   33.59   33.56    0.03      1/3        <-- askreuben66_part2.loop20000
	   50.29   50.29    0.00      1/3        <-- askreuben66_part2.loop30000
[1]	   99.70   99.34    0.36	3        *** askreuben66_part2.loop
	    0.36    0.36    0.00      3/3        --> <builtin>.rts_forInit
-------------------
	    0.36    0.36    0.00      3/3        <-- askreuben66_part2.loop
[2]	    0.36    0.36    0.00	3        *** <builtin>.rts_forInit
-------------------
	  100.00    0.16   99.84      1/1        <-- <top>
[3]	  100.00    0.16   99.84	1        *** askreuben66_part2.main
	   50.32    0.03   50.29      1/1        --> askreuben66_part2.loop30000
	   33.59    0.00   33.59      1/1        --> askreuben66_part2.loop20000
	   15.93    0.10   15.82      1/1        --> askreuben66_part2.loop10000
-------------------
	   15.93    0.10   15.82      1/1        <-- askreuben66_part2.main
[4]	   15.93    0.10   15.82	1        *** askreuben66_part2.loop10000
	   15.82   15.50    0.33      1/3        --> askreuben66_part2.loop
-------------------
	   50.32    0.03   50.29      1/1        <-- askreuben66_part2.main
[5]	   50.32    0.03   50.29	1        *** askreuben66_part2.loop30000
	   50.29   50.29    0.00      1/3        --> askreuben66_part2.loop
-------------------
	   33.59    0.00   33.59      1/1        <-- askreuben66_part2.main
[6]	   33.59    0.00   33.59	1        *** askreuben66_part2.loop20000
	   33.59   33.56    0.03      1/3        --> askreuben66_part2.loop
-------------------

… the way to interpret this is that each section relates to an individual function.

The line marked *** gives the name of the function.

The lines above the *** line are labelled <-- and are the functions that call this function.

The lines below the *** line are labelled --> and are the functions called from this function.

So if we look at the entry for the function loop() …

index	  %total   %self  %child  calls/of	 name
	   15.82   15.50    0.33      1/3        <-- askreuben66_part2.loop10000
	   33.59   33.56    0.03      1/3        <-- askreuben66_part2.loop20000
	   50.29   50.29    0.00      1/3        <-- askreuben66_part2.loop30000
[1]	   99.70   99.34    0.36	3        *** askreuben66_part2.loop
	    0.36    0.36    0.00      3/3        --> <builtin>.rts_forInit

… we see that it was called 3 times, once from each of loop10000(), loop20000() and loop30000(), and that it in turn called the internal function rts_forInit() three times to initialise a FOR loop.  The percentages relate to the calls to/from the listed functions.

What this can tell you is whether a function is being called with a set of arguments that leads to it taking a long time.  In this example, we can see that when loop() is called from loop30000() it takes approximately three times as long as when it is called from loop10000(), which, given the parameters, is not surprising.

Analysis

Some things to observe in the results …

  • the results for the MAIN block are shown by the entry “main” (deep down, FUNCTION main() = MAIN)
  • names beginning <builtin> are internal functions and will not correspond to a FUNCTION in the code
  • time spent waiting for user input is ignored
  • the percentages are a percentage of total time.  If you improve the results for an individual function, its percentage will decrease and you would expect the percentages for other functions to increase
  • the results are sorted by the value in the %self column, hence any potential improvement will likely be found in the results at the top of each section, and that is where you should invest your time
  • the %self column should add up to 100% in the Flat Profile
  • there will be a little variation between runs, and it is unlikely you will get exactly the same results twice

As the profiler measures time spent in functions, not in individual lines of code, you may get a better analysis if you break long functions into several smaller functions so that you get a result for each of them.  For instance, wrapping CONNECT inside a function is a good idea if you want to analyse that time, as a database connection often takes some time.
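As a small sketch of this idea (the database name "mydb" is an example, not from the original article), wrapping the connection in its own function makes its cost appear as a separate row in the Flat Profile:

MAIN
    CALL connect_db()
    -- ... rest of program ...
END MAIN

-- Wrapping CONNECT in its own function means the time taken to
-- establish the database connection appears as its own entry in
-- the profiler output, rather than being lumped into main.
FUNCTION connect_db()
    CONNECT TO "mydb"
END FUNCTION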

Note that the time spent waiting for user input is ignored, i.e. time spent waiting for the user to respond in INPUT, MENU and other dialog statements.  However, time spent waiting for the operating system, the database, front-end initialisation and web services is not ignored, so be aware of this when analysing results.

To mitigate the effects of database caching, one thing I might do is run the program under test twice in quick succession, or add a function at the beginning of the program that selects the required data into dummy variables.  The idea is to force the data into the database cache so the results are not clouded by data being read from disk rather than from the cache.
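A minimal sketch of such a warm-up function might look like this; the table and column names are illustrative only.  It simply reads the rows the program will later use, discarding the values, so that subsequent queries are served from the database cache:

-- Hypothetical warm-up: read the data the program under test will
-- use, discarding the values, so the rows are in the database cache
-- before the timed code runs.
FUNCTION warm_cache()
    DEFINE dummy_id INTEGER
    DECLARE warm_curs CURSOR FOR SELECT customer_id FROM customer
    FOREACH warm_curs INTO dummy_id
    END FOREACH
END FUNCTION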

The GUI connection also takes some time.  This can be demonstrated by running the profiler with $FGLDIR/demo/demo and exiting the program straight away.  The function that takes most of the time is rts_optionsSet, which coincides with the OPTIONS INPUT WRAP statement, the first bit of UI in that program; this first bit of UI is where the GUI connection and initialisation occur.  If you add a ui.Interface.refresh() immediately before it and run the profiler again, you will see that the ui.Interface.refresh() now consumes that time, as it is now the first bit of UI.

What this last point means is that you may find it beneficial to create test programs to test individual functions or processes.  These programs can be run with no GUI (FGLGUI=0) and the results will not be clouded by time spent connecting to the GUI.
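Such a test program can be as simple as a MAIN block that exercises the code you want to measure; run it with FGLGUI=0 so no GUI connection is made.  This is a sketch only, with function_under_test() standing in for whatever you want to profile:

-- Run with:  FGLGUI=0 fglrun -p testprog
MAIN
    DEFINE i INTEGER
    -- Call the function enough times that its cost dominates the
    -- profile; 1000 is an arbitrary choice.
    FOR i = 1 TO 1000
        CALL function_under_test()
    END FOR
END MAIN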

Summary

You can read more about the profiler in the documentation.  It is a useful tool for analysing where time is being spent by a Genero program and making your programs more efficient.

As a final tidbit, run the following in the profiler, and guess beforehand how many times loop_start() and loop_end() are called …

MAIN
    DEFINE i INTEGER
    FOR i = loop_start() TO loop_end()
    END FOR
END MAIN

FUNCTION loop_start()
    RETURN 1
END FUNCTION

FUNCTION loop_end()
    RETURN 10
END FUNCTION

… check your answer in the profiler output: loop_start() is called once, while loop_end() is called 11 times, the number of iterations + 1.  What this tells you is that if the end expression of a loop never changes, there is no need to evaluate it on each iteration of the FOR loop.  It may be beneficial to evaluate the end expression once before the loop.
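Applied to the example above, evaluating the end expression once before the loop would look something like this:

MAIN
    DEFINE i, last INTEGER
    -- Evaluate the end expression once, before the loop, so
    -- loop_end() is called 1 time rather than once per iteration.
    LET last = loop_end()
    FOR i = loop_start() TO last
    END FOR
END MAIN

FUNCTION loop_start()
    RETURN 1
END FUNCTION

FUNCTION loop_end()
    RETURN 10
END FUNCTION

Re-running the profiler on this version should show loop_end() called just once in the count column of the Flat Profile.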

Run the following in the profiler and guess beforehand how many times b() will be called …

--OPTIONS SHORT CIRCUIT

MAIN
    IF a() AND b() THEN
        DISPLAY "TRUE"
    ELSE
        DISPLAY "FALSE"
    END IF
END MAIN

FUNCTION a()
    RETURN FALSE
END FUNCTION

FUNCTION b()
    RETURN TRUE
END FUNCTION

… the answer is 1.  Now repeat after uncommenting the OPTIONS SHORT CIRCUIT line, and you should observe that b() is no longer called; that is, it is not evaluated unnecessarily.

You can use the profiler like this to justify programming techniques, so consider using it with small test programs and experiments, as well as for analysing complete programs.