Ask Reuben
Profiler
Where is my program slow?
What is the profiler?
How do I interpret the profiler results?
The Program Profiler is a very handy tool for the developer that allows them to determine why and where their program is running slowly. It can help you identify bottlenecks in your code and suggests where there is room for potential improvement, and where you should invest your time looking to make it run faster. To execute a program with the profiler is very simple. In Genero Studio you can right-click on an application node and select “Execute with Profiler”, or from the Debug Menu select “Execute with Profiler”, or select the appropriate icon from the toolbar, it is hard to discern but it is a green triangle over the top of a stopwatch. From the command line it is a case of using fglrun with the -p option e.g. Output is written to stdout and in Genero Studio this is captured in the Output panel. The output consists of two sections. The first section is called the Flat Profile. It tells you what functions have been called by the program, how many times they have been called, what percentage of time has been spent in each function, and how that time inside a function is split between the function and its children. To illustrate this, compile and run the following code using the profiler … … you will get output similar to … … what you should hopefully observe with this example is that in the code we would expect there to be a 1:2:3 ratio of time spent in the loop10000(), loop20000(), loop30000() functions and this is roughly borne out by the results where the time spent in each of these functions as indicated by the percentages in the %total column 16.9%, 32.5%, 49.6% is roughly 16.7%, 33.3%, 50.0% plus or minus a couple of percentage points. The second section of output is called the Call Graph. This shows more detail for each individual function, in particular what is calling it and how often and for how long, and in turn what it is calling how often and for how long. If we take our previous program and modify it so it has a few more levels of functions … … run the profiler again and look at the second section of output labelled “Call graph” … … the way to interpret this is that each section relates to an individual function. The line with *** is the name of the function. The lines above the *** are labelled <— and are the functions that are calling this function. The lines below the *** are labelled —> and the functions called from this function. So if we look at the entry for the function loop() … … we see that it was called 3 times, once from each of loop10000(), loop20000(), loop30000(), and that it in turn called the internal function rts_for_init() to initialise a for loop 3 times. The percentages relate to the calls to/from the listed functions. What this can tell is if the function is being called with a bad set of arguments that lead to it taking a long time. In this example, we can see that when loop() is called from loop30000() it is taking approximately 3 times as long to process as it did when called from loop10000() which given the parameters is not surprising. Some things to observe in the results … As the profiler measures time spent in functions and not lines of code, you may get better analysis if you break long functions into several smaller functions so that you get a result for each of these smaller functions. For instance wrapping CONNECT inside a function is a good idea if you want to analyse that time as a database connection often takes some time. Note that the time spent waiting for user input is ignored, i.e. time spent waiting for the user to respond in INPUT, MENU and other dialog statements is ignored. However you should note that time spent waiting for the operating system, database, front-end initialisation, web services is not ignored. So you should be aware of this when analysing results. To mitigate against database caching, one thing I might do is to run the program under test twice in quick succession, or I might add a function at the beginning of the program that selects the data required into dummy variables. The idea being to force the data into the database cache and not have the results clouded by the data being read from disk rather than being read from the database cache. The GUI connection also takes some time. This can be demonstrated by running the profiler with $FGLDIR/demo/demo and exiting the program straight away. The function that takes most of the time is rts_optionsSet which coincides with the OPTIONS INPUT WRAP which is the first bit of UI in that program. This first bit of UI is where the GUI connection and initialisation is occurring. If you add a ui.Interface.refresh() immediately before that and run the profiler again you will see that the ui.Interface.refresh() is now consuming the time as that is now the first bit of UI. What this last point means is that you may find it beneficial to create test programs to test individual functions or processes. These programs can be run with no GUI interface (FGLGUI=0) and the results won’t be clouded with time spent connecting to the GUI. You can read more about the profiler in the documentation. It is a useful tool to analyse where time is being spent by a Genero program and make your programs more efficient. As a final tidbit, run the following in the profiler and guess beforehand how many times is loop_start() and loop_end() called? … … check your answer in the profiler output, loop_start() will be called once, loop_end() will be called 11 times, the number of iterations + 1. What this will tell you is that if the end expression of a loop never changes, why evaluate it on each iteration of a for loop? It maybe beneficial to evaluate the end expression once before the loop. Run the following in the profiler and guess beforehand how many times b() will be called … … answer is 1, now repeat after uncommenting the OPTIONS SHORT CIRCUIT. and you should observe that b() is no longer called. That is, it won’t be evaluated unnecessarily. You can use the profiler like this to justify programming techniques so open your eyes to using it with small test programs and experiments, as well as analysing complete programs.fglrun -p program-name
Flat Profile
MAIN
CALL loop10000()
CALL loop20000()
CALL loop30000()
END MAIN
FUNCTION loop10000()
DEFINE i INTEGER
FOR i = 1 TO 10000
END FOR
END FUNCTION
FUNCTION loop20000()
DEFINE i INTEGER
FOR i = 1 TO 20000
END FOR
END FUNCTION
FUNCTION loop30000()
DEFINE i INTEGER
FOR i = 1 TO 30000
END FOR
END FUNCTION
Flat profile (order by self)
count %total %child %self name
1 49.7 0.1 49.6 askreuben66_part1.loop30000
1 32.5 0.0 32.5 askreuben66_part1.loop20000
1 17.5 0.6 16.9 askreuben66_part1.loop10000
3 0.7 0.0 0.7 <builtin>.rts_forInit
1 99.9 99.6 0.3 askreuben66_part1.main
Call Graph
MAIN
CALL loop10000()
CALL loop20000()
CALL loop30000()
END MAIN
FUNCTION loop10000()
CALL loop(10000)
END FUNCTION
FUNCTION loop20000()
CALL loop(20000)
END FUNCTION
FUNCTION loop30000()
CALL loop(30000)
END FUNCTION
FUNCTION loop(X INTEGER)
DEFINE i INTEGER
FOR i = 1 TO X
END FOR
END FUNCTION
Call graph (order by self)
index %total %self %child calls/of name
15.82 15.50 0.33 1/3 <-- askreuben66_part2.loop10000
33.59 33.56 0.03 1/3 <-- askreuben66_part2.loop20000
50.29 50.29 0.00 1/3 <-- askreuben66_part2.loop30000
[1] 99.70 99.34 0.36 3 *** askreuben66_part2.loop
0.36 0.36 0.00 3/3 --> <builtin>.rts_forInit
-------------------
0.36 0.36 0.00 3/3 <-- askreuben66_part2.loop
[2] 0.36 0.36 0.00 3 *** <builtin>.rts_forInit
-------------------
100.00 0.16 99.84 1/1 <-- <top>
[3] 100.00 0.16 99.84 1 *** askreuben66_part2.main
50.32 0.03 50.29 1/1 --> askreuben66_part2.loop30000
33.59 0.00 33.59 1/1 --> askreuben66_part2.loop20000
15.93 0.10 15.82 1/1 --> askreuben66_part2.loop10000
-------------------
15.93 0.10 15.82 1/1 <-- askreuben66_part2.main
[4] 15.93 0.10 15.82 1 *** askreuben66_part2.loop10000
15.82 15.50 0.33 1/3 --> askreuben66_part2.loop
-------------------
50.32 0.03 50.29 1/1 <-- askreuben66_part2.main
[5] 50.32 0.03 50.29 1 *** askreuben66_part2.loop30000
50.29 50.29 0.00 1/3 --> askreuben66_part2.loop
-------------------
33.59 0.00 33.59 1/1 <-- askreuben66_part2.main
[6] 33.59 0.00 33.59 1 *** askreuben66_part2.loop20000
33.59 33.56 0.03 1/3 --> askreuben66_part2.loop
-------------------
index %total %self %child calls/of name
15.82 15.50 0.33 1/3 <-- askreuben66_part2.loop10000
33.59 33.56 0.03 1/3 <-- askreuben66_part2.loop20000
50.29 50.29 0.00 1/3 <-- askreuben66_part2.loop30000
[1] 99.70 99.34 0.36 3 *** askreuben66_part2.loop
0.36 0.36 0.00 3/3 --> <builtin>.rts_forInit
Analysis
Summary
MAIN
DEFINE i INTEGER
FOR i = loop_start() TO loop_end()
END FOR
END MAIN
FUNCTION loop_start()
RETURN 1
END FUNCTION
FUNCTION loop_end()
RETURN 10
END FUNCTION
--OPTIONS SHORT CIRCUIT
MAIN
IF a() AND b() THEN
DISPLAY "TRUE"
ELSE
DISPLAY "FALSE"
END IF
END MAIN
FUNCTION a()
RETURN FALSE
END FUNCTION
FUNCTION b()
RETURN TRUE
END FUNCTION