When Things Go Wrong (part 1 of 3) – Who, What, When
My program has stopped, how can I tell what has gone wrong?
How can I handle errors gracefully whilst at the same time capturing as much information as possible?
This next series of Ask-Reuben’s will look at what happens when something goes wrong. How you can make sure you can capture as much as information as possible, whilst at the same time making your application look as professional as possible whilst internally something inside is not working as expected.
In following Ask-Reuben’s I will cover the ERRORLOG and EXCEPTIONS, but today I will start with a word you won’t find in our documentation or a dictionary, and that word is WhoWhatWhen. It relates to capturing “who has run what program when”.
Hopefully you have the concept of a library function(s) that is called every time a program starts and finishes so that you have a place to write some code that is executed every time a program starts and finishes. That is your code has the following concept …
MAIN ... CALL initialise() -- library function called as soon as practical after MAIN ... # the body of your program ... CALL tidy_exit() -- library function called as close as practical to END MAIN or EXIT PROGRAM ... END MAIN
… normally that library function will contain an initial database connection as well as more mundane code such as your run-time OPTIONS and other initialisations. If you don’t have such a library function then I would recommend you do.
The technique is that inside this initial library function, you INSERT a row into a database table that records who is starting the program, what program is starting, and when they have started the program. I have named this database table whowhatwhen, you can use any name that matches your convention and add any other additional columns to record other useful information e.g. run-time arguments. If using GAS I would add a column to record the session id found from FGL_GETENV(“FGL_VMPROXY_SESSION_ID”). This database table should contain a SERIAL column. When your program finishes, then you UPDATE that row using a WHERE clause on the serial column with the program finish time.
The values for what and when should be obvious, sometimes a little bit of thought goes into the who value. In the days of Text User Interface or GDC with Direct Connection, this would typically be the result of FGL_GETENV(“LOGNAME”), but when running via the Genero Application Server you would be looking at the application user which is probably stored in a 4gl variable.
You should make the observation that if the program finish time is NULL, then the program is running, and that if the program finish time is NOT NULL then the program has terminated successfully. But what if the program crashes and the library routine to update the finish time is not executed? If program finish time is NULL this can have two meanings, the program is running, or the program has crashed.
Hence there is a little bit more information and processing you need to add. Add a column to the database table that records the state of a program. This might be an INTEGER that has the following meanings. 0-program running, 1-program exited successfully, and one or more additional values indicating different types of an exit when things have gone wrong.
There are two bits of processing that we can then add. If we can execute some code as the program crashes then we can update this new state field to reflect the fact that program has crashed in a manner we know about. I will cover how this can be called in two weeks time when I cover exception handling. We should also have a process that is run periodically that will verify that for programs that the table says are running (i.e state=0), that they are indeed still running, and if it can’t find any trace of them, update the table to change this state column to reflect the fact that the program is not running and has stopped unexpectedly. (Some of you maybe aware that the user based licensing system does something similar to free up licenses from programs that have crashed). In this example I record the process id that is being used by a running program. The validation test is to then see if this process id is being used by an fglrun process and if it isn’t we know that the fglrun process has stopped.
The database and library functions will look something like …
|who||CHAR(20)||Name of the person that is running a program.|
|what||CHAR(20)||Name of the program that is being run|
|when||DATETIME YEAR TO FRACTION(2)||What time did the program start|
|end||DATETIME YEAR TO FRACTION(2)||What time did the program finish|
|state||SMALLINT||A flag to indicate if program is running (0), has exited normally (1), has exited gracefully via exception handling (2), did not exit gracefully (3).|
|pid||INTEGER||The Process ID of the running program (If multiple servers, you will need to add a column for server name)|
DEFINE m_whowhatwhen_id INTEGER -- this function should be called every time a program starts FUNCTION initialise() ... # after a connection to a database has been made, and we know the name of the user # write a row to the whowhatwhen table to indicate that a program has been started LET l_who = ... # get the logged in user name LET l_what = base.Application.getProgramName() LET l_pid = FGL_GETPID() INSERT INTO whowhatwhen(id, who, what, when, state, pid, ...) VALUES(0, l_who, l_what, CURRENT, 0, l_pid ...) LET m_whowhatwhen_id = SQLCA.SQLERDD ... END FUNCTION -- this function should be called every time a program exits normally FUNCTION tidy_exit() ... UPDATE whowhatwhen SET state = 1, end = CURRENT WHERE id = m_whowhatwhen_id ... END FUNCTION -- this function should be called every time a program exits due to an exception FUNCTION untidy_exit() ... UPDATE whowhatwhen SET state = 2, end = CURRENT WHERE id = m_whowhatwhen_id ... END FUNCTION -- this function should be run periodically FUNCTION clear_crashes() DECLARE CURSOR running_programs_curs FOR SELECT id, pid FROM whowhatwhen WHERE state = 0 FOREACH running_programs_curs INTO l_id, l_pid LET ok = ... # test to see if that particular process id (l_pid) is in use IF NOT ok THEN -- Something has happened to the program and it is no longer running -- change the state to reflect that UPDATE whowhatwhen SET state = 3, end = CURRENT WHERE id = l_id END IF END FOREACH END FUNCTION
This database table gives you an audit of program usage. You can create enquiry screens or reports that makes use of the data in the whowhatwhen table to tell you …
What programs is a user currently running?
SELECT what FROM whowhatwhen WHERE who = ? AND state = 0
What programs has a user run?
SELECT what FROM whowhatwhen WHERE who = ?
What is the process id of programs that are currently running?
SELECT pid FROM whowhatwhen WHERE state = 0
Who was running a program at a certain time?
SELECT who, what FROM whowhatwhen WHERE ((when <= ? AND end >=?) OR (when <=? AND end IS NULL)
Who runs a certain program?
SELECT who FROM whowhatwhen WHERE what = ?
What is the most popular program?
SELECT what, COUNT(*) FROM whowhatwhen GROUP BY 1 ORDER BY 2 DESC
What programs crashed today?
SELECT * FROM whowhatwhen WHERE state > 1 AND when >= ?
Hopefully you can see that by capturing this data, you have information that allows you to better maintain and administer your system. For projects I have been involved in, this is good information to have in determining
- what programs are important - what programs do we prioritise or test more.
- what programs are not used - can we stop maintaining that code
- what time the system is in use - when can we schedule tasks
- who is running particular programs - who can we ask about a program
- who ran a program at a certain time - did they notice anything unusual
For us in this topic of what happens when something has gone wrong, we have ...
- unreported instances of something going wrong - has this happened before and not been reported
- what else was running at the time something went wrong
- for a program that crashed who started it
- when a program crashed when was it started
Some of this information can be derived through the GAS logs, but that does not include the concept of "who" ran the application which only you can supply. By putting into a database table, it can also be extracted quickly via SQL statements.
Next week I will cover the ERRORLOG file which is a file where things are and can be written to when things go wrong, and then the week after I will look at something called exception handling which can be used to present a nice user interface when things go wrong as well as writing to the errorlog and updating this whowhatwhen table.