Ask Reuben
ORDER EXTERNAL BY
Why do I need ORDER EXTERNAL BY ?
What does the EXTERNAL mean in ORDER EXTERNAL BY ?
As developers get older, the next generation can face situations where the code they are looking at, they ask themselves why is that code there? They have never faced the pain of getting the code wrong or omitting a keyword or line where their predecessors learnt the hard way.
One such expression is ORDER EXTERNAL BY that you will come across in REPORT code. Whilst the meaning of ORDER BY is self-explanatory, the keyword EXTERNAL is often what is important and can lead to a performance difference, both in terms of the speed with which the report is generated and stability of the system.
First of all, when processing reports if you have BEFORE GROUP and AFTER GROUP then it is important that the data being fed to the report is ordered. This is so that the BEFORE GROUP and AFTER GROUP are triggered when you expect them to be. If the data is not ordered, then it will appear that the BEFORE GROUP and AFTER GROUP are triggered randomly or on every row, when what is happening is that they are being triggered each time the grouping value changes in the unorderd data. So if the unordered data had values of 1,3,2,3,2,2,1,3,3,2, then the BEFORE GROUP and AFTER GROUP will be triggered nearly every row. The only time they would not be triggered was when the value was the same as the previous row, the 2,2 and 3,3 in that sample. By ordering the data, then 1,1,2,2,2,2,3,3,3,3 will be received and the BEFORE GROUP and AFTER GORUP will now only be triggered 3 times, once each for 1,2, and 3.
If you add an ORDER BY without the EXTERNAL keyword then this tells the REPORT routine that it needs to take responsibility for sorting the data. Whilst buried as in implementation detail, the mechanism is to create a temp table in the current database, insert into that table a row for every row of report data, and then selecting from that temp table with an order by clause. This takes time and database memory.
By using the EXTERNAL keyword so that it is ORDER EXTERNAL BY, then this tells the REPORT routine that the data for the report is already sorted and that it does not need to do its own sort. Internally it does not need to create a temp table, populate it, and then select from it.
Assuming that the program that feeds the data for the report has its own SQL statement with an ORDER BY, the performance gain by using ORDER EXTERNAL BY is that the data is only sorted once. Without the EXTERNAL keyword the data is sorted twice, the second time unnecessarily.
You may find reference to a term one or two pass report. A report without use of EXTERNAL keyword is a two pass report as it sorts the data twice. A one-pass report as it has ORDER EXTERNAL BY only sorts the data once.
In summary …
- if a REPORT has BEFORE GROUP, AFTER GROUP then its data should be sorted
- the 4gl program should take responsibility for the sorting of the data before the REPORT clause, typically by having an ORDER BY clause in its SQL statement(s)
- the 4gl program can signify that the data is already sorted by using ORDER EXTERNAL BY in the REPORT routine.
If I don’t see ORDER EXTERNAL BY in a REPORT routine then it arouses suspicion.