Ask Reuben

Large Scale Code Changes

How can I use a new Genero feature in all my Genero applications?

How do I transform a lot of my code at once?

How can I apply a change to all my sources?

With a number of Genero initiatives, a common message from our community is that they would like to use some of our new syntax and functionality (some of which I covered in last week’s article) but they can’t, as it would take too much effort. That is, they have many hundreds of Genero applications, many thousands of Genero files, and many hundreds of thousands or millions of lines of code, and for a developer to make these changes manually is a lot of effort and perceived risk.

What you can often do is identify and automate these code changes, but in order to do that your development environment should have a number of properties in place. If these are in place, then you will find that you can undertake some large scale code changes with confidence.

Before I look at these properties, I will first clarify what I mean by large scale code changes. It can refer to any code transformation, where the number of files touched, OR the number of lines of code touched in a file can be large, AND there is an element of repetition about the code transformation. For example,

changing your code standard from using DATABASE to SCHEMA at the top of a file involves changing only one line of code in each file, but if you are consistent then you would apply that to every .4gl, .per, .4fd in your code base. Every one of these files would need to be checked out from source control, have the change applied, and then be checked back into source control. Each Genero application would then need to be recompiled and have its QA tests run. So although the change itself is quite small, you would be looking for ways to script that change rather than hand coding it.
with a task such as removing TTY attributes, or removing unnecessary DISPLAY statements after moving to UNBUFFERED, not only are you touching every .4gl but you are potentially touching a large number of lines of code in each .4gl.

These tasks differ from a developer’s normal day-to-day task, which will typically relate to a single Genero application or a small number of them. With these large scale code changes, there is normally a level of automation where, instead of hand coding the change, the change can be scripted, and you can rely on your automated build and QA tests to verify correctness rather than building and running programs individually.
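As a minimal sketch of what scripting the DATABASE to SCHEMA change could look like, the function below rewrites only the first DATABASE line found at the start of a line in the file header. It assumes a code standard where the keyword is upper case in column one and appears before the first MAIN, FUNCTION or REPORT. The function name database_to_schema and the src directory in the usage comment are made up for illustration, and it only covers text sources such as .4gl and .per.

```shell
# Sketch: switch the schema declaration at the top of a source file from
# DATABASE to SCHEMA. Assumes the keyword is upper case, at the start of a
# line, and placed before the first MAIN, FUNCTION or REPORT.
database_to_schema() {
    src=$1
    tmp=$(mktemp) || return 1
    awk '
        # Once a program block starts, stop looking: a DATABASE statement
        # inside a block is a runtime connection and must be left alone.
        /^(MAIN|FUNCTION|REPORT)([ \t]|$)/ { seen = 1 }
        # Rewrite only the first DATABASE line found in the file header.
        !seen && /^DATABASE[ \t]/ { sub(/^DATABASE/, "SCHEMA"); seen = 1 }
        { print }
    ' "$src" > "$tmp" && cat "$tmp" > "$src"
    status=$?
    rm -f "$tmp"
    return $status
}

# Usage (src is a hypothetical directory name):
#   find src -type f \( -name '*.4gl' -o -name '*.per' \) | while IFS= read -r f; do
#       database_to_schema "$f"
#   done
```

One thing to verify before trusting a change like this: as I understand the Genero documentation, a DATABASE line above MAIN in the main module also connects to the database when the program starts, so those modules would need an explicit connection first. That is exactly the kind of case your automated tests should catch.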

The properties of a development environment that I believe can facilitate large scale code changes include …

Revertable

Any code change made should be able to be undone. If you make a large scale code change, you should be able to undo the change and put the code back how it was.

This is necessary if you decide that the change is a mistake, or more likely you decide to refine the change. So you put the code back how it was and try again.

Diffable

Any code change made, you should be able to view in isolation. In principle this means you should be able to do a diff between the old and the new version of the code and identify …

what lines of code have been added
what lines of code have been modified
what lines of code have been deleted

Ideally you should be able to choose to ignore white space differences. That is, if a line of code has only been indented, you should be able to choose to ignore that difference.

If any code changes are automated, then you can do a diff on the code change to manually verify that the change is as you expect.
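A small sketch of that kind of check, using the diff option that ignores white space. It assumes a copy of each file was saved with a .orig suffix before the script ran. That naming convention and the function name review_change are mine for illustration; with a version control system you would use its own diff command instead.

```shell
# Show what a scripted change did to one file, ignoring white space differences.
# Assumes a copy of the file was saved as <file>.orig before the script ran
# (a hypothetical naming convention).
review_change() {
    # -u gives unified output, -w ignores all white space when comparing lines.
    diff -u -w "$1.orig" "$1"
}

# Usage:
#   review_change src/orders.4gl    # prints nothing if only indentation changed
```

The exit status is 0 when nothing other than white space differs, so the same function can be used in a loop to list only the files with real changes.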

Blameable

Any code change made, you should be able to identify the developer responsible and what task it is they were working on.

If you can do this, then you can identify if a change you made as part of a large scale code change is responsible for a change in your code.

VERSION CONTROL

The above three properties imply the use of a Version Control system.

Testable

Every Genero application in your system should have a series of tests that can be executed on that application. You should not start work on that application unless those tests can be executed and passed. This will allow you to determine if any of your changes have broken the program.

You do not want to be in a position where after some large scale code changes, you are trying to find out why a program is not working, and it turns out the program did not work before you made the changes.

These tests should be of good quality too. My favourite story of bad testing involved a report program that passed a test when its only page of output was something like … “No data meets the selected criteria”. The program did not crash, so the tester viewed the test as a pass. A good test for a report would generate many pages of data to test page throws, headers, footers, column widths etc.

With Genero Ghost Client you can generate the tests. You can then decide how often and when to run these tests.

All

It should be possible to work on your entire code base in one step. This typically means that you can write simple scripts that will …

check out your entire source code.
compile every program.
run tests against every program.

If that is in place, you can then have scripts make changes to your entire code base in one act, and then compile and run tests to make sure you have not broken anything.
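The "compile every program" step can be sketched as a small helper that runs a command against every source file of a given type and reports the ones that fail. The helper name run_on_all and the src directory are made up for illustration, and the fglcomp and fglform options in the usage comment should be checked against your Genero version.

```shell
# Run a command on every file with a given extension below a directory.
# Prints the name of each file the command failed on; returns 1 if any failed.
run_on_all() {
    dir=$1
    ext=$2
    shift 2
    failed=$(find "$dir" -type f -name "*.$ext" | sort | while IFS= read -r f; do
        # Command output goes to stderr so stdout carries only failed file names.
        "$@" "$f" </dev/null >&2 || echo "$f"
    done)
    if [ -z "$failed" ]; then
        return 0
    fi
    echo "$failed"
    return 1
}

# Usage (src is a hypothetical directory name):
#   run_on_all src 4gl fglcomp -M
#   run_on_all src per fglform -M
```

Because the command is passed in as arguments, the same helper can run a compiler, a standards check, or a test runner, which is what makes it useful in an overnight job.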

One question I like to ask is what is your development server doing between when the last worker leaves the office and when the first worker arrives in the morning? Rather than sitting idle, it can be recompiling your application and running tests against your code base and application.

If the All property is in place then it is also possible to count and measure progress. If there is some manual work involved, then you will be able to count how many places this manual work is required and as the project progresses, you can measure this progress by recounting and revising estimates.

DAILY SMOKE TEST

The above two properties imply the existence of daily smoke tests. That is, on a daily basis you have automated scripts that …

check your code base against your code standards
recompile your application and make sure it is free of compilation errors
run QA tests on programs in your suite

These will identify very quickly any code that is broken or does not meet your standards, and it can be fixed that day whilst it is fresh in the developer’s mind.

Scriptable

Your code should be kept to a standard so that scripts can be applied to automate changes to the code base.

It is a lot easier to write scripts if the code has been kept to a consistent standard. For example, consider whether keywords, identifiers etc. are all UPPER CASE, all Proper Case, all camelCase, or all lower case. It is a lot easier to identify and replace certain syntax if you can ignore the multitude of case variations and just search for the one variant of some syntax.

As well as case, other considerations might include …

Where there are alternative coding techniques to achieve the same result, be consistent within your code base over which technique is used. So for example, have standards for when to use {}, --, # to identify code comments.

Hungarian notation or some form thereof can aid readability and can also help identification and subsequent transformation. One code transformation was made easier because all date form fields had dmy in their name. Thus it was easy to identify which fields needed to be DATEEDITs.

Line consolidation. Tools such as sed, grep, awk are simpler to use if dealing with a single line at a time. So for example, with .per files, consider having a code standard that says each entry in the ATTRIBUTES or INSTRUCTIONS section is on one line. That is …

EDIT f01 = formonly.fieldname, SCROLL, UPSHIFT, COMMENT="Enter field value";

… as opposed to …

EDIT f01 = formonly.fieldname, SCROLL, UPSHIFT, 
    COMMENT="Enter field value";

If this is in place, you can very easily identify whether particular widgets have particular attributes defined using grep.
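Two sketches of what the one-entry-per-line standard buys you. The first lists EDIT fields that have UPSHIFT but no COMMENT. The second turns EDIT into DATEEDIT for fields whose name contains dmy, as in the date field example above. Both function names are made up for illustration, and any attributes left on a converted field would still need a human to check them.

```shell
# Both functions assume one ATTRIBUTES entry per line in the .per file.

# List EDIT fields that have UPSHIFT but no COMMENT attribute.
edits_missing_comment() {
    grep -n -E '^[[:space:]]*EDIT[[:space:]].*UPSHIFT' "$@" |
        grep -v 'COMMENT[[:space:]]*='
}

# Turn EDIT into DATEEDIT for fields whose field name contains "dmy".
# Writes the transformed form to stdout and leaves the file untouched.
edit_to_dateedit() {
    sed -E 's/^([[:space:]]*)EDIT([[:space:]]+[[:alnum:]_]+[[:space:]]*=[[:space:]]*[[:alnum:]_.]*dmy[[:alnum:]_]*[[:space:]]*[,;])/\1DATEEDIT\2/' "$1"
}

# Usage:
#   edits_missing_comment forms/*.per
#   edit_to_dateedit forms/order.per > order.per.new
```

If an entry were split over two lines, the first function would report a false positive whenever COMMENT sat on the continuation line, which is the argument for the standard.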

Genero 3.20 introduced the source code beautifier, and that is a great tool you can use to make sure that your code is formatted consistently.

Genero Studio 3.20 also introduced gslint. This is a much underused tool that can also be used to ensure your code meets your internal standards. To understand how it works and how to adapt it for your standards, find the function extractASTFromFile() which shows how an abstract syntax tree can be extracted from your 4gl source.

Smart

The phrase Work Smarter, not Harder was allegedly coined in the 1930s by Allan H. Mogensen, an industrial engineer. This was before source code was a thing, but it still applies. If you can get yourself into a position where you can make a code change once rather than repeating the same code change in many files, then you can make changes across your code base a lot quicker.

Putting repeated code into a FUNCTION is an obvious one. You should always be looking out for repeated code that can be placed into a function rather than coding something twice. Having library functions that are called at the start and end of every Genero application is a good example of coding something once in a library function rather than repeating code. Code such as ui.Interface.loadStyles, ui.Interface.loadActionDefaults should only appear once in a library function rather than being explicitly coded into every single Genero application.

There are code constructs in Genero that allow things to be defined once. Presentation Styles mean that the appearance of a certain class of fields can be defined once in the .4st. If you need to change the appearance, amend the one entry in the .4st rather than in multiple files. Action defaults are a similar concept: define action attributes once in a .4ad. ui.Form.setDefaultInitializer allows form properties to be modified when a form is loaded. The ui.ComboBox class allows combobox item lists to be defined once in a function rather than repeated in multiple fields, etc.

Use generic code to reduce the number of lines of code. The goal is that the arguments to a generic code function contain what is unique, and that the generic code function is where you code the equivalent of the repeated framework or boilerplate code once. For example, in my fgl_zoom example, the arguments contain what is unique about the zoom window, such as the SQL statement and the headings of the various columns, whilst the generic code contains the equivalent of one OPEN WINDOW, one DECLARE CURSOR, one FOREACH, one DISPLAY ARRAY, one CONSTRUCT. Other examples of potential generic code I like to point out are:

populate_combobox(field_name STRING, sql STRING): populates a ComboBox list of items from the passed in SQL.
exists(table_name STRING, column_name STRING, value_name STRING) RETURNS BOOLEAN: indicates if a value exists in a given table and column.

Sometimes you can’t code what you want generically, and historically this was the domain of code generators such as BAM. That is, you pass in a database table and/or a form and generate the code used to maintain that table or enter data in that form. Code generators allow you to concentrate on writing the unique business rules for an application without having to worry about typing out the template or boilerplate code. For large systems you can often identify patterns of programs, and where there is a large number of programs in a pattern, these are candidates to be generated so that they are consistent. For example, in an ERP system with Debtor Maintenance, Creditor Maintenance, Product Maintenance, Branch Maintenance, GL Account Maintenance etc., the unique aspects of these programs are the table name, the columns, and the business rules identifying whether you can see a field, whether you can change a field, what the default is, whether the value is valid etc. If these are generated to some extent then you can make changes to all these programs in one hit by changing the generator and regenerating the code.

The pre-processor can be thought of as a mini code generator. Use it to generate repeated code within a program that only varies by something like the name of a screen field or database column, and have this repeated code slimmed down to a one line pre-processor macro. Making a large scale code change is then a case of making a change in the macro definition and recompiling the sources that use this definition. Having access to pre-processor macros across your code base typically means having an &include line at the top of every file that loads a file containing all your macros. Potential uses include:

adding global actions
the repeated code pattern that is common in AFTER FIELD blocks where the only difference is the name of the form field

Often getting into a position where you are coding smarter is a large scale code change in itself, but once you are there, the continual improvement of adding new features and taking advantage of new syntax is a simpler activity.

STANDARDS

The above two properties imply the existence of standards. Your code base has standards that mean

AUTOMATING CHANGES

I have spent a lot of time discussing the development environment, and not the how-to.

If you can make changes in library functions (including generic and initializer functions), preprocessor macros, or code generators then that can vastly reduce the effort required. Work smarter: make the change once rather than making the same change in many different files.

In a Linux environment sed and awk are powerful tools you can use to change lines of code. grep will also typically be used to identify patterns. However you could also consider that using base.Channel and other Genero classes you can write a Genero program to read a source file and write a transformed source file.

grep -c pattern file | grep -v ":0" is a handy usage of grep you can use to identify the files that contain a pattern. This can be used in a for loop to perform a task on a select group of files, like so …

for filename in `grep -c pattern file | grep -v ":0" | cut -d ":" -f1`

wc -l can be used to tell you how many lines of code you are dealing with.
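A sketch of the same idea as a pair of reusable functions. Here I have swapped the grep -c pipeline for grep -l, which prints only the names of files that contain a match, and used a while read loop so that file names containing spaces survive. The function names, the src directory and the DISPLAY pattern are made up for illustration.

```shell
# Print the name of every file with the given extension, below a directory,
# that contains the pattern. grep -l prints only the names of matching files.
files_matching() {
    pattern=$1
    dir=$2
    ext=$3
    find "$dir" -type f -name "*.$ext" -exec grep -l -e "$pattern" {} + | sort
}

# Run a command once for each matching file, passing the file name last.
for_each_match() {
    pattern=$1
    dir=$2
    ext=$3
    shift 3
    files_matching "$pattern" "$dir" "$ext" | while IFS= read -r filename; do
        "$@" "$filename" </dev/null
    done
}

# Usage (src is a hypothetical directory name):
#   files_matching 'DISPLAY' src 4gl           # which files are affected
#   for_each_match 'DISPLAY' src 4gl wc -l     # how many lines in each of them
```

Listing the affected files first, and only then handing them to a command that changes them, gives you a natural point to stop and review the scope of a change.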

Alternate views of a file, such as producing a .42f or .4fd (both of which are XML) from a .per, might help you understand a form file better and give guidance on how to automate a code change. Similarly, using the abstract syntax tree from gslint might help as well.

When making automated changes, I find they fall into these categories:

can be automated and I trust the automation.
can be automated but needs a human to verify it, so leave a marker in the code for a human to remove after they have verified the code.
can’t be automated, but finding where the change needs to be made can be automated, so a marker can be inserted in the code identifying where a change needs to be made. This marker is removed as part of the process of making the change.
can’t be automated and it can’t be accurately determined where the change is required, so a marker might be put at the top of a file or function indicating that manual effort is required.

These markers can be used to track progress to date and work remaining.
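A minimal sketch of the marker approach, assuming markers are written as a # comment line directly above the line of interest. The marker text LSC-REVIEW and both function names are made up for illustration.

```shell
# Insert a marker comment line above every line matching a pattern.
# The marker text (for example LSC-REVIEW) is a made-up name.
mark_lines() {
    pattern=$1
    file=$2
    marker=$3
    tmp=$(mktemp) || return 1
    # The pattern and marker are passed through the environment so that awk
    # does not reinterpret backslashes in them.
    PAT=$pattern MARK="# $marker" awk '
        # Skip lines that already carry the marker directly above them.
        $0 ~ ENVIRON["PAT"] && prev != ENVIRON["MARK"] { print ENVIRON["MARK"] }
        { print; prev = $0 }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return $status
}

# Count the lines that still contain the marker anywhere below a directory.
count_markers() {
    find "$1" -type f -exec cat {} + | grep -c -F -e "$2" || true
}

# Usage (src is a hypothetical directory name):
#   mark_lines 'ATTRIBUTE' src/orders.4gl LSC-REVIEW
#   count_markers src LSC-REVIEW
```

Running count_markers as part of the daily scripts gives a number that should fall as the manual work is completed.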

SUMMARY

Your code is an asset. Just like a car or house it needs to be maintained, given a little love, and tweaked to take advantage of what is new. Having the right environment for your code base and being smart with your code can ensure that it is an asset that increases in value and not a liability.