Ask Reuben

ulimits

Why do my Genero applications running through the Genero Application Server stop after a while when there are a large number of users?

What are these numbers at the top of the dispatcher and proxy log?

I write this article on ulimits as a Genero application developer, not a systems administrator.  I hope it will give you some context should you need to have a conversation about getting a ulimit value changed.  Ulimits apply to the Unix/Linux/OSX family of operating systems; on Windows you will have to google “Windows equivalent of …”.  The two example programs are designed to hit the system limits, so they are NOT intended for running on a live production system.

When running via Genero Application Server (GAS) on Linux and OSX based operating systems, the top of the dispatcher log and proxy log files will contain an entry similar to …

#Version: 1.0
#Date: 2020/11/27 11:58:43.507461
#Fields: time relative-time location process-id thread-id contexts event-type event-params
#Ulimits:
#   core file size     : 0 (unlimited)
#   data seg size      : unlimited (unlimited)
#   file size          : unlimited (unlimited)
#   max memory size    : unlimited (unlimited)
#   open files         : 1024 (unlimited)
#   stack size         : 8388608 (67104768)
#   cpu time           : unlimited (unlimited)
#   max user processes : 1024 (2128)
#   virtual memory     : unlimited (unlimited)

The numbers shown at the top are a capture of the operating system ulimit values.  Rumour has it they were added to the GAS logs by support personnel who were frustrated at continually running into support issues where these values on the operating system were too low.  Without them in the log file, we had no idea what the actual values used by the dispatcher and proxy processes were.  With them written to the log file when the process starts, we know what limits were in place for that particular process and can pinpoint an issue a lot quicker.
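If you need to check these values across many log files, the header is regular enough to pull apart with a short script.  The following is only a sketch in Python, based on the sample header above and nothing else: the function name parse_ulimits and the regular expression are my own, and I keep the two values as "first" and "second" because the log itself does not label them (I assume, without confirmation, that the bracketed value is the hard limit).

```python
import re

# Sample header lines copied from the log excerpt in this article.
SAMPLE_HEADER = """\
#Ulimits:
#   core file size     : 0 (unlimited)
#   open files         : 1024 (unlimited)
#   stack size         : 8388608 (67104768)
#   max user processes : 1024 (2128)
"""

# Matches "#   <name> : <first value> (<second value>)"
LINE_RE = re.compile(
    r"^#\s+(?P<name>[^:]+?)\s*:\s*(?P<first>\S+)\s*\((?P<second>[^)]+)\)\s*$"
)


def parse_ulimits(header_text):
    """Return {name: (first, second)} for the lines following '#Ulimits:'."""
    limits = {}
    in_block = False
    for line in header_text.splitlines():
        if line.strip() == "#Ulimits:":
            in_block = True
            continue
        if not in_block:
            continue
        match = LINE_RE.match(line)
        if match is None:
            break  # the first line that is not a limit entry ends the block
        limits[match["name"]] = (match["first"], match["second"])
    return limits


if __name__ == "__main__":
    for name, (first, second) in parse_ulimits(SAMPLE_HEADER).items():
        print(f"{name}: {first} ({second})")
```

The values are kept as strings so that "unlimited" passes through unchanged.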

You can see the values by running …

ulimit -a

… from the command line …

> ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
file size               (blocks, -f) unlimited
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 256
pipe size            (512 bytes, -p) 1
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 1418
virtual memory          (kbytes, -v) unlimited

Ulimits exist to protect the resources of a server from being consumed by one process and its children.  They stop a process consuming too many system resources; in the cases we are looking at, by opening too many files or by starting too many child processes.  A poorly written program could consume all the resources of a server if it was allowed to create an infinite number of child processes or open an infinite number of files.

For the exact meaning of ulimit values, I would defer to the documentation you have for your operating system.  Pay careful attention to …

  • the distinction between soft and hard limits
  • the inheritance of values by another process
  • where you can set them for the current process and its children
  • where you can set them to apply to other processes and by user
  • whether the totals apply to the current process, to the current process and its children, or to the user
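To make the soft and hard distinction concrete, here is a small sketch using the resource module from the Python standard library (Unix-like systems only).  It reports the soft and hard values that the current process has inherited for the two limits this article concentrates on.  The names LIMITS, describe and current_limits are mine, chosen for illustration.

```python
import resource

# The two limits this article concentrates on.  RLIMIT_NPROC is not defined
# on every platform, so both are looked up defensively.
LIMITS = {
    "open files": getattr(resource, "RLIMIT_NOFILE", None),
    "max user processes": getattr(resource, "RLIMIT_NPROC", None),
}


def describe(value):
    """Render a limit the way ulimit prints it."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def current_limits():
    """Return {name: (soft, hard)} as seen by this process."""
    found = {}
    for name, which in LIMITS.items():
        if which is not None:
            found[name] = resource.getrlimit(which)
    return found


if __name__ == "__main__":
    for name, (soft, hard) in current_limits().items():
        print(f"{name}: soft={describe(soft)} hard={describe(hard)}")
```

Because limits are inherited, running this from the same account and startup path as the dispatcher is more informative than running it from your own login shell.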

I would expect most system administrators to have come across ulimits and to have some understanding of what they are.  I will concentrate on the impact on Genero applications.

The two values we look at most are “open files” and “max user processes”, as these are the ones where the default value (1024) may be too low for a Genero Application Server with a large number of users.

I will simplify a little, but with a Genero Application Server, count the descendants of the httpdispatch or fastcgidispatch process: for each browser tab there is both a uaproxy and an fglrun process.  If the max user processes ulimit value is 1024, then at the latest on the 512th program you are going to get an operating system error when the O/S tries to start the 1025th process.  If you have a shell script that launches fglrun, e.g. my_fglrun, then for each browser tab there is a uaproxy + my_fglrun + fglrun process that is a descendant of the fastcgidispatch process.  In that case you will encounter an operating system error at the latest on the 342nd user.
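The arithmetic behind those two figures can be sketched in a few lines of Python.  This uses the same simplification as the paragraph above: one dispatcher process exists before anyone connects, and every session adds a fixed number of processes.  The function name sessions_that_fit and the baseline parameter are mine; a real server will have other processes owned by the same user, which is why the text says "at the latest".

```python
def sessions_that_fit(max_user_processes, processes_per_session, baseline=1):
    """Return how many complete sessions fit under the process limit.

    baseline counts processes that exist before any session starts
    (assumed here to be just the dispatcher).
    """
    if processes_per_session <= 0:
        raise ValueError("processes_per_session must be positive")
    available = max(max_user_processes - baseline, 0)
    return available // processes_per_session


if __name__ == "__main__":
    for per_session in (2, 3):
        fit = sessions_that_fit(1024, per_session)
        print(f"{per_session} processes per session: "
              f"{fit} sessions fit, session {fit + 1} fails")
```

With a limit of 1024 this gives 511 sessions for uaproxy + fglrun (the 512th fails) and 341 sessions when a wrapper script adds a third process (the 342nd fails).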

The architecture overview diagram from https://4js.com/online_documentation/fjs-gas-manual-html/#gas-topics/c_gas_architecture_ovw.html (and a similar diagram at https://4js.com/online_documentation/fjs-gas-manual-html/#gas-topics/c_gas_architecture_007.html) shows the Dispatcher + Proxy + DVM (fglrun) processes …



Typically the resolution is to increase the limit for the user that owns the fastcgidispatch process.  So if it was 1024, double it to 2048.  If the error still occurs, only later, then increase it again.

If you want to see the error on your development system, you could experiment with a little program like the following …


# Program to exceed ulimit -u setting
# DO NOT RUN on a production or live server, intended for use on private development server 
# Review the value for SAFETYNUMBER, and the SLEEP value 
IMPORT os
IMPORT util

CONSTANT SAFETYNUMBER = 2000

MAIN
DEFINE u base.Channel

DEFINE i INTEGER

DEFINE ulimitu INTEGER
DEFINE step INTEGER

DEFINE l_status SMALLINT

    LET u = base.Channel.create()
    CALL u.openPipe("ulimit -u","r")
    LET ulimitu = u.readLine()
    CALL u.close()

    DISPLAY "ulimit -u = ", ulimitu, "  processes"
    LET step = ulimitu / 5

    -- Just make a check that ulimitu is a sensible number
    -- If it exceeds the SAFETYNUMBER, then stop
    -- it may be that you need to increase the safety number
    IF ulimitu >0 AND ulimitu <= SAFETYNUMBER THEN
        -- ok
    ELSE
        DISPLAY "Something has gone wrong, check safety number or result of ulimit -u"
        EXIT PROGRAM 1
    END IF
    
    DISPLAY "ulimit -u = ", ulimitu, " processes"

    FOR i = 1 TO (ulimitu+1)
        IF i MOD step = 0 THEN
            DISPLAY i
        END IF
        RUN "sleep 10 & " RETURNING l_status     --< may need to change sleep value
        IF l_status != 0 THEN
            DISPLAY i, l_status, l_status MOD 256, l_status / 256
            EXIT PROGRAM 1
        END IF
        
    END FOR
END MAIN

... this program finds the ulimit -u value, and then attempts to start that number + 1 SLEEP processes without waiting.  You should find that before it can start that many SLEEP processes at the same time, some operating system errors will occur.  On my Mac I get "sh: fork: Resource temporarily unavailable".  This is the type of error the dispatcher or proxy process will get when it attempts to start too many child processes.

Similarly on the open files side of things, the proxy processes and fglrun processes will open a number of files.  Tools such as lsof allow you to see how many files are open by a process, whether they are the .42m files or the various log files that are being written.

Again a little test program can be written so that you can see what happens if your Genero application tries to open too many files at the same time ...

# Program to exceed ulimit -n setting
# DO NOT RUN on a production or live server, intended for use on private development server 
# Review the value for SAFETYNUMBER 
IMPORT os
IMPORT util

CONSTANT SAFETYNUMBER = 100000

MAIN
DEFINE u base.Channel
DEFINE ch DYNAMIC ARRAY OF base.Channel
DEFINE filename STRING
DEFINE i INTEGER

DEFINE ulimitn INTEGER
DEFINE step INTEGER

DEFINE l_status INTEGER

    LET u = base.Channel.create()
    CALL u.openPipe("ulimit -n","r")
    LET ulimitn = u.readLine()
    CALL u.close()

    DISPLAY "ulimit -n = ", ulimitn, " open files"
    LET step = ulimitn / 5

    -- Just make a check that ulimitn is a sensible number
    -- If it exceeds the SAFETYNUMBER, then stop
    -- it may be that you need to increase the safety number
    IF ulimitn >0 AND ulimitn <= SAFETYNUMBER THEN
        -- ok
    ELSE
        DISPLAY "Something has gone wrong, check safety number or result of ulimit -n"
        EXIT PROGRAM 1
    END IF
    
    DISPLAY "ulimit -n = ", ulimitn, " open files"

    FOR i = 1 TO (ulimitn+1)
        IF i MOD step = 0 THEN
            DISPLAY i
        END IF
        TRY
            LET filename = os.Path.makeTempName()
            LET ch[i] = base.Channel.create()
            CALL ch[i].openFile(filename,"w")
        CATCH
            LET l_status = status
            DISPLAY "Status = ", l_status
            DISPLAY "sqlca = ",util.Json.stringify(sqlca)
            
            DISPLAY "Error occurred with i = ", i
            EXIT PROGRAM 1
        END TRY
    END FOR
END MAIN

... this program finds the ulimit -n value, and then attempts to open that number + 1 files.  You should find that an error will occur before it can open that many files.  This is the type of error the fglrun process will get when it attempts to open too many files.
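If you would rather not push a whole machine to its real limit, the same failure can be provoked by lowering the soft limit for one test process only.  The following Python sketch (standard library, Unix-like systems only) does that: it drops its own soft open files limit to a small number, opens the null device until the operating system refuses, then closes everything and restores the limit.  TEST_LIMIT and open_until_failure are names I made up for the illustration, and it is still something to try on a development machine rather than a live server.

```python
import errno
import os
import resource

TEST_LIMIT = 64  # hypothetical small value, chosen only for the demonstration


def open_until_failure(test_limit=TEST_LIMIT):
    """Lower this process's soft open-files limit, then open files until
    the operating system refuses.  Returns (files_opened, errno_value)."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    lowered = test_limit
    if soft != resource.RLIM_INFINITY:
        lowered = min(soft, test_limit)  # only ever lower the soft limit
    resource.setrlimit(resource.RLIMIT_NOFILE, (lowered, hard))

    handles = []
    failure = None
    try:
        for _ in range(lowered + 1):
            handles.append(os.open(os.devnull, os.O_RDONLY))
    except OSError as exc:
        failure = exc.errno
    finally:
        for fd in handles:
            os.close(fd)
        # Put the original soft limit back for the rest of the process.
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))
    return len(handles), failure


if __name__ == "__main__":
    opened, err = open_until_failure()
    if err is None:
        print(f"opened {opened} files without an error")
    else:
        print(f"opened {opened} files, then: {os.strerror(err)}")
```

The count is lower than the limit because standard input, standard output and standard error already occupy file descriptors, in the same way that an fglrun process has files open before your own code opens any.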

In both programs, note the comment in the code: don't run them on your live production server.  They are intended for experimental use on your private development server, where you are not going to inconvenience others.

In both programs, note the SAFETYNUMBER.  This is a safeguard in case the ulimit value is already set to a high or unlimited value.

The first question was "Why do my Genero applications running through the Genero Application Server stop after a while when there are a large number of users?".  This will often show up on the first day you go live with Genero Application Server, because you have not had that number of users on your test server.  For the first users the system will be fine, but as more and more users come on, something will stop as the ulimit -u limit is reached.  The dispatcher will be restarted and the same thing will happen: for the first users it will be good, but eventually an error will occur.  It may be that this is around the 300-500 user mark, when the ulimit -u value of 1024 is being reached.  You'll ask yourself why this did not occur on the identical test machine, and chances are you never tested with that many users at once.  You may ask why this did not occur when using direct connection instead of via Genero Application Server, and it will be because there isn't the fastcgidispatch process launching everything and there isn't the proxy process increasing the process count.

I have deliberately avoided advising you how to change these limits.  It can be done via the ulimit command itself for the current process, or by editing values in a file, either for all users or for the user used by the dispatcher process.  This would normally be something you'd change in conjunction with your system administrator.

As I often say, Genero (as in Genero Application Server or Genero Studio) can't break the rules of operating systems and isn't doing magic; it is executing commands on the command line and is subject to the rules of the operating system.  Ulimits are the boundary lines imposed by the operating system, and you may need to move them to cater for your Genero application.