HELP MEMORY                                      Robert Duncan, Oct 1989

Memory organisation in PML.


    CONTENTS - (Use <ENTER> g to access required sections)

 -- Introduction
 -- The Heap Area
 -- The Callstack Area
 -- The Garbage Collector
 -- The Memory Module
 -- Summary of Memory Variables and Functions


-- Introduction -------------------------------------------------------

The memory used  by PML is  divided into two  distinct sections: a  HEAP
area and a CALLSTACK area.  The heap is used  for the allocation of  all
data structures  created by  ML programs  and by  the ML  compiler;  the
callstack is used  to maintain control  information and local  variables
for active functions and exception handlers.

The two memory areas  are discussed in more  detail below. It should  be
noted however, that memory organisation is ultimately determined by  the
underlying POPLOG system, and as such  is somewhat more complex than  as
described here. In  order to  satisfy the requirements  of other  POPLOG
languages, each of  the two  memory areas  so far  mentioned is  further
subdivided: the heap into a "true" heap area plus a USERSTACK area,  and
the callstack into a "true" callstack plus a special PROLOG area.  These
secondary divisions are  not relevant  to ML programmers,  and are  only
mentioned to forestall possible  confusion arising from descriptions  of
memory organisation given  in other POPLOG  documentation. For the  full
story, see REF * SYSTEM.


-- The Heap Area ------------------------------------------------------

The heap stores all  the data structures created  by ML programs.  Every
data construction operation claims  a certain amount  of space from  the
heap. As a program  runs, repeated data  construction gradually uses  up
the available space; once it is completely exhausted a garbage collector
is invoked automatically to sweep the heap, reclaiming space which is no
longer  required  and  compacting   the  remaining  useful  data.   Data
construction can  then  proceed again.  A  large program  may  call  the
garbage collector many times during its execution, but this is  normally
invisible to the user.

It can happen, however, that the amount of unused space reclaimed by the
garbage  collector   is  insufficient   to   allow  any   further   data
construction, at which  point the program  will halt with  an error.  To
demonstrate this, try  marking and loading  the following program  which
builds a unit list of unbounded length:

    let fun rom l = rom (()::l)
    in  rom []
    end;

Running this should provoke the error:

    MISHAP - rom: MEMORY LIMIT (popmemlim) EXCEEDED (using heap space)
    DOING    :  conspair rom ....

The MISHAP keyword  indicates that  this is  an error  generated by  the
POPLOG system rather than by PML:  such errors cannot be trapped  within
ML. The message  ``MEMORY LIMIT  EXCEEDED'' describes the  cause of  the
error: the program  requires more  memory than is  available. The  DOING
line shows which functions were active  at the time the error  occurred:
here -rom- is obviously active, as  is the function -conspair- which  is
trying to claim space for the list construction operation "::".

The size  of  the heap  is  not usually  fixed:  each time  the  garbage
collector is called  it may increase  or decrease the  size of the  heap
depending on how long the garbage  collection took. The size is  bounded
however by the two variables -Memory.lolim- and -Memory.hilim-, measured
in machine words. Both of these are user-assignable: increasing the high
limit allows the heap to grow  larger, thus increasing the time  between
garbage collections.  Since  the time  taken  by garbage  collection  is
proportional to the  amount of  storage in  use rather  than the  amount
available (see below), this can decrease the overall garbage  collection
time. However, the  larger the heap,  the greater the  use of  operating
system resources (in  paging, for  example) which can  often offset  any
gains made. Changing the low limit places  a lower bound on the size  of
the heap; in particular, setting the  low limit equal to the high  limit
fixes the heap at a constant size.


-- The Callstack Area -------------------------------------------------

The PML callstack  area behaves  much the same  as the  stack area  of a
conventional  language:  every  function   call  or  exception   handler
allocates space on the stack for control information and local variables
(a ``stack frame''). This space remains  in use for the duration of  the
call, but can be reclaimed automatically  once the call is complete.  In
general then, the  size of  the used portion  of the  callstack area  is
proportional to  the  depth  of  recursion in  a  program.  However,  an
important optimisation  in the  compiler allows  the stack  frame to  be
reclaimed immediately prior to the last call in the function body.  This
means that programs which are ``tail recursive'' can run with no overall
growth of stack.

The size  of  the  callstack  area is  bounded  above  by  the  variable
-Memory.stacklim-, measured in machine words. This places a limit on the
depth of recursion allowed in a program. Any program which violates this
limit provokes an error. The following program is an example:

    let fun rle l = () :: rle l
    in  rle []
    end;

This is not tail recursive, because the last call in the body is to  the
construction operator "::",  not to  -rle- itself.  Running the  program
produces an error something like:

    MISHAP - rle: RECURSION LIMIT (pop_callstack_lim) EXCEEDED
    DOING    :  rle(*27282) ....

Again, this is an error which cannot be handled within ML. Note how  the
DOING line shows the number of calls to -rle- which occurred before  the
recursion limit was reached; the exact number will vary between systems.

In practice,  it's rarely  necessary to  change the  callstack limit  as
stack overflow is  most often caused  by a programming  error. There  is
also an external limit  on the callstack size  imposed by the  operating
system, and  this  cannot  be  exceeded irrespective  of  how  high  the
-stacklim- variable has been set.


-- The Garbage Collector ----------------------------------------------

As mentioned  above,  the  garbage collector  is  invoked  automatically
whenever the heap area is exhausted  and more space is required. In  the
normal case it  uses a  fast``stop-and-copy'' algorithm: a  new area  of
memory of the required size is grabbed from the operating system and all
accessible data is  copied from  the existing  heap space  into the  new
space. The  new  space then  becomes  the new  heap.  In the  course  of
copying, the copied data is compacted at  the lower end of the new  heap
space. The garbage  collector needs only  trace and copy  data which  is
currently in use; thus the time  taken is proportional to the amount  of
useful data and is independent of the amount of garbage.

Occasionally, if the  heap area  is particularly  large or  if the  host
operating  system  is  otherwise  overloaded,  the  extra  memory  space
required for  the  copying is  unavailable.  In this  case  the  garbage
collector will automatically switch to a slower non-copying algorithm.

To see what the garbage collector is doing, you can assign -true- to the
variable -Memory.gctrace-. After this, the garbage collector will  print
statistics each time it is called. The output is as follows:

    GC-<code>(X) <t> MEM: u <u> + f <f> + s <s> = <m>

where

    <code>  is the reason for the garbage collection, typically "auto"
    X       is the algorithm used, "C" for copying, "N" for non-copying
    <t>     is the time taken in seconds
    <u>+<s> is the amount of space in use (heap + userstack)
    <f>     is the amount free
    <m>     is the current heap size

You can  force  a garbage  collection  at  any time  with  the  function
-Memory.gc-. In this case, the <code>  word printed in the trace  output
is "user".

The variable -Memory.usage-  gives the number  of words of  heap in  use
after the last garbage  collection. There is no  way of finding out  the
amount of space currently in use.

Two other functions are provided which  can help minimise the amount  of
time taken by garbage  collections. The function -Memory.lock-  places a
``barrier'' at the current endpoint  of the heap: the garbage  collector
assumes that anything below  the barrier is valuable  and must be  kept.
This means that it need not be copied, thus reducing garbage  collection
time. This function  is typically used  to lock in  code and data  which
will be used throughout the run of a program. You should always call the
garbage collector immediately  before locking the  heap to avoid  having
garbage locked in  too. The  function -Memory.unlock-  removes any  heap
barrier, opening up the whole heap to the garbage collector.

Note that only one barrier can be  active at any time: a second call  to
-Memory.lock- without an intervening unlock will simply move the barrier
further up the heap. There is no stack of barriers maintained.


-- The Memory Module --------------------------------------------------

signature Memory
structure Memory : Memory
    The memory  control  variables  and functions  discussed  above  are
    defined in  the  structure  -Memory-,  described  by  the  following
    signature:

    signature Memory = sig

        (* The Heap Area *)

        val hilim       : int ref
        val lolim       : int ref

        (* The Callstack Area *)

        val stacklim    : int ref

        (* The Garbage Collector *)

        val gc          : unit -> unit
        val gctrace     : bool ref
        val usage       : int
        val lock        : unit -> unit
        val unlock      : unit -> unit

    end;    (* signature Memory *)


-- Summary of Memory Variables and Functions --------------------------

val hilim : int ref
    An upper limit (in machine words) on the size to which the PML  heap
    area is allowed to  expand. The actual size  of the heap  fluctuates
    according to the  proportion of time  spent garbage collecting,  but
    will not rise above this value.


val lolim : int ref
    A lower limit (in machine words) on  the size to which the PML  heap
    area is allowed to contract. The actual size of the heap  fluctuates
    according to the  proportion of time  spent garbage collecting,  but
    will not fall below this value.


val stacklim : int ref
    An upper limit (in machine words)  on the size of the PML  callstack
    area. This limits the depth of recursion possible in an evaluation.


val gc () : unit
    Forces a garbage collection.


val gctrace : bool ref
    Flag controlling the display of garbage collection statistics.
    Default value: false.


val usage : int
    The number of  words of  heap space in  use after  the last  garbage
    collection.


val lock () : unit
    Places a ``barrier'' at  the current heap  endpoint: all data  below
    this point  is  assumed  to be  non-garbage  in  subsequent  garbage
    collections.


val unlock () : unit
    Removes the barrier established by  a previous -lock- call,  opening
    up the whole heap to the garbage collector.


--- C.all/pml/help/memory
--- Copyright University of Sussex 1991. All rights reserved. ----------
