TCP/SP

Section: Misc. Reference Manual Pages (1.1)
 

TCP/SP 1.1

 

Tightly-Coupled Processors Support Package

 

Running under the operating system OS-9

 

Prepared to be ported to other operating systems

 

Distributed by ELTEC elektronik, Mainz

Copyright © 1995-1997 Carsten Emde, CE Computer Experts AG, CH-5706 Boniswil. Reproduced under license. This manual has been written by Carsten Emde <ce@ceag.ch>.

 

Copyright

This manual and the digitally encoded software included with TCP/SP 1.1 is copyright © 1995-1997 by Carsten Emde, CH-8800 Thalwil and CE Computer Experts AG, CH-5706 Boniswil. All rights reserved. The software is intended to be used on a single computer system. Any reproduction of the software on tape, disk or any other medium except for backup purposes is prohibited. Reproduction of the documentation, in part or whole, by any means, electrical, mechanical, magnetic, optical, chemical, manual or otherwise is also prohibited. Distribution of the software and/or documentation, in part or as a whole, to any other party or any other system may constitute copyright infringements and misappropriation of trade secrets and confidential processes that are the property of Carsten Emde and/or other parties.

Eurocom is a registered trademark of Eltec Elektronik GmbH.
LynxOS is a trademark of Lynx Real-Time Systems, Inc.
MC68000 etc. are trademarks of Motorola, Inc.
OS-9/68000 is a trademark of Microware Inc.

The information contained in this manual is believed to be accurate as of the date of publication; Carsten Emde, however, will not be liable for any damages, including direct or consequential, from use of the software or reliance on the accuracy of this documentation. The information contained herein is subject to change without notice.

 

1. Introduction

The following article is taken from the journal OS-9 International, issue I/1994, with kind permission of the editor.

 

Parallel processing under OS-9

Whenever the result of a calculation is required in less time, and algorithm and compiler already produce maximally optimized binary code, more computing power is needed. This can be achieved by installing a more powerful CPU board, if one is available. If not, parallel processing must be employed. How this is done depends, among other things, on the number of tasks involved in the particular computing procedure.

 

Single-task projects

A typical time-sensitive single-task project is, for example, a forecasting program that must obviously finish its calculations well before the relevant time period starts: a weather forecast for the next weekend would make no sense if it only became available on Sunday evening. Such problems require either more single-CPU power or several CPUs together with a tool that separates the given algorithm into parallel tasks. Under the OS-9 operating system, however, single-task computing power is limited to the maximum speed that can be achieved with Motorola's CISC processor family, since compilers that transform a single-task source into concurrently running procedures are normally not available for OS-9. Single-task projects that need more computing power than about 40 MIPS can therefore not be realized under OS-9.

 

Multi-tasking projects

On the other hand, many computer projects, especially under OS-9, already consist of several concurrently running procedures; such a situation can be transformed into parallel processing much more easily than the single-task situation described above. This article therefore focuses on parallel processing in a multi-tasking environment under OS-9. Another reason for this article is that a new principle of multi-processing has recently become available: VMEbus boards with tightly-coupled processors. The term "tightly-coupled" means that the processors are connected to the same bus and therefore have common access to the entire memory. One processor is the master processor that, by default, receives interrupts from the peripheral device controllers; the other processors have special control registers for reset and interrupt, but this is the only difference between them. Currently existing boards have two processors installed, but future boards may have more. Thus, multi-processing no longer requires several CPU boards connected to each other via VMEbus but may be done on a single board. As a consequence, a drastic increase in the performance of such systems can be achieved, since all memory accesses are local and no longer limited by the VMEbus. In general, there are three different ways to transform an OS-9 multi-tasking project into multi-processing: 1. using independent OS-9 systems that are connected via network, 2. installing a special kernel extension that distributes the active tasks to all processors instead of only one, and 3. using a driver interface to run programs without OS-9 on the additional CPUs. In addition, Microware is working on an OS-9000 multi-processor system ("Hydra project") but has not yet announced any definite plans for release [1].

 

1.1. Independent OS-9 systems

Several independent OS-9 systems can run simultaneously, one per processor, provided that the memory sections assigned to the processors do not overlap. This is normally achieved by individually defining a particular processor's memory in the 'init' configuration module. On a double-processor board with 32 MByte RAM one would, for example, define start address and memory size as 0 and 0x1000000 for the first processor, and as 0x1000000 and 0x1000000 for the second processor, respectively. After booting the first OS-9 system, a special download program may be started that boots the second CPU. If the second CPU does not need any specific mass storage, its OS9Boot file may contain a '/dd' device descriptor for a RAM floppy whose disk image is also provided as part of the OS9Boot file. The 'init' configuration module must then, of course, specify '/dd' as the primary disk device. Another, very elegant, option is to incorporate the NFM file manager, drivers and descriptors into the OS9Boot file so that the newly booted OS-9 system can access all peripheral devices via OS9Net through the master processor. Since both OS-9 systems can access each other, and the NFM file manager gives access not only to RBF devices but to all other I/O devices including pipes, the network link may also be used for synchronization purposes.
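
The non-overlap requirement can be checked mechanically before the values are entered into the two 'init' modules. The following is only a minimal sketch and not part of OS-9 or of the TCP/SP distribution; the type mem_region and the function regions_overlap are hypothetical names chosen for illustration, and the two constants repeat the 32 MByte example above. Both regions are assumed to have a non-zero size.

```c
#include <stdint.h>

/* Hypothetical description of the memory assigned to one processor:
   start address and size in bytes, as they would be entered in 'init'. */
typedef struct {
    uint32_t start;
    uint32_t size;
} mem_region;

/* Returns 1 if the two (non-empty) regions share at least one byte,
   0 otherwise. 64-bit arithmetic avoids overflow when start + size
   reaches the 4 GByte limit. */
static int regions_overlap(mem_region a, mem_region b)
{
    uint64_t a_end = (uint64_t)a.start + a.size;   /* one past last byte */
    uint64_t b_end = (uint64_t)b.start + b.size;

    return a.start < b_end && b.start < a_end;
}

/* Values from the 32 MByte double-processor example. */
static const mem_region cpu1_mem = { 0x00000000u, 0x01000000u };
static const mem_region cpu2_mem = { 0x01000000u, 0x01000000u };
```

With these values regions_overlap(cpu1_mem, cpu2_mem) yields 0, since the second region starts exactly where the first one ends.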

Advantages and disadvantages

The advantage of having two independent OS-9 systems running on one CPU board is that there is no need for specific software. Already existing software that was, for example, written for a master and a slave CPU board connected to each other via backplane net may easily be adapted to run on a double-processor board. But not all software projects may easily be transformed to communicate via network and not all customers are familiar with the configuration that is needed for such master/slave applications. It may also be difficult to decide in advance what part of the project should run on what processor. In addition, it must be mentioned that two OS-9 runtime licenses must be bought since two independent kernels are needed, irrespective of whether they run on one or on two CPU boards.

 

1.2. The Doubler

Recently, the Syac company (Sy.A.C. S.R.L., Trieste, Italy) has released a set of OS-9 system modules that implement multi-processing in a single OS-9 system. Essentially, the existing OS-9 scheduling algorithm is modified so that the active tasks are distributed not to one but to two processors. This software is called "Doubler" since it supports two processors. After installing the "Doubler" module, the single-task behavior of the system is not affected, i.e. a benchmark program such as the Dhrystone program has virtually the same performance with and without the "Doubler". If, however, the benchmark program is started simultaneously from two terminals or from two MGR windows, the performance can nearly be doubled. The increase in performance, however, depends on the particular application and may be less pronounced. The following values have, for example, been measured (68040, 33 MHz):

Test               Without     With        Percent
                   "Doubler"   "Doubler"   performance

One Dhrystone      50.352      50.120      100
Two Dhrystones     24.975      49.309      197
Three Dhrystones   16.563      32.808      196
C compilation                              ca. 160

 

Advantages and disadvantages

The advantage of the "Doubler" is, as in the first method, that any existing software can be used without restriction. Whenever at least two processes are active, the increase in performance takes place. This increase, however, is not always as large as in the above example, since only user programs are executed concurrently. If a program relies heavily on kernel calls, the increase in performance may be less pronounced. This must therefore be considered when deciding between the first two methods of parallel processing. The frequency of I/O calls, however, is not an important argument, since I/O is handled exclusively by the master processor in both methods.

 

1.3. Driver interface (Tightly-Coupled Processors Support Package)

Finally, a software package has been developed that manages the additional CPU through a driver interface. This driver interface, although based on the SCF file manager, does not primarily provide read and write functionality. The main part of the driver is implemented in the form of SetStat calls. There are, for example, specific calls to start and stop the additional processor, to install exception handlers and to provide communication channels using interrupts and signals. In addition, a 'fork' function is available that allows OS-9 program modules to be run on the additional processor. This function fully emulates the OS-9 F$Fork kernel call, i.e. memory is allocated for static and global data, and those pointer variables that refer to the program code or to static data are made position-independent. Bindings for the C language are available to facilitate the calling interface. The following example presents the code that is necessary to let the additional processor execute a loop for one second and then stop:

#include <modes.h>
extern int errno;
void prog(void);
void main(void)
{
  char *cpu2name = "/cpu2";
  int   cpu2;

  /* open the TCP/SP device that represents the additional CPU */
  if ((cpu2=open(cpu2name, S_IREAD|S_IWRITE)) == -1)
    exit(_errmsg(errno,
      "can't open '%s' CPU device due to ",
      cpu2name));

  /* start the additional CPU at prog; no stack memory needed */
  _ss_tcpsp_runlow(cpu2, (char *) 0, (char *) prog);
  sleep(1);
  /* stop the additional CPU after one second */
  _ss_tcpsp_stop(cpu2);
}
#asm
prog:  cinva bc ; invalidate both caches
loop  bra.s loop ; endless loop until the CPU is stopped
#endasm

 

Advantages and disadvantages

The advantage of this method is that small and efficient programs can run on an additional processor, that there is no overhead from the operating system and that no run-time license is required. In consequence, the Tightly-Coupled Processors Support Package is ideally suited for image processing (filters, reduction in bit depth, template matching, etc.) and number-crunching. The fact that no specific debugging facilities are available cannot be considered an important disadvantage: since both processors are connected to the same memory, the development of the slave software including debugging can easily be done on the master processor. It must, however, be realized that, except for inquiring system globals and low-level string output, kernel functions such as memory allocation, I/O etc. are not available. Standard software can therefore not easily be adapted to run on an additional processor using this method.

 

1.4. Conclusion

In comparison to some other operating systems, OS-9 is less well equipped with already existing and generally available support software for multi-processing. There are, however, at least three distinct ways to transform a multi-tasking environment into multi-processing. All of them have specific advantages so that in many cases where more OS-9 computing power is needed, an adequate method is available.

 

2. Upgrade changes from TCP/SP version 1.0 to 1.1

 

2.1. Dual-ported RAM communication channel

Originally, the TCP/SP software was intended for tightly-coupled processors, and this innovative computer type also gave the software its name (Tightly-Coupled Processors Support Package). A number of projects, however, still use a conventional master/slave architecture where the two processors are connected via a multi-processor bus such as the VMEbus. In order to allow the same software to be used on both architectures, TCP/SP 1.1 can be used both with tightly-coupled processors on the same CPU board and with master/slave processors connected via VMEbus. This made it necessary to provide a transparent communication channel so that the slave CPU can send messages to the master. This feature, however, is not only available in dual-ported RAM mode but can also be selected, even additionally, when using tightly-coupled processors.

 

2.2. Support of more members of the 68k processor family

In contrast to TCP/SP 1.0 that only supported the MC68040 processor, the upgraded version contains code for the following processors:

Version       Processors

TCP/SP 1.0    MC68040
TCP/SP 1.1    MC68000/020 MC68030 MC68040 MC68060 MC68302

 

2.3. Floating-point support package and floating-point library support package for 68040 and 68060

There are now two main methods to emulate floating-point instructions that are unavailable on the 68040 and 68060 processors. The first is based on a floating-point exception handler integrated in additional versions of the TCP/SP driver (sctcpsp040 and sctcpsp060). This exception handler was derived from Motorola's fpsp floating-point support package. The second method uses Motorola's so-called floating-point library support package fplsp that is part of the TCP/SP 1.1 distribution (fplsp.l). The latter is, however, not restricted to double-processor software; it can be used in conjunction with any other project. Library functions execute about 50% faster than their respective counterparts in the exception handler.

 

2.4. Enhancements for both tightly-coupled processors and processors connected via dual-ported RAM

- Data and instruction cache and copyback mode can be defined.
- Offset for non-cache access can be defined.
- A program's 16-bit exit code is available in the status variable.
- Devcon descriptor section controls behavior of the additional CPU. Among others, message and debug mode can be selected; the latter displays the current register image at every entry into and exit from the TCP/SP kernel.
- MGR utility mcpu2 enhanced so that it displays the 16-bit exit code of the most recently terminated program.
- Many more utility and test programs such as:
  - tcpspmode utility to display and modify a TCP/SP descriptor.
  - shell2 utility to start a program on the additional CPU and to consecutively read out the communication channel.

 

2.5. Enhancements for processors in dual-ported RAM mode

- Begin and end address of data and code can be defined separately in dual-ported RAM and in local RAM of the slave CPU.
- Driver and descriptors may be copied automatically into dual-ported RAM, if not already there.
- The entire data space needed by the additional CPU is copied into dual-ported RAM; initialized global data and pointers (relative to data and code segment) are prepared as required.
- In a second copy procedure, code and data can be written to local RAM (e.g. to increase execution speed). The required transformation of initialized global data and pointers is also done appropriately.
- Ready-to-use descriptors for SL-30, IC-40 and IP-302 are part of the TCP/SP distribution.
- Time and date in the system globals of the slave CPU are continuously updated using a system-state alarm.
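
The pointer transformation mentioned in the list above can be pictured as a simple rebasing step. The following is a minimal sketch under the assumption that a pointer is adjusted only if it refers to the segment that has been moved; it does not reproduce the actual TCP/SP code, and the names seg_move and relocate are hypothetical.

```c
#include <stdint.h>

/* Hypothetical description of a segment that is copied from
   [old_beg, old_end) to a new location starting at new_beg. */
typedef struct {
    uint32_t old_beg;   /* first address of the segment before the copy */
    uint32_t old_end;   /* one plus the last address before the copy    */
    uint32_t new_beg;   /* first address of the segment after the copy  */
} seg_move;

/* Relocates one stored pointer value. Pointers that do not refer to
   the moved segment are returned unchanged. */
static uint32_t relocate(uint32_t ptr, seg_move m)
{
    if (ptr >= m.old_beg && ptr < m.old_end)
        return ptr - m.old_beg + m.new_beg;
    return ptr;
}
```

A pointer 0x10 bytes into the old segment ends up 0x10 bytes into the new one; all other values are left alone.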

 

3. Principle and distribution of the TCP/SP software

The TCP/SP software is intended as a tool kit to facilitate the development of multi-processor software for CPU boards with tightly-coupled processors and for master/slave systems. It mainly contains three different parts. The first part consists of OS-9 drivers and descriptors; while the descriptors are prepared for a particular hardware, the drivers run on most OS-9 systems, since they incorporate code for all supported processors and a selection is made only at run-time. However, the optional floating-point exception handlers that may be required for the MC68040 and MC68060 processors differ from each other and are only available in the sctcpsp040 and sctcpsp060 drivers, respectively.

Driver name                         Processor

TCPSP/CMDS/BOOTOBJS/sctcpsp         68xxx (integer arithmetic)
TCPSP/CMDS/BOOTOBJS/sctcpsp040      incl. 68040 floating-point
TCPSP/CMDS/BOOTOBJS/sctcpsp060      incl. 68060 floating-point

Descriptor name                     CPU-board

TCPSP/CMDS/BOOTOBJS/cpu2            E-17
TCPSP/CMDS/BOOTOBJS/cpu2.no_irq     E-17, no IRQ
TCPSP/CMDS/BOOTOBJS/cpu3            SL-30
TCPSP/CMDS/BOOTOBJS/cpu302          IP-302
TCPSP/CMDS/BOOTOBJS/cpu4            IC-40
TCPSP/CMDS/BOOTOBJS/cpu6            E-6

Secondly, the tool kit contains header files

File name                    Function

TCPSP/DEFS/math_fplsp.h      Surrogate for math.h
TCPSP/DEFS/osk_codes.h       OS-9 function codes
TCPSP/DEFS/tcpsp.d           Assembly language header file
TCPSP/DEFS/tcpsp.h           C language header file

and libraries

File name                    Function

TCPSP/LIB/fplsp040.l         FPU library for MC68040
TCPSP/LIB/tcpsp.l            TCP/SP driver calls
TCPSP/LIB/utcpsp.l           Utility functions

for software development. These files are primarily intended for development in C or in assembly language but other languages that use compatible linker files such as Fortran may also be used.

Last but not least, utility programs (tcpspmode, shell2) and a large variety of example programs are included in the distribution. A first set of programs and subroutines is located in the TCPSP/SRC directory:

TCPSP/SRC/args2.c TCPSP/SRC/bounce2.c TCPSP/SRC/bounce3.c TCPSP/SRC/bounce4.c TCPSP/SRC/cfg.a TCPSP/SRC/clear2.c TCPSP/SRC/copy.a TCPSP/SRC/cpu2.d TCPSP/SRC/crash.a TCPSP/SRC/dpr.a TCPSP/SRC/edge.a TCPSP/SRC/exit2.c TCPSP/SRC/fexcpt.a TCPSP/SRC/fifo2.c TCPSP/SRC/flash.a TCPSP/SRC/fline.a TCPSP/SRC/float2.c TCPSP/SRC/foolssm.c TCPSP/SRC/forkcpu2.c TCPSP/SRC/fplsp.a TCPSP/SRC/ftol.c TCPSP/SRC/getppid.c TCPSP/SRC/getsys2.c TCPSP/SRC/invert.a TCPSP/SRC/ipp.a TCPSP/SRC/irq.a TCPSP/SRC/irqcpu1.c TCPSP/SRC/irqcpu2.c TCPSP/SRC/mgrey.a TCPSP/SRC/muncher2.c TCPSP/SRC/muncher4.c TCPSP/SRC/pi2.c TCPSP/SRC/poll.a TCPSP/SRC/print.c TCPSP/SRC/print2.c TCPSP/SRC/rmon.a TCPSP/SRC/runcpu2.c TCPSP/SRC/runcpu2.h TCPSP/SRC/shell2.c TCPSP/SRC/shift.a TCPSP/SRC/sigcpu2.c TCPSP/SRC/status2.c TCPSP/SRC/tas.a TCPSP/SRC/tcpsp_intro.c TCPSP/SRC/tcpspmode.c TCPSP/SRC/time2.c TCPSP/SRC/vector.a TCPSP/SRC/video4.c TCPSP/SRC/whereami.c TCPSP/SRC/whichcpu.c

The above source files are prepared to be compiled under OS-9 version 2.4 and version 3.0. Two makefiles are available for this purpose,

TCPSP/SRC/make.kr TCPSP/SRC/make.ucc

that are called from a common makefile. All programs are also distributed in binary form; they are located in the TCPSP/CMDS directory. Under OS-9 2.4, all programs are compiled and linked when

make kr
is entered; the command
make ucc
does the same under OS-9 3.0. Source code for programs that require the MGR window manager is located in a different directory:

TCPSP/MGR/APPL/MIMG/data.c TCPSP/MGR/APPL/MIMG/foolssm.c TCPSP/MGR/APPL/MIMG/import.c TCPSP/MGR/APPL/MIMG/makefile TCPSP/MGR/APPL/MIMG/mcpu2.c TCPSP/MGR/APPL/MIMG/mcpu2.h TCPSP/MGR/APPL/MIMG/mcpu2_rsc.a TCPSP/MGR/APPL/MIMG/mcpu2_rsc.h TCPSP/MGR/APPL/MIMG/mcpu2_rsc.rsc TCPSP/MGR/APPL/MIMG/mimg.c TCPSP/MGR/APPL/MIMG/mimg.h TCPSP/MGR/APPL/MIMG/mimg_rsc.a TCPSP/MGR/APPL/MIMG/mimg_rsc.h TCPSP/MGR/APPL/MIMG/mimg_rsc.rsc

All MGR programs are also distributed in binary form; they are located in the TCPSP/MGR/CMDS directory.

Descriptors and libraries as well as utility and example programs are explained in detail in the following chapters.

 

4. Installation

 

4.1. File structure

It is strongly recommended to use the file structure from the distribution disk also on the development system, e.g.
$ chd /d0
$ dsave /h0 -einrs
$ chd /h0/TCPSP
$ lha -x tcpsp
since this greatly facilitates the use of the makefiles that are part of the TCP/SP distribution.

 

4.2. Driver and descriptor

Before the TCP/SP library functions can be used, driver and descriptor must be loaded into memory and initialized, e.g.:
$ chd /h0/tcpsp/cmds/bootobjs
$ load sctcpsp cpu2 -d
$ iniz cpu2

 

4.3. First test

The standard read function examines the sctcpsp driver's status variable; the command
$ dump /cpu2
can, therefore, be used to monitor the activity of the additional CPU. It is currently running, if a value of 1 is displayed; if the driver returns 0, the additional CPU is not running. Please refer to the next chapter for more details.

 

5. Device configuration

 

5.1. Data structure

Since it was intended to make available only one single driver for any type of hardware, it was necessary to provide a data base where the hardware-specific settings are stored. OS-9 SCF descriptors already contain a pointer to such a data area called DevCon which was used for this purpose. The following definitions have been made:  

$00 DC$ResWidth, Byte

Width of reset register in bytes, 0 if not available.  

$01 DC$ConWidth, Byte

Width of CPU control register in bytes, 0 if not available.  

$02 DC$IRQWidth, Byte

Width of interrupt control register in bytes, 0 if not available.  

$03 DC$IAckWidth, Byte

Width of interrupt acknowledge register in bytes, 0 if not available.  

$04 DC$ResReg, Longword

Address of reset register.  

$08 DC$ResDo, Longword

This value, if written to the reset register DC$ResReg, resets the additional CPU.  

$0C DC$ConReg, Longword

Address of CPU control register.  

$10 DC$ConStart, Longword

This value, if written to the CPU control register DC$ConReg, starts the additional CPU.  

$14 DC$ConStop, Longword

This value, if written to the CPU control register DC$ConReg, stops the additional CPU.  

$18 DC$IRQReg, Longword

Address of interrupt control register.  

$1C DC$IRQShift, Longword

The defined interrupt level is shifted by this value prior to being written to the interrupt control register.  

$20 DC$IACKReg, Longword

Address of interrupt acknowledge register.  

$24 DC$IACKDo, Longword

This value, if written to the interrupt acknowledge register DC$IACKReg, acknowledges a mailbox interrupt.  

$28 DC$Cache, Longword

This variable defines the cache mode of the additional CPU. The following bits are implemented:

Bit   Name          Function if set

0     DC$C_Data     Enable data cache
1     DC$C_Inst     Enable instruction cache

 

$2C DC$Outmode, Longword

This variable defines the output mode of programs running on the additional CPU. The following bits are implemented:

Bit   Name          Function if set

0     DC$O_RMON     Enable RMON output
1     DC$O_FIFO     Enable FIFO output
8     DC$O_Msg      Enable default messages
9     DC$O_RBlock   Enable read blocking
10    DC$O_WBlock   Enable write blocking
16    DC$O_DbgMsg   Enable debug messages
17    DC$O_RegDmp   Enable register dump

 

$30 DC$Memory, Longword

This variable defines the memory mode of programs running on the additional CPU. The following bits are implemented:

Bit   Name          Function if set

0     DC$M_NoCopy   Inhibit copying of modules into dual-ported RAM
1     DC$M_Alarm    Update time and date in slave's globals via alarms
2     DC$M_Cache    Dual-ported RAM memory is cacheable

 

$34 DC$PC2, Longword

The start address of the program intended to be run on the additional CPU is written also to this register, if not zero.  

$38 DC$DPRDatBeg, Longword

The lowest address of the memory region being used as data area in dual-ported RAM.  

$3C DC$DPRDatEnd, Longword

One plus the highest address of the memory region being used as data area in dual-ported RAM. If either of the two variables DC$DPRDatBeg or DC$DPRDatEnd is set to 0, the TCP/SP software considers the two CPUs to be tightly-coupled CPUs.  

$40 DC$DPRCodBeg, Longword

The lowest address of the memory region being used as code area in dual-ported RAM.  

$44 DC$DPRCodEnd, Longword

One plus the highest address of the memory region being used as code area in dual-ported RAM.  

$48 DC$LocDatBeg, Longword

The lowest address of the memory region being used as data area in local memory of the additional CPU.  

$4C DC$LocDatEnd, Longword

One plus the highest address of the memory region being used as data area in local memory of the additional CPU. If both variables DC$LocDatBeg and DC$LocDatEnd are set to a non-zero value, the additional CPU copies the data area from dual-ported RAM to local memory. All related data pointers are transformed as required. The main reason for using this feature is that local memory may have a shorter access time than dual-ported RAM.  

$50 DC$LocCodBeg, Longword

The lowest address of the memory region being used as code area in local memory of the additional CPU.  

$54 DC$LocCodEnd, Longword

One plus the highest address of the memory region being used as code area in local memory of the additional CPU. If both variables DC$LocCodBeg and DC$LocCodEnd are set to a non-zero value, the additional CPU copies the code area from dual-ported RAM to local memory and then continues execution in this code. All related pc-relative data pointers are transformed as required. The main reason for using this feature is that local memory may have a shorter access time than dual-ported RAM.  

$58 DC$Mirror1, Longword

This variable specifies the offset that can be used by a program running on the primary CPU to access memory without cache.  

$5C DC$Mirror2, Longword

This variable specifies the offset that can be used by a program running on the additional CPU to access memory without cache.  

$60 DC$CPUType2, Longword

This variable specifies the CPU type of the additional CPU.  

$64 DC$DTT0, Longword

This variable specifies the data transparent translation register 0 to be set on the additional CPU.  

$68 DC$ITT0, Longword

This variable specifies the instruction transparent translation register 0 to be set on the additional CPU.  

$6C DC$DTT1, Longword

This variable specifies the data transparent translation register 1 to be set on the additional CPU.  

$70 DC$ITT1, Longword

This variable specifies the instruction transparent translation register 1 to be set on the additional CPU.
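
The offsets listed above can be mirrored in a C structure, which makes it easy to verify that no field has been misplaced. The following is a sketch only: the structure tcpsp_devcon and its member names are hypothetical and are not taken from tcpsp.h. It assumes that a Longword is 32 bits wide and that the compiler aligns 32-bit members on 4-byte boundaries without extra padding. It describes the layout only; byte order on the 68k is big-endian, so the structure should not be used to read a descriptor image on a host with a different byte order.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C view of the DevCon area; the comments give the
   offsets from the definitions above. */
typedef struct {
    uint8_t  res_width;    /* $00 width of reset register           */
    uint8_t  con_width;    /* $01 width of CPU control register     */
    uint8_t  irq_width;    /* $02 width of interrupt control reg.   */
    uint8_t  iack_width;   /* $03 width of interrupt acknowledge    */
    uint32_t res_reg;      /* $04 */
    uint32_t res_do;       /* $08 */
    uint32_t con_reg;      /* $0C */
    uint32_t con_start;    /* $10 */
    uint32_t con_stop;     /* $14 */
    uint32_t irq_reg;      /* $18 */
    uint32_t irq_shift;    /* $1C */
    uint32_t iack_reg;     /* $20 interrupt acknowledge register    */
    uint32_t iack_do;      /* $24 value that acknowledges the IRQ   */
    uint32_t cache;        /* $28 */
    uint32_t outmode;      /* $2C */
    uint32_t memory;       /* $30 */
    uint32_t pc2;          /* $34 */
    uint32_t dpr_dat_beg;  /* $38 */
    uint32_t dpr_dat_end;  /* $3C */
    uint32_t dpr_cod_beg;  /* $40 */
    uint32_t dpr_cod_end;  /* $44 */
    uint32_t loc_dat_beg;  /* $48 */
    uint32_t loc_dat_end;  /* $4C */
    uint32_t loc_cod_beg;  /* $50 */
    uint32_t loc_cod_end;  /* $54 */
    uint32_t mirror1;      /* $58 */
    uint32_t mirror2;      /* $5C */
    uint32_t cpu_type2;    /* $60 */
    uint32_t dtt0;         /* $64 */
    uint32_t itt0;         /* $68 */
    uint32_t dtt1;         /* $6C */
    uint32_t itt1;         /* $70 */
} tcpsp_devcon;
```

Under the stated assumptions, offsetof(tcpsp_devcon, itt1) is 0x70 and the whole structure occupies 0x74 bytes.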

 

5.2. Examples

The first example shows the settings required to address the secondary CPU on the Eurocom-17 CPU board (cpu2 descriptor of the software distribution).
DC_ResWidth  set 0 ; Width of reset register (not available)
DC_ConWidth  set 1 ; Width of control register
DC_IRQWidth  set 1 ; Width of IRQ register
DC_IAckWidth set 1 ; Width of IACK register
DC_ResReg    set 0 ; Reset register not available
DC_ResDo     set 0
DC_ConReg    set CPU2CON ; CPU control register
DC_ConStart  set $20 ; Start CPU
DC_ConStop   set $00 ; Stop CPU
DC_IRQReg    set CPU2CON ; IRQ register
DC_IRQShift  set 0 ; Shift IRQ level into IRQ register
DC_IACKReg   set CPU2CON ; IACK register
DC_IACKDo    set $05 ; Acknowledge mailbox IRQ
DC_Cache     set DC_CACHE_DATA|DC_CACHE_INST
DC_Outmode   set DC_OUTPUT_FIFO|DC_OUTPUT_MSG|DC_OUTPUT_RBLOCK
DC_Memory    set 0
DC_PC2       set 0 ; Additional address to write slave start address
DC_DPRDatBeg set 0 ; Begin address of dual-ported RAM data
DC_DPRDatEnd set 0 ; End address of dual-ported RAM data
DC_DPRCodBeg set 0 ; Begin address of dual-ported RAM code
DC_DPRCodEnd set 0 ; End address of dual-ported RAM code
DC_LocDatBeg set 0 ; Begin address of local RAM data
DC_LocDatEnd set 0 ; End address of local RAM data
DC_LocCodBeg set 0 ; Begin address of local RAM code
DC_LocCodEnd set 0 ; End address of local RAM code
DC_Mirror1   set $04000000 ; offset to uncached mirror memory CPU #1
DC_Mirror2   set $04000000 ; offset to uncached mirror memory CPU #2
DC_CPUType2  set 68040 ; Type of additional CPU
DC_DTT0      set 0 ; DTT0
DC_ITT0      set 0 ; ITT0
DC_DTT1      set CopybackUsr ; DTT1
DC_ITT1      set WrtThrghUsr ; ITT1
 endc
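
To illustrate how the DC_Cache and DC_Outmode settings of this example translate into numbers, the bit positions given in section 5.1 can be written as masks. This is a sketch under the assumption that the DC_CACHE_... and DC_OUTPUT_... symbols used in the descriptor source correspond one-to-one to the bits listed there; the C names below are hypothetical and are not taken from tcpsp.h.

```c
#include <stdint.h>

/* Masks for the cache mode variable (bit numbers from section 5.1). */
#define CACHE_DATA   ((uint32_t)1 << 0)    /* enable data cache        */
#define CACHE_INST   ((uint32_t)1 << 1)    /* enable instruction cache */

/* Masks for the output mode variable (bit numbers from section 5.1). */
#define OUT_RMON     ((uint32_t)1 << 0)    /* enable RMON output       */
#define OUT_FIFO     ((uint32_t)1 << 1)    /* enable FIFO output       */
#define OUT_MSG      ((uint32_t)1 << 8)    /* enable default messages  */
#define OUT_RBLOCK   ((uint32_t)1 << 9)    /* enable read blocking     */
#define OUT_WBLOCK   ((uint32_t)1 << 10)   /* enable write blocking    */
#define OUT_DBGMSG   ((uint32_t)1 << 16)   /* enable debug messages    */
#define OUT_REGDMP   ((uint32_t)1 << 17)   /* enable register dump     */

/* Output mode of the first example: FIFO, default messages and
   read blocking. */
static uint32_t e17_outmode(void)
{
    return OUT_FIFO | OUT_MSG | OUT_RBLOCK;
}

/* Returns 1 if all bits of mask are set in mode, 0 otherwise. */
static int mode_has(uint32_t mode, uint32_t mask)
{
    return (mode & mask) == mask;
}
```

With this mapping the output mode of the first example evaluates to 0x302 and the cache mode (both caches enabled) to 0x3.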

The second example shows the settings required to address the IC-40 slave CPU from a Eurocom-17 CPU board (cpu4 descriptor of the software distribution).

CPU2CON      equ $96000203
DC_ResWidth  set 1 ; Width of reset register
DC_ConWidth  set 1 ; Width of control register
DC_IRQWidth  set 0 ; Width of IRQ register
DC_IAckWidth set 0 ; Width of IACK register
DC_ResReg    set CPU2CON ; Reset register
DC_ResDo     set $ff
DC_ConReg    set CPU2CON+4 ; CPU control register
DC_ConStart  set $08 ; Start CPU
DC_ConStop   set $00 ; Stop CPU
DC_IRQReg    set 0 ; IRQ register
DC_IRQShift  set 0 ; Shift IRQ level into IRQ register
DC_IACKReg   set 0 ; IACK register
DC_IACKDo    set 0 ; Acknowledge mailbox IRQ
DC_Cache     set DC_CACHE_DATA|DC_CACHE_INST
DC_Outmode   set DC_OUTPUT_FIFO|DC_OUTPUT_MSG|DC_OUTPUT_RBLOCK
DC_Memory    set DC_MEMORY_ALARM|DC_MEMORY_CACHE
DC_PC2       set 0 ; Additional address to write slave start address
DC_DPRDatBeg set $94000000 ; Begin address of dual-ported RAM data
DC_DPRDatEnd set $94100000 ; End address of dual-ported RAM data
DC_DPRCodBeg set $94100000 ; Begin address of dual-ported RAM code
DC_DPRCodEnd set $94180000 ; End address of dual-ported RAM code
DC_LocDatBeg set $00180000 ; Begin address of local RAM data
DC_LocDatEnd set $001C0000 ; End address of local RAM data
DC_LocCodBeg set $001C0000 ; Begin address of local RAM code
DC_LocCodEnd set $00200000 ; End address of local RAM code
DC_Mirror1   set $04000000 ; offset to uncached mirror memory CPU #1
DC_Mirror2   set $00000000 ; offset to uncached mirror memory CPU #2
DC_CPUType2  set 68040 ; Type of additional CPU
DC_DTT0      set 0 ; DTT0
DC_ITT0      set 0 ; ITT0
DC_DTT1      set CopybackUsr ; DTT1
DC_ITT1      set WrtThrghUsr ; ITT1

The addresses specified in the variables DC_LocDatBeg, DC_LocDatEnd, DC_LocCodBeg and DC_LocCodEnd do not point to local memory but to another region in dual-ported RAM. They have been set merely for testing purposes.
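
The begin/end convention used in both examples (the end value is one plus the highest address) and the mode rules from section 5.1 can be summarized in a few helper functions. This is a minimal sketch, not code from the TCP/SP driver; the function names are hypothetical.

```c
#include <stdint.h>

/* Size in bytes of a region given as [beg, end), where end is one plus
   the highest address of the region. */
static uint32_t region_size(uint32_t beg, uint32_t end)
{
    return end - beg;
}

/* Returns 1 if the descriptor selects tightly-coupled mode, i.e. if
   either the begin or the end address of the dual-ported RAM data
   area is zero. */
static int is_tightly_coupled(uint32_t dpr_dat_beg, uint32_t dpr_dat_end)
{
    return dpr_dat_beg == 0 || dpr_dat_end == 0;
}

/* Returns 1 if a copy into local memory is requested, i.e. if both
   limits of the local area are non-zero. */
static int copies_to_local(uint32_t loc_beg, uint32_t loc_end)
{
    return loc_beg != 0 && loc_end != 0;
}
```

Applied to the cpu4 settings, the dual-ported RAM data area is 0x100000 bytes (1 MByte) and the code area 0x80000 bytes (512 KByte) long; the cpu2 settings, with all four DPR values set to 0, select tightly-coupled mode.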

 

6. Libraries

 

6.1. tcpsp.l

Although the driver that provides access to an additional CPU through the TCP/SP software is formally a driver for the sequential character file manager SCF, it is normally not addressed using read and write functions; its main functionality is implemented using so-called SetStat functions. It is, however, possible to read from a TCP/SP device; in this case, the LSB of the current setting of the driver's status is returned. The status variable may have the following values:

Value       Status

0           CPU reset
1           CPU running
any other   Vector# of most recent exception

Writing to the TCP/SP device is not possible; if such attempt is made, error 208 (unavailable service requested) is returned.
In order to facilitate the use of the SetStat functions, a C language library is provided that contains the bindings to the respective driver calls. It is called tcpsp.l and described in the following chapters.

 

6.1.1. Get status of the additional CPU, _gs_tcpsp_status()

 

Syntax

unsigned long _gs_tcpsp_status(path)
int path;
 

Function

The _gs_tcpsp_status function returns the current setting of the sctcpsp driver's status variable. The upper two bytes of this 4-byte variable contain the most recent service request to the operating system made by the program running on the additional CPU. If a request has not yet been made, the value is set to 0xFFFF (available as preprocessor variable UNDEF_SRVC in the tcpsp.h header file).

If the most recent service request is not F$Exit (0x0006, available as preprocessor variable OS9_EXIT in the tcpsp.h header file), the lowest byte specifies the most recent exception processed by the additional CPU; a value of 0 or 1 denotes that the additional CPU has been stopped or is running, respectively. This variable can also be examined using the standard read function.

If, however, the most recent service request is F$Exit (0x0006), the lower two bytes contain the exit code of the program that terminated most recently.

The following conditions can be used to decode the status information:

#include <tcpsp.h>

/* 1. Testing whether the additional CPU has terminated: */
fullstatus = _gs_tcpsp_status(cpu2);
if (((fullstatus >> 16) == OS9_EXIT) ||
  ((fullstatus & 0xff) != 1))
  terminated = 1;

/* 2. Generating the standard OS-9 exit code: */
fullstatus = _gs_tcpsp_status(cpu2);
if ((fullstatus >> 16) == OS9_EXIT)
  exit(fullstatus & 0xffff);
else
  exit(100 + (fullstatus & 0xff));

 

6.1.2. Run the additional CPU (low level), _ss_tcpsp_runlow()

 

Syntax

int _ss_tcpsp_runlow(path, data, pc)
int path;
char *data, *pc;
 

Function

The _ss_tcpsp_runlow function represents the simplest way to start an additional CPU under the control of the TCP/SP library. The argument path must contain a valid path number of an sctcpsp device, data is a pointer to stack memory that will be available to the additional CPU in its a7 address register and pc is the start address of a program section to be executed by the additional CPU. The additional CPU will start execution directly at the specified program counter; neither cache settings nor the interrupt mask are handled by the sctcpsp driver. The program will run in supervisor mode on the additional CPU.

CAVEATS:

1. Reset affects neither the data nor the instruction cache. Therefore, if the instruction cache is enabled and not invalidated after reset, and the power supply has not been switched off, the CPU may continue to execute old code.

2. The code must not contain any instruction that may cause exception processing. Since the additional CPU's vector base register still belongs to the primary CPU, any attempt at exception processing by the additional CPU will produce utter chaos. Use the _ss_tcpsp_runvbr function if exception processing cannot be excluded, or conditionally exclude critical operations (e.g. only execute the divide instruction if the divisor does not equal zero).

3. Data passed in the data argument and code passed in the pc argument must be allocated in such a way that the respective memory regions are not made available to other procedures when the calling program exits and the additional CPU is still working. Under OS-9, this is best done by loading the program into the module directory prior to starting it; the data and, alternatively, the code may also be stored in a data module. Shared memory may be used under LynxOS. In both operating systems, it is also possible to use an uninitialized memory section (not made available to the operating system at boot time) for such purpose.  
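
The conditional exclusion mentioned in caveat 2 can be sketched in C as follows. The function name guarded_divide is hypothetical; the sketch only shows the idea of testing the divisor before the critical instruction is reached, so that no divide-by-zero exception (vector 5) can occur.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: perform the division only if it cannot trap.
   Returns 1 and stores the result in *quotient on success; returns 0
   and leaves *quotient untouched if the division was skipped. */
int guarded_divide(long dividend, long divisor, long *quotient)
{
    assert(quotient != NULL);

    if (divisor == 0)
        return 0;                         /* would raise vector 5 */
    if (dividend == LONG_MIN && divisor == -1)
        return 0;                         /* result not representable */

    *quotient = dividend / divisor;
    return 1;
}
```

Code that cannot be guarded in this way is better started with _ss_tcpsp_runvbr, as the caveat states.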

EXAMPLE:

#include <modes.h>
extern int errno;
void prog();
main()
{
  char *cpu2name = "/cpu2";
  int   cpu2;
  if ((cpu2=open(cpu2name, S_IREAD|S_IWRITE)) == -1)   
    exit(_errmsg(errno,
      "can't open '%s' CPU device due to ",       
      cpu2name));
  
  _ss_tcpsp_runlow(cpu2, (char *) 0, prog);
  /* no stack memory needed */
  sleep(1);
  _ss_tcpsp_stop(cpu2);
}
#asm
prog:  cinva bc ; invalidate both caches
loop  bra.s loop
#endasm

 

6.1.3. Run the additional CPU (install vbr), _ss_tcpsp_runvbr()

 

Syntax

int _ss_tcpsp_runvbr(path, data, pc)
int path;
TCPSPSTACK *data;
char *pc;
 

Function

The _ss_tcpsp_runvbr function is, in principle, similar to the above _ss_tcpsp_runlow function, but it additionally installs default exception handlers and passes the vector base register to the additional CPU. In addition, before branching to the supplied code segment, data and instruction caches are invalidated and the vector base register of the additional CPU is set accordingly. The program will run in supervisor mode on the additional CPU. The argument path must contain a valid path number of an sctcpsp device, data is a pointer to a data structure that will be available to the additional CPU in its a7 address register (pointing to the Status variable, see below) and pc is the start address of a program section to be executed by the additional CPU. The data structure TCPSPSTACK is defined in the header file tcpsp.d and contains the following variables:

 org 0
Status   do.l 1 Pointer to the driver's status variable
Regs     do.l 8
AddrRegs do.l 8
StatReg  do.w 1
ProgCnt  do.l 1
SprStack do.l 1
UsrStack do.l 1
IRQStack do.l 1
MstStack do.l 1
VctBase  do.l 1
CachC    do.l 1
CachA    do.l 1
SrcFCod  do.l 1
DstFcod  do.l 1
XXX      do.w 1
UserData
The subsequent section is used by the _ss_tcpsp_runfrk function to realize a communication channel between the additional and the calling CPU (see below); in the context of the _ss_tcpsp_runvbr function, it is available for user data.

UserData
First    do.l 1 ; Start of FIFO data
Last     do.l 1 ; End of FIFO data
Writep   do.l 1 ; FIFO write pointer
Readp    do.l 1 ; FIFO read pointer
Fifobuf  do.b 512 ; FIFO (communication channel)
Devcon   do.l 1 ; pointer to devcon area

DataBeg  do.l 1 ; pointer to data
DataSiz  do.l 1 ; size of data
CodeBeg  do.l 1 ; pointer to program
CodeSiz  do.l 1 ; size of program
The same data structure is available as a C language header file tcpsp.h:

#include <types.h>
#include <MACHINE/reg.h>
typedef struct tcpspstack {
  unsigned char   system[256];/*  256 */
  unsigned char  *status;     /*    4 */
  REGISTERS       reg;        /*  108 */
  unsigned char   user[656];  /*  656 */
                              /* ---- */
                              /* 1024 */
} TCPSPSTACK, *TCPSPSTACKP;
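
The byte counts given in the comments of the structure can be checked on any host with a small model. This is a sketch under stated assumptions: TCPSPSTACK_MODEL and the two functions are hypothetical names, the status pointer is modelled as a 4-byte integer (its size on the 68k target), and REGISTERS is modelled as 108 opaque bytes, as given in the comment.

```c
#include <stddef.h>
#include <stdint.h>

/* Host-independent stand-in for TCPSPSTACK (hypothetical model). */
typedef struct tcpspstack_model {
    uint8_t  system[256];   /*  256 */
    uint32_t status;        /*    4  (pointer on the target) */
    uint8_t  reg[108];      /*  108  (sizeof(REGISTERS) per the comment) */
    uint8_t  user[656];     /*  656 */
} TCPSPSTACK_MODEL;         /* 1024 in total */

/* Byte offset of the user area from the start of the structure. */
size_t tcpsp_user_offset(void)
{
    return offsetof(TCPSPSTACK_MODEL, user);
}

/* Total size of the structure in bytes. */
size_t tcpsp_total_size(void)
{
    return sizeof(TCPSPSTACK_MODEL);
}
```

With this model the user area starts 368 bytes into the structure and the total size is 1024 bytes, which agrees with the comments in the header excerpt.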
Exception vector #0 points to the system globals of the primary CPU's operating system, exception vector #1 points to the initial start address of the program code for the additional CPU; all other exception vectors (2 to 255) point to a default exception handler that stops the additional CPU and writes the exception number to the lowest byte of the sctcpsp driver's Status variable. This variable can be queried from the primary CPU using the standard read command of the sctcpsp driver. User-supplied exception vectors can be installed using the _ss_tcpsp_excpt function after the program has been started.
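
The vector assignment described in this paragraph can be modelled as a plain 256-entry table. The sketch below is illustrative only: the function names and the address values passed in are hypothetical, and the real table is set up by the sctcpsp driver, not by user code.

```c
#include <stdint.h>

#define TCPSP_NVECTORS 256

/* Fill a model vector table: #0 = system globals, #1 = start address,
   #2..#255 = default exception handler. All addresses are placeholders
   supplied by the caller. */
void tcpsp_fill_vectors(uint32_t table[TCPSP_NVECTORS],
                        uint32_t sysglobs, uint32_t start_pc,
                        uint32_t default_handler)
{
    int i;

    table[0] = sysglobs;
    table[1] = start_pc;
    for (i = 2; i < TCPSP_NVECTORS; i++)
        table[i] = default_handler;
}

/* Replace one handler in the model. Only vectors 2..255 may be
   replaced; returns 0 on success and -1 if vector is out of range. */
int tcpsp_set_vector(uint32_t table[TCPSP_NVECTORS],
                     int vector, uint32_t handler)
{
    if (vector < 2 || vector >= TCPSP_NVECTORS)
        return -1;
    table[vector] = handler;
    return 0;
}
```

The range check in tcpsp_set_vector mirrors the restriction that user-supplied handlers may only replace vectors 2 to 255.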

In addition, other generally useful definitions are included in the tcpsp.d header file:

* Default settings for the transparent translation
* registers
WrtThrghSup  equ $00ffA000
CopybackSup  equ $00ffA020
WrtThrghUsr  equ $00ff8000
CopybackUsr  equ $00ff8020

* Default setting for the translation control
* register
TC_XEnbl     equ 0x8000

* Default settings for the cache control register
 ifdef MC68040
CacheEnable  equ $80008000
CacheDisable equ $00000000
 endc
 ifdef MC68030
CacheEnable  equ $00000101
CacheDisable equ $00000000
 endc

* Floating-point control register (fpcr)
FPCR_BSUN  equ $8000
FPCR_SNAN  equ $4000
FPCR_OPERR equ $2000
FPCR_OVFL  equ $1000
FPCR_UNFL  equ $0800
FPCR_DZ    equ $0400
FPCR_INEX2 equ $0200
FPCR_INEX1 equ $0100
 

CAVEATS:

1. The additional CPU's stack pointer a7 does not point to the start of the data structure (as seen from the calling function) but to the user area.

2. Data passed in the data argument and code passed in the pc argument must be allocated in such a way that the respective memory regions are not made available to other procedures when the calling program exits and the additional CPU is still working. Under OS-9, this is best done by loading the program into the module directory prior to starting it; the data and, alternatively, the code may also be stored in a data module. Shared memory may be used under LynxOS. In both operating systems, it is also possible to use an uninitialized memory section (not made available to the operating system at boot time) for such purpose.  

SEE ALSO:

_ss_tcpsp_runlow(), _ss_tcpsp_excpt()  

EXAMPLE:

runcpu2.c

 

6.1.4. Run the additional CPU (fork module), _ss_tcpsp_runfrk()

 

Syntax

int _ss_tcpsp_runfrk(path, argv, environ)
int path;
char *argv[];
char *environ[];
 

Function

The _ss_tcpsp_runfrk function directly executes a program module on the additional CPU, i.e. it fully emulates the operating system's fork functionality for the additional CPU. The argument path must contain a valid path number of an sctcpsp device, argv is a pointer to an array of string pointers whose first element contains the name and the following elements optionally contain command line arguments for the program to be executed. The argument environ is a pointer to an array of string pointers that contain the environment variables to be made available to the forked program. The program will run in user mode on the additional CPU.

The _ss_tcpsp_runfrk call is highly dependent on the operating system; the following details only apply to the OS-9 operating system. In a first step, the _ss_tcpsp_runfrk function is very similar to the _ss_tcpsp_runvbr function: default exception handlers are installed, the vector base register of the additional CPU is set and data and instruction caches are enabled. Additionally, the _ss_tcpsp_runfrk function installs the OS-9 kernel interface (exception vector for trap #0), allocates static and stack memory and makes those pointer variables position-independent that refer to the code (program counter-relative addressing) or to data within the code (pointers to static data). The kernel trap handler contains a small selection of emulated OS-9 calls; they are primarily intended to produce code that runs on both the primary and the additional CPU without requiring recompilation. In addition, a function is made available that allows a program to determine the number of the CPU it is running on. If a dual-ported RAM slave CPU is used, the CPU type in the system globals is set accordingly; in addition, time and date in the system globals may be updated continuously, if defined in the descriptor's DevCon area.

The following OS-9 functions are available on the additional CPU:

OS-9 function   C language call   Implementation

F$CCtl          n.a.              Emulated
F$GSPUMp        n.a.              Silently ignored
F$ID            getuid()          Silently ignored
F$Permit        n.a.              Silently ignored
F$Protect       n.a.              Silently ignored
F$SetSys        _getsys()         Emulated and expanded*
F$SUser         n.a.              Silently ignored
F$Time          _sysdate()        Emulated
I$Close         close()           Partly ignored**
I$Write         write()           Mostly emulated***
I$WritLn        writeln()         Mostly emulated***

* If the D_IPID variable (interprocessor identification register) is examined, the return value will be incremented by the number of the CPU the program is running on. All other globals are treated identically on both the primary and the additional CPU.

** If a path number of 0, 1 or 2 is passed, the command is silently ignored; otherwise, the default exception handler is called.

*** On the Eurocom-17, the Rmon printf function may be used for string output. Therefore, the '\0' character is treated as the end-of-string symbol if it occurs before the specified number of characters is reached, and the % sign is only considered if preceded by a backslash or specified twice. This, however, does not apply if the FIFO channel is used for communication. In both cases, the default exception handler is called if a path number other than 1 or 2 is passed.

If any other non-emulated OS-9 function call is specified, the default exception handler is called, i.e. the CPU is stopped and the exception number (#32) is written to the sctcpsp driver's status variable where it can be examined using its standard read function.  

CAVEATS:

1. In contrast to the _ss_tcpsp_runvbr function, there are no data segments available to both the calling program and the program that runs on the additional CPU. Initial settings are best passed using command line arguments or environment variables. Communication from the additional to the primary CPU can be achieved by interrupting the primary CPU after installing the signal to be sent using the _ss_tcpsp_signal function.

2. Standard user trap handlers cannot be installed; in consequence, neither the -i nor the -x option must be specified when compiling programs that are intended to run on the additional CPU using the _ss_tcpsp_runfrk function.

 

SEE ALSO:

_ss_tcpsp_runvbr(), _ss_tcpsp_signal()

EXAMPLE:

One of the TCP/SP example programs that run on both the primary and the additional CPU displays several strings on screen (print2.c); it essentially contains the following program elements:

static char msg[256];
main(argc, argv)
int   argc;
char *argv[];
{
  strcpy(msg, "A message from another world.\n");
  writeln(1, msg, strlen(msg));
  sprintf(msg, "s = %s, d = %d, x = %x, f = %f\n",
    "String", 123456789, 0x12345678, sqrt(10.0));
  writeln(1, msg, strlen(msg));
  
  if (argc > 1) {
    sprintf(msg, "Command line argument #1: %s.\n",
      argv[1]);
    writeln(1, msg, strlen(msg));
  }
}
In order to run the above program on an additional CPU, it must first be compiled to a binary program module print2 and made accessible to the operating system (e.g. under OS-9 be loaded into the module directory). When the following program sequence is then run on the primary CPU, the print2 program will execute on the additional CPU:

extern char **environ;
static char *argv[3];
main()
{
int cpu2;
cpu2 = open("cpu2", S_IREAD|S_IWRITE);
argv[0] = "print2";
argv[1] = "myarg";
argv[2] = (char *) 0;
_ss_tcpsp_runfrk(cpu2, argv, environ);
}

 

6.1.5. Interrupt the additional CPU, _ss_tcpsp_irq()

 

Syntax

int _ss_tcpsp_irq(path, level)
int path;
char level;
 

Function

The _ss_tcpsp_irq function interrupts the additional CPU. The argument path must contain a valid path number of an sctcpsp device, level can be any number between 1 and 7 and defines the interrupt level. This function requires that the additional CPU has been started using the _ss_tcpsp_runvbr or the _ss_tcpsp_runfrk function in order to work properly. The following exceptions are attributed to a particular interrupt level:

Interrupt level   Number of exception vector

1                 25
2                 26
3                 27
4                 28
5                 29
6                 30
7                 31
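
The table amounts to a fixed offset of 24 between interrupt level and exception vector number. A minimal sketch of that mapping follows; the function name tcpsp_irq_vector is hypothetical and not part of tcpsp.l.

```c
/* Hypothetical helper: exception vector number that belongs to an
   interrupt level, following the table above (level 1 -> 25, ...,
   level 7 -> 31). Returns -1 for levels outside 1..7. */
int tcpsp_irq_vector(int level)
{
    if (level < 1 || level > 7)
        return -1;
    return 24 + level;
}
```

The returned number is the one to use when a handler for a particular interrupt level is to be replaced with _ss_tcpsp_excpt.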

 

SEE ALSO:

_ss_tcpsp_runvbr()  

EXAMPLE:

irqcpu2.c

 

6.1.6. Install user-supplied exception handler, _ss_tcpsp_excpt()

 

Syntax

int _ss_tcpsp_excpt(path, vector, excpthndl)
int path;
unsigned char vector;
void (*excpthndl)();
 

Function

The _ss_tcpsp_excpt function is used to replace the default exception handler by a specific code segment supplied by the programmer. The argument path must contain a valid path number of an sctcpsp device, vector can be any number between 2 and 255 and defines the number of the exception whose handler routine is going to be replaced, and excpthndl is the start address of the new exception handler. This function requires that the additional CPU has been started using the _ss_tcpsp_runvbr or the _ss_tcpsp_runfrk function in order to work properly.

CAVEATS:

The memory location of the newly supplied code segment must be allocated in such a way that it may not be overwritten by the system when the calling procedure is no longer active.  

SEE ALSO:

_ss_tcpsp_runvbr()

 

6.1.7. Install signal to be sent if interrupt received, _ss_tcpsp_signal()

 

Syntax

int _ss_tcpsp_signal(path, signal)
int path;
unsigned short signal;
 

Function

The _ss_tcpsp_signal function installs a signal number that is sent from the sctcpsp driver to the calling process when an interrupt has been received. The argument path must contain a valid path number of an sctcpsp device, and signal can be any signal number between 0 and 65535. Each time a signal has been sent, the signal-sending mechanism is disabled.

SEE ALSO:

irqcpu1, sigcpu2  

EXAMPLE:

#define OURSIGNAL 3262
static int sigrecv, cpu2;
signalhandler(sig)
int sig;
{
  switch (sig) {
    case SIGQUIT:
    case SIGINT:
      exit(sig);
    default:
      sigrecv = sig;
      break;
  }
}

main()
{
  unsigned short signal = OURSIGNAL;
  cpu2 = open("cpu2", S_IREAD|S_IWRITE);
  intercept(signalhandler);
  _ss_tcpsp_signal(cpu2, signal);
  while(1) {
    sleep(0);
    if (sigrecv == signal) {
      printf("+++ our signal received\n");
      exit(0);
    }
    else
      printf("--- signal #%d received\n", sigrecv);
  }
}

 

6.1.8. Stop execution of the additional CPU, _ss_tcpsp_stop()

 

Syntax

int _ss_tcpsp_stop(path)
    int path;
 

Function

The _ss_tcpsp_stop function is used to immediately stop the additional CPU. The argument path must contain a valid path number of an sctcpsp device. The additional CPU can only be restarted by one of the functions _ss_tcpsp_runlow, _ss_tcpsp_runvbr or _ss_tcpsp_runfrk.

 

6.2. utcpsp.l

 

6.2.1. Permit access to memory, foolssm()

 

Syntax

void foolssm(start, len)
    char *start;
    int len;
 

Function

The foolssm function permits access to len bytes of physically existing memory at address start that has not been granted by the operating system.

 

6.2.2. Transform double to long, ftol()

 

Syntax

long ftol(dbl)
    double dbl;
 

Function

The ftol function transforms the double-precision floating-point number passed in dbl to the closest representation as a 32-bit integer.
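
The manual does not state how ties or out-of-range values are treated. The following sketch shows one plausible reading of "closest representation": round to the nearest integer, with halves rounded away from zero, and clamp to the 32-bit range. The name nearest_int32 and both of these choices are assumptions and may differ from what ftol actually does.

```c
#include <stdint.h>

/* Hypothetical model of a "closest 32-bit integer" conversion.
   Assumptions: ties round away from zero, out-of-range values are
   clamped, and NaN yields 0. */
int32_t nearest_int32(double dbl)
{
    int32_t whole;
    double  frac;

    if (dbl != dbl)                  /* NaN compares unequal to itself */
        return 0;
    if (dbl >= 2147483647.0)
        return INT32_MAX;
    if (dbl <= -2147483648.0)
        return INT32_MIN;

    whole = (int32_t)dbl;            /* truncates toward zero */
    frac  = dbl - (double)whole;     /* fractional part, same sign as dbl */
    if (frac >= 0.5)
        whole += 1;
    else if (frac <= -0.5)
        whole -= 1;
    return whole;
}
```

Truncating first and then adjusting by the fractional part avoids the rounding error that adding 0.5 before the cast can introduce for values just below one half.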

 

6.2.3. Get process ID of parent process, getppid()

 

Syntax

int getppid()
 

Function

The getppid function returns the process ID of the parent process.

 

6.2.4. Print string to standard output path, print1()

 

Syntax

int print1(str)
    char *str;
 

Function

The print1 function prints the string str to standard output path.

 

6.2.5. Print string to standard error path, print2()

 

Syntax

int print2(str)
    char *str;
 

Function

The print2 function prints the string str to standard error path.

 

6.2.6. Get start address of Rmon's screen memory, rmon_start()

 

Syntax

char *rmon_start()
char *rmon_start2()
 

Function

The rmon_start and the rmon_start2 functions return the start address of Rmon's screen memory. The only difference between the rmon_start and the rmon_start2 function is that rmon_start2 does not execute a foolssm function to get permission to access the video descriptor memory.

 

6.2.7. Get width of Rmon's screen display window, rmon_x()

 

Syntax

int rmon_x()
int rmon_x2()
 

Function

The rmon_x and the rmon_x2 function return the width of Rmon's screen display window. The only difference between the rmon_x and the rmon_x2 function is that rmon_x2 does not execute a foolssm function to get permission to access the video descriptor memory.

 

6.2.8. Get height of Rmon's screen display window, rmon_y()

 

Syntax

int rmon_y()
int rmon_y2()
 

Function

The rmon_y and the rmon_y2 function return the height of Rmon's screen display window. The only difference between the rmon_y and the rmon_y2 function is that rmon_y2 does not execute a foolssm function to get permission to access the video descriptor memory.

 

6.2.9. Get depth of Rmon's screen memory, rmon_depth()

 

Syntax

int rmon_depth()
int rmon_depth2()
 

Function

The rmon_depth and the rmon_depth2 function return the physical bit depth (1, 2, 4 or 8) of Rmon's screen memory. The only difference between the rmon_depth and the rmon_depth2 function is that rmon_depth2 does not execute a foolssm function to get permission to access the video descriptor memory.

 

6.2.10. Get width of Rmon's screen memory, rmon_pitch()

 

Syntax

int rmon_pitch()
int rmon_pitch2()
 

Function

The rmon_pitch and the rmon_pitch2 functions return the width of Rmon's screen memory. The only difference between the rmon_pitch and the rmon_pitch2 function is that rmon_pitch2 does not execute a foolssm function to get permission to access the video descriptor memory.
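
Start address, bit depth and pitch together are enough to locate a pixel in screen memory. The sketch below assumes that the pitch is given in bytes per video line (the comment in the copy.a listing of section 8.1.1 suggests this) and that pixels are packed without padding; the function name pixel_byte_offset is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: byte offset of pixel (x, y) from the start of
   screen memory.
   pitch : bytes per video line (assumed unit of rmon_pitch())
   depth : bits per pixel, one of 1, 2, 4 or 8 (as from rmon_depth()) */
size_t pixel_byte_offset(size_t pitch, unsigned depth, size_t x, size_t y)
{
    assert(depth == 1 || depth == 2 || depth == 4 || depth == 8);
    return y * pitch + (x * depth) / 8;
}
```

Adding the result to the address returned by rmon_start() would give the address of the byte that holds the pixel; for depths below 8 the position of the pixel within that byte depends on the bit order of the display hardware, which is not covered here.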

 

6.2.11. Get height of Rmon's screen memory, rmon_height()

 

Syntax

int rmon_height()
int rmon_height2()
 

Function

The rmon_height and the rmon_height2 functions return the height of Rmon's screen memory. The only difference between the rmon_height and the rmon_height2 function is that rmon_height2 does not execute a foolssm function to get permission to access the video descriptor memory.

 

6.3. fplsp040.l and fplsp060.l

There are two different ways to cope with the problem that, in contrast to the MC68881/68882 co-processors, the MC68040 and 68060 processors do not provide the full set of required floating-point instructions.

One way is to use the sctcpsp040 or sctcpsp060 driver, each of which contains floating-point exception handlers: whenever an unimplemented floating-point instruction is encountered, the exception handler decodes the instruction, executes it in software and puts the result into the requested register as if it had been executed in hardware. Control is then returned to the program.

The other way is to use the normal sctcpsp driver that does not contain a floating point exception handler but to link the program against one of two special floating-point libraries. These libraries essentially represent adaptations of Motorola's floating-point library support packages for the MC68040 and MC68060 processors, respectively. Use of these libraries avoids the overhead required to decode the original instruction in the exception handler. Therefore, an overall increase of about 50% in execution speed can be achieved as compared to using the exception handler. This library is not restricted to double-processor software. It can be used equally well on any other MC68040- or MC68060-based OS-9 system.

In principle, any call to a non-implemented floating-point instruction must be replaced by a call to the library. As a general rule, all functions are monadic and exist in three different forms - single, double, or extended precision of the input value that is expected on the user stack. The result is always returned in the fp0 floating-point data register. The functions are named in the same way as the floating-point instruction mnemonics except that the dot between instruction and precision must be omitted. For example, if the double precision sine function,

        fsin.d fp3      ; calculate sine of fp3

has to be emulated using one of the fplsp040.l or fplsp060.l libraries, the following sequence is required:

        move.l a6,-(a7) ; save a6 on stack
        fmove.l fp0,-(a7)       ; save fp0 on stack
        fmove.d fp3,-(a7)       ; push operand
        bsr fsind               ; emulate fsin.d (a7),fp0
        fmove.d fp0,fp3 ; get result
        adda.l #8,a7            ; rewind stack
        fmove.l (a7)+,fp0       ; restore fp0
        move.l (a7)+,a6 ; restore a6

If the contents of the a6 register must be preserved (which is normally the case, because it points to the base of global data), it must be saved prior to a call to the floating-point library, since the library uses a6 as stack frame pointer rather than a5, as most OS-9 compilers do.
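
The naming rule given above (instruction mnemonic and precision suffix without the dot) can be expressed as a small string transformation. This sketch is only an illustration of the rule; the function fplsp_name is hypothetical and not part of the libraries.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: derive the library function name from a
   mnemonic of the form "<instruction>.<precision>", where precision
   is s, d or x; e.g. "fsin.d" becomes "fsind".
   Returns 0 on success, -1 on malformed input or too small a buffer. */
int fplsp_name(const char *mnemonic, char *out, size_t outsize)
{
    const char *dot = strchr(mnemonic, '.');
    size_t stem;

    if (dot == NULL || dot == mnemonic)
        return -1;                          /* no dot or empty stem */
    if (dot[1] == '\0' || dot[2] != '\0')
        return -1;                          /* need exactly one suffix char */
    if (dot[1] != 's' && dot[1] != 'd' && dot[1] != 'x')
        return -1;                          /* only s, d, x exist */

    stem = (size_t)(dot - mnemonic);
    if (stem + 2 > outsize)
        return -1;                          /* stem + suffix + NUL */

    memcpy(out, mnemonic, stem);
    out[stem] = dot[1];
    out[stem + 1] = '\0';
    return 0;
}
```

Integer source formats such as ".l" are rejected because the libraries only provide single, double and extended precision variants.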

In addition to the floating-point instructions, the fplsp0x0.l libraries contain complete sets of OS-9 math calls and C library calls. The following lists give an overview of the available functions:

 

6.3.1. Floating-point emulation

Double      Single      Extended

fabsd       fabss       fabsx
facosd      facoss      facosx
faddd       fadds       faddx
fasind      fasins      fasinx
fatand      fatans      fatanx
fatanhd     fatanhs     fatanhx
fcoshd      fcoshs      fcoshx
fcosd       fcoss       fcosx
fdivd       fdivs       fdivx
fetoxd      fetoxs      fetoxx
fetoxm1d    fetoxm1s    fetoxm1x
fgetexpd    fgetexps    fgetexpx
fgetmand    fgetmans    fgetmanx
fintd       fints       fintx
fintrzd     fintrzs     fintrzx
flog10d     flog10s     flog10x
flog2d      flog2s      flog2x
flognd      flogns      flognx
flognp1d    flognp1s    flognp1x
fmodd       fmods       fmodx
fmuld       fmuls       fmulx
fnegd       fnegs       fnegx
fremd       frems       fremx
fscaled     fscales     fscalex
fsind       fsins       fsinx
fsinhd      fsinhs      fsinhx
fsqrtd      fsqrts      fsqrtx
fsubd       fsubs       fsubx
ftand       ftans       ftanx
ftanhd      ftanhs      ftanhx
ftentoxd    ftentoxs    ftentoxx
ftwotoxd    ftwotoxs    ftwotoxx

 

6.3.2. Available OS-9 math calls

_T$Acs _T$Asn  _T$Atn  _T$AtoD
_T$AtoF _T$AtoL _T$AtoN _T$AtoU
_T$Cos  _T$DAdd _T$DCmp _T$DDec
_T$DDiv _T$DInc _T$DInt _T$DMul
_T$DNeg _T$DNrm _T$DSub _T$DTrn
_T$DtoA _T$DtoF _T$DtoL _T$DtoU
_T$Exp  _T$FAdd _T$FCmp _T$FDec
_T$FDiv _T$FInc _T$FInt _T$FMul
_T$FNeg _T$FSub _T$FTrn _T$FtoA
_T$FtoD _T$FtoL _T$FtoU _T$Log
_T$Log10        _T$LtoD _T$LtoF _T$Power
_T$Sin  _T$Sqrt _T$Tan  _T$UtoD
_T$UtoF

 

6.3.3. Available C library functions

acos    asin    atan    ceil
cos     fabs    exp     floor
log     log10   modf    pow
sin     tan

 

7. Utility programs

All programs described in this and in the following section use the same mechanism to determine the TCP/SP device, i.e. they examine the environment variable TCPSP. If, for example, the tcpspmode program is used to display the settings of the descriptor /cpu4, the following line must be entered before starting the program:

setenv TCPSP /cpu4
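
A program of one's own can follow the same convention with the standard getenv function. The sketch below is an assumption about how such a lookup might be written; the function names and the fallback argument are hypothetical, and the manual does not say what the supplied utilities do when TCPSP is not set.

```c
#include <stdlib.h>

/* Hypothetical helper: choose the device name from the value of the
   TCPSP variable, or use the caller's fallback if the value is
   missing or empty. */
const char *tcpsp_device_from(const char *env_value, const char *fallback)
{
    if (env_value != NULL && env_value[0] != '\0')
        return env_value;
    return fallback;
}

/* Look up TCPSP in the environment of the calling process. */
const char *tcpsp_device(const char *fallback)
{
    return tcpsp_device_from(getenv("TCPSP"), fallback);
}
```

The returned string can be passed directly to open(), as in the examples of section 6.1.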

 

7.1. tcpspmode

The tcpspmode program is very similar to the standard OS-9 utility xmode, i.e. it displays the current settings of a descriptor on screen. In addition, some of the settings may be redefined as can be seen from the usage information that is obtained when tcpspmode is started with the -? option.  

Syntax

Syntax: tcpspmode [<cmds>]
Function: display and modify TCP/SP descriptor
Options:
    (none)
Commands:
    (no)rmon      Rmon output
    (no)fifo      FIFO output
    (no)msg       Start/stop messages
    (no)rb        Read block
    (no)wb        Write block
    (no)dbg       Debug logger
    (no)dump      Register dump
    (no)dc        Data cache
    (no)ic        Instruction cache
The following example output has been obtained with the cpu4 descriptor for the IC-40 slave CPU:
TCP/SP descriptor 'cpu4':
Reset   register at 0x96000203, width 1, reset 0xFF
Control register at 0x96000207, width 1, start 0x8, stop 0x0
IRQ     register not available
IACK    register not available
Data Cache ON, Instruction cache ON
Rmon off, FIFO ON, Log ON, Rblock ON, Wblock off, Debug off, Dump off
Modules may be copied into dual-ported RAM
Slave time and date globals are updated
                        Start address      End address + 1
Dual-ported RAM data     0x94000000           0x94100000
Dual-ported RAM code     0x94100000           0x94180000
Local       RAM data     0x00180000           0x001C0000
Local       RAM code     0x001C0000           0x00200000
Offset to access DRAM without cache 0x04000000 (master), 0x00000000 (slave)
Slave CPU has MC68040 processor
TT registers    DTT0        ITT0        DTT1        ITT1
             0x00000000  0x00000000  0x00FF8020  0x00FF8000

 

7.2. shell2

The shell2 program is, essentially, a combination of the forkcpu2, fifo2 and exit2 programs described below: A program module passed as the first argument is loaded into memory and started to run on the additional CPU. The FIFO communication port of this CPU is then monitored continuously, and any arriving message is displayed on screen. Should the program stop - either because a regular F$Exit call is encountered or because of a program error, shell2 displays, if any, the appropriate error message and returns control to the caller. If the program on the additional CPU produces a lot of messages, it may be necessary to enable write blocking in the TCP/SP descriptor.  

Syntax

Syntax: shell2 <name> <args>
Function: run program <name> with arguments <args>
          on slave CPU and show output
Options:
    (none)

 

8. Example programs

 

8.1. Programs for the primary CPU

 

8.1.1. runcpu2.c

The runcpu2 program runs on the master CPU and exemplifies several ways to start the additional CPU from the master CPU and to share stack memory between them. Its source code is included in the TCP/SP software and is located in the SRC directory. The program contains a variety of small assembler program sections to be run on the additional CPU:

Name       Example for                    Run mode

crash.a    Exception/interrupt handling   _ss_tcpsp_runvbr()
flash.a    Graphic demo, interrupt        _ss_tcpsp_runvbr()
copy.a     Graphic demo, high speed       _ss_tcpsp_runlow()
invert.a   Graphic demo, high speed       _ss_tcpsp_runlow()
shift.a    Graphic demo, global data      _ss_tcpsp_runlow()
mgrey.a    Graphic demo, global data      _ss_tcpsp_runlow()
vector.a   Graphic demo, global data      _ss_tcpsp_runlow()
edge.a     Graphic demo, use of the FPU   _ss_tcpsp_runlow()
ipp.a      Graphic demo, high speed       _ss_tcpsp_runlow()
cfg.a      Graphic demo, color to gray    _ss_tcpsp_runlow()
poll.a     Data polling                   _ss_tcpsp_runlow()
dpr.a      as above, dual-ported RAM      _ss_tcpsp_runvbr()
irq.a      Interrupt management           _ss_tcpsp_runlow()
tas.a      Synchronization                _ss_tcpsp_runlow()
fplsp.a    FPU library emulation          _ss_tcpsp_runlow()
fexpt.a    FPU exception error            _ss_tcpsp_runvbr()
fline.a    FPU exception handler          _ss_tcpsp_runvbr()

The code to be executed on the additional CPU is always linked to the runcpu2 program code. In nearly all cases (except the last example in the crash function), the additional CPU will still run when the runcpu2 program stops execution; adequate measures must, therefore, be taken that the instruction memory is not returned to the operating system. Under OS-9, this is best done by loading the runcpu2 program into the module directory prior to starting it or by copying the code segment into a data module. Shared memory may be used under LynxOS. In both operating systems, it is also possible to use an uninitialized memory section (not made available to the operating system at boot time) for such purpose. The runcpu2 program exemplifies the use of a data module (OS-9) to assign a global data section that can be accessed by both the primary and the additional CPU.

In the following, two examples from the above-named programs are given, one example for a program (copy.a) to be started with the function _ss_tcpsp_runlow() and one example for a program (crash.a) to be started with the _ss_tcpsp_runvbr function.  

copy.a

This program copies the upper left part of the Eurocom-17 screen to its upper right part. Since it is intended to be run using the _ss_tcpsp_runlow function and this function does not provide any initial settings, the program must take care of specific CPU settings such as enabling data and instruction cache.

 use "cpu2.d"
 psect copy_a,0,0,0,0,0
copy:
 cinva bc ; reset does NOT invalidate the caches
 move.l #CacheEnable,d0 
 movec d0,cacr
 bsr.l rmon_start2 ; get upper left edge of video RAM
 move.l d0,a4
 movea.l a4,a5
 bsr.l rmon_pitch2 ; get bytes per video line
 move.l d0,d2
 lsr.l #1,d0
 add.l d0,a5 ; calculate upper left edge of destination window
 lsr.l #4,d0
 subq.l #1,d0
 move.l d0,d3 ; number of 16-byte blocks (move16) for dbra
restart
 nop
 cpusha dc
 nop 
 move.l a4,a2
 move.l a5,a3
 bsr.l rmon_acqheight2
 subq.l #1,d0
 move.l d0,d1
loop1
 move.l a2,a0
 move.l a3,a1
 move.l d3,d0
loop2
 move16 (a0)+,(a1)+
 dbra d0,loop2
 add.l d2,a2
 add.l d2,a3
 dbra d1,loop1
 bra.s restart
 ends
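
For readers less familiar with 68040 assembler, the copy loop above can be modelled in C. This is a sketch of the same idea rather than a translation: the function name copy_left_to_right is hypothetical, memcpy stands in for the move16 block transfers, and cache handling (cinva, cpusha) has no counterpart here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C model of copy.a: for each video line, copy the left
   half of the line to its right half.
   video : start of screen memory
   pitch : bytes per video line
   lines : number of lines to process */
void copy_left_to_right(uint8_t *video, size_t pitch, size_t lines)
{
    size_t half = pitch / 2;
    size_t y;

    assert(video != NULL);
    for (y = 0; y < lines; y++) {
        uint8_t *src = video + y * pitch;   /* left edge of line y   */
        uint8_t *dst = src + half;          /* start of right half   */
        memcpy(dst, src, half);             /* halves do not overlap */
    }
}
```

The assembler version transfers whole 16-byte blocks only, so the two agree exactly when half the pitch is a multiple of 16.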
 

crash.a

This program exemplifies exception handling. Since it is intended to be run using the _ss_tcpsp_runvbr function and this function already enables the caches, sets the interrupt mask and installs the vector base register, the program need not take care of any specific CPU settings. By default, the program will force vector 5 exception processing (division by zero).
 use "cpu2.d"
 psect crash_a,0,0,0,0,0
*
CrashAddr equ $87654321
*
no1010
 dc.w $a100
*
no1111
 dc.w $f100
*
crash:
*
* this will (probably) force vector 2 (bus error) processing
 tst.l CrashAddr
*
* this will force vector 3 (address error) processing
* lea.l crash(pc),a0
* jsr 1(a0)*
*
* this will force vector 4 (illegal instruction) processing
* illegal
*
* this will force vector 5 (divide by zero) processing
* move.l #0,d0
* move.l d0,d1
* divu d0,d1
*
* this will force vector 10 (unimplemented A-line) processing
* lea.l no1010(pc),a0
* jmp (a0)
*
* this will force vector 11 (unimplemented F-line) processing
* lea.l no1111(pc),a0
* jmp (a0)
*
* this will force vector 52 (OPERR) processing
* move.l #$ff00,d0 ; enable floating-point exceptions
* fmove.l d0,fpcr
* fmove.d #-1.0,fp0
* fsqrt.x fp0,fp1
* fmove.d fp1,UserData(a7)*
*
* wait for interrupts
* CAVEATS: any code below this line must be memory-resident
*          unless the additional CPU is stopped!
loop
 bra.s loop
*
 ends

 

8.1.2. fifo2.c

The fifo2 program reads the current contents from the FIFO communication channel and displays them on screen.
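
The manual names the FIFO fields (First, Last, Writep, Readp and a 512-byte Fifobuf, see section 6.1.3) but does not document the protocol. The following sketch shows a conventional single-writer, single-reader ring buffer of that size as one way such a channel could work; the structure, the use of indices instead of pointers, and the function names are all assumptions. A real implementation shared between two CPUs would also have to deal with caching and memory ordering.

```c
#include <assert.h>
#include <stddef.h>

#define FIFO_SIZE 512

/* Hypothetical model of the FIFO channel, using indices into buf. */
struct fifo_model {
    size_t        writep;           /* next slot to be written */
    size_t        readp;            /* next slot to be read    */
    unsigned char buf[FIFO_SIZE];
};

/* Writer side: store one byte. Returns 1 on success, 0 if full
   (one slot is kept free to tell "full" from "empty"). */
int fifo_put(struct fifo_model *f, unsigned char c)
{
    size_t next;

    assert(f != NULL);
    next = (f->writep + 1) % FIFO_SIZE;
    if (next == f->readp)
        return 0;
    f->buf[f->writep] = c;
    f->writep = next;
    return 1;
}

/* Reader side: move up to max pending bytes into out and return
   the number of bytes moved. */
size_t fifo_drain(struct fifo_model *f, unsigned char *out, size_t max)
{
    size_t n = 0;

    assert(f != NULL && out != NULL);
    while (n < max && f->readp != f->writep) {
        out[n++] = f->buf[f->readp];
        f->readp = (f->readp + 1) % FIFO_SIZE;
    }
    return n;
}
```

In this model, a reader such as fifo2 would call fifo_drain and print whatever was returned.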

 

8.1.3. forkcpu2.c

The forkcpu2 program allows the user to specify a linked program module and, optionally, command line arguments that are passed to the sctcpsp driver via the _ss_tcpsp_runfrk function. This program is loaded into memory and its name is passed to the TCP/SP library; the respective TCP/SP function simulates the kernel's fork command and lets the program be executed on the additional CPU.

The program to be executed on the additional CPU must be a standard program module in 68xx0 object code, but there is no restriction on the language it is compiled from. It must, however, be noted that most kernel calls are not available (refer to section 6.1.4. for details) and that the program must not make use of any user trap handlers. The following sections contain a number of example programs (demo programs with string output, communication programs and graphic programs) that are part of the TCP/SP software; except for the irqcpu2 and sigcpu2 programs, they run on both the primary and the additional CPU without modification.

 

8.1.4. status2.c

The status2 program displays the current status of the additional CPU; the algorithm given in section 6.1.1. is used.

 

8.2. Demo programs with string output

 

8.2.1. args2.c

The args2 program displays the command line arguments and the current environment variables on screen.

 

8.2.2. float2.c

The float2 program performs a number of repeated floating-point operations; it is mainly intended for testing and benchmark purposes.

 

8.2.3. getsys2.c

The getsys2 program is specific to OS-9; it examines all system variables and displays their current settings. With the exception of the interprocessor identification variable (D_IPID), all variables have the same settings irrespective of whether the program was started on the primary or on an additional CPU.

 

8.2.4. pi2.c

The pi2 program calculates the value of π with a given precision (default 75, maximum 1000) and displays the result on screen.

 

8.2.5. print2.c

The print2 program exemplifies string output on screen. The relevant part of the code is included in the example to section 6.1.4.

 

8.2.6. time2.c

The time2 program shows the use of the F$Time function (OS-9 only) on both the primary and the additional CPU. The following program section displays Julian time and date information on screen:

#define JULIAN_TICK 3
  strcpy(msg, "Julian format:\n");
  writeln(1, msg, strlen(msg));
  /* date = Julian day number, time = seconds since midnight */
  _sysdate(JULIAN_TICK, &time, &date, &day, &tick);
  sprintf(msg,
    "Day #%d, %d seconds since midnight\n",
    date, time);
  writeln(1, msg, strlen(msg));
  strcpy(msg, "\n");
  writeln(1, msg, strlen(msg));
  strcpy(msg, "System heartbeat:\n");
  writeln(1, msg, strlen(msg));
  /* low word = current tick, high word = ticks per second */
  sprintf(msg,
    "Current value = %d (%d ticks per second)\n",
    tick & 0xffff, tick >> 16);
  writeln(1, msg, strlen(msg));
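
The program section above packs two values into the tick word and reports the time as seconds since midnight. As a minimal sketch in portable C, the two halves of the tick word can be separated and the seconds converted to a clock time as follows. The function names tick_current(), tick_rate() and format_time_of_day() are hypothetical and are not part of the TCP/SP software; no OS-9 calls are used.

```c
#include <stdio.h>

/* Low 16 bits of the tick word: current tick within the second
 * (same expression as "tick & 0xffff" in time2.c). */
unsigned tick_current(unsigned long tick)
{
    return (unsigned)(tick & 0xffffUL);
}

/* High 16 bits of the tick word: number of ticks per second
 * (same expression as "tick >> 16" in time2.c). */
unsigned tick_rate(unsigned long tick)
{
    return (unsigned)((tick >> 16) & 0xffffUL);
}

/* Convert seconds since midnight to "hh:mm:ss".
 * buf must provide room for at least 9 bytes. */
void format_time_of_day(long seconds, char *buf)
{
    sprintf(buf, "%02ld:%02ld:%02ld",
            seconds / 3600, (seconds / 60) % 60, seconds % 60);
}
```

With a tick word of 0x00640032, for example, tick_rate() yields 100 ticks per second and tick_current() yields tick number 50.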

 

8.2.7. whereami.c

The whereami program displays the absolute addresses of several code and data references. It is intended to test the descriptor settings that define where code and data is localized (in dual-ported RAM or in local memory).

 

8.2.8. whichcpu.c

The whichcpu program exemplifies the determination of the CPU number the program is currently running on. It essentially contains the following program section:

  sprintf(msg, "This program runs on CPU #%d.\n",
    _getsys(D_IPID, 4));
  writeln(1, msg, strlen(msg));

 

8.3. Communication programs

 

8.3.1. irqcpu1.c

The irqcpu1 program interrupts the primary CPU; the example uses autovector #1. This interrupt is translated into a signal that is sent to an application that has installed the request using the _ss_tcpsp_signal() function (OS-9 only). The signal is, however, only sent if the cpu2 descriptor contains the correct vector number. The default descriptor that is part of the TCP/SP software specifies autovector #1 (vector number 25). The following steps are needed to test the interrupt handling from the primary and from the additional CPU to the primary CPU:

        -       Start the sigcpu2 program (preferably from a separate keyboard
        or from a separate MGR or X window). A valid signal number
        must be specified as command line option:

        $ sigcpu2 100

        -       Send a signal other than the one specified in sigcpu2's command
        line argument to the sigcpu2 program (process number
        obtained using the procs utility), e.g.

        $ procs ! grep sigcpu2 ! cut -f=1
        23
        $ kill -101 23
        The sigcpu2 program then displays

          --- signal #101 received

        on screen and remains active.

        -       Start the irqcpu1 program from a shell of the primary CPU:

        $ irqcpu1

        The sigcpu2 program then displays

          +++ our signal received

        on screen and aborts.

        -       Restart the sigcpu2 program:

        $ sigcpu2 100

        -       Start the irqcpu1 program on the additional CPU:

        $ forkcpu2 irqcpu1

        The sigcpu2 program then displays

          +++ our signal received

        on screen and aborts.
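
The relation between autovector #1 and vector number 25 mentioned above follows the general 68k convention that the level 1 to 7 interrupt autovectors occupy exception vectors 25 to 31. A minimal sketch of this arithmetic in portable C is given below; the function names autovector_to_vector() and vector_offset() are hypothetical and not part of the TCP/SP libraries.

```c
#include <assert.h>

/* On 68k processors the interrupt autovectors for levels 1..7
 * are assigned to exception vectors 25..31. */
int autovector_to_vector(int level)
{
    assert(level >= 1 && level <= 7);
    return 24 + level;
}

/* Byte offset of an exception vector relative to the vector base
 * register; each vector table entry is one longword (4 bytes). */
unsigned long vector_offset(int vector)
{
    return (unsigned long)vector * 4UL;
}
```

For the default descriptor, autovector_to_vector(1) gives 25, and vector_offset(25) gives the table offset $64.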

 

8.3.2. irqcpu2.c

The irqcpu2 program interrupts the additional CPU at a level that is passed as command line argument.

 

8.3.3. sigcpu2.c

The sigcpu2 program displays a message on screen whenever a signal has been received. It only aborts when interrupted from the keyboard or when the signal number specified as command line option has been received; otherwise it continues waiting in a loop (see above).
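
The decision taken for each received signal can be sketched in portable C as shown below. This is only an illustration of the behaviour described here and in section 8.3.1., not the actual sigcpu2 source: the function name handle_signal() is hypothetical, the signal number is passed as a plain argument instead of being delivered by the operating system, and keyboard interrupts are not modelled.

```c
#include <stdio.h>

/* Build the message for a received signal and tell the caller what to do.
 * expected: signal number given on the command line
 * received: signal number that has just arrived
 * buf:      receives the message text, at least 40 bytes
 * Returns 1 if the program should terminate, 0 if it should keep waiting. */
int handle_signal(int expected, int received, char *buf)
{
    if (received == expected) {
        sprintf(buf, "+++ our signal received");
        return 1;
    }
    sprintf(buf, "--- signal #%d received", received);
    return 0;
}
```

Started with the command line argument 100, such a routine would report signal 101 and keep waiting, whereas signal 100 would end the program.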

 

8.4. Graphic programs

 

8.4.1. clear2.c

The clear2 program sets a defined memory section to a given value. By default, the program clears the screen on the Eurocom-17 computer.

 

8.4.2. bounce2.c

The bounce2 program is a graphic demo and part of the MGR window manager's development software. It was modified to be used without MGR; this version only runs in the Eurocom-17 on-board video memory.

 

8.4.3. bounce3.c

The bounce3 program is the same as bounce2 with the exception that it is intended to run on an Sl-30 slave CPU and to produce the graphics on a Eurocom-17 screen.

 

8.4.4. bounce4.c

The bounce4 program is the same as bounce2 with the exception that it is intended to run on an IC-40 slave CPU.

 

8.4.5. muncher2.c

The muncher2 program is a graphic demo and part of the MGR window manager's development software. It was modified to be used without MGR; this version only runs in the Eurocom-17 on-board video memory.

 

8.4.6. muncher4.c

The muncher4 program is the same as muncher2 with the exception that it is intended to run on an IC-40 slave CPU.

 

8.4.7. video4.c

The video4 program reduces the graphic content from the upper left part of the Eurocom-17 screen by 4 and copies this reduced image four times, one upon the other and side by side, into the upper right part of the screen.
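
The principle of reducing an image and storing four copies of it can be sketched in portable C as follows. This is not the video4 source. It assumes 8-bit pixels, interprets the reduction as halving both width and height (so that four copies fill the area of the original), and simply keeps every second pixel of every second line. The function name reduce_and_tile() and all parameters are hypothetical.

```c
#include <stddef.h>

/* Shrink a w x h block of 8-bit pixels to (w/2) x (h/2) by keeping every
 * second pixel of every second line, and store the reduced image four
 * times in a 2 x 2 arrangement in a destination block of size w x h.
 * src_pitch and dst_pitch are the line lengths in bytes.
 * Assumption: w and h are even. */
void reduce_and_tile(const unsigned char *src, size_t src_pitch,
                     unsigned char *dst, size_t dst_pitch,
                     size_t w, size_t h)
{
    size_t x, y;
    size_t hw = w / 2;   /* width of the reduced image  */
    size_t hh = h / 2;   /* height of the reduced image */

    for (y = 0; y < hh; y++) {
        for (x = 0; x < hw; x++) {
            unsigned char p = src[(2 * y) * src_pitch + 2 * x];

            dst[y * dst_pitch + x]             = p;  /* upper left  */
            dst[y * dst_pitch + x + hw]        = p;  /* upper right */
            dst[(y + hh) * dst_pitch + x]      = p;  /* lower left  */
            dst[(y + hh) * dst_pitch + x + hw] = p;  /* lower right */
        }
    }
}
```

On real video memory, src and dst would point into the respective screen regions and both pitch values would be the length of a screen line in bytes.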

 

8.5. MGR programs

 

8.5.1. mimg

The mimg program runs on a Eurocom-17 CPU and requires the IPIN-1900 frame grabber installed on the local extension bus. In addition, the MGR window manager is needed. A selection of on-line image processing procedures is available; they have been realized using TCP/SP library functions. The source code is part of the TCP/SP software distribution and is located in the MGR/APPL/MIMG directory. It can, however, only be compiled if the header files and libraries of the IPIN-1900 tool package and of the MGR Professional Plus development package are available. The settings in the makefile assume that these packages are installed in the /h0/IPIN1900 and /h0/MGR directories, respectively.

 

8.5.2. mcpu2

The mcpu2 program continuously displays the state of an additional CPU and decodes the status variable if exception processing has occurred. In addition, the most recent service call to the operating system is decoded and shown in string form. When the program has terminated, its exit status is also shown. The TCP/SP device being monitored is given in the title bar of the MGR window. The source code is part of the TCP/SP software distribution and is located in the MGR/APPL/MIMG directory. It can, however, only be compiled if the header files and libraries of the MGR Professional Plus development package are available. The settings in the makefile assume that the MGR directory is /h0/MGR.
 

Index

TCP/SP 1.1
Tightly-Coupled Processors Support Package
Running under the operating system OS-9
Prepared to be ported to other operating systems
Distributed by ELTEC elektronik, Mainz
Copyright
1. Introduction
Parallel processing under OS-9
Single-task projects
Multi-tasking projects
1.1. Independent OS-9 systems
Advantages and disadvantages
1.2. The Doubler
Advantages and disadvantages
1.3. Driver interface (Tightly-Coupled Processors Support Package)
Advantages and disadvantages
1.4. Conclusion
2. Upgrade changes from TCP/SP version 1.0 to 1.1
2.1. Dual-ported RAM communication channel
2.2. Support of more members of the 68k processor family
2.3. Floating-point support package and floating-point library support package for 68040 and 68060
2.4. Enhancements for both tightly-coupled processors and processors connected via dual-ported RAM
2.5. Enhancements for processors in dual-ported RAM mode
3. Principle and distribution of the TCP/SP software
4. Installation
4.1. File structure
4.2. Driver and descriptor
4.3. First test
5. Device configuration
5.1. Data structure
$00 DC$ResWidth, Byte
$01 DC$ConWidth, Byte
$02 DC$IRQWidth, Byte
$03 DC$IAckWidth, Byte
$04 DC$ResReg, Longword
$08 DC$ResDo, Longword
$0C DC$ConReg, Longword
$10 DC$ConStart, Longword
$14 DC$ConStop, Longword
$18 DC$IRQReg, Longword
$1C DC$IRQShift, Longword
$20 DC$IRQReg, Longword
$24 DC$IRQDo, Longword
$28 DC$Cache, Longword
$2C DC$Outmode, Longword
$30 DC$Memory, Longword
$34 DC$PC2, Longword
$38 DC$DPRDatBeg, Longword
$3C DC$DPRDatEnd, Longword
$40 DC$DPRCodBeg, Longword
$44 DC$DPRCodEnd, Longword
$48 DC$LocDatBeg, Longword
$4C DC$LocDatEnd, Longword
$50 DC$LocCodBeg, Longword
$54 DC$LocCodEnd, Longword
$58 DC$Mirror1, Longword
$5C DC$Mirror2, Longword
$60 DC$CPUType2, Longword
$64 DC$DTT0, Longword
$68 DC$ITT0, Longword
$6C DC$DTT1, Longword
$70 DC$ITT1, Longword
5.2. Examples
6. Libraries
6.1. tcpsp.l
6.1.1. Get status of the additional CPU, _ss_tcpsp_status()
Syntax
Function
6.1.2. Run the additional CPU (low level), _ss_tcpsp_runlow()
Syntax
Function
CAVEATS:
EXAMPLE:
6.1.3. Run the additional CPU (install vbr), _ss_tcpsp_runvbr()
Syntax
Function
CAVEATS:
SEE ALSO:
EXAMPLE:
6.1.4. Run the additional CPU (fork module), _ss_tcpsp_runfrk()
Syntax
Function
CAVEATS:
SEE ALSO:
EXAMPLE:
6.1.5. Interrupt the additional CPU, _ss_tcpsp_irq()
Syntax
Function
SEE ALSO:
EXAMPLE:
6.1.6. Install user-supplied exception handler, _ss_tcpsp_excpt()
Syntax
Function
CAVEATS:
SEE ALSO:
6.1.7. Install signal to be sent if interrupt received, _ss_tcpsp_signal()
Syntax
Function
SEE ALSO:
EXAMPLE:
6.1.8. Stop execution of the additional CPU, _ss_tcpsp_stop()
Syntax
Function
6.2. utcpsp.l
6.2.1. Permit access to memory, foolssm()
Syntax
Function
6.2.2. Transform double to long, ftol()
Syntax
Function
6.2.3. Get process ID of parent process, getppid()
Syntax
Function
6.2.4. Print string to standard output path, print1()
Syntax
Function
6.2.5. Print string to standard error path, print2()
Syntax
Function
6.2.6. Get start address of Rmon's screen memory, rmon_start()
Syntax
Function
6.2.7. Get width of Rmon's screen display window, rmon_x()
Syntax
Function
6.2.8. Get height of Rmon's screen display window, rmon_y()
Syntax
Function
6.2.9. Get depth of Rmon's screen memory, rmon_depth()
Syntax
Function
6.2.10. Get width of Rmon's screen memory, rmon_pitch()
Syntax
Function
6.2.11. Get height of Rmon's screen memory, rmon_height()
Syntax
Function
6.3. fplsp040.l and fplsp060.l
6.3.1. Floating-point emulation
6.3.2. Available OS-9 math calls
6.3.3. Available C library functions
7. Utility programs
7.1. tcpspmode
Syntax
7.2. shell2
Syntax
8. Example programs
8.1. Programs for the primary CPU
8.1.1. runcpu2.c
copy.a
crash.a
8.1.2. fifo2.c
8.1.3. forkcpu2.c
8.1.4. status2.c
8.2. Demo programs with string output
8.2.1. args2.c
8.2.2. float2.c
8.2.3. getsys2.c
8.2.4. pi2.c
8.2.5. print2.c
8.2.6. time2.c
8.2.7. whereami.c
8.2.8. whichcpu.c
8.3. Communication programs
8.3.1. irqcpu1.c
8.3.2. irqcpu2.c
8.3.3. sigcpu2.c
8.4. Graphic programs
8.4.1. clear2.c
8.4.2. bounce2.c
8.4.3. bounce3.c
8.4.4. bounce4.c
8.4.5. muncher2.c
8.4.6. muncher4.c
8.4.7. video4.c
8.5. MGR programs
8.5.1. mimg
8.5.2. mcpu2
