Welcome to the Linux/AP+ Home Page!


[A Picture Of Us]
This project is concerned with porting Linux to the AP1000+ and adding appropriate multi-processor extensions to support parallel programs. We are basing our port on the work done on SparcLinux.

The AP1000+ located in the department of computer science currently has 16 nodes, each of which is a 50MHz TI Viking with 16 Mbytes of memory. A future upgrade will provide 32 nodes, each with 64 Mbytes of memory, and a disk array.


About the AP1000+

The AP1000+ is a distributed memory multi-computer. It's built by Fujitsu and is in a similar class to the CM5 and T3D systems common in parallel supercomputer labs. It is based around 50MHz SuperSparc cpus, but has custom networking and disk controller systems.

Our current system is the minimum configuration of 16 cpus each with 16MB of ram. We will be upgrading this within a month or so to 32 cpus with 64MB of ram each and 128GB of (distributed) disk. A "full" configuration would have 1024 cpus each with 64MB of ram, but would be very expensive :-)

[Picture Of The AP1000+] The CPUs are connected using two main networks (plus a status net).

The Bnet is a broadcast network and runs at around 50MB/sec. This is also the network used to talk to the front end processor (a Sparc 10 running SunOS, connected via an Sbus interface).

The Tnet is a 2D torus network which runs at around 25MB/sec and has very low latency (around 10us). The network is wormhole routed in hardware, so hops cost very little.
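To make the Tnet's 2D torus topology a little more concrete, here is a minimal sketch of how hop counts and a rough transfer time could be estimated. It assumes a 4x4 arrangement of the 16 cells and row-by-row cell numbering, neither of which is stated on this page, and it treats 1MB as 10^6 bytes. The function names are made up for illustration.

```python
def torus_hops(src, dst, width=4, height=4):
    """Minimum hop count between two cells on a width x height 2D torus.

    Assumes cells are numbered row by row (a hypothetical numbering).
    """
    sx, sy = src % width, src // width
    tx, ty = dst % width, dst // width
    dx = abs(sx - tx)
    dy = abs(sy - ty)
    # Each dimension wraps around, so take the shorter way round.
    return min(dx, width - dx) + min(dy, height - dy)


def tnet_transfer_time(nbytes, latency_s=10e-6, bandwidth_bytes_per_s=25e6):
    """Rough latency-plus-bandwidth estimate in seconds, ignoring per-hop cost."""
    return latency_s + nbytes / bandwidth_bytes_per_s


if __name__ == "__main__":
    print("hops from cell 0 to cell 15:", torus_hops(0, 15))
    print("estimated seconds for 1MB:", tnet_transfer_time(1_000_000))
```

Because of the wrap-around links, the cells that are furthest apart on a 4x4 torus are only four hops away from each other, and with wormhole routing the hop count matters much less than the message size.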

We also have a single FDDI board designed and built locally by Paul Mackerras. It's currently connected to cell 8, and is on a ring that includes the major high performance computers on campus.

We still have our "AP1000" system, which has 128 Sparc1+ cpus, but as it doesn't have any MMUs it's pretty useless for Linux.

A picture of the team working on this project standing in front of the AP1000+ can be found here and here.


Using the AP1000+ under APLinux


The name of our 16 node AP+ is hibana, and the hostnames of the individual nodes are hibana0, hibana1, ..., hibana15. Each node has a 4 Gbyte disk attached with a 2 Gbyte data partition. Other file systems, including home directories, are mounted via NFS, with the associated bandwidth limitations.
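For scripts that need to visit every node, the host names can be generated from the naming scheme above. This is only a small sketch; the helper name node_hostnames is made up for illustration.

```python
def node_hostnames(base="hibana", count=16):
    """Return the node host names base0 .. base(count-1)."""
    return [f"{base}{i}" for i in range(count)]


if __name__ == "__main__":
    # Print one host name per line, e.g. for feeding into a shell loop.
    for host in node_hostnames():
        print(host)
```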

If you have an account on the CAP system, then each node can be accessed as a normal remote Linux host. You may find, however, that a number of standard tools and applications are not available, as it is standard policy that hibana not be used as a general computing resource. The main focus is, of course, running parallel programs.

Account requests on CAP systems at the department can be mailed to captain@cafe.anu.edu.au.


Publications


AP/Linux - Initial Implementation. This is a 9 page technical report briefly describing our progress up until May 1996. It is available as latex, dvi and postscript.

AP/Linux - A modern OS for the AP1000+. This is a conference paper presented at PCW96 in Japan. It is available as latex, dvi and postscript.


Project Milestones (1996)


8th March: Downloaded LinuxSPARC sources.

15th March: Single cell now boots and mounts root filesystem via NFS. Emacs and netscape can run on a single node via X!

17th March: Linux/AP+ recompiled itself from scratch. kgdb works.

22nd March: Interrupt-driven DMA now working and stable between cells and host. Host can send to cells at 1MB/s, bottleneck appears to be apnet driver on host.

22nd March: Linux now running on all 16 CPUs, with separate IP addresses on each cell. We are getting 3.7MB/sec using tcp between the cells. The cells share all filesystems from the front end over NFS, and have separate /var mounts to keep cell specific stuff apart.

23rd March: FDDI board starts receiving packets from the ring, but can't send yet.

25th March: Added high resolution timer code. Changed bif device to avoid copying on send and receive. rcp now runs at 6.7MB/sec between cells. ftp has dropped to 50k/sec (why??)

26th March: ftp, telnet etc now work over the FDDI board. Doesn't handle FDDI arp/rarp.

27th March: ARP/RARP works over FDDI.

28th March: Can now NFS-boot using the FDDI interface.

30th March: Wrote an apblock device which stores its data on the front end. This allows us to swap and set up a real filesystem (ext2) for /tmp so configure scripts that don't like NFS are happy.

31st March: Updated to linux 1.3.77 (using David Miller's changes).

2nd April: Can now use DMA to copy frames to FDDI buffer. 7.5Mbytes/sec is the highest transfer rate measured when sending from the AP1000+.

4th April: Updated to linux 1.3.83, and integrated a lot of our code with davem's main sparclinux tree.

6th April: Added extensions to clone() and the sparc context handling to support a simple parallel process launch mechanism.

7th April: First parallel program runs, launched using the prun utility, and new CLONE_OPT interface to clone().

11th April: Initial version of MSC driver written. Crashme now runs without killing the OS.

12th April: First successful message passed between cells over the Tnet.

17th April: First MPI program which uses the Tnet was run.

22nd April: APlib parallel sorting application runs under Linux/AP+. Gives NFS an enormous workout due to I/O requirements!

23rd April: MPI blackhole application runs, with X stuff and file I/O working happily on one of the cells. Also got PCCM2 (climate model) to work using MPI and Sun's f77 compiler, which uses mmap() for Fortran I/O. This gave the swapping system an excellent workout ;)

24th April: Ran two blackhole applications simultaneously, each using all 16 cells, but using different contexts and ring buffers for communication. MPI and APlib were dynamically linked into the blackhole application, so the text of these libraries only had to be loaded once.

29th April: More than 3 parallel programs can run at once, sharing the 3 physical ringbuffer registers. Signal driven messaging is working. A simple gang scheduler has been written which greatly improves performance when lots of parallel programs are run at once.

10th May: Ran on a 64 cell machine, with 64MB of ram per cell.
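The gang scheduler mentioned in the 29th April milestone is not described further on this page. As a rough illustration of the general idea only, the toy sketch below gives every cell the same parallel program in each time slice, rotating through the programs round-robin, so that the processes of one program are always running together and can exchange messages without waiting for a descheduled peer. The function name and the table layout are made up for illustration and are not taken from the Linux/AP+ code.

```python
from itertools import cycle, islice


def gang_schedule(jobs, num_cells, num_slices):
    """Build a toy gang schedule.

    Returns one row per time slice and one column per cell. Every cell in a
    row runs the same job, and jobs rotate round-robin across slices.
    """
    if not jobs:
        return []
    order = islice(cycle(jobs), num_slices)
    return [[job] * num_cells for job in order]


if __name__ == "__main__":
    for slice_no, row in enumerate(gang_schedule(["sort", "blackhole", "pccm2"], 16, 4)):
        print(slice_no, row[0], "on", len(row), "cells")
```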


More Information


Any queries regarding the project can be sent to hackers@cafe.anu.edu.au.

You could also join the linux-mc mailing list, which discusses linux on multicomputers. To join, send an email to listproc@samba.anu.edu.au with no subject and a body of "subscribe linux-mc Your Name".
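If you would rather script the subscription request, the message described above can be put together with Python's standard email package. This sketch only builds the message and does not send it. The function name and the example sender address are placeholders.

```python
from email.message import EmailMessage


def build_subscribe_message(full_name, sender):
    """Build (but do not send) a linux-mc subscription request."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = "listproc@samba.anu.edu.au"
    # No Subject header is set, matching the "no subject" instruction.
    msg.set_content(f"subscribe linux-mc {full_name}")
    return msg


if __name__ == "__main__":
    # The sender address here is a placeholder.
    print(build_subscribe_message("Your Name", "you@example.org"))
```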


Other Sites