
This project is concerned with porting Linux to the AP1000+ and adding
appropriate multi-processor extensions to support parallel programs.
We are basing our port on the work done on SparcLinux.
The AP1000+ located in the Department of Computer Science currently has 16 nodes, each a 50MHz TI Viking with 16 Mbytes of memory. A future upgrade will provide 32 nodes, each with 64 Mbytes of memory, and a disk array.
We also have a single FDDI board, designed and built locally by Paul Mackerras. It is currently connected to cell 8 and sits on a ring that includes the major high-performance computers on campus.
We still have our "AP1000" system, which has 128 Sparc1+ CPUs, but as it doesn't have any MMUs it's pretty useless for Linux.
A picture of the team working on this project standing in front of the AP1000+ can be found here and here.

If you have an account on the CAP system, each node can be accessed as a normal remote Linux host. You may find, however, that a number of standard tools and applications are not available, as it is standard policy that hibana not be used as a general computing resource. The main focus is, of course,
Account requests on CAP systems at the department can be mailed to captain@cafe.anu.edu.au.

AP/Linux - A Modern OS for the AP1000+. This is a conference paper presented at PCW96 in Japan. It is available as LaTeX, DVI and PostScript.

15th March: A single cell now boots and mounts its root filesystem via NFS. Emacs and Netscape can run on a single node via X!
17th March: Linux/AP+ recompiled itself from scratch. kgdb works.
22nd March: Interrupt-driven DMA is now working and stable between cells and host. The host can send to cells at 1MB/s; the bottleneck appears to be the apnet driver on the host.
22nd March: Linux is now running on all 16 CPUs, with separate IP addresses on each cell. We are getting 3.7MB/sec using TCP between the cells. The cells share all filesystems from the front end over NFS, and have separate /var mounts to keep cell-specific stuff apart.
23rd March: FDDI board starts receiving packets from the ring, but can't send yet.
25th March: Added high-resolution timer code. Changed the bif device to avoid copying on send and receive. rcp now runs at 6.7MB/sec between cells. ftp has dropped to 50KB/sec (why??)
26th March: ftp, telnet etc. now work over the FDDI board. FDDI ARP/RARP is not handled yet.
27th March: ARP/RARP works over FDDI.
28th March: Can now NFS-boot using the FDDI interface.
30th March: Wrote an apblock device which stores its data on the front end. This allows us to swap and to set up a real filesystem (ext2) for /tmp, so configure scripts that don't like NFS are happy.
31st March: Updated to Linux 1.3.77 (using David Miller's changes).
2nd April: Can now use DMA to copy frames to the FDDI buffer. 7.5 Mbytes/sec is the highest transfer rate measured when sending from the AP1000+.
4th April: Updated to Linux 1.3.83, and integrated a lot of our code with davem's main SparcLinux tree.
6th April: Added extensions to clone() and the SPARC context handling to support a simple parallel process launch mechanism.
7th April: First parallel program runs, launched using the prun utility, and new CLONE_OPT interface to clone().
11th April: Initial version of MSC driver written. Crashme now runs without killing the OS.
12th April: First successful message passed between cells over the Tnet.
17th April: First MPI program using the Tnet runs.
22nd April: APlib parallel sorting application runs under Linux/AP+. Gives NFS an enormous workout due to I/O requirements!
23rd April: MPI blackhole application runs, with X stuff and file I/O working happily on one of the cells. Also got PCCM2 (climate model) to work using MPI and Sun's f77 compiler, which uses mmap() for Fortran I/O. This gave the swapping system an excellent workout ;)
24th April: Ran two blackhole applications simultaneously, each using all 16 cells, but using different contexts and ring buffers for communication. The MPI and APlib libraries were dynamically linked into the blackhole application, so the text of these libraries had to be loaded only once.
29th April: More than 3 parallel programs can run at once, sharing the 3 physical ringbuffer registers. Signal driven messaging is working. A simple gang scheduler has been written which greatly improves performance when lots of parallel programs are run at once.
10th May: Ran on a 64 cell machine, with 64MB of ram per cell.

You could also join the linux-mc mailing list, which discusses Linux on multicomputers. To join, send an email to listproc@samba.anu.edu.au with no subject and a body of "subscribe linux-mc Your Name".
