Summer 2007 Update
Large memory jobs:
Do you need to run jobs with a large memory footprint? We are currently installing an 8-core system with 64 GB of memory, which should be available by September 2007. If you would like to use this resource, please contact us so that we can give you access to the appropriate queue.
Under the current queue structure, serial jobs run only on nodes with two single-core processors, which have 2 GB of memory per core. You can still run serial jobs that require more memory by requesting additional processors on a single node, which keeps other jobs off those processors and leaves their memory to your job. If the job requires between 2 and 4 GB, request a two-processor job from the 48 hour queue. If it requires between 4 and 12 GB, use the parallel quad queue and request 8 processors on a single node. If it requires more than 12 GB, use the 64 GB node described above and request 8 processors.
| Type of request | LSF directives |
| A two-processor job on the 48 hour queue | #BSUB -q 48Hpar #BSUB -n 2 #BSUB -R "span[ptile=2]" |
| An eight-processor job on the 48 hour parallel quad queue | #BSUB -q 48Hquadpar #BSUB -n 8 #BSUB -R "span[ptile=8]" |
| A large memory job on the 64 GB node | #BSUB -q HiMem #BSUB -n 8 #BSUB -R "span[ptile=8]" |
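The rules above can be sketched as a small shell helper that prints the directives for a given memory requirement. This is only an illustration: `bsub_header` is a hypothetical name, the memory is assumed to be given in whole gigabytes, and the directives are written in standard LSF option syntax (`-q`, `-n`, `-R`) with the queue names from the table.

```shell
#!/bin/sh
# Sketch: print LSF directives for a serial job that needs a given number
# of gigabytes. bsub_header is a hypothetical helper, not a site tool.
# Thresholds and queue names are taken from the text and table above.
bsub_header() {
    mem_gb=$1
    if [ "$mem_gb" -le 2 ]; then
        # Fits in the 2 GB per core of the serial nodes.
        echo "# fits within 2 GB per core; no extra processors needed"
        return 0
    elif [ "$mem_gb" -le 4 ]; then
        queue=48Hpar; slots=2
    elif [ "$mem_gb" -le 12 ]; then
        queue=48Hquadpar; slots=8
    elif [ "$mem_gb" -le 64 ]; then
        queue=HiMem; slots=8
    else
        echo "no node has more than 64 GB" >&2
        return 1
    fi
    echo "#BSUB -q $queue"
    echo "#BSUB -n $slots"
    # span[ptile=N] keeps all N requested processors on one node.
    echo "#BSUB -R \"span[ptile=$slots]\""
}

# Example: a serial job that needs about 10 GB.
bsub_header 10
```

The printed lines go at the top of the job script, ahead of the command that starts your program.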
For those of you with a historical perspective: our first HPC machine, Pleione (purchased in 1999), was 100 times more expensive than this new 8-core machine and delivered less performance.
Performance on new dual quad-core nodes:
On Hrothgar there are 64 nodes with two quad-core sockets. These eight-core nodes are used for running parallel jobs. Several users have noticed that jobs on these nodes do not scale well when going from four-way to eight-way parallel runs. This is unfortunately the case for most jobs, because the memory bandwidth to the cores did not double when Intel doubled the number of cores per processor. A major determinant of performance is the physical layout in memory of the data for a given application, which is not something that the user has control over.
Performance of dual quad cores on an MPI parallel computational chemistry code:
| Cores | Elapsed time in hours (lower is better) |
| 1 | 5.6 |
| 2 | 3.1 |
| 4 | 1.8 |
| 8 | 1.6 |
The lack of scaling from 4 to 8 cores is due to the memory-to-core bottleneck discussed above. It is interesting to note that if two copies of this application are run at the same time on 4 cores each of a dual quad-core system, they both finish in 1.96 hours. We acknowledge the Bill Hase chemistry group and especially Dr. U. Louderaj for running these examples.
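To put numbers on the scaling, the timings in the table can be turned into speedup (one-core time divided by N-core time) and parallel efficiency (speedup divided by N). The short sketch below does this with awk; `scaling_report` is a hypothetical helper name and the timings are copied from the table and the paragraph above.

```shell
#!/bin/sh
# Sketch: speedup and parallel efficiency from the elapsed times reported above.
# scaling_report is a hypothetical helper name.
scaling_report() {
    awk 'BEGIN {
        # Elapsed hours indexed by core count (from the table).
        t[1] = 5.6; t[2] = 3.1; t[4] = 1.8; t[8] = 1.6
        for (n = 1; n <= 8; n *= 2) {
            speedup = t[1] / t[n]
            printf "cores=%d speedup=%.2f efficiency=%.0f%%\n", n, speedup, 100 * speedup / n
        }
        # Two 4-core copies sharing one node both finish in 1.96 hours.
        printf "two 4-core jobs: %.2f h per job; one 8-core job: %.2f h\n", 1.96 / 2, t[8]
    }'
}

scaling_report
```

On these numbers, efficiency falls from about 78% on four cores to about 44% on eight, while running two four-core copies side by side works out to 0.98 hours per job against 1.6 hours for a single eight-core run. If you have several independent runs to do, pairing four-core jobs on a node may therefore give better throughput.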