Estimating Simulation Requirements
Introduction
The following benchmark data gives the time and memory needed to run several specific FDS jobs to completion. Based on the properties of a particular model and the computing hardware used, this information should help users to make informed estimates about the run-time and memory requirements of models with similar properties.
Test Simulations
Name | Cells | Meshes | Time (min) |
burner (image, PSM, FDS) | 5,400 | 1 | 5 |
cable_tray (image, PSM, FDS) | 19,200 | 1 | 5 |
room_fire (image, PSM, FDS) | 67,392 | 1 | 5 |
switchgear (image, PSM, FDS) | 144,000 | 2 | 5 |
multifloor (image, PSM, FDS) | 228,000 | 3 | 5 |
multifloor_hi (image, PSM, FDS) | 1,026,000 | 4 | 5 |
Cells – The total number of cells in all simulation meshes.
Meshes – The number of simulation meshes.
Time – The simulation end time.
Computer Hardware
Id | Processor | CPUs | RAM | OS |
IntelD | Intel Pentium D @2.8 GHz | 2 | 2 GB | XP Pro 32-bit |
IntelC2Q | Intel Core2 Quad @2.4 GHz | 4 | 8 GB | Vista Business 64-bit |
The CPUs column indicates how many jobs can be run in parallel by the processor.
Output and Memory Requirements
The following table shows the storage (output) and memory requirements of each simulation.
Input File | Output (MB) | Serial Memory (MB) | OpenMP Memory (MB) | MPI Memory (MB) |
burner | 9 | 12 | 12 | 15 |
cable_tray | 255 | 32 | 34 | 35 |
room_fire | 631 | 81 | 82 | 84 |
switchgear | 797 | 179 | 172 | 196 |
multifloor | 964 | 328 | 329 | 371 |
multifloor_hi | 2540 | 1053 | x | 1160 |
Restart files made up the largest share of the output, followed by boundary (BF) and 3D smoke (S3D) output.
The unexpected reduction in memory usage when switching from serial to OpenMP for the switchgear problem occurred on both test machines. With this exception, there was a consistent relationship where M_serial ≤ M_OpenMP ≤ M_MPI. For MPI runs, the given value is the sum of memory usage for all running FDS processes.
Memory usage information for the OpenMP executable for the multifloor_hi simulation is not available (marked x in the table). The tested version of the OpenMP executable crashed on startup on both machines.
Simulation Run Time
The following table shows how long each problem took to run. Since all simulations were set to run for 5 minutes of simulated time, these values can be compared directly. No special effort was made to optimize the performance of each problem for MPI (e.g. mesh count wasn’t matched to CPU count and meshes were not balanced).
Input File | IntelD Serial (min) | IntelD OpenMP (min) | IntelD MPI (min) | IntelC2Q Serial (min) | IntelC2Q OpenMP (min) | IntelC2Q MPI (min) |
burner | 17 | 12 | 16 | 10 | 6 | 10 |
cable_tray | 24 | 18 | 23 | 14 | 12 | 15 |
room_fire | 92 | 62 | 78 | 52 | 35 | 53 |
switchgear | 408 | 261 | 292 | 233 | 140 | 201 |
multifloor | 893 | 626 | 512 | 557 | 350 | 249 |
multifloor_hi | 7927 | x | 5785 | 5536 | x | 3510 |
Run time information for the OpenMP executable for the multifloor_hi simulation is not available (marked x in the table). The tested version of the OpenMP executable crashed on startup on both machines.
Estimations
These benchmarks lead to the following suggestions for estimating resource requirements for serial simulations on similar computing hardware:
Cells | Sim Time (min) | Run Time (hrs) | Memory (GB) | Output (GB) |
500,000 | 5 | 33 | 0.5 | 1.5 |
1,000,000 | 5 | 100 | 1 | 2.5 |
2,000,000 | 5 | 300 | 2 | 4.5 |
As you increase simulation time, the increase in run time and output is linear. Memory requirements are not affected by the duration of the simulation.
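As a rough illustration of this scaling, the sketch below rescales a row of the estimate table to a different simulated duration. It assumes that "linear" can be treated as direct proportionality to simulated time, which the benchmarks do not confirm; the function and constant names are hypothetical.

```python
# Serial estimates from the table above, all for 5 minutes of simulated time.
# cells -> (run time in hours, memory in GB, output in GB)
BASELINE_SIM_MINUTES = 5
SERIAL_ESTIMATES = {
    500_000: (33, 0.5, 1.5),
    1_000_000: (100, 1.0, 2.5),
    2_000_000: (300, 2.0, 4.5),
}


def scale_estimate(cells, sim_minutes):
    """Rescale a tabulated serial estimate to a different simulated duration."""
    run_hours, memory_gb, output_gb = SERIAL_ESTIMATES[cells]
    # Assumption: run time and output grow in direct proportion to
    # simulated time; memory does not depend on duration.
    factor = sim_minutes / BASELINE_SIM_MINUTES
    return run_hours * factor, memory_gb, output_gb * factor
```

For example, `scale_estimate(1_000_000, 10)` doubles the run time and output of the 1,000,000-cell row and leaves memory unchanged.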
These estimates are suggested for serial simulation runs. Memory requirements are slightly higher for OpenMP and MPI simulation runs. To convert from a serial time estimate to an OpenMP estimate, reduce the run time by 30-35% for 2 CPUs and by 35-40% for 4 CPUs.
To convert from a serial time estimate to an MPI time estimate, first figure out the number of “effective” processors you will be using. If you have 4 CPUs and 16 meshes, you will benefit from all 4 processors. If you have 4 CPUs and 3 meshes, you’ll only be using 3 processors. For a very optimistic estimate, divide the serial run time by that number. For a more realistic estimate, you can handicap the effectiveness based on how evenly the cells are divided among the processors. If one processor has more cells than the others (e.g. 4 CPUs and 5 meshes, where one processor must handle two meshes), that processor will control the simulation time.
These estimates were created by averaging the serial results and fitting a curve to each quantity in Excel. The run time estimate used a polynomial curve, and the memory and output estimates were based on a linear trend line. Since there was only one data point with over 1M cells, that simulation carried great weight in the fit, and additional results from large simulations could substantially alter the estimated times.
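The OpenMP conversion can be written as a small helper. This is only a sketch of the percentages quoted above; the names are hypothetical, and CPU counts other than 2 and 4 were not benchmarked, so they are rejected rather than extrapolated.

```python
# Run-time reductions quoted above, keyed by CPU count: (smallest, largest).
OPENMP_REDUCTION = {2: (0.30, 0.35), 4: (0.35, 0.40)}


def openmp_estimate(serial_hours, cpus):
    """Return (fastest, slowest) OpenMP run-time estimates in hours."""
    if cpus not in OPENMP_REDUCTION:
        raise ValueError("only 2 and 4 CPUs were benchmarked")
    smallest, largest = OPENMP_REDUCTION[cpus]
    # The largest reduction gives the fastest estimate and vice versa.
    return serial_hours * (1 - largest), serial_hours * (1 - smallest)
```

With a serial estimate of 100 hours on 4 CPUs, this gives a range of about 60 to 65 hours.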
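The MPI reasoning above might be sketched as follows. The greedy assignment of meshes to processors is an illustrative assumption, not a description of how FDS distributes meshes, and the "realistic" figure still ignores communication overhead, so measured times can be noticeably worse. All names are hypothetical.

```python
def mpi_estimate(serial_hours, cpus, mesh_cells):
    """Return (effective processors, optimistic hours, realistic hours)."""
    if cpus < 1 or not mesh_cells:
        raise ValueError("need at least one CPU and one mesh")
    # A mesh cannot be split, so extra CPUs beyond the mesh count sit idle.
    effective = min(cpus, len(mesh_cells))
    optimistic = serial_hours / effective

    # Assumption: place meshes largest first on the least-loaded processor.
    loads = [0] * effective
    for cells in sorted(mesh_cells, reverse=True):
        loads[loads.index(min(loads))] += cells

    # The processor with the most cells controls the simulation time.
    realistic = serial_hours * max(loads) / sum(mesh_cells)
    return effective, optimistic, realistic
```

For 4 CPUs and 5 equal meshes, one processor ends up with two meshes, so a 100-hour serial estimate becomes 25 hours in the optimistic case but 40 hours once the imbalance is taken into account.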
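To illustrate the linear trend line used for memory, the sketch below fits an ordinary least-squares line to the serial memory column of the memory table. It is not the original Excel fit, and the names are hypothetical.

```python
# Serial memory use (MB) against total cell count, from the memory table.
CELLS = [5_400, 19_200, 67_392, 144_000, 228_000, 1_026_000]
SERIAL_MB = [12, 32, 81, 179, 328, 1053]


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


SLOPE, INTERCEPT = fit_line(CELLS, SERIAL_MB)


def estimate_memory_mb(cells):
    """Estimated serial memory use in MB for a given cell count."""
    return SLOPE * cells + INTERCEPT
```

With this data the fitted line gives roughly 1 GB at one million cells, in line with the estimate table. The single large simulation dominates the slope, which is the sensitivity noted above.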
Procedures
These tests were performed on the workstation computing hardware at Thunderhead Engineering. Often, while running a particular benchmark, the computer was in use for daily work activities. The level of scientific rigor used to gather this data was relatively low; the results are intended only as a basis for rough estimates of the resources needed to run comparable simulations.
The run time of a simulation was calculated based on the total execution time of the FDS simulation process. The clock started when the process was launched and stopped when FDS terminated.
The memory used by a simulation was read from the Peak Working Set column for the FDS process in Windows Vista Task Manager. On XP, this column is labeled Peak Mem Usage. Because Task Manager no longer shows memory data once the process has completed, we usually recorded the value after memory usage appeared to have stabilized.
The output data size was the accumulated size of all files in the simulation folder, excluding any input FDS or PSM files. The size of the output can vary greatly based on the output options specified in the input file.
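A comparable wall-clock measurement could be scripted along these lines. This is a sketch only, not the procedure used for the benchmarks. The command passed in is whatever launches the simulation on a given installation; no FDS command line is assumed here, and the demonstration times a trivial Python process instead.

```python
import subprocess
import sys
import time


def time_process(command):
    """Run a command to completion and return its wall-clock time in minutes."""
    start = time.monotonic()          # clock starts when the process launches
    subprocess.run(command, check=True)
    return (time.monotonic() - start) / 60.0  # clock stops when it terminates


# Demonstration with a placeholder command that exits immediately.
demo_minutes = time_process([sys.executable, "-c", "pass"])
```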
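The output size calculation can be reproduced with a short script such as the one below. It is a sketch that assumes input files can be identified by the `.fds` and `.psm` extensions and that only files directly inside the simulation folder are counted; the function name is hypothetical.

```python
import os

# Assumption: input files are recognized by these extensions.
INPUT_EXTENSIONS = {".fds", ".psm"}


def output_size_mb(folder):
    """Total size in MB of the files in a folder, excluding input files."""
    total_bytes = 0
    with os.scandir(folder) as entries:
        for entry in entries:
            if not entry.is_file():
                continue  # subfolders are not descended into
            extension = os.path.splitext(entry.name)[1].lower()
            if extension in INPUT_EXTENSIONS:
                continue  # skip the FDS and PSM input files
            total_bytes += entry.stat().st_size
    return total_bytes / (1024 * 1024)
```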