Estimating Simulation Requirements

Introduction

The following benchmark data gives the time and memory needed to run several specific FDS jobs to completion. Based on the properties of a particular model and the computing hardware used, this information should help users to make informed estimates about the run-time and memory requirements of models with similar properties.

Test Simulations

Name                              Cells      Meshes  Time (min)
burner (image, PSM, FDS)          5,400      1       5
cable_tray (image, PSM, FDS)      19,200     1       5
room_fire (image, PSM, FDS)       67,392     1       5
switchgear (image, PSM, FDS)      144,000    2       5
multifloor (image, PSM, FDS)      228,000    3       5
multifloor_hi (image, PSM, FDS)   1,026,000  4       5

Cells – The total number of cells in all simulation meshes.
Meshes – The number of simulation meshes.
Time – The simulation end time.

Computer Hardware

Id        Processor                     CPUs  RAM   OS
IntelD    Intel Pentium D @ 2.8 GHz     2     2 GB  XP Pro 32-bit
IntelC2Q  Intel Core2 Quad @ 2.4 GHz    4     8 GB  Vista Business 64-bit

The CPUs column indicates how many jobs can be run in parallel by the processor.

Output and Memory Requirements

The following table shows the storage (output) and memory requirements of each simulation.

Input File     Output (MB)  Serial Memory (MB)  OpenMP Memory (MB)  MPI Memory (MB)
burner         9            12                  12                  15
cable_tray     255          32                  34                  35
room_fire      631          81                  82                  84
switchgear     797          179                 172                 196
multifloor     964          328                 329                 371
multifloor_hi  2540         1053                x                   1160

Restart files were the largest output, followed by boundary (BF) and 3D smoke (S3D) output.

The unexpected reduction in memory usage when switching from serial to OpenMP for the switchgear problem occurred on both test machines. With this exception, there was a consistent relationship where M_serial <= M_OpenMP <= M_MPI. For MPI runs, the value given is the sum of the memory usage of all running FDS processes.

Memory usage information for the OpenMP executable is not available for the multifloor_hi simulation. The tested version of that executable crashed on startup on both machines.

Simulation Run Time

The following table shows how long each problem took to run. Since all simulations were set to run for 5 minutes of simulated time, these values can be compared directly. No special effort was taken to optimize the performance of each problem for MPI (i.e. mesh count wasn’t matched to CPU count, meshes were not balanced, etc).

                 IntelD                                    IntelC2Q
Input File       Serial (min)  OpenMP (min)  MPI (min)     Serial (min)  OpenMP (min)  MPI (min)
burner           17            12            16            10            6             10
cable_tray       24            18            23            14            12            15
room_fire        92            62            78            52            35            53
switchgear       408           261           292           233           140           201
multifloor       893           626           512           557           350           249
multifloor_hi    7927          x             5785          5536          x             3510

Run time information for the OpenMP executable is likewise not available for the multifloor_hi simulation since, as noted above, the tested version of that executable crashed on startup on both machines.

Estimations

These benchmarks lead to the following suggestions for estimating resource requirements for serial simulations on similar computing hardware:

Cells      Sim Time (min)  Run Time (hrs)  Memory (GB)  Output (GB)
500,000    5               33              0.5          1.5
1,000,000  5               100             1            2.5
2,000,000  5               300             2            4.5

Run time and output size increase linearly with the simulated time. Memory requirements are not affected by the duration of the simulation.
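As a rough illustration of these rules, the short Python sketch below interpolates between the rows of the table above and scales run time and output by the simulated duration. The function and reference data structure are ours, not part of FDS or PyroSim, and linear interpolation between rows is cruder than the polynomial fit described below.

```python
# Rough serial-run estimator based on the estimation table above.
# Each reference row is (cells, run-time hours, memory GB, output GB)
# for 5 simulated minutes on hardware similar to the machines tested here.
REFERENCE = [
    (500_000, 33.0, 0.5, 1.5),
    (1_000_000, 100.0, 1.0, 2.5),
    (2_000_000, 300.0, 2.0, 4.5),
]

def estimate(cells, sim_minutes):
    """Interpolate between reference rows, then scale by simulated time.

    Run time and output scale linearly with simulated minutes;
    memory does not depend on simulation duration.
    """
    rows = sorted(REFERENCE)
    lo = max((r for r in rows if r[0] <= cells), default=rows[0])
    hi = min((r for r in rows if r[0] >= cells), default=rows[-1])
    frac = 0.0 if lo[0] == hi[0] else (cells - lo[0]) / (hi[0] - lo[0])
    run_hrs, mem_gb, out_gb = (lo[i] + frac * (hi[i] - lo[i]) for i in (1, 2, 3))
    scale = sim_minutes / 5.0  # the table assumes 5 simulated minutes
    return run_hrs * scale, mem_gb, out_gb * scale

print(estimate(750_000, 10))  # roughly (133.0 hrs, 0.75 GB, 4.0 GB)
```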

These estimates are suggested for serial simulation runs. Memory requirements are slightly higher for OpenMP and MPI simulation runs. To convert a serial run-time estimate to an OpenMP estimate, reduce the run time by 30-35% for 2 CPUs and 35-40% for 4 CPUs.
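The sketch below simply wraps those percentages in a hypothetical helper and returns a best-case/worst-case range:

```python
def openmp_estimate(serial_hours, cpus):
    """Apply the rough OpenMP reductions quoted above:
    30-35% for 2 CPUs, 35-40% for 4 CPUs.
    Returns (best case, worst case) run time in hours."""
    if cpus >= 4:
        low, high = 0.35, 0.40
    elif cpus >= 2:
        low, high = 0.30, 0.35
    else:
        low = high = 0.0  # a single CPU gets no OpenMP benefit
    return serial_hours * (1 - high), serial_hours * (1 - low)

print(openmp_estimate(100, 4))  # about (60.0, 65.0) hours
```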

To convert a serial run-time estimate to an MPI estimate, first figure out the number of “effective” processors you will be using. If you have 4 CPUs and 16 meshes, you will benefit from all 4 processors; if you have 4 CPUs and 3 meshes, you will only be using 3 of them. For a very optimistic estimate, divide the serial run time by that number. For a more realistic estimate, handicap the effectiveness based on how evenly the cells are divided among the processes: if one process owns more cells than the others (e.g. 4 CPUs and 5 meshes), that process will control the simulation time.
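A sketch of that bookkeeping, under the assumption that each mesh becomes one MPI process and processes are dealt out to the CPUs round-robin (a simplification; FDS does not balance meshes across processors for you):

```python
def mpi_estimate(serial_hours, cpus, mesh_cells):
    """Rough MPI run-time range from the 'effective processor' rule above.

    mesh_cells is a list of the cell count in each mesh. The optimistic
    figure divides by the number of CPUs that will actually be busy; the
    realistic figure assumes the CPU holding the largest share of cells
    controls the overall run time.
    """
    effective = min(cpus, len(mesh_cells))
    optimistic = serial_hours / effective
    # Deal the meshes, largest first, round-robin onto the busy CPUs.
    per_cpu = [0] * effective
    for i, cells in enumerate(sorted(mesh_cells, reverse=True)):
        per_cpu[i % effective] += cells
    realistic = serial_hours * max(per_cpu) / sum(mesh_cells)
    return optimistic, realistic

# 4 CPUs, 5 equal meshes: one CPU carries two meshes and controls the run.
print(mpi_estimate(100, 4, [200_000] * 5))  # (25.0, 40.0) hours
```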

These estimations were created by averaging the serial results and fitting a curve to each quantity in Excel. The run time estimate used a polynomial trend line; the memory and output estimates used linear trend lines. Since there was only one data point with over 1M cells, that simulation carried great weight in the fit, and additional results from large simulations could substantially alter the estimated times.
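The sketch below reproduces the same fitting idea with NumPy rather than Excel. The data arrays are transcribed from the tables above (serial run times averaged across the two machines); the degree-2 polynomial for run time is our assumption, since the order of the original fit is not stated, so the fitted values will not exactly match the estimation table.

```python
import numpy as np

# Serial benchmark data transcribed from the tables above; run times are the
# mean of the IntelD and IntelC2Q serial columns, converted to hours.
cells = np.array([5_400, 19_200, 67_392, 144_000, 228_000, 1_026_000])
run_hrs = np.array([17 + 10, 24 + 14, 92 + 52, 408 + 233, 893 + 557,
                    7927 + 5536]) / 2.0 / 60.0
memory_gb = np.array([12, 32, 81, 179, 328, 1053]) / 1024.0
output_gb = np.array([9, 255, 631, 797, 964, 2540]) / 1024.0

# Polynomial trend for run time (degree 2 assumed), linear trends otherwise.
run_fit = np.polyfit(cells, run_hrs, 2)
mem_fit = np.polyfit(cells, memory_gb, 1)
out_fit = np.polyfit(cells, output_gb, 1)

for n in (500_000, 1_000_000, 2_000_000):
    print(f"{n:,} cells: "
          f"{np.polyval(run_fit, n):.0f} hrs, "
          f"{np.polyval(mem_fit, n):.2f} GB RAM, "
          f"{np.polyval(out_fit, n):.2f} GB output")
```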

Procedures

These tests were performed on the workstation computing hardware at Thunderhead Engineering. Often, while running a particular benchmark, the computer was in use for daily work activities. The level of scientific rigor used to gather this data was relatively low and intended only as a basis for rough estimates of the resources needed to run comparable simulations.

The run time of a simulation was calculated from the total execution time of the FDS simulation process: the clock started when the process was launched and stopped when FDS terminated.
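A minimal sketch of that kind of wall-clock measurement, assuming a hypothetical fds executable name and input file (adjust both to match the installed FDS version and the job being timed):

```python
import subprocess
import time

# Time a single FDS run from process launch to termination.
start = time.monotonic()
subprocess.run(["fds", "room_fire.fds"], check=True)  # placeholder command
elapsed_min = (time.monotonic() - start) / 60.0
print(f"run time: {elapsed_min:.1f} min")
```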

The memory used by a simulation was taken from the Peak Working Set column in the Windows Vista Task Manager for the FDS process (on XP, this column is labeled Peak Mem Usage). Since this value is no longer shown in Task Manager after the process has completed, we usually recorded it once memory usage appeared to have stabilized.

The output data size was the accumulated size of all files in the simulation folder, excluding any FDS or PSM input files. The size of the output can vary greatly based on the output options specified in the input file.
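For reference, a small sketch of the same bookkeeping: total the size of everything in a simulation folder while skipping the FDS and PSM input files. The extension list and folder name here are placeholders.

```python
import os

def output_size_mb(folder):
    """Sum the size of every file in a simulation folder, skipping the
    input files, as a rough equivalent of the measurement described above."""
    skip = (".fds", ".psm")  # input-file extensions excluded from the total
    total = 0
    for root, _dirs, names in os.walk(folder):
        total += sum(os.path.getsize(os.path.join(root, n))
                     for n in names if not n.lower().endswith(skip))
    return total / (1024 * 1024)

print(f"{output_size_mb('room_fire'):.0f} MB")  # hypothetical folder name
```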