The Network Simulator ns-2: Tips and Statistical Data for Running Large Simulations in NS

This page discusses the typical problems faced when running large simulations in ns, the main constraints being memory and CPU time. We describe different methods to overcome these constraints and present statistical (benchmark) figures from experiments to show some of the improvements we have been able to achieve. Of the different performance metrics, the three significant ones highlighted here are (1) start-up time, or the time taken before the actual simulation can start, (2) run-time, or how long the simulation takes to complete, and (3) memory requirement for the simulation. We discuss different ways these three metrics can be improved in the case of very large simulations.

1. Start-up time

So what is start-up time? Start-up time can be defined as the time overhead required before the actual simulation can start in ns. For a given simulation using default static routing, the start-up time consists of the following time-slices:
  1. time to run through the user simulation script
  2. time to reach the route-computation routine (route-compute{})
  3. time to parse through the link array
  4. time to compute next hops for all source-destination pairs
  5. time to populate the classifiers (routing tables)
Thus we find that start-up time mainly consists of the time taken to compute routes and create routing tables. Next we will see some actual start-up time statistics for static routing.

Benchmark statistics for start-up time:

Here we show several benchmark experiments that were done in ns in order to get an estimate of start-up time in the case of a large topology.
Recently we have made considerable modifications to the static routing code in ns. The modifications mainly consisted of moving parts of the code from Tcl to C++. The motivation for this change was a high start-up time due to the comparatively inefficient way the classifiers (or routing tables) were populated at the start of a simulation.
To evaluate the performance improvements due to the routing code changes, we compared the start-up time of a 1000-node (2000+ links) transit-stub topology. See the file large-sim.tcl under ~ns/tcl/ex in the distribution for details of the script. There is no traffic in this scenario since we are interested only in the time taken to start the simulation. The experiment described here provides insight into the start-up times of large-topology simulations. It also allowed us to compare the gains we achieved by making the changes. Here we present the start-up times both before and after the static routing code changes.

Results for large-sim.tcl BEFORE the code changes:
Breaking up start-up time:

  1. run through user script - ~1 min
  2. reach route-compute{} - negligible
  3. parse through link array - ~2 sec
  4. compute next-hop - ~3 min
  5. populate classifiers - ~15 min
Total start-up time (before making changes) for a 1000-node topology: ~19 min

Results AFTER the code changes:

  1. run through user script - ~1 min
  2. reach route-compute{} - negligible
  3. parse through link array - ~2 sec
  4. compute next-hop - ~3 min
  5. populate classifiers - ~3 min
Total start-up time (after changes were made) for a 1000-node topology: ~7 min
The changes resulted in a reduction of more than 60% in start-up time.

Note: The above experiment was done on a relatively old single-CPU Pentium II 448 MHz machine with a large memory of 1GB, running Red Hat Linux 7.0.

Manual and other types of Routing

Manual routing may be used to avoid route computation, and thus start-up time, for very large topologies. Manual routing is a way for users to simply configure routes by hand, just as you would with the "route" command on a workstation. See the unicast routing chapter in the ns manual for more on manual routing. tcl/ex/many_tcp.tcl is a good example demonstrating the use of manual routing.
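
As an illustration, here is a minimal hand-configured route setup in the style of the ns manual's manual-routing example (the node names are ours):

set ns [new Simulator]
$ns rtproto Manual                 ;# no global route computation
set n1 [$ns node]
set n2 [$ns node]
$ns duplex-link $n1 $n2 10Mb 100ms DropTail
$n1 add-route [$n2 id] [[$ns link $n1 $n2] head]   ;# point n1's classifier at the link by hand
$n2 add-route [$n1 id] [[$ns link $n2 $n1] head]
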
Similarly, other non-default routing methods like nix-vector and algorithmic routing (both described under Abstraction Techniques below) may also result in shorter start-up times.

2. Run-time

Run-time is the time taken for the simulation to complete. The ways to reduce run-time are as follows:
  1. Turn off all tracing, like trace-all and namtrace-all, if tracing on ALL links is not required.
  2. Several abstraction methods discussed below under "Memory Usage" can be applied to reduce run-times as well.
  3. Some techniques like pre-filtering, currently at an early stage of investigation, are expected to result in significant reductions in run-times for large simulations.

3. Memory Usage

One of the most common problems that people face while running large simulations is running out of memory. Here we will talk of the different methods and practices that are followed to reduce memory usage in ns.

(i) Packet-headers in ns

Different types of packet headers are defined for different protocols in ns, and by default ns includes ALL of these packet headers in EVERY packet in a simulation. Thus the default packet header in ns is heavy-weight, and it will continue to grow as more protocols are added to ns. One way to significantly reduce the memory requirement, especially for packet-intensive large simulations, is to remove unused packet headers.
An experiment (described below) was done to find out the amount of memory used in a packet-heavy simulation. We then strip off the unused packet headers and measure the amount of memory saved in doing so.

Benchmark statistics for Memory usage due to heavy-weight packet headers

For this experiment, we chose a very simple topology consisting of 2 nodes connected by a link of high bandwidth and delay. The high bandwidth-delay product ensures that a large number of packets are in flight at any given time. The script used is called pkts.tcl and can be found under ~ns/tcl/ex.
We start with heavy-weight packet headers that include all defined packet-header types (the default behaviour in ns). We set a bandwidth of 1Gbps and a delay of 100ms. A UDP agent is attached to node 1, with a CBR traffic source generating packets of size 210 bytes at a rate of 1Gbps, which matches the bandwidth of the link. So effectively, at any given time there are

10^9 bits/s * 100 * 10^-3 s
---------------------------- = ~59,500 packets on the link.
   8 bits/byte * 210 bytes

Memory usage was around 140MB from which we get a figure of ~2KB/pkt.
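
For reference, the setup described above corresponds roughly to the following fragment (a sketch for orientation only; see ~ns/tcl/ex/pkts.tcl for the actual script):

set ns [new Simulator]
set n0 [$ns node]
set n1 [$ns node]
$ns duplex-link $n0 $n1 1Gb 100ms DropTail
set udp [new Agent/UDP]
$ns attach-agent $n0 $udp
set null [new Agent/Null]
$ns attach-agent $n1 $null
$ns connect $udp $null
set cbr [new Application/Traffic/CBR]
$cbr set packetSize_ 210           ;# bytes
$cbr set rate_ 1Gb                 ;# saturate the 1Gbps link
$cbr attach-agent $udp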

Next, we carry out the same experiment with stripped packet headers. We use the following commands (see pkts.tcl) before creating the simulator:

remove-all-packet-headers       ;# removes all except common
add-packet-header IP Message     ;# hdrs reqd for cbr traffic

Thus all packet headers (except the common header) are removed first, and then the specific headers required for CBR traffic are added. This time memory usage comes to ~30MB, which brings memory usage per packet down to around 500 bytes, a 75% reduction in memory. All packet-header related procedures (along with brief documentation) are defined in ~ns/tcl/lib/ns-packet.tcl. Also see the section "Selectively Including Packet Headers in Your Simulation" in chapter 12 of the ns manual.

Note: The above experiment was done on a single-CPU Pentium II 448 MHz machine with 1GB of physical memory (4GB hard drive), running Red Hat Linux 7.0.

(ii) Abstraction techniques

Several abstraction techniques have been studied that allow simulation of very large topologies in ns by yielding considerable savings in memory as well as computational resources.

Algorithmic routing

Algorithmic routing is a mechanism by which a simple algorithm is used to compute the next hop for any source and destination pair within a given topology. This removes the requirement of maintaining a routing table at each node. An arbitrary network topology is mapped onto a k-ary tree, after which Breadth First Search (BFS) is run and node addresses are (re)assigned in a particular order. This order then allows simple next-hop computation without maintaining a routing table. However, even though algorithmic routing reduces the memory requirement, the routes computed may not always be the shortest or the most optimal. For more on algorithmic routing see Polly Huang and John Heidemann, "Minimizing Routing State for Light-Weight Network Simulation", in Proceedings of the International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, Cincinnati, Ohio, USA, IEEE, August 2001. Algorithmic routing in ns can be found under ~ns/tcl/rtglib/algo-route-proto.tcl.
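
To illustrate the idea, here is a conceptual sketch (ours, not the ns implementation) of address-based next-hop computation on a k-ary tree whose nodes are numbered 0, 1, 2, ... in BFS order:

proc next-hop {k src dst} {
    # Climb from dst toward the root; if we pass through src, the node
    # we climbed from is src's child on the path, i.e. the next hop.
    set hop $dst
    while {$hop > 0} {
        set parent [expr {($hop - 1) / $k}]
        if {$parent == $src} {
            return $hop
        }
        set hop $parent
    }
    # dst is not in src's subtree, so route upward via src's parent.
    return [expr {($src - 1) / $k}]
}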

Nix-vector routing

Another abstraction technique is nix-vector routing, where routes are computed on demand, when a new packet is created. This too reduces time and memory consumption in the route setup phase because it does not need to compute routing tables at all. Moreover, routes are cached per source-destination pair to prevent the same routes from being computed over and over. However, once traffic starts, it incurs O(N) time consumption and O(N*N) memory consumption if traffic needs to be generated across all source-destination pairs. For more info about nix-vector routing see George F. Riley, Mostafa H. Ammar, and Richard Fujimoto, "Stateless Routing in Network Simulations", in Proceedings of the International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, San Francisco, CA, USA, IEEE, August 2000. The nix-vector routing implementation in ns can be found under ~ns/nix.
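
Enabling it is a one-line change to the script; a minimal sketch, assuming the set-nix-routing hook installed by the nix-vector module:

set ns [new Simulator]
$ns set-nix-routing                ;# use nix-vector instead of table-driven routing
set n0 [$ns node]
set n1 [$ns node]
$ns duplex-link $n0 $n1 10Mb 10ms DropTail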

SessionSim, or session-level packet distribution techniques

Another abstraction technique, involving centralized computation and abstract packet distribution, can also be found in ns-2 and is called SessionSim. The SessionSim technique, as implemented in ns, has been applied to multicast simulations to achieve an order of magnitude of performance improvement. See the ns manual chapter on scaling for details on this technique.
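
Switching to session-level simulation only requires creating a different simulator instance; the rest of the script stays largely unchanged:

set ns [new SessionSim]            ;# instead of: set ns [new Simulator]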

(iii) Different routing strategies

Routing strategies like hierarchical routing reduce the size of routing tables and thus save memory for large topologies. Flat routing (the default in ns) requires routing state of O(N*N), where N is the size of the topology. With hierarchical routing, the routing table size comes down to about O(N log N). See tcl/test/test-suite-hier-routing.tcl and tcl/ex/hier-rtg-100.tcl for examples, and the ns manual chapter on hierarchical routing for more info.
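
A minimal sketch of hierarchical address configuration (the domain, cluster and node counts here are illustrative):

set ns [new Simulator]
$ns node-config -addressType hierarchical
AddrParams set domain_num_ 2               ;# 2 domains
lappend cluster_num 2 2                    ;# 2 clusters per domain
AddrParams set cluster_num_ $cluster_num
lappend eilastlevel 2 3 2 3                ;# nodes in each of the 4 clusters
AddrParams set nodes_num_ $eilastlevel
set n0 [$ns node 0.0.0]                    ;# nodes are created with hierarchical addresses
set n1 [$ns node 1.1.2]
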
Other routing strategies, described under Abstraction Techniques above, are also used to save memory in large simulations.

(iv) Delay binding

Using bind() in C++ consumes memory for each object you create. This can be very expensive if you create many identical objects. Changing bind() calls to delay_bind() changes this memory requirement from per-object to per-class. See ~ns/object.cc for an example of how to do binding either way.

(v) Avoid trace-all

$ns trace-all $f causes trace objects to be pushed onto all links. If you only want to trace one link, there is no need for this overhead. The saving is about 14KB/link.
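
For example, to trace just the queue on a single link (a sketch; $n0 and $n1 are whatever nodes you care about):

set f [open out.tr w]
$ns trace-queue $n0 $n1 $f         ;# trace only the n0->n1 link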

(vi) Use arrays for sequences of variables

Each variable, say n$i in set n$i [$ns node], has a certain overhead. If a sequence of nodes is created as an array, i.e. n($i), then only one variable is created, consuming much less memory. The saving is about 40+ bytes/variable.
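
For example:

# One scalar variable per node: 1000 variables, each with its own overhead.
for {set i 0} {$i < 1000} {incr i} {
    set n$i [$ns node]
}
# Better: a single array variable holds all 1000 nodes.
for {set i 0} {$i < 1000} {incr i} {
    set n($i) [$ns node]
}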

(vii) Avoid unnecessary variables

If an object will not be referred to later on, avoid naming the object. E.g.,

set cmcast(1) [new CtrMcast $ns $n(1) $ctrmcastcomp [list 1 1]]

would be better replaced by

new CtrMcast $ns $n(1) $ctrmcastcomp [list 1 1]

The saving is about 80 bytes/variable.

(viii) Run on top of FreeBSD

malloc() overhead on FreeBSD is less than on some other systems. We will eventually port that allocator to other platforms.

4. TCP with large congestion windows

(i) TCP's maximum window size, MWS, in distributions before 8/20/02:

TCP's maximum window size, MWS, is the size of the array of packets seen by the TCP data receiver. MWS should be as large as the maximum TCP window used in the simulation. The default for the maximum TCP window in ns-default.tcl is twenty packets (Agent/TCP set window_ 20).
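
So for simulations with larger congestion windows, raise the window explicitly before creating TCP agents, e.g.:

Agent/TCP set window_ 1000         ;# allow congestion windows up to 1000 packets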

(ii) Sack1 TCP's scoreboard size, SBSIZE, in distributions before 8/20/02:

When running Sack1 TCP, one of the one-way TCPs, the scoreboard in scoreboard.cc is the data structure at the TCP sender that keeps track of which packets have been acknowledged above the cumulative acknowledgement. SBSIZE, the size of the scoreboard array, is defined in scoreboard.h to be 1024; it must be at least as large as the maximum congestion window reached in the course of the simulation. The advice is to set SBSIZE to the maximum TCP window used in the simulation.

(iii) TCP's maximum window size and scoreboard size in distributions after 8/20/02:

The memory used for a TCP sink's seen stack now grows dynamically (it does not shrink) from a default of 64 (see MWS in tcp-sink.h). The memory used for a TCP SACK source's scoreboard also now grows (it does not shrink) from a default size of 64 (see SBSIZE in tcp-sack1.cc, and SBSIZE of 1024 in tcp-fack.cc). This makes the default memory usage smaller than in previous versions of ns while allowing for larger congestion windows.