Example Test


This page will describe how to run a test experiment on 10 nodes in the testbed.

Experiment

This is a tutorial experiment where we'd like to use 10 nodes in the testbed to do a simple upload test. One of the nodes will act as an access-point and the rest will connect to it and upload data. We will run two different tests: one where each node sends data to the access-point one-by-one, and one where all nodes transmit data simultaneously. This lets us check the upload speed of each node to the access-point. To generate traffic we will use iperf, and each test will run for 100 seconds.
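
In other words, every test boils down to the same iperf pair: the access-point runs an iperf server and each client node uploads to it for 100 seconds. A minimal sketch of that pattern (192.168.100.1 is the access-point IP assigned in the Setup section below):

# on the access-point (pc01)
iperf -s

# on a client node (pc[02-10]): upload to the access-point for 100 seconds
iperf -c 192.168.100.1 -t 100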

All configuration (except where stated otherwise) is done as the root user on ashoka.ndgroup.lab.

Setup

The experiment will be setup as follows:

  • 10 nodes with an Atheros-based wireless LAN card (pc01-pc10)
  • all nodes will run the minstrel rate adaptation algorithm and the ath_pci driver from madwifi
  • one (1) node (pc01) will be configured to act as an access-point
  • nine (9) nodes (pc[02-10]) will be configured to connect to the access-point run by pc01
  • each node will be assigned an IP in the range 192.168.100.x, where pc01 gets 192.168.100.1, pc02 gets 192.168.100.2 and so on (see the short sketch after this list)
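
The node number is simply derived from the short hostname, so the IP assignment never has to be hard-coded per node. A quick sketch of the derivation (the same cut/sed pipeline reappears in setup.sh below):

# e.g. on pc04: "pc04" -> "04" -> "4" -> 192.168.100.4
HOSTNUM=`hostname -s | cut -c 3-4 | sed 's/0*//'`
echo "192.168.100.${HOSTNUM}"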

Tests

We are going to run two (2) tests. These tests aim to measure the maximum bandwidth between each of the nodes and the access-point. What we want to measure is the difference in throughput over the wireless medium when only one node sends at a time versus when all nodes send simultaneously.

Test #1

Individual test of each node transmitting data to the access-point.

Test #2

Test where all nodes transmit data to the access-point simultaneously.


Teardown

The teardown step is where we clean up after ourselves, so others can use the testbed without running into issues caused by our experiment.


Implementation

Setup

To set up our nodes, we can either ssh to every node and configure it individually, or we can use clush (clustershell) to run commands on all nodes simultaneously. We are going to create a script which configures the nodes, and then use clush to execute this script on all nodes at once. I'll paste the script below, and explain it in the following section.

#!/bin/bash
# @Author: Tor Martin Slåen <tormsl@ifi.uio.no>
# @File: /testbed/scripts/example_test_01/setup.sh

HOSTNAME=`hostname -s`
HOSTNUM=`hostname -s | cut -c 3-4 | sed 's/0*//'`
RATE_CTL="minstrel"
CWD="/testbed/scripts/example_test_01/"

setup_ap() {
        # $1 = ratectl
        # $2 = hostname
        # $3 = ip
        # $4 = netmask
        # $5 = working directory
        cd ${5}
        /testbed/scripts/load_ath.sh ${1} ap
        sleep 1
        hostapd -P./hostapd_${2}.pid -B ./hostapd.conf
        sleep 1
        ifconfig wlan0 ${3} netmask ${4} up
}

setup_client() {
        # $1 = ratectl
        # $2 = hostname
        # $3 = ip
        # $4 = netmask
        # $5 = working directory
        cd ${5}
        /testbed/scripts/load_ath.sh ${1}
        sleep 1
        wpa_supplicant -B -Dwext -P./wpa_supplicant_${2}.pid -iwlan0 -c./wpa_supplicant.conf
        sleep 1
        ifconfig wlan0 ${3} netmask ${4} up

}


IP="192.168.100.${HOSTNUM}"
NETMASK="255.255.255.0"

if [ "${HOSTNAME}" = "pc01" ] ; then
        setup_ap ${RATE_CTL} ${HOSTNAME} ${IP} ${NETMASK} ${CWD}
else
        setup_client ${RATE_CTL} ${HOSTNAME} ${IP} ${NETMASK} ${CWD}
fi

I'll save this script in the testbed's script folder so it can be accessed by all nodes in the testbed, but for your own experiments you should use your own folder in /testbed/users/<username>/

  • Folder: /testbed/scripts/example_test_01/
  • Filename: setup.sh
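
The setup script expects hostapd.conf and wpa_supplicant.conf to be present in the same folder. The actual contents of those files are not shown on this page, but a minimal sketch for an open (unencrypted) test network could look something like the following; the SSID ndtest, the channel and the hostapd driver backend are all assumptions, not the testbed's real configuration:

# hostapd.conf (hypothetical sketch, not the real testbed file)
interface=wlan0
driver=madwifi          # assumption: hostapd's madwifi backend
ssid=ndtest             # assumed SSID, must match wpa_supplicant.conf
hw_mode=g
channel=1

# wpa_supplicant.conf (hypothetical sketch, not the real testbed file)
ctrl_interface=/var/run/wpa_supplicant
network={
        ssid="ndtest"   # assumed SSID, must match hostapd.conf
        key_mgmt=NONE   # open network, no encryption
}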

To execute the script on all nodes (pc[01-10]), run the following command:

[root@ashoka example_test_01]# clush -b -w pc[01-10] /testbed/scripts/example_test_01/setup.sh 
---------------
pc[02-10] (9)
---------------
loading ath_pci with ratectl=minstrel
---------------
pc01
---------------
loading ath_pci with ratectl=minstrel and autocreate=ap

As you can see, the script is run on all nodes and the output is displayed. pc01 entered the setup_ap function and the rest entered the setup_client function.

To check that the nodes are configured as you expect, you can run a ping test from pc[02-10] to pc01:

[root@ashoka example_test_01]# clush -b -w pc[02-10] "ping -c 1 192.168.100.1 | head -n 2 | tail -n 1"
---------------
pc02
---------------
64 bytes from 192.168.100.1: icmp_req=1 ttl=64 time=0.438 ms
---------------
pc03
---------------
64 bytes from 192.168.100.1: icmp_req=1 ttl=64 time=0.490 ms
---------------
pc04
---------------
64 bytes from 192.168.100.1: icmp_req=1 ttl=64 time=0.711 ms
---------------
pc05
---------------
64 bytes from 192.168.100.1: icmp_req=1 ttl=64 time=0.485 ms
---------------
pc06
---------------
64 bytes from 192.168.100.1: icmp_req=1 ttl=64 time=0.484 ms
---------------
pc07
---------------
64 bytes from 192.168.100.1: icmp_req=1 ttl=64 time=2.69 ms
---------------
pc08
---------------
64 bytes from 192.168.100.1: icmp_req=1 ttl=64 time=2.00 ms
---------------
pc09
---------------
64 bytes from 192.168.100.1: icmp_req=1 ttl=64 time=1.60 ms
---------------
pc10
---------------
64 bytes from 192.168.100.1: icmp_req=1 ttl=64 time=1.89 ms

As an additional check, you can list your test folder and see that hostapd and wpa_supplicant have written their .pid files to it:

[root@ashoka example_test_01]# ls -l
total 56
-rw-r--r-- 1 root root 193 Apr  6 17:55 hostapd.conf
-rw-r--r-- 1 root root   5 Apr  6 17:56 hostapd_pc01.pid
-rwxr-xr-x 1 root root 982 Apr  6 17:56 setup.sh
-rwxr-xr-x 1 root root 511 Apr  6 17:43 teardown.sh
-rw------- 1 root root 156 Apr  6 17:55 wpa_supplicant.conf
-rw-r--r-- 1 root root   5 Apr  6 17:56 wpa_supplicant_pc02.pid
-rw-r--r-- 1 root root   5 Apr  6 17:56 wpa_supplicant_pc03.pid
-rw-r--r-- 1 root root   5 Apr  6 17:56 wpa_supplicant_pc04.pid
-rw-r--r-- 1 root root   5 Apr  6 17:56 wpa_supplicant_pc05.pid
-rw-r--r-- 1 root root   5 Apr  6 17:56 wpa_supplicant_pc06.pid
-rw-r--r-- 1 root root   5 Apr  6 17:56 wpa_supplicant_pc07.pid
-rw-r--r-- 1 root root   5 Apr  6 17:56 wpa_supplicant_pc08.pid
-rw-r--r-- 1 root root   5 Apr  6 17:56 wpa_supplicant_pc09.pid
-rw-r--r-- 1 root root   5 Apr  6 17:56 wpa_supplicant_pc10.pid

What you see here is that pc01 has started hostapd and written the process id to hostapd_pc01.pid. All other nodes have started wpa_supplicant and written their process ids to wpa_supplicant_pcXX.pid.
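
If you want to double-check the processes themselves rather than just the .pid files, a pgrep over clush should do it (not part of the original walkthrough, just a convenience check):

clush -b -w pc01 "pgrep -l hostapd"
clush -b -w pc[02-10] "pgrep -l wpa_supplicant"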


You are now ready to run the tests!

Tests

Test #1

This test will schedule individual jobs, each running an iperf session against the access-point (pc01) for 100 seconds, with a 20 second pause between each. Each node will run this test three times, which means that each node occupies a slot of 2 min * 3 = 6 minutes.

First, you need to open an ssh connection to pc01 and start an iperf server:

# iperf -s

I made a script which we will run on the nodes pc[02-10]; see below for an explanation.

#!/bin/bash
# @Author: Tor Martin Slåen <tormsl@ifi.uio.no>
# @File: /testbed/scripts/example_test_01/test1.sh

HOSTNAME=`hostname -s`
HOSTNUM=`hostname -s | cut -c 3-4 | sed 's/0*//'`
CWD="/testbed/scripts/example_test_01"

if [ ! -d "${CWD}/test1" ] ; then
        mkdir "${CWD}/test1"
fi

BEFORE_START=1

let INTERVAL=${HOSTNUM}-2
let INTERVAL=${INTERVAL}*6
let INTERVAL=${INTERVAL}+${BEFORE_START}

for i in {0..2} ; do
        let START=(${i}*2)+${INTERVAL}
        at now + ${START} minutes -f ${CWD}/test1-run.sh
done

Script explanation:

  1. figures out the hostname, host number (pc04 = 4) and the current working directory
  2. creates the result directory if it does not exist
  3. the BEFORE_START variable lets the user decide the idle time (in minutes) before the tests start
  4. it then calculates the INTERVAL variable, which decides when the node shall start its jobs. For example, pc04 shall start its jobs after pc02 and pc03 have finished theirs. So pc04 will have to schedule its jobs after:
    • (HOSTNUM - 2 (pc[00-01] are not counted)) * 6min (3 tests each for pc[02-03]) + 1min (BEFORE_START)
  5. it then loops through the number of tests for each node (0,1,2), calculates the start offset of each job relative to the node's own start offset, and schedules test1-run.sh to run at now plus the calculated start time (a worked example follows this list)
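
As a quick sanity check of that arithmetic, here is the calculation for pc04 spelled out (it matches pc04's jobs in the clush output further down, which are scheduled 13, 15 and 17 minutes after the test was started):

# worked example for pc04: HOSTNUM=4, BEFORE_START=1
HOSTNUM=4 ; BEFORE_START=1
INTERVAL=$(( (HOSTNUM - 2) * 6 + BEFORE_START ))    # (4-2)*6+1 = 13
for i in 0 1 2 ; do
        echo "job scheduled at now + $(( i * 2 + INTERVAL )) minutes"
done
# prints 13, 15 and 17 minutes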

The other script, the one which is scheduled to run, is as follows:

#!/bin/bash
# @Author: Tor Martin Slåen <tormsl@ifi.uio.no>
# @File: /testbed/scripts/example_test_01/test1-run.sh

HOSTNAME=`hostname -s`
CWD="/testbed/scripts/example_test_01"

SERVER=192.168.100.1

TEST_DURATION=100

TIME=`date +%Y%m%d%H%M`

iperf -c ${SERVER} -t ${TEST_DURATION} > ${CWD}/test1/test1_${HOSTNAME}_${TIME}.result

This script is very simple. It figures out the hostname, sets the IP of the access-point, the test duration for iperf and the current time (YYYYmmDDHHMM). It then launches iperf with the correct parameters and redirects the output to a result file in the test1 directory.

To schedule the tests on all nodes pc[02-10], we use clustershell and tell all nodes to run the test1.sh script.

sudo clush -b -w pc[02-10] /testbed/scripts/example_test_01/test1.sh
pc02: job 22 at Sun Apr 10 14:42:00 2011
pc02: job 23 at Sun Apr 10 14:44:00 2011
pc02: job 24 at Sun Apr 10 14:46:00 2011
pc05: job 25 at Sun Apr 10 15:00:00 2011
pc03: job 25 at Sun Apr 10 14:48:00 2011
pc04: job 25 at Sun Apr 10 14:54:00 2011
pc05: job 26 at Sun Apr 10 15:02:00 2011
pc03: job 26 at Sun Apr 10 14:50:00 2011
pc04: job 26 at Sun Apr 10 14:56:00 2011
pc08: job 25 at Sun Apr 10 15:18:00 2011
pc05: job 27 at Sun Apr 10 15:04:00 2011
pc03: job 27 at Sun Apr 10 14:52:00 2011
pc07: job 25 at Sun Apr 10 15:12:00 2011
pc08: job 26 at Sun Apr 10 15:20:00 2011
pc04: job 27 at Sun Apr 10 14:58:00 2011
pc06: job 22 at Sun Apr 10 15:06:00 2011
pc07: job 26 at Sun Apr 10 15:14:00 2011
pc08: job 27 at Sun Apr 10 15:22:00 2011
pc06: job 23 at Sun Apr 10 15:08:00 2011
pc07: job 27 at Sun Apr 10 15:16:00 2011
pc09: job 25 at Sun Apr 10 15:24:00 2011
pc09: job 26 at Sun Apr 10 15:26:00 2011
pc06: job 24 at Sun Apr 10 15:10:00 2011
pc10: job 25 at Sun Apr 10 15:30:00 2011
pc09: job 27 at Sun Apr 10 15:28:00 2011
pc10: job 26 at Sun Apr 10 15:32:00 2011
pc10: job 27 at Sun Apr 10 15:34:00 2011

As you can see here, the jobs are correctly scheduled on all nodes. You may notice that the job ids vary from 22 to 27. This is because I had started and stopped jobs on different machines, so the id counter differs between some nodes.

If I now wait a few minutes and list the test1 directory, I'll see that the nodes have started writing their output to the results folder:

# ls -l test1
total 28
-rw-r--r-- 1 nobody nobody 380 Apr 10 14:43 test1_pc02_201104101442.result
-rw-r--r-- 1 nobody nobody 380 Apr 10 14:45 test1_pc02_201104101444.result
-rw-r--r-- 1 nobody nobody 380 Apr 10 14:47 test1_pc02_201104101446.result
-rw-r--r-- 1 nobody nobody 380 Apr 10 14:49 test1_pc03_201104101448.result
-rw-r--r-- 1 nobody nobody 380 Apr 10 14:52 test1_pc03_201104101450.result
-rw-r--r-- 1 nobody nobody 380 Apr 10 14:53 test1_pc03_201104101452.result
-rw-r--r-- 1 nobody nobody 380 Apr 10 14:55 test1_pc04_201104101454.result
-rw-r--r-- 1 nobody nobody   0 Apr 10 14:56 test1_pc04_201104101456.result

NB: the reason the last file is zero in size is that pc04 has started a test (iperf) but it has not yet finished, so the results haven't been written to the file. If you cat one of the files, you'll see the iperf run result:

# cat test1/test1_pc02_201104101442.result 
------------------------------------------------------------
Client connecting to 192.168.100.1, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[  3] local 192.168.100.2 port 36749 connected with 192.168.100.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-100.1 sec   140 MBytes  11.7 Mbits/sec

Now all that remains is to analyze the results, which I am not going to do here. This example is about how to run the tests, not how to analyse the results :D
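
That said, if you just want a quick look at the numbers, a small loop can pull the reported bandwidth out of every result file. This is only a sketch, and it assumes the iperf output format shown above (bandwidth in the last two fields of the last line):

for f in test1/*.result ; do
        # last line ends with e.g. "140 MBytes  11.7 Mbits/sec"
        echo -n "${f}: "
        tail -n 1 "${f}" | awk '{ print $(NF-1), $NF }'
done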

Test #2

This is an easy one, as it is done simultaneously on all nodes. First, you should ssh into pc01 and start an iperf server:

# iperf -s

After that, we use this script to run iperf clients on all the other nodes

#!/bin/bash
# @Author: Tor Martin Slåen <tormsl@ifi.uio.no>
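# @File: /testbed/scripts/example_test_01/test2.sh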

HOSTNAME=`hostname -s`
CWD="/testbed/scripts/example_test_01"

SERVER=192.168.100.1

if [ ! -d "${CWD}/test2" ] ; then
	mkdir "${CWD}/test2" 2>/dev/null
fi

let BEFORE_START=1

for i in {0..2} ; do
	let START=$i*2
	let START=${START}+${BEFORE_START}
	at now + ${START} minutes -f ${CWD}/test2-run.sh
done

The job-script is very similar to test #1, but the results folder is test2 instead of test1.

#!/bin/bash
# @Author: Tor Martin Slåen <tormsl@ifi.uio.no>
# @File: /testbed/scripts/example_test_01/test2-run.sh

HOSTNAME=`hostname -s`
CWD="/testbed/scripts/example_test_01"

SERVER=192.168.100.1

TEST_DURATION=100

TIME=`date +%Y%m%d%H%M`

iperf -c ${SERVER} -t ${TEST_DURATION} > ${CWD}/test2/test2_${HOSTNAME}_${TIME}.result


To run this script on all hosts (pc[02-10]) we use clustershell to execute the command

clush -b -w pc[02-10] /testbed/scripts/example_test_01/test2.sh 
pc03: job 34 at Sun Apr 10 16:54:00 2011
pc03: job 35 at Sun Apr 10 16:56:00 2011
pc05: job 34 at Sun Apr 10 16:54:00 2011
pc04: job 34 at Sun Apr 10 16:54:00 2011
pc03: job 36 at Sun Apr 10 16:58:00 2011
pc02: job 31 at Sun Apr 10 16:54:00 2011
pc04: job 35 at Sun Apr 10 16:56:00 2011
pc05: job 35 at Sun Apr 10 16:56:00 2011
pc06: job 31 at Sun Apr 10 16:54:00 2011
pc02: job 32 at Sun Apr 10 16:56:00 2011
pc04: job 36 at Sun Apr 10 16:58:00 2011
pc05: job 36 at Sun Apr 10 16:58:00 2011
pc06: job 32 at Sun Apr 10 16:56:00 2011
pc02: job 33 at Sun Apr 10 16:58:00 2011
pc07: job 34 at Sun Apr 10 16:54:00 2011
pc06: job 33 at Sun Apr 10 16:58:00 2011
pc07: job 35 at Sun Apr 10 16:56:00 2011
pc08: job 34 at Sun Apr 10 16:54:00 2011
pc07: job 36 at Sun Apr 10 16:58:00 2011
pc08: job 35 at Sun Apr 10 16:56:00 2011
pc09: job 34 at Sun Apr 10 16:54:00 2011
pc10: job 34 at Sun Apr 10 16:54:00 2011
pc08: job 36 at Sun Apr 10 16:58:00 2011
pc09: job 35 at Sun Apr 10 16:56:00 2011
pc10: job 35 at Sun Apr 10 16:56:00 2011
pc09: job 36 at Sun Apr 10 16:58:00 2011
pc10: job 36 at Sun Apr 10 16:58:00 2011

You should now start to see the test2 directory filling up with result files, one for each job on each node. As this test is executed simultaneously on all nodes, it doesn't take long and should be finished in about 6 minutes. After that, it's just a matter of analysing the results :D
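
Since nine nodes schedule three jobs each, a simple file count tells you when everything has finished (a quick check, not part of the original procedure):

ls test2/*.result | wc -l     # prints 27 once all 9 nodes have finished their 3 runs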

Status

To see the status of your scheduled jobs, you can issue this command

[root@jabba example_test_01]$ clush -b -w pc[02-10] "atq | sort"
---------------
pc04
---------------
15	Sun Apr 10 13:32:00 2011 = root
---------------
pc05
---------------
13	Sun Apr 10 13:34:00 2011 a root
14	Sun Apr 10 13:36:00 2011 a root
15	Sun Apr 10 13:38:00 2011 a root
---------------
pc06
---------------
13	Sun Apr 10 13:40:00 2011 a root
14	Sun Apr 10 13:42:00 2011 a root
15	Sun Apr 10 13:44:00 2011 a root
---------------
pc07
---------------
13	Sun Apr 10 13:46:00 2011 a root
14	Sun Apr 10 13:48:00 2011 a root
15	Sun Apr 10 13:50:00 2011 a root
---------------
pc08
---------------
13	Sun Apr 10 13:52:00 2011 a root
14	Sun Apr 10 13:54:00 2011 a root
15	Sun Apr 10 13:56:00 2011 a root
---------------
pc09
---------------
13	Sun Apr 10 13:58:00 2011 a root
14	Sun Apr 10 14:00:00 2011 a root
15	Sun Apr 10 14:02:00 2011 a root
---------------
pc10
---------------
13	Sun Apr 10 14:04:00 2011 a root
14	Sun Apr 10 14:06:00 2011 a root
15	Sun Apr 10 14:08:00 2011 a root

What you see here is the output from the nodes which are running or are scheduled to run jobs. The '=' sign on pc04 job 15 means this job is currently running.

Stopping jobs

If you need to halt your jobs, the easiest way is to use atrm with the id of the job. Assuming you have started a few jobs which were scheduled as jobs 4, 5 and 6 on all nodes (pc[02-10]), you can delete them with this command

clush -b -w pc[02-10] atrm 4 5 6

If some nodes have finished the jobs you are deleting, or one of the nodes is currently running a job, clush will tell you so when you run the command above. Assuming your job is relatively short (one or two minutes), you are better off just waiting it out. If not, you should delete the jobs (as you probably already have), then log into the nodes running the job in question and kill it manually.
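
A hedged sketch of that last step, assuming the stuck job is one of the iperf clients from these tests; adjust the node and the pattern to whatever your job actually runs:

# kill a running iperf client on pc04 (example node)
clush -w pc04 "pkill -f 'iperf -c'"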

To verify that your jobs are no longer scheduled to run, use this command

sudo clush -b -w pc[02-10] atq

If this command returns no output, no jobs are currently scheduled.

Teardown

To tear down the setup, I have created a script which stops the processes on all nodes and unloads the madwifi driver.

#!/bin/bash
# @Author: Tor Martin Slåen <tormsl@ifi.uio.no>
# @Filename: /testbed/scripts/example_test_01/teardown.sh

HOSTNAME=`hostname -s`
HOSTNUM=`hostname -s | cut -c 3-4 | sed 's/0*//'`
CURR_RATE=`lsmod | grep ^ath_rate_ | cut -d " " -f 1`
IFACE=wlan0

CWD="/testbed/scripts/example_test_01/"

cd ${CWD}

if [ -f "./wpa_supplicant_${HOSTNAME}.pid" ] ; then
	kill `cat ./wpa_supplicant_${HOSTNAME}.pid`
	sleep 1
fi

if [ -f "./hostapd_${HOSTNAME}.pid" ] ; then
	kill `cat ./hostapd_${HOSTNAME}.pid`
	sleep 1
fi

/testbed/scripts/unload_ath.sh ${IFACE}

This script will try to locate the .pid file for hostapd or wpa_supplicant on each node and kill (shut down) the locally running processes. After the processes have been shut down, it will unload the madwifi driver, and you are done.

When the teardown script tells hostapd and wpa_supplicant to shut down, the processes will also delete the .pid files they created when they started. This can be checked by listing the test directory like this:

[root@ashoka example_test_01]# ls -l
total 16
-rw-r--r-- 1 root root 193 Apr  6 17:55 hostapd.conf
-rwxr-xr-x 1 root root 982 Apr  6 17:56 setup.sh
-rwxr-xr-x 1 root root 511 Apr  6 17:43 teardown.sh
-rw------- 1 root root 156 Apr  6 17:55 wpa_supplicant.conf

As you can see, all the .pid files have been removed. And if you look for the madwifi Atheros module, you will find that it is no longer loaded on any node:

[root@ashoka example_test_01]# clush -b -w pc[01-10] "lsmod | grep ath"
clush: pc[01-10]: exited with exit code 1

Problems

  • I have noticed that iperf's running time sometimes exceeds its defined duration. This could be the result of many things, but after some research I found that the '-t' parameter tells iperf how many seconds it should spend sending data. If there is traffic shaping going on in the kernel or on the network, it can starve the receiver, and iperf will not end the session until all sent data has come through. The sketch below shows one way to spot such runs.
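
If you suspect this has happened to some of your runs, one way to spot them is to compare the interval reported in each result file against the nominal 100 seconds. This is only a sketch and it assumes the iperf output format shown earlier:

for f in test1/*.result test2/*.result ; do
        # the reported interval is the third field of the last line, e.g. "0.0-104.3"
        dur=`tail -n 1 "${f}" | awk '{ split($3, iv, "-"); print int(iv[2]) }'`
        if [ -n "${dur}" ] && [ "${dur}" -gt 110 ] ; then
                echo "${f}: reported interval was ${dur} seconds"
        fi
done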