Why the kernel hrtimer might not be a high-resolution timer

I commented in this post that just because DAHDI thinks it has a High Resolution timer from the kernel, this might not actually be the case.

When you compile the DAHDI dummy driver it checks your kernel headers to see if you compiled your kernel with CONFIG_HIGH_RES_TIMERS, in which case the specific functions in the dummy driver call out to the hrtimer kernel functions.
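
If you want to check the same option yourself, a minimal sketch like the one below greps a kernel config file for it. It assumes your distro installs the running kernel's config at /boot/config-<version> (Debian does), and the function name hres_configured is made up for this example.

```shell
#!/bin/sh
# Succeeds (exit status 0) if the given kernel config file has
# high-resolution timer support compiled in.
# hres_configured is a hypothetical helper name, not part of DAHDI.
hres_configured() {
    grep -q '^CONFIG_HIGH_RES_TIMERS=y' "$1"
}

# Usage (the path is an assumption about where the config lives):
# hres_configured "/boot/config-$(uname -r)" && echo "hrtimers compiled in"
```

Bear in mind this only tells you what the kernel was built with, not whether the timer is actually running in high resolution mode on your hardware.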

So what if you compiled your kernel with CONFIG_HIGH_RES_TIMERS but you don’t have HPET support on your hardware?

Simple. The kernel emulates the hrtimer by backing the same interface with a low-resolution timer. This sounds crazy, but the kernel doesn’t really have much choice.

To be sure that you’re using a hardware high-res timer, run dmesg | grep hpet like this:

hpet-test:/usr/src# dmesg | grep hpet
[    0.004000] hpet clockevent registered
[    0.256427] hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0
[    0.256435] hpet0: 3 64-bit timers, 14318180 Hz
[    1.150057] hpet_resources: 0xfed00000 is busy

Here you can see the HPET being initialised by the kernel. If you don’t have an HPET, then you can expect some really strange results when you run the DAHDI tests. I had some of these results a few months back, and I might dig them out for a future post at some point.

For those who are interested in bits of the kernel, it’s the following statement in the hrtimer_forward() function of hrtimer.c that overrides timer intervals that are shorter than the resolution of the timer in use:

[linux-2.6.29/kernel/hrtimer.c]

if (interval.tv64 < timer->base->resolution.tv64)
        interval.tv64 = timer->base->resolution.tv64;

So when dahdi_dummy.c calls this function from dahdi_dummy_hr_int (the interrupt function) with the interval defined in DAHDI_TIME_NS as 1 000 000 nanoseconds (1/T = 1 kHz), the kernel can silently raise it to a larger value if the timer in use can’t resolve an interval that short.
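
To put some numbers on that, here is a rough sketch of the clamp as shell arithmetic. The 4 000 000 ns resolution in the usage comment is an assumed example (one tick on a kernel built with HZ=250), not a value taken from my test box, and both function names are hypothetical.

```shell
#!/bin/sh
# Sketch of the interval clamp performed in hrtimer_forward().
# All values are in nanoseconds; the function names are made up.

# Print the interval the timer will actually use: the requested interval,
# or the timer base resolution if the request is shorter than that.
effective_interval_ns() {
    requested=$1
    resolution=$2
    if [ "$requested" -lt "$resolution" ]; then
        echo "$resolution"
    else
        echo "$requested"
    fi
}

# Convert an interval in nanoseconds to a tick rate in Hz (integer division).
tick_rate_hz() {
    echo $(( 1000000000 / $1 ))
}

# Usage: DAHDI asks for 1 000 000 ns, the assumed resolution is 4 000 000 ns.
# interval=$(effective_interval_ns 1000000 4000000)
# tick_rate_hz "$interval"
```

With those assumed values the timer would fire at 250 Hz rather than the 1 kHz DAHDI asked for, which would go a long way towards explaining strange test results.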

Finally, if you want to be sure that your hrtimer has switched into high resolution mode, you can use this:

hpet-test:/usr/src# dmesg | grep "high resolution"
[    0.318555] checking if image is initramfs...<7>Switched to high resolution mode on CPU 1
[    0.505710] Switched to high resolution mode on CPU 0

If it didn’t work on your hardware, you should see the message Could not switch to high resolution mode on CPU X.
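
If you have a few boxes to check, the same test can be wrapped up in a small function. This is only a sketch: it matches on the two message strings quoted above, and the name hres_status and its three output words are my own invention.

```shell
#!/bin/sh
# Read kernel log text on stdin and report the hrtimer mode.
# Prints "low-res" if any CPU failed to switch, "high-res" if a switch
# message was seen, and "unknown" if neither message is present.
# hres_status is a hypothetical helper name.
hres_status() {
    log=$(cat)
    case $log in
        *"Could not switch to high resolution mode"*) echo "low-res" ;;
        *"Switched to high resolution mode"*) echo "high-res" ;;
        *) echo "unknown" ;;
    esac
}

# Usage: dmesg | hres_status
```

The failure message is checked first, so a machine where only some CPUs made the switch is still reported as low-res.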


Automated batch reboot of HP ProCurve switches via telnet

Today I had occasion to reboot about 20 ProCurve 3500yl switches, thanks to yet another firmware bug. The simplest way of doing this is with the “reload” command issued via telnet.

Rather than logging into each switch and issuing the reload command by hand I threw together an Expect script which automatically logs into the specified switch via telnet and initiates a reboot.

Expect is cool because you can script an interaction with interfaces that don’t usually accept this kind of behaviour.

The script is called like this:

./procurve-reboot.ex 10.30.0.15

..and then you can sit back and watch the reboot happen. Clearly if you want to reboot several switches you can batch these commands into a bash script.
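
A batch wrapper might look something like the sketch below. It assumes a plain text file with one switch address per line (switches.txt is a made-up name), and REBOOT_CMD is a variable I've introduced so the command can be swapped out for a dry run.

```shell
#!/bin/sh
# Reboot every switch listed in a file, one address per line.
# REBOOT_CMD and the switches.txt file name are assumptions for this sketch.
REBOOT_CMD=${REBOOT_CMD:-./procurve-reboot.ex}

reboot_all() {
    list=$1
    # Read the list on descriptor 3 so the spawned command keeps its own stdin.
    while IFS= read -r host <&3; do
        # Skip blank lines and comment lines.
        case $host in
            ''|'#'*) continue ;;
        esac
        echo "Rebooting $host"
        "$REBOOT_CMD" "$host" || echo "Reboot of $host failed" >&2
    done 3< "$list"
}

# Usage: reboot_all switches.txt
```

Setting REBOOT_CMD=echo before calling reboot_all gives you a dry run that just prints each address, which is worth doing before pointing it at 20 live switches.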

The Expect script looks like this (doesn’t trap any errors at the moment!):

#!/usr/bin/expect
# Usage: ./procurve-reboot.ex <switch-address>
set device [lindex $argv 0]
set timeout 10
spawn telnet $device
expect "Press any key to continue"
send "\r"
send "\r"
expect "*#"
send "reload\r"
# Brackets are escaped so Tcl does not treat them as command substitution.
expect -exact "System will be rebooted from primary image. Do you want to continue \[y/n\]?" { send "y" }
expect -exact "Do you want to save current configuration \[y/n/^C\]?" { send "n" }
send "\r\r"
expect -exact "Connection closed by foreign host."
exit

So now I have a single command to reboot all the switches on our network.


DAHDI (formerly Zaptel) Dummy and VMware ESXi

First, some background:

We run a number of production VoIP PBXs, several of which are Asterisk-based.

Historically one of the tricky things about pure-VoIP Asterisk deployments (those which talk pure SIP and IAX2) is that you still need a timing source if you want to do any conferencing, audio-playback or IAX2 trunking.

Asterisk doesn’t implement a timing source natively, but calls on DAHDI (formerly known as Zaptel) to achieve this.

For some time Zaptel, and now DAHDI, has shipped with a “dummy” module (formerly zaptel_dummy, now dahdi_dummy), which uses the “best” available operating system timing source to generate the 1kHz timing signal required by Asterisk.

On production boxes, where timing can be critical, it has been commonplace to use hardware line cards to obtain the 1kHz timing signal, even though no analogue lines are in use. In the UK these can be picked up in PCIe format for around £50, so it’s no big deal – although it does seem quite wasteful. This is what we’ve been doing for almost 3 years. Using hardware cards isn’t without its difficulties (the timing signal reaches DAHDI via interrupts, which isn’t bulletproof), although that’s outside the scope of this post.

If you don’t use a hardware card, the kernel module dahdi_dummy has a series of compile-time checks to determine what the best hardware timing source is. If you’re lucky, and you have a recent Linux kernel (>2.6) and a motherboard built in the last 5 years or so, you get HPET, which is the High-Precision Event Timer. This has sufficient granularity available to generate 1000 interrupts every second without too much difficulty.

With the HPET timer enabled for dahdi_dummy, you can get some pretty workable results, although I probably wouldn’t use this on a production system – we’ve not done enough benchmarking to be sure how it would perform under load.

Now, back to the point of my post:

With such a big push toward virtualisation at the moment, it’s been frustrating to have been unable to virtualise Asterisk PBXs due to the aforementioned timing issue; using mainstream environments such as VMware ESXi, it’s not possible to enable a PCIe hardware timing source, and VMware doesn’t provide a virtualised HPET (this is actually very well documented in this PDF).

However, I’ve recently been doing some reading about VMI extensions (VMware overview here), which in the simplest terms should allow interaction between a guest OS kernel and the host (ESX) kernel. This paravirtualisation not only provides better performance, but it also extends further functionality to the guest OS.

To make things even simpler, the latest Debian distro (Lenny) has the “VMI Paravirt Ops” compiled into the kernel that ships with the OS. If you’re not on Lenny yet this is a pretty decent guide to compiling VMI into your Debian kernel.

What this all means is that you can install the latest Debian release straight into ESXi, tick the VMI checkbox in the virtual machine settings via the VMware Client and you should be up and running with a paravirtualised kernel.

Note: you have to use the i386 (32-bit) Debian kernel, since ESX and ESXi 3.5 use a 32-bit host kernel. Unsurprisingly VMI isn’t operable between different architectures. If you’re wondering how you’re able to virtualise a 64-bit OS on ESXi 3.5, it’s thanks to those natty Intel VT-x and AMD-V extensions.

So, some testing followed:

Having installed a fresh Debian Lenny, I quickly downloaded and compiled DAHDI in order to run some tests on the timing. Bear in mind at this stage I’m just using the out-of-the-box kernel.

These were the initial dahdi_test results:

vmi-test:~# dahdi_test
Opened pseudo dahdi interface, measuring accuracy...
99.902% 99.978% 99.954% 99.883% 99.919% 99.897% 99.964% 99.959%
99.946% 99.802% 99.811% 99.832% 99.971% 99.964% 99.998% 99.924%
99.887% 99.924% 99.946% 99.976% 99.935% 99.917% 99.920% 99.983%
99.850% 99.969% 99.975% 99.952% 99.927% 99.959% 99.967% 99.960%
--- Results after 32 passes ---
Best: 99.998 -- Worst: 99.802 -- Average: 99.929693, Difference: 99.994844

Here’s the dmesg output from dahdi_dummy loading (be aware that the kernel will still ‘emulate’ an HPET source even when a hardware HPET isn’t available; in VMware guests, this causes some really interesting results!):

vmi-test:/# dmesg | grep dahdi
[  680.700134] dahdi: Telephony Interface Registered on major 196
[  680.702862] dahdi: Version: 2.1.0.4
[  681.768307] dahdi_dummy: Trying to load High Resolution Timer
[  681.768307] dahdi_dummy: Initialized High Resolution Timer
[  681.768307] dahdi_dummy: Starting High Resolution Timer
[  681.768307] dahdi_dummy: High Resolution Timer started, good to go
[  681.795983] dahdi: Registered tone zone 4 (United Kingdom)

Finally here’s the “proof” that VMI was operational:

vmi-test:/# dmesg | grep vmi
[    0.004000] vmi: registering clock event vmi-timer. mult=12551454 shift=22
[    0.193598] Booting paravirtualized kernel on vmi
[    1.067853] vmi: registering clock source khz=2992500

Conclusion?

It’s certainly a step forward from my previous attempts at running DAHDI / Zaptel in a virtualised environment; however, the results aren’t so encouraging that I would use it on a production system. We’ll be doing some more testing in due course, and in the meantime we’re saving a couple of these VMs as backup PBXs in the event of hardware failure elsewhere. It’s just too versatile not to.
