Experiment: Effect on Speed of Asterisk Priority Engine by Large Dialplans

Submitted by murf on 3 November 2006 - 7:18pm.
Performance vs. size


First, terminology. When I say context, I am speaking in terms of the extensions.conf file. A Context
is a container for a set of Extensions, which in turn are containers of Priorities. Extensions have
a "name", which can consist of a simple pattern. An individual Priority
usually consists of an application call, eg. VoicemailMain().

Now, within the bowels of Asterisk, each channel records the context name, the extension name, and the
priority number of the next priority to execute. The first thing Asterisk does is lookup the context,
then looks up the extension in that context, and then searches for the priority number to execute.
These three lookups get done for every line of code. Why? Millions of reasons: with goto's, etc, the
context/exten/priority is just plugged into the channel, and the engine will then fetch that.
Saving these refs by name makes it so the dialplan can be reloaded mid-stream, and for the most part
survive without a hitch.

The important thing to remember is that all 3 of these objects are stored in linear linked lists.

Now, the fun part. The lookups for the extensions and priorities get done quite often. So, I did studies of
the 3 individual lookups that are done to find an arbitrary priority to execute. I am currently using
threaded red-black binary trees instead of the simple linked-lists for contexts and priorities.
It's not LGPL code, so I can't use them in asterisk, but until I get LGPL'd code, or our own hashtabs,
I'm using them for an experiment.

The extension patterns are matched in a linear fashion, with no early cutoffs; that last extension
in the list might yield the best match. To speed this up, I take a different route. I take the first
character of each pattern, and I form a tree, where each level down is another, subsequent character set from
the patterns. Patterns that start with similar character sequences share a path down the tree.
The maximum depth of the tree coincides with the longest pattern in the set. To match, I
follow a path down thru the tree until the final leaf is reached. It is possible that one sequence
of characters can match more than one path down the tree. In this case, all possible paths
are followed, and the metrics followed along the way determine the 'best' match. This algorithm
is fairly simple, and much faster-- when you have a lot of patterns. Let's see, it's about
O(5*(averagePatternLength)). You form this tree after the context is created and populated.
It is stored in the ast_context structure.

Threaded red-black binary trees are a cousin of the AVL trees, binary trees that implement
algorithms to keep them optimally balanced at all times. They have log(n) search times,
which is pretty good compared to linked lists, which will have n/2 search times, where n
is the number of items in the structure. So, if your set has 1000 items, a linear search will
find the result, by looking at 500 items, on the average. With RB trees, you'd look thru
at less than 9 or 10 items on the average. Much nicer!

None of these three tree-based optimizations actually speed the code up. But they will keep
it from slowing down when the dial plans get big!

My studies show that the number of contexts, extensions, and priorities has a huge impact
on the speed of the pbx dialplan processing core. If any of the 3 gets big in number, then
the speed of the engine slows down in response, because of the linear searches. But how
slow? Is it worth worrying about?

To find out, I developed 24 dialplans, and timed a loop that is contained in each one, triggered
by dialing "81". In AEL, the loop looks like this:


81 => {
    iterations=1000000;
    Set(time1=${EPOCH});
    for(i=1; ${i}<${iterations}; i=${i}+1)
    {
        NoOp(Hello);
    }
    Set(time2=${EPOCH});
    Verbose(The time diff is $[${time2} - ${time1} ] seconds);
    Verbose(Which means that the priorities/sec = $[4* ${iterations} / (${time2} - ${time1}) ]);
    SayNumber($[4 * ${iterations} / (${time2} - ${time1}) ]);
}

The number that is calculated, I call "PIPS", Priority Instructions Per Second. It's not a real fancy way
of measuring performance, but it's a start.

The 24 dialplans are split into 3 groups of 8. The 3 groups are to measure Context lookups with varying number
of contexts, Extension matches with varying numbers of extensions, and priority lookups with varying numbers
of priorities.

Each of the three has a dialplan with 1, 10, 50, 100, 200, 300, 500, and 1000 items.
For priorities, this would be the number of NoOp() calls before the loop is included.
For extension patterns, this would be the number of added extensions after the extension that
contains the loop. For contexts, this is the number of contexts after the one that contains the
the timing loop. I describe this exactly, as this describes worst-case situations in asterisk.
Contexts near the end of the dialplan will run faster than contexts at the beginning. Priorities
with big numbers, take longer to reach. A context with lots of extensions, runs slower than
one with few.

Each loop cycles a million times thru. On my little experimental PC, running Ubuntu, this takes a
minimum of 43 seconds. So, the top speed of my machine is 93023 PIPS. The memory, processor
speed, number of background tasks, the phase of the moon, and the stock market prices all will
influence this number; it's mainly a relative measure of performance.

I ran these tests on the trunk version, then ran them again on my hacked version of asterisk, that
uses the new algorithms.

Take a look at the Image I've attached!

<<<<<<<  see attachment  >>>>>>>

The plot shows the results of the tests on varying context(blue), extension(green), and priority numbers (red).
The vertical axis shows the speed in PIPS; the horizontal axis reflects the number of objects.

The results: The current (trunk) version will bog down noticeably with big dialplans (see the 3 lines
plotted along the left and bottom of the graph). But with the new data structures in place, the performance is
fairly flat (the 3 lines across the top of the graph).

To further drive the slowdown point home, note that asterisk will run all instruction half as fast in a
context with over 30 or 40 extensions. With 200 extensions, every bit of code in that context will
run 1/10th as fast. And with 1000 extensions, 1/100th the speed (or less!). And if two things are big,
say the number of contexts is 1000, and the number of extensions is 1000, then everything in that
context will run roughly 1/10 times 1/5, or 50 times slower than "normal"!

If you are still awake, let me take note that from one run to the next, the times could easily vary by a second.
So, small variances in the lines above should not be significant.

One of my goals is to speed up asterisk, and keep it from degrading when things get extreme.
More reports to come.

My experiments to date are in the branch, http://svn.digium.com/svn/asterisk/team/murf/fast-ast
These fixes are just an experiment; the code in this branch is nowhere near finished or complete.
Right now, asterisk will not do well with reloading, I still have to finish fully implementing the changes.
But it was enough to conduct the experiment. Caveat Emptor!