Measuring PowerShell hashtable performance

Usually, PowerShell is used as a “glue” to stitch a bunch of commands and programs together. It does not need to be a speed demon to do that (and nobody says it is); flexibility comes at a price. But there are cases where you’re doing seemingly trivial things, yet your script takes ages to finish.

There is a useful cmdlet, Measure-Command, that measures how long a piece of code takes to run. The usage is very simple:

$timespan = Measure-Command {
    # do whatever you want to measure here
}
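
Measure-Command returns a System.TimeSpan, so the elapsed time is easy to read back. Here is a quick, self-contained example (Start-Sleep just stands in for the code under test):

$timespan = Measure-Command {
    Start-Sleep -Milliseconds 200   # placeholder for the real work
}
Write-Host "Elapsed: $($timespan.TotalMilliseconds) ms"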

That’s nice if you know or suspect which part of the code is slow. But I would like to have something that’s more like instrumentation. What I want is a list of called functions with their total run times and number of calls.

That’s why I created a little wrapper around Measure-Command, called Measure-Function, that makes it easy to gather measurements for multiple functions. So now, if I have a function that I want to measure:

  function Get-Something {
    # i'm doing some heavy loading here
    return $something
  }

I just wrap the body with Measure-Function like this:

  function Get-Something {
    Measure-Function "$($MyInvocation.MyCommand.Name)" {
      # i'm doing some heavy loading here
      return $something
    }
  }

Measure-Function takes care of aggregating the measurements, and makes sure not to double-count recursive invocations. To get the results, do:

$global:perfcounters | format-table -AutoSize -Wrap | out-string | write-host
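
The wrapper itself is not listed here, but a minimal sketch of what it could look like may help. Note that this is an assumption about the implementation - the $global:measuredNow bookkeeping and the exact shape of $global:perfcounters are mine - not the original code:

if (-not $global:perfcounters) { $global:perfcounters = @() }   # aggregated results
if (-not $global:measuredNow)  { $global:measuredNow  = @{} }   # functions currently being measured

function Measure-Function {
    param(
        [string] $name,
        [scriptblock] $body
    )

    # if this function is already being measured higher up the call stack,
    # run the body directly so recursive invocations aren't counted twice
    if ($global:measuredNow[$name]) {
        return & $body
    }

    $global:measuredNow[$name] = $true
    $capture = @{}
    try {
        # capture the body's output into a hashtable so it survives
        # the Measure-Command script block regardless of scoping
        $elapsed = Measure-Command { $capture.Output = & $body }
    }
    finally {
        $global:measuredNow.Remove($name)
    }

    # aggregate: add elapsed time and bump the call counter for this name
    $counter = $global:perfcounters | Where-Object { $_.name -eq $name }
    if ($counter) {
        $counter.elapsed += $elapsed
        $counter.count++
    }
    else {
        $global:perfcounters += [pscustomobject]@{
            name    = $name
            elapsed = $elapsed
            count   = 1
        }
    }

    return $capture.Output
}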

Now, to pinpoint bottlenecks in your code, you can follow these steps:

  1. Start with the entry point of your script and add Measure-Function to it and to the functions it calls.
  2. Run the code and see which function takes the most time.
  3. Repeat these steps for the slowest functions, until you find the bottleneck.

PowerShell hashtable quirks

One of the things I discovered using the aforementioned method was in a place I really wasn’t expecting - enumerating a hashtable. That should be blazingly fast, even in PowerShell! As it turns out, it can be awfully slow - if you’re not careful enough.

Take a look at these three simple scenarios:

# $h is a hashtable with 10000 entries
$size = 10000
$h = @{}
for ($i = 0; $i -lt $size; $i++) {
    $h += @{ "key$i" = "value$i" }
}

measure-function "enumerating $($h.count) items by enumerator" {
    foreach ($e in $h.GetEnumerator()) {
        $k = $e.key
        $v = $e.value
    }
}

measure-function "enumerating $($h.count) items by keys" {
    foreach ($k in $h.keys) {
        $v = $h[$k]
    }
}

measure-function "enumerating $($h.count) items with property accessor" {
    foreach ($k in $h.keys) {
        $v = $h.$k
    }
}

$global:perfcounters | format-table -AutoSize -Wrap | out-string | write-host

Each loop is enumerating over a hashtable and accessing stored values. Should be a matter of milliseconds, right? Well, let’s see…

name                                           elapsed          count
----                                           -------          -----
enumerating 10000 items with property accessor 00:00:30.4342957     1
enumerating 10000 items by keys                00:00:00.0479557     1
enumerating 10000 items by enumerator          00:00:00.1173057     1

As it turns out, accessing hashtable values through the property accessor takes over 600 times longer than indexing by key!

At first glance, I would expect the form $h.$k to be just syntactic sugar for $h[$k]. But it really isn’t (and can’t be) that simple. $k may not only be a key inside the hashtable - it may just as well be a property, like Count, or a method, like ContainsKey. So underneath, PowerShell has to do some really time-consuming work - invoking reflection, dynamic member resolution, and whatnot - just to get you a value from the hashtable.
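
To get a feeling for the ambiguity the dot accessor has to resolve, consider a key that collides with a real member. This is just an illustration - the psbase trick is the one described in about_Hash_Tables, and the exact resolution rules may differ between PowerShell versions:

$h = @{ Keys = "just a value" }

$h.Keys          # dot access resolves to the entry stored under the 'Keys' key
$h['Keys']       # index access always means "look up this key"
$h.psbase.Keys   # reaches the actual Keys property of the underlying hashtable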

The conclusion is simple: if you know you’re working with a potentially big hashtable, don’t go for shortcuts - use plain old $h[$k]. But if you’re not in a tight loop, just go with whatever you find more readable.
