Delphi Programming

and software in general.

Friday, December 3, 2010

A generic cache

Update: Eric Grange suggested a change to TStringList that speeds it up significantly and place it well in front of TCache. I will update the article to reflect this in the end of the week. See the article comments for the details.

In my previous article about a generics based case statement for strings, I commited many sins towards the Church Of Pure Pascal :)

One of them was to not check for prior art. Sergey Antonov aka 0xffff did something similar back in April 2010 (Note: two links!).

So, to make penance for my lack of purity (and have a chance to sin some more), I have tried to take the good parts of the concept and create something less ugly, and more comfortable to use.

I rewrote the generic class and named it TCache. I kept the configurable key type, and I made the lookup value configurable as well. Basically, it all ends up as a thin wrapper around a dictionary, but I quite like the simple declaration you can achieve with this approach.

Example declaration
var
  Cache : TCache<String, Integer>;
  i : Integer;
begin
  if not Assigned(Cache)
  then TCache<String, Integer>
   .Define(Cache, 0)
   ['alpha',   11]
   ['bravo',   22]
   ['charlie', 33]
   ['delta',   44]
   ['echo',    55]
   ['foxtrot', 66];

  i := Cache.Lookup('charlie');

Remember that <String, Integer> can be almost any type you like, including code (for the value part, at least).

Here is the class. Note that I also keep track of the index of the order each key/value was added. This could be removed. Also note that I still do dirty deeds, such as a dangerous cast. I guess I just suck at writing clean code ;).

unit GenericsCache;

/// Written by Lars Fosdal <lars@fosdal.com>, December 5, 2010

interface
uses
  SysUtils, Generics.Collections;

type
  TCacheEntry<T> = record
    Value: T;
    Index: Integer;
  end;

  TCache<KeyT, ValT> = class(TObjectDictionary<KeyT, TCacheEntry<ValT>>)
  private
    FCache: TCache<KeyT, ValT>;
    FDefaultValue: ValT;
    function AddValue(const Id: KeyT; const Value: ValT): TCache<KeyT, ValT>;
  protected
    function ValidateId(Id: KeyT): KeyT; virtual;
  public
    class function Define(var Cache; const aDefaultValue:ValT): TCache<KeyT, ValT>;
    function Lookup(const Id: KeyT):ValT;
    function Index(const Id: KeyT):Integer;
    property DefaultValue: ValT read FDefaultValue write FDefaultValue;
    property Values[const Id: KeyT; const Value: ValT]: TCache<KeyT, ValT> read AddValue; default;
  end;

  TCaseStringCache = class(TCache<String, String>)
    function ValidateId(Id: String): String; override;
  end;

implementation

{ TCache<KeyT, ValT> }

function TCache<KeyT, ValT>.AddValue(const Id: KeyT; const Value: ValT): TCache<KeyT, ValT>;
var
  Rec : TCacheEntry<ValT>;
begin
  Result := Self;
  Rec.Value := Value;
  Rec.Index := Count;
  Add(ValidateId(Id), Rec);
end;

class function TCache<KeyT, ValT>.Define(var Cache; const aDefaultValue: ValT): TCache<KeyT, ValT>;
begin
  Result := Create;
  Result.FCache := Result;
  Result.DefaultValue := aDefaultValue;
  TCache<KeyT, ValT>(Cache) := Result;
end;

function TCache<KeyT, ValT>.Index(const Id: KeyT): Integer;
var
  Rec : TCacheEntry<ValT>;
begin
  if TryGetValue(ValidateId(Id), Rec)
   then Result := Rec.Index
    else Result := -1;
end;

function TCache<KeyT, ValT>.Lookup(const Id: KeyT): ValT;
var
  Rec : TCacheEntry<ValT>;
begin
  if FCache.TryGetValue(ValidateId(Id), Rec)
   then Result := Rec.Value
    else Result := DefaultValue;
end;

function TCache<KeyT, ValT>.ValidateId(Id: KeyT): KeyT;
begin
  Result := Id;
end;

{ TCaseStringCache }

function TCaseStringCache.ValidateId(Id: String): String;
begin
  Result := LowerCase(Id);
end;

end.

I wrote a simple benchmark, testing different ways to use this, and also comparing it to do String2Index / Case -type lookup mechanisms as well as if/then/else, and the ugly method from my previous article. Several people suggested using a string to index / case approach. I have also used that many times. The painful part of strings to index, is that if you change the order of the strings, you also have to change the indices. TCache makes the index entirely optional, since the string is the index - if you see what I mean.

See below for the test code.

The Good, The Bad, and the Ugly.

The test uses GetTickCount and 5.000.000 iterations, for each method, repeated 10 times, with 12 strings (the numbers are consistant at 6 strings as well - except that string to index will be slightly faster) and the process priority was set to High to avoid other parts of the PC affecting the numbers. I weighted the results towards the results for the if/then/else. So Perf tells you how many times slower than if/then/else each test was.

MethodPerfAvgRunComment
TStringSwitch52.9435340.7The ugly was really ugly performance-wise as well.
AnsiIndexTextFunc12.408277.3String to Index, then case/anon.method is no speed demon either.
StringIndex12.058041.7Jolyon's variant is around the same speed.
AnsiIndexText12.038032.6If you remove anon.methods, the impact is not huge.
TStringList6.944631.6Using a pre-created sorted string list is nearly twice the speed of AnsiIndexText.
TCacheProc3.702471.1Cool, TCache and anon.methods are nearly twice the speed of a string list.
TCacheFunc3.712475.8No signficant difference between a procedure and function.
TCacheFuncStack3.722480.5Passing the anon.method by stack doesn't cost much either.
TCaseStringCache3.122085.6Case insensitive string to string saves some time over using anon.methods.
TCacheString2.931957.9So does eliminating the LowerCase function. TCache with string/string is 3 times slower than if/then/else, but it is 4 times faster than AnsiIndexText, and more than twice as fast as a string list.
If/then/else1667.6You can't beat this. You also can't enjoy maintaining it.
LastKey0.39260.4No surprise that indexed array lookups are fast. Yes, I know I am not looking up the same strings, but the cost is the same.


If you need to repeatedly do lookup by strings, you can benefit from using something like TCache. Also - for the AnsiIndexText - it does a sequential search for the match, hence if the list is long, or the later entries are more commonly used - it will degrade performancewise. Without having dissected TDictionary in detail, I would believe that it's hash table will allow TCache to remain relativly constant in performance, even if you add thousands of entries.

You could also shave off some more by eliminating the ValidateId methods.

Here is the test program (which also use the unit from my previous article).

program TestGenericsSwitch;
{$apptype Console}
uses
  ExceptionLog,
  Windows,
  Classes,
  SysUtils,
  StrUtils,
  GenericsCache in 'GenericsCache.pas',
  GenericsSwitch in 'GenericsSwitch.pas';

{$define Twelve}  // remove this to run with 6 strings

const
  {$ifndef Twelve}
  Elements = 6;
  {$else}
  Elements = 12;
  {$endif}
  TestCount = 5000000;
  Keys : Array[0..Elements] of String
   = ('alpha','bravo', 'charlie', 'delta', 'echo', 'foxtrot',
  {$ifdef Twelve}
      'golf', 'hotel', 'india', 'juliet', 'kilo', 'lima',
  {$endif}
      'what');

type
  TFunc = reference to function:String;
  TProc = reference to procedure;

function RandomKey:String;
begin
  Result := Keys[Random(Elements + 1)];
end;

function AssignTest:String;
var
  ix : Integer;
  s: String;
begin
  for ix := 0 to TestCount - 1
  do TStringSwitch.CaseOf(RandomKey)
    ['alpha', procedure begin
            s := 'Definitively any case';
          end]
    ['bravo', procedure begin
            s := 'B all you can B';
          end]
    ['charlie', procedure begin
            s := 'Checkpoint C';
          end]
    ['delta', procedure begin
            s := 'Checkpoint D';
          end]
    ['echo', procedure begin
            s := 'Checkpoint E';
          end]
    ['foxtrot', procedure begin
            s := 'Checkpoint F';
          end]
{$ifdef Twelve}
    ['golf', procedure begin
            s:= 'golf';
          end]
    ['hotel',procedure begin
            s:= 'hotel';
          end]
    ['india',procedure begin
            s:= 'india';
          end]
    ['juliet', procedure begin
            s:= 'juliet';
          end]
    ['kilo', procedure begin
            s:= 'kilo';
          end]
    ['lima', procedure begin
            s:= 'lima';
          end]
{$endif}
    .ElseCase(procedure begin
            s := 'Else what?';
          end)
    .EndCase;
  Result := s;
end;

function AssignTestIf:String;
var
  ix : Integer;
  s, t: String;
begin
  for ix := 0 to TestCount - 1 do
  begin
    t := LowerCase(RandomKey);
    if t = 'alpha' then
      s := 'Definitively any case'
    else if t = 'bravo' then
      s := 'B all you can B'
    else if t = 'charlie' then
      s := 'Checkpoint C'
    else if t = 'delta' then
      s := 'Checkpoint D'
    else if t = 'echo' then
      s := 'Checkpoint E'
    else if t = 'foxtrot' then
      s := 'Checkpoint F'
{$ifdef Twelve}
    else if t = 'golf' then
      s := 'golf'
    else if t = 'hotel' then
      s := 'hotel'
    else if t = 'india' then
      s := 'india'
    else if t = 'juliet' then
      s := 'juliet'
    else if t = 'kilo' then
      s := 'kilo'
    else if t = 'lima' then
      s := 'lima'
{$endif}
    else
      s := 'Else what?';
  end;
  Result := s;
end;

function AssignTestS2I:String;
var
  ix : Integer;
  s: String;
begin
  for ix := 0 to TestCount - 1
  do case AnsiIndexText(RandomKey, ['alpha', 'bravo', 'charlie', 'delta', 'echo', 'foxtrot'
{$ifdef Twelve}
    , 'golf', 'hotel', 'india', 'juliet', 'kilo', 'lima'
{$endif}
  ]) of
    0 : s := 'Definitively any case';
    1 : s := 'B all you can B';
    2 : s := 'Checkpoint C';
    3 : s := 'Checkpoint D';
    4 : s := 'Checkpoint E';
    5 : s := 'Checkpoint F';
{$ifdef Twelve}
    6 : s := 'golf';
    7 : s := 'hotel';
    8 : s := 'india';
    9 : s := 'juliet';
   10 : s := 'kilo';
   11 : s := 'lima';
{$endif}
   else s := 'Else what?';
  end;
  Result := s;
end;

function StringIndex(const aString: string; const aCases: array of string;
  const aCaseSensitive: Boolean): Integer;
begin
  if aCaseSensitive then
  begin
    for Result := 0 to Pred(Length(aCases)) do
      if ANSISameText(aString, aCases[Result]) then
        EXIT;
  end
  else
  begin
    for Result := 0 to Pred(Length(aCases)) do
      if ANSISameStr(aString, aCases[Result]) then
        EXIT;
  end;

  Result := -1;
end;

function AssignStringIndexFunc:String;
var
  ix : Integer;
  func : TFunc;
begin
  for ix := 0 to TestCount - 1
  do case StringIndex(RandomKey, ['alpha', 'bravo', 'charlie', 'delta', 'echo', 'foxtrot'
  {$ifdef Twelve}
    , 'golf', 'hotel', 'india', 'juliet', 'kilo', 'lima'
  {$endif}], false) of
    0 : func := function:String begin
          Result := 'Definitively any case';
        end;
    1 : func := function:String begin
          Result := 'B all you can B';
        end;
    2 : func := function:String begin
          Result := 'Checkpoint C';
        end;
    3 : func := function:String begin
          Result := 'Checkpoint D';
        end;
    4 : func := function:String begin
          Result := 'Checkpoint E';
        end;
    5 : func := function:String begin
          Result := 'Checkpoint F';
        end;
{$ifdef Twelve}
    6 : func := function:String begin
          Result:= 'golf';
        end;
    7 : func := function:String begin
          Result:= 'hotel';
        end;
    8 : func := function:String begin
          Result:= 'india';
        end;
    9 : func := function:String begin
          Result:= 'juliet';
        end;
    10: func := function:String begin
          Result:= 'kilo';
        end;
    11: func := function:String begin
          Result:= 'lima';
        end;
{$endif}
   else func := function:String begin
         Result := 'Else what?';
       end;
  end;
  Result := Func;
end;

function AssignTestS2IFunc:String;
var
  ix : Integer;
  func : TFunc;
begin
  for ix := 0 to TestCount - 1
  do case AnsiIndexText(RandomKey, ['alpha', 'bravo', 'charlie', 'delta', 'echo', 'foxtrot'
  {$ifdef Twelve}
    , 'golf', 'hotel', 'india', 'juliet', 'kilo', 'lima'
  {$endif}]) of
    0 : func := function:String begin
          Result := 'Definitively any case';
        end;
    1 : func := function:String begin
          Result := 'B all you can B';
        end;
    2 : func := function:String begin
          Result := 'Checkpoint C';
        end;
    3 : func := function:String begin
          Result := 'Checkpoint D';
        end;
    4 : func := function:String begin
          Result := 'Checkpoint E';
        end;
    5 : func := function:String begin
          Result := 'Checkpoint F';
        end;
{$ifdef Twelve}
    6 : func := function:String begin
          Result:= 'golf';
        end;
    7 : func := function:String begin
          Result:= 'hotel';
        end;
    8 : func := function:String begin
          Result:= 'india';
        end;
    9 : func := function:String begin
          Result:= 'juliet';
         end;
    10: func := function:String begin
          Result:= 'kilo';
        end;
    11: func := function:String begin
          Result:= 'lima';
        end;
{$endif}
   else func := function:String begin
         Result := 'Else what?';
       end;
  end;
  Result := Func;
end;

function AssignCaseStringCache:String;
var
  ix : Integer;
  s: String;
  Cache : TCaseStringCache;
begin
  TCaseStringCache.Define(Cache,
               'Else what?')
   ['alpha',   'Definitively any case']
   ['bravo',   'B all you can B']
   ['charlie', 'Checkpoint C']
   ['delta',   'Checkpoint D']
   ['echo',    'Checkpoint E']
   ['foxtrot', 'Checkpoint F']
{$ifdef Twelve}
   ['golf', 'golf']
   ['hotel', 'hotel']
   ['india', 'india']
   ['juliet', 'juliet']
   ['kilo', 'kilo']
   ['lima', 'lima']
{$endif};
  for ix := 0 to TestCount - 1
   do s := Cache.Lookup(RandomKey);
  Result := s;
end;

function AssignCacheString:String;
var
  ix : Integer;
  s: String;
  Cache : TCache< String, String>;
begin
  TCache<String, String>
   .Define(Cache, 'Else what?')
   ['alpha',   'Definitively any case']
   ['bravo',   'B all you can B']
   ['charlie', 'Checkpoint C']
   ['delta',   'Checkpoint D']
   ['echo',    'Checkpoint E']
   ['foxtrot', 'Checkpoint F']
{$ifdef Twelve}
   ['golf', 'golf']
   ['hotel', 'hotel']
   ['india', 'india']
   ['juliet', 'juliet']
   ['kilo', 'kilo']
   ['lima', 'lima']
{$endif};
  for ix := 0 to TestCount - 1
   do s := Cache.Lookup(RandomKey);
  Result := s;
end;

function AssignCacheProc:String;
var
  ix : Integer;
  s: String;
  Cache : TCache<String, TProc>;
begin
  TCache<String, TProc>.Define(Cache,
               procedure begin
                 s := 'Else what?';
               end)
   ['alpha',     procedure begin
                 s := 'Definitively any case';
               end]
   ['bravo',   procedure begin
                 s := 'B all you can B';
               end]
   ['charlie', procedure begin
                 s := 'Checkpoint C';
               end]
   ['delta',   procedure begin
                 s := 'Checkpoint D';
               end]
   ['echo',    procedure begin
                 s := 'Checkpoint E';
               end]
   ['foxtrot', procedure begin
                 s := 'Checkpoint F';
               end]
{$ifdef Twelve}
    ['golf',   procedure begin
                 s:= 'golf';
               end]
    ['hotel',  procedure begin
                 s:= 'hotel';
               end]
    ['india',  procedure begin
                 s:= 'india';
               end]
    ['juliet', procedure begin
                 s:= 'juliet';
               end]
    ['kilo',   procedure begin
                 s:= 'kilo';
               end]
    ['lima',   procedure begin
                 s:= 'lima';
               end]
{$endif};

  for ix := 0 to TestCount - 1
   do Cache.Lookup(RandomKey)();
  Result := s;
end;

function AssignCacheFuncStack:String;
var
  ix : Integer;
  s: String;
  Cache : TCache<String, TFunc>;
begin
  TCache<String, TFunc>.Define(Cache,
               function:String begin
                 Result := 'Else what?';
               end)
   ['alpha',     function:String begin
                 Result := 'Definitively any case';
               end]
   ['bravo',   function:String begin
                 Result := 'B all you can B';
               end]
   ['charlie', function:String begin
                 Result := 'Checkpoint C';
               end]
   ['delta',   function:String begin
                 Result := 'Checkpoint D';
               end]
   ['echo',    function:String begin
                 Result := 'Checkpoint E';
               end]
   ['foxtrot', function:String begin
                 Result := 'Checkpoint F';
               end]
{$ifdef Twelve}
    ['golf', function:String begin
            Result := 'golf';
          end]
    ['hotel',function:String begin
            Result := 'hotel';
                 end]
    ['india',function:String begin
            Result := 'india';
                 end]
    ['juliet', function:String begin
            Result := 'juliet';
          end]
    ['kilo', function:String begin
            Result := 'kilo';
          end]
    ['lima', function:String begin
            Result := 'lima';
          end]
{$endif};

  for ix := 0 to TestCount - 1
   do s := Cache.Lookup(RandomKey)();
  Result := s;
end;

function AssignCacheFunc:String;
var
  ix : Integer;
  s: String;
  func: TFunc;
  Cache : TCache<String, TFunc>;
begin
  TCache<String, TFunc>.Define(Cache,
               function:String begin
                 Result := 'Else what?';
               end)
   ['alpha',     function:String begin
                 Result := 'Definitively any case';
               end]
   ['bravo',   function:String begin
                 Result := 'B all you can B';
               end]
   ['charlie', function:String begin
                 Result := 'Checkpoint C';
               end]
   ['delta',   function:String begin
                 Result := 'Checkpoint D';
               end]
   ['echo',    function:String begin
                 Result := 'Checkpoint E';
               end]
   ['foxtrot', function:String begin
                 Result := 'Checkpoint F';
               end]
{$ifdef Twelve}
    ['golf', function:String begin
            Result := 'golf';
          end]
    ['hotel',function:String begin
            Result := 'hotel';
                 end]
    ['india',function:String begin
            Result := 'india';
                 end]
    ['juliet', function:String begin
            Result := 'juliet';
          end]
    ['kilo', function:String begin
            Result := 'kilo';
          end]
    ['lima', function:String begin
            Result := 'lima';
          end]
{$endif};

  for ix := 0 to TestCount - 1
   do begin
     func := Cache.Lookup(RandomKey);
     s := func;
   end;
  Result := s;
end;

function AssignStringList:String;
var
  ix, fx : Integer;
  s: String;
  func : TFunc;
  obj : TObject absolute func;
  StrList : TStringList;
begin
  StrList := TStringList.Create;

  StrList.AddObject('alpha', TObject(function:String begin
                 Result := 'Definitively any case';
               end));
  StrList.AddObject('bravo', TObject(function:String begin
                 Result := 'B all you can B';
               end));
  StrList.AddObject('charlie', TObject(function:String begin
                 Result := 'Checkpoint C';
               end));
  StrList.AddObject('delta', TObject(function:String begin
                 Result := 'Checkpoint D';
               end));
  StrList.AddObject('echo', TObject(function:String begin
                 Result := 'Checkpoint E';
               end));
  StrList.AddObject('foxtrot', TObject(function:String begin
                 Result := 'Checkpoint F';
               end));
//{$ifdef Twelve}
  StrList.AddObject('golf', TObject(function:String begin
            Result := 'golf';
          end));
  StrList.AddObject('hotel',TObject(function:String begin
            Result := 'hotel';
                 end));
  StrList.AddObject('india',TObject(function:String begin
            Result := 'india';
                 end));
  StrList.AddObject('juliet', TObject(function:String begin
            Result := 'juliet';
          end));
  StrList.AddObject('kilo', TObject(function:String begin
            Result := 'kilo';
          end));
  StrList.AddObject('lima', TObject(function:String begin
            Result := 'lima';
          end));
//{$endif}

  StrList.Sorted := True;

  for ix := 0 to TestCount - 1
   do begin
     fx := StrList.IndexOf(RandomKey);
     if fx >= 0
      then begin
        obj := StrList.Objects[fx];
        s := func;
      end
       else s := 'ElseWhat';
   end;
  Result := s;
end;

function TimeIt(proc: TFunc; name:String):Integer;
var
  start : Cardinal;
  LastLookup : String;
  oldSeed : Integer;
begin
  OldSeed := RandSeed;
  start := GetTickCount;
  LastLookup := Proc;
  Result := GetTickCount - start;
  Writeln(Format('%-18s %6d - %s', [name, Result, LastLookup]));
  RandSeed := OldSeed;
end;

function LastKey:String;
var
  ix : Integer;
  s : String;
  OldSeed : Integer;
begin
  OldSeed := RandSeed;
  for ix := 0 to TestCount - 1
   do s := RandomKey;
  RandSeed := OldSeed;
  Result := s;
end;

procedure Test;
const
  Repeats = 10;
var
  ix : Integer;
begin
  for ix := 0 to Repeats - 1
  do begin
    Randomize;
//    RandSeed := 2003112605;
    Writeln;
    Writeln(Format('strings=%d, repeats=%d, seed=%d', [Elements, TestCount, RandSeed]));
    TimeIt(AssignTest,'TStringSwitch');
    TimeIt(AssignTestS2IFunc, 'AnsiIndexText Func');
    TimeIt(AssignStringIndexFunc, 'StringIndex');
    TimeIt(AssignTestS2I, 'AnsiIndexText');
    TimeIt(AssignStringList,'TStringList');
    TimeIt(AssignCacheProc, 'TCache Proc');
    TimeIt(AssignCacheFunc, 'TCache Func');
    TimeIt(AssignCacheFuncStack, 'TCache Func Stack');
    TimeIt(AssignCaseStringCache, 'TCaseStringCache');
    TimeIt(AssignCacheString, 'TCache String');
    TimeIt(AssignTestIf, 'If/then/else');
    TimeIt(LastKey, 'LastKey');
  end;
end;

begin
  try
    Write('Press Enter to start: ');
    Readln;

    Test;

  finally
    Writeln;
    Write('Press Enter: ');
    Readln;
  end;
end.

5 comments:

  1. For TStringList, you forgot to include the fix I mentioned in my article, subclass TStringList, override CompareStrings with

    function TCIStringList.CompareStrings(const S1, S2: string): Integer;
    begin
    Result:=CompareStr(S1, S2);
    end;

    "CI" for case insensitive, the dictionary classes use a binary hash on the string, so the TStringList equivalent is a case-insensitive comparison.
    Your code is doing full-blown case-sensitive Unicode comparisons, which as I mentioned in my post are a lot more complex.
    And merely setting the TStringList to CaseSensitive=True isn't enough, you need the override fix.

    ReplyDelete
  2. The last bench results here are about

    TStringList - 3200
    TCIStringList - 800
    If/Then/Else - 600

    ReplyDelete
  3. @Eric - You are right, that makes a massive difference! I'll update the article at the end of the week. Too much to do this week.

    I wonder, is it possible to optimize the hash for the dictionary, further? Hash tables should be faster than binary searches on a list, or what?

    ReplyDelete
  4. Doubtless the hash could be improved, the TDictionnary uses HashLittle which is probably too complex for such a case. But a simpler hash would increase the risk of collisions, which could cripple the reliability of the dictionary performance in practical use.

    However in the grand scheme of things, I'm not convinced a hash can "win" in such a case, the strings are too short, and the complexity of a generic hash dictionary too high. It has to face a O(Ln n) binary search, which scales very well for both low and high entry counts.

    A hash dictionary needs both very complex comparisons and dynamic content, but here the content is static, so all the extra data structures of the dictionary, tailored for "dynamicity", are just a waste of RAM and CPU.

    If you were to add/remove strings constantly f.i., the TStringList underlying list structure would make it quite unsuitable. But then, a binary tree could probably be a better choice than a dictionary if the string are still of the same complexity.

    ReplyDelete
  5. A short note, from just reading the code : If I'm not mistaken, the FCache variable can be removed.

    And perhaps it's a good idea to let Lookup return a boolean (and return the value via an out variable), so you don't have to check for DefaultValue results (and can thus remove that too, unless you really need that sort of functionality).

    Oh, and some might call it dirty, but I rather like how you (ab)use the indexed default property to function as a repeatable setter!

    ReplyDelete