Delphi Programming

and software in general.

Thursday, July 14, 2011

Weird code snippet #2: Generic Double Linked List

They say Generics and pointers don't mix.

type
  PMyThing<T> = ^TMyThing<T> // [DCC Error] E2508 type parameters not allowed on this type
  TMyThing<T> = record
     Thing: T;
  end;

Ok, they don't. But there is a loophole!

type
  TMyThing<T> = record
    Thing: T;
    NextThing: ^TMyThing<T>;
  end;

Why this is allowed, I don't know. Just like I don't really understand why the first one is forbidden. There is probably some good explanation for it.

Still - it can be fun breaking the rules!

unit GenericDoubleLinkedList;

interface
uses
  Classes, Generics.Defaults;

type
  TLinkVisitor<T> = reference to procedure(const Item: T);

  TDoubleLinked<T> = record
    Value: T;
    PrevLink: ^TDoubleLinked<T>; // Hey, it compiles!
    NextLink: ^TDoubleLinked<T>;
    constructor Create(aValue:T);
    function Add(aValue:T): TDoubleLinked<T>;
    function HasNext:Boolean;
    function Next: TDoubleLinked<T>;
    function HasPrev:Boolean;
    function Prev: TDoubleLinked<T>;
    function First: TDoubleLinked<T>;
    function Last: TDoubleLinked<T>;
    procedure ForEach(const Proc: TLinkVisitor<T>);
  end;

  procedure Test(const Log:TStrings);

implementation

{ TDoubleLinked<T> }

constructor TDoubleLinked<T>.Create(aValue: T);
begin
  Value := aValue;
  NextLink := nil;
  PrevLink := nil;
end;

function TDoubleLinked<T>.Add(aValue: T): TDoubleLinked<T>;
var
  p: ^TDoubleLinked<T>; // But this one is not assignment compatible
begin
  p := AllocMem(SizeOf(TDoubleLinked<T>)); // Make space
  p^ := Self;      // Copy current value to allocated block
  Value := aValue; // Set self to new value
  p.NextLink := @Self;
  if Assigned(p.PrevLink) // Fix up previous nextlink
   then Pointer(p.PrevLink.NextLink) := Pointer(p);
  Pointer(PrevLink) := Pointer(p);  // Point back to old value
  Result := Self;
end;

function TDoubleLinked<T>.HasPrev: Boolean;
begin
  Result := PrevLink <> nil;
end;

function TDoubleLinked<T>.Prev: TDoubleLinked<T>;
begin
  Result := TDoubleLinked<T>(PrevLink^)
end;

function TDoubleLinked<T>.HasNext: Boolean;
begin
  Result := NextLink <> nil;
end;

function TDoubleLinked<T>.Next: TDoubleLinked<T>;
begin
  Result := TDoubleLinked<T>(NextLink^)
end;

function TDoubleLinked<T>.First: TDoubleLinked<T>;
begin
  Result := Self;
  while Result.HasPrev
   do Result := Result.Prev;
end;

function TDoubleLinked<T>.Last: TDoubleLinked<T>;
begin
  Result := Self;
  while Result.HasNext
   do Result := Result.Next;
end;

procedure TDoubleLinked<T>.ForEach(const Proc: TLinkVisitor<T>);
var
  Node: TDoubleLinked<T>;
begin
  Node := First;
  Proc(Node.Value);
  while Node.HasNext
  do begin
    Node := Node.Next;
    Proc(Node.Value);
  end;
end;

procedure Test(const Log:TStrings);
var
  List, Node : TDoubleLinked<String>;
begin
  List.Create('One');
  List.Add('Two');
  List.Add('Three');
  Node := List;  // Bad idea
  List.Add('Four');
  Node.Add('ThreeAndAHalf');

  List.ForEach(
    procedure(const Value:String)
    begin
      Log.Add('List: ' + Value)
    end);

  Node.ForEach(
    procedure(const Value:String)
    begin
      Log.Add('Node: ' + Value)
    end);
end;

end.

The problem is that "List" is not a pointer, but the tail item of the list. Hence, a Delete procedure needs to take this into consideration.

Even worse, if you add a second Node variable to point to something in the list, that reference will not be fixed up after adding 'Four', and hence it will take the tail place of the list - for both references, effectively forgetting the 'Four' item.

So, although this was somewhat entertaining, a mix of Generics and pointers probably isn't something we should make use of.

Exercise for the reader: Implement the TDoubleLinked<T>.Delete; procedure.

End question: Why are we not allowed to declare pointers to generic types?

20 comments:

  1. This is a compiler bug, as Barry mentioned here: http://stackoverflow.com/questions/793472/pointer-to-generic-type

    ReplyDelete
  2. Thanks, Stefan! Good to know!
    There is a QC issue on it as well: http://qc.embarcadero.com/wc/qcmain.aspx?d=66584

    ReplyDelete
  3. I don't understand the Delete issue, or more exactly, the reason List and Node are declared as TDoubleLinked rather than ^TDoubleLinked. Could you explain...?

    ReplyDelete
  4. My eyes, they are bleeding.

    W

    ReplyDelete
  5. Chris - the ^TDoubleLinked<T> will not be assignment compatible as the pointer type cannot be declared, hence the non-pointer ref.

    The Delete issue is that since the tail end is a non-pointer, it cannot be disposed, so you would have to shuffle data from the "neighbour" pointer reference into the List, and dispose that pointer reference instead.

    Warren - I know... would be a lot nicer if generic pointers actually were allowed.

    ReplyDelete
  6. I realise what the Delete issue is, but it only arises due to how you've decided to code the list itself, which is weird in itself IMO (e.g., in having Add change the contents of Self 'in place' and returning a pointer to what was Self, rather than keeping Self as it is and returning a pointer to the new node). If you were to write the list more straightforwardly, Delete becomes very easy.

    As I wasn't sure how to post code in a Blogger comments box, I've blogged about the topic myself instead: http://delphihaven.wordpress.com/2011/07/14/weird-in-more-ways-than-one/

    ReplyDelete
  7. Chris - as commented on your blog - only by sacrificing type safety using raw pointers - which remove the reason for using generics alltogether.

    ReplyDelete
  8. Only structured types can be genericized. A pointer is not a structured type.

    When you declare:
    type
    TGenRec = record x:T; end;

    nothing will happen. The compiler does accept it, because there is a parameter T and a usage x:T;

    But when you then declare:
    var
    rec : TGenRec;

    a new type is created called TGenRec;

    When you try:
    type
    PRec = ^TGenRec;

    There is no instance usage of T, so the parameter can never be applied in response to a type argument.

    This is why you cannot declare a pointer to a generic record.

    ReplyDelete
  9. Lars - as I say over there, if type safety is your utter concern, you shouldn't be using pointers at all.

    ReplyDelete
  10. I didn't. At least not on the outside...

    ReplyDelete
  11. Corrie - I think Blogger ate your <T> so it is a bit hard to follow your example.

    But consider
    type
      PLink<T> = ^TLink<T>;
      TLink<T> = record
        value: T;
      end;

    Are you saying that the compiler would not be able to understand how to interpret the referential integrity of a variable of the type PLink<T>?

    ReplyDelete
  12. Hi Lars,

    I was tried to correct my post, but I had network issues:( Yes, Blogger ate the angle brackets. I will try again, and explain in another way.

    Consider this:
    type
    TRec1<T> = record x:T; end; // no type exists yet
    TRec2<T> = record x:T; end; // no type exists yet

    1. No types exist yet. This is important.
    2. Also the T in TRec1 does not in any way relate to the T in TRec2.

    var
    r1 : TRec1<Integer>; // Only now is a type created

    There is nothing to link the following pointer declaration attempt to:
    type
    PRec1<T> = ^TRec1<T>; // cannot compile

    This is why the following has to be done to compile:

    type
    TRec3 = TRec1<Integer>; // Only now, when there is actually a type argument, is a type TRec1<Integer> created

    So now only can this compile:
    type
    PRec3 = ^TRec3; // Declaration can precede TRec3 as well, it will still work

    PS: The preview showed all of the angle brackets, and all of the types correctly. I hope the actual final post will still look correct!

    ReplyDelete
  13. Right - I get that this doesn't work now - but I don't see how it can't be made to work technically.

    It might be complicated for a single pass compiler, but basically it is a matter of unwrapping nested declarations. Nested declarations in general are not allowed and probably for a good reason. It would quickly become possible to construct a huge dependency tree that way, but if the compiler applied the same model for records as for classes - we would get a very powerful, flexible, and type safe system for extendable memory structures.

    For a class:
    type
      TBaseClass<T> = class
        property Value:T;
    end;
    you can't do
      TExtClass<T> = TBaseClass<T>;
    but you can do
      TExtClass<T> = class(TBaseClass<T>);

    Staying with that model, it would work to allow
    type
      RBaseRec<T> = record
        Value:T;
      end;
      PBaseRec = ^record(RBaseRec);

    The ^ is simply notation for how the instance will be referenced.

    Aggregated records would be really handy for protocol buffers.

    ReplyDelete
  14. bah... forgot to escape the brackets again in that last ^record(RBaseRec<T>);

    ReplyDelete
  15. And again...
    PBaseRec<T> = ^record(RBaseRec<T>);

    ReplyDelete
  16. No, I don't think you understand.

    It is impossible to compile this in Pascal:
    type
      PRec<T> = ^TRec<T>;
      TRec<T> = record x:T; end;

    There would have to be a rule that the T must be unique across all generic type parameters in at least unit scope for this to work.

    So there would have to be a rule that this is illegal:
    <pre>
    type
      TRec1<T> = record v:T; end; // the usual
      TRec2<X> = record v:X; end; // still ok
      TRec3<T> = record v:T; end; // Not ok under hypothetical new rule. T has been used already.
    </pre>

    ReplyDelete
  17. Hmm. I really don't follow you here.

    AFAIK, the left side T is declarative, and right side T is referential, and the scope is limited to the construct, hence you can reuse T as much as you like across declarations?

    ReplyDelete
  18. Yes, but every type declaration is a new beginning. You cannot use T in a record definition, and later on refer to the SAME T in another record definition.

    See?

    In the same way, you cannot use T in a record definition, and then later refer to the SAME T in a pointer definition.

    So, in the following code:

    type
      TRec1<T> = record v:T; end;
      TRec2<X> = record v:X; end;
      TRec3<T> = record v:T; end;

    The first T is only for TRec1. You cannot reference that T in any of the subsequent record declarations.

    Exactly like the T <> X in the above snippet, and no record can refer to the X except for TRec2.

    And the first T (of TRec1) is NOT the same T as TRec3.

    And for that very same reason, no other type can refer to the X (or the T).

    That is why, in the following snippet:

    type
      TRec1<T> = record v:T; end;
      PRec1<T> = ^TRec1<T>

    ...the two T's are not the same T.

    And they can never be the same T under current Object Pascal syntactical rules.

    The addition of requiring that all generic params (the T) be unique would make it possible to correctly associate PRec1<T> with TRec1<T>, but otherwise it's impossible.


    You see, a key concept to note is that the following declaration:
    type
      TRec1<T> = record v:T; end;

    DOES NOT cause a type to be created.

    ReplyDelete
  19. Did you not read "There would have to be a rule" and " hypothetical new rule" ?

    It is obvious that I'm suggesting a way for your generic pointers to work with a new syntax rule.

    Of course it is true that "the scope is limited to the construct, hence you can reuse T as much as you like across declarations"

    But that is one of the two things that disallow the generic record pointers your article suggests.

    The other thing is that a generic type is not a real type until the T becomes an actual type argument.

    ReplyDelete
  20. Ok. Got it. Not possible with todays rules and type generation. For it to work - the ^ would have to trigger a new rule, creating one type with two referential models when the parametric type is known after an instance declaration.

    Duplicate references would need to resolve to same type instance, I guess.

    ReplyDelete