Delphi 2009 and UTF8

December 16, 2008

Today I had a UTF-8 string in Delphi 2009 and wanted to use StringReplace to escape some characters. No big deal, or so I thought.

var
  S: UTF8String;
begin
  S := UTF8Encode('Hello'#10'World!');
  S := StringReplace(S, #10, '\n', [rfReplaceAll]);
end;

The compiler accepted this code without an error, but emitted two warnings about implicit string conversions. OK, UTF8String is an “AnsiString”, so let’s add the AnsiStrings unit to the uses clause to get access to the AnsiString overload of StringReplace. But what’s that? The two warnings do not disappear. The compiler still prefers the Unicode version, so let’s call AnsiStrings.StringReplace explicitly. This doesn’t help either. On the contrary, now there are four warnings, one of them about potential data loss.

Looking at the generated code in the CPU view, I saw what the compiler had done to my code. It converts my UTF8String to a UnicodeString and then to an AnsiString(CP_ACP). It calls StringReplace, and the returned AnsiString(CP_ACP) is converted to a UnicodeString and back to my UTF8String. This doesn’t sound good: StringReplace is a slow function by itself, and these string conversions slow the call down even more.
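Written out by hand, the conversion chain described above corresponds roughly to the following. This is only a sketch of the steps, not the code the compiler actually emits; the program name and the temporary variables U and A are made up for illustration.

```delphi
program ConversionChain;
{$APPTYPE CONSOLE}

uses
  SysUtils, AnsiStrings;

var
  S: UTF8String;
  U: UnicodeString;  // hypothetical temporary, for illustration only
  A: AnsiString;     // hypothetical temporary, AnsiString(CP_ACP)
begin
  S := UTF8Encode('Hello'#10'World!');

  // Step 1: UTF-8 -> UTF-16
  U := UnicodeString(S);
  // Step 2: UTF-16 -> system ANSI code page; characters that do not
  // exist in that code page cannot survive this step
  A := AnsiString(U);
  // Step 3: the actual work, done on the CP_ACP string
  A := AnsiStrings.StringReplace(A, #10, '\n', [rfReplaceAll]);
  // Step 4: system ANSI code page -> UTF-16
  U := UnicodeString(A);
  // Step 5: UTF-16 -> UTF-8
  S := UTF8String(U);

  Writeln(U);
end.
```

For a pure ASCII string like this one, the round trip is only slow. For text containing characters outside the system code page, step 2 is where the potential data loss from the warning would occur.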

As a result, this simple call to StringReplace now looks like this:

var
  S: UTF8String;
begin
  S := UTF8Encode('Hello'#10'World!');
  S := RawByteString(AnsiStrings.StringReplace(RawByteString(S), #10, '\n', [rfReplaceAll]));
end;
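If the call is needed in more than one place, the casts can be hidden in a small helper. The function UTF8StringReplace below is hypothetical and not part of the RTL; it is just the call from above wrapped in a function, and it assumes the RawByteString casts behave as described in this post.

```delphi
program Utf8ReplaceDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils, AnsiStrings;

// Hypothetical helper (not in the RTL): applies the RawByteString casts
// from the post so that the call site stays free of casts.
function UTF8StringReplace(const S, OldPattern, NewPattern: UTF8String;
  Flags: TReplaceFlags): UTF8String;
begin
  Result := RawByteString(AnsiStrings.StringReplace(RawByteString(S),
    RawByteString(OldPattern), RawByteString(NewPattern), Flags));
end;

var
  S: UTF8String;
begin
  S := UTF8Encode('Hello'#10'World!');
  // The line feed is replaced by the two characters backslash and n.
  S := UTF8StringReplace(S, #10, '\n', [rfReplaceAll]);
  Writeln(UnicodeString(S));
end.
```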

2 thoughts on “Delphi 2009 and UTF8”

  1. Michael

    Hi Andy,
    I guess you already added this to QC, right 😉

  2. Jens Mühlenhoff

    Hello Michael,
    for this and for other reasons CodeGear should do this:

    QC #68764 Routines in AnsiStrings.pas should be declared with “RawByteString” parameters

    The routines in AnsiStrings.pas use AnsiString(CP_ACP) instead of RawByteString, which causes this problem. Please vote for it.
