The (unofficial) Unicode Switch is now available

By Andreas Hausladen | November 23, 2008

DLangExtensions 2.0 BETA now has a Unicode Switch extension. Because I kept it to the way a real compiler switch would have worked, it was easy to implement; it took me less than two hours. There is no command-line compiler extension yet. The old dcc32le.exe used a technique that wasn’t stable and could crash the dcc32.exe process, and I don’t know whether I will spend time changing that code. I would rather work on a Delphi 2009 version of DFMCheck.

Because the IDE plugin is still in BETA (and I don’t think that it will ever leave this state), you must install the IDE expert by hand or use GExperts to install it.

Download:

Name                      IDE Version  File                         Size       Downloads   Added
DLangExtensions 2.0 BETA  2009         DLangExtensions2009BETA.zip  187.32 KB  1753 times  2008-11-23

All features of DLangExtensions 2.0 are described in the Readme.txt.

WARNING: No official stable version is released. The development snapshot builds are work in progress and may not be stable.
Use at your own risk.

Installation (by hand)

  1. Copy DLangExtensions.dll and CompileInterceptorW.dll to $(BDS)\bin
  2. Optional: Copy DLangExt.exe to $(BDS)\bin
  3. Open Regedit and navigate to the HKCU\Software\CodeGear\BDS\6.0\Experts key (you must create the “Experts” sub-key if it doesn’t exist)
  4. Add a new String-Value to that registry key with the name “DLangExtensions” and set its value to “Your BDS Directory\bin\DLangExtensions.dll”.
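
Steps 3 and 4 can also be done from a small console program instead of Regedit. The following is only a sketch, assuming the standard TRegistry class from the Registry unit; the program name and the ExpertDllPath/RegisterExpert helpers are made up for illustration, and the BDS directory is passed on the command line rather than guessed.

```delphi
program RegisterDLangExpert;
{$APPTYPE CONSOLE}

uses
  Windows, Registry, SysUtils;

const
  // Key and value name as given in the installation steps above.
  ExpertsKey = 'Software\CodeGear\BDS\6.0\Experts';
  ExpertName = 'DLangExtensions';

// Hypothetical helper: builds "<BDS directory>\bin\DLangExtensions.dll".
function ExpertDllPath(const BdsDir: string): string;
begin
  Result := IncludeTrailingPathDelimiter(BdsDir) + 'bin\DLangExtensions.dll';
end;

// Hypothetical helper: writes the expert entry below HKCU.
procedure RegisterExpert(const BdsDir: string);
var
  Reg: TRegistry;
begin
  Reg := TRegistry.Create;
  try
    Reg.RootKey := HKEY_CURRENT_USER;
    // CanCreate = True creates the "Experts" sub-key if it doesn't exist.
    if not Reg.OpenKey(ExpertsKey, True) then
      raise Exception.Create('Cannot open ' + ExpertsKey);
    Reg.WriteString(ExpertName, ExpertDllPath(BdsDir));
  finally
    Reg.Free;
  end;
end;

begin
  if ParamCount < 1 then
    WriteLn('Usage: RegisterDLangExpert <BDS directory>')
  else
  begin
    RegisterExpert(ParamStr(1));
    WriteLn('Registered ', ExpertDllPath(ParamStr(1)));
  end;
end.
```

Run it with your BDS directory as the only argument and restart the IDE afterwards. Copying the DLLs (steps 1 and 2) still has to be done by hand.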


Description

The Unicode Switch changes the meaning of “string”, “Char” and “PChar” to “RawByteString”, “AnsiChar” and “PAnsiChar”. It also typecasts all constant strings and constant characters to “RawByteString” and “AnsiChar”. This reduces the number of implicit typecasts inserted by the compiler, and the ANSI functions are called instead of the Unicode versions that the compiler would otherwise prefer.
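
To make that mapping concrete, here is a hand-written approximation of what code looks like after the substitution. It is only a sketch of the idea, not the output of the extension; the program name and the CountChar function are invented for the example.

```delphi
program AnsiEquivalentSketch;
{$APPTYPE CONSOLE}

// Written by hand with the ANSI types spelled out. With the switch active
// you would write "string", "Char" and "PChar" here instead.
function CountChar(const S: RawByteString; C: AnsiChar): Integer;
var
  P: PAnsiChar;
begin
  Result := 0;
  P := PAnsiChar(S);   // points to a #0 terminator even for an empty string
  while P^ <> #0 do
  begin
    if P^ = C then
      Inc(Result);
    Inc(P);            // advances by one byte, not by one UTF-16 code unit
  end;
end;

var
  S: RawByteString;
begin
  // Constants are typecast explicitly, as the description says the switch does.
  S := RawByteString('hello world');
  WriteLn(CountChar(S, AnsiChar('o')));
end.
```

The explicit casts on the constants are what keep the compiler from treating the literals as Unicode and converting them implicitly.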

Enabling/Disabling the ANSI mode

The ANSI mode is activated by the new ANSISTRINGS switch.

{%ANSISTRINGS ON/ACTIVE/OFF}

The switch uses the %-character instead of the $-character because otherwise Error Insight will show you errors.

 

Switch   Description
ON       enables AnsiStrings, undefines UNICODE, adds alias types for Code Insight
ACTIVE   enables AnsiStrings, undefines UNICODE (Code Insight and compiler column info can be wrong)
OFF      disables AnsiStrings, defines UNICODE

Limitations

  • The {%ANSISTRINGS ON} must be inserted below the “uses” clause in the “interface” or “implementation” block. Thus it can’t be used between “begin” and “end”. This is because the switch must declare type aliases to keep the column information correct.
    {%ANSISTRINGS ACTIVE} doesn’t have this limitation.
  • The switch cannot be IFDEFed. IFDEFs are ignored.
  • The switch does not “ansify” your code; it just changes the meaning of “string”, “Char” and “PChar” to “RawByteString”, “AnsiChar” and “PAnsiChar”. Nothing more, nothing less.
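
A minimal sketch of a valid placement for {%ANSISTRINGS ON}, assuming the IDE expert is installed. The unit name and the GreetingLength function are hypothetical. Without the expert, the compiler treats {%...} as an ordinary comment, so the unit still compiles, but “string” stays Unicode.

```delphi
unit LegacyAnsiUnit;

interface

function GreetingLength: Integer;

implementation

uses
  SysUtils;

// Below the "uses" clause and outside any begin..end block,
// as required by the first limitation above.
{%ANSISTRINGS ON}

function GreetingLength: Integer;
var
  S: string;   // with the extension active, this means RawByteString
begin
  S := 'Hello';
  Result := Length(S);
end;

end.
```

The switch is not wrapped in an IFDEF here because, per the second limitation, IFDEFs around it are ignored.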

23 thoughts on “The (unofficial) Unicode Switch is now available”

  1. Esteban Pacheco

    Good Job CodeGear!

    This kind of features allow us users to slowly move into D2009. Keep up the great work!. Kudos to the entire team behind this Beta version of the “Unicode Switch”.

  2. Michael

    “Good Job CodeGear!”

    Esteban, I guess you misunderstood something, didn’t you? This is not the work of CodeGear but the work of a private person which does the job CodeGear should have done!

  3. Kyriacos Michael

    Thank you Andreas for all these great features.
    All the best.

  4. Eduardo

    Hi Andy!!!, you made an excellent work. Unicode Switch are fantastic. One more time you made the work of the CodeGear. Congratulations!!

    🙂

  5. Andreas Hausladen Post author

    This Unicode switch implementation is only there to prove that such a switch introduces lots of pitfalls and issues because you still have to interop with a Unicode RTL and VCL. Believe me, this switch doesn’t allow “users to slowly move into D2009”. Especially because “The switch does not ‘ansify’ your code”.

  6. Xepol

    Once again proving what all of us claimed at the start of the D2009 public feedback process – it was perfectly possible to add a switch in to make these changes not just easily, but safely by aliasing the types.

    However, unless CodeGear pulls its collective head out, using this 3rd party hack will just delay the inevitable migration. We all need to pressure CodeGear to adopt this approach directly in the compiler so we can depend on it.

  7. Andreas Hausladen Post author

    # it was perfectly possible to add a switch

    They (CodeGear) never said that it was impossible. But with Unicode as a breaking change you get a much cleaner code base where all the misuses of strings for Byte arrays and PChars for pointer arithmetic are wiped out. And the best part is that once you have done one successful migration, all the other migrations are straightforward. DLangExtensions (the non-IDE-interop code) was migrated in about 30 minutes, and in DLangExtensions string handling makes up 95% of the whole code base: converting from ANSI to UTF8, parsing UTF8 and also using UTF16 for things that come from the IDE.
    Now with Delphi 2009 I can clearly say what function uses UTF8 and that the UTF8String really contains UTF8 data. With Delphi 5-2007 this was more or less guessing and hoping that the string contained the text in the correct encoding. And I even found some functions that thought that they were parsing UTF8 strings while the input string was ANSI encoded. This bug never showed up because most source code doesn’t use non-ASCII characters for identifiers.

  8. Lilix

    Hi Andreas, i can use Unicode Switch in all files of VCL and RTL?

  9. Andreas Hausladen Post author

    No, because the RTL and VCL are already compiled. They are Unicode and stay Unicode. The switch is only a “unit local switch” and works in your own units. Activating the switch in Unit A doesn’t activate it for Unit B.

  10. Lilix

    ok, if i put the Switch in all files of the RTL and VCL and compile them, will it work fine?

  11. Andreas Hausladen Post author

    No, because the RTL and VCL already know about UnicodeString (UTF16). They use functions to convert from one encoding to another. Using the Unicode switch in such a scenario will lead to data loss during the string conversions. In order to “ansify” the RTL and VCL you would have to spend an enormous amount of time to make them ANSI compatible, not to mention the assembler functions and the WinAPI function calls.

  12. Anonymous

    ok Andreas, i wait codegear add a compiler option to compiler map string to AnsiString, in ten years maybe.
    🙂

  13. Andreas Hausladen Post author

    They would have to face the same problem and open up the problem for all 3rd party vendors. I hope and think that they will never implement this switch.

  14. Marius

    LOL, this is really cool, i really like the idea as once again somebody proves CG wrong. IMHO they should just have added an additional unistring type instead of changing the default types (which in the end creates more pain then its wurth)

    And btw; your switch would really work here as we have all third party sources available 😉 (shame we uninstalled delphi 2009 already)

  15. Josep

    Hi Alvaro… if i copy Dcc32.exe of Delphi 2007 Bin folder and past into bin folder of Delphi 2009 i have Delphi 2009 compilling equals Delphi 2007?

  16. JB

    Am I understanding this correctly, even with this switch I will still have tremendous pain converting my Delphi 7 apps to Delphi 2009?

  17. Andreas Hausladen Post author

    @Josep: This doesn’t work, especially because the IDE uses the dcc120.dll, which is not interface compatible with the dcc100.dll from Delphi 2007. Furthermore, dcc32.exe/dcc100.dll cannot use Delphi 2009 compiled units and packages.

    @JB: It’s the opposite: this switch will cause you more pain. I only implemented it to prove that such a switch isn’t the thing we need.

  18. JB

    I think CG must bring out another version where Unicode string is a seperate string type, that makes much more sense to me. That way there is no pain, just an additional feature.

  19. Andreas Hausladen Post author

    And all 3rd party vendors have to support both? Don’t you think that this would explode the price for components, not to mention the price for Delphi itself?

    Twice the time to develop and test => twice as much money.

  20. Jens Mühlenhoff

    I hope you haven’t just opened up Pandora’s Box. When I read the comments here, I think it’s amazing how people just don’t see the implications of having two sets of compiled VCL/RTL around. Of course Codegear must be very evil for not providing such a switch ;).

  21. rap

    Thanks for the unicode switch. By the way, is there any Delphi 2007 -> Delphi 2009 source conversion utility (making our D2007 source ready for D2009)?

  22. Fabricio

    Andreas,
    Seems you’re trying to say: “Man, this was fun but is a proof that an official ANSI switch really will cause more damage than helping reducing it”.
