In a recent newsgroup posting I promised that I would write a Delphi 2009 IDE plugin that automatically changes the meaning of string and Char to RawByteString and AnsiChar in the source code. My plan is to keep the editor and the files on disk untouched and only ansify the buffer that is handed to the IDE compiler.
The tool emulates what a real Unicode compiler switch would do. Nothing more (nothing less).
What this plugin has to do
- The parser must ignore identifiers in comments and strings. (Just in case somebody thinks the plugin will destroy his string constants)
- It is enabled with {$ANSISTRINGS ON} and disabled with {$ANSISTRINGS OFF}
- It must replace all occurrences of the identifier “string” with “RawByteString”
- It must replace all occurrences of the identifier “Char” with “AnsiChar”
- It must replace all occurrences of the identifier “PChar” with “PAnsiChar”
- It must surround all constant strings with “RawByteString(str)”, otherwise the compiler will prefer the Unicode versions of the RTL functions or has to do runtime typecasts (see the before/after sketch below this list)
- It must surround all constant chars with “AnsiChar(ch)”, otherwise …
- It must undefine the UNICODE define. This can be achieved by placing a “$UNDEF UNICODE” (*.inc files are also passed through the UUSwitch).
- It must add the “AnsiStrings” unit to the uses clause after “SysUtils”. Maybe its own implementation with RawByteString and some helper typedefs for the “ansification”. This is a lot more than the compiler switch would have done.
- Ansified *.dcu files should be written to “$(DCUOutput)\ANSI”, otherwise a mix of ANSI and Unicode *.dcu files will get you in trouble. This is not necessary with the $ANSISTRINGS switch.
- It should be possible to specify alternative search paths for ANSI *.dcu files. This is not necessary with the $ANSISTRINGS switch.
- It must remap all WinAPI Wide functions to the appropriate Ansi functions. This could be achieved by inserting a WindowsAnsi unit that does the mapping. This is a lot more than the compiler switch would have done.
- It must remap all WinAPI Wide structures to the appropriate Ansi structures. This could be achieved by inserting a WindowsAnsi unit that does the mapping. This is a lot more than the compiler switch would have done.
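To make the replacements more concrete, here is a rough before/after sketch of what a small piece of code could look like in the ansified compile buffer. The routine and its body are made up for illustration; the exact output of the plugin may of course differ.

// Source as it stays on disk and in the editor:
uses
  SysUtils;

procedure AppendLog(const Msg: string);
var
  Sep: Char;
  Line: string;
begin
  Sep := ':';
  Line := 'LOG' + Sep + Msg;
end;

// What the plugin would hand to the IDE compiler (buffer only):
{$UNDEF UNICODE}
uses
  SysUtils, AnsiStrings;

procedure AppendLog(const Msg: RawByteString);
var
  Sep: AnsiChar;
  Line: RawByteString;
begin
  Sep := AnsiChar(':');
  Line := RawByteString('LOG') + Sep + Msg;
end;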
Anything else that I’ve forgotten?
Issues that will occur due to the changes in the buffer
- The column info in compiler messages (hints, warnings, errors) is wrong if the line was altered in the buffer. This could be worked around by keeping track of all changes and then adjusting the column info appropriately.
- The debugger’s column info is also wrong (if that is ever used)
- Code Insight will have a major issue with the wrong column info. The best solution would be to let Code Insight work on the original Unicode code, or to use some tricks like not surrounding constant strings/chars and using “zzzrbs” for “RawByteString”, “zzza” for “AnsiChar” and “zzzpa” for “PAnsiChar”, keeping the identifiers the same length as “string”, “Char” and “PChar” (sketched below).
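A minimal sketch of the length-preserving trick, assuming a small helper unit (or an injected type block) provides aliases whose names are exactly as long as the identifiers they replace:

type
  zzzrbs = RawByteString; // 6 characters, same length as "string"
  zzza   = AnsiChar;      // 4 characters, same length as "Char"
  zzzpa  = PAnsiChar;     // 5 characters, same length as "PChar"

// "var S: string; C: Char; P: PChar;" would become
// "var S: zzzrbs; C: zzza; P: zzzpa;" without shifting a single column.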
What about the command line compiler?
That one is a different beast. There are ways to get the Ansifyer into the dcc32.exe process (DLL injection), but the command line compiler will be the last thing I’ll work on, especially because this project has only one thing in mind: show you what issues you would have had if CodeGear had implemented the Unicode switch. If it turns out that most “ansify” issues are insignificant, we could file a petition to get the switch, but first I have to write this plugin.
Add a comment if you have any further ideas (or concerns).
• It should be possible to specify alternative search paths for ANSI *.dcu files
• It must undefine the UNICODE define
• It must remap all WinAPI Wide functions to the appropriate Ansi functions
• It must remap all WinAPI Wide structures to the appropriate Ansi structures
• It must check if somebody confused {$IFDEF VER200} and {$IFDEF UNICODE}
“Anything else that I’ve forgotten?”
What about Char to AnsiChar and PChar to PAnsiChar?
Q
# It must undefine the UNICODE define
If you use the UNICODE define you really want to migrate to Unicode and not ansify the code. That is my point of view. And if UNICODE is left untouched, the VER200 point is a non-issue.
The other points are added.
# What about Char to AnsiChar
Already handled.
# PChar to PAnsiChar
Added.
On second thought, undefining UNICODE could make sense, but if somebody used UNICODE when he really meant VER200, then it’s his fault. A Unicode compiler switch wouldn’t check that either.
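For reference, this is the difference the {$IFDEF VER200} vs. {$IFDEF UNICODE} point is about; the comments give my reading of the two symbols:

{$IFDEF VER200}
  // Active for the Delphi 2009 compiler itself, regardless of what
  // "string" currently means.
{$ENDIF}

{$IFDEF UNICODE}
  // Active only while string = UnicodeString; the plugin's
  // "$UNDEF UNICODE" would switch this branch off in an ansified unit.
{$ENDIF}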
What would you do with PromptForFileName, …?
var
  S: string;
begin
  PromptForFileName(S);
end;
PromptForFileName(var S: string; …) doesn’t have an Ansi overload, but expects a (Unicode)String.
What about StrAlloc or other Delphi functions that return a PChar?
If you interact with Unicode code you must explicitly use UnicodeString. The plugin does not really ansify your code, it just changes the definition of “string” to “RawByteString” (= the old AnsiString). It does exactly what a Unicode compiler switch would do. Nothing more (nothing less). Actually, mapping the WinAPI Wide functions to the WinAPI ANSI functions is already a lot more than the compiler switch would have done.
And your code is one example of the issues you would run into if CodeGear had implemented the Unicode switch.
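To illustrate with the example above: in an ansified unit you would hand an explicit UnicodeString to the Unicode-only RTL routine and convert at the boundary. A rough sketch, assuming the plugin is active so that plain “string” means RawByteString (the procedure name is made up):

uses
  Dialogs;

procedure AskForFile;
var
  FileName: UnicodeString; // explicit Unicode type for the Unicode-only routine
  S: string;               // compiled as RawByteString in the ansified buffer
begin
  if PromptForFileName(FileName) then
    S := RawByteString(FileName); // explicit conversion at the boundary
end;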
What do you want to prove?
I think it was the right decision by CodeGear to force us into the age of Unicode. Be honest – with a magic switch we would stay in the Ansi world forever (or at least much longer). Sure, it’s a challenge, and I am sure that if someone can do it this way – you are the man.
Andreas, you have better things to provide for the community.
Please don’t waste your time on this project.
A very great idea.
However, the easiest way of Ansi/Unicode switching is using D2007 + D2009.
That’s a lot of work to stay in the past. How about putting the effort towards moving forward?
There are those, like me, who throw a tantrum, lie down and kick their legs, because we don’t want to spend time on all this Unicode stuff as it does not make us money, because we’re small and need to develop new things quickly to survive.
Thanks Andreas, we need some help like this.
What to do with the RTL functions that only have a Unicode variant and no Ansi variant anymore?
As an alternative to your ANSI dcu subdirectories and other manipulation, you could enable your ansification upon finding a custom compiler switch, which would then be active *only* for the unit.
Leave the task of having Ansi/Unicode versions to the developers; this way the conversion can take place incrementally. I.e. the switch wouldn’t be global, but per-unit. For units without WinAPI calls, the WinAPI ansification would be unnecessary (and could be handled manually).
Of course other units would see an ansified unit as exposing RawByteString, but that’s quite manageable, and you could thus ansify only legacy libraries (allowing you to port them later on, if so desired).
Am I right in assuming this would allow a D2009 .exe to run on older versions of Windows? (W95/98/ME)?
If so, and if you guys get this to work, I may just take the plunge and move up to D2009, despite the price here in Europe – I want to use the Generics!
Nice try, but a D2009 IDE plugin for using the D2007 RTL and compiler (if D2007 is installed) would be worth much more.
@Ritsaert Hornstra:
# What to do with the RTL functions that only have an Unicode
# variant and no Ansi variant anymore.
The switch only replaces the “string” in your units. Units that are compiled without the switch (like the VCL and RTL) are still Unicode units. And it is your job to change your code in a way that the “ansified” units work with the Unicode units.
@Eric:
# Ie. the switch wouldn’t be global, but per-unit.
That’s exactly what I came up with this morning when I woke up. A {$ANSISTRINGS ON} and {$ANSISTRINGS OFF} in the units could be implemented. This way you can enable the “ansification” for a single function, and the unit is always compiled in the correct Unicode/ANSI mode. One limitation would be that you cannot disable the switch by IFDEF-ing it, because that would require a full-featured parser that can also read *.dcu files because of the {$IF declared()} syntax. (Something only CodeGear can do.)
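A rough sketch of how such a per-function switch could look in a unit, assuming the proposed {$ANSISTRINGS} directive is consumed by the plugin before the compiler ever sees the buffer (the functions are made up for illustration):

unit LegacyParser;

interface

{$ANSISTRINGS ON}
// Within this region the plugin would compile "string" as RawByteString.
function AddCrLf(const Line: string): string;
{$ANSISTRINGS OFF}

// Back to the normal Delphi 2009 meaning: string = UnicodeString.
function DisplayName(const FileName: string): string;

implementation

uses
  SysUtils;

{$ANSISTRINGS ON}
function AddCrLf(const Line: string): string;
begin
  Result := Line + #13#10;
end;
{$ANSISTRINGS OFF}

function DisplayName(const FileName: string): string;
begin
  Result := ExtractFileName(FileName);
end;

end.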
@Ken Knopfli:
# Am I right in assuming this would allow a D2009 .exe to run on
# older versions of Windows? (W95/98/ME)?
The RTL and VCL will still be bound to the WinAPI Wide functions, which makes Win9x/ME a no-go.
@No One Special:
Have a look at my “Compiler Plugin” tool.
How about WideString?
“The RTL and VCL will still be bound to the WinAPI Wide functions, which makes Win9x/ME a no-go.”
Pity. I guess it’s back to UnicoWS.dll – which involves patching the .exe
Thanks for replying.
@Edwin Yip:
WideString stays a WideString. I don’t think a Unicode compiler switch would do anything with it, especially because it existed before CodeGear introduced the UnicodeString.