The Return of the Byte-Strings

By | October 23, 2013

Delphi’s NextGen compiler (Android, iOS) removed support for UTF8String, AnsiString and RawByteString. But if you look into System.pas, you see that those types are still there; Embarcadero makes them inaccessible from outside System.pas by prefixing them with an underscore, which the compiler converts to an at-sign. And you can’t write “@UTF8String” because it is not a valid identifier.

By patching DCU files it is possible to make those hidden types accessible. And guess what, the compiler generates correct code for the “unsupported” strings.

The unit System.ByteStrings reintroduces:

  • ShortString
  • AnsiString
  • AnsiChar
  • PAnsiChar
  • PPAnsiChar
  • UTF8String
  • PUTF8String
  • RawByteString
  • PRawByteString

Usage:
Add the System.ByteStrings.dcu’s path to the compiler’s search path and add the unit to your uses clauses.

There is no *.PAS file because the DCU is patched with a hex editor to get access to the hidden types.

Name                IDE Version       File                  Size     Downloads   Added
System.ByteStrings  XE5 RTM/UP1 only  XE5ByteStrings.7z     2.45 KB  1578 times  2013-10-23
System.ByteStrings  XE5 UP2 only      XE5Up2ByteStrings.7z  2.85 KB  1504 times  2013-12-20
System.ByteStrings  XE6               XE6ByteStrings.7z     2.89 KB  1337 times  2014-04-16
System.ByteStrings  XE7               XE7ByteStrings.7z     2.89 KB  1669 times  2015-01-20
System.ByteStrings  XE8               XE8ByteStrings.7z     3.69 KB  1725 times  2015-04-16
System.ByteStrings  10 Seattle        D10ByteStrings.7z     3.67 KB  1906 times  2015-09-01
System.ByteStrings  10.1 Berlin       D101ByteStrings.7z    3.72 KB  2377 times  2016-05-31

49 thoughts on “The Return of the Byte-Strings”

  1. JoeD

    Wow… didn’t know they disabled byte strings. What are we supposed to do if we have single-byte text to work with? It’s always two steps forward, two steps back with Emb. They move forward with Android support… and then right back in the other direction with their ‘language improvements’.

  2. Joseph

    They did it because it’s 2013, Unicode reigns, and you don’t need four different string types. Python declared having TWO different string types one of the worst mistakes they ever made and broke compatibility to bring it back down to one. That Delphi went and created four AFTER the Python incident is embarrassing. It took them the same amount of time, too, to decide it was a mistake (that they could have avoided). Now they’re trying to bring the compiler in line with modern language standards. Immutable strings and zero-based strings are some other things that are coming.

    This trick may work now, but don’t expect it to in the future. We can’t live in the past anymore. If Delphi doesn’t grow up soon it’s going to die. A new compiler gives the opportunity to clean up a lot of the language’s mistakes and legacy code. We shouldn’t fight them on this.

    1. LDS

      Frankly, what Python does is not of interest to us. Python has to rely on bindings written in C for what it can’t access natively, exactly because it is limited by design. Face it: the world doesn’t use a single string type and thereby a tool able to access low-level data must be able to handle different encodings – not a single one. Python was never designed with low-level access in mind, but Delphi is.
      Python is not *the language*, it’s just one language good at doing what it is designed for, and bad outside its “envelope”. “Modern” in your language just means “fashionable”. There’s nothing “modern” in having less options and being forced into many unneeded conversions “just because” – or because too many “script kiddies” have no idea about how a CPU works and how data are encoded. I hate programming languages that try to “shield” me from low level details. Thank you, but I’m not a child and I don’t need to be “protected”. A Java-ized, C#-ized and Python-ized Delphi is really useless, not “modern”. It’s a strange idea that being “modern” means “being like everything else – without any advantage”.

      1. Joseph

        >Frankly, what Python does is not of interest to us.

        And that’s how Delphi has ended up isolated and cut off from the mainstream – we pay no attention to what anyone else is doing. Finally they’re starting to pay attention at Embarcadero. If they had been paying attention to what Python was doing, they’d have heard its creator lament that introducing multiple string types was one of the worst mistakes the language ever made and he was going to break backwards compatibility to fix it. They’d have seen the reports that were produced that saw string-related questions shoot up on StackOverflow, the numerous implicit conversion problems caused by multiple string types, etc. Instead, Embarcadero paid no attention and added twice as many string types as Python did and the same number of years later are also publicly saying that it led to implicit conversions all over the place and was a mistake.

        > Python has to rely on bindings written in C for what it can’t access
        >natively, exactly because it is limited by design.

        The Python reference interpreter is written in C; Python has also been implemented on the .NET and JVM platforms and also self-hosted via a restricted subset of Python called RPython. The reference interpreter’s use of C and C Types was done explicitly to make it very easy to embed in other programs and to call C code from Python.

        >Face it: the world doesn’t use a single string type

        Which languages are you speaking of?

        > and thereby a tool able to access low-level data must be able to handle
        >different encodings – not a single one.

        Python handles different encodings just fine (in fact, easier/better than Delphi). Python 3.x understands that a string is a collection of characters (an abstract concept) and that an encoding such as UTF-8 produces a collection of bytes. Bytes are decoded into strings and strings can be encoded into bytes, but they’re not the same thing. As Mark Pilgrim put it,

        >In Python 3, all strings are sequences of Unicode characters. There is no
        >such thing as a Python string encoded in UTF-8, or a Python string
        >encoded as CP-1252. “Is this string UTF-8?” is an invalid question.
        >UTF-8 is a way of encoding characters as a sequence of bytes. If you want
        >to take a string and turn it into a sequence of bytes in a particular
        >character encoding, Python 3 can help you with that. If you want to take a
        >sequence of bytes and turn it into a string, Python 3 can help you with
        >that too. Bytes are not characters; bytes are bytes. Characters are an
        >abstraction. A string is a sequence of those abstractions.

        In Python, unlike in Delphi, the length of a string gives the number of characters, never the number of bytes that are used internally to store it (which alone is enough to realize there’s something wrong and confusing about how Delphi is currently handling strings).

        some_bytes = a_string.encode('UTF-8')

        will return a sequence of bytes that represent the a_string variable encoded in UTF-8.

        some_bytes.decode('UTF-8')

        This will convert the bytes back into a string. Taking the length of some_bytes returns the number of bytes used, which makes sense.

        Bytes come in from the outside – you decode them as early as possible. Afterwards you have and work with strings. Only when you need to pass these strings to the outside world again – such as saving to a file – do you encode them back into arrays of bytes. Simple, easier, and you avoid all of the implicit conversion problems that occur otherwise. It’s Unicode conversion issues that are the heart of Delphi’s regular expression bug that results in incredibly long times to perform regex requests – the code is converting back and forth all over the place.
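
        A minimal sketch of the decode-early, encode-late pattern described above, in Python 3. The helper names read_text and write_text and the sample text are assumptions made for illustration; only str.encode, bytes.decode and len are standard.

```python
# Hypothetical helpers for illustration; they are not part of any library.
def read_text(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode bytes into a string at the boundary where data comes in."""
    return raw.decode(encoding)


def write_text(text: str, encoding: str = "utf-8") -> bytes:
    """Encode a string into bytes at the boundary where data goes out."""
    return text.encode(encoding)


# Bytes arriving from "outside" (a file, a socket, ...); sample data only.
incoming = "na\u00efve caf\u00e9".encode("utf-8")

text = read_text(incoming)      # decode once, as early as possible
shouted = text.upper()          # all processing happens on str, not bytes
outgoing = write_text(shouted)  # encode once, as late as possible

char_count = len(text)      # number of characters in the string
byte_count = len(incoming)  # number of bytes in the UTF-8 encoding

print(char_count, byte_count)
```

        Here len(text) counts characters while len(incoming) counts bytes, because the two accented letters each occupy two bytes in UTF-8.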

        >Python was never designed with low-level access in mind, but Delphi is.

        Delphi was never designed for low-level access; Delphi was designed to be a grid-based pretty-printing front-end for client/server databases. Any low-level access was simply brought over from Borland Pascal. Delphi has never been marketed as a low-level language and even David Intersimone’s article on the inception/creation of Delphi only talks about its purpose as a VB/Access killer. Delphi 1.0 added the VCL framework and BDE onto Borland Pascal 7.0. The only language changes were high-level object related and were for the purpose of the VCL.

        >Python is not *the language*,

        There is no one language, but this one (eventually) got Unicode strings right after making some major mistakes. Five years after it broke compatibility, it’s still trying to get some libraries to port over to the new changes. It would be nice if we learned from their mistake and didn’t go through the same problems.

        > “Modern” in your language just means “fashionable”.

        I can’t agree with that. “Modern” means the state of the art in computer science. Once upon a time structured language concepts – functions – were controversial, and the debate was even called “The GOTO Wars”, in which people like Donald Knuth participated. Assembler people argued that functions introduced too much overhead and code that used them would be too slow. Die-hards produced papers attempting to demonstrate instances in which a certain piece of programming logic could only be expressed via a GOTO or expressed better. People like Knuth invented new structured language concepts to deal with these examples without using GOTO, etc.

        Structured programming won. Pascal itself was intended as a showcase to teach the new structured programming. Pascal itself was “modern” compared to some other languages. Procedures and functions weren’t “fashionable”; they were the state of the art as determined by computer scientists to improve code reuse, productivity, clarity, performance, etc.

        Today “modern” means things like automatic memory management, iteration, type inference, functional programming, object-oriented programming, etc. These are the latest *advances* in computer science. You’re not going to argue that everything that came after BASIC was just “fashionable”, are you?

        >There’s nothing “modern” in having less options

        There is something modern in a language placing emphasis on readability and thus not having numerous ways to do the same thing. It’s for this reason that Perl use is declining, as more modern languages are able to be just as or more expressive while being much more readable and maintainable without all of Perl’s exceptions to rules (such as two different comparison operations for strings and numbers!).

        > and being forced into many unneeded conversions “just because”

        1) It’s Delphi that does lots of unneeded conversions – just look at its regex library.
        2) I wish those who are arguing against the proposed improvements to Delphi would stop characterizing the opposition’s arguments as “just because” – Marco’s whitepaper didn’t consist of that phrase and it shows that people either aren’t listening to the arguments or don’t understand them.

        > – or because too many “script kiddies” have no idea about how a CPU
        >works and how data are encoded.

        This is the same kind of elitist attitude we also saw during the move away from assembler towards compiled languages. Assembly programmers criticized C programmers, saying that there was no way a compiler could ever produce code as good as hand-written assembler and that it would lead to a generation of stupid programmers who don’t understand anything about computers. Obviously that was nothing but sour grapes and never transpired. Now those who use lower-level compiled languages are using the same discredited arguments against those using higher-level languages than they are.

        Also, it’s Delphi developers who don’t seem to understand that a string and a collection of bytes aren’t the same thing and that they don’t need to worry about how it’s represented internally. Heck, a recent version of Python (3.3) changed its internal string storage to use 1, 2 or 4 bytes per character depending on the string’s contents, and the documentation said that from the perspective of the developer this would result in no need to do anything differently at all!

        >I hate programming languages that try to “shield” me from low level
        >details.

        They’re not “shielding” you; they’re trying to increase your productivity. They’re for people who actually have to get something done. As one article I read recently stated, “We could implement the following in a low-level language like C++ or Java, but who has time for that anymore?” 🙂 You do realize that the whole point of Delphi was RAD, Rapid Application Development? The benefit of the VCL was specifically that it handled lots of the details of Windows development so users could spend more time writing code that related to their problem and less code doing Windows housekeeping. Paul Graham writes that in an ideal language, one would never need to write any code other than what relates to your problem. The original intent of Delphi was to increase productivity. Working at a lower level than necessary is the opposite of increasing productivity.

        >Thank you, but I’m not a child and I don’t need to be “protected”.

        You use a language with static typing, the purpose of which is to protect you from type errors. So you’re using a RAD-oriented, statically typed language but want to work on a low level and not be protected? Sounds like C to me.

        >A Java-ized, C#-ized and Python-ized Delphi is really useless, not
        >“modern”.

        Given that the world is using Java, C# and Python a lot more than Delphi, I’m not sure such a combination (if it had only the strengths of each) would be useless.

        > It’s a strange idea that being “modern” means “being like everything else

        I should hope everyone else would be modern…

        >– without any advantage”.

        That’s simply your characterization, achieved by ignoring all of the advances of computer science since the 1980s. This is why when I brought attention to these types of arguments by Delphi users on Reddit they rolled their eyes and said that people with arguments like this are impossible to discuss things with. Brian Dunning on the Skeptoid podcast made a similar remark recently – he demonstrated how people with a certain theory were able to hand-wave away all evidence that showed they were wrong, and then when asked what kind of evidence they’d need to see to be convinced they were wrong, they cited the exact same things they’d already been shown. There’s no arguing the point with you if you’re going to claim there’s “no reason” for all of modern computer science and language development, even after it’s been explained to you many places before. 🙁

    2. A. Bouchez

      Under Python, it is a bug to have several types of string.

      Under Delphi, thanks to its strong typing, and automatic type conversion, it is a feature!

      1. Joseph

        >Under Delphi, thanks to its strong typing, and automatic type conversion, it
        >is a feature!

        According to Marco Cantu and Mr. Bauer, automatic type conversion leads to a great many bugs when type conversions happen where you don’t expect them. Implicit conversion is not the feature you think it is.

    3. Greg Heffernan

      Joseph, your comments completely demonstrate you are a web developer and have very little understanding of what’s under the hood.

      1. Joseph

        1. I wish someone could make a fact-based argument for the status quo, all the more important because the product manager and chief scientist of Delphi don’t see one.
        2. I’m not a web developer.
        3. We don’t program “under the hood”. It’s all binary under the hood; in Delphi it’s reals, integers, booleans, etc. You’re in the driver’s seat. If you’re under the hood, that’s language smell.

        1. Greg Heffernan

          Sorry Joseph, I jumped to conclusions too quickly.

          1. Joseph

            I apologize if my first post was a bit gruff; this debate has happened at least twice already on blogs and EMBT’s own forum and people are still claiming it’s being done for “no reason”, which gets me a little annoyed.

    4. A. Bouchez

      “Unicode reigns” – of course.
      But I suspect you are confusing the Unicode standard with the UTF-16 encoding.
      You can use UTF8String = AnsiString(CP_UTF8) which is perfectly Unicode-ready, and fits almost 90% of the web content, for instance.

      The ability to have the choice in Delphi is a very nice feature.
      If you are writing high-level scripts, one single encoding is OK.
      But if you want to serve JSON, XML or HTML, having UTF-8 encoding at hand during the whole string process is IMHO definitely a modern feature.

      Breaking existing code is never a good idea.
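
      The difference between Unicode (the character set) and its encodings can be sketched in Python 3, the language already used for examples in this thread. The sample text is an assumption chosen for illustration. The same characters yield different byte sequences under UTF-8 and UTF-16, and both decode back to the identical string.

```python
# Sample text (an assumption for illustration): "héllo €" written with escapes.
text = "h\u00e9llo \u20ac"

utf8_bytes = text.encode("utf-8")       # 1 to 4 bytes per character
utf16_bytes = text.encode("utf-16-le")  # 2 or 4 bytes per character, no BOM

# Both byte sequences represent exactly the same Unicode characters.
same_text = utf8_bytes.decode("utf-8") == utf16_bytes.decode("utf-16-le") == text

print(len(text), len(utf8_bytes), len(utf16_bytes), same_text)
```

      For mostly ASCII content such as JSON, XML or HTML markup, the UTF-8 form is the shorter of the two, which is the practical point made above.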

  3. Thomas K.

    I have to access zero-terminated strings in a given binary structure. Until now this was easy using StrPCopy, StrPas, etc., a nice clean approach, but with the NextGen compiler I have to manage this myself, basically rewriting everything from scratch: creating my own simple data types and functionality, a frustrating and time-consuming process.
    By removing support for a variety of strings, EMB is also removing a lot of flexibility and ease of use. Sorry, this is not what I am paying for.

    Thanks for the hack, it is awful to see what EMB is doing.

  4. himselfv

    @Joseph:
    > That Delphi went and created four AFTER the Python incident is embarrassing. It took them the same amount of time, too, to decide it was a mistake (that they could have avoided).

    The trick may have worked with Python where you don’t care how your data is stored. But Delphi is a relatively low level language. You have to work with data in different formats. Sure you can shove everything into string (immutable too) and convert every time, but it’s slow and clumsy. Those types were there for a reason, so that they can be used when needed. int64 is good enough to store any integer, but there are byte, word and dword all signed and unsigned, right?

    Right, it stands to reason that the compiler should have one preferred string type and all RTL/VCL should use it. Novice programmers work with it by default. But why kill the other types?

    Let’s compare it:
    1. Keep other types.
    Pros: backward compatibility, everyone happy, use whatever type suits your needs, speed when working with non-standard types, auto-conversion, no changes, no additional support needed.
    Cons: older code written explicitly for AnsiString sometimes does not work with Unicode.
    2. Kill other types.
    Pros: a load off Delphi devs mind, “phew guys, now we only have one string type. Why did we need that, again?”
    Cons: broken compatibility; no code written explicitly for AnsiString works at all; lots of valid code which used string types does not work either; programmers rage and switch to C# or Oxygene.

    > A new compiler gives the opportunity to clean up a lot of the language’s mistakes and legacy code.

    You know what my problem with Delphi is? They didn’t clean any of the *real* mistakes. The whole mess with the strings could have been avoided and transition made smoother *if only they finally introduced fast custom types*, 15 years after the language was devised. Just yeah, that thing C++ programmers have for ages. That thing we’ve been struggling without for so long, forced to resort to either slow and manually managed classes, even more slow and clumsy interfaced classes or use records and do everything by hand (and it’ll still be slow).

    Just unify all the objects, records, classes, everything into one, like C++ does (with some specifics to keep compatibility) and give clear rules for when it’s stack allocated, when it’s heap allocated, and how exactly it is initialized (with no code running except for what the programmer writes). Then alias existing types to it, e.g. TObject = pointer to (record with all the standard TObject stuff).

    There are no downsides to this one, only upsides. Less compiler code to support, less reimplementing of features (e.g. operator overloading) to do, more power to the programmer. And you can move out all additional string types into their own libraries called “Deprecated.AnsiString” and “Deprecated.RawByteString” if you so wish, and they’ll be as fast and as cheap as Delphi strings were.

    But no, “Language needs to change, let us do this completely random thing and drop support to stuff which bothered no one”.

    Sorry for the rant.

    1. Mason Wheeler

      Wait… you say “let’s adopt the C++ model, and there are no downsides, and it would simplify the compiler”?!? Do you not know that C++ is about the most difficult language to compile ever created? And yes, a lot of it is because of its insane, horribly complex template model, but a lot of what’s left is the way it uses a completely wrong object model.

      You cannot have objects as value types in an OO system, period. C++ tries, and fails miserably, bloating up the language with all sorts of messy new complexity. You do that, you end up with all the horrors of C++: copy constructors and assignment operators required everywhere, RAII and all the messes it brings, broken polymorphism, dangling references everywhere, etc.

      TP *tried* this model, but the team was smart enough to abandon it quickly before it gained widespread adoption, and give us an object model that actually works.

        1. Mason Wheeler

          I saw that post. The problem is that objects-as-value-types and polymorphism don’t mix, and without polymorphism there is no OOP. There’s a reason why every OO language ever developed (except for C++) has had objects-as-reference-types: it’s the only way to do it right.

          1. A. Bouchez

            “without polymorphism there is no OOP”…
            Are you sure you are not confusing polymorphism with inheritance?

            In Delphi, the only way we can do polymorphism (as in C#) is by using interfaces.
            But inheritance is a valid OOP pattern, right? Otherwise I missed the whole OOP design since the beginning.

            You can use objects-as-value-types and inheritance with no problem. It is pretty common and useful to inherit properties and methods from a common ancestor. Which sounds to me like valid OOP.

          2. Mason Wheeler

            No, actually inheritance is the problem I had in mind. Inheritance and objects-as-value-types do not mix, and the reason why is obvious when you think about it for a moment.

            Let’s say I have a class TParent, and a class TChild that adds a new field. If I’m calling a method that expects a value type, the object has to be copied, and that copy has to be a certain number of bytes. But TChild has more bytes than TParent does! So it’s not possible to pass a TChild to a method that expects a TParent; the compiler has to come up with some way to pass *only* the TParent part of the TChild object, and if you call a virtual method on it, it has to call the TParent version, not the TChild version. This is where all the mess in C++ about copy constructors and = operator overloads comes from.

            The only way to make inheritance and polymorphism work right is to ensure that all passing that is done is passing a pointer to the object, or in other words, to ensure that the object is always a reference type.

          3. Michael

            Polymorphism is simply about being in the position to call an object’s specific behavior without knowing its concrete type. An abstract class can act as an interface in practice.

            I would not restrict that to interfaces only.

            Inheritance and polymorphism belong together. Inheritance is the static view.

            I am somewhat surprised that string types are available but not made available for public use. Wonder why? There is not only one string type anymore and there never was just one. Maybe an attempt to sell something tomorrow that already exists … or not tested – that’s ok – it’s EMB’s freedom to ship whatever they like. Maybe they are aiming at removing one or another string type.

            UTF8 became a certain ‘standard’ on the Internet. Not more not less.

      1. himselfv

        > You cannot have objects as value types in an OO system, period.
        We have records which can have fields, properties, methods, operator overloading. Are they not objects? The only things they lack are access specifiers and operator overloading.

        I believe Delphi already has a type system as complicated as C++ one, if not more. It’s just *less transparent*. We have three separate non-basic types (object, record, class) each handled differently and with their own subset of features. Record initialization/finalization and copying is as ugly as it gets. People are forced to use dummy interface fields to trigger auto-destruction, that’s saying something about the language!

        All I’m asking for is four simple things:
        1. Be transparent. Do not do anything the programmer has not explicitly asked to do, and that cannot be overridden.
        2. Be consistent. Do not have several things where there could have been one (classes, records, strings). If you started something, do not stop midway (records as objects). If some things have it, let other things have it (interface auto-deinit).
        3. Be pluggable. Let us do the same things basic classes do (strings, arrays). Let us override every bit of automatic code.
        4. Be compact and fast. Or at least give us the option to strip everything down to basics.

        And it just so happens that while C++ is a break-your-leg type language, its objects satisfy all four points. What they lack is just a little of default behavior.

        > and give us an object model that actually works.

        Could you write a string class which would be equally fast (+- 20%), would work like native (copy on write) and could be used in the same fashion?

        Because that could easily have been written with normal object model.

        1. Mason Wheeler

          >We have records which can have fields, properties, methods, operator overloading. Are they not objects? The only things they lack are access specifiers and operator overloading.

          No. Records can have access specifiers, but what they lack is the sine qua non of OO programming: polymorphism. And that only works properly when your objects are reference types. Every OO language ever developed (except for C++) has gotten this right, because it’s a very simple, fundamental thing. C++ probably would have gotten it right too if it had been implemented from the ground up as a new OO language, instead of a ham-fisted attempt to tack object-orientation onto C, which was most easily accomplished by implementing objects as “C structs plus a bit of compiler magic.” But since they did go that route, C++’s polymorphism is a huge mess, and the last thing we want is to bring that over into Delphi.

          1. himselfv

            So your objection is that value-type objects must not have inheritance? You agree to other propositions: transparency, consistency, pluggability – everything except polymorphism?

            Let’s get concrete then. You’re saying that C++ polymorphism is “a huge mess” and everyone else “gets it right”, but can you list exact problems with use cases which will make everyone’s life harder? And I’ll suggest what can be done about it. Because maybe it’s not as scary as you paint it?

            There has been lots of time when I wanted to do something with Delphi records/objects and could not. There’s not been a single time like that with C++ objects.

          2. A. Bouchez

            Are you sure you are not confusing polymorphism with inheritance?

            Records do not feature inheritance, which is a pity, since the good old “object” type did – and there is no need for virtual methods on value objects: just properties and some methods are very handy. And using inheritance lets you write less code and improves maintainability.

          3. Iztok Kacin

            Records lack constructors / destructors more than inheritance. Give them that and also inheritance and you basically have objects with ARC and overloaded operators support 🙂

      2. Joseph

        >Do you not know that C++ is about the most difficult language to compile
        >ever created?

        How does that affect us, from the developer side of the aisle? We’re already suffering the legacy effects of single pass compilation. Difficulty to compile is EMBT’s problem, not ours.

        >You cannot have objects as value types in an OO system, period.

        Mason, you tend to make these sweeping pronouncements that contradict reality. Many object-oriented languages have everything as an object. That’s sort of the apex of object orientation.

        >You do that, you end up with all the horrors of C++: copy constructors
        >and assignment operators required everywhere, RAII and all the messes
        >it brings, broken polymorphism, dangling references everywhere, etc.

        Reminds me of an online exchange from many years ago:

        Person A:
        >[Object Pascal]… has all the advantages of C++ but none of its
        >disadvantages.
        Person B:
        >it does, however, have all the disadvantages of pascal, which kind of
        >outweighs the advantages of not having c++’s disadvantages

        🙂 Delphi has its own list of kludges, ugly syntax, and hair-pulling “features”.

        1. Joseph

          Actually, Mason, I withdraw that statement. I misinterpreted what you wrote; sorry. I thought it was about making items (values) objects rather than about value vs. reference.

    2. Joseph

      >The trick may have worked with Python where you don’t care how your data is stored.

      Why do you have to care how your data is stored? You work with reals, integers, booleans, etc., right? Are you worrying that it’s all binary internally? The whole point of a computer language is to present to you a higher level framework than machine code. If you’re constantly finding it necessary to drop down to that level, then there’s a problem with the language/framework.

      >But Delphi is a relatively low level language.

      This is something I don’t agree with. Object orientation, generics, (some) iteration – Delphi is a lot closer to C++ than to C. And given its origins as a rapid application development tool (often forgotten), it wouldn’t make sense for Delphi to be a low-level language.

      I’ve had this discussion on strings at length on Embarcadero’s forum, and no one has ever been able to come up with a credible example of why they NEED more than one type of string. Even here, they’ll resort to ad hominem attacks on me or computer science or other development tools rather than simply present examples that demonstrate why they need multiple string types. Without this, it tends to look like a reactionary position.

      >You have to work with data in different formats.

      Same as every other language.

      > Sure you can shove everything into string (immutable too) and convert every time,

      When are you converting? If you separate the concept of string and encoding (string = sequence of characters, encoding = sequence of bytes), then you only need to convert once during input and once during output. That’s another issue – fans of multiple string types can’t demonstrate why they need to be doing conversions at any other point in the process.

      >Those types were there for a reason,

      Perhaps. But that doesn’t mean the reason is still valid, if it ever was a good reason in the first place. GOTO was there for a reason too. So was AssignFile, BlockRead, etc.

      >so that they can be used when needed.

      And yet, if no one can explain when they’d be needed, perhaps they’re not or no longer needed?

      > int64 is good enough to store any integer,

      Not any integer wider than 64 bits!

      >but there are byte, word and dword all signed and unsigned, right?

      We don’t need those either. 🙂 Only when you had a million numbers you needed to store might you want to use those types to save some memory.

      >Right, it stands to reason that the compiler should have one preferred string type and all
      >RTL/VCL should use it. Novice programmers work with it by default. But why kill the other types?

      Because the implicit conversions that occur between them lead to many subtle and hard-to-detect bugs, as chief scientist Bauer has explained. It also leads to greater difficulty in learning and reading Delphi programs, as Marco Cantu, the product manager, has explained. I still don’t think they’ve arrived at the best solution, but it’s better than the existing one.

      >Let’s compare it:
      >1. Keep other types.
      >Pros: backward compatibility, everyone happy, use whatever type suits your needs,
      >speed when working with non-standard types, auto-conversion, no changes, no
      >additional support needed.

      Cons: Language becomes Java-like, loaded with dozens of constructs that no longer are needed but are present for backward compatibility forever. This makes the standard library bloated and hard to learn. New users aren’t happy, because there’s still a ridiculous number of string types. Worse, the “old guard” ignore the concept of “backward compatibility” and continue to churn out code using these string types (just like the many who still haven’t converted to Unicode), forcing everyone else to have to use these string types too. Implicit conversion continues to be a source of subtle errors.

      You also didn’t mention a major con: Embarcadero have to rewrite the compiler and have a small team. This becomes additional work they have to do and additional code they have to maintain forever. As Marco himself explained, no one expects the new Volkswagen Beetle to accept accessories from the original Beetle from the 1970s. That would be ridiculous. As he asked, why then do they expect a computer language to continue to work with 20-year-old code?

      The team is getting a rare chance to clean house in the compiler code and sweep out anything that isn’t needed. The problem is some users don’t want to let anything go. They’re not thinking about the cost of leaving in all this code and the work required to port it to a new compiler. A smaller, leaner, cleaner compiler would be a lot easier to maintain and a lot easier to optimize. Leaving cruft in defeats that purpose.

      2. Kill other types.
      >Cons: broken compatibility;
      If you have an old project, keep using the old version of Delphi to maintain it. Develop next-gen code with the next-gen compiler. That’s the way it works with other languages and also with frameworks. Qt has a numbering scheme X.y. Code targeting X.y is guaranteed to work with X.y+, but not with Y.whatever. Major numbers can break compatibility. Python does the same. Code targeting 2.3 will be guaranteed to work with 2.3+ but not with 3.0+. You have to be able to break compatibility every once in a while to grow the language and to fix mistakes.

      >no code written explicitly for AnsiString works at all;

      People have had years to move away from Ansi. If anything, Delphi’s Unicode implementation arrived rather late (but earlier than PHP, which is still working on it!). The thing is, they not only didn’t refactor their existing code, they kept generating more non-Unicode code. They’ve got no one to blame but themselves for using deprecated constructs. We can’t be living in ANSI forever; that was Marco’s point about original Beetles.

      >lots of valid code which used string types does not work either; programmers rage and
      >switch to C# or Oxygene.

      Some folks had a good laugh at this one. They can’t tolerate a little change (strings) so they switch to a new language which will be even more different? 🙂 I guarantee you you’ll be doing more code rewriting to move to C# than to move to Next-gen. Also, as far as I’m aware, C# has one string type, which would be rather ironic indeed. 🙂 Oxygene for .NET also has a single string type of UTF-16, just like what people are complaining about. So how does going to C# or Oxygene make any sense?

      > A new compiler gives the opportunity to clean up a lot of the language’s mistakes and legacy code.

      >You know what my problem with Delphi is? They didn’t clean any of the *real*
      >mistakes.

      We each probably have different lists of what we consider real mistakes.

      >Just unify all the objects, records, classes, everything into one, like C++ does (with some
      >specifics to keep compatibility) and give clear rules when it’s stack allocated, when it’s
      >heap allocated, how exactly it is initialized (with no code running except for what the programmer writes). Then alias existing types to it, e.g. TObject = pointer to (record with all the standard TObject stuff).

      Are you saying one structure? Or just make everything an object?

      1. himselfv

        > Why do you have to care how your data is stored?

        I’m writing an encoder/decoder. I’m compiling an older app which stored Unicode in an unusual fashion. I interact with a library which takes ANSI. I work with file formats which have ANSI fields. I want to save memory because I have 4 million 4-char strings, some of which are occasionally of random length.

        What does it matter? Give me smart tools and I’ll find a smart use; give me dumb tools and I’m disappointed.

        Sure I can do all of the above with TBytes. But it’s clumsy.

        > Delphi is a lot closer to C++ than to C

        Yes, and C++ is low level compared to Python or Ruby or about anything else. It’s not Visual Basic. It lets you do terrific stuff easily, not code one of the three approved applications with slight changes in UI.

        > Because the implicit conversions that occur between them lead to many subtle and hard-to-detect bugs, as chief scientist Bauer has explained.

        Then don’t use those. Why kill them?

        > It also leads to greater difficulty in learning and reading Delphi programs, as Marco Cantu, the product manager, has explained.

        Then don’t write default libraries around those. Still not a reason to kill them. On one hand you have “drop compatibility with older apps”, on the other hand “older apps continue to work but you have to learn a bit more to maintain them”.

        > This makes the standard library bloated and hard to learn.

        It doesn’t if you don’t include those in the standard library. Or put them into some dark corner of it.

        > New users aren’t happy, because there’s still a ridiculous number of string types.

        They are, because the default manual gives you a simple “string” type and only links to others in the advanced section.

        The one argument I can understand is the “so that they don’t have to rewrite it”. But this is why I’m saying they should have just made it possible for us to write it ourselves. It would be a win-win for everyone.

        > Are you saying one structure? Or just make everything an object?

        One object/record/class type. It has access specifiers, operator overloading, constructors, destructors, inheritance, virtual methods, can implement interfaces if instantiated by pointer. Perhaps there are some limitations in place on how polymorphism works with value-type instances. It also has a predetermined, controllable memory layout (like a record).

        type
        record = object;
        class = ^object;
        //standard TObject machinery
        end;
        TObject = class;
        TOldStyleObject = class(TObject)
        //works exactly as before
        end;
        TCheapObject = ^object
        //uses no memory, calls no functions other than explicitly written ones
        end;
        TString = object
        DataPointer: PStringData;
        constructor Create; inline; //increment data.refcnt
        destructor Destroy; inline; //decrement data.refcnt
        end;

      2. Michael

        I think it does not make sense to compare Python and Delphi. Two different animals.

        Strings. Java as well as .NET first tried to offer string conversion, but in the end both had to settle on one string type. Anything else was simply not accepted 10 years ago. That’s built into the ‘run-time’ now.

        IT is about information processing, and the results of information processing are part of reality, not the information processing itself. It’s not that important.

  5. Wodzu

    Removing support for AnsiString is insane; we have a lot of hardware that generates one-byte-per-character strings. Using Unicode for this is a waste of memory and CPU cycles. I think our company will stay on Delphi XE2, or we will move to Free Pascal in the future.

    1. Rodries

      @Wodzu

      I use routines to control a Modbus automaton using TBytes, so this is not an excuse, but I have to say that I have had to create my own ‘old AnsiString routines’ (copied from the regular expressions code).

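      function CopyBytes(const S: TBytes; Index, Count: Integer): TBytes;
      var
        Len: Integer;
      begin
        Result := nil; // return an empty array when nothing is copied
        Len := Length(S);
        if Len > 0 then
        begin
          // Clamp a negative start to 0; a start past the end copies nothing
          if Index < 0 then Index := 0
          else if Index > Len then Count := 0;
          Len := Len - Index; // bytes available from Index onward
          if (Count > 0) and (Len > 0) then
          begin
            if Count > Len then Count := Len;
            SetLength(Result, Count);
            Move(S[Index], Result[0], Count);
          end;
        end;
      end;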

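      procedure InsertBytes(const Source: TBytes; var S: TBytes; Index: Integer);
      var
        Len: Integer;
        Len1, Len2: Integer;
      begin
        Len := Length(S);
        // Clamp the insert position to the range 0..Len
        if Index < 0 then Index := 0
        else if Index > Len then Index := Len;

        Len1 := Length(Source); // bytes to insert
        Len2 := Len - Index;    // bytes after the insert position

        SetLength(S, Index + Len1 + Len2);
        if Len1 > 0 then
        begin
          // Shift the tail right to make room, then copy the new bytes in
          if Len2 > 0 then
            Move(S[Index], S[Index + Len1], Len2);
          Move(Source[0], S[Index], Len1);
        end;
      end;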

      procedure DeleteBytes(var S: TBytes; Index, Count: Integer);
      var
        Len, TailLen: Integer;
      begin
        Len := Length(S);
        if (Index >= 0) and (Index < Len) and (Count > 0) then
        begin
          TailLen := Len - Index; // bytes from Index to the end
          if Count > TailLen then Count := TailLen;
          // Close the gap by moving the remaining tail down
          if TailLen <> Count then Move(S[Index + Count], S[Index], TailLen - Count);
          SetLength(S, Len - Count);
        end;
      end;

      1. William Meyer

        I fully agree. Much of my use of Delphi has been in the implementation of control systems. In the past, this meant serial communication on RS-232 and RS-422, and byte-oriented protocols. For EMBT to make such a change is a stunning declaration of ignorance on their part. Not everyone in development makes a living producing office tools.

        1. Joseph

          Let’s be honest: Delphi, from 1.0 onward, was never designed or intended for embedded code or control systems. It was internally called “VB killer” and consisted of taking Borland Pascal with some object changes and adding VCL and BDE to it – high level office software features. The very name, Delphi, was in reference to the Oracle database, and Delphi’s original focus was as a RAD tool for producing front-ends for client-server databases (the edge that VCL and BDE gave it). During the product reveal/press conference, when repeatedly asked why Pascal, Anders basically said it was because no one else was using it, so they could do what they wanted with it.

          The vast, vast majority of those left are using Delphi for conventional desktop (and perhaps now mobile) applications (Delphi really isn’t up to par for backend server/web apps anymore). Do you really think we should be basing decisions about the fundamentals of the language on an exotic use that a few may have for it and that EMBT isn’t marketing to at all?

          1. himselfv

            I use Delphi for conventional desktop and losing other string types sucks.

            Watching companies such as Opera screw everything while saying “We don’t want to cater to a few geeks, we want larger audiences” is fascinating.

      2. Leif

        Yes, we have tons of applications using different protocols for communicating with byte-oriented code. Many of our planned applications are meant to bring those into the mobile world. We have a COW wrapper around TBytes, making it act as a pre-Unicode AnsiString. Removing AnsiString would make many formatting parts obsolete, and code would have to be rewritten.
        The projects are currently on hold until AnsiString is reinstated.

  6. Dirk

    Hi Andy,
    works like a charm, you’re the hero!
    I cannot understand why Emb makes it so hard to get existing code working on NextGen.
    All the language changes (0-based strings, no 1-byte strings) are so unnecessary.

    Best regards
    Dirk

  7. Peter

    “I believe Delphi already has a type system as complicated as the C++ one, if not more. It’s just *less transparent*. We have three separate non-basic types (object, record, class) each handled differently and with their own subset of features. Record initialization/finalization and copying is as ugly as it gets. People are forced to use dummy interface fields to trigger auto-destruction, that’s saying something about the language!”

    + 1E+9999 !!!

    1. Iztok Kacin

      Yes, how simple it would be for them to add constructors / destructors to records. That would make them powerful. But they did not do it for years. By the way, you can hook the System.pas functions and procedures to get that, but it’s a hack more or less.

  8. Allen

    Thank you, can’t imagine how “Delphi” could survive without you… ^_^
