Java has been a major player in software engineering since its first release in 1995, and it has evolved considerably over the years. One key aspect is how Java handles text: <span class="teal" >String</span> is among the most heavily used classes in Java programs. On average, 50% of a typical Java heap may be consumed by <span class="pink" >String</span> objects, which is substantial.
This article explores the evolution of string handling in Java, starting from its first release up to the latest version, Java 21.
In JDK 1, Java introduced the <span class="pink" >String</span> class as an immutable sequence of characters, a choice made with reliability and security in mind. Immutable strings are thread-safe, so they can be shared freely across threads in multi-threaded applications. Their predictability and resistance to tampering also protect sensitive data such as network addresses and file paths. Java's string pooling, enabled by this immutability, stores only one copy of each unique string literal, reducing memory usage.
String concatenation in JDK 1 primarily used the <span class="teal" >+</span> operator. For example, <span class="teal" >firstName + " " + lastName</span> produces a new <span class="pink" >String</span> containing both names.
However, this approach had efficiency concerns, especially for repeated concatenation. Each use of the + operator created a new String object. This was particularly inefficient in loops, where concatenating a list of strings would create a new String object at every iteration, causing substantial performance overhead and increased memory usage.
The above code will create seven string objects to construct one.
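Independently of the listing referred to above, the loop pattern can be sketched as follows. This is a minimal illustration; the class name, method name, and sample words are hypothetical.

```java
import java.util.List;

final class ConcatInLoop {
    // Joins words with "+=". A String cannot be modified in place,
    // so every pass through the loop produces a new String object.
    static String joinWithPlus(List<String> words) {
        String result = "";
        for (String word : words) {
            result += word; // builds a new String from the old one plus word
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample data for illustration.
        System.out.println(joinWithPlus(List.of("Java", " ", "strings")));
    }
}
```

With a handful of words the cost is negligible; it grows as the list gets longer, because each new string copies everything accumulated so far.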
To solve the efficiency concern, Java introduced the StringBuffer class, which provides a mutable sequence of characters. This was a game-changer for string manipulation, especially in scenarios involving frequent modifications. For example:
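A minimal sketch of in-place modification with StringBuffer; the class name, method name, and greeting text are assumptions made for illustration.

```java
final class BufferDemo {
    // Builds a greeting by mutating one buffer instead of creating
    // an intermediate String for every step.
    static String greet(String name) {
        StringBuffer sb = new StringBuffer("Hello");
        sb.append(", ").append(name).append('!'); // appends in place
        sb.insert(0, ">> ");                      // inserts at the front
        return sb.toString();                     // a single String at the end
    }

    public static void main(String[] args) {
        System.out.println(greet("World")); // >> Hello, World!
    }
}
```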
JDK 5 took string manipulation a step further with the introduction of StringBuilder. It was similar to <span class="pink" >StringBuffer</span> in providing a mutable sequence of characters but differed in a crucial aspect.
The major difference between these two is that <span class="pink" >StringBuilder</span> is a bit faster and more suitable for single-threaded scenarios as it's not thread-safe. In contrast, <span class="pink" >StringBuffer</span> is thread-safe and slightly slower due to its synchronized methods, making it ideal for multi-threaded environments. Both offer similar APIs, allowing for easy interchangeability based on your thread safety requirements.
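A minimal sketch of the single-threaded case, assuming a hypothetical helper that joins words with spaces. Swapping StringBuilder for StringBuffer here would compile unchanged, since the methods used exist on both classes.

```java
final class BuilderDemo {
    // Joins words with single spaces using an unsynchronized builder,
    // which is appropriate because the builder never leaves this method.
    static String joinWords(String... words) {
        StringBuilder sb = new StringBuilder();
        for (String word : words) {
            if (sb.length() > 0) {
                sb.append(' '); // separator between words, not before the first
            }
            sb.append(word);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(joinWords("fast", "single", "thread"));
    }
}
```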
Moving on, JDK 6 and JDK 7 continued to refine string handling, focusing more on performance optimizations rather than introducing new APIs. The major leap in string manipulation came with JDK 8, which introduced lambda expressions and the Stream API, revolutionizing how developers could handle data, including strings.
With JDK 8, operations on collections of strings became more concise and expressive due to lambda expressions and streams.
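A minimal sketch of such a pipeline; the word list and the space delimiter are assumptions, and `Arrays.asList` is used so the code also compiles on JDK 8.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class StreamJoinDemo {
    // Upper-cases every word and joins the results into one string.
    static String shout(List<String> words) {
        return words.stream()
                .map(String::toUpperCase)          // transform each element
                .collect(Collectors.joining(" ")); // concatenate with a delimiter
    }

    public static void main(String[] args) {
        System.out.println(shout(Arrays.asList("java", "string", "handling")));
    }
}
```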
In this example, we transform a list of strings into a single concatenated string. Each word is converted to uppercase, and then they are joined. This approach is more readable and eliminates the need for manual iteration and string concatenation.
In addition, JEP 192 (delivered in JDK 8u20) reduces the live data set on the Java heap by enhancing the G1 garbage collector so that duplicate instances of String are automatically and continuously deduplicated when the -XX:+UseStringDeduplication option is enabled.
In Java 9, string concatenation was significantly improved at the bytecode level. Instead of emitting a fixed chain of StringBuilder calls, the compiler now emits invokedynamic, a bytecode instruction that has existed since Java 7, and this has changed the game.
When concatenating strings using the <span class="pink" >+</span> operator, Java 9 and later versions use invokedynamic, which delegates the work to java.lang.invoke.StringConcatFactory#makeConcatWithConstants. Because the concatenation strategy is selected at run time rather than fixed in the bytecode, the JDK can optimize it without the program being recompiled. Consider the following code:
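A minimal sketch of code that triggers this path; the class and method names are hypothetical. Compiling it with JDK 9 or later and inspecting the class with `javap -c` should show an invokedynamic instruction at the concatenation site rather than StringBuilder calls.

```java
final class IndyConcatDemo {
    // A non-constant concatenation: on JDK 9+ the compiler emits one
    // invokedynamic call site, bootstrapped by
    // StringConcatFactory.makeConcatWithConstants, for this expression.
    static String describe(String name, int age) {
        return "Name: " + name + ", age: " + age;
    }

    public static void main(String[] args) {
        System.out.println(describe("Ada", 36));
    }
}
```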
The equivalent bytecode of the above code, where concatenation happens, would be:
This optimization is a substantial under-the-hood improvement, reducing the memory and performance overhead of string concatenation.
For those interested in the technical details of this improvement, JEP 280 offers an in-depth explanation. This Java Enhancement Proposal details the changes and optimizations brought by <span class="pink" >invokedynamic</span> for string concatenation.
Aside from this, JDK 9 changed the internal representation of strings by introducing compact strings (JEP 254).
In Java 8 and earlier versions, strings were represented as an array of characters (<span class="teal" >char[]</span>), with each char occupying two bytes of memory. This representation was not always memory-efficient, especially considering that many characters in Western locales could be encoded using just one byte.
Consider the string <span class="white">"Hello"</span>:
Building on this observation, compact strings store a string in an 8-bit byte array instead of a <span class="teal">char</span> array, unless the string contains characters that need 16 bits. Hence, a string made up only of Latin-1 characters takes roughly half the space in Java 9 that it took in Java 8.
If String objects account for about half of a typical Java heap and compact strings roughly halve their size, the heap requirement for such a program on Java 9 is about 75% of what the same program needs on Java 8 (50% + 50% × 0.5). The exact figure will vary from application to application.
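The size difference can be approximated with public APIs. The sketch below compares encoded sizes as a stand-in for the two layouts; it does not read the JVM's internal array, which is not accessible. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class CompactStringDemo {
    // Bytes needed at one byte per character (Latin-1).
    static int latin1Bytes(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1).length;
    }

    // Bytes needed at two bytes per char (UTF-16, no byte-order mark).
    static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    // True when every character of s can be represented in Latin-1.
    static boolean fitsInLatin1(String s) {
        return StandardCharsets.ISO_8859_1.newEncoder().canEncode(s);
    }

    public static void main(String[] args) {
        String hello = "Hello";
        System.out.println(latin1Bytes(hello)); // 5
        System.out.println(utf16Bytes(hello));  // 10
        // Cyrillic text needs 16-bit characters, so it would not be compacted.
        System.out.println(fitsInLatin1("\u041f\u0440\u0438\u0432\u0435\u0442")); // false
    }
}
```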
This is a huge saving.
JDK 11 continued to expand the String API, introducing methods like <span class="teal">strip()</span>, <span class="teal" >stripLeading()</span>, <span class="teal" >stripTrailing()</span>, <span class="teal" >repeat()</span>, and <span class="teal" >isBlank()</span>.
These methods made common string operations more straightforward, reducing the need for external libraries or custom utility methods.
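A short sketch of these methods in use; the class name, the `heading` helper, and the sample strings are hypothetical.

```java
final class Java11StringDemo {
    // Builds an underlined heading using strip() and repeat().
    static String heading(String rawTitle) {
        String title = rawTitle.strip();
        return title + "\n" + "=".repeat(title.length());
    }

    public static void main(String[] args) {
        String padded = "  hello  ";
        System.out.println("[" + padded.strip() + "]");         // [hello]
        System.out.println("[" + padded.stripLeading() + "]");  // [hello  ]
        System.out.println("[" + padded.stripTrailing() + "]"); // [  hello]
        System.out.println("ab".repeat(3));                     // ababab
        System.out.println("   ".isBlank());                    // true
        System.out.println(heading("  Strings  "));
    }
}
```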
During the releases of JDK 12 to 15, Java focused on incremental improvements and refinements in string handling. These versions introduced several new methods and enhancements to the String class, making string operations more intuitive and efficient.
JDK 12 introduced new methods to the <span class="teal">String</span> class, further simplifying common string operations.
The <span class="teal">indent()</span> method adds or removes leading whitespace on each line of the string, while <span class="teal">transform()</span> applies a function to the string and returns its result.
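A minimal sketch of both methods; the class name and helper methods are hypothetical. Note that `indent()` also normalizes line endings and terminates the last line with a newline.

```java
final class Java12StringDemo {
    // Shifts every line right by n spaces (or left when n is negative).
    static String indented(String text, int n) {
        return text.indent(n);
    }

    // Uses transform() to turn the string into a different type:
    // here, the number of whitespace-separated words.
    static int wordCount(String text) {
        return text.transform(s -> s.isBlank() ? 0 : s.strip().split("\\s+").length);
    }

    public static void main(String[] args) {
        System.out.print(indented("a\nb", 2));
        System.out.println(wordCount("  hello big world "));
    }
}
```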
One of the most significant additions in JDK 15 was the introduction of Text Blocks, which greatly enhanced working with multi-line string literals.
Text blocks simplify the creation of multi-line strings, preserving the intended formatting without the need for most escape sequences or explicit newline characters.
Text blocks can be easily concatenated with other strings or variables, maintaining readability and structure.
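A minimal sketch of a text block combined with an ordinary string; the class name, method name, and letter text are hypothetical. The position of the closing delimiter determines how much leading indentation is stripped.

```java
final class TextBlockDemo {
    // Concatenates a regular string with a text block.
    static String letter(String user) {
        String header = "To: " + user + "\n";
        String body = """
                Welcome aboard.
                  - The team
                """;
        return header + body;
    }

    public static void main(String[] args) {
        System.out.print(letter("Ada"));
    }
}
```

The second line keeps its two extra spaces because they lie beyond the common indentation shared with the closing delimiter.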
Creating complex SQL queries in Java becomes more manageable and readable with the use of Text Blocks. Let's consider an example where we need to construct a SQL query for retrieving data from a database. This query involves multiple joins, conditions, and potentially complex logic:
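A sketch of such a query as a text block. The schema (`employees`, `departments`) and the column names are assumptions made for illustration, and the `?` is a placeholder intended for a `PreparedStatement`.

```java
final class SqlTextBlockDemo {
    // Hypothetical schema: employees(name, salary, department_id),
    // departments(id, name).
    static final String QUERY = """
            SELECT e.name, d.name AS department
            FROM employees e
            JOIN departments d ON e.department_id = d.id
            WHERE e.salary > ?
            ORDER BY e.name
            """;

    public static void main(String[] args) {
        System.out.print(QUERY);
    }
}
```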
Imagine having to write this query the old way, with the + operator.
JEP 430 introduces String Templates as a preview feature in Java 21. This enhancement aims to simplify Java programming by allowing the combination of literal text with embedded expressions and template processors. It is extremely useful for strings that include runtime-computed values or are composed of user-provided values for systems like databases.
String templates complement Java's existing string literals and text blocks. The feature aims to simplify writing Java programs, improve the readability of code that mixes text and expressions, and enhance the security of Java programs, especially those that compose strings from user-provided values.
Let’s explore it a bit in depth.
A new kind of expression called a template expression has been introduced, allowing developers to perform string interpolation and compose strings safely and efficiently. Template expressions are programmable and extend beyond composing strings – they can convert structured text into various types of objects according to domain-specific rules.
In this example, the template is prefixed with a template processor and contains embedded expressions, providing a safe and efficient way to compose strings.
Unlike traditional string interpolation, which can create security vulnerabilities, Java's template expressions require validation and sanitization of strings with embedded expressions. This approach automatically applies template-specific rules, resulting in safer and more efficient string composition.
For example, consider this hypothetical Java code with the embedded expression <span class="teal">${name}</span>:
If <span class="teal">name</span> had the troublesome value
then the query string would be
and the code would select all rows, potentially exposing confidential information.
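The `${name}` syntax above is hypothetical and not valid Java, so the sketch below reproduces the same effect with plain concatenation. The table, column, and input value are assumptions modeled on the example in JEP 430.

```java
final class InjectionDemo {
    // Naive composition: splices the raw value straight into the SQL text,
    // with no escaping or validation.
    static String naiveQuery(String name) {
        return "SELECT * FROM Person p WHERE p.last_name = '" + name + "'";
    }

    public static void main(String[] args) {
        // The embedded quotes close the literal early and add a second condition.
        String troublesome = "Smith' OR p.last_name <> 'Smith";
        System.out.println(naiveQuery(troublesome));
        // SELECT * FROM Person p WHERE p.last_name = 'Smith' OR p.last_name <> 'Smith'
    }
}
```

The resulting condition is true for every row with a non-null last name, which is why the value must be escaped or passed as a parameter rather than interpolated as-is.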
To avoid such vulnerability, Java took a safer approach. For example, when composing SQL statements, any quotes in the values of embedded expressions must be escaped, and the string overall must have balanced quotes.
<span class="pink">STR</span> is a template processor defined in the Java Platform. It performs string interpolation by replacing each embedded expression in the template with the value of that expression, converted to a string.
Let's see another example:
This example demonstrates how template expressions can be used to create structured HTML content safely and efficiently.
<span class="pink">STR</span> is a <span class="teal">public</span> <span class="pink">static</span> <span class="pink">final</span> field that is automatically imported into every Java source file.
Alongside <span class="pink">STR</span>, Java introduces <span class="pink">FMT</span>, another template processor with additional capabilities. Like <span class="pink">STR</span>, <span class="pink">FMT</span> performs interpolation, but it uniquely interprets format specifiers positioned to the left of embedded expressions. These format specifiers are consistent with those defined in java.util.Formatter, providing familiar syntax for those accustomed to Java's standard formatting utilities.
The <span class="pink">FMT</span> processor is particularly useful for creating structured and formatted outputs, where alignment and numerical formatting are crucial.
Consider an example where we define a Rectangle record and create an array of these objects. Using <span class="pink">FMT</span>, we can format a table that neatly displays the properties and computed area of each rectangle.
This code snippet creates a well-structured table, demonstrating the power of <span class="pink">FMT</span> in handling complex string formatting scenarios.
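Because <span class="pink">FMT</span> reuses the format specifiers of java.util.Formatter, the layout of such a table can be sketched on any recent JDK with `String.format`. This is a stand-in that does not use the preview API: with <span class="pink">FMT</span>, each specifier would sit directly to the left of its embedded expression instead of in a separate argument list. The record fields, column widths, and sample values are assumptions.

```java
import java.util.Locale;

record Rectangle(String name, double width, double height) {
    double area() {
        return width * height;
    }
}

final class FormatTableDemo {
    // One table row: name left-aligned in 12 columns, numbers right-aligned
    // with two decimals. Locale.ROOT keeps the decimal separator a dot.
    static String row(Rectangle r) {
        return String.format(Locale.ROOT, "%-12s%8.2f%8.2f%10.2f",
                r.name(), r.width(), r.height(), r.area());
    }

    public static void main(String[] args) {
        Rectangle[] zone = {
            new Rectangle("Alfa", 17.8, 31.4),
            new Rectangle("Bravo", 9.6, 12.4),
            new Rectangle("Charlie", 7.1, 11.23),
        };
        System.out.println(String.format("%-12s%8s%8s%10s",
                "Description", "Width", "Height", "Area"));
        for (Rectangle r : zone) {
            System.out.println(row(r));
        }
    }
}
```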
Beyond the built-in template processors <span class="pink">STR</span> and <span class="pink">FMT</span>, Java allows developers to create custom template processors. This flexibility opens a realm of possibilities for string manipulation tailored to specific application needs.
A template processor is essentially an instance of the functional interface StringTemplate.Processor. It implements the process method, which takes a StringTemplate and returns an object. Static fields like <span class="pink">STR</span> simply store instances of such classes.
StringTemplate represents the template used in a template expression. It exposes the text fragments and the values of embedded expressions. These two components – fragments and values – are key to how custom template processors operate.
Developers can define their own template processors, leveraging the StringTemplate interface to create specialized string composition behaviors.
In this example, the custom processor <span class="pink">INTER</span> alternates between appending fragments and values to construct the final string.
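The alternation described above can be modeled in plain Java, without the preview API, by passing the fragments and values as two lists. This is a sketch of the algorithm only: in a real processor the lists would come from the StringTemplate, and the class and method names here are hypothetical.

```java
import java.util.Iterator;
import java.util.List;

final class InterleaveModel {
    // Alternates fragments and values. A template with n embedded
    // expressions has n + 1 fragments (some possibly empty), so one
    // fragment remains after the loop.
    static String interpolate(List<String> fragments, List<?> values) {
        StringBuilder sb = new StringBuilder();
        Iterator<String> fragIter = fragments.iterator();
        for (Object value : values) {
            sb.append(fragIter.next()); // text before the expression
            sb.append(value);           // the expression's value
        }
        sb.append(fragIter.next());     // trailing fragment
        return sb.toString();
    }

    public static void main(String[] args) {
        // Models a template of the form "<x> plus <y> equals <x + y>".
        List<String> fragments = List.of("", " plus ", " equals ", "");
        List<Integer> values = List.of(10, 20, 30);
        System.out.println(interpolate(fragments, values)); // 10 plus 20 equals 30
    }
}
```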
Let’s consider another scenario where we want to embed code snippets within a text in a way that clearly differentiates them from the surrounding text. This is particularly useful in technical writing, documentation, or educational materials.
To achieve this, we can define a custom template processor, <span class="pink">CODE</span>, that processes a template to format and embed code snippets using a specific syntax (like backticks in Markdown).
The <span class="pink">CODE</span> processor handles the embedding of Java class names or code snippets within a regular text, formatting them distinctly.
In this example, the <span class="pink">CODE</span> processor wraps the <span class="teal">String.class.getName()</span> expression within backticks, clearly marking it as a code snippet within the text.
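The wrapping step can likewise be modeled in plain Java, without the preview API. In this sketch the fragments and values are passed as lists rather than taken from a StringTemplate, and the class and method names are hypothetical.

```java
import java.util.List;

final class CodeFormatModel {
    // Copies fragments as-is and wraps every embedded value in backticks,
    // so values stand out as code in Markdown-style text.
    static String codeFormat(List<String> fragments, List<?> values) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.size(); i++) {
            sb.append(fragments.get(i));
            sb.append('`').append(values.get(i)).append('`');
        }
        sb.append(fragments.get(values.size())); // trailing fragment
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> fragments = List.of("The class ", " is immutable.");
        List<String> values = List.of(String.class.getName());
        System.out.println(codeFormat(fragments, values));
        // The class `java.lang.String` is immutable.
    }
}
```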
Beyond simple string manipulation, Java's template processor API is robust enough to accommodate the creation of more complex data structures. A prime example of this is a template processor that returns instances of <span class="teal">JSONObject</span>.
The ability to dynamically create JSON objects in a structured and safe manner is crucial in many modern applications, especially those involving data interchange and web APIs. Java's template processors can be leveraged to achieve this with great efficiency.
Here's how we can create a custom template processor that interprets the template expression to produce a <span class="teal">JSONObject</span>:
In this implementation, the <span class="pink">JSON</span> processor extracts keys and values from the template and uses them to construct a <span class="teal">JSONObject</span>. This approach is particularly useful for building JSON objects dynamically, with data coming from various sources in the application.
💡 Note that <span class="pink">StringTemplate</span> is a preview feature in Java 21 (it was later withdrawn in JDK 23, so these examples need JDK 21 or 22). Developers looking to experiment with string templates and custom template processors must enable preview features explicitly. This is done by adding the <span class="teal">--enable-preview</span> flag when compiling and running Java applications. For instance:
<span class="teal">javac --release 21 --enable-preview Example.java</span>
<span class="teal">java --enable-preview Example</span>
The journey of string manipulation in Java, from its inception in JDK 1 to the sophisticated advancements in JDK 21, showcases a remarkable evolution. Initially focusing on immutability for security and stability, Java gradually introduced more flexible and efficient string handling mechanisms, such as <span class="pink">StringBuffer</span>, <span class="pink">StringBuilder</span>, and enhancements in JDK 8. The introduction of compact strings and, more recently, string templates and template expressions in JDK 21, marked significant strides towards modernization. These advancements not only simplified string manipulation but also aligned Java with contemporary programming practices, demonstrating its adaptability and responsiveness to developers' needs. As Java continues to evolve, it stands as a testament to its robustness and versatility, remaining a fundamental tool in the ever-changing landscape of software development.