Elements of the Java Programming Language

A program in Java isn’t described in colloquial language; rather, a set of rules and a grammar define the syntax and semantics. In this post, let’s explore these elements.

 

First, let’s talk about the set of rules that govern the Java programming language (the grammar and the syntax) and look at the Unicode encoding, tokens, and identifiers, among other things. When naming a method, for example, you can choose from a large number of characters; the character set is called the lexicon.

 

The syntax of a Java program defines the tokens and thus forms the vocabulary. However, syntactically correct programs don’t necessarily do what they’re supposed to do. The term semantics therefore summarizes the meaning of a syntactically correct program: Semantics determines what the program does. The abstraction order is as follows: lexicon, syntax, and semantics. The compiler goes through these steps before it can generate the bytecode.

 

Tokens

A token is a lexical unit that provides the compiler with the building blocks of the program. Based on the grammar of a language, the compiler recognizes which sequences of characters form a token. For identifiers, for example, this recognition means “Take the next characters as long as a letter is followed only by letters or digits.” For example, a number like 1982 forms a token by the following rule: “Read digits until no digit follows.” For comments, the combinations /* and */ form a token.

 

Unfortunately, in C(++), an expression like *s/*t won’t be parsed as expected because the character sequence /* is read as the start of a comment. Only a space between the division sign and the asterisk “helps” the parser to recognize the desired division.
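The two recognition rules quoted in this section can be tried out with a small sketch. The class name TokenSketch and its regular expression are assumptions made for illustration only; the rules are heavily simplified and are not the real Java grammar (string literals, comments, and multi-character operators are ignored).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenSketch {

    // Simplified rules, tried in this order (illustrative assumption, not the Java grammar):
    // 1. identifier: a letter followed only by letters or digits
    // 2. number: digits until no digit follows
    // 3. any other single non-whitespace character
    private static final Pattern TOKEN =
        Pattern.compile( "\\p{L}[\\p{L}\\p{Nd}]*|\\p{Nd}+|\\S" );

    static List<String> tokenize( String source ) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = TOKEN.matcher( source );
        while ( matcher.find() ) {          // whitespace never matches and is skipped
            tokens.add( matcher.group() );
        }
        return tokens;
    }

    public static void main( String[] args ) {
        System.out.println( tokenize( "year = 1982;" ) );  // [year, =, 1982, ;]
    }
}
```

The sketch returns the same list for "year=1982;" because the token boundaries follow from the rules, not from the spaces.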

Whitespace

The compiler must be able to distinguish tokens from each other. For this reason, we use separators, which include whitespace characters such as spaces, tabs, line feeds, and form feeds. These characters have no meaning other than being separators. Thus, you can place any number of whitespace characters between tokens. And since you don’t have to be stingy with them, whitespace can clarify a section of a program tremendously. Programs are more readable when they are formatted with a lot of air.

Separators

In addition to whitespace, 12 tokens formed from ASCII characters are defined as separators:

 

( ) { } [ ] ; , . ... @ ::

 

However, the following is anything but good to read, although the compiler does accept it:

 

Code written with separators
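As a rough illustration of this point, here is a hypothetical example (the class names Airy, Cramped, and WhitespaceDemo are made up for this sketch). The first two classes consist of the same kind of token sequence; only the amount of whitespace differs, and the compiler accepts both.

```java
class Airy {
    static int sum( int a, int b ) {
        return a + b;
    }
}

// The same method squeezed together; only the required separators remain.
class Cramped{static int sum(int a,int b){return a+b;}}

class WhitespaceDemo {
    public static void main( String[] args ) {
        System.out.println( Airy.sum( 1, 2 ) == Cramped.sum( 1, 2 ) );  // true
    }
}
```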

Text Encoding by Unicode Characters

Java encodes text by means of Unicode characters. Each character is assigned a unique numerical value (code point), so that the capital “A,” for example, is located at position 65. The Unicode character set includes the ISO-US-ASCII characters from 0 to 127 (hexadecimal 0x00 to 0x7f, i.e., 7 bits) and the extended encoding according to ISO 8859-1 (Latin-1), which adds characters from 128 to 255.
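The code point values mentioned above can be checked with a few lines of Java. This is a minimal sketch; the class CodePointDemo and the method firstCodePoint are names chosen for illustration.

```java
class CodePointDemo {

    // Returns the Unicode code point of the first character of the given text.
    static int firstCodePoint( String text ) {
        return text.codePointAt( 0 );
    }

    public static void main( String[] args ) {
        System.out.println( firstCodePoint( "A" ) );       // 65, within US-ASCII (0 to 127)
        System.out.println( firstCodePoint( "\u00E9" ) );  // 233, within Latin-1 (128 to 255)
        System.out.println( (char) 65 );                   // A
    }
}
```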

Identifiers

For variables (and thus constants), methods, classes, and interfaces, identifiers are assigned so that the corresponding elements can later be referred to in the program. Data is then available under variables. Methods are the subroutines in object-oriented programming (OOP) languages, and classes are the building blocks of object-oriented (OO) programs.

 

An identifier is a sequence of characters that can be almost arbitrarily long (the length is limited only in theory). These characters are elements from the Unicode character set, and each character is significant for identification. Thus, an identifier that is 100 characters long must also always be specified correctly with all 100 characters. Some C and FORTRAN compilers are a bit more generous in this respect and evaluate only the first few characters.

 

For example, the following Java program contains the identifiers Application, main, String, args, System, out, and println:

 

class Application {
   public static void main( String[] args ) {
       System.out.println( "Hello World" );
   }
}

 

String counts as an identifier because String is a class and not a built-in data type like int. While the String class is given preferential treatment in Java (a plus sign can concatenate strings, for example), it’s still a class type.

Each Java identifier is a sequence of Java letters and Java digits, where the identifier must start with a Java letter. Java letters include not only the Latin letters in the range “A” to “Z” (and “a” to “z”) but also many other characters from the Unicode alphabet, such as the underscore, currency characters like dollar ($), euro (€), and yen (¥), and Greek and Arabic letters. Even though a large number of unusual characters are possible in identifiers, you should nevertheless use English identifier names when programming. At this point, we should emphasize once again that Java is strictly case sensitive.

 

The table below lists some valid identifiers.

 

Example Valid Identifiers in Java

 

Invalid identifiers, on the other hand, include the following:

 

Example Invalid Identifiers in Java
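The rule “a Java letter followed by Java letters or Java digits” can be approximated with the methods Character.isJavaIdentifierStart and Character.isJavaIdentifierPart. The following sketch makes some assumptions: the class name IdentifierCheck, the method looksLikeIdentifier, and the candidate names are hypothetical examples, and the check deliberately ignores keywords and restricted identifiers.

```java
class IdentifierCheck {

    // Simplified check: first character must be a Java letter, all others
    // Java letters or digits. Keywords such as "class" are NOT rejected here.
    static boolean looksLikeIdentifier( String name ) {
        if ( name.isEmpty() || !Character.isJavaIdentifierStart( name.codePointAt( 0 ) ) ) {
            return false;
        }
        return name.codePoints()
                   .skip( 1 )
                   .allMatch( cp -> Character.isJavaIdentifierPart( cp ) );
    }

    public static void main( String[] args ) {
        String[] candidates = { "openFileReadOnly", "_tmp", "$price",
                                "2good", "no space", "hello-world" };
        for ( String candidate : candidates ) {
            System.out.println( candidate + " -> " + looksLikeIdentifier( candidate ) );
        }
    }
}
```

The first three candidates pass the check; the last three fail because they start with a digit, contain a space, or contain a minus sign.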

 

Note: In Java programs, identifier names are often formed from compound words of a description. For instance, in a sentence like “open file read only,” the spaces are removed, and the words following the first word start with capital letters. Thus, the sample sentence becomes “openFileReadOnly.” Linguists refer to a capital letter in the middle of words as inner majuscule. Programmers and IT-savvy people, on the other hand, like to talk about the CamelCase notation, because of two-humped camels. Mixed case notation gets difficult with capitalized abbreviations, such as HTTP or URL; in these cases, the Java libraries aren’t consistent so that class names such as HttpConnection or HTTPConnection are acceptable.
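The naming convention from the note can be expressed as a small helper. This is only a sketch; CamelCaseSketch and toCamelCase are hypothetical names, and the first word is kept exactly as written.

```java
class CamelCaseSketch {

    // Removes the spaces and capitalizes every word after the first one.
    static String toCamelCase( String phrase ) {
        StringBuilder result = new StringBuilder();
        for ( String word : phrase.trim().split( "\\s+" ) ) {
            if ( word.isEmpty() ) {
                continue;                    // happens only for an empty phrase
            }
            if ( result.length() == 0 ) {
                result.append( word );       // first word stays as written
            } else {
                result.append( Character.toUpperCase( word.charAt( 0 ) ) )
                      .append( word.substring( 1 ) );
            }
        }
        return result.toString();
    }

    public static void main( String[] args ) {
        System.out.println( toCamelCase( "open file read only" ) );  // openFileReadOnly
    }
}
```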

 

Literals

A literal is a constant expression. Several types of literals exist:

  • The truth values true and false
  • Integer literals for numbers such as 122
  • Floating point literals such as 12.567 or 9.999E-2
  • Character literals written in single quotes such as 'X' or '\n'
  • String literals for character strings written in double quotes such as "The rotation of earth 'really' makes my day."
  • The null literal, which stands for a special reference type

For example, the following Java program contains three literals, namely "Hello World", 1, and 2.65:

 

class Application {
   public static void main( String[] args ) {
       System.out.println( "Hello World" );
       System.out.println( 1 + 2.65 );
   }
}
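To see one literal of each kind from the list in a single place, consider the following sketch. The class name LiteralDemo, the method describe, and the variable names are chosen purely for illustration.

```java
class LiteralDemo {

    static String describe() {
        boolean truth = true;          // truth value
        int integer = 122;             // integer literal
        double floating = 9.999E-2;    // floating point literal, equals 0.09999
        char character = 'X';          // character literal
        String text = "Hello World";   // string literal
        Object nothing = null;         // null literal
        return truth + " " + integer + " " + floating + " "
             + character + " " + text + " " + nothing;
    }

    public static void main( String[] args ) {
        System.out.println( describe() );  // true 122 0.09999 X Hello World null
    }
}
```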

 

(Reserved) Keywords

Certain words aren’t allowed as identifiers because they’re treated specially by the compiler as keywords. Keywords determine the “language” of a compiler, and you cannot add your own keywords when programming.

 

For example, the following listing contains the keywords class, public, static, and void:

 

class Application {
   public static void main( String[] args ) {
       System.out.println( "Hello World" );
   }
}

Keywords

The strings in the table below are keywords and therefore cannot be used as identifier names in Java.

 

(Reserved) Keywords in Java

 

Although the words marked with † aren’t currently used by Java, variables with these names can’t be declared. We refer to these keywords as reserved keywords because they’re reserved for future use. However, we don’t foresee a time when goto will ever be used.

Restricted Identifiers and Literals

Inserting new keywords can break old program code. So, a few years ago, Oracle added a new way to extend the syntax: restricted identifiers, which have a special meaning only in certain contexts. var, yield, and record are examples of such restricted identifiers. In addition to the keywords, the literals true, false, and null are also not available as identifiers.
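The difference from a real keyword shows up when a restricted identifier is used as a variable name. The following sketch assumes Java 10 or later (for local variable type inference with var); the class name RestrictedIdentifierDemo and the method sum are hypothetical.

```java
class RestrictedIdentifierDemo {

    static int sum() {
        int var = 1;                 // var used as an ordinary variable name
        int record = 2;              // record used as an ordinary variable name
        var total = var + record;    // the first var infers the type int; the second is the variable
        return total;
    }

    public static void main( String[] args ) {
        System.out.println( sum() );  // 3
    }
}
```

A declaration such as int class = 1; would be rejected, in contrast, because class is a keyword in every context.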

 

Summary of the Lexical Analysis

When the compiler translates Java programs, it starts with a lexical examination of the source code. We’ve already learned about the central elements, but let’s summarize by looking at the following simple program:

 

class Application {
   public static void main( String[] args ) {
      String text = "Hello World";
      System.out.println( text );
      System.out.println( 1 + 2.65 );
   }
}

 

The compiler will read over all comments, and the delimiters will take the compiler from token to token. The following tokens can be identified in the program:

 

Tokens in Our Sample Program

 

Comments

Programming isn’t only about expressing the correct algorithm in a language, but also about formulating our thoughts in an understandable way. This articulation is achieved, for example, by giving meaningful names to program elements such as classes, methods, and variables. Self-explanatory class names help considerably during development. But even the most beautiful variable names don’t necessarily make the solution concept and the algorithm clearer. So that outsiders (and, after a few months, perhaps you yourself) can quickly understand the solution concept and later extend or modify the program, you can write comments into the source code. They only help humans read the program and have no effect whatsoever on its processing.

Different Types of Comments

In Java, you can formulate comments in the following three ways:

  • Line comments: These comments start with two slashes (//) and comment out the rest of a line. The comment is valid from these characters until the end of the line, that is, until the newline character.
  • Block comments: Using /* */, you can comment out individual sections. The text in the block comment must not contain */ itself because block comments must not be nested.
  • Javadoc comments: These comments are special block comments that are enclosed in /** */. For example, a Javadoc comment describes a method or its parameters, and the application programming interface (API) documentation can be generated from these comments later.

Let’s look at an example where all three comment types appear:

 

/*
 * This source code is public domain.
 */
// Magic. Do not touch.
/**
 * @author Christian Ullenboom
 */
class DoYouHaveAnyCommentsToMake { // TODO: Rename later
   // When I wrote this, only God and I understood what I was doing
   // Now, only God knows
   public static void main( String[] args /* Command line argument */ ) {
      System.out.println( "Sleeps all /*night*/ and he //works// all day" );
   }
}

 

For the compiler, a comment separates tokens, which is why 1/*2*/3 does not give the token 13 but the two tokens 1 and 3. Simply put, to the compiler, a file with comments looks the same as one without, that is, like class DoYouHaveAnyCommentsToMake { public static void main( String[] args ) { System.out.println( "Sleeps all /*night*/ and he //works// all day" ); } }.

 

The console output is Sleeps all /*night*/ and he //works// all day and shows that no comments exist inside the string literals; the symbols /*, */, and // belong to the string.

 

The bytecode is exactly the same with or without comments: The compiler discards all comments, so none are included in the bytecode.

Comments with Style

All comments and remarks should be written in English to facilitate reading for project members from other countries.

 

Javadoc comments generally document the “what” and block comments document the “how.”

 

For general comments, you should use the // characters, which offer two advantages:

  • In editors that don’t highlight comments or in a simple source code output on the command line, you can quickly see that a line beginning with // is a comment. Keeping track of a source text that is interrupted for several pages with the comment characters /* and */ is difficult. Line comments make it clear where comments start and where they end.
  • The use of line comments is better suited for commenting out blocks of code during the development and debugging phases. If you use the block comments for program documentation, you’ll be limited because you cannot nest comments of this type. Line comments can be nested more easily.
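The second point can be illustrated with a short sketch (CommentNestingDemo and answer are hypothetical names). The two disabled lines already contain comments of their own, yet prefixing them with // works without problems.

```java
class CommentNestingDemo {

    static int answer() {
        // The next two lines were disabled with line comments, although
        // they contain comments themselves:
        // int draft = 1; /* temporary value */
        // // return draft;
        return 42;
    }

    public static void main( String[] args ) {
        System.out.println( answer() );  // 42
    }
}
```

If the same two lines were instead wrapped in a block comment, with /* on a line before them and */ on a line after them, the inner */ would end the comment early, and the closing */ would cause a compiler error.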

In IntelliJ, pressing (Ctrl) + (/) comments a line or block in and out. But keep in mind that the forward slash (/) must be selected via the numeric keypad on your keyboard! If several lines are selected, the key combination comments out all selected lines with line comments. In a commented line, another press of (Ctrl) + (/) removes the comment characters again.

 

Editor’s note: This post has been adapted from a section of the book Java: The Comprehensive Guide by Christian Ullenboom.

by Rheinwerk Computing
