Scripting in Java?

Since Java 11, you can run a Java program without compiling it with javac first. Can you script in Java like in Python?

I wanted to extract some data from an XML document. Finding a regular expression for grep or awk looked more time-consuming than using a proper XML parser, especially since I had used streaming XML parsers before. When you only care about a small part of an XML document, a streaming parser might be the better choice. It doesn’t build a DOM tree that you then have to walk. It just tells you when the element that interests you flies by.

An online search for the currently recommended approaches did not turn up much. I found examples of binding Go structs to XML, but I did not want data binding. XML processing seems to have gone a bit out of fashion, especially transforming documents into other documents.

Read the input file

Since none of the samples looked more convenient than Java, I picked Java. I did not use an IDE, just vi and the javadoc.

In Java 11 and later, the java command can execute a file without compiling it with javac first:

java QuickProgram.java

You still need to put your code in a public static void main(String[] args) method inside a class. That class must be the first top-level class in the file, but its name does not need to match the file name. So you can have a class Parser inside a file named QuickProgram.java:

class Parser {
  // must be static: the launcher calls main without creating a Parser instance
  public static void main(String[] args) {
  }
}
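A minimal complete file to try this with might look like the sketch below. The file name QuickProgram.java comes from the text above. The greeting helper and the idea of echoing the arguments are illustrative assumptions only. Any arguments that follow the file name on the command line are passed to main.

```java
// QuickProgram.java: the file name and the class name deliberately differ.
// The single-file launcher runs main() of the first top-level class in the file.
import java.util.Arrays;

class Parser {
    // Hypothetical helper: builds a message from the command-line arguments.
    static String greeting(String[] args) {
        return "Arguments: " + Arrays.toString(args);
    }

    public static void main(String[] args) {
        System.out.println(greeting(args));
    }
}
```

Running `java QuickProgram.java input.xml` should then print `Arguments: [input.xml]`.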

Once the file is set up, I need to open the input file. I used Files.newBufferedReader together with Path.of(String), both from the java.nio.file package, to get a BufferedReader for the file, passing the file name as a string. Files.newBufferedReader decodes the file as UTF-8 by default, which was fine for me.

With try-with-resources, you can open a file without needing to remember to close all the streams in the right order. Declare the checked IOException in your main() method and you’re good to go.

try (var fl = Files.newBufferedReader(Path.of("input.xml"))) {
}

Local variable type inference, available since Java 10, saves some keystrokes: you can write var instead of the type name on the left-hand side of a declaration.
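To illustrate what var does, the sketch below declares two lists, one with an explicit type and one with var. The compiler infers ArrayList<String> for the second from its initializer, so the variable is still statically typed. The class and method names are made up for the example. Note that var only works for local variables with an initializer. It does not work for fields, method parameters or return types.

```java
import java.util.ArrayList;
import java.util.List;

class VarDemo {
    static List<String> collect() {
        // Explicit type: the type name appears on both sides.
        ArrayList<String> explicit = new ArrayList<String>();
        // With var, the compiler infers ArrayList<String> from the initializer.
        var inferred = new ArrayList<String>();
        explicit.add("explicit");
        // inferred is still statically typed: inferred.add(42) would not compile.
        inferred.add("inferred");
        inferred.addAll(explicit);
        return inferred;
    }

    public static void main(String[] args) {
        System.out.println(collect()); // [inferred, explicit]
    }
}
```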

Even when running a single-file script with the java command, you’ve got to import most standard library classes with import statements at the top of the file:

import java.nio.file.Files;
import java.nio.file.Path;
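Here is the file-reading step as a complete script. It opens the file and counts its lines, as a placeholder for real processing. A few details are assumptions for illustration:

- The file name is taken from the first command-line argument, defaulting to input.xml.
- The countLines helper is hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class LineCounter {
    // Hypothetical helper: counts the lines the reader delivers.
    static long countLines(BufferedReader reader) {
        return reader.lines().count();
    }

    // IOException is checked, so main() has to declare it.
    public static void main(String[] args) throws IOException {
        var fileName = args.length > 0 ? args[0] : "input.xml";
        // try-with-resources closes the reader even if an exception is thrown.
        // Files.newBufferedReader decodes the file as UTF-8 by default.
        try (var fl = Files.newBufferedReader(Path.of(fileName))) {
            System.out.println(fileName + ": " + countLines(fl) + " lines");
        }
    }
}
```

Saved as, say, QuickProgram.java, it runs with `java QuickProgram.java input.xml`.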

Next, parse the XML file. For this task, I chose the StAX parser. StAX is a pull parser: your code asks for the next event instead of receiving callbacks as with SAX, so you stay in charge of the control flow. Like SAX, though, it does not build the whole DOM in memory. To access the parser, you first need an XMLInputFactory:

var xmlInputFactory = XMLInputFactory.newInstance();

Don’t forget to import the class at the top of the file:

import javax.xml.stream.XMLInputFactory;
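As a first check that the factory works, the sketch below parses a small XML string held in memory and returns the name of the root element. It reads the string through java.io.StringReader instead of a file. The rootElement helper and the sample document are assumptions. The factory and reader calls are the standard javax.xml.stream API. The sketch leaves the factory at its defaults, so read the security warning before using it on untrusted input.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;

class RootElement {
    // Returns the local name of the first start element, or null if there is none.
    static String rootElement(String xml) throws XMLStreamException {
        var factory = XMLInputFactory.newInstance();
        var reader = factory.createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                // next() advances to the next event and returns its type.
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    return reader.getLocalName();
                }
            }
            return null;
        } finally {
            // XMLStreamReader is not AutoCloseable, so close it explicitly.
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(rootElement("<doc><b>bold</b></doc>")); // doc
    }
}
```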

As this is a public post, I feel I have to add a warning about XML security for anyone who might stumble across it and try to copy the code.

XML security warning!

I was parsing a single document I had created myself. For XML documents of uncertain origin, though, it is better to disable some parser features to prevent attacks. According to the documentation, Java enables mitigations against common XML attacks such as XXE (XML external entity injection) by default. When I checked the default configuration against the OWASP cheat sheet, however, I found that not all of the recommended properties were set. So here we go:

xmlInputFactory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
xmlInputFactory.setProperty(XMLInputFactory.SUPPORT_DTD, false);

Again, this would be superfluous when you parse a document you created yourself, but it is recommended in the general case.
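If you parse untrusted input in more than one place, it may help to wrap these two settings in a small factory method. The name hardenedFactory is hypothetical. The two properties are exactly the ones set in the snippet above. Reading them back with getProperty is a quick sanity check that the settings took effect.

```java
import javax.xml.stream.XMLInputFactory;

class SafeXml {
    // Hypothetical helper: returns a StAX factory with the two hardening properties applied.
    static XMLInputFactory hardenedFactory() {
        var factory = XMLInputFactory.newInstance();
        // Turn off resolution of external entities.
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
        // Turn off DTD support.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        return factory;
    }

    public static void main(String[] args) {
        var factory = hardenedFactory();
        System.out.println("DTD support: " + factory.getProperty(XMLInputFactory.SUPPORT_DTD));
        System.out.println("External entities: "
                + factory.getProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES));
    }
}
```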

Parse the file with StAX

Now you can start processing the XML. Pass the buffered reader you created earlier to the XMLInputFactory to get an XML stream reader:

var streamReader = xmlInputFactory.createXMLStreamReader(fl);

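With type inference, you don’t need to spell out the XMLStreamReader type in the declaration. You still need two imports, though:

- javax.xml.stream.XMLStreamReader, because the loop uses its event-type constants.
- javax.xml.stream.XMLStreamException, a checked exception thrown by the stream reader methods. main() has to declare it alongside IOException.

Finally, loop over the parsing events and pick out the text of each b element: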

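while (streamReader.hasNext()) {
    // next() advances to the next parsing event and returns its type
    int event = streamReader.next();
    // compare strings with equals(), not ==
    if (event == XMLStreamReader.START_ELEMENT && "b".equals(streamReader.getLocalName())) {
        // print the text of each <b> element (it must contain only text)
        System.out.println(streamReader.getElementText());
    }
}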
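For reference, here is one way the whole script might look once the pieces are put together. It makes a few assumptions:

- The input file is named input.xml.
- The interesting elements are named b, as in the loop above.
- The extractElementText helper is a hypothetical addition. It collects the text instead of printing it directly, which makes the logic easy to test.
- The class name ElementExtractor is arbitrary, since it need not match the file name.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class ElementExtractor {
    // Collects the text content of every element with the given local name.
    static List<String> extractElementText(Reader input, String name) throws XMLStreamException {
        var xmlInputFactory = XMLInputFactory.newInstance();
        // Hardening from the security section: no external entities, no DTDs.
        xmlInputFactory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
        xmlInputFactory.setProperty(XMLInputFactory.SUPPORT_DTD, false);

        var streamReader = xmlInputFactory.createXMLStreamReader(input);
        var result = new ArrayList<String>();
        try {
            while (streamReader.hasNext()) {
                // next() moves to the next event and returns its type.
                if (streamReader.next() == XMLStreamReader.START_ELEMENT
                        && name.equals(streamReader.getLocalName())) {
                    // Reads up to the matching end tag; throws if the element has child elements.
                    result.add(streamReader.getElementText());
                }
            }
        } finally {
            streamReader.close();
        }
        return result;
    }

    public static void main(String[] args) throws IOException, XMLStreamException {
        try (var fl = Files.newBufferedReader(Path.of("input.xml"))) {
            for (var text : extractElementText(fl, "b")) {
                System.out.println(text);
            }
        }
    }
}
```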

That’s it! Compiling and running the code with one command is handy, but I feel type inference is the killer feature: thanks to var, you don’t need to repeat type names for every declaration. Can Java compete with Python for scripting? It depends on your skill with Java or Python, but the answer might be a surprising ‘yes’.

For more about single-file Java programs, take a look at JEP 330: Launch Single-File Source-Code Programs.