Introduction
In the first article we learned about lambdas, functional interfaces and method references introduced in Java 8. In the previous article we saw some of the new methods added to the Collections hierarchy. In this article we look at what is probably the most important addition to Java since generics: streams. Streams make working with collections easier and make parallel programming on collections ridiculously easy.
What are streams?
Let's start with an example. In this example we create a list of genres and then count the genre names that start with ‘r’. A single line of code (line 2) does that. We will explain how it works, but this gives an idea of the power of streams.
List<String> genre = new ArrayList<String>(Arrays.asList("rock", "pop", "jazz", "reggae"));
long a = genre.stream().filter(s -> s.startsWith("r")).count();
System.out.println(a); // prints 2 ("rock" and "reggae")
So what are streams?
A stream is a sequence of objects or primitive values. Operations can be performed on its elements sequentially or in parallel. Let us look at the lifecycle of a stream.
A stream is created from a source, which can be an array, a collection, an IO channel and so on. Once you create a stream you can perform intermediate (aggregate) operations on it. We will look at all the major operations below. In our example above we performed a filter operation on the stream. The last step in the lifecycle is called a terminal operation. This step produces a result or a side effect. The stream can no longer be used after the terminal operation (hence the name). The terminal operation can be, for example, a count operation or a collect operation. Some terminal operations are short-circuiting, which means they can finish without processing every element. The anyMatch example later in this article shows how that works.
Remember: Streams have a lifecycle which consists of creation, intermediate operations and a terminal operation.
Streams can be created from Collections using the default Stream<E> stream() method in the Collection interface. For arrays use the Arrays.stream(T[] array) method.
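The lifecycle described above can be sketched in a few lines. This is only an illustrative sketch: the class name StreamLifecycle and its method names are made up for this article. It creates a stream from a List and from an array, applies one intermediate operation and one terminal operation, and then shows that the JDK refuses to reuse a stream that has already been through a terminal operation.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical demo class; the names are illustrative only.
class StreamLifecycle {

    // Creation from a Collection, one intermediate operation, one terminal operation.
    static long countFromList(List<String> genre) {
        return genre.stream()                      // creation
                .filter(s -> s.startsWith("r"))    // intermediate operation
                .count();                          // terminal operation
    }

    // Creation from an array with Arrays.stream.
    static long countFromArray(String[] genre) {
        return Arrays.stream(genre)
                .filter(s -> s.startsWith("r"))
                .count();
    }

    // Returns true if using the same stream a second time is rejected.
    static boolean reuseFails(List<String> genre) {
        Stream<String> stream = genre.stream();
        stream.count();              // the terminal operation consumes the stream
        try {
            stream.count();          // second terminal operation on the same stream
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> genre = Arrays.asList("rock", "pop", "jazz", "reggae");
        System.out.println(countFromList(genre));                         // prints 2
        System.out.println(countFromArray(genre.toArray(new String[0]))); // prints 2
        System.out.println(reuseFails(genre));                            // prints true
    }
}
```

If you need to run a second pipeline, simply ask the source for a new stream.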
Stream Operations
In this section we look at the common stream operations. For the examples we use a List called genre which stores music genres:
List<String> genre = new ArrayList<String>(Arrays.asList("rock", "pop", "jazz", "reggae"));
The examples below also explain the concepts of short-circuiting, non-interference, statelessness and reduction. Here are the operations:
genre.stream().allMatch(s -> !s.isEmpty())
This returns true since none of the genre strings is empty. This is a short-circuiting terminal operation. Look at the anyMatch example to understand short-circuit operations.
genre.stream().anyMatch(s -> s.indexOf("r") == 0)
This returns true since the first element begins with ‘r’ (even though not all of them begin with ‘r’). This is also a short-circuiting terminal operation. To understand what that means, let's write an example.
System.out.println(genre.stream().peek(s->System.out.println(s)).anyMatch(s -> s.indexOf("r") == 0));
System.out.println(genre.stream().peek(s->System.out.println(s)).count());
We have used the peek operation here. For each element it performs the action specified by the lambda expression (a Consumer), which in this case just prints the genre. Line 2 does what we expect on Java 8, i.e. it prints “rock”, “pop”, “jazz”, “reggae” on separate lines, followed by the count. (From Java 9 onwards count() may skip the pipeline, and with it peek, when the size can be computed directly from the source.) But line 1 prints only “rock” and “true”. What has happened is that anyMatch found a match in the first element (rock starts with r) and terminated the operation. The intermediate operation (peek) had its stream terminated too. To put it in words, ‘a short-circuit operation pulls through its predecessor operations only those elements that it needs to process’. In our example anyMatch needs to process only one element and so only one element reaches the peek operation.
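Here is a small sketch of the same idea that records what peek sees instead of printing it. The class and method names (ShortCircuitDemo, seenByAnyMatch, seenByForEach) are made up for illustration. It compares a short-circuiting terminal operation with forEach, which has to visit every element.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; the names are illustrative only.
class ShortCircuitDemo {

    // Elements that flowed through peek before anyMatch stopped the traversal.
    static List<String> seenByAnyMatch(List<String> genre) {
        List<String> seen = new ArrayList<>();
        genre.stream()
                .peek(seen::add)                     // record each element that passes by
                .anyMatch(s -> s.startsWith("r"));   // stops at the first match
        return seen;
    }

    // Same pipeline, but with a terminal operation that visits every element.
    static List<String> seenByForEach(List<String> genre) {
        List<String> seen = new ArrayList<>();
        genre.stream()
                .peek(seen::add)
                .forEach(s -> { });                  // not short-circuiting
        return seen;
    }

    public static void main(String[] args) {
        List<String> genre = Arrays.asList("rock", "pop", "jazz", "reggae");
        System.out.println(seenByAnyMatch(genre)); // prints [rock]
        System.out.println(seenByForEach(genre));  // prints [rock, pop, jazz, reggae]
    }
}
```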
List<String> genre = new ArrayList<String>(Arrays.asList("rock", "pop", "jazz", "reggae","pop"));
System.out.println(genre.stream().distinct().count()); // prints 4
distinct() is a stateful intermediate operation, which means that while moving through the stream the operation maintains state. In our example the operation needs to know that “pop” has already appeared in the stream.
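To see which elements survive, we can collect the result into a list instead of counting it. This is just a sketch; DistinctDemo and distinctGenres are names invented here, and Collectors.toList() is used only to make the result visible. For an ordered source such as a List, distinct keeps the first occurrence of each element.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo class; the names are illustrative only.
class DistinctDemo {

    // Removes duplicates; distinct has to remember every element it has already seen.
    static List<String> distinctGenres(List<String> genre) {
        return genre.stream()
                .distinct()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> genre = Arrays.asList("rock", "pop", "jazz", "reggae", "pop");
        System.out.println(distinctGenres(genre)); // prints [rock, pop, jazz, reggae]
    }
}
```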
Stream<T> filter(Predicate<? super T> predicate); Non-interfering and stateless intermediate operation.
System.out.println(genre.stream().filter(s -> s.length() <= 4).count());
Every genre except reggae passes the filter, so this prints 4 (genre is now the five-element list with the second “pop”). The operation is stateless in the sense that it keeps no information from one element to the next. It is also ‘non-interfering’ since the lambda does not modify the stream's data source (the list genre does not change).
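The following sketch checks both claims: the count is 4, and the source list is the same before and after the pipeline runs. FilterDemo and its method names are invented for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; the names are illustrative only.
class FilterDemo {

    // Counts the genres with at most four characters.
    static long countShort(List<String> genre) {
        return genre.stream()
                .filter(s -> s.length() <= 4)   // looks at one element at a time, keeps no state
                .count();
    }

    // Returns true if running the pipeline left the source list untouched.
    static boolean sourceUnchanged(List<String> genre) {
        List<String> before = new ArrayList<>(genre);
        countShort(genre);
        return before.equals(genre);
    }

    public static void main(String[] args) {
        List<String> genre = new ArrayList<>(Arrays.asList("rock", "pop", "jazz", "reggae", "pop"));
        System.out.println(countShort(genre));      // prints 4
        System.out.println(sourceUnchanged(genre)); // prints true
    }
}
```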
genre.stream().forEach(System.out::println);
Optional<String> combinedgenre = genre.stream().reduce((b, c) -> b.concat(",").concat(c));
combinedgenre contains a comma-separated list of the genre strings. The result is an Optional because the stream could be empty. This is a reduction operation since it reduces all the elements to a single summary result by applying the lambda expression repeatedly. If the function is associative, the reduction works correctly on a parallel stream too. An associative function obeys this:
f(a,f(b,c)) = f(f(a,b),c)
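Joining strings with a comma is associative, so the sequential and the parallel reduction should give the same answer. Here is a minimal sketch that runs both; the class name AssociativeReduce and its method names are assumptions made for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical demo class; the names are illustrative only.
class AssociativeReduce {

    // Sequential reduction with the comma-joining lambda.
    static Optional<String> joinSequential(List<String> genre) {
        return genre.stream().reduce((b, c) -> b.concat(",").concat(c));
    }

    // The same reduction on a parallel stream; partial results are joined with the same lambda.
    static Optional<String> joinParallel(List<String> genre) {
        return genre.parallelStream().reduce((b, c) -> b.concat(",").concat(c));
    }

    public static void main(String[] args) {
        List<String> genre = Arrays.asList("rock", "pop", "jazz", "reggae");
        System.out.println(joinSequential(genre).orElse("")); // prints rock,pop,jazz,reggae
        System.out.println(joinParallel(genre).orElse(""));   // prints rock,pop,jazz,reggae
    }
}
```

A function that is not associative, such as subtraction, gives no such guarantee on a parallel stream.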
int d = genre.stream().reduce(0, (b, c) -> b + c.length(), (b, c) -> b + c);
In this example we calculate the total length of all words in the genre list. This is the generalised form of the reduce function. It takes three arguments. The first argument is the identity value for the combiner. Think of it as the initial value. The second argument, the accumulator, folds each element into the running result (as in the previous example). The third argument, the combiner, makes sense in the context of parallel execution: it combines the partial results computed by different threads.
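To see the combiner at work, the same reduction can be run on a parallel stream. The sketch below does that and, for comparison, computes the same total with mapToInt and sum. TotalLength and its method names are invented for this illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; the names are illustrative only.
class TotalLength {

    // Three-argument reduce: identity, accumulator, combiner.
    static int withReduce(List<String> genre) {
        return genre.parallelStream().reduce(
                0,                              // identity: the starting value of every partial sum
                (sum, s) -> sum + s.length(),   // accumulator: folds one element into a partial sum
                Integer::sum);                  // combiner: merges two partial sums
    }

    // The same total, computed by mapping to an IntStream first.
    static int withMapToInt(List<String> genre) {
        return genre.stream().mapToInt(String::length).sum();
    }

    public static void main(String[] args) {
        List<String> genre = Arrays.asList("rock", "pop", "jazz", "reggae");
        System.out.println(withReduce(genre));   // prints 17
        System.out.println(withMapToInt(genre)); // prints 17
    }
}
```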
HashSet<String> genreSet = genre.stream().collect(() -> new HashSet<String>(), (b, c) -> b.add(c), (b, c) -> b.addAll(c));
The collect function puts the elements of a stream into a mutable container such as a HashSet. In this example we create a stream from the genre ArrayList and then collect the result in a HashSet. This is a mutable reduction operation since the results are accumulated in a mutable container. In the earlier reduce operation the result was a String, which is immutable. The collect function takes three arguments. The first function creates the container, in our case the HashSet. The second function adds an element to the container. The last function combines two partial results from a parallel stream.
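The three functions can also be written as method references, and for common containers the Collectors class already provides ready-made collectors. The sketch below shows both forms side by side; CollectDemo and its method names are made up for this example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical demo class; the names are illustrative only.
class CollectDemo {

    // Supplier, accumulator and combiner spelled out as method references.
    static HashSet<String> withThreeFunctions(List<String> genre) {
        return genre.stream().collect(HashSet::new, HashSet::add, HashSet::addAll);
    }

    // A predefined collector that also produces a Set.
    static Set<String> withCollector(List<String> genre) {
        return genre.stream().collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        List<String> genre = Arrays.asList("rock", "pop", "jazz", "reggae", "pop");
        System.out.println(withThreeFunctions(genre).size());                     // prints 4
        System.out.println(withThreeFunctions(genre).equals(withCollector(genre))); // prints true
    }
}
```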
genre.stream().map(String::toUpperCase).forEach(System.out::println);
map applies a function to every element of the stream. In the above example we print each string of the genre list in uppercase. The genre list itself is not changed.
Map<String, List<String>> artists = new HashMap<String, List<String>>();
artists.put("rock", new ArrayList<String>(Arrays.asList("rockArtistA", "rockArtistB")));
artists.put("pop", new ArrayList<String>(Arrays.asList("popArtistA", "popArtistB")));
artists.put("jazz", new ArrayList<String>(Arrays.asList("jazzArtistA", "jazzArtistB")));
artists.put("reggae", new ArrayList<String>(Arrays.asList("reggaeArtistA", "reggaerockArtistB")));
// flatMap replaces each genre with the stream of its artists and flattens the results into one stream
genre.stream().flatMap(s -> artists.get(s).stream()).forEach(s -> System.out.print(" " + s));
// prints rockArtistA rockArtistB popArtistA popArtistB jazzArtistA jazzArtistB
// reggaeArtistA reggaerockArtistB popArtistA popArtistB
Before we wrap up, let's formally introduce the concept of lazy evaluation.
Lazy evaluation
Streams are lazily evaluated. The intermediate operations that you saw are not evaluated until a terminal operation is called. So if you create a stream, chain multiple intermediate operations on it and call the terminal operation somewhere later in the code, nothing happens until that terminal operation is called.
Remember: The intermediate operations are not evaluated till a terminal operation is called.
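Laziness is easy to observe with peek. The sketch below builds a pipeline, checks how many elements peek has seen, and only then calls a terminal operation. LazyDemo and seenBeforeAndAfter are names invented for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical demo class; the names are illustrative only.
class LazyDemo {

    // Returns how many elements peek had seen before and after the terminal operation.
    static int[] seenBeforeAndAfter(List<String> genre) {
        List<String> seen = new ArrayList<>();

        // This only builds the pipeline; no element is processed yet.
        Stream<String> pipeline = genre.stream()
                .peek(seen::add)
                .map(String::toUpperCase);
        int before = seen.size();

        // The terminal operation triggers the traversal.
        pipeline.collect(Collectors.toList());
        int after = seen.size();

        return new int[] {before, after};
    }

    public static void main(String[] args) {
        int[] counts = seenBeforeAndAfter(Arrays.asList("rock", "pop", "jazz", "reggae"));
        System.out.println(counts[0] + " then " + counts[1]); // prints 0 then 4
    }
}
```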
This concludes our introduction to streams. We have covered the major operations here; there are many more, but most of them are variations of the above. Most of them can be performed in parallel as well: instead of calling stream(), call parallelStream() on the collection. The power of streams lies in making parallel programming trivial. However, note that parallel execution introduces overhead and can cause synchronization problems when the operations touch shared mutable state. Use it only if you are sure it helps.
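As a parting sketch, here are two of the earlier pipelines run on a parallel stream. ParallelDemo and its method names are assumptions made for this example. Note that the results are gathered with collect rather than by adding to a shared list from inside a lambda, which is the kind of shared state that causes trouble in parallel.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo class; the names are illustrative only.
class ParallelDemo {

    // Same filter and count as the first example, on a parallel stream.
    static long countParallel(List<String> genre) {
        return genre.parallelStream()
                .filter(s -> s.startsWith("r"))
                .count();
    }

    // Collecting into a list keeps the encounter order of the source, even in parallel.
    static List<String> upperParallel(List<String> genre) {
        return genre.parallelStream()
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> genre = Arrays.asList("rock", "pop", "jazz", "reggae");
        System.out.println(countParallel(genre)); // prints 2
        System.out.println(upperParallel(genre)); // prints [ROCK, POP, JAZZ, REGGAE]
    }
}
```

For a list this small the parallel version will not be faster; the overhead only pays off with larger workloads.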