Posts Tagged ‘java collections framework’

Many programmers will never need to implement their own Collections classes. You can go pretty far using the implementations described in the preceding sections of this chapter. However, someday you might want to write your own implementation. It is fairly easy to do this with the aid of the abstract implementations provided by the Java platform. Before we discuss how to write an implementation, let’s discuss why you might want to write one.

Reasons to Write an Implementation

The following list illustrates the sort of custom Collections you might want to implement. It is not intended to be exhaustive:
  • Persistent: All of the built-in Collection implementations reside in main memory and vanish when the program exits. If you want a collection that will still be present the next time the program starts, you can implement it by building a veneer over an external database. Such a collection might be concurrently accessible by multiple programs.
  • Application-specific: This is a very broad category. One example is an unmodifiable Map containing real-time telemetry data. The keys could represent locations, and the values could be read from sensors at these locations in response to the getoperation.
  • High-performance, special-purpose: Many data structures take advantage of restricted usage to offer better performance than is possible with general-purpose implementations. For instance, consider a List containing long runs of identical element values. Such lists, which occur frequently in text processing, can be run-length encoded — runs can be represented as a single object containing the repeated element and the number of consecutive repetitions. This example is interesting because it trades off two aspects of performance: It requires less space but more time than an ArrayList.
  • High-performance, general-purpose: The Java Collections Framework’s designers tried to provide the best general-purpose implementations for each interface, but many, many data structures could have been used, and new ones are invented every day. Maybe you can come up with something faster!
  • Enhanced functionality: Suppose you need an efficient bag implementation (also known as a multiset): a Collection that offers constant-time containment checks while allowing duplicate elements. It’s reasonably straightforward to implement such a collection atop a HashMap.
  • Convenience: You may want additional implementations that offer conveniences beyond those offered by the Java platform. For instance, you may frequently need List instances representing a contiguous range of Integers.
  • Adapter: Suppose you are using a legacy API that has its own ad hoc collections’ API. You can write an adapter implementation that permits these collections to operate in the Java Collections Framework. An adapter implementation is a thin veneer that wraps objects of one type and makes them behave like objects of another type by translating operations on the latter type into operations on the former.


How to Write a Custom Implementation

Writing a custom implementation is surprisingly easy. The Java Collections Framework provides abstract implementations designed expressly to facilitate custom implementations. We’ll start with the following example of an implementation ofArrays.asList.
public static <T> List<T> asList(T[] a) {
    return new MyArrayList<T>(a);
}

private static class MyArrayList<T> extends AbstractList<T> {

    private final T[] a;

    MyArrayList(T[] array) {         a = array;     }

    public T get(int index) {
        return a[index];
    }

    public T set(int index, T element) {
        T oldValue = a[index];
        a[index] = element;
        return oldValue;
    }

    public int size() {
        return a.length;
    }
}
Believe it or not, this is very close to the implementation that is contained in java.util.Arrays. It’s that simple! You provide a constructor and the getset, and size methods, and AbstractList does all the rest. You get the ListIterator, bulk operations, search operations, hash code computation, comparison, and string representation for free.
Suppose you want to make the implementation a bit faster. The API documentation for abstract implementations describes precisely how each method is implemented, so you’ll know which methods to override to get the performance you want. The preceding implementation’s performance is fine, but it can be improved a bit. In particular, the toArray method iterates over the List, copying one element at a time. Given the internal representation, it’s a lot faster and more sensible just to clone the array.
public Object[] toArray() {     return (Object[]) a.clone(); }
With the addition of this override and a few more like it, this implementation is exactly the one found in java.util.Arrays. In the interest of full disclosure, it’s a bit tougher to use the other abstract implementations because you will have to write your own iterator, but it’s still not that difficult.
The following list summarizes the abstract implementations:
  • AbstractCollection — a Collection that is neither a Set nor a List. At a minimum, you must provide the iterator and the size methods.
  • AbstractSet — a Set; use is identical to AbstractCollection.
  • AbstractList — a List backed up by a random-access data store, such as an array. At a minimum, you must provide the positional access methods (get and, optionally, setremove, and add) and the size method. The abstract class takes care oflistIterator (and iterator).
  • AbstractSequentialList — a List backed up by a sequential-access data store, such as a linked list. At a minimum, you must provide the listIterator and size methods. The abstract class takes care of the positional access methods. (This is the opposite of AbstractList.)
  • AbstractQueue — at a minimum, you must provide the offerpeekpoll, and size methods and an iterator supporting remove.
  • AbstractMap — a Map. At a minimum you must provide the entrySet view. This is typically implemented with the AbstractSet class. If the Map is modifiable, you must also provide the put method.
The process of writing a custom implementation follows:
  1. Choose the appropriate abstract implementation class from the preceding list.
  2. Provide implementations for all the class’s abstract methods. If your custom collection is to be modifiable, you’ll have to override one or more of the concrete methods as well. The API documentation for the abstract implementation class will tell you which methods to override.
  3. Test and, if necessary, debug the implementation. You now have a working custom collection implementation.
  4. If you’re concerned about performance, read the abstract implementation class’s API documentation for all the methods whose implementations you’re inheriting. If any seem too slow, override them. If you override any methods, be sure to measure the performance of the method before and after the override. How much effort you put into tweaking performance should be a function of how much use the implementation will get and how critical to performance its use is. (Often this step is best

The LinkedList and ArrayList represent some of the widely used data structures of the Java Collections Framework. You might have had a chance to already use both classes several times so far; however, if you have wondered on what makes them fundamentally different, then continue reading.

Although both the LinkedList and ArrayList classes implement the List interface, they differ on the following 3-key issues: 1) Data Storage, 2) Data Access, 3) Data Management.

DATA STORAGE: Linked-list versus Array

• LinkedList uses the design of a linked-list data structure (from where it get its name) to store data elements (i.e. objects). In this manner, each data element — depending on the position of the list — has a link to the previous and next element. This storage approach proves to be very efficient, as it uses only as much memory as it needs; no more, no less.
• ArrayList uses an Object array for storing the data elements. By default the array starts with a capacity of 10; unless otherwise specified at the time of instantiation. For example, if we want to instantiate an ArrayList of strings with an initial capacity of 1024 then we would use:

ArrayList<String> stringArrayList = new ArrayList<String>(1024);

Thus, by default the ArrayList can be more wasteful as far as the memory allocation goes.

DATA ACCESS: Sequential Access versus Random Access

• LinkedList due to the nature of its data storage, has a sequential access approach. Thus, if we want to retrieve an element at a specific index through the public E get(int index) method, we would have to iterate either from the beginning or the end, depending if the requested index is greater than or less than half the length of the list. It can easily be seen how this sequential access can become quite timely as we deal with longer and longer lists. As the public E get(int index)method uses the Node node(int index) method for data retrieval, we can see the implications of LinkedList’s sequential access in the following code:

570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
/**
 * Returns the (non-null) Node at the specified element index.
 */
Node<E> node(int index) {
    // assert isElementIndex(index);

    if (index < (size >> 1)) {
        Node<E> x = first;
        for (int i = 0; i < index; i++)
            x = x.next;
        return x;
    } else {
        Node<E> x = last;
        for (int i = size - 1; i > index; i--)
            x = x.prev;
        return x;
    }
}

• ArrayList supports random access as it has an internal array for data storage. Thus, when we call the public E get(int index) method in an ArrayList instance, we get forwarded to the E elementData(int index) method which simply returns the element located at the specified position within the array, i.e.:

333
334
335
336
337
338
// Positional Access Operations

@SuppressWarnings("unchecked")
E elementData(int index) {
    return (E) elementData[index];
}

Needless to say, ArrayList’s random access approach is far more efficient, as it doesn’t have to waste computing cycles on iteration.

DATA MANAGEMENT: Linking and Unlinking versus Array Copying

• LinkedList can very efficiently undergo any changes in its collection of data elements. For example, if we want to remove a specific data element from the LinkedList instance — through the public boolean remove(Object o) method — then we go ahead and link the neighbors (i.e. previous and next data elements) of the unwanted data element, and the Garbage Collector will take care of the rest. Thus, if we have a linked-list structure with three string data elements (with previous and next links), as follow:

["A"] <-> ["B"] <-> ["C"]

Then, when we request removal of the “B” string data element, “A” and “C” get linked together, and “B” is left alone, thus becoming eligible for Garbage Collection:

["A"].next = ["B"].next => ["A"].next = ["C"]
["C"].prev = ["B"].prev => ["C"].prev = ["A"]

The Java implementation behind this pseudo-code looks like the following:

213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
/**
 * Unlinks non-null node x.
 */
E unlink(Node<E> x) {
    // assert x != null;
    final E element = x.item;
    final Node<E> next = x.next;
    final Node<E> prev = x.prev;

    if (prev == null) {
        first = next;
    } else {
        prev.next = next;
        x.prev = null;
    }

    if (next == null) {
        last = prev;
    } else {
        next.prev = prev;
        x.next = null;
    }

    x.item = null;
    size--;
    modCount++;
    return element;
}

Do note that the code displayed above is for the E unlink(Node x) method on which LinkedList’s public boolean remove(Object o) method relies. In addition to the efficiency of the remove method, LinkedList is very flexible when it comes to enlarging or shrinking its collection of data elements. In a LinkedList instance whenever we need to add to the collection, a call is made to the void linkLast(E e) method, which simply appends the new data element to the end of the list, as seen below:

144
145
146
147
148
149
150
151
152
153
154
155
156
157
/**
 * Links e as last element.
 */
void linkLast(E e) {
    final Node<E> l = last;
    final Node<E> newNode = new Node<E>(l, e, null);
    last = newNode;
    if (l == null)
        first = newNode;
    else
        l.next = newNode;
    size++;
    modCount++;
}

• ArrayList due to the usage of its internal array it copies all of the elements to a new array every time there’s a removal request, or when there’s not enough room to add a new element. Thus, for every new addition or removal from our ArrayList collection a call is made to the public void ensureCapacity(int minCapacity) method, which carries out the necessary copying and shifting of the elements, as seen below:

171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
/**
 * Increases the capacity of this <tt>ArrayList</tt> instance, if
 * necessary, to ensure that it can hold at least the number of elements
 * specified by the minimum capacity argument.
 *
 * @param   minCapacity   the desired minimum capacity
 */
public void ensureCapacity(int minCapacity) {
    modCount++;
    int oldCapacity = elementData.length;
    if (minCapacity > oldCapacity) {
        int newCapacity = (oldCapacity * 3)/2 + 1;
        if (newCapacity < minCapacity)
            newCapacity = minCapacity;
        // minCapacity is usually close to size, so this is a win:
        elementData = Arrays.copyOf(elementData, newCapacity);
    }
}

Needless to say, you can imagine how much work these operations carry out, especially when dealing with very large arrays.

In conclusion, we can see that both the LinkedList and ArrayList have their positives and negatives; however, by having a good understanding of these issues you can make a better decision on what implementation should you use, based on the data size and usage requirements.