Primitive Collections for Java 1.2
A developer's guide to using and extending the framework

Table of Contents

1.  Introduction

1.1  A personal motive

Primitive Collections for Java (PCJ) started with the need for a set data structure for character values. The set was to be used for configuring a phonetic encoder class. The encoder would be given a text in some language, and for each character in the text, it would determine whether or not the character was to be encoded. I started using the HashSet class of Java Collections Framework (JCF). Somewhere in the encoder the code was something like this:

public class PhoneticEncoder {

    java.util.Set acceptedChars;

    public Encoder(java.util.Set ac) {
        acceptedChars = ac;
    }

    String encode(CharSequence s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (acceptedChars.contains(new Character(c))) {
                ...
            }
        }
    }
}

// And somewhere outside the encoder...
java.util.Set s = new java.util.HashSet();
for (char c = 'a'; c <= 'z'; c++)
    s.add(new Character(c));
for (char c = 'A'; c <= 'Z'; c++)
    s.add(new Character(c));
for (char c = '0'; c <= '9'; c++)
    s.add(new Character(c));
PhoneticEncoder enc = new PhoneticEncoder(s);

I usually don't like representing primitive values as objects, especially not when a simple comparison requires my program to create a new instance (unless I want to represent texts as sequences of Character objects). It may be an illusion taking JVM optimization into account, but it does not seem efficient, and thinking about the data in my programs as wrapped values seems counter-intuitive. How I would like to think about my program is this:

public class PhoneticEncoder {

    CharSet acceptedChars;

    public Encoder(CharSet ac) {
        acceptedChars = ac;
    }

    String encode(CharSequence s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (acceptedChars.contains(c)) {
                ...
            }
        }
    }
}

// And somewhere outside the encoder...
CharRangeSet s = new CharRangeSet();
s.add(new CharRange('a', 'z'));
s.add(new CharRange('A', 'Z'));
s.add(new CharRange('0', '9'));
PhoneticEncoder enc = new PhoneticEncoder(s);

So I began investigating other possibilities. Keeping memory usage low was quite important for this particular application. First, I thought of using a BitSet. I believed that if I subtracted (int)'A' from each character value to be stored, I could do with a hundred bits or so. A little research into the Unicode standard meant that this particular approach had to be abandoned because it would mean a memory usage of about 0xFFFF/8 = 16384 bytes for representing a set of perhaps a few hundred characters (some of the Unicode characters have pretty high numbers). Even worse: most of the characters fell into long ranges of consecutive values, so there were actually no reason for setting a bit for each and every character. Instead I decided to represent a set of characters as a sorted list of ranges, storing only the lower and upper bounds of each range. This list could be searched very quickly because it would rarely have more than 15 ranges and testing a character for containment in a range would only require comparing it to the lower and upper bounds. Using binary search on 15 ranges it would in the worst case require 2×log2(16) = 2×4 = 8 character comparisons to find a character in the set. That was acceptable for my purposes. What was not acceptable was that my changing the representation of character sets meant that the API of the phonetic encoder would have to depend upon a very specialized binary searching range based character set class, which in turn had to be packaged with the encoder. Having done this trick for the (n+1)'th time and having noted for the n'th time the stupidity of doing it, I decided to make a generalized framework for primitive types. The first n-1 times I had resisted the temptation of creating yet another standard on which all other APIs should depend. This time I fell short, and PCJ is the result.

Søren Bak

1.2  The relation to the Java Collections Framework

When you examine the class hierarchy and the methods of PCJ, you will find that the interfaces and abstract classes (not the inner workings of the implementing classes) bear a close resemblance to those of the Java Collections Framework. This is intended. I have designed PCJ with the following in mind: Programmers are used to JCF and a close resemblance will increase the learning rate; and the complexity of adapters between JFC and PCJ should be kept low. There are plenty of other opportunities to introduce bugs. Besides that, I think that Joshua Bloch's JCF has a good interface design - worth basing other collections libraries on.

1.3  Completeness over sanity

You will find some rather pathological classes in PCJ. For example, there is a set class for boolean values called BooleanDirectSet. The value of such a class is questionable and I felt quite silly coding it (take a look at the source code, it seems like a lot of work to represent the four possible states: {}, {F}, {T}, and {F,T}). Another example of such a class is the hash map from float keys to short values (that one is generated). However, I think that a primitive collections API should provide a complete set of classes instead of just those that the author finds useful at the time. It should be completely symmetric over all primitive types.

1.4  Why use PCJ?

1.5  Why not use PCJ?

1.6  What's new?

Version 1.2

The focus of v1.2 was to implement sorted set operations.

Changes:

Version 1.1.1

This was a bugfix release:

Version 1.1

The focus of v1.1 was to complete a branch of the map interface hierarchy and make the API interact better with client Java code.

Changes:

2.  The Framework

The classes and interfaces of PCJ have been built symmetrically around the primitive types: boolean, char, byte, short, int, long, double, and float.

When I refer to a class, it can either be a type independent reference or a type specific reference. For type independent references, T and S are used to stand for primitive types. When I write TCollection it is therefore a general (type independent) reference to the classes BooleanCollection, CharCollection, ByteCollection, ShortCollection, IntCollection, LongCollection, FloatCollection, and DoubleCollection.

When for some reason it makes no sense to provide a class for some primitive type, the symmetry is allowed to fail. This is only the case for classes - not for the general interfaces. There are generally made notes about such cases throughout this guide.

2.1  The bak.pcj.* packages

bak.pcj

Provides general primitive collection classes. All other classes in the API depends on those classes.

bak.pcj.adapter

Provides adapter classes between PCJ and JCF. The adapters provide a convenient interface between PCJ-based and JCF-based classes.

bak.pcj.benchmark

Provides classes for benchmarking PCJ and parts of JCF. When new classes are implemented they can be compared to other implementations using standard tasks and data sets.

bak.pcj.hash

Provides classes for creating hash functions. Besides providing default hash functions the package also provides interfaces that can be implemented to change the hash functions of the hashing collection classes.

bak.pcj.list

Provides list collection interfaces and classes. Lists extend the general collection interfaces with sequential access to collection elements.

bak.pcj.map

Provides map interfaces and classes. The implementations are currently based on hashing.

bak.pcj.set

Provides set interfaces and classes. Sets extend the general collection interfaces with set behavior. There are set classes providing different storage strategies for different situations.

2.2  Collection classes

The core of PCJ is a small set of interrelated interfaces. All collection classes derive in some way or other from these interfaces. They are shown shown in the UML class diagram below:

Interface hierarchy for collection classes

The main interface is TCollection. It provides basic collection behaviour like adding and removing elements. The basic interfaces can be found in the package bak.pcj. There are two major branches of sub-interfaces. The one is the TSet branch which provides set behaviour for collections. All set classes can be found in the bak.pcj.set package. The other is the TList branch which provides list behaviour. All list classes can be found in the bak.pcj.list package.

There are a number of alternative implementations of these interfaces. Each implementation is useful in specific situations, and it requires a little knowledge about the classes to choose the right one. The classes implementing the interface hierarchy are shown in the UML class diagram below without method details.

Interface and class hierarchy for collection classes

Note that each of the main interfaces have an abstract class directly below in the hierarchy. Unless you are going to implement new collection classes, the abstract classes can safely be ignored. The implemented interfaces are important, not the super-classes. If you are going to implement new collection classes, the abstract classes provide base implementations of each of the main interfaces.

As mentioned earlier T is not {boolean, char, byte, short, int, long, float, double} for all classes. The table below shows the type independent classes and their implementations for each of the primitive types. You should examine the table and the corresponding APIs to get an idea of the class hierarchy.

Conceptual collection classes and actual implementations in PCJ
Class or interface Description Availability
boolean char byte short int long float double
TCollection General collection interface × × × × × × × ×
TSet General set interface × × × × × × × ×
TSortedSet Sorted set interface × × × × × × × ×
TOpenHashSet Hash set implementation using open addressing   × × × × × × ×
TChainedHashSet Hash set implementation using chained addressing   × × × × × × ×
TRangeSet Set class based on value ranges   × × × × ×    
TBitSet Set class based on bit arrays   × × × ×      
TDirectSet Boolean set ×              
TList General list interface × × × × × × × ×
TStack General stack interface × × × × × × × ×
TDeque General deque interface × × × × × × × ×
TArrayList Array list class × × × × × × × ×
TArrayStack Array stack class × × × × × × × ×
TArrayDeque Array deque class × × × × × × × ×

2.3  Map classes

As in JCF, map classes are not collections, but interact closely with collections. All primitive map classes implement either the TKeySMap interface, whose instances map T keys to S values, the TKeyMap interface, whose instances map T values to java.lang.Objects, or the ObjectKeyTMap interface, whose instances map java.lang.Objects to T values. The three interfaces are not directly related but share some design and logic. They are shown in the UML class diagram below:

Interface hierarchy for map classes

PCJ provides two implementation variants of both the TKeySMap interfaces, the TKeyMap interfaces, and the ObjectKeyTMap interfaces; namely TKeySOpenHashMap/TKeySChainedHashMap, TKeyOpenHashMap/TKeyChainedHashMap, and ObjectKeyTOpenHashMap/ObjectKeyTChainedHashMap as shown in the UML class diagram below:

Class hierarchy for map classes

2.4  Adapter classes

To provide a bridge between PCJ based and JCF based APIs there are adapters for all of the main interfaces: collections, iterators, sets, lists, list iterators, and maps. They can be found in the package bak.pcj.adapter. Alternatively, use the class bak.pcj.Adapter, which provides factory methods for all adapter classes. The table below lists all adapter classes.

Adapter classes
Class or interface Description Availability
boolean char byte short int long float double
TCollection­To­Collection­Adapter Adapters from PCJ collections to JCF collections × × × × × × × ×
TIterator­To­Iterator­Adapter Adapters from PCJ iterators to JCF iterators × × × × × × × ×
TListToListAdapter Adapters from PCJ lists to JCF lists × × × × × × × ×
TListIterator­To­ListIterator­Adapter Adapters from PCJ list iterators to JCF list iterators × × × × × × × ×
TSetToSetAdapter Adapters from PCJ sets to JCF sets × × × × × × × ×
TSortedSetToSortedSetAdapter Adapters from PCJ sorted sets to JCF sorted sets × × × × × × × ×
TKeyMapToMapAdapter Adapters from PCJ maps to JCF maps × × × × × × × ×
ObjectKeyTMapToMapAdapter Adapters from PCJ maps to JCF maps × × × × × × × ×
TKeySMapToMapAdapter Adapters from PCJ maps to JCF maps All 64 combinations, e.g. IntKeyCharMapToMapAdapter
Collection­To­TCollection­Adapter Adapters from JCF collections to PCJ collections × × × × × × × ×
Iterator­To­TIterator­Adapter Adapters from JCF iterators to PCJ iterators × × × × × × × ×
List­To­TList­Adapter Adapters from JCF lists to PCJ lists × × × × × × × ×
ListIterator­To­TListIterator­Adapter Adapters from JCF list iterators to PCJ list iterators × × × × × × × ×
Set­To­TSet­Adapter Adapters from JCF sets to PCJ sets × × × × × × × ×
SortedSet­To­TSortedSet­Adapter Adapters from JCF sorted sets to PCJ sorted sets × × × × × × × ×
MapToTKeyMapAdapter Adapters from JCF maps to PCJ maps × × × × × × × ×
MapToObjectKeyTMapAdapter Adapters from JCF maps to PCJ maps × × × × × × × ×
MapToTKeySMapAdapter Adapters from JCF maps to PCJ maps All 64 combinations, e.g. MapToIntKeyCharMapAdapter

2.5  Collection growth

Many of the collection and map classes are table based. Elements and mappings are stored in an underlying table (a Java array). The tables underlying table based collections have to grow as elements are added and possibly shrink as they are removed. That means allocating a new longer table and copy the existing elements into that table. It is easy to see that tables should not be expanded every time an element is added because that would result in very poor performance for additions. Instead the tables are expanded to a length well over the currently needed length. We call that length the capacity of the collection. Increasing the capacity of a collection is an expensive operation because of the copying involved (and re-hashing for hashtable-based maps), so the performance of add operations depend very much on how often the capacity is increased.

The table based collections and maps in PCJ supports two strategies for increasing the capacity:

Each strategy can be specified by selection of constructor. For example, we can create an int array list that grows with a fixed number of elements with the following code:

IntList list = new IntArrayList(100, 10);

The first argument in the constructor is the initial capacity of the list. The second is the number of elements with which to increase the capacity as elements are added. So in this case the first 100 elements are added without any capacity increase. When the 101st element is added the capacity is increased to 110, when the 111th element is added the capacity is increased to 120, and so on. Similarly, to create a list that grows with 20% size for each capacity increase, we would write:

IntList list = new IntArrayList(100, 0.2);

When the 101st element is added, the capacity is increased to 1.2×100 = 120. When the 121st element is added, the capacity is increased to 1.2×120 = 144, and so forth.

The strategy that should be employed depends on the situation. If you know how many elements are going to be in a collection, it is generally a good idea to give the collection that capacity from the beginning so as to avoid capacity increases. Very often we cannot know the needed capacity, and in those cases we should think about how collections grow. A rule of thumb is that if a collection rarely grows or grows slowly its capacity should increase by a fixed amount. In that way we avoid wasting too much memory on an exponential growth rate. On the other hand, if a collection grows fast its capacity should grow exponentially, because that is the best way to minimize the number of capacity increases. However, with a growth factor of 1.0 (doubling the capacity) we potentially allocate approximately twice as much memory as needed.

2.6  Hashing

A number of the classes of PCJ are based on hash tables. As explained below, there are two main hashing variants in use: open addressing with double hashing and chaining.

Open addressing

The open hash collection and map variants (e.g. TOpenHashSet and TKeySOpenHashMap) are based on a hash table using open addresing with double hashing. Double hashing is a method of resolving collisions in hash tables when more than one key hash to the same address in the table. The secondary hash function used is always:

h2(n) = 1 + (h1(n) mod (M - 2))

where h1 is the primary hash function and M is the length of the underlying hash table. Using this function, the M should be a prime in order for all the elements of the hash table to be searched. The class Primes is used for selecting prime numbers.

Chained hash tables

The chained hash table variants (e.g. TChainedHashSet and TKeySChainedHashMap) are based on hash tables with linked lists. Instead of searching for an empty slot in the hash table, each slot contains a linked list (a chain) of those entries whose keys hash to that slot. In that way collision resolution is completely avoided, but we have to search linked lists instead.

Load factors

Both the OpenHash and ChainedHash variants begin to perform bad when many keys have been hashed to the same slot in the underlying hash table. To prevent that, hash tables are equipped with a maximum load factor, which is some proportion of the its size. When that proportion of the hash table is used, its length is increased (see the discussion on growth above). The default load factor is 0.75 for both variants, and a custom load factor may be specified with the constructors.

The plots below shows how the load of a hash based set affects performance. For open addressing the time to look up a key increases steadily until it suddenly explodes around 0.7. For chained hashing the mean time to look up a key will increase steadily as the lists grow longer. The plots were created for hash sets, but are very similar for hash maps.

(The horizontal lines on the plots are due to the resolution of the timer on my computer, which is quite clearly somewhere around 10ms.)

Creating new hash functions

If nothing else is specified with the constructors for the hash table based collections and maps, the default hash functions of bak.pcj.hash are used. It is possible that the default functions give poor performance results for specific data sets, and in such cases a new hash function may be specified.

Assume that the hash table length M is 17 and the default hash function h(n) = n (which it is for int values). If we add 17 it goes into slot 0 since 17 mod 17 = 0. If we add 34 it also goes into slot 0 since 34 mod 17 = 0, thus activating the use of the secondary hash function in the case of open addressing with double hashing. With chaining we get longer lists. This example could benefit from an alternative hash function, e.g. h(n) = (n div 17) + n. That would make n = 17 go into slot ((17 div 17) + 17) mod 17 = 18 mod 17 = 1, and similarly 34 would go into slot ((34 div 17) + 17) mod 17 = 19 mod 17 = 2, thus eliminating the use of the secondary hash function (or chains).

The example below shows how to pass a new hash function to a hash set:

import bak.pcj.set.IntSet;
import bak.pcj.set.IntOpenHashSet;
import bak.pcj.hash.IntHashFunction;

...
IntSet s = new IntOpenHashSet(
    new IntHashFunction() {
        public int hash(int v) {
            return v/17 + v;
        }
    }
);

3.  The classes

In this section the interfaces and classes are described in more detail and examples are given. The design is also discussed.

3.1  The TCollection interfaces

The TCollection interfaces define the basic collection behaviour. It is adapted from the JCF Collection interface. It provides the basic methods for modifying and querying collections.

Although TCollection is the general ancestor of all collection classes, each method may have a different semantics depending on the collection type. For example, the method add(T) adds an element to the end of a TList whereas it ensures that a TSet contains an element.

If you only use PCJ and do not extend it, you will only use TCollection as a polymorphic type for collections.

See TSet for examples. The two interfaces are identical in signature.

Availability

Type Interface
boolean bak.pcj.BooleanCollection
char bak.pcj.CharCollection
byte bak.pcj.ByteCollection
short bak.pcj.ShortCollection
int bak.pcj.IntCollection
long bak.pcj.LongCollection
float bak.pcj.FloatCollection
double bak.pcj.DoubleCollection

3.2  The TList interfaces

The TList interfaces define an extension to the TCollection interfaces for sequential lists. As TCollection this is basically an adpation of the JCF List interface.

Example:

import bak.pcj.list.IntList;
import bak.pcj.list.IntArrayList;
import bak.pcj.list.IntListIterator;

public class ListExample {
    public static void main(String[] args) {
        IntList s = new IntArrayList();

        //  Adding elements to the end
        
        s.add(1);               //  s=[1]
        s.add(2);               //  s=[1,2]

        //  Inserting elements at positions

        s.add(2, 3);            //  s=[1,2,3]
        s.add(0, 0);            //  s=[0,1,2,3]
        s.add(1, 10);           //  s=[0,10,1,2,3]

        //  Removing elements at positions

        s.removeElementAt(1);   //  s=[0,1,2,3]

        //  Removing elements by value
        
        s.remove(2);            //  s=[0,1,3]

        //  Updating elements

        s.set(2, 2);            //  s=[0,1,2]
        s.set(0, -1);           //  s=[-1,1,2]

        //  Searching for elements

        int v;
        v = s.indexOf(1);       //  v=1
        v = s.indexOf(2);       //  v=2
        v = s.indexOf(10);      //  v=-1

        //  Adding collections at positions

        IntArrayList t = new IntArrayList();
        t.add(10);              //  t=[10]
        t.add(20);              //  t=[10,20]
        t.add(30);              //  t=[10,20,30]
        
        s.addAll(1, t);         //  s=[-1,10,20,30,1,2]
    }
}

Other examples:

ListExample.java
Reverse.java
Sorting.java

Availability

Type Interface
boolean bak.pcj.list.BooleanList
char bak.pcj.list.CharList
byte bak.pcj.list.ByteList
short bak.pcj.list.ShortList
int bak.pcj.list.IntList
long bak.pcj.list.LongList
float bak.pcj.list.FloatList
double bak.pcj.list.DoubleList

Design notes

The TList interfaces do not provide methods to retrieve a sub-list of a list. It is relatively complex to implement sub-list views, so this feature (which is implemented in JCF lists) is omitted for the time being.

A deviation from JCF in method names is that of removeElementAt(). JCF lists has a remove(int) and remove(Object) method for removing elements by index and by equality, respectively. Since the objects of PCJ lists can be of type int, there would be a clash between the two method signatures if the JCF naming should be used. Therefore PCJ lists use removeElementAt(int) to remove elements by index.

3.3  The TStack interfaces

The TStack interfaces extend the TList interfaces with the typical stack operations: push(), pop(), and peek(). Stacks typically only grow and shrink at the end.

Note that all TLists in principle can be used as stacks. We can easily replace pop() with removeElementAt(size()-1) and push(v) with add(v). The stack interfaces does not guarantee that stack operations are more efficient than their list/collection counterparts, but it makes the intention clear.

Example:

import bak.pcj.list.IntStack;
import bak.pcj.list.IntArrayStack;

public class StackExample {
    public static void main(String[] args) {
        IntStack s = new IntArrayStack();

        //  Pushing elements

        s.push(1);              //  s=[1]
        s.push(2);              //  s=[1,2,3]
        s.push(3);              //  s=[1]

        //  Inspecting

        int v;
        v = s.peek();           //  v=3, s=[1,2,3]

        //  Popping elements

        v = s.pop();            //  v=3, s=[1,2]
        v = s.pop();            //  v=0, s=[1]
    }
}

Other examples:

StackExample.java
Calculator.java

Availability

Type Interface
boolean bak.pcj.list.BooleanStack
char bak.pcj.list.CharStack
byte bak.pcj.list.ByteStack
short bak.pcj.list.ShortStack
int bak.pcj.list.IntStack
long bak.pcj.list.LongStack
float bak.pcj.list.FloatStack
double bak.pcj.list.DoubleStack

Design notes

It may be questioned whether a stack is a special kind of list. Interfaces are often used for specifying the minimum capabilities of an object, e.g. an argument to a method. In those cases a stack ought to be just that: a stack, not a list, because we probably only need to push, pop, and peek the object.

On the other hand, stacks are sequential information structures. It should be possible to iterate over the elements. It may also be argued that they at least should extend the TCollection interface to be interesting in a collections API. And extending the TCollection interface they might as well extend the TList interface.

In JCF stacks are merely special implementations of lists. That leaves us with a problem when an operations needs a stack to work efficiently: We can either 1) make the argument a Stack which is a special implementation of List or we can 2) make the argument a List and hope that the caller passes an object that has fast addition and removal from the end of the list.

//  Solution 1:
void doSomething(Stack s) {
    // Perform stack operations on s here using push(Object) and pop()
    ...
}
...
Stack s = new Stack();
doSomething(s);

//  Solution 2:
void doSomething(List s) {
    // Perform stack operations on s using add(Object) and remove(int)
    ...
}
...
List s = new Stack();
doSomething(s);

Solution 1 above is not acceptable, because the doSomething() method does not care about the implementation of the stack argument. It puts a restriction on the caller that is not needed. Solution 2 is not acceptable because any list implementation can be passed to doSomething() may render it inefficient.

I think that a better way is to make Stack an interface and to provide implementations with fast addition and removal at the end of lists.

//  Solution with Stack as an interface:

void doSomething(Stack s) {
    // Perform stack operations on s here using push(Object) and pop()
    ...
}
...
Stack s = new MyStackImplementation();
doSomething(s);

That is why TStack is an interface in PCJ.

3.4  The TDeque interfaces

The TDeque interfaces extend the TList interfaces with the typical typical operations of a deque: addFirst(), addLast(), getFirst(), getLast(), removeFirst(), and removeLast(). Deques typically only grow and shrink at the ends.

Note that all TLists in principle can be used as stacks because of the positional access to elements. We can easily replace getFirst() with get(0), removeLast() with removeElementAt(size()-1), and so forth. The deque interfaces does not guarantee that deque operations are more efficient than their list/collection counterparts, but it makes the intention clear.

Example:

import bak.pcj.IntIterator;
import bak.pcj.list.IntDeque;
import bak.pcj.list.IntArrayDeque;

public class DequeExample {
    public static void main(String[] args) {
        IntDeque q = new IntArrayDeque();

        //  Adding elements

        q.addFirst(1);          //  q=[1]
        q.addFirst(2);          //  q=[2,1]
        q.addFirst(3);          //  q=[3,2,1]

        q.addLast(0);           //  q=[3,2,1,0]

        //  Retrieving elements

        int v;
        v = q.getFirst();       //  v=3, q=[3,2,1,0]
        v = q.getLast();        //  v=0, q=[3,2,1,0]
        
        //  Removing elements

        v = q.removeFirst();    //  v=3, q=[2,1,0]
        v = q.removeLast();     //  v=0, q=[2,1]
    }
}

Other examples:

DequeExample.java
DequeExample1.java

Availability

Type Interface
boolean bak.pcj.list.BooleanDeque
char bak.pcj.list.CharDeque
byte bak.pcj.list.ByteDeque
short bak.pcj.list.ShortDeque
int bak.pcj.list.IntDeque
long bak.pcj.list.LongDeque
float bak.pcj.list.FloatDeque
double bak.pcj.list.DoubleDeque

Design notes

The design considerations for TStack also applies to TDeque.

3.5  The TArrayList classes

The TArrayList classes provide array based implementations of the TList interfaces. The data in the list is represented as a T[] that grows with the size of the list.

See TList for examples.

Availability

Type Class
boolean bak.pcj.list.BooleanArrayList
char bak.pcj.list.CharArrayList
byte bak.pcj.list.ByteArrayList
short bak.pcj.list.ShortArrayList
int bak.pcj.list.IntArrayList
long bak.pcj.list.LongArrayList
float bak.pcj.list.FloatArrayList
double bak.pcj.list.DoubleArrayList

Remarks

Currently PCJ provides no alternatives to TArrayList other than TArrayStack and TArrayDeque which are specialized lists - they should not be used as replacements for lists.

TArrayLists perform best when they grow or shrink at the end - like stacks.

Since the underlying data structure of a TArrayList is an array, removal and insertion (other than at the end of the list) is not efficient for large list sizes.

IntArrayList s = new IntArrayList();
for (int i = 0; i < 1000000; i++)
    s.add(i);
s.removeElementAt(0); This is not efficient!!!
s.add(0, 100);        This is not efficient!!!

If you have to modify a long list at its beginning, use a TDeque instead of a list.

3.6  The TArrayStack classes

TArrayStack is an array based implementation of TStack. It is implemented as a simple extension to TArrayList.

See TStack for examples.

Availability

Type Class
boolean bak.pcj.list.BooleanArrayStack
char bak.pcj.list.CharArrayStack
byte bak.pcj.list.ByteArrayStack
short bak.pcj.list.ShortArrayStack
int bak.pcj.list.IntArrayStack
long bak.pcj.list.LongArrayStack
float bak.pcj.list.FloatArrayStack
double bak.pcj.list.DoubleArrayStack

3.7  The TArrayDeque classes

The TArrayDeque classes provide array based implementations of TDeque. Internally the elements are stored in a T[] the same size of or longer than the current number of elements.

0 1 2 3 4 5
1 2 3 - - -
^ ^

We operate with a pointer to the first element in the deque and a pointer to the last element (or 1 beyond the last). Insertions performed performed at the end moves the last-pointer forward, and insertions performed at the beginning moves the first-pointer backward. Likewise, removals from the end moves the last-pointer backward, and removals from the beginning move the first-pointer forward. Since the deque is restricted by a capacity all pointer movement is wrapped around the ends of the array (indices are added modulo the capacity). See the examples below:

addLast(4):

0 1 2 3 4 5
1 2 3 4 - -
^ ^

removeFirst():

0 1 2 3 4 5
- 2 3 4 - -
^ ^

addLast(5):

0 1 2 3 4 5
- 2 3 4 5 -
^ ^

removeFirst():

0 1 2 3 4 5
- - 3 4 5 -
^ ^

addLast(6):

0 1 2 3 4 5
- - 3 4 5 6
^ ^

addLast(7):

0 1 2 3 4 5
7 - 3 4 5 6
^ ^

Examples: see TDeque for examples.

Availability

Type Class
boolean bak.pcj.list.BooleanArrayDeque
char bak.pcj.list.CharArrayDeque
byte bak.pcj.list.ByteArrayDeque
short bak.pcj.list.ShortArrayDeque
int bak.pcj.list.IntArrayDeque
long bak.pcj.list.LongArrayDeque
float bak.pcj.list.FloatArrayDeque
double bak.pcj.list.DoubleArrayDeque

3.8  The TSet interfaces

The TSet interfaces define the set behaviour for primitive collections. It is adapted from the JCF Set interface and exists for all primitive types. In PCJ TSet is only used to tag collections - it does not extend the collection interface with new methods.

Examples:

import bak.pcj.set.IntSet;
import bak.pcj.set.IntOpenHashSet;
import bak.pcj.IntIterator;

public class SetExample {
    public static void main(String[] args) {
        IntSet s = new IntOpenHashSet();

        //  Adding elements
        
        s.add(1);               //  s={1}
        s.add(2);               //  s={1,2}
        s.add(3);               //  s={1,2,3}
        s.add(4);               //  s={1,2,3,4}
        s.add(5);               //  s={1,2,3,4,5}
        s.add(2);               //  s={1,2,3,4,5} (not changed)
        s.add(4);               //  s={1,2,3,4,5} (not changed)

        //  Removing elements

        s.remove(2);            //  s={1,3,4,5}
        s.remove(5);            //  s={1,3,4}

        //  Clearing
        
        s.clear();              //  s={}
        boolean c;
        c = s.isEmpty();        //  c=true

        //  Testing containment of elements

        s.add(1);
        s.add(2);
        s.add(3);               //  s={1,2,3}
        c = s.contains(1);      //  c=true, s={1,2,3}
        c = s.contains(3);      //  c=true, s={1,2,3}
        c = s.contains(5);      //  c=false, s={1,2,3}

        //  Testing containment of collections

        IntSet t;
        t = new IntOpenHashSet();
        t.add(1);               //  t={1}, s={1,2,3}
        c = s.containsAll(t);   //  c=true
        t.add(0);               //  t={0,1}, s={1,2,3}
        c = s.containsAll(t);   //  c=false

        //  Adding collections

        s.addAll(t);            //  t={0,1}, s={0,1,2,3}

        //  Retaining elements of collections
        
        t = new IntOpenHashSet();
        t.add(2);
        t.add(3);
        t.add(4);
        t.add(5);               //  t={2,3,4,5}, s={0,1,2,3}
        s.retainAll(t);         //  t={2,3,4,5}, s={2,3}

        //  Removing whole collections
        
        t = new IntOpenHashSet();
        t.add(3);
        t.add(4);
        s.removeAll(t);         //  t={3,4}, s={2}

        //  Finding the size
        
        int v;
        v = s.size();           //  v=1, s={2}
        s.add(1);
        v = s.size();           //  v=2, s={1,2}
        
        //  Converting to an array
        
        int[] a = s.toArray();  //  a=int[]{1,2}, s={1,2}

        //  Iterating over elements
        for (int i = 0; i < 10; i++)
            s.add(i);           //  s={0,1,2,3,4,5,6,7,8,9}
        IntIterator i = s.iterator();
        while (i.hasNext()) {
            int e = i.next();
            if ((e % 2) == 1)
                i.remove();
        }                       //  s={0,2,4,6,8}

        //  Comparing

        t = new IntOpenHashSet(s); //  s={0,2,4,6,8}, t={0,2,4,6,8}
        c = s.equals(t);        //  c=true
        t.remove(2);            //  s={0,2,4,6,8}, t={0,4,6,8}
        c = s.equals(t);        //  c=false

    }
}

Other examples:

SetExample.java

Availability

Type Interface
boolean bak.pcj.set.BooleanSet
char bak.pcj.set.CharSet
byte bak.pcj.set.ByteSet
short bak.pcj.set.ShortSet
int bak.pcj.set.IntSet
long bak.pcj.set.LongSet
float bak.pcj.set.FloatSet
double bak.pcj.set.DoubleSet

3.9  The TSortedSet interfaces

The TSortedSet interfaces define the sorted set behaviour for primitive collections. It extends TSet with some sub-set operations, retrieval of first and last elements, and a contract to iterate in the natural order of the elements. It is adapted from the JCF SortedSet interface and exists for all primitive types.

Availability

Type Interface
boolean bak.pcj.set.BooleanSortedSet
char bak.pcj.set.CharSortedSet
byte bak.pcj.set.ByteSortedSet
short bak.pcj.set.ShortSortedSet
int bak.pcj.set.IntSortedSet
long bak.pcj.set.LongSortedSet
float bak.pcj.set.FloatSortedSet
double bak.pcj.set.DoubleSortedSet

3.10  The TOpenHashSet classes

TOpenHashSet provide a hash table based implementation of the TSet interface with open addressing.

See TSet for examples.

Availability

Type Class
boolean n/a
char bak.pcj.set.CharOpenHashSet
byte bak.pcj.set.ByteOpenHashSet
short bak.pcj.set.ShortOpenHashSet
int bak.pcj.set.IntOpenHashSet
long bak.pcj.set.LongOpenHashSet
float bak.pcj.set.FloatOpenHashSet
double bak.pcj.set.DoubleOpenHashSet

3.11  The TChainedHashSet classes

TOpenHashSet provide a hash table based implementation of the TSet interface with chained hashing.

See TSet for examples.

Availability

Type Class
boolean n/a
char bak.pcj.set.CharChainedHashSet
byte bak.pcj.set.ByteChainedHashSet
short bak.pcj.set.ShortChainedHashSet
int bak.pcj.set.IntChainedHashSet
long bak.pcj.set.LongChainedHashSet
float bak.pcj.set.FloatChainedHashSet
double bak.pcj.set.DoubleChainedHashSet

3.12  The TRangeSet classes

The TRangeSet classes provide range based implementations of TSortedSet.

The elements of a TRangeSets are represented by a number of ranges of consecutive values. A TRangeSet keeps track of the lower and upper bounds of the ranges. Internally a list of ranges are kept in sorted order for fast lookup (O(log(n), where n is the number of ranges).

See TSet for examples.

TRangeSet provides a number of specialized methods for adding and searching for elements. See the API documentation for details.

Availability

Type Class
boolean n/a
char bak.pcj.set.CharRangeSet
byte bak.pcj.set.ByteRangeSet
short bak.pcj.set.ShortRangeSet
int bak.pcj.set.IntRangeSet
long bak.pcj.set.LongRangeSet
float n/a
double n/a

TRangeSet is only provided for byte, short, int, and long. They are the only types for which it makes sense to speak about ranges of consecutive values. It is also not quite clear that the LongRangeSet class should exist. With ranges of long values we can easily construct a set that has more than Integer.MAX_VALUE elements:

import bak.pcj.set.LongRangeSet;
...
LongRangeSet s = new LongRangeSet();
s.add(0L, Integer.MAX_VALUE+1L);
int size = s.size();
// What is the value of size at this point???

This problem is unresolved. I think that use of the LongRangeSet will be deprecated some time in the future.

Remarks

TRangeSet objects use very little memory when the number of ranges is small. It does not depend on the number of elements, only the number of ranges.

A typical use of the TRangeSet classes is to get rid of hard-coded range checks, as in:

if ((c >= 'a' && c <= 'z') ||
    (c >= 'A' && c <= 'Z') ||
    (c >= '0' && c <= '9')) {
    ...
}

The above code is much easier to read and maintain (and for enough ranges much more efficient) if it is rewritten to a question of set containment:

static CharRangeSet ALPHANUMERIC = new CharRangeSet(
    new CharRange('a', 'z'),
    new CharRange('A', 'Z'),
    new CharRange('0', '9'),
);
...
if (ALPHANUMERIC.contains(c)) {
    ...
}

As the benchmarks show, TRangeSet should not be used when additions or removals can split the ranges too much. Random elements gives very bad performance, and when elements cannot be packed into ranges, there is no advantage in memory use.

3.13  The TBitSet classes

The TBitSet classes provide a bit array based implementation of TSortedSet.

A TBitSet is represented by an array of bits. When a bit is set it means that the element with the same value as the position of that bit is an element of the set. For example, the least five elements of the set {1,5,7,8,9,...} are represented by the bit array shown below:

0 1 2 3 4 5 6 7 8 9 ...
0 1 0 0 0 1 0 1 1 1 ...

See TSet for examples.

Availability

Type Class
boolean n/a
char bak.pcj.set.CharBitSet
byte bak.pcj.set.ByteBitSet
short bak.pcj.set.ShortBitSet
int bak.pcj.set.IntBitSet
long n/a
float n/a
double n/a

TBitSet is provided for byte, char, short, and int. TBitSet depends on the set type having a meaningful enumeration. This is not the case for boolean, double, and float. TBitSet is not provided for long values since the underlying arrays would have to be too large.

Remarks

TBitSets generally perform good for small set elements and dense sets (elements that are close to each other).

Iteration over TBitSets is not particularly efficient - especially not when the set elements are sparse. The reason for this is that the object must search through an array of bits for the next set bit before it can return the next value of an iteration.

The memory used by TBitSets is proportional not to its number of elements but to the value of its largest element. Don't use bit sets with large values if memory is a problem.

When elements are added in ascending order, it requires many capacity increases. If the largest element is known it should be added first. Alternatively the initial capacity can be set to a number corresponding to the largest element via a constructor.

TBitSets cannot be used with negative elements.

3.14 The TKeySMap interfaces

The TKeySMap interfaces define maps from T keys to S values.

The primitive map interfaces are probably the interfaces that deviate the most from their JCF counterparts. The Map.Entry has been omitted because it restricts the implementations. Unless one wants to represent a map by a Map.Entry instance for each entry of the map, one is bound to create these extra instances when returning the entry set (see Map.entrySet()). It is very useful to be able to access all entries of a map, but it does not have to be through a set. PCJ introduces a special iterator called a TKeySMapIterator (see below). It has the same characteristics as a normal iterator, but is capable of returning a key and a value apart from each other.

Some of the issues that arise when using primitive types in maps are discussed below.

Examples:

import bak.pcj.CharCollection;
import bak.pcj.map.IntKeyCharMap;
import bak.pcj.map.IntKeyCharOpenHashMap;
import bak.pcj.map.IntKeyCharMapIterator;
import bak.pcj.set.IntSet;

public class IntKeyCharMapExample {
    public static void main(String[] args) {
        IntKeyCharMap s = new IntKeyCharOpenHashMap();

        //  Adding mappings
        
        s.put(1,'a');           //  s={(1,'a')}
        s.put(2,'b');           //  s={(1,'a'),(2,'b')}
        s.put(3,'c');           //  s={(1,'a'),(2,'b'),(3,'c')}

        //  Mapping keys to values
        
        char v;
        v = s.get(1);           //  v='a', s={(1,'a'),(2,'b'),(3,'c')}
        v = s.get(3);           //  v='c', s={(1,'a'),(2,'b'),(3,'c')}
        v = s.get(4);           //  v='\0', s={(1,'a'),(2,'b'),(3,'c')}

        //  Testing for a key mapping
        
        boolean c;
        c = s.containsKey(1);   //  c=true, s={(1,'a'),(2,'b'),(3,'c')}
        c = s.containsKey(4);   //  c=false, s={(1,'a'),(2,'b'),(3,'c')}

        //  Testing for a value

        c = s.containsValue('a'); //  c=true, s={(1,'a'),(2,'b'),(3,'c')}
        c = s.containsValue('d'); //  c=false, s={(1,'a'),(2,'b'),(3,'c')}

        //  Adding elements from maps
        IntKeyCharMap t = new IntKeyCharOpenHashMap();
        t.put(2, 'x');
        t.put(3, 'c');
        t.put(4, 'd');          //  t={(2,'x'),(3,'c'),(4,'d')}, s={(1,'a'),(2,'b'),(3,'c')}
        s.putAll(t);            //  s={(1,'a'),(2,'x'),(3,'c'),(4,'d')}

        //  Replacing mappings

        s.put(2, 'b');          //  s={(1,'a'),(2,'b'),(3,'c'),(4,'d')}

        //  Iterating over entries
        
        IntKeyCharMapIterator i = s.entries();
        while (i.hasNext()) {
            i.next();
            int key = i.getKey();
            char value = i.getValue();
            //  Print or do something else with entry here
        }

        //  Retrieving keys (view of map)

        IntSet keys = s.keySet(); // keys={1,2,3}
        keys.remove(4);         //  s={(1,'a'),(2,'b'),(3,'c')}

        //  Retrieving values (view of map)

        CharCollection values = s.values(); // values={'a','b','c'}
        //  Iterate over or do something else with values here

        //  Finding size of map
        int size = s.size();    //  size=3, s={(1,'a'),(2,'b'),(3,'c')}

        //  Comparing maps
        t = new IntKeyCharOpenHashMap(s);
        c = s.equals(t);        //  c=true, s={(1,'a'),(2,'b'),(3,'c')}, t={(1,'a'),(2,'b'),(3,'c')}
        t.remove(1);            //  s={(1,'a'),(2,'b'),(3,'c')}, t={(2,'b'),(3,'c')}
        c = s.equals(t);        //  c=false

        //  Clearing

        s.clear();              //  s={}
        c = s.isEmpty();        //  c=true
    }
}

Other examples:

IntKeyCharMapExample.java

Availability

Key type Value type
boolean char byte short int long float double
boolean × × × × × × × ×
char × × × × × × × ×
byte × × × × × × × ×
short × × × × × × × ×
int × × × × × × × ×
long × × × × × × × ×
float × × × × × × × ×
double × × × × × × × ×

TKeySMap is provided for all combinations of T and S.

Design note: The get() problem

The get(Object) method of Map returns null, if no mapping can be found (and also if the key maps to a null value). To make the distinction clear, JCF provides the method containsKey(Object) which is also supported by PCJ. With JCF we can almost always avoid the use of containsKey because in most programs null values are very rare compared to non-null values:

Map m;
Object value;
Object key;
...
if ( (value = m.get(key)) != null ) {
    // The mapping exists, and we know its value
    ...
}

However, with primitive types we will rarely allow specific primitive values in a map, such as the value 0, to be interpreted as a non-existing mapping. The problem with using containsKey() is that we have to perform the same lookup twice. First to check if the mapping exists (using containsKey()), and then to retrieve the mapping using get():

IntKeyCharMap m;
int key;
char value;
...
if (m.containsKey(key)) {
    // The mapping exists, now get its value
    value = m.get(key);
    ...
}

An improvement is to check for containment only when a special value is returned, but it does not solve the basic problem of having to perform the same lookup twice. The following code is only an improvement if the special value returned to indicate a non-existing mapping is not too frequently occuring as a value in the map:

IntKeyCharMap m;
int key;
char value;
...
if (((value = m.get(key)) != SPECIAL || m.containsKey(key))
    // The mapping exists, and we know its value
    ...
}

When a mapping is known to exist (i.e. it is a program error if the mapping does not exist) we usually don't want to burden the virtual machine with confirming its existence. But we can easily afford the extra check, if it is very simple.

With JCF we have the safety of get() returning a null value. After retrieving a value, we will typically perform some operation on that value, e.g. invoke a method on it or pass it to another method, so we have a good chance that a NullPointerException will be thrown very quickly:

Map m;
Object value;
Object key;
...
value = m.get(key);
// Throws a NullPointerException if the mapping does not exist
return value.toString();

For primitive maps, a similar assumption cannot be made. Operations on primitive types (excluding division operators) tend to work just fine with 0 values (or any other chosen default value). So we cannot depend on the operations succeeding the map lookup to produce an error:

StringBuffer s;
IntKeyCharMap m;
int key;
char value;
...
// Returns '\0' if the mapping does not exist
value = m.get(key);
// No error is produced if value is '\0'
s.append(value);

In that case a second map lookup is needed just for the purpose of discovering bugs early.

Design note: The get() methods of PCJ maps

To solve the problems described in the section above, PCJ provides three different get() methods for the different siturations:

If you can accept that a default value indicates the non-existence of a mapping, use get(). The method will return a default value when the key does not map to any value. The default value returned is defined in the class MapDefaults.

IntKeyCharMap m;
char value;
int key;
...
if ( (value = m.get(key)) != MapDefaults.defaultChar() ) {
    // The mapping exists, and we know its value
    ...
}

Future versions of PCJ may allow a configurable default value, but this has not been judged necessary for the time being.

If you are certain about the existence of a mapping, use tget(). The method will throw a runtime exception if the mapping does not exist.

IntKeyCharMap m;
char value;
int key;
...
// Throws an exception if key does not map to a value
value = m.tget(key);
...

If you cannot accept a primitive default value and are not certain about the existence of the mapping, use containsKey() and lget() The method will return the last value corresponding to the key of the last call to containsKey(), provided that the key exists. You may object that this is indeed two lookups, but in practice implementations of maps will be able to cache the value when looking up a key.

IntKeyCharMap m;
char value;
int key;
...
if (m.containsKey(key)) {
  // Returns the value corresponding to key
  value = m.lget();
  ...
}

Design note: Map iterators

JCF maps provide a set view of all the entries in a map. This actually restricts the implementation to be based on entries having a specific form (Map.Entry). PCJ uses an alternative approach. When we obtain the entries of a map, it will almost always have the purpose of performing an iteration. Instead of providing a set view of the entries of a primitive set, PCJ provides an iterator over all entries, which does not constrain the underlying implementation.

The TKeySMapIterator interfaces define such iterators for all TKeySMap type combinations (an example is IntKeyLongMapIterator). As shown in the example below, there is a difference between using normal iterators and map iterators. Normal iterators return the next value when next() is invoked. Map iterators make the key and the value of the next entry available when next() invoked. The key and the value can thereafter be retrieved using the map iterator's getKey() and getValue() methods.

Typical JCF code for iterating over entries:

Map m;
Iterator i = m.entrySet().iterator();
while (i.hasNext()) {
    Map.Entry e = (Map.Entry)i.next();
    int key = ((Integer)e.getKey()).intValue();
    long value = ((Long)e.getValue()).longValue();
    ...
}

Corresponding PCJ code:

IntKeyLongMap m;
IntKeyLongMapIterator i = m.entries();
while (i.hasNext()) {
    i.next();
    int key = i.getKey();
    long value = i.getValue();
    ...
}

3.15  The TKeySOpenHashMap classes

TKeySOpenHashMap provide a hash table based implementation of the TKeySMap interface with open addressing.

See TKeySMap for examples.

Availability

Key type Value type
boolean char byte short int long float double
boolean n/a n/a n/a n/a n/a n/a n/a n/a
char × × × × × × × ×
byte × × × × × × × ×
short × × × × × × × ×
int × × × × × × × ×
long × × × × × × × ×
float × × × × × × × ×
double × × × × × × × ×

TKeySOpenHashMap is provided for all combinations of T and S except where T is boolean.

3.16  The TKeySChainedHashMap classes

TKeySChainedHashMap provide a hash table based implementation of the TKeySMap interface with chained hashing.

See TKeySMap for examples.

Availability

Key type Value type
boolean char byte short int long float double
boolean n/a n/a n/a n/a n/a n/a n/a n/a
char × × × × × × × ×
byte × × × × × × × ×
short × × × × × × × ×
int × × × × × × × ×
long × × × × × × × ×
float × × × × × × × ×
double × × × × × × × ×

TKeySChainedHashMap is provided for all combinations of T and S except where T is boolean.

3.17  The TKeyMap interfaces

The TKeyMap interfaces define maps from T keys to objects.

As for TKeySMap a special map iterator is provided for TKeyMaps. An example is IntKeyMapIterator.

Examples:

import bak.pcj.map.IntKeyMap;
import bak.pcj.map.IntKeyOpenHashMap;
import bak.pcj.map.IntKeyMapIterator;
import bak.pcj.set.IntSet;
import java.util.Collection;

public class IntKeyMapExample {
    public static void main(String[] args) {
        IntKeyMap s = new IntKeyOpenHashMap();

        //  Adding mappings
        
        s.put(1,"a");           //  s={(1,"a")}
        s.put(2,"b");           //  s={(1,"a"),(2,"b")}
        s.put(3,"c");           //  s={(1,"a"),(2,"b"),(3,"c")}

        //  Mapping keys to values
        
        String v;
        v = (String)s.get(1);   //  v="a", s={(1,"a"),(2,"b"),(3,"c")}
        v = (String)s.get(3);   //  v="c", s={(1,"a"),(2,"b"),(3,"c")}
        v = (String)s.get(4);   //  v='\0', s={(1,"a"),(2,"b"),(3,"c")}

        //  Testing for a key mapping
        
        boolean c;
        c = s.containsKey(1);   //  c=true, s={(1,"a"),(2,"b"),(3,"c")}
        c = s.containsKey(4);   //  c=false, s={(1,"a"),(2,"b"),(3,"c")}

        //  Testing for a value

        c = s.containsValue("a"); //  c=true, s={(1,"a"),(2,"b"),(3,"c")}
        c = s.containsValue("d"); //  c=false, s={(1,"a"),(2,"b"),(3,"c")}

        //  Adding elements from maps

        IntKeyMap t = new IntKeyOpenHashMap();
        t.put(2, "x");
        t.put(3, "c");
        t.put(4, "d");          //  t={(2,"x"),(3,"c"),(4,"d")}, s={(1,"a"),(2,"b"),(3,"c")}
        s.putAll(t);            //  s={(1,"a"),(2,"x"),(3,"c"),(4,"d")}

        //  Replacing mappings

        s.put(2, "b");          //  s={(1,"a"),(2,"b"),(3,"c"),(4,"d")}

        //  Iterating over entries
        
        IntKeyMapIterator i = s.entries();
        while (i.hasNext()) {
            i.next();
            int key = i.getKey();
            String value = (String)i.getValue();
            //  Print or do something else with entry here
        }

        //  Retrieving keys (view of map)

        IntSet keys = s.keySet(); // keys={1,2,3}
        keys.remove(4);         //  s={(1,"a"),(2,"b"),(3,"c")}

        //  Retrieving values (view of map)

        Collection values = s.values(); // values={"a","b","c"}
        //  Iterate over or do something else with values here

        //  Finding size of map
        int size = s.size();    //  size=3, s={(1,"a"),(2,"b"),(3,"c")}

        //  Comparing maps
        t = new IntKeyOpenHashMap(s);
        c = s.equals(t);        //  c=true, s={(1,"a"),(2,"b"),(3,"c")}, t={(1,"a"),(2,"b"),(3,"c")}
        t.remove(1);            //  s={(1,"a"),(2,"b"),(3,"c")}, t={(2,"b"),(3,"c")}
        c = s.equals(t);        //  c=false

        //  Clearing

        s.clear();              //  s={}
        c = s.isEmpty();        //  c=true
    }
}

Other examples:

IntKeyMapExample.java

Availability

Type Interface
boolean bak.pcj.map.BooleanKeyMap
char bak.pcj.map.CharKeyMap
byte bak.pcj.map.ByteKeyMap
short bak.pcj.map.ShortKeyMap
int bak.pcj.map.IntKeyMap
long bak.pcj.map.LongKeyMap
float bak.pcj.map.FloatKeyMap
double bak.pcj.map.DoubleKeyMap

3.18  The TKeyOpenHashMap classes

TKeyOpenHashMap provide a hash table based implementation of the TKeyMap interface with open addressing.

See TKeyMap for examples.

Availability

Type Class
boolean n/a
char bak.pcj.map.CharKeyOpenHashMap
byte bak.pcj.map.ByteKeyOpenHashMap
short bak.pcj.map.ShortKeyOpenHashMap
int bak.pcj.map.IntKeyOpenHashMap
long bak.pcj.map.LongKeyOpenHashMap
float bak.pcj.map.FloatKeyOpenHashMap
double bak.pcj.map.DoubleKeyOpenHashMap

3.19  The TKeyChainedHashMap classes

TKeyChainedHashMap provide a hash table based implementation of the TKeyMap interface with chained hashing.

See TKeyMap for examples.

Availability

Type Class
boolean n/a
char bak.pcj.map.CharKeyChainedHashMap
byte bak.pcj.map.ByteKeyChainedHashMap
short bak.pcj.map.ShortKeyChainedHashMap
int bak.pcj.map.IntKeyChainedHashMap
long bak.pcj.map.LongKeyChainedHashMap
float bak.pcj.map.FloatKeyChainedHashMap
double bak.pcj.map.DoubleKeyChainedHashMap

Remarks

3.20  The ObjectKeyTMap interfaces

The ObjectKeyTMap interfaces define maps from Object keys to T values.

As for TKeySMap a special map iterator is provided for ObjectKeyTMaps. An example is IntKeyMapIterator.

Examples:

import bak.pcj.map.ObjectKeyIntMap;
import bak.pcj.map.ObjectKeyIntOpenHashMap;
import bak.pcj.map.ObjectKeyIntMapIterator;
import bak.pcj.IntCollection;
import java.util.Set;

public class ObjectKeyIntMapExample {
    public static void main(String[] args) {
        ObjectKeyIntMap s = new ObjectKeyIntOpenHashMap();

        //  Adding mappings
        
        s.put("a",1);           //  s={("a",1)}
        s.put("b",2);           //  s={("a",1),("b",2)}
        s.put("c",3);           //  s={("a",1),("b",2),("c",3)}

        //  Mapping keys to values
        
        int v;
        v = s.get("a");         //  v=1, s={("a",1),("b",2),("c",3)}
        v = s.get("c");         //  v=3, s={("a",1),("b",2),("c",3)}
        v = s.get("d");         //  v=null, s={("a",1),("b",2),("c",3)}

        //  Testing for a key mapping
        
        boolean c;
        c = s.containsKey("a"); //  c=true, s={("a",1),("b",2),("c",3)}
        c = s.containsKey("d"); //  c=false, s={("a",1),("b",2),("c",3)}

        //  Testing for a value

        c = s.containsValue(1); //  c=true, s={("a",1),("b",2),("c",3)}
        c = s.containsValue(4); //  c=false, s={("a",1),("b",2),("c",3)}

        //  Adding elements from maps
        ObjectKeyIntMap t = new ObjectKeyIntOpenHashMap();
        t.put("b", 100);
        t.put("c", 3);
        t.put("d", 4);          //  t={("b",100),("c",3),("d",3)}, s={("a",1),("b",2),("c",3)}
        s.putAll(t);            //  s={("a",1),("b",100),("c",3),("d",4)}

        //  Replacing mappings

        s.put("b", 2);          //  s={("a",1),("b",2),("c",3),("d",4)}

        //  Iterating over entries
        
        ObjectKeyIntMapIterator i = s.entries();
        while (i.hasNext()) {
            i.next();
            String key = (String)i.getKey();
            int value = i.getValue();
            //  Print or do something else with entry here
        }

        //  Retrieving keys (view of map)

        Set keys = s.keySet();  //  keys={"a","b","c","d"}
        keys.remove("d");       //  s={("a",1),("b",2),("c",3)}

        //  Retrieving values (view of map)

        IntCollection values = s.values(); // values={1,2,3}
        //  Iterate over or do something else with values here

        //  Finding size of map
        int size = s.size();    //  size=3, s={("a",1),("b",2),("c",3)}

        //  Comparing maps
        t = new ObjectKeyIntOpenHashMap(s);
        c = s.equals(t);        //  c=true, s={("a",1),("b",2),("c",3)}, t={("a",1),("b",2),("c",3)}
        t.remove("a");          //  s={("a",1),("b",2),("c",3)}, t={("b",2),("c",3)}
        c = s.equals(t);        //  c=false

        //  Clearing

        s.clear();              //  s={}
        c = s.isEmpty();        //  c=true
    }
}

Other examples:

ObjectKeyIntMapExample.java

Availability

Type Interface
boolean bak.pcj.map.ObjectKeyBooleanMap
char bak.pcj.map.ObjectKeyCharMap
byte bak.pcj.map.ObjectKeyByteMap
short bak.pcj.map.ObjectKeyShortMap
int bak.pcj.map.ObjectKeyIntMap
long bak.pcj.map.ObjectKeyLongMap
float bak.pcj.map.ObjectKeyFloatMap
double bak.pcj.map.ObjectKeyDoubleMap

3.21  The ObjectKeyTOpenHashMap classes

ObjectKeyTOpenHashMap provide a hash table based implementation of the ObjectKeyTMap interface with open addressing.

See ObjectKeyTMap for examples.

Availability

Type Class
boolean bak.pcj.map.ObjectKeyBooleanOpenHashMap
char bak.pcj.map.ObjectKeyCharOpenHashMap
byte bak.pcj.map.ObjectKeyByteOpenHashMap
short bak.pcj.map.ObjectKeyShortOpenHashMap
int bak.pcj.map.ObjectKeyIntOpenHashMap
long bak.pcj.map.ObjectKeyLongOpenHashMap
float bak.pcj.map.ObjectKeyFloatOpenHashMap
double bak.pcj.map.ObjectKeyDoubleOpenHashMap

3.22  The ObjectKeyTChainedHashMap classes

ObjectKeyTChainedHashMap provide a hash table based implementation of the ObjectKeyTMap interface with chained hashing.

See ObjectKeyTMap for examples.

Availability

Type Class
boolean bak.pcj.map.ObjectKeyBooleanChainedHashMap
char bak.pcj.map.ObjectKeyCharChainedHashMap
byte bak.pcj.map.ObjectKeyByteChainedHashMap
short bak.pcj.map.ObjectKeyShortChainedHashMap
int bak.pcj.map.ObjectKeyIntChainedHashMap
long bak.pcj.map.ObjectKeyLongChainedHashMap
float bak.pcj.map.ObjectKeyFloatChainedHashMap
double bak.pcj.map.ObjectKeyDoubleChainedHashMap

3.23  The adapter classes

Overview

The adapter classes of PCJ make it possible to interchange PCJ and JCF collections.

PCJ collections contain only primitive type elements whereas JCF collections contain only object type elements, so type conversion is necessary to make a PCJ collection act as a JCF collection and vice versa. There are restrictions to the types that can be converted. For example, it is not possible to adapt a JCF collection of String values to a PCJ collection, and a PCJ collection of char values cannot be converted to a JCF collection of Locale values. The only conversions allowed are those of wrapping and unwrapping primitive values. The table below shows how primitive types map to object types and the other way around.

Primitive type Object type
booleanBoolean
charCharacter
byteByte
shortShort
intInteger
longLong
floatFloat
doubleDouble

For example, a PCJ collection of int values can be adapted to a JCF collection of Integer values. And a JCF collection of Character values can be adapted to a PCJ collection of char values.

As described above there are adapter classes for all main interfaces (not stacks and deques) both ways. An adapter class implements the interface of the collection to which it adapts another collection, so an adapter from an IntList to a JCF List implements the List interface of JCF. The example below shows how the IntListToListAdapter can be used directly as a List.

...
//  Create list of 10 random int values
Random rnd = new Random();
IntList ilist = new IntList();
for (int i = 0; i < 10; i++)
    ilist.add(rnd.nextInt());

//  Adapt it to list of Integer values
List list = new IntListToListAdapter(ilist);

//  Sort using JCF sorting
Collections.sort(list);
...

Adapter classes are wrappers around their counterpart - they are not copies. So changes to a collection through an adaper are reflected by the underlying collection and vice versa. If we iterated over both collections, list and ilist in the example above, printing the elements, two equal number sequences would be printed. The list object is only a view of the ilist object.

All adapter classes can be found in the package bak.pcj.adapter. An alternative is to use the bak.pcj.Adapter class which has factory methods for creating adapters. Its method naming scheme is somewhat simpler than that of adapter classes. For example, and adaption from an IntList to a List can be created with the call Adapter.asObjects(ilist).

Validation of adapted JCF collections

Since a JCF collection is not statically typed (at least not below the java.lang.Object type) it cannot be guaranteed that it only contains elements of a specific type. By default, the adapters from JCF to PCJ does not directly perform any checks of the backing JCF collection. There is no simple way of doing it since it is always possible to write a program that creates an adapter to, say, int values on an empty JCF collection. After creation of the adapter String values or other values of wrong class can be added to the JCF collection without the adapter knowing it. Only when an element of the JCF collection is unwrapped by the adapter is the element validated. Errors resulting from illegal values in the underlying JCF collection can therefore be hard to find. The example below illustrates the point:

List list = new ArrayList();
IntList ilist = new ListToIntListAdapter(list);
// Add one legal and two illegal values to list
list.add(new Integer(1));
list.add(null); 
list.add("42");
// Set x = 1
int x = ilist.get(0);
// Will fail with a NullPointerException
int y = ilist.get(1);
...
// Will fail with a ClassCastException
int z = ilist.get(2);

As can be seen, the code fails not when an illegal value is added to the underlying JCF collection (that would require an event listener mechanism for JCF collections to intercept). Neither does it fail when the first value of the illegal collection is retrieved because that value is legal. It it first when the null value at position 1 or the String value at position 2 is retrieved that the code fails with a runtime exception.

To perform runtime checks of the underlying JCF collections, a number of methods are provided. First, JCF collections can be validated when the adapter is created. All adapters from JCF to PCJ collections has a constructor that accepts a boolean value indicating whether validation should be performed. This is shown in the example below:

Set set;
...
//  Create adapter, validate set
CharSet cset = new SetToCharSetAdapter(set, true);

The validating constructors throw a runtime exception, if the JCF collection cannot be adapted at the moment of construction, but they do not ensure that no illegal values are added to the JCF collection later in the program flow. Apart from the constructors, all adapters from JCF to PCJ collections provide two methods for validation of the underlying JCF collection:

//  Indicates whether underlying collection is valid
boolean validate();

//  Throws exception if underlying collection is invalid
void evalidate() throws IllegalStateException;

Invocations of these methods can be inserted in the code whenever the adapted JCF may have been changed. The first method is designed for use with the assertion facility; the second for the lack of assertions. The example below shows typical uses of validation methods:

List list = ExternalAPI.createData();
IntList ilist = new ListToIntListAdapter(list, true);
// ilist is safe here
...
ExternalAPI.modifyData(list); // <--- untrusted, may contain errors

//  Validate after external modification
ilist.evalidate();

//  Alternatively use the assertion facility
assert ilist.validate();

// ilist is safe here
...

The validation methods are not intended for normal operation because a failed validation means that the program has errors. If you want to check whether a JCF collection can be adapted without throwing exceptions, the Adapter class has methods for that. They all follow the pattern isXAdaptable.

Making copies

Instead of adapting a JCF collection to a PCJ collection, it is sometimes a better alternative to create a PCJ copy of the JCF collection. The reason for this is that all operations unwrap values each time they are called. If a copy is created, the unwrapping is performed once and for all. Copying can be performed by creating a new PCJ collection, creating an adaption of the JCF collection to a PCJ collection, and finally adding the adapted collection's elements to the new PCJ collection:

List list;
...
// Create int list to hold copy
IntList ilist = new IntList();

// Create an adapter to int values
IntList adapt = Adapter.asInts(list);

// Add adapted list's elements to create a copy
ilist.addAll(adapt);

The above example can be trimmed by taking advantage of the general one-argument constructor of collections, which is the preferred way of copying collections:

List list;
...
IntList ilist = new IntList(Adapter.asInts(list));

4.  Benchmarks

PCJ comes with a benchmarking API (see bak.pcj.benchmark). There are two main reasons why the API is distributed along with the core library: 1) Benchmarks are machine dependent. Running a benchmark on a different machine will yield different results from those that I have provided. 2) They are very handy when developing new classes implementing one of the main interfaces of PCJ.

4.1  About the benchmarks

Each benchmark exercises a collection class with some data set. There are benchmarks for all int variants of the PCJ classes and also for their JCF counterparts with Integer objects. There are also benchmarks for JCF classes adapted to primitive collection classes using the bak.pcj.adapter classes.

Two data sets have been used for producing the benchmark results. Both are compact in the sense that they contain ranges of consecutive numbers. One of the data sets is ordered and the other is shuffled (which makes a huge difference for the TRangeSet classes). The data sets are available in the bak.pcj.benchmark package.

4.2  The benchmark results

The benchmark results are available in the generated reports:

The benchmark names refer to the classes that produced the results. They can all be found in the bak.pcj.benchmark package. Each benchmark exercise a specific class which is named right below the name of the benchmark. Regarding the adapter classes, they adapt instances of java.util.ArrayList and java.util.HashSet. Each benchmark contains a number of tasks which are executed for the data sets available. For each task and data set, a timing result is given in milliseconds.

4.3  Comparing PCJ to JCF

When benchmarking a JCF class, the time needed to wrap and unwrap objects is not counted in. I believe that the adapted JCF-class provides a more realistic image since it performs the wrapping and unwrapping of primitive values that is necessary when working with JCF. In the tables below, timings are given as the PCJ time as a fraction of the JCF time. The timings for comparison with adapted JCF classes are given in paranthesis.

It is interesting to compare the results of a JCF class with the results of an adaptor of that class to a primitive collection. The difference tells us how much time is spent wrapping and unwrapping collection elements.

Sets

The table below shows the benchmark task times for PCJ set classes as percentages of the corresponding benchmark task times for HashSet. The percentages in parenthesis show the task times of PCJ classes as percentages of the corresponding benchmark task times for a PCJ adapted HashSet.

PCJ set class timings as percentages of JCF HashSet timings and PCJ adapted HashSet timings
Task Data IntBitSet IntChainedHashSet IntOpenHashSet IntRangeSet
AddNonExistingOrdered/Compact/400000344%(761%)94%(208%)14%(31%)13%(29%)
AddNonExistingShuffled/Compact/4000001%(3%)62%(122%)18%(35%)n/an/a
AddExistingOrdered/Compact/40000024%(12%)49%(24%)49%(24%)37%(18%)
AddExistingShuffled/Compact/4000005%(3%)85%(49%)61%(35%)n/an/a
IteratorOrdered/Compact/40000037%(22%)62%(38%)37%(22%)62%(38%)
IteratorShuffled/Compact/40000027%(16%)63%(38%)27%(16%)n/an/a
RemoveExistingOrdered/Compact/40000025%(3%)62%(7%)50%(6%)126%(15%)
RemoveExistingShuffled/Compact/4000005%(1%)91%(30%)59%(19%)n/an/a
ContainsExistingOrdered/Compact/40000016%(10%)83%(50%)50%(30%)66%(40%)
ContainsExistingShuffled/Compact/4000002%(2%)94%(64%)61%(42%)n/an/a
RemoveNonExistingOrdered/Compact/40000020%(16%)60%(50%)60%(50%)100%(83%)
RemoveNonExistingShuffled/Compact/4000005%(4%)88%(76%)71%(61%)n/an/a
ContainsNonExistingOrdered/Compact/40000020%(16%)40%(33%)60%(50%)100%(83%)
ContainsNonExistingShuffled/Compact/4000000%(0%)88%(76%)66%(57%)n/an/a

TRangeSet performs extremely bad when inserting or removing elements in random order. What happens is that each range contain only one value, and in that case a hashing approach is much better. The results are excluded from the table since the time to produce them has become intolerable.

When inserting elements in ascending order (ordered dataset) in a TBitSet it will expand its capacity with only 64 elements at a time resulting in bad performance.

Maps

The table below shows the benchmark task times for PCJ map classes as percentages of the corresponding benchmark task times for HashMap. The percentages in parenthesis show the task times of PCJ classes as percentages of the corresponding benchmark task times for a PCJ adapted HashMap.

PCJ map class timings as percentages of JCF HashMap timings and PCJ adapted HashMap timings
Task Data IntKeyIntChainedHashMap IntKeyIntOpenHashMap IntKeyChainedHashMap IntKeyOpenHashMap
PutNonExistingOrdered/Compact/400000102%(60%)13%(8%)150%(124%)91%(76%)
PutNonExistingShuffled/Compact/40000084%(65%)24%(18%)149%(135%)105%(96%)
PutExistingOrdered/Compact/40000087%(5%)50%(3%)87%(41%)62%(29%)
PutExistingShuffled/Compact/40000089%(15%)102%(17%)89%(53%)111%(66%)
GetExistingOrdered/Compact/40000085%(40%)44%(20%)71%(50%)42%(30%)
GetExistingShuffled/Compact/40000077%(44%)97%(56%)88%(61%)94%(66%)
IteratorOrdered/Compact/40000088%(44%)44%(22%)77%(43%)33%(18%)
IteratorShuffled/Compact/40000092%(47%)23%(11%)84%(54%)30%(19%)
RemoveExistingOrdered/Compact/40000087%(38%)50%(22%)87%(58%)50%(33%)
RemoveExistingShuffled/Compact/40000092%(53%)97%(56%)89%(60%)105%(71%)
RemoveNonExistingOrdered/Compact/40000075%(30%)75%(30%)75%(50%)75%(50%)
RemoveNonExistingShuffled/Compact/40000088%(59%)72%(47%)94%(80%)72%(61%)
GetNonExistingOrdered/Compact/40000075%(33%)75%(33%)77%(62%)75%(60%)
GetNonExistingShuffled/Compact/40000084%(59%)68%(47%)84%(80%)68%(65%)

Lists

The table below shows the benchmark task times for PCJ list classes as percentages of the corresponding benchmark task times for ArrayList. The percentages in parenthesis show the task times of PCJ classes as percentages of the corresponding benchmark task times for a PCJ adapted ArrayList.

PCJ list class timings as percentages of JCF ArrayList timings and PCJ adapted ArrayList timings
Task Data IntArrayList IntArrayDeque
RemoveNonExistingOrdered/Compact/40000015%(15%)16%(16%)
RemoveNonExistingShuffled/Compact/40000015%(15%)16%(16%)
RemoveExistingOrdered/Compact/400000111%(106%)0%(0%)
RemoveExistingShuffled/Compact/400000111%(106%)0%(0%)
AddExistingOrdered/Compact/40000038%(5%)38%(5%)
AddExistingShuffled/Compact/40000038%(6%)45%(7%)
IteratorOrdered/Compact/40000066%(50%)133%(100%)
IteratorShuffled/Compact/400000150%(75%)200%(100%)
RemoveEndOrdered/Compact/4000000%(0%)0%(0%)
RemoveEndShuffled/Compact/4000000%(0%)0%(0%)
ContainsNonExistingOrdered/Compact/40000021%(15%)23%(16%)
ContainsNonExistingShuffled/Compact/40000021%(15%)23%(16%)
AddNonExistingOrdered/Compact/40000036%(21%)35%(20%)
AddNonExistingShuffled/Compact/40000035%(20%)43%(24%)
AddMiddleOrdered/Compact/40000090%(101%)90%(101%)
AddMiddleShuffled/Compact/40000090%(100%)90%(101%)
RemoveBeginningOrdered/Compact/400000111%(106%)0%(0%)
RemoveBeginningShuffled/Compact/400000111%(105%)0%(0%)
RemoveMiddleOrdered/Compact/400000108%(104%)108%(104%)
RemoveMiddleShuffled/Compact/400000109%(104%)109%(104%)
AddBeginningOrdered/Compact/40000097%(97%)0%(0%)
AddBeginningShuffled/Compact/40000097%(97%)0%(0%)
ContainsExistingOrdered/Compact/40000016%(5%)16%(5%)
ContainsExistingShuffled/Compact/40000020%(6%)20%(6%)

4.4  Running the benchmarks

It may be useful to run the benchmarks on your own computer or with your own benchmark classes. To do so, you can use the classes bak.pcj.benchmark.BenchmarkRunner and bak.pcj.benchmark.ResultCollector. The BenchmarkRunner can run a benchmark on a specified data set with a specified size, and write the results to a file. When you have run a number of benchmarks, the results can be collected in a report using the ResultCollector.

Running a benchmark

The syntax for running the BenchmarkRunner is:

  bak.pcj.benchmark.BenchmarkRunner <data set class> <data set size> <benchmark class> <output file>

where

data set class
is name of a data set class. The class must be in the classpath and must be a sub-class of bak.pcj.benchmark.DataSet.
data set size
is the size of the data set. Low values will make it harder to measure the time, and high values will take a larger amount of memory. The DataSet classes will automatically produce data of the specified size.
benchmark class
is the benchmark to run. It must be a sub-class of bak.pcj.benchmark.Benchmark. The standard benchmarks can be found in the bak.pcj.benchmark package which also contains some abstract bases for implementing new benchmarks.
output file
is the name of a file on which to write the results. The file is overwritten if it already exists.

To run the standard benchmark for IntBitSets with a data set of 200000 consecutive shuffled values and write the results to a file called output.txt write:

  java -cp pcj.jar;pcj-benchmark.jar bak.pcj.benchmark.BenchmarkRunner \
    bak.pcj.benchmark.DataSetShuffledCompact 200000 \
    bak.pcj.benchmark.IntBitSetBenchmark output.txt

Collecting the results in a report

When a number of benchmarks has been run, the results can be collected and formatted as HTML using the bak.pcj.benchmark.ResultCollector. The syntax for using the ResultCollector is:

  bak.pcj.benchmark.ResultCollector <output file> <title> <result1> <result2> ...

where

output file
is the name of a file on which to write the HTML formatted report.
title
is a title for the report.
result1 result2 ...
are names of files containing benchmark results (output from BenchmarkRunner).

Making the benchmarks report better results

You need to have a setup where the JVM can run relatively undisturbed by other processes and can take plenty of memory without the operating system swapping memory out to the hard-disk. Some of the first runs that I made took a lot of memory and the Windows began swapping memory out in the middle of the benchmark, invalidating the results.

Most of the benchmarks use a lot of memory. Since you cannot rely 100% on garbage collection, you should only run one or two benchmarks during the same invocation of the JVM (that is one of the reasons why it is possible to collect benchmark results from separate reports into one report). Alternatively you can cut down the size of the data sets, but that will make the time measurements uncertain since the benchmarks wont be allowed to run for as long a time as they would with larger data sets.

If you have Windows 2000/NT you can use the task manager to verify your benchmark results. Each benchmark run should look like two trapezoids on the CPU-usage graph. They should be completely flat on the top (which should lie at 100%). If they don't, they have been interrupted by a process waiting for some resource to become available, e.g. memory that is swapped out.


Report a bug or request a feature.
Further information on the development and latest release of PCJ can be found at the project homepage.

Primitive Collections for Java is released under the GNU Lesser General Public License.
Copyright © 2002, 2003 Søren Bak. All Rights Reserved.

Hosted by SourceForge.net
SourceForge.net Logo

Java is a trademark or registered trademark of Sun Microsystems, Inc. in the US and other countries.

Document version: 1.4, August 24, 2003