Primitive Collections for Java 1.2
A developer's guide to using and extending the framework

Table of Contents

1.  Introduction

1.1  A personal motive

Primitive Collections for Java (PCJ) started with the need for a set data structure for character values. The set was to be used for configuring a phonetic encoder class. The encoder would be given a text in some language, and for each character in the text, it would determine whether or not the character was to be encoded. I started using the HashSet class of Java Collections Framework (JCF). Somewhere in the encoder the code was something like this:

public class PhoneticEncoder {

    java.util.Set acceptedChars;

    public Encoder(java.util.Set ac) {
        acceptedChars = ac;
    }

    String encode(CharSequence s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (acceptedChars.contains(new Character(c))) {
                ...
            }
        }
    }
}

// And somewhere outside the encoder...
java.util.Set s = new java.util.HashSet();
for (char c = 'a'; c <= 'z'; c++)
    s.add(new Character(c));
for (char c = 'A'; c <= 'Z'; c++)
    s.add(new Character(c));
for (char c = '0'; c <= '9'; c++)
    s.add(new Character(c));
PhoneticEncoder enc = new PhoneticEncoder(s);

I usually don't like representing primitive values as objects, especially not when a simple comparison requires my program to create a new instance (unless I want to represent texts as sequences of Character objects). It may be an illusion taking JVM optimization into account, but it does not seem efficient, and thinking about the data in my programs as wrapped values seems counter-intuitive. How I would like to think about my program is this:

public class PhoneticEncoder {

    CharSet acceptedChars;

    public Encoder(CharSet ac) {
        acceptedChars = ac;
    }

    String encode(CharSequence s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (acceptedChars.contains(c)) {
                ...
            }
        }
    }
}

// And somewhere outside the encoder...
CharRangeSet s = new CharRangeSet();
s.add(new CharRange('a', 'z'));
s.add(new CharRange('A', 'Z'));
s.add(new CharRange('0', '9'));
PhoneticEncoder enc = new PhoneticEncoder(s);

So I began investigating other possibilities. Keeping memory usage low was quite important for this particular application. First, I thought of using a BitSet. I believed that if I subtracted (int)'A' from each character value to be stored, I could do with a hundred bits or so. A little research into the Unicode standard meant that this particular approach had to be abandoned because it would mean a memory usage of about 0xFFFF/8 = 16384 bytes for representing a set of perhaps a few hundred characters (some of the Unicode characters have pretty high numbers). Even worse: most of the characters fell into long ranges of consecutive values, so there were actually no reason for setting a bit for each and every character. Instead I decided to represent a set of characters as a sorted list of ranges, storing only the lower and upper bounds of each range. This list could be searched very quickly because it would rarely have more than 15 ranges and testing a character for containment in a range would only require comparing it to the lower and upper bounds. Using binary search on 15 ranges it would in the worst case require 2×log2(16) = 2×4 = 8 character comparisons to find a character in the set. That was acceptable for my purposes. What was not acceptable was that my changing the representation of character sets meant that the API of the phonetic encoder would have to depend upon a very specialized binary searching range based character set class, which in turn had to be packaged with the encoder. Having done this trick for the (n+1)'th time and having noted for the n'th time the stupidity of doing it, I decided to make a generalized framework for primitive types. The first n-1 times I had resisted the temptation of creating yet another standard on which all other APIs should depend. This time I fell short, and PCJ is the result.

Søren Bak

1.2  The relation to the Java Collections Framework

When you examine the class hierarchy and the methods of PCJ, you will find that the interfaces and abstract classes (not the inner workings of the implementing classes) bear a close resemblance to those of the Java Collections Framework. This is intended. I have designed PCJ with the following in mind: Programmers are used to JCF and a close resemblance will increase the learning rate; and the complexity of adapters between JFC and PCJ should be kept low. There are plenty of other opportunities to introduce bugs. Besides that, I think that Joshua Bloch's JCF has a good interface design - worth basing other collections libraries on.

1.3  Completeness over sanity

You will find some rather pathological classes in PCJ. For example, there is a set class for boolean values called BooleanDirectSet. The value of such a class is questionable and I felt quite silly coding it (take a look at the source code, it seems like a lot of work to represent the four possible states: {}, {F}, {T}, and {F,T}). Another example of such a class is the hash map from float keys to short values (that one is generated). However, I think that a primitive collections API should provide a complete set of classes instead of just those that the author finds useful at the time. It should be completely symmetric over all primitive types.

1.4  Why use PCJ?

1.5  Why not use PCJ?

1.6  What's new?

Version 1.2

The focus of v1.2 was to implement sorted set operations.

Changes:

Version 1.1.1

This was a bugfix release:

Version 1.1

The focus of v1.1 was to complete a branch of the map interface hierarchy and make the API interact better with client Java code.

Changes:

2.  The Framework

The classes and interfaces of PCJ have been built symmetrically around the primitive types: boolean, char, byte, short, int, long, double, and float.

When I refer to a class, it can either be a type independent reference or a type specific reference. For type independent references, T and S are used to stand for primitive types. When I write TCollection it is therefore a general (type independent) reference to the classes BooleanCollection, CharCollection, ByteCollection, ShortCollection, IntCollection, LongCollection, FloatCollection, and DoubleCollection.

When for some reason it makes no sense to provide a class for some primitive type, the symmetry is allowed to fail. This is only the case for classes - not for the general interfaces. There are generally made notes about such cases throughout this guide.

2.1  The bak.pcj.* packages

bak.pcj

Provides general primitive collection classes. All other classes in the API depends on those classes.

bak.pcj.adapter

Provides adapter classes between PCJ and JCF. The adapters provide a convenient interface between PCJ-based and JCF-based classes.

bak.pcj.benchmark

Provides classes for benchmarking PCJ and parts of JCF. When new classes are implemented they can be compared to other implementations using standard tasks and data sets.

bak.pcj.hash

Provides classes for creating hash functions. Besides providing default hash functions the package also provides interfaces that can be implemented to change the hash functions of the hashing collection classes.

bak.pcj.list

Provides list collection interfaces and classes. Lists extend the general collection interfaces with sequential access to collection elements.

bak.pcj.map

Provides map interfaces and classes. The implementations are currently based on hashing.

bak.pcj.set

Provides set interfaces and classes. Sets extend the general collection interfaces with set behavior. There are set classes providing different storage strategies for different situations.

2.2  Collection classes

The core of PCJ is a small set of interrelated interfaces. All collection classes derive in some way or other from these interfaces. They are shown shown in the UML class diagram below:

Interface hierarchy for collection classes

The main interface is TCollection. It provides basic collection behaviour like adding and removing elements. The basic interfaces can be found in the package bak.pcj. There are two major branches of sub-interfaces. The one is the TSet branch which provides set behaviour for collections. All set classes can be found in the bak.pcj.set package. The other is the TList branch which provides list behaviour. All list classes can be found in the bak.pcj.list package.

There are a number of alternative implementations of these interfaces. Each implementation is useful in specific situations, and it requires a little knowledge about the classes to choose the right one. The classes implementing the interface hierarchy are shown in the UML class diagram below without method details.

Interface and class hierarchy for collection classes

Note that each of the main interfaces have an abstract class directly below in the hierarchy. Unless you are going to implement new collection classes, the abstract classes can safely be ignored. The implemented interfaces are important, not the super-classes. If you are going to implement new collection classes, the abstract classes provide base implementations of each of the main interfaces.

As mentioned earlier T is not {boolean, char, byte, short, int, long, float, double} for all classes. The table below shows the type independent classes and their implementations for each of the primitive types. You should examine the table and the corresponding APIs to get an idea of the class hierarchy.

Conceptual collection classes and actual implementations in PCJ
Class or interface Description Availability
boolean char byte short int long float double
TCollection General collection interface × × × × × × × ×
TSet General set interface × × × × × × × ×
TSortedSet Sorted set interface × × × × × × × ×
TOpenHashSet Hash set implementation using open addressing   × × × × × × ×
TChainedHashSet Hash set implementation using chained addressing   × × × × × × ×
TRangeSet Set class based on value ranges   × × × × ×    
TBitSet Set class based on bit arrays   × × × ×      
TDirectSet Boolean set ×              
TList General list interface × × × × × × × ×
TStack General stack interface × × × × × × × ×
TDeque General deque interface × × × × × × × ×
TArrayList Array list class × × × × × × × ×
TArrayStack Array stack class × × × × × × × ×
TArrayDeque Array deque class × × × × × × × ×

2.3  Map classes

As in JCF, map classes are not collections, but interact closely with collections. All primitive map classes implement either the TKeySMap interface, whose instances map T keys to S values, the TKeyMap interface, whose instances map T values to java.lang.Objects, or the ObjectKeyTMap interface, whose instances map java.lang.Objects to T values. The three interfaces are not directly related but share some design and logic. They are shown in the UML class diagram below:

Interface hierarchy for map classes

PCJ provides two implementation variants of both the TKeySMap interfaces, the TKeyMap interfaces, and the ObjectKeyTMap interfaces; namely TKeySOpenHashMap/TKeySChainedHashMap, TKeyOpenHashMap/TKeyChainedHashMap, and ObjectKeyTOpenHashMap/ObjectKeyTChainedHashMap as shown in the UML class diagram below:

Class hierarchy for map classes

2.4  Adapter classes

To provide a bridge between PCJ based and JCF based APIs there are adapters for all of the main interfaces: collections, iterators, sets, lists, list iterators, and maps. They can be found in the package bak.pcj.adapter. Alternatively, use the class bak.pcj.Adapter, which provides factory methods for all adapter classes. The table below lists all adapter classes.

Adapter classes
Class or interface Description Availability
boolean char byte short int long float double
TCollection­To­Collection­Adapter Adapters from PCJ collections to JCF collections × × × × × × × ×
TIterator­To­Iterator­Adapter Adapters from PCJ iterators to JCF iterators × × × × × × × ×
TListToListAdapter Adapters from PCJ lists to JCF lists × × × × × × × ×
TListIterator­To­ListIterator­Adapter Adapters from PCJ list iterators to JCF list iterators × × × × × × × ×
TSetToSetAdapter Adapters from PCJ sets to JCF sets × × × × × × × ×
TSortedSetToSortedSetAdapter Adapters from PCJ sorted sets to JCF sorted sets × × × × × × × ×
TKeyMapToMapAdapter Adapters from PCJ maps to JCF maps × × × × × × × ×
ObjectKeyTMapToMapAdapter Adapters from PCJ maps to JCF maps × × × × × × × ×
TKeySMapToMapAdapter Adapters from PCJ maps to JCF maps All 64 combinations, e.g. IntKeyCharMapToMapAdapter
Collection­To­TCollection­Adapter Adapters from JCF collections to PCJ collections × × × × × × × ×
Iterator­To­TIterator­Adapter Adapters from JCF iterators to PCJ iterators × × × × × × × ×
List­To­TList­Adapter Adapters from JCF lists to PCJ lists × × × × × × × ×
ListIterator­To­TListIterator­Adapter Adapters from JCF list iterators to PCJ list iterators × × × × × × × ×
Set­To­TSet­Adapter Adapters from JCF sets to PCJ sets × × × × × × × ×
SortedSet­To­TSortedSet­Adapter Adapters from JCF sorted sets to PCJ sorted sets × × × × × × × ×
MapToTKeyMapAdapter Adapters from JCF maps to PCJ maps × × × × × × × ×
MapToObjectKeyTMapAdapter Adapters from JCF maps to PCJ maps × × × × × × × ×
MapToTKeySMapAdapter Adapters from JCF maps to PCJ maps All 64 combinations, e.g. MapToIntKeyCharMapAdapter

2.5  Collection growth

Many of the collection and map classes are table based. Elements and mappings are stored in an underlying table (a Java array). The tables underlying table based collections have to grow as elements are added and possibly shrink as they are removed. That means allocating a new longer table and copy the existing elements into that table. It is easy to see that tables should not be expanded every time an element is added because that would result in very poor performance for additions. Instead the tables are expanded to a length well over the currently needed length. We call that length the capacity of the collection. Increasing the capacity of a collection is an expensive operation because of the copying involved (and re-hashing for hashtable-based maps), so the performance of add operations depend very much on how often the capacity is increased.

The table based collections and maps in PCJ supports two strategies for increasing the capacity:

Each strategy can be specified by selection of constructor. For example, we can create an int array list that grows with a fixed number of elements with the following code:

IntList list = new IntArrayList(100, 10);

The first argument in the constructor is the initial capacity of the list. The second is the number of elements with which to increase the capacity as elements are added. So in this case the first 100 elements are added without any capacity increase. When the 101st element is added the capacity is increased to 110, when the 111th element is added the capacity is increased to 120, and so on. Similarly, to create a list that grows with 20% size for each capacity increase, we would write:

IntList list = new IntArrayList(100, 0.2);

When the 101st element is added, the capacity is increased to 1.2×100 = 120. When the 121st element is added, the capacity is increased to 1.2×120 = 144, and so forth.

The strategy that should be employed depends on the situation. If you know how many elements are going to be in a collection, it is generally a good idea to give the collection that capacity from the beginning so as to avoid capacity increases. Very often we cannot know the needed capacity, and in those cases we should think about how collections grow. A rule of thumb is that if a collection rarely grows or grows slowly its capacity should increase by a fixed amount. In that way we avoid wasting too much memory on an exponential growth rate. On the other hand, if a collection grows fast its capacity should grow exponentially, because that is the best way to minimize the number of capacity increases. However, with a growth factor of 1.0 (doubling the capacity) we potentially allocate approximately twice as much memory as needed.

2.6  Hashing

A number of the classes of PCJ are based on hash tables. As explained below, there are two main hashing variants in use: open addressing with double hashing and chaining.

Open addressing

The open hash collection and map variants (e.g. TOpenHashSet and TKeySOpenHashMap) are based on a hash table using open addresing with double hashing. Double hashing is a method of resolving collisions in hash tables when more than one key hash to the same address in the table. The secondary hash function used is always:

h2(n) = 1 + (h1(n) mod (M - 2))

where h1 is the primary hash function and M is the length of the underlying hash table. Using this function, the M should be a prime in order for all the elements of the hash table to be searched. The class Primes is used for selecting prime numbers.

Chained hash tables

The chained hash table variants (e.g. TChainedHashSet and TKeySChainedHashMap) are based on hash tables with linked lists. Instead of searching for an empty slot in the hash table, each slot contains a linked list (a chain) of those entries whose keys hash to that slot. In that way collision resolution is completely avoided, but we have to search linked lists instead.

Load factors

Both the OpenHash and ChainedHash variants begin to perform bad when many keys have been hashed to the same slot in the underlying hash table. To prevent that, hash tables are equipped with a maximum load factor, which is some proportion of the its size. When that proportion of the hash table is used, its length is increased (see the discussion on growth above). The default load factor is 0.75 for both variants, and a custom load factor may be specified with the constructors.

The plots below shows how the load of a hash based set affects performance. For open addressing the time to look up a key increases steadily until it suddenly explodes around 0.7. For chained hashing the mean time to look up a key will increase steadily as the lists grow longer. The plots were created for hash sets, but are very similar for hash maps.

(The horizontal lines on the plots are due to the resolution of the timer on my computer, which is quite clearly somewhere around 10ms.)

Creating new hash functions

If nothing else is specified with the constructors for the hash table based collections and maps, the default hash functions of bak.pcj.hash are used. It is possible that the default functions give poor performance results for specific data sets, and in such cases a new hash function may be specified.

Assume that the hash table length M is 17 and the default hash function h(n) = n (which it is for int values). If we add 17 it goes into slot 0 since 17 mod 17 = 0. If we add 34 it also goes into slot 0 since 34 mod 17 = 0, thus activating the use of the secondary hash function in the case of open addressing with double hashing. With chaining we get longer lists. This example could benefit from an alternative hash function, e.g. h(n) = (n div 17) + n. That would make n = 17 go into slot ((17 div 17) + 17) mod 17 = 18 mod 17 = 1, and similarly 34 would go into slot ((34 div 17) + 17) mod 17 = 19 mod 17 = 2, thus eliminating the use of the secondary hash function (or chains).

The example below shows how to pass a new hash function to a hash set:

import bak.pcj.set.IntSet;
import bak.pcj.set.IntOpenHashSet;
import bak.pcj.hash.IntHashFunction;

...
IntSet s = new IntOpenHashSet(
    new IntHashFunction() {
        public int hash(int v) {
            return v/17 + v;
        }
    }
);

3.  The classes

In this section the interfaces and classes are described in more detail and examples are given. The design is also discussed.

3.1  The TCollection interfaces

The TCollection interfaces define the basic collection behaviour. It is adapted from the JCF Collection interface. It provides the basic methods for modifying and querying collections.

Although TCollection is the general ancestor of all collection classes, each method may have a different semantics depending on the collection type. For example, the method add(T) adds an element to the end of a TList whereas it ensures that a TSet contains an element.

If you only use PCJ and do not extend it, you will only use TCollection as a polymorphic type for collections.

See TSet for examples. The two interfaces are identical in signature.

Availability

Type Interface
boolean bak.pcj.BooleanCollection
char bak.pcj.CharCollection
byte bak.pcj.ByteCollection
short bak.pcj.ShortCollection
int bak.pcj.IntCollection
long bak.pcj.LongCollection
float bak.pcj.FloatCollection
double bak.pcj.DoubleCollection

3.2  The TList interfaces

The TList interfaces define an extension to the TCollection interfaces for sequential lists. As TCollection this is basically an adpation of the JCF List interface.

Example:

import bak.pcj.list.IntList;
import bak.pcj.list.IntArrayList;
import bak.pcj.list.IntListIterator;

public class ListExample {
    public static void main(String[] args) {
        IntList s = new IntArrayList();

        //  Adding elements to the end
        
        s.add(1);               //  s=[1]
        s.add(2);               //  s=[1,2]

        //  Inserting elements at positions

        s.add(2, 3);            //  s=[1,2,3]
        s.add(0, 0);            //  s=[0,1,2,3]
        s.add(1, 10);           //  s=[0,10,1,2,3]

        //  Removing elements at positions

        s.removeElementAt(1);   //  s=[0,1,2,3]

        //  Removing elements by value
        
        s.remove(2);            //  s=[0,1,3]

        //  Updating elements

        s.set(2, 2);            //  s=[0,1,2]
        s.set(0, -1);           //  s=[-1,1,2]

        //  Searching for elements

        int v;
        v = s.indexOf(1);       //  v=1
        v = s.indexOf(2);       //  v=2
        v = s.indexOf(10);      //  v=-1

        //  Adding collections at positions

        IntArrayList t = new IntArrayList();
        t.add(10);              //  t=[10]
        t.add(20);              //  t=[10,20]
        t.add(30);              //  t=[10,20,30]
        
        s.addAll(1, t);         //  s=[-1,10,20,30,1,2]
    }
}

Other examples:

ListExample.java
Reverse.java
Sorting.java

Availability

Type Interface
boolean bak.pcj.list.BooleanList
char bak.pcj.list.CharList
byte bak.pcj.list.ByteList
short bak.pcj.list.ShortList
int bak.pcj.list.IntList
long bak.pcj.list.LongList
float bak.pcj.list.FloatList
double bak.pcj.list.DoubleList

Design notes

The TList interfaces do not provide methods to retrieve a sub-list of a list. It is relatively complex to implement sub-list views, so this feature (which is implemented in JCF lists) is omitted for the time being.

A deviation from JCF in method names is that of removeElementAt(). JCF lists has a remove(int) and remove(Object) method for removing elements by index and by equality, respectively. Since the objects of PCJ lists can be of type int, there would be a clash between the two method signatures if the JCF naming should be used. Therefore PCJ lists use removeElementAt(int) to remove elements by index.

3.3  The TStack interfaces

The TStack interfaces extend the TList interfaces with the typical stack operations: push(), pop(), and peek(). Stacks typically only grow and shrink at the end.

Note that all TLists in principle can be used as stacks. We can easily replace pop() with removeElementAt(size()-1) and push(v) with add(v). The stack interfaces does not guarantee that stack operations are more efficient than their list/collection counterparts, but it makes the intention clear.

Example:

import bak.pcj.list.IntStack;
import bak.pcj.list.IntArrayStack;

public class StackExample {
    public static void main(String[] args) {
        IntStack s = new IntArrayStack();

        //  Pushing elements

        s.push(1);              //  s=[1]
        s.push(2);              //  s=[1,2,3]
        s.push(3);              //  s=[1]

        //  Inspecting

        int v;
        v = s.peek();           //  v=3, s=[1,2,3]

        //  Popping elements

        v = s.pop();            //  v=3, s=[1,2]
        v = s.pop();            //  v=0, s=[1]
    }
}

Other examples:

StackExample.java
Calculator.java

Availability

Type Interface
boolean bak.pcj.list.BooleanStack
char bak.pcj.list.CharStack
byte bak.pcj.list.ByteStack
short bak.pcj.list.ShortStack
int bak.pcj.list.IntStack
long bak.pcj.list.LongStack
float bak.pcj.list.FloatStack
double bak.pcj.list.DoubleStack

Design notes

It may be questioned whether a stack is a special kind of list. Interfaces are often used for specifying the minimum capabilities of an object, e.g. an argument to a method. In those cases a stack ought to be just that: a stack, not a list, because we probably only need to push, pop, and peek the object.

On the other hand, stacks are sequential information structures. It should be possible to iterate over the elements. It may also be argued that they at least should extend the TCollection interface to be interesting in a collections API. And extending the TCollection interface they might as well extend the TList interface.

In JCF stacks are merely special implementations of lists. That leaves us with a problem when an operations needs a stack to work efficiently: We can either 1) make the argument a Stack which is a special implementation of List or we can 2) make the argument a List and hope that the caller passes an object that has fast addition and removal from the end of the list.

//  Solution 1:
void doSomething(Stack s) {
    // Perform stack operations on s here using push(Object) and pop()
    ...
}
...
Stack s = new Stack();
doSomething(s);

//  Solution 2:
void doSomething(List s) {
    // Perform stack operations on s using add(Object) and remove(int)
    ...
}
...
List s = new Stack();
doSomething(s);

Solution 1 above is not acceptable, because the doSomething() method does not care about the implementation of the stack argument. It puts a restriction on the caller that is not needed. Solution 2 is not acceptable because any list implementation can be passed to doSomething() may render it inefficient.

I think that a better way is to make Stack an interface and to provide implementations with fast addition and removal at the end of lists.

//  Solution with Stack as an interface:

void doSomething(Stack s) {
    // Perform stack operations on s here using push(Object) and pop()
    ...
}
...
Stack s = new MyStackImplementation();
doSomething(s);

That is why TStack is an interface in PCJ.

3.4  The TDeque interfaces

The TDeque interfaces extend the TList interfaces with the typical typical operations of a deque: addFirst(), addLast(), getFirst(), getLast(), removeFirst(), and removeLast(). Deques typically only grow and shrink at the ends.

Note that all TLists in principle can be used as stacks because of the positional access to elements. We can easily replace getFirst() with get(0), removeLast() with removeElementAt(size()-1), and so forth. The deque interfaces does not guarantee that deque operations are more efficient than their list/collection counterparts, but it makes the intention clear.

Example:

import bak.pcj.IntIterator;
import bak.pcj.list.IntDeque;
import bak.pcj.list.IntArrayDeque;

public class DequeExample {
    public static void main(String[] args) {
        IntDeque q = new IntArrayDeque();

        //  Adding elements

        q.addFirst(1);          //  q=[1]
        q.addFirst(2);          //  q=[2,1]
        q.addFirst(3);          //  q=[3,2,1]

        q.addLast(0);           //  q=[3,2,1,0]

        //  Retrieving elements

        int v;
        v = q.getFirst();       //  v=3, q=[3,2,1,0]
        v = q.getLast();        //  v=0, q=[3,2,1,0]
        
        //  Removing elements

        v = q.removeFirst();    //  v=3, q=[2,1,0]
        v = q.removeLast();     //  v=0, q=[2,1]
    }
}

Other examples:

DequeExample.java
DequeExample1.java

Availability

Type Interface
boolean bak.pcj.list.BooleanDeque
char bak.pcj.list.CharDeque
byte bak.pcj.list.ByteDeque
short bak.pcj.list.ShortDeque
int bak.pcj.list.IntDeque
long bak.pcj.list.LongDeque
float bak.pcj.list.FloatDeque
double bak.pcj.list.DoubleDeque

Design notes

The design considerations for TStack also applies to TDeque.

3.5  The TArrayList classes

The TArrayList classes provide array based implementations of the TList interfaces. The data in the list is represented as a T[] that grows with the size of the list.

See TList for examples.

Availability

Type Class
boolean bak.pcj.list.BooleanArrayList
char bak.pcj.list.CharArrayList
byte bak.pcj.list.ByteArrayList
short bak.pcj.list.ShortArrayList
int bak.pcj.list.IntArrayList
long bak.pcj.list.LongArrayList
float bak.pcj.list.FloatArrayList
double bak.pcj.list.DoubleArrayList

Remarks

Currently PCJ provides no alternatives to TArrayList other than TArrayStack and TArrayDeque which are specialized lists - they should not be used as replacements for lists.

TArrayLists perform best when they grow or shrink at the end - like stacks.

Since the underlying data structure of a TArrayList is an array, removal and insertion (other than at the end of the list) is not efficient for large list sizes.

IntArrayList s = new IntArrayList();
for (int i = 0; i < 1000000; i++)
    s.add(i);
s.removeElementAt(0); This is not efficient!!!
s.add(0, 100);        This is not efficient!!!

If you have to modify a long list at its beginning, use a TDeque instead of a list.

3.6  The TArrayStack classes

TArrayStack is an array based implementation of TStack. It is implemented as a simple extension to TArrayList.

See TStack for examples.

Availability

Type Class
boolean bak.pcj.list.BooleanArrayStack
char bak.pcj.list.CharArrayStack
byte bak.pcj.list.ByteArrayStack
short bak.pcj.list.ShortArrayStack
int bak.pcj.list.IntArrayStack
long bak.pcj.list.LongArrayStack
float bak.pcj.list.FloatArrayStack
double bak.pcj.list.DoubleArrayStack

3.7  The TArrayDeque classes

The TArrayDeque classes provide array based implementations of TDeque. Internally the elements are stored in a T[] the same size of or longer than the current number of elements.

0 1 2 3 4 5
1 2 3 - - -
^ ^

We operate with a pointer to the first element in the deque and a pointer to the last element (or 1 beyond the last). Insertions performed performed at the end moves the last-pointer forward, and insertions performed at the beginning moves the first-pointer backward. Likewise, removals from the end moves the last-pointer backward, and removals from the beginning move the first-pointer forward. Since the deque is restricted by a capacity all pointer movement is wrapped around the ends of the array (indices are added modulo the capacity). See the examples below:

addLast(4):

0 1 2 3 4 5
1 2 3 4 - -
^ ^

removeFirst():

0 1 2 3 4 5
- 2 3 4 - -
^ ^

addLast(5):

0 1 2 3 4 5
- 2 3 4 5 -
^ ^

removeFirst():

0 1 2 3 4 5
- - 3 4 5 -
^ ^

addLast(6):

0 1 2 3 4 5
- - 3 4 5 6
^ ^

addLast(7):

0 1 2 3 4 5
7 - 3 4 5 6
^ ^

Examples: see TDeque for examples.

Availability

Type Class
boolean bak.pcj.list.BooleanArrayDeque
char bak.pcj.list.CharArrayDeque
byte bak.pcj.list.ByteArrayDeque
short bak.pcj.list.ShortArrayDeque
int bak.pcj.list.IntArrayDeque
long bak.pcj.list.LongArrayDeque
float bak.pcj.list.FloatArrayDeque
double bak.pcj.list.DoubleArrayDeque

3.8  The TSet interfaces

The TSet interfaces define the set behaviour for primitive collections. It is adapted from the JCF Set interface and exists for all primitive types. In PCJ TSet is only used to tag collections - it does not extend the collection interface with new methods.

Examples:

import bak.pcj.set.IntSet;
import bak.pcj.set.IntOpenHashSet;
import bak.pcj.IntIterator;

public class SetExample {
    public static void main(String[] args) {
        IntSet s = new IntOpenHashSet();

        //  Adding elements
        
        s.add(1);               //  s={1}
        s.add(2);               //  s={1,2}
        s.add(3);               //  s={1,2,3}
        s.add(4);               //  s={1,2,3,4}
        s.add(5);               //  s={1,2,3,4,5}
        s.add(2);               //  s={1,2,3,4,5} (not changed)
        s.add(4);               //  s={1,2,3,4,5} (not changed)

        //  Removing elements

        s.remove(2);            //  s={1,3,4,5}
        s.remove(5);            //  s={1,3,4}

        //  Clearing
        
        s.clear();              //  s={}
        boolean c;
        c = s.isEmpty();        //  c=true

        //  Testing containment of elements

        s.add(1);
        s.add(2);
        s.add(3);               //  s={1,2,3}
        c = s.contains(1);      //  c=true, s={1,2,3}
        c = s.contains(3);      //  c=true, s={1,2,3}
        c = s.contains(5);      //  c=false, s={1,2,3}

        //  Testing containment of collections

        IntSet t;
        t = new IntOpenHashSet();
        t.add(1);               //  t={1}, s={1,2,3}
        c = s.containsAll(t);   //  c=true
        t.add(0);               //  t={0,1}, s={1,2,3}
        c = s.containsAll(t);   //  c=false

        //  Adding collections

        s.addAll(t);            //  t={0,1}, s={0,1,2,3}

        //  Retaining elements of collections
        
        t = new IntOpenHashSet();
        t.add(2);
        t.add(3);
        t.add(4);
        t.add(5);               //  t={2,3,4,5}, s={0,1,2,3}
        s.retainAll(t);         //  t={2,3,4,5}, s={2,3}

        //  Removing whole collections
        
        t = new IntOpenHashSet();
        t.add(3);
        t.add(4);
        s.removeAll(t);         //  t={3,4}, s={2}

        //  Finding the size
        
        int v;
        v = s.size();           //  v=1, s={2}
        s.add(1);
        v = s.size();           //  v=2, s={1,2}
        
        //  Converting to an array
        
        int[] a = s.toArray();  //  a=int[]{1,2}, s={1,2}

        //  Iterating over elements
        for (int i = 0; i < 10; i++)
            s.add(i);           //  s={0,1,2,3,4,5,6,7,8,9}
        IntIterator i = s.iterator();
        while (i.hasNext()) {
            int e = i.next();
            if ((e % 2) == 1)
                i.remove();
        }                       //  s={0,2,4,6,8}

        //  Comparing

        t = new IntOpenHashSet(s); //  s={0,2,4,6,8}, t={0,2,4,6,8}
        c = s.equals(t);        //  c=true
        t.remove(2);            //  s={0,2,4,6,8}, t={0,4,6,8}
        c = s.equals(t);        //  c=false

    }
}

Other examples:

SetExample.java

Availability

Type Interface
boolean bak.pcj.set.BooleanSet
char bak.pcj.set.CharSet
byte bak.pcj.set.ByteSet
short bak.pcj.set.ShortSet
int bak.pcj.set.IntSet
long bak.pcj.set.LongSet
float bak.pcj.set.FloatSet
double bak.pcj.set.DoubleSet

3.9  The TSortedSet interfaces

The TSortedSet interfaces define the sorted set behaviour for primitive collections. It extends TSet with some sub-set operations, retrieval of first and last elements, and a contract to iterate in the natural order of the elements. It is adapted from the JCF SortedSet interface and exists for all primitive types.

Availability

Type Interface
boolean bak.pcj.set.BooleanSortedSet
char bak.pcj.set.CharSortedSet
byte bak.pcj.set.ByteSortedSet
short bak.pcj.set.ShortSortedSet
int bak.pcj.set.IntSortedSet
long bak.pcj.set.LongSortedSet
float bak.pcj.set.FloatSortedSet
double bak.pcj.set.DoubleSortedSet

3.10  The TOpenHashSet classes

TOpenHashSet provide a hash table based implementation of the TSet interface with open addressing.

See TSet for examples.

Availability

Type Class
boolean n/a
char bak.pcj.set.CharOpenHashSet
byte bak.pcj.set.ByteOpenHashSet
short bak.pcj.set.ShortOpenHashSet
int bak.pcj.set.IntOpenHashSet
long bak.pcj.set.LongOpenHashSet
float bak.pcj.set.FloatOpenHashSet
double bak.pcj.set.DoubleOpenHashSet

3.11  The TChainedHashSet classes

TOpenHashSet provide a hash table based implementation of the TSet interface with chained hashing.

See TSet for examples.

Availability

Type Class
boolean n/a
char bak.pcj.set.CharChainedHashSet
byte bak.pcj.set.ByteChainedHashSet
short bak.pcj.set.ShortChainedHashSet
int bak.pcj.set.IntChainedHashSet
long bak.pcj.set.LongChainedHashSet
float bak.pcj.set.FloatChainedHashSet
double bak.pcj.set.DoubleChainedHashSet

3.12  The TRangeSet classes

The TRangeSet classes provide range based implementations of TSortedSet.

The elements of a TRangeSets are represented by a number of ranges of consecutive values. A TRangeSet keeps track of the lower and upper bounds of the ranges. Internally a list of ranges are kept in sorted order for fast lookup (O(log(n), where n is the number of ranges).

See TSet for examples.

TRangeSet provides a number of specialized methods for adding and searching for elements. See the API documentation for details.

Availability

Type Class
boolean n/a
char bak.pcj.set.CharRangeSet
byte bak.pcj.set.ByteRangeSet
short bak.pcj.set.ShortRangeSet
int bak.pcj.set.IntRangeSet
long bak.pcj.set.LongRangeSet
float n/a
double n/a

TRangeSet is only provided for byte, short, int, and long. They are the only types for which it makes sense to speak about ranges of consecutive values. It is also not quite clear that the LongRangeSet class should exist. With ranges of long values we can easily construct a set that has more than Integer.MAX_VALUE elements:

import bak.pcj.set.LongRangeSet;
...
LongRangeSet s = new LongRangeSet();
s.add(0L, Integer.MAX_VALUE+1L);
int size = s.size();
// What is the value of size at this point???

This problem is unresolved. I think that use of the LongRangeSet will be deprecated some time in the future.

Remarks

TRangeSet objects use very little memory when the number of ranges is small. It does not depend on the number of elements, only the number of ranges.

A typical use of the TRangeSet classes is to get rid of hard-coded range checks, as in:

if ((c >= 'a' && c <= 'z') ||
    (c >= 'A' && c <= 'Z') ||
    (c >= '0' && c <= '9')) {
    ...
}

The above code is much easier to read and maintain (and for enough ranges much more efficient) if it is rewritten to a question of set containment:

static CharRangeSet ALPHANUMERIC = new CharRangeSet(
    new CharRange('a', 'z'),
    new CharRange('A', 'Z'),
    new CharRange('0', '9'),
);
...
if (ALPHANUMERIC.contains(c)) {
    ...
}

As the benchmarks show, TRangeSet should not be used when additions or removals can split the ranges too much. Random elements gives very bad performance, and when elements cannot be packed into ranges, there is no advantage in memory use.

3.13  The TBitSet classes

The TBitSet classes provide a bit array based implementation of TSortedSet.

A TBitSet is represented by an array of bits. When a bit is set it means that the element with the same value as the position of that bit is an element of the set. For example, the least five elements of the set {1,5,7,8,9,...} are represented by the bit array shown below:

0 1 2 3 4 5 6 7 8 9 ...
0 1 0 0 0 1 0 1 1 1 ...

See TSet for examples.

Availability

Type Class
boolean n/a
char bak.pcj.set.CharBitSet
byte bak.pcj.set.ByteBitSet
short bak.pcj.set.ShortBitSet
int bak.pcj.set.IntBitSet
long n/a
float n/a
double n/a

TBitSet is provided for byte, char, short, and int. TBitSet depends on the set type having a meaningful enumeration. This is not the case for boolean, double, and float. TBitSet is not provided for long values since the underlying arrays would have to be too large.

Remarks

TBitSets generally perform good for small set elements and dense sets (elements that are close to each other).

Iteration over TBitSets is not particularly efficient - especially not when the set elements are sparse. The reason for this is that the object must search through an array of bits for the next set bit before it can return the next value of an iteration.

The memory used by TBitSets is proportional not to its number of elements but to the value of its largest element. Don't use bit sets with large values if memory is a problem.

When elements are added in ascending order, it requires many capacity increases. If the largest element is known it should be added first. Alternatively the initial capacity can be set to a number corresponding to the largest element via a constructor.

TBitSets cannot be used with negative elements.

3.14 The TKeySMap interfaces

The TKeySMap interfaces define maps from T keys to S values.

The primitive map interfaces are probably the interfaces that deviate the most from their JCF counterparts. The Map.Entry has been omitted because it restricts the implementations. Unless one wants to represent a map by a Map.Entry instance for each entry of the map, one is bound to create these extra instances when returning the entry set (see Map.entrySet()). It is very useful to be able to access all entries of a map, but it does not have to be through a set. PCJ introduces a special iterator called a TKeySMapIterator (see below). It has the same characteristics as a normal iterator, but is capable of