TCollection interfacesTList interfacesTStack interfacesTDeque interfacesTArrayList classesTArrayStack classesTArrayDeque classesTSet interfacesTSortedSet interfacesTOpenHashSet classesTChainedHashSet classesTRangeSet classesTBitSet classesTKeySMap interfacesTKeySOpenHashMap classesTkeySChainedHashMap classesTKeyMap interfacesTKeyOpenHashMap classesTKeyChainedHashMap classesObjectKeyTMap interfacesObjectKeyTOpenHashMap classesObjectKeyTChainedHashMap classesPrimitive Collections for Java (PCJ) started with the
need for a set data structure for character values. The set
was to be used for configuring a phonetic encoder class. The
encoder would be given a text in some language, and for each
character in the text, it would determine whether or not the
character was to be encoded. I started using the
HashSet
class of Java Collections Framework (JCF). Somewhere in the
encoder the code was something like this:
public class PhoneticEncoder {
java.util.Set acceptedChars;
public Encoder(java.util.Set ac) {
acceptedChars = ac;
}
String encode(CharSequence s) {
for (int i = 0; i < s.length(); i++) {
char c = s.charAt(i);
if (acceptedChars.contains(new Character(c))) {
...
}
}
}
}
// And somewhere outside the encoder...
java.util.Set s = new java.util.HashSet();
for (char c = 'a'; c <= 'z'; c++)
s.add(new Character(c));
for (char c = 'A'; c <= 'Z'; c++)
s.add(new Character(c));
for (char c = '0'; c <= '9'; c++)
s.add(new Character(c));
PhoneticEncoder enc = new PhoneticEncoder(s);
I usually don't like representing primitive values as
objects, especially not when a simple comparison requires my
program to create a new instance (unless I want to represent
texts as sequences of Character objects). It
may be an illusion taking JVM optimization into account, but
it does not seem efficient, and thinking about the data in
my programs as wrapped values seems counter-intuitive. How I
would like to think about my program is this:
public class PhoneticEncoder {
CharSet acceptedChars;
public Encoder(CharSet ac) {
acceptedChars = ac;
}
String encode(CharSequence s) {
for (int i = 0; i < s.length(); i++) {
char c = s.charAt(i);
if (acceptedChars.contains(c)) {
...
}
}
}
}
// And somewhere outside the encoder...
CharRangeSet s = new CharRangeSet();
s.add(new CharRange('a', 'z'));
s.add(new CharRange('A', 'Z'));
s.add(new CharRange('0', '9'));
PhoneticEncoder enc = new PhoneticEncoder(s);
So I began investigating other possibilities. Keeping
memory usage low was quite important for this particular
application. First, I thought of using a
BitSet.
I believed that if I subtracted (int)'A' from
each character value to be stored, I could do with a hundred
bits or so. A little research into the Unicode standard
meant that this particular approach had to be abandoned
because it would mean a memory usage of about 0xFFFF/8 =
16384 bytes for representing a set of perhaps a few hundred
characters (some of the Unicode characters have pretty high
numbers). Even worse: most of the characters fell into long
ranges of consecutive values, so there were actually no
reason for setting a bit for each and every character.
Instead I decided to represent a set of characters as a
sorted list of ranges, storing only the lower and upper
bounds of each range. This list could be searched very
quickly because it would rarely have more than 15 ranges and
testing a character for containment in a range would only
require comparing it to the lower and upper bounds. Using
binary search on 15 ranges it would in the worst case
require 2×log2(16) = 2×4 = 8
character comparisons to find a character in the set. That
was acceptable for my purposes. What was not acceptable was
that my changing the representation of character sets meant
that the API of the phonetic encoder would have to depend
upon a very specialized binary searching range based
character set class, which in turn had to be packaged with
the encoder. Having done this trick for the (n+1)'th
time and having noted for the n'th time the stupidity
of doing it, I decided to make a generalized framework for
primitive types. The first n-1 times I had resisted
the temptation of creating yet another standard on which all
other APIs should depend. This time I fell short, and PCJ is
the result.
Søren Bak
When you examine the class hierarchy and the methods of PCJ, you will find that the interfaces and abstract classes (not the inner workings of the implementing classes) bear a close resemblance to those of the Java Collections Framework. This is intended. I have designed PCJ with the following in mind: Programmers are used to JCF and a close resemblance will increase the learning rate; and the complexity of adapters between JFC and PCJ should be kept low. There are plenty of other opportunities to introduce bugs. Besides that, I think that Joshua Bloch's JCF has a good interface design - worth basing other collections libraries on.
You will find some rather pathological classes in PCJ.
For example, there is a set class for boolean
values called BooleanDirectSet. The value
of such a class is questionable and I felt quite silly
coding it (take a look at the source code, it seems like a
lot of work to represent the four possible states: {}, {F},
{T}, and {F,T}). Another example of such a class is the hash
map from float keys to short
values (that one is generated). However, I think that a
primitive collections API should provide a complete set of
classes instead of just those that the author finds useful
at the time. It should be completely symmetric over all
primitive types.
TRangeSet classes,
which are more efficient in memory usage.
CharSet
as an argument to a method.
The focus of v1.2 was to implement sorted set operations.
Changes:
SortedSet to primitive sorted sets
SortedSet
TKeyOpenHashMap
TKeyChainedHashMap
This was a bugfix release:
TArrayDeque.getLast() raised under
certain circumstances an IndexOutOfBoundsException with
singleton deques.The focus of v1.1 was to complete a branch of the map interface hierarchy and make the API interact better with client Java code.
Changes:
The classes and interfaces of PCJ have been built symmetrically around
the primitive types: boolean, char, byte, short,
int, long, double, and float.
When I refer to a class, it can either be a type independent reference or a
type specific reference. For type independent references, T and
S are used to stand for primitive types. When I write
TCollection it
is therefore a general (type independent) reference to the classes
BooleanCollection,
CharCollection,
ByteCollection,
ShortCollection,
IntCollection,
LongCollection,
FloatCollection, and
DoubleCollection.
When for some reason it makes no sense to provide a class for some primitive type, the symmetry is allowed to fail. This is only the case for classes - not for the general interfaces. There are generally made notes about such cases throughout this guide.
bak.pcj.* packagesbak.pcjProvides general primitive collection classes. All other classes in the API depends on those classes.
bak.pcj.adapterProvides adapter classes between PCJ and JCF. The adapters provide a convenient interface between PCJ-based and JCF-based classes.
bak.pcj.benchmarkProvides classes for benchmarking PCJ and parts of JCF. When new classes are implemented they can be compared to other implementations using standard tasks and data sets.
bak.pcj.hashProvides classes for creating hash functions. Besides providing default hash functions the package also provides interfaces that can be implemented to change the hash functions of the hashing collection classes.
bak.pcj.listProvides list collection interfaces and classes. Lists extend the general collection interfaces with sequential access to collection elements.
bak.pcj.mapProvides map interfaces and classes. The implementations are currently based on hashing.
bak.pcj.setProvides set interfaces and classes. Sets extend the general collection interfaces with set behavior. There are set classes providing different storage strategies for different situations.
The core of PCJ is a small set of interrelated interfaces. All collection classes derive in some way or other from these interfaces. They are shown shown in the UML class diagram below:
The main interface is TCollection. It provides basic collection
behaviour like adding and removing elements.
The basic interfaces can be found in the package
bak.pcj.
There are two major branches of
sub-interfaces. The one is the TSet branch which provides
set behaviour for collections. All set classes can be found in the
bak.pcj.set
package.
The other is the TList branch
which provides list behaviour.
All list classes can be found in the
bak.pcj.list
package.
There are a number of alternative implementations of these interfaces. Each implementation is useful in specific situations, and it requires a little knowledge about the classes to choose the right one. The classes implementing the interface hierarchy are shown in the UML class diagram below without method details.
Note that each of the main interfaces have an abstract class directly below in the hierarchy. Unless you are going to implement new collection classes, the abstract classes can safely be ignored. The implemented interfaces are important, not the super-classes. If you are going to implement new collection classes, the abstract classes provide base implementations of each of the main interfaces.
As mentioned earlier T is not
{boolean, char, byte,
short, int, long,
float, double} for all classes. The table below shows the
type independent classes and their implementations for each
of the primitive types. You should examine the table and the
corresponding APIs to get an idea of the class
hierarchy.
| Class or interface | Description | Availability | |||||||
|---|---|---|---|---|---|---|---|---|---|
boolean |
char |
byte |
short |
int |
long |
float |
double |
||
TCollection |
General collection interface | × | × | × | × | × | × | × | × |
TSet |
General set interface | × | × | × | × | × | × | × | × |
TSortedSet |
Sorted set interface | × | × | × | × | × | × | × | × |
TOpenHashSet |
Hash set implementation using open addressing | × | × | × | × | × | × | × | |
TChainedHashSet |
Hash set implementation using chained addressing | × | × | × | × | × | × | × | |
TRangeSet |
Set class based on value ranges | × | × | × | × | × | |||
TBitSet |
Set class based on bit arrays | × | × | × | × | ||||
TDirectSet |
Boolean set | × | |||||||
TList |
General list interface | × | × | × | × | × | × | × | × |
TStack |
General stack interface | × | × | × | × | × | × | × | × |
TDeque |
General deque interface | × | × | × | × | × | × | × | × |
TArrayList |
Array list class | × | × | × | × | × | × | × | × |
TArrayStack |
Array stack class | × | × | × | × | × | × | × | × |
TArrayDeque |
Array deque class | × | × | × | × | × | × | × | × |
As in JCF, map classes are not collections, but interact
closely with collections. All primitive map classes
implement either the
TKeySMap
interface, whose
instances map T keys to S values,
the
TKeyMap interface,
whose
instances map T values to
java.lang.Objects,
or the
ObjectKeyTMap interface,
whose
instances map java.lang.Objects to T values.
The three interfaces are not
directly related but share some design and logic. They are
shown in the UML class diagram below:
PCJ provides two implementation variants of both the
TKeySMap interfaces, the TKeyMap interfaces, and the ObjectKeyTMap interfaces; namely
TKeySOpenHashMap/TKeySChainedHashMap,
TKeyOpenHashMap/TKeyChainedHashMap,
and
ObjectKeyTOpenHashMap/ObjectKeyTChainedHashMap
as shown in the UML class diagram below:
To provide a bridge between PCJ based and JCF based APIs
there are adapters for all of the main interfaces:
collections, iterators, sets, lists, list iterators, and
maps. They can be found in the package
bak.pcj.adapter.
Alternatively, use the class
bak.pcj.Adapter,
which provides factory methods for all adapter classes. The table below
lists all adapter classes.
| Class or interface | Description | Availability | |||||||
|---|---|---|---|---|---|---|---|---|---|
boolean |
char |
byte |
short |
int |
long |
float |
double |
||
TCollectionToCollectionAdapter |
Adapters from PCJ collections to JCF collections | × | × | × | × | × | × | × | × |
TIteratorToIteratorAdapter |
Adapters from PCJ iterators to JCF iterators | × | × | × | × | × | × | × | × |
TListToListAdapter |
Adapters from PCJ lists to JCF lists | × | × | × | × | × | × | × | × |
TListIteratorToListIteratorAdapter |
Adapters from PCJ list iterators to JCF list iterators | × | × | × | × | × | × | × | × |
TSetToSetAdapter |
Adapters from PCJ sets to JCF sets | × | × | × | × | × | × | × | × |
TSortedSetToSortedSetAdapter |
Adapters from PCJ sorted sets to JCF sorted sets | × | × | × | × | × | × | × | × |
TKeyMapToMapAdapter |
Adapters from PCJ maps to JCF maps | × | × | × | × | × | × | × | × |
ObjectKeyTMapToMapAdapter |
Adapters from PCJ maps to JCF maps | × | × | × | × | × | × | × | × |
TKeySMapToMapAdapter |
Adapters from PCJ maps to JCF maps |
All 64 combinations, e.g.
IntKeyCharMapToMapAdapter
|
|||||||
CollectionToTCollectionAdapter |
Adapters from JCF collections to PCJ collections | × | × | × | × | × | × | × | × |
IteratorToTIteratorAdapter |
Adapters from JCF iterators to PCJ iterators | × | × | × | × | × | × | × | × |
ListToTListAdapter |
Adapters from JCF lists to PCJ lists | × | × | × | × | × | × | × | × |
ListIteratorToTListIteratorAdapter |
Adapters from JCF list iterators to PCJ list iterators | × | × | × | × | × | × | × | × |
SetToTSetAdapter |
Adapters from JCF sets to PCJ sets | × | × | × | × | × | × | × | × |
SortedSetToTSortedSetAdapter |
Adapters from JCF sorted sets to PCJ sorted sets | × | × | × | × | × | × | × | × |
MapToTKeyMapAdapter |
Adapters from JCF maps to PCJ maps | × | × | × | × | × | × | × | × |
MapToObjectKeyTMapAdapter |
Adapters from JCF maps to PCJ maps | × | × | × | × | × | × | × | × |
MapToTKeySMapAdapter |
Adapters from JCF maps to PCJ maps |
All 64 combinations, e.g.
MapToIntKeyCharMapAdapter
|
|||||||
Many of the collection and map classes are table based. Elements and mappings are stored in an underlying table (a Java array). The tables underlying table based collections have to grow as elements are added and possibly shrink as they are removed. That means allocating a new longer table and copy the existing elements into that table. It is easy to see that tables should not be expanded every time an element is added because that would result in very poor performance for additions. Instead the tables are expanded to a length well over the currently needed length. We call that length the capacity of the collection. Increasing the capacity of a collection is an expensive operation because of the copying involved (and re-hashing for hashtable-based maps), so the performance of add operations depend very much on how often the capacity is increased.
The table based collections and maps in PCJ supports two strategies for increasing the capacity:
Each strategy can be specified by selection of constructor. For example, we can create an int array list that grows with a fixed number of elements with the following code:
IntList list = new IntArrayList(100, 10);
The first argument in the constructor is the initial capacity of the list. The second is the number of elements with which to increase the capacity as elements are added. So in this case the first 100 elements are added without any capacity increase. When the 101st element is added the capacity is increased to 110, when the 111th element is added the capacity is increased to 120, and so on. Similarly, to create a list that grows with 20% size for each capacity increase, we would write:
IntList list = new IntArrayList(100, 0.2);
When the 101st element is added, the capacity is increased to 1.2×100 = 120. When the 121st element is added, the capacity is increased to 1.2×120 = 144, and so forth.
The strategy that should be employed depends on the situation. If you know how many elements are going to be in a collection, it is generally a good idea to give the collection that capacity from the beginning so as to avoid capacity increases. Very often we cannot know the needed capacity, and in those cases we should think about how collections grow. A rule of thumb is that if a collection rarely grows or grows slowly its capacity should increase by a fixed amount. In that way we avoid wasting too much memory on an exponential growth rate. On the other hand, if a collection grows fast its capacity should grow exponentially, because that is the best way to minimize the number of capacity increases. However, with a growth factor of 1.0 (doubling the capacity) we potentially allocate approximately twice as much memory as needed.
A number of the classes of PCJ are based on hash tables. As explained below, there are two main hashing variants in use: open addressing with double hashing and chaining.
The open hash collection and map variants (e.g.
TOpenHashSet
and
TKeySOpenHashMap)
are based on a hash table using open addresing with double hashing.
Double hashing is a method of resolving collisions in hash tables
when more than one key hash to the same address in the table.
The secondary hash function used is always:
Primes
is used for selecting prime numbers.
The chained hash table variants (e.g.
TChainedHashSet
and
TKeySChainedHashMap)
are based on hash tables with linked lists. Instead of
searching for an empty slot in the hash table, each slot
contains a linked list (a chain) of those entries whose keys hash to
that slot. In that way collision resolution is completely
avoided, but we have to search linked lists instead.
Both the OpenHash and ChainedHash variants
begin to perform bad when many keys have been hashed to the same slot
in the underlying hash table. To prevent that, hash tables are equipped
with a maximum load factor, which is some proportion of the its size.
When that proportion of the hash table is used, its length is increased
(see the discussion on growth above).
The default load factor is 0.75 for both variants, and a custom load
factor may be specified with the constructors.
The plots below shows how the load of a hash based set affects performance. For open addressing the time to look up a key increases steadily until it suddenly explodes around 0.7. For chained hashing the mean time to look up a key will increase steadily as the lists grow longer. The plots were created for hash sets, but are very similar for hash maps.
(The horizontal lines on the plots are due to the resolution of the timer on my computer, which is quite clearly somewhere around 10ms.)
If nothing else is specified with the constructors for the hash table based
collections and maps, the default hash functions of
bak.pcj.hash
are used. It is possible that the default functions give poor performance
results for specific data sets, and in such cases a new hash function may be specified.
Assume that the hash table length M is 17 and the default hash function h(n) = n
(which it is for int values). If we add 17 it goes into slot 0 since 17 mod 17 = 0.
If we add 34 it also goes into slot 0 since 34 mod 17 = 0, thus activating the use of the secondary
hash function in the case of open addressing with double hashing. With chaining we get longer lists.
This example could benefit from an alternative hash function, e.g.
h(n) = (n div 17) + n. That would make n = 17 go into
slot ((17 div 17) + 17) mod 17 = 18 mod 17 = 1, and similarly 34 would go into slot
((34 div 17) + 17) mod 17 = 19 mod 17 = 2, thus eliminating the use of the secondary hash
function (or chains).
The example below shows how to pass a new hash function to a hash set:
import bak.pcj.set.IntSet;
import bak.pcj.set.IntOpenHashSet;
import bak.pcj.hash.IntHashFunction;
...
IntSet s = new IntOpenHashSet(
new IntHashFunction() {
public int hash(int v) {
return v/17 + v;
}
}
);
In this section the interfaces and classes are described in more detail and examples are given. The design is also discussed.
TCollection interfacesThe TCollection interfaces define the basic collection
behaviour. It is adapted from the JCF
Collection
interface. It provides the basic methods for modifying and querying collections.
Although TCollection is the general ancestor
of all collection classes, each method may have a different
semantics depending on the collection type. For example, the
method add(T) adds an element to the end of a
TList whereas it ensures that a
TSet contains an element.
If you only use PCJ and do not extend it, you
will only use TCollection as a polymorphic type
for collections.
See TSet for examples.
The two interfaces are identical in signature.
| Type | Interface |
|---|---|
boolean |
bak.pcj.BooleanCollection |
char |
bak.pcj.CharCollection |
byte |
bak.pcj.ByteCollection |
short |
bak.pcj.ShortCollection |
int |
bak.pcj.IntCollection |
long |
bak.pcj.LongCollection |
float |
bak.pcj.FloatCollection |
double |
bak.pcj.DoubleCollection |
TList interfaces
The TList interfaces define an extension to the
TCollection
interfaces for sequential lists. As TCollection
this is basically an adpation of the JCF
List
interface.
Example:
import bak.pcj.list.IntList;
import bak.pcj.list.IntArrayList;
import bak.pcj.list.IntListIterator;
public class ListExample {
public static void main(String[] args) {
IntList s = new IntArrayList();
// Adding elements to the end
s.add(1); // s=[1]
s.add(2); // s=[1,2]
// Inserting elements at positions
s.add(2, 3); // s=[1,2,3]
s.add(0, 0); // s=[0,1,2,3]
s.add(1, 10); // s=[0,10,1,2,3]
// Removing elements at positions
s.removeElementAt(1); // s=[0,1,2,3]
// Removing elements by value
s.remove(2); // s=[0,1,3]
// Updating elements
s.set(2, 2); // s=[0,1,2]
s.set(0, -1); // s=[-1,1,2]
// Searching for elements
int v;
v = s.indexOf(1); // v=1
v = s.indexOf(2); // v=2
v = s.indexOf(10); // v=-1
// Adding collections at positions
IntArrayList t = new IntArrayList();
t.add(10); // t=[10]
t.add(20); // t=[10,20]
t.add(30); // t=[10,20,30]
s.addAll(1, t); // s=[-1,10,20,30,1,2]
}
}
Other examples:
ListExample.java
Reverse.java
Sorting.java
| Type | Interface |
|---|---|
boolean |
bak.pcj.list.BooleanList |
char |
bak.pcj.list.CharList |
byte |
bak.pcj.list.ByteList |
short |
bak.pcj.list.ShortList |
int |
bak.pcj.list.IntList |
long |
bak.pcj.list.LongList |
float |
bak.pcj.list.FloatList |
double |
bak.pcj.list.DoubleList |
The TList interfaces do not provide methods
to retrieve a sub-list of a list. It is relatively complex
to implement sub-list views, so this feature (which is
implemented in JCF lists) is omitted for the time being.
A deviation from JCF in method names is that of removeElementAt().
JCF lists has a remove(int) and remove(Object) method
for removing elements by index and by equality, respectively. Since the objects
of PCJ lists can be of type int, there would be a clash between
the two method signatures if the JCF naming should be used. Therefore PCJ lists
use removeElementAt(int) to remove elements by index.
TStack interfaces
The TStack interfaces extend the
TList
interfaces with the
typical stack operations: push(), pop(), and
peek(). Stacks typically only grow and shrink at the end.
Note that all TLists in principle can be used as
stacks. We can easily replace pop() with
removeElementAt(size()-1) and push(v) with
add(v).
The stack interfaces does not guarantee that stack operations are more
efficient than their list/collection counterparts, but it makes the
intention clear.
Example:
import bak.pcj.list.IntStack;
import bak.pcj.list.IntArrayStack;
public class StackExample {
public static void main(String[] args) {
IntStack s = new IntArrayStack();
// Pushing elements
s.push(1); // s=[1]
s.push(2); // s=[1,2,3]
s.push(3); // s=[1]
// Inspecting
int v;
v = s.peek(); // v=3, s=[1,2,3]
// Popping elements
v = s.pop(); // v=3, s=[1,2]
v = s.pop(); // v=0, s=[1]
}
}
Other examples:
StackExample.java
Calculator.java
| Type | Interface |
|---|---|
boolean |
bak.pcj.list.BooleanStack |
char |
bak.pcj.list.CharStack |
byte |
bak.pcj.list.ByteStack |
short |
bak.pcj.list.ShortStack |
int |
bak.pcj.list.IntStack |
long |
bak.pcj.list.LongStack |
float |
bak.pcj.list.FloatStack |
double |
bak.pcj.list.DoubleStack |
It may be questioned whether a stack is a special kind of list. Interfaces are often used for specifying the minimum capabilities of an object, e.g. an argument to a method. In those cases a stack ought to be just that: a stack, not a list, because we probably only need to push, pop, and peek the object.
On the other hand, stacks are sequential information structures.
It should be possible to iterate over the elements. It may also
be argued that they at least should extend the TCollection
interface to be interesting in a collections API. And extending
the TCollection interface they might as well extend the
TList interface.
In JCF stacks are merely special implementations of
lists. That leaves us with a problem when an operations
needs a stack to work efficiently: We can either 1) make the
argument a Stack which is a special implementation
of List or we can 2) make the argument a List
and hope that the caller passes an object that has fast
addition and removal from the end of the list.
// Solution 1:
void doSomething(Stack s) {
// Perform stack operations on s here using push(Object) and pop()
...
}
...
Stack s = new Stack();
doSomething(s);
// Solution 2:
void doSomething(List s) {
// Perform stack operations on s using add(Object) and remove(int)
...
}
...
List s = new Stack();
doSomething(s);
Solution 1 above is not acceptable, because the
doSomething() method does not care about the
implementation of the stack argument. It puts a restriction
on the caller that is not needed. Solution 2 is not
acceptable because any list implementation can be passed to
doSomething() may render it inefficient.
I think that a better way is to make Stack
an interface and to provide implementations with fast
addition and removal at the end of lists.
// Solution with Stack as an interface:
void doSomething(Stack s) {
// Perform stack operations on s here using push(Object) and pop()
...
}
...
Stack s = new MyStackImplementation();
doSomething(s);
That is why TStack is an interface in PCJ.
TDeque interfaces
The TDeque interfaces extend the
TList
interfaces with the typical
typical operations of a deque:
addFirst(), addLast(),
getFirst(), getLast(),
removeFirst(), and removeLast().
Deques typically only grow and shrink at the ends.
Note that all TLists in principle can be used as
stacks because of the positional access to elements.
We can easily replace getFirst() with get(0),
removeLast() with removeElementAt(size()-1), and so forth.
The deque interfaces does not guarantee that deque operations are more
efficient than their list/collection counterparts, but it makes the
intention clear.
Example:
import bak.pcj.IntIterator;
import bak.pcj.list.IntDeque;
import bak.pcj.list.IntArrayDeque;
public class DequeExample {
public static void main(String[] args) {
IntDeque q = new IntArrayDeque();
// Adding elements
q.addFirst(1); // q=[1]
q.addFirst(2); // q=[2,1]
q.addFirst(3); // q=[3,2,1]
q.addLast(0); // q=[3,2,1,0]
// Retrieving elements
int v;
v = q.getFirst(); // v=3, q=[3,2,1,0]
v = q.getLast(); // v=0, q=[3,2,1,0]
// Removing elements
v = q.removeFirst(); // v=3, q=[2,1,0]
v = q.removeLast(); // v=0, q=[2,1]
}
}
Other examples:
DequeExample.java
DequeExample1.java
| Type | Interface |
|---|---|
boolean |
bak.pcj.list.BooleanDeque |
char |
bak.pcj.list.CharDeque |
byte |
bak.pcj.list.ByteDeque |
short |
bak.pcj.list.ShortDeque |
int |
bak.pcj.list.IntDeque |
long |
bak.pcj.list.LongDeque |
float |
bak.pcj.list.FloatDeque |
double |
bak.pcj.list.DoubleDeque |
TStack
also applies to TDeque.
TArrayList classes
The TArrayList classes provide array based implementations
of the
TList interfaces. The data in the list
is represented as a T[] that grows with the size of the list.
See TList for examples.
| Type | Class |
|---|---|
boolean |
bak.pcj.list.BooleanArrayList |
char |
bak.pcj.list.CharArrayList |
byte |
bak.pcj.list.ByteArrayList |
short |
bak.pcj.list.ShortArrayList |
int |
bak.pcj.list.IntArrayList |
long |
bak.pcj.list.LongArrayList |
float |
bak.pcj.list.FloatArrayList |
double |
bak.pcj.list.DoubleArrayList |
Currently PCJ provides no alternatives to TArrayList other
than TArrayStack and
TArrayDeque
which are specialized lists - they should not be used as replacements
for lists.
TArrayLists perform best when they grow or shrink at the end
- like stacks.
Since the underlying data structure of a TArrayList is an array,
removal and insertion (other than at the end of the list) is not efficient
for large list sizes.
IntArrayList s = new IntArrayList();
for (int i = 0; i < 1000000; i++)
s.add(i);
s.removeElementAt(0); This is not efficient!!!
s.add(0, 100); This is not efficient!!!
If you have to modify a long list at its beginning, use a
TDeque instead of a list.
TArrayStack classes
TArrayStack is an array based implementation of TStack.
It is implemented as a simple extension to
TArrayList.
See TStack for examples.
| Type | Class |
|---|---|
boolean |
bak.pcj.list.BooleanArrayStack |
char |
bak.pcj.list.CharArrayStack |
byte |
bak.pcj.list.ByteArrayStack |
short |
bak.pcj.list.ShortArrayStack |
int |
bak.pcj.list.IntArrayStack |
long |
bak.pcj.list.LongArrayStack |
float |
bak.pcj.list.FloatArrayStack |
double |
bak.pcj.list.DoubleArrayStack |
TArrayDeque classes
The TArrayDeque classes provide array based implementations
of TDeque. Internally the elements
are stored in a T[] the same size of or longer than the
current number of elements.
| 0 | 1 | 2 | 3 | 4 | 5 |
| 1 | 2 | 3 | - | - | - |
| ^ | ^ |
We operate with a pointer to the first element in the deque and a pointer to the last element (or 1 beyond the last). Insertions performed performed at the end moves the last-pointer forward, and insertions performed at the beginning moves the first-pointer backward. Likewise, removals from the end moves the last-pointer backward, and removals from the beginning move the first-pointer forward. Since the deque is restricted by a capacity all pointer movement is wrapped around the ends of the array (indices are added modulo the capacity). See the examples below:
addLast(4):
| 0 | 1 | 2 | 3 | 4 | 5 |
| 1 | 2 | 3 | 4 | - | - |
| ^ | ^ |
removeFirst():
| 0 | 1 | 2 | 3 | 4 | 5 |
| - | 2 | 3 | 4 | - | - |
| ^ | ^ |
addLast(5):
| 0 | 1 | 2 | 3 | 4 | 5 |
| - | 2 | 3 | 4 | 5 | - |
| ^ | ^ |
removeFirst():
| 0 | 1 | 2 | 3 | 4 | 5 |
| - | - | 3 | 4 | 5 | - |
| ^ | ^ |
addLast(6):
| 0 | 1 | 2 | 3 | 4 | 5 |
| - | - | 3 | 4 | 5 | 6 |
| ^ | ^ |
addLast(7):
| 0 | 1 | 2 | 3 | 4 | 5 |
| 7 | - | 3 | 4 | 5 | 6 |
| ^ | ^ |
Examples: see TDeque for examples.
| Type | Class |
|---|---|
boolean |
bak.pcj.list.BooleanArrayDeque |
char |
bak.pcj.list.CharArrayDeque |
byte |
bak.pcj.list.ByteArrayDeque |
short |
bak.pcj.list.ShortArrayDeque |
int |
bak.pcj.list.IntArrayDeque |
long |
bak.pcj.list.LongArrayDeque |
float |
bak.pcj.list.FloatArrayDeque |
double |
bak.pcj.list.DoubleArrayDeque |
TSet interfaces
The TSet interfaces define the set
behaviour for primitive collections. It is adapted from the JCF
Set
interface and exists for all primitive types. In PCJ TSet
is only used to tag collections - it does not extend the collection
interface with new methods.
Examples:
import bak.pcj.set.IntSet;
import bak.pcj.set.IntOpenHashSet;
import bak.pcj.IntIterator;
public class SetExample {
public static void main(String[] args) {
IntSet s = new IntOpenHashSet();
// Adding elements
s.add(1); // s={1}
s.add(2); // s={1,2}
s.add(3); // s={1,2,3}
s.add(4); // s={1,2,3,4}
s.add(5); // s={1,2,3,4,5}
s.add(2); // s={1,2,3,4,5} (not changed)
s.add(4); // s={1,2,3,4,5} (not changed)
// Removing elements
s.remove(2); // s={1,3,4,5}
s.remove(5); // s={1,3,4}
// Clearing
s.clear(); // s={}
boolean c;
c = s.isEmpty(); // c=true
// Testing containment of elements
s.add(1);
s.add(2);
s.add(3); // s={1,2,3}
c = s.contains(1); // c=true, s={1,2,3}
c = s.contains(3); // c=true, s={1,2,3}
c = s.contains(5); // c=false, s={1,2,3}
// Testing containment of collections
IntSet t;
t = new IntOpenHashSet();
t.add(1); // t={1}, s={1,2,3}
c = s.containsAll(t); // c=true
t.add(0); // t={0,1}, s={1,2,3}
c = s.containsAll(t); // c=false
// Adding collections
s.addAll(t); // t={0,1}, s={0,1,2,3}
// Retaining elements of collections
t = new IntOpenHashSet();
t.add(2);
t.add(3);
t.add(4);
t.add(5); // t={2,3,4,5}, s={0,1,2,3}
s.retainAll(t); // t={2,3,4,5}, s={2,3}
// Removing whole collections
t = new IntOpenHashSet();
t.add(3);
t.add(4);
s.removeAll(t); // t={3,4}, s={2}
// Finding the size
int v;
v = s.size(); // v=1, s={2}
s.add(1);
v = s.size(); // v=2, s={1,2}
// Converting to an array
int[] a = s.toArray(); // a=int[]{1,2}, s={1,2}
// Iterating over elements
for (int i = 0; i < 10; i++)
s.add(i); // s={0,1,2,3,4,5,6,7,8,9}
IntIterator i = s.iterator();
while (i.hasNext()) {
int e = i.next();
if ((e % 2) == 1)
i.remove();
} // s={0,2,4,6,8}
// Comparing
t = new IntOpenHashSet(s); // s={0,2,4,6,8}, t={0,2,4,6,8}
c = s.equals(t); // c=true
t.remove(2); // s={0,2,4,6,8}, t={0,4,6,8}
c = s.equals(t); // c=false
}
}
Other examples:
| Type | Interface |
|---|---|
boolean |
bak.pcj.set.BooleanSet |
char |
bak.pcj.set.CharSet |
byte |
bak.pcj.set.ByteSet |
short |
bak.pcj.set.ShortSet |
int |
bak.pcj.set.IntSet |
long |
bak.pcj.set.LongSet |
float |
bak.pcj.set.FloatSet |
double |
bak.pcj.set.DoubleSet |
TSortedSet interfaces
The TSortedSet interfaces define the sorted set
behaviour for primitive collections. It extends
TSet with some sub-set operations, retrieval of first
and last elements, and a contract to iterate in the natural order of the
elements. It is adapted from the JCF
SortedSet
interface and exists for all primitive types.
| Type | Interface |
|---|---|
boolean |
bak.pcj.set.BooleanSortedSet |
char |
bak.pcj.set.CharSortedSet |
byte |
bak.pcj.set.ByteSortedSet |
short |
bak.pcj.set.ShortSortedSet |
int |
bak.pcj.set.IntSortedSet |
long |
bak.pcj.set.LongSortedSet |
float |
bak.pcj.set.FloatSortedSet |
double |
bak.pcj.set.DoubleSortedSet |
TOpenHashSet classes
TOpenHashSet provide a hash table based implementation of
the TSet interface with
open addressing.
See TSet for examples.
| Type | Class |
|---|---|
boolean |
n/a |
char |
bak.pcj.set.CharOpenHashSet |
byte |
bak.pcj.set.ByteOpenHashSet |
short |
bak.pcj.set.ShortOpenHashSet |
int |
bak.pcj.set.IntOpenHashSet |
long |
bak.pcj.set.LongOpenHashSet |
float |
bak.pcj.set.FloatOpenHashSet |
double |
bak.pcj.set.DoubleOpenHashSet |
TChainedHashSet classes
TOpenHashSet provide a hash table based implementation of
the TSet interface with
chained hashing.
See TSet for examples.
| Type | Class |
|---|---|
boolean |
n/a |
char |
bak.pcj.set.CharChainedHashSet |
byte |
bak.pcj.set.ByteChainedHashSet |
short |
bak.pcj.set.ShortChainedHashSet |
int |
bak.pcj.set.IntChainedHashSet |
long |
bak.pcj.set.LongChainedHashSet |
float |
bak.pcj.set.FloatChainedHashSet |
double |
bak.pcj.set.DoubleChainedHashSet |
TRangeSet classes
The TRangeSet classes provide range based implementations of
TSortedSet.
The elements of a TRangeSets are represented by a number of
ranges of consecutive values. A TRangeSet keeps track of the lower
and upper bounds of the ranges. Internally a list of ranges are kept in sorted
order for fast lookup (O(log(n), where n is the number of
ranges).
See TSet for examples.
TRangeSet provides a number of specialized methods for
adding and searching for elements. See the
API documentation
for details.
| Type | Class |
|---|---|
boolean |
n/a |
char |
bak.pcj.set.CharRangeSet |
byte |
bak.pcj.set.ByteRangeSet |
short |
bak.pcj.set.ShortRangeSet |
int |
bak.pcj.set.IntRangeSet |
long |
bak.pcj.set.LongRangeSet |
float |
n/a |
double |
n/a |
TRangeSet is only provided for byte, short,
int, and long. They are the only types for which
it makes sense to speak about ranges of consecutive values. It is also
not quite clear that the
LongRangeSet
class should exist. With ranges of long values we can easily construct a
set that has more than Integer.MAX_VALUE elements:
import bak.pcj.set.LongRangeSet;
...
LongRangeSet s = new LongRangeSet();
s.add(0L, Integer.MAX_VALUE+1L);
int size = s.size();
// What is the value of size at this point???
This problem is unresolved. I think that use of the LongRangeSet
will be deprecated some time in the future.
TRangeSet objects use very little memory when the
number of ranges is small. It does not depend on the number of
elements, only the number of ranges.
A typical use of the TRangeSet classes is to
get rid of hard-coded range checks, as in:
if ((c >= 'a' && c <= 'z') ||
(c >= 'A' && c <= 'Z') ||
(c >= '0' && c <= '9')) {
...
}
The above code is much easier to read and maintain (and for enough ranges much more efficient) if it is rewritten to a question of set containment:
static CharRangeSet ALPHANUMERIC = new CharRangeSet(
new CharRange('a', 'z'),
new CharRange('A', 'Z'),
new CharRange('0', '9'),
);
...
if (ALPHANUMERIC.contains(c)) {
...
}
As the benchmarks show, TRangeSet should not be used
when additions or removals can split the ranges too much. Random
elements gives very bad performance, and when elements cannot
be packed into ranges, there is no advantage in memory use.
TBitSet classes
The TBitSet classes provide a bit array based implementation
of TSortedSet.
A TBitSet is represented by an array of bits. When a
bit is set it means that the element with the same value as the position
of that bit is an element of the set. For example,
the least five elements of the set {1,5,7,8,9,...} are represented by
the bit array shown below:
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... |
| 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 1 | ... |
See TSet for examples.
| Type | Class |
|---|---|
boolean |
n/a |
char |
bak.pcj.set.CharBitSet |
byte |
bak.pcj.set.ByteBitSet |
short |
bak.pcj.set.ShortBitSet |
int |
bak.pcj.set.IntBitSet |
long |
n/a |
float |
n/a |
double |
n/a |
TBitSet is provided for byte, char, short, and int.
TBitSet depends on the set type having a meaningful enumeration.
This is not the case for boolean, double, and float.
TBitSet is not provided for long values since the underlying
arrays would have to be too large.
TBitSets generally perform good for small set elements and
dense sets (elements that are close to each other).
Iteration over TBitSets is not particularly efficient -
especially not when the set elements are sparse. The reason for this is
that the object must search through an array of bits for the next set
bit before it can return the next value of an iteration.
The memory used by TBitSets is proportional not to its number
of elements but to the value of its largest element. Don't use bit sets
with large values if memory is a problem.
When elements are added in ascending order, it requires many capacity increases. If the largest element is known it should be added first. Alternatively the initial capacity can be set to a number corresponding to the largest element via a constructor.
TBitSets cannot be used with negative elements.
TKeySMap interfaces
The TKeySMap interfaces define maps from T keys
to S values.
The primitive map interfaces are probably the interfaces that deviate the most
from their JCF counterparts. The
Map.Entry
has been omitted because it restricts
the implementations. Unless one wants to represent a map by a
Map.Entry instance for each entry of the map, one is bound to
create these extra instances when returning the entry set (see
Map.entrySet()). It is very useful to be able
to access all entries of a map, but it does not have to be through a set. PCJ
introduces a special iterator called a TKeySMapIterator (see
below).
It has the
same characteristics as a normal iterator, but is capable of