Contents:

TCollection interfaces
TList interfaces
TStack interfaces
TDeque interfaces
TArrayList classes
TArrayStack classes
TArrayDeque classes
TSet interfaces
TSortedSet interfaces
TOpenHashSet classes
TChainedHashSet classes
TRangeSet classes
TBitSet classes
TKeySMap interfaces
TKeySOpenHashMap classes
TKeySChainedHashMap classes
TKeyMap interfaces
TKeyOpenHashMap classes
TKeyChainedHashMap classes
ObjectKeyTMap interfaces
ObjectKeyTOpenHashMap classes
ObjectKeyTChainedHashMap classes
Primitive Collections for Java (PCJ) started with the need for a set data structure for character values. The set was to be used for configuring a phonetic encoder class. The encoder would be given a text in some language, and for each character in the text, it would determine whether or not the character was to be encoded. I started using the HashSet class of the Java Collections Framework (JCF). Somewhere in the encoder the code was something like this:
public class PhoneticEncoder {
    java.util.Set acceptedChars;

    public PhoneticEncoder(java.util.Set ac) {
        acceptedChars = ac;
    }

    String encode(CharSequence s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (acceptedChars.contains(new Character(c))) {
                ...
            }
        }
    }
}
// And somewhere outside the encoder...
java.util.Set s = new java.util.HashSet();
for (char c = 'a'; c <= 'z'; c++)
    s.add(new Character(c));
for (char c = 'A'; c <= 'Z'; c++)
    s.add(new Character(c));
for (char c = '0'; c <= '9'; c++)
    s.add(new Character(c));
PhoneticEncoder enc = new PhoneticEncoder(s);
I usually don't like representing primitive values as objects, especially not when a simple comparison requires my program to create a new instance (unless I want to represent texts as sequences of Character objects). Taking JVM optimization into account, the overhead may be an illusion, but it does not seem efficient, and thinking about the data in my programs as wrapped values seems counter-intuitive. How I would like to think about my program is this:
public class PhoneticEncoder {
    CharSet acceptedChars;

    public PhoneticEncoder(CharSet ac) {
        acceptedChars = ac;
    }

    String encode(CharSequence s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (acceptedChars.contains(c)) {
                ...
            }
        }
    }
}
// And somewhere outside the encoder...
CharRangeSet s = new CharRangeSet();
s.add(new CharRange('a', 'z'));
s.add(new CharRange('A', 'Z'));
s.add(new CharRange('0', '9'));
PhoneticEncoder enc = new PhoneticEncoder(s);
So I began investigating other possibilities. Keeping memory usage low was quite important for this particular application. First, I thought of using a BitSet. I believed that if I subtracted (int)'A' from each character value to be stored, I could do with a hundred bits or so. A little research into the Unicode standard showed that this particular approach had to be abandoned, because it would mean a memory usage of about 0x10000/8 = 8192 bytes for representing a set of perhaps a few hundred characters (some of the Unicode characters have pretty high numbers). Even worse: most of the characters fell into long ranges of consecutive values, so there was actually no reason for setting a bit for each and every character.
Instead I decided to represent a set of characters as a sorted list of ranges, storing only the lower and upper bounds of each range. This list could be searched very quickly because it would rarely have more than 15 ranges, and testing a character for containment in a range would only require comparing it to the lower and upper bounds. Using binary search on 15 ranges, it would in the worst case require 2×log2(16) = 2×4 = 8 character comparisons to find a character in the set. That was acceptable for my purposes. What was not acceptable was that changing the representation of character sets meant that the API of the phonetic encoder would have to depend upon a very specialized, binary searching, range based character set class, which in turn had to be packaged with the encoder. Having done this trick for the (n+1)'th time and having noted for the n'th time the stupidity of doing it, I decided to make a generalized framework for primitive types. The first n-1 times I had resisted the temptation of creating yet another standard on which all other APIs should depend. This time I gave in, and PCJ is the result.
Søren Bak
When you examine the class hierarchy and the methods of PCJ, you will find that the interfaces and abstract classes (not the inner workings of the implementing classes) bear a close resemblance to those of the Java Collections Framework. This is intentional. I have designed PCJ with the following in mind: programmers are used to JCF, and a close resemblance will make PCJ faster to learn; and the complexity of adapters between JCF and PCJ should be kept low - there are plenty of other opportunities to introduce bugs. Besides that, I think that Joshua Bloch's JCF has a good interface design - worth basing other collections libraries on.
You will find some rather pathological classes in PCJ.
For example, there is a set class for boolean
values called BooleanDirectSet
. The value
of such a class is questionable and I felt quite silly
coding it (take a look at the source code, it seems like a
lot of work to represent the four possible states: {}, {F},
{T}, and {F,T}). Another example of such a class is the hash
map from float
keys to short
values (that one is generated). However, I think that a
primitive collections API should provide a complete set of
classes instead of just those that the author finds useful
at the time. It should be completely symmetric over all
primitive types.
TRangeSet classes, which are more efficient in memory usage
CharSet as an argument to a method

The focus of v1.2 was to implement sorted set operations. Changes:

Adapters between JCF SortedSet and the primitive sorted sets
TKeyOpenHashMap
TKeyChainedHashMap

This was a bugfix release: TArrayDeque.getLast() raised an IndexOutOfBoundsException under certain circumstances with singleton deques.

The focus of v1.1 was to complete a branch of the map interface hierarchy and make the API interact better with client Java code.

Changes:
The classes and interfaces of PCJ have been built symmetrically around the primitive types: boolean, char, byte, short, int, long, float, and double.
When I refer to a class, it can be either a type independent reference or a type specific reference. For type independent references, T and S are used to stand for primitive types. When I write TCollection, it is therefore a general (type independent) reference to the classes BooleanCollection, CharCollection, ByteCollection, ShortCollection, IntCollection, LongCollection, FloatCollection, and DoubleCollection.
When it for some reason makes no sense to provide a class for some primitive type, the symmetry is allowed to fail. This is only the case for classes - not for the general interfaces. Such cases are generally noted throughout this guide.
bak.pcj.* packages

bak.pcj
Provides general primitive collection classes. All other classes in the API depend on those classes.
bak.pcj.adapter
Provides adapter classes between PCJ and JCF. The adapters provide a convenient interface between PCJ-based and JCF-based classes.
bak.pcj.benchmark
Provides classes for benchmarking PCJ and parts of JCF. When new classes are implemented they can be compared to other implementations using standard tasks and data sets.
bak.pcj.hash
Provides classes for creating hash functions. Besides providing default hash functions the package also provides interfaces that can be implemented to change the hash functions of the hashing collection classes.
bak.pcj.list
Provides list collection interfaces and classes. Lists extend the general collection interfaces with sequential access to collection elements.
bak.pcj.map
Provides map interfaces and classes. The implementations are currently based on hashing.
bak.pcj.set
Provides set interfaces and classes. Sets extend the general collection interfaces with set behavior. There are set classes providing different storage strategies for different situations.
The core of PCJ is a small set of interrelated interfaces. All collection classes derive in some way or other from these interfaces. They are shown in the UML class diagram below:
The main interface is TCollection. It provides basic collection behaviour like adding and removing elements. The basic interfaces can be found in the package bak.pcj.

There are two major branches of sub-interfaces. One is the TSet branch, which provides set behaviour for collections; all set classes can be found in the bak.pcj.set package. The other is the TList branch, which provides list behaviour; all list classes can be found in the bak.pcj.list package.
There are a number of alternative implementations of these interfaces. Each implementation is useful in specific situations, and it requires a little knowledge about the classes to choose the right one. The classes implementing the interface hierarchy are shown in the UML class diagram below without method details.
Note that each of the main interfaces has an abstract class directly below it in the hierarchy. Unless you are going to implement new collection classes, the abstract classes can safely be ignored; the implemented interfaces are important, not the super-classes. If you are going to implement new collection classes, the abstract classes provide base implementations of each of the main interfaces.
As mentioned earlier, T does not range over all of {boolean, char, byte, short, int, long, float, double} for every class. The table below shows the type independent classes and their implementations for each of the primitive types. You should examine the table and the corresponding APIs to get an idea of the class hierarchy.
Class or interface | Description | Availability | |||||||
---|---|---|---|---|---|---|---|---|---|
boolean |
char |
byte |
short |
int |
long |
float |
double |
||
TCollection |
General collection interface | × | × | × | × | × | × | × | × |
TSet |
General set interface | × | × | × | × | × | × | × | × |
TSortedSet |
Sorted set interface | × | × | × | × | × | × | × | × |
TOpenHashSet |
Hash set implementation using open addressing | × | × | × | × | × | × | × | |
TChainedHashSet |
Hash set implementation using chained addressing | × | × | × | × | × | × | × | |
TRangeSet |
Set class based on value ranges | × | × | × | × | × | |||
TBitSet |
Set class based on bit arrays | × | × | × | × | ||||
TDirectSet |
Boolean set | × | |||||||
TList |
General list interface | × | × | × | × | × | × | × | × |
TStack |
General stack interface | × | × | × | × | × | × | × | × |
TDeque |
General deque interface | × | × | × | × | × | × | × | × |
TArrayList |
Array list class | × | × | × | × | × | × | × | × |
TArrayStack |
Array stack class | × | × | × | × | × | × | × | × |
TArrayDeque |
Array deque class | × | × | × | × | × | × | × | × |
As in JCF, map classes are not collections, but interact closely with collections. All primitive map classes implement either the TKeySMap interface, whose instances map T keys to S values; the TKeyMap interface, whose instances map T keys to java.lang.Object values; or the ObjectKeyTMap interface, whose instances map java.lang.Object keys to T values. The three interfaces are not directly related but share some design and logic. They are shown in the UML class diagram below:
PCJ provides two implementation variants of each of the TKeySMap, TKeyMap, and ObjectKeyTMap interfaces; namely TKeySOpenHashMap/TKeySChainedHashMap, TKeyOpenHashMap/TKeyChainedHashMap, and ObjectKeyTOpenHashMap/ObjectKeyTChainedHashMap, as shown in the UML class diagram below:
To provide a bridge between PCJ based and JCF based APIs, there are adapters for all of the main interfaces: collections, iterators, sets, lists, list iterators, and maps. They can be found in the package bak.pcj.adapter. Alternatively, use the class bak.pcj.Adapter, which provides factory methods for all adapter classes. The table below lists all adapter classes.
Class or interface | Description | Availability | |||||||
---|---|---|---|---|---|---|---|---|---|
boolean |
char |
byte |
short |
int |
long |
float |
double |
||
TCollectionToCollectionAdapter |
Adapters from PCJ collections to JCF collections | × | × | × | × | × | × | × | × |
TIteratorToIteratorAdapter |
Adapters from PCJ iterators to JCF iterators | × | × | × | × | × | × | × | × |
TListToListAdapter |
Adapters from PCJ lists to JCF lists | × | × | × | × | × | × | × | × |
TListIteratorToListIteratorAdapter |
Adapters from PCJ list iterators to JCF list iterators | × | × | × | × | × | × | × | × |
TSetToSetAdapter |
Adapters from PCJ sets to JCF sets | × | × | × | × | × | × | × | × |
TSortedSetToSortedSetAdapter |
Adapters from PCJ sorted sets to JCF sorted sets | × | × | × | × | × | × | × | × |
TKeyMapToMapAdapter |
Adapters from PCJ maps to JCF maps | × | × | × | × | × | × | × | × |
ObjectKeyTMapToMapAdapter |
Adapters from PCJ maps to JCF maps | × | × | × | × | × | × | × | × |
TKeySMapToMapAdapter |
Adapters from PCJ maps to JCF maps |
All 64 combinations, e.g.
IntKeyCharMapToMapAdapter
|
|||||||
CollectionToTCollectionAdapter |
Adapters from JCF collections to PCJ collections | × | × | × | × | × | × | × | × |
IteratorToTIteratorAdapter |
Adapters from JCF iterators to PCJ iterators | × | × | × | × | × | × | × | × |
ListToTListAdapter |
Adapters from JCF lists to PCJ lists | × | × | × | × | × | × | × | × |
ListIteratorToTListIteratorAdapter |
Adapters from JCF list iterators to PCJ list iterators | × | × | × | × | × | × | × | × |
SetToTSetAdapter |
Adapters from JCF sets to PCJ sets | × | × | × | × | × | × | × | × |
SortedSetToTSortedSetAdapter |
Adapters from JCF sorted sets to PCJ sorted sets | × | × | × | × | × | × | × | × |
MapToTKeyMapAdapter |
Adapters from JCF maps to PCJ maps | × | × | × | × | × | × | × | × |
MapToObjectKeyTMapAdapter |
Adapters from JCF maps to PCJ maps | × | × | × | × | × | × | × | × |
MapToTKeySMapAdapter |
Adapters from JCF maps to PCJ maps |
All 64 combinations, e.g.
MapToIntKeyCharMapAdapter
|
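The role of these adapters can be illustrated without PCJ itself. The sketch below uses a stand-in SimpleIntSet (not PCJ's IntSet) and a hypothetical asSet factory method; it shows the essential work an adapter does: unboxing values on the way in and boxing them on the way out.

```java
import java.util.AbstractSet;
import java.util.Arrays;
import java.util.Iterator;
import java.util.Set;

public class AdapterSketch {
    /** Minimal primitive int "set" backed by a sorted array (a stand-in, not PCJ's IntSet). */
    static class SimpleIntSet {
        private final int[] values;
        SimpleIntSet(int... values) {
            this.values = values.clone();
            Arrays.sort(this.values);
        }
        boolean contains(int v) { return Arrays.binarySearch(values, v) >= 0; }
        int size() { return values.length; }
        int[] toArray() { return values.clone(); }
    }

    /** Adapter exposing the primitive set through java.util.Set, boxing as needed. */
    static Set<Integer> asSet(final SimpleIntSet s) {
        return new AbstractSet<Integer>() {
            public int size() { return s.size(); }
            public boolean contains(Object o) {
                // Unbox on the way in.
                return (o instanceof Integer) && s.contains((Integer) o);
            }
            public Iterator<Integer> iterator() {
                final int[] a = s.toArray();
                return new Iterator<Integer>() {
                    int i = 0;
                    public boolean hasNext() { return i < a.length; }
                    public Integer next() { return a[i++]; } // box on the way out
                };
            }
        };
    }

    public static void main(String[] args) {
        Set<Integer> jcf = asSet(new SimpleIntSet(1, 2, 3));
        System.out.println(jcf.contains(2)); // true
        System.out.println(jcf.size());      // 3
    }
}
```

The real adapters additionally support modification through the wrapped instance; this sketch is read-only for brevity.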
Many of the collection and map classes are table based: elements and mappings are stored in an underlying table (a Java array). The tables underlying table based collections have to grow as elements are added and possibly shrink as they are removed. That means allocating a new, longer table and copying the existing elements into it. It is easy to see that tables should not be expanded every time an element is added, because that would result in very poor performance for additions. Instead the tables are expanded to a length well over the currently needed length. We call that length the capacity of the collection. Increasing the capacity of a collection is an expensive operation because of the copying involved (and re-hashing for hash table based maps), so the performance of add operations depends very much on how often the capacity is increased.

The table based collections and maps in PCJ support two strategies for increasing the capacity: growing by a fixed number of elements, and growing by a factor of the current capacity.

The strategy is specified by the choice of constructor. For example, we can create an int array list that grows by a fixed number of elements with the following code:
IntList list = new IntArrayList(100, 10);
The first argument in the constructor is the initial capacity of the list. The second is the number of elements with which to increase the capacity as elements are added. So in this case the first 100 elements are added without any capacity increase. When the 101st element is added the capacity is increased to 110, when the 111th element is added the capacity is increased to 120, and so on. Similarly, to create a list that grows with 20% size for each capacity increase, we would write:
IntList list = new IntArrayList(100, 0.2);
When the 101st element is added, the capacity is increased to 1.2×100 = 120. When the 121st element is added, the capacity is increased to 1.2×120 = 144, and so forth.
The strategy that should be employed depends on the situation. If you know how many elements are going to be in a collection, it is generally a good idea to give the collection that capacity from the beginning so as to avoid capacity increases. Very often we cannot know the needed capacity, and in those cases we should think about how collections grow. A rule of thumb is that if a collection rarely grows or grows slowly its capacity should increase by a fixed amount. In that way we avoid wasting too much memory on an exponential growth rate. On the other hand, if a collection grows fast its capacity should grow exponentially, because that is the best way to minimize the number of capacity increases. However, with a growth factor of 1.0 (doubling the capacity) we potentially allocate approximately twice as much memory as needed.
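The two policies can be summarized as a small capacity calculation. The sketch below is illustrative only - it is not PCJ's actual growth code:

```java
public class GrowthDemo {
    /** Next capacity when growing by a fixed chunk of elements. */
    static int growFixed(int capacity, int chunk) {
        return capacity + chunk;
    }

    /** Next capacity when growing by a factor of the current capacity. */
    static int growByFactor(int capacity, double factor) {
        return capacity + (int) Math.round(capacity * factor);
    }

    public static void main(String[] args) {
        // new IntArrayList(100, 10): capacity 100 -> 110 -> 120 -> ...
        System.out.println(growFixed(100, 10));     // 110
        // new IntArrayList(100, 0.2): capacity 100 -> 120 -> 144 -> ...
        System.out.println(growByFactor(100, 0.2)); // 120
        System.out.println(growByFactor(120, 0.2)); // 144
    }
}
```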
A number of the classes of PCJ are based on hash tables. As explained below, there are two main hashing variants in use: open addressing with double hashing, and chaining.

The open hash collection and map variants (e.g. TOpenHashSet and TKeySOpenHashMap) are based on a hash table using open addressing with double hashing. Double hashing is a method of resolving collisions in hash tables when more than one key hashes to the same address in the table: a fixed secondary hash function determines the probe step. The class Primes is used for selecting prime numbers.
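A minimal sketch of a double-hashing probe loop is shown below. The secondary hash function here is made up for illustration - it is not the one PCJ uses:

```java
public class DoubleHashingDemo {
    private final int[] table; // 0 marks an empty slot in this toy example (keys must be > 0)
    private final int m;       // table length, ideally prime

    DoubleHashingDemo(int m) {
        this.m = m;
        this.table = new int[m];
    }

    /** Illustrative secondary hash: any value in 1..m-1 works when m is prime. */
    private int step(int key) {
        return 1 + (key % (m - 1));
    }

    void add(int key) {
        int i = key % m;
        int s = step(key);
        // Probe sequence: h(k), h(k)+s, h(k)+2s, ... (the toy version assumes the table never fills up)
        while (table[i] != 0 && table[i] != key)
            i = (i + s) % m;
        table[i] = key;
    }

    boolean contains(int key) {
        int i = key % m;
        int s = step(key);
        while (table[i] != 0) {
            if (table[i] == key) return true;
            i = (i + s) % m;
        }
        return false;
    }

    public static void main(String[] args) {
        DoubleHashingDemo set = new DoubleHashingDemo(17);
        set.add(17); // slot 0
        set.add(34); // collides at slot 0, resolved by the secondary hash
        System.out.println(set.contains(34)); // true
        System.out.println(set.contains(51)); // false
    }
}
```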
The chained hash table variants (e.g. TChainedHashSet and TKeySChainedHashMap) are based on hash tables with linked lists. Instead of searching for an empty slot in the hash table, each slot contains a linked list (a chain) of those entries whose keys hash to that slot. In that way probing for empty slots is avoided entirely, but we have to search linked lists instead.
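A corresponding sketch for chaining (again illustrative, not PCJ's implementation) shows that lookup is simply a search of the chain at the key's slot:

```java
import java.util.LinkedList;

public class ChainedHashingDemo {
    private final LinkedList<Integer>[] table;

    @SuppressWarnings("unchecked")
    ChainedHashingDemo(int m) {
        table = new LinkedList[m];
        for (int i = 0; i < m; i++)
            table[i] = new LinkedList<Integer>();
    }

    void add(int key) {
        LinkedList<Integer> chain = table[key % table.length];
        if (!chain.contains(key)) // set semantics: no duplicates
            chain.add(key);
    }

    boolean contains(int key) {
        // No probing: just search the chain at the key's slot.
        return table[key % table.length].contains(key);
    }

    public static void main(String[] args) {
        ChainedHashingDemo set = new ChainedHashingDemo(17);
        set.add(17); // slot 0
        set.add(34); // slot 0 again: the chain simply grows
        System.out.println(set.contains(34)); // true
    }
}
```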
Both the OpenHash and ChainedHash variants begin to perform badly when many keys have been hashed to the same slot in the underlying hash table. To prevent that, hash tables are equipped with a maximum load factor, which is some proportion of the table's size. When that proportion of the hash table is used, its length is increased (see the discussion on growth above). The default load factor is 0.75 for both variants, and a custom load factor may be specified with the constructors.
The plots below show how the load of a hash based set affects performance. For open addressing, the time to look up a key increases steadily until it suddenly explodes around 0.7. For chained hashing, the mean time to look up a key increases steadily as the lists grow longer. The plots were created for hash sets, but are very similar for hash maps.
(The horizontal lines on the plots are due to the resolution of the timer on my computer, which is quite clearly somewhere around 10ms.)
If nothing else is specified with the constructors for the hash table based collections and maps, the default hash functions of bak.pcj.hash are used. It is possible that the default functions give poor performance for specific data sets, and in such cases a new hash function may be specified.

Assume that the hash table length M is 17 and the default hash function is h(n) = n (which it is for int values). If we add 17 it goes into slot 0 since 17 mod 17 = 0. If we add 34 it also goes into slot 0 since 34 mod 17 = 0, thus activating the use of the secondary hash function in the case of open addressing with double hashing. With chaining we get longer lists. This example could benefit from an alternative hash function, e.g. h(n) = (n div 17) + n. That would make n = 17 go into slot ((17 div 17) + 17) mod 17 = 18 mod 17 = 1, and similarly 34 would go into slot ((34 div 17) + 34) mod 17 = 36 mod 17 = 2, thus eliminating the use of the secondary hash function (or chains).
The example below shows how to pass a new hash function to a hash set:
import bak.pcj.set.IntSet;
import bak.pcj.set.IntOpenHashSet;
import bak.pcj.hash.IntHashFunction;

...

IntSet s = new IntOpenHashSet(
    new IntHashFunction() {
        public int hash(int v) {
            return v/17 + v;
        }
    }
);
In this section the interfaces and classes are described in more detail and examples are given. The design is also discussed.
TCollection interfaces

The TCollection interfaces define the basic collection behaviour. They are adapted from the JCF Collection interface and provide the basic methods for modifying and querying collections. Although TCollection is the general ancestor of all collection classes, each method may have different semantics depending on the collection type. For example, the method add(T) adds an element to the end of a TList, whereas it ensures that a TSet contains an element. If you only use PCJ and do not extend it, you will only use TCollection as a polymorphic type for collections.

See TSet for examples.
The two interfaces are identical in signature.
Type | Interface |
---|---|
boolean |
bak.pcj.BooleanCollection |
char |
bak.pcj.CharCollection |
byte |
bak.pcj.ByteCollection |
short |
bak.pcj.ShortCollection |
int |
bak.pcj.IntCollection |
long |
bak.pcj.LongCollection |
float |
bak.pcj.FloatCollection |
double |
bak.pcj.DoubleCollection |
TList interfaces

The TList interfaces define an extension to the TCollection interfaces for sequential lists. Like TCollection, this is basically an adaptation of the JCF List interface.
Example:
import bak.pcj.list.IntList;
import bak.pcj.list.IntArrayList;
import bak.pcj.list.IntListIterator;

public class ListExample {
    public static void main(String[] args) {
        IntList s = new IntArrayList();

        // Adding elements to the end
        s.add(1);             // s=[1]
        s.add(2);             // s=[1,2]

        // Inserting elements at positions
        s.add(2, 3);          // s=[1,2,3]
        s.add(0, 0);          // s=[0,1,2,3]
        s.add(1, 10);         // s=[0,10,1,2,3]

        // Removing elements at positions
        s.removeElementAt(1); // s=[0,1,2,3]

        // Removing elements by value
        s.remove(2);          // s=[0,1,3]

        // Updating elements
        s.set(2, 2);          // s=[0,1,2]
        s.set(0, -1);         // s=[-1,1,2]

        // Searching for elements
        int v;
        v = s.indexOf(1);     // v=1
        v = s.indexOf(2);     // v=2
        v = s.indexOf(10);    // v=-1

        // Adding collections at positions
        IntArrayList t = new IntArrayList();
        t.add(10);            // t=[10]
        t.add(20);            // t=[10,20]
        t.add(30);            // t=[10,20,30]
        s.addAll(1, t);       // s=[-1,10,20,30,1,2]
    }
}
Other examples:
ListExample.java
Reverse.java
Sorting.java
Type | Interface |
---|---|
boolean |
bak.pcj.list.BooleanList |
char |
bak.pcj.list.CharList |
byte |
bak.pcj.list.ByteList |
short |
bak.pcj.list.ShortList |
int |
bak.pcj.list.IntList |
long |
bak.pcj.list.LongList |
float |
bak.pcj.list.FloatList |
double |
bak.pcj.list.DoubleList |
The TList interfaces do not provide methods to retrieve a sub-list of a list. It is relatively complex to implement sub-list views, so this feature (which is implemented in JCF lists) is omitted for the time being.

A deviation from JCF in method names is that of removeElementAt(). JCF lists have a remove(int) method and a remove(Object) method for removing elements by index and by equality, respectively. Since the elements of PCJ lists can be of type int, there would be a clash between the two method signatures if the JCF naming were used. Therefore PCJ lists use removeElementAt(int) to remove elements by index.
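The clash is easy to demonstrate with JCF itself: on a List<Integer>, remove(2) binds to the remove(int) overload and removes by index, not by value, so a primitive int list could not offer both operations under the JCF names:

```java
import java.util.ArrayList;
import java.util.List;

public class RemoveOverloadDemo {
    public static void main(String[] args) {
        List<Integer> list = new ArrayList<Integer>();
        list.add(10);
        list.add(20);
        list.add(30);

        list.remove(2);                   // remove(int): removes the element at INDEX 2
        System.out.println(list);         // [10, 20]

        list.remove(Integer.valueOf(20)); // remove(Object): removes the VALUE 20
        System.out.println(list);         // [10]
    }
}
```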
TStack interfaces

The TStack interfaces extend the TList interfaces with the typical stack operations: push(), pop(), and peek(). Stacks typically only grow and shrink at the end. Note that all TLists can in principle be used as stacks. We can easily replace pop() with removeElementAt(size()-1) and push(v) with add(v). The stack interfaces do not guarantee that stack operations are more efficient than their list/collection counterparts, but they make the intention clear.
Example:
import bak.pcj.list.IntStack;
import bak.pcj.list.IntArrayStack;

public class StackExample {
    public static void main(String[] args) {
        IntStack s = new IntArrayStack();

        // Pushing elements
        s.push(1);    // s=[1]
        s.push(2);    // s=[1,2]
        s.push(3);    // s=[1,2,3]

        // Inspecting
        int v;
        v = s.peek(); // v=3, s=[1,2,3]

        // Popping elements
        v = s.pop();  // v=3, s=[1,2]
        v = s.pop();  // v=2, s=[1]
    }
}
Other examples:
StackExample.java
Calculator.java
Type | Interface |
---|---|
boolean |
bak.pcj.list.BooleanStack |
char |
bak.pcj.list.CharStack |
byte |
bak.pcj.list.ByteStack |
short |
bak.pcj.list.ShortStack |
int |
bak.pcj.list.IntStack |
long |
bak.pcj.list.LongStack |
float |
bak.pcj.list.FloatStack |
double |
bak.pcj.list.DoubleStack |
It may be questioned whether a stack is a special kind of list. Interfaces are often used for specifying the minimum capabilities of an object, e.g. an argument to a method. In those cases a stack ought to be just that: a stack, not a list, because we probably only need to push, pop, and peek the object. On the other hand, stacks are sequential information structures, and it should be possible to iterate over the elements. It may also be argued that they should at least extend the TCollection interface to be interesting in a collections API; and extending the TCollection interface, they might as well extend the TList interface.
In JCF, stacks are merely special implementations of lists. That leaves us with a problem when an operation needs a stack to work efficiently: we can either 1) make the argument a Stack, which is a special implementation of List, or 2) make the argument a List and hope that the caller passes an object that has fast addition and removal at the end of the list.
// Solution 1:
void doSomething(Stack s) {
    // Perform stack operations on s here using push(Object) and pop()
    ...
}
...
Stack s = new Stack();
doSomething(s);

// Solution 2:
void doSomething(List s) {
    // Perform stack operations on s using add(Object) and remove(int)
    ...
}
...
List s = new Stack();
doSomething(s);
Solution 1 above is not acceptable, because the doSomething() method does not care about the implementation of the stack argument; it puts a restriction on the caller that is not needed. Solution 2 is not acceptable because any list implementation can be passed to doSomething(), which may render it inefficient.

I think that a better way is to make Stack an interface and to provide implementations with fast addition and removal at the end of lists.
// Solution with Stack as an interface:
void doSomething(Stack s) {
    // Perform stack operations on s here using push(Object) and pop()
    ...
}
...
Stack s = new MyStackImplementation();
doSomething(s);

That is why TStack is an interface in PCJ.
TDeque interfaces

The TDeque interfaces extend the TList interfaces with the typical operations of a deque: addFirst(), addLast(), getFirst(), getLast(), removeFirst(), and removeLast(). Deques typically only grow and shrink at the ends. Note that all TLists can in principle be used as deques because of the positional access to elements. We can easily replace getFirst() with get(0), removeLast() with removeElementAt(size()-1), and so forth. The deque interfaces do not guarantee that deque operations are more efficient than their list/collection counterparts, but they make the intention clear.
Example:
import bak.pcj.list.IntDeque;
import bak.pcj.list.IntArrayDeque;

public class DequeExample {
    public static void main(String[] args) {
        IntDeque q = new IntArrayDeque();

        // Adding elements
        q.addFirst(1);       // q=[1]
        q.addFirst(2);       // q=[2,1]
        q.addFirst(3);       // q=[3,2,1]
        q.addLast(0);        // q=[3,2,1,0]

        // Retrieving elements
        int v;
        v = q.getFirst();    // v=3, q=[3,2,1,0]
        v = q.getLast();     // v=0, q=[3,2,1,0]

        // Removing elements
        v = q.removeFirst(); // v=3, q=[2,1,0]
        v = q.removeLast();  // v=0, q=[2,1]
    }
}
Other examples:
DequeExample.java
DequeExample1.java
Type | Interface |
---|---|
boolean |
bak.pcj.list.BooleanDeque |
char |
bak.pcj.list.CharDeque |
byte |
bak.pcj.list.ByteDeque |
short |
bak.pcj.list.ShortDeque |
int |
bak.pcj.list.IntDeque |
long |
bak.pcj.list.LongDeque |
float |
bak.pcj.list.FloatDeque |
double |
bak.pcj.list.DoubleDeque |
The design discussion of TStack also applies to TDeque.
TArrayList classes

The TArrayList classes provide array based implementations of the TList interfaces. The data in the list is represented as a T[] that grows with the size of the list.

See TList for examples.
Type | Class |
---|---|
boolean |
bak.pcj.list.BooleanArrayList |
char |
bak.pcj.list.CharArrayList |
byte |
bak.pcj.list.ByteArrayList |
short |
bak.pcj.list.ShortArrayList |
int |
bak.pcj.list.IntArrayList |
long |
bak.pcj.list.LongArrayList |
float |
bak.pcj.list.FloatArrayList |
double |
bak.pcj.list.DoubleArrayList |
Currently PCJ provides no alternatives to TArrayList other than TArrayStack and TArrayDeque, which are specialized lists - they should not be used as replacements for lists. TArrayLists perform best when they grow or shrink at the end - like stacks. Since the underlying data structure of a TArrayList is an array, removal and insertion (other than at the end of the list) is not efficient for large list sizes.

IntArrayList s = new IntArrayList();
for (int i = 0; i < 1000000; i++)
    s.add(i);
s.removeElementAt(0); // This is not efficient!
s.add(0, 100);        // This is not efficient!

If you have to modify a long list at its beginning, use a TDeque instead of a list.
TArrayStack classes

TArrayStack is an array based implementation of TStack. It is implemented as a simple extension to TArrayList.

See TStack for examples.
Type | Class |
---|---|
boolean |
bak.pcj.list.BooleanArrayStack |
char |
bak.pcj.list.CharArrayStack |
byte |
bak.pcj.list.ByteArrayStack |
short |
bak.pcj.list.ShortArrayStack |
int |
bak.pcj.list.IntArrayStack |
long |
bak.pcj.list.LongArrayStack |
float |
bak.pcj.list.FloatArrayStack |
double |
bak.pcj.list.DoubleArrayStack |
TArrayDeque classes

The TArrayDeque classes provide array based implementations of TDeque. Internally the elements are stored in a T[] whose length is equal to or greater than the current number of elements.
0 | 1 | 2 | 3 | 4 | 5 |
1 | 2 | 3 | - | - | - |
^ | ^ |
We operate with a pointer to the first element in the deque and a pointer to the last element (or one beyond the last). Insertions performed at the end move the last-pointer forward, and insertions performed at the beginning move the first-pointer backward. Likewise, removals from the end move the last-pointer backward, and removals from the beginning move the first-pointer forward. Since the deque is restricted by a capacity, all pointer movement is wrapped around the ends of the array (indices are computed modulo the capacity). See the examples below:
addLast(4)
:
0 | 1 | 2 | 3 | 4 | 5 |
1 | 2 | 3 | 4 | - | - |
^ | ^ |
removeFirst()
:
0 | 1 | 2 | 3 | 4 | 5 |
- | 2 | 3 | 4 | - | - |
^ | ^ |
addLast(5)
:
0 | 1 | 2 | 3 | 4 | 5 |
- | 2 | 3 | 4 | 5 | - |
^ | ^ |
removeFirst()
:
0 | 1 | 2 | 3 | 4 | 5 |
- | - | 3 | 4 | 5 | - |
^ | ^ |
addLast(6)
:
0 | 1 | 2 | 3 | 4 | 5 |
- | - | 3 | 4 | 5 | 6 |
^ | ^ |
addLast(7)
:
0 | 1 | 2 | 3 | 4 | 5 |
7 | - | 3 | 4 | 5 | 6 |
^ | ^ |
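The pointer arithmetic shown in the tables above can be sketched as a tiny fixed-capacity ring buffer (illustrative; the real TArrayDeque also grows its array and checks bounds):

```java
public class RingDequeDemo {
    private final int[] data;
    private int first = 0; // index of the first element
    private int size = 0;

    RingDequeDemo(int capacity) {
        data = new int[capacity];
    }

    void addFirst(int v) {
        first = (first - 1 + data.length) % data.length; // move backward, wrapping
        data[first] = v;
        size++;
    }

    void addLast(int v) {
        data[(first + size) % data.length] = v;          // one past the last, wrapping
        size++;
    }

    int removeFirst() {
        int v = data[first];
        first = (first + 1) % data.length;               // move forward, wrapping
        size--;
        return v;
    }

    int removeLast() {
        size--;
        return data[(first + size) % data.length];
    }

    public static void main(String[] args) {
        RingDequeDemo q = new RingDequeDemo(6);
        q.addLast(1); q.addLast(2); q.addLast(3);
        q.removeFirst();                     // drops 1
        q.addLast(4); q.addLast(5); q.addLast(6);
        q.addLast(7);                        // wraps around to slot 0, as in the last table above
        System.out.println(q.removeFirst()); // 2
    }
}
```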
See TDeque for examples.
Type | Class |
---|---|
boolean |
bak.pcj.list.BooleanArrayDeque |
char |
bak.pcj.list.CharArrayDeque |
byte |
bak.pcj.list.ByteArrayDeque |
short |
bak.pcj.list.ShortArrayDeque |
int |
bak.pcj.list.IntArrayDeque |
long |
bak.pcj.list.LongArrayDeque |
float |
bak.pcj.list.FloatArrayDeque |
double |
bak.pcj.list.DoubleArrayDeque |
TSet interfaces

The TSet interfaces define the set behaviour for primitive collections. They are adapted from the JCF Set interface and exist for all primitive types. In PCJ, TSet is only used to tag collections - it does not extend the collection interface with new methods.
Examples:
import bak.pcj.set.IntSet;
import bak.pcj.set.IntOpenHashSet;
import bak.pcj.IntIterator;

public class SetExample {
    public static void main(String[] args) {
        IntSet s = new IntOpenHashSet();

        // Adding elements
        s.add(1);                  // s={1}
        s.add(2);                  // s={1,2}
        s.add(3);                  // s={1,2,3}
        s.add(4);                  // s={1,2,3,4}
        s.add(5);                  // s={1,2,3,4,5}
        s.add(2);                  // s={1,2,3,4,5} (not changed)
        s.add(4);                  // s={1,2,3,4,5} (not changed)

        // Removing elements
        s.remove(2);               // s={1,3,4,5}
        s.remove(5);               // s={1,3,4}

        // Clearing
        s.clear();                 // s={}
        boolean c;
        c = s.isEmpty();           // c=true

        // Testing containment of elements
        s.add(1);
        s.add(2);
        s.add(3);                  // s={1,2,3}
        c = s.contains(1);         // c=true, s={1,2,3}
        c = s.contains(3);         // c=true, s={1,2,3}
        c = s.contains(5);         // c=false, s={1,2,3}

        // Testing containment of collections
        IntSet t;
        t = new IntOpenHashSet();
        t.add(1);                  // t={1}, s={1,2,3}
        c = s.containsAll(t);      // c=true
        t.add(0);                  // t={0,1}, s={1,2,3}
        c = s.containsAll(t);      // c=false

        // Adding collections
        s.addAll(t);               // t={0,1}, s={0,1,2,3}

        // Retaining elements of collections
        t = new IntOpenHashSet();
        t.add(2);
        t.add(3);
        t.add(4);
        t.add(5);                  // t={2,3,4,5}, s={0,1,2,3}
        s.retainAll(t);            // t={2,3,4,5}, s={2,3}

        // Removing whole collections
        t = new IntOpenHashSet();
        t.add(3);
        t.add(4);
        s.removeAll(t);            // t={3,4}, s={2}

        // Finding the size
        int v;
        v = s.size();              // v=1, s={2}
        s.add(1);
        v = s.size();              // v=2, s={1,2}

        // Converting to an array
        int[] a = s.toArray();     // a=int[]{1,2}, s={1,2}

        // Iterating over elements
        for (int i = 0; i < 10; i++)
            s.add(i);              // s={0,1,2,3,4,5,6,7,8,9}
        IntIterator i = s.iterator();
        while (i.hasNext()) {
            int e = i.next();
            if ((e % 2) == 1)
                i.remove();
        }                          // s={0,2,4,6,8}

        // Comparing
        t = new IntOpenHashSet(s); // s={0,2,4,6,8}, t={0,2,4,6,8}
        c = s.equals(t);           // c=true
        t.remove(2);               // s={0,2,4,6,8}, t={0,4,6,8}
        c = s.equals(t);           // c=false
    }
}
Other examples:
Type | Interface |
---|---|
boolean |
bak.pcj.set.BooleanSet |
char |
bak.pcj.set.CharSet |
byte |
bak.pcj.set.ByteSet |
short |
bak.pcj.set.ShortSet |
int |
bak.pcj.set.IntSet |
long |
bak.pcj.set.LongSet |
float |
bak.pcj.set.FloatSet |
double |
bak.pcj.set.DoubleSet |
TSortedSet
interfaces
The TSortedSet
interfaces define the sorted set
behaviour for primitive collections. They extend
TSet
with some subset operations, retrieval of the first
and last elements, and a contract to iterate in the natural order of the
elements. It is adapted from the JCF
SortedSet
interface and exists for all primitive types.
Type | Interface |
---|---|
boolean |
bak.pcj.set.BooleanSortedSet |
char |
bak.pcj.set.CharSortedSet |
byte |
bak.pcj.set.ByteSortedSet |
short |
bak.pcj.set.ShortSortedSet |
int |
bak.pcj.set.IntSortedSet |
long |
bak.pcj.set.LongSortedSet |
float |
bak.pcj.set.FloatSortedSet |
double |
bak.pcj.set.DoubleSortedSet |
TOpenHashSet
classes
The TOpenHashSet classes
provide a hash table based implementation of
the TSet
interface with
open addressing.
See TSet
for examples.
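The idea behind open addressing can be illustrated with a minimal linear-probing int set. This is a sketch of the general technique only, not PCJ's implementation (which adds growth policies, rehashing, and removal handling); note that this sketch never resizes, so it must not be filled completely:

```java
// Minimal open-addressing int set with linear probing:
// on a collision, scan forward (wrapping) for the value or a free slot.
class IntOpenHashSetSketch {
    private final int[] table;
    private final boolean[] used;   // marks occupied slots, so 0 can be stored

    IntOpenHashSetSketch(int capacity) {
        table = new int[capacity];
        used = new boolean[capacity];
    }

    // Returns the slot holding v, or the free slot where v would go.
    private int indexOf(int v) {
        int i = Math.abs(v % table.length);
        while (used[i] && table[i] != v)
            i = (i + 1) % table.length;   // linear probe to the next slot
        return i;
    }

    boolean add(int v) {
        int i = indexOf(v);
        if (used[i]) return false;        // already present
        table[i] = v;
        used[i] = true;
        return true;
    }

    boolean contains(int v) {
        return used[indexOf(v)];
    }
}
```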
Type | Class |
---|---|
boolean |
n/a |
char |
bak.pcj.set.CharOpenHashSet |
byte |
bak.pcj.set.ByteOpenHashSet |
short |
bak.pcj.set.ShortOpenHashSet |
int |
bak.pcj.set.IntOpenHashSet |
long |
bak.pcj.set.LongOpenHashSet |
float |
bak.pcj.set.FloatOpenHashSet |
double |
bak.pcj.set.DoubleOpenHashSet |
TChainedHashSet
classes
The TChainedHashSet classes
provide a hash table based implementation of
the TSet
interface with
chained hashing.
See TSet
for examples.
Type | Class |
---|---|
boolean |
n/a |
char |
bak.pcj.set.CharChainedHashSet |
byte |
bak.pcj.set.ByteChainedHashSet |
short |
bak.pcj.set.ShortChainedHashSet |
int |
bak.pcj.set.IntChainedHashSet |
long |
bak.pcj.set.LongChainedHashSet |
float |
bak.pcj.set.FloatChainedHashSet |
double |
bak.pcj.set.DoubleChainedHashSet |
TRangeSet
classes
The TRangeSet
classes provide range based implementations of
TSortedSet
.
The elements of a TRangeSet
s are represented by a number of
ranges of consecutive values. A TRangeSet
keeps track of the lower
and upper bounds of the ranges. Internally, a list of ranges is kept in sorted
order for fast lookup (O(log n), where n is the number of
ranges).
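The O(log n) lookup over a sorted list of disjoint ranges can be sketched as a binary search on the range bounds. This is an illustrative sketch, not PCJ's code; it assumes the ranges are stored as two parallel arrays of inclusive bounds:

```java
// Containment test over sorted, disjoint [firsts[i], lasts[i]] ranges,
// via binary search on the range bounds: O(log n) in the number of ranges.
class RangeLookup {
    static boolean contains(int[] firsts, int[] lasts, int v) {
        int lo = 0, hi = firsts.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (v < firsts[mid]) hi = mid - 1;       // v lies before this range
            else if (v > lasts[mid]) lo = mid + 1;   // v lies after this range
            else return true;                        // firsts[mid] <= v <= lasts[mid]
        }
        return false;
    }
}
```

For the alphanumeric example further below, the set {a-z, A-Z, 0-9} is just three ranges, so lookups touch at most two of them.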
See TSet
for examples.
TRangeSet
provides a number of specialized methods for
adding and searching for elements. See the
API documentation
for details.
Type | Class |
---|---|
boolean |
n/a |
char |
bak.pcj.set.CharRangeSet |
byte |
bak.pcj.set.ByteRangeSet |
short |
bak.pcj.set.ShortRangeSet |
int |
bak.pcj.set.IntRangeSet |
long |
bak.pcj.set.LongRangeSet |
float |
n/a |
double |
n/a |
TRangeSet
is only provided for byte
, short
,
int
, and long
. They are the only types for which
it makes sense to speak about ranges of consecutive values. It is also
not quite clear that the
LongRangeSet
class should exist. With ranges of long
values we can easily construct a
set that has more than Integer.MAX_VALUE
elements:
import bak.pcj.set.LongRangeSet;
...
LongRangeSet s = new LongRangeSet();
s.add(0L, Integer.MAX_VALUE+1L);
int size = s.size();
// What is the value of size at this point???
This problem is unresolved. I think that use of the LongRangeSet
will be deprecated some time in the future.
TRangeSet
objects use very little memory when the
number of ranges is small. Memory use does not depend on the number of
elements, only on the number of ranges.
A typical use of the TRangeSet
classes is to
get rid of hard-coded range checks, as in:
if ((c >= 'a' && c <= 'z') ||
(c >= 'A' && c <= 'Z') ||
(c >= '0' && c <= '9')) {
...
}
The above code is much easier to read and maintain (and, for enough ranges, much more efficient) if it is rewritten as a question of set containment:
static CharRangeSet ALPHANUMERIC = new CharRangeSet(
new CharRange('a', 'z'),
new CharRange('A', 'Z'),
new CharRange('0', '9')
);
...
if (ALPHANUMERIC.contains(c)) {
...
}
As the benchmarks show, TRangeSet
should not be used
when additions or removals can split the ranges too much. Random
elements give very bad performance, and when elements cannot
be packed into ranges, there is no advantage in memory use.
TBitSet
classes
The TBitSet
classes provide a bit array based implementation
of TSortedSet
.
A TBitSet
is represented by an array of bits. When a
bit is set it means that the element with the same value as the position
of that bit is an element of the set. For example,
the five least elements of the set {1,5,7,8,9,...} are represented by
the bit array shown below:
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... |
0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 1 | ... |
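The bit-array representation can be sketched in plain Java with an array of long words, each holding 64 bits. This is a minimal sketch of the technique, not PCJ's class (which also grows its array and supports iteration and set operations):

```java
// A set of small non-negative ints as a packed bit array:
// bit (v mod 64) of word (v / 64) is 1 iff v is in the set.
class IntBitSetSketch {
    private final long[] words;

    IntBitSetSketch(int maxValue) { words = new long[maxValue / 64 + 1]; }

    void add(int v)         { words[v >>> 6] |=  (1L << (v & 63)); }
    void remove(int v)      { words[v >>> 6] &= ~(1L << (v & 63)); }
    boolean contains(int v) { return (words[v >>> 6] & (1L << (v & 63))) != 0; }
}
```

Adding the elements {1,5,7,8,9} sets exactly the bits shown in the table above.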
See TSet
for examples.
Type | Class |
---|---|
boolean |
n/a |
char |
bak.pcj.set.CharBitSet |
byte |
bak.pcj.set.ByteBitSet |
short |
bak.pcj.set.ShortBitSet |
int |
bak.pcj.set.IntBitSet |
long |
n/a |
float |
n/a |
double |
n/a |
TBitSet
is provided for byte
, char
, short
, and int
.
TBitSet
depends on the set type having a meaningful enumeration.
This is not the case for boolean
, double
, and float
.
TBitSet
is not provided for long
values since the underlying
arrays would have to be too large.
TBitSet
s generally perform well for small set elements and
dense sets (elements that are close to each other).
Iteration over TBitSet
s is not particularly efficient -
especially not when the set elements are sparse. The reason for this is
that the object must search through an array of bits for the next set
bit before it can return the next value of an iteration.
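The search for the next set bit can be sketched as follows (an illustrative sketch, not PCJ's code). Whole zero words can be skipped quickly, but a sparse set still forces a scan across many empty words for each element returned:

```java
// Find the index of the next set bit at or after position 'from'
// in a packed bit array, or -1 if there is none.
class NextSetBit {
    static int nextSetBit(long[] words, int from) {
        int w = from >>> 6;
        if (w >= words.length) return -1;
        long word = words[w] & (-1L << (from & 63)); // mask off bits below 'from'
        while (true) {
            if (word != 0)
                return (w << 6) + Long.numberOfTrailingZeros(word);
            if (++w == words.length) return -1;      // ran past the last word
            word = words[w];
        }
    }
}
```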
The memory used by TBitSet
s is proportional not to its number
of elements but to the value of its largest element. Don't use bit sets
with large values if memory is a problem.
When elements are added in ascending order, many capacity increases are required. If the largest element is known in advance, it should be added first. Alternatively, the initial capacity can be set to a number corresponding to the largest element via a constructor.
TBitSet
s cannot be used with negative elements.
TKeySMap
interfaces
The TKeySMap
interfaces define maps from T
keys
to S
values.
The primitive map interfaces are probably the interfaces that deviate the most
from their JCF counterparts. The
Map.Entry
has been omitted because it restricts
the implementations. Unless one wants to represent a map by a
Map.Entry
instance for each entry of the map, one is bound to
create these extra instances when returning the entry set (see
Map.entrySet()
). It is very useful to be able
to access all entries of a map, but it does not have to be through a set. PCJ
introduces a special iterator called a TKeySMapIterator
(see
below).
It has the
same characteristics as a normal iterator, but is capable of returning the key and
the value of an entry separately.
Some of the issues that arise when using primitive types in maps are discussed below.
Examples:
import bak.pcj.CharCollection;
import bak.pcj.map.IntKeyCharMap;
import bak.pcj.map.IntKeyCharOpenHashMap;
import bak.pcj.map.IntKeyCharMapIterator;
import bak.pcj.set.IntSet;
public class IntKeyCharMapExample {
public static void main(String[] args) {
IntKeyCharMap s = new IntKeyCharOpenHashMap();
// Adding mappings
s.put(1,'a'); // s={(1,'a')}
s.put(2,'b'); // s={(1,'a'),(2,'b')}
s.put(3,'c'); // s={(1,'a'),(2,'b'),(3,'c')}
// Mapping keys to values
char v;
v = s.get(1); // v='a', s={(1,'a'),(2,'b'),(3,'c')}
v = s.get(3); // v='c', s={(1,'a'),(2,'b'),(3,'c')}
v = s.get(4); // v='\0', s={(1,'a'),(2,'b'),(3,'c')}
// Testing for a key mapping
boolean c;
c = s.containsKey(1); // c=true, s={(1,'a'),(2,'b'),(3,'c')}
c = s.containsKey(4); // c=false, s={(1,'a'),(2,'b'),(3,'c')}
// Testing for a value
c = s.containsValue('a'); // c=true, s={(1,'a'),(2,'b'),(3,'c')}
c = s.containsValue('d'); // c=false, s={(1,'a'),(2,'b'),(3,'c')}
// Adding elements from maps
IntKeyCharMap t = new IntKeyCharOpenHashMap();
t.put(2, 'x');
t.put(3, 'c');
t.put(4, 'd'); // t={(2,'x'),(3,'c'),(4,'d')}, s={(1,'a'),(2,'b'),(3,'c')}
s.putAll(t); // s={(1,'a'),(2,'x'),(3,'c'),(4,'d')}
// Replacing mappings
s.put(2, 'b'); // s={(1,'a'),(2,'b'),(3,'c'),(4,'d')}
// Iterating over entries
IntKeyCharMapIterator i = s.entries();
while (i.hasNext()) {
i.next();
int key = i.getKey();
char value = i.getValue();
// Print or do something else with entry here
}
// Retrieving keys (view of map)
IntSet keys = s.keySet(); // keys={1,2,3}
keys.remove(4); // s={(1,'a'),(2,'b'),(3,'c')}
// Retrieving values (view of map)
CharCollection values = s.values(); // values={'a','b','c'}
// Iterate over or do something else with values here
// Finding size of map
int size = s.size(); // size=3, s={(1,'a'),(2,'b'),(3,'c')}
// Comparing maps
t = new IntKeyCharOpenHashMap(s);
c = s.equals(t); // c=true, s={(1,'a'),(2,'b'),(3,'c')}, t={(1,'a'),(2,'b'),(3,'c')}
t.remove(1); // s={(1,'a'),(2,'b'),(3,'c')}, t={(2,'b'),(3,'c')}
c = s.equals(t); // c=false
// Clearing
s.clear(); // s={}
c = s.isEmpty(); // c=true
}
}
Other examples:
Key type | Value type | |||||||
---|---|---|---|---|---|---|---|---|
boolean |
char |
byte |
short |
int |
long |
float |
double |
|
boolean |
× | × | × | × | × | × | × | × |
char |
× | × | × | × | × | × | × | × |
byte |
× | × | × | × | × | × | × | × |
short |
× | × | × | × | × | × | × | × |
int |
× | × | × | × | × | × | × | × |
long |
× | × | × | × | × | × | × | × |
float |
× | × | × | × | × | × | × | × |
double |
× | × | × | × | × | × | × | × |
TKeySMap
is provided for all combinations of T
and S
.
The get() problem
The get(Object)
method of Map
returns null if no mapping can be found (and also if the
key maps to a null value). To make the distinction clear,
JCF provides the method containsKey(Object)
which is also supported by PCJ. With JCF we can almost
always avoid the use of containsKey
because in
most programs null values are very rare compared to non-null
values:
Map m;
Object value;
Object key;
...
if ( (value = m.get(key)) != null ) {
// The mapping exists, and we know its value
...
}
However, with primitive types we will rarely want
a specific primitive value in a map, such as the value
0
, to be interpreted as a non-existing
mapping. The problem with using containsKey()
is that
we have to perform the same lookup twice. First to check if
the mapping exists (using containsKey()
), and then to retrieve
the mapping using get()
:
IntKeyCharMap m;
int key;
char value;
...
if (m.containsKey(key)) {
// The mapping exists, now get its value
value = m.get(key);
...
}
An improvement is to check for containment only when a special value is returned, but this does not solve the basic problem of having to perform the same lookup twice. The following code is only an improvement if the special value used to indicate a non-existing mapping does not occur too frequently as a value in the map:
IntKeyCharMap m;
int key;
char value;
...
if ((value = m.get(key)) != SPECIAL || m.containsKey(key)) {
// The mapping exists, and we know its value
...
}
When a mapping is known to exist (i.e. it is a program error if the mapping does not exist) we usually don't want to burden the virtual machine with confirming its existence. But we can easily afford the extra check, if it is very simple.
With JCF we have the safety of get()
returning a null value.
After retrieving a value, we will typically perform some operation on
that value, e.g. invoke a method on it or pass it to another method, so
we have a good chance that a NullPointerException
will be thrown very
quickly:
Map m;
Object value;
Object key;
...
value = m.get(key);
// Throws a NullPointerException if the mapping does not exist
return value.toString();
For primitive maps, a similar assumption cannot be made.
Operations on primitive types (excluding division operators)
tend to work just fine with 0
values (or any
other chosen default value). So we cannot depend on the
operations succeeding the map lookup to produce an
error:
StringBuffer s;
IntKeyCharMap m;
int key;
char value;
...
// Returns '\0' if the mapping does not exist
value = m.get(key);
// No error is produced if value is '\0'
s.append(value);
In that case a second map lookup is needed just for the purpose of discovering bugs early.
The get() methods of PCJ maps
To solve the problems described in the section above, PCJ
provides three different get()
methods for
the different situations:
If you can accept that a default
value indicates the non-existence of a mapping, use
get()
. The method will return a
default value when the key does not map to any value. The
default value returned is defined in the class
MapDefaults
.
IntKeyCharMap m;
char value;
int key;
...
if ( (value = m.get(key)) != MapDefaults.defaultChar() ) {
// The mapping exists, and we know its value
...
}
Future versions of PCJ may allow a configurable default value, but this has not been judged necessary for the time being.
If you are certain about the existence of
a mapping, use tget()
. The method will throw
a runtime exception if the mapping does not exist.
IntKeyCharMap m;
char value;
int key;
...
// Throws an exception if key does not map to a value
value = m.tget(key);
...
If you cannot accept a primitive default value
and are not certain about the existence of the mapping, use
containsKey()
and
lget()
. The method will return the value corresponding to the
key of the last call to containsKey()
, provided
that the key exists. You may object that this is indeed two
lookups, but in practice implementations of maps will be
able to cache the value when looking up a key.
IntKeyCharMap m;
char value;
int key;
...
if (m.containsKey(key)) {
// Returns the value corresponding to key
value = m.lget();
...
}
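The containsKey()/lget() idiom relies on containsKey() caching the value it found, so that lget() needs no second lookup. A minimal sketch of how an implementation might do this is shown below; it is backed by a JCF map purely for brevity (PCJ's implementations use primitive hash tables, so no boxing occurs there):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the containsKey()/lget() caching idiom:
// containsKey() remembers the value it found; lget() returns it.
class LGetMapSketch {
    private final Map<Integer, Character> map = new HashMap<>();
    private char lastValue;
    private boolean hasLast = false;

    void put(int key, char value) { map.put(key, value); }

    boolean containsKey(int key) {
        Character v = map.get(key);
        hasLast = (v != null);
        if (hasLast) lastValue = v;   // cache for a following lget()
        return hasLast;
    }

    char lget() {
        if (!hasLast) throw new IllegalStateException("no previous containsKey() hit");
        return lastValue;
    }
}
```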
JCF maps provide a set view of all the entries in a map. This actually
restricts the implementation to be based on entries having a specific
form
(Map.Entry
).
PCJ uses an alternative approach. When we obtain the entries of a map, it
will almost always be for the purpose of performing an iteration. Instead of
providing a set view of the entries of a primitive map, PCJ provides an
iterator over all entries, which does not constrain the underlying implementation.
The TKeySMapIterator
interfaces define such iterators for
all TKeySMap
type combinations (an example is
IntKeyLongMapIterator
).
As shown in the example below, there is a difference between using normal iterators
and map iterators. Normal iterators return the next value when next()
is invoked. Map iterators make the key and the value of the next entry available
when next()
is invoked. The key and the value can thereafter be
retrieved using the map iterator's getKey()
and getValue()
methods.
Typical JCF code for iterating over entries:
Map m;
Iterator i = m.entrySet().iterator();
while (i.hasNext()) {
Map.Entry e = (Map.Entry)i.next();
int key = ((Integer)e.getKey()).intValue();
long value = ((Long)e.getValue()).longValue();
...
}
Corresponding PCJ code:
IntKeyLongMap m;
IntKeyLongMapIterator i = m.entries();
while (i.hasNext()) {
i.next();
int key = i.getKey();
long value = i.getValue();
...
}
TKeySOpenHashMap
classes
The TKeySOpenHashMap classes
provide a hash table based implementation of
the TKeySMap
interface with
open addressing.
See TKeySMap
for examples.
Key type | Value type | |||||||
---|---|---|---|---|---|---|---|---|
boolean |
char |
byte |
short |
int |
long |
float |
double |
|
boolean |
n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
char |
× | × | × | × | × | × | × | × |
byte |
× | × | × | × | × | × | × | × |
short |
× | × | × | × | × | × | × | × |
int |
× | × | × | × | × | × | × | × |
long |
× | × | × | × | × | × | × | × |
float |
× | × | × | × | × | × | × | × |
double |
× | × | × | × | × | × | × | × |
TKeySOpenHashMap
is provided for all combinations of T
and S
except
where T
is boolean
.
TKeySChainedHashMap
classes
The TKeySChainedHashMap classes
provide a hash table based implementation of
the TKeySMap
interface with
chained hashing.
See TKeySMap
for examples.
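The chained-hashing approach can be sketched as follows: each bucket holds a linked chain of entries, and colliding keys simply share a chain. This is a minimal illustrative sketch, not PCJ's implementation (which adds growth policies and the full map interface):

```java
// Minimal chained-hashing map from int keys to char values:
// each bucket holds a singly linked chain of entries.
class IntKeyCharChainedMapSketch {
    private static final class Entry {
        final int key; char value; Entry next;
        Entry(int key, char value, Entry next) {
            this.key = key; this.value = value; this.next = next;
        }
    }
    private final Entry[] buckets;

    IntKeyCharChainedMapSketch(int capacity) { buckets = new Entry[capacity]; }

    private int bucket(int key) { return Math.abs(key % buckets.length); }

    void put(int key, char value) {
        int b = bucket(key);
        for (Entry e = buckets[b]; e != null; e = e.next)
            if (e.key == key) { e.value = value; return; }  // replace existing
        buckets[b] = new Entry(key, value, buckets[b]);     // prepend to chain
    }

    char get(int key) {
        for (Entry e = buckets[bucket(key)]; e != null; e = e.next)
            if (e.key == key) return e.value;
        return '\0';   // default value when no mapping exists
    }
}
```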
Key type | Value type | |||||||
---|---|---|---|---|---|---|---|---|
boolean |
char |
byte |
short |
int |
long |
float |
double |
|
boolean |
n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
char |
× | × | × | × | × | × | × | × |
byte |
× | × | × | × | × | × | × | × |
short |
× | × | × | × | × | × | × | × |
int |
× | × | × | × | × | × | × | × |
long |
× | × | × | × | × | × | × | × |
float |
× | × | × | × | × | × | × | × |
double |
× | × | × | × | × | × | × | × |
TKeySChainedHashMap
is provided for all combinations of T
and S
except
where T
is boolean
.
TKeyMap
interfaces
The TKeyMap
interfaces define maps from T
keys
to objects.
As for TKeySMap
a special
map iterator is provided for TKeyMap
s.
An example is
IntKeyMapIterator
.
Examples:
import bak.pcj.map.IntKeyMap;
import bak.pcj.map.IntKeyOpenHashMap;
import bak.pcj.map.IntKeyMapIterator;
import bak.pcj.set.IntSet;
import java.util.Collection;
public class IntKeyMapExample {
public static void main(String[] args) {
IntKeyMap s = new IntKeyOpenHashMap();
// Adding mappings
s.put(1,"a"); // s={(1,"a")}
s.put(2,"b"); // s={(1,"a"),(2,"b")}
s.put(3,"c"); // s={(1,"a"),(2,"b"),(3,"c")}
// Mapping keys to values
String v;
v = (String)s.get(1); // v="a", s={(1,"a"),(2,"b"),(3,"c")}
v = (String)s.get(3); // v="c", s={(1,"a"),(2,"b"),(3,"c")}
v = (String)s.get(4); // v=null, s={(1,"a"),(2,"b"),(3,"c")}
// Testing for a key mapping
boolean c;
c = s.containsKey(1); // c=true, s={(1,"a"),(2,"b"),(3,"c")}
c = s.containsKey(4); // c=false, s={(1,"a"),(2,"b"),(3,"c")}
// Testing for a value
c = s.containsValue("a"); // c=true, s={(1,"a"),(2,"b"),(3,"c")}
c = s.containsValue("d"); // c=false, s={(1,"a"),(2,"b"),(3,"c")}
// Adding elements from maps
IntKeyMap t = new IntKeyOpenHashMap();
t.put(2, "x");
t.put(3, "c");
t.put(4, "d"); // t={(2,"x"),(3,"c"),(4,"d")}, s={(1,"a"),(2,"b"),(3,"c")}
s.putAll(t); // s={(1,"a"),(2,"x"),(3,"c"),(4,"d")}
// Replacing mappings
s.put(2, "b"); // s={(1,"a"),(2,"b"),(3,"c"),(4,"d")}
// Iterating over entries
IntKeyMapIterator i = s.entries();
while (i.hasNext()) {
i.next();
int key = i.getKey();
String value = (String)i.getValue();
// Print or do something else with entry here
}
// Retrieving keys (view of map)
IntSet keys = s.keySet(); // keys={1,2,3}
keys.remove(4); // s={(1,"a"),(2,"b"),(3,"c")}
// Retrieving values (view of map)
Collection values = s.values(); // values={"a","b","c"}
// Iterate over or do something else with values here
// Finding size of map
int size = s.size(); // size=3, s={(1,"a"),(2,"b"),(3,"c")}
// Comparing maps
t = new IntKeyOpenHashMap(s);
c = s.equals(t); // c=true, s={(1,"a"),(2,"b"),(3,"c")}, t={(1,"a"),(2,"b"),(3,"c")}
t.remove(1); // s={(1,"a"),(2,"b"),(3,"c")}, t={(2,"b"),(3,"c")}
c = s.equals(t); // c=false
// Clearing
s.clear(); // s={}
c = s.isEmpty(); // c=true
}
}
Other examples:
Type | Interface |
---|---|
boolean |
bak.pcj.map.BooleanKeyMap |
char |
bak.pcj.map.CharKeyMap |
byte |
bak.pcj.map.ByteKeyMap |
short |
bak.pcj.map.ShortKeyMap |
int |
bak.pcj.map.IntKeyMap |
long |
bak.pcj.map.LongKeyMap |
float |
bak.pcj.map.FloatKeyMap |
double |
bak.pcj.map.DoubleKeyMap |
TKeyOpenHashMap
classes
The TKeyOpenHashMap classes
provide a hash table based implementation of
the TKeyMap
interface with
open addressing.
See TKeyMap
for examples.
Type | Class |
---|---|
boolean |
n/a |
char |
bak.pcj.map.CharKeyOpenHashMap |
byte |
bak.pcj.map.ByteKeyOpenHashMap |
short |
bak.pcj.map.ShortKeyOpenHashMap |
int |
bak.pcj.map.IntKeyOpenHashMap |
long |
bak.pcj.map.LongKeyOpenHashMap |
float |
bak.pcj.map.FloatKeyOpenHashMap |
double |
bak.pcj.map.DoubleKeyOpenHashMap |
TKeyChainedHashMap
classes
The TKeyChainedHashMap classes
provide a hash table based implementation of
the TKeyMap
interface with
chained hashing.
See TKeyMap
for examples.
Type | Class |
---|---|
boolean |
n/a |
char |
bak.pcj.map.CharKeyChainedHashMap |
byte |
bak.pcj.map.ByteKeyChainedHashMap |
short |
bak.pcj.map.ShortKeyChainedHashMap |
int |
bak.pcj.map.IntKeyChainedHashMap |
long |
bak.pcj.map.LongKeyChainedHashMap |
float |
bak.pcj.map.FloatKeyChainedHashMap |
double |
bak.pcj.map.DoubleKeyChainedHashMap |
ObjectKeyTMap
interfaces
The ObjectKeyTMap
interfaces define maps from Object
keys
to T
values.
As for TKeySMap
a special
map iterator is provided for ObjectKeyTMap
s.
An example is
ObjectKeyIntMapIterator
.
Examples:
import bak.pcj.map.ObjectKeyIntMap;
import bak.pcj.map.ObjectKeyIntOpenHashMap;
import bak.pcj.map.ObjectKeyIntMapIterator;
import bak.pcj.IntCollection;
import java.util.Set;
public class ObjectKeyIntMapExample {
public static void main(String[] args) {
ObjectKeyIntMap s = new ObjectKeyIntOpenHashMap();
// Adding mappings
s.put("a",1); // s={("a",1)}
s.put("b",2); // s={("a",1),("b",2)}
s.put("c",3); // s={("a",1),("b",2),("c",3)}
// Mapping keys to values
int v;
v = s.get("a"); // v=1, s={("a",1),("b",2),("c",3)}
v = s.get("c"); // v=3, s={("a",1),("b",2),("c",3)}
v = s.get("d"); // v=0, s={("a",1),("b",2),("c",3)}
// Testing for a key mapping
boolean c;
c = s.containsKey("a"); // c=true, s={("a",1),("b",2),("c",3)}
c = s.containsKey("d"); // c=false, s={("a",1),("b",2),("c",3)}
// Testing for a value
c = s.containsValue(1); // c=true, s={("a",1),("b",2),("c",3)}
c = s.containsValue(4); // c=false, s={("a",1),("b",2),("c",3)}
// Adding elements from maps
ObjectKeyIntMap t = new ObjectKeyIntOpenHashMap();
t.put("b", 100);
t.put("c", 3);
t.put("d", 4); // t={("b",100),("c",3),("d",4)}, s={("a",1),("b",2),("c",3)}
s.putAll(t); // s={("a",1),("b",100),("c",3),("d",4)}
// Replacing mappings
s.put("b", 2); // s={("a",1),("b",2),("c",3),("d",4)}
// Iterating over entries
ObjectKeyIntMapIterator i = s.entries();
while (i.hasNext()) {
i.next();
String key = (String)i.getKey();
int value = i.getValue();
// Print or do something else with entry here
}
// Retrieving keys (view of map)
Set keys = s.keySet(); // keys={"a","b","c","d"}
keys.remove("d"); // s={("a",1),("b",2),("c",3)}
// Retrieving values (view of map)
IntCollection values = s.values(); // values={1,2,3}
// Iterate over or do something else with values here
// Finding size of map
int size = s.size(); // size=3, s={("a",1),("b",2),("c",3)}
// Comparing maps
t = new ObjectKeyIntOpenHashMap(s);
c = s.equals(t); // c=true, s={("a",1),("b",2),("c",3)}, t={("a",1),("b",2),("c",3)}
t.remove("a"); // s={("a",1),("b",2),("c",3)}, t={("b",2),("c",3)}
c = s.equals(t); // c=false
// Clearing
s.clear(); // s={}
c = s.isEmpty(); // c=true
}
}
Other examples:
Type | Interface |
---|---|
boolean |
bak.pcj.map.ObjectKeyBooleanMap |
char |
bak.pcj.map.ObjectKeyCharMap |
byte |
bak.pcj.map.ObjectKeyByteMap |
short |
bak.pcj.map.ObjectKeyShortMap |
int |
bak.pcj.map.ObjectKeyIntMap |
long |
bak.pcj.map.ObjectKeyLongMap |
float |
bak.pcj.map.ObjectKeyFloatMap |
double |
bak.pcj.map.ObjectKeyDoubleMap |
ObjectKeyTOpenHashMap
classes
The ObjectKeyTOpenHashMap classes
provide a hash table based implementation of
the ObjectKeyTMap
interface with
open addressing.
See ObjectKeyTMap
for examples.
ObjectKeyTChainedHashMap
classes
The ObjectKeyTChainedHashMap classes
provide a hash table based implementation of
the ObjectKeyTMap
interface with
chained hashing.
See ObjectKeyTMap
for examples.
The adapter classes of PCJ make it possible to interchange PCJ and JCF collections.
PCJ collections contain
only primitive type elements whereas JCF collections contain
only object type elements, so type conversion is necessary
to make a PCJ collection act as a JCF collection and vice
versa. There are restrictions to the types that can be
converted. For example, it is not possible to adapt a JCF
collection of String
values to a PCJ
collection, and a PCJ collection of char
values
cannot be converted to a JCF collection of
Locale
values. The only conversions allowed are
those of wrapping and unwrapping primitive values. The table
below shows how primitive types map to object types and the
other way around.
Primitive type | Object type |
---|---|
boolean | Boolean |
char | Character |
byte | Byte |
short | Short |
int | Integer |
long | Long |
float | Float |
double | Double |
For example, a PCJ collection of int
values
can be adapted to a JCF collection of Integer
values. And a JCF collection of Character
values can be adapted to a PCJ collection of
char
values.
As described above, there
are adapter classes for all main interfaces (though not for stacks and
deques), in both directions. An adapter class implements the interface
of the collection to which it adapts another collection, so
an adapter from an IntList
to a JCF List
implements the List
interface of JCF.
The example below shows how the
IntListToListAdapter
can be used directly as a
List
.
...
// Create list of 10 random int values
Random rnd = new Random();
IntList ilist = new IntArrayList();
for (int i = 0; i < 10; i++)
ilist.add(rnd.nextInt());
// Adapt it to list of Integer values
List list = new IntListToListAdapter(ilist);
// Sort using JCF sorting
Collections.sort(list);
...
Adapter classes are wrappers around their counterparts -
they are not copies. So changes to a collection through an
adapter are reflected by the underlying collection and vice
versa. If we iterated over both collections,
list
and ilist
in the example
above, printing the elements, two equal number sequences
would be printed. The list
object is only
a view of the ilist
object.
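The view mechanism can be sketched in plain Java: the adapter delegates every call to the backing storage, wrapping on reads and unwrapping on writes, so nothing is ever copied. The sketch below adapts a plain int array rather than a PCJ list, to stay self-contained:

```java
import java.util.AbstractList;
import java.util.Collections;
import java.util.List;

// Sketch of the adapter idea: a List<Integer> view of an int array.
// Nothing is copied; reads wrap values on the way out, and writes
// through set() are reflected directly in the underlying array.
class IntArrayToListAdapter extends AbstractList<Integer> {
    private final int[] backing;

    IntArrayToListAdapter(int[] backing) { this.backing = backing; }

    @Override public Integer get(int index) { return backing[index]; }   // wrap on read
    @Override public Integer set(int index, Integer v) {
        int old = backing[index];
        backing[index] = v;                                              // unwrap on write
        return old;
    }
    @Override public int size() { return backing.length; }
}
```

Sorting the view with Collections.sort rearranges the underlying int array itself, just as sorting the IntListToListAdapter in the example above rearranges ilist.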
All adapter classes can be found in the package
bak.pcj.adapter
.
An alternative is to use the
bak.pcj.Adapter
class which has factory methods for creating adapters. Its method naming scheme is somewhat
simpler than that of the adapter classes. For example, an adaptation from
an IntList
to a List
can be created with
the call Adapter.asObjects(ilist)
.
Since a JCF collection is not statically typed (at least not below the
java.lang.Object
type) it cannot be guaranteed that it only contains
elements of a specific type.
By default, the adapters from JCF to PCJ do not directly perform any
checks of the backing JCF collection. There is no simple way of doing so,
since it is always possible to write a program that creates an adapter to,
say, int values on an empty JCF collection. After creation of the adapter
String
values or other values of wrong class can be added to
the JCF collection without the adapter knowing it.
Only when an element of the JCF collection is unwrapped by the adapter
is the element validated. Errors resulting from
illegal values in the underlying JCF collection can therefore be hard to find.
The example below illustrates the point:
List list = new ArrayList();
IntList ilist = new ListToIntListAdapter(list);
// Add one legal and two illegal values to list
list.add(new Integer(1));
list.add(null);
list.add("42");
// Set x = 1
int x = ilist.get(0);
// Will fail with a NullPointerException
int y = ilist.get(1);
...
// Will fail with a ClassCastException
int z = ilist.get(2);
As can be seen, the code fails not when an illegal value is added
to the underlying JCF collection (that would require an event listener mechanism
for JCF collections to intercept). Neither does it fail when the first
value of the illegal collection is retrieved because that value is legal.
It is only when the null
value at position 1
or the String
value at position 2
is retrieved
that the code fails with a runtime exception.
To perform runtime checks of the underlying JCF collections, a number of methods are provided. First, JCF collections can be validated when the adapter is created. All adapters from JCF to PCJ collections have a constructor that accepts a boolean value indicating whether validation should be performed. This is shown in the example below:
Set set;
...
// Create adapter, validate set
CharSet cset = new SetToCharSetAdapter(set, true);
The validating constructors throw a runtime exception if the JCF collection cannot be adapted at the moment of construction, but they do not ensure that no illegal values are added to the JCF collection later in the program flow. Apart from the constructors, all adapters from JCF to PCJ collections provide two methods for validation of the underlying JCF collection:
// Indicates whether underlying collection is valid
boolean validate();
// Throws exception if underlying collection is invalid
void evalidate() throws IllegalStateException;
Invocations of these methods can be inserted in the code whenever the adapted JCF collection may have been changed. The first method is designed for use with the assertion facility; the second for environments where assertions are not available. The example below shows typical uses of the validation methods:
List list = ExternalAPI.createData();
IntList ilist = new ListToIntListAdapter(list, true);
// ilist is safe here
...
ExternalAPI.modifyData(list); // <--- untrusted, may contain errors
// Validate after external modification
ilist.evalidate();
// Alternatively use the assertion facility
assert ilist.validate();
// ilist is safe here
...
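What such a validation pass does can be sketched in a few lines: walk the backing JCF collection and confirm every element can be unwrapped. The sketch below checks for Integer elements; it is illustrative only and not PCJ's code:

```java
import java.util.Collection;

// Sketch of adapter validation: confirm every element of the backing
// JCF collection is a non-null Integer, so unwrapping cannot fail later.
class AdapterValidation {
    static boolean validate(Collection<?> c) {
        for (Object o : c)
            if (!(o instanceof Integer)) return false;  // null also fails this test
        return true;
    }

    static void evalidate(Collection<?> c) {
        if (!validate(c))
            throw new IllegalStateException("collection contains non-Integer elements");
    }
}
```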
The validation methods are not intended for
normal operation because a failed validation means that the
program has errors. If you want to check whether a JCF collection
can be adapted without throwing exceptions, the Adapter
class has methods for that. They all follow the pattern isXAdaptable
.
Instead of adapting a JCF collection to a PCJ collection, it is sometimes a better alternative to create a PCJ copy of the JCF collection. The reason for this is that all operations unwrap values each time they are called. If a copy is created, the unwrapping is performed once and for all. Copying can be performed by creating a new PCJ collection, creating an adaption of the JCF collection to a PCJ collection, and finally adding the adapted collection's elements to the new PCJ collection:
List list;
...
// Create int list to hold copy
IntList ilist = new IntArrayList();
// Create an adapter to int values
IntList adapt = Adapter.asInts(list);
// Add adapted list's elements to create a copy
ilist.addAll(adapt);
The above example can be trimmed by taking advantage of the general one-argument constructor of collections, which is the preferred way of copying collections:
List list;
...
IntList ilist = new IntArrayList(Adapter.asInts(list));
PCJ comes with a benchmarking API (see
bak.pcj.benchmark
).
There are two main reasons why the API
is distributed along with the core library: 1) Benchmarks are machine
dependent. Running a benchmark on a different machine will yield different
results from those that I have provided. 2) They are very handy when
developing new classes implementing one of the main interfaces of PCJ.
Each benchmark exercises a collection class with some data set. There are
benchmarks for all int
variants of the PCJ classes and also
for their JCF counterparts with Integer
objects. There are
also benchmarks for JCF classes adapted to primitive collection classes
using the
bak.pcj.adapter
classes.
Two data sets have been used for producing the benchmark results. Both
are compact in the sense that they contain ranges of consecutive numbers.
One of the data sets is ordered and the other is shuffled (which makes a
huge difference for the TRangeSet
classes).
The data sets are available in the
bak.pcj.benchmark
package.
The benchmark results are available in the generated reports.
The benchmark names refer to the classes that produced the results. They can all
be found in the
bak.pcj.benchmark
package. Each benchmark exercises a specific class, which is named right below the
name of the benchmark. The adapter benchmarks adapt instances of
java.util.ArrayList
and
java.util.HashSet
.
Each benchmark contains a number of tasks which are executed for the data sets available.
For each task and data set, a timing result is given in milliseconds.
When benchmarking a JCF class, the time needed to wrap and unwrap objects is not included. I believe that the adapted JCF class provides a more realistic picture, since it performs the wrapping and unwrapping of primitive values that is necessary when working with JCF. In the tables below, timings are given as the PCJ time as a percentage of the JCF time. The timings for comparison with adapted JCF classes are given in parentheses.
It is interesting to compare the results of a JCF class with the results of an adapter of that class to a primitive collection. The difference tells us how much time is spent wrapping and unwrapping collection elements.
The table below shows the benchmark task times for
PCJ set classes as percentages of the corresponding benchmark task times for
HashSet
.
The percentages in parentheses show the task times of PCJ classes as
percentages of the corresponding benchmark task times for
a PCJ adapted HashSet
.
Task | Data | IntBitSet | IntChainedHashSet | IntOpenHashSet | IntRangeSet
---|---|---|---|---|---
AddNonExisting | Ordered/Compact/400000 | 344% (761%) | 94% (208%) | 14% (31%) | 13% (29%)
AddNonExisting | Shuffled/Compact/400000 | 1% (3%) | 62% (122%) | 18% (35%) | n/a
AddExisting | Ordered/Compact/400000 | 24% (12%) | 49% (24%) | 49% (24%) | 37% (18%)
AddExisting | Shuffled/Compact/400000 | 5% (3%) | 85% (49%) | 61% (35%) | n/a
Iterator | Ordered/Compact/400000 | 37% (22%) | 62% (38%) | 37% (22%) | 62% (38%)
Iterator | Shuffled/Compact/400000 | 27% (16%) | 63% (38%) | 27% (16%) | n/a
RemoveExisting | Ordered/Compact/400000 | 25% (3%) | 62% (7%) | 50% (6%) | 126% (15%)
RemoveExisting | Shuffled/Compact/400000 | 5% (1%) | 91% (30%) | 59% (19%) | n/a
ContainsExisting | Ordered/Compact/400000 | 16% (10%) | 83% (50%) | 50% (30%) | 66% (40%)
ContainsExisting | Shuffled/Compact/400000 | 2% (2%) | 94% (64%) | 61% (42%) | n/a
RemoveNonExisting | Ordered/Compact/400000 | 20% (16%) | 60% (50%) | 60% (50%) | 100% (83%)
RemoveNonExisting | Shuffled/Compact/400000 | 5% (4%) | 88% (76%) | 71% (61%) | n/a
ContainsNonExisting | Ordered/Compact/400000 | 20% (16%) | 40% (33%) | 60% (50%) | 100% (83%)
ContainsNonExisting | Shuffled/Compact/400000 | 0% (0%) | 88% (76%) | 66% (57%) | n/a
TRangeSet
performs extremely badly when
inserting or removing elements in random order. What happens
is that each range contains only one value, and in that
case a hashing approach is much better. The results are
excluded from the table since the time to produce them
became intolerable.
When inserting elements in ascending order (the ordered data set) into a TBitSet,
it expands its capacity by only 64 bits at a time, resulting in poor
performance.
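The cost of growing by a fixed 64 bits can be illustrated with a small sketch. This is an illustration of the growth arithmetic only, not PCJ's actual implementation: it counts how many 64-bit array words must be copied while inserting ascending bits under a grow-just-enough policy versus a capacity-doubling policy.

```java
public class GrowthSketch {
    // Count the words copied during reallocations while setting bits
    // 0..nBits-1 in ascending order. With doubling=false the capacity
    // grows one word (64 bits) at a time; with doubling=true it doubles.
    static long copiedWords(int nBits, boolean doubling) {
        long copied = 0;
        int capWords = 1;                        // capacity in 64-bit words
        for (int bit = 0; bit < nBits; bit++) {
            int neededWords = bit / 64 + 1;
            if (neededWords > capWords) {
                copied += capWords;              // old array is copied on grow
                capWords = doubling ? Math.max(neededWords, capWords * 2)
                                    : neededWords;
            }
        }
        return copied;
    }

    public static void main(String[] args) {
        int n = 400000;                          // size used by the benchmarks
        System.out.println("grow by one word: " + copiedWords(n, false));
        System.out.println("doubling:         " + copiedWords(n, true));
    }
}
```

Growing one word at a time copies on the order of n²/2 words in total, while doubling copies only about 2n words, which is why the ordered AddNonExisting task is so slow for the bit set.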
The table below shows the benchmark task times for
PCJ map classes as percentages of the corresponding benchmark task times for
HashMap
.
The percentages in parentheses show the task times of PCJ classes as
percentages of the corresponding benchmark task times for
a PCJ adapted HashMap
.
Task | Data | IntKeyIntChainedHashMap | IntKeyIntOpenHashMap | IntKeyChainedHashMap | IntKeyOpenHashMap
---|---|---|---|---|---
PutNonExisting | Ordered/Compact/400000 | 102% (60%) | 13% (8%) | 150% (124%) | 91% (76%)
PutNonExisting | Shuffled/Compact/400000 | 84% (65%) | 24% (18%) | 149% (135%) | 105% (96%)
PutExisting | Ordered/Compact/400000 | 87% (5%) | 50% (3%) | 87% (41%) | 62% (29%)
PutExisting | Shuffled/Compact/400000 | 89% (15%) | 102% (17%) | 89% (53%) | 111% (66%)
GetExisting | Ordered/Compact/400000 | 85% (40%) | 44% (20%) | 71% (50%) | 42% (30%)
GetExisting | Shuffled/Compact/400000 | 77% (44%) | 97% (56%) | 88% (61%) | 94% (66%)
Iterator | Ordered/Compact/400000 | 88% (44%) | 44% (22%) | 77% (43%) | 33% (18%)
Iterator | Shuffled/Compact/400000 | 92% (47%) | 23% (11%) | 84% (54%) | 30% (19%)
RemoveExisting | Ordered/Compact/400000 | 87% (38%) | 50% (22%) | 87% (58%) | 50% (33%)
RemoveExisting | Shuffled/Compact/400000 | 92% (53%) | 97% (56%) | 89% (60%) | 105% (71%)
RemoveNonExisting | Ordered/Compact/400000 | 75% (30%) | 75% (30%) | 75% (50%) | 75% (50%)
RemoveNonExisting | Shuffled/Compact/400000 | 88% (59%) | 72% (47%) | 94% (80%) | 72% (61%)
GetNonExisting | Ordered/Compact/400000 | 75% (33%) | 75% (33%) | 77% (62%) | 75% (60%)
GetNonExisting | Shuffled/Compact/400000 | 84% (59%) | 68% (47%) | 84% (80%) | 68% (65%)
The table below shows the benchmark task times for
PCJ list classes as percentages of the corresponding benchmark task times for
ArrayList
.
The percentages in parentheses show the task times of PCJ classes as
percentages of the corresponding benchmark task times for
a PCJ adapted ArrayList
.
Task | Data | IntArrayList | IntArrayDeque
---|---|---|---
RemoveNonExisting | Ordered/Compact/400000 | 15% (15%) | 16% (16%)
RemoveNonExisting | Shuffled/Compact/400000 | 15% (15%) | 16% (16%)
RemoveExisting | Ordered/Compact/400000 | 111% (106%) | 0% (0%)
RemoveExisting | Shuffled/Compact/400000 | 111% (106%) | 0% (0%)
AddExisting | Ordered/Compact/400000 | 38% (5%) | 38% (5%)
AddExisting | Shuffled/Compact/400000 | 38% (6%) | 45% (7%)
Iterator | Ordered/Compact/400000 | 66% (50%) | 133% (100%)
Iterator | Shuffled/Compact/400000 | 150% (75%) | 200% (100%)
RemoveEnd | Ordered/Compact/400000 | 0% (0%) | 0% (0%)
RemoveEnd | Shuffled/Compact/400000 | 0% (0%) | 0% (0%)
ContainsNonExisting | Ordered/Compact/400000 | 21% (15%) | 23% (16%)
ContainsNonExisting | Shuffled/Compact/400000 | 21% (15%) | 23% (16%)
AddNonExisting | Ordered/Compact/400000 | 36% (21%) | 35% (20%)
AddNonExisting | Shuffled/Compact/400000 | 35% (20%) | 43% (24%)
AddMiddle | Ordered/Compact/400000 | 90% (101%) | 90% (101%)
AddMiddle | Shuffled/Compact/400000 | 90% (100%) | 90% (101%)
RemoveBeginning | Ordered/Compact/400000 | 111% (106%) | 0% (0%)
RemoveBeginning | Shuffled/Compact/400000 | 111% (105%) | 0% (0%)
RemoveMiddle | Ordered/Compact/400000 | 108% (104%) | 108% (104%)
RemoveMiddle | Shuffled/Compact/400000 | 109% (104%) | 109% (104%)
AddBeginning | Ordered/Compact/400000 | 97% (97%) | 0% (0%)
AddBeginning | Shuffled/Compact/400000 | 97% (97%) | 0% (0%)
ContainsExisting | Ordered/Compact/400000 | 16% (5%) | 16% (5%)
ContainsExisting | Shuffled/Compact/400000 | 20% (6%) | 20% (6%)
It may be useful to run the benchmarks on your own computer or with
your own benchmark classes. To do so, you can use the classes
bak.pcj.benchmark.BenchmarkRunner
and
bak.pcj.benchmark.ResultCollector
.
The BenchmarkRunner
can run a benchmark on a specified data set
with a specified size, and write the results to a file. When you have run a number
of benchmarks, the results can be collected in a report using the
ResultCollector
.
The syntax for running the BenchmarkRunner
is:
bak.pcj.benchmark.BenchmarkRunner <data set class> <data set size> <benchmark class> <output file>
where <data set class> is the name of a class implementing
bak.pcj.benchmark.DataSet.
DataSet
classes will automatically produce data of the specified size.
<benchmark class> is the name of a class implementing
bak.pcj.benchmark.Benchmark.
The standard benchmarks can be found in the
bak.pcj.benchmark
package which also contains some abstract bases for implementing new benchmarks.
To run the standard benchmark for IntBitSet with
a data set of 200000 consecutive shuffled values and write the results
to a file called output.txt, write:
java -cp pcj.jar;pcj-benchmark.jar bak.pcj.benchmark.BenchmarkRunner \
    bak.pcj.benchmark.DataSetShuffledCompact 200000 \
    bak.pcj.benchmark.IntBitSetBenchmark output.txt
When a number of benchmarks have been run, the results can be collected
and formatted as HTML using the
bak.pcj.benchmark.ResultCollector
.
The syntax for using the ResultCollector
is:
bak.pcj.benchmark.ResultCollector <output file> <title> <result1> <result2> ...
where <output file> is the name of the report file to generate,
<title> is a title for the report, and <result1>, <result2>, ... are the
names of result files (produced by the
BenchmarkRunner
).
You need a setup where the JVM can run relatively undisturbed by other processes and can take plenty of memory without the operating system swapping memory out to the hard disk. Some of my first runs took so much memory that Windows began swapping in the middle of the benchmark, invalidating the results.
Most of the benchmarks use a lot of memory. Since you cannot rely 100% on garbage collection, you should only run one or two benchmarks during the same invocation of the JVM (that is one of the reasons why it is possible to collect benchmark results from separate reports into one report). Alternatively, you can cut down the size of the data sets, but that will make the time measurements less certain, since the benchmarks won't be allowed to run for as long as they would with larger data sets.
If you have Windows 2000/NT, you can use the task manager to verify your benchmark results. Each benchmark run should look like two trapezoids on the CPU-usage graph, completely flat on top (at 100%). If they are not, the run was interrupted by a process waiting for some resource to become available, e.g. memory that has been swapped out.
Report a bug or request a feature.
Further information on the development and latest release of PCJ can be found at the project homepage.
Primitive Collections for Java is released under the GNU Lesser General Public License.
Copyright © 2002, 2003 Søren Bak. All Rights Reserved.
Hosted by SourceForge.net
Java is a trademark or registered trademark of Sun Microsystems, Inc. in the US and other countries.
Document version: 1.4, August 24, 2003