Wednesday, April 2, 2008

Set

Set:
A Set represents a mathematical set.
It is a Collection that, unlike List, does not allow duplicates.
There must not be two elements of a Set, say e1 and e2, such that e1.equals(e2).
The add method of Set returns false if you try to add a duplicate element.

import java.util.HashSet;
import java.util.Set;
public class MainClass {
public static void main(String[] a) {
Set set = new HashSet();
set.add("Hello");
if (set.add("Hello")) {
System.out.println("addition successful");
} else {
System.out.println("addition failed");
}
}
}

This tutorial is about Sets (java.util.Set interface). I assume that the audience have mathematical background of set theory.

Sets cannot contain duplicate elements and you cannot add elements at a certain position unlike List. The classes implementing the Set interface are:AbstractSet, HashSet, LinkedHashSet, TreeSet

Set has 3 general purpose implementations:

- HashSet
- TreeSet
- LinkedHashSet

HashSet does not offer any ordering of elements. So HashSet is not the right choice if you value oriented operations are required. For that, TreeSet is proffered. But it is true to say that HashSet is used more often than TreeSet because HashSet is more faster and the reason is obvious i.e no ordering of elements. LinkedHashSet is something in between HashSet and TreeSet. It is as fast as HashSet and is implements using HashSet and LinkedList. The benefit of LinkedHashSet is that it provides ordering feature but the cost of ordering is not even close to what provided by TreeSet.

HashSet

HashSet implement Cloneable, Collection, Serializable and Set interface.
Since HashSet implements Cloneable interface, it means that HashSet can use clone() method to make a copy (field to field) of its object. HashSet is also serializable, so it can be saved on disk and then retrieved later for use.
JobStateReasons and LinkedHashSet are direct subclasses of HashSet.

HashSet has 4 constructors:

HashSet()
HashSet(Collection c)
HashSet(int initialCapacity)
HashSet(int initialCapacity, float loadFactor)

If you use default HastSet constructor, it will create an empty HashSet of default initial capacity (16) and load factor (0.75).

Add() method of the HashSet add the object into the storage if it is not already present. Duplicates are not allowed as it’s a set.

Lets review the code below. We will just add unique values to a HashSet and will display the contents of HashSet.
Code:
String [] strArray = new String[10];

for loop strArray.length with i
strArray[i] = "Element" + i;

HashSet s = new HashSet();

for loop strArray.length with i{
if(!s.add(strArray[i]))
System.out.println("Duplicate found : " + strArray[i]);
}

System.out.println(s.size() + " distinct words detected : " + s );
}
Output:
Code:
10 distinct words detected : [Element6, Element7, Element1, Element9,
Element3, Element8, Element2, Element0, Element5, Element4]
As you know, HashSet does not allow duplicates, so lets try this in the following example.
Code:
String [] strArray = new String[10];
for loop strArray.length with i
strArray[i] = "Element" + i;

strArray[1] = "Element2";
strArray[7] = "Element3";
strArray[8] = "Element4";

HashSet s = new HashSet();

for loop strArray.length with i{
if(!s.add(strArray[i]))
System.out.println("Duplicate found : " + strArray[i]);
}

System.out.println(s.size() + " distinct words detected : " + s );
}
Output:
Code:
Duplicate found : Element2
Duplicate found : Element3
Duplicate found : Element4
7 distinct words detected : [Element6, Element9, Element3, Element2, Element0, Element5, Element4]
We tried to add duplicates into the HastSet which is not allowed. Thing to note is, that no exception will be raised in case of duplicates. Also note the order of entries in HashSet. The order of elements is not maintained.

Creating a HashSet through a Collection

HashSet(Collection c) constructor is used to store a collection as a HastSet. It is very useful in scenarios where you have some collection (it can be a List (e.g. Vector/ArrayList) and you want to store it into a HashSet. Lets review the following example where a HashSet is created using a Vector.
Code:
Vector vec = new Vector();
vec.add("String1");
vec.add("String2");
vec.add("String3");
HashSet hs = new HashSet(vec);
Writing/Retriving HashSet to disk

Since HashSet implements Serializable interface, its objects can be written and retrieved from the disk. In other words, we can save the state of a HashSet and can retrieve it when needed.
Code:
HashSet hs = new HashSet();
hs.add("String1");
hs.add("String2");
hs.add("String3");

FileOutputStream f_out = new FileOutputStream("C:\\hashset.data");
ObjectOutputStream obj_out = new ObjectOutputStream (f_out);

obj_out.writeObject (hs);
System.out.println("HashSet written on the disk.");

FileInputStream f_in = new FileInputStream("C:\\hashset.data");
ObjectInputStream obj_in = new ObjectInputStream (f_in);
HashSet hashset =
new HashSet((HashSet)obj_in.readObject());

System.out.println("HashSet fetched from the disk.");

Iterator it = hashset.iterator();

while(it.hasNext())
System.out.println(it.next().toString());
Union and Intersection Example

Following is an interesting example of union and intersection. I assume that you are familiar with basics of set theory and know what union and intersection are.
Code:
public static void main(String[] args) {
Set s1 = new HashSet();
s1.add("Australia");
s1.add("Sweden");
s1.add("Germany");

Set s2 = new HashSet();
s2.add("Sweden");
s2.add("France");

Set union = new TreeSet(s1);
union.addAll(s2);

print("union", union);

Set intersect = new TreeSet(s1);
intersect.retainAll(s2);

print("intersection", intersect);
}

protected static void print(String label, Collection c) {

System.out.println("--------------" + label + "--------------");

Iterator it = c.iterator();
while (it.hasNext()) {
System.out.println(it.next());
}
}
Output:
Code:
--------------union--------------
Australia
France
Germany
Sweden
--------------intersection--------------
Sweden
TreeSet

Treeset implements Cloneable, Collection, Serializable, Set and SortedSet interfaces. This class maintains order of the elements in it according to the natural order of the elements which was missing in the HashSet.

TreeSet is not synchronized so if a TreeSet is concurrently accessed by threads then thread modifying the contents must be synchronized externally.

TreeSet provides following constructors:

TreeSet()
TreeSet(Collection c)
TreeSet(Comparator c)
TreeSet(SortedSet s)
Lets dig deep into TreeSet using example. In the following example, we created an empty TreeSet using default constructor. Then we added some values into it.
Code:
TreeSet ts = new TreeSet();
ts.add("String1");
ts.add("String3");
ts.add("String2");

Iterator it = ts.iterator();

while(it.hasNext())
System.out.println(it.next().toString());
Output:
Code:
String1
String2
String3
We used iterator to print the contents of the TreeSet. Thing to note is the order of storage. Order of storage is ascending one.

We cannot insert element at a particular index and also cannot fetch element from a particular index. This is done in List. Also duplicates are not allowed in TreeSet. This is a property of Sets. If you try to add duplicate value, no exception will be thrown. Lets do this with an example:
Code:
TreeSet ts = new TreeSet();
ts.add("Australia");
ts.add("Germany");
ts.add("Zambia");
ts.add("Australia");
ts.add("Zambia");

Iterator it = ts.iterator();

while(it.hasNext())
System.out.println(it.next().toString());
Output:
Code:
Australia
Germany
Zambia
Output suggests two things. The objects are stored in sorted order and duplicates are not allowed.

Contents of TreeList can be cleared using clear() method. Following example creates a TreeSet from a Vector and then clears the contents.
Code:
Vector vec = new Vector();
vec.add("String3");
vec.add("String1");
vec.add("String3");

TreeSet ts = new TreeSet(vec);
Iterator it = ts.iterator();

while(it.hasNext())
System.out.println("TreeSet: " + it.next().toString());
ts.clear();
while(it.hasNext())
System.out.println("TreeSet_Cleared: " + it.next().toString());
Output:
Code:
TreeSet: String1
TreeSet: String3
Creating a TreeSet through a Collection

A TreeSet can be created using any collection. TreeSet(Collection c) is the constructor to use in such condition. Any collection for example: Vector, ArrayList, HashSet or any can be used to create a TreeSet. Following is an interesting example:
Code:
Vector vec = new Vector();
vec.add("String3");
vec.add("String1");
vec.add("String3");

Iterator it = vec.iterator();

while(it.hasNext())
System.out.println("Vector: " + it.next().toString());

TreeSet ts = new TreeSet(vec);
it = ts.iterator();

while(it.hasNext())
System.out.println("TreeSet: " + it.next().toString());
Output:
Code:
Vector: String3
Vector: String1
Vector: String3
TreeSet: String1
TreeSet: String3
Vector is List that can contain duplicates and its not ordered. In the example above, we created a TreeSet from a Vector that contained an unordered list of objects along with duplicates. Some of you might thing that its not possible to create a TreeSet from it but it is possible. The output of the above example shows that a TreeList is created from the Vector with all the duplicated removed. Furthermore, contents of newly created TreeList are ordered.

Can we create a TreeList from an exiting HashSet? Since HashSet is also a collection, so answer is yes. The following example first creates a HashSet, populates it and then creates a TreeSet using already created HashSet.
Code:
HashSet hs = new HashSet();
hs.add("String2");
hs.add("String3");
hs.add("String1");
hs.add("String2");

TreeSet ts = new TreeSet(hs);

Iterator it = ts.iterator();

while(it.hasNext())
System.out.println("TreeSet: " + it.next().toString());
Output:
Code:
TreeSet: String1
TreeSet: String2
TreeSet: String3
TreeSet with generics

Java 5 introduced generics which adds more power to Java. TreeSet stores objects in it, but that object can be of any type. It can be a String, Integer, Float or even a custom object. With generics, you can specify the type of objects that are allowed to be stored in a TreeSet. In the following example, I tried to store an integer into a Vector that is bounded by String.
Code:
TreeSet ts = new TreeSet();
ts.add(1);
ts.add(2);
ts.add(3);
ts.add("String2");
Output:
Code:
The method add(Integer) in the type TreeSet is not applicable for the arguments (String)
TreeSet Example

You now will have some understanding of TreeSet. Review the example below. It has some interesting lines of code.
Code:
SortedSet sortedSet = new TreeSet(Arrays
.asList("one two three four five six seven eight".split(" ")));
System.out.println(sortedSet);
Object low = sortedSet.first();
Object high = sortedSet.last();
System.out.println(low);
System.out.println(high);
Iterator it = sortedSet.iterator();

System.out.println("low: " + low);
System.out.println("high: " + high);
System.out.println(sortedSet.subSet(low, high));
System.out.println(sortedSet.headSet(high));
System.out.println(sortedSet.headSet(low));
System.out.println(sortedSet.tailSet(low));
System.out.println(sortedSet.tailSet(high));
Output:
Code:
[eight, five, four, one, seven, six, three, two]
eight
two
low: eight
high: two
[eight, five, four, one, seven, six, three]
[eight, five, four, one, seven, six, three]
[]
[eight, five, four, one, seven, six, three, two]
[two]
As we already know, ordering is important in TreeList. This is shown in the example below. We also see methods like first, last, headset, tailSet which produced interesting results.

Method headSet() returns a view of the portion of this set whose elements are strictly less than toElement.
Method tailSet() returns a view of the portion of this set whose elements are greater than or equal to fromElement.

Differences between HashSet and TreeSet

Many developer are confused about sets and ask when to use HashSet and when to use TreeSet. Keep following points in mind for deciding the right set.
- HashSet is much faster than TreeSet.
- HashSet offers no ordering guarantees.
- TreeSet is preffered when performing operations in the SortedSet.
- HashSet can be tuned by specifying the right initial capacity using the constructor HashSet(int initialCapacity).
- HashSet can also be tuned used load factor parameter. If you choose right value for this, it can boost the performance a bit but there there won't be a big difference.
- TreeSet has no tuning parameters.

LinkedHashSet

LinkedHashSet handles collisions by using hash-bucket approach to avoid linear probing and rehashing. It uses doubly-linked list to maintain the insertion order. LinkedHashSet can accept the null entry and it is not synchronized.

LinkedHashSet implements Cloneable, Collection, Serializable and Set interface. It’s a set to no duplicates are allowed. LinkedHashSet has following 4 constructors:

LinkedHashSet()
LinkedHashSet(Collection c)
LinkedHashSet(int initialCapacity)
LinkedHashSet(int initialCapacity, float loadFactor

Thing will become obvious when you start using LinkedHashSet. Lets take an example to make things easier to understand.
Code:
LinkedHashSet lhs = new LinkedHashSet();

lhs.add("String1");
lhs.add("String9");
lhs.add("String7");
lhs.add("String6");
lhs.add("String7");

Iterator it = lhs.iterator();

while(it.hasNext())
System.out.println("LinkedHashSet: " + it.next().toString());
Output:
Code:
LinkedHashSet: String1
LinkedHashSet: String9
LinkedHashSet: String7
LinkedHashSet: String6
We declared a LinkedHashSet and populated it. Then we used iterator to print the contents of LinkedHashSet. The output suggests that the order of insertion is maintained and duplicates are not allowed.

Set Example

Now we know what HashSet, TreeSet and LinkedHashSet are and when to use these. Lets review the following example that will use all of the 3 sets and will clear a lot of confusions:
Code:
package set;

import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.TreeSet;

public class HastSetTest {

static void fillSet(Set set) {
set.addAll(Arrays.asList("Zambia Australia Germany France".split(" ")));
}

public static void testSet(Set set) {
System.out.println(set.getClass().getName().replaceAll("\\w+\\.", ""));
fillSet(set);
fillSet(set);
fillSet(set);
System.out.println(set);
set.addAll(set);
set.add("Germany");
set.add("Germany");

System.out.println(set);
System.out.println("s.contains(\"Zambia\"): " + set.contains("Zambia"));
}

public static void main(String[] args) {
testSet(new HashSet());
testSet(new TreeSet());
testSet(new LinkedHashSet());
}
}
Output:
Code:
HashSet
[Germany, France, Zambia, Australia]
[Germany, France, Zambia, Australia]
s.contains("Zambia"): true
TreeSet
[Australia, France, Germany, Zambia]
[Australia, France, Germany, Zambia]
s.contains("Zambia"): true
LinkedHashSet
[Zambia, Australia, Germany, France]
[Zambia, Australia, Germany, France]
s.contains("Zambia"): true
The output suggests that no duplicate is added in any set. The order of elements is all of the 3 sets is different. LinkedList maintained the order of insertion. TreeSet maintained alphabetic order whereas HashSet does not kept any order. This is what we talked about above and this example confirms what we discussed.

0 comments:

Google