Routines for Set manipulation

Post

Posted
Rating:
#1 (In Topic #932)
Avatar
Trainee
Median Joe is in the usergroup ‘Trainee’
The set data structure is quite simple but can be very useful, and in fact the whole of mathematics can be founded on sets.  A set is just a collection of values or items which are unique and unordered.

e.g. S1 = {1,2,3}

This is identical to the set {3,1,1,2,3,2} because repeated elements don't count, and neither does the order in which they occur.

The first language I learned (back in the nineties) was Turbo Pascal, which had sets built in. Gambas doesn't have native support for them, so I wrote a few simple routines to handle the most commonly used set operations and relations.

There is more than one way of implementing the routines, but I've chosen the "bit vector" approach, which is quite efficient and easy to understand. The basic idea is to use an array of boolean the elements of which (true/false) indicate the presence or absence of an item in the set (Note : I have restricted set elements to integers in these routines).

For example, consider the set operation of Intersection, which takes sets S1, S2, and returns a third set the elements of which are common to both:

S1 = {1,2,3,4,5}
S2 = {4,5,6,7,8}

So S1 Intersection S2 = {4,5}

The simplest operation is adding an element to the set. To do this, declare an array of boolean large enough to hold the highest value in your set(s) + 1 (+ 1, because array indexing starts at zero). All elements in the arrays should initially be set to False, which signifies an empty set, and to add the elements of S1 above to this initially empty set use the following loop :

Code

Dim S1 As New Boolean[9]
Dim i As Integer
For i = 1 to 5
  S1[i] = True
Next

Use a similar loop to populate the set S2. Now to find the elements common to both sets, declare another boolean array of the same size, call it "result" say, and do the following :

Code

For i = 0 to 9
  If s1[i] and s2[i] Then
    result[i] = True
  Endif
Next   

See how it works? Here are all the routines, which are hopefully self-explanatory.

Code

'Set Routines

'clear/initialise set
SUB set_clear(s AS boolean[])
  s.Fill(FALSE)
END

'add element to a set
SUB set_incl(n AS integer, s AS boolean[])
  s[n] = TRUE  
END

'remove element from set
SUB set_excl(n AS integer, s AS boolean[])
  s[n] = FALSE
END

'Is an element in the set?
FUNCTION set_in(n AS integer, s AS boolean[]) AS boolean
  RETURN s[n]
END

'cardinality (return number of elements in set)
FUNCTION set_card(s AS boolean[]) AS integer
  DIM n AS integer,i AS integer
  n = 0
  FOR i = 0 TO s.Max
    IF s[i] = TRUE THEN
      INC n
    ENDIF
  NEXT
  RETURN n
END

'intersection
SUB set_inter(s1 AS boolean[],s2 AS boolean[],result AS boolean[])
  DIM i AS integer
  FOR i = 0 TO s1.Max
    IF s1[i] and s2[i] THEN
      result[i] = TRUE
    ENDIF
  NEXT   
END

'union
SUB set_union(s1 AS boolean[],s2 AS boolean[],result AS boolean[])
  DIM i AS integer
  FOR i = 0 TO s1.Max
    IF s1[i] or s2[i] THEN
      result[i] = TRUE
    ENDIF
  NEXT   
END

'difference (s1 - s2)
SUB set_diff(s1 AS boolean[],s2 AS boolean[],result AS boolean[])
  DIM i AS integer
  FOR i = 0 TO s1.Max
    IF s1[i] and not s2[i] THEN
      result[i] = TRUE
    ENDIF
  NEXT   
END

'symmetric difference (result is union(s1,s2) - inter(s1,s2))
SUB set_symdiff(s1 AS boolean[],s2 AS boolean[],result AS boolean[])
  DIM i AS integer
  FOR i = 0 TO s1.Max
    IF s1[i] xor s2[i] THEN
      result[i] = TRUE
    ENDIF
  NEXT   
END

' are the sets equal?
FUNCTION set_isequal(s1 AS boolean[],s2 AS boolean[]) AS boolean
  DIM i AS integer
  DIM equal AS boolean
  equal = TRUE
  FOR i = 0 TO s1.Max
    IF s1[i] <> s2[i] THEN
      equal = FALSE
      BREAK
    ENDIF
  NEXT
  RETURN equal
END  

'Is set s1 a subset of set s2?
FUNCTION set_issubset(s1 AS boolean[],s2 AS boolean[]) AS boolean
  DIM subset AS boolean
  DIM i AS integer
  subset = TRUE
  FOR i = 0 TO s1.Max
    IF NOT ((s1[i] AND s2[i]) = s1[i]) THEN
      subset = FALSE
      BREAK
    ENDIF
  NEXT
  RETURN subset
END

'show a set
SUB set_show(s AS boolean[])
  DIM i AS integer
  FOR i = 0 TO s.Max
    IF s[i] = TRUE THEN
      PRINT i;;
    ENDIF
  NEXT
  PRINT
END

Here are a couple of example scripts to test the routines. I saved the routines in a separate file and put the INCLUDE command at the top of each script.

Code

#!/usr/bin/env gbs3

INCLUDE "sets.gbs3"

CONST SIZE AS byte = 10

DIM A AS NEW boolean[SIZE]
DIM B AS NEW boolean[SIZE]
DIM C AS NEW boolean[SIZE]
DIM result AS NEW boolean[SIZE]
DIM i AS integer

'initialize
set_clear(A)
set_clear(B)
set_clear(C)

' A = \{1,2,3,4,5,6,7}
' B = \{4,5,6}
' C = \{6,7,8,9}
' fill sets
FOR i = 1 TO 7
  set_incl(i, A)
NEXT
FOR i = 4 TO 6
  set_incl(i, B)
NEXT
FOR i = 6 TO 9
  set_incl(i, C)
NEXT

'show sets
PRINT "A: ";
set_show(A)
PRINT "B: ";
set_show(B)
PRINT "C: ";
set_show(C)

'intersection
print "Intersection of A & B: ";
set_inter(A, B, result)
set_show(result)

'union
PRINT "Union of B & C: ";
set_union(B, C, result)
set_show(result)

'difference between B & C (B - C)
set_clear(result)
PRINT "Difference between C & A (C - A): ";
set_diff(C, A, result)
set_show(result)

'symmetric difference between A & C
set_clear(result)
PRINT "Symmetric difference of A & C: ";
set_symdiff(A, C, result)
set_show(result)

'subset; is C a subset of A?
PRINT "Is C a subset of A? ";
PRINT IF(set_issubset(C, A), "True", "False")
PRINT "Is B a subset of A? ";
PRINT IF(set_issubset(B, A), "True", "False")

' how many elements in A?
PRINT "Set A has ";
PRINT set_card(A);
PRINT " elements"

' remove elements 1..5 from A
FOR i = 1 TO 5
  set_excl(i, A)
NEXT  
PRINT "Removing elements 1..5 from A..."
PRINT "A: ";
set_show(A)

' add elements 8,9 to A
PRINT "Adding elements 8 & 9 to A..."
PRINT "A: ";
set_incl(8, A)
set_incl(9, A)
set_show(A)

'equal sets
PRINT "Does A = C?: ";
PRINT IF(set_isequal(A, C), "True", "False")
PRINT "Does A = B?: ";
PRINT IF(set_isequal(A, B), "True", "False")

This example is a bit more interesting.

Code

#!/usr/bin/env gbs3

include "sets.gbs3"

'Find the probability of seeing all 6 numbers in 6 rolls of a die.

RANDOMIZE

CONST TRIALS AS integer = 100000
DIM uniques AS NEW boolean[7]
DIM i,j,roll,n,success AS integer

success = 0
FOR j = 1 TO TRIALS
  FOR i = 1 TO 6
    roll = rand(1,6)
    set_incl(roll, uniques)
  NEXT
  IF set_card(uniques) = 6 THEN
    INC success  
  ENDIF
  ' reset for next trial
  set_clear(uniques)
NEXT
PRINT "Probability =  "; success/TRIALS  

I'm new to Gambas so there are probably ways the code can be improved. Any feedback welcome!

P.S. For more info on sets see here and here.
Online now: No Back to the top

Post

Posted
Rating:
#2
Avatar
Guru
cogier is in the usergroup ‘Guru’
cogier is in the usergroup ‘GambOS Contributor’
Well, I read about sets from your links and now my brain hurts! What was missing was a useful example. What do you do with these sets?
Online now: No Back to the top

Post

Posted
Rating:
#3
Avatar
Trainee
Median Joe is in the usergroup ‘Trainee’

cogier said

now my brain hurts!

That's how I feel about OOP!

What was missing was a useful example. What do you do with these sets?

Well it depends what you mean by "useful". I quite often write code similar to the above probability example, to find probabilities that are too complex to compute using maths. The database language SQL uses set operations heavily to filter data, so they can be used for that purpose when a full-blown database is overkill. Related to this application is using sets as flags. You might want to keep track of certain states in a game, for example. You would normally use boolean variables for this but what if you have many states to monitor which belong to well-defined categories? Sets can be used here because they are just boolean arrays. You can select and turn on/off particular states using set operations.

Use of sets can quite often shorten code; suppose you want to take some action depending on whether a number is in a certain range, eg,
 
IF(n >= 5 AND n <= 10) OR (n >= 25 AND n <= 30) THEN…

Using a set with the appropriate range you need only write IF set_in(n, ranges) THEN… which is simpler and more readable.
Online now: No Back to the top

Post

Posted
Rating:
#4
Avatar
Guru
cogier is in the usergroup ‘Guru’
cogier is in the usergroup ‘GambOS Contributor’
Use of sets can quite often shorten code; suppose you want to take some action depending on whether a number is in a certain range, eg,

IF(n >= 5 AND n <= 10) OR (n >= 25 AND n <= 30) THEN…

Using a set with the appropriate range you need only write IF set_in(n, ranges) THEN… which is simpler and more readable.

This would be my solution to your example.

Code (gambas)

  1.   Dim iSet As Integer[] = [6, 7, 8, 9, 26, 27, 28, 29]
  2.   Dim iToFind As Integer = 5
  3.  
  4.   If iSet.Exist(iToFind) Then Print iToFind Else Print "Error"
  5.  

There is always more than one way to look at these things. I look forward to more posts from you.  :D
Online now: No Back to the top
1 guest and 0 members have just viewed this.