Below we give pseudocode for the fGES algorithm assuming one-edge faithfulness. For the continuous case, we use the BIC score; for the discrete case we use the BDeu score. Parallelization procedures are described following the pseudocode.
The actual value of I(x, y, Z) is used as a heuristic for adjusting the order in which “Arrows” (see below) are considered in the forward and backward searches. Various scoring measures may be considered other than those considered above (e.g., Gibb’s sampling estimation of posteriors; Fisher Z scores, etc.); we have not explored which of these measures yields the highest accuracy, although the continuous BIC and discrete BDeu yield good precision and recall, as was shown in the text.
We make the following definitions and a few explanations in advance of the pseudocode. An Arrow is a 4-tuple <xày, NaYX, S, b> where xày is a directed edge; x—y is an undirected edge. If xày, then x is a parent of y; b is a score difference. The set adj(y) is the set of all neighbors of Y adjacent to x—that is, the set of all z such that zày, zßy, or z—y (i.e., z and x are connected by an undirected edge), and S is a set of nodes, interpreted as T in the forward direction and H in the backward direction (double duty), where T is the set of all neighbors of y not adjacent to x, and H Í NaYX. NaYX is the set of nodes z such that z—x and z is adjacent to y. The list sortedArrows is a list of Arrows kept in sorted decreasing order by score difference and containing only Arrows with positive score difference. The map lookupArrows(xày) is from Arrows to sets of edges xày with various NaYX sets, S sets, and B values that were added in the same call to CalculateArrowsForward or CalculateArrowsBackward. The map neighbors is a global map from nodes to sets of nodes. It is used as an optimization to check for changes in neighbors, since if the neighbors of a node have not changed, any NaYX, T, or H set using those neighbors cannot have changed either, so they do not need to be further processed by reevaluateForward or reevaluateBackward.
G is the graph that is being built. It starts from an empty graph and after each edge addition or removal is reverted to a pattern or cpDAG (consisting of directed and undirected edges) representing a Markov Equivalence Class (MEC). Thus, the G moves through MEC space as the algorithm progresses. The final pattern in this sequence is returned. A semidirected path X*-*…*-*Y in G is a path consisting of directed or undirected edges with no directed edges pointing back to X. (It may consist of all undirected edges.) Parents(X, G) is the set of parents of X in G. Adj(X, G) is the set of adjacent nodes of X in G.
In what follows later procedures may call earlier ones.
// V is a set of nodes
// Meek rules 1, 2, and 3 are used (away from collider, away from cycle,
// and double triangle), from (Meek, 1995). After each call to this operation,
// G is a pattern.
Procedure ApplyMeekRulesLocally(V, G)
1. R <- V
2. Add x and y to R for any edge xày or x—y that was oriented or unoriented,
- Let x and y be “implicated” in an unshielded collider if there exists a z such that x->z<-y and ~adj(x, y). Remove orientations about variables in V that are not implicated in unshielded colliders in this sense.
- Repeat applications of the first three Meek rules until no changes result.. //See code for fast implementation of this step.
- While node n exists in G where removing variables about n not implicated in unshielded colliders and applying Meek rules to n and adj(n) would results in a change in the graph,
- Remove variables about n not implicated in unshielded colliders and apply Meek rules to n and the all of adj(n, G) .
3. Return R.
// x and y are non-adjacent nodes. We are considering whether to
// add xày to G and what set T of nodes should be oriented into y
// in the process, creating mostly false colliders that will be shielded
// and corrected in the backward step. One “shields” a collider A->B<-C
// by adding an undirected edge A—C and reorienting A->B and C->B
// as undirected.
Procedure CalculateArrowsForward(xày, lookupArrows, sortedArrows , G)
1. T <- neighbors of y that are not adjacent to x
2. NaYX <- neighbors of y that are adjacent to x
3. For each subset T of T
- If NaYX U T is a clique in G
- S <- NaYX U T U Parents(Y, G)
- If (I(x, y, Z) > 0)
- Add a new Arrow A to sortedArrows for xày, NaYX, T, and b = I(x, y, Z)
- Add A to lookupArrows(xày)
// R is a set of nodes.
Procedure ReevaluateForward(R, G, sortedArrows, lookupArrows)
1. For each x in R
- For each node w in G such that I(x, w, Æ) > 0
- Remove all Arrows in lookupArrows(wàx) from sortedArrows
- CalculateArrowsForward(wàx, lookupArrows, sortedArrows)
// Forward Equivalence Search
Procedure FES(sortedArrows, lookupArrows, G)
1. While sortedArrows is not empty
- Remove an Arrow A from sortedArrows for edge xày with the highest score difference.
- If x is not adjacent to y in G, and the NaYX set in A is correct in G, and the T set in A is contained in the T Neighbors of xày in G,
- If NaYX È S is a clique, and there is no semidirected path from y to x that is blocked by NaYX È S
- Add xày to G and orient every node in S into y. Recall the edges from s in S to y were previously undirected. // This creates new colliders in G of the form XàYßs for s in S. Most of these are not all colliders in the generating DAG; these will be shielded and reoriented in the backward search, unless an edge X—s cannot be added, or unless they are required by the Meek rules.
- R <- ApplyMeekRulesLocally({x, y})
- Remove from R any nodes whose neighbors were not changed by step 1 or by the Meek rules call.
- Add x and y to R if they’re not already in R
- ReevaluateForward(R, G, sortedArrows, lookupArrows), which potentially adds new Arrows to sortedArrows and lookupArrows as a side effect.
Procedure CalculateArrowsBackward(xày, lookupArrows, sortedArrows , G)
1. NaYX <- neighbors of y that are adjacent to x in G
2. For each subset H of NaYX
- If NaYX \ H is a clique in G
- S <- (NaYX \H) È Parents(y, G)
- If (-I(x, y, Z) > 0)
- Add a new Arrow A to sortedArrows for xày, NaYX, H, and score b = –I(x, y, Z)
- Add A to lookupArrows(xày).
// R is a list of nodes.
// G is the graph being built.
Procedure ReevaluateBackward(R, G, sortedArrows, lookupArrows)
1. For each edge E in R
- If E = ràw for some r and w
- Remove all lookupArrows(ràw) from sortedArrows
- Remove all lookupArrows(wàr) from sortedArrows
- CalculateArrowsBackward(ràw, lookupArrows, sortedArrows, G)
- Else if E = r—w for some r and w
- Remove all of lookupArrows(ràw) from sortedArrows
- Remove all of lookupArrows(wàr) from sortedArrows
- CalculateArrowsBackward(ràw, lookupArrows, sortedArrows, G)
- CalculateArrowsBackward(wàr, lookupArrows, sortedArrows, G)
// “Backward Equivalence Search”
Procedure BES(sortedArrows, lookupArrows)
1. While sortedArrows is not empty
- Remove an Arrow A from sortedArrows for edge xày with the highest B value.
- If x is adjacent to y in G, and the NaYX set in A is correct in G,
- If NaYX \ S is a clique,
- Remove xày from G and orient both x and y into every node in NaYX \ H. // For adjacencies that are in the true DAG, these are genuine colliders. Some of these adjacencies may be further removed.
- R <- ApplyMeekRulesLocally({x, y} È (NaYX \ H))
- Remove all of lookupArrows(xày) from sortedArrows
- Remove from R any nodes whose neighbors were not changed by the Insert call or by the Meek orientation call.
- R <- R È (x, y} È (adj(x) Ç adj(y))
- ReevaluateBackward(R, G, sortedArrows, lookupArrows), which potentially adds new Arrows to sortedArrows.
// Modification of Reevaluate2 to add needed marriages of parents.
Procedure ReevaluateForward2(R, G, sortedArrows, lookupArrows)
1. For each x in R
- For each node w in G such that w Î adj(adj(x, G), G)) \ {w}
- Remove all Arrows in lookupArrows(wàx) from sortedArrows
- CalculateArrowsForward(wàx, lookupArrows, sortedArrows)
// V a set of variables.
Procedure fGES(V) // “Fast Greedy Equivalence Search”
1. sortedArrows <- > // A list of Arrows
2. lookupArrows <- > // A map from Arrows to lists of Arrows.
3. For each ordered pair <x, y> of nodes (x ¹ y)
- If I(x, y, Æ) > 0
- Add a new Arrow to sorted Arrows with edge xày, S = Æ, NaYX = Æ, and score difference = I(x, y, Æ)
4. G <- an empty graph over the variables V
5. FES(sortedArrows, lookupArrows, G)
6. sortedArrows <- >
7. lookupArrows <- >
8. BES(sortedArrows, lookupArrows, G)
9. For each x*-*y*-*z in G where it is not the case that x->y<-z and x is not adjacent to z // Marry unmarried parents if necessary
- CalculateArrowsForward(x->y, lookupArrows, sortedArrows)
10. FES(sortedArrows, lookupArrows, G) // Use ReevaluateForward2 in place of ReevaluateFoward in step FES.1.B.i.5.
11. BES(sortedArrows, lookupArrows, G) // Divorce the parents just married to correctly orient unshielded colliders not previously oriented, and revise orientations that are implied.
12. Return G
The ReevaluateForward procedure, step 1(a), assuming faithfulness, would proceed to considering each node in V as a possible parent of x; the heuristic substitution for this is to consider only variables w that are unconditionally d-connected to x. Using the formulation given in the pseudocode The formulation in the pseudocode, above, is much faster but has the effect of leaving unshielded some noncolliders in G that are unshielded colliders in the generating model and getting some of the implied orientations wrong as a result. This is easily fixed in steps fGES.9-11 by simply covering the unshielded noncolliders G that were not previously covered by FES, where necessary, and re-running BES. The result is graph that is consistent with the independence constraints of the generating model.[1] Nevertheless, it is worth pointing out that the algorithm without steps 9-11 (which we may call the “heuristic” algorithm) yield almost identical performance characteristics for large sparse models and is faster. On the million-node search, it clocks in at about 9 hours instead of 11 hours.
The use of parallelization can be described by referring to steps in the above pseudocode, as follows. In the fGES procedure, step 3, each edge x->y is evaluated assuming an empty graph. These evaluations are independent, and the score for x -> y is always equal to the score for y -> x. These scorings can be done independently and so may be parallelized using any scheme that accomplishes this. We score edges into each variable y in turn (without duplication) and divide nodes into epochs of reasonable size. This step takes the lion’s share of the processing time of fGES.
The next most expensive procedure in the algorithm for sparse graphs assuming weak faithfulness is the FES procedure. This is a serial procedure, since at each step the maximum scoring edge must be determined. However, within each step, the process of selecting the next edge to add may be parallelized. With a large model, this style of parallelization engages nearly all processors even on a large machine for data generated by a sparse model and with one-edge faithfulness assumed.
The least expensive procedure in the algorithm in our testing of sparse models is the BES procedure. The reason is that BES needs to check only the edges that have been included in the graph so far for possible removal and doesn’t need to consider all edges with positive score difference that have not been added to the model yet, as FES does. BES may be parallelized by parallelizing each edge removal, as with FES. Fewer processors are used by this procedure. In denser graphs, the backwards search will does more work, and more use is made of the available processors
[1] The idea of carrying out the FES step heuristically and then going back and correcting the mistakes that were made was suggested to us by Juan Miguel Ogarrio (personal communication).