Nupur Bansal , Zheng Zheng , David S. Cerutti and Kenneth M. Merz*

Nupur Bansal , Zheng Zheng , David S. Cerutti and Kenneth M. Merz*

On the Fly Estimation of Host-Guest Binding Free Energies Using the Movable Type Method: Participation in the SAMPL5 Blind Challenge

Nupur Bansal†, Zheng Zheng†, David S. Cerutti and Kenneth M. Merz*

Department of Chemistry, Michigan State University, 578 South Shaw Lane, East Lansing, Michigan 48824-1322, United States

† Authors contributed equally to this work.

Supplementary Information

Section S1. Explanation of some of the key terms relevant to Movable Type method

The MT method applies dimensionality reduction and corresponding mathematical approximations tosimplify the energy sampling procedure.

Under the assumption that the molecular energy can be dissociated into atom pairwise potentials, we can express the molecular energy as:

(1)

where i and j are atom index notations and N is the total number of atoms in a molecular system. Thus the molecular energy is the linear combination of a series of one-dimensional energies as functions of pairwise distances rij, whose combination need to satisfy the chemical connectivity in 3-dimensional space with respect to the molecular configuration τM. The partition function of a molecular ensemble is hence expressed as:

(2)

with the total number of molecular configurations in the canonical ensemble as TM.

We assume an “alchemical” canonical ensemble, in which all the pairwise Boltzmann factors, exp(-βEij(rij)), of the partition function are independent along their distances and satisfy the distributive law. The ensemble partition function can then be expressed as:

(3)

where, TM is the total number of molecular configurations in the canonical ensemble, i and j are atom index notations and N is the total number of atoms in a molecular system.

Given this distributive property to the molecular system, the partition function calculation can be achieved using the multiplication through the independent integrals of atom pairwise energies. Without performing the time-demanding conformational search, the partition function is hence calculated using a one-dimensional numerical procedure, which accelerates the estimation of the free energy simulation significantly.

(4)

The following sections will explain meaning and significance of some of the key terms used in MS regarding Movable type approach.

1. Sampling Volume (VM)

The distributive property applied to a molecular systemgenerates a number of configurational states asCN(N-1)/2, whereN is the number of atoms in the molecular system under study, and C is the cutoff distance for calculating the atom pairwise contactsbeyond which the pairwise contacts are omitted.However practically, a 3-dimensional space does not contain such degrees of freedom regarding the particle pairwise connectivity. So, the MT algorithm searches for the independent internal degrees of freedom in a closed 3-dimensional space containing N particles (atoms). The number of configuration states within that 3-dimensional spaceis summed up as the sampling volume (VM).

Derivation of Sampling Volume:

Consider an “alchemical” molecular model with N atoms divided into two regions: a) the explicit region where every two atoms contact with each other, and b) background region where atoms only contact with the atoms in explicit region and have no contact within themselves. Assume that there is only one atom in the explicit region and (N-1) atoms in the background region. At this stage, the independent internal degree of freedom is CN-1, where C is the cutoff distance for evaluating atom pairwise contacts. Now, let us move second atom to the explicit region. On moving second atom to the explicit region, the independent degrees of freedom increase by πN-2, where the N-2 is the number of atoms left in the background region. As more atoms are placed into the explicit region, the number of contacts increases, thereby increasing the pairwise contact degrees of freedom. Similarly, by moving the third atom to the explicit region, the degree of freedom increases by πN-3, where N-3 is the number of atoms in the background region. In other words, another πis added to the independent degree of freedom for each atom in the background region. The number of independent degrees of freedom then reaches its maximum in a molecular system. In this way, the number of independent internal degrees of freedom or the sampling volume (VM) can be evaluated as:

(5)

For an N-atom molecular system, there are options to select three center atoms. Therefore, by taking all the possible combinations, the sampling volume can be summed up as:

(6)

where, N is the number of atoms and C is the cutoff distance for calculating atom pairwise contacts. HerebyV, the number of configuration states within the 3-dimensional space, is the quantity representing the sampling size, or “volume”, in the MT method.

2.Unphysical states

In the previous section, we have already specified that MT algorithm apply distributive law to the atom pairwise Boltzmann factors.However, this distributive property cannot be applied to most molecular systems due to the number of independent degrees of freedom in the 3-dimensional spacelimits the options for atom pairwise distancescombination. Any combination of atom pairwise distances that fail to satisfy the connectivity in the 3-dimensional space results in an “unphysical” state.

Figure S1. An example of the “unphysical” state, where the combination of rα, rβ and rγcannot satisfy the posing of atom A in the molecular system given fixed locations of atom B, C and D.

As is illustrated in the previous section, there are 3N-6 independent atom pairwise contacts out of a total number of N(N-1)/2 pairwise contactsin a 3-dimensional molecular system. The rest pairwise distances are fixed given a determined configuration of those 3N-6pairwisecontacts.The following equations will show the use of distributive property in the MT approach and how it is responsible for generating “unphysical” states. The molecular partition function can be evaluated using equation 7 by separating the independent contacts from the rest.

(7)

where Z0is the “fixed” contact diagonal matrix obtained as the product of Boltzmann weighted energies from the initial configuration introduced into MT procedure.VErepresents a vector of the set of deviations from each pairwise contact in Z0.VIcontains the independent pairwise potentials, centered at any three arbitrary selected atoms α, β and γ. indicates the pairwise distance between atom i and α in configuration x.CommutatingVE and Z0 in equation 7, generates:

(8)

In order to apply the distributive property to the vector VI, we dissociate VE from VI by converting the vector VE into a numerical approximation exp(-β∆Ẽ), which is the expected value of vector VE weighted onto the VI vector.

(9)

where,

(10)

Unphysical states are induced due to the approximation applied to the generation of exp(-β∆Ẽ). According to equation 10, exp(-β∆Ẽ) is defined as the expected value of all the energy deviations from the initial configuration weighted by the elements in the vector VI, which, in the current version of the MT method, is simulated as the numerical average of each individual pairwise contact deviated from the initial configuration.

Errors from such a simulation are mostly caused by the introduction of “unphysical” states described in Figure S1 due to which each pairwise contact is independently averaged within its own sampling range. Among the “unphysical” states introduced from such a procedure, the high energy states will not cause issues in a MT calculation (they are factored out by the Boltzmann term), while if an “unphysical” state has a low energy it will introduce errors into a MT calculation.

In this work, we are just highlighting this issue instead of digging into solving the potential problem associated with unphysical configurations. Future publications in preparation will discuss this in detail and present possible solutions to this issue.

3. Collision Tolerance

The MT algorithm simulates the local partition function against an initial configuration using three components, including the fixed molecular configuration energy with its local deviation, together with a numerical Boltzmann factor integration with a defined sampling range (0.5Å in this study) for the relevant contacts of the center atoms (Please refer eq.11).

(11)

where, Z0 and exp(-β∆Ẽ) are two constants with respect to one certain molecular system.∆Ẽ is the deviation from Z0, which is calculated as the Boltzmann average of VE weighted by the elements in VI.

The vector norm of VI () can be generated using the distributive law.

(12)

where,Riα, Rjβ and Rkγare the sampling distance ranges for the atom pairwise contacts on the condition that their corresponding degrees of freedom are C, π and π respectively. In order to perform an effective sampling, one or multiple initial input structures are prepared with Z0 reproducing each of their configuration energies and a 0.5Å distance range is set for each pairwise contact centered on every initial structure.

The molecular energy of the input structure would be relaxed by introducing the deviation term exp(-β∆Ẽ) and the local sampling term , if lower energy pairwise regions are included within the sampling range. In this study, several host-guest complex conformers were generated with close intermolecular contacts as a result of using the conformational search module in the Schrödinger suite. This local relaxation feature in the MT procedure tolerated such repulsions and generated lower binding energies compared to the single point energy calculations against the initial input structures. Table 6 in the manuscript provides the MT simulation results compared to the single point calculation against several host-guest conformations with mild collisions.