Supplementary Notes
PCK110700
Mutations Present in the Final Population of Variants
Figure 3 Sequences
Detailed Description of a Round
To give a fuller picture of the decision-making process used in the ProSAR-driven methodology, we describe here our 14th round of evolution. We had completed our ProSAR analysis on two previous libraries, 12-1 and 12-2 (round 13 was still under analysis). The best variant from these libraries, which showed a 1.2-fold improvement over the round 12 parent, was chosen as the parent for the next set of libraries. The ProSAR model from library 12-1 was of relatively low quality (r = 0.31, p = 9.37 × 10^-3, where r is the leave-one-out cross-validated correlation coefficient and p is the frequency of observing such a correlation by chance alone given the null hypothesis of no correlation), so regression coefficients were not weighted heavily for purposes of decision making, and all mutations that appeared potentially beneficial were included in the next round, giving seven mutations of interest. (It should be noted that the magnitudes of the regression coefficients are particular to each library and cannot be meaningfully compared across two models without normalization.) Two mutations from this library were in the chosen backbone and appeared potentially detrimental; these positions were allowed to mutate back to their original residue (flip-out) in the next library. There were two other mutations in the backbone that appeared positive, but because of the lack of confidence in this model they were also allowed to mutate back to the original residue in the next library. All told, 12 of the initial 15 mutations in 12-1 were tested in the next library. Library 12-2 gave a better model (r = 0.47, p = 2.51 × 10^-4) and revealed four mutations that were either neutral or beneficial. At this point in the evolution we were running low on mutations of interest, so we had completed multiple saturation mutagenesis (sat. mut.) libraries within the binding pocket and at positions that had previously shown an influence on activity.
These libraries gave us 18 mutations worth pursuing further. We had also hit-shuffled three of our best variants and completed ProSAR analysis of this library (Hit Shuffle 15). This analysis provided an additional five mutations of interest. In total, these libraries provided 39 mutations to test in further combinatorial libraries as shown in Tables 3 and 4.
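The model-quality statistic r quoted above is a leave-one-out cross-validated correlation: each variant is predicted by a model fitted without it, and the predictions are then correlated with the measurements. The following is a minimal sketch of that calculation only. It assumes ordinary least squares on 0/1 mutation indicators (this section does not state which regression method ProSAR uses), and the data are synthetic; all function names are hypothetical.

```python
from itertools import product


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def fit(X, y, ridge=1e-6):
    """Least-squares coefficients (intercept first); the tiny ridge term
    only keeps the normal equations solvable (an assumption of this sketch)."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
          for j in range(p)] for i in range(p)]
    b = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(A, b)


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def loo_r(X, y):
    """Correlate each measurement with its prediction from a model
    fitted on all the other variants."""
    preds = []
    for i in range(len(y)):
        coef = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        preds.append(coef[0] + sum(c * v for c, v in zip(coef[1:], X[i])))
    return pearson(y, preds)


# Synthetic library: every combination of three hypothetical mutations,
# with a purely additive, noise-free activity.
X = [list(combo) for combo in product([0, 1], repeat=3)]
y = [1.0 + 0.5 * a - 0.3 * b + 0.1 * c for a, b, c in X]
print(round(loo_r(X, y), 3))
```

On noise-free additive data such as this the cross-validated r is close to 1. Real screening data are noisy and not purely additive, which is why values such as 0.31 or 0.47 are reported above.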
We split these mutations into two libraries: 14-1 with 19 mutations and 14-2 with 20 mutations. The sequence of the backbone and the oligonucleotides used to construct the libraries are listed after Tables 3 and 4. Both libraries were analyzed with ProSAR and gave relatively high-quality models (14-1: r = 0.58, p = 7.9 × 10^-6; 14-2: r = 0.71, p = 4.55 × 10^-10). The next library's parent came from 14-1 and had three mutations with high regression coefficients. Four of the mutations in the parent had negative regression coefficients and so were allowed to mutate back to the original residue in the next library. Two more mutations were positive but not in the backbone, so they were included in the next library. Library 14-2 provided 14 mutations that were neutral to beneficial. All told, this resulted in three positive mutations fixed in the new backbone, 20 mutations to be tested in the next set of libraries, and 16 mutations removed from consideration.
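The triage described in this paragraph can be written as a simple rule on two inputs: the sign of the regression coefficient and whether the mutation is already in the parent. The sketch below is one reading of the text, not a procedure stated by the authors. The names `Mutation`, `next_round_action` and `threshold` are hypothetical, the coefficients are taken from Table 3, and the `in_parent` flags are inferred from the notes and backbone columns of that table.

```python
from dataclasses import dataclass


@dataclass
class Mutation:
    name: str
    coefficient: float  # regression coefficient in the current library
    in_parent: bool     # inferred from Table 3, not stated explicitly


def next_round_action(m, threshold=0.0):
    """Classify a mutation for the next round (hypothetical rule).

    threshold is the coefficient above which a mutation counts as
    beneficial; a slightly negative value would also keep near-neutral
    mutations.
    """
    if m.in_parent:
        # Already in the parent: keep it if it helps, otherwise let the
        # position mutate back to the original residue.
        return "fix in backbone" if m.coefficient > threshold else "flip-out"
    # Not in the parent: test again if it helps, otherwise drop it.
    return "carry forward" if m.coefficient > threshold else "drop"


examples = [
    Mutation("F177A", 0.59, True),
    Mutation("Q38L", -0.12, True),
    Mutation("V112A", 0.033, False),
    Mutation("V101I", -0.31, False),
]
for m in examples:
    print(m.name, next_round_action(m))
```

A strict sign rule does not reproduce every entry in Tables 3 and 4. Some mutations with small negative coefficients were retained as "neutral", which is why the cutoff is left as a parameter here.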
Library 14-1
Mutation / Previous Library / Previous Regression Coefficient / Fold improvement of the highest activity variant with the mutation / Notes / Regression Coefficient / In Next Backbone? / In Next Library?
L10K / 12-1 / - / L, 1.20 / mutated back / 0.257 / yes
D121K / 12-1 / - / D, 1.20 / mutated back / -0.27
T152A / 12-1 / + / T, 1.20 / mutated back / -0.33
F177Y / 12-1 / + / F, 1.20 / mutated back / 0.152 / no, A was better than Y or F
Q38L / sat. mut. / 1.25 / -0.12 / yes / flip-out
S78N / sat. mut. / 1.25 / -0.010
T100M / sat. mut. / 1.04 / 0.169 / yes
V101I / sat. mut. / 1.70 / -0.31
F177A / sat. mut. / 1.70 / 0.59 / yes
W238R / sat. mut. / 1.25 / -0.13
T67N / sat. mut. / 1.20 / 0.00 / yes / flip-out
G181W / sat. mut. / 1.17 / -0.24
V205Y / sat. mut. / 1.16 / -0.11 / yes / flip-out
A114Q / sat. mut. / 1.15 / -0.21
D99G / Hit Shuffle 15 / 0.07 / 0.98 / 0.003 / yes
V112A / Hit Shuffle 15 / 0.05 / 0.98 / 0.033 / yes
W139D / Hit Shuffle 15 / 0.08 / 0.98 / -0.44
N176R / Hit Shuffle 15 / 0.06 / 0.98 / -0.02
W238C / Hit Shuffle 15 / 0.03 / 0.96 / -0.12 / yes / flip-out
Table 3 – 14-1 Library Design. The source of each mutation is given by the previous library it was observed in along with any regression coefficient information from ProSAR analysis. In some cases mutations present in the backbone were allowed to vary back to the previous residue (mutated back) because we were unsure about their impact on function or believed the mutation may be deleterious. The regression coefficient for the mutation in the context of the new library is given along with an indication of its presence in the new backbone and whether it is part of the next round library design.
Library 14-2
Mutation / Previous Library / Previous Regression Coefficient / Fold improvement of the highest activity variant with the mutation / Notes / Regression Coefficient / In Next Backbone? / In Next Library?
T152A / 12-1 / + / T, 1.2 / mutated back / -0.030 / yes
E95G / 12-1 / + / 0.92 / -0.120
D121E / 12-1 / + / 0.89 / 0.323 / yes
V202L / 12-1 / + / 1.17 / -0.500
V245A / 12-1 / + / 1.01 / -0.990
P135S / 12-2 / 0.001 / 1.14 / 0.651 / yes
M252V / 12-2 / -0.001 / 1.00 / -0.050 / yes
E40V / 12-2 / 0.066 / 1.12 / -0.260
A60V / 12-2 / 0.090 / 1.02 / random mutation / 0.051 / yes
R87Q / 12-1 / + / 0.93 / -0.320
S146A / 12-1 / + / 0.93 / 0.132 / yes
T100A / 12-1 / + / 1.12 / random mutation / 0.068 / yes
S180T / sat. mut. / 1.29 / 0.499 / yes
T144S / sat. mut. / 1.09 / 0.166 / yes
G251E / sat. mut. / 1.04 / 0.159 / yes
M54I / sat. mut. / 1.03 / -0.020 / yes
D121R / sat. mut. / 1.03 / 0.119 / yes
G251S / sat. mut. / 1.18 / 0.087 / yes
W238T / sat. mut. / 1.01 / 1.259 / yes
I52T / sat. mut. / 1.01 / -0.260
Table 4 – 14-2 Library Design. The source of each mutation is given by the previous library it was observed in along with any regression coefficient information from ProSAR analysis. In some cases mutations present in the backbone were allowed to vary back to the previous residue (mutated back) because we were unsure about their impact on function or believed the mutation may be deleterious. In some cases random mutations appeared in the combinatorial library and were included in the next library design when they appeared potentially beneficial. The regression coefficient for the mutation in the context of the new library is given along with an indication of its presence in the new backbone and whether it is part of the next round library design.
Round 14 Backbone and Oligonucleotides Used in Library Construction
Each oligonucleotide listed covers a defined region of the backbone and the set of mutations desired in that region. In some cases, multiple oligonucleotides were required to allow all combinations of mutations in a targeted region; e.g., V112A and A114Q are collectively encoded by two oligos (aagccatttgctctagyaaatgccgtcgcttcgcaaatg and aagccatttgctctagyaaatcaggtcgcttcgcaaatg). We do not indicate which mutation is carried by a particular oligonucleotide, although this can be deduced by inspection.
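The deduction by inspection mentioned above can also be automated. The sketch below expands the standard IUPAC degenerate bases in an oligo (e.g. r = a/g, y = c/t), aligns it to an in-frame piece of the backbone, and reports which codons change. The function names are hypothetical. `FRAGMENT` is codons 91 to 108 of the Round 14 backbone, counted from the initial atg, and `OLIGO` is the 14-1 oligo annotated D99G, T100M, V101I.

```python
from itertools import product

# IUPAC nucleotide codes (lower case, as in the listings).
IUPAC = {
    "a": "a", "c": "c", "g": "g", "t": "t",
    "r": "ag", "y": "ct", "s": "cg", "w": "at", "k": "gt", "m": "ac",
    "b": "cgt", "d": "agt", "h": "act", "v": "acg", "n": "acgt",
}

# Standard genetic code, codons ordered t, c, a, g at each position.
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def expand(oligo):
    """Return every concrete sequence a degenerate oligo stands for."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in oligo))]


def locate(backbone, oligo):
    """Offset in backbone where the oligo fits with the fewest mismatches."""
    def mismatches(off):
        return sum(backbone[off + i] not in IUPAC[b]
                   for i, b in enumerate(oligo))
    return min(range(len(backbone) - len(oligo) + 1), key=mismatches)


def encoded_mutations(backbone, oligo, first_codon=1):
    """List amino-acid changes encoded by any expansion of the oligo.

    backbone must start on a codon boundary; first_codon is the residue
    number of its first codon.
    """
    off = locate(backbone, oligo)
    found = set()
    for variant in expand(oligo):
        seq = backbone[:off] + variant + backbone[off + len(oligo):]
        for pos in range(0, len(backbone) - 2, 3):
            old = CODON_TABLE[backbone[pos:pos + 3]]
            new = CODON_TABLE[seq[pos:pos + 3]]
            if old != new:
                found.add(f"{old}{pos // 3 + first_codon}{new}")
    return sorted(found, key=lambda m: int(m[1:-1]))


FRAGMENT = "aaatacgctgtcgaggattacagggatactgtcgaagctctgcagatcaagcca"
OLIGO = "gtcgaggattacagggrcaygrtcgaagctctgcagatc"

print(len(expand(OLIGO)))  # 8 sequences: three two-fold degenerate positions
print(encoded_mutations(FRAGMENT, OLIGO, first_codon=91))
# ['D99G', 'T100M', 'V101I']
```

Passing the full backbone with `first_codon=1` should work the same way, since the sequence begins at the start codon.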
Round 14 Backbone:
atgagcaccgctattgtcaccaacgtcctgcattttggaggtatgggtagcgctctgcgtctgagcgaagctggtcataccgtcgcttgccatgatgaaagctttaagcatcaggatgaactagaagcttttgctgaaacctacccacagctgataccaatgagcgaacaggaaccagctgaactgattgaagctgtcaccagcgcccttggtcatgtcgatatcctggtcagcaacgatatcgcgcctgtggaatggcggccaatcgataaatacgctgtcgaggattacagggatactgtcgaagctctgcagatcaagccatttgctctagtgaatgctgtcgcttcgcaaatgaaggatcgaaagtcggggcacatcatcttcatcacttcggctgccccgttcgggccatggaaggagctatcgacttactcttcggctcgagctgggaccagtgcactagctaatgctctatcgaaggagctaggagagtacaatatcccggtgttcgctatcgctccgaattttctagactcgggggattcgccgtactattacccctctgagccgtggaagacttctccggagcacgtggctcacgtgcgtaaggtgactgctctacaacgactagggactcaaaaagagttgggggaattggtgacgtttttggcatctggctcttgtgattatttgactggccaggtgttttggttggcaggcggctttcccgttgtagagcgttggcccggcatgcccgaataatga
14-1 Oligos:
attgtcaccaacgtcaagcattttggaggtatg (L10K)
gaaagctttaagcatctggatgaactagaagct (Q38L)
ctgattgaagctgtcaatagcgcccttggtcat (T67N)
gtcgatatcctggtcaacaacgatatcgcgcct (S78N)
gtcgaggattacagggrcaygrtcgaagctctgcagatc (D99G, T100M, V101I)
aagccatttgctctagyaaatgccgtcgcttcgcaaatg (V112A, A114Q)
aagccatttgctctagyaaatcaggtcgcttcgcaaatg (V112A, A114Q)
gcttcgcaaatgaagaaacgaaagtcggggcac (D121K)
gccccgttcgggccagataaggagctatcgact (W139D)
tcggctcgagctggggcgagtgcactagctaat (T152A)
ttcgctatcgctccgcgttwtctagactcgkgggattcgccgtactat (N176R, F177YA, G181W)
ttcgctatcgctccgcgtgccctagactcgkgggattcgccgtactat (N176R, F177YA, G181W)
ttcgctatcgctccgaactwtctagactcgkgggattcgccgtactat (N176R, F177YA, G181W)
ttcgctatcgctccgaacgccctagactcgkgggattcgccgtactat (N176R, F177YA, G181W)
gctcacgtgcgtaagtacactgctctacaacga (V205Y)
actggccaggtgtttygtttggcaggcggcttt (W238CR)
14-2 Oligos:
tttaagcatcaggatgtgctagaagcttttgct (E40V)
acctacccacagctgaytccaatkagcgaacaggaacca (I52T, M54I)
agcgaacaggaaccagttgaactgattgaagct (A60V)
gcgcctgtggaatggcaaccaatcgataaatac (R87Q)
atcgataaatacgctgtcggcgattacagggat (E95G)
gattacagggatgccgtcgaagctctgcagatc (T100A)
gcttcgcaaatgaaggaacgaaagtcggggcac (D121RE)
gcttcgcaaatgaagcgccgaaagtcggggcac (D121RE)
atcacttcggctgccagcttcgggccatggaag (P135S)
tggaaggagctatcgasttackcttcggctcgagctggg (T144S, S146A)
tcggctcgagctggggccagtgcactagctaat (T152A)
gagcacgtggctcacctgcgtaaggtgactgct (V202L)
actggccaggtgtttactttggcaggcggcttt (W238T)
gcaggcggctttcccgcggtagagcgttggccc (V245A)
gtagagcgttggcccrgcrtgcccgaataa (G251SE, M252V)
gtagagcgttggcccgaartgcccgaataa (G251SE, M252V)