Homology modelling of proteins.

Definition: Prediction of the three-dimensional structure of a target protein from its amino acid sequence, using as a template a homologous protein for which an X-ray or NMR structure is available.

Synonyms: Comparative modelling & Knowledge-based modelling.

Protein Structure Modelling

Three approaches to structure prediction:

a. Ab initio prediction

(no known homology with any sequence of known structure) Given only the
sequence, predict the 3D structure from “first principles”, based on
energetic or statistical principles.

b. Sequence-Structure Threading

Given the sequence, and a set of folds observed in PDB, see if the
sequence could adopt one of the known folds.

c. Homology Modelling

Given a sequence with homology (> 25% sequence identity) to a known structure in PDB, use
the known structure as a template to create a 3D model from the sequence.
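The 25% threshold above is easiest to check on a pairwise alignment. The following is a minimal sketch, assuming the two sequences are already aligned with "-" as the gap character and that identity is counted over ungapped columns only (other conventions exist). The function name and the two sequence fragments are hypothetical.

```python
def percent_identity(aligned_target: str, aligned_template: str) -> float:
    """Percentage of identical residues over the ungapped alignment columns."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    aligned = 0
    identical = 0
    for a, b in zip(aligned_target, aligned_template):
        if a == "-" or b == "-":
            continue  # skip columns that contain a gap
        aligned += 1
        if a == b:
            identical += 1
    return 100.0 * identical / aligned if aligned else 0.0


# Hypothetical aligned fragments, for illustration only.
target = "MKT-AYIAKQR"
template = "MKSLAYLAK-R"
identity = percent_identity(target, template)
print(f"{identity:.1f}% identity; above the 25% threshold: {identity > 25.0}")
```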

Various ways of homology modelling

·  One structure as main template (I will illustrate here).

·  Fragment-based modelling: A protein structure can be built from a combination of segments from other proteins. The program Composer depends on the assembly of rigid fragments.

Ab initio modelling

There are two components to ab initio prediction:

·  devising a scoring (i.e. energy) function that can distinguish correct structures from incorrect ones

·  a search method to explore the conformational space.

In many methods, the two components are coupled together such that a search function drives, and is driven by, the scoring function to find native-like structures.

BUT

there is a difficulty of formulating an adequate scoring function

and it requires formidable computational effort to solve it

BECAUSE

a fully descriptive energy function must consider interactions between all pairs of atoms in the polypeptide chain. The number of such pairs grows quadratically with the number of atoms, while the number of conformations to be searched grows exponentially with the number of amino acids. A full model must also take into account the vitally important interactions between the protein’s atoms and the environment, which give rise to the so-called ‘hydrophobic effect’.
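To make the "all pairs of atoms" point concrete, here is a minimal sketch of a pairwise energy sum. It assumes a plain Lennard-Jones 12-6 term with a single hypothetical epsilon and sigma for every atom pair. Real force fields use per-atom-type parameters and add electrostatic, bonded and solvent terms, none of which are shown.

```python
import itertools
import math


def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """12-6 Lennard-Jones energy for one atom pair at distance r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def pairwise_energy(coords, epsilon=1.0, sigma=1.0):
    """Sum the pair term over every unordered pair of atoms.

    Returns the total energy and the number of pairs visited.
    """
    total = 0.0
    n_pairs = 0
    for p, q in itertools.combinations(coords, 2):
        total += lennard_jones(math.dist(p, q), epsilon, sigma)
        n_pairs += 1
    return total, n_pairs


# Four made-up atom positions: 4 atoms give 4 * 3 / 2 = 6 pairs.
atoms = [(0.0, 0.0, 0.0), (1.2, 0.0, 0.0), (0.0, 1.2, 0.0), (0.0, 0.0, 1.2)]
energy, n_pairs = pairwise_energy(atoms)
print(f"{n_pairs} pairs, total energy {energy:.3f}")
```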

For practical reasons, simplifying assumptions must be made.
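The coupling of a scoring function to a search method can be sketched with a toy Metropolis Monte Carlo search. Everything here is a simplifying assumption: the "conformation" is just a list of torsion angles, the scoring function is an invented cosine term that is zero at a made-up "native" set of angles, and the temperature and step size are arbitrary. It illustrates the idea only and is not the method of any particular ab initio program.

```python
import math
import random


def score(angles, native):
    """Toy scoring function: zero at the 'native' angles, larger elsewhere."""
    return sum(1.0 - math.cos(a - n) for a, n in zip(angles, native))


def metropolis_search(native, steps=5000, temperature=0.1, seed=0):
    """Random search in which the scoring function accepts or rejects moves."""
    rng = random.Random(seed)
    current = [rng.uniform(-math.pi, math.pi) for _ in native]
    current_score = score(current, native)
    best, best_score = list(current), current_score
    for _ in range(steps):
        trial = list(current)
        i = rng.randrange(len(trial))
        trial[i] += rng.gauss(0.0, 0.3)  # search move: perturb one angle
        trial_score = score(trial, native)
        delta = trial_score - current_score
        # Downhill moves are always kept; uphill moves only sometimes.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_score = trial, trial_score
            if current_score < best_score:
                best, best_score = list(current), current_score
    return best, best_score


best, best_score = metropolis_search([0.5, -1.0, 2.0])
print(f"best score found: {best_score:.4f}")
```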

(you can predict a structure using ab initio techniques on http://www.bioinfo.rpi.edu/~bystrc/hmmstr/server.html)

Why does Homology (Comparative/knowledge based) modelling work?
Proteins have a limited number of folds: The structure of a new protein can resemble a known fold even with no apparent sequence similarity.

Why a model?

A model is desirable when either X-ray crystallography or NMR cannot determine the structure of a protein, in time or at all. Many structure-function relationships can be deduced from a reasonable model. Indeed, sometimes a modelled structure can be used for successful drug design.

The 3D structure of a protein can tell us much more about how individual residues interact to form a functional entity. For example, residues that are far apart in the 1D sequence can be very close together in the actual folded protein.
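A minimal sketch of how such sequence-distant but spatially close residue pairs could be listed from Cα coordinates. The 8 Å distance cutoff and the minimum sequence separation of 10 are assumed values, and the coordinates describe an invented two-strand hairpin rather than a real protein.

```python
import math


def long_range_contacts(ca_coords, cutoff=8.0, min_separation=10):
    """Residue pairs at least min_separation apart in sequence, closer than cutoff in space."""
    contacts = []
    n = len(ca_coords)
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(ca_coords[i], ca_coords[j]) < cutoff:
                contacts.append((i, j))
    return contacts


# Invented hairpin: residues 0-5 run along x, residues 6-11 run back 5 Å away.
strand_1 = [(3.8 * i, 0.0, 0.0) for i in range(6)]
strand_2 = [(3.8 * (11 - i), 5.0, 0.0) for i in range(6, 12)]
print(long_range_contacts(strand_1 + strand_2))
```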

Models can be quite accurate: they form a rational basis for explaining experimental observations and help in redesigning proteins to improve their function.

Models can be used as starting points in the determination of protein structure by NMR or X-ray.

Post-genomics – structural genomics

The potential benefits of having a structural model have led to the concept that the structures of all gene products should either be solved experimentally or modelled. To model so many proteins, the technique of producing accurate alignments and building three-dimensional models from the alignments has to be fully automated. Sanchez and Šali have automatically modelled a large fraction of the yeast genome, using their program MODELLER (see later section). But the process has been limited to those ORF (Open Reading Frame) sequences from yeast that had a relatively high homology to a three-dimensional template structure. Automation of model building at lower sequence homology is a step that still needs to be addressed, and considerable effort is being put into this type of research.

History.

The first homology modelling studies were done using wire and plastic models of bonds and atoms as early as the 1960s. The models were constructed by taking the coordinates of a known protein structure and modifying them by hand for those amino acids that did not match the structure. In 1969 David Phillips, Browne and co-workers published the first paper on homology modelling. They modelled α-lactalbumin based on the structure of hen egg-white lysozyme. The sequence identity between these two proteins was 39%. In addition, both proteins contained an identical pattern of cysteines, suggesting a similar arrangement of disulphide bonds. When the structure of α-lactalbumin was solved by X-ray crystallography, it was compared to the model and analysed. The model was essentially correct apart from the C-terminal ends, which diverge in the structure in any case.

Method.

The figure below illustrates the major steps in obtaining a structure from a sequence.


Steps in molecular modelling:

  1. Identification of structures that will form the template for the target structure (model).
  2. Alignment – the most important step. Alignment of low homology sequences can be improved using secondary structure prediction (align-model-realign-remodel).
  3. Transfer of the coordinates of structurally conserved regions (SCRs) from the template(s) to the target:

- many fragment method

- single structure.

  4. Modelling variable regions

·  Loops

·  Insertions: Search of a high resolution fragment database

·  Deletions: Local minimisation often sufficient.

  5. Modelling side chains (practically a virtual step)
  6. Minimisation:

·  Local – especially loop-hinge regions

·  Global.

  7. Molecular Dynamics: To study regional flexibility.
  8. Checking the correctness of the model.

·  Correctness of the overall fold. Indicators:

-  Bad: Non-polar side chains exposed to the solvent.

-  Bad: Buried ionizable groups.

-  Conformational energy calculations – incorrect folds have high solvation energy.

-  Lüthy’s method (3D profiles).

·  Stereochemical properties: PROCHECK

-  Bond angles

-  Bond lengths
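Since the alignment is singled out above as the most important step, here is a minimal sketch of global pairwise alignment by dynamic programming (Needleman-Wunsch). It assumes simple identity scoring and a linear gap penalty with made-up values. Production alignment programs typically use substitution matrices and affine gap penalties, and the sequences below are invented.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of sequences a and b; returns (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


aligned_target, aligned_template, total = needleman_wunsch("ACDEFG", "ACEFG")
print(aligned_target)
print(aligned_template)
print("score:", total)
```

The gap in the second sequence marks a position where the target has a residue with no template counterpart, which is exactly the kind of region that later has to be built as an insertion.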

Modelling using the restraint-based method

·  Distance restraints (Havel & Snow 1991)

·  Structural feature restraints (Šali & Blundell 1993)
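The idea behind distance restraints can be sketched as a penalty that measures how badly a candidate model violates a set of lower and upper distance bounds. The squared-violation form, the function name and the example bounds are assumptions made for illustration. They are not the objective functions of the cited methods, which also include an optimisation step that is not shown.

```python
import math


def restraint_violation(coords, restraints):
    """Sum of squared violations of (i, j, lower, upper) distance bounds, in Å²."""
    total = 0.0
    for i, j, lower, upper in restraints:
        d = math.dist(coords[i], coords[j])
        if d < lower:
            total += (lower - d) ** 2  # atoms too close
        elif d > upper:
            total += (d - upper) ** 2  # atoms too far apart
    return total


# Made-up model coordinates and bounds (as if derived from a template).
model = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
bounds = [(0, 1, 3.5, 4.0), (0, 2, 5.0, 8.0)]
print(restraint_violation(model, bounds))
```

A model that satisfies every bound scores zero, so a search procedure would try to drive this value down.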

Modelling of Loops

Modelling of Side Chains

Side chains adopt distinct conformations that depend on the backbone structure.

This observation gave rise to the rotamer libraries that are used in modelling procedures.

Minimisation

•  LOCAL: Minimise a fragment. Usually a loop and its anchor regions - as these often have bad geometries. First minimise without influence of surrounding structure then take surrounding structure into account.

•  GLOBAL: Minimise whole protein (& H2O). Mainly to relieve short contacts and to rectify bad geometry, like bond angles, peptide planarity etc. Problems with minimisation are local minima (egg box) and approximations.

•  (Dynamics - often local. To study movement of particular loop and/or improve its geometry.)

Local minima problem of minimisation
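A minimal sketch of the problem, assuming a one-dimensional, egg-box-like toy energy function invented for this example: steepest descent only ever moves downhill, so the minimum it reaches depends on where it starts.

```python
import math


def energy(x):
    """Toy energy: a shallow bowl with ripples; the global minimum is at x = 0."""
    return 0.05 * x * x - math.cos(3.0 * x)


def gradient(x):
    """Derivative of the toy energy with respect to x."""
    return 0.1 * x + 3.0 * math.sin(3.0 * x)


def steepest_descent(x, step=0.01, iterations=2000):
    """Repeatedly move a small step against the gradient."""
    for _ in range(iterations):
        x -= step * gradient(x)
    return x


for start in (0.3, 2.3):
    x_min = steepest_descent(start)
    print(f"start {start:.1f} -> x = {x_min:.3f}, energy = {energy(x_min):.3f}")
```

The run started at 2.3 settles in a neighbouring well with a higher energy than the run started at 0.3, and no amount of further minimisation will move it out.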

Accuracy.

Generally the accuracy of a model depends on the initial sequence alignment and the percentage sequence identity between the target and the template. Most errors occur in the loop or variable regions of the model.

Check structural integrity of model

•  Check the correctness of the overall fold. Look at the distribution of polar (charged) and hydrophobic residues on the surface and inside the protein. Buried charges must have an interaction partner.

•  Detect local errors

•  Check stereochemical parameters like bond lengths, bond angles and short contacts. Tools: Ramachandran plot, PROCHECK.
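The stereochemical quantities named above are simple functions of atomic coordinates. The following is a minimal sketch of the geometry only, with hypothetical function names. It does not reproduce the reference values or scoring used by any checking program. The dihedral function is the calculation behind the φ and ψ angles of a Ramachandran plot.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def bond_length(a, b):
    """Distance between two bonded atoms."""
    return math.dist(a, b)


def bond_angle(a, b, c):
    """Angle a-b-c in degrees, with b as the central atom."""
    u, v = sub(a, b), sub(c, b)
    cos_t = dot(u, v) / math.sqrt(dot(u, u) * dot(v, v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees about the p1-p2 bond."""
    b0, b1, b2 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(b0, b1), cross(b1, b2)
    y = math.sqrt(dot(b1, b1)) * dot(b0, n2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Made-up coordinates: four points forming a 90 degree torsion.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

For φ the four atoms would be C(i-1), N(i), Cα(i), C(i), and for ψ they would be N(i), Cα(i), C(i), N(i+1).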

Automatic modelling:

·  SWISS-MODEL: free, Web and local. http://www.expasy.ch/swissmod/

·  ESyPred3D: free, Web. http://www.fundp.ac.be/urbm/bioinfo/esypred/

·  WHAT IF: commercial ($$), local.

·  MODELLER: Unix machines, quite difficult to learn.

How does SWISS-MODEL work – an introduction:

For complete reference look at the web site documentation.

Step 1

SWISS-MODEL first does a database search for homologous proteins. Then it superposes all the structures it finds.

Step 2

It generates a multiple alignment of the sequence to be modelled with all the homologous structures.

Step 3

Generates a 3D framework for the target protein sequence.

·  Atoms that occupy a similar spatial area and are aligned to the target
sequence are used to compute the averaged atomic positions of the
framework from which the target will be built.

·  Side chains with incorrect geometries are removed
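The averaging in Step 3 can be sketched as follows, assuming the template structures are already superposed and that each template supplies either a coordinate or nothing for every target position. An unweighted mean is used here as a simplifying assumption. The function name and coordinates are hypothetical, and the actual program may weight templates differently.

```python
def average_framework(template_coords):
    """Average superposed template coordinates position by position.

    template_coords: one list per template, each holding an (x, y, z) tuple
    or None for target positions with no aligned atom in that template.
    """
    n_positions = len(template_coords[0])
    framework = []
    for pos in range(n_positions):
        present = [t[pos] for t in template_coords if t[pos] is not None]
        if not present:
            framework.append(None)  # no template covers this position
            continue
        framework.append(tuple(sum(c[k] for c in present) / len(present)
                               for k in range(3)))
    return framework


# Two made-up templates covering three target positions.
templates = [
    [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), None],
    [(2.0, 0.0, 0.0), None, None],
]
print(average_framework(templates))
```

Positions left as None are the ones that have to be filled in later by loop building.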


Step 4 Building of insertions or loops.
SWISS-MODEL uses two techniques:

The first method is the one I described earlier (a search of a high-resolution fragment database).

It also uses first principles; in other words, it searches conformational space to build loops, using:

·  7 allowed φ, ψ combinations

·  adequate space allocation for the loop

·  space allocation for each α-carbon

Both methods exclude loops in conflict with the structure.
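The conformational search over a small set of allowed backbone angle pairs can be sketched as an enumeration. The seven angle pairs below are placeholder values chosen for illustration only; they are not the combinations used by the program. The `accept` argument stands in for the space and conflict checks, which would need full coordinates and are not implemented here.

```python
import itertools

# Placeholder (phi, psi) pairs in degrees. Hypothetical values, NOT the
# seven combinations actually used by the program.
ALLOWED_PHI_PSI = [
    (-60, -45), (-120, 130), (-70, 140), (-90, 0),
    (60, 45), (-150, 160), (-60, 120),
]


def loop_conformations(n_residues, accept=lambda conformation: True):
    """Yield every assignment of an allowed (phi, psi) pair to each loop residue.

    accept is a hypothetical filter standing in for the space and clash tests.
    """
    for conformation in itertools.product(ALLOWED_PHI_PSI, repeat=n_residues):
        if accept(conformation):
            yield conformation


# A 3-residue loop has 7 ** 3 candidate backbone conformations before filtering.
print(sum(1 for _ in loop_conformations(3)))
```

The count grows as 7 to the power of the loop length, which is why this kind of exhaustive search is only practical for short loops.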

Step 5 Side chain building

It uses a library of allowed side-chain rotamers.

First the distorted but otherwise complete side chains are corrected.

Then the incomplete side chains are built with a probabilistic approach using the rotamers. A van der Waals exclusion test and dihedral angle constraints can be used to select the “best” side chain conformation.
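A minimal sketch of rotamer selection with an exclusion test, under several assumptions: each rotamer is given as a library probability plus already-built atom coordinates, a single 3.0 Å minimum distance stands in for a proper van der Waals test, and the dihedral angle constraints are not modelled. All names and numbers are hypothetical.

```python
import math


def clashes(atoms, environment, min_distance=3.0):
    """True if any side-chain atom lies closer than min_distance to an environment atom."""
    return any(math.dist(a, e) < min_distance for a in atoms for e in environment)


def pick_rotamer(rotamers, environment, min_distance=3.0):
    """Return the most probable rotamer that passes the exclusion test, or None.

    rotamers: list of (probability, atom_coords) tuples.
    """
    for probability, atoms in sorted(rotamers, key=lambda r: r[0], reverse=True):
        if not clashes(atoms, environment, min_distance):
            return probability, atoms
    return None


# Made-up example: the most probable rotamer bumps into a neighbouring atom.
rotamers = [
    (0.6, [(1.0, 0.0, 0.0)]),
    (0.3, [(0.0, 4.0, 0.0)]),
    (0.1, [(0.0, 0.0, 5.0)]),
]
neighbours = [(2.0, 0.0, 0.0)]
print(pick_rotamer(rotamers, neighbours))
```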

Step 6 Minimisation

Step 7

The correctness of the structure is checked by analysing the conformational space of each residue energetically.

The correctness of the structure is also checked by looking at the packing density of the model which is compared to what is expected.

Automatic v Manual

Is our target protein homologous enough for an automatic procedure?
First, a template has to be found in a sequence search.
Otherwise you can use PDB-viewer and your own sequence/structure.

Even with some manual input, will we get a good enough structure?
Sometimes; at other times exhaustive manual modelling is needed.

We need to decide what the model is going to be used for.
Is it to look at e.g. mutations, or do we want to do docking of ligands?

Do we really want to use an averaged template structure?
Some structures can distort an averaged template.

Note:

Modelling is not the end of the “Experiment” - it is the means for further theoretical studies.

It gives us a 3D representation of a sequence alignment with the gaps filled in.

It can be used further in structure-based ligand design if the model is accurate enough.
It can suggest residues to mutate and these mutations can be further studied both theoretically and biochemically.
It can be used to understand the function of the protein better.

Further Reading & References:

General:

Protein Structure Prediction – A practical approach.

Ed: Michael J. E. Sternberg. IRL Press. 1996. ISBN: 0-19-963496-3.

Browne, W.J. et al. 1969. J. Mol. Biol. 42, 65.

Greer, J. 1981. J. Mol. Biol. 153, 1027.

Havel, T.F. & Snow, M.E. 1991. J. Mol. Biol. 217, 1.

Šali, A. & Blundell, T.L. 1993. J. Mol. Biol. 234, 779.

Finan, P., Koga, H., Zvelebil, M.J., Waterfield, M.D. & Kellie, S. 1996. J. Mol. Biol. 261, 173.
