European Biopharmaceutical Review, April 2017 pp22-26.
Becoming Better Informed
Christian Loch, Director of R&D, AVMBioMed
Proteomics could be the tool that preclinically characterises
investigational molecules for safety and efficacy and identifies
useful biomarkers, but must be done more reproducibly and
account for post-translational modification
Increasing success rates and speed to failure requires more
information. We need to know which proteins to target, which
molecules to use and to which patients drugs should be
administered. The devil, as always, is in the details. Cell-based
screening, computational modelling and ribosomal profiling,
among other recent improvements, are informative – but do
not seem to be the solution. Clearly, another approach is
needed. Mine is premised on three assumptions:
1. Proteomics represents the most abundant and useful pool of
biological information
2. Post-translational modification (PTM) is the defining feature
of human proteins
3. Mass spectrometry (MS) is a structurally flawed tool for
accessing these data
Proteomics is King
Blueprints and recipes contain assembly instructions; genomes
do not. Our approximately 20,000 human genes get spliced into
about 100,000 unique transcripts that vary across development
and cell type (1). TP53, one of the best-studied proteins in humans,
has at least 15 structurally and functionally distinct isoforms (2).
In point of fact, the previous statement betrays a biological error
shared by MS – namely that TP53 is a gene, not a protein. Ignoring
splice variation makes conversation simpler, but masks the
molecular complexity we require to exist and survive.
Unless a spectrometer can identify isoform-specific peptides,
information about variable splicing is lost. Ribonucleic acid
sequencing (RNA-seq) theoretically informs this larger pool of
data, but transcription does not equal translation. Many proteins
will be present in the cell without concurrent transcription.
Likewise, transcripts will be present whose cognate proteins are
not. The transcriptome may provide more information, but most
of it will remain hidden or ignored depending on the technique.
Fortunately, DNA and RNA exist to make proteins. Not all
nucleic acid is protein coding, of course, but regulatory DNA,
ribosomal RNA, transfer RNA and small interfering RNA all
function to regulate protein production. Consequential
changes to these non-coding molecules will ultimately be
reflected in the proteome as well. Arriving here as we are, at
the end of the original central dogma of molecular biology –
the proteins – we find unambiguously and quantifiably that
proteomics reigns supreme.
Updating the Central Dogma
Through PTM, the 100,000 human transcripts get made into an
estimated 20 million proteins (see Figure 1) (3). This expansion of
biological information by orders of magnitude is not trivial – human cells
expend great energy to accomplish the process and most
forms are regulated positively and negatively with control
reminiscent of coagulation. The two largest families of enzymes
in humans are the E3 ubiquitin ligases (620 members) and
the kinases (518) in addition to roughly 100 deubiquitylases
(DUBs) and 200 phosphatases that reverse or antagonise the
respective processes. Those four families alone account for roughly 7% of
our 20,000 genes. There are over 200 unique forms of modification
affecting the shape, charge, activity, localisation, interaction
and half-life of proteins (3). PTM therefore controls everything
about human proteins that makes them proteins.
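The arithmetic behind these figures is easy to check. The short Python sketch below uses only the approximate counts quoted in this article; the function and variable names are illustrative, not part of any published tool.

```python
# Approximate counts as quoted in the article (not measured values).
GENES = 20_000
TRANSCRIPTS = 100_000
PROTEIN_FORMS = 20_000_000

# Sizes of the four PTM enzyme families named in the text.
PTM_ENZYME_FAMILIES = {
    "E3 ubiquitin ligases": 620,
    "kinases": 518,
    "deubiquitylases": 100,
    "phosphatases": 200,
}


def fold_expansion(before, after):
    """Return how many times larger `after` is than `before`."""
    return after / before


def enzyme_share(families, genes):
    """Return the fraction of genes accounted for by the listed families."""
    return sum(families.values()) / genes


splice_fold = fold_expansion(GENES, TRANSCRIPTS)        # genes -> transcripts
ptm_fold = fold_expansion(TRANSCRIPTS, PROTEIN_FORMS)   # transcripts -> proteins
share = enzyme_share(PTM_ENZYME_FAMILIES, GENES)

print(f"Splicing expansion: {splice_fold:.0f}-fold")
print(f"PTM expansion: {ptm_fold:.0f}-fold")
print(f"PTM enzyme share of genes: {share:.1%}")
```

With the quoted numbers, the four families sum to 1,438 genes, which is where the figure of roughly 7% comes from.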
Many forms of modification occur co-translationally and
there is almost no such thing as a naked human protein; thus
‘protein’ is not simply a morphological variant of a transcript –
it is a dynamic individual that must mature, relocate, change
behaviour or interactions and die in coordinated fashion. PTM-directed
maturation and adaptation is the fourth step of the
updated central dogma. Protein degradation, a process precisely
controlled by PTM, is the fifth step (see Figure 2).
Once dismissed as ‘the garbage can of the cell’, malfunction
in degradation is now recognised to have great consequence.
The causative ΔF508 mutation of cystic fibrosis transmembrane
conductance regulator (CFTR) in most cystic fibrosis cases
yields a functional protein that gets prematurely ubiquitylated
and degraded (4). Failure to ubiquitylate and degrade
cyclin-dependent kinases drives oncogenesis (5).
Nearly all physiology is driven by PTM. Monogenetic mutation
in the responsible machinery causes illnesses like cancer,
neurodegeneration and heart disease. These are among the
most important enzymes in humans and they are usually
present in low concentration and difficult to identify among
the diversity of proteins, some of which are present at 12 orders
of magnitude higher concentration (6).
Figure 1: Numerical comparison of the information present within each of
the first three steps of the central dogma. Splice variation drives the 5-fold
expansion from genes to transcripts, while PTM drives the 200-fold expansion
between transcripts and proteins. Given the historical dearth of tools useful
to PTM discovery, most of this information pool remains untapped
Figure 2: The updated central dogma. Proteins do not stop their lifecycle
at the stage of undecorated polypeptide chain. Step 4, maturation, reflects
the fact that PTM affects their shape, charge, localisation, activity and
interactions. Step 5, degradation, reflects the critical importance of removing
proteins with temporal precision. Shown is ubiquitylation-driven degradation
at the proteasome
MS ≠ Proteomics
MS is one of the most powerful means of identifying molecules
ever invented. Originally adapted to protein sequencing by Donald Hunt, MS quickly
displaced Edman degradation and became synonymous with a
field of study it was creating – proteomics (7). The technology
is complex even to those who use the machines, but all mass
spectrometers share three core components: a source of ions,
a mass analyser and a detector.
A necessary consequence of protein ionisation is protein
fragmentation. Most investigators choose to additionally
digest proteins prior to ionisation using endopeptidases,
because peptides separate in the mass analyser better than
proteins. Better separation results in more proteins identified
due to the competition among peptides at the final step in the
process: the detector. Such ‘bottom-up’ approaches can find
evidence of nearly 8,000 unique proteins.
However, digestion and neutral loss ablate most forms
of PTM. Thus, MS operates within samples from the most
dynamic, complex and technologically challenging direction
(the proteins) but, by ignoring splice variation and PTM, it
provides information about them from the smallest pool of
biological information (the genes). Top-down proteomics
omits digestion, thereby preserving splice variation and PTM.
Nonetheless, because proteins separate more poorly than
peptides, these experiments max out at roughly 1,200 proteins
identified (8). To capture information about PTM, another
method is to immunoprecipitate the sample with PTM-specific
antibodies prior to digesting and mass analysing. The proteins
identified can be assumed to have harboured the PTM, but
all data concerning structure are lost. More problematically,
additional human manipulation adds variability to a technique
already labouring under the weight of poor reproducibility.
MS is the most sensitive and reliable tool for targeted protein
identification, but novel discovery in complex samples is
extremely variable. Even without upfront manipulation, most
discovery experiments are only 50-70% reproducible. For
basic applications, such variability has no real detriment to the
undertaking. For purposes of separating responders from non-responders,
or cases from controls, it is fatally flawed. Biomarker
discovery drives the field of proteomics, and early diagnosis of
cancer is the holy grail of both. No wonder, then, we live in a
world with so few biomarkers useful to early diagnosis of cancer.
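To see why run-to-run variability undermines case-control comparison, consider a minimal Python sketch. It assumes reproducibility is expressed as the overlap between the protein lists of two replicate runs of the same sample (a Jaccard index); the article does not state which metric underlies the 50-70% figure, so this definition, the function names and the protein labels are all illustrative assumptions.

```python
def overlap_reproducibility(run_a, run_b):
    """Fraction of proteins seen in either run that were seen in both.

    This is a Jaccard index, used here as one assumed way of
    expressing reproducibility between two replicate runs.
    """
    a, b = set(run_a), set(run_b)
    union = a | b
    if not union:
        return 1.0  # two empty runs agree trivially
    return len(a & b) / len(union)


def seen_in_one_run_only(run_a, run_b):
    """Proteins identified in exactly one of the two runs."""
    return sorted(set(run_a) ^ set(run_b))


# Hypothetical identifications from two replicate runs of ONE sample.
replicate_1 = [f"P{i}" for i in range(1, 11)]   # P1..P10
replicate_2 = [f"P{i}" for i in range(4, 14)]   # P4..P13

score = overlap_reproducibility(replicate_1, replicate_2)
noise = seen_in_one_run_only(replicate_1, replicate_2)
print(f"Overlap between replicates: {score:.0%}")
print(f"Proteins that differ with no biological cause: {noise}")
```

Every protein in the second list differs between runs although the sample is identical. In a case-control study, such proteins are indistinguishable from genuine biomarker candidates.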
Like Sanger sequencing of DNA, protein identification by MS
requires separating products, often chromatographically or
electrophoretically in addition to within the mass analyser.
Next-generation sequencing is defined by its ability to eliminate
this exact step, thereby increasing speed, decreasing costs
and improving accuracy. Fortunately, there also exists the
means to make novel discovery of proteins without breaking
or separating them. It is over 90% reproducible, informs PTM
and has no bias to protein concentration (9). It benefits from
enzymatic amplification but does not require antibodies to
specific proteins and makes novel discovery across 20,000
proteins in every experiment, a number inaccessible to MS.
Microarray, Meet Lysate
The process begins by printing picolitre quantities of
thousands of purified human proteins in spatially addressable
locations onto slides coated with thin-film nitrocellulose.
Incubation of these arrays with complex samples allows
enzymes that modify proteins to act on them as if they
were endogenous to the lysate or sera. The concentration
of arrayed proteins is mathematically ‘infinite’, meaning that
enzymes seeking to alter them encounter virtually unlimited
supply, thus affording enzymatic amplification of their
modification signatures. Modifications occur in the presence
of intact multi-component complexes, competing substrates
and antagonistic enzymatic activities. Since proteins do
not operate independently or in isolation, this yields far
more physiologically relevant activity and information.
Modifications to specific proteins are quantified with
antibodies directed against PTM.
Experiments are always differential, comprising an
experimental and a control array, and reveal changes to PTM
across 20,000 potential substrates. The paired design cancels out
structural artefacts from protein misfolding, since a protein that is
20% misfolded on the case array will be 20% misfolded
on the control array too. Randomised deposition of proteins
to the nitrocellulose ensures that virtually all epitopes
are exposed to the aqueous phase. The arrays' utility is limited
only by imagination and the ability to identify a control
sample. Comparing wild type to knock-out cells can reveal
substrates or interacting partners of a DUB, phosphatase or
any protein, while comparing drug-treated to vehicle-treated cells can
reveal mechanisms of action, inform efficacy in a preclinical
setting, suggest companion diagnostic biomarkers or reveal
opportunities for companion therapy.
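The differential comparison can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the analysis pipeline used in the cited work: the signal values, the protein labels, the two-fold threshold and the function name `differential_ptm` are all hypothetical.

```python
import math


def differential_ptm(experimental, control, min_log2_change=1.0, floor=1.0):
    """Return {protein: log2 ratio} for spots whose PTM signal changed.

    `experimental` and `control` map protein names to PTM antibody
    signal on the two arrays. Signals below `floor` are raised to
    `floor` so the ratio is always defined. The default threshold of
    1.0 (a two-fold change) is an arbitrary illustrative choice.
    """
    hits = {}
    for protein, exp_signal in experimental.items():
        if protein not in control:
            continue  # only spots present on both arrays can be compared
        ratio = max(exp_signal, floor) / max(control[protein], floor)
        log2_ratio = math.log2(ratio)
        if abs(log2_ratio) >= min_log2_change:
            hits[protein] = log2_ratio
    return hits


# Hypothetical signals: drug-treated lysate versus vehicle-treated lysate.
drug_array = {"protein_A": 800.0, "protein_B": 100.0, "protein_C": 50.0}
vehicle_array = {"protein_A": 200.0, "protein_B": 110.0, "protein_C": 200.0}

changed = differential_ptm(drug_array, vehicle_array)
for name, value in sorted(changed.items()):
    print(f"{name}: log2 change {value:+.1f}")
```

Because each spot is compared only with the same spot on the paired array, any spot-specific artefact such as partial misfolding appears in both numerator and denominator and drops out of the ratio.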
By gathering and interpreting the deepest pool of biological
information, it will become feasible to preclinically characterise
safety, enabling proper candidate prioritisation. Informatics has
progressed to such a degree that even without understanding
each individual event, we can obtain a very good sense of
what is happening at a biological level by looking across
them. Cells contain the molecular pathways and switches that
drive all bodily responses. When phosphorylation changes
are observed among mechanistic target of rapamycin, AMP-activated
protein kinase, unc-51-like-kinase-1 and WD repeat
domain phosphoinositide-interacting protein 2, the drug has
induced autophagy. When we see changes to PTM of certain
mitochondrial proteins, we may suspect it induces apoptosis.
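Reading a response from a pattern of changes, rather than from any single event, can be sketched as a simple signature match. The Python below is an assumption-laden illustration: the short labels (MTOR, AMPK, ULK1, WIPI2) stand in for the four proteins named above, the 75% coverage cut-off is arbitrary, and the function names are hypothetical. Real pathway inference would use curated databases and statistics.

```python
# Short labels for the four autophagy-related proteins named in the text.
AUTOPHAGY_SIGNATURE = frozenset({"MTOR", "AMPK", "ULK1", "WIPI2"})


def signature_coverage(changed_proteins, signature):
    """Fraction of a signature's members found among the changed proteins."""
    return len(set(changed_proteins) & signature) / len(signature)


def suggested_responses(changed_proteins, signatures, min_coverage=0.75):
    """Names of signatures whose coverage meets the (arbitrary) cut-off."""
    return sorted(
        name
        for name, members in signatures.items()
        if signature_coverage(changed_proteins, members) >= min_coverage
    )


signatures = {"autophagy": AUTOPHAGY_SIGNATURE}

# Hypothetical list of proteins whose phosphorylation changed on drug treatment.
observed = ["MTOR", "ULK1", "WIPI2", "unrelated_protein"]
print(suggested_responses(observed, signatures))
```

Further signatures, for example one for apoptosis, could be added to the `signatures` dictionary once their member proteins are specified.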
The first demonstration of this technique came in 2009
from the Kirschner group’s extract-based functional assay
(EFA) (10). A quorum of publications validate that enzymes
retain specificity for substrates even within lysate (11-14).
EFA-based screens report false positives and negatives
like any screen, but this is irrelevant for many applications. If
phosphorylation of protein X discriminates serum of
responders from non-responders, it is immaterial to clinical
utility whether or not protein X is phosphorylated in vivo.
Likewise, the fact that real events are missed is acceptable;
we need one biomarker capable of discriminating response,
not every one theoretically possible.
Protein arrays quantitate changes to PTM of proteins between
samples, not the absolute levels thereof. Absolute quantification
will remain the prerogative of AQUA MS. Relative quantification
will continue utilising label-free, metabolic, isotopic and isobaric
approaches and, for better reproducibility, many will choose
RNA-seq, accepting that messenger RNA levels correlate with
protein levels imperfectly. People intuitively seek information
about changing levels of protein, but proteins are not static
things occasionally going up or down in expression – they
are societies of networked individuals that evolve with their
surroundings in concert as directed.
Future Outlook
Thousands of years from now, when biology is finished
and every detail within every cell type over every period of
development has been elucidated, investigators might ask
‘what do we do now?’ Assuming they have not cured all
disease, they will use the information to construct models that
reveal pathological processes and the effects of investigational
molecules intended to cure them. Their model will be a
very expensive and well-understood cell lysate. Let us forgo
millennia of great but ultimately unnecessary knowledge
concerning assembly instructions and begin looking properly
at that same model now. There is a lot to be done, and we
know quite enough already to make it extremely useful
and informative.
References
1. Pan Q et al, Deep surveying of alternative splicing complexity in
the human transcriptome by high-throughput sequencing, Nature
Genetics 40: pp1,413-1,415, 2008
2. Bourdon JC et al, p53 isoforms can regulate p53 transcriptional
activity, Genes & Development 19: pp2,122-2,137, 2005
3. Merbl Y and Kirschner MW, Protein microarrays for genome-wide
posttranslational modification analysis, Wiley Interdisciplinary
Reviews: Systems Biology and Medicine 3(3): pp347-356, 2011
4. Okiyoneda T et al, Peripheral protein quality control removes
unfolded CFTR from the plasma membrane, Science 329(5,993):
pp805-810, 2010
5. Sherr CJ and Roberts JM, CDK inhibitors: Positive and negative
regulators of G1-phase progression, Genes & Development 13(12):
pp1,501-1,512, 1999
6. Anderson NL et al, The human plasma proteome: A nonredundant
list developed by combination of four separate sources, Molecular
& Cellular Proteomics 3: pp311-326, 2004
7. Hunt DF et al, Protein sequencing by tandem mass spectrometry,
Proceedings of the National Academy of Sciences of the United
States of America 83(17): pp6,233-6,237, 1986
8. Catherman AD et al, Large-scale top-down proteomics of the human
proteome: Membrane proteins, mitochondria, and senescence,
Molecular & Cellular Proteomics 12(12): pp3,465-3,473, 2013
9. Loch CM et al, Use of high density antibody arrays to validate and discover
cancer serum biomarkers, Molecular Oncology 1(3): pp313-320, 2007
10. Merbl Y and Kirschner MW, Large-scale detection of ubiquitination
substrates using cell extracts and protein microarrays,
Proceedings of the National Academy of Sciences of the United
States of America 106(8): pp2,543-2,548, 2009
11. Woodard CL et al, Profiling the dynamics of a human
phosphorylome reveals new components in HGF/c-Met signaling,
PloS One 8: e72671, 2013
12. Merbl Y et al, Profiling of ubiquitin-like modifications reveals
features of mitotic control, Cell 152(5): pp1,160-1,172, 2013
13. Loch CM and Strickler JE, A microarray of ubiquitylated proteins
for profiling deubiquitylase activity reveals the critical roles of
both chain and substrate, Biochimica et Biophysica Acta 1823(11):
pp2,069-2,078, 2012
14. Del Rincon SV et al, Development and validation of a method for
profiling post-translational modification activities using protein
microarrays, PloS One 5: e11332, 2010
About the author
Christian Loch is co-founder and Director
of R&D at AVMBioMed. He is also adjunct
faculty in the Department of Chemistry at
Villanova University, US, where he teaches
proteomics in the graduate school. Christian
has 10 years of industry experience and
holds an MPH in Epidemiology from the University of
Washington, US, and a PhD in Biochemistry from the
University of Virginia, US.