European Biopharmaceutical Review, April 2017 pp22-26.

Becoming Better Informed

Christian Loch, Director of R&D, AVMBioMed

Proteomics could be the tool that preclinically characterises

investigational molecules for safety and efficacy and identifies

useful biomarkers, but must be done more reproducibly and

account for post-translational modification

Increasing success rates and speed to failure requires more

information. We need to know which proteins to target, which

molecules to use and to which patients drugs should be

administered. The devil, as always, is in the details. Cell-based

screening, computational modelling and ribosomal profiling,

among other recent improvements, are informative – but do

not seem to be the solution. Clearly, another approach is

needed. Mine is premised on three assumptions:

1. Proteomics represents the most abundant and useful pool of

biological information

2. Post-translational modification (PTM) is the defining feature

of human proteins

3. Mass spectrometry (MS) is a structurally flawed tool for

accessing these data

Proteomics is King

Blueprints and recipes contain assembly instructions; genomes

do not. Our approximately 20,000 human genes get spliced into

about 100,000 unique transcripts variable across development

and cell type (1). TP53, one of the best-studied proteins in humans,

has at least 15 structurally and functionally distinct isoforms (2).

In point of fact, the previous statement betrays a biological error

shared by MS – namely that TP53 is a gene, not a protein. Ignoring

splice variation makes conversation simpler, but masks the

molecular complexity we require to exist and survive.

Unless a spectrometer can identify isoform-specific peptides,

information about variable splicing is lost. Ribonucleic acid

sequencing (RNA-seq) theoretically informs this larger pool of

data, but transcription does not equal translation. Many proteins

will be present in the cell without concurrent transcription.

Likewise, transcripts will be present whose cognate proteins are

not. The transcriptome may provide more information, but most

of it will remain hidden or ignored depending on the technique.
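
The point about isoform-specific peptides can be illustrated with a small in silico digest. This is a minimal sketch, not a method from the article: it assumes trypsin as the endopeptidase (simplified rule: cleave after K or R unless followed by P), and the two isoform sequences are invented toy examples.

```python
import re


def tryptic_digest(sequence):
    """Split a protein sequence after K or R, except when the next residue is P."""
    # Zero-width split points require Python 3.7 or later.
    return [pep for pep in re.split(r"(?<=[KR])(?!P)", sequence) if pep]


# Hypothetical splice isoforms that share their first and last segments.
isoforms = {
    "isoform_a": "MEEPQSDPSVKAGTLLRWPLSSKVEPGR",
    "isoform_b": "MEEPQSDPSVKDDQTFRVEPGR",
}

peptides = {name: set(tryptic_digest(seq)) for name, seq in isoforms.items()}

# Peptides found in both isoforms cannot tell them apart.
shared = peptides["isoform_a"] & peptides["isoform_b"]

# Only these peptides carry information about which isoform was present.
unique = {name: peps - shared for name, peps in peptides.items()}

for name, peps in unique.items():
    print(name, sorted(peps))
```

If the instrument detects only the shared peptides, the identification collapses to the gene and the splice information is lost.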

Fortunately, DNA and RNA exist to make proteins. Not all

nucleic acid is protein coding, of course, but regulatory DNA,

ribosomal RNA, transfer RNA and small interfering RNA all

function to regulate protein production. Consequential

changes to these non-coding molecules will ultimately be

reflected in the proteome as well. Arriving here as we are, at

the end of the original central dogma of molecular biology –

the proteins – we find unambiguously and quantifiably that

proteomics reigns supreme.

Updating the Central Dogma

Through PTM, the 100,000 human transcripts get made into an

estimated 20 million proteins (see Figure 1) (3). This 200-fold

expansion of biological information is not trivial – human cells

expend great energy to accomplish the process and most

forms are regulated positively and negatively with control

reminiscent of coagulation. The two largest families of enzymes

in humans are the E3 ubiquitin ligases (620 members) and

the kinases (518); roughly 100 deubiquitylases (DUBs) and

200 phosphatases reverse or antagonise the respective

processes. Those four families alone account for about 7% of

our genes. There are over 200 unique forms of modification

affecting the shape, charge, activity, localisation, interaction

and half-life of proteins (3). PTM therefore controls everything

about human proteins that makes them proteins.
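
The figures quoted in this section can be checked with simple arithmetic. The sketch below uses only the numbers given in the text; the variable names are illustrative.

```python
# Approximate counts quoted in the article.
genes = 20_000
transcripts = 100_000
proteins = 20_000_000

# Expansion at each step of the central dogma.
splice_fold = transcripts / genes    # genes -> transcripts
ptm_fold = proteins / transcripts    # transcripts -> proteins

# Enzyme families that write or erase ubiquitylation and phosphorylation.
ptm_enzymes = {
    "E3 ubiquitin ligases": 620,
    "kinases": 518,
    "deubiquitylases": 100,
    "phosphatases": 200,
}
total_enzymes = sum(ptm_enzymes.values())
share_of_genes = total_enzymes / genes

print(f"splicing: {splice_fold:.0f}-fold, PTM: {ptm_fold:.0f}-fold")
print(f"{total_enzymes} enzymes = {share_of_genes:.1%} of genes")
```

The four families sum to 1,438 enzymes, which is where the roughly 7% figure comes from.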

Many forms of modification occur co-translationally and

there is almost no such thing as a naked human protein; thus

‘protein’ is not simply a morphological variant of a transcript –

it is a dynamic individual that must mature, relocate, change

behaviour or interactions and die in coordinated fashion. PTM-directed

maturation and adaptation is the fourth step of the

updated central dogma. Protein degradation, a process precisely

controlled by PTM, is the fifth step (see Figure 2).

Once dismissed as ‘the garbage can of the cell’, malfunction

in degradation is now recognised to have great consequence.

The causative ΔF508 mutation of cystic fibrosis transmembrane

conductance regulator (CFTR) in most cystic fibrosis cases

yields a functional protein that gets prematurely ubiquitylated

and degraded (4). Failure to ubiquitylate and degrade

cyclin-dependent kinases drives oncogenesis (5).

Nearly all physiology is driven by PTM. Monogenic mutation

in the responsible machinery causes illnesses like cancer,

neurodegeneration and heart disease. These are among the

most important enzymes in humans and they are usually

present in low concentration and difficult to identify among

the diversity of proteins, some of which are present at 12 orders

of magnitude higher concentration (6).

Figure 1: Numerical comparison of the information present within each of

the first three steps of the central dogma. Splice variation drives the 5-fold

expansion from genes to transcripts, while PTM drives the 200-fold expansion

between transcripts and proteins. Given the historical dearth of tools useful

to PTM discovery, most of this information pool remains untapped

Figure 2: The updated central dogma. Proteins do not stop their lifecycle

at the stage of undecorated polypeptide chain. Step 4, maturation, reflects

the fact that PTM affects their shape, charge, localisation, activity and

interactions. Step 5, degradation, reflects the critical importance of removing

proteins with temporal precision. Shown is ubiquitylation-driven degradation

at the proteasome

MS ≠ Proteomics

MS is one of the most powerful means of identifying molecules

ever invented. Originally adapted for protein sequencing by Donald Hunt, MS quickly

displaced Edman degradation and became synonymous with a

field of study it was creating – proteomics (7). The technology

is complex even to those who use the machines, but all mass

spectrometers share three core components: a source of ions,

a mass analyser and a detector.

A necessary consequence of protein ionisation is protein

fragmentation. Most investigators choose to additionally

digest proteins prior to ionisation using endopeptidases,

because peptides separate in the mass analyser better than

proteins. Better separation results in more proteins identified,

because peptides compete with one another at the final step in the

process: the detector. Such ‘bottom-up’ approaches can find

evidence of nearly 8,000 unique proteins.

However, digestion and neutral loss ablate most forms

of PTM. Thus, MS operates within samples from the most

dynamic, complex and technologically challenging direction

(the proteins) but, by ignoring splice variation and PTM, it

provides information about them from the smallest pool of

biological information (the genes). Top-down proteomics

omits digestion, thereby preserving splice variation and PTM.

Nonetheless, because proteins separate more poorly than

peptides, these experiments max out at roughly 1,200 proteins

identified (8). To capture information about PTM, another

method is to immunoprecipitate the sample with PTM-specific

antibodies prior to digesting and mass analysing. The proteins

identified can be assumed to have harboured the PTM, but

all data concerning structure are lost. More problematically,

additional human manipulation adds variability to a technique

already labouring under the weight of poor reproducibility.

MS is the most sensitive and reliable tool for targeted protein

identification, but novel discovery in complex samples is

extremely variable. Even without upfront manipulation, most

discovery experiments are only 50-70% reproducible. For

basic applications, such variability has no real detriment to the

undertaking. For purposes of separating responders from non-responders,

or cases from controls, it is fatally flawed. Biomarker

discovery drives the field of proteomics, and early diagnosis of

cancer is the holy grail of both. No wonder, then, we live in a

world with so few biomarkers useful to early diagnosis of cancer.
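
The article does not say how reproducibility is measured. One simple possibility, assumed here purely for illustration, is the overlap (Jaccard index) between the sets of proteins identified in two replicate runs of the same sample. The protein labels below are hypothetical.

```python
def jaccard(run_a, run_b):
    """Fraction of all identifications that are shared by both runs."""
    a, b = set(run_a), set(run_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical identifications from two replicate runs of one sample.
replicate_1 = {"P1", "P2", "P3", "P4", "P5", "P6", "P7", "P8"}
replicate_2 = {"P1", "P2", "P3", "P4", "P5", "P6", "P9", "P10"}

reproducibility = jaccard(replicate_1, replicate_2)

# Proteins seen in only one replicate: technical noise, not biology.
seen_once = replicate_1 ^ replicate_2

print(f"overlap: {reproducibility:.0%}, seen in one run only: {sorted(seen_once)}")
```

Under this assumed metric, the identifications seen in only one run would be indistinguishable from genuine case-control differences had the two runs come from different samples.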

Like Sanger sequencing of DNA, protein identification by MS

requires separating products, often chromatographically or

electrophoretically in addition to within the mass analyser.

Next-generation sequencing is defined by its ability to eliminate

this exact step, thereby increasing speed, decreasing costs

and improving accuracy. Fortunately, there also exists the

means to make novel discovery of proteins without breaking

or separating them. It is over 90% reproducible, informs PTM

and has no bias to protein concentration (9). It benefits from

enzymatic amplification but does not require antibodies to

specific proteins and makes novel discovery across 20,000

proteins in every experiment, a number inaccessible to MS.

Microarray, Meet Lysate

The process begins by printing picolitre quantities of

thousands of purified human proteins in spatially addressable

locations onto slides coated with thin-film nitrocellulose.

Incubation of these arrays with complex samples allows

enzymes that modify proteins to act on them as if they

were endogenous to the lysate or sera. The concentration

of arrayed proteins is mathematically ‘infinite’, meaning that

enzymes seeking to alter them encounter virtually unlimited

supply, thus affording enzymatic amplification of their

modification signatures. Modifications occur in the presence

of intact multi-component complexes, competing substrates

and antagonistic enzymatic activities. Since proteins do

not operate independently or in isolation, this yields far

more physiologically relevant activity and information.

Modifications to specific proteins are quantified with

antibodies directed against PTM.

Experiments are always differential, comprising an

experimental and a control array, revealing changes to PTM

across 20,000 potential substrates. This eliminates structural

artefacts from protein misfolding, since a protein that is 80%

correctly folded on the case array will be equally (20%) misfolded

on the control array. Randomised deposition of proteins

to the nitrocellulose ensures that virtually all epitopes

are exposed to the aqueous phase. The arrays’ utility is limited

only by imagination and the ability to identify a control

sample. Comparing wild type to knock-out cells can reveal

substrates or interacting partners of a DUB, phosphatase or

any protein, while comparing drug-treated to vehicle-treated cells can

reveal mechanisms of action, inform efficacy in a preclinical

setting, suggest companion diagnostic biomarkers or reveal

opportunities for companion therapy.
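
The differential design can be sketched in a few lines. This is an illustration under stated assumptions, not the author's analysis pipeline: it assumes the signal at each spot is the product of the correctly folded fraction of the arrayed protein and the modifying activity in the sample, and every name, number and threshold below is hypothetical.

```python
import math


def log2_ratios(case, control, floor=1.0):
    """Per-spot log2(case / control); `floor` guards against zero signal."""
    return {
        spot: math.log2(max(case[spot], floor) / max(control[spot], floor))
        for spot in case.keys() & control.keys()
    }


# Hypothetical fraction of each arrayed protein that is correctly folded.
folded_fraction = {"protein_A": 0.8, "protein_B": 0.5, "protein_C": 1.0}

# Hypothetical PTM signal each sample would produce on fully folded protein.
case_activity = {"protein_A": 4000.0, "protein_B": 1000.0, "protein_C": 500.0}
control_activity = {"protein_A": 1000.0, "protein_B": 1000.0, "protein_C": 2000.0}

# Both arrays carry the same printed proteins, so the folded fraction is shared.
case = {p: folded_fraction[p] * case_activity[p] for p in folded_fraction}
control = {p: folded_fraction[p] * control_activity[p] for p in folded_fraction}

ratios = log2_ratios(case, control)

# Flag spots whose signal changed at least two-fold in either direction.
changed = {p: r for p, r in ratios.items() if abs(r) >= 1.0}

for protein, ratio in sorted(ratios.items()):
    print(f"{protein}: log2 ratio {ratio:+.2f}")
```

Because the folded fraction appears in both numerator and denominator, it cancels in the ratio. That is the sense in which a paired control removes misfolding artefacts under this simple model.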

By gathering and interpreting the deepest pool of biological

information, it will become feasible to preclinically characterise

safety, enabling proper candidate prioritisation. Informatics has

progressed to such a degree that even without understanding

each individual event, we can obtain a very good sense of

what is happening at a biological level by looking across

them. Cells contain the molecular pathways and switches that

drive all bodily responses. When phosphorylation changes

are observed among mechanistic target of rapamycin, AMP-activated

protein kinase, unc-51-like-kinase-1 and WD repeat

domain phosphoinositide-interacting protein 2, the drug has

induced autophagy. When we see changes to PTM of certain

mitochondrial proteins, we may suspect it induces apoptosis.
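
Reading a pathway from a set of changed proteins can be sketched as a signature lookup. This is a deliberately naive illustration, not the informatics the author has in mind: the signature holds only the four autophagy proteins named above (under their common abbreviations), and the scoring rule and function name are assumptions.

```python
# Signature built from the four proteins named in the text.
SIGNATURES = {
    "autophagy": {"mTOR", "AMPK", "ULK1", "WIPI2"},
}


def score_signatures(changed_proteins, signatures=SIGNATURES):
    """Fraction of each signature's members whose PTM state changed."""
    changed = set(changed_proteins)
    return {
        name: len(members & changed) / len(members)
        for name, members in signatures.items()
    }


# Hypothetical hits from a drug versus vehicle comparison.
observed = {"mTOR", "AMPK", "ULK1", "WIPI2", "protein_X"}

scores = score_signatures(observed)
print(scores)
```

A real analysis would weigh direction and magnitude of change and draw on curated pathway databases; the sketch shows only that coordinated changes across a known set are more informative than any single event.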

The first demonstration of this technique came in 2009

from the Kirschner group’s extract-based functional assay

(EFA) (10). A quorum of publications validate that enzymes

retain specificity for substrates even within lysate (11-14).

EFA-based screens report false positives and negatives

like any, but this is irrelevant for many applications. If

phosphorylation of protein X discriminates serum of

responders from non-responders, it is immaterial to clinical

utility whether or not protein X is phosphorylated in vivo.

Likewise, the fact that real events are missed is acceptable;

we need one biomarker capable of discriminating response,

not every one theoretically possible.

Protein arrays quantitate changes to PTM of proteins between

samples, not the binary levels thereof. Absolute quantification

will remain the prerogative of AQUA MS. Relative quantification

will continue utilising label-free, metabolic, isotopic and isobaric

approaches and, for better reproducibility, many will choose

RNA-seq, accepting that messenger RNA levels correlate with

protein levels imperfectly. People intuitively seek information

about changing levels of protein, but proteins are not static

things occasionally going up or down in expression – they

are societies of networked individuals that evolve with their

surroundings in concert as directed.

Future Outlook

Thousands of years from now, when biology is finished

and every detail within every cell type over every period of

development has been elucidated, investigators might ask

‘what do we do now?’ Assuming they have not cured all

disease, they will use the information to construct models that

reveal pathological processes and the effects of investigational

molecules intended to cure them. Their model will be a

very expensive and well-understood cell lysate. Let us forgo

millennia of great but ultimately unnecessary knowledge

concerning assembly instructions and begin looking properly

at that same model now. There is a lot to be done, and we

know quite enough already to make it extremely useful

and informative.

References

1. Pan Q et al, Deep surveying of alternative splicing complexity in

the human transcriptome by high-throughput sequencing, Nature

Genetics 40: pp1,413-1,415, 2008

2. Bourdon JC et al, p53 isoforms can regulate p53 transcriptional

activity, Genes & Development 19: pp2,122-2,137, 2005

3. Merbl Y and Kirschner MW, Protein microarrays for genome-wide

posttranslational modification analysis, Wiley Interdisciplinary

Reviews: Systems Biology and Medicine 3(3): pp347-356, 2011

4. Okiyoneda T et al, Peripheral protein quality control removes

unfolded CFTR from the plasma membrane, Science 329(5,993):

pp805-810, 2010

5. Sherr CJ and Roberts JM, CDK inhibitors: Positive and negative

regulators of G1-phase progression, Genes & Development 13(12):

pp1,501-1,512, 1999

6. Anderson NL et al, The human plasma proteome: A nonredundant

list developed by combination of four separate sources, Molecular

& Cellular Proteomics 3: pp311-326, 2004

7. Hunt DF et al, Protein sequencing by tandem mass spectrometry,

Proceedings of the National Academy of Sciences of the United

States of America 83(17): pp6,233-6,237, 1986

8. Catherman AD et al, Large-scale top-down proteomics of the human

proteome: Membrane proteins, mitochondria, and senescence,

Molecular & Cellular Proteomics 12(12): pp3,465-3,473, 2013

9. Loch CM et al, Use of high density antibody arrays to validate and discover

cancer serum biomarkers, Molecular Oncology 1(3): pp313-320, 2007

10. Merbl Y and Kirschner MW, Large-scale detection of ubiquitination

substrates using cell extracts and protein microarrays,

Proceedings of the National Academy of Sciences of the United

States of America 106(8): pp2,543-2,548, 2009

11. Woodard CL et al, Profiling the dynamics of a human

phosphorylome reveals new components in HGF/c-Met signaling,

PLoS One 8: e72671, 2013

12. Merbl Y et al, Profiling of ubiquitin-like modifications reveals

features of mitotic control, Cell 152(5): pp1,160-1,172, 2013

13. Loch CM and Strickler JE, A microarray of ubiquitylated proteins

for profiling deubiquitylase activity reveals the critical roles of

both chain and substrate, Biochimica et Biophysica Acta 1823(11):

pp2,069-2,078, 2012

14. Del Rincon SV et al, Development and validation of a method for

profiling post-translational modification activities using protein

microarrays, PLoS One 5: e11332, 2010

About the author

Christian Loch is co-founder and Director

of R&D at AVMBioMed. He is also adjunct

faculty in the Department of Chemistry at

Villanova University, US, where he teaches

proteomics in the graduate school. Christian

has 10 years of industry experience and

holds an MPH in Epidemiology from the University of

Washington, US, and a PhD in Biochemistry from the

University of Virginia, US.
