2016-10-04



Scientific nomenclature follows a quirky path. First we had the gene. Then scientists thought it would be convenient to lump all genes of an organism into a category called the genome (for gene + soma, Gk. body, from chromosome). When epigenetics entered the discussion, we gained the epigenome, along with the derivative terms genomics and epigenomics. Don't forget proteins, though: they needed a term for the set of all proteins in a cell, the proteome.

The study of the proteome is called proteomics. These days you can read about the lipidome (the totality of lipids in a cell), the metabolome (the metabolic players in a cell), and even the interactome (all the interactions in a cell). All these subjects merge into a higher-level category called -omics. The interaction of all omics categories is called economics (not really; that part is a joke).

Now that the genome is familiar, the study of proteomics is coming of age. Two recent papers show why proteomics is attracting so much attention. In a Nature review, Ruedi Aebersold and Matthias Mann show how "Powerful mass-spectrometry-based technologies now provide unprecedented insights into the composition, structure, function and control of the proteome, shedding light on complex biological processes and phenotypes." What can we expect with this new knowledge? Without referring to evolution once in their article, they list design-based benefits of proteomics:

The integration of various omics approaches and many perturbations will generate exponential flows of disparate data types. This will necessitate commensurate advances in bioinformatics and computational proteomics, which will be powered increasingly by machine-learning technologies while retaining their ability to generate biological insights. In this regard, the journey from single-protein analysis to a true understanding of the proteome and the importance of proteotypes will be long, challenging and exciting. [Emphasis added.]

Their first paragraph shows some of that excitement. Here are some "wow" facts they share about the proteins in a tiny yeast cell:

Collectively, proteins catalyse and control essentially all cellular processes. They form a highly structured entity known as the proteome, the constituent proteins of which carry out their functions at specific times and locations in the cell, in physical or functional association with other proteins or biomolecules. A proliferating Schizosaccharomyces pombe cell contains about 60 million protein molecules, which have abundances that range from a few copies to 1.1 million copies per expressed gene. Across the species, proteins constitute about 50% of the dry mass of a cell and reach a remarkable total concentration of 2-4 million proteins per cubic micrometre or 100-300 mg per ml (ref. 2). The extensive proteome network of the cell adapts dynamically to external or internal (that is, genetic) perturbations and thereby defines the cell's functional state and determines its phenotypes. Describing and understanding the complete and quantitative proteome as well as its structure, function and dynamics is a central and fundamental challenge of biology.

Aebersold and Mann take a "systems biology" view of the proteome, an inherently design-friendly perspective. Instead of viewing each protein molecule separately, they look at the proteome as an "integrated system." All these millions of proteins cooperate to contribute to the life and health of the cell, responding dynamically to perturbations, each playing its role to provide energy from nutrients, deliver cargo, translate and maintain genetic information, remove waste, and replicate. One surprising result comes from the systems biology view:

Present technology already enables analysis of the complete protein inventory of biological systems, including cell-type-specific proteomes of mammalian organs. One outcome of in-depth proteomics studies has been a demonstration of the extent to which diverse cellular systems have similar proteomes, with few proteins being uniquely detectable in specific situations. This surprising finding is supported by the Human Protein Atlas, a large-scale antibody-based study that also reports ubiquitous expression. The identity of cells and tissues therefore seems to be determined primarily by the abundance at which they express their constituent proteins, and perhaps by the manner in which the proteins are organized in the proteome, rather than the presence or absence of certain proteins.

Organization by Chance?

The ID-friendly findings of the "top down" systems approach contrast with statements in a second paper in Nature about "bottom up" protein design. Huang, Boyken, and Baker discuss "The coming of age of de novo protein design" wherein researchers hope to not only tinker with existing proteins, but develop brand new ones from first principles. To do that, they need to understand how an amino acid sequence determines folding patterns.

This paper is interesting because it relates to the work of Douglas Axe that resulted in a paper in the Journal of Molecular Biology in 2004. Axe answered questions about this paper earlier this year, and also mentioned it in his recent book Undeniable (p. 54). In the paper, Axe estimated the prevalence of sequences that could fold into a functional shape by random combinations. It was already known that the functional space was a small fraction of sequence space, but Axe put a number on it based on his experience with random changes to an enzyme. He estimated that one in 10^74 sequences of 150 amino acids could fold and thereby perform some function -- any function.

The new paper in Nature seems to point to a much smaller functional space. The authors say,

It is useful to begin by considering the fraction of protein sequence space that is occupied by naturally occurring proteins (Fig. 1a). The number of distinct sequences that are possible for a protein of typical length is 20^200 sequences (because each of the protein's 200 residues can be one of 20 amino acids), and the number of distinct proteins that are produced by extant organisms is on the order of 10^12. Evidently, evolution has explored only a tiny region of the sequence space that is accessible to proteins. And because evolution proceeds by incremental mutation and selection, naturally occurring proteins are not spread uniformly across the full sequence space; instead, they are clustered tightly into families. The huge space that is unlikely to be sampled during evolution is the arena for de novo protein design. Consequently, evolutionary processes are not a good guide for its exploration -- as discussed already, they proceed incrementally and at random. Functional folded proteins have been retrieved from random-sequence libraries, but this is a laborious (and non-systematic) process. Instead, it should be possible to generate new proteins from scratch on the basis of our understanding of the principles of protein biophysics.

Since 20^200 is about 10^260, and the space actually sampled by living organisms is 10^12, the numbers differ by at least 240 orders of magnitude for proteins of length 200, or about 183 orders of magnitude for the 150-amino-acid chains Axe used. No wonder the authors say that "the natural evolutionary process has sampled only an infinitesimal subset" of sequence space.
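For readers who want to check the arithmetic, here is a minimal sketch in Python. It works in base-10 logarithms so the huge numbers stay manageable. The function names are ours, chosen for illustration; the only inputs taken from the paper are the 20-letter amino acid alphabet, the chain lengths, and the figure of roughly 10^12 naturally occurring proteins.

```python
import math


def log10_sequence_space(length, alphabet=20):
    """Return log10 of the number of possible chains of the given length.

    A chain of `length` residues drawn from `alphabet` amino acids has
    alphabet**length possible sequences, so log10 is length * log10(alphabet).
    """
    return length * math.log10(alphabet)


def orders_of_magnitude_gap(length, log10_sampled=12, alphabet=20):
    """Return how many orders of magnitude separate the full sequence space
    from the portion sampled by living organisms (about 10^12 per the paper)."""
    return log10_sequence_space(length, alphabet) - log10_sampled


# Chain lengths discussed above: 200 (Huang et al.) and 150 (Axe).
for n in (200, 150):
    space = log10_sequence_space(n)
    gap = orders_of_magnitude_gap(n)
    print(f"length {n}: space ~ 10^{space:.0f}, gap ~ {gap:.0f} orders of magnitude")
```

For a length of 200 the gap comes out a little above 248 orders of magnitude, which is why "at least 240" is a safe way to put it.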

The authors have nothing but their imagination to suggest that evolution restricted its search to functional clusters. Any random search has no possible chance, using all the atoms in the universe for the entire age of the universe, of finding a functional cluster in such a vast space. Dembski said that any search for a target that has less than 1 chance in 10^150 exceeds the universal probability bound; it will never happen anywhere in the entire history of the universe.

Axe's estimate of one in 10^74, one must note, referred to mutations to existing proteins in the universal proteome of all organisms. When considering random chains of amino acids in a primordial soup, however, Steve Meyer noted in Signature in the Cell (pp. 210-212) two other requirements. The amino acids must be one-handed, and they must form only peptide bonds. Applying generous probabilities of 0.5 for handedness and 0.5 for peptide bonds, Meyer reduced the probability for a lucky functional protein chain of 150 amino acids to one in 10^164, far beyond the universal probability bound (p. 212).
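The way these three factors combine can be sketched in a few lines of Python. This is a rough reconstruction under stated assumptions, not a transcription of Meyer's own working: we assume the 0.5 handedness factor applies to each of the 150 residues and the 0.5 peptide-bond factor to each of the 149 bonds between them. The variable and function names are ours.

```python
import math

CHAIN_LENGTH = 150            # residues in the chain, as in Axe's estimate
LOG10_FOLD_ODDS = 74          # Axe: one in 10^74 sequences folds functionally
LOG10_UNIVERSAL_BOUND = 150   # Dembski's universal probability bound, 1 in 10^150


def log10_odds_against(p, trials):
    """Return log10 of the odds against `trials` independent events,
    each with probability `p`, all succeeding."""
    return -trials * math.log10(p)


# Assumption: one handedness "coin flip" per residue.
handedness = log10_odds_against(0.5, CHAIN_LENGTH)
# Assumption: one peptide-bond "coin flip" per bond (one fewer than residues).
peptide_bonds = log10_odds_against(0.5, CHAIN_LENGTH - 1)

# Independent probabilities multiply, so their logarithms add.
total = LOG10_FOLD_ODDS + handedness + peptide_bonds

print(f"handedness:    one in 10^{handedness:.0f}")
print(f"peptide bonds: one in 10^{peptide_bonds:.0f}")
print(f"combined:      one in 10^{total:.0f}")
print("beyond the universal probability bound:", total > LOG10_UNIVERSAL_BOUND)
```

Each of the two added requirements contributes about 45 orders of magnitude, and stacking them on Axe's 74 gives the figure of roughly 164.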

With these numbers in mind, note the incredible faith that Huang, Boyken, and Baker invest in blind chance. We end with this quote:

Proteins mediate the fundamental processes of life, and the beautiful and varied ways in which they do this have been the focus of much biomedical research for the past 50 years. Protein-based materials have the potential to solve a vast array of technical challenges. Functions that naturally occurring proteins mediate include: the use of solar energy to manufacture complex molecules; the ultrasensitive detection of small molecules (olfactory receptors) and of light (rhodopsin); the conversion of pH gradients into chemical bonds (ATP synthase); and the transformation of chemical energy into work (actin and myosin). Not only are these functions remarkable but they are encoded in sequences of amino acids with extreme economy. Such sequences specify the three-dimensional structure of the proteins, and the spontaneous folding of extended polypeptide chains into these structures is the simplest case of biological self-organization. Despite the advances in technology of the past 100 years, human-made machines cannot compete with the precision of function of proteins at the nanoscale and they cannot be produced by self-assembly. The properties of naturally occurring proteins are even more remarkable when considering that they are essentially accidents of evolution. Instead of a well-thought-out plan to develop a machine to use proton flow to convert ADP to ATP, selective pressure operated on randomly arising variants of primordial proteins, and there were also hundreds of millions of years in which to get it right.

Now ponder that. They are duly impressed by the intricate molecular machines that proteins make in the cell, yet their worldview does not allow them to consider this as evidence for design.

Photo: Protein pattern analyzer, by cancer.gov [Public domain], via Wikimedia Commons.
