Expanded alphabet of life

The alphabet of life as we know it is quite simple. Four letters are all you really need to speak the language of life. Adenine, thymine, guanine and cytosine, or A, T, G and C, are the only bases in which the genetic code of DNA is written. Why evolution settled on these four as the building blocks of life is not obvious. In fact, scientists have been able to create synthetic chemical analogues of the four nucleotides for many years. Even though these analogues differ in chemical structure from the classical nucleotides, the cellular machinery that copies, amplifies and translates the genetic code can use them almost as effectively as the original four. A study from the Romesberg group at the Scripps Research Institute has recently shown that living organisms can also survive and grow successfully with these foreign nucleotides embedded in their DNA.

In 2014 the Romesberg group published the first paper describing a bacterium, E. coli, engineered to use an unnatural base pair (UBP) in its genetic code. The UBP, d5SICS:dNaM, was successfully used to expand the genetic code, and this semi-synthetic bacterium became the first organism to suggest that a six-letter genetic alphabet is viable. Their new research, published in PNAS in January 2017, builds on this earlier work.

To import the nucleotide analogues into E. coli, the researchers needed to add a special nucleotide transporter to the bacterial cell membrane. The transporter, which comes from an alga, is quite toxic to bacteria and therefore required tightly regulated expression. By removing a small part of the transporter protein that mediates its localization, the researchers were able to significantly reduce its toxicity to the cell.

The first semi-synthetic organism also did not retain UBPs effectively in its genome. After several rounds of replication, the UBPs would either mutate to one of the classical nucleotides or be deleted from the DNA altogether. To select for the maintenance of UBPs, the scientists employed the now ubiquitous Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) system. Traditionally, CRISPR has been used in cell engineering to remove genes or to alter their coding sequences. The Romesberg group, however, used CRISPR as a self-destruction mechanism. By engineering CRISPR to recognize and trigger the degradation of DNA sequences that have lost their UBPs, the group was able to select for cells that maintain them in their genome. UBPs placed in the transfer RNA and green fluorescent protein genes were successfully retained even after a hundred cell cycles.
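The selection logic described above can be illustrated with a toy model. The Python sketch below is not the authors' method and uses no numbers from the paper: the function name `retention_after`, the per-generation `loss_rate` and the CRISPR `kill_efficiency` are all hypothetical, chosen only to show how removing the cells that lost their UBP keeps the surviving population enriched for cells that kept it.

```python
def retention_after(generations: int, loss_rate: float, kill_efficiency: float) -> float:
    """Fraction of surviving cells that still carry the UBP.

    Toy model with made-up parameters (not taken from the study):
      loss_rate       - chance per generation that a UBP-carrying cell loses it
      kill_efficiency - fraction of UBP-less cells eliminated by CRISPR each generation
    """
    retained = 1.0  # start with every cell carrying the UBP
    for _ in range(generations):
        # Replication: some carriers mutate or delete the UBP.
        lost = (1.0 - retained) + retained * loss_rate
        retained = retained * (1.0 - loss_rate)
        # Selection: CRISPR degrades the DNA of cells that lost the UBP.
        lost *= 1.0 - kill_efficiency
        total = retained + lost
        if total == 0.0:
            return 0.0  # every cell was lost or eliminated
        # Renormalize so the fractions describe the surviving population.
        retained /= total
    return retained


if __name__ == "__main__":
    for kill in (0.0, 0.9, 0.99):
        share = retention_after(generations=100, loss_rate=0.01, kill_efficiency=kill)
        print(f"kill efficiency {kill:.2f}: {share:.3f} of cells retain the UBP")
```

With `kill_efficiency` set to zero the model reduces to simple decay, so the retained fraction shrinks every generation. Any non-zero value pushes the population back toward UBP carriers, which is the qualitative effect the CRISPR step is meant to achieve.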

This research points to the potential for greatly expanded information storage in DNA. An expanded alphabet of life could also be used to introduce new regulatory elements into genomes, enabling scientists to regulate gene transcription and translation more precisely. However, these grand ideas are still far from being usable in practice. UBP-containing genetic code can so far only be used in simple organisms, and the UBPs themselves can only be added to DNA, not RNA. Still, hope persists that such research will eventually change, in useful ways, our definition of what the alphabet of life is.
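To put a rough number on the storage claim, here is a short Python sketch comparing a four-letter and a six-letter alphabet. It is an idealized calculation, not a result from the study: the letters X and Y are placeholders for the unnatural pair, and the codon count assumes that codons stay three letters long and that every combination could be read, which the article notes is not yet possible because UBPs cannot be carried into RNA.

```python
import math


def bits_per_base(alphabet_size: int) -> float:
    """Maximum information per position, in bits, for an alphabet of this size."""
    return math.log2(alphabet_size)


def codon_count(alphabet_size: int, codon_length: int = 3) -> int:
    """Number of distinct codons of the given length (idealized: all are usable)."""
    return alphabet_size ** codon_length


if __name__ == "__main__":
    alphabets = {
        "natural (A, T, G, C)": 4,
        "expanded (A, T, G, C, X, Y)": 6,  # X and Y are placeholder letters
    }
    for name, size in alphabets.items():
        print(f"{name}: {bits_per_base(size):.2f} bits per base, "
              f"{codon_count(size)} possible three-letter codons")
```

Under these assumptions, each position in the expanded alphabet carries about 2.58 bits instead of 2, and the number of possible three-letter codons rises from 64 to 216. That is a real gain in capacity, though finite rather than unlimited.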


Zhang, Yorke, et al. “A semisynthetic organism engineered for the stable expansion of the genetic alphabet.” Proceedings of the National Academy of Sciences (2017): 201616443.

Malyshev, Denis A., et al. “A semi-synthetic organism with an expanded genetic alphabet.” Nature 509.7500 (2014): 385-388.

(featured image courtesy of MIKI Yoshihito on Flickr)

About Bernadeta Dadonaite