Introduction:
Who doesn't love a little extra of everything? What if you could store all your digital data and memories from childhood onward, access them whenever you need them, and pass them on to the next generation? It is possible with the help of DNA: the same kind of molecule that makes up our own genome, produced artificially in labs to store information.
Recent advances in biomolecular science and biotechnology allow us to synthesize DNA from scratch. These methods are becoming cheaper and more efficient over time, bringing with them a variety of practical applications, one of which is data storage on DNA. Here we will look at how data can be stored within DNA and why this technology has the potential to outcompete current storage methods. Before we dive into the biological aspects, let me explain how current storage methods work, using a USB flash drive as an example.
Essentially, a flash drive is just a platform with millions of tiny transistors. These transistors are set up in circuits that interact to store your data. Devices may vary in how they relay the information, but their storage mechanisms are all the same.
How does a flash drive store your data?
(Image: open view of a hard drive)
In each transistor, a floating gate sits beneath a control gate, insulated by a thin oxide layer. The floating gate can hold electrons without the need for power. It is charged by pushing electrons through the oxide, and the presence or absence of that charge represents the bit.
These transistors can be organized to store very complex information using this mechanism. There are, however, drawbacks to flash storage. For instance, if the oxide layer is damaged or cracked, say by corrosion or by dropping the drive, electrons can leak out of the floating gate, corrupting your precious information.
Why choose DNA?
The durability and storage capacity of flash drives improve every year, but the danger of data corruption from physical damage will always be there. Now what if I told you that DNA can bypass this problem? Deoxyribonucleic acid, or DNA, is a biological molecule composed of nucleotides that code for the genetic information in all forms of life. It exists in a double-helical form built from four nucleotide bases.
First, DNA has the purine guanine, which binds with its pyrimidine complement, cytosine. Second, it has the purine adenine, which binds to its complement, thymine.
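The pairing rule above (guanine with cytosine, adenine with thymine) is simple enough to express as a lookup table. The following is a minimal Python sketch; the function names `complement` and `reverse_complement` are chosen here for illustration and are not from any particular library.

```python
# Watson-Crick pairing: each purine pairs with exactly one pyrimidine.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand."""
    return "".join(COMPLEMENT[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the complementary strand, read in the opposite direction."""
    return complement(strand)[::-1]


print(complement("GATTACA"))          # CTAATGT
print(reverse_complement("GATTACA"))  # TGTAATC
```

Because each base has exactly one partner, either strand of the double helix is enough to reconstruct the other.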
So you may be thinking: how can this biological molecule store our personal data? Sure, it can store genetic information, but how can we adapt it to store pictures, books, movies and so on?
Let's now learn why scientists nominated DNA as a potential data storage molecule, and the methods they developed that allow us to store our personal information on DNA.
So what advantages of DNA as a new platform for data storage did George Church and his colleagues describe in their 2012 publication?
Reasons for choosing DNA to store digital data:
- First, DNA has natural data storage capabilities. It stores a complex array of genetic information, and in nature it is constantly being read and written by enzymes and other biomolecules.
- Second, DNA is resilient. It can withstand a large range of temperatures without degrading, and it has also been shown that information on DNA can still be read after some degradation.
- Lastly, DNA provides grounds for nonplanar information storage. It can be condensed into tiny spaces, much smaller than the planar organization of a transistor circuit on a flash drive.
With these aspects in mind, let's look at how George Church and his team went about storing non-genetic data on DNA.
They used the book Regenesis: How Synthetic Biology Will Reinvent Nature and Ourselves as a sample of text to store. The words within this book were converted into binary code, or bit code. This code was then translated into a nucleotide sequence.
Following this, the desired nucleotide sequences were synthesized from scratch, encoding the data into nucleotide libraries.
How can you access this information?
(Image: DNA converted into binary codes)
You can do this by sequencing the nucleotide libraries using next-generation sequencing. The sequenced DNA can then be decoded to reveal the original bit language, and further decoded to reveal the original words from the book. Let's have a closer look at this process. You begin with the words from your book, which can be rewritten in a bit language. Essentially, this language replaces all of the letters, spaces, numbers, upper and lower cases, and any other literary aspects of a typical book with a unique code. This bit language is just a combination of ones and zeroes.
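To make the "bit language" concrete, here is a small Python sketch that turns text into ones and zeroes and back. It assumes UTF-8 with eight bits per byte; the article does not say which character encoding the original experiment used, so treat that choice, and the function names, as illustrative assumptions.

```python
def text_to_bits(text: str) -> str:
    """Encode text as UTF-8 and write each byte as 8 binary digits."""
    return "".join(format(byte, "08b") for byte in text.encode("utf-8"))


def bits_to_text(bits: str) -> str:
    """Reverse of text_to_bits: group the bits into bytes and decode them."""
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")


bits = text_to_bits("DNA")
print(bits)                # 010001000100111001000001
print(bits_to_text(bits))  # DNA
```

Every letter, space, and punctuation mark ends up as a fixed pattern of bits, which is all the later DNA step needs as input.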
You can further translate this bit language into a nucleotide sequence. Church and his team did this by writing code that converted the whole book into bits and then into nucleotides. They also incorporated a 19-bit barcode into each segment of the book to allow rapid identification of each nucleotide sequence within the whole library.
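The bits-to-nucleotides step with a per-segment barcode might look roughly like the Python sketch below. It is not the team's actual code. It assumes that a 0 bit can be written as A or C and a 1 bit as G or T, that the barcode is simply the segment's index written in 19 bits, and that the last segment is zero-padded; the segment size `block_bits` and all function names are hypothetical.

```python
ZERO_BASES = "AC"   # assumption: a 0 bit may be written as A or C
ONE_BASES = "GT"    # assumption: a 1 bit may be written as G or T
ADDRESS_BITS = 19   # barcode length given in the article


def bits_to_bases(bits: str) -> str:
    """Write one base per bit, never repeating the previous base."""
    bases = []
    for bit in bits:
        options = ONE_BASES if bit == "1" else ZERO_BASES
        # Take the first option unless it would repeat the preceding base.
        if bases and bases[-1] == options[0]:
            bases.append(options[1])
        else:
            bases.append(options[0])
    return "".join(bases)


def make_oligos(bits: str, block_bits: int = 96) -> list[str]:
    """Split a bitstream into segments and prefix each with a 19-bit address."""
    oligos = []
    for index, start in enumerate(range(0, len(bits), block_bits)):
        # Zero-pad the final segment so every oligo has the same length.
        block = bits[start:start + block_bits].ljust(block_bits, "0")
        address = format(index, f"0{ADDRESS_BITS}b")
        oligos.append(bits_to_bases(address + block))
    return oligos


oligos = make_oligos("0100010001001110" * 3, block_bits=16)
for oligo in oligos:
    print(len(oligo), oligo)
```

Having two possible bases per bit is what leaves room to steer around awkward sequences. A 19-bit address can label 2^19 = 524,288 distinct segments.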
Overall, they encoded one bit per base, turning the whole 5.27 megabit book into 54,898 oligonucleotides. These sequences also avoided runs of four or more identical nucleotides and had a balanced GC content. After the desired sequences were determined, the DNA oligonucleotides were built using phosphoramidite chemistry, a three-step chemical process that adds single nucleotides on top of each other one by one. This process involves a deprotection stage, a base-coupling stage, and an oxidation stage.
Once the desired oligo sequence is achieved, it is capped by acetylation. These oligonucleotide strings are then compiled on a microarray chip, giving oligo libraries with sequences corresponding to the bit code of the original book.
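Before sending sequences off for synthesis, the two design constraints mentioned above (no runs of four or more identical nucleotides, and a balanced GC content) can be checked in software. Here is a minimal Python sketch of such a check. The 40% to 60% GC window and the function names are assumptions made for illustration; the article does not state what range counts as "balanced".

```python
from itertools import groupby


def longest_run(seq: str) -> int:
    """Length of the longest stretch of one repeated base."""
    return max((len(list(group)) for _, group in groupby(seq)), default=0)


def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C."""
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def passes_constraints(seq: str, max_run: int = 3,
                       gc_low: float = 0.4, gc_high: float = 0.6) -> bool:
    """True if seq has no run longer than max_run and GC content in range."""
    return longest_run(seq) <= max_run and gc_low <= gc_content(seq) <= gc_high


print(passes_constraints("ACGTACGT"))  # True: no repeats, 50% GC
print(passes_constraints("AAAATCGC"))  # False: run of four A's
print(passes_constraints("ATATATAT"))  # False: 0% GC
```

A sequence that fails can be re-encoded by choosing different bases for the same bits.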
Reading the information back:
Let's now look at how Church and his team were able to access their encoded information. They used next-generation Illumina sequencing, which is a very rapid way of sequencing DNA. First, DNA from the microarray library undergoes reduced-cycle PCR amplification. Through this amplification, additional motifs are introduced at the ends of the oligonucleotides.
The oligos are then isothermally amplified on a flow cell. The only oligos bound to the surface of these flow cells are ones complementary to the motifs that were added to each oligo during PCR. The oligos are then cluster-amplified by polymerases, resulting in clonal amplification of all nucleotide fragments.
Following this, fluorescently tagged nucleotides are added to the single-stranded oligos. These are detected as they bind, which allows for sequencing by synthesis. The emission wavelength and intensity of each bound nucleotide is analysed by computer software.
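As a rough illustration of what that software does, the toy Python sketch below "calls" one base per sequencing cycle by picking whichever fluorescent channel is brightest. This is a deliberate simplification and not Illumina's actual base-calling algorithm; the channel order, the intensity values, and the function name are all made up for the example.

```python
CHANNELS = "ACGT"  # hypothetical channel order, one fluorescent label per base


def call_bases(cycles: list[tuple[float, float, float, float]]) -> str:
    """For each cycle, report the base whose channel has the highest intensity."""
    read = []
    for intensities in cycles:
        strongest = max(range(len(CHANNELS)), key=lambda i: intensities[i])
        read.append(CHANNELS[strongest])
    return "".join(read)


# Made-up intensities for three cycles of one cluster.
signal = [
    (0.9, 0.1, 0.0, 0.1),  # channel A is brightest
    (0.1, 0.0, 0.8, 0.2),  # channel G is brightest
    (0.0, 0.1, 0.1, 0.7),  # channel T is brightest
]
print(call_bases(signal))  # AGT
```

Real instruments also have to handle noise and signal overlap between channels, which is where the occasional read error comes from.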
The data from millions of different nucleotides is compiled, and the sequences are determined with a high degree of accuracy. Following this, Church and his team aligned the sequence data using the incorporated 19-nucleotide barcode and used their bits-to-DNA code in reverse to convert the sequences back into bit language, and from there back into words.
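The decoding path described above can be sketched in Python as follows: turn each read back into bits, use the first 19 bits as the barcode to put the segments in order, join them, and decode the text. As before, this is an illustrative sketch and not the team's code. It assumes G or T means 1 and A or C means 0, UTF-8 text, and error-free reads; `toy_encode` is a hypothetical helper used only to build demo input.

```python
ADDRESS_BITS = 19  # barcode length given in the article


def bases_to_bits(seq: str) -> str:
    """Assumed mapping: G or T reads as 1, A or C reads as 0."""
    return "".join("1" if base in "GT" else "0" for base in seq)


def decode_library(reads: list[str]) -> str:
    """Order segments by their barcode and join the data bits."""
    blocks = {}
    for read in reads:
        bits = bases_to_bits(read)
        address = int(bits[:ADDRESS_BITS], 2)
        blocks[address] = bits[ADDRESS_BITS:]
    return "".join(blocks[address] for address in sorted(blocks))


def bits_to_text(bits: str) -> str:
    """Group bits into bytes (dropping any incomplete tail) and decode."""
    usable = len(bits) - len(bits) % 8
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))
    return data.decode("utf-8")


def toy_encode(index: int, data_bits: str) -> str:
    """Hypothetical demo encoder: 0 becomes A, 1 becomes G, barcode first."""
    bits = format(index, f"0{ADDRESS_BITS}b") + data_bits
    return "".join("G" if bit == "1" else "A" for bit in bits)


# Reads arrive in arbitrary order; the barcode restores the original order.
reads = [toy_encode(1, "0100000100100001"), toy_encode(0, "0100010001001110")]
print(bits_to_text(decode_library(reads)))  # DNA!
```

The barcode matters because a sequencer returns reads in no particular order, so each segment has to carry its own position.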
From this experiment, they were able to encode a 5.27 megabit bitstream into 54,898 oligonucleotides, achieving a density of 5.5 petabits per cubic millimetre. This greatly exceeds current flash drive storage densities. They also found an error rate of 10 bits out of 5.27 million, showing that DNA can be a highly reliable mode of data storage.
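A quick back-of-the-envelope check of these figures, written as a few lines of Python. It reads the 54,898 figure as a count of synthesized oligonucleotide sequences and uses only the numbers quoted in this article; the variable names are just for illustration.

```python
bits_stored = 5.27e6   # size of the encoded bitstream, in bits
oligo_count = 54_898   # number of synthesized sequences
bit_errors = 10        # reported number of wrong bits

bits_per_oligo = bits_stored / oligo_count
error_rate = bit_errors / bits_stored

print(f"{bits_per_oligo:.1f} data bits per oligo")  # 96.0 data bits per oligo
print(f"{error_rate:.2e} errors per bit")           # 1.90e-06 errors per bit
```

So each sequence carries roughly 96 bits of the book on top of its barcode, and about two bits in every million came back wrong.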
Drawbacks of this process:
Even though DNA has a much higher storage capacity and greater durability than conventional transistor circuitry, its use for data storage has some significant drawbacks.
1. Oligo synthesis and sequencing are the limiting steps in this process. They are far slower than conventional methods.
2. The cost of these technologies is very high, presenting yet another obstacle for DNA as data storage.
Summary:
It would be sensible to use DNA data storage for archive-type storage of very large files. It is also important to note that DNA synthesis and sequencing technologies are advancing at a considerable rate and becoming more and more accessible as the years go by.
In summary, DNA has the potential to provide solid ground for a very large and stable data storage platform. Advances in our ability to both synthesize and study DNA have shown us its potential for data storage. Unfortunately, the technologies that would let us perform these tasks efficiently and cost-effectively are still lacking. As mentioned above, though, both DNA synthesis and sequencing are becoming more accessible and cost-effective, which could make DNA data storage a reality in the near future.
Please feel free to ask any questions about this topic in the comments section below.