
From digital data to synthetic DNA
Preview
Information processing for efficient
molecular storage of digital data
Marc ANTONINI, DR CNRS
Pascal BARBRY, DR CNRS
Dominique LAVENIER, DR CNRS
The aim of the “From digital data to synthetic DNA” project is to make physical and logical
storage efficient with customised codes adapted to the physical and chemical constraints of
DNA writing and reading (in collaboration with partners from the targeted projects “next
generation DNA synthesis” and “synthetic digital polymers”).
Various compression strategies are being studied: transcoding to convert data into quaternary
code or even N-ary code when encoding with non-DNA polymers, direct encoding for certain
types of data (e.g. specific JPEG DNA encoder for images to be stored), structuring of
synthesised strands to allow random access to stored data, processing of third-generation
sequencing data, etc.
Keywords : information storage, big data, information compression, signal and image
processing, bioinformatics, sequencing
Missions
Our researches
Quantifying the constraints and signal
degradation caused by biotechnological
processes
Model the different types of errors in the storage chain (synthesis, packaging, long-term
degradation, molecule selection, sequencing). Design appropriate error-correcting codes to
achieve a good compromise between oligonucleotide size and quality.
Developing new solutions for encoding
structured and unstructured data
Introduce new joint source/channel coding and error correction strategies. Develop N-ary
codes in collaboration with the “synthetic digital polymers” project. Explore different DNA
storage architectures (based on transcoding, constrained coding and sampling).
Effectively retrieving data stored among
billions of DNA molecules
Specifically select the data to be retrieved without having to sequence all the stored DNA
molecules. Reconstruct a document spread across several million molecules using new scalable
reading consensus methods.
Adapting third-generation sequencing
to DNA storage
Bringing new solutions through third-generation sequencing, using in particular the UCA
Genomix platform in Sophia Antipolis. Developing cost-effective strategies for the complete
sequencing of large amounts of information (gigabytes or even terabytes) through the use of
new sequencing technologies.
Consortium
CNRS, Université Nice Côte d’Azur, INRIA, IMT Atlantique, EURECOM
Accelerate the development of new codes, new algorithms, and new strategies.
Build an emerging French research community on DNA coding.
Promote the training of a new generation of scientific leaders in the digital aspect of the DNA data storage process.
A community of 12 researchers and research professors, 2 post doctoral fellowships and 6 doctoral positions offered during the project, as well as the creation of 4 engineering positions and 2
junior chairs.
Laboratory of Computer Science, Signals and Systems of Sophia Antipolis (I3S – CNRS, Université Côte d’Azur)
Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA – CentraleSupélec, CNRS, ENS Rennes, IMT Atlantique, Inria, INSA Rennes, Inserm, Université Bretagne Sud, Université de Rennes)
Institute of Molecular and Cellular Pharmacology (IPMC – CNRS, Université Nice Côte d’Azur)
Lab-STICC “From sensors to knowledge: communicating and deciding” (Lab-STICC – CNRS, IMT Atlantique, UBO, UBS, ENIB et ENSTA Bretagne)
UCA Genomix platform, sequencing platform associated with the France Génomique infrastructure

Plus de projets