From digital data to bases - PEPR Molecularxiv

Information processing for efficient
molecular storage of digital data

Marc ANTONINI, DR CNRS

Pascal BARBRY, DR CNRS

Dominique LAVENIER, DR CNRS

The aim of the “From digital data to synthetic DNA” project is to make physical and logical
storage efficient with customised codes adapted to the physical and chemical constraints of
DNA writing and reading (in collaboration with partners from the targeted projects “next
generation DNA synthesis” and “synthetic digital polymers”).
Several coding and processing strategies are being studied: transcoding to convert data into
quaternary code (or N-ary code when encoding with non-DNA polymers), direct encoding for
certain types of data (e.g. a dedicated JPEG DNA encoder for images), structuring of
synthesised strands to allow random access to stored data, and processing of third-generation
sequencing data.
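
As an illustration of transcoding to a quaternary code, the sketch below maps each pair of bits to one nucleotide. The mapping (00 to A, 01 to C, 10 to G, 11 to T) and the function names are assumptions chosen for the example, not the project's actual encoder, and the sketch ignores biochemical constraints entirely.

```python
BASES = "ACGT"  # assumed mapping: 00->A, 01->C, 10->G, 11->T


def bytes_to_dna(data: bytes) -> str:
    """Transcode bytes to a quaternary (nucleotide) string, 4 bases per byte."""
    out = []
    for byte in data:
        # Read the byte two bits at a time, most significant pair first.
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)


def dna_to_bytes(seq: str) -> bytes:
    """Inverse transcoding: every 4 bases are packed back into one byte."""
    if len(seq) % 4 != 0:
        raise ValueError("sequence length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for base in seq[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)
```

With this mapping, the byte 0x1B (binary 00011011) becomes the strand ACGT, and decoding returns the original byte.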

Keywords: information storage, big data, information compression, signal and image
processing, bioinformatics, sequencing

Missions

Our research

Quantifying the constraints and signal
degradation caused by biotechnological
processes

Model the different types of errors in the storage chain (synthesis, packaging, long-term
degradation, molecule selection, sequencing). Design appropriate error-correcting codes to
achieve a good compromise between oligonucleotide size and quality.
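
A very simple way to picture such an error model is a channel that applies substitutions, insertions and deletions independently to each base. The sketch below is only a toy model under that assumption: the function name and the default error rates are illustrative placeholders, not measured values from any synthesis or sequencing process.

```python
import random

BASES = "ACGT"


def noisy_channel(strand, p_sub=0.01, p_ins=0.005, p_del=0.005, rng=None):
    """Apply independent per-base errors to a strand (toy model).

    p_sub, p_ins and p_del are hypothetical rates; real rates depend on
    the synthesis, storage and sequencing technologies involved.
    """
    rng = rng or random.Random()
    out = []
    for base in strand:
        r = rng.random()
        if r < p_del:
            continue  # deletion: the base is lost
        if r < p_del + p_ins:
            out.append(rng.choice(BASES))  # insertion before the base
            out.append(base)
        elif r < p_del + p_ins + p_sub:
            # substitution: replace with one of the three other bases
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)  # base transmitted without error
    return "".join(out)
```

Passing a seeded generator, for example `rng=random.Random(0)`, makes a simulation reproducible, which helps when comparing candidate error-correcting codes on the same noise realisation.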

Developing new solutions for encoding
structured and unstructured data

Introduce new joint source/channel coding and error correction strategies. Develop N-ary
codes in collaboration with the “synthetic digital polymers” project. Explore different DNA
storage architectures (based on transcoding, constrained coding and sampling).
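
Constrained coding typically restricts which strands may be written, for example by limiting homopolymer runs and keeping the GC content balanced. The checker below is a minimal sketch of that idea; the thresholds (runs of at most 3, GC content between 40% and 60%) and the function names are assumptions for illustration, not constraints stated by the project.

```python
from itertools import groupby


def max_homopolymer(strand: str) -> int:
    """Length of the longest run of identical consecutive bases."""
    return max((len(list(group)) for _, group in groupby(strand)), default=0)


def gc_content(strand: str) -> float:
    """Fraction of bases that are G or C (0.0 for an empty strand)."""
    if not strand:
        return 0.0
    return sum(base in "GC" for base in strand) / len(strand)


def satisfies_constraints(strand, max_run=3, gc_min=0.4, gc_max=0.6):
    """Check a strand against hypothetical run-length and GC limits."""
    return (max_homopolymer(strand) <= max_run
            and gc_min <= gc_content(strand) <= gc_max)
```

An encoder could use such a predicate to reject or re-map candidate strands, whereas a true constrained code would guarantee the property by construction.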

Effectively retrieving data stored among
billions of DNA molecules

Selectively retrieve the target data without having to sequence all the stored DNA
molecules. Reconstruct a document spread across several million molecules using new scalable
read-consensus methods.
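
The two steps, selection and consensus, can be sketched in software as follows. This is a deliberately simplified illustration: it assumes each strand starts with an address prefix, that selected reads have equal length and contain substitution errors only, and it uses a plain per-position majority vote. In practice selection happens chemically before sequencing, and reads with insertions or deletions must be aligned first. All names are hypothetical.

```python
from collections import Counter


def select_by_address(pool, address):
    """Keep strands that start with the address and return their payloads."""
    return [s[len(address):] for s in pool if s.startswith(address)]


def majority_consensus(reads):
    """Per-position majority vote over equal-length reads."""
    if not reads:
        raise ValueError("at least one read is required")
    length = len(reads[0])
    if any(len(read) != length for read in reads):
        raise ValueError("reads must have equal length (no indels handled)")
    # zip(*reads) yields one column of bases per position.
    return "".join(Counter(column).most_common(1)[0][0]
                   for column in zip(*reads))


# Three noisy copies stored under address AAGT, one unrelated strand.
pool = ["AAGT" + "ACGT", "AAGT" + "ACGA", "AAGT" + "TCGT", "CCTA" + "GGGG"]
reads = select_by_address(pool, "AAGT")
consensus = majority_consensus(reads)  # "ACGT"
```

Each read here carries at most one substitution, so the vote recovers the original payload; with fewer reads or higher error rates an error-correcting code would have to absorb the residual errors.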

Adapting third-generation sequencing
to DNA storage

Bring new solutions through third-generation sequencing, in particular using the UCA
Genomix platform in Sophia Antipolis. Develop cost-effective strategies for completely
sequencing large amounts of information (gigabytes or even terabytes) with new sequencing
technologies.
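
To give a sense of scale, the back-of-the-envelope sketch below estimates how many nucleotides must be read to recover a given amount of data. The parameters (bits per base, code rate, sequencing coverage) are illustrative assumptions, not figures from the project, and the function name is hypothetical.

```python
import math


def bases_to_sequence(n_bytes, bits_per_base=2.0, code_rate=1.0, coverage=1):
    """Rough count of nucleotides to sequence for n_bytes of payload.

    bits_per_base: information density (2.0 is the unconstrained maximum
                   for a four-letter alphabet).
    code_rate:     fraction of stored bits that are payload, the remainder
                   being error-correction or indexing overhead.
    coverage:      average number of times each stored base is read.
    """
    stored_bases = (8 * n_bytes) / (bits_per_base * code_rate)
    return math.ceil(stored_bases) * coverage
```

Under these assumptions, one gigabyte (10^9 bytes) at 2 bits per base with no overhead corresponds to 4 billion nucleotides per copy; halving the code rate or reading at tenfold coverage scales the sequencing effort accordingly.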

Consortium

CNRS, Université Nice Côte d’Azur, INRIA, IMT Atlantique, EURECOM

More projects

PEPtide Storage of Information using Spin COrrelation

An electronic chip for large-scale DNA writing

DNA Synthesis Next Generation