Database Structure

The D³TaLES data is segmented into two databases to accommodate the complexity and breadth of the data collected in D³TaLES—one for raw data (backend) and the other for processed data (frontend). Both databases will be accessible through the D³TaLES REST API.

Backend: contains raw data parsed directly from data source files, whether those be computational outfiles, experimental datafiles, or published articles.

Frontend: holds processed data derived from the raw data, including meta properties such as redox potential or reorganization energy.

Data Workflow

Data to be processed and inserted into the databases are uploaded through the website. Uploaded data include data from individual D³TaLES members, high-throughput computation, and robotic experimentation. After upload, the files are parsed. The raw data parsed from these files are inserted into the backend database, and the source files are sent to storage. Next, if the user or data source approves the parsed data, automated scripts perform calculations with the raw data and populate the frontend database. The data in the frontend database are displayed on the website with an interactive user interface.
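The workflow above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the D³TaLES implementation: the in-memory Stores class, the function names, the two-column CSV layout, and the "anodic_peak_potential" key are all hypothetical stand-ins for the real databases, parsers, and processing scripts.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class Stores:
    """In-memory stand-ins (hypothetical) for the two databases and file storage."""
    backend: list = field(default_factory=list)   # raw parsed records
    frontend: dict = field(default_factory=dict)  # processed data keyed by molecule ID
    storage: dict = field(default_factory=dict)   # source files keyed by file name


def parse_cv_csv(mol_id: str, text: str) -> dict:
    """Parse headerless 'potential,current' rows into a raw record (assumed layout)."""
    rows = [(float(row[0]), float(row[1]))
            for row in csv.reader(io.StringIO(text)) if row]
    return {"mol_id": mol_id, "data": {"scan": rows}}


def process_cv(raw: dict) -> dict:
    """Toy processing step: report the potential at the largest current."""
    potential, _current = max(raw["data"]["scan"], key=lambda point: point[1])
    return {"anodic_peak_potential": potential}


def handle_upload(stores: Stores, mol_id: str, file_name: str,
                  text: str, approved: bool) -> bool:
    """Run one upload through parse -> backend -> storage -> (approval) -> frontend."""
    raw = parse_cv_csv(mol_id, text)        # parse the uploaded file
    stores.backend.append(raw)              # raw data go to the backend
    stores.storage[file_name] = text        # source file goes to storage
    if not approved:                        # frontend is populated only after approval
        return False
    stores.frontend.setdefault(mol_id, {}).update(process_cv(raw))
    return True


stores = Stores()
cv_text = "0.1,1.0\n0.4,3.5\n0.7,2.0\n"     # placeholder values, not real measurements
handle_upload(stores, "mol_001", "mol_001_cv.csv", cv_text, approved=True)
print(stores.frontend)
```

Note that an unapproved upload still lands in the backend and in storage; only the frontend step is skipped, which mirrors the approval gate described above.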

Backend Structure

The backend database contains data parsed directly from source data files, both computational and experimental. For example, the backend database might hold a molecule’s ground state HOMO energy, which is found in a DFT .log output file. It might also hold an array of CV datapoints, which are found in a CV .csv output file. Thus, the schema for this database relates directly to the datafiles that will supply data. Because D³TaLES will contain many types of data, this backend schema has many sub-schemas, one for each type of datafile. The sub-schemas will share common fields such as mol_id, submission_info, and data. The structure of the data field will be specific to each datafile.
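As a rough sketch of the shared layout, the records below use the three common fields named above (mol_id, submission_info, data). Everything else is an assumption made for illustration: the helper name, the keys inside submission_info, the keys inside data, and all values are placeholders rather than the actual D³TaLES schema.

```python
from datetime import datetime, timezone

# Common top-level fields named in the text above.
COMMON_FIELDS = {"mol_id", "submission_info", "data"}


def make_backend_record(mol_id: str, data_type: str,
                        source_file: str, data: dict) -> dict:
    """Build a backend record; the submission_info keys are hypothetical."""
    return {
        "mol_id": mol_id,
        "submission_info": {
            "data_type": data_type,
            "source_file": source_file,
            "upload_time": datetime.now(timezone.utc).isoformat(),
        },
        "data": data,  # structure depends on the type of datafile
    }


# Placeholder values only; the shape of "data" differs per datafile type.
dft_record = make_backend_record(
    "mol_001", "molecular_dft", "mol_001_opt.log", {"homo": -5.5}
)
cv_record = make_backend_record(
    "mol_001", "cv", "mol_001_cv.csv", {"scan": [[0.1, 1.0], [0.4, 3.5]]}
)

# Both records expose the same top-level fields despite different payloads.
print(set(dft_record) == COMMON_FIELDS, set(cv_record) == COMMON_FIELDS)
```

Keeping the type-specific content inside a single data field lets every sub-schema be queried by molecule or submission in the same way.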

Computation

Molecular DFT: The structural and electronic information derived from molecular DFT calculations on a molecule entry in the database. Specific Properties
Periodic DFT: The structural and electronic information derived from periodic DFT calculations on a system using molecule instance(s) in the database. Specific Properties

Experimentation

Cyclic Voltammetry (CV): CV data extracted from a datafile produced during a CV experiment on a molecule entry in the database. Specific Properties
Infrared Spectroscopy (IR): IR data extracted from a datafile produced during an IR experiment on a molecule entry in the database. Specific Properties
UV-Vis Spectroscopy (UV-Vis): UV-Vis data extracted from a datafile produced during a UV-Vis experiment on a molecule entry in the database. Specific Properties

Natural Language Processing (NLP)

Machine Learning (ML)

Synthesis

Frontend Structure

The frontend database holds processed data that are more directly useful for analysis than the raw backend data. For example, the frontend database might contain a HOMO-LUMO gap calculated from the HOMO and LUMO energies found in the backend data. Likewise, the frontend database might contain an estimated redox potential calculated from cyclic voltammetry (CV) curve peaks. The base unit for the frontend database is a molecule. Each molecule has several base fields including its ID, accessibility status, and general molecular information, such as molecular formula and SMILES string.
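The two derived quantities mentioned above can be illustrated with a short sketch. The function names are hypothetical, and the redox estimate uses the common half-wave approximation (the midpoint of the anodic and cathodic peak potentials), which is an assumption here; the actual D³TaLES processing scripts may use a different method.

```python
def homo_lumo_gap(homo: float, lumo: float) -> float:
    """Gap between LUMO and HOMO energies, in the same unit as the inputs."""
    return lumo - homo


def half_wave_potential(e_pa: float, e_pc: float) -> float:
    """Estimate a redox potential as the midpoint of the anodic (e_pa)
    and cathodic (e_pc) peak potentials of a CV curve."""
    return (e_pa + e_pc) / 2


# Placeholder inputs, not measured or computed values.
print(homo_lumo_gap(homo=-5.5, lumo=-2.0))
print(half_wave_potential(e_pa=0.5, e_pc=0.25))
```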

_id: A unique identifier for the molecule
public: Boolean value. If true, the molecule is publicly visible outside of D³TaLES; if false, it is not.
mol_info: The basic structural information derivable from the molecule's SMILES structure. Specific Properties
mol_characterization: Chemical properties pertaining to the molecule (such as redox potential or reorganization energy). Specific Properties
species_characterization: Chemical properties pertaining to a charge species of the molecule (such as ground state energy or cation HOMO-LUMO gap). Specific Properties
synthesis: Details of 3D geometry of the molecule including charge species. Specific Properties
related_literature: Collection of DOIs from natural language processing (NLP). Specific Properties
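A minimal sketch of a frontend molecule entry with the base fields listed above might look as follows. The top-level field names come from the list; the helper names, the "smiles" key inside mol_info, and the choice of empty containers for the remaining fields are assumptions made for illustration only.

```python
# Top-level fields taken from the list above.
BASE_FIELDS = (
    "_id",
    "public",
    "mol_info",
    "mol_characterization",
    "species_characterization",
    "synthesis",
    "related_literature",
)


def new_molecule(mol_id: str, smiles: str, public: bool = False) -> dict:
    """Create an empty molecule entry; sub-field layouts are hypothetical."""
    return {
        "_id": mol_id,
        "public": public,
        "mol_info": {"smiles": smiles},
        "mol_characterization": {},
        "species_characterization": {},
        "synthesis": {},
        "related_literature": [],  # collection of DOIs
    }


def missing_fields(doc: dict) -> list:
    """Return the base fields that are absent from a molecule entry."""
    return [name for name in BASE_FIELDS if name not in doc]


def visible_outside(doc: dict) -> bool:
    """A molecule is visible outside of D³TaLES only when 'public' is true."""
    return doc.get("public") is True


benzene = new_molecule("mol_001", "c1ccccc1")
print(missing_fields(benzene), visible_outside(benzene))
```

Defaulting public to False in this sketch is a conservative choice, so that an entry is not exposed outside of D³TaLES unless it is explicitly marked public.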