Database Structure

To accommodate the complexity and breadth of the data it collects, D3TaLES splits its data across two databases: one for raw data (backend) and one for processed data (frontend). Both databases will be accessible through the D3TaLES REST API.

Backend: contains raw data parsed directly from data source files, whether those be computational outfiles, experimental datafiles, or published articles.

Frontend: holds processed data derived from raw data, including meta properties such as redox potential or reorganization energy.

Data Workflow

Data to be processed and inserted into the databases are uploaded through the website. Uploads come from individual D3TaLES members, high-throughput computation, and robotic experimentation. After upload, the files are parsed. The raw data parsed from these files are inserted into the backend database, and the source files are sent to storage. Next, if the user or data source approves the parsed data, automated scripts perform calculations on the raw data and populate the frontend database. The data in the frontend database are displayed on the website through an interactive user interface.
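The workflow above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the D3TaLES implementation: the Pipeline class, its method names, the toy parser, and the sample values are all hypothetical, and plain Python lists and dictionaries stand in for the real databases and file storage. The point it shows is the approval gate: raw data reach the backend on upload, but nothing is written to the frontend until the record is approved.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Pipeline:
    """Hypothetical in-memory stand-ins for the backend, file storage, and frontend."""
    backend: list = field(default_factory=list)
    storage: list = field(default_factory=list)
    frontend: dict = field(default_factory=dict)

    def upload(self, mol_id: str, filename: str, contents: str, parse: Callable) -> dict:
        """Parse an uploaded file, keep the raw data in the backend, and archive the file."""
        record = {"mol_id": mol_id, "data": parse(contents), "approved": False}
        self.backend.append(record)
        self.storage.append(filename)
        return record

    def approve(self, record: dict, process: Callable) -> None:
        """Mark a record as approved, then write derived values to the frontend."""
        record["approved"] = True
        derived = process(record["data"])
        self.frontend.setdefault(record["mol_id"], {}).update(derived)


def parse_csv_row(contents: str) -> list:
    # Toy parser (assumption): a single comma-separated row of numbers.
    return [float(x) for x in contents.split(",")]


pipeline = Pipeline()
record = pipeline.upload("mol_001", "mol_001_cv.csv", "0.1,0.4,0.2", parse_csv_row)
frontend_before = dict(pipeline.frontend)  # snapshot taken before approval
pipeline.approve(record, lambda data: {"n_points": len(data), "max_value": max(data)})
```

The processing step is passed in as a function so that the same gate can serve any data type; a real pipeline would presumably select the calculation based on the kind of file uploaded.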

Backend Structure

The backend database contains data parsed directly from source data files. For example, the backend database might hold a molecule’s ground state HOMO energy, which is found in a DFT .log output file. It might also hold an array of CV datapoints, which are found in a CV .csv output file. Thus, the schema for this database relates directly to the datafiles that supply the data. Because D3TaLES will contain many types of data, the backend schema has many sub-schemas, one for each type of datafile. The sub-schemas share common fields such as mol_id, submission_info, and data. The structure of the data field is specific to each type of datafile.
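As a minimal sketch of this idea, the Python below builds backend records that all carry the shared fields mol_id, submission_info, and data, while checking the data field against a per-type set of required keys. Only the three shared field names come from the description above. The data type labels, the required keys ("homo", "scan_data"), the contents of submission_info, and all sample values are assumptions made for illustration; the actual sub-schemas are defined by D3TaLES.

```python
# Hypothetical required keys for the type-specific `data` field.
REQUIRED_DATA_KEYS = {
    "molecular_dft": {"homo"},
    "cv": {"scan_data"},
}


def make_backend_record(data_type, mol_id, submission_info, data):
    """Return a record with the shared fields, validating `data` for its type."""
    if data_type not in REQUIRED_DATA_KEYS:
        raise ValueError(f"unknown data type: {data_type!r}")
    missing = REQUIRED_DATA_KEYS[data_type] - set(data)
    if missing:
        raise ValueError(f"{data_type} data is missing keys: {sorted(missing)}")
    return {"mol_id": mol_id, "submission_info": submission_info, "data": dict(data)}


# Placeholder values, not real measurements.
dft_record = make_backend_record(
    "molecular_dft", "mol_001", {"source": "example.log"}, {"homo": -5.2}
)
cv_record = make_backend_record(
    "cv", "mol_001", {"source": "example.csv"},
    {"scan_data": [[0.0, 1.2e-6], [0.1, 1.5e-6]]},
)
```

Both records expose the same top-level fields, so code that only needs mol_id or submission_info can treat every datafile type uniformly.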

Computation

  • Molecular DFT: The structural and electronic information derived from molecular DFT calculations on a molecule entry in the database. Specific Properties

  • Periodic DFT: The structural and electronic information derived from periodic DFT calculations on a system using molecule instance(s) in the database. Specific Properties

Experimentation

  • Cyclic Voltammetry (CV): CV data extracted from a datafile produced during a CV experiment on a molecule entry in the database. Specific Properties

  • Infrared Spectroscopy (IR): IR data extracted from a datafile produced during an IR experiment on a molecule entry in the database. Specific Properties

  • UV-Vis Spectroscopy (UV-Vis): UV-Vis data extracted from a datafile produced during a UV-Vis experiment on a molecule entry in the database. Specific Properties

Natural Language Processing (NLP)

Machine Learning (ML)

Synthesis

Frontend Structure

The frontend database holds data that are more useful for analysis. For example, the frontend database might contain a HOMO-LUMO gap calculated from the HOMO and LUMO energies found in the backend data. Likewise, the frontend database might contain an estimated redox potential calculated from cyclic voltammetry (CV) curve peaks. The base unit of the frontend database is a molecule. Each molecule has several base fields, including its ID, its accessibility status, and general molecular information such as the molecular formula and SMILES string.

  • _id: A unique identifier for the molecule
  • public: Boolean. If true, the molecule is public outside of D3TaLES; if false, it is not.
  • mol_info: The basic structural information derivable from the molecule's SMILES structure. Specific Properties
  • mol_characterization: Chemical properties pertaining to the molecule (such as redox potential or reorganization energy). Specific Properties
  • species_characterization: Chemical properties pertaining to a charge species of the molecule (such as ground state energy or cation HOMO-LUMO gap). Specific Properties
  • synthesis: Details of 3D geometry of the molecule including charge species. Specific Properties
  • related_literature: Collection of DOIs from natural language processing (NLP). Specific Properties
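To make the field list concrete, here is a small Python sketch of a frontend molecule document, including a HOMO-LUMO gap derived from raw HOMO and LUMO energies. The top-level field names are the ones listed above. Everything nested inside them (the "smiles", "ground_state", and "homo_lumo_gap" keys), the function name, and the energy values are assumptions for illustration only; the real layout is given under each field's specific properties.

```python
def build_molecule_doc(mol_id, smiles, homo_ev, lumo_ev, public=False):
    """Assemble a frontend-style molecule document (hypothetical nested layout)."""
    return {
        "_id": mol_id,
        "public": public,
        "mol_info": {"smiles": smiles},
        "mol_characterization": {},
        "species_characterization": {
            # Gap derived from raw energies: LUMO energy minus HOMO energy, in eV.
            "ground_state": {"homo_lumo_gap": lumo_ev - homo_ev},
        },
        "synthesis": {},
        "related_literature": [],
    }


# Illustrative energies in eV, not computed or measured values.
doc = build_molecule_doc("mol_001", "c1ccccc1", homo_ev=-6.7, lumo_ev=-0.4)
gap = doc["species_characterization"]["ground_state"]["homo_lumo_gap"]
```

Storing the derived gap alongside the molecule means that users of the frontend can filter or sort on it without returning to the raw backend records.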