Database Structure
D3TaLES data are split across two databases to accommodate the complexity and breadth of the data collected: one for raw data (backend) and one for processed data (frontend). Both databases will be accessible through the D3TaLES REST API.
Backend
: contains raw data parsed directly from source data files, whether computational output files, experimental data files, or published articles.
Frontend
: holds processed data derived from the raw data, including meta properties such as redox potential or reorganization energy.
Data Workflow
Data to be processed and inserted into the databases are uploaded through the website. Uploaded data include contributions from individual D3TaLES members, high-throughput computation, and robotic experimentation. After upload, the files are parsed. The raw data parsed from these files are inserted into the backend database, and the source files are sent to storage. Next, if the user or data source approves the parsed data, automated scripts perform calculations on the raw data and populate the frontend database. The data in the frontend database are displayed on the website through an interactive user interface.
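The workflow above can be sketched in a few lines of Python. This is a minimal illustration rather than the D3TaLES implementation: the `Databases` container, the `parse_file`, `upload`, and `approve_and_process` helpers, the `key = value` file format, and the `homo`/`lumo` keys are all hypothetical stand-ins, and the databases are plain in-memory objects.

```python
from dataclasses import dataclass, field


@dataclass
class Databases:
    """In-memory stand-ins for the backend, frontend, and file storage."""
    backend: list = field(default_factory=list)
    frontend: dict = field(default_factory=dict)
    storage: list = field(default_factory=list)


def parse_file(filename: str, text: str) -> dict:
    """Hypothetical parser: reads 'key = value' lines into a raw-data record."""
    raw = {}
    for line in text.splitlines():
        if "=" in line:
            key, value = line.split("=", 1)
            raw[key.strip()] = float(value)
    return {"source_file": filename, "data": raw, "approved": False}


def upload(dbs: Databases, mol_id: str, filename: str, text: str) -> dict:
    """Parse an uploaded file, insert raw data into the backend, archive the source."""
    record = parse_file(filename, text)
    record["mol_id"] = mol_id
    dbs.backend.append(record)
    dbs.storage.append(filename)
    return record


def approve_and_process(dbs: Databases, record: dict) -> None:
    """Once approved, derive a processed property and populate the frontend."""
    record["approved"] = True
    data = record["data"]
    gap = data["lumo"] - data["homo"]
    dbs.frontend.setdefault(record["mol_id"], {})["homo_lumo_gap"] = gap


dbs = Databases()
rec = upload(dbs, "mol_001", "mol_001.log", "homo = -5.5\nlumo = -2.0")
approve_and_process(dbs, rec)
print(dbs.frontend)  # {'mol_001': {'homo_lumo_gap': 3.5}}
```

The point of the sketch is the ordering: raw values reach the backend at upload time, while the frontend is only populated after the approval step.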
Backend Structure
The backend database contains data parsed directly from source data files. For example, the backend database might hold a molecule’s ground state HOMO energy, which is found in a DFT .log output file. It might also hold an array of CV datapoints, which are found in a CV .csv output file. Thus, the schema for this database relates directly to the datafiles that supply the data. Because D3TaLES will contain many types of data, the backend schema has many sub-schemas, one for each type of datafile. The sub-schemas share common fields such as mol_id, submission_info, and data. The structure of the data field is specific to each type of datafile.
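As a rough illustration of how sub-schemas can share top-level fields while differing in their data payload, consider the sketch below. Only the field names mol_id, submission_info, and data come from the description above; the `BackendRecord` type, the `make_record` helper, the contents of submission_info, and the keys inside data are assumptions made for the example.

```python
from typing import Any, Dict, TypedDict


class BackendRecord(TypedDict):
    """Fields assumed to be common to every backend sub-schema."""
    mol_id: str
    submission_info: Dict[str, Any]
    data: Dict[str, Any]  # structure depends on the type of datafile


def make_record(mol_id: str, data_type: str, source_file: str,
                data: Dict[str, Any]) -> BackendRecord:
    """Wrap a datafile-specific payload in the shared top-level fields."""
    return {
        "mol_id": mol_id,
        # Hypothetical submission metadata; the real schema may differ.
        "submission_info": {"data_type": data_type, "source_file": source_file},
        "data": data,
    }


# A DFT record and a CV record: same outer fields, different payloads.
dft_record = make_record("mol_001", "molecular_dft", "mol_001.log",
                         {"homo": -5.5})
cv_record = make_record("mol_001", "cv", "mol_001.csv",
                        {"scan_data": [[0.0, 1.0e-6], [0.1, 2.5e-6]]})

shared_fields = sorted(set(dft_record) & set(cv_record))
print(shared_fields)  # ['data', 'mol_id', 'submission_info']
```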
Computation
- Molecular DFT: The structural and electronic information derived from molecular DFT calculations on a molecule entry in the database.
- Periodic DFT: The structural and electronic information derived from periodic DFT calculations on a system using molecule instance(s) in the database.
Experimentation
- Cyclic Voltammetry (CV): CV data extracted from a datafile produced during a CV experiment on a molecule entry in the database.
- Infrared Spectroscopy (IR): IR data extracted from a datafile produced during an IR experiment on a molecule entry in the database.
- UV-Vis Spectroscopy (UV-Vis): UV-Vis data extracted from a datafile produced during a UV-Vis experiment on a molecule entry in the database.
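To make the idea of extracting experimental data from a datafile concrete, here is a small sketch that reads CV datapoints from CSV text. The two-column layout (potential, then current), the header row, and the `parse_cv_csv` name are assumptions for illustration; real instrument exports vary and the actual D3TaLES parsers may work differently.

```python
import csv
import io
from typing import List


def parse_cv_csv(text: str) -> List[List[float]]:
    """Return [potential, current] pairs from CSV text with one header row."""
    reader = csv.reader(io.StringIO(text))
    next(reader, None)  # skip the assumed header row
    points = []
    for row in reader:
        if len(row) < 2:
            continue  # ignore blank or incomplete lines
        points.append([float(row[0]), float(row[1])])
    return points


# Hypothetical file contents: potential in volts, current in amperes.
sample = "potential_V,current_A\n0.00,1.0e-06\n0.10,2.5e-06\n\n"
points = parse_cv_csv(sample)
print(points)  # [[0.0, 1e-06], [0.1, 2.5e-06]]
```

An array like `points` is the kind of raw payload that would sit in the data field of a CV record in the backend.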
Natural Language Processing (NLP)
Machine Learning (ML)
Synthesis
Frontend Structure
The frontend database holds data that are more useful for analysis. For example, the frontend database might contain a HOMO-LUMO gap calculated from the HOMO and LUMO energies found in the backend data. Likewise, the frontend database might contain an estimated redox potential calculated from cyclic voltammetry (CV) curve peaks. The base unit for the frontend database is a molecule. Each molecule has several base fields, including its ID, accessibility status, and general molecular information such as molecular formula and SMILES string.
_id
: A unique identifier for the molecule.
public
: Boolean value. If true, the molecule is public outside of D3TaLES. If false, it is not.
mol_info
: The basic structural information derivable from the molecule's SMILES structure.
mol_characterization
: Chemical properties pertaining to the molecule (such as redox potential or reorganization energy).
species_characterization
: Chemical properties pertaining to a charge species of the molecule (such as ground state energy or cation HOMO-LUMO gap).
synthesis
: Details of 3D geometry of the molecule including charge species.
related_literature
: Collection of DOIs from natural language processing (NLP).
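A frontend molecule entry with these base fields might look roughly like the sketch below, which also shows a HOMO-LUMO gap being derived from backend-style HOMO and LUMO energies. The top-level field names follow the list above; everything nested inside them (`smiles`, `molecular_formula`, `ground_state`, `homo_lumo_gap`), the `build_frontend_molecule` helper, and the numeric energies are hypothetical placeholders, not values or key names from D3TaLES.

```python
def homo_lumo_gap(homo: float, lumo: float) -> float:
    """Gap between the LUMO and HOMO energies (same units as the inputs)."""
    return lumo - homo


def build_frontend_molecule(mol_id: str, smiles: str, formula: str,
                            homo: float, lumo: float,
                            public: bool = False) -> dict:
    """Assemble a molecule document with the base fields listed above."""
    return {
        "_id": mol_id,
        "public": public,
        "mol_info": {"smiles": smiles, "molecular_formula": formula},
        "mol_characterization": {},  # e.g. redox potential, once computed
        "species_characterization": {
            # Hypothetical nesting: one entry per charge species.
            "ground_state": {"homo_lumo_gap": homo_lumo_gap(homo, lumo)},
        },
        "synthesis": {},
        "related_literature": [],  # DOIs would be collected here
    }


# Placeholder energies for illustration only, not computed values.
doc = build_frontend_molecule("mol_001", "c1ccccc1", "C6H6",
                              homo=-5.5, lumo=-2.0)
print(doc["species_characterization"]["ground_state"]["homo_lumo_gap"])  # 3.5
```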