Machine Learning: Train a Neural Network for Regression

This tutorial demonstrates how to train a multilayer perceptron ¹ for regression using Scikit-Learn ².

1. Acquire training data

The data used in this tutorial is taken from a recent model ³ of small-molecule adsorption on transition metal nanoparticles. Specifically, the dataset contains DFT-calculated adsorption energies of three adsorbates (the ·CH₃ and ·OH radicals, and CO) on Ag, Au, and Cu nanoparticles ranging in size from 55 to 172 atoms.

This file contains the data used in this tutorial. Its first five lines (the header plus four rows) are shown below:

PBE_BE_eV	CE_Local_eV	ChemPot_eV	MADS_eV
-1.39	-2.38	-4.96	-2.10
-1.11	-3.35	-4.96	-2.10
-0.95	-4.81	-4.96	-2.10
-0.74	-4.60	-4.96	-2.10
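Before uploading, it can help to check that the file parses as expected. The following is a minimal sketch using pandas, with the four sample rows above embedded as a string. The tab delimiter is an assumption based on how the sample is displayed; the real file may be comma-separated, in which case `sep` should be changed accordingly.

```python
import io

import pandas as pd

# The four sample rows shown above, embedded for illustration.
# Assumption: tab-delimited, as displayed; adjust `sep` for the real file.
sample = (
    "PBE_BE_eV\tCE_Local_eV\tChemPot_eV\tMADS_eV\n"
    "-1.39\t-2.38\t-4.96\t-2.10\n"
    "-1.11\t-3.35\t-4.96\t-2.10\n"
    "-0.95\t-4.81\t-4.96\t-2.10\n"
    "-0.74\t-4.60\t-4.96\t-2.10\n"
)

df = pd.read_csv(io.StringIO(sample), sep="\t")

# Every column should parse as a floating-point number.
print(df.dtypes)
print(df.head())
```

To check a file on disk instead, pass its path to `pd.read_csv` in place of the `io.StringIO` object.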

2. Upload the training data

First, click the Dropbox button in the left sidebar to navigate to the Dropbox Page. Then click the Upload button, circled below:

Dropbox Page with Upload

When the browser's upload window appears, navigate to the file downloaded in section 1 and select it for upload. If the upload succeeds, the file appears in the dropbox.

3. Copy the regression workflow from the bank

Next, click the Bank Workflows button in the left sidebar to navigate to the Bank Workflows Page. Search for the "Python ML Train Regression" workflow owned by the "Curators" account, and copy it to your account.

A diagram and detailed description of this workflow can be found here.

4. Create the ML job

Create a new job by clicking the Create Job button in the left sidebar. This opens a new job on the Job Designer page.

First, give the job a descriptive name, such as "Python ML Tutorial" (see below). Then, click the Actions Button (the three vertical dots in the upper-right of the job designer), and choose Select Workflow.

Job Designer with Circles

This brings up the Select Workflow dialogue. Search for "Python ML Train Regression" and select it.

5. Select the dataset

The job designer changes once the ML Training workflow is selected. The Materials tab is replaced with a Dataset tab. Just as the Materials tab shows a preview of the materials a job uses, the Dataset tab shows a preview of the selected dataset.

Dataset Tab

To select a dataset, click the Actions Button (the three vertical dots in the upper-right of the job designer) and choose Select Dataset. This brings up a file explorer containing all files in the dropbox. Select the training set uploaded earlier, "data_to_train_with.csv".

A preview of the data then appears on the dataset tab, indicating that the data has been loaded successfully.

6. Configure the workflow

With the ML workflow and training set selected, open the Workflows Tab to view the training workflow.

Two subworkflows are available: Set Up the Job and Machine Learning.

The Set Up the Job subworkflow contains instructions to copy in the training data.

Do not modify the setup subworkflow

The Set Up the Job subworkflow is automatically configured during the training process. Modifying it can disrupt creation of the predict workflow, leading to inaccurate results or preventing the predict workflow from being generated at all.

Select the Machine Learning subworkflow by clicking on it. The following workflow units should be visible:

Setup Packages and Variables — configures the job and downloads all required packages with pip
Data Input — reads the training data from disk
Train Test Split — splits the data into a training set and a testing set
Data Standardize — scales the data to mean 0 and standard deviation 1
Model Train and Predict — handles model training and prediction
Parity Plot — draws a plot of model predictions versus training data and saves it to disk (displayed on the Results tab)
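The middle units of this list (split, standardize, train, predict) correspond to a familiar Scikit-Learn pattern. The sketch below is not the workflow's actual source code; it is a minimal illustration on synthetic stand-in data, and the 20% test fraction, the random seeds, and the choice to standardize only the input features are all assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (NOT the adsorption dataset): 3 descriptors, 1 target.
rng = np.random.default_rng(0)
X = rng.normal(loc=-3.0, scale=1.5, size=(200, 3))
y = 0.5 * X[:, 0] - 0.2 * X[:, 1] + 0.1 * X[:, 2] + rng.normal(scale=0.05, size=200)

# Train Test Split: hold out part of the data (20% here, an assumed fraction).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Data Standardize: fit the scaler on the training set only, then apply the
# same transformation to both sets so no test information leaks into training.
scaler = StandardScaler().fit(X_train)
X_train_std = scaler.transform(X_train)
X_test_std = scaler.transform(X_test)

# Model Train and Predict: fit a multilayer perceptron and predict on held-out data.
model = MLPRegressor(hidden_layer_sizes=(100,), max_iter=2000, random_state=0)
model.fit(X_train_std, y_train)
y_pred = model.predict(X_test_std)

print(y_pred.shape)
```

Fitting the scaler on the training portion alone is a common design choice: the test set then behaves like genuinely unseen data when the model is evaluated.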

6.1. Set the target column

Open the Important Settings portion of the workflow editor. Set target_column_name to "PBE_BE_eV" to define the target column of the training set.
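Conceptually, the target column is the quantity the model learns to predict, and the remaining columns serve as input features. A minimal pandas sketch of that separation follows, using the four sample rows from section 1. Treating every non-target column as a feature is an assumption about how the workflow uses the setting.

```python
import pandas as pd

# The four sample rows from section 1.
df = pd.DataFrame(
    {
        "PBE_BE_eV": [-1.39, -1.11, -0.95, -0.74],
        "CE_Local_eV": [-2.38, -3.35, -4.81, -4.60],
        "ChemPot_eV": [-4.96, -4.96, -4.96, -4.96],
        "MADS_eV": [-2.10, -2.10, -2.10, -2.10],
    }
)

target_column_name = "PBE_BE_eV"

# Target: the column named by target_column_name.
y = df[target_column_name]

# Features: every other column (an assumption for this illustration).
X = df.drop(columns=[target_column_name])

print(list(X.columns))
print(y.tolist())
```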

Important settings with target column name set

6.2. Adjust model parameters

Return to the Overview portion of the workflow editor. Select the Model Train and Predict workflow unit, as shown below:

Workflows tab with ML train subworkflow and train unit circled

Scroll down and change the hidden_layer_sizes argument from (100,) to (100,100) to create two hidden layers of 100 neurons each. Also change max_iter to 5000 to train for up to 5000 iterations.
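For reference, these two arguments map directly onto Scikit-Learn's `MLPRegressor`. The sketch below fits such a model on small synthetic data (not the tutorial dataset) and inspects the resulting weight matrices to confirm the two-hidden-layer architecture; the data and the random seed are assumptions made for illustration.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Small synthetic stand-in data: 3 input features, 1 target.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
y = X @ np.array([0.5, -0.2, 0.1])

# Two hidden layers of 100 neurons each, trained for at most 5000 iterations.
model = MLPRegressor(hidden_layer_sizes=(100, 100), max_iter=5000, random_state=0)
model.fit(X, y)

# One weight matrix per connection between consecutive layers:
# input -> hidden 1, hidden 1 -> hidden 2, hidden 2 -> output.
print([w.shape for w in model.coefs_])

# Number of iterations actually run; training can stop before max_iter
# once the loss stops improving.
print(model.n_iter_)
```

Note that `max_iter` is an upper bound. With Scikit-Learn's default `adam` solver it counts epochs, and training may end earlier if the convergence criterion is met.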

ML Train Neural Network with 2 Hidden Layers

Close the dialogue. The workflow is now configured.

7. Submit the job

Click the check-mark in the upper right of the job designer, in the Header Menu, to save the job. The job explorer page displays the job in a pre-submission status.

Jobs Tab with ML Training Calculation Set Up

The job can now be run.

8. Analyze the training results

After a few minutes, the job completes. The job's Results tab shows two calculated properties. The first, Machine Learning - Model Train and Predict, is the predict workflow generated by the training job. This workflow can be used to apply the trained model to new data for additional predictions.

The second result is Machine Learning - Parity Plot, which contains the predicted versus actual values for the adsorption energies.
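A parity comparison can also be summarized numerically. The sketch below computes three common regression metrics with Scikit-Learn. The "actual" values are the four target values from the sample in section 1; the "predicted" values are hypothetical numbers invented purely for illustration and are not output from the tutorial's model.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Actual adsorption energies (eV) from the four sample rows in section 1.
y_actual = np.array([-1.39, -1.11, -0.95, -0.74])

# HYPOTHETICAL predictions, made up for this illustration only.
y_predicted = np.array([-1.30, -1.15, -0.90, -0.80])

# On a parity plot, perfect predictions fall on the line y = x.
# These metrics quantify how far the points stray from that line.
mae = mean_absolute_error(y_actual, y_predicted)
rmse = np.sqrt(mean_squared_error(y_actual, y_predicted))
r2 = r2_score(y_actual, y_predicted)

print(f"MAE  = {mae:.3f} eV")
print(f"RMSE = {rmse:.3f} eV")
print(f"R^2  = {r2:.3f}")
```

MAE and RMSE are in the same units as the target (eV here), which makes them easy to compare against the accuracy the application needs.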

Results Tab Showcasing Parity Plot

9. Video walkthrough

This tutorial is demonstrated in the following animation: