Machine Learning: Train a Neural Network for Regression
This tutorial demonstrates how to train a multilayer perceptron [1] for regression using Scikit-Learn [2].
1. Acquire training data
The data used in this tutorial is taken from a recent model [3] of small molecule adsorption to transition metal nanoparticles. Specifically, the dataset contains DFT-calculated adsorption energies of ·CH3, CO, and ·OH radicals on Ag, Au, and Cu nanoparticles ranging in size from 55 to 172 atoms.
This file contains the data used in this tutorial. A sample of the first 5 lines is shown below:
| PBE_BE_eV | CE_Local_eV | ChemPot_eV | MADS_eV |
|---|---|---|---|
| -1.39 | -2.38 | -4.96 | -2.10 |
| -1.11 | -3.35 | -4.96 | -2.10 |
| -0.95 | -4.81 | -4.96 | -2.10 |
| -0.74 | -4.60 | -4.96 | -2.10 |
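To inspect the data outside the platform, the sample can be parsed with Python's standard library. The sketch below is only illustrative: it assumes the file is comma-separated with a header row matching the column names in the table above, and it embeds the four sample rows as text instead of reading the downloaded file. The helper name `read_rows` is hypothetical.

```python
import csv
import io

# The four data rows shown in the table above, written as CSV text.
# Assumption: the real file is comma-separated with this header row.
SAMPLE_CSV = """PBE_BE_eV,CE_Local_eV,ChemPot_eV,MADS_eV
-1.39,-2.38,-4.96,-2.10
-1.11,-3.35,-4.96,-2.10
-0.95,-4.81,-4.96,-2.10
-0.74,-4.60,-4.96,-2.10
"""


def read_rows(csv_text):
    """Parse CSV text into a list of {column name: float} dictionaries."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{name: float(value) for name, value in row.items()} for row in reader]


rows = read_rows(SAMPLE_CSV)
columns = list(rows[0].keys())
print(columns)
print(len(rows), "rows; first row:", rows[0])
```

To read the actual file, the same `csv.DictReader` call can be given an open file handle in place of the `io.StringIO` object.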
2. Upload the training data
First, click the Dropbox button in the left sidebar to navigate to the Dropbox Page. Then click the Upload button, circled below:

When the browser's upload window appears, navigate to the downloaded file from section 1 and select it for upload. If the upload was successful, the file appears in the dropbox.
3. Copy the regression workflow from the bank
Next, click the Bank Workflows button in the left sidebar to navigate to the Bank Workflows Page. Search for the "Python ML Train Regression" workflow owned by the "Curators" account, and copy it to the account.
A diagram and detailed description of this workflow can be found here.
4. Create the ML job
Create a new job by clicking the Create Job button in the left sidebar. This opens a new job on the Job Designer page.
First, give the job a descriptive name, such as "Python ML Tutorial" (see below). Then, click the Actions Button (the three vertical dots in the upper-right of the job designer), and choose Select Workflow.

This brings up the Select Workflow dialogue. Search for "Python ML Train Regression" and select it.
5. Select the dataset
The job designer changes once the ML Training workflow is selected. The Materials tab is replaced with a Dataset tab. Just as the Materials tab shows a preview of the materials a job uses, the Dataset tab shows a preview of the selected dataset.

To select a dataset, click the Actions Button (the three vertical dots in the upper-right of the job designer) and choose Select Dataset. This brings up a file explorer containing all files in the dropbox. Select the training set uploaded earlier, "data_to_train_with.csv".
A preview of the data then appears on the dataset tab, indicating that the data has been loaded successfully.
6. Configure the workflow
With the ML workflow and training set selected, open the Workflows Tab to view the training workflow.
Two subworkflows are available: Set Up the Job and Machine Learning.
The Set Up the Job subworkflow contains instructions to copy in the training data.
Do not modify the setup subworkflow
The Set Up the Job subworkflow is automatically configured during the training process. Modifying it can disrupt creation of the Predict workflow, leading to inaccurate results or a failure to generate a predict workflow.
Select the Machine Learning subworkflow by clicking on it. The following workflow units should be visible:
- Setup Packages and Variables: configures the job and downloads all required packages with pip
- Data Input: reads the training data from disk
- Train Test Split: splits the data into a training set and a testing set
- Data Standardize: scales the data to mean 0 and standard deviation 1
- Model Train and Predict: handles model training and prediction
- Parity Plot: draws a plot of model predictions versus training data and saves it to disk (displayed on the Results tab)
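The split, standardize, and train steps listed above can be sketched with Scikit-Learn directly. This is a minimal illustration and not the platform's own unit code: the data is synthetic, and the 20% test fraction, the random seeds, and the choice to standardize only the input features are all assumptions made for the example.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (NOT the adsorption dataset): 3 features, 1 target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([0.5, -1.0, 2.0]) + 0.05 * rng.normal(size=200)

# Train Test Split: hold out 20% of the rows for testing (assumed fraction).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Data Standardize: fit the scaler on the training set only,
# then apply the same transformation to the test set.
scaler = StandardScaler().fit(X_train)
X_train_std = scaler.transform(X_train)
X_test_std = scaler.transform(X_test)

# Model Train and Predict: fit a multilayer perceptron, then predict.
model = MLPRegressor(random_state=0, max_iter=2000)
model.fit(X_train_std, y_train)
test_predictions = model.predict(X_test_std)

print("train mean per feature:", X_train_std.mean(axis=0).round(6))
print("train std per feature: ", X_train_std.std(axis=0).round(6))
print("test predictions shape:", test_predictions.shape)
```

Fitting the scaler on the training rows alone keeps information from the test set out of the preprocessing step, which is why the test set is transformed with the already-fitted scaler.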
6.1. Set the target column
Open the Important Settings portion of the workflow editor. Set target_column_name to "PBE_BE_eV" to define the target column of the training set.
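Conceptually, naming a target column separates the value to be predicted from the inputs. The sketch below shows one way this could look in plain Python, assuming that every non-target column is used as an input feature (the tutorial does not state this explicitly). The helper `split_target` is hypothetical and is not part of the workflow.

```python
TARGET_COLUMN_NAME = "PBE_BE_eV"  # same value as target_column_name above

# Two rows taken from the data sample in section 1.
rows = [
    {"PBE_BE_eV": -1.39, "CE_Local_eV": -2.38, "ChemPot_eV": -4.96, "MADS_eV": -2.10},
    {"PBE_BE_eV": -1.11, "CE_Local_eV": -3.35, "ChemPot_eV": -4.96, "MADS_eV": -2.10},
]


def split_target(rows, target):
    """Return (feature_names, X, y) with the target column removed from X."""
    # Assumption: all columns other than the target are input features.
    feature_names = [name for name in rows[0] if name != target]
    X = [[row[name] for name in feature_names] for row in rows]
    y = [row[target] for row in rows]
    return feature_names, X, y


feature_names, X, y = split_target(rows, TARGET_COLUMN_NAME)
print("features:", feature_names)
print("X:", X)
print("y:", y)
```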

6.2. Adjust model parameters
Return to the Overview portion of the workflow editor. Select the Model Train and Predict workflow unit, as shown below:

Scroll down and change the hidden_layer_sizes argument from (100,) to (100,100) to create two hidden layers of 100 neurons each. Also change max_iter to 5000 to train for up to 5000 iterations.

Close the dialogue. The workflow is now configured.
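The two settings changed in section 6.2 are arguments of Scikit-Learn's `MLPRegressor`. The following sketch shows their effect on a small synthetic dataset (an assumption for illustration, not the tutorial data): `hidden_layer_sizes=(100, 100)` yields three weight matrices, and `max_iter=5000` is an upper bound on training iterations, not a fixed count.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Synthetic stand-in data (NOT the tutorial dataset): 60 samples, 3 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
y = X.sum(axis=1)

# Two hidden layers of 100 neurons each; train for at most 5000 iterations.
model = MLPRegressor(hidden_layer_sizes=(100, 100), max_iter=5000, random_state=0)
model.fit(X, y)

# One weight matrix per connection between consecutive layers:
# input -> hidden 1, hidden 1 -> hidden 2, hidden 2 -> output.
weight_shapes = [w.shape for w in model.coefs_]
print("weight matrix shapes:", weight_shapes)
print("iterations actually run:", model.n_iter_)
```

With the default solver, training typically stops before `max_iter` once the loss stops improving, so the reported iteration count is usually lower than the limit.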
7. Submit the job
Click the check-mark in the upper right of the job designer, in the Header Menu, to save the job. The job explorer page displays the job in a pre-submission status.

The job can now be run.
8. Analyze the training results
After a few minutes, the job completes. The job's Results tab shows two calculated properties. The first, Machine Learning - Model Train and Predict, is the predict workflow generated by the training job. This workflow can be used to apply the trained model to new data for additional predictions.
The second result is Machine Learning - Parity Plot, which contains the predicted versus actual values for the adsorption energies.
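A parity plot of this kind can be reproduced with Matplotlib. The sketch below is an assumption about how such a plot might be drawn, not the workflow's own plotting code. The "actual" values come from the data sample in section 1, while the "predicted" values are made-up placeholders used only so that the example runs. The function name `save_parity_plot` and the output file name are hypothetical.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # render to a file without needing a display
import matplotlib.pyplot as plt


def save_parity_plot(actual, predicted, path):
    """Scatter predicted vs. actual values with a y = x reference line."""
    fig, ax = plt.subplots(figsize=(4, 4))
    ax.scatter(actual, predicted, s=20)
    lo = min(min(actual), min(predicted))
    hi = max(max(actual), max(predicted))
    # Points on the dashed diagonal are predicted exactly.
    ax.plot([lo, hi], [lo, hi], linestyle="--", color="gray")
    ax.set_xlabel("Actual adsorption energy (eV)")
    ax.set_ylabel("Predicted adsorption energy (eV)")
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)
    return path


# Actual values from the section 1 sample; predictions are placeholders.
actual = [-1.39, -1.11, -0.95, -0.74]
predicted = [-1.30, -1.15, -0.90, -0.80]  # illustrative only, not model output
output_path = save_parity_plot(
    actual, predicted, os.path.join(tempfile.gettempdir(), "parity_plot.png")
)
print("saved:", output_path)
```

The closer the points lie to the diagonal, the better the predictions agree with the reference values.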

9. Video walkthrough
This tutorial is demonstrated in the following animation: