
Machine Learning: Train Linear Regression

Deprecated tutorial

This tutorial uses the legacy ML engine. For current Machine Learning workflows, see Python ML tutorials or the MatterSim tutorial.

This tutorial demonstrates how to train a machine learning (ML) model on a set of materials called "train materials". The trained model can then predict the properties of another set, called "target materials", as described in a separate tutorial.

The Electronic Band Gap is the target property in this example, though the general approach works for many different properties.

1. Prepare the training set

The following stoichiometric combinations of silicon (Si) and germanium (Ge) are used to train the ML model. These structures each contain 16 atoms in a 2×2×2 supercell of the cubic-diamond primitive unit cell, and can be generated using combinatorial sets via Materials Designer:

  • Si₂Ge₁₄
  • Si₆Ge₁₀
  • Si₈Ge₈
  • Si₁₀Ge₆
  • Si₁₂Ge₄
  • Si₁₄Ge₂

The trained model can then predict the band gap of a target composition such as Si₄Ge₁₂, as described in a separate tutorial.
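The compositions above can be tabulated with a short script. This is only an illustrative sketch: the helper name `composition` and the choice of the Si atomic fraction as a descriptor are assumptions made here, not features of the platform or of the legacy ML engine.

```python
from fractions import Fraction

N_ATOMS = 16  # atoms per 2x2x2 supercell, as stated in the text

# Number of Si atoms in each training structure listed above.
TRAIN_SI_COUNTS = [2, 6, 8, 10, 12, 14]
TARGET_SI_COUNT = 4  # the Si4Ge12 target composition


def composition(n_si, n_atoms=N_ATOMS):
    """Return (formula, Si atomic fraction) for a Si/Ge cell (hypothetical helper)."""
    if not 0 <= n_si <= n_atoms:
        raise ValueError("n_si must lie between 0 and n_atoms")
    n_ge = n_atoms - n_si
    return f"Si{n_si}Ge{n_ge}", Fraction(n_si, n_atoms)


train = [composition(n) for n in TRAIN_SI_COUNTS]
target = composition(TARGET_SI_COUNT)

for formula, x_si in train:
    print(f"train  {formula:<8} x_Si = {float(x_si):.3f}")
print(f"target {target[0]:<8} x_Si = {float(target[1]):.3f}")
```

The target Si fraction (0.25) lies inside the range spanned by the training set (0.125 to 0.875), so predicting it is an interpolation rather than an extrapolation.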

2. Obtain training data

2.1. Copy the workflow from the bank

A pre-assembled workflow for band gap calculations can be imported from the Workflow Bank into the account-owned collection. The import procedure is described on a separate page.

2.2. Create and run the job

Create a new Job using the Job Designer. Select all Si/Ge materials from the account-owned collection and add them to the job. Under the Workflow Tab, select the band gap workflow imported in the previous step. The job can then be executed.

3. Build and train the model

The "ML Train Model" workflow can be imported from the Bank by following the same import procedure. Create a new Job, selecting the "ML Train Model" workflow together with the Si/Ge materials for which the band gap was calculated. The ML engine then builds a model from the calculated band gap data, and that model can predict the band gaps of similar materials.

The target properties (band gap in this case) can be selected by opening the unit editor for the "input" unit and scrolling to the "Targets" section.
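For readers who want to see what training a linear regression amounts to, the sketch below fits a straight line by ordinary least squares. It is not the ML engine's implementation. It assumes a single descriptor (the Si atomic fraction), and the `fit_line` and `predict` helpers are hypothetical. The target values are synthetic numbers placed on an exact line so that the fit can be checked; they are not calculated band gaps.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept (closed form)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Evaluate the fitted line at x."""
    return slope * x + intercept


N_ATOMS = 16
si_counts = [2, 6, 8, 10, 12, 14]      # training compositions from step 1
xs = [n / N_ATOMS for n in si_counts]  # assumed descriptor: Si fraction

# SYNTHETIC targets on an exact line, used only to verify the fit.
# Replace with the band gaps computed in step 2 for a real model.
TRUE_SLOPE, TRUE_INTERCEPT = 2.0, 1.0
ys = [TRUE_SLOPE * x + TRUE_INTERCEPT for x in xs]

slope, intercept = fit_line(xs, ys)
target_prediction = predict(4 / N_ATOMS, slope, intercept)  # Si4Ge12
print(slope, intercept, target_prediction)
```

Because the synthetic targets lie exactly on a line, the fit recovers the slope and intercept used to generate them (up to floating-point rounding). Real band gap data would scatter around the fitted line.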

4. Inspect the trained model

4.1. Retrieve the predict workflow

Once the model is trained, a new Workflow called "ml_predict" is generated and can be retrieved under the Results tab of Job Viewer. This workflow is automatically saved to the account-owned collection and can be used to predict properties of new materials without further physics-based simulations. The prediction procedure is described in a separate tutorial.

4.2. View model coefficients

Open the "ml_predict" workflow and view the "Score" unit in the unit editor, where model coefficients, feature importance, and model precision are stored.
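The text does not state which metric the "Score" unit reports as model precision. As a point of reference only, two measures commonly used for regression models are the root-mean-square error (RMSE) and the coefficient of determination (R²). The sketch below computes both; the function names are hypothetical and the input numbers are arbitrary test values, not results from this tutorial.

```python
import math


def rmse(actual, predicted):
    """Root-mean-square error between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("actual values are constant; R^2 is undefined")
    return 1 - ss_res / ss_tot


# Arbitrary test values: a perfect prediction and a constant (mean) prediction.
actual = [1.0, 2.0, 3.0]
perfect = [1.0, 2.0, 3.0]
mean_only = [2.0, 2.0, 2.0]

print(rmse(actual, perfect), r_squared(actual, perfect))
print(rmse(actual, mean_only), r_squared(actual, mean_only))
```

A perfect prediction gives an RMSE of 0 and an R² of 1, while a model that always predicts the mean of the data gives an R² of 0.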

5. Video walkthrough

The animation below demonstrates the full procedure for building and inspecting an ML Train Model.