  • Language: Python
  • License: MIT License
  • Created over 5 years ago
  • Updated 5 months ago


ASAP is a package that can quickly analyze and visualize datasets of crystal or molecular structures.

ASAP

Automatic Selection And Prediction tools for materials and molecules

Documentation (in progress)

Mapping Materials and Molecules (Paper link)

Cheng B, Griffiths RR, Wengert S, Kunkel C, Stenczel T, Zhu B, Deringer VL, Bernstein N, Margraf JT, Reuter K, Csanyi G. Mapping Materials and Molecules. Accounts of Chemical Research. 2020 Aug 14:12697-705.

This tutorial style paper covers:

  • A concise summary of the theory of representing chemical environments, and a simple yet practical conceptual approach for generating structure maps in a generic and automated manner.

  • Several illustrative examples of mapping materials and chemical datasets, including crystalline and amorphous materials, interfaces, and organic molecules. The datasets for these examples are included in this repository.

  • Snippets of asap commands that were used to analyze the examples and make figures.

Basic usage

Type asap and use the sub-commands for various tasks.

To get a help string, run:

asap --help, asap subcommand --help, or asap subcommand subcommand --help, depending on which level of help you are interested in.

  • asap gen_desc: generate global or atomic descriptors based on the input file, which is read with ASE (extended xyz is preferred).

  • asap map: make 2D plots using the specified design matrix. Currently PCA (pca), sparsified kernel PCA (skpca), UMAP (umap), and t-SNE (tsne) are implemented.

  • asap cluster: perform density-based clustering. Currently supports DBSCAN (dbscan) and fast search of density peaks (fdb).

  • asap fit: quickly fit a ridge regression (ridge) or sparsified kernel ridge regression (kernelridge) model based on the input design matrix and labels.

  • asap kde: quick kernel density estimation on the design matrix. Several KDE variants are available.

  • asap select: select a subset of frames using sparsification algorithms.
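For readers new to density-based clustering, the idea behind the dbscan option of asap cluster can be sketched in plain Python. This is a minimal textbook DBSCAN written for illustration, not asap's own code; the function name is hypothetical, eps and min_samples are the usual DBSCAN parameters, and asap's implementation and defaults may differ.

```python
import math
from typing import List, Optional, Sequence

NOISE = -1

# Illustrative textbook DBSCAN; `dbscan` here is a hypothetical helper, not asaplib's API.
def dbscan(points: Sequence[Sequence[float]], eps: float, min_samples: int) -> List[int]:
    """Return one cluster label per point (0, 1, ...), or -1 for noise."""
    n = len(points)
    labels: List[Optional[int]] = [None] * n  # None = not visited yet

    def neighbors(i: int) -> List[int]:
        # All points (including i itself) within distance eps of point i.
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable, but not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is also a core point: keep growing
        cluster += 1
    return labels

points = [(0, 0), (0, 0.1), (0.1, 0), (5, 5), (5, 5.1), (5.1, 5), (10, 0)]
print(dbscan(points, eps=0.5, min_samples=2))
```

The neighbor search is O(n^2), which is fine for a toy example but not for large design matrices.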

Quick & basic example

Step 1: generate a design matrix

The first step for a machine-learning analysis or visualization is to generate a "design matrix" made from either global descriptors or atomic descriptors. To do this, we supply asap gen_desc with an input file that contains the atomic coordinates. Many formats are supported: any format that ase.io can read will work. You can use a wildcard to specify a list of input files that match a pattern (e.g. POSCAR*, H*, or *.cif). However, it is most robust to use the extended xyz file format (units in angstrom, with additional info and the cell size in the comment line).
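To make the recommended input format concrete, here is a minimal sketch that writes one frame in the extended xyz layout: the atom count on the first line, then a comment line carrying the cell vectors (Lattice, in angstrom), a Properties descriptor, any per-frame labels and the periodicity, then one line per atom. The helper name and the dft_energy label are hypothetical; in practice ase.io.write(..., format="extxyz") is the more reliable way to produce such files.

```python
from typing import Dict, List, Sequence, Tuple

# Illustrative sketch: write_extxyz_frame is a hypothetical helper, not part of asaplib.
def write_extxyz_frame(symbols: Sequence[str],
                       positions: Sequence[Tuple[float, float, float]],
                       cell: Sequence[Sequence[float]],
                       info: Dict[str, float],
                       periodic: bool = True) -> str:
    """Return one extended-xyz frame as a string (positions and cell in angstrom)."""
    if len(symbols) != len(positions):
        raise ValueError("symbols and positions must have the same length")
    # Lattice lists the three cell vectors, flattened row by row.
    lattice = " ".join(f"{x:.6f}" for vector in cell for x in vector)
    pbc = "T T T" if periodic else "F F F"
    fields: List[str] = [f'Lattice="{lattice}"', "Properties=species:S:1:pos:R:3"]
    fields += [f"{key}={value}" for key, value in info.items()]  # per-frame labels
    fields.append(f'pbc="{pbc}"')
    lines = [str(len(symbols)), " ".join(fields)]
    for symbol, (x, y, z) in zip(symbols, positions):
        lines.append(f"{symbol} {x:.6f} {y:.6f} {z:.6f}")
    return "\n".join(lines) + "\n"

frame = write_extxyz_frame(
    ["O", "H", "H"],
    [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)],
    cell=[[10.0, 0.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 10.0]],
    info={"dft_energy": -14.2},  # hypothetical label name
    periodic=False,
)
print(frame)
```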

As a quick example, in the folder ./tests/, to generate SOAP descriptors:

asap gen_desc --fxyz small_molecules-1000.xyz soap

For the Coulomb matrix:

asap gen_desc -f small_molecules-1000.xyz --no-periodic cm
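The Coulomb matrix selected by cm is a simple global descriptor for molecules. In its commonly used form, the diagonal elements are 0.5 Z_i^2.4 and the off-diagonal elements are Z_i Z_j / |R_i - R_j|, where Z are nuclear charges and R are atomic positions. The sketch below computes that unsorted, unpadded matrix for a single non-periodic molecule; the function name is hypothetical, and the descriptor backends used by asap (dscribe is among its requirements) may additionally sort, pad, or flatten the matrix, so treat this only as an illustration of the quantity.

```python
import math
from typing import List, Sequence, Tuple

# Illustrative sketch; coulomb_matrix is a hypothetical helper, not asaplib's API.
def coulomb_matrix(charges: Sequence[int],
                   positions: Sequence[Tuple[float, float, float]]) -> List[List[float]]:
    """Unsorted Coulomb matrix of a single, non-periodic molecule."""
    n = len(charges)
    cm = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                # Diagonal: self-interaction term 0.5 * Z^2.4
                cm[i][j] = 0.5 * charges[i] ** 2.4
            else:
                # Off-diagonal: Coulomb repulsion between the two nuclear charges
                cm[i][j] = charges[i] * charges[j] / math.dist(positions[i], positions[j])
    return cm

# H2 with a bond length of 0.74 (positions taken as given, no unit conversion)
print(coulomb_matrix([1, 1], [(0.0, 0.0, 0.0), (0.74, 0.0, 0.0)]))
```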

Step 2: generate a low-dimensional map

After generating the descriptors, one can make a two-dimensional map (asap map), fit a regression model (asap fit), perform a clustering analysis (asap cluster), select a subset of frames (asap select), or estimate the probability of observing each sample (asap kde).
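Estimating "the probability of observing each sample" can be illustrated with a Gaussian kernel density estimate evaluated at every row of the design matrix. The sketch below uses a single isotropic bandwidth chosen by hand and includes each sample's own kernel in its density; asap offers several KDE variants with their own bandwidth choices, so the function name, the fixed bandwidth and this normalization are assumptions for illustration.

```python
import math
from typing import List, Sequence

# Illustrative sketch; gaussian_kde is a hypothetical helper, not asaplib's API.
def gaussian_kde(design_matrix: Sequence[Sequence[float]], bandwidth: float) -> List[float]:
    """Density of every row of the design matrix under an isotropic Gaussian KDE."""
    n = len(design_matrix)
    d = len(design_matrix[0])
    # Normalization of a d-dimensional Gaussian with standard deviation `bandwidth`
    norm = (2.0 * math.pi * bandwidth ** 2) ** (-d / 2.0)
    densities = []
    for x in design_matrix:
        total = 0.0
        for y in design_matrix:  # includes y == x (the sample's own kernel)
            sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
            total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
        densities.append(norm * total / n)
    return densities

X = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [3.0, 3.0]]
print(gaussian_kde(X, bandwidth=0.5))  # the isolated last sample gets the lowest density
```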

For instance, to make a PCA map:

asap map -f small_molecules-SOAP.xyz -dm '[SOAP-n4-l3-c1.9-g0.23]' -c dft_formation_energy_per_atom_in_eV pca
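Conceptually, a PCA map projects the design matrix onto its two leading principal components. A minimal NumPy sketch of that projection follows (numpy is one of asap's requirements); the random matrix stands in for a real design matrix, the function name is hypothetical, and asap's own pca option may additionally scale features or treat atomic and global descriptors differently.

```python
import numpy as np

# Illustrative sketch; pca_2d is a hypothetical helper, not asaplib's API.
def pca_2d(design_matrix: np.ndarray) -> np.ndarray:
    """Project an (n_samples, n_features) matrix onto its first two principal components."""
    centered = design_matrix - design_matrix.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))  # stand-in for a real design matrix
coords = pca_2d(X)
print(coords.shape)  # (100, 2)
```

Each row of coords is then one point on the 2D map, which can be colored by a property such as the formation energy passed with -c.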

You can specify a list of descriptor vectors to include in the design matrix, e.g. '[SOAP-n4-l3-c1.9-g0.23, SOAP-n8-l3-c5.0-g0.3]'.

One can also use a wildcard to specify the names of all the descriptors to use for the design matrix, e.g.

asap map -f small_molecules-SOAP.xyz -dm '[SOAP*]' -c dft_formation_energy_per_atom_in_eV pca

or even

asap map -f small_molecules-SOAP.xyz -dm '[*]' -c dft_formation_energy_per_atom_in_eV pca
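Assuming the wildcard behaves like shell-style pattern matching (as the '[SOAP*]' and '[*]' examples suggest), its effect can be illustrated with Python's fnmatch module, treating the surrounding square brackets as list delimiters. The descriptor names other than the SOAP ones and the helper name are hypothetical, and this shows the matching idea rather than asap's actual parsing code.

```python
from fnmatch import fnmatchcase
from typing import List, Sequence

# Illustrative sketch; select_descriptors is a hypothetical helper, not asaplib's API.
def select_descriptors(patterns: Sequence[str], available: Sequence[str]) -> List[str]:
    """Return the available descriptor names matching any pattern, in their original order."""
    return [name for name in available
            if any(fnmatchcase(name, pattern) for pattern in patterns)]

# Hypothetical descriptor names stored in an extended xyz file
names = ["SOAP-n4-l3-c1.9-g0.23", "SOAP-n8-l3-c5.0-g0.3", "CM-no-periodic"]
print(select_descriptors(["SOAP*"], names))  # only the two SOAP entries
print(select_descriptors(["*"], names))      # every descriptor
```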

Step 2+: interactive visualization

Using asap map, a png figure is generated. In addition, the code outputs the low-dimensional coordinates of the structures and/or atomic environments. The default output is an extended xyz file. A different output format can be specified using the --output (-o) flag; the available options are xyz, matrix and chemiscope.

  • If the chemiscope format is selected, a *.json.gz file will be written, which can be used directly as the input of chemiscope.

  • If the output is in xyz format, it can be visualized interactively using projection_viewer.
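A *.json.gz file is gzip-compressed JSON, so it can also be inspected from Python with the standard library before loading it into chemiscope. The helper below is hypothetical and deliberately makes no assumption about chemiscope's exact schema; it simply returns the decoded dictionary.

```python
import gzip
import json
from typing import Any, Dict

# Illustrative sketch; load_json_gz is a hypothetical helper, not part of asaplib or chemiscope.
def load_json_gz(path: str) -> Dict[str, Any]:
    """Read a gzip-compressed JSON file (such as a chemiscope input) into a dict."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)
```

For example, sorted(load_json_gz("map.json.gz")) would list the top-level keys of a file with that (hypothetical) name.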

Installation & requirements

Python 3

Installation:

pip install asaplib

or (after git clone https://github.com/BingqingCheng/ASAP)

pip install .

This should automatically install any dependencies.

List of requirements:

  • numpy scipy scikit-learn json ase dscribe umap-learn PyYAML click
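After installation, one way to confirm that the listed requirements are importable is to look up each module with importlib. Several import names differ from the pip package names (scikit-learn imports as sklearn, umap-learn as umap, PyYAML as yaml), and json ships with Python itself. The helper below is a hypothetical convenience, not part of asaplib.

```python
import importlib.util
from typing import List, Mapping

# pip package name -> module name used in `import`
REQUIREMENTS = {
    "numpy": "numpy",
    "scipy": "scipy",
    "scikit-learn": "sklearn",
    "json": "json",
    "ase": "ase",
    "dscribe": "dscribe",
    "umap-learn": "umap",
    "PyYAML": "yaml",
    "click": "click",
}

# Illustrative sketch; missing_requirements is a hypothetical helper.
def missing_requirements(requirements: Mapping[str, str] = REQUIREMENTS) -> List[str]:
    """Return the package names whose top-level modules cannot be found."""
    return [package for package, module in requirements.items()
            if importlib.util.find_spec(module) is None]

print(missing_requirements())  # an empty list means everything was found
```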

Add-Ons:

  • (for finding symmetries of crystals) spglib
  • (for annotation without overlaps) adjustText
  • The FCHL19 representation requires code from the development branch of the QML package. Instructions on how to install the QML package can be found at https://www.qmlcode.org/installation.html.

Additional tools

In the directory ./scripts/ you can find a selection of other python tools.

Tab completion

Tab completion can be enabled by sourcing the asap_completion.sh script in the ./scripts/ directory. If a conda environment is used, you can copy this file to $CONDA_PREFIX/etc/conda/activate.d/ to automatically load the completion upon environment activation.
