Accessible surface area with the Shrake-Rupley algortihm

Let’s start this post with an undeniable axiom: molecules exist in space. They are three-dimensional complexes of atoms that move around. These molecules might fold in on themselves or interact with other molecules. The pertinent questions: what is the boundary between our molecule and its surrounding space? Can we compute the area of this interface? What might this quantity tell us about the internal structure of our molecule?

Earth-Space Boundary.png

At a scale much larger than biological entities, we can imagine a similar question. What is the boundary between our Earth and its medium, the vacuum of space? Well, answering that question turns out to be somewhat trivial due to the Earth's regular shape. The Earth is a sphere (fine; an oblate spheroid if you want to be pedantic), and we can directly compute the surface area of a sphere, which happens to be 4πr2 where r is the sphere's radius. This is, in other words, the shared space between our spherical Earth and the rest of the universe. There is, of course, more area inside of the Earth, but that is internal and not a part of our universe-boundary. Importantly, demonstrating that the Earth has a surface area approximately matching a sphere is a great argument that the Earth is spherical, which tells us, for instance, that the the Earth is isotropic without a preferential axis. (I concede that the Earth is, in fact, smushed at its poles, so this isn't true.)

For biological entities, understanding boundaries is trickier because the shapes of molecules are complicated and not spherical or any other regular geometrical shape. This means that we need to get into the nitty-gritty details of our molecule to compute its surface area. Also, the medium is usually not the vacuum of space; there's not nothing but something there. First, let's re-enforce that our molecule is just a collection of atoms. Any individual atom, via typical biomolecular simulations, will have an x, y, and z position indicating where it's located in space and a van der Waals radius defining the atom's size. A single atom behaves much like the Earth did in our simple example; it's a sphere. If the atom exists alone without any other atoms, the surface area will simply be 4πr2 where r is the atom's van der Waal radius. Let's refer to this as the idealized maximum area of the atom.

 
SASA.png
 

If our system contains more than one atom, we need to think about how atoms impact one another. In the best case scenario, if atoms are sufficiently far apart, their total accessible area is just the sum of their individual maximum areas. If an atom is impacted by other atoms, its area is a proportion less than unity to its maximum. To determine this proportion, we are going to perform a logical test. We are going to randomly pick a spot on the van der Waals surface of our atom (let's call our atom Ivan). If that spot is closer to another atom (Jeff) than Jeff's own van der Waals radius, then we consider that location on Ivan to be impacted by Jeff. We can do this as many times as we like, keeping track for Ivan how many of its random surface points are impacted or not by other atoms. Then, we can compute the accessible surface of Ivan to be number of points that are conflict free divided by the total number of tests we did multiplied by Ivan's maximum area. This gets us the area for Ivan, but we can follow a similar procedure to get the area for atoms Jeff, Kim, Larry, whatever.

This procedure thus far has still presumed that the medium is a vacuum. What if the medium is something else? We can devise an algorithm to solve this problem that presumes we have a ball of a certain radius. This ball represents the molecule in our medium. We roll this ball across the surface of our molecule. Anywhere this ball fits on the surface of the molecule counts as part of the boundary. Anywhere it doesn’t fit is considered the interior of the molecule. The radius of this ball matters and is somewhat related to the coastline paradox. Simply put, this means that as our ball radius gets smaller that the surface area will grow. Luckily, we don't have to play around too much with the ball radius. For biomolecules, the medium is typically water, and we can subsequently give the ball a radius of 1.4 Å. The ball could represent other things, like SDS or other solvents, but these come with their own radii.

Remember above where we were determining Jeff's impact on Ivan the atom? Well, we determined that there was only an impact if a point on Ivan's surface was inside Jeff's van der Waals radius. This solves the problem for a ball of radius 0 Å. To solve for an arbitrary radius r, we need to make sure that the point on Ivan's surface must be farther away than any other atom Jeff's van der Waals radius plus this ball diameter (twice its radius r). In other words, the ball must be able to fit between Ivan and Jeff's van der Waals surfaces without breaching into their volumes. If this ball represents water, this computes what is known as Ivan's solvent accessible surface area (SASA).

Let's summarize briefly. We are devising a method to compute SASA, the amount of area available for our molecule to interact with water. We can fulfill this lofty goal through a couple of steps.

  1. Define ball radius r = 1.4 Å (for water).
  2. For an atom Ivan, create a random point on its van der Waals surface.
  3. Check if that point is within the ball radius r plus the van der Waals radius of another atom Jeff.
  4. Run steps 2 and 3 for a number N times and count the number of points P of Ivan's that were not impact by another atom.
  5. Compute SASA for Ivan as (P/N)4πr2.
  6. Continue steps 2-5 for all atoms in the system. Sum up SASA for every atom, and this gives SASA for your molecule.

The steps described are, in essence, the algorithm proposed by Shrake and Rupley in 1973. The procedure is described in very broad terms, and there's no specifics in terms of how to address this programmatically. For instance, we neglected to mention the computational complexity of this task. The steps above have a complexity of O(n2) because, for all atoms n, we are comparing that atom between all other n atoms. We can simplify this by setting up a pairlist to minimize the computational burden. We've considered these problems in LLTK produced by Lockhart Lab, which includes SASA in its analysis tools. The full code for the SASA analysis function can be viewed via our git repository. Basically, to get SASA up and running, we will do a few things highlighted in the code snippet below.


# Import LLTK
import LLTK.io
import LLTK.analysis

# Load in structure
structure=LLTK.io.read_pdb("my_protein.pdb")

# Compute SASA
sasa=LLTK.analysis.sasa(structure,probe=1.4)


In practice, this can get a little bit more complicated. Let's say that we have a molecule and want to compute SASA, but we only want to output SASA for a few atoms out of that entire molecule. Here, we will presume that, for our subset of atoms, they can still be potentially impacted by all atoms in the system. If we have a list of atom indices in our structure to designate the subset, we can modify our code to only output the contribution of that subset to total SASA.


# Define a subset of atoms (this particular subset is arbitrary)
subset=[0, 1, 2, 3]

# Compute SASA for subset
sasa=LLTK.analysis.sasa(structure,subset,probe=1.4)


This is particularly useful, say, if we had a protein and wanted to compute SASA for each amino acid in the protein. This information could directly tell us, for instance, which amino acids are on the surface and which are buried. We could also compute SASA for amino acids in the presence of some ligand and use the change in SASA between the bound and unbound states to tell us to which amino acids the ligand prefers to bind. This is precisely what we did in a previous work when we used SASA to probe whether ibuprofen polar or apolar atoms bind to Aβ peptides.

The are numerous other things we can do with SASA. For instance, SASA can be used to estimate the apolar contribution to the salvation free energy in Poisson-Boltzmann and Generalized Born implicit solvent models. Similarly, the change in SASA due to a ligand binding can be combined with atomic solvation parameters to estimate the binding free energy.