A biophysicist teaches himself how to code

Yeah, I'm procrastinating.

One of my favorite things to do is to try new ways to visually represent (in 2D) a complex three-dimensional protein structure. It’s an interesting challenge because it’s all too easy to end up with a protein backbone that looks like a pile of spaghetti from which limited useful information can be drawn.

For some time my go-to package was VMD. I especially appreciated the ambient occlusion lighting effects that have been incorporated into its Tachyon ray tracer, which can generate really stunning figures.

Ribosome rendered with VMD

Another “rendering mode” that I’m a big fan of is the illustration style of David Goodsell, most familiar from the PDB Molecule of the Month feature.

RNA polymerase by David Goodsell

Although these illustrations are often low in information content, they have a simplistic beauty that really appeals to me.

So recently I have been searching for a way to combine these two styles that I can further modify to fit my needs. VMD itself was recently updated to include “outline rendering” similar to the Goodsell illustration style, but unfortunately this requires graphics hardware that I don’t have on my main machine. I’m also not a fan of VMD from the manipulation standpoint. Atom selections, accurate rotations (e.g. exactly 180 degrees) and the like are complex operations in this software package.

I’ve gone back to using another wonderful visualization package, PyMOL. I find that it hits the sweet spot between easy setup of the scene I’d like and generating nice figures.

The specific feature that I’ve come to rely on quite heavily is the built-in ray tracer. There are three available ray tracing modes in addition to the default, each of which has its uses. Mode 1 will place a black outline around your structure, which can help make the secondary structure elements visually distinct. Mode 2 is really interesting, in that it only renders the outline. I find this especially helpful if I want to show something in an overlay without obscuring what is behind it. Mode 3 produces “quantized” color in addition to the outline, giving your figure a very cartoonish appearance. I find that this one has to be used with care 🙂

Anyhow, let’s make a few figures just for kicks. As usual I’ll be using my favorite protein structure, alpha hemolysin (PDB code 7AHL). You can load files directly from the PDB using PyMOL’s built-in PDB Loader plugin. For this demo I’m rendering all but one chain as a gray cartoon, and the remaining chain as a blue molecular surface. I also turn off specular reflections (Display -> Specular Reflections) because I don’t like them.
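For anyone who prefers typed commands to menus, the scene setup could look roughly like the sketch below, which writes a small PyMOL script to disk. This is an assumption-heavy illustration rather than the exact steps used for the figures: it uses the `fetch` command instead of the PDB Loader plugin, it picks chain A as the surface chain (the post does not say which chain was used), and the file name `setup_scene.pml` is made up.

```shell
#!/bin/sh
# Write a PyMOL script that approximates the scene described in the post.
# Assumptions (not from the post): chain A is the chain shown as a surface,
# "fetch" is used in place of the PDB Loader plugin, and "set specular, 0"
# stands in for the Display -> Specular Reflections menu toggle.
cat > setup_scene.pml <<'EOF'
fetch 7ahl
hide everything
show cartoon, not chain A
color gray, not chain A
show surface, chain A
color blue, chain A
set specular, 0
EOF
```

The resulting file can be opened from within PyMOL, or its lines can be typed one at a time on the PyMOL command line.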

Here are the commands I enter on the command line to generate the image:

bg_color white
set antialias, 2
ray 600, 600

This took about 5 minutes to render on my underpowered laptop. You write out the image with (e.g.):

png mode_0.png

This gives the following result:

Default ray tracing mode

Now let’s look at the other fun modes 🙂 Just enter set ray_trace_mode, 1 on the command line and repeat the ray tracing and png saving steps above. Iterate through the three modes and you end up with the following figures:

Ray tracing mode 1

Ray tracing mode 2

Ray tracing mode 3

You can see that each of these has a different look, which may or may not be useful depending on the figure you are trying to produce. I’m finding that it’s especially useful to do a couple of renders (e.g. one in mode 2 and another in mode 1) and combine them via a little bit of post-processing in the GIMP.
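Typing the same three commands for every mode gets tedious, so the loop over modes can be scripted. The sketch below only generates a PyMOL script containing the commands shown earlier (bg_color, set antialias, set ray_trace_mode, ray, png), once per mode. The function name `write_render_script` and the file name `render_modes.pml` are made up for illustration, and the scene is assumed to be set up already.

```shell
#!/bin/sh
# Emit the PyMOL commands from the post once for each ray tracing mode (0-3).
# Output file names follow the post's mode_0.png pattern.
write_render_script() {
    echo "bg_color white"
    echo "set antialias, 2"
    for mode in 0 1 2 3; do
        echo "set ray_trace_mode, $mode"
        echo "ray 600, 600"
        echo "png mode_$mode.png"
    done
}

write_render_script > render_modes.pml

# Assumed usage, not run here: load a saved session and the script without
# the GUI, e.g.  pymol -cq my_scene.pse render_modes.pml
# (my_scene.pse is a hypothetical session file).
```

At roughly five minutes per render on a slow laptop, the four images would take around twenty minutes, which is a good reason to let a script handle it unattended.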

I managed to find a paper in which some of the analysis I’ve been working on had been done. Unfortunately the raw results of the analysis were just that – raw. Specifically they had been dumped into a 6.8 MB text file as a supplement to the paper.

In order to extract the information I was interested in, and to prove to people who read this that I don’t solve all of my problems with Python, I thought I’d share the quick code I used.

First of all, I wanted all of the lines that reported proteins from humans. This turned out to be workable by running:

grep 'Homo sapiens' infile.txt > outfile.txt

This gave me a long list which helpfully had each line starting with the NCBI GI number for the protein of interest. To extract the GI numbers alone involved:

cut -c 1-11 outfile.txt > GI_list.txt

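then to trim the whitespace (writing to a new file, because redirecting onto the input file would truncate it before sed gets to read it):

sed 's/^[ \t]*//;s/[ \t]*$//' GI_list.txt > GI_list_trimmed.txt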

(this last one took some help from the handy sed one-liners page)
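The three steps can also be chained into a single pipeline, which avoids the intermediate files altogether. The following is a minimal sketch under a few assumptions: the function name `extract_human_gi` is made up, and `[[:space:]]` is swapped in for `[ \t]` because not every sed treats `\t` as a tab.

```shell
#!/bin/sh
# Print the trimmed GI numbers for all 'Homo sapiens' lines in the given file.
# Assumes, as in the post, that the GI number sits in the first 11 columns.
extract_human_gi() {
    grep 'Homo sapiens' "$1" \
        | cut -c 1-11 \
        | sed 's/^[[:space:]]*//;s/[[:space:]]*$//'
}

# Example usage (infile.txt is the supplement file from the post):
#   extract_human_gi infile.txt > GI_list.txt
```

Because the result is only written once, at the end of the pipeline, there is no risk of clobbering a file that is still being read.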

The entire process took about a quarter of the time I’ve just taken writing it up, and I now have a nicely formatted 11 KB file which I can use as input to my next round of tasks.

I’m giving a talk/interview later today, so I won’t have time to tinker much more with the PDBcleaner until tomorrow. I thought that instead I’d take this opportunity to write a post about this blog.

As I’m sure you can tell from the posts that are already here, I’m a novice programmer. I’ve always had an affinity for computers, and usually show an aptitude for getting the various scientific applications that we use in the lab to do what I want, but it was only recently that I realized the power of rolling your own code.

I also looked around and realized that the main issue in a lot of areas of science these days is not collecting data, it’s analyzing the huge amount of data that comes in to pull out useful information. The best way to do this is with some sort of program of course.

So I wrote a few very simple things which we were able to use to do some of this heavy lifting during my Ph.D. work. Now that I’ve graduated and am looking for a post-doc, I decided it was as good a time as any to “go to the programming gym” as it were, and make a concerted effort to get better at it. Hence this site. I thought about just setting up a github account or something like that, but I wanted a place where I could describe at length what I was trying to do, the problems I was having, and other thoughts along the way.

I’ve already realized that I’ve got a lot to learn. The process may be slow, but as long as progress continues I think I’ll be happy. I came across a thread on Reddit today that makes me feel better about the usefulness of this exercise.

Thanks for reading, and huge thanks in advance for any constructive criticism/advice.

Link to the source article

Obviously this blog is my attempt at stimulating #1 (Read), reporting on #2 (Write), and allowing for #3 (Review). I’m not sure if I’ve got the chops to do #4 (Contribute) just yet.
