Skip to content

Biostumblematic

A biophysicist teaches himself how to code

Moving on to something a little more complicated than the previous two entries, I wanted to write something that would scan through a PDB file (or at least a file modified by the pdbcsvmaker) to look for contacts between individual protein chains.

As usual, when the programs I write get longer they get quirkier. Because PDB files don’t always have every column of data, we first have to calibrate where the stuff we want is. There is at least one caveat here, where I assume that the residue number is in the column immediate preceding the first coordinate. I’ll have to verify this with more PDBs to be sure that this is accurate.

Anyhow, the program then skips any close contacts between two chains and then pulls out any inter-subunit contacts within 5 angstroms. There is some duplication here, since it treats a contact from A–>B as distinct from the B–>A version. There should be a way to avoid this, I just have to think about how to handle it.

This also writes the distance of the first contact instance to the output file. I’m not sure how useful this is. Since we’re just scanning down the PDB file by atom this is not necessarily the closest distance or anything; it’s just the first one we came across. Incidentally this is why I can’t just use the distances to screen out duplicates at the moment either.

So at the moment it works, after a fashion. The performance is slower than it seems like it should be; I’m probably doing something inefficient here. Also you need to do a bit of post-processing to remove the duplicate contacts. I may restructure and rewrite bits of this later in order to improve these facets.

#!/usr/bin/env python
#############################
# https://biostumblematic.wordpress.com
#############################
import re, csv, string, sys
from math import sqrt

# Read in the csv
pdbcsv = raw_input(‘What is the name of your CSV file? >> ‘)
pdbcsvfile = open(pdbcsv, ‘rb’)
pdbcsvreader = csv.reader(pdbcsvfile)
lines = []

for row in pdbcsvreader:
lines.append(row)

# Calibrate where the data is that we want
for col in range(len(lines[0])):
print col,’:’,lines[0][col]

chaincol = input(‘What column listed above contains the chain identifier? >> ‘)
coordx = input(‘What column listed above contains the first coordinate for the atom? >> ‘)

# Define where to get the rest of the data
coordy = coordx + 1
coordz = coordx + 2

# Need to verify that the residue number is always the column left of coordx
rescol = coordx – 1

# Pull out the intersubunit contacts and save them to CSV
print ‘measuring close contacts and saving to contacts.csv’

linecounter = 0
contacts = []
rescontacts = []
while linecounter < len(lines): workline = lines.pop(0) for line in lines: workchain = workline[chaincol] workres = workline[rescol] thischain = line[chaincol] thisres = line[rescol] thiscontact = workchain+workres+','+thischain+thisres workcoordx = float(workline[coordx]) workcoordy = float(workline[coordy]) workcoordz = float(workline[coordz]) thiscoordx = float(line[coordx]) thiscoordy = float(line[coordy]) thiscoordz = float(line[coordz]) # Don't check contacts within a chain if thischain == workchain: pass else: dx = workcoordx-thiscoordx dy = workcoordy-thiscoordy dz = workcoordz-thiscoordz distance = sqrt(dx*dx + dy*dy + dz*dz) # Can alter this distance to get more/less contacts if distance < 5: if thiscontact in rescontacts: pass else: rescontacts.append(thiscontact) newcontact = thiscontact+','+str(distance) print newcontact contacts.append(newcontact) lines.append(workline) linecounter += 1 # Write out the list of contacts outputfile = open('contacts.csv', 'w') for contact in contacts: outputfile.write(contact+'\n') outputfile.close() [/sourcecode]

Advertisements

Tags: , , ,

%d bloggers like this: