Biostumblematic

A biophysicist teaches himself how to code

This is heavily based on the PDB cleaner script I wrote about in the previous post. I wanted to be able to work with the coordinates in PDB files, and discrepancies in how the files are formatted made that a pain. I decided to start things off by writing this script, which reads in the ATOM records and writes them back out as comma-separated columns.

In this case I’ve discarded the superfluous “END” line that I put in the other script, since it just gets in the way. Really this one is all about learning a bit more regarding regular expressions in Python.

Using a second re.sub() call to strip the comma from the end of each line feels clunky. That usually means I'm going about it the wrong way.
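One possibly tidier route, offered as a sketch rather than as what the script does: str.split() with no arguments splits on any run of whitespace and ignores leading and trailing whitespace, so a trailing comma never appears in the first place. The function names and the sample ATOM line here are made up for illustration, and the code assumes Python 3.

```python
import re

def atom_line_to_csv(line):
    """Collapse whitespace-separated fields into one comma-separated line."""
    # split() with no arguments splits on runs of whitespace and drops
    # leading/trailing whitespace, so there is no trailing comma to remove.
    return ','.join(line.split()) + '\n'

def atom_line_to_csv_regex(line):
    """Regex variant: strip trailing whitespace first, then substitute once."""
    return re.sub(r' +', ',', line.rstrip()) + '\n'

# Hypothetical ATOM record, padded with trailing spaces as PDB lines often are
sample = 'ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67           N  \n'
print(atom_line_to_csv(sample), end='')
```

Both helpers should give the same result as the two-step substitution for lines like the sample; the split/join version just avoids regular expressions entirely, which may or may not be the point if the exercise is to practice them.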

#!/usr/bin/env python3

# https://biostumblematic.wordpress.com

import re

print('--------')
print('This script is designed to pull all the ATOM records out of')
print('a PDB file and write them to a CSV file.')
print('--------')
atomrecords = []
dirtyfile = input('What is the filename for the PDB? >> ')
csvfile = input('What would you like to name your CSV file? >> ')
# Open the file and read all of its lines into a list
with open(dirtyfile, 'r') as inputfile:
    lines = inputfile.readlines()
# Keep only the ATOM records, converted to comma-separated fields
for line in lines:
    match = re.search(r'^ATOM', line)
    if match:
        # Turn each run of spaces into a single comma
        line = re.sub(r' +', ',', line)
        # This second command seems clunky.  Must be a better method
        line = re.sub(r',$', '', line)
        atomrecords.append(line)

# Write out the records
with open(csvfile, 'w') as outputfile:
    outputfile.writelines(atomrecords)
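A caveat on splitting by spaces: PDB is a fixed-column format, so neighbouring fields can touch with no space between them (large negative coordinates are a common case), and a whitespace split then fuses them into one column. A minimal sketch of slicing by position instead, assuming the standard ATOM column layout from the PDB format specification (serial in columns 7-11, atom name in 13-16, x/y/z in 31-54). The function name and the sample record are hypothetical, and the code assumes Python 3.

```python
def parse_atom_record(line):
    """Slice one ATOM line by column position (0-based Python slices)."""
    return {
        'serial': int(line[6:11]),        # columns 7-11
        'name': line[12:16].strip(),      # columns 13-16
        'resname': line[17:20].strip(),   # columns 18-20
        'chain': line[21].strip(),        # column 22
        'resseq': int(line[22:26]),       # columns 23-26
        'x': float(line[30:38]),          # columns 31-38
        'y': float(line[38:46]),          # columns 39-46
        'z': float(line[46:54]),          # columns 47-54
    }

# Hypothetical record whose three coordinates run together without spaces
sample = 'ATOM   1234  CA  LYS A 101    -100.123-200.456-300.789  1.00 25.00'
atom = parse_atom_record(sample)
print(atom['x'], atom['y'], atom['z'])
# A whitespace split leaves the three coordinates stuck in a single field
print(sample.split()[6])
```

Slicing costs a little more typing than a substitution, but it does not depend on where the spaces happen to fall in a given file.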