Skip to content

Biostumblematic

A biophysicist teaches himself how to code

In continuing my slow migration away from “office-like” tools for working with my data, I’ve been taking a look lately at matplotlib. I’ve banged together a rough script to do some simple data plotting with a bit of flexibility:

#! /usr/bin/env python
# https://biostumblematic.wordpress.com

# An interface to matplotlib

# Import modules
import csv, sys
import matplotlib.pyplot as plt
import numpy as np

# Introduce the program
print '-'*60
print 'Your data should be in CSV format, with Y-values'
print 'in odd columns and X-values in even columns.'
print 'If your file contains a header row, these will be'
print 'automatically detected'
print '-'*60

# Open the data
datafile = sys.argv[1]
f = open(datafile, 'r')

# Check to see if the file starts with headers or data:
dialect = csv.Sniffer().has_header(f.read(1024))
f.seek(0)
reader = csv.reader(f)

# Assign the data to series via a dict
if dialect is True:
	reader.next() # Move down a line to skip headers
else:
	pass

series_dict = {}
for row in reader:
	i = 0
	for column in row:
		i += 1
		if series_dict.has_key(i):
			try:
				series_dict[i].append(float(column))
			except ValueError:
				pass
		else:
			series_dict[i] = [float(column)]
# Plot each data series
num_cols = len(series_dict)
i = 1 
while i < num_cols:
	plt.plot(series_dict&#91;i&#93;, series_dict&#91;i+1&#93;, 'o')
	i += 2 

# Get axis labels
xaxis_label = raw_input('X-axis label > ')
yaxis_label = raw_input('Y-axis label > ')

# Show the plot
plt.ylabel(yaxis_label)
plt.xlabel(xaxis_label)
plt.show()

# Enter loop for customizing appearance

# Stop
f.close()

As-is this will read in a CSV file of any number of columns and plot them as Y values/X values (alternating).

Some things that feel nasty:

  • Having to use the dictionaries to get the column data assembled. I feel like the CSV reader module should have a “transpose” function
  • The section near the end where I’m generating the different plots by iterating over the number of columns.

Some things that would be nice to implement, but I haven’t figured out yet:

  • More differentiation of the appearance for each series’ plot
  • Automatic generation of a legend using headers for the X-values from the initial file (or else requested from the user at run-time if not in the file)