<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>Biostumblematic</title>
	<atom:link href="http://biostumblematic.wordpress.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://biostumblematic.wordpress.com</link>
	<description>A biophysicist teaches himself how to code</description>
	<lastBuildDate>Thu, 26 May 2011 05:16:35 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='biostumblematic.wordpress.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://s2.wp.com/i/buttonw-com.png</url>
		<title>Biostumblematic</title>
		<link>http://biostumblematic.wordpress.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://biostumblematic.wordpress.com/osd.xml" title="Biostumblematic" />
	<atom:link rel='hub' href='http://biostumblematic.wordpress.com/?pushpress=hub'/>
		<item>
		<title>More on data management, with reference to some comments by John Wilbanks</title>
		<link>http://biostumblematic.wordpress.com/2010/02/11/more-on-data-management-with-reference-to-some-comments-by-john-wilbanks/</link>
		<comments>http://biostumblematic.wordpress.com/2010/02/11/more-on-data-management-with-reference-to-some-comments-by-john-wilbanks/#comments</comments>
		<pubDate>Thu, 11 Feb 2010 20:42:08 +0000</pubDate>
		<dc:creator>jwinget</dc:creator>
				<category><![CDATA[data management]]></category>
		<category><![CDATA[open science]]></category>

		<guid isPermaLink="false">http://biostumblematic.wordpress.com/?p=132</guid>
		<description><![CDATA[Although he blogs almost as rarely as I do, John Wilbanks (VP of Science Commons) tends to inspire me with many of the things he writes. Back at the end of 2009, he had a few posts on why the Open Source metaphor doesn&#8217;t work well when talking about science. While he&#8217;s speaking in this [...]]]></description>
			<content:encoded><![CDATA[<p>Although he blogs almost as rarely as I do, <a href="http://sciencecommons.org/about/whoweare/wilbanks/">John Wilbanks</a> (VP of Science Commons) tends to inspire me with many of the things he writes.</p>
<p>Back at the end of 2009, he had a few posts on why the Open Source metaphor doesn&#8217;t work well when talking about science. While he&#8217;s speaking in this case more generally about science as a whole, his comments reflect directly on my <a href="http://biostumblematic.wordpress.com/2010/02/09/data-management-the-key-to-open-science/">post from yesterday</a> on data management. I wanted to summarize a few of his key points and my thoughts on them.</p>
<p>Before I do so, however, I&#8217;ll put in another plug for the <a href="http://sciencecommons.org/events/salon/">Science Commons Symposium</a>, taking place on February 20th in Seattle. John Wilbanks will be there, along with a host of other strong voices interested in knowledge sharing. It should be a great event. If you can&#8217;t make it in person, it will be streamed live at <a href="http://chris.pirillo.com/live">http://chris.pirillo.com/live</a>.</p>
<p>If you&#8217;re interested in reading his posts in their entirety, you can find them in parts <a href="http://scienceblogs.com/commonknowledge/2009/10/open_source_science_or_distrib.php">1</a>, <a href="http://scienceblogs.com/commonknowledge/2009/11/distributed_science_part_2.php">2</a>, &amp; <a href="http://scienceblogs.com/commonknowledge/2009/12/in_which_we_continue_to_push_t.php">3</a>. In order to stick to a more continuous story, I&#8217;ll just be pulling quotes at random out of all three of John&#8217;s posts.</p>
<p>Several of the comments here yesterday pointed out some specific LIMS projects that have been started. I can see why (given how tightly I focus on a LIMS at the end of my post) people would latch onto this idea, but what I really had in mind was something more like the following:</p>
<blockquote><p>We need the biological equivalent of the C compiler, of Emacs [...] These tools need to be democratized to bring the beginning of distributed knowledge creation into labs, with the efficiencies we know from eBay and Amazon</p></blockquote>
<p>Because of the complex and variable nature of &#8220;DATA&#8221; being generated in science labs, I think making one LIMS to rule them all would be nearly impossible. What I&#8217;d rather see are some tools that are accessible to the average bench scientist which can be easily modified and expanded upon by the technically gifted scientist. These tools would (if they are to be truly useful) automate some annotation/tagging/parsing of the data as a precursor to deposition in shared repositories such as:</p>
<blockquote><p>[<a href="http://openwetware.org/wiki/Main_Page">OpenWetWare</a> and the <a href="http://partsregistry.org/Main_Page">Registry of Standard Biological Parts</a>] are resources and toolchains that absolutely support distribution of capability and increase capacity, which are fundamental to early-stage distributed innovation.</p></blockquote>
<p>Above the meat-space layer where the science is actually being done and data is being collected, we need decentralized places to store and share the &#8220;functional information units&#8221; &#8211; i.e. the data that other scientists can use. Unfortunately:</p>
<blockquote><p>science is like writing code in the 1950s &#8211; if you didn&#8217;t work at a research institution then, you probably couldn&#8217;t write code, and if you did, you were stuck with punch cards. Science is in the punch cards stage, and punch cards aren&#8217;t so easy to turn into GNU/Linux.</p></blockquote>
<p>I think John stretches the metaphor a bit here, but I see where he is going. The punch card above has more to do with the controlling influence of the institution than it has to do with the day-to-day practice of science. The key point is that there are interests who will put up a resistance to a more free distribution of scientific knowledge, for a variety of reasons.</p>
<p>He goes on to summarize his argument:</p>
<blockquote><p>I propose that the point of this isn&#8217;t to replicate &#8220;open source&#8221; as we know it in software. The point is to create the essential foundations for distributed science so that it can emerge in a form that is locally relevant and globally impactful</p></blockquote>
<p>and</p>
<blockquote><p>it&#8217;s not something that&#8217;s enabled by an open source license, a code version repository, and other hallmarks of open source software. It&#8217;s users saying, &#8220;screw this, I can do better&#8221; &#8211; and doing it. It&#8217;s users who know the problem best and design the best solutions.</p></blockquote>
<p>I couldn&#8217;t agree more, and I think this is what we&#8217;re seeing from the blog posts and conversations that are taking place. There is a subset of people who are doing science, or who are avidly interested in aiding the practice of science, who feel like they can do better than the current system. These people (probably most people reading this blog, especially if you&#8217;ve gotten this far) are the ones who have to effect change. It will take more than writing and talking about it, although these are important as well. I&#8217;d also like to see a nascent, community-driven project which we can point to and say &#8220;it will be like this, but better&#8221;.</p>
<p>One final word from John:</p>
<blockquote><p>Data and databases are another place where the underlying property regimes don&#8217;t work as well for open source as in software. But that&#8217;s difficult enough to merit its own post. Suffice to say if Open Data had a facebook page, its relationship status with the law would be &#8220;It&#8217;s Complicated.&#8221;</p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://biostumblematic.wordpress.com/2010/02/11/more-on-data-management-with-reference-to-some-comments-by-john-wilbanks/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/cbedce553a7e7fda3955209db5a84858?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Jod</media:title>
		</media:content>
	</item>
		<item>
		<title>Data management: the key to open science?</title>
		<link>http://biostumblematic.wordpress.com/2010/02/09/data-management-the-key-to-open-science/</link>
		<comments>http://biostumblematic.wordpress.com/2010/02/09/data-management-the-key-to-open-science/#comments</comments>
		<pubDate>Tue, 09 Feb 2010 15:09:48 +0000</pubDate>
		<dc:creator>jwinget</dc:creator>
				<category><![CDATA[data management]]></category>
		<category><![CDATA[open science]]></category>

		<guid isPermaLink="false">http://biostumblematic.wordpress.com/?p=129</guid>
		<description><![CDATA[I&#8217;ve been thinking a bit more about open science lately, given the outside chance that I&#8217;ll be able to attend the upcoming Science Commons symposium in Seattle. It&#8217;s a topic that I&#8217;ve unfortunately pushed to the back burner a bit while I&#8217;ve been getting settled in my post-doc. Again I&#8217;ve been trying to decide what [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been thinking a bit more about open science lately, given the outside chance that I&#8217;ll be able to attend the upcoming <a href="http://sciencecommons.org/events/salon/">Science Commons symposium in Seattle</a>. It&#8217;s a topic that I&#8217;ve unfortunately pushed to the back burner a bit while I&#8217;ve been getting settled in my post-doc.</p>
<p>Again I&#8217;ve been trying to decide what I think is the key issue for developing a culture of sharing scientific data. At the moment I feel like the main problem is data management. What I mean here is that labs have a hard time keeping track of their data <em>internally</em>, let alone &#8220;preparing&#8221; it for broader release.</p>
<p>For example, in my lab we are generating a relatively small amount of DATA (easily quantifiable files, like results of instrument runs), on the order of 1 GB/month. Even though this is probably about average for a science lab, it&#8217;s surprisingly difficult to keep organized and readily accessible. This is because it&#8217;s being produced by several largely independent students on distinct projects. In addition, the tools we have for analyzing this data are clunky, prone to crashes, and using them is an exercise in caveats and &#8220;magic numbers&#8221;. Combining and parsing data across multiple experiments is a major operation.</p>
<p>I&#8217;d like to point out a couple of key points here. Firstly, this is actually a <em>better</em> situation than other labs I&#8217;ve been in. At least here there are some common repositories, in the form of a few spreadsheets saved on common-use computers, from which one can find pointers to the raw data files. Secondly, I think this example illuminates the type of ad-hoc system in place for many academic labs. I think there is a desire in many cases to implement a better system, but not really the drive, dedication, and resources that are required to implement one with the tools that are available.</p>
<p>Perhaps we can take a lesson from industry, where data management has financial and legal ramifications. Although my experience in this environment is somewhat limited, I believe that the difference is largely a matter of resources. Industrial labs might have access to a Technical Information Manager on staff and/or use a Laboratory Information Management System (LIMS). Why has neither of these taken hold in academia?</p>
<p>One issue is the separation between IT and scientists in many departments. Often the IT department is lightly staffed, and spends a large portion of its time doing desktop support for individual users (cleaning viruses, updating software, etc.). When possible, they may be able to implement some larger projects like deploying a server, managing a common datastore, or things of this nature. The key is that almost all of these activities are more or less completely decoupled from the actual science. They are IT issues, and are handled by the IT folks. Meanwhile, the professors (or more often their students) are generating and analyzing data on the infrastructure that IT has provided. Again, this is decoupled from IT. They use the computers, and when the computers break they call IT. The issue here is that there is no guidance on good practices in data management. It&#8217;s an area that falls between the cracks, and is often only addressed as an afterthought or following a major computer failure. Individual professors don&#8217;t have the resources (or workload) to hire a full-time technical information manager to fill this gap, and this isn&#8217;t a position that I&#8217;ve ever seen at a departmental level in academia.</p>
<p>The other option is to use a software system which can automate the data management. The term for this software, &#8220;LIMS&#8221;, has been tarnished by an abundance of clunky, overpriced, closed-source products developed at fly-by-night software houses. I&#8217;m sure not all LIMS producers fall under this umbrella, but an unfortunate number do. So what would a good LIMS look like? I think there are just a few simple criteria:</p>
<ul>
<li>It has to be <strong>simple &amp; flexible</strong>. Getting your data into the LIMS needs to be <em>easier</em> than <em>not</em> doing it. Students are incredibly busy, and will resist anything that involves extra work.</li>
<li>It has to be <strong>open source</strong>, to leverage the power of the community. No development team can anticipate the needs of every lab (or even department), so an easily-extensible core with freely available code is the only way to encourage widespread adoption and contribution.</li>
<li>It has to be <strong>trustworthy</strong>. The data store has to be rock-solid, and backups need to be bulletproof. This data is the highly valuable output of labs, and no one will touch a system that has a whiff of instability.</li>
</ul>
<p>I think these can all be accomplished. Many open-source projects have already found acceptance, such as the <a href="http://www.open-bio.org/wiki/Main_Page">Open Bioinformatics</a> member projects, <a href="http://sourceforge.net/scm/?type=svn&amp;group_id=4546">PyMol</a>, and many others. One key will be developing a package that can be deployed on existing hardware (i.e. as close to a standard LAMP stack as possible), to ease the burden on the IT people who will need to do the on-site support. A web-based tool will also help with ease of use: if a student can include their data from their own laptop at the coffee shop, it&#8217;s a lot more likely to happen than if they need to fight for time on a certain cluttered common-use machine in the lab.</p>
<p>This type of tool would aid in the larger studies that many open science proponents are interested in. How great would it be if you wanted to do a meta-study from the published results of several labs, and all it took to have the data in a consistent format was a simple MySQL statement (or, if the software is coded properly, a couple of button clicks)? What if when you were reviewing a paper for publication you could quickly get all of the source data, again in a format that is immediately accessible and able to be parsed? What if, as a professor, all the data collected by your summer undergraduate from 4 years back was available with a few clicks? It&#8217;s possible. It will take a bit of work by a few intelligent people, but the payoff would be worth it many times over.</p>
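<p>To make the &#8220;simple MySQL statement&#8221; idea a bit more concrete, here is a minimal sketch. It uses the sqlite3 module from the Python standard library in place of MySQL so that it runs without a server, and the table layout, lab names, and values are all invented for illustration. A real shared repository would need a schema that the participating labs actually agree on.</p>

```python
import sqlite3

# Hypothetical schema: one row per measurement, tagged with the depositing lab.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE measurements (lab TEXT, protein TEXT, assay TEXT, value REAL)"
)
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?, ?, ?)",
    [
        ("lab_a", "protein_x", "binding", 1.0),
        ("lab_b", "protein_x", "binding", 3.0),
        ("lab_b", "protein_y", "binding", 5.0),
    ],
)

# The "simple statement": pool one assay across every contributing lab.
query = """
    SELECT protein, COUNT(DISTINCT lab), AVG(value)
    FROM measurements
    WHERE assay = ?
    GROUP BY protein
    ORDER BY protein
"""
summary = conn.execute(query, ("binding",)).fetchall()
for protein, lab_count, mean_value in summary:
    print(protein, lab_count, mean_value)
conn.close()
```

<p>The query itself is the easy part. It only works because every lab in this toy example stored its results in the same columns, which is exactly what a shared data management tool would have to enforce.</p>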
]]></content:encoded>
			<wfw:commentRss>http://biostumblematic.wordpress.com/2010/02/09/data-management-the-key-to-open-science/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/cbedce553a7e7fda3955209db5a84858?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Jod</media:title>
		</media:content>
	</item>
		<item>
		<title>A couple of short scripts for dealing with MASCOT results</title>
		<link>http://biostumblematic.wordpress.com/2010/01/05/a-couple-of-short-scripts-for-dealing-with-mascot-results/</link>
		<comments>http://biostumblematic.wordpress.com/2010/01/05/a-couple-of-short-scripts-for-dealing-with-mascot-results/#comments</comments>
		<pubDate>Tue, 05 Jan 2010 18:49:03 +0000</pubDate>
		<dc:creator>jwinget</dc:creator>
				<category><![CDATA[Data parsing]]></category>
		<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://biostumblematic.wordpress.com/?p=123</guid>
		<description><![CDATA[I&#8217;ve started doing some Mass Spec, and one of the issues we have is parsing our results. The basic workflow is to convert the raw instrument files into peak lists, then use MASCOT to identify the proteins present in the sample. Unfortunately the MASCOT results themselves can be a bit tedious to work with. I&#8217;ve [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve started doing some Mass Spec, and one of the issues we have is parsing our results.</p>
<p>The basic workflow is to convert the raw instrument files into peak lists, then use MASCOT to identify the proteins present in the sample. Unfortunately the MASCOT results themselves can be a bit tedious to work with.</p>
<p>I&#8217;ve rapidly written a couple of scripts to speed up some common things that I need to do, namely to subtract the proteins in a control experiment from those identified in a sample, and to compare two lists of hits.</p>
<p>Here they are for your enjoyment. I&#8217;m trying to get back to a better organization of my scripts, so these are once again available on <a href="http://github.com/jwinget/Biochem-Scripts/tree/master/MassSpec">GitHub</a> as well.<br />
<span id="more-123"></span><br />
Control_subtractor.py<br />
<pre class="brush: python;">
#! /usr/bin/env python

import sys, csv

def help():
	print '='*20
	print 'Subtracts MASCOT hits of control MS from sample.'
	print '='*20
	print 'To use:'
	print '-'*10
	print 'Export your data in CSV format from MASCOT'
	print 'Invoke the program, followed by the two file names.'
	print 'The file with your control data should be first'
	print '-'*10
	print 'e.g.: Control_subtractor control.csv sample.csv'
	print '-'*10
	print 'This will print the list to the console. If you'
	print 'would like to save the list, redirect it to a new file'
	print '-'*10
	print 'e.g.: Control_subtractor c.csv s.csv &gt; hits.txt'
	print '-'*10
	return

def subtractor():
	control_file = open(sys.argv[1])
	sample_file = open(sys.argv[2])
	
	control_reader = csv.reader(control_file)

	control_hits = []
	
	i = 0
	for row in control_reader:
		i += 1

		#Skip the header block: rows 1-64 of the export
		if i &lt; 65:
			pass
		elif row[1] == '':
			pass
		elif row[1] in control_hits:
			pass
		else:
			control_hits.append(row[1])
	control_file.close()

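	# Debug output goes to stderr so that redirected stdout holds only the hits
	sys.stderr.write(repr(control_hits) + '\n')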

	sample_reader = csv.reader(sample_file)
	
	sample_hits = []

	i = 0
	for row in sample_reader:
		i += 1
		if i &lt; 65:
			pass
		elif row[1] == '':
			pass
		elif row[1] in control_hits:
			pass
		else:
			sample_hits.append(row[1])
	sample_file.close()

	for hit in sample_hits:
		print hit
	return

# Show the help text when asked, or when the two file names are missing
if len(sys.argv) != 3 or sys.argv[1] in ('-h', '--help'):
	help()
else:
	subtractor()
</pre></p>
<p>Hit_list_compare.py<br />
<pre class="brush: python;">
#!/usr/bin/env python

import sys, string

def help():
	print '='*20
	print 'Compares two lists of hits'
	print '='*20
	print 'To use:'
	print '-'*10
	print 'Generate two lists of IPI identifiers'
	print 'Files should have one identifier per line'
	print 'This can be the output of Control_subtractor'
	print 'Invoke the program, followed by the two file names.'
	print '-'*10
	print 'e.g.: ./Hit_list_compare.py list1.txt list2.txt'
	print '-'*10
	print 'This will print the comparison to the console. If you'
	print 'would like to save the comparison, redirect it to a new file'
	print '-'*10
	print 'e.g.: ./Hit_list_compare.py 1.txt 2.txt &gt; compare.txt'
	print '-'*10
	return

def compare():
	list1 = open(sys.argv[1], 'r')
	list2 = open(sys.argv[2], 'r')

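	# rstrip() strips a set of characters, not a suffix, so drop '.txt' explicitly
	list1_name = sys.argv[1][:-4] if sys.argv[1].endswith('.txt') else sys.argv[1]
	list2_name = sys.argv[2][:-4] if sys.argv[2].endswith('.txt') else sys.argv[2]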

	list1_list = []
	list2_list = []

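	# Strip line endings so a missing final newline cannot hide a match
	for line in list1:
		list1_list.append(line.strip())
	for line in list2:
		list2_list.append(line.strip())
	list1.close()
	list2.close()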

	matches = []
	list1_uniques = []
	list2_uniques = []

	for item in list1_list:
		if item in list2_list:
			matches.append(item)
		else:
			list1_uniques.append(item)
	
	for item in list2_list:
		if item in list1_list:
			pass
		else:
			list2_uniques.append(item)

	print '='*20
	print 'MATCHES'
	print '='*20
	for item in matches:
		print item
	print '='*20
	print list1_name+' UNIQUES'
	print '='*20
	for item in list1_uniques:
		print item
	print '='*20
	print list2_name+' UNIQUES'
	print '='*20
	for item in list2_uniques:
		print item

# Show the help text when asked, or when the two file names are missing
if len(sys.argv) != 3 or sys.argv[1] in ('-h', '--help'):
	help()
else:
	compare()
</pre></p>
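<p>Both scripts come down to set operations on lists of accessions. The following is a minimal Python 3 sketch of the same idea, not a replacement for the scripts above. The function names are hypothetical, the CSV snippets are made up, and it assumes (as the originals do) that the accession sits in the second column after a fixed-length header.</p>

```python
import csv
import io
from itertools import islice


def read_hits(handle, header_rows=64, column=1):
    """Collect unique, non-empty values from one CSV column, keeping file order."""
    hits = []
    seen = set()
    for row in islice(csv.reader(handle), header_rows, None):
        cell = row[column:column + 1]  # empty list when the row is too short
        if not cell or not cell[0] or cell[0] in seen:
            continue
        seen.add(cell[0])
        hits.append(cell[0])
    return hits


def subtract_control(control_hits, sample_hits):
    """Sample hits that do not appear in the control, in sample order."""
    control = set(control_hits)
    return [hit for hit in sample_hits if hit not in control]


def compare_hits(first, second):
    """Return (shared, only_in_first, only_in_second) as sorted lists."""
    a, b = set(first), set(second)
    return sorted(a.intersection(b)), sorted(a.difference(b)), sorted(b.difference(a))


# Made-up two-row header and accessions, purely for illustration.
control_csv = "header\nheader\n1,IPI001\n2,IPI002\n"
sample_csv = "header\nheader\n1,IPI002\n2,IPI003\n3,IPI003\n4,\n"
control = read_hits(io.StringIO(control_csv), header_rows=2)
sample = read_hits(io.StringIO(sample_csv), header_rows=2)
print(subtract_control(control, sample))
print(compare_hits(control, sample))
```

<p>Reading from any file-like object keeps the parsing easy to test. With real exports you would pass an open file handle and leave the header length at its default instead of the two-row toy header used here.</p>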
]]></content:encoded>
			<wfw:commentRss>http://biostumblematic.wordpress.com/2010/01/05/a-couple-of-short-scripts-for-dealing-with-mascot-results/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/cbedce553a7e7fda3955209db5a84858?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Jod</media:title>
		</media:content>
	</item>
		<item>
		<title>Manuscript enthusiasm</title>
		<link>http://biostumblematic.wordpress.com/2009/12/05/manuscript-enthusiasm/</link>
		<comments>http://biostumblematic.wordpress.com/2009/12/05/manuscript-enthusiasm/#comments</comments>
		<pubDate>Sat, 05 Dec 2009 08:20:12 +0000</pubDate>
		<dc:creator>jwinget</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://biostumblematic.wordpress.com/2009/12/05/manuscript-enthusiasm/</guid>
		<description><![CDATA[]]></description>
			<content:encoded><![CDATA[<div id="attachment_121" class="wp-caption aligncenter" style="width: 310px"><a href="http://biostumblematic.files.wordpress.com/2009/12/so_sick_of_edits.png"><img src="http://biostumblematic.files.wordpress.com/2009/12/so_sick_of_edits.png?w=300&#038;h=160" alt="" title="so_sick_of_edits" width="300" height="160" class="size-medium wp-image-121" /></a><p class="wp-caption-text">Yeah, I'm procrastinating.</p></div>
]]></content:encoded>
			<wfw:commentRss>http://biostumblematic.wordpress.com/2009/12/05/manuscript-enthusiasm/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/cbedce553a7e7fda3955209db5a84858?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Jod</media:title>
		</media:content>

		<media:content url="http://biostumblematic.files.wordpress.com/2009/12/so_sick_of_edits.png?w=300" medium="image">
			<media:title type="html">so_sick_of_edits</media:title>
		</media:content>
	</item>
		<item>
		<title>Rendering proteins in PyMol</title>
		<link>http://biostumblematic.wordpress.com/2009/12/02/rendering-proteins-in-pymol/</link>
		<comments>http://biostumblematic.wordpress.com/2009/12/02/rendering-proteins-in-pymol/#comments</comments>
		<pubDate>Wed, 02 Dec 2009 18:05:21 +0000</pubDate>
		<dc:creator>jwinget</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://biostumblematic.wordpress.com/?p=113</guid>
		<description><![CDATA[One of my favorite things to do is to try new ways to visually represent (in 2D) a complex three-dimensional protein structure. It&#8217;s an interesting challenge because it&#8217;s all too easy to end up with a protein backbone that looks like a pile of spaghetti from which limited useful information can be drawn. For some [...]]]></description>
			<content:encoded><![CDATA[<p>One of my favorite things to do is to try new ways to visually represent (in 2D) a complex three-dimensional protein structure. It&#8217;s an interesting challenge because it&#8217;s all too easy to end up with a protein backbone that looks like a pile of spaghetti from which limited useful information can be drawn.</p>
<p>For some time my go-to package was <a href="http://www.ks.uiuc.edu/Research/vmd/">VMD</a>. Specifically, I really appreciated the ambient occlusion lighting that has been incorporated into its Tachyon ray tracer, which can generate really stunning figures.<br />
<div class="wp-caption aligncenter" style="width: 594px"><img alt="" src="http://www.ks.uiuc.edu/Gallery/Science/Structure/ribosome_ao_small_st.jpg" title="Ribosome rendered with VMD" width="584" height="522" /><p class="wp-caption-text">Ribosome rendered with VMD</p></div></p>
<p>Another &#8220;rendering mode&#8221; that I&#8217;m a big fan of is the illustration style of David Goodsell, most familiar from the PDB <a href="http://www.rcsb.org/pdb/motm.do">Molecule of the Month feature</a>.<br />
<div class="wp-caption aligncenter" style="width: 358px"><img alt="" src="http://www.rcsb.org/pdb/education_discussion/molecule_of_the_month/images/1i6h-composite.gif" title="RNA polymerase by David Goodsell" width="348" height="512" /><p class="wp-caption-text">RNA polymerase by David Goodsell</p></div><br />
Although these illustrations are often low in information content, they have a simple beauty that really appeals to me.</p>
<p>Recently I have been searching for a way to combine these two styles that I can further modify to fit my needs. VMD itself was recently updated to include &#8220;outline rendering&#8221; in a manner similar to the Goodsell illustration style, but unfortunately this requires graphics hardware that I don&#8217;t have on my main machine. I&#8217;m also not a fan of VMD from the manipulation standpoint: atom selections, accurate rotations (e.g. <i>exactly</i> 180 degrees) and the like are cumbersome in this package.</p>
<p>I&#8217;ve gone back to using another wonderful visualization package, <a href="http://pymol.org/">PyMol</a>. I find that it hits the sweet spot between easy setup of the scene I&#8217;d like and generating nice figures.</p>
<p>The specific feature that I&#8217;ve come to rely on quite heavily is the <a href="http://pymolwiki.org/index.php/Ray">built-in ray tracer</a>. There are three available ray tracing modes in addition to the default, each of which has its uses. Mode 1 will place a black outline around your structure, which can help make the secondary structure elements visually distinct. Mode 2 is really interesting, in that it <i>only</i> renders the outline. I find this especially helpful if I want to show something in an overlay without obscuring what is behind it. Mode 3 produces &#8220;quantized&#8221; color in addition to the outline, giving your figure a very cartoonish appearance. I find that this one has to be used with care <img src='http://s0.wp.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>Anyhow, let&#8217;s make a few figures just for kicks. As usual I&#8217;ll be using my favorite protein structure, alpha hemolysin (PDB code 7AHL). You can load files directly from the PDB using PyMol&#8217;s built-in PDB Loader plugin. For this demo I&#8217;m rendering all but one chain as a gray cartoon, and rendering the last as a blue molecular surface. I also turn off specular reflections (Display -&gt; Specular Reflections) because I don&#8217;t like them.</p>
<p>Here are the commands I enter on the command line to generate the image:<br />
<code><br />
bg_color white<br />
set antialias, 2<br />
ray 600, 600<br />
</code><br />
This took about 5 minutes to render on my underpowered laptop. You write out the image with (e.g.):<br />
<code><br />
png mode_0.png<br />
</code><br />
This gives the following result:<br />
<div id="attachment_115" class="wp-caption aligncenter" style="width: 310px"><a href="http://biostumblematic.files.wordpress.com/2009/12/mode_0.png" rel="lightbox"><img src="http://biostumblematic.files.wordpress.com/2009/12/mode_0.png?w=300&#038;h=300" alt="" title="mode_0" width="300" height="300" class="size-medium wp-image-115" /></a><p class="wp-caption-text">Default ray tracing mode</p></div></p>
<p>Now let&#8217;s look at the other fun modes <img src='http://s0.wp.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  Just enter <code>set ray_trace_mode, 1</code> on the command line and repeat the ray tracing and PNG saving steps above. Iterate through the three modes and you end up with the following figures (click for larger versions):<br />
<div id="attachment_114" class="wp-caption aligncenter" style="width: 310px"><a href="http://biostumblematic.files.wordpress.com/2009/12/mode_1.png" rel="lightbox"><img src="http://biostumblematic.files.wordpress.com/2009/12/mode_1.png?w=300&#038;h=300" alt="" title="mode_1" width="300" height="300" class="size-medium wp-image-114" /></a><p class="wp-caption-text">Ray tracing mode 1</p></div><br />
<div id="attachment_116" class="wp-caption aligncenter" style="width: 310px"><a href="http://biostumblematic.files.wordpress.com/2009/12/mode_2.png" rel="lightbox"><img src="http://biostumblematic.files.wordpress.com/2009/12/mode_2.png?w=300&#038;h=300" alt="" title="mode_2" width="300" height="300" class="size-medium wp-image-116" /></a><p class="wp-caption-text">Ray tracing mode 2</p></div><br />
<div id="attachment_118" class="wp-caption aligncenter" style="width: 310px"><a href="http://biostumblematic.files.wordpress.com/2009/12/mode_3.png" rel="lightbox"><img src="http://biostumblematic.files.wordpress.com/2009/12/mode_3.png?w=300&#038;h=300" alt="" title="mode_3" width="300" height="300" class="size-medium wp-image-118" /></a><p class="wp-caption-text">Ray tracing mode 3</p></div><br />
You can see that each of these has a different look, which may or may not be useful depending on the figure you are trying to produce. I&#8217;m finding that it&#8217;s especially useful to do a couple of renders (e.g. one in mode 2 and another in mode 1) and combine them via a little bit of post-processing in the GIMP.</p>
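<p>Repeating the set/ray/png steps by hand for every mode gets tedious, so here is a minimal sketch (Python 3) that writes the same commands out as a PyMol script, one render per mode. The function name <code>ray_mode_script</code> and the <code>mode_N.png</code> file names are made up for illustration; the PyMol commands themselves are the ones used above. It assumes the scene (representations, colors, view) has already been set up.</p>

```python
# Sketch: build a PyMol command script that renders one PNG per ray tracing mode.
# The function name, file names and image size are illustrative assumptions.


def ray_mode_script(modes=(0, 1, 2, 3), width=600, height=600):
    """Return PyMol commands that render and save one image per mode."""
    lines = ["bg_color white", "set antialias, 2"]
    for mode in modes:
        lines.append("set ray_trace_mode, %d" % mode)
        lines.append("ray %d, %d" % (width, height))
        lines.append("png mode_%d.png" % mode)
    return "\n".join(lines)


if __name__ == "__main__":
    # Save the printed text to a file (e.g. render_modes.pml) and run it in PyMol.
    print(ray_mode_script())
```

<p>The output is plain text, so it can be checked by eye before PyMol spends several minutes on each render.</p>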
]]></content:encoded>
			<wfw:commentRss>http://biostumblematic.wordpress.com/2009/12/02/rendering-proteins-in-pymol/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/cbedce553a7e7fda3955209db5a84858?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Jod</media:title>
		</media:content>

		<media:content url="http://www.ks.uiuc.edu/Gallery/Science/Structure/ribosome_ao_small_st.jpg" medium="image">
			<media:title type="html">Ribosome rendered with VMD</media:title>
		</media:content>

		<media:content url="http://www.rcsb.org/pdb/education_discussion/molecule_of_the_month/images/1i6h-composite.gif" medium="image">
			<media:title type="html">RNA polymerase by David Goodsell</media:title>
		</media:content>

		<media:content url="http://biostumblematic.files.wordpress.com/2009/12/mode_0.png?w=300" medium="image">
			<media:title type="html">mode_0</media:title>
		</media:content>

		<media:content url="http://biostumblematic.files.wordpress.com/2009/12/mode_1.png?w=300" medium="image">
			<media:title type="html">mode_1</media:title>
		</media:content>

		<media:content url="http://biostumblematic.files.wordpress.com/2009/12/mode_2.png?w=300" medium="image">
			<media:title type="html">mode_2</media:title>
		</media:content>

		<media:content url="http://biostumblematic.files.wordpress.com/2009/12/mode_3.png?w=300" medium="image">
			<media:title type="html">mode_3</media:title>
		</media:content>
	</item>
		<item>
		<title>Plotting data with matplotlib</title>
		<link>http://biostumblematic.wordpress.com/2009/08/28/plotting-data-with-matplotlib/</link>
		<comments>http://biostumblematic.wordpress.com/2009/08/28/plotting-data-with-matplotlib/#comments</comments>
		<pubDate>Fri, 28 Aug 2009 19:53:55 +0000</pubDate>
		<dc:creator>jwinget</dc:creator>
				<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://biostumblematic.wordpress.com/?p=103</guid>
		<description><![CDATA[In continuing my slow migration away from &#8220;office-like&#8221; tools for working with my data, I&#8217;ve been taking a look lately at matplotlib. I&#8217;ve banged together a rough script to do some simple data plotting with a bit of flexibility: As-is this will read in a CSV file of any number of columns and plot them [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=biostumblematic.wordpress.com&amp;blog=6773967&amp;post=103&amp;subd=biostumblematic&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>In continuing my slow migration away from &#8220;office-like&#8221; tools for working with my data, I&#8217;ve been taking a look lately at <a href="http://matplotlib.sourceforge.net/">matplotlib</a>.  I&#8217;ve banged together a rough script to do some simple data plotting with a bit of flexibility:<br />
<pre class="brush: python;">
#! /usr/bin/env python
# http://biostumblematic.wordpress.com

# An interface to matplotlib

# Import modules
import csv, sys
import matplotlib.pyplot as plt
import numpy as np

# Introduce the program
print '-'*60
print 'Your data should be in CSV format, with Y-values'
print 'in odd columns and X-values in even columns.'
print 'If your file contains a header row, these will be'
print 'automatically detected'
print '-'*60

# Open the data
datafile = sys.argv[1]
f = open(datafile, 'r')

# Check to see if the file starts with headers or data:
has_header = csv.Sniffer().has_header(f.read(1024))
f.seek(0)
reader = csv.reader(f)

# Skip the header row, if there is one
if has_header:
	next(reader)

# Assign the data to series via a dict
series_dict = {}
for row in reader:
	# Columns are numbered from 1
	for i, column in enumerate(row, 1):
		try:
			value = float(column)
		except ValueError:
			continue # Skip cells that are not numbers
		series_dict.setdefault(i, []).append(value)
# Plot each data series (Y-values in odd columns, X-values in even columns)
num_cols = len(series_dict)
i = 1
while i &lt; num_cols:
	plt.plot(series_dict[i+1], series_dict[i], 'o')
	i += 2

# Get axis labels
xaxis_label = raw_input('X-axis label &gt; ')
yaxis_label = raw_input('Y-axis label &gt; ')

# Show the plot
plt.ylabel(yaxis_label)
plt.xlabel(xaxis_label)
plt.show()

# Enter loop for customizing appearance

# Stop
f.close()
</pre></p>
<p>As-is this will read in a CSV file of any number of columns and plot them as Y values/X values (alternating).</p>
<p>Some things that feel nasty:</p>
<ul>
<li>Having to use the dictionaries to get the column data assembled.  I feel like the CSV reader module should have a &#8220;transpose&#8221; function</li>
<li>The section near the end where I&#8217;m generating the different plots by iterating over the number of columns.</li>
</ul>
<p>Some things that would be nice to implement, but I haven&#8217;t figured out yet:</p>
<ul>
<li>More differentiation of the appearance for each series&#8217; plot</li>
<li>Automatic generation of a legend using headers for the X-values from the initial file (or else requested from the user at run-time if not in the file)</li>
</ul>
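<p>On the wish for a &#8220;transpose&#8221; function: Python&#8217;s built-in <code>zip</code> can turn a list of rows into columns, which would replace the dictionary bookkeeping. The following is a minimal sketch in Python 3, not a drop-in patch for the script above; <code>read_columns</code> is a hypothetical helper name, and it assumes every data cell is numeric.</p>

```python
import csv
import io


def read_columns(handle, skip_header=False):
    """Read CSV rows from a file-like object and return a list of float columns."""
    reader = csv.reader(handle)
    if skip_header:
        next(reader)
    rows = [[float(cell) for cell in row] for row in reader if row]
    # zip(*rows) regroups the values by column instead of by row
    return [list(column) for column in zip(*rows)]


if __name__ == "__main__":
    sample = io.StringIO("y1,x1\n1.0,0.0\n4.0,1.0\n9.0,2.0\n")
    print(read_columns(sample, skip_header=True))
```

<p>Note that <code>zip</code> stops at the shortest row, so ragged files lose their trailing cells. With the columns in a list, the Y/X pairs can be walked with slices such as <code>columns[0::2]</code> and <code>columns[1::2]</code>.</p>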
]]></content:encoded>
			<wfw:commentRss>http://biostumblematic.wordpress.com/2009/08/28/plotting-data-with-matplotlib/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/cbedce553a7e7fda3955209db5a84858?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Jod</media:title>
		</media:content>
	</item>
		<item>
		<title>Using conky + python to display twitter feeds</title>
		<link>http://biostumblematic.wordpress.com/2009/07/01/using-conky-python-to-display-twitter-feeds/</link>
		<comments>http://biostumblematic.wordpress.com/2009/07/01/using-conky-python-to-display-twitter-feeds/#comments</comments>
		<pubDate>Wed, 01 Jul 2009 18:28:03 +0000</pubDate>
		<dc:creator>jwinget</dc:creator>
				<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://biostumblematic.wordpress.com/?p=99</guid>
		<description><![CDATA[No posts for a while because I haven&#8217;t actually been writing anything new. Biopython has solved many of my day-to-day problems, and I&#8217;m in love with SeqIO. Today is Canada Day and it&#8217;s pretty quiet around the lab, so I thought I&#8217;d try to write something that would let me do two things: First, I [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=biostumblematic.wordpress.com&amp;blog=6773967&amp;post=99&amp;subd=biostumblematic&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>No posts for a while because I haven&#8217;t actually been writing anything new.  Biopython has solved many of my day-to-day problems, and I&#8217;m in love with SeqIO.</p>
<p>Today is Canada Day and it&#8217;s pretty quiet around the lab, so I thought I&#8217;d try to write something that would let me do two things:<br />
First, I want to be able to view my twitter feed using conky, and secondly I&#8217;d like to be able to send updates from the console.</p>
<p>This also gives me the chance to work on some fundamentals like interfacing with APIs and passing options to scripts from the command line.  There, I totally justified it!</p>
<p>To start off, I installed python-setuptools (from the Ubuntu repo), <a href="http://pypi.python.org/pypi/simplejson">simplejson</a> and the <a href="http://code.google.com/p/python-twitter/">python-twitter</a> API interface.  To install the last two you just download the archives, extract them, and then run the following two commands from within their folders:</p>
<pre>python setup.py build
sudo python setup.py install</pre>
<p>Let&#8217;s start off with a pretty basic framework.  This script should print the five latest updates from your friends to the terminal, and the struts are in place to eventually add post capability:<br />
<pre class="brush: python;">
#! /usr/bin/env python
# http://biostumblematic.wordpress.com
# Simple twitter interface

# Change the following two lines with your credentials
user = 'username'
pw = 'password'

num_statuses = 5 # Changes number of statuses to show

import sys, twitter
api = twitter.Api(username=user, password=pw)

if sys.argv[1] == '-l':
    timeline = api.GetFriendsTimeline(user)
    # Slicing avoids an IndexError if fewer statuses come back
    for status in timeline[:num_statuses]:
        print status.user.name
        print status.text
        print '\n'
        
elif sys.argv[1] == '-p':
    pass
    
else:
    print 'Invalid input'
    print 'Allowed options are:'
    print '-p (to post an update)'
    print '-l (to list friend statuses)'
    sys.exit(2)
</pre></p>
<p>Adding the update functionality is facile.  Just change the code as follows:<br />
<pre class="brush: python;">
elif sys.argv[1] == '-p':
    status = api.PostUpdate(sys.argv[2])
    print 'Twitter status updated'
</pre><br />
The only caveat in doing it this way is that the status update entered at the command line must be passed as a string, with quotation marks around it.  Otherwise this will post a one word update, which is terse even by twitter standards.</p>
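<p>One way around the quoting caveat is to join everything after the option flag into a single string. This is a minimal sketch in Python 3 that only shows the argument handling; <code>status_from_args</code> is a hypothetical helper, and the call to the Twitter API is left out.</p>

```python
import sys


def status_from_args(args):
    """Join every word after the -p flag into a single status string."""
    return " ".join(args).strip()


if __name__ == "__main__":
    # sys.argv[0] is the script name and sys.argv[1] the option flag,
    # so the status text starts at index 2.
    if len(sys.argv) > 2 and sys.argv[1] == "-p":
        print(status_from_args(sys.argv[2:]))
```

<p>The shell still interprets special characters (quotes, <code>!</code>, <code>*</code> and so on) before Python sees them, so quoting remains the safer habit for anything beyond plain words.</p>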
<p>This is already fully functional in the terminal, so the last step is dumping out the statuses to a file which conky can read.  Here&#8217;s the code I used to make the text file:<br />
<pre class="brush: python;">
# Needs 'import os' alongside the other imports at the top of the script
if sys.argv[1] == '-l':
    timeline = api.GetFriendsTimeline(user)
    output = open(os.environ['HOME']+'/tweets.txt', 'w')
    for status in timeline[:num_statuses]:
        output.write(status.user.name+'\n')
        output.write(status.text+'\n')
        output.write('\n')
    output.close()
</pre></p>
<p>We then have conky read it using a file like this (I named it .conkytweets and placed it in my home directory; make sure to change the home directory path below):</p>
<pre>use_xft yes
xftfont MyriadPro-Regular:size=8
alignment top_left
xftalpha 0.8
own_window yes
own_window_type override
own_window_transparent yes
own_window_hints undecorated,below,sticky,skip_taskbar,skip_pager
double_buffer yes
draw_shades no
draw_outline no
draw_borders no
stippled_borders 10
border_margin 4
border_width 0
default_shade_color black
default_outline_color black
use_spacer right
no_buffers no
uppercase no
default_color 222222
maximum_width 200
minimum_size 200 5
gap_y 400
gap_x 10
text_buffer_size 1024

TEXT
${font size=9}Latest Tweets:
${color}${font}${execi 600 cat /home/jason/tweets.txt | fold -w 35}</pre>
<p>Here is a screenshot of the output on my monitor (sorry for the blur over the tasks, there are some research details in there I just didn&#8217;t feel like posting for the whole world atm)<br />
<div id="attachment_100" class="wp-caption aligncenter" style="width: 310px"><a href="http://biostumblematic.files.wordpress.com/2009/07/conkytweets.png"><img src="http://biostumblematic.files.wordpress.com/2009/07/conkytweets.png?w=300&#038;h=180" alt="demo of conkytweets script" title="conkytweets" width="300" height="180" class="size-medium wp-image-100" /></a><p class="wp-caption-text">demo of conkytweets script</p></div></p>
<p>There are a few things that don&#8217;t work very well.  To my knowledge, you can&#8217;t include clickable links in conky, so URLs in tweets don&#8217;t do anything.  The textwrap in conky is also a bit wonky, but I don&#8217;t know that there is a nice fix for that.  I suppose one option would be to modify the text file that the twitter script generates, but I&#8217;ll leave that as an exercise to the reader.</p>
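<p>For the wrapping problem, one possible approach is to wrap each status in Python before it is written to the text file, using the standard library&#8217;s <code>textwrap</code> module. This is a minimal sketch in Python 3; <code>wrap_status</code> is a hypothetical helper, and the width of 35 simply mirrors the <code>fold -w 35</code> in the conky file above.</p>

```python
import textwrap


def wrap_status(name, text, width=35):
    """Return the name line followed by the text wrapped to `width` columns."""
    wrapped = textwrap.fill(text, width=width)
    # A trailing blank line separates one status from the next
    return name + "\n" + wrapped + "\n\n"


if __name__ == "__main__":
    print(wrap_status("Jason", "Wrapping on word boundaries keeps words intact"))
```

<p>Unlike <code>fold</code>, <code>textwrap.fill</code> breaks lines at whitespace and only splits a word when it is longer than the full width, as long URLs often are.</p>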
<p>The simplest way to use this is to add a symlink to the script in a directory that is on your path.  For me it was:</p>
<pre>cd /usr/bin/
sudo ln -s ~/scripts/pytwit.py pytwit
</pre>
<p>Then you can use it from anywhere with either:</p>
<pre>pytwit -p 'My awesome twitter post'</pre>
<p>or:</p>
<pre>pytwit -l</pre>
<p>For extra points, you can add the listing to your crontab as follows:</p>
<pre>sudo gedit /etc/crontab
*/15 *	* * *	jason pytwit -l</pre>
<p>This will update the statuses every 15 minutes.</p>
<p>The full script is <a href="http://github.com/jwinget/Utility-scripts/tree/master">available on github</a>, and I welcome any additions/modifications/improvements as always.</p>
]]></content:encoded>
			<wfw:commentRss>http://biostumblematic.wordpress.com/2009/07/01/using-conky-python-to-display-twitter-feeds/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/cbedce553a7e7fda3955209db5a84858?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Jod</media:title>
		</media:content>

		<media:content url="http://biostumblematic.files.wordpress.com/2009/07/conkytweets.png?w=300" medium="image">
			<media:title type="html">conkytweets</media:title>
		</media:content>
	</item>
		<item>
		<title>Measuring identities of aligned protein sequences with BioPython</title>
		<link>http://biostumblematic.wordpress.com/2009/06/15/measuring-identities-of-aligned-protein-sequences-with-biopython/</link>
		<comments>http://biostumblematic.wordpress.com/2009/06/15/measuring-identities-of-aligned-protein-sequences-with-biopython/#comments</comments>
		<pubDate>Mon, 15 Jun 2009 16:06:33 +0000</pubDate>
		<dc:creator>jwinget</dc:creator>
				<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://biostumblematic.wordpress.com/?p=95</guid>
		<description><![CDATA[For some reason it seems that every program which will output a percentage of the identity between two proteins will also align them itself &#8211; therefore screwing up any alignment which you&#8217;ve already made. I knocked up a short script over the weekend which will read in a FASTA-formatted alignment and output the percent identity [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=biostumblematic.wordpress.com&amp;blog=6773967&amp;post=95&amp;subd=biostumblematic&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>For some reason it seems that every program which will output a percentage of the identity between two proteins will also align them itself &#8211; therefore screwing up any alignment which you&#8217;ve already made.  I knocked up a short script over the weekend which will read in a FASTA-formatted alignment and output the percent identity of all of the proteins in it to the first one in the file.</p>
<p>I couldn&#8217;t find a built-in way to do this all in BioPython, but I did use it to parse the sequences out of the alignment.  The rest of the work is just brute force string crunching.</p>
<p><pre class="brush: python;">
#!/usr/bin/env python
# http://biostumblematic.wordpress.com

from Bio import AlignIO

# change input.fasta to match your alignment
input_handle = open(&quot;input.fasta&quot;, &quot;rU&quot;)
alignment = AlignIO.read(input_handle, &quot;fasta&quot;)

j=0 # counts positions in first sequence
i=0 # counts identity hits 
for record in alignment:
    for amino_acid in record.seq:
        if amino_acid == '-':
            pass
        else:
            if amino_acid == alignment[0].seq[j]:
                i += 1
        j += 1
    j = 0
    seq = str(record.seq)
    gap_strip = seq.replace('-', '')
    percent = 100*i/len(gap_strip)
    print record.id+' '+str(percent)
    i=0
</pre><br />
I didn&#8217;t implement similarity here, but it gets the basic job done.  This script is <a href="http://github.com/jwinget/ProteinPy/tree/master">available on GitHub</a> as seqhomology.py</p>
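<p>The counting itself does not depend on BioPython, so it can be pulled out into a small function that works on plain strings. This is a minimal sketch in Python 3 using the same definition as the script above (matches divided by the number of non-gap residues in the sequence being compared); <code>percent_identity</code> is a hypothetical name, and the length and empty-sequence checks are additions.</p>

```python
def percent_identity(reference, query):
    """Percent of non-gap positions in `query` that match `reference`.

    Both arguments are rows of the same alignment, so they must have equal length.
    """
    if len(reference) != len(query):
        raise ValueError("aligned sequences must have the same length")
    matches = sum(1 for ref, res in zip(reference, query)
                  if res != "-" and res == ref)
    ungapped = len(query.replace("-", ""))
    if ungapped == 0:
        return 0.0
    return 100.0 * matches / ungapped


if __name__ == "__main__":
    print(percent_identity("MKT-AY", "MKTWAY"))
```

<p>With an alignment parsed by <code>AlignIO</code>, the two arguments would be <code>str(alignment[0].seq)</code> and <code>str(record.seq)</code>.</p>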
]]></content:encoded>
			<wfw:commentRss>http://biostumblematic.wordpress.com/2009/06/15/measuring-identities-of-aligned-protein-sequences-with-biopython/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/cbedce553a7e7fda3955209db5a84858?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Jod</media:title>
		</media:content>
	</item>
		<item>
		<title>A worked example in BioPython: From cDNA to protein and back again</title>
		<link>http://biostumblematic.wordpress.com/2009/06/04/a-worked-example-in-biopython-from-cdna-to-protein-and-back-again/</link>
		<comments>http://biostumblematic.wordpress.com/2009/06/04/a-worked-example-in-biopython-from-cdna-to-protein-and-back-again/#comments</comments>
		<pubDate>Thu, 04 Jun 2009 18:43:57 +0000</pubDate>
		<dc:creator>jwinget</dc:creator>
				<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://biostumblematic.wordpress.com/?p=90</guid>
		<description><![CDATA[I&#8217;m not sure if this behavior is normal, but I find that I learn a new system best by choosing a real-life problem that I need to solve and applying the new method in order to solve it. This inevitably means that I&#8217;ll probably be doing things in a non-efficient way (since I&#8217;m a noobie), [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=biostumblematic.wordpress.com&amp;blog=6773967&amp;post=90&amp;subd=biostumblematic&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m not sure if this behavior is normal, but I find that I learn a new system best by choosing a real-life problem that I need to solve and applying the new method in order to solve it.  This inevitably means that I&#8217;ll probably be doing things in a non-efficient way (since I&#8217;m a noobie), but code can always be refined later.</p>
<p>Here is the problem I have in front of me today:  I have a series of proteins from which I&#8217;d like to isolate (via cloning) a certain domain.  The cDNA clones of the full length proteins are available from the <a href="http://image.hudsonalpha.org/">IMAGE consortium</a>.  Unfortunately these aren&#8217;t completely &#8220;clean&#8221; cDNAs; there tends to be some extraneous sequence on both ends of the gene.</p>
<p>The plan of action goes something like this:<br />
The starting materials are the cDNA sequence, the amino acid sequence of the protein, and the residue ranges of the domain of interest.  So what I&#8217;d like to do is to check each frame of the cDNA to find the one matching the translated protein sequence, then extract just the cDNA coding for the domain I&#8217;d like to isolate.  I can then (independently) design PCR primers for this domain.</p>
<p>You&#8217;re probably thinking that this could be done manually (and of course that&#8217;s true), but I find this painstaking work.  Also it gives me a chance to play around with the SeqIO functions of BioPython a bit <img src='http://s0.wp.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>Enough introduction, let&#8217;s get to work.  The protein I&#8217;ll be using for this exercise is <a href="http://www.uniprot.org/uniprot/O14920">IKBKB</a>.  This is a 756 amino acid protein; I&#8217;ll be trying to get the cDNA for the protein kinase domain from residues 15-300.  The IMAGE clone ID is 5784717.</p>
<p>Baby step 1 &#8211; find the ORF we&#8217;re interested in<br />
<pre class="brush: python;">
#! /usr/bin/env python

# http://biostumblematic.wordpress.com

# Extraction of the cDNA
# for a given protein domain

import re
from Bio.Seq import Seq
from Bio.Alphabet import IUPAC

input_cdna = raw_input('Paste your cDNA sequence &gt;&gt; ')
input_search = raw_input('What are the first amino acids of the protein? &gt;&gt; ')

cdna = Seq(input_cdna, IUPAC.unambiguous_dna)

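orf_frame = None  # stays None if no frame matches
for i in range(3):
    frame = cdna[i:150]
    trans = frame.translate()
    orf_find = re.search(input_search, str(trans))
    if orf_find:
        orf_frame = i+1
if orf_frame is None:
    print 'No matching frame found in the first 150 bases'
else:
    print 'The protein is coded in frame '+str(orf_frame)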
</pre><br />
Given the input of the cDNA and the first 4 residues (MSWS), this outputs the right answer:</p>
<pre>The protein is coded in frame 3</pre>
<p>Note that I&#8217;m only checking the first 150 bases of the cDNA, which is at most 50 codons (see line 19).  Hopefully this is enough to catch the start of the protein (it would be a lot of extraneous 5&#8242; sequence if not).</p>
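<p>For anyone without Biopython at hand, the frame check can be sketched with the standard library alone.  This is only an illustration in Python 3 syntax: the codon table is the standard genetic code, while the helper names (translate, find_frame), the window parameter and the toy sequence are made up for this example and are not part of the script above.  Unlike the script, each frame here gets its own full 150-base window, and a missing match is reported as None.</p>

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order
BASES = 'TCAG'
AMINO = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'
CODON_TABLE = dict((''.join(c), a) for c, a in zip(product(BASES, repeat=3), AMINO))


def translate(dna):
    # translate complete codons only; a trailing partial codon is dropped
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return ''.join(CODON_TABLE.get(dna[i:i + 3], 'X') for i in range(0, usable, 3))


def find_frame(cdna, prot_start, window=150):
    # return the 1-based forward frame whose translation contains prot_start
    for offset in range(3):
        if prot_start in translate(cdna[offset:offset + window]):
            return offset + 1
    return None  # no frame matched inside the window


# Toy input: two junk bases, then ATG TCT TGG TCT (M S W S) and a stop codon
toy_cdna = 'ccATGTCTTGGTCTTAA'
print(find_frame(toy_cdna, 'MSWS'))
```

<p>With the toy sequence this prints 3, because the two leading junk bases push the coding sequence into the third frame.</p>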
<p>Obviously this isn&#8217;t sexy enough for Biopython.  It&#8217;s cumbersome to have to type in the starting sequence of the protein you&#8217;re interested in, so why don&#8217;t we let SeqIO handle that for us via the SwissProt code?  Also, we&#8217;ll change a couple of things to enable automation of the full list later.</p>
<p>First, I made a file called &#8216;test.csv&#8217; which has a single line consisting of SwissProt ID,cDNA:</p>
<pre>O14920,atagccccggg[...]</pre>
<p>Then I modified the script like so:<br />
<pre class="brush: python;">
import re, csv
from Bio import ExPASy, SeqIO
from Bio.Seq import Seq
from Bio.Alphabet import IUPAC

reader = csv.reader(open('test.csv'))
for row in reader:
    input_prot = row[0]
    get_prot = ExPASy.get_sprot_raw(input_prot)
    prot_obj = SeqIO.read(get_prot, &quot;swiss&quot;)
    get_prot.close()
    prot_seq = prot_obj.seq
    prot_start = prot_seq[0:4]

    cdna = Seq(row[1], IUPAC.unambiguous_dna)

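    orf_frame = None  # reset for every row so a stale result is never reported
    for i in range(3):
        frame = cdna[i:150]
        trans = frame.translate()
        orf_find = re.search(str(prot_start), str(trans))
        if orf_find:
            orf_frame = i+1
    if orf_frame is None:
        print input_prot+': no matching frame found'
    else:
        print 'The protein is coded in frame '+str(orf_frame)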
</pre><br />
Biopython grabs the protein sequence from the web using the SwissProt ID.  The prot_start variable takes just the first few residues and uses that as the search term for the regular expression later on.  Now there is no command line input, as everything is done via the CSV file.  This will iterate over lines in the CSV file to do multiple proteins.  Right now, however, we would just get a long list of &#8220;The protein is coded in frame X&#8221; lines, which is less than useful.  Time to take care of that.</p>
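<p>One way to make the per-row output more useful is to write a result row for every input row instead of printing a sentence.  The following is a rough sketch in Python 3 syntax using only the csv module.  The IDs and sequences are toy data, the in-memory buffers stand in for real files, and the cDNA length is just a placeholder for whatever result the real script would compute.</p>

```python
import csv
import io

# Stand-in for test.csv; the IDs and sequences are toy data, not real entries
source = io.StringIO('ID_ONE,ccATGTCTTGG\nID_TWO,ATGAAACCC\n')
output = io.StringIO()  # a real script would open an output file here instead

writer = csv.writer(output, lineterminator='\n')
writer.writerow(['id', 'cdna_length'])
for row in csv.reader(source):
    if len(row) != 2:
        continue  # skip blank or malformed lines
    protein_id, cdna = row
    # placeholder result; the real script would store the frame here
    writer.writerow([protein_id, len(cdna)])

print(output.getvalue())
```

<p>Each input line then yields exactly one output line, which is easier to check against the original list than a stream of identical-looking messages.</p>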
<p>In this case the domain is annotated in SwissProt already.  This means that I <em>could</em> use the <a href="http://biopython.org/DIST/docs/tutorial/Tutorial.html#chapter:swiss_prot">built-in parsing function</a> of Biopython to select the domain.  However, I have some custom annotations for other proteins in my list, which makes that a poor fit here.  Instead let&#8217;s just make some minor modifications to our input CSV and existing script.  The new CSV includes the start and stop residues of interest:</p>
<pre>O14920,15,300,atagccccgggttt[...]</pre>
<p>Now I just modify the top of the script to take into account the new structure of the CSV:<br />
<pre class="brush: python;">
reader = csv.reader(open('test.csv'))
for row in reader:
    input_prot = row[0]
    get_prot = ExPASy.get_sprot_raw(input_prot)
    prot_obj = SeqIO.read(get_prot, &quot;swiss&quot;)
    get_prot.close()
    prot_seq = prot_obj.seq
    prot_domain = prot_seq[int(row[1])-1:int(row[2])]
    cdna = Seq(row[3], IUPAC.unambiguous_dna)
</pre><br />
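and adjust what happens if the script finds a match to the domain sequence.  For a match to be possible, the search term has to be prot_domain rather than prot_start, and the frame has to run to the end of the cDNA rather than stopping at base 150, since the 286-residue domain needs 858 bases:<br />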
<pre class="brush: python;">
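        if orf_find:
            # split the translation around the domain; the capturing
            # group keeps the matched domain as trans_split[1]
            trans_split = re.split('('+str(prot_domain)+')', str(trans))
            # residues before the domain, times 3 bases per codon
            cdna_start = len(trans_split[0])*3
            cdna_stop = cdna_start + len(trans_split[1])*3
            cdna_extracted = frame[cdna_start:cdna_stop]
            print cdna_extracted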
</pre><br />
And this gets the job done!  This prints a cDNA sequence which, when translated back, matches the domain of interest.</p>
<p>The last part there feels sloppy.  All I&#8217;m doing is counting the number of amino acids that come out of the translation before the start of the domain, then multiplying by three, and getting the cDNA start from this.  I feel like there <strike>must</strike> should be a way to transition more effectively between protein and cDNA sequence.</p>
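<p>The multiply-by-three arithmetic is hard to avoid entirely, but it can at least be wrapped in one small function that takes residue numbers and returns the matching stretch of cDNA.  Below is a minimal sketch of that idea in Python 3 syntax, standard library only.  The names (translate, domain_cdna) and the toy sequences are invented for illustration, and the function looks for the domain with str.find instead of the re.split approach used above.</p>

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order
BASES = 'TCAG'
AMINO = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'
CODON_TABLE = dict((''.join(c), a) for c, a in zip(product(BASES, repeat=3), AMINO))


def translate(dna):
    # translate complete codons only; unknown codons become X
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return ''.join(CODON_TABLE.get(dna[i:i + 3], 'X') for i in range(0, usable, 3))


def domain_cdna(cdna, protein, res_start, res_stop):
    # res_start and res_stop are 1-based, inclusive residue numbers
    domain = protein[res_start - 1:res_stop]
    for offset in range(3):
        hit = translate(cdna[offset:]).find(domain)
        if hit != -1:
            nt_start = offset + hit * 3  # residue index to base index
            nt_stop = nt_start + len(domain) * 3
            return cdna[nt_start:nt_stop]
    return None  # the domain was not found in any forward frame


# Toy data: protein MSWSK, domain = residues 2 to 4 (SWS)
toy_cdna = 'ccATGTCTTGGTCTAAATAAgg'
print(domain_cdna(toy_cdna, 'MSWSK', 2, 4))
```

<p>For the toy input this prints TCTTGGTCT, which translates back to SWS.  The conversion between residue and base coordinates now lives in one place, so the calling code never has to multiply by three itself.</p>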
<p>The entire script, slightly modified to write out the results to a new CSV file, <a href="http://github.com/jwinget/ProteinPy/tree/master">is available over on GitHub</a>.  I hope you found the post interesting and look forward to your comments.</p>
]]></content:encoded>
			<wfw:commentRss>http://biostumblematic.wordpress.com/2009/06/04/a-worked-example-in-biopython-from-cdna-to-protein-and-back-again/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/cbedce553a7e7fda3955209db5a84858?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Jod</media:title>
		</media:content>
	</item>
		<item>
		<title>Why the #^@% have I not done more with Biopython before now?</title>
		<link>http://biostumblematic.wordpress.com/2009/06/03/why-the-have-i-not-done-more-with-biopython-before-now/</link>
		<comments>http://biostumblematic.wordpress.com/2009/06/03/why-the-have-i-not-done-more-with-biopython-before-now/#comments</comments>
		<pubDate>Wed, 03 Jun 2009 23:58:58 +0000</pubDate>
		<dc:creator>jwinget</dc:creator>
				<category><![CDATA[Python]]></category>
		<category><![CDATA[Biopython]]></category>

		<guid isPermaLink="false">http://biostumblematic.wordpress.com/?p=80</guid>
		<description><![CDATA[Yeah, I&#8217;ve heard of it. Biopython is a Python package (thanks Chris) that&#8217;s written to help with doing computational biology. To my utter dismay I somewhat ignored it, being the &#8220;I&#8217;ll just brew it myself&#8221; type. What a mistake. Today I was trying to wrangle some DNA and protein sequences and realized that this [...]]]></description>
			<content:encoded><![CDATA[<p>Yeah, I&#8217;ve heard of it.  <a href="http://biopython.org/">Biopython</a> is a Python <strike>module</strike> package (thanks Chris) that&#8217;s written to help with doing computational biology.  To my utter dismay I somewhat ignored it, being the &#8220;I&#8217;ll just brew it myself&#8221; type.  What a mistake.</p>
<p>Today I was trying to wrangle some DNA and protein sequences and realized that this might be something covered by Biopython.  It&#8217;s even better than that.  You want tasty yum yums?  How about a reverse complementer in 3 lines of code?  I even formatted it so it looks nice on the terminal:</p>
<pre>
from Bio.Seq import Seq
sequence = Seq(raw_input('Paste your DNA sequence &gt;&gt; '))
print '\nReverse Complement\n------------------\n'+sequence.reverse_complement()
</pre>
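<p>For comparison, here is roughly what that call does for plain A/C/G/T sequences, written as a small sketch in Python 3 syntax with no Biopython.  The COMPLEMENT table and the function below are illustrative stand-ins, not Biopython code; Biopython&#8217;s own method also copes with ambiguity codes, which this sketch does not.</p>

```python
# Complement table for unambiguous DNA, upper and lower case
COMPLEMENT = str.maketrans('ACGTacgt', 'TGCAtgca')


def reverse_complement(dna):
    # complement every base, then reverse the string
    return dna.translate(COMPLEMENT)[::-1]


print(reverse_complement('ATGTCTTGG'))
```

<p>This prints CCAAGACAT for the toy input.</p>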
<p>The very next bit of code in the <a href="http://biopython.org/DIST/docs/tutorial/Tutorial.html">tutorial</a> replaced a ~50 line program I had cobbled together (and which still wasn&#8217;t working exactly the way I wanted) with this beauty:<br />
<pre class="brush: python;">
#! /usr/bin/env python

# Biopython can automatically parse FASTA
# as well as many other &quot;standard&quot; biological formats

from Bio import SeqIO
inputfile = open('myproteins_fasta.txt')

for seq_record in SeqIO.parse(inputfile, 'fasta'):
    print seq_record.id
    print repr(seq_record.seq)
    print len(seq_record)
inputfile.close()
</pre><br />
BOOM, FASTA reader.</p>
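<p>To give a feel for what SeqIO.parse is saving me from, here is a bare-bones FASTA reader in Python 3 syntax.  It is a sketch under simple assumptions (well-formed input, the record ID is the first word of the header line), the read_fasta name and the toy records are made up, and it does far less than the Biopython parser.</p>

```python
def read_fasta(lines):
    # yield (record_id, sequence) pairs from an iterable of FASTA lines
    record_id, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith('>'):
            if record_id is not None:
                yield record_id, ''.join(chunks)
            # the ID is the first word after the header marker
            parts = line[1:].split()
            record_id, chunks = (parts[0] if parts else ''), []
        elif record_id is not None:
            chunks.append(line)
    if record_id is not None:
        yield record_id, ''.join(chunks)


toy = ['>seq1 first toy record', 'MSWS', 'PSLT', '>seq2', 'MKV']
for name, seq in read_fasta(toy):
    print(name, len(seq))
```

<p>Even this stripped-down version is several times longer than the SeqIO loop, and it only understands one file format.</p>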
<p>I&#8217;m just getting started on reading the documentation, but so far I&#8217;m really impressed (and not a little bit sheepish at my previous obstinacy).  Expect to see some Biopython examples in the coming days.</p>
]]></content:encoded>
			<wfw:commentRss>http://biostumblematic.wordpress.com/2009/06/03/why-the-have-i-not-done-more-with-biopython-before-now/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/cbedce553a7e7fda3955209db5a84858?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Jod</media:title>
		</media:content>
	</item>
	</channel>
</rss>
