Wednesday, January 13, 2010

Gold Standard Records

Starting on a new project always brings a certain level of excitement. Often what is a "new" project for a trainee is actually a continuation or a spin-off of a predecessor's work. If you're lucky, the predecessor is still around to give you all the highlights, provide protocols, show you the results and where to find the reagents, etc. If you're not so lucky, then your excitement can soon turn to frustration as you wade through stacks of notebooks and data files, trying to figure out what exactly your predecessor did and where s/he stored key reagents. Sometimes you learn that the "representative" results presented to the PI were actually the best results, a handful out of a virtual mountain of data.

This is just one situation that illustrates the importance of fastidious data management in research laboratories, an issue that might be one of the biggest weaknesses of academic research labs.

Back to basics
Let's start with the lab book. This is where most of us (and I include myself here) need to go back to the first day of gen chem. Anything that goes into the book should be legible and coherent. And we should be writing down everything--well, at least everything pertinent to the experiment (your successors don't really need to know what you had for breakfast or how hungover you are). This includes:
  • why you're doing the experiment (a.k.a. the objective)
  • the experimental setup and procedure including pesky things like recording concentrations of reagents, volumes for injections, the solvent or buffer used for dilutions, instrument and settings used... You catch my drift.
  • raw data (or reference to its location)
  • locations of data files, including physical location, directory, folder, file names
  • how data was processed
  • final results (i.e. the pretty graph or table)
  • conclusions and/or notes for future experiments
Writing all this can become extraordinarily tedious, especially when we're doing similar experiments on a weekly or even daily basis. In some cases, it is sufficient to reference a page in the lab book where the protocol was first described, making note of alterations. Alternatively, write up the standard protocol in a word processing document, make appropriate changes for a given experiment, and print and paste it into the lab book. If something changes during the course of an experiment, make a note of it. It doesn't really matter what approach we use, so long as we are being thorough. There should be sufficient detail for someone to repeat the experiment without ever talking to us.
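For those who keep an electronic companion to the paper book, the checklist above can be turned into a small template that flags what is still missing from an entry. This is only a rough sketch in Python: the class name, the field names, and the example values are all made up for illustration, not any kind of standard. It treats a reference to an earlier protocol page as an acceptable stand-in for a full written procedure.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Free-text sections that every entry should fill in (hypothetical names
# that mirror the checklist above).
REQUIRED_TEXT = ("objective", "raw_data_location", "processing",
                 "results", "conclusions")


@dataclass
class ExperimentEntry:
    """One lab book entry. All field names are illustrative assumptions."""
    objective: str = ""
    procedure: str = ""                   # full setup and procedure, or...
    protocol_page: Optional[int] = None   # ...the page where it was first described
    deviations: List[str] = field(default_factory=list)  # changes from the protocol
    raw_data_location: str = ""           # drive, directory, folder, file names
    processing: str = ""                  # how the data was processed
    results: str = ""
    conclusions: str = ""

    def missing_fields(self) -> List[str]:
        """Return the names of sections that are still empty."""
        missing = [name for name in REQUIRED_TEXT
                   if not getattr(self, name).strip()]
        # A page reference counts as a procedure; otherwise text is needed.
        if not self.procedure.strip() and self.protocol_page is None:
            missing.append("procedure")
        return missing


# Example with invented values: everything is filled in except conclusions.
entry = ExperimentEntry(
    objective="Repeat the binding assay at a lower concentration",
    protocol_page=42,
    deviations=["Used a fresh buffer stock"],
    raw_data_location="external drive, folder 2010-01-13_binding",
    processing="Averaged triplicates and subtracted the blank",
    results="See the graph pasted on the facing page",
)
print(entry.missing_fields())
```

A check like this does not replace writing things down as you work; it only makes the gaps harder to overlook.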

We should also be writing in the book as we work, whenever possible. Too often, we place faith in our memory or our complex system of notes on post-its, paper towels, and gloves. We become slack in maintaining our books, updating them every few days, or maybe even once a week... or less. Then as we're updating our books, we realize we're a little fuzzy on the details... or that we mistakenly tossed that glove in the trash because we thought it was rubbish... so we end up guessing or trying to back-calculate how much of X we added. Not good.

Finally, don't forget to index it! Those wonderfully detailed, coherent notes won't do anyone much good if no one can find them. Chances are, you don't need me to tell you how much of a PITA it is to dig through years of data and notebooks with no idea where you should be looking.

Data in the digital age
The thing about gen chem, at least when I took it, is that it was beautifully simple. I think there was maybe one lab in the entire year that used a probe connected to a computer. The same goes for every chemistry and most biology lab courses that I took as an undergrad. It was simple enough to put everything in a notebook then. As we advance to higher level research, though, the game changes. There's proteomics, FACS, real-time intravital imaging, and a myriad of other techniques that generate massive amounts of data. While working on this post, I was collecting about 5 GB of data... for one replicate in one group of one experiment. Raw data from such experiments do not lend themselves to hard copy production. They only exist in the digital world. So we must be as fastidious in organizing and maintaining digital records as we are in maintaining our lab books.

Backup plan
I think we have lived in the digital age long enough to realize that sometimes computers die and, despite IT's best efforts, cannot be resuscitated. This is why we should be backing up all of our data files on a regular basis. Both Bear's and Guru's labs keep external hard drives around for this purpose. Some labs may have access to network storage through their institutes. Generally, space is fairly limited, but this is fine if you're not generating gigabytes of data on a daily basis.
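If copying files to the external drive by hand feels too easy to forget (or to botch), a short script can do the copying and confirm that each copy matches the original. What follows is a minimal sketch in Python, not a description of how either lab actually does it. The function names and the folder layout are my own assumptions, and it assumes the backup folder sits outside the data folder. Note that a file that has changed since the last run is overwritten in the backup.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Checksum a file in chunks so multi-gigabyte files need little memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_tree(source_dir, backup_dir):
    """Copy new or changed files to backup_dir and verify every copy.

    Returns the relative paths of the files copied on this run.
    Assumes backup_dir is not located inside source_dir.
    """
    source_dir, backup_dir = Path(source_dir), Path(backup_dir)
    copied = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        dest = backup_dir / src.relative_to(source_dir)
        # Skip files whose backup already matches the original.
        if dest.exists() and sha256_of(dest) == sha256_of(src):
            continue
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)  # copy2 also keeps the file timestamps
        if sha256_of(dest) != sha256_of(src):
            raise IOError(f"Backup of {src} does not match the original")
        copied.append(src.relative_to(source_dir).as_posix())
    return copied
```

Calling `backup_tree("path/to/data", "path/to/external_drive/backup")` with your own (here hypothetical) folders copies whatever is new. Running it again right away should copy nothing, which is a quick sanity check that the backup is current.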

When it comes to backups, though, one thing we don't think about so much is our physical lab books and data. However, there is the possibility of fire or flood in the lab destroying our research records. Or they might just sort of wander off. I have yet to see a lab that uses duplicator notebooks or that photocopies or scans notebook pages, but it's probably not a bad idea. Lab books, after all, are the primary record of everything that's been done in the lab.

Safeguard
A peculiarity of data management is that many PIs don't talk about it. In my graduate and postdoc labs, on my first day, someone showed me where the new notebooks were kept. That was it. When I left my graduate lab, I just told the lab manager where my lab books were stored. It seems PIs assume that scientists--whether students or postdocs or research associates--know how to fill out a lab book and keep data organized. Perhaps PIs anticipate that the lab manager or other colleagues will provide direction as necessary. Of course, because this is a day-to-day task, it is not feasible or reasonable for a PI to constantly check lab books. And some people won't make long-term change without constant reminders.

So what's a PI to do? How is s/he to monitor and maintain the integrity of data and records without randomly inspecting lab books? Does anyone actually do the "understood and witnessed by" thing outside of industry?

Guru is a fan of seeing all data--the good, the bad, the ugly, the inconclusive... He periodically meets with individuals to discuss projects and experiments. As a trainee, it's a necessity to bring your notebook to these meetings because Guru might ask you about results from days, weeks, or months ago. In so doing, Guru sees our notebooks. This could offer a solution. Yet I have encountered some of the same problems locating information from previous trainees.

Some might argue (rightfully) that PIs have better things to do and shouldn't bother. Trivial as it may seem, proper data management is crucial to an efficient and productive laboratory. Researchers must be vigilant in keeping good records, but PIs should ensure that records are clear and consistent.