Starting on a new project always brings a certain level of excitement. Often what is a "new" project for a trainee is actually a continuation or a spin-off of a predecessor's work. If you're lucky, the predecessor is still around to give you all the highlights, provide protocols, show you the results and where to find the reagents, etc. If you're not so lucky, then your excitement can soon turn to frustration as you wade through stacks of notebooks and data files, trying to figure out what exactly your predecessor did and where s/he stored key reagents. Sometimes you learn that the "representative" results presented to the PI were actually the best results, a handful out of a virtual mountain of data.
This is just one situation that illustrates the importance of fastidious data management in research laboratories, an issue that might be one of the biggest weaknesses of academic research labs.
Back to basics
Let's start with the lab book. This is where most of us (and I include myself here) need to go back to the first day of gen chem. Anything that goes into the book should be legible and coherent. And we should be writing down everything--well, at least everything pertinent to the experiment (your successors don't really need to know what you had for breakfast or how hungover you are). This includes:
- why you're doing the experiment (a.k.a. the objective)
- the experimental setup and procedure including pesky things like recording concentrations of reagents, volumes for injections, the solvent or buffer used for dilutions, instrument and settings used... You catch my drift.
- raw data (or reference to its location)
- locations of data files, including physical location, directory, folder, file names
- how data was processed
- final results (i.e. the pretty graph or table)
- conclusions and/or notes for future experiments
We should also be writing in the book as we work, whenever possible. Too often, we place faith in our memory or our complex system of notes on post-its, paper towels, and gloves. We become slack in maintaining our books, updating them every few days, or maybe even once a week... or less. Then as we're updating our books, we realize we're a little fuzzy on the details... or that we mistakenly tossed that glove in the trash because we thought it was rubbish... so we end up guessing or trying to back-calculate how much of X we added. Not good.
Finally, don't forget to index it! Those wonderfully detailed, coherent notes won't do anyone much good if no one can find them. Chances are, you don't need me to tell you how much of a PITA it is to dig through years of data and notebooks with no idea where you should be looking.
Data in the digital age
The thing about gen chem, at least when I took it, is that it was beautifully simple. I think there was maybe one lab in the entire year that used a probe connected to a computer. The same goes for every chemistry and most biology lab courses that I took as an undergrad. It was simple enough to put everything in a notebook then. As we advance to higher level research, though, the game changes. There's proteomics, FACS, real-time intravital imaging, and a myriad of other techniques that generate massive amounts of data. While working on this post, I was collecting about 5 GB of data... for one replicate in one group of one experiment. Raw data from such experiments do not lend themselves to hard copy. They exist only in the digital world. So we must be as fastidious in organizing and maintaining digital records as we are in maintaining our lab books.
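One way to tie those digital records back to the lab book is a file manifest: a small table listing every file from an experiment, with its size and a checksum, that can be printed or referenced by name in the notebook entry. What follows is only a minimal sketch in Python, not a description of what any lab mentioned here actually does. The function names (`sha256_of`, `build_manifest`, `write_manifest`) and the CSV layout are made up for illustration.

```python
import csv
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MB chunks so multi-GB files never sit in memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir):
    """Return one row per file under data_dir: relative path, size, SHA-256."""
    root = Path(data_dir)
    rows = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rows.append({
                "file": path.relative_to(root).as_posix(),
                "bytes": path.stat().st_size,
                "sha256": sha256_of(path),
            })
    return rows


def write_manifest(rows, out_csv):
    """Save the manifest as a CSV that can be printed or cited in the lab book."""
    with open(out_csv, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["file", "bytes", "sha256"])
        writer.writeheader()
        writer.writerows(rows)
```

For example, `write_manifest(build_manifest("2010-06-14_imaging"), "manifest.csv")` would catalog a (hypothetical) experiment folder. Writing the CSV outside the data folder keeps it from listing itself on the next run. The checksum column lets a successor confirm that the file on disk is the same one the notebook entry describes.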
Backup plan
I think we have lived in the digital age long enough to realize that sometimes computers die and, despite IT's best efforts, cannot be resuscitated. This is why we should be backing up all of our data files on a regular basis. Both Bear's and Guru's labs keep external hard drives around for this purpose. Some labs may have access to network storage through their institutes. Space is generally limited, which is fine as long as you're not generating gigabytes of data on a daily basis.
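A backup is only useful if the copy actually matches the original, so it is worth verifying files after copying them to that external drive. Here is a rough Python sketch of the idea, assuming the data live in one folder and the drive is mounted as another. The function name `backup_and_verify` and both folder arguments are hypothetical, and this is a simple mirror, not a substitute for whatever IT provides.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MB chunks so large data files are not loaded whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_and_verify(source_dir, backup_dir):
    """Copy new or changed files into backup_dir; return files that fail verification."""
    source, backup = Path(source_dir), Path(backup_dir)
    failed = []
    for src in sorted(source.rglob("*")):
        if not src.is_file():
            continue
        dst = backup / src.relative_to(source)
        src_hash = sha256_of(src)
        # Copy only when the backup copy is missing or differs from the source.
        if not dst.exists() or sha256_of(dst) != src_hash:
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)  # copy2 also tries to preserve timestamps
        # Re-read the backup copy to confirm it matches the source.
        if sha256_of(dst) != src_hash:
            failed.append(src.relative_to(source).as_posix())
    return failed
```

An empty list from `backup_and_verify("raw_data", "/path/to/external_drive/raw_data")` (both paths made up) means every file was copied and checked. Two caveats: the backup folder should not sit inside the source folder, and because this mirrors the source, a file that changes in the source overwrites its older backup copy. Raw data that should never change may deserve a stricter, write-once scheme.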
When it comes to backups, though, one thing we don't think about so much is our physical lab books and data. Yet a fire or flood in the lab could destroy our research records. Or they might just sort of wander off. I have yet to see a lab that uses duplicator notebooks or that photocopies or scans notebook pages, but it's probably not a bad idea. Lab books, after all, are the primary record of everything that's been done in the lab.
Safeguard
A peculiarity of data management is that many PIs don't talk about it. In my graduate and postdoc labs, on my first day, someone showed me where the new notebooks were kept. That was it. When I left my graduate lab, I just told the lab manager where my lab books were stored. It seems PIs assume that scientists--whether students or postdocs or research associates--know how to fill out a lab book and keep data organized. Perhaps PIs anticipate that the lab manager or other colleagues will provide direction as necessary. Of course, because this is a day-to-day task, it is not feasible or reasonable for a PI to constantly check lab books. And some people won't make long-term change without constant reminders.
So what's a PI to do? How is s/he to monitor and maintain the integrity of data and records without randomly inspecting lab books? Does anyone actually do the "understood and witnessed by" thing outside of industry?
Guru is a fan of seeing all data--the good, the bad, the ugly, the inconclusive... He periodically meets with individuals to discuss projects and experiments. As a trainee, it's a necessity to bring your notebook to these meetings because Guru might ask you about results from days, weeks, or months ago. In so doing, Guru sees our notebooks. This could offer a solution. Yet I have encountered some of the same problems locating information from previous trainees.
Some might argue (rightfully) that PIs have better things to do and shouldn't bother. Trivial as it may seem, proper data management is crucial to an efficient and productive laboratory. Researchers must be vigilant in keeping good records, but PIs should ensure that records are clear and consistent.

Lab minion · 794 weeks ago
I'm struggling a little bit with the requirements of record keeping during "normal" benchwork, high throughput and/or data heavy experiments, and computational work/data analysis. I do some of each at different times, and the experiments often interweave. I'm not the most organized person in the world, so I have to make extra effort to keep a good standard of record keeping (and also some of my struggles might sound awfully dumb.) But so far, it's not going so great.
Benchwork is easy enough to record in a lab book in something resembling a standard fashion. But when I'm at the computer writing software, I gravitate towards electronic documentation (who wants to write out by hand a sequence of commands and results when I can just copy and paste into a text file?). But how to integrate these? Where does source control fit into the picture (another thing I haven't set up yet...)? And where does that leave things like general objectives/strategies, todo lists, and notes from meetings, papers or seminars?
I'm thinking that I might have to start printing out my notes after a coding session and taping them into my lab book. But that seems like overkill. If anyone has any advice or even just statements along the lines of "well, this is what I do, and it works for me..." I would be intensely grateful.
biochem belle 43p · 794 weeks ago
I know nothing about programming, but I'm getting into work where I generate several GB of data per experiment. After all the appropriate processing, I'll end up with a set of movies. The approach I'm taking is to jot down pertinent experimental details and filenames with a description specifying what was done in that file. Later I'll record how the data were processed and the corresponding files and probably include a few screen shots.
Generally, I restrict information in my lab books to design and execution of experimental work. I include some strategic planning, i.e. why and how I want to do X. For instance, when working out a retrosynthetic pathway, I sketched it out in my notebook and included references for reactions that I didn't typically use. I also jot down notes for how the next experiment could be improved or what needs to be changed. I keep a separate (non-lab) notebook for seminar notes and another for meeting notes, planning, and to-do lists; most of this is not pertinent to the lab work.
biochem belle 43p · 793 weeks ago
Tsu Dho Nimh said...
When you are collecting and tracking reams of computerized data, consider using the same systems that are used to maintain and modify software source code or web sites.
http://en.wikipedia.org/wiki/Revision_control
These systems can track changes and create branches off the main line, and handle reversion control on text documents. They can also check files into a controlled directory, for your data collection runs.
Some applications intended for web development have version control built into them. Look at Wikipedia articles' history. Automatic backups to an off-site facility, or daily burning to DVD, can be set up easily.
Good grief, join the computer age.
Comrade PhysioProf · 794 weeks ago
AHAHAHAHAHAHAHAH!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Lab minion · 794 weeks ago
"There should be sufficient detail for someone to repeat the experiment without hating our guts and wishing death upon us."
biochem belle 43p · 794 weeks ago
Genomic Repairman · 794 weeks ago
biochem belle 43p · 794 weeks ago
As for the backup of lab books, I feel this is an issue that the PI should work out. As you point out, if we spend any more time managing and backing up records, we start running out of time for experiments. I think this would ideally be under the purview of an admin or lab manager--although they might argue otherwise :P
SamanthaScientist · 794 weeks ago
biochem belle 43p · 794 weeks ago
SSci asks in one post if there is a lack of consensus for data management and organization. But I think the real issue (in academia) is a failure of PIs to establish practices and consequences. The most fastidiously maintained records I've seen among colleagues and predecessors belong to those who spent a few to several years in industry prior to working at a research university. If you're working in industry, there are established guidelines for good laboratory practice (GLP). GLP, esp. data management, is do-or-die for companies because of project and personnel turnover and legal requirements for patent filing. I don't know whether the lack of GLP guidelines and repercussions for not following them in academic labs is due to "out of sight, out of mind" or the "I have better things to do with my time" mindset.