My Way - IT at the Friedrich Miescher Institute
Biomedical research specialist the Friedrich Miescher Institute needed a strategy to manage the data deluge that was increasing by at least 100Gb a year. Part of the answer was to throw the problem back at the users who are generating it all, head of informatics Dean Flanders tells E&T.
Engineering & Technology (E&T): Could you explain how this project came about?
Dean Flanders (DF): Years ago I sketched out a piece of paper with my perfect [storage] solution. Over time I saw different products, but didn't find anything that matched my needs. In the end I picked a solution, but with the money put aside and the purchase orders ready to sign, my boss asked me if it would work as a long-term solution. It wouldn't, no: it was a dead-end in the sense of that we were taking the traditional route of dealing with bigger fileservers by replacing them every few years.
E&T: So back to the drawing board?
DF: We came across storage vendor COPAN because another research institute had looked into its MAID [massive array of idle disks] storage solution. It was really cool, but we had to think about how it would work, how it could be integrated. Then we found SAN FS, which I had heard a lot of 'supercomputer centres' were using, and after some research I discovered that together they would create the perfect solution to our storage problems. What really clinched the deal was that one of the world experts on SAN FS was based in Germany, and would help us implement the solution. It was a no-brainer.
E&T: So this is something you've wanted to do for quite a few years?
DF: When I came into the job in 2000, I was replacing things with 75Gb drives, and now we're putting in terabyte drives. Every year we've gone up 100Gb on a base drive; we're almost doubling capacity. This growth issue has really been a thorn in our side, because it doesn't go away. We kept putting in spot solutions, but I really wanted to find a long-term scalable solution, something we wouldn't have to replace every few years.
E&T: If the growth issue doesn't go away, then how can you truly address it?
DF: It may sound a bit tacky, but the goal was to have an infinitely scalable file system. People may say "My file system scales to Petabytes", or whatever. Technically their file system will scale, sure; but what about cooling, power, recovery, back-up? How do you deal with all these things?
E&T: How did you deal with all these things?
DF: Basically, we had to go with a solution that solves as many problems as possible at one time. With SAN FS and COPAN, we said let's just go in, and build this all into the solution from the beginning - and then we really have one throat to choke, just SAN FS and COPAN to point a finger at when we have a problem, as opposed to piecing things together and then trying to find out which 'solution' went wrong.
So that was the goal - meaning we killed the back-up, archive, power, heating and cooling issues. I could easily toss in another 80 to 160Tb, with no redesign of the system. Without question, I think we achieved it.
E&T: Life sounds a lot easier…
DF: Oh yes: I don't lose sleep over it anymore. I'm not trying to think how I can find more terabytes to back things up. We got lucky with how everything came together; the whole approach was a bit more innovative than we thought, but it's great.
E&T: The Institute has to deal with huge amounts of data compared to most other industries. What is it like to work in this kind of environment?
DF: Well, the biggest issue for us isn't the back-end now; we have that sorted. What we're doing now is involving the business - we make [storage] their problem. The old way was to make it IT's problem. It was a bottomless pit of storage problems, as the analysts - we call them technology platforms - would keep dumping things on you. Now we turn it round to them.
It's not about saying, "Oh, that will cost you x, y, or z", it's just to say, "Look, if you need more storage, here's your graph, and here's your utilisation - you go figure it out". What's amazing is you find out how creative they get. They start realising how much data they're using, and find ways to lower it.
E&T: So the point is, to deal with lots of data, push it back to the user?
DF: Not in a mean way, but we say "Great, we can technically do it, whatever you want, but now you tell us, this is your problem… What data is really important, what can be archived, what can be thrown away?"
E&T: But, realistically, how much can things change when you get in a single piece of new laboratory equipment?
DF: I've learnt not to plan too much in advance. I deal with a lot of IT guys in Switzerland, and they all get really excited about something a year in advance, but I'm like, no matter what you plan now, it will all change when that equipment hits the door.
You should be sure to design a scalable infrastructure, so that whatever they throw at you, you can deal with it, because whatever they say now will be different later. They'll say, give or take a margin of error, "This thing only requires 500Gb"; well, when it hits you it's going to be 1.5Tb.
Or vice versa: you design something for 1.5Tb, and it only needs 250Gb. So the most important thing in dealing with this is flexibility. Don't get too wound up in the details until they basically get the instrument in the door.
Work with them and see what the real needs are. For example, we've now put a 10Gb backbone into our building, so I don't have to worry about where the instruments are located. We have infinitely scalable file systems. The most important thing in dealing with that scenario is flexibility.
E&T: So how would you characterise your role in all this?
DF: I would say psychoanalyst! The thing that I'm doing really is empowering the technology platforms to tell me what they need, and actually they're getting more and more IT savvy.
For example, one guy comes to me and says he needs 2Tb, and I say "You liaise with this guy who knows your area better than I do, from the analysis side, and you two come back from lunch and tell me what you need". They come back from lunch and they need 200Gb. He goes, "Oh, well, so-and-so used a really bad file format, and we can cut that data by 90 per cent".
E&T: So it's really not an IT problem anymore - it's a people problem.
DF: We can solve any problem, but it's just talking to people about what the true needs are. You have to really listen to them, and you have to read their minds a little bit. You have to ask what they're trying to achieve - and talk to them gently.
Name: Dean Flanders.
Employer: Friedrich Miescher Institute (FMI) for biomedical research.
Job title: Head of informatics.
Reason for interest: The FMI has achieved its long-term goal of creating an infinitely scalable file system by combining a disk-based virtual tape library from COPAN Systems with a SAN FS product to create its own bespoke storage solution.
Background briefing: Located in Basel, Switzerland, the FMI has a strong record of innovation in the molecular biology of disease. Researching, developing, testing and delivering these medical breakthroughs requires generating, analysing, and retaining huge quantities of data.
Project imperatives: FMI sought a solution for backing up and restoring multi-terabyte file systems while coping with limited power, cooling, and space resources. A traditional hierarchical storage management (HSM) system was out of the question, since the high volume and file size of FMI's life science data meant a tape system would present slow access rates, data integrity issues, and no online access.
Bigger picture: A large proportion of FMI's life science data is generated from microscopy, but new projects led the FMI to seek a storage solution to support a wider range of data. A single new piece of laboratory equipment can radically alter the organisation's storage needs. For example, two new Illumina Genome Analysers are each capable of producing up to 2Tb of data per week.
Vision: Dean Flanders' IT goal is to work with the analysts - known as technology platforms - to enable them to determine their needs, and to liaise with the IT team to get a properly scaled solution. He works with the people, rather than the technology.
What's in a name?
The Friedrich Miescher Institute (FMI) is named after Basel-born biologist Johannes Friedrich Miescher (1844-1895). Miescher isolated various phosphate-rich chemicals, which he called nuclein (now nucleic acids), from the nuclei of white blood cells in 1869 at Felix Hoppe-Seyler's laboratory at the University of Tübingen, Germany, paving the way for the identification of DNA as the carrier of inheritance. The significance of the discovery, first published in 1871, was not at first apparent, and it was Albrecht Kossel who made the initial inquiries into its chemical structure.