DUKE ITAC - August 31, 2006 Minutes
Aug. 31, 2006
Members present : John Board, Shailesh Chandrasekharan, Tammy Closs, Wayne Miller for Dick Danner, Brian Eder, Audrey Ellerbee, Nevin Fouts, Susan Gerbeth-Jones, Michael Gettes, Michael Goodman, Daron Gunn, Billy Herndon, Bob Newlin for David Jamieson-Drake, Julian Lombardi, Roger Loyd, Dan Murphy, Kyle Johnson, George Oberlander, Lynne O’Brien, Mike Pickett, Molly Tamarkin, Christopher Timmins, Trey Turner III, Tom Wall, Robert Wolpert
Guests: Rick Hoyle, Psychology and Neuroscience; Jay Senerchia, Procurement; Feri Zsuppan, A&SIST; Stephen Galla, Fuqua; Kevin Davis, OIT; Elliott Wolf, Duke Student Government; Ginny Cake, OIT; Pat Driver, OIT
Start time : 4:07
I. Review of Minutes and Announcements:
Introduction of everyone present, welcoming of new members.
- Tammy Closs – On Monday the operating system for DukeMail had a critical outage. We did not lose any email, but we had significant delays. We were caught up by 5:30 or 6 p.m. The vendor was already aware of the problem and had developed patches. We’re confident we have it under control and we will be applying the patches ourselves.
II. Storage Backup and Recovery Updates -- George Oberlander
George Oberlander – The backup group came about as a result of a Futures Forum which dealt with highly visible failures. It’s meant to give a foundation so you can look and evaluate your own situation and come up with a rational scheme for how to approach managing your data security and recovery. I also wanted to mention the team, which includes Roberta Benton, Pat Driver, Stephen Galla from Fuqua, Kyle Johnson, Bill Wesson, Dr. Yunliang Yu from math, and Feri Zsuppan from A&S.
[George refers to handout] We’re trying to provide a way that you can create a rational backup and recovery system for yourself, so you need to understand the factors that affect backup and recovery decisions, understand the needs for staff and the tradeoffs. You’ll have to make some pretty difficult tradeoff decisions, most likely.
We found that there were really three factors that became foremost in determining backup and recovery needs: the criticality of the data, sensitivity, and the amount of time you could allow in order to restore it.
Criticality is easy to determine for any one piece of data. It either is critical or it isn’t. The problem is, you have to look at your whole infrastructure. I would venture to say that nobody in this room knows all the data you have out there, which means there’s a problem. So in practice criticality can be hard to determine.
The restore and retention times are major factors in your choice of technology. If you have 500 GB and you have to get that restored in half an hour, you have to have the right technology. Retention time rears its ugly head. You start handling a lot of data and someone says you have to restore it three years back, so you’ve got to deal with a lot of cost.
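As a rough illustration of the restore-time point, the sustained transfer rate that a restore window implies can be worked out directly. The sketch below is not from the handout; the function name is hypothetical and it assumes decimal units (1 GB = 1000 MB) and ignores overhead such as tape positioning or verification.

```python
def required_throughput_mb_per_s(data_gb: float, restore_minutes: float) -> float:
    """Sustained rate in MB/s needed to restore data_gb within restore_minutes.

    Assumes decimal units (1 GB = 1000 MB) and no overhead; real restores
    will need headroom beyond this figure.
    """
    if restore_minutes <= 0:
        raise ValueError("restore window must be positive")
    megabytes = data_gb * 1000
    seconds = restore_minutes * 60
    return megabytes / seconds


# The example from the discussion: 500 GB restored in half an hour.
rate = required_throughput_mb_per_s(500, 30)
print(f"{rate:.0f} MB/s")  # prints "278 MB/s"
```

A figure like this can be compared against the rated speed of a candidate disk or tape system before committing to it.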
One thing we found is that backup systems, when they fail, often fail silently. The only way you’re likely to discover this is with a pretty aggressive testing program. Part of the reason for the silence is that backup is something that goes on five or seven days a week, 52 weeks a year. It occurs so often that it has a high chance of failure. But it’s mind-numbing to check the data. Are you going to notice an error message that looks sort of like a success message?
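One way a testing program might catch silent failures is to compare the backup against the source automatically instead of reading log messages by eye. The following is a minimal sketch under stated assumptions, not a description of any system used at Duke: it assumes the backup is a plain directory copy that can be read back, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_backup(source_dir: Path, backup_dir: Path) -> list[str]:
    """List every source file that is missing from, or differs in, the backup.

    An empty list means every source file has a byte-identical copy.
    """
    problems = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_dir)
        dst = backup_dir / rel
        if not dst.is_file():
            problems.append(f"missing: {rel.as_posix()}")
        elif file_digest(src) != file_digest(dst):
            problems.append(f"mismatch: {rel.as_posix()}")
    return problems
```

A scheduled job could call `verify_backup` and raise an alert whenever the returned list is not empty, so that a failure does not depend on someone noticing an unusual message.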
[Onto page 2 of the handout] We get into management responsibilities and system administrator responsibilities. At the Futures Forum, one issue that came out was that it wasn’t clear who was responsible for doing what in some cases. The tech folks were determining the criticality of the data and trying to make financial decisions as well. We felt that it’s management’s responsibility to assess the value of data. Unfortunately that means management has to get involved in what is out there.
There’s also the issue of user education. There needs to be a shared responsibility between systems staff and management to make sure users understand that data needs to be protected. In many cases you don’t want them putting data in places that won’t get backed up. Both management and systems staff need to exert some pressure there.
A few other things: It seems obvious, but backup media shouldn’t be stored in the same place as the protected hardware, with the possible exception of a fireproof safe – although if there’s a fire you might not have access to the fireproof safe. A lot of people are now moving from disk to disk for backup instead of disk to tape, which is good. But if you need a longer retention time, you most likely will need tape.
Then there’s the knotty issue of home computers. There isn’t a simple solution to make sure data is protected. There are three options: 1) don’t put data on a home computer, use remote access into the servers; 2) back up your data yourself; or 3) use one of the new Web-based backup services. Number 1 involves the fewest risks at a modest cost. Number 2 requires technical knowledge many people don’t have. With number 3 we’re not sure about confidentiality issues and there’s the question of who monitors the contracts. Amazon, Google, there’s a whole spectrum of services. They bring up a whole host of management issues. When you sign the contract it may look great, but the services could change in three or six months.
[George refers to the second handout] We wanted to develop a questionnaire for asking if data is critical. When we designed the questions, it became obvious that if you’re looking at a piece of data you should know. The real problem is knowing what data is out there. The same with sensitivity. You have to think about it in all its contexts. The most obvious context may not be sensitive, but others could be.
Recovery time will be the major determinant of technology choices. What we attempted to do is give [in a matrix in the handout] guidance on what kind of issues you should be looking at. [George explains the matrix.] We recommend dual data systems if you can do it, so failure of backup isn’t an issue. For noncritical data it’s nonsensical to say you need it recovered quickly. If slow recovery is an option, you could go to any form. You could go inexpensive.
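The three statements above can be read as a small decision rule. The sketch below encodes only what was said in the discussion; the handout matrix itself is not reproduced in these minutes, so the function name, the two yes/no inputs, and the wording of the results are assumptions made for illustration.

```python
def backup_guidance(critical: bool, fast_recovery: bool) -> str:
    """Return rough guidance for one class of data.

    Hypothetical encoding of the spoken remarks, not the handout matrix.
    """
    if fast_recovery and not critical:
        # "For noncritical data it's nonsensical to say you need it recovered quickly."
        return "revisit the requirement: fast recovery for noncritical data"
    if fast_recovery:
        # "We recommend dual data systems if you can do it."
        return "consider dual data systems so a backup failure is not an issue"
    # "If slow recovery is an option, you could go to any form."
    return "any backup form works; an inexpensive option is reasonable"


for critical in (True, False):
    for fast in (True, False):
        print(critical, fast, "->", backup_guidance(critical, fast))
```

Sensitivity, the third factor named earlier, is left out here because the discussion does not say how it changes the recommendation.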
We will be presenting this at CLAC and CLIF as well. I gave the handouts because for me, when I have something in my hand I tend to pay more attention to it. We’re looking for comments. Please send me email with any thoughts you may have.
Kevin Davis – I was asked to give an update and an introduction on a collaborative project on trying to understand the community’s needs for backup. The service team brings in participants – not necessarily directors – from all the organizations that are involved in delivering a service, and thinks about it from a customer’s perspective. Not from a technologist’s perspective. Who are the customers? What do they need? What’s the value of this service for them?
We’ve found that there is a range of needs on campus. [Kevin refers to a handout] This sheet lists 13 gaps found so far, and I want to emphasize “so far.” We’ve only been at the process for a month, so we haven’t reached out to everyone. We’re looking for opportunities to hear more.
What the service team is trying to accomplish is not necessarily to list everything OIT or Duke is committed to providing, but what would be provided in an ideal world. This list shows what we think people would want if, two years down the line, they looked for services. We’re trying to understand needs from the technologists’ and the customers’ perspectives. From the customer side asking, is this picture right. What’s on here that isn’t really a service need on the campus and what’s not on here that should be here? On the technology side, we could use the list as a starting point for what service gaps we could fill and what the cost would be.
The general themes of what we’ve heard – there have been four major ones. The first is for individuals to have access to personal storage space, whether that’s improved AFS or access to 100 GB for files that could be shared with others. Another thing is personal computer insecurity. People realize they’re dragging personal information around with them. How do we do more to protect what’s on these devices? Also, schools and departmental units see themselves as home for data storage, but they also see a need for a larger infrastructure to help them with storage, maybe allow them to increase their capacity or offload some data. Finally, there’s an interest in more infrastructure for more backup and data recovery abilities.
We’re not looking at this in terms of funds or if OIT would be the source. It’s just an exercise to understand what the unmet needs are and what are the costs and priorities to figure out how to address some of them.
So we are seeing a need for larger network space for individuals; collaboration capabilities for individuals; a digital media commons; areas that are easily searchable; PC backup; the opportunity for researchers to maintain their own servers for personal desires or grant-funded obligations, and how they can back those servers up.
Other services or gaps are things we hear from schools or departments. They say we need someone to provide infrastructure, which includes storage; options for backup and recovery services including standby requirements in the event of a major disaster. Also we see a small need for existing customers who use file storage and backup; an ongoing need for people using digital media, who have increased collaboration but no infrastructure to allow people to move between those media.
So the big ideas are to help make sure we have the right picture, and to have other conversations about feasibility and cost.
John Board – Do you have a sense of where the economy of scale comes in with backup? There’ve been diseconomies of scale in recent years, which have encouraged disunity of backup.
Feri Zsuppan – It’s a very complicated and complex issue. There are new emerging technologies that will help classify data, which will help in better preparing the infrastructure and backup needs. That will help answer the scalability question.
George – It has to play into recovery time, if recovery time is such that they can do it and you have to apportion some degree of backup to disk. One of the failures we have is because the backup scheme is so complicated it gets out of sync.
Molly Tamarkin – Backup technology isn’t changing as fast as consumer technology.
Shailesh Chandrasekharan – If a student comes to Duke, what is the storage available to that student today?
Kevin – 70 MB to the student. Grad students may have more. But people aren’t just working on Word documents anymore. Even if they’re following a personal backup strategy, you can’t do that with that amount of storage. They have up to 2 GB for email.
Kyle Johnson – So the best strategy is to email yourself everything.
Feri – We could do a better job of trying to communicate options for storage and backup at Duke. I have a page from Stanford that lists the resources.
Robert Wolpert – Should we be making this problem harder by grossly expanding what we offer for students?
Daron Gunn – 70 MB is not a functional space for me personally. I need lots of space for engineering. I carry a 4 GB USB key and use that. If you offered more space on AFS, I would venture to say that many people might use that as an alternative.
Elliott Wolf – AFS is useful but a lot of people don’t know how to access or use it. It would be easier if it gets integrated into other things students are doing. We’re doing what we can to publicize how to use it. If it were used it would be positive.
John Board – Let me ask students, does some commercial model get used?
Daron – Something with a convenient Web-browser interface. Being able to go to a Web browser with a simple login and password is crucial. It’s friendly and familiar to people who aren’t too comfortable with computers.
[Comment] – The Web-based services are very attractive to users, so people will be tempted to use them. But the user-level agreements are not very attractive.
John – But we could make our own agreement.
Feri – Yes, the sooner the better would be good. That would help with the home computer storage as well.
John – Any comments from the grad students?
Audrey Ellerbee – I’m in engineering, too, so I tend to have more storage needs than most grad students. I haven’t taken the time to learn Novell.
Mike Pickett – So easy, easy, easy and reliable is what we need.
Audrey – I also keep getting email saying Novell is down, so that doesn’t build my confidence.
Kevin – People need lots of ways to access data and trust it.
Daron – Also, student groups need to be able to get more space easily. Now students need to go to a professor, and the professor has to email OIT and give permission for more space. Then OIT will give more space. That’s stopping people from doing really creative things, even things that aren’t associated with a class.
Audrey – To what extent are these things available when you’re not on campus?
John – One of the best things about Web stuff is, it’s available everywhere.
Molly – Like we talked about in our last meeting, in the absence of a Web interface, a wiki can be that, and we want to think about that.
Elliott – Has there been discussion about making ways for people to share files with each other and share information? When you’re dealing with 70 MB, you can’t do much with that. When you have 10 people working on the same project, you don’t want to have too many versions going around.
[Question] -- What about shared email?
Elliott – I’ve never heard about that.
[Michael Gettes explains shared email.] Michael Gettes – Clearly email is one way of sending files. It’s probably not the optimal way, but I think from this discussion we need to provide better ways of doing that.
Feri – I just wanted to say that I enjoyed watching the operation of the whole effort. I see the discussion first among a tech group, which may not end by just handing out a report, but could go on and have a continuous effort of being the guardian of the effort. I’m hopeful that the IT@duke forum will give us the opportunity to keep going on recovery and backup.
III. PCs and Laptops, End of Cycle Process – Kevin Davis
John Board – People have heard about the new program to get Duke’s old computers out into the community, but we want to know more about the end-of-life cycle before we’re comfortable decommissioning our drill presses.
Kevin Davis – One of Jane Pleasant’s areas of operation was the Duke Surplus Store. She’s been working with Community Affairs in terms of looking at what can be done with all the computers we’re processing – I believe over 5,000 a year, most being destroyed with a drill press. A small percentage gets sold at the surplus store. Now they’re being sent to Durham Public Schools and other nonprofits. They have to go through a disk-wiping process. OIT and other groups have helped get this started.
We have an on-the-ground product called EBAN, which is one letter short of DBAN. It has a server that lets you plug in up to 1,000 computers simultaneously. It audits the hard drives and returns a message about whether the wipe succeeded or failed. Those machines are getting wiped and logged.
In the two months that Jerry Winegarden has been working on this, 972 computers have been processed, 574 have been wiped and others were deemed too old or they had deteriorated. We’ve donated 439 machines, all but three to Durham Public Schools. The machines fit into their lifecycle. They’re working on preparing the machines and getting them started. It’s a phenomenal way for Duke to contribute to the community.
Right now the serial numbers are being logged manually. Our team is working with technologists on the procurement website and EBAN to get a customized version that can scan a barcode when the machine is picked up. This will allow a single scan and get the information in the record. It also will allow a scan for when the computer leaves surplus. On exit it will be scanned again to say yes, this was wiped. Over time it will be a process owned entirely by surplus.
Jay Senerchia – The process is to call procurement and we will come pick up the computer.
Robert Wolpert – I’m happy now with what happens with it, but I’m nervous about what happens between when it leaves my premises and the time it goes through the wiping process.
Kevin – The Surplus Store is no longer outsourcing transit. It’s just Surplus Store staff.
Tracy Futhey – The goal is to have a chain of custody.
Kevin – It will be handled by Duke employees. The other piece that happens is, you still input your own system into the surplus database. It counts how many leave a site and how many arrive at the Surplus Store. If a machine drops out, there is a data trail. I think there is a much greater sense of security with using Duke staff than outsourcing. We mostly have been seeing Pentium I and Pentium II, so some part of the process is getting rid of that old stuff to allow more room to make the process more secure.
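The counting described here amounts to reconciling lists of serial numbers. The sketch below is illustrative only: it assumes each stage (pickup, arrival at the Surplus Store, wipe) yields a set of scanned serial numbers, and the function and key names are hypothetical, not taken from the surplus database or the wiping product.

```python
def reconcile(picked_up: set[str], arrived: set[str], wiped: set[str]) -> dict[str, set[str]]:
    """Compare serial numbers across custody stages and report discrepancies."""
    return {
        # Left a site but never logged at the Surplus Store.
        "lost_in_transit": picked_up - arrived,
        # Logged at the Surplus Store with no matching pickup record.
        "unexpected_arrivals": arrived - picked_up,
        # Arrived but has no successful wipe on record.
        "arrived_not_wiped": arrived - wiped,
    }


# Made-up serial numbers, for illustration only.
report = reconcile(
    picked_up={"SN001", "SN002", "SN003"},
    arrived={"SN001", "SN002"},
    wiped={"SN001"},
)
print(report["lost_in_transit"])    # {'SN003'}
print(report["arrived_not_wiped"])  # {'SN002'}
```

Any non-empty set in the report marks a machine to follow up on before it leaves surplus.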
Feri – Will the process be documented so we can show it to agencies? For sensitive data, the VA or other agencies might be interested in how we are handling disposal.
Kevin – Having the process, and also lessons learned in terms of what we’ve done, to get a level of confidence.
John – So you have confidence in EBAN and DBAN?
Kevin – There is confidence.
John – So they’re wiped when they leave. Are we installing Windows XP? That’s very labor-intensive.
Kevin – No. Durham schools can take the computers we give them and get them up and running.
Daron – That brings up another question. If a student’s machine goes to Computer Repair and they find that the hard drive is bad, what happens with that hard drive?
Robert – Who knows what’s on that disk and who’s working there?
John – There used to be a fee for disposing of a computer. Well, I’m much reassured from what I’ve heard, but I wanted to hear that level of detail.