
Disaster Recovery Planning for Banks and Credit Unions - Tips from a CIO

 

Video Transcript:

Giovanni Tropeano   0:08 
Hi, I'm Giovanni Tropeano with the Summit Technology Consulting Group, and we're talking disaster recovery planning for banks and credit unions. With me today is Michael McGovern. For the past year, Michael has been the head of the virtual CIO practice at Summit Technology Group. He has 20 years of experience as a banking CIO, CTO and Information Security Officer. He now helps community banks and credit unions across the United States tackle their most difficult IT operations and data security challenges. 
Good morning, Michael, I am happy to have you with us. 

Michael McGovern   0:39 
Good morning, Gio. Thank you so much for inviting me to this podcast. 
Really looking forward to the conversation around disaster recovery. 

Giovanni Tropeano   0:47 
As am I! Let's get into disaster recovery planning for banks and credit unions. We live in a very IT-dependent world today, especially in banking. Ransomware and cyber-attacks seem to be in the news daily. In your experience, what is the top disaster recovery misconception in banking today? 

Michael McGovern   1:11 
I think the number one misconception, at least based on my 20 years of experience, is that the IT team owns disaster recovery and business continuity. IT does play a really important role, because IT is responsible for assisting in recovering all your systems, but it really is an enterprise-wide responsibility. And it doesn't just take technology into consideration; it takes people and processes into consideration as well. So, it's much more than information technology and the IT people bringing up systems. 

Giovanni Tropeano   1:46 
Drawing on your experience as a CIO, CTO and Information Security Officer, how would you recommend going about building a disaster recovery plan? 
Where would you recommend starting? 
What would be your focus? 
Who would you involve? 
What suggestions can you provide? 

Michael McGovern   2:10 
I think first and foremost, it starts at the top. You really need to get buy-in from the senior management team as well as your Board of Directors; you need that type of backing to have a successful disaster recovery plan. And truthfully, disaster recovery plans need to be a priority. They always need to be top of mind, and, at least in my opinion, it's important for organizations to make disaster recovery part of each individual's annual performance review. 

In my 20 years of experience, I've always seen disaster recovery treated as a fire drill. You schedule an enterprise-wide DR test, and then weeks before it, you're reaching out to the business lines to start building out their “day-in-the-life” scripts and making sure they are current, because they use those scripts for testing. Back in the day, we used to actually have what was called “the business line emergency box,” and it had things in it like paper forms, pencils, rulers, typewriters and other necessary items. We used to store that at our co-location. Those are the kinds of things that I think should always be thought of and always be top of mind. It really should be each manager's responsibility to make sure each of their employees understands the disaster recovery plan and what their role is in it. Because at the end of the day, if you ever have to declare a disaster, you don't want to have to think; you want to be able to just recover in a timely manner and keep the business afloat. So those things are really important. 

I think making disaster recovery “fun” is really important, and there are a lot of different ways of going about it. There are disaster recovery games, and a lot of organizations run table-top exercises. I think you need to have those enterprise-wide as well as within your departments. And if you don't want to go out and buy a tool to gamify your disaster recovery testing, CISA actually has examples of table-top exercises on its website that you can use within your organization. 

I know in IT we use many different acronyms. :) 

CISA stands for Cybersecurity and Infrastructure Security Agency, so you can look them up on the Internet and go to their website to find their table-top exercises.
(FS-ISAC also runs disaster recovery (CAPS) exercises. All banks and credit unions should be members of this organization, the Financial Services Information Sharing and Analysis Center.)

Another thing I think is important is not to let technology drive your disaster recovery and business continuity plan. 
 
You want to do what's called a “business impact analysis.” That means evaluating all of the potential risks that might affect your organization. You brought one up a few minutes ago: ransomware. But there are other types of risks that you need to take into consideration.

Being here on the East Coast, we have hurricanes and Nor'easters, so you must look at every type of potential risk that is out there, risk-rate each one and make sure that your disaster recovery plan protects you from those risks. 

And lastly, if you don't have technology in place today to do this, I would recommend using a simple Excel spreadsheet for your business impact analysis. 
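The risk-rating step of a business impact analysis is usually kept in a spreadsheet, as suggested above. The following is a minimal sketch of the same arithmetic in Python, assuming a common 1-to-5 likelihood scale and 1-to-5 impact scale with score = likelihood × impact. The `Risk` fields, the example scores and the priority threshold of 15 are hypothetical illustrations, not figures from this interview.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    """One row of a business impact analysis worksheet (hypothetical fields)."""
    threat: str
    likelihood: int  # assumed scale: 1 (rare) to 5 (almost certain)
    impact: int      # assumed scale: 1 (minimal) to 5 (severe)

    @property
    def score(self) -> int:
        # Likelihood x impact rating, ranging from 1 to 25
        return self.likelihood * self.impact


def rank_risks(risks: list[Risk], threshold: int = 15) -> list[tuple[str, int, bool]]:
    """Sort threats by score, highest first, and flag those at or above the threshold."""
    ranked = sorted(risks, key=lambda r: r.score, reverse=True)
    return [(r.threat, r.score, r.score >= threshold) for r in ranked]


if __name__ == "__main__":
    worksheet = [
        Risk("Ransomware", likelihood=4, impact=5),
        Risk("Hurricane", likelihood=3, impact=5),
        Risk("Nor'easter", likelihood=4, impact=3),
        Risk("Vendor network outage", likelihood=3, impact=3),
    ]
    for threat, score, priority in rank_risks(worksheet):
        print(f"{threat:<24}{score:>3}  {'PRIORITY' if priority else ''}")
```

The flagged rows are the ones whose recovery plans deserve attention first; the scales and threshold should come from your own organization's risk appetite rather than from this sketch.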

There are technologies out there that incorporate people, processes and technology, and there are cloud-based platforms that allow you to build that out within your organization. 

Once you do build out your business impact analysis, then you can look to see if the technologies that you have in place can meet your risk appetite.  

A few other metrics and acronyms I want to throw out there: 

  • RTO – Recovery time objective (RTO) is the target amount of time a business has to restore its operations to an acceptable level after a disaster in order to avoid intolerable consequences from the interruption. 
  • RPO – Recovery point objective (RPO) is your goal for the maximum amount of data the organization can tolerate losing, usually expressed as a window of time before the disaster. This is where I find organizations struggle, because they do not know how to recover from data loss. 
  • MAD – Maximum Allowable Downtime (MAD) is the amount of time a mission/business process can be disrupted without causing significant harm to the organization's mission. 
  • MTD – Maximum Tolerable Downtime (MTD) is the amount of downtime allowed before the entire business is at risk (e.g., the company may go out of business). 
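These metrics become useful when you compare them with what a DR test actually achieved. The sketch below, in Python, assumes each business process has its RTO, RPO and MTD recorded in minutes, and that each test records how long recovery took and how much data (in minutes) was lost. The class names, field names and numbers are hypothetical; MAD could be checked per process in the same way as MTD.

```python
from dataclasses import dataclass


@dataclass
class Targets:
    """Recovery targets for one business process, in minutes (hypothetical)."""
    rto: int  # recovery time objective
    rpo: int  # recovery point objective: maximum tolerable data-loss window
    mtd: int  # maximum tolerable downtime; should be >= rto


@dataclass
class TestResult:
    """What a DR test actually achieved, in minutes (hypothetical)."""
    recovery_time: int  # from outage start to an acceptable service level
    data_loss: int      # window of data that could not be recovered


def evaluate(targets: Targets, result: TestResult) -> list[str]:
    """Return findings for one test; an empty list means every target was met."""
    findings = []
    if targets.rto > targets.mtd:
        findings.append("RTO exceeds MTD: targets are inconsistent")
    if result.recovery_time > targets.mtd:
        findings.append("recovery exceeded MTD: business at risk")
    elif result.recovery_time > targets.rto:
        findings.append("recovery exceeded RTO")
    if result.data_loss > targets.rpo:
        findings.append("data loss exceeded RPO")
    return findings


if __name__ == "__main__":
    core_banking = Targets(rto=240, rpo=15, mtd=1440)
    print(evaluate(core_banking, TestResult(recovery_time=300, data_loss=10)))
    # ['recovery exceeded RTO']
```

Running a check like this after every test, whether enterprise-wide or a mini-disaster, turns each test into a documented gap list rather than a pass/fail impression.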

But most importantly, you need to test, test, test. Not too long ago, I worked for a credit union that only did one enterprise test a year. We ended up going from one enterprise test a year to over 100 tests a year, covering both mini-disaster and major disaster scenarios. 

A mini-disaster (they happen much more often than major disasters, thankfully) is when you lose an application or a vendor network. The goal is to understand the impact it has on the business and how you need to recover from that particular mini-disaster. 
We did a lot of different types of testing over the years. 
We did surprise testing. 
We did scheduled testing. 
Most importantly, make sure that all employees within your department understand your department's disaster recovery plan, because at the end of the day, if only one or two people understand that plan and they are not available during the disaster, you're going to have a hard time carrying out your day-to-day business activity. 

I did this a lot in my 20 years of working in information technology. Every one of my IT personnel understood the DR plan, and they were all involved in the disaster recovery testing, whether it was the enterprise test or any of the mini-disaster tests that we did throughout the year. 

Giovanni Tropeano   8:47 
It's like muscle memory, right? The more you test. 

Can you share a little bit more about that? What does a dry run look like? 
You mentioned you went from one in a year to 100 in a year. 

What baseline do you recommend for how often a test should be done? Could you shed some light on that? 

Michael McGovern   9:06 
It really depends on the organization and the organization’s understanding of how much time and energy it takes to do all these different tests.  That's really where we start talking again about having that buy-in from senior management and the Board of Directors to make sure that each department has the time to do those tests.  So there's no silver bullet there, to be honest with you. 

But at least from the IT perspective, what mattered most to me was knowing that everybody on my team understood how to run certain tests and felt comfortable running them. That meant not only doing the testing, but also cross-training on a day-to-day basis. Because there are so many IT operational technologies in place today, not everyone has the opportunity to use them every day and feel comfortable with them. We ended up doing a lot of cross-training within the IT team and having the team run through the day-in-the-life scripts during these DR tests so they could feel comfortable using the technology and recovering with it. 

Giovanni Tropeano   10:19 
Thanks for shedding light on that. The last question I have for you, Michael: in your opinion, would you consider having a disaster recovery plan and a business continuity plan to be like an insurance policy? 

Michael McGovern   10:35 
I do. Just like a cybersecurity insurance policy, which you want to have in the event of a cybersecurity incident, DR works the same way. You hope you never have to use it, but in the event that you do, you want to be prepared and able to execute, so that you can keep the organization you're working for up and running and operational, or at least able to recover. 
 

And I have a great story around that. We used a third-party organization that did a lot for us from a disaster recovery and business continuity perspective. We actually used them for our co-location environment, so they provided us with power, cooling and security along with datacenter space so that we could put some of our own on-prem equipment on-site there. That was back when on-prem was cost-effective, so that was one of the services they provided us. 

They provided us seating, so if we lost our operations center or another major location, we could move people from those locations to this co-location, which had seats, computers and phones. They also allowed us to do “quick ships,” so if we needed to declare a disaster, we were able to quick-ship laptops, computers and phones to any location. 

We actually enacted that particular service when COVID hit because where I worked, we didn't really have a remote or telework program. Some people had laptops, but most employees did not. So, we enacted that plan. We had laptops shipped to our main facility, loaded them with our desktop image, trained people how to use VPN technology and then sent them home. 

We were able to go from a “work-in-the-office” organization to a “work-at-home” organization in a three-week period. 
That was one piece of the overall plan we had with this particular vendor, and we actually needed to enact it and use it. 
Lastly, they provided mobile trailers that could be retrofitted into a retail branch, so if we lost a retail branch, we would be able to drop-ship one of these trailers and have it retrofitted as a retail branch within a 24-to-48-hour period. 

So one day I wanted to actually bring the Board of Directors and the senior management team to this particular location because I wanted to show them what the money was being spent on. 
We actually ended up having a meeting scheduled with some of the senior management team of the third-party vendor along with the senior management team and the Board of Directors of the credit union. 
We were all sitting in this mobile trailer and one of the board members needed to use the facilities. 
So I ended up escorting the board member over to the building where the facilities were located and as we were walking over there, he literally said to me,  

“Geez, this is a really nice complex. I really like the idea of all the different products and services that this organization was providing us, but it's really expensive. This is just like having an insurance policy.” 

And I said to him, “Yeah, you're right, but if you have to enact it, at least we have a plan in place and we can actually execute on that plan.” 

So, I ended up taking him to the facilities. We came back, and we were sitting in front of the senior management team and the board, and I actually started off the presentation by saying, “Hey, we had a whole game plan here today, guys, but we've had to change that game plan, because a global contact center has just declared its second disaster in six weeks, and that global contact center is actually moving here to the third-party location and setting up shop.” 

Now, this particular organization, since they're a global contact center, telecommunications is really important to them. Lo and behold, whatever they were doing in that location, where their contact center was housed, they had telecommunication issues two times in six weeks. 

So yes, it is an insurance policy. 

Yes, it's sometimes looked at as an expensive one, but if this particular global contact center didn't have a solution, they would have gone under. They would no longer exist today, so I think it's really important for boards to understand the value of DR. Yes, it's going to cost us some dollars, but in the long run, if we ever have to use it, we have a plan in place and we know we can execute that plan. 

The other thing I want to do is make a recommendation to everybody listening to this. There are government programs out there today that you should be aware of. One is called CEAS, which stands for Corporate Emergency Access System. It allows you to designate certain people within your organization as mission-critical. 

So if you have a disaster at one of your facilities, or a Nor'easter is happening and you need somebody to get into a facility to do something really important for the bank or the credit union, that individual is given what's called a CEAS card. It shows the individual's picture, name and employer, so they can be on the road and/or get to a location. 
 
At the end of the day, even with that card, it's really up to the emergency response personnel, whether police or fire, to either allow you into a facility or not. It becomes a liability, but it also gives you the ability to have some of your key personnel go into certain locations when you need to retrieve important documents or do whatever else needs to be done within the organization at that point in time. 

There are also a couple of other programs: 

One is called WPS (Wireless Priority Service). It allows you to register your corporate mobile devices for higher priority, so that if there's a telecommunication issue and the phone lines are jammed, your calls get higher priority than those of somebody who is not registered with WPS. 

There's another program called GETS (Government Emergency Telecommunications Service). It's the same idea, but for wireline phones. You get what's called a GETS card, which has your own unique code along with an identification number that you enter before making a phone call, so if you are working from home and there are telecommunication issues in your region, your calls still get priority. 

And there's a third one, which has a lot of requirements. It's called TSP (Telecommunications Service Priority), and it gives your organization higher priority if your telecommunication lines are completely destroyed and you need to get them back up as quickly as possible. I think one of the requirements is that you must process something like billions of dollars in ACH and wire transactions (don't hold me to this) to even be eligible to sign up for TSP. 

So again, go to the Department of Homeland Security, FBI, CEAS and CISA websites for more information. 
The CISA website has information about WPS, GETS and TSP. Find out what's out there free of charge that you can implement within your organization. It puts you one step ahead of anybody who has not signed up for these services. 

Giovanni Tropeano   19:35 
Really insightful, Michael. These have been great jumping-off points for anyone looking for guidance and details to bulletproof their plans, so this has been really helpful. 

Any other thoughts that you want to leave us with? 

Michael McGovern   19:54 
Test, test, test! I think testing is really important within organizations because you want to make sure it's second nature to your team to be able to recover. 

And Gio, I know you'll end up posting this on social media. Once it's posted, I'll add some additional links so that you can get to the CEAS page and some of the other pages that I think are really critical for people who are just starting out with disaster recovery or who want to build out their disaster recovery plan with more detail and more capability. 

And you can always reach out to me on LinkedIn. 
I'm happy to have those conversations with not only banking and credit union individuals, but I think building a DR plan is important for every business vertical that's out there. I appreciate your time and I appreciate having the conversation. 

Giovanni Tropeano   20:47 
We appreciate your time and your insights, Michael, super helpful. 
For those of you listening or reading, we hope you found this helpful and insightful. 
You can follow Summit Technology Group on LinkedIn and subscribe for updates, new intel and articles.

We will see you next time. Enjoy your day! 
