A company with a single point of failure is primed for disaster. Sadly, there is one in my department, and it is human.
My senior security engineer is the “go-to” person for anything to do with firewalls, the virtual private network and certain other critical aspects of our security infrastructure. There's no one else. That makes me uncomfortable, but when I have brought it to the attention of upper management, I've been told that there is just no budget for more engineers.
So, how about cross-training one or more of the network engineers to administer the firewalls and VPN concentrators? Nope; the network manager doesn't have enough people to handle network-related work, let alone the added burden of firewall, VPN and SecurID administration.
Working without a net
Therefore, although I know the danger of relying on one person to maintain all knowledge about any aspect of my department's operations, I have my very own single point of failure. Naturally, a single point of failure can't work all the time, and my guy had been working his tail off for the past six months. When enough was finally enough, he asked to take a couple of days off to be with his family. I checked the calendar for upcoming changes to the infrastructure that might need his attention, and the coast looked clear. I let him take three days off. (Actually, I'm not even going to charge him vacation time; he works so hard that I gave him the days as comp time.)
On Wednesday, I received the first call. The manager of our mobility project needed a VPN set up between us and a service provider. This project enables our field service engineers to use BlackBerry devices to access the customer relationship management application on the internal network. Not surprisingly, the setup was needed immediately. Time to roll up my sleeves and get to work.
I've had hands-on experience at different points in my career, but I hadn't touched a Unix console or a firewall in at least a year. As a manager, I spend most of my time on project management, budget issues, personnel problems, policy writing and attending meetings. I simply don't have time for hands-on operational things, and I'm a bit rusty. But with my single point of failure unavailable, I had to make time, rusty or not.
I logged into our partner VPN firewall and attempted to configure the VPN tunnel using the parameters provided by the service provider. Sounds easy enough. But soon I was pulling my hair out as I tried to figure out why the VPN tunnel wasn't being established. I was almost bald when I realised what the problem was: The service provider's Cisco PIX firewall and my company's Juniper NetScreen firewall just don't talk the same language.
This is a well-documented issue, but there's no easy fix, and it took me a while to figure out that the solution lay with what is called "proxy ID," which essentially defines which networks are to be tunnelled. As soon as I configured the proxy ID properly, the tunnel came up, and I was able to successfully pass the proper traffic between three servers on our internal network and several resources on our partner's network.
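To make the proxy ID idea concrete, here is a minimal sketch of the check the two firewalls effectively perform during tunnel negotiation: each side proposes a local/remote network pair, and the tunnel comes up only if one side's pair is the mirror image of the other's. The `ProxyId` class, the `proxy_ids_match` function and every address below are hypothetical illustrations, not values from our configuration or from either vendor's software.

```python
from dataclasses import dataclass
from ipaddress import IPv4Network, ip_network


@dataclass(frozen=True)
class ProxyId:
    """Hypothetical model of one peer's proxy ID: the networks to be tunnelled."""
    local: IPv4Network
    remote: IPv4Network


def proxy_ids_match(ours: ProxyId, theirs: ProxyId) -> bool:
    """Return True when the two proxy IDs mirror each other.

    Our local network must be the peer's remote network, and vice versa.
    """
    return ours.local == theirs.remote and ours.remote == theirs.local


# Illustrative addresses only (assumptions, not real configuration).
# The peer defines the tunnel by specific networks.
peer = ProxyId(local=ip_network("192.0.2.0/24"), remote=ip_network("10.1.1.0/24"))

# Assumed starting point: our side proposes "any to any".
ours_before = ProxyId(local=ip_network("0.0.0.0/0"), remote=ip_network("0.0.0.0/0"))

# After setting the proxy ID explicitly to mirror the peer.
ours_after = ProxyId(local=ip_network("10.1.1.0/24"), remote=ip_network("192.0.2.0/24"))

print(proxy_ids_match(ours_before, peer))  # False: the pairs do not mirror each other
print(proxy_ids_match(ours_after, peer))   # True: the pairs mirror each other
```

A real proxy ID also carries a service or protocol field, which this sketch leaves out for brevity.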
That same day, I received a call from the network operations centre about another VPN problem. Our suppliers were having trouble using a portal we had set up for them to access some of our internal applications. The portal is built on a Juniper SSL VPN concentrator, with RSA SecurID tokens used for two-factor authentication, CA Netegrity for single sign-on, and Microsoft Active Directory for identifying authorisation levels.
Troubleshooting this problem took me several hours. First, I checked the SecurID logs, which indicated that the users were properly authenticating. The SSL VPN logs indicated that users' log-ons had been successful. Nonetheless, we couldn't be sure that the authentication traffic was reaching all the resources; in that regard, the logs weren't very meaningful.
I deployed a Snort sensor on the network segment that was running the supplier portal infrastructure. The network team connected the sensor to the proper SPAN ports, and I monitored the network traffic for indications of activity. That showed me that the SSL VPN concentrator wasn't sending properly formatted packets to the Web portal. This was odd, since the logs seemed to indicate that sessions had been successful. I ended up rebooting the SSL VPN concentrator, which fixed the problem. Then I opened a support call; I'll let my security engineer handle this matter when he gets back.
Oh, how I wish my single point of failure never needed a vacation. But my days in the trenches showed me that he certainly deserved one.
Thank goodness my engineer was gone for just three days. Now that things have calmed down, I can attend to the management of a huge risk-assessment project. We hired a third party to conduct a risk assessment of some of our core applications, including our source-code repository, the product life-cycle management application, an EMC Documentum repository and an application that engineers use to create designs for our products.
I'm most concerned that the consultants have the right focus and that we get the results we need. I don't want to spend upwards of £40,000 for a glorified port-scanning exercise. I want the consultants to spend most of their time on a structured walk-through of the applications, and I want them to do application-specific vulnerability testing. I've given them their marching orders, and they are well under way. The final report could be helpful as I try to obtain additional funds for security infrastructure and personnel.
What do you think?
This opinion is written by a real security manager, "Mathias Thurman," whose name and employer have been disguised for obvious reasons. Contact him at firstname.lastname@example.org.