Colleges Need Write Access to Switches

Introduction

The local colleges within Colorado State University need write access to their switches in order to speed up day-to-day changes, development of solutions at the network level, and incident response. This does not mean the colleges fail to understand the importance of consistency across the environment; rather, there is room for compromise. Having a subnet manager actively manage the college's environment in compliance with campus standards and communication techniques can offset much of the day-to-day burden of minor changes, ease the implementation of LAN-scale network devices, and speed troubleshooting of issues inside the LAN and across the LAN/WAN link.

Day-to-Day Changes

One of the most common tasks for a subnet manager is to make ports active for new or moved devices. Edge ports that only ever carry one subnet can be pre-configured with the needed VLAN. However, in buildings with multiple subnets and devices that move with people (such as printers), communication breaks when a device with a static IP on subnet A is plugged into a port on subnet B. It is far more efficient to change the VLAN of the port so that the device is on the correct subnet than to re-IP the device and then re-map it for everyone who might use it.
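The subnet A / subnet B mismatch described above can be checked mechanically before anyone touches a port. The following is a minimal sketch, not a tool in use on campus: the VLAN IDs, the address ranges (taken from the documentation-only prefix 192.0.2.0/24), and the function names are all assumptions made for illustration.

```python
import ipaddress

# Hypothetical VLAN-to-subnet map; the IDs and prefixes are illustrative only.
VLAN_SUBNETS = {
    10: ipaddress.ip_network("192.0.2.0/25"),    # stands in for "subnet A"
    20: ipaddress.ip_network("192.0.2.128/25"),  # stands in for "subnet B"
}


def vlan_for_address(static_ip):
    """Return the VLAN ID whose subnet contains static_ip, or None if none does."""
    address = ipaddress.ip_address(static_ip)
    for vlan_id, subnet in VLAN_SUBNETS.items():
        if address in subnet:
            return vlan_id
    return None


def port_needs_vlan_change(static_ip, port_vlan):
    """True when the device's static IP belongs to a different VLAN than the port."""
    correct_vlan = vlan_for_address(static_ip)
    return correct_vlan is not None and correct_vlan != port_vlan
```

A printer addressed in the first range and plugged into a port on VLAN 20 would be flagged, and the subnet manager would then move the port to VLAN 10 rather than re-address the printer.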

Multiple VLANs can be very helpful for deploying private networks that isolate communication within a dedicated management cluster for Hyper-V, VMware, or SharePoint; for implementing NAT as assignable public IP address space runs out; and for shunting traffic through a system by having it act as a layer 2 gateway. All of these can be done by a trained subnet manager working closely with the service owners to detect, diagnose, and resolve issues without generating work for NOC.

Development of Solutions at the Network Level

When it comes to LAN management, an attentive college should be working with monitoring tools to answer questions like “Are my devices online? When was my last configuration backup? Can it be automated? What do my normal network traffic trends look like? What traffic causes large spikes? Who is doing the most talking? Are they talking as expected?” Colleges should also try to answer security questions like “Can I associate traffic with a user, not just with an IP address? How can I protect my servers from malicious clients on the same subnet? How can I throttle and block traffic from users dynamically without incurring administrative overhead? How do I slow or stop a virus outbreak?”
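As one illustration of the "Who is doing the most talking?" question, the sketch below totals bytes per source address from a list of flow records. The record layout (source, destination, byte count) and the sample addresses are assumptions; a real deployment would read this data from whatever flow or counter export the switches provide.

```python
from collections import Counter


def top_talkers(flow_records, count=3):
    """Sum bytes per source address and return the heaviest senders first."""
    bytes_by_source = Counter()
    for source, _destination, byte_count in flow_records:
        bytes_by_source[source] += byte_count
    return bytes_by_source.most_common(count)


# Hypothetical flow records: (source, destination, bytes).
sample_flows = [
    ("192.0.2.10", "192.0.2.50", 1200),
    ("192.0.2.11", "192.0.2.50", 300),
    ("192.0.2.10", "192.0.2.51", 800),
    ("192.0.2.12", "192.0.2.50", 500),
]
```

Comparing the result against a known baseline is what turns the list into an answer to "Are they talking as expected?"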

NOC has already built several monitoring tools (for example, Nagios, RANCID, and Cacti) that answer several of these questions. However, not all of them are accessible to the different colleges today. Where possible, existing services built by NOC can be leveraged. However, if a college wants to re-deploy an existing service or deploy a new service that is not available through NOC, then access to the switches is needed.

Incident Response

Having a clean, systematic, documented environment helps determine what is versus what should be, a critical step in troubleshooting errors. If a college is going to ask to retain write access to the switches, it takes on the responsibility of building, maintaining, and communicating accurate documentation. In the event of an issue that spills out into the rest of campus, it is also the college's responsibility to work with NOC to solve the issue. Speed of resolution is important in preventing people from going home due to an outage. To this end, close communication, clean and correct documentation of configurations and topologies, and write access to the switches all help at the college level.

Write access to the switch helps by making the ‘show running-config’ command available, which cleanly lists how the switch is configured. Write access also allows the use of monitor ports to sniff traffic, a useful troubleshooting technique. If the college’s subnet manager finds a simple error in their environment, they can also correct it directly. Calling or emailing NOC to correct it would slow down resolution.
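Comparing what is with what should be can be partly automated once the running configuration has been captured as text. The sketch below uses Python's standard difflib module to list the differences between a documented configuration and a captured one. The function name and the idea of storing both as plain strings are assumptions for illustration; how the text is retrieved from the switch is left out.

```python
import difflib


def config_drift(documented, running):
    """Return unified-diff lines between documented and running configuration text."""
    return list(
        difflib.unified_diff(
            documented.splitlines(),
            running.splitlines(),
            fromfile="documented",
            tofile="running",
            lineterm="",
        )
    )
```

An empty result means the switch matches its documentation; any other result gives the subnet manager and NOC the same concrete list of differences to discuss.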

NOC’s Concerns

NOC’s concerns include configuration change management, monitoring, network security, and documentation. With some changes having campus-wide and life-saving implications, these concerns should not be casually ignored. If the colleges are not allowed write access, a compromise is needed that lets the colleges remain as flexible as they would be with it.

Documentation

Each college should share documentation of its infrastructure with NOC, written consistently and in a form that is uniform with the rest of campus. Monitoring key links to gather information then becomes much simpler, and it is easier to tell what is really going on. A college’s subnet manager and NOC should work together to assemble that information, because what is important to a subnet manager may not be immediately important to NOC.

Escalation Process

Though it may seem obvious, below is a generic escalation process that may be used for contacting NOC about issues. During incidents, the subnet manager and NOC should have an open line of communication about what they each see, what implications that may have, what each party is doing, and what needs to happen. Once the issue is solved, a report covering the problem, the resolution, and steps for possible prevention should be issued to the relevant parties. As seen, the subnet manager should be able to take some actions to help diagnose and solve the issue, such as disabling ports or reloading a switch. Having write access improves the subnet manager’s ability to take care of issues inside the college.

Day-to-Day Changes

Day-to-day changes should be small changes like VLAN configuration and naming ports. If the subnet manager does not have write access, then these tasks need to be done by NOC. If indirect access can be given through a tool like ProCurve Manager Plus, or through a custom tool that restricts modifications to a set of pre-approved actions, then colleges can be as agile as required.
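The core of such a custom tool would be an allow-list check applied to every requested change before it reaches a switch. The following is a minimal sketch under stated assumptions: the two command patterns are illustrative stand-ins for "VLAN configuration and naming ports" and are not taken from any vendor's command reference, and the function name is hypothetical.

```python
import re

# Hypothetical allow-list; the patterns are illustrative, not exact vendor syntax.
APPROVED_PATTERNS = [
    re.compile(r"vlan \d{1,4} untagged \S+"),  # move a port into a VLAN
    re.compile(r"interface \S+ name \S+"),     # give a port a friendly name
]


def is_pre_approved(command):
    """True only if the whole command matches one of the approved patterns."""
    normalized = " ".join(command.split())  # collapse repeated whitespace
    return any(pattern.fullmatch(normalized) for pattern in APPROVED_PATTERNS)
```

Matching the entire command, rather than searching for an approved fragment inside it, keeps a request such as an approved VLAN change followed by a reload from slipping through. NOC would own the pattern list, so the set of permitted actions remains under central control.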

Development of Solutions

As colleges grow, they will need to deploy solutions at the network level such as network firewalls, Intrusion Detection and Prevention (IDS/IPS) systems, and Hardware Load Balancers (HLBs). Designing, testing, and implementing these types of solutions requires non-trivial amounts of time and effort. Involving NOC in all of these projects, while informative from a campus perspective, may mean time-sharing NOC resources and slowing down deployment of both college and non-college projects. Having write access to switches will remove that deployment limitation.

Training the Subnet Managers

To mitigate inaccuracies in change management, it may be beneficial to train the subnet managers to meet NOC’s standards in communication, monitoring, and troubleshooting. While this improves the quality of the change management and communication process, it does not eliminate the increased number of people making changes or the potential for inaccuracies; it only helps limit the scope of those inaccuracies.

Below is a generic change management process that trained subnet managers can use for day-to-day tasks as well as for development of solutions. The most complex step is grasping the implications and scope of a proposed change; it is also the step most likely to cause errors that span campus.

Conclusion

The issues NOC is facing include scale, consistency, communication, documentation, and security. While having sole write access to the switch environments is one way to solve those issues, it incurs the burden of the daily tasks and projects that subnet managers were already doing. To avoid placing the burden of work the subnet managers do on NOC and slowing down campus projects, allow the subnet managers to continue to do the work in conjunction with NOC’s standards.