My Catalyst 9200/9300 Upgrade is Stuck After an SSH Disconnect
SPOTO Cisco Expert

The Scenario: A Summary of the Problem

An engineer was remotely upgrading a Catalyst 9300 switch. They issued the one-shot upgrade command:

install add file flash:cat9k_iosxe.17.09.04.SPA.bin activate commit

This command is designed to perform the entire upgrade process—adding the new software, setting it as the active package for the next boot, and committing the change. The final step is a prompt asking for confirmation to reload the switch.

However, the engineer’s SSH session disconnected for a minute right after the command was sent. When they reconnected, the confirmation wizard was gone. Any attempt to re-run the install command resulted in a frustrating error:

File cannot start new install. Operation is already running.

The engineer was now in a difficult position: the switch was “stuck” in an installation state, and they were afraid to simply reload it, fearing the boot configuration (packages.conf) might be corrupted or empty, leading to a boot loop or a trip to ROMMON.

Why This Happens: Understanding the IOS-XE Install Process

The key to solving this problem is understanding what the commit keyword does. The IOS-XE install process is transactional and has several distinct stages:

  1. Add: The new software package (.bin file) is expanded into its component .pkg files in the flash memory.
  2. Activate: The system prepares the boot configuration for the next reload. This is the critical stage: the activate step rewrites the packages.conf file, which tells the switch which .pkg files to load on the next reboot. This rewrite is completed before the final reload prompt appears.
  3. Commit: This finalizes the changes to the boot configuration, making them persistent.
  4. Reload Prompt: The switch then simply waits for your ‘yes’ or ‘no’ to reboot and complete the upgrade.

When your SSH session disconnected, the first three stages had likely already completed successfully. The switch had updated its boot instructions and was simply waiting on input from a session that no longer existed. This is why the system reports an “operation is already running”—it’s still technically waiting for that final confirmation.
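The staged behaviour described above can be pictured with a toy model. This is only a sketch of the article's explanation, not of IOS-XE internals: the stage names and both helper functions are hypothetical and exist purely for illustration.

```python
# Toy model of the install stages as described in this article.
# The names below are illustrative assumptions, not IOS-XE identifiers.
STAGES = ["add", "activate", "commit", "reload_prompt"]


def pending_stages(last_completed):
    """Return the stages still outstanding after `last_completed`."""
    idx = STAGES.index(last_completed)
    return STAGES[idx + 1:]


def can_start_new_install(last_completed):
    """A new install is refused while any stage is still outstanding."""
    return not pending_stages(last_completed)


if __name__ == "__main__":
    # The SSH session dropped after commit, so only the prompt is outstanding.
    print(pending_stages("commit"))
    print(can_start_new_install("commit"))
```

In this model, a session lost after the commit stage leaves only the reload prompt outstanding, which is why a second install attempt is rejected as already running.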

The Solution: A Step-by-Step Recovery Guide

The good news is that the switch is not in a dangerous state. The “commit” action has already done the heavy lifting. The recovery is a matter of verification followed by a manual reload.

Step 1: Verify the Boot Configuration (Build Your Confidence)

Before you do anything else, verify that the switch is configured to boot the new software. This will confirm it is safe to reload. Connect to the switch and run the following command:

show install summary

This command will likely show the installation in a pending or waiting state. More importantly, check the packages.conf file. This file acts as the bootloader’s instruction manual.

more flash:packages.conf

You should see output that clearly lists the .pkg files from your new IOS-XE version. It will look something like this (version numbers will vary):

#! /usr/binos/bin/packages_conf.sh

# Copyright (c) 2016-2022 by Cisco Systems, Inc.
# All rights reserved.

boot rp 0 0 rp_boot flash:cat9k-rp-boot.17.09.04.SPA.pkg
boot rp 0 0 rp_core flash:cat9k-rp-core.17.09.04.SPA.pkg
boot rp 0 0 ssa flash:cat9k-ssa.17.09.04.SPA.pkg
... (and so on for all packages)

If you see the new version numbers listed here, packages.conf has been rewritten for the new release, which is strong evidence that the activate and commit phases completed. The switch knows which packages to load when it reboots.
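If you capture the output of more flash:packages.conf to a text file, the check above can be automated. The following is a minimal Python sketch. It assumes package filenames follow the name.VERSION.SPA.pkg pattern seen in the sample output; the function names package_versions and boots_only are hypothetical helpers, not part of any Cisco tooling.

```python
import re

# Version embedded in a package filename such as
# cat9k-rp-boot.17.09.04.SPA.pkg (pattern assumed from the sample above).
PKG_VERSION = re.compile(r"\.(\d+\.\d+\.\d+[0-9a-z]*)\.SPA\.pkg$")


def package_versions(packages_conf_text):
    """Return the set of versions referenced by 'boot' lines."""
    versions = set()
    for line in packages_conf_text.splitlines():
        line = line.strip()
        if not line.startswith("boot "):
            continue  # skip the shebang, comments, and blank lines
        filename = line.split()[-1]
        match = PKG_VERSION.search(filename)
        if match:
            versions.add(match.group(1))
    return versions


def boots_only(packages_conf_text, expected_version):
    """True if every versioned boot line points at expected_version."""
    return package_versions(packages_conf_text) == {expected_version}
```

A result of False from boots_only(text, "17.09.04") means the file is empty, still lists the old release, or mixes versions. In any of those cases, investigate before reloading.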

Step 2: Save Your Configuration and Reload

Since the boot variables are correctly set, the only remaining step is the one you were interrupted from completing.

First, as a best practice, save your running configuration:

copy running-config startup-config

or

write memory

Now, manually reload the switch:

reload

Proceed with the confirmation. The switch will now reboot, load the new software packages as directed by packages.conf, and complete the upgrade.

Step 3: Post-Upgrade Verification and Cleanup

Once the switch is back online, SSH into it and verify that the upgrade was successful.

show version

In install mode, the “System image file is” line should read flash:packages.conf, so confirm the running release from the “Cisco IOS XE Software, Version” line instead.

show install summary

This should now list the new version with state C (Activated & Committed).
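The show version check can also be scripted against captured output. The sketch below assumes the output contains a line beginning with “Cisco IOS XE Software, Version”; the helper names running_version and upgrade_succeeded are hypothetical.

```python
import re


def running_version(show_version_text):
    """Extract the release from the 'Cisco IOS XE Software, Version' line.

    Returns None if no such line is present in the captured output.
    """
    match = re.search(
        r"^Cisco IOS XE Software, Version (\S+)",
        show_version_text,
        re.MULTILINE,
    )
    return match.group(1) if match else None


def upgrade_succeeded(show_version_text, expected_version):
    """True if the running release equals the release you intended to install."""
    return running_version(show_version_text) == expected_version
```

A result of None from running_version means the expected line was not found. Treat that as "unknown" and check the captured output by hand rather than assuming the upgrade failed.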

Finally, it is a critical best practice to clean up the old, inactive software files to reclaim valuable space on your flash storage.

install remove inactive

Confirm the removal, and the process is complete.

Best Practices for Safer Remote Upgrades

To avoid this stressful situation in the future, follow these tips for any critical remote operation:

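  1. Use a Scheduled Reload: Before starting the upgrade, schedule a failsafe reload. If you lose access for any reason, the switch reboots on its own when the timer expires and discards any unsaved running-config changes. Note that this restores the saved startup configuration only; it does not undo a change already written to packages.conf.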

    • reload in 15 (Reloads in 15 minutes)
    • reload at 23:30 (Reloads at a specific time)
    • If the upgrade is successful, you can cancel the scheduled reload with reload cancel before initiating the final upgrade reboot.
  2. Use a Terminal Multiplexer (Screen/Tmux): If you are working from a Linux/macOS bastion host, run your SSH session inside a terminal multiplexer like screen or tmux. If your local connection to the bastion host drops, the session on the host remains active. You can simply reconnect and re-attach to your session, which will be exactly where you left it.

  3. Out-of-Band Access: For mission-critical infrastructure, always have a reliable out-of-band (OOB) management plan, such as a console server. This gives you direct console access to the device, completely independent of the production network, allowing you to recover from almost any situation.
