Cisco IronPort Email Security Appliance Reporting Data Corruption

My IronPort Email Security Appliance recently lost power unexpectedly, after which it began sending an hourly alert email containing the following:


The Critical message is:

An application fault occurred: ('aggregator/master_aggregator.py _process_export_files|605', "<class 'reporting.aggregator.master_aggregator.ExportProcessError'>", '', '[aggregator/master_aggregator.py main|272] [aggregator/master_aggregator.py watch_incoming_queue|401] [aggregator/master_aggregator.py _process_export_files|605]')
After some investigation, I noticed that the web interface on the appliance was also missing the normal reporting data. The resolution ended up being quite simple in my case: delete the reporting database. Be warned that this deletes all of your reporting data. Here are the steps:
  1. SSH to the IronPort appliance
  2. Enter diagnostic mode by typing “diagnostic”
  3. Then enter reporting mode by typing “reporting”
  4. Finally type “deletedb”
  5. You’ll be prompted to confirm that you do indeed wish to delete the reporting database

This process took a few minutes on my lab system and then reporting started working again.

Here’s what the process looked like:


ironport.intsystek.com> diagnostic

Choose the operation you want to perform:
- RAID - Disk Verify Utility.
- DISK_USAGE - Check Disk Usage.
- NETWORK - Network Utilities.
- REPORTING - Reporting Utilities.
- TRACKING - Tracking Utilities.
- RELOAD - Reset configuration to the initial manufacturer values.
- SERVICES - Service Utilities.
[]> reporting

The reporting system is currently enabled.

Choose the operation you want to perform:
- DELETEDB - Reinitialize the reporting database.
- DISABLE - Disable the reporting system.
[]> deletedb

This command will delete all reporting data and cannot be aborted. In some instances it may take several minutes to complete. Please do not attempt a system restart until the command has
returned. Are you sure you want to continue? [N]>Y

Reseting reporting data......
The reporting system is currently enabled.

Hosting a DNS Server for the NTP Pool Project

If you’re reading this I’m guessing you already know what NTP (Network Time Protocol) is, but as a quick refresher, it’s a simple network protocol to sync time of a device to a reference clock.

I’ve been a huge fan of the NTP Pool Project, which offers anyone, including network operators, end users, and even device manufacturers, a globally distributed and highly resilient NTP time source.

In the past I hosted NTP servers, but in the days when unpatched NTP servers were being abused for NTP amplification attacks, my ISP and I grew tired of constantly chasing down issues, and I stopped actively hosting NTP servers as part of the NTP Pool.

I’d always known the basics of how the NTP Pool operates: you point your device at one of its pool hostnames (e.g. 0.pool.ntp.org or a geographically specific entry like 0.north-america.pool.ntp.org), a DNS lookup is done, and the IP address of one of the NTP Pool member servers is returned.
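To make that lookup concrete, here is a minimal Python sketch of what a client effectively does before it ever speaks NTP. The function name resolve_pool and its resolver parameter are my own illustrative choices, not anything the NTP Pool provides; the resolver is injectable only so the function can be exercised without network access.

```python
import socket

def resolve_pool(hostname, port=123, resolver=socket.getaddrinfo):
    """Return the unique IP addresses DNS hands back for a pool hostname."""
    # NTP runs over UDP port 123, so ask for datagram results only.
    results = resolver(hostname, port, type=socket.SOCK_DGRAM)
    # Each result is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address as a string.
    return sorted({info[4][0] for info in results})
```

Calling resolve_pool("0.pool.ntp.org") on a machine with working DNS should return a handful of member server addresses, and the set typically changes from one lookup to the next.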

At a small scale, you’d just need a few DNS servers and all would be well, but the NTP Pool serves millions of clients that all issue many DNS queries to find an NTP server to sync with. This much DNS traffic requires A LOT of DNS server capacity and that’s where another type of volunteer comes in.

After reading this page I realized I could easily offer up a virtual machine and provide some extra DNS capacity for the greater good. I installed a basic Ubuntu virtual machine, added some firewall rules, and the friendly guys at the NTP Pool Project installed their custom DNS server software and started sending queries my way. They said to expect 3-5 Mbps of DNS traffic on average with occasional spikes above that. DNS queries and responses are very small transactions so 3-5 Mbps of traffic is a TON of DNS traffic and a lot of connections through my internet firewall.
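For a rough sense of scale, the bandwidth figure can be turned into packets per second. The 100-byte average packet size below is purely my assumption for illustration (I did not measure it), and the function name is hypothetical.

```python
def packets_per_second(mbps, avg_packet_bytes=100):
    """Estimate packets per second for a given bandwidth.

    avg_packet_bytes is an assumed average on-the-wire size of a DNS
    query or response; adjust it to match your own measurements.
    """
    bytes_per_second = mbps * 1_000_000 / 8  # megabits/s -> bytes/s
    return bytes_per_second / avg_packet_bytes

low = packets_per_second(3)   # lower end of the quoted 3-5 Mbps range
high = packets_per_second(5)  # upper end of the quoted range
```

Under that assumption, 3-5 Mbps works out to somewhere between a few thousand and several thousand DNS packets every second, which is why the firewall notices.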

Take a look at the number of connections through my internet firewall before and after I started hosting NTP Pool DNS.

I would highly encourage anyone with the resources to either host an NTP server or an NTP DNS server.

Go forth and sync your devices to a reliable time source. Your log files and sysadmins will thank you.

Windows Server DHCP Failover BAD_ADDRESS

I have two Windows Server 2016 servers configured as a DHCP failover pair. Everything had been working fine for more than a year until suddenly clients were not able to get leases and the DHCP scope statistics indicated that the pools had no more addresses to assign. I ran a bit of PowerShell against one of the servers:


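# Server and scope to inspect
$computername = "server01"
$scopeid = "10.20.0.0"

Import-Module DhcpServer

# List every lease in the scope that the server has flagged as a bad address.
# These leases carry BAD_ADDRESS in the HostName field (their AddressState is Declined).
foreach ($object in Get-DhcpServerv4Lease -ComputerName $computername -ScopeId $scopeid)
{
    if ($object.HostName -eq 'BAD_ADDRESS')
    {
        $object
    }
}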

I saw the following:


IPAddress    ScopeId    ClientId     HostName     AddressState  LeaseExpiryTime
---------    -------    --------     --------     ------------  ---------------
10.20.0.101  10.20.0.0  65-00-14-0a  BAD_ADDRESS  Declined      7/17/2018 4:04:...
10.20.0.107  10.20.0.0  6b-00-14-0a  BAD_ADDRESS  Declined      7/20/2018 4:16:...
10.20.0.108  10.20.0.0  6c-00-14-0a  BAD_ADDRESS  Declined      7/18/2018 2:50:...
10.20.0.111  10.20.0.0  6f-00-14-0a  BAD_ADDRESS  Declined      7/9/2018 4:18:4...
10.20.0.120  10.20.0.0  78-00-14-0a  BAD_ADDRESS  Declined      7/9/2018 6:30:2...

Edited for brevity. Nearly the entire scope was filled up like this.
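One detail worth noticing in that listing: in every row shown, the ClientId is not a real MAC address but appears to be the lease's own IP address written as hex bytes in reverse order (65-00-14-0a is 0x0a.0x14.0x00.0x65, i.e. 10.20.0.101). The short Python sketch below decodes it that way. The function name is hypothetical, and the pattern is only something I observed in these rows, not documented behavior I can vouch for.

```python
import ipaddress

def client_id_to_ip(client_id):
    """Decode a BAD_ADDRESS ClientId such as '65-00-14-0a' into an IPv4 string.

    Assumes the ClientId is the four address bytes in reversed order,
    which matches the rows in the listing above.
    """
    octets = bytes(int(part, 16) for part in client_id.split("-"))
    # Reverse the byte order and let ipaddress format the dotted quad.
    return str(ipaddress.IPv4Address(octets[::-1]))
```

Feeding it the ClientId values from the listing gives back the IPAddress column, which is a quick way to confirm that these entries were created by the server itself rather than by a client with that hardware address.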

After much head scratching, I looked in the Windows Event Viewer on both servers and saw the following error logged repeatedly on one of them: “The server detected that it is out of time synchronization with partner server: server02.domain.net for failover relationship: SLP-DHCP-Failover. The time is out of sync by: 163 seconds.” The error was logged under “Applications and Services Logs -> Microsoft -> Windows-DHCP Server -> Microsoft-Windows-DHCP Server Events/Admin”.

I checked the clock on the partner server and noticed it was more than four minutes off from the other DHCP server. When I looked at the time service status using the command “w32tm /query /status”, there wasn’t any NTP server defined! Once I ran the “w32tm /resync /rediscover” command, it discovered the domain controller, and after a bit of time the clocks were in sync and all my DHCP issues were resolved.
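To catch this kind of drift earlier, one option is to scan exported event text for the out-of-sync message quoted above and pull out the partner name and the skew. The following Python sketch does only the parsing; how the event text gets exported from Event Viewer is left open, and the function and pattern names are my own.

```python
import re

# Matches the DHCP failover "out of time synchronization" message text.
SKEW_PATTERN = re.compile(
    r"partner server: (?P<partner>\S+) for failover relationship: "
    r"(?P<relationship>\S+?)\. The time is out of sync by: (?P<seconds>\d+) seconds"
)

def parse_time_skew(message):
    """Return (partner, relationship, skew_seconds), or None if no match."""
    match = SKEW_PATTERN.search(message)
    if match is None:
        return None
    return match["partner"], match["relationship"], int(match["seconds"])
```

Run against the message from my event log, this yields the partner server02.domain.net, the relationship SLP-DHCP-Failover, and a skew of 163 seconds, which would be easy to feed into whatever alerting you already have.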