ServiceNow ITOM Discovery Interview Questions 2026

What is ServiceNow ITOM Discovery?

ServiceNow Discovery is an ITOM (IT Operations Management) application that automatically identifies and documents all hardware, software, and network devices across your IT infrastructure. It populates and keeps the CMDB (Configuration Management Database) accurate and up-to-date - without requiring manual input.

Discovery works by sending instructions to a MID Server installed inside the customer's network. The MID Server probes devices, collects data, and sends results back to ServiceNow, where the data is processed and written to CMDB tables.

Why it matters: An accurate CMDB is the foundation for ITSM processes such as Change, Incident, and Problem Management, and for ITOM capabilities such as Service Mapping - all of them depend on reliable CI data. Discovery automates what would otherwise take a team of administrators months to maintain manually.

Explain the phases of Discovery.

Discovery follows four structured phases:

1. Scanning Phase: The MID Server probes the configured IP ranges using tools like Shazzam probes to identify which IPs are active and which ports are open. This tells Discovery what devices are present on the network and what protocols they may support (SSH, WMI, SNMP).

2. Classification Phase: Once active devices are found, Discovery analyses response patterns and service banners to determine the device type - for example, is it a Linux server, a Windows server, a Cisco router, or a printer? This classification decides which probes and patterns will run next.

3. Identification Phase: Discovery passes the classified device data to the IRE (Identification and Reconciliation Engine), which determines whether the device already exists as a CI in CMDB. If it matches an existing CI on the configured identifier attributes (for example serial number, or hostname plus IP), that record is updated. If no match is found, a new CI is created. This step prevents duplicate records.

4. Exploration Phase: The most detailed phase - Discovery uses credentials and patterns to connect to the device and collect granular data such as installed software, running processes, hardware specs, network adapters, and relationship information. This data is written to the relevant CMDB tables.

What is a MID Server and why is it required?

MID Server (Management, Instrumentation, and Discovery Server) is a Java application installed on a server inside the customer's on-premises network. It acts as a bidirectional proxy between the ServiceNow cloud instance and the internal network infrastructure.

Why it is required:
ServiceNow is a cloud platform and cannot directly access devices inside a customer's private network or behind firewalls. The MID Server bridges this gap. Key points:

- The MID Server communicates with ServiceNow via outbound HTTPS only - no inbound firewall rules are needed on the customer network
- It polls the ECC Queue in ServiceNow for work instructions, executes them locally, and sends results back
- It runs all network operations (port scans, SSH, WMI, SNMP queries) within the local network where it has access
- It is stateless - it never stores or retains discovery data locally

Placement: The MID Server must be installed on a host that has network connectivity to all target devices it will discover. In environments with multiple isolated network segments, multiple MID Servers are required.

What are Probes, Sensors, and Patterns in Discovery?

Probes: Lightweight scripts executed by the MID Server that collect raw data from target devices. They perform low-level network operations - SSH commands, WMI queries, SNMP GETs - and return raw results to ServiceNow via the ECC Queue.

Sensors: Server-side scripts that run in ServiceNow to process the raw data returned by probes. A sensor parses the probe output, extracts meaningful CI attributes, and writes them to the correct CMDB tables.

Patterns: Predefined, self-contained discovery sequences that replace the separate probe+sensor model for complex devices. A Pattern defines the full discovery logic - connection steps, data extraction, conditional branching, and CMDB writes - in a single, configurable unit. Patterns are built using ServiceNow's Pattern Designer and are version-controlled.

Key difference: Probes/Sensors are the legacy approach and work well for simple devices. Patterns are the modern approach, preferred for application discovery and complex multi-layer environments because they are easier to customise, test, and maintain.

What are the different types of Discovery?

Horizontal Discovery: Discovers infrastructure and network components - servers, routers, switches, virtual machines, storage devices. It finds what devices exist on the network. This is the foundation; you run this first.

Vertical Discovery: Focuses on the applications and services running on the infrastructure already discovered. For example, it discovers the Oracle database running on a Linux server that Horizontal Discovery found. Vertical Discovery builds the full picture of what is running and creates CI relationships between infrastructure and applications.

Cloud Discovery: Discovers cloud infrastructure (AWS EC2, Azure VMs, GCP instances) using cloud provider APIs rather than network scanning. Requires cloud credentials instead of device-level credentials.

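Agent-based Discovery: Uses a lightweight agent installed on each endpoint. Particularly useful for laptops and remote devices that are not always on the corporate network. The MID Server does not scan the endpoint; the agent collects the data locally and reports it back (in the standard Agent Client Collector setup it typically reports through a MID Server listener, so check the deployment options for your release).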

How does the MID Server communicate with ServiceNow?

Communication is entirely outbound from the MID Server to ServiceNow - the MID Server initiates all connections over HTTPS. No inbound firewall ports need to be opened on the customer network.

The flow works as follows:

1. The MID Server continuously polls the ECC Queue in ServiceNow for new instructions (outbound messages)
2. When it finds a Discovery task, it executes the assigned probes or patterns against target devices within its local network
3. After execution, it posts the results back to the ECC Queue as inbound messages
4. ServiceNow picks up the inbound messages and runs the corresponding sensors or pattern post-processors to write data to CMDB
5. The MID Server operates statelessly - it does not store Discovery data locally

This polling-based architecture means even highly locked-down networks can use Discovery as long as the MID Server host can reach the ServiceNow instance URL on port 443.
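
The polling flow above can be sketched with a small in-memory simulation. This is not ServiceNow or MID Server code: `FakeInstance`, `EccMessage`, and `poll_once` are hypothetical names, and the queue and state values simply mirror the ones described in this answer. It only illustrates that the MID Server side pulls work and pushes results, so every exchange is initiated from inside the network.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class EccMessage:
    topic: str            # probe or pattern name
    source: str           # target IP or IP range
    queue: str            # "output" = instance -> MID, "input" = MID -> instance
    state: str = "ready"
    payload: str = ""


@dataclass
class FakeInstance:
    """In-memory stand-in for the instance-side ECC Queue (hypothetical)."""
    ecc_queue: List[EccMessage] = field(default_factory=list)

    def fetch_ready_output(self) -> List[EccMessage]:
        # Work waiting for the MID Server: outbound messages still in "ready".
        return [m for m in self.ecc_queue
                if m.queue == "output" and m.state == "ready"]

    def post_input(self, msg: EccMessage) -> None:
        self.ecc_queue.append(msg)


def poll_once(instance: FakeInstance,
              handlers: Dict[str, Callable[[str], str]]) -> int:
    """One polling cycle: pick up work, run it locally, post results back."""
    handled = 0
    for msg in instance.fetch_ready_output():
        handler = handlers.get(msg.topic)
        if handler is None:
            msg.state = "error"       # no local handler for this topic
            continue
        result = handler(msg.source)  # runs inside the local network
        msg.state = "processed"
        instance.post_input(
            EccMessage(msg.topic, msg.source, "input", "ready", result))
        handled += 1
    return handled
```

In this sketch the instance never calls the MID Server; it only exposes a queue that the MID Server reads from and writes to, which is why no inbound firewall rule is needed.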

What is the ECC Queue and what does it contain?

The ECC Queue (ecc_queue table) is the messaging channel between ServiceNow and MID Servers. It holds two types of messages:

Outbound messages (ServiceNow → MID Server): Instructions telling the MID Server what to do. Examples include:
- "Run Shazzam probe on IP range 10.0.1.0/24"
- "Run SSH exploration pattern on 10.0.1.55 using credential set X"

Inbound messages (MID Server → ServiceNow): Results and responses from executed probes. Examples include:
- Open port data from a Shazzam scan
- SSH command output from an exploration probe
- SNMP OID data from a network device scan

Key fields in ecc_queue: Agent (which MID Server), Queue (input/output), State (ready, processed, error, ignored), Topic (probe name), Payload (XML data), Source, Response.

The ECC Queue is the first place to check when troubleshooting Discovery issues - error state messages here tell you exactly where communication broke down.
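
As a hedged illustration of that first troubleshooting step, the sketch below builds a REST Table API URL that lists recent error-state input messages for one MID Server. The `/api/now/table/<table>` path and the `sysparm_query`, `sysparm_fields`, and `sysparm_limit` parameters are standard Table API features; the `mid.server.<name>` form of the agent value, the instance URL, and the helper name `ecc_error_query_url` are assumptions to verify against your own instance.

```python
from urllib.parse import urlencode


def ecc_error_query_url(instance_url: str, mid_server_name: str,
                        limit: int = 50) -> str:
    """Build a Table API URL for error-state ECC Queue input messages."""
    encoded_query = "^".join([
        "queue=input",                          # MID Server -> instance
        "state=error",
        f"agent=mid.server.{mid_server_name}",  # assumed agent naming
        "ORDERBYDESCsys_created_on",            # newest first
    ])
    params = urlencode({
        "sysparm_query": encoded_query,
        "sysparm_fields": "agent,topic,source,state,sys_created_on",
        "sysparm_limit": limit,
    })
    return f"{instance_url.rstrip('/')}/api/now/table/ecc_queue?{params}"


if __name__ == "__main__":
    # Hypothetical instance and MID Server names.
    print(ecc_error_query_url("https://example.service-now.com", "mid01", 10))
```

The same encoded query can be pasted into the ecc_queue list filter in the UI, which is often quicker during an interview-style walkthrough.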

What credentials are required for Discovery and how are they managed?

Discovery credentials are stored in Credentials records (discovery_credentials table) and are used by the MID Server to authenticate against target devices during the Exploration phase.

Types of credentials:
- SSH: For Linux/Unix/Mac - username + password or SSH private key
- Windows (WMI/NTLM): Domain account or local admin - domain\username + password
- SNMP: Community string (v1/v2c) or auth/privacy settings (v3)
- VMware vCenter: For virtual environment discovery
- AWS/Azure/GCP: API keys, service accounts, IAM roles
- Database: For application-level discovery of Oracle, MSSQL, etc.

Security: Credentials are never stored in plain text on the MID Server. They are sent encrypted at runtime and are never written to disk on the MID Server host.

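Credential Affinity: Once a credential succeeds against a device, Discovery records that pairing and tries the same credential first on later runs, which avoids cycling through every credential and reduces failed logins. Credentials can also be scoped (for example to specific MID Servers), so each network segment uses its own account rather than a single shared account across all environments.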

What is a Discovery Schedule and how do you configure one?

A Discovery Schedule defines what to discover, when to run, and how to run it. It is the starting point for any Discovery execution.

Key configuration fields:
- Name: Descriptive label
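- Discover: What the schedule should find; "Configuration items" is the usual choice for populating CMDB (other options include "IP addresses" and "Networks")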
- IP Ranges / Includes: CIDR blocks or individual IPs to scan (e.g., 10.0.1.0/24)
- MID Server or MID Server Cluster: Which MID Server(s) to use
- Behaviors: Which connection protocols to try and in what order
- Run: Once, Daily, Weekly, or on a custom recurring schedule
- Active: Toggle to enable/disable

Best practice: Create separate Discovery Schedules per network segment, device type, or data center. This makes troubleshooting easier and allows you to stagger execution windows to avoid network and ECC Queue overload.

How does Discovery uniquely identify a CI?

Discovery uses the Identification and Reconciliation Engine (IRE) along with configured CI Identifier Rules to determine whether an incoming CI payload matches an existing record in CMDB or requires a new CI to be created.

The IRE checks identifier entries in priority order. For example, for a server CI class:
- Entry 1: Match by serial number (most reliable - hardware-level unique ID)
- Entry 2: Match by hostname + IP address
- Entry 3: Match by name alone (least reliable)

If Entry 1 produces a match, that CI is updated. If not, IRE tries Entry 2, and so on. If no match is found at all, a new CI is created.

Common interview follow-up: "What happens if the serial number changes?" - A new CI may be created if the serial number was the only identifier, leading to a duplicate. This is why multiple identifier entries are configured as fallbacks.
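
The priority-ordered matching described above can be sketched as a small simulation. This is not the IRE itself: the list-of-dicts CMDB, the field names `name` and `ip_address`, and the functions `identify` and `upsert` are hypothetical, and the three entries mirror the example entries listed in this answer.

```python
from typing import Dict, List, Optional, Tuple

# Identifier entries in priority order, mirroring the example above.
IDENTIFIER_ENTRIES: List[Tuple[str, ...]] = [
    ("serial_number",),
    ("name", "ip_address"),
    ("name",),
]


def identify(payload: Dict[str, str],
             cmdb: List[Dict[str, str]]) -> Optional[int]:
    """Return the index of the matching CI, or None if nothing matches."""
    for entry in IDENTIFIER_ENTRIES:
        # An entry is only usable if the payload carries all its attributes.
        if not all(payload.get(attr) for attr in entry):
            continue
        for idx, ci in enumerate(cmdb):
            if all(ci.get(attr) == payload[attr] for attr in entry):
                return idx
    return None


def upsert(payload: Dict[str, str], cmdb: List[Dict[str, str]]) -> str:
    """Update the matched CI, or insert a new one when no entry matches."""
    idx = identify(payload, cmdb)
    if idx is None:
        cmdb.append(dict(payload))
        return "inserted"
    cmdb[idx].update(payload)
    return "updated"
```

With the fallback entries in place, a payload whose serial number has changed can still match on name plus IP address and update the existing record; with only the first entry configured, the same payload would be inserted as a duplicate.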

What are Behaviors in Discovery?

Behaviors control how Discovery connects to a device - which protocol to try first and what to do if that attempt fails. Since different devices may support multiple protocols (SSH, WMI, SNMP, PowerShell), Behaviors provide a prioritised fallback strategy.

How Behaviors work:
1. The MID Server confirms that a target IP is active
2. Discovery consults the configured Behaviors to pick the first protocol + credential combination to try
3. If the first Behavior fails (wrong protocol, invalid credentials, port closed), Discovery automatically tries the next Behavior in sequence
4. This continues until a successful connection is made, or all Behaviors are exhausted

Practical example: A Behavior might try SSH first for Linux devices; if SSH fails, try SNMP as a fallback. This handles mixed environments where device types are not known in advance.
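
A minimal sketch of that fallback loop is shown below. The function `connect_with_fallback`, the `try_connect` callback, and the credential names in the usage example are hypothetical; a real Behavior is configured in ServiceNow rather than coded like this.

```python
from typing import Callable, List, Optional, Tuple

Attempt = Tuple[str, str]  # (protocol, credential name)


def connect_with_fallback(
    target_ip: str,
    behaviors: List[Attempt],
    try_connect: Callable[[str, str, str], bool],
) -> Optional[Attempt]:
    """Try each protocol/credential pair in order; return the first that works."""
    for protocol, credential in behaviors:
        if try_connect(target_ip, protocol, credential):
            return (protocol, credential)
    return None  # all Behaviors exhausted


if __name__ == "__main__":
    # Pretend the device only answers SNMP.
    def fake_try(ip: str, protocol: str, credential: str) -> bool:
        return protocol == "SNMP"

    order = [("SSH", "linux-svc"), ("SNMP", "public-ro")]
    print(connect_with_fallback("10.0.1.55", order, fake_try))
```

The order of the list is the whole design: putting the most likely protocol first keeps failed attempts, and the login failures they generate on target devices, to a minimum.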

What ports must be open for Discovery to work?

SSH (Linux/Unix): TCP port 22 - must be open from the MID Server to the target device

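WMI (Windows): TCP 135 (RPC Endpoint Mapper) + TCP 139 + TCP 445 + the dynamic RPC port range. The default dynamic range is 49152–65535 on Windows Server 2008 and later (older Windows versions used 1025–5000), so many teams simply open 1024–65535. The dynamic port range is the most commonly missed - firewall rules that only open 135 will cause WMI timeouts.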

SNMP (Network devices): UDP 161 (SNMP queries) and optionally UDP 162 (SNMP traps)

HTTPS (MID Server → ServiceNow): TCP 443 outbound from the MID Server host to the ServiceNow instance URL

VMware vCenter: TCP 443 (HTTPS API)

ICMP (ping): Used by Shazzam to confirm a host is alive before running probes - ICMP echo should be allowed from MID Server to target subnets. If ICMP is blocked, Shazzam can fall back to TCP port scanning but performance is reduced.
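
To confirm the TCP ports above from the MID Server host, a quick reachability check can be scripted. The sketch below uses only the Python standard library; the helper names `tcp_port_open` and `check_ports` and the target address in the usage example are hypothetical. It covers TCP only: SNMP on UDP 161 is connectionless and cannot be verified with a simple connect test.

```python
import socket
from typing import Dict, Iterable


def tcp_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_ports(host: str, ports: Iterable[int],
                timeout: float = 2.0) -> Dict[int, bool]:
    """Check several TCP ports on one host and report which are reachable."""
    return {port: tcp_port_open(host, port, timeout) for port in ports}


if __name__ == "__main__":
    # Hypothetical Windows target: RPC endpoint mapper, NetBIOS, SMB.
    print(check_ports("10.0.1.55", [135, 139, 445]))
```

Run it from the MID Server host itself, not from a workstation, since the firewall path being tested is the one between the MID Server and the target.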

What is the difference between Discovery and Service Mapping?

Discovery	Service Mapping
Identifies all infrastructure components (servers, network devices, software) and records them as CIs in CMDB	Maps how CIs work together to deliver a specific business service
Focuses on what exists in the environment	Focuses on how things relate to each other for a given service
Uses Horizontal scanning - sweeps IP ranges to find all devices	Uses Top-Down (Vertical) approach - starts from a known entry point (URL, process) and traces dependencies downward
Populates CMDB with CI inventory and basic relationships	Creates Application Service Maps showing end-to-end service dependencies
Prerequisite: needs network access	Prerequisite: relies on a populated CMDB from Discovery

In short: Discovery = finds what exists. Service Mapping = shows how those things work together to deliver services.

What happens when Discovery finds a device that already exists in CMDB?

When Discovery finds a device that matches an existing CI, the following process runs:

1. IRE matches the incoming CI payload to the existing record using Identifier Rules (e.g., matching by serial number)
2. Reconciliation Rules are checked for each field - they determine whether Discovery's incoming value should overwrite the existing value based on data source priority
3. Fields where Discovery has higher priority than the current source are updated
4. Fields last set by a higher-priority source (e.g., a manual edit, where manual entry is ranked above Discovery) are not overwritten
5. The CI's Last Discovered timestamp is always updated
6. Relationships are created or updated as needed
7. No duplicate CI is created

This selective update behaviour is why Reconciliation Rules are so important - without them, every Discovery run would overwrite all manually managed data.

Explain the Identification and Reconciliation Engine (IRE) in detail.

The IRE is the CMDB engine that governs two distinct processes:

1. Identification: When incoming CI data arrives (from Discovery, SCCM, a ServiceGraph Connector, or any integration), IRE uses CI Identifier Rules to find a matching CI in CMDB. Each rule defines which attribute combinations constitute a "match" for a given CI class. Multiple identifier entries are configured in priority order - IRE tries each until a match is found or all fail, at which point a new CI is created.

2. Reconciliation: Once the target CI is identified (new or existing), IRE applies Reconciliation Rules to decide which value "wins" for each attribute when multiple data sources report different values. Rules define a priority order of data sources per CI class and attribute. For example: Discovery > SCCM > Manual entry for hardware serial numbers. A lower-priority source can never overwrite a higher-priority source's value.

Additional IRE component - Related Entry Rules: Handle child items (like network adapters, running processes) discovered alongside a parent CI. They define how those related items are matched and written.

IRE Simulation Tool: ServiceNow provides an IRE Simulation tool (CMDB > Identification/Reconciliation > Simulate) that lets you test how a given CI payload will be processed before running a live Discovery. Essential for debugging duplicate CI and overwrite issues.

How do Discovery Patterns differ from the legacy Probe-Sensor approach?

Legacy Probe-Sensor model:
- Probes collect raw data; sensors parse it and write to CMDB
- Two separate scripts with a tightly coupled dependency between them
- Harder to debug since logic is split across two components
- Limited support for conditional logic within the discovery flow
- Still used for basic network device discovery

Pattern-based Discovery (modern):
- A single, self-contained unit that defines the complete discovery flow: connection, data extraction, conditional logic, relationship mapping, and CMDB writes
- Built visually in the Pattern Designer - no raw scripting required for standard patterns
- Supports version control, enabling safe testing before deployment
- Supports conditional steps (if/else branching based on collected data)
- Can be downloaded from the ServiceNow Store or custom-built
- Preferred for application-layer discovery (databases, web servers, middleware)

When to use which: Use Patterns for any complex application discovery. Legacy probes/sensors remain in use for basic SNMP device scanning where pattern support is limited.

What is a MID Server cluster and when would you use multiple MID Servers?

A MID Server cluster is a named group of MID Servers assigned to a Discovery Schedule as a unit. When a cluster is used, ServiceNow distributes Discovery work across all available MID Servers in the cluster for load balancing and redundancy.

When multiple MID Servers are needed:

- Geographic distribution: Data centers in different cities/countries - each location needs its own MID Server with local network access
- Network segmentation: Isolated segments (DMZ, production, dev/test) each need their own MID Server since routing between them is restricted
- Scale: A single MID Server can typically handle 500–1,000 concurrent device connections. Environments with 10,000+ devices need multiple MID Servers per segment
- High Availability: Two MID Servers in a cluster ensure Discovery continues if one goes down
- Cloud environments: Separate MID Server per cloud region or account for API rate limit compliance

Best practice: One MID Server per isolated network segment; cluster them for HA. Do not share a MID Server across firewall boundaries.
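
The load-balancing and failover idea can be illustrated with a simple round-robin sketch. How ServiceNow actually distributes work inside a cluster is not described here, so the round-robin strategy, the `distribute` function, and the MID Server names are assumptions for illustration only.

```python
from itertools import cycle
from typing import Dict, List


def distribute(targets: List[str],
               mid_status: Dict[str, str]) -> Dict[str, List[str]]:
    """Spread targets round-robin across the MID Servers that are Up."""
    available = [name for name, status in mid_status.items() if status == "Up"]
    if not available:
        raise RuntimeError("no MID Server in the cluster is Up")
    plan: Dict[str, List[str]] = {name: [] for name in available}
    for target, mid in zip(targets, cycle(available)):
        plan[mid].append(target)
    return plan


if __name__ == "__main__":
    cluster = {"mid-a": "Up", "mid-b": "Up", "mid-c": "Down"}
    ips = [f"10.0.1.{i}" for i in range(1, 6)]
    print(distribute(ips, cluster))
```

A Down member is simply skipped, which is the high-availability benefit of clustering: the remaining servers absorb its share as long as they can reach the same network segment.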

How does SNMP Discovery work? What SNMP versions does ServiceNow support?

SNMP (Simple Network Management Protocol) is used to discover network devices - routers, switches, firewalls, printers, UPS units - that don't expose SSH or WMI interfaces.

How it works:
1. MID Server sends SNMP GET / GETNEXT / GETBULK requests over UDP 161 to the target device
2. The device responds with OID (Object Identifier) data from its MIB (Management Information Base)
3. ServiceNow SNMP probes process the responses and extract device attributes (model, OS version, interfaces, etc.)
4. Data is written to the appropriate CMDB CI class (cmdb_ci_ip_switch, cmdb_ci_netgear, etc.)

SNMP versions supported:
- SNMPv1: Basic, uses community string, no encryption - avoid in production
- SNMPv2c: Improved performance, still uses community string - widely used for network devices
- SNMPv3: Secure - supports authentication (MD5/SHA) and privacy/encryption (DES/AES). Uses username-based credentials rather than community strings. Recommended for production

Practical tip: Always confirm the SNMP community string or v3 credentials are correctly configured and that UDP 161 is open from the MID Server host to the target network segment.

What are Discovery Ranges and IP Exclusion Lists?

Discovery Ranges define the set of IP addresses or CIDR blocks that a Discovery Schedule will scan. You can specify:
- Individual IPs: 192.168.1.50
- CIDR notation: 10.0.0.0/24
- IP ranges: 192.168.1.1 – 192.168.1.255

IP Exclusion Lists allow you to carve out specific IPs or ranges from Discovery scanning, even if those IPs fall within a configured Discovery Range. Common use cases:
- Excluding core network infrastructure (DNS servers, domain controllers) that shouldn't have their CMDB records auto-updated
- Excluding sensitive systems where Discovery probes could cause issues
- Excluding devices managed by a different tool that owns their CMDB records

CI-level exclusion: You can also mark an individual CI record with "Excluded from Discovery" - this prevents that specific device's CMDB record from being updated even if it is scanned, without removing it from the IP range.

Note: IP Exclusion is checked at the Scanning phase - excluded IPs are never probed. CI-level exclusion is checked at the Identification phase - the device is scanned but its CI is never updated.
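
The effect of an IP Exclusion List on the scan targets can be sketched with Python's standard `ipaddress` module. The function `scan_targets` is a hypothetical helper, not part of Discovery; it accepts single IPs and CIDR blocks for both the include and exclude lists.

```python
import ipaddress
from typing import Iterable, List


def scan_targets(includes: Iterable[str], excludes: Iterable[str]) -> List[str]:
    """Expand included IPs/CIDR blocks and drop anything in the exclusions."""
    excluded_nets = [ipaddress.ip_network(e, strict=False) for e in excludes]
    targets: List[str] = []
    for inc in includes:
        net = ipaddress.ip_network(inc, strict=False)
        # hosts() skips network/broadcast addresses; for a single IP some
        # Python versions return nothing, so fall back to the address itself.
        hosts = list(net.hosts()) or [net.network_address]
        for host in hosts:
            if any(host in ex for ex in excluded_nets):
                continue  # excluded IPs are never probed
            targets.append(str(host))
    return targets


if __name__ == "__main__":
    print(scan_targets(["10.0.1.0/29"], ["10.0.1.2", "10.0.1.4/31"]))
```

Running the same expansion before saving a schedule is a cheap way to confirm how many addresses a range really covers and that sensitive hosts fall outside it.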

How does ServiceNow Discovery work in cloud environments (AWS, Azure, GCP)?

Cloud Discovery uses API-based discovery rather than network-level IP scanning. The MID Server calls the cloud provider's management APIs to enumerate cloud resources.

How it works:
1. Cloud credentials (AWS IAM role/access keys, Azure Service Principal, GCP Service Account) are stored in Discovery Credentials
2. The MID Server is assigned these credentials and calls the cloud provider APIs (AWS EC2 API, Azure ARM API, GCP Compute API)
3. Resources discovered include: VMs/instances, virtual networks, load balancers, storage volumes, databases
4. Data is written to cloud-specific CMDB tables: cmdb_ci_vm_instance, cmdb_ci_cloud_network, cmdb_ci_cloud_load_balancer, etc.

Cloud Discovery vs. Service Graph Connectors for cloud:
For major cloud providers, Service Graph Connectors (SGC) - available in the ServiceNow Store - provide richer, more up-to-date data, better relationship mapping, and reduced API call overhead compared to native cloud Discovery. If SGC is available for your cloud provider, prefer it over Discovery for cloud resources.

Important: The MID Server used for cloud discovery must be able to reach the cloud provider's API endpoints (outbound HTTPS to AWS/Azure/GCP public APIs).

What is Agent-based Discovery and how does it differ from Agentless Discovery?

Agentless Discovery (standard):
- No software installed on target devices
- Uses network protocols (SSH, WMI, SNMP) with credentials stored in ServiceNow
- Requires network access from MID Server to target devices
- Runs on a scheduled basis
- Works well for servers and network devices that are always connected

Agent-based Discovery (Agent Client Collector - ACC):
- A lightweight agent is installed on each target device
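- The agent collects data locally and reports it back without the MID Server scanning the device (in the standard Agent Client Collector setup the agent typically reports through a MID Server listener, so check the deployment options for your release)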
- Works even when devices are off the corporate network (remote/home workers, laptops)
- Provides more continuous, near-real-time updates rather than periodic scans
- No need to open firewall ports or manage credential rotation for target devices

When to use agent-based:
- Laptops and mobile endpoints that frequently leave the corporate network
- Scenarios where deploying scan credentials across all devices isn't feasible
- Environments requiring near-real-time CI updates (e.g., frequent software installs)

Most large enterprises use a hybrid approach: agentless for servers and network devices, agent-based for endpoints.

What are Reconciliation Rules and why are they critical for CMDB health?

Reconciliation Rules define a priority-based hierarchy for each CMDB data source, controlling which source's value "wins" when multiple sources report conflicting data for the same CI attribute.

Why they matter: In most enterprises, CMDB data comes from multiple sources - Discovery, SCCM, JAMF, ServiceGraph Connectors, manual entry, and integrations. Without Reconciliation Rules, the "last writer wins" - whichever source ran most recently would overwrite everything, causing data to change unpredictably after every Discovery run. This is called "CMDB thrashing."

Example configuration:
For cmdb_ci_server.serial_number: ServiceNow Discovery (1st) > SCCM (2nd) > Manual (3rd)
For cmdb_ci_server.software_list: SCCM (1st) > Discovery (2nd)

This means Discovery owns hardware data, while SCCM owns software inventory. Each tool writes what it knows best.

Best practice: Define Reconciliation Rules before deploying Discovery in production. Without them, a Discovery run can wipe out weeks of carefully maintained manual CMDB data.
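
The example configuration above can be expressed as a small simulation of per-attribute source priority. It is a sketch, not the IRE: `PRIORITY`, `may_write`, and `reconcile` are hypothetical names, each stored value is tagged with the source that last wrote it, and attributes with no rule are assumed to be writable by any source.

```python
from typing import Dict, List, Optional, Tuple

# Source priority per attribute, highest first (from the example above).
PRIORITY: Dict[str, List[str]] = {
    "serial_number": ["Discovery", "SCCM", "Manual"],
    "software_list": ["SCCM", "Discovery"],
}


def may_write(attribute: str, incoming: str, current: Optional[str]) -> bool:
    """Decide whether the incoming source may overwrite the current one."""
    order = PRIORITY.get(attribute)
    if order is None or current is None:
        return True            # assumption: no rule or empty field -> allow
    if incoming not in order:
        return False           # unlisted sources cannot override listed ones
    if current not in order:
        return True
    return order.index(incoming) <= order.index(current)


def reconcile(ci: Dict[str, Tuple[str, str]], source: str,
              values: Dict[str, str]) -> List[str]:
    """Apply values from one source; return the attributes actually updated."""
    updated: List[str] = []
    for attr, value in values.items():
        current = ci.get(attr)
        current_source = current[1] if current else None
        if may_write(attr, source, current_source):
            ci[attr] = (value, source)
            updated.append(attr)
    return updated
```

In this model a Discovery payload can replace an SCCM-supplied serial number but not an SCCM-supplied software list, which is the "each tool writes what it knows best" behaviour described above.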

How does WMI Discovery work and what are common issues you encounter?

WMI (Windows Management Instrumentation) is Microsoft's framework for managing Windows devices. ServiceNow uses WMI to discover Windows servers and workstations.

How it works:
1. MID Server connects to the Windows target via DCOM/RPC on TCP 135, then negotiates a dynamic high port
2. Authenticates using a Windows domain account (or local admin account)
3. Executes WMI queries: Win32_ComputerSystem, Win32_OperatingSystem, Win32_Processor, Win32_DiskDrive, Win32_NetworkAdapterConfiguration, etc.
4. Results are processed by WMI sensors/patterns and written to cmdb_ci_win_server

Common WMI issues and fixes:
- Dynamic ports blocked: Firewall allows 135 but blocks the dynamic range (1024–65535) - open the dynamic port range or configure RPC to use a fixed port
- UAC restrictions: Non-admin accounts get blocked by UAC even with WMI permissions - use a domain admin account or configure DCOM permissions explicitly
- WMI service not running: Check Windows Services on target - restart "Windows Management Instrumentation" service
- WMI repository corruption: Rebuild the WMI repository using winmgmt /resetrepository
- Credential mismatch: Ensure domain\username format is correct - local accounts need .\username format

What is the Discovery Status record and how do you use it?

A Discovery Status record (discovery_status table) is created every time a Discovery Schedule runs. It tracks the progress and outcome of that specific run.

Key information captured:
- State: Discovering / Completed / Cancelled / Error
- Start time and end time
- IP ranges scanned
- Number of IPs scanned, active IPs found
- Number of CIs created, updated, and unchanged
- MID Server used
- Discovery log entries - per-device logs showing which phase succeeded or failed

How to use it for troubleshooting:
1. Open the Discovery Status record for the relevant run
2. Click into the per-device Discovery Logs to see exactly which phase (Scan / Classify / Identify / Explore) failed for a specific IP
3. Look at error messages in the log entries - they often show the exact credential failure, network timeout, or pattern error
4. Use the "Discoveries" related list to see all individual device discovery attempts

Practical tip: Always check Discovery Status first when a CI is missing or incorrect - it tells you whether the device was even found, and if found, at what phase it failed.

How are CI Relationships populated during Discovery and what can cause them to be missing?

CI Relationships are stored in the cmdb_rel_ci table and are created during the Exploration phase when patterns examine running processes, active network connections, and application configurations.

Common relationship types created by Discovery:
- "Runs on" - software running on a server
- "Hosted on" - VM hosted on a hypervisor
- "Depends on" - application depending on a database
- "Connected to" - network-level connections between devices

How relationships are created:
Vertical/Application discovery examines active TCP connections and maps them to known CIs. For example, if a web server has an active connection to 10.0.1.20:1521, Discovery recognises that as an Oracle port and creates a "Depends on" relationship between the web server CI and the Oracle CI at that IP.

Causes of missing relationships:
- The related CI (source or target) was not discovered - relationship can't be created if one end doesn't exist in CMDB
- Vertical Discovery was not configured or run - horizontal discovery alone does not map application dependencies
- Insufficient credentials - the exploration account didn't have permission to query network connection data
- Pattern step for relationship mapping failed - check the Exploration logs
- Connection-based relationships require the connection to be active at the time of Discovery
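
The connection-to-relationship mapping, and the first cause of missing relationships, can be sketched as follows. The function `infer_relationships`, the `listeners` lookup, and the CI names are hypothetical; real patterns do considerably more than a dictionary lookup.

```python
from typing import Dict, List, Tuple

Endpoint = Tuple[str, int]            # (ip, port)
Relationship = Tuple[str, str, str]   # (parent CI, type, child CI)


def infer_relationships(
    source_ci: str,
    connections: List[Endpoint],
    listeners: Dict[Endpoint, str],
) -> Tuple[List[Relationship], List[Endpoint]]:
    """Map active outbound connections to known CIs.

    listeners maps (ip, port) to the CI known to listen there. Connections
    whose far end is not a known CI are returned separately, because no
    relationship can be created when one end is missing from CMDB.
    """
    relationships: List[Relationship] = []
    unmatched: List[Endpoint] = []
    for endpoint in connections:
        target_ci = listeners.get(endpoint)
        if target_ci is None:
            unmatched.append(endpoint)
            continue
        relationships.append((source_ci, "Depends on", target_ci))
    return relationships, unmatched


if __name__ == "__main__":
    known = {("10.0.1.20", 1521): "oracle-prod"}
    active = [("10.0.1.20", 1521), ("10.0.9.9", 5432)]
    print(infer_relationships("web01", active, known))
```

The unmatched list is the useful part for troubleshooting: each entry points at an endpoint that was in use during the scan but has no CI behind it yet.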

What is the impact of running Discovery too frequently, and what are the best practices for scheduling?

Negative impacts of over-frequent Discovery:
- Network load: Port scanning and SSH/WMI connections generate significant network traffic - aggressive scanning can impact production network performance
- Target device load: WMI queries can noticeably increase CPU on Windows servers, especially in large batches
- ECC Queue backlog: Too many simultaneous Discovery messages can overwhelm the queue, causing delays in processing
- CMDB write contention: Simultaneous CI updates can cause database locking and slow down the ServiceNow instance
- MID Server thread exhaustion: Exceeding the probe thread pool capacity causes probes to queue up and time out

Best practices for scheduling:
- Run most Discovery schedules daily or weekly - infrastructure changes rarely happen more frequently than that
- Use off-peak windows (nights, weekends) for broad IP range scans
- Stagger schedules so they don't all start simultaneously - spread them across different time windows
- Create separate schedules per device type or segment (for example Linux servers, Windows servers, network devices) to avoid protocol mixing overhead
- For known, stable environments, consider using targeted re-discovery (re-scan specific CI lists) rather than full range scans
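
Staggering can be planned with simple arithmetic: divide the off-peak window evenly among the schedules. The sketch below assumes an even spread is good enough; the function `stagger` and the schedule names are hypothetical, and the resulting times would still be entered on each Discovery Schedule by hand.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def stagger(schedule_names: List[str], window_start: datetime,
            window_hours: float) -> Dict[str, datetime]:
    """Spread schedule start times evenly across an off-peak window."""
    if not schedule_names:
        return {}
    step = timedelta(hours=window_hours) / len(schedule_names)
    return {name: window_start + i * step
            for i, name in enumerate(schedule_names)}


if __name__ == "__main__":
    plan = stagger(["Linux servers", "Windows servers", "Network devices"],
                   datetime(2026, 1, 1, 22, 0), window_hours=6)
    for name, start in plan.items():
        print(f"{start:%a %H:%M}  {name}")
```

Three schedules in a six-hour window starting at 22:00 land two hours apart, so no two broad scans compete for MID Server threads or ECC Queue capacity at the same moment, provided each finishes within its slot.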

What is a Service Graph Connector and how does it compare to Discovery?

Service Graph Connectors (SGC) are a newer integration framework (built on IntegrationHub ETL) that import and normalise CI data from third-party tools directly into CMDB. Unlike Discovery, which actively probes devices, SGCs pull data from existing management tools via APIs.

Examples of available SGCs: SCCM, JAMF, Qualys, Rapid7, AWS, Azure, GCP, Tanium, CrowdStrike

Discovery	Service Graph Connector
Actively probes the network	Pulls data via API from existing tools
No third-party tool required	Requires the source tool to be in place
Can discover unknown devices	Only discovers what the source tool already manages
Included in ITOM Discovery license	Requires IntegrationHub + connector-specific license
Better for unknown/new environments	Better when you already have rich inventory data elsewhere

Recommendation: Use SGC when a well-maintained third-party inventory exists (e.g., SCCM for Windows endpoints). Use Discovery for devices not covered by any existing management tool.

How do you troubleshoot Discovery when it is not finding or updating specific devices?

Systematic troubleshooting approach:

Step 1 - MID Server status: Confirm the MID Server shows as "Up" in ServiceNow (MID Server > Servers). If Down, check the agent.log on the MID Server host for startup errors.

Step 2 - Discovery Status record: Open the Status record for the relevant run. At which phase did the device fail - Scanning, Classification, Identification, or Exploration?

Step 3 - Per-device Discovery Logs: Drill into the device's Discovery Log record for the specific error message. This pinpoints whether it was a network timeout, authentication failure, or pattern error.

Step 4 - ECC Queue: Check for error-state messages related to that device. Look for patterns in error messages across multiple devices.

Step 5 - Network reachability: From the MID Server host, manually test: ping the target, attempt SSH/WMI connection using the same credentials configured in ServiceNow.

Step 6 - IP Exclusion List: Confirm the device's IP is not on an IP Exclusion List and the CI doesn't have "Excluded from Discovery" checked.

Step 7 - Credential check: Verify credentials are valid, assigned to the correct IP range, and cover the protocol needed for that device type.

Step 8 - MID Server logs (agent.log): For probe-level errors not visible in the ServiceNow UI, review the MID Server's local agent.log file.

Discovery has been running for 2 hours but no CIs are being created in CMDB. Walk me through how you would diagnose this.

Step 1 - Is the MID Server actually running?
Check MID Server > Servers. Is it showing "Up"? If "Down," the MID Server Java process may have crashed. Check the agent.log on the MID Server host for the last error before it stopped.

Step 2 - Is Discovery reaching the Scanning phase?
Open the Discovery Status record. Is the state "Discovering" or stuck on initialisation? If it never moved past creation, the issue is between ServiceNow and the MID Server - check the ECC Queue for outbound messages and whether they have been picked up (state = "processed" vs stuck in "ready").

Step 3 - Is the IP range reachable?
From the MID Server host (not from ServiceNow), manually ping a few IPs in the Discovery Range. If they're unreachable, the MID Server doesn't have network access to that subnet - a routing or firewall issue.

Step 4 - Are devices being found but failing classification?
Check per-device Discovery Logs. If IPs are being scanned and ports are found but classification fails, the device type may not be recognised or the credential isn't matching the OS.

Step 5 - Are devices classified but failing exploration?
Authentication failure at this stage is common - wrong credentials, account locked, or SSH key mismatch. Review the Exploration log entry for the specific error message.
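
The ECC Queue check in Step 2 can be summarised offline from an export of the queue. This is a rough sketch that assumes each record is a dict carrying the agent, queue, and state fields, plus an age_minutes value that you compute yourself during the export (that field is hypothetical, as are the MID Server names).

```python
from collections import Counter


def summarise_ecc(records, min_age_minutes=15):
    """Count records by (queue, state) and list agents with old, unpicked output."""
    counts = Counter((r["queue"], r["state"]) for r in records)
    stuck_agents = sorted({
        r["agent"]
        for r in records
        if r["queue"] == "output"
        and r["state"] == "ready"
        and r["age_minutes"] >= min_age_minutes  # fresh "ready" records are normal
    })
    return counts, stuck_agents


# Illustrative export; the values are invented for the example.
sample = [
    {"agent": "mid.server.dc1", "queue": "output", "state": "ready", "age_minutes": 90},
    {"agent": "mid.server.dc1", "queue": "output", "state": "ready", "age_minutes": 85},
    {"agent": "mid.server.dc2", "queue": "output", "state": "processed", "age_minutes": 80},
    {"agent": "mid.server.dc2", "queue": "input", "state": "processed", "age_minutes": 78},
]
counts, stuck = summarise_ecc(sample)
print(counts[("output", "ready")], stuck)
```

If one agent holds all the old "ready" output while another shows "processed" records, the problem is isolated to that MID Server rather than to the instance or the schedule.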

After a Discovery run, you find that hundreds of duplicate CI records exist in CMDB. What are the likely causes and how do you resolve it?

Likely causes:

1. Weak CI Identifier Rules: Using only hostname as the identifier when multiple devices share hostnames across different domains - or when hostnames change between runs, preventing matches from being made
2. Missing serial numbers: Physical servers without serial numbers can't be matched by the most reliable identifier, falling back to weaker combinations
3. IP address changes: If IP was the only identifier and the device received a new IP (DHCP), the old IP-based CI is orphaned and a new one is created
4. Overlapping Discovery schedules: Multiple schedules scanning the same device with different MID Servers or credentials can produce slightly different CI payloads that don't match
5. Case sensitivity mismatch: Identifier rules not normalising hostname case (e.g., "SERVER01" vs "server01" treated as different devices)

Resolution steps:
1. Use the CMDB Health Dashboard to identify duplicate CI counts by class
2. Review the de-duplication tasks that the IRE raises for suspected duplicates and merge the confirmed ones (for example with the Duplicate CI Remediator)
3. Review and strengthen CI Identifier Rules - add serial_number as the primary entry before hostname-based ones
4. Use the IRE Simulation tool to test how a CI payload is matched before running Discovery again
5. Resolve overlapping Discovery schedules - each IP should be covered by only one schedule
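
Before merging anything, it helps to audit an export of the affected CI class for likely duplicates. The sketch below is an offline check, not the IRE's matching logic: it assumes each CI is a dict with sys_id, name, and serial_number, prefers the serial number as the key, and falls back to a case-normalised name.

```python
from collections import defaultdict


def match_key(ci):
    """Prefer serial number; otherwise fall back to the lower-cased name."""
    serial = (ci.get("serial_number") or "").strip().upper()
    if serial:
        return ("serial", serial)
    return ("name", (ci.get("name") or "").strip().lower())


def find_duplicates(cis):
    """Group CI sys_ids by match key and keep only groups with more than one CI."""
    groups = defaultdict(list)
    for ci in cis:
        groups[match_key(ci)].append(ci["sys_id"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}
```

A CI with a serial number and a CI without one will never share a key here, so records with missing serials still need a manual look.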

A MID Server goes down in the middle of a large Discovery run. What happens to the in-progress data and how do you recover?

What happens immediately:
- Active probe executions on that MID Server are abandoned mid-flight
- ECC Queue outbound messages assigned to that MID Server remain in "ready" state - unprocessed
- The Discovery Status record will eventually show Error or remain in "Discovering" state until it times out
- CIs that were partially discovered (some fields collected, others not) may have incomplete data written if any inbound messages had already been returned before the crash

If a cluster is configured: ServiceNow will reassign pending work from the failed MID Server to other active members of the cluster - partial automatic recovery.

Recovery steps:
1. Identify and fix the reason for the MID Server failure (OOM, disk full, network loss, process crash) - check agent.log
2. Restart the MID Server service
3. Verify it reconnects to ServiceNow (status = "Up")
4. Clean up stuck ECC Queue messages (state = "error" or "ready" with no active MID Server)
5. Re-run the Discovery Schedule for the affected IP range - it will update CIs that were partially or not at all discovered
6. Confirm that the re-run did not create duplicates - Discovery is designed to be idempotent, so re-running it will not create duplicates if the IRE rules are correct

Discovery is finding some devices in an IP range but consistently missing others in the same subnet. How do you investigate?

Investigation checklist:

1. Are the missing devices powered on? Obvious but commonly overlooked - check with the infrastructure team

2. Are they reachable via ping from the MID Server host? Log on to the MID Server host and ping the missing IPs. If they don't respond, it's a network/ICMP block between the MID Server and that specific device

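3. Is ICMP blocked but TCP ports are open? A device that blocks ping (common on hardened servers) is only found if Shazzam can reach it on one of the ports it scans. Shazzam detects devices by probing ports such as 22, 135, and 161 rather than relying on ICMP alone, so confirm that the port probes for that device type are active and that those ports are reachable from the MID Server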

4. Is the device in the IP Exclusion List? Check Discovery > Administration > Exclude IP Addresses

5. Different OS type with no credentials? If the missing devices are a different OS or device type (e.g., Linux servers while everything else is Windows), and you don't have SSH credentials configured for that range, they will scan but fail exploration silently

6. Different network segment? Even within the same /24, some devices may sit behind an internal micro-segmentation firewall that blocks MID Server access to specific IPs

Quick diagnostic: Run an on-demand Discovery targeting one missing IP to get isolated, detailed logs for just that device.

A specific server's CMDB record gets overwritten with incorrect hostname and IP data every time Discovery runs. How do you identify the root cause and fix it?

Root cause identification:

1. Check what Discovery is actually collecting: Open the ECC Queue and find the inbound message for that device's most recent exploration run. Review the payload XML - what hostname and IP values is Discovery reporting? Are they genuinely incorrect, or is the device itself misconfigured (OS hostname set incorrectly)?

2. Check the Discovery log for that CI: The Exploration log shows exactly which probe/pattern collected each field and what value it returned

3. Check Reconciliation Rules: Is Discovery given the highest data source priority for the hostname field? If yes, it will always overwrite manual corrections

Fix options depending on root cause:

- If the OS itself reports wrong data: Fix the hostname configuration on the server (hostnamectl on Linux, Computer Name on Windows) - the source of truth is wrong
- If Discovery is extracting it incorrectly: Check the pattern step responsible for collecting hostname - there may be a bug in the extraction logic
- If the data is correct but shouldn't be overwritten: Adjust Reconciliation Rules to lower Discovery's priority for that field, or set the field source as "Manual" to prevent any automated source from overwriting it
- If you want to stop Discovery updating this CI entirely: Set "Excluded from Discovery" on the CI record
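
The effect of source priority on a single field can be illustrated with a small model. This is a simplified sketch of the idea behind Reconciliation Rules, not ServiceNow's implementation; it assumes that a lower number means higher priority, and the source labels "Manual" and "Discovery" are illustrative.

```python
def apply_update(field, incoming_source, incoming_value, priorities):
    """Return the field after an update attempt.

    field: {"value": ..., "source": ...} describing who last wrote the field.
    priorities: source name -> rank (lower rank wins); unknown sources rank last.
    """
    current_rank = priorities.get(field["source"], float("inf"))
    incoming_rank = priorities.get(incoming_source, float("inf"))
    if incoming_rank <= current_rank:
        # The incoming source is at least as authoritative: overwrite.
        return {"value": incoming_value, "source": incoming_source}
    # The incoming source is less authoritative: keep the existing value.
    return field
```

With {"Discovery": 1, "Manual": 2}, a manual correction is overwritten on the next run; swapping the ranks makes the manual value stick, which is the behaviour the third fix option aims for.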

The ECC Queue has tens of thousands of unprocessed messages and is growing. How do you handle this?

Immediate diagnosis:

1. Is the MID Server running? If all MID Servers are down, nothing is processing outbound messages - the queue will grow indefinitely. Check MID Server status first.

2. Is processing just slow? Check MID Server thread usage - if all threads are occupied with slow probes (e.g., WMI timeouts), new messages queue up. The number of probe threads is controlled by the threads.max parameter on the MID Server.

3. Are there error-state messages blocking the queue? Certain queue implementations process messages in order - a large number of error-state messages can act as bottlenecks. Run the ECC Queue cleanup job.

Resolution steps:
1. If MID Server is down: fix and restart it
2. Run ECC Queue Cleanup (Discovery > Administration > Cleanup ECC Queue) to remove old processed and error messages
3. If the thread pool is exhausted: increase threads.max in the MID Server configuration parameters (the default is 25 - increase cautiously based on server resources)
4. Cancel or pause running Discovery schedules if the queue is dangerously large
5. Investigate the source of the spike - which Discovery schedule generated the backlog? Tune its concurrency settings

Long-term prevention: Configure automatic ECC Queue cleanup policies. Stagger Discovery schedules so they don't all trigger simultaneous probe storms.
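
A back-of-the-envelope estimate shows whether a backlog will drain by itself or needs intervention. The sketch below assumes each probe thread handles one message at a time and that average probe duration is roughly constant; all figures in the usage note are illustrative.

```python
def drain_minutes(backlog, threads, avg_probe_seconds, arrival_per_second=0.0):
    """Estimate minutes to clear the backlog, or None if it will never drain."""
    throughput = threads / avg_probe_seconds   # messages processed per second
    net_rate = throughput - arrival_per_second  # how fast the queue actually shrinks
    if net_rate <= 0:
        return None  # new work arrives at least as fast as it is processed
    return backlog / net_rate / 60
```

For example, 40,000 queued messages with 25 threads and 6-second probes drain in about 160 minutes if nothing new arrives. If schedules keep adding 5 messages per second, the function returns None, which is the signal to pause schedules (step 4) before tuning anything else.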

A client wants to discover 15,000 devices spread across 8 data centres in different countries. How would you design the Discovery architecture?

Architecture design:

1. MID Server placement:
Deploy at least one MID Server per data centre. Each MID Server must reside on a host with network access to all devices in its local data centre. For data centres with isolated network segments (DMZ, production, dev), deploy a MID Server per segment.

2. MID Server clustering:
In each data centre with more than ~1,500 devices, cluster two MID Servers for both load balancing and high availability. Name clusters by location (e.g., "DC-Singapore-Cluster").

3. Discovery Schedules:
Create one Discovery Schedule per data centre (or per major network segment within a DC). Assign each schedule to the corresponding MID Server cluster. Stagger run times - don't start all 8 schedules simultaneously.

4. Credential management:
Use Credential Affinity to assign data centre-specific credentials to their respective IP ranges. This avoids sending inappropriate credentials across data centres.

5. Schedule timing:
Stagger schedules across off-peak hours. For globally distributed DCs, use each DC's local off-peak window. Spread Windows vs Linux vs network device schedules to avoid concurrent WMI/SSH storms.

6. MID Server sizing:
Each MID Server should handle at most 1,000–1,500 devices per run for reliable performance. Size server hardware with at least 4 vCPUs and 8GB RAM per MID Server.

7. Monitoring:
Set up CMDB Health Dashboard alerts for discovery failure rates per schedule, and ECC Queue depth monitoring to catch backlogs early.
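
The sizing rule in point 6 translates into a quick capacity calculation. The per-site device counts below are invented so that they total 15,000; substitute the client's real numbers. The 1,500-device capacity comes from the guideline above and should be treated as a planning assumption rather than a hard limit.

```python
import math


def mid_servers_needed(devices, per_mid_capacity=1500):
    """At least one MID Server per site, plus more as device count grows."""
    return max(1, math.ceil(devices / per_mid_capacity))


# Hypothetical distribution of 15,000 devices across 8 data centres.
sites = {
    "Singapore": 3200, "London": 2800, "Frankfurt": 2500, "New York": 2200,
    "Sydney": 1500, "Mumbai": 1300, "Toronto": 900, "Sao Paulo": 600,
}
plan = {site: mid_servers_needed(count) for site, count in sites.items()}
total_mid_servers = sum(plan.values())
print(plan, total_mid_servers)
```

This counts capacity only. Isolated segments (DMZ, production, dev) and any extra cluster member wanted purely for high availability add to the total.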

WMI Discovery is consistently timing out for Windows servers in one specific subnet but works fine everywhere else. What steps do you take?

The subnet-specific failure pattern strongly suggests a network or host configuration issue in that segment rather than a credential or ServiceNow configuration problem.

Investigation steps:

1. Latency test: From the MID Server host, ping devices in the problematic subnet. Compare RTT to the working subnet - high latency (>100ms) can cause WMI timeouts

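2. Port test: WMI requires TCP 135 (the RPC endpoint mapper) plus dynamic high ports. From the MID Server host, use a connectivity tool (telnet, nc, or Test-NetConnection) to test port 135 and a sample of the dynamic RPC range on devices in that subnet - 49152–65535 by default on current Windows versions, or whatever range your organisation has configured. Many enterprises apply stricter firewall rules to specific subnets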

3. Check subnet-specific firewall rules: Ask the network team if there are ACLs or firewall policies applied specifically to that VLAN or subnet that restrict DCOM/RPC traffic

4. Domain controller reachability: If devices in that subnet authenticate through a specific DC, confirm the MID Server can reach that DC. WMI authentication delays can cause probe timeouts

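5. WMI service health: Log in to a device in the problematic subnet and check that WMI responds normally: wmic computersystem get name (on newer Windows builds where wmic is deprecated, use Get-CimInstance Win32_ComputerSystem in PowerShell)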

Resolution options:
- Work with network team to open DCOM dynamic ports for that subnet
- Configure WMI to use a fixed port range (restrict dynamic ports to a narrower range via Registry)
- Increase WMI probe timeout in Discovery MID Server properties
- Consider placing a MID Server directly in that subnet if network segmentation is the root cause
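
To compare latency between the working and failing subnets when ICMP is filtered, TCP connect time to port 135 is a usable stand-in for ping RTT. This is a rough sketch to run on the MID Server host; it measures connection setup time only, not WMI query time, and the function name is illustrative.

```python
import socket
import statistics
import time


def connect_rtt_ms(host, port=135, samples=5, timeout=3.0):
    """Median TCP connect time in milliseconds, or None if every attempt failed."""
    rtts = []
    for _ in range(samples):
        start = time.perf_counter()
        try:
            with socket.create_connection((host, port), timeout=timeout):
                rtts.append((time.perf_counter() - start) * 1000)
        except OSError:
            continue  # failed attempts are skipped, not counted as slow
    return statistics.median(rtts) if rtts else None
```

A median well above the 100 ms mark from step 1 on the failing subnet, against a few milliseconds on the working one, points to the network path. A result of None points to a firewall block rather than latency.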

Discovery is creating CI records correctly but all relationship data is missing. What would you investigate?

CI creation without relationships means the Exploration phase is collecting infrastructure data but the relationship-mapping step is not completing.

Investigation checklist:

1. Is Vertical Discovery configured? Application-layer relationships require Vertical/Application Discovery to be enabled and scheduled. Horizontal Discovery alone only creates basic CI records with minimal relationships.

2. Are the related CIs in CMDB? Relationships can only be created between CIs that both exist in CMDB. If one end of the relationship hasn't been discovered yet, the relationship is skipped. For example, a web-to-database relationship requires both the web server and database server to be in CMDB.

3. Check Exploration phase logs: Open the per-device Discovery Log and look for the relationship-mapping step in the pattern execution. Did it run? Did it produce output? Any errors?

4. Insufficient credential permissions: Querying active network connections (netstat output, connection tables) often requires elevated permissions. If the Discovery account doesn't have sufficient access, that step will be skipped silently.

5. Check the cmdb_rel_ci table directly: Query for the CI's sys_id as either parent or child. Confirm relationships are genuinely absent - not just hidden by a list view filter.

6. Pattern version: Check if the Discovery Pattern version in use supports relationship mapping for that CI class. Older pattern versions may not include relationship steps.
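
The direct table check in point 5 can be done through the REST Table API as well as through the list view. The sketch below only builds the request URL; the instance name and sys_id are placeholders, and the request itself would need to be sent as an authenticated GET with an account that can read cmdb_rel_ci.

```python
from urllib.parse import urlencode


def rel_query_url(instance, ci_sys_id, limit=100):
    """Build a Table API URL that returns relationships where the CI is parent or child."""
    params = {
        # Encoded query: the CI appears on either side of the relationship.
        "sysparm_query": f"parent={ci_sys_id}^ORchild={ci_sys_id}",
        "sysparm_fields": "parent,child,type",
        "sysparm_limit": limit,
    }
    return (
        f"https://{instance}.service-now.com"
        f"/api/now/table/cmdb_rel_ci?{urlencode(params)}"
    )
```

An empty result for a CI that should have dependencies confirms the relationships are genuinely missing, which rules out a list view filter as the explanation.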

A new Linux server was provisioned and added to the network, but after three Discovery runs it is still not appearing in CMDB. What is your investigation approach?

Systematic checklist for a single missing device:

1. IP in scope? Confirm the server's IP falls within an active Discovery Schedule's IP range. Check the schedule's Includes list.

2. IP on Exclusion List? Check Discovery > Administration > Exclude IP Addresses. Also check if a CI with that IP has "Excluded from Discovery" checked (though for a new server this is unlikely).

3. Reachability from MID Server: Log onto the MID Server host and run: ping [server-ip] and ssh [username]@[server-ip]. If ping fails, the MID Server can't see the device - firewall or routing issue.

4. SSH port open? telnet [server-ip] 22 from the MID Server host - confirms port 22 is accessible.

5. Credential match: Is there an SSH credential configured that covers this IP range? Is the username/password or SSH key correct for this new server? New servers often use a different default account than the standard credential set.

6. Discovery Schedule timing: Check the Discovery Status records - has the schedule actually run and scanned this IP range since the server was provisioned? Sometimes schedules are weekly and the timing just hasn't aligned.

Fastest fix: Run an on-demand Discovery targeting just this single IP. This generates a standalone Discovery Status record with detailed per-phase logs - it tells you exactly where the failure is within minutes.
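
Checks 1 and 2 are easy to get wrong by eye when a schedule has many ranges. A small offline check like the following can confirm whether an address is in scope; it assumes the schedule's ranges and the exclusion list have been written out as CIDR strings, and the addresses in the usage note are invented.

```python
import ipaddress


def discovery_scope(ip, include_ranges, exclude_ranges):
    """Classify an IP as 'excluded', 'in scope', or 'not in any range'."""
    addr = ipaddress.ip_address(ip)
    if any(addr in ipaddress.ip_network(r) for r in exclude_ranges):
        return "excluded"  # exclusions are checked first, so they win over includes
    if any(addr in ipaddress.ip_network(r) for r in include_ranges):
        return "in scope"
    return "not in any range"
```

For example, discovery_scope("10.0.2.15", ["10.0.1.0/24"], []) returns "not in any range", which would explain a new server provisioned on a subnet that no schedule covers.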

Your network team reports that Discovery is causing bandwidth spikes during business hours. What changes do you recommend?

Immediate actions:

1. Reschedule to off-peak hours: Move all Discovery Schedules to run between midnight and 6 AM, or over weekends - the most impactful change

2. Reduce MID Server thread pool: Lower the threads.max parameter on the MID Server to reduce the number of simultaneous probe connections (at the cost of longer Discovery run times)

3. Stagger schedules: If multiple Discovery Schedules currently start at the same time, offset their start times by 30–60 minutes each

Medium-term optimisations:

4. Use targeted re-discovery: Instead of scanning entire /16 or /24 ranges every run, only re-scan IP ranges where changes are expected. Use CI lists for re-discovery of known devices rather than blind IP sweeps

5. Reduce Shazzam port scan depth: The initial port scanning phase is the most network-intensive. Configure Shazzam to scan only the specific ports needed for your device types, not the full default port list

6. Consider agent-based Discovery for endpoints: Laptops and workstations generating significant WMI scan traffic can be moved to the Agent Client Collector (ACC), eliminating the network-intensive agentless scan for those devices

7. Reduce Discovery frequency: For stable infrastructure (data centre servers), weekly Discovery is sufficient - running daily is unnecessary and generates roughly seven times the scan traffic
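
The staggering in point 3 is simple to plan ahead of time. This sketch assigns start times at a fixed offset from the first run; the schedule names, the date, and the 45-minute offset are placeholders within the 30–60 minute guidance above.

```python
from datetime import datetime, timedelta


def stagger(schedule_names, first_start, offset_minutes=45):
    """Assign each schedule a start time offset from the previous one."""
    return {
        name: first_start + timedelta(minutes=index * offset_minutes)
        for index, name in enumerate(schedule_names)
    }


# Hypothetical schedules starting at midnight.
starts = stagger(
    ["Windows servers", "Linux servers", "Network devices"],
    datetime(2026, 1, 5, 0, 0),
)
for name, start in starts.items():
    print(f"{start:%H:%M}  {name}")
```

Check that the last start time plus the expected run duration still fits inside the off-peak window; if it does not, reduce the offset or split the schedules across nights.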

You need to permanently exclude a group of 50 high-priority production servers from being updated by Discovery, while still allowing Discovery to scan the rest of the subnet. How do you implement this?

For 50 specific servers within a broader subnet, there are three options. For production, combining Options 1 and 2 is recommended for belt-and-suspenders protection:

Option 1 - IP Exclusion List (prevents scanning entirely):
Navigate to Discovery > Administration > Exclude IP Addresses and add all 50 IP addresses. These IPs will be skipped entirely during the Scanning phase - no probes will be sent to them at all. Best when you want zero Discovery activity against these servers.

Option 2 - CI-level "Excluded from Discovery" flag (prevents CMDB updates but allows scanning):
On each CI record, set the "Excluded from Discovery" checkbox to true. Discovery will scan and classify the device, but the IRE will refuse to update the CI record. Best when you still want Discovery to confirm the device is alive but not modify its CMDB data.

Option 3 - Discovery Range exclusion (cleanest approach for a block of IPs):
If the 50 servers share a common IP sub-range, refine the Discovery Schedule's IP Ranges to exclude that block. For example, change 10.0.1.0/24 to a set of ranges that explicitly omit 10.0.1.100–10.0.1.150.

Recommended approach: Use Option 1 (IP Exclusion List) + Option 2 (CI flag) together. This provides dual protection - even if someone accidentally removes the exclusion list entry, the CI-level flag prevents CMDB updates.
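
For Option 3, the replacement ranges can be computed rather than worked out by hand. The sketch below splits a network into the ranges on either side of an excluded block, using the example block from Option 3. It includes the network and broadcast addresses in the output, so trim those if your schedule should list usable hosts only.

```python
import ipaddress


def ranges_excluding(network, first_excluded, last_excluded):
    """Return 'start-end' ranges covering the network minus the excluded block."""
    net = ipaddress.ip_network(network)
    low = ipaddress.ip_address(first_excluded)
    high = ipaddress.ip_address(last_excluded)
    ranges = []
    if low > net.network_address:
        ranges.append((net.network_address, low - 1))    # addresses below the block
    if high < net.broadcast_address:
        ranges.append((high + 1, net.broadcast_address))  # addresses above the block
    return [f"{start}-{end}" for start, end in ranges]


print(ranges_excluding("10.0.1.0/24", "10.0.1.100", "10.0.1.150"))
```

The two resulting ranges, 10.0.1.0-10.0.1.99 and 10.0.1.151-10.0.1.255, replace the single /24 entry in the Discovery Schedule.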
