This is a cache of https://docs.okd.io/4.9/virt/node_network/virt-troubleshooting-node-network.html. It is a snapshot of the page at 2024-11-25T00:05:32.977+0000.
Troubleshooting node network configuration - Node networking | Virtualization | OKD 4.9
×

If the node network configuration encounters an issue, the policy is automatically rolled back and the enactments report failure. This includes issues such as:

  • The configuration fails to be applied on the host.

  • The host loses connection to the default gateway.

  • The host loses connection to the API server.

Troubleshooting an incorrect node network configuration policy configuration

You can apply changes to the node network configuration across your entire cluster by applying a node network configuration policy. If you apply an incorrect configuration, you can use the following example to troubleshoot and correct the failed node network policy.

In this example, a Linux bridge policy is applied to an example cluster that has three control plane nodes (master) and three compute (worker) nodes. The policy fails to be applied because it references an incorrect interface. To find the error, investigate the available NMState resources. You can then update the policy with the correct configuration.

Procedure
  1. Create a policy and apply it to your cluster. The following example creates a simple bridge on the ens01 interface:

    apiVersion: nmstate.io/v1beta1
    kind: NodeNetworkConfigurationPolicy
    metadata:
      name: ens01-bridge-testfail
    spec:
      desiredState:
        interfaces:
          - name: br1
            description: Linux bridge with the wrong port
            type: linux-bridge
            state: up
            ipv4:
              dhcp: true
              enabled: true
            bridge:
              options:
                stp:
                  enabled: false
              port:
                - name: ens01
    $ oc apply -f ens01-bridge-testfail.yaml
    Example output
    nodenetworkconfigurationpolicy.nmstate.io/ens01-bridge-testfail created
  2. Verify the status of the policy by running the following command:

    $ oc get nncp

    The output shows that the policy failed:

    Example output
    NAME                    STATUS
    ens01-bridge-testfail   FailedToConfigure

    However, the policy status alone does not indicate if it failed on all nodes or a subset of nodes.

  3. List the node network configuration enactments to see if the policy was successful on any of the nodes. If the policy failed for only a subset of nodes, it suggests that the problem is with a specific node configuration. If the policy failed on all nodes, it suggests that the problem is with the policy.

    $ oc get nnce

    The output shows that the policy failed on all nodes:

    Example output
    NAME                                   STATUS
    control-plane-1.ens01-bridge-testfail        FailedToConfigure
    control-plane-2.ens01-bridge-testfail        FailedToConfigure
    control-plane-3.ens01-bridge-testfail        FailedToConfigure
    compute-1.ens01-bridge-testfail              FailedToConfigure
    compute-2.ens01-bridge-testfail              FailedToConfigure
    compute-3.ens01-bridge-testfail              FailedToConfigure
  4. View one of the failed enactments and look at the traceback. The following command uses the output tool jsonpath to filter the output:

    $ oc get nnce compute-1.ens01-bridge-testfail -o jsonpath='{.status.conditions[?(@.type=="Failing")].message}'

    This command returns a large traceback that has been edited for brevity:

    Example output
    error reconciling NodeNetworkConfigurationPolicy at desired state apply: , failed to execute nmstatectl set --no-commit --timeout 480: 'exit status 1' ''
    ...
    libnmstate.error.NmstateVerificationError:
    desired
    =======
    ---
    name: br1
    type: linux-bridge
    state: up
    bridge:
      options:
        group-forward-mask: 0
        mac-ageing-time: 300
        multicast-snooping: true
        stp:
          enabled: false
          forward-delay: 15
          hello-time: 2
          max-age: 20
          priority: 32768
      port:
      - name: ens01
    description: Linux bridge with the wrong port
    ipv4:
      address: []
      auto-dns: true
      auto-gateway: true
      auto-routes: true
      dhcp: true
      enabled: true
    ipv6:
      enabled: false
    mac-address: 01-23-45-67-89-AB
    mtu: 1500
    
    current
    =======
    ---
    name: br1
    type: linux-bridge
    state: up
    bridge:
      options:
        group-forward-mask: 0
        mac-ageing-time: 300
        multicast-snooping: true
        stp:
          enabled: false
          forward-delay: 15
          hello-time: 2
          max-age: 20
          priority: 32768
      port: []
    description: Linux bridge with the wrong port
    ipv4:
      address: []
      auto-dns: true
      auto-gateway: true
      auto-routes: true
      dhcp: true
      enabled: true
    ipv6:
      enabled: false
    mac-address: 01-23-45-67-89-AB
    mtu: 1500
    
    difference
    ==========
    --- desired
    +++ current
    @@ -13,8 +13,7 @@
           hello-time: 2
           max-age: 20
           priority: 32768
    -  port:
    -  - name: ens01
    +  port: []
     description: Linux bridge with the wrong port
     ipv4:
       address: []
      line 651, in _assert_interfaces_equal\n    current_state.interfaces[ifname],\nlibnmstate.error.NmstateVerificationError:

    The NmstateVerificationError lists the desired policy configuration, the current configuration of the policy on the node, and the difference highlighting the parameters that do not match. In this example, the port is included in the difference, which suggests that the problem is the port configuration in the policy.

  5. To ensure that the policy is configured properly, view the network configuration for one or all of the nodes by requesting the NodeNetworkState object. The following command returns the network configuration for the control-plane-1 node:

    $ oc get nns control-plane-1 -o yaml

    The output shows that the interface name on the nodes is ens1 but the failed policy incorrectly uses ens01:

    Example output
       - ipv4:
     ...
          name: ens1
          state: up
          type: ethernet
  6. Correct the error by editing the existing policy:

    $ oc edit nncp ens01-bridge-testfail
    ...
              port:
                - name: ens1

    Save the policy to apply the correction.

  7. Check the status of the policy to ensure it updated successfully:

    $ oc get nncp
    Example output
    NAME                    STATUS
    ens01-bridge-testfail   SuccessfullyConfigured

The updated policy is successfully configured on all nodes in the cluster.