Troubleshooting VTEP/VXGW configuration

VTEP deployments have a relatively large number of moving pieces and potential failure points. This guide will focus on troubleshooting MidoNet and the integration with the VTEP. For specifics on the configuration of the logical switch please refer to your vendor’s documentation.

Is the MidoNet API able to connect to the VTEP?

After following the procedure to add a VTEP as described in the section called “Adding a VTEP”, the expected output should be as follows:

midonet> vtep add management-ip 192.168.2.10 management-port 6632 tunnel-zone tzone0
vtep0
midonet> vtep list
vtep vtep0 name VTEP-NAME management-ip 192.168.2.10 management-port 6632 tunnel-zone tzone0 connection-state connected

The same output should appear for VTEPs already added to MidoNet.

Note that the connection-state is connected. An error state indicates that the VTEP’s management IP is unreachable from the MidoNet API.
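When several VTEPs are registered, it can help to check the connection-state field programmatically. The following is a minimal Python sketch, assuming the output of vtep list has been captured as text in the format shown above. The function names are illustrative and are not part of MidoNet.

```python
def parse_vtep_line(line):
    """Parse one line of `vtep list` output into a dict of field -> value.

    Assumes the layout shown above: the word 'vtep', an alias, then
    alternating key/value tokens with no embedded whitespace.
    """
    tokens = line.split()
    if len(tokens) < 2 or tokens[0] != "vtep":
        raise ValueError("not a vtep list line: %r" % line)
    fields = {"alias": tokens[1]}
    rest = tokens[2:]
    # Pair up the remaining tokens: key1 value1 key2 value2 ...
    for key, value in zip(rest[0::2], rest[1::2]):
        fields[key] = value
    return fields


def disconnected_vteps(output):
    """Return the aliases of VTEPs whose connection-state is not 'connected'."""
    bad = []
    for line in output.splitlines():
        if not line.startswith("vtep "):
            continue  # skip prompts and blank lines
        fields = parse_vtep_line(line)
        if fields.get("connection-state") != "connected":
            bad.append(fields["alias"])
    return bad
```

An empty list from disconnected_vteps means every listed VTEP reported the connected state.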

Is the VTEP well configured?

A typical reason for the VTEP being in an error state is a misconfiguration of the VTEP OVSDB instance. You can verify this by executing the following command on the VTEP console:

ovsdb-client dump hardware_vtep

Scroll down to the Physical_Switch table, which will look like this:

Physical_Switch table
_uuid                                description         management_ips   name        ports                                  switch_fault_status tunnel_ips
------------------------------------ ------------------- ---------------- ----------- -------------------------------------- ------------------- ------------
3647f020-9ecf-4854-8f75-9011b8c9996a "VTEP-DESCRIPTION"  ["192.168.2.10"] "VTEP-NAME" [698ede89-31f8-4797-a885-1b2dd4c585e3] []                  ["10.0.0.1"]

Verify that an entry exists, and that the management_ips and tunnel_ips fields correspond to the physical configuration. The management IP is the one you’ll be using in the vtep add command. The tunnel IP is not relevant at this point; however, MidoNet expects a value to be present in this field.

Is the OVSDB instance running and accessible?

If the MidoNet API shows an ERROR on the VTEP list, and your configuration is correct, you should verify that the OVSDB instance is listening on the same management-port that you’re specifying in the vtep add call.

From the host running the MidoNet REST API (and Coordinator) try to establish a Telnet connection to the VTEP management interface IP and port, assuming these are 192.168.2.10 and 6632:

telnet 192.168.2.10 6632

If the connection is successful, you should see the following output:

Trying 192.168.2.10...
Connected to 192.168.2.10.
Escape character is '^]'.

This means that a TCP socket is listening on the right port. If the connection fails instead, check your switch manual for how to make the OVSDB instance listen for connections on the selected TCP port. Otherwise, we can now verify that the OVSDB is responsive.

If the output was correct, then enter the following input into the console:

{"method":"list_dbs","id":"list_dbs","params":[]}

The desired output is:

{"id":"list_dbs","result":["hardware_vtep"],"error":null}

The content of the brackets after result may vary, but we must see a hardware_vtep, indicating that there is a VTEP schema on this instance of the OVSDB. If you fail to see this output, the VTEP likely doesn’t contain a hardware_vtep schema in its OVSDB instance. Refer to your switch documentation for instructions to configure it.
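The Telnet exchange above can also be scripted. The following is a minimal Python sketch, not an official MidoNet tool: it sends the same list_dbs request over a plain TCP socket and checks the reply for hardware_vtep. The function names are illustrative, and the sketch assumes the whole reply arrives in one read.

```python
import json
import socket


def build_list_dbs_request():
    """Return the JSON-RPC request bytes for the OVSDB list_dbs method."""
    request = {"method": "list_dbs", "id": "list_dbs", "params": []}
    return json.dumps(request).encode("utf-8")


def has_vtep_schema(reply_text):
    """Return True if a list_dbs reply advertises the hardware_vtep database."""
    reply = json.loads(reply_text)
    if reply.get("error") is not None:
        return False
    return "hardware_vtep" in (reply.get("result") or [])


def check_ovsdb(host, port, timeout=5.0):
    """Connect to the OVSDB server, send list_dbs and inspect the reply.

    Assumes the whole reply arrives in a single recv() call, which is
    reasonable for this short message but not guaranteed by TCP.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_list_dbs_request())
        reply_text = sock.recv(4096).decode("utf-8")
    return has_vtep_schema(reply_text)
```

Calling check_ovsdb("192.168.2.10", 6632) from the host running the MidoNet REST API would perform roughly the same check as the Telnet session. A connection error raised by the call corresponds to the failed Telnet case.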

VTEP and bindings are added but no traffic goes through

First verify that the VXLAN Gateway service is enabled in the MidoNet API. This service is enabled by default, and it is required to configure the VTEP and synchronize its state. If the service has previously been disabled, you can re-enable it by using the mn-conf utility to set cluster.vxgw.enabled to true:

$ echo "cluster.vxgw.enabled=true" | mn-conf set -t default

After enabling the service, you must restart one or more MidoNet Cluster instances.

Verify that the VXLAN Gateway service is started

The VXLAN Gateway service may be enabled in several MidoNet Cluster instances. All of them will coordinate through the Network State Database (NSDB) to elect a leader that will perform all coordination tasks for a particular VTEP. When a MidoNet Cluster instance takes over leadership the following DEBUG message is displayed in the cluster log (/var/log/midonet-cluster/midonet-cluster.log):

DEBUG snatcher-99a629d1-c729-4973-b7e0-44183a5589ff I own Vtep ba2739df-87cf-458f-9ad2-39885cab217d

If another instance is already a leader, all other instances will display the following DEBUG message:

DEBUG snatcher-99a629d1-c729-4973-b7e0-44183a5589ff Vtep ba2739df-87cf-458f-9ad2-39885cab217d already taken by node 45926c1a-1264-4e19-87c9-b6aca33169f2

At least one instance of the MidoNet API should display the positive message indicating that it became the VXLAN Gateway leader for a particular VTEP. This is the instance that should be watched for further log messages.

When deploying more than one MidoNet Cluster instance, different instances may be responsible for different hardware VTEPs at the same time. In each case, the log indicates the VTEPs of which a particular instance has assumed ownership.

If a MidoNet Cluster instance is shut down or fails, the remaining MidoNet Cluster instances will assume ownership of any orphaned VTEPs and coordinate them.
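To see at a glance which VTEPs an instance owns, the two DEBUG messages above can be extracted from the cluster log. The following is a minimal Python sketch that assumes the message wording shown in this section; the names vtep_ownership, OWNED and TAKEN are illustrative.

```python
import re

UUID = r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
OWNED = re.compile(r"I own Vtep (%s)" % UUID)
TAKEN = re.compile(r"Vtep (%s) already taken by node (%s)" % (UUID, UUID))


def vtep_ownership(log_lines):
    """Map each VTEP UUID to 'self' or to the UUID of the owning node.

    Later lines override earlier ones, so the result reflects the most
    recent message seen for each VTEP.
    """
    owners = {}
    for line in log_lines:
        match = OWNED.search(line)
        if match:
            owners[match.group(1)] = "self"
            continue
        match = TAKEN.search(line)
        if match:
            owners[match.group(1)] = match.group(2)
    return owners
```

Passing the lines of /var/log/midonet-cluster/midonet-cluster.log from each instance shows whether every VTEP has an owner somewhere in the deployment.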

Verify that the VXLAN Gateway leader picks up VTEPs and Networks

The VXLAN Gateway service will scan all the Neutron networks in MidoNet’s NSDB and proceed to monitor those that are bound to any VTEPs.

Whenever a Neutron network is bound to a VTEP, the following message will appear in the DEBUG logs. Note that all log messages relevant for a given Neutron network will be tagged with the appropriate UUID:

DEBUG vxgw-vtep-ba2739df-87cf-458f-9ad2-39885cab217d Network 4659a6ab-fcd2-4744-bfbb-6a331164881e updated its bindings

You can filter updates relevant to specific networks by editing:

vi /etc/midonet-cluster/logback.xml

Follow the instructions detailed in this file to enable different processes in the coordinator. For brevity, log messages mentioned below will omit the Network UUID tag.

As mentioned above, you should be seeing a message like the following for each Neutron network:

DEBUG vxgw-vtep-ba2739df-87cf-458f-9ad2-39885cab217d Network 4659a6ab-fcd2-4744-bfbb-6a331164881e is now watched

Failures during this phase typically indicate errors accessing the NSDB, for example:

ERROR vxgw-vtep-ba2739df-87cf-458f-9ad2-39885cab217d VTEP error

The MidoNet controller will WARN in the logs whenever a recoverable error is found, and try to restore connectivity to the NSDB. Non-recoverable errors will be marked as ERROR.

If the logs show problems connecting to the NSDB, verify that the NSDB is active and that the MidoNet Cluster is able to access it.

Verify that the MidoNet coordinator synchronizes MACs with the VTEPs

After fetching the Neutron network configuration from the NSDB, the MidoNet Cluster logs should display the following messages:

MAC-port table of network 4659a6ab-fcd2-4744-bfbb-6a331164881e is now watched
ARP table of network 4659a6ab-fcd2-4744-bfbb-6a331164881e is now watched
Network 4659a6ab-fcd2-4744-bfbb-6a331164881e is now watched

These indicate that the MidoNet coordinator is monitoring the network’s state, which will be synchronized to the VTEP.
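A quick way to confirm that all three messages appeared for a given network is to scan the log for them. The following is a minimal Python sketch, assuming the message wording shown above; missing_watchers and the constants are illustrative names.

```python
import re

WATCHED = re.compile(
    r"(MAC-port table of network|ARP table of network|Network) "
    r"([0-9a-f-]{36}) is now watched"
)

EXPECTED = {"MAC-port table of network", "ARP table of network", "Network"}


def missing_watchers(log_lines, network_id):
    """Return the set of 'is now watched' messages not yet seen for a network."""
    seen = set()
    for line in log_lines:
        match = WATCHED.search(line)
        if match and match.group(2) == network_id:
            seen.add(match.group(1))
    return EXPECTED - seen
```

An empty set means the MAC-port table, the ARP table and the network itself are all being watched.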

Verify that the MidoNet coordinator connects to the VTEP(s)

The MidoNet coordinator will also bootstrap a process to exchange state between the network and all VTEPs with port-VLAN pairs bound to it. When the controller detects any port-VLAN pair in a new VTEP, it will show the following message (assuming management IP and management port are 192.168.2.10 and 6632):

New bindings from network 4659a6ab-fcd2-4744-bfbb-6a331164881e: ArrayBuffer((swp1,0))

At this point it will ensure that a connection is established to this VTEP’s management IP and that the bindings configured through the MidoNet REST API are correctly reflected in the VTEP. Normal output will look like this (note that these messages may appear mixed with others):

Publishing remote MAC to VTEP: MacLocation{logicalSwitch='mn-4659a6ab-fcd2-4744-bfbb-6a331164881e', mac=unknown-dst, ipAddr='null', tunnelEndpoint=null}

If the coordinator reports any errors connecting to the VTEP, it will automatically try to reconnect, but you should verify that the VTEP is up and accessible.

Following a successful consolidation of state, MidoNet will start the synchronization of MACs and ARP entries:

VTEP table entry added: PhysicalLocatorSet{uuid=115da3cb-8926-42bd-a416-7e006a353b73, locatorIds='Set(be104a88-5b9c-4b61-bf94-6ea3c797f612)'}
VTEP table entry added: PhysicalLocator{uuid=be104a88-5b9c-4b61-bf94-6ea3c797f612, dstIp=172.17.2.10, encapsulation='vxlan_over_ipv4'}
VTEP table entry added: McastMac{uuid=57e5f212-72b6-4cd7-ba24-2eb3791a90c5, logicalSwitch=a8817c74-708f-438d-9a17-02377cbd01ce, mac=unknown-dst, ip=null, locatorSet=115da3cb-8926-42bd-a416-7e006a353b73}
VTEP table entry added: PhysicalPort{uuid=5b90109a-fafc-4d53-a2ae-f79736a5ae9b, name=swp1, description=null bindings=Map(0 -> a8817c74-708f-438d-9a17-02377cbd01ce) stats=Map(0 -> ac00d040-deca-45b1-a391-468d45503238) faultStatus=Set()

Connection errors to the VTEP are possible at this point, but should be handled gracefully by the coordinator.

If MidoNet finds a non-recoverable error, the following ERROR will be displayed (assuming the same management IP and port as above):

Error on OVSDB connection status stream

The MidoNet coordinator will ignore this Neutron network until it’s updated again. It should however be able to continue operating with all other configured networks.

Verify that the MidoNet coordinator synchronizes state

If no errors are displayed up to here, edit the logback.xml file mentioned above and enable DEBUG logs in the vxgw processes:

<!-- <logger name="org.midonet.cluster.vxgw" level="DEBUG" /> -->

Remove the <!-- and --> markers to enable this configuration and restart the MidoNet Cluster so that the logs begin showing DEBUG messages. Choose TRACE instead of DEBUG for more exhaustive information (neither level is verbose enough to have a significant performance impact).

Messages like the following show that the MidoNet coordinator is successfully exchanging MACs between MidoNet and the VTEPs.

TRACE vtep-172_17_2_10:6632 MAC remote tables updated successfully

Verify that VXLAN tunnels are being established

If the coordinator is working normally, but traffic is still not flowing, you should verify that the VTEPs and MidoNet hosts are able to establish VXLAN tunnels successfully.

While keeping a ping active from the VM to the server on the VTEP that you’re trying to reach, log in to the MidoNet compute host running that VM. Run the following command:

tcpdump -leni any port 4789

Assuming that the MidoNet compute is 192.168.2.14, and the VTEP’s tunnel IP is 192.168.2.17, the output should be similar to this (depending on your tcpdump version):

15:51:28.183233 Out fa:16:3e:df:b7:53 ethertype IPv4 (0x0800), length 94: 192.168.2.14.39547 > 192.168.2.17.4789: VXLAN, flags [I] (0x08), vni 10012
aa:aa:aa:aa:aa:aa > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 10.0.0.1 tell 10.0.0.10, length 28
15:51:28.186891  In fa:16:3e:52:d8:f3 ethertype IPv4 (0x0800), length 94: 192.168.2.17.59630 > 192.168.2.14.4789: VXLAN, flags [I] (0x08), vni 10012
cc:dd:ee:ee:ee:ff > aa:aa:aa:aa:aa:aa, ethertype ARP (0x0806), length 42: Reply 10.0.0.1 is-at cc:dd:ee:ee:ee:ff

The first line shows that the MidoNet Agent (192.168.2.14) is emitting a tunnelled packet towards the VTEP (192.168.2.17:4789), using 10012 as the VNID. The encapsulated packet is shown on the second line, and corresponds to an ARP REQUEST from a VM with IP 10.0.0.10 regarding server 10.0.0.1.

In this example, the VTEP is responding correctly on the third line, showing a return packet with the same VNID.

The same example can be applied in reverse on the VTEP. A ping from the physical server connected to the VTEP should generate a tunnelled packet towards a MidoNet Agent, and receive similar return packets.
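On a busy host it can be easier to summarize the capture than to read it line by line. The following is a minimal Python sketch that assumes tcpdump output in the format shown above and reports which VNIDs were seen in each direction; vxlan_directions is an illustrative name, and other tcpdump versions may need a different pattern.

```python
import re

VXLAN_LINE = re.compile(
    r"(?P<src>\d+\.\d+\.\d+\.\d+)\.\d+ > "
    r"(?P<dst>\d+\.\d+\.\d+\.\d+)\.4789: VXLAN, .*vni (?P<vni>\d+)"
)


def vxlan_directions(tcpdump_lines, agent_ip, vtep_ip):
    """Return the set of VNIs seen in each direction between agent and VTEP.

    Only outer-header lines are considered; the encapsulated packet lines
    do not match the pattern and are skipped.
    """
    seen = {"to_vtep": set(), "from_vtep": set()}
    for line in tcpdump_lines:
        match = VXLAN_LINE.search(line)
        if not match:
            continue
        src, dst, vni = match.group("src", "dst", "vni")
        if src == agent_ip and dst == vtep_ip:
            seen["to_vtep"].add(int(vni))
        elif src == vtep_ip and dst == agent_ip:
            seen["from_vtep"].add(int(vni))
    return seen
```

A VNID that appears under to_vtep but never under from_vtep suggests that the VTEP is not answering on the tunnel.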

The MidoNet agent is not emitting traffic

Verify the VXLAN-related options in mn-conf(1). Examine the MidoNet Agent logs in debug mode and look for simulations on the Neutron network that might be dropping packets, or throwing errors on the simulation.

The VTEP is not emitting traffic on the tunnel

Ensure that the VTEP configuration reflects the bindings configured through the MidoNet REST API. Use the following command to list the logical switches present in the switch:

vtep-ctl list-ls

This will display all Logical Switches present in the switch. If you bound a Neutron network with UUID c68fa502-62e5-4b33-9f2f-d5d0257deb4f, then you should see the following item in the list:

mn-c68fa502-62e5-4b33-9f2f-d5d0257deb4f

Now list the bindings on the port that you used to create the port-vlan binding in the midonet-cli. Let’s assume we have swp1, and created a binding with swp1 and VLAN 93. The expected output would be:

vtep-ctl list-bindings <vtep-name> swp1
0093 mn-c68fa502-62e5-4b33-9f2f-d5d0257deb4f

You can find out the <vtep-name> using the vtep-ctl list-ps command.

If any of these outputs is not as expected, the MidoNet coordinator is most likely unable to consolidate the configuration from the NSDB. Check the MidoNet API logs and locate the relevant errors in order to correct them.
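The binding check above can be automated when many ports are involved. The following is a minimal Python sketch, assuming the output of vtep-ctl list-bindings has been captured as text with one binding per line, as in the example; parse_bindings and binding_matches are illustrative names.

```python
def parse_bindings(output):
    """Parse `vtep-ctl list-bindings` output into {vlan: logical_switch}.

    Assumes one binding per line in the form '<vlan> <logical-switch>',
    as in the example above.
    """
    bindings = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # skip the echoed command and blank lines
        bindings[int(parts[0])] = parts[1]
    return bindings


def binding_matches(output, vlan, network_id):
    """Check that the VLAN is bound to the logical switch for a network.

    The logical switch name is expected to be 'mn-' followed by the
    Neutron network UUID, as shown in the list-ls output above.
    """
    return parse_bindings(output).get(vlan) == "mn-" + network_id
```

For the example binding, binding_matches(output, 93, "c68fa502-62e5-4b33-9f2f-d5d0257deb4f") should hold when the VTEP reflects the configuration.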

Verify that MACs are being synchronized correctly to the VTEP

Finally, you can list the local and remote MACs present in the VTEP’s database:

vtep-ctl list-local-macs mn-c68fa502-62e5-4b33-9f2f-d5d0257deb4f

This should show all the MACs learned by the VTEP from traffic observed on local ports. If the local server is correctly configured, you will typically see the server’s MAC here.

The following command will display the remote MACs:

vtep-ctl list-remote-macs mn-c68fa502-62e5-4b33-9f2f-d5d0257deb4f

The list here will show MACs in MidoNet VMs or other VTEPs, which are injected by the MidoNet coordinator.

If any of these steps doesn’t show the expected output, the synchronization processes may be failing. Inspect the MidoNet API logs for more details.
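To compare the MAC lists against the addresses you expect, the captured command output can be searched for MACs. The following is a minimal Python sketch that only assumes MACs are printed in colon-separated hexadecimal notation; it makes no assumption about the rest of the vtep-ctl output layout, and the function names are illustrative.

```python
import re

MAC = re.compile(r"\b(?:[0-9a-fA-F]{2}:){5}[0-9a-fA-F]{2}\b")


def macs_in(output):
    """Return the set of MAC addresses found in the text, lower-cased."""
    return {mac.lower() for mac in MAC.findall(output)}


def missing_macs(output, expected):
    """Return the expected MACs that do not appear in the command output."""
    return {mac.lower() for mac in expected} - macs_in(output)
```

Running missing_macs on the list-remote-macs output with the MACs of your MidoNet VMs returns the addresses that have not been injected into the VTEP.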
