Jumbo Frame Support In XenServer

IT牛人.117 2015-11-23 09:26

XenServer - In depth investigation series

Background

Jumbo frames are always tricky because there is no standard for them: IEEE 802.3 defines a 1500-byte payload, and anything larger is vendor-specific.

In a XenServer environment, jumbo frames are often used on the storage network, which carries IP-based storage traffic. But not all NIC drivers support jumbo frames, and unfortunately the NIC driver (kernel module) documentation doesn’t normally mention jumbo frame support.

Symptom

Recently I discovered, to my surprise, that jumbo frames are NOT supported by the Cisco VIC Ethernet NIC driver, enic. I would have thought that every Cisco NIC would support jumbo frames simply because it carries the Cisco badge. BUT that is not the case.

If one insists on enabling jumbo frames for a storage network over enic-driven NICs or bonds, kernel panics and random host reboots are to be expected.

NOTE: The kernel crash dump generated by XenServer dom0 is different from the Linux kernel crash dump generated by kdump (kexec-tools) running on bare metal. After all, dom0 is the privileged first PV guest on a host.

Crash Dump Analysis

In the kernel crash dump generated in /var/crash, we should see the following in xen.log:

  PCPU 4 Guest state (DOM0 VCPU4):
	RIP:    e033:[<ffffffff810014aa>] Ring 3
	RFLAGS: 0000000000000246  IOPL0   IF ZF PF

	rax: 0000000000000025   rbx: ffffffff817bc558   rcx: ffffffff810014aa
	rdx: 0000000000000006   rsi: ffff880182c83168   rdi: 0000000000000000
	rbp: ffff880182c83248   rsp: ffff880182c83130   r8:  0000000000000000
	r9:  ffff88016586af00   r10: 0000000000000001   r11: 0000000000000246
	r12: ffffffff81cc3d08   r13: 0000000000000018   r14: 0000000000000001
	r15: 000000000000004a

	guest_table_user: 000000201eb28000
	guest_table: 000000101ef98000
	HW cr3: 000000101ef98000

	ds: 002b   es: 002b   fs: 0000   gs: 0000   ss: e02b   cs: e033

	Pause Count: 0, Flags: 0x0 
	Currently running on PCPU4
	Struct vcpu at ffff8300778ea000
	VCPU in kernel mode

	Stack at ffff880182c83130:
	  ......
	  ......
	  ......

	Code:
	   cc cc cc cc cc 51 41 53 b8 25 00 00 00 0f 05 <41> 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc

	Call Trace:
	 [ffffffff810014aa] xen_hypercall_kexec_op+0xa/0x20
	  ffffffff810535c8  panic+0x128/0x260
	  ffffffff81359e79  erst_writer+0x2d9/0x2f0
	  ffffffff8150692e  _raw_spin_unlock_irqrestore+0x1e/0x30
	  ffffffff81053d2f  kmsg_dump+0x9f/0xd0
	  ffffffff81507d0b  oops_end+0xbb/0xf0
	  ffffffff81043dc3  no_context+0x273/0x290
	  ffffffff81043fb4  __bad_area_nosemaphore+0x1d4/0x200
	  ffffffff810440a3  bad_area_nosemaphore+0x13/0x20
	  ffffffff8150a6ee  do_page_fault+0x26e/0x4b0
	  ffffffff8144ec39  dev_hard_start_xmit+0x309/0x4c0
	  ffffffff811578c3  __slab_alloc+0x503/0x520
	  ffffffff8143e586  __alloc_skb+0x96/0x1f0
	  ffffffff8105b017  local_bh_enable+0x27/0xa0
	  ffffffff81507198  page_fault+0x28/0x30
	  ffffffff812d9656  memcpy+0x6/0x110
	  ffffffff8143f593  skb_copy_bits+0x143/0x240
	  ffffffff81486461  ip_fragment+0x621/0x760
	  ffffffff8147a13a  nf_iterate+0x5a/0xa0
	  ffffffff8147a57c  nf_hook_slow+0x7c/0x130
	  ffffffff8147a13a  nf_iterate+0x5a/0xa0
	  ffffffff8147a57c  nf_hook_slow+0x7c/0x130
	  ffffffff8147a57c  nf_hook_slow+0x7c/0x130
	  ffffffff8147a13a  nf_iterate+0x5a/0xa0
	  ffffffff813640f7  xen_send_IPI_one+0x37/0x40
	  ffffffff8147a57c  nf_hook_slow+0x7c/0x130
	  ffffffff8144f7ec  __netif_receive_skb_core+0x56c/0x740
	  ffffffff810d00e7  irq_to_desc+0x17/0x20
	  ffffffff8136343e  info_for_irq+0xe/0x20
	  ffffffff8144fa2b  __netif_receive_skb+0x6b/0x80
	  ffffffff8144fc58  netif_receive_skb+0x78/0x80
	  ffffffff813d4075  xenvif_tx_action+0x1695/0x1810
	  ffffffff81084061  check_preempt_curr+0x41/0x90
	  ffffffff810841e1  ttwu_do_activate+0x51/0x60
	  ffffffff8150692e  _raw_spin_unlock_irqrestore+0x1e/0x30
	  ffffffff810873d4  try_to_wake_up+0x254/0x270
	  ffffffff81087402  default_wake_function+0x12/0x20
	  ffffffff81078406  autoremove_wake_function+0x16/0x40
	  ffffffff8107fe43  __wake_up_common+0x53/0x90
	  ffffffff8150692e  _raw_spin_unlock_irqrestore+0x1e/0x30
	  ffffffff810013aa  xen_hypercall_sched_op+0xa/0x20
	  ffffffff813d6118  xenvif_poll+0x48/0x80
	  ffffffff81450434  net_rx_action+0xc4/0x1f0
	  ffffffff8105b8fa  __do_softirq+0x10a/0x220
	  ffffffff810d3652  handle_edge_irq+0xf2/0x100
	  ffffffff8151031c  call_softirq+0x1c/0x30
	  ffffffff810140a0  do_softirq+0x50/0xa0
	  ffffffff8105b3bd  irq_exit+0x4d/0xa0
	  ffffffff813632b5  xen_evtchn_do_upcall+0x35/0x50
	  ffffffff8151037e  xen_do_hypervisor_callback+0x1e/0xa0

ip_fragment (defined in net/ipv4/ip_output.c, also called ip_do_fragment) was invoked when IPv4 tried to fragment a large datagram (packet) because it could not be sent in one piece. This indicates that the packet size exceeded 1500 bytes. In other words, jumbo frames were enabled.
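To make the arithmetic concrete, here is a minimal Python sketch of how IPv4 splits a datagram to fit a given MTU. It assumes a 20-byte IP header with no options, and the function name fragment_sizes is hypothetical; it illustrates the rule, it is not the kernel code.

```python
def fragment_sizes(total_len, mtu, ip_header=20):
    """Return the payload size of each IPv4 fragment for a datagram.

    total_len is the full datagram length (header + payload) in bytes.
    Assumes a fixed IP header without options (illustration only).
    """
    payload = total_len - ip_header
    if total_len <= mtu:
        return [payload]  # fits in one piece, no fragmentation needed
    # Every fragment except the last must carry a multiple of 8 bytes,
    # because the fragment offset field counts in 8-byte units.
    per_fragment = (mtu - ip_header) // 8 * 8
    sizes = []
    while payload > 0:
        chunk = min(per_fragment, payload)
        sizes.append(chunk)
        payload -= chunk
    return sizes


# A 9000-byte datagram leaving through an interface with a 1500-byte MTU
print(fragment_sizes(9000, 1500))
```

With a 1500-byte MTU every full fragment carries 1480 bytes of payload, so a single jumbo-sized datagram turns into seven fragments.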

ip_do_fragment then called skb_copy_bits to copy data from the skb (socket buffer) into a kernel buffer. During that copy, memcpy touched an invalid address and faulted. The kernel mm code entered do_page_fault to handle the page fault (determine the address and the problem, then pass it off to the appropriate routine) BUT unfortunately failed.

Based on bad_area_nosemaphore and __bad_area_nosemaphore (defined in arch/x86/mm/fault.c), the kernel seemed to be in an interrupt with no user context (or running in a region with page faults disabled), so the page fault could not be handled.

Looking deeper into no_context (defined in arch/x86/mm/fault.c), it seemed that the kernel tried to access a bad page, which triggered oops_begin and oops_end (defined in arch/x86/kernel/dumpstack.c), and then do_exit (kernel/exit.c) was called.

In dom0.log we saw a similar call trace and more information about the Oops.

If you look into kernel/exit.c, you should see that BUG() was called. The kernel was able neither to handle the paging request error nor to recover; finally the running kernel gave up and panicked ;-D

[2637695.833005]  ALERT: BUG: unable to handle kernel paging request at ffff8801705505c0
[2637695.833116]  ALERT: IP: [<ffffffff812d9656>] memcpy+0x6/0x110
[2637695.833183]   WARN: PGD 1a0d067 PUD 187fc2067 PMD 187e3f067 PTE 20
[2637695.833295]   WARN: Oops: 0000 [#1] SMP 
[2637695.833337]   WARN: Modules linked in: tun nfsv3 nfs_acl nfs fscache ebtable_nat arptable_filter arp_tables xt_set ip_set_hash_net ip_set nfnetlink ebt_arp ebt_ip ebtable_filter ebtables xt_physdev bnx2i(O) cnic(O) ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi dm_round_robin ses enclosure uio bonding lockd sunrpc 8021q mrp garp bridge stp llc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_tcpudp xt_conntrack nf_conntrack ipv6 iptable_filter ip_tables x_tables dm_multipath nls_utf8 isofs video backlight sbs sbshc hed acpi_ipmi ipmi_msghandler nvram hid_generic usbhid hid sg fnic(O) libfcoe libfc scsi_transport_fc scsi_tgt enic(O) wmi tpm_tis tpm tpm_bios lpc_ich mfd_core ehci_pci microcode crc32_pclmul scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc scsi_dh_alua
[2637695.834354]   WARN:  scsi_dh dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod shpchp megaraid_sas(O) sd_mod scsi_mod uhci_hcd ohci_hcd ehci_hcd [last unloaded: scsi_transport_iscsi]
[2637695.834552]   WARN: CPU: 4 PID: 0 Comm: swapper/4 Tainted: G           O 3.10.0+2 #1
[2637695.834625]   WARN: Hardware name: Cisco Systems Inc UCSC-C220-M3S/UCSC-C220-M3S, BIOS C220M3.2.0.4b.0.042820150826 04/28/2015
[2637695.834745]   WARN: task: ffff880178ebc530 ti: ffff880178ec2000 task.ti: ffff880178ec2000
[2637695.834816]   WARN: RIP: e030:[<ffffffff812d9656>]  [<ffffffff812d9656>] memcpy+0x6/0x110
[2637695.834915]   WARN: RSP: e02b:ffff880182c834f8  EFLAGS: 00010286
[2637695.834968]   WARN: RAX: ffff88013a62c024 RBX: ffff88014785eb10 RCX: 00000000000005c8
[2637695.835039]   WARN: RDX: 00000000000005c8 RSI: ffff8801705505c0 RDI: ffff88013a62c024
[2637695.835109]   WARN: RBP: ffff880182c83570 R08: ffff880182c96dc0 R09: ffff880119d4df00
[2637695.835190]   WARN: R10: ffffffff81a17260 R11: 000000002c1a1fac R12: 00000000000005dc
[2637695.835258]   WARN: R13: 00000000000005dc R14: 00000000000005c8 R15: 00000000000005c8
[2637695.835329]   WARN: FS:  0000000000000000(0000) GS:ffff880182c80000(0000) knlGS:ffff880182c80000
[2637695.835406]   WARN: CS:  e033 DS: 002b ES: 002b CR0: 000000008005003b
[2637695.835480]   WARN: CR2: ffff8801705505c0 CR3: 0000000129589000 CR4: 0000000000002660
[2637695.835580]   WARN: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[2637695.835656]   WARN: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[2637695.835731]   WARN: Stack:
[2637695.835756]   WARN:  ffffffff8143f593 0000000000000020 ffff880119d4df00 0000000000000050
[2637695.835852]   WARN:  ffff88013a62c024 ffff880119d4d300 00000ba48143e5ba 00000000000005c8
[2637695.835944]   WARN:  ffff880178ec2000 0000000000000002 ffff880119d4df00 0000000000000000
[2637695.836023]   WARN: Call Trace:
[2637695.836050]   WARN:  <IRQ> 
[2637695.836071]   WARN: 
[2637695.836094]   WARN:  [<ffffffff8143f593>] ? skb_copy_bits+0x143/0x240
[2637695.836140]   WARN:  [<ffffffff81486461>] ip_fragment+0x621/0x760
[2637695.836199]   WARN:  [<ffffffffa02d4410>] ? br_forward_finish+0x60/0x60 [bridge]
[2637695.836268]   WARN:  [<ffffffffa02dbaa7>] br_nf_dev_queue_xmit+0x77/0xa0 [bridge]
[2637695.836336]   WARN:  [<ffffffffa02dca79>] br_nf_post_routing+0x279/0x2c0 [bridge]
[2637695.836420]   WARN:  [<ffffffff8147a13a>] nf_iterate+0x5a/0xa0
[2637695.836514]   WARN:  [<ffffffffa02d4410>] ? br_forward_finish+0x60/0x60 [bridge]
[2637695.836621]   WARN:  [<ffffffff8147a57c>] nf_hook_slow+0x7c/0x130
[2637695.836697]   WARN:  [<ffffffffa02d4410>] ? br_forward_finish+0x60/0x60 [bridge]
[2637695.836772]   WARN:  [<ffffffffa02d43f4>] br_forward_finish+0x44/0x60 [bridge]
[2637695.836866]   WARN:  [<ffffffffa02dbd88>] br_nf_forward_finish+0x138/0x150 [bridge]
[2637695.836936]   WARN:  [<ffffffffa02dcf99>] br_nf_forward_ip+0x2e9/0x330 [bridge]
[2637695.837000]   WARN:  [<ffffffff8147a13a>] nf_iterate+0x5a/0xa0
[2637695.837056]   WARN:  [<ffffffffa02addac>] ? udp_packet+0x8c/0xa0 [nf_conntrack]
[2637695.837123]   WARN:  [<ffffffffa02d43b0>] ? deliver_clone+0x60/0x60 [bridge]
[2637695.837185]   WARN:  [<ffffffff8147a57c>] nf_hook_slow+0x7c/0x130
[2637695.837240]   WARN:  [<ffffffffa02d43b0>] ? deliver_clone+0x60/0x60 [bridge]
[2637695.837305]   WARN:  [<ffffffffa02d4581>] __br_forward+0xa1/0xe0 [bridge]
[2637695.837366]   WARN:  [<ffffffffa02d3d1b>] ? br_fdb_update+0x17b/0x290 [bridge]
[2637695.837450]   WARN:  [<ffffffffa02d4635>] br_forward+0x75/0xa0 [bridge]
[2637695.837517]   WARN:  [<ffffffffa02d5e06>] br_handle_frame_finish+0x216/0x310 [bridge]
[2637695.837591]   WARN:  [<ffffffff8147a57c>] ? nf_hook_slow+0x7c/0x130
[2637695.837667]   WARN:  [<ffffffffa02dc0bc>] br_nf_pre_routing_finish+0x31c/0x330 [bridge]
[2637695.837776]   WARN:  [<ffffffffa02dc7ca>] br_nf_pre_routing+0x5ca/0x600 [bridge]
[2637695.837848]   WARN:  [<ffffffff8147a13a>] nf_iterate+0x5a/0xa0
[2637695.837904]   WARN:  [<ffffffff813640f7>] ? xen_send_IPI_one+0x37/0x40
[2637695.837968]   WARN:  [<ffffffffa02d5bf0>] ? br_handle_frame+0x260/0x260 [bridge]
[2637695.838038]   WARN:  [<ffffffff8147a57c>] nf_hook_slow+0x7c/0x130
[2637695.838110]   WARN:  [<ffffffffa02d5bf0>] ? br_handle_frame+0x260/0x260 [bridge]
[2637695.838209]   WARN:  [<ffffffffa02d5bad>] br_handle_frame+0x21d/0x260 [bridge]
[2637695.838294]   WARN:  [<ffffffffa02d5990>] ? br_handle_local_finish+0x60/0x60 [bridge]
[2637695.838380]   WARN:  [<ffffffff8144f7ec>] __netif_receive_skb_core+0x56c/0x740
[2637695.838479]   WARN:  [<ffffffff810d00e7>] ? irq_to_desc+0x17/0x20
[2637695.838538]   WARN:  [<ffffffff8136343e>] ? info_for_irq+0xe/0x20
[2637695.838596]   WARN:  [<ffffffff8144fa2b>] __netif_receive_skb+0x6b/0x80
[2637695.838658]   WARN:  [<ffffffff8144fc58>] netif_receive_skb+0x78/0x80
[2637695.838720]   WARN:  [<ffffffff813d4075>] xenvif_tx_action+0x1695/0x1810
[2637695.838797]   WARN:  [<ffffffff81084061>] ? check_preempt_curr+0x41/0x90
[2637695.838856]   WARN:  [<ffffffff810841e1>] ? ttwu_do_activate+0x51/0x60
[2637695.838916]   WARN:  [<ffffffff8150692e>] ? _raw_spin_unlock_irqrestore+0x1e/0x30
[2637695.838982]   WARN:  [<ffffffff810873d4>] ? try_to_wake_up+0x254/0x270
[2637695.839040]   WARN:  [<ffffffff81087402>] ? default_wake_function+0x12/0x20
[2637695.839103]   WARN:  [<ffffffff81078406>] ? autoremove_wake_function+0x16/0x40
[2637695.839168]   WARN:  [<ffffffff8107fe43>] ? __wake_up_common+0x53/0x90
[2637695.839226]   WARN:  [<ffffffff8150692e>] ? _raw_spin_unlock_irqrestore+0x1e/0x30
[2637695.839293]   WARN:  [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
[2637695.839355]   WARN:  [<ffffffff813d6118>] xenvif_poll+0x48/0x80
[2637695.839407]   WARN:  [<ffffffff81450434>] net_rx_action+0xc4/0x1f0
[2637695.839489]   WARN:  [<ffffffff8105b8fa>] __do_softirq+0x10a/0x220
[2637695.839548]   WARN:  [<ffffffff810d3652>] ? handle_edge_irq+0xf2/0x100
[2637695.839611]   WARN:  [<ffffffff8151031c>] call_softirq+0x1c/0x30
[2637695.839669]   WARN:  [<ffffffff810140a0>] do_softirq+0x50/0xa0
[2637695.839753]   WARN:  [<ffffffff8105b3bd>] irq_exit+0x4d/0xa0
[2637695.839804]   WARN:  [<ffffffff813632b5>] xen_evtchn_do_upcall+0x35/0x50
[2637695.839866]   WARN:  [<ffffffff8151037e>] xen_do_hypervisor_callback+0x1e/0xa0
[2637695.839931]   WARN:  <EOI> 
[2637695.839966]   WARN: 
[2637695.839988]   WARN:  [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
[2637695.840054]   WARN:  [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
[2637695.840120]   WARN:  [<ffffffff81009b80>] ? xen_safe_halt+0x10/0x20
[2637695.840178]   WARN:  [<ffffffff8101a735>] ? default_idle+0x65/0xc0
[2637695.840250]   WARN:  [<ffffffff8101a418>] ? arch_cpu_idle+0x18/0x30
[2637695.840327]   WARN:  [<ffffffff8109e626>] ? cpu_startup_entry+0x1a6/0x210
[2637695.840397]   WARN:  [<ffffffff814f75d3>] ? cpu_bringup_and_idle+0x13/0x20
[2637695.840484]   WARN: Code: 25 c7 00 00 00 83 f8 05 0f 94 c0 0f b6 c0 eb 07 0f 1f 44 00 00 31 c0 48 8b 1c 24 4c 8b 64 24 08 c9 c3 90 90 90 48 89 f8 48 89 d1 <f3> a4 c3 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 20 4c 8b 06 4c 8b 
[2637695.840855]  ALERT: RIP  [<ffffffff812d9656>] memcpy+0x6/0x110
[2637695.840919]   WARN:  RSP <ffff880182c834f8>
[2637695.840959]   WARN: CR2: ffff8801705505c0
[2637695.841365]   WARN: ---[ end trace 3c032fd26d64546f ]---
[2637695.970156]  EMERG: Kernel panic - not syncing: Fatal exception in interrupt
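
The register values in the Oops can be decoded with a few lines of Python. This is only a reading aid. It assumes the usual x86-64 convention in which memcpy receives its source pointer in RSI and its length in RDX; the values are copied from the log above, and what R12 represents is a guess, since the log does not say.

```python
# Register values copied from the dom0 Oops above
regs = {
    "RDX": 0x5c8,               # memcpy length argument (byte count)
    "R12": 0x5dc,               # meaning not stated in the log
    "RSI": 0xffff8801705505c0,  # memcpy source pointer
    "CR2": 0xffff8801705505c0,  # address that triggered the page fault
}

copy_len = regs["RDX"]
mtu_like = regs["R12"]

print(copy_len)                    # 1480
print(mtu_like)                    # 1500
print(mtu_like - copy_len)         # 20, the size of an IPv4 header without options
print(regs["RSI"] == regs["CR2"])  # True: the fault hit the memcpy source address
```

A 1480-byte copy is what one would expect for a full fragment on a 1500-byte MTU path, and CR2 matching RSI suggests that the source buffer, not the destination, was the bad address.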

Conclusion

The conclusion of the investigation is that enic does NOT support jumbo frames. DO NOT use them for storage networks on top of Cisco VIC NICs in XenServer.

I ended up changing the MTU for the storage network back to 1500 to fix the problem. The easy way is to remove the storage IP, change the storage network MTU (if you don’t remove the IP, the MTU field is greyed out), and reconfigure the storage IP afterwards on each host in the pool. Alternatively, use the xe command line (xe network-param-set uuid= MTU=1500) to change the MTU for the network, and then unplug and replug the corresponding underlying PIFs. That is obviously a more complicated process; your choice.
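
The command-line route can be sketched as follows. This Python helper only builds the xe commands and does not run them. The name build_mtu_reset_commands and the placeholder UUIDs are hypothetical, and the sketch assumes that xe pif-unplug and xe pif-plug are the commands used to replug each PIF.

```python
def build_mtu_reset_commands(network_uuid, pif_uuids, mtu=1500):
    """Return the xe invocations (as argument lists) to reset a network's MTU.

    pif_uuids are the PIFs attached to the network. Nothing is executed
    here: review the commands before running them on a host.
    """
    commands = [
        ["xe", "network-param-set", f"uuid={network_uuid}", f"MTU={mtu}"],
    ]
    # Each underlying PIF has to be unplugged and plugged again afterwards.
    for pif in pif_uuids:
        commands.append(["xe", "pif-unplug", f"uuid={pif}"])
        commands.append(["xe", "pif-plug", f"uuid={pif}"])
    return commands


# Placeholder UUIDs for illustration only
for cmd in build_mtu_reset_commands("<network-uuid>", ["<pif-uuid-1>", "<pif-uuid-2>"]):
    print(" ".join(cmd))
```

Printing the commands first rather than executing them keeps the step reviewable, which matters because unplugging a storage PIF interrupts storage traffic on that host.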

IMPORTANT: Regarding the Broadcom NetXtreme II driver, bnx2x: you may know that jumbo frames could be enabled for bnx2x with GRO on back in XenServer 6.2 SP1, as per [CTX200270](http://support.citrix.com/article/CTX200270) (Yes, I wrote it...). That is NOT the case any more. This has changed, probably due to the fact that the bnx2x driver keeps evolving.

The following Linux NIC drivers are known to support jumbo frames (some with conditions):

  • igb

  • ixgbe

  • e1000 (some cards may be affected due to errata)

  • e1000e (cards older than 82571 are affected)

  • bnx2 (not bnx2x)

  • be2net

  • bna

  • cxgb4
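
To check a host against the list above, the driver and MTU of each interface can be read from sysfs. The following Python sketch assumes the standard Linux layout under /sys/class/net (an mtu file and a device/driver symlink per interface); the function name interface_report is hypothetical, and the conditions noted in the list still need checking by hand.

```python
import os

# Drivers listed above as known to support jumbo frames (some with conditions)
KNOWN_JUMBO_DRIVERS = {"igb", "ixgbe", "e1000", "e1000e", "bnx2", "be2net", "bna", "cxgb4"}


def interface_report(iface, sysfs_root="/sys/class/net"):
    """Return (driver, mtu, driver_is_listed) for a network interface.

    Virtual interfaces such as bridges and bonds have no device/driver
    link, so driver is None for them.
    """
    base = os.path.join(sysfs_root, iface)
    with open(os.path.join(base, "mtu")) as f:
        mtu = int(f.read().strip())
    driver_link = os.path.join(base, "device", "driver")
    driver = None
    if os.path.islink(driver_link):
        # The symlink points at the driver directory; its last component is the name.
        driver = os.path.basename(os.readlink(driver_link))
    return driver, mtu, driver in KNOWN_JUMBO_DRIVERS


# Print a report for every interface when run on a Linux host
if os.path.isdir("/sys/class/net"):
    for iface in sorted(os.listdir("/sys/class/net")):
        if os.path.isfile(os.path.join("/sys/class/net", iface, "mtu")):
            print(iface, interface_report(iface))
```

An interface that reports an MTU above 1500 together with a driver that is not in the list (enic, for example) is the combination to look at first.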
