Kernel not compatible with Zookeeper version

Morning,

It’s important to share this situation with you. This morning I came to the office to find that a cluster that had been upgraded and restarted had an issue with its Zookeeper instances.

The symptoms were clear: the instances would not start completely. But why?

After a little investigation, I checked /var/log/syslog (/var/log/zookeeper did not contain any information at all) and saw that the kernel was reporting a bad page table for the JVM process.

The Java version is:

java version "1.8.0_111"
Java(TM) SE Runtime Environment (build 1.8.0_111-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.111-b14, mixed mode)

The log showed the following lines:

Aug 16 07:16:04 kafka0 kernel: [  742.349010] init: zookeeper main process ended, respawning
Aug 16 07:16:04 kafka0 kernel: [  742.925427] java: Corrupted page table at address 7f6a81e5d100
Aug 16 07:16:05 kafka0 kernel: [  742.926589] PGD 80000000373f4067 PUD b7852067 PMD b1c08067 PTE 80003ffffe17c225
Aug 16 07:16:05 kafka0 kernel: [  742.928011] Bad pagetable: 000d [#1643] SMP 
Aug 16 07:16:05 kafka0 kernel: [  742.928011] Modules linked in: dm_crypt serio_raw isofs crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd psmouse floppy
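
To see how often this happens and which processes are hit, the syslog can be scanned for these kernel messages. The following is a minimal Python sketch, not part of the original investigation; it assumes the line format shown above, and the function names (find_corrupted_page_tables, count_by_process, scan_syslog) are made up for illustration.

```python
import re
from collections import Counter

# Matches kernel lines such as:
#   "... kernel: [  742.925427] java: Corrupted page table at address 7f6a81e5d100"
# The format is assumed from the syslog excerpt above.
CORRUPTED = re.compile(
    r"kernel: \[\s*(?P<uptime>\d+\.\d+)\] (?P<proc>\S+): "
    r"Corrupted page table at address (?P<addr>[0-9a-f]+)"
)


def find_corrupted_page_tables(lines):
    """Return (uptime_seconds, process, address) for each matching line."""
    events = []
    for line in lines:
        m = CORRUPTED.search(line)
        if m:
            events.append((float(m.group("uptime")), m.group("proc"), m.group("addr")))
    return events


def count_by_process(events):
    """Count reports per process name (for example 'java')."""
    return Counter(proc for _, proc, _ in events)


def scan_syslog(path="/var/log/syslog"):
    """Read a syslog file and return the per-process report counts."""
    with open(path, errors="replace") as f:
        return count_by_process(find_corrupted_page_tables(f))
```

Calling scan_syslog() on the affected host returns a counter keyed by process name, which makes it easy to check whether anything other than java is reported.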

Why would the JVM hit a memory error? The main reason appears to be an incompatibility between the JVM and the running kernel version.

Let’s take a look at the GRUB config file.

It looks like we are booting with:

menuentry 'Ubuntu' --class ubuntu --class gnu-linux --class gnu --class os $menuentry_id_option 'gnulinux-simple-baf292e5-0bb6-4e58-8a71-5b912e0f09b6' {
	recordfail
	load_video
	gfxmode $linux_gfx_mode
	insmod gzio
	insmod part_msdos
	insmod ext2
	if [ x$feature_platform_search_hint = xy ]; then
	  search --no-floppy --fs-uuid --set=root  baf292e5-0bb6-4e58-8a71-5b912e0f09b6
	else
	  search --no-floppy --fs-uuid --set=root baf292e5-0bb6-4e58-8a71-5b912e0f09b6
	fi
	linux	/boot/vmlinuz-3.13.0-155-generic root=UUID=baf292e5-0bb6-4e58-8a71-5b912e0f09b6 ro  console=tty1 console=ttyS0
	initrd	/boot/initrd.img-3.13.0-155-generic
}

There was also an older kernel image available: 3.13.0-153.

The short-term fix is to update the grub.cfg file to boot the older kernel version and reboot the server.
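
Before editing anything, it helps to confirm which kernel versions grub.cfg actually references and which one is the candidate to fall back to. Here is a minimal Python sketch under a few assumptions: it only reads the file and changes nothing, it expects "linux" lines of the form shown in the menuentry above, and the helper names (kernel_versions, version_key, previous_kernel) are hypothetical.

```python
import re

# Matches the kernel image on "linux" lines of grub.cfg, e.g.
#   linux /boot/vmlinuz-3.13.0-155-generic root=UUID=... ro
LINUX_LINE = re.compile(r"^\s*linux\s+\S*vmlinuz-(\S+)", re.MULTILINE)


def kernel_versions(grub_cfg_text):
    """Return the distinct kernel versions referenced in grub.cfg, in file order."""
    seen = []
    for version in LINUX_LINE.findall(grub_cfg_text):
        if version not in seen:
            seen.append(version)
    return seen


def version_key(version):
    """Sort key from the numeric parts: '3.13.0-155-generic' -> (3, 13, 0, 155)."""
    return tuple(int(n) for n in re.findall(r"\d+", version))


def previous_kernel(versions, running):
    """Return the newest version older than `running`, or None if there is none."""
    older = [v for v in versions if version_key(v) < version_key(running)]
    return max(older, key=version_key) if older else None
```

For example, previous_kernel(kernel_versions(open("/boot/grub/grub.cfg").read()), "3.13.0-155-generic") should point at the 3.13.0-153 image if its menu entry is still present. Keep in mind that on Ubuntu grub.cfg is regenerated by update-grub, so a manual edit can be overwritten the next time a kernel package is installed.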

A proper fix is still in progress. I will post it as soon as I have it.

P.S.: I forgot to mention the Zookeeper version:

Zookeeper version: 3.4.5--1, built on 06/10/2013 17:26 GMT

P.S. 2: It seems that the issue affects Java processes in general, not only Zookeeper.

Cheers