
#81 — zpool scrub scrambles data with 0.6.0-1 on Ubuntu 10.04 Lucid

State Confirmed
Version: 0.6.0
Area Functionality
Issue type Bug
Severity Critical
Submitted by drwatson
Submitted on Jul 28, 2010
Responsible Seth Heeren
Target release:
Last modified on Jul 30, 2010 by drwatson
Hi all,

thank you very much for your efforts in developing zfs-fuse for linux.

I've just set up a new 64-bit Ubuntu 10.04 Lucid system with 16GB DDR3 RAM and a Core i5-650 processor.
I moved the hard disks over from my old Ubuntu 9.04 system, which ran zfs-fuse 0.6.0+critical20100301-3, and imported the mirrored pool on the new system without upgrading the pool.
The pools reside on LUKS-encrypted devices.
Reads and writes work on the old pool as well as the new one.

That is, until I ran a scrub. The more I scrub, the more errors it finds.
My first suspicion was the hard disks, but they don't report any actual read/write errors.
Then I created a pool on a brand-new 2TB disk.
I dd-ed from /dev/zero into a large 20GB file on that pool while teeing the stream through md5. The checksum returned was always identical, until I ran a scrub on that pool as well. The more I run scrub, the more checksum errors it finds, and trying to md5 the file now gives me I/O errors.
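
For anyone who wants to repeat the write-and-hash test described above, here is a minimal sketch. The function name `write_and_verify` and the example path are placeholders, not part of the original setup, and `bs=1M` assumes GNU dd. Note that the read-back may be served from a cache rather than from disk, so a matching checksum right after writing does not prove the on-disk copy is intact.

```shell
#!/bin/sh
# Write COUNT MiB of zeros to FILE while hashing the stream on its way to
# disk, then hash the file again on read-back and compare the two digests.
write_and_verify() {
    file=$1
    count_mib=$2
    # tee writes the file; md5sum sees exactly the same bytes.
    written=$(dd if=/dev/zero bs=1M count="$count_mib" 2>/dev/null |
        tee "$file" | md5sum | cut -d' ' -f1)
    # Hash what the file system hands back.
    readback=$(md5sum < "$file" | cut -d' ' -f1)
    if [ "$written" = "$readback" ]; then
        echo "OK $written"
    else
        echo "MISMATCH written=$written readback=$readback"
        return 1
    fi
}
```

Example use (hypothetical path): `write_and_verify /zfs_temp01/test.zero 20480` for a 20GB file. Running it once before and once after a scrub shows whether the scrub changes the result.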

root@host:~# zpool status -v zfs_temp01
  pool: zfs_temp01
 state: ONLINE
status: One or more devices has experienced an error resulting in data
    corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
    entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: scrub completed after 0h14m with 10 errors on Wed Jul 28 12:30:50 2010
config:

    NAME STATE READ WRITE CKSUM
    zfs_temp01 ONLINE 0 0 11
      mapper/lv_host_TEMP01_01_crypt ONLINE 0 0 7
      mapper/lv_host_TEMP01_02_crypt ONLINE 0 0 11
      mapper/lv_host_TEMP01_03_crypt ONLINE 0 0 22
      mapper/lv_host_TEMP01_04_crypt ONLINE 0 0 9
      mapper/lv_host_TEMP01_05_crypt ONLINE 0 0 11

errors: Permanent errors have been detected in the following files:

        /zfs_temp01/loeschmich/leer.zero
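
To see whether the counters really grow from one scrub to the next, the CKSUM column can be pulled out of status output like the above. This is only a sketch: the function name `cksum_errors` is made up, and it assumes the five-column `NAME STATE READ WRITE CKSUM` layout shown in this report.

```shell
#!/bin/sh
# Read `zpool status` output on stdin and print "<name> <count>" for every
# row of the config table whose CKSUM counter is greater than zero.
cksum_errors() {
    awk '
        # The header row marks the start of the config table.
        $1 == "NAME" && $NF == "CKSUM" { in_config = 1; next }
        # The first blank line after the header ends the table.
        in_config && NF == 0 { in_config = 0 }
        # Column 5 is CKSUM in the layout assumed above.
        in_config && NF >= 5 && $5 + 0 > 0 { print $1, $5 }
    '
}
```

Example use: `zpool status zfs_temp01 | cksum_errors`, run after each scrub and compared with the previous run.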

So far I have run a memory check. I have tested whether the aesni_intel module, which is used by default, produces errors: I created an encrypted 15GB file in RAM and went through reading, writing, reading and checksumming it, with no errors. I tried the same on the same logical volume that I had used as a zfs test pool before: no errors.
The only thing that remains is that there's a bug in the zpool scrub function.

Since you guys already published 0.6.9 I don't know if this is of any concern to you. But if this bug is real, it might concern Ubuntu Lucid users. I'd be willing to give you any debug output you might want, but I'd also like the data from my pools a.s.a.p. given that it looks like Swiss cheese now, so I will upgrade to 0.6.9 asap and see if that helps.
Steps to reproduce:
Install Ubuntu 10.04 Lucid on a Core i5-650 system, including the default zfs-fuse package. Run zfs-fuse on an encrypted LVM volume. Read and write large amounts of data without problems. Then run a zpool scrub and watch the data get scrambled.
Added by Seth Heeren on Jul 28, 2010 09:16 AM
Issue state: unconfirmed → open
Responsible manager: (UNASSIGNED) → sgheeren
Thanks for the report. This is very creepy behaviour.

Could you give me some instructions on how to create luks encrypted devices, so I can (try to) reproduce this?

Meanwhile, does 'grep -i zfs /var/log/syslog' contain anything alarming? For starters, I expect that these cryptvols cannot be sync-ed, so expect a fair number of lines like 'WARNING: Failed to flush write cache on device '/tmp/tank_blk/za2'. Data on pool 'tank' may be lost if power fails. No further warnings will be given.' for this reason. Anything else would be most interesting.
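
One way to look for that "anything else" is to drop the expected flush warning from the grep result. This is a rough sketch; the function name `unexpected_zfs_lines` is invented for illustration, and the match string is taken from the warning text quoted above.

```shell
#!/bin/sh
# Read syslog text on stdin and print only the lines mentioning zfs that
# are NOT the expected "Failed to flush write cache" warning.
unexpected_zfs_lines() {
    grep -i 'zfs' | grep -v 'Failed to flush write cache'
}
```

Example use: `unexpected_zfs_lines < /var/log/syslog`. The function returns a non-zero status when nothing unexpected is found, because the final grep then matches no lines.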
Added by Seth Heeren on Jul 28, 2010 10:14 AM
I'm sorry, I cannot reproduce this. The only thing I left out of the picture for now is lvm.
It makes no difference for me whether I use 0.6.0 (0.6.0+critical20100301-2 a.k.a. git 3de9c7c4f) or 0.6.9 (testing branch for 0.7.0).

Even leaving the tests running for a long time while simultaneously rewriting the data file(s) did not (ex)pose any problems. [These extended tests were done on 0.6.0 as mentioned above.]

Do you reckon lvm could be misconfigured/misbehaving?

Here is what I did:

0. created two sparse files in /tmp/

    ls /tmp/tank_blk/* -ltr
    -rw-r--r-- 1 root root 68719476736 2010-07-28 16:23 /tmp/tank_blk/za1
    -rw-r--r-- 1 root root 68719476736 2010-07-28 16:23 /tmp/tank_blk/za2

1. create loop devices from them
    losetup -f /tmp/tank_blk/za1
    losetup -f /tmp/tank_blk/za2

2. create luks containers in them
    cryptsetup luksFormat /dev/loop0
    cryptsetup luksFormat /dev/loop1

3. open them

    cryptsetup luksOpen /dev/loop0 crypt1
    cryptsetup luksOpen /dev/loop1 crypt2

4. create a mirrored pool

    zpool create -f luks1 mirror /dev/mapper/crypt[12]

5. write a 1 gb file

    dd if=/dev/zero bs=1M count=1024 of=/luks1/test.zeroes
    
6. stress test using
  
    while true; do sleep 10; zpool scrub luks1; done&
    watch zpool status -v

99. teardown
    
    zpool destroy luks1
    <reboot in the future>
Added by Seth Heeren on Jul 28, 2010 10:27 AM
Please post the results of 'lvs' and possibly 'vgdisplay -v',
so we can rule out any de-activated lvm snapshots/origins.
Added by drwatson on Jul 28, 2010 02:38 PM
Thanks for your response. I hope I'm not wasting your time with an error that affects only my installation.

Here is the syslog:

Jul 28 09:40:09 host zfs-fuse: put_nvlist: out of memory 5440 > 4096
Jul 28 09:52:59 host zfs-fuse: put_nvlist: out of memory 5440 > 4096
Jul 28 09:53:00 host zfs-fuse: put_nvlist: out of memory 1704 > 1696
Jul 28 09:53:29 host zfs-fuse: enabling fuse big_writes
Jul 28 09:53:29 host zfs-fuse: mount options: fsname=zfs_temp01/deleteme,allow_other,suid,dev,big_writes
Jul 28 10:20:39 host zfs-fuse: put_nvlist: out of memory 5440 > 4096
Jul 28 10:21:02 host zfs-fuse: put_nvlist: out of memory 5440 > 4096
Jul 28 10:21:02 host zfs-fuse: put_nvlist: out of memory 1704 > 1696
Jul 28 11:20:29 host zfs-fuse: caching mechanisms: ARC 1, block cache 1 page cache 1
Jul 28 11:20:29 host zfs-fuse: ARC caching: maximum ARC size: compiled-in default
Jul 28 11:20:29 host zfs-fuse: FUSE caching: attribute timeout 0.000000, entry timeout 0.000000
Jul 28 11:20:29 host zfs-fuse: ARC setup: min ARC size set to 16777216 bytes
Jul 28 11:20:29 host zfs-fuse: ARC setup: max ARC size set to 134217728 bytes
Jul 28 11:20:41 host zfs-fuse: put_nvlist: out of memory 5376 > 4096
Jul 28 11:20:46 host zfs-fuse: WARNING: Failed to flush write cache on device '/dev/mapper/lv_host_A_02_crypt'. Data on pool 'zfs_host_vms01' may be lost if power fails. No further warnings will be given.
Jul 28 11:20:46 host zfs-fuse: WARNING: Failed to flush write cache on device '/dev/mapper/lv_host_A_01_crypt'. Data on pool 'zfs_host_vms01' may be lost if power fails. No further warnings will be given.
Jul 28 11:20:47 host zfs-fuse: enabling fuse big_writes
Jul 28 11:20:47 host zfs-fuse: mount options: fsname=zfs_host_vms01,allow_other,suid,dev,big_writes
Jul 28 11:20:49 host zfs-fuse: enabling fuse big_writes
Jul 28 11:20:49 host zfs-fuse: mount options: fsname=zfs_host_vms01/banking,allow_other,suid,dev,big_writes
Jul 28 11:20:51 host zfs-fuse: enabling fuse big_writes
Jul 28 11:20:51 host zfs-fuse: mount options: fsname=zfs_host_vms01/u104base,allow_other,suid,dev,big_writes
Jul 28 11:20:52 host zfs-fuse: enabling fuse big_writes
Jul 28 11:20:52 host zfs-fuse: mount options: fsname=zfs_host_vms01/u904_zfs_testserver,allow_other,suid,dev,big_writes
Jul 28 11:20:55 host zfs-fuse: enabling fuse big_writes
Jul 28 11:20:55 host zfs-fuse: mount options: fsname=zfs_host_vms01/u904_zfs_testserver/datastorage,allow_other,suid,dev,big_writes
Jul 28 11:20:57 host zfs-fuse: enabling fuse big_writes
Jul 28 11:20:57 host zfs-fuse: mount options: fsname=zfs_host_vms01/ubuntu104,allow_other,suid,dev,big_writes
Jul 28 11:20:59 host zfs-fuse: enabling fuse big_writes
Jul 28 11:20:59 host zfs-fuse: mount options: fsname=zfs_temp01,allow_other,suid,dev,big_writes
Jul 28 11:21:00 host zfs-fuse: enabling fuse big_writes
Jul 28 11:21:00 host zfs-fuse: mount options: fsname=zfs_temp01/deleteme,allow_other,suid,dev,big_writes
Jul 28 11:21:09 host zfs-fuse: WARNING: Failed to flush write cache on device '/dev/mapper/lv_host_TEMP01_04_crypt'. Data on pool 'zfs_temp01' may be lost if power fails. No further warnings will be given.
Jul 28 11:21:09 host zfs-fuse: WARNING: Failed to flush write cache on device '/dev/mapper/lv_host_TEMP01_05_crypt'. Data on pool 'zfs_temp01' may be lost if power fails. No further warnings will be given.
Jul 28 11:21:09 host zfs-fuse: WARNING: Failed to flush write cache on device '/dev/mapper/lv_host_TEMP01_01_crypt'. Data on pool 'zfs_temp01' may be lost if power fails. No further warnings will be given.
Jul 28 11:21:09 host zfs-fuse: WARNING: Failed to flush write cache on device '/dev/mapper/lv_host_TEMP01_02_crypt'. Data on pool 'zfs_temp01' may be lost if power fails. No further warnings will be given.
Jul 28 11:21:09 host zfs-fuse: WARNING: Failed to flush write cache on device '/dev/mapper/lv_host_TEMP01_03_crypt'. Data on pool 'zfs_temp01' may be lost if power fails. No further warnings will be given.
Jul 28 11:29:06 host zfs-fuse: put_nvlist: out of memory 5440 > 4096
Jul 28 11:55:40 host zfs-fuse: WARNING: Failed to flush write cache on device '/dev/mapper/myramdisk'. Data on pool 'ramdisk' may be lost if power fails. No further warnings will be given.
Jul 28 11:55:42 host zfs-fuse: enabling fuse big_writes
Jul 28 11:55:42 host zfs-fuse: mount options: fsname=ramdisk,allow_other,suid,dev,big_writes
Jul 28 11:55:45 host zfs-fuse: put_nvlist: out of memory 6316 > 4096
Jul 28 12:03:42 host zfs-fuse: put_nvlist: out of memory 6316 > 4096
Jul 28 12:13:18 host zfs-fuse: caching mechanisms: ARC 1, block cache 1 page cache 1
Jul 28 12:13:18 host zfs-fuse: ARC caching: maximum ARC size: compiled-in default
Jul 28 12:13:18 host zfs-fuse: FUSE caching: attribute timeout 0.000000, entry timeout 0.000000
Jul 28 12:13:18 host zfs-fuse: ARC setup: min ARC size set to 16777216 bytes
Jul 28 12:13:18 host zfs-fuse: ARC setup: max ARC size set to 134217728 bytes
Jul 28 12:13:26 host zfs-fuse: put_nvlist: out of memory 5440 > 4096
Jul 28 12:13:27 host zfs-fuse: WARNING: Failed to flush write cache on device '/dev/mapper/lv_host_A_02_crypt'. Data on pool 'zfs_host_vms01' may be lost if power fails. No further warnings will be given.
Jul 28 12:13:27 host zfs-fuse: WARNING: Failed to flush write cache on device '/dev/mapper/lv_host_A_01_crypt'. Data on pool 'zfs_host_vms01' may be lost if power fails. No further warnings will be given.
Jul 28 12:13:28 host zfs-fuse: enabling fuse big_writes
Jul 28 12:13:28 host zfs-fuse: mount options: fsname=zfs_host_vms01,allow_other,suid,dev,big_writes
Jul 28 12:13:29 host zfs-fuse: enabling fuse big_writes
Jul 28 12:13:29 host zfs-fuse: mount options: fsname=zfs_host_vms01/banking,allow_other,suid,dev,big_writes
Jul 28 12:13:29 host zfs-fuse: enabling fuse big_writes
Jul 28 12:13:29 host zfs-fuse: mount options: fsname=zfs_host_vms01/u104base,allow_other,suid,dev,big_writes
Jul 28 12:13:30 host zfs-fuse: enabling fuse big_writes
Jul 28 12:13:30 host zfs-fuse: mount options: fsname=zfs_host_vms01/u904_zfs_testserver,allow_other,suid,dev,big_writes
Jul 28 12:13:31 host zfs-fuse: enabling fuse big_writes
Jul 28 12:13:31 host zfs-fuse: mount options: fsname=zfs_host_vms01/u904_zfs_testserver/datastorage,allow_other,suid,dev,big_writes
Jul 28 12:13:32 host zfs-fuse: enabling fuse big_writes
Jul 28 12:13:32 host zfs-fuse: mount options: fsname=zfs_host_vms01/ubuntu104,allow_other,suid,dev,big_writes
Jul 28 12:13:32 host zfs-fuse: enabling fuse big_writes
Jul 28 12:13:32 host zfs-fuse: mount options: fsname=zfs_temp01,allow_other,suid,dev,big_writes
Jul 28 12:13:34 host zfs-fuse: enabling fuse big_writes
Jul 28 12:13:34 host zfs-fuse: mount options: fsname=zfs_temp01/deleteme,allow_other,suid,dev,big_writes
Jul 28 12:13:56 host zfs-fuse: WARNING: Failed to flush write cache on device '/dev/mapper/lv_host_TEMP01_04_crypt'. Data on pool 'zfs_temp01' may be lost if power fails. No further warnings will be given.
Jul 28 12:13:56 host zfs-fuse: WARNING: Failed to flush write cache on device '/dev/mapper/lv_host_TEMP01_05_crypt'. Data on pool 'zfs_temp01' may be lost if power fails. No further warnings will be given.
Jul 28 12:13:56 host zfs-fuse: WARNING: Failed to flush write cache on device '/dev/mapper/lv_host_TEMP01_01_crypt'. Data on pool 'zfs_temp01' may be lost if power fails. No further warnings will be given.
Jul 28 12:13:56 host zfs-fuse: WARNING: Failed to flush write cache on device '/dev/mapper/lv_host_TEMP01_03_crypt'. Data on pool 'zfs_temp01' may be lost if power fails. No further warnings will be given.
Jul 28 12:13:56 host zfs-fuse: WARNING: Failed to flush write cache on device '/dev/mapper/lv_host_TEMP01_02_crypt'. Data on pool 'zfs_temp01' may be lost if power fails. No further warnings will be given.
Jul 28 12:15:33 host zfs-fuse: put_nvlist: out of memory 5440 > 4096
Jul 28 14:25:12 host zfs-fuse: initial max_map_count 65530
Jul 28 14:25:12 host zfs-fuse: ARC caching: maximum ARC size: 100 MiB
Jul 28 14:25:12 host zfs-fuse: ARC setup: min ARC size set to 16777216 bytes
Jul 28 14:25:12 host zfs-fuse: ARC setup: max ARC size set to 104857600 bytes
Jul 28 14:25:17 host zfs-fuse: WARNING: Failed to flush write cache on device '/dev/mapper/lv_host_A_02_crypt'. Data on pool 'zfs_host_vms01' may be lost if power fails. No further warnings will be given.
Jul 28 14:25:17 host zfs-fuse: WARNING: Failed to flush write cache on device '/dev/mapper/lv_host_A_01_crypt'. Data on pool 'zfs_host_vms01' may be lost if power fails. No further warnings will be given.
Jul 28 14:25:53 host zfs-fuse: WARNING: Failed to flush write cache on device '/dev/mapper/lv_host_TEMP01_05_crypt'. Data on pool 'zfs_temp01' may be lost if power fails. No further warnings will be given.
Jul 28 14:25:53 host zfs-fuse: WARNING: Failed to flush write cache on device '/dev/mapper/lv_host_TEMP01_01_crypt'. Data on pool 'zfs_temp01' may be lost if power fails. No further warnings will be given.
Jul 28 14:25:53 host zfs-fuse: WARNING: Failed to flush write cache on device '/dev/mapper/lv_host_TEMP01_02_crypt'. Data on pool 'zfs_temp01' may be lost if power fails. No further warnings will be given.
Jul 28 14:25:53 host zfs-fuse: WARNING: Failed to flush write cache on device '/dev/mapper/lv_host_TEMP01_03_crypt'. Data on pool 'zfs_temp01' may be lost if power fails. No further warnings will be given.
Jul 28 14:25:53 host zfs-fuse: WARNING: Failed to flush write cache on device '/dev/mapper/lv_host_TEMP01_04_crypt'. Data on pool 'zfs_temp01' may be lost if power fails. No further warnings will be given.
Jul 28 14:39:36 host zfs-fuse: WARNING: Failed to flush write cache on device '/dev/mapper/lv_host_TEMP01_01_crypt'. Data on pool 'zfs_temp02' may be lost if power fails. No further warnings will be given.
Jul 28 14:39:36 host zfs-fuse: !created version 23 pool zfs_temp02 using 23
Jul 28 14:46:35 host zfs-fuse: ERROR: buffer modified while frozen!
Jul 28 14:46:54 host zfs-fuse: initial max_map_count 65530
Jul 28 14:46:54 host zfs-fuse: ARC caching: maximum ARC size: 100 MiB
Jul 28 14:46:54 host zfs-fuse: WARNING: /var/run/zfs-fuse.pid already exists; aborting.

Between 12:00 and 14:00 I upgraded to 0.6.9 and created another test pool.
When I then tried to write to it, zfs-fuse crashed.

In my case it doesn't matter whether the pool is on an LVM volume (newly created zfs pools) or on a regular partition (migrated older-version zfs pools); both use LUKS encryption.

root@host:~# lvs
  LV VG Attr LSize Origin Snap% Move Log Copy% Convert
  lv_chome vg_croot -wi-ao 16.76g
  lv_croot vg_croot -wi-ao 11.18g
  lv_cswap vg_croot -wi-ao 18.62g
  lv_host_A_01 vg_host_A -wi-ao 100.00g
  lv_host_A_02 vg_host_A -wi-ao 100.00g
  lv_host_A_03 vg_host_A -wi-ao 100.00g
  lv_host_A_04 vg_host_A -wi-ao 100.00g
  lv_host_TEMP01_01 vg_host_TEMP01 -wi-ao 200.00g
  lv_host_TEMP01_02 vg_host_TEMP01 -wi-ao 200.00g
  lv_host_TEMP01_03 vg_host_TEMP01 -wi-ao 200.00g
  lv_host_TEMP01_04 vg_host_TEMP01 -wi-ao 200.00g
  lv_host_TEMP01_05 vg_host_TEMP01 -wi-ao 200.00g
  lv_host_TEMP01_06 vg_host_TEMP01 -wi-ao 200.00g
  lv_host_TEMP01_07 vg_host_TEMP01 -wi-ao 200.00g
  lv_host_TEMP01_08 vg_host_TEMP01 -wi-ao 200.00g
  lv_host_TEMP01_09 vg_host_TEMP01 -wi-ao 200.00g




root@host:~# vgdisplay -v
    Finding all volume groups
    Finding volume group "vg_host_TEMP01"
  --- Volume group ---
  VG Name vg_host_TEMP01
  System ID
  Format lvm2
  Metadata Areas 1
  Metadata Sequence No 10
  VG Access read/write
  VG Status resizable
  MAX LV 256
  Cur LV 9
  Open LV 9
  Max PV 256
  Cur PV 1
  Act PV 1
  VG Size 1.82 TiB
  PE Size 4.00 MiB
  Total PE 476931
  Alloc PE / Size 460800 / 1.76 TiB
  Free PE / Size 16131 / 63.01 GiB
  VG UUID DMIKX0-QY7l-LuIE-ymWQ-HJC3-nQbI-Cl5zYJ
   
  --- Logical volume ---
  LV Name /dev/vg_host_TEMP01/lv_host_TEMP01_01
  VG Name vg_host_TEMP01
  LV UUID YZPWdF-rxtQ-d5HQ-1UIk-Govq-OXY2-cxf7dn
  LV Write Access read/write
  LV Status available
  # open 1
  LV Size 200.00 GiB
  Current LE 51200
  Segments 1
  Allocation inherit
  Read ahead sectors auto
  - currently set to 256
  Block device 251:0
   
  --- Logical volume ---
  LV Name /dev/vg_host_TEMP01/lv_host_TEMP01_02
  VG Name vg_host_TEMP01
  LV UUID zA8Kat-fxNk-Qoer-HuMn-l86P-ipWc-deIfs2
  LV Write Access read/write
  LV Status available
  # open 1
  LV Size 200.00 GiB
  Current LE 51200
  Segments 1
  Allocation inherit
  Read ahead sectors auto
  - currently set to 256
  Block device 251:1
   
  --- Logical volume ---
  LV Name /dev/vg_host_TEMP01/lv_host_TEMP01_03
  VG Name vg_host_TEMP01
  LV UUID TCWTFC-zNkT-Gn45-jndC-l9ki-tn2D-rXRPnc
  LV Write Access read/write
  LV Status available
  # open 1
  LV Size 200.00 GiB
  Current LE 51200
  Segments 1
  Allocation inherit
  Read ahead sectors auto
  - currently set to 256
  Block device 251:2
   
  --- Logical volume ---
  LV Name /dev/vg_host_TEMP01/lv_host_TEMP01_04
  VG Name vg_host_TEMP01
  LV UUID 0pHCSc-hLut-3ncs-2Iv2-6ce5-n0mb-Hq0mha
  LV Write Access read/write
  LV Status available
  # open 1
  LV Size 200.00 GiB
  Current LE 51200
  Segments 1
  Allocation inherit
  Read ahead sectors auto
  - currently set to 256
  Block device 251:3
   
  --- Logical volume ---
  LV Name /dev/vg_host_TEMP01/lv_host_TEMP01_05
  VG Name vg_host_TEMP01
  LV UUID rg9KDY-xT2s-YID2-r7Ai-kqfp-yd3K-WoOW1l
  LV Write Access read/write
  LV Status available
  # open 1
  LV Size 200.00 GiB
  Current LE 51200
  Segments 1
  Allocation inherit
  Read ahead sectors auto
  - currently set to 256
  Block device 251:4
   
  --- Logical volume ---
  LV Name /dev/vg_host_TEMP01/lv_host_TEMP01_06
  VG Name vg_host_TEMP01
  LV UUID J9Ye1f-6oAB-Rsmv-GA9g-nIp3-pmYq-j0Lg8l
  LV Write Access read/write
  LV Status available
  # open 1
  LV Size 200.00 GiB
  Current LE 51200
  Segments 1
  Allocation inherit
  Read ahead sectors auto
  - currently set to 256
  Block device 251:5
   
  --- Logical volume ---
  LV Name /dev/vg_host_TEMP01/lv_host_TEMP01_07
  VG Name vg_host_TEMP01
  LV UUID Z3Sojg-WwEF-5myN-F7Y2-DHPM-zjKY-N36uN8
  LV Write Access read/write
  LV Status available
  # open 1
  LV Size 200.00 GiB
  Current LE 51200
  Segments 1
  Allocation inherit
  Read ahead sectors auto
  - currently set to 256
  Block device 251:6
   
  --- Logical volume ---
  LV Name /dev/vg_host_TEMP01/lv_host_TEMP01_08
  VG Name vg_host_TEMP01
  LV UUID 3BtT42-eOp6-eXQ4-3Q4w-bRIh-2fNM-fL8k5f
  LV Write Access read/write
  LV Status available
  # open 1
  LV Size 200.00 GiB
  Current LE 51200
  Segments 1
  Allocation inherit
  Read ahead sectors auto
  - currently set to 256
  Block device 251:7
   
  --- Logical volume ---
  LV Name /dev/vg_host_TEMP01/lv_host_TEMP01_09
  VG Name vg_host_TEMP01
  LV UUID 8rGiev-v2bB-MFyc-QbiZ-XpHI-xJm6-2JyYt2
  LV Write Access read/write
  LV Status available
  # open 1
  LV Size 200.00 GiB
  Current LE 51200
  Segments 1
  Allocation inherit
  Read ahead sectors auto
  - currently set to 256
  Block device 251:8
   
  --- Physical volumes ---
  PV Name /dev/sdb1
  PV UUID n3pmNT-Vbjs-jO8D-y5Q8-Z0tc-AD7K-asKkIn
  PV Status allocatable
  Total PE / Free PE 476931 / 16131
   
    Finding volume group "vg_croot"
  --- Volume group ---
  VG Name vg_croot
  System ID
  Format lvm2
  Metadata Areas 1
  Metadata Sequence No 4
  VG Access read/write
  VG Status resizable
  MAX LV 0
  Cur LV 3
  Open LV 3
  Max PV 0
  Cur PV 1
  Act PV 1
  VG Size 46.56 GiB
  PE Size 4.00 MiB
  Total PE 11920
  Alloc PE / Size 11920 / 46.56 GiB
  Free PE / Size 0 / 0
  VG UUID h7PszD-E0nb-tyxk-KGBk-XcM9-lad2-W9s2PM
   
  --- Logical volume ---
  LV Name /dev/vg_croot/lv_croot
  VG Name vg_croot
  LV UUID H0rHlD-z6tP-3Rn2-wYk3-dFPt-N7gX-vZJ9o3
  LV Write Access read/write
  LV Status available
  # open 1
  LV Size 11.18 GiB
  Current LE 2861
  Segments 1
  Allocation inherit
  Read ahead sectors auto
  - currently set to 256
  Block device 251:14
   
  --- Logical volume ---
  LV Name /dev/vg_croot/lv_cswap
  VG Name vg_croot
  LV UUID p3yC6c-AYp3-DjB8-FMFv-9zsh-uz8P-jR5Ekv
  LV Write Access read/write
  LV Status available
  # open 1
  LV Size 18.62 GiB
  Current LE 4768
  Segments 1
  Allocation inherit
  Read ahead sectors auto
  - currently set to 256
  Block device 251:15
   
  --- Logical volume ---
  LV Name /dev/vg_croot/lv_chome
  VG Name vg_croot
  LV UUID wfBFkK-3nfE-lgr3-lavP-k5e7-4Cee-K8wrOx
  LV Write Access read/write
  LV Status available
  # open 1
  LV Size 16.76 GiB
  Current LE 4291
  Segments 1
  Allocation inherit
  Read ahead sectors auto
  - currently set to 256
  Block device 251:16
   
  --- Physical volumes ---
  PV Name /dev/mapper/sda5_crypt
  PV UUID LsB8Bh-yUrD-2KUk-Ssbo-d9D8-AOaf-EztObd
  PV Status allocatable
  Total PE / Free PE 11920 / 0
   
    Finding volume group "vg_host_A"
  --- Volume group ---
  VG Name vg_host_A
  System ID
  Format lvm2
  Metadata Areas 1
  Metadata Sequence No 5
  VG Access read/write
  VG Status resizable
  MAX LV 256
  Cur LV 4
  Open LV 4
  Max PV 256
  Cur PV 1
  Act PV 1
  VG Size 405.20 GiB
  PE Size 4.00 MiB
  Total PE 103732
  Alloc PE / Size 102400 / 400.00 GiB
  Free PE / Size 1332 / 5.20 GiB
  VG UUID PXKLly-ILxz-vdXQ-QSeV-Vui2-HVpe-v3Le5D
   
  --- Logical volume ---
  LV Name /dev/vg_host_A/lv_host_A_01
  VG Name vg_host_A
  LV UUID 0oZ7FX-gaey-H7Lu-NrK0-393n-MP34-MdlTfR
  LV Write Access read/write
  LV Status available
  # open 1
  LV Size 100.00 GiB
  Current LE 25600
  Segments 1
  Allocation inherit
  Read ahead sectors auto
  - currently set to 256
  Block device 251:9
   
  --- Logical volume ---
  LV Name /dev/vg_host_A/lv_host_A_02
  VG Name vg_host_A
  LV UUID NH03id-al2g-WkTj-Rg1i-yEC7-39sz-oEPC9Z
  LV Write Access read/write
  LV Status available
  # open 1
  LV Size 100.00 GiB
  Current LE 25600
  Segments 1
  Allocation inherit
  Read ahead sectors auto
  - currently set to 256
  Block device 251:10
   
  --- Logical volume ---
  LV Name /dev/vg_host_A/lv_host_A_03
  VG Name vg_host_A
  LV UUID 2GM0q2-5LVc-8BrE-zxrE-Jz24-Zohz-Qx1nfB
  LV Write Access read/write
  LV Status available
  # open 1
  LV Size 100.00 GiB
  Current LE 25600
  Segments 1
  Allocation inherit
  Read ahead sectors auto
  - currently set to 256
  Block device 251:11
   
  --- Logical volume ---
  LV Name /dev/vg_host_A/lv_host_A_04
  VG Name vg_host_A
  LV UUID UJzVIV-HDED-RCwV-uPyd-z86U-Vzn3-OamA0k
  LV Write Access read/write
  LV Status available
  # open 1
  LV Size 100.00 GiB
  Current LE 25600
  Segments 1
  Allocation inherit
  Read ahead sectors auto
  - currently set to 256
  Block device 251:12
   
  --- Physical volumes ---
  PV Name /dev/sda6
  PV UUID UkZbxe-dK7q-z8nf-Ca8e-WxMQ-5lS4-eCcLPw
  PV Status allocatable
  Total PE / Free PE 103732 / 1332



I haven't done anything else with lvm except creating the volumes.
No snapshots or anything.
Added by Seth Heeren on Jul 28, 2010 02:46 PM
> between 12 and 14hrs I upgraded to 0.6.9 and created another test pool.
> then when I tried to write to it, zfs-fuse crashed.

Was that _after_ it became Swiss cheese? In that case it is irrelevant to my diagnostics. However, you will probably want to make sure you get your data safe as quickly as possible.
Added by drwatson on Jul 28, 2010 03:25 PM
Hi, to answer your question: I upgraded after the cheese was grated :-D
After the upgrade to 0.6.9 I created another test pool.
I destroyed the existing zfs_temp01 pool, dd-ed 20GB from /dev/zero into a file on the new pool while teeing the stream through md5sum, and then md5summed the file again by reading it back from the newly created pool, without errors. Then zfs-fuse crashed completely. After a reboot I tried to hash the "zero" file again, and now I get this:

root@host:~# zpool status -v zfs_temp02
  pool: zfs_temp02
 state: ONLINE
status: One or more devices has experienced an error resulting in data
    corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
    entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

    NAME STATE READ WRITE CKSUM
    zfs_temp02 ONLINE 0 0 2
      mapper/lv_host_TEMP01_01_crypt ONLINE 0 0 4

errors: Permanent errors have been detected in the following files:

        /zfs_temp02/temp/zero.zero
root@host:~#
Added by Seth Heeren on Jul 28, 2010 06:30 PM
OK, the crash seems to be the most tangible handle now. By any chance, did it leave a core file? Check ulimit -c (in the init script?) for more info.

I suppose the core file could be uploaded to the wiki (see notes on zfs-fuse.net/issues). Otherwise, I could set up an ftp location ad hoc.
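
For reference, a small sketch of how to check whether a crash could have left a core file at all. The function name `core_dump_report` is made up; `/proc/sys/kernel/core_pattern` is the standard Linux setting that decides where core files go. The limit that matters is the one in effect for the zfs-fuse daemon, which may differ from the one in an interactive shell.

```shell
#!/bin/sh
# Report whether processes started from this shell may write a core file,
# and where the kernel would put it.
core_dump_report() {
    limit=$(ulimit -c)
    pattern=$(cat /proc/sys/kernel/core_pattern 2>/dev/null || echo unknown)
    if [ "$limit" = "0" ]; then
        echo "core dumps disabled (ulimit -c is 0); core_pattern=$pattern"
    else
        echo "core dumps enabled (limit: $limit); core_pattern=$pattern"
    fi
}
```

If the report says disabled, running `ulimit -c unlimited` in the script that starts zfs-fuse, before the daemon is launched, should allow the next crash to leave a core file.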

The pool being harmed after a crash is consistent with the fact that the disks are not syncing. I don't have to tell you, obviously, that using zfs-fuse on this type of configuration is not recommended. OTOH the net result is not unlike running with zil_disable==1 (see ZFS Evil Tuning Guide[1]).

It will be most interesting to find out what exactly is confusing the scrub. I feel that the same confusion might trigger the crash. This will have to be some essential type of confusion, which I cannot reproduce[2]. That is why I still have (in the back of my head) the thought that perhaps logical volumes are overlapping blocks in some way, or something is wrong with the major/minor numbers of the vdevs as seen by zfs-fuse, to generate a few random long shots.

[1]http://www.solarisinternals.com/[…]/ZFS_Evil_Tuning_Guide
[2]Oh, my tests are on Lucid Ubuntu 10.04.1 LTS, 2.6.32-24-generic-pae i686, 8GB
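
The long shot about logical volumes overlapping can be checked mechanically from the device-mapper tables. The sketch below is an illustration only: the function name `dm_linear_overlaps` is invented, and it assumes the usual `dmsetup table` line format for linear targets, `<name>: <start> <length> linear <major:minor> <offset>`, with all values in 512-byte sectors.

```shell
#!/bin/sh
# Read `dmsetup table` output on stdin and report linear mappings that
# overlap on the same backing device. Returns 1 if any overlap is found.
dm_linear_overlaps() {
    # Keep linear targets only; emit: device, first sector, end sector, name.
    awk '$4 == "linear" {
             name = $1; sub(/:$/, "", name)
             printf "%s %.0f %.0f %s\n", $5, $6, $6 + $3, name
         }' |
    sort -k1,1 -k2,2n |
    awk '
        # Same backing device, and this range starts before the furthest
        # end seen so far: the two mappings share sectors.
        $1 == dev && $2 + 0 < last_end {
            print "OVERLAP on " dev ": " prev " " $4; bad = 1
        }
        # Track the furthest end sector per backing device.
        $1 != dev || $3 + 0 > last_end { dev = $1; last_end = $3 + 0; prev = $4 }
        END { exit bad + 0 }
    '
}
```

Example use (as root): `dmsetup table | dm_linear_overlaps && echo "no overlapping linear mappings"`. Crypt targets are skipped on purpose, since they sit on top of the linear volumes rather than next to them.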
Added by drwatson on Jul 30, 2010 10:53 AM
Hi,

The scrub problem appeared, and continues, on lvm as well as non-lvm volumes.
What my zpools have in common is that they all reside on LUKS-encrypted device mapper volumes.
But that is also the case on my file server (Ubuntu Karmic 64-bit with several zpools) and was the case on my previous desktop (running Ubuntu Jaunty 64-bit).
Both have been trouble-free.

Call me paranoid, but working in the healthcare sector I know that even the smallest bit of cleartext can get one into legal trouble if it gets into unauthorized hands.

ulimit -c returns 0 on my system. If there were a core dump, I wouldn't yet know where to find it.
So, out of frustration at not making any progress with my new desktop, I removed all lvm groups except for the one that root and swap sit on, and once again created a mirrored zpool on top of two LUKS-encrypted partitions.
I filled it with data and then ran a scrub on it. This is the result:
root@host:~# zpool status zfs_host_mirror01
  pool: zfs_host_mirror01
 state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
    attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
    using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub in progress for 0h0m, 0.03% done, 5h32m to go
config:

    NAME STATE READ WRITE CKSUM
    zfs_host_mirror01 ONLINE 0 0 0
      mirror-0 ONLINE 0 0 0
        mapper/scsi-SATA_SAMSUNG_HD501LJS0MUJ1PPxxx ONLINE 0 0 24
        mapper/scsi-SATA_SAMSUNG_HD501LJS0MUJ1KPxxx ONLINE 0 0 25

At least without damage to any files...

root@host:~# uptime
 17:46:12 up 17:39, 3 users, load average: 0.99, 1.01, 1.02

syslog since last boot:
root@host:~# cat /var/log/syslog | grep -i zfs
Jul 30 00:43:59 host zfs-fuse: initial max_map_count 65530
Jul 30 00:43:59 host zfs-fuse: ARC caching: maximum ARC size: 100 MiB
Jul 30 00:43:59 host zfs-fuse: ARC setup: min ARC size set to 16777216 bytes
Jul 30 00:43:59 host zfs-fuse: ARC setup: max ARC size set to 104857600 bytes
Jul 30 01:49:21 host zfs-fuse: WARNING: Failed to flush write cache on device '/dev/mapper/scsi-SATA_SAMSUNG_HD501LJS0MUJ1PP509343'. Data on pool 'zfs_host_mirror01' may be lost if power fails. No further warnings will be given.
Jul 30 01:49:21 host zfs-fuse: WARNING: Failed to flush write cache on device '/dev/mapper/scsi-SATA_SAMSUNG_HD501LJS0MUJ1KP600768'. Data on pool 'zfs_host_mirror01' may be lost if power fails. No further warnings will be given.
Jul 30 01:49:22 host zfs-fuse: !created version 23 pool zfs_host_mirror01 using 23

I read up on http://blogs.sun.com/erickustarz/entry/zil_disable and I assure you that I haven't "tuned" anything.

Is there anything else I could do to debug the whole situation?
I'm thinking about giving up and trying a different distribution.