
#50 — Pool created with OpenSolaris not seen by zfs-fuse

State In progress
Version: 0.6.9
Area Functionality
Issue type Bug
Severity Medium
Submitted by (anonymous)
Submitted on May 30, 2010
Responsible Seth Heeren
Target release: 0.7.0
Last modified on Sep 19, 2010 by Seth Heeren
This is essentially a ticket for the thread I previously opened, since I cannot see anything that I'm doing wrong and the behavior of zfs-fuse is not as expected: http://groups.google.com/group/zfs-fuse/t/5ecd9cbc48c691bd

Summary (guest/host refers to VirtualBox):
1. A pool created with zfs-fuse 0.6.9 with -o version=22 is visible and usable in an OpenSolaris dev guest.
2. A pool created in an OpenSolaris dev guest (also version=22) is NOT visible to zfs-fuse on the host, even though the guest can find and use it without problems.
3. The partition table presented by fdisk on the host looks different in the two cases: in case (1) it contains a partition with type EE, in case (2) it appears empty.
Steps to reproduce:
1. Set up the environment: virtualbox-3.2, Linux host with zfs-fuse 'testing' or 0.6.9, OpenSolaris guest upgraded to dev (pool version 22).
2. Create a dm-crypt device over a partition (probably any block device would do): cryptsetup create -y blue /dev/sdb2
3. Register the device as a virtual disk with VirtualBox: VBoxManage internalcommands createrawvmdk -filename ~/.VirtualBox/HardDisks/blue.vmdk -rawdisk /dev/sdb2 -register
4. Attach the dm-crypt device as disk to the guest VM under 'Storage' in settings (create a SATA controller if needed).
5. Boot guest VM.
6. Find the disk name with iostat -En
7. In guest: zpool create blue <disk device name>; zpool export blue
8. Shut down guest.
9. Attempt to "zpool import blue" in the host - the pool is not seen.
Added by (anonymous) on May 30, 2010 11:55 AM
On hold

Confirmed as an 'issue', only because of the potential to be 'unexpected'.

It mostly seems to come down to the fact that the Linux kernel does not recognize/support the EFI disklabel to the same extent as other disk labelling systems.

I won't actually mark this issue 'On Hold' so if there are any takers, they can make themselves known via this page.
Added by (anonymous) on May 30, 2010 12:10 PM
Does it mean that the basic scenario of reading a zfs file system created in OpenSolaris is impossible now? If so, I'd consider it quite a serious portability flaw. If not, do known workarounds exist?
Added by Seth Heeren on May 30, 2010 03:23 PM
Responsible manager: (UNASSIGNED) -> sgheeren
Whoa, calm down, big boy!

First off: rationally, even if things were as 'bad' as you represent them, this would hardly change the proper course of action. We cannot bend iron bare-handed. Even if we could, our time might be better spent elsewhere.

Now the thing is, I've never _once_ seen the issue you describe in all my time using Solaris. That is, I can see root pools and pools created on the Solaris side (afaict using just 'whole disks', as these were brand-new disks without any (known) formatting applied), just by supplying the 'name-thingies' reported by format (I still do not have the foggiest idea why Solaris's device/partition/slice naming scheme is so... complicated).

So I feel your pain; in fact I rate this issue 'Confirmed' on the premise that I trust you're not simply (accidentally?) lying about the reported facts.

I think this phenomenon is interesting enough to warrant research. I was simply stating that, based on the input on the user group, I imagine the proper solution should come from upstream (most likely, Linux).

I'd personally hate it every bit as much as you if I had meticulously created my pool on Solaris, only to find out that, due to some happenstance, I wasn't allowed to use it from Linux. That would suck. (Un?)fortunately, that has never happened _to me_ yet (other than Solaris not booting after I had imported the root pool from Linux).
Added by (anonymous) on May 30, 2010 03:50 PM
Thanks for the clarification - I just wanted to know the scope of the problem and hoped for you to dispel my "accusation".

I'm trying to paint myself a picture of zfs-fuse, zfs, (later) btrfs, different "solutions" available on top of them, different options for NAS/off-site storage on the market etc. so that I can (1) use all this information to improve my own work process and (2) recommend and deploy these technologies for clients (with warranty). The big fear of a "solution provider" is betting on the wrong horse and then (when things go really bad) having to compensate the client out of their own pocket (be it through fines or simply through extra unscheduled/unpaid work).

You've done a splendid job addressing all the issues and questions that popped up on my side so far! I'm also aware of how/why open source projects work or fail, being involved in one myself in the role of a lazy (previously much more enthusiastic) maintainer.
Added by Seth Heeren on May 30, 2010 03:58 PM
Needless to say, I'm currently 30 minutes removed from booting OpenSolaris into a virtual box to play out your suggested scenario :)
Added by (anonymous) on May 30, 2010 06:47 PM
I can confirm that the pool is not importable in Linux after using Solaris (I replaced 'cryptsetup create -y' with 'losetup -f /tmp/nocrypt.img'):

I used format(1) and 'iostat -En' to locate the device: c4d1. Creating a pool using 'zpool create blue c4d1' makes the pool invisible on Linux.

After much (_much_) tinkering with

SolBox: prtvtoc, format, 'devfsadm -C -v; devfsadm -i ata -v' and scrubs
Linux: gptsync, gdisk, 'gksu gparted /dev/loop0'

I found out that there was no way I could slap some kind of (fake) MBR table on the disk and be happy with it. Not even using gparted to create a SUN disk label _before_ creating the pool worked.

I subsequently found out that using

zpool create blue c4d1p0

works like a charm (BINGO). This works on a freshly nulled device (dd if=/dev/zero of=/dev/loop0). Everybody happy? (PS. Don't try '-f c4d1s2' because on Solaris it will deadlock the ZFS subsystem. Nice...)

We should probably put up a FAQ/Recommendations section on the zfs-fuse.net site

External references:
http://groups.google.com/[…]/d6768766f8a6d7bf
http://opensolaris.org/jive/thread.jspa?threadID=113079 (zfs-discuss, espec. Paul Archer's reactions)
http://groups.google.com/[…]/997fa94a05b654bb (Authoritative post by original author Ricardo Correia)
Added by (anonymous) on May 31, 2010 04:52 PM
I have to correct two mistakes I made in my original bug report:
1. (minor) in step 3 of the steps to reproduce, /dev/mapper/blue should be used, not /dev/sdb2
2. (major) a partition table which contains the EE partition is present in the "invisible" OpenSolaris pool, NOT, as I wrote, in the zfs-fuse pool. Moreover, creating the pool using the p0 device as you suggest will lead to a partition table WITHOUT the EFI disk label.

So the story becomes short:
GPT EFI label present => pool invisible to zfs-fuse
GPT EFI label missing => pool visible to both zfs-fuse and OpenSolaris
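
To tell which of the two cases a device falls into from the Linux side, one option is to look at the raw bytes. The following is only a sketch: has_gpt_label is a hypothetical helper name, and it assumes 512-byte sectors and that the protective entry of type EE is the first entry in the MBR partition table.

```shell
#!/bin/sh
# Hypothetical helper: report whether a device or image carries a GPT (EFI) label.
# Assumption: 512-byte logical sectors, so the GPT header starts at byte 512.
has_gpt_label() {
    dev="$1"
    # The MBR partition table starts at byte 446; the type byte of the
    # first entry is 4 bytes into it, i.e. at byte offset 450.
    mbr_type=$(dd if="$dev" bs=1 skip=450 count=1 2>/dev/null | od -An -tx1 | tr -d ' \n')
    # A GPT header begins with the 8-byte ASCII signature "EFI PART".
    # NUL bytes are stripped so that an empty disk yields an empty string.
    gpt_sig=$(dd if="$dev" bs=1 skip=512 count=8 2>/dev/null | tr -d '\0')
    if [ "$mbr_type" = "ee" ] && [ "$gpt_sig" = "EFI PART" ]; then
        echo "gpt"
    else
        echo "no-gpt"
    fi
}
```

For example, has_gpt_label /dev/mapper/blue would be expected to print "gpt" for the bad-osol case and "no-gpt" for a pool created on the p0 device.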

I also switched on the kernel option CONFIG_EFI_PARTITION=y, which was previously turned off in the host kernel. Apparently, it didn't change anything.

At http://www.plosquare.com/download/blue-hexdumps.tar.gz I uploaded hexdumps of the first and last 5 MB of /dev/mapper/blue for three different cases:
- good-zfs: a pool created using zfs-fuse
- good-osol: a pool created using OpenSolaris with p0 suffix
- bad-osol: a pool created using OpenSolaris without p0 suffix
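
For anyone who wants to produce comparable dumps: both ends of the device matter, since ZFS keeps label copies at the start and at the end, and GPT keeps a backup header at the end. This is a sketch of one way to do it, not necessarily how the uploaded files were made; dump_ends and the output file names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: hexdump the first and last N bytes of a device or image.
# Usage: dump_ends <source> <output-prefix> [bytes]   (default: 5 MB)
dump_ends() {
    src="$1"
    prefix="$2"
    n=${3:-5242880}   # 5 * 1024 * 1024 bytes
    # First N bytes, hex offsets, one byte per column.
    head -c "$n" "$src" | od -A x -t x1 > "$prefix.head.hex"
    # Last N bytes; on a large block device this may read the whole device.
    tail -c "$n" "$src" | od -A x -t x1 > "$prefix.tail.hex"
}
```

Running dump_ends /dev/mapper/blue bad-osol would write bad-osol.head.hex and bad-osol.tail.hex, which can then be compared with diff against the dumps of a working pool.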

One question remains: is the inability to cope with a GPT EFI disk label a problem that can be addressed in zfs-fuse, or is it someone else's play (kernel [module]?)
Added by Seth Heeren on May 31, 2010 05:29 PM
Issue state: unconfirmed -> resolved
And the solution is:

On the Linux host, run (assuming /dev/loop0 is the block device):

kpartx -a /dev/loop0
zpool import blue

SUCCESS!

So, the answer is: it should be done outside of zfs-fuse. Note that kpartx spews a few messages. The pool is detected anyhow.

Also note that the import needs to have -d /dev/disk/by-id (which is now the default in the latest testing snapshot, and in the upcoming 0.6.9). The device node created looks like this:

sehe@karmic:~/custom/ZFS$ zpool import -d /dev/disk/by-id/
  pool: blue
    id: 10614952404720438664
 state: ONLINE
status: The pool is formatted using an older on-disk version.
action: The pool can be imported using its name or numeric identifier, though
    some features will not be available without an explicit 'zpool upgrade'.
config:

    blue ONLINE
      disk/by-id/dm-uuid-part1-loop0 ONLINE
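
The two steps can be wrapped in a small function. This is a sketch only: import_gpt_pool and the RUN variable are invented for illustration, and by default the function just prints the commands instead of running them. Later comments in this ticket report pool corruption in repeated tests, so try it on scratch pools first.

```shell
#!/bin/sh
# Hypothetical wrapper around the workaround above.
# Usage: import_gpt_pool <block-device> <pool-name>
# RUN is a made-up variable: unset means dry run (commands are echoed);
# set RUN=sudo (or RUN=env when already root) to actually execute them.
import_gpt_pool() {
    dev="$1"
    pool="$2"
    run=${RUN:-echo}
    # Create device-mapper nodes for the partitions inside the GPT label.
    $run kpartx -a "$dev" || return 1
    # Let zpool scan the by-id directory, as in the transcript above.
    $run zpool import -d /dev/disk/by-id "$pool"
}
```

A dry run of import_gpt_pool /dev/loop0 blue prints the same two commands shown in this comment, with the -d option added.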
Added by Seth Heeren on May 31, 2010 05:42 PM
Mmm that /almost/ works. Something like this needs to be done, without the quirks. I think I might be running into the EFI quirks / incompatibilities that Ricardo was talking about (see links in earlier post).
Added by (anonymous) on May 31, 2010 05:47 PM
Your kpartx -a solution seems to work flawlessly for me. It doesn't output any messages, creates /dev/mapper/blue1 and /dev/mapper/blue9, makes the pool visible, doesn't destroy content. I also booted the guest OpenSolaris again and the pool is still visible ok there. I'm not sure about how permanent the devices created by kpartx are, but it seems to be the right approach.

Now I'm also wondering whether it would also work without my newly enabled kernel option.
Added by Seth Heeren on May 31, 2010 05:53 PM
Well, heed my warning; I'm getting pool corruption on both sides in repeated tests. However, I must say that:

(a) I'm cutting corners by sharing the loop device (only exporting it from the SolBox VM before importing in Linux host and vice versa)

(b) kpartx is giving ominous messages already at kpartx -l /dev/loop0. Might this be lack of GPT/EFI support in my kernel? (unlikely if you ask me)

sehe@lucid:~/custom/ZFS$ sudo kpartx -lv /dev/loop0
GPT:Primary header thinks Alt. header is not at the end of the disk.
GPT:Alternate GPT header not at the end of the disk.
GPT: Use GNU Parted to correct GPT errors.
loop0p1 : 0 2079967 /dev/loop0 256
loop0p9 : 0 16384 /dev/loop0 2080223


I'm running Osol b134 in the VM
Linux lucid 2.6.32-22-generic-pae #33-Ubuntu SMP Wed Apr 28 14:57:29 UTC 2010 i686 GNU/Linux
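
To make the kpartx listing easier to read, the partition lines can be turned into sizes. A sketch, assuming the numbers are counts of 512-byte sectors and that the last field is the start offset on the device; kpartx_sizes is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: summarize partition lines from 'kpartx -l' output.
# Expects lines such as "loop0p1 : 0 2079967 /dev/loop0 256" on stdin;
# anything else (for example the GPT warnings) is ignored.
# Assumption: lengths and offsets are in 512-byte sectors.
kpartx_sizes() {
    awk '$2 == ":" && NF == 6 {
        printf "%s offset=%d sectors=%d size_mib=%.1f\n", $1, $6, $4, $4 * 512 / 1048576
    }'
}
```

Piping the output above through it (sudo kpartx -l /dev/loop0 | kpartx_sizes) shows loop0p1 at about 1015.6 MiB and loop0p9 at 8.0 MiB.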
Added by Seth Heeren on Sep 19, 2010 04:55 PM
Issue state: open -> in-progress
Target release: None -> 0.7.0
Convert this into a knowledge base item on the site for the 0.7.0 release.