#54 — internal error: failed to initialize ZFS library
| State | In progress |
|---|---|
| Version | 0.6.9 |
| Area | User interface |
| Issue type | Bug |
| Severity | Medium |
| Submitted by | (anonymous) |
| Submitted on | Jun 06, 2010 |
| Responsible | Seth Heeren |
| Target release | 0.7.0 |
Last modified on
Sep 19, 2010
by
Seth Heeren
This happens every time while booting my machine. The zfs-fuse daemon doesn't come up properly:
Starting zfs-fuse: zfs-fuse.
Mounting ZFS filesystems...connect: No such file or directory
Please make sure that the zfs-fuse daemon is running.
internal error: failed to initialize ZFS library
I see the zfs-fuse process is running, however zfs commands don't work. I have to restart it (after the machine has booted), then it works properly.
Perhaps there is some dependency that the init script (Ubuntu from contrib) does not take into account? I see the Debian installer symlinked it as /etc/rcS.d/S38zfs-fuse - too early?
There are no error messages in /var/log/messages at all, and the usual messages about ARC caching and ARC setup are missing too.
Added by
Seth Heeren
on
Jun 06, 2010 06:14 AM
Issue state:
unconfirmed → open
Responsible manager:
(UNASSIGNED) → sgheeren
The Debian installer symlinked it?
This suggests you are not compiling from source. Are you using a package? If so, which one?
Can you confirm the version used is 0.6.9?
If so, could you retest with the previous version of the initscript just to make sure we didn't introduce a silly error in the latest changes? I suggest this version:
http://gitweb.zfs-fuse.net/[…]/zfs-fuse.initd.ubuntu;hb=0.6.9_beta3
If all that fails, please try launching the daemon interactively to spot any errors:
sudo zfs-fuse -n; echo exitcode $?
Normally, that should not exit until you interrupt it (e.g. with Ctrl-C), and the exit code should then be 1.
Added by
Seth Heeren
on
Jun 06, 2010 06:16 AM
PS: are you sure you have sufficient permissions to access the zfs daemon (as in, have you tried sudo zfs)? I assume you do, because you mention it works the second time around...
Added by
(anonymous)
on
Jun 06, 2010 06:19 AM
Sorry, the version in use is the one pulled from the 'testing' branch.
I first used the Debian one (0.6.0+critical20100301-1), hence the symlinking, but then replaced the script with the Ubuntu one from contrib directory and updated paths to point to the zfs-fuse 'testing' branch.
I am going to update the script to launch interactively for the next boot and see what happens then.
Added by
(anonymous)
on
Jun 06, 2010 06:21 AM
To clarify: it always works the second time around (when I kill zfs-fuse after the boot process completes and use the very same init script to start it up). It just gets "stuck" during the normal boot sequence for some reason.
Added by
(anonymous)
on
Jun 06, 2010 06:28 AM
BTW (unrelated to this issue), I see the following lines in the script:
sleep 2 # allow zfs-fuse time to start up
zpool import -a -f
This is ugly, slowing down the boot process by a fixed 2 seconds. Is there no better way to wait?
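For illustration, one generic alternative to a fixed sleep is to poll for the daemon's control socket and continue as soon as it appears. This is only a sketch and not the fix that went into 0.6.9: the function name `wait_for_socket`, the retry count and the polling interval are assumptions, and the default path is the socket location mentioned later in this thread.

```shell
#!/bin/sh
# Hypothetical helper, not part of zfs-fuse: wait until a Unix socket exists.
#   $1  socket path (default: the path named elsewhere in this thread)
#   $2  maximum number of polls (assumed default: 10)
#   $3  seconds to sleep between polls (assumed default: 1)
wait_for_socket() {
    sock="${1:-/var/run/zfs/zfs_socket}"
    tries="${2:-10}"
    interval="${3:-1}"
    while [ "$tries" -gt 0 ]; do
        # -S is true only if the path exists and is a socket
        if [ -S "$sock" ]; then
            return 0
        fi
        tries=$((tries - 1))
        sleep "$interval"
    done
    # Socket never appeared within the allowed number of polls
    return 1
}

# Possible use in an init script, replacing the fixed "sleep 2":
#   wait_for_socket && zpool import -a -f
```

The loop returns immediately when the socket is already present, so the delay is only paid when the daemon is actually slow to start. Note that an existing socket file does not by itself prove the daemon is accepting connections.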
Added by
Seth Heeren
on
Jun 06, 2010 07:46 AM
Ok, no offense but I get the distinct impression you are mixing things up more than is good for you (or me, for that matter) :)
> the version in use is the one pulled from the 'testing' branch
That only raises the question: which version/label? You might want to run 'git log -1' or 'git describe --tags HEAD' (the latter should result in something similar to 0.6.9_beta3-2-g0469903 for a 'recent' revision on testing).
> it always works the second time around
Seeing that you mix initscripts and testing versions, it is likely that you use the enhanced init script with an 'unenhanced' version of the code :_
> BTW (unrelated to this issue)
Erm, wrong?
> This is ugly
agreed
> ...., slowing down the boot process by a fixed 2 seconds
Not on debian/ubuntu (upstart parallelizes most jobs anyway)
> ...., Is there no better way to wait?
Well, yes! Note that the sleep does _NOT_ exist in 0.6.9_beta4 or 0.6.9. This is precisely because we fixed that ugliness (see issue #36).
You need to have 0.6.9 (released, see http://zfs-fuse.net/releases/0.6.9) or latest testing (specifically anything beyond abbdc47).
Added by
Seth Heeren
on
Jun 06, 2010 07:54 AM
BTW,
is your /var, /usr or /run mounted on something special?
Please post /etc/mtab *after* complete system initialization
Added by
(anonymous)
on
Jun 06, 2010 07:55 AM
The sleep line came from the link you posted above:
http://gitweb.zfs-fuse.net/[…]/zfs-fuse.initd.ubuntu;hb=0.6.9_beta3
My zfs-fuse version is:
commit 3c64b738517f9a7c68e77fa7b2714c6278d4a9d2
Author: Seth Heeren <sgheeren@hotmail.com>
Date: Thu Jun 3 22:33:51 2010 +0200
Now, after your last remark, I switched (back) to the Ubuntu init script from contrib, from the release indicated above (not that it is so much different).
Added by
(anonymous)
on
Jun 06, 2010 08:01 AM
/var, /usr, /run are not mount points for anything.
I have a symlink /usr -> /mnt/data/usr, and /mnt/data is a mount point for a dm-crypt device /dev/mapper/data (specified in /etc/crypttab and /etc/fstab). This is already mounted at the time the script is being run during boot, otherwise it would not be able to find and start the /usr/local/sbin/zfs-fuse process.
Added by
Seth Heeren
on
Jun 06, 2010 08:02 AM
Issue state:
open → in-progress
Severity:
Medium → Important
Ok, that confirms this is a bug/problem on your particular setup.
Thanks for confirming the same problem happens when using the older init script. I confirm that the sleep is unneeded with your reported version.
To narrow down the cause, please supply as much of the following info as you can:
Your Linux distro (preferably steps to install/install media used)
$ lsmod | grep fuse
$ fusermount -V
$ lsb_release -a
$ uname -a
$ cat /etc/mtab # (after full boot + logon)
Anything security specific (AppArmor custom settings, SELinux?)
Thanks in advance
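As a convenience, the requested commands could be gathered into a single report file. The following is a rough sketch only: the helper name `collect` and the report path in the usage comment are made up for illustration, and commands that fail are simply recorded with their exit status.

```shell
#!/bin/sh
# Hypothetical helper: run each command string and append its output
# (stdout and stderr) to a report file, with a header per command.
#   $1      report file to append to
#   $2...   command strings, each run through "sh -c"
collect() {
    out="$1"
    shift
    for cmd in "$@"; do
        printf '== %s ==\n' "$cmd" >> "$out"
        # Record a failing or missing command instead of aborting
        sh -c "$cmd" >> "$out" 2>&1 || printf '(exit status %s)\n' "$?" >> "$out"
    done
}

# Possible use, after full boot and logon (report path is an assumption):
#   collect /tmp/zfs-fuse-report.txt \
#       'lsmod | grep fuse' 'fusermount -V' 'lsb_release -a' \
#       'uname -a' 'cat /etc/mtab'
```

Each command is passed as one quoted string so that pipelines such as `lsmod | grep fuse` are run as written.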
Added by
Seth Heeren
on
Jun 06, 2010 08:04 AM
Oh and perhaps attach output of
egrep 'zfs|fuse' /var/log/syslog
(mountpoint info received: not a problem there)
Added by
Seth Heeren
on
Jun 06, 2010 08:26 AM
Oops, /run needed to be /var/run, of course. It is not so uncommon that /var/run gets remounted on tmpfs (when using SSD and on netbooks, e.g.)
So if /var/run gets remounted _after_ starting zfs-fuse, the socket interface will be inaccessible (/var/run/zfs/zfs_socket)
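A quick way to test this hypothesis after boot might look like the sketch below. It assumes an mtab-style file whose second field is the mount point; the function names `is_mount_point` and `check_zfs_socket` are hypothetical and exist only for this example.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a directory is listed as a mount point.
#   $1  directory to look for
#   $2  mtab-style file (default: /etc/mtab)
is_mount_point() {
    dir="$1"
    mtab="${2:-/etc/mtab}"
    # Field 2 of each mtab line is the mount point
    awk -v d="$dir" '$2 == d { found = 1 } END { exit !found }' "$mtab"
}

# Hypothetical helper: report on /var/run and on the zfs-fuse socket.
#   $1  socket path (default: the path named in this comment)
#   $2  mtab-style file (default: /etc/mtab)
check_zfs_socket() {
    sock="${1:-/var/run/zfs/zfs_socket}"
    mtab="${2:-/etc/mtab}"
    if is_mount_point /var/run "$mtab"; then
        echo "/var/run is a separate mount"
    else
        echo "/var/run is not a separate mount"
    fi
    if [ -S "$sock" ]; then
        echo "socket present: $sock"
    else
        echo "socket missing: $sock"
    fi
}

# Possible use after the system has fully booted:
#   check_zfs_socket
```

If /var/run shows up as a separate mount and the socket is missing while the zfs-fuse process is running, that would be consistent with the socket having been hidden by a later mount.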
Added by
Seth Heeren
on
Jun 06, 2010 09:23 AM
Severity:
Important → Medium
Ok,
TLDR: I notice that the runlevels get configured differently from your report. You may want to 'sudo update-rc.d zfs-fuse defaults'
Longer version:
I tried to mimic your (probable) setup by installing on a bare minimum Debian Squeeze system, building 'testing' from source:
sudo apt-get install libaio-dev libattr1-dev libacl1-dev libz-dev libz-dev libfuse-dev libfuse2 scons libssl-dev build-essential git-core
git clone http://git.zfs-fuse.net/official -b testing
cd official
git checkout 3c64b738517f
(cd src && scons debug=2 install)
cp contrib/zfs-fuse.initd.ubuntu /etc/init.d/zfs-fuse
update-rc.d zfs-fuse defaults
I then create a pool (issue54.sh attached)
And reboot
Pool comes up ok
I notice that the runlevels get configured differently from your report. You may want to run 'sudo update-rc.d zfs-fuse defaults' to see if it fixes the boot sequence for your setup (output on my test system):
update-rc.d: using dependency based boot sequencing
update-rc.d: warning: zfs-fuse start runlevel arguments (2 3 4 5) do not match LSB Default-Start values (S)
update-rc.d: warning: zfs-fuse stop runlevel arguments (0 1 6) do not match LSB Default-Stop values (0 6)
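The warnings above compare the requested runlevels with the LSB header comments in the init script. To see what a given script declares, a small sketch like the following could be used; the function names `lsb_header` and `lsb_field` are made up for this example, and it assumes the usual `### BEGIN INIT INFO` / `### END INIT INFO` markers.

```shell
#!/bin/sh
# Hypothetical helper: print the LSB header block of an init script.
#   $1  path to the init script
lsb_header() {
    sed -n '/^### BEGIN INIT INFO/,/^### END INIT INFO/p' "$1"
}

# Hypothetical helper: print the value of one LSB header field.
#   $1  path to the init script
#   $2  field name, e.g. Default-Start
lsb_field() {
    lsb_header "$1" | sed -n "s/^# *$2:[[:space:]]*//p"
}

# Possible use:
#   lsb_field /etc/init.d/zfs-fuse Default-Start
#   lsb_field /etc/init.d/zfs-fuse Default-Stop
#   ls -l /etc/rc?.d/*zfs-fuse*    # which runlevel links actually exist
```

Comparing the declared fields with the links that exist under /etc/rc?.d shows whether the script is installed where its header says it should be.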
Added by
(anonymous)
on
Jun 06, 2010 09:30 AM
Thanks for the hints. I noticed that my sysv-rc is not up-to-date, not using dependency-based booting (this is a Debian installation that has been in use for >10 years, not a fresh setup). I will upgrade sysv-rc and then report back whether it helped.
Added by
Seth Heeren
on
Jun 06, 2010 09:33 AM
I'll keep the issue open until confirmation/feedback. This will be a good item for a FAQ entry, I think!
Added by
Seth Heeren
on
Jun 06, 2010 04:35 PM
Issue state:
in-progress → postponed
For Your Information:
In the interest of supplying a workaround for the FAQ, I rebuilt a Lenny box (bare minimum) and repeated the steps (slightly modified due to different versions). I could not reproduce the behaviour, so at the moment there is nothing to base a workaround or FAQ entry on.
sudo apt-get install libaio-dev libattr1-dev libacl1-dev libz-dev libz-dev libfuse-dev libfuse2 scons libssl-dev build-essential git-core
git clone http://git.zfs-fuse.net/official
cd official
git checkout 3c64b738517f
(cd src && scons debug=2 install)
cp contrib/zfs-fuse.initd.ubuntu /etc/init.d/zfs-fuse
Instead of running 'update-rc.d zfs-fuse defaults', I manually replicated your link:
ln -sfv /etc/init.d/zfs-fuse /etc/rcS.d/S38zfs-fuse
I then create a pool (issue54.sh attached)
And reboot
sehe@lenny:~$ sudo zpool list
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
issue54demo 59.5M 85.5K 59.4M 0% 1.00x ONLINE -
Added by
(anonymous)
on
Jun 06, 2010 04:42 PM
I followed your suggestion, upgraded sysv-rc and then replaced all the non-LSB init scripts with LSB equivalents (this took quite a lot of reinstalling packages and some manual changes). Within two test reboots after that I could NOT reproduce the problem any longer. zfs-fuse came up nicely. However, the final test will be tomorrow, which is when I am going to boot up again after a nightly zfs backup job followed by shutdown. This is the regular way which drew my attention to the problem (though I believe it was reproducible by a simple reboot before). If zfs-fuse doesn't "lock up" then, I will declare that switching to the LSB layout helped and that it was some exotic issue with my "organically grown over years" Linux installation.
Added by
(anonymous)
on
Jun 07, 2010 04:27 PM
zfs-fuse started fine after reboot, so I consider this ticket solved (by upgrading to LSB/dependency-based boot).
Added by
Seth Heeren
on
Jun 08, 2010 01:27 AM
Ok, thanks for reporting back
Added by
Seth Heeren
on
Sep 19, 2010 05:02 PM
Issue state:
postponed → in-progress
Target release:
None → 0.7.0
Convert to knowledge base entry (mention LSB and daemon user on Gentoo).

issue54.sh
