#36 — zfs-fuse fedora init script sometimes attempts zfs mount before daemon fully started
| State | Resolved |
|---|---|
| Version | 0.6.0 |
| Area | Process |
| Issue type | Bug |
| Severity | Medium |
| Submitted by | (anonymous) |
| Submitted on | Apr 15, 2010 |
| Responsible | Seth Heeren |
| Target release | 0.6.9 |
Last modified on
May 27, 2010
by
Seth Heeren
Hi,
I'm using zfs-fuse 0.6.0(plus git patches) on RHEL5.[45] x86_64.
Sometimes, on slow machines, zfs filesystems do not get mounted at bootup.
That's because 'zfs mount -a' is attempted at boot before the zfs-fuse daemon is fully
started and functional.
It would also be nice if the 1-second delay were configurable for slow systems.
The following patch fixes the issue for me (timeout increased from 1sec to 2secs and made user-configurable):
diff --git a/contrib/zfs-fuse.initd.fedora b/contrib/zfs-fuse.initd.fedora
index 28a7263..ef7e9ea 100755
--- a/contrib/zfs-fuse.initd.fedora
+++ b/contrib/zfs-fuse.initd.fedora
@@ -63,7 +63,7 @@ do_start() {
exit $ES_TO_REPORT
fi
- sleep 1
+ sleep $ZFS_MOUNT_DELAY
action "Mounting ZFS filesystems" zfs mount -a
ES_TO_REPORT=$?
if [ 0 != $ES_TO_REPORT ] ; then
diff --git a/contrib/zfs-fuse.sysconfig b/contrib/zfs-fuse.sysconfig
index 9186ef7..bf3e9db 100644
--- a/contrib/zfs-fuse.sysconfig
+++ b/contrib/zfs-fuse.sysconfig
@@ -1 +1,3 @@
OPTIONS=""
+# Delay before initial mount attempt after zfs-fuse started at boot time.
+ZFS_MOUNT_DELAY=2
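One caveat with the patch as posted: if ZFS_MOUNT_DELAY is not set (for example because the sysconfig file is missing or predates the change), `sleep` is called with no argument, reports an error and returns at once, so the mount is attempted with no delay at all. A minimal sketch of a guard, assuming the init script sources /etc/sysconfig/zfs-fuse before do_start runs; the helper name `mount_delay` is hypothetical and not part of the submitted patch:

```shell
# Hypothetical helper, not part of the submitted patch.
# Prints the delay to use: $ZFS_MOUNT_DELAY when it is a non-negative
# integer, otherwise a fallback of 2 seconds (the value the patch puts
# in the sysconfig file).
mount_delay() {
    case "${ZFS_MOUNT_DELAY:-}" in
        ''|*[!0-9]*) echo 2 ;;                    # unset, empty or non-numeric
        *)           echo "$ZFS_MOUNT_DELAY" ;;   # plain integer: use as-is
    esac
}
```

In do_start the call would then read `sleep "$(mount_delay)"` instead of `sleep $ZFS_MOUNT_DELAY`.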
Steps to reproduce:
- On a slow machine, edit /etc/init.d/zfs-fuse and remove 'sleep 1' on line 63.
You'll notice that the ZFS filesystems do not get mounted at boot time.
Added by
(anonymous)
on
Apr 23, 2010 03:25 PM
This is the same on all distros. In the init script I use on Gentoo, I had to add a 2-second delay, although my machine is not slow by any standards.
Added by
Seth Heeren
on
May 22, 2010 06:57 PM
Responsible manager:
(UNASSIGNED) → sgheeren
I've kept an eye on this one.
See this patch
http://zfs-fuse.sehe.nl/?p=[…]99da4e34ece50b8c852e2da5e98
This should in theory allow any zfs/zpool commands following the daemon start to connect (to a certain maximum number of pending requests, which has been conservatively set at 5 at the moment). This would remove any raciness in zfs-fuse initialization. Also, failure to open the listening socket now correctly returns exitcode 1 to the initscript (as it is done before the fork now).
Let me know what you think. Any sleeps should be superfluous now.
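With the listening socket opened before the fork, an init script can in principle rely on the daemon's exit status instead of a delay. A rough sketch of that flow, assuming the daemon exits non-zero when it cannot open its socket, as described above. The function name and the two override variables are hypothetical; they exist only so the logic can be exercised without a real daemon and are not read by any actual init script:

```shell
# Hypothetical sketch; ZFS_DAEMON_CMD and ZFS_MOUNT_CMD are illustrative
# variables, not settings used by the real zfs-fuse init scripts.
start_and_mount() {
    # Start the daemon. Per the comment above, it should return non-zero
    # if the listening socket could not be opened.
    ${ZFS_DAEMON_CMD:-zfs-fuse} || return 1
    # No sleep here: once the daemon command has returned successfully,
    # the socket is assumed to be accepting connections already.
    ${ZFS_MOUNT_CMD:-zfs mount -a}
}
```

The exit status of `start_and_mount` is that of the mount command, or 1 if the daemon failed to start, so it can be passed straight to whatever status reporting the init script already does.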
Added by
Seth Heeren
on
May 24, 2010 06:33 PM
I've been stress testing (performance, stability) a bit. It turns out that this fix is really quite powerful. I just tested with
$ zfs-fuse; time (for a in $(seq 1 20); do (for a in $(seq 1 200); do zpool list& zpool get all BONNIE& done)|wc -l& done; wait)
This will spawn some 4000 jobs in parallel against a freshly launching/launched zfs-fuse. It doesn't even break a sweat (this line completes in 7.5 seconds on my system).
$ zfs-fuse; time (for a in $(seq 1 20); do (for a in $(seq 1 200); do zfs list& zpool get all BONNIE& done)|wc -l& done; wait)
returns in 12.5 seconds
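For readability, the one-liners above can be unrolled into a small function. This is only a sketch of the same fan-out pattern: the name `stress_fanout` and its arguments are made up here, and it runs a single command per iteration (passed in as arguments) rather than the two hard-coded ones used above.

```shell
# stress_fanout N_GROUPS PER_GROUP CMD [ARGS...]
# Starts N_GROUPS background subshells. Each one launches PER_GROUP
# background copies of CMD and pipes their combined output through
# wc -l, so one line count is printed per group.
# Returns once every background job has finished.
stress_fanout() {
    sf_groups=$1
    sf_per_group=$2
    shift 2
    sf_g=0
    while [ "$sf_g" -lt "$sf_groups" ]; do
        (
            sf_j=0
            while [ "$sf_j" -lt "$sf_per_group" ]; do
                "$@" &                  # one client invocation in the background
                sf_j=$((sf_j + 1))
            done
            wait                        # let this group's jobs finish
        ) | wc -l &
        sf_g=$((sf_g + 1))
    done
    wait                                # wait for all groups
}
```

In bash, something like `zfs-fuse; time stress_fanout 20 200 zpool list` would then approximate the first test.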
Added by
Seth Heeren
on
May 26, 2010 07:04 PM
Target release:
None → 0.6.9
Ironically, my patch subtly broke the lockfile handling (a forked child does not inherit the parent's file locks). Therefore,
for a in 1 2 3 4 5 6; do zfs-fuse; done
would result in 6 copies of the daemon process (of which only 1 was actively responding, so there wasn't much of a risk except for resource abuse).
This has been comprehensively fixed in 6755f6609295fe
This (with fix) will be merged into testing by tomorrow
It is here for now:
http://zfs-fuse.sehe.nl/[…]/sehe
(namely: 42fe23421de4299d and 6755f6609)
Added by
Seth Heeren
on
May 27, 2010 06:58 PM
Issue state:
unconfirmed → resolved
Merged (closing)

