#40 — zfs_socket hangs with http://rainemu.swishparty.co.uk/git/zfs/
| State | Resolved |
|---|---|
| Version: | |
| Area | User interface |
| Issue type | Bug |
| Severity | Critical |
| Submitted by | Seth Heeren |
| Submitted on | May 15, 2010 |
| Responsible | Seth Heeren |
| Target release: | 0.6.9 |
Last modified on May 22, 2010 by Seth Heeren
Even in a clean room environment I can observe the hang.
The symptom prints a line in /var/log/daemon.log saying
"zfs-fuse: WARNING: Error creating ioctl thread."
(typed in, I have no copy&paste in the VM currently.)
To check this again I created a VM (vmware-server) from scratch with
the following parameters:
- Linux 32 bit
- 1024 MB RAM
- 2 SMP processors (2.67 GHz btw.)
- Debian Lenny version 5.0.4 minimal (nothing, not even SSHd, only
console)
Then I downloaded the following script and ran it as root (sorry, I use
Google Groups, no way to attach something here properly):
http://hydra.geht.net/zfs-build.sh
This does everything, it updates Linux, installs the Dev environment,
checks out zfs-fuse, compiles it and runs the test.
This test fails. It still hangs after a while. It took half a screen
full of dots to hang at my side, though.
So I am really out of clues.
Key properties which might be needed to see this problem, too:
- 32 bit
- Debian Lenny
- Enough RAM
- Fast enough processor
- SMP
But I might be wrong as I did not do many tests with different VM
parameters yet.
Note that it looks like the bigger the pool size the faster you find
the problem. However it even shows up with no pool at all.
-Tino
- Steps to reproduce:
wget -c http://hydra.geht.net/zfs-build.sh
more zfs-build.sh
apt-get update
apt-get install scons libssl-dev fuse-utils libfuse-dev libfuse2 build-essential zlib1g-dev libaio-dev libattr1-dev git-core git -y
git clone http://rainemu.swishparty.co.uk/git/zfs
cd zfs/src/
scons -c
scons
zfs-fuse/zfs-fuse -n &
cd cmd/zpool
while echo -n .; do ./zpool status >/dev/null; done
history
Added by Seth Heeren on May 15, 2010 04:06 AM
Issue state: unconfirmed → open
Severity: Medium → Important
Reproduced the behaviour on a cloud machine (Amazon AMI ami-ed16f984) running Debian SMP i386 5.0.4 lenny (info: http://ec2debian-group.notlong.com).
No pools defined, hang with the following output:
domU-12-31-38-00-B1-B3:~/zfs/src# cd cmd/zpool
domU-12-31-38-00-B1-B3:~/zfs/src/cmd/zpool# while echo -n .; do ./zpool status >/dev/null; done
..................................................................................................................................................................................................................................................................
It is interesting to see that the number of dots is more or less constant each time (+/-1). This is easily seen by modifying the sequence like so:
# killall zfs-fuse; sleep 4; ../../zfs-fuse/zfs-fuse -n & sleep 2; while echo .; do ./zpool status >/dev/null; done | nl
The output will be a numbered list of dots :) In 5 runs I haven't seen any outcome other than 257 dots. This clearly points to a 256 limit somewhere :)
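For anyone repeating this test, a bounded variant of the loop can report the count directly instead of leaving it to be read off the dots. This is only a minimal sketch and not part of the original report: the function name `count_until_hang` and its arguments are made up for illustration, and it assumes the `timeout` utility from GNU coreutils is installed.

```shell
#!/bin/sh
# count_until_hang SECONDS MAX_RUNS COMMAND [ARGS...]
# (hypothetical helper) Runs COMMAND up to MAX_RUNS times and prints how many
# runs finished before one of them exceeded SECONDS and was killed.
count_until_hang() {
    limit=$1
    max=$2
    shift 2
    n=0
    while [ "$n" -lt "$max" ]; do
        # GNU timeout exits with status 124 when it had to kill the command
        timeout "$limit" "$@" >/dev/null 2>&1
        if [ "$?" -eq 124 ]; then
            echo "$n"
            return 1
        fi
        n=$((n + 1))
    done
    # no run timed out within MAX_RUNS attempts
    echo "$n"
    return 0
}
```

Called from cmd/zpool as `count_until_hang 10 1000 ./zpool status`, it would print the number of completed runs before the first one that takes longer than 10 seconds, or 1000 if none does.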
Added by Seth Heeren on May 15, 2010 04:33 AM
bisection point: The problem does _NOT_ occur with 1b186de310461eb1a4637ed2fca0ea0ccb8a66d3 (immediately after introducing threads for ioctls)?
Will try to find the eventual cause
Added by Seth Heeren on May 15, 2010 04:49 AM
The culprit is a change in the default stacksize
commit d0c97125de8f999c830619d285e3f6e04e827693
Author: Emmanuel Anne <manu@manu-home.dyndns.org>
Date: Mon Apr 26 13:41:18 2010 +0200
stack-size command line argument
Limiting the stack size of threads becomes optional.
Without this option : virtual stack size of zfs-fuse at startup 900 Mb
With stack-size=32, virtual size = 38 Mb !
Paranoids can forget this options, for the others, 32 should be safe.
Apparently, on Debian with no ulimit set, running as root on an average server (see the memory stats below) does not work with the default settings:
domU-12-31-38-00-B1-B3:~/zfs/src/cmd/zpool# free -m
             total       used       free     shared    buffers     cached
Mem:          1706        537       1168          0         16        444
-/+ buffers/cache:         76       1629
Swap:          895          0        895
We need to make the default something more sane so it will just work.
Added by Seth Heeren on May 15, 2010 05:05 AM
Issue state: open → in-progress
Target release: None → 0.6.1
Either the documentation is wrong (it says the default is unlimited) or something else is going on that I can't fathom: using the default (no zfsrc or --stack-size) does not work. Using --stack-size=8 gives _exactly_ the same hang at the same point, but raising the stack limit from the default 8m to 32m using zfsrc or --stack-size=32 lets the test run on:
# official config (./contrib/zfsrc) breaks this test scenario...
domU-12-31-38-00-B1-B3:~/zfs/src/cmd/zpool# mkdir -pv /etc/zfs
domU-12-31-38-00-B1-B3:~/zfs/src/cmd/zpool# cp ../../../contrib/zfsrc /etc/zfs/
domU-12-31-38-00-B1-B3:~/zfs/src/cmd/zpool# echo -e 'stack-size = 32\n' >> /etc/zfs/zfsrc
domU-12-31-38-00-B1-B3:~/zfs/src/cmd/zpool# killall zfs-fuse; sleep 4; ../../zfs-fuse/zfs-fuse -n & sleep 2; while echo -n .; do ./zpool status >/dev/null; done
[1]+ Exit 1 ../../zfs-fuse/zfs-fuse -n
[1] 16547
................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
...........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
(Note that requiring an extra trailing line-feed in zfsrc is a separate bug :))
Could it be that the default, when no --stack-size=... is given, is accidentally 8m instead of unlimited?
Added by Seth Heeren on May 15, 2010 05:08 AM
Though it doesn't show in this web page, that was an awfully long list of dots in the previous comment :)
I had to manually break the test because it would not end...
Added by Seth Heeren on May 15, 2010 05:51 AM
The problem is quite a bit bigger.
To my shock and horror, on my meaty server with _no pools_ the same test stopped at 336 dots... (my system has 8 GB RAM, Ubuntu Lucid PAE kernel)
# killall zfs-fuse; sleep 4; zfs-fuse ; sleep 2; while echo .; do zpool status >/dev/null; done | nl
--stack-size=0, --stack-size=1, --stack-size=2.... --stack-size=14 all cut off at precisely 336 or 337 dots
--stack-size=16... --stack-size=64 all cut off at..... precisely 32695 or 32696 dots!
The memory usage slowly but consistently grows during the tests. The thread count is fixed at 47 (NLWP).
I'm pretty positive this type of limit has hit me while doing large zfs receives. It would just hang there.
After a 'git revert d0c97125de8f999c830619d285e3f6e04e827693 --no-commit' the same type of limit is reached at 42696 dots.
Well that should be enough observations for now...
SYSTEM DETAILS
root@karmic:~# ulimit
unlimited
root@karmic:~# free
             total       used       free     shared    buffers     cached
Mem:       8259024    3120176    5138848          0     195892    1728620
-/+ buffers/cache:    1195664    7063360
Swap:            0          0          0
root@karmic:~# ps -f $(pgrep zfs)
UID PID PPID C STIME TTY STAT TIME CMD
root 25103 1 34 12:46 ? Ssl 1:29 zfs-fuse
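The memory growth and the constant thread count mentioned above can be watched while the loop runs. A rough sketch, assuming a Linux /proc filesystem; the helper name `proc_stats` is hypothetical and not something shipped with zfs-fuse.

```shell
#!/bin/sh
# proc_stats PID
# (hypothetical helper) Prints "threads vmsize_kB vmrss_kB" for the given
# process, read from /proc/PID/status.
proc_stats() {
    awk '
        /^Threads:/ { threads = $2 }
        /^VmSize:/  { vmsize = $2 }
        /^VmRSS:/   { vmrss = $2 }
        END { print threads, vmsize, vmrss }
    ' "/proc/$1/status"
}
```

For example, `while sleep 5; do proc_stats "$(pgrep -o zfs-fuse)"; done` in a second terminal prints one sample every five seconds, so a steadily rising VmSize next to a flat thread count is easy to spot.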
Added by Seth Heeren on May 15, 2010 11:46 AM
Severity: Important → Critical
The latest patch (88105bb84206e257f5507ce96f4ce7c9aee30e71, "explicitly create ioctl threads in detached state") improved things.
The prior one (5c04fe4b3539ee17e0fb959bb2ab31fb7b9e6276 better handling of error if ioctl thread creation fails) appears to make _no_ difference _at all_ on my server.
With both patches, there is no measurable climb of memory usage over time and the test run continues to run indefinitely (300277 and counting).
Invocation was with --stack-size=0 for this particular run
I'll retest receiving the large sendfiles that failed on testing the other time. If the hang is gone from that scenario too, I consider this bug closed!
Added by Seth Heeren on May 22, 2010 05:23 AM
Issue state: in-progress → resolved
Target release: 0.6.1 → 0.6.9
Closing after retesting with said receives (they are flawless albeit slow :))
Thanks Tino for the [pc]atch

