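I was told that some mutex implementations spin internally for a short while before falling back to a blocking wait.
That could be why the mutex seems faster: when the lock is free or released quickly, it is acquired during the spin without a system call.
If a system call were made on every lock, I doubt you would see the same results.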
(I can't verify this, but I thought it could be a reason.)
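
To make the idea concrete, here is a minimal sketch of a spin-then-block lock in C++. It is only an illustration of the control flow, not how any particular library implements its mutex. `SpinThenBlockMutex` and `spin_limit` are names I made up, and wrapping `std::mutex` is an assumption for the sake of a short example; a real implementation would do this inside the lock itself.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical wrapper: try a bounded number of non-blocking attempts
// first, and only then fall back to the ordinary blocking lock().
class SpinThenBlockMutex {
public:
    // Returns true if the lock was taken during the spin phase,
    // false if we had to fall back to the blocking call.
    bool lock(int spin_limit = 1000) {
        assert(spin_limit >= 0);
        for (int i = 0; i < spin_limit; ++i) {
            // try_lock() never blocks; it reports failure instead.
            if (inner_.try_lock()) {
                return true;  // fast path: acquired while spinning
            }
        }
        // Slow path: blocks, and may put the thread to sleep, until the
        // lock becomes available.
        inner_.lock();
        return false;
    }

    void unlock() { inner_.unlock(); }

private:
    std::mutex inner_;
};
```

If most acquisitions end on the fast path, a benchmark mostly measures the spin phase, which would fit the results you are seeing.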