Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

syscall, os/exec: Support for User Namespaces #8447

Closed
gopherbot opened this issue Jul 30, 2014 · 20 comments
Closed

syscall, os/exec: Support for User Namespaces #8447

gopherbot opened this issue Jul 30, 2014 · 20 comments
Milestone

Comments

@gopherbot
Copy link

by mrunalp:

We are working on adding User Namespace support to docker/libcontainer, but running into
an issue where there is no opportunity to write UID/GID Mappings to a child process
after fork/exec leading to the child process losing capabilities.

Here is a proof-of-concept patch that will allow Go to support User Namespaces easily.

https://gist.github.com/mrunalp/7334e74a01b9a10e8546

Here is some sample code that will exercise the code above.

https://gist.github.com/mrunalp/4365565f94e9c9fd737c

Here is sample output from invoking the test program above.

https://gist.github.com/mrunalp/b684fdce11f2175b6d4d

I will be glad to clean up this patch to get this merged into Go.

Thanks.
@bradfitz
Copy link
Contributor

Comment 1:

Please use the normal contribution process if you'd like feedback.

Status changed to Accepted.

@gopherbot
Copy link
Author

Comment 2 by mrunalp:

Sure, I will follow the contribution process.

@ianlancetaylor
Copy link
Contributor

Comment 3:

Labels changed: added repo-main, release-none.

@bradfitz
Copy link
Contributor

Comment 4:

This bug should have more context and references to the interfaces involved.
It seems this is a new Linux-only interface as of Linux 3.5:
I dug up this description of /proc/nnn/uid_map and gid_map, etc:
https://lists.linux-foundation.org/pipermail/containers/2013-January/031504.html
It's probably committed somewhere.
Copying Ian, who loves fork issues.

@gopherbot
Copy link
Author

Comment 5 by mrunalp:

Yes, this is a linux only feature. It is described here --
http://lwn.net/Articles/532593/

@ianlancetaylor
Copy link
Contributor

Comment 6:

OK, that is some crazy stuff.  But, it's not clear to me why this needs to go into the
syscall package at all.  Why can't the child process do it when it starts up?
That is, I can see that it would sometimes be convenient to have this in SysProcAttr,
but so far everything in SysProcAttr is there because it can't reasonably be done
elsewhere.  It seems like this mapping could be written by a helper process.

@gopherbot
Copy link
Author

Comment 7 by mrunalp:

The child process is limited in what mappings it could write. One approach we tried was
writing the mappings after the child is exec'ed. In that case, child does become root
(if the parent writes the correct mappings), but it has lost capabilities on exec
(unless the binary was a setuid), so it is very limited in what it could do. It can't
even switch to the docker user since CAP_SETUID/CAP_SETGID are lost. This led us to the
path of pursuing this patch.

@gopherbot
Copy link
Author

Comment 8 by crosby.michael:

The mappings have to be written after the fork and before the exec of the new process or
they will not be successfully applied.  That is why it has to go into the syscall
package because we have no reliable way to insert code in-between the fork and exec.
Like the comment above using a helper process to write the mappings does work but
capabilities are lost when the fork'd process tries to update it's own UID/GID mappings.

@gopherbot
Copy link
Author

Comment 9 by mrunalp:

Here are more details about what happens in exec_linux.go when CLONE_NEWUSER is passed
as a flag --
1. child uid is overflowuid i.e 65534.
2. If UID/GID are passed setuid/setgid in child will fail since
   linux returns invalid arguments for setuid/setgid for any unmapped values.
   (With patch and mappings written, this succeeds as the mappings have been written.)
3. child execs any binary (nsinit in case of docker which isn't setuid) and since the
child isn't root it loses capabilities. With the patch we ensure that it is root, so it
has all capabilities and is able to do further setup before exec'ing into user supplied
command.

@ianlancetaylor
Copy link
Contributor

Comment 10:

OK.  We'll probably need to figure out how to implement this in the new go.sys package.

@gopherbot
Copy link
Author

Comment 11 by brandon@ifup.co:

Ian- Does this mean that the current code review should be closed and something new
should be proposed against go.sys? https://golang.org/cl/126190043
This issue blocks a pretty major feature of Docker and libcontainer so it would be great
to get it on the right path for go 1.4.

@ianlancetaylor
Copy link
Contributor

Comment 12:

I think that I was wrong about go.sys.  Code that has to run between fork and exec more
or less has to run in the syscall support, otherwise the os and os/exec packages can't
use it.
I'm not sure I understand that CL, though.  It looks like that CL was never even mailed
out.  Is it ready for review?  Why does it use a pipe rather than simply opening the
file in the child?

Labels changed: added release-go1.4, removed release-none.

@gopherbot
Copy link
Author

Comment 13 by mrunalp:

Ian, it wasn't mailed out after this discussion as I assumed that it won't be used in
favor of a go.sys based solution. The pipe is used as a synchronization mechanism to
make the child wait for the parent to write the mappings. BTW, I can mail this out to
the appropriate mailing list if we are ready to make the change in fork/exec code.

@gopherbot
Copy link
Author

Comment 14 by crosby.michael:

Also, the parent process must write the mappings because the child is not allowed to
remap UID/GIDs that are not it's own, which is basically only 0.  That is why the pipe
is used and the parent writes the mappings for the child process.

@ianlancetaylor
Copy link
Contributor

Comment 15:

If you want something in Go 1.4 we need to start on that immediately if not sooner, as
we are already in the feature freeze.

@gopherbot
Copy link
Author

Comment 16 by mrunalp:

I used the publish + mail comments option on the codereview, since I don't have access
to the system where I initiated the codereview from right now. Hopefully that sent out
an email to initiate a formal review. If that didn't work, I will resend it from my
other system in the evening.

@bradfitz
Copy link
Contributor

Comment 17:

https://golang.org/cl/126190043/ mentions this issue.

@ianlancetaylor
Copy link
Contributor

Comment 18:

This issue was closed by revision f9d7e13.

Status changed to Fixed.

@gopherbot
Copy link
Author

Comment 19 by teabee89:

Would this make it into Go 1.4 ?

@ianlancetaylor
Copy link
Contributor

Comment 20:

Yes, this will be in Go 1.4.

@rsc rsc added this to the Go1.4 milestone Apr 14, 2015
@rsc rsc removed the release-go1.4 label Apr 14, 2015
@golang golang locked and limited conversation to collaborators Jun 25, 2016
wheatman pushed a commit to wheatman/go-akaros that referenced this issue Jun 25, 2018
Fixes golang#8447.

LGTM=iant
R=golang-codereviews, bradfitz, iant
CC=golang-codereviews
https://golang.org/cl/126190043
wheatman pushed a commit to wheatman/go-akaros that referenced this issue Jun 26, 2018
Fixes golang#8447.

LGTM=iant
R=golang-codereviews, bradfitz, iant
CC=golang-codereviews
https://golang.org/cl/126190043
wheatman pushed a commit to wheatman/go-akaros that referenced this issue Jul 9, 2018
Fixes golang#8447.

LGTM=iant
R=golang-codereviews, bradfitz, iant
CC=golang-codereviews
https://golang.org/cl/126190043
wheatman pushed a commit to wheatman/go-akaros that referenced this issue Jul 30, 2018
Fixes golang#8447.

LGTM=iant
R=golang-codereviews, bradfitz, iant
CC=golang-codereviews
https://golang.org/cl/126190043
This issue was closed.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

4 participants