While investigating how to make EKS clusters work with the host's GPU, I ran into an issue: when using `localstack start` with the default, built-in k3d cluster, it is currently only possible to mount a single host path as a volume into the EKS cluster.
This is a problem when multiple host paths are required but cannot be mounted into the container, despite that being a capability of the underlying Docker engine.
While the documentation for mounting host directories alludes to the fact that multiple directories may be mounted (https://docs.localstack.cloud/user-guide/aws/eks/#mounting-directories-from-host-to-pod), in reality this requires all directories to be located under the same host path; it is not possible to mount paths at differing locations across the host filesystem.
This makes the following bind configuration impossible for LocalStack EKS to produce (example from an equivalent kind cluster):

```shell
docker inspect gputest-worker | jq '.[].HostConfig.Binds'
[
  "/dev/null:/var/run/nvidia-container-devices/all",
  "/lib/modules:/lib/modules:ro"
]
```
Any attempt to bypass this results in a failed EKS cluster. For example, attempting to pass a comma-separated list, as below, causes the cluster to fail, as can be seen in the associated error log:

```shell
--tags '{"_volume_mount_":"/dev/null:/var/run/nvidia-container-devices/all,/lib/modules:/lib/modules:ro"}'
```
Error log for the multiple-mount attempt:

```
2025-02-07T05:39:06.547 DEBUG --- [functhread61] localstack.utils.run : Executing command: ['/var/lib/localstack/lib/k3d/v5.7.3/k3d-linux-amd64', 'cluster', 'create', 'test5', '--api-port', '6443', '--image', 'rancher/k3s:v1.30.4-k3s1', '--volume', '[/dev/null:/var/run/nvidia-container-devices/all,/lib/modules:/lib/modules:ro]', '-p', '8081:80@loadbalancer']
2025-02-07T05:42:06.547 DEBUG --- [functhread57] l.p.c.s.eks.k8s_utils : Unable to start k3d cluster in 180 (EKS_STARTUP_TIMEOUT) seconds.
2025-02-07T05:42:06.547 DEBUG --- [functhread57] l.p.c.s.eks.k8s_utils : K3D CLI logs:
2025-02-07T05:39:06.569061 WARN[0000] No node filter specified
2025-02-07T05:39:06.569080 INFO[0000] portmapping '8081:80' targets the loadbalancer: defaulting to [servers:*:proxy agents:*:proxy]
2025-02-07T05:39:06.569086 FATA[0000] error processing cluster configuration: invalid volume mount '[/dev/null:/var/run/nvidia-container-devices/all,/lib/modules:/lib/modules:ro]': maximal 2 ':' allowed
2025-02-07T05:42:06.548 DEBUG --- [functhread57] l.p.c.s.eks.k8s_utils : Could not get container logs for test5
2025-02-07T05:42:06.549 INFO --- [functhread57] l.p.c.s.eks.k8s_utils : Error creating cluster test5 - deleting and retrying: Timeout while waiting for EKS startup...(<class 'Exception'>)
2025-02-07T05:42:06.549 DEBUG --- [functhread57] l.u.c.docker_sdk_client : Disconnecting container 'localstack-main' from network 'k3d-test5'
2025-02-07T05:42:06.550 DEBUG --- [functhread57] l.p.c.s.eks.k8s_utils : Could not disconnect LocalStack container from k3d network
2025-02-07T05:42:06.550 DEBUG --- [functhread57] localstack.utils.run : Executing command: ['/var/lib/localstack/lib/k3d/v5.7.3/k3d-linux-amd64', 'cluster', 'delete', 'test5']
```
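For reference, the k3d CLI does accept multiple mounts, but only as a repeated `--volume` flag: each flag value allows at most two `:` separators (hence the `maximal 2 ':' allowed` error above), so the whole comma-separated list cannot be passed as one value. A minimal sketch of how the command could be built instead, splitting the tag value into one `--volume` flag per mount (the `build_k3d_create_cmd` helper and its parameters are illustrative, not LocalStack's actual internals):

```python
def build_k3d_create_cmd(k3d_bin, cluster_name, image, volume_mounts):
    """Build a k3d 'cluster create' command with one --volume flag per mount.

    volume_mounts: a comma-separated string like the _volume_mount_ tag value,
    e.g. "/dev/null:/var/run/nvidia-container-devices/all,/lib/modules:/lib/modules:ro"
    (assumes no commas appear inside the individual mount paths).
    """
    cmd = [k3d_bin, "cluster", "create", cluster_name, "--image", image]
    for mount in volume_mounts.split(","):
        mount = mount.strip()
        if mount:
            # k3d allows at most two ':' per --volume value, so each
            # host:container[:opts] triple must be its own flag.
            cmd += ["--volume", mount]
    return cmd
```

This would emit `--volume` twice for the two-mount example above, matching how the binds appear in the equivalent kind cluster.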
After discussing this with @HarshCasper in this Slack thread (https://localstack-community.slack.com/archives/CMAFN2KSP/p1738909519889869), another option proposed was to offer a method for passing arbitrary command-line arguments through to the cluster.
While the source of these command-line arguments (e.g. environment variables, docker-compose, AWS tags...) was not discussed, the capability to feed arbitrary configuration to the cluster host would be hugely beneficial: it would allow fine-tuning how the cluster starts to suit the end user's environment, and could help resolve the issue described here by allowing additional volumes to be created alongside the `_volume_mount_` tag.
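As a sketch of what such a passthrough might look like (the `EKS_K3D_EXTRA_ARGS` environment variable name is hypothetical, proposed here for illustration, not an existing LocalStack option), the extra arguments could be parsed with `shlex.split` and appended to the generated k3d command:

```python
import os
import shlex

def append_extra_k3d_args(cmd):
    """Append user-supplied k3d arguments to a 'cluster create' command.

    EKS_K3D_EXTRA_ARGS is a hypothetical environment variable, e.g.
    EKS_K3D_EXTRA_ARGS='--volume /lib/modules:/lib/modules:ro'
    """
    extra = os.environ.get("EKS_K3D_EXTRA_ARGS", "")
    # shlex.split honours shell-style quoting, so values containing
    # spaces can still be passed as a single argument.
    return cmd + shlex.split(extra)
```

Because the extra arguments are appended verbatim, any k3d flag (volumes, GPU options, node filters) could be supplied without LocalStack needing to model each one explicitly.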