Containers & Docker Basics
1. What is a container?
A container is a lightweight, standalone, executable package that bundles an application together with its code, runtime, libraries, and dependencies. Containers share the host operating system's kernel rather than running a full guest OS, which makes them far smaller and faster to start than virtual machines. This isolation ensures an application behaves the same way regardless of where it runs, solving the classic "it works on my machine" problem.
2. What is Docker?
Docker is an open-source platform for building, shipping, and running applications inside containers. It provides a simple command-line interface and a daemon that manages the container lifecycle, along with a registry ecosystem for sharing images. Docker standardised containerisation and made it accessible to mainstream developers, becoming the de facto tool for packaging applications consistently across environments.
3. How do containers differ from virtual machines?
Virtual machines virtualise hardware and run a complete guest operating system on top of a hypervisor, so each VM includes its own kernel and is typically several gigabytes in size. Containers virtualise the operating system and share the host kernel, packaging only the application and its dependencies, which makes them megabytes in size and able to start in seconds. As a result, containers offer higher density and faster startup, while VMs provide stronger isolation because they do not share a kernel.
4. What are the main components of Docker's architecture?
Docker uses a client-server architecture consisting of the Docker client, the Docker daemon (dockerd), and registries. The client sends commands such as docker run to the daemon, which builds, runs, and manages images and containers. Registries like Docker Hub store and distribute images, and the daemon pulls images from them when needed.
5. What is the Docker daemon?
The Docker daemon, known as dockerd, is a background process that listens for Docker API requests and manages Docker objects including images, containers, networks, and volumes. It performs the heavy lifting of building images, creating containers, and handling their lifecycle. The daemon can also communicate with other daemons to manage Docker services across a cluster.
6. What is the difference between an image and a container?
An image is a read-only template containing the application code, libraries, and configuration needed to run a program, while a container is a running (or stopped) instance created from that image. You can think of the image as a class and the container as an object instantiated from it. Many containers can be launched from the same image, each with its own writable layer and isolated runtime state.
7. What does the docker run command do?
The docker run command creates a new container from a specified image and starts it in a single step. If the image is not available locally, Docker first pulls it from a registry before running it. Common flags include -d to run in detached mode, -p to publish ports, -e to set environment variables, and --name to assign a readable container name.
8. How do you list running and stopped containers?
You list currently running containers with docker ps, which shows their IDs, images, status, and published ports. Adding the -a flag (docker ps -a) displays all containers including those that have stopped or exited. These commands are essential for monitoring container state and retrieving IDs needed for further operations like docker logs or docker stop.
9. What is the container lifecycle?
A container moves through several states: created, running, paused, stopped (exited), and removed. The docker create command prepares a container without starting it, docker start runs it, docker pause and docker unpause suspend and resume its processes, docker stop terminates it gracefully (sending SIGTERM, then SIGKILL if it has not exited after a grace period, 10 seconds by default), and docker rm deletes it. Understanding these states helps you manage resources and troubleshoot containers that exit unexpectedly.
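The states and commands above can be sketched as a small state machine. This is a simplified Python model rather than Docker's implementation: the TRANSITIONS table and the next_state and run_commands helpers are hypothetical names, and some real transitions (for example forced removal with docker rm -f) are deliberately left out.

```python
# Simplified model of the container lifecycle (not Docker's real code).
# Keys are (current state, docker subcommand); values are the resulting state.
TRANSITIONS = {
    ("created", "start"): "running",
    ("running", "pause"): "paused",
    ("paused", "unpause"): "running",
    ("running", "stop"): "exited",
    ("exited", "start"): "running",
    ("created", "rm"): "removed",
    ("exited", "rm"): "removed",
}


def next_state(state, command):
    """Return the state after running the command, or raise if it is not allowed."""
    key = (state, command)
    if key not in TRANSITIONS:
        raise ValueError(f"cannot '{command}' a container that is {state}")
    return TRANSITIONS[key]


def run_commands(commands, state="created"):
    """Apply a sequence of commands, starting from a freshly created container."""
    for command in commands:
        state = next_state(state, command)
    return state


if __name__ == "__main__":
    print(run_commands(["start", "pause", "unpause", "stop", "rm"]))  # removed
```

In this model, trying to remove a running container raises an error, which mirrors the fact that a plain docker rm refuses to delete a container that is still running.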
10. What is Docker Hub?
Docker Hub is the default public registry where Docker images are stored, shared, and distributed. It hosts official images for popular software such as nginx, postgres, and python, as well as community and private repositories. Developers use docker pull to download images from Docker Hub and docker push to upload their own images after authenticating.
Images & Dockerfiles
11. What is a Dockerfile?
A Dockerfile is a plain-text file containing a sequence of instructions that Docker reads to build an image automatically. Each instruction, such as FROM, RUN, COPY, or CMD, defines a step in assembling the image. Using a Dockerfile makes image builds reproducible, version-controllable, and easy to share across a team.
12. What are the most common Dockerfile instructions?
FROM specifies the base image, RUN executes commands during the build, and COPY or ADD move files into the image. WORKDIR sets the working directory, ENV defines environment variables, EXPOSE documents the ports the container listens on, and CMD or ENTRYPOINT define the default command that runs when the container starts. Together these instructions describe exactly how to assemble and launch an application.
13. What is the difference between CMD and ENTRYPOINT?
CMD provides a default command, or default arguments, that are replaced by any arguments passed to docker run after the image name. ENTRYPOINT defines the executable that always runs: arguments passed to docker run are appended to it (when it is written in exec form) rather than replacing it, and it can only be changed with the --entrypoint flag. They are often combined, with ENTRYPOINT setting the fixed executable and CMD supplying default parameters that users can replace.
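The interaction between the two instructions can be sketched in a few lines of Python. This is an illustrative model of the exec (JSON array) form only; effective_command is a hypothetical helper, and the ping and nginx values are example inputs rather than anything taken from a particular image.

```python
def effective_command(entrypoint, cmd, run_args):
    """Return the argv a container starts with, for exec-form instructions.

    entrypoint: list from ENTRYPOINT (empty list if the image sets none)
    cmd:        list from CMD (empty list if the image sets none)
    run_args:   extra arguments given after the image name in docker run
    """
    # Arguments on the command line replace CMD entirely...
    args = run_args if run_args else cmd
    # ...but are appended to ENTRYPOINT, which stays fixed.
    return list(entrypoint) + list(args)


if __name__ == "__main__":
    # ENTRYPOINT ["ping"] combined with CMD ["localhost"]
    print(effective_command(["ping"], ["localhost"], []))
    print(effective_command(["ping"], ["localhost"], ["example.com"]))
    # CMD only: the whole default command is replaced
    print(effective_command([], ["nginx", "-g", "daemon off;"], ["sh"]))
```

The first two calls keep ping as the executable and only swap its argument, while the third shows that with no ENTRYPOINT the arguments to docker run become the entire command.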
14. What is the difference between COPY and ADD?
COPY simply copies files and directories from the build context into the image and is preferred for its predictability. ADD does the same but adds extra features: it can automatically extract local tar archives and fetch files from remote URLs. Because of these implicit behaviours, best practice is to use COPY unless you specifically need the archive-extraction capability of ADD.
15. What are image layers?
A Docker image is built as a stack of read-only layers, where each Dockerfile instruction that changes the filesystem creates a new layer. Layers are cached and shared between images, so unchanged layers do not need to be rebuilt or re-downloaded, which speeds up builds and reduces storage. When a container runs, Docker adds a thin writable layer on top of the read-only image layers.
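One way to picture the layer stack is as a series of dictionaries searched from top to bottom. The sketch below uses Python's collections.ChainMap as a stand-in for a union filesystem; the layer contents and the start_container helper are made up for illustration, and real storage drivers work on files and directories rather than dictionaries.

```python
from collections import ChainMap

# Read-only image layers, lowest first (contents are made up).
BASE = {"/etc/os-release": "alpine", "/bin/sh": "shell"}
DEPS = {"/usr/lib/libexample.so": "library"}
APP = {"/app/main.py": "application code"}
IMAGE_LAYERS = (BASE, DEPS, APP)


def start_container(image_layers):
    """Return a container view: an empty writable layer over the image layers."""
    writable = {}
    # ChainMap looks up keys left to right, so the topmost layer comes first.
    return ChainMap(writable, *reversed(image_layers))


if __name__ == "__main__":
    first = start_container(IMAGE_LAYERS)
    second = start_container(IMAGE_LAYERS)

    # Writes land in the first container's own writable layer only.
    first["/etc/os-release"] = "modified"

    print(first["/etc/os-release"])   # modified
    print(second["/etc/os-release"])  # alpine
    print(BASE["/etc/os-release"])    # alpine (the image layer is untouched)
```

Both containers read through to the same shared image layers, which is why launching many containers from one image costs little extra storage.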
16. How does Docker's build cache work?
During a build, Docker caches the result of each instruction and reuses it if the instruction and its inputs have not changed. If a layer changes, that layer and all subsequent layers are rebuilt, because each layer depends on the ones before it. To maximise cache efficiency, you should order Dockerfile instructions from least to most frequently changing, for example copying dependency manifests and installing dependencies before copying application source code.
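The invalidation rule can be modelled by giving each step a cache key that depends on its parent's key. The following Python sketch is a toy model under that assumption, not the builder's actual algorithm; layer_keys, rebuilt_steps, the two example instruction lists, and digest strings such as src-v1 are all hypothetical.

```python
import hashlib


def layer_keys(steps):
    """Compute a chained cache key per step.

    steps is a list of (instruction, input_digest) pairs, where input_digest
    stands in for a checksum of any files the instruction copies.
    """
    keys = []
    parent = ""
    for instruction, input_digest in steps:
        material = f"{parent}|{instruction}|{input_digest}"
        parent = hashlib.sha256(material.encode()).hexdigest()
        keys.append(parent)
    return keys


def rebuilt_steps(old_steps, new_steps):
    """Return the indexes of steps in new_steps that miss the cache."""
    cached = set(layer_keys(old_steps))
    return [i for i, key in enumerate(layer_keys(new_steps)) if key not in cached]


def dependencies_first(src):
    """Dependency manifest is copied and installed before the source."""
    return [
        ("FROM python:3.12-slim", ""),
        ("COPY requirements.txt .", "reqs-v1"),
        ("RUN pip install -r requirements.txt", ""),
        ("COPY . .", src),
    ]


def source_first(src):
    """Source is copied before dependencies are installed."""
    return [
        ("FROM python:3.12-slim", ""),
        ("COPY . .", src),
        ("RUN pip install -r requirements.txt", ""),
    ]


if __name__ == "__main__":
    # Only the application source changes between the two builds.
    print(rebuilt_steps(dependencies_first("src-v1"), dependencies_first("src-v2")))  # [3]
    print(rebuilt_steps(source_first("src-v1"), source_first("src-v2")))  # [1, 2]
```

With dependencies installed first, a source change reruns only the final copy; with the source copied first, the same change also forces the dependency installation to run again.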
17. What is a multi-stage build?
A multi-stage build uses multiple FROM statements in a single Dockerfile, where one stage compiles or builds the application and a later, leaner stage copies only the final artifacts. This pattern keeps build tools and intermediate files out of the final image, dramatically reducing its size and attack surface. It is especially valuable for compiled languages, where the build environment is large but the runtime needs only the resulting binary.
18. How do you tag a Docker image?
You tag an image using the format repository:tag, either at build time with the -t flag (for example, docker build -t myapp:1.0 .) or afterwards with docker tag. Tags identify specific versions of an image, such as 1.0, latest, or a Git commit hash. Because latest is simply the default tag and can point to a different image over time, using explicit version tags that you never reassign is recommended for reproducible deployments.
19. What is the .dockerignore file?
A .dockerignore file lists files and directories that should be excluded from the build context sent to the Docker daemon. Excluding items such as node_modules, .git, logs, and local secrets reduces the build context size, speeds up builds, and prevents sensitive or unnecessary files from being baked into the image. It works similarly to a .gitignore file but applies to Docker builds.
20. What are some best practices for writing efficient Dockerfiles?
Use small, official base images such as Alpine variants to reduce size, and combine related RUN commands to minimise the number of layers. Leverage build caching by ordering instructions from least to most frequently changing, use multi-stage builds to discard build dependencies, and add a .dockerignore file. Finally, run the application as a non-root user and pin specific versions to keep images secure and reproducible.
Docker Networking & Volumes
21. What network drivers does Docker provide?
Docker includes several built-in network drivers: bridge, the default for standalone containers on a single host; host, which removes network isolation and uses the host's network stack directly; and none, which disables networking. For multi-host communication, the overlay driver connects containers across a Docker Swarm, and macvlan assigns containers their own MAC addresses to appear as physical devices on the network.
22. What is the default bridge network?
The default bridge network is a private internal network created by Docker on each host, to which standalone containers attach unless told otherwise. Containers on the default bridge can communicate by IP address but not by container name, and they are isolated from the host network except through published ports. Creating a custom (user-defined) bridge network is preferred because it provides automatic DNS-based service discovery by container name.
23. How do you expose a container's port to the host?
You publish a port using the -p flag with docker run, mapping a host port to a container port in the form -p hostPort:containerPort, for example -p 8080:80. This allows external traffic reaching the host on port 8080 to be forwarded to port 80 inside the container. The EXPOSE instruction in a Dockerfile only documents the intended port and does not actually publish it.
24. How do containers communicate with each other?
Containers attached to the same user-defined bridge network can reach one another by container name thanks to Docker's built-in DNS. This lets, for example, an application container connect to a database container using a hostname like db instead of a hardcoded IP. For containers on different hosts, an overlay network or an orchestrator such as Kubernetes provides cross-host service discovery and routing.
25. What is a Docker volume?
A Docker volume is the preferred mechanism for persisting data generated and used by containers, stored in a location managed by Docker on the host. Because volumes exist independently of the container lifecycle, the data survives when a container is removed or recreated. Volumes are also easier to back up, migrate, and share between containers than data stored in the container's writable layer.
26. What is the difference between volumes and bind mounts?
Volumes are fully managed by Docker and stored in Docker's storage area, making them portable and the recommended choice for persistent application data. Bind mounts map a specific file or directory from the host filesystem directly into the container, giving you precise control but tying the container to the host's directory structure. Bind mounts are useful during development for live code reloading, while volumes are better for production data.
27. Why is storing data in a container's writable layer discouraged?
Data written to a container's writable layer is ephemeral and is lost when the container is removed, making it unsuitable for important state. The writable layer also relies on a storage driver, which adds overhead and reduces I/O performance compared with volumes. For these reasons, persistent data such as databases and uploads should always be placed in volumes or bind mounts.
28. What is Docker Compose?
Docker Compose is a tool for defining and running multi-container applications using a single declarative YAML file, typically named compose.yaml. In that file you describe services, networks, and volumes, then bring the whole application up or down with docker compose up and docker compose down. Compose is ideal for local development and testing because it captures the entire application topology in one version-controlled file.
29. How do you view the logs of a container?
You inspect a container's output using docker logs <container>, which shows everything the application wrote to standard output and standard error. Adding -f follows the log stream in real time, and --tail limits output to the most recent lines. Writing logs to stdout and stderr rather than to files inside the container is a recommended practice, because it lets external systems collect and centralise them.
30. How do you remove unused Docker objects to reclaim space?
The docker system prune command removes stopped containers, dangling images, unused networks, and unused build cache in one operation. Adding -a also removes images not referenced by any container, and --volumes additionally prunes unused volumes (in recent Docker releases, only anonymous ones). Regularly pruning unused objects prevents disk space from being consumed by leftover artifacts on development and build machines.
Kubernetes Architecture
31. What is Kubernetes?
Kubernetes is an open-source container orchestration platform that automates the deployment, scaling, and management of containerised applications across clusters of machines. It groups containers into logical units, handles scheduling, self-healing, load balancing, and rolling updates, abstracting away much of the manual operational work. Originally developed by Google, it is now maintained by the Cloud Native Computing Foundation and has become the industry standard for orchestration.
32. Why do we need container orchestration?
Running a handful of containers by hand is manageable, but production systems may run hundreds or thousands across many machines, requiring automated scheduling, scaling, and recovery. Orchestration tools like Kubernetes ensure the desired number of containers are always running, replace failed ones, distribute traffic, and roll out updates without downtime. This automation makes large-scale containerised systems reliable, resilient, and far less labour-intensive to operate.
33. What are the main components of a Kubernetes cluster?
A Kubernetes cluster consists of a control plane and a set of worker nodes. The control plane makes global decisions and includes the API server, scheduler, controller manager, and the etcd datastore. Each worker node runs the kubelet, a container runtime, and the kube-proxy, and is where the actual application containers run inside pods.
34. What is the role of the kube-apiserver?
The kube-apiserver is the front end of the control plane and the central component through which all cluster communication flows. It exposes the Kubernetes REST API, validates and processes requests, and is the only component that reads from and writes to etcd. Every other component, as well as kubectl, interacts with the cluster by sending requests to the API server.
35. What is etcd in Kubernetes?
etcd is a consistent, distributed key-value store that serves as the single source of truth for all cluster data and state. It holds configuration, the desired and current state of objects, secrets, and metadata, allowing the cluster to be reconstructed if components restart. Because it is so critical, etcd is typically run in a highly available, replicated configuration and backed up regularly.
36. What does the kube-scheduler do?
The kube-scheduler watches for newly created pods that have no assigned node and selects the best node for each one to run on. It makes this decision based on resource requirements, hardware and software constraints, affinity and anti-affinity rules, and current node utilisation. Once it chooses a node, it records the assignment via the API server, and the node's kubelet then starts the pod.
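The selection process can be pictured as filtering out nodes that cannot run the pod and then scoring the rest. This Python example is a deliberately small model: the node and pod dictionaries, the schedule function, and the scoring rule (prefer the node with the most CPU left over) are assumptions for illustration, and the real scheduler weighs many more factors.

```python
def schedule(pod, nodes):
    """Return the name of the chosen node, or None if no node fits."""
    selector = pod.get("node_selector", {})

    # Filtering: keep nodes with enough free resources and matching labels.
    feasible = [
        node for node in nodes
        if node["free_cpu"] >= pod["cpu"]
        and node["free_mem"] >= pod["mem"]
        and all(node["labels"].get(k) == v for k, v in selector.items())
    ]
    if not feasible:
        return None  # in a real cluster the pod would stay Pending

    # Scoring: prefer the node that keeps the most CPU free after placement.
    best = max(feasible, key=lambda node: node["free_cpu"] - pod["cpu"])
    return best["name"]


if __name__ == "__main__":
    nodes = [
        {"name": "node-a", "free_cpu": 2.0, "free_mem": 4096, "labels": {"disk": "ssd"}},
        {"name": "node-b", "free_cpu": 6.0, "free_mem": 8192, "labels": {"disk": "hdd"}},
        {"name": "node-c", "free_cpu": 0.5, "free_mem": 512, "labels": {"disk": "ssd"}},
    ]
    print(schedule({"cpu": 1.0, "mem": 1024}, nodes))  # node-b
    print(schedule({"cpu": 1.0, "mem": 1024, "node_selector": {"disk": "ssd"}}, nodes))  # node-a
    print(schedule({"cpu": 8.0, "mem": 1024}, nodes))  # None
```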
37. What is the kubelet?
The kubelet is an agent that runs on every worker node and ensures that the containers described in the pod specifications are running and healthy. It receives pod definitions from the API server, instructs the container runtime to start or stop containers accordingly, and reports node and pod status back to the control plane. The kubelet only manages containers created by Kubernetes, not arbitrary containers on the host.
38. What is the role of kube-proxy?
kube-proxy runs on each node and maintains the network rules that allow communication to and from pods, implementing part of the Kubernetes Service abstraction. It routes traffic destined for a Service's virtual IP to one of the healthy backing pods, typically using iptables or IPVS rules. This enables stable, load-balanced access to a set of pods even as individual pods are created and destroyed.
39. What is the difference between the control plane and worker nodes?
The control plane is the brain of the cluster, making decisions about scheduling, scaling, and maintaining desired state through components like the API server and scheduler. Worker nodes are the machines that actually run application workloads inside pods, managed by the kubelet and kube-proxy. In production, the control plane is usually replicated across multiple machines for high availability while worker nodes are added or removed to adjust capacity.
40. What is kubectl?
kubectl is the command-line tool used to interact with a Kubernetes cluster by communicating with the API server. It lets you create, inspect, update, and delete cluster resources using commands such as kubectl apply, kubectl get, kubectl describe, and kubectl delete. It reads cluster connection details from a kubeconfig file, allowing you to target different clusters and contexts.
Pods, Deployments & Services
41. What is a pod in Kubernetes?
A pod is the smallest deployable unit in Kubernetes and represents one or more tightly coupled containers that share the same network namespace and can share storage volumes. Containers in a pod share an IP address and can communicate over localhost, and they are always scheduled together on the same node. Most pods run a single container, but multi-container pods are used for helper patterns such as sidecars.
42. Why are pods considered ephemeral?
Pods are designed to be disposable: they can be created, destroyed, and rescheduled at any time in response to failures, scaling, or node maintenance. When a pod dies it is not resurrected; instead a controller creates a brand-new pod with a different IP address. This is why applications rely on higher-level objects like Deployments and stable Services rather than addressing individual pods directly.
43. What is a Deployment?
A Deployment is a controller that manages a set of identical pods and ensures the desired number of replicas are always running. It provides declarative updates, allowing you to change the image or configuration and have Kubernetes roll out the change gradually while keeping the application available. Deployments also support rollbacks, so you can quickly revert to a previous known-good version if an update fails.
44. What is a ReplicaSet?
A ReplicaSet ensures that a specified number of identical pod replicas are running at any given time, recreating pods that fail or are deleted. While you can create ReplicaSets directly, they are normally managed automatically by Deployments, which create and update ReplicaSets behind the scenes. This layered design lets Deployments handle versioned rollouts while ReplicaSets handle maintaining the replica count.
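Maintaining a replica count is an example of a reconciliation loop: compare the observed state with the desired state and correct the difference. The Python sketch below shows a single pass of such a loop; the reconcile function, the pod dictionaries, and the web- naming scheme are hypothetical and far simpler than the real controller.

```python
import itertools

_suffix = itertools.count(1)  # source of unique names for replacement pods


def reconcile(desired, pods):
    """Return the pod list after one reconciliation pass.

    Failed pods are dropped, missing replicas are created, and surplus
    replicas are removed so that exactly `desired` pods remain.
    """
    running = [pod for pod in pods if pod["phase"] == "Running"]
    while len(running) < desired:
        running.append({"name": f"web-{next(_suffix)}", "phase": "Running"})
    return running[:desired]


if __name__ == "__main__":
    observed = [
        {"name": "web-x", "phase": "Running"},
        {"name": "web-y", "phase": "Failed"},
    ]
    after = reconcile(3, observed)
    print(len(after))                # 3
    print(len(reconcile(1, after)))  # 1 (scaling down removes surplus pods)
```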
45. What is a Kubernetes Service and why is it needed?
A Service is an abstraction that defines a stable network endpoint and load-balances traffic across a dynamic set of pods selected by labels. Because pod IP addresses change as pods are recreated, a Service provides a constant virtual IP and DNS name so clients do not need to track individual pods. This decouples consumers from the lifecycle of the pods, enabling reliable internal and external communication.
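Label-based selection is what lets the set of backends change without clients noticing. As a rough Python illustration, the hypothetical endpoints function below recomputes the backend IPs from whatever pods currently match the selector; the pod names, labels, and IP addresses are invented for the example.

```python
def endpoints(selector, pods):
    """Return the IPs of pods whose labels match every key in the selector."""
    return [
        pod["ip"] for pod in pods
        if all(pod["labels"].get(k) == v for k, v in selector.items())
    ]


if __name__ == "__main__":
    selector = {"app": "web"}
    pods = [
        {"name": "web-1", "ip": "10.0.0.4", "labels": {"app": "web"}},
        {"name": "web-2", "ip": "10.0.0.5", "labels": {"app": "web"}},
        {"name": "db-1", "ip": "10.0.0.9", "labels": {"app": "db"}},
    ]
    print(endpoints(selector, pods))  # ['10.0.0.4', '10.0.0.5']

    # web-2 is replaced by a new pod with a different IP; the selector is unchanged.
    pods[1] = {"name": "web-3", "ip": "10.0.0.7", "labels": {"app": "web"}}
    print(endpoints(selector, pods))  # ['10.0.0.4', '10.0.0.7']
```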
46. What are the main types of Services?
ClusterIP is the default and exposes the Service on an internal cluster IP reachable only within the cluster. NodePort exposes the Service on a static port on every node, making it accessible from outside, and LoadBalancer provisions an external load balancer through the cloud provider. There is also a special ExternalName type that maps a Service to an external DNS name rather than to pods.
47. What is an Ingress?
An Ingress is an API object that manages external HTTP and HTTPS access to Services within a cluster, providing routing rules based on hostnames and URL paths. It allows multiple Services to be exposed through a single external IP and supports features like TLS termination and virtual hosting. An Ingress controller, such as NGINX or Traefik, must be running in the cluster to actually fulfil the rules defined in Ingress objects.
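Host and path routing can be sketched as a lookup over a list of rules. The Python below approximates prefix-style matching, comparing paths segment by segment and preferring the longest match; the route function, the rule dictionaries, and the example hostnames are assumptions, and real behaviour depends on the Ingress controller in use.

```python
def _segments(path):
    """Split a URL path into its non-empty segments."""
    return [part for part in path.split("/") if part]


def route(rules, host, path):
    """Return the Service name for a request, or None if no rule matches.

    Matching is by exact host, then by path prefix compared segment by
    segment, with the longest matching prefix winning.
    """
    request = _segments(path)
    best = None
    for rule in rules:
        prefix = _segments(rule["path"])
        if rule["host"] == host and request[:len(prefix)] == prefix:
            if best is None or len(prefix) > len(_segments(best["path"])):
                best = rule
    return best["service"] if best else None


if __name__ == "__main__":
    rules = [
        {"host": "shop.example.com", "path": "/", "service": "frontend"},
        {"host": "shop.example.com", "path": "/api", "service": "api"},
    ]
    print(route(rules, "shop.example.com", "/api/orders"))  # api
    print(route(rules, "shop.example.com", "/apiary"))      # frontend
    print(route(rules, "other.example.com", "/"))           # None
```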
Scaling, Config & Best Practices
48. How does Kubernetes perform autoscaling?
Kubernetes supports several forms of autoscaling, the most common being the Horizontal Pod Autoscaler, which adds or removes pod replicas based on observed metrics such as CPU or memory usage. The Vertical Pod Autoscaler adjusts the resource requests and limits of pods, while the Cluster Autoscaler adds or removes nodes when there is insufficient or excess capacity. Together these mechanisms keep applications responsive under varying load while controlling cost.
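The Horizontal Pod Autoscaler's core calculation is documented as desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue). The sketch below applies that formula in Python; the function name, the minimum and maximum defaults, and the sample numbers are illustrative, and the real controller also applies a tolerance and stabilisation behaviour that are omitted here.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10):
    """Scale the replica count in proportion to how far the metric is from target."""
    raw = math.ceil(current_replicas * current_metric / target_metric)
    # Clamp to the configured bounds, as the autoscaler never goes outside them.
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    # 4 replicas averaging 90% CPU against a 60% target: scale up.
    print(desired_replicas(4, 90, 60))   # 6
    # 4 replicas averaging 30% CPU against a 60% target: scale down.
    print(desired_replicas(4, 30, 60))   # 2
    # A large spike is capped at max_replicas.
    print(desired_replicas(3, 200, 50))  # 10
```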
49. What are ConfigMaps and Secrets?
ConfigMaps store non-confidential configuration data as key-value pairs, allowing you to decouple configuration from container images and inject it as environment variables or mounted files. Secrets serve the same purpose for sensitive data such as passwords, tokens, and keys, and can be protected with tighter access controls. Note that Secret values are only base64-encoded, which is an encoding rather than encryption; by default they are stored unencrypted in etcd unless encryption at rest is enabled. Using these objects keeps configuration externalised and avoids hardcoding settings into images.
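A quick way to see what base64 encoding does and does not provide is to round-trip a value in Python. The two helper functions and the sample value are hypothetical; the point is only that decoding requires no key.

```python
import base64


def encode_secret_value(plaintext):
    """Encode a value the way it appears in a Secret manifest's data field."""
    return base64.b64encode(plaintext.encode()).decode()


def decode_secret_value(encoded):
    """Reverse the encoding; no key or password is needed."""
    return base64.b64decode(encoded).decode()


if __name__ == "__main__":
    encoded = encode_secret_value("admin")
    print(encoded)                       # YWRtaW4=
    print(decode_secret_value(encoded))  # admin
```

Anyone who can read the Secret object can recover the original value this way, which is why access to Secrets should be restricted.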
50. What are liveness and readiness probes?
Liveness and readiness probes are health checks the kubelet performs on containers to manage their lifecycle. A liveness probe determines whether a container is still functioning; if it fails repeatedly, Kubernetes restarts the container, which helps it recover from conditions such as deadlocks. A readiness probe determines whether a container is ready to receive traffic; while it is failing, the pod is left out of Service endpoints so requests are not routed to an application that is still starting up or temporarily unable to serve.
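The different consequences of the two probes can be summarised in a short decision function. This Python sketch is a simplification: kubelet_decision and its return strings are hypothetical, and although a failure threshold of 3 matches the documented default, real probes also have periods, timeouts, and initial delays that are not modelled here.

```python
def kubelet_decision(consecutive_liveness_failures, ready, failure_threshold=3):
    """Return what happens to a container given its latest probe results."""
    # Liveness: enough consecutive failures means the container is restarted.
    if consecutive_liveness_failures >= failure_threshold:
        return "restart container"
    # Readiness: a live but unready pod simply receives no Service traffic.
    if not ready:
        return "exclude pod from Service endpoints"
    return "route traffic to pod"


if __name__ == "__main__":
    print(kubelet_decision(0, ready=False))  # exclude pod from Service endpoints
    print(kubelet_decision(0, ready=True))   # route traffic to pod
    print(kubelet_decision(3, ready=True))   # restart container
```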