
December 26, 2019

Top 20 Docker Interview Questions & Answers



Ques: 1. What do you understand by Docker?
 
Ans: Docker is an open-source lightweight containerization technology. It has gained widespread popularity in the cloud and application packaging world. It allows you to automate the deployment of applications in lightweight and portable containers.


Ques: 2. What is Docker image?

Ans: A Docker image is used to create Docker containers. You build an image with the docker build command; when you run the image, Docker creates a container from it. Docker images are stored in a Docker registry.


Ques: 3. What is Virtualization?

Ans: Virtualization was originally a method of logically dividing mainframes to allow multiple applications to run simultaneously.

This scenario changed when companies and open-source communities found ways to handle privileged instructions, which allowed multiple operating systems to run simultaneously on a single x86-based system.


Ques: 4. Can you explain the process of scaling your Docker containers?
 
Ans: Docker containers can be scaled to any level, from a few hundred to thousands or even millions of containers. The only condition is that the containers must have the memory and the OS available at all times, and there should be no resource constraints while Docker is being scaled.


Ques: 5. What is CNM?

Ans: CNM stands for Container Networking Model. It is a standard or specification from Docker, Inc. that forms the basis of container networking in a Docker environment. Docker's approach provides container networking with support for multiple network drivers.


Ques: 6. Can you explain the Docker Trusted Registry?

Ans: Docker Trusted Registry is the enterprise-grade image storage tool for Docker. You should install it behind your firewall so that you can securely manage the Docker images you use in your applications.


Ques: 7. How do you run multiple copies of a Compose file on the same host?

Ans: Compose uses the project name to create unique identifiers for all of a project's containers and other resources. To run multiple copies of a project, set a custom project name using the -p command-line option or the COMPOSE_PROJECT_NAME environment variable.
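For example, a minimal sketch (the custom project names copy2 and copy3 are illustrative assumptions):

 # First copy, using the default project name (the directory name)
 docker-compose up -d

 # Second copy of the same Compose file, isolated under a custom project name
 docker-compose -p copy2 up -d

 # Equivalent, using the environment variable instead of the flag
 COMPOSE_PROJECT_NAME=copy3 docker-compose up -d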


Ques: 8. What is Hypervisor?

Ans: A hypervisor is software that makes virtualization possible. It is also called a Virtual Machine Monitor. It divides the host system and allocates resources to each divided virtual environment. You can basically have multiple operating systems on a single host system. There are two types of hypervisors:

Type 1: It’s also called Native Hypervisor or Bare metal Hypervisor. It runs directly on the underlying host system. It has direct access to your host’s system hardware and hence does not require a base server operating system.

Type 2: This kind of hypervisor makes use of the underlying host operating system. It’s also called Hosted Hypervisor.


Ques: 9. What is containerization?

Ans: In the software development process, code developed on one machine might not work perfectly on another machine because of missing dependencies. This problem is solved by the concept of containerization.

So basically, an application that is being developed and deployed is bundled and wrapped together with all its configuration files and dependencies. This bundle is called a container. When you wish to run the application on another system, you deploy the container, which gives a bug-free environment because all the dependencies and libraries are wrapped together. The best-known containerization platform is Docker, and Kubernetes is the most popular tool for orchestrating containers.


Ques: 10. What is the difference between virtualization and containerization?

Ans: Once you’ve explained containerization and virtualization, the next expected question is about the differences. The question could be either the difference between virtualization and containerization or the difference between virtual machines and containers. Either way, this is how you respond.

Containers provide an isolated environment for running the application. The entire user space is explicitly dedicated to the application. Any changes made inside the container are never reflected on the host or on other containers running on the same host. Containers are an abstraction of the application layer. Each container is a different application.

In virtualization, by contrast, hypervisors provide an entire virtual machine to the guest (including the kernel). Virtual machines are an abstraction of the hardware layer; each VM behaves like a separate physical machine.


Ques: 11. Explain Docker Architecture?

Ans: The Docker architecture consists of the Docker Engine, which is a client-server application with three major components:

  1. A server, which is a type of long-running program called a daemon process (the dockerd command). 
  2. A REST API, which specifies interfaces that programs can use to talk to the daemon and instruct it what to do. 
  3. A command-line interface (CLI) client (the docker command). 

The CLI uses the Docker REST API to control or interact with the Docker daemon through scripting or direct CLI commands. Many other Docker applications use the underlying API and CLI.
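A quick way to see this client-server split is the docker version command, which reports the client and the server separately (output abridged; details will differ on your machine):

 docker version
 # Client: ...          <- the docker CLI
 # Server: ...          <- the dockerd daemon, reached over the REST API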

  
Ques: 12. What is a Dockerfile?

Ans: Let’s start with a brief explanation of the Dockerfile and then give examples and commands to support the answer.

Docker can build images automatically by reading the instructions from a file called Dockerfile. A Dockerfile is a text document that contains all the commands a user could call on the command line to assemble an image. Using docker build, users can create an automated build that executes several command-line instructions in succession.
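For instance, a minimal sketch of a Dockerfile (the app.py file, the flask dependency, and the python:3.11-slim base image are illustrative assumptions):

 # Base image for subsequent instructions
 FROM python:3.11-slim
 # Working directory inside the image
 WORKDIR /app
 # Copy the application code into the image
 COPY app.py .
 # Install dependencies in a new layer
 RUN pip install flask
 # Default command when a container starts from this image
 CMD ["python", "app.py"]

You would then build the image with a command such as docker build -t myapp . (where myapp is an image tag of your choosing).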


Ques: 13. What is the life cycle of a Docker Container?
Ans: This is one of the most popular questions asked in Docker interviews. Docker containers have the following life cycle (a sketch of the corresponding CLI commands follows the list):

  • Create a container 
  • Run the container 
  • Pause the container(optional) 
  • Un-pause the container(optional) 
  • Start the container 
  • Stop the container 
  • Restart the container 
  • Kill the container 
  • Destroy the container
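As a sketch, the corresponding CLI commands look like this (the container name mycontainer and the nginx image are illustrative assumptions):

 docker create --name mycontainer nginx   # create
 docker start mycontainer                 # start/run
 docker pause mycontainer                 # pause (optional)
 docker unpause mycontainer               # un-pause (optional)
 docker stop mycontainer                  # stop
 docker restart mycontainer               # restart
 docker kill mycontainer                  # kill
 docker rm mycontainer                    # destroy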


Ques: 14. How do you scale your Docker containers?
Ans: Docker containers can be scaled to any level, starting from a few hundred to even thousands or millions of containers. The only condition is that the containers must have the memory and the OS available at all times, and there should be no resource constraints while Docker is being scaled.


Ques: 15. What is the difference between the COPY and ADD commands in a Dockerfile?

Ans: Although ADD and COPY are functionally similar, generally speaking, COPY is preferred. 

That’s because it’s more transparent than ADD. COPY only supports the basic copying of local files into the container, while ADD has some features (like local-only tar extraction and remote URL support) that are not immediately obvious. Consequently, the best use for ADD is local tar file auto-extraction into the image, as in ADD rootfs.tar.xz /.
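A small sketch of the difference (the file names and URL are illustrative assumptions):

 # COPY: a plain copy of local files into the image
 COPY config.yml /etc/myapp/config.yml
 # ADD: local tar archives are auto-extracted into the target directory
 ADD rootfs.tar.xz /
 # ADD also accepts remote URLs, though downloading in a RUN step is usually preferred
 ADD https://example.com/archive.tar.gz /tmp/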


Ques: 16. What are the most common instructions in Dockerfile?

Ans: Some of the common instructions in Dockerfile are as follows:

  • FROM: We use FROM to set the base image for subsequent instructions. In every valid Dockerfile, FROM is the first instruction. 
  • LABEL: We use LABEL to organize our images by project, module, licensing, etc. We can also use LABEL to help with automation. In LABEL we specify a key-value pair that can later be used to handle images programmatically (see the sketch after this list). 
  • RUN: We use the RUN instruction to execute commands in a new layer on top of the current image. With each RUN instruction we add something on top of the image and use it in subsequent steps of the Dockerfile. 
  • CMD: We use the CMD instruction to provide default values for an executing container. If a Dockerfile includes multiple CMD instructions, only the last one is used.
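For example, a minimal sketch of using labels programmatically (the project=myapp label is an illustrative assumption):

 # In the Dockerfile
 LABEL project="myapp"

 # Later, filter images or containers by that label from the CLI
 docker images --filter "label=project=myapp"
 docker ps --filter "label=project=myapp"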


Ques: 17. Can you explain the basic Docker usage workflow?

Ans:
  1. Everything starts with the Dockerfile. The Dockerfile is the source code of the image. 
  2. Once the Dockerfile is created, you build it to create the image of the container. The image is just the "compiled version" of the "source code", which is the Dockerfile. 
  3. Once you have the image of the container, you can distribute it using a registry. The registry is like a git repository: you can push and pull images. 
  4. Next, you can use the image to run containers. A running container is, in many respects, very similar to a virtual machine (but without the hypervisor). See the command sketch below.
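As a sketch, the workflow maps onto commands like these (the image name and registry host are illustrative assumptions):

 docker build -t registry.example.com/myapp:1.0 .             # steps 1-2: build the image from the Dockerfile
 docker push registry.example.com/myapp:1.0                   # step 3: push the image to a registry
 docker pull registry.example.com/myapp:1.0                   # step 3: pull it on another machine
 docker run -d --name myapp registry.example.com/myapp:1.0    # step 4: run a container from the image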

 
Ques: 18. What are some similarities between Virtual Machines & Docker?

Ans: Docker is not just like a virtual machine: it uses the host kernel and can’t boot a different operating system. Still, there are five notable similarities between Docker and virtual machines:

  • A process in one VM can’t see processes in other VMs; likewise, a process in one container can’t see processes in other containers. 
  • Each VM has its own root filesystem; each container has its own root filesystem (but not its own kernel). 
  • Each VM gets its own virtual network adapter; a Docker container can also get a virtual network adapter, with its own IP address and ports. 
  • A VM is a running instance of physical files (.VMX and .VMDK); a Docker container is a running instance of a Docker image. 
  • The host OS can be different from the guest OS; similarly, the host OS can be different from the container OS.


Ques: 19. What are some of the differences between Docker container & Virtual Machine?

Ans: The differences between Docker containers and virtual machines are as follows:

  • Each virtual machine runs its own OS; all containers share the kernel of the host. 
  • VM boot-up time is measured in minutes; containers instantiate in seconds. 
  • VM snapshots are used sparingly; Docker images are built incrementally on top of one another, like layers. 
  • VM snapshots cannot be diffed effectively and are not version-controlled; Docker images can be diffed and version-controlled (Docker Hub is like GitHub). 
  • You cannot run more than a couple of VMs on an average laptop; you can run many Docker containers on a laptop. 
  • Only one VM can be started from one set of VMX and VMDK files; multiple Docker containers can be started from one Docker image.



Ques: 20. Can I use JSON instead of YAML for my Docker Compose file? 

Ans: Yes. YAML is a superset of JSON, so any JSON file should be valid YAML. To use a JSON file with Compose, specify the filename to use, for example:
 
 docker-compose -f docker-compose.json up








December 21, 2019

Top 20 Machine Learning Interview Questions and Answers


Ques: 1. What is the difference between supervised and unsupervised machine learning?

Answer:

Supervised learning requires labelled training data. For example, in order to do classification (a supervised learning task), you’ll first need to label the data you’ll use to train the model to classify data into your labelled groups. Unsupervised learning, in contrast, does not require labelling data explicitly.
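As a minimal sketch of the contrast (using scikit-learn; the iris data and model choices are illustrative assumptions):

 from sklearn.datasets import load_iris
 from sklearn.linear_model import LogisticRegression
 from sklearn.cluster import KMeans

 X, y = load_iris(return_X_y=True)

 # Supervised: training requires the labels y
 clf = LogisticRegression(max_iter=1000).fit(X, y)

 # Unsupervised: only the features X are used, no labels
 km = KMeans(n_clusters=3, n_init=10).fit(X)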

 

Ques: 2. What is Overfitting? And how do you ensure you’re not overfitting with a model?

Answer:

Over-fitting occurs when a model studies the training data to such an extent that it negatively influences the performance of the model on new data. This means that noise or disturbance in the training data is recorded and learned as concepts by the model. The problem is that these concepts do not apply to the testing data, which negatively impacts the model’s ability to classify new data and reduces its accuracy on the testing data.

To avoid overfitting, collect more data so that the model can be trained with varied samples, or use ensembling methods such as Random Forest. Random Forest is based on the idea of bagging, which reduces the variance of the predictions by combining the results of multiple decision trees built on different samples of the data set.

 

Ques: 3. What do you understand by precision and recall?

Answer:

Recall is also known as the true positive rate: the number of positives your model claims compared to the actual number of positives there are throughout the data. Precision is also known as the positive predictive value: a measure of the number of accurate positives your model claims compared to the total number of positives it claims. It can be easier to think of recall and precision with an example: suppose a case contains 10 apples, and your model predicts 15 apples. You’d have perfect recall (there are actually 10 apples, and you identified all 10) but only 66.7% precision, because out of the 15 positives you predicted, only 10 (the actual apples) are correct.
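A quick sketch of the arithmetic for this example (plain Python, no libraries needed):

 true_positives = 10    # apples the model found that really are apples
 false_positives = 5    # items the model called apples that are not
 false_negatives = 0    # real apples the model missed

 precision = true_positives / (true_positives + false_positives)   # 10 / 15 = 0.667
 recall = true_positives / (true_positives + false_negatives)      # 10 / 10 = 1.0

 print(f"precision={precision:.3f}, recall={recall:.3f}")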

 

Ques: 4. What are collinearity and multicollinearity?

Answer:

Collinearity occurs when two predictor variables (e.g., x1 and x2) in a multiple regression are correlated with each other.

Multicollinearity occurs when more than two predictor variables (e.g., x1, x2, and x3) are inter-correlated.

 

Ques: 5. What’s the difference between Type I and Type II error?

Answer:

Don’t think that this is a trick question! Many machine learning interview questions will be an attempt to lob basic questions at you just to make sure you’re on top of your game and have covered all of your bases.

Type I error is a false positive, while Type II error is a false negative. Briefly stated, Type I error means claiming something has happened when it hasn’t, while Type II error means that you claim nothing is happening when in fact something is.

A clever way to think about this is to think of Type I error as telling a man he is pregnant, while Type II error means you tell a pregnant woman she isn’t carrying a baby.

 

Ques: 6. What is A/B Testing?

Answer:

A/B testing is statistical hypothesis testing for a randomized experiment with two variants, A and B. It is used to compare two models that use different predictor variables in order to check which one best fits a given sample of data.

Consider a scenario where you’ve created two models (using different predictor variables) that can be used to recommend products for an e-commerce platform.

A/B Testing can be used to compare these two models to check which one best recommends products to a customer.

 

Ques: 7. What is deep learning, and how does it contrast with other machine learning algorithms?

Answer:

Deep learning is a subset of machine learning that is concerned with neural networks: how to use backpropagation and certain principles from neuroscience to more accurately model large sets of unlabelled or semi-structured data. In that sense, deep learning can act as an unsupervised learning algorithm that learns representations of data through the use of neural nets.

 

Ques: 8. Name a few libraries in Python used for Data Analysis and Scientific Computations.

Answer:

Here is a list of Python libraries mainly used for Data Analysis: 

  • NumPy
  • SciPy
  • Pandas 
  • Scikit-learn
  • Matplotlib 
  • Seaborn 
  • Bokeh

 

Ques: 9. Which is more important to you: model accuracy or model performance?

Answer:

This question tests your grasp of the nuances of machine learning model performance! Machine learning interview questions often look towards the details. There are models with higher accuracy that can perform worse in predictive power — how does that make sense?

Well, it has everything to do with how model accuracy is only a subset of model performance, and at that, a sometimes misleading one. For example, if you wanted to detect fraud in a massive data set with a sample of millions, and only a tiny minority of cases were fraudulent, the most accurate model would most likely predict no fraud at all. However, that would be useless for a predictive model: a model designed to find fraud that asserted there was no fraud at all! Questions like this help you demonstrate that you understand model accuracy isn’t the be-all and end-all of model performance.

 

Ques: 10. How are NumPy and SciPy related?

Answer:

NumPy and SciPy are closely related: SciPy is built on top of NumPy. NumPy defines arrays along with some basic numerical functions like indexing, sorting, reshaping, etc.

SciPy implements computations such as numerical integration, optimization and machine learning using NumPy’s functionality.

 

Ques: 11. How would you handle an imbalanced dataset?

Answer:

An imbalanced dataset is one where, for example, in a classification task, 90% of the data is in one class. That leads to problems: an accuracy of 90% can be badly skewed if you have no predictive power on the other category of data! Here are a few tactics to get over the hump:

  1. Collect more data to even the imbalances in the dataset.
  2. Re-sample the dataset to correct for imbalances.
  3. Try a different algorithm altogether on your dataset.

What’s important here is that you have a keen sense for what damage an unbalanced dataset can cause, and how to balance that.
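For instance, a minimal sketch of tactic 2 (re-sampling) using scikit-learn’s resample utility (the toy arrays are illustrative assumptions):

 import numpy as np
 from sklearn.utils import resample

 X = np.arange(20).reshape(10, 2)   # toy features
 y = np.array([0] * 9 + [1])        # a 9:1 class imbalance

 # Upsample the minority class until it matches the majority class
 X_min, y_min = X[y == 1], y[y == 1]
 X_up, y_up = resample(X_min, y_min, replace=True, n_samples=9, random_state=0)

 X_balanced = np.vstack([X[y == 0], X_up])
 y_balanced = np.concatenate([y[y == 0], y_up])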

 

Ques: 12. Is rotation necessary in PCA? If yes, why? What will happen if you don’t rotate the components?

Answer:

Yes, rotation (orthogonal) is necessary because it maximizes the difference between the variance captured by the components. This makes the components easier to interpret. Not to forget, that’s the motive of doing PCA, where we aim to select fewer components (than features) that can explain the maximum variance in the data set. Rotation doesn’t change the relative locations of the components; it only changes the actual coordinates of the points.

If we don’t rotate the components, the effect of PCA will diminish and we’ll have to select more components to explain the variance in the data set.

 

Ques: 13. What’s the “kernel trick” and how is it useful?

Answer:

The kernel trick involves kernel functions that enable computation in higher-dimensional spaces without explicitly calculating the coordinates of points in those spaces: instead, kernel functions compute the inner products between the images of all pairs of data in a feature space. This gives them the very useful property of implicitly working with higher-dimensional coordinates while being computationally cheaper than calculating those coordinates explicitly. Many algorithms can be expressed in terms of inner products, so using the kernel trick lets us effectively run algorithms in a high-dimensional space with lower-dimensional data.
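As a small sketch of the idea, assuming a degree-2 polynomial kernel: the kernel value equals the inner product in an implicit expanded feature space, without ever constructing that space:

 import numpy as np

 x = np.array([1.0, 2.0])
 y = np.array([3.0, 4.0])

 # Kernel trick: compute (x . y)^2 directly in the original 2-D space
 k = np.dot(x, y) ** 2

 # Explicit feature map phi(a) = [a1^2, a2^2, sqrt(2)*a1*a2], for comparison
 def phi(a):
     return np.array([a[0]**2, a[1]**2, np.sqrt(2) * a[0] * a[1]])

 k_explicit = np.dot(phi(x), phi(y))
 print(k, k_explicit)   # both 121.0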

 

Ques: 14. Explain prior probability, likelihood, and marginal likelihood in the context of the Naive Bayes algorithm.

Answer:

Prior probability is nothing but the proportion of the dependent (binary) variable in the data set. It is the closest guess you can make about a class without any further information. For example: in a data set, the dependent variable is binary (1 and 0). The proportion of 1 (spam) is 70% and of 0 (not spam) is 30%. Hence, we can estimate a 70% chance that any new email will be classified as spam.

Likelihood is the probability of classifying a given observation as 1 in the presence of some other variable. For example: the probability that the word ‘FREE’ is used in a previous spam message is a likelihood. Marginal likelihood is the probability that the word ‘FREE’ is used in any message.

 

Ques: 15. Do you have experience with Spark or big data tools for machine learning?

Answer:

You’ll want to get familiar with the meaning of big data for different companies and the different tools they’ll want. Spark is the big data tool most in demand now, able to handle immense datasets with speed. Be honest if you don’t have experience with the tools demanded, but also take a look at job descriptions and see what tools pop up: you’ll want to invest in familiarizing yourself with them.

 

Ques: 16. You find that your model is suffering from low bias and high variance. Which algorithm should you use to tackle it? Why?

Answer:

Low bias occurs when the model’s predicted values are close to the actual values. In other words, the model becomes flexible enough to mimic the training data distribution. While that sounds like a great achievement, don’t forget that a flexible model has no generalization capabilities. It means that when this model is tested on unseen data, it gives disappointing results.

In such situations, we can use a bagging algorithm (like Random Forest) to tackle the high-variance problem. Bagging algorithms divide a data set into subsets made with repeated randomized sampling. Then, these samples are used to generate a set of models using a single learning algorithm. Later, the model predictions are combined using voting (classification) or averaging (regression); a short sketch of this follows the list below.

Also, to combat high variance, we can:

  • Use a regularization technique, where higher model coefficients get penalized, thereby lowering model complexity. 
  • Use the top n features from a variable importance chart. Maybe, with all the variables in the data set, the algorithm is having difficulty finding the meaningful signal.
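For example, a minimal sketch of the bagging approach with scikit-learn (the synthetic data and hyperparameters are illustrative assumptions):

 from sklearn.datasets import make_classification
 from sklearn.ensemble import RandomForestClassifier
 from sklearn.model_selection import cross_val_score

 X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

 # Random Forest: bagging over many decision trees to reduce variance
 rf = RandomForestClassifier(n_estimators=200, random_state=0)
 print(cross_val_score(rf, X, y, cv=5).mean())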

 

Ques: 17. Which data visualization libraries do you use? What are your thoughts on the best data visualization tools?

Answer:

What’s important here is to define your views on how to properly visualize data and your personal preferences when it comes to tools. Popular tools include R’s ggplot, Python’s seaborn and matplotlib, and tools such as Plot.ly and Tableau.

 

Ques: 18. How is kNN different from kmeans clustering?

Answer: 

Don’t get misled by the ‘k’ in their names. You should know that the fundamental difference between these two algorithms is:

  • kmeans is unsupervised in nature and kNN is supervised in nature.
  • kmeans is a clustering algorithm. kNN is a classification (or regression) algorithm.
  • kmeans algorithm partitions a data set into clusters such that a cluster formed is homogeneous and the points in each cluster are close to each other.

The kmeans algorithm tries to maintain enough separability between these clusters. Due to its unsupervised nature, the clusters have no labels.

The kNN algorithm tries to classify an unlabelled observation based on its k (which can be any number) surrounding neighbors. It is also known as a lazy learner because it involves minimal model training: it memorizes the training data and defers generalization to prediction time rather than building a general model up front.
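A side-by-side sketch in scikit-learn (the iris data and the k values are illustrative assumptions):

 from sklearn.datasets import load_iris
 from sklearn.cluster import KMeans
 from sklearn.neighbors import KNeighborsClassifier

 X, y = load_iris(return_X_y=True)

 # kmeans: unsupervised; uses only X, and the cluster ids it assigns are arbitrary
 clusters = KMeans(n_clusters=3, n_init=10).fit_predict(X)

 # kNN: supervised; needs y at fit time and predicts real class labels
 knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)
 predictions = knn.predict(X)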

 

Ques: 19. Is it better to have too many false positives or too many false negatives? Explain.

Answer:

It depends on the question as well as on the domain for which we are trying to solve the problem. If you’re using Machine Learning in the domain of medical testing, then a false negative is very risky, since the report will not show any health problem when a person is actually unwell. Similarly, if Machine Learning is used in spam detection, then a false positive is very risky because the algorithm may classify an important email as spam.

 

Ques: 20. What is the difference between Gini Impurity and Entropy in a Decision Tree?

Answer:

Gini Impurity and Entropy are the metrics used for deciding how to split a Decision Tree.

The Gini measurement is the probability of a random sample being classified incorrectly if you randomly pick a label according to the distribution in the branch.

Entropy is a measurement of the lack of information. You calculate the information gain (the difference in entropies before and after a split); this measure helps to reduce the uncertainty about the output label.
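A quick sketch of both formulas, Gini = 1 - sum(p_i^2) and entropy = -sum(p_i * log2(p_i)), for an example label distribution (the 70/30 split is an illustrative assumption):

 import numpy as np

 p = np.array([0.7, 0.3])            # class proportions in a branch

 gini = 1.0 - np.sum(p ** 2)         # 1 - (0.49 + 0.09) = 0.42
 entropy = -np.sum(p * np.log2(p))   # about 0.881 bits

 print(f"gini={gini:.3f}, entropy={entropy:.3f}")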