Top 20 Data Science Interview Questions and Answers

Ques: 1. What is the difference between Data Science and Data Analytics?

Answer:

Data scientists slice and model data to extract valuable insights that a data analyst can then apply to real-world business scenarios. The main difference between the two is that data scientists typically have more technical knowledge than data analysts. Data scientists also depend less on the detailed business understanding that analysts need for reporting and data visualization.

Ques: 2. How would you collect and analyse social media data to predict weather conditions?

Answer:

You can collect social media data through the APIs of Facebook, Twitter, and Instagram. For example, for Twitter, you can construct features from each tweet such as the tweet date, the number of retweets, and the list of followers. You can then use a multivariate time series model to predict the weather conditions.

Ques: 3. What is Cross-Validation?

Answer:

Cross-validation is a model validation technique for evaluating how the results of a statistical analysis will generalize to an independent data set. It is mainly used in settings where the objective is prediction and one wants to estimate how accurately a model will perform in practice. The goal of cross-validation is to set aside a data set on which to test the model during the training phase (i.e., the validation data set), in order to limit problems like overfitting and to gain insight into how the model will generalize to an independent data set.
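As a minimal sketch of the idea, the following pure-Python code splits sample indices into k folds and averages a validation score over them. The helper names (`k_fold_indices`, `cross_validate`, `fit_mean`, `mean_squared_error`) and the toy "predict the mean" model are assumptions made for illustration, and the folds are taken in order without shuffling.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


def cross_validate(xs, ys, k, fit, score):
    """Average the validation score of a model over k folds."""
    scores = []
    for train, validation in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        scores.append(score(model,
                            [xs[i] for i in validation],
                            [ys[i] for i in validation]))
    return sum(scores) / len(scores)


# Hypothetical toy model: always predict the mean of the training targets.
def fit_mean(train_x, train_y):
    return sum(train_y) / len(train_y)


def mean_squared_error(model, val_x, val_y):
    return sum((y - model) ** 2 for y in val_y) / len(val_y)


folds = list(k_fold_indices(6, 3))
cv_error = cross_validate([0, 1, 2, 3, 4, 5],
                          [1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
                          3, fit_mean, mean_squared_error)
```

Each sample lands in the validation set exactly once, so every observation contributes to the estimate of how the model behaves on data it was not trained on.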

Ques: 4. What are the Steps in Making a “Decision Tree”?

Answer:

The steps to make a “Decision Tree” are as follows:

1. Take the entire data set as input.
2. Look for a split that maximizes the separation of the classes. A split is any test that divides the data into two sets.
3. Apply the split to the input data (divide step).
4. Re-apply steps 1 to 3 to the divided data.
5. Stop when you meet some stopping criteria.
6. Clean up the tree if you went too far doing splits. This step is called pruning.
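The "look for a split" step can be sketched as follows. This is an illustrative example that assumes a single numeric feature and uses Gini impurity as the measure of class separation; the source does not prescribe a criterion, and the function names and toy data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted_impurity) of the best 'value <= threshold' test."""
    best_threshold, best_impurity = None, gini(labels)
    # The largest value is skipped because it would leave the right side empty.
    for threshold in sorted(set(values))[:-1]:
        left = [lab for val, lab in zip(values, labels) if val <= threshold]
        right = [lab for val, lab in zip(values, labels) if val > threshold]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if impurity < best_impurity:
            best_threshold, best_impurity = threshold, impurity
    return best_threshold, best_impurity


# Hypothetical toy data: one numeric feature and two classes.
heights = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
classes = ["a", "a", "a", "b", "b", "b"]
threshold, impurity = best_split(heights, classes)
```

A full tree builder would call `best_split` recursively on the two resulting subsets until a stopping criterion is met.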

Ques: 5. Can you explain Star Schema?

Answer:

It is a traditional database schema with a central fact table. Satellite tables map IDs to physical names or descriptions and are connected to the central fact table through the ID fields; these tables are known as lookup tables and are principally useful in real-time applications, as they save a lot of memory. Sometimes star schemas involve several layers of summarization to retrieve information faster.

Ques: 6. What are the various steps for a Data analytics project?

Answer:

The following are important steps involved in an analytics project:

Understand the Business problem.
Explore the data and study it carefully.
Prepare the data for modelling by finding missing values and transforming variables.
Run the model and analyse the results.
Validate the model with a new data set.
Implement the model and track the result to analyze the performance of the model for a specific period.

Ques: 7. Why is Data Cleansing essential, and which method do you use to maintain clean data? Explain.

Answer:

Dirty data often leads to incorrect insights, which can damage the prospects of any organization. For example, suppose you want to run a targeted marketing campaign, but your data incorrectly tells you that a specific product will be in demand with your target audience; the campaign will fail.

Ques: 8. What is reinforcement learning?

Answer:

Reinforcement learning is a learning mechanism for mapping situations to actions so as to maximize a numerical reward signal. The learner is not told which action to take; instead, it must discover which actions yield the maximum reward by trying them. The method is based on a reward/penalty mechanism.

Ques: 9. While working on a data set, how can you select important variables? Explain.

Answer:

You can use the following methods of variable selection:

Remove the correlated variables before selecting important variables.
Use linear regression and select variables based on their p-values.
Use Backward Selection, Forward Selection, and Stepwise Selection.
Use XGBoost or Random Forest and plot a variable importance chart.
Measure information gain for the given set of features and select top n features accordingly.
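The last method in the list can be sketched in a few lines. This is a minimal illustration for categorical features, using entropy-based information gain; the function names and the two toy features ("informative" and "noise") are made up for the example.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus the weighted entropy after grouping by the feature."""
    groups = defaultdict(list)
    for value, label in zip(feature_values, labels):
        groups[value].append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def top_n_features(features, labels, n):
    """Return the names of the n features with the highest information gain."""
    gains = {name: information_gain(values, labels) for name, values in features.items()}
    return sorted(gains, key=gains.get, reverse=True)[:n]


# Hypothetical toy data: one feature predicts the label, the other does not.
labels = ["yes", "yes", "no", "no"]
features = {
    "informative": ["x", "x", "y", "y"],
    "noise": ["p", "q", "p", "q"],
}
selected = top_n_features(features, labels, 1)
```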

Ques: 10. What cross-validation technique would you use on a time series dataset?

Answer:

Standard k-fold cross-validation is not appropriate here, because a time series is not randomly distributed data; it is inherently in chronological order.

For time series data, use a technique like forward chaining, where you train the model on past data and then test it on the data that follows.

fold 1: training[1], test[2]

fold 2: training[1 2], test[3]

fold 3: training[1 2 3], test[4]

fold 4: training[1 2 3 4], test[5]
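The fold layout above can be generated programmatically. The sketch below assumes the data has already been divided into consecutively numbered periods; `forward_chaining_folds` is a hypothetical helper name, not a library function.

```python
def forward_chaining_folds(n_periods):
    """Yield (train_periods, test_period) pairs; periods are numbered from 1."""
    for test_period in range(2, n_periods + 1):
        # Train on everything strictly before the test period.
        yield list(range(1, test_period)), test_period


folds = list(forward_chaining_folds(5))
for number, (train, test) in enumerate(folds, start=1):
    print(f"fold {number}: training{train}, test[{test}]")
```

Because the training window only ever grows forward in time, the model is never evaluated on data that precedes what it was trained on.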

Ques: 11. What is deep learning?

Answer:

Deep learning is a subfield of machine learning built on artificial neural networks, which are inspired by the structure and function of the brain. Machine learning includes many algorithms, such as linear regression, SVMs, and neural networks, and deep learning is an extension of neural networks. A conventional neural network uses a small number of hidden layers, whereas deep learning algorithms use a large number of hidden layers to better model the relationship between input and output.

Ques: 12. What is the difference between machine learning and deep learning?

Answer:

Machine learning:

Machine learning is a field of computer science that gives computers the ability to learn without being explicitly programmed. Machine learning can be divided into the following three categories:

Supervised machine learning,
Unsupervised machine learning,
Reinforcement learning

Deep learning:

Deep Learning is a subfield of machine learning concerned with algorithms inspired by the structure and function of the brain called artificial neural networks.

Ques: 13. What is selection bias?

Answer:

Selection bias is the bias introduced by the selection of individuals, groups or data for analysis in such a way that proper randomization is not achieved, thereby ensuring that the sample obtained is not representative of the population intended to be analysed. It is sometimes referred to as the selection effect. The phrase “selection bias” most often refers to the distortion of a statistical analysis, resulting from the method of collecting samples. If the selection bias is not considered, then some conclusions of the study may not be accurate.

Ques: 14. What is TF/IDF vectorization?

Answer:

TF-IDF, short for term frequency-inverse document frequency, is a numerical statistic that is intended to reflect how important a word is to a document in a collection or corpus. It is often used as a weighting factor in information retrieval and text mining. The TF-IDF value increases proportionally to the number of times a word appears in the document, but is offset by the frequency of the word in the corpus, which helps to adjust for the fact that some words appear more frequently in general.
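To make the weighting concrete, here is a minimal sketch of one common, unsmoothed variant: term frequency is the count divided by the document length, and inverse document frequency is the natural log of (number of documents / number of documents containing the term). Libraries often use smoothed or normalized variants, so treat the exact formula and the toy corpus as assumptions.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(documents)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus, tokenized by whitespace.
corpus = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "the bird sang".split(),
]
scores = tf_idf(corpus)
```

In this corpus "the" appears in every document, so its weight is zero everywhere, while "mat", which appears in only one document, gets the highest weight among the terms compared here.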

Ques: 15. What is the difference between Regression and classification ML techniques?

Answer:

Both regression and classification techniques come under supervised machine learning. In a supervised machine learning algorithm, we must train the model using a labelled dataset: during training we explicitly provide the correct labels, and the algorithm tries to learn the pattern that maps input to output. If the labels are discrete values, e.g. A, B, etc., it is a classification problem; if the labels are continuous values, e.g. 1.23, 1.333, etc., it is a regression problem.

Ques: 16. What is p-value?

Answer:

When you perform a hypothesis test in statistics, a p-value helps you determine the strength of the evidence. The p-value is a number between 0 and 1: the probability of observing data at least as extreme as yours if the null hypothesis is true. The claim that is on trial is called the null hypothesis.

A low p-value (≤ 0.05) indicates strong evidence against the null hypothesis, so we reject it. A high p-value (> 0.05) indicates weak evidence against the null hypothesis, so we fail to reject it; this is not the same as proving it true. A p-value close to 0.05 is marginal and could go either way. To put it another way:

High P values: your data are likely with a true null.
Low P values: your data are unlikely with a true null.
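As a worked illustration, the sketch below computes an exact two-sided p-value for a coin-flip experiment under the null hypothesis that the coin is fair. The choice of a binomial test, the function name, and the flip counts are assumptions made for the example.

```python
from math import comb


def binomial_two_sided_p_value(successes, trials, p_null=0.5):
    """Exact two-sided p-value: total probability, under the null, of all
    outcomes that are no more likely than the observed one."""
    def prob(k):
        return comb(trials, k) * p_null ** k * (1 - p_null) ** (trials - k)

    observed = prob(successes)
    # Small tolerance so that ties are not lost to floating-point rounding.
    return sum(prob(k) for k in range(trials + 1)
               if prob(k) <= observed * (1 + 1e-9))


p_extreme = binomial_two_sided_p_value(9, 10)   # 9 heads in 10 flips
p_typical = binomial_two_sided_p_value(5, 10)   # 5 heads in 10 flips
```

Nine heads in ten flips gives a p-value of 22/1024 (about 0.021), which is below 0.05, so the data are unlikely under a fair coin. Five heads in ten flips gives a p-value of 1.0, so the data are entirely consistent with the null.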

Ques: 17. What are the differences between overfitting and underfitting?

Answer:

To make reliable predictions on unseen data in machine learning and statistics, a model must be fitted to a set of training data. Overfitting and underfitting are two of the most common modelling errors that occur while doing so.

Following are the differences between overfitting and underfitting:

Definition - A statistical model suffering from overfitting describes some random error or noise in place of the underlying relationship. When underfitting occurs, a statistical model or machine learning algorithm fails in capturing the underlying trend of the data.

Occurrence – When a statistical model or machine learning algorithm is excessively complex, it can result in overfitting. An example of a complex model is one having too many parameters compared to the total number of observations. Underfitting occurs, for example, when trying to fit a linear model to non-linear data.

Poor Predictive Performance – Although both overfitting and underfitting yield poor predictive performance, the way in which each one of them does so is different. While the overfitted model overreacts to minor fluctuations in the training data, the underfit model under-reacts to even bigger fluctuations.

Ques: 18. Could you explain the role of data cleaning in data analysis?

Answer:

Data cleaning can be a daunting task because, as the number of data sources increases, the time required for cleaning the data grows rapidly.

This is due to the vast volume of data generated by the additional sources. Data cleaning alone can take up to 80% of the total time required for carrying out a data analysis task.

Nevertheless, there are several reasons for using data cleaning in data analysis. Two of the most important ones are:

Cleaning data from different sources helps in transforming the data into a format that is easy to work with.
Data cleaning increases the accuracy of a machine learning model.

Ques: 19. Can you explain Recommender Systems along with an application?

Answer:

Recommender systems are a subclass of information filtering systems, meant for predicting the preference or rating that a user would give to a product.

An application of a recommender system is the product recommendations section in Amazon. This section contains items based on the user’s search history and past orders.

Ques: 20. What are exploding gradients?

Answer:

“Exploding gradients are a problem where large error gradients accumulate and result in very large updates to neural network model weights during training.” At an extreme, the values of weights can become so large as to overflow and result in NaN values.

This has the effect of your model being unstable and unable to learn from your training data.

Gradient: Gradient is the direction and magnitude calculated during training of a neural network that is used to update the network weights in the right direction and by the right amount.
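The accumulation effect can be illustrated with a deliberately simplified model: a chain of linear layers that each multiply their input by the same weight, so the chain rule multiplies the gradient by that weight once per layer. The function name and the specific weights and depths are assumptions chosen for the demonstration.

```python
def gradient_through_layers(weight, depth):
    """Gradient of the output with respect to the input for a chain of
    `depth` linear layers y = weight * x (one multiplication per layer)."""
    gradient = 1.0
    for _ in range(depth):
        gradient *= weight
    return gradient


stable = gradient_through_layers(1.0, 100)        # stays at 1.0
exploding = gradient_through_layers(2.0, 100)     # grows to 2**100
overflowed = gradient_through_layers(2.0, 2000)   # exceeds the float range
not_a_number = overflowed - overflowed            # inf - inf yields NaN
```

With a per-layer factor of 2.0 the gradient reaches about 1.27e30 after 100 layers and overflows to infinity after 2000, at which point further arithmetic produces NaN values, which is the extreme case described above.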

December 02, 2019

Top 20 PHP Interview Questions and Answers

Ques: 1) What is CAPTCHA?

Answer:

CAPTCHA stands for Completely Automated Public Turing Test to tell Computers and Humans Apart. To prevent spammers from using bots to fill out forms automatically, a CAPTCHA generates an image containing a distorted string of numbers and letters. Computers have difficulty determining what the numbers and letters are from the image, but humans have strong pattern recognition abilities and can determine the string fairly accurately. When the user enters the numbers and letters from the image in the validation field, the application can be fairly confident that a human client is using it.

Ques: 2) What is meant by urlencode() and urldecode()?

Answer:

string urlencode(string $str)

When $str contains a string such as "hello world", the return value is URL-encoded and can be appended to a URL. It is normally used to pass data in a GET request, for example someurl.com?var=hello+world

string urldecode(string $str)

This simply decodes a URL-encoded string. Example: echo urldecode("hello+world"); will output "hello world". Note that PHP already decodes the values in $_GET automatically.

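Ques: 3) What is the difference between mysql_fetch_array(), mysql_fetch_row() and mysql_fetch_object()?

Answer:

mysql_fetch_array - Fetches a result row as an associative array, a numeric array, or both.

mysql_fetch_object - Fetches a result row as an object.

mysql_fetch_row - Fetches a result row as a numeric (enumerated) array.

Note that the mysql_* extension was deprecated in PHP 5.5 and removed in PHP 7.0; mysqli and PDO provide equivalent functionality.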

Ques: 4) What is the difference between srand & shuffle?

Answer:

srand - Seeds the random number generator with the given seed.

shuffle - Shuffles (randomizes the order of the elements in) an array. This function assigns new keys to the elements in the array. It removes any existing keys you may have assigned, rather than just reordering them.

Ques: 5) How do you capture audio/video in PHP?

Answer:

You need a module installed - FFMPEG. FFmpeg is a complete solution to record, convert and stream audio and video. It includes libavcodec, the leading audio/video codec library. FFmpeg is developed under Linux, but it can be compiled under most operating systems, including Windows.

Ques: 6) What's the difference between copy() and move_uploaded_file() in file uploading?

Answer:

move_uploaded_file(): This function checks to ensure that the file designated by filename is a valid upload file (meaning that it was uploaded via PHP's HTTP POST upload mechanism). If the file is valid, it will be moved to the filename given by destination.

If filename is not a valid upload file, then no action will occur, and move_uploaded_file() will return FALSE.

copy(): Makes a copy of a file. Returns TRUE if the copy succeeded, FALSE otherwise.

Ques: 7) What is the difference between echo and print?

Answer:

The main difference between echo and print is that echo has no return value, whereas print always returns 1. Both are language constructs rather than true functions, so the parentheses are optional.

echo accepts multiple comma-separated arguments, whereas print accepts only one. Because print returns a value, it can be used as part of a more complex expression, whereas echo cannot. echo is marginally faster since it doesn't set a return value.

Ques: 8) What is the difference between require() and include()?

Answer:

Both of these constructs include and evaluate the specified file. They are identical in every way except how they handle failure. If the file path is not found, require() terminates the program with a fatal error, whereas include() does not terminate the program; it emits a warning and continues execution.

Ques: 9) How can we find out the properties of the browser?

Answer:

You can gather a lot of information about a person's computer by using $_SERVER['HTTP_USER_AGENT']. This can tell us more about the user's operating system, as well as their browser. For example I am revealed to be Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en) AppleWebKit/418 (KHTML, like Gecko) Safari/417.9.3 when visiting a PHP page.

This can be useful to programmers if they are using special features that may not work for everyone, or if they want to get an idea of their target audience. This also is important when using the get_browser() function for finding out more information about the browser's capabilities. By having this information the user can be directed to a version of your site best suited to their browser.

get_browser() attempts to determine the capabilities of the user's browser. This is done by looking up the browser's information in the browscap.ini file.

echo $_SERVER['HTTP_USER_AGENT'] . "<hr />\n";

$browser = get_browser();

foreach ($browser as $name => $value) {
    echo "<b>$name</b> $value <br />\n";
}

Ques: 10) What is the difference between require_once(), require(), and include()? All of these are used to load one file into another.

Answer:

Difference between require() and require_once(): require() includes and evaluates a specific file, while require_once() does that only if it has not been included before (on the same page).

So, require_once() is recommended when you want to include a file that defines a lot of functions, for example. This way you make sure you don't include the file more than once, and you will not get the "function re-declared" error.

Difference between require() and include() is that require() produces a FATAL ERROR if the file you want to include is not found, while include() only produces a WARNING.

There is also include_once() which is the same as include(), but the difference between them is the same as the difference between require() and require_once().

Ques: 11) What are the different types of errors in PHP?

Answer:

1. Notices: These are trivial, non-critical errors that PHP encounters while executing a script - for example, accessing a variable that has not yet been defined. By default, such errors are not displayed to the user at all - although you can change this default behavior.

2. Warnings: These are more serious errors - for example, attempting to include() a file which does not exist. By default, these errors are displayed to the user, but they do not result in script termination.

3. Fatal errors: These are critical errors - for example, instantiating an object of a non-existent class, or calling a non-existent function. These errors cause the immediate termination of the script, and PHP’s default behavior is to display them to the user when they take place.

Ques: 12) How to Create a Cookie & destroy it in PHP?

Answer:

setcookie("variable", "value", time);

variable - name of the cookie variable
value - value of the cookie variable
time - expiry time, as a Unix timestamp

Example: setcookie("test", $i, time()+3600);

test - cookie variable name
$i - value of the variable 'test'
time()+3600 - denotes that the cookie will expire after one hour

Destroy a cookie by specifying an expiry time in the past

Example: setcookie("test", $i, time()-3600); // already expired time

Reset a cookie by specifying its name only

setcookie("test");

Ques: 13) What is the difference between the functions unlink and unset?

Answer:

unlink is a function for file system handling; it deletes the specified file. unset destroys a previously declared variable.

Ques: 14) How do you know whether the recipient of your mail has opened (i.e. read) the mail?

Answer:

Embedding a URL in a tiny, invisible image tag may be the better way to go. In other words, you embed an invisible image in your HTML email, and when the image's src URL is requested from the server, you can track whether your recipients have viewed the email or not.

Ques: 15) What is the difference between mysql_connect and mysql_pconnect?

Answer:

mysql_connect opens a new database connection every time a page is loaded. mysql_pconnect opens a connection and keeps it open across multiple requests.

mysql_pconnect can use fewer resources, because it does not need to establish a database connection every time a page is loaded.

Ques: 16) What do you need to do to improve the performance (speed of execution) of a script you have written?

Answer:

If your script retrieves data from a database, use the LIMIT clause to fetch only the rows you need. Move the non-dynamic sections of the website, which do not change over time, into include files.

Ques: 17) How do you insert single & double quotes in MySQL db without using PHP?

Answer:

By using & / &quote;

Alternatively, escape a single quote using a backslash: \' . Inside a double-quoted string you don't need to escape single quotes. Insert a double quote inside a double-quoted string as "".

Ques: 18) What is the difference between strstr & stristr?

Answer:

For strstr, the syntax is: string strstr(string $string, string $str); The function strstr searches for $str in $string. If it finds $str, it returns the part of $string from the first occurrence of $str up to the end of $string.

For Example:

$string = "http://yahoomail.com"; $str = "yahoomail"; echo strstr($string, $str);

The output is "yahoomail.com". The main difference between strstr and stristr is case sensitivity: the former is case-sensitive and the latter ignores case.

Ques: 19) What is the difference between explode and split?

Answer:

split() splits a string into an array by a regular expression (it was deprecated in PHP 5.3 and removed in PHP 7.0; preg_split() is its replacement). explode() splits a string into an array by a plain string delimiter.

For Example:

explode(" and ", "India and Pakistan and Srilanka"); split(" : ", "India : Pakistan : Srilanka");

Both of these functions will return an array that contains India, Pakistan, and Srilanka.

Ques: 20) How can you avoid an execution time-out error while fetching records from MySQL?

Answer:

set_time_limit - Limits the maximum execution time of the script, in seconds

For Example:

set_time_limit(0);

If you set it to 0, there is no time limit.

December 23, 2019

Top 20 Data Science Interview Questions and Answers