Top 20 Machine Learning Interview Questions and Answers

Ques: 1. What is the difference between supervised and unsupervised machine learning?

Answer:

Supervised learning requires labelled training data. For example, to do classification (a supervised learning task), you first need to label the data you’ll use to train the model, so that it can learn to sort new data into your labelled groups. Unsupervised learning, in contrast, does not require explicitly labelled data.

Ques: 2. What is Overfitting? And how do you ensure you’re not overfitting with a model?

Answer:

Overfitting occurs when a model learns the training data so closely that its performance on new data suffers. The noise in the training data is picked up and learned as if it were a real pattern. These patterns do not apply to the testing data, so they hurt the model’s ability to classify new data and reduce its accuracy on the testing data.

To avoid overfitting, collect more data so that the model can be trained on varied samples. You can also use ensemble methods, such as Random Forest. Random Forest is based on the idea of bagging, which reduces the variance of the predictions by combining the results of multiple decision trees trained on different samples of the data set.

Ques: 3. What do you understand by precision and recall?

Answer:

Recall is also known as the true positive rate: the number of true positives your model finds compared to the actual number of positives there are throughout the data. Precision is also known as the positive predictive value: the number of true positives your model finds compared to the total number of positives it claims. It can be easier to think of recall and precision with an example: there are 10 apples in a case, and your model flags 15 items as apples, of which 10 really are apples and 5 are oranges. You’d have perfect recall (there are actually 10 apples, and you found all 10) but 66.7% precision, because out of the 15 items you flagged, only 10 (the apples) are correct.
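To make the arithmetic concrete, here is a minimal sketch in Python. The function name and its arguments are illustrative only (they are not from any library); the counts follow the apple example: 10 correct positive claims, 5 wrong positive claims, and no apples missed.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) from raw confusion-matrix counts."""
    predicted_positives = true_positives + false_positives
    actual_positives = true_positives + false_negatives
    # Guard against division by zero when nothing was predicted or nothing is positive.
    precision = true_positives / predicted_positives if predicted_positives else 0.0
    recall = true_positives / actual_positives if actual_positives else 0.0
    return precision, recall


# Apple example: 15 items flagged as apples, 10 of them correct, no apples missed.
precision, recall = precision_recall(true_positives=10, false_positives=5, false_negatives=0)
print(f"precision={precision:.3f}, recall={recall:.3f}")
```

Precision comes out at 10/15 and recall at 10/10, matching the figures in the answer.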

Ques: 4. What are collinearity and multicollinearity?

Answer:

Collinearity occurs when two predictor variables (e.g., x1 and x2) in a multiple regression are linearly correlated with each other.

Multicollinearity occurs when more than two predictor variables (e.g., x1, x2, and x3) are inter-correlated.
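One simple way to spot collinearity is to look at the pairwise correlation between predictors. The sketch below computes the Pearson correlation by hand; the variables x1, x2 and x3 hold made-up values chosen for illustration, not data from the article.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Made-up predictor values (assumption for illustration).
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.1, 3.9, 6.2, 8.0, 9.8]  # roughly 2 * x1, so strongly collinear with x1
x3 = [5.0, 1.0, 4.0, 2.0, 3.0]  # only weakly related to x1

print(round(pearson_correlation(x1, x2), 3))
print(round(pearson_correlation(x1, x3), 3))
```

A correlation close to 1 or -1 between two predictors is a warning sign. For more than two predictors, pairwise correlation can miss the problem, and a measure such as the variance inflation factor is commonly used instead.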

Ques: 5. What’s the difference between Type I and Type II error?

Answer:

Don’t think that this is a trick question! Many machine learning interview questions are an attempt to lob basic questions at you just to make sure you’re on top of your game and you’ve covered all of your bases.

Type I error is a false positive, while Type II error is a false negative. Briefly stated, Type I error means claiming something has happened when it hasn’t, while Type II error means that you claim nothing is happening when in fact something is.

A clever way to think about this is to think of Type I error as telling a man he is pregnant, while Type II error means you tell a pregnant woman she isn’t carrying a baby.

Ques: 6. What is A/B Testing?

Answer:

A/B testing is statistical hypothesis testing for a randomized experiment with two variants, A and B. It can be used to compare two models that use different predictor variables in order to check which model performs best on a given sample of data.

Consider a scenario where you’ve created two models (using different predictor variables) that can be used to recommend products for an e-commerce platform.

A/B Testing can be used to compare these two models to check which one best recommends products to a customer.
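As a rough sketch of the statistics behind such a comparison, the code below runs a two-proportion z-test on conversion counts for the two recommenders. The visitor and conversion numbers are invented for illustration, and the z-test is one common choice of test, not the only one.

```python
import math


def two_proportion_z_test(conversions_a, visitors_a, conversions_b, visitors_b):
    """Return (z, two-sided p-value) for the difference in conversion rates B - A."""
    rate_a = conversions_a / visitors_a
    rate_b = conversions_b / visitors_b
    # Pooled rate under the null hypothesis that A and B convert equally well.
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (rate_b - rate_a) / standard_error
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical results: model A converts 200 of 2000 visitors, model B 260 of 2000.
z, p_value = two_proportion_z_test(200, 2000, 260, 2000)
print(f"z={z:.2f}, p={p_value:.4f}")
```

A small p-value suggests the difference between the two models is unlikely to be due to chance alone. In practice you would also fix the sample size and significance level before running the experiment.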

Ques: 7. What is deep learning, and how does it contrast with other machine learning algorithms?

Answer:

Deep learning is a subset of machine learning that is concerned with neural networks: how to use backpropagation and certain principles from neuroscience to more accurately model large sets of unlabelled or semi-structured data. Deep learning is not tied to one learning style: neural nets can be trained in a supervised, semi-supervised or unsupervised way. What sets it apart from many other machine learning algorithms is that it learns representations of the data through the layers of the network, rather than relying on hand-crafted features.

Ques: 8. Name a few libraries in Python used for Data Analysis and Scientific Computations.

Answer:

Here is a list of Python libraries mainly used for Data Analysis:

NumPy
SciPy
Pandas
scikit-learn
Matplotlib
Seaborn
Bokeh

Ques: 9. Which is more important to you: model accuracy or model performance?

Answer:

This question tests your grasp of the nuances of machine learning model performance! Machine learning interview questions often look towards the details. There are models with higher accuracy that can perform worse in predictive power. How does that make sense?

Well, it has everything to do with how model accuracy is only a subset of model performance, and at that, a sometimes misleading one. For example, if you wanted to detect fraud in a massive data set with millions of samples, the most accurate model would most likely predict no fraud at all if only a tiny minority of cases were fraud. However, this would be useless as a predictive model: a model designed to find fraud that asserted there was no fraud at all! Questions like this help you demonstrate that you understand model accuracy isn’t the be-all and end-all of model performance.
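The fraud example can be reproduced in a few lines. This is only a sketch: the 1% fraud rate and the always-predict-no-fraud "model" are assumptions chosen to illustrate the point.

```python
# 1 = fraud, 0 = legitimate. The 10-in-1000 split is a made-up proportion.
labels = [1] * 10 + [0] * 990
# A "model" that always predicts no fraud.
predictions = [0] * len(labels)

correct = sum(1 for actual, predicted in zip(labels, predictions) if actual == predicted)
accuracy = correct / len(labels)

frauds_caught = sum(
    1 for actual, predicted in zip(labels, predictions) if actual == 1 and predicted == 1
)
recall = frauds_caught / sum(labels)

print(f"accuracy={accuracy:.2f}, recall on fraud={recall:.2f}")
```

The accuracy looks excellent while the recall on the fraud class is zero, which is why metrics such as precision, recall or F1 are usually reported alongside accuracy.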

Ques: 10. How are NumPy and SciPy related?

Answer:

NumPy is the foundation that SciPy is built on; both are part of the wider SciPy ecosystem. NumPy defines arrays along with basic numerical operations such as indexing, sorting and reshaping.

SciPy implements computations such as numerical integration, optimization and statistics using NumPy’s functionality.

Ques: 11. How would you handle an imbalanced dataset?

Answer:

An imbalanced dataset is when you have, for example, a classification task and 90% of the data is in one class. That leads to problems: an accuracy of 90% is misleading if you have no predictive power on the other category of data! Here are a few tactics to get over the hump:

Collect more data to even the imbalances in the dataset.
Re-sample the dataset to correct for imbalances.
Try a different algorithm altogether on your dataset.

What’s important here is that you have a keen sense for what damage an unbalanced dataset can cause, and how to balance that.
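As one illustration of re-sampling, the sketch below randomly oversamples the smaller classes until every class is as large as the biggest one. The helper name, the row format and the 90/10 split are assumptions for this example; dedicated libraries offer more refined methods.

```python
import random


def oversample_minority(rows, label_of, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes are equal in size."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(label_of(row), []).append(row)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        # Sample with replacement to fill the gap up to the largest class size.
        balanced.extend(rng.choices(group, k=target - len(group)))
    return balanced


# Hypothetical data: (feature, label) pairs with a 90/10 class split.
rows = [("majority", 0)] * 90 + [("minority", 1)] * 10
balanced = oversample_minority(rows, label_of=lambda row: row[1])
print(len(balanced), sum(1 for row in balanced if row[1] == 1))
```

If you re-sample, do it on the training split only, so that duplicated rows do not leak into the data you evaluate on.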

Ques: 12. Is rotation necessary in PCA? If yes, why? What will happen if you don’t rotate the components?

Answer:

Yes, rotation (orthogonal) is necessary because it maximizes the difference between the variance captured by the components. This makes the components easier to interpret. Remember that this is the motive for doing PCA, where we aim to select fewer components (than features) that can explain the maximum variance in the data set. Rotation doesn’t change the relative location of the components; it only changes the actual coordinates of the points.

If we don’t rotate the components, the effect of PCA will diminish and we’ll have to select a larger number of components to explain the variance in the data set.

Ques: 13. What’s the “kernel trick” and how is it useful?

Answer:

The kernel trick involves kernel functions that let you operate in higher-dimensional spaces without explicitly calculating the coordinates of points within that space: instead, kernel functions compute the inner products between the images of all pairs of data in a feature space. This gives them the very useful attribute of working as if in the higher-dimensional space while being computationally cheaper than the explicit calculation of those coordinates. Many algorithms can be expressed in terms of inner products. Using the kernel trick enables us to effectively run such algorithms in a high-dimensional space with lower-dimensional data.
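A small worked example may help. The sketch below uses a degree-2 polynomial kernel on 2-D points (a choice made here for illustration) and checks that the kernel value equals the inner product of the explicitly mapped 3-D feature vectors.

```python
import math


def dot(u, v):
    """Plain inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def explicit_feature_map(point):
    """Map a 2-D point into the 3-D feature space of the degree-2 polynomial kernel."""
    x1, x2 = point
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def polynomial_kernel(x, y):
    """Degree-2 polynomial kernel: computed in the original 2-D space."""
    return dot(x, y) ** 2


x = (1.0, 2.0)
y = (3.0, 4.0)
explicit = dot(explicit_feature_map(x), explicit_feature_map(y))
via_kernel = polynomial_kernel(x, y)
print(explicit, via_kernel)
```

Both routes give the same number, but the kernel never builds the higher-dimensional vectors. For kernels such as the RBF kernel the implicit feature space is infinite-dimensional, so the explicit route is not even available.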

Ques: 14. Explain prior probability, likelihood and marginal likelihood in the context of the Naive Bayes algorithm.

Answer:

Prior probability is the proportion of each class of the dependent (binary) variable in the data set. It is the closest guess you can make about a class without any further information. For example: in a data set, the dependent variable is binary (1 and 0). The proportion of 1 (spam) is 70% and 0 (not spam) is 30%. Hence, we can estimate that there is a 70% chance that any new email would be classified as spam.

Likelihood is the probability of observing a given feature value given the class. For example: the probability that the word ‘FREE’ is used in a message known to be spam is the likelihood. Marginal likelihood is the probability that the word ‘FREE’ is used in any message.
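These three quantities combine through Bayes' rule: posterior = likelihood × prior / marginal likelihood. In the sketch below the 70%/30% priors come from the example above, while the two likelihood values for the word ‘FREE’ are assumptions made up for illustration.

```python
def posterior(prior, likelihood, marginal_likelihood):
    """Bayes' rule: P(class | feature) = P(feature | class) * P(class) / P(feature)."""
    return likelihood * prior / marginal_likelihood


prior_spam = 0.70
prior_not_spam = 0.30
likelihood_free_given_spam = 0.50      # assumed: share of spam messages containing 'FREE'
likelihood_free_given_not_spam = 0.10  # assumed: share of non-spam messages containing 'FREE'

# Marginal likelihood: probability of seeing 'FREE' in any message.
marginal_free = (
    likelihood_free_given_spam * prior_spam
    + likelihood_free_given_not_spam * prior_not_spam
)

posterior_spam_given_free = posterior(prior_spam, likelihood_free_given_spam, marginal_free)
print(f"P(FREE)={marginal_free:.2f}, P(spam | FREE)={posterior_spam_given_free:.3f}")
```

Naive Bayes applies the same rule with many features at once, multiplying the per-feature likelihoods under the (naive) assumption that features are independent given the class.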

Ques: 15. Do you have experience with Spark or big data tools for machine learning?

Answer:

You’ll want to get familiar with the meaning of big data for different companies and the different tools they’ll want. Spark is the big data tool most in demand now, able to handle immense datasets with speed. Be honest if you don’t have experience with the tools demanded, but also take a look at job descriptions and see what tools pop up: you’ll want to invest in familiarizing yourself with them.

Ques: 16. You came to know that your model is suffering from low bias and high variance. Which algorithm should you use to tackle it? Why?

Answer:

Low bias occurs when the model’s predicted values are close to the actual values. In other words, the model becomes flexible enough to mimic the training data distribution. While that sounds like a great achievement, a model this flexible tends to generalize poorly (high variance). This means that when the model is tested on unseen data, it gives disappointing results.

In such situations, we can use a bagging algorithm (like random forest) to tackle the high variance problem. Bagging algorithms draw multiple samples from the data set using repeated random sampling with replacement. Then, these samples are used to generate a set of models using a single learning algorithm. Later, the model predictions are combined using voting (classification) or averaging (regression).
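The bagging recipe can be sketched in plain Python. Everything specific here is an assumption for illustration: the toy one-dimensional data set, the decision stump used as the base learner, and the choice of 25 models. A real random forest uses full decision trees and also samples features.

```python
import random
from collections import Counter


def fit_stump(sample):
    """Pick the threshold on x that misclassifies the fewest points in the sample.

    The stump predicts class 1 when x is greater than the threshold.
    """
    best_threshold, best_errors = None, None
    for threshold, _ in sample:
        errors = sum(1 for x, y in sample if (x > threshold) != bool(y))
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


def bagged_predict(data, x_new, n_models=25, seed=0):
    """Train one stump per bootstrap sample and combine predictions by majority vote."""
    rng = random.Random(seed)
    votes = []
    for _ in range(n_models):
        bootstrap = rng.choices(data, k=len(data))  # sampling with replacement
        threshold = fit_stump(bootstrap)
        votes.append(int(x_new > threshold))
    return Counter(votes).most_common(1)[0][0]


# Toy data: (x, label) pairs where larger x values belong to class 1.
data = [(1, 0), (2, 0), (3, 0), (4, 0), (6, 1), (7, 1), (8, 1), (9, 1)]
print(bagged_predict(data, 8.5), bagged_predict(data, 1.5))
```

Each stump sees a slightly different sample, so individual thresholds vary, but the majority vote is much more stable than any single stump. That averaging effect is what reduces variance.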

Also, to combat high variance, we can:

Use a regularization technique, where large model coefficients get penalized, hence lowering model complexity.

Use the top n features from the variable importance chart. Maybe, with all the variables in the data set, the algorithm is having difficulty finding the meaningful signal.

Ques: 17. Which data visualization libraries do you use? What are your thoughts on the best data visualization tools?

Answer:

What’s important here is to define your views on how to properly visualize data and your personal preferences when it comes to tools. Popular tools include R’s ggplot, Python’s seaborn and matplotlib, and tools such as Plot.ly and Tableau.

Ques: 18. How is kNN different from kmeans clustering?

Answer:

Don’t be misled by the ‘k’ in their names. The fundamental differences between these two algorithms are:

k-means is unsupervised in nature and kNN is supervised in nature.
k-means is a clustering algorithm. kNN is a classification (or regression) algorithm.

The k-means algorithm partitions a data set into clusters such that each cluster is homogeneous and the points in each cluster are close to each other. The algorithm tries to maintain enough separability between these clusters. Due to its unsupervised nature, the clusters have no labels.

The kNN algorithm tries to classify an unlabelled observation based on its k (can be any number) surrounding neighbors. It is also known as a lazy learner because it involves minimal training of the model: it simply stores the training data and defers the work to prediction time, rather than building a generalized model in advance.
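To show how little "training" kNN needs, here is a minimal sketch of a kNN classifier. The labelled points and the choice of k=3 are made up for illustration, and the Euclidean distance used here is one common choice among several.

```python
import math
from collections import Counter


def knn_predict(training_points, query, k=3):
    """Classify `query` by majority vote among its k nearest labelled points."""
    by_distance = sorted(training_points, key=lambda item: math.dist(item[0], query))
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical labelled data: two well-separated groups of 2-D points.
training_points = [
    ((1.0, 1.0), "red"), ((1.5, 2.0), "red"), ((2.0, 1.0), "red"),
    ((6.0, 6.0), "blue"), ((7.0, 6.5), "blue"), ((6.5, 7.0), "blue"),
]

print(knn_predict(training_points, (1.2, 1.4)))
print(knn_predict(training_points, (6.4, 6.2)))
```

Note that there is no fitting step at all: the labelled points are stored as they are and all the work happens when a prediction is requested. k-means, by contrast, would take the same points without their labels and iteratively move cluster centres.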

Ques: 19. Is it better to have too many false positives or too many false negatives? Explain.

Answer:

It depends on the question as well as on the domain for which we are trying to solve the problem. If you’re using Machine Learning in the domain of medical testing, then a false negative is very risky, since the report will not show any health problem when a person is actually unwell. Similarly, if Machine Learning is used in spam detection, then a false positive is very risky because the algorithm may classify an important email as spam.

Ques: 20. What is the difference between Gini Impurity and Entropy in a Decision Tree?

Answer:

Gini Impurity and Entropy are the metrics used for deciding how to split a Decision Tree.

Gini impurity is the probability of a random sample being classified incorrectly if you randomly pick a label according to the distribution in the branch.

Entropy is a measure of the lack of information (uncertainty). You calculate the Information Gain (the difference in entropies) from making a split. This measure helps to reduce the uncertainty about the output label.
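Both metrics are short formulas over the class proportions in a node. The sketch below implements them directly; the function names and the tiny yes/no label lists are illustrative assumptions.

```python
import math
from collections import Counter


def class_proportions(labels):
    """Share of each class among the labels in a node."""
    counts = Counter(labels)
    return [count / len(labels) for count in counts.values()]


def gini_impurity(labels):
    """Probability of mislabelling a random sample drawn from this node."""
    return 1.0 - sum(p * p for p in class_proportions(labels))


def entropy(labels):
    """Shannon entropy of the class distribution, in bits."""
    return -sum(p * math.log2(p) for p in class_proportions(labels))


def information_gain(parent, left, right):
    """Entropy of the parent minus the size-weighted entropy of the two children."""
    n = len(parent)
    weighted_child_entropy = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted_child_entropy


parent = ["yes"] * 4 + ["no"] * 4
left, right = ["yes"] * 4, ["no"] * 4  # a split that separates the classes perfectly

print(gini_impurity(parent), entropy(parent), information_gain(parent, left, right))
```

An evenly mixed two-class node has the highest impurity under both measures, and a pure node scores zero. In practice the two criteria usually pick similar splits; Gini is slightly cheaper because it avoids the logarithm.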

December 13, 2019

Top 20 Oracle Financials Interview Questions and Answers

Ques: 1. What is the implication of dynamic insert?

Answer:

In Oracle EBS Applications, Dynamic Insertion is a feature which controls whether the user can enter new account code combinations from any form/window. If this feature is disabled, then the user cannot input new account code combinations from any window/form.

Oracle applications use a specific form known as Combination form, for directly entering the new code combinations. Users can enter new account code combinations only through this form if Dynamic Insertion is disabled.

Ques: 2. What are the different statuses of an accounting period?

Answer:

The different statuses of an accounting period in Oracle GL are:

Never Opened - Cannot enter or post journals.
Future Enterable - Can enter journals but cannot post them. The number of future enterable periods is a fixed number defined in the Set of Books window. The number of future enterable periods can be changed at any time.
Open - Can enter and post journals to any open period. An unlimited number of periods can be open, but doing so may slow the posting process and can confuse users entering journals.
Closed - Cannot post journals in a closed period. Must reopen closed periods before posting journals. Should manually close periods after finishing month/quarter/year-end processing.
Permanently Closed - Permanently closed periods cannot be reopened. This status is required to Archive and Purge data.

Ques: 3. How many types of conversion rates are there in Oracle GL?

Answer:

There are five basic conversion rate types predefined in Oracle GL:

Spot: An exchange rate based on the rate for a specific date. It applies to the immediate delivery of a currency.
Corporate: An exchange rate that standardizes rates for your company. This rate is generally a standard market rate determined by senior financial management for use throughout the organization.
User: An exchange rate that you enter during foreign currency journal entry.
EMU Fixed: An exchange rate that is used by countries joining the EU during the transition period to the Euro currency.
User Defined: A rate type defined by your company to meet specific needs.

Ques: 4. What is the implication of the ‘Future Period’ field in the set of books definition form?

Answer:

The value mentioned in the Future Period field represents the number of future enterable periods that users can use to input journal entries (provided those future periods are opened). However, consideration must be given to minimize the number of future enterable periods to prevent users from accidentally entering journal entries in an incorrect period.

Ques: 5. What action is required at the set of books definition level, and what is a suspense account and its purpose?

Answer:

If you choose to allow posting of out-of-balance (unbalanced) journal entries, GL automatically posts the difference to the suspense account. However, for this feature to work, the Suspense Account check box must be checked and an account number provided when the set of books is created.

If you have multiple companies or balancing entities within a set of books, GL automatically creates a suspense account for each balancing entity.

Ques: 6. What is the purpose of stat journal?

Answer:

You can associate statistical amounts with monetary amounts by using statistical units of measure. This enables you to enter both monetary and statistical amounts in a single journal entry line.

Ques: 7. What are the target and offset accounts in allocation formula?

Answer:

These are the lines that are the actual journal entry:

Target (T): Enter an account in the Target line to specify the destination for your allocation. The parent value used in the target must be the same parent value used in the B and C lines of the formula.

Offset (O): Enter an account in the Offset line to specify the account to use for offsetting debit or credit from your allocation. The Offset account is usually the same account as formula line A to reduce the cost pool by the allocated amount.

Ques: 8. How is the Primary Ledger different from a Secondary Ledger?

Answer:

Use secondary ledgers for supplementary purposes, such as consolidation, statutory reporting, or adjustments for one or more legal entities within the same accounting setup.

For example, use a primary ledger for corporate accounting purposes that use the corporate chart of accounts and subledger accounting method, and use a secondary ledger for statutory reporting purposes that use the statutory chart of accounts and subledger accounting method. This allows you to maintain both a corporate and statutory representation of the same legal entity’s transactions in parallel.

Ques: 9. What is an adjusting period and its implications?

Answer:

Typically, the last day of the fiscal year is used to perform adjusting and closing journal entries. This period is referred to as the Adjusting Period. Choosing whether or not to include an adjusting period in a calendar is a very important decision. There can be an unlimited number of adjusting periods. Once the accounting calendar is in use, its structure cannot be changed to remove or add an adjusting period.

Ques: 10. How is the effective date related to the period?

Answer:

Effective Date and Period are related in journal scenarios when we import journals by effective date. A new profile option, GL Journal Import: Separate Journals by Accounting Date, allows us to choose how Journal Import will group journal lines.

Yes: Journal Import will place journal lines with different accounting dates into separate journals.
No: Journal Import will group all journal lines with different accounting dates that fall into the same accounting period into the same journal, unless average balance processing is enabled.

Ques: 11. What do you understand by consolidation workbench?

Answer:

The consolidation workbench provides a central point of control for consolidating an unlimited number of subsidiaries to your parent. This window provides feedback on the state of the consolidation process, keeping you informed about each subsidiary’s consolidation status. The workbench also monitors subsidiary account balances for any changes that occur after the subsidiary data has been transferred to your parent SOBs.

Consolidation Sets: You can even create consolidation sets which launch multiple consolidations in a single step for overall streamlining of the consolidation process.
Consolidation Hierarchies: You can create consolidation hierarchies, or multi-level hierarchies, and view your consolidation hierarchies using a graphical Consolidation Hierarchy Viewer.
State Controller: From the consolidation workbench, you can access the State Controller, which is a color coded navigation tool to guide through the consolidation process.

Ques: 12. What are Translation and Revaluation, and at which level do they work?

Answer:

Translation: It is used to translate functional currency balances into foreign currency balances at the account level.

Revaluation: It is used to identify the unrealized gain or loss that arises from currency fluctuations.

Ques: 13. What is adjusting period?

Answer:

Typically, the last day of the fiscal year is used as an adjusting period to perform adjusting and closing journal entries. Once you begin using your accounting calendar, you cannot change its structure to remove or add an adjusting period. Choosing whether to include an adjusting period or not in your calendar is a very important decision. You can have an unlimited number of adjusting periods.

Ques: 14. What is 2-way, 3-way and 4-way matching?

Answer:

Matching means comparing the invoice against other purchasing documents before paying the supplier, so that we pay only for what we ordered and received. There are three levels of matching:

1) In 2-way matching, we compare two documents: PO and invoice.

For example: suppose we raised a PO for 10 items and received an invoice for 10 items. We will make payment for those 10 items.

2) In 3-way matching, we compare three documents: PO, receipt and invoice.

For example: suppose we ordered 10 items in the PO, but received only 8 items, and received an invoice for 10 items. We will make payment for only 8 items.

3) In 4-way matching, we compare four documents: PO, receipt, inspection and invoice.

For example: suppose we have 10 items in the PO and the supplier sends us 8 items. We inspect the items we have received and find that 2 are damaged. Finally, we make payment for 6 items only.
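The arithmetic in these examples boils down to paying for the smallest quantity across the documents being compared. The Python sketch below illustrates only that idea; the function is hypothetical and is not how Oracle Payables implements matching, which also involves tolerances and holds.

```python
def payable_quantity(ordered, invoiced, received=None, accepted=None):
    """Quantity to pay for: the smallest quantity across the documents compared.

    2-way: PO and invoice. 3-way: adds the receipt. 4-way: adds the inspection result.
    """
    quantities = [ordered, invoiced]
    if received is not None:
        quantities.append(received)
    if accepted is not None:
        quantities.append(accepted)
    return min(quantities)


two_way = payable_quantity(ordered=10, invoiced=10)
three_way = payable_quantity(ordered=10, invoiced=10, received=8)
four_way = payable_quantity(ordered=10, invoiced=10, received=8, accepted=6)
print(two_way, three_way, four_way)
```

The three calls mirror the three examples: 10 items ordered and invoiced, 8 received, and 6 accepted after inspection.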

Ques: 15. What is the Sweep program? Explain the process of the Sweep program.

Answer:

This program is run to transfer unaccounted invoices to the next open period during period-end closing of Accounts Payable. You can’t close a Payables period if it contains unaccounted invoices, so this program is run to sweep (transfer) those invoices to the next open period. The Payables period can then be closed.

Ques: 16. What is Debit Memo and Credit Memo in AP?

Answer:

Debit Memo: A negative amount identified by the customer and sent to the supplier. For example: purchase returns.

Credit Memo: A negative amount identified by the supplier and sent to the customer. For example: TDS Payables.

In Payables, we receive material from the supplier, so we have to pay the supplier. If the supplier has sent more goods than we ordered, we must return the excess goods and reduce the accounting balance. A memo that we send to the supplier is called a debit memo; a memo that the supplier sends to us is called a credit memo. Both reduce our liability.

For example: the supplier has sent you invoice X with an amount of $100, but you later find a mismatch in quantity (more quantity was billed), so you inform the supplier. The supplier then sends you a credit memo. If the supplier instead asks you to send a debit memo, you generate the debit memo from your end. In Payables, debit memo and credit memo functionality is the same: both decrease the supplier balance (i.e. decrease the liability).

Ques: 17. What is the difference between Standard and Mixed invoices?

Answer:

Standard Invoices: Standard invoices are invoices from a supplier representing an amount due for goods or services purchased. Standard invoices can be either matched to a purchase order or not matched. Standard invoices must be positive amounts.

Mixed Invoices: Mixed Invoices can be matched to both purchase orders and invoices. Mixed invoices can have either positive or negative amounts.

Ques: 18. What are Security Rules and Cross-Validation Rules?

Answer:

Security Rules: These are used to restrict users from entering certain segment values. They work at the responsibility level.

Cross-Validation Rules: These are used to restrict end users from entering certain code combinations. They work at the structure level.

Ques: 19. Define FSG (Financial Statement Generator).

Answer:

FSG is a powerful and flexible tool that helps you build customized reports without programming. This tool is only available with GL.

Ques: 20. How many types of AR invoices are there?

Answer:

There are 7 types of invoices in AR Transactions:

Invoice
Credit memo
Debit memo
Deposit
Guaranty
Chargeback
Bills Receivable
