Showing posts with label Python. Show all posts

December 30, 2021

Top 20 Python Pandas Interview Questions and Answers


Pandas is a Python library that provides high-performance, easy-to-use data structures and data analysis tools. It is open-source and BSD-licensed. Python with Pandas is utilised in a variety of academic and commercial disciplines, including finance, economics, statistics, analytics, and more.

Data analysis involves a great deal of processing, such as restructuring, cleaning, and combining data. NumPy, SciPy, Cython, and Pandas are just a few of the fast data processing tools available. We favour Pandas for tabular data because it is fast, easy to use, and expressive.


Ques. 1): What is Pandas? What is the purpose of Python pandas?

Answer:

Pandas is a Python module that provides quick, versatile, and expressive data structures that make working with "relational" or "labelled" data simple and intuitive. Its goal is to serve as the foundation for undertaking realistic, real-world data analysis in Python.

Pandas is a data manipulation and analysis software library for the Python programming language. It includes data structures and methods for manipulating numerical tables and time series, in particular. Pandas is open-source software distributed under the BSD three-clause licence.

 

Ques. 2): Mention the many types of data structures available in Pandas?

Answer:

The pandas library provides two main data structures: Series and DataFrame. Both are built on top of NumPy. A Series is a one-dimensional labelled array, while a DataFrame is a two-dimensional labelled table. Older versions of pandas also had Panel, a three-dimensional structure made up of items, a major axis, and a minor axis. Panel was deprecated and then removed in pandas 0.25; a DataFrame with a MultiIndex is the usual replacement.
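As a minimal sketch of the two structures (the fruit data and the variable names are made up for illustration):

```python
import pandas as pd

# One-dimensional: values plus an index of labels.
prices = pd.Series([10.5, 20.0, 7.25], index=["apple", "banana", "cherry"])

# Two-dimensional: labelled rows and columns; each column is a Series.
inventory = pd.DataFrame(
    {"price": [10.5, 20.0, 7.25], "stock": [3, 0, 12]},
    index=["apple", "banana", "cherry"],
)

print(prices.ndim, inventory.ndim)
print(type(inventory["price"]).__name__)
```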

 

Ques. 3): What are the key features of the pandas library? What is pandas used for?

Answer:

The pandas library has many features. Some of them are listed below:

Data Alignment

Memory Efficient

Reshaping

Merge and join

Time Series

This library is developed in Python and can be used to do data processing, data analysis, and other tasks. To manipulate time series and numerical tables, the library contains numerous operations as well as data structures.

 

Ques. 4): What is Pandas NumPy?

Answer:

NumPy is an open-source Python library, separate from pandas, that pandas is built on. It lets you work efficiently with large numerical datasets. For scientific computing with Python, it provides a powerful N-dimensional array object and a broad set of mathematical functions.

Fourier transformations, linear algebra, and random number capabilities are some of Numpy's most popular features. It also includes integration tools for C/C++ and Fortran programming.
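The features named above can be sketched in a few lines. The array values and the seed are arbitrary choices for illustration:

```python
import numpy as np

# A 2 x 2 array: the N-dimensional array object (ndarray).
a = np.array([[2.0, 0.0], [0.0, 4.0]])

# Linear algebra: matrix inverse.
a_inv = np.linalg.inv(a)

# Fourier transform of a short constant signal.
spectrum = np.fft.fft(np.ones(4))

# Random numbers from a seeded generator (the seed 0 is arbitrary).
rng = np.random.default_rng(0)
samples = rng.random(3)

print(a_inv)
print(spectrum.real)
print(samples.shape)
```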

 

Ques. 5): In Pandas, what is a Time Series?

Answer:

A time series is an ordered sequence of data points that shows how a quantity changes over time. Pandas has a wide range of capabilities and tools for working with time series data in any domain.

pandas supports:

Parsing time series data from a variety of sources and formats.

Creating sequences of dates and time spans with a fixed frequency.

Manipulating and converting dates and times with timezone information.

Resampling or converting a time series to a particular frequency.

Performing date and time arithmetic with absolute or relative time increments.
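A short sketch of several of these capabilities, using made-up daily readings:

```python
import pandas as pd

# Parse a date string and build a fixed-frequency range of 6 days.
start = pd.to_datetime("2021-01-01")
days = pd.date_range(start, periods=6, freq="D")
readings = pd.Series([0, 1, 2, 3, 4, 5], index=days)

# Resample to 2-day bins, summing the values in each bin.
two_day_totals = readings.resample("2D").sum()

# Attach a timezone to the naive timestamps.
readings_utc = readings.tz_localize("UTC")

# Date arithmetic with a relative increment.
one_week_later = start + pd.Timedelta(days=7)

print(two_day_totals)
print(one_week_later)
```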

 

Ques. 6): In pandas, what is a DataFrame?

Answer:

A Pandas DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labelled axes (rows and columns). Data is organised in rows and columns, and the three main components of a DataFrame are the data, the rows, and the columns.

Creating a Pandas DataFrame:

In practice, a Pandas DataFrame is usually created by loading a dataset from existing storage, such as a SQL database, a CSV file, or an Excel file. A DataFrame can also be built from lists, dictionaries, lists of dictionaries, and so on.

Creating a DataFrame from a list: a DataFrame can be created from a single list or a list of lists.
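For illustration, here is a small sketch of three of these construction routes. The column names and values are invented:

```python
import pandas as pd

# From a single list: one column, default integer index.
languages = pd.DataFrame(["Python", "R", "Julia"], columns=["language"])

# From a list of lists: each inner list is one row.
people = pd.DataFrame([["Ann", 31], ["Bob", 25]], columns=["name", "age"])

# From a list of dictionaries: keys become column names.
points = pd.DataFrame([{"x": 1, "y": 2}, {"x": 3, "y": 4}])

print(languages.shape, people.shape, points.shape)
```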

 

Ques. 7): Explain Series In pandas. How To Create Copy Of Series In pandas?

Answer:

Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.). The axis labels are collectively referred to as the index. The basic method to create a Series is to call:

s = pd.Series(data, index=index)

Here, data can be a Python dict, an ndarray, or a scalar value.

To copy a Series, call its copy() method:

s2 = s1.copy()

This creates a copy of the Series s1 in a new Series s2.
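A quick sketch showing that the copy is independent of the original (the labels and values are illustrative):

```python
import pandas as pd

s1 = pd.Series({"a": 1, "b": 2, "c": 3})

# copy() is deep by default, so s2 gets its own data and index.
s2 = s1.copy()
s2["a"] = 100

print(s1["a"])  # still the original value
print(s2["a"])
```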

 

Ques. 8): How will you create an empty DataFrame in pandas?

Answer:

To create a completely empty Pandas DataFrame, do the following:

import pandas as pd

MyEmptydf = pd.DataFrame()

This will create an empty dataframe with no columns or rows.

To create an empty DataFrame with three empty columns (X, Y, and Z), do:

df = pd.DataFrame(columns=['X', 'Y', 'Z'])

 

Ques. 9): What is Python pandas vectorization?

Answer:

Vectorization means applying an operation to a whole array at once instead of looping over its elements in Python. The loop still happens, but in optimised, compiled code, which is usually much faster. Pandas has many vectorized functions, such as aggregations and string functions, that are designed to work on Series and DataFrames. For fast operations, prefer these vectorized functions over explicit loops.
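To make the contrast concrete, here is a minimal sketch that computes the same totals with a Python loop and with vectorized column operations. The data is made up:

```python
import pandas as pd

df = pd.DataFrame({"name": ["ann", "bob"], "price": [10.0, 20.0], "qty": [3, 4]})

# Loop version: one Python-level operation per row.
totals_loop = [row.price * row.qty for row in df.itertuples()]

# Vectorized versions: one call operates on whole columns.
df["total"] = df["price"] * df["qty"]
df["name_upper"] = df["name"].str.upper()

print(df)
```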

 

Ques. 10): What is the difference between the range() and xrange() functions in Python?

Answer:

In Python 2 we have the following two functions to produce a list of numbers within a given range.

range()

xrange()

In Python 3, xrange() no longer exists; it was removed in Python 3.x.

Python 3 therefore has only one function to produce the numbers within a given range: range().

However, range() in Python 3 behaves like xrange() in Python 2: it is lazy rather than building a list.

So the difference between range() and xrange() becomes relevant only when you are using Python 2.

range() and xrange() return values

a) In Python 2, range() creates a list, i.e. it returns a Python list object. For example, range(1, 500, 1) creates a list of 499 integers in memory. Remember, range() generates all the numbers at once.

b) xrange() returns an xrange object that evaluates lazily. It stores only the range arguments and generates the numbers on demand, rather than all at once like range(). This object supports only indexing, iteration, and the len() function.

In a for loop, xrange() produces the numbers one by one: on every iteration it generates the next number and assigns it to the loop variable.
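The lazy behaviour can be observed in Python 3. This is a small sketch that assumes CPython, where sys.getsizeof reports the size of the range object itself:

```python
import sys

# In Python 3, range() is lazy, like xrange() was in Python 2.
numbers = range(1, 500, 1)

print(len(numbers))             # number of values in the range
print(numbers[0], numbers[-1])  # indexing works without building a list

# The range object stores only start, stop, and step, so its size
# does not grow with the number of values; a list's size does.
small = sys.getsizeof(range(10))
large = sys.getsizeof(range(10_000_000))
as_list = sys.getsizeof(list(range(1000)))

print(small == large)
print(as_list > small)
```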

 

Ques. 11):  What does categorical data mean in Pandas?

Answer:

Categorical data is a Pandas data type that corresponds to a categorical variable in statistics. A categorical variable takes a limited, and usually fixed, number of possible values. Gender, country of origin, blood type, social status, observation time, and Likert scale ratings are a few examples. Every value of a categorical is either one of its categories or np.nan. This data type is useful in the following cases:

A string variable that consists of only a few different values. Converting such a string variable to a categorical variable saves memory.

A variable whose lexical order is not the same as its logical order ("one", "two", "three"). After converting it to a categorical and specifying an order on the categories, sorting and min/max use the logical order instead of the lexical order.

As a signal to other Python libraries that the column should be treated as a categorical variable.
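The difference between lexical and logical order can be sketched as follows, using the "one", "two", "three" example from above:

```python
import pandas as pd

sizes = pd.Series(["three", "one", "two", "one", "three"])

# Plain strings sort lexically: "one" < "three" < "two".
lexical = sizes.sort_values().tolist()

# An ordered categorical sorts by the category order we specify.
ordered_sizes = sizes.astype(
    pd.CategoricalDtype(categories=["one", "two", "three"], ordered=True)
)
logical = ordered_sizes.sort_values().tolist()

print(lexical)
print(logical)
print(ordered_sizes.min(), ordered_sizes.max())
```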

 

Ques. 12): To a Pandas DataFrame, how do you add an index, a row, or a column?

Answer:

Adding an index to a DataFrame: When you create a DataFrame, you can pass the labels you want to the index argument. If no index is specified, the DataFrame gets a default integer index that starts at 0 and ends at the number of rows minus one.

Adding rows to a DataFrame: To insert rows, we can assign to .loc with a new label. (.iloc is position-based and cannot add new rows, and the older .ix indexer is no longer available.)

.loc works on the labels of the index. For example, loc[4] refers to the rows whose index label is 4, not the row at position 4. Assigning to a label that does not exist yet appends a new row.

.ix was a hybrid indexer: with an integer-based index, ix[4] looked for the label 4, while with a non-integer index it fell back to positions, like .iloc. Because of this ambiguity it was deprecated in pandas 0.20 and removed in pandas 1.0; use .loc or .iloc instead.

Adding a column to a DataFrame: Assign to a new column name, for example df['new_column'] = values.
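A minimal sketch of all three additions with a current pandas version. The labels r1 to r3 and the column names are illustrative:

```python
import pandas as pd

# Index: pass the labels when creating the DataFrame.
df = pd.DataFrame({"name": ["Ann", "Bob"], "age": [31, 25]}, index=["r1", "r2"])

# Row: assigning to a label that does not exist yet appends a row.
df.loc["r3"] = ["Cid", 40]

# Column: assign a list (or Series, or scalar) to a new column name.
df["city"] = ["Pune", "Oslo", "Lima"]

print(df)
```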

 

Ques. 13): How to Delete Indices, Rows or Columns From a Pandas Data Frame?

Answer:

Deleting an Index from Your DataFrame

If you want to remove the index from a DataFrame, you can do one of the following:

Reset the index of the DataFrame with reset_index().

Remove the index name, for example by setting df.index.name = None.

Remove duplicate index values by resetting the index, dropping the duplicates from the resulting index column, and setting that column as the index again.

Remove an index value together with its row by using drop().

Deleting a Column from Your DataFrame

You can use the drop() method to delete a column from the DataFrame.

The axis argument passed to drop() is 0 to drop rows and 1 to drop columns.

You can pass inplace=True to delete the column without reassigning the DataFrame.

You can also delete duplicate values from a column by using the drop_duplicates() method.

Removing a Row from Your DataFrame

You can use df.drop_duplicates() to remove duplicate rows from the DataFrame.

You can use the drop() method with the index labels of the rows that you want to remove.
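These operations can be sketched on a small, made-up DataFrame. Each call returns a new DataFrame because inplace is left at its default:

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["Ann", "Bob", "Bob"], "age": [31, 25, 25], "city": ["Pune", "Oslo", "Oslo"]},
    index=["r1", "r2", "r3"],
)

no_city = df.drop("city", axis=1)        # drop a column
no_r1 = df.drop("r1", axis=0)            # drop a row by its index label
unique_rows = df.drop_duplicates()       # r2 and r3 are identical; the first is kept
fresh_index = df.reset_index(drop=True)  # replace the index with 0, 1, 2

print(no_city.columns.tolist())
print(unique_rows.index.tolist())
print(fresh_index.index.tolist())
```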

 

Ques. 14): How to convert String to date?

Answer:

The code below demonstrates how to convert strings to dates:

from datetime import datetime

# Define dates as strings

dmy_str1 = 'Saturday, July 14, 2018'

dmy_str2 = '14/7/17'

dmy_str3 = '14-07-2017'

# Parse the strings into datetime objects; each format must match its string

dmy_dt1 = datetime.strptime(dmy_str1, '%A, %B %d, %Y')

dmy_dt2 = datetime.strptime(dmy_str2, '%d/%m/%y')

dmy_dt3 = datetime.strptime(dmy_str3, '%d-%m-%Y')

# Print the converted dates

print(dmy_dt1)

print(dmy_dt2)

print(dmy_dt3)
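In pandas itself, the same conversion is usually done on a whole column with pd.to_datetime. The sketch below assumes day-first strings like the ones above and passes the format explicitly:

```python
import pandas as pd

# Day-first strings; the format is given explicitly to avoid ambiguity.
date_strings = pd.Series(["14-07-2017", "01-12-2018"])
dates = pd.to_datetime(date_strings, format="%d-%m-%Y")

print(dates.dt.year.tolist())
print(dates.dt.month.tolist())
```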

 

Ques. 15): What exactly is the Pandas Index?

Answer:

Pandas indexing is as follows:

In pandas, indexing simply means selecting specific rows and columns of data from a DataFrame. It can mean selecting all of the rows and some of the columns, some of the rows and all of the columns, or some of each. Indexing is also known as subset selection.

Using [], .loc[], and .iloc[] for Pandas indexing

The elements, rows, and columns of a DataFrame can be extracted in several ways. These indexing methods look similar on the surface, but they behave very differently. Pandas has offered the following methods of multi-axis indexing:

DataFrame[] : Also known as the indexing operator; most commonly used to select columns by name.

DataFrame.loc[] : Used for label-based selection.

DataFrame.iloc[] : Used for position-based (integer) selection.

DataFrame.ix[] : Used for both label-based and integer-based selection. It was deprecated in pandas 0.20 and removed in pandas 1.0, so current code should use .loc[] or .iloc[].

Collectively, these are called the indexers. They are by far the most common ways to index data and to get elements, rows, and columns from a DataFrame.
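A short sketch of the three indexers that current pandas versions support, on an illustrative DataFrame:

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["Ann", "Bob", "Cid"], "age": [31, 25, 40]},
    index=["r1", "r2", "r3"],
)

ages = df["age"]               # [] : a column by name
bob_age = df.loc["r2", "age"]  # .loc : row label, column label
first_name = df.iloc[0, 0]     # .iloc : row position, column position
last_two = df.iloc[1:]         # position slice: rows at positions 1 and 2

print(bob_age, first_name)
print(last_two.index.tolist())
```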

 

Ques. 16): Define ReIndexing?

Answer:

Reindexing changes the row labels and column labels of a DataFrame. To reindex means to conform the data to match a given set of labels along a particular axis.

Multiple operations can be accomplished through reindexing, such as:

Reorder the existing data to match a new set of labels.

Insert missing value (NA) markers in label locations where no data for the label existed.
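Both effects show up in a small sketch: existing labels are reordered, and a label with no data (here the made-up label "d") gets a missing value:

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=["a", "b", "c"])

# Reorder existing labels and introduce a new one ("d").
reindexed = s.reindex(["c", "a", "d"])

print(reindexed)
```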

 

Ques. 17): How to Set the index?

Answer:

Python is an excellent language for data analysis, thanks to its vast ecosystem of data-centric Python packages. One of these packages is Pandas, which makes importing and analysing data a lot easier.

Pandas set_index() is a method for setting the index of a DataFrame from existing columns or from arrays (a list, Series, or Index of the right length). The index can also be set when the DataFrame is created, but set_index() lets you change it afterwards, for example after combining several DataFrames.

Syntax:

DataFrame.set_index(keys, drop=True, append=False, inplace=False, verify_integrity=False)
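As a minimal sketch, assuming a DataFrame with a unique "id" column (the column names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"id": [101, 102, 103], "name": ["Ann", "Bob", "Cid"]})

# Use the "id" column as the index; drop=True (the default) removes it
# from the columns. verify_integrity=True raises if ids are duplicated.
by_id = df.set_index("id", verify_integrity=True)

print(by_id.loc[102, "name"])
print(by_id.columns.tolist())
```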

 

Ques. 18): Define GroupBy in Pandas?

Answer:

Pandas dataframe.groupby() function is used to split the data into groups based on some criteria. pandas objects can be split on any of their axes. The abstract definition of grouping is to provide a mapping of labels to group names.

Syntax: DataFrame.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=False, **kwargs)

Parameters :

by : mapping, function, str, or iterable

axis : int, default 0

level : If the axis is a MultiIndex (hierarchical), group by a particular level or levels

as_index : For aggregated output, return object with group labels as the index. Only relevant for DataFrame input. as_index=False is effectively “SQL-style” grouped output

sort : Sort group keys. Get better performance by turning this off. Note this does not influence the order of observations within each group. groupby preserves the order of rows within each group.

group_keys : When calling apply, add group keys to index to identify pieces

squeeze : Reduce the dimensionality of the return type if possible, otherwise return a consistent type. (This parameter was deprecated and then removed in pandas 2.0.)

Returns : GroupBy object
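A small sketch of split-and-aggregate with groupby, including the as_index parameter described above. The sales data is invented:

```python
import pandas as pd

sales = pd.DataFrame(
    {"region": ["east", "west", "east", "west"], "amount": [10, 20, 30, 40]}
)

# Split by region, then sum each group; group labels become the index.
totals = sales.groupby("region")["amount"].sum()

# as_index=False keeps "region" as a regular column ("SQL-style" output).
totals_sql = sales.groupby("region", as_index=False)["amount"].sum()

print(totals)
print(totals_sql)
```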

 

Ques. 19): How will you add a scalar column with same value for all rows to a pandas DataFrame?

Answer:

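To add a column that holds the same scalar value in every row, assign the scalar to a new column name, for example df['new_column'] = value; pandas broadcasts it to all rows. To add a scalar to the existing values instead, use DataFrame.add(), which performs element-wise addition of a DataFrame and other (binary operator add). It is equivalent to dataframe + other, but with support for substituting a fill_value for missing data in one of the inputs.

Syntax: DataFrame.add(other, axis='columns', level=None, fill_value=None)

Parameters:

other : Series, DataFrame, or constant

axis : {0, 1, 'index', 'columns'} For Series input, axis to match Series index on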

fill_value : [None or float value, default None] Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing.

level : [int or name] Broadcast across a level, matching Index values on the passed MultiIndex level

Returns: result DataFrame
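The two operations can be sketched side by side. The column name "source" and the values are illustrative:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, None, 6.0]})

# New column with the same scalar value in every row.
df["source"] = "survey"

# Element-wise addition of a scalar to the numeric columns;
# missing values stay missing when no fill_value is given.
shifted = df[["a", "b"]].add(10)

print(df)
print(shifted)
```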

 

Ques. 20): In pandas, how can you see if a DataFrame is empty?

Answer:

A Pandas DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labelled axes (rows and columns). Arithmetic operations align on both row and column labels. It can be thought of as a dict-like container for Series objects, and it is the primary data structure in Pandas.

The empty attribute indicates whether the DataFrame is empty. It returns True if the DataFrame has no elements (that is, if any of its axes has length 0); otherwise, it returns False.

Syntax: DataFrame.empty

Parameter : None

Returns : bool
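A quick sketch of the attribute, including the case that often surprises people: a DataFrame that contains only NaN values is not empty.

```python
import numpy as np
import pandas as pd

no_data = pd.DataFrame()
only_columns = pd.DataFrame(columns=["X", "Y", "Z"])
only_nan = pd.DataFrame({"X": [np.nan]})

print(no_data.empty)
print(only_columns.empty)
# A frame that holds only NaN still has a row, so it is not empty.
print(only_nan.empty)
```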

 


May 25, 2019

Top 20 Python Interview Questions and Answers

 
Ques: 1. So, what is Python?
 
Answer:
 
Python is an open source, interpreted programming language and one of the most widely used. It includes objects, modules, threads, exceptions, dynamic typing, very high-level dynamic data types, classes, and automatic memory management. It is easy to use, portable, extensible, interactive, and object-oriented, with built-in data structures and a very simple syntax. Python is often used for scripting, but it is a general-purpose programming language.
 


 
Ques: 2. What are the benefits of using Python?
 
Answer:
 
  • Python is one of the most successful interpreted languages. This means that Python scripts do not need to be compiled before they are run.
  • Python is object-oriented: it allows the definition of classes, along with composition and inheritance.
  • In Python, functions and classes are first-class objects. This means that they can be assigned to variables, returned from other functions, and passed into functions.
  • Python is dynamically typed, which means there is no need to state the types of variables when you declare them. You can write a=10 and then a="a good programmer" without error.
  • Python code is very quick to write, although it often runs slower than compiled code. Python allows C-based extensions, so bottlenecks can be optimised away. The numpy package is a good example of this: it is very quick because a lot of its number crunching is not done in Python itself.
  • Python is widely used in web applications, automation, scientific modelling, big data applications, and many more areas.
  • It is also often used as "glue" code to get other languages and components to work together.
 
 
 
 
Ques: 3. What do you know about PEP 8?
 
Answer:
 
PEP is short for Python Enhancement Proposal. PEP 8 is Python's style guide: a set of coding recommendations that specify how to format Python code for maximum readability. Following it helps you write more readable Python code.
 


 
Ques: 4. How does conversion from number to string happen in Python?
 
Answer:
 
To convert a number to a string in Python, use the built-in function str(). For a hexadecimal or octal representation, use the built-in functions hex() or oct(). For fancy formatting, use the % operator on strings, e.g. "%04d" % 144 yields '0144' and "%.3f" % (1/3.0) yields '0.333'.
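The conversions above can be tried out directly. The f-string line is an addition not mentioned in the text; it shows the same format specification in newer syntax:

```python
number = 144

as_text = str(number)
as_hex = hex(number)
as_octal = oct(number)

# Fancy formatting: the % operator, and the equivalent f-string spec.
padded = "%04d" % number
padded_fstring = f"{number:04d}"
third = "%.3f" % (1 / 3.0)

print(as_text, as_hex, as_octal, padded, padded_fstring, third)
```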
 



Ques: 5. What are classes and methods in Python? How will you use them?
 
Answer:
 
A class can be based on one or more other classes, called its base class(es). It then inherits the attributes and methods of its base classes. This allows an object model to be successively refined by inheritance.  A class is the object type created by executing a class statement. Class objects are used as templates to create instance objects, which embody both the data (attributes) and code (methods) specific to a data type.
 
A method is a function on some object x that you normally call as x.name(arguments...). Methods are defined as functions inside the class definition:
class C:
    def __init__(self, attribute):
        self.attribute = attribute

    def name(self, arg):
        return arg * 2 + self.attribute
 


 
Ques: 6. What are some of the core default modules available in Python?
 
Answer:
 
The following are a few of the core default modules available in Python:
 
    xml – Enables XML support.
    string – Contains functions to process standard Python strings.
    traceback – Allows extracting and printing stack trace details.
    email – Helps to parse, handle, and generate email messages.
    sqlite3 – Provides methods to work with the SQLite database.
    logging – Adds support for log classes and methods.
 


 
Ques: 7. What Is Self in Python?
 
Answer:
 
Self is just a name for the first argument of a method. A method defined as name(self, a, b, c) should be called as x.name(a, b, c) for some instance x of the class in which the definition occurs; the called method will think it is called as name(x, a, b, c).
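The equivalence described above can be checked with a tiny sketch. The class name Greeter is hypothetical:

```python
class Greeter:
    def name(self, a, b, c):
        # self is the instance the method was called on.
        return (self, a, b, c)


x = Greeter()

# Calling through the instance passes x as self automatically...
via_instance = x.name(1, 2, 3)
# ...which is equivalent to calling the function on the class with x first.
via_class = Greeter.name(x, 1, 2, 3)

print(via_instance == via_class)
```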
 


 
Ques: 8. How Do I Generate Random Numbers In Python?
 
Answer:
 
The standard module random implements a random number generator. Usage is simple:
import random
random.random()
This returns a random floating point number in the range [0, 1).
You can also generate random numbers in other ways:
· random.uniform(X, Y) - Returns a random floating point number between X and Y.
· random.randint(X, Y) - Returns a random integer between X and Y, with both endpoints included.
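A short sketch of the three functions together. The seed is arbitrary and is there only to make runs repeatable:

```python
import random

random.seed(42)  # arbitrary seed, only to make runs repeatable

unit = random.random()           # float in [0, 1)
scaled = random.uniform(2.5, 5)  # float between 2.5 and 5
die = random.randint(1, 6)       # integer from 1 to 6, both included

print(unit, scaled, die)
```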
 


 
Ques: 9. In Python, how is memory managed?
 
Answer:
 
Memory in Python is managed through a private heap. All Python objects and data structures are located in this private heap. The programmer does not have direct access to it; the interpreter takes care of it. The Python memory manager allocates heap space for Python objects, and the core API gives the programmer access to some tools for working with it. Python also has a built-in garbage collector, which recycles unused memory and makes it available to the heap again.
 
 


Ques: 10. What are the available tools to search the bugs or perform static analysis?
 
Answer:
 
PyChecker is a static analysis tool that finds bugs in Python source code and warns about code complexity and style. Pylint is another tool that checks whether a module satisfies the coding standard; it also supports plug-ins for adding custom features.
 


 
Ques: 11. Can you explain the rules For Local and Global Variables in Python?
 
Answer:
 
In Python, variables that are only referenced inside a function are implicitly global. If a variable is assigned a value anywhere within the function's body, it is assumed to be local unless it is explicitly declared as global. This rule keeps things practical: if global were required for every global reference, you would be using global all the time, because you would have to declare as global every reference to a built-in function or to a component of an imported module.
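The three cases can be sketched as follows. The function and variable names are illustrative:

```python
counter = 0


def read_counter():
    # Only referenced, never assigned: the name resolves to the global.
    return counter


def shadow_counter():
    # Assigned inside the function: this creates a new local variable.
    counter = 99
    return counter


def increment_counter():
    # Assignment to the module-level name needs an explicit declaration.
    global counter
    counter += 1


shadowed = shadow_counter()
increment_counter()
print(read_counter(), shadowed)
```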
 
 
Ques: 12. What do you know about the dictionary in Python?
 
Answer:
 
A Python dictionary associates keys with values. Keys must be unique, because they are used to retrieve the values. A dictionary is a natural place to store and look up information such as addresses and contact details. Strings are commonly used as keys, but any hashable object can be a key.

Each key is separated from its value by a colon, and the key-value pairs are separated from each other by commas. The whole expression is enclosed in curly brackets.
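A minimal sketch of this syntax and of key lookup. The contact data is made up:

```python
# Keys and values are joined by colons; pairs are separated by commas.
contacts = {"alice": "alice@example.com", "bob": "bob@example.com"}

# Look up a value by its key.
alice_email = contacts["alice"]

# Assigning to an existing key replaces the value, so keys stay unique.
contacts["bob"] = "robert@example.com"

# get() returns a default instead of raising KeyError for a missing key.
missing = contacts.get("carol", "unknown")

print(alice_email, contacts["bob"], missing, len(contacts))
```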
 
 
Ques: 13. Can you explain the Sharing of Global Variables Across Modules in Python?
 
Answer:
 
The canonical way to share information across modules within a single program is to create a special module (often called config or cfg). Just import the config module in all modules of your application; the module then becomes available as a global name. Because there is only one instance of each module, any changes made to the module object get reflected everywhere. For example:
 
config.py:
x = 0 # Default value of the 'x' configuration setting
mod.py:
import config
config.x = 1
main.py:
import config
import mod
print(config.x)
 
 
Ques: 14. Do you know about Indexing and Slicing Operation in Sequences?
 
Answer:
 
Sequences such as tuples, lists, and strings support two main operations: indexing and slicing. Indexing fetches a single item from the sequence, while slicing retrieves a part of the sequence. If the start of a slice is omitted, the slice begins at the start of the sequence; if the end is omitted, it runs to the end of the sequence. The start position is included in the slice, but the slice stops just before the end position.
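These rules can be seen in a short sketch on a list and a string:

```python
letters = ["a", "b", "c", "d", "e"]

first = letters[0]      # indexing: a single item
last = letters[-1]      # negative indices count from the end

middle = letters[1:3]   # slicing: start included, end excluded
head = letters[:2]      # omitted start: begin at the first item
tail = letters[3:]      # omitted end: run to the last item

word_part = "python"[1:4]  # strings are sequences too

print(first, last, middle, head, tail, word_part)
```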
 
 
Ques: 15. What do you mean by Raising Error Exceptions in Python?
 
Answer:
 
A programmer can raise exceptions with the raise statement. The statement names the exception to raise, either an exception class or an instance of one, and that class must derive from BaseException (usually from Exception). For example, we can raise an exception when a user name or password does not meet a length requirement.
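As a sketch of the user name example, with a hypothetical ShortInputError class and check_username helper (neither is part of the standard library):

```python
class ShortInputError(Exception):
    """Hypothetical exception for input that is too short."""

    def __init__(self, length, at_least):
        super().__init__(f"expected at least {at_least} characters, got {length}")
        self.length = length
        self.at_least = at_least


def check_username(username, at_least=3):
    if len(username) < at_least:
        raise ShortInputError(len(username), at_least)
    return username


try:
    check_username("ab")
except ShortInputError as error:
    message = str(error)
    print(message)
```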
 
 
Ques: 16. What do you mean by a Lambda Form?
 
Answer:
 
The lambda keyword creates a small anonymous function at run time, which can be called later like any other function. The body of a lambda is a single expression, and the value of that expression is what the function returns. A common illustration is a make_repeater function that uses lambda to build and return a new function each time it is called.
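A minimal sketch of such a make_repeater function. This is one plausible form of it, not necessarily the exact code the answer had in mind:

```python
def make_repeater(n):
    # Build and return a new function at run time; it remembers n.
    return lambda s: s * n


twice = make_repeater(2)

print(twice("word"))
print(twice(5))
```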
 
 
Ques: 17. What do you understand by Assert Statement?
 
Answer:
 
The assert statement checks that a condition is true. If the condition is false, it raises an AssertionError, optionally with a message. It is useful when you are sure that something should hold, for example that a list contains at least one item, and you want the program to fail early if it does not.
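A small sketch of a passing and a failing assertion. Note that assert statements are skipped when Python runs with the -O option, so this assumes a normal run:

```python
items = ["only item"]

# Passes silently: the list has at least one element.
assert len(items) >= 1, "items must not be empty"
items.pop()

try:
    # Fails now: the list is empty, so AssertionError is raised.
    assert len(items) >= 1, "items must not be empty"
except AssertionError as error:
    failure = str(error)
    print(failure)
```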
 
 
Ques: 18. What are Pickling and Unpickling in Python?
 
Answer:
 
Python has a standard module called pickle that lets you store a Python object in a file and get it back later. Storing the object, for example with the dump function, is known as pickling. Retrieving the object again, with the load function, is known as unpickling.
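A minimal round trip, using the in-memory variants dumps and loads so that no file is needed. The dictionary is made up, and pickled data should only be loaded from sources you trust:

```python
import pickle

shopping = {"fruit": ["apple", "mango"], "count": 2}

# Pickling: serialize the object to bytes (pickle.dump writes to a file).
payload = pickle.dumps(shopping)

# Unpickling: rebuild an equal object from the bytes.
restored = pickle.loads(payload)

print(restored == shopping, restored is shopping)
```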
 
 
Ques: 19. How is a tuple different from a list?
 
Answer:
 
A tuple is an immutable sequence. A list is mutable, which means that its members can be changed. A tuple is immutable, which means that its members cannot be changed.
The other significant difference is the syntax. A list is defined as
 
listA = [1,2,5,8,5,3,]
listB= ["John", "Alice", "Tom"]
A tuple is defined in the following way
tupX = (1,4,2,4,6,7,8)
tupY = ("John","Alice", "Tom")
 
 
Ques: 20. Does Python pass arguments by value or by reference?
 
Answer:
 
Neither, strictly speaking. Python passes arguments by assignment: the parameter name inside the function is bound to the same object that the caller passed.

The parameter you pass is a reference to an object, not a reference to a fixed memory location, and that reference itself is passed by value. As a result, a function can change a mutable object (such as a list) in place and the caller sees the change, but rebinding the parameter to a new object does not affect the caller. Immutable types such as strings and tuples cannot be changed in place at all.
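The difference between mutating and rebinding can be sketched as follows. The function names are illustrative:

```python
def append_item(values):
    # Mutates the object the caller passed: visible outside the function.
    values.append(4)


def rebind(values):
    # Rebinds the local name only: the caller's list is untouched.
    values = [0]
    return values


numbers = [1, 2, 3]
append_item(numbers)
rebound = rebind(numbers)

print(numbers, rebound)
```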