
April 15, 2022

Top 20 Apache Storm Interview Questions and Answers

     

Apache Storm is a free and open-source distributed stream processing platform written largely in Clojure. The project was created by Nathan Marz and the team at BackType, and it was open-sourced after Twitter acquired BackType. Storm makes it simple to reliably process unbounded streams of data, providing real-time processing where Hadoop provides batch processing. Storm is simple to use and works with a variety of programming languages.


Apache Kafka Interview Questions and Answers 


Ques. 1): Where would you put Apache Storm to good use?

Answer:

Storm is used for:

Stream processing: Apache Storm is used to process real-time streams of data and update multiple databases. The input data must be processed at a rate that keeps up with the incoming stream.

Distributed RPC: Apache Storm's Distributed RPC can parallelize an intense query so that it is computed in real time.

Continuous computation: Data streams are processed in real time, and Storm presents the results to clients. This may require processing each message as it arrives or processing small batches over a short period of time. Streaming trending topics from Twitter into web browsers is an example of continuous computation.

Real-time analytics: Apache Storm evaluates and responds to data as it arrives from multiple data sources in real time.

 

Apache Struts 2 Interview Questions and Answers


Ques. 2): What exactly is Apache Storm? What are the Storm components?

Answer:

Apache Storm is an open-source distributed real-time computation system for processing big data analytics in real time. Unlike Hadoop's batch processing, Apache Storm supports real-time processing and can be used with any programming language.

Apache Storm contains the following components:

Nimbus: It plays the role that the JobTracker plays in Hadoop. It distributes code across the cluster, uploads computations for execution, assigns workers to the cluster, and monitors and reallocates workers as needed.

Zookeeper: It serves as the coordination service for the Storm cluster; Nimbus and the Supervisors exchange all cluster state through it.

Supervisor: It runs on each worker node, interacts with Nimbus through Zookeeper, and starts or stops worker processes based on the signals it receives from Nimbus.

 

Apache Spark Interview Questions and Answers


Ques. 3): How many distinct layers does Storm's codebase have?

Answer:

Storm's codebase is divided into three tiers.

First: Storm was built from the ground up to work with a variety of languages. Nimbus is a Thrift service, and topologies are defined as Thrift structures. Because Storm uses Thrift, topologies can be defined and submitted from any language.

Second: Storm specifies all of its interfaces as Java interfaces. So, despite the fact that Storm's implementation contains a lot of Clojure, all usage must go through the Java API. This means that Storm's whole feature set is always accessible via Java.

Third: Storm’s implementation is largely in Clojure. Line-wise, Storm is about half Java code, half Clojure code. But Clojure is much more expressive, so in reality, the great majority of the implementation logic is in Clojure.


Apache Hive Interview Questions and Answers


Ques. 4): What are Storm Topologies and How Do They Work?

Answer:

A Storm topology contains the logic for a real-time application. A topology is analogous to a MapReduce job; a key distinction is that a MapReduce job eventually finishes, whereas a topology runs indefinitely (or until you kill it, of course). A topology is a graph made up of spouts, bolts, and the stream groupings that connect them.
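Components of a topology can be written in any language through Storm's multi-lang protocol. As a minimal sketch, here is a bolt written in Python with the storm.py helper module that ships with Storm, adapted from the word-count example in the Storm documentation (it assumes the input tuple carries a sentence in its first field):

import storm

class SplitSentenceBolt(storm.BasicBolt):
    # Called once per input tuple; emits one output tuple per word.
    def process(self, tup):
        for word in tup.values[0].split(" "):
            storm.emit([word])

SplitSentenceBolt().run()

On the Java side, spouts and bolts such as this one are wired into the topology graph with TopologyBuilder, together with the stream groupings that connect them.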


Apache Tomcat Interview Questions and Answers


Ques. 5): What do you mean when you say "nodes"?

Answer:

The Master Node and the Worker Node are the two types of nodes. The Master Node runs the Nimbus daemon, which assigns work to machines and monitors their performance. The Supervisor daemon runs on each Worker Node; it listens for work assigned to its machine and starts and stops worker processes as directed by Nimbus.


Apache Ambari interview Questions and Answers


Ques. 6): Explain how Apache Storm processes a message completely.

Answer:

Storm requests a tuple from the Spout by invoking the nextTuple method on the Spout. To emit a tuple to one of its output streams, the Spout uses the SpoutOutputCollector provided in the open method. The Spout assigns a "message id" to each tuple it emits, which is used to identify the tuple later.

The tuple is then transmitted to the consuming bolts, and Storm tracks the tree of messages that is generated. When Storm determines that a tuple has been fully processed, it invokes the ack method on the originating Spout task, passing in the message id that the Spout provided.


Apache Tapestry Interview Questions and Answers


Ques. 7): When my topology is being set up, why am I getting a NotSerializableException/IllegalStateException?

Answer:

Before the topology is run, it is instantiated, serialised to byte format, and stored in ZooKeeper as part of the Storm lifecycle.

Serialization will fail at this stage if a spout or bolt in the topology has an initialised non-serializable field. If a non-serializable field is required, initialise it in the prepare method of the bolt or spout, which is called after the topology is deployed to the worker.


Apache Ant Interview Questions and Answers


Ques. 8): In Storm, how do you kill a topology?

Answer:

storm kill topology-name [-w wait-time-secs]

The topology with the name topology-name is killed. Storm deactivates the topology's spouts for the duration of the topology's message timeout, allowing all messages currently being processed to complete. Storm then shuts down the workers and cleans up their state. The -w flag overrides the wait time between deactivation and shutdown.


Apache Camel Interview Questions and Answers


Ques. 9): What is the best way to write integration tests in Java for an Apache Storm topology?

Answer:

For integration testing, you can use LocalCluster. For ideas, have a look at some of Storm's own integration tests. FeederSpout and FixedTupleSpout are the tools to employ. Using the utilities in the Testing class, a topology in which all spouts implement the CompletableSpout interface can be run to completion. Storm tests can additionally "simulate time," meaning the Storm topology sits idle until LocalCluster.advanceClusterTime is called. This allows you to make assertions, for example, between bolt emits.


Apache Cassandra Interview Questions and Answers


Ques. 10): What's the difference between Apache Kafka and Apache Storm, and why should you care?

Answer:

Apache Kafka: Apache Kafka is a distributed and scalable messaging system that can manage massive amounts of data while allowing messages to flow from one end-point to the next. Kafka is meant to allow a single cluster to function as the central data backbone of a huge organization. It can be expanded elastically and transparently without any downtime. Data streams are partitioned and spread across a cluster of machines, so a stream can be larger than any single machine can hold and clusters of consumers can be coordinated.

Whereas:

Apache Storm is a real-time message processing system that lets you update and manipulate data in real time. Storm pulls data from Kafka and applies the required processing. It makes it simple to reliably process unbounded streams of data, doing for real-time processing what Hadoop does for batch processing. Storm is easy to use and works with any programming language.


Apache NiFi Interview Questions and Answers


Ques. 11): When do you invoke the cleanup procedure?

Answer:

When a Bolt is shut down, the cleanup method is called; it should release any resources that were opened. There is no guarantee that this method will be called on the cluster: for example, if the machine where the task is running crashes, the method will never be invoked.

The cleanup method is mainly useful when you run topologies in local mode (where a Storm cluster is simulated) and want to run and kill several topologies without resource leaks.

 

Ques. 12): What are the common configurations in Apache Storm?

Answer:

Config.TOPOLOGY_WORKERS : This sets the number of worker processes to use to execute the topology.

Config.TOPOLOGY_ACKER_EXECUTORS : This sets the number of executors that track tuple trees and detect when a spout tuple has been fully processed. If this variable is left unset or set to null, Storm sets the number of acker executors equal to the number of workers configured for the topology.

Config.TOPOLOGY_MAX_SPOUT_PENDING : This sets the maximum number of spout tuples that can be pending on a single spout task at once.

Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS : This is the maximum amount of time a spout tuple has to be fully completed before it is considered failed.

Config.TOPOLOGY_SERIALIZATIONS : You can register more serializers to Storm using this config so that you can use custom types within tuples.
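These constants correspond to plain configuration keys that can also be set in storm.yaml or in the configuration map passed when submitting a topology. A minimal sketch (the key names follow Storm's defaults; the values are illustrative):

topology.workers: 4
topology.acker.executors: 4
topology.max.spout.pending: 1000
topology.message.timeout.secs: 30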

 

Ques. 13):  What are the advantages of using Apache Storm?

Answer:

Using Apache Storm for real-time processing has a number of advantages.

First: Storm is easy to use. Its sensible default configuration makes it simple to deploy and operate.

Second: it is extremely fast; a single node has been benchmarked at processing around a million tuples per second.

Third: Apache Storm is scalable: a topology can be spread across a large number of machines, and Storm works with any programming language.

Fourth: it automatically detects failures and restarts the workers.

Fifth: it is reliable, which is one of Apache Storm's most significant benefits. It guarantees that each unit of data is processed at least once.

 

Ques. 14): In Storm, tell us about the stream groups that are built-in?

Answer:

Storm has eight built-in stream groupings:

1.      Shuffle Grouping: Tuples are distributed randomly across the bolt's tasks so that each task receives a roughly equal number of tuples.

2.      Fields Grouping: The stream is partitioned by the fields specified in the grouping, so tuples with the same field values always go to the same task.

3.      Partial Key Grouping: Like fields grouping, the stream is partitioned by the fields specified in the grouping, but the load is balanced between two downstream tasks, which gives better resource utilisation when the incoming data is skewed.

4.      All Grouping: The stream is replicated to every task of the bolt, so it should be used with care.

5.      Global Grouping: The entire stream goes to a single one of the bolt's tasks.

6.      None Grouping: You do not care how the stream is grouped; currently this behaves the same as shuffle grouping.

7.      Direct Grouping: A special kind of grouping in which the producer of a tuple decides which task of the consumer will receive it.

8.      Local or Shuffle Grouping: If the target bolt has one or more tasks in the same worker process, tuples are shuffled to just those in-process tasks; otherwise, this acts like a normal shuffle grouping.

 

Ques. 15): What role does real-time analytics play?

Answer:

Real-time analytics is critical, and the need for it is growing rapidly. With real-time insights, applications can deliver answers immediately. Real-time analytics covers a wide range of industries, including retail, telecommunications, and finance. In banking, for example, fraudulent transactions occur often, and real-time analytics can help detect and identify them as they happen. It also has a place on social media sites such as Twitter, where the most popular topics are surfaced to users as they trend. Real-time analytics plays a part in attracting visitors and generating revenue.

 

Ques. 16): Why is SSL not included in Apache?

Answer:

SSL is not included in Apache for significant legal reasons. Some governments restrict the import, export, and use of the encryption technology that SSL requires for data transport, so if SSL were included, Apache would no longer be freely distributable. In addition, some of the SSL technology used for talking to current clients was patented by RSA Data Security, which did not allow its use without a license.

 

Ques. 17): What is the purpose of the Server Type directive in the Apache server?

Answer: The ServerType directive in the Apache server determines whether Apache should keep everything in one process or spawn child processes. The ServerType directive is not available in Apache 2.0. It is, however, available in Apache 1.3 for backward compatibility with UNIX versions of Apache.

 

Ques. 18): Worker processes, executors, and tasks: what forms a running topology?

Answer:

Storm differentiates between the three major entities that are utilised to run a topology in a Storm cluster:

A worker process executes a subset of a topology. A worker process belongs to a specific topology and may run one or more executors for one or more of the topology's components (spouts or bolts). Within a Storm cluster, a running topology consists of many such processes running on many machines.

An executor is a thread spawned by a worker process. It may run one or more tasks for the same component (spout or bolt).

A task performs the actual data processing: each spout or bolt that you implement in your code executes as tasks across the cluster. The number of tasks for a component is always the same throughout the lifetime of a topology, but the number of executors (threads) for a component can change over time.

 

Ques. 19): When a Nimbus or Supervisor daemon dies, what happens?

Answer:

The Nimbus and Supervisor daemons are designed to be stateless (all state is kept in Zookeeper or on disk) and fail-fast (the process self-destructs whenever an unexpected situation occurs).

The Nimbus and Supervisor daemons must be run under supervision using a tool like daemontools or monit. Then, if the Nimbus or Supervisor daemons die, they restart as if nothing had happened.

Most notably, the death of Nimbus or the Supervisors does not affect running worker processes. In Hadoop, by contrast, if the JobTracker dies, all running jobs are lost.

 

Ques. 20): What are some general guidelines you can provide me for customising Storm+Trident?

Answer:

Make the number of workers a multiple of the number of machines, the parallelism a multiple of the number of workers, and the number of Kafka partitions a multiple of the spout parallelism.

Use one worker per machine per topology.

Begin with fewer, larger aggregators, one per machine with workers.

Make use of the isolation scheduler.

Use one acker per worker; version 0.9 makes this the default, but earlier versions do not.

Enable GC logging; if everything is in order, you should only notice a few significant GCs.

Set the trident batch millis to around 50% of your average end-to-end latency.

Start with a small max spout pending — one for Trident, or the number of executors for Storm — and gradually increase it until the flow stops changing. You'll probably end up near 2*(throughput in recs/sec)*(end-to-end latency) (about twice the capacity suggested by Little's law).




January 03, 2022

Top 20 Apache Tomcat Interview Questions and Answers

  

Tomcat is a Java Servlet container and web server developed by the Apache Software Foundation's Jakarta project. Client browsers send requests to a web server, which responds with web pages. Web servers can generate dynamic content based on the user's requests, and Tomcat excels at this because it supports both the Java Servlet and JavaServer Pages (JSP) technologies. Tomcat can be used as a web server for a variety of applications whenever a free servlet and JSP engine is required. It can run on its own or alongside standard web servers such as Apache httpd, with the standard server delivering static pages while Tomcat handles dynamic servlet and JSP requests.

Apache Tomcat is an open-source implementation of the Java Servlet, JavaServer Pages, Java Expression Language, and Java WebSocket specifications. Many firms hire DevOps engineers, Apache Tomcat administrators, and Hadoop developers at varying levels of experience. Apache is the most popular web server, and you must be familiar with it if you plan to work as a middleware/system/web administrator. Apache HTTP is a free and open-source web server that runs on Windows and Linux.

 Apache Kafka Interview Questions and Answers

Ques. 1): Who is in charge of Tomcat?

Answer:

The Apache Software Foundation is the correct answer. The Apache Software Foundation is a non-profit organisation that oversees several Open Source projects.

The Apache Software Foundation's Java-based projects are referred to as Jakarta.

Tomcat is an Apache Jakarta project that manages server-side Java (in the form of Servlets and JSPs). Tomcat is the "reference" implementation of the Servlet and JSP specifications, which means that anything that runs in Tomcat should run in any compliant Servlet / JSP container.

 Apache Tapestry Interview Questions and Answers

Ques. 2): Difference between apache and apache-tomcat server?

Answer: 

Apache: Apache is mostly used to serve static content, but there are numerous add-on modules (some of which are included with Apache) that allow it to modify the content and serve dynamic content written in Perl, PHP, Python, Ruby, and other languages.

Apache is an HTTP server that serves HTTP requests.

Tomcat is a servlet/JSP container developed by Apache. It's written in the Java programming language. Although it can provide static information, its primary function is to host servlets and JSPs.

JSP files (which are comparable to PHP and older ASP files) are converted into Java code (HttpServlet), which is then compiled into .class files and executed by the Java virtual machine.

Apache Tomcat is used to deploy your Java Servlets and JSPs. So in your Java project, you can build your WAR (short for Web ARchive) file, and just drop it in the deploy directory in Tomcat.

Although it is possible to get Tomcat to run Perl scripts and the like, you wouldn’t use Tomcat unless most of your content was Java.

Tomcat is a Servlet and JSP Server serving Java technologies

 Apache Ambari interview Questions & Answers

Ques. 3):  What exactly is Coyote?

Answer:

Coyote is a Tomcat Connector component that acts as a web server and supports the HTTP 1.1 protocol. This enables Catalina, which is otherwise a Java Servlet and JSP container, to also serve local files as HTTP documents.

Coyote monitors a specific TCP port for incoming connections to the server and transmits the request to the Tomcat Engine, which processes the request and returns a response to the requesting client.

Coyote is Tomcat's HTTP connector, which offers an interface for browsers to connect to.

 Apache Hive Interview Questions & Answers

Ques. 4): What is a servlet container?

Answer:

A servlet container is a web server component that communicates with Java servlets. The servlet container is in charge of managing servlet lifecycles, mapping URLs to specific servlets, and ensuring that the URL requester has the appropriate access privileges.

Requests to servlets, JavaServer Pages (JSP) files, and other types of files containing server-side code are handled by the servlet container. The Web container generates servlet instances, loads and unloads servlets, creates and manages request and response objects, and handles other servlet-related operations.

The web component contract of the Java EE architecture is implemented by the servlet container, which defines a runtime environment for web components that includes security, concurrency, lifecycle management, transaction, deployment, and other services.

Apache Spark Interview Questions & Answers 

Ques. 5): How Can I Change The Default Home Page Loaded By Tomcat?

Answer :

We can easily override the home page by adding a welcome-file-list to the application's $TOMCAT_HOME/webapps/<app>/WEB-INF/web.xml file, or by editing the container-wide $TOMCAT_HOME/conf/web.xml.

In $TOMCAT_HOME/conf/web.xml, it may look like this:

<welcome-file-list>
    <welcome-file>index.html</welcome-file>
    <welcome-file>index.htm</welcome-file>
    <welcome-file>index.jsp</welcome-file>
</welcome-file-list>

When a request URI refers to a directory, the default servlet looks for a "welcome file" within that directory in the following order: index.html, index.htm and index.jsp.

Apache NiFi Interview Questions & Answers 

Ques. 6): What is the difference between the Apache and Nginx web servers?

Answer:

Both are classified as Web Servers, but there are a few key differences. Nginx is an event-driven web server, whereas Apache is a process-driven web server.

Nginx has a reputation for being faster than Apache.

Whereas Nginx does not support OpenVMS or IBMi, Apache supports a wide range of operating systems.

Nginx is still catching up to Apache in terms of module interoperability with backend application servers.

Nginx is a lightweight web server that is rapidly gaining market share. If you're new to Nginx, you might be interested in reading some of my Nginx articles.

 Apache Ant Interview Questions and Answers

Ques. 7): How Do You Create Multiple Virtual Hosts?

Answer :

If you want Tomcat to accept requests for different hosts, e.g. www.myhostname.com, then you must:

Create ${catalina.home}/www/appBase, ${catalina.home}/www/deploy, and ${catalina.home}/conf/Catalina/www.myhostname.com

Add a host entry in the server.xml file

Create the following file under conf/Catalina/www.myhostname.com/ROOT.xml

Add any parameters specific to this hosts webapp to this context file

Put your war file in ${catalina.home}/www/deploy

When tomcat starts, it finds the host entry, then looks for any context files and will start any apps with a context.
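The host entry added to server.xml would look roughly like this (the host name and paths follow the example above; the attribute values are illustrative):

<Host name="www.myhostname.com" appBase="www/appBase" unpackWARs="true" autoDeploy="true" />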

 

Ques. 8): In Apache Tomcat, what is Catalina?

Answer:

Catalina is Tomcat's servlet container. It implements all of the Java Servlet and JavaServer Pages specifications and provides an efficient environment for servlets to execute in. Jasper, Tomcat's JSP engine, compiles each JSP into a servlet, which Catalina can then manage.

 

Ques. 9): What exactly do you mean by Tomcat's default port, and can it be used with SSL?

Answer:

Tomcat uses port 8080 as its default port. You can change it by editing the server.xml file in the conf folder of the Tomcat install directory: adjust the Connector element's port attribute (port="8080") to the desired value, then restart Tomcat for the change to take effect.

Tomcat can use SSL, but it will require some configuration. You must complete the following tasks:

Generate a keystore

Then add a connector in server.xml

Restart Tomcat
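A sketch of those steps, assuming a keystore at /path/to/keystore.jks and the classic (pre-Tomcat 8.5) connector attribute syntax:

keytool -genkey -alias tomcat -keyalg RSA -keystore /path/to/keystore.jks

Then, in server.xml:

<Connector port="8443" protocol="HTTP/1.1" SSLEnabled="true"
           scheme="https" secure="true" clientAuth="false" sslProtocol="TLS"
           keystoreFile="/path/to/keystore.jks" keystorePass="changeit" />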

 

Ques. 10): What is a mod_evasive module, and what does it do?

Answer:

Mod_evasive is a third-party module that does one simple task, and does it very well. It detects when your site is under a Denial of Service (DoS) attack and mitigates the harm that the attack causes. When a single client makes repeated requests in a short period of time, mod_evasive recognises this and refuses further requests from that client. The ban can be short-lived, because it is simply reissued the next time a request is detected from that same host.

 

Ques. 11): Explain Directory Structure Of Tomcat?

Answer :

Directory structure of Tomcat are:

bin - contains startup, shutdown, and other scripts (*.sh for UNIX and *.bat for Windows systems), along with some JAR files.

conf - server configuration files (including server.xml) and related DTDs. The most important file here is server.xml, the main configuration file for the container.

lib - contains the JAR files used by the container, including the Servlet and JSP application programming interfaces (APIs).

logs - log and output files.

webapps - deployed web applications reside here.

work - temporary working directories for web applications, used mostly during JSP compilation, when a JSP is converted to a Java servlet.

temp - directory used by the JVM for temporary files.

 

Ques. 12): Explain How Running Tomcat As A Windows Service Provides Benefits?

Answer :

Running Tomcat as a windows service provides benefits like:

Automatic startup: crucial for environments where you may want to remotely restart a system after maintenance.

Server startup without active user login: Tomcat often runs on blade servers that may not even have an active monitor attached to them. Windows services can be started without an active user login.

Security: running Tomcat as a Windows service lets it run under a special system account that is protected from the rest of the user accounts.

 

Ques. 13): How Do Servlet Life Cycles Work?

Answer:

The life-cycle of a typical Tomcat servlet is as follows:

Through one of its connectors, Tomcat receives a request from a client.

Tomcat routes the request to the appropriate servlet for processing.

Tomcat checks that the servlet class has been loaded. If it has not, Tomcat loads the servlet class and instantiates it, and the JVM executes the servlet's bytecode.

Tomcat initialises the servlet by invoking its init method. The servlet can contain code that inspects Tomcat configuration files, takes appropriate action, and declares any resources it might need.

Once the servlet has been initialised, Tomcat can call the servlet's service method to process the request.

During the servlet's lifecycle, Tomcat and the servlet can coordinate and communicate through listener classes, which track the servlet for a variety of state changes.

To remove the servlet, Tomcat calls the servlet's destroy method.

 

Ques. 14): In Tomcat, what is the difference between a host and a context?

Answer:

In Tomcat, a host is a component that associates a network name with the server. A context, on the other hand, is an element that represents a web application running on a particular virtual host. Web applications are based on a Web Application Archive (WAR) file or a corresponding unpacked directory containing the content described in the web application's deployment descriptor.

 

Ques. 15): What Is The Distinction Between A Webserver And An Application Server?

Answer:

The main distinction between a web server and an application server is that a web server can only execute web applications, such as servlets and JSPs, and has just one container, the web container, that is used to interpret and execute web applications. An application server can additionally run enterprise applications (servlets, JSPs, and EJBs).

An application server has two containers:

Web container (for interpreting/executing servlets and JSPs)

EJB container (for executing EJBs).

It can also perform operations like load balancing and transaction demarcation.

 

Ques. 16): Apart from Apache Tomcat, what are the different kinds of Web Servers?

Answer:

There are many web servers as mentioned below:

LiteSpeed Web Server

GWS Web Server

Microsoft IIS Web Server

Nginx Web Server

Jigsaw Web Server

Sun Java System Web Server

Lighttpd Web Server

 

Ques. 17): How to limit upload size?

Answer:

I have a web application that allows users to upload files such as word documents, pdf and so on.  How do I limit file upload by users?

You can make use of the LimitRequestBody directive to limit upload file size.

<Directory "usr/local/apache2/uploads">

LimitRequestBody 9000

</Directory>

The value assigned to LimitRequestBody tells Apache to accept and store file uploads of up to 9000 bytes. You can adjust the value based on your requirements.

 

Ques. 18): Explain how to use WAR files to deploy a web application.

Answer:

JSPs, servlets, and their associated files are placed under Tomcat's web applications directory in the appropriate subdirectories. You can combine all of the files of an application into a single compressed file with the extension .war. A web application can be deployed by placing the WAR file in the webapps directory. When Tomcat starts up, it extracts the contents of the WAR file and places them in the proper webapps subdirectories.
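For example, assuming a WAR named myapp.war and a standard Tomcat layout:

cp myapp.war $CATALINA_HOME/webapps/
$CATALINA_HOME/bin/startup.sh
# Tomcat expands the archive to webapps/myapp/ and serves it under the /myapp context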

 

Ques. 19): How can an Apache Service be stopped by its control script?

Answer:

The Apache service is controlled using a script called apachectl.

So, to stop the service, we need to run the below-mentioned commands.

# apachectl stop [for Ubuntu based systems]

# /etc/init.d/httpd stop [for Red Hat based systems]

 

Ques. 20): What is the purpose of the Listen directive in the Apache server?

Answer:

The Listen directive tells Apache which IP addresses and ports to accept connections on, so it is very important for Apache administrators and developers. If the server has multiple IPs, you must explicitly specify the IP and port in the Listen directive if you want Apache to listen on only one of them.

For example: Listen 10.10.10.20:80

 

 

Top 20 Apache Kafka Interview Questions and Answers

 

Apache Kafka is a free and open-source streaming platform. Kafka began as a messaging queue at LinkedIn, but it has since grown into much more. It's a flexible tool for working with data streams that may be used in a wide range of situations. Because Kafka is a distributed system, it can scale up and down as needed; all that's required is to add new Kafka nodes (servers) to the cluster.

Kafka can process a large volume of data in a short period of time. It also has low latency, allowing for real-time data processing. Although Apache Kafka is written in Scala and Java, it can be used from a wide range of programming languages.


Apache Hive Interview Questions & Answers


Ques. 1): What exactly do you mean when you say "confluent kafka"? What are the benefits?

Answer:

Confluent is an Apache Kafka-based data streaming platform that can do more than just publish and subscribe. It can also store and process data within the stream. Confluent Kafka is a more extensive version of Apache Kafka. It improves Kafka's integration capabilities by adding tools for optimising and maintaining Kafka clusters, as well as methods for ensuring the security of the streams. Because of the Confluent Platform, Kafka is simple to set up and use. Confluent's software is available in three flavours:

A free, open-source streaming platform that makes working with real-time data streams a breeze;

A premium cloud-based version with more administration, operations, and monitoring features;

An enterprise-grade version with more administration, operations, and monitoring tools.

Following are the advantages of Confluent Kafka :

  • It features practically all of Kafka's characteristics, as well as a few extras.
  • It greatly simplifies the administrative operations procedures.
  • It relieves data managers of the burden of thinking about data relaying.


Apache Ambari interview Questions & Answers


Ques. 2): What are some of Kafka's characteristics?

Answer:

The following are some of Kafka's most notable characteristics:-

  • Kafka is a fault-tolerant messaging system with a high throughput.
  • Kafka includes a built-in partition system known as a Topic.
  • Kafka also comes with a replication mechanism.
  • Kafka is a distributed messaging system that can manage massive volumes of data and transfer messages from one sender to another.
  • The messages can also be saved to storage and replicated across the cluster using Kafka.
  • Kafka works with Zookeeper for synchronisation and collaboration with other services.
  • Kafka provides excellent support for Apache Spark.


Apache Tapestry Interview Questions and Answers


Ques. 3): What are some of the real-world usages of Apache Kafka?

Answer:

The following are some examples of Apache Kafka's real-world applications:

Message Broker: Because Apache Kafka has a high throughput value, it can handle a large number of similar sorts of messages or data. Apache Kafka can be used as a publish-subscribe messaging system that makes it simple to read and publish data.

Website activity tracking: Apache Kafka can verify that data is successfully sent and received by websites. It can handle the huge volumes of data generated by websites for each page and for user actions.

Operational metrics: Apache Kafka can be used to monitor operational data, keeping track of metrics connected to certain technologies, such as security logs.

Data logging: Apache Kafka provides data replication between nodes, functionality that can be used to restore data on failed nodes. It can also be used to collect data from various logs and make it available to consumers.

Stream Processing with Kafka: Apache Kafka can also handle streaming data: data that is read from one topic, processed, and then written to another. Users and applications will have access to a new topic containing the processed data.


Apache NiFi Interview Questions & Answers


Ques. 4): What are some of Kafka's disadvantages?

Answer:

The following are some of Kafka's drawbacks:

  • When messages are tweaked, Kafka performance suffers. Kafka works well when the message does not need to be updated.
  • Kafka does not support wildcard topic selection; it's crucial to use the exact topic name.
  • When dealing with large messages, brokers and consumers degrade Kafka's performance by compressing and decompressing the messages. This has an effect on Kafka's performance and throughput.
  • Kafka does not support several message paradigms, such as point-to-point queues and request/reply.
  • Kafka lacks a comprehensive set of monitoring tools.


Apache Spark Interview Questions & Answers


Ques. 5): What are the use cases of Kafka monitoring?

Answer:

The following are some examples of Kafka monitoring use cases:

  • Monitor the use of system resources: track the usage of system resources such as memory, CPU, and disc over time.
  • Monitor threads and JVM usage: Kafka relies on the Java garbage collector to free up memory, and monitoring GC activity helps keep the Kafka cluster healthy.
  • Keep an eye on the broker, controller, and replication statistics so that partition and replica statuses can be adjusted as needed.
  • Identify which applications are producing excessive demand, so performance bottlenecks can be found and resolved quickly.

 

Ques. 6): What is the difference between Kafka and Flume?

Answer:

Flume's main application is ingesting data into Hadoop. Flume is tightly integrated with Hadoop's monitoring, file formats, file system, and tools such as Morphlines. Flume is the best option when working with non-relational data sources or when streaming large files into Hadoop.

Kafka's main use case is as a distributed publish-subscribe messaging system. Kafka was not created with Hadoop in mind, so using it to gather and analyse data for Hadoop is considerably more difficult than using Flume.

Kafka is the better fit when a highly reliable and scalable enterprise messaging system is required.

 

Ques. 7): Explain the terms "leader" and "follower."

Answer:

In Kafka, each partition has one server that acts as a Leader and one or more servers that operate as Followers. The Leader is in charge of all read and write requests for the partition, while the Followers are responsible for passively replicating the leader. In the case that the Leader fails, one of the Followers will assume leadership. The server's load is balanced as a result of this.

 

Ques. 8): What are the traditional methods of message transfer? How is Kafka better from them?

Answer:

The classic techniques of message transmission are as follows: -

Message Queuing: -

The message queuing pattern employs a point-to-point approach. A message in the queue is discarded once it has been consumed, similar to how the Post Office Protocol removes a message from the server once it has been delivered. These queues allow for asynchronous messaging.

If a network difficulty prevents a message from being delivered, such as when a consumer is unavailable, the message will be queued until it is transmitted. As a result, messages aren't always sent in the same order. Instead, they are distributed on a first-come, first-served basis, which in some cases can improve efficiency.

Publisher - Subscriber Model:-

The publish-subscribe pattern entails publishers producing ("publishing") messages in multiple categories and subscribers consuming published messages from the various categories to which they are subscribed. Unlike point-to-point messaging, a message is only removed once it has been consumed by all category subscribers.

Kafka offers a single consumer abstraction, the consumer group, which generalizes both of the above. The advantages of adopting Kafka over standard messaging systems are as follows:

Scalable: Data is partitioned and streamlined using a cluster of devices, which increases storage capacity.

Faster: A single Kafka broker can handle megabytes of reads and writes per second, allowing it to serve thousands of customers.

Durability and Fault-Tolerant: The data is kept persistent and tolerant to any hardware failures by copying the data in the clusters.

  

Ques. 9): What is a Replication Tool in Kafka? Explain how to use some of Kafka's replication tools.

Answer:

The Kafka Replication Tool is used to define the replica management process at a high level. Some of the replication tools available are as follows:

Preferred Replica Leader Election Tool: Partitions are distributed across multiple brokers in a cluster, and each copy of a partition is known as a replica; the replica that should normally act as leader is called the preferred replica. The brokers generally distribute the leader role fairly across the cluster, but failures, planned shutdowns, and other circumstances can create an imbalance over time. This tool can be used to maintain the balance in these situations by reassigning the preferred replicas, and hence the leaders.

Topics tool: The Kafka topics tool is in charge of all administration operations relating to topics, including:

  • Listing and describing topics.
  • Creating topics.
  • Modifying topics.
  • Adding a topic's partitions.
  • Deleting topics.

Tool to reassign partitions: The replicas assigned to a partition can be changed with this tool. This refers to adding or removing followers from a partition.

StateChangeLogMerger tool: The StateChangeLogMerger tool collects data from brokers in a cluster, formats it into a central log, and aids in the troubleshooting of state change issues. Sometimes there are issues with the election of a leader for a particular partition. This tool can be used to figure out what's causing the issue.

Change topic configuration tool: used to create new configuration choices, modify current configuration options, and delete configuration options.

 

Ques. 10):  Explain the four core API architecture that Kafka uses.

Answer:

Following are the four core APIs that Kafka uses:

Producer API:

The Producer API in Kafka allows an application to publish a stream of records to one or more Kafka topics.

Consumer API:

The Kafka Consumer API allows an application to subscribe to one or more Kafka topics. It also allows the programme to handle streams of records generated in connection with such topics.

Streams API: The Kafka Streams API allows an application to process data in Kafka using a stream processing architecture. This API allows an application to take input streams from one or more topics, process them with streams operations, and then generate output streams to send to one or more topics. In this way, the Streams API allows you to turn input streams into output streams.

Connect API:

The Kafka Connector API connects Kafka topics to applications. This opens up possibilities for constructing and managing the operations of producers and consumers, as well as establishing reusable links between these solutions. A connector, for example, may capture all database updates and ensure that they are made available in a Kafka topic.
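As a small illustration of the Producer and Consumer APIs, here is a sketch using the third-party kafka-python client (the broker address and topic name are assumptions, and kafka-python must be installed):

from kafka import KafkaProducer, KafkaConsumer

# Producer API: publish a record to a topic
producer = KafkaProducer(bootstrap_servers="localhost:9092")
producer.send("interview-topic", b"hello kafka")
producer.flush()

# Consumer API: subscribe to the topic and read records
consumer = KafkaConsumer("interview-topic",
                         bootstrap_servers="localhost:9092",
                         auto_offset_reset="earliest",
                         consumer_timeout_ms=5000)
for record in consumer:
    print(record.value)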

  

Ques. 11): Is it possible to utilise Kafka without Zookeeper?

Answer:

As of version 2.8, Kafka can now be utilised without ZooKeeper. When Kafka 2.8.0 was released in April 2021, we all had the opportunity to check it out without ZooKeeper. This version, however, is not yet ready for production and is missing a few crucial features.

In earlier versions, it was not feasible to connect directly to the Kafka broker without Zookeeper, because Kafka is unable to serve client requests while Zookeeper is down.

 

Ques. 12): Explain Kafka's concept of leader and follower.

Answer:

Each partition in Kafka has one server acting as a Leader and one or more servers acting as Followers. The Leader is in control of the partition's read and write requests, while the Followers are in charge of passively replicating the leader. If the Leader is unable to lead, one of the Followers will take over. As a result, the server's load is balanced.

 

Ques. 13): In Kafka, what is the function of partitions?

Answer:

From the standpoint of the Kafka broker, partitions allow a single topic to be partitioned across many servers. This gives you the ability to store more data in a single topic than a single server. If you have three brokers and need to store 10TB of data in a topic, you can create a subject with only one partition and store the entire 10TB on one broker. Another option is to create a three-partitioned topic with 10 TB of data distributed across all brokers. From the consumer's perspective, a partition is a unit of parallelism.
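For example, a three-partition topic can be created with the command-line tool that ships with Kafka (the Zookeeper address, replication factor, and topic name here are illustrative; newer brokers take --bootstrap-server instead of --zookeeper):

kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 2 --partitions 3 --topic my-topic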

 

Ques. 14): In Kafka, what do you mean by geo-replication?

Answer:

Geo-replication is a feature in Kafka that allows you to copy messages from one cluster to a number of other data centres or cloud locations. You can use geo-replication to replicate all of the files and store them all over the world if necessary. Using Kafka's MirrorMaker Tool, we can achieve geo-replication. We can ensure data backup without fail by employing the geo-replication strategy.

 

Ques. 15): Is Apache Kafka a platform for distributed streaming? What are you going to do with it?

Answer:

Yes. Apache Kafka is a platform for distributed streaming data. Three critical capabilities are included in a streaming platform:

  • We can easily push records using a distributed streaming infrastructure.
  • It has a large storage capacity and allows us to store a large number of records without difficulty.
  • It assists us in processing records as they arrive.

The Kafka technology allows us to do the following:

  • We may create real-time data pipelines using Apache Kafka to send data between two systems.
  • We can also create a real-time streaming platform that reacts to data.

 

Ques. 16): What is Apache Kafka Cluster used for?

Answer:

Apache Kafka Cluster is a messaging system that is used to overcome the challenges of gathering and processing enormous amounts of data. The following are the most important advantages of Apache Kafka Cluster:

We can track web activities using Apache Kafka Cluster by storing/sending events for real-time processes.

We may use this to both alert and report on operational metrics.

We can also use Apache Kafka Cluster to transform data into a common format.

It enables the processing of streaming data to the subjects in real time.

Due to these outstanding characteristics, it is winning out over some of the most popular alternatives, such as ActiveMQ, RabbitMQ, and AWS messaging services.

 

Ques. 17): What is the purpose of the Streams API?

Answer:

The Streams API allows an application to act as a stream processor: it ingests an input stream from one or more topics, effectively transforms the input streams into output streams, and produces the output stream to one or more output topics.

 

Ques. 18): In Kafka, what do you mean by graceful shutdown?

Answer:

Any broker shutdown or failure will be detected automatically by the Kafka cluster. In this case, new leaders are elected for the partitions previously handled by that machine. This can occur because of a server failure, or even when the server is shut down intentionally for maintenance or configuration changes. For intentional shutdowns, Kafka provides a graceful mechanism for stopping a server rather than killing it.

When a server is turned off, the following happens:

Kafka syncs all of its logs to disk so that it does not have to perform any log recovery when it is restarted; since log recovery takes time, this speeds up intentional restarts.

Prior to shutting down, leadership of all partitions for which the server is the leader is migrated to the replicas. The leadership transfer is faster this way, and the time each partition is unavailable is reduced to a few milliseconds.

  

Ques. 19): In Kafka, what do the terms BufferExhaustedException and OutOfMemoryException mean?

Answer:

A BufferExhaustedException is thrown when the producer can't assign memory to a record because the buffer is full. If the producer is in non-blocking mode and the pace of production over an extended period of time exceeds the rate at which data is transferred from the buffer, the allocated buffer will be emptied and an exception will be thrown.

An OutOfMemoryException may occur if consumers send large messages or if the number of messages sent grows faster than the rate of downstream processing. As a result, the message queue becomes overburdened and consumes excessive RAM.

 

Ques. 20): How will you change the retention time in Kafka at runtime?

Answer:

A topic's retention time can be configured in Kafka. A topic's default retention time is seven days. The retention time can be set while creating a new topic; when a topic is created, the broker property log.retention.hours is used to set the retention time. When the configuration of a currently running topic needs to be modified, kafka-topics.sh (or kafka-configs.sh, depending on version) must be used.

The right command depends on the Kafka version in use.

The command to use up to 0.8.2 is kafka-topics.sh --alter.

Use kafka-configs.sh --alter starting with version 0.9.0.
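For example, to set a one-day retention on a running topic with a recent broker (the Zookeeper address, topic name, and retention value here are illustrative):

kafka-configs.sh --zookeeper localhost:2181 --alter --entity-type topics --entity-name my-topic --add-config retention.ms=86400000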

 


 

December 30, 2021

Top 20 Python Pandas Interview Questions and Answers


            Pandas is a Python library that provides high-performance, easy-to-use data structures and data analysis tools for the Python programming language. It is open-source and BSD-licensed. Python with Pandas is utilised in a variety of academic and commercial disciplines, including finance, economics, statistics, analytics, and more. 

Data analysis necessitates a great deal of processing, such as restructuring, cleansing, or combining, among other things. NumPy, SciPy, Cython, and Pandas are just a few of the fast data processing tools available. However, we favour Pandas because it is faster, easier, and more expressive than the other tools.


Python Interview Questions & Answers


Ques. 1): What is Pandas? What is the purpose of Python pandas?

Answer:

Pandas is a Python module that provides quick, versatile, and expressive data structures that make working with "relational" or "labelled" data simple and intuitive. Its goal is to serve as the foundation for undertaking realistic, real-world data analysis in Python.

Pandas is a data manipulation and analysis software library for the Python programming language. It includes data structures and methods for manipulating numerical tables and time series, in particular. Pandas is open-source software distributed under the BSD three-clause licence.

 

Ques. 2): Mention the many types of data structures available in Pandas?

Answer:

The pandas library supports two primary data structures: Series and DataFrames, both of which are built on top of NumPy. In pandas, a Series is a one-dimensional data structure and a DataFrame is a two-dimensional one. A third, three-dimensional structure called Panel (with items, major_axis, and minor_axis axes) existed in older versions but has since been deprecated and removed.
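A minimal sketch of both structures:

import pandas as pd

# One-dimensional labelled array
s = pd.Series([10, 20, 30], index=["a", "b", "c"])

# Two-dimensional labelled table
df = pd.DataFrame({"name": ["Ann", "Bob"], "score": [91.5, 84.0]})
print(s)
print(df)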

 

Ques. 3): What are the key features of pandas library ? What is pandas Used For ?

Answer:

There are various features in pandas library and some of them are mentioned below

Data Alignment

Memory Efficient

Reshaping

Merge and join

Time Series

This library is developed in Python and can be used to do data processing, data analysis, and other tasks. To manipulate time series and numerical tables, the library contains numerous operations as well as data structures.

 

Ques. 4): What is Pandas NumPy?

Answer:

NumPy is an open-source Python module that allows you to work with large datasets. It provides a powerful N-dimensional array object and sophisticated mathematical functions for scientific computing with Python, and Pandas is built on top of it.

Fourier transformations, linear algebra, and random number capabilities are some of Numpy's most popular features. It also includes integration tools for C/C++ and Fortran programming.

 

Ques. 5): In Pandas, what is a Time Series?

Answer:

An ordered sequence of data that depicts how a quantity evolves over time is known as a time series. For all fields, pandas has a wide range of capabilities and tools for working with time series data.

pandas supports:

Taking time series data from a variety of sources and formats and parsing it

Create a series of dates and time ranges with a set frequency.

Manipulation and conversion of date and time with timezone data

A time series is resampled or converted to a specific frequency.

Using absolute or relative time increments to do date and time arithmetic.
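A short sketch of these capabilities:

import pandas as pd

# Hourly timestamps at a fixed frequency
rng = pd.date_range("2022-01-01", periods=72, freq="H")
ts = pd.Series(range(72), index=rng)

# Resample the hourly series to a daily frequency
print(ts.resample("D").mean())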

 

Ques. 6): In pandas, what is a DataFrame?

Answer:

Pandas DataFrame is a potentially heterogeneous, two-dimensional, size-mutable tabular data structure with labelled axes (rows and columns). A data frame is a two-dimensional structure in which data is organised in rows and columns in tabular form. The data, rows, and columns are the three main components of a Pandas DataFrame.

Creating a Pandas DataFrame-

A Pandas DataFrame is built in the real world by loading datasets from existing storage, which can be a SQL database, a CSV file, or an Excel file. Pandas DataFrames can be made from lists, dictionaries, and lists of dictionaries, among other things. A dataframe can be constructed in a variety of ways.  

Creating a dataframe using List: DataFrame can be created using a single list or a list of lists.

 

Ques. 7): Explain Series In pandas. How To Create Copy Of Series In pandas?

Answer:

Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.). The axis labels are collectively referred to as the index. The basic method to create a Series is to call:

>>> s = pd.Series(data, index=index), where the data can be a Python dict, an ndarray or a scalar value.

To create a copy in pandas, we can call copy() function on a series such that

s2=s1.copy() will create copy of series s1 in a new series s2.

 

Ques. 8): How will you create an empty DataFrame in pandas?

Answer:

To create a completely empty Pandas dataframe, we use do the following:

import pandas as pd

MyEmptydf = pd.DataFrame()

This will create an empty dataframe with no columns or rows.

To create an empty dataframe with three empty columns (X, Y and Z), we do:

df = pd.DataFrame(columns=['X', 'Y', 'Z'])

 

Ques. 9): What is Python pandas vectorization?

Answer:

The process of executing operations on the full array is known as vectorization. This is done to reduce the number of times the functions iterate. Pandas has a number of vectorized functions, such as aggregations and string functions, that are designed to work with series and DataFrames especially. To perform the operations quickly, it is preferable to use the vectorized pandas functions.
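For example, adding two columns with one vectorized expression instead of a Python-level loop:

import pandas as pd

df = pd.DataFrame({"a": range(1_000_000), "b": range(1_000_000)})

# Vectorized: a single operation over the whole columns
df["total"] = df["a"] + df["b"]

# Equivalent but far slower row-by-row version:
# df["total"] = [x + y for x, y in zip(df["a"], df["b"])]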

 

Ques. 10): range() vs xrange() functions in Python?

Answer:

In Python 2 we have the following two functions to produce a list of numbers within a given range.

range()

xrange()

In Python 3, xrange() has been removed; i.e. xrange() no longer exists in Python 3.x.

In Python 3, we have only one function to produce the numbers within a given range, i.e. the range() function.

But the range() function of Python 3 works the same way as xrange() of Python 2 (i.e. the internal implementation of Python 3's range() is the same as Python 2's xrange()).

So the difference between range() and xrange() becomes relevant only when you are using Python 2.

range() and xrange() function values

a). range() creates a list i.e., range returns a Python list object, for example, range (1,500,1) will create a python list of 499 integers in memory. Remember, range() generates all numbers at once.

b).xrange() functions returns an xrange object that evaluates lazily. That means xrange only stores the range arguments and generates the numbers on demand. It doesn’t generate all numbers at once like range(). Furthermore, this object only supports indexing, iteration, and the len() function.

On the other hand xrange() generates the numbers on demand. That means it produces number one by one as for loop moves to the next number. In every iteration of for loop, it generates the next number and assigns it to the iterator variable of for loop.
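A quick illustration in Python 3, where range() behaves like Python 2's xrange():

r = range(1, 500)     # no list is built yet; numbers are produced on demand
print(r[10])          # ranges support indexing -> 11
print(len(r))         # and len() -> 499
print(list(r)[:3])    # materialise only when needed -> [1, 2, 3]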

 

Ques. 11):  What does categorical data mean in Pandas?

Answer:

Categorical data is a Pandas data type that corresponds to a statistical categorical variable. A categorical variable is one that takes on a limited, usually fixed, number of possible values. Gender, country of origin, blood type, social class, observation time, and Likert-scale ratings are examples. Categorical data values are either one of the categories or np.nan. This data type is useful in the following cases:

It is useful for a string variable that consists of only a few different values. If we want to save some memory, we can convert a string variable to a categorical variable.

It is useful when the lexical order of a variable differs from its logical order ("one", "two", "three"). By converting to a categorical and specifying an order on the categories, sorting and min/max will use the logical order instead of the lexical order.

It is useful as a signal to other Python libraries because this column should be treated as a categorical variable.
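A minimal sketch of converting a string column to an ordered categorical:

import pandas as pd

df = pd.DataFrame({"grade": ["good", "bad", "very good", "bad"]})
df["grade"] = df["grade"].astype("category")

# Impose a logical order so that sorting and min/max respect it
df["grade"] = df["grade"].cat.set_categories(["bad", "good", "very good"], ordered=True)
print(df.sort_values("grade"))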

 

Ques. 12): To a Pandas DataFrame, how do you add an index, a row, or a column?

Answer:

Adding an index to a DataFrame: When you create a DataFrame with Pandas, you can pass your desired labels to the index argument to get exactly the index you want. If no index is specified, the DataFrame gets a numeric index that starts at 0 and ends at the DataFrame's last row.

Adding rows to a DataFrame: We can use the .loc, .iloc, and .ix indexers to insert rows into a DataFrame.

The loc indexer works with the labels of our index. For example, df.loc[4] refers to the row of the DataFrame whose index label is 4.

The ix indexer is a more complex case: if the index is label-based, ix accepts a label; however, if the index is not purely label-based, ix falls back to positional behaviour like iloc. (Note that ix is deprecated in recent pandas versions.)

Adding a column to a DataFrame: a new column can be added by simple assignment, as shown below.
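A minimal sketch covering all three additions:

import pandas as pd

# Index supplied at creation time
df = pd.DataFrame({"x": [1, 2]}, index=["r1", "r2"])

# Add a row by label with .loc
df.loc["r3"] = [3]

# Add a column by simple assignment
df["y"] = [10, 20, 30]
print(df)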

 

Ques. 13): How to Delete Indices, Rows or Columns From a Pandas Data Frame?

Answer:

Deleting an Index from Your DataFrame

If you want to remove the index from the DataFrame, you should have to do the following:

Reset the index of DataFrame.

Executing del df.index.name to remove the index name.

Remove duplicate index values by resetting the index and drop the duplicate values from the index column.

Remove an index with a row.

Deleting a Column from Your DataFrame

You can use the drop() method for deleting a column from the DataFrame.

The axis argument passed to the drop() method is 0 if it indicates rows and 1 if it indicates columns.

You can pass the inplace argument and set it to True to delete the column without reassigning the DataFrame.

You can also delete the duplicate values from the column by using the drop_duplicates() method.

Removing a Row from Your DataFrame

By using df.drop_duplicates(), we can remove duplicate rows from the DataFrame.

You can use the drop() method to specify the index of the rows that we want to remove from the DataFrame.
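A minimal sketch of the deletions described above:

import pandas as pd

df = pd.DataFrame({"X": [1, 1, 2], "Y": [3, 3, 4], "Z": [5, 5, 6]})

df = df.reset_index(drop=True)    # replace the existing index
df = df.drop(columns=["Z"])       # delete a column
df = df.drop(index=[2])           # delete a row by index label
df = df.drop_duplicates()         # delete duplicate rows
print(df)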

 

Ques. 14): How to convert String to date?

Answer:

The below code demonstrates how to convert the string to date:

from datetime import datetime

# Define dates as strings
dmy_str1 = 'Wednesday, July 14, 2018'
dmy_str2 = '14/7/17'
dmy_str3 = '14-07-2017'

# Parse the strings into datetime objects
dmy_dt1 = datetime.strptime(dmy_str1, '%A, %B %d, %Y')
dmy_dt2 = datetime.strptime(dmy_str2, '%d/%m/%y')
dmy_dt3 = datetime.strptime(dmy_str3, '%d-%m-%Y')

# Print the converted dates
print(dmy_dt1)
print(dmy_dt2)
print(dmy_dt3)

 

Ques. 15): What exactly is the Pandas Index?

Answer:

Pandas indexing is as follows:

In pandas, indexing simply involves picking specific rows and columns of data from a DataFrame. Selecting all of the rows and some of the columns, part of the rows and all of the columns, or some of each of the rows and columns is what indexing entails. Subset selection is another name for indexing.

Using [], .loc[], .iloc[], and .ix[] for Pandas indexing

A DataFrame's elements, rows, and columns can be extracted in a variety of ways. Pandas offers several indexing methods for retrieving elements from a DataFrame. These indexing methods appear fairly similar on the surface, but they behave quite differently. Pandas supports four methods of multi-axes indexing:

DataFrame[] : also known as the indexing operator

DataFrame.loc[] : used for labels

DataFrame.iloc[] : used for integer positions

DataFrame.ix[] : used for both label- and integer-based lookups (deprecated in recent pandas versions)

Collectively, they are called the indexers, and they are by far the most common ways to index data. These four functions help in getting the elements, rows, and columns from a DataFrame.

 

Ques. 16): Define ReIndexing?

Answer:

Reindexing changes the row labels and column labels of a DataFrame. To reindex means to conform the data to match a given set of labels along a particular axis.

Multiple operations can be accomplished through reindexing, such as:

Reorder the existing data to match a new set of labels.

Insert missing value (NA) markers in label locations where no data for the label existed.
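For example:

import pandas as pd

s = pd.Series([1, 2, 3], index=["a", "b", "c"])

# Reorder existing labels and introduce a new one; missing labels get fill_value
print(s.reindex(["c", "a", "d"], fill_value=0))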

 

Ques. 17): How to Set the index?

Answer:

Python is an excellent language for data analysis, thanks to its vast ecosystem of data-centric Python packages. One of these packages is Pandas, which makes importing and analysing data a lot easier.

Pandas set_index() is a function for setting the index of a DataFrame from a list, a Series, or an existing column. A data frame's index column can also be set while it is being created; however, the index can be changed later using this method, for example when a data frame is built out of two or more data frames.

Syntax:

DataFrame.set_index(keys, drop=True, append=False, inplace=False, verify_integrity=False)
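A short usage example:

import pandas as pd

df = pd.DataFrame({"id": [101, 102], "name": ["Ann", "Bob"]})
df = df.set_index("id")    # the 'id' column becomes the index
print(df)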

 

Ques. 18): Define GroupBy in Pandas?

Answer:

Python is a great language for doing data analysis, primarily because of the fantastic ecosystem of data-centric python packages. Pandas is one of those packages and makes importing and analyzing data much easier.

Pandas dataframe.groupby() function is used to split the data into groups based on some criteria. pandas objects can be split on any of their axes. The abstract definition of grouping is to provide a mapping of labels to group names.

Syntax: DataFrame.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=False, **kwargs)

Parameters :

by : mapping, function, str, or iterable

axis : int, default 0

level : If the axis is a MultiIndex (hierarchical), group by a particular level or levels

as_index : For aggregated output, return object with group labels as the index. Only relevant for DataFrame input. as_index=False is effectively “SQL-style” grouped output

sort : Sort group keys. Get better performance by turning this off. Note this does not influence the order of observations within each group. groupby preserves the order of rows within each group.

group_keys : When calling apply, add group keys to index to identify pieces

squeeze : Reduce the dimensionality of the return type if possible, otherwise return a consistent type

Returns : GroupBy object
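A short usage example:

import pandas as pd

df = pd.DataFrame({"team": ["A", "A", "B"], "points": [5, 7, 9]})
print(df.groupby("team")["points"].sum())
# team
# A    12
# B     9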

 

Ques. 19): How will you add a scalar column with same value for all rows to a pandas DataFrame?

Answer:

Python is a great language for doing data analysis, primarily because of the fantastic ecosystem of data-centric Python packages. Pandas is one of those packages and makes importing and analyzing data much easier.

Dataframe.add() method is used for addition of dataframe and other, element-wise (binary operator add). Equivalent to dataframe + other, but with support to substitute a fill_value for missing data in one of the inputs.

Syntax: DataFrame.add(other, axis='columns', level=None, fill_value=None)

Parameters:

other :Series, DataFrame, or constant

axis : {0, 1, 'index', 'columns'} For Series input, axis to match Series index on

fill_value : [None or float value, default None] Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing.

level : [int or name] Broadcast across a level, matching Index values on the passed MultiIndex level

Returns: result DataFrame
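Note that for the literal question, adding a column that holds the same scalar for every row, plain assignment broadcasts the scalar; DataFrame.add() is for element-wise addition of two aligned objects:

import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})
df["flag"] = 7        # the scalar is broadcast to every row
print(df)
print(df.add(10))     # element-wise addition, by contrast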

 

Ques. 20): In pandas, how can you see if a DataFrame is empty?

Answer:

Pandas DataFrame is a potentially heterogeneous, two-dimensional, size-mutable tabular data structure with labelled axes (rows and columns). Both the row and column labels align for arithmetic operations. It can be viewed as a dict-like container for Series objects, and it is Pandas' fundamental data structure.

The empty attribute determines whether or not the dataframe is empty. It returns True if the dataframe is empty and False otherwise.

Syntax: DataFrame.empty

Parameter : None

Returns : bool
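For example:

import pandas as pd

print(pd.DataFrame().empty)               # True
print(pd.DataFrame({"a": [1]}).empty)     # False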