In this post we talk about how you can read data from files using Spark Structured Streaming and store the output in a Hive table […]
Capture bad records while loading csv in spark Dataframe
Loading a CSV file and capturing all the bad records is a very common requirement in ETL projects. Most of the relational database loaders like […]
Deployment modes and Job submission in Apache Spark
Spark is a scheduling, monitoring, and distribution engine; it can also act as a resource manager for its own jobs. When Spark runs a job by itself […]
What is an RDD and Why Spark needs it?
Resilient Distributed Dataset (RDD) is the core of Apache Spark. It is the fundamental data structure on top of which all the Spark components reside. […]
What is Apache NiFi?
Apache NiFi is open source software for automating and managing the flow of data between different systems. It provides a web-based UI for creating, monitoring […]