Course Syllabus
Introduction to Big Data
Limitations of Hadoop MapReduce
What is Apache Spark?
Spark Ecosystem Components
Spark Architecture & RDDs
SparkContext & SparkConf
Resilient Distributed Datasets (RDDs)
RDD Operations: Transformations & Actions
Lazy Evaluation in Spark
RDD Persistence & Caching
Introduction to Spark SQL
DataFrames & Datasets
Spark SQL Queries
Integrating with Hive
Handling JSON, CSV & Parquet
Introduction to Streaming
DStreams (Discretized Streams)
Window Operations
Integrating with Kafka & Flume
Structured Streaming
Overview of MLlib
Feature Extraction & Transformation
Classification & Regression
Clustering (K-Means, etc.)
Collaborative Filtering & Recommendation
Introduction to GraphX
Graph Representation in Spark
Graph Operations & Algorithms
PageRank & Connected Components
Real-World Graph Use Cases
Running Spark on Hadoop YARN
Spark on AWS EMR
Spark on Databricks
Cluster Management (YARN, Mesos, Kubernetes)
Optimizing Spark Jobs
Real-Time Streaming Analytics Project
Recommendation Engine with Spark MLlib
Log Analysis using Spark SQL
Social Network Graph Analysis
ETL Pipeline with Spark, Hive & HDFS