BIG DATA ANALYTICS USING SPARK
UNIT I:
Introduction to Big Data: What is Big Data, Characteristics, Data in the Warehouse and Data in Hadoop, Why is Big Data Important, When to Consider a Big Data Solution, Applications.
Introduction to Hadoop: Hadoop definition, Application development in Hadoop, The building blocks of Hadoop: NameNode, DataNode, Secondary NameNode, JobTracker and TaskTracker.
UNIT II:
Introduction to Spark: What is Apache Spark, Why Spark when Hadoop is there, Spark features, Spark components, Spark program flow, Spark ecosystem. Differences between implementing programs in the Hadoop and Spark programming environments.
UNIT III:
Spark Fundamentals: Using the Spark in Action VM, Using the Spark shell and writing a first Spark program, Basic RDD actions and transformations.
Spark SQL: Working with DataFrames, Using SQL commands, Saving and loading DataFrames.
UNIT IV:
Streaming in Spark: Writing Spark Streaming applications, Using external data sources, Structured Streaming.
Spark MLlib: Introduction to Machine Learning, Definition of Machine Learning, Machine Learning with Spark.
UNIT V:
Graph Representation in MapReduce: Graph Processing with Spark, Spark GraphX, GraphX features, Graph examples, Graph algorithms: Shortest Path Algorithm.
TEXT BOOKS:
2. Spark in Action, Petar Zečević and Marko Bonaći, Manning Publications, 2016.
3. Learning Spark, Holden Karau, A. Konwinski, et al., O'Reilly Publications.