About the course
Effectively store, manage, and analyze large datasets with HDFS, SQOOP, YARN, and MapReduce
Do you struggle to store and handle big data sets? This course will teach you to handle them smoothly using Hadoop 3.
The course starts by covering the basic commands that big data developers use daily. Then, you'll focus on the HDFS architecture and the command-line operations a developer uses frequently. Next, you'll use Flume to import data from other ecosystems into the Hadoop ecosystem, which plays a crucial role in making data available for storage and analysis using MapReduce. You'll also learn to import data from an RDBMS into HDFS, and export it back, using SQOOP. Then, you'll learn about Apache Pig, which is used to work with the data brought in using Flume and SQOOP. Here you'll also learn to load, transform, and store data in Pig relations. Finally, you'll dive into Hive functionality and learn to load, update, and delete content in Hive.
By the end of the course, you'll have gained enough knowledge to work with big data using Hadoop. So, grab the course and handle big data sets with ease.
The code bundle for this course is available at https://github.com/PacktPublishing/Hands-On-Beginner-s-Guide-on-Big-Data-and-Hadoop-3-.
Style and Approach
The course takes a practical approach, getting you started with HDFS to store data efficiently, SQOOP to transfer bulk data, and YARN to ensure efficient data management. You will gain hands-on knowledge of analyzing and processing big data sets with MapReduce functions.
What You Will Learn
- Focus on the Hadoop ecosystem to understand big data and how to manage it
- Learn the basic commands used by big data developers and the structure of the Unix OS
- Understand the HDFS architecture and command line to deal with HDFS files and directories
- Import data using Flume and analyze it using MapReduce
- Import data from an RDBMS into HDFS and export it back with SQOOP
- Use Pig Latin, the scripting language of Apache Pig, for data transformation operations
- Deal with stored data and learn to load, update, and delete data using Hive
About the Author
Milind Jagre works as a Data Scientist Analyst at the Ford Motor Company in Dearborn. In his current role, he works with the latest technologies in the fields of big data and machine learning. He is responsible for bringing third-party client datasets into the Ford ecosystem and deriving useful insights from that data. He graduated from the University of Connecticut with a Master of Science degree in Business Analytics and Project Management. He has gained broad experience in the fields of analytics and data science.