2016-09-10

Hadoop is an open source software framework for storing and processing data of any type. It offers substantial processing power, handles a large number of concurrent tasks, and runs applications on clusters of commodity hardware. Hadoop is an integral component of Big Data, the practice major corporations and organizations use to manage their massive quantities of data. This course will give you an introduction to Hadoop and Big Data.

Access 14 lectures & 3 hours of content 24/7

Learn the concepts of Hadoop & Big Data

Perform Data Analytics using Hadoop

Master the concepts of Hadoop framework

Get experience w/ different configurations of the Hadoop cluster

Work w/ real-time projects using Hadoop

To get a full understanding of the power of Hadoop, you need to actually work with Hadoop. Here you’ll get a crash course in how Hadoop output is sorted, and how to control input keys in such a way that output is sorted according to values. You’ll learn to understand the complex logic behind Hadoop and how to use it effectively.

Access 14 lectures & 3 hours of content 24/7

Discover input keys & output in Hadoop

Understand joins in Hadoop & how to implement them

Cover MapReduce side joins

Discover combiners in Hadoop to reduce network congestion
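
The two ideas in this course, combiners and sorting output by value, can be illustrated with a small simulation. This is a minimal sketch in plain Python, not Hadoop's Java API: the function names and the input splits are hypothetical, and the shuffle is reduced to list handling. It shows a combiner pre-summing counts on each mapper so that fewer records cross the network, and a second pass that swaps key and value so that a sort on the key orders the output by count.

```python
from collections import Counter, defaultdict

def map_words(line):
    """Mapper: emit a (word, 1) pair for every word in the line."""
    return [(word.lower(), 1) for word in line.split()]

def combine(pairs):
    """Combiner: pre-sum counts on the mapper's side before the shuffle."""
    local = Counter()
    for word, count in pairs:
        local[word] += count
    return list(local.items())

def reduce_counts(shuffled):
    """Reducer: sum every count that arrived for the same word."""
    totals = defaultdict(int)
    for word, count in shuffled:
        totals[word] += count
    return dict(totals)

def sort_by_value(totals):
    """Second pass: emit (count, word) so a sort on the key orders by count."""
    swapped = [(count, word) for word, count in totals.items()]
    return sorted(swapped, reverse=True)

# Hypothetical input splits, one per mapper.
splits = ["big data big cluster", "big data pig hive"]

# Records that would be shuffled without and with a combiner.
without_combiner = [pair for split in splits for pair in map_words(split)]
with_combiner = [pair for split in splits for pair in combine(map_words(split))]

totals = reduce_counts(with_combiner)
ranked = sort_by_value(totals)

print(len(without_combiner), "records shuffled without a combiner")
print(len(with_combiner), "records shuffled with a combiner")
print(ranked)
```

The saving here is small because the sample input is tiny. On real data with many repeated keys per mapper, the reduction in shuffled records is typically much larger.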

Companies use Hadoop for a variety of reasons, including its low storage cost and its use as a data warehouse, an analytics store, or an unconstrained data lake. Ultimately, it comes down to fast, powerful processing of massive amounts of data. In this course, you’ll gain experience with that processing power firsthand, as you apply Hadoop to real-world Data Analytics problems.

Access 9 lectures & 2 hours of content 24/7

Learn the 4 main modules of Hadoop: Hadoop Common, Hadoop Distributed File System, MapReduce, & Yet Another Resource Negotiator (YARN)

Understand the real-world applications of Hadoop & MapReduce

Design, code & run a real example of MapReduce using real data

Perform data analytics using Pig, Hive & YARN

Apache Hive is a data warehouse infrastructure built on top of Hadoop for providing data summarization, query, and analysis. It’s used by companies like Facebook and Netflix, and is a valuable tool for any developer to know. This course will introduce you to Hive and its data types and commands, so that you can start gaining a full understanding of this infrastructure.

Access 22 lectures & 4 hours of content 24/7

Understand Hive partitioning & bucketing

Use SQL queries to access data stored on the Hadoop cluster

Discover the applications of Hive for data analysis & warehousing

Learn data types, commands, & the Hive metastore
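
Partitioning and bucketing can be pictured with a small layout simulation. This is a rough sketch in plain Python rather than HiveQL: the table rows, column names, and bucket count are hypothetical, and CRC32 stands in for Hive's own hash function. Partitioning groups rows into one directory per column value (named `column=value`), while bucketing spreads rows across a fixed number of files by hashing a column and taking the result modulo the bucket count.

```python
import zlib
from collections import defaultdict

NUM_BUCKETS = 4  # hypothetical bucket count

def partition_dir(row):
    """Partitioning: rows with the same column value share one directory."""
    return f"country={row['country']}"

def bucket_index(row):
    """Bucketing: a stable hash of the bucketing column, modulo the bucket count.

    CRC32 is an assumption used for illustration; Hive uses its own hash.
    """
    return zlib.crc32(row["user_id"].encode("utf-8")) % NUM_BUCKETS

# Hypothetical table rows.
rows = [
    {"user_id": "u1", "country": "US", "amount": 10},
    {"user_id": "u2", "country": "DE", "amount": 25},
    {"user_id": "u1", "country": "US", "amount": 7},
    {"user_id": "u3", "country": "DE", "amount": 12},
]

# Simulated storage layout: (directory, bucket) -> rows stored there.
layout = defaultdict(list)
for row in rows:
    layout[(partition_dir(row), bucket_index(row))].append(row)

def scan(layout, country):
    """Partition pruning: read only the directory that matches the filter."""
    wanted = f"country={country}"
    return [row for (directory, _), bucket in layout.items()
            if directory == wanted for row in bucket]

print(scan(layout, "US"))
```

A query that filters on the partition column only needs to read the matching directory, which is the main reason to partition on columns that appear often in filters.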

Apache Pig is a data flow language used to analyze large data sets. It is executable in Hadoop with MapReduce, Apache Tez, or Apache Spark. Managing and analyzing large data sets is significantly easier with Pig. This course will introduce you to Pig concepts so you can work with Big Data more efficiently.

Access 18 lectures & 3 hours of content 24/7

Learn Pig commands like load & store data, group data, & join data

Combine & split data in Pig using union operator & split operator

Understand filtering data in Pig

Concentrate on Pig internal & custom functions

MapReduce is a programming model and an associated implementation for processing and generating large data sets with a parallel, distributed algorithm on a cluster. Sound confusing? Clear it up with this course! MapReduce is an extremely common application of Hadoop, and one that companies value and depend on to organize their data. If you want a high-paying job working in Big Data, you’ll need this introduction to MapReduce.

Access 30 lectures & 6 hours of content 24/7

Implement a custom Hadoop writable data type & custom Hadoop key type

Emit data of different value types from a mapper

Choose the best Hadoop input format for different scenarios

Format the results of MapReduce computations using Hadoop output formats

Understand data partitioning, broadcasting, & distributing shared resources to tasks in a MapReduce job
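
How a MapReduce job routes records to reducer tasks can be sketched in a few lines. This is a simplified illustration in plain Python, not Hadoop's Java API: `partition` and `run_job` are hypothetical names, and CRC32 is an assumed stand-in for the key hash. Hadoop's default partitioner similarly hashes the key and takes the result modulo the number of reducers, which guarantees that all values for one key reach the same reducer.

```python
import zlib
from collections import defaultdict

def partition(key, num_reducers):
    """Pick a reducer for a key using a stable hash (CRC32 is an assumption)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def run_job(records, num_reducers):
    """Shuffle (key, value) records to reducers, then sum the values per key."""
    # Shuffle: route each record to the reducer chosen by the partitioner.
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in records:
        buckets[partition(key, num_reducers)][key].append(value)
    # Reduce: each reducer sums the values of the keys it owns, in key order.
    return [{key: sum(values) for key, values in sorted(bucket.items())}
            for bucket in buckets]

# Hypothetical mapper output.
records = [("hive", 2), ("pig", 1), ("hive", 3), ("yarn", 4), ("pig", 5)]
outputs = run_job(records, num_reducers=2)

for reducer_id, output in enumerate(outputs):
    print(reducer_id, output)
```

Which reducer ends up with which key depends on the hash, but each key always appears in exactly one reducer's output.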

For writing data analysis programs, Apache Pig has a high-level language called Pig Latin, which provides numerous operators and lets programmers develop their own functions for reading, writing, and processing data. Pig Latin allows programmers to perform MapReduce tasks without writing complex Java code, thereby greatly streamlining the data analysis process. In this course, you’ll learn how to use the advanced operators of Apache Pig and elevate your Big Data skills to an elite level.

Access 12 lectures & 2 hours of content 24/7

Explore Apache Pig’s set of operators for performing operations like join, sort, filter, etc

Write Pig Latin scripts

Move beyond MapReduce using Apache Pig

Store data w/o designing the schema

Throughout this course, you’ll explore Big Data processes further while extending your knowledge into the Cloudera environment. Cloudera offers a centralized system for running Hadoop and is a valuable resource for anybody working with Hadoop.

Access 12 lectures & 2 hours of content 24/7

Explore Big Data, distributed storage, & MapReduce in further detail

Understand the Hadoop environment when installed on Cloudera

Discover the metadata configuration on Hadoop

Learn about HDFS, web UI, & HUE

Access HDFS through Java programs

YARN is the architectural center of Hadoop, allowing multiple data processing engines, such as interactive SQL, real-time streaming, data science, and batch processing, to handle data stored in a single platform. YARN unlocks an entirely new approach to analytics and is at the forefront of a Hadoop revolution. This course will get you up to speed with Apache Hadoop YARN, so you’ll be among the leading Big Data engineers.

Access 18 lectures & 3 hours of content 24/7

Understand the core concepts of Apache Hadoop YARN

Learn in detail how to install & administer an Apache Hadoop YARN system

Master the architecture guidelines for adapting the YARN architecture

Apache Mahout draws on Scala, Spark, H2O, and Hadoop’s MapReduce to create scalable, intelligent algorithms that are optimized for machine learning. Machine learning is the field of AI concerned with techniques through which computers improve their outputs based on prior experience. Big companies like Yahoo and Amazon use machine learning algorithms to gauge their customers’ experiences and provide better ones. The applications of machine learning are growing every day and are seemingly endless, which makes this course in Apache Mahout particularly valuable for anyone interested in this growing, high-paying field.

Access 12 lectures & 2 hours of content 24/7

Understand the difference between supervised & unsupervised learning, & the advantages of each

Explore Apache Mahout’s 3 approaches to machine learning: collaborative filtering, clustering, & classification

Create adaptable machine learning that can turn large amounts of information into meaningful knowledge

Use the Taste library to create a recommendation engine
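
The idea behind a collaborative-filtering recommendation engine can be shown with a toy example. This is a minimal sketch in plain Python and does not use the Taste library or its API: the ratings, user names, and function names are all hypothetical. It scores items a user has not rated by weighting other users' ratings with the cosine similarity between their rating vectors.

```python
from math import sqrt

# Hypothetical ratings: user -> {item: rating}.
ratings = {
    "ann": {"hadoop": 5.0, "hive": 4.0, "pig": 1.0},
    "bob": {"hadoop": 5.0, "hive": 5.0, "yarn": 4.0},
    "cat": {"pig": 5.0, "mahout": 4.0},
}

def cosine(a, b):
    """Cosine similarity of two rating vectors; unrated items count as 0."""
    dot = sum(a[item] * b[item] for item in set(a) & set(b))
    norm_a = sqrt(sum(value * value for value in a.values()))
    norm_b = sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def recommend(user, ratings, top_n=2):
    """Rank unseen items by similarity-weighted ratings from other users."""
    scores = {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        weight = cosine(ratings[user], other_ratings)
        for item, rating in other_ratings.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + weight * rating
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [item for item, _ in ranked[:top_n]]

suggestions = recommend("ann", ratings)
print(suggestions)
```

In this sample, ann's ratings resemble bob's far more than cat's, so the item bob rated highly is ranked ahead of cat's.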
