2016-08-25

About Big Data, MapReduce, Hadoop, and Spark with Python: Master Big Data Analytics and Data Wrangling with MapReduce Fundamentals using Hadoop, Spark, and Python:

What’s the big deal with big data?

It was recently reported in the Wall Street Journal that the government is collecting so much data on its citizens that it can’t even use that data effectively.

A few “unicorns” have popped up in the past decade or so, promising to help solve the big data problems that billion-dollar corporations and the people running your country can’t.

It goes without saying that programming with frameworks that can handle big data processing is a highly coveted skill.

Machine learning and artificial intelligence algorithms, which have garnered increased attention (and fear-mongering) in recent years mainly due to the rise of deep learning, are completely dependent on data to learn.

The more data an algorithm learns from, the smarter it can become. The problem is that the amount of data we collect has outpaced gains in single-machine CPU performance, so we need methods for processing data that scale across many machines.

In the early 2000s, Google invented MapReduce, a programming model for processing big data in a scalable way by distributing the work across many machines.
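To make the paradigm concrete, here is a toy, single-process sketch of the classic word-count example in plain Python (an illustration, not code from the book); a real MapReduce framework would run the map and reduce steps in parallel across a cluster and handle the shuffle for you:

```python
# Toy single-process sketch of the MapReduce paradigm (word count).
from itertools import groupby
from operator import itemgetter

def mapper(line):
    # Map phase: emit a (key, value) pair for each word in the line.
    for word in line.split():
        yield (word.lower(), 1)

def reducer(word, counts):
    # Reduce phase: combine all values that share the same key.
    return (word, sum(counts))

lines = ["the quick brown fox", "the lazy dog", "the fox"]

# Shuffle/sort: group intermediate pairs by key, as the framework
# would do between the map and reduce phases.
pairs = sorted(kv for line in lines for kv in mapper(line))
for word, group in groupby(pairs, key=itemgetter(0)):
    print(reducer(word, (count for _, count in group)))
```

Running this prints each word with its total count, e.g. (“fox”, 2) and (“the”, 3); the framework’s job is to do exactly this grouping and reducing at the scale of billions of records.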

The technology was later implemented in an open-source framework called Hadoop, and Spark subsequently emerged as a newer big data framework that addressed some of MapReduce’s shortcomings.

In this book we will cover all three: the fundamental MapReduce paradigm, how to program with Hadoop, and how to program with Spark.

Advance Your Career

If Spark is a better version of MapReduce, why are we even talking about MapReduce and Hadoop?

Good question!

Corporations, being slow-moving entities, often still use Hadoop for historical reasons. Just search for “big data” and “Hadoop” on LinkedIn and you will see a large number of high-salary openings for developers who know how to use Hadoop.

In addition to giving you deeper insight into how big data processing works, learning the fundamentals of MapReduce and Hadoop first will help you truly appreciate how much easier Spark is to work with.
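For a taste of that difference, here is a minimal PySpark sketch of the same word count (again an illustration, not code from the book), assuming a local Spark installation; the input path is just a placeholder:

```python
# Minimal PySpark word count; "input.txt" is a placeholder path.
from pyspark import SparkContext

sc = SparkContext("local", "wordcount")
counts = (sc.textFile("input.txt")                 # read lines from a text file
            .flatMap(lambda line: line.split())    # map each line to its words
            .map(lambda word: (word.lower(), 1))   # emit (word, 1) pairs
            .reduceByKey(lambda a, b: a + b))      # sum the counts per word
print(counts.collect())
sc.stop()
```

The entire job fits in a few chained calls, with Spark handling the distribution and shuffling behind the scenes; the equivalent Hadoop job typically requires separate mapper and reducer programs plus job configuration.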

Any startup or engineering team will appreciate a solid background in all of these technologies. Many will require you to know all of them, so that you can maintain and patch their existing systems and build newer, more efficient systems that improve on the performance and robustness of the old ones.

Amazingly, all the technologies we discuss in this book can be downloaded and installed for FREE. That means all you need to invest after purchasing this book is your effort and your time. The only prerequisites are that you are comfortable with Python coding and the command-line shell. For the machine learning chapter, you’ll want to be familiar with using machine learning libraries.

BONUS: At the end of this book, I’ll show you a super simple way to train a deep neural network on Spark with the classic MNIST dataset. It will demonstrate how easy it is to apply deep learning to big data.

Buy the book, and follow the author on social media:
Learn more about the writer. Visit the Author’s Website.
Buy the Book On Amazon.
Visit the Facebook Fan Page.
Visit the Twitter page.

Author Bio:

The LazyProgrammer is a data scientist, big data engineer, and full-stack software engineer. He is especially interested in deep learning and neural networks, which some also refer to as AI, or artificial intelligence.

He graduated with a master’s degree, writing his thesis on the classification of brain signals using deep learning. This research could help people who are non-mobile or non-vocal communicate with their caregivers.

The LazyProgrammer got his start in machine learning and data science by studying computational neuroscience and neural engineering. The physics side has always interested him, but the practical nature of machine learning and data science has made up the majority of his work.

After spending years in online advertising and media, building and improving big data pipelines and using machine learning to increase revenue through CTR (click-through rate) optimization and conversion tracking, he began to work for himself.

This allowed the LazyProgrammer to focus 100% of his effort on deepening his knowledge of machine learning and data science. He works with startups and larger companies to set up data pipelines and engineer predictive models that result in meaningful insights and data-driven decision making.

The LazyProgrammer also loves to teach. He has helped many adults looking to change their career path and dive into the startup and tech world. Students at General Assembly, the Flatiron School, and App Academy have all benefited from his help. He has also helped many graduate students at Ivy League schools and other colleges through their machine learning and data science programs.

The LazyProgrammer loves to give away free tutorials and other material. You can get a FREE 6-week introduction to machine learning course by signing up for his newsletter at:

https://lazyprogrammer.me
