2016-06-09

One of the greatest benefits of working among a diverse group of data scientists and data engineers at Stitch Fix is how much we can learn from our peers. Usually that means getting ad hoc help with specific questions from the resident expert(s). But it also means getting advice on how best to fill any gaps in our own skill sets or knowledge bases, or just what interesting data science materials to explore in our spare time. Our blog posts usually highlight the former; this post touches on the latter.


xkcd

We’ve queried our data science team for some of their favorite data science books. This list is by no means exhaustive, but should keep any data scientist/engineer new or old learning and entertained for many an evening. Some of the suggestions included context from the data scientist, so for those we include that person’s name!

An Introduction to Statistical Learning

Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani
Machine Learning

A great intro to the fundamentals of statistics and machine learning, with R-based tutorials. It’s a great stand-alone text for someone interested in learning statistical modeling, but not necessarily hypothesis tests (usually the first big concept covered in introductory statistics textbooks). It is also good prep for the more advanced Elements of Statistical Learning.

The Elements of Statistical Learning

Trevor Hastie, Robert Tibshirani and Jerome Friedman
Machine Learning

A classic among machine learning texts: a concise yet deep review of the basics of machine learning from giants in the field.

Machine Learning: A Probabilistic Perspective

Kevin P. Murphy
Machine Learning

Perfect reference book to find details about that probability distribution or that algorithm including examples. – Eli

Probability Theory: The Logic of Science

E. T. Jaynes
Probability

Great book on probability theory - a must have on any bookshelf. – Eli

Introduction to Probability

Dimitri P. Bertsekas, John N. Tsitsiklis
Probability

Great intro to probability. Beautifully written, easy to read with great examples. Classic! (I may also be biased because my former advisor wrote it, but he’s an incredible teacher). – Hoda

Doing Bayesian Data Analysis

John K. Kruschke
Bayesian Statistics

Basic book and light on the mathematics, but great for gaining a strong understanding of Bayesian statistics for data analysis.

Causality: Models, Reasoning and Inference

Judea Pearl
Statistics

All intricacies and terms for causal models. Used in Lise Getoor’s class on advanced ML. – Natalia

Econometrics

Fumio Hayashi
Inferential Statistics

Great depth and eloquence on econometric problems and time series. – Alex

Mostly Harmless Econometrics

Joshua Angrist, Jörn-Steffen Pischke
Inferential Statistics

This is an approachable introduction to causal modeling through the econometrics lens. It sits somewhere between a textbook and a less technical read in that it has formulas, but it also contextualizes with lots of intuitive examples. It’s a great introduction to the field! – John & Hilary

Data Analysis Using Regression and Multilevel/Hierarchical Models

Andrew Gelman, Jennifer Hill
Statistics

A must-have for multilevel modeling. Good examples and easy to follow. – Songya

Fooled by Randomness: The Hidden Role of Chance in Life and in the Markets

Nassim Taleb
Statistics

It’s more conceptual (i.e., light on advanced topics in statistics), but it articulates extremely well our all-too-human tendencies to misuse statistics. The Black Swan and Antifragile are also very good.

Mining of Massive Datasets

Jure Leskovec, Anand Rajaraman, Jeff Ullman
Data Mining

A reference to a wide range of data mining algorithms, focusing on large scale problems. – Sky

Mining the Social Web

Matthew A. Russell
Data Mining

Includes code and how to create applications using social network APIs. – Natalia

How To Speak

Patrick Winston
Speaking

I present to you: the infamous Patrick Winston. Enough said. – Hoda

How to Write a Lot

Paul J. Silvia
Writing

A really excellent (and funny!) book on writing and general academic success. Much of the advice ports over to any research role, where success is a Poisson process so you just have to keep trying! – Hilary

The Art of Data Science

Roger D. Peng, Elizabeth Matsui
Coding / Analysis Style

This book is one of many produced by the professors who teach the Coursera Data Science track. Great resources for introductory-level material, with many examples in R.

Tidy Data (paper)

Hadley Wickham
Coding / Analysis Style

In the absence of a good “tidy data analysis” book, this paper is a great primer on tidy data analysis, the foundation of the “Hadleyverse” in R (tidyr, dplyr, ggplot2, etc.).

Kaggle problems

Many!
General

Another good resource is just to try your hand at some Kaggle problems / read other people’s solutions/thought process. It’s good practice for real-life applications of sorts. Not a book, but still. – Hoda

Unlocking the Clubhouse

Allan Fisher, Jane Margolis
General

An empirical look at the causes of the gender gap in Computer Science.

Nice Girls Just Don't Get It

Lois P. Frankel, Carol Frohlinger
General

This book was a very helpful read for me early on in my career in entering a male-dominated field. Not everyone will be facing the same issues or need the same advice, and furthermore any one book will only present one lens to look at the myriad issues through. However, I was grateful I read it and regularly recommend it to friends. – Hilary

Whistling Vivaldi: How Stereotypes Affect Us and What We Can Do

Claude M. Steele
General

Ideas in this book have stuck with me for years. It describes how anyone’s performance can be affected just by being reminded of a stereotype that exists against them in the task at hand. It also offers individual and social strategies to reverse the impact.

Python Cookbook

David Beazley, Brian K. Jones
Python

Amazing cookbook resource for all things Python related.

Python for Data Analysis

Wes McKinney
Python

An introduction to the pandas library, which enables programmatic data analysis in Python. Reading the text saves you a lot of time compared with fumbling around when first learning pandas. – Ceslee

The Art of R Programming

Norman Matloff
R

Great introduction for new R users. – Kyle

Advanced R

Hadley Wickham
R

A deep and informative look at the R language.

Interactive Data Visualization for the Web

Scott Murray
Visualization

One of the most fun books I’ve read since you can try the code examples real time (online version). Great primer to d3. – Ceslee
