One of the greatest benefits of working among a diverse group of data scientists and data engineers at Stitch Fix is how much we can learn from our peers. Usually that means getting ad hoc help with specific questions from the resident expert(s). But it also means getting advice on how best to fill any gaps in our own skill sets or knowledge bases, or just what interesting data science materials to explore in our spare time. Our blog posts usually highlight the former; this post touches on the latter.
We’ve queried our data science team for some of their favorite data science books. This list is by no means exhaustive, but should keep any data scientist/engineer new or old learning and entertained for many an evening. Some of the suggestions included context from the data scientist, so for those we include that person’s name!
An Introduction to Statistical Learning
Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani
Machine Learning
A great intro to the fundamentals of statistics and machine learning with R-based tutorials. It’s a great stand-alone text for someone interested in learning statistical modeling, but not necessarily hypothesis tests (usually the first big concept covered in introductory statistics textbooks). It is also good prep for the more advanced Elements of Statistical Learning.
Elements of Statistical Learning
Trevor Hastie, Robert Tibshirani and Jerome Friedman
Machine Learning
A classic among machine learning texts: a concise and deep review of the basics of machine learning from giants in the field.
Machine Learning: A Probabilistic Perspective
Kevin P. Murphy
Machine Learning
Perfect reference book for looking up the details of that probability distribution or that algorithm, with examples. – Eli
Probability Theory: The Logic of Science
E. T. Jaynes
Probability
Great book on probability theory - a must have on any bookshelf. – Eli
Introduction to Probability
Dimitri P. Bertsekas, John N. Tsitsiklis
Probability
Great intro to probability. Beautifully written, easy to read with great examples. Classic! (I may also be biased because my former advisor wrote it, but he’s an incredible teacher). – Hoda
Doing Bayesian Data Analysis
John K. Kruschke
Bayesian Statistics
Basic book and light on the mathematics, but great for gaining a strong understanding of Bayesian statistics for data analysis.
Causality: Models, Reasoning and Inference
Judea Pearl
Statistics
All intricacies and terms for causal models. Used in Lise Getoor’s class on advanced ML. – Natalia
Econometrics
Fumio Hayashi
Inferential Statistics
Great depth and eloquence on econometric problems and time series. – Alex
Mostly Harmless Econometrics
Joshua Angrist, Jörn-Steffen Pischke
Inferential Statistics
This is an approachable introduction to causal modeling through the econometrics lens. It sits somewhere between a textbook and a less technical read in that it has formulas, but it also contextualizes with lots of intuitive examples. It’s a great introduction to the field! – John & Hilary
Data Analysis Using Regression and Multilevel/Hierarchical Models
Andrew Gelman, Jennifer Hill
Statistics
A must-have for multilevel modeling. Good examples and easy to follow. – Songya
Fooled by Randomness: The Hidden Role of Chance in Life and in the Markets
Nassim Taleb
Statistics
It’s more conceptual (i.e. light on advanced topics in statistics), but it articulates extremely well all of our all-too-human tendencies to misuse statistics. The Black Swan and Antifragile are also very good.
Mining of Massive Datasets
Jure Leskovec, Anand Rajaraman, Jeff Ullman
Data Mining
A reference to a wide range of data mining algorithms, focusing on large scale problems. – Sky
Mining the Social Web
Matthew A. Russell
Data Mining
Includes code and shows how to create applications using social network APIs. – Natalia
How To Speak
Patrick Winston
Speaking
I present to you: the infamous Patrick Winston. Enough said. – Hoda
How to Write a Lot
Paul J. Silvia
Writing
A really excellent (and funny!) book on writing and general academic success. Much of the advice ports over to any research role, where success is a Poisson process so you just have to keep trying! – Hilary
The Art of Data Science
Roger D. Peng, Elizabeth Matsui
Coding / Analysis Style
This book is one of many produced by the professors who teach the Coursera Data Science track. Great resources for introductory-level material, with many examples in R.
Tidy Data (paper)
Hadley Wickham
Coding / Analysis Style
In the absence of a good “tidy data analysis” book, this is a great primer for tidy data analysis, which is the foundation of the “Hadleyverse” in R (tidyr, dplyr, ggplot2, etc.).
Kaggle problems
Many!
General
Another good resource is just to try your hand at some Kaggle problems and read other people’s solutions and thought processes. It’s good practice for real-life applications of sorts. Not a book, but still. – Hoda
Unlocking the Clubhouse
Allan Fisher, Jane Margolis
General
An empirical look at the causes of the gender gap in Computer Science.
Nice Girls Just Don't Get It
Lois P. Frankel, Carol Frohlinger
General
This book was a very helpful read for me early on in my career in entering a male-dominated field. Not everyone will be facing the same issues or need the same advice, and furthermore any one book will only present one lens to look at the myriad issues through. However, I was grateful I read it and regularly recommend it to friends. – Hilary
Whistling Vivaldi: How Stereotypes Affect Us and What We Can Do (Issues of Our Time)
Claude M. Steele
General
Ideas in this book have stuck with me for years. It describes how anyone’s performance can be affected just by being reminded of a stereotype that exists against them in the task at hand. It also offers individual and social strategies to reverse the impact.
Python Cookbook
David Beazley, Brian K. Jones
Python
Amazing cookbook resource for all things Python related.
Python for Data Analysis
Wes McKinney
Python
An introduction to the pandas library, which enables programmatic data analysis in Python. Reading the text saves you a lot of time compared with fumbling around when first learning pandas. – Ceslee
The Art of R Programming
Norman Matloff
R
Great introduction for new R users. – Kyle
Advanced R
Hadley Wickham
R
A deep and informative look at the R language.
Interactive Data Visualization for the Web
Scott Murray
Visualization
One of the most fun books I’ve read, since you can try the code examples in real time (online version). Great primer to d3. – Ceslee