Monthly Archives: January 2014

Data Science for Business – by Foster Provost and Tom Fawcett

5 stars

In a nutshell: If you are looking for a simple (but not simplistic) introduction to nearly all of the underlying data science fundamentals then look no further, because this is the book for you! You’ll learn about data-analytic thinking, correlation, segmentation, model fitting/overfitting, similarity (k-NN etc.), clustering (k-means etc.), probability (Bayes et al.), text mining, result evaluation, and more!

I often work as a software developer, and so like to think that I have a good working knowledge of data science (‘data analytics’, ‘big data’ etc.), gained through college/university education and also through the modern media, blog posts etc. However, I often struggled to fully understand, and perhaps more importantly knit together and apply, the core fundamentals of the topic. This book has provided exactly the explanations and ‘glue’ that I required, in that it delivers a very well structured (and well paced) introduction to and overview of data science, and also shows how to think in a ‘data-analytic’ manner.

If you preview the book on Amazon with the ‘look inside’ feature then what you see in the table of contents is exactly what you get. Every chapter delivers upon its title (and promised ‘fundamental concepts’), and frequently builds superbly upon topics introduced in earlier chapters. You’ll move seamlessly from understanding how to frame data science questions, to learning about correlation and segmentation, to model fitting and overfitting, and on to similarity and clustering. After a brief pause to discuss exactly ‘what is a good model’, you’ll be thrust back into learning about visualising model performance, evidence and probabilities, and then how to mine text.

The concluding chapters draw upon and summarise how to practically choose and apply the techniques you’ve learnt, and provide great discussion on how to solve business problems through ‘analytical engineering’. There is also some bonus discussion of other tools and techniques that build upon earlier concepts and which you might find useful, of data science and business strategy, and of some general thinking points around topics such as the need for human intervention in data analysis, and privacy and ethics.

The book is superbly written and reads very easily, which for the potentially dry topic of data science is worthy of praise alone. Most chapters took me approximately an hour each to read, and then another couple of hours to re-read and ponder (sometimes looking at the other references provided) in order to fully understand some of the more complex topics and how everything fitted together. Each chapter also provided plenty of pointers and experimentation ideas if I wanted to go away and practically explore the topic further (say, with the Mahout framework, or R, or scikit-learn/Pandas etc.). The book could probably be read by dipping in and out of chapters, but I think you’ll get a whole lot more from a cover-to-cover reading.

In summary, this is a superb book for those looking for a solid and comprehensive introduction to data science and data analytics for business, and I’m sure that even the more experienced practitioners of the art will find something useful here. The book introduces topics in a perfect order, superbly builds your knowledge chapter after chapter, and constantly relates and reinforces the various techniques and tools you’re learning as it progresses. I wish more text/learning books were written this well!

Click here to buy Data Science for Business: What you need to know about data mining and data-analytic thinking on Amazon UK (This is a sponsored link. Please click through and help a fellow developer buy some more books!)