Mastering Spark for Data Science

We are proud to announce the publication of “Mastering Spark for Data Science”, of which our CEO Andrew Morgan was one of the authors.

The book is a whirlwind of expertise, aimed at highlighting methods that represent the art of the possible with Spark. Antoine, Dave and Matt, the other authors, are all incredible data science engineers, and the book is essential reading for anyone looking to do data science at really large scale.

What is really exciting is that the book also includes a whole chapter explaining an implementation in Apache Spark of the TrendCalculus algorithm for detecting trend change-points.

To recap – the TrendCalculus algorithm itself delivers a very (very) fast and efficient trend change-point detection function that works at a chosen time scale over a stream of time series data. The effect is very similar to fitting a piecewise linear regression. We’re using it to help study a big data feed containing tens of thousands of time series.
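To give a feel for what trend change-point detection at a fixed time scale looks like, here is a minimal Python sketch. It is not the TrendCalculus implementation from the book: it is a deliberately naive stand-in that splits a series into non-overlapping windows, compares the means of consecutive windows, and reports where the direction flips. The function names `window_trends` and `change_points`, and the choice of window means as the comparison, are our own assumptions for illustration.

```python
from typing import List, Tuple


def window_trends(values: List[float], window: int) -> List[int]:
    """Compare consecutive windows: +1 if the mean rose, -1 if it fell, 0 if flat."""
    if window <= 0:
        raise ValueError("window must be positive")
    # Mean of each complete, non-overlapping window (a trailing partial window is dropped).
    means = [
        sum(values[i:i + window]) / window
        for i in range(0, len(values) - window + 1, window)
    ]
    return [(curr > prev) - (curr < prev) for prev, curr in zip(means, means[1:])]


def change_points(values: List[float], window: int) -> List[Tuple[int, int]]:
    """Return (start_index, new_direction) for each window where the trend flips.

    start_index is the sample index at which the first window showing the
    new direction begins. This is an illustrative assumption, not the
    book's definition of a change-point.
    """
    points = []
    last = 0  # last non-flat direction seen so far
    for idx, sign in enumerate(window_trends(values, window)):
        if sign != 0:
            if last != 0 and sign != last:
                points.append(((idx + 1) * window, sign))
            last = sign
    return points


series = [1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1, 0]
print(change_points(series, window=2))  # [(6, -1)]
```

The `window` parameter plays the role of the time scale: a larger window ignores short wiggles and only reports reversals that persist across whole windows.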

With the publication of “Mastering Spark for Data Science” complete, it now makes sense for us to start releasing our own internal documentation to complement the book, and to go into more detail about how to apply the code to hard problems.

Now with more time on his hands, Andrew is working on cleaning up those documents for release – watch this space for announcements!