I recently read the book Building Machine Learning Systems with Python by Willi Richert and Luis Pedro Coelho (disclaimer). Overall I think it is worth reading for someone who is already familiar with coding in Python (and the NumPy library) and is interested in using Python machine learning libraries. I can't recommend it as strongly for someone who is unfamiliar with Python, because the code in the book is often unpythonic (in my opinion), and the code available on the book's website doesn't match the code in the book well and requires a fair bit of tweaking before you can actually run the examples.
I especially enjoyed exploring the gensim library, which is touched on in the book. I also liked the approach taken in the book of building and analyzing machine learning systems as an iterative process, exploring models and features to converge on a good solution.
One thing I think could improve the code in the book is better variable names. For example, the following code is needlessly cryptic:
    dense = np.zeros( (len(topics), 100), float)
    for ti,t in enumerate(topics):
        for tj,v in t:
            dense[ti,tj] = v
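As a rough sketch of what I mean, here is the same loop with more descriptive names. I am assuming that `topics` holds one sparse list of (topic id, weight) pairs per document, which is how I read the snippet; the sample data, `NUM_TOPICS`, and all of the variable names are my own placeholders, not the book's.

```python
import numpy as np

# Hypothetical stand-in for the book's `topics`: one sparse list of
# (topic_id, weight) pairs per document.
topics = [
    [(0, 0.7), (3, 0.3)],
    [(1, 1.0)],
    [],
]

NUM_TOPICS = 100  # mirrors the hard-coded 100 in the original snippet

# Dense (documents x topics) matrix; topics absent from a document stay 0.
doc_topic_weights = np.zeros((len(topics), NUM_TOPICS), dtype=float)
for doc_index, doc_topics in enumerate(topics):
    for topic_id, weight in doc_topics:
        doc_topic_weights[doc_index, topic_id] = weight
```

The logic is unchanged; the names just say what each index and value represents.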
There is also a blundering use of a list comprehension to reshape a numpy array:
x = np.array([[v] for v in x])
Using the built-in reshape method is more memory efficient, easier to read, and roughly 1,000 times faster:
    In [25]: x.shape
    Out[25]: (506L,)

    In [26]: %timeit y = np.array([[v] for v in x])
    100 loops, best of 3: 4.07 ms per loop

    In [27]: %timeit y = x.reshape((x.size, 1)).copy()
    100000 loops, best of 3: 4.09 µs per loop
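To check that the two approaches really produce the same column vector, here is a small sketch. The array `x` is a made-up stand-in with the same shape as the one timed above, and the variable names are mine. It also shows two spellings I find a little tidier than `x.reshape((x.size, 1))`.

```python
import numpy as np

# Hypothetical stand-in for the book's x: a 1-D array of 506 values.
x = np.arange(506, dtype=float)

# The book's approach: build a Python list of one-element lists, then convert.
via_list_comprehension = np.array([[v] for v in x])

# Reshape-based alternatives; -1 asks NumPy to infer the number of rows.
via_reshape = x.reshape(-1, 1)
via_newaxis = x[:, np.newaxis]

# All three are (506, 1) arrays holding the same values.
same_shape = via_list_comprehension.shape == via_reshape.shape == via_newaxis.shape
same_values = (np.array_equal(via_list_comprehension, via_reshape)
               and np.array_equal(via_reshape, via_newaxis))

# For a contiguous array like this one, reshape returns a view on x's
# memory, so append .copy() when an independent array is needed.
reshape_is_view = np.shares_memory(via_reshape, x)
```

I included the `.copy()` in the timing above so that both versions allocate a new array; without it, the reshape does even less work because no data is copied.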