Python for Data Analysis (disclaimer) is written by Wes McKinney, the original author of the excellent pandas library. I highly recommend this book to anyone who works with data. The scope of the book goes well beyond pandas and covers other essential Python data tools such as IPython, NumPy and Matplotlib. Also included are recommendations and best practices for data workflows and interactive analysis. The examples in the book are well thought out and illustrate the point in question without unnecessary complication. As a bonus, a diverse group of data sets is used in the examples, which makes for a more interesting read.
One useful function that I had previously overlooked is the apply method of pandas groupby objects. The apply method applies a function to each group in the groupby object, then glues the results together row-wise. I like apply because it’s an elegant way to perform arbitrary operations on each group of data, replacing cases where I might otherwise have used a loop like the one below:
    result_dict = {}
    for group_name, group in groupby_object:
        result = some_function(group)
        result_dict[group_name] = result
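For comparison, here is a minimal sketch of the same pattern written both ways. The DataFrame, its column names and the value_range function are all made up for illustration; any function that takes a group and returns a result would work the same way.

```python
import pandas as pd

# Hypothetical data, made up for illustration.
df = pd.DataFrame({
    "key": ["a", "a", "b", "b", "b"],
    "value": [1.0, 3.0, 2.0, 4.0, 6.0],
})

def value_range(group):
    # Hypothetical per-group function: spread of the "value" column.
    return group["value"].max() - group["value"].min()

grouped = df.groupby("key")

# Loop version: collect one result per group in a dictionary.
loop_result = {}
for group_name, group in grouped:
    loop_result[group_name] = value_range(group)

# apply version: pandas calls the function on each group and combines
# the results into a Series indexed by group name.
# Selecting [["value"]] keeps the grouping column out of the function's input.
apply_result = grouped[["value"]].apply(value_range)

print(loop_result)
print(apply_result)
```

With apply, the result comes back as a pandas object indexed by group name, so there is no dictionary bookkeeping to do afterwards.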
There are some useful examples of using apply in the pandas documentation. There are even more examples in the Python for Data Analysis book, including applying a regression model to the data in each group.
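To give a flavor of the per-group regression idea, here is a rough sketch of my own, not the example from the book. It assumes a made-up DataFrame with columns named group, x and y, and it uses NumPy's polyfit for a straight-line least-squares fit; the fit_line helper is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data: group "a" follows y = 2x + 1, group "b" follows y = -x.
df = pd.DataFrame({
    "group": ["a"] * 4 + ["b"] * 4,
    "x": [0.0, 1.0, 2.0, 3.0] * 2,
    "y": [1.0, 3.0, 5.0, 7.0, 0.0, -1.0, -2.0, -3.0],
})

def fit_line(group):
    # Least-squares fit of y = slope * x + intercept for a single group.
    slope, intercept = np.polyfit(
        group["x"].to_numpy(), group["y"].to_numpy(), deg=1
    )
    return pd.Series({"slope": slope, "intercept": intercept})

# One row of coefficients per group, indexed by group name.
coefs = df.groupby("group")[["x", "y"]].apply(fit_line)
print(coefs)
```

Because fit_line returns a Series with the same labels for every group, apply stacks the results into a DataFrame with one row per group and one column per coefficient.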