On Some Aspects of Big Data Processing in Apache Spark, Part 1: Serialization

Many beginner Spark programmers encounter a "Task not serializable" exception when they try to break their Spark applications up into separate Java classes. A number of posts explain how to solve this problem, and there are excellent overviews of Spark as well. Nonetheless, to better understand those instructions, I think it is worthwhile to look at the Spark source code and see where and how tasks get serialized and where such exceptions are thrown.

This post is organized as follows: