StreamSets Transformer Extensibility: Spark and Machine Learning Part One

Apache Spark has been on the rise for the past few years, and it continues to dominate in-memory and distributed computing, real-time analysis, and machine learning use cases. With the recent release of StreamSets Transformer, a powerful tool for creating highly instrumented Apache Spark applications for modern ETL, you can quickly start leveraging the benefits of Apache Spark with minimal operational and configuration overhead.

In this blog post, you will learn how to extend StreamSets Transformer to train a Spark ML RandomForestRegressor model.