We hope you have enjoyed the previous four articles in this series. In this final article, we will introduce the last high-applicability scenario: “Windows AD + Open-Source Ranger.”
Apache Ranger and AWS EMR Automated Installation and Integration Series (4): OpenLDAP + Open-Source Ranger
In the previous two articles, we introduced the EMR-native Ranger integration solutions with OpenLDAP and Windows AD. In this article, we move on to open-source Ranger integration and discuss “OpenLDAP + Open-Source Ranger.”
1. OpenLDAP + Open-Source Ranger Solution Overview
1.1 Solution Architecture
Apache Ranger and AWS EMR Automated Installation and Integration Series (3): Windows AD + EMR-Native Ranger
In this article, we will introduce the solution for “Scenario 2: Windows AD + EMR-Native Ranger.” As in the previous article, we will present the solution architecture, describe the installation steps in detail, and verify the installed environment.
1. Solution Overview
1.1 Solution Architecture
Apache Ranger and AWS EMR Automated Installation and Integration Series (2): OpenLDAP + EMR-Native Ranger
In the first article of this series, we got a full picture of EMR and Ranger integration solutions. From now on, we will introduce concrete solutions one by one. This article covers “Scenario 1: OpenLDAP + EMR-Native Ranger.” We will present the solution architecture, describe the installation steps in detail, and verify the installed environment.
1. Solution Overview
1.1 Architecture
In this solution, OpenLDAP acts as the authentication provider and stores all user account data, while Ranger acts as the authorization controller. Because we chose the EMR-native Ranger solution, which depends heavily on Kerberos, a Kerberos KDC is required. We recommend a cluster-dedicated KDC created by EMR rather than an external KDC, which saves the work of installing Kerberos. If you already have a KDC, this solution supports it as well.
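To make the KDC choice concrete, here is a minimal sketch of how the Kerberos portion of an EMR security configuration might be assembled for either option. The helper name `kerberos_security_config`, the example host, and the port numbers are assumptions for illustration and are not taken from this series. The JSON key names follow our understanding of the EMR security configuration format and should be checked against the current EMR documentation before use.

```python
import json


def kerberos_security_config(external_kdc_host=None, ticket_lifetime_hours=24):
    """Build the Kerberos part of an EMR security configuration as a dict.

    Hypothetical helper: with no host given it selects a cluster-dedicated
    KDC; with a host it points at an existing external KDC instead.
    """
    if external_kdc_host is None:
        # Cluster-dedicated KDC: EMR creates the KDC on the cluster itself.
        kerberos = {
            "Provider": "ClusterDedicatedKdc",
            "ClusterDedicatedKdcConfiguration": {
                "TicketLifetimeInHours": ticket_lifetime_hours,
            },
        }
    else:
        # External KDC: ports 749 (admin) and 88 (KDC) are the usual
        # Kerberos defaults, assumed here rather than taken from the article.
        kerberos = {
            "Provider": "ExternalKdc",
            "ExternalKdcConfiguration": {
                "KdcServerType": "Single",
                "AdminServer": f"{external_kdc_host}:749",
                "KdcServer": f"{external_kdc_host}:88",
            },
        }
    return {"AuthenticationConfiguration": {"KerberosConfiguration": kerberos}}


if __name__ == "__main__":
    # Print the recommended (cluster-dedicated) variant as JSON.
    print(json.dumps(kerberos_security_config(), indent=2))
```

Calling `kerberos_security_config("kdc.example.com")` with a placeholder host would produce the external-KDC variant instead. The Ranger authorization settings are intentionally left out of this sketch.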
Apache Ranger and AWS EMR Automated Installation and Integration Series (1): Solutions Overview
System security usually includes two core topics: authentication and authorization. One solves the problem of “Who is s/he?” and the other solves the problem of “Does s/he have permission to perform an operation?” In the big data area, Apache Ranger is one of the most popular choices for authorization. It supports all mainstream big data components, including HDFS, Hive, HBase, and so on. As Amazon EMR rolls out native Ranger (plugin) features, users can manage the authorization of EMRFS (S3), Spark, Hive, and Trino all together. For authentication, an organization usually has its own centralized authentication infrastructure, e.g., Windows AD or OpenLDAP; however, for most big data components, Kerberos is the only supported authentication mechanism, so users usually need to integrate Windows AD/OpenLDAP with Kerberos to unify authentication.
We will focus on how to automate the installation and integration of Amazon EMR and Apache Ranger. This series is composed of four articles. Each article will introduce a complete solution for a different technology stack.