Azure Data Lake – managing data access

When setting up Azure Data Lake services, it is possible to combine access to the actual data with Azure Active Directory B2B. The combination of these services allows external vendors and/or partners to connect to the data in Azure Data Lake under the governance of both your company and theirs. The logins for accessing the data are based on the source company's credentials, while the permissions are set by the Data Lake administrator.

In the Data Lake data explorer, there are three basic permissions that can be set in two kinds of ACLs:

Access ACLs: These control access to an object. Files and folders both have Access ACLs.

Default ACLs: A Default ACL is a “template” of ACLs associated with a folder. It determines the ACLs for any child item created under that folder.
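To make the "template" idea concrete, here is a minimal Python sketch of how a Default ACL could seed the ACLs of new child items. It is a simplified model, not the Data Lake API: the `Node` class, the `create_child` helper, and the group name `vendor1-readers` are all hypothetical, and details such as the mask and owner entries are ignored.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a file or folder in the lake."""
    name: str
    is_folder: bool
    access_acl: dict = field(default_factory=dict)   # principal -> permission letters
    default_acl: dict = field(default_factory=dict)  # only meaningful for folders


def create_child(parent: Node, name: str, is_folder: bool) -> Node:
    """Create a child whose ACLs are seeded from the parent's Default ACL."""
    child = Node(name=name, is_folder=is_folder)
    # The parent's Default ACL becomes the child's Access ACL.
    child.access_acl = dict(parent.default_acl)
    # Only folders carry a Default ACL forward to their own children.
    if is_folder:
        child.default_acl = dict(parent.default_acl)
    return child


root = Node("vendors", is_folder=True, default_acl={"vendor1-readers": "r-x"})
sub = create_child(root, "2018", is_folder=True)
report = create_child(sub, "report.csv", is_folder=False)
print(report.access_acl)
```

In this model the file two levels down ends up with the same entry as the template on the top folder, while the file itself has no Default ACL.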

Within each ACL you have Read, Write and Execute permissions. They correspond to the permissions of the Linux file system and, in short, do the following:

| Permission | File | Folder |
|---|---|---|
| Read (R) | Can read the contents of a file | Requires Read and Execute to list the contents of the folder |
| Write (W) | Can write or append to a file | Requires Write and Execute to create child items in a folder |
| Execute (X) | Does not mean anything in the context of Data Lake Store | Required to traverse the child items of a folder |
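The rules in the table can be sketched as a small Python model. This is an illustration under simplifying assumptions, not how the service is implemented: the `ACLS` dictionary, the paths, and the principal `vendor1` are hypothetical, and owners, the mask, and super-users are left out.

```python
from pathlib import PurePosixPath

# Hypothetical ACL store: path -> {principal: permission letters}
ACLS = {
    "/": {"vendor1": "x"},
    "/sales": {"vendor1": "rx"},
    "/sales/q1.csv": {"vendor1": "r"},
    "/finance": {},
    "/finance/budget.csv": {"vendor1": "r"},
}


def has(path: str, principal: str, needed: str) -> bool:
    """True if the principal holds every permission letter in `needed` on `path`."""
    granted = ACLS.get(path, {}).get(principal, "")
    return set(needed) <= set(granted)


def can_traverse_to(path: str, principal: str) -> bool:
    """Execute is required on every ancestor folder, up to and including the root."""
    return all(has(str(p), principal, "x") for p in PurePosixPath(path).parents)


def can_read_file(path: str, principal: str) -> bool:
    return can_traverse_to(path, principal) and has(path, principal, "r")


def can_list_folder(path: str, principal: str) -> bool:
    return can_traverse_to(path, principal) and has(path, principal, "rx")


def can_create_child(path: str, principal: str) -> bool:
    return can_traverse_to(path, principal) and has(path, principal, "wx")


print(can_read_file("/sales/q1.csv", "vendor1"))
print(can_read_file("/finance/budget.csv", "vendor1"))
```

In this model the second check fails even though the file itself grants Read, because `vendor1` has no Execute permission on `/finance` and so cannot traverse to the file.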

More information on setting permissions can be found here: https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-access-control

When exposing data to partners, you might hit the limit of 32 ACL entries per file or folder. It is, however, possible to assign permissions based on groups. These groups are Azure Active Directory groups that can contain AAD accounts as well as AAD guest accounts. This allows for a very flexible architecture for assigning ACLs:

 

In the image above we have 9 groups defined in AAD that determine the actual access to the data structure in ADL. Each group has a specific ACL set in ADL, based on a Default ACL or an Access ACL (depending on the requirements). As groups can be members of other groups in AAD, it is easy to map the access structure in AAD. Vendor 1 is a member of 4 (access) groups, while vendor 2 is a member of 6 (access) groups. Note that in order to traverse the folder paths, the user or group needs Execute permission on the top-level folder (and on each subsequent folder on the path to the data they have access to). That is why, in the image above, both vendors have Execute permission on the root folder.
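The nested-group structure described above can be reasoned about as a transitive membership lookup. The following Python sketch shows the idea; the `MEMBER_OF` map and all account and group names are hypothetical and do not come from the image or from an AAD API.

```python
# Hypothetical membership map: member -> groups it belongs to directly
MEMBER_OF = {
    "alice@vendor1.example": ["vendor1"],
    "vendor1": ["root-execute", "sales-read"],
    "sales-read": [],
    "root-execute": [],
}


def effective_groups(principal: str, member_of: dict) -> set:
    """Collect every group reachable from the principal through nested membership."""
    seen = set()
    stack = list(member_of.get(principal, []))
    while stack:
        group = stack.pop()
        if group in seen:
            continue  # already visited; also protects against membership cycles
        seen.add(group)
        stack.extend(member_of.get(group, []))
    return seen


print(sorted(effective_groups("alice@vendor1.example", MEMBER_OF)))
```

With this layout, an ACL only has to name the access groups such as `sales-read`. Adding or removing a vendor user is then a change in group membership, not a change to any ACL, which also keeps the number of ACL entries per folder low.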

The beauty of this system is that users are allowed (in the default configuration) to create and manage their own groups and the group membership. As such, the ADL administrators can decide for themselves who has access to their data. AAD administrators can set the owner of (existing) groups through the AAD blade in the Azure portal. Multiple owners can be specified for a group.

Users can also, if allowed, create and fully manage their own groups through the myapps.microsoft.com website. Given that the ADL administrator (usually) also has full access to the ADL security principals, they can now fully manage their own service.

Note that, due to a bug, it is currently not possible to see, add, or remove group members that are themselves groups. Only user accounts can be added or removed. The PowerShell steps further below describe how to add groups manually.

The group owner can also invite external people into their groups (if the AAD settings allow it). When adding members, the only thing the group owner has to do is type the email address of the user that needs to be added and send the invite.

The invitee will receive an email message to accept the invite, and after it has been accepted, the user access is automatically granted.

Now that an account exists, the invited user can use it to connect to the Azure Data Lake via portal.azure.com.

Important: be aware that allowing a user access to the Azure Data Lake file structure also allows them access to the Azure portal and related services. While the user is unable to deploy or read any other item in the Azure portal, an incorrect permission setting on a resource group or resource might reveal information in the portal.

Be aware that the default setting in AAD is that guests can also invite other guests; see the AAD Settings section below.

When providing write access to the Azure Data Lake for partners, be aware that large uploads will incur additional storage charges as the Azure Data Lake fills with additional data. Also, data that is extracted from the ADL is charged as egress traffic.

As ADL provides HDFS-compatible storage, some vendors might want to have HDInsight, Hadoop, or other compute services running against the stored data. This usually involves providing access to a service account that has been created in AAD. Service accounts cannot be added to a group via the portal or any other (available) GUI, so they must be added to the group via PowerShell.

First we need to create a service principal. This is easiest to do in PowerShell, using the New-MsolServicePrincipal cmdlet.

As the AAD admin, I had already created a group called vendor 3 and set the owner of the group. Using Get-MsolGroup, I retrieved the ObjectID for this group.

The group owner can now connect to the tenant with PowerShell (Connect-MsolService) and add the service principal directly to the group.

The command above was executed by the group owner, not the AAD administrator.

So, it is possible for the AAD admin to pre-create a number of service principals and provide their details to the ADL admin. The ADL admin can then distribute these to the different vendors, who use them to connect to the ADL.

Given that PowerShell provides the group owner with a direct interface for managing groups, it is also possible to add groups to another group that he or she owns. Once the group IDs have been retrieved, it is fairly easy to manage the group memberships. Note, however, that inviting external users to a group is best done via the MyApps portal or the Azure portal (if you are an AAD admin), given the complexity of the request.

You can retrieve the group ObjectIDs by running Get-MsolGroup and looking up both the group you want to add and the group you want to add it to. Then use these IDs with the Add-MsolGroupMember cmdlet.

And if you want to actually use it in combination with Cloudera, please see: https://www.cloudera.com/documentation/enterprise/latest/topics/admin_adls_config.html

 

AAD Settings

Access to AAD is controlled by settings that can be set by the AAD administrator. These settings determine whether a user can invite guests into the organization and whether these guests can in turn invite others. Furthermore, these settings also dictate whether a user can create a group in AAD. The first two settings can be found in the User Settings tab on the AAD main page in the Azure portal.

The group settings are located in the Users and Groups / Groups console of the AAD in Azure:
