Preface

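Sentry is a member of the Hadoop ecosystem that plays the role of a "gatekeeper", guarding secure access to the data on a big data platform. It runs as a plugin inside each component, stores and retrieves access policies through a relational database (PostgreSQL, MySQL) or local files, and gives data users fine-grained access control. This article walks through Sentry's authorization check at the source-code level to help readers understand how privileges are verified. Blog post: Sentry Source Code: the HiveServer2 Authorization Process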

Sentry Architecture Overview

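Sentry is designed as an independent access control layer that handles privilege granting and authorization checking for Hadoop components (currently HDFS, Hive, Impala, Solr, Kafka and Sqoop). It is therefore loosely coupled and works on top of each component as a plugin. You can think of it as a filter in a Java web application: when a user request arrives, Sentry intercepts the user information and verifies the user's privileges. If verification succeeds, the request is let through; otherwise an exception is thrown and the request is blocked.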

Sentry has a layered structure, as shown in the figure below.

image

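Binding layer: intercepts user requests to a Hadoop component and parses out the user information so that it can be checked.
Provider layer: a fairly generic policy verification layer. It abstracts privilege objects and verifies the privilege objects a user holds.
Policy Metadata Store: responsible for storing and reading policies. It currently supports file storage and relational database storage.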

Combining the figure above with the source code, Sentry's rough workflow is:

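The Binding layer intercepts the user's access, parses out the user information, and holds it in a subject object.
The Policy Metadata Store layer uses the resource object being accessed (the table name) and the user information (subject) to read two lists of privilege objects from the underlying storage (file or relational database): requireList (the privileges required) and obtainList (the privileges the user currently holds).
The Policy Engine compares the two lists item by item. If any required privilege is missing it throws an exception; the request is let through only when every requirement is satisfied.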
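The comparison in the last step can be pictured with a small sketch. It is not Sentry code: the class and method names are hypothetical, privileges are reduced to plain strings, and only the names requireList and obtainList come from the description above.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical illustration of comparing required and obtained privileges.
final class ListComparison {

    // Returns the required privileges that the user does not hold.
    static Set<String> missing(Set<String> requireList, Set<String> obtainList) {
        Set<String> missing = new LinkedHashSet<>(requireList);
        missing.removeAll(obtainList);
        return missing;
    }

    // Throws if any required privilege is missing, otherwise returns normally.
    static void authorize(String subject, Set<String> requireList, Set<String> obtainList) {
        Set<String> missing = missing(requireList, obtainList);
        if (!missing.isEmpty()) {
            throw new SecurityException("User " + subject + " is missing " + missing);
        }
    }
}
```

The real Policy Engine does not test for exact string equality; as the later sections show, a coarser-grained grant can also satisfy a finer-grained requirement.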

Source Code Analysis

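Taking HiveServer2 as an example, this section analyzes how Sentry performs authorization checking and uses that as an entry point into Sentry's general authorization model. As mentioned above, the process mainly involves three cooperating layers: Binding, Policy Engine and Policy Metadata Store. They are analyzed one by one below.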

Binding

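As mentioned above, the main job of Binding is to parse the user information. So how does Sentry intercept a user's request to a Hadoop component? Take HiveServer2: when a user connects, HiveServer2 creates a session that stores the user name and related information. The session is kept for the lifetime of that user's TCP connection, so whoever can get hold of the session can get the user name.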

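HiveServer2 provides a convenient interface called HiveSessionHook. It has a single run method, which is invoked when the session manager creates a session. This is a hook mechanism offered by Hive for custom actions. Sentry uses it: the class HiveAuthzBindingSessionHookV2 implements the HiveSessionHook interface and overrides its run method. The code is as follows: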

  @Override
  public void run(HiveSessionHookContext sessionHookContext) throws HiveSQLException {
    // Add sentry hooks to the session configuration
    HiveConf sessionConf = sessionHookContext.getSessionConf();

    appendConfVar(sessionConf, ConfVars.SEMANTIC_ANALYZER_HOOK.varname, SEMANTIC_HOOK);
    // enable sentry authorization V2
    sessionConf.setBoolean(HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED.varname, true);
    sessionConf.setBoolean(HiveConf.ConfVars.HIVE_SERVER2_ENABLE_DOAS.varname, false);
    sessionConf.set(HiveConf.ConfVars.HIVE_AUTHENTICATOR_MANAGER.varname,
        "org.apache.hadoop.hive.ql.security.SessionStateUserAuthenticator");

    // grant all privileges for table to its owner
    sessionConf.setVar(ConfVars.HIVE_AUTHORIZATION_TABLE_OWNER_GRANTS, "");

    // Enable compiler to capture transform URI referred in the query
    sessionConf.setBoolVar(ConfVars.HIVE_CAPTURE_TRANSFORM_ENTITY, true);

    // set security command list
    HiveAuthzConf authzConf = HiveAuthzBindingHookBaseV2.loadAuthzConf(sessionConf);
    String commandWhitelist =
        authzConf.get(HiveAuthzConf.HIVE_SENTRY_SECURITY_COMMAND_WHITELIST,
            HiveAuthzConf.HIVE_SENTRY_SECURITY_COMMAND_WHITELIST_DEFAULT);
    sessionConf.setVar(ConfVars.HIVE_SECURITY_COMMAND_WHITELIST, commandWhitelist);

    // set additional configuration properties required for auth
    sessionConf.setVar(ConfVars.SCRATCHDIRPERMISSION, SCRATCH_DIR_PERMISSIONS);

    // setup restrict list
    sessionConf.addToRestrictList(ACCESS_RESTRICT_LIST);

    // set user name
    sessionConf.set(HiveAuthzConf.HIVE_ACCESS_SUBJECT_NAME, sessionHookContext.getSessionUser());
    sessionConf.set(HiveAuthzConf.HIVE_SENTRY_SUBJECT_NAME, sessionHookContext.getSessionUser());

    // Set MR ACLs to session user
    updateJobACL(sessionConf, JobContext.JOB_ACL_VIEW_JOB, sessionHookContext.getSessionUser());
    updateJobACL(sessionConf, JobContext.JOB_ACL_MODIFY_JOB, sessionHookContext.getSessionUser());
  }

The English comments are already fairly detailed. A few points are worth noting:

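HiveConf is a subclass of Configuration. Think of it as a Map that holds configuration for the current Hive session. By default it loads the settings in hive-site.xml, so those settings can be read through HiveConf.
The semantic analyzer hook is injected as well. It is another hook, triggered during the semantic analysis phase of a SQL statement. Some authorization work can be done there, but Sentry's main authorization logic is not implemented in it.
SCRATCH_DIR_PERMISSIONS has the value 700. It is the permission assigned to the directory and corresponds to the bits 111000000, that is, r, w and x permissions for the owning user.
ACCESS_RESTRICT_LIST is a set of keys whose values users are not allowed to modify.
HiveAuthzConf is also a subclass of Configuration. Think of it as the configuration in sentry-site.xml.
The subject name is set to the user name. It is used later for authorization checking, since each user maps to a certain set of privileges.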
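The arithmetic behind the 700 value can be checked with a short sketch. The class below is purely illustrative and is not part of Sentry or Hive; it only converts an octal mode into its nine permission bits and the familiar rwx notation.

```java
// Illustrative helper (hypothetical name) for reading a POSIX permission mode.
final class ScratchDirMode {

    // Renders the low nine bits of the mode as a zero-padded binary string.
    static String toBits(int mode) {
        StringBuilder sb = new StringBuilder(Integer.toBinaryString(mode & 0777));
        while (sb.length() < 9) {
            sb.insert(0, '0');
        }
        return sb.toString();
    }

    // Renders the mode in rwx notation: owner, group, others.
    static String toSymbolic(int mode) {
        String flags = "rwxrwxrwx";
        StringBuilder sb = new StringBuilder(9);
        for (int i = 0; i < 9; i++) {
            int bit = 1 << (8 - i);
            sb.append((mode & bit) != 0 ? flags.charAt(i) : '-');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int mode = Integer.parseInt("700", 8);
        System.out.println(toBits(mode));      // 111000000
        System.out.println(toSymbolic(mode));  // rwx------
    }
}
```

Only the first three bits are set, so the owner gets read, write and execute while group and others get nothing.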

That completes the Binding layer. It mainly uses HiveServer2's session hook to read the session's user name and store it under a configuration key for later use.

Privilege Verification

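HiveServer2 natively provides access control logic. Sentry builds on it and strengthens the RBAC concept: privileges can only be granted to roles, and roles are granted to users/groups, which yields the chain privilege -> role -> group -> user. Once the user name is known, the user's roles and the corresponding privilege set are read from the database and the privileges can be verified. The relationships among the Sentry classes involved in privilege verification are shown in the figure below: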

image

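The top-right corner of each class/interface indicates whether it belongs to Hive or Sentry. A hollow diamond denotes an implemented interface, and a solid arrow points to an object referenced internally.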

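HiveAuthorizerFactory and HiveAuthorizer both come from Hive and are both interfaces. HiveAuthorizerFactory follows the abstract factory pattern and returns a HiveAuthorizer.
SentryAuthorizerFactory and SentryHiveAuthorizer are Sentry's corresponding implementations. From this point on, HiveServer2's access control is handed over to Sentry.
SentryHiveAuthorizer holds two references, SentryHiveAccessController and SentryHiveAuthorizationValidator, which are responsible for granting (grant/revoke) and authorization checking (checkPrivileges) respectively.
The default implementation of SentryHiveAccessController is DefaultSentryAccessController.
The default implementation of SentryHiveAuthorizationValidator is DefaultSentryValidator. Its checkPrivileges method performs the check by calling the authorize method of HiveAuthzBinding, which completes the final privilege verification.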
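The shape of this design, a factory that returns an authorizer which in turn delegates to an access controller and a validator, can be sketched as follows. All types here are hypothetical stand-ins with simplified signatures; they are not the real Hive or Sentry interfaces.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the access controller and validator roles.
interface SketchAccessController { void grant(String role, String privilege); }
interface SketchValidator { void checkPrivileges(String user, String privilege); }

// Plays the part of the authorizer: it only delegates.
final class SketchAuthorizer {
    private final SketchAccessController controller;
    private final SketchValidator validator;

    SketchAuthorizer(SketchAccessController controller, SketchValidator validator) {
        this.controller = controller;
        this.validator = validator;
    }

    void grant(String role, String privilege) { controller.grant(role, privilege); }
    void checkPrivileges(String user, String privilege) { validator.checkPrivileges(user, privilege); }
}

// Plays the part of the factory that the host system calls.
interface SketchAuthorizerFactory { SketchAuthorizer create(); }

final class DelegationDemo {
    // Records which collaborator handled each call.
    static List<String> run() {
        List<String> calls = new ArrayList<>();
        SketchAuthorizerFactory factory = () -> new SketchAuthorizer(
            (role, privilege) -> calls.add("grant:" + role + ":" + privilege),
            (user, privilege) -> calls.add("check:" + user + ":" + privilege));
        SketchAuthorizer authorizer = factory.create();
        authorizer.grant("analyst", "table=student->select");
        authorizer.checkPrivileges("alice", "table=student->select");
        return calls;
    }
}
```

Because the host only sees the factory and the authorizer, swapping in a different controller or validator does not require any change on the host side, which is the point of the separation described above.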

authorize

As noted above, the checkPrivileges method in DefaultSentryValidator calls authorize to perform the actual privilege verification. The code is:

hiveAuthzBinding.authorize(hiveOp, stmtAuthPrivileges,
          new Subject(authenticator.getUserName()), inputHierarchyList, outputHierarchyList);

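hiveOp is the HiveOperation enum value that the SQL statement was converted to. It represents the operation the current SQL corresponds to.
stmtAuthPrivileges is the set of privileges this operation requires. It is looked up by the type of hiveOp in a predefined table of system constants.
new Subject represents the current user.
inputHierarchyList and outputHierarchyList represent the input objects and the output objects respectively.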

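The arguments show that everything except subject describes the privileges this SQL operation requires. stmtAuthPrivileges directly gives the privileges the operation needs, while inputHierarchyList and outputHierarchyList give the input and output resources the SQL accesses. Verification therefore takes two steps: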

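Does the user hold the access privileges this operation requires on the input object list?
Does the user hold the access privileges this operation requires on the output object list?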

Let's step into the authorize method:

  public void authorize(HiveOperation hiveOp, HiveAuthzPrivileges stmtAuthPrivileges,
      Subject subject, List<List<DBModelAuthorizable>> inputHierarchyList,
      List<List<DBModelAuthorizable>> outputHierarchyList)
          throws AuthorizationException {
    if (!open) {
      throw new IllegalStateException("Binding has been closed");
    }
    boolean isDebug = LOG.isDebugEnabled();
    if(isDebug) {
      LOG.debug("Going to authorize statement " + hiveOp.name() +
          " for subject " + subject.getName());
    }

    /* for each read and write entity captured by the compiler -
     *    check if that object type is part of the input/output privilege list
     *    If it is, then validate the access.
     * Note the hive compiler gathers information on additional entities like partitions,
     * etc which are not of our interest at this point. Hence its very
     * much possible that the we won't be validating all the entities in the given list
     */

    // Check read entities
    Map<AuthorizableType, EnumSet<DBModelAction>> requiredInputPrivileges =
        stmtAuthPrivileges.getInputPrivileges();
    if(isDebug) {
      LOG.debug("requiredInputPrivileges = " + requiredInputPrivileges);
      LOG.debug("inputHierarchyList = " + inputHierarchyList);
    }
    Map<AuthorizableType, EnumSet<DBModelAction>> requiredOutputPrivileges =
        stmtAuthPrivileges.getOutputPrivileges();
    if(isDebug) {
      LOG.debug("requiredOuputPrivileges = " + requiredOutputPrivileges);
      LOG.debug("outputHierarchyList = " + outputHierarchyList);
    }

    boolean found = false;
    for (Map.Entry<AuthorizableType, EnumSet<DBModelAction>> entry : requiredInputPrivileges.entrySet()) {
      AuthorizableType key = entry.getKey();
      for (List<DBModelAuthorizable> inputHierarchy : inputHierarchyList) {
        if (getAuthzType(inputHierarchy).equals(key)) {
          found = true;
          if (!authProvider.hasAccess(subject, inputHierarchy, entry.getValue(), activeRoleSet)) {
            throw new AuthorizationException("User " + subject.getName() +
                " does not have privileges for " + hiveOp.name());
          }
        }
      }
      if (!found && !key.equals(AuthorizableType.URI) && !(hiveOp.equals(HiveOperation.QUERY))
          && !(hiveOp.equals(HiveOperation.CREATETABLE_AS_SELECT))) {
        //URI privileges are optional for some privileges: anyPrivilege, tableDDLAndOptionalUriPrivilege
        //Query can mean select/insert/analyze where all of them have different required privileges.
        //CreateAsSelect can has table/columns privileges with select.
        //For these alone we skip if there is no equivalent input privilege
        //TODO: Even this case should be handled to make sure we do not skip the privilege check if we did not build
        //the input privileges correctly
        throw new AuthorizationException("Required privilege( " + key.name() + ") not available in input privileges");
      }
      found = false;
    }

    for (Map.Entry<AuthorizableType, EnumSet<DBModelAction>> entry : requiredOutputPrivileges.entrySet()) {
      AuthorizableType key = entry.getKey();
      for (List<DBModelAuthorizable> outputHierarchy : outputHierarchyList) {
        if (getAuthzType(outputHierarchy).equals(key)) {
          found = true;
          if (!authProvider.hasAccess(subject, outputHierarchy, entry.getValue(), activeRoleSet)) {
            throw new AuthorizationException("User " + subject.getName() +
                " does not have privileges for " + hiveOp.name());
          }
        }
      }
      if(!found && !(key.equals(AuthorizableType.URI)) &&  !(hiveOp.equals(HiveOperation.QUERY))) {
        //URI privileges are optional for some privileges: tableInsertPrivilege
        //Query can mean select/insert/analyze where all of them have different required privileges.
        //For these alone we skip if there is no equivalent output privilege
        //TODO: Even this case should be handled to make sure we do not skip the privilege check if we did not build
        //the output privileges correctly
        throw new AuthorizationException("Required privilege( " + key.name() + ") not available in output privileges");
      }
      found = false;
    }

  }

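As the code shows, stmtAuthPrivileges contains an input privilege map and an output privilege map, and each has to be verified. The map key is an AuthorizableType enum value, one of Server, Db, Table, Column, View and URI. For each AuthorizableType, at least one inputList or outputList entry is expected to have the same authzType. When one matches, the Provider's hasAccess method decides whether the user holds the required privileges on that object list (entry.getValue() holds the required privileges).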

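If no inputList or outputList entry has a matching type, the AuthorizableType is not URI, and hiveOp is not QUERY (for inputs, CREATETABLE_AS_SELECT is exempt as well), an exception is thrown immediately. In other words, any operation on a table A other than select requires the corresponding privileges to be present.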
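The matching rule described in the two paragraphs above can be condensed into a small sketch. It is a simplification under stated assumptions, not the Sentry implementation: types and actions are plain strings, a hierarchy is a list such as server=..., db=..., table=..., its type is assumed to be the key of its last element, and hasAccess is passed in as a predicate.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.BiPredicate;

// Hypothetical condensation of the per-type check performed for inputs or outputs.
final class RequiredTypeCheck {

    // Assumption: the type of a hierarchy is the key of its most specific element.
    static String typeOf(List<String> hierarchy) {
        String last = hierarchy.get(hierarchy.size() - 1);
        return last.substring(0, last.indexOf('='));
    }

    static void check(Map<String, Set<String>> required,
                      List<List<String>> hierarchies,
                      boolean isQuery,
                      BiPredicate<List<String>, Set<String>> hasAccess) {
        for (Map.Entry<String, Set<String>> entry : required.entrySet()) {
            boolean found = false;
            for (List<String> hierarchy : hierarchies) {
                if (typeOf(hierarchy).equals(entry.getKey())) {
                    found = true;
                    // A matching object was accessed: the user must hold the actions.
                    if (!hasAccess.test(hierarchy, entry.getValue())) {
                        throw new SecurityException("No privileges on " + hierarchy);
                    }
                }
            }
            // No object of the required type: tolerated only for URI or QUERY.
            if (!found && !entry.getKey().equals("uri") && !isQuery) {
                throw new SecurityException(
                    "Required privilege (" + entry.getKey() + ") not available");
            }
        }
    }
}
```

Scoping the found flag inside the outer loop has the same effect as resetting it at the end of each iteration, as the original method does.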

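At this point it turns out that authorize is not the method that makes the final decision; it still has to call the Provider's hasAccess method. That makes sense: so far we only have the set of privileges required on the objects this operation accesses, not the set of privileges the user currently holds. The Provider is needed to read the user's privilege set from the storage medium. As mentioned earlier, two storage options are currently supported: files (local/HDFS) and a relational database.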

Three objects are involved on the Provider side: Policy Engine, Provider and Provider Backend.

Policy Engine defaults to the class org.apache.sentry.policy.engine.common.CommonPolicyEngine
Provider defaults to org.apache.sentry.provider.common.HadoopGroupResourceAuthorizationProvider
Backend defaults to org.apache.sentry.provider.file.SimpleFileProviderBackend. To store policies in a database, set sentry.hive.provider.backend to SimpleDBProviderBackend in sentry-site.xml

Their relationship is: the Provider contains the Policy Engine, which contains the Provider Backend.
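A minimal sketch of this containment, combined with the group -> role -> privilege lookup from the RBAC chain, might look like the following. Every class name here is hypothetical, the backend is an in-memory map rather than a file or database, and the access test is a plain string match instead of Sentry's implies logic.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical backend: resolves groups to roles, then roles to privilege strings.
final class MapBackend {
    private final Map<String, Set<String>> groupRoles = new HashMap<>();
    private final Map<String, Set<String>> rolePrivileges = new HashMap<>();

    void addRoleToGroup(String group, String role) {
        groupRoles.computeIfAbsent(group, k -> new HashSet<>()).add(role);
    }

    void grant(String role, String privilege) {
        rolePrivileges.computeIfAbsent(role, k -> new HashSet<>()).add(privilege);
    }

    Set<String> privilegesFor(Set<String> groups) {
        Set<String> result = new HashSet<>();
        for (String group : groups) {
            for (String role : groupRoles.getOrDefault(group, Collections.<String>emptySet())) {
                result.addAll(rolePrivileges.getOrDefault(role, Collections.<String>emptySet()));
            }
        }
        return result;
    }
}

// Hypothetical policy engine: wraps the backend.
final class SketchPolicyEngine {
    private final MapBackend backend;

    SketchPolicyEngine(MapBackend backend) { this.backend = backend; }

    Set<String> getPrivileges(Set<String> groups) { return backend.privilegesFor(groups); }
}

// Hypothetical provider: maps a user to groups, then asks the engine.
final class SketchProvider {
    private final SketchPolicyEngine engine;
    private final Map<String, Set<String>> userGroups;

    SketchProvider(SketchPolicyEngine engine, Map<String, Set<String>> userGroups) {
        this.engine = engine;
        this.userGroups = userGroups;
    }

    boolean hasAccess(String user, String requestedPrivilege) {
        Set<String> groups = userGroups.getOrDefault(user, Collections.<String>emptySet());
        return engine.getPrivileges(groups).contains(requestedPrivilege);
    }
}
```

With this layering, replacing the map with a file or database reader only touches the backend; the provider and engine stay unchanged.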

The hasAccess method internally calls the private method doHasAccess, which is defined as follows:

  private boolean doHasAccess(Subject subject,
      List<? extends Authorizable> authorizables, Set<? extends Action> actions,
      ActiveRoleSet roleSet) {
    // Get the user's group information
    Set<String> groups =  getGroups(subject);
    // Set of user names
    Set<String> users = Sets.newHashSet(subject.getName());
    // Set of authorizable objects, e.g. table=student
    Set<String> hierarchy = new HashSet<String>();
    for (Authorizable authorizable : authorizables) {
      hierarchy.add(KV_JOINER.join(authorizable.getTypeName(), authorizable.getName()));
    }
    // List of strings such as table=student->select
    List<String> requestPrivileges = buildPermissions(authorizables, actions);
    // Use the policy engine to get the privileges of the user's groups and roles; the database or policy file is read here
    Iterable<Privilege> privileges = getPrivileges(groups, users, roleSet,
        authorizables.toArray(new Authorizable[0]));
    lastFailedPrivileges.get().clear();

    for (String requestPrivilege : requestPrivileges) {
      // Build a Privilege object from a string such as table=student->select for verification
      Privilege priv = privilegeFactory.createPrivilege(requestPrivilege);
      for (Privilege permission : privileges) {
        /*
         * Does the permission granted in the policy file imply the requested action?
         */
        boolean result = permission.implies(priv, model);
        if (LOGGER.isDebugEnabled()) {
          LOGGER.debug("ProviderPrivilege {}, RequestPrivilege {}, RoleSet {}, Result {}",
              new Object[]{ permission, requestPrivilege, roleSet, result});
        }
        if (result) {
          return true;
        }
      }
    }

    lastFailedPrivileges.get().addAll(requestPrivileges);
    return false;
  }

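permission.implies(priv, model); is the final verification step. It calls the method on Privilege, and the implementation in use here is CommonPrivilege. A CommonPrivilege is constructed from a string, which it parses into a List of KeyValue; the implies method then uses that list to verify privileges. The implies method is: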

  @Override
  public boolean implies(Privilege privilege, Model model) {
    // By default only supports comparisons with other IndexerWildcardPermissions
    if (!(privilege instanceof CommonPrivilege)) {
      return false;
    }

    List<KeyValue> otherParts = ((CommonPrivilege) privilege).getParts();
    if(parts.equals(otherParts)) {
      return true;
    }

    int index = 0;
    for (KeyValue otherPart : otherParts) {
      // If this privilege has less parts than the other privilege, everything
      // after the number of parts contained
      // in this privilege is automatically implied, so return true
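      // Meaning: if the user holds a privilege on a table and the object being accessed (other) is a column, the user implicitly holds the privilege on the column; a coarser-grained privilege covers finer-grained ones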
      if (parts.size() - 1 < index) {
        return true;
      } else {
        KeyValue part = parts.get(index);
        String policyKey = part.getKey();
        // are the keys even equal
        if(!policyKey.equalsIgnoreCase(otherPart.getKey())) {
          // Support for action inheritance from parent to child
          if (SentryConstants.PRIVILEGE_NAME.equalsIgnoreCase(policyKey)) {
            continue;
          }
          return false;
        }

        // do the imply for action
        if (SentryConstants.PRIVILEGE_NAME.equalsIgnoreCase(policyKey)) {
          if (!impliesAction(part.getValue(), otherPart.getValue(), model.getBitFieldActionFactory())) {
            return false;
          }
        } else {
          if (!impliesResource(model.getImplyMethodMap().get(policyKey.toLowerCase()),
                  part.getValue(), otherPart.getValue())) {
            return false;
          }
        }

        index++;
      }
    }

    // If this privilege has more parts than the other parts, only imply it if
    // all of the other parts are wildcards
    // If the user's privilege is finer-grained than the request, it passes only when the extra parts are *
    for (; index < parts.size(); index++) {
      KeyValue part = parts.get(index);
      if (!SentryConstants.PRIVILEGE_WILDCARD_VALUE.equals(part.getValue())) {
        return false;
      }
    }

    return true;
  }

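That completes the analysis of privilege verification. Sentry reads the privileges a user holds from the database, by the user's groups and roles, only just before the final check, and then compares them with the required privileges. The user information is read in the Policy backend. The Policy provider layer hides the differences between the privilege categories of different components and verifies them in a generic form, so it can be reused.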
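To make the implies rules concrete, here is a reduced sketch that works on privilege strings directly. It is a simplification written for this article, not CommonPrivilege itself: the class name is hypothetical, the key=value form joined by -> and the key name action are assumptions for illustration, and resource and action values are compared with a simple equality-or-wildcard test instead of the model's imply methods.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified privilege: an ordered list of key=value parts.
final class SimplePrivilege {
    private final List<String[]> parts = new ArrayList<>();

    SimplePrivilege(String text) {
        for (String kv : text.split("->")) {
            String[] pair = kv.split("=", 2);
            parts.add(new String[] { pair[0].trim().toLowerCase(), pair[1].trim() });
        }
    }

    // Does this (granted) privilege cover the requested one?
    boolean implies(SimplePrivilege requested) {
        int index = 0;
        for (String[] otherPart : requested.parts) {
            if (index >= parts.size()) {
                return true; // the grant is coarser: everything below it is covered
            }
            String[] part = parts.get(index);
            if (!part[0].equals(otherPart[0])) {
                if (part[0].equals("action")) {
                    continue; // an action granted on a parent applies to its children
                }
                return false;
            }
            if (!valueImplies(part[1], otherPart[1])) {
                return false;
            }
            index++;
        }
        // The grant is finer than the request: only wildcards may remain.
        for (; index < parts.size(); index++) {
            if (!"*".equals(parts.get(index)[1])) {
                return false;
            }
        }
        return true;
    }

    private static boolean valueImplies(String granted, String requested) {
        return granted.equals("*") || granted.equalsIgnoreCase(requested);
    }
}
```

For example, a grant of server=server1->db=db1->table=student covers a request for the column name in that table, while a grant on the column alone does not cover a request for the whole table.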

Summary

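This article analyzed how Sentry provides fine-grained access control for HiveServer2 users and walked through the code path from the session hook setting the user information to the Policy backend reading the privileges the user already holds, giving a first understanding of how Sentry works. In essence, authorization checking compares the privileges a user already holds with the privileges required by the accessed objects. If all of them are satisfied, or the user's privilege is coarser-grained, the user is considered to have access to the resource. It can be understood as a comparison of privilege strings. Through a generic Policy Provider, Sentry hides the differences between the privilege objects of different components, so that a single common module performs privilege verification.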