Operator
Basic characteristics
Most of the operators we deal with in Caffe2 are subclasses of class Operator, which is in turn a subclass of the class OperatorBase covered earlier in this series. Below we walk through the main interfaces it adds and what they mean.
Unlike OperatorBase, Operator is a template class parameterized on the device context; accordingly, it holds a member named context_.
// Operator is the class that you usually want to derive, if your operator will
// run on different devices. You should then implement the RunOnDevice()
// function.
template <class Context>
class Operator : public OperatorBase {
 public:
  explicit Operator(const OperatorDef& operator_def, Workspace* ws)
      : OperatorBase(operator_def, ws), context_(operator_def.device_option()) {
    // In the constructor, we switch to the device so that the child class
    // constructors will run on that device.
    context_.SwitchToDevice(0);
  }
  ~Operator() noexcept override {}

  // ...

  const Context* getContext() const {
    return &context_;
  }

 protected:
  void RecordEvent(const char* err_msg = nullptr) final {
    if (event_) {
      context_.Record(event_.get(), err_msg);
    }
  }

  Context context_;
};
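As the header comment says, a concrete operator normally just derives from Operator&lt;Context&gt; and implements RunOnDevice(). A minimal sketch of what that looks like (the operator name ScaleByTwoOp and its float-only I/O are invented for illustration, not taken from the Caffe2 sources):

// Hypothetical example of a derived operator: a CPU op that multiplies its
// single float input by two.
class ScaleByTwoOp final : public Operator<CPUContext> {
 public:
  USE_OPERATOR_FUNCTIONS(CPUContext);  // pulls in Input()/Output()/context_ helpers

  ScaleByTwoOp(const OperatorDef& def, Workspace* ws)
      : Operator<CPUContext>(def, ws) {}

  bool RunOnDevice() override {
    const auto& X = Input(0);                            // input tensor on CPU
    auto* Y = Output(0, X.sizes(), at::dtype<float>());  // output with the same shape
    const float* x = X.data<float>();
    float* y = Y->mutable_data<float>();
    for (int64_t i = 0; i < X.numel(); ++i) {
      y[i] = 2.0f * x[i];
    }
    return true;  // returning false marks the op as failed
  }
};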
Ordinary input/output operator functionality, on the other hand, is simply delegated to the parent class OperatorBase, as shown below:
inline const Tensor& Input(
    int idx,
    DeviceType type = Context::GetDeviceType()) {
  return OperatorBase::template Input<Tensor>(idx, type);
}

inline Tensor* Output(int idx, at::IntList dims, at::TensorOptions options) {
  if (options.device_opt() == c10::nullopt) {
    return OperatorBase::OutputTensor(
        idx, dims, options.device(context_.device()));
  }
  return OperatorBase::OutputTensor(idx, dims, options);
}

inline Tensor* Output(int idx, DeviceType type = Context::GetDeviceType()) {
  return OperatorBase::template Output<Tensor>(idx, type);
}
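Note the defaulted device argument: inside a GPU op's RunOnDevice() these wrappers hand back tensors on the op's own device, but an auxiliary CPU-resident input can still be requested explicitly. A hypothetical usage fragment inside some Operator&lt;CUDAContext&gt;::RunOnDevice() (assuming the second input really does live on the CPU):

// Inside a hypothetical Operator<CUDAContext>::RunOnDevice():
const auto& X = Input(0);             // defaults to this op's device, i.e. a CUDA tensor
const auto& lengths = Input(1, CPU);  // explicitly fetch a CPU-resident tensor
auto* Y = Output(0, X.sizes(), at::dtype<float>().device(CUDA));  // options carry the device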
Event-related async handling
The handling of Events and the other pieces related to asynchronous operator execution is not very different from OperatorBase, but it ties event handling much more closely to the concrete device context. So if your async operator derives from Operator, you can use the event-handling functions it provides directly; if you derive from OperatorBase instead, you also have to account for the operator's actual device type and pass the device context along when handling events.
void WaitEvent(const Event& ev, int stream_id = -1) final {
  if (stream_id >= 0) {
    context_.SwitchToDevice(stream_id);
  }
  context_.WaitEvent(ev);
}

void WaitEvents(const std::vector<const Event*>& events, int stream_id = -1)
    final {
  if (stream_id >= 0) {
    context_.SwitchToDevice(stream_id);
  }
  for (const auto& ev : events) {
    context_.WaitEvent(*ev);
  }
}
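To make the binding concrete, here is a rough sketch of how an event-driven executor could chain two ops through these hooks. This is not the actual Caffe2 async net code; it only assumes the OperatorBase::event() accessor plus the WaitEvent/RunAsync methods shown in this post, and the stream ids are illustrative.

// producer/consumer are operators already created and owned by a net.
void ScheduleConsumerAfterProducer(OperatorBase* producer, OperatorBase* consumer) {
  producer->RunAsync(/*stream_id=*/0);  // records producer's event if it has an async part
  // Make the consumer's stream wait on the producer's event; on devices with
  // streams (e.g. CUDA) this wait is non-blocking, on CPU it simply blocks.
  consumer->WaitEvent(producer->event(), /*stream_id=*/1);
  consumer->RunAsync(/*stream_id=*/1);  // stream ordering now enforces the dependency
}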
Run and RunAsync
If any two functions are the core of an operator and the ones users touch most often, they are surely Run and RunAsync. Both of them wrap the RunOnDevice function defined by the concrete Operator subclass; Run executes the op synchronously, while RunAsync executes it asynchronously.
Below is the synchronous execution path.
// The run function of Operator switches to the device, and then carries out
// the actual computation with RunOnDevice(). You should implement RunOnDevice
// instead of Run().
// Note: Run does not update operator's event and can be used only with
// non-async executors that do not rely on events
bool Run(int stream_id = 0) final {
  try {
    StartAllObservers();

    context_.SwitchToDevice(stream_id);
    bool result = RunOnDevice();
    if (!result) {
      this->RecordLastFailedOpNetPosition();
    }
    context_.FinishDeviceComputation(); // throws on error

    StopAllObservers();

    return result;
  } catch (EnforceNotMet& err) {
    if (has_debug_def()) {
      err.AppendMessage(
          "Error from operator: \n" + ProtoDebugString(debug_def()));
      AddRelatedBlobInfo(&err);
    }
    this->RecordLastFailedOpNetPosition();
    StopAllObservers();
    throw;
  } catch (...) {
    this->RecordLastFailedOpNetPosition();
    StopAllObservers();
    throw;
  }
}
Below is RunAsync, the function that performs asynchronous execution. As you can see, most of the event-related functions we defined in Operator are used here.
bool RunAsync(int stream_id = 0) final {
  try {
    StartAllObservers();

    context_.SwitchToDevice(stream_id);
    auto result = RunOnDevice();
    if (result) {
      if (HasAsyncPart()) {
        RecordEvent();
      } else {
        // Manually set CPU operator's event status to finished,
        // unless this is an async CPU operator
        SetEventFinished();
      }
    } else {
      SetEventFinished(getErrorMsg().c_str());
      this->RecordLastFailedOpNetPosition();
    }

    StopAllObservers();

    return result;
  } catch (EnforceNotMet& err) {
    if (has_debug_def()) {
      err.AppendMessage(
          "Error from operator: \n" + ProtoDebugString(debug_def()));
      AddRelatedBlobInfo(&err);
    }
    SetEventFinishedWithException(err.what());
    this->RecordLastFailedOpNetPosition();
    StopAllObservers();
    throw;
  } catch (const std::exception& err) {
    SetEventFinishedWithException(err.what());
    this->RecordLastFailedOpNetPosition();
    StopAllObservers();
    throw;
  } catch (...) {
    SetEventFinishedWithException(getErrorMsg().c_str());
    this->RecordLastFailedOpNetPosition();
    StopAllObservers();
    throw;
  }
}
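As a rough usage sketch of the two entry points (op is an already-created operator; the real executors in Caffe2 are of course more involved, and the event().Finish()/Query() calls are assumed from the Event API rather than shown in this file):

// Synchronous path: Run() only returns after FinishDeviceComputation(),
// so the outputs are ready when it returns.
bool ok = op->Run(/*stream_id=*/0);

// Asynchronous path: RunAsync() may return before the device work is done;
// the recorded event is what tells us when the op actually finished.
bool scheduled = op->RunAsync(/*stream_id=*/0);
if (scheduled) {
  op->event().Finish();  // block until the async part completes
  ok = (op->event().Query() == EventStatus::EVENT_SUCCESS);
}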
RunOnDevice, the function that does the real work, is a pure virtual function here; it does nothing itself and is only the interface.
virtual bool RunOnDevice() = 0;
Operator async properties and async execution
There are two concepts to keep straight about asynchronous Operator execution. The first is HasAsyncPart, which is async execution in the usual sense: instead of waiting for the operation to actually finish, the call returns a handle immediately, and you check later whether the operation has really completed. The second is AsyncScheduling, which is about whether an op can be scheduled onto a pool or stream in an available state without waiting for its inputs to be ready (i.e., without waiting for its parent ops to finish); this design is clearly inspired by the CUDA programming model. Of course, it cannot truly ignore whether the inputs are ready; it merely hands the responsibility for synchronization from the framework over to CUDA.
// Events of operators that don't have async parts are automatically set
// to finished state by RunAsync.
// Defaulting to the value from context (true for CUDA, false for CPU).
// Override in case of async CPU operators
// Async CPU operators are expected to catch all exceptions in async parts
// and set Event to finished/failed state with Event::SetFinished or
// SetFinishedWithException call.
bool HasAsyncPart() const override {
  return context_.HasAsyncPartDefault();
}

// Returns whether operator's RunOnDevice schedules async on device part and
// can be run without waiting for parent operator's async part to be finished
// on the same device.
// Note: when true, RunOnDevice must not access the content of the input blobs
// as they might not be computed yet
// Note: when true, operator's device needs to support async scheduling:
//  - supports concept of streams: async ops scheduled on the same stream are
//    guaranteed to be executed in the same order they were scheduled
//  - provides non-blocking cross device/cross stream synchronization
//    primitives
//
// By default, assuming an op with an async part can be scheduled
// asynchronously if device supports async scheduling
bool SupportsAsyncScheduling() const override {
  return HasAsyncPart() && context_.SupportsAsyncScheduling();
}
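Tying this back to the comment above: an async CPU operator is one that overrides HasAsyncPart() to return true and then finishes its own event when the deferred work completes. A hand-wavy sketch follows; the detached std::thread and the empty work body are only for illustration (real async CPU ops in Caffe2 go through the executor's task queues), and the Event::SetFinished/SetFinishedWithException calls are the ones the comment refers to.

#include <thread>

// Hypothetical async CPU operator: RunOnDevice() only launches the work and
// returns immediately; the background thread later finishes the event.
class MyAsyncCPUOp final : public Operator<CPUContext> {
 public:
  using Operator<CPUContext>::Operator;

  bool HasAsyncPart() const override {
    return true;  // so RunAsync() will not SetEventFinished() on our behalf
  }

  bool RunOnDevice() override {
    std::thread([this]() {
      try {
        // ... the actual deferred computation would go here ...
        event().SetFinished();                       // report success via the event
      } catch (const std::exception& e) {
        event().SetFinishedWithException(e.what());  // report failure via the event
      }
    }).detach();
    return true;  // scheduling succeeded; completion is signalled by the event
  }
};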
Normally, after running an op synchronously we still need a barrier to guarantee that it has really finished on the device, which again borrows from the CUDA asynchronous programming model. The framework is practically a wrapper around the CUDA programming model! Heh, this is also what other CPU/ASIC vendors dislike about it: the so-called moat of the NVIDIA CUDA ecosystem.
void SyncDeviceBarrierForObservers() override {
  context_.FinishDeviceComputation();
}
Operator-related utilities
Below are some utilities related to registering operators for different device contexts. The approach is very similar to the factory pattern used for Layer/Net/Solver in the original Caffe, which is no surprise since they come from the same author.
The following are the utilities for registering CPU operators; the CUDA/HIP/IDEEP and other variants are quite similar.
// The operator registry. Since we are not expecting a great number of devices,
// we will simply have an if-then type command and allocate the actual
// generation to device-specific registerers.
// Note that although we have CUDA and CUDNN here, the registerers themselves do
// not depend on specific cuda or cudnn libraries. This means that we will be
// able to compile it even when there is no cuda available - we simply do not
// link any cuda or cudnn operators.
C10_DECLARE_REGISTRY(
    CPUOperatorRegistry,
    OperatorBase,
    const OperatorDef&,
    Workspace*);

#define REGISTER_CPU_OPERATOR_CREATOR(key, ...) \
  C10_REGISTER_CREATOR(CPUOperatorRegistry, key, __VA_ARGS__)

#define REGISTER_CPU_OPERATOR(name, ...)                           \
  C10_IMPORT void CAFFE2_PLEASE_ADD_OPERATOR_SCHEMA_FOR_##name();  \
  static void CAFFE2_UNUSED CAFFE_ANONYMOUS_VARIABLE_CPU##name() { \
    CAFFE2_PLEASE_ADD_OPERATOR_SCHEMA_FOR_##name();                \
  }                                                                \
  C10_REGISTER_CLASS(CPUOperatorRegistry, name, __VA_ARGS__)

#define REGISTER_CPU_OPERATOR_STR(str_name, ...) \
  C10_REGISTER_TYPED_CLASS(CPUOperatorRegistry, str_name, __VA_ARGS__)

#define REGISTER_CPU_OPERATOR_WITH_ENGINE(name, engine, ...) \
  C10_REGISTER_CLASS(CPUOperatorRegistry, name##_ENGINE_##engine, __VA_ARGS__)
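As a usage example, registering the hypothetical ScaleByTwoOp from earlier would look roughly like this in its .cc file; OPERATOR_SCHEMA is the companion macro that emits the CAFFE2_PLEASE_ADD_OPERATOR_SCHEMA_FOR_##name symbol the registration macro references:

// Hypothetical registration, reusing the invented ScaleByTwoOp from above.
OPERATOR_SCHEMA(ScaleByTwo)
    .NumInputs(1)
    .NumOutputs(1)
    .IdenticalTypeAndShape();

REGISTER_CPU_OPERATOR(ScaleByTwo, ScaleByTwoOp);
// A CUDA implementation would be registered analogously with
// REGISTER_CUDA_OPERATOR(ScaleByTwo, ...) against the CUDA registry.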