A templated class to allow one to wrap a CPU operator as a CUDA operator.
#include <operator_fallback_gpu.h>
Public Member Functions | |
USE_OPERATOR_FUNCTIONS (CUDAContext) | |
GPUFallbackOp (const OperatorDef &def, Workspace *ws) | |
bool | RunOnDevice () override |
Public Member Functions inherited from caffe2::Operator< CUDAContext > | |
Operator (const OperatorDef &operator_def, Workspace *ws) | |
const Tensor< CUDAContext > & | Input (int idx) |
Tensor< CUDAContext > * | Output (int idx) |
void | WaitEvent (const Event &ev, int stream_id=-1) final |
void | WaitEvents (const std::vector< const Event * > &events, int stream_id=-1) final |
bool | Run (int stream_id=0) final |
bool | RunAsync (int stream_id=0) final |
bool | IsStreamFree (int stream_id) const override |
bool | HasAsyncPart () const override |
bool | SupportsAsyncScheduling () const override |
const CUDAContext * | getContext () const |
Public Member Functions inherited from caffe2::OperatorBase | |
OperatorBase (const OperatorDef &operator_def, Workspace *ws) | |
bool | HasArgument (const string &name) const |
Checks if the operator has an argument of the given name. | |
template<typename T > | |
T | GetSingleArgument (const string &name, const T &default_value) const |
template<typename T > | |
bool | HasSingleArgumentOfType (const string &name) const |
template<typename T > | |
vector< T > | GetRepeatedArgument (const string &name, const vector< T > &default_value={}) const |
template<typename T > | |
const T & | Input (int idx) |
template<typename T > | |
T * | Output (int idx) |
template<typename T > | |
T * | Output (int idx, T *allocated) |
const Blob & | InputBlob (int idx) |
Blob * | OutputBlob (int idx) |
template<typename T > | |
bool | InputIsType (int idx) |
template<typename T > | |
bool | OutputIsType (int idx) |
int | InputSize () const |
int | OutputSize () const |
const vector< const Blob * > & | Inputs () const |
const vector< Blob * > & | Outputs () |
vector< TensorShape > | InputTensorShapes () |
void | Wait (const OperatorBase &other, int stream_id=-1) |
virtual void | Finish () |
virtual void | AddRelatedBlobInfo (EnforceNotMet *err) |
const OperatorDef & | debug_def () const |
void | set_debug_def (const std::shared_ptr< const OperatorDef > &operator_def) |
bool | has_debug_def () const |
void | RecordLastFailedOpNetPosition () |
int | net_position () const |
void | set_net_position (int idx) |
const DeviceOption & | device_option () const |
const Event & | event () const |
Event & | event () |
void | ResetEvent () |
void | DisableEvent () |
bool | IsEventDisabled () const |
const std::string & | type () const |
void | annotate_engine (const std::string &engine) |
const std::string & | engine () const |
Public Member Functions inherited from caffe2::Observable< OperatorBase > | |
const Observer * | AttachObserver (std::unique_ptr< Observer > observer) |
std::unique_ptr< Observer > | DetachObserver (const Observer *observer_ptr) |
Returns a unique_ptr to the removed observer. | |
virtual size_t | NumObservers () |
void | StartAllObservers () |
void | StopAllObservers () |
Protected Attributes | |
Workspace | local_ws_ |
vector< Blob * > | local_input_blobs_ |
vector< Blob * > | local_output_blobs_ |
std::unique_ptr< CPUOp > | base_op_ |
Protected Attributes inherited from caffe2::Operator< CUDAContext > | |
CUDAContext | context_ |
Protected Attributes inherited from caffe2::OperatorBase | |
std::unique_ptr< Event > | event_ |
Protected Attributes inherited from caffe2::Observable< OperatorBase > | |
std::vector< std::unique_ptr< Observer > > | observers_list_ |
Additional Inherited Members | |
Public Types inherited from caffe2::Observable< OperatorBase > | |
using | Observer = ObserverBase< OperatorBase > |
Static Public Attributes inherited from caffe2::OperatorBase | |
static constexpr int | kNoNetPositionSet = -1 |
Protected Member Functions inherited from caffe2::Operator< CUDAContext > | |
void | RecordEvent (const char *err_msg=nullptr) final |
std::string | getErrorMsg () |
Protected Member Functions inherited from caffe2::OperatorBase | |
DISABLE_COPY_AND_ASSIGN (OperatorBase) | |
GPUFallbackOp is intended for cases where an operator does not yet have a CUDA implementation. It handles the data copies for you: inputs are copied from the GPU to the CPU, the wrapped CPU operator runs, and the outputs are copied back to the GPU. These copies add considerable overhead, so use this operator mainly for quick prototyping.
All inputs and outputs of the wrapped CPU operator must be TensorCPU.
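The copy-in, run, copy-out flow described above can be sketched without Caffe2 or CUDA. This is a minimal illustration only, not the Caffe2 source: DeviceTensor, HostTensor, DoubleCpuOp and FallbackSketch are hypothetical names, and the "device" copies are plain vector copies standing in for real GPU transfers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for device and host tensors. Real Caffe2 code uses
// Tensor<CUDAContext> and TensorCPU; here both simply hold a vector.
struct DeviceTensor { std::vector<float> data; };
struct HostTensor   { std::vector<float> data; };

// A toy CPU-only operator (assumption): doubles each element of its input.
struct DoubleCpuOp {
  bool Run(const std::vector<HostTensor>& in, std::vector<HostTensor>& out) {
    out.assign(1, HostTensor{});
    for (float v : in[0].data) out[0].data.push_back(2.0f * v);
    return true;
  }
};

// Fallback wrapper sketch: copy inputs to the host, run the CPU op,
// then copy every output back to the device.
template <class CpuOp>
struct FallbackSketch {
  CpuOp base_op;
  std::size_t copies = 0;  // counts simulated host<->device transfers

  bool RunOnDevice(const std::vector<DeviceTensor>& inputs,
                   std::vector<DeviceTensor>& outputs) {
    std::vector<HostTensor> host_in, host_out;
    for (const DeviceTensor& t : inputs) {  // device -> host
      host_in.push_back(HostTensor{t.data});
      ++copies;
    }
    if (!base_op.Run(host_in, host_out)) return false;
    outputs.clear();
    for (const HostTensor& t : host_out) {  // host -> device
      outputs.push_back(DeviceTensor{t.data});
      ++copies;
    }
    return true;
  }
};
```

Every call pays one transfer per input and one per output, which is why the wrapper suits prototyping rather than performance-sensitive code.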
Example usage: if MyMagicOp is a CPU-based operator class registered with REGISTER_CPU_OPERATOR(MyMagic, MyMagicOp); you can create the corresponding GPU operator (at a performance cost) with REGISTER_CUDA_OPERATOR(MyMagic, GPUFallbackOp<MyMagicOp>);
Advanced usage: to prevent specific outputs from ever being copied, pass a SkipIndices type as the SkipOutputCopy template argument. For example, if MyMagic produces two outputs and the first one always lives on the CPU, register it with REGISTER_CUDA_OPERATOR(MyMagic, GPUFallbackOp<MyMagicOp, SkipIndices<0>>);
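To illustrate how a compile-time list of indices can select which outputs to skip, here is a small self-contained sketch. SkipIndicesSketch and CountCopiedOutputs are hypothetical names written for this example; they are assumed to mirror the idea behind SkipIndices and are not the Caffe2 implementation.

```cpp
#include <cassert>
#include <initializer_list>

// Simplified stand-in (assumption) for a compile-time set of output indices.
// An empty parameter pack yields an empty set, so nothing is skipped.
template <int... Values>
struct SkipIndicesSketch {
  static bool Contains(int i) {
    const std::initializer_list<int> skipped = {Values...};
    for (int v : skipped) {
      if (v == i) return true;
    }
    return false;
  }
};

// Returns how many of `num_outputs` outputs would be copied back to the
// device, given a skip set passed as a template argument.
template <class Skip>
int CountCopiedOutputs(int num_outputs) {
  int copied = 0;
  for (int i = 0; i < num_outputs; ++i) {
    if (!Skip::Contains(i)) ++copied;
  }
  return copied;
}
```

With two outputs and index 0 in the skip set, only the second output would be copied back. Passing the set as a type keeps the choice fixed at registration time, which matches how the template argument is used in the registration macro.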
Definition at line 40 of file operator_fallback_gpu.h.