Adding Custom Data Types in C++

Vineyard already supports a set of efficient built-in data types in the C++ SDK, e.g., Vector, HashMap, Tensor, DataFrame, Table and Graph (see cpp-api). However, there are still scenarios where users need to develop their own data structures and share the data efficiently through vineyard. Custom C++ data types can be added by following this step-by-step tutorial.

Note that this tutorial deliberately spells out code that could be auto-generated, in order to make the design internals clear and to give developers a whole picture of how the vineyard client works.

Vineyard Objects

Vineyard has a base class vineyard::Object, and a corresponding base class vineyard::ObjectBuilder for builders, as follows:

class Object {
  public:
    static std::unique_ptr<Object> Create() {
        ...
    }

    virtual void Construct(const ObjectMeta& meta);
};

and the builder:

class ObjectBuilder {
  public:
    virtual Status Build(Client& client) = 0;

    virtual std::shared_ptr<Object> _Seal(Client& client) = 0;
};

Here the object is the base class for user-defined data types, and the builder is responsible for placing the data into vineyard.


Adding Your Own Types

We take defining a custom Vector type as an example. Basically, a Vector contains a vineyard::Blob as its payload, along with metadata such as dtype and size.

The class for Vector usually looks like:

template <typename T>
class Vector {
private:
    size_t size;
    const T *data = nullptr;
public:
    Vector(): size(0), data(nullptr) {
    }

    Vector(size_t size, const T *data): size(size), data(data) {
    }

    size_t length() const {
        return size;
    }

    const T& operator[](size_t index) const {
        assert(index < size);
        return data[index];
    }
};

Defining the data structure

We first migrate the existing Vector<T> to vineyard’s Object,

 template <typename T>
-class Vector {
+class Vector: public vineyard::Registered<Vector<T>> {
   private:
     size_t size;
     const T *data = nullptr;
   public:
+    static std::unique_ptr<Object> Create() __attribute__((used)) {
+        return std::unique_ptr<Object>(new Vector<T>());
+    }
+
     Vector(): size(0), data(nullptr) {
     }

     Vector(size_t size, const T *data): size(size), data(data) {
     }

     ...
 };

Note the two changes above:

  • It inherits from vineyard::Registered<Vector<T>>.

    vineyard::Registered<T> is a helper that generates static initialization stubs to register the data type T with the type-resolving factory and to associate T with its typename. The typename is an auto-generated, readable name for C++ types, e.g., "Vector<int32>" for Vector<int32_t>.

  • It adds the zero-parameter static constructor Create().

    Create() is a static function that is registered with the resolving factory by the helper vineyard::Registered<T>. It is used to construct an instance of type T first when getting objects from vineyard.

    The vineyard client looks up the static constructor by the typename found in the metadata of vineyard objects stored in the daemon server.
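
The registration mechanism described above can be illustrated with a small self-contained sketch. It does not use vineyard's real classes: DemoObject, DemoFactory, DemoRegistered and DemoVector are hypothetical stand-ins, and the sketch only assumes that registration works roughly like a typename-to-creator table filled during static initialization.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for vineyard::Object (not the real class).
struct DemoObject {
  virtual ~DemoObject() = default;
  virtual std::string Name() const = 0;
};

// Hypothetical resolving factory: maps a typename to a zero-argument creator.
class DemoFactory {
 public:
  using Creator = std::function<std::unique_ptr<DemoObject>()>;

  static bool Register(const std::string& type_name, Creator creator) {
    Table()[type_name] = std::move(creator);
    return true;
  }

  // Returns nullptr when no creator is registered under type_name.
  static std::unique_ptr<DemoObject> Create(const std::string& type_name) {
    auto it = Table().find(type_name);
    if (it == Table().end()) {
      return nullptr;
    }
    return it->second();
  }

 private:
  // Function-local static: constructed on first use, so it is ready even
  // when Register() runs during static initialization.
  static std::map<std::string, Creator>& Table() {
    static std::map<std::string, Creator> table;
    return table;
  }
};

// CRTP helper: deriving from DemoRegistered<T> registers T::Create under
// T::TypeName() during static initialization.
template <typename T>
class DemoRegistered : public DemoObject {
 protected:
  // Referencing registered_ forces the compiler to instantiate it.
  DemoRegistered() { (void) registered_; }

 private:
  static bool registered_;
};

template <typename T>
bool DemoRegistered<T>::registered_ =
    DemoFactory::Register(T::TypeName(), &T::Create);

// A user-defined type that opts in to registration by inheritance.
class DemoVector : public DemoRegistered<DemoVector> {
 public:
  DemoVector() {}
  static std::string TypeName() { return "DemoVector"; }
  static std::unique_ptr<DemoObject> Create() {
    return std::unique_ptr<DemoObject>(new DemoVector());
  }
  std::string Name() const override { return TypeName(); }
};
```

With these definitions, DemoFactory::Create("DemoVector") returns a fresh DemoVector without the caller naming the class, while an unregistered name yields nullptr. This mirrors, in simplified form, how a client could construct an empty object from the typename alone before filling it from metadata.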

To obtain the object Vector<T> from vineyard’s metadata, we need to implement a Construct method as well. The Construct method takes a vineyard::ObjectMeta as input, and retrieves key-value metadata as well as members from it to fill its own data members. The memory in the member buffer (a vineyard::Blob) is shared using memory mapping, without the cost of copying.

 template <typename T>
 class Vector: public vineyard::Registered<Vector<T>> {
   public:
     ...

+    void Construct(const ObjectMeta& meta) override {
+      this->size = meta.GetKeyValue<size_t>("size");
+
+      auto buffer = std::dynamic_pointer_cast<Blob>(meta.GetMember("buffer"));
+      this->data = reinterpret_cast<const T *>(buffer->data());
+    }
+
     ...
 };
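
To make the Construct step concrete without a running vineyard daemon, here is a minimal sketch using hypothetical stand-ins: DemoBlob plays the role of vineyard::Blob, DemoMeta the role of vineyard::ObjectMeta, and MakeIntVectorMeta is an illustrative helper. None of these names belong to the vineyard API. The sketch assumes the blob is an ordinary heap buffer rather than memory-mapped shared memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for vineyard::Blob: an owned byte buffer.
struct DemoBlob {
  std::vector<char> bytes;
  const char* data() const { return bytes.data(); }
};

// Hypothetical stand-in for vineyard::ObjectMeta.
struct DemoMeta {
  std::map<std::string, std::size_t> key_values;
  std::map<std::string, std::shared_ptr<DemoBlob>> members;
};

template <typename T>
class MetaBackedVector {
 public:
  // Fill the data members from the metadata. Only a pointer into the blob
  // is stored; the elements themselves are not copied.
  void Construct(const DemoMeta& meta) {
    size_ = meta.key_values.at("size");
    buffer_ = meta.members.at("buffer");  // keeps the blob alive
    data_ = reinterpret_cast<const T*>(buffer_->data());
  }

  std::size_t length() const { return size_; }

  const T& operator[](std::size_t index) const {
    assert(index < size_);
    return data_[index];
  }

 private:
  std::size_t size_ = 0;
  std::shared_ptr<DemoBlob> buffer_;
  const T* data_ = nullptr;
};

// Demo helper: packs integers into metadata roughly the way a builder would.
inline DemoMeta MakeIntVectorMeta(const std::vector<int>& values) {
  auto blob = std::make_shared<DemoBlob>();
  blob->bytes.resize(values.size() * sizeof(int));
  if (!values.empty()) {
    std::memcpy(&blob->bytes[0], values.data(), blob->bytes.size());
  }
  DemoMeta meta;
  meta.key_values["size"] = values.size();
  meta.members["buffer"] = blob;
  return meta;
}
```

After calling Construct on the metadata produced by MakeIntVectorMeta({1, 2, 3}), the vector reports a length of 3 and reads its elements directly out of the blob. Holding a shared_ptr to the blob is a design choice of this sketch so that the raw pointer cannot outlive the buffer.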

Implementing the builder

Next, we move on to the builder part. The vineyard::ObjectBuilder consists of two parts:

  • Build(): this method is responsible for storing blobs of custom data structures into vineyard

  • _Seal(): this method is responsible for generating the corresponding metadata and putting the metadata into vineyard
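
The division of labor between the two methods can be sketched with an in-process stand-in for the daemon. DemoStore and DemoIntVectorBuilder below are hypothetical names invented for illustration, and the string-keyed metadata record is an assumption of the sketch, not vineyard's actual representation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-process stand-in for the vineyard daemon.
struct DemoStore {
  std::vector<std::vector<char>> blobs;                   // payloads by blob id
  std::vector<std::map<std::string, std::string>> metas;  // metadata by object id
};

class DemoIntVectorBuilder {
 public:
  explicit DemoIntVectorBuilder(std::size_t size) : values_(size) {}

  int& operator[](std::size_t index) { return values_.at(index); }

  // Phase 1: copy the payload into the store as a blob.
  void Build(DemoStore& store) {
    std::vector<char> blob(values_.size() * sizeof(int));
    if (!blob.empty()) {
      std::memcpy(blob.data(), values_.data(), blob.size());
    }
    store.blobs.push_back(std::move(blob));
    blob_id_ = store.blobs.size() - 1;
  }

  // Phase 2: build first, then record metadata that describes the object
  // and points at the blob. Returns the id of the new metadata record.
  std::size_t Seal(DemoStore& store) {
    Build(store);
    std::map<std::string, std::string> meta;
    meta["typename"] = "DemoVector<int>";
    meta["size"] = std::to_string(values_.size());
    meta["buffer"] = std::to_string(blob_id_);
    store.metas.push_back(std::move(meta));
    return store.metas.size() - 1;
  }

 private:
  std::vector<int> values_;
  std::size_t blob_id_ = 0;
};
```

Sealing a three-element builder against an empty DemoStore leaves one blob of 3 * sizeof(int) bytes and one metadata record whose "buffer" entry refers to that blob. The payload and its description are stored separately, which is the property the two-method split is meant to convey.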

For our Vector<T> type, we first define a general vector builder,

template <typename T>
class VectorBuilder {
  private:
    std::unique_ptr<BlobWriter> buffer_builder;
    std::size_t size;
    T *data;

  public:
    VectorBuilder(size_t size): size(size) {
      data = static_cast<T *>(malloc(sizeof(T) * size));
    }

    T& operator[](size_t index) {
      assert(index < size);
      return data[index];
    }
};

The builder allocates the memory required to hold the given number of elements, and provides a [] operator to fill the data in.
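
As written, the builder obtains its scratch memory with malloc and shows no matching free. One possible alternative, offered only as a sketch and not as the approach vineyard itself takes, is to let a std::unique_ptr<T[]> own the scratch memory so that it is released when the builder goes out of scope. ScratchBuffer is a hypothetical name used for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical helper: an owning, fixed-size scratch buffer for a builder.
template <typename T>
class ScratchBuffer {
 public:
  // new T[size]() value-initializes the elements (zero for arithmetic types).
  explicit ScratchBuffer(std::size_t size)
      : size_(size), data_(new T[size]()) {}

  T& operator[](std::size_t index) {
    assert(index < size_);
    return data_[index];
  }

  const T* data() const { return data_.get(); }
  std::size_t size() const { return size_; }

 private:
  std::size_t size_;
  std::unique_ptr<T[]> data_;  // freed automatically with delete[]
};
```

A builder could hold a ScratchBuffer<T> member in place of the raw T* and pass its data() pointer to memcpy in Build().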

Now we adapt the builder above into an ObjectBuilder in vineyard,

 template <typename T>
-class VectorBuilder {
+class VectorBuilder: public vineyard::ObjectBuilder {
   private:
     std::unique_ptr<BlobWriter> buffer_builder;
     std::size_t size;
     T *data;

   public:
     VectorBuilder(size_t size): size(size) {
       data = static_cast<T *>(malloc(sizeof(T) * size));
     }

+    Status Build(Client& client) override {
+      VINEYARD_CHECK_OK(client.CreateBlob(size * sizeof(T), buffer_builder));
+      memcpy(buffer_builder->data(), data, size * sizeof(T));
+      return Status::OK();
+    }
+
+    std::shared_ptr<Object> _Seal(Client& client) override {
+      VINEYARD_CHECK_OK(this->Build(client));
+
+      auto vec = std::make_shared<Vector<T>>();
+      auto buffer = std::dynamic_pointer_cast<vineyard::Blob>(
+        this->buffer_builder->Seal(client));
+      vec->size = size;
+      vec->data = reinterpret_cast<const T *>(buffer->data());
+
+      vec->meta_.SetTypeName(vineyard::type_name<Vector<T>>());
+      vec->meta_.SetNBytes(size * sizeof(T));
+      vec->meta_.AddKeyValue("size", size);
+      vec->meta_.AddMember("buffer", buffer);
+      VINEYARD_CHECK_OK(client.CreateMetaData(vec->meta_, vec->id_));
+
+      return vec;
+    }
+
     T& operator[](size_t index) {
       assert(index < size);
       return data[index];
     }
 };

Note that the builder needs direct access to the private data members of Vector<T>, so we need to declare the builder as a friend class of our vector type,

 template <typename T>
 class Vector: public vineyard::Registered<Vector<T>> {

     const T& operator[](size_t index) const {
       assert(index < size);
       return data[index];
     }
+
+  friend class VectorBuilder<T>;
 };

As you can see in the above example, there is a lot of boilerplate in the builder and constructor. It could be auto-generated from the layout of class Vector<T>, based on static analysis of the user’s source code.

Now it should work!

Finally, we are able to build our custom data type into vineyard and retrieve it back using the vineyard client,

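int main(int argc, char** argv) {
    // The IPC socket path is expected as the first command-line argument.
    if (argc < 2) {
        std::cerr << "usage: " << argv[0] << " <ipc_socket>" << std::endl;
        return 1;
    }
    std::string ipc_socket = std::string(argv[1]);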

    Client client;
    VINEYARD_CHECK_OK(client.Connect(ipc_socket));
    LOG(INFO) << "Connected to IPCServer: " << ipc_socket;

    auto builder = VectorBuilder<int>(3);
    builder[0] = 1;
    builder[1] = 2;
    builder[2] = 3;
    auto result = builder.Seal(client);

    auto vec = std::dynamic_pointer_cast<Vector<int>>(client.GetObject(result->id()));
    for (size_t index = 0; index < vec->length(); ++index) {
        std::cout << "element at " << index << " is: " << (*vec)[index] << std::endl;
    }
}

Builders and Resolvers in other Languages

Vineyard keeps the same design principle for SDKs in other languages, e.g., Java and Python. For a Python example of vineyard objects and their builders, see divein-builder-resolver.

As described in the example above, there is a lot of boilerplate code when defining the constructor and builder. To make integration with vineyard easier, a code generator is on the way that generates SDKs in different languages from a C++-like DSL. Stay tuned!

For a preview of how the code generator works, please refer to array.vineyard-mod and arrow.vineyard-mod.