Getting Started
Installing vineyard
Vineyard is distributed as a Python package and can be installed with pip:
$ pip3 install vineyard
Launching vineyard server
$ python3 -m vineyard
This launches a vineyard daemon server (vineyardd) with default settings. By default, vineyardd listens for incoming IPC connections on /var/run/vineyard.sock.
To stop the running vineyardd instance, press Ctrl-C in the terminal.
Tip
If you encounter an error like "cannot launch vineyardd on '/var/run/vineyard.sock': Permission denied", you do not have permission to create a UNIX-domain socket at /var/run/vineyard.sock. You can either:
Run vineyard as root, using sudo:
$ sudo -E python3 -m vineyard
Or, change the socket path to a writable location with the --socket command line option:
$ python3 -m vineyard --socket /tmp/vineyard.sock
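To decide between the two options ahead of time, it can help to check whether the default socket directory is writable. The following is a minimal sketch using only the Python standard library; the helper name pick_socket_path and the temp-directory fallback are assumptions made for illustration and are not part of vineyard.

```python
import os
import tempfile

DEFAULT_SOCKET = "/var/run/vineyard.sock"  # default path named in this guide


def pick_socket_path(preferred=DEFAULT_SOCKET):
    """Return `preferred` if its directory is writable, else a temp-dir path.

    Hypothetical helper for illustration; not part of vineyard.
    """
    directory = os.path.dirname(preferred) or "."
    # Creating a socket file needs write and search permission on the directory.
    if os.path.isdir(directory) and os.access(directory, os.W_OK | os.X_OK):
        return preferred
    # The fallback location is an assumption; any writable path works.
    return os.path.join(tempfile.gettempdir(), os.path.basename(preferred))


if __name__ == "__main__":
    # Print the launch command with a socket path that should be creatable.
    print(f"python3 -m vineyard --socket {pick_socket_path()}")
```

The printed command uses only the --socket option shown above. Clients must then connect to the same path.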
Connecting to vineyard
Once the server is running, call vineyard.connect with the socket path to create a vineyard client from Python:
>>> import vineyard
>>> client = vineyard.connect('/var/run/vineyard.sock')
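If the server is not running, or was started with a different --socket path, the socket file will not exist at the expected location. The sketch below checks for the socket file before calling vineyard.connect, so that a missing server produces a clear message. The helpers is_unix_socket and connect_if_running are hypothetical names introduced here; only vineyard.connect comes from this guide.

```python
import os
import stat


def is_unix_socket(path):
    """Return True if `path` exists and is a UNIX-domain socket."""
    try:
        return stat.S_ISSOCK(os.stat(path).st_mode)
    except OSError:
        return False


def connect_if_running(socket_path="/var/run/vineyard.sock"):
    """Connect to vineyard only when the socket file is present.

    Hypothetical wrapper for illustration; not part of vineyard.
    """
    if not is_unix_socket(socket_path):
        raise FileNotFoundError(
            f"no vineyard socket at {socket_path}; is vineyardd running?"
        )
    # Imported lazily so the check above works even without the package.
    import vineyard

    return vineyard.connect(socket_path)
```

A present socket file does not guarantee a live server (a stale file can remain after a crash), so the connect call itself can still fail.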
Storing and Retrieving Python Objects
Vineyard is designed as an in-memory object store and offers two high-level APIs, put and get, for creating and accessing shared objects, enabling interoperability with the Python ecosystem. put returns a vineyard.ObjectID upon success, which can then be passed to get to retrieve the shared object from vineyard.
In the following example, we use client.put() to build a vineyard object from the numpy ndarray arr, which returns the object_id, a unique identifier representing the object in vineyard. Given the object_id, we can obtain a shared-memory object from vineyard with the client.get() method.
>>> import numpy as np
>>>
>>> arr = np.random.rand(2, 4)
>>> object_id = client.put(arr)
>>> object_id
o0015c78883eddf1c
>>>
>>> shared_array = client.get(object_id)
>>> shared_array
ndarray([[0.39736989, 0.38047846, 0.01948815, 0.38332264],
         [0.61671189, 0.48903213, 0.03875045, 0.5873005 ]])
Note
shared_array does not allocate extra memory in the Python process; instead, it shares memory with the vineyard server via mmap in a zero-copy manner.
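The general idea of a zero-copy view over mapped memory can be illustrated with the standard library alone. The sketch below is an analogy and is not vineyard's implementation: an anonymous mmap region stands in for the server's shared memory, and a memoryview reinterprets the same bytes as doubles without copying them.

```python
import mmap
import struct

# Anonymous mapping standing in for shared memory (illustration only).
buf = mmap.mmap(-1, 4 * 8)
buf[:] = struct.pack("4d", 1.0, 2.0, 3.0, 4.0)

# memoryview.cast reinterprets the same bytes as doubles without copying.
base = memoryview(buf)
view = base.cast("d")

# A write through the mapping is visible through the view: same memory.
buf[0:8] = struct.pack("d", 42.0)
first = view[0]
print(first)  # prints 42.0

# Release the views before closing the mapping.
view.release()
base.release()
buf.close()
```

Because the view and the mapping refer to the same bytes, no second buffer is allocated. This is the property the note above describes for shared_array.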
Shareable objects can be complex and nested. Like a numpy ndarray, the pandas DataFrame df can be stored in vineyard and retrieved with the .put() and .get() methods as follows:
>>> import pandas as pd
>>>
>>> df = pd.DataFrame({'u': [0, 0, 1, 2, 2, 3],
...                    'v': [1, 2, 3, 3, 4, 4],
...                    'weight': [1.5, 3.2, 4.7, 0.3, 0.8, 2.5]})
>>> object_id = client.put(df)
>>>
>>> shared_dataframe = client.get(object_id)
>>> shared_dataframe
   u  v  weight
0  0  1     1.5
1  0  2     3.2
2  1  3     4.7
3  2  3     0.3
4  2  4     0.8
5  3  4     2.5
Under the hood, vineyard implements a builder/resolver mechanism to represent arbitrary data structures as vineyard objects and resolve them back to native values in the corresponding programming languages and computing systems. See also I/O Drivers for more information.
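To make the builder/resolver idea concrete, here is a toy sketch of the pattern in plain Python. It is not vineyard's API: the registries, the register, put, and get functions, the dict-based store, and the "toy::complex" type name are all hypothetical and exist only to show how a builder turns a native value into a stored representation and how a resolver turns it back.

```python
import json

# Hypothetical registries for illustration; not vineyard's actual API.
BUILDERS = {}   # Python type -> (typename, function producing payload bytes)
RESOLVERS = {}  # typename -> function rebuilding the native value


def register(py_type, typename, build, resolve):
    """Associate a Python type with a builder and a resolver."""
    BUILDERS[py_type] = (typename, build)
    RESOLVERS[typename] = resolve


def put(store, value):
    """Build a stored object from `value` and return its toy identifier."""
    typename, build = BUILDERS[type(value)]
    object_id = f"o{len(store):016x}"
    store[object_id] = {"typename": typename, "payload": build(value)}
    return object_id


def get(store, object_id):
    """Resolve a stored object back to a native Python value."""
    blob = store[object_id]
    return RESOLVERS[blob["typename"]](blob["payload"])


# Register a builder/resolver pair for Python's complex type.
register(
    complex,
    "toy::complex",
    build=lambda z: json.dumps([z.real, z.imag]).encode(),
    resolve=lambda b: complex(*json.loads(b)),
)

store = {}  # a plain dict stands in for the object store
oid = put(store, 3 + 4j)
restored = get(store, oid)
```

The stored type name is what lets get pick the right resolver, which is why the same stored object could in principle be resolved by a different language or system that registers its own resolver for that type name.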
Next steps
Beyond the core functionality of sharing objects between tasks, vineyard also provides:
Distributed objects and stream abstraction over immutable chunks;
An IDL (Code Generation for Boilerplate) that helps integrate vineyard with other systems at minimal cost;
A mechanism of pluggable drivers for various tasks that serve as the glue between the core compute engine and the external world, e.g., data sources, data sinks;
Integration with Kubernetes for sharing between tasks in workflows deployed on cloud-native infrastructures.
Overview of vineyard.
Learn more about vineyard’s key concepts from the following user guides:
Explore the design of the object model in vineyard.
Discover how vineyard integrates with other computing systems.
Understand the design and implementation of pluggable routines for I/O, repartition, migration, and more.
Vineyard is a natural fit for cloud-native computing, where it can be deployed and managed by the vineyard operator, providing data-aware scheduling for data analytical workflows to achieve efficient data sharing on Kubernetes. More details about vineyard on Kubernetes can be found here:
Deploy vineyard on Kubernetes and accelerate your big-data workflows.