I/O DriversΒΆ

As we have shown in the getting-started, the open function in vineyard can open a local file as a stream for consuming, and we notice that the path of the local file is headed with the scheme file://.

Actually, vineyard supports several different types of data source, e.g., kafka:// for kafka topics. The functional methods to open different data sources as vineyard streams are called drivers in vineyard. They are registered to open for specific schemes, so that when open is invoked, it will dispatch the corresponding driver to handle the specific data source according to the scheme of the path.

The following sample code demonstrates the dispatching logic in open, and the registration examples.

>>> @registerize
>>> def open(path, *args, **kwargs):
>>>     scheme = urlparse(path).scheme

>>>     for reader in open._factory[scheme][::-1]:
>>>         r = reader(path, *args, **kwargs)
>>>         if r is not None:
>>>             return r
>>>     raise RuntimeError('Unable to find a proper IO driver for %s' % path)
>>>
>>> # different driver functions are registered as follows
>>> open.register('file', local_driver)

Most importantly, the registration design allows users to register their own drivers to registerized vineyard methods using .register, which prevents major revisions on the processing code to fulfill customized computation requirements.