[Subversion] / Trellis / Collections.txt  

View of /Trellis/Collections.txt

Parent Directory | Revision Log
Revision: 2545 - (download)
Fri May 23 16:48:40 2008 UTC (15 years, 10 months ago) by pje
File size: 12422 byte(s)
Update docs for release
=========================================
Event-Driven Collections with the Trellis
=========================================

The ``peak.events.collections`` module includes some additional data structures
for use with the Trellis::

    >>> from peak.events import trellis, collections


.. contents:: **Table of Contents**



SortedSet
---------

``trellis.List`` objects have some inherent inefficiencies due to the wide
variety of operations supported by Python lists.  While ``trellis.Set``
and ``trellis.Dict`` objects update themselves in place by applying change
logs, ``trellis.List`` has to use a copy-on-write strategy to manage updates,
because there isn't any simple way to reduce operations like ``sort()``,
``reverse()``, ``remove()``, etc. to a meaningful change log.  (That's why
it only provides a simple ``changed`` flag.)

So if you need to use large lists in an application, you may be better off
using a different data structure design.  That way, if you only need a subset
of the list interface, you can implement a changelog-based structure.  For
example, the Trellis package includes a ``SortedSet`` type that maintains an
index of items sorted by keys, with a cell that lists changed regions.

A ``collections.SortedSet`` is a specialized component that lets you wrap a
``trellis.Set`` (or anything similar to one, that offers iteration plus
``added`` and ``removed`` cells) with a sort key, to get a list::

    >>> myItems = trellis.Set([3,2,1])
    >>> myIndex = collections.SortedSet(data=myItems)

    >>> class Viewer(trellis.Component):
    ...     model = trellis.attr(None)
    ...
    ...     trellis.perform()
    ...     def view_it(self):
    ...         if self.model is not None:
    ...             print self.model

    >>> view = Viewer(model=myIndex)
    [1, 2, 3]

You can change the ``sort_key`` attribute to re-sort the items::

    >>> myIndex.sort_key = lambda x: -x
    [3, 2, 1]


``SortedSet`` objects also have a discrete rule attribute called ``changes``.
It's normally empty, but during the recalculation triggered by a change to the
underlying set, it will temporarily contain one or more ``(start, end, size)``
tuples, representing the effective changes to the old list.  But as with all
discrete or receiver attributes, you'll never see a value in it from non-rule
code::

    >>> myIndex.changes
    []

Only in rule code will you ever see it containing values, a moment before it
becomes empty again::

    >>> view.model = None   # quiet, please

    >>> class Watcher(trellis.Component):
    ...     trellis.perform()
    ...     def dump(self):
    ...         print myIndex.changes

    >>> watcher=Watcher()
    []

    >>> myItems.remove(2)
    [(1, 2, 0)]
    []
    >>> myIndex
    [3, 1]

The ``changed`` values describe the changes made to the index contents.  In
this case, ``(1, 2, 0)`` means that the list slice ``1:2`` was reduced to
length zero.  Compare with the effect of re-inserting 2::

    >>> myItems.add(2)
    [(1, 1, 1)]
    []
    >>> myIndex
    [3, 2, 1]

This value means the slice ``1:1`` was increased to length 1.  In a sense,
you can view the provided changes as being "set slice" operators.  Note, too,
that it's possible to have multiple slices if multiple items are added or
deleted::

    >>> def add_some():
    ...     myItems.add(0)
    ...     myItems.add(4)

    >>> trellis.atomically(add_some)
    [(3, 3, 1), (0, 0, 1)]
    []
    >>> myIndex
    [4, 3, 2, 1, 0]

    >>> add_some()

As you can see, 1 item was inserted at position 3 in the old list, followed
by 1 item being inserted at position 0.  In other words, if you loop over the
``changes`` attribute and apply the slice operations in that order, you can
successfully update another sequence to match the changed sequence.

Finally, note that adjacent operations may be merged into single slices::

    >>> def del_some():
    ...     myItems.remove(2)
    ...     myItems.remove(3)

    >>> trellis.atomically(del_some)
    [(1, 3, 0)]
    []
    >>> myIndex
    [4, 1, 0]

And that insertion+deletion at the same position can lead to change slices that
don't result in a net change to the number of rows::

    >>> def add_and_del():
    ...     myItems.remove(1)
    ...     myItems.add(2)

    >>> trellis.atomically(add_and_del)
    [(1, 2, 1)]
    []
    >>> myIndex
    [4, 2, 0]

    >>> def add_and_del():
    ...     myItems.remove(2)
    ...     myItems.add(1)

    >>> trellis.atomically(add_and_del)
    [(1, 2, 1)]
    []
    >>> myIndex
    [4, 1, 0]

And changing the sort key always results in a change slice for the entire
index::

    >>> myIndex.sort_key = lambda x:x
    [(0, 3, 3)]
    []
    >>> myIndex
    [0, 1, 4]

As does setting or clearing the ``reverse`` flag (which reverses the effect of
the sort key)::

    >>> myIndex.reverse = True
    [(0, 3, 3)]
    []
    >>> myIndex
    [4, 1, 0]

    >>> myItems.add(7)
    [(0, 0, 1)]
    []
    >>> myIndex
    [7, 4, 1, 0]

    >>> myItems.add(2)
    [(2, 2, 1)]
    []
    >>> myIndex
    [7, 4, 2, 1, 0]

    >>> myItems.remove(7)
    [(0, 1, 0)]
    []

    >>> myItems.remove(2)
    [(1, 2, 0)]
    []

    >>> myIndex
    [4, 1, 0]

    >>> myIndex.reverse = False
    [(0, 3, 3)]
    []

    >>> myIndex
    [0, 1, 4]


TODO: test changes to ``.data``


SubSet
------

The ``SubSet`` type lets you create a ``Set`` that is constrained to be a
subset of another set.  If items are removed from the ``base`` set, the same
items are removed from the ``SubSet``::

    >>> s1 = trellis.Set([1,2,3])
    >>> ss = collections.SubSet([2], base = s1)
    >>> ss
    SubSet([2])

    >>> s1.remove(2)
    >>> ss
    SubSet([])

Adding an item to the ``SubSet`` when the item is not in the ``base`` set has
no effect::

    >>> ss.add(4)
    >>> ss
    SubSet([])

But you *can* add items that are currently in the base set::

    >>> ss.add(3)
    >>> ss
    SubSet([3])

    >>> s1.remove(3)
    >>> ss
    SubSet([])


TODO: changes to ``base`` do not affect the current subset membership


Observing
---------

The ``Observing`` type is designed to assist in sending change notifications
to GUI components that need to be told when a portion of the display is out
of date.  It works by taking a set of keys and a lookup function, and providing
a ``changes`` attribute.  Usually the lookup function will be an accessor on
some trellis-ized data structure::

    >>> data = trellis.List([1,2,3,4,5])
    >>> o = collections.Observing(lookup_func = data.__getitem__)

    >>> def show():
    ...     print "Changes:", o.changes
    >>> c = trellis.Performer(show)
    Changes: {}

The ``changes`` attribute is a dictionary (a regular dictionary, not a
``Dict``) that maps the observing instance's ``keys`` to ``(new, old)`` value
tuples.  Whenever a change occurs to either the ``keys`` or their corresponding
values, it becomes non-empty, and then reverts to being empty::

    >>> o.keys.update([1,2,4])
    Changes: {1: (2, 2), 2: (3, 3), 4: (5, 5)}
    Changes: {}

In the above example, the "new" and "old" values are the same, because the keys
shown were all just added to the ``keys`` set.  However, if a value changes for
a key that's already present in ``keys``, the "new" and "old" values will be
different::

    >>> data[1] = -2
    Changes: {1: (-2, 2)}
    Changes: {}

    >>> o.keys.remove(2)    # removing key doesn't induce "changes"

    >>> o.keys.add(3)       # but adding does
    Changes: {3: (4, 4)}
    Changes: {}

    >>> data.reverse()      # and so does any change on the underlying data
    Changes: {1: (4, -2), 3: (-2, 4), 4: (1, 5)}
    Changes: {}

The basic idea for using an ``Observing`` object is to update the ``keys`` to
reflect the currently-visible row or column numbers (or other keys) of a data
structure being displayed in a GUI.  Then, a ``@perform`` rule that reads the
``changes`` can notify the GUI of any changes that happen to the underlying
data structure.

You can, of course, do this without an ``Observing`` instance, but
recalculations would be prohibitively expensive if the underlying data set is
large, compared to the portion being displayed.  (E.g., when scrolling through
an entire database.)

So GUI applications that display grids or tables will typically use a
``SortedSet`` to find out about rows or columns being inserted, deleted, or
replaced, and an ``Observing`` instance to find out about changes *within* the
rows or columns.

TODO: ``changes`` doesn't update if you initialize ``keys`` with a populated
set.


Hub (NEW in 0.7a2)
------------------

A ``collections.Hub`` is used for loosely-coupled many-to-many communications
with flexible pattern matching -- aka "publish/subscribe" or "pub/sub"
messaging::

    >>> hub = collections.Hub()

You can send messages into a hub by calling its ``put()`` method::

    >>> hub.put(1, 2, 3)

    
However, this does nothing unless there are rules using the hub's ``get()``
method to receive these messages::

    >>> def watch_3_3():
    ...     for message in hub.get(None, None, 3):
    ...         print message
    
    >>> watch_3_3 = trellis.Performer(watch_3_3)

    >>> hub.put(1, 2, 3)
    (1, 2, 3)

The ``put()`` and ``get()`` methods both accept an arbitrary number of
positional arguments, but ``get()`` will only match ``put()`` calls with
the same number of arguments::

    >>> hub.put('x', 'y')

    >>> hub.put(1, 2, 3, 4)

And then, only if the non-``None`` arguments to ``get()`` match the
corresponding arguments given to ``put``::

    >>> hub.put(1, 2, 4)

    >>> hub.put(5, 4, 3)
    (5, 4, 3)

You can of course have multiple rules monitoring the same hub::

    >>> def watch_2_4():
    ...     for message in hub.get(2, 4, None):
    ...         print "24:", message
    >>> watch_2_4 = trellis.Performer(watch_2_4)

    >>> hub.put(2,4,3)
    24: (2, 4, 3)
    (2, 4, 3)

    >>> hub.put(2, 4, 4)
    24: (2, 4, 4)

And you can send more than one value in a single recalculation or atomic
action, with the relative order of messages being preserved for each observer::

    >>> def send_many():
    ...     hub.put(1, 2, 3)
    ...     hub.put(2, 4, 4)
    ...     hub.put(2, 4, 3)

    >>> trellis.atomically(send_many)
    24: (2, 4, 4)
    24: (2, 4, 3)
    (1, 2, 3)
    (2, 4, 3)

Note, however, that all arguments to ``put()`` and ``get()`` must be hashable::

    >>> hub.put(1, [])
    Traceback (most recent call last):
      ...
    TypeError: list objects are unhashable

    >>> hub.get(1, [])
    Traceback (most recent call last):
      ...
    TypeError: list objects are unhashable

This is because hubs use a dictionary-based indexing system, that avoids the
need to test every message against every observer's match pattern.  Each
active ``get()`` pattern is saved under an index, keyed by its rightmost
non-``None`` value.

Each value in a message is then looked up in this index, and then tested
against that (hopefully small) subset of active patterns.  For example, if we
look at the contents of our sample hub's index, we can see that the
``(None, None, 3)`` match pattern is indexed under "position 2, value 3", and
the ``(2, 4, None)`` pattern is indexed under "position 1, value 4"::

    >>> hub._index
    {(2, 3): {(None, None, 3): 1}, (1, 4): {(2, 4, None): 1}}

This means that ``(2, 4, None)`` will only be checked for messages with a 4
in the second position, and ``(None, None, 3)`` will only be checked for
messages with a 3 in the third position (which of course it will always match).

So, for best performance in high-volume applications, make sure you design your
messages to place "more distinct" fields further to the right.  For example, if
you have a small number of distinct message types, you should probably make the
message type the first field, so that if a ``get()`` matches on both the
message type and some more-distinctive field, it will be indexed only on the
more-distinctive field, avoiding it being matched against every message of the
desired type.  (Unless of course, the ``get()`` is *supposed* to return all
messages of the desired type!)

In contrast, if you placed the message type as the last field, then any
``get()`` targeting a particular message type would incur a match-time penalty
for *every* message of that type.  Thus, you should place fields with fewer
possible values more to the left, and fields with a larger number of possible
values more to the right.


cvs-admin@eby-sarna.com

Powered by ViewCVS 1.0-dev

ViewCVS and CVS Help