SaltyCrane: datastructures
https://www.saltycrane.com/blog/
2014-10-09T22:45:10-07:00
An example using Python's groupby and defaultdict to do the same task
2014-10-09T22:45:10-07:00https://www.saltycrane.com/blog/2014/10/example-using-groupby-and-defaultdict-do-same-task/<p>Here is some data that I want to group by model:</p>
<pre class="python">SOME_DATA = [
{'model': u'Yaris', 'some_value': 11202, 'trim_name': u'3-Door L Manual'},
{'model': u'Yaris', 'some_value': 19269, 'trim_name': u'3-Door LE Automatic'},
{'model': u'Corolla', 'some_value': 27119, 'trim_name': u'L Automatic'},
{'model': u'Corolla', 'some_value': 32262, 'trim_name': u'LE'},
{'model': u'Corolla', 'some_value': 37976, 'trim_name': u'S Premium'},
{'model': u'Camry', 'some_value': 39730, 'trim_name': u'LE 4-Cyl'},
{'model': u'Camry', 'some_value': 45761, 'trim_name': u'XSE 4-Cyl'},
{'model': u'Yaris', 'some_value': 48412, 'trim_name': u'3-Door L Automatic'},
{'model': u'Camry', 'some_value': 55423, 'trim_name': u'XLE 4-Cyl'},
{'model': u'Corolla', 'some_value': 57055, 'trim_name': u'ECO Premium'},
{'model': u'Corolla', 'some_value': 61296, 'trim_name': u'ECO Plus'},
{'model': u'Camry', 'some_value': 63660, 'trim_name': u'XSE V6'},
{'model': u'Yaris', 'some_value': 65570, 'trim_name': u'5-Door LE Automatic'},
{'model': u'Camry', 'some_value': 67461, 'trim_name': u'XLE V6'},
{'model': u'Corolla', 'some_value': 73602, 'trim_name': u'S'},
{'model': u'Yaris', 'some_value': 74158, 'trim_name': u'5-Door SE Manual'},
{'model': u'Corolla', 'some_value': 74249, 'trim_name': u'LE Plus'},
{'model': u'Corolla', 'some_value': 78386, 'trim_name': u'ECO'},
{'model': u'Camry', 'some_value': 82747, 'trim_name': u'SE 4-Cyl'},
{'model': u'Corolla', 'some_value': 83162, 'trim_name': u'LE Premium'},
{'model': u'Corolla', 'some_value': 84863, 'trim_name': u'S Plus Manual'},
{'model': u'Yaris', 'some_value': 90313, 'trim_name': u'5-Door L Automatic'},
{'model': u'Corolla', 'some_value': 90452, 'trim_name': u'L Manual'},
{'model': u'Yaris', 'some_value': 93152, 'trim_name': u'5-Door SE Automatic'},
{'model': u'Corolla', 'some_value': 94973, 'trim_name': u'S Plus CVT'},
]</pre>
<p>This can be done using <a href="https://docs.python.org/2/library/collections.html#collections.defaultdict"><code>defaultdict</code></a> from the collections module.</p>
<pre class="python">import collections
from pprint import pprint

grouped = collections.defaultdict(list)
for item in SOME_DATA:
    grouped[item['model']].append(item)

for model, group in grouped.items():
    print
    print model
    pprint(group, width=150)</pre>
<p>Here are the results:</p>
<pre>Yaris
[{'model': u'Yaris', 'some_value': 27065, 'trim_name': u'5-Door L Automatic'},
{'model': u'Yaris', 'some_value': 32757, 'trim_name': u'5-Door SE Automatic'},
{'model': u'Yaris', 'some_value': 57344, 'trim_name': u'3-Door L Manual'},
{'model': u'Yaris', 'some_value': 64002, 'trim_name': u'5-Door SE Manual'},
{'model': u'Yaris', 'some_value': 77974, 'trim_name': u'3-Door L Automatic'},
{'model': u'Yaris', 'some_value': 92658, 'trim_name': u'3-Door LE Automatic'},
{'model': u'Yaris', 'some_value': 98769, 'trim_name': u'5-Door LE Automatic'}]
Camry
[{'model': u'Camry', 'some_value': 30247, 'trim_name': u'XSE 4-Cyl'},
{'model': u'Camry', 'some_value': 33809, 'trim_name': u'XSE V6'},
{'model': u'Camry', 'some_value': 65637, 'trim_name': u'LE 4-Cyl'},
{'model': u'Camry', 'some_value': 67329, 'trim_name': u'SE 4-Cyl'},
{'model': u'Camry', 'some_value': 76269, 'trim_name': u'XLE 4-Cyl'},
{'model': u'Camry', 'some_value': 87438, 'trim_name': u'XLE V6'}]
Corolla
[{'model': u'Corolla', 'some_value': 11239, 'trim_name': u'S'},
{'model': u'Corolla', 'some_value': 27356, 'trim_name': u'S Plus Manual'},
{'model': u'Corolla', 'some_value': 44792, 'trim_name': u'L Manual'},
{'model': u'Corolla', 'some_value': 56252, 'trim_name': u'ECO Premium'},
{'model': u'Corolla', 'some_value': 78570, 'trim_name': u'S Plus CVT'},
{'model': u'Corolla', 'some_value': 78964, 'trim_name': u'LE Premium'},
{'model': u'Corolla', 'some_value': 82116, 'trim_name': u'ECO'},
{'model': u'Corolla', 'some_value': 85467, 'trim_name': u'S Premium'},
{'model': u'Corolla', 'some_value': 87099, 'trim_name': u'L Automatic'},
{'model': u'Corolla', 'some_value': 91974, 'trim_name': u'LE Plus'},
{'model': u'Corolla', 'some_value': 94862, 'trim_name': u'LE'},
{'model': u'Corolla', 'some_value': 97625, 'trim_name': u'ECO Plus'}]</pre>
<p>This can also be done using <a href="https://docs.python.org/2/library/itertools.html#itertools.groupby"><code>itertools.groupby</code></a>. Because <code>groupby</code> only groups consecutive items that share a key, the data must first be sorted by that key. This method is probably better when working with large datasets because <code>groupby</code> returns each group as an iterator instead of building a list. (This is the reason I convert it to a list before printing.)</p>
<pre class="python">import itertools
from pprint import pprint

def keyfunc(x):
    return x['model']

SOME_DATA = sorted(SOME_DATA, key=keyfunc)
for model, group in itertools.groupby(SOME_DATA, keyfunc):
    print
    print model
    pprint(list(group), width=150)</pre>
<p>Here are the results:</p>
<pre>Camry
[{'model': u'Camry', 'some_value': 36776, 'trim_name': u'SE 4-Cyl'},
{'model': u'Camry', 'some_value': 56569, 'trim_name': u'LE 4-Cyl'},
{'model': u'Camry', 'some_value': 57052, 'trim_name': u'XSE 4-Cyl'},
{'model': u'Camry', 'some_value': 92360, 'trim_name': u'XLE V6'},
{'model': u'Camry', 'some_value': 92756, 'trim_name': u'XSE V6'},
{'model': u'Camry', 'some_value': 94413, 'trim_name': u'XLE 4-Cyl'}]
Corolla
[{'model': u'Corolla', 'some_value': 13307, 'trim_name': u'L Automatic'},
{'model': u'Corolla', 'some_value': 15726, 'trim_name': u'ECO Plus'},
{'model': u'Corolla', 'some_value': 25579, 'trim_name': u'S'},
{'model': u'Corolla', 'some_value': 31920, 'trim_name': u'ECO Premium'},
{'model': u'Corolla', 'some_value': 34480, 'trim_name': u'LE'},
{'model': u'Corolla', 'some_value': 44958, 'trim_name': u'S Plus Manual'},
{'model': u'Corolla', 'some_value': 49606, 'trim_name': u'LE Premium'},
{'model': u'Corolla', 'some_value': 59629, 'trim_name': u'LE Plus'},
{'model': u'Corolla', 'some_value': 74226, 'trim_name': u'S Plus CVT'},
{'model': u'Corolla', 'some_value': 75725, 'trim_name': u'L Manual'},
{'model': u'Corolla', 'some_value': 82382, 'trim_name': u'ECO'},
{'model': u'Corolla', 'some_value': 95633, 'trim_name': u'S Premium'}]
Yaris
[{'model': u'Yaris', 'some_value': 16789, 'trim_name': u'3-Door L Manual'},
{'model': u'Yaris', 'some_value': 20349, 'trim_name': u'5-Door LE Automatic'},
{'model': u'Yaris', 'some_value': 42897, 'trim_name': u'5-Door L Automatic'},
{'model': u'Yaris', 'some_value': 62045, 'trim_name': u'5-Door SE Automatic'},
{'model': u'Yaris', 'some_value': 91913, 'trim_name': u'3-Door L Automatic'},
{'model': u'Yaris', 'some_value': 94218, 'trim_name': u'5-Door SE Manual'},
{'model': u'Yaris', 'some_value': 97979, 'trim_name': u'3-Door LE Automatic'}]</pre>
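<p>As a rough side-by-side check, here is a minimal sketch of both approaches in Python 3 syntax (an assumption; the code above is Python 2). The <code>records</code> list is a small made-up sample, not the original <code>SOME_DATA</code>. It also shows what happens when <code>groupby</code> is given unsorted input: a new group starts every time the key changes.</p>

```python
import collections
import itertools

# Hypothetical sample records, much smaller than SOME_DATA.
records = [
    {'model': 'Yaris', 'trim_name': '3-Door L Manual'},
    {'model': 'Corolla', 'trim_name': 'LE'},
    {'model': 'Yaris', 'trim_name': '5-Door SE Manual'},
    {'model': 'Camry', 'trim_name': 'XSE V6'},
    {'model': 'Corolla', 'trim_name': 'S'},
]


def keyfunc(record):
    return record['model']


# defaultdict: the order of the input does not matter.
by_defaultdict = collections.defaultdict(list)
for record in records:
    by_defaultdict[keyfunc(record)].append(record['trim_name'])

# groupby: sort first so that records with equal keys are adjacent.
by_groupby = {
    model: [record['trim_name'] for record in group]
    for model, group in itertools.groupby(sorted(records, key=keyfunc), keyfunc)
}

# Without sorting, groupby yields the same key more than once.
unsorted_keys = [model for model, _ in itertools.groupby(records, keyfunc)]

print(dict(by_defaultdict) == by_groupby)
print(unsorted_keys)
```

<p>Because <code>sorted</code> is stable, the items inside each group keep their original relative order, so the two results should compare equal.</p>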
python enum types
2012-10-10T18:52:36-07:00https://www.saltycrane.com/blog/2012/10/python-enum-types/<pre class="python">import operator


class EnumValue(object):
    def __init__(self, parent_name, name, value):
        self._parent_name = parent_name
        self._name = name
        self._value = value

    def _parents_equal(self, other):
        return (
            hasattr(other, '_parent_name')
            and self._parent_name == other._parent_name)

    def _check_parents_equal(self, other):
        if not self._parents_equal(other):
            raise TypeError(
                'This operation is valid only for enum values of the same type')

    def __eq__(self, other):
        return self._parents_equal(other) and self._value == other._value

    def __ne__(self, other):
        return not self.__eq__(other)

    def __lt__(self, other):
        self._check_parents_equal(other)
        return self._value < other._value

    def __le__(self, other):
        self._check_parents_equal(other)
        return self._value <= other._value

    def __gt__(self, other):
        self._check_parents_equal(other)
        return self._value > other._value

    def __ge__(self, other):
        self._check_parents_equal(other)
        return self._value >= other._value

    def __hash__(self):
        return hash(self._parent_name + str(self._value))

    def __repr__(self):
        return '{}({!r}, {!r}, {!r})'.format(
            self.__class__.__name__, self._parent_name, self._name, self._value)

    def __int__(self):
        return int(self._value)

    def __str__(self):
        return str(self._name)


class EnumMetaclass(type):
    def __new__(cls, name, bases, dct):
        uppercased = dict((k.upper(), v) for k, v in dct.items())
        new_dct = dict(
            name=name,
            _enums_by_str=dict(
                (k, EnumValue(name, k, v)) for k, v in uppercased.items()),
            _enums_by_int=dict(
                (v, EnumValue(name, k, v)) for k, v in uppercased.items()),
        )
        return super(EnumMetaclass, cls).__new__(cls, name, bases, new_dct)

    def __getattr__(cls, name):
        try:
            return cls.__getitem__(name)
        except KeyError:
            raise AttributeError

    def __getitem__(cls, name):
        try:
            name = name.upper()
        except AttributeError:
            pass
        try:
            return cls._enums_by_str[name]
        except KeyError:
            return cls._enums_by_int[name]

    def __repr__(cls):
        return '{}({!r}, {})'.format(
            cls.__class__.__name__,
            cls.name,
            ', '.join('{}={}'.format(v._name, v._value)
                      for v in sorted(cls._enums_by_str.values())))

    def values(cls):
        return sorted(cls._enums_by_str.values())

    def _values_comparison(cls, item, comparison_operator):
        """
        Return a list of values such that comparison_operator(value, item) is
        True.
        """
        return sorted(
            [v for v in cls._enums_by_str.values()
             if comparison_operator(v, item)])

    def values_lt(cls, item):
        return cls._values_comparison(item, operator.lt)

    def values_le(cls, item):
        return cls._values_comparison(item, operator.le)

    def values_gt(cls, item):
        return cls._values_comparison(item, operator.gt)

    def values_ge(cls, item):
        return cls._values_comparison(item, operator.ge)

    def values_ne(cls, item):
        return cls._values_comparison(item, operator.ne)


def enum_factory(name, **kwargs):
    return EnumMetaclass(name, (), kwargs)
</pre>
<p>Tests:</p>
<pre class="python">
import unittest


class EnumTestCase(unittest.TestCase):
    def test_repr(self):
        ProfileAction = enum_factory(
            'ProfileAction', VIEW=1, EDIT_OWN=2, EDIT_PUBLIC=3, EDIT_FULL=4)
        self.assertEqual(
            repr(ProfileAction),
            "EnumMetaclass('ProfileAction', VIEW=1, EDIT_OWN=2, EDIT_PUBLIC=3, EDIT_FULL=4)")

    def test_value_repr(self):
        ProfileAction = enum_factory(
            'ProfileAction', VIEW=1, EDIT_OWN=2, EDIT_PUBLIC=3, EDIT_FULL=4)
        self.assertEqual(
            repr(ProfileAction.VIEW), "EnumValue('ProfileAction', 'VIEW', 1)")

    def test_attribute_error(self):
        ProfileAction = enum_factory(
            'ProfileAction', VIEW=1, EDIT_OWN=2, EDIT_PUBLIC=3, EDIT_FULL=4)
        with self.assertRaises(AttributeError):
            ProfileAction.ASDFASDF

    def test_cast_to_str(self):
        ProfileAction = enum_factory(
            'ProfileAction', VIEW=1, EDIT_OWN=2, EDIT_PUBLIC=3, EDIT_FULL=4)
        self.assertEqual(str(ProfileAction.VIEW), 'VIEW')

    def test_cast_to_int(self):
        ProfileAction = enum_factory(
            'ProfileAction', VIEW=1, EDIT_OWN=2, EDIT_PUBLIC=3, EDIT_FULL=4)
        self.assertEqual(int(ProfileAction.VIEW), 1)

    def test_access_by_str(self):
        ProfileAction = enum_factory(
            'ProfileAction', VIEW=1, EDIT_OWN=2, EDIT_PUBLIC=3, EDIT_FULL=4)
        self.assertEqual(ProfileAction['VIEW'], ProfileAction.VIEW)

    def test_access_by_int(self):
        ProfileAction = enum_factory(
            'ProfileAction', VIEW=1, EDIT_OWN=2, EDIT_PUBLIC=3, EDIT_FULL=4)
        self.assertEqual(ProfileAction[1], ProfileAction.VIEW)

    def test_equality(self):
        ProfileAction = enum_factory(
            'ProfileAction', VIEW=1, EDIT_OWN=2, EDIT_PUBLIC=3, EDIT_FULL=4)
        self.assertEqual(ProfileAction.VIEW, ProfileAction.VIEW)
        self.assertEqual(ProfileAction['VIEW'], ProfileAction.VIEW)
        self.assertEqual(ProfileAction[1], ProfileAction.VIEW)

    def test_inequality(self):
        ProfileAction = enum_factory(
            'ProfileAction', VIEW=1, EDIT_OWN=2, EDIT_PUBLIC=3, EDIT_FULL=4)
        self.assertNotEqual(ProfileAction.VIEW, ProfileAction.EDIT_OWN)
        self.assertNotEqual(ProfileAction['VIEW'], ProfileAction.EDIT_OWN)
        self.assertNotEqual(ProfileAction[1], ProfileAction.EDIT_OWN)

        DashboardAction = enum_factory(
            'DashboardAction', VIEW=1, EDIT_OWN=2, EDIT_PUBLIC=3, EDIT_FULL=4)
        self.assertNotEqual(ProfileAction.VIEW, DashboardAction.VIEW)

    def test_invalid_comparison(self):
        ProfileAction = enum_factory(
            'ProfileAction', VIEW=1, EDIT_OWN=2, EDIT_PUBLIC=3, EDIT_FULL=4)
        DashboardAction = enum_factory(
            'DashboardAction', VIEW=1, EDIT_OWN=2, EDIT_PUBLIC=3, EDIT_FULL=4)
        with self.assertRaises(TypeError) as cm:
            ProfileAction.VIEW < DashboardAction.EDIT_OWN
        self.assertEqual(
            str(cm.exception),
            'This operation is valid only for enum values of the same type')

    def test_values(self):
        ProfileAction = enum_factory(
            'ProfileAction', VIEW=1, EDIT_OWN=2, EDIT_PUBLIC=3, EDIT_FULL=4)
        self.assertEqual(
            ProfileAction.values(), [
                EnumValue('ProfileAction', 'VIEW', 1),
                EnumValue('ProfileAction', 'EDIT_OWN', 2),
                EnumValue('ProfileAction', 'EDIT_PUBLIC', 3),
                EnumValue('ProfileAction', 'EDIT_FULL', 4),
            ])

    def test_values_lt(self):
        ProfileAction = enum_factory(
            'ProfileAction', VIEW=1, EDIT_OWN=2, EDIT_PUBLIC=3, EDIT_FULL=4)
        self.assertEqual(
            ProfileAction.values_lt(ProfileAction.EDIT_PUBLIC), [
                EnumValue('ProfileAction', 'VIEW', 1),
                EnumValue('ProfileAction', 'EDIT_OWN', 2),
            ])

    def test_values_le(self):
        ProfileAction = enum_factory(
            'ProfileAction', VIEW=1, EDIT_OWN=2, EDIT_PUBLIC=3, EDIT_FULL=4)
        self.assertEqual(
            ProfileAction.values_le(ProfileAction.EDIT_PUBLIC), [
                EnumValue('ProfileAction', 'VIEW', 1),
                EnumValue('ProfileAction', 'EDIT_OWN', 2),
                EnumValue('ProfileAction', 'EDIT_PUBLIC', 3),
            ])

    def test_values_gt(self):
        ProfileAction = enum_factory(
            'ProfileAction', VIEW=1, EDIT_OWN=2, EDIT_PUBLIC=3, EDIT_FULL=4)
        self.assertEqual(
            ProfileAction.values_gt(ProfileAction.EDIT_PUBLIC), [
                EnumValue('ProfileAction', 'EDIT_FULL', 4),
            ])

    def test_values_ge(self):
        ProfileAction = enum_factory(
            'ProfileAction', VIEW=1, EDIT_OWN=2, EDIT_PUBLIC=3, EDIT_FULL=4)
        self.assertEqual(
            ProfileAction.values_ge(ProfileAction.EDIT_PUBLIC), [
                EnumValue('ProfileAction', 'EDIT_PUBLIC', 3),
                EnumValue('ProfileAction', 'EDIT_FULL', 4),
            ])

    def test_values_ne(self):
        ProfileAction = enum_factory(
            'ProfileAction', VIEW=1, EDIT_OWN=2, EDIT_PUBLIC=3, EDIT_FULL=4)
        self.assertEqual(
            ProfileAction.values_ne(ProfileAction.EDIT_PUBLIC), [
                EnumValue('ProfileAction', 'VIEW', 1),
                EnumValue('ProfileAction', 'EDIT_OWN', 2),
                EnumValue('ProfileAction', 'EDIT_FULL', 4),
            ])

    def test_intersection_with_same_type(self):
        ProfileAction = enum_factory(
            'ProfileAction', VIEW=1, EDIT_OWN=2, EDIT_PUBLIC=3, EDIT_FULL=4)
        set_a = set([ProfileAction.VIEW, ProfileAction.EDIT_OWN])
        set_b = set([ProfileAction.VIEW, ProfileAction.EDIT_PUBLIC])
        self.assertEqual(set_a & set_b, set([ProfileAction.VIEW]))

    def test_intersection_with_different_types(self):
        ProfileAction = enum_factory(
            'ProfileAction', VIEW=1, EDIT_OWN=2, EDIT_PUBLIC=3, EDIT_FULL=4)
        DashboardAction = enum_factory(
            'DashboardAction', VIEW=1, EDIT_OWN=2, EDIT_PUBLIC=3, EDIT_FULL=4)
        set_a = set([ProfileAction.VIEW, ProfileAction.EDIT_OWN])
        set_b = set([DashboardAction.VIEW, DashboardAction.EDIT_PUBLIC])
        self.assertEqual(set_a & set_b, set([]))
</pre>
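<p>For comparison only, here is a minimal sketch of similar behavior using the standard library's <code>enum</code> module (Python 3.4 and later). This is a different technique from the metaclass above, not a drop-in replacement: <code>IntEnum</code> members compare as plain integers, so members of two different <code>IntEnum</code> classes with the same value compare equal, whereas <code>EnumValue</code> treats them as unequal. The <code>values_lt</code> helper is a hypothetical stand-in for the method of the same name above.</p>

```python
import enum


class ProfileAction(enum.IntEnum):
    VIEW = 1
    EDIT_OWN = 2
    EDIT_PUBLIC = 3
    EDIT_FULL = 4


def values_lt(enum_cls, item):
    """Hypothetical helper: members of enum_cls that are less than item."""
    # Iterating an enum class yields its members in definition order.
    return [member for member in enum_cls if member < item]


print(ProfileAction['VIEW'] is ProfileAction.VIEW)   # access by name
print(ProfileAction(1) is ProfileAction.VIEW)        # access by value
print(int(ProfileAction.VIEW), ProfileAction.VIEW.name)
print(values_lt(ProfileAction, ProfileAction.EDIT_PUBLIC))
```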
Python data object motivated by a desire for a mutable namedtuple with default values
2012-08-03T07:49:48-07:00https://www.saltycrane.com/blog/2012/08/python-data-object-motivated-desire-mutable-namedtuple-default-values/<p>
<em>UPDATE 2016-08-12:</em> Read <a href="https://glyph.twistedmatrix.com/2016/08/attrs.html">Glyph's post</a> and use the
<a href="https://attrs.readthedocs.io/">attrs</a> library instead.
</p>
<p>Reasons to use this instead of a <a href="http://docs.python.org/2/library/collections.html#collections.namedtuple">namedtuple</a>:</p>
<ul>
<li>I want to change fields at a later time (mutability)</li>
<li>I want to specify a subset of the fields at instantiation and have the rest be set to a default value</li>
</ul>
<p>Reasons to use this instead of a <a href="http://docs.python.org/2/library/stdtypes.html#mapping-types-dict">dict</a>:</p>
<ul>
<li>I want to explicitly name the fields in the object</li>
<li>I want to disallow setting fields that are not explicitly named*</li>
<li>I want to specify a subset of the fields at instantiation and have the rest be set to a default value</li>
<li>I want to use attribute style access (dot notation to access fields)</li>
</ul>
<p>Reasons to use this instead of a regular Python class:</p>
<ul>
<li>I don't want to duplicate field names in the <code>__init__()</code> method signature and when setting instance attributes of the same name.</li>
<li>I want to disallow setting fields that are not explicitly named*</li>
<li>I want to be able to easily convert the object to a <code>dict</code> or a <code>tuple</code></li>
<li>I want to save memory</li>
</ul>
<p>*Note: This <a href="http://stackoverflow.com/questions/472000/python-slots/472024#472024">Stack Overflow answer</a>
warns against using <code>__slots__</code> for my goal of <em>disallowing setting fields that are not explicitly named</em>.
It says metaclasses or decorators should be abused by us control freaks and static typing weenies instead. To comply with that advice,
if you don't care about saving memory, <code>__slots__</code> could be replaced with a non-special attribute, such as <code>_fields</code>. If that is done, attribute creation would no longer be limited.
</p>
<p>See also:</p>
<ul>
<li><a href="http://pypi.python.org/pypi/recordtype/">recordtype on PyPI</a></li>
<li><a href="http://stackoverflow.com/questions/5227839/why-python-does-not-support-record-type-i-e-mutable-namedtuple">Why Python does not support record type i.e. mutable namedtuple - Stack Overflow</a></li>
<li><a href="http://www.artima.com/weblogs/viewpost.jsp?thread=236637">Managing Records in Python (Part 1 of 3)</a></li>
<li><a href="http://stackoverflow.com/questions/472000/python-slots/472024#472024">
http://stackoverflow.com/questions/472000/python-slots/472024#472024</a></li>
<li><a href="http://stackoverflow.com/questions/1816483/python-how-does-inheritance-of-slots-in-subclasses-actually-work">
http://stackoverflow.com/questions/1816483/python-how-does-inheritance-of-slots-in-subclasses-actually-work</a></li>
</ul>
<pre class="python">class DataObject(object):
    """
    An object to hold data. Motivated by a desire for a mutable namedtuple with
    default values. To use, subclass, and define __slots__.

    The default default value is None. To set a default value other than None,
    set the `default_value` class variable.

    Example:
        class Jello(DataObject):
            default_value = 'no data'
            __slots__ = (
                'request_date',
                'source_id',
                'year',
                'group_id',
                'color',
                # ...
            )
    """
    __slots__ = ()
    default_value = None

    def __init__(self, *args, **kwargs):
        # Set default values
        for att in self.__slots__:
            setattr(self, att, self.default_value)

        # Set attributes passed in as arguments
        for k, v in zip(self.__slots__, args):
            setattr(self, k, v)
        for k, v in kwargs.items():
            setattr(self, k, v)

    def asdict(self):
        return dict(
            (att, getattr(self, att)) for att in self.__slots__)

    def astuple(self):
        return tuple(getattr(self, att) for att in self.__slots__)

    def __repr__(self):
        return '{}({})'.format(
            self.__class__.__name__,
            ', '.join('{}={}'.format(
                att, repr(getattr(self, att))) for att in self.__slots__))</pre>
<p>Tests:</p>
<pre class="python">import unittest


class DataObjectTestCase(unittest.TestCase):
    def test_instantiation_using_args(self):
        class MyData(DataObject):
            __slots__ = ('att1', 'att2')

        md = MyData('my attr 1', 'my attr 2')
        self.assertEqual(md.att1, 'my attr 1')
        self.assertEqual(md.att2, 'my attr 2')

    def test_instantiation_using_kwargs(self):
        class MyData(DataObject):
            __slots__ = ('att1', 'att2')

        md = MyData(att1='my attr 1', att2='my attr 2')
        self.assertEqual(md.att1, 'my attr 1')
        self.assertEqual(md.att2, 'my attr 2')

    def test_default_default_value(self):
        class MyData(DataObject):
            __slots__ = ('att1', 'att2')

        md = MyData(att1='my attr 1')
        self.assertEqual(md.att1, 'my attr 1')
        self.assertEqual(md.att2, None)

    def test_custom_default_value(self):
        class MyData(DataObject):
            default_value = 'custom default value'
            __slots__ = ('att1', 'att2')

        md = MyData(att1='my attr 1')
        self.assertEqual(md.att1, 'my attr 1')
        self.assertEqual(md.att2, 'custom default value')

    def test_set_value_after_instantiation(self):
        class MyData(DataObject):
            __slots__ = ('att1', 'att2')

        md = MyData(att1='my attr 1')
        self.assertEqual(md.att1, 'my attr 1')
        self.assertEqual(md.att2, None)
        md.att1 = 5
        md.att2 = 9
        self.assertEqual(md.att1, 5)
        self.assertEqual(md.att2, 9)

    def test_attribute_not_defined_in__slots__(self):
        class MyData(DataObject):
            __slots__ = ('att1', 'att2')

        with self.assertRaises(AttributeError):
            MyData(att3='my attr 3')

        with self.assertRaises(AttributeError):
            md = MyData()
            md.att3 = 45

    def test_asdict(self):
        class MyData(DataObject):
            __slots__ = ('att1', 'att2')

        md = MyData(att1='my attr 1', att2='my attr 2')
        self.assertEqual(
            md.asdict(), {'att1': 'my attr 1', 'att2': 'my attr 2'})

    def test_tuple(self):
        class MyData(DataObject):
            __slots__ = ('att1', 'att2')

        md = MyData(att1='my attr 1', att2='my attr 2')
        self.assertEqual(md.astuple(), ('my attr 1', 'my attr 2'))

    def test___repr__(self):
        class MyData(DataObject):
            __slots__ = ('att1', 'att2')

        md = MyData(att1='my attr 1', att2='my attr 2')
        self.assertEqual(repr(md), "MyData(att1='my attr 1', att2='my attr 2')")
</pre>
<p>Note: previously, I included the following method in the class.
However, this is not necessary. If <code>__slots__</code> is defined in both <code>DataObject</code> and the subclass,
setting any attribute not listed in <code>__slots__</code> will automatically raise an <code>AttributeError</code>.
</p>
<pre class="python"># def __setattr__(self, name, value):
#     if name not in self.__slots__:
#         raise AttributeError("%s is not a valid attribute in %s" % (
#             name, self.__class__.__name__))
#     super(DataObject, self).__setattr__(name, value)</pre>
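<p>For readers on Python 3.7 or later, the standard library's <code>dataclasses</code> module covers several of the same goals (mutability, per-field defaults, conversion to <code>dict</code> and <code>tuple</code>). The sketch below is an illustration under that assumption, not part of the original class; the <code>Jello</code> fields are borrowed loosely from the docstring example and the defaults are made up. Note that a plain dataclass does not block new attributes or save memory; <code>@dataclass(slots=True)</code> adds <code>__slots__</code>, but only on Python 3.10 and later.</p>

```python
from dataclasses import asdict, astuple, dataclass
from typing import Optional


@dataclass
class Jello:
    # Hypothetical fields and defaults, for illustration only.
    request_date: Optional[str] = None
    source_id: Optional[int] = None
    color: str = 'no data'


jello = Jello(source_id=7)   # specify only a subset of the fields
jello.color = 'green'        # fields can be changed later

print(jello)
print(asdict(jello))
print(astuple(jello))
```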
How to sort a list of dicts in Python
2010-04-02T10:46:58-07:00https://www.saltycrane.com/blog/2010/04/how-sort-list-dicts-python/<p>I'm using the <a href="http://www.mongodb.org/display/DOCS/Aggregation#Aggregation-Group">MongoDB
group function</a> (it's similar to SQL's GROUP BY) to aggregate some results
for my <a href="http://github.com/saltycrane/live-log-analyzer">live-log-analyzer</a>
project. This function is pretty cool, but it does not sort the grouped data. Here is
how to sort the data. (It is only one line of Python, but I have a hard time
remembering how to do this.)
</p>
<p><code>DATA</code> is the output of the MongoDB group function. I want to sort this list
of dicts by <code>'ups_ad'</code>.</p>
<pre class="python">from pprint import pprint
DATA = [
{u'avg': 2.9165000000000001,
u'count': 10.0,
u'total': 29.165000000000003,
u'ups_ad': u'10.194.154.49:80'},
{u'avg': 2.6931000000000003,
u'count': 10.0,
u'total': 26.931000000000001,
u'ups_ad': u'10.194.155.176:80'},
{u'avg': 1.9860909090909091,
u'count': 11.0,
u'total': 21.847000000000001,
u'ups_ad': u'10.195.71.146:80'},
{u'avg': 1.742818181818182,
u'count': 11.0,
u'total': 19.171000000000003,
u'ups_ad': u'10.194.155.48:80'}
]
data_sorted = sorted(DATA, key=lambda item: item['ups_ad'])
pprint(data_sorted)</pre>
<p>Results:</p>
<pre>[{u'avg': 2.9165000000000001,
u'count': 10.0,
u'total': 29.165000000000003,
u'ups_ad': u'10.194.154.49:80'},
{u'avg': 2.6931000000000003,
u'count': 10.0,
u'total': 26.931000000000001,
u'ups_ad': u'10.194.155.176:80'},
{u'avg': 1.742818181818182,
u'count': 11.0,
u'total': 19.171000000000003,
u'ups_ad': u'10.194.155.48:80'},
{u'avg': 1.9860909090909091,
u'count': 11.0,
u'total': 21.847000000000001,
u'ups_ad': u'10.195.71.146:80'}]</pre>
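<p>Two small variations, sketched in Python 3 syntax with made-up rows shaped like <code>DATA</code>: <code>operator.itemgetter</code> can stand in for the <code>lambda</code>, and a tuple key sorts by more than one field. Note that the addresses are compared as strings, which is why <code>10.194.155.176</code> sorts before <code>10.194.155.48</code> in the results above.</p>

```python
from operator import itemgetter

# Hypothetical rows shaped like the DATA list above.
rows = [
    {'ups_ad': '10.194.155.176:80', 'count': 10.0},
    {'ups_ad': '10.195.71.146:80', 'count': 11.0},
    {'ups_ad': '10.194.155.48:80', 'count': 11.0},
]

# Equivalent to key=lambda item: item['ups_ad']
by_address = sorted(rows, key=itemgetter('ups_ad'))

# Sort by count (largest first), then by address as a tie-breaker.
by_count_then_address = sorted(rows, key=lambda r: (-r['count'], r['ups_ad']))

print([row['ups_ad'] for row in by_address])
print([row['ups_ad'] for row in by_count_then_address])
```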
References:
<ul>
<li><a href="http://wiki.python.org/moin/HowTo/Sorting#KeyFunctions">HowTo/Sorting
- PythonInfo Wiki</a></li>
<li><a href="http://docs.python.org/library/functions.html#sorted">sorted built-in
function - Python documentation</a></li>
</ul>
<p><em>Update 2010-04-28:</em> Apparently I didn't use Google properly when I first
wrote this post. Searching today produced several sources for doing exactly this.
<ul>
<li><a href="http://blog.davidziegler.net/post/107271990/sorting-a-list-of-dictionaries-in-python">
David Ziegler's Blog - Sorting a List of Dictionaries in Python</a></li>
<li><a href="http://stackoverflow.com/questions/72899/in-python-how-do-i-sort-a-list-of-dictionaries-by-values-of-the-dictionary">
Stack Overflow - In Python how do I sort a list of dictionaries by values of the dictionary?</a></li>
<li><a href="http://code.pui.ch/2007/07/23/python-sort-a-list-of-dicts-by-dict-key/">
code.random() - Python: Sort a list of dicts by dict-key</a></li>
<li><a href="http://wiki.python.org/moin/SortingListsOfDictionaries">
PythonInfo Wiki - Sorting Lists of Dictionaries</a></li>
</ul>
</p>
Python setdefault example
2010-02-09T17:10:22-08:00https://www.saltycrane.com/blog/2010/02/python-setdefault-example/<p>I always forget how to use Python's
<a href="http://docs.python.org/library/stdtypes.html#dict.setdefault">
setdefault</a> dictionary operation so here is a quick example.
</p>
<p>What I want:</p>
<pre class="python">DATA_SOURCE = (('key1', 'value1'),
               ('key1', 'value2'),
               ('key2', 'value3'),
               ('key2', 'value4'),
               ('key2', 'value5'),)

newdata = {}
for k, v in DATA_SOURCE:
    if k in newdata:
        newdata[k].append(v)
    else:
        newdata[k] = [v]
print newdata</pre>
<p>Results:</p>
<pre>{'key2': ['value3', 'value4', 'value5'], 'key1': ['value1', 'value2']}</pre>
<p>Better way using <code>setdefault</code>:</p>
<pre class="python">newdata = {}
for k, v in DATA_SOURCE:
    newdata.setdefault(k, []).append(v)
print newdata</pre>
<p>The results are the same.</p>
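<p>For reference, <code>d.setdefault(k, default)</code> returns <code>d[k]</code> if the key exists; otherwise it stores <code>default</code> under <code>k</code> and returns it. A minimal sketch in Python 3 syntax (an assumption; the code above is Python 2) comparing it with <code>collections.defaultdict</code>, which does the same job for this pattern:</p>

```python
from collections import defaultdict

DATA_SOURCE = (('key1', 'value1'), ('key1', 'value2'), ('key2', 'value3'),
               ('key2', 'value4'), ('key2', 'value5'))

with_setdefault = {}
for k, v in DATA_SOURCE:
    # Returns the existing list for k, or stores and returns a new empty list.
    with_setdefault.setdefault(k, []).append(v)

with_defaultdict = defaultdict(list)
for k, v in DATA_SOURCE:
    # A missing key is created on first access by calling list().
    with_defaultdict[k].append(v)

print(with_setdefault == dict(with_defaultdict))
```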
How to conditionally replace items in a list
2008-08-22T12:53:48-07:00https://www.saltycrane.com/blog/2008/08/how-conditionally-replace-items-list/<p>I wanted to replace items in a list based on a specific condition.
For example, given a list of numbers, I want to replace all items
that are negative with zero.</p>
<h5>Naive way</h5>
<p>At first, I thought of something like this:</p>
<pre class="python">mylist = [111, -222, 333, -444]
newlist = []
for item in mylist:
    if item < 0:
        item = 0
    newlist.append(item)
mylist = newlist
print mylist</pre>
<p>Which gave me the expected results:</p>
<pre>[111, 0, 333, 0]</pre>
<h5>Better way?</h5>
<p>Then I tried using Python's built-in <code>enumerate</code> function
(see <a href="http://www.saltycrane.com/blog/2008/04/how-to-use-pythons-enumerate-and-zip-to/">my
previous example</a>)
to replace the items in place. This seems to be a more elegant solution
to me. Is there a better way? How would you do it?</p>
<pre class="python">mylist = [111, -222, 333, -444]
for (i, item) in enumerate(mylist):
    if item < 0:
        mylist[i] = 0
print mylist</pre>
<p>Results:</p>
<pre>[111, 0, 333, 0]</pre>
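<p>One possible answer to the question above is a list comprehension. This is a sketch in Python 3 syntax (an assumption), not necessarily the best way: the first form builds a new list, and the slice assignment form changes the existing list object, which matters when other names refer to the same list.</p>

```python
mylist = [111, -222, 333, -444]

# Build a new list, leaving the original untouched.
clamped = [0 if item < 0 else item for item in mylist]

# Slice assignment replaces the contents of the same list object,
# so other references to it see the change too.
alias = mylist
mylist[:] = [max(item, 0) for item in mylist]

print(clamped)
print(alias is mylist, mylist)
```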
How to use Python's enumerate and zip to iterate over two lists and their indices.
2008-04-18T15:22:00-07:00https://www.saltycrane.com/blog/2008/04/how-to-use-pythons-enumerate-and-zip-to/<h4 id="enumerate"><strong>enumerate</strong><span style="font-weight: normal"> - Iterate over indices and items of a list</span></h4>
<p>The Python Cookbook (Recipe 4.4) describes how to iterate over items and indices
in a list using <code>enumerate</code>. For example:</p>
<pre class="python">alist = ['a1', 'a2', 'a3']
for i, a in enumerate(alist):
    print i, a</pre>
<p>Results:</p>
<pre>0 a1
1 a2
2 a3</pre>
<h4 id="zip">zip<span style="font-weight: normal"> - Iterate over two lists in parallel</span></h4>
<p>I <a href="/blog/2007/12/iterating-through-two-lists-in-parallel/">
previously</a> wrote about using <code>zip</code> to iterate over two lists
in parallel. Example:</p>
<pre class="python">alist = ['a1', 'a2', 'a3']
blist = ['b1', 'b2', 'b3']
for a, b in zip(alist, blist):
    print a, b</pre>
<p>Results:</p>
<pre>a1 b1
a2 b2
a3 b3</pre>
<h4 id="enumerate-with-zip">enumerate with zip</h4>
<p>Here is how to iterate over two lists and their indices using enumerate together
with zip:</p>
<pre class="python">alist = ['a1', 'a2', 'a3']
blist = ['b1', 'b2', 'b3']
for i, (a, b) in enumerate(zip(alist, blist)):
    print i, a, b</pre>
<p>Results:</p>
<pre>0 a1 b1
1 a2 b2
2 a3 b3</pre>
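<p>Two related details, sketched in Python 3 syntax (an assumption) with a deliberately shorter <code>blist</code>: <code>enumerate</code> accepts a <code>start</code> argument, and <code>zip</code> stops at the shorter list, while <code>itertools.zip_longest</code> pads the missing items with <code>None</code>.</p>

```python
from itertools import zip_longest

alist = ['a1', 'a2', 'a3']
blist = ['b1', 'b2']  # deliberately one item shorter

# zip stops at the shorter list; start=1 makes the index one-based.
pairs = [(i, a, b) for i, (a, b) in enumerate(zip(alist, blist), start=1)]

# zip_longest pads the shorter list with None instead of truncating.
padded = [(i, a, b) for i, (a, b) in enumerate(zip_longest(alist, blist))]

print(pairs)
print(padded)
```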
How to invert a dict in Python
2008-01-14T13:34:00-08:00https://www.saltycrane.com/blog/2008/01/how-to-invert-dict-in-python/<p><strong>Example 1:</strong> If the values in the dictionary are unique and
hashable, then I can use Recipe 4.14 in the <em>Python Cookbook, 2nd
Edition</em>.</p>
<pre class="python">def invert_dict(d):
    return dict([(v, k) for k, v in d.iteritems()])

d = {'child1': 'parent1',
     'child2': 'parent2',
     }

print invert_dict(d)
</pre>
<pre>{'parent2': 'child2', 'parent1': 'child1'}</pre>
<br /><p><strong>Example 2:</strong> If the values in the dictionary are hashable,
but not unique, I can create a dict of lists as an inverse.</p>
<pre class="python">def invert_dict_nonunique(d):
    newdict = {}
    for k, v in d.iteritems():
        newdict.setdefault(v, []).append(k)
    return newdict

d = {'child1': 'parent1',
     'child2': 'parent1',
     'child3': 'parent2',
     'child4': 'parent2',
     }

print invert_dict_nonunique(d)
</pre>
<pre>{'parent2': ['child3', 'child4'], 'parent1': ['child1', 'child2']}</pre>
<br /><p><strong>Example 3:</strong> If I am starting with a dict of lists, where
lists contain unique hashable items, I can create an inverse as shown below.</p>
<pre class="python">def invert_dol(d):
    return dict((v, k) for k in d for v in d[k])

d = {'child1': ['parent1'],
     'child2': ['parent2', 'parent3'],
     }

print invert_dol(d)
</pre>
<pre>{'parent3': 'child2', 'parent2': 'child2', 'parent1': 'child1'}</pre>
<br /><p><strong>Example 4:</strong> If I am starting with a dict of lists, where
lists contain non-unique hashable items, I can create another dict of lists
as an inverse.</p>
<pre class="python">def invert_dol_nonunique(d):
    newdict = {}
    for k in d:
        for v in d[k]:
            newdict.setdefault(v, []).append(k)
    return newdict

d = {'child1': ['parent1'],
     'child2': ['parent1'],
     'child3': ['parent2'],
     'child4': ['parent2'],
     'child5': ['parent1', 'parent2'],
     }

print invert_dol_nonunique(d)
</pre>
<pre>{'parent2': ['child3', 'child4', 'child5'], 'parent1': ['child1', 'child2', 'child5']}</pre>
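<p>The examples above use <code>iteritems()</code>, which exists only in Python 2. As a rough Python 3 equivalent of Examples 1 and 2 (a sketch, assuming <code>d.items()</code> and a dict comprehension are acceptable substitutes):</p>

```python
from collections import defaultdict


def invert_dict(d):
    """Invert a dict whose values are unique and hashable."""
    return {v: k for k, v in d.items()}


def invert_dict_nonunique(d):
    """Invert a dict whose values may repeat; each value maps to a list of keys."""
    inverse = defaultdict(list)
    for k, v in d.items():
        inverse[v].append(k)
    return dict(inverse)


print(invert_dict({'child1': 'parent1', 'child2': 'parent2'}))
print(invert_dict_nonunique(
    {'child1': 'parent1', 'child2': 'parent1', 'child3': 'parent2'}))
```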
How to find the intersection and union of two lists in Python
2008-01-03T16:33:00-08:00https://www.saltycrane.com/blog/2008/01/how-to-find-intersection-and-union-of/<p>My friend Bill had previously alerted me to the coolness of Python
<code>set</code>s. However, I hadn't found an opportunity to use them
until now. Here are three functions using <code>set</code>s to
remove duplicate entries from a list, find the intersection of two
lists, and find the union of two lists. Note that <code>set</code>s were
introduced in Python 2.4, so Python 2.4 or later is required. Also,
the items in the lists must be hashable, and the order of the original lists is not
preserved.</p>
<p>For more information on Python <code>set</code>s, see the
<a href="http://docs.python.org/lib/types-set.html">Library Reference</a>.</p>
<pre class="python">""" NOTES:
      - requires Python 2.4 or greater
      - elements of the lists must be hashable
      - order of the original lists is not preserved
"""

def unique(a):
    """ return the list with duplicate elements removed """
    return list(set(a))

def intersect(a, b):
    """ return the intersection of two lists """
    return list(set(a) & set(b))

def union(a, b):
    """ return the union of two lists """
    return list(set(a) | set(b))

if __name__ == "__main__":
    a = [0,1,2,0,1,2,3,4,5,6,7,8,9]
    b = [5,6,7,8,9,10,11,12,13,14]
    print unique(a)
    print intersect(a, b)
    print union(a, b)
</pre>
<br />Results:<br />
<pre>[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[8, 9, 5, 6, 7]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]</pre>
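<p>If the original order matters, one option is to lean on the fact that dicts keep insertion order in Python 3.7 and later. The sketch below is written under that assumption; the <code>*_ordered</code> function names are made up for this example, and the items must still be hashable.</p>

```python
def unique_ordered(items):
    """Remove duplicates, keeping the first occurrence of each item."""
    return list(dict.fromkeys(items))


def intersect_ordered(a, b):
    """Items of a that also appear in b, in a's order, without duplicates."""
    in_b = set(b)
    return [item for item in unique_ordered(a) if item in in_b]


def union_ordered(a, b):
    """All items of a followed by the items of b not already seen."""
    return unique_ordered(list(a) + list(b))


a = [0, 1, 2, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
b = [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
print(unique_ordered(a))
print(intersect_ordered(a, b))
print(union_ordered(a, b))
```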
Tabular data structure conversion in Python
2007-12-20T13:22:00-08:00https://www.saltycrane.com/blog/2007/12/tabular-data-structure-conversion-in-python/<p>Here is a Python library to convert between various tabular data
structures including list of lists, list of dicts, dict of lists, and
dict of dicts.
My <a href="/blog/2007/12/how-to-convert-list-of-dictionaries-to/">
original</a> <a href="/blog/2007/12/how-to-convert-dictionary-of-lists-to/">
attempts</a> at these conversions required that the data be
rectangular (i.e. each column has the same number of
elements). However, further research led me
to <a href="http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/410687">
this ASPN Recipe</a> which uses <code>map</code> to transpose a list
of lists even if it is not
rectangular. With <a href="http://mail.python.org/pipermail/python-list/2007-December/thread.html#469295">
help from the mailing list</a>, I rewrote the recipe without using
<code>lambda</code>. (I did this
because <a href="http://www.artima.com/weblogs/viewpost.jsp?thread=211200">
Guido suggested</a> not to use <code>map</code>
with <code>lambda</code> for the sake of clarity.)</p>
<p>I used <a href="http://docs.python.org/tut/node7.html#SECTION007140000000000000000">
list comprehensions</a> wherever possible and
a <a href="http://gnosis.cx/publish/programming/charming_python_13.html">
functional</a>/
<a href="http://en.wikipedia.org/wiki/Declarative_programming">declarative</a>
approach in general. It is likely there is a better way to do many of
these conversions. (After all,
I <a href="/blog/2007/12/iterating-through-two-lists-in-parallel/">
just learned how to use <code>zip()</code></a>.) In particular, the
functions with the comment <em>"Better way?"</em> use a number of the
other conversion functions in series to achieve the desired
result. All of these could be optimized. Feedback on better methods
is welcome.</p>
<h4>Example data structures</h4>
<p>Here are examples of the 8 different tabular
data structures. Note that if a transpose is performed (i.e. rows switched with
columns or vice versa), the output is padded with <code>None</code>.
Otherwise, it is left as is.</p>
<pre class="python"># lorl- list of lists where each inner list is a row
lorl = [
['a1', 'b1', 'c1'], # row 1
['a2', 'b2', 'c2'], # row 2
['a3', 'b3', 'c3'], # row 3
['a4', 'b4', ], # row 4
]
# locl- list of lists where each inner list is a column
locl = [
['a1', 'a2', 'a3', 'a4'], # col a
['b1', 'b2', 'b3', 'b4'], # col b
['c1', 'c2', 'c3', ], # col c
]
# lord- list of dicts where each dict is a row
lord = [
{'a':'a1', 'b':'b1', 'c':'c1'}, # row 1
{'a':'a2', 'b':'b2', 'c':'c2'}, # row 2
{'a':'a3', 'b':'b3', 'c':'c3'}, # row 3
{'a':'a4', 'b':'b4', }, # row 4
]
# locd- list of dicts where each dict is a column
locd = [
{1:'a1', 2:'a2', 3:'a3', 4:'a4'}, # col a
{1:'b1', 2:'b2', 3:'b3', 4:'b4'}, # col b
{1:'c1', 2:'c2', 3:'c3', }, # col c
]
# dorl- dict of lists where each list is a row
dorl = {
1: ['a1', 'b1', 'c1'], # row 1
2: ['a2', 'b2', 'c2'], # row 2
3: ['a3', 'b3', 'c3'], # row 3
4: ['a4', 'b4', ], # row 4
}
# docl- dict of lists where each list is a column
docl = {
'a': ['a1', 'a2', 'a3', 'a4'], # column a
'b': ['b1', 'b2', 'b3', 'b4'], # column b
'c': ['c1', 'c2', 'c3', ], # column c
}
# dord- dict of dicts where each inner dict is a row
dord = {
1: {'a':'a1', 'b':'b1', 'c':'c1'}, # row 1
2: {'a':'a2', 'b':'b2', 'c':'c2'}, # row 2
3: {'a':'a3', 'b':'b3', 'c':'c3'}, # row 3
4: {'a':'a4', 'b':'b4', }, # row 4
}
# docd- dict of dicts where each inner dict is a column
docd = {
'a': {1:'a1', 2:'a2', 3:'a3', 4:'a4'}, # column a
'b': {1:'b1', 2:'b2', 3:'b3', 4:'b4'}, # column b
'c': {1:'c1', 2:'c2', 3:'c3', }, # column c
}
# list of row keys and column keys
rowkeys = [1, 2, 3, 4]
colkeys = ['a', 'b', 'c']</pre>
<h4>Code</h4>
<p>Below is the library of functions.</p>
<pre class="python">"""tabular.py
Functions to convert tabular data structures
The following data structures are supported:
lorl- list of lists where each inner list is a row
locl- list of lists where each inner list is a column
lord- list of dicts where each dict is a row
locd- list of dicts where each dict is a column
dorl- dict of lists where each list is a row
docl- dict of lists where each list is a column
dord- dict of dicts where each inner dict is a row
docd- dict of dicts where each inner dict is a column
"""
#-------------------------------------------------------
# from lorl to ...
#-------------------------------------------------------
def lorl2locl(lorl):
    return [list(col) for col in map(None, *lorl)]

def lorl2lord(lorl, colkeys):
    return [dict(zip(colkeys, row)) for row in lorl]

def lorl2locd(lorl, rowkeys):
    # better way?
    return locl2locd(lorl2locl(lorl), rowkeys)

def lorl2dorl(lorl, rowkeys):
    return dict(zip(rowkeys, [row for row in lorl]))

def lorl2docl(lorl, colkeys):
    # better way?
    return locl2docl(lorl2locl(lorl), colkeys)

def lorl2dord(lorl, rowkeys, colkeys):
    return dict(zip(rowkeys, [dict(zip(colkeys, row))
                              for row in lorl]))

def lorl2docd(lorl, rowkeys, colkeys):
    # better way?
    return dict(zip(colkeys, [dict(zip(rowkeys, col))
                              for col in lorl2locl(lorl)]))

#-------------------------------------------------------
# from locl to ...
#-------------------------------------------------------
def locl2lorl(locl):
    return [list(row) for row in map(None, *locl)]

def locl2lord(locl, colkeys):
    # better way?
    return lorl2lord(locl2lorl(locl), colkeys)

def locl2locd(locl, rowkeys):
    return [dict(zip(rowkeys, col)) for col in locl]

def locl2dorl(locl, rowkeys):
    # better way?
    return dict(zip(rowkeys, [row for row in locl2lorl(locl)]))

def locl2docl(locl, colkeys):
    return dict(zip(colkeys, locl))

def locl2dord(locl, rowkeys, colkeys):
    # better way?
    return dict(zip(rowkeys, [dict(zip(colkeys, row))
                              for row in locl2lorl(locl)]))

def locl2docd(locl, rowkeys, colkeys):
    return dict(zip(colkeys, [dict(zip(rowkeys, col))
                              for col in locl]))

#-------------------------------------------------------
# from lord to ...
#-------------------------------------------------------
def lord2lorl(lord, colkeys):
    return [[row[key] for key in colkeys if key in row]
            for row in lord]

def lord2locl(lord, colkeys):
    # better way?
    return lorl2locl(lord2lorl(lord, colkeys))

def lord2locd(lord, rowkeys, colkeys):
    return [dict([(rkey, row[ckey])
                  for rkey, row in zip(rowkeys, lord) if ckey in row])
            for ckey in colkeys]

def lord2dorl(lord, rowkeys, colkeys):
    return dict(zip(rowkeys, [[row[ckey]
                               for ckey in colkeys if ckey in row]
                              for row in lord]))

def lord2docl(lord, colkeys):
    return dict(zip(colkeys, [[row[ckey]
                               for row in lord if ckey in row]
                              for ckey in colkeys]))

def lord2dord(lord, rowkeys):
    return dict(zip(rowkeys, lord))

def lord2docd(lord, rowkeys, colkeys):
    return dict(zip(colkeys,
                    [dict(zip(rowkeys,
                              [row[ckey]
                               for row in lord if ckey in row]))
                     for ckey in colkeys]))

#-------------------------------------------------------
# from locd to ...
#-------------------------------------------------------
def locd2lorl(locd, rowkeys):
    # better way?
    return locl2lorl(locd2locl(locd, rowkeys))

def locd2locl(locd, rowkeys):
    return [[col[key] for key in rowkeys if key in col]
            for col in locd]

def locd2lord(locd, rowkeys, colkeys):
    return [dict([(ckey, col[rkey])
                  for ckey, col in zip(colkeys, locd) if rkey in col])
            for rkey in rowkeys]

def locd2dorl(locd, rowkeys):
    return dict(zip(rowkeys, [[col[rkey]
                               for col in locd if rkey in col]
                              for rkey in rowkeys]))

def locd2docl(locd, rowkeys, colkeys):
    return dict(zip(colkeys, [[col[rkey]
                               for rkey in rowkeys if rkey in col]
                              for col in locd]))

def locd2dord(locd, rowkeys, colkeys):
    return dict(zip(rowkeys,
                    [dict(zip(colkeys,
                              [col[rkey]
                               for col in locd if rkey in col]))
                     for rkey in rowkeys]))

def locd2docd(locd, colkeys):
    return dict(zip(colkeys, locd))

#-------------------------------------------------------
# from dorl to ...
#-------------------------------------------------------
def dorl2lorl(dorl, rowkeys):
    return [dorl[key] for key in rowkeys]

def dorl2locl(dorl, rowkeys):
    # better way?
    return lorl2locl(dorl2lorl(dorl, rowkeys))

def dorl2lord(dorl, rowkeys, colkeys):
    return [dict(zip(colkeys, dorl[rkey]))
            for rkey in rowkeys]

def dorl2locd(dorl, rowkeys):
    # better way?
    return locl2locd(lorl2locl(dorl2lorl(dorl, rowkeys)), rowkeys)

def dorl2docl(dorl, rowkeys, colkeys):
    # better way?
    return locl2docl(lorl2locl(dorl2lorl(dorl, rowkeys)), colkeys)

def dorl2dord(dorl, rowkeys, colkeys):
    # better way?
    return lorl2dord(dorl2lorl(dorl, rowkeys), rowkeys, colkeys)

def dorl2docd(dorl, rowkeys, colkeys):
    # better way?
    return locl2docd(lorl2locl(dorl2lorl(dorl, rowkeys)),
                     rowkeys, colkeys)

#-------------------------------------------------------
# from docl to ...
#-------------------------------------------------------
def docl2lorl(docl, colkeys):
    # better way?
    return locl2lorl(docl2locl(docl, colkeys))

def docl2locl(docl, colkeys):
    return [docl[key] for key in colkeys]

def docl2lord(docl, rowkeys, colkeys):
    # better way?
    return lorl2lord(locl2lorl(docl2locl(docl, colkeys)), colkeys)

def docl2locd(docl, rowkeys, colkeys):
    return [dict(zip(rowkeys, docl[ckey]))
            for ckey in colkeys]

def docl2dorl(docl, rowkeys, colkeys):
    # better way?
    return lorl2dorl(locl2lorl(docl2locl(docl, colkeys)), rowkeys)

def docl2dord(docl, rowkeys, colkeys):
    # better way?
    return lorl2dord(locl2lorl(docl2locl(docl, colkeys)),
                     rowkeys, colkeys)

def docl2docd(docl, rowkeys, colkeys):
    # better way?
    return locl2docd(docl2locl(docl, colkeys), rowkeys, colkeys)

#-------------------------------------------------------
# from dord to ...
#-------------------------------------------------------
def dord2lorl(dord, rowkeys, colkeys):
    return [[dord[rkey][ckey]
             for ckey in colkeys if ckey in dord[rkey]]
            for rkey in rowkeys if rkey in dord]

def dord2locl(dord, rowkeys, colkeys):
    # better way?
    return lorl2locl(dord2lorl(dord, rowkeys, colkeys))

def dord2lord(dord, rowkeys):
    return [dord[rkey] for rkey in rowkeys]

def dord2locd(dord, rowkeys, colkeys):
    # better way?
    return lord2locd(dord2lord(dord, rowkeys), rowkeys, colkeys)

def dord2dorl(dord, rowkeys, colkeys):
    # don't need zip
    return dict([(rkey, [dord[rkey][ckey]
                         for ckey in colkeys if ckey in dord[rkey]])
                 for rkey in rowkeys])

def dord2docl(dord, rowkeys, colkeys):
    # better way?
    return locl2docl(lorl2locl(dord2lorl(dord, rowkeys, colkeys)),
                     colkeys)

def dord2docd(dord, rowkeys, colkeys):
    # better way?
    return locl2docd(lorl2locl(dord2lorl(dord, rowkeys, colkeys)),
                     rowkeys, colkeys)

#-------------------------------------------------------
# from docd to ...
#-------------------------------------------------------
def docd2lorl(docd, rowkeys, colkeys):
    # better way?
    return locl2lorl(docd2locl(docd, rowkeys, colkeys))

def docd2locl(docd, rowkeys, colkeys):
    return [[docd[ckey][rkey]
             for rkey in rowkeys if rkey in docd[ckey]]
            for ckey in colkeys if ckey in docd]

def docd2lord(docd, rowkeys, colkeys):
    # better way?
    return locd2lord(docd2locd(docd, colkeys), rowkeys, colkeys)

def docd2locd(docd, colkeys):
    return [docd[ckey] for ckey in colkeys]

def docd2dorl(docd, rowkeys, colkeys):
    # better way?
    return lorl2dorl(locl2lorl(docd2locl(docd, rowkeys, colkeys)),
                     rowkeys)

def docd2docl(docd, rowkeys, colkeys):
    # don't need zip
    return dict([(ckey, [docd[ckey][rkey]
                         for rkey in rowkeys if rkey in docd[ckey]])
                 for ckey in colkeys])

def docd2dord(docd, rowkeys, colkeys):
    # better way?
    return lorl2dord(locl2lorl(docd2locl(docd, rowkeys, colkeys)),
                     rowkeys, colkeys)</pre>
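<p>The library above is written for Python 2, where <code>map(None, ...)</code>
pads the shorter sequences with <code>None</code>. Python 3 no longer accepts
<code>None</code> as the function argument to <code>map</code>. The following is
a minimal sketch, assuming Python 3, of how two of the functions might be
written with <code>itertools.zip_longest</code> instead. It is an illustration
using the sample data from this post, not a port of the whole library.</p>

```python
from itertools import zip_longest


def lorl2locl(lorl):
    """Transpose a list of rows into a list of columns, padding with None."""
    return [list(col) for col in zip_longest(*lorl)]


def lorl2docl(lorl, colkeys):
    """Convert a list of rows into a dict of columns keyed by colkeys."""
    return dict(zip(colkeys, lorl2locl(lorl)))


lorl = [
    ['a1', 'b1', 'c1'],  # row 1
    ['a2', 'b2', 'c2'],  # row 2
    ['a3', 'b3', 'c3'],  # row 3
    ['a4', 'b4'],        # row 4 (shorter than the others)
]
locl = lorl2locl(lorl)
docl = lorl2docl(lorl, ['a', 'b', 'c'])
print(locl[2])    # ['c1', 'c2', 'c3', None]
print(docl['a'])  # ['a1', 'a2', 'a3', 'a4']
```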
Iterating through two lists in parallel using zip()
2007-12-19T17:33:00-08:00https://www.saltycrane.com/blog/2007/12/iterating-through-two-lists-in-parallel/<p>From
the <a href="http://docs.python.org/lib/built-in-funcs.html">Python docs</a>, <code>zip</code> <em>returns
a list of tuples, where the i-th tuple contains the i-th element
from each of the argument sequences or iterables.</em> This is useful
for iterating over two lists in parallel. For example, if I have two
lists, I can get the first element of both lists, then the
second element of both lists, then the third, etc.</p>
<pre>Python 2.5.1 (r251:54863, May 18 2007, 16:56:43)
[GCC 3.4.4 (cygming special, gdc 0.12, using dmd 0.125)] on cygwin
Type "help", "copyright", "credits" or "license" for more information.
>>> a = [1,2,3]
>>> b = ['a','b','c']
>>> for i,j in zip(a,b):
... print i, j
...
1 a
2 b
3 c
>>> </pre>
<p>If the lists are different lengths, <code>zip</code> truncates to
the length of the shortest list. Using <code>map</code>
with <code>None</code> is similar to <code>zip</code> except the
results are padded with <code>None</code>.</p>
<pre>>>> a = [1,2,3]
>>> b = ['a','b','c','d']
>>> zip(a,b)
[(1, 'a'), (2, 'b'), (3, 'c')]
>>> map(None,a,b)
[(1, 'a'), (2, 'b'), (3, 'c'), (None, 'd')]
>>> </pre>
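<p>The session above is Python 2. As a rough Python 3 equivalent (an
assumption about the reader's interpreter, not something the session covers):
<code>zip</code> returns an iterator rather than a list, and
<code>map(None, ...)</code> is no longer available, but
<code>itertools.zip_longest</code> gives the same padded result.</p>

```python
from itertools import zip_longest

a = [1, 2, 3]
b = ['a', 'b', 'c', 'd']

# zip stops at the shortest input; wrap it in list() to see the pairs.
truncated = list(zip(a, b))
print(truncated)  # [(1, 'a'), (2, 'b'), (3, 'c')]

# zip_longest pads the shorter input, with None by default.
padded = list(zip_longest(a, b))
print(padded)     # [(1, 'a'), (2, 'b'), (3, 'c'), (None, 'd')]

# A different padding value can be given with fillvalue.
padded_zero = list(zip_longest(a, b, fillvalue=0))
print(padded_zero[-1])  # (0, 'd')
```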
<p>If I have a list of keys and a list of values, I can create a
dictionary by passing the output of <code>zip</code>
to <code>dict</code>.</p>
<pre>>>> mykeys = ['a', 'b', 'c']
>>> myvalues = [1, 2, 3]
>>> dict(zip(mykeys, myvalues))
{'a': 1, 'c': 3, 'b': 2}
>>> </pre>
<br />See also this thread on the Python mailing list:
<a href="http://mail.python.org/pipermail/python-list/2002-May/thread.html#146810">
Iterating through two lists</a><br />
How to convert a dictionary of lists to a list of lists in Python
2007-12-10T14:09:00-08:00https://www.saltycrane.com/blog/2007/12/how-to-convert-dictionary-of-lists-to/<p>UPDATE: See my new post, <a href="http://www.saltycrane.com/blog/2007/12/tabular-data-structure-conversion-in/">
Tabular data structure conversion in Python</a> for an updated method which handles
non-rectangular data.</p>
<p>The functions below convert a rectangular dictionary of lists to a list of
lists. Each list in the dictionary must be the same length. Additionally, a
list of keys is required as an input argument to specify the desired ordering
of the columns in the returned list of lists. If this were not specified, the order
of the columns would be unknown since items in a dictionary are unordered.</p>
<p>The converted list of lists can contain either a list of rows or a list of
columns. The first two functions create a list of rows; the last two create
a list of columns. <em>(I consider each list in the dict of lists as a column,
and all items for a given index a row.)</em> </p>
<p>I also compare the <a href="http://en.wikipedia.org/wiki/Imperative_programming">
imperative</a>/<a href="http://en.wikipedia.org/wiki/Procedural_programming">
procedural</a> approach to the <a href="http://en.wikipedia.org/wiki/Declarative_programming">
declarative</a>/<a href="http://en.wikipedia.org/wiki/Functional_programming">
functional</a> approach. I like the declarative/functional approach because
it is so concise, and, I believe, a little faster as well.</p>
<pre class="python">#!/usr/bin/python

# IMPERATIVE/PROCEDURAL APPROACH
def byrow_imper(dol, keylist):
    """Converts a dictionary of lists to a list of lists using the
    values of the dictionary. Each list must be the same length.
       dol: dictionary of lists
       keylist: list of keys, ordered as desired
       Returns: a list of lists where the inner lists are rows.
         i.e. returns a list of rows. """
    lol = []
    for i in xrange(len(dol[keylist[0]])):
        row = []
        for key in keylist:
            row.append(dol[key][i])
        lol.append(row)
    return lol

# DECLARATIVE/FUNCTIONAL APPROACH
def byrow_decl(dol, keylist):
    """Converts a dictionary of lists to a list of lists using the
    values of the dictionary. Each list must be the same length.
       dol: dictionary of lists
       keylist: list of keys, ordered as desired
       Returns: a list of lists where the inner lists are rows.
         i.e. returns a list of rows. """
    return [[dol[key][i] for key in keylist]
            for i in xrange(len(dol[keylist[0]]))]

# IMPERATIVE/PROCEDURAL APPROACH
def bycol_imper(dol, keylist):
    """Converts a dictionary of lists to a list of lists using the
    values of the dictionary. Each list must be the same length.
       dol: dictionary of lists
       keylist: list of keys, ordered as desired
       Returns: a list of lists where the inner lists are columns.
         i.e. returns a list of columns. """
    lol = []
    for key in keylist:
        col = []
        for item in dol[key]:
            col.append(item)
        lol.append(col)
    return lol

# DECLARATIVE/FUNCTIONAL APPROACH
def bycol_decl(dol, keylist):
    """Converts a dictionary of lists to a list of lists using the
    values of the dictionary. Each list must be the same length.
       dol: dictionary of lists
       keylist: list of keys, ordered as desired
       Returns: a list of lists where the inner lists are columns.
         i.e. returns a list of columns. """
    return [[item for item in dol[key]] for key in keylist]

# TEST
if __name__ == "__main__":
    dol = {
        'a': ['a1', 'a2', 'a3'], # column a
        'b': ['b1', 'b2', 'b3'], # column b
        'c': ['c1', 'c2', 'c3'], # column c
    }
    keylist = ['a', 'b', 'c']
    print byrow_imper(dol, keylist)
    print byrow_decl(dol, keylist)
    print bycol_imper(dol, keylist)
    print bycol_decl(dol, keylist)
</pre>
<br />Results:<br />
<pre>[['a1', 'b1', 'c1'], ['a2', 'b2', 'c2'], ['a3', 'b3', 'c3']]
[['a1', 'b1', 'c1'], ['a2', 'b2', 'c2'], ['a3', 'b3', 'c3']]
[['a1', 'a2', 'a3'], ['b1', 'b2', 'b3'], ['c1', 'c2', 'c3']]
[['a1', 'a2', 'a3'], ['b1', 'b2', 'b3'], ['c1', 'c2', 'c3']]</pre>
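<p>Another way to get the list of rows is to collect the columns in key
order and transpose them with <code>zip</code>. The sketch below assumes
Python 3, and <code>byrow_zip</code> is a hypothetical name rather than one of
the functions above. Note that <code>zip</code> silently truncates to the
shortest column, so this also relies on the data being rectangular.</p>

```python
def byrow_zip(dol, keylist):
    """Return a list of rows from a dict of equal-length column lists.

    Hypothetical variant of byrow_decl that transposes with zip().
    """
    columns = [dol[key] for key in keylist]     # columns in the desired order
    return [list(row) for row in zip(*columns)]  # transpose columns to rows


dol = {
    'a': ['a1', 'a2', 'a3'],  # column a
    'b': ['b1', 'b2', 'b3'],  # column b
    'c': ['c1', 'c2', 'c3'],  # column c
}
rows = byrow_zip(dol, ['a', 'b', 'c'])
print(rows)
# [['a1', 'b1', 'c1'], ['a2', 'b2', 'c2'], ['a3', 'b3', 'c3']]
```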
How to convert a list of dictionaries to a list of lists in Python
2007-12-10T12:28:00-08:00https://www.saltycrane.com/blog/2007/12/how-to-convert-list-of-dictionaries-to/<p>UPDATE: See my new post, <a href="/blog/2007/12/tabular-data-structure-conversion-in-python/">
Tabular data structure conversion in Python</a> for an updated method which handles
non-rectangular data.</p>
<p>The functions below convert a rectangular list of dictionaries to a list of lists.
Each dictionary in the list must have the same keys. Additionally, a list of keys
is required as an input argument to specify the desired ordering of the
columns in the returned list of lists. If this were not specified, the order
of the columns would be unknown since items in a dictionary are unordered.</p>
<p>The converted list of lists can contain either a list of rows or a list of
columns. The first two functions create a list of rows; the last two create
a list of columns. <em>(I consider each dict in the list of dicts as a row,
and all values for a given key as a column.)</em> </p>
<p>I also compare the <a href="http://en.wikipedia.org/wiki/Imperative_programming">
imperative</a>/<a href="http://en.wikipedia.org/wiki/Procedural_programming">
procedural</a> approach to the <a href="http://en.wikipedia.org/wiki/Declarative_programming">
declarative</a>/<a href="http://en.wikipedia.org/wiki/Functional_programming">
functional</a> approach. I like the declarative/functional approach because
it is so concise, and, I believe, a little faster as well.</p>
<pre class="python">#!/usr/bin/python

def byrow_imper(lod, keylist):
    """Converts a list of dictionaries to a list of lists using the
    values of the dictionaries. Assumes that each dictionary has the
    same keys.
       lod: list of dictionaries
       keylist: list of keys, ordered as desired
       Returns: a list of lists where the inner lists are rows.
         i.e. returns a list of rows. """
    # imperative/procedural approach
    lol = []
    for row in lod:
        row2 = []
        for key in keylist:
            row2.append(row[key])
        lol.append(row2)
    return lol

def byrow_decl(lod, keylist):
    """Converts a list of dictionaries to a list of lists using the
    values of the dictionaries. Assumes that each dictionary has the
    same keys.
       lod: list of dictionaries
       keylist: list of keys, ordered as desired
       Returns: a list of lists where the inner lists are rows.
         i.e. returns a list of rows. """
    # declarative/functional approach
    return [[row[key] for key in keylist] for row in lod]

def bycol_imper(lod, keylist):
    """Converts a list of dictionaries to a list of lists using the
    values of the dictionaries. Assumes that each dictionary has the
    same keys.
       lod: list of dictionaries
       keylist: list of keys, ordered as desired
       Returns: a list of lists where the inner lists are columns.
         i.e. returns a list of columns. """
    # imperative/procedural approach
    lol = []
    for key in keylist:
        col = []
        for row in lod:
            col.append(row[key])
        lol.append(col)
    return lol

def bycol_decl(lod, keylist):
    """Converts a list of dictionaries to a list of lists using the
    values of the dictionaries. Assumes that each dictionary has the
    same keys.
       lod: list of dictionaries
       keylist: list of keys, ordered as desired
       Returns: a list of lists where the inner lists are columns.
         i.e. returns a list of columns. """
    # declarative/functional approach
    return [[row[key] for row in lod] for key in keylist]

if __name__ == "__main__":
    lod = [
        {'a':1, 'b':2, 'c':3},
        {'a':4, 'b':5, 'c':6},
        {'a':7, 'b':8, 'c':9},
    ]
    keylist = ['a', 'b', 'c']
    print byrow_imper(lod, keylist)
    print byrow_decl(lod, keylist)
    print bycol_imper(lod, keylist)
    print bycol_decl(lod, keylist)
</pre>
<br />Results:
<pre>[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
[[1, 4, 7], [2, 5, 8], [3, 6, 9]]
[[1, 4, 7], [2, 5, 8], [3, 6, 9]]</pre>
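<p>The functions above raise a <code>KeyError</code> if a dictionary is
missing one of the keys. As a minimal sketch of one way to relax that
(assuming Python 3; <code>byrow_padded</code> and its <code>default</code>
argument are hypothetical, not part of the code above),
<code>dict.get</code> can fill the gaps with a placeholder so every row keeps
the same length:</p>

```python
def byrow_padded(lod, keylist, default=None):
    """Return a list of rows, using default where a dict lacks a key.

    Hypothetical variant of byrow_decl for non-rectangular input.
    """
    return [[row.get(key, default) for key in keylist] for row in lod]


lod = [
    {'a': 1, 'b': 2, 'c': 3},
    {'a': 4, 'b': 5},          # no 'c' in this row
    {'a': 7, 'b': 8, 'c': 9},
]
rows = byrow_padded(lod, ['a', 'b', 'c'])
print(rows)  # [[1, 2, 3], [4, 5, None], [7, 8, 9]]
```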
How to sort a table by columns in Python
2007-12-05T18:19:00-08:00https://www.saltycrane.com/blog/2007/12/how-to-sort-table-by-columns-in-python/<p>I have a 2-dimensional table of data implemented as a list of lists
in Python. I would like to sort the data by an arbitrary
column. This is a common task with tabular data. For example,
Windows Explorer allows me to sort the list of files
by <em>Name</em>, <em>Size</em>, <em>Type</em>, or <em>Date
Modified</em>. I tried the code
from <a href="http://homework.nwsnet.de/snippets/view/4">this
article</a>, however, if there are duplicate entries in the column
being sorted, the duplicates are removed. This is not what I wanted,
so I did some further searching, and found a nice solution from
the <a href="http://wiki.python.org/moin/HowTo/Sorting">HowTo/Sorting
article</a> on the PythonInfo Wiki. This method also uses the
built-in <code>sorted()</code> function, as well
as the <code>key</code> parameter,
and <code>operator.itemgetter()</code>. (See section <a href="http://docs.python.org/lib/built-in-funcs.html">2.1</a>
and <a href="http://docs.python.org/lib/module-operator.html">6.7</a>
of the Python Library Reference for more information.) The following
code sorts the table by the second column (index 1). Note, Python
2.4 or later is required.
<pre class="python">import operator

def sort_table(table, col=0):
    return sorted(table, key=operator.itemgetter(col))

if __name__ == '__main__':
    mytable = (
        ('Joe', 'Clark', '1989'),
        ('Charlie', 'Babbitt', '1988'),
        ('Frank', 'Abagnale', '2002'),
        ('Bill', 'Clark', '2009'),
        ('Alan', 'Clark', '1804'),
    )
    for row in sort_table(mytable, 1):
        print row
</pre>
<br />Results:<br />
<pre>('Frank', 'Abagnale', '2002')
('Charlie', 'Babbitt', '1988')
('Joe', 'Clark', '1989')
('Bill', 'Clark', '2009')
('Alan', 'Clark', '1804')</pre>
<br />This works well, but I would also like the table to be sorted by
column 0 in addition to column 1. In this example, column 1 holds the
<em>Last Name</em> and column 0 holds the <em>First
Name</em>. I would like the table to be sorted first by <em>Last
Name</em>, and then by <em>First Name</em>. Here is the code to sort
the table by multiple columns. The <em>cols</em> argument is a tuple
specifying the columns to sort by. The first column to sort by is
listed first, the second second, and so on.
<pre class="python">import operator

def sort_table(table, cols):
    """ sort a table by multiple columns
        table: a list of lists (or tuple of tuples) where each inner list
               represents a row
        cols:  a list (or tuple) specifying the column numbers to sort by
               e.g. (1,0) would sort by column 1, then by column 0
    """
    for col in reversed(cols):
        table = sorted(table, key=operator.itemgetter(col))
    return table

if __name__ == '__main__':
    mytable = (
        ('Joe', 'Clark', '1989'),
        ('Charlie', 'Babbitt', '1988'),
        ('Frank', 'Abagnale', '2002'),
        ('Bill', 'Clark', '2009'),
        ('Alan', 'Clark', '1804'),
    )
    for row in sort_table(mytable, (1,0)):
        print row
</pre>
<br />Results:<br />
<pre>('Frank', 'Abagnale', '2002')
('Charlie', 'Babbitt', '1988')
('Alan', 'Clark', '1804')
('Bill', 'Clark', '2009')
('Joe', 'Clark', '1989')</pre>
</p>
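<p>The multi-column version works because <code>sorted</code> is stable:
sorting by the least significant column first and the most significant column
last leaves ties in the order set by the earlier passes. When every column is
sorted in the same direction, a single pass with a multi-column key gives the
same result. The sketch below assumes Python 3 and a non-empty
<code>cols</code>; <code>sort_table_once</code> is a hypothetical name.</p>

```python
import operator


def sort_table_once(table, cols):
    """Sort a table by several columns in one pass.

    Hypothetical alternative to sort_table: itemgetter(*cols) builds a
    tuple of the selected columns, and tuples compare element by element.
    cols must contain at least one column number.
    """
    return sorted(table, key=operator.itemgetter(*cols))


mytable = (
    ('Joe', 'Clark', '1989'),
    ('Charlie', 'Babbitt', '1988'),
    ('Frank', 'Abagnale', '2002'),
    ('Bill', 'Clark', '2009'),
    ('Alan', 'Clark', '1804'),
)
result = sort_table_once(mytable, (1, 0))
for row in result:
    print(row)
```

<p>The repeated-sort approach is still the more flexible one if some columns
need to be sorted in reverse, since each pass can take its own
<code>reverse</code> setting.</p>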
How to copy Python lists or other objects
2007-11-29T17:23:00-08:00https://www.saltycrane.com/blog/2007/11/how-to-copy-python-lists-or-other/<p>This problem had me stumped for a while today. If I have a
list <code>a</code>, setting <code>b = a</code> doesn't make a copy of
the list <code>a</code>. Instead, it makes <code>b</code> a second reference
to the same list object, so a change made through either name is visible
through the other. For example, see the interactive Python session
below:
<pre>Python 2.5.1 (r251:54863, May 18 2007, 16:56:43)
[GCC 3.4.4 (cygming special, gdc 0.12, using dmd 0.125)] on cygwin
Type "help", "copyright", "credits" or "license" for more information.
>>> a = [1,2,3]
>>> b = a
>>> b
[1, 2, 3]
>>> a.append(4)
>>> a
[1, 2, 3, 4]
>>> b
[1, 2, 3, 4]
>>> </pre>
<br />Here is a quick reference extracted from <a href="http://www.oreilly.com/catalog/lpython/chapter/ch09.html">
Chapter 9 in <em>Learning Python, 1st Edition</em></a>.<br />
<br />To make a copy of a list, use the following:
<pre>newList = myList[:]
newList2 = list(myList2) # alternate method</pre>
<br />To make a copy of a dict, use the following:
<pre>newDict = myDict.copy()</pre>
<br />To make a copy of some other object, use the <code>copy</code>
module:
<pre>import copy
newObj = copy.copy(myObj) # shallow copy
newObj2 = copy.deepcopy(myObj2) # deep copy</pre>
<br />For more information on shallow and deep copies with
the <code>copy</code> module, see
the <a href="http://docs.python.org/lib/module-copy.html">Python docs</a>.
</p>
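<p>To illustrate the difference between a plain assignment, a shallow copy,
and a deep copy, here is a small sketch using a nested list. It assumes
Python 3 for the <code>print</code> calls, and the variable names are only for
illustration.</p>

```python
import copy

nested = [[1, 2], [3, 4]]
alias = nested                # same object, no copy made
shallow = copy.copy(nested)   # new outer list, inner lists are shared
deep = copy.deepcopy(nested)  # new outer list and new inner lists

nested.append([5, 6])         # change the outer list
nested[0].append(99)          # change one of the inner lists

print(alias)    # [[1, 2, 99], [3, 4], [5, 6]]
print(shallow)  # [[1, 2, 99], [3, 4]]
print(deep)     # [[1, 2], [3, 4]]
```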
Python circular buffer
2007-11-29T12:06:00-08:00https://www.saltycrane.com/blog/2007/11/python-circular-buffer/<p>Here is a simple circular buffer, or ring buffer, implementation in
Python. It is a first-in, first-out (FIFO) buffer with a fixed size.
<pre class="python">class RingBuffer:
    def __init__(self, size):
        self.data = [None for i in xrange(size)]

    def append(self, x):
        self.data.pop(0)
        self.data.append(x)

    def get(self):
        return self.data</pre>
Here is an example where the buffer size is 4. Ten integers, 0-9, are
inserted, one at a time, at the end of the buffer. On each
iteration, the oldest element is removed from the front of the buffer.
<pre>buf = RingBuffer(4)
for i in xrange(10):
    buf.append(i)
    print buf.get()</pre>
<br />Here are the results:
<pre>[None, None, None, 0]
[None, None, 0, 1]
[None, 0, 1, 2]
[0, 1, 2, 3]
[1, 2, 3, 4]
[2, 3, 4, 5]
[3, 4, 5, 6]
[4, 5, 6, 7]
[5, 6, 7, 8]
[6, 7, 8, 9]</pre>
<br />References:
<ul>
<li><a href="http://mail.python.org/pipermail/python-dev/2003-April/thread.html#34790">http://mail.python.org/pipermail/python-dev/2003-April/thread.html#34790</a></li>
<li><a href="http://www.wayforward.net/pycontract/examples/circbuf.py">http://www.wayforward.net/pycontract/examples/circbuf.py</a></li>
<li><a href="http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/68429">http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/68429</a></li>
<li><a href="http://docs.python.org/tut/node7.html#SECTION007120000000000000000">http://docs.python.org/tut/node7.html#SECTION007120000000000000000</a></li>
<li><a href="http://docs.python.org/lib/module-Queue.html">http://docs.python.org/lib/module-Queue.html</a></li>
<li><a href="http://en.wikipedia.org/wiki/Circular_queue">http://en.wikipedia.org/wiki/Circular_queue</a></li>
</ul>
</p>
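<p>On Python 2.6 or later, <code>collections.deque</code> accepts a
<code>maxlen</code> argument and discards items from the opposite end once it
is full, which avoids shifting the whole list with <code>pop(0)</code> on every
append. The following is a sketch of the same buffer built that way, assuming
Python 3; <code>DequeRingBuffer</code> is a hypothetical name, not the class
above.</p>

```python
from collections import deque


class DequeRingBuffer:
    """Fixed-size FIFO buffer backed by a bounded deque (illustrative sketch)."""

    def __init__(self, size):
        # Pre-fill with None so the buffer always holds exactly size items.
        self.data = deque([None] * size, maxlen=size)

    def append(self, x):
        # When the deque is full, appending drops the oldest item.
        self.data.append(x)

    def get(self):
        return list(self.data)


buf = DequeRingBuffer(4)
history = []
for i in range(10):
    buf.append(i)
    history.append(buf.get())

print(history[0])   # [None, None, None, 0]
print(history[-1])  # [6, 7, 8, 9]
```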
How to sort a Python dict (dictionary) by keys or values
2007-09-13T16:21:00-07:00https://www.saltycrane.com/blog/2007/09/how-to-sort-python-dictionary-by-keys/<p>
<em>Updated to work with both Python 2 and 3</em>
</p>
<h4 id="by-key">How to sort a dict by key</h4>
<pre class="python">
mydict = {
"carl": 40,
"alan": 2,
"bob": 1,
"danny": 3,
}
for key in sorted(mydict.keys()):
    print("%s: %s" % (key, mydict[key]))</pre
>
<p>Results:</p>
<pre>
alan: 2
bob: 1
carl: 40
danny: 3</pre
>
<p>
To sort the keys in reverse, add <code>reverse=True</code> as a keyword
argument to the
<a href="https://docs.python.org/3/library/functions.html#sorted"
><code>sorted</code></a
>
function.
</p>
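<p>
  As a small illustration of <code>reverse=True</code>, here is the same
  dict sorted by key in descending order. The data is repeated so the snippet
  runs on its own, and Python 3 is assumed.
</p>

```python
mydict = {
    "carl": 40,
    "alan": 2,
    "bob": 1,
    "danny": 3,
}

# Iterating over a dict yields its keys, so sorted(mydict) sorts the keys.
reversed_keys = sorted(mydict, reverse=True)
for key in reversed_keys:
    print("%s: %s" % (key, mydict[key]))
# danny: 3
# carl: 40
# bob: 1
# alan: 2
```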
<h4 id="by-value">How to sort a dict by value</h4>
<pre class="python">
for key, value in sorted(mydict.items(), key=lambda item: item[1]):
    print("%s: %s" % (key, value))</pre
>
<p>Results:</p>
<pre>
bob: 1
alan: 2
danny: 3
carl: 40</pre
>
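<p>
  If the sorted result needs to be kept rather than just printed, one option
  is to build a new dict from the sorted items. This sketch assumes Python 3.7
  or later, where dicts are guaranteed to preserve insertion order, and uses
  <code>operator.itemgetter(1)</code> in place of the lambda. The variable
  names are only for illustration.
</p>

```python
from operator import itemgetter

mydict = {
    "carl": 40,
    "alan": 2,
    "bob": 1,
    "danny": 3,
}

# itemgetter(1) picks the value out of each (key, value) pair.
by_value = dict(sorted(mydict.items(), key=itemgetter(1)))
print(by_value)  # {'bob': 1, 'alan': 2, 'danny': 3, 'carl': 40}

# Largest value first.
largest_first = sorted(mydict.items(), key=itemgetter(1), reverse=True)
print(largest_first[0])  # ('carl', 40)
```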
<p>
Originally taken from Nick Galbreath's Digital Sanitation Engineering blog
article
</p>
<h4 id="see-also">See also</h4>
<ul>
<li>
<a href="https://docs.python.org/3/library/functions.html#sorted">
<code>sorted</code> built-in function documentation
</a>
</li>
<li>
<a href="https://docs.python.org/3/howto/sorting.html">
Sorting HOW TO documentation
</a>
</li>
<li>
<a
href="https://docs.python.org/3/library/collections.html#collections.OrderedDict"
>
<code>collections.OrderedDict</code> documentation
</a>
</li>
<li>
<a
href="http://writeonly.wordpress.com/2008/08/30/sorting-dictionaries-by-value-in-python-improved/"
>Sorting Dictionaries by Value in Python (improved?)</a
>
by Gregg Lind August 30, 2008
</li>
</ul>