• Stars: 210
• Rank: 187,585 (Top 4 %)
• Language: Python
• License: MIT License
• Created over 12 years ago
• Updated 3 months ago

Repository Details

A mutable set that remembers the order of its entries. One of Python's missing data types.

An OrderedSet is a mutable data structure that is a hybrid of a list and a set. It remembers the order of its entries, and every entry has an index number that can be looked up.

Installation

ordered_set is available on PyPI and packaged as a wheel. You can list it as a dependency of your project, in whatever form that takes.

To install it into your current Python environment:

pip install ordered-set

To install the code for development, after checking out the repository:

pip install flit
flit install

Usage examples

An OrderedSet is created and used like a set:

>>> from ordered_set import OrderedSet

>>> letters = OrderedSet('abracadabra')

>>> letters
OrderedSet(['a', 'b', 'r', 'c', 'd'])

>>> 'r' in letters
True

It is efficient to find the index of an entry in an OrderedSet, or find an entry by its index. To help with this use case, the .add() method returns the index of the added item, whether it was already in the set or not.

>>> letters.index('r')
2

>>> letters[2]
'r'

>>> letters.add('r')
2

>>> letters.add('x')
5

OrderedSets implement the union (|), intersection (&), and difference (-) operators like sets do.

>>> letters |= OrderedSet('shazam')

>>> letters
OrderedSet(['a', 'b', 'r', 'c', 'd', 'x', 's', 'h', 'z', 'm'])

>>> letters & set('aeiou')
OrderedSet(['a'])

>>> letters -= 'abcd'

>>> letters
OrderedSet(['r', 'x', 's', 'h', 'z', 'm'])

The __getitem__() and index() methods have been extended to accept any iterable except a string, returning a list, to perform NumPy-like "fancy indexing".

>>> letters = OrderedSet('abracadabra')

>>> letters[[0, 2, 3]]
['a', 'r', 'c']

>>> letters.index(['a', 'r', 'c'])
[0, 2, 3]

OrderedSet implements __getstate__ and __setstate__ so it can be pickled, and implements the abstract base classes collections.abc.MutableSet and collections.abc.Sequence.

OrderedSet can be used as a generic collection type, similar to the collections in the typing module like List, Dict, and Set. For example, you can annotate a variable as having the type OrderedSet[str] or OrderedSet[Tuple[int, str]].

OrderedSet in data science applications

An OrderedSet can be used as a bi-directional mapping between a sparse vocabulary and dense index numbers. As of version 3.1, it accepts NumPy arrays of index numbers as well as lists.

This combination of features makes OrderedSet a simple implementation of many of the things that pandas.Index is used for, and many of its operations are faster than the equivalent pandas operations.

For further compatibility with pandas.Index, get_loc (the pandas method for looking up a single index) and get_indexer (the pandas method for fancy indexing in reverse) are both aliases for index (which handles both cases in OrderedSet).

Authors

OrderedSet was implemented by Elia Robyn Lake (maiden name: Robyn Speer). Jon Crall contributed changes and tests to make it fit the Python set API. Roman Inflianskas added the original type annotations.

Comparisons

The original implementation of OrderedSet was a recipe posted to ActiveState Recipes by Raymond Hettinger, released under the MIT license.

Hettinger's implementation kept its content in a doubly-linked list referenced by a dict. As a result, looking up an item by its index was an O(N) operation, while deletion was O(1).

This version makes different trade-offs for the sake of efficient lookups. Its content is a standard Python list instead of a doubly-linked list. This provides O(1) lookups by index at the expense of O(N) deletion, as well as slightly faster iteration.
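
The trade-off can be illustrated with a stripped-down sketch of a list-plus-dict design. This is not the library's actual code; the class name ListBackedOrderedSet and its attribute names are made up for illustration, and only a few methods are shown.

```python
class ListBackedOrderedSet:
    """Hypothetical, simplified illustration of a list-backed ordered set."""

    def __init__(self, iterable=()):
        self.items = []   # position -> entry
        self.map = {}     # entry -> position
        for entry in iterable:
            self.add(entry)

    def add(self, entry):
        """Append the entry if it is new; return its index either way."""
        if entry not in self.map:
            self.map[entry] = len(self.items)
            self.items.append(entry)
        return self.map[entry]

    def __getitem__(self, index):
        return self.items[index]   # O(1) lookup by index

    def index(self, entry):
        return self.map[entry]     # O(1) lookup by entry

    def discard(self, entry):
        """O(N): later entries shift down, so their indices are renumbered."""
        if entry not in self.map:
            return
        position = self.map.pop(entry)
        del self.items[position]
        for later in self.items[position:]:
            self.map[later] -= 1


s = ListBackedOrderedSet('abracadabra')
s.discard('b')
```

Deleting an entry forces every later entry to be renumbered, which is where the O(N) cost of deletion comes from.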

In CPython 3.6, and as a language guarantee from Python 3.7 onward, the built-in dict type preserves insertion order. If you ignore the dictionary values, that also gives you a simple ordered set, with fast O(1) insertion, deletion, and membership testing, plus iteration in insertion order. However, dict does not provide the list-like random access features of OrderedSet. You would have to convert it to a list in O(N) to look up the index of an entry or look up an entry by its index.
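
A minimal sketch of the dict-based approach, using only the standard library, shows both the convenient operations and the point where a list conversion becomes necessary. The variable names are illustrative.

```python
# A plain dict used as an ordered set: keys are the entries, values are ignored.
seen = dict.fromkeys('abracadabra')

entries = list(seen)    # ['a', 'b', 'r', 'c', 'd']
has_r = 'r' in seen     # O(1) membership test
del seen['b']           # O(1) deletion
seen['x'] = None        # O(1) insertion at the end

# Random access needs an O(N) conversion to a list first.
as_list = list(seen)
third = as_list[2]
position_of_d = as_list.index('d')
```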
