dict-like interfaces for various databases
If you search on PyPI, you will find a few competing projects: pdicts, durabledicts, etc. The key differences between persistentdicts and those are:
- Implementations of persistentdicts do not keep a local cache dictionary: changes are written to the database immediately. Iterators also operate directly on the database. This makes it possible to work with datasets that would not fit in RAM.
- The test suite is significantly bigger than in the other implementations I have seen. Where relevant, tests have been backported from the CPython test suite for dict.
- Serialization is done in JSON rather than with pickle.
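The design points above can be illustrated with a minimal sketch built only on the standard library. `TinySqliteDict` is a hypothetical stand-in written for this example, not the code used by persistentdicts: it keeps no cache, writes every change straight to SQLite, iterates over database rows, and serializes with JSON.

```python
import json
import sqlite3
from collections.abc import MutableMapping


class TinySqliteDict(MutableMapping):
    """Hypothetical stand-in, NOT the persistentdicts implementation."""

    def __init__(self, path=":memory:"):
        # isolation_level=None puts sqlite3 in autocommit mode, so every
        # statement reaches the database immediately.
        self._conn = sqlite3.connect(path, isolation_level=None)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS dict (key TEXT PRIMARY KEY, value TEXT)"
        )

    def __getitem__(self, key):
        row = self._conn.execute(
            "SELECT value FROM dict WHERE key = ?", (json.dumps(key),)
        ).fetchone()
        if row is None:
            raise KeyError(key)
        # Deserializing builds a fresh object on every read.
        return json.loads(row[0])

    def __setitem__(self, key, value):
        self._conn.execute(
            "INSERT OR REPLACE INTO dict (key, value) VALUES (?, ?)",
            (json.dumps(key), json.dumps(value)),
        )

    def __delitem__(self, key):
        cur = self._conn.execute(
            "DELETE FROM dict WHERE key = ?", (json.dumps(key),)
        )
        if cur.rowcount == 0:
            raise KeyError(key)

    def __iter__(self):
        # Keys are streamed from the database, not from an in-memory copy.
        for (key,) in self._conn.execute("SELECT key FROM dict"):
            yield json.loads(key)

    def __len__(self):
        return self._conn.execute("SELECT COUNT(*) FROM dict").fetchone()[0]
```

Because each read deserializes the stored JSON again, two reads of the same key return equal but distinct objects.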
You can't modify a value of the dictionary in place. For example:
    >>> import persistentdicts
    >>> d = persistentdicts.sqlitedict.SqliteDict()
    >>> d["a"] = []
    >>> d["a"].append(1)
    >>> d["a"]  # with a normal dict, you would get [1]
    []
That is because d["a"] returns a copy of the database entry for the key "a", and not a reference to a Python object. Modifying this copy (with append) does not affect the database itself.
To circumvent this, you should do:
    >>> import persistentdicts
    >>> d = persistentdicts.sqlitedict.SqliteDict()
    >>> d["a"] = []
    >>> d["a"] = d["a"] + [1]
    >>> d["a"]
    [1]
Similarly, setdefault will not work as expected, since it does not return a reference to the stored value but a copy of this value.
    >>> import persistentdicts
    >>> d = persistentdicts.sqlitedict.SqliteDict()
    >>> d.setdefault("a", []).append(1)
    >>> d["a"]
    []
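One way to replace the setdefault idiom is an explicit read, modify, write-back helper. The sketch below is only an illustration: `JsonCopyDict` is a hypothetical in-memory stand-in that mimics the copy-on-read behaviour described above, and `append_to` is a helper name made up for this example, not part of persistentdicts.

```python
import json
from collections.abc import MutableMapping


class JsonCopyDict(MutableMapping):
    """Hypothetical stand-in: values are stored as JSON text, so every
    read returns a copy, as with the persistent dictionaries."""

    def __init__(self):
        self._rows = {}

    def __getitem__(self, key):
        return json.loads(self._rows[key])

    def __setitem__(self, key, value):
        self._rows[key] = json.dumps(value)

    def __delitem__(self, key):
        del self._rows[key]

    def __iter__(self):
        return iter(self._rows)

    def __len__(self):
        return len(self._rows)


def append_to(d, key, item):
    """Append item to the list stored under key and write the result back."""
    values = d.get(key, [])    # read: may be a copy of the stored value
    values = values + [item]   # modify: build the updated list
    d[key] = values            # write back: this is what reaches the store
    return values


d = JsonCopyDict()
d.setdefault("a", []).append(1)   # the append is lost, as described above
append_to(d, "a", 1)
append_to(d, "a", 2)
```

The same helper should work with any mapping that supports get and item assignment, since it never relies on a read returning a live reference.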
Done:
- kyotocabinet
- sqlite
- cassandra
Planned:
- leveldb
- redis
- memcachedb
- lightcloud
You can request new formats on the bug tracker.
persistentdicts.sqlitedict.SqliteDict(path=":memory:", table="dict", isolation_level="DEFERRED", *args, **kwargs)
- path is the path to the file where you wish to store the data.
- table is the table to use in this file.
- isolation_level is the isolation level used for all transactions. See the sqlite documentation for more details.
- the remaining arguments *args, **kwargs are used to fill the dictionary (like a normal dict).
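As a reminder of what "like a normal dict" covers, the sketch below shows the argument forms the built-in dict accepts for initial contents. It uses plain dictionaries only; `fill` is a hypothetical helper written for this example, and the assumption that the persistent dictionaries accept exactly the same forms rests on the description above.

```python
def fill(target, *args, **kwargs):
    """Hypothetical helper: populate target the way dict(*args, **kwargs) would."""
    target.update(*args, **kwargs)
    return target


# An iterable of (key, value) pairs.
from_pairs = fill({}, [("a", 1), ("b", 2)])

# An existing mapping, optionally combined with keyword arguments.
from_mapping = fill({}, {"a": 1}, b=2)

# Keyword arguments only.
from_kwargs = fill({}, a=1, b=2)
```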
persistentdicts.kyotocabinetdict.KyotoCabinetDict(path, *args, **kwargs)
- path is the path to the file where you wish to store the data. The file extension matters and will determine which format is going to be used internally (must be one of .kch, .kct, .kcd, .kcf or .kcx). See the kyotocabinet documentation for more details.
- the remaining arguments *args, **kwargs are used to fill the dictionary (like a normal dict).
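Since the extension selects the storage format, it can be convenient to validate a path before opening it. The following is a minimal sketch; `check_kyoto_path` and `KYOTO_EXTENSIONS` are hypothetical names introduced here, not part of persistentdicts or kyotocabinet, and the comparison is assumed to be case-sensitive.

```python
import os

# Extensions listed in the parameter description above.
KYOTO_EXTENSIONS = {".kch", ".kct", ".kcd", ".kcf", ".kcx"}


def check_kyoto_path(path):
    """Return the extension of path, or raise ValueError if it is not supported."""
    ext = os.path.splitext(path)[1]
    if ext not in KYOTO_EXTENSIONS:
        raise ValueError(
            "unsupported extension %r, expected one of %s"
            % (ext, ", ".join(sorted(KYOTO_EXTENSIONS)))
        )
    return ext
```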
persistentdicts.cassandradict.CassandraDict(contact_points=("127.0.0.1",), port=9042, keyspace="dict", table="dict", *args, **kwargs)
- contact_points is an initial list of IP addresses which are part of the Cassandra cluster. The Cassandra driver will automatically discover the rest of the cluster.
- port is the port on which Cassandra runs.
- keyspace is the keyspace used to store the data. This keyspace will be deleted if the method .delete() is called on the CassandraDict.
- table is the name of the table used to store the data.
- the remaining arguments *args, **kwargs are used to fill the dictionary (like a normal dict).