lempel_ziv_complexity module
LempelZiv complexity for a binary sequence, in naive Python code.
 How to use it? From Python, it's easy:
from lempel_ziv_complexity import lempel_ziv_complexity Info: numba.jit seems to be available. s = '1001111011000010' lempel_ziv_complexity(s) # 1 / 0 / 01 / 11 / 10 / 110 / 00 / 010 8

Note: there is also a Cythonpowered version, for speedup, see :download:
lempel_ziv_complexity_cython.pyx
. 
MIT Licensed, (C) 20172019 Lilian Besson (Naereen) https://GitHub.com/Naereen/LempelZiv_Complexity
# * coding: utf8 * """LempelZiv complexity for a binary sequence, in naive Python code.  How to use it? From Python, it's easy: >>> from lempel_ziv_complexity import lempel_ziv_complexity Info: numba.jit seems to be available. >>> s = '1001111011000010' >>> lempel_ziv_complexity(s) # 1 / 0 / 01 / 11 / 10 / 110 / 00 / 010 8  Note: there is also a Cythonpowered version, for speedup, see :download:`lempel_ziv_complexity_cython.pyx`.  MIT Licensed, (C) 20172019 Lilian Besson (Naereen) https://GitHub.com/Naereen/LempelZiv_Complexity """ from __future__ import print_function __author__ = "Lilian Besson (Naereen)" __version__ = "0.2" from collections import OrderedDict def lempel_ziv_decomposition(sequence): r""" Manual implementation of the LempelZiv decomposition. It is defined as the number of different substrings encountered as the stream is viewed from begining to the end. As an example: >>> s = '1001111011000010' >>> lempel_ziv_decomposition(s) # 1 / 0 / 01 / 11 / 10 / 110 / 00 / 010 ['1', '0', '01', '11', '10', '110', '00', '010'] Marking in the different substrings the sequence complexity :math:`\mathrm{LempelZiv}(s) = 8`: :math:`s = 1 / 0 / 01 / 11 / 10 / 110 / 00 / 010`.  See the page https://en.wikipedia.org/wiki/LempelZiv_complexity for more details. Other examples: >>> lempel_ziv_decomposition('1010101010101010') ['1', '0', '10', '101', '01', '010', '1010'] >>> lempel_ziv_decomposition('1001111011000010000010') ['1', '0', '01', '11', '10', '110', '00', '010', '000'] >>> lempel_ziv_decomposition('100111101100001000001010') ['1', '0', '01', '11', '10', '110', '00', '010', '000', '0101']  Note: it is faster to give the sequence as a string of characters, like `'10001001'`, instead of a list or a numpy array.  Note: see this notebook for more details, comparison, benchmarks and experiments: https://Nbviewer.Jupyter.org/github/Naereen/LempelZiv_Complexity/Short_study_of_the_LempelZiv_complexity.ipynb  Note: there is also a Cythonpowered version, for speedup, see :download:`lempel_ziv_complexity_cython.pyx`. """ sub_strings = OrderedDict() n = len(sequence) ind = 0 inc = 1 while True: if ind + inc > len(sequence): break sub_str = sequence[ind : ind + inc] # print(sub_str, ind, inc) if sub_str in sub_strings: inc += 1 else: sub_strings[sub_str] = 0 ind += inc inc = 1 # print("Adding", sub_str) return list(sub_strings) def lempel_ziv_complexity(sequence): r""" Manual implementation of the LempelZiv complexity. It is defined as the number of different substrings encountered as the stream is viewed from begining to the end. As an example: >>> s = '1001111011000010' >>> lempel_ziv_complexity(s) # 1 / 0 / 01 / 11 / 10 / 110 / 00 / 010 8 Marking in the different substrings the sequence complexity :math:`\mathrm{LempelZiv}(s) = 8`: :math:`s = 1 / 0 / 01 / 11 / 10 / 110 / 00 / 010`.  See the page https://en.wikipedia.org/wiki/LempelZiv_complexity for more details. Other examples: >>> lempel_ziv_complexity('1010101010101010') # 1, 0, 10, 101, 01, 010, 1010 7 >>> lempel_ziv_complexity('1001111011000010000010') # 1, 0, 01, 11, 10, 110, 00, 010, 000 9 >>> lempel_ziv_complexity('100111101100001000001010') # 1, 0, 01, 11, 10, 110, 00, 010, 000, 0101 10  Note: it is faster to give the sequence as a string of characters, like `'10001001'`, instead of a list or a numpy array.  Note: see this notebook for more details, comparison, benchmarks and experiments: https://Nbviewer.Jupyter.org/github/Naereen/LempelZiv_Complexity/Short_study_of_the_LempelZiv_complexity.ipynb  Note: there is also a Cythonpowered version, for speedup, see :download:`lempel_ziv_complexity_cython.pyx`. """ sub_strings = set() n = len(sequence) ind = 0 inc = 1 while True: if ind + inc > len(sequence): break sub_str = sequence[ind : ind + inc] if sub_str in sub_strings: inc += 1 else: sub_strings.add(sub_str) ind += inc inc = 1 return len(sub_strings) #  Debugging if __name__ == "__main__": # Code for debugging purposes. from doctest import testmod print("\nTesting automatically all the docstring written in each functions of this module :") testmod(verbose=True)
Functions
def lempel_ziv_complexity(
sequence)
Manual implementation of the LempelZiv complexity.
It is defined as the number of different substrings encountered as the stream is viewed from begining to the end. As an example:
s = '1001111011000010' lempel_ziv_complexity(s) # 1 / 0 / 01 / 11 / 10 / 110 / 00 / 010 8
Marking in the different substrings the sequence complexity :math:\mathrm{LempelZiv}(s) = 8
: :math:s = 1 / 0 / 01 / 11 / 10 / 110 / 00 / 010
.
 See the page https://en.wikipedia.org/wiki/LempelZiv_complexity for more details.
Other examples:
lempel_ziv_complexity('1010101010101010') # 1, 0, 10, 101, 01, 010, 1010 7 lempel_ziv_complexity('1001111011000010000010') # 1, 0, 01, 11, 10, 110, 00, 010, 000 9 lempel_ziv_complexity('100111101100001000001010') # 1, 0, 01, 11, 10, 110, 00, 010, 000, 0101 10
 Note: it is faster to give the sequence as a string of characters, like
'10001001'
, instead of a list or a numpy array.  Note: see this notebook for more details, comparison, benchmarks and experiments: https://Nbviewer.Jupyter.org/github/Naereen/LempelZiv_Complexity/Short_study_of_the_LempelZiv_complexity.ipynb
 Note: there is also a Cythonpowered version, for speedup, see :download:
lempel_ziv_complexity_cython.pyx
.
def lempel_ziv_complexity(sequence): r""" Manual implementation of the LempelZiv complexity. It is defined as the number of different substrings encountered as the stream is viewed from begining to the end. As an example: >>> s = '1001111011000010' >>> lempel_ziv_complexity(s) # 1 / 0 / 01 / 11 / 10 / 110 / 00 / 010 8 Marking in the different substrings the sequence complexity :math:`\mathrm{LempelZiv}(s) = 8`: :math:`s = 1 / 0 / 01 / 11 / 10 / 110 / 00 / 010`.  See the page https://en.wikipedia.org/wiki/LempelZiv_complexity for more details. Other examples: >>> lempel_ziv_complexity('1010101010101010') # 1, 0, 10, 101, 01, 010, 1010 7 >>> lempel_ziv_complexity('1001111011000010000010') # 1, 0, 01, 11, 10, 110, 00, 010, 000 9 >>> lempel_ziv_complexity('100111101100001000001010') # 1, 0, 01, 11, 10, 110, 00, 010, 000, 0101 10  Note: it is faster to give the sequence as a string of characters, like `'10001001'`, instead of a list or a numpy array.  Note: see this notebook for more details, comparison, benchmarks and experiments: https://Nbviewer.Jupyter.org/github/Naereen/LempelZiv_Complexity/Short_study_of_the_LempelZiv_complexity.ipynb  Note: there is also a Cythonpowered version, for speedup, see :download:`lempel_ziv_complexity_cython.pyx`. """ sub_strings = set() n = len(sequence) ind = 0 inc = 1 while True: if ind + inc > len(sequence): break sub_str = sequence[ind : ind + inc] if sub_str in sub_strings: inc += 1 else: sub_strings.add(sub_str) ind += inc inc = 1 return len(sub_strings)
def lempel_ziv_decomposition(
sequence)
Manual implementation of the LempelZiv decomposition.
It is defined as the number of different substrings encountered as the stream is viewed from begining to the end. As an example:
s = '1001111011000010' lempel_ziv_decomposition(s) # 1 / 0 / 01 / 11 / 10 / 110 / 00 / 010 ['1', '0', '01', '11', '10', '110', '00', '010']
Marking in the different substrings the sequence complexity :math:\mathrm{LempelZiv}(s) = 8
: :math:s = 1 / 0 / 01 / 11 / 10 / 110 / 00 / 010
.
 See the page https://en.wikipedia.org/wiki/LempelZiv_complexity for more details.
Other examples:
lempel_ziv_decomposition('1010101010101010') ['1', '0', '10', '101', '01', '010', '1010'] lempel_ziv_decomposition('1001111011000010000010') ['1', '0', '01', '11', '10', '110', '00', '010', '000'] lempel_ziv_decomposition('100111101100001000001010') ['1', '0', '01', '11', '10', '110', '00', '010', '000', '0101']
 Note: it is faster to give the sequence as a string of characters, like
'10001001'
, instead of a list or a numpy array.  Note: see this notebook for more details, comparison, benchmarks and experiments: https://Nbviewer.Jupyter.org/github/Naereen/LempelZiv_Complexity/Short_study_of_the_LempelZiv_complexity.ipynb
 Note: there is also a Cythonpowered version, for speedup, see :download:
lempel_ziv_complexity_cython.pyx
.
def lempel_ziv_decomposition(sequence): r""" Manual implementation of the LempelZiv decomposition. It is defined as the number of different substrings encountered as the stream is viewed from begining to the end. As an example: >>> s = '1001111011000010' >>> lempel_ziv_decomposition(s) # 1 / 0 / 01 / 11 / 10 / 110 / 00 / 010 ['1', '0', '01', '11', '10', '110', '00', '010'] Marking in the different substrings the sequence complexity :math:`\mathrm{LempelZiv}(s) = 8`: :math:`s = 1 / 0 / 01 / 11 / 10 / 110 / 00 / 010`.  See the page https://en.wikipedia.org/wiki/LempelZiv_complexity for more details. Other examples: >>> lempel_ziv_decomposition('1010101010101010') ['1', '0', '10', '101', '01', '010', '1010'] >>> lempel_ziv_decomposition('1001111011000010000010') ['1', '0', '01', '11', '10', '110', '00', '010', '000'] >>> lempel_ziv_decomposition('100111101100001000001010') ['1', '0', '01', '11', '10', '110', '00', '010', '000', '0101']  Note: it is faster to give the sequence as a string of characters, like `'10001001'`, instead of a list or a numpy array.  Note: see this notebook for more details, comparison, benchmarks and experiments: https://Nbviewer.Jupyter.org/github/Naereen/LempelZiv_Complexity/Short_study_of_the_LempelZiv_complexity.ipynb  Note: there is also a Cythonpowered version, for speedup, see :download:`lempel_ziv_complexity_cython.pyx`. """ sub_strings = OrderedDict() n = len(sequence) ind = 0 inc = 1 while True: if ind + inc > len(sequence): break sub_str = sequence[ind : ind + inc] # print(sub_str, ind, inc) if sub_str in sub_strings: inc += 1 else: sub_strings[sub_str] = 0 ind += inc inc = 1 # print("Adding", sub_str) return list(sub_strings)