Python shenanigans.

The following shenanigans were collected during my work on a system integration library and framework for hardware and software tests, including developer tasks like worktree management, building, target deployment and configuration. They should therefore be representative of things one might want to do in the absence of better tooling. The Python version used was 3.8.2, but most problems still persist.

My tl;dr retrospection upfront, which you may be able to reproduce once you try to write long-running services that recover from all failures and cleanly reset state, or once you try to debug spurious problems in general: Please don’t.

Python by itself is great for prototyping up to a few hundred lines of code, for example to quickly receive or send some JSON over TCP/HTTP. However, it is infeasible to scale, for example when used as library code. Changes in a leaf function can add exceptions to higher-level code paths, and handling those via exceptions to produce user-friendly error messages, for example to collect context (information along multiple functions, for example from different combinations of traversal), becomes unreasonably verbose and error-prone. The alternative is C-like error handling, which requires figuring out all possible exceptions of Python standard library methods, which language servers do not support (as of 2024-04-04).

Aside from these more fundamental limitations, here is the list of shenanigans I have run into:

  • xml.dom.minidom breaks space and newlines. Use ElementTree.
  • .strip() is necessary after a file read, because readlines() and iteration over a file keep the trailing \n of every line. The only built-in way to read lines into a list without newlines is read().splitlines().
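A minimal sketch of reading a file into a list without trailing newlines, assuming the file fits into memory. The helper name read_lines is hypothetical.

```python
from pathlib import Path
from typing import List


def read_lines(path: str) -> List[str]:
    """Return the lines of a text file without trailing newline characters."""
    # str.splitlines() drops the line boundaries, unlike file.readlines().
    return Path(path).read_text(encoding="utf-8").splitlines()
```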
  • dict has no method to test for subdictionaries, and the available one-liners are unreadable
    def is_subdict(small: dict, big: dict) -> bool:
        """
        Test, if 'small' is a (shallow) subdict of 'big'
        Example: small = {'pl': {'key1': {'key2': 'value2'}}}
        Then big = {'pl': {'key1': {'key2': 'value2'}}, 'otherkey': ..} matches,
        but big = {'pl': {'key1': {'key2': 'value2', 'otherkey': ..}}}
        or big = {'pl': {'key1': {'key2': {'value2', 'otherkey', ..}}}} not.
        """
        # since python3.9:
        # return big | small == big
        # also (works for non-string keys as well):
        # return {**big, **small} == big
        # requires all keys of 'small' to be strings:
        return dict(big, **small) == big
    missing_dict_methods1.py
  • dict has no method to check whether the fields and values of a dictionary are contained in another dictionary, including nested ones
    def has_fieldsvals(small: dict, big: dict) -> bool:
        """
        Test, if 'big' has all fields and values of 'small' (recursively)
        Example: small = {'pl': {'key1': {'key2': 'value2'}}}
        Then big = {'pl': {'key1': {'key2': 'value2'}}, 'otherkey': ..} matches,
        big = {'pl': {'key1': {'key2': 'value2', 'otherkey': ..}}} matches,
        but big = {'pl': {'key1': {'key2': {'value2', 'otherkey', ..}}}} not.
        """
        for key, value in small.items():
            if key not in big:
                return False
            if isinstance(value, dict):
                # recurse into nested dicts; a non-dict counterpart is a mismatch
                if not isinstance(big[key], dict) or not has_fieldsvals(value, big[key]):
                    return False
            elif value != big[key]:
                return False
        # only report success after all keys were checked
        return True
    missing_dict_methods2.py
  • dict has no method to recursively merge two dictionaries and report conflicting values
    import copy
    from typing import List, Optional
    
    
    def merge_dicts(alpha: Optional[dict] = None, beta: Optional[dict] = None) -> dict:
      """
      Recursive merge dicts. Not multi-threading safe.
      """
      # avoid mutable default arguments, which are shared between calls
      alpha = {} if alpha is None else alpha
      beta = {} if beta is None else beta
      # deep copy, so that merging nested dicts does not modify 'alpha'
      return _merge_dicts_aux(alpha, beta, copy.deepcopy(alpha))
    
    
    def _merge_dicts_aux(alpha: dict, beta: dict, result: dict, path: Optional[List[str]] = None) -> dict:
      if path is None:
        path = []
      for key in beta:
        if key not in alpha:
          result[key] = beta[key]
        else:
          if isinstance(alpha[key], dict) and isinstance(beta[key], dict):
            # key value is dict in A and B => merge the dicts
            _merge_dicts_aux(alpha[key], beta[key], result[key], path + [str(key)])
          elif alpha[key] == beta[key]:
            # key value is same in A and B => ignore
            pass
          else:
            # key value differs in A and B => raise error
            err: str = f"Conflict at {'.'.join(path + [str(key)])}"
            raise Exception(err)
      return result
    
    missing_dict_methods3.py
  • Tuples and dicts are annoying to differentiate
    # dictionary
    dict1 = {"m1": "cp", "m2": "cp"}
    # tuple
    tup1 = ({"m1": "cp", "m2": "cp"},)
    
    # explicit constructors at least make the intention clear, but Python's error messages are still unhelpful
    dict2 = dict({"m1": "cp", "m2": "cp"})
    # tuple holding a tuple of the dict keys: (("m1", "m2"),)
    tup2 = (tuple({"m1": "cp", "m2": "cp"}),)
    
    tup_and_dicts.py
  • Stack trace formatting is inefficient and one can not use the gf or gF vim shortcuts to jump to the location. Function to write status + trace to a variable:
    import traceback
    
    
    def getStackTrace() -> str:
      return repr(traceback.format_stack())
    stacktrace_fmt.py
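A possible sketch of a more vim-friendly format, assuming that one "path:line" entry per stack frame is enough for gF to jump to the location when the cursor is on the path. The helper name format_stack_for_vim is hypothetical; traceback.extract_stack() is part of the standard library.

```python
import traceback


def format_stack_for_vim() -> str:
    """Return the current call stack, one 'path:line: in function' entry per line."""
    # The last frame is this helper itself, so it is dropped.
    frames = traceback.extract_stack()[:-1]
    return "\n".join(f"{f.filename}:{f.lineno}: in {f.name}" for f in frames)
```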
  • Mixed double quote (") and single quote (') strings are invalid JSON, and str() of a dict produces single quotes
    #!/usr/bin/env python
    import json
    
    # str(dict) is inconsistent with json.dumps(dict), so workaround with
    # dict_asjson_lower = str(dict1).replace("'", '"')
    # (breaks for values containing apostrophes, True, False or None)
    def combineDictsFromStr() -> None:
      dict1 = {"t1": "val1", "t2arr": [{"t2_int": 0, "t2_str": "12.0"}], "t3int": 30}
      dict1_str_raw = str(dict1)
      dict1_str = dict1_str_raw.replace("'", '"')
      dict2_str = '{"anotherone":"yes", '
      dict2_str += '"t3int":30,"t4str":'
      dict2_str += dict1_str + "}"
      dict2 = json.loads(dict2_str)
      _ = dict2
    
    invalid_json.py
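For comparison, a sketch that avoids string concatenation by nesting the dicts and serializing once with json.dumps, which always emits double-quoted strings. It assumes the dicts are available as Python objects rather than as pre-built strings. The function name combine_dicts is hypothetical.

```python
import json


def combine_dicts() -> dict:
    dict1 = {"t1": "val1", "t2arr": [{"t2_int": 0, "t2_str": "12.0"}], "t3int": 30}
    # Nest the dict instead of pasting its string representation.
    dict2 = {"anotherone": "yes", "t3int": 30, "t4str": dict1}
    # json.dumps produces valid JSON, so the round trip needs no quote replacement.
    text = json.dumps(dict2)
    return json.loads(text)
```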
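  • A process killed via os.kill() does not run cleanup functions registered with atexit.register(exit_cleanup), which also holds when os.kill() is called from daemon threads. One must store the PIDs of child processes and clean them up explicitly, or signal the main thread via
    import _thread
    
    def signalMainThread(self) -> None:
      # simulates SIGINT in the main thread (raises KeyboardInterrupt there)
      _thread.interrupt_main()
      # since Python 3.10 (signal needs a Python handler, so SIGKILL does not work):
      # _thread.interrupt_main(signum=signal.SIGTERM)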
    signal_main_thread.py
  • Socket timeout can cause the file-like readline() method to lose data. Workaround:
    1. Read chunk-wise from the socket (kernel buffer) and append to a buffer until a stop event (via select).
    2. After each read, try to read a line from the buffer and remove the line on success (being utf-8 etc).
    3. On failure of reading a line, continue with 1.
    4. Teardown should read the socket until it is empty, if a stop was obtained.
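The four steps above can be sketched as follows. This is not the original implementation: the class name LineReader, the 4096 byte chunk size and the choice to decode with errors="replace" instead of retrying on invalid utf-8 are all assumptions.

```python
import select
import socket
from typing import List, Optional


class LineReader:
    """Hypothetical helper: buffers raw bytes and hands out complete lines."""

    def __init__(self, sock: socket.socket) -> None:
        self.sock = sock
        self.buf = bytearray()

    def fill(self, timeout: float) -> bool:
        """Step 1: append one chunk to the buffer; False on timeout or EOF."""
        readable, _, _ = select.select([self.sock], [], [], timeout)
        if not readable:
            return False
        chunk = self.sock.recv(4096)
        self.buf += chunk
        return bool(chunk)

    def pop_line(self) -> Optional[str]:
        """Steps 2 and 3: return one complete line, or None to continue reading."""
        end = self.buf.find(b"\n")
        if end < 0:
            return None
        line = self.buf[:end].decode("utf-8", errors="replace")
        # Remove the line including its newline only after it was extracted.
        del self.buf[: end + 1]
        return line

    def drain(self) -> List[str]:
        """Step 4: read until nothing is pending and return the remaining lines."""
        while self.fill(timeout=0.0):
            pass
        lines = []
        while True:
            line = self.pop_line()
            if line is None:
                return lines
            lines.append(line)
```

A partial line stays in the buffer across timeouts, which is the property that the file-like readline() does not guarantee.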
  • Module annotations are barely documented by mypy. types.ModuleType works as annotation for any module, but restricting the annotation to a set of specific modules does not work, so a runtime check is the closest we can get.
    import types
    
    # 'module: ModuleType' restricted to a set of specific modules is not possible
    def check_fn(module: object) -> int:
      if not isinstance(module, types.ModuleType):
        return 1
      return 0
    
    module_annotation.py
  • There are no scheduling and watchdog methods, which makes Python thread scheduling very unreliable. Unlucky schedules may cause fatal delays when shuffling data between a daemon thread and the main thread. As an example, in an application using 1 main thread and 2 daemon threads, the relevant daemon thread may not be scheduled for 2 seconds. Empirically, waiting 3 seconds works.
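One way to tolerate such delays is to wait on a queue with a generous timeout instead of assuming the daemon thread runs promptly. The sketch below uses the empirical 3 seconds from above as default. The names producer and receive are hypothetical, and the producer stands in for whatever work the daemon thread does.

```python
import queue
import threading
from typing import Optional


def producer(out: "queue.Queue[str]") -> None:
    # Stand-in for the daemon thread's real work.
    out.put("payload")


def receive(timeout: float = 3.0) -> Optional[str]:
    """Start a daemon thread and wait up to 'timeout' seconds for its data."""
    data: "queue.Queue[str]" = queue.Queue()
    threading.Thread(target=producer, args=(data,), daemon=True).start()
    try:
        return data.get(timeout=timeout)
    except queue.Empty:
        # The daemon thread was not scheduled in time; the caller decides what to do.
        return None
```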
  • A trailing comma in a dictionary or in a string assembled around json.dumps output fails silently, for example when the output is parsed as JSON via PHP.
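A small guard against this, assuming the string is assembled in Python before it is handed to another consumer: json.loads rejects trailing commas, so parsing the string once catches the problem early. The helper name is_valid_json is hypothetical.

```python
import json


def is_valid_json(text: str) -> bool:
    """Return True if 'text' parses with Python's json module."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True
```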