=======
Process
=======

Pool
----

``Pool`` is the most common way to use multiprocessing. Here is an example:

.. code:: python

    from multiprocessing import Pool

    def f(x):
        return x*x

    if __name__ == '__main__':
        with Pool(5) as p:
            print(p.map(f, [1, 2, 3]))

There are several things worth mentioning:

* ``Pool()`` returns an object that is also a context manager, which is the most
  common way to use ``Pool``: ``__enter__()`` returns the pool object and
  ``__exit__()`` calls its ``terminate()`` method.
* ``Pool`` provides a ``map()`` method, a multiprocessing version of the
  built-in ``map()``.
* You need ``if __name__ == '__main__':`` to protect your main multiprocessing
  logic. There are more details about this later.

Process
-------

``Process`` is the raw approach to spawning a process.

.. code:: python

    from multiprocessing import Process

    def f(name):
        print('hello', name)

    if __name__ == '__main__':
        p = Process(target=f, args=('bob',))
        p.start()
        p.join()

You use ``start()`` to start a process, and ``join()`` makes the caller wait for
the process on which ``join()`` is called to finish; here the main process waits
for ``p`` to finish.

``join()`` and ``daemon``: in your main process,

* ``p.join()`` makes the main process wait until ``p`` has finished.
* ``p.daemon = True`` means that when the main process exits, the daemonized
  subprocess ``p`` is terminated as well.

Why if __name__ == '__main__':
------------------------------

Python multiprocessing supports two basic ways to start a process: ``spawn`` and
``fork``. Their relationship is simple:

.. code:: python

    spawn = fork + execve

**fork:**

* The child process, when it begins, is effectively identical to the parent
  process.
* All resources of the parent exist in the child as well, as independent copies
  that are identical at fork time.

**spawn:**

* Basically, the code is rerun, including module imports, at the start of each
  child process.
* It runs ``execve(path)``, which constructs a new process from ``path`` (a
  fresh interpreter); the objects the child needs are passed to it in pickled
  form.
* So the spawn method requires many things to be picklable.
* Modules are imported at the start of each child process, including the
  ``__main__`` module, since you need the function ``f`` in the new process.
  That is the main reason why you need ``if __name__ == '__main__':`` to prevent
  recursively spawning new processes.
* Therefore, the child process does NOT get variables defined inside the
  ``if __name__ == '__main__':`` block.
* The extra ``execve`` call makes this operation heavy.

.. note::

    ``int execve(const char *pathname, char *const argv[], char *const envp[])``

    ``execve()`` executes the program referred to by ``pathname``. This causes
    the program that is currently being run by the calling process to be
    replaced with a new program, with newly initialized stack, heap, and
    (initialized and uninitialized) data segments.

.. caution::

    Neither method copies running threads into the child process, so
    multithreaded processes are not friendly to multiprocessing.

Process Communication
---------------------

Queue
^^^^^

``Queue`` is a process- and thread-safe queue shared between processes: one
process can ``put()`` objects and another can ``get()`` them.

.. code:: python

    from multiprocessing import Process, Queue

    def f(q):
        q.put([42, None, 'hello'])

    if __name__ == '__main__':
        q = Queue()
        p = Process(target=f, args=(q,))
        p.start()
        print(q.get())    # prints "[42, None, 'hello']"
        p.join()
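A child process can also keep reading from a ``Queue`` until it receives a stop
marker, which is the same sentinel idea used with pipes in the next section. A
rough sketch (the function ``consume`` and the ``'DONE'`` sentinel below are
illustrative names, not anything required by ``Queue`` itself):

.. code:: python

    from multiprocessing import Process, Queue

    def consume(q):
        # Keep pulling items until the 'DONE' sentinel arrives
        while True:
            item = q.get()
            if item == 'DONE':
                break
            print('got', item)

    if __name__ == '__main__':
        q = Queue()
        p = Process(target=consume, args=(q,))
        p.start()
        for i in range(5):
            q.put(i)          # send work items to the child process
        q.put('DONE')         # tell the child process to stop
        p.join()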
Pipe
^^^^

.. code:: python

    from multiprocessing import Pipe

    p_output, p_input = Pipe()

You can think of ``p_output`` and ``p_input`` as two physical locations in
memory with a bridge built between them. So if there are copies ``p_output_cp``
and ``p_input_cp``, you can still call ``p_input.send()`` and receive the object
with ``p_output_cp.recv()``. Here is an example:

.. code:: python

    from multiprocessing import Process, Pipe
    import time

    def reader_proc(pipe):
        ## Read from the pipe; this will be spawned as a separate Process
        p_output, p_input = pipe
        p_input.close()    # We are only reading
        while True:
            msg = p_output.recv()    # Read from the output pipe and do nothing
            if msg == 'DONE':
                break

    def writer(count, p_input):
        for ii in range(0, count):
            p_input.send(ii)    # Write 'count' numbers into the input pipe
        p_input.send('DONE')

    if __name__ == '__main__':
        for count in [10**4, 10**5, 10**6]:
            # Pipes are unidirectional with two endpoints: p_input ------> p_output
            p_output, p_input = Pipe()
            # writer() writes to p_input from _this_ process
            reader_p = Process(target=reader_proc, args=((p_output, p_input),))
            reader_p.daemon = True
            reader_p.start()    # Launch the reader process

            p_output.close()    # We no longer need this part of the Pipe()
            _start = time.time()
            writer(count, p_input)    # Send a lot of stuff to reader_proc()
            p_input.close()
            reader_p.join()
            print("Sending {0} numbers to Pipe() took {1} seconds".format(
                count, (time.time() - _start)))

In this example, we pass the pipe as ``args`` when constructing the ``Process``.
You may notice that ``p_output`` is closed by the main process, yet
``p_output.recv()`` is called in ``reader_proc``. If you print the ``id`` of
those two ``p_output`` objects, you will find they are different: the child
process works on its own copy. ``p_input`` in ``writer`` can still send to that
copied ``p_output``. Passing the pipe (or one of its ends) in the ``args`` of
``Process`` gives the subprocess a built-in communication bridge, and it is a
very common pattern. The object sent must be picklable, and very large pickles
(approximately 32 MiB+, though it depends on the OS) may raise a ``ValueError``.

Synchronization
---------------

Using a ``Lock``:

.. code:: python

    from multiprocessing import Process, Lock
    from time import sleep

    def f(l, i):
        l.acquire()
        try:
            print('hello world', i)
            sleep(3)
        finally:
            l.release()

    if __name__ == '__main__':
        lock = Lock()
        for num in range(3):
            p = Process(target=f, args=(lock, num))
            p.start()
            p.join()

This example basically runs serially, one process at a time, because
``p.join()`` makes the main process wait for ``p`` to finish before the next
process is started.
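If you want the three processes to run concurrently while the lock still
protects the critical section, one common variation (a minimal sketch reusing
the same worker ``f``; the list name ``procs`` is just illustrative) is to start
all processes first and join them only afterwards:

.. code:: python

    from multiprocessing import Process, Lock
    from time import sleep

    def f(l, i):
        # Same worker as above: the lock serializes only this critical section
        l.acquire()
        try:
            print('hello world', i)
            sleep(3)
        finally:
            l.release()

    if __name__ == '__main__':
        lock = Lock()
        procs = [Process(target=f, args=(lock, num)) for num in range(3)]
        for p in procs:
            p.start()    # start all processes before waiting on any of them
        for p in procs:
            p.join()     # then wait for every process to finish

The lock still makes the critical sections execute one at a time, but all three
processes are alive concurrently and the main loop is not blocked by ``join()``
while creating them.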