===========================================
Multiprocessing async vs concurrent.futures
===========================================

Usually, a batch of tasks is processed synchronously: you only get the results after all of the tasks have finished. What if you want each value as soon as it is ready? There are two basic approaches, ``multiprocessing.Pool`` and ``concurrent.futures``, and both involve the concept of a Future.

If you don't understand what a Future is, see the Future section of this tutorial.


Examples:

Diff1: Both have callback mechanisms
------------------------------------

.. code:: python
  
  from multiprocessing import Pool

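  def worker_process(i):
    return i * i

  def print_result(result):
    # The callback receives the task's return value directly.
    print(result)

  def main():
    with Pool(4) as pool:
      for i in range(10):
        pool.apply_async(worker_process, args=(i,), callback=print_result)
      # Leaving the with block calls terminate(), so wait for the tasks first.
      pool.close()
      pool.join()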

  if __name__ == "__main__":
    main()


.. code:: python

  from concurrent.futures import ProcessPoolExecutor

  def worker_process(i):
    return i * i

  def print_result(future):
    # The callback receives the Future, not the return value.
    print(future.result())

  def main():
    with ProcessPoolExecutor(4) as ppe:
      futures = [ppe.submit(worker_process, i) for i in range(10)]
      for future in futures:
        future.add_done_callback(print_result)

  if __name__ == "__main__":
    main()

Note that in the ``concurrent.futures`` example the callback receives the *future* rather than the result, so it must call ``future.result()`` itself.

Diff2: Get results as they complete
-----------------------------------

You can use the ``as_completed`` function, which yields each future only once it has completed, to process results as they become available.

.. code:: python

  from concurrent.futures import ProcessPoolExecutor, as_completed

  def worker_process(i):
    return i * i

  def main():
    with ProcessPoolExecutor(4) as ppe:
      futures = [ppe.submit(worker_process, i) for i in range(10)]
      # as_completed yields each future as soon as it finishes,
      # not in submission order.
      for future in as_completed(futures):
        print(future.result())

  if __name__ == '__main__':
    main()

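With ``multiprocessing.Pool``, ``imap_unordered`` plays the same role: it yields results as they become available, in arbitrary order.

.. code:: python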

  import multiprocessing as mp

  def worker_process(i):
      return i * i # square the argument

  def compute_chunksize(pool_size, iterable_size):
      # Aim for about four chunks per worker, rounding up.
      if iterable_size == 0:
          return 1  # imap_unordered rejects a chunksize below 1
      chunksize, extra = divmod(iterable_size, pool_size * 4)
      if extra:
          chunksize += 1
      return chunksize

  def main():
      cpu_count = mp.cpu_count()
      N = 100
      chunksize = compute_chunksize(cpu_count, N)
      with mp.Pool() as pool:
          for result in pool.imap_unordered(worker_process, range(N), chunksize=chunksize):
              print(result)

  if __name__ == '__main__':
      main()


Diff3: Timing out on an incomplete result
-----------------------------------------

.. code:: python

  from concurrent.futures import ProcessPoolExecutor, TimeoutError
  from time import sleep


  def worker_1():
      while True:
          print('hanging')
          sleep(1)


  def main():
      with ProcessPoolExecutor(1) as pool:
          future = pool.submit(worker_1)
          try:
              future.result(3)  # wait at most 3 seconds; the task is not killed
          except TimeoutError:
              print('timeout')

  if __name__ == '__main__':
      main()
      print("return from main()")

.. code:: python

  from multiprocessing import Pool, TimeoutError
  from time import sleep


  def worker_1():
      while True:
          print('hanging')
          sleep(1)

  def main():
      with Pool(1) as pool:
          result = pool.apply_async(worker_1, args=())
          try:
              result.get(3)  # wait at most 3 seconds; the task keeps running
          except TimeoutError:
              print('timeout')


  if __name__ == '__main__':
      main()
      print("return from main()")

In the ``ProcessPoolExecutor`` version, ``future.result(3)`` raises ``TimeoutError`` after 3 seconds, but the task is not killed. The executor's context manager calls ``shutdown(wait=True)`` when the block exits, which waits for every worker to finish, so the program hangs and never prints ``return from main()``.

In the ``multiprocessing.Pool`` version, the timed-out task is also still running and tying up the process, but the ``with`` block is not prevented from exiting, so the program terminates normally. The context manager for the ``Pool`` instance calls ``terminate()`` when the block exits, which immediately stops all processes in the pool. If you rely on context managers to clean up the pool and a timeout is possible, the advantage would seem to go to ``multiprocessing.Pool``.
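
That a timeout only ends the wait, and not the task, can be checked with a task that does finish. The following is a minimal sketch; ``slow_square`` and ``wait_briefly`` are hypothetical helper names introduced here, not part of either library.

.. code:: python

  import time
  from concurrent.futures import ProcessPoolExecutor, TimeoutError


  def slow_square(i, delay=0.5):
      # Hypothetical worker: sleep for `delay` seconds, then square the argument.
      time.sleep(delay)
      return i * i


  def wait_briefly(future, timeout):
      # Wait at most `timeout` seconds for the future.
      # Returns True if the wait timed out, False if the result was ready.
      try:
          future.result(timeout)
      except TimeoutError:
          return True
      return False


  def main():
      with ProcessPoolExecutor(1) as pool:
          future = pool.submit(slow_square, 7)
          print(wait_briefly(future, 0.1))  # the wait gives up early
          print(future.done())              # reports whether the task has finished
          print(future.result())            # waiting without a timeout returns the value


  if __name__ == '__main__':
      main()

Because the task keeps running after the timeout, a second call to ``future.result()`` still returns its value once the task finishes.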


Diff4: Blocking call
--------------------

.. code:: python

  import multiprocessing as mp

  def worker_process(i):
      return i * i # square the argument

  def main():
      with mp.Pool() as pool:
          print(pool.apply(worker_process, args=(6,)))
          print(pool.apply(worker_process, args=(4,)))

  if __name__ == '__main__':
      main()


The ``multiprocessing.Pool`` class has an ``apply`` method that submits a task to the pool and blocks until the result is ready. Its return value is simply the return value of the worker function passed to ``apply``.

The ``concurrent.futures`` executors (``ProcessPoolExecutor`` and ``ThreadPoolExecutor``) have no equivalent. You have to call ``submit`` and then call ``result`` on the returned ``Future`` instance. This is not a hardship, but ``pool.apply`` is more convenient when a blocking task submission is appropriate.
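
The submit-then-result pattern can be wrapped in a small helper that behaves like ``pool.apply``. This is a minimal sketch; ``apply_blocking`` is a hypothetical name and not part of ``concurrent.futures``.

.. code:: python

  from concurrent.futures import ProcessPoolExecutor


  def worker_process(i):
      return i * i  # square the argument


  def apply_blocking(executor, fn, *args, **kwargs):
      # Hypothetical helper: submit one task and block until its
      # result is ready, much like pool.apply does for a Pool.
      return executor.submit(fn, *args, **kwargs).result()


  def main():
      with ProcessPoolExecutor() as executor:
          print(apply_blocking(executor, worker_process, 6))
          print(apply_blocking(executor, worker_process, 4))


  if __name__ == '__main__':
      main()

The helper only relies on ``submit``, so it should work with either executor class.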


Diff5: tie the return value back to the original passed argument
----------------------------------------------------------------

.. code:: python

  import concurrent.futures

  def worker_process(i):
      return i * i # square the argument

  def main():
      with concurrent.futures.ProcessPoolExecutor() as executor:
          futures = {executor.submit(worker_process, i): i for i in range(10)}
          for future in concurrent.futures.as_completed(futures):
              i = futures[future] # retrieve the value that was squared
              print(i, future.result())

  if __name__ == '__main__':
      main()

With ``multiprocessing``, results delivered by ``imap_unordered`` or a callback carry no reference to the submitted job, so there is no easy way to tie a result to its job unless the worker function returns the original call arguments along with the return value.
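
One way to apply that workaround, sketched below, is a thin wrapper that returns the argument together with the result. ``worker_with_arg`` is a hypothetical name introduced here for illustration.

.. code:: python

  import multiprocessing as mp


  def worker_process(i):
      return i * i  # square the argument


  def worker_with_arg(i):
      # Return the original argument alongside the result so the caller
      # can match them up even when results arrive out of order.
      return i, worker_process(i)


  def main():
      with mp.Pool() as pool:
          for i, result in pool.imap_unordered(worker_with_arg, range(10)):
              print(i, result)


  if __name__ == '__main__':
      main()

The cost of this approach is that each argument is sent back from the worker, which may matter when the arguments are large.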