CCTBX tools:
- I. Parallelizing Python code
- II. Analysis of unmerged intensities
Nathaniel Echols
DIALS workshop 3, February 2013
http://cci.lbl.gov/~nat/slides/dials_feb_2013.pdf
Parallelization methods in CCTBX
Multiprocessing : our tool
* pickle = Python object serialization format, represents objects as binary strings
* http://docs.python.org/2/library/multiprocessing.html
    from multiprocessing import Process, Queue

    def f(q):
        q.put([42, None, 'hello'])

    if __name__ == '__main__':
        q = Queue()
        p = Process(target=f, args=(q,))
        p.start()
        print(q.get())    # prints "[42, None, 'hello']"
        p.join()
advantage of internal fork() calls on Unix-like systems
avoiding pickling
most of the data (not passed as arguments!)
be in shared memory as long as they are not modified
start entirely new Python interpreter processes
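The point about fork() and data that is not passed as arguments can be illustrated with a minimal sketch that uses only the standard library (it does not use libtbx.easy_mp). The names BIG_TABLE, partial_sum and parallel_sum are hypothetical, and the "fork" start method is assumed to be available, which is only the case on Unix-like systems. The large list is created before the workers start, so each child inherits it through fork() and only the small per-chunk result is pickled on the way back.

```python
import multiprocessing

# Large read-only data built before any worker starts (hypothetical stand-in
# for something like an array of reflections).
BIG_TABLE = list(range(1000000))


def partial_sum(queue, start, stop):
    # BIG_TABLE is not an argument: the child sees it because fork() copied
    # the parent's address space. Only the small tuple below is pickled.
    queue.put((start, sum(BIG_TABLE[start:stop])))


def parallel_sum(n_chunks=4):
    ctx = multiprocessing.get_context("fork")  # assumption: Unix-like system
    queue = ctx.Queue()
    step = len(BIG_TABLE) // n_chunks
    bounds = [(i * step, (i + 1) * step) for i in range(n_chunks)]
    bounds[-1] = (bounds[-1][0], len(BIG_TABLE))  # last chunk takes the rest
    workers = [ctx.Process(target=partial_sum, args=(queue, lo, hi))
               for lo, hi in bounds]
    for worker in workers:
        worker.start()
    # Collect results before joining so no worker blocks on a full pipe.
    results = [queue.get() for _ in workers]
    for worker in workers:
        worker.join()
    return sum(value for _, value in results)
```

Calling parallel_sum() returns the same value as sum(BIG_TABLE). On a platform without fork(), the same code would need a start method that launches a new interpreter, and the data would have to be rebuilt or sent to each worker.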
class optimize_xyz_refinement_weight (object) :
  def __init__ (self, model, fmodel, params,
    self.model = model
    self.fmodel = fmodel
    self.params = params
    self.trial_results = []
    for weight in [0.1, 0.25, 0.5, 1.0, 2.0, 5.0] :
      self.trial_results.append(self.try_weight(weight))

  def try_weight (self, weight) :
    # function defined elsewhere; modifies objects in place
    minimize_coordinates(
      model=self.model,
      fmodel=self.fmodel,
      weight=weight,
      log=out)
    sites_cart = self.fmodel.xray_structure.sites_cart()
    return (self.fmodel.r_free(), weight, sites_cart)
class optimize_xyz_refinement_weight (object) :
  def __init__ (self, model, fmodel, params,
    self.model = model
    self.fmodel = fmodel
    self.params = params
    self.trial_results = libtbx.easy_mp.pool_map(
      fixed_func=self.try_weight,
      args=[0.1, 0.25, 0.5, 1.0, 2.0, 5.0],
      nproc=nproc)

  def try_weight (self, weight) :
    ...
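One consequence of running each trial in its own process is that in-place modifications stay inside that process. The sketch below is a toy analogue of the parallel class above, written with the standard library only rather than libtbx.easy_mp. The class WeightScan, its placeholder score and the use of the "fork" start method are all assumptions made for illustration.

```python
import multiprocessing


class WeightScan(object):
    """Hypothetical toy analogue of a weight-optimization class."""

    def __init__(self, sites, weights):
        self.sites = list(sites)
        ctx = multiprocessing.get_context("fork")  # assumption: Unix-like
        queue = ctx.Queue()
        workers = [ctx.Process(target=self._worker, args=(queue, weight))
                   for weight in weights]
        for worker in workers:
            worker.start()
        # Collect one result per worker, then wait for the workers to exit.
        results = [queue.get() for _ in workers]
        for worker in workers:
            worker.join()
        self.trial_results = sorted(results)  # best (lowest) score first

    def _worker(self, queue, weight):
        queue.put(self.try_weight(weight))

    def try_weight(self, weight):
        # Modifies self.sites in place, but only in this child's copy.
        for i in range(len(self.sites)):
            self.sites[i] *= weight
        score = abs(sum(self.sites) - 6.0)  # made-up placeholder score
        return (score, weight)


scan = WeightScan(sites=[1.0, 2.0, 3.0], weights=[0.5, 1.0, 2.0])
best_score, best_weight = scan.trial_results[0]
```

After the scan, scan.sites in the parent is still [1.0, 2.0, 3.0], and every trial started from that same state. In a serial loop that modifies objects in place, each trial would instead start from whatever the previous trial left behind.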
class phaser_manager (object) :
  def __init__ (self, data_file) :
    self.data_file = data_file

  def __call__ (self, model) :
    # the actual implementation is elsewhere
    return run_phaser(self.data_file, model)

def run_all (data_file, models, method="multiprocessing", processes=1,
    qsub_command=None, callback=None) :
  phaser = phaser_manager(data_file)
  from libtbx.easy_mp import parallel_map
  return parallel_map(
    func=phaser,
    iterable=models,
    method=method,
    processes=processes,
    callback=callback,
    qsub_command=qsub_command)
* parallel_map does not have this limitation, but pool_map currently does - we will probably fix this in the near future
* http://www.phenix-online.org/newsletter
appears considerably more complex than simple queuing systems
Statistics by resolution bin:
d_max  d_min    #obs  #uniq  mult.   %comp      <I>  <I/sI>  r_mrg  r_meas  r_pim  cc1/2
28.53   3.77   15699   2254   6.96   99.87  78997.8    23.4  0.061   0.066  0.025  0.997
 3.77   2.99   15703   2182   7.20   99.95  47400.1    23.1  0.061   0.066  0.024  0.997
 2.99   2.61   15641   2172   7.20  100.00  17930.9    21.1  0.074   0.080  0.030  0.996
 2.61   2.37   15309   2138   7.16  100.00  10520.1    18.6  0.090   0.097  0.036  0.995
 2.37   2.21   15044   2146   7.01   99.95   9103.8    17.2  0.093   0.101  0.038  0.995
 2.20   2.07   14571   2145   6.79  100.00   6560.2    13.5  0.108   0.117  0.045  0.993
 2.07   1.97   13973   2135   6.54  100.00   5016.1    10.8  0.121   0.131  0.051  0.992
 1.97   1.89   13540   2141   6.32  100.00   3620.6     8.6  0.145   0.158  0.062  0.984
 1.88   1.81   13010   2104   6.18   99.95   2070.5     6.8  0.197   0.215  0.085  0.980
 1.81   1.75   12963   2140   6.06   99.49   1477.4     5.6  0.247   0.270  0.108  0.970
28.53   1.75  145453  21557   6.75   99.92  18672.0    14.9  0.073   0.079  0.030  0.998
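The r_mrg, r_meas and r_pim columns correspond to the standard merging R-factors, which can be sketched in plain Python as below. This is an illustration of the textbook definitions, not the CCTBX implementation: the function name merging_statistics and the input layout are hypothetical, and the choice to skip reflections observed only once is an assumption, since conventions differ between programs.

```python
from collections import defaultdict
from math import sqrt


def merging_statistics(observations):
    """Return R-merge, R-meas and R-pim for unmerged intensities.

    observations: iterable of (hkl, intensity) pairs in which
    symmetry-equivalent reflections already share the same hkl key.
    """
    groups = defaultdict(list)
    for hkl, intensity in observations:
        groups[hkl].append(intensity)

    num_merge = num_meas = num_pim = denominator = 0.0
    for intensities in groups.values():
        n = len(intensities)
        if n < 2:
            continue  # assumption: singly measured reflections are skipped
        mean = sum(intensities) / float(n)
        abs_dev = sum(abs(i - mean) for i in intensities)
        num_merge += abs_dev
        num_meas += sqrt(n / (n - 1.0)) * abs_dev    # multiplicity-corrected
        num_pim += sqrt(1.0 / (n - 1.0)) * abs_dev   # precision of the mean
        denominator += sum(intensities)

    return {
        "r_merge": num_merge / denominator,
        "r_meas": num_meas / denominator,
        "r_pim": num_pim / denominator,
    }


stats = merging_statistics([
    ((1, 0, 0), 100.0), ((1, 0, 0), 110.0), ((1, 0, 0), 90.0),
    ((0, 1, 0), 50.0), ((0, 1, 0), 50.0),
])
```

For a bin with uniform multiplicity n, these definitions give r_meas = r_mrg * sqrt(n/(n-1)) and r_pim = r_mrg * sqrt(1/(n-1)). The first row of the table is consistent with that: with multiplicity 6.96, 0.061 scales to about 0.066 and 0.025.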