Comparing Multiprocessing and Multithreading Efficiency in Python
There is an unwritten rule in the Python community: CPU-bound tasks suit multiprocessing, IO-bound tasks suit multithreading. This post puts that rule to the test.
In general, multithreading has an edge over multiprocessing because spawning a process is relatively expensive. In CPython, however, the Global Interpreter Lock (GIL) means that for CPU-bound work, multiple threads effectively execute one at a time; with thread-switching overhead added on top, multithreaded code is often slower than plain single-threaded code. CPU-bound tasks therefore typically use multiprocessing instead: each process runs its own interpreter with its own GIL, so the processes do not interfere with one another.
In IO-bound tasks, by contrast, the CPU spends much of its time waiting while the operating system interacts with the outside world, such as reading and writing files or communicating over the network. The GIL is released during these blocking calls, so threads can genuinely run concurrently.
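The GIL release during blocking calls can be observed directly: four threads that each sleep for one second finish in about one second overall, not four. A minimal sketch (using time.sleep as the stand-in for IO, as in the benchmark below):

```python
import time
from threading import Thread

def blocked_io():
    time.sleep(1)  # a blocking call; CPython releases the GIL while waiting

start = time.time()
threads = [Thread(target=blocked_io) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.time() - start
print("4 sleeping threads finished in {:.2f}s".format(elapsed))  # ~1s, not ~4s
```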
That is the theory; below is a simple simulated test. Heavy computation is stood in for by math.sin() + math.cos(), and IO-bound work is simulated with time.sleep(). Python offers several ways to create processes and threads, so the test covers all of the following to see whether they differ in efficiency:
- Multiprocessing: joblib with the "multiprocessing" backend, multiprocessing.Pool.map, multiprocessing.Pool.apply_async, concurrent.futures.ProcessPoolExecutor
- Multithreading: joblib with the "threading" backend, threading.Thread, concurrent.futures.ThreadPoolExecutor
```python
from multiprocessing import Pool
from threading import Thread
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor
import time, os, math
from joblib import Parallel, delayed, parallel_backend

def f_IO(a):  # IO-bound task
    time.sleep(5)

def f_compute(a):  # CPU-bound task
    for _ in range(int(1e7)):
        math.sin(40) + math.cos(40)
    return

def normal(sub_f):
    for i in range(6):
        sub_f(i)
    return

def joblib_process(sub_f):
    with parallel_backend("multiprocessing", n_jobs=6):
        res = Parallel()(delayed(sub_f)(j) for j in range(6))
    return

def joblib_thread(sub_f):
    with parallel_backend("threading", n_jobs=6):
        res = Parallel()(delayed(sub_f)(j) for j in range(6))
    return

def mp(sub_f):
    with Pool(processes=6) as p:
        res = p.map(sub_f, list(range(6)))
    return

def asy(sub_f):
    with Pool(processes=6) as p:
        result = []
        for j in range(6):
            a = p.apply_async(sub_f, args=(j,))
            result.append(a)
        res = [j.get() for j in result]

def thread(sub_f):
    threads = []
    for j in range(6):
        t = Thread(target=sub_f, args=(j,))
        threads.append(t)
        t.start()
    for t in threads:
        t.join()

def thread_pool(sub_f):
    with ThreadPoolExecutor(max_workers=6) as executor:
        res = [executor.submit(sub_f, j) for j in range(6)]

def process_pool(sub_f):
    with ProcessPoolExecutor(max_workers=6) as executor:
        res = executor.map(sub_f, list(range(6)))

def showtime(f, sub_f, name):
    start_time = time.time()
    f(sub_f)
    print("{} time: {:.4f}s".format(name, time.time() - start_time))

def main(sub_f):
    showtime(normal, sub_f, "normal")
    print()
    print("------ multiprocessing ------")
    showtime(joblib_process, sub_f, "joblib multiprocess")
    showtime(mp, sub_f, "pool")
    showtime(asy, sub_f, "async")
    showtime(process_pool, sub_f, "process_pool")
    print()
    print("------ multithreading ------")
    showtime(joblib_thread, sub_f, "joblib thread")
    showtime(thread, sub_f, "thread")
    showtime(thread_pool, sub_f, "thread_pool")

if __name__ == "__main__":
    print("----- CPU-bound -----")
    sub_f = f_compute
    main(sub_f)
    print()
    print("----- IO-bound -----")
    sub_f = f_IO
    main(sub_f)
```
Results:

```
----- CPU-bound -----
normal time: 15.1212s

------ multiprocessing ------
joblib multiprocess time: 8.2421s
pool time: 8.5439s
async time: 8.3229s
process_pool time: 8.1722s

------ multithreading ------
joblib thread time: 21.5191s
thread time: 21.3865s
thread_pool time: 22.5104s

----- IO-bound -----
normal time: 30.0305s

------ multiprocessing ------
joblib multiprocess time: 5.0345s
pool time: 5.0188s
async time: 5.0256s
process_pool time: 5.0263s

------ multithreading ------
joblib thread time: 5.0142s
thread time: 5.0055s
thread_pool time: 5.0064s
```
Every method above creates 6 processes/threads. For the CPU-bound task the ranking is: multiprocessing > single process/thread > multithreading. For the IO-bound task it is: multithreading > multiprocessing > single process/thread, although the first two are nearly tied at about 5 seconds, with threads ahead only by the process-creation overhead they avoid. The choice of API within each family (joblib, multiprocessing, concurrent.futures) makes no meaningful difference.
That concludes this comparison of Python multiprocessing and multithreading efficiency.