Tutorial 5 - Using the profiler

This tutorial shows how to use the profiler to measure the performance of your code. We use the same pipeline as in the previous tutorial.

The profiler is a tool that measures the time spent in each function of your code. It is very useful for identifying bottlenecks and optimizing your code. It is not enabled by default; the simplest way to enable the default profiler is to use the register_default_profiler function, as shown later in the notebook.

Profiling consists of two steps:

1. Register the profiler and execute the desired pipeline. This generates several files in the current directory, each containing traces of the execution. Each file holds different information, depending on the plugins attached. Using the register_default_profiler function, two plugins are attached: one that measures Dask task time (along with other useful per-task information, such as data shape, type, size in MB, etc.) and a resource usage plugin (which measures GPU, CPU, memory, and IO usage).

2. Analyse the generated files. Some analyses are already implemented in the profiler module, but you can also implement your own. For now the following analyses are available: per-function time, per-task time, and per-worker task balance. They will be shown later in the notebook.

Enabling profiling

[1]:
import numpy as np

from dasf.datasets import make_blobs

n_samples = 100000
n_bins = 3

# Generate 3 blobs with 2 classes where the second blob contains
# half positive samples and half negative samples. Probability in this
# blob is therefore 0.5.
centers = [(-6, -6), (0, 0), (9, 1)]
X, y = make_blobs(n_samples=n_samples, centers=centers, shuffle=False, random_state=42)

np.save("X.npy", X)
np.save("y.npy", y)
[2]:
from dasf.datasets import DatasetArray
from dasf.datasets import DatasetLabeled


class MyMakeBlobs(DatasetLabeled):
    def __init__(self):
        super().__init__(name="My Own make_blobs()", download=False)

        # Let's assign the train and val data.
        self._train = DatasetArray(name="X", download=False, root="X.npy", chunks=(5000, 2))
        self._val = DatasetArray(name="y", download=False, root="y.npy", chunks=(5000,))

make_blobs = MyMakeBlobs()
[3]:
from dasf.transforms import Normalize

normalize = Normalize()
[4]:
from dasf.pipeline.executors import DaskPipelineExecutor

dask = DaskPipelineExecutor(local=True, use_gpu=True)
2023-08-15 11:55:48,568 - distributed.preloading - INFO - Creating preload: dask_cuda.initialize
2023-08-15 11:55:48,568 - distributed.preloading - INFO - Import preload module: dask_cuda.initialize
2023-08-15 11:55:48,568 - distributed.preloading - INFO - Creating preload: dask_cuda.initialize
2023-08-15 11:55:48,568 - distributed.preloading - INFO - Import preload module: dask_cuda.initialize
2023-08-15 11:55:48,673 - distributed.preloading - INFO - Creating preload: dask_cuda.initialize
2023-08-15 11:55:48,673 - distributed.preloading - INFO - Import preload module: dask_cuda.initialize
2023-08-15 11:55:48,752 - distributed.preloading - INFO - Creating preload: dask_cuda.initialize
2023-08-15 11:55:48,752 - distributed.preloading - INFO - Import preload module: dask_cuda.initialize
[5]:
from dasf.ml.cluster import KMeans
from dasf.ml.cluster import SOM

kmeans = KMeans(n_clusters=3, max_iter=100)
som = SOM(x=1, y=3, input_len=2, num_epochs=100)
[6]:
from dasf.transforms import PersistDaskData

persist_kmeans = PersistDaskData()
persist_som = PersistDaskData()
[7]:
from dasf.pipeline import Pipeline

pipeline = Pipeline("A KMeans and SOM Pipeline", executor=dask)

pipeline.add(normalize, X=make_blobs._train) \
        .add(kmeans.fit_predict, X=normalize) \
        .add(som.fit_predict, X=normalize) \
        .add(persist_kmeans, X=kmeans.fit_predict) \
        .add(persist_som, X=som.fit_predict) \
        .visualize()
[7]:
[Pipeline graph visualization: ../_images/tutorials_Tutorial_5_8_0.svg]

Once you have created your pipeline, we can attach the profiler to it. The profiler will collect the data and store it in a file (one for each plugin). The files are stored in the same directory as the notebook. The file name is controlled by the name parameter, followed by the .msgpack extension. If you do not provide the name parameter, it defaults to default.

The enable_nvtx parameter enables NVTX markers, which allow you to inspect the pipeline in NVIDIA Nsight Systems. This is only available on NVIDIA GPUs.

Finally, the add_time_suffix parameter adds the current time to the file name (useful for unique file names). In this way you can run the same pipeline multiple times and get different files. The file name will then be name_time.msgpack.

[8]:
from dasf.profile import register_default_profiler

register_default_profiler(
    pipeline,
    name="Tutorial_5_Profile",
    enable_nvtx=False,
    add_time_suffix=False
)
Registered worker plugin: Tutorial_5_Profile-TracePlugin
Registered resource plugin: Tutorial_5_Profile-ResourceMonitor
[9]:
%time pipeline.run()
[2023-08-15 11:55:50-0300] INFO - Beginning pipeline run for 'A KMeans and SOM Pipeline'
[2023-08-15 11:55:50-0300] INFO - Task 'DatasetArray.load': Starting task run...
[2023-08-15 11:55:50-0300] INFO - Task 'DatasetArray.load': Finished task run
[2023-08-15 11:55:50-0300] INFO - Task 'Normalize.transform': Starting task run...
[2023-08-15 11:55:50-0300] INFO - Task 'Normalize.transform': Finished task run
[2023-08-15 11:55:50-0300] INFO - Task 'KMeans.fit_predict': Starting task run...
[2023-08-15 11:55:55-0300] INFO - Task 'KMeans.fit_predict': Finished task run
[2023-08-15 11:55:55-0300] INFO - Task 'SOM.fit_predict': Starting task run...
[2023-08-15 11:56:21-0300] INFO - Task 'SOM.fit_predict': Finished task run
[2023-08-15 11:56:21-0300] INFO - Task 'PersistDaskData.transform': Starting task run...
[2023-08-15 11:56:21-0300] INFO - Task 'PersistDaskData.transform': Finished task run
[2023-08-15 11:56:21-0300] INFO - Task 'PersistDaskData.transform': Starting task run...
[2023-08-15 11:56:21-0300] INFO - Task 'PersistDaskData.transform': Finished task run
[2023-08-15 11:56:21-0300] INFO - Pipeline run successfully
CPU times: user 11.8 s, sys: 739 ms, total: 12.5 s
Wall time: 31.2 s

The resource monitor plugin runs indefinitely, until the program exits. As the database is buffered to reduce IO pressure, the data are flushed to the database only after 5000 entries (or when the program exits). We can shut down the executor to force the data to be flushed to the database.

[10]:
dask.shutdown()
2023-08-15 11:56:54,586 - distributed.active_memory_manager - WARNING - Tried retiring worker tcp://127.0.0.1:35871, but 14 tasks could not be moved as there are no suitable workers to receive them. The worker will not be retired.
2023-08-15 11:56:54,587 - distributed.active_memory_manager - WARNING - Tried retiring worker tcp://127.0.0.1:35148, but 14 tasks could not be moved as there are no suitable workers to receive them. The worker will not be retired.
2023-08-15 11:56:54,587 - distributed.active_memory_manager - WARNING - Tried retiring worker tcp://127.0.0.1:38191, but 16 tasks could not be moved as there are no suitable workers to receive them. The worker will not be retired.
2023-08-15 11:56:54,588 - distributed.active_memory_manager - WARNING - Tried retiring worker tcp://127.0.0.1:39400, but 14 tasks could not be moved as there are no suitable workers to receive them. The worker will not be retired.

Analysing the profile data

Once the code has run and the trace events have been collected, we can perform analyses to understand the performance of the code. The first step is to load the databases. register_default_profiler attaches two plugins: TracePlugin (which traces Dask tasks) and ResourceMonitorPlugin (which traces the resource usage of the workers). So there will be two files: Tutorial_5_Profile-ResourceMonitor-c096.msgpack and Tutorial_5_Profile-TracePlugin-c096.msgpack. We can load the events using the EventProfiler class (the same class used to record the events).

[2]:
from dasf.profile.profiler import EventProfiler
from dasf.profile import MultiEventDatabase

db1 = EventProfiler("Tutorial_5_Profile-ResourceMonitor-c096.msgpack")
db2 = EventProfiler("Tutorial_5_Profile-TracePlugin-c096.msgpack")

db1, db2
[2]:
(EventProfiler(database=FileDatabase at Tutorial_5_Profile-ResourceMonitor-c096.msgpack),
 EventProfiler(database=FileDatabase at Tutorial_5_Profile-TracePlugin-c096.msgpack))

Databases are collections of events. Each event is a dataclass that follows the Chrome Trace Event Format. The EventProfiler class is a wrapper around a list of events; it provides methods to add events and to write the database to a file.

We can iterate over the events in a database using the get_traces function, which returns an iterator over the events in the order they were recorded.

[4]:
for event in db1.get_traces():
    print(event)
    break

for event in db2.get_traces():
    print(event)
    break
InstantEvent(name='Resource Usage', timestamp=3869.01274743, phase='I', scope='g', process_id='c096', thread_id=None, args={'cpu': 40.2, 'memory': 2744729600, 'time': 1692111381.5415905, 'host_net_io.read_bps': 10685583.118998485, 'host_net_io.write_bps': 10673829.571746167, 'host_disk_io.read_bps': 0.0, 'host_disk_io.write_bps': 134589.30723739465, 'num_fds': 189, 'gpu_utilization': 0, 'gpu_memory_used': 3158966272})
CompleteEvent(name='Compute', timestamp=3837.884328029, duration=0.050518035888671875, phase='X', process_id='c096', thread_id='worker-c096-2', args={'key': 'getGPUs-42237764-1836-4be8-ae62-948ce02e1d6a', 'name': 'tGPUs-42237764-1836-4be8-ae62', 'state': 'memory', 'size': 280, 'shape': [], 'dtype': 'unknown', 'type': "<class 'list'>", 'dependencies': [], 'dependents': []})
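
Since each event follows the Chrome Trace Event Format, the collected traces can also be exported to a JSON file and inspected in chrome://tracing or Perfetto. This is not a built-in DASF feature, just a minimal sketch that assumes the attribute names shown in the printed events above (with timestamps and durations in seconds):

import json

# Export all events from both databases to a Chrome Trace Event Format file.
# Attribute names (timestamp, duration, phase, ...) follow the events printed
# above; the trace viewers expect timestamps and durations in microseconds.
trace_events = []
for event in list(db1.get_traces()) + list(db2.get_traces()):
    entry = {
        "name": event.name,
        "ph": event.phase,
        "ts": event.timestamp * 1e6,
        "pid": event.process_id,
        "tid": getattr(event, "thread_id", None) or 0,
        "args": getattr(event, "args", {}) or {},
    }
    if getattr(event, "duration", None) is not None:  # CompleteEvent ('X' phase)
        entry["dur"] = event.duration * 1e6
    if getattr(event, "scope", None) is not None:      # InstantEvent ('I' phase)
        entry["s"] = event.scope
    trace_events.append(entry)

with open("Tutorial_5_Profile.chrome-trace.json", "w") as f:
    json.dump({"traceEvents": trace_events}, f, default=str)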

The MultiEventDatabase allows us to traverse multiple databases at the same time (usually needed for analyses).

[5]:
database = MultiEventDatabase([db1, db2])
database
[5]:
MultiEventDatabase with 2 databases
[6]:
for event in database:
    print(event)
    break
InstantEvent(name='Resource Usage', timestamp=3869.01274743, phase='I', scope='g', process_id='c096', thread_id=None, args={'cpu': 40.2, 'memory': 2744729600, 'time': 1692111381.5415905, 'host_net_io.read_bps': 10685583.118998485, 'host_net_io.write_bps': 10673829.571746167, 'host_disk_io.read_bps': 0.0, 'host_disk_io.write_bps': 134589.30723739465, 'num_fds': 189, 'gpu_utilization': 0, 'gpu_memory_used': 3158966272})

Or generate a list with all the events:

[7]:
all_events = list(database)
all_events[:10]
[7]:
[InstantEvent(name='Resource Usage', timestamp=3869.01274743, phase='I', scope='g', process_id='c096', thread_id=None, args={'cpu': 40.2, 'memory': 2744729600, 'time': 1692111381.5415905, 'host_net_io.read_bps': 10685583.118998485, 'host_net_io.write_bps': 10673829.571746167, 'host_disk_io.read_bps': 0.0, 'host_disk_io.write_bps': 134589.30723739465, 'num_fds': 189, 'gpu_utilization': 0, 'gpu_memory_used': 3158966272}),
 InstantEvent(name='Resource Usage', timestamp=3869.018545822, phase='I', scope='g', process_id='c096', thread_id=None, args={'cpu': 0.0, 'memory': 2744729600, 'time': 1692111381.5495014, 'host_net_io.read_bps': 24242915.31342638, 'host_net_io.write_bps': 24415081.283950076, 'host_disk_io.read_bps': 0.0, 'host_disk_io.write_bps': 0.0, 'num_fds': 189, 'gpu_utilization': 0, 'gpu_memory_used': 3158966272}),
 InstantEvent(name='Resource Usage', timestamp=3869.118886256, phase='I', scope='g', process_id='c096', thread_id=None, args={'cpu': 69.9, 'memory': 2744729600, 'time': 1692111381.649612, 'host_net_io.read_bps': 30272623.59462569, 'host_net_io.write_bps': 28577096.880356316, 'host_disk_io.read_bps': 0.0, 'host_disk_io.write_bps': 10024124.945820678, 'num_fds': 189, 'gpu_utilization': 1, 'gpu_memory_used': 3158966272}),
 InstantEvent(name='Resource Usage', timestamp=3869.219099359, phase='I', scope='g', process_id='c096', thread_id=None, args={'cpu': 10.0, 'memory': 2744995840, 'time': 1692111381.7498834, 'host_net_io.read_bps': 3102557.5490633715, 'host_net_io.write_bps': 3103435.1666125604, 'host_disk_io.read_bps': 0.0, 'host_disk_io.write_bps': 0.0, 'num_fds': 193, 'gpu_utilization': 1, 'gpu_memory_used': 3158966272}),
 InstantEvent(name='Resource Usage', timestamp=3869.319312409, phase='I', scope='g', process_id='c096', thread_id=None, args={'cpu': 0.0, 'memory': 2744995840, 'time': 1692111381.850187, 'host_net_io.read_bps': 5343.775123098109, 'host_net_io.write_bps': 5622.927554901742, 'host_disk_io.read_bps': 0.0, 'host_disk_io.write_bps': 0.0, 'num_fds': 193, 'gpu_utilization': 0, 'gpu_memory_used': 3158966272}),
 InstantEvent(name='Resource Usage', timestamp=3869.418357631, phase='I', scope='g', process_id='c096', thread_id=None, args={'cpu': 0.0, 'memory': 2744995840, 'time': 1692111381.9493108, 'host_net_io.read_bps': 2703.691829652832, 'host_net_io.write_bps': 3147.58153302867, 'host_disk_io.read_bps': 0.0, 'host_disk_io.write_bps': 0.0, 'num_fds': 193, 'gpu_utilization': 0, 'gpu_memory_used': 3158966272}),
 InstantEvent(name='Resource Usage', timestamp=3869.518484708, phase='I', scope='g', process_id='c096', thread_id=None, args={'cpu': 10.0, 'memory': 2744995840, 'time': 1692111382.0494256, 'host_net_io.read_bps': 19927.11012718611, 'host_net_io.write_bps': 19927.11012718611, 'host_disk_io.read_bps': 0.0, 'host_disk_io.write_bps': 0.0, 'num_fds': 193, 'gpu_utilization': 0, 'gpu_memory_used': 3158966272}),
 InstantEvent(name='Resource Usage', timestamp=3869.620366242, phase='I', scope='g', process_id='c096', thread_id=None, args={'cpu': 60.1, 'memory': 2745241600, 'time': 1692111382.1493354, 'host_net_io.read_bps': 2815376.423511278, 'host_net_io.write_bps': 2813654.866414532, 'host_disk_io.read_bps': 0.0, 'host_disk_io.write_bps': 0.0, 'num_fds': 193, 'gpu_utilization': 0, 'gpu_memory_used': 3194617856}),
 InstantEvent(name='Resource Usage', timestamp=3869.721167103, phase='I', scope='g', process_id='c096', thread_id=None, args={'cpu': 70.0, 'memory': 2745511936, 'time': 1692111382.2492902, 'host_net_io.read_bps': 3200509.781330604, 'host_net_io.write_bps': 3200509.781330604, 'host_disk_io.read_bps': 0.0, 'host_disk_io.write_bps': 0.0, 'num_fds': 193, 'gpu_utilization': 0, 'gpu_memory_used': 3194617856}),
 InstantEvent(name='Resource Usage', timestamp=3869.821650885, phase='I', scope='g', process_id='c096', thread_id=None, args={'cpu': 79.1, 'memory': 2745782272, 'time': 1692111382.350485, 'host_net_io.read_bps': 3472817.2998334193, 'host_net_io.write_bps': 4041423.6761417557, 'host_disk_io.read_bps': 0.0, 'host_disk_io.write_bps': 0.0, 'num_fds': 193, 'gpu_utilization': 0, 'gpu_memory_used': 3194617856})]
[8]:
all_events[-10:]
[8]:
[InstantEvent(name='Managed Memory', timestamp=3907.782270047, phase='I', scope='g', process_id='c096', thread_id='worker-c096-1', args={'key': "('reshape-b1532816c7b1745efbdc8cb9a24f29f9', 0, 0)", 'state': 'memory', 'size': 300048, 'tasks': 17}),
 CompleteEvent(name='Compute', timestamp=3907.792064168, duration=8.106231689453125e-06, phase='X', process_id='c096', thread_id='worker-c096-1', args={'key': 'original-array-64bef1cc631f078a2e8f3d2c24d62e99', 'name': 'iginal-array', 'state': 'memory', 'size': 8, 'shape': [2], 'dtype': 'float32', 'type': "<class 'cupy._core.core.ndarray'>", 'dependencies': [], 'dependents': []}),
 InstantEvent(name='Managed Memory', timestamp=3907.792064168, phase='I', scope='g', process_id='c096', thread_id='worker-c096-1', args={'key': 'original-array-64bef1cc631f078a2e8f3d2c24d62e99', 'state': 'memory', 'size': 300032, 'tasks': 15}),
 CompleteEvent(name='Compute', timestamp=3907.792960693, duration=5.4836273193359375e-06, phase='X', process_id='c096', thread_id='worker-c096-1', args={'key': 'original-array-808dd9f17eb2378b6fb7a3aeddc6cb36', 'name': 'iginal-array', 'state': 'memory', 'size': 8, 'shape': [2], 'dtype': 'float32', 'type': "<class 'cupy._core.core.ndarray'>", 'dependencies': [], 'dependents': []}),
 InstantEvent(name='Managed Memory', timestamp=3907.792960693, phase='I', scope='g', process_id='c096', thread_id='worker-c096-1', args={'key': 'original-array-808dd9f17eb2378b6fb7a3aeddc6cb36', 'state': 'memory', 'size': 300040, 'tasks': 16}),
 CompleteEvent(name='Compute', timestamp=3907.803720107, duration=5.698204040527344e-05, phase='X', process_id='c096', thread_id='worker-c096-1', args={'key': "('array-reshape-22612059257db170fa304ae3f488407c', 0, 0)", 'name': 'array-reshape', 'state': 'memory', 'size': 8, 'shape': [1, 2], 'dtype': 'float32', 'type': "<class 'cupy._core.core.ndarray'>", 'dependencies': ['original-array-64bef1cc631f078a2e8f3d2c24d62e99'], 'dependents': []}),
 InstantEvent(name='Managed Memory', timestamp=3907.803720107, phase='I', scope='g', process_id='c096', thread_id='worker-c096-1', args={'key': "('array-reshape-22612059257db170fa304ae3f488407c', 0, 0)", 'state': 'memory', 'size': 300032, 'tasks': 15}),
 CompleteEvent(name='Compute', timestamp=3907.804997844, duration=4.4345855712890625e-05, phase='X', process_id='c096', thread_id='worker-c096-1', args={'key': "('array-reshape-f70b70b2e4718d09ca3086bb97bec087', 0, 0)", 'name': 'array-reshape', 'state': 'memory', 'size': 8, 'shape': [1, 2], 'dtype': 'float32', 'type': "<class 'cupy._core.core.ndarray'>", 'dependencies': ['original-array-808dd9f17eb2378b6fb7a3aeddc6cb36'], 'dependents': []}),
 InstantEvent(name='Managed Memory', timestamp=3907.804997844, phase='I', scope='g', process_id='c096', thread_id='worker-c096-1', args={'key': "('array-reshape-f70b70b2e4718d09ca3086bb97bec087', 0, 0)", 'state': 'memory', 'size': 300040, 'tasks': 16}),
 CompleteEvent(name='Compute', timestamp=3907.811160065, duration=5.245208740234375e-06, phase='X', process_id='c096', thread_id='worker-c096-1', args={'key': "('reshape-22612059257db170fa304ae3f488407c', 0, 0)", 'name': 'reshape', 'state': 'memory', 'size': 8, 'shape': [1, 2], 'dtype': 'float32', 'type': "<class 'cupy._core.core.ndarray'>", 'dependencies': ["('array-reshape-22612059257db170fa304ae3f488407c', 0, 0)"], 'dependents': []})]
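
Because the events are plain dataclasses, we can also inspect them directly before running any built-in analysis. The small illustrative sketch below is not part of the DASF API; it ranks the slowest compute events, using the field names visible in the output above:

# Rank the slowest 'Compute' events by duration (field names as printed above).
compute_events = [
    e for e in all_events
    if e.name == "Compute" and getattr(e, "duration", None) is not None
]
for e in sorted(compute_events, key=lambda e: e.duration, reverse=True)[:5]:
    print(f"{e.args.get('key', '?')}: {e.duration:.4f} s on {e.thread_id}")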

Given a MultiEventDatabase, we can use the TraceAnalyser, which performs analyses over the database. It is useful for finding bottlenecks in the code and understanding how the code is executed.

[9]:
from dasf.profile import TraceAnalyser
analyser = TraceAnalyser(database)

Function Bottleneck Report

The function bottleneck report shows the functions that take the most time to run, which is useful for optimizing code. Here, a function is composed of a set of tasks that apply the same function to different chunks of data; this is the granularity used by map_blocks (where a function is applied to several chunks). The report is a dataframe with the following columns:

- Host: the host where the function is running
- GPU: the GPU where the function is running
- Function: the function name
- Duration: the time spent in the function
- Percentage of total time (%): the percentage of the total compute time spent in the function
- Mean GPU Utilization (%): the mean GPU utilization during the function execution
- Mean GPU Memory Used (GB): the mean GPU memory used during the function execution
- Mean Data Size (MB): the mean data size used during the function execution (as input)
- Mean Data Throughput (MB/s): the mean data throughput during the function execution (as input)
- Num Tasks (chunks): the number of tasks (chunks) the function was applied to
- Mean Task Time (s): the mean time spent in each task (chunk)

[5]:
func_bottleneck_dataframe = analyser.per_function_bottleneck()
func_bottleneck_dataframe.head(10)
Creating annotated task graph: 100%|███████████████████████████████████| 122391/122391 [00:00<00:00, 480393.43it/s]
[function_bottleneck] Analysing traces: 100%|███████████████████████████| 122391/122391 [00:07<00:00, 17441.83it/s]
[function_bottleneck] Creating dataframe: 100%|██████████████████████████████████████| 4/4 [00:00<00:00, 41.34it/s]
[5]:
Host GPU Function Duration (s) Percentage of total time (%) Mean GPU Utilization (%) Mean GPU Memory Used (GB) Mean Data Size (MB) Mean Throughput (MB/s) Num Tasks (chunks) Mean Task time (s)
624 c096 0 unc_fit-62a75b12-6b7a-49b0-b279 2.471439 34.967563 0.0 0.000000 0.160000 0.064740 1 2.471439
1753 c096 1 unc_fit-2d612856-11c9-4d72-8062 2.471328 35.497386 0.0 0.000000 0.200000 0.080928 1 2.471328
10 c096 2 unc_fit-abdc6ba0-f141-4e3c-909e 2.463677 33.690773 0.0 0.000000 0.240000 0.097415 1 2.463677
1228 c096 3 unc_fit-d813c7ea-7042-4c69-8a72 2.463437 34.930821 0.0 0.000000 0.200000 0.081187 1 2.463437
8 c096 2 var-partial 1.172959 16.040208 0.0 0.000000 0.000908 0.619854 685 0.001712
1750 c096 1 var-partial 1.105952 15.885550 0.0 0.000000 0.000907 0.624356 670 0.001651
1225 c096 3 var-partial 1.049221 14.877642 0.0 0.000000 0.000907 0.617285 639 0.001642
622 c096 0 var-partial 1.018737 14.413764 0.0 0.000000 0.000906 0.623162 570 0.001787
1223 c096 3 var 0.825022 11.698578 0.0 0.000000 0.040000 70.562086 711 0.001160
621 c096 0 mean_combine-partial 0.534140 7.557362 1.0 3.158966 0.000752 1.198554 674 0.000792
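
Since the report is returned as an ordinary dataframe, it can be post-processed with the usual dataframe operations. For example, the sketch below (assuming the column names shown above) aggregates the per-host/per-GPU rows into a single per-function ranking:

# Sum the time spent in each function across all hosts/GPUs and rank the result.
# Column names are the ones shown in the report above.
global_ranking = (
    func_bottleneck_dataframe
    .groupby("Function", as_index=False)
    .agg({"Duration (s)": "sum", "Num Tasks (chunks)": "sum"})
    .sort_values("Duration (s)", ascending=False)
)
global_ranking.head(10)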

Task Bottleneck Report

The task bottleneck report shows the most time-consuming tasks in the application. It is similar to the report above, but per task (i.e., per block) instead of per function. This is similar to Dask's task breakdown report.

[6]:
task_bottleneck_dataframe = analyser.per_task_bottleneck()
task_bottleneck_dataframe.head(10)
Creating annotated task graph: 100%|███████████████████████████████████| 122391/122391 [00:00<00:00, 426913.80it/s]
[task_bottleneck] Analysing traces: 100%|███████████████████████████████| 122391/122391 [00:11<00:00, 10857.61it/s]
[task_bottleneck] Creating dataframe: 100%|██████████████████████████████████████████| 4/4 [00:00<00:00,  5.47it/s]
[6]:
Host GPU Task Key Duration (s) Percentage of total time (%) Memory usage (Mb)
10453 c096 0 _func_fit-62a75b12-6b7a-49b0-b279-160a8ff3ee3b 2.471439 34.967563 0.000048
31484 c096 1 _func_fit-2d612856-11c9-4d72-8062-7dfff911cf37 2.471328 35.497386 0.000048
59 c096 2 _func_fit-abdc6ba0-f141-4e3c-909e-059f1f2eed4f 2.463677 33.690773 0.000048
21017 c096 3 _func_fit-d813c7ea-7042-4c69-8a72-7bdafcec69fb 2.463437 34.930821 0.000048
31507 c096 1 _update-96127d18-5249-4874-accc-a95d91831a7f 0.452484 6.499339 0.000092
10497 c096 0 _update-988648be-51da-405b-a484-a25c9db01ee3 0.430781 6.094980 0.000092
83 c096 2 _update-8c69b565-dc4d-4abc-80bb-e8085e2960bb 0.426465 5.831903 0.000092
21070 c096 3 _update-af7e6144-6bc1-4f85-9b66-4dd7e5f241d0 0.425992 6.040446 0.000092
21006 c096 3 ('var-e4678ef53eb782384180081aa58da0be', 13, 0) 0.363825 5.158930 0.000450
9 c096 2 ('mean_chunk-0ff2c4ec843f304adcd7d25ca35c47a7'... 0.259243 3.545143 0.000376
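
As with the function report, the result is a dataframe, so we can, for example, check how concentrated the time is in the slowest tasks. A short sketch, assuming the column names shown above:

# Share of the measured task time taken by the 20 slowest tasks.
total = task_bottleneck_dataframe["Duration (s)"].sum()
top = task_bottleneck_dataframe.sort_values("Duration (s)", ascending=False).head(20)
print(f"The 20 slowest tasks account for {top['Duration (s)'].sum() / total:.1%} of the task time.")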

Task balance

This report shows the number of tasks in memory per worker at each instant of time. This is useful for checking load imbalance across the workers.

[7]:
task_balance_dataframe = analyser.per_worker_task_balance()
task_balance_dataframe.head(10)
[task_balance] Analysing traces: 100%|████████████████████████████████| 122391/122391 [00:00<00:00, 1256360.33it/s]
[task_balance] Creating dataframe: 100%|███████████████████████████████████████| 71/71 [00:00<00:00, 359222.66it/s]
[7]:
Time Interval (seconds from begin) worker-c096-0 worker-c096-1 worker-c096-2 worker-c096-3
0 0 0.0 0.0 1.0 1.0
1 1 10.0 12.0 16.0 11.0
2 6 33.0 13.0 15.0 35.0
3 7 8.0 14.0 32.0 17.0
4 8 32.0 15.0 31.0 20.0
5 9 32.0 30.0 19.0 19.0
6 10 14.0 14.0 14.0 15.0
7 11 13.0 15.0 14.0 21.0
8 12 16.0 19.0 9.0 16.0
9 13 18.0 15.0 17.0 20.0
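
The balance dataframe also lends itself to a simple plot of the number of in-memory tasks per worker over time. A sketch with matplotlib, assuming the column names shown above:

import matplotlib.pyplot as plt

# Plot the number of tasks held in memory by each worker over time to spot imbalance.
ax = task_balance_dataframe.plot(
    x="Time Interval (seconds from begin)",  # column name as shown above
    figsize=(10, 4),
)
ax.set_xlabel("Time since start (s)")
ax.set_ylabel("Tasks in memory")
ax.set_title("Per-worker task balance")
plt.show()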