{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "### Tutorial 5 - Using the profiler\n", "\n", "This tutorial shows how to use the profiler to measure the performance of your code. We use the same pipeline as in the previous tutorial.\n", "\n", "The profiler is a tool that measures the time spent in each function of your code. It is very useful for identifying bottlenecks and optimizing your code. It is disabled by default, and the simplest way to enable the default profiler is the `register_default_profiler` function, shown later in this notebook.\n", "\n", "Profiling consists of two steps:\n", "1. Register the profiler and execute the desired pipeline. This generates several files in the current directory, each containing traces of the execution. The contents of each file depend on the plugins attached. The `register_default_profiler` function attaches two plugins: one that measures Dask task times (and other useful per-task information, such as data shape, type, size in MB, etc.) and a resource usage plugin (which measures GPU, CPU, memory and IO usage).\n", "2. Analyse the generated files. Some analyses are already implemented in the `profiler` module, but you can also implement your own. For now, the following analyses are available: per-function time, per-task time and per-worker task balance. They are shown later in this notebook." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Enabling profiling" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "from dasf.datasets import make_blobs\n", "\n", "n_samples = 100000\n", "n_bins = 3\n", "\n", "# Generate 3 blobs with 2 classes where the second blob contains\n", "# half positive samples and half negative samples. 
Probability in this\n", "# blob is therefore 0.5.\n", "centers = [(-6, -6), (0, 0), (9, 1)]\n", "X, y = make_blobs(n_samples=n_samples, centers=centers, shuffle=False, random_state=42)\n", "\n", "np.save(\"X.npy\", X)\n", "np.save(\"y.npy\", y)" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "from dasf.datasets import DatasetArray\n", "from dasf.datasets import DatasetLabeled\n", "\n", "\n", "class MyMakeBlobs(DatasetLabeled):\n", " def __init__(self):\n", " super().__init__(name=\"My Own make_blobs()\", download=False)\n", " \n", " # Let's assign the train and val data.\n", " self._train = DatasetArray(name=\"X\", download=False, root=\"X.npy\", chunks=(5000, 2))\n", " self._val = DatasetArray(name=\"y\", download=False, root=\"y.npy\", chunks=(5000))\n", "\n", "make_blobs = MyMakeBlobs()" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "from dasf.transforms import Normalize\n", "\n", "normalize = Normalize()" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2023-08-15 11:55:48,568 - distributed.preloading - INFO - Creating preload: dask_cuda.initialize\n", "2023-08-15 11:55:48,568 - distributed.preloading - INFO - Import preload module: dask_cuda.initialize\n", "2023-08-15 11:55:48,568 - distributed.preloading - INFO - Creating preload: dask_cuda.initialize\n", "2023-08-15 11:55:48,568 - distributed.preloading - INFO - Import preload module: dask_cuda.initialize\n", "2023-08-15 11:55:48,673 - distributed.preloading - INFO - Creating preload: dask_cuda.initialize\n", "2023-08-15 11:55:48,673 - distributed.preloading - INFO - Import preload module: dask_cuda.initialize\n", "2023-08-15 11:55:48,752 - distributed.preloading - INFO - Creating preload: dask_cuda.initialize\n", "2023-08-15 11:55:48,752 - distributed.preloading - INFO - Import preload module: dask_cuda.initialize\n" ] } ], 
"source": [ "from dasf.pipeline.executors import DaskPipelineExecutor\n", "\n", "dask = DaskPipelineExecutor(local=True, use_gpu=True)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "from dasf.ml.cluster import KMeans\n", "from dasf.ml.cluster import SOM\n", "\n", "kmeans = KMeans(n_clusters=3, max_iter=100)\n", "som = SOM(x=1, y=3, input_len=2, num_epochs=100)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "from dasf.transforms import PersistDaskData\n", "\n", "persist_kmeans = PersistDaskData()\n", "persist_som = PersistDaskData()" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n", "\n", "\n", "A KMeans and SOM Pipeline\n", "\n", "\n", "\n", "68414549\n", "\n", "Normalize.transform\n", "\n", "\n", "\n", "499072774\n", "\n", "KMeans.fit_predict\n", "\n", "\n", "\n", "68414549->499072774\n", "\n", "\n", "X\n", "\n", "\n", "\n", "499072781\n", "\n", "SOM.fit_predict\n", "\n", "\n", "\n", "68414549->499072781\n", "\n", "\n", "X\n", "\n", "\n", "\n", "261617047\n", "\n", "DatasetArray.load\n", "\n", "\n", "\n", "261617047->68414549\n", "\n", "\n", "X\n", "\n", "\n", "\n", "499067957\n", "\n", "PersistDaskData.transform\n", "\n", "\n", "\n", "499072774->499067957\n", "\n", "\n", "X\n", "\n", "\n", "\n", "499068127\n", "\n", "PersistDaskData.transform\n", "\n", "\n", "\n", "499072781->499068127\n", "\n", "\n", "X\n", "\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from dasf.pipeline import Pipeline\n", "\n", "pipeline = Pipeline(\"A KMeans and SOM Pipeline\", executor=dask)\n", "\n", "pipeline.add(normalize, X=make_blobs._train) \\\n", " .add(kmeans.fit_predict, X=normalize) \\\n", " .add(som.fit_predict, X=normalize) \\\n", " .add(persist_kmeans, X=kmeans.fit_predict) \\\n", " .add(persist_som, X=som.fit_predict) 
\\\n", " .visualize()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Once you have created your pipeline, we can attach the profiler to it. The profiler will collect the data and store it in a file (one per plugin), placed in the same directory as the notebook. The file name is controlled by the `name` parameter, followed by the `.msgpack` extension. If you do not provide the `name` parameter, it defaults to `default`.\n", "\n", "The `enable_nvtx` parameter enables NVTX markers, which let you inspect the pipeline in NVIDIA Nsight Systems. This is only available on NVIDIA GPUs.\n", "\n", "Finally, the `add_time_suffix` parameter adds the current time to the file name (useful for unique file names). This way you can run the same pipeline multiple times and get a different file each time. The file name will be `name_time.msgpack`." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Registered worker plugin: Tutorial_5_Profile-TracePlugin\n", "Registered resource plugin: Tutorial_5_Profile-ResourceMonitor\n" ] } ], "source": [ "from dasf.profile import register_default_profiler\n", "\n", "register_default_profiler(\n", " pipeline,\n", " name=\"Tutorial_5_Profile\",\n", " enable_nvtx=False,\n", " add_time_suffix=False \n", ")" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[2023-08-15 11:55:50-0300] INFO - Beginning pipeline run for 'A KMeans and SOM Pipeline'\n", "[2023-08-15 11:55:50-0300] INFO - Task 'DatasetArray.load': Starting task run...\n", "[2023-08-15 11:55:50-0300] INFO - Task 'DatasetArray.load': Finished task run\n", "[2023-08-15 11:55:50-0300] INFO - Task 'Normalize.transform': Starting task run...\n", "[2023-08-15 11:55:50-0300] INFO - Task 'Normalize.transform': Finished task run\n", "[2023-08-15 11:55:50-0300] INFO - Task 'KMeans.fit_predict': 
Starting task run...\n", "[2023-08-15 11:55:55-0300] INFO - Task 'KMeans.fit_predict': Finished task run\n", "[2023-08-15 11:55:55-0300] INFO - Task 'SOM.fit_predict': Starting task run...\n", "[2023-08-15 11:56:21-0300] INFO - Task 'SOM.fit_predict': Finished task run\n", "[2023-08-15 11:56:21-0300] INFO - Task 'PersistDaskData.transform': Starting task run...\n", "[2023-08-15 11:56:21-0300] INFO - Task 'PersistDaskData.transform': Finished task run\n", "[2023-08-15 11:56:21-0300] INFO - Task 'PersistDaskData.transform': Starting task run...\n", "[2023-08-15 11:56:21-0300] INFO - Task 'PersistDaskData.transform': Finished task run\n", "[2023-08-15 11:56:21-0300] INFO - Pipeline run successfully\n", "CPU times: user 11.8 s, sys: 739 ms, total: 12.5 s\n", "Wall time: 31.2 s\n" ] } ], "source": [ "%time pipeline.run()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The resource monitor plugin runs indefinitely, until the program exits. Since the databases are buffered to reduce IO pressure, the data is flushed to the database only after 5000 entries (or when the program exits). We can shut down the executor to flush the remaining data to the database." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2023-08-15 11:56:54,586 - distributed.active_memory_manager - WARNING - Tried retiring worker tcp://127.0.0.1:35871, but 14 tasks could not be moved as there are no suitable workers to receive them. The worker will not be retired.\n", "2023-08-15 11:56:54,587 - distributed.active_memory_manager - WARNING - Tried retiring worker tcp://127.0.0.1:35148, but 14 tasks could not be moved as there are no suitable workers to receive them. The worker will not be retired.\n", "2023-08-15 11:56:54,587 - distributed.active_memory_manager - WARNING - Tried retiring worker tcp://127.0.0.1:38191, but 16 tasks could not be moved as there are no suitable workers to receive them. 
The worker will not be retired.\n", "2023-08-15 11:56:54,588 - distributed.active_memory_manager - WARNING - Tried retiring worker tcp://127.0.0.1:39400, but 14 tasks could not be moved as there are no suitable workers to receive them. The worker will not be retired.\n" ] } ], "source": [ "dask.shutdown()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Analysing the profile data\n", "\n", "Once the code has run and the trace events have been collected, we can perform analyses to understand the performance of the code. The first step is to load the databases. `register_default_profiler` attaches two plugins: `TracePlugin` (which traces Dask tasks) and `ResourceMonitorPlugin` (which traces the resource usage of the workers). So there will be two files: `Tutorial_5_Profile-ResourceMonitor-c096.msgpack` and `Tutorial_5_Profile-TracePlugin-c096.msgpack`. We can load the events using the `EventProfiler` class (the same class used to record the events)." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(EventProfiler(database=FileDatabase at Tutorial_5_Profile-ResourceMonitor-c096.msgpack),\n", " EventProfiler(database=FileDatabase at Tutorial_5_Profile-TracePlugin-c096.msgpack))" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from dasf.profile.profiler import EventProfiler\n", "from dasf.profile import MultiEventDatabase\n", "\n", "db1 = EventProfiler(\"Tutorial_5_Profile-ResourceMonitor-c096.msgpack\")\n", "db2 = EventProfiler(\"Tutorial_5_Profile-TracePlugin-c096.msgpack\")\n", "\n", "db1, db2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Databases are collections of events. Each event is a dataclass and follows the [Chrome Trace Event Format](https://docs.google.com/document/d/1CvAClvFfyA5R-PhYUmn5OOQtYMH4h6I0nSsKchNAySU/preview). The `EventProfiler` class is a wrapper around a list of events. 
It provides methods to add events and to write the database to a file.\n", "\n", "We can iterate over the events in a database using the `get_traces` function, which returns an iterator over the events in the database, in the order they were recorded. " ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "InstantEvent(name='Resource Usage', timestamp=3869.01274743, phase='I', scope='g', process_id='c096', thread_id=None, args={'cpu': 40.2, 'memory': 2744729600, 'time': 1692111381.5415905, 'host_net_io.read_bps': 10685583.118998485, 'host_net_io.write_bps': 10673829.571746167, 'host_disk_io.read_bps': 0.0, 'host_disk_io.write_bps': 134589.30723739465, 'num_fds': 189, 'gpu_utilization': 0, 'gpu_memory_used': 3158966272})\n", "CompleteEvent(name='Compute', timestamp=3837.884328029, duration=0.050518035888671875, phase='X', process_id='c096', thread_id='worker-c096-2', args={'key': 'getGPUs-42237764-1836-4be8-ae62-948ce02e1d6a', 'name': 'tGPUs-42237764-1836-4be8-ae62', 'state': 'memory', 'size': 280, 'shape': [], 'dtype': 'unknown', 'type': \"\", 'dependencies': [], 'dependents': []})\n" ] } ], "source": [ "for event in db1.get_traces():\n", " print(event)\n", " break\n", "\n", "for event in db2.get_traces():\n", " print(event)\n", " break" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `MultiEventDatabase` allows us to traverse multiple databases at the same time (usually needed for analyses)." 
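, "\n",
"\n",
"To make the combined traversal concrete, here is a minimal, self-contained sketch. It is not DASF's real API: `FakeCompleteEvent` and the two small in-memory lists are hypothetical stand-ins for real trace events and databases, and a real analysis would iterate a `MultiEventDatabase` instead of `chain`.\n",
"\n",
"```python\n",
"from collections import defaultdict\n",
"from dataclasses import dataclass, field\n",
"from itertools import chain\n",
"\n",
"@dataclass\n",
"class FakeCompleteEvent:  # hypothetical stand-in for a DASF trace event\n",
"    name: str\n",
"    duration: float\n",
"    args: dict = field(default_factory=dict)\n",
"\n",
"# Two tiny in-memory 'databases' standing in for db1 and db2.\n",
"db_a = [FakeCompleteEvent('Compute', 0.5, {'name': 'var-partial'})]\n",
"db_b = [FakeCompleteEvent('Compute', 1.2, {'name': 'unc_fit'}),\n",
"        FakeCompleteEvent('Compute', 0.3, {'name': 'var-partial'})]\n",
"\n",
"# Conceptually, MultiEventDatabase chains its databases into one event\n",
"# stream; an analysis can then aggregate, e.g. total duration per name.\n",
"totals = defaultdict(float)\n",
"for event in chain(db_a, db_b):\n",
"    totals[event.args['name']] += event.duration\n",
"\n",
"print(dict(totals))  # {'var-partial': 0.8, 'unc_fit': 1.2}\n",
"```"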
] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "MultiEventDatabase with 2 databases" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "database = MultiEventDatabase([db1, db2])\n", "database" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "InstantEvent(name='Resource Usage', timestamp=3869.01274743, phase='I', scope='g', process_id='c096', thread_id=None, args={'cpu': 40.2, 'memory': 2744729600, 'time': 1692111381.5415905, 'host_net_io.read_bps': 10685583.118998485, 'host_net_io.write_bps': 10673829.571746167, 'host_disk_io.read_bps': 0.0, 'host_disk_io.write_bps': 134589.30723739465, 'num_fds': 189, 'gpu_utilization': 0, 'gpu_memory_used': 3158966272})\n" ] } ], "source": [ "for event in database:\n", " print(event)\n", " break" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Or generate a list with all events" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[InstantEvent(name='Resource Usage', timestamp=3869.01274743, phase='I', scope='g', process_id='c096', thread_id=None, args={'cpu': 40.2, 'memory': 2744729600, 'time': 1692111381.5415905, 'host_net_io.read_bps': 10685583.118998485, 'host_net_io.write_bps': 10673829.571746167, 'host_disk_io.read_bps': 0.0, 'host_disk_io.write_bps': 134589.30723739465, 'num_fds': 189, 'gpu_utilization': 0, 'gpu_memory_used': 3158966272}),\n", " InstantEvent(name='Resource Usage', timestamp=3869.018545822, phase='I', scope='g', process_id='c096', thread_id=None, args={'cpu': 0.0, 'memory': 2744729600, 'time': 1692111381.5495014, 'host_net_io.read_bps': 24242915.31342638, 'host_net_io.write_bps': 24415081.283950076, 'host_disk_io.read_bps': 0.0, 'host_disk_io.write_bps': 0.0, 'num_fds': 189, 'gpu_utilization': 0, 'gpu_memory_used': 3158966272}),\n", " 
InstantEvent(name='Resource Usage', timestamp=3869.118886256, phase='I', scope='g', process_id='c096', thread_id=None, args={'cpu': 69.9, 'memory': 2744729600, 'time': 1692111381.649612, 'host_net_io.read_bps': 30272623.59462569, 'host_net_io.write_bps': 28577096.880356316, 'host_disk_io.read_bps': 0.0, 'host_disk_io.write_bps': 10024124.945820678, 'num_fds': 189, 'gpu_utilization': 1, 'gpu_memory_used': 3158966272}),\n", " InstantEvent(name='Resource Usage', timestamp=3869.219099359, phase='I', scope='g', process_id='c096', thread_id=None, args={'cpu': 10.0, 'memory': 2744995840, 'time': 1692111381.7498834, 'host_net_io.read_bps': 3102557.5490633715, 'host_net_io.write_bps': 3103435.1666125604, 'host_disk_io.read_bps': 0.0, 'host_disk_io.write_bps': 0.0, 'num_fds': 193, 'gpu_utilization': 1, 'gpu_memory_used': 3158966272}),\n", " InstantEvent(name='Resource Usage', timestamp=3869.319312409, phase='I', scope='g', process_id='c096', thread_id=None, args={'cpu': 0.0, 'memory': 2744995840, 'time': 1692111381.850187, 'host_net_io.read_bps': 5343.775123098109, 'host_net_io.write_bps': 5622.927554901742, 'host_disk_io.read_bps': 0.0, 'host_disk_io.write_bps': 0.0, 'num_fds': 193, 'gpu_utilization': 0, 'gpu_memory_used': 3158966272}),\n", " InstantEvent(name='Resource Usage', timestamp=3869.418357631, phase='I', scope='g', process_id='c096', thread_id=None, args={'cpu': 0.0, 'memory': 2744995840, 'time': 1692111381.9493108, 'host_net_io.read_bps': 2703.691829652832, 'host_net_io.write_bps': 3147.58153302867, 'host_disk_io.read_bps': 0.0, 'host_disk_io.write_bps': 0.0, 'num_fds': 193, 'gpu_utilization': 0, 'gpu_memory_used': 3158966272}),\n", " InstantEvent(name='Resource Usage', timestamp=3869.518484708, phase='I', scope='g', process_id='c096', thread_id=None, args={'cpu': 10.0, 'memory': 2744995840, 'time': 1692111382.0494256, 'host_net_io.read_bps': 19927.11012718611, 'host_net_io.write_bps': 19927.11012718611, 'host_disk_io.read_bps': 0.0, 'host_disk_io.write_bps': 
0.0, 'num_fds': 193, 'gpu_utilization': 0, 'gpu_memory_used': 3158966272}),\n", " InstantEvent(name='Resource Usage', timestamp=3869.620366242, phase='I', scope='g', process_id='c096', thread_id=None, args={'cpu': 60.1, 'memory': 2745241600, 'time': 1692111382.1493354, 'host_net_io.read_bps': 2815376.423511278, 'host_net_io.write_bps': 2813654.866414532, 'host_disk_io.read_bps': 0.0, 'host_disk_io.write_bps': 0.0, 'num_fds': 193, 'gpu_utilization': 0, 'gpu_memory_used': 3194617856}),\n", " InstantEvent(name='Resource Usage', timestamp=3869.721167103, phase='I', scope='g', process_id='c096', thread_id=None, args={'cpu': 70.0, 'memory': 2745511936, 'time': 1692111382.2492902, 'host_net_io.read_bps': 3200509.781330604, 'host_net_io.write_bps': 3200509.781330604, 'host_disk_io.read_bps': 0.0, 'host_disk_io.write_bps': 0.0, 'num_fds': 193, 'gpu_utilization': 0, 'gpu_memory_used': 3194617856}),\n", " InstantEvent(name='Resource Usage', timestamp=3869.821650885, phase='I', scope='g', process_id='c096', thread_id=None, args={'cpu': 79.1, 'memory': 2745782272, 'time': 1692111382.350485, 'host_net_io.read_bps': 3472817.2998334193, 'host_net_io.write_bps': 4041423.6761417557, 'host_disk_io.read_bps': 0.0, 'host_disk_io.write_bps': 0.0, 'num_fds': 193, 'gpu_utilization': 0, 'gpu_memory_used': 3194617856})]" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "all_events = list(database)\n", "all_events[:10]" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[InstantEvent(name='Managed Memory', timestamp=3907.782270047, phase='I', scope='g', process_id='c096', thread_id='worker-c096-1', args={'key': \"('reshape-b1532816c7b1745efbdc8cb9a24f29f9', 0, 0)\", 'state': 'memory', 'size': 300048, 'tasks': 17}),\n", " CompleteEvent(name='Compute', timestamp=3907.792064168, duration=8.106231689453125e-06, phase='X', process_id='c096', thread_id='worker-c096-1', args={'key': 
'original-array-64bef1cc631f078a2e8f3d2c24d62e99', 'name': 'iginal-array', 'state': 'memory', 'size': 8, 'shape': [2], 'dtype': 'float32', 'type': \"\", 'dependencies': [], 'dependents': []}),\n", " InstantEvent(name='Managed Memory', timestamp=3907.792064168, phase='I', scope='g', process_id='c096', thread_id='worker-c096-1', args={'key': 'original-array-64bef1cc631f078a2e8f3d2c24d62e99', 'state': 'memory', 'size': 300032, 'tasks': 15}),\n", " CompleteEvent(name='Compute', timestamp=3907.792960693, duration=5.4836273193359375e-06, phase='X', process_id='c096', thread_id='worker-c096-1', args={'key': 'original-array-808dd9f17eb2378b6fb7a3aeddc6cb36', 'name': 'iginal-array', 'state': 'memory', 'size': 8, 'shape': [2], 'dtype': 'float32', 'type': \"\", 'dependencies': [], 'dependents': []}),\n", " InstantEvent(name='Managed Memory', timestamp=3907.792960693, phase='I', scope='g', process_id='c096', thread_id='worker-c096-1', args={'key': 'original-array-808dd9f17eb2378b6fb7a3aeddc6cb36', 'state': 'memory', 'size': 300040, 'tasks': 16}),\n", " CompleteEvent(name='Compute', timestamp=3907.803720107, duration=5.698204040527344e-05, phase='X', process_id='c096', thread_id='worker-c096-1', args={'key': \"('array-reshape-22612059257db170fa304ae3f488407c', 0, 0)\", 'name': 'array-reshape', 'state': 'memory', 'size': 8, 'shape': [1, 2], 'dtype': 'float32', 'type': \"\", 'dependencies': ['original-array-64bef1cc631f078a2e8f3d2c24d62e99'], 'dependents': []}),\n", " InstantEvent(name='Managed Memory', timestamp=3907.803720107, phase='I', scope='g', process_id='c096', thread_id='worker-c096-1', args={'key': \"('array-reshape-22612059257db170fa304ae3f488407c', 0, 0)\", 'state': 'memory', 'size': 300032, 'tasks': 15}),\n", " CompleteEvent(name='Compute', timestamp=3907.804997844, duration=4.4345855712890625e-05, phase='X', process_id='c096', thread_id='worker-c096-1', args={'key': \"('array-reshape-f70b70b2e4718d09ca3086bb97bec087', 0, 0)\", 'name': 'array-reshape', 'state': 
'memory', 'size': 8, 'shape': [1, 2], 'dtype': 'float32', 'type': \"\", 'dependencies': ['original-array-808dd9f17eb2378b6fb7a3aeddc6cb36'], 'dependents': []}),\n", " InstantEvent(name='Managed Memory', timestamp=3907.804997844, phase='I', scope='g', process_id='c096', thread_id='worker-c096-1', args={'key': \"('array-reshape-f70b70b2e4718d09ca3086bb97bec087', 0, 0)\", 'state': 'memory', 'size': 300040, 'tasks': 16}),\n", " CompleteEvent(name='Compute', timestamp=3907.811160065, duration=5.245208740234375e-06, phase='X', process_id='c096', thread_id='worker-c096-1', args={'key': \"('reshape-22612059257db170fa304ae3f488407c', 0, 0)\", 'name': 'reshape', 'state': 'memory', 'size': 8, 'shape': [1, 2], 'dtype': 'float32', 'type': \"\", 'dependencies': [\"('array-reshape-22612059257db170fa304ae3f488407c', 0, 0)\"], 'dependents': []})]" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "all_events[-10:]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Given a `MultiEventDatabase`, we can use the `TraceAnalyser` class, which performs analyses over the database. It is useful for finding bottlenecks in the code and understanding how the code is executed. " ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "from dasf.profile import TraceAnalyser\n", "analyser = TraceAnalyser(database)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Function Bottleneck Report\n", "\n", "The function bottleneck report shows us the functions that take the most time to run. This is useful for optimizing code. Here, a function corresponds to a set of tasks that apply that function to different chunks of data, as with `map_blocks` (where a function is applied to several chunks). 
This report is a dataframe with the following columns:\n", "- `Host`: the host where the function is running\n", "- `GPU`: the GPU where the function is running\n", "- `Function`: the function name\n", "- `Duration`: the time spent in the function\n", "- `Percentage of total time (%)`: the percentage of total compute time spent in the function\n", "- `Mean GPU Utilization (%)`: the mean GPU utilization during the function execution\n", "- `Mean GPU Memory Used (GB)`: the mean GPU memory used during the function execution\n", "- `Mean Data Size (MB)`: the mean data size used during the function execution (as input)\n", "- `Mean Data Throughput (MB/s)`: the mean data throughput during the function execution (as input)\n", "- `Num tasks (chunks)`: the number of tasks (chunks) the function was applied to\n", "- `Mean Task Time (s)`: the mean time spent in each task (chunk)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Creating annotated task graph: 100%|███████████████████████████████████| 122391/122391 [00:00<00:00, 480393.43it/s]\n", "[function_bottleneck] Analysing traces: 100%|███████████████████████████| 122391/122391 [00:07<00:00, 17441.83it/s]\n", "[function_bottleneck] Creating dataframe: 100%|██████████████████████████████████████| 4/4 [00:00<00:00, 41.34it/s]\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
HostGPUFunctionDuration (s)Percentage of total time (%)Mean GPU Utilization (%)Mean GPU Memory Used (GB)Mean Data Size (MB)Mean Throughput (MB/s)Num Tasks (chunks)Mean Task time (s)
624c0960unc_fit-62a75b12-6b7a-49b0-b2792.47143934.9675630.00.0000000.1600000.06474012.471439
1753c0961unc_fit-2d612856-11c9-4d72-80622.47132835.4973860.00.0000000.2000000.08092812.471328
10c0962unc_fit-abdc6ba0-f141-4e3c-909e2.46367733.6907730.00.0000000.2400000.09741512.463677
1228c0963unc_fit-d813c7ea-7042-4c69-8a722.46343734.9308210.00.0000000.2000000.08118712.463437
8c0962var-partial1.17295916.0402080.00.0000000.0009080.6198546850.001712
1750c0961var-partial1.10595215.8855500.00.0000000.0009070.6243566700.001651
1225c0963var-partial1.04922114.8776420.00.0000000.0009070.6172856390.001642
622c0960var-partial1.01873714.4137640.00.0000000.0009060.6231625700.001787
1223c0963var0.82502211.6985780.00.0000000.04000070.5620867110.001160
621c0960mean_combine-partial0.5341407.5573621.03.1589660.0007521.1985546740.000792
\n", "
" ], "text/plain": [ " Host GPU Function Duration (s) \\\n", "624 c096 0 unc_fit-62a75b12-6b7a-49b0-b279 2.471439 \n", "1753 c096 1 unc_fit-2d612856-11c9-4d72-8062 2.471328 \n", "10 c096 2 unc_fit-abdc6ba0-f141-4e3c-909e 2.463677 \n", "1228 c096 3 unc_fit-d813c7ea-7042-4c69-8a72 2.463437 \n", "8 c096 2 var-partial 1.172959 \n", "1750 c096 1 var-partial 1.105952 \n", "1225 c096 3 var-partial 1.049221 \n", "622 c096 0 var-partial 1.018737 \n", "1223 c096 3 var 0.825022 \n", "621 c096 0 mean_combine-partial 0.534140 \n", "\n", " Percentage of total time (%) Mean GPU Utilization (%) \\\n", "624 34.967563 0.0 \n", "1753 35.497386 0.0 \n", "10 33.690773 0.0 \n", "1228 34.930821 0.0 \n", "8 16.040208 0.0 \n", "1750 15.885550 0.0 \n", "1225 14.877642 0.0 \n", "622 14.413764 0.0 \n", "1223 11.698578 0.0 \n", "621 7.557362 1.0 \n", "\n", " Mean GPU Memory Used (GB) Mean Data Size (MB) Mean Throughput (MB/s) \\\n", "624 0.000000 0.160000 0.064740 \n", "1753 0.000000 0.200000 0.080928 \n", "10 0.000000 0.240000 0.097415 \n", "1228 0.000000 0.200000 0.081187 \n", "8 0.000000 0.000908 0.619854 \n", "1750 0.000000 0.000907 0.624356 \n", "1225 0.000000 0.000907 0.617285 \n", "622 0.000000 0.000906 0.623162 \n", "1223 0.000000 0.040000 70.562086 \n", "621 3.158966 0.000752 1.198554 \n", "\n", " Num Tasks (chunks) Mean Task time (s) \n", "624 1 2.471439 \n", "1753 1 2.471328 \n", "10 1 2.463677 \n", "1228 1 2.463437 \n", "8 685 0.001712 \n", "1750 670 0.001651 \n", "1225 639 0.001642 \n", "622 570 0.001787 \n", "1223 711 0.001160 \n", "621 674 0.000792 " ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "func_bottleneck_dataframe = analyser.per_function_bottleneck()\n", "func_bottleneck_dataframe.head(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Task Bottleneck Report\n", "\n", "The task bottleneck report shows the tasks that are the most time consuming in the application. 
It is similar to the report above, but broken down per task (i.e., per block) instead of per function, much like Dask's task breakdown report." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Creating annotated task graph: 100%|███████████████████████████████████| 122391/122391 [00:00<00:00, 426913.80it/s]\n", "[task_bottleneck] Analysing traces: 100%|███████████████████████████████| 122391/122391 [00:11<00:00, 10857.61it/s]\n", "[task_bottleneck] Creating dataframe: 100%|██████████████████████████████████████████| 4/4 [00:00<00:00, 5.47it/s]\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
HostGPUTask KeyDuration (s)Percentage of total time (%)Memory usage (Mb)
10453c0960_func_fit-62a75b12-6b7a-49b0-b279-160a8ff3ee3b2.47143934.9675630.000048
31484c0961_func_fit-2d612856-11c9-4d72-8062-7dfff911cf372.47132835.4973860.000048
59c0962_func_fit-abdc6ba0-f141-4e3c-909e-059f1f2eed4f2.46367733.6907730.000048
21017c0963_func_fit-d813c7ea-7042-4c69-8a72-7bdafcec69fb2.46343734.9308210.000048
31507c0961_update-96127d18-5249-4874-accc-a95d91831a7f0.4524846.4993390.000092
10497c0960_update-988648be-51da-405b-a484-a25c9db01ee30.4307816.0949800.000092
83c0962_update-8c69b565-dc4d-4abc-80bb-e8085e2960bb0.4264655.8319030.000092
21070c0963_update-af7e6144-6bc1-4f85-9b66-4dd7e5f241d00.4259926.0404460.000092
21006c0963('var-e4678ef53eb782384180081aa58da0be', 13, 0)0.3638255.1589300.000450
9c0962('mean_chunk-0ff2c4ec843f304adcd7d25ca35c47a7'...0.2592433.5451430.000376
\n", "
" ], "text/plain": [ " Host GPU Task Key \\\n", "10453 c096 0 _func_fit-62a75b12-6b7a-49b0-b279-160a8ff3ee3b \n", "31484 c096 1 _func_fit-2d612856-11c9-4d72-8062-7dfff911cf37 \n", "59 c096 2 _func_fit-abdc6ba0-f141-4e3c-909e-059f1f2eed4f \n", "21017 c096 3 _func_fit-d813c7ea-7042-4c69-8a72-7bdafcec69fb \n", "31507 c096 1 _update-96127d18-5249-4874-accc-a95d91831a7f \n", "10497 c096 0 _update-988648be-51da-405b-a484-a25c9db01ee3 \n", "83 c096 2 _update-8c69b565-dc4d-4abc-80bb-e8085e2960bb \n", "21070 c096 3 _update-af7e6144-6bc1-4f85-9b66-4dd7e5f241d0 \n", "21006 c096 3 ('var-e4678ef53eb782384180081aa58da0be', 13, 0) \n", "9 c096 2 ('mean_chunk-0ff2c4ec843f304adcd7d25ca35c47a7'... \n", "\n", " Duration (s) Percentage of total time (%) Memory usage (Mb) \n", "10453 2.471439 34.967563 0.000048 \n", "31484 2.471328 35.497386 0.000048 \n", "59 2.463677 33.690773 0.000048 \n", "21017 2.463437 34.930821 0.000048 \n", "31507 0.452484 6.499339 0.000092 \n", "10497 0.430781 6.094980 0.000092 \n", "83 0.426465 5.831903 0.000092 \n", "21070 0.425992 6.040446 0.000092 \n", "21006 0.363825 5.158930 0.000450 \n", "9 0.259243 3.545143 0.000376 " ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "task_bottleneck_dataframe = analyser.per_task_bottleneck()\n", "task_bottleneck_dataframe.head(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Task balance\n", "\n", "This report shows the number of tasks held in memory per worker at each instant of time. It is useful for checking load imbalance across the workers." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "[task_balance] Analysing traces: 100%|████████████████████████████████| 122391/122391 [00:00<00:00, 1256360.33it/s]\n", "[task_balance] Creating dataframe: 100%|███████████████████████████████████████| 71/71 [00:00<00:00, 359222.66it/s]\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Time Interval (seconds from begin)worker-c096-0worker-c096-1worker-c096-2worker-c096-3
000.00.01.01.0
1110.012.016.011.0
2633.013.015.035.0
378.014.032.017.0
4832.015.031.020.0
5932.030.019.019.0
61014.014.014.015.0
71113.015.014.021.0
81216.019.09.016.0
91318.015.017.020.0
\n", "
" ], "text/plain": [ " Time Interval (seconds from begin) worker-c096-0 worker-c096-1 \\\n", "0 0 0.0 0.0 \n", "1 1 10.0 12.0 \n", "2 6 33.0 13.0 \n", "3 7 8.0 14.0 \n", "4 8 32.0 15.0 \n", "5 9 32.0 30.0 \n", "6 10 14.0 14.0 \n", "7 11 13.0 15.0 \n", "8 12 16.0 19.0 \n", "9 13 18.0 15.0 \n", "\n", " worker-c096-2 worker-c096-3 \n", "0 1.0 1.0 \n", "1 16.0 11.0 \n", "2 15.0 35.0 \n", "3 32.0 17.0 \n", "4 31.0 20.0 \n", "5 19.0 19.0 \n", "6 14.0 15.0 \n", "7 14.0 21.0 \n", "8 9.0 16.0 \n", "9 17.0 20.0 " ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "task_balance_dataframe = analyser.per_worker_task_balance()\n", "task_balance_dataframe.head(10)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.13" }, "orig_nbformat": 4 }, "nbformat": 4, "nbformat_minor": 2 }