GTPin
GEN emulator that replays traces generated by GTPin
========================================================================================
========================================================================================
GTReplay is a GEN emulator that replays special traces called gLITs (Long Instruction Trace for GEN). A gLIT is generated on the GEN device, and GTReplay replays it on the CPU. The picture below shows a schematic flow. The original GEN binary kernel is instrumented by GTPin to generate a gLIT while running on the GEN device. The gLIT contains all the information required to replay the trace within the emulator. It is a multi-threaded trace: it contains a record of each dispatch of the kernel to any of the GPU hardware threads. Once a gLIT is created, one can feed it into GTReplay and replay it on the CPU. The replay process is deterministic: the order of kernel dispatches to specific Execution Units and specific hardware threads, their IDs, and the order of memory accesses are all preserved. The trace is generated once and can be replayed multiple times.
One can develop kernel profiling and analysis tools on top of GTReplay. The analysis tools communicate with GTReplay via the so-called Tool API. The Tool API allows traversing the kernel binary, inspecting the GEN instructions, registering callbacks before and/or after any kernel instruction, and observing the current state of each hardware thread. Profiling and analysis tools can be written in C/C++. Any number of analysis tools can run concurrently on the same kernel replay.
GTReplay is available, along with a set of sample analysis tools. It enables users to develop their own analysis tools.
========================================================================================
========================================================================================
GTReplay supports the following operating systems:
GTReplay is provided in both 32-bit and 64-bit forms. GTReplay supports cross-platform use: one can generate gLIT traces on Linux and replay them on Windows, and vice versa.
GTReplay supports the following HW platforms:
With GTReplay, a user can develop profiling and analysis tools of any complexity. The tools are written in a high-level language (C/C++), and any instruction of the kernel can be instrumented and observed. The register state of any HW thread is available at any point.
With GTReplay one can develop memory and cache models, model the traffic via different HW samplers and fixed functions, etc.
A user can debug kernels either by using the built-in printing capabilities or by developing their own debugger.
GTReplay can capture any data available at the EU scope while executing a program. It can capture such data at the lowest granularity possible: the single EU assembly instruction. You can create an unlimited variety of analysis tools using the GTReplay technology.
More details on the existing tools can be found in GTReplay Sample Tools.
gLIT traces are collected separately, for:
A user can limit GTPin profiling for specific kernels and shaders, and for specific Enqueue/Draw commands.
A user can limit gLIT generation to specific Thread Group IDs (GTPin: Defining the Profiled Thread Group IDs).
========================================================================================
========================================================================================
GTReplay is part of the GTPin release package. It is located within the Profilers\GTReplay folder.
The GTReplay package has the following directory structure:
GTReplay
|--common
|--examples
|--ia32
|--intel64
|--utils
========================================================================================
========================================================================================
To create gLITs, one needs to run GTPin with the gentrace tool. The gentrace DLL is located in the ia32 or intel64 sub-folder. As with the general GTPin tracing tools, gentrace should be run in two phases: pre-processing and trace-gathering.
To run the pre-processing phase of the gentrace tool (in its default configuration) use the following command:
Profilers/Bin/gtpin -t Profilers\GTReplay\intel64\gentrace.dll --phase 1 -- app
NOTE: You can run this phase only once per application.
To run the trace-gathering phase of the gentrace tool (in its default configuration), use the following command:
Profilers/Bin/gtpin -t Profilers\GTReplay\intel64\gentrace.dll --phase 2 -- app
When you run the Gentrace tool in its default configuration for pre-processing (phase 1), the tool generates a directory called: GTPIN_PROFILE_GENTRACE0. In addition the tool creates the following two files in the current directory:
gentrace_pre_process.txt: Describes the trace buffer required by each kernel dispatched for execution on the device. The file contains the maximum number of trace records generated by each kernel, taken over all instances of the Draw/Enqueue commands in which this kernel executed on the device. For example, if the kernel generates between 5 and 15 records when executed, the allocated buffer for that kernel should be large enough to hold 15 records. This file is an input to the trace-gathering phase. It has the following format:
BitonicSort___CS_asmf641279bbb4bc39f_simd32_7a6d7d0b064fb7e1 4456448
where, for each kernel, the maximum number of required trace records is provided.
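As a rough illustration of the format above, the following sketch reads such a file into a mapping from kernel name to record count. It assumes each non-empty line is a kernel name followed by a whitespace-separated integer, as in the sample line; the function name `parse_pre_process` is hypothetical and is not part of GTPin or GTReplay.

```python
from typing import Dict, Iterable


def parse_pre_process(lines: Iterable[str]) -> Dict[str, int]:
    """Map each kernel name to its maximum number of trace records.

    Assumption: every non-empty line is "<kernel name> <record count>".
    """
    limits: Dict[str, int] = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        # Split from the right so the last token is the record count.
        name, count = line.rsplit(None, 1)
        # Assumption: if a kernel is listed more than once, keep the largest value.
        limits[name] = max(limits.get(name, 0), int(count))
    return limits


# The sample line shown above.
sample = ["BitonicSort___CS_asmf641279bbb4bc39f_simd32_7a6d7d0b064fb7e1 4456448"]
limits = parse_pre_process(sample)
```

To read a real file, pass an open file object, for example `parse_pre_process(open("gentrace_pre_process.txt"))`.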
gentrace_pre_process_sw_threads.txt: Contains information about the SW threads created by each kernel when the kernel is executed on the device. Each line in the .txt file contains the name of the kernel executed on the device; the number of SW threads generated by that kernel's execution of a Draw or Enqueue command; the ID of the Draw or Enqueue command; and some other metadata. This file contains informational data only, and has the following format:
BitonicSort___CS_asmf641279bbb4bc39f_simd32_7a6d7d0b064fb7e1 262144 OpenCL 0 0
BitonicSort___CS_asmf641279bbb4bc39f_simd32_7a6d7d0b064fb7e1 131072 OpenCL 0 1
BitonicSort___CS_asmf641279bbb4bc39f_simd32_7a6d7d0b064fb7e1 262144 OpenCL 0 2
BitonicSort___CS_asmf641279bbb4bc39f_simd32_7a6d7d0b064fb7e1 131072 OpenCL 0 3
BitonicSort___CS_asmf641279bbb4bc39f_simd32_7a6d7d0b064fb7e1 131072 OpenCL 0 4
BitonicSort___CS_asmf641279bbb4bc39f_simd32_7a6d7d0b064fb7e1 262144 OpenCL 0 5
BitonicSort___CS_asmf641279bbb4bc39f_simd32_7a6d7d0b064fb7e1 131072 OpenCL 0 6
BitonicSort___CS_asmf641279bbb4bc39f_simd32_7a6d7d0b064fb7e1 131072 OpenCL 0 7
BitonicSort___CS_asmf641279bbb4bc39f_simd32_7a6d7d0b064fb7e1 131072 OpenCL 0 8
BitonicSort___CS_asmf641279bbb4bc39f_simd32_7a6d7d0b064fb7e1 262144 OpenCL 0 9
BitonicSort___CS_asmf641279bbb4bc39f_simd32_7a6d7d0b064fb7e1 131072 OpenCL 0 10
where each line corresponds to a single kernel for a single Draw/Enqueue command. The fields have the following meaning (from left to right):
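For a quick summary of this file, a minimal sketch such as the one below can total the SW threads per kernel. It relies on an assumption inferred from the sample: the first field is the kernel name and the second is the SW thread count. The remaining fields are ignored, and the function name `sw_threads_per_kernel` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable


def sw_threads_per_kernel(lines: Iterable[str]) -> Dict[str, int]:
    """Sum the SW thread counts of all Draw/Enqueue commands per kernel.

    Assumption: field 0 is the kernel name and field 1 is the SW thread count.
    """
    totals: Dict[str, int] = defaultdict(int)
    for raw in lines:
        fields = raw.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        totals[fields[0]] += int(fields[1])
    return dict(totals)


# First three records of the sample shown above.
kernel = "BitonicSort___CS_asmf641279bbb4bc39f_simd32_7a6d7d0b064fb7e1"
sample = [
    kernel + " 262144 OpenCL 0 0",
    kernel + " 131072 OpenCL 0 1",
    kernel + " 262144 OpenCL 0 2",
]
totals = sw_threads_per_kernel(sample)
```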
When the Gentrace tool is run for the trace-gathering phase (phase 2), the tool generates the directory: GTPIN_PROFILE_GENTRACE1. GTPin saves the profiling results in the folder: GTPIN_PROFILE_GENTRACE1\Session_Final. The traces for each kernel are saved in a separate sub-folder that has the same name as the kernel. Each Draw/Enqueue command has a separate trace, which is saved in a corresponding sub-directory. The trace is provided in a compressed binary format.
To run GTReplay, use the following command line:
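The layout described above (one sub-folder per kernel, one sub-directory per Draw/Enqueue command) can be walked with a short script. This is a sketch under that stated layout only: it makes no assumption about how the per-command sub-directories are named, and the function name `list_traces` is hypothetical.

```python
from pathlib import Path
from typing import Dict, List


def list_traces(session_final: Path) -> Dict[str, List[Path]]:
    """Return, for each kernel sub-folder, its per-command trace directories."""
    traces: Dict[str, List[Path]] = {}
    kernel_dirs = sorted(p for p in session_final.iterdir() if p.is_dir())
    for kernel_dir in kernel_dirs:
        # Each sub-directory is assumed to hold the trace of one Draw/Enqueue command.
        traces[kernel_dir.name] = sorted(
            p for p in kernel_dir.iterdir() if p.is_dir()
        )
    return traces
```

For example, `list_traces(Path("GTPIN_PROFILE_GENTRACE1") / "Session_Final")` would return a dictionary keyed by kernel name.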
Profilers\GTReplay\intel64\gtreplay.exe [-t tool1] [-t tool2] [-t tool3] [GTReplay arguments] -- path-to-the-location-of-the-trace
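When replaying many traces from a script, the command line above can be assembled programmatically. The sketch below only arranges the pieces shown in the command (`-t` per tool, optional GTReplay arguments, `--`, then the trace location). The helper name `build_replay_command` and the tool names in the example are hypothetical.

```python
from typing import List, Sequence


def build_replay_command(
    gtreplay: str,
    trace_path: str,
    tools: Sequence[str] = (),
    extra_args: Sequence[str] = (),
) -> List[str]:
    """Assemble a GTReplay invocation as an argument list."""
    cmd = [gtreplay]
    for tool in tools:
        cmd += ["-t", tool]      # one -t per analysis tool
    cmd += list(extra_args)      # GTReplay arguments, passed through unchanged
    cmd += ["--", trace_path]    # the trace location follows the separator
    return cmd


# "tool1" and "tool2" are placeholder tool names.
cmd = build_replay_command("gtreplay.exe", "trace_dir", tools=["tool1", "tool2"])
```

The resulting list can be passed directly to `subprocess.run(cmd)`, which avoids shell quoting issues with paths that contain spaces.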
The arguments and parameters that can be provided to GTReplay are listed in GTReplay Configuration.
========================================================================================
========================================================================================
GTReplay supports several configuration parameters. The most useful parameters are:
-t
--threads n
--pause
--stop
--debug
--tid-debug tid
--debug-data
--barrier
--emul-time
--time
========================================================================================
========================================================================================
The list of the ready-to-use sample tools:
Copyright (C) 2013-2025 Intel Corporation
SPDX-License-Identifier: MIT
1.7.4