GTPin
The Opcodeprof tool provides the dynamic frequencies of each of the kernel instructions ("instruction mix"), in the form of opcode histograms.
To run the Opcodeprof tool (in its default configuration), use the following command:
Profilers/Bin/gtpin -t opcodeprof -- app
When you run the in-house GTPin Opcodeprof tool in its default configuration, the tool generates the directory: GTPIN_PROFILE_OPCODEPROF0. The profiling results are stored in the sub-folder: GTPIN_PROFILE_OPCODEPROF0\Session_Final. Results are provided for each kernel/shader configuration, and also in an accumulated (total) configuration.
For each binary kernel/shader that was dispatched to the device, the tool generates a directory with an extended kernel name: KernelName__CompilerGeneratedName, as shown in the following screenshot:
NOTE: If GTPin does not know the name of the kernel, a compiler-generated name of the form CS_asmf54af91315561f54_simd8 is assigned as the kernel name. In the compiler-generated name, the prefix indicates the kernel type, the suffix indicates the SIMD width for which the kernel was compiled, and the 16-digit hexadecimal number is the hash ID of the IR representation of the kernel. This is shown in the previous screenshot.
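The naming scheme described in the note can be illustrated with a small parser. This is a minimal sketch and not part of GTPin: the struct GeneratedKernelName and the function ParseGeneratedKernelName are hypothetical names, and the code assumes that the 16-digit hash is hexadecimal, as in the example name.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper type; not part of the GTPin API.
struct GeneratedKernelName
{
    std::string kernelType;   // prefix, e.g. "CS" or "PS"
    std::string hashId;       // 16 hex digits identifying the kernel IR (assumed hexadecimal)
    unsigned    simdWidth = 0; // suffix, e.g. 8, 16 or 32
};

// Splits a name of the form <TYPE>_asm<16 hex digits>_simd<WIDTH>.
// Returns false when the text does not follow that pattern.
bool ParseGeneratedKernelName(const std::string& name, GeneratedKernelName& out)
{
    const std::string asmTag  = "_asm";
    const std::string simdTag = "_simd";

    const std::size_t asmPos  = name.find(asmTag);
    const std::size_t simdPos = name.rfind(simdTag);
    if (asmPos == std::string::npos || simdPos == std::string::npos || asmPos == 0)
        return false;

    const std::size_t hashBegin = asmPos + asmTag.size();
    if (simdPos < hashBegin)
        return false;

    const std::string hash = name.substr(hashBegin, simdPos - hashBegin);
    if (hash.size() != 16)
        return false;
    for (unsigned char c : hash)
        if (!std::isxdigit(c)) return false;

    const std::string width = name.substr(simdPos + simdTag.size());
    if (width.empty() || width.size() > 3)
        return false;
    for (unsigned char c : width)
        if (!std::isdigit(c)) return false;

    out.kernelType = name.substr(0, asmPos);
    out.hashId     = hash;
    out.simdWidth  = static_cast<unsigned>(std::stoul(width));
    return true;
}
```

Calling ParseGeneratedKernelName with "CS_asmf54af91315561f54_simd8" fills the struct with the kernel type "CS", the hash "f54af91315561f54" and the SIMD width 8. A user-provided kernel name that does not follow the pattern makes the function return false.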
Each kernel/shader folder contains the resulting file opcodeprof_total.out, which summarizes the results for this specific kernel/shader. In addition, an opcodeprof_total.out file in the root directory summarizes all the application kernels/shaders together.
Each individual opcodeprof_total.out file has the following format:
DYNAMIC OPCODE HISTOGRAMS PER EXECUTION DATA TYPES
==================================================
DATA TYPE: ud
OPCODE Report : Opcode SIMD Static (%) Dynamic (%)
sendsc 16 2 (16.7%) 11870 (16.7%)
sends 16 2 (16.7%) 11870 (16.7%)
mov 1 2 (16.7%) 11870 (16.7%)
add 1 1 ( 8.3%) 5935 ( 8.3%)
mov 8 1 ( 8.3%) 5935 ( 8.3%)
DATA TYPE: f
OPCODE Report : Opcode SIMD Static (%) Dynamic (%)
pln 16 4 (33.3%) 23740 (33.3%)
// kernel name: PS_asm537c7dc831196b2d_simd32
Static instruction count = 12
Dynamic instruction count = 71220
[ 5935] (W) mov (8|M0) r21.0<1>:ud r0.0<8;8,1>:ud {Compacted}
[ 5935] (W) pln (16|M0) r19.0<1>:f r12.4<0;1,0>:f r3.0<8;8,1>:f {Compacted}
[ 5935] (W) pln (16|M0) r17.0<1>:f r12.0<0;1,0>:f r3.0<8;8,1>:f {Compacted}
[ 5935] (W) add (1|M0) a0.0<1>:ud r11.0<0;1,0>:ud 0x102:ud {Compacted}
[ 5935] (W) mov (1|M0) r21.3<1>:ud r11.1<0;1,0>:ud
[ 5935] (W) pln (16|M16) r15.0<1>:f r12.4<0;1,0>:f r7.0<8;8,1>:f
[ 5935] (W) mov (1|M0) r21.2<1>:ud 0x0:ud {Compacted}
[ 5935] (W) pln (16|M16) r13.0<1>:f r12.0<0;1,0>:f r7.0<8;8,1>:f
[ 5935] sends (16|M0) r0:w r21 r17 a0.0 0x28C00FC
[ 5935] sends (16|M16) r112:w r21 r13 a0.0 0x28C00FC
[ 5935] sendsc (16|M0) null:w r0 null 0x5 0x10031000
[ 5935] sendsc (16|M16) null:w r112 null 0x25 0x10031800 {EOT}
The results are presented as dynamic histograms of opcodes, grouped by operational data type. For each data type and opcode, the report lists the opcode mnemonic, its operational SIMD width, the static count of the tuple (Opcode, SIMD, data type) within the kernel, that count as a percentage of all static instructions of the kernel, the dynamic count of the same tuple, and that count as a percentage of all dynamic instructions that were counted. The report then gives the kernel name, the total number of static instructions, and the total number of dynamic instructions. Finally, it lists all assembly instructions of the kernel, along with the dynamic count of each instruction.
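The relationship between the counts and the percentages can be checked with a short sketch. The code below is not part of GTPin; OpcodeRow, ParsePerKernelRow and PercentOf are hypothetical names. It assumes the row layout seen in the sample (opcode, SIMD, static count, static percentage, dynamic count, dynamic percentage) and that percentages are rounded to one decimal place.

```cpp
#include <cmath>
#include <cstdio>
#include <string>

// One histogram row of a per-kernel opcodeprof_total.out file (hypothetical type).
struct OpcodeRow
{
    std::string        opcode;
    unsigned           simd = 0;
    unsigned long long staticCount = 0;
    double             staticPct = 0.0;
    unsigned long long dynamicCount = 0;
    double             dynamicPct = 0.0;
};

// Parses a row such as "add 1 1 ( 8.3%) 5935 ( 8.3%)".
// Returns false for headers and any other line that does not match.
bool ParsePerKernelRow(const std::string& line, OpcodeRow& row)
{
    char opcode[32] = {};
    const int fields = std::sscanf(line.c_str(), "%31s %u %llu ( %lf%%) %llu ( %lf%%)",
                                   opcode, &row.simd,
                                   &row.staticCount, &row.staticPct,
                                   &row.dynamicCount, &row.dynamicPct);
    if (fields != 6)
        return false;
    row.opcode = opcode;
    return true;
}

// Share of 'count' in 'total', as a percentage rounded to one decimal place.
double PercentOf(unsigned long long count, unsigned long long total)
{
    if (total == 0)
        return 0.0;
    return std::round(1000.0 * static_cast<double>(count) / static_cast<double>(total)) / 10.0;
}
```

For the sample kernel, the row "sendsc 16 2 (16.7%) 11870 (16.7%)" is consistent with the reported totals: 2 of 12 static instructions and 11870 of 71220 dynamic instructions both round to 16.7%.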
In the root directory, the file opcodeprof_total.out summarizes the results of all kernels/shaders of the application. It provides the total number of binary kernels, the total dynamic number of instructions, the dynamic opcode histograms per data type, and the listing of all assembly kernels, along with dynamic frequencies of each instruction, as shown here:
Total number of kernels: 8
Total number of instructions: 1907624
DYNAMIC OPCODE HISTOGRAMS PER EXECUTION DATA TYPES
==================================================
DATA TYPE: ud
OPCODE Report : Opcode SIMD Dynamic (%)
sends 8 146262 ( 7.7%)
sends 16 92520 ( 4.9%)
add 1 89247 ( 4.7%)
mov 1 38574 ( 2.0%)
sendsc 16 22560 ( 1.2%)
mov 8 9615 ( 0.5%)
sendsc 8 6342 ( 0.3%)
DATA TYPE: d
OPCODE Report : Opcode SIMD Dynamic (%)
mov 8 69960 ( 3.7%)
DATA TYPE: uw
OPCODE Report : Opcode SIMD Dynamic (%)
mov 1 69960 ( 3.7%)
DATA TYPE: f
OPCODE Report : Opcode SIMD Dynamic (%)
mad 8 559680 (29.3%)
mul 8 285182 (14.9%)
add 8 279840 (14.7%)
mov 8 145262 ( 7.6%)
pln 16 45120 ( 2.4%)
mul 16 17408 ( 0.9%)
mov 16 17408 ( 0.9%)
pln 8 12684 ( 0.7%)
[ 34980] (W) mov (1|M0) r0.2<1>:ud 0x0:uw
[ 34980] (W) add (1|M0) a0.0<1>:ud r2.2<0;1,0>:ud 0xA:ud {Compacted}
[ 34980] mov (8|M0) r119.0<1>:f r6.0<8;8,1>:f {Compacted}
[ 34980] (W) sends (16|M0) r117:w r0 null a0.0 0x22843FC // wr:1h+?, rd:2, ?
[ 34980] mov (8|M0) r120.0<1>:f r7.0<8;8,1>:f {Compacted}
[ 34980] (W) mov (8|M0) r112.0<1>:d r1.0<8;8,1>:d
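As a cross-check of the accumulated report, the dynamic counts can be summed per data type. The following is a minimal sketch and not part of GTPin. SumDynamicCountsPerDataType is a hypothetical name, and the parsing assumes the row layout of the sample above (opcode, SIMD, dynamic count, percentage) under "DATA TYPE:" headings.

```cpp
#include <cstddef>
#include <cstdio>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Reads the text of a root opcodeprof_total.out file and returns the sum of
// dynamic counts for every "DATA TYPE:" section. Lines that are not histogram
// rows (headers, totals, assembly listing) are ignored.
std::map<std::string, unsigned long long> SumDynamicCountsPerDataType(std::istream& in)
{
    std::map<std::string, unsigned long long> totals;
    const std::string typeTag = "DATA TYPE:";
    std::string line;
    std::string currentType;

    while (std::getline(in, line))
    {
        const std::size_t tagPos = line.find(typeTag);
        if (tagPos != std::string::npos)
        {
            // Start of a new section: remember its data type (ud, d, uw, f, ...)
            std::istringstream rest(line.substr(tagPos + typeTag.size()));
            rest >> currentType;
            continue;
        }

        char opcode[32] = {};
        unsigned simd = 0;
        unsigned long long dynamicCount = 0;
        double percent = 0.0;
        if (!currentType.empty() &&
            std::sscanf(line.c_str(), "%31s %u %llu ( %lf%%)",
                        opcode, &simd, &dynamicCount, &percent) == 4)
        {
            totals[currentType] += dynamicCount;
        }
    }
    return totals;
}
```

Applied to the sample above, the per-type sums are 405120 (ud), 69960 (d), 69960 (uw) and 1362584 (f), which add up to the reported total of 1907624 instructions.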
To print only the total (application-wide) results, use the following command line:
Profilers/Bin/gtpin -t opcodeprof --total_only -- app
In the above case, only one file is generated: GTPIN_PROFILE_OPCODEPROF0\Session_Final\opcodeprof_total.out.
To obtain the profiling for each HW thread, use the following command line:
Profilers/Bin/gtpin -t opcodeprof --per_thread_data --num_thread_buckets 0 -- app
In this case, files such as opcodeprof__s_0_ss_2_eu_7_tid_5.out are also generated for each binary kernel/shader in the corresponding sub-folder. In the file names, s indicates the Slice number, ss indicates the Sub-Slice number, eu indicates the Execution Unit number, and tid indicates the HW Thread ID number.
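The file-name convention can be decoded mechanically. The sketch below is not part of GTPin; HwThreadLocation and ParsePerThreadFileName are hypothetical names, and the pattern is taken from the single example file name above.

```cpp
#include <cstdio>
#include <string>

// Location of a HW thread as encoded in a per-thread file name (hypothetical type).
struct HwThreadLocation
{
    unsigned slice = 0;
    unsigned subSlice = 0;
    unsigned eu = 0;
    unsigned tid = 0;
};

// Parses names such as "opcodeprof__s_0_ss_2_eu_7_tid_5.out".
// Returns false when the name does not match the pattern exactly.
bool ParsePerThreadFileName(const std::string& fileName, HwThreadLocation& loc)
{
    int consumed = 0;
    const int fields = std::sscanf(fileName.c_str(),
                                   "opcodeprof__s_%u_ss_%u_eu_%u_tid_%u.out%n",
                                   &loc.slice, &loc.subSlice, &loc.eu, &loc.tid, &consumed);
    // %n is stored only if the trailing ".out" matched; require that it ends the name.
    return fields == 4 && consumed == static_cast<int>(fileName.size());
}
```

For the example name, the function reports slice 0, sub-slice 2, EU 7 and thread ID 5. The accumulated opcodeprof_total.out does not match the pattern and is rejected.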
/*========================== begin_copyright_notice ============================
Copyright (C) 2018-2024 Intel Corporation

SPDX-License-Identifier: MIT
============================= end_copyright_notice ===========================*/

/*!
 * @file Opcodeprof tool definitions
 */

#ifndef OPCODEPROF_H_
#define OPCODEPROF_H_

#include <map>
#include <vector>
#include <string>
#include <tuple>

#include "gtpin_api.h"
#include "gtpin_tool_utils.h"
#include "opcodeprof_utils.h"

using namespace gtpin;


/* ============================================================================================= */
// Class Opcodeprof
/* ============================================================================================= */
/*!
 * Implementation of the IGtTool interface for the opcodeprof tool
 */
class Opcodeprof : public GtTool
{
public:
    /// Implementation of the IGtTool interface
    const char* Name() const { return "opcodeprof"; }

    void OnKernelBuild(IGtKernelInstrument& instrumentor);
    void OnKernelRun(IGtKernelDispatch& dispatcher);
    void OnKernelComplete(IGtKernelDispatch& dispatcher);

public:

    static Opcodeprof* Instance();               ///< @return Single instance of this class
    static void OnFini() { Instance()->Fini(); } ///< Callback function registered with atexit()

private:
    Opcodeprof() = default;
    Opcodeprof(const Opcodeprof&) = delete;
    Opcodeprof& operator = (const Opcodeprof&) = delete;
    ~Opcodeprof() = default;

    void Fini(); /// Post process and dump profiling data

private:
    std::map<GtKernelId, OpcodeprofKernelProfile> _kernels; ///< Collection of kernel profiles
};

#endif
/*========================== begin_copyright_notice ============================
Copyright (C) 2018-2025 Intel Corporation

SPDX-License-Identifier: MIT
============================= end_copyright_notice ===========================*/

/*!
 * @file Implementation of the Opcodeprof tool
 */

#include <fstream>
#include <sstream>
#include <iomanip>
#include <algorithm>
#include <functional>

#include "opcodeprof.h"

using namespace gtpin;

/* ============================================================================================= */
// Configuration
/* ============================================================================================= */
Knob<int> knobNumThreadBuckets("num_thread_buckets", 32, "Number of thread buckets. 0 - maximum thread buckets");

/* ============================================================================================= */
// Opcodeprof implementation
/* ============================================================================================= */
Opcodeprof* Opcodeprof::Instance()
{
    static Opcodeprof instance;
    return &instance;
}

void Opcodeprof::OnKernelBuild(IGtKernelInstrument& instrumentor)
{
    const IGtKernel&           kernel    = instrumentor.Kernel();
    const IGtCfg&              cfg       = instrumentor.Cfg();
    const IGtGenCoder&         coder     = instrumentor.Coder();
    const IGtGenModel&         genModel  = kernel.GenModel();
    IGtProfileBufferAllocator& allocator = instrumentor.ProfileBufferAllocator();
    IGtVregFactory&            vregs     = coder.VregFactory();
    IGtInsFactory&             insF      = coder.InstructionFactory();

    // Initialize virtual registers
    GtReg addrReg = vregs.MakeMsgAddrScratch(); ///< Virtual register that holds address within profile buffer

    // Allocate the profile buffer. It will hold single OpcodeprofRecord per each basic block in each thread bucket
    uint32_t numThreadBuckets = (knobNumThreadBuckets == 0) ? genModel.MaxThreadBuckets() : knobNumThreadBuckets;
    uint32_t numRecords = cfg.NumBbls();
    GtProfileArray profileArray(sizeof(OpcodeprofRecord), numRecords, numThreadBuckets);
    profileArray.Allocate(allocator);

    // Instrument basic blocks
    for (auto bblPtr : cfg.Bbls())
    {
        if (!bblPtr->IsEmpty())
        {
            GtGenProcedure proc;
            uint32_t recordNum = bblPtr->Id();

            // addrReg = address of the current thread's OpcodeprofRecord in the profile buffer
            profileArray.ComputeAddress(coder, proc, addrReg, recordNum);

            // [addrReg].freq++
            proc += insF.MakeAtomicInc(NullReg(), addrReg, GED_DATA_TYPE_ud);

            if (!proc.empty()) { proc.front()->AppendAnnotation(__func__); }
            InstrumentBbl(instrumentor, *bblPtr, GtIpoint::Before(), proc);
        }
    }

    // Create OpcodeprofKernelProfile object that represents profile of this kernel
    _kernels.emplace(kernel.Id(), OpcodeprofKernelProfile(kernel, cfg, profileArray));
}

void Opcodeprof::OnKernelRun(IGtKernelDispatch& dispatcher)
{
    bool isProfileEnabled = false;

    const IGtKernel& kernel = dispatcher.Kernel();
    GtKernelExecDesc execDesc; dispatcher.GetExecDescriptor(execDesc);
    if (kernel.IsInstrumented() && IsKernelExecProfileEnabled(execDesc, kernel.GpuPlatform(), kernel.Name().Get()))
    {
        auto it = _kernels.find(kernel.Id());

        if (it != _kernels.end())
        {
            IGtProfileBuffer* buffer = dispatcher.CreateProfileBuffer(); GTPIN_ASSERT(buffer);
            OpcodeprofKernelProfile& kernelProfile = it->second;
            const GtProfileArray& profileArray = kernelProfile.GetProfileArray();
            if (profileArray.Initialize(*buffer))
            {
                isProfileEnabled = true;
            }
            else
            {
                GTPIN_ERROR_MSG("OPCODEPROF: " + std::string(kernel.Name()) + " : Failed to write into memory buffer");
            }
        }
    }
    dispatcher.SetProfilingMode(isProfileEnabled);
}

void Opcodeprof::OnKernelComplete(IGtKernelDispatch& dispatcher)
{
    if (!dispatcher.IsProfilingEnabled())
    {
        return; // Do nothing with unprofiled kernel dispatches
    }

    const IGtKernel& kernel = dispatcher.Kernel();
    auto it = _kernels.find(kernel.Id());

    if (it != _kernels.end())
    {
        const IGtProfileBuffer* buffer = dispatcher.GetProfileBuffer(); GTPIN_ASSERT(buffer);
        OpcodeprofKernelProfile& kernelProfile = it->second;
        const GtProfileArray& profileArray = kernelProfile.GetProfileArray();

        for (uint32_t recordNum = 0; recordNum != profileArray.NumRecords(); ++recordNum)
        {
            for (uint32_t threadBucket = 0; threadBucket < profileArray.NumThreadBuckets(); ++threadBucket)
            {
                OpcodeprofRecord record;
                if (!profileArray.Read(*buffer, &record, recordNum, 1, threadBucket))
                {
                    GTPIN_ERROR_MSG("OPCODEPROF: " + std::string(kernel.Name()) + " : Failed to read from memory buffer");
                }
                else
                {
                    kernelProfile.Accumulate(record, (BblId)recordNum);
                }
            }
        }
    }
}

void Opcodeprof::Fini()
{
    DumpProfile(_kernels);
    DumpAsm(_kernels);
}

/* ============================================================================================= */
// GTPin_Entry
/* ============================================================================================= */
EXPORT_C_FUNC void GTPin_Entry(int argc, const char* argv[])
{
    ConfigureGTPin(argc, argv);
    Opcodeprof::Instance()->Register();
    atexit(Opcodeprof::OnFini);
}
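The listing above keeps one execution counter per basic block and per thread bucket. The post-processing that turns these counters into opcode histograms lives in opcodeprof_utils.h, which is not shown. The following host-side sketch illustrates one plausible way to do it and is an assumption, not the tool's actual code: StaticIns, HistogramKey and BuildDynamicHistogram are hypothetical names. It sums each block's counters over all thread buckets and credits that sum to every instruction in the block.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <tuple>
#include <vector>

// Static description of one instruction (hypothetical stand-in for GTPin's IR).
struct StaticIns
{
    std::string opcode;   // e.g. "mov"
    uint32_t    simd;     // e.g. 8
    std::string dataType; // e.g. "ud"
};

// Histogram key: the (Opcode, SIMD, data type) tuple used in the report.
using HistogramKey = std::tuple<std::string, uint32_t, std::string>;

// bbls[b]          - instructions of basic block b
// freqPerBucket[b] - execution counters of block b, one per thread bucket
std::map<HistogramKey, uint64_t> BuildDynamicHistogram(
    const std::vector<std::vector<StaticIns>>& bbls,
    const std::vector<std::vector<uint32_t>>&  freqPerBucket)
{
    std::map<HistogramKey, uint64_t> histogram;
    for (std::size_t bbl = 0; bbl < bbls.size() && bbl < freqPerBucket.size(); ++bbl)
    {
        // Accumulate the block's counter over all thread buckets
        uint64_t bblCount = 0;
        for (uint32_t freq : freqPerBucket[bbl])
            bblCount += freq;

        // Every instruction of the block executes once per block execution
        for (const StaticIns& ins : bbls[bbl])
            histogram[HistogramKey(ins.opcode, ins.simd, ins.dataType)] += bblCount;
    }
    return histogram;
}
```

In this model, a block that ran 5935 times and contains two SIMD1 mov instructions of type ud contributes 11870 to the (mov, 1, ud) entry, which matches the shape of the per-kernel sample report.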
Copyright (C) 2013-2025 Intel Corporation
SPDX-License-Identifier: MIT