2013-12-14

Originally posted on: http://geekswithblogs.net/akraus1/archive/2013/12/14/154884.aspx

When you try to make sense of Windows Event Tracing files (.etl) it is often helpful to add some special events into the event stream so you can closer examine near the “Here it was slow” event which helps a lot to look at the right spots. If you are lucky you can do this by routing your tracing data to ETW or you need to set some marker events by hand. The traditional was of doing this was to call xperf –m “Here it was slow” during your repro session. I have observed several times since then that this command did no longer work on some machines.

C:\Program Files (x86)\Windows Kits\8.1\bin\x64> xperf -m "hi"
xperf: error: Set Mark failed                                 
                                                              

These things did never happen on my Windows 7 machine. I thought it was some oversight in Windows 8. Interestingly if I started the tracing session with xperf all worked well again. But when I did start a tracing session with the new WPR/UI tool from the Windows 8.1 SDK this error was consistently there. After digging a little deeper with xperf –Loggers I found that WPR/UI do not create the usual "NT Kernel Logger" session but instead a WPR_initiated_WprApp_WPR System Collector session for which I cannot see the trace flags anymore.

C:\Windows\system32>xperf -Loggers
Logger Name           : WPR_initiated_WprApp_WPR System Collector
Logger Id             : 11
Logger Thread Id      : 0000000000000000
Buffer Size           : 1024
Maximum Buffers       : 1632
Minimum Buffers       : 1632
Number of Buffers     : 1632
Free Buffers          : 1624
Buffers Written       : 0
Events Lost           : 0
Log Buffers Lost      : 0
Real Time Buffers Lost: 0
Flush Timer           : 0
Age Limit             : 0
Log File Mode         : Secure Buffered PersistOnHybridShutdown SystemLogger
Maximum File Size     : 0
Log Filename          :
Trace Flags           :
PoolTagFilter         : *

That does explain why I cannot add mark  events anymore on Windows 8 machines. There is no more "NT Kernel Logger" session to set the mark event. This is bad news for xperf –m but good news for all people who like to keep Process Explorer and other tools open while they are taking kernel trace events. It seems that xperf was not updated to display the tracing flags for the new type of kernel tracing sessions because another tool from the Windows 8.1 SDK can display them: tracelog

                                                                    
C:\Program Files (x86)\Windows Kits\8.1\bin\x64>tracelog.exe  -l    
                                                                    

Logger Name:            WPR_initiated_WprApp_WPR System Collector
Logger Id:              0x11
Logger Thread Id:       0000000000000000
Guid:                   c7f66f95-6045-11e3-8265-d43d7ef06eb2
Session Security:       D:(A;;0x800;;;WD)(A;;0x120fff;;;SY)(A;;0x120fff;;;LS)(A;;0x120fff;;;NS)(A;;0x120fff;;;BA)(A;;0xee5;;;LU)(A;;LC;;;MU)(A;;
Buffer Size:            1024 Kb
Maximum Buffers:        1632
Minimum Buffers:        1632
Number of Buffers:      1632
Free Buffers:           1624
Buffers Written:        0
Events Lost:            0
Log Buffers Lost:       0
Real Time Buffers Lost: 0
AgeLimit:               0
Real Time Consumers:    0
ClockType:              PerfCounter
Log Mode:               Secure  Buffering-only
Hybrid Shutdown:        Persist
Maximum File Size:      not set
Buffer Flush Timer:     not set
Enabled tracing:        Process Thread ProcCounters ImageLoad DiskIo HardFaults CxtSwap Dpc Isr Profile Power MemInfo MemInfoWs Priority Dispatcher … 
Stack Tracing:          DiskRead DiskWrite DiskFlush ThreadDCEnd CSwitch ReadyThread KernelQueueEnqueue KernelQueueDequeue Profile
Log Filename:

 

When you look at the documentation of tracelog you will find many Windows 8 and Windows 8.1 additions which make it a cool tool to record trace sessions. Although I must admit that the sheer number of tools to start traces is getting out of hand. We have

ETW Control Tool

Part Of

WPR/UI

Windows Performance Toolkit

xperf

Windows Performance Toolkit

logman

Windows

tracelog

Since Windows 8.1 SDK formerly only part of WDK.

PerfView

Extra Download from MS

Each tool has another syntax and philosophy behind which makes it a bit cumbersome to use the right tool. But as far as I can tell you should check out tracelog first since it is the only tool which is able to make use of the new cool Windows 8/8.1 enhancements. Below is a list of the new ETW capabilities of Windows 8/8.1

Since

Tracelog Feature

Description

8.1

-bt n

Specifies the number (n) of buffers to fill before starting to flush them.

8.1

-EventIdFilter {-in|-out}nid1 id2 ...

Specifies an event id filter with n event ids (maximum 64 event ids allowed).

8.1

-ExeFilter Executable_file [; Executable_file ...]

Specifies the names of executable files to filter. You can specify a list of files. Separate the names of the files using semi-colons. Files not listed are excluded.

8

-hybridshutdown {stop|persist}

Controls hybrid shutdown logger behavior.

8.1

-independent

Enables independent mode on the trace session.

8.1

-PidFilter n pid1 pid2 ...

Specifies a Pid filter with n Pids (maximum 8 allowed).

8.1

-PkgIdFilter Package Full Name [ ;Package Full Name...]

Specifies a package ID filter. You can specify a list of package files. Separate the names of the files using semi-colons.

8.1

-PkgAppIdFilter PRAID [ ;PRAID...]

Specifies a package-relative app identifier (PRAID) filter. The PRAID is the unique identifier of the application within the package. You can specify more than one PRAID. Separate the ids using semi-colons.

8

-Pmc Ctrs:Events

Configures the performance monitor counter (PMC) sampling on events.

8

-ProfileSource src

Configure profiling source to use. For list of sources, use the command tracelog -ProfileSource Help.

8

-SetProfInt n src

Configure the profiling interval (n) for specified source. Where n represents units of 100ns. The default is 10000 (which is equivalent to 1ms.

8.1

-StackWalkFilter {-in|-out}nid1 id2 ...

Specifies an event id filter with n event ids (maximum 64 event ids allowed).

8

-systemlogger

Logger can receive SystemTraceProvider events. See Configuring and Starting a SystemTraceProvider Session.

By default tracelog will start loggers with kernel mode providers with the systemlogger option so you can start up to 8 additional Kernel Loggers which should be enough for everyone. To log e.g. all Context Switch events with call stacks you can use

                                                                                                                                     
tracelog -start MySession -f c:\Kernel1.etl -img -eflag PROC_THREAD+LOADER+CSWITCH+DISPATCHER -stackwalk CSWITCH+ReadyThread         
                                                                                                                                     

This will create a new kernel trace session (Windows 8 and later only!) without blocking the default Kernel Logger Session which is the most useful thing of the new capabilities of the tracelogger tool. When I stop it with

                         
tracelog -stop mySession 
                         

then no image identification rundown is performed which means that I have call stacks but WPR cannot identify in which images the stack addresses do reside. It seems also not be able to merge etl files like xperf can do. Not to mention that it does not create the manged pdbs when stopping a trace session like wpr and xperf are able to. Still tracelog is the only tool which exposes the full power of ETW as a command line tool. It has the most interesting options to start new trace sessions but it seems to (correct me I really would like to know) be the wrong tool to stop a trace session by initiating all the necessary image identification and managed pdb creation. I really would like to see additional parameters in xperf to create the managed pdbs later when I have gathered and etl file.

To limit the potentially huge output of ETW can now be filtered at least at process level for user mode providers (-PidFilter or -ExeFilter) and to enable/disable stack walking for specific events and not the complete provider (-StackWalkFilter …) which is a welcome addition for Windows 8.1 users of ETW. Based on my limited experience with the new tracelog tool I would use it to start trace sessions for sophisticated analysis tasks on Windows 8 machines but stop the trace session with xperf to get the usual image identification and ngen pdb creation support.

The command

                                                 
xperf -stop MySession -d c:\temp\xperfStop.etl   
                                                 

did the job to stop the system logger and I got readable call stacks with a full .NET support. If you do not want to use a myriad of tools you can also use PerfView which Vance Morrison uses a lot to troubleshoot VS issues. The command line syntax is vastly different than the previous tools but it can be quite handy if you only need to deploy a 10MB exe which can collect and view the ETW data.

perfview start c:\temp\perfview.etl  -nogui

will start profiling with a sample interval of 1ms and also capture all disc related stuff which is a good default to start. If you need to do some wait chain analysis you can use

                                                       
perfview start c:\temp\perfview.etl  -nogui -ThreadTime
                                                       

which will enable an additional bunch of additional providers:

Enabled tracing:       Process Thread ProcCounters ImageLoad DiskIo DiskIoInit HardFaults CxtSwap TcpIp Profile Dispatcher
Stack Tracing:          DiskRead DiskWrite DiskFlush 0x0220 ProcessCreate ImageLoad ThreadCreate CSwitch ReadyThread Profile

To stop tracing you need to do

                           
perfview -zip:false stop   
                           

You can view the generated file with

                                
perfview c:\temp\perfview.etl   
                                

or with WPA or your favorite viewing tool. Perfview is for some managed analysis very handy. With handy I mean that it is the only tool to analyze managed processes for things like

GC wait times

JIT times

Managed Allocation Tracing (type and stack)

thrown exceptions without using a debugger, profiler or your own tracing

I did refuse to try perfview for a long time because it complained always that nearly all my stacks were broken. The issues is that it handles loading pdbs for call stack resolution and filtering quite different.



The first hurdle is to notice that there are always stack filters enabled (See GroupPats) which will fold your stack to nearly useless many times. After deleting the GroupPats text box I get real stacks but no symbols:



Now I have the stacks but still no symols. To fix that I must use the context menu to load for this call stack the corresponding symbols. The MS symbol servers are configured by default. If you have your own symbol server you need to reconfigure it in the File - Set Symbol Path menu first.



After that I get call stacks with method names as usual:

Once I did realize that I found PerfView quite useful. It is not only for performance work interesting but also as a system wide debugger which captures all thrown managed exceptions in all processes at once. The tool itself has many more options which you can explore with

Perfview /?

It supports some really advanced scenarios like collecting data in a ring buffer until a specific performance counter reaches a threshold then capture x seconds further and then stop. The main drawback of ETW with full profiling and context switch capturing is that the amount of data on a busy system exceeds 1 GB of data within one minute easily. If you do want to capture data for a really long time and stop when the interesting condition has happened you need more sophisticated stop triggers. PerfView has captured this operational know how in its command line parameters.

It can stop on

Performance Counter Thresholds

Application Event Log Messages

ETW Events

Gen 2 Collections

Long Gen 2 Collections

Managed exceptions of a specific type

AppFabric request time (Azure)

This makes it a cool tool to diagnose sporadic errors but you cannot afford to let the application to run under a debugger. The biggest pain point of PerfView is that it is great to collect data but it cannot live with the generated etl file and needs to convert it to its own format (etlx) for further processing. This can take some additional minutes to load bigger etl files but I hope it will become faster in future versions.

Below is the full command line of PerfView which has gotten not enough attention (just like tracelog) so far. Tracelog can be excused by living its hidden live in the Windows Driver Development Kit until it was moved to the Windows SDK with 8.1. It seems that device driver developers do not share their tricks online like "normal" .NET developers do.

 

The PerfView application has a number of commands associated with it, each with its own set of parameters and
qualifiers.  They are listed below.  Options that are common to all commands are listed at the end.

-----------------------------------------------------------------------------------------------------------------------
Usage: PerfView [DataFile]

  View profile data.

  Parameters:
    [DataFile]                           ETL or ETLX file containing profile data.

-----------------------------------------------------------------------------------------------------------------------
Usage: PerfView run CommandAndArgs ...

  Starts data collection, runs a command and stops.

  Parameters:
    CommandAndArgs ...                   Command to run and arguments.

-----------------------------------------------------------------------------------------------------------------------
Usage: PerfView collect [DataFile]

  Starts data collection, wait for user input, then stops.

  Parameters:
    [DataFile]                           ETL file containing profile data.

-----------------------------------------------------------------------------------------------------------------------
Usage: PerfView start [DataFile]

  Starts machine wide profile data collection.

  Parameters:
    [DataFile]                           ETL file containing profile data.

-----------------------------------------------------------------------------------------------------------------------
Usage: PerfView stop

  Stop collecting profile data (machine wide).

-----------------------------------------------------------------------------------------------------------------------
Usage: PerfView mark [Message]

  Add a PerfView 'Mark' event to the event stream with a optional string message

  Parameters:
    [Message]                            The string message to attach to the PerfView Mark event.

-----------------------------------------------------------------------------------------------------------------------
Usage: PerfView abort

  Insures that any active PerfView sessions are stopped.

-----------------------------------------------------------------------------------------------------------------------
Usage: PerfView merge [DataFile]

  Combine separate ETL files into a single ETL file (that can be decoded on another machine).

  Parameters:
    [DataFile]                           ETL file containing profile data.

-----------------------------------------------------------------------------------------------------------------------
Usage: PerfView listSessions

  Lists active ETW sessions.

-----------------------------------------------------------------------------------------------------------------------
Usage: PerfView ListCpuCounters

  Lists the ListCpuCounters CPU counters available on the system (win8+ only).

-----------------------------------------------------------------------------------------------------------------------
Usage: PerfView EnableKernelStacks

  On X64 machines if you have problems with broken stacks when the code is executing in the kernel, setting this option
  and rebooting may improve things

-----------------------------------------------------------------------------------------------------------------------
Usage: PerfView DisableKernelStacks

  Resets the registry keys set by EnableKernelStack.

-----------------------------------------------------------------------------------------------------------------------
Usage: PerfView HeapSnapshot Process [DataFile]

  Take a snapshot of the CLR GC heap of a process.

  Parameters:
    Process                              The process ID or Process Name (Exe without extension) of the process take a
                                         heap snapshot.
    [DataFile]                           The name of the file to place the heap snapshot.

  Qualifiers:
    [-SaveETL]                           Save an ETL file along with the GCDump file when dumping the JS Heap.
    [-MaxDumpCountK:250]                 The maximum number of objects (in K or thousands) to place int the .gcDump
                                         file.  Sample sufficiently to hit this metric.
    [-Freeze]                            Freeze the dump while data is taken.

-----------------------------------------------------------------------------------------------------------------------
Usage: PerfView ForceGC Process

  Forces a GC on the specified process

  Parameters:
    Process                              The process ID or Process Name (Exe without extension) of the process to force
                                         a GC.

-----------------------------------------------------------------------------------------------------------------------
Usage: PerfView HeapSnapshotFromProcessDump ProcessDumpFile [DataFile]

  Extract the CLR GC heap from a process dump file specified.

  Parameters:
    ProcessDumpFile                      The name of the input process dump file.
    [DataFile]                           The name of the file to place the heap snapshot.

-----------------------------------------------------------------------------------------------------------------------
Usage: PerfView GuiRun

  Opens the 'Run' dialog box.

-----------------------------------------------------------------------------------------------------------------------
Usage: PerfView GuiCollect

  Opens the 'Collect' dialog box.

-----------------------------------------------------------------------------------------------------------------------
Usage: PerfView GuiHeapSnapshot

  Opens the 'TakeHeapSnapshot' dialog box.

-----------------------------------------------------------------------------------------------------------------------
Usage: PerfView UserCommand CommandAndArgs ...

  Runs a user defined command.  Type 'PerfView UserCommandHelp' to see the help for all the user commands.  See PerfView
  Extensions in the users guide for more on creating user commands.

  Parameters:
    CommandAndArgs ...                   User command to run and any arguments.

-----------------------------------------------------------------------------------------------------------------------
Usage: PerfView UserCommandHelp

  Displays help for user commands.  Also see Help->User Command Help in the GUI.

-----------------------------------------------------------------------------------------------------------------------
Usage: PerfView CreateExtensionProject [ExtensionName]

  Creates a VS project for creates a perfView extension.

  Parameters:
    [ExtensionName.Global]               The name of the extension (no .DLL)

-----------------------------------------------------------------------------------------------------------------------
Usage: PerfView FetchSymbolsForProcess [DataFile]

  Fetch all the PDBs files needed for viewing locally.

  Parameters:
    [DataFile]                           ETL file containing profile data.
-----------------------------------------------------------------------------------------------------------------------
Qualifiers global to all commands:

    [-LogFile:STRING]                    Send messages to this file instead launching the GUI.  Intended for batch
                                         scripts and other automation.
    [-BufferSize:64]                     The size the buffers (in MB) the OS should use to store events waiting to be
                                         written to disk.
    [-Circular:0]                        Do Circular logging with a file size in MB.  Zero means non-circular.
    [-BufferSizeMB:64]                   The size the buffers (in MB) the OS should use to store events waiting to be
                                         written to disk.
    [-CircularMB:0]                      Do Circular logging with a file size in MB.  Zero means non-circular.
    [-MaxCollectSec:0]                   Turn off collection (and kill the program if perfView started it) after this
                                         many seconds.  Zero means no timeout.
    [-StopOnPerfCounter:STRING,...]      This is of the form CATEGORY:COUNTERNAME:INSTANCE OP NUM where
                                         CATEGORY:COUNTERNAME:INSTANCE, identify a performance counter (same as
                                         PerfMon), OP is either < or >, and NUM is a number.  When that condition is
                                         true then collection will stop.  You can specify this qualifier more than once.
                                         See 'Stop Trigger' in the users guide for more.
    [-StopOnGen2GC:STRING]               This will stop on any Gen2 GC from the given process (can be a process ID or a
                                         process Name (exe file name without path or extension) or * (any process)
    [-StopOnEtwEvent:STRING,...]         This is of the form PROVIDER:TASK:OPCODE here only PROVIDER and TASK are
                                         required.  (Opcode=Info, Keywords=ALL, Level=INFORMATIONAL are the defaults.
                                         Provider can be a Guid or Name (for EventSources you can use *Name to represent
                                         the Guid.  Task is the task Name or 'Task(NNN)' where NNN is the task number in
                                         decimal.  You can specify this argument more than once
    [-StopOnException:STRING]            Where the text is a regular expression that will be used to match the full name
                                         of the .NET Exception thrown.The empty string represents any exception.
    [-StopOnEventLogMessage:STRING]      Stop when an event log message that matches the given (ignore case) regular
                                         expression is written to the Windows 'Application' event log.  You can specify
                                         a particular event log with the syntax eventLogName@RegExp.
    [-StopOnRequestOverMsec:0]           Trigger a stop of a collect command if there is any IIS request that is longer
                                         than the given number of MSec.
    [-StopOnGCOverMsec:0]                Trigger a stop of a collect command if there is a .NET Garbage Collection (GC)
                                         is longer than the given number of MSec.
    [-StopOnAppFabricOverMsec:0]         Trigger a stop of a collect command if there is a AppFabric request is longer
                                         than the given number of MSec.
    [-DecayToZeroHours:0]                The trigger value used in StopOnPerfCounter or StopOn*OverMSec will decay to
                                         zero in this interval of time.
    [-DelayAfterTriggerSec:10]           Wait this number of seconds after a trigger before actually stopping the trace.
    [-CollectMultiple:0]                 Collect Multiple instance (used in conjunction with StopTrigger).
    [-MonitorPerfCounter:STRING,...]     This is of the form CATEGORY:COUNTERNAME:INSTANCE@NUM where
                                         CATEGORY:COUNTERNAME:INSTANCE, identify a performance counter (same as
                                         PerfMon), and NUM is a number representing seconds.  The @NUM part is optional
                                         and defaults to 2.  The value of the performance counter is logged to the ETL
                                         file as an event ever NUM seconds
    [-CpuSampleMSec:1]                   The interval (MSec) between CPU samples (.125Msec min).
    [-Merge:BOOLEAN]                     Do a merge after stopping collection.
    [-Zip:BOOLEAN]                       Zip the ETL file (implies /Merge).
    [-NoRundown]                         Don't request CLR Rundown events.
    [-NoNGenRundown]                     Don't do rundown of symbolic information in NGEN images (only needed pre V4.5).
    [-NoClrRundown]                      Don't do rundown .NET (CLR) rundown information )(for symbolic name lookup).
    [-RundownTimeout:120]                Maximum number of seconds to wait for CLR rundown to complete.
    [-MinRundownTime:0]                  Minimum number of seconds to wait for CLR rundown to complete.
    [-KeepAllEvents]                     A debug option to keep all events, even symbolic rundown events.
    [-MaxEventCount:0]                   Limits the total number of events.  Useful for trimming large ETL files.  1M
                                         typically yields 300-400Meg.  of data considered.
    [-SkipMSec:0]                        Skips the first N MSec of the trace.  Useful for trimming large ETL files in
                                         conjunction with the /MaxEventCount qualifier.
    [-CpuCoutners:STRING,...]            A comma separated list of hardware CPU counters specifications NAME:COUNT to
                                         turn on.  See Users guide for details.  See ListCpuCounters for available
                                         sources (Win8 only)
    [-Providers:STRING,...]              Additional providers.  This is comma separated list of
                                         ProviderGuid:Keywords:Level:Stack specs.  This qualifier has the same syntax as
                                         the Additional Providers TextBox in the collection window.  See help on that
                                         for more.
    [-OnlyProviders:STRING,...]          Like the Providers qualifier, but also turns off the default Kernel and CLR
                                         providers.
    [-ThreadTime]                        Shortcut for turning on context switch and readyThread events
    [-GCOnly]                            Turns on JUST GC collections an allocation sampling.
    [-GCCollectOnly]                     Turns on GC collections (no allocation sampling).
    [-DumpHeap]                          Capture a heap snapshot on profile stop
    [-ClrEventLevel:Verbose]             The verbosity for CLR events Legal values: Always, Critical, Error, Warning,
                                         Informational, Verbose.
    [-ClrEvents:Default]                 A comma separated list of .NET CLR events to turn on.  See Users guide for
                                         details.  Legal values: None, GC, GCHandle, Binder, Loader, Jit, NGen,
                                         StartEnumeration, StopEnumeration, Security, AppDomainResourceManagement,
                                         JitTracing, Interop, Contention, Exception, Threading,
                                         JittedMethodILToNativeMap, OverrideAndSuppressNGenEvents, SupressNGen, Type,
                                         GCHeapDump, GCSampledObjectAllocationHigh, GCHeapSurvivalAndMovement,
                                         GCHeapCollect, GCHeapAndTypeNames, GCHeapSnapshot,
                                         GCSampledObjectAllocationLow, PerfTrack, Stack, Default, All.
    [-KernelEvents:Default]              A comma separated list of windows OS kernel events to turn on.  See Users guide
                                         for details.  Legal values: None, Process, Thread, ImageLoad, ProcessCounters,
                                         ContextSwitch, DeferedProcedureCalls, Interrupt, SystemCall, DiskIO,
                                         DiskFileIO, DiskIOInit, Dispatcher, MemoryPageFaults, MemoryHardFaults,
                                         VirtualAlloc, VAMap, NetworkTCPIP, Registry, AdvancedLocalProcedureCalls,
                                         SplitIO, Driver, OS, Profile, Default, TheadTime, FileIO, FileIOInit, Verbose,
                                         All, ReferenceSet, PMCProfile.
    [-DotNetAlloc]                       Turns on per-allocation .NET profiling.
    [-DotNetAllocSampled]                Turns on per-allocation .NET profiling, sampling types in a smart way to keep
                                         overhead low.
    [-OSHeapProcess:0]                   Turn on per-allocation profiling of allocation from the OS heap for the process
                                         with the given process ID.
    [-OSHeapExe:STRING]                  Turn on per-allocation profiling of allocation from the OS heap for the process
                                         with the given EXE (only filename WITH extension).
    [-ForceNgenRundown]                  By default on a V4.0 runtime NGEN rundown is suppressed, because NGEN PDB are a
                                         less expensive way of getting symbolic information

Show more